Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix missing range checks #23

Merged
merged 5 commits into from
Jul 5, 2018
Merged
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
61 changes: 28 additions & 33 deletions src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -63,8 +63,9 @@
#![deny(unused_mut)]

use std::{error, fmt};
use std::str::FromStr;
use std::ascii::AsciiExt;
use std::fmt::{Display, Formatter};
use std::str::FromStr;

/// Integer in the range `0..32`
#[derive(PartialEq, Eq, Debug, Copy, Clone, Default, PartialOrd, Ord, Hash)]
Expand Down Expand Up @@ -225,7 +226,7 @@ impl Bech32 {
for b in raw_hrp.bytes() {
// Valid subset of ASCII
if b < 33 || b > 126 {
return Err(Error::InvalidChar(b))
return Err(Error::InvalidChar(b as char))
}
let mut c = b;
// Lowercase
Expand All @@ -242,34 +243,28 @@ impl Bech32 {
}

// Check data payload
let mut data_bytes: Vec<u5> = Vec::new();
for b in raw_data.bytes() {
// Alphanumeric only
if !((b >= b'0' && b <= b'9') || (b >= b'A' && b <= b'Z') || (b >= b'a' && b <= b'z')) {
return Err(Error::InvalidChar(b))
let mut data_bytes = raw_data.chars().map(|c| {
// Only check if c is in the ASCII range, all invalid ASCII characters have the value -1
// in CHARSET_REV (which covers the whole ASCII range) and will be filtered out later.
if !c.is_ascii() {
return Err(Error::InvalidChar(c))
}
// Excludes these characters: [1,b,i,o]
if b == b'1' || b == b'b' || b == b'i' || b == b'o' {
return Err(Error::InvalidChar(b))
}
// Lowercase
if b >= b'a' && b <= b'z' {

if c.is_lowercase() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The is_lowercase and is_uppercase functions are probably slower (since they work for all Unicode characters) than the simple range checks used before, but they seem so much more idiomatic. The ascii-equivalent (is_ascii_lowercase) is still nightly-only, so if we wanted to avoid handwritten range checks we had to use our own trait for that. But I don't see the need for such optimizations right now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation of is_lowercase is:

    pub fn is_lowercase(self) -> bool {
        match self {
            'a'...'z' => true,
            c if c > '\x7f' => derived_property::Lowercase(c),
            _ => false,
        }
    }

So for ASCII characters, it should be fairly performant. is_uppercase has a similar structure.

has_lower = true;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find a better (and preferably shorter/simpler) way of doing the checks for "all characters have the same case". At least none that doesn't need some bigger refactoring of the HRP processing, so I will leave it for now and maybe open another PR dedicated to refactoring.

} else if c.is_uppercase() {
has_upper = true;
}

// Uppercase
let c = if b >= b'A' && b <= b'Z' {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was dropping conversion to lower case intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is intentional. When I looked at CHARSET_REV I noticed that there are entries for both cases, so the conversion to lowercase should be redundant. But I still have to verify that CHARSET_REV actually satisfies all my assumptions about it, that's one of the reasons this PR is still WIP.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this use of CHARSET_REV in testing for proper range.

has_upper = true;
// Convert to lowercase
b + (b'a'-b'A')
} else {
b
};

data_bytes.push(u5::try_from_u8(CHARSET_REV[c as usize] as u8).expect(
"range was already checked above"
));
}
// c should be <128 since it is in the ASCII range, CHARSET_REV.len() == 128
let num_value = CHARSET_REV[c as usize];

if num_value > 31 || num_value < 0 {
return Err(Error::InvalidChar(c));
}

Ok(u5::try_from_u8(num_value as u8).expect("range checked above, num_value <= 31"))
}).collect::<Result<Vec<u5>, Error>>()?;

// Ensure no mixed case
if has_lower && has_upper {
Expand Down Expand Up @@ -402,7 +397,7 @@ pub enum Error {
/// The data or human-readable part is too long or too short
InvalidLength,
/// Some part of the string contains an invalid character
InvalidChar(u8),
InvalidChar(char),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this enum variant change a breaking change that needs a major version bump?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, sorry, I will have to bump the version to 0.5.0. Would you generally agree that it's better to process unicode strings char-wise instead of their UTF-8 representation byte-wise? Because this change is what made this API break necessary.

/// Some part of the data has an invalid value
InvalidData(u8),
/// The bit conversion failed due to a padding issue
Expand Down Expand Up @@ -545,21 +540,21 @@ mod tests {
fn invalid_strings() {
let pairs: Vec<(&str, Error)> = vec!(
(" 1nwldj5",
Error::InvalidChar(b' ')),
("\x7f1axkwrx",
Error::InvalidChar(0x7f)),
Error::InvalidChar(' ')),
("abc1\u{2192}axkwrx",
Error::InvalidChar('\u{2192}')),
("an84characterslonghumanreadablepartthatcontainsthenumber1andtheexcludedcharactersbio1569pvx",
Error::InvalidLength),
("pzry9x0s0muk",
Error::MissingSeparator),
("1pzry9x0s0muk",
Error::InvalidLength),
("x1b4n0q5v",
Error::InvalidChar(b'b')),
Error::InvalidChar('b')),
("li1dgmt3",
Error::InvalidLength),
("de1lg7wt\u{ff}",
Error::InvalidChar(0xc3)), // ASCII 0xff -> \uC3BF in UTF-8
Error::InvalidChar('\u{ff}')),
);
for p in pairs {
let (s, expected_error) = p;
Expand All @@ -568,7 +563,7 @@ mod tests {
println!("{:?}", dec_result.unwrap());
panic!("Should be invalid: {:?}", s);
}
assert_eq!(dec_result.unwrap_err(), expected_error);
assert_eq!(dec_result.unwrap_err(), expected_error, "testing input '{}'", s);
}
}

Expand Down