Activate "char_classes" in boost regex library [duplicate] - c++

How do I create a regular expression that detects hexadecimal numbers in a text?
For example, ‘0x0f4’, ‘0acdadecf822eeff32aca5830e438cb54aa722e3’, and ‘8BADF00D’.

How about the following?
0[xX][0-9a-fA-F]+
This matches an expression starting with a 0, followed by either a lowercase or uppercase x, followed by one or more characters in the ranges 0-9, a-f, or A-F.

The exact syntax depends on your exact requirements and programming language, but basically:
/[0-9a-fA-F]+/
Or, more simply, with the i flag making it case-insensitive:
/[0-9a-f]+/i
If you are lucky enough to be using Ruby, you can do:
/\h+/
EDIT - Steven Schroeder's answer made me realise my understanding of the 0x bit was wrong, so I've updated my suggestions accordingly.
If you also want to match 0x, the equivalents are
/0[xX][0-9a-fA-F]+/
/0x[0-9a-f]+/i
/0x[\h]+/i
ADDED MORE - If 0x needs to be optional (as the question implies):
/(0x)?[0-9a-f]+/i
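As a rough illustration of the optional-prefix variant above, here is a minimal Python sketch. The \b word boundaries and the names HEX_RE and find_hex are my own additions, not part of the answer; without some kind of boundary the pattern would also pick up hex-looking letters inside ordinary words.

```python
import re

# Optional 0x/0X prefix followed by one or more hex digits.
# re.IGNORECASE plays the role of the /i flag; \b is an added assumption
# so that matches start and end on word boundaries.
HEX_RE = re.compile(r"\b(?:0x)?[0-9a-f]+\b", re.IGNORECASE)


def find_hex(text):
    """Return every hex-looking token found in text."""
    return HEX_RE.findall(text)
```

Short English words made only of the letters a to f (for example "bed" or "face") still count as hex here, so some false positives are unavoidable with this approach.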

Not a big deal, but many regex engines (though not all) support the POSIX character classes, and there's [:xdigit:] for matching hex characters, which is simpler than the common 0-9a-fA-F stuff.
So, the regex as requested (ie. with optional 0x) is: /(0x)?[[:xdigit:]]+/

It's worth mentioning that detecting an MD5 (which is one of the examples) can be done with:
[0-9a-fA-F]{32}

This will match with or without 0x prefix
(?:0[xX])?[0-9a-fA-F]+

If you're using Perl or PHP, you can replace
[0-9a-fA-F]
with:
[[:xdigit:]]

Just for the record I would specify the following:
/^[xX]?[0-9a-fA-F]{6}$/
This differs in that it requires exactly six valid hex characters, optionally preceded by a lowercase or uppercase x.

Another example: Hexadecimal values for css colors start with a pound sign, or hash (#), then six characters that can either be a numeral or a letter between A and F, inclusive.
^#[0-9a-fA-F]{6}

If you are looking for a specific character in the middle of the string, given by its hex code, you can use "\xhh", where hh is the character code in hexadecimal. I've tried it and it works. I use the Qt framework for C++, but it can solve problems in other cases too, depending on the flavor you need to use (PHP, JavaScript, Python, Go, etc.).
This answer was taken from: http://ult-tex.net/info/perl/

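This one matches exactly three pairs of valid characters:
(([a-fA-F]|[0-9]){2}){3}
Fewer than three pairs fail to match. To also reject input with more than three pairs, anchor the pattern with ^ and $, since an unanchored pattern can still match part of a longer string.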

In Java this is allowed:
(?:0x?)?[\p{XDigit}]+$
As you see the 0x is optional (even the x is optional) in a non-capturing group.

In case you need this for an input field where the user may so far have typed only 0 or 0x (note that, because both the 0 and the x are optional, this pattern also accepts hex digits without the 0x prefix):
^0?[xX]?[0-9a-fA-F]*$

First, instead of ^ and $, use \b: it is a word boundary, which helps when the hash is not the only string on the line.
I came here looking for a similar but more specialized regex and came up with this:
\b(\d+[a-f]+\d+[\da-f]*|[a-f]+\d+[a-f]+[\da-f]*)\b
I needed to detect hashes like git commit identifiers (and similar) in console output, and rather than matching all possible hashes I prioritize NOT matching random words or numbers like EB or 12345678.
So my heuristic is to assume that a hash alternates between numbers and letters reasonably often, and that runs of only numbers or only letters are short.
Another important fact is that an MD5 hash is 32 characters long (as mentioned by @Adaddinsane) and git displays a shortened version with only 10 characters, so the above example can be modified as follows.
For 10-character hashes I assume the groups will be at most 3 characters long:
\b(\d+[a-f]+\d+[\da-f]{1,7}|[a-f]+\d+[a-f]+[\da-f]{1,7})\b
For hashes up to 32 characters long I assume the groups will be at most 5 characters long:
\b(\d+[a-f]+\d+[\da-f]{17,29}|[a-f]+\d+[a-f]+[\da-f]{17,29})\b
You can easily change a-f to a-fA-F to match both cases, or add 0[xX] at the front to match the 0x prefix.
These examples will obviously not match exotic but valid hashes that start with very long runs of only numbers or only letters, or extreme hashes such as all 0s.
But this way I can match hashes and significantly reduce accidental false positives, such as directory names or line numbers.
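To make the heuristic a bit more concrete, here is a small Python sketch that applies the first (unbounded) pattern from this answer to a line of console output. The names HASH_RE and find_hashes and the sample line are made up for illustration, and I swapped the capturing group for a non-capturing one, which does not change what is matched.

```python
import re

# Heuristic from the answer: a hash must alternate digit/letter/digit or
# letter/digit/letter at its start before any mix of hex characters follows.
HASH_RE = re.compile(
    r"\b(?:\d+[a-f]+\d+[\da-f]*|[a-f]+\d+[a-f]+[\da-f]*)\b"
)


def find_hashes(line):
    """Return the hash-like tokens in one line of text."""
    return [m.group(0) for m in HASH_RE.finditer(line)]
```

Plain numbers, plain words and uppercase tokens such as EB are skipped, while a token like 3f2a9c1b7e is picked up.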

Related

Regex to have two out of three character types [duplicate]

My client has requested that passwords on their system must follow a specific set of validation rules, and I'm having great difficulty coming up with a "nice" regular expression.
The rules I have been given are...
Minimum of 8 characters
Allow any character
Must have at least one instance from three of the four following character types...
Upper case character
Lower case character
Numeric digit
"Special Character"
When I pressed more, "Special Characters" are literally everything else (including spaces).
I can easily check for at least one instance for all four, using the following...
^(?=.*?[A-Z])(?=.*?[a-z])(?=.*?\d)(?=.*?[^a-zA-Z0-9]).{8,}$
The following works, but it's horrible and messy...
^((?=.*?[A-Z])(?=.*?[a-z])(?=.*?\d)|(?=.*?[A-Z])(?=.*?[a-z])(?=.*?[^a-zA-Z0-9])|(?=.*?[A-Z])(?=.*?\d)(?=.*?[^a-zA-Z0-9])|(?=.*?[a-z])(?=.*?\d)(?=.*?[^a-zA-Z0-9])).{8,}$
So you don't have to work it out yourself, the above is checking for (1,2,3|1,2,4|1,3,4|2,3,4) which are the 4 possible combinations of the 4 groups (where the number relates to the "types" in the set of rules).
Is there a "nicer", cleaner or easier way of doing this?
(Please note, this is going to be used in an <asp:RegularExpressionValidator> control in an ASP.NET website, so therefore needs to be a valid regex for both .NET and javascript.)
It's not much of a better solution, but you can reduce [^a-zA-Z0-9] to [\W_], since a word character is any letter, digit or the underscore character. I don't think you can avoid the alternation when trying to do this in a single regex. I think you pretty much have the best solution.
One slight optimization is that the two alternatives "digit, lowercase, special" and "digit, uppercase, special" can be merged into "digit, letter ([a-zA-Z]), special", so I could remove one of the alternation sets. If you only allowed exactly 3 out of 4 this wouldn't work, but since "digit, uppercase, lowercase, special" is implicitly allowed, it works.
(?=.{8,})((?=.*\d)(?=.*[a-z])(?=.*[A-Z])|(?=.*\d)(?=.*[a-zA-Z])(?=.*[\W_])|(?=.*[a-z])(?=.*[A-Z])(?=.*[\W_])).*
Extended version:
(?=.{8,})(
(?=.*\d)(?=.*[a-z])(?=.*[A-Z])|
(?=.*\d)(?=.*[a-zA-Z])(?=.*[\W_])|
(?=.*[a-z])(?=.*[A-Z])(?=.*[\W_])
).*
Because of the fourth condition specified by the OP, this regular expression will match even unprintable characters such as new lines. If this is unacceptable then modify the set that contains \W to allow for more specific set of special characters.
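For anyone who wants to try the three-alternative pattern above outside of .NET or JavaScript, here is a minimal Python sketch. The names THREE_OF_FOUR and is_valid are hypothetical, the outer group is made non-capturing, and re.ASCII is my own choice so that \d and \W behave as they would on plain ASCII input.

```python
import re

# The extended pattern from the answer, split across lines for readability.
THREE_OF_FOUR = re.compile(
    r"(?=.{8,})(?:"
    r"(?=.*\d)(?=.*[a-z])(?=.*[A-Z])|"      # digit + lower + upper
    r"(?=.*\d)(?=.*[a-zA-Z])(?=.*[\W_])|"   # digit + any letter + special
    r"(?=.*[a-z])(?=.*[A-Z])(?=.*[\W_])"    # lower + upper + special
    r").*",
    re.ASCII,
)


def is_valid(password):
    """True if the password has 8+ characters and 3 of the 4 classes."""
    return THREE_OF_FOUR.fullmatch(password) is not None
```

With this sketch, "Password1" and "pass word1" are accepted, while "password" and "PASSWORD!" are rejected because they only cover one or two classes.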
I'd like to improve the accepted solution with this one
^(?=.{8,})(
(?=.*[^a-zA-Z\s])(?=.*[a-z])(?=.*[A-Z])|
(?=.*[^a-zA-Z0-9\s])(?=.*\d)(?=.*[a-zA-Z])
).*$
The regex above worked well for most scenarios, except for strings such as "AAAAAA1$" and "$$$$$$1a".
This may be an issue only on iOS (Objective-C and Swift), where "\d" in the regex caused problems.
The following fix worked on iOS, i.e. changing \d to [0-9] for digits:
^((?=.*?[A-Z])(?=.*?[a-z])(?=.*?[0-9])|(?=.*?[A-Z])(?=.*?[a-z])(?=.*?[^a-zA-Z0-9])|(?=.*?[A-Z])(?=.*?[0-9])(?=.*?[^a-zA-Z0-9])|(?=.*?[a-z])(?=.*?[0-9])(?=.*?[^a-zA-Z0-9])).{8,}$
Password must meet at least 3 out of the following 4 complexity rules:
at least 1 uppercase character (A-Z)
at least 1 lowercase character (a-z)
at least 1 digit (0-9)
at least 1 special character (do not forget to treat space as a special character too)
It must also have:
at least 10 characters
at most 128 characters
not more than 2 identical characters in a row (e.g., 111 not allowed)
'^(?!.*(.)\1{2})((?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])|(?=.*[a-z])(?=.*[A-Z])(?=.*[^a-zA-Z0-9])|(?=.*[A-Z])(?=.*[0-9])(?=.*[^a-zA-Z0-9])|(?=.*[a-z])(?=.*[0-9])(?=.*[^a-zA-Z0-9])).{10,127}$'
Broken down:
(?!.*(.)\1{2})
(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])
(?=.*[a-z])(?=.*[A-Z])(?=.*[^a-zA-Z0-9])
(?=.*[A-Z])(?=.*[0-9])(?=.*[^a-zA-Z0-9])
(?=.*[a-z])(?=.*[0-9])(?=.*[^a-zA-Z0-9])
.{10,127}
Note that .{10,127} caps the length at 127 characters; use .{10,128} if 128-character passwords should be accepted, as the rule above states.
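Here is a hedged Python sketch of the rule set listed above, with every .* quantifier written out. The names POLICY_RE and is_acceptable are hypothetical, and I use {10,128} as the length bound so that the stated maximum of 128 characters is accepted; that bound is my assumption.

```python
import re

POLICY_RE = re.compile(
    r"(?!.*(.)\1{2})"                                # no 3 identical chars in a row
    r"(?:"
    r"(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])|"            # lower + upper + digit
    r"(?=.*[a-z])(?=.*[A-Z])(?=.*[^a-zA-Z0-9])|"     # lower + upper + special
    r"(?=.*[A-Z])(?=.*[0-9])(?=.*[^a-zA-Z0-9])|"     # upper + digit + special
    r"(?=.*[a-z])(?=.*[0-9])(?=.*[^a-zA-Z0-9])"      # lower + digit + special
    r")"
    r".{10,128}"                                     # assumed length bound
)


def is_acceptable(password):
    """True if the password satisfies the sketch of the policy above."""
    return POLICY_RE.fullmatch(password) is not None
```

fullmatch stands in for the ^ and $ anchors of the original pattern.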

Single regex for complex password validation [duplicate]

I have to validate passwords so that they meet these rules:
A) The password must contain characters from 3 of the following 4 classes:
English Upper Case Letters A, B, C, ... Z
English Lower Case Letters a, b, c, ... z
Westernised Arabic Numerals 0, 1, 2, ... 9
Non-alphanumeric (“special characters”)
For example, punctuation, symbols.
{},.<>;:'?/|`~!@#$%^&*()_-+= space
B) The password must be at least 8 characters long.
Can this be done in a single regex? What would that regex be?
This task isn't suitable for doing with a regular expression.
It can be done in a regular expression, but it'd be so convoluted and complicated that you're better off doing the check in some other way.
Just because something can be done with regular expressions doesn't mean it's a good idea.
I think using complicated regular expression isn't a way that should be used at all costs. In this case, using a simple method with four booleans will be easier to write, easier to read and probably also faster.
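The four-booleans idea mentioned above might look roughly like this in Python. The function name meets_policy and its parameters are hypothetical, and I'm assuming "special" means anything that is not an ASCII letter or digit, as in the question.

```python
import string


def meets_policy(password, min_length=8, required_classes=3):
    """Check the length and count how many character classes are present."""
    alnum = string.ascii_letters + string.digits
    has_upper = any(c in string.ascii_uppercase for c in password)
    has_lower = any(c in string.ascii_lowercase for c in password)
    has_digit = any(c in string.digits for c in password)
    has_special = any(c not in alnum for c in password)  # includes spaces

    # Booleans count as 0 or 1, so the sum is the number of classes used.
    classes = sum([has_upper, has_lower, has_digit, has_special])
    return len(password) >= min_length and classes >= required_classes
```

Changing the policy (say, to require all four classes) is then a matter of changing one argument instead of rewriting a pattern.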
You could check that it is:
not purely letters and digits (this is slightly more aggressive than your conditions say);
not purely lowercase and special characters
A single regular expression to check this would be something like
(?![A-Za-z0-9]+$|[a-z{},.<>;:'?/|`~!@#$%^&*()_+= -]+$).{8,}
I intentionally ignored your exact specification. In particular, I did not want to allow Pass1234, and I don't think it makes sense to set a maximum length, and I did not restrict the set of allowed characters at all (i.e. there are minimum requirements, but you can go wild and use control characters or accented characters if you like). These things are easy enough to fix if you disagree.
To strictly implement your spec, you could check that the password does not consist of purely any two groups; so not all upper and lower case, and not all lowercase and numbers, and not all uppercase and numbers, and not all numbers and specials, and not all lowercase and specials, and not all uppercase and specials, but again, this is somewhat tedious and IMHO counter-productive.
You are not saying which regex flavor you are using. I have assumed you have the Perl negative lookahead (?!...) at your disposal. This is significantly harder if you are restricted to traditional BRE or ERE syntax.
I think you can achieve a very close result with a single regular expression. Here is an example:
^((?=.*[!@#$%&,()_=/\.\-\*\+\?])[A-Za-z0-9!@#$%&,()_=/\.\-\*\+\?]{8,20})$
This says:
At least 1 special character
Can contain alpha numeric characters
Is between 8 and 20 characters long

Regex matching numbers and decimals

I need a regular expression that will match the following:
.5
0.5
1.5
1234
but NOT
0.5.5
absnd (any letter character or space)
I have this that satisfies all but 0.5.5
^[.?\d]+$
This is a fairly common task. The simplest way I know of to deal with it is this:
^[+-]?(\d*\.)?\d+$
There are also other complications, such as whether you want to allow leading zeroes or commas or things like that. This can be as complicated as you want it to be. For example, if you want to allow the 1,234,567.89 format, you can go with this:
^[+-]?(\d*|\d{1,3}(,\d{3})*)(\.\d+)?\b$
That \b there is a word break, but I'm using it as a sneaky way to require at least one numeral at the end of the string. This way, an empty string or a single + won't match.
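A quick way to convince yourself of the \b trick is to run the pattern against a few inputs. This is only a sketch in Python; the names GROUPED_RE and is_grouped_number are made up for the example.

```python
import re

# The comma-grouped pattern from the answer, unchanged.
GROUPED_RE = re.compile(r"^[+-]?(\d*|\d{1,3}(,\d{3})*)(\.\d+)?\b$")


def is_grouped_number(text):
    """True if text is a number, optionally with thousands separators."""
    return GROUPED_RE.match(text) is not None
```

The empty string and a lone + are rejected because there is no word character before the end of the string for \b to latch onto.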
However, be advised that regexes are not the ideal way to parse numeric strings. All modern programming languages I know of have fast, simple, built-in methods for doing that.
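In Python, for instance, the built-in route could be sketched like this. The helper name parse_number is hypothetical. Keep in mind that float() is more permissive than the patterns in this thread: it also accepts things like "1e3", "inf", "nan" and surrounding whitespace, so it is not a drop-in replacement if those must be rejected.

```python
def parse_number(text):
    """Return text as a float, or None if it is not a valid number."""
    try:
        return float(text)
    except ValueError:
        return None
```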
Here's a much simpler solution that doesn't use any look-aheads or look-behinds:
^\d*\.?\d+$
To clearly understand why this works, read it from right to left:
At least one digit is required at the end.
7 works
77 works
.77 works
0.77 works
0. doesn't work
empty string doesn't work
A single period preceding the digit is optional.
.77 works
77 works
..77 doesn't work
Any number of digits preceding the (optional) period.
.77 works
0.77 works
0077.77 works
0077 works
Keeping the pattern this simple, with no look-aheads or look-behinds, also makes it easier to reason about backtracking and therefore about regex-based denial-of-service (ReDoS) risks.
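The cases walked through above can be checked mechanically. A minimal Python sketch, where the names NUMBER_RE and is_number are my own: fullmatch takes the place of the ^ and $ anchors, and re.ASCII restricts \d to 0-9.

```python
import re

# Same pattern as above, without the anchors (fullmatch anchors both ends).
NUMBER_RE = re.compile(r"\d*\.?\d+", re.ASCII)


def is_number(text):
    """True if text is digits with at most one period before the last digits."""
    return NUMBER_RE.fullmatch(text) is not None
```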
HTH
Nobody seems to be accounting for negative numbers. Also, some are creating a capture group which is unnecessary. This is the most thorough solution IMO.
^[+-]?(?:\d*\.)?\d+$
The following should work:
^(?!.*\..*\.)[.\d]+$
This uses a negative lookahead to make sure that there are fewer than two . characters in the string.
http://www.rubular.com/r/N3jl1ifJDX
This could work:
^(?:\d*\.)?\d+$

Regular expression to match string of 0's and 1's without '011' substring

I'm working on a problem (from Introduction to Automata Theory, Languages, and Computation by Hopcroft, Motwani and Ullman) to write a regular expression that defines a language consisting of all strings of 0s and 1s not containing the substring 011.
Is the answer (0+1)* - 011 correct? If not, what should the correct answer be?
Edit: Updated to include start states and fixes, as per below comments.
If you are looking for all strings that do not have 011 as a substring rather than simply excluding the string 011:
A classic regex for that would be:
1*(0+01)*
Basically you can have as many ones at the beginning as you want, but as soon as you hit a zero, it's either zeros, or zero-ones that follow (since otherwise you'd get a zero-one-one).
A modern, not-really-regular regex would be:
^((?!011)[01])*$
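If you want to check the two substring-free patterns above, a brute-force comparison over all short binary strings is easy to write. This is just a sketch in Python; the classic union operator + is written as | in Python's syntax, and the names CLASSIC, MODERN and agrees_up_to are made up.

```python
import re
from itertools import product

CLASSIC = re.compile(r"1*(?:0|01)*")        # 1*(0+01)* in textbook notation
MODERN = re.compile(r"(?:(?!011)[01])*")    # lookahead version, anchors dropped


def agrees_up_to(max_len):
    """Compare both patterns with a plain substring test on every
    binary string of length 0..max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            s = "".join(bits)
            expected = "011" not in s
            if (CLASSIC.fullmatch(s) is not None) != expected:
                return False
            if (MODERN.fullmatch(s) is not None) != expected:
                return False
    return True
```

This is not a proof, but agreement on every string up to a modest length is a useful sanity check.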
If, however, you want any string that is not exactly 011, you can simply enumerate the short strings and wildcard the rest:
λ+0+1+00+01+10+11+(1+00+010+011(0+1))(0+1)*
And in modern regex:
^(?!011$)[01]*$

Regular expression for a non-zero hex value

I am looking for a regular expression to determine when any of the values in a 32-bit hex value is non-zero.
The data patterns look like 0x00000000 and I want to know when any of the digits is non-zero. For example, 0x00001000 or 0x10000000 or 0xB000000 would be captured by the regular expression, but not a 0x00000000 pattern. Right now I perform a walking pattern match of
0x[^0]
0x0[^0]
0x00[^0]
...
0x0000000[^0]
This will work, but I would much rather have one pattern if possible. Thanks.
Mark
Edit: I didn't mention it because the regex was not needed in a program (otherwise I would have used a different approach): I was using the regex to search for values in a log file using UltraEdit. I could have developed a program or some other means to search, but, to be honest, I was just being lazy. Ben S's solution worked in both UltraEdit and Rad Software Regular Expression Designer. rampion's solution didn't work in either tool; I'm not sure why.
Why not test the hex value against zero? Simpler, faster, more readable.
If a regular expression is really necessary, 0x0*[1-9a-fA-F][0-9a-fA-F]* should do it.
It skips as many zeros as it can until it finds a non-zero hex digit, then gathers the rest of the hex digits regardless of whether they are zero or not.
Note: this will match any length hex, not just 32 bits.
/0x0*[1-9a-fA-F][0-9a-fA-F]*/
<atom>* means match the atom 0 or more times, so this pattern matches the 0x prefix, followed by 0 or more 0s, followed by a non-zero hex, followed by some hex.
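As a sketch of how this pattern could be used to filter log lines, here it is in Python. The names NONZERO_HEX and nonzero_lines and the sample lines in the test are made up.

```python
import re

# 0x, any number of leading zeros, one non-zero hex digit, then the rest.
NONZERO_HEX = re.compile(r"0x0*[1-9a-fA-F][0-9a-fA-F]*")


def nonzero_lines(lines):
    """Keep only the lines that contain at least one non-zero hex value."""
    return [line for line in lines if NONZERO_HEX.search(line)]
```

A line is kept as soon as any value on it is non-zero, so a line holding both 0x00000000 and 0x00000001 is still reported.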
Why not try something slightly different? Testing for a non-zero hex is much harder than testing for a zero hex. So test for zero and manually do the not.
using System.Text.RegularExpressions;
// Returns true unless the input is "0x" followed only by zeros.
bool IsNonZeroHex(string input) {
    return !Regex.IsMatch(input, "^0x(0*)$");
}
/0x0*[^0]/
I think this should cover all cases (if it really has to be a regex):
^0x(?=0*[1-9a-fA-F]0*)[0-9a-fA-F]{8}$
Fixed size hex numbers can be looked up using negative lookahead as:
/(0x(?!0{8})[0-9a-fA-F]{8})/
A group is looked up beginning with 0x
then negative look ahead 0{8} (fails if found)
otherwise match [0-9a-fA-F]{8}
Works with PCRE, JavaScript, Python. Don't know which editors support negative lookahead.
Surely a simple string compare and if it DOES NOT EQUAL "0x00000000" you've got your match.
Am I oversimplifying it? There is only one FALSE case, right? When the string is "0x00000000"?
Don't use RegEx unless you have to.
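When the value is available to a program rather than an editor search, the non-regex route suggested above could be sketched like this in Python. The name is_nonzero_hex is hypothetical, and treating unparseable input as "not a non-zero hex value" is my own assumption.

```python
def is_nonzero_hex(text):
    """True if text parses as a hexadecimal number other than zero."""
    try:
        # int() with base 16 accepts an optional 0x / 0X prefix.
        return int(text, 16) != 0
    except ValueError:
        return False
```

Unlike the string comparison against "0x00000000", this does not depend on the number of digits.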
/0x0{0,7}[^0]/
'0x', followed by zero to seven '0', followed by something that is not '0'