Regular expression for a non-zero hex value - regex

I am looking for a regular expression to determine when any of the digits in a 32-bit hex value is non-zero.
The data patterns look like 0x00000000 and I want to know when any of the digits is non-zero. For example, 0x00001000 or 0x10000000 or 0xB000000 would be captured by the regular expression, but not a 0x00000000 pattern. Right now I perform a walking pattern match of
0x[^0]
0x0[^0]
0x00[^0]
...
0x0000000[^0]
This will work, but I would much rather have one pattern if possible. Thanks.
Mark
Edit: I didn't mention that the regex wasn't needed in a program (otherwise I would have used a different approach); I was using it to search for values in a log file with UltraEdit. I could have written a program or found some other means to search, but honestly I was just being lazy. Ben S's solution worked in both UltraEdit and Rad Software Regular Expression Designer. rampion's solution didn't work in either tool; I'm not sure why.

Why not test the hex value against zero? Simpler, faster, more readable.
If a regular expression is really necessary, 0x0*[1-9a-fA-F][0-9a-fA-F]* should do it.
It consumes as many zeros as it can until it finds a non-zero hex digit, then gathers the rest of the hex digits regardless of whether they are zero or not.
Note: this will match any length hex, not just 32 bits.
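For anyone who does end up doing this in a program, here is a minimal Python sketch of the pattern above applied to the sample values from the question. The names NONZERO_HEX and has_nonzero_hex are made up for illustration.

```python
import re

# Pattern from the answer above: 0x, any leading zeros, one non-zero hex digit,
# then whatever hex digits remain.
NONZERO_HEX = re.compile(r"0x0*[1-9a-fA-F][0-9a-fA-F]*")

def has_nonzero_hex(line: str) -> bool:
    """Return True if the line contains a 0x value with at least one non-zero digit."""
    return NONZERO_HEX.search(line) is not None

samples = ["0x00000000", "0x00001000", "0x10000000", "0xB000000"]
results = {s: has_nonzero_hex(s) for s in samples}
```

Only the all-zero value should come back False.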

/0x0*[1-9a-fA-F][0-9a-fA-F]*/
<atom>* means match the atom 0 or more times, so this pattern matches the 0x prefix, followed by 0 or more 0s, followed by a non-zero hex, followed by some hex.

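Why not try something slightly different? Testing for a non-zero hex is much harder than testing for a zero hex. So test for zero and manually do the not.
// Requires: using System.Text.RegularExpressions;
bool IsNonZeroHex(string input) {
    // The pattern matches only all-zero values such as 0x00000000, so negate the result.
    return !Regex.IsMatch(input, "^0x(0*)$");
}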

/0x0*[^0]/

I think this should cover all cases (if it really has to be a regex):
^0x(?=0*[1-9a-fA-F]0*)[0-9a-fA-F]{8}$

Fixed size hex numbers can be looked up using negative lookahead as:
/(0x(?!0{8})[0-9a-fA-F]{8})/
A group is looked up beginning with 0x
then negative look ahead 0{8} (fails if found)
otherwise match [0-9a-fA-F]{8}
Works with PCRE, JavaScript, Python. Don't know which editors support negative lookahead.
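A quick way to try the negative lookahead in Python; this is a sketch only, and the log line is invented sample data. The outer capture group is dropped here so that findall returns the whole match.

```python
import re

# 0x, then refuse to continue if the next eight characters are all zeros,
# otherwise take exactly eight hex digits.
FIXED_NONZERO = re.compile(r"0x(?!0{8})[0-9a-fA-F]{8}")

log = "a=0x00000000 b=0x00001000 c=0xDEADBEEF d=0x00000000"
hits = FIXED_NONZERO.findall(log)
```

The two all-zero fields are skipped and the other two values are returned.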

Surely a simple string compare and if it DOES NOT EQUAL "0x00000000" you've got your match.
Am I oversimplifying it? There is only one FALSE case, right? When the string is "0x00000000"?
Don't use RegEx unless you have to.
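If the check does live in code, a non-regex version can be as small as the sketch below. It assumes the input is a well-formed 0x-prefixed hex string; is_nonzero_hex is a hypothetical helper name. Unlike a plain string compare, parsing the value also copes with inputs that have a different number of digits.

```python
def is_nonzero_hex(text: str) -> bool:
    """Parse a hex string (with or without the 0x prefix) and compare it with zero."""
    # int() with base 16 accepts an optional 0x prefix and raises ValueError on bad input.
    return int(text, 16) != 0
```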

/0x0{0,7}[^0]/
'0x', followed by zero to seven '0', followed by something that is not '0'

Related

Activate "char_classes" in boost regex library [duplicate]

How do I create a regular expression that detects hexadecimal numbers in a text?
For example, ‘0x0f4’, ‘0acdadecf822eeff32aca5830e438cb54aa722e3’, and ‘8BADF00D’.
How about the following?
0[xX][0-9a-fA-F]+
Matches expression starting with a 0, following by either a lower or uppercase x, followed by one or more characters in the ranges 0-9, or a-f, or A-F
The exact syntax depends on your exact requirements and programming language, but basically:
/[0-9a-fA-F]+/
or, more simply, with the i flag to make it case-insensitive:
/[0-9a-f]+/i
If you are lucky enough to be using Ruby, you can do:
/\h+/
EDIT - Steven Schroeder's answer made me realise my understanding of the 0x bit was wrong, so I've updated my suggestions accordingly.
If you also want to match 0x, the equivalents are
/0[xX][0-9a-fA-F]+/
/0x[0-9a-f]+/i
/0x[\h]+/i
ADDED MORE - If 0x needs to be optional (as the question implies):
/(0x)?[0-9a-f]+/i
Not a big deal, but many regex engines support the POSIX character classes, and there's [:xdigit:] for matching hex characters, which is simpler than the common 0-9a-fA-F stuff.
So, the regex as requested (ie. with optional 0x) is: /(0x)?[[:xdigit:]]+/
It's worth mentioning that detecting an MD5 (which is one of the examples) can be done with:
[0-9a-fA-F]{32}
This will match with or without 0x prefix
(?:0[xX])?[0-9a-fA-F]+
If you're using Perl or PHP, you can replace
[0-9a-fA-F]
with:
[[:xdigit:]]
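As a rough illustration, here is the optional-prefix pattern run against the three examples from the question in Python. The \b word boundaries are an addition of mine so that stray letters inside ordinary words are not picked up; Python's re module has no [[:xdigit:]], so the explicit ranges are used.

```python
import re

# Optional 0x/0X prefix followed by one or more hex digits, as a whole word.
HEX_TOKEN = re.compile(r"\b(?:0[xX])?[0-9a-fA-F]+\b")

text = "0x0f4, 8BADF00D, 0acdadecf822eeff32aca5830e438cb54aa722e3"
tokens = HEX_TOKEN.findall(text)
```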
Just for the record I would specify the following:
/^[xX]?[0-9a-fA-F]{6}$/
This differs in that it requires exactly six valid hex characters, optionally preceded by a lowercase or uppercase x.
Another example: Hexadecimal values for css colors start with a pound sign, or hash (#), then six characters that can either be a numeral or a letter between A and F, inclusive.
^#[0-9a-fA-F]{6}
If you are looking for a specific hex character in the middle of a string, you can use "\xhh", where hh is the character code in hexadecimal. I've tried it and it works. I use the Qt framework for C++, but it can solve problems in other cases too, depending on the flavor you need to use (PHP, JavaScript, Python, Go, etc.).
This answer was taken from: http://ult-tex.net/info/perl/
This one matches exactly three valid pairs:
(([a-fA-F]|[0-9]){2}){3}
When the pattern is anchored (for example with ^ and $), any more or fewer than three pairs of valid characters fail to match.
In Java this is allowed:
(?:0x?)?[\p{XDigit}]+$
As you see the 0x is optional (even the x is optional) in a non-capturing group.
In case you need this within an input where the user can type 0 and 0x too but not a hex number without the 0x prefix:
^0?[xX]?[0-9a-fA-F]*$
First, instead of ^ and $, use \b, as this is a word boundary and can help when the hash is not the only string on the line.
I came here looking for a similar but more specialized regex and came up with this:
\b(\d+[a-f]+\d+[\da-f]*|[a-f]+\d+[a-f]+[\da-f]*)\b
I needed to detect hashes like git commit identifiers (and similar) in console output, and rather than matching all possible hashes I prioritize NOT matching random words or numbers like EB or 12345678.
So the heuristic I use is to assume that a hash will alternate between numbers and letters reasonably often, and that runs of only numbers or only letters will be short.
Another important fact is that an MD5 hash is 32 characters long (as mentioned by @Adaddinsane) and git displays a shortened version with only 10 characters, so the above example can be modified as follows:
For 10-character hashes I assume the groups will be at most 3 characters long:
\b(\d+[a-f]+\d+[\da-f]{1,7}|[a-f]+\d+[a-f]+[\da-f]{1,7})\b
For hashes up to 32 characters long I assume the groups will be at most 5 characters long:
\b(\d+[a-f]+\d+[\da-f]{17,29}|[a-f]+\d+[a-f]+[\da-f]{17,29})\b
You can easily change a-f to a-fA-F for case insensitivity, or add 0[xX] at the front to match the 0x prefix.
These examples will obviously not match exotic but valid hashes that begin with very long sequences of only numbers or only letters, or extreme hashes such as all 0s.
But this way I can match hashes and significantly reduce accidental false-positive matches, such as a directory name or a line number.
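To show what the heuristic accepts and rejects, here is a small Python sketch using the first (unbounded) pattern. The sample line and the hash-like token in it are invented for the demonstration.

```python
import re

# digits-letters-digits or letters-digits-letters at the start of the word,
# followed by any further lowercase hex characters.
HASH_LIKE = re.compile(r"\b(\d+[a-f]+\d+[\da-f]*|[a-f]+\d+[a-f]+[\da-f]*)\b")

line = "commit 3f2a9c1b0e in dir eb line 12345678"
found = HASH_LIKE.findall(line)
```

The all-letter word eb and the all-digit 12345678 are ignored, while the mixed token is picked up.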

How can I quickly determine what ip range a regular expression evaluates to

I've been given a regular expression that looks for a range of IP addresses. I could sit down and manually determine what IP range it searches for but I was wondering if there is a tool that will do this for me. I've found lots of tools to do the opposite (take an ip range and convert it to a regular expression).
Here's an example regular expression: ^192.22.([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5])).([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))$
What range does this apply to, and what's the quickest/easiest way to get it?
Thanks!
In the general case, there is no way to generate all strings which match an arbitrary regular expression. I doubt anybody wrote a specialized tool for your case, either.
Anyway, your example is simple and idiomatic; it looks for two dotted octet values which are one digit, or two digits, or three digits beginning with 1, or three digits beginning with 2 followed by a digit less than five, or 25 followed by a digit less than six; in other words, 0-255. So the range is 192.22.0.0-192.22.255.255 (192.22.0.0/16).
The web tool Regexplained might help you understand the regex a bit better!
As tripleee has mentioned, the regex captures the IP range 192.22.0.0 to 192.22.255.255
However, because the period (.) has not been escaped, it also mistakenly captures a string like:
192.22.255
The regex should be corrected with backslashes to:
^192\.22\.([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))\.([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))$
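Lacking a dedicated tool, a brute-force check is one way to confirm the range. The Python sketch below assumes you already know the fixed 192.22 prefix and only need to probe the last two octets; it tries values up to 299 so that the upper boundary is visible. OCTET and PATTERN are names chosen for this example.

```python
import re

# One octet as written in the corrected regex above.
OCTET = r"([0-9]|[1-9][0-9]|1([0-9][0-9])|2([0-4][0-9]|5[0-5]))"
PATTERN = re.compile(r"^192\.22\." + OCTET + r"\." + OCTET + r"$")

# Probe every combination of the last two octets, including out-of-range values.
matched = [
    (a, b)
    for a in range(300)
    for b in range(300)
    if PATTERN.match(f"192.22.{a}.{b}")
]
lowest, highest = min(matched), max(matched)
```

lowest and highest then give the ends of the range, and len(matched) tells you whether there are holes in it.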

Is there a simple regex to compare numbers to x?

I want a regex that will match if a number is greater than or equal to an arbitrary number. This seems monstrously complex for such a simple task... it seems like you need to reinvent 'counting' in an explicit regex hand-crafted for the x.
For example, intuitively to do this for numbers greater than 25, I get
(\d{3,}|[3-9]\d|2[6-9]\d)
What if the number was 512345? Is there a simpler way?
It seems there is no simpler way. Regex is not really meant for comparing numbers.
You may try this one:
[1-9]\d{6,}|
[6-9]\d{5}|
5[2-9]\d{4}|
51[3-9]\d{3}|
512[4-9]\d{2}|
5123[5-9]\d|
51234[6-9]
(newlines for clarity)
What if the number was 512345? Is there a simpler way?
No, a regex to match a number in a certain range will be a horrible looking thing (especially large numbers ranges).
Regex is simply not meant for such tasks. The better solution would be to "freely" match the digits, like \d+, and then compare them with the language's relational operators (<, >, ...).
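In Python, that two-step approach might look like the sketch below: the regex only finds digit runs, and the comparison is left to the language. numbers_at_least is a hypothetical helper name.

```python
import re

def numbers_at_least(text: str, threshold: int) -> list:
    """Return every integer in the text that is >= threshold."""
    # [0-9]+ matches the digits "freely"; the comparison is ordinary integer arithmetic.
    return [int(m) for m in re.findall(r"[0-9]+", text) if int(m) >= threshold]

found = numbers_at_least("7 25 26 512344 512345 9000000", 512345)
```

Changing the threshold is then a one-argument change instead of a rewritten pattern.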
In Perl you can use the conditional regexp construct (?(condition)yes-pattern) where the (condition) is (?{CODE}) to run arbitrary Perl code. If you make the yes-pattern be (*FAIL) then you have a regexp fragment which succeeds only when CODE returns false. Thus:
use feature 'say';
foreach (0 .. 50) {
    # (*FAIL) is reached only when the captured number is <= 25
    if (/\A(\d+)(?(?{$1 <= 25})(*FAIL))\z/) {
        say "$_ matches";
    }
    else {
        say "$_ does not match";
    }
}
The code-evaluation feature used to be marked as experimental but the latest 'perlre' manual page (http://perldoc.perl.org/perlre.html) seems to now imply it is a core language feature.
Technically, what you have is no longer a 'regular expression' of course, but some hybrid of regexp and external code.
I've never heard of a regex flavor that can do that. Writing a Perl module to generate the appropriate regex (as you mentioned in your comment) sounds like a good idea to me. In fact, I'd be surprised if it hasn't been done already. Check CPAN first.
By the way, your regex contains a few more errors besides the excess pipes Yuriy pointed out.
First, the "three or more digits" portion will match invalid numbers like 024 and 00000007. You can solve that by requiring the first digit to be greater than zero. If you want to allow for leading zeroes, you can match them separately.
The third part, 2[6-9]\d, only matches numbers >= 260. Perhaps you meant to make the third digit optional (i.e. 2[6-9]\d?), but that would be redundant.
You should anchor the regex somehow to make sure you aren't matching part of a longer number or a "word" with digits in it. I don't know the best way to do that in your particular situation, but word boundaries (i.e. \b) will probably be all you need.
End result:
\b0*([1-9]\d{2,}|[3-9]\d|2[6-9])\b
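Hand-built range patterns are easy to get subtly wrong, so it may be worth checking one against plain integer comparison. Here is a Python sketch that does this for the end result above over 0 to 1999; the upper bound is an arbitrary choice.

```python
import re

GREATER_THAN_25 = re.compile(r"\b0*([1-9]\d{2,}|[3-9]\d|2[6-9])\b")

# Collect every number where the regex and the arithmetic test disagree.
mismatches = [
    n for n in range(2000)
    if (GREATER_THAN_25.fullmatch(str(n)) is not None) != (n > 25)
]
```

An empty mismatches list means the pattern agrees with n > 25 on everything that was tried.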

Regular expression to match string of 0's and 1's without '011' substring

I'm working on a problem (from Introduction to Automata Theory, Languages, and Computation by Hopcroft, Motwani and Ullman) to write a regular expression that defines a language consisting of all strings of 0s and 1s not containing the substring 011.
Is the answer (0+1)* - 011 correct ? If not what should be the correct answer for this?
Edit: Updated to include start states and fixes, as per below comments.
If you are looking for all strings that do not have 011 as a substring rather than simply excluding the string 011:
A classic regex for that would be:
1*(0+01)*
Basically you can have as many ones at the beginning as you want, but as soon as you hit a zero, it's either zeros, or zero-ones that follow (since otherwise you'd get a zero-one-one).
A modern, not-really-regular regex would be:
^((?!011)[01])*$
If, however, you want any string that is not exactly 011, you can simply enumerate the short strings and wildcard the rest:
λ+0+1+00+01+10+11+(1+00+010+011(0+1))(0+1)*
And in modern regex:
^(?!011$)[01]*$
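The two "no 011 substring" expressions near the top of this answer can be sanity-checked by brute force. This Python sketch writes the textbook union + as | and compares both patterns with a plain substring test on every binary string up to length 10 (an arbitrary cutoff).

```python
import re
from itertools import product

CLASSIC = re.compile(r"1*(?:0|01)*")        # 1*(0+01)* in textbook notation
LOOKAHEAD = re.compile(r"(?:(?!011)[01])*")  # ^((?!011)[01])*$ via fullmatch

def all_binary_strings(max_len):
    """Yield every string over {0, 1} of length 0..max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)

# Strings on which either pattern disagrees with the substring test.
disagreements = [
    s for s in all_binary_strings(10)
    if (CLASSIC.fullmatch(s) is not None) != ("011" not in s)
    or (LOOKAHEAD.fullmatch(s) is not None) != ("011" not in s)
]
```

This is not a proof, but an empty disagreements list is decent evidence that both forms describe the same language.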

RegEx Numeric Check in Range?

I'm new to StackOverflow, so please let me know if there is a better way to ask the following question.
I need to create a regular expression that detects whether a field in the database is numeric, and if it is numeric does it fall within a valid range (i.e. 1-50). I've tried [1-50], which works except for the instances where a single digit number is preceded by a 0 (i.e. 06). 06 should still be considered a valid number, since I can later convert that to a number.
I really appreciate your help! I'm trying to learn more about regular expressions, and have been learning all I can from: www.regular-expressions.info. If you guys have recommendations of other sites to bone up on this stuff I would appreciate it!
Try this
^(0?[1-9]|[1-4][0-9]|50)$
The idea of this regex is to break the problem down into cases; the outer group makes ^ and $ apply to all three alternatives rather than only the first and last:
0?[1-9] takes care of the single-digit case, allowing for an optional preceding 0
[1-4][0-9] takes care of all numbers from 10 to 49
50 takes care of 50
Regular expressions work on characters (in this case digits), not numbers. You need to have a separate pattern for each number of digits in your pattern, and combine them with | (the OR operator) like the other answers have suggested. However, consider just checking if the text is numeric with a regular expression (like [0-9]+) and then converting to an integer and checking the integer is within range.
You can't easily do range checking with regular expressions. You can -- with some work -- develop a pattern that recognizes a numeric range, but it's usually quite complex, and difficult to modify for a slightly different range.
You're better off breaking this into two parts.
Recognize the number pattern (^\d+$).
Check the range of that number in an application program.
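A minimal Python sketch of those two steps, assuming the field arrives as a string; in_range and its default bounds of 1 and 50 are illustrative names and values taken from the question.

```python
import re

def in_range(field: str, low: int = 1, high: int = 50) -> bool:
    """Step 1: check the field is all digits. Step 2: compare the integer value."""
    if re.fullmatch(r"[0-9]+", field) is None:
        return False
    # int() ignores leading zeros, so "06" is treated as 6.
    return low <= int(field) <= high
```

Because the range lives in ordinary integers, moving to a different range means changing two arguments rather than redesigning a pattern.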
^0?[1-50]{1,2}$