Regex to match any integer greater than 1080?

I'm trying to come up with a regex for any integer greater than 1080. So that the below numbers would match:
1081
1100
1111
1200
1280
4000
900000080
I came across this post: https://codeshare.co.uk/blog/regular-expression-regex-for-a-number-greater-than-1200/ but it didn't work for a number like 1300.

Doing this with regex is a lousy idea, but if you have a genuine need (like some software that only lets you use regex in filters), it's possible. Let's take it a step at a time, and let's work from larger numbers to smaller, because it makes it easier to think about:
Any number with at least five digits is okay: [1-9][0-9]{4,}
Any number 2,000 - 9,999 is okay: [2-9][0-9]{3}
Any number 1,100 - 1,999 is okay: 1[1-9][0-9]{2}
Any number 1,090 - 1,099 is okay: 109[0-9]
Any number 1,081 - 1,089 is okay: 108[1-9]
Anything that's left is a number <= 1080, or not a number.
Putting it all together in reverse order, ^(?:108[1-9]|109[0-9]|1[1-9][0-9]{2}|[2-9][0-9]{3}|[1-9][0-9]{4,})$ should work. If you want to be a little more lax with number formats you could allow an optional leading + or any number of leading 0s (but not include them in the part we're checking). That gets us
^\+?0*(?:108[1-9]|109[0-9]|1[1-9][0-9]{2}|[2-9][0-9]{3}|[1-9][0-9]{4,})$
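A quick way to gain confidence in a pattern like this is to compare it against a plain numeric comparison over a range of inputs. The sketch below does that in Python; the helper name `is_greater_than_1080` and the 0..19999 test range are illustrative choices, not part of the answer above.

```python
import re

# The lax pattern from the answer: optional "+", any leading zeros, then a value > 1080.
GT_1080 = re.compile(
    r"^\+?0*(?:108[1-9]|109[0-9]|1[1-9][0-9]{2}|[2-9][0-9]{3}|[1-9][0-9]{4,})$"
)

def is_greater_than_1080(text):
    """Return True if the whole string matches the pattern."""
    return GT_1080.fullmatch(text) is not None

# Compare the regex with an ordinary numeric test for every integer in a small range.
mismatches = [n for n in range(20000) if is_greater_than_1080(str(n)) != (n > 1080)]
print(mismatches)  # an empty list means the regex and the comparison agree
```

Checking against the numeric comparison also makes it obvious why the regex should be a last resort: the comparison is a single expression.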


Understand %03.3u in printf format specification

I am using printf to output contents. I see the format specification "%03.3u" in another person's code. As I understand it, the "03" before the dot already sets the output width to 3 characters, zero-padded if there are fewer than 3, while the "3" after the dot also specifies that at least 3 digits should be output. So the "03" before the dot and the "3" after it seem redundant.
I ran the following tests:
char l[50];
sprintf(l, "%03.3u", 5);
sprintf(l, "%03u", 5);
sprintf(l, "%.3u", 5);
They confirm that the output is always 005. So why would someone use "%03.3u" instead of "%03u" or "%.3u"?
The output will be the same for the particular values you have used. The number before the . is the minimum field width, while the number after it (for the u conversion specifier, at least) is the minimum number of digits to output. You can see the difference between the two with something like:
printf("%3.2u\n", 7);
which gives you a space followed by 07: a minimum of two digits output, in a field at least three characters wide.
However, the fact that you have the numbers the same means that you'll get three digits minimum in a field at least three characters wide, so the two settings agree. They would not agree with %03.2u (different minimums): the C standard says the 0 flag is ignored for integer conversions when a precision is given, so 5 would print as a space followed by 05 rather than 005.
Bottom line is, to get the full three digits, you can use the 0 zero-pad modifier or the minimum digit count modifier but you don't need both.
However, since having both doesn't have any adverse effects beyond forcing people to question the sanity of those that wrote it :-), it's functionally okay.
The 03 is the field width with zero-padding. This means that a minimum of 3 characters are to be output, and if there were fewer than three, left-pad with zeroes.
The second 3 is the minimum number of digits to output.
When both of these are specified, the precision will be applied, and if the result is narrower than the minimum field width, then the output will be padded. For example, printf("q%6.3u", 5) will produce q followed by three spaces and then 005. (The q is only there to make the leading spaces visible.)
If you're printing an unsigned integer and you didn't use the sign flag, then the number of digits is the same as the field width (since the only output is digits). %03u, %.3u and %03.3u all have the same effect.
I guess the person wrote %03.3u since they did not properly understand the meaning of these things so they guessed something, it worked, and they decided to not make any further changes.
If you print a sign character then the field width differs from the digit count, e.g. you could experiment with %+3u versus %+.3u. Or if you use %d and print a negative number.

File size of 1 million rand numbers

The file by RAND is 1 million random numbers. It is compressed down to 415 kb. How is this possible if it is impossible to compress random data?
Thank you.
Jon Hutton
You're most likely talking about the famous "A Million Random Digits" test data that was published in 1955. It's digits, not numbers, as Mark already guessed, which is why the binary version is only 415,241 bytes. Also see Mark Nelson's homepage, which has a link to the binary file.
Note that the end result (the binary file) is not compressible without knowing it - although there are some small redundancies in the file that come from the way it was created - see this forum entry for more details:
There are potentially other biases in the million random digits file
that I discussed years ago in comp.compression. The data was
originally generated by sampling a 5 bit counter driven by a noisy
oscillator to produce a set of 20,000 punched cards with 50 digits
each. But there was some correlation between consecutive digits, so
what they did was add adjacent pairs of cards modulo 10 to produce a
new set of cards which was published. That is why the sums of the
columns are even. Each of the original cards is counted twice.
Sounds like they're stored as one decimal digit per byte. Using only ten of the 256 possible byte values leaves you with the potential for a log(256)/log(10) compression ratio on random digits, which is about 2.4. You're getting 2.35 (assuming "kb" = 1024 bytes). Voila.
You can get 2.4 quite easily by coding every three digits into ten bits, since 1024 > 1000. Then you can code 1,000,000 decimal digits into 416,667 bytes, or 406.9 KiB.
With a little more difficulty, using something like GMP, you could code it as a giant million-digit integer in binary, which would take 415,242 bytes, or 405.5 KiB. That would be as good as it gets for random decimal digits.
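The figures in this answer can be reproduced with a few lines of arithmetic. The sketch below is only a calculation, not an encoder; the variable names are illustrative, and it assumes the question's "415 kb" means 415 * 1024 bytes, as the answer does.

```python
import math

DIGITS = 1_000_000  # decimal digits in the file

# Best possible ratio when each digit currently occupies a whole byte.
ideal_ratio = math.log(256) / math.log(10)

# Ratio reported in the question, taking "415 kb" as 415 * 1024 bytes.
observed_ratio = DIGITS / (415 * 1024)

# Packing three digits into ten bits; a leftover of 1 or 2 digits needs 4 or 7 bits.
groups, leftover = divmod(DIGITS, 3)
packed_bits = groups * 10 + {0: 0, 1: 4, 2: 7}[leftover]
packed_bytes = math.ceil(packed_bits / 8)

# One giant binary integer: enough bits to hold any value below 10**DIGITS.
bigint_bits = math.ceil(DIGITS * math.log2(10))
bigint_bytes = math.ceil(bigint_bits / 8)

print(round(ideal_ratio, 2), round(observed_ratio, 2))
print(packed_bytes, bigint_bytes)
```

The big-integer figure is an upper bound for any million-digit value; a particular file whose leading digits are small can come out a byte shorter.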

regex for number between numbers

I need a regex that takes a minimum and a maximum number to determine valid input, and I want the minimum and maximum to be dynamic.
I have been trying to get this done using this link:
https://stackoverflow.com/a/13473595/1866676
but I couldn't get it to work. Can someone please let me know how to do this?
Let's say I want to make an HTML5 input box that only accepts numbers from 100 to 1999.
What would a regex for this look like?
First off, while it is possible to do this, I think if there is a simpler way to choose a number range such as <input type="number" min="1" max="100">, that way would be preferred.
Having said that, here's how the kind of regex you requested works:
ones: ^[0-9]$ // just set the numbers -- matches 0 to 9
tens: ^[1-3]?[0-9]$ //set max tens and max ones -- matches 0 to 39
tens where max does not end in 9: ^[1-2]?[0-9]$|^[3][0-4]$ // 0 to 34
only tens: ^[1][5-9]$|^[2-3][0-9]$|^[4][0-5]$ // 15 to 45
Here, let's pick an arbitrary range, 1234 to 2345:
^[1][2][3][4-9]$|
^[1][2][4-9][0-9]$|
^[1][3-9][0-9][0-9]$|
^[2][0-2][0-9][0-9]$|
^[2][3][0-3][0-9]$|
^[2][3][4][0-5]$
https://regex101.com/r/pP8rQ7/4
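As a sanity check, the six branches for 1234 to 2345 can be tested exhaustively against every number of up to four digits. This is a small Python sketch; the single-digit character classes such as [1] are written as plain digits, and the helper name `in_range_1234_2345` is made up for the example.

```python
import re

# The six branches from the answer, joined into one alternation.
RANGE_1234_2345 = re.compile(
    r"^123[4-9]$|^12[4-9][0-9]$|^1[3-9][0-9][0-9]$|"
    r"^2[0-2][0-9][0-9]$|^23[0-3][0-9]$|^234[0-5]$"
)

def in_range_1234_2345(text):
    """Return True if the whole string is a number from 1234 to 2345."""
    return RANGE_1234_2345.fullmatch(text) is not None

# Collect every integer below 10000 that the regex accepts.
accepted = {n for n in range(10000) if in_range_1234_2345(str(n))}
print(accepted == set(range(1234, 2346)))
```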
Basically, each middle branch needs to be a straight range that can reach 9, unless we are dealing with the ones place. If it can't, you have to build upwards toward the middle each time a value can't start at 0, and once you reach a value that can't end in 9, break early and handle it in the next branch.
Notice the pattern as each place solidifies. Also keep in mind that when going from lower to higher places, the optional operator ? should be used.
It's a bit complex, but it's nowhere near impossible to design a custom range with a bit of thought.
If you are more specific, we can craft an exact example, but this is generally how it is done: beginning-range|middle-range|end-range
You should only need beginning or end ranges in certain cases, like when the min or max does not end in 9. The ? means that the range that comes before it is optional (so, for example, in the tens case above it lets us match both one-digit and two-digit numbers).
So for 100 - 1999 it's quite simple, actually, because you have lots of 9s and 0s:
/^[1-9][0-9][0-9]$|^[1][0-9][0-9][0-9]$/
https://regex101.com/r/pP8rQ7/1
Note: Single values don't need ranges [n] I just added them for readability.
Edit: There used to be a regex range generator at: http://gamon.webfactional.com/regexnumericrangegenerator/. It appears to be offline now.
Essentially, you can't.
For every numeric range, there exists a regex that will match numbers in that range, so it is possible to write code that generates such a regex. But such a regex is not a simple reformatting of the range ends.
However, such code would require colossal effort and complexity to write compared to code that simply checked the number using numeric methods.
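For comparison, a numeric check with dynamic bounds is only a few lines. This is a minimal Python sketch; the function name `in_numeric_range` is hypothetical, and it assumes the input should consist of plain ASCII digits only (no sign, spaces, or separators).

```python
import re

def in_numeric_range(text, minimum, maximum):
    """Return True if text is a plain decimal integer within [minimum, maximum]."""
    # The regex only checks the shape of the input; the range is checked numerically.
    if re.fullmatch(r"[0-9]+", text) is None:
        return False
    return minimum <= int(text) <= maximum

print(in_numeric_range("1500", 100, 1999))
print(in_numeric_range("2000", 100, 1999))
```

Changing the bounds here means changing two arguments, whereas a range regex has to be regenerated branch by branch.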
With HTML 5 simply put a range input...
<form>
Quantity (between 100 and 1999):
<input type="number" name="quantity" min="100" max="1999">
</form>
with regex:
^([1-9])(\d)(\d)$|^(1)(\d)(\d)(\d)$
So if you need to create the regex dynamically, it's possible, but a bit tricky and complex.

Integer range and multiple of

I have a number of fields I want to validate on text entry with a regex for both matching a range (0..120) and must be a multiple of 5.
For example, 0, 5, 25, 120 are valid. 1, 16, 123, 130 are not valid.
I think I have the regex for multiple of 5:
^\d*\d?((5)|(0))\.?((0)|(00))?$
and the regex for the range:
120|1[01][0-9]|[2-9][0-9]
However, I don't know how to combine these. Any help much appreciated!
You can't do that with a simple regex. At least not the range-part (especially if the range should be generic/changeable).
And even if you manage to write the regex, it will be very complex and unreadable.
Write the validation on your own, using a parseStringToInt() function of your language and simple < and > checks.
Update: added another regex (see below) to be used when the range of values is not 0..120 (it can even be dynamic).
The second regex in the question does not match numbers smaller than 20. You can change it to match smaller numbers too, and to require a last digit of 0 or 5 so that only multiples of 5 match:
\b(120|(1[01]|[0-9])?[05])\b
How it works (starting from inside):
(1[01]|[0-9])? matches 10, 11 or any one-digit number (0 to 9); these are the hundreds and tens in the final number; the question mark (?) after the sub-expression makes it match 0 or 1 times; this way the regex can also match numbers having only one digit (0..9);
[05] that follows matches 0 or 5 on the last digit (the units); only the numbers that end in 0 or 5 are multiples of 5;
everything is enclosed in parentheses because | has lower precedence than everything else in the pattern; without the group, the first \b would belong only to the 120 alternative and the last \b only to the other one;
the outer \b match word boundaries; they prevent the regex from matching only 1..3 digits of a longer number, or a number embedded in a string; for example, they prevent it from matching 15 in 150 or 120 in abc120.
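The behaviour described in the list above can be checked in Python. This is a small sketch; the names `MULT5_0_120` and `accepted` are illustrative, and the sample strings are made up for the test.

```python
import re

# The pattern from the answer: multiples of 5 from 0 to 120, delimited by word boundaries.
MULT5_0_120 = re.compile(r"\b(120|(1[01]|[0-9])?[05])\b")

# Whole-string check against every integer from 0 to 999.
accepted = [n for n in range(1000) if MULT5_0_120.fullmatch(str(n))]
print(accepted == list(range(0, 121, 5)))

# The word boundaries stop partial matches inside longer tokens.
print(MULT5_0_120.search("150"))        # no match
print(MULT5_0_120.search("abc120"))     # no match
print(MULT5_0_120.search("take 45 mg").group(0))
```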
Using dynamic range of values
The regex above is not very complex and it can be used to match numbers between 0 and 120 that are multiples of 5. When the range of values is different it cannot be used any more. It can be modified to match, let's say, numbers between 20 and 120 (as the OP asked in a comment below) but it will become harder to read.
Moreover, if the range of allowed values is dynamic, then a fixed regex cannot be used to match the values inside the range. The divisibility by 5, however, can still be checked using a regex :-)
For dynamic range of values that are multiple of 5 you can use this expression:
\b([1-9][0-9]*)?[05]\b
Parse the matched string as integer (the language you use probably provides such a function or a library that contains it) then use the comparison operators (<, >) of the host language to check if the matched value is inside the desired range.
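Put together, the two-step approach might look like the following Python sketch. The function name `is_valid` and its parameters are hypothetical; the regex is the one given above, used here to validate a whole field rather than to search inside longer text.

```python
import re

# Regex from the answer: any non-negative integer whose last digit is 0 or 5.
MULTIPLE_OF_5 = re.compile(r"\b([1-9][0-9]*)?[05]\b")

def is_valid(text, minimum, maximum):
    """Return True if text is a multiple of 5 within [minimum, maximum]."""
    # Step 1: the regex checks the format and the divisibility by 5.
    if MULTIPLE_OF_5.fullmatch(text) is None:
        return False
    # Step 2: the range is checked numerically, so the bounds can change at run time.
    return minimum <= int(text) <= maximum

print(is_valid("25", 20, 120))
print(is_valid("15", 20, 120))
```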
At the risk of being painfully obvious
120|1[01][05]|[2-9][05]
Also, why the 2?

Find a repeating symmetric bit pattern in a small stream of 128 bits

How can I quickly scan groups of 128 bits for exactly repeating binary patterns, such as 010101... or 0011001100...?
I have a number of 128-bit blocks, and wish to see if they match the patterns where the number of 1s is equal to the number of 0s, e.g. 010101... or 00110011... or 0000111100001111... but NOT 001001001...
The problem is that patterns may not start on their boundary, so the pattern 00110011... may begin as 0110011..., and will also end 1 bit shifted (note the 128 bits are not circular, so the start doesn't join to the end).
The 010101... case is easy: it is simply 0xAAAA... or 0x5555.... However, as the patterns get longer, the number of permutations grows. Currently I use repeated shifting of values, as outlined in this question: Fastest way to scan for bit pattern in a stream of bits. Something quicker would be nice, as I'm spending 70% of all CPU in this routine. Other posters have solutions for general cases, but I am hoping the symmetric nature of my pattern might lead to something more optimal.
If it helps, I am only interested in patterns up to 63 bits long, and most interested in the power-of-2 patterns (0101... 00110011... 0000111100001111... etc.). While patterns such as 5 ones/5 zeros are present, these non-power-of-2 sequences are less than 0.1%, so they can be ignored if that helps the common cases go quicker.
Other constraints for a perfect solution would be small number of assembler instructions, no wildly random memory access (ie, large rainbow tables not ideal).
Edit. More precise pattern details.
I am mostly interested in the patterns 0011, 0000,1111, 0000,0000,1111,1111, 16 zeros/ones and 32 zeros/ones (commas for readability only), where each pattern repeats continuously within the 128 bits. Patterns whose repeating portion is not 2, 4, 8, 16 or 32 bits long are not as interesting and can be ignored (e.g. 000111...).
The complexity for scanning is that the pattern may start at any position, not just on the 01 or 10 transition. So for example, all of the following would match the 4 bit repeating pattern of 00001111... (commas every 4th bit for readability) (ellipses means repeats identically)
0000,1111.... Or 0001,1110... Or 0011,1100... Or 0111,1000... Or 1111,0000... Or 1110,0001... Or 1100,0011... Or 1000,0111
Within the 128 bits, the same pattern needs to repeat; two different patterns being present is not of interest. E.g. this is NOT a valid pattern: 0000,1111,0011,0011... as we have changed from 4 bits repeating to 2 bits repeating.
I have already verified the number of 1s is 64, which is true for all power-of-2 patterns, and now need to identify how many bits make up the repeating pattern (2, 4, 8, 16, 32) and how much the pattern is shifted. E.g. pattern 0000,1111 is a 4-bit pattern, shifted 0, while 0111,1000... is a 4-bit pattern shifted 3.
Let's start with the case where the patterns do start on their boundary. You can check the first bit and use it to determine your state. Then start looping through your block: check the first bit, increment a count, left shift, and repeat until you find that you've gotten the opposite bit. You can now use this initial length as the bitset length. Reset the count to 1, then count the next set of opposite bits. When you switch, check the length against the initial length and error out if they're not equal. Here's a quick function. It seems to work as expected for chars, and it shouldn't be too hard to expand it to deal with blocks of 16 bytes.
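#include <stdio.h>

/* Returns the run length if the byte is made of equal-length alternating
   runs (e.g. 0x33 = 00110011 gives 2), or -1 otherwise. */
int check_block(unsigned char myblock)
{
    const unsigned char mask = 0x80;    /* always test the top bit */
    int setlen = 0, count = 0;
    int ones = (myblock & mask) != 0;   /* value of the run we are currently in */

    for (int i = 0; i < 8; i++) {
        int bit = (myblock & mask) != 0;
        myblock = (unsigned char)(myblock << 1);
        if (bit == ones) {
            count++;
        } else {
            if (setlen == 0) setlen = count;   /* first run fixes the expected length */
            if (count != setlen) {
                printf("Bad block\n");
                return -1;
            }
            count = 1;
            ones = !ones;
        }
    }
    /* The last run never sees a flip, so it has to be checked here. */
    if (setlen == 0 || count != setlen) {
        printf("Bad block\n");
        return -1;
    }
    printf("Good block with %d repeating bits\n", setlen);
    return setlen;
}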
Now to deal with blocks where there's an offset, I'd suggest counting the number of bits until the first 'flip'. Store this number, then run the above routine until you hit the last segment which should have length unequal to the rest of the sets. Add the initial bits to the last segment's length, and then you should be able to compare it with the size of the rest of the sets correctly.
This code is pretty small, and bit shifting through a buffer shouldn't require too much work on the CPU's part. I'd be interested to see how this solution ends up performing against your current one.
The generic solution for this kind of problem is to create a good hashing function for the patterns and store each pattern in a hash map. Once the hash map is built, look up each input block in it. I don't have code yet, but let me know if you get stuck; please post your code and I can work on it.
I've thought about making a state machine, so every next byte (out of 16) would advance its state and after some 16 state transitions you'd have the pattern identified. But that doesn't look very promising. Data structures and logic look more complex.
Instead, why not precompute all those 126 patterns (from 01 to 32 zeroes + 32 ones), sort them and perform binary search? That would give you at most 7 iterations of binary search. And you don't need to store all 16 bytes of every pattern as its halves are identical. That gives you 126*16/2=1008 bytes for the array of patterns. You also need something like 2 bytes per pattern to store the length of zero (one) runs and the shift relative to whatever pattern you consider unshifted. That's a total of 126*(16/2+2)=1260 bytes of data (should be gentle on the data cache) and a very simple and tiny binary search algorithm. Basically, it's just an improvement over the answer that you mentioned in the question.
You might want to try switching to linear search after 4-5 iterations of binary search. That may give a small boost to the overall algorithm.
Ultimately, the winner is determined by testing/profiling. And that's what you should do, get a few implementations and compare them on the real data in the real system.
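The precompute-and-search idea can be sketched in Python as follows. This is an illustration under some assumptions rather than the tuned layout described above: blocks are held as 128-bit Python integers with the first bit of the stream as the most significant bit, the full 128 bits are stored instead of one 64-bit half, shift 0 is taken to mean "starts with a full run of zeros", and the names `build_patterns` and `classify` are made up.

```python
from bisect import bisect_left

def build_patterns(width=128, run_lengths=(1, 2, 4, 8, 16, 32)):
    """Build a sorted list of (value, run_length, shift) for every pattern."""
    entries = []
    for run in run_lengths:
        for shift in range(2 * run):          # one entry per position in the period
            value = 0
            for i in range(width):
                bit = ((i + shift) // run) % 2  # runs of zeros and ones alternate
                value = (value << 1) | bit
            entries.append((value, run, shift))
    entries.sort()
    return entries

PATTERNS = build_patterns()
KEYS = [value for value, _, _ in PATTERNS]

def classify(block):
    """Return (run_length, shift) for a known pattern, or None."""
    pos = bisect_left(KEYS, block)            # binary search over the sorted values
    if pos < len(KEYS) and KEYS[pos] == block:
        _, run, shift = PATTERNS[pos]
        return run, shift
    return None

print(len(PATTERNS))
print(classify(int("01111000" * 16, 2)))
```

With this convention, the question's example 0111,1000... comes back as run length 4, shift 3.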
The restriction that the pattern repeats itself across the whole 128-bit stream limits the number of combinations, and it also gives the sequence properties that make it easy to check:
One needs to iteratively check whether the high and low parts are the same; if they are opposites, check whether that particular length contains consecutive ones.
8-bit repeat at offset 3: 00011111 11100000 00011111 11100000
==> high and low 16 bits are the same
00011111 11100000 ==> high and low parts are inverted.
Neither the same nor inverted means the pattern is rejected.
At that point one needs to check whether there's a single sequence of ones: add 1 to it and check whether the result is a power of two; n==(n & -n) is the textbook check for that.
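One possible reading of this halving idea, sketched in Python: keep halving while the two halves are equal, and when they are inverses of each other, accept only if the half is a single run of ones next to a single run of zeros. The function name `run_length_by_halving` is hypothetical, blocks are assumed to be 128-bit integers, and the power-of-two test is written as n & (n + 1) == 0, which checks that n + 1 is a power of two. The sketch returns only the run length, not the shift, and it does not cap the run length at 32.

```python
def run_length_by_halving(block, width=128):
    """Return the run length of a repeating equal-runs pattern, or None."""
    x, w = block, width
    while w >= 2:
        half = w // 2
        mask = (1 << half) - 1
        low = x & mask
        high = x >> half
        if high == low:
            # Same halves: the pattern repeats, so look inside one half.
            x, w = low, half
            continue
        if high == low ^ mask:
            # Inverted halves: the half must look like 0..01..1 or 1..10..0.
            n = low if (low & 1) else low ^ mask   # normalise to the 0..01..1 form
            return half if (n & (n + 1)) == 0 else None
        return None   # neither the same nor inverted: reject
    return None       # constant block (all zeros or all ones)

print(run_length_by_halving(int("01111000" * 16, 2)))
print(run_length_by_halving(int("01011010" * 16, 2)))
```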