Understand %03.3u in printf format specification - c++

I am using printf to output contents. In another person's code I see the format specification "%03.3u". As I understand it, the "03" before the dot already specifies an output width of 3 digits, padded with zeros if there are fewer than 3, while the "3" after the dot also specifies that 3 digits should be output. So the "03" before the dot and the "3" after the dot seem to duplicate each other.
I ran the following tests:
char l[50];
sprintf(l, "%03.3u", 5);
sprintf(l, "%03u", 5);
sprintf(l, "%.3u", 5);
and confirmed that the output is always 005. So why would someone use "%03.3u" instead of "%03u" or "%.3u"?

The output will be the same for the particular values you have used. The number before the . is the minimum field width, while the number after it (for the u conversion specifier, at least) is the minimum number of digits to output. You can see the difference between the two with something like:
printf("%3.2u\n", 7);
which gives you " 07" (a space followed by 07): at least two digits output, in a field at least three characters wide.
However, because your two numbers are the same, you get at least three digits in a field at least three characters wide. Note that the 0 flag does not make up for a smaller precision: for integer conversions the 0 flag is ignored whenever a precision is given, so %03.2u would print " 05" (space-padded), not 005.
Bottom line is, to get the full three digits, you can use the 0 zero-pad modifier or the minimum digit count modifier but you don't need both.
However, since having both doesn't have any adverse effects beyond forcing people to question the sanity of those that wrote it :-), it's functionally okay.

The 03 is the field width with zero-padding. This means that a minimum of 3 characters are to be output, and if there are fewer than three, they are left-padded with zeroes (unless a precision is also given, in which case the 0 flag is ignored).
The second 3 is the minimum number of digits to output.
When both of these are specified, the precision is applied first, and if the result is narrower than the minimum field width, the output is padded on the left with spaces. For example, printf("q%6.3u", 5) produces q   005 (three spaces before 005; the leading q only makes the spaces visible).
If you're printing an unsigned integer and you didn't use the sign flag, then the number of digits is the same as the field width (since the only output is digits). %03u, %.3u and %03.3u all have the same effect.
I guess the person wrote %03.3u since they did not properly understand the meaning of these things so they guessed something, it worked, and they decided to not make any further changes.
If you print a sign character, then the field width differs from the digit count; e.g. you could experiment with %+03d versus %+.3d (the + flag only applies to signed conversions such as %d, not to %u), or use %d and print a negative number.

Related

Regex to match any integer greater than 1080?

I'm trying to come up with a regex for any integer greater than 1080, so that the numbers below would match:
1081
1100
1111
1200
1280
4000
900000080
I came across this post: https://codeshare.co.uk/blog/regular-expression-regex-for-a-number-greater-than-1200/ but it didn't work for a number like 1300.
Doing this with regex is a lousy idea, but if you have a genuine need (like some software that only lets you use regex in filters), it's possible. Let's take it a step at a time, and let's work from larger numbers to smaller, because it makes it easier to think about:
Any number with at least five digits is okay: [1-9][0-9]{4,}
Any number 2,000 - 9,999 is okay: [2-9][0-9]{3}
Any number 1,100 - 1,999 is okay: 1[1-9][0-9]{2}
Any number 1,090 - 1,099 is okay: 109[0-9]
Any number 1,081 - 1,089 is okay: 108[1-9]
Anything that's left is a number <= 1080, or not a number.
Putting it all together in reverse order, ^(?:108[1-9]|109[0-9]|1[1-9][0-9]{2}|[2-9][0-9]{3}|[1-9][0-9]{4,})$ should work. If you want to be a little more lax with number formats you could allow an optional leading + or any number of leading 0s (but not include them in the part we're checking). That gets us
^\+?0*(?:108[1-9]|109[0-9]|1[1-9][0-9]{2}|[2-9][0-9]{3}|[1-9][0-9]{4,})$

Controlling newlines when writing out arrays in Fortran

So I have some code that does essentially this:
REAL, DIMENSION(31) :: month_data
INTEGER :: no_days
no_days = get_no_days()
month_data = [fill array with some values]
WRITE(1000,*) (month_data(d), d=1,no_days)
So I have an array with values for each month: in a loop I fill the array with a certain number of values, based on how many days there are in that month, and then write the results out to a file.
It took me quite some time to wrap my head around the whole 'write out an array in one go' aspect of WRITE, but this seems to work.
However this way, it writes out the numbers in the array like this (example for January, so 31 values):
0.00000 10.0000 20.0000 30.0000 40.0000 50.0000 60.0000
70.0000 80.0000 90.0000 100.000 110.000 120.000 130.000
140.000 150.000 160.000 170.000 180.000 190.000 200.000
210.000 220.000 230.000 240.000 250.000 260.000 270.000
280.000 290.000 300.000
So it prefixes a lot of spaces (presumably to make columns line up even when there are larger values in the array), and it wraps lines to make it not exceed a certain width (I think 128 chars? not sure).
I don't really mind the extra spaces (although they inflate my file sizes considerably, so it would be nice to fix that too...), but the line breaking screws up my other tooling. I've tried reading several Fortran manuals, but while some of them mention 'output formatting', I have yet to find one that mentions newlines or columns.
So, how do I control how arrays are written out when using the syntax above in Fortran?
(Also, while we're at it, how do I control the number of decimal digits? I know these are all integer values, so I'd like to leave out the decimals altogether, but I can't change the data type to INTEGER in my code because of reasons.)
You probably want something similar to
WRITE(1000,'(31(F6.0,1X))') (month_data(d), d=1,no_days)
Explanation:
The use of * as the format specification is called list-directed I/O: it is easy to code, but you are giving away all control over the format to the processor. In order to control the format you need to provide explicit formatting, via a label to a FORMAT statement or via a character variable.
Use the F edit descriptor for real variables in decimal form. Its syntax is Fw.d, where w is the total width of the field (including the decimal point and any sign) and d is the number of digits after the decimal point. F6.0 therefore means a field 6 characters wide with no digits after the decimal point (the decimal point itself is still printed).
Spaces can be added with the X control edit descriptor.
An edit descriptor can be repeated by preceding it with a repeat count (e.g. 31F6.0).
Groups can be created with (...), and they can be repeated if preceded by a number of repetitions.
No more items are printed beyond the last provided variable, even if the format specifies how to print more items than the ones actually provided - so you can ask for 31 repetitions even if for some months you will only print data for 30 or 28 days.
Besides, new lines can be added with the / control edit descriptor; e.g., if you wanted to print the data with 10 values per row, you could do
WRITE(1000,'(4(10(F6.0,:,1X),/))') (month_data(d), d=1,no_days)
Note the : control edit descriptor in this second example: it indicates that, if there are no more items to print, nothing else should be printed - not even spaces corresponding to control edit descriptors such as X or /. While it could have been used in the previous example, it is more relevant here, in order to ensure that, if no_days is a multiple of 10, there isn't an empty line after the 3 rows of data.
If you want to completely remove the decimal symbol, you would need to rather print the nearest integers using the nint intrinsic and the Iw (integer) descriptor:
WRITE(1000,'(31(I6,1X))') (nint(month_data(d)), d=1,no_days)

Regex for UK registration number

I've been playing with creating a regular expression for UK registration numbers but have hit a wall when it comes to restricting overall length of the string in question. I currently have the following:
^(([a-zA-Z]?){1,3}(\d){1,3}([a-zA-Z]?){1,3})
This allows for an optional string (lower or upper case) of between 1 and 3 characters, followed by a mandatory numeric of between 1 and 3 characters and finally, a mandatory string (lower or upper case) of between 1 and 3 characters.
This works fine but I then want to apply a max length of 7 characters to the entire string but this is where I'm failing. I tried adding a 1,7 restriction to the end of the regex but the three 1,3 checks are superseding it and therefore allowing a max length of 9 characters.
Examples of registration numbers that need to pass are as follows:
A1
AAA111
AA11AAA
A1AAA
A11AAA
A111AAA
In the examples above, the A's represent any letter, upper or lower case, and the 1's represent any digit. The max length is the only restriction that doesn't appear to be working. I disable the entry of spaces, so they can be assumed never to be present in the string.
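If you know what lengths you are after, I'd recommend checking the length directly (for example via the .length property that some languages expose for strings). If this is not an option, you could add a lookahead that is anchored to the end of the string: ^(?=.{1,7}$)(([a-zA-Z]?){1,3}(\d){1,3}([a-zA-Z]?){1,3})$. The $ inside the lookahead matters: (?=.{1,7}) on its own only checks that at least one character is present, so it does not cap the length.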

Legacy Fortran FORMAT Edit Descriptor Syntax - Number Before Type?

In a snippet of legacy FORTRAN code (actual compiler unknown, suspect it was circa FORTRAN-77), I found a statement like this:
100 FORMAT(5I7.2)
Which I interpret to mean:
Integer
Width 7 characters, of which
2 characters are decimals (e.g., '12345.67')
What I can't find is an explanation of the leading '5'. I assume it means something to the effect of "repeating group," say--five groups of seven integers...etc.
Is this interpretation correct?
Fortran 2008 defines the I edit descriptor in Section 10.7.2.2. The relevant paragraphs to your question are (excerpts):
1 The Iw and Iw.m edit descriptors indicate that the field to be edited occupies w positions, except when w is zero. When w is zero, the processor selects the field width. On input, w shall not be zero. The specified input/output list item shall be of type integer.
5 The output field for the Iw.m edit descriptor is the same as for the Iw edit descriptor, except that the digit-string consists of at least m digits. If necessary, sufficient leading zeros are included to achieve the minimum of m digits.
This means that I7.2 produces a field 7 characters wide in which at least two digits are always displayed, zero-padded if necessary.
The preceding 5 in the edit descriptor is a repeat specification (Fortran 2008 10.3.1 paragraph 1) and is a repeat count of the following edit descriptor.
Put together, 5I7.2 will output 5 integers, each in a field 7 characters wide, displaying at least 2 digits (zero-padded to two digits if necessary).

Integer range and multiple of

I have a number of fields that I want to validate on text entry with a regex that both matches a range (0..120) and requires a multiple of 5.
For example, 0, 5, 25, 120 are valid. 1, 16, 123, 130 are not valid.
I think I have the regex for multiple of 5:
^\d*\d?((5)|(0))\.?((0)|(00))?$
and the regex for the range:
120|1[01][0-9]|[2-9][0-9]
However, I don't know how to combine these; any help is much appreciated!
You can't do that with a simple regex. At least not the range-part (especially if the range should be generic/changeable).
And even if you manage to write the regex, it will be very complex and unreadable.
Write the validation on your own, using a parseStringToInt() function of your language and simple < and > checks.
Update: added another regex (see below) to be used when the range of values is not 0..120 (it can even be dynamic).
The second regex in the question does not match numbers smaller than 20. You can change it so that it also matches smaller numbers, and so that every match ends in 0 or 5 and is therefore a multiple of 5:
\b(120|(1[01]|[0-9])?[05])\b
How it works (starting from inside):
(1[01]|[0-9])? matches 10, 11 or any one-digit number (0 to 9); these are the hundreds and tens in the final number; the question mark (?) after the sub-expression makes it match 0 or 1 times; this way the regex can also match numbers having only one digit (0..9);
[05] that follows matches 0 or 5 on the last digit (the units); only the numbers that end in 0 or 5 are multiple of 5;
everything is enclosed in parentheses because | has lower precedence than concatenation: without them, the \b's would attach only to the first and the last alternative;
the outer \b's match word boundaries; they prevent the regex from matching only 1..3 digits of a longer number, or numbers embedded in other words; e.g. they stop it from matching 15 in 150 or 120 in abc120.
Using dynamic range of values
The regex above is not very complex, and it can be used to match numbers between 0 and 120 that are multiples of 5. When the range of values is different, it can no longer be used as is. It can be modified to match, let's say, numbers between 20 and 120 (as the OP asked in a comment below), but it becomes harder to read.
Moreover, if the range of allowed values is dynamic, a regex cannot be used at all to check that a value is inside the range. Divisibility by 5, however, can still be checked with a regex :-)
For dynamic range of values that are multiple of 5 you can use this expression:
\b([1-9][0-9]*)?[05]\b
Parse the matched string as an integer (the language you use probably provides such a function, or a library that contains it), then use the comparison operators (<, >) of the host language to check whether the value is inside the desired range.
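At the risk of being painfully obvious, restrict the last digit to 0 or 5 in every branch:
120|1[01][05]|[2-9][05]
Also, why the 2? As written, [2-9][05] (like [2-9][0-9] in the question) skips 0 through 15; using [1-9]?[05] as the last branch would cover them.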