Regex on multi-part field - regex

I have one input field that will represent a height in feet and inches. Users want an inout mask on the field as well, so when typed, it will separate the two halves of my 4 byte field with a slash; i.e. XX/XX.
The first XX is feet, and can basically be any number, although this height represents loading docks, so I suppose anything over 15 or 20 feet would be ridiculous.
The second XX are inches, and needs to be limited between 0 and 11. I thought I came up with what I needed with jquery masked input plug in, but i couldn't get it to allow a single digit in one of the XX columns, and I don't want to force the user to enter a leading 0.
I'm thinking this is probably pretty easy to knock out with reg ex...I'm sure someone here can prove me right!

I wrote a blog post about using regular expressions for validating number ranges. It's generally not recommended, but in this limited case perhaps it's okay.
If you assume that the first XX can be 0-30 (to be safe), you can use this:
(30|[0-2]\d)\|(0\d|11)
Debuggex Demo

Related

Controlling newlines when writing out arrays in Fortran

So I have some code that does essentially this:
REAL, DIMENSION(31) :: month_data
INTEGER :: no_days
no_days = get_no_days()
month_data = [fill array with some values]
WRITE(1000,*) (month_data(d), d=1,no_days)
So I have an array with values for each month, in a loop I fill the array with a certain number of values based on how many days there are in that month, then write out the results into a file.
It took me quite some time to wrap my head around the whole 'write out an array in one go' aspect of WRITE, but this seems to work.
However this way, it writes out the numbers in the array like this (example for January, so 31 values):
0.00000 10.0000 20.0000 30.0000 40.0000 50.0000 60.0000
70.0000 80.0000 90.0000 100.000 110.000 120.000 130.000
140.000 150.000 160.000 170.000 180.000 190.000 200.000
210.000 220.000 230.000 240.000 250.000 260.000 270.000
280.000 290.000 300.000
So it prefixes a lot of spaces (presumably to make columns line up even when there are larger values in the array), and it wraps lines to make it not exceed a certain width (I think 128 chars? not sure).
I don't really mind the extra spaces (although they inflate my file sizes considerably, so it would be nice to fix that too...) but the breaking-up-lines screws up my other tooling. I've tried reading several Fortran manuals, but while some of the mention 'output formatting', I have yet to find one that mentions newlines or columns.
So, how do I control how arrays are written out when using the syntax above in Fortran?
(also, while we're at it, how do I control the nr of decimal digits? I know these are all integer values so I'd like to leave out any decimals all together, but I can't change the data type to INTEGER in my code because of reasons).
You probably want something similar to
WRITE(1000,'(31(F6.0,1X))') (month_data(d), d=1,no_days)
Explanation:
The use of * as the format specification is called list directed I/O: it is easy to code, but you are giving away all control over the format to the processor. In order to control the format you need to provide explicit formatting, via a label to a FORMAT statement or via a character variable.
Use the F edit descriptor for real variables in decimal form. Their syntax is Fw.d, where w is the width of the field and d is the number of decimal places, including the decimal sign. F6.0 therefore means a field of 6 characters of width with no decimal places.
Spaces can be added with the X control edit descriptor.
Repetitions of edit descriptors can be indicated with the number of repetitions before a symbol.
Groups can be created with (...), and they can be repeated if preceded by a number of repetitions.
No more items are printed beyond the last provided variable, even if the format specifies how to print more items than the ones actually provided - so you can ask for 31 repetitions even if for some months you will only print data for 30 or 28 days.
Besides,
New lines could be added with the / control edit descriptor; e.g., if you wanted to print the data with 10 values per row, you could do
WRITE(1000,'(4(10(F6.0,:,1X),/))') (month_data(d), d=1,no_days)
Note the : control edit descriptor in this second example: it indicates that, if there are no more items to print, nothing else should be printed - not even spaces corresponding to control edit descriptors such as X or /. While it could have been used in the previous example, it is more relevant here, in order to ensure that, if no_days is a multiple of 10, there isn't an empty line after the 3 rows of data.
If you want to completely remove the decimal symbol, you would need to rather print the nearest integers using the nint intrinsic and the Iw (integer) descriptor:
WRITE(1000,'(31(I6,1X))') (nint(month_data(d)), d=1,no_days)

Meaning of 3F7.1 in Fortran data format

I am trying to create an MDM file using HLM 7 Student version, but since I don't have access to SPSS I am trying to import my data using ASCII input. As part of this process I am required to input the data format Fortran style. Try as I might I have not been able to understand this step. Could someone familiar with Fortran (or even better HLM itself) explain to me how this works? Here is my current understanding
From the example EG3.DAT they give
(A4,1X,3F7.1)
I think
A4 signifies that the ID is 4 characters long.
1X means skip a space.
F.1 means that it should read 1 decimal places.
I am very confused about what 3F7 might mean.
EG3.DAT
2020 380.0 40.3 12.5
2040 502.0 83.1 18.6
2180 777.0 96.6 44.4
Below are examples from the help documents.
Rules for format statement
Format statement example
EG1 data format
EG2 data format
EG3 data format
One similar question is Explaining Fortran Write Format. Unfortunately it does not explicitly treat the F descriptor.
3F7.1 means 3 floating point numbers, each printed over 7 characters, each with one decimal number behind the decimal point. Leading characters are blanks.
For reading you don't need the .1 info at all, just read a floating point number from those 7 characters.
You guessed the meaning of A4 (string of four characters) and 1X (one blank) correctly.
In Fortran, so-called data edit descriptors (which format the input or output of data) may have repeat specifications.
In the format (A4,1X,3F7.1) the data edit descriptors are A4 and F7.1. Only F7.1 has a repeat specification (the number before the F). This simply means that the format is as though the descriptor appeared repeated: like F7.1, F7.1, F7.1. With a repeat specification of 1, or not given, there is just the single appearance.
The format of the question, then, is like
(A4,1X,F7.1,F7.1,F7.1)
This format is one that is covered by the rules provided in one of the images of the question. In particular, the aspect of repeat specification is given in rule 2 with the corresponding example of rule 3.
Further, in Fortran proper, a repeat count specifier may also be * as special case: that's like an exceptionally large repeat count. *(F7.1) would be like F7.1, F7.1, F7.1, .... I see no indication that this is supported by HLM but if this is needed a very large repeat count may be given instead.
In 1X the 1 isn't a repeat specification but an integral, and necessary, part of the position edit descriptor.
Procedure for making MDM file from excel for HLM:
-Make sure ALL the characters in ALL the columns line up
Select a column, then right click and select Format Cells
Then click on 'Custom' and go to the 'Type' box and enter the number
of 0s you need to line everything up
-Remove all the tabs from the document and replace them with spaces.
Open the document in word and use find and replace
-To save the document as .dat
First save it as .txt
Then open it in Notepad and save it as .dat
To enter the data format (FORTRAN-Style)
The program wants to read the data file space by space, so you have to specify it perfectly so that it reads the whole set properly.
If something is off, even by a single space, then your descriptive stats will be wonky compared to if you check them in another program.
Enclose the code with brackets ()
Divide the entries with commas ,
-Need ID column for all levels
ID column needs to be sorted so that it is in order from smallest to
largest
Use A# with # being the number of characters in the ID
Use an X1 to
move from the ID to the next column
-Need to say how many characters are needed in each column
Use F
After F is the number of characters needed for that column -Use F# (#= number)
There need to be enough character spaces to provide one 'gap' space
between each column
There need to be enough to character spaces to allow for the decimal
As part of the F you need to specify the number of decimal places
You do this by adding a decimal point after the F number and then a
number to represent the spaces you need -F#.#
You can use a number in front of the F so as to 'repeat' it. Not
necessary though. -#F#.#
All in all, it should look something like this:
(A4,X1,F4.0,F5.1)
Helpful links:
https://books.google.de/books?id=VdmVtz6Wtc0C&pg=PA78&lpg=PA78&dq=data+format+fortran+style+hlm&source=bl&ots=kURJ6USN5e&sig=fdtsmTGSKFxn04wkxvRc2Vw1l5Q&hl=en&sa=X&ved=0ahUKEwi_yPurjYrYAhWIJuwKHa0uCuAQ6AEIPzAC#v=onepage&q&f=false
http://www.ssicentral.com/hlm/help6/error/Problems_creating_MDM_files.pdf
http://www.ssicentral.com/hlm/help7/faq/FAQ_Format_specifications_for_ASCII_data.pdf

How to extract number and/or string in sas

Generic drug names are sometimes formatted like this:
X-Y Tab 5-325 MG, where the drug of interest is X and the amount is 5. A different column has pre-extracted this as: 5 MG-325 M or 10 MG-325 depending on the amount of X.
Is there a way I can extract the amount that is associated with a MG? I am not sure if it's possible to use IS LIKE since the column is character format, and then converting the amount since there is a space between the number and MG.
the examples you are providing suggest that the pre-extracted string is a number space MG. If that's always true you can use the scan function to get the number part of your string.
amount = scan(pre-extracted,1,' ');
you can do this in a data step. Look up the scan function and you can see other options that may help you customize this further.

regex for number between numbers

I'm in need of a regex, which takes a minimum and a maximum number to determine valid input, And I want the maximum and minimum to be dynamic.
I have been trying to get this done using this link
https://stackoverflow.com/a/13473595/1866676
But couldn't get it to work. Can someone please let me know how to do this.
Let's say I want to make a html5 input box, and I Want it to only receive numbers from 100 to 1999
What would a regex for this like this look like?
First off, while it is possible to do this, I think if there is a simpler way to choose a number range such as <input type="number" min="1" max="100">, that way would be preferred.
Having said that, here's how the kind of regex you requested works:
ones: ^[0-9]$ // just set the numbers -- matches 0 to 9
tens: ^[1-3]?[0-9]$ //set max tens and max ones -- matches 0 to 39
tens where max does not end in 9 ^[1-2]?[0-9]$|^[3][0-4]$ // 0 to 34
only tens: ^[1][5-9]$|^[2-3][0-9]$|^[4][0-5]$ // 15 to 45
Here, lets pick an arbitrary number 1234 to 2345
^[1][2][3][4-9]$|
^[1][2][4-9][0-9]$|
^[1][3-9][0-9][0-9]$|
^[2][0-2][0-9][0-9]$|
^[2][3][0-3][0-9]$|
^[2][3][4][0-5]$
https://regex101.com/r/pP8rQ7/4
Basically the ending of the middle series always needs to be a straight range that can reach 9 unless we are dealing with the ones place, and if it cant, you have to build it upwards toward the middle each time we have a value that can't start in 0 and then once we reach a value that cant end in 9 break early and set it in the next condition.
Notice the pattern, as each place solidifies. Also keep in mind that when dealing with going from lower to higher places, optional operators ? should be used.
Its a bit complex, but its nowhere near impossible to design a custom range with a bit of thought.
If you are more specific, we can craft an exact example, but this is generally how it is done:beginning-range|middle-range|end-range
You should only need beginning or end-ranges in certain cases like if the min or max does not end in 9. the ? means that the range that comes after it is optional. (so for example in the first case it lets us have both single and double numbers.
so for 100 - 1999 it's quite simple actually because you have lots of 9's and 0's
/^[1-9][0-9][0-9]$|^[1][0-9][0-9][0-9]$/
https://regex101.com/r/pP8rQ7/1
Note: Single values don't need ranges [n] I just added them for readability.
Edit: There used to be a regex range generator at: http://gamon.webfactional.com/regexnumericrangegenerator/. It appears to be offline now.
Essentially, you can't.
For every numeric range, there exists a regex that will match numbers in that range, therefore it is possible to write code that can generate a such regex. But such a regex is not a simple reformatting of the range ends.
However, such code would require colossal effort and complexity to write compared to code that simply checked the number using numeric methods.
With HTML 5 simply put a range input...
<form>
Quantity (between 100 and 1999):
<input type="number" name="quantity" min="100" max="1999">
</form>
with regex:
^([12345679])(\d)(\d)|^(1)(\d)(\d)(\d)
So if you need to create the regex dinamically it's possible but a bit tricky and complex

Can someone provide a regex for validating and parsing a csv of integers and reals

I am new to regex and struggling to create an expression to parse a csv containing 1 to n values. The values can be integers or real numbers. The sample inputs would be:
1
1,2,3,4,5
1,2.456, 3.08, 0.5, 7
This would be used in c#.
Thanks,
Jerry
Use a CSV parser instead of RegEx.
There are several options - see this SO questions and answers and this one for the different options (built into the BCL and third party libraries).
The BCL provides the TextFieldParser (within the VisualBasic namespace, but don't let that put you off it).
A third party library that is liked by many is filehelpers.
Using REGEX for CSV parsing has been a 10 year jihad for me. I have found it remarkably frustrating, due to the boundary cases:
Numbers come in a variety of forms (here in the US, Canada):
1
1.
1.0
1000
1000.
1,000
1e3
1.0e3
1.0e+3
1.0e+003
-1
-1.0 (etc)
But of course, Europe has traditionally been different with regard to commas and decimal points:
1
1,0
1000
1.000e3
1e3
1,0e3
1,0e+3
1,0e+003
Which just ruins everything. So, we ignore the German and French and Continental standard because the comma just is impossible to work out whether it is separating values, or part of values. (The Continent likes TAB instead of COMMA)
I'll assume that you're "just" looking for numerical values separated from each other by commas and possible space-padding. The expression:
\s*(\-?\d+(?:\.\d*)?(?:[eE][\-+]?\d*)?)\s*
is a pretty fair parser of A NUMBER. Catches just about every reasonable case. Doesn't deal with imbedded commas though! It also trims off spaces, either side of the number.
From there, you can either build an iterative CSV string decomposer (walking each field, absorbing commas, assigning to an array, say), or use the scanf type function to do the same thing. I do prefer the iterative decomposition method - as it also allows you to parse out strings, hexadecimal, and virtually any other pattern you find in the data.
The regex you want is
#"([+-]?\d+(?:\.\d+)?)(?:$|,\s*)"
...from which you'll want capture group 1. However, don't use regex for something like this. String manipulation is much better when the input is in a very static, predictable format:
string[] nums = strInput.split(", ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
List<float> results = (from n in nums
select float.Parse(n)).ToList();
If you do use regex, make sure you do a global capture.
I think you would have to loop it to check for an unknown number of ints... or else something like this:
/ *([0-9.]*) *,? *([0-9.]*) *,? *([0-9.]*) *,? *([0-9.]*) *,? *([0-9.]*) */
and you could keep that going ",?([0-9]*)" as far as you wanted to, to account for a lot of numbers. The result would be an array of numbers....
http://jsfiddle.net/8URvL/1/