Convert and validate string - c++

I need to take time as user input in the form HH:MM and then validate it.
It needs to be a proper time in that format. Any good ideas on how to do that?
I'm trying to make a function that will iterate through the string, validating each character, then convert them into numbers (or some kind of time stamp) so I can compare several strings to each other.
I'm only using the std namespace.

Use boost::regex to match string and its parts (HH) and (MM) and use scanf to get hours and minutes.

It sounds more like an algorithm problem, I would:
1, check that the length of the string is 5.
2, check that ':' is in the middle.
3, check that HH is in range.
4, check that MM is in range.
5, convert it to whatever format is most convenient for you.
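The five checks above can be sketched in standard C++ roughly as follows. The function name parseTime, the -1 error value, and the choice of "minutes since midnight" as the convenient format are assumptions made for illustration, as is the 24-hour range (00 to 23).

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: returns minutes since midnight for a valid "HH:MM"
// string (24-hour clock assumed), or -1 if the input is malformed.
int parseTime(const std::string& s) {
    // Steps 1 and 2: exactly five characters with ':' in the middle.
    if (s.size() != 5 || s[2] != ':') return -1;

    // The four remaining positions must all be decimal digits.
    const int digitPositions[] = {0, 1, 3, 4};
    for (int i : digitPositions) {
        if (!std::isdigit(static_cast<unsigned char>(s[i]))) return -1;
    }

    // Steps 3 and 4: convert and range-check hours and minutes.
    const int hours = (s[0] - '0') * 10 + (s[1] - '0');
    const int minutes = (s[3] - '0') * 10 + (s[4] - '0');
    if (hours > 23 || minutes > 59) return -1;

    // Step 5: a single integer makes two times easy to compare.
    return hours * 60 + minutes;
}
```

Two inputs can then be compared with ordinary integer comparison, e.g. parseTime(a) < parseTime(b), after checking that neither result is -1.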

It may be overkill for this particular problem, but this kind of task is a great fit for a state machine. Basically, you'll want to read the input one character at a time, and each character can change the machine's state until you end up in a success or error state. For example:
First character
If not a number, change to error state
Otherwise store value and change to state 2
Second character
If not a number, change to error state
Otherwise multiply stored value by 10 and add second character. If the result is out of range, change to error state. Otherwise, change to state 3
Third character
If :, change to state 4, otherwise change to error state
Fourth character
Similar to First character, changing to state 5 upon success.
Fifth character
Similar to Second character, changing to state 6 upon success.
Success state
A winner is yuo!
Error state
Handle the error, duh.
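A minimal sketch of that state machine in C++ might look like this. The enum names, the function name parseTimeFsm, the -1 error value, and the 24-hour range are assumptions for illustration; the states map onto the six numbered states plus the error state described above.

```cpp
#include <cctype>
#include <string>

// One enumerator per state in the description; names are illustrative.
enum class State { Hour1, Hour2, Colon, Min1, Min2, Success, Error };

// Returns minutes since midnight for a valid "HH:MM" input, or -1 on error.
int parseTimeFsm(const std::string& input) {
    State state = State::Hour1;
    int hours = 0;
    int minutes = 0;
    for (char c : input) {
        const bool digit = std::isdigit(static_cast<unsigned char>(c)) != 0;
        switch (state) {
        case State::Hour1:   // first character: store the value
            hours = c - '0';
            state = digit ? State::Hour2 : State::Error;
            break;
        case State::Hour2:   // second character: accumulate and range-check
            hours = hours * 10 + (c - '0');
            state = (digit && hours <= 23) ? State::Colon : State::Error;
            break;
        case State::Colon:   // third character must be ':'
            state = (c == ':') ? State::Min1 : State::Error;
            break;
        case State::Min1:    // fourth character: like the first
            minutes = c - '0';
            state = digit ? State::Min2 : State::Error;
            break;
        case State::Min2:    // fifth character: like the second
            minutes = minutes * 10 + (c - '0');
            state = (digit && minutes <= 59) ? State::Success : State::Error;
            break;
        case State::Success: // any extra character after a full match
        case State::Error:   // once in error, stay in error
            state = State::Error;
            break;
        }
    }
    return state == State::Success ? hours * 60 + minutes : -1;
}
```

Because the machine only reports success when it ends in the success state, inputs that are too short or too long are rejected without a separate length check.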

Related

Matlab: What's the most efficient approach to parse a large table or cell array with regexp when sometimes there is no match?

I am working with a messy manually maintained "database" that has a column containing a string with name,value pairs. I am trying to parse the entire column with regexp to pull out the values. The column is huge (>100,000 entries). As a proxy for my actual data, let's use this code:
line1={'''thing1'': ''-583'', ''thing2'': ''245'', ''thing3'': ''246'', ''morestuff'':, '''''};
line2={'''thing1'': ''617'', ''thing2'': ''239'', ''morestuff'':, '''''};
line3={'''thing1'': ''unexpected_string(with)parens5'', ''thing2'': 245, ''thing3'':''246'', ''morestuff'':, '''''};
mycell=vertcat(line1,line2,line3);
This captures the general issues encountered in the database. I want to extract what thing1, thing2, and thing3 are in each line using cellfun to output a scalar cell array. They should normally be 3 digit numbers, but sometimes they have an unexpected form. Sometimes thing3 is completely missing, without the name even showing up in the line. Sometimes there are minor formatting inconsistencies, like single quotes missing around the value, spaces missing, or dashes showing up in front of the three digit value. I have managed to handle all of these, except for the case where thing3 is completely missing.
My general approach has been to use expressions like this:
expr1='(?<=thing1''):\s?''?-?([\w\d().]*?)''?,';
expr2='(?<=thing2''):\s?''?-?([\w\d().]*?)''?,';
expr3='(?<=thing3''):\s?''?-?([\w\d().]*?)''?,';
This looks behind for thingX' and then tries to match : followed by zero or one spaces, followed by 0 or 1 single quote, followed by zero or one dash, followed by any combination of letters, numbers, parentheses, or periods (this is defined as the token), using a lazy match, until zero or one single quote is encountered, followed by a comma. I call regexp as regexp(___,'tokens','once') to return the matching token.
The problem is that when there is no match, regexp returns an empty array. This prevents me from using, say,
out=cellfun(@(x) regexp(x,expr3,'tokens','once'),mycell);
unless I call it with 'UniformOutput',false. The problem with that is twofold. First, I need to then manually find the rows where there was no match. For example, I can do this:
emptyout=cellfun(@(x) isempty(x),out);
emptyID=find(emptyout);
backfill=cell(length(emptyID),1);
[backfill{:}]=deal('Unknown');
out(emptyID)=backfill;
In this example, emptyID has a length of 1 so this code is overkill. But I believe this is the correct way to generalize for when it is longer. This code will replace every empty cell array in out with the string Unknown. But this leads to the second problem. I've now got a 'messy' cell array of non-scalar values. I cannot, for example, check unique(out) as a result.
Pardon the long-windedness but I wanted to give a clear example of the problem. Now my actual question is in a few parts:
Is there a way to accomplish what I'm trying to do without using 'UniformOutput',false? For example, is there a way to have regexp pass a custom string if there is no match (e.g. pass 'Unknown' if there is no match)? I can think of one 'cheat', which would be to use the | operator in the expression, and if the first token is not matched, look for something that is ALWAYS found. I would then still need to double back through the output and change every instance of that result to 'Unknown'.
If I take the 'UniformOutput',false approach, how can I recover a scalar cell array at the end to easily manipulate it (e.g. pass it through unique)? I will admit I'm not 100% clear on scalar vs nonscalar cell arrays.
If there is some overall different approach that I'm not thinking of, I'm also open to it.
Tangential to the main question, I also tried using a single expression to run regexp using 3 tokens to pull out the values of thing1, thing2, and thing3 in one pass. This seems to require 'UniformOutput',false even when there are no empty results from regexp. I'm not sure how to get a scalar cell array using this approach (e.g. an Nx1 cell array where each cell is a 3x1 cell).
At the end of the day, I want to build a table using these results:
mytable=table(out1,out2,out3);
Edit: Using celldisp sheds some light on the problem:
celldisp(out)
out{1}{1} =
246
out{2} =
Unknown
out{3}{1} =
246
I assume that I need to change the structure of out so that the contents of out{1}{1} and out{3}{1} are instead just out{1} and out{3}. But I'm not sure how to accomplish this if I need 'UniformOutput',false.

Note: I've not used MATLAB and this doesn't answer the "efficient" aspect, but...
How about forcing there to always be a match?
Just thinking about you really wanting a match to skip this problem, how about an empty match?
Looking on the MATLAB help page here I can see a 'emptymatch' option, perhaps this is something to try.
E.g.
the_thing_i_want_to_find|
Match "the_thing_i_want_to_find" or an empty match, note the | character.
In capture group it might look like this:
(the_thing_i_want_to_find|)

As a workaround, I have found that using regexprep can be used to find entries where thing3 is missing. For example:
replace='$1 ''thing3'': ''Unknown'', ''morestuff''';
missingexpr='(?<=thing2'':\s?)(''?-?[\w\d().]*?''?,) ''morestuff''';
regexprep(mycell{2},missingexpr,replace)
ans =
''thing1': '617', 'thing2': '239', 'thing3': 'Unknown', 'morestuff':, '''
Applying it to the entire array:
fixedcell=cellfun(@(x) regexprep(x,missingexpr,replace),mycell,'UniformOutput',false);
out=cellfun(@(x) regexp(x,expr3,'tokens','once'),fixedcell,'UniformOutput',false);
This feels a little roundabout, but it works.

cellfun can be replaced with a plain old for loop. Your code will either be equally fast, or maybe even faster. cellfun is implemented with a loop anyway, there is no advantage of using it other than fewer lines of code. In your explicit loop, you can then check the output of regexp, and build your output array any way you like.

Starting position for replace function in db2

I'm converting some Access VBA functionality to DB2 and found a vital difference. VBA lets you specify the starting point in the character string you're working on. DB2 doesn't have that option. It starts from position 1 and replaces whatever you want to be replaced in the whole string. How can I make DB2 start the replace at a specified place in the string? For example, my string is "Incongruent Plastics Incorporated" and I want to replace the "Incorporated" that starts at position 22 with "Inc". I'm doing this in a WHILE loop, going through long strings, replacing parts of them until they are less than a specified maximum (15 or 30 depending on the field).
I looked at the Locate function, but I'm not sure that's right.
Replace(a.PAYEE_STD_NAME, B.FullWord, B.abbreviation, B.mLastWord)
Where a.PAYEE_STD_NAME is the string I'm looking at, B.FullWord is what I want to replace, B.abbreviation is what I want to replace it with, and B.mLastWord is the position where I want to start replacing. Something like Replace("Incongruent Plastics Incorporated","Incorporated","Inc",22)
I expect the characters to be replaced starting in the position I need, towards the back of the string, not in the beginning.
Thanks!

Not that good at DB2, but that limitation can generally be worked around by using SUBSTR
The equivalent of Replace(a.PAYEE_STD_NAME, B.FullWord, B.abbreviation, B.mLastWord) would be:
CONCAT(SUBSTR(a.PAYEE_STD_NAME, 1, B.mLastWord - 1), Replace(SUBSTR(a.PAYEE_STD_NAME, b.mLastWord), B.FullWord, B.abbreviation))
This assumes b.mLastWord is greater than 1, if it's 1 you can use a normal REPLACE.

Maybe consider using REGEXP_REPLACE: https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.1.0/com.ibm.db2.luw.sql.ref.doc/doc/r0061496.html
and possibly consider recursive SQL rather than looping logic.

Avoiding Comments w/ C++ getline()

I'm using getline() to read a .cpp file.
getline(theFile, fileData);
I'm wondering if there is any way to have getline() avoid grabbing c++ comments (/*, */ and //)?
So far, trying something like this doesn't quite work.
if (fileData[i] == '/*')

I think it's unavoidable for you to read the comments, but you can dispose of them by reading through the file one character at a time.
To do this, you can load the file into a string and build a state machine with the following states:
1. This is actual code
2. The previous character was /
3. The previous character was *
4. I am a single-line comment
5. I am a multi-line comment
The state machine starts in State 1.
If the machine is in State 1 and hits a / character, transition to State 2.
If the machine is in State 2 and hits a / character, transition to State 4. If it hits a * character, transition to State 5. Otherwise, transition to State 1.
If the machine is in State 4 and hits a newline character, transition to State 1.
If the machine is in State 5 and hits a * character, transition to State 3.
If the machine is in State 3 and hits a / character, transition to State 1 (the multi-line comment ends). If it hits another * character, stay in State 3. Otherwise, transition to State 5.
If you mark the positions of the characters where the machine enters and exits the comment states, you can then strip these characters from the string.
Alternatively, you could explore regular expressions, which provide ways of describing this kind of state machine very succinctly.
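As a rough sketch, the five-state machine could be written in C++ like this. The function name stripComments is an assumption, and instead of marking positions and stripping afterwards, this variant simply copies non-comment characters into a new string. It deliberately ignores string and character literals, so a // inside a quoted string would be treated as a comment.

```cpp
#include <string>

// Removes // and /* */ comments from `src`. Sketch only: string and
// character literals are NOT recognised, and a removed block comment is
// replaced by nothing (so "a/**/b" becomes "ab").
std::string stripComments(const std::string& src) {
    // Enumerators are listed in the same order as States 1 to 5 above.
    enum class State { Code, Slash, Star, LineComment, BlockComment };
    State state = State::Code;
    std::string out;
    for (char c : src) {
        switch (state) {
        case State::Code:
            if (c == '/') state = State::Slash;   // might start a comment
            else out += c;
            break;
        case State::Slash:
            if (c == '/') state = State::LineComment;
            else if (c == '*') state = State::BlockComment;
            else {                                // it was a plain '/'
                out += '/';
                out += c;
                state = State::Code;
            }
            break;
        case State::LineComment:
            if (c == '\n') {                      // keep the line break
                out += c;
                state = State::Code;
            }
            break;
        case State::BlockComment:
            if (c == '*') state = State::Star;
            break;
        case State::Star:
            if (c == '/') state = State::Code;    // comment closed
            else if (c != '*') state = State::BlockComment;
            break;                                // on '*', stay in Star
        }
    }
    if (state == State::Slash) out += '/';        // trailing lone slash
    return out;
}
```

The file contents can be read into a single std::string first (for example by appending each getline() result plus a newline) and then passed through this function.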

So, one problem is that if(fileData[i] == '/*') is testing if the char fileData[i] is equal to '/*' which is... Not a char.
To find if a line contains a comment, you will probably want to look into one of the following:
<regex> in C++11 (Boost has a regular expression library as well, if that's more your thing.)
strstr in vanilla C/C++.
For multi-line comments, you'll probably want to store a flag indicating whether the previous line ended "in comment" or not, and then search for /* or */ according to that flag, updating it as you go.

Single quotation marks designate a char, and the char data type represents a SINGLE char. '/*' doesn't make sense, because it's two chars while fileData[i] refers to a single char.
Your if statement needs to be far more robust.

Restrict users to enter numbers valid only till 2 decimal places C/C++

I am making a currency change program that provides exact change for the input amount; for example, a value of 23 would be one 20-dollar bill and three one-dollar bills.
I want to restrict the user to entering values with at most 2 decimal places. For example, the valid inputs are
20, 20.4, 23.44 but an invalid input would be 20.523 or 20.000.
How can I do this in C/C++?
I read about a function called setprecision, but that is not what I want: setprecision controls how many decimal places are displayed; it doesn't stop the user from entering any value.
Is there any way to do this?

Read the amount from the user as a string, either character by character or the entire line, and then check its format, and then convert it.

It's generally easier to let the user type whatever they want followed by the program rejecting the input if it isn't valid rather than restricting what they can type on a keystroke basis.
For keystroke analysis you would need a state machine with 4 states, which we can call Number, Numberdot, Numberdotone, and Numberdottwo. Your code would have to make the proper transitions for all keystrokes, including the arrow keys to move the cursor to some arbitrary place and the Backspace key. That's a lot of work.
With input validation, all you have to do is check the input using a regular expression, e.g. ^([0-9]+|[0-9]+\.[0-9]|[0-9]+\.[0-9][0-9])$. This assumes that "20." is not valid. Then if it's invalid you tell the user and make them do it again.
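For illustration, a check of that shape using std::regex (available in <regex> since C++11) could look like the sketch below. The function name isValidAmount is made up for this example, and the pattern is a compact way of writing "digits, optionally followed by a dot and one or two digits".

```cpp
#include <regex>
#include <string>

// True when `text` is one or more digits, optionally followed by a dot and
// one or two digits. "20." and ".5" are rejected under this assumption.
bool isValidAmount(const std::string& text) {
    static const std::regex pattern(R"([0-9]+(\.[0-9]{1,2})?)");
    // regex_match requires the whole string to match, so no ^ or $ is needed.
    return std::regex_match(text, pattern);
}
```

A typical loop would read a whole line with std::getline, call isValidAmount on it, and prompt again until it returns true before converting the string to a number.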

I do not believe that there is any way to set the library to do this for you. Because of that you're going to have to do the work yourself.
There are many ways you can do this, but the only true way to handle restricting the input is to control reading it in yourself.
In this case you would loop on keyboard input; for every keystroke you would have to decide whether it can be accepted in the context of the past input, then display it. That is, if there is a decimal point you would only accept two more digits. This also allows you to limit input to digits and a decimal point, not to mention input length.
The down side is you will have to handle all the editing commands. Even bare bones you would need to support delete and enter.

This is rather a task for the GUI you are using, than for core C/C++. Depending on your GUI/Web Toolkit you can give more or less detailed rules how data can or can not be entered.
If you are writing a normal GUI application you can control and modify the entered keys (in C or C++).
In a WEB application you can do similar things using javascript.
The best solution would be when all illegal input is impossible.

checking float inside a string and return result?

I have a text file which I getline into a string. The file is like this: 0.2abc 0.2 .2abc .2 abc.2abc abc.2 abc0.20 .2 . 20
I want to check the contents and then parse them into separate floats. The result is: 0.2 0.2abc 2 20 2abc abc0.20 abc
To explain: check whether there is a digit on both sides of the '.' (full stop), with or without other characters. If only one side of the '.' is a digit, the '.' is treated as a full stop.
How can I parse a string into separate results like that? I used an iterator to check for the '.' and its position, but still got stuck.

The first thing you need to do is split the input in words. Easy, just don't use .getline()
but instead rely on while (cin >> strWord) { /* do stuff with word */ }
The second thing is to kick out bad input words early: words of 2 characters or less, with more than one ., or with the . first or last.
You now know that the . is somewhere in the middle. find() will give you an iterator. ++ and -- give you the next and previous iterators. * gives you the character that the iterator points to. isdigit() tells you whether that character is a digit. Add ingredients together and you're done.

Seems like some fairly complicated advice above -- and not necessarily helpful.
Your question does not make it entirely clear what the end result should look like. Do you want an array of floating point numbers? Do you just want the sum? Do you want to print out the results?
If you want help with homework, the best policy is to post your own attempt and then others can help you improve it, to make it work.
One approach that might help is to try to break the string into sub-strings (tokens) and discard the junk.
Write a function that accepts a character and returns true (this is part of a floating point number) or false (it isn't).
Scan along the string using an iterator or an index.
While current char is not part of a token, skip it.
If you find a token char, while current char is part of a token, copy it to another string
etc. to get all floating point substrings.
Then you can use std::stringstream or ::atof() to convert.
Have a bit of a go and post what you can get done.
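As a starting point only, the scanning steps above might be sketched like this. The names isNumberChar and tokenizeFloats are invented for the example, and the rule "a token is any run of digits and dots that contains at least one digit" is an assumption, since the question does not pin down the exact output.

```cpp
#include <cctype>
#include <cstdlib>
#include <string>
#include <vector>

// A character is treated as part of a number token if it is a digit or '.'.
bool isNumberChar(char c) {
    return std::isdigit(static_cast<unsigned char>(c)) != 0 || c == '.';
}

// Collects every run of number characters and converts it with atof.
std::vector<double> tokenizeFloats(const std::string& text) {
    std::vector<double> values;
    std::size_t i = 0;
    while (i < text.size()) {
        // Skip characters that cannot be part of a token.
        while (i < text.size() && !isNumberChar(text[i])) ++i;

        // Copy the token into its own string, noting whether it has a digit.
        std::string token;
        bool hasDigit = false;
        while (i < text.size() && isNumberChar(text[i])) {
            if (text[i] != '.') hasDigit = true;
            token += text[i++];
        }

        // A lone "." is a full stop, not a number, so it is dropped.
        if (hasDigit) values.push_back(std::atof(token.c_str()));
    }
    return values;
}
```

Note that atof stops at the first character it cannot use, so a token such as 1.2.3 would convert to 1.2 here.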

Sounds like you could use a regex to extract your numbers.
Try this regex to extract the floating-point values within a string:
[0-9]+\.[0-9]+
Keep in mind that this won't extract integer values, e.g. the 234 in 234abc.
I don't know if there is a built-in way to use regex in c++ but i found this library with a quick google search which allows you to use regex in c++
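For what it's worth, C++11 and later ship a regular expression library in the <regex> header, so the pattern above can be tried without a third-party dependency. The sketch below is one possible way to apply it; the function name extractFloats is an assumption for the example.

```cpp
#include <regex>
#include <string>
#include <vector>

// Returns every substring matching [0-9]+\.[0-9]+ converted to double.
// Integers and forms like ".2" or "2." are skipped by this pattern.
std::vector<double> extractFloats(const std::string& text) {
    static const std::regex pattern(R"([0-9]+\.[0-9]+)");
    std::vector<double> values;
    // sregex_iterator walks over all non-overlapping matches in `text`.
    for (std::sregex_iterator it(text.begin(), text.end(), pattern), end;
         it != end; ++it) {
        values.push_back(std::stod(it->str()));
    }
    return values;
}
```

Calling extractFloats on a line read with getline returns the values in the order they appear in the text.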

Sounds like you should look at the "Interpreter" Design Pattern.
Or you could use the "State" Design Pattern and do it by hand.
There should be plenty of examples of both on the web.
