Driver's license magnetic strip data format - c++

From this wikipedia article(http://en.wikipedia.org/wiki/Magnetic_stripe_card#cite_note-14), I understand the basic data format for driver's license. It starts with the location data which looks like this: %CODENVER^
I am wondering what if the city consists of two or more words like New York City?
What does the data output look like, and is it a white-space character that separates the words, or it's something else?
How do I write a c++ statement to return each word in the city name in different strings?

It will depend on the delimiter. States use different formats for their data. Mag stripes will have one delimiter to split the data into different sections, then another delimiter to split the sections into individual parts.
For an example, let's say that the data you want to parse is:
New^York^City
Use something like this to split it out:
int main()
{
std::string s = "New^York^City";
std::string delim = "^";
auto start = 0U;
auto end = s.find(delim);
while (end != std::string::npos)
{
std::cout << s.substr(start, end - start) << std::endl;
start = end + delim.length();
end = s.find(delim, start);
}
std::cout << s.substr(start, end);
}
Then your output should be:
New
York
City
Search more for C++ string parsing. I used the split function from here:
Parse (split) a string in C++ using string delimiter (standard C++)

Related

How to name regex group matches in C++ the way python does (?P<name_of_regex>(.*))

I have a string in my program that contains certain values for parameters. I need to extract the values from the parameters using regex.
The regex looks like this:
std::smatch param;
std::string str = "--name=AName --age=AnAge --gender=AGender"
if (std::regex_match(str, param, std::regex(".*--name=(\\w+) .*--age=(\\d+) .*--gender=(\\w+) .*")))
{
//if it finds the order of the regex will come here and the values for each will be stored in param[1-3]
}
The problem is the order of the params can come in different orders, for example:
std::string str = "--gender=AGender --name=AName --age=AnAge"
std::string str = "--age=AnAge --gender=AGender --name=AName"
std::string str = "--name=AName --gender=AGender --age=AnAge "
Is there a way to express in a single regex expression to be able to capture values despite of the order instead of doing on regex per parameter I want to find? If so how can I access such value? In python is possible to add an <id> before the desired group to then later access it using same identifier. In my example code I do that using smatch type variable but the access to it depends on the order that the string has and I cannot rely on that.
Use this regex:
"^(?=.*--name=(\\w+))(?=.*--age=(\\d+))(?=.*--gender=(\\w+)).+"
The one problem you'll run into is the fact that params won't be able to determine which item belongs to which parameter.
The way I would solve this problem would be to use std::string::find.
For example:
std::string str = "--name=AName --age=AnAge --gender=AGender";
size_t namePos = str.find("--name=");
size_t agePos = str.find("--age=");
size_t genderPos = str.find("--gender=");
std::string name = "";
std::string gender = "";
std::string age = "";
if(namePos != std::string::npos)
{
// Add 7 to namePos since the size of "--name=" is 7.
// Assuming that the delimiter of the name is whitespace so find the first
// whitespace after --name=
name = str.substr(namePos + 7, str.find_first_of(" \n\r", namePos + 7) - (namePos + 7));
}
if(agePos != std::string::npos)
{
// Add 6 to agePos since the size of "--age=" is 6.
// Assuming that the delimiter of the age is whitepace so find the first
// whitespace after --age=
age = str.substr(agePos + 6, str.find_first_of(" \n\r", agePos + 6) - (agePos + 6));
}
if(genderPos != std::string::npos)
{
// Add 9 to genderPos since the size of "--gender=" is 9.
// Assuming that the delimiter of the gender is whitespace so find the first
// whitespace after --gender=
gender = str.substr(genderPos + 9, str.find_first_of(" \n\r", genderPos + 9) - (genderPos + 9));
std::cout << name << " " << gender << " " << age << std::endl;
}
Output:
AName AGender AnAge
There are better tools to parse commandlines, but if you really want to use regex, you will find that Boost::Regex makes this much easier than the std::regex.
In particular, it supports named groups (see e.g. Boost Regular Expression: Getting the Named Group) which is the feature you request in your question title.
You can combine that with BOOST_REGEX_MATCH_EXTRA to keep all matches for all named groups (by default, only the last match for each capture group is accessible after the search.)
Then you can just make a big disjunction ((?<group1>...)|(?<group2>...)|...) in your regex for all the groups you may encounter, and you will be able to get all values out regardless of their order.

Allow user to pass a separator character by doubling it in C++

I have a C++ function that accepts strings in below format:
<WORD>: [VALUE]; <ANOTHER WORD>: [VALUE]; ...
This is the function:
std::wstring ExtractSubStringFromString(const std::wstring String, const std::wstring SubString) {
std::wstring S = std::wstring(String), SS = std::wstring(SubString), NS;
size_t ColonCount = NULL, SeparatorCount = NULL; WCHAR Separator = L';';
ColonCount = std::count(S.begin(), S.end(), L':');
SeparatorCount = std::count(S.begin(), S.end(), Separator);
if ((SS.find(Separator) != std::wstring::npos) || (SeparatorCount > ColonCount))
{
// SEPARATOR NEED TO BE ESCAPED, BUT DON'T KNOW TO DO THIS.
}
if (S.find(SS) != std::wstring::npos)
{
NS = S.substr(S.find(SS) + SS.length() + 1);
if (NS.find(Separator) != std::wstring::npos) { NS = NS.substr(NULL, NS.find(Separator)); }
if (NS[NS.length() - 1] == L']') { NS.pop_back(); }
return NS;
}
return L"";
}
Above function correctly outputs MANGO if I use it like:
ExtractSubStringFromString(L"[VALUE: MANGO; DATA: NOTHING]", L"VALUE")
However, if I have two escape separators in following string, I tried doubling like ;;, but I am still getting MANGO instead ;MANGO;:
ExtractSubStringFromString(L"[VALUE: ;;MANGO;;; DATA: NOTHING]", L"VALUE")
Here, value assigner is colon and separator is semicolon. I want to allow users to pass colons and semicolons to my function by doubling extra ones. Just like we escape double quotes, single quotes and many others in many scripting languages and programming languages, also in parameters in many commands of programs.
I thought hard but couldn't even think a way to do it. Can anyone please help me on this situation?
Thanks in advance.
You should search in the string for ;; and replace it with either a temporary filler char or string which can later be referenced and replaced with the value.
So basically:
1) Search through the string and replace all instances of ;; with \tempFill- It would be best to pick a combination of characters that would be highly unlikely to be in the original string.
2) Parse the string
3) Replace all instances of \tempFill with ;
Note: It would be wise to run an assert on your string to ensure that your \tempFill (or whatever you choose as the filler) is not in the original string to prevent an bug/fault/error. You could use a character such as a \n and make sure there are non in the original string.
Disclaimer:
I can almost guarantee there are cleaner and more efficient ways to do this but this is the simplest way to do it.
First as the substring does not need to be splitted I assume that it does not need to b pre-processed to filter escaped separators.
Then on the main string, the simplest way IMHO is to filter the escaped separators when you search them in the string. Pseudo code (assuming the enclosing [] have been removed):
last_index = begin_of_string
index_of_current_substring = begin_of_string
loop: search a separator starting at last index - if not found exit loop
ok: found one at ix
if char at ix+1 is a separator (meaning with have an escaped separator
remove character at ix from string by copying all characters after it one step to the left
last_index = ix+1
continue loop
else this is a true separator
search a column in [ index_of_current_substring, ix [
if not found: error incorrect string
say found at c
compare key_string with string[index_of_current_substring, c [
if equal - ok we found the key
value is string[ c+2 (skip a space after the colum), ix [
return value - search is finished
else - it is not our key, just continue searching
index_of_current_substring = ix+1
last_index = index_of_current_substring
continue loop
It should now be easy to convert that to C++

Searching for an alternative for strtok() in C++

I am using strtok to divide a string in several parts.
In this example, all sections will be read from the string, which are bounded by a colon or a semicolon
char string[] = "Alice1:IscoolAlice2; Alert555678;Bob1:knowsBeepBob2;sees";
char delimiter[] = ":;";
char *p;
p = strtok(string, delimiter);
while(p != NULL) {
cout << "Result: " << p << endl;
p = strtok(NULL, delimiter);
}
As results I get:
Result: Alice1
Result: IscoolAlice2
Result: Alert555678
Result: Bob1
Result: knowsBeepBob2
Result: sees
But I would like to get this results:
Result: Alice1:
Result: Alice2;
Result: Bob1:
Result: Bob2;
The restriction is that I can only choose individual characters when I use strtok.
Does anyone know an alternative for strtok that I also can search for strings?
Or has anyone an idea to solve my problem?
You can not do that task with strtok since you need more complex search
Although I am not sure what is your string as delimiter but the same output can be done with:
char string[] = "Alice1:IscoolAlice2; Alert555678;Bob1:knowsBeepBob2;sees";
char delimiter[] = "(?:Alice|Bob)\\d.";
std::regex regex( delimiter );
std::regex_iterator< const char* > first( std::begin( string ), std::end( string ), regex ), last;
while( first != last ){
std::cout << "Result: " << first->str() << '\n';
++first;
}
the output:
Result: Alice1;
Result: Alice2;
Result: Bob1;
Result: Bob2;
It's just a simple bit of scratch logic, along these lines:
char *ptr = string;
while(*ptr)
{
printf("Result:");
while(*ptr)
{
printf("%c", *ptr);
if(ispunc(*ptr))
{
ptr++;
printf("\n");
break;
}
else
{
ptr++;
}
}
}
It's not possible with your stated data set to properly split it the way you want. You can come up with a "just so" rule to split literally just the data you showed, but given the messy nature of the data it's highly likely it'll fail on other examples. Let's start with this token.
IscoolAlice2
How is a computer program supposed to know which part of this is the name and which is not? You want to get "Alice2" out of this. If you decide that a capital letter specifies a name then it will just spit out the "name" IscoolAlice2. The same with:
knowsBeepBob2
If you search for the first capital letter then the program will decide his name is BeepBob2, so in each case searching for the last occurance of a capital letter in the token finds the name. But what if a name contains two capital letters? The program will cut their name off and you can't do anything about that.
If you're happy to live with these sorts of limitations you can do an initial split via strtok using only the ; character, which gives:
Alice1:IscoolAlice2
Alert555678
Bob1:knowsBeepBob2
sees
Which is less than ideal. You could then specify a rule such that a name exists in any row which contains a : taking anything left of the : as a name, and then finding the last capital letter and anything from that point is also a name. That would give you the output you desire.
But the rules I outlined are extremely specific to the data that was just fed in. If anything about other samples of data deviates at all from this (e.g. a name with two capitals in it) then it will fail as there will be no way on Earth the program could determine where the "name" starts.
The only way to fix this is to go back to where the data is coming from and format it differently so that there is some sort of punctuation before the names.
Or alternatively you need a full database of all possible names that could appear, then search for them, find any characters up to the next : or ; and append them and print the name. But that seems extremely impractical.

read and get value btw whitespace and another character

how to get the 1st name. here is the sample of data.. first name here is Owen, Florencio. I need to read and get the value frm whitespace to ; ??
Owen;Grzegorek;Howard Miller Co;15410 Minnetonka Industrial Rd;Minnetonka;Hennepin;MN;55345;952-939-2973;952-939-4663;owen#grzegorek.com;http://www.owengrzegorek.com
Florencio;Hollberg;Hellenic Museum & Cultural Ctr;2211 Kenmere Ave;Burbank;Los
Use string::find to find the first instance of semi-colon, and do a string::substr.
string str = "Florencio;Hollberg;Hellenic Museum & Cultural Ctr;2211 Kenmere Ave;Burbank;Los";
std::size_t pos = str.find(";");
str = str.substr(0, pos);
cout << str << endl;
Output:
Florencio
Of course, you have to modify the code to suit your needs.

How to separate a line of input into multiple variables?

I have a file that contains rows and columns of information like:
104857 Big Screen TV 567.95
573823 Blender 45.25
I need to parse this information into three separate items, a string containing the identification number on the left, a string containing the item name, and a double variable containing the price. The information is always found in the same columns, i.e. in the same order.
I am having trouble accomplishing this. Even when not reading from the file and just using a sample string, my attempt just outputs a jumbled mess:
string input = "104857 Big Screen TV 567.95";
string tempone = "";
string temptwo = input.substr(0,1);
tempone += temptwo;
for(int i=1 ; temptwo != " " && i < input.length() ; i++)
{
temptwo = input.substr(j,j);
tempone += temp2;
}
cout << tempone;
I've tried tweaking the above code for quite some time, but no luck, and I can't think of any other way to do it at the moment.
You can find the first space and the last space using std::find_first_of and std::find_last_of . You can use this to better split the string into 3 - first space comes after the first variable and the last space comes before the third variable, everything in between is the second variable.
How about following pseudocode:
string input = "104857 Big Screen TV 567.95";
string[] parsed_output = input.split(" "); // split input string with 'space' as delimiter
// parsed_output[0] = 104857
// parsed_output[1] = Big
// parsed_output[2] = Screen
// parsed_output[3] = TV
// parsed_output[4] = 567.95
int id = stringToInt(parsed_output[0]);
string product = concat(parsed_output[1], parsed_output[2], ... ,parsed_output[length-2]);
double price = stringToDouble(parsed_output[length-1]);
I hope, that's clear.
Well try breaking down the files components:
you know a number always comes first, and we also know a number has no white spaces.
The string following the number CAN have whitespaces, but won't contain any numbers(i would assume)
After this title, you're going to have more numbers(with no whitespaces)
from these components, you can deduce:
grabbing the first number is as simple as reading in using the filestream <<.
getting the string requires you to check until you reach a number, grabbing one character at a time and inserting that into a string. the last number is just like the first, using the filestream <<
This seems like homework so i'll let you put the rest together.
I would try a regular expression, something along these lines:
^([0-9]+)\s+(.+)\s+([0-9]+\.[0-9]+)$
I am not very good at regex syntax, but ([0-9]+) corresponds to a sequence of digits (this is the id), ([0-9]+\.[0-9]+) is the floating point number (price) and (.+) is the string that is separated from the two number by sequences of "space" characters: \s+.
The next step would be to check if you need this to work with prices like ".50" or "10".