Read and get the value between whitespace and another character - C++

How do I get the first name from each record? Here is a sample of the data; the first names here are Owen and Florencio. I need to read and get the value from the whitespace up to the first ;.
Owen;Grzegorek;Howard Miller Co;15410 Minnetonka Industrial Rd;Minnetonka;Hennepin;MN;55345;952-939-2973;952-939-4663;owen#grzegorek.com;http://www.owengrzegorek.com
Florencio;Hollberg;Hellenic Museum & Cultural Ctr;2211 Kenmere Ave;Burbank;Los

Use string::find to locate the first semicolon, then call string::substr to take everything before it.
string str = "Florencio;Hollberg;Hellenic Museum & Cultural Ctr;2211 Kenmere Ave;Burbank;Los";
std::size_t pos = str.find(";");
str = str.substr(0, pos);
cout << str << endl;
Output:
Florencio
Of course, you have to modify the code to suit your needs.

Related

How to name regex group matches in C++ the way python does (?P<name_of_regex>(.*))

I have a string in my program that contains certain values for parameters. I need to extract the values from the parameters using regex.
The regex looks like this:
std::smatch param;
std::string str = "--name=AName --age=AnAge --gender=AGender";
if (std::regex_match(str, param, std::regex(".*--name=(\\w+) .*--age=(\\d+) .*--gender=(\\w+) .*")))
{
// If the parameters appear in this order, execution reaches here and the values are stored in param[1] to param[3].
}
The problem is the order of the params can come in different orders, for example:
std::string str = "--gender=AGender --name=AName --age=AnAge";
std::string str = "--age=AnAge --gender=AGender --name=AName";
std::string str = "--name=AName --gender=AGender --age=AnAge ";
Is there a way to write a single regex that captures the values regardless of their order, instead of using one regex per parameter I want to find? If so, how can I access each value? In Python it is possible to add an <id> before the desired group and later access the group by that identifier. In my example code I do that with the smatch variable, but access to it depends on the order of the parameters in the string, and I cannot rely on that.
Use this regex:
"^(?=.*--name=(\\w+))(?=.*--age=(\\d+))(?=.*--gender=(\\w+)).+"
The one problem you'll run into is the fact that params won't be able to determine which item belongs to which parameter.
The way I would solve this problem would be to use std::string::find.
For example:
std::string str = "--name=AName --age=AnAge --gender=AGender";
size_t namePos = str.find("--name=");
size_t agePos = str.find("--age=");
size_t genderPos = str.find("--gender=");
std::string name = "";
std::string gender = "";
std::string age = "";
if(namePos != std::string::npos)
{
// Add 7 to namePos since the size of "--name=" is 7.
// Assuming that the delimiter of the name is whitespace so find the first
// whitespace after --name=
name = str.substr(namePos + 7, str.find_first_of(" \n\r", namePos + 7) - (namePos + 7));
}
if(agePos != std::string::npos)
{
// Add 6 to agePos since the size of "--age=" is 6.
// Assuming that the delimiter of the age is whitespace so find the first
// whitespace after --age=
age = str.substr(agePos + 6, str.find_first_of(" \n\r", agePos + 6) - (agePos + 6));
}
if(genderPos != std::string::npos)
{
// Add 9 to genderPos since the size of "--gender=" is 9.
// Assuming that the delimiter of the gender is whitespace so find the first
// whitespace after --gender=
gender = str.substr(genderPos + 9, str.find_first_of(" \n\r", genderPos + 9) - (genderPos + 9));
}
// Print after all three lookups so the output does not depend on --gender= being present.
std::cout << name << " " << gender << " " << age << std::endl;
Output:
AName AGender AnAge
There are better tools for parsing command lines, but if you really want to use regex, you will find that Boost.Regex makes this much easier than std::regex.
In particular, it supports named groups (see e.g. Boost Regular Expression: Getting the Named Group) which is the feature you request in your question title.
You can combine that with BOOST_REGEX_MATCH_EXTRA to keep all matches for all named groups (by default, only the last match for each capture group is accessible after the search.)
Then you can just make a big disjunction ((?<group1>...)|(?<group2>...)|...) in your regex for all the groups you may encounter, and you will be able to get all values out regardless of their order.

Replace 8 characters after finding 3 characters from a string

Is it possible in C++ to replace 8 characters after finding 3 characters in a string?
Input:
txtvar = "This is for Testing Purpose line";
Expected output:
txtvar = "This is for Testing XXXXXX line";
I tried the following:
std::string::size_type pos;
while ((pos = txtvar.find("Testing")) != std::string::npos) {
txtvar.replace(pos, 9, "XXXX");
}
After finding the keyword Testing, the next 9 characters should be replaced with "XXXXXXX".
Please help me with this.
Yes, you can. The string class has methods for that; just look them up in the documentation:
//https://en.cppreference.com/w/cpp/string/basic_string
std::string txtvar {"This is for Testing Purpose line"};
//https://en.cppreference.com/w/cpp/string/basic_string/find
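auto index {txtvar.find("Purpose")};
std::string t{"XXXXXXX"};
// find returns std::string::npos when the word is absent, so replace only on a match.
if (index != std::string::npos)
txtvar.replace(index, 7, t);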
std::cout << txtvar << std::endl;

Searching for an alternative for strtok() in C++

I am using strtok to divide a string in several parts.
In this example, every section of the string that is bounded by a colon or a semicolon is read:
char string[] = "Alice1:IscoolAlice2; Alert555678;Bob1:knowsBeepBob2;sees";
char delimiter[] = ":;";
char *p;
p = strtok(string, delimiter);
while(p != NULL) {
cout << "Result: " << p << endl;
p = strtok(NULL, delimiter);
}
As results I get:
Result: Alice1
Result: IscoolAlice2
Result: Alert555678
Result: Bob1
Result: knowsBeepBob2
Result: sees
But I would like to get these results:
Result: Alice1:
Result: Alice2;
Result: Bob1:
Result: Bob2;
The restriction is that strtok only lets me choose individual characters as delimiters.
Does anyone know an alternative to strtok that can also search for strings?
Or does anyone have an idea for solving my problem?
You cannot do that with strtok, since you need a more complex search.
I am not sure which string you want to use as the delimiter, but the output you want can be produced with:
char string[] = "Alice1:IscoolAlice2; Alert555678;Bob1:knowsBeepBob2;sees";
char delimiter[] = "(?:Alice|Bob)\\d.";
std::regex regex( delimiter );
std::regex_iterator< const char* > first( std::begin( string ), std::end( string ), regex ), last;
while( first != last ){
std::cout << "Result: " << first->str() << '\n';
++first;
}
The output:
Result: Alice1:
Result: Alice2;
Result: Bob1:
Result: Bob2;
It's just a simple bit of scratch logic, along these lines:
char *ptr = string;
while(*ptr)
{
printf("Result:");
while(*ptr)
{
printf("%c", *ptr);
if(ispunct((unsigned char)*ptr))
{
ptr++;
printf("\n");
break;
}
else
{
ptr++;
}
}
}
It's not possible with your stated data set to properly split it the way you want. You can come up with a "just so" rule to split literally just the data you showed, but given the messy nature of the data it's highly likely it'll fail on other examples. Let's start with this token.
IscoolAlice2
How is a computer program supposed to know which part of this is the name and which is not? You want to get "Alice2" out of this. If you decide that a capital letter specifies a name then it will just spit out the "name" IscoolAlice2. The same with:
knowsBeepBob2
If you search for the first capital letter then the program will decide his name is BeepBob2. So in each case, searching for the last occurrence of a capital letter in the token finds the name. But what if a name contains two capital letters? The program will cut their name off and you can't do anything about that.
If you're happy to live with these sorts of limitations you can do an initial split via strtok using only the ; character, which gives:
Alice1:IscoolAlice2
Alert555678
Bob1:knowsBeepBob2
sees
Which is less than ideal. You could then specify a rule such that a name exists in any row which contains a : taking anything left of the : as a name, and then finding the last capital letter and anything from that point is also a name. That would give you the output you desire.
But the rules I outlined are extremely specific to the data that was just fed in. If anything about other samples of data deviates at all from this (e.g. a name with two capitals in it) then it will fail as there will be no way on Earth the program could determine where the "name" starts.
The only way to fix this is to go back to where the data is coming from and format it differently so that there is some sort of punctuation before the names.
Or alternatively you need a full database of all possible names that could appear, then search for them, find any characters up to the next : or ; and append them and print the name. But that seems extremely impractical.

Splitting string into a list of substrings

I have a string id <- "Hello these are words N12345678 hooray how fun".
I would like to extract just N12345678 from this string.
So far I have used strsplit(id, " "). Now I have
>id
>[[1]]
>[1] "Hello" "these" "are" "words" "N12345678" "hooray" "how"
>[8] "fun"
Which is of type list and of length 1 (despite apparently having 8 elements?)
If I then use id <- id[grep("^[N][0-9]",id)],
id is an empty list.
I think what I need to do is split the string into a list of length 8 with each element as a substring and then grep should be able to pick out the pattern, but I'm not sure how to go about that.
Use regmatches
> regmatches(id, regexpr("N[0-9]+", id))
[1] "N12345678"
If you insist on using strsplit, I think this can solve the problem:
id <- "Hello these are words N12345678 hooray how fun"
id = strsplit(id, " ")
id[[1]][grep("^[N][0-9]", id[[1]])]
Notice that I haven't changed your regex. A more precise expression would be ^N\\d+$.
Do you know about strtok? It will parse your input line on certain characters. For the purpose of my example, I am breaking off a piece of my string every time I hit a space.
tempVar = strtok(string, " ");
// tempVar has "id" or everything up to the first space
while (tempVar != NULL)
{
tempVar = strtok(NULL, " ");
//now tempVar picked up the next word, and will loop picking up the next word until the end of string
}
Using this, your "Hello these are words N12345678 hooray" would do this:
tempVar would be "Hello", then "these", and so on.
Each time through the loop tempVar gets a new value, so I would suggest evaluating tempVar inside the loop (before grabbing the next one) so that you can stop when you have N12345678.
Try:
gsub('\\b[a-zA-Z]+\\b','',id)

Driver's license magnetic strip data format

From this Wikipedia article (http://en.wikipedia.org/wiki/Magnetic_stripe_card#cite_note-14), I understand the basic data format for a driver's license. It starts with the location data, which looks like this: %CODENVER^
What if the city name consists of two or more words, like New York City?
What does the data output look like? Is it a whitespace character that separates the words, or is it something else?
How do I write a C++ statement that returns each word of the city name as a separate string?
It will depend on the delimiter. States use different formats for their data. Mag stripes will have one delimiter to split the data into different sections, then another delimiter to split the sections into individual parts.
For an example, let's say that the data you want to parse is:
New^York^City
Use something like this to split it out:
#include <iostream>
#include <string>
int main()
{
std::string s = "New^York^City";
std::string delim = "^";
std::size_t start = 0;
auto end = s.find(delim);
while (end != std::string::npos)
{
std::cout << s.substr(start, end - start) << std::endl;
start = end + delim.length();
end = s.find(delim, start);
}
std::cout << s.substr(start, end);
}
Then your output should be:
New
York
City
Search more for C++ string parsing. I used the split function from here:
Parse (split) a string in C++ using string delimiter (standard C++)