Searching for an alternative for strtok() in C++

I am using strtok to divide a string into several parts.
In this example, every section of the string that is bounded by a colon or a semicolon is read:
char string[] = "Alice1:IscoolAlice2; Alert555678;Bob1:knowsBeepBob2;sees";
char delimiter[] = ":;";
char *p;
p = strtok(string, delimiter);
while(p != NULL) {
cout << "Result: " << p << endl;
p = strtok(NULL, delimiter);
}
As a result I get:
Result: Alice1
Result: IscoolAlice2
Result: Alert555678
Result: Bob1
Result: knowsBeepBob2
Result: sees
But I would like to get these results:
Result: Alice1:
Result: Alice2;
Result: Bob1:
Result: Bob2;
The restriction is that strtok only lets me specify individual characters as delimiters.
Does anyone know an alternative to strtok that can also search for strings?
Or does anyone have an idea how to solve my problem?

You cannot do that with strtok, since you need a more complex search.
I am not sure what string you want to use as a delimiter, but the output you want can be produced with:
char string[] = "Alice1:IscoolAlice2; Alert555678;Bob1:knowsBeepBob2;sees";
char delimiter[] = "(?:Alice|Bob)\\d.";
std::regex regex( delimiter );
std::regex_iterator< const char* > first( std::begin( string ), std::end( string ), regex ), last;
while( first != last ){
std::cout << "Result: " << first->str() << '\n';
++first;
}
The output:
Result: Alice1:
Result: Alice2;
Result: Bob1:
Result: Bob2;

It's just a simple bit of scratch logic, along these lines:
char *ptr = string;
while(*ptr)
{
printf("Result:");
while(*ptr)
{
printf("%c", *ptr);
if(ispunct((unsigned char)*ptr))
{
ptr++;
printf("\n");
break;
}
else
{
ptr++;
}
}
}

It's not possible with your stated data set to properly split it the way you want. You can come up with a "just so" rule to split literally just the data you showed, but given the messy nature of the data it's highly likely it'll fail on other examples. Let's start with this token.
IscoolAlice2
How is a computer program supposed to know which part of this is the name and which is not? You want to get "Alice2" out of this. If you decide that a capital letter specifies a name then it will just spit out the "name" IscoolAlice2. The same with:
knowsBeepBob2
If you search for the first capital letter then the program will decide his name is BeepBob2, so in each case it is searching for the last occurrence of a capital letter in the token that finds the name. But what if a name contains two capital letters? The program will cut their name off and you can't do anything about that.
If you're happy to live with these sorts of limitations you can do an initial split via strtok using only the ; character, which gives:
Alice1:IscoolAlice2
Alert555678
Bob1:knowsBeepBob2
sees
Which is less than ideal. You could then specify a rule such that a name exists in any row which contains a : taking anything left of the : as a name, and then finding the last capital letter and anything from that point is also a name. That would give you the output you desire.
But the rules I outlined are extremely specific to the data that was just fed in. If anything about other samples of data deviates at all from this (e.g. a name with two capitals in it) then it will fail as there will be no way on Earth the program could determine where the "name" starts.
The only way to fix this is to go back to where the data is coming from and format it differently so that there is some sort of punctuation before the names.
Or alternatively you need a full database of all possible names that could appear, then search for them, find any characters up to the next : or ; and append them and print the name. But that seems extremely impractical.
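For illustration only, the "just so" rule described above (split on ;, treat everything up to and including the : as a name, then take the text from the last capital letter onward as a second name) could be sketched roughly like this. The function name extract_names is made up for this example, and the sketch assumes every name contains exactly one capital letter and that each row was terminated by a ;, so it inherits all of the limitations mentioned above.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, tailored to the sample data only.
std::vector<std::string> extract_names(const std::string& input) {
    std::vector<std::string> names;
    std::istringstream rows(input);
    std::string row;
    // First pass: split on ';' only.
    while (std::getline(rows, row, ';')) {
        const std::size_t colon = row.find(':');
        if (colon == std::string::npos) continue;  // no ':' means no name in this row
        // Everything up to and including ':' is the first name.
        names.push_back(row.substr(0, colon + 1));
        // Scan backwards for the last capital letter after the ':'.
        std::size_t i = row.size();
        while (i > colon + 1 &&
               !std::isupper(static_cast<unsigned char>(row[i - 1]))) {
            --i;
        }
        if (i > colon + 1) {
            // Re-attach the ';' that getline consumed (assumption: the row ended with one).
            names.push_back(row.substr(i - 1) + ';');
        }
    }
    return names;
}
```

Called on the string from the question, this should yield the four entries asked for; a name such as McDonald1 would be cut off, as explained above.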

Related

Using regex to parse out numbers

My problem is more or less self-explanatory, I want to write a regex to parse out numbers out of a string that user enters via console. I take the user input using:
getline(std::cin,stringName); //1 2 3 4 5
I assume that the user enters N numbers, each followed by whitespace except the last one.
I have solved this problem by analyzing string char by char like this:
std::string helper = "";
std::for_each(stringName.cbegin(), stringName.cend(), [&](char c)
{
if (c == ' ')
{
intVector.push_back(std::stoi(helper.c_str()));
helper = "";
}
else
helper += c;
});
intVector.push_back(std::stoi(helper.c_str()));
I want to achieve the same behavior by using regex. I've written the following code:
std::regex rx1("([0-9]+ )");
std::sregex_iterator begin(stringName.begin(), stringName.end(), rx1);
std::sregex_iterator end;
while (begin != end)
{
std::smatch sm = *begin;
int number = std::stoi(sm.str(1));
std::cout << number << " ";
}
Problem with this regex occurs when it gets to the last number since it doesn't have space behind it, therefore it enters an infinite loop. Can someone give me an idea on how to fix this?
You're going to get an endless loop there because you never increment begin. If you do that, you'll get all the numbers except the last one (which, as you say, is not followed by a space).
But I don't understand why you feel it necessary to include the whitespace in the regular expression. If you just match a string of digits, the regex will automatically select the longest possible match, so the following character (if any) cannot be a digit.
I also see no value in the capture in the regex. If you wanted to restrict the capture to the number itself, you would have used ([0-9]+). (But since stoi only converts until it finds a non-digit, it doesn't matter.)
So you just use this:
std::regex rx1("[0-9]+");
for (auto it = std::sregex_iterator{str.begin(), str.end(), rx1},
end = std::sregex_iterator{};
it != end;
++it) {
std::cout << std::stoi(it->str(0)) << '\n';
}
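To tie this back to the intVector from the question, the same loop could be wrapped in a small function that collects the numbers instead of printing them. This is only a sketch; the name parse_numbers is hypothetical, and it assumes every run of digits fits in an int (std::stoi throws std::out_of_range otherwise).

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every run of digits in `line` as an int.
std::vector<int> parse_numbers(const std::string& line) {
    static const std::regex digits("[0-9]+");
    std::vector<int> result;
    for (auto it = std::sregex_iterator(line.begin(), line.end(), digits),
              end = std::sregex_iterator();
         it != end;
         ++it) {  // the increment that was missing in the question's loop
        result.push_back(std::stoi(it->str()));
    }
    return result;
}
```

Because the pattern no longer requires a trailing space, the last number on the line is picked up like all the others.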

C++ , How can I ignore comma (,) from csv char *?

I have searched a lot about this on SO, but solutions such as putting "" around the part that contains the comma give errors. Moreover, this is C++ :)
const char *msg = "1,2, Hello , how are you ";
char msg2[30];
strcpy_s(msg2, msg);
char * pch;
pch = strtok(msg2, ",");
while (pch != NULL)
{
cout << pch << endl;
pch = strtok(NULL, ",");
}
Output I want:
1
2
Hello , how are you
Output it produces:
1
2
Hello
how are you
I have tried putting "" around Hello , how are you. But it did not help.
CSV files contain comma-separated values. If you want a comma inside a value, you have to surround the value with quotes.
To get the output you want, your example should be written in CSV as:
msg = "1,2, \"Hello , how are you \"";
so the value Hello , how are you is surrounded with quotes.
This is standard CSV. It has nothing to do with the behaviour of the strtok function.
The strtok function just searches for the delimiters you passed to it (in this case ,) without considering anything else, so it treats " as an ordinary character.
To make it work the way you want, you would have to tokenize on both delimiters, , and ", and keep track of the previously found delimiter in order to decide whether a , starts a new value or lies inside quotes.
NOTE also that if you want to conform fully to the CSV specification, you should consider that quotes may themselves be escaped, in order to have a quote character inside a value. See this answer for an example:
Properly escape a double quote in CSV
NOTE 2: Just for completeness, here is the CSV specification (RFC-4180): https://www.rfc-editor.org/rfc/rfc4180
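As a rough sketch of the quote tracking described above, a hand-written scanner can replace strtok entirely. The function name split_csv_line is invented for this example. It handles quoted commas and the "" escape from RFC 4180, but it is deliberately lenient (it accepts a quote anywhere in a field and keeps surrounding spaces) and ignores line breaks inside quoted fields, so it is not a complete CSV parser.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits one CSV line, honouring double quotes.
std::vector<std::string> split_csv_line(const std::string& line) {
    std::vector<std::string> fields;
    std::string field;
    bool in_quotes = false;
    for (std::size_t i = 0; i < line.size(); ++i) {
        const char c = line[i];
        if (c == '"') {
            if (in_quotes && i + 1 < line.size() && line[i + 1] == '"') {
                field += '"';  // "" inside quotes is an escaped quote
                ++i;
            } else {
                in_quotes = !in_quotes;  // opening or closing quote
            }
        } else if (c == ',' && !in_quotes) {
            fields.push_back(field);  // an unquoted comma ends the field
            field.clear();
        } else {
            field += c;
        }
    }
    fields.push_back(field);  // last field has no trailing comma
    return fields;
}
```

With the quoted msg shown above, the third field comes back as one piece, including its comma and the spaces around the text.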

Remove spaces from string before period and comma

I could have a string like:
During this time , Bond meets a stunning IRS agent , whom he seduces .
I need to remove the extra spaces before the comma and before the period in my whole string. I tried throwing this into a char vector and only not push_back if the current char was " " and the following char was a "." or "," but it did not work. I know there is a simple way to do it maybe using trim(), find(), or erase() or some kind of regex but I am not the most familiar with regex.
A solution could be (using regex library):
std::string fix_string(const std::string& str) {
static const std::regex rgx_pattern("\\s+(?=[\\.,])");
std::string rtn;
rtn.reserve(str.size());
std::regex_replace(std::back_insert_iterator<std::string>(rtn),
str.cbegin(),
str.cend(),
rgx_pattern,
"");
return rtn;
}
This function takes a string as input and removes the whitespace that precedes each period or comma.
In a loop, search for the string " ," and, whenever you find one, replace it with ",":
std::string str = "...";
while( true ) {
auto pos = str.find( " ," );
if( pos == std::string::npos )
break;
str.replace( pos, 2, "," );
}
Do the same for " .". If you need to handle other whitespace characters such as tabs, use a regex with a suitable character class.
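Putting the two passes together might look something like the following sketch. The function name remove_space_before_punct is made up here, and only the plain space character is handled, as in the loop above.

```cpp
#include <string>

// Hypothetical helper: drops plain spaces that directly precede ',' or '.'.
std::string remove_space_before_punct(std::string str) {
    const char punctuation[] = {',', '.'};
    for (char p : punctuation) {
        const std::string pattern = std::string(" ") + p;
        // Search from the start each time, so a run of several spaces
        // before the punctuation shrinks by one space per iteration.
        for (auto pos = str.find(pattern); pos != std::string::npos;
             pos = str.find(pattern)) {
            str.replace(pos, 2, 1, p);  // replace " ," (or " .") with the punctuation alone
        }
    }
    return str;
}
```

Restarting the search from the beginning keeps the code short at the cost of some repeated scanning, which should not matter for sentence-sized input.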
I don't know how to use regex for C++, also not sure if C++ supports PCRE regex, anyway I post this answer for the regex (I could delete it if it doesn't work for C++).
You can use this regex:
\s+(?=[,.])
First, there is no need to use a vector of char: you could very well do the same by using an std::string.
Then, your approach can't work because your copy is independent of the position of the space. Unfortunately you have to remove only spaces around the punctuation, and not those between words.
Modifying your code slightly, you could delay copying spaces until you see the first non-space character: if it is not punctuation, copy a space before the character; otherwise copy just the non-space character (thus getting rid of the spaces).
Similarly, once you've copied a punctuation just loop and ignore the following spaces until the first non-space char.
I could have written code. It would have been shorter. But I prefer letting you finish your homework with full understanding of the approach.

Splitting string into a list of substrings

I have a string id <- "Hello these are words N12345678 hooray how fun".
I would like to extract just N12345678 from this string.
So far I have used strsplit(id, " "). Now I have
> id
[[1]]
[1] "Hello" "these" "are" "words" "N12345678" "hooray" "how"
[8] "fun"
Which is of type list and of length 1 (despite apparently having 8 elements?)
If I then use id <- id[grep("^[N][0-9]",id)],
id is an empty list.
I think what I need to do is split the string into a list of length 8 with each element as a substring and then grep should be able to pick out the pattern, but I'm not sure how to go about that.
Use regmatches
> regmatches(id, regexpr("N[0-9]+", id))
[1] "N12345678"
If you insist on using strsplit, I think this solves the problem:
id <- "Hello these are words N12345678 hooray how fun"
id = strsplit(id, " ")
id[[1]][grep("^N[0-9]", id[[1]])]
Notice that I have kept your regex essentially unchanged. A more precise expression such as ^N\\d+$ could be used instead.
Do you know about strtok? It will parse your input line on certain characters. For the purpose of my example, I am breaking off a piece of my string every time I hit a space.
tempVar = strtok(string, " ");
// tempVar now holds everything up to the first space
while (tempVar != NULL)
{
tempVar = strtok(NULL, " ");
//now tempVar picked up the next word, and will loop picking up the next word until the end of string
}
Using this, your "Hello these are words N12345678 hooray how fun" would be processed like this:
tempVar would be "Hello", then "these", and so on.
Each time through the loop tempVar gets a new value, so I would suggest evaluating tempVar inside the loop (before grabbing the next token) so that you can stop when you have N12345678.
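A minimal C++ sketch of that idea, checking each token before fetching the next one, could look like this. The function name find_id and the rule "an N followed only by digits" are assumptions made for the example, not something stated in the question.

```cpp
#include <cctype>
#include <cstring>
#include <string>

// Hypothetical helper: returns the first token of the form N<digits>, or "" if none.
std::string find_id(std::string line) {
    // strtok modifies its input, so it works on the by-value copy of the line.
    for (char* tok = std::strtok(&line[0], " "); tok != nullptr;
         tok = std::strtok(nullptr, " ")) {
        if (tok[0] == 'N' && tok[1] != '\0') {
            const char* p = tok + 1;
            while (std::isdigit(static_cast<unsigned char>(*p))) {
                ++p;  // skip over the digits
            }
            if (*p == '\0') {
                return tok;  // nothing but digits followed the N: stop here
            }
        }
    }
    return {};
}
```

Returning as soon as the token matches means the rest of the line is never tokenized.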
Try:
gsub('\\b[a-zA-Z]+\\b','',id)

Find Group of Characters From String

I wrote a program to remove a group of characters from a string. The code is given below.
void removeCharFromString(string &str,const string &rStr)
{
std::size_t found = str.find_first_of(rStr);
while (found!=std::string::npos)
{
str[found]=' ';
found=str.find_first_of(rStr,found+1);
}
str=trim(str);
}
std::string str ("scott<=tiger");
removeCharFromString(str,"<=");
As far as my program goes, I get the correct output for this input. But if I set str to "scott=tiger", the group "<=" does not occur in str, yet my program still removes the '=' character. I don't want the characters removed individually; I want them removed only when the whole group "<=" is found. How can I do this?
The method find_first_of looks for any one of the characters in its argument, in your case either '<' or '='. What you want here is find.
std::size_t found = str.find(rStr);
This answer assumes that you only want to match the characters in the exact sequence given, e.g. you want to remove <= but not =<:
find_first_of locates any one of the characters in the given string, whereas you want to find the whole string.
You need something to the effect of:
std::size_t found = str.find(rStr);
while (found!=std::string::npos)
{
str.replace(found, rStr.length(), " ");
found=str.find(rStr,found+1);
}
The problem with str[found]=' '; is that it'll simply replace the first character of the string you are searching for, so if you used that, your result would be
scott =tiger
whereas with the changes I've given you, you'll get
scott tiger
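As a self-contained sketch, the loop above could be packaged as a function like this. The name replace_group is hypothetical, the guard against an empty search string is an addition, and the trim call from the question is left out because its definition was not shown.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: replaces every exact occurrence of `group` with one space.
void replace_group(std::string& str, const std::string& group) {
    if (group.empty()) {
        return;  // find("") matches everywhere, so bail out early
    }
    std::size_t found = str.find(group);
    while (found != std::string::npos) {
        str.replace(found, group.length(), " ");
        found = str.find(group, found + 1);  // continue after the inserted space
    }
}
```

With this version, "scott<=tiger" becomes "scott tiger", while "scott=tiger" and "scott=<tiger" are left untouched because the exact sequence <= never occurs in them.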