C++ , How can I ignore comma (,) from csv char *? - c++

I have searched a lot about this on SO, and the suggested solutions, such as putting "" around the part that contains the comma, are giving errors. Moreover, I am using C++ :)
// Point at the literal directly; allocating with new and then reassigning leaks the buffer.
const char *msg = "1,2, Hello , how are you ";
char msg2[30];
strcpy_s(msg2, msg);
char * pch;
pch = strtok(msg2, ",");
while (pch != NULL)
{
cout << pch << endl;
pch = strtok(NULL, ",");
}
Output I want :
1
2
Hello , how are you
Output it is producing:
1
2
Hello
how are you
I have tried putting "" around Hello , how are you. But it did not help.

The CSV files are comma separated values. If you want a comma inside the value, you have to surround it with quotes.
Your example in CSV, as you need your output, should be:
msg = "1,2, \"Hello , how are you \"";
so the value Hello , how are you is surrounded with quotes.
This is the standard CSV. This has nothing to do with the behaviour of the strtok function.
The strtok function simply searches for the delimiter characters you pass to it, in this case the comma, without considering anything else, so it treats the " as an ordinary character.
To make it work as you want, you would have to tokenize on both delimiters, the , and the ", and keep track of the previously found delimiter in order to decide whether a , starts a new value or lies inside quotes.
NOTE also that if you want to be completely conforming with the CSV specification, you should consider that the quotes may also be escaped, in order to have a quote character inside the value term. See this answer for an example:
Properly escape a double quote in CSV
NOTE 2: Just for completeness, here is the CSV specification (RFC-4180): https://www.rfc-editor.org/rfc/rfc4180

Related

Using one cout command to print multiple strings with each string placed on a different (text editor) line

Take a look at the following example:
cout << "option 1:
\n option 2:
\n option 3";
I know it's not the best way to output a string, but the question is: why does this cause an error saying that a " character is missing? There is a single string that must go to stdout; it just contains a lot of whitespace characters.
What about this:
string x="
string_test";
One may interpret that string as: "\nxxxxxxxxxxxxstring_test" where x is a whitespace character.
Is it a convention?
That's called multiline string literal.
You need to escape the embedded newline. Otherwise, it will not compile:
std::cout << "Hello world \
and stackoverflow";
Note: Backslashes must be immediately before the line ends as they need to escape the newline in the source.
You can also take advantage of the fact that adjacent string literals are concatenated by the compiler:
std::cout << "Hello World"
"Stack overflow";
Since C++11 there are also raw string literals. They are somewhat like here-documents.
Syntax:
prefix(optional) R"delimiter( raw_characters )delimiter"
It allows any character sequence, except that it must not contain the
closing sequence )delimiter". It is used to avoid escaping of any
character. Anything between the delimiters becomes part of the string.
const char* s1 = R"foo(
Hello
World
)foo";
Example taken from cppreference.

How to delimit this text file? strtok

so there's a text file where I have 1. languages, a 2. text of a number written in the said language, 3. the base of the number and 4. the number written in digits. Here's a sample:
francais deux mille quatre cents 10 2400
How I went about it:
struct Nomen{
char langue[21], nomNombre [31], baseC[3], nombreC[21];
int base, nombre;
};
and in the main:
if (myfile.is_open()) {
    while (getline(myfile, line)) {
        strcpy(Linguo[i].langue, strtok((char *)line.c_str(), " "));
        strcpy(Linguo[i].nomNombre, strtok(NULL, " "));
        strcpy(Linguo[i].baseC, strtok(NULL, " "));
        strcpy(Linguo[i].nombreC, strtok(NULL, "\n"));
        i++;
    }
}
Difficulty: I'm trying to put two whitespaces as a delimiter, but it seems that strtok() counts it as if there were only one whitespace. The fact there are spaces in the text number, etc. is messing up the tokenization. How should I go about it?
strtok treats any single character in the provided string as a delimiter. It does not treat the string itself as a single delimiter. So "  " (two spaces) is the same as " " (one space).
strtok will also treat multiple adjacent delimiters as a single delimiter. So the input "t1   t2" will be tokenized as two tokens, "t1" and "t2".
As mentioned in the comments, strtok also writes NUL characters into the input to create the token strings. So it is an error to pass the result of string::c_str() as input to the function. The fact that you need to cast away the const should have been enough to dissuade you from this approach.
If you want to treat a double space as a delimiter, you will have to scan the string and search for them yourself. Given you are using C APIs, you can consider strstr. However, in C++, you can use string::find.
Here's an algorithm to parse your string manually:
Given an input string input:
language is the substring from the start of input to the first SPC character.
From where language ends, skip over all whitespace, changing input to begin at the first non-whitespace character.
text is the substring from the start of input to the first double SPC sequence.
From where text ends, skip over all whitespace, changing input to begin at the first non-whitespace character.
Parse base, and parse number.

Searching for an alternative for strtok() in C++

I am using strtok to divide a string in several parts.
In this example, all sections will be read from the string, which are bounded by a colon or a semicolon
char string[] = "Alice1:IscoolAlice2; Alert555678;Bob1:knowsBeepBob2;sees";
char delimiter[] = ":;";
char *p;
p = strtok(string, delimiter);
while(p != NULL) {
cout << "Result: " << p << endl;
p = strtok(NULL, delimiter);
}
As results I get:
Result: Alice1
Result: IscoolAlice2
Result: Alert555678
Result: Bob1
Result: knowsBeepBob2
Result: sees
But I would like to get these results:
Result: Alice1:
Result: Alice2;
Result: Bob1:
Result: Bob2;
The restriction is that I can only choose individual characters when I use strtok.
Does anyone know an alternative to strtok that can also search for strings?
Or has anyone an idea to solve my problem?
You cannot do this with strtok, since you need a more complex search.
I am not sure which string you want to use as the delimiter, but the output you want can be produced with a regular expression:
char string[] = "Alice1:IscoolAlice2; Alert555678;Bob1:knowsBeepBob2;sees";
char delimiter[] = "(?:Alice|Bob)\\d.";
std::regex regex( delimiter );
std::regex_iterator< const char* > first( std::begin( string ), std::end( string ), regex ), last;
while( first != last ){
std::cout << "Result: " << first->str() << '\n';
++first;
}
the output:
Result: Alice1:
Result: Alice2;
Result: Bob1:
Result: Bob2;
It's just a simple bit of scratch logic, along these lines:
char *ptr = string;
while(*ptr)
{
printf("Result:");
while(*ptr)
{
printf("%c", *ptr);
if(ispunct((unsigned char)*ptr))
{
ptr++;
printf("\n");
break;
}
else
{
ptr++;
}
}
}
It's not possible with your stated data set to properly split it the way you want. You can come up with a "just so" rule to split literally just the data you showed, but given the messy nature of the data it's highly likely it'll fail on other examples. Let's start with this token.
IscoolAlice2
How is a computer program supposed to know which part of this is the name and which is not? You want to get "Alice2" out of this. If you decide that a capital letter specifies a name then it will just spit out the "name" IscoolAlice2. The same with:
knowsBeepBob2
If you search for the first capital letter then the program will decide his name is BeepBob2, so in each case it is searching for the last occurrence of a capital letter in the token that finds the name. But what if a name contains two capital letters? The program will cut the name off, and you can't do anything about that.
If you're happy to live with these sorts of limitations you can do an initial split via strtok using only the ; character, which gives:
Alice1:IscoolAlice2
Alert555678
Bob1:knowsBeepBob2
sees
Which is less than ideal. You could then specify a rule such that a name exists in any row which contains a : taking anything left of the : as a name, and then finding the last capital letter and anything from that point is also a name. That would give you the output you desire.
But the rules I outlined are extremely specific to the data that was just fed in. If anything about other samples of data deviates at all from this (e.g. a name with two capitals in it) then it will fail as there will be no way on Earth the program could determine where the "name" starts.
The only way to fix this is to go back to where the data is coming from and format it differently so that there is some sort of punctuation before the names.
Or alternatively you need a full database of all possible names that could appear, then search for them, find any characters up to the next : or ; and append them and print the name. But that seems extremely impractical.

Remove spaces from string before period and comma

I could have a string like:
During this time , Bond meets a stunning IRS agent , whom he seduces .
I need to remove the extra spaces before the comma and before the period in my whole string. I tried throwing this into a char vector and only not push_back if the current char was " " and the following char was a "." or "," but it did not work. I know there is a simple way to do it maybe using trim(), find(), or erase() or some kind of regex but I am not the most familiar with regex.
A solution could be (using regex library):
std::string fix_string(const std::string& str) {
static const std::regex rgx_pattern("\\s+(?=[\\.,])");
std::string rtn;
rtn.reserve(str.size());
std::regex_replace(std::back_insert_iterator<std::string>(rtn),
str.cbegin(),
str.cend(),
rgx_pattern,
"");
return rtn;
}
This function takes a string as input and returns a copy with the whitespace before each period or comma removed.
In a loop, search for the string " ," and, whenever you find one, replace it with ",":
std::string str = "...";
while( true ) {
auto pos = str.find( " ," );
if( pos == std::string::npos )
break;
str.replace( pos, 2, "," );
}
Do the same for " .". If you need to process different space symbols like tab use regex and proper group.
I don't know how to use regex in C++, and I'm not sure whether C++ supports PCRE-style regex, but I'm posting the pattern anyway (I can delete this answer if it doesn't work in C++).
You can use this regex:
\s+(?=[,.])
First, there is no need to use a vector of char: you could very well do the same by using an std::string.
Then, your approach can't work because your copy is independent of the position of the space. Unfortunately you have to remove only spaces around the punctuation, and not those between words.
Modifying your code slightly, you could delay copying the spaces until you see the first non-space character: if it is not a punctuation mark, you copy a space before the character; otherwise you copy just the non-space character (thus getting rid of the spaces).
Similarly, once you've copied a punctuation just loop and ignore the following spaces until the first non-space char.
I could have written code. It would have been shorter. But I prefer letting you finish your homework with a full understanding of the approach.

Tokenize a string based on quotes

I am trying to read data from a text file and split the read line based on quotes. For example
"Hi how" "are you" "thanks"
Expected output
Hi how
are you
thanks
My code:
getline(infile, line);
ch = strdup(line.c_str());
ch1 = strtok(ch, " ");
while (ch1 != NULL)
{
a3[i] = ch1;
ch1 = strtok(NULL, " ");
i++;
}
I don't know what to specify as the delimiter string. I am using strtok() to split, but it failed. Can anyone help me?
You should provide "\"" as the delimiter string to strtok.
For example,
ch1 = strtok (ch,"\"");
Your problem is probably related to how escape sequences are written: inside a string literal, a double quote has to be escaped as \".
Given your input: "Hi how" "are you" "thanks", if you use strtok with "\"" as the delimiter, it'll treat the spaces between the quoted strings as if they were also strings, so if (for example) you printed out the result strings, one per line, surrounded by square brackets, you'd get:
[Hi how]
[ ]
[are you]
[ ]
[thanks]
I.e., the blank character between each quoted string is, itself, being treated as a string. If the delimiter you supplied to strtok was " \"" (i.e., included both a quote and a space) that wouldn't happen, but then it would also break on the spaces inside the quoted strings.
Assuming you can depend on every item you care about being quoted, you want to skip anything until you get to a quote, ignore the quote, then read data into your input string until you get to another quote, then repeat the whole process.