How to delimit this text file? strtok - c++

I have a text file where each line holds 1. a language, 2. a number written out in words in that language, 3. the base of the number, and 4. the number written in digits. Here's a sample:
francais deux mille quatre cents 10 2400
How I went about it:
struct Nomen{
char langue[21], nomNombre [31], baseC[3], nombreC[21];
int base, nombre;
};
and in the main:
if(myfile.is_open()){
{
while(getline(myfile, line))
{
strcpy(Linguo[i].langue, strtok((char *)line.c_str(), " "));
strcpy(Linguo[i].nomNombre, strtok(NULL, " "));
strcpy(Linguo[i].baseC, strtok(NULL, " "));
strcpy(Linguo[i].nombreC, strtok(NULL, "\n"));
i++;
}
Difficulty: I'm trying to use two consecutive spaces as the delimiter, but strtok() seems to treat them as if there were only one space. Because the written-out number itself contains spaces, the tokenization gets messed up. How should I go about it?

strtok treats every single character in the provided string as a delimiter. It does not treat the string as a whole as one delimiter. So "  " (two spaces) is the same as " " (one space).
strtok will also treat multiple adjacent delimiters as a single delimiter. So the input "t1  t2" will be tokenized as two tokens, "t1" and "t2".
As mentioned in the comments, strtok also writes the NUL character into the input to create the token strings. So it is an error to pass the result of string::c_str() as input to the function. The fact that you needed to cast away the const should have been enough to dissuade you from this approach.
If you want to treat a double space as a delimiter, you will have to scan the string and search for them yourself. Given you are using C APIs, you can consider strstr. However, in C++, you can use string::find.
Here's an algorithm to parse your string manually:
Given an input string input:
language is the substring from the start of input to the first SPC character.
From where language ends, skip over all whitespace, changing input to begin at the first non-whitespace character.
text is the substring from the start of input to the first double SPC sequence.
From where text ends, skip over all whitespace, changing input to begin at the first non-whitespace character.
Parse base, and parse number.

Related

Using one cout command to print multiple strings with each string placed on a different (text editor) line

Take a look at the following example:
cout << "option 1:
\n option 2:
\n option 3";
I know it's not the best way to output a string, but why does this cause an error saying that a " character is missing? There is a single string that must go to stdout; it just contains a lot of whitespace characters.
What about this:
string x="
string_test";
One may interpret that string as: "\nxxxxxxxxxxxxstring_test" where x is a whitespace character.
Is it a convention?
What you are trying to write is a multi-line string literal. An ordinary string literal cannot span source lines, so you need to escape the embedded newline with a backslash. Otherwise, it will not compile:
std::cout << "Hello world \
and stackoverflow";
Note: Backslashes must be immediately before the line ends as they need to escape the newline in the source.
You can also take advantage of the fact that adjacent string literals are concatenated by the compiler:
std::cout << "Hello World"
"Stack overflow";
C++11 also adds raw string literals. They are somewhat like here-documents.
Syntax:
prefix(optional) R"delimiter( raw_characters )delimiter"
It allows any character sequence, except that it must not contain the
closing sequence )delimiter". It is used to avoid escaping of any
character. Anything between the delimiters becomes part of the string.
const char* s1 = R"foo(
Hello
World
)foo";
Example taken from cppreference.

C++ Find Word in String without Regex

I'm trying to find a certain word in a string, but find that word alone. For example, if I had a word bank:
789540132143
93
3
5434
I only want a match to be found for the value 3, as the other values do not match exactly. I used the normal string::find function, but that found matches for all four values in the word bank because they all contain 3.
There is no whitespace surrounding the values, and I am not allowed to use Regex. I'm looking for the fastest implementation of completing this task.
If you want to count the words, you should use a map from string to int. Read each word from your file into a string using >>, then increment its counter in the map:
string word;
map<string,int> count;
ifstream input("file.txt");
while (input >> word) { // loop ends when a read fails, so the last word is not counted twice
count[word]++;
}
Using >> has the benefit that you don't have to worry about whitespace.
It all depends on the definition of a word: is it a string separated from the others by whitespace? Or are other word separators (e.g. comma, dot, semicolon, colon, parentheses...) relevant as well?
How to parse for words without regex:
Here is an acceptable approach using find() and its variant find_first_of():
string myline; // line to be parsed
string what="3"; // string to be found
string separator=" \t\n,;.:()[]"; // string separators
while (getline(cin, myline)) {
size_t nxt=0;
while ( (nxt=myline.find(what, nxt)) != string::npos) { // search occurrences of what
if (nxt==0||separator.find(myline[nxt-1])!=string::npos) { // if at the beginning of a word
size_t nsep=myline.find_first_of(separator,nxt+1); // check if it extends to the end of the word
if ((nsep==string::npos && myline.length()-nxt==what.length()) || nsep-nxt==what.length()) {
cout << "Line: "<<myline<<endl; // bingo !!
cout << "from pos "<<nxt<<" to " << nsep << endl;
}
}
nxt++; // ready for next occurrence
}
}
The principle is to check whether each occurrence found corresponds to a whole word, i.e. it starts at the beginning of the line or of a word (the previous character is a separator) and it extends exactly to the next separator (or to the end of the line).
How to solve your real problem:
Even with the fastest possible word search function, if you use it to solve your problem of counting words, as you've explained in your comment, you'll waste a lot of effort!
The best way to achieve this would certainly be to use a map<string, int> to store and update a counter for each string encountered in the file.
You then just have to parse each line into words (you could use find_first_of() as suggested above) and use the map:
mymap[word]++;

How to save " in a string in C++?

I have the following code, which doesn't work, and I couldn't figure out how to fix it.
std::string str("Q850?51'18.23"");
First problem I face is " (quotation mark). I cannot save it as a string because at the end of the string I have two " characters and C++ doesn't let me save the whole string.
Second I want to split the string and save it in different variables.
E.g.;
double i = 850;
double j = 51;
double k = 18.23;
You will need to escape the quotation mark that you want inside the string:
std::string str("Q850?51'18.23\"");
// ^ escape the quote here
The cppreference site has a list of these escape sequences.
Alternatively, you can use a raw string literal:
std::string str = R"(Q850?51'18.23")";
The second part of the problem depends on the format and predictability of the data:
If it is fixed width, a simple index can be used to extract the numbers and convert them to the doubles you require.
If it is delimited with the characters above, you can consume the string to each of the delimiters extracting the numbers in-between them (you should be able to find suitable libraries to assist with this).
If it is some further unknown composition, you may be limited to consuming the string one character at a time and extracting the numerical values between the non-numerical values.
You need to escape your quote mark:
std::string str("Q850?51'18.23\"");
// ^
You need to escape your quote mark
Add a backslash before "
std::string str("Q850?51'18.23\"");

How to remove white space in the beginning of a sentence.

I made a vector that stores each sentence from a file. However, I noticed that the elements are not stored consistently. For example, if the file was "hello bob. how are you. hey there."
I used
while(getline(mFile, str, '.'))
to get each sentence and
vecString.push_back(str + '.');
to store each sentence in the vector. So vector[0] would hold "hello bob.", vector[1] would hold " how are you.", and vector[2] would hold " hey there.". How do I get rid of the space at the start of the sentences in vector[1] and vector[2]?
The Boost String Algorithms Library has trimming functions.
There are many examples of this on stackoverflow. Have a look at these.
Removing leading and trailing spaces from a string
What's the best way to trim std::string?
Strip leading (i.e. left) whitespace using:
std::string s(" String with leading whitespace.");
s.erase(0, s.find_first_not_of(" \t"));
In addition to ' ' and '\t' consider also '\r', '\n', '\v', and '\f'.

Tokenize a string based on quotes

I am trying to read data from a text file and split the read line based on quotes. For example
"Hi how" "are you" "thanks"
Expected output
Hi how
are you
thanks
My code:
getline(infile, line);
ch = strdup(line.c_str());
ch1 = strtok(ch, " ");
while (ch1 != NULL)
{
a3[i] = ch1;
ch1 = strtok(NULL, " ");
i++;
}
I don't know what to specify as delimiter string. I am using strtok() to split, but it failed. Can any one help me?
You should provide "\"" as the delimiter string to strtok.
For example,
ch1 = strtok (ch,"\"");
Your problem is probably related to how escape sequences are written: inside a string literal, the double quote character has to be written as \".
Given your input: "Hi how" "are you" "thanks", if you use strtok with "\"" as the delimiter, it'll treat the spaces between the quoted strings as if they were also strings, so if (for example) you printed out the result strings, one per line, surrounded by square brackets, you'd get:
[Hi how]
[ ]
[are you]
[ ]
[thanks]
I.e., the blank character between each quoted string is, itself, being treated as a string. If the delimiter you supplied to strtok was " \"" (i.e., included both a quote and a space) that wouldn't happen, but then it would also break on the spaces inside the quoted strings.
Assuming you can depend on every item you care about being quoted, you want to skip anything until you get to a quote, ignore the quote, then read data into your input string until you get to another quote, then repeat the whole process.