How to remove white space in the beginning of a sentence. - c++

I made a vector that stores each sentence from a file. However, I noticed that the elements are not stored consistently. For example, if the file was "hello bob. how are you. hey there."
I used
while(getline(mFile, str, '.'))
to get each sentence and
vecString.push_back(str + '.');
to store each sentence in the vector. So vector[0] would hold "hello bob.", vector[1] would hold " how are you.", and vector[2] would hold " hey there.". How do I get rid of the space at the start of the sentences in vector[1] and vector[2]?

The Boost String Algorithms Library has trimming functions.

There are many examples of this on Stack Overflow. Have a look at these:
Removing leading and trailing spaces from a string
What's the best way to trim std::string?

Strip leading (i.e. left) whitespace using:
std::string s(" String with leading whitespace.");
s.erase(0, s.find_first_not_of(" \t"));
In addition to ' ' and '\t' consider also '\r', '\n', '\v', and '\f'.
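Applied to the sentence-splitting loop from the question, the same erase call could be wrapped in a small helper. This is only a sketch: ltrim and split_sentences are hypothetical names, and skipping empty fragments (for example a trailing newline after the last period) is an assumption about the desired behavior.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Remove leading whitespace (space, tab, CR, LF, VT, FF) in place.
// If the string is all whitespace, find_first_not_of returns npos and
// erase(0, npos) clears the whole string.
void ltrim(std::string& s) {
    s.erase(0, s.find_first_not_of(" \t\r\n\v\f"));
}

// Hypothetical helper: split the stream on '.' and store each sentence
// without its leading whitespace.
std::vector<std::string> split_sentences(std::istream& in) {
    std::vector<std::string> sentences;
    std::string str;
    while (std::getline(in, str, '.')) {
        ltrim(str);
        if (!str.empty()) {  // assumption: fragments that are only whitespace are dropped
            sentences.push_back(str + '.');
        }
    }
    return sentences;
}
```

With a std::ifstream such as mFile from the question, split_sentences(mFile) would return the trimmed sentences.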

Related

Removing the first word from a sentence and store it c++

I am reading from a file in C++, and I want to remove the first word from each sentence and store it.
sentence = sentence.substr(sentence.find_first_of(" \t") + 1);
This code removes the first word and keeps the rest of the sentence. Is there a way to store the removed word?
https://en.cppreference.com/w/cpp/string/basic_string/find_first_of
Take the position of the first match returned by find_first_of, then take the substring from the start of the sentence up to that position:
std::string w1 = sentence.substr(0, sentence.find_first_of(" \t"));
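Putting the two substr calls together, a sketch that returns both the first word and the remainder might look like this. split_first_word is a hypothetical name, and the code assumes the sentence has no leading whitespace.

```cpp
#include <string>
#include <utility>

// Split a sentence into {first word, remainder}.
// Assumes the sentence does not start with whitespace.
std::pair<std::string, std::string> split_first_word(const std::string& sentence) {
    const std::string::size_type pos = sentence.find_first_of(" \t");
    if (pos == std::string::npos) {
        // No separator: the whole sentence is a single word.
        return {sentence, ""};
    }
    return {sentence.substr(0, pos), sentence.substr(pos + 1)};
}
```

The explicit npos check matters for the first word only by convention (substr(0, npos) already returns the whole string), but it avoids relying on npos + 1 wrapping around to 0 for the remainder.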

How to delimit this text file? strtok

So there's a text file in which each line has: 1. a language, 2. the text of a number written in that language, 3. the base of the number, and 4. the number written in digits. Here's a sample:
francais deux mille quatre cents 10 2400
How I went about it:
struct Nomen{
    char langue[21], nomNombre [31], baseC[3], nombreC[21];
    int base, nombre;
};
and in the main:
if(myfile.is_open())
{
    while(getline(myfile, line))
    {
        strcpy(Linguo[i].langue, strtok((char *)line.c_str(), " "));
        strcpy(Linguo[i].nomNombre, strtok(NULL, " "));
        strcpy(Linguo[i].baseC, strtok(NULL, " "));
        strcpy(Linguo[i].nombreC, strtok(NULL, "\n"));
        i++;
    }
}
Difficulty: I'm trying to put two whitespaces as a delimiter, but it seems that strtok() counts it as if there were only one whitespace. The fact there are spaces in the text number, etc. is messing up the tokenization. How should I go about it?
strtok treats each character in the provided string as a delimiter. It does not treat the string as a whole as a single delimiter. So "  " (two spaces) is the same as " " (one space).
strtok also treats consecutive delimiters as a single delimiter. So the input "t1  t2" will be tokenized as two tokens, "t1" and "t2".
As mentioned in the comments, strtok also writes NUL characters into the input to terminate the tokens. So it is an error to pass the result of string::c_str() as input to the function. The fact that you need to cast away the const should have been enough to dissuade you from this approach.
If you want to treat a double space as a delimiter, you will have to scan the string and search for them yourself. Given you are using C APIs, you can consider strstr. However, in C++, you can use string::find.
Here's an algorithm to parse your string manually:
Given an input string input:
language is the substring from the start of input to the first SPC character.
From where language ends, skip over all whitespace, changing input to begin at the first non-whitespace character.
text is the substring from the start of input to the first double SPC sequence.
From where text ends, skip over all whitespace, changing input to begin at the first non-whitespace character.
Parse base, and parse number.
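The steps above can be sketched with std::string::find. This is a minimal illustration, not a drop-in replacement for the code in the question: Entry and parse_line are hypothetical names, std::string members stand in for the fixed-size char arrays, and the sketch assumes the text field is followed by at least two spaces while words inside the text are separated by single spaces.

```cpp
#include <sstream>
#include <string>

// Hypothetical record; std::string replaces the fixed-size char arrays.
struct Entry {
    std::string language;
    std::string text;
    int base = 0;
    long number = 0;
};

// Returns false if the line does not match the assumed layout.
bool parse_line(const std::string& input, Entry& out) {
    const char* ws = " \t";

    // language: from the start of the input to the first space.
    std::string::size_type end = input.find(' ');
    if (end == std::string::npos) return false;
    out.language = input.substr(0, end);

    // Skip whitespace up to the first character of the text.
    std::string::size_type begin = input.find_first_not_of(ws, end);
    if (begin == std::string::npos) return false;

    // text: up to the first double-space sequence.
    end = input.find("  ", begin);
    if (end == std::string::npos) return false;
    out.text = input.substr(begin, end - begin);

    // Skip whitespace again, then parse base and number.
    begin = input.find_first_not_of(ws, end);
    if (begin == std::string::npos) return false;
    std::istringstream rest(input.substr(begin));
    return static_cast<bool>(rest >> out.base >> out.number);
}
```

Because no buffer is modified in place, the line returned by getline can be passed directly, with no cast and no strcpy into fixed-size arrays.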

Remove spaces from string before period and comma

I could have a string like:
During this time , Bond meets a stunning IRS agent , whom he seduces .
I need to remove the extra spaces before the comma and before the period in my whole string. I tried copying this into a vector of char, skipping the push_back only if the current char was " " and the following char was "." or ",", but it did not work. I know there is probably a simple way to do it using trim(), find(), or erase(), or some kind of regex, but I am not very familiar with regex.
A solution could be (using regex library):
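#include <iterator>
#include <regex>
#include <string>

std::string fix_string(const std::string& str) {
    // One or more whitespace characters followed by '.' or ',' (lookahead,
    // so the punctuation itself is not part of the match).
    static const std::regex rgx_pattern("\\s+(?=[\\.,])");
    std::string rtn;
    rtn.reserve(str.size());
    std::regex_replace(std::back_inserter(rtn),
                       str.cbegin(),
                       str.cend(),
                       rgx_pattern,
                       "");
    return rtn;
}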
This function takes a string as input and removes the whitespace before each comma or period.
Here is a demo
In a loop, search for the string " ," and, if you find one, replace it with ",":
std::string str = "...";
while( true ) {
    auto pos = str.find( " ," );
    if( pos == std::string::npos )
        break;
    str.replace( pos, 2, "," );
}
Do the same for " .". If you need to handle other whitespace characters such as tabs, use a regex with a suitable character class.
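As a sketch, the same loop can be parameterized on the punctuation character so that one helper handles both " ," and " .". The names remove_space_before and fix_spacing are hypothetical, and only the plain space character is handled.

```cpp
#include <string>

// Repeatedly replace " <punct>" with "<punct>" until no match remains.
// A run of several spaces before punct is removed one space per iteration.
void remove_space_before(std::string& str, char punct) {
    const std::string target = std::string(" ") + punct;
    std::string::size_type pos;
    while ((pos = str.find(target)) != std::string::npos) {
        str.replace(pos, 2, 1, punct);  // two characters become one
    }
}

// Apply the helper for both punctuation marks from the question.
std::string fix_spacing(std::string str) {
    remove_space_before(str, ',');
    remove_space_before(str, '.');
    return str;
}
```

Each pass restarts the search from the beginning of the string, which keeps the code close to the loop above at the cost of some repeated scanning on long inputs.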
I don't know how to use regex in C++, and I'm not sure whether C++ supports PCRE regex; anyway, I'm posting this answer for the regex (I can delete it if it doesn't work in C++).
You can use this regex:
\s+(?=[,.])
Regex demo
First, there is no need to use a vector of char: you could very well do the same by using an std::string.
Then, your approach can't work because your copy does not take into account where the space is. You have to remove only the spaces next to punctuation, not those between words.
Modifying your code slightly, you could delay the copy of spaces until you see the first non-space character: if it's not punctuation, you copy a space before the character; otherwise you copy just the non-space character (thus getting rid of the spaces).
Similarly, once you've copied a punctuation character, just loop and ignore the following spaces until the first non-space char.
I could have written the code. It would have been shorter. But I prefer letting you finish your homework with a full understanding of the approach.
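For comparison with your own attempt, here is one possible sketch of the delayed-copy idea. It is an assumption-laden illustration: drop_space_before_punct is a hypothetical name, only ' ' counts as a space, only ',' and '.' count as punctuation, and only the first half of the approach is implemented (spaces before punctuation are dropped, spaces after it are kept, since that is what the question asks for).

```cpp
#include <string>

// Single pass: spaces are counted instead of copied, and are written out
// only when the next non-space character is not ',' or '.'.
std::string drop_space_before_punct(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    std::string::size_type pending = 0;  // spaces seen but not yet copied
    for (char c : in) {
        if (c == ' ') {
            ++pending;
            continue;
        }
        if (c != ',' && c != '.') {
            out.append(pending, ' ');  // ordinary character: keep the spaces
        }
        pending = 0;                   // punctuation: the pending spaces are dropped
        out.push_back(c);
    }
    out.append(pending, ' ');          // keep any trailing spaces unchanged
    return out;
}
```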

Find Group of Characters From String

I wrote a program to remove a group of characters from a string. The code is given below.
void removeCharFromString(string &str,const string &rStr)
{
    std::size_t found = str.find_first_of(rStr);
    while (found!=std::string::npos)
    {
        str[found]=' ';
        found=str.find_first_of(rStr,found+1);
    }
    str=trim(str);
}
std::string str ("scott<=tiger");
removeCharFromString(str,"<=");
With this input, my program gives the correct output. But if I set str to "scott=tiger", the sequence "<=" does not occur in str, yet my program still removes the '=' character. I don't want to remove the characters individually; I want to remove them only when the whole group "<=" is found. How can I do this?
The method find_first_of looks for any single character of its argument, in your case either '<' or '='. To match the whole sequence, you want to use find.
std::size_t found = str.find(rStr);
This answer assumes that you only want to match the characters in the exact sequence, e.g. you want to remove <= but not =<.
find_first_of will locate any of the characters in the given string, where you want to find the whole string.
You need something to the effect of:
std::size_t found = str.find(rStr);
while (found!=std::string::npos)
{
    str.replace(found, rStr.length(), " ");
    found=str.find(rStr,found+1);
}
The problem with str[found]=' '; is that it'll simply replace the first character of the string you are searching for, so if you used that, your result would be
scott =tiger
whereas with the changes I've given you, you'll get
scott tiger
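As a self-contained sketch, the same find/replace loop can be generalized so that the replacement text is a parameter. replace_all is a hypothetical name; the guard against an empty search string and resuming the search after the inserted replacement are additions not present in the snippet above.

```cpp
#include <string>

// Replace every occurrence of the exact sequence `seq` in `str` with `with`.
void replace_all(std::string& str, const std::string& seq, const std::string& with) {
    if (seq.empty()) {
        return;  // an empty sequence would match at every position
    }
    std::string::size_type found = str.find(seq);
    while (found != std::string::npos) {
        str.replace(found, seq.length(), with);
        // Resume after the inserted text so the replacement is never rescanned.
        found = str.find(seq, found + with.length());
    }
}
```

Calling replace_all(str, "<=", " ") on "scott<=tiger" leaves "scott tiger", while "scott=tiger" is left unchanged because the two-character sequence never occurs in it.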

Tokenize a string based on quotes

I am trying to read data from a text file and split the read line based on quotes. For example
"Hi how" "are you" "thanks"
Expected output
Hi how
are you
thanks
My code:
getline(infile, line);
ch = strdup(line.c_str());
ch1 = strtok(ch, " ");
while (ch1 != NULL)
{
    a3[i] = ch1;
    ch1 = strtok(NULL, " ");
    i++;
}
I don't know what to specify as the delimiter string. I am using strtok() to split, but it failed. Can anyone help me?
Please have a look at the example code here. You should provide "\"" as delimiter string to strtok.
For example,
ch1 = strtok (ch,"\"");
Your problem is probably related to how escape sequences are written. Please have a look here for a list of escape sequences for characters.
Given your input: "Hi how" "are you" "thanks", if you use strtok with "\"" as the delimiter, it'll treat the spaces between the quoted strings as if they were also strings, so if (for example) you printed out the result strings, one per line, surrounded by square brackets, you'd get:
[Hi how]
[ ]
[are you]
[ ]
[thanks]
I.e., the blank character between each quoted string is, itself, being treated as a string. If the delimiter you supplied to strtok was " \"" (i.e., included both a quote and a space) that wouldn't happen, but then it would also break on the spaces inside the quoted strings.
Assuming you can depend on every item you care about being quoted, you want to skip anything until you get to a quote, ignore the quote, then read data into your input string until you get to another quote, then repeat the whole process.
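That skip-then-read process could be sketched with std::string::find instead of strtok. extract_quoted is a hypothetical name, and the sketch assumes quotes are never escaped or nested; an unmatched opening quote simply ends the scan.

```cpp
#include <string>
#include <vector>

// Collect the text between each pair of double quotes, ignoring
// everything outside the quotes.
std::vector<std::string> extract_quoted(const std::string& line) {
    std::vector<std::string> items;
    std::string::size_type open = line.find('"');
    while (open != std::string::npos) {
        const std::string::size_type close = line.find('"', open + 1);
        if (close == std::string::npos) {
            break;  // unmatched opening quote: stop
        }
        items.push_back(line.substr(open + 1, close - open - 1));
        open = line.find('"', close + 1);  // skip ahead to the next opening quote
    }
    return items;
}
```

Since the line is only read, never modified, there is no need for strdup or for a fixed-size array of char pointers.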