Loop to split string by delimiter not respecting all delimiters


I'm splitting a std::string into words and appending each one to a doubly linked list. I've written an append_word method to add to the list, which I've verified is not the issue (adding words individually through this method works perfectly). My function is as follows:
int set_text(std::string text) {
    u_int i, word_start, word_end;
    i = word_start = word_end = 0;
    while (i <= text.length()) {
        if (text[i] == ' ' || text[i] == '\0') {
            word_end = i;
            std::string substr = text.substr(word_start, word_end);
            // empty string argument to append_word has no effect
            if (append_word("", substr) < 0)
                return -1;
            i = word_start = ++word_end;
        } else {
            i++;
        }
    }
    return 0;
}
However, calling this on "Lorem ipsum dolor" produces a first word of "Lorem", a second word of "ipsum dolor", and then a third word of "dolor". Why is the second space treated as a delimiter for the third word but not for the second?
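The likely cause is the second argument to substr: std::string::substr(pos, count) takes a character count, not an end index. For the second word, word_start is 6 and word_end is 11, so text.substr(6, 11) returns the 11 characters starting at "ipsum", which is "ipsum dolor"; for the third word, substr(12, 17) is clipped at the end of the string and happens to give "dolor". Passing word_end - word_start as the length fixes it. Since append_word is not shown, the sketch below collects the words in a std::vector instead; split_words is a hypothetical name.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits on single spaces and collects the words in a
// vector instead of calling the question's append_word.
std::vector<std::string> split_words(const std::string& text) {
    std::vector<std::string> words;
    std::size_t word_start = 0;
    for (std::size_t i = 0; i <= text.length(); ++i) {
        // Treat the end of the string as a final delimiter so the last word is kept.
        if (i == text.length() || text[i] == ' ') {
            // substr's second argument is a length: end index minus start index.
            words.push_back(text.substr(word_start, i - word_start));
            word_start = i + 1;
        }
    }
    return words;
}
```

Like the original loop, two consecutive spaces produce an empty word here; skip empty substrings before storing them if that matters.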

Related

Checking both ends of a string not working

I am trying to check the beginning and the end of a string. If the word has an uppercase letter there, I change it to lowercase; if it has a space or '"', I erase that character. On the first recursive call it should see that the end of the string is a capital letter and change it to lowercase. However, when I output word[word.size()] it prints a blank, but if I output word[word.size() - 1] it prints the letter I am looking for. I'm not sure what the blank is or how to handle it, and I don't want it in my string because it is causing comparison issues.
#include <cctype>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

bool checkPalindrome(string word, int count){
    if (isupper(word[0]))
    {
        word[0] = tolower(word[0]);
    }
    if (isupper(word[word.size()]))
    {
        word[word.size()] = tolower(word[word.size()]);
    }
    // check if there is a space or '"'; if there is, delete that position from the string
    if (word[0] == ' ' || word[0] == '"')
    {
        word.erase(1);
    }
    if (word[word.size()] == ' ' || word[word.size()] == '"')
    {
        word.pop_back();
    }
    if (word.size() > 1)
    {
        if (word[0] == word[word.size()])
        {
            word = word.substr(1, word.size() - 2);
            return checkPalindrome(word, count);
        }
        else
        {
            return false;
        }
    }
    else
    {
        return false;
    }
}
int main()
{
    ifstream inFile;
    bool check = false;
    string temp = "";
    int count = 0;
    vector<string> vect;
    // Reading from a file line by line
    inFile.open("words.txt");
    if (inFile.is_open())
    {
        while (getline(inFile, temp))
        {
            vect.push_back(temp);
        }
    }
    inFile.close();
    for (auto i = 0; i < vect.size(); i++)
    {
        count = vect[i].size();
        check = checkPalindrome(vect[1], count);
        if (check == true)
        {
            cout << vect[i] << ", is a palindrome!\n";
        }
        else
        {
            cout << vect[i] << ", is not a palindrome.\n";
        }
    }
    return 0;
}

If the size of the string is 4, then there are only 4 elements: 0, 1, 2, and 3. There is no fifth element, so you cannot access element number four.
If a string's length is four:
Zero is the first element.
One is the second element.
Two is the third element.
Three is the fourth element.
The fifth element is, of course, Leeloo, which is not a character in the string. If a string's length is four, you should not attempt to access the fifth element (at least, not without her permission).
Ecto gamat.
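As for the blank: for std::string, word[word.size()] refers to the terminating null character '\0' (guaranteed since C++11), which is what prints as nothing; the last real character is word[word.size() - 1]. The question's code has a few other problems worth knowing about: word.erase(1) removes everything from index 1 onward rather than the first character, strings of length 0 or 1 return false even though they are palindromes, and main passes vect[1] instead of vect[i]. The sketch below swaps the recursion for an iterative two-pointer loop, assuming that case is ignored and that spaces and '"' should be skipped wherever they appear (the question only trims them at the ends); isPalindrome is a hypothetical name.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: case-insensitive palindrome check that skips spaces
// and double quotes. Valid indices run from 0 to word.size() - 1.
bool isPalindrome(const std::string& word) {
    if (word.empty()) return true;
    std::size_t left = 0;
    std::size_t right = word.size() - 1;  // last character, not word.size()
    while (left < right) {
        if (word[left] == ' ' || word[left] == '"') { ++left; continue; }
        if (word[right] == ' ' || word[right] == '"') { --right; continue; }
        // Cast to unsigned char: std::tolower is undefined for negative values.
        unsigned char a = static_cast<unsigned char>(word[left]);
        unsigned char b = static_cast<unsigned char>(word[right]);
        if (std::tolower(a) != std::tolower(b)) return false;
        ++left;
        --right;
    }
    return true;
}
```

In main, the call would then be check = isPalindrome(vect[i]);.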

How to remove an entire word in a sentence c++

I want to remove a specific word from a sentence. I have tried breaking the sentence into words and comparing each word, but when I call the erase function the indices shift. I have also tried another method, but it deletes the substring inside other words, which I don't want. Can anyone help me out a bit? What approach should I use?
Input:
The house whirled around two or three times and rose slowly through the air.
Output:
The house whirled around two or three times and rose slowly through air.
Here is my attempt at the function:
#include <cctype>
#include <iostream>
#include <string>
using namespace std;

int RemoveWordFromLine(string line, string word)
{
    // ==========================
    string tmp_str="",spacebar=" ";
    int start=0,end=-1;
    for(int i=0;i<line.length();i++)
    {
        if(isspace(line[i])||line[i]==','||line[i]=='.')
        {
            cout<<tmp_str<<" "<<start<<" "<<end<<endl; // compare
            if(tmp_str==word)
            {
                line.erase(start,end);
            }
            tmp_str="";
            start=i+1;
            end=i;
        } else
        {
            tmp_str+=line[i];
            end++;
        }
    }
    if(tmp_str==word)
    {
        line.erase(start,end);
    }
    cout<<tmp_str<<" "<<start<<" "<<end<<endl; // compare
    cout<<line<<endl;
    // ==========================
    return 0;
}

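You can write your function in the following way (note that find also matches the word inside longer words, such as "the" in "there", and only the first occurrence is removed):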
#include <cstdio>
#include <string>

void RemoveWordFromLine(std::string &line, const std::string &word)
{
    auto n = line.find(word);
    if (n != std::string::npos)
    {
        line.erase(n, word.length());
    }
}
And use it like:
std::string line("This is a wrong line");
RemoveWordFromLine(line, "wrong");
printf("line is: '%s'\n", line.c_str());
Prints out (the space after the removed word is left behind, so two spaces remain):
line is: 'This is a  line'

You are passing the end position instead of the length of the substring to be deleted: the second argument of std::string::erase(pos, count) is a count.
You just need to replace
if(tmp_str==word)
{
line.erase(start,end);
}
with
if(tmp_str==word)
{
line.erase(start,word.length()+1); //+1 to prevent 2 spaces
}
You could also use line.erase(start-1,word.length()+1); if you would like to keep a '.' or ',' that follows the deleted word. This throws std::out_of_range for the first word of the line, where start is 0.

With <regex>, you might do:
#include <regex>
#include <string>

std::string RemoveWordFromLine(const std::string& line, std::string word)
{
    // \s* : extra optional spaces at the start
    // \b  : word boundary
    // Ideally, we should also check that `word` contains no regex special characters:
    std::regex reg("\\s*\\b" + word + "\\b");
    return std::regex_replace(line, reg, "");
}
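The comment about `word` breaking the regex refers to characters such as '.', '+' or '(' that have a special meaning in the default ECMAScript grammar of std::regex. A sketch under that assumption: backslash-escape every such character before building the pattern. escapeRegex and RemoveWholeWord are hypothetical names. Note that \b only behaves as intended when `word` starts and ends with a letter, digit or underscore; a word like "c++" would need a different boundary test.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: backslash-escapes ECMAScript regex metacharacters so
// that `word` is matched literally.
std::string escapeRegex(const std::string& word) {
    static const std::string special = "\\^$.|?*+()[]{}";
    std::string out;
    for (char c : word) {
        if (special.find(c) != std::string::npos) out += '\\';
        out += c;
    }
    return out;
}

// Same pattern as above, but built from the escaped word.
std::string RemoveWholeWord(const std::string& line, const std::string& word) {
    // \s* eats leading spaces; \b anchors the match at word boundaries.
    std::regex reg("\\s*\\b" + escapeRegex(word) + "\\b");
    return std::regex_replace(line, reg, "");
}
```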

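This function removes multiple occurrences as well, skipping matches that are part of a longer word:
#include <cstddef>
#include <string>

// Remove every whole-word occurrence of p (bounded by string ends, ' ', '.' or ',').
void removeWords(std::string& s, const std::string& p) {
    const std::size_t n = p.length();
    std::size_t i = n ? s.find(p) : std::string::npos;
    while (i != std::string::npos) {
        bool startOk = i == 0 || s[i - 1] == ' ';
        bool endOk = i + n == s.size() || s[i + n] == ' ' || s[i + n] == '.' || s[i + n] == ',';
        if (!startOk || !endOk) { i = s.find(p, i + 1); continue; } // part of a longer word
        // Erase the word plus one neighbouring space so no double space is left.
        if (i + n < s.size() && s[i + n] == ' ') s.erase(i, n + 1);
        else if (i > 0) s.erase(--i, n + 1);
        else s.erase(i, n);
        i = s.find(p, i);
    }
}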
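Another approach avoids the shifting-index problem from the question altogether: instead of erasing in place, read the line word by word with std::istringstream and rebuild it without the unwanted word. A sketch, assuming words are separated by whitespace and compared exactly, so a token with attached punctuation such as "air." only matches "air."; runs of whitespace collapse to single spaces in the output. removeWordTokens is a hypothetical name.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: rebuilds `line` from its whitespace-separated tokens,
// skipping every token exactly equal to `word`.
std::string removeWordTokens(const std::string& line, const std::string& word) {
    std::istringstream in(line);
    std::string token, result;
    while (in >> token) {
        if (token == word) continue;         // drop whole-word matches only
        if (!result.empty()) result += ' ';  // single space between kept words
        result += token;
    }
    return result;
}
```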
}

C++ How to identify the number of words by spaces using a function

I need to make a function that identifies the number of words by spaces (Ignoring trailing and leading ones). I'm having trouble with the equation that recognizes a space within the string variable. It doesn't count a space like it should. I should be getting the number "2" with the test case I have.
#include<iostream>
#include<string>
using namespace std;
void numWords(string phrase)
{
for (int index = 0; index < phrase.length(); index++)
int words = 0;
int space = 0;
if (phrase[index] == ' ')
{
words++;
space++;
}
}
if (space == 0) //This is for single words with no spaces
{
words++;
}
cout << words << endl;
}
int main()
{
string phrase;
int words = 0;
numWords("hello, world");
//A test case
return 0;
}

Problems with your code / your approach:
You haven't initialized the number of words.
You count words and spaces the exact same way - but they're not the same.
You haven't properly defined what's a word:
Does the empty string contain a word?
If a space delimits words, is there an empty word between two consecutive spaces?
What about punctuation marks, do they count as parts of words?
What about non-printing characters?
So, there's little wonder your output doesn't satisfy your expectations.
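One way to settle those definition questions is to say that a word is a maximal run of non-whitespace characters: then the empty string has no words, consecutive spaces do not create empty words, and punctuation counts as part of a word. Under that assumption, operator>> on a std::istringstream does the splitting, since it skips leading whitespace and stops at the next whitespace character. A minimal sketch (countWords is a hypothetical name):

```cpp
#include <sstream>
#include <string>

// Counts maximal runs of non-whitespace characters. Leading, trailing and
// repeated spaces are ignored because operator>> skips whitespace.
int countWords(const std::string& phrase) {
    std::istringstream in(phrase);
    std::string word;
    int words = 0;
    while (in >> word) {
        ++words;
    }
    return words;
}
```

With this definition, countWords("hello, world") returns the expected 2.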

Check the solution by "dash-tom-bang" in "C++ function to count all the words in a string". A copy of it:
#include <cctype>

int CountWords(const char* str)
{
    if (str == nullptr)
        return -1; // error value; let the requirements define this...
    bool inSpaces = true;
    int numWords = 0;
    while (*str != '\0')
    {
        // cast to unsigned char: std::isspace is undefined for negative values
        if (std::isspace(static_cast<unsigned char>(*str)))
        {
            inSpaces = true;
        }
        else if (inSpaces)
        {
            numWords++;
            inSpaces = false;
        }
        ++str;
    }
    return numWords;
}

Split string path with space

I am writing a program that should receive 3 parameters from the user: file_upload "local_path" "remote_path"
Code example:
#include <cstdio>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

std::vector<std::string> split(std::string str, char delimiter) {
    std::vector<std::string> v;
    std::stringstream src(str);
    std::string buf;
    while (std::getline(src, buf, delimiter)) {
        v.push_back(buf);
    }
    return v;
}
void function() {
    std::string input;
    std::getline(std::cin, input);
    // user input like this: file_upload /home/Space Dir/file c:\dir\file
    std::vector<std::string> v_input = split(input, ' ');
    // the code will do something like this
    if (v_input[0].compare("file_upload") == 0) {
        FILE *file;
        file = fopen(v_input[1].c_str(), "rb");
        send_upload_dir(v_input[2].c_str());
        // bla bla bla
    }
}
My question is: the second and third parameters are paths, so they can contain spaces. How can I make the split function leave the spaces in the second and third parameters intact?
I thought of putting quotes around the paths and writing a function to recognize them, but that does not work 100% because the program has other commands that take only 2 parameters, not 3. Can anyone help?
EDIT: /home/user/Space Dir/file.out <-- a path whose name contains a space.
If this happens, the vector is larger than expected and the directory path is broken, which must not happen.
The vector will contain something like this:
vector[1] = /home/user/Space
vector[2] = Dir/file.out
and what I want is this:
vector[1] = /home/user/Space Dir/file.out

Since you need to accept three values from a single string input, this is a problem of encoding.
Encoding is sometimes done by imposing fixed-width requirements on some or all fields, but that's clearly not appropriate here, since we need to support variable-width file system paths, and the first value (which appears to be some kind of mode specifier) may be variable-width as well. So that's out.
This leaves 4 possible solutions for variable-width encoding:
1: Unambiguous delimiter.
If you can select a separator character that is guaranteed never to show up in the delimited values, then you can split on that. For example, if NUL is guaranteed never to be part of the mode value or the path values, then we can do this:
std::vector<std::string> v_input = split(input,'\0');
Or maybe the pipe character:
std::vector<std::string> v_input = split(input,'|');
Hence the input would have to be given like this (for the pipe character):
file_upload|/home/user/Space Dir/file.out|/home/user/Other Dir/blah
2: Escaping.
You can write the code to iterate through the input line and properly split it on unescaped instances of the separator character. Escaped instances will not be considered separators. You can parameterize the escape character. For example:
std::vector<std::string> escapedSplit(std::string str, char delimiter, char escaper ) {
std::vector<std::string> res;
std::string cur;
for (size_t i = 0; i < str.size(); ++i) {
if (str[i] == delimiter) {
res.push_back(cur);
cur.clear();
} else if (str[i] == escaper) {
++i;
if (i == str.size()) break;
cur.push_back(str[i]);
} else {
cur.push_back(str[i]);
} // end if
} // end for
if (!cur.empty()) res.push_back(cur);
return res;
} // end escapedSplit()
std::vector<std::string> v_input = escapedSplit(input,' ','\\');
With input as:
file_upload /home/user/Space\ Dir/file.out /home/user/Other\ Dir/blah
3: Quoting.
You can write the code to iterate through the input line and properly split it on unquoted instances of the separator character. Quoted instances will not be considered separators. You can parameterize the quote character.
A complication of this approach is that it is not possible to include the quote character itself inside a quoted extent unless you introduce an escaping mechanism, similar to solution #2. A common strategy is to allow repetition of the quote character to escape it. For example:
std::vector<std::string> quotedSplit(std::string str, char delimiter, char quoter ) {
std::vector<std::string> res;
std::string cur;
for (size_t i = 0; i < str.size(); ++i) {
if (str[i] == delimiter) {
res.push_back(cur);
cur.clear();
} else if (str[i] == quoter) {
++i;
for (; i < str.size(); ++i) {
if (str[i] == quoter) {
if (i+1 == str.size() || str[i+1] != quoter) break;
++i;
cur.push_back(quoter);
} else {
cur.push_back(str[i]);
} // end if
} // end for
} else {
cur.push_back(str[i]);
} // end if
} // end for
if (!cur.empty()) res.push_back(cur);
return res;
} // end quotedSplit()
std::vector<std::string> v_input = quotedSplit(input,' ','"');
With input as:
file_upload "/home/user/Space Dir/file.out" "/home/user/Other Dir/blah"
Or even just:
file_upload /home/user/Space" "Dir/file.out /home/user/Other" "Dir/blah
4: Length-value.
Finally, you can write the code to take a length before each value, and only grab that many characters. We could require a fixed-width length specifier, or skip a delimiting character following the length specifier. For example (note: light on error checking):
#include <cstdlib>
#include <string>
#include <vector>

std::vector<std::string> lengthedSplit(std::string str) {
    std::vector<std::string> res;
    size_t i = 0;
    while (i < str.size()) {
        // parse the length prefix starting at position i (atoi stops at the ':')
        size_t len = std::atoi(str.c_str() + i);
        if (len == 0) break;
        i += std::to_string(len).size() + 1; // skip the digits and the delimiter
        res.push_back(str.substr(i,len));
        i += len;
    } // end while
    return res;
} // end lengthedSplit()
std::vector<std::string> v_input = lengthedSplit(input);
With input as:
11:file_upload29:/home/user/Space Dir/file.out25:/home/user/Other Dir/blah
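Whoever builds the input line needs the matching encoder for this length-value scheme. A sketch that writes each value as its decimal length, a ':' and the raw characters, which is the format of the example input above; lengthedJoin is a hypothetical name. Because lengthedSplit treats a zero length as the end of input, an empty value would not survive the round trip.

```cpp
#include <string>
#include <vector>

// Hypothetical encoder for the length-value format: each value becomes
// "<decimal length>:<value>", so values may contain spaces, quotes or ':'.
std::string lengthedJoin(const std::vector<std::string>& values) {
    std::string out;
    for (const std::string& v : values) {
        out += std::to_string(v.size());
        out += ':';
        out += v;
    }
    return out;
}
```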

I had a similar problem a few days ago and solved it like this:
First I create a copy, then replace the quoted strings in the copy with padding so they contain no whitespace, and finally I split the original string at the whitespace positions found in the copy.
Here is my full solution (you may also want to remove the double quotes, trim the original string, and so on):
#include <sstream>
#include<iostream>
#include<vector>
#include<string>
using namespace std;
string padString(size_t len, char pad)
{
ostringstream ostr;
ostr.fill(pad);
ostr.width(len);
ostr<<"";
return ostr.str();
}
void splitArgs(const string& s, vector<string>& result)
{
size_t pos1=0,pos2=0,len;
string res = s;
pos1 = res.find_first_of("\"");
while(pos1 != string::npos && pos2 != string::npos){
pos2 = res.find_first_of("\"",pos1+1);
if(pos2 != string::npos ){
len = pos2-pos1+1;
res.replace(pos1,len,padString(len,'X'));
pos1 = res.find_first_of("\"");
}
}
pos1=res.find_first_not_of(" \t\r\n",0);
while(pos1 != string::npos){ // testing pos2 here would skip everything after an unmatched quote
pos2 = res.find_first_of(" \t\r\n",pos1+1);
if(pos2 == string::npos ){
pos2 = res.length();
}
len = pos2-pos1;
result.push_back(s.substr(pos1,len));
pos1 = res.find_first_not_of(" \t\r\n",pos2+1);
}
}
int main()
{
string s = "234 \"5678 91\" 8989";
vector<string> args;
splitArgs(s,args);
cout<<"original string:"<<s<<endl;
for(size_t i=0;i<args.size();i++)
cout<<"arg "<<i<<": "<<args[i]<<endl;
return 0;
}
and this is the output:
original string:234 "5678 91" 8989
arg 0: 234
arg 1: "5678 91"
arg 2: 8989
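Since C++14 the standard library can do the quote handling: std::quoted (from <iomanip>) used with operator>> reads either a plain whitespace-delimited word or, if the next character is '"', everything up to the closing quote, with the quotes removed. A sketch for the file_upload case, assuming paths that contain spaces are wrapped in double quotes; readArgs is a hypothetical name. One caveat: inside quotes, std::quoted treats a backslash as an escape character by default, so a quoted Windows path such as "c:\dir\file" loses its backslashes unless they are doubled or a different escape character is passed as the third argument.

```cpp
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits a command line into arguments, keeping
// double-quoted arguments (which may contain spaces) together.
std::vector<std::string> readArgs(const std::string& input) {
    std::istringstream in(input);
    std::vector<std::string> args;
    std::string arg;
    while (in >> std::quoted(arg)) {
        args.push_back(arg);
    }
    return args;
}
```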

Remove Duplicate words (only if followed) from char array

I am a little bit stuck and can't figure out what is wrong here.
I have an assignment to read a sentence into a char array and, if a word is immediately followed by the same word (for example "same same" or "diff diff", but not "same word same"), remove the duplicate.
Here is the function I wrote:
void Same(char arr[], char temp[]){
int i = 0, j = 0, f = 0, *p, k = 0, counter = 0;
for (i = 0; i < strlen(arr); i++){
while (arr[i] != ' ' && i < strlen(arr)){
temp[k] = arr[i];
i++;
k++;
counter++;
}
temp[k] = '\0';
k = 0;
p = strstr((arr + i), (temp + j));
if (p != NULL && (*p == arr[i])){
for (f = 0; f < strlen(p); f++){
*p = '*';
p++;
}
f = 0;
}
j = counter;
}
}

strtok is a handy function to grab the next word from a string (strsep is a better one, but is less likely to be available on your system). Using strtok, an approach like the following might work, at least for simple examples...
#include <stdio.h>
#include <string.h>
#define MAXPHRASELEN 1000
int main(void)
{
    // Here is the sentence we are looking at
    const char * tmp = "This is a test and and another test";
    // We will copy it to this variable, because strtok modifies its argument
    char phrase[MAXPHRASELEN+1];
    strcpy(phrase, tmp);
    // And will put the altered text in this variable
    char new_phrase[MAXPHRASELEN+1];
    // This will be the last word we kept (a pointer into phrase, so no malloc is needed)
    char * lasttok;
    // This will be the current word
    char * tok;
    // The altered text is initially empty
    new_phrase[0] = '\0';
    // Get the first word
    lasttok = strtok(phrase, " ");
    // If there is a word...
    if (lasttok != NULL) {
        // Put it in the altered text and add a space
        strcat(new_phrase, lasttok);
        strcat(new_phrase, " ");
        // As long as there is a next word
        while ( (tok = strtok(NULL, " ")) != NULL ) {
            // See if it is the same as the last word
            if (strcmp(tok,lasttok) != 0) {
                // If it isn't, copy it to the altered text
                strcat(new_phrase, tok);
                // and add a space
                strcat(new_phrase, " ");
                // The current word becomes the last word
                lasttok = tok;
            }
        }
    }
    // Print the lot
    printf("%s\n", new_phrase);
    return 0;
}
If you really must write your own routine for grabbing the individual words, you could do worse than emulate strtok. It keeps a pointer to the beginning of the current word in the string and puts a null character at the next separator (space character). When called again, it moves the pointer to the character past that null and puts another null at the next separator. Most string functions, when passed the pointer, see the null as the end of the string and so deal only with the current word.
Minus comments, headers, and initialisation, it looks less threatening...
lasttok = strtok(phrase, " ");
if (lasttok != NULL) {
strcat(new_phrase, lasttok);
strcat(new_phrase, " ");
while ( (tok = strtok(NULL, " ")) != NULL ) {
if (strcmp(tok,lasttok) != 0) {
strcat(new_phrase, tok);
strcat(new_phrase, " ");
lasttok = tok;
}
}
}
printf("%s\n", new_phrase);
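If std::string is allowed (the assignment asks for a char array, so this may not fit), the same idea is shorter in C++: read words with std::istringstream and keep a word only when it differs from the previous one. A sketch assuming words are separated by whitespace and compared exactly, so "diff" and "diff." count as different words; removeRepeatedWords is a hypothetical name. Unlike the strtok version, it does not leave a trailing space.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: drops a word when it repeats the word right before it,
// e.g. "and and" -> "and", while "same word same" is left alone.
std::string removeRepeatedWords(const std::string& sentence) {
    std::istringstream in(sentence);
    std::string word, previous, result;
    while (in >> word) {
        if (word == previous) continue;  // immediate repeat: skip it
        if (!result.empty()) result += ' ';
        result += word;
        previous = word;
    }
    return result;
}
```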

We Keep Coding
