Split string path with space

I am writing a program that should receive three parameters from the user: file_upload "local_path" "remote_path"
Code example:
#include <cstdio>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
std::vector<std::string> split(const std::string& str, char delimiter) {
std::vector<std::string> v;
std::stringstream src(str);
std::string buf;
while(std::getline(src, buf, delimiter)) {
v.push_back(buf);
}
return v;
}
void function() {
std::string input;
std::getline(std::cin, input);
// user input like this: file_upload /home/Space Dir/file c:\dir\file
std::vector<std::string> v_input = split(input, ' ');
// the code will do something like this
if(v_input.size() >= 3 && v_input[0].compare("file_upload") == 0) {
FILE *file;
file = fopen(v_input[1].c_str(), "rb");
send_upload_dir(v_input[2].c_str()); // send_upload_dir is the program's own function
// bla bla bla
}
}
My question is: the second and third parameters are directories, so their names can contain spaces. How can I make the split function leave the spaces inside the second and third parameters alone?
I thought of putting quotes around the directories and writing a function to recognize them, but that does not work 100% because the program has other commands that take only two parameters, not three. Can anyone help?
EDIT: /home/user/Space Dir/file.out <-- a path with a space in a directory name.
If this happens, the vector is larger than expected and the path to the directory is broken. This must not happen.
The vector will contain something like this:
vector[1] = /home/user/Space
vector[2] = Dir/file.out
And what I want is this:
vector[1] = /home/user/Space Dir/file.out

Since you need to accept three values from a single string input, this is a problem of encoding.
Encoding is sometimes done by imposing fixed-width requirements on some or all fields, but that's clearly not appropriate here, since we need to support variable-width file system paths, and the first value (which appears to be some kind of mode specifier) may be variable-width as well. So that's out.
This leaves 4 possible solutions for variable-width encoding:
1: Unambiguous delimiter.
If you can select a separator character that is guaranteed never to show up in the delimited values, then you can split on that. For example, if NUL is guaranteed never to be part of the mode value or the path values, then we can do this:
std::vector<std::string> v_input = split(input,'\0');
Or maybe the pipe character:
std::vector<std::string> v_input = split(input,'|');
Hence the input would have to be given like this (for the pipe character):
file_upload|/home/user/Space Dir/file.out|/home/user/Other Dir/blah
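For illustration, here is the question's split() made self-contained (taking the string by const reference) and applied to the pipe-delimited example. Only the delimiter changes. This is a minimal sketch, assuming '|' never appears in the mode or the paths.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Splits str on every occurrence of delimiter (same idea as the question's
// split(), but taking the string by const reference).
std::vector<std::string> split(const std::string& str, char delimiter) {
    std::vector<std::string> fields;
    std::istringstream src(str);
    std::string buf;
    while (std::getline(src, buf, delimiter)) {
        fields.push_back(buf); // spaces inside a field are kept as-is
    }
    return fields;
}
```

std::getline does not produce a trailing empty field, so an input that ends in '|' yields only the fields before it.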
2: Escaping.
You can write the code to iterate through the input line and properly split it on unescaped instances of the separator character. Escaped instances will not be considered separators. You can parameterize the escape character. For example:
std::vector<std::string> escapedSplit(std::string str, char delimiter, char escaper ) {
std::vector<std::string> res;
std::string cur;
for (size_t i = 0; i < str.size(); ++i) {
if (str[i] == delimiter) {
res.push_back(cur);
cur.clear();
} else if (str[i] == escaper) {
++i;
if (i == str.size()) break;
cur.push_back(str[i]);
} else {
cur.push_back(str[i]);
} // end if
} // end for
if (!cur.empty()) res.push_back(cur);
return res;
} // end escapedSplit()
std::vector<std::string> v_input = escapedSplit(input,' ','\\');
With input as:
file_upload /home/user/Space\ Dir/file.out /home/user/Other\ Dir/blah
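Escaping only works if whoever produces the line escapes it consistently. The hypothetical helper escapedJoin below is not part of the answer. It sketches the producing side under the same convention: every delimiter and every escape character inside a field gets an escaper in front of it. This has one consequence for the question: with '\\' as the escape character, the Windows path c:\dir\file would have to be typed as c:\\dir\\file.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: joins fields with `delimiter`, putting `escaper` in
// front of every delimiter or escape character found inside a field.
std::string escapedJoin(const std::vector<std::string>& fields,
                        char delimiter, char escaper) {
    std::string out;
    for (std::size_t f = 0; f < fields.size(); ++f) {
        if (f > 0) out.push_back(delimiter); // separator between fields
        for (char c : fields[f]) {
            if (c == delimiter || c == escaper) out.push_back(escaper);
            out.push_back(c);
        }
    }
    return out;
}
```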
3: Quoting.
You can write the code to iterate through the input line and properly split it on unquoted instances of the separator character. Quoted instances will not be considered separators. You can parameterize the quote character.
A complication of this approach is that it is not possible to include the quote character itself inside a quoted extent unless you introduce an escaping mechanism, similar to solution #2. A common strategy is to allow repetition of the quote character to escape it. For example:
std::vector<std::string> quotedSplit(std::string str, char delimiter, char quoter ) {
std::vector<std::string> res;
std::string cur;
for (size_t i = 0; i < str.size(); ++i) {
if (str[i] == delimiter) {
res.push_back(cur);
cur.clear();
} else if (str[i] == quoter) {
++i;
for (; i < str.size(); ++i) {
if (str[i] == quoter) {
if (i+1 == str.size() || str[i+1] != quoter) break;
++i;
cur.push_back(quoter);
} else {
cur.push_back(str[i]);
} // end if
} // end for
} else {
cur.push_back(str[i]);
} // end if
} // end for
if (!cur.empty()) res.push_back(cur);
return res;
} // end quotedSplit()
std::vector<std::string> v_input = quotedSplit(input,' ','"');
With input as:
file_upload "/home/user/Space Dir/file.out" "/home/user/Other Dir/blah"
Or even just:
file_upload /home/user/Space" "Dir/file.out /home/user/Other" "Dir/blah
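If C++14 is available, std::quoted from <iomanip> already implements a quoting rule for whitespace-separated input, which may save writing quotedSplit by hand. The sketch below uses a made-up function name, quotedTokens. It reads tokens until the stream is exhausted, so it also works for commands with only two parameters. There are two caveats. Inside quotes, std::quoted treats '\\' as the escape character by default (instead of doubled quotes), so a quoted Windows path would need doubled backslashes. An unquoted one such as c:\dir\file is read unchanged.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Splits on whitespace, except that a token starting with '"' is read up to
// the matching '"' (quotes removed). Unquoted tokens are read like `in >> tok`.
std::vector<std::string> quotedTokens(const std::string& input) {
    std::vector<std::string> tokens;
    std::istringstream in(input);
    std::string tok;
    while (in >> std::quoted(tok)) {
        tokens.push_back(tok);
    }
    return tokens;
}
```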
4: Length-value.
Finally, you can write the code to take a length before each value, and only grab that many characters. We could require a fixed-width length specifier, or skip a delimiting character following the length specifier. For example (note: light on error checking):
#include <cmath>
#include <cstdlib>
#include <string>
#include <vector>
std::vector<std::string> lengthedSplit(std::string str) {
std::vector<std::string> res;
size_t i = 0;
while (i < str.size()) {
size_t len = std::atoi(str.c_str() + i); // parse the length that starts at position i
if (len == 0) break;
i += (size_t)std::log10(len)+2; // +1 to get base-10 digit count, +1 to skip delim
res.push_back(str.substr(i,len));
i += len;
} // end while
return res;
} // end lengthedSplit()
std::vector<std::string> v_input = lengthedSplit(input);
With input as:
11:file_upload29:/home/user/Space Dir/file.out25:/home/user/Other Dir/blah
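The producing side of this format is short. Below is a sketch with a hypothetical lengthedJoin helper, assuming ':' as the separator after each length as in the example above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical encoder: writes each field as <decimal length>:<field>,
// so spaces (or any other character) inside a field need no special handling.
std::string lengthedJoin(const std::vector<std::string>& fields) {
    std::string out;
    for (const std::string& f : fields) {
        out += std::to_string(f.size());
        out += ':';
        out += f;
    }
    return out;
}
```

Since lengthedSplit stops at a zero length, empty fields cannot be carried by this encoding as written.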

I had a similar problem a few days ago and solved it like this:
First I created a copy. Then I replaced the quoted strings in the copy with padding so that they contain no whitespace. Finally, I split the original string at the whitespace positions found in the copy.
Here is my full solution.
You may also want to remove the double quotes, trim the original string, and so on:
#include <sstream>
#include<iostream>
#include<vector>
#include<string>
using namespace std;
string padString(size_t len, char pad)
{
ostringstream ostr;
ostr.fill(pad);
ostr.width(len);
ostr<<"";
return ostr.str();
}
void splitArgs(const string& s, vector<string>& result)
{
size_t pos1=0,pos2=0,len;
string res = s;
pos1 = res.find_first_of("\"");
while(pos1 != string::npos && pos2 != string::npos){
pos2 = res.find_first_of("\"",pos1+1);
if(pos2 != string::npos ){
len = pos2-pos1+1;
res.replace(pos1,len,padString(len,'X'));
pos1 = res.find_first_of("\"");
}
}
pos1=res.find_first_not_of(" \t\r\n",0);
while(pos1 != string::npos){ // don't test pos2 here: it is npos after an unmatched quote
pos2 = res.find_first_of(" \t\r\n",pos1+1);
if(pos2 == string::npos ){
pos2 = res.length();
}
len = pos2-pos1;
result.push_back(s.substr(pos1,len));
pos1 = res.find_first_not_of(" \t\r\n",pos2+1);
}
}
int main()
{
string s = "234 \"5678 91\" 8989";
vector<string> args;
splitArgs(s,args);
cout<<"original string:"<<s<<endl;
for(size_t i=0;i<args.size();i++)
cout<<"arg "<<i<<": "<<args[i]<<endl;
return 0;
}
and this is the output:
original string:234 "5678 91" 8989
arg 0: 234
arg 1: "5678 91"
arg 2: 8989
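As mentioned, you may also want to remove the double quotes. A small, hypothetical stripQuotes helper could be applied to each element of args afterwards. It only strips a quote pair that encloses the whole token.

```cpp
#include <cassert>
#include <string>

// Hypothetical post-processing step: removes one pair of surrounding double
// quotes from a token, e.g. "\"5678 91\"" becomes "5678 91".
std::string stripQuotes(const std::string& arg) {
    if (arg.size() >= 2 && arg.front() == '"' && arg.back() == '"') {
        return arg.substr(1, arg.size() - 2);
    }
    return arg; // not fully quoted: leave unchanged
}
```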

Related

Word counter returning incorrect number of words

I've been trying to create a program that reads text from a file and stores it in a string. I feed the string to a function that counts every word in the string.
However, it's only accurate if the user leaves some whitespace at the end of a line and doesn't create blank lines... not a very good word counter.
Creating a blank line results in a false increment to the word count.
I'm not sure if my main problem is using a boolean to do this or checking for whitespace and '\n' characters.
bool countingLetters = false;
int wordCount = 0;
for (int i = 0; i < text.length(); i++)
{
if (text[i] == ' ' && countingLetters == true)
{
countingLetters = false;
wordCount++;
}
if (text[i] != ' ' && countingLetters == false)
{
countingLetters = true;
}
if (text[i] == '\n' && countingLetters == true)
{
countingLetters = false;
wordCount++;
}
}

Your code is basically a state machine. To complete your solution, you also need to count a word that runs up to the end of the string.
Add this to the end of your code:
if(countingLetters) { // word at the end of the string, without any trailing space character
wordCount++;
}
Or, since a std::string always has a '\0' at index size() (guaranteed since C++11), you can index one past the last character and handle '\0' the same way as space and '\n'.
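The null-terminator variant could look like this. It is a sketch, assuming C++11 or later, where text[text.size()] is guaranteed to be '\0'. The function name countWords is hypothetical, and std::isspace stands in for the separate ' ' and '\n' checks.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Visits text[text.size()] too: that '\0' acts like whitespace, so a word
// at the very end is counted without a special case after the loop.
int countWords(const std::string& text) {
    bool countingLetters = false;
    int wordCount = 0;
    for (std::size_t i = 0; i <= text.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(text[i]);
        if (c == '\0' || std::isspace(c)) {
            if (countingLetters) { // a word just ended
                countingLetters = false;
                ++wordCount;
            }
        } else {
            countingLetters = true; // inside a word
        }
    }
    return wordCount;
}
```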
To improve your code, use isspace, which covers more whitespace characters (including '\t'). It is also better to use an else if chain. Comparing with == true is not good practice; just use the boolean as the condition.
Or perhaps isalpha(c) fits your needs better:
bool countingLetters = false;
int wordCount = 0;
for (char c : text) {
// std::isalpha (from <cctype>) needs a non-negative value, hence the cast
bool letter = std::isalpha(static_cast<unsigned char>(c)) != 0;
if (!letter && countingLetters) { // this also works for newline
countingLetters = false;
++wordCount;
} else if (letter && !countingLetters) {
countingLetters = true;
} // otherwise just skip
}
if(countingLetters) { // word at the end of the string, without any trailing space character
++wordCount;
}
Inserting an extra character (such as a trailing space) just for such a simple task is not acceptable; for example, text may be const.

An alternative is to count the beginning of a "word".
Let us say the beginning of a word is a letter after a non-letter. We can adjust this if desired.
int wordCount = 0;
unsigned char prior = '\n'; // some non-letter
for (std::size_t i = 0; i < text.length(); i++) {
unsigned char c = static_cast<unsigned char>(text[i]); // isalpha needs a non-negative value
if (isalpha(c) && !isalpha(prior)) {
wordCount++;
}
prior = c;
}
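As noted, the definition of a word can be adjusted. Suppose a word should be any run of non-whitespace characters, so that "don't" counts as one word rather than two. Here is a sketch with std::isspace in place of isalpha (the function name countWordStarts is hypothetical):

```cpp
#include <cassert>
#include <cctype>
#include <string>

// A word begins at a non-space character whose predecessor is whitespace
// (the start of the text behaves as if preceded by a space).
int countWordStarts(const std::string& text) {
    int wordCount = 0;
    unsigned char prior = ' ';
    for (char ch : text) {
        unsigned char c = static_cast<unsigned char>(ch);
        if (!std::isspace(c) && std::isspace(prior)) {
            ++wordCount;
        }
        prior = c;
    }
    return wordCount;
}
```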

C++ also provides some very high-level ways to do this.
One is by using a loop over a stringstream, which splits text on whitespace:
#include <sstream>
#include <string>
std::size_t count_words( const std::string& s )
{
std::size_t count = 0;
std::istringstream ss( s );
std::string t;
while (ss >> t) count += 1;
return count;
}
Another is using a stream iterator algorithm:
#include <iterator>
#include <sstream>
#include <string>
std::size_t count_words( const std::string& s )
{
std::istringstream ss( s );
return std::distance(
std::istream_iterator <std::string> ( ss ),
std::istream_iterator <std::string> ()
);
}
Yet another is using a regular expression:
#include <iterator>
#include <regex>
#include <string>
std::size_t count_words( const std::string& s )
{
std::regex re( "\\w+" );
return std::distance(
std::sregex_iterator( s.begin(), s.end(), re ),
std::sregex_iterator()
);
}
I’m sure there are many more, but those three are the ones off the top of my head.

Function to separate each word from a string and put them into a vector, without using auto keyword?

I'm really stuck here. I can't edit the main function, and inside it there is a function call whose only parameter is the string. How can I make this function put each word from the string into a vector, without using the auto keyword? I realize that this code is probably really wrong, but it's my best attempt at what it should look like.
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
using namespace std;
vector<string> extract_words(const char * sentence[])
{
string word = "";
vector<string> list;
for (int i = 0; i < sentence.size(); ++i)
{
while (sentence[i] != ' ')
{
word = word + sentence[i];
}
list.push_back(word);
}
}
int main()
{
sentence = "Help me please" /*In the actual code a function call is here that gets input sentence.*/
if (sentence.length() > 0)
{
words = extract_words(sentence);
}
}

Do you know how to read "words" from std::cin?
Then you can put that string in a std::istringstream which works like std::cin but for "reading" strings instead.
Use the stream extract operator >> in a loop to get all the words one by one, and add them to the vector.
Perhaps something like:
std::vector<std::string> get_all_words(std::string const& string)
{
std::vector<std::string> words;
std::istringstream in(string);
std::string word;
while (in >> word)
{
words.push_back(word);
}
return words;
}
With a little more knowledge of C++ and its standard classes and functions, you can actually make the function a lot shorter:
std::vector<std::string> get_all_words(std::string const& string)
{
std::istringstream in(string);
return std::vector<std::string>(std::istream_iterator<std::string>(in),
std::istream_iterator<std::string>());
}

I recommend making the argument to the function a const std::string& instead of const char * sentence[]. A std::string has many member functions, like find_first_of, find_first_not_of and substr and more that could help a lot.
Here's an example using those mentioned:
std::vector<std::string> extract_words(const std::string& sentence)
{
/* Control chars, "whitespaces", that we don't want in our words:
\a audible bell
\b backspace
\f form feed
\n line feed
\r carriage return
\t horizontal tab
\v vertical tab
*/
static const char whitespaces[] = " \t\n\r\a\b\f\v";
std::vector<std::string> list;
std::size_t begin = 0;
while(true)
{
// Skip whitespaces by finding the first non-whitespace, starting at
// "begin":
begin = sentence.find_first_not_of(whitespaces, begin);
// If no non-whitespace char was found, break out:
if(begin == std::string::npos) break;
// Search for a whitespace starting at "begin + 1":
std::size_t end = sentence.find_first_of(whitespaces, begin + 1);
// Store the result by creating a substring from "begin" with the
// length "end - begin":
list.push_back(sentence.substr(begin, end - begin));
// If no whitespace was found, break out:
if(end == std::string::npos) break;
// Set "begin" to the char after the found whitespace before the loop
// makes another lap:
begin = end + 1;
}
return list;
}
With the added restriction "no breaks", this could be a variant. It does exactly the same as the above, but without using break:
std::vector<std::string> extract_words(const std::string& sentence)
{
static const char whitespaces[] = " \t\n\r\a\b\f\v";
std::vector<std::string> list;
std::size_t begin = 0;
bool loop = true;
while(loop)
{
begin = sentence.find_first_not_of(whitespaces, begin);
if(begin == std::string::npos) {
loop = false;
} else {
std::size_t end = sentence.find_first_of(whitespaces, begin + 1);
list.push_back(sentence.substr(begin, end - begin));
if(end == std::string::npos) {
loop = false;
} else {
begin = end + 1;
}
}
}
return list;
}

How do I parse comma-delimited string in C++ with some elements being quoted with commas?

I have a comma-delimited string that I want to store in a string vector. The string and vectors are:
string s = "1, 10, 'abc', 'test, 1'";
vector<string> v;
Ideally I want the strings 'abc' and 'test, 1' to be stored without the single quotes as below, but I can live with storing them with single quotes:
v[0] = "1";
v[1] = "10";
v[2] = "abc";
v[3] = "test, 1";

bool nextToken(const string &s, string::size_type &start, string &token)
{
token.clear();
start = s.find_first_not_of(" \t", start);
if (start == string::npos)
return false;
string::size_type end;
if (s[start] == '\'')
{
++start;
end = s.find('\'', start);
}
else
end = s.find_first_of(" \t,", start);
if (end == string::npos)
{
token = s.substr(start);
start = s.size();
}
else
{
token = s.substr(start, end-start);
if ((s[end] != ',') && ((end = s.find(',', end + 1)) == string::npos))
start = s.size();
else
start = end + 1;
}
return true;
}
string s = "1, 10, 'abc', 'test, 1'", token;
vector<string> v;
string::size_type start = 0;
while (nextToken(s, start, token))
v.push_back(token);

What you need to do here is write a parser that parses the string the way you want. Here is a parsing function for you:
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>
using namespace std;
vector<string> parse_string(string master) {
char temp; //the current character
bool encountered = false; //for checking if there is a single quote
string curr_parse; //the current string
vector<string>result; //the return vector
for (size_t i = 0; i < master.size(); ++i) { //while still in the string
temp = master[i]; //current character
bool space = isspace(static_cast<unsigned char>(temp)) != 0; //the cast avoids undefined behavior for negative chars
switch (temp) { //switch depending on the character
case '\'': //if the character is a single quote
if (encountered) encountered = false; //if we already found a single quote, reset encountered
else encountered = true; //if we haven't found a single quote, set encountered to true
[[fallthrough]];
case ',': //if it is a comma
if (!encountered) { //if we are not inside quotes
result.push_back(curr_parse); //put our current string into our vector
curr_parse = ""; //reset the current string
break; //go to next character
}//if we are inside quotes, go to the default, and push_back the comma
[[fallthrough]];
default: //if it is a normal character
if (encountered && space) curr_parse.push_back(temp); //inside quotes, keep the whitespace
else if (space) break; //outside quotes, trash the whitespace and go to the next character
else if (temp == '\'') break; //if the current character is a single quote, trash it and go to the next character.
else curr_parse.push_back(temp); //if all of the above failed, put the character into the current string
break; //go to the next character
}
}
if (!curr_parse.empty()) result.push_back(curr_parse); //the last item has no comma after it, so store it here
//remove the empty strings (e.g. the one produced by a comma right after a closing quote)
result.erase(remove(result.begin(), result.end(), string("")), result.end());
return result;
}
This parses your string as you want it to, and returns a vector. Then, you can use it in your program:
#include <iostream>
int main() {
string s = "1, 10, 'abc', 'test, 1'";
vector<string> v = parse_string(s);
for (int i = 0; i < v.size(); ++i) {
cout << v[i] << endl;
}
}
and it properly prints out:
1
10
abc
test, 1

A proper solution would require a parser implementation. If you need a quick hack, just write a cell-reading function. C++14's std::quoted manipulator is a great help here. The only problem is that the manipulator requires a stream, which is easily solved with istringstream (see the second function). Note that the format of your string is CELL COMMA CELL COMMA ... CELL.
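// assumes using namespace std; and <iomanip>, <iostream>, <sstream>, <string>, <vector> (C++17 for the if-initializer in get)
istream& get_cell(istream& is, string& s)
{
char c;
if (!(is >> c)) // skips ws; nothing left to read, so c would be uninitialized
return is;
is.unget(); // puts back in the stream the last read character
if (c == '\'')
return is >> quoted(s, '\'', '\\'); // the first character of the cell is ' - read quoted
else
return getline(is, s, ','), is.unget(); // read unquoted, but put back the comma - we need it later, in get function
}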
vector<string> get(const string& s)
{
istringstream iss{ s };
string cell;
vector<string> r;
while (get_cell(iss, cell))
{
r.push_back( cell );
char comma;
if (!(iss >> comma) || comma != ',') // expect a cell separator (comma is unset if the read fails)
break; // cell separator not found; we are at the end of stream/string - break the loop
}
if (char c; iss >> c) // we reached the end of what we understand - probe the end of stream
throw "ill formed";
return r;
}
And this is how you use it:
int main()
{
string s = "1, 10, 'abc', 'test, 1'";
try
{
auto v = get(s);
}
catch (const char* e)
{
cout << e;
}
}

C++ separate string by selected commas

I was reading the question "Parsing a comma-delimited std::string" on how to split a string by commas (someone gave me the link in my previous question), and one of the answers was:
stringstream ss( "1,1,1,1, or something else ,1,1,1,0" );
vector<string> result;
while( ss.good() )
{
string substr;
getline( ss, substr, ',' );
result.push_back( substr );
}
But what if my string looked like the following, and I wanted to separate values only by the bold commas (the ones outside the angle brackets), ignoring the commas that appear inside <>?
<a,b>,<c,d>,,<d,l>,
I want to get:
<a,b>
<c,d>
"" //Empty string
<d,l>
""
Given: <a,b>,,<c,d> It should return: <a,b> and "" and <c,d>
Given: <a,b>,<c,d> It should return: <a,b> and <c,d>
Given: <a,b>, It should return: <a,b> and ""
Given: <a,b>,,,<c,d> It should return: <a,b> and "" and "" and <c,d>
In other words, my program should behave just like the solution above that splits on ',' (supposing there were no commas other than the bold ones).
Here are some suggested solution and their problems:
Delete all bold commas: this would result in treating the following two inputs the same way, while they shouldn't be:
<a,b>,<c,d>
<a,b>,,<c,d>
Replace all bold commas with some char and use the above algorithm: I can't select a char to replace the commas with, since any value could appear in the rest of my string.

Adding to @Carlos' answer, apart from regex (take a look at my comment), you can implement the substitution like this (here, I actually build a new string):
#include <iostream>
#include <string>
int main() {
std::string str;
std::getline(std::cin, str);
std::string str_builder;
bool flag = false; // true while inside <...>
for (auto it = str.begin(); it != str.end(); it++) {
if (*it == '<') {
flag = true;
}
else if (*it == '>') {
flag = false;
str_builder += *it;
}
if (flag) {
str_builder += *it;
}
}
std::cout << str_builder << '\n'; // only the bracketed parts are kept
}

Why not replace one set of commas with some known-to-not-clash character, then split it by the other commas, then reverse the replacement?
So replace the commas that are inside the <> with something, do the string split, replace again.
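Here is a sketch of that idea; the names splitOutsideBrackets and kPlaceholder are hypothetical. It rests on an assumption the question explicitly rejects for arbitrary input: that the placeholder character, here '\x1F' (the ASCII unit separator), never occurs in the string. Where that cannot be guaranteed, a single-pass scan that tracks whether it is inside <> avoids the placeholder entirely.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Assumption: this character never appears in the input.
const char kPlaceholder = '\x1F';

std::vector<std::string> splitOutsideBrackets(std::string s) {
    // 1. Mask the commas inside <...>.
    bool inside = false;
    for (char& c : s) {
        if (c == '<') inside = true;
        else if (c == '>') inside = false;
        else if (c == ',' && inside) c = kPlaceholder;
    }
    // 2. Split on the remaining commas, keeping empty fields (also a trailing one).
    std::vector<std::string> result;
    std::string::size_type start = 0;
    while (true) {
        std::string::size_type comma = s.find(',', start);
        std::string::size_type len =
            (comma == std::string::npos) ? std::string::npos : comma - start;
        std::string item = s.substr(start, len);
        // 3. Unmask: put the inner commas back.
        for (char& c : item) if (c == kPlaceholder) c = ',';
        result.push_back(item);
        if (comma == std::string::npos) break;
        start = comma + 1;
    }
    return result;
}
```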

I think what you want is something like this:
vector<string> result;
string s = "<a,b>,,<c,d>";
string current; // the item being built
bool in_string = false; // true while inside <...>
for (size_t i = 0; i < s.size(); i++) {
if(s[i] == '<'){
current.push_back(s[i]);
in_string = true;
}
else if(s[i] == '>'){
current.push_back(s[i]);
in_string = false;
}
else if(!in_string && s[i] == ','){
result.push_back(current); // a separator comma ends the current item (possibly empty)
current.clear();
}
else
current.push_back(s[i]);
}
result.push_back(current); // the last item (empty if s ends with a comma)

Here is a possible code that scans a string one char at a time and splits it on commas (',') unless they are masked between brackets ('<' and '>').
Algo:
assume starting outside brackets
loop for each character:
if not a comma, or if inside brackets
store the character in the current item
if a < bracket: note that we are inside brackets
if a > bracket: note that we are outside brackets
else (an unmasked comma)
store the current item as a string into the resulting vector
clear the current item
store the last item into the resulting vector
Only 10 lines and my rubber duck agreed that it should work...
C++ implementation: I will use a vector to handle the current item because it is easier to build it one character at a time
std::vector<std::string> parse(const std::string& str) {
std::vector<std::string> result;
bool masked = false;
std::vector<char> current; // stores chars of the current item
for (const char c : str) {
if (masked || (c != ',')) {
current.push_back(c);
switch (c) {
case '<': masked = true; break;
case '>': masked = false;
}
}
else { // unmasked comma: store item and prepare next
current.push_back('\0'); // a terminating null for the vector data
result.push_back(std::string(&current[0]));
current.clear();
}
}
// do not forget the last item...
current.push_back('\0');
result.push_back(std::string(&current[0]));
return result;
}
I tested it with all your example strings and it gives the expected results.

Seems quite straightforward to me.
vector<string> customSplit(string s)
{
vector<string> results;
int level = 0;
std::stringstream ss;
for (char c : s)
{
switch (c)
{
case ',':
if (level == 0)
{
results.push_back(ss.str());
stringstream temp;
ss.swap(temp); // Clear ss for the new string.
}
else
{
ss << c;
}
break;
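case '<':
level += 2;
// falls through on purpose: the net effect of '<' is +1
case '>':
level -= 1;
// falls through on purpose: both brackets are also copied into the item
default:
ss << c;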
}
results.push_back(ss.str());
return results;
}

string::replace not working correctly 100% of the time?

I'm trying to replace every space character with "%20" in a string, and I'm thinking of using the built-in replace function of the string class.
Currently, I have:
void replaceSpace(string& s)
{
int len = s.length();
string str = "%20";
for(int i = 0; i < len; i++) {
if(s[i] == ' ') {
s.replace(i, 1, str);
}
}
}
When I pass in the string "_a_b_c_e_f_g__", where the underscores represent space, my output is "%20a%20b%20c%20e_f_g__". Again, underscores represent space.
Why is it that the spaces near the beginning of the string are replaced, but the spaces towards the end aren't?

You are making s longer with each replacement, but you are not updating len which is used in the loop condition.
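One way to keep the forward loop is to re-read s.length() on every iteration and skip past the inserted text. Here is a minimal sketch:

```cpp
#include <cassert>
#include <string>

// The loop condition uses the current length, so the scan reaches the real
// end of the growing string; after a replacement it jumps past the "%20".
void replaceSpace(std::string& s) {
    const std::string str = "%20";
    for (std::string::size_type i = 0; i < s.length(); ++i) {
        if (s[i] == ' ') {
            s.replace(i, 1, str);
            i += str.length() - 1; // the loop's ++i then moves past the last inserted char
        }
    }
}
```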

Modifying the string that you are just scanning is like cutting the branch under your feet. It may work if you are careful, but in this case you aren't.
Namely, you take the string length at the beginning, but with each replacement your string gets longer and the remaining spaces are pushed further away (so you never reach all of them).
The correct way to cut this branch is from its end (tip) towards the trunk - this way you always have a safe footing:
void replaceSpace(string& s)
{
int len = s.length();
string str = "%20";
for(int i = len - 1; i >= 0; i--) {
if(s[i] == ' ') {
s.replace(i, 1, str);
}
}
}

You're growing the string but only looping to its initial size.
Looping over a collection while modifying it is very prone to error.
Here's a solution that doesn't:
void replace(string& s)
{
string s1;
std::for_each(s.begin(),
s.end(),
[&](char c) {
if (c == ' ') s1 += "%20";
else s1 += c;
});
s.swap(s1);
}

As others have already mentioned, the problem is you're using the initial string length in your loop, but the string gets bigger along the way. Your loop never reaches the end of the string.
You have a number of ways to fix this. You can correct your solution and make sure you go to the end of the string as it is now, not as it was before you started looping.
Or you can use @molbdnilo's way, which creates a copy of the string along the way.
Or you can use something like this:
std::string input = " a b c e f g ";
std::string::size_type pos = 0;
while ((pos = input.find(' ', pos)) != std::string::npos)
{
input.replace(pos, 1, "%20");
}

Here's a function that can make it easier for you:
string replace_char_str(string str, string find_str, string replace_str)
{
size_t pos = 0;
for ( pos = str.find(find_str); pos != std::string::npos; pos = str.find(find_str,pos) )
{
str.replace(pos, find_str.length(), replace_str);
pos += replace_str.length(); // continue after the inserted text (no endless loop if replace_str contains find_str)
}
return str;
}
So when you want to replace the spaces, call it like this:
string new_str = replace_char_str(yourstring, " ", "%20");
Hope this helps! :)
