wordexp and strings with spaces

I am trying to expand variables in a string that contains a Unix file path. For example, the string is:
std::string path = "$HOME/Folder  With  Two  Spaces  Next  To  Each  Other";
This is the wordexp code I use:
#include <wordexp.h>
#include <string>
#include <iostream>
std::string env_subst(const std::string &path)
{
std::string result = "";
wordexp_t p;
if (!::wordexp(path.c_str(), &p, 0))
{
if (p.we_wordc >= 1)
{
result = std::string(p.we_wordv[0]);
for (uint32_t i = 1; i < p.we_wordc; ++i)
{
result += " " + std::string(p.we_wordv[i]);
}
}
::wordfree(&p);
return result;
}
else
{
// Illegal chars found
return path;
}
}
int main()
{
std::string teststring = "$HOME/Folder  With  Two  Spaces  Next  To  Each  Other";
std::string result = env_subst(teststring);
std::cout << "Result: " << result << std::endl;
return 0;
}
The output is:
Result: /home/nidhoegger/Folder With Two Spaces Next To Each Other
As you can see, there were two spaces between the words in the input, but there is only a single space between them in the output.
Is there an easy way to fix that?

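Your code collapses the double spaces because the for loop adds exactly one space between words, regardless of how many spaces separated them in the input. One possible fix is to record every run of spaces in the original path beforehand and add them back in. For example, you could use something like this (it needs <vector>):
std::vector<std::string> spaces;
std::size_t pos = path.find(' ');
while (pos != std::string::npos) {
    // [pos, end) is one run of consecutive spaces
    std::size_t end = path.find_first_not_of(' ', pos);
    if (end == std::string::npos)
        end = path.size();
    spaces.push_back(path.substr(pos, end - pos));
    pos = path.find(' ', end);
}
This walks through path with std::string::find and stores each run of spaces in a vector. Then you could change the line in your for loop to
result += spaces[i-1] + std::string(p.we_wordv[i]);
to add back the appropriate number of spaces. Note that this assumes every run of spaces in the input corresponds to exactly one word boundary in the wordexp output. That does not hold if the path contains quotes or if a variable expands to a value that itself contains spaces, and spaces[i-1] is then out of step or out of range.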

If you want to preserve spaces in an unusually named file, enclose it in double quotes: std::string teststring = "\"~/filename with spaces\"";. (Note that ~ is not expanded inside quotes, so use $HOME there if you need the home directory.) But it doesn't make sense to record how many spaces there were in the original string, since you would have to skip quoted sections and basically redo what wordexp() does. Extra spaces between the words of a command are not significant: ls -al is exactly the same as ls     -al, so the collapsing is justified. And the OP's code is perfectly valid; there is no need to add anything else.
P.S. Decided to add this as a note because I fell in the same pit as OP.
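
For what it's worth, the quoting idea can be applied programmatically. Below is a minimal sketch, assuming the whole string is a single path rather than a command line: it wraps the input in double quotes before calling wordexp, so variables are still expanded but no word splitting takes place. The name expand_keep_spaces is hypothetical; WRDE_NOCMD is the POSIX flag that makes wordexp reject command substitution. As noted above, ~ is not expanded inside quotes.

```cpp
#include <wordexp.h>
#include <string>

// Hypothetical helper: expand $VARIABLES in `path` while keeping all
// whitespace exactly as written. Returns `path` unchanged on failure.
std::string expand_keep_spaces(const std::string &path)
{
    // Build a double-quoted copy of the input. Inside double quotes only
    // '"', '\' and '`' need a backslash to be taken literally; '$' is left
    // alone on purpose so that variables are still expanded.
    std::string quoted = "\"";
    for (char c : path) {
        if (c == '"' || c == '\\' || c == '`')
            quoted += '\\';
        quoted += c;
    }
    quoted += '"';

    wordexp_t p;
    if (::wordexp(quoted.c_str(), &p, WRDE_NOCMD) != 0)
        return path; // expansion failed (e.g. command substitution rejected)

    // A fully quoted string expands to at most one word.
    std::string result = p.we_wordc > 0 ? p.we_wordv[0] : "";
    ::wordfree(&p);
    return result;
}
```

Calling expand_keep_spaces on the string from the question should then return the expanded path with the double spaces still in place.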

Related

Sub-strings and delimiters

I'm trying to get a sentence delimited by certain characters (either a space, comma, or a dot) to check if it's a palindrome. If the input is "hello,potato.", I'll check this symmetry on "hello" alone and then potato alone.
The problem is, while I'm doing the first iteration of the loop that searches for the delimiter, the word "hello" is stored in the sub-sentence, but on the second iteration the word that should be stored as "potato" will be "potato.". And I am unable to remove the "." delimiter from the end of the input string.
for(int i=0;i<sentence.length();i++)
{
if(sentence[i]==' '||sentence[i]=='.'||sentence[i]==',')
{ //the couts is just to help me debug/trace
cout<<"i is now : "<<i<<endl;
if(i==delindex && i==sentence.length()-1)
{
subsentence=sentence.substr(temp+1,subsentence.length()-1);
}
else
{
subsentence=sentence.substr(delindex,i);
cout<<subsentence<<endl;
temp=delindex-1;
delindex=i+1;
}
}
}
What would be the best way to go about this?

god bless you man that strtok is what i have been looking for
Actually, you don't need strtok (and should probably avoid it for various safety reasons), as std::string has a handy method called find_first_of which acts much like strtok: it accepts a set of characters and returns the index of the first one it finds. However, to make a robust tokenizer, a combination of find_first_of and find_first_not_of is more suitable in this case.
Therefore you could simplify your token searching to:
#include <iostream>
#include <string>
int main()
{
std::string sentence = "hello,potato tomato.";
std::string delims = " .,";
size_t beg, pos = 0;
while ((beg = sentence.find_first_not_of(delims, pos)) != std::string::npos)
{
pos = sentence.find_first_of(delims, beg + 1);
std::cout << sentence.substr(beg, pos - beg) << std::endl;
}
}
https://ideone.com/rhMyvG
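
To connect this back to the palindrome check in the question, here is a small sketch that wraps the same find_first_not_of / find_first_of loop in a function and tests each token separately. The names split_tokens and is_palindrome are made up for illustration, and the comparison is case-sensitive by assumption.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Split `text` on any character in `delims`, skipping empty tokens.
std::vector<std::string> split_tokens(const std::string &text,
                                      const std::string &delims = " .,")
{
    std::vector<std::string> tokens;
    std::size_t beg = 0, pos = 0;
    while ((beg = text.find_first_not_of(delims, pos)) != std::string::npos) {
        pos = text.find_first_of(delims, beg + 1);
        // If pos is npos, substr simply takes the rest of the string.
        tokens.push_back(text.substr(beg, pos - beg));
    }
    return tokens;
}

// True if `word` reads the same forwards and backwards (case-sensitive).
bool is_palindrome(const std::string &word)
{
    // Compare the first half with the reversed second half.
    return std::equal(word.begin(), word.begin() + word.size() / 2,
                      word.rbegin());
}
```

With the input "hello,potato tomato." the tokens come out without the trailing dot, so each one can be passed to is_palindrome directly.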

How to count how many words are in a line? Smarter way?

How can I find out how many words are in a line? I know the method where you count the spaces. But what if someone types two spaces in a row, or starts the line with a space?
Is there any other or smarter way to solve this?
And are there any remarks on my approach or my code?
I solved it like this:
#include <iostream>
#include <cctype>
#include <cstring>
using namespace std;
int main( )
{
char str[80];
cout << "Enter a string: ";
cin.getline(str,80);
int len;
len=strlen(str);
int words = 0;
for(int i = 0; str[i] != '\0'; i++) //is space after character
{
if (isalpha(str[i]))
{
if(isspace(str[i+1]))
words++;
}
}
if(isalpha(str[len]))
{
words++;
}
cout << "The number of words = " << words+1 << endl;
return 0;
}

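The standard-library version is nearly a one-liner (it needs <sstream>, <iterator> and <string>; the stream has to be a named object because istream_iterator takes it by non-const reference):
istringstream iss(str);
words = distance(istream_iterator<string>(iss), istream_iterator<string>());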

Streams skip whitespace by default (including runs of several spaces).
So if you do something like:
string word;
int numWords = 0;
while (cin >> word) ++numWords;
That should count the number of words for simple cases (not considering what the format of a word is, skipping spaces).
If you want per line, you could read first the line, create a stream from a string, and do a similar thing like this:
string line, word;
int wordCount = 0;
getline(cin, line);
stringstream lineStream(line);
while (lineStream >> word) ++wordCount;
You should not use cin.getline; prefer the free function std::getline, which takes a std::string that grows as needed and so avoids overflowing a fixed-size buffer. Stick to the free function for better safety.

First, you need a very specific definition of "word." Most of the answers will give slightly different counts than your attempt because you're using different definitions of what constitutes a word. Your example specifically requires alpha characters in certain positions. The answers based on streams will allow any non-space character to be part of a word.
The general solution is to come up with a precise definition of a word, transform this into a regular expression or finite state machine, and then count each instance of a match.
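
As a sketch of the regular-expression route, assuming a "word" is defined as a maximal run of ASCII letters (that definition and the name count_words_regex are assumptions for illustration, not the only reasonable choice):

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Count words, where a "word" is assumed to be a maximal run of ASCII letters.
std::size_t count_words_regex(const std::string &line)
{
    static const std::regex word_pattern("[A-Za-z]+");
    // Iterate over all non-overlapping matches and count them.
    auto first = std::sregex_iterator(line.begin(), line.end(), word_pattern);
    auto last = std::sregex_iterator();
    return static_cast<std::size_t>(std::distance(first, last));
}
```

Under this definition "hyphenated-word" counts as two words and "42" is not counted at all; changing the pattern changes the definition, which is the point of stating it precisely first.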
Here's a sample state machine solution:
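#include <cctype>
#include <cstddef>
#include <string>
std::size_t CountWords(const std::string &line) {
    std::size_t count = 0;
    enum { between_words, in_word } state = between_words;
    for (const auto c : line) {
        // Cast first: passing a negative char to the <cctype> functions is undefined.
        const auto uc = static_cast<unsigned char>(c);
        switch (state) {
        case between_words:
            if (std::isalpha(uc)) {
                state = in_word;
                ++count;
            }
            break;
        case in_word:
            if (std::isspace(uc)) state = between_words;
            break;
        }
    }
    return count;
}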
Some test cases to consider (and that highlight the differences among the definitions of a word):
"" empty string
" " just spaces
"a"
" one "
"count two"
"hyphenated-word"
"\"That's Crazy!\" she said." punctuation between alpha characters and adjacent spaces
"the answer is 42" should the number count as a word?

Splitting sentences and placing in vector

I was given a code from my professor that takes multiple lines of input. I am currently changing the code for our current assignment and I came across an issue. The code is meant to take strings of input and separate them into sentences from periods and put those strings into a vector.
vector<string> words;
string getInput() {
string s = ""; // string to return
bool cont = true; // loop control.. continue is true
while (cont){ // while continue
string l; // string to hold a line
cin >> l; // get line
char lastChar = l.at(l.size()-1);
if(lastChar=='.') {
l = l.substr(0, l.size()-1);
if(l.size()>0){
words.push_back(s);
s = "";
}
}
if (lastChar==';') { // use ';' to stop input
l = l.substr(0, l.size()-1);
if (l.size()>0)
s = s + " " + l;
cont = false; // set loop control to stop
}
else
s = s + " " + l; // add line to string to return
// add a blank space to prevent
// making a new word from last
// word in string and first word
// in line
}
return s;
}
int main()
{
cout << "Input something: ";
string s = getInput();
cout << "Your input: " << s << "\n" << endl;
for(int i=0; i<words.size(); i++){
cout << words[i] << "\n";
}
}
The code puts strings into a vector but takes the last word of the sentence and attaches it to the next string and I cannot seem to understand why.

This line
s = s + " " + l;
will always execute, except for the end of input, even if the last character is '.'. You are most likely missing an else between the two if-s.
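
To make the control flow concrete, here is a restructured sketch (not the professor's code) in which each word is appended to the current sentence first, and only then is the sentence stored if the word ended in '.'. The function name split_sentences is hypothetical, and it reads from a string instead of cin so it is easy to try out; the ';' terminator is kept from the question.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split `text` into sentences. A word ending in '.' closes a sentence;
// a word ending in ';' stops processing (as in the question).
std::vector<std::string> split_sentences(const std::string &text)
{
    std::vector<std::string> sentences;
    std::istringstream in(text);
    std::string word, current;
    while (in >> word) {
        const char last = word.back();      // word is never empty after >>
        const bool ends_sentence = (last == '.');
        const bool ends_input = (last == ';');
        if (ends_sentence || ends_input)
            word.pop_back();                // drop the terminator itself
        if (!word.empty()) {
            if (!current.empty()) current += ' ';
            current += word;                // append BEFORE storing the sentence
        }
        if (ends_sentence && !current.empty()) {
            sentences.push_back(current);
            current.clear();
        }
        if (ends_input) break;
    }
    if (!current.empty())
        sentences.push_back(current);       // trailing text without a '.'
    return sentences;
}
```

Because the word is added exactly once, before the sentence is pushed, the last word of a sentence can no longer leak into the next one.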

You have:
string l; // string to hold a line
cin >> l; // get line
The second line does not read a whole line: operator>> reads a single whitespace-delimited token, so it only returns the full line when the line contains no whitespace. To read a line of text, use:
std::getline(std::cin, l);
It's hard to tell whether that is tripping your code up, since you haven't posted any sample input.

I would at least consider doing this job somewhat differently. Right now, you're reading a word at a time, then putting the words back together until you get to a period.
One possible alternative would be to use std::getline to read input until you get to a period, and put the whole string into the vector at once. Code to do the job this way could look something like this:
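#include <iostream>
#include <string>
#include <algorithm>
#include <vector>
#include <iterator>
int main() {
    std::vector<std::string> s;
    std::string temp;
    while (std::getline(std::cin, temp, '.')) {
        // Trim leading whitespace and skip pieces that are empty or all
        // whitespace (e.g. the newline left over after the final period);
        // calling substr with npos on such a piece would throw.
        std::string::size_type first = temp.find_first_not_of(" \t\n");
        if (first != std::string::npos)
            s.push_back(temp.substr(first));
    }
    std::copy(s.begin(), s.end(),
              std::ostream_iterator<std::string>(std::cout, ".\n"));
}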
This does behave differently in one circumstance--if you have a period somewhere other than at the end of a word, the original code will ignore that period (won't treat it as the end of a sentence) but this will. The obvious place this would make a difference would be if the input contained a number with a decimal point (e.g., 1.234), which this would break at the decimal point, so it would treat the 1 as the end of one sentence, and the 234 as the beginning of another. If, however, you don't need to deal with that type of input, this can simplify the code considerably.
If the sentences might contain decimal points, then I'd probably write the code more like this:
#include <iostream>
#include <string>
#include <algorithm>
#include <vector>
#include <iterator>
class sentence {
std::string data;
public:
friend std::istream &operator>>(std::istream &is, sentence &s) {
std::string temp, word;
while (is >> word) {
temp += word + ' ';
if (word.back() == '.')
break;
}
s.data = temp;
return is;
}
operator std::string() const { return data; }
};
int main() {
std::copy(std::istream_iterator<sentence>(std::cin),
std::istream_iterator<sentence>(),
std::ostream_iterator<std::string>(std::cout, "\n"));
}
Although somewhat longer and more complex, at least to me it still seems (considerably) simpler than the code in the question. I guess it's different in one way--it detects the end of the input by...detecting the end of the input, rather than depending on the input to contain a special delimiter to mark the end of the input. If you're running it interactively, you'll typically need to use a special key combination to signal the end of input (e.g., Ctrl+D on Linux/Unix, or F6 on Windows).
In any case, it's probably worth considering a fundamental difference between this code and the code in the question: this defines a sentence as a type, where the original code just leaves everything as strings, and manipulates strings. This defines an operator>> for a sentence, that reads a sentence from a stream as we want it read. This gives us a type we can manipulate as an object. Since it's like a string in other ways, we provide a conversion to string so once you're done reading one from a stream, you can just treat it as a string. Having done that, we can (for example) use a standard algorithm to read sentences from standard input, and write them to standard output, with a new-line after each to separate them.

Struggling with a textcounter

I tried to write myself a textcounter which tells me how many characters and words are in a piece of text. Every time I try to paste in a long piece of text for it to count, it will crash or display something random.
Does anyone have any suggestions?
This is what I have written:
#include <iostream>
#include <string>
using namespace std;
int main()
{
cout << "Text counter\nPlease insert text.\n";
string text = "";
getline(cin, text);
double countTotal = text.size();
cout << "Total characters: " << countTotal << "\n";
int wordCount = 1;
for (int chrSearch = 0; chrSearch < (int)text.size(); chrSearch++)
{
char chr = text.at(chrSearch);
if(chr == ' ')
{
wordCount++;
}
}
cout << "Total words: " << wordCount << "\n";
return 0;
}

First of all, the code reads at most one line: std::getline(std::cin, text) stops reading at the first newline. You can specify a different delimiter character instead; for example, '\0' is unlikely to be present in typical text, so you could use:
std::string text;
if (std::getline(std::cin, text, '\0')) {
// do something with the read text
}
You should also always check that input was successful. While the above would work with short texts, when the texts become large it makes more sense to read them one line at a time and eventually reading a line will fail when the end of the stream is reached.
In case you don't like the approach of reading everything up to a null character, you could read the entire stream using code like this:
std::istreambuf_iterator<char> it(std::cin), end;
std::string text(it, end);
if (!text.empty()) {
// do something with the read text
}
A few notes on the other parts of the code:
Don't use double where you mean to use an integer. You may want to use a bigger integer, e.g., unsigned long or unsigned long long but double is for floating point values.
When iterating through a sequence by index, you should use an unsigned integer type, e.g., unsigned int or std::size_t. This way there is no need to cast the size(). Preferably you'd use iterators:
for (auto it(text.begin()), end(text.end()); it != end; ++it) {
char chr(*it);
// ...
}
or
for (char chr: text) {
// ...
}
Note that your word count is wrong if there are two consecutive spaces. Also, if you don't break your text using line breaks, you need to use '\n' as an additional whitespace character separating words. If you want to consider all spaces, you should actually use something like this to determine if a character is a space:
if (std::isspace(static_cast<unsigned char>(chr))) { ... }
The static_cast<unsigned char>(chr) is needed because char tends to be signed and passing a negative value to std::isspace() results in undefined behavior. Casting the character to unsigned char avoids the problem. Note that negative characters are not entirely uncommon: for example, the second character of my last name (the u-umlaut 'ü') normally results in a negative char, e.g., when UTF-8 or ISO-Latin-1 encoding is used.
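
Putting the last two notes together, a word counter that tolerates consecutive spaces and line breaks could look like the sketch below. It assumes a word is any maximal run of non-whitespace characters; the function name count_words is made up for the example.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Count words as maximal runs of non-whitespace characters.
std::size_t count_words(const std::string &text)
{
    std::size_t words = 0;
    bool in_word = false;
    for (char chr : text) {
        // Cast before calling isspace to avoid undefined behavior
        // for negative char values.
        const bool space = std::isspace(static_cast<unsigned char>(chr)) != 0;
        if (!space && !in_word)
            ++words;            // first character of a new word
        in_word = !space;
    }
    return words;
}
```

Unlike counting spaces and adding one, this returns 0 for empty or all-whitespace input and is not thrown off by double spaces.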

std::string manipulation: whitespace, "newline escapes '\'" and comments #

Kind of looking for affirmation here. I have some hand-written code, which I'm not shy to say I'm proud of, which reads a file, removes leading whitespace, processes newline escapes '\' and removes comments starting with #. It also removes all empty lines (also whitespace-only ones). Any thoughts/recommendations? I could probably replace some std::cout's with std::runtime_errors... but that's not a priority here :)
const int RecipeReader::readRecipe()
{
ifstream is_recipe(s_buffer.c_str());
if (!is_recipe)
cout << "unable to open file" << endl;
while (getline(is_recipe, s_buffer))
{
// whitespace+comment
removeLeadingWhitespace(s_buffer);
processComment(s_buffer);
// newline escapes + append all subsequent lines with '\'
processNewlineEscapes(s_buffer, is_recipe);
// store the real text line
if (!s_buffer.empty())
v_s_recipe.push_back(s_buffer);
s_buffer.clear();
}
is_recipe.close();
return 0;
}
void RecipeReader::processNewlineEscapes(string &s_string, ifstream &is_stream)
{
string s_temp;
size_t sz_index = s_string.find_first_of("\\");
while (sz_index <= s_string.length())
{
if (getline(is_stream,s_temp))
{
removeLeadingWhitespace(s_temp);
processComment(s_temp);
s_string = s_string.substr(0,sz_index-1) + " " + s_temp;
}
else
cout << "Error: newline escape '\' found at EOF" << endl;
sz_index = s_string.find_first_of("\\");
}
}
void RecipeReader::processComment(string &s_string)
{
size_t sz_index = s_string.find_first_of("#");
s_string = s_string.substr(0,sz_index);
}
void RecipeReader::removeLeadingWhitespace(string &s_string)
{
const size_t sz_length = s_string.size();
size_t sz_index = s_string.find_first_not_of(" \t");
if (sz_index <= sz_length)
s_string = s_string.substr(sz_index);
else if ((sz_index > sz_length) && (sz_length != 0)) // "empty" lines with only whitespace
s_string.clear();
}
Some extra info: the first s_buffer passed to the ifstream contains the filename, std::string s_buffer is a class data member, so is std::vector v_s_recipe. Any comment is welcome :)
UPDATE: for the sake of not being ungrateful, here is my replacement, all-in-one function that does what I want for now (future holds: parenthesis, maybe quotes...):
void readRecipe(const std::string &filename)
{
string buffer;
string line;
size_t index;
ifstream file(filename.c_str());
if (!file)
throw runtime_error("Unable to open file.");
while (getline(file, line))
{
// whitespace removal
line.erase(0, line.find_first_not_of(" \t\r\n\v\f"));
// comment removal TODO: store these for later output
index = line.find_first_of("#");
if (index != string::npos)
line.erase(index, string::npos);
// ignore empty buffer
if (line.empty())
continue;
// process newline escapes
index = line.find_first_of("\\");
if (index != string::npos)
{
line.erase(index,string::npos); // ignore everything after '\'
buffer += line;
continue; // read next line
}
else // no newline escapes found
{
buffer += line;
recipe.push_back(buffer);
buffer.clear();
}
}
}

Definitely ditch the Hungarian notation.

It's not bad, but I think you're thinking of std::basic_string<T> too much as a string and not enough as an STL container. For example:
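void RecipeReader::removeLeadingWhitespace(string &s_string)
{
    // Erase everything up to the first non-whitespace character
    // (needs <algorithm> and <cctype>). A lambda is used because
    // std::not1(isspace) does not compile as written.
    s_string.erase(s_string.begin(),
        std::find_if(s_string.begin(), s_string.end(),
            [](unsigned char c) { return !std::isspace(c); }));
}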

A few comments:
As another answer (+1 from me) said, ditch the Hungarian notation. It really doesn't do anything but add unimportant clutter to every line. In addition, giving an ifstream an is_ prefix is ugly; is_ usually indicates a boolean.
Naming a function processXXX gives very little information about what it actually does. Use removeXXX, like you did with the removeLeadingWhitespace function.
The processComment function does an unnecessary copy and assignment. Use s.erase(index, string::npos); (npos is default, but this is more obvious).
It's not clear what your program does, but you might consider storing a different file format (like html or xml) if you need to post-process your files like this. That would depend on the goal.
Using find_first_of('#') may give you some false positives. If the # appears inside quotes, it does not necessarily indicate a comment. (But again, this depends on your file format.)
Using find_first_of(c) with one character can be simplified to find(c).
The processNewlineEscapes function duplicates some functionality from the readRecipe function. You may consider refactoring to something like this:
string s_buffer;
string s_line;
while (getline(is_recipe, s_line)) {
    // Sanitize the raw line.
    removeLeadingWhitespace(s_line);
    removeComments(s_line);
    // Skip empty lines.
    if (s_line.empty()) continue;
    // Collect the buffer across all escaped lines, dropping the trailing '\'.
    if (*s_line.rbegin() == '\\') {
        s_buffer.append(s_line, 0, s_line.size() - 1);
        continue;
    }
    // This line is not escaped, so the buffer is complete and can be processed.
    s_buffer += s_line;
    v_s_recipe.push_back(s_buffer);
    s_buffer.clear();
}

I'm not big on methods that modify the parameters. Why not return strings rather than modifying the input arguments? For example:
string RecipeReader::processComment(const string &s)
{
size_t index = s.find_first_of("#");
return s.substr(0, index);
}
I personally feel this clarifies intent and makes it more obvious what the method is doing.

I'd consider replacing all your processing code (almost everything you've written) with boost::regex code.

A few comments:
If s_buffer contains the file name to be opened, it should have a better name like s_filename.
The s_buffer member variable should not be reused to store temporary data from reading the file. A local variable in the function would do as well, no need for the buffer to be a member variable.
If there is no need to have the filename stored as a member variable, it could just be passed as a parameter to readRecipe().
processNewlineEscapes() should check that the backslash it found is at the end of the line before appending the next line. At the moment a backslash at any position triggers appending the next line at the position of the backslash. Also, if there are several backslashes, find_last_of() would probably be easier to use than find_first_of().
When checking the result of find_first_of() in processNewlineEscapes() and removeLeadingWhitespace() it would be cleaner to compare against string::npos to check if anything was found.
The logic at the end of removeLeadingWhitespace() could be simplified:
size_t sz_index = s_string.find_first_not_of(" \t");
if (sz_index != s_string.npos)
s_string = s_string.substr(sz_index);
else // "empty" lines with only whitespace
s_string.clear();
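
To illustrate the end-of-line check suggested above, here is a sketch that joins continuation lines only when the backslash is the last non-blank character of a line. It works on a vector of already-read lines to stay self-contained; the name join_continuations and the choice to insert a single space at the join are assumptions for the example.

```cpp
#include <string>
#include <vector>

// Join every line whose last non-blank character is a backslash
// with the line that follows it.
std::vector<std::string> join_continuations(const std::vector<std::string> &lines)
{
    std::vector<std::string> joined;
    std::string buffer;
    for (const std::string &line : lines) {
        const std::size_t last = line.find_last_not_of(" \t");
        if (last != std::string::npos && line[last] == '\\') {
            // Continuation: keep the text before the backslash and add one
            // space so tokens from adjacent lines do not run together.
            buffer += line.substr(0, last);
            buffer += ' ';
            continue;
        }
        // A backslash anywhere else in the line is left untouched.
        buffer += line;
        joined.push_back(buffer);
        buffer.clear();
    }
    if (!buffer.empty())
        joined.push_back(buffer);   // input ended on a continuation line
    return joined;
}
```

A backslash in the middle of a line, as in a Windows-style path, then no longer pulls in the following line.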

You might wish to have a look at the Boost String Algorithms library. It's a simple collection of algorithms for working with strings, and notably features trim functions :)
Now, on to the review itself:
Don't bother to remove the Hungarian notation; if it's your style, then use it. However, you should try to improve the names of methods and variables. processXXX is definitely not indicating anything useful...
Functionally, I am worried about your assumptions: the main issue here is that you do not account for escape sequences (\n uses a backslash, for example) and you do not account for string literals: std::cout << "Process #" << pid << std::endl; would yield an invalid line because of your "comment" preprocessing.
Furthermore, since you remove the comments before processing the newline escapes:
i = 3; # comment \
running comment
will be parsed as
i = 3; running comment
which is syntactically incorrect.
From an interface point of view: there is no benefit in having these methods as class members here; you don't really need an instance of RecipeReader...
And finally, I find it awkward that two methods would read from the stream.
Little peeve of mine: returning by const value does not serve any purpose.
Here is my own version, as I believe that showing is easier than discussing:
// header file
#include <fstream>
#include <string>
#include <utility>
#include <vector>
std::vector<std::string> readRecipe(const std::string& fileName);
std::string extractLine(std::ifstream& file);
std::pair<std::string,bool> removeNewlineEscape(const std::string& line);
std::string removeComment(const std::string& line);
// source file
#include <iostream>
#include <boost/algorithm/string.hpp>
std::vector<std::string> readRecipe(const std::string& fileName)
{
    std::vector<std::string> result;
    std::ifstream file(fileName.c_str());
    if (!file) std::cout << "Could not open: " << fileName << std::endl;
    // Note: the loop stops at the first line that is empty after processing
    std::string line = extractLine(file);
    while(!line.empty())
    {
        result.push_back(line);
        line = extractLine(file);
    } // looping on the lines
    return result;
} // readRecipe
std::string extractLine(std::ifstream& file)
{
    std::string line, buffer;
    while(getline(file, buffer))
    {
        std::pair<std::string,bool> r = removeNewlineEscape(buffer);
        line += boost::trim_left_copy(r.first); // remove leading whitespace
                                                // based on the current locale
        if (!r.second) break;
        line += " "; // as we append, we insert a whitespace
                     // in order to avoid unintended token concatenation
    }
    return removeComment(line);
} // extractLine
//< Returns the line, minus the '\' character
//< if it was the last significant one
//< Returns a boolean indicating whether or not the line continues
//< (true if it's necessary to concatenate with the next line)
std::pair<std::string,bool> removeNewlineEscape(const std::string& line)
{
    std::pair<std::string,bool> result(line, false);
    size_t pos = line.find_last_not_of(" \t");
    if (std::string::npos != pos && line[pos] == '\\')
    {
        result.second = true;
        // we don't want to have this '\' character in the string
        result.first = line.substr(0, pos);
    }
    return result;
} // removeNewlineEscape
//< The main difficulty here is NOT to confuse a # inside a string
//< with a # signalling a comment
//< assuming strings are contained within "", let's roll
std::string removeComment(const std::string& line)
{
    size_t pos = line.find_first_of("\"#");
    while(std::string::npos != pos)
    {
        if (line[pos] == '"')
        {
            // We have detected the beginning of a string, we move pos to its end
            // beware of the tricky presence of a '\' right before '"'...
            pos = line.find_first_of("\"", pos+1);
            while (std::string::npos != pos && line[pos-1] == '\\')
                pos = line.find_first_of("\"", pos+1);
            // Unterminated string: keep the whole line
            if (std::string::npos == pos) break;
        }
        else // line[pos] == '#'
        {
            // We have found the comment marker in a significant position
            break;
        }
        pos = line.find_first_of("\"#", pos+1);
    } // looking for comment marker
    return line.substr(0, pos);
} // removeComment
It is fairly inefficient (but I trust the compiler for optimizations), but I believe it behaves correctly, though it's untested, so take it with a grain of salt. I have focused mainly on solving the functional issues; the naming convention I follow is different from yours, but I don't think it should matter.

I want to point out a small and sweet version which lacks \ support but skips whitespace-only lines and comments. (Note the std::ws in the call to std::getline.)
#include <algorithm>
#include <iostream>
#include <sstream>
#include <string>
int main()
{
std::stringstream input(
" # blub\n"
"# foo bar\n"
" foo# foo bar\n"
"bar\n"
);
std::string line;
while (std::getline(input >> std::ws, line)) {
line.erase(std::find(line.begin(), line.end(), '#'), line.end());
if (line.empty()) {
continue;
}
std::cout << "line: \"" << line << "\"\n";
}
}
Output:
line: "foo"
line: "bar"
