Trimming internal whitespace in std::string - c++

I'm looking for an elegant way to transform an std::string from something like:
std::string text = " a\t very \t ugly \t\t\t\t string ";
To:
std::string text = "a very ugly string";
I've already trimmed the external whitespace with boost::trim(text);
[edit]
Thus, multiple whitespaces, and tabs, are reduced to just one space
[/edit]
Removing the external whitespace is trivial. But is there an elegant way of removing the internal whitespace that doesn't involve manual iteration and comparison of previous and next characters? Perhaps something in boost I have missed?

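You can use std::unique together with erase and ::isspace to collapse each run of whitespace characters into a single character:
text.erase(std::unique(std::begin(text), std::end(text), [](unsigned char c, unsigned char c2) {
return ::isspace(c) && ::isspace(c2);
}), std::end(text));
// std::unique keeps the first character of each run, which may be a tab; turn it into a space
std::replace_if(std::begin(text), std::end(text), [](unsigned char c) { return ::isspace(c) != 0; }, ' ');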
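A self-contained sketch of the same idea, for reference. The helper name collapse_whitespace is hypothetical, and the order is swapped (normalize first, then deduplicate) so that the predicate passed to std::unique only has to compare spaces. Leading and trailing whitespace is reduced to one space, not removed, so it still assumes the string has been trimmed separately.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: collapses every run of whitespace to a single ' '.
// Leading/trailing whitespace is reduced to one space, not removed.
std::string collapse_whitespace(std::string text) {
    // Turn tabs, newlines, etc. into plain spaces first.
    std::replace_if(text.begin(), text.end(),
                    [](unsigned char c) { return std::isspace(c) != 0; }, ' ');
    // Drop every space that directly follows another space.
    text.erase(std::unique(text.begin(), text.end(),
                           [](char a, char b) { return a == ' ' && b == ' '; }),
               text.end());
    return text;
}
```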

std::istringstream iss(text);
text = "";
std::string s;
while(iss >> s){
if ( text != "" ) text += " " + s;
else text = s;
}
//use text, extra whitespaces are removed from it

Most of what I'd do is similar to what @Nawaz already posted -- read strings from an istringstream to get the data without whitespace, and then insert a single space between each of those strings. However, I'd use an infix_ostream_iterator from a previous answer to get (IMO) slightly cleaner/clearer code.
std::istringstream buffer(input);
std::copy(std::istream_iterator<std::string>(buffer),
std::istream_iterator<std::string>(),
infix_ostream_iterator<std::string>(result, " "));
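The infix_ostream_iterator used above is not part of the standard library, and the earlier answer it comes from is not reproduced here. What follows is a minimal sketch of such an iterator, written under the assumption that it only needs to emit the delimiter between items, plus a hypothetical squeeze() wrapper showing how it could be used:

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>

// Sketch of an output iterator that writes a delimiter *between* items
// (unlike std::ostream_iterator, which writes it after every item).
template <class T>
class infix_ostream_iterator {
public:
    using iterator_category = std::output_iterator_tag;
    using value_type = void;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = void;

    infix_ostream_iterator(std::ostream& os, const char* delimiter)
        : os_(&os), delimiter_(delimiter) {}

    infix_ostream_iterator& operator=(const T& item) {
        if (!first_) *os_ << delimiter_;  // no delimiter before the first item
        *os_ << item;
        first_ = false;
        return *this;
    }
    infix_ostream_iterator& operator*() { return *this; }
    infix_ostream_iterator& operator++() { return *this; }
    infix_ostream_iterator& operator++(int) { return *this; }

private:
    std::ostream* os_;
    const char* delimiter_;
    bool first_ = true;
};

// Hypothetical wrapper: trims and collapses whitespace in one pass.
std::string squeeze(const std::string& input) {
    std::istringstream buffer(input);
    std::ostringstream result;
    std::copy(std::istream_iterator<std::string>(buffer),
              std::istream_iterator<std::string>(),
              infix_ostream_iterator<std::string>(result, " "));
    return result.str();
}
```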

#include <boost/algorithm/string/trim_all.hpp>
string s;
boost::algorithm::trim_all(s);

If you check out https://svn.boost.org/trac/boost/ticket/1808, you'll see a request for (almost) this exact functionality, and a suggested implementation:
std::string trim_all ( const std::string &str ) {
return boost::algorithm::find_format_all_copy(
boost::trim_copy(str),
boost::algorithm::token_finder (boost::is_space(),boost::algorithm::token_compress_on),
boost::algorithm::const_formatter(" "));
}

Here is a possible version using regular expressions. My GCC 4.6 doesn't have regex_replace yet, but Boost.Regex can serve as a drop-in replacement:
#include <string>
#include <iostream>
// #include <regex>
#include <boost/regex.hpp>
#include <boost/algorithm/string/trim.hpp>
int main() {
using namespace std;
using namespace boost;
string text = " a\t very \t ugly \t\t\t\t string ";
trim(text);
regex pattern{"[[:space:]]+", regex_constants::egrep};
string result = regex_replace(text, pattern, " ");
cout << result << endl;
}
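On compilers that do ship a working <regex>, the same thing can probably be done without Boost. A minimal sketch, assuming the default ECMAScript grammar; the helper name collapse_with_regex is hypothetical, and the trimming is folded in as a second replacement rather than done with boost::trim:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: trims the ends, then collapses internal whitespace runs.
std::string collapse_with_regex(const std::string& text) {
    static const std::regex edges("^\\s+|\\s+$");  // leading or trailing whitespace
    static const std::regex runs("\\s+");          // any run of whitespace
    const std::string trimmed = std::regex_replace(text, edges, "");
    return std::regex_replace(trimmed, runs, " ");
}
```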

Related

Splitting string with colons and spaces?

So I've made my code work for separating the string:
String c;
for (int j = 0 ; j < count; j++) {
c += ip(ex[j]);
}
return c;
}
void setup() {
Serial.begin(9600);
}
I have had no luck with this, so any help would be greatly appreciated!
I would simply add a delimiter to your tokenizer. From a strtok() description the second parameter "is the C string containing the delimiters. These may vary from one call to another".
So add a 'space' delimiter to your tokenization: ex[i] = strtok(NULL, ": "); trim any whitespace from your tokens, and throw away any empty tokens. The last two shouldn't be necessary, because the delimiters won't be part of your collected tokens.
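As a rough illustration of the delimiter change, here is a sketch that tokenizes on both ':' and ' ' with strtok. The function name and the use of std::string and std::vector are assumptions made for a desktop C++ example; the original sketch is Arduino code and is not fully shown.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: splits on any mix of ':' and ' '.
// strtok treats a run of delimiters as one separator, so no empty tokens appear.
std::vector<std::string> split_on_colons_and_spaces(std::string input) {
    std::vector<std::string> tokens;
    // strtok writes '\0' into its buffer, so it works on the by-value copy.
    for (char* tok = std::strtok(&input[0], ": "); tok != nullptr;
         tok = std::strtok(nullptr, ": ")) {
        tokens.emplace_back(tok);
    }
    return tokens;
}
```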
I'd suggest using the <regex> library if your compiler supports C++11.
#include <fstream>
#include <iostream>
#include <algorithm>
#include <iterator>
#include <regex>
const std::regex ws_re(":| +");
void printTokens(const std::string& input)
{
std::copy( std::sregex_token_iterator(input.begin(), input.end(), ws_re, -1),
std::sregex_token_iterator(),
std::ostream_iterator<std::string>(std::cout, "\n"));
}
int main()
{
const std::string text1 = "...:---:...";
std::cout<<"no whitespace:\n";
printTokens(text1);
std::cout<<"single whitespace:\n";
const std::string text2 = "..:---:... ..:---:...";
printTokens(text2);
std::cout<<"multiple whitespaces:\n";
const std::string text3 = "..:---:...      ..:---:...";
printTokens(text3);
}
The library is described on cppreference. If you are not familiar with regular expressions: in const std::regex ws_re(":| +"); the pattern matches either a ':' character or (alternation is written with the pipe symbol '|') one or more spaces ('+' means 'one or more of the preceding element'). You can then use this regular expression to tokenize any input with std::sregex_token_iterator. For cases more complex than whitespace, regex101.com is a wonderful tool. The only disadvantage I can think of is that a regex engine is likely to be slower than a simple handwritten tokenizer.

Regex works only on first occurrence?

Update: Kindly read my comment on jignatius's answer
I wrote the following code to find specific matches in a string using regex and to delete them and replace with another value, but it doesn't work as expected.
For example given the following input:
f={a,b}+{c,d}
I would expect it to delete both {a,b} and {c,d} but it only works on the first one, what is wrong with my code?
After some checking I can see that the first loop is entered only once, but why?
There is a standard library function, std::regex_replace, in the header <regex> that does what you want to do: text replacement based on a regex. That will simplify things quite a lot for you compared with a hand-crafted loop.
You just need to supply the input string, the regex to match against, and the replacement string:
#include <iostream>
#include <regex>
#include <string>
int main()
{
std::regex reg(R"(\{[^}]*\})");
std::string mystring = "f={a,b}+{c,d}";
auto newstring = std::regex_replace(mystring, reg, "title");
std::cout << newstring; //f=title+title
}
Note: it's also easier to use a raw string literal with the format R"(literal)" to avoid using double backslashes to escape special characters in the regex.
In your comment you say that the replacement text can change. In that case you will need a loop rather than a single straightforward std::regex_replace call.
You can use std::regex_iterator, a read-only forward iterator that will call std::regex_search() for you. You can use a string stream to build the new string:
#include <iostream>
#include <regex>
#include <string>
#include <sstream>
#include <vector>
#include <iterator>
int main()
{
std::regex reg(R"(\{[^}]*\})");
std::string mystring = "f={a,b}+{c,d} + c";
std::vector<std::string> replacements = { "rep1", "rep2", "rep3" };
int i = 0;
auto start = std::sregex_iterator(mystring.begin(), mystring.end(), reg);
auto end = std::sregex_iterator{};
std::ostringstream ss;
for (std::sregex_iterator it = start; it != end; ++it)
{
std::smatch mat = *it;
ss << mat.prefix() << replacements[i++];
//If last match, stream suffix
if (std::next(it) == end)
{
ss << mat.suffix();
}
}
std::cout << ss.str(); //f=rep1+rep2 + c
}
Note that the prefix() method of the std::smatch object gives you the text between the end of the previous match (or the start of the target string, for the first match) and the beginning of the current match. You then place your replacement text into the stream. Finally, use the suffix() method of the last std::smatch object to stream any trailing text between the last match and the end of your target string.
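One edge case worth considering: if the regex never matches, a loop built on prefix() and suffix() never runs, so nothing is written at all. A possible variant that tracks the end of the last match itself is sketched below. The function name replace_each and the fallback of keeping the matched text when the replacement list runs out are assumptions for illustration, not part of the answer above.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: replaces the i-th match with replacements[i].
// If there are more matches than replacements, the extra matches are kept as-is.
std::string replace_each(const std::string& input, const std::regex& re,
                         const std::vector<std::string>& replacements) {
    std::string out;
    auto tail = input.begin();  // start of the text not yet copied
    std::size_t i = 0;
    for (std::sregex_iterator it(input.begin(), input.end(), re), end;
         it != end; ++it, ++i) {
        out.append(tail, (*it)[0].first);  // text between the previous match and this one
        out += i < replacements.size() ? replacements[i] : it->str();
        tail = (*it)[0].second;            // continue right after this match
    }
    out.append(tail, input.end());         // trailing text (or everything, if no match)
    return out;
}
```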

How to copy a file into another file but replace a word with a user entered word?

I am trying to copy a file to another file, but change a word with what the user has entered. So far I have come up with this:
while (getline(openningTheFile, line, ' ')) //line is a string and openningTheFile is an ifstream
{
if (line == wordToBeDeleted)
{
line = wordToReplaceWith;
}
if (line == "\n")
{
newFile << endl; //newFile is an ofstream
}
newFile << line << " ";
}
But the problem is that this code does not read the word after the "\n" because the delimiter is spaces.
Could anyone point me in the right direction?
Strategy I recommend:
1. Read the file line by line using std::getline.
2. Look for the string that you would like to replace in the line using std::string::find.
3. If it is found, replace it with the new string.
4. Repeat steps 2 and 3 until the string is not found.
5. Output the updated line.
Here's the core code for that:
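while (getline(openningTheFile, line))
{
std::string::size_type pos = 0;
while ( (pos = line.find(wordToBeDeleted, pos)) != std::string::npos )
{
line.replace(pos, wordToBeDeleted.length(), wordToReplaceWith);
// Continue after the replacement, so a replacement that contains the search word cannot loop forever.
pos += wordToReplaceWith.length();
}
newFile << line << '\n';
}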
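The same strategy can be written against generic streams, which makes it easy to try without touching real files. This is only a sketch: the names copy_replacing and replace_in_text are hypothetical, and the search resumes after each replacement so that a replacement containing the search word is not rescanned.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: copies `in` to `out` line by line, replacing every
// occurrence of `from` with `to`. Each output line ends with '\n'.
void copy_replacing(std::istream& in, std::ostream& out,
                    const std::string& from, const std::string& to) {
    std::string line;
    while (std::getline(in, line)) {
        std::string::size_type pos = 0;
        // An empty search string would match everywhere, so it is skipped.
        while (!from.empty() && (pos = line.find(from, pos)) != std::string::npos) {
            line.replace(pos, from.length(), to);
            pos += to.length();  // resume after the inserted text
        }
        out << line << '\n';
    }
}

// Convenience wrapper for in-memory text, handy for quick experiments.
std::string replace_in_text(const std::string& text,
                            const std::string& from, const std::string& to) {
    std::istringstream in(text);
    std::ostringstream out;
    copy_replacing(in, out, from, to);
    return out.str();
}
```

With files, an std::ifstream and an std::ofstream can be passed in place of the string streams.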
You are reading a line of text with std::getline.
You will need to find the word within the text line, replace the word, then write the text line to the output file.
One method is to use std::stringstream and operator >> to extract the word from the string.
Another is to use std::string::find to locate the position of the word.
You can do that by modifying the loop to read word-wise from the line read using std::istringstream:
while (getline(openningTheFile, line))
{
std::istringstream iss(line);
std::string word;
bool firstWord = true;
while(iss >> word)
{
// Separate words with a single space, but do not start the line with one.
if(!firstWord) {
newFile << " ";
}
firstWord = false;
if(word == wordToBeDeleted) {
newFile << wordToReplaceWith;
} else {
newFile << word;
}
}
newFile << '\n';
}
Here is a solution that uses the power of boost::iostreams to solve the task in a high level but still very flexible way.
For OP's case this is probably like using a sledgehammer to crack a nut, but if one needs flexibility or has to deal with more complex cases, it might be the right tool.
I'm using a filtering stream combined with a regular expression. This allows us to replace a search pattern with a substitution string on-the-fly, without creating any intermediate string.
#include <iostream>
#include <string>
#include <sstream>
#include <boost/iostreams/filtering_stream.hpp>
#include <boost/iostreams/filter/regex.hpp>
#include <boost/iostreams/copy.hpp>
#include <boost/regex.hpp>
namespace io = boost::iostreams;
using namespace std;
int main()
{
// Your input "file" - you may replace it by an std::ifstream object.
istringstream in( "why waste time learning learninglearning, when ignorance is instantaneous?" );
// Search pattern and substitution string.
string findRegEx( R"(\blearning\b)" );
string replaceStr( "sleeping" );
// Build a regular expression filter and attach it to the input stream.
io::filtering_istream inFiltered;
inFiltered.push( io::regex_filter( boost::regex( findRegEx ), replaceStr ) );
inFiltered.push( in );
// Copy the filtered input to the output, replacing the search word on-the-fly.
// Replace "cout" by your output file, e. g. an std::ofstream object.
io::copy( inFiltered, cout );
cout << endl;
}
Output:
why waste time sleeping learninglearning, when ignorance is instantaneous?
Notes:
The actual regular expression is \blearning\b
We don't need to escape the backslash because we are using a raw string literal. Very neat for stuff like this.
The regular expression searches for the whole word "learning" (\b denotes a word boundary). That's why it only replaces the first occurrence of "learning" and not "learninglearning".
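For comparison, the word-boundary behaviour can also be tried with the standard <regex> header alone. Unlike the filtering stream, this sketch builds the whole result in memory, and the helper name replace_whole_word is hypothetical. It assumes the word contains no regex metacharacters and the replacement contains no '$' format sequences.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replaces `word` only where it appears as a whole word.
// Assumes `word` has no regex metacharacters and `replacement` has no '$'.
std::string replace_whole_word(const std::string& text, const std::string& word,
                               const std::string& replacement) {
    const std::regex pattern("\\b" + word + "\\b");  // \b = word boundary
    return std::regex_replace(text, pattern, replacement);
}
```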

How to capitalize a word in a C++ string?

I have a std::string and wish for the first letter to be capitalized and the rest lower case.
One way I could do this is:
const std::string example("eXamPLe");
std::string capitalized = boost::to_lower_copy(example);
capitalized[0] = toupper(capitalized[0]);
Which would yield capitalized as:
"Example"
But perhaps there is a more straightforward way to do this?
If the string is indeed just a single word, std::string capitalized = boost::locale::to_title (example) should do it. Otherwise, what you've got is pretty compact.
Edit: just noticed that the boost::python namespace has a str class with a capitalize() method which sounds like it would work for multi word strings (assuming you want what you described and not title case). Using a python string just to gain that functionality is probably a bad idea, however.
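A boost-less solution is:
#include <algorithm>
#include <cctype>
#include <iostream>
#include <string>
int main()
{
const std::string example("eXamPLe");
std::string s = example;
if (!s.empty()) {
// Cast through unsigned char: passing a negative char to toupper/tolower is undefined.
s[0] = static_cast<char>(std::toupper(static_cast<unsigned char>(s[0])));
std::transform(s.begin() + 1, s.end(), s.begin() + 1,
[](unsigned char c) { return static_cast<char>(std::tolower(c)); });
}
std::cout << s << "\n";
}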
I assume the variable is named example and holds a string such as "eXamPLe". It must not be declared const, because the code below modifies it in place.
So try this:
example[0] = toupper(example[0]);
for(int i=1 ; example[i] != '\0' ; ++i){
example[i] = tolower(example[i]);
}
cout << example << endl;
This gives you the first character capitalized and the rest of the string in lower case.
It is not very different from the original solution, just a slightly different approach.

CString Parsing Carriage Returns

Let's say I have a string that has multiple carriage returns in it, i.e:
394968686
100630382
395950966
335666021
I'm still pretty amateur hour with C++, would anyone be willing to show me how you go about: parsing through each "line" in the string ? So I can do something with it later (add the desired line to a list). I'm guessing using Find("\n") in a loop?
Thanks guys.
while (!str.IsEmpty())
{
CString one_line = str.SpanExcluding(_T("\r\n"));
// do something with one_line
str = str.Right(str.GetLength() - one_line.GetLength()).TrimLeft(_T("\r\n"));
}
Blank lines will be eliminated with this code, but that's easily corrected if necessary.
You could try it using stringstream. Notice that getline accepts an optional third argument, so you can use any delimiter you want.
string line;
stringstream ss;
ss << yourstring;
while ( getline(ss, line, '\n') )
{
cout << line << endl;
}
Alternatively you could use the boost library's tokenizer class.
You can use stringstream class in C++.
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
using namespace std;
int main()
{
string str =
"394968686\n"
"100630382\n"
"395950966\n"
"335666021";
stringstream ss(str);
vector<string> v;
string token;
// get line by line
while (ss >> token)
{
// insert current line into a std::vector
v.push_back(token);
// print out current line
cout << token << endl;
}
}
Output of the program above:
394968686
100630382
395950966
335666021
Note that with operator>> no whitespace is included in the parsed tokens: extraction splits on any whitespace, not only on newlines, so a line containing spaces would come out as several tokens.
If your string is stored in a c-style char* or std::string then you can simply search for \n.
std::string s;
size_t pos = s.find('\n');
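You can use string::substr() to get the substring and store it in a list. Pseudo code,
std::string s = " .... ";
// list is any container of std::string, e.g. std::list<std::string>
size_t begin = 0;
for(size_t pos; string::npos != (pos = s.find('\n', begin)); begin = pos + 1)
{
list.push_back(s.substr(begin, pos - begin)); // substr takes a length, not an end position
}
list.push_back(s.substr(begin)); // text after the last '\n'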
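A self-contained sketch of the find/substr approach, for a std::string rather than a CString. The function name split_lines is hypothetical. It also strips a trailing '\r' from each line, on the assumption that the input may use "\r\n" line endings.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits on '\n' and drops a trailing '\r' from each line.
// Blank lines are kept; input ending in '\n' yields a final empty entry.
std::vector<std::string> split_lines(const std::string& s) {
    std::vector<std::string> lines;
    std::string::size_type begin = 0;
    while (begin <= s.size()) {
        std::string::size_type pos = s.find('\n', begin);
        if (pos == std::string::npos) pos = s.size();  // last line has no '\n'
        std::string line = s.substr(begin, pos - begin);
        if (!line.empty() && line.back() == '\r') line.pop_back();
        lines.push_back(line);
        begin = pos + 1;
    }
    return lines;
}
```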