Splitting string with colons and spaces? - c++

So I've made my code work for separating the string:
String c;
for (int j = 0 ; j < count; j++) {
c += ip(ex[j]);
}
return c;
}
void setup() {
Serial.begin(9600);
}
I have had no luck with this, so any help would be greatly appreciated!

I would simply add a delimiter to your tokenizer. From a strtok() description the second parameter "is the C string containing the delimiters. These may vary from one call to another".
So add a space to the delimiter set of your tokenization: ex[i] = strtok(NULL, ": "); You could also trim whitespace from your tokens and throw away any empty tokens, but neither should be necessary, because the delimiters are never part of the collected tokens and strtok() skips runs of consecutive delimiters.
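As a minimal sketch of that suggestion (the helper name splitTokens and the sample input are my own assumptions, not taken from the question), passing ": " as the delimiter set makes strtok() treat colons and spaces alike:

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: splits `input` on ':' and ' ' using strtok().
// strtok() writes into its buffer, so we tokenize a private copy.
std::vector<std::string> splitTokens(const std::string& input)
{
    std::vector<char> buffer(input.begin(), input.end());
    buffer.push_back('\0');                  // strtok() needs a NUL-terminated C string

    std::vector<std::string> tokens;
    for (char* tok = std::strtok(buffer.data(), ": ");
         tok != nullptr;
         tok = std::strtok(nullptr, ": "))
    {
        tokens.push_back(tok);               // delimiters are never part of a token
    }
    return tokens;
}
```

Calling splitTokens("..:---:...  ..:---") should give five tokens, with the double space producing no empty token in between.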

I'd suggest using the <regex> library if your compiler supports C++11.
#include <string>
#include <iostream>
#include <algorithm>
#include <iterator>
#include <regex>
const std::regex ws_re(":| +");
void printTokens(const std::string& input)
{
std::copy( std::sregex_token_iterator(input.begin(), input.end(), ws_re, -1),
std::sregex_token_iterator(),
std::ostream_iterator<std::string>(std::cout, "\n"));
}
int main()
{
const std::string text1 = "...:---:...";
std::cout<<"no whitespace:\n";
printTokens(text1);
std::cout<<"single whitespace:\n";
const std::string text2 = "..:---:... ..:---:...";
printTokens(text2);
std::cout<<"multiple whitespaces:\n";
const std::string text3 = "..:---:...    ..:---:...";
printTokens(text3);
}
The library is documented on cppreference. If you are not familiar with regular expressions: in const std::regex ws_re(":| +"); the pipe symbol '|' means "or" and '+' means "one or more of the symbol that stands before the plus sign", so the pattern matches either a single ':' or a run of one or more spaces. std::sregex_token_iterator with a submatch index of -1 then yields the text between those matches, which is how the input gets tokenized. For more complex cases than whitespace, there is the wonderful regex101.com. The only disadvantage I can think of is that a regex engine is likely to be slower than a simple handwritten tokenizer.
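For comparison, a simple handwritten tokenizer could look roughly like this. It is only a sketch: the function name tokenize and the default delimiter set are my own choices, and I have not benchmarked it against the regex version.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical hand-written alternative to the regex: split on ':' or ' '.
// Any run of delimiters (even mixed ones) separates two tokens, so no
// empty tokens are produced.
std::vector<std::string> tokenize(const std::string& input,
                                  const std::string& delims = ": ")
{
    std::vector<std::string> tokens;
    std::size_t start = input.find_first_not_of(delims);      // skip leading delimiters
    while (start != std::string::npos) {
        std::size_t end = input.find_first_of(delims, start); // end of the current token
        tokens.push_back(input.substr(start, end - start));   // substr clamps if end is npos
        start = input.find_first_not_of(delims, end);         // jump to the next token
    }
    return tokens;
}
```

With the "multiple whitespaces" input from above this should return six tokens.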

Related

How to match \n in regex but skip n from following text?

I'm trying to extract the numerical data from the mentioned string, however, when using the given pattern, I also miss out on the following string.
pattern : [^\\n)]+
409416,-15.84361,-22.66174,-15.777729,-11.565274,0.184927,2.184308,-2.918847,-1.438143,-1.832789,-2.392894,-2.936923,-1.699626,-1.699626,-0.559298,-0.559298,-0.559298,-0.559298,-0.559298\n0.268223,0.088596,-0.953149,-1.344175,0.197503,4.143355,3.463934,0.289587,-0.063034,0.35563,2.322007,-13.589606,-11.883781,17.186039,-7.376517,10.132304,-4.420093,0.77321,5.358715,3.092631,0.457418,1.67359,4.545597,1.758356,1.758356,0.544843,0.544843,0.544843,0.544843,0.544843\n-4.421537,-2.864239,-3.992804,-2.769629,-0.345838,1.462282,-0.733731,-1.554252,0.376582,5.262342,7.720245,-14.295092,-14.852295,-16.991022,15.644931,14.116446,-4.67732,-6.69726,-0.406152,1.403272,-1.297639,-2.341637,-1.378868,-2.402558,-2.402558,-3.345482,-3.345482,-3.345482,-3.345482,-3.345482\n0.303624,-1.55541,-1.163894,-0.002663,1.203844,0.47408,-1.725865,-1.635311,-0.809665,1.496815,0.127842,2.615432,1.528776,-34.86355,4.610298,1.973559,-2.828502,1.598024,1.195854,0.623229,-1.526112,-0.921527,-0.346238,-0.905547,-0.905547,0.348902,0.348902,0.348902,0.348902,0.348902\n0.03196,-1.725865,-1.523449,-1.086656,-0.183773,0.516694,0.561972,0.292971,-0.183773,-0.002663,-2.048133,-13.026555,-17.415792,29.832436,3.382483,2.988304,-1.811093,0.114525,0.386189,-0.628556,-1.704558,-1.853707,-1.222488,-1.182537,-1.182537,-0.255684,-0.255684,-0.255684,-0.255684,-0.255684\n0.287644,-1.054696,-1.134597,-0.761725,0.109198,0.242367,-0.415486,-0.191763,-0.514031,0.138495,0.596595,4.54904,-4.29602,5.593082,7.870266,2.460956,1.787123,0.70313,-0.258347,0.103872,-0.26101,-0.058594,0.189099,0.713784,0.713784,-0.114525,-0.114525,-0.114525,-0.114525,-0.114525',Badminton_Smash
Required: string without \n.
Link: https://regex101.com/r/sMtFzd/1
Use splitting with
\\n|\)
See proof.
C++ supports splitting, see Split a string using C++11.
Use
#include <iostream>
#include <regex>
#include <string>
#include <vector>
using namespace std;
std::vector<std::string> split(const string& input, const string& regex) {
// passing -1 as the submatch index parameter performs splitting
std::regex re(regex);
std::sregex_token_iterator
first{input.begin(), input.end(), re, -1},
last;
return {first, last};
}
int main() {
std::string input("409416,-15.84361,-22.66174,-15.777729,-11.565274,0.184927,2.184308,-2.918847,-1.438143,-1.832789,-2.392894,-2.936923,-1.699626,-1.699626,-0.559298,-0.559298,-0.559298,-0.559298,-0.559298\n0.268223,0.088596,-0.953149,-1.344175,0.197503,4.143355,3.463934,0.289587,-0.063034,0.35563,2.322007,-13.589606,-11.883781,17.186039,-7.376517,10.132304,-4.420093,0.77321,5.358715,3.092631,0.457418,1.67359,4.545597,1.758356,1.758356,0.544843,0.544843,0.544843,0.544843,0.544843\n-4.421537,-2.864239,-3.992804,-2.769629,-0.345838,1.462282,-0.733731,-1.554252,0.376582,5.262342,7.720245,-14.295092,-14.852295,-16.991022,15.644931,14.116446,-4.67732,-6.69726,-0.406152,1.403272,-1.297639,-2.341637,-1.378868,-2.402558,-2.402558,-3.345482,-3.345482,-3.345482,-3.345482,-3.345482\n0.303624,-1.55541,-1.163894,-0.002663,1.203844,0.47408,-1.725865,-1.635311,-0.809665,1.496815,0.127842,2.615432,1.528776,-34.86355,4.610298,1.973559,-2.828502,1.598024,1.195854,0.623229,-1.526112,-0.921527,-0.346238,-0.905547,-0.905547,0.348902,0.348902,0.348902,0.348902,0.348902\n0.03196,-1.725865,-1.523449,-1.086656,-0.183773,0.516694,0.561972,0.292971,-0.183773,-0.002663,-2.048133,-13.026555,-17.415792,29.832436,3.382483,2.988304,-1.811093,0.114525,0.386189,-0.628556,-1.704558,-1.853707,-1.222488,-1.182537,-1.182537,-0.255684,-0.255684,-0.255684,-0.255684,-0.255684\n0.287644,-1.054696,-1.134597,-0.761725,0.109198,0.242367,-0.415486,-0.191763,-0.514031,0.138495,0.596595,4.54904,-4.29602,5.593082,7.870266,2.460956,1.787123,0.70313,-0.258347,0.103872,-0.26101,-0.058594,0.189099,0.713784,0.713784,-0.114525,-0.114525,-0.114525,-0.114525,-0.114525',Badminton_Smash");
// split on a literal backslash followed by 'n', on a real newline character, or on ')'
std::string rgx("\\\\n|\\n|\\)");
for (auto const& c : split(input, rgx)) {
std::cout << c << "\n";
}
return 0;
}
See C++ proof.

Regex works only on first occurrence?

Update: Kindly read my comment on jignatius's answer
I wrote the following code to find specific matches in a string using regex and to delete them and replace with another value, but it doesn't work as expected.
For example given the following input:
f={a,b}+{c,d}
I would expect it to delete both {a,b} and {c,d} but it only works on the first one, what is wrong with my code?
After some checking I can see that the first loop is entered only once, but why?
There is a standard library function, std::regex_replace, in the header <regex> that does what you want to do: text replacement based on a regex. That will simplify things quite a lot for you compared with a hand-crafted loop.
You just need to supply the input string, the regex to match against, and the replacement string:
#include <iostream>
#include <regex>
#include <string>
int main()
{
std::regex reg(R"(\{[^}]*\})");
std::string mystring = "f={a,b}+{c,d}";
auto newstring = std::regex_replace(mystring, reg, "title");
std::cout << newstring; //f=title+title
}
Note: it's also easier to use a raw string literal with the format R"(literal)" to avoid using double backslashes to escape special characters in the regex.
Demo
In your comment you say that the replacement text can change. In that case, you will have to use a loop, not a straightforward regex replace.
You can use std::regex_iterator, a read-only forward iterator that will call std::regex_search() for you. You can use a string stream to build the new string:
#include <iostream>
#include <regex>
#include <string>
#include <sstream>
#include <vector>
#include <iterator>
int main()
{
std::regex reg(R"(\{[^}]*\})");
std::string mystring = "f={a,b}+{c,d} + c";
std::vector<std::string> replacements = { "rep1", "rep2", "rep3" };
int i = 0;
auto start = std::sregex_iterator(mystring.begin(), mystring.end(), reg);
auto end = std::sregex_iterator{};
std::ostringstream ss;
for (std::sregex_iterator it = start; it != end; ++it)
{
std::smatch mat = *it;
ss << mat.prefix() << replacements[i++];
//If last match, stream suffix
if (std::next(it) == end)
{
ss << mat.suffix();
}
}
std::cout << ss.str(); //f=rep1+rep2 + c
}
Note that the prefix() method of the std::smatch object gives you the substring between the end of the previous match (or the start of the target string, for the first match) and the beginning of the current match. Then you place your replacement text into the stream. Finally, for the last match, use the suffix() method of the std::smatch object to stream any trailing text between that match and the end of your target string.
Demo
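One thing to watch with the loop above: if the regex never matches, nothing is streamed at all, and more matches than replacements would index past the end of the vector. A possible variation that tracks the copy position itself is sketched below. The helper name replaceEach and the choice to leave surplus matches unchanged are my assumptions, not part of the answer.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: replaces the i-th match of `re` with replacements[i].
// Matches beyond the end of `replacements` are copied through unchanged.
std::string replaceEach(const std::string& input, const std::regex& re,
                        const std::vector<std::string>& replacements)
{
    std::string out;
    std::size_t i = 0;
    auto tail = input.begin();                    // first character not yet copied
    for (std::sregex_iterator it(input.begin(), input.end(), re), end; it != end; ++it) {
        out.append(tail, (*it)[0].first);         // text between the previous match and this one
        out += (i < replacements.size()) ? replacements[i] : it->str();
        ++i;
        tail = (*it)[0].second;                   // continue right after this match
    }
    out.append(tail, input.end());                // trailing text, or the whole input if no match
    return out;
}
```

For the example input, replaceEach(mystring, reg, replacements) should produce the same f=rep1+rep2 + c, and an input without braces comes back unchanged.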

Reading integers from a string with no space

I'm trying to read numbers from a string e.g.
if
string str = "1Hi15This10";
I want to get (1,15,10)
I tried by index but I read 10 as 1 and 0 not 10.
I could not use getline because the string is not separated by anything.
Any ideas?
Without regex you can do this:
std::string str = "1Hi15This10";
for (char *c = &str[0]; *c; ++c)
if (!std::isdigit(static_cast<unsigned char>(*c)) && *c != '-' && *c != '+') *c = ' ';
The integers are now separated by spaces, which are trivial to parse.
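The parsing step is not shown in that answer. A minimal sketch of the whole thing, assuming only unsigned integers are wanted (so, unlike the snippet above, '+' and '-' are blanked out too) and with extractInts being a name I made up:

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: blanks out every non-digit, then reads the remaining ints.
// Signs are deliberately not handled in this sketch.
std::vector<int> extractInts(std::string str)
{
    for (char& c : str)
        if (!std::isdigit(static_cast<unsigned char>(c))) c = ' ';

    std::vector<int> numbers;
    std::istringstream in(str);
    for (int n; in >> n; )                   // operator>> skips the spaces for us
        numbers.push_back(n);
    return numbers;
}
```

For "1Hi15This10" this should yield 1, 15 and 10.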
IMHO the best way would be to use a regular expression.
#include <iostream>
#include <iterator>
#include <string>
#include <regex>
int main() {
std::string s = "1Hi15This10";
std::regex number("(\\d+)"); // -- match any group of one or more digits
auto begin = std::sregex_iterator(s.begin(), s.end(), number);
// iterate over all valid matches
for (auto i = begin; i != std::sregex_iterator(); ++i) {
std::cout << " " << i->str() << '\n';
// and additional processing, e.g. parse to int using std::stoi() etc.
}
}
Output:
1
15
10
Live example here on ideone.
Yes, you could just write your own loop for this, but:
you will probably make some silly mistake somewhere,
a regular expression can be adapted to serve many different search patterns; e.g. what if tomorrow you decide you want to support negative or decimal numbers? These cases and many others are easy to support with a regex, probably not so much with your custom solution :)
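To illustrate that last point, here is one way the pattern could be extended to optionally signed numbers with an optional fractional part. The pattern and the helper name extractNumbers are my own assumptions, and exponents or locale-specific formats are not covered:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: extracts optionally signed integers or decimals.
std::vector<double> extractNumbers(const std::string& s)
{
    // optional sign, one or more digits, optional ".digits"
    static const std::regex number(R"([-+]?\d+(\.\d+)?)");
    std::vector<double> result;
    for (std::sregex_iterator it(s.begin(), s.end(), number), end; it != end; ++it)
        result.push_back(std::stod(it->str()));
    return result;
}
```

With "1Hi-15.5This+10" this should give 1, -15.5 and 10.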

C++ Remove punctuation from String

I have a string and I want to remove all the punctuation from it. How do I do that? I did some research and found that people use the ispunct() function (I tried that), but I can't seem to get it to work in my code. Anyone got any ideas?
#include <string>
int main() {
string text = "this. is my string. it's here."
if (ispunct(text))
text.erase();
return 0;
}
Using the algorithm remove_copy_if:
string text, result;
std::remove_copy_if(text.begin(), text.end(),
std::back_inserter(result), // store output
[](unsigned char c) { return std::ispunct(c); } // a lambda, since std::ptr_fun was removed in C++17
);
POW already has a good answer if you need the result as a new string. This answer is how to handle it if you want an in-place update.
The first part of the recipe is std::remove_if, which can remove the punctuation efficiently, packing all the non-punctuation as it goes.
std::remove_if (text.begin (), text.end (), ispunct)
Unfortunately, std::remove_if doesn't shrink the string to the new size. It can't, because it has no access to the container itself. Therefore, there are junk characters left in the string after the packed result.
To handle this, std::remove_if returns an iterator that indicates the end of the part of the string that's still needed. This can be used with the string's erase method, leading to the following idiom...
text.erase (std::remove_if (text.begin (), text.end (), ispunct), text.end ());
I call this an idiom because it's a common technique that works in many situations. Other types than string provide suitable erase methods, and std::remove (and probably some other algorithm library functions I've forgotten for the moment) take this approach of closing the gaps for items they remove, but leaving the container-resizing to the caller.
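To show the same idiom on a container other than string, here is a small sketch with std::vector. The helper name dropNegatives and the "remove negatives" rule are just an illustration I picked:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: drops every negative value from `values` in place.
void dropNegatives(std::vector<int>& values)
{
    // remove_if packs the kept elements at the front and returns the new logical end;
    // erase then shrinks the container to that size.
    values.erase(std::remove_if(values.begin(), values.end(),
                                [](int v) { return v < 0; }),
                 values.end());
}
```

Starting from {3, -1, 4, -1, 5}, the vector should end up as {3, 4, 5}.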
#include <string>
#include <iostream>
#include <cctype>
int main() {
std::string text = "this. is my string. it's here.";
for (int i = 0, len = text.size(); i < len; i++)
{
if (ispunct(text[i]))
{
text.erase(i--, 1);
len = text.size();
}
}
std::cout << text;
return 0;
}
Output
this is my string its here
When you delete a character, the size of the string changes. It has to be updated whenever deletion occurs. And, you deleted the current character, so the next character becomes the current character. If you don't decrement the loop counter, the character next to the punctuation character will not be checked.
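ispunct takes a char value, not a string.
You can do it like this:
for (char c : std::string(text)) // iterate over a copy, because text is modified inside the loop
if (ispunct(static_cast<unsigned char>(c))) text.erase(text.find(c), 1); // erase one character, not the rest of the string
This will work, but it is a slow algorithm.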
Pretty good answer by Steve314.
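I would like to add a small change:
text.erase (std::remove_if (text.begin (), text.end (), ::ispunct), text.end ());
Adding the :: before ispunct selects the C library function from the global namespace, which avoids the ambiguity with the overloaded std::ispunct template from <locale>.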
The problem here is that ispunct() takes one argument being a character, while you are trying to send a string. You should loop over the elements of the string and erase each character if it is a punctuation like here:
for(size_t i = 0; i<text.length(); ++i)
if(ispunct(text[i]))
text.erase(i--, 1);
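#include <algorithm>
#include <cctype>
#include <iostream>
#include <string>
using namespace std;
int main() {
    string str = "this. is my string. it's here.";
    // Note: this overwrites each punctuation character with '\0';
    // the string keeps its original length.
    transform(str.begin(), str.end(), str.begin(), [](char ch)
    {
        if( ispunct(static_cast<unsigned char>(ch)) )
            return '\0';
        return ch;
    });
}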
#include <iostream>
#include <string>
using namespace std;
int main()
{
string s;//string is defined here.
cout << "Please enter a string with punctuation: " << endl;//Asking for users input
getline(cin, s);//reads in a single string one line at a time
/* ERROR Check: The loop didn't run at first because a semi-colon was placed at the end
of the statement. Remember not to add it for loops. */
for(auto &c : s) //loop checks every character
{
if (ispunct(c)) //to see if its a punctuation
{
c=' '; //if so it replaces it with a blank space.(delete)
}
}
cout << s << endl;
system("pause");
return 0;
}
Another way you could do this would be as follows:
#include <cctype> //needed for ispunct()
#include <string>
using std::string;
string onlyLetters(string str){
    string retStr = "";
    for(std::size_t i = 0; i < str.length(); i++){
        if(!ispunct(static_cast<unsigned char>(str[i]))){
            retStr += str[i];
        }
    }
    return retStr;
}
This ends up creating a new string instead of actually erasing the characters from the old string, but it is a little easier to wrap your head around than using some of the more complex built in functions.
I tried to apply @Steve314's answer but couldn't get it to work until I came across this note on cppreference.com:
Notes
Like all other functions from <cctype>, the behavior of std::ispunct
is undefined if the argument's value is neither representable as
unsigned char nor equal to EOF. To use these functions safely with
plain chars (or signed chars), the argument should first be converted
to unsigned char.
By studying the example it provides, I was able to make it work like this:
#include <string>
#include <iostream>
#include <cctype>
#include <algorithm>
int main()
{
std::string text = "this. is my string. it's here.";
text.erase(std::remove_if(text.begin(),
text.end(),
[](unsigned char c) { return std::ispunct(c); }),
text.end());
std::cout << text << std::endl;
}
Try this one; it removes all the punctuation from a string (for example one read from a text file):
str.erase(remove_if(str.begin(), str.end(), ::ispunct), str.end());
I got it.
size_t found = text.find('.');
if (found != std::string::npos) text.erase(found, 1); // removes only the first '.'

Trimming internal whitespace in std::string

I'm looking for an elegant way to transform an std::string from something like:
std::string text = " a\t very \t ugly \t\t\t\t string ";
To:
std::string text = "a very ugly string";
I've already trimmed the external whitespace with boost::trim(text);
[edit]
Thus, multiple whitespaces, and tabs, are reduced to just one space
[/edit]
Removing the external whitespace is trivial. But is there an elegant way of removing the internal whitespace that doesn't involve manual iteration and comparison of previous and next characters? Perhaps something in boost I have missed?
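You can use std::unique together with erase and ::isspace to collapse each run of whitespace characters into a single character. The first character of each run is the one that is kept, so convert tabs to spaces beforehand if you need a plain space:
text.erase(std::unique(std::begin(text), std::end(text), [](unsigned char c, unsigned char c2) {
return ::isspace(c) && ::isspace(c2);
}), std::end(text));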
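A self-contained sketch of the std::unique approach, assuming every whitespace character should end up as a plain space (the helper name collapseWhitespace is mine). std::unique only shortens the logical range, so the leftover tail still has to be erased:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: turns every run of whitespace into one ' '.
// Leading/trailing whitespace is reduced to a single space, not removed.
std::string collapseWhitespace(std::string text)
{
    // Step 1: normalize tabs, newlines etc. to plain spaces.
    std::replace_if(text.begin(), text.end(),
                    [](unsigned char c) { return std::isspace(c) != 0; }, ' ');
    // Step 2: keep only the first space of each run, then erase the leftover tail.
    text.erase(std::unique(text.begin(), text.end(),
                           [](char a, char b) { return a == ' ' && b == ' '; }),
               text.end());
    return text;
}
```

Applied to the already trimmed string from the question, this should give "a very ugly string".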
std::istringstream iss(text);
text = "";
std::string s;
while(iss >> s){
if ( text != "" ) text += " " + s;
else text = s;
}
//use text, extra whitespaces are removed from it
Most of what I'd do is similar to what @Nawaz already posted -- read strings from an istringstream to get the data without whitespace, and then insert a single space between each of those strings. However, I'd use an infix_ostream_iterator from a previous answer to get (IMO) slightly cleaner/clearer code.
std::istringstream buffer(input);
std::copy(std::istream_iterator<std::string>(buffer),
std::istream_iterator<std::string>(),
infix_ostream_iterator<std::string>(result, " "));
#include <boost/algorithm/string/trim_all.hpp>
string s;
boost::algorithm::trim_all(s);
If you check out https://svn.boost.org/trac/boost/ticket/1808, you'll see a request for (almost) this exact functionality, and a suggested implementation:
std::string trim_all ( const std::string &str ) {
return boost::algorithm::find_format_all_copy(
boost::trim_copy(str),
boost::algorithm::token_finder (boost::is_space(),boost::algorithm::token_compress_on),
boost::algorithm::const_formatter(" "));
}
Here is a possible version using regular expressions. My GCC 4.6 doesn't have regex_replace yet, but Boost.Regex can serve as a drop-in replacement:
#include <string>
#include <iostream>
// #include <regex>
#include <boost/regex.hpp>
#include <boost/algorithm/string/trim.hpp>
int main() {
using namespace std;
using namespace boost;
string text = " a\t very \t ugly \t\t\t\t string ";
trim(text);
regex pattern{"[[:space:]]+", regex_constants::egrep};
string result = regex_replace(text, pattern, " ");
cout << result << endl;
}
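On a compiler whose standard library ships a complete <regex>, the same idea should work without Boost. A rough sketch follows; the helper name squeezeSpaces is an assumption of mine, and it also trims the ends so boost::trim is not needed:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: collapses whitespace runs to one space and trims both ends,
// using only the standard library.
std::string squeezeSpaces(const std::string& text)
{
    static const std::regex runs(R"(\s+)");            // one or more whitespace characters
    std::string result = std::regex_replace(text, runs, " ");
    // After the replacement, at most one space can remain at each end.
    if (!result.empty() && result.front() == ' ') result.erase(0, 1);
    if (!result.empty() && result.back() == ' ') result.pop_back();
    return result;
}
```

For the string from the question this should return "a very ugly string".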