How to match \n in regex but skip n from following text?

I'm trying to extract the numerical data from the string below, but with the given pattern I also lose part of the text that follows.
Pattern: [^\\n)]+
409416,-15.84361,-22.66174,-15.777729,-11.565274,0.184927,2.184308,-2.918847,-1.438143,-1.832789,-2.392894,-2.936923,-1.699626,-1.699626,-0.559298,-0.559298,-0.559298,-0.559298,-0.559298\n0.268223,0.088596,-0.953149,-1.344175,0.197503,4.143355,3.463934,0.289587,-0.063034,0.35563,2.322007,-13.589606,-11.883781,17.186039,-7.376517,10.132304,-4.420093,0.77321,5.358715,3.092631,0.457418,1.67359,4.545597,1.758356,1.758356,0.544843,0.544843,0.544843,0.544843,0.544843\n-4.421537,-2.864239,-3.992804,-2.769629,-0.345838,1.462282,-0.733731,-1.554252,0.376582,5.262342,7.720245,-14.295092,-14.852295,-16.991022,15.644931,14.116446,-4.67732,-6.69726,-0.406152,1.403272,-1.297639,-2.341637,-1.378868,-2.402558,-2.402558,-3.345482,-3.345482,-3.345482,-3.345482,-3.345482\n0.303624,-1.55541,-1.163894,-0.002663,1.203844,0.47408,-1.725865,-1.635311,-0.809665,1.496815,0.127842,2.615432,1.528776,-34.86355,4.610298,1.973559,-2.828502,1.598024,1.195854,0.623229,-1.526112,-0.921527,-0.346238,-0.905547,-0.905547,0.348902,0.348902,0.348902,0.348902,0.348902\n0.03196,-1.725865,-1.523449,-1.086656,-0.183773,0.516694,0.561972,0.292971,-0.183773,-0.002663,-2.048133,-13.026555,-17.415792,29.832436,3.382483,2.988304,-1.811093,0.114525,0.386189,-0.628556,-1.704558,-1.853707,-1.222488,-1.182537,-1.182537,-0.255684,-0.255684,-0.255684,-0.255684,-0.255684\n0.287644,-1.054696,-1.134597,-0.761725,0.109198,0.242367,-0.415486,-0.191763,-0.514031,0.138495,0.596595,4.54904,-4.29602,5.593082,7.870266,2.460956,1.787123,0.70313,-0.258347,0.103872,-0.26101,-0.058594,0.189099,0.713784,0.713784,-0.114525,-0.114525,-0.114525,-0.114525,-0.114525',Badminton_Smash
Required: string without \n.
Link: https://regex101.com/r/sMtFzd/1

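The pattern [^\\n)]+ is a negated character class, so it excludes the individual characters \, n and ) rather than the two-character sequence \n; that is why every n in the following text is dropped too. Split on the delimiters instead, using
\\n|\)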
See proof.
C++ supports regex-based splitting; see "Split a string using C++11".
For example:
#include <iostream>
#include <regex>
#include <string>
#include <vector>
using namespace std;
std::vector<std::string> split(const string& input, const string& regex) {
// passing -1 as the submatch index parameter performs splitting
std::regex re(regex);
std::sregex_token_iterator
first{input.begin(), input.end(), re, -1},
last;
return {first, last};
}
int main() {
std::string input("409416,-15.84361,-22.66174,-15.777729,-11.565274,0.184927,2.184308,-2.918847,-1.438143,-1.832789,-2.392894,-2.936923,-1.699626,-1.699626,-0.559298,-0.559298,-0.559298,-0.559298,-0.559298\n0.268223,0.088596,-0.953149,-1.344175,0.197503,4.143355,3.463934,0.289587,-0.063034,0.35563,2.322007,-13.589606,-11.883781,17.186039,-7.376517,10.132304,-4.420093,0.77321,5.358715,3.092631,0.457418,1.67359,4.545597,1.758356,1.758356,0.544843,0.544843,0.544843,0.544843,0.544843\n-4.421537,-2.864239,-3.992804,-2.769629,-0.345838,1.462282,-0.733731,-1.554252,0.376582,5.262342,7.720245,-14.295092,-14.852295,-16.991022,15.644931,14.116446,-4.67732,-6.69726,-0.406152,1.403272,-1.297639,-2.341637,-1.378868,-2.402558,-2.402558,-3.345482,-3.345482,-3.345482,-3.345482,-3.345482\n0.303624,-1.55541,-1.163894,-0.002663,1.203844,0.47408,-1.725865,-1.635311,-0.809665,1.496815,0.127842,2.615432,1.528776,-34.86355,4.610298,1.973559,-2.828502,1.598024,1.195854,0.623229,-1.526112,-0.921527,-0.346238,-0.905547,-0.905547,0.348902,0.348902,0.348902,0.348902,0.348902\n0.03196,-1.725865,-1.523449,-1.086656,-0.183773,0.516694,0.561972,0.292971,-0.183773,-0.002663,-2.048133,-13.026555,-17.415792,29.832436,3.382483,2.988304,-1.811093,0.114525,0.386189,-0.628556,-1.704558,-1.853707,-1.222488,-1.182537,-1.182537,-0.255684,-0.255684,-0.255684,-0.255684,-0.255684\n0.287644,-1.054696,-1.134597,-0.761725,0.109198,0.242367,-0.415486,-0.191763,-0.514031,0.138495,0.596595,4.54904,-4.29602,5.593082,7.870266,2.460956,1.787123,0.70313,-0.258347,0.103872,-0.26101,-0.058594,0.189099,0.713784,0.713784,-0.114525,-0.114525,-0.114525,-0.114525,-0.114525',Badminton_Smash");
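// The regex \\n|\) matches a backslash followed by 'n', or a ')'.
// In an ordinary (non-raw) literal like the one above, "\n" is a real newline; use "\n|\\)" to split on that instead.
std::string rgx("\\\\n|\\)");
for (auto const& c : split(input, rgx)) {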
std::cout << c << "\n";
}
return 0;
}
See C++ proof.

Related

Splitting string with colons and spaces?

So I've made my code work for separating the string:
String c;
for (int j = 0 ; j < count; j++) {
c += ip(ex[j]);
}
return c;
}
void setup() {
Serial.begin(9600);
}
I have had no luck with this, so any help would be greatly appreciated!
I would simply add a delimiter to your tokenizer. From a strtok() description the second parameter "is the C string containing the delimiters. These may vary from one call to another".
So add a 'space' delimiter to your tokenization: ex[i] = strtok(NULL, ": "); trim any whitespace from your tokens, and throw away any empty tokens. The last two shouldn't be necessary, because the delimiters won't be part of your collected tokens.
I'd suggest using the <regex> library if your compiler supports C++11.
#include <algorithm>
#include <iostream>
#include <iterator>
#include <regex>
#include <string>
const std::regex ws_re(":| +");
void printTokens(const std::string& input)
{
std::copy( std::sregex_token_iterator(input.begin(), input.end(), ws_re, -1),
std::sregex_token_iterator(),
std::ostream_iterator<std::string>(std::cout, "\n"));
}
int main()
{
const std::string text1 = "...:---:...";
std::cout<<"no whitespace:\n";
printTokens(text1);
std::cout<<"single whitespace:\n";
const std::string text2 = "..:---:... ..:---:...";
printTokens(text2);
std::cout<<"multiple whitespaces:\n";
const std::string text3 = "..:---:...   ..:---:...";
printTokens(text3);
}
The library is described on cppreference. If you are not familiar with regular expressions: in const std::regex ws_re(":| +"); the pattern matches either a ':' character or (alternation is written with the pipe symbol '|') one or more spaces ('+' means "one or more of the preceding element"). You can then use this regular expression to tokenize any input with std::sregex_token_iterator. For cases more complex than whitespace, there is the wonderful regex101.com. The only disadvantage I can think of is that a regex engine is likely to be slower than a simple handwritten tokenizer.

Regex works only on first occurrence?

Update: Kindly read my comment on jignatius's answer
I wrote the following code to find specific matches in a string using regex and to delete them and replace with another value, but it doesn't work as expected.
For example given the following input:
f={a,b}+{c,d}
I would expect it to delete both {a,b} and {c,d}, but it only works on the first one. What is wrong with my code?
After some checking I can see that the first loop is entered only once, but why?
There is a standard library function, std::regex_replace, in the header <regex> that does what you want to do: text replacement based on a regex. That will simplify things quite a lot for you compared with a hand-crafted loop.
You just need to supply the input string, the regex to match against, and the replacement string:
#include <iostream>
#include <regex>
#include <string>
int main()
{
std::regex reg(R"(\{[^}]*\})");
std::string mystring = "f={a,b}+{c,d}";
auto newstring = std::regex_replace(mystring, reg, "title");
std::cout << newstring; //f=title+title
}
Note: it's also easier to use a raw string literal with the format R"(literal)" to avoid using double backslashes to escape special characters in the regex.
Demo
In your comment you say that the replacement text can change. In that case you will need a loop, not a straightforward regex replace.
You can use std::regex_iterator, a read-only forward iterator that will call std::regex_search() for you. You can use a string stream to build the new string:
#include <iostream>
#include <iterator>
#include <regex>
#include <sstream>
#include <string>
#include <vector>
int main()
{
std::regex reg(R"(\{[^}]*\})");
std::string mystring = "f={a,b}+{c,d} + c";
std::vector<std::string> replacements = { "rep1", "rep2", "rep3" };
int i = 0;
auto start = std::sregex_iterator(mystring.begin(), mystring.end(), reg);
auto end = std::sregex_iterator{};
std::ostringstream ss;
for (std::sregex_iterator it = start; it != end; ++it)
{
std::smatch mat = *it;
ss << mat.prefix() << replacements[i++];
//If last match, stream suffix
if (std::next(it) == end)
{
ss << mat.suffix();
}
}
std::cout << ss.str(); //f=rep1+rep2 + c
}
Note that the prefix() method of the std::smatch object gives you the substring between the end of the previous match (or the start of the target string, for the first match) and the beginning of the current match. Then you place your replacement text into the stream. Finally, you should use the suffix() method of the std::smatch object to stream any trailing text between the last match and the end of your target string.
Demo

Extracting multiple strings from a single line in a text file in c++

So I have a text file that contains information about books (title, author, genre) on every line, which looks like this: '[title]' '[author]' '[genre]'. How could I divide this line into 3 different strings so that each one is the title/author/genre?
You can split a string according to any rule if you can define a regex for that rule, then use sregex_token_iterator to enumerate all matches in the string. This example saves all matches into a vector.
#include <vector>
#include <iostream>
#include <string>
#include <regex>
std::vector<std::string> get_params(const std::string& sentence)
{
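// Match a quoted field and capture the text between the quotes.
std::regex reg("'([^']*)'");
std::vector<std::string> names(
// Submatch index 1 yields only the captured text, without quotes or separators.
std::sregex_token_iterator(sentence.begin(), sentence.end(), reg, 1),
std::sregex_token_iterator());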
return names;
}
int main()
{
std::string str = "\'String1\' \'String2\' \'String3\'";
std::vector<std::string> v = get_params(str);
for (auto const& s : v)
std::cout << s << '\n';
}

How to express an assembly lw/sw instruction using regular expression regex library?

I want to detect when the user enters "lw 2, 3(9)" , but it can't read the parenthesis, I used this code but it still doesn't detect the parenthesis.
{ R"((\w+) ([[:digit:]]+), ([[:digit:]]+) (\\([[:digit:]]+\\)) )"}
Can someone please help?
You need to be careful with excessive spaces in the pattern, and since you are using a raw string literal, you should not double escape special chars:
R"((\w+) ([[:digit:]]+), ([[:digit:]]+)(\([[:digit:]]+\)))"
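(The changes: the space before the last group and the trailing space are removed, and the parentheses are escaped with a single backslash instead of two.)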
It might be a good idea to replace literal spaces with [[:space:]]+.
C++ demo printing lw 2, 3(9):
#include <iostream>
#include <regex>
#include <string>
using namespace std;
int main() {
regex rx(R"((\w+) ([[:digit:]]+), ([[:digit:]]+)(\([[:digit:]]+\)))");
string s("Text lw 2, 3(9) here");
smatch m;
if (regex_search(s, m, rx)) {
std::cout << m[0] << std::endl;
}
return 0;
}
R"((\w+) (\d+), (\d+)(\(\d+\)))"
worked for me
Since you didn't specify whether you want to capture anything, I'll provide both snippets.
With a raw string literal you don't have to double the backslashes, but you still have to escape literal parentheses with a single backslash so that they are not treated as capture groups.
#include <iostream>
#include <string>
#include <regex>
int main()
{
std::string str = "lw 2, 3(9)";
{
std::regex my_regex(R"(\w+ \d+, \d+\(\d+\))");
if (std::regex_search(str, my_regex)) {
std::cout << "Detected\n";
}
}
{
// With capture groups
std::regex my_regex(R"((\w+) (\d+), (\d+)(\(\d+\)))");
std::smatch match;
if (std::regex_search(str, match, my_regex)) {
std::cout << match[0] << std::endl;
}
}
}
Live example
An additional improvement could be to handle multiple spacing (if that is allowed in your particular case) with \s+.
I can't help but notice that EJP's concerns might also be spot-on: this is a very fragile solution parsing-wise.

Trimming internal whitespace in std::string

I'm looking for an elegant way to transform an std::string from something like:
std::string text = " a\t very \t ugly \t\t\t\t string ";
To:
std::string text = "a very ugly string";
I've already trimmed the external whitespace with boost::trim(text);
[edit]
Thus, multiple whitespaces, and tabs, are reduced to just one space
[/edit]
Removing the external whitespace is trivial. But is there an elegant way of removing the internal whitespace that doesn't involve manual iteration and comparison of previous and next characters? Perhaps something in boost I have missed?
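You can use std::unique together with erase and ::isspace to compress each run of whitespace characters into a single character:
text.erase(std::unique(std::begin(text), std::end(text), [](unsigned char c, unsigned char c2) {
return ::isspace(c) && ::isspace(c2);
}), std::end(text));
// std::unique keeps the first character of each run, so a run that starts with a tab leaves a tab, not a space.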
std::istringstream iss(text);
text = "";
std::string s;
while(iss >> s){
if ( text != "" ) text += " " + s;
else text = s;
}
//use text, extra whitespaces are removed from it
Most of what I'd do is similar to what @Nawaz already posted -- read strings from an istringstream to get the data without whitespace, and then insert a single space between each of those strings. However, I'd use an infix_ostream_iterator from a previous answer to get (IMO) slightly cleaner/clearer code.
std::istringstream buffer(input);
std::copy(std::istream_iterator<std::string>(buffer),
std::istream_iterator<std::string>(),
infix_ostream_iterator<std::string>(result, " "));
#include <boost/algorithm/string/trim_all.hpp>
string s;
boost::algorithm::trim_all(s);
If you check out https://svn.boost.org/trac/boost/ticket/1808, you'll see a request for (almost) this exact functionality, and a suggested implementation:
std::string trim_all ( const std::string &str ) {
return boost::algorithm::find_format_all_copy(
boost::trim_copy(str),
boost::algorithm::token_finder (boost::is_space(),boost::algorithm::token_compress_on),
boost::algorithm::const_formatter(" "));
}
Here is a possible version using regular expressions. My GCC 4.6 doesn't have regex_replace yet, but Boost.Regex can serve as a drop-in replacement:
#include <string>
#include <iostream>
// #include <regex>
#include <boost/regex.hpp>
#include <boost/algorithm/string/trim.hpp>
int main() {
using namespace std;
using namespace boost;
string text = " a\t very \t ugly \t\t\t\t string ";
trim(text);
regex pattern{"[[:space:]]+", regex_constants::egrep};
string result = regex_replace(text, pattern, " ");
cout << result << endl;
}