Regular Expression or Not (Getting not all text that satisfies regexp) - c++

I want to use a regex to find the text between quotes (") in a string (or QString).
My simple string is x="20.51167", and I want 20.51167.
Is this possible with regular expressions?
At the start I had a string like this:
<S id="1109" s5="1" nr="1183" n="Some text" test=" " x="20.53843" y="50.84443">
Using regexps like (nr=\"[0-9]+\") (y=\"[0-9 .^\"]+\")" etc., I get a simple string such as x="20.51167". Maybe this is the wrong approach, and I can get the value between the quotes in one step?

For your particular example, this will work:
#include <QRegExp>
#include <QString>
#include <iostream>
int main()
{
//Here's your regexp.
QRegExp re("\"[^\"^=]+\"");
//Here's your sample string:
QString test ="<S id=\"1109\" s5=\"1\" nr=\"1183\" n=\"Some text\" test=\" \" x=\"20.53843\" y=\"50.84443\">";
int offset = 0;
while( offset = re.indexIn( test, offset + 1 ) )
{
if(offset == -1)
break;
QString res = re.cap().replace("\"", "");
bool ok;
int iRes;
float fRes;
if( res.toInt( &ok ) && ok )
{
iRes = res.toInt();
std::cout << "int: " << iRes << std::endl;
}
else if ( res.toFloat( &ok ) && ok )
{
fRes = res.toFloat();
std::cout << "float: " << fRes << std::endl;
}
else
std::cout << "string: " << res.toStdString() << std::endl;
}
}
The output will be:
int: 1109
int: 1
int: 1183
string: Some text
string:
float: 20.5384
float: 50.8444

Try this (untested):
="([^"]+)"
The above captures anything between =" and the closing quote.

In this expression : (nr=\"[0-9]+\") (y=\"[0-9 .^\"]+\")"
Delete the last quote after )
For your regular expression, try (shown before C++ string escaping):
x="[0-9]+\.[0-9]{5}"

If you want to find anything in quotes, I guess the regex should read:
"([^"]*)"
(anything that is not a quote between quotes)

You just have to move your capturing group inside the quotes:
x=\"([0-9.]+)\"

Related

What is the problem with this regex? Suffix from smatch is always empty

I have this regex
int main ()
{
const std::string rna = "Feb 11 16:55:35.127897 oms1_OMSController Data: <RBC rbcID=9001><ChannelStates rbcTimeStamp=20200211152135Z><channelStateList><CS channelID=RCS0-OMS-Diag available=no/></channelStateList></ChannelStates></RBC>" ;
const boost::regex re( "^([[:alpha:]]{3})[[:space:]]+([[:digit:]]+) ([[:digit:]]{2}):([[:digit:]]{2}):([[:digit:]]{2})(\\.[0-9]{6})? ([[:alnum:]_\\.]*) (([[:alpha:]]{3})[[:space:]]+([[:digit:]]+) ([[:digit:]]{2}):([[:digit:]]{2}):([[:digit:]]{2})(\\.[0-9]{6})? ([[:alnum:]_\\.]*))?.*.*?" ) ;
boost::smatch match ;
if( boost::regex_search( rna, match, re ) )
{
std::cout << "SUFFIX " << match.suffix().str() << std::endl;
for(auto i = match.begin(); i != match.end(); i++)
std::cout << *i << std::endl;
}
}
Can someone help me understand why my suffix is always empty? If I remove .*.*? from the end of the regex, the suffix is fine. What does .*.*? mean?
I know that . means any character and * means zero or more.

Use regex to validate string not starting with specific character and string length

I have a function with the following if statements:
if (name.length() < 10 || name.length() > 64)
{
return false;
}
if (name.front() == L'.' || name.front() == L' ')
{
return false;
}
I was curious to see if I can do this using the following regular expression:
^(?!\ |\.)([A-Za-z]{10,46}$)
To dissect the above expression: the first part, ^(?!\ |\.), performs a negative lookahead to assert that the string cannot start with a space or a dot (.), and the second part should take care of the string length condition. I wrote the following to test the expression:
std::string randomStrings [] = {" hello",
" hellllloooo",
"\.hello",
".zoidbergvddd",
"asdasdsadadasdasdasdadasdsad"};
std::regex txt_regex("^(?!\ |\.)([A-Za-z]{10,46}$)");
for (const auto& str : randomStrings)
{
std::cout << str << ": " << std::boolalpha << std::regex_match(str, txt_regex) << '\n';
}
I expected the last one to match, since it does not start with a space or a dot (.) and it meets the length criteria. However, this is what I got:
hello: false
hellllloooo: false
.hello: false
.zoidbergvddd: false
asdasdsadadasdasdasdadasdsad: false
Did I miss something trivial here or this is not possible using regex? It seems like it should be.
Feel free to suggest a better title, I tried to be as descriptive as possible.
Change your regular expression to: "^(?![\\s.])([A-Za-z]{10,46}$)" and it will work.
\s matches any whitespace character; the backslash must itself be escaped inside a C++ string literal, which is why it becomes \\s.
You need to turn on compiler warnings. It would have told you that you have an unknown escape sequence in your regex. I recommend using a raw literal.
#include <iostream>
#include <regex>
int main() {
std::string randomStrings[] = { " hello", " hellllloooo", ".hello",
".zoidbergvddd", "asdasdsadadasdasdasdadasdsad" };
std::regex txt_regex(R"foo(^(?!\ |\.)([A-Za-z]{10,46}$))foo");
for (const auto& str : randomStrings) {
std::cout << str << ": " << std::boolalpha
<< std::regex_match(str, txt_regex) << '\n';
}
}
clang++-3.8 gives
hello: false
hellllloooo: false
.hello: false
.zoidbergvddd: false
asdasdsadadasdasdasdadasdsad: true

How to match string with wildcard using C++11 regex

This is a list of static strings; a wildcard appears only at the beginning or end of a string. No other regex rules apply.
AAAA, BBBB*, *CCCC, *DDDD* .
I need to find out whether a given string matches any of the strings in this list. I'm looking for something like this:
bool isMatch(std::string str)
{
std::vector<std::string> my_list = {"AAAA", "BBBB*", "*CCCC", "*DDDD*"};
if(str.matchAny(my_list))
return true;
return false;
}
I'd rather not use any third-party library such as Boost. Can this be achieved with C++11 std::regex, or is there a simpler way?
A regular expression would be overkill here. Just look for each of the character sequences in the appropriate place:
str == "AAAA"
str.find("BBBB") == 0
str.find("CCCC") == str.size() - 4
str.find("DDDD") != std::string::npos
Here's how I usually do it: I replace each * with .* and each ? with . and then match the result as a regex.
Here's the C++ code for it:
#include <iostream>
#include <regex>
using namespace std;
int main()
{
regex star_replace("\\*");
regex questionmark_replace("\\?");
string data = "AAAABBBCCDDDD";
string pattern = "*CC*";
auto wildcard_pattern = regex_replace(
regex_replace(pattern, star_replace, ".*"),
questionmark_replace, ".");
cout << "Wildcard: " << pattern << " Regex: " << wildcard_pattern << endl;
regex wildcard_regex("^" + wildcard_pattern + "$");
if (regex_match(data, wildcard_regex))
cout << "Match!" << endl;
else
cout << "No match!" << endl;
return 0;
}

Anything like substr but instead of stopping at the byte you specified, it stops at a specific string [duplicate]

I have a client for a pre-existing server. Let's say I get some packets "MC123, 456!##".
I store these packets in a char called message. To print out a specific part of them, in this case the numbers part of them, I would do something like "cout << message.substr(3, 7) << endl;".
But what if I receive another message "MC123, 456, 789!##"? "cout << message.substr(3,7)" would only print out "123, 456", whereas I want "123, 456, 789". How would I do this, assuming I know that every message ends with "!##"?
First - Sketch out the indexing.
std::string packet1 = "MC123, 456!##";
//                     0123456789012345678
//                       ^------^ desired text
std::string packet2 = "MC123, 456, 789!##";
//                     0123456789012345678
//                       ^-----------^ desired text
The other answers are OK. If you wish to use the std::string find functions,
consider rfind and find_first_not_of, as in the following code:
#include <iostream>
#include <string>
// forward declaration
void messageShow(std::string packet,
size_t startIndx = 2);
// /////////////////////////////////////////////////////////////////////////////
int main (int, char** )
{
//           012345678901234567
//             |
messageShow("MC123, 456!##");
messageShow("MC123, 456, 789!##");
messageShow("MC123, 456, 789, 987, 654!##");
// error test cases
messageShow("MC123, 456, 789##!"); // missing !##
messageShow("MC123x 456, 789!##"); // extraneous char in packet
return(0);
}
void messageShow(std::string packet,
size_t startIndx) // default value 2
{
static size_t seq = 0;
seq += 1;
std::cout << packet.size() << " packet" << seq << ": '"
<< packet << "'" << std::endl;
do
{
size_t bangAtPound_Indx = packet.rfind("!##");
if(bangAtPound_Indx == std::string::npos){ // not found, can't do anything more
std::cerr << " '!##' not found in packet " << seq << std::endl;
break;
}
size_t printLength = bangAtPound_Indx - startIndx;
const std::string DIGIT_SPACE = "0123456789, ";
size_t allDigitSpace = packet.find_first_not_of(DIGIT_SPACE, startIndx);
if(allDigitSpace != bangAtPound_Indx) {
std::cerr << " extraneous char found in packet " << seq << std::endl;
break; // something extraneous in string
}
std::cout << bangAtPound_Indx << " message" << seq << ": '"
<< packet.substr(startIndx, printLength) << "'" << std::endl;
}while(0);
std::cout << std::endl;
}
This outputs
13 packet1: 'MC123, 456!##'
10 message1: '123, 456'
18 packet2: 'MC123, 456, 789!##'
15 message2: '123, 456, 789'
28 packet3: 'MC123, 456, 789, 987, 654!##'
25 message3: '123, 456, 789, 987, 654'
18 packet4: 'MC123, 456, 789##!'
'!##' not found in packet 4
18 packet5: 'MC123x 456, 789!##'
extraneous char found in packet 5
Note: String indexes start at 0. The index of the digit '1' is 2.
The correct approach is to look for existence / location of the "known termination" string, then take the substring up to (but not including) that substring.
Something like
std::string termination = "!#$";
std::size_t position = message.find(termination);
std::string importantBit = message.substr(0, position);
You could check the front of the string separately as well. Combining these, you could use regular expressions to make your code more robust, using a regex like
MC([0-9,]+)!#\$
This will return the bit between MC and !#$ but only if it consists entirely of numbers and commas. Obviously you can adapt this as needed.
UPDATE: You asked in your comment how to use the regular expression. Here is a very simple program. Note that this uses C++11, so you need to make sure your compiler supports it.
#include <iostream>
#include <regex>
int main(void) {
std::string s ("ABC123,456,789!#$");
std::smatch m;
std::regex e ("ABC([0-9,]+)!#\\$"); // matches the kind of pattern you are looking for
if (std::regex_search (s,m,e)) {
std::cout << "match[0] = " << m[0] << std::endl;
std::cout << "match[1] = " << m[1] << std::endl;
}
}
On my Mac, I can compile the above program with
clang++ -std=c++0x -stdlib=libc++ match.cpp -o match
If instead of just digits and commas you want "anything" in your expression (but it's still got fixed characters in front and behind) you can simply do
std::regex e ("ABC(.*)!#\\$");
Here, .* means "zero or more of 'anything'", but followed by !#$. The double backslash has to be there to "escape" the dollar sign, which has special meaning in regular expressions (it means "the end of the string").
The more accurately your regular expression reflects exactly what you expect, the better you will be able to trap any errors. This is usually a very good thing in programming. "Always check your inputs".
One more thing - I just noticed you mentioned that you might have "more stuff" in your string. This is where using regular expressions quickly becomes the best. You mentioned a string
MC123, 456!##*USRChester.
and wanted to extract 123, 456 and Chester. That is - stuff between MC and !#$, and more stuff after USR (if that is even there). Here is the code that shows how that is done:
#include <iostream>
#include <regex>
int main(void) {
std::string s1 ("MC123, 456!#$");
std::string s2 ("MC123, 456!#$USRChester");
std::smatch m;
std::regex e ("MC([0-9, ]+)!#\\$(?:USR)?(.*)$"); // matches the kind of pattern you are looking for
if (std::regex_search (s1,m,e)) {
std::cout << "match[0] = " << m[0] << std::endl;
std::cout << "match[1] = " << m[1] << std::endl;
std::cout << "match[2] = " << m[2] << std::endl;
}
if (std::regex_search (s2,m,e)) {
std::cout << "match[0] = " << m[0] << std::endl;
std::cout << "match[1] = " << m[1] << std::endl;
std::cout << "match[2] = " << m[2] << std::endl;
if (m[2].length() > 0) {
std::cout << m[2] << ": " << m[1] << std::endl;
}
}
}
Output:
match[0] = MC123, 456!#$
match[1] = 123, 456
match[2] =
match[0] = MC123, 456!#$USRChester
match[1] = 123, 456
match[2] = Chester
Chester: 123, 456
The matches are:
match[0] : "everything in the input string that was consumed by the Regex"
match[1] : "the thing in the first set of parentheses"
match[2] : "The thing in the second set of parentheses"
Note the use of the slightly tricky (?:USR)? expression. It says: "This might (that's the trailing ?) be followed by the characters USR. If it is, consume them without capturing (that's the ?: part) and capture what follows."
As you can see, simply testing whether m[2] is empty will tell you whether you have just numbers, or number plus "the thing after the USR". I hope this gives you an inkling of the power of regular expressions for chomping through strings like yours.
If you are sure about the ending of the message, message.substr(3, message.size()-6) will do the trick.
However, it is good practice to check everything, just to avoid surprises.
Something like this:
if (message.size() < 6)
throw error;
if (message.substr(0,3) != "MCX") //the exact numbers do not match in your example, but you get the point...
throw error;
if (message.substr(message.size()-3) != "!##")
throw error;
string data = message.substr(3, message.size()-6);
Just calculate the offset first.
string str = ...;
size_t start = 3;
size_t end = str.find("!##");
assert(end != string::npos);
return str.substr(start, end - start);
You can get the index of "!##" by using:
message.find("!##")
Then use that answer instead of 7. You should also check for it equalling std::string::npos which indicates that the substring was not found, and take some different action.
string msg = "MC4,512,541,3123!##";
for (size_t i = 2; i + 2 < msg.length(); i++) {
// stop at the "!##" terminator
if (msg[i] == '!' && msg[i + 1] == '#' && msg[i + 2] == '#')
break;
cout << msg[i];
}
or use char[]
char msg[] = "MC4,123,54!##";
sizeof(msg) - 1; // instead of msg.length()
// -1 for the terminating null byte (each char takes 1 byte, so size - 1 == number of chars)

Boost regex don't match tabs

I'm using Boost regex_match, and I have a problem matching only text that contains no tab characters.
My test application looks as follows:
#include <iostream>
#include <string>
#include <boost/spirit/include/classic_regex.hpp>
int
main(int args, char** argv)
{
boost::match_results<std::string::const_iterator> what;
if(args == 3) {
std::string text(argv[1]);
boost::regex expression(argv[2]);
std::cout << "Text : " << text << std::endl;
std::cout << "Regex: " << expression << std::endl;
if(boost::regex_match(text, what, expression, boost::match_default) != 0) {
int i = 0;
std::cout << text;
if(what[0].matched)
std::cout << " matches with regex pattern!" << std::endl;
else
std::cout << " does not match with regex pattern!" << std::endl;
for(boost::match_results<std::string::const_iterator>::const_iterator it=what.begin(); it!=what.end(); ++it) {
std::cout << "[" << (i++) << "] " << it->str() << std::endl;
}
} else {
std::cout << "Expression does not match!" << std::endl;
}
} else {
std::cout << "Usage: $> ./boost-regex <text> <regex>" << std::endl;
}
return 0;
}
If I run the program with these arguments, I don't get the expected result:
$> ./boost-regex "`cat file`" "(?=.*[^\t]).*"
Text : This text includes some tabulators
Regex: (?=.*[^\t]).*
This text includes some tabulators matches with regex pattern!
[0] This text includes some tabulators
In this case I would have expected that what[0].matched is false, but it's not.
Is there any mistake in my regular expression?
Or do I have to use other format/match flag?
Thank you in advance!
I am not sure what you want to do. My understanding is that you want the regex to fail as soon as there is a tab in the text.
Your positive lookahead assertion (?=.*[^\t]) is true as soon as it finds a non-tab character, and there are a lot of those in your text.
If you want it to fail when there is a tab, go the other way round and use a negative lookahead assertion:
(?!.*\t).*
This assertion fails as soon as it finds a tab.