Comparing a regex in Qt - C++

I have a regex which I hope matches any file with one of the listed extensions:
((\\.cpp$)|(\\.cxx$)|(\\.c$)|(\\.hpp$)|(\\.h$))
How do I compare it in Qt against a selected file?

Your actual RegEx doesn't contain double backslashes. Those are only needed when you put it into a C++ string literal. You'll also need some kind of wildcard if you want to use it to match full filenames. There's a semantic question of whether a file called just ".cpp" should match or not. And what about case sensitivity?
I'll assume for the moment that you want at least one other character before the extension, and use .+:
.+((\.cpp$)|(\.cxx$)|(\.c$)|(\.hpp$)|(\.h$))
So this should work:
QRegExp rx (".+((\\.cpp$)|(\\.cxx$)|(\\.c$)|(\\.hpp$)|(\\.h$))");
bool isMatch = rx.exactMatch(filename);
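For comparison outside Qt, here is a minimal sketch of the same full-filename check with the standard library's std::regex, which is swapped in for QRegExp here and is not what the question uses. HasSourceExtension is a hypothetical name, and std::regex::icase is an assumption that upper-case extensions should also match. The raw string literal R"(...)" also illustrates the backslash point: inside it, the pattern is written with single backslashes.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if the name has at least one character before
// one of the listed extensions. std::regex::icase is an assumption that
// "main.CPP" should count too; drop it for a case-sensitive check.
bool HasSourceExtension(const std::string& filename)
{
    // Raw string literal: the regex is written with single backslashes.
    static const std::regex rx(R"(.+\.(cpp|cxx|c|hpp|h))", std::regex::icase);
    // regex_match succeeds only if the whole string matches, like exactMatch.
    return std::regex_match(filename, rx);
}
```

Because std::regex_match, like QRegExp::exactMatch, only succeeds when the entire string matches, the $ anchors are not needed in this version.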
But with the expressive power of a whole C++ compiler at your beck and call, regular expressions can feel a bit stifling. You might have an easier time adapting the code if you write it more like this (note that, unlike the regex above, it compares extensions case-insensitively):
bool isMatch = false;
QStringList fileExtensionList;
fileExtensionList << "CPP" << "CXX" << "C" << "HPP" << "H";
QStringList splitFilenameList = filename.split(".");
if (splitFilenameList.size() > 1) {
    // The extension is whatever follows the last '.'
    QString fileExtension = splitFilenameList.last();
    isMatch = fileExtensionList.contains(fileExtension.toUpper());
}
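The same non-regex idea can be sketched in standard C++. HasListedExtension is a hypothetical helper; instead of splitting the whole name, it locates the last '.' with std::string::rfind, which yields the same extension as taking the last element of the split list.

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <string>

// Hypothetical helper mirroring the QStringList approach: take the text
// after the last '.', upper-case it, and look it up in a set of extensions.
bool HasListedExtension(const std::string& filename)
{
    static const std::set<std::string> extensions = {"CPP", "CXX", "C", "HPP", "H"};
    const std::string::size_type dot = filename.rfind('.');
    if (dot == std::string::npos)
        return false; // no '.' at all, so no extension
    std::string ext = filename.substr(dot + 1);
    // Upper-case the extension so the comparison ignores case.
    std::transform(ext.begin(), ext.end(), ext.begin(),
                   [](unsigned char ch) { return static_cast<char>(std::toupper(ch)); });
    return extensions.count(ext) != 0;
}
```

Like the QStringList version, this accepts a file named just ".cpp", because the text after the only dot is "cpp". Add a `dot > 0` check if that case should be rejected.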

Related

Extract string matching a specific format

Given a QString, I want to extract a substring from the main string input.
e.g. I have a QString reading something like:
\\\\?\\Volume{db41aa6a-c0b8-11e9-bc8a-806e6f6e6963}\\
I need to extract the string (if a string with the format exists) using a template/format matching a regex format (\w){8}([-](\w){4}){3}[-](\w){12} as shown below:
xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
and it should return
db41aa6a-c0b8-11e9-bc8a-806e6f6e6963
if found, else an empty QString.
Currently, I can achieve this by doing something like:
string.replace("{", "").replace("}", "").replace("\\", "").replace("?", "").replace("Volume", "");
But this is tedious and inefficient, and tailored to a specific request.
Is there a generalized function that enables me to extract a substring using a regex format or other?
Update
To clarify after @Emma's answer: I want something like QString::extract("(\w){8}([-](\w){4}){3}[-](\w){12}"), which would return db41aa6a-c0b8-11e9-bc8a-806e6f6e6963.
Here are several ways to extract part of a string as presented in the question. I don't know how much of the string format is fixed vs. variable, so not all of these examples may be practical. Also, some examples below use the QStringRef class, which can be more efficient but requires the original string (the one being referenced) to remain available while any references are active (see the warning in the docs).
const QString str("\\\\?\\Volume{db41aa6a-c0b8-11e9-bc8a-806e6f6e6963}\\");
// Treat str as a list delimited by "{" and "}" chars.
const QString sectResult = str.section('{', 1, 1).section('}', 0, 0); // = "db41aa6a-c0b8-11e9-bc8a-806e6f6e6963"
const QString sectRxResult = str.section(QRegExp("\\{|\\}"), 1, 1); // = "db41aa6a-c0b8-11e9-bc8a-806e6f6e6963"
// Example using QStringRef, though this could also be just QString::split() which returns QString copies.
const QVector<QStringRef> splitRef = str.splitRef(QRegExp("\\{|\\}"));
const QStringRef splitRefResult = splitRef.value(1); // = "db41aa6a-c0b8-11e9-bc8a-806e6f6e6963"
// Use regular expressions to find/extract matching string
const QRegularExpression rx("\\w{8}(?:-\\w{4}){3}-\\w{12}"); // match a UUID string (no capture groups needed)
const QRegularExpressionMatch match = rx.match(str);
const QString rxResultStr = match.captured(0); // = "db41aa6a-c0b8-11e9-bc8a-806e6f6e6963"
const QStringRef rxResultRef = match.capturedRef(0); // = "db41aa6a-c0b8-11e9-bc8a-806e6f6e6963"
const QRegularExpression rx2(".+\\{([^{\\}]+)\\}.+"); // capture anything inside { } brackets
const QRegularExpressionMatch match2 = rx2.match(str);
const QString rx2ResultStr = match2.captured(1); // = "db41aa6a-c0b8-11e9-bc8a-806e6f6e6963"
// Make a copy for replace so that our references to the original string remain valid.
const QString replaceResult = QString(str).replace(rx2, "\\1"); // = "db41aa6a-c0b8-11e9-bc8a-806e6f6e6963"
qDebug() << sectResult << sectRxResult << splitRefResult << rxResultStr
<< rxResultRef << rx2ResultStr << replaceResult;
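If you are not tied to Qt, the delimiter approach and the regex approach translate fairly directly to the standard library. This sketch assumes std::string input; BetweenBraces and ExtractUuid are hypothetical names, and BetweenBraces only roughly mirrors section() because it returns an empty string when either brace is missing.

```cpp
#include <regex>
#include <string>

// Delimiter approach, a rough analogue of str.section('{', 1, 1).section('}', 0, 0):
// returns the text between the first '{' and the next '}', or "" if either is absent.
std::string BetweenBraces(const std::string& str)
{
    const std::string::size_type open = str.find('{');
    if (open == std::string::npos)
        return std::string();
    const std::string::size_type close = str.find('}', open + 1);
    if (close == std::string::npos)
        return std::string();
    return str.substr(open + 1, close - open - 1);
}

// Regex approach, similar to rx.match(str).captured(0): returns the first
// UUID-shaped substring, or "" if none is found.
std::string ExtractUuid(const std::string& str)
{
    static const std::regex rx(R"(\w{8}(?:-\w{4}){3}-\w{12})");
    std::smatch match;
    return std::regex_search(str, match, rx) ? match.str(0) : std::string();
}
```

Returning an empty string on failure matches the "else an empty QString" behavior the question asks for.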
Maybe
Volume{(\b[0-9a-f]{8}\b-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-\b[0-9a-f]{12}\b)}
or just
\b[0-9a-f]{8}\b-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-\b[0-9a-f]{12}\b
might be a bit closer for a full match.
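A minimal std::regex sketch of the capture-group variant, assuming the path text is available as a std::string. It drops the \b word boundaries (the literal braces already delimit the GUID), escapes the braces so they are read literally, and, as an assumption, accepts upper-case hex digits too. ExtractVolumeGuid is a hypothetical name.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: captures the GUID between "Volume{" and "}".
// The braces are escaped so they are literal characters, and
// [0-9a-fA-F] accepts hex digits in either case.
std::string ExtractVolumeGuid(const std::string& path)
{
    static const std::regex rx(
        R"(Volume\{([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})\})");
    std::smatch m;
    // Group 1 holds the GUID without the surrounding braces.
    return std::regex_search(path, m, rx) ? m.str(1) : std::string();
}
```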
Source: Searching for UUIDs in text with regex

How to search a string for multiple substrings

I need to check a short string for matches against a list of substrings. Currently, I do it as shown below (working code on ideone):
#include <iostream>
#include <string>
bool ContainsMyWords(const std::wstring& input)
{
    if (std::wstring::npos != input.find(L"white"))
        return true;
    if (std::wstring::npos != input.find(L"black"))
        return true;
    if (std::wstring::npos != input.find(L"green"))
        return true;
    // ...
    return false;
}
int main() {
    std::wstring input1 = L"any text goes here";
    std::wstring input2 = L"any text goes here black";
    std::cout << "input1 " << ContainsMyWords(input1) << std::endl;
    std::cout << "input2 " << ContainsMyWords(input2) << std::endl;
    return 0;
}
I have 10-20 substrings that I need to match against an input. My goal is to optimize code for CPU utilization and reduce time complexity for an average case. I receive input strings at a rate of 10 Hz, with bursts to 10 kHz (which is what I am worried about).
There is the agrep library, with source code written in C; I wonder if there is a standard equivalent in C++. From a quick look, it may be a bit difficult (but doable) to integrate it with what I have.
Is there a better way to match an input string against a set of predefined substrings in C++?
The best approach is a regular expression search using the following regular expression:
"(white)|(black)|(green)"
That way, with only one pass over the string, group 1 is set (with its beginning and end points) if the "white" substring matched, group 2 if "black" matched, and group 3 if "green" matched. Since group 0 gives you the end position of the whole match, you can start a new search from there to look for more matches, all in one pass over the string.
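A sketch of that single-regex approach with std::wregex, since the question works with std::wstring. KeywordHit and FindKeywords are hypothetical names; std::wsregex_iterator performs the "start a new search after each match" step automatically.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Which alternative matched, and where (hypothetical result type).
struct KeywordHit {
    std::size_t group;    // 1 = white, 2 = black, 3 = green
    std::size_t position; // offset of the match in the input
};

// One alternation with one capture group per keyword; the iterator resumes
// searching right after each match, so the input is scanned left to right once.
std::vector<KeywordHit> FindKeywords(const std::wstring& input)
{
    static const std::wregex rx(L"(white)|(black)|(green)");
    std::vector<KeywordHit> hits;
    for (std::wsregex_iterator it(input.begin(), input.end(), rx), end; it != end; ++it) {
        const std::wsmatch& m = *it;
        for (std::size_t g = 1; g < m.size(); ++g) {
            if (m[g].matched) { // exactly one alternative participates per match
                hits.push_back({g, static_cast<std::size_t>(m.position(0))});
                break;
            }
        }
    }
    return hits;
}
```

Whether this actually beats repeated find() calls depends on the regex implementation, so it is worth measuring against the 10 kHz bursts mentioned in the question.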
You could use one big if instead of several if statements. However, NathanOliver's solution with std::any_of is faster than that, provided the array of substrings is made static (so that it is not recreated on every call), as shown below.
#include <algorithm>
#include <iterator>
#include <string>
bool ContainsMyWordsNathan(const std::wstring& input)
{
    // do not forget to make the array static!
    static const std::wstring keywords[] = {L"white", L"black", L"green" /* , ... */};
    return std::any_of(std::begin(keywords), std::end(keywords),
        [&](const std::wstring& str) { return input.find(str) != std::wstring::npos; });
}
PS: As discussed in Algorithm to find multiple string matches:
The "grep" family implements multi-string search very efficiently. If you can use them as external programs, do so.

Qt Using QRegularExpression multiline option

I'm writing a program that uses QRegularExpression with MultilineOption. I wrote this code, but matching stops at the first line. Why? What am I doing wrong?
QString recv = "AUTH-<username>-<password>\nINFO-ID:45\nREG-<username>-<password>-<name>-<status>\nSEND-ID:195-DATE:12:30 2/02/2015 <esempio>\nUPDATEN-<newname>\nUPDATES-<newstatus>\n";
QRegularExpression exp = QRegularExpression(
    "(SEND)-ID:(\\d{1,4})-DATE:(\\d{1,2}):(\\d{2}) (\\d{1,2})/(\\d{1,2})/(\\d{2,4}) <(.+)>\\n"
    "|(AUTH)-<(.+)>-<(.+)>\\n"
    "|(INFO)-ID:(\\d{1,4})\\n"
    "|(REG)-<(.+)>-<(.+)>-<(.+)>-<(.+)>\\n"
    "|(UPDATEN)-<(.+)>\\n"
    "|(UPDATES)-<(.+)>\\n",
    QRegularExpression::MultilineOption);
qDebug() << exp.pattern();
QRegularExpressionMatch match = exp.match(recv);
qDebug() << match.lastCapturedIndex();
for (int i = 0; i <= match.lastCapturedIndex(); ++i) {
qDebug() << match.captured(i);
}
Can someone help me?
You should use the globalMatch() method rather than match().
See the QRegularExpression documentation for globalMatch():
Attempts to perform a global match of the regular expression against
the given subject string, starting at the position offset inside the
subject, using a match of type matchType and honoring the given
matchOptions. The returned QRegularExpressionMatchIterator is
positioned before the first match result (if any).
Also, you can remove the QRegularExpression::MultilineOption option: it only changes how ^ and $ behave, and your pattern uses neither.
Sample code:
QRegularExpressionMatchIterator i = exp.globalMatch(recv);
while (i.hasNext()) {
QRegularExpressionMatch match = i.next();
// ...
}
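For comparison, standard C++ expresses the same global-matching loop with std::sregex_iterator. This sketch uses a simplified pattern (an assumption, not the question's full expression) that captures the command word at the start of each line; CommandWords is a hypothetical name.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects the command word (AUTH, INFO, REG, ...) of every line, analogous
// to looping over QRegularExpression::globalMatch() with hasNext()/next().
std::vector<std::string> CommandWords(const std::string& recv)
{
    // Simplified pattern: upper-case word, '-', rest of the line, newline.
    // "\n" here is an actual newline character inside the pattern.
    static const std::regex rx("([A-Z]+)-[^\n]*\n");
    std::vector<std::string> words;
    for (std::sregex_iterator it(recv.begin(), recv.end(), rx), end; it != end; ++it)
        words.push_back((*it)[1].str()); // group 1 = the command word
    return words;
}
```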
I actually googled this question while having a similar issue, but I couldn't completely agree with an answer, as I think most questions about multi-line matching with the new QRegularExpression can be answered as follows:
use the QRegularExpression::DotMatchesEverythingOption option, which allows (.) to match newline characters. This is extremely useful when porting from QRegExp.
Your expression is an alternation (an "or" expression): as soon as the first alternative matches, the job is done.
You need to split the string and loop over the lines, comparing each one with this expression; that should work, I think.
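A sketch of the split-and-loop suggestion in standard C++: read the message line by line with std::getline and test each line on its own with std::regex_match. The two-alternative pattern and the name CountKnownLines are assumptions for illustration only.

```cpp
#include <regex>
#include <sstream>
#include <string>

// Hypothetical sketch: test every line separately, so one alternative
// matching on the first line cannot hide the remaining lines.
int CountKnownLines(const std::string& recv)
{
    // Simplified per-line pattern (an assumption): an AUTH or an INFO line.
    static const std::regex rx(R"(AUTH-<[^>]+>-<[^>]+>|INFO-ID:\d+)");
    std::istringstream lines(recv);
    std::string line;
    int count = 0;
    while (std::getline(lines, line)) { // getline strips the '\n'
        if (std::regex_match(line, rx))
            ++count;
    }
    return count;
}
```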
If the data always has the same structure, you can use something like this:
"(AUTH)-<([^>]+?)>-<([^>]+?)>\\nINFO-ID:(\\d+)\\n(REG)-<([^>]+?)>-<([^>]+?)>-<([^>]+?)>-<([^>]+?)>\\n(SEND)-ID:(\\d+)-DATE:(\\d+):(\\d+) (\\d+)/(\\d+)/(\\d+) <([^>]+?)>\\n(UPDATEN)-<([^>]+?)>\\n(UPDATES)-<([^>]+?)>"
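To check the fixed-structure pattern, here is a sketch that applies it with std::regex_search (standard C++ rather than Qt) and collects its 21 capture groups. ParseFixedMessage is a hypothetical name; an empty result means the message did not have the expected structure.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Applies the fixed-structure pattern and returns capture groups 1..21,
// or an empty vector if the message does not match.
std::vector<std::string> ParseFixedMessage(const std::string& recv)
{
    static const std::regex rx(
        "(AUTH)-<([^>]+?)>-<([^>]+?)>\\nINFO-ID:(\\d+)\\n"
        "(REG)-<([^>]+?)>-<([^>]+?)>-<([^>]+?)>-<([^>]+?)>\\n"
        "(SEND)-ID:(\\d+)-DATE:(\\d+):(\\d+) (\\d+)/(\\d+)/(\\d+) <([^>]+?)>\\n"
        "(UPDATEN)-<([^>]+?)>\\n(UPDATES)-<([^>]+?)>");
    std::smatch m;
    std::vector<std::string> groups;
    if (std::regex_search(recv, m, rx))
        for (std::size_t i = 1; i < m.size(); ++i) // skip group 0 (whole match)
            groups.push_back(m[i].str());
    return groups;
}
```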

How to match absolute value using regex

I am having trouble with absolute value in regex in C++. This is what I have as the pattern:
std::tr1::regex loadAbsNM("load -|M\\((\\d+)\\)|"); // load -|M(x)|
I am trying to use std::tr1::regex_match( IR, result, loadAbsNM ) to match, but it is not matching anything, even though it should be.
I'm using the Visual Studio 2010 compiler.
Shortened version of the program (iostream and regex are included above it):
int main()
{
std::string IR = "load -|M(x)|";
std::smatch result;
std::tr1::regex loadAbsNM("load -|M\\((\\d+)\\)|");
if( std::tr1::regex_match( IR , result, loadAbsNM ) )
{
int x = 2;
std::cout << "matched!" << std::endl;
}
else
{
std::cout << "!UNABLE TO DECODE INSTRUCTION!" << std::endl;
}
}
Output produced:
!UNABLE TO DECODE INSTRUCTION!
Note that from your code, you're not going to have a match. The letter x won't match the regex \d+.
Also, you need a backslash in front of each pipe character you want to match literally. As you may know, the pipe (|) separates alternatives: (a|b) means a or b.
Finally, since there is a pipe at the end, the expression can match the empty string, which is often a bad idea.
I would suggest something like this:
"load -\\|M\\((\\d+)\\)\\|"
But that won't match:
"load -|M(x)|"
You'd need to use a number instead of 'x' as in:
"load -|M(123)|"

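A minimal sketch of the suggested pattern using std::regex (the C++11 successor of std::tr1::regex) with a raw string literal, so the escaped pipes and parentheses need only one backslash each. ParseLoadAbs and its -1 "no match" return value are hypothetical conventions for this sketch.

```cpp
#include <regex>
#include <string>

// Escaped \| and \( \) match the literal characters of "load -|M(123)|";
// (\d+) captures the number. Returns -1 when the instruction does not match.
int ParseLoadAbs(const std::string& ir)
{
    static const std::regex loadAbsNM(R"(load -\|M\((\d+)\)\|)");
    std::smatch result;
    if (std::regex_match(ir, result, loadAbsNM))
        return std::stoi(result[1].str());
    return -1;
}
```

As noted above, "load -|M(x)|" still does not match, because x is not a digit.
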
Regex to filter strings

I need to filter strings based on two requirements:
1) they must start with "city_date"
2) they should not have "metro" anywhere in the string.
This needs to be done in a single check.
To start, I know it should be something like this, but I don't know how to eliminate strings containing "metro":
string pattern = "city_date_";
Added: I need to use the regex for a SQL LIKE statement, hence I need it as a string.
Use a negative lookahead assertion (I don't know whether your regex library supports it):
string pattern = "^city_date(?!.*metro)";
I also added an anchor ^ at the start, which matches the start of the string.
The negative lookahead assertion (?!.*metro) fails if the string "metro" appears anywhere ahead.
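In C++, the lookahead pattern can be tried with std::regex, whose default ECMAScript grammar supports (?!...). PassesFilter is a hypothetical name; std::regex_search is used so that the pattern does not have to consume the whole string.

```cpp
#include <regex>
#include <string>

// True if s starts with "city_date" and "metro" does not appear after it.
// ^ anchors the search at the start of the string.
bool PassesFilter(const std::string& s)
{
    static const std::regex pattern("^city_date(?!.*metro)");
    return std::regex_search(s, pattern);
}
```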
Regular expressions are usually far more expensive than direct comparisons. If direct comparisons can easily express the requirements, use them. This problem doesn't need the overhead of a regular expression. Just write the code:
std::string str = /* whatever */ "";
const std::string head = "city_date";
const std::string exclude = "metro";
// compare(pos, len, s) == 0 means str starts with head
if (str.compare(0, head.size(), head) == 0 && str.find(exclude) == std::string::npos) {
    // process valid string
}
Using JavaScript:
function filterCityDate(input) { // input: the string you are matching
    // No /g flag: test() on a global regex is stateful (it updates lastIndex).
    var pattern = /^city_date/; // match city_date at the beginning
    if (pattern.test(input)) {
        var patt = /metro/;
        if (patt.test(input)) return "false";
        else return input; // matched string without metro
    }
    else
        return "false"; // unable to match city_date
}