QRegExp help, regex in general - C++

I'm in the process of learning regular expressions. I've been tasked with loading several hundred *.png files into QPixmaps, ideally arranged as a matrix of pixmaps.
I think QRegExp is the best way to do this, because it would let me insert each pixmap directly into its place in the matrix without having to sort the files.
The file-name pattern I'm trying to match is:
runner_(int)_(int).png
where the first integer is in the range [-1, 13] and the second in [00, 20]. The second integer is zero-padded to two digits.
This is my code attempt:
// find the png files in the directory
QDir fileDir(iconPath);
QFileInfoList fileList = fileDir.entryInfoList();
QRegExp rxlen("runner_([^\\_]{1,1}])_([^\\_]{1,1}]).png");
foreach (const QFileInfo &info, fileList) {
    qDebug() << info.fileName();
    int pos = rxlen.indexIn(info.fileName());
    // indexIn() returns -1 when there is no match, and 0 for a match at the start
    if (pos != -1) {
        qDebug() << rxlen.cap(1);
        qDebug() << rxlen.cap(2);
    } else {
        qDebug() << "Didn't find any";
    }
}
My question: please help me write the regular expression.
Please be gentle, I'm new to RegEx (started learning it about an hour ago!)
Thanks :)

{1,1} is redundant: it means "between 1 and 1 times", i.e. exactly once, which is already the default. You can just write the element on its own.
Since you already have your pattern down all nice and proper, you can just build the regex straight from it:
runner_(-1|[0-9]|0+[0-9]|0*1[0123])_([0-9]|0+[0-9]|0*1[0-9]|20)\.png
This simply spells out a sub-pattern for every number in each of your ranges.
Edited to escape the dot.
Edited again to allow leading zeroes.

Related

How to speed up regex searching across a large number of potentially large files in C++?

I'm trying to write a program that reads user-supplied wildcard file paths and wildcard search strings, using an Excel document as a configuration file. For example, the user could enter C:\Read*.txt, and every text file on the C drive whose name starts with "Read" followed by any characters would be included in the search.
They could also search for Message: *, and every string beginning with "Message: " followed by any sequence of characters would be matched.
The program works so far, but it is very slow, and I need it to handle very large files. I'm using a file stream and the std::regex class, and I'm not sure what is taking so much time.
Most of the time is spent in the following loop (I've included only the lines just above the while loop, so you can better understand what I'm trying to do):
smatch matches;
vector<regex> expressions;
for (const auto& pattern : regex_patterns) { expressions.emplace_back(pattern); }
auto startTimer = high_resolution_clock::now();
// Open file and begin reading
ifstream stream1(filePath);
if (stream1.is_open())
{
    int count = 0;
    while (getline(stream1, line))
    {
        // Continue to next step if line is empty, no point in searching it.
        if (line.empty())
        {
            continue;
        }
        // Loop through each search string; if it matches, save the line number and line text.
        for (const auto& expression : expressions)
        {
            if (regex_search(line, matches, expression))
            {
                lineNumb.push_back(count);
                lineTextToSave.push_back(line);
            }
        }
        count = count + 1;
    }
}
auto stopTimer = high_resolution_clock::now();
auto duration2 = duration_cast<milliseconds>(stopTimer - startTimer);
cout << "Time to search file: " << duration2.count() << "\n";
Is there a better way to search files than this? I've looked up many things but haven't yet found a code example that I understand.
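Some ideas, in order of priority:
Join all the regex patterns into a single regex instead of running r separate regexes on each line, for example (R1)|(R2)|(...)|(Rr). Each line is then searched once instead of r times, which can speed the program up by as much as a factor of r, depending on the regex engine.
Make sure each regex is compiled once, before the loop, rather than inside it.
Do not append a final .* to your patterns: regex_search already finds a match anywhere in the line, so a trailing .* only makes the engine scan the rest of the line.
Some ideas that are not portable:
Memory-map the file instead of reading it through iostreams.
Consider whether it is worth reimplementing grep at all, instead of calling grep through popen().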

How to search a string for multiple substrings

I need to check whether a short string contains any of a list of substrings. Currently I do it as shown below (working code on ideone):
#include <iostream>
#include <string>

bool ContainsMyWords(const std::wstring& input)
{
    if (std::wstring::npos != input.find(L"white"))
        return true;
    if (std::wstring::npos != input.find(L"black"))
        return true;
    if (std::wstring::npos != input.find(L"green"))
        return true;
    // ...
    return false;
}

int main() {
    std::wstring input1 = L"any text goes here";
    std::wstring input2 = L"any text goes here black";
    std::cout << "input1 " << ContainsMyWords(input1) << std::endl;
    std::cout << "input2 " << ContainsMyWords(input2) << std::endl;
    return 0;
}
I have 10-20 substrings that I need to match against an input. My goal is to optimize the code for CPU utilization and reduce its average-case time complexity. I receive input strings at a rate of 10 Hz, with bursts of up to 10 kHz (which is what I am worried about).
There is the agrep library, with source code written in C; I wonder whether there is a standard equivalent in C++. From a quick look, integrating it with what I have may be a bit difficult, but doable.
Is there a better way to match an input string against a set of predefined substrings in C++?
The best approach is a regular expression search with the following pattern:
"(white)|(black)|(green)"
That way, in a single pass over the string, group 1 is set (with its start and end positions) if the "white" substring matched, group 2 if "black" matched, and group 3 if "green" matched. Because group 0 gives you the end position of the whole match, you can start a new search from there to look for more matches, still covering the string in one pass.
You could use one big if instead of several if statements. However, NathanOliver's solution with std::any_of is faster than that, provided the array of substrings is made static so that it is not recreated on every call, as shown below.
#include <algorithm>
#include <iterator>
#include <string>

bool ContainsMyWordsNathan(const std::wstring& input)
{
    // do not forget to make the array static!
    static const std::wstring keywords[] = {L"white", L"black", L"green", ...};
    return std::any_of(std::begin(keywords), std::end(keywords),
        [&](const std::wstring& str) { return input.find(str) != std::wstring::npos; });
}
PS: As discussed in Algorithm to find multiple string matches:
The grep family implements multi-string search very efficiently. If you can use them as external programs, do so.

How to organize or extract info from a QByteArray

I have a program that receives a whole block of data in a single QByteArray. The block is divided into lines by carriage returns followed by line feeds (\r\n). Somewhere in the middle of all this junk there is a date, specifically in the third line (between the second and third \r\n).
Every time I try to extract this date from the byte array I end up with random junk. How can I be more precise with the QByteArray?
What is the best way to extract this date without altering my byte array? Keep in mind that I don't know the date in advance, and it may even be in the wrong format.
Just for understanding purposes, here is an example of my ByteArray:
RandomName=name\r\nRandomID=ID\r\nRandomDate=date\r\nRandomTime=time\r\nRandomWhatever=whatever(...)
EDIT:
Sorry for my bad English.
Let's say I have the following text sent to me:
ProgName = Marcus
ProgID = 180
ProgDate = 15.01.16
ProgTime = 13:39
(More info)......
However, none of this information is useful to me except the date. Everything is stored in a single QByteArray (let's call it ba), so this is my ba:
ProgName(space)=(space)Marcus\r\nProgID(space)=(space)180\r\nProgDate(space)=(space)15.01.16\r\nProgTime(space)=(space)13:39\r\n (keeps going)
My problem: how do I store "15.01.16" (the ProgDate value) in a QString without altering or destroying ba?
There are a variety of ways, but try one of the following solutions.
1) using split()
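// split() returns new arrays, so yourByteArray itself is left unchanged
// (replace(), by contrast, would modify the array in place)
foreach (const QByteArray &line, yourByteArray.split('\n')) {
    const QByteArray trimmedLine = line.trimmed();  // drops the trailing '\r'
    qDebug() << trimmedLine;
    foreach (const QByteArray &val, trimmedLine.split('=')) {
        qDebug() << val.trimmed();
    }
}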
2) using QRegularExpression/QRegularExpressionMatchIterator to collect every (key, value) pair
// " *= *" allows spaces around '=', and [^\r\n]+ also accepts values such as 15.01.16
QRegularExpression re("(\\w+) *= *([^\\r\\n]+)");
QRegularExpressionMatchIterator i = re.globalMatch(QString::fromLatin1(yourByteArray));
while (i.hasNext()) {
    QRegularExpressionMatch match = i.next();
    qDebug() << match.captured(0) << match.captured(1) << match.captured(2);
}
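3) using QRegularExpression/QRegularExpressionMatch to extract just the date
QRegularExpression re("(ProgDate) *= *([^\\r\\n]+)");
QRegularExpressionMatch match = re.match(QString::fromLatin1(yourByteArray));
if (match.hasMatch()) {
    QString date = match.captured(2);  // "15.01.16"; yourByteArray is left unchanged
    qDebug() << match.captured(0) << match.captured(1) << date;
}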

Qt Using QRegularExpression multiline option

I'm writing a program that uses QRegularExpression with MultilineOption. I wrote the code below, but matching stops after the first line. Why? What am I doing wrong?
QString recv = "AUTH-<username>-<password>\nINFO-ID:45\nREG-<username>-<password>-<name>-<status>\nSEND-ID:195-DATE:12:30 2/02/2015 <esempio>\nUPDATEN-<newname>\nUPDATES-<newstatus>\n";
// One alternative per message type. Minutes and month use \\d{1,2}, because a
// single \\d cannot match "30" or "02" in the SEND line.
QRegularExpression exp(
    "(SEND)-ID:(\\d{1,4})-DATE:(\\d{1,2}):(\\d{1,2}) (\\d{1,2})/(\\d{1,2})/(\\d{2,4}) <(.+)>\\n"
    "|(AUTH)-<(.+)>-<(.+)>\\n"
    "|(INFO)-ID:(\\d{1,4})\\n"
    "|(REG)-<(.+)>-<(.+)>-<(.+)>-<(.+)>\\n"
    "|(UPDATEN)-<(.+)>\\n"
    "|(UPDATES)-<(.+)>\\n",
    QRegularExpression::MultilineOption);
qDebug() << exp.pattern();
QRegularExpressionMatch match = exp.match(recv);
qDebug() << match.lastCapturedIndex();
for (int i = 0; i <= match.lastCapturedIndex(); ++i) {
    qDebug() << match.captured(i);
}
Can someone help me?
You should use the globalMatch() method rather than match(): match() returns only the first match, which is why matching stops after the first line.
From the QRegularExpression documentation on globalMatch():
Attempts to perform a global match of the regular expression against
the given subject string, starting at the position offset inside the
subject, using a match of type matchType and honoring the given
matchOptions. The returned QRegularExpressionMatchIterator is
positioned before the first match result (if any).
Also, you can remove the QRegularExpression::MultilineOption option, since it only changes how ^ and $ behave and the pattern uses neither.
Sample code:
QRegularExpressionMatchIterator i = exp.globalMatch(recv);
while (i.hasNext()) {
    QRegularExpressionMatch match = i.next();
    // ...
}
I actually found this question while searching for a similar issue, but I can't completely agree with the answer, because I think most questions about multi-line matching with the new QRegularExpression can be answered as follows:
use the QRegularExpression::DotMatchesEverythingOption option, which allows . to match newline characters. This is extremely useful when porting from QRegExp, where . matches newlines by default.
Your pattern is an alternation (an "or" expression): as soon as the first alternative matches, the match is done.
You need to split the string and loop over the resulting lines, comparing each one against this expression; I think that will work.
If the data always has the same structure, you can use something like this:
"(AUTH)-<([^>]+?)>-<([^>]+?)>\\nINFO-ID:(\\d+)\\n(REG)-<([^>]+?)>-<([^>]+?)>-<([^>]+?)>-<([^>]+?)>\\n(SEND)-ID:(\\d+)-DATE:(\\d+):(\\d+) (\\d+)/(\\d+)/(\\d+) <([^>]+?)>\\n(UPDATEN)-<([^>]+?)>\\n(UPDATES)-<([^>]+?)>"
This pattern has 21 capture groups.

Comparing regex in qt

I have a regex that I hope matches any file with one of the listed extensions:
((\\.cpp$)|(\\.cxx$)|(\\.c$)|(\\.hpp$)|(\\.h$))
How do I test it in Qt against a selected file name?
The regex itself doesn't contain double backslashes; you only need them when you put it into a C++ string literal. You'll also need some kind of wildcard if you want to use it to match full file names. There's also a semantic question of whether a file called just ".cpp" should match. And what about case sensitivity?
I'll assume for the moment that you want at least one other character at the beginning, and use .+:
.+((\.cpp$)|(\.cxx$)|(\.c$)|(\.hpp$)|(\.h$))
So this should work:
QRegExp rx (".+((\\.cpp$)|(\\.cxx$)|(\\.c$)|(\\.hpp$)|(\\.h$))");
bool isMatch = rx.exactMatch(filename);
But with the expressive power of a whole C++ compiler at your beck and call, it can be a bit stifling to use regular expressions. You might have an easier time adapting code if you write it more like:
bool isMatch = false;
QStringList fileExtensionList;
fileExtensionList << "CPP" << "CXX" << "C" << "HPP" << "H";
QStringList splitFilenameList = filename.split(".");
if (splitFilenameList.size() > 1) {
    QString fileExtension = splitFilenameList.last();
    // compare in upper case, so "main.cpp" and "MAIN.CPP" both match
    isMatch = fileExtensionList.contains(fileExtension.toUpper());
}