How to organize or extract info from a QByteArray - c++

I have a program that receives a full block in a single QByteArray. This block is "divided" by carriage returns followed by line feeds (\r\n). Somewhere in the middle of all this junk there is a date, more specifically on the third line (between the second and the third \r\n).
Every time I try to extract this date from the QByteArray I end up with random junk. How can I be more precise with the QByteArray?
What is the best way to extract this date without altering my QByteArray? Keep in mind that I don't know the date in advance, and it may even be in the wrong format.
Just for understanding purposes, here is an example of my ByteArray:
RandomName=name\r\nRandomID=ID\r\nRandomDate=date\r\nRandomTime=time\r\nRandomWhatever=whatever(...)
EDIT:
Sorry for my bad English.
Let's say I have the following text sent to me:
ProgName = Marcus
ProgID = 180
ProgDate = 15.01.16
ProgTime = 13:39
(More info)......
However, none of this information is useful to me... except the Date. Everything was stored in a single QByteArray (Let's call it 'ba'). So this is my ba:
ProgName(space)=(space)Marcus\r\nProgID(space)=(space)180\r\nProgDate(space)=(space)15.01.16\r\nProgTime(space)=(space)13:39\r\n (keeps going)
My problem is: Storing "15.01.16" (the "ProgDate") in a QString without altering or destroying ba.

There are a variety of ways, but try one of the following solutions.
1) using split()
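// split() is const, so yourByteArray is left untouched
// (replace() would have modified it in place).
for (const QByteArray &line : yourByteArray.split('\n')) {
    qDebug() << line.trimmed();                 // trimmed() drops the trailing '\r'
    for (const QByteArray &val : line.split('='))
        qDebug() << val.trimmed();              // key and value without surrounding spaces
}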
2) using QRegularExpression/QRegularExpressionMatchIterator to collect every (key, value) pair
// key, optional spaces, '=', optional spaces, then the rest of the line as the value
QRegularExpression re("(\\w+) *= *([^\\r\\n]*)");
QRegularExpressionMatchIterator i = re.globalMatch(QString::fromUtf8(yourByteArray));
while (i.hasNext()) {
    QRegularExpressionMatch match = i.next();
    qDebug() << match.captured(1) << match.captured(2);   // key, value
}
3) using QRegularExpression/QRegularExpressionMatch
QRegularExpression re("ProgDate *= *([^\\r\\n]*)");
QRegularExpressionMatch match = re.match(QString::fromUtf8(yourByteArray));
if (match.hasMatch()) {
    QString date = match.captured(1);   // e.g. "15.01.16"; yourByteArray is unchanged
    qDebug() << date;
}
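If regular expressions feel like overkill, the value can also be found with plain searches. The following is a minimal sketch written against std::string_view so that it compiles without Qt; with Qt the same idea maps onto QByteArray::indexOf() and QByteArray::mid(), both of which are const and leave ba untouched. The helper names valueForKey and trim are hypothetical, and the sketch assumes one "key = value" pair per \r\n-terminated line, as in the example above.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Removes leading and trailing spaces from a view (no copy is made).
static std::string_view trim(std::string_view s)
{
    const std::size_t first = s.find_first_not_of(' ');
    if (first == std::string_view::npos)
        return {};
    const std::size_t last = s.find_last_not_of(' ');
    return s.substr(first, last - first + 1);
}

// Hypothetical helper: returns the value of the "key = value" line whose key
// equals `key`, or an empty string if there is none. Lines are assumed to be
// separated by "\r\n". The buffer is only read, never modified.
std::string valueForKey(std::string_view buffer, std::string_view key)
{
    std::size_t lineStart = 0;
    while (lineStart <= buffer.size()) {
        std::size_t lineEnd = buffer.find("\r\n", lineStart);
        if (lineEnd == std::string_view::npos)
            lineEnd = buffer.size();           // last line without a terminator
        const std::string_view line = buffer.substr(lineStart, lineEnd - lineStart);

        const std::size_t eq = line.find('=');
        if (eq != std::string_view::npos && trim(line.substr(0, eq)) == key)
            return std::string(trim(line.substr(eq + 1)));

        lineStart = lineEnd + 2;               // skip the "\r\n" separator
    }
    return {};
}
```

Because the date may be malformed, it is worth validating the extracted string separately; with Qt, QDate::fromString(date, "dd.MM.yy").isValid() is one way to do that.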

Related

Qt Regular Expression Escape Sequence Problem

I'm struggling to get a regular expression working. I'm using Qt Creator on an Ubuntu system. I tested my regex against an example number with a third-party tool, so I believe the problem is not with the expression.
My desired regex:
/\b(9410 ?\d{18})\b/i
I am putting the regex string into a QString variable, which results in an error:
QString test = "/\b(9410 ?\d{18})\b/i";
unknown escape sequence '\d'
To fix it, I added an extra \ at the point of the error:
QString test = "/\b(9410 ?\\d{18})\b/i";
qWarning() << test;
The debugger shows (note the \\):
/\b(9410 ?\\d{18})\b/i
I also tried a raw string:
QString test = R"(/\b(9410 ?\d{18})\b/i)";
qWarning() << test;
The debugger shows every single \ replaced with \\:
/\\b(9410 ?\\d{18})\\b/i
None of these attempts resulted in a working regex. Something fishy is going on with the backslashes. I'd appreciate your thoughts; I must be missing something simple...
EDIT: Here is some simplified code. When I run it, it prints "FALSE", indicating no match. I tested this regex and number at regex101.com and it works there, which is why I believe something is flawed in my implementation. I just can't put my finger on it.
QRegularExpression re;
QString test = R"(/\b(9410 ?\d{18})\b/i)";
re.setPattern(test);
if (re.match("9410811298370146293071").hasMatch())
{
    qWarning() << "TRUE";
}
else {
    qWarning() << "FALSE";
}
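Cleaned up the regex and it now matches. The /.../i delimiters are regex101 (and JavaScript) syntax, not part of the pattern, so QRegularExpression was looking for literal '/' characters. The raw string literal was already correct; the doubled backslashes in the qWarning() output are only QDebug escaping the string (qWarning().noquote() prints it as-is). If case-insensitive matching is needed, pass QRegularExpression::CaseInsensitiveOption instead of /i.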
QRegularExpression re;
QString test = R"(9410 ?\d{18})";
re.setPattern(test);
if (re.match("9410811298370146293071").hasMatch())
{
    qWarning() << "TRUE";
}
else {
    qWarning() << "FALSE";
}
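The escaping rules can be checked outside Qt. The sketch below uses std::regex (ECMAScript syntax, which supports \b and \d) instead of QRegularExpression, so it is an analogy rather than the Qt code itself. It shows that the escaped and raw literals hold the same characters, that "\b" in an ordinary literal is a backspace character rather than a word boundary, and that the pattern matches once the /.../i delimiters are dropped. The function name isTrackingNumber is hypothetical.

```cpp
#include <regex>
#include <string>

// Both literals contain exactly the characters \b(9410 ?\d{18})\b :
// an ordinary literal needs "\\" for each backslash, a raw literal does not.
const std::string escaped = "\\b(9410 ?\\d{18})\\b";
const std::string raw = R"(\b(9410 ?\d{18})\b)";

// In an ordinary literal "\b" is the backspace character (0x08), so this
// pattern looks for literal backspaces and can never match a plain number.
const std::string broken = "\b(9410 ?\\d{18})\b";

// Hypothetical helper. Note the absence of /.../i: those delimiters belong to
// tools such as regex101 or JavaScript, not to the pattern string itself.
// std::regex::icase stands in for the /i flag (irrelevant for digits anyway).
bool isTrackingNumber(const std::string &text)
{
    static const std::regex re(raw, std::regex::icase);
    return std::regex_search(text, re);
}
```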

Qt Using QRegularExpression multiline option

I'm writing a program that uses QRegularExpression with MultilineOption. I wrote this code, but matching stops at the first line. Why? What am I doing wrong?
QString recv = "AUTH-<username>-<password>\nINFO-ID:45\nREG-<username>-<password>-<name>-<status>\nSEND-ID:195-DATE:12:30 2/02/2015 <esempio>\nUPDATEN-<newname>\nUPDATES-<newstatus>\n";
QRegularExpression exp = QRegularExpression("(SEND)-ID:(\\d{1,4})-DATE:(\\d{1,2}):(\\d) (\\d{1,2})\/(\\d)\/(\\d{2,4}) <(.+)>\\n|(AUTH)-<(.+)>-<(.+)>\\n|(INFO)-ID:(\\d{1,4})\\n|(REG)-<(.+)>-<(.+)>-<(.+)>-<(.+)>\\n|(UPDATEN)-<(.+)>\\n|(UPDATES)-<(.+)>\\n", QRegularExpression::MultilineOption);
qDebug() << exp.pattern();
QRegularExpressionMatch match = exp.match(recv);
qDebug() << match.lastCapturedIndex();
for (int i = 0; i <= match.lastCapturedIndex(); ++i) {
    qDebug() << match.captured(i);
}
Can someone help me?
You should use the globalMatch() method rather than match().
From the QRegularExpression documentation of globalMatch():
Attempts to perform a global match of the regular expression against
the given subject string, starting at the position offset inside the
subject, using a match of type matchType and honoring the given
matchOptions. The returned QRegularExpressionMatchIterator is
positioned before the first match result (if any).
Also, you can remove the QRegularExpression::MultilineOption option: it only changes the meaning of ^ and $, which your pattern does not use.
Sample code:
QRegularExpressionMatchIterator i = exp.globalMatch(recv);
while (i.hasNext()) {
    QRegularExpressionMatch match = i.next();
    // ...
}
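To see why match() stops after the first line, here is the same distinction with the standard library, as an analogy: std::regex_search() returns only the first match, much like QRegularExpression::match(), whereas std::sregex_iterator keeps searching after each match, like the QRegularExpressionMatchIterator returned by globalMatch(). The simplified pattern (one alternation for the command name, then the rest of the line) and the function name parseCommands are assumptions made for this sketch, not the asker's full pattern.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns every (command, arguments) pair in the text.
// The iterator resumes searching right after the previous match (here, after
// its '\n'), so every line is visited, not just the first one.
std::vector<std::pair<std::string, std::string>> parseCommands(const std::string &recv)
{
    static const std::regex commandLine("(AUTH|INFO|REG|SEND|UPDATEN|UPDATES)-([^\\n]*)\\n");
    std::vector<std::pair<std::string, std::string>> result;
    for (auto it = std::sregex_iterator(recv.begin(), recv.end(), commandLine);
         it != std::sregex_iterator(); ++it) {
        result.emplace_back((*it)[1].str(), (*it)[2].str());
    }
    return result;
}
```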
I found this question while googling a similar issue, but I couldn't completely agree with the answer, as I think most questions about multi-line matching with the new QRegularExpression can be answered as follows:
use the QRegularExpression::DotMatchesEverythingOption option, which allows . to match newline characters. This is extremely useful when porting from QRegExp, where . matches any character including newline.
Your pattern is an alternation (|): as soon as the first alternative matches, match() is done.
You need to split the string and loop over the pieces, comparing each one with this expression; I think that will work.
If the data always has the same structure, you can use something like this:
"(AUTH)-<([^>]+?)>-<([^>]+?)>\\nINFO-ID:(\\d+)\\n(REG)-<([^>]+?)>-<([^>]+?)>-<([^>]+?)>-<([^>]+?)>\\n(SEND)-ID:(\\d+)-DATE:(\\d+):(\\d+) (\\d+)/(\\d+)/(\\d+) <([^>]+?)>\\n(UPDATEN)-<([^>]+?)>\\n(UPDATES)-<([^>]+?)>"
This gives 21 captured groups in a single match.

JSON parser that can handle large input (2 GB)?

So far, I've tried (without success):
QJsonDocument – "document too large" (looks like the max size is artificially capped at 1 << 27 bytes)
Boost.PropertyTree – takes up 30 GB RAM and then segfaults
libjson – takes up a few gigs of RAM and then segfaults
I'm gonna try yajl next, but Json.NET handles this without any issues so I'm not sure why it should be such a big problem in C++.
Check out https://github.com/YasserAsmi/jvar. I have tested it with a large database (SF street data or something, which was around 2GB). It was quite fast.
Well, I'm not proud of my solution, but I ended up using some regex to split my data up into top-level key-value pairs (each one being only a few MB), then just parsed each one of those pairs with Qt's JSON parser and passed them into my original code.
Yajl would have been exactly what I needed for something like this, but I went with the ugly regex hack because:
1. Fitting my logic into Yajl's callback structure would have involved rewriting enough of my code to be a pain, and this is just for a one-off MapReduce job, so the code itself doesn't matter long-term anyway.
2. The data set is controlled by me and guaranteed to always work with my regex.
3. For various reasons, adding dependencies to Elastic MapReduce deployments is a bigger hassle than it should be (and static Qt compilation is buggy), so for the sake of not doing more work than necessary I'm inclined to keep dependencies to a minimum.
This still works and performs well (both time-wise and memory-wise).
Note that the regex I used happens to work for my data specifically because the top-level keys (and only the top level keys) are integers; my code below is not a general solution, and I wouldn't ever advise a similar approach over a SAX-style parser where reasons #1 and #2 above don't apply.
Also note that this solution is extra gross (splitting and manipulating JSON strings before parsing + special cases for the start and end of the data) because my original expression that captured the entire key-value pairs broke down when one of the pairs happened to exceed PCRE's backtracking limit (it's incredibly annoying in this case that that's even a thing, especially since it's not configurable through either QRegularExpression or grep).
Anyway, here's the code; I am deeply ashamed:
QFile file( argv[1] );
file.open( QIODevice::ReadOnly );
QTextStream textStream( &file );
QString jsonKey;
QString jsonString;
// LEFT_BRACE / RIGHT_BRACE: brace constants defined elsewhere (not shown)
// Top-level keys in this data set are integers, e.g. "123":
QRegularExpression jsonRegex( "\"-?\\d+\":" );
bool atEnd = false;
while( atEnd == false )
{
    QString regexMatch = jsonRegex.match
    (
        jsonString.append( textStream.read(1000000) )
    ).captured();
    bool isRegexMatched = regexMatch.isEmpty() == false;
    if( isRegexMatched == false )
    {
        atEnd = textStream.atEnd();
    }
    if( atEnd || (jsonKey.isEmpty() == false && isRegexMatched) )
    {
        QString jsonObjectString;
        if( atEnd == false )
        {
            QStringList regexMatchSplit = jsonString.split( regexMatch );
            jsonObjectString = regexMatchSplit[0]
                .prepend( jsonKey )
                .prepend( LEFT_BRACE )
            ;
            jsonObjectString = jsonObjectString
                .left( jsonObjectString.size() - 1 )
                .append( RIGHT_BRACE )
            ;
            jsonKey = regexMatch;
            jsonString = regexMatchSplit[1];
        }
        else
        {
            jsonObjectString = jsonString
                .prepend( jsonKey )
                .prepend( LEFT_BRACE )
            ;
        }
        QJsonObject jsonObject = QJsonDocument::fromJson
        (
            jsonObjectString.toUtf8()
        ).object();
        QString key = jsonObject.keys()[0];
        // ... process data and store in boost::interprocess::map ...
    }
    else if( isRegexMatched )
    {
        jsonKey = regexMatch;
        jsonString = jsonString.split( regexMatch )[1];
    }
}
I've recently finished (probably still a bit beta) such a library:
https://github.com/matiu2/json--11
If you use json_class, it will load everything into memory, which is probably not what you want.
But you can parse it sequentially by writing your own 'mapper'.
The included mapper iterates through the JSON, mapping the input to JSON classes:
https://github.com/matiu2/json--11/blob/master/src/mapper.hpp
You could write your own that does whatever you want with the data, and feed a file stream into it, so as not to load the whole lot into memory.
So, as an example to get you started, this just outputs the JSON data in some arbitrary format but doesn't fill up memory (completely untested and uncompiled):
#include "parser.hpp"
#include <fstream>
#include <iostream>
#include <iterator>
#include <stdexcept>
#include <string>

int main() {
    std::ifstream file("hugeJSONFile.json");
    // istreambuf_iterator reads characters verbatim; istream_iterator<char>
    // would silently skip whitespace, including spaces inside JSON strings
    std::istreambuf_iterator<char> input(file);
    auto parser = json::Parser(input);
    using Parser = decltype(parser);
    using std::cout;
    using std::endl;
    switch (parser.getNextType()) {
    case Parser::null:
        parser.readNull();
        cout << "NULL" << endl;
        break;
    case Parser::boolean: {
        bool val = parser.readBoolean();
        cout << "Bool: " << val << endl;
        break;
    }
    case Parser::array:
        parser.consumeOneValue();
        cout << "Array: ..." << endl;
        break;
    case Parser::object:
        parser.consumeOneValue();
        cout << "Map: ..." << endl;
        break;
    case Parser::number: {
        double val = parser.readNumber<double>();
        cout << "number: " << val << endl;
        break;
    }
    case Parser::string: {
        std::string val = parser.readString();
        cout << "string: " << val << endl;
        break;
    }
    case Parser::HIT_END:
    case Parser::ERROR:
    default:
        // Should never get here
        throw std::logic_error("Unexpected error while parsing JSON");
    }
    return 0;
}
Addendum
Originally I planned for this library never to copy any data, e.g. reading a string would just give you a start and end iterator into the string data in the input. But because the strings actually need to be decoded, I found that approach impractical.
This library automatically converts \uXXXX escape codes in JSON to UTF-8 in standard strings.
When dealing with records, you can for example format your JSON with one object per line, use the newline as a separator between objects, and then parse each line separately, e.g.:
"records": [
{ "someprop": "value", "someobj": { ..... } ... },
...
or:
"myobj": {
"someprop": { "someobj": {}, ... },
...
I just faced the same problem with Qt 5.12's JSON support. Fortunately, starting with Qt 5.15 (64-bit), reading large JSON files (I tested 1 GB files) works flawlessly.

QRegEx help, RegEx in general

I'm in the process of attempting to learn RegEx. I've been tasked with generating a QPixmap out of several hundred *.png files. Ideally, it would be a PixMap matrix.
I think that QRegExp is the best way to perform this action, so I can insert the pixmaps into a matrix without having to sort.
My pattern I'm trying to match:
runner_(int)_(int).png
Where the first integer has bounds [-1, 13] and the second [00, 20]. There is a leading zero on the second integer.
This is my code attempt:
// find the png files in the thing
QDir fileDir(iconPath);
QFileInfoList fileList = fileDir.entryInfoList();
QRegExp rxlen("runner_([^\\_]{1,1}])_([^\\_]{1,1}]).png");
foreach (const QFileInfo &info, fileList) {
    qDebug() << info.fileName();
    int pos = rxlen.indexIn(info.fileName());
    if (pos > 1) {
        qDebug() << rxlen.cap(1);
        qDebug() << rxlen.cap(2);
    } else {
        qDebug() << "Didn't find any";
    }
}
My question: Please help with the RegEx expression.
Please be gentle, I'm new to RegEx (started learning it about an hour ago!)
Thanks :)
{1,1} is completely redundant: it means "between 1 and 1 times", i.e. exactly once, so you can just write the element on its own.
Since you already have your pattern down all nice and proper, you can just build the regex straight from it:
runner_(-1|[0-9]|0+[0-9]|0*1[0123])_([0-9]|0+[0-9]|0*1[0-9]|20)\.png
This basically just writes out patterns for all the numbers in your ranges.
Edited to escape the dot.
Edited again to allow leading zeroes.
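Here is the answer's pattern in use, as a sketch with std::regex (ECMAScript) rather than QRegExp; the function name parseRunnerName is hypothetical. std::regex_match only succeeds if the whole file name matches, which is what you want for file names (with QRegExp, exactMatch() plays that role, while indexIn() also accepts a match anywhere inside the string). The raw string literal avoids having to write the escaped dot as "\\." in C++.

```cpp
#include <optional>
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper: parses "runner_<a>_<b>.png" with a in [-1, 13] and
// b in [0, 20] (leading zeros allowed), using the pattern from the answer.
// Returns std::nullopt for any other file name.
std::optional<std::pair<int, int>> parseRunnerName(const std::string &fileName)
{
    // Raw literal: the regex \. (a literal dot) needs no extra C++ escaping.
    static const std::regex re(
        R"(runner_(-1|[0-9]|0+[0-9]|0*1[0123])_([0-9]|0+[0-9]|0*1[0-9]|20)\.png)");
    std::smatch m;
    // regex_match requires the whole name to match, unlike a search.
    if (!std::regex_match(fileName, m, re))
        return std::nullopt;
    return std::make_pair(std::stoi(m[1].str()), std::stoi(m[2].str()));
}
```

Since the first number starts at -1, adding 1 to it gives a zero-based row index for the pixmap matrix.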

QString splitting multiple delimiters

I'm having trouble splitting a QString properly. Unless I'm mistaken, for multiple delimiters I need a regex, and I can't seem to figure out an expression as I'm quite new to them.
The string is text input from a file:
f 523/845/1 524/846/2 562/847/3 564/848/4
I need each number separately to put into an array.
Some code:
QStringList x;
QString line = in.readLine();
while (!line.isNull()) {
    QRegExp sep("\\s*/*");
    x = line.split(sep);
Any pointers?
Cheers
Change your regular expression like this:
QRegExp sep("(\\s+|/)");
Then x will hold the leading "f" followed by every number as a separate string.
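A sketch of the same split with the standard library, assuming every field other than the leading "f" is an integer (std::stoi throws otherwise). It uses the character class [\s/]+, so runs of several separators do not produce empty fields; with Qt 5.14 or later, line.split(QRegularExpression("[\\s/]+"), Qt::SkipEmptyParts) gives the same pieces as a QStringList. The function name parseFaceIndices is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: splits a face line such as "f 523/845/1 524/846/2" on
// runs of whitespace and '/', skips the "f" tag, and converts the rest to int.
std::vector<int> parseFaceIndices(const std::string &line)
{
    static const std::regex separators(R"([\s/]+)");
    std::vector<int> numbers;
    // Submatch index -1 yields the text *between* separators, i.e. the fields.
    for (std::sregex_token_iterator it(line.begin(), line.end(), separators, -1), end;
         it != end; ++it) {
        const std::string field = it->str();
        if (!field.empty() && field != "f")
            numbers.push_back(std::stoi(field));
    }
    return numbers;
}
```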
I find it quite useful to try out regexes interactively. Nowadays there are even plenty of online tools, for example: http://gskinner.com/RegExr/
You can put your search text there and play with the RegEx to see what is matched when.
You could use the strtok function, which splits a C string (not a QString directly) on one or more delimiter characters.
It would be like this:
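#include <cstring>

QString a = "f 523/845/1 524/846/2 562/847/3 564/848/4";
// strtok() writes '\0' into the buffer it scans, so work on a byte copy
QByteArray ba = a.toLocal8Bit();
char *myString = ba.data();
char *p = strtok(myString, " /");   // every character in " /" is a delimiter
while (p) {
    qDebug() << "p : " << p;
    p = strtok(nullptr, " /");      // continue scanning the same buffer
}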
You can specify as many delimiter characters as you need. For further info, see the cplusplus.com page for this function: http://www.cplusplus.com/reference/cstring/strtok/
Regards!