How to QRegExp "[propertyID="anything"] "? - regex

I am parsing a file which contains following packets:
[propertyID="123000"] {
fillColor : #f3f1ed;
minSize : 5;
lineWidth : 3;
}
To scan just this [propertyID="123000"] fragment I have this QRegExp:
QRegExp("^\b\[propertyID=\"c+\"\]\b");
but that does not work. Here is example code that parses the file above:
QRegExp propertyIDExp= QRegExp("\\[propertyID=\".*\"]");
propertyIDExp.setMinimal(true);
QFile inputFile(fileName);
if (inputFile.open(QIODevice::ReadOnly))
{
QTextStream in(&inputFile);
while (!in.atEnd())
{
QString line = in.readLine();
// the condition below does not match a line such as
// [propertyID="123000"] {
if( line.contains(propertyIDExp) )
{
//.. further processing
}
}
inputFile.close();
}

QRegExp("\\[propertyID=\".+?\"\\]")
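You can use . to match any character (in QRegExp this includes newlines). The +? is meant to make the match non-greedy, so that it does not run on to the last " on the same line. Note, however, that QRegExp does not support lazy matching on individual quantifiers such as +?; with QRegExp, write .+ and call setMinimal(true) on the expression instead.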

Use the following expression:
QRegExp("\\[propertyID=\"\\d+\"]");
In Qt regex, you need to escape regex special characters with double backslashes, and to match digits, you can use the shorthand class \d. Also, \b word boundary prevented your regex from matching since it cannot match between the string start and [ and between ] and a space (or use \B instead).
To match anything in between quotes, use a negated character class:
QRegExp("\\[propertyID=\"[^\"]*\"]");
As an alternative, you can use lazy dot matching with the help of .* and QRegExp::setMinimal():
QRegExp rx("\\[propertyID=\".*\"]");
rx.setMinimal(true);
In Qt, . matches any character including a newline, so please be careful with this option.
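The negated character class approach can be tried outside Qt as well. The following is a minimal sketch that uses std::regex instead of QRegExp (so it builds without Qt); the function name extractPropertyId is made up for illustration, and the pattern is the same one as above with a capture group added around the quoted value.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: looks for a [propertyID="..."] header in one line.
// On success, stores the text between the quotes in `id` and returns true.
bool extractPropertyId(const std::string& line, std::string& id) {
    // [^"]* stops at the first closing quote, so no lazy matching is needed.
    static const std::regex headerRx(R"re(\[propertyID="([^"]*)"\])re");
    std::smatch match;
    if (!std::regex_search(line, match, headerRx)) {
        return false;
    }
    id = match[1].str();
    return true;
}
```

Calling it once per line read from the file mirrors the line.contains(propertyIDExp) check in the question, with the added benefit that the ID itself is returned.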

Related

how can extract the name from a line

Assume that I have a line from a file that I want to read:
>NZ_FNBK01000055.1 Halorientalis regularis
How can I extract the name from a line that begins with a greater-than sign? Everything following the greater-than sign (excluding the newline at the end of the line) is the name.
The name should be:
NZ_FNBK01000055.1 Halorientalis regularis
Here is my code so far:
bool file::load(istream& file)
{
string line;
while(getline(genomeSource, line)){
if(line.find(">") != string::npos)
{
m_name =
}
}
return true;
}
You could easily handle both conditions using regular expressions. C++ introduced <regex> in C++11. Using this and a regex like:
>.*? (.*?) .*$
> Get the literal character
.*? Non-greedy search for anything, stopping at a space
(.*?) Non-greedy search for anything, stopping at a space, but capturing the matched characters in a group
.*$ Greedy search until the end of the string
With this you can easily check if this line meets your criteria and get the name at the same time. Here is a test showing it working. For the code, the c++11 regex lib is very simple:
std::string s = ">NZ_FNBK01000055.1 Halorientalis regularis ";
std::regex rgx(">.*? (.*?) .*$"); // Make the regex
std::smatch matches;
if(std::regex_search(s, matches, rgx)) { // Do a search
if (matches.size() > 1) { // If there are matches, print them.
std::cout << "The name is " << matches[1].str() << "\n";
}
}
Here is a live example.
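Note that with the sample line, the regex above captures only the second space-separated field (Halorientalis) in group 1. Since the question asks for everything after the greater-than sign, a plain substr may be enough. The sketch below assumes the '>' is the first character of a header line; the function name headerName is hypothetical.

```cpp
#include <string>

// Hypothetical helper: returns everything after the leading '>' of a header
// line, or an empty string when the line is not a header line.
std::string headerName(const std::string& line) {
    if (line.empty() || line.front() != '>') {
        return std::string();
    }
    std::string name = line.substr(1);
    // getline() removes '\n', but a '\r' is left behind for CRLF files.
    if (!name.empty() && name.back() == '\r') {
        name.pop_back();
    }
    return name;
}
```

Inside the getline loop from the question, the result could be assigned to m_name whenever it is non-empty.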

Regex, ignore matches that might occur inside a string

Say for example I have the test string:
this is text "This is a quote { containing } some characters" blah blah { inside }
I would like to match every pair of curly brackets and the text in between using the expression
\{[^{]*?\}
but ignore any matches that might occur inside of a string, namely the { containing } portion of the string, or even be able to match only { text } of the following test string
more text "text text { { { } " { text } words
Well this works:
{[^}]*}(?=(?:[^"]*"[^"]*"[^"]*)*$)
But I'm not sure that it's bulletproof. You can view it online:
{[^}]*} get the curly content
(?=(?:[^"]*"[^"]*"[^"]*)*$) ensure that it's followed by an even number of ".
Note: This regex doesn't take account of escaped double quotes.

Qt5 Qregexp : why my pattern can't work?

I have this problem: when I open a text file, I can't get any matched string. I then tested the pattern .* but still got nothing. I'm sure the text file can be read, and grep accepts the pattern. Thank you.
QList<Nmap_result> ans;
QFile file(path);
if(!file.open(QFile::ReadOnly|QFile::Text))
{
exit(1);
}
QString text = file.readAll();
QRegExp reg(QRegExp::escape(".*"));
reg.indexIn(text);
qDebug()<<reg.capturedTexts().join("|")<<endl<<reg.captureCount()<<endl;
Sorry, I should not use escape. But when I change it like this:
QString text = file.readAll();
qDebug()<<text<<endl;
QRegExp reg("[0-9]");
//reg.indexIn(text); //first bind expr test
reg.exactMatch(text); //second bind expr test
qDebug()<<reg.capturedTexts().join("|!!!!!|")<<endl<<reg.captureCount()<<endl;
When I use
reg.indexIn(text);
to apply the regexp to the string, it returns a number, but when I use the next expression
reg.exactMatch(text);
I get nothing.
Why do you call the QRegExp::escape method?
Try this instead:
QRegExp reg(".*");
After calling QRegExp::escape, your regular expression becomes similar to this string: "\\.\\*". This string indicates that you want to match a dot immediately followed by a star. This is not the intended use here, which is to match zero or more characters (.*).
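As for the follow-up in the question (indexIn returns a position while exactMatch gives nothing): QRegExp::exactMatch succeeds only when the pattern matches the entire string, whereas indexIn looks for a match anywhere in it, so [0-9] cannot exactly match a whole file. The sketch below illustrates the same distinction with std::regex_search and std::regex_match rather than QRegExp; the function names and the sample text are made up.

```cpp
#include <regex>
#include <string>

// True when the pattern matches somewhere inside text
// (comparable to QRegExp::indexIn() returning something other than -1).
bool matchesAnywhere(const std::string& text, const std::string& pattern) {
    return std::regex_search(text, std::regex(pattern));
}

// True only when the pattern matches the whole of text
// (comparable to QRegExp::exactMatch()).
bool matchesWhole(const std::string& text, const std::string& pattern) {
    return std::regex_match(text, std::regex(pattern));
}
```

With a hypothetical input such as "port 80 open", matchesAnywhere finds a digit, while matchesWhole fails because the text is more than a single digit.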

Regular expression for highlighting words in quotes in Qt5

I use the QSyntaxHighlighter class and a QRegExp to highlight words in quotes:
void Highlighter::highlightBlock(const QString &text)
{
QRegExp expr("\"(.*?)\"");
int index = expr.indexIn(text);
while(index >=0)
{
int length = expr.matchedLength();
setFormat(index, length, Qt::red);
index = expr.indexIn(text, index+length);
}
}
It doesn't work. This pattern does work:
"\".*\""
But it highlights more than it should. Which regular expression is correct?
To highlight everything between quotes:
QRegExp("\"([^\"]*)\"");
To highlight single words (run it in a loop with an offset to match each word):
QRegExp("\"(\\w)*\"");
How to match words in quotes:
('|")[^\1]*?\1
Example:
http://regex101.com/r/iF5aA1
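The loop in highlightBlock needs a start index and a length for each quoted span. As a rough illustration of the negated character class version, the sketch below computes those pairs with std::regex instead of QRegExp (so it builds without Qt); the function name quotedSpans is hypothetical.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns (start index, length) for every "..." span in
// text, with the surrounding quotes included in the span.
std::vector<std::pair<std::size_t, std::size_t>> quotedSpans(
        const std::string& text) {
    // [^"]* cannot cross a quote, so each match ends at the nearest closing quote.
    static const std::regex rx(R"re("[^"]*")re");
    std::vector<std::pair<std::size_t, std::size_t>> spans;
    for (std::sregex_iterator it(text.begin(), text.end(), rx), end;
         it != end; ++it) {
        spans.emplace_back(static_cast<std::size_t>(it->position()),
                           static_cast<std::size_t>(it->length()));
    }
    return spans;
}
```

Each pair corresponds to the index and length arguments that the question passes to setFormat.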

Forking out matches from a string

How to get all matches from a string using regex?
I have a string:
".+(.cpp$|.cxx$|.d$|.h$|.hpp$)"
and I would like to get only the cpp cxx d h and hpp parts.
EDIT:
So basically I would like to construct regex which would match any string of characters starting with dot and ending with $.
I've tried the pattern: "\\.[^$+]+" which is supposed to match dot and everything else except $ and plus one or more times but this gets just the first .cpp part and I need all of them
Since you mention Qt in your question, here is how you would do it using QRegExp:
#include <QtCore>
#include <QtDebug>
int main(int argc, char **argv) {
QCoreApplication app(argc, argv);
QString target(".+(.cpp$|.cxx$|.d$|.h$|.hpp$)");
QRegExp pattern("\\.(\\w+)\\$");
QStringList matches;
int pos = 0;
while ((pos = pattern.indexIn(target, pos)) != -1) {
matches << pattern.cap(1);
pos += pattern.matchedLength();
}
qDebug() << matches; // "cpp", "cxx", "d", "h", "hpp"
return app.exec();
}
There's no generic solution as it really depends on how your regex implementation works and how it can be called - and considering there's no standard one for C++ (yet), you should mention which one you're using.
First of all you have to escape ., if it's meant to match a . and not just "any character". Also, I'd change the regex: "\.(d|[ch](?:pp|xx)?)$". This way you keep the dot as well as the line ending outside your match.
For the actual call (which will depend on your implementation) you'll have to use some kind of MATCH_ALL or GLOBAL_MATCH flag or simply loop over your input string, always starting after the previous match. Considering the line ending, you might simply use it once per input line (as I don't know your input data).
Find the location of the last "." and test the remaining string against all the suffixes you're interested in.
Since you are only interested in the elements between the punctuation marks, you can use them as separators to split the string with QString::split:
QString target = ".+(.cpp$|.cxx$|.d$|.h$|.hpp$)";
QStringList extensions = target.split(QRegExp("\\W+"), QString::SkipEmptyParts);
qDebug() << extensions; // ("cpp", "cxx", "d", "h", "hpp")
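For a version without Qt, the loop-over-matches idea from the answers above can be sketched with std::sregex_iterator. This assumes the same pattern as the QRegExp answer (a literal dot, one or more word characters, a literal $); the function name extractExtensions is made up.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the word between each literal '.' and '$'.
std::vector<std::string> extractExtensions(const std::string& target) {
    static const std::regex rx(R"re(\.(\w+)\$)re");
    std::vector<std::string> result;
    // The iterator restarts the search after each match, like the
    // indexIn/matchedLength loop in the QRegExp answer.
    for (std::sregex_iterator it(target.begin(), target.end(), rx), end;
         it != end; ++it) {
        result.push_back((*it)[1].str());
    }
    return result;
}
```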