How to Get substring from given QString in Qt - c++

I have a QString like this:
QString fileData = "SOFT_PACKAGES.ABC=MY_DISPLAY_OS:MY-Display-OS.2022-3.10.25.10086-1.myApplication"
What I need to do is to create substrings as follows:
SoftwareName = MY_DISPLAY_OS //text after ':'
Version = 10.25.10086-1
Release = 2022-3
I tried using QString QString::sliced(qsizetype pos, qsizetype n) const, but it didn't work: I'm using Qt 5.9 and sliced() is only available from Qt 6.0.
QString fileData = "SOFT_PACKAGES.ABC=MY_DISPLAY_OS:MY-Display-OS.2022-3.10.25.10086-1.myApplication";
QString SoftwareName = fileData.sliced(fileData.lastIndexOf(':'), fileData.indexOf('.'));
Please help me to code this in Qt.

Use QString::split 3 times:
Split by QLatin1Char('=') to two parts:
SOFT_PACKAGES.ABC
MY_DISPLAY_OS:MY-Display-OS.2022-3.10.25.10086-1.myApplication
Next, split the 2nd part by QLatin1Char(':'), probably again into just 2 parts if there can never be more than 2, so that the 2nd part can contain colons:
MY_DISPLAY_OS
MY-Display-OS.2022-3.10.25.10086-1.myApplication
Finally, split 2nd part of previous step by QLatin1Char('.'):
MY-Display-OS
2022-3
10
25
10086-1
myApplication
Now just assemble your required output strings from these parts. If the exact number of parts is unknown, you can get Version = 10.25.10086-1 by removing the first two elements and the last element from the final list above, and then joining the rest with QLatin1Char('.'). If the indexes are known and fixed, you can just use QStringLiteral("%1.%2.%3").arg(....
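A minimal sketch of the three splits above, written with std::string so that it runs without Qt. The splitOn helper, the PackageInfo struct and parsePackage are hypothetical names standing in for QString::split and the variables in the question. It assumes the input always has one '=', one ':' and at least four '.'-separated fields after the colon.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper standing in for QString::split: cuts `text` at every `sep`.
std::vector<std::string> splitOn(const std::string& text, char sep) {
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type pos = text.find(sep, start);
        if (pos == std::string::npos) {
            parts.push_back(text.substr(start));  // last (or only) part
            return parts;
        }
        parts.push_back(text.substr(start, pos - start));
        start = pos + 1;
    }
}

// Hypothetical result type for the three values asked for in the question.
struct PackageInfo {
    std::string softwareName;
    std::string release;
    std::string version;
};

// Assumption: the input has the shape shown in the question.
// .at() throws std::out_of_range if an expected part is missing.
PackageInfo parsePackage(const std::string& fileData) {
    const std::vector<std::string> byEquals = splitOn(fileData, '=');
    const std::vector<std::string> byColon = splitOn(byEquals.at(1), ':');
    const std::vector<std::string> fields = splitOn(byColon.at(1), '.');

    PackageInfo info;
    info.softwareName = byColon.at(0);
    info.release = fields.at(1);
    // Version: drop the first two fields and the last one, rejoin the rest with '.'.
    for (std::size_t i = 2; i + 1 < fields.size(); ++i) {
        if (!info.version.empty()) info.version += '.';
        info.version += fields[i];
    }
    return info;
}
```

With QString the same steps would presumably map onto split and join calls. The sketch only illustrates the order of the splits.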

One way is using
QString::mid(int startIndex, int howManyChar);
so you probably want something like this:
QString fileData = "SOFT_PACKAGES.ABC=MY_DISPLAY_OS:MY-Display-OS.2022-3.10.25.10086-1.myApplication";
QString SoftwareName = fileData.mid(fileData.indexOf('=')+1, (fileData.lastIndexOf(':') - fileData.indexOf('=')-1));
To extract the other parts you asked for, and provided the number of '.' characters stays constant across all the strings you want to check, you can use the second argument of indexOf to shift the starting location and skip a known number of occurrences of '.'. For example:
int StartIndex = -1;
for (int i = 0; i < 3; i++) {
    StartIndex = fileData.indexOf('.', StartIndex + 1); // after the loop: position of the 3rd '.'
}
StartIndex += 1; // the version starts right after the 3rd '.'
int EndIndex = fileData.lastIndexOf('.'); // the version ends just before the last '.'
should give the right indices to be cut out with
QString SoftwareVersion = fileData.mid(StartIndex, EndIndex - StartIndex);
If the strings to be parsed are less consistent than this, try switching to regular expressions; they are the more flexible approach.

In my experience, using regular expressions for these types of tasks is generally simpler and more robust. You can do this with a regular expression as follows:
// Create the regular expression.
// Using C++ raw string literal to reduce use of escape characters.
QRegularExpression re(R"(.+=([\w_]+):[\w-]+\.(\d+-\d+)\.(\d+\.\d+\.\d+-?\d+))");
// Match against your string
auto match = re.match("SOFT_PACKAGES.ABC=MY_DISPLAY_OS:MY-Display-OS.2022-3.10.25.10086-1.myApplication");
// Now extract the portions you are interested in
// match.captured(0) is always the full string that matched the entire expression
const auto softwareName = match.captured(1);
const auto version = match.captured(3);
const auto release = match.captured(2);
Of course for this to make sense, you have to understand regex, so here is my explanation of the regex used here:
.+=([\w_]+):[\w-]+\.(\d+-\d+)\.(\d+\.\d+\.\d+-?\d+)
.+=
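match one or more characters up to and including an equals sign (the match is greedy, so with several equals signs it runs to the last one that still lets the rest match)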
([\w_]+)
capture one or more word characters (alphanumeric characters) or underscores
:
a colon
[\w-]+\.
one or more alphanumeric or dash characters followed by a single period
(\d+-\d+)
capture one or more of digits followed by a dash followed by one or more digits
\.
a single period
(\d+\.\d+\.\d+-?\d+)
capture three sets of digits with periods in between, then an optional dash, and one or more digits
I think it is generally easier to make a regex that handles changes to the input (let's say the version becomes 10.25.10087) than to manually parse things by index.
Regex is a powerful tool once you get used to it, but it can certainly seem daunting at first.
Example of this regex on regex101.com: https://regex101.com/r/dj3Z4U/1
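As a rough check of the claim that the pattern survives a version without the "-1" suffix, here is a sketch that runs the same pattern through std::regex instead of QRegularExpression. That swap is made only so the snippet runs without Qt, and extractVersion is a hypothetical name.

```cpp
#include <regex>
#include <string>

// Returns capture group 3 (the version), or an empty string when nothing matches.
// Same pattern as in the answer, compiled with std::regex (ECMAScript grammar).
std::string extractVersion(const std::string& line) {
    static const std::regex re(
        R"(.+=([\w_]+):[\w-]+\.(\d+-\d+)\.(\d+\.\d+\.\d+-?\d+))");
    std::smatch match;
    if (!std::regex_search(line, match, re)) {
        return std::string();
    }
    return match[3].str();
}
```

Both the original input and one whose version is 10.25.10087 should yield a non-empty version, because the optional dash lets the last group match with or without the suffix.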

Related

Capture number inside tag in Qt

My tag struct looks like this:
<sml8/>
It is a combination of <, sml, one or two digits, and />.
Is there any way to capture the number inside the tag?
For example, in the tag above I want to capture the 8.
I've defined a regular expression and tried to capture the number by digit position, but it's not working for me.
QRegExp rxlen("<sml(.*)/>");
int index = rxlen.pos(3);
I guess this is not the correct way: it gives me the position of the digit, although I want the value of the digit (or digits).
You need to use capturedTexts() together with the <sml(\\d{1,2})/> regex (it matches <sml literally, then 1 or 2 digits, capturing them into group 1, then />):
QString str = "<sml8/>";
QRegExp rxlen("<sml(\\d{1,2})/>");
int pos = rxlen.indexIn(str);
QStringList list = rxlen.capturedTexts();
QString my_number = list[1];
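The same capture-group idea can be sketched with std::regex, used here only as a stand-in for QRegExp so that the snippet runs without Qt. The function name tagNumber is hypothetical.

```cpp
#include <regex>
#include <string>

// Returns the one or two digits inside a tag such as <sml8/>,
// or an empty string when no such tag is found.
std::string tagNumber(const std::string& text) {
    static const std::regex tag(R"(<sml(\d{1,2})/>)");
    std::smatch match;
    // match[0] is the whole tag; match[1] is the first capture group (the digits).
    return std::regex_search(text, match, tag) ? match[1].str() : std::string();
}
```

The point is the same as with capturedTexts(): index 0 holds the whole match and index 1 holds the text captured by the first pair of parentheses.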

How to save " in a string in C++?

So I have the following code, which doesn't work. I couldn't figure out how to do it.
std::string str("Q850?51'18.23"");
First problem I face is " (quotation mark). I cannot save it as a string because at the end of the string I have two " characters and C++ doesn't let me save the whole string.
Second I want to split the string and save it in different variables.
E.g.;
double i = 850;
double j = 51;
double k = 18.23;
You will need to escape the quotation mark you require in the string;
std::string str("Q850?51'18.23\"");
//                            ^ escape the quote here
The cppreference site has a list of these escape sequences.
Alternatively, you can use a raw string literal;
std::string str = R"(Q850?51'18.23")";
The second part of the problem is dependent on the format and predictability of the data;
If it is fixed width, a simple index can be used to extract the numbers and convert them to the double you require.
If it is delimited with the characters above, you can consume the string to each of the delimiters extracting the numbers in-between them (you should be able to find suitable libraries to assist with this).
If it is some further unknown composition, you may be limited to consuming the string one character at a time and extracting the numerical values between the non-numerical values.
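For the delimited case, one possible sketch lets a string stream consume the delimiters one character at a time. It assumes the text always has the shape Q, number, ?, number, ', number, ". The Reading struct and parseReading are hypothetical names.

```cpp
#include <sstream>
#include <string>

// Hypothetical holder for the three numbers from the question.
struct Reading {
    double i;
    double j;
    double k;
};

// Returns true and fills `out` when `text` looks like Q<number>?<number>'<number>".
bool parseReading(const std::string& text, Reading& out) {
    std::istringstream in(text);
    char q = 0, sep1 = 0, sep2 = 0, sep3 = 0;
    // Each double extraction stops at the first character that cannot
    // belong to a number, which is then read as the delimiter.
    in >> q >> out.i >> sep1 >> out.j >> sep2 >> out.k >> sep3;
    return !in.fail() && q == 'Q' && sep1 == '?' && sep2 == '\'' && sep3 == '"';
}
```

Called with the escaped literal "Q850?51'18.23\"", this should fill the struct with 850, 51 and 18.23.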
You need to escape your quote mark:
std::string str("Q850?51'18.23\"");
//                            ^
You need to escape your quote mark
Add a backslash before "
std::string str("Q850?51'18.23\"");

C++ boost::regex multiples captures

I'm trying to extract multiple substrings with boost::regex and put each one in a variable. Here is my code:
unsigned int i = 0;
std::string string = "--perspective=45.0,1.33,0.1,1000";
std::string::const_iterator start = string.begin();
std::string::const_iterator end = string.end();
std::vector<std::string> matches;
boost::smatch what;
boost::regex const ex(R"(^-?\d*\.?\d+),(^-?\d*\.?\d+),(^-?\d*\.?\d+),(^-?\d*\.?\d+))");
string.resize(4);
while (boost::regex_search(start, end, what, ex)
{
std::string stest(what[1].first, what[1].second);
matches[i] = stest;
start = what[0].second;
++i;
}
I'm trying to extract each float from my string and put it in my vector variable matches. My result, at the moment, is that I can extract the first one (in my vector variable, I can see "45" without double quotes), but the second one is empty (matches[1] is "").
I can't figure out why, or how to correct this. So my question is: how do I correct this? Is my regex incorrect? Is my smatch incorrect?
Firstly, ^ is the symbol for the beginning of a line, so it cannot match in the middle of the string. Secondly, \ must be escaped in an ordinary string literal (in a raw string literal such as R"(...)" it must not be). So you should change each (^-?\d*\.?\d+) group to (-?\\d*\\.?\\d+). (Probably, (-?\\d+(?:\\.\\d+)?) is better.)
Your regular expression searches for the number,number,number,number pattern, not for each number. You add only the first substring to matches and ignore the others. To fix this, you can replace your expression with (-?\\d*\\.?\\d+) or just add all the matches stored in what to your matches vector:
while (boost::regex_search(start, end, what, ex))
{
for(int j = 1; j < what.size(); ++j)
{
std::string stest(what[j].first, what[j].second);
matches.push_back(stest);
}
start = what[0].second;
}
You are using ^ several times in your regex. That's why it didn't match: ^ means the beginning of the string. Also, your parentheses are unbalanced: the outer ( and ) of R"(...)" are delimiters of the raw string literal, not part of the pattern, so the first group has no opening bracket.
Here is your regex after correction:
(-?\d*\.?\d+),(-?\d*\.?\d+),(-?\d*\.?\d+),(-?\d*\.?\d+)
A better version of your regex could be (only if you want to avoid matching numbers like .01, .1):
(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?)
A repeated search in combination with a regular expression that apparently is built to match all of the target string is pointless.
If you are searching repeatedly in a moving window delimited by a moving iterator and string.end() then you should reduce the pattern to something that matches a single fraction.
If you know that the number of fractions in your string is/must be constant, match once, not in a loop and extract the matched substrings from what.
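The first option, a pattern that matches a single number and is applied repeatedly, might look like the sketch below. It uses std::regex and std::sregex_iterator as a stand-in for boost::regex so that it compiles without Boost. The name extractNumbers is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects every number in `text`, one regex match per number.
std::vector<std::string> extractNumbers(const std::string& text) {
    // One optional minus sign, digits, and an optional fractional part.
    static const std::regex number(R"(-?\d+(?:\.\d+)?)");
    std::vector<std::string> matches;
    for (std::sregex_iterator it(text.begin(), text.end(), number), end;
         it != end; ++it) {
        matches.push_back(it->str());  // push_back grows the vector as needed
    }
    return matches;
}
```

For the string from the question this should collect the four numbers in order, without having to size the vector in advance.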

How to separate a line of input into multiple variables?

I have a file that contains rows and columns of information like:
104857 Big Screen TV 567.95
573823 Blender 45.25
I need to parse this information into three separate items, a string containing the identification number on the left, a string containing the item name, and a double variable containing the price. The information is always found in the same columns, i.e. in the same order.
I am having trouble accomplishing this. Even when not reading from the file and just using a sample string, my attempt just outputs a jumbled mess:
string input = "104857 Big Screen TV 567.95";
string tempone = "";
string temptwo = input.substr(0,1);
tempone += temptwo;
for(int i=1 ; temptwo != " " && i < input.length() ; i++)
{
temptwo = input.substr(j,j);
tempone += temp2;
}
cout << tempone;
I've tried tweaking the above code for quite some time, but no luck, and I can't think of any other way to do it at the moment.
You can find the first space and the last space using std::string::find_first_of and std::string::find_last_of. You can use these to split the string into 3 parts: the first space comes after the first variable, the last space comes before the third variable, and everything in between is the second variable.
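A minimal sketch of that first-space / last-space split. The Item struct and parseItem are hypothetical names. It assumes fields are separated by single spaces and that the last field is a valid number; std::stod throws otherwise.

```cpp
#include <string>

// Hypothetical holder for one row of the file.
struct Item {
    std::string id;
    std::string name;
    double price;
};

// Splits `line` at its first and last space; returns false when
// the line does not contain at least two spaces.
bool parseItem(const std::string& line, Item& out) {
    const std::string::size_type first = line.find_first_of(' ');
    const std::string::size_type last = line.find_last_of(' ');
    if (first == std::string::npos || first == last) {
        return false;
    }
    out.id = line.substr(0, first);                       // before the first space
    out.name = line.substr(first + 1, last - first - 1);  // everything in between
    out.price = std::stod(line.substr(last + 1));         // after the last space
    return true;
}
```

Because only the outermost spaces are used, an item name with spaces in it (such as Big Screen TV) stays in one piece.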
How about the following pseudocode:
string input = "104857 Big Screen TV 567.95";
string[] parsed_output = input.split(" "); // split input string with 'space' as delimiter
// parsed_output[0] = 104857
// parsed_output[1] = Big
// parsed_output[2] = Screen
// parsed_output[3] = TV
// parsed_output[4] = 567.95
int id = stringToInt(parsed_output[0]);
string product = concat(parsed_output[1], parsed_output[2], ... ,parsed_output[length-2]);
double price = stringToDouble(parsed_output[length-1]);
I hope, that's clear.
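The pseudocode above could be turned into C++ roughly as follows. This is only a sketch: Product and parseProduct are hypothetical names, tokens are split on whitespace with a string stream, and std::stoi / std::stod throw if the first or last token is not numeric.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical holder matching the variables in the pseudocode.
struct Product {
    int id;
    std::string name;
    double price;
};

bool parseProduct(const std::string& input, Product& out) {
    // Split the input on whitespace.
    std::istringstream in(input);
    std::vector<std::string> tokens;
    for (std::string token; in >> token;) {
        tokens.push_back(token);
    }
    if (tokens.size() < 3) {
        return false;  // need at least id, one name word, and price
    }
    out.id = std::stoi(tokens.front());
    out.price = std::stod(tokens.back());
    // Re-join the middle tokens with single spaces to rebuild the name.
    out.name.clear();
    for (std::size_t i = 1; i + 1 < tokens.size(); ++i) {
        if (!out.name.empty()) out.name += ' ';
        out.name += tokens[i];
    }
    return true;
}
```

One consequence of splitting on whitespace is that runs of several spaces inside the name collapse into one.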
Well, try breaking down the file's components:
You know a number always comes first, and we also know a number has no whitespace.
The string following the number CAN have whitespace, but won't contain any digits (I would assume).
After this title, you're going to have another number (with no whitespace).
From these components, you can deduce:
Grabbing the first number is as simple as reading it in using the file stream's >> operator.
Getting the string requires you to check until you reach a digit, grabbing one character at a time and appending it to a string. The last number is just like the first, using the file stream's >> operator.
This seems like homework, so I'll let you put the rest together.
I would try a regular expression, something along these lines:
^([0-9]+)\s+(.+)\s+([0-9]+\.[0-9]+)$
I am not very good at regex syntax, but ([0-9]+) corresponds to a sequence of digits (this is the id), ([0-9]+\.[0-9]+) is the floating point number (price) and (.+) is the string that is separated from the two numbers by sequences of "space" characters: \s+.
The next step would be to check if you need this to work with prices like ".50" or "10".
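A sketch of that expression with std::regex. Row and parseRow are hypothetical names, and the pattern is used exactly as written above, so a price such as "10" without a decimal point is rejected.

```cpp
#include <regex>
#include <string>

// Hypothetical holder for one parsed line.
struct Row {
    std::string id;
    std::string name;
    double price;
};

// Returns false when the whole line does not match the expression.
bool parseRow(const std::string& line, Row& out) {
    static const std::regex row(R"(^([0-9]+)\s+(.+)\s+([0-9]+\.[0-9]+)$)");
    std::smatch m;
    if (!std::regex_match(line, m, row)) {
        return false;
    }
    out.id = m[1].str();
    out.name = m[2].str();              // greedy (.+) backtracks to leave room for the price
    out.price = std::stod(m[3].str());  // group 3 is digits '.' digits, so stod succeeds
    return true;
}
```

Making the fractional part optional would be one way to accept whole-number prices, at the cost of a slightly longer pattern.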

RegEx to find words with characters

I've found answers to many of my questions here but this time I'm stuck. I've looked at 100's of questions but haven't found an answer that solves my problem so I'm hoping for your help :D
Considering the following list of words:
iris
iridium
initialization
How can I use regex to find words in this list when I am looking using exactly the characters u, i, i? I'm expecting the regex to find "iridium" only because it is the only word in the list that has two i's and one u.
What I've tried
I've been searching both here and elsewhere but haven't come across any that helps me.
[i].*[i].*[u]
matches iridium, as expected, and not iris nor initialization. However, the characters i, i, u must be in that sequence in the word, which may or may not be the case. So trying with a different sequence
[u].*[i].*[i]
This does not match iridium (but I want it to, iridium contains u, i, i) and I'm stuck for what to do to make it match. Any ideas?
I know I could try all sequences (in the example above it would be iiu; iui; uii) but that gets messy when I'm looking for more characters (say 6, tnztii which would match initialization).
[t].*[n].*[z].*[t].*[i].*[i]
[t].*[z].*[n].*[t].*[i].*[i]
[t].*[z].*[n].*[i].*[t].*[i]
..... (long list until)
[i].*[n].*[i].*[t].*[z].*[t] (the first matching sequence)
Is there a way to use regex to find the word, irrespective of the sequence of the characters?
I don't think there's a way to solve this with regular expressions that does not end in a horribly convoluted expression. It might be possible with lookahead and lookbehind expressions, but I think it's probably faster and less messy if you simply solve this programmatically.
Chop the string up by its whitespaces and then iterate over all the words and count the instances your characters appear inside this word. To speed things up, discard all words with a length less than your character number requirement.
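The counting approach might be sketched like this in C++. The names containsLetters and findWords are hypothetical, and "match" is taken to mean that a word contains each requested character at least as many times as it was requested.

```cpp
#include <array>
#include <string>
#include <vector>

// True when `word` contains every character of `letters`
// at least as often as it occurs in `letters`.
bool containsLetters(const std::string& word, const std::string& letters) {
    if (word.size() < letters.size()) {
        return false;  // too short to contain all requested characters
    }
    std::array<int, 256> counts{};  // zero-initialised tally per byte value
    for (unsigned char c : word) ++counts[c];
    for (unsigned char c : letters) {
        if (--counts[c] < 0) return false;  // requested more often than present
    }
    return true;
}

// Keeps only the words that contain all requested letters.
std::vector<std::string> findWords(const std::vector<std::string>& words,
                                   const std::string& letters) {
    std::vector<std::string> result;
    for (const std::string& word : words) {
        if (containsLetters(word, letters)) result.push_back(word);
    }
    return result;
}
```

The order of the requested letters does not matter here, which is the property the question was after.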
Is this an academic exercise, or can you use more than a single regular expression? Is there a language wrapped around this? The simplest way to do what you want is to have a regexp that matches just i or u, and examine (count) the matches. Using python, it could be a one-liner. What are you using?
The part you haven't gotten around to yet is that there might be additional i's or u's in the word. So instead of matching on .*, match on [^iu]*.
Here's what I would do:
Array.prototype.findItemsByChars = function(charGroup) {
console.log('charGroup:',charGroup);
charGroup = charGroup.toLowerCase().split('').sort().join('');
charGroup = charGroup.match(/(.)\1*/g);
for (var i = 0; i < charGroup.length; i++) {
charGroup[i] = {char:charGroup[i].substr(0,1),count:charGroup[i].length};
console.log('{char:'+charGroup[i].char+' ,count:'+charGroup[i].count+'}');
}
var matches = [];
for (var i = 0; i < this.length; i++) {
var charMatch = 0;
//console.log('word:',this[i]);
for (var j = 0; j < charGroup.length; j++) {
try {
var count = this[i].match(new RegExp(charGroup[j].char,'g')).length;
//console.log('\tchar:',charGroup[j].char,'count:',count);
if (count >= charGroup[j].count) {
if (++charMatch == charGroup.length) matches.push(this[i]);
}
} catch(e) { break };
}
}
return matches.length ? matches : false;
};
var words = ['iris','iridium','initialization','ulisi'];
var matches = words.findItemsByChars('iui');
console.log('matches:',matches);
EDIT: Let me know if you need any explanation.
I know this is a really old post, but I found this topic really interesting and thought people might look for a similar answer some day.
So the goal is to match all words with a specific set of characters in any order. There is a simple way to do this using lookaheads:
\b(?=(?:[^i\W]*i){2})(?=[^u\W]*u)\w+\b
Here is how it works:
We use one lookahead (?=...) for each letter to be matched
In this, we put [^x\W]*x where x is the letter that must be present.
We then make this pattern occur n times, where n is the number of times that x must appear in the word, using (?:...){n}
The resulting regex for a letter x having to appear n times in the word is then (?=(?:[^x\W]*x){n})
All you have to do then is to add this pattern for each letter and add \w+ at the end to match the word!
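The construction rule can be sketched as a small C++ function that builds the pattern string from the requested letters. buildLookaheadPattern is a hypothetical name. It assumes the letters are plain word characters with no special meaning in a regex, and it drops the {n} wrapper when a letter is needed only once, as in the example above.

```cpp
#include <map>
#include <string>

// Builds \b(?=...)(?=...)\w+\b with one lookahead per distinct letter.
std::string buildLookaheadPattern(const std::string& letters) {
    // Count how often each letter is requested (std::map keeps them sorted).
    std::map<char, int> counts;
    for (char c : letters) ++counts[c];

    std::string pattern = "\\b";
    for (const auto& entry : counts) {
        // [^x\W]*x : skip word characters other than x, then match x.
        const std::string unit =
            std::string("[^") + entry.first + "\\W]*" + entry.first;
        if (entry.second == 1) {
            pattern += "(?=" + unit + ")";
        } else {
            pattern += "(?=(?:" + unit + "){" + std::to_string(entry.second) + "})";
        }
    }
    return pattern + "\\w+\\b";
}
```

For the letters i, u, i this should reproduce the expression shown above, since the two i's collapse into a single lookahead with {2}.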