I have a QString like this: "567\n1.23456 2.34567\n1.23456 2.34"
I want to extract only the "whole" float numbers that sit between the \n characters.
I need a QStringList after split() that contains only these float numbers. QString::split() accepts a regular expression, so maybe I can use a regex here.
I tried QStringList myList = QString("56\n1.12345 2.34567\n1.23456 2.34").split('\n'), which gives me entries such as "1.12345 2.34567", so I would need to split each of those again into "1.12345" and "2.34567".
The Qt documentation for QString::split has your answer
QString str;
QStringList list;
str = "Some text\n\twith strange whitespace.";
list = str.split(QRegularExpression("\\s+"));
// list: [ "Some", "text", "with", "strange", "whitespace." ]
The regex \d+(\.\d+)? will match any float or integer number.
You should split at QRegularExpression("\\s+"). \s means whitespace (which includes both the space character and \n, the newline), + means one or more, and the backslash has to be escaped inside a C++ string literal.
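For comparison, here is a minimal sketch of both suggestions using Python's re module as a stand-in for QRegularExpression (the helper names split_on_whitespace and find_numbers are made up for this illustration). It shows that splitting on \s+ and matching \d+(\.\d+)? give the same tokens for the sample string; the group is written as non-capturing, (?:...), because re.findall would otherwise return only the captured fraction.

```python
import re

def split_on_whitespace(text):
    # Split on runs of whitespace (space, tab, newline) and drop empty pieces.
    return [tok for tok in re.split(r"\s+", text) if tok]

def find_numbers(text):
    # Match integers or decimals; (?:...) keeps findall returning whole matches.
    return re.findall(r"\d+(?:\.\d+)?", text)

line = "567\n1.23456 2.34567\n1.23456 2.34"
print(split_on_whitespace(line))
print(find_numbers(line))
```

Splitting is the simpler choice when every token is wanted; matching is handy when the text may contain non-numeric tokens that should be skipped.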
I want to use Regex to acquire some ID's in a cellstring array, the array looks like this:
myString = '(['US04650Y1001', 'US90274P3029', 'HON WI', 'US41165F1012'])';
My pattern for regex is as follows:
pattern = '[A-Za-z0-9.^_]+';
newArr = regexp(myString, pattern,'match');
I'd like to get the ID called 'HON WI', but my current pattern splits it in two because it can't deal with the whitespace properly. I would like to get the whole "HON WI", as well as my other strings, i.e. everything that's between the single quotes. These might contain special characters like ^, . or _, but I don't know how to add the whitespace.
I already tried stuff like this, without success:
pattern = '[A-Za-z0-9.^_\s]+';
My new array should have, in each cell, the strings/ID's contained in myString (US04650Y1001, US90274P3029, HON WI and US41165F1012) with dimensions 1x4.
Another approach seems to work, but I'm not entirely sure about it:
myString = strrep(myString,'([','');
myString = strrep(myString,'])','');
myString = regexp(myString,',','split');
myString = strrep(myString,'''','');
This seems to get me what I want, but I would like to know how can I alter the regex on my first approach.
Many thanks in advance.
You may use a mere '([^']+)' regex and use 'tokens' to get the captures:
myString = '([''US04650Y1001'', ''US90274P3029'', ''HON WI'', ''US41165F1012''])';
pattern = '''([^'']+)''';
newArr = regexp(myString, pattern,'match', 'tokens');
The newArr will look like
{
[1,1] = 'US04650Y1001'
[1,2] = 'US90274P3029'
[1,3] = 'HON WI'
[1,4] = 'US41165F1012'
}
Another option is to use lookaround assertions. The following will match any string made of alphanumeric characters or underscores (\w), spaces (' '), or the characters . or ^, that is located between quotes. This specifically excludes the blank space next to the comma in the separator between tokens, i.e. ', ' does not give a match.
Note that \s will match any blank space character (including tab, newline), this is why a space is preferred here:
pattern2='(?<='')[\w.^ ]+(?='')';
pattern2 =
(?<=')[\w.^ ]+(?=')
newArr = regexp(myString, pattern2,'match');
newArr'
ans =
'US04650Y1001'
'US90274P3029'
'HON WI'
'US41165F1012'
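As a cross-check of the two patterns above, here is a small sketch in Python's re module rather than MATLAB's regexp (the function names quoted_by_capture and quoted_by_lookaround are hypothetical). The first variant captures what sits between the quotes; the second uses the lookbehind and lookahead so the quotes never become part of the match.

```python
import re

my_string = "(['US04650Y1001', 'US90274P3029', 'HON WI', 'US41165F1012'])"

def quoted_by_capture(s):
    # Match a quote, capture everything up to the next quote, match that quote.
    return re.findall(r"'([^']+)'", s)

def quoted_by_lookaround(s):
    # Require a quote before and after, but keep both out of the match.
    # Inside the class, ^ is literal because it is not the first character.
    return re.findall(r"(?<=')[\w.^ ]+(?=')", s)

print(quoted_by_capture(my_string))
print(quoted_by_lookaround(my_string))
```

The lookaround version skips the ", " separators because a comma is not in the character class, so a match can never start there.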
How can I remove every occurence of special characters ^ and $ in a QString?
I tried:
QString str = "^TEST$^TEST$";
str = str.remove(QRegularExpression("[^$]."));
You forgot to escape the ^. It is escaped with a \, which in turn has to be escaped because this is a C string literal. You also want to match one or more occurrences, hence the +.
This regular expression (written as a C string) should work: [\\^$]+
So it has to be:
QString str = "^TEST$^TEST$";
str = str.remove(QRegularExpression("[\\^$]+"));
Another possibility as said in the comments below by Joe P is:
QString str = "^TEST$^TEST$";
str = str.remove(QRegularExpression("[$^]+"));
because the ^ only has a special meaning at the beginning of a character class, where you would have to escape it to match it literally.
You can also try using a regular expression where you can remove every non-alphanumeric character:
QString str = "$om<Mof*%njas";
str = str.remove(QRegExp("[^a-zA-Z\\d\\s]"));
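To see why the original pattern misbehaves and the corrected ones work, here is a short sketch using Python's re.sub in place of QString::remove (the function names are invented for this example). The pattern [^$]. is a negated class followed by "any character", so it eats the text two characters at a time instead of removing only ^ and $.

```python
import re

def strip_caret_dollar(s):
    # Escaped caret inside the class: removes runs of ^ and $.
    return re.sub(r"[\^$]+", "", s)

def strip_caret_dollar_unescaped(s):
    # Same effect: ^ is literal when it is not the first character of the class.
    return re.sub(r"[$^]+", "", s)

def keep_alnum_and_space(s):
    # Negated class: remove everything except letters, digits and whitespace.
    return re.sub(r"[^a-zA-Z\d\s]", "", s)

print(strip_caret_dollar("^TEST$^TEST$"))
print(keep_alnum_and_space("$om<Mof*%njas"))
# The pattern from the question removes pairs of characters instead:
print(repr(re.sub(r"[^$].", "", "^TEST$^TEST$")))
```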
I need to determine whether a string begins with a number - I've tried the following to no avail:
if (matches("^[0-9].*)", upper(text))) str = "Title"""
I'm new to DXL and Regex - what am I doing wrong?
You need the caret character to indicate a match only at the start of a string. I added the plus character to match all the leading digits, although you might not need it in your situation. If you're only checking for a number at the start and don't care whether anything follows, you don't need anything more.
string str1 = "123abc"
string str2 = "abc123"
string strgx = "^[0-9]+"
Regexp rgx = regexp2(strgx)
if(rgx(str1)) { print str1[match 0] "\n" } else { print "no match\n" }
if(rgx(str2)) { print str2[match 0] "\n" } else { print "no match\n" }
The code block above will print:
123
no match
@mrhobo is correct, you want something like this:
Regexp numReg = regexp "^[0-9]"
if(numReg text) str = "Title"
You don't need upper since you are just looking for numbers. Also matches is more for finding the part of the string that matches the expression. If you just want to check that the string as a whole matches the expression then the code above would be more efficient.
Good luck!
Judging from an example I found, this should work:
Regexp plural = regexp "^([0-9].*)$"
if plural "15systems" then print "yes"
Resource:
http://www.scenarioplus.org.uk/papers/dxl_regexp/dxl_regexp.htm
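The same check can be sketched outside DXL, here in Python's re module purely as an illustration (starts_with_digit and leading_digits are hypothetical helper names). re.match only matches at the start of the string, so it plays the role of the leading ^ in the DXL patterns above.

```python
import re

def starts_with_digit(text):
    # re.match is anchored at the start, equivalent to a leading ^.
    return re.match(r"[0-9]", text) is not None

def leading_digits(text):
    # Return the run of digits at the start, or None when there is none.
    m = re.match(r"[0-9]+", text)
    return m.group(0) if m else None

for sample in ("123abc", "abc123", "15systems"):
    print(sample, starts_with_digit(sample), leading_digits(sample))
```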
I have a string like '[1]-[2]-[3],[4]-[5],[6,7,8],[9]' or '[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]', I'd like the Pattern to get the list result, but don't know how to figure out the pattern. Basically the comma is the split, but [6,7,8] itself contains the comma as well.
the string: [1]-[2]-[3],[4]-[5],[6,7,8],[9]
the result:
[1]-[2]-[3]
[4]-[5]
[6,7,8]
[9]
or
the string: [Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]
the result:
[Computers]-[Apple]-[Laptop]
[Cables]-[Cables,Connectors]
[Adapters]
,(?=\[)
This pattern splits on any comma that is followed by a bracket, but keeps the bracket within the result text.
The (?=...) construct is known as a "lookahead assertion". It acts as a condition for the match but is not itself part of the match.
In C# code:
String inputstring = "[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]";
foreach(String s in Regex.Split(inputstring, @",(?=\[)"))
System.Console.Out.WriteLine(s);
In Java code:
String inputstring = "[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]";
Pattern p = Pattern.compile(",(?=\\[)");
for(String s : p.split(inputstring))
System.out.println(s);
Either produces:
[Computers]-[Apple]-[Laptop]
[Cables]-[Cables,Connectors]
[Adapters]
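The same lookahead split can be tried quickly in Python's re module; this is only a sketch with an illustrative function name (split_groups), run against both sample strings from the question.

```python
import re

def split_groups(s):
    # Split on a comma only when the next character is an opening bracket;
    # the lookahead keeps that bracket in the following piece.
    return re.split(r",(?=\[)", s)

print(split_groups("[1]-[2]-[3],[4]-[5],[6,7,8],[9]"))
print(split_groups("[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]"))
```

Commas inside a bracket pair, as in [6,7,8], are followed by a digit or a letter rather than [, so they are left alone.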
Although I believe the best approach here is to use split (as presented in @j__m's answer), here's an approach that uses matching rather than splitting.
Regex:
(\[.*?\](?!-))
Example usage:
String input = "[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]";
Pattern p = Pattern.compile("(\\[.*?\\](?!-))");
Matcher m = p.matcher(input);
while (m.find()) {
System.out.println(m.group(1));
}
Resulting output:
[Computers]-[Apple]-[Laptop]
[Cables]-[Cables,Connectors]
[Adapters]
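For a quick check of the matching variant on the numeric example as well, here is a sketch in Python's re module (match_groups is a made-up name). The lazy .*? keeps extending until it reaches a closing bracket that is not followed by a hyphen.

```python
import re

def match_groups(s):
    # Lazily match from "[" up to the first "]" that is not followed by "-".
    return re.findall(r"\[.*?\](?!-)", s)

print(match_groups("[1]-[2]-[3],[4]-[5],[6,7,8],[9]"))
print(match_groups("[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]"))
```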
An answer that doesn't use regular expressions (if that's worth something in ease of understanding what's going on) is:
substitute "]#[" for "],["
split on "#"
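The two steps above can be sketched without any regular expression, for example in Python (split_groups_plain and its marker parameter are illustrative names). The sketch assumes the marker character never occurs in the input; otherwise a different marker has to be chosen.

```python
def split_groups_plain(s, marker="#"):
    # Step 1: turn every "],[" into "]" + marker + "[".
    # Step 2: split on the marker. Assumes marker is absent from the input.
    return s.replace("],[", "]" + marker + "[").split(marker)

print(split_groups_plain("[1]-[2]-[3],[4]-[5],[6,7,8],[9]"))
print(split_groups_plain("[Computers]-[Apple]-[Laptop],[Cables]-[Cables,Connectors],[Adapters]"))
```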
I'm really bad in regular expressions, so please help me.
I need to find in string any pieces like #text.
text mustn't contain any space characters (\\s). Its length must be at least 2 characters ({2,}), and it must contain at least 1 letter (QChar::isLetter()).
Examples:
#c, #1, #123456, #123 456, #123_456 are incorrect
#cc, #text, #text123, #123text are correct
I use QRegExp.
QRegExp rx("#(\\S+[A-Za-z]\\S*|\\S*[A-Za-z]\\S+)$");
bool result = (rx.indexIn(str) == 0);
rx either finds one or more non-whitespace characters followed by a letter and any number of further non-whitespace characters, or a letter preceded by any number of non-whitespace characters and followed by at least one non-whitespace character.
Styne666 gave the right regex.
Here is a little Perl script which is trying to match its first argument with this regex:
#!/usr/bin/env perl
use strict;
use warnings;
my $arg = shift;
if ($arg =~ m/(#(?=\d*[a-zA-Z])[a-zA-Z\d]{2,})/) {
print "$1 MATCHES THE PATTERN!\n";
} else {
print "NO MATCH\n";
}
Perl is always great to quickly test your regular expressions.
Now, your question is a bit different. You want to find all the substrings in your text string,
and you want to do it in C++/Qt. Here is what I could come up with in a couple of minutes:
#include <QtCore/QCoreApplication>
#include <QRegExp>
#include <QString>
#include <iostream>
using namespace std;
int main(int argc, char *argv[])
{
    if (argc < 2)
        return 1; // expects the text to scan as the first argument
    QString str = argv[1];
    // optional leading whitespace, then # plus 2+ alphanumerics containing a letter
    QRegExp rx("[\\s]?(\\#(?=\\d*[a-zA-Z])[a-zA-Z\\d]{2,})\\b");
    int pos = 0;
    while ((pos = rx.indexIn(str, pos)) != -1)
    {
        QString token = rx.cap(1); // the tag without the leading whitespace
        cout << token.toStdString().c_str() << endl;
        pos += rx.matchedLength(); // continue searching after this match
    }
    return 0;
}
To make my test I feed it an input like this (making a long string just one command line argument):
peter@ubuntu01$ qt-regexp "#hjhj 4324 fdsafdsa #33e #22"
And it matches only two words: #hjhj and #33e.
Hope it helps.
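The search loop can also be tried without building a Qt project. Below is a rough equivalent in Python's re module, using the same pattern as the QRegExp above (find_tags is a hypothetical name); re.findall returns the contents of the single capture group, so the optional leading whitespace is dropped.

```python
import re

# Same pattern as the QRegExp in the answer, written as a raw string.
TAG = re.compile(r"\s?(#(?=\d*[a-zA-Z])[a-zA-Z\d]{2,})\b")

def find_tags(text):
    # With one capture group, findall returns just the captured tags.
    return TAG.findall(text)

print(find_tags("#hjhj 4324 fdsafdsa #33e #22"))
```

#22 is rejected because the lookahead cannot find a letter after the digits.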
The shortest I could come up with (which should work, but I haven't tested extensively) is:
QRegExp("^#(?=[0-9]*[A-Za-z])[A-Za-z0-9]{2,}$");
Which matches:
^ the start of the string
# a literal hash character
(?= then look ahead (but don't match)
[0-9]* zero or more latin numbers
[A-Za-z] a single upper- or lower-case latin letter
)
[A-Za-z0-9]{2,} then match at least two characters which may be upper- or lower-case latin letters or latin numbers
$ then assert the end of the string (nothing is consumed)
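To check the breakdown above against the examples from the question, here is a small sketch in Python's re module (is_valid_tag is an illustrative name). re.fullmatch takes the place of the ^ and $ anchors.

```python
import re

# Same structure as the QRegExp above; fullmatch supplies the anchoring.
TAG_RE = re.compile(r"#(?=[0-9]*[A-Za-z])[A-Za-z0-9]{2,}")

def is_valid_tag(token):
    # True when the whole token is "#" plus 2+ alphanumerics with a letter.
    return TAG_RE.fullmatch(token) is not None

for token in ("#c", "#1", "#123456", "#123 456", "#123_456",
              "#cc", "#text", "#text123", "#123text"):
    print(token, is_valid_tag(token))
```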
Technically speaking though this is still wrong. It only matches latin letters and numbers. Replacing a few bits gives you:
QRegExp("^#(?=\\d*[^\\d\\s])\\w{2,}$");
This should work for non-latin letters and numbers but this is totally untested. Have a quick read of the QRegExp class reference for an explanation of each escaped group.
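And then to match within larger strings of text (again, untested). Note that the backslash of \b has to be doubled inside a C++ string literal, and that a leading \b cannot match between whitespace and #, so the start is anchored on start-of-string or whitespace instead:
QRegExp("(?:^|\\s)#(?=\\d*[^\\d\\s])\\w{2,}\\b");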
A useful tool is the Regular Expressions Example which comes with the SDK.
Use this regular expression. Hopefully it will solve your problem:
^([#(a-zA-Z)]+[(a-zA-Z0-9)]+)*(#[0-9]+[(a-zA-Z)]+[(a-zA-Z0-9)]*)*$