A regex for extracting ";" or "=" symbols from source code?

For example
int val = 13;
Serial.begin(9600);
val = DigitalWrite(900,HIGH);
I want to extract special symbols such as = and ;.
I've been able to extract symbols that appear adjacent to each other in the code, but I need all occurrences.
I tried [^ "//"A-Za-z\t\n0-9]* and [\;\=\{\}\,]+. Neither worked.
What's wrong?
I made rules for my scanner like the ones below (they have since been changed).
semicolon [;]([\n]|[^ "//"])
assignment (.)?[=]+
brace ([{]|[}])([\n]|[^ "//"])
roundbarcket ("()")" "
The problem occurred in situations like these:
int val= 13; // "=" was not recognized because "val" and "=" are adjacent. I want to recognize it whether or not they are adjacent.
serial.read(); // "()" and ";" were not recognized individually. When I add the semicolon rule and the roundbarcket rule, "();" is recognized.
How can I solve these problems?

You want to break "DigitalWrite(900,HIGH);" into "DigitalWrite" "(" "900" "," "HIGH" ")" ";". I think looping over the string one character at a time is the fastest way.
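using System;
using System.Collections.Generic;

string text = "val = DigitalWrite(900,HIGH);";
string[] symbols = new string[] { "(", ")", ",", "=", ";" };
List<string> tokens = new List<string>();
string word = "";
for (int i = 0; i < text.Length; i++)
{
    string letter = text.Substring(i, 1);
    // Look the character up in symbols (not in tokens, which starts empty).
    bool isSymbol = Array.IndexOf(symbols, letter) >= 0;
    if (isSymbol || letter.Equals(" "))
    {
        // A symbol or a space ends the word collected so far.
        if (word.Length > 0)
        {
            tokens.Add(word);
            word = "";
        }
        if (isSymbol)
            tokens.Add(letter);
    }
    else
    {
        word += letter;
    }
}
// Flush a word that runs to the end of the text.
if (word.Length > 0)
    tokens.Add(word);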
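The same tokenization can also be written with a single global regular expression, which is closer to what the question asks for. The following is a minimal JavaScript sketch rather than the C# approach above; the function names tokenize and extractSymbols and the chosen token classes (identifiers, integers, and the five symbols) are assumptions made for illustration.

```javascript
// Hypothetical helper: split source text into words, numbers, and symbols.
function tokenize(text) {
  // Alternatives are tried left to right at each position:
  //   [A-Za-z_]\w*  identifiers such as val or DigitalWrite
  //   \d+           integer literals such as 900
  //   [=;(),]       the single-character symbols of interest
  // Whitespace and any other character is skipped by the global search.
  const pattern = /[A-Za-z_]\w*|\d+|[=;(),]/g;
  return text.match(pattern) || [];
}

// Hypothetical helper: keep only the symbols, adjacent to other text or not.
function extractSymbols(text) {
  return text.match(/[=;(),]/g) || [];
}

console.log(tokenize("val = DigitalWrite(900,HIGH);"));
console.log(extractSymbols("int val= 13;"));
console.log(extractSymbols("serial.read();"));
```

Because the symbol class matches exactly one character per match, "();" comes back as three separate symbols and "val=" no longer hides the "=".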

So searching for ";" and "=" is your ultimate goal?
In that case, why not just use something like a .find() function?
Or you can split the string on ";" first and then search for "=".
If you want to grab the text between "=" and ";", try =([^;]*); or =(.*?);
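As a rough illustration of the last suggestion, the JavaScript sketch below applies =([^;]*); to the sample lines from the question. The helper name assignedValues and the trimming of surrounding spaces are assumptions, not part of the answer.

```javascript
// Hypothetical helper: collect the text between each "=" and the next ";".
function assignedValues(source) {
  const values = [];
  // [^;]* cannot cross a semicolon, so each match ends at the first ";".
  for (const m of source.matchAll(/=([^;]*);/g)) {
    values.push(m[1].trim());
  }
  return values;
}

const code = "int val = 13;\nSerial.begin(9600);\nval = DigitalWrite(900,HIGH);";
console.log(assignedValues(code));
```

Note that this simple pattern also fires on comparison operators such as "==", so it is only suitable when "=" is known to mean assignment.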

Related

How can I split a QString by a delimiter, but not where that delimiter is enclosed in square brackets?

If I have a QString of the form QString s = QString("A:B[1:2]:C:D"); I want to split it by ':', but only where the ':' is not enclosed in square brackets.
So the desired output for the above QString would be "A", "B[1:2]", "C", "D".
Right now, I can only think of replacing ':' in the range between s.indexOf('[') and s.indexOf(']'), then splitting, and afterwards replacing back to ':' in each of the resulting parts, but that seems rather inconvenient.
EDIT: based on comments and answers: any number in square brackets shall be the same after splitting. There are characters, e.g. ';', that I can use as a temporary replacement for ':'.
Any better idea?
Usually I like the idea of using a regular expression to split directly, but I could not come up with one quickly. So here is your idea: first replace the unwanted colon with something else (here a semicolon), then split on the remaining colons, and finally replace the semicolon with a colon again in each of the separate strings.
#include <QDebug>
#include <QRegularExpression>
#include <QString>
int main()
{
QString string("A:B[1:2]:C:D");
// This replaces all occurrences of "[x:y]" by "[x;y]" with
// x and y being single digits.
// \\[ finds exactly the character '['. It has to be masked
// by backslashes because it's a special character in regular
// expressions.
// (\\d) is a capture for a digit that can be used in the
// resulting string as \\1, \\2 and so on.
string = string.replace(QRegularExpression("\\[(\\d):(\\d)\\]"), "[\\1;\\2]");
// split on the remaining colons
QStringList elements = string.split(':');
// Iterate over all fractions the string was split into
foreach(QString element, elements) {
// Replace the semicolons back to colons.
qDebug() << element.replace(QRegularExpression("\\[(\\d);(\\d)\\]"), "[\\1:\\2]");
}
}
The output:
"A"
"B[1:2]"
"C"
"D"
Probably far from optimal but... you could do an initial split on ':' and then post-process the results to coalesce items containing '[' and ']'. So, given your initial string, something like...
QString s("A:B[1:2]:C:D");
QStringList l = s.split(':');
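for (int i = 0; i + 1 < l.size(); ++i) {
    // Merge only when a bracket opened in this item is still unclosed,
    // and only while a following item exists.
    if (l[i].contains('[') && !l[i].contains(']')) {
        l[i] += ":" + l[i + 1];
        l.removeAt(i + 1);
    }
}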
This assumes, of course, that any given '[', ']' pair will have at most one intervening ':'.
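For comparison, a direct split with a regular expression is possible when brackets are balanced and not nested. The JavaScript sketch below uses a negative lookahead so that a colon counts as a separator only if no ']' follows it before the next '['. It illustrates the pattern rather than Qt code, and the function name splitOutsideBrackets is an assumption.

```javascript
// Hypothetical helper: split on ':' except inside a [...] pair.
function splitOutsideBrackets(s) {
  // (?![^\[]*\]) rejects a colon when a "]" can be reached without passing
  // a "[" first, which is the case exactly when the colon is inside brackets
  // (assuming balanced, non-nested brackets).
  return s.split(/:(?![^\[]*\])/);
}

console.log(splitOutsideBrackets("A:B[1:2]:C:D"));
console.log(splitOutsideBrackets("A[1:2:3]:B"));
```

QRegularExpression is PCRE-based and supports lookahead, so the same pattern should carry over to QString::split with a QRegularExpression argument, but that has not been tested here.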
I will post my working code as an answer, but I'll accept any better idea.
First I replace any colon inside square brackets:
QString ShuntingYard::replaceIndexColons(QString& expression)
{
int index = 0;
while (expression.indexOf('[', index) != -1)
{
int open = expression.indexOf('[', index);
int close = expression.indexOf(']', open);
int colon = expression.indexOf(':', open);
if (colon > open && colon < close)
expression.replace(colon, 1, ';');
index = open + 1;
}
return expression;
}
Then I can split with splitExpression, which splits on several delimiters, including ':':
expression = replaceIndexColons(expression);
QStringList list = splitExpression(expression);
// Q_FOREACH iterates over copies, so restore the colons through an index.
for (int i = 0; i < list.size(); ++i)
{
    list[i].replace(";", ":");
}
and put it back together...

C++ regex_match not working

Here is part of my code
bool CSettings::bParseLine ( const char* input )
{
//_asm INT 3
std::string line ( input );
std::size_t position = std::string::npos, comment;
regex cvarPattern ( "\\.([a-zA-Z_]+)" );
regex parentPattern ( "^([a-zA-Z0-9_]+)\\." );
regex cvarValue ( "\\.[a-zA-Z0-9_]+[ ]*=[ ]*(\\d+\\.*\\d*)" );
std::cmatch matchedParent, matchedCvar;
if ( line.empty ( ) )
return false;
if ( !std::regex_match ( line.c_str ( ), matchedParent, parentPattern ) )
return false;
if ( !std::regex_match ( line.c_str ( ), matchedCvar, cvarPattern ) )
return false;
...
}
I use it to split lines that I read from a file. The lines look like:
foo.bar = 15
baz.asd = 13
ddd.dgh = 66
and I want to extract the parts of each line. For example, for the first line, foo.bar = 15, I want to end up with something like:
a = foo
b = bar
c = 15
But the regex always returns false. I tested it in many online regex checkers, and even in Visual Studio, and it works there. Do I need a different syntax for C++ regex_match? I'm using Visual Studio 2013 Community.
The problem is that std::regex_match must match the entire string but you are trying to match only part of it.
You need to either use std::regex_search or alter your regular expression to match all three parts at once:
#include <regex>
#include <string>
#include <iostream>
const auto test =
{
"foo.bar = 15"
, "baz.asd = 13"
, "ddd.dgh = 66"
};
int main()
{
const std::regex r(R"~(([^.]+)\.([^\s]+)[^0-9]+(\d+))~");
// capture groups: (1) is ([^.]+), (2) is ([^\s]+), (3) is (\d+)
std::cmatch m;
for(const auto& line: test)
{
if(std::regex_match(line, m, r))
{
// m.str(0) is the entire matched string
// m.str(1) is the 1st capture group
// etc...
std::cout << "a = " << m.str(1) << '\n';
std::cout << "b = " << m.str(2) << '\n';
std::cout << "c = " << m.str(3) << '\n';
std::cout << '\n';
}
}
}
Regular expression: https://regex101.com/r/kB2cX3/2
Output:
a = foo
b = bar
c = 15
a = baz
b = asd
c = 13
a = ddd
b = dgh
c = 66
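The difference between matching the whole string and searching inside it can be seen in any regex engine. The sketch below is JavaScript, not C++: a pattern wrapped in ^(?: ... )$ plays the role of std::regex_match, and a plain test plays the role of std::regex_search. The helper names matchesWhole and parseLine, and the whole-line pattern inside parseLine, are assumptions for illustration.

```javascript
// The question's parentPattern, which describes only the start of a line.
const parentPattern = /^([a-zA-Z0-9_]+)\./;

// Emulate whole-string matching by anchoring the pattern at both ends.
function matchesWhole(pattern, line) {
  return new RegExp("^(?:" + pattern.source + ")$").test(line);
}

// A pattern that covers the entire line, with one capture group per part.
function parseLine(line) {
  const m = /^([^.]+)\.(\S+)\s*=\s*(\d+)$/.exec(line);
  return m ? { a: m[1], b: m[2], c: m[3] } : null;
}

console.log(parentPattern.test("foo.bar = 15"));          // search semantics
console.log(matchesWhole(parentPattern, "foo.bar = 15")); // whole-string semantics
console.log(parseLine("foo.bar = 15"));
```

The partial pattern succeeds as a search but fails as a whole-string match, which mirrors why the original regex_match calls returned false.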
To focus on the regex patterns, I'd prefer to use raw string literals in C++:
regex cvarPattern ( R"rgx(\.([a-zA-Z_]+))rgx" );
regex parentPattern ( R"rgx(^([a-zA-Z0-9_]+)\.)rgx" );
regex cvarValue ( R"rgx(\.[a-zA-Z0-9_]+[ ]*=[ ]*(\d+\.*\d*))rgx" );
Everything between the rgx( )rgx delimiters needs no extra escaping for C++ string literal characters.
Actually, what you have written in your question resembles the regular expressions I've written above as raw string literals.
You probably simply meant something like
regex cvarPattern ( R"rgx(.([a-zA-Z_]+))rgx" );
regex parentPattern ( R"rgx(^([a-zA-Z0-9_]+).)rgx" );
regex cvarValue ( R"rgx(.[a-zA-Z0-9_]+[ ]*=[ ]*(\d+(\.\d*)?))rgx" );
I didn't dig deeper, but I don't see the purpose of all the escaped characters in your regular expression patterns.
As for your question in the comment, you can use a choice of matching sub-pattern groups, and check for which of them was applied in the matches structure:
regex cvarValue (
R"rgx(.[a-zA-Z0-9_]+[ ]*=[ ]*((\d+)|(\d+\.\d?)|([a-zA-Z]+)){1})rgx" );
// ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You probably don't need the separate cvarPattern and parentPattern regular expressions to inspect other (more detailed) parts of the match.

Use Meteor Match and Regex to check strings

I'm checking an array of strings for a specific combination of patterns. I'm having trouble using Meteor's Match function and regex literal together. I want to check if the second string in the array is a url.
addCheck = function(line) {
var firstString = _.first(line);
var secondString = _.indexOf(line, 1);
console.log(secondString);
var urlRegEx = /((([A-Za-z]{3,9}:(?:\/\/)?)(?:[\-;:&=\+\$,\w]+#)?[A-Za-z0-9\.\-]+|(?:www\.|[\-;:&=\+\$,\w]+#)[A-Za-z0-9\.\-]+)((?:\/[\+~%\/\.\w\-]*)?\??(?:[\-\+=&;%#\.\w]*)#?(?:[\.\!\/\\\w]*))?)/g;
if ( firstString == "+" && Match.test(secondString, urlRegEx) === true ) {
console.log( "detected: + | line = " + line )
} else {
// do stuff if we don't detect a
console.log( "line = " + line );
}
}
Any help would be appreciated.
Match.test is used to test the structure of a variable. For example: "it's an array of strings, or an object including the field createdAt", etc.
RegExp.test on the other hand, is used to test if a given string matches a regular expression. That looks like what you want.
Try something like this instead:
if ((firstString === '+') && urlRegEx.test(secondString)) {
...
}
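A self-contained sketch of that check in plain JavaScript follows. It makes a few assumptions that are not in the question: the second element is read as line[1] (note that _.indexOf(line, 1) searches for the value 1 and returns a position, not the element), and the deliberately simple urlPattern stands in for the longer expression. The pattern is written without the g flag because RegExp.prototype.test on a global regex remembers lastIndex between calls, which can make repeated checks on the same regex give different results.

```javascript
// Simplified stand-in for the question's URL regex (an assumption).
const urlPattern = /^(?:[A-Za-z]{3,9}:\/\/|www\.)\S+$/;

// Hypothetical helper: true when the line looks like ["+", <url>, ...].
function isAddLine(line) {
  const firstString = line[0];
  const secondString = line[1];
  return firstString === "+" &&
    typeof secondString === "string" &&
    urlPattern.test(secondString);
}

console.log(isAddLine(["+", "http://example.com/page"]));
console.log(isAddLine(["+", "not a url"]));
console.log(isAddLine(["-", "http://example.com/page"]));
```

Match.test would still be the right tool for checking that line is an array of strings before this function runs.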

Regex for the string with '#'

I'm wondering what the regex should be for a string containing '#'
e.g.
abc#def#ghj#ijk
I want to get
#def
#ghj
#ijk
I tried #[\S]+ but it selects the whole #def#ghj#ijk. Any ideas?
Edit
The code below selects only #Me instead of #MessageBox. Why?
var m = new RegExp('#[^\s#]+').exec('http://localhost/Lorem/10#MessageBox');
if (m != null) {
var s = '';
for (i = 0; i < m.length; i++) {
s = s + m[i] + "\n";
}
}
Edit 2
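The double backslash solved that problem: '#[^\\s#]+'. Inside a string literal, '\s' is read as a plain 's', so the character class excluded the letter s and the match stopped at #Me.
Try #[^\s#]+ to match # followed by a sequence of one or more characters that are neither # nor whitespace.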
Match all characters that are not #:
#[^#]+
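Both points from this thread can be checked with a short JavaScript sketch: the g flag returns every match, and a backslash inside a string literal has to be doubled before it reaches the RegExp constructor. The helper name hashTags is an assumption for illustration.

```javascript
// With the g flag, match() returns every occurrence rather than the first.
function hashTags(text) {
  return text.match(/#[^\s#]+/g) || [];
}

console.log(hashTags("abc#def#ghj#ijk"));

// In a string literal "\s" collapses to "s", so the backslash must be doubled.
const broken = new RegExp('#[^\s#]+');
const fixed = new RegExp('#[^\\s#]+');
const url = "http://localhost/Lorem/10#MessageBox";

console.log(broken.source, broken.exec(url)[0]);
console.log(fixed.source, fixed.exec(url)[0]);
```

Printing the source property is a quick way to see which pattern the engine actually received.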

In a RegEx with multiple subexpressions (i.e. using parenthesis), how do I know which one it matched?

So, for example:
//The string to search through
var str = "This is a string /* with some //stuff in here";
//I'm matching three possible things: "here" or "//" or "/*"
var regEx = new RegExp( "(here)|(\\/\\/)|(\\/\\*)", "g" );
//Loop and find them all
while ( match = regEx.exec( str ) )
{
//Which one is matched? The first parenthesis subexpression? The second?
alert( match[ 0 ] );
}
How do I know I matched the "(//)" instead of the "(here)" without running another regex against the returned match?
You can check which group is defined:
var str = "This is a string /* with some //stuff in here";
var regEx = /(here)|(\/\/)|(\/\*)/g;
while(match = regEx.exec(str)){
var i;
for(i = 1; i <= 3; i++){
if(match[i] !== undefined)
break;
}
alert("matched group " + i + ": " + match[i]);
}
Running at http://jsfiddle.net/zLD5V/
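Where named capture groups are available (ES2018 and later), the group can be identified by name instead of by index. The sketch below is one possible variant under that assumption; the function name classify and the group names word, lineComment, and blockComment are made up for illustration.

```javascript
// Hypothetical helper: report which named alternative produced each match.
function classify(str) {
  const pattern = /(?<word>here)|(?<lineComment>\/\/)|(?<blockComment>\/\*)/g;
  const found = [];
  for (const match of str.matchAll(pattern)) {
    // Only the alternative that matched has a defined value in match.groups.
    const [name] = Object.entries(match.groups)
      .find(([, value]) => value !== undefined);
    found.push({ name: name, text: match[0] });
  }
  return found;
}

console.log(classify("This is a string /* with some //stuff in here"));
```

Named groups keep the check readable when alternatives are added or reordered, since no index has to be updated.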