Regular Expression to parse some special cases of C Code - regex

I am trying to check generated C Code with a regular expression.
The lines I need to check always start the same way:
R_Wrt_somename(V_var)
or
R_Wrt_othername((int64) (V_var2 * 3))
I already got an expression for the first one, but I am not able to get a fitting expression for the second possibility of function call.
Can someone help me with this problem? I would also appreciate an explanation of the regular expression, as I have only just started working with them.
The expression for the first function type:
R_Wrt_(\w+)\((\s*(V_)[a-zA-Z_0-9\[\] ]+)

Here is a regex that should give you the expected results:
R_Wrt_(\w+)\((?:\((\w+)\)\s*)?\(?(\s*(V_)[a-zA-Z_0-9\[\]* ]+)\)*
The regex matches:
R_Wrt_ - literal R_Wrt_
(\w+) - 1 or more English letters, digits or underscore (captured into Group 1)
\( - a literal opening parenthesis
(?:\((\w+)\)\s*)? - an optional non-capturing group (non-capturing so that it does not shift the group numbering) that matches...
\( - a literal opening parenthesis
(\w+) - 1 or more English letters, digits or underscore (captured into Group 2)
\)\s* - a literal closing parenthesis with optional whitespace
\(? - an optional literal opening parenthesis
(\s*(V_)[a-zA-Z_0-9\[\]* ]+) - capturing group 3 (from your original regex, with * added to the character class) matching...
\s* - optional whitespace
(V_) - literal V_ (captured into Group 4)
[a-zA-Z_0-9\[\]* ]+ - 1 or more characters from the set
\)* - 0 or more literal closing parentheses.
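As a quick check of the group numbering described above, here is a minimal Python sketch that runs the pattern against the two sample lines from the question. The helper name parse_call is made up for illustration, and Python's re module is only an assumption about the engine; the question does not say which one is used.

```python
import re

# The pattern from the answer, unchanged.
PATTERN = re.compile(
    r"R_Wrt_(\w+)\((?:\((\w+)\)\s*)?\(?(\s*(V_)[a-zA-Z_0-9\[\]* ]+)\)*"
)

def parse_call(line):
    """Return (name, cast, argument, prefix) or None if the line does not match.

    parse_call is a hypothetical helper, not part of the original answer.
    """
    m = PATTERN.search(line)
    return m.groups() if m else None

print(parse_call("R_Wrt_somename(V_var)"))
# ('somename', None, 'V_var', 'V_')
print(parse_call("R_Wrt_othername((int64) (V_var2 * 3))"))
# ('othername', 'int64', 'V_var2 * 3', 'V_')
```

Group 2 is None when the call has no cast, because the optional non-capturing group that contains it did not take part in the match.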

Related

Pattern matching with regex

I am trying to make a regex pattern which matches a word or character, then a space, and then comma-separated values with or without spaces before and after the commas. I have had difficulty writing a pattern that does this and was wondering if someone could help me.
Example of what it should match:
ages 19, 43,91
I was trying something like this, "(^[^\s])([^,]+)+", but it only matched the first one.
You can try pattern:
\S+(?:\s*\d+\s*,?)+
\S+ - this will match one or more non-whitespace characters
(?:\s*\d+\s*,?)+ - non-capturing group.
\s* - match 0 or more whitespace characters
\d+ - match 1 or more digits
\s* - match 0 or more whitespace characters
,? - optionally match ,
+ - repeat this non-capturing group 1 or more times
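To see how the pattern behaves, here is a small Python sketch, assuming Python's re module and a made-up helper name matches_list. It uses fullmatch so that the whole input has to fit the pattern.

```python
import re

# The pattern from the answer, unchanged.
LIST_PATTERN = re.compile(r"\S+(?:\s*\d+\s*,?)+")

def matches_list(text):
    """True if the whole text is a label followed by a number list.

    matches_list is a hypothetical helper used only for this illustration.
    """
    return LIST_PATTERN.fullmatch(text) is not None

print(matches_list("ages 19, 43,91"))  # True
print(matches_list("ages"))            # False: at least one number is required
print(matches_list("ages 19 43"))      # True: the comma is optional in the pattern
```

The last call shows that the pattern is fairly permissive: because the comma is optional, numbers separated only by spaces are accepted as well.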

Regex to capture everything after optional token

I have fields which contain data in the following possible formats (each line is a different possibility):
AAA - Something Here
AAA - Something Here - D
Something Here
Note that the first group of letters (AAA) can be of varying lengths.
What I am trying to capture is the "Something Here" or "Something Here - D" (if it exists) using PCRE, but I can't get the Regex to work properly for all three cases. I have tried:
- (.*) which works fine for cases 1 and 2 but obviously not 3;
(?<= - )(.*) which also works fine for cases 1 and 2;
(?! - )(.+)| - (.+) works for cases 2 and 3 but not 1.
I feel like I'm on the verge of it but I can't seem to crack it.
Thanks in advance for your help.
Edit: I realized that I was unclear in my requirements. If there is a trailing " - D" (the letter in the data is arbitrary but should only be a single character), that needs to be captured as well.
About the patterns that you tried:
- (.*) This pattern will match the first occurrence of - followed by the rest of the line. It will match too much for the second example, as the .* will also match the second occurrence of -
(?<= - )(.*) This pattern will match the same as the first one, but without the leading - in the match, as the lookbehind only asserts that it should occur directly to the left
(?! - )(.+)| - (.+) This pattern uses a negative lookahead (?! - ) which asserts that what is directly to the right is not a space, hyphen, space. As none of the examples start with that sequence, the whole line is matched directly after the negative lookahead by .+ and the second part after the alternation | is never evaluated
If the first group of letters can be of varying length, you could make the match specific by using either 1 or more uppercase characters [A-Z]+ or 1+ word characters \w+.
To get a more broad match, you could match 1 or more non whitespace characters using \S+
^(?:\S+\h-\h)?\K\S+(?:\h(?!-\h)\S+)*
Explanation
^ Start of string
(?:\S+\h-\h)? Optionally match the first group of non whitespace chars followed by - between horizontal whitespace chars
\K Clear the match buffer (Forget what is currently matched)
\S+ Match 1+ non whitespace characters
(?: Non capture group
\h(?!-\h) Match a horizontal whitespace char and assert what is directly to the right is not - followed by another horizontal whitespace char
\S+ Match 1+ non whitespace chars
)* Close non capture group and repeat 0+ times to match more "words" separated by spaces
Edit
To match an optional hyphen and trailing single character, you could add an optional non capturing group (?:-\h\S\h*)?$ and assert the end of the string if the pattern should match the whole string:
^(?:\S+\h-\h)?\K\S+(?:\h(?!-\h)\S+)*\h*(?:-\h\S\h*)?$
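The pattern above is PCRE. For readers who want to try the same idea elsewhere, here is a rough Python port. Python's built-in re module supports neither \K nor \h, so this sketch swaps in a capturing group for \K and the class [ \t] for \h (which is narrower than \h, since it only covers space and tab). The helper name extract_text is made up for illustration.

```python
import re

# Port of the PCRE pattern: \K becomes a capturing group, \h becomes [ \t].
PATTERN = re.compile(
    r"^(?:\S+[ \t]-[ \t])?"            # optional leading token and " - "
    r"(\S+(?:[ \t](?!-[ \t])\S+)*"     # words not followed by "- "
    r"[ \t]*(?:-[ \t]\S[ \t]*)?)$"     # optional trailing "- X"
)

def extract_text(field):
    """Return the wanted part of the field, or None if nothing matches.

    extract_text is a hypothetical helper, not part of the original answer.
    """
    m = PATTERN.match(field)
    return m.group(1) if m else None

for field in ("AAA - Something Here",
              "AAA - Something Here - D",
              "Something Here"):
    print(repr(extract_text(field)))
# 'Something Here'
# 'Something Here - D'
# 'Something Here'
```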
You may use
^(?:.*? - )?\K.*?(?= - | *$)
^(?:.*?\h-\h)?\K.*?(?=\h-\h|\h*$)
Details
^ - start of string
(?:.*? - )? - an optional non-capturing group matching any 0+ chars other than line break chars, as few as possible, up to the first space, hyphen, space
\K - match reset operator
.*? - any 0+ chars other than line break chars as few as possible
(?= - | *$) - a space, hyphen, space or 0+ spaces up to the end of string should follow immediately on the right.
Note that \h matches any horizontal whitespace chars.
^(?:[A-Z]+ - \K)?.*\S
Since "Something Here" can be anything, there's no reason to specially describe the eventual last letter in the pattern. You don't need something more complicated.
With this pattern I assume that you are not interested in the trailing spaces, which is why I ended it with \S. If you want to keep them, remove the \S and change the previous quantifier to +.
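For comparison, the shorter pattern can be sketched in Python as well. Since Python's re module has no \K, this version moves the wanted part into a capturing group; that substitution and the helper name strip_prefix are assumptions made for this illustration, not part of the answer.

```python
import re

# Port of ^(?:[A-Z]+ - \K)?.*\S with a capturing group instead of \K.
PATTERN = re.compile(r"^(?:[A-Z]+ - )?(.*\S)")

def strip_prefix(field):
    """Return the field without the optional uppercase prefix and trailing spaces.

    strip_prefix is a hypothetical helper used only for this illustration.
    """
    m = PATTERN.match(field)
    return m.group(1) if m else None

print(repr(strip_prefix("AAA - Something Here - D")))  # 'Something Here - D'
print(repr(strip_prefix("Something Here   ")))         # 'Something Here'
```

The trailing \S is what drops the trailing spaces in the second call: .* first runs to the end of the line, then backtracks until the last character is a non-whitespace one.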

Extracting a number in string inside brackets with Regexextract

I am trying to extract only the number (a float?) from accounting figures in Google Sheets that use abbreviated units like K, M, B and are sometimes wrapped in brackets when negative. Sorry, I am new to regex: how do I write a regular expression covering different possibilities like (213M),(31.23B)?
\(([0-9.]+\.\[0-9.]+)\)
You may use
\((-?\d+(?:\.\d+)?)[KMB]\)
Details
\( - a literal ( char
(-?\d+(?:\.\d+)?) - Group 1:
-? - an optional -
\d+ - 1+ digits
(?:\.\d+)? - an optional non-capturing group matching one or zero occurrences of a dot followed with 1+ digits
[KMB] - a character class matching K, M or B
\) - a literal ) char.
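The question is about Google Sheets, but the pattern can be checked quickly in Python. This is only a sketch, assuming Python's re module and a made-up helper name extract_number; it returns the contents of Group 1 as a float.

```python
import re

# The pattern from the answer, unchanged.
BRACKETED = re.compile(r"\((-?\d+(?:\.\d+)?)[KMB]\)")

def extract_number(cell):
    """Return the bracketed number as a float, or None if there is no match.

    extract_number is a hypothetical helper, not a Google Sheets function.
    """
    m = BRACKETED.search(cell)
    return float(m.group(1)) if m else None

print(extract_number("(213M)"))    # 213.0
print(extract_number("(31.23B)"))  # 31.23
print(extract_number("213M"))      # None: the pattern requires the brackets
```

Note that the unit letter is matched but not captured, so scaling by K, M or B would need an extra capturing group.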

Regex expression for 2 identical strings in a row

So I am trying to create a regex expression for the following template.
"[alphaNumeric]String/String.xcl"
So
[a1B2c3]Hello/Hello.xcl would pass
a1B2c3]hello/hello.xcl fails
[a1B2c3]Hello/hello.xcl fails
[a1B2c3]hello/hello.xc fails
I have tried the following so far:
\[[\da-zA-Z]+\][a-z]+\/[a-z]+\.xcl$
How do I check if the middle strings are identical?
Use a backreference:
\[[a-zA-Z0-9]+\]([^/]+)/\1\.xcl
The term in parentheses captures the first part of your path. We may then refer to it later in the regex using \1.
Depending on how you plan to use this regex, you might need optional starting and closing anchors (^ and $).
You may capture the part after brackets and use a backreference after /:
^\[[\da-zA-Z]+]([A-Za-z]+)\/\1\.xcl$
Details
^ - start of the string
\[ - a [
[\da-zA-Z]+ - 1+ alphanumeric chars
] - a ] char
([A-Za-z]+) - Capturing group 1: one or more letters
\/ - a slash
\1 - a backreference to capturing group 1 value
\.xcl - .xcl substring
$ - end of string.
NOTE: If you do not care about what kind of chars there can be inside brackets, you may replace [\da-zA-Z]+ with [^\]]+.
NOTE2: If you want to match any chars on both ends of /, replace ([A-Za-z]+) with ([^\/]+).
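The backreference can be verified against the four sample names from the question. This is a minimal sketch assuming Python's re module; the helper name is_valid is made up for illustration.

```python
import re

# The anchored pattern from the second answer, unchanged.
PATTERN = re.compile(r"^\[[\da-zA-Z]+]([A-Za-z]+)\/\1\.xcl$")

def is_valid(name):
    """True if both path segments are identical and the name ends in .xcl.

    is_valid is a hypothetical helper used only for this illustration.
    """
    return PATTERN.match(name) is not None

print(is_valid("[a1B2c3]Hello/Hello.xcl"))  # True
print(is_valid("a1B2c3]hello/hello.xcl"))   # False: missing opening bracket
print(is_valid("[a1B2c3]Hello/hello.xcl"))  # False: \1 is case-sensitive here
print(is_valid("[a1B2c3]hello/hello.xc"))   # False: wrong extension
```

The third case fails because \1 must repeat the captured text exactly; a case-insensitive flag would change that.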

Matching Word Regex

Hello, I want to match this phrase with a regex:
(Parc Installé)
from this text:
31/1/2017 17:19:23,4245986,ct0001#Intotel.int,Parc Installé,100.100.30.100
I did this regex ',[A-Za-zA-zÀ-ú+ \/\w+0-9._%+-]+,'
But the result is: 4245986 and Parc Installé.
How can I match only Parc Installé?
You may try a regex based on a lookahead that will require a comma and digits/dots after it up to the end of string:
[^,]+(?=\s*,[\d.]+$)
Details:
[^,]+ - 1 or more chars other than ,
(?=\s*,[\d.]+$) - a lookahead requiring
\s* - zero or more whitespaces
, - a comma
[\d.]+ - 1+ digits or dots up to...
$ - ... the end of string
To make it a bit more restrictive, you may replace the lookahead with (?=\s*,\d+(?:\.\d+){3}$) to require 4 sequences of dot-separated 1+ digits.
If a lookahead is not supported (case with a RE2 engine), you might want to use a capturing group based solution:
([^,]+)\s*,[\d.]+$
Here, the part within (...) will be captured into Group 1 and will be accessible via a backreference or a function like =REGEXEXTRACT in Google Spreadsheets, which only retrieves the contents of a capturing group if one is present in the pattern.
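Both variants can be compared side by side on the sample line from the question. This is a minimal sketch assuming Python's re module (which does support lookaheads); the two helper names are made up for illustration.

```python
import re

LINE = "31/1/2017 17:19:23,4245986,ct0001#Intotel.int,Parc Installé,100.100.30.100"

# Lookahead variant: the whole match is the wanted field.
LOOKAHEAD = re.compile(r"[^,]+(?=\s*,[\d.]+$)")
# Capturing-group variant: the wanted field is Group 1.
CAPTURE = re.compile(r"([^,]+)\s*,[\d.]+$")

def field_via_lookahead(line):
    """Hypothetical helper: return the whole match of the lookahead pattern."""
    m = LOOKAHEAD.search(line)
    return m.group(0) if m else None

def field_via_group(line):
    """Hypothetical helper: return Group 1 of the capturing pattern."""
    m = CAPTURE.search(line)
    return m.group(1) if m else None

print(field_via_lookahead(LINE))  # Parc Installé
print(field_via_group(LINE))      # Parc Installé
```

Earlier fields such as 4245986 are skipped because the text after their comma is not made up of digits and dots all the way to the end of the string.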