Regex: treat \r\n like a normal character

I'm working on a small project that counts the functions in C++ files (.cpp).
I used the following regex as the "function pattern":
/[a-z|A-Z]+\s*::\s*~?[a-z|A-Z]+\(.*\)/gm
It works in most cases, but fails when there are line breaks inside the ().
void CXYZRScanPanel::OnPrepareScanning()
{
//This one is ok.
}
void CXYZRScanPanel::OnPrepareScanning(int k)
{
//This one is ok.
}
void CXYZRScanPanel::OnPrepareScanning(int k,
int j)
{
//This one fails.
}
I'm wondering whether there is anything "stronger" than .* that can also match \r\n.
Thanks for any help.
If there is no such thing, I will probably remove all \r\n within () before doing the search.

You could write the pattern using a negated character class, [^()], which matches any char except ( and ) and therefore also matches a newline.
Note that you can omit the | in the character class.
[a-zA-Z]+\s*::\s*~?[a-zA-Z]+(\([^()]*\))
The pattern matches:
[a-zA-Z]+ Match 1+ times chars a-zA-Z
\s*::\s* Match :: between optional whitespace chars
~? Match an optional ~ char
[a-zA-Z]+ Match 1+ times chars a-zA-Z
( Capture group 1
\([^()]*\) Match ( then 0+ times any char except ( and ), then )
) Close group 1
See a regex demo
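As a minimal sketch of how the pattern above could be used to count functions, here is a Python version. The sample source and the helper name count_functions are assumptions made for illustration; they are not part of the original question.

```python
import re

# Pattern from the answer above. [^()]* also matches line breaks,
# so a parameter list may span several lines.
FUNCTION_PATTERN = re.compile(r"[a-zA-Z]+\s*::\s*~?[a-zA-Z]+(\([^()]*\))")

# Hypothetical sample input, based on the snippets in the question.
SAMPLE_SOURCE = """\
void CXYZRScanPanel::OnPrepareScanning()
{
}
void CXYZRScanPanel::OnPrepareScanning(int k)
{
}
void CXYZRScanPanel::OnPrepareScanning(int k,
int j)
{
}
"""


def count_functions(source: str) -> int:
    """Count occurrences of Class::name(...) in the given source text."""
    return sum(1 for _ in FUNCTION_PATTERN.finditer(source))


if __name__ == "__main__":
    print(count_functions(SAMPLE_SOURCE))
```

Note that this also counts qualified calls such as Foo::bar(x) inside function bodies, and it does not match parameter lists that themselves contain parentheses, so the count is only an approximation.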

Related

Regex match strings with different values

for i,v in array
for i , v in array
for i , v in array
for i, v in array
for i,v in array
for i, v in array
for[\s+,.](.+)
https://regex101.com/r/Vd3w7C/2
How could I match anything after the v?
Note that i, v, and "in array" will have different values.
I mean something like:
for ppp,gflgkf heekd gfvb
You could use
\bfor\s+[^\s,]+(?:\s*,\s*[^\s,]+)*\s+(.+)
The pattern matches:
\bfor\s+ Match for and 1+ whitespace chars
[^\s,]+ Match 1+ times any char except a whitespace char or ,
(?: Non capture group
\s*,\s*[^\s,]+ Match a comma between optional whitespace chars, and match at least a single char other than a comma or whitespace chars
)*\s+ Close the group and optionally repeat it followed by 1+ whitespace chars
(.+) Capture 1+ times any char except a newline in group 1
See a regex demo.
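A short Python sketch of the pattern above, applied to a few of the lines from the question. The helper name rest_after_variables is hypothetical.

```python
import re

# Pattern from the answer above: "for", a comma-separated list of names,
# then whitespace, then the rest of the line in group 1.
FOR_PATTERN = re.compile(r"\bfor\s+[^\s,]+(?:\s*,\s*[^\s,]+)*\s+(.+)")


def rest_after_variables(line: str):
    """Return the text after the variable list, or None if there is no match."""
    match = FOR_PATTERN.search(line)
    return match.group(1) if match else None


if __name__ == "__main__":
    for line in ("for i,v in array", "for i , v in array", "for ppp,gflgkf heekd gfvb"):
        print(repr(rest_after_variables(line)))
```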

A regular expression to replace different combinations of double quotes inside a double-quoted string

I can't clean up this JSON with a single regular expression (PCRE). I just don't know what to do next.
("title":")[\s\S]+(", "partid":)
I've tried various search and replacement options. For example, ("title":"[^"])(")([^"])(")(, "p) $1$3$4$5, then the same for two double quotes, for three, and so on.
Examples of strings:
{ "DT_RowId":"c2a839fb-580a-11e8-bac6-00155d080416", **"title":"Гайка 7/16"-14" UNC топорна;14H813;P88344 12""**, "partid":"S.4964", "manufacturerid":"2a7dc482-af13-11de-88d3-00e081b05e17", "manufacturer":"SPAREX", "quantity":">10", "price":"8.93", "actionprice":"", "rep":1, "img":0 } , { "DT_RowId":"05d8b40c-ec93-11dd-8f72-00e081b05e05", "title":"Нож ротора (зам.501060)", "partid":"501063", "manufacturerid":"3a7e891f-07ba-11de-8a95-00e081b05e17", "manufacturer":"Geringhoff", "quantity":">10", "price":"932.27", "actionprice":"584.90", "rep":1, "img":1 } , { "DT_RowId":"b7c6c9ee-adca-11e3-8202-00155d012119", **"title":"Олива моторна "CASTROL VECTON" 10W40 E4"/E7", 208L"**, "partid":"RB-V14E4E7-208L", "manufacturerid":"763d805e-c53b-11de-9210-00e081b05e05", "manufacturer":"CASTROL", "quantity":">10", "price":"111.60", "actionprice":"", "rep":1, "img":1 } , { "DT_RowId":"05d8b41d-ec93-11dd-8f72-00e081b05e05", **"title":"Н""о"ж"**, "partid":"501251", "manufacturerid":"3a7e891f-07ba-11de-8a95-00e081b05e17", "manufacturer":"Geringhoff", "quantity":">10", "price":"719.45", "actionprice":"", "rep":1, "img":1 }
Please help. How can I remove or escape the double quotes between "title":" and ", "partid":?
You may use
(?:\G(?!\A)|"title":").*?\K"(?=.*?"\s*,\s*"partid":)
Replace with an empty string. See the regex demo.
Details
(?:\G(?!\A)|"title":") - end of the previous match or "title":" string
.*? - any 0+ chars, other than linebreak chars, as few as possible
\K - a match reset operator
" - a " char
(?=.*?"\s*,\s*"partid":) - followed with any 0+ chars, other than linebreak chars, as few as possible, ", 0+ whitespaces, ,, 0+ whitespaces and "partid":.
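The \G and \K operators used above are PCRE features, and Python's built-in re module does not support them. As a sketch of a different technique that reaches the same result, the following captures the whole title value and strips the inner quotes in a replacement function. The function name strip_title_quotes and the shortened sample string are assumptions for illustration.

```python
import json
import re

# Group 1: opening of the title value, group 2: the raw value (as short as
# possible), group 3: the closing quote followed by the "partid" key.
TITLE_PATTERN = re.compile(r'("title":")(.*?)("\s*,\s*"partid":)')


def strip_title_quotes(text: str) -> str:
    """Remove every double quote found inside a "title" value."""
    return TITLE_PATTERN.sub(
        lambda m: m.group(1) + m.group(2).replace('"', "") + m.group(3),
        text,
    )


if __name__ == "__main__":
    # Shortened version of one of the broken records from the question.
    broken = '{ "title":"Н""о"ж", "partid":"501251" }'
    fixed = strip_title_quotes(broken)
    print(fixed)
    print(json.loads(fixed)["title"])
```

To escape the quotes instead of removing them, the replace call could insert a backslash before each quote rather than an empty string.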

How to get lines until an empty newline

I want to get a block of lines, starting at a line that contains a < or > operator and continuing until an empty line.
I tried this regex: .*[<>][^,\r\n]+?\(.*\S.*,.*\S.*\).*(?:(\n).*)
You can find my example here: https://regex101.com/r/UQYLB5/1/
Expected Result :
MATCH 1 :
BAR18>17M(3,5.2)V
MATCH 2 :
BAR19>1.243037M(3,5.2)V
INFORMATION PROCESS
TAKE B/F: 19V[1]
LIGHT PC CARD:
MATCH 3 :
TEFAL17>1.262259M(4.5,5.5)V
SISS17 : 1789-ID
LIGHT 19/17
MAPPING NICE :
MATCH 4 :
MASCARPONE19>493.818969M(3,5.2)V
BATA17 : CDER78945 -- 1875
LEFT ERREUR - CAME BACK
MATCH 5 :
REPAR_178>748.515487M(4.5,5.5)V
CHAN1 / STEREO MIX
If you don't want to match lines that consist of spaces only, you could match either < or > on the first line and require at least one non-whitespace char \S on each of the following lines:
^[^<>\r\n]*[<>].*(?:\r?\n[^\r\n\S]*\S.*)*
The pattern will match:
^ Start of string (or of a line when the multiline flag is enabled)
[^<>\r\n]* Match 0+ times any char except <, > or a newline
[<>].* Match either < or > and the rest of the line
(?: Non capture group
\r?\n Match a newline
[^\r\n\S]* Match 0+ whitespace chars except a newline
\S.* Match a non whitespace char and the rest of the line
)* Close the group and repeat 0+ times
Regex demo
If the first line should also contain a , after matching < or >:
^[^<>\r\n]*[<>][^\r\n,]*,.*(?:\r?\n[^\r\n\S]*\S.*)*
Regex demo
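A minimal Python sketch of the first pattern above. It assumes the multiline flag so that ^ matches at the start of every line, and it uses a shortened, partly made-up sample text.

```python
import re

# First pattern from the answer above; re.MULTILINE lets ^ match at each line start.
BLOCK_PATTERN = re.compile(
    r"^[^<>\r\n]*[<>].*(?:\r?\n[^\r\n\S]*\S.*)*",
    re.MULTILINE,
)

# Shortened sample; the last line is a hypothetical line without < or >.
SAMPLE = (
    "BAR18>17M(3,5.2)V\n"
    "\n"
    "BAR19>1.243037M(3,5.2)V\n"
    "INFORMATION PROCESS\n"
    "TAKE B/F: 19V[1]\n"
    "\n"
    "LINE WITHOUT OPERATOR\n"
)

# Each match is one block: the operator line plus the non-blank lines after it.
blocks = [m.group(0) for m in BLOCK_PATTERN.finditer(SAMPLE)]

if __name__ == "__main__":
    for number, block in enumerate(blocks, start=1):
        print(f"MATCH {number}:")
        print(block)
```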

Split string on commas ignoring commas, brackets, braces in parenthesis, quotes

I am attempting to split a comma-separated list. I want to ignore commas that are inside parentheses, brackets, braces and quotes, using regex. To be more precise, I am trying to do this with PostgreSQL's POSIX regexp_split_to_array.
My knowledge of regex is not great and by searching on stack overflow I was able to get a partial solution, I can split the string if it does not contain nested parenthesis, brackets, braces. Here is the regex:
,(?![^()]*+\))(?![^{}]*+})(?![^\[\]]*+\])(?=(?:[^"]|"[^"]*")*$)
Test case:
0, (1,2), (1,2,(1,2)) [1,2,3,[1,2]], [1,2,3], "text, text (test)", {a1:1, a2:3, a3:{a1=1, s2=2}, a4:"asasad, sadsas, asasdasd"}
Here is the demo
The problem is that in, e.g., (1,2,(1,2)) the first two commas get matched when there is a nested parenthesis.
Even though regex is not the best way to go, here is a solution with recursive matching:
(?>(?>\([^()]*(?R)?[^()]*\))|(?>\[[^[\]]*(?R)?[^[\]]*\])|(?>{[^{}]*(?R)?[^{}]*})|(?>"[^"]*")|(?>[^(){}[\]", ]+))(?>[ ]*(?R))*
If we break it down, there is a group with some stuff inside, followed by more of the same kind of matching, separated by optional spaces.
(?> <---- start matching
... <---- some stuff inside
) <---- end matching
(?>
[ ]* <---- optional spaces
(?R) <---- match the entire thing again
)* <---- can be repeated
From your example 0, (1,2), (1,2,(1,2)) [1,2,3,[1,2]], [1,2,3],..., we want to match:
0
(1,2)
(1,2,(1,2)) [1,2,3,[1,2]]
[1,2,3]
...
For the third match, the stuff inside will match (1,2,(1,2)) and [1,2,3,[1,2]], which are separated by a space.
The stuff inside is a series of options:
(?>
(?>...)| <---- will match balanced ()
(?>...)| <---- will match balanced []
(?>...)| <---- will match balanced {}
(?>...)| <---- will match "..."
(?>...) <---- will match anything else without space or comma
)
Here are the options:
\( <---- literal (
[^()]* <---- any number of chars except ( or )
(?R)? <---- match the entire thing optionally
[^()]* <---- any number of chars except ( or )
\) <---- literal )
\[ <---- literal [
[^[\]]* <---- any number of chars except [ or ]
(?R)? <---- match the entire thing optionally
[^[\]]* <---- any number of chars except [ or ]
\] <---- literal ]
{ <---- literal {
[^{}]* <---- any number of chars except { or }
(?R)? <---- match the entire thing optionally
[^{}]* <---- any number of chars except { or }
} <---- literal }
" <---- literal "
[^"]* <---- any number of chars except "
" <---- literal "
[^(){}[\]", ]+ <---- one or more chars except comma, or space, or these: (){}[]"
Note that this does not match a comma-separated list, but the items in such a list. The exclusion of comma and space in the last option above causes it to stop matching at comma or space (except for space we explicitly allowed between repeated matches).
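Recursion with (?R) is not available in every regex engine; Python's built-in re module, for instance, does not support it. As a sketch of a non-regex alternative, the following walks the string once, tracks the open brackets on a stack, and splits only on top-level commas. The function name split_top_level is hypothetical, and the code assumes quotes are never escaped inside a quoted string.

```python
from typing import List


def split_top_level(text: str) -> List[str]:
    """Split on commas that are outside (), [], {} and double quotes."""
    closers = {"(": ")", "[": "]", "{": "}"}
    stack: List[str] = []      # closing brackets we are still waiting for
    in_quotes = False
    items: List[str] = []
    current: List[str] = []
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes:
            if ch in closers:
                stack.append(closers[ch])
            elif stack and ch == stack[-1]:
                stack.pop()
            elif ch == "," and not stack:
                # Top-level comma: finish the current item and skip the comma.
                items.append("".join(current).strip())
                current = []
                continue
        current.append(ch)
    items.append("".join(current).strip())
    return items


# Test case from the question.
SAMPLE_LIST = (
    '0, (1,2), (1,2,(1,2)) [1,2,3,[1,2]], [1,2,3], "text, text (test)", '
    '{a1:1, a2:3, a3:{a1=1, s2=2}, a4:"asasad, sadsas, asasdasd"}'
)

if __name__ == "__main__":
    for item in split_top_level(SAMPLE_LIST):
        print(item)
```

Like the recursive pattern, this keeps (1,2,(1,2)) [1,2,3,[1,2]] together as one item, because no top-level comma separates the two parts.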

Having at least one non-whitespace

str must be true if it has at least one non-whitespace character enclosed in the parentheses:
str = (a)
str = ( as bs)
str = (as e)
and false if it has no non-whitespace at all
str = ( )
I'm not sure whether I can use + like that, but this condition also passes when there are zero non-whitespace characters. Please correct it.
/^\([\S+\s*]+\)$/.test(str)
You can use this:
/^\(.*\S.*\)$/.test(str)
This matches any number of characters, then one non-whitespace character (which guarantees at least one), and then any number of characters up to the closing parenthesis.
You can use the following:
^\((?!\s*\)).+\)$
This matches an opening parenthesis ( and then fails if it is followed only by whitespace and a ); otherwise it accepts the entire line.
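The two patterns above can be checked against the examples from the question. This is a small Python sketch rather than the JavaScript of the question; the function names are made up for illustration.

```python
import re

# The two patterns from the answers above.
CONTAINS_NON_WHITESPACE = re.compile(r"^\(.*\S.*\)$")
NOT_ONLY_WHITESPACE = re.compile(r"^\((?!\s*\)).+\)$")


def check_with_non_whitespace(s: str) -> bool:
    """True if s is (...) with at least one non-whitespace char inside."""
    return CONTAINS_NON_WHITESPACE.search(s) is not None


def check_with_lookahead(s: str) -> bool:
    """Same check, using the negative lookahead variant."""
    return NOT_ONLY_WHITESPACE.search(s) is not None


if __name__ == "__main__":
    for sample in ("(a)", "( as bs)", "(as e)", "( )"):
        print(repr(sample), check_with_non_whitespace(sample), check_with_lookahead(sample))
```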
Assuming str must satisfy both the TRUE and the FALSE cases and nesting is not allowed:
^(?:[^()]*\([^\S()]*[^\s()][^()]*\))+[^()]*$
expanded
^
(?:
[^()]*
\(
[^\S()]*
[^\s()]
[^()]*
\)
)+
[^()]*
$