Writing a Regex pattern c++ - c++

I need help completing this regex pattern.
Here is the full string:
INSERT((1574,"Greene County, Missouri",mo,50,29,77,05000US29077,285449),geography)
Here is the portion of the string I am trying to search for using regex_search:
(1574,"Greene County, Missouri",mo,50,29,77,05000US29077,285449)
Here is my regex pattern and code:
regex pattern2("\\(|[0-9]|[a-z]|[A-Z]|\,|\"|");
regex_search (substring,matcher,pattern2);
for(auto x:matcher)
{
    substring1 = matcher.suffix().str();
    cout << substring1 << endl;
}
substring1 will output:
1574,"Greene County, Missouri",mo,50,29,77,05000US29077,285449),geography)
So not what I need. Would appreciate some help.

To match (1574,"Greene County, Missouri",mo,50,29,77,05000US29077,285449) you can use a capturing group and a negated character class.
[^*()\r\n]+\((\([^()\r\n]+\))[^()\r\n]+\)
In parts
[^*()\r\n]+ Match 1+ times any char except * ( ) or a newline
\( Match the first opening parenthesis
( Capture group 1
\( Match the second opening parenthesis
[^()\r\n]+ Match 1+ times any char except ( ) or a newline
\) Match the closing parenthesis of the second pair
) Close group 1
[^()\r\n]+ Match 1+ times any char except ( ) or a newline
\) Match the closing parenthesis of the first pair
Regex demo
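As a minimal sketch of how the pattern above could be used from C++ with std::regex_search: the helper name extract_inner_tuple is made up for illustration, and the code assumes the default ECMAScript grammar. Capture group 1 holds the inner parenthesized part.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the inner "(...)" part (capture group 1),
// or an empty string when the input does not match the pattern.
std::string extract_inner_tuple(const std::string& input) {
    // Raw string literal, so the regex backslashes need no extra escaping.
    static const std::regex pattern(
        R"re([^*()\r\n]+\((\([^()\r\n]+\))[^()\r\n]+\))re");
    std::smatch match;
    if (std::regex_search(input, match, pattern)) {
        return match[1].str();
    }
    return "";
}
```

Called with the full INSERT(...) string from the question, this should return the portion starting at (1574 and ending at 285449). Reading match[1] instead of matcher.suffix() is the main difference from the code in the question.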
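You could also make the pattern more restrictive by using repeating non-capturing groups and the allowed characters from the character classes that you intended to use. In this version, capture group 1 holds the text between the inner parentheses, without the parentheses themselves: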
[a-zA-Z0-9]+\(\(((?:[a-zA-Z0-9]+|"[a-zA-Z0-9]+(?:,? [a-zA-Z0-9]+)+")(?:,(?:[a-zA-Z0-9]+|"[a-zA-Z0-9]+(?:,? [a-zA-Z0-9]+)+"))+)\),[a-zA-Z0-9]+\)
Regex demo
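A similar sketch for the stricter pattern, again assuming std::regex with the default ECMAScript grammar (extract_fields is a hypothetical name). Here group 1 contains only the comma-separated fields.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the fields between the inner parentheses
// (capture group 1), or an empty string when the input does not match.
std::string extract_fields(const std::string& input) {
    static const std::regex pattern(
        R"re([a-zA-Z0-9]+\(\(((?:[a-zA-Z0-9]+|"[a-zA-Z0-9]+(?:,? [a-zA-Z0-9]+)+")(?:,(?:[a-zA-Z0-9]+|"[a-zA-Z0-9]+(?:,? [a-zA-Z0-9]+)+"))+)\),[a-zA-Z0-9]+\))re");
    std::smatch match;
    if (std::regex_search(input, match, pattern)) {
        return match[1].str();
    }
    return "";
}
```

Because only letters, digits, commas, spaces and double quotes are allowed between the inner parentheses, an input with other characters there (for example a *) is expected to yield an empty result.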

Related

Match group followed by group with different ending

For example, let's say I have a list of words:
words.txt
accountable
accountant
accountants
accounted
I want to match "accountant\naccountants"
I've tried /(\n\w+){2}s/, but \w+ seems to be perfectly matching different things.
My RegEx also matches the following undesirable texts:
action
actionables
actionable
actions
Am I reaching out too far in what regex can do?
You could for example use a capture group, and match a newline followed by a backreference to the same captured text and an s char.
If the first word can also be at the start of the string, instead of being preceded by a newline, you can use an anchor ^ instead.
^(\w+)\n\1s$
^ Start of string
(\w+) Capture group 1, match 1+ word chars
\n\1s Match a newline, backreference \1 to match the same text as group 1 and an s char
$ End of string
Regex demo
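The backreference idea can be tried in C++ as well. This is only a sketch: is_word_then_plural is a hypothetical helper, and it uses std::regex_match on a two-line string so that ^ and $ refer to the start and end of that string.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when `text` is exactly two lines and the
// second line is the first line followed by an "s".
bool is_word_then_plural(const std::string& text) {
    // \1 must repeat exactly what group 1 captured on the first line.
    static const std::regex pattern(R"re(^(\w+)\n\1s$)re");
    return std::regex_match(text, pattern);
}
```

For example, is_word_then_plural("accountant\naccountants") should be true, while "action\nactionables" should be rejected because the second line is not the first word plus a single s.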

Exclude curly brace matches

I have the following strings:
logger.debug('123', 123)
logger.debug(`123`,123)
logger.debug('1bc','test')
logger.debug('1bc', `test`)
logger.debug('1bc', test)
logger.debug('1bc', {})
logger.debug('1bc',{})
logger.debug('1bc',{test})
logger.debug('1bc',{ test })
logger.debug('1bc',{ test})
logger.debug('1bc',{test })
Instead of debug there can be other calls like warn, fatal etc.
All quote pairs can be "", '' or ``.
I need to create a regular expression which matches cases 1 - 5 but not 6 - 11.
That's what I've come up with:
logger.*\(['`].*['`],\s*.([^{.*}])
This also matches 8 - 11, so I suspect this part is wrong ([^{.*}]), but I don't get why.
You can try this
logger\.[^(]+\((?:"(?:\\"|[^"])*"|'(?:\\'|[^'])*'|`(?:\\`|[^`])*`),[^{}]*?\)
Regex Demo
P.S. This pattern can be shortened if we are sure there won't be any mismatched quotes, and also if there won't be any escaped quotes inside the strings.
If there are no escaped quotes:
logger\.[^(]+\((?:"[^"]*"|'[^']*'|`[^`]*`),[^{}]*?\)
If there are no quotes of any kind inside the strings, i.e. no strings like "mr's jhon":
logger\.[^(]+\(([`"'])[^"'`]*\1,[^{}]*?\)
If there are no quotes between the quoted parts, you could make use of a capturing group to match one of the quote types (['`"]) and use a backreference \1 to match the closing quote type.
The \r\n in the negated character class is to not cross newline boundaries.
The pattern will match either the quoted parts or 1+ times a word character for the first part.
The second part matches any char except { or } or ) using a negated character class.
logger\.[^(\r\n]+\((?:(['`"])[^'`"]+\1|\w+),[^{})\r\n]+\)
That will match
logger\. Match logger.
[^(\r\n]+ Match 1+ times any char except ( or a newline
\( Match (
(?: Non capture group
(['`"]) Capture group 1
[^'`"]+\1 Match 1+ times any char except the quote types, then a backreference to the captured quote
| or
\w+ Match 1+ word chars
), Close non capture group and match ,
[^{})\r\n]+ Match 1+ times any char except { } ) or a newline
\) Match )
Regex demo
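To check the last pattern against the eleven sample lines, a small C++ sketch could look like this. It assumes std::regex with the default ECMAScript grammar, and is_plain_logger_call is a hypothetical name.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true for logger calls whose first argument is a
// quoted string (or a word) and whose remaining arguments contain no
// curly braces.
bool is_plain_logger_call(const std::string& line) {
    static const std::regex pattern(
        R"re(logger\.[^(\r\n]+\((?:(['`"])[^'`"]+\1|\w+),[^{})\r\n]+\))re");
    return std::regex_search(line, pattern);
}
```

With the strings from the question, cases 1 - 5 should return true and cases 6 - 11 false, because [^{})\r\n]+ cannot consume a { or } before the closing parenthesis.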

Seeking help on Regular expression

How can I extract this string from the text using regex
text: {abcdefgh="test-name-test-name-w2-a"} 54554654654 .654654654
Expected output: test-name-test-name-w2
Note: I tried this "([^\s]*)" and the output is test-name-test-name-w2-a. But need the output as I mentioned just above.
You can try with this regex
.*\"(.*)-.*\".*
The link to regex101 is test
You could extend the negated character class to also exclude - and ". Then use a repeating pattern using the same character class preceded with a -
The value is in the first capturing group.
"([^\s-"]+(?:-[^\s-"]+)*)-[^\s-"]+"
" Match a " char
( Capture group 1
[^\s-"]+ Match 1+ times any char except - " or a whitespace char
(?: Non capturing group
-[^\s-"]+ Match - followed by 1+ times any char except - " or a whitespace char
)* Close non capturing group, repeat 0+ times
) Close capture group
-[^\s-"]+ Match - followed by 1+ times any char except - " or a whitespace char
" Match a " char
Regex101 demo
(On regex101 at the FLAVOR panel you can switch between PCRE and Golang)
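For readers who want to try the same idea in C++, here is a hedged sketch with std::regex. The helper name extract_name_prefix is made up, and the character class is written as [^\s"-] with the hyphen moved to the end. That is a cautious choice on my part to avoid depending on how an implementation treats a hyphen right after \s; it is not something the answer requires.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the quoted value without its last
// "-suffix" part (capture group 1), or an empty string on no match.
std::string extract_name_prefix(const std::string& text) {
    // Same structure as the pattern above; only the position of the
    // hyphen inside the character class differs.
    static const std::regex pattern(
        R"re("([^\s"-]+(?:-[^\s"-]+)*)-[^\s"-]+")re");
    std::smatch match;
    if (std::regex_search(text, match, pattern)) {
        return match[1].str();
    }
    return "";
}
```

On the sample text from the question, the repeated group first consumes every -part and then backtracks by one so that the final -[^\s"-]+" can match -a", which leaves test-name-test-name-w2 in group 1.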
Update
To match where the word test is present and not for example test1 you could use a negative lookahead (?![^"\s]*\btest\w) to assert no presence of test followed by a word character.
""(?![^"\s]*\btest\w)([^\s-"]+(?:-[^\s-"]+)*)-[^\s-"]+""
Regex demo

Regex (PCRE) exclude certain words from match result

I need to get only the string with names that is in Bold:
author={Trainor, Sarah F and Calef, Monika and Natcher, David and Chapin, F Stuart and McGuire, A David and Huntington, Orville and Duffy, Paul and Rupp, T Scott and DeWilde, La'Ona and Kwart, Mary and others},
Is there a way to skip all 'and' 'others' words from match result?
I tried lots of things, but nothing works as I expect:
(?<=\{).+?(?<=and\s).+(?=\})
Instead of using omission, you could be better off by implementing rules which expect a specific format in order to match the examples you've provided:
([A-Z]+[A-Za-z]*('[A-Za-z]+)*, [A-Z]? ?[A-Z]+[A-Za-z]*('[A-Za-z]+)*( [A-Z])?)
https://regex101.com/r/9LGqn3/3
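The format-based pattern above uses only constructs that std::regex also accepts, so it can be sketched in C++ by iterating over all matches. The function name extract_author_names is hypothetical, and the sketch assumes names always follow the "Last, First" shapes shown in the example.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every "Last, First [Initial]" style name
// found in a BibTeX-like author field.
std::vector<std::string> extract_author_names(const std::string& field) {
    static const std::regex name_pattern(
        R"re(([A-Z]+[A-Za-z]*('[A-Za-z]+)*, [A-Z]? ?[A-Z]+[A-Za-z]*('[A-Za-z]+)*( [A-Z])?))re");
    std::vector<std::string> names;
    // Each iterator step yields the next non-overlapping match.
    for (std::sregex_iterator it(field.begin(), field.end(), name_pattern), end;
         it != end; ++it) {
        names.push_back(it->str());
    }
    return names;
}
```

Since every name has to start with an uppercase letter, the lowercase words and and others are never part of a match, which is what lets this approach skip them without lookarounds.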
You could make use of \G and a capturing group to get you the matches.
The values are in capturing group 1.
(?:author={|\G(?!^))([^\s,]+,(?:\h+[^\s,]+)+)\h+and\h+(?=[^{}]*\})
About the pattern
(?: Non capturing group
author={ Match literally
| Or
\G(?!^) Assert position at the end of previous match, not at the start
) Close non capturing group
( Capture group 1
[^\s,]+, Match 1+ times any char except a whitespace char or a comma, then match a comma
(?:\h+[^\s,]+)+ Repeat 1+ times matching 1+ horizontal whitespace chars followed by matching any char except a whitespace char and a comma
) Close group 1
\h+and\h+ Match and between 1+ horizontal whitespaces
(?=[^{}]*\}) Assert what is on the right is a closing }
Regex demo

Regex to remove all parentheses except most external ones

I have been trying and reading many similar SO answers with no luck.
I need to remove parentheses in the text inside parentheses keeping the text. Ideally with 1 regex... or maybe 2?
My text is:
Alpha (Bravo( Charlie))
I want to achieve:
Alpha (Bravo Charlie)
The best I got so far is:
\\(|\\)
but it gets:
Alpha Bravo Charlie
You can use a regex like this:
(\(.*?)\((.*?)\)
With this replacement string:
$1$2
Regex demo
Update: as per ııı comment, since I don't know your full sample text I provide this regex in case you have this scenario
(\([^)]*)\((.*?)\)
Regex demo
From your post and comments, it seems you want to remove only the innermost parentheses, for which you can use the following regex:
\(([^()]*)\)
And replace with $1 or \1, depending on your language.
In this regex, \( matches an opening parenthesis and \) matches a closing parenthesis. ([^()]*) ensures the captured text contains neither ( nor ), which guarantees it is the innermost pair, and places that text in group 1. The whole match is then replaced by the text captured in group 1, which removes the innermost parentheses and keeps the text inside them as it is.
Demo
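In C++, the innermost-pair replacement could be sketched with std::regex_replace, which accepts $1 in its format string. The helper name unwrap_innermost_parens is an assumption made for this example.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: removes every pair of parentheses that contains
// no other parentheses, keeping the text inside.
std::string unwrap_innermost_parens(const std::string& text) {
    static const std::regex innermost(R"re(\(([^()]*)\))re");
    // "$1" inserts the text captured between the parentheses.
    return std::regex_replace(text, innermost, "$1");
}
```

For Alpha (Bravo( Charlie)) this gives Alpha (Bravo Charlie). Note that a string with a single, non-nested pair such as Alpha (Bravo) would lose that pair too, since it also counts as innermost.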
Your pattern \(|\) uses an alternation, so it will match either an opening or a closing parenthesis.
If, according to the comments, there is only 1 pair of nested parentheses, you could match:
(\([^()]*)\(([^()]*\)[^()]*)\)
( Start capturing group 1
\( Match opening parenthesis
[^()]* Match 0+ times not ( or )
) Close group 1
\( Match the inner opening parenthesis
( Capturing group 2
[^()]*\) Match 0+ times not ( or ), then the inner closing parenthesis
[^()]* Match 0+ times not ( or )
) Close capturing group 2
\) Match closing parenthesis
And replace with the first and the second capturing group.
Regex demo
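A short C++ sketch of this two-group replacement, assuming std::regex_replace with the format string $1$2 (flatten_one_nested_pair is a hypothetical name):

```cpp
#include <regex>
#include <string>

// Hypothetical helper: rewrites "(outer(inner))" as "(outer inner)" style
// text by keeping groups 1 and 2 and dropping the other two parentheses.
std::string flatten_one_nested_pair(const std::string& text) {
    static const std::regex nested(
        R"re((\([^()]*)\(([^()]*\)[^()]*)\))re");
    return std::regex_replace(text, nested, "$1$2");
}
```

Unlike the innermost-pair approach, a string without nesting such as Alpha (Bravo) is left unchanged here. Group 2 keeps the inner closing parenthesis and the final one is dropped, so text that follows the inner pair would end up outside the parentheses; the sketch is only meant for inputs shaped like the one in the question.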