How can I extract this string from the text using regex?
text: {abcdefgh="test-name-test-name-w2-a"} 54554654654 .654654654
Expected output: test-name-test-name-w2
Note: I tried "([^\s]*)" and the output is test-name-test-name-w2-a, but I need the output shown just above.
You can try this regex:
.*\"(.*)-.*\".*
Test it on regex101.
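A quick way to check this pattern is Python's re module (a sketch, using the sample text from the question). The greedy (.*) backtracks to the last - that is still followed by a closing double quote, which is why group 1 stops before -a:

```python
import re

text = '{abcdefgh="test-name-test-name-w2-a"} 54554654654 .654654654'

# Pattern from the answer: the greedy (.*) gives back characters until it
# reaches the last "-" that is still followed by a closing double quote.
pattern = r'.*\"(.*)-.*\".*'

match = re.search(pattern, text)
result = match.group(1) if match else None
print(result)  # test-name-test-name-w2
```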
You could extend the negated character class to also exclude - and ", and then use a repeating group with the same character class preceded by a -.
The value is in the first capturing group.
"([^\s-"]+(?:-[^\s-"]+)*)-[^\s-"]+"
" Match a " char
( Capture group 1
[^\s-"]+ Match 1+ times any char except - " or a whitespace char
(?: Non capturing group
-[^\s-"]+ Match - followed by 1+ times any char except - " or a whitespace char
)* Close non capturing group, repeat 0+ times
) Close capture group
-[^\s-"]+ Match - followed by 1+ times any char except - " or a whitespace char
" Match a " char
Regex101 demo
(On regex101, you can switch between PCRE and Golang in the FLAVOR panel.)
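The same negated-class idea can be sketched with Python's re module, assuming Python is acceptable for trying it out. One adjustment is needed: Python's re raises a "bad character range" error for [^\s-"] because it reads \s-" as a range, so the hyphen is moved to the end of the class ([^\s"-]), where PCRE and Go also treat it as a literal. The helper name extract_name is hypothetical.

```python
import re

text = '{abcdefgh="test-name-test-name-w2-a"} 54554654654 .654654654'

# Same structure as the answer's pattern; the "-" sits at the end of each
# class so Python's re accepts it as a literal hyphen.
pattern = re.compile(r'"([^\s"-]+(?:-[^\s"-]+)*)-[^\s"-]+"')

def extract_name(s):
    """Return capture group 1: the quoted value without its last -segment."""
    m = pattern.search(s)
    return m.group(1) if m else None

print(extract_name(text))  # test-name-test-name-w2
```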
Update
To exclude values where test is followed by another word character, for example test1, you could use a negative lookahead (?![^"\s]*\btest\w) to assert that there is no test followed by a word character.
"(?![^"\s]*\btest\w)([^\s-"]+(?:-[^\s-"]+)*)-[^\s-"]+"
Regex demo
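A Python sketch of the update, written with a single double quote at each end and with the hyphen moved to the end of each class so that Python's re accepts it (extract_name is again a hypothetical helper). The second sample string is made up to show a test1 value being rejected:

```python
import re

# Negative lookahead: fail at this quote if any "test" inside the value
# is directly followed by a word character (e.g. "test1").
pattern = re.compile(r'"(?![^"\s]*\btest\w)([^\s"-]+(?:-[^\s"-]+)*)-[^\s"-]+"')

def extract_name(s):
    m = pattern.search(s)
    return m.group(1) if m else None

print(extract_name('{abcdefgh="test-name-test-name-w2-a"}'))   # test-name-test-name-w2
print(extract_name('{abcdefgh="test1-name-test-name-w2-a"}'))  # None
```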
Is there a regex to match all spaces that separate key=value pairs, while ignoring spaces inside double quotes?
Sample:
key1=value1 key1=value1 spaces="some spaces in text" nested1="key2=value2 key2=value2 key2=value2" nested2="key2=value2, key2=value2, key2=value2" quoted="his name is \"no body\""
This is what I have come up with so far: (?<!,) (?=\w+=), but of course it doesn't work.
[^\s="]+\s*=\s*(?:"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^\s="]+)\K[ \t]+
PCRE demo
There is no need to write anything back; it just matches the space delimiters,
so you can replace them with a new delimiter.
([^\s="]+\s*=\s*(?:"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^\s="]+))([ \t]+)
Python demo
You can write back \1 or \2 if needed, so this version can also replace the delimiters with a new one.
Note: the part of the above expressions that matches the field info could benefit from being wrapped in an atomic group (?>...), but that is not strictly necessary as the field structure is fairly concise.
There are other options to guarantee integrity as well, such as matching every character with the \G anchor, if it is available.
Let me know if you need this approach; there are many ways to go here.
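A minimal Python sketch of the capture-group version, writing back group 1 and replacing the delimiter with | (an arbitrary choice). It uses [^\s="]+ for the unquoted value: with [^\s=]+, the last field of the sample (which has no trailing space) lets that alternative match "his, so the space inside the quotes would be replaced too.

```python
import re

sample = r'key1=value1 key1=value1 spaces="some spaces in text" nested1="key2=value2 key2=value2 key2=value2" nested2="key2=value2, key2=value2, key2=value2" quoted="his name is \"no body\""'

# Group 1: key = value, where a quoted value may contain escaped chars like \"
# Group 2: the run of spaces/tabs after the pair (the delimiter)
pattern = re.compile(r'([^\s="]+\s*=\s*(?:"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^\s="]+))([ \t]+)')

# Keep the pair, swap the delimiter for "|"
result = pattern.sub(r'\1|', sample)
print(result)
```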
Here is another option:
".*?(?<!\\)"(*SKIP)(*F)| +
See the online demo
Please let me know if it actually does what is required, as I'm not sure. Anyway, here is a breakdown:
" - A literal double quote.
.*? - Anything but newline zero or more times but lazy.
(?<!\\) - A negative lookbehind asserting that the next char is not preceded by a \.
" - A literal double quote.
(*SKIP)(*F) - Force a failure and resume the search after the quoted string, so spaces inside quotes are never matched.
| - Alternation.
A space followed by + - One or more space characters.
If you are using Python, you'll need the regex module from PyPI, because the built-in re module does not support (*SKIP)(*F).
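Since Python's built-in re has no (*SKIP)(*F), a common stdlib substitute (a different technique from the answer's verbs) is to match quoted strings in one branch, capture spaces in another, and only replace when the space group took part in the match. A sketch reusing the answer's quoted-string pattern on a shortened sample; replace_delimiters and the | delimiter are hypothetical choices:

```python
import re

sample = r'key1=value1 spaces="some spaces in text" quoted="his name is \"no body\""'

# Branch 1 matches a quoted string whose closing quote is not preceded by "\".
# Branch 2 captures a run of spaces; only those get replaced.
pattern = re.compile(r'".*?(?<!\\)"|( +)')

def replace_delimiters(s, new_delim="|"):
    # Quoted strings are returned unchanged; captured spaces become new_delim.
    return pattern.sub(lambda m: new_delim if m.group(1) is not None else m.group(0), s)

print(replace_delimiters(sample))
# key1=value1|spaces="some spaces in text"|quoted="his name is \"no body\""
```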
You could do that with the following PCRE-compatible regular expression.
\G[^" \n]*(?:(?<!\\)"(?:[^\n"]|(?<=\\)")*(?<!\\)"[^" \n]*)*\K +
Start your engine!
\G : assert position at the end of the previous match
or the start of the string for the first match
[^" \n]* : match 0+ chars other than those in char class
(?: : begin non-capture group
(?<!\\) : use negative lookbehind to assert next char is not
preceded by a backslash
" : match double-quote
(?: : begin non-capture group
[^"\n] : match a char other than those in char class
| : or
(?<=\\) : use positive lookbehind to assert next char is
preceded by a backslash
" : match double-quote
) : end non-capture group
* : match non-capture group 0+ times
(?<!\\) : use negative lookbehind to assert next char is not
preceded by a backslash
" : match double-quote
[^" \n]* : match 0+ chars other than those in char class
) : end non-capture group
* : match non-capture group 0+ times
\K : forget everything matched so far and reset start of match
\ + : match 1+ spaces
I have the following strings:
logger.debug('123', 123)
logger.debug(`123`,123)
logger.debug('1bc','test')
logger.debug('1bc', `test`)
logger.debug('1bc', test)
logger.debug('1bc', {})
logger.debug('1bc',{})
logger.debug('1bc',{test})
logger.debug('1bc',{ test })
logger.debug('1bc',{ test})
logger.debug('1bc',{test })
Instead of debug there can be other calls like warn, fatal etc.
All quote pairs can be "", '' or ``.
I need to create a regular expression which matches cases 1-5 but not 6-11.
This is what I've come up with:
logger.*\(['`].*['`],\s*.([^{.*}])
This also matches 8-11, so I suspect this part ([^{.*}]) is wrong, but I don't understand why.
You can try this:
logger\.[^(]+\((?:"(?:\\"|[^"])*"|'(?:\\'|[^'])*'|`(?:\\`|[^`])*`),[^{}]*?\)
Regex Demo
P.S.: This pattern can be shortened if we are sure there won't be any mismatched quotes or escaped quotes inside the string.
If there are no escaped quotes:
logger\.[^(]+\((?:"[^"]*"|'[^']*'|`[^`]*`),[^{}]*?\)
If there are no quotes inside the string, i.e. no strings like "mr's jhon":
logger\.[^(]+\(([`"'])[^"'`]*\1,[^{}]*?\)
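A Python sketch that runs the first (escape-aware) pattern against the eleven lines from the question and reports which case numbers match; the triple-quoted raw string only avoids escaping the quotes in the pattern:

```python
import re

cases = [
    "logger.debug('123', 123)",
    "logger.debug(`123`,123)",
    "logger.debug('1bc','test')",
    "logger.debug('1bc', `test`)",
    "logger.debug('1bc', test)",
    "logger.debug('1bc', {})",
    "logger.debug('1bc',{})",
    "logger.debug('1bc',{test})",
    "logger.debug('1bc',{ test })",
    "logger.debug('1bc',{ test})",
    "logger.debug('1bc',{test })",
]

# First argument: a "", '' or `` string (escaped quotes allowed);
# rest of the call: anything except { or } up to the closing parenthesis.
pattern = re.compile(r'''logger\.[^(]+\((?:"(?:\\"|[^"])*"|'(?:\\'|[^'])*'|`(?:\\`|[^`])*`),[^{}]*?\)''')

matching = [i for i, line in enumerate(cases, start=1) if pattern.search(line)]
print(matching)  # [1, 2, 3, 4, 5]
```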
If there are no quotes between the quoted parts, you could make use of a capturing group to match one of the quote types (['`"]) and use a backreference \1 to match the closing quote type.
The \r\n in the negated character classes prevents the match from crossing newline boundaries.
For the first argument, the pattern matches either a quoted part or 1+ word characters.
The second part matches any char except { or } or ) using a negated character class.
logger\.[^(\r\n]+\((?:(['`"])[^'`"]+\1|\w+),[^{})\r\n]+\)
That will match
logger\. Match logger.
[^(\r\n]+ Match 1+ times any char except ( or a newline
\( Match (
(?: Non capture group
(['`"]) Capture group 1
[^'`"]+\1 Match 1+ times any char except the quote types, then a backreference to the captured quote type
| or
\w+ Match 1+ word chars
), Close non capture group and match ,
[^{})\r\n]+ Match 1+ times any char except { } ) or a newline
\) Match )
Regex demo
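A Python sketch of the backreference version. The first two lines are cases 1 and 8 from the question; the last two are hypothetical inputs showing another method name with double quotes, and the \w+ branch for a bare identifier:

```python
import re

# Group 1 captures the opening quote type; \1 requires the same type to close.
pattern = re.compile(r'''logger\.[^(\r\n]+\((?:(['`"])[^'`"]+\1|\w+),[^{})\r\n]+\)''')

lines = [
    "logger.debug('123', 123)",    # case 1: should match
    "logger.debug('1bc',{test})",  # case 8: should not match
    'logger.warn("abc", 42)',      # hypothetical: other method, double quotes
    'logger.fatal(msg, err)',      # hypothetical: bare identifier (\w+ branch)
]

results = [bool(pattern.search(line)) for line in lines]
print(results)  # [True, False, True, True]
```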
I need help completing this regex pattern.
Here is the full string:
INSERT((1574,"Greene County, Missouri",mo,50,29,77,05000US29077,285449),geography)
Here is the portion of the string I am trying to search for using regex_search:
(1574,"Greene County, Missouri",mo,50,29,77,05000US29077,285449)
Here is my regex pattern and code:
regex pattern2("\\(|[0-9]|[a-z]|[A-Z]|\,|\"|");
regex_search (substring,matcher,pattern2);
for(auto x:matcher)
{
substring1 = matcher.suffix().str();
cout << substring1 << endl;
}
This outputs:
1574,"Greene County, Missouri",mo,50,29,77,05000US29077,285449),geography)
That is not what I need. I would appreciate some help.
To match (1574,"Greene County, Missouri",mo,50,29,77,05000US29077,285449) you can use a capturing group and a negated character class.
[^*()\r\n]+\((\([^()\r\n]+\))[^()\r\n]+\)
In parts
[^*()\r\n]+ Match 1+ times any char except the listed
\( Match first opening parenthesis
( Capture group 1
\( Match second opening parenthesis
[^()\r\n]+ Match 1+ times any char except the listed
\) Match the second closing parenthesis
) Close group 1
[^()\r\n]+ Match 1+ times any char except the listed
\) Match the final closing parenthesis
Regex demo
You could also make the pattern more restrictive by using repeating non-capturing groups and only the allowed characters from the character classes you intended to use:
[a-zA-Z0-9]+\(\(((?:[a-zA-Z0-9]+|"[a-zA-Z0-9]+(?:,? [a-zA-Z0-9]+)+")(?:,(?:[a-zA-Z0-9]+|"[a-zA-Z0-9]+(?:,? [a-zA-Z0-9]+)+"))+)\),[a-zA-Z0-9]+\)
Regex demo
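Both patterns can be checked with Python's re (a sketch; Python is used here rather than C++ for brevity). Note the different group 1: the first pattern keeps the inner parentheses, the second does not. In the question's C++ code, assuming matcher is a std::smatch, the captured text would be read with matcher[1].str() rather than from matcher.suffix().

```python
import re

s = 'INSERT((1574,"Greene County, Missouri",mo,50,29,77,05000US29077,285449),geography)'

# Negated-class version: group 1 includes the inner parentheses.
loose = re.compile(r'[^*()\r\n]+\((\([^()\r\n]+\))[^()\r\n]+\)')

# Stricter version: group 1 is the comma-separated list without parentheses.
strict = re.compile(r'[a-zA-Z0-9]+\(\(((?:[a-zA-Z0-9]+|"[a-zA-Z0-9]+(?:,? [a-zA-Z0-9]+)+")(?:,(?:[a-zA-Z0-9]+|"[a-zA-Z0-9]+(?:,? [a-zA-Z0-9]+)+"))+)\),[a-zA-Z0-9]+\)')

print(loose.search(s).group(1))
print(strict.search(s).group(1))
```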
I need to get the LDAP group names from this example string:
"user.ldap.groups.name" = "M-Role13" AND ("user.ldap.groups.name"= "M Role1" OR "user.ldap.groups.name" = "M.Group-Role16" OR "user.ldap.groups.name"="Admin Role" ) AND "common.platform" = "iOS" AND ( AND "ios.PersonalHotspotEnabled" = true ) AND "common.retired" = False
I'm using this regex to match the parts of the string that contain an LDAP group:
("user\.ldap\.groups\.name"?.=.?".+?(.*?)")(?!"user\.ldap\.groups\.name")
but group 2 matches the name without its first character.
https://regex101.com/r/2Aby6K/1
A few notes about the pattern you tried:
It misses the first character because .+? requires at least one character, so the first character of the name is consumed before group 2 starts.
Note that the part "?.=.?" matches an optional ", any single character (the dot), an equals sign, an optional single character (the second dot is optional) and then ".
The part (.*?)")(?!"user\.ldap\.groups\.name") uses a non-greedy .*?, which matches as little as possible but keeps expanding until it reaches a " that is not directly followed by "user.ldap.groups.name", so it can run past the closing quote of a value. See an example of an incorrect match.
What you might do is use a negated character class:
"user\.ldap\.groups\.name"\s*=\s*"([^"]+)"
In parts
"user\.ldap\.groups\.name" Match "user.ldap.groups.name" literally
\s*=\s* Match = between 0+ whitespace chars on the left and right
"( Match " and start capturing group
[^"]+ Match any char except " 1+ times
)" Close group and match "
Regex demo
Or if you want to include the negative lookahead:
"user\.ldap\.groups\.name"\s*=\s*"([^"]+)"(?!"user\.ldap\.groups\.name")
Regex demo
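A Python sketch that applies the negated-class pattern to the sample string; findall returns the contents of group 1 for each match:

```python
import re

s = ('"user.ldap.groups.name" = "M-Role13" AND ("user.ldap.groups.name"= "M Role1" OR '
     '"user.ldap.groups.name" = "M.Group-Role16" OR "user.ldap.groups.name"="Admin Role" ) '
     'AND "common.platform" = "iOS" AND ( AND "ios.PersonalHotspotEnabled" = true ) '
     'AND "common.retired" = False')

# [^"]+ stops at the closing quote, so each value is captured whole.
pattern = re.compile(r'"user\.ldap\.groups\.name"\s*=\s*"([^"]+)"')

names = pattern.findall(s)
print(names)  # ['M-Role13', 'M Role1', 'M.Group-Role16', 'Admin Role']
```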
I have the following regex:
(href[\s]?=[\s]?)(\"[^"]*\/*[^"]*\")
using the following Test String:
href="http://mysite.io/Plan-documents"
I get two capturing groups: one with href= and the other with everything after it. Now I want to only display matches where there is an uppercase letter anywhere in the second capture group. I tried:
(href[\s]?=[\s]?)(\"[A-Z]*[^"]*\/*[^"]*\")
to make this regex only return URLs that contain uppercase letters. No luck: even if I modify the test string to:
href="http://mysite.io/plan-documents"
I still get a match. I only want to match the href string if there is at least one uppercase letter in the string after href=.
Thanks.
You don't get the right matches because in your second capturing group everything between the double quotes uses the * quantifier, which matches 0 or more times.
First the engine tries [A-Z]*. There is no uppercase char at that position, but that is fine because of the 0+ quantifier. Then [^"]* matches until right before the next ".
The following \/* matches nothing, which is also fine because of the 0+ quantifier, and the final [^"]* matches nothing as well.
What you might do instead is first match chars that are not uppercase until you reach an uppercase char, and then match until the closing double quote.
(href\s?=\s?)("[^A-Z\s]*[A-Z][^\s"]*")
Explanation
(href\s?=\s?) Capture group 1, match href= with an optional whitespace char before and after the =
(" Start capture group and match "
[^A-Z\s]* Match 0+ times not an uppercase or whitespace char
[A-Z] Match 1 uppercase char
[^"\s]* Match 0+ times not " or a whitespace char
") Match " and close capture group
Regex demo
Without using groups, you could use:
href\s?=\s?"[^A-Z\s]*[A-Z][^\s"]*"
Regex demo
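A Python sketch checking both versions against the two test strings from the question:

```python
import re

# [^A-Z\s]* runs up to the first uppercase char, [A-Z] requires one,
# then [^\s"]* continues to the closing double quote.
grouped = re.compile(r'(href\s?=\s?)("[^A-Z\s]*[A-Z][^\s"]*")')
plain = re.compile(r'href\s?=\s?"[^A-Z\s]*[A-Z][^\s"]*"')

upper = 'href="http://mysite.io/Plan-documents"'
lower = 'href="http://mysite.io/plan-documents"'

print(grouped.search(upper).group(2))  # "http://mysite.io/Plan-documents"
print(grouped.search(lower))           # None
print(plain.search(upper).group(0))    # href="http://mysite.io/Plan-documents"
```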