I am trying to write a regex to fetch the value of banana from the list of text below.
For each line, I would like to get whatever comes after banana= and have it stop at | if one exists.
apple=1|banana=2.5|oranges=1
banana=2.5|apple=1|oranges=1
apple=1|oranges=1|banana=2.5
apple=1|oranges=1|banana=-2.5
banana=2.5
I got as far as writing (?i)banana=(.*) but of course it gets everything after the exact match.
Do you guys have any solutions?
Thanks!
I would like to be able to get whatever comes after banana= and have it stop at | if it exists.
You may use a negated character class instead of a greedy dot pattern:
(?i)banana=([^|]*)
The greedy dot, .*, matches any 0+ chars other than line break chars (in NFA engines) as many as possible (usually, up to the end of the line).
If you use [^|], a negated character class, it will match any char but |.
Pattern details
(?i) - case insensitive modifier
banana= - a literal substring (prepend with \b to match it as a whole word)
([^|]*) - Capturing group 1: any 0+ chars other than | (to avoid empty matches, replace * with + quantifier).
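As a quick check of the pattern above, here is a minimal Python sketch that runs it against the sample lines from the question. The helper name banana_value is only illustrative, and the sketch assumes Python's re module rather than whatever engine the asker uses.

```python
import re

# Pattern from the answer: case-insensitive "banana=", then capture
# everything up to the next "|" (or the end of the line).
BANANA = re.compile(r"(?i)banana=([^|]*)")

lines = [
    "apple=1|banana=2.5|oranges=1",
    "banana=2.5|apple=1|oranges=1",
    "apple=1|oranges=1|banana=2.5",
    "apple=1|oranges=1|banana=-2.5",
    "banana=2.5",
]


def banana_value(line):
    """Return the text after banana= (hypothetical helper), or None if absent."""
    m = BANANA.search(line)
    return m.group(1) if m else None


values = [banana_value(line) for line in lines]
print(values)  # ['2.5', '2.5', '2.5', '-2.5', '2.5']
```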
Related
I have 2 variants of strings:
some_prefix.needed part*some_suffix
some_prefix.needed part
I need only 'needed part' to be matched.
Left boundary is always dot.
Right boundary is asterisk (if exists) or end of line.
Already tried:
/.*[.](.*)[*].*/ - is working for first case
/.*[.](.*)/ - is working for second case
How to do the same with one regex?
You can use
/\.([^*]+)/
Details
\. - a dot
([^*]+) - Group 1: any one or more chars other than a *.
You can also make sure you get the rightmost match by using .* before the pattern (as in the original regex):
/.*\.([^*]+)/
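To see the difference between the two patterns, here is a small Python sketch (the slashes are regex delimiters and are dropped). The third input, with several dots, is an assumption added for illustration and is not part of the question.

```python
import re

first = re.compile(r"\.([^*]+)")        # capture after the first dot
rightmost = re.compile(r".*\.([^*]+)")  # greedy .* pushes the match to the last dot

samples = ["some_prefix.needed part*some_suffix", "some_prefix.needed part"]
parts = [first.search(s).group(1) for s in samples]
print(parts)  # ['needed part', 'needed part']

# Hypothetical input with more than one dot before the needed part.
multi = "a.b.needed part*some_suffix"
from_first = first.search(multi).group(1)
from_rightmost = rightmost.search(multi).group(1)
print(from_first)      # b.needed part
print(from_rightmost)  # needed part
```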
If supported, you might also use a lookbehind to assert a . to the left.
(?<=\.)[^*]+
The pattern matches:
(?<=\.) Positive lookbehind, assert . directly to the left
[^*]+ Match 1+ times any char except * using a negated character class
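With the lookbehind variant there is no capture group, so the whole match is the value. A minimal Python sketch, assuming an engine with lookbehind support (Python's re handles fixed-width lookbehinds like this one):

```python
import re

AFTER_DOT = re.compile(r"(?<=\.)[^*]+")

samples = ["some_prefix.needed part*some_suffix", "some_prefix.needed part"]

# group(0) is the full match; the dot is asserted but not consumed.
matches = [AFTER_DOT.search(s).group(0) for s in samples]
print(matches)  # ['needed part', 'needed part']
```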
I have a regex
[a-zA-Z][a-z]
I have to change this regex so that it does not accept strings that start with "de", "DE", "dE" or "De". I cannot use lookbehind or lookahead because my system does not support them.
There's a solution without a lookahead or lookbehind, but you need to be able to use groups.
The idea there is to create a sort of "honeypot" that will match your negative results and keep only the results that do interest you.
In your case, that would write:
[dD][eE].*|(<your-regex>)
If the proposition is de<anything> (case insensitive here), it will match, but group(1) will be null.
On the other hand, an input such as diZ would not match the part before the | and would therefore fall into group(1).
Finally, if the proposition doesn't start with de and doesn't match your regex, well, there will be no groups to get at all.
If you need to be sure that your proposition will match the whole provided string, you can update the regex thus:
^(?:[dD][eE].*|(<your-regex>))$
Note that ?: is not a lookahead of any kind, it serves to mark the group as non-capturing, so that <your-regex> will still be captured by group(1) (would become group(2) otherwise and the capture of a group is not always a transparent operation, performance-wise).
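Here is a rough Python sketch of the "honeypot" idea, with the question's [a-zA-Z][a-z] substituted for <your-regex>. The classify helper and its three labels are made up for illustration; the only point is to show the three outcomes described above.

```python
import re

# The honeypot alternative comes first; group 1 is only set by the second one.
HONEYPOT = re.compile(r"^(?:[dD][eE].*|([a-zA-Z][a-z]))$")


def classify(s):
    """Hypothetical helper: report which of the three outcomes applies."""
    m = HONEYPOT.match(s)
    if m is None:
        return "no match"   # neither the honeypot nor the real regex matched
    if m.group(1) is None:
        return "rejected"   # caught by the [dD][eE].* honeypot
    return "accepted"       # captured by the real regex in group 1


results = {s: classify(s) for s in ["de", "DEfoo", "di", "ab", "a1", "abc"]}
print(results)
```

With the anchored version, "abc" gives no match at all, because [a-zA-Z][a-z] only covers two characters.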
Simply ignore those characters:
[a-ce-z][a-df-z][a-gi-kwxyzWZXZ]
Make sure the flag is set to case insensitive. Also, [a-gi-kwxyzWZXZ] can then be modified to [a-gi-kwxyz].
EDIT:
As pointed out in this comment, the regex here won't support other words that start with d but are not followed by e. In this case, negative lookahead is a possible solution:
^(?!de)[a-z]+
This matches anything not starting with "DE" (case insensitive, without look arounds, allowing leading whitespace):
^ *+(?:[^Dd].|.[^Ee])<your regex for rest of input>
The possessive quantifier *+ used for whitespace prevents [^Dd] from being allowed to match a space via backtracking, making this regex hardened against leading spaces.
You can use an alternation excluding matching the d and D from the first character, or exclude matching the e as the second character.
Note that the pattern [a-zA-Z][a-z] matches at least 2 characters, so will the following pattern:
^(?:[abce-zABCE-Z][a-z]|[a-zA-Z][a-df-z]).*
^ Start of string
(?: Non capture group
[abce-zABCE-Z][a-z] Match a char a-zA-Z without d and D followed by a lowercase char a-z
| or
[a-zA-Z][a-df-z] Match a char a-zA-Z followed by a lowercase char a-z without e
) Close non capture group
.* Match 0+ times any char except a newline
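A short Python sketch to try the alternation above on a few words. The word list is an assumption chosen to cover each branch; no lookarounds are involved.

```python
import re

NOT_DE = re.compile(r"^(?:[abce-zABCE-Z][a-z]|[a-zA-Z][a-df-z]).*")

# Hypothetical test words: the last four start with some casing of "de".
words = ["apple", "door", "Dx", "dear", "De", "dE", "DEF"]

allowed = [w for w in words if NOT_DE.match(w)]
print(allowed)  # ['apple', 'door', 'Dx']
```

Note that dE and DEF are rejected partly because the second character must be lowercase, as in the original [a-zA-Z][a-z].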
Another option is to use word boundaries \b instead of an anchor ^
\b(?:[abce-zABCE-Z][a-z]|[a-zA-Z][a-df-z])[a-zA-Z]*\b
I have a text in which I want to get only the hexadecimal codes.
Like: "thisissometextthisistext\x64\x6f\x6e\x74\x74\x72\x61\x6e\x73\x6c\x61\x74\x65somemoretextoverhere"
It's possible to get the hex codes with \x..
But it doesn't seem that I can do something like (^\x..) to select everything but the hex codes.
Any workarounds?
You may use a (?s)((?:\\x[a-fA-F0-9]{2})+)|. regex (that will match and capture into Group 1 any 1+ sequences of hex values OR will just match any other char including a line break char) and replace with a conditional replacement pattern (?{1}$1\n:) (that will reinsert the hex value chain or will replace the match with an empty string):
Find What: (?s)((?:\\x[a-fA-F0-9]{2})+)|.
Replace With: (?{1}$1\n:)
Regex Details:
(?s) - same as . matches newline option ON
((?:\\x[a-fA-F0-9]{2})+) - Group 1 capturing one or more sequences of
\\x - a literal \x (a backslash followed by x)
[a-fA-F0-9]{2} - 2 letters from a to f or digits
| - or
. - any single char.
Replacement pattern:
(?{1} - if Group 1 matches:
$1\n - replace with its contents + a newline
: - else replace with an empty string
) - end of the replacement pattern.
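The (?{1}...:...) conditional replacement is specific to the editor's regex engine. As a rough equivalent elsewhere, here is a Python sketch that uses a replacement function instead, since Python's re module has no conditional replacement syntax. The function names are illustrative only.

```python
import re

# Same find pattern: a run of \xhh sequences in group 1, or any single char.
HEX_OR_ANY = re.compile(r"(?s)((?:\\x[a-fA-F0-9]{2})+)|.")


def replacement(m):
    # Group 1 matched: keep the hex run plus a newline; otherwise drop the char.
    return m.group(1) + "\n" if m.group(1) else ""


def keep_hex(text):
    """Hypothetical helper mirroring the find/replace pair above."""
    return HEX_OR_ANY.sub(replacement, text)


text = r"thisissometextthisistext\x64\x6f\x6e\x74\x74\x72\x61\x6e\x73\x6c\x61\x74\x65somemoretextoverhere"
result = keep_hex(text)
print(result)
```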
Try ^.*?((\\x[a-f0-9]{2})+).*$ and replace with $1. After the replace, only the hex codes should be left.
If you are already able to find the hex codes with your regex, couldn't you just use that information to delete all of the hex codes from the string (or from a copy of the string if you need to preserve the original)? You would then be left with all the text except the hex codes.
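That suggestion can be sketched in a few lines of Python. This assumes the hex codes appear as literal backslash-x sequences in the text, as in the question's example.

```python
import re

HEX = re.compile(r"\\x[a-fA-F0-9]{2}")

text = r"thisissometextthisistext\x64\x6f\x6e\x74\x74\x72\x61\x6e\x73\x6c\x61\x74\x65somemoretextoverhere"

# Strings are immutable in Python, so the original text is left untouched.
without_hex = HEX.sub("", text)
print(without_hex)  # thisissometextthisistextsomemoretextoverhere
```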
^ acts as a negation token only inside a character class (and only at its beginning); you can't use it to negate substrings of several characters.
To select all that isn't \xhh you can use this pattern:
\G(?:\\x[a-f0-9]{2})*+\K(?=.|\n)[^\\]*(?:\\(?!x[a-f0-9]{2})[^\\]*)*
It matches the \xhh sequences first and removes them from the match using the \K feature (which discards everything matched to its left). The other part of the pattern, [^\\]*(?:\\(?!x[a-f0-9]{2})[^\\]*)*, matches everything that isn't a \xhh. Since this subpattern can match the empty string at the end of the string, I added the lookahead (?=.|\n) to ensure there's at least one character.
\G forces all matches to be contiguous. In other words, it matches the position at the end of the previous match.
I have a string which I need to extract the "migration" value from (dynamic content).
The problem is that there are several patterns on the marked section.
Instead of defining 2 regex I would like to have it on single one.
(?i)Host: api-(.*?).A9net.io
(?i)Host: stt-(.*?).A9net.io
One pattern: Host: api-**migration**.A9net.io
Second pattern: Host: stt-**migration**.A9net.io
I need the migration value extracted
You might use an alternation to match either api or stt. Remember to escape the dot to match it literally.
(?i)Host: (?:api|stt)-(.*?)\.A9net\.io
The (.*?) matches 0+ times which would also match when migration is not there. In that case you could use (.+?) instead to at least match 1 char.
If the migration value can not contain a dot, you might also use a negated character class to match 1+ times not a dot ([^.]+)
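A minimal Python sketch of the alternation combined with the negated character class. The third header, with a web- prefix, is a made-up counterexample, and extract_value is only an illustrative name.

```python
import re

HOST = re.compile(r"(?i)Host: (?:api|stt)-([^.]+)\.A9net\.io")


def extract_value(header):
    """Hypothetical helper: return the part between the prefix and .A9net.io."""
    m = HOST.search(header)
    return m.group(1) if m else None


headers = [
    "Host: api-migration.A9net.io",
    "Host: stt-migration.A9net.io",
    "Host: web-migration.A9net.io",  # assumed non-matching prefix
]
extracted = [extract_value(h) for h in headers]
print(extracted)  # ['migration', 'migration', None]
```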
You could use this pattern: (?i)^Host: (?:stt|api)-([^.]+)\.A9net\.io$
As already mentioned, alternation is key to your problem.
Additionally, it's recommended to use a negated character class instead of a lazy quantifier (such as +?) when possible. In this case it's [^.]+, which matches one or more characters other than a dot, so it will match until the first occurrence of a dot. That is what you want when using a lazy quantifier followed by a dot.