Trim value after 2 text patterns - regex

I have a string which I need to extract the "migration" value from (dynamic content).
The problem is that the marked section can follow several patterns.
Instead of defining 2 regexes, I would like to use a single one.
(?i)Host: api-(.*?).A9net.io
(?i)Host: stt-(.*?).A9net.io
One pattern: Host: api-**migration**.A9net.io
Second pattern: Host: stt-**migration**.A9net.io
I need the migration value extracted

You can use an alternation to match either api or stt. Note that the dot must be escaped to match it literally.
(?i)Host: (?:api|stt)-(.*?)\.A9net\.io
The (.*?) matches 0 or more characters, so it would also match when the migration value is missing. To require at least 1 character, use (.+?) instead.
If the migration value cannot contain a dot, you can also use a negated character class that matches 1 or more characters other than a dot: ([^.]+)
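As a rough illustration of the alternation combined with the negated character class, here is a minimal Python sketch. The names HOST_RE and extract_env are made up for this example, and the sample headers are the ones from the question.

```python
import re

# Alternation for the prefix, a negated character class for the value,
# and escaped dots so that they only match a literal "."
HOST_RE = re.compile(r"Host: (?:api|stt)-([^.]+)\.A9net\.io", re.IGNORECASE)


def extract_env(header):
    """Return the captured value, or None when the header does not match."""
    match = HOST_RE.search(header)
    return match.group(1) if match else None


for header in (
    "Host: api-migration.A9net.io",
    "Host: stt-migration.A9net.io",
    "Host: web-migration.A9net.io",  # prefix not in the alternation
    "Host: api-.A9net.io",           # empty value, rejected by [^.]+
):
    print(header, "->", extract_env(header))
```

Because the group uses + rather than *, a header with an empty value is treated as a non-match instead of returning an empty string.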

You could use this pattern: (?i)^Host: (?:stt|api)-([^.]+)\.A9net\.io$
As already mentioned, alternation is key to your problem.
Additionally, it's recommended to use a negated character class instead of a lazy quantifier (such as +?) when possible. In this case it's [^.]+ : it matches one or more characters other than a dot, so it will match until the first occurrence of a dot, which is what you want when using a lazy quantifier followed by a dot.

Related

Regex - How to prevent any string that starts with "de" but cannot use lookahead or lookbehind?

I have a regex
[a-zA-Z][a-z]
I have to change this regex so that it does not accept strings that start with "de", "DE", "dE" or "De". I cannot use lookbehind or lookahead because my system does not support them.
There's a solution without a lookahead or lookbehind, but you need to be able to use groups.
The idea there is to create a sort of "honeypot" that will match your negative results and keep only the results that do interest you.
In your case, that would write:
[dD][eE].*|(<your-regex>)
If the proposition is de<anything> (case insensitive here), it will match, but group(1) will be null.
On the other hand, diZ for instance would not match what is before the alternation (|) and would therefore fall into group(1).
Finally, if the proposition doesn't start with de and doesn't match your regex, well, there will be no groups to get at all.
If you need to be sure that your proposition will match the whole provided string, you can update the regex thus:
^(?:[dD][eE].*|(<your-regex>))$
Note that ?: is not a lookahead of any kind, it serves to mark the group as non-capturing, so that <your-regex> will still be captured by group(1) (would become group(2) otherwise and the capture of a group is not always a transparent operation, performance-wise).
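The "honeypot" idea can be sketched in Python as follows. WANTED is just the pattern from the question standing in for <your-regex>, and the accepted helper is a hypothetical name used only for this illustration.

```python
import re

# Stand-in for <your-regex>: the pattern from the question
WANTED = r"[a-zA-Z][a-z]"

# First branch is the honeypot for de/DE/dE/De, second branch captures group 1
HONEYPOT_RE = re.compile(r"^(?:[dD][eE].*|(" + WANTED + r"))$")


def accepted(text):
    """True only when the input matched through the capturing branch."""
    match = HONEYPOT_RE.match(text)
    # No match at all, or a honeypot match (group 1 is None), means rejection
    return match is not None and match.group(1) is not None


for text in ("ab", "di", "de", "dE", "De", "deab", "1a"):
    print(text, accepted(text))
```

The caller has to check group 1 rather than just whether the regex matched, which is the price of avoiding lookarounds.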
Simply ignore those characters:
[a-ce-z][a-df-z][a-gi-kwxyzWZXZ]
Make sure the flag is set to case insensitive. Also, [a-gi-kwxyzWZXZ] can then be modified to [a-gi-kwxyz].
EDIT:
As pointed out in this comment, the regex here won't support other words that start with d but are not followed by e. In this case, negative lookahead is a possible solution:
^(?!de)[a-z]+
This matches anything not starting with "DE" (case insensitive, without look arounds, allowing leading whitespace):
^ *+(?:[^Dd].|.[^Ee])<your regex for rest of input>
See live demo.
The possessive quantifier *+ used for whitespace prevents [^Dd] from being allowed to match a space via backtracking, making this regex hardened against leading spaces.
You can use an alternation excluding matching the d and D from the first character, or exclude matching the e as the second character.
Note that the pattern [a-zA-Z][a-z] matches at least 2 characters, so will the following pattern:
^(?:[abce-zABCE-Z][a-z]|[a-zA-Z][a-df-z]).*
^ Start of string
(?: Non capture group
[abce-zABCE-Z][a-z] Match a char a-zA-Z without d and D followed by a lowercase char a-z
| or
[a-zA-Z][a-df-z] Match a char a-zA-Z followed by a lowercase char a-z without e
) Close non capture group
.* Match 0+ times any char except a newline
Another option is to use word boundaries \b instead of an anchor ^
\b(?:[abce-zABCE-Z][a-z]|[a-zA-Z][a-df-z])[a-zA-Z]*\b
Regex demo
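To see how the anchored alternation behaves, here is a small Python check. It is only a sketch: the name NOT_DE_RE and the sample words are made up. Since the second character must be lowercase in both branches, inputs such as dE and DE are rejected for that reason as well.

```python
import re

# Either the first char is a letter other than d/D,
# or the second char is a lowercase letter other than e
NOT_DE_RE = re.compile(r"^(?:[abce-zABCE-Z][a-z]|[a-zA-Z][a-df-z]).*")

for word in ("delta", "Delta", "dE", "DE", "data", "echo", "dx"):
    print(word, bool(NOT_DE_RE.match(word)))
```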

Match a part of a string using regex

I have a string and would like to match a part of it.
The string is Accept: multipart/mixedPrivacy: nonePAI: <sip:4168755400#1.1.1.238>From: <sip:4168755400#1.1.1.238>;tag=5430960946837208_c1b08.2.3.1602135087396.0_1237422_3895152To: <sip:4168755400#1.1.1.238>
I want to match PAI: <sip:4168755400#
The whitespace can be a word, so I would like to use .* but if I use that, it matches most of the string.
The example on that link is showing what I'm matching if I use the whitespace instead of .*
(PAI: <sip:)((?:\([2-9]\d{2}\)\ ?|[2-9]\d{2}(?:\-?|\ ?))[2-9]\d{2}[- ]?\d{4})#
The example on that link is showing what I'm trying to achieve with .* but it should only match PAI: <sip:4168755400#
(PAI:.*<sip:)((?:\([2-9]\d{2}\)\ ?|[2-9]\d{2}(?:\-?|\ ?))[2-9]\d{2}[- ]?\d{4})#
I tried lookarounds but failed.
Any idea?
thanks
Matching the single space can be updated by using a character class matching either a space or a word character and repeat that 1 or more times to match at least a single occurrence.
Note that you don't have to escape the spaces, and in both occasions you can use an optional character class matching either a space or hyphen [ -]?
If you want the match only, you can omit the 2 capturing groups if you want to.
(PAI:[ \w]+<sip:)((?:\([2-9]\d{2}\) ?|[2-9]\d{2}[ -]?)[2-9]\d{2}[- ]?\d{4})#
The regex should be like
PAI:.*?(<sip:.*?#)
Explanation:
PAI:.*? finds the literal PAI: followed by anything (.*), where ? makes the quantifier lazy so that it matches as few characters as possible before the next expression.
(<sip:.*?#) is the capturing group that holds the result we want.
<sip:.*?# finds <sip: followed by as few characters as possible, up to the first #.
Example
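For comparison, here is a short Python sketch that runs a greedy variant, the lazy pattern and the stricter character-class pattern against the string from the question. The variable names are made up for this example.

```python
import re

header = (
    "Accept: multipart/mixedPrivacy: nonePAI: <sip:4168755400#1.1.1.238>"
    "From: <sip:4168755400#1.1.1.238>;"
    "tag=5430960946837208_c1b08.2.3.1602135087396.0_1237422_3895152"
    "To: <sip:4168755400#1.1.1.238>"
)

# Greedy .* runs to the end of the string and backtracks to the LAST <sip: and #
greedy = re.search(r"PAI:.*<sip:.*#", header)

# Lazy .*? stops at the FIRST <sip: and the first # after it
lazy = re.search(r"PAI:.*?(<sip:.*?#)", header)

# Character class [ \w]+ only allows spaces or word characters before <sip:
strict = re.search(
    r"(PAI:[ \w]+<sip:)((?:\([2-9]\d{2}\) ?|[2-9]\d{2}[ -]?)[2-9]\d{2}[- ]?\d{4})#",
    header,
)

print("greedy:", greedy.group(0))
print("lazy:  ", lazy.group(0))
print("strict:", strict.group(0), "| number:", strict.group(2))
```

The lazy and strict versions both stop at the first #, while the greedy one swallows everything up to the To: header.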

How to create proper regular expression to find last character which I want to?

I need to create regex to find last underscore in string like 012344_2.0224.71_3 or 012354_5.00123.AR_3.335_8
I wanted to find the last part with the expression [^.]+$ and then find the underscore in the matched part, but I cannot get it to work.
I hope you can help me :)
Just use a negated character class [^_], which matches any character except an underscore (this ensures no other underscores follow), and the end-of-string anchor $
Pattern would look as such:
(_)[^_]*$
The final underscore _ is in a capturing group, so the submatch (group 1, your underscore) is what you would return or replace.
See it live: Regex101
Notice the green highlighted portion on Regex101, this is your submatch and is what would be replaced.
The simplest solution I can imagine is using .*\K_, however not all regex flavours support \K.
If not, another idea would be to use _(?=[^_]*$)
You have a demo of the first and second option.
Explanation:
.*\K_: Fetches any character until an underscore. Since the * quantifier is greedy, it will match until the last underscore. Then \K discards the previous match and then we match the underscore.
_(?=[^_]*$): Fetch an underscore followed only by non-underscore characters until the end of the line
If you want nothing but the "net" (i.e., nothing matched except the last underscore), use positive lookahead to check that no more underscores are in the string:
/_(?=[^_]*$)/gm
The pattern [^.]+$ matches 1 or more characters other than a dot and then asserts the end of the string. That will give you the matches 71_3 and 335_8
What you want to match is an underscore when there are no more underscores following.
One way to do that is using a negative lookahead (?!.*_), if that is supported, which asserts that what is to the right is not any run of characters followed by an underscore
_(?!.*_)
Pattern demo
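The two lookahead variants can be tried out in Python as sketched below. Python's built-in re module does not support \K, so only the lookahead patterns are shown; the names POSITIVE_RE, NEGATIVE_RE and the "-" replacement are made up for this example.

```python
import re

# Underscore followed only by non-underscores up to the end of the string
POSITIVE_RE = re.compile(r"_(?=[^_]*$)")
# Underscore with no other underscore anywhere after it
NEGATIVE_RE = re.compile(r"_(?!.*_)")

samples = ["012344_2.0224.71_3", "012354_5.00123.AR_3.335_8"]

for s in samples:
    pos = POSITIVE_RE.search(s)
    neg = NEGATIVE_RE.search(s)
    # Both patterns should point at the same index as str.rfind
    print(s, pos.start(), neg.start(), s.rfind("_"))
    # Replace only the last underscore to make the match visible
    print(POSITIVE_RE.sub("-", s))
```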

Regex get a value after a delimiter

I am trying to write a regex to fetch the value of banana from the list of text below.
For each line, I would like to get whatever comes after banana= and have it stop at | if one exists.
apple=1|banana=2.5|oranges=1
banana=2.5|apple=1|oranges=1
apple=1|oranges=1|banana=2.5
apple=1|oranges=1|banana=-2.5
banana=2.5
I got as far as writing (?i)banana=(.*) but of course it gets everything after the exact match.
Do you guys have any solutions?
Thanks!
I would like to be able to get whatever comes after banana= and have it stop at | if it exists.
You may use a negated character class instead of a greedy dot pattern:
(?i)banana=([^|]*)
The greedy dot, .*, matches any 0+ chars other than line break chars (in NFA engines) as many as possible (usually, up to the end of the line).
If you use [^|], a negated character class, it will match any char but |.
Pattern details
(?i) - case insensitive modifier
banana= - a literal substring (prepend with \b to match it as a whole word)
([^|]*) - Capturing group 1: any 0+ chars other than | (to avoid empty matches, replace * with + quantifier).
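A quick way to check the negated character class against the sample lines from the question is the Python sketch below. BANANA_RE and values are names chosen for this illustration.

```python
import re

# Capture everything after "banana=" up to the next "|" or the end of the line
BANANA_RE = re.compile(r"banana=([^|]*)", re.IGNORECASE)

lines = [
    "apple=1|banana=2.5|oranges=1",
    "banana=2.5|apple=1|oranges=1",
    "apple=1|oranges=1|banana=2.5",
    "apple=1|oranges=1|banana=-2.5",
    "banana=2.5",
]

# Every sample line contains banana=, so search() always finds a match here
values = [BANANA_RE.search(line).group(1) for line in lines]
print(values)
```

For input where banana= may be absent, the result of search() would need a None check before calling group(1).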

Regex match everything except a specific character

I am trying to set up a regex that will match the following:
*test*
*t*
*te*
But, I do not want it to match:
*test**
The general rules are:
Must start with the beginning of the line (^) or a whitespace character (\s)
Must have one and only one *
Can match any character
Must match one more *
Must end with end of the line ($) or a whitespace character (\s)
I have generated the following regex:
(\s|^)\*([^\*].+?[^\*])\*(\s|$)
This nearly satisfies my requirements; however, because of the two [^\*] groups within the second capturing group, it seems to require that capturing group to be 3 characters or more. *tes* matches, but *t* and *te* do not.
I have three specific questions:
Why does the character negation lead to the 3 character limit?
Is there a better way to express "any character except" than I have done here?
Any thoughts on a better regex to satisfy my requirements?
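The problem in the regex is an extra . in the capturing group
[^\*].+?[^\*]
     ^
This matches a character other than *, followed by one or more of any character except newline, followed by another character other than *. Each of the three parts consumes at least one character, hence the 3-character minimum.
As the character class is repeated twice, you can apply the + quantifier to a single character class to match one or more characters.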
(\s|^)\*([^\*]+?)\*(\s|$)
You can also use non-capturing groups to exclude the extra matches.
(?:\s|^)\*([^\*]+?)\*(?:\s|$)
Demo 2