Puppet dynamic variable from hostname - regex

I am trying to get a dynamic variable out of my EC2 instances' hostnames. The hostnames follow this pattern:
us-east-1b-application-type-environment-138-10.domain.com
I would like my variable to end up looking like this:
application-type-environment
Using this:
$variable = regsubst($hostname, '/[a-z]{1}[0-9]{1}-([^-]+)-[0-9]{1,3}/', '')
I end up with this, though:
us-east-1b-application-type-environment-138-10
How can I get my expected outcome?

You do not need regex delimiters in regsubst. You also need to match the whole string so that you can replace it and keep only the part you need. The technique is to match what you do not want to keep, and to match and capture what you do want in the result.
You can use:
regsubst($hostname, '^[^0-9]*[0-9][a-z]-(.*?)-[0-9]{1,3}.*$', '\1')
I think you are trying to get just what is between the first [digit][lowercase-letter] chunk and the following chunk of 1 to 3 digits.
Breakdown of the expression:
^ - start of line (if start of string is meant, replace with \A)
[^0-9]* - 0 or more non-digit characters (can be replaced with \D*)
[0-9][a-z]- - a digit, then a lowercase letter, then - (the same as \d[a-z]-)
(.*?) - Group 1: any characters other than a newline, as few as possible, up to the closest...
-[0-9]{1,3} - ...hyphen followed by 1 to 3 digits (the same as -\d{1,3})
.*$ - 0 or more characters other than a newline, up to the end of the line (if end of string is meant, replace with \z).
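To check the expression outside Puppet, here is a minimal sketch that uses Python's re.sub as a stand-in for regsubst. This is an assumption: Puppet evaluates regsubst with Ruby regexes, but this particular pattern uses no flavor-specific syntax and the input is a single line, so it should behave the same in Python, where the backreference is written \1 in a raw string. The helper name app_part is hypothetical.

```python
import re

# Same pattern as the regsubst call above; re.sub stands in for Puppet's regsubst.
PATTERN = r'^[^0-9]*[0-9][a-z]-(.*?)-[0-9]{1,3}.*$'

def app_part(hostname: str) -> str:
    """Keep only the text between '<digit><letter>-' and the first '-<digit>' after it."""
    return re.sub(PATTERN, r'\1', hostname)

print(app_part('us-east-1b-application-type-environment-138-10'))
print(app_part('us-east-1b-application-type-environment-138-10.domain.com'))
```

As with regsubst, a hostname that does not match the pattern comes back unchanged.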

Related

capturing values after an optional slash

I am trying to write a regex that matches
an alphanumeric string no longer than 5 characters (5 is just an example), i.e. [a-z0-9]{3,5}
followed by an optional forward slash /?
that cannot end in a 3
I want to capture any group of at least 3 characters, with or without a slash, and then anything after it.
And I am having a very hard time accomplishing this. If I require the slash / it is much easier to do so.
When I try:
(?=.+\/?.+)[a-z0-9]{2,5}\/?(?<!3\/|3)
I can capture what I want up to the slash, but I can't work out how to also get anything after it when the rules are satisfied:
(?=.+\/?.+)[a-z0-9]{2,62}\/?.?
My minimum length goes up by 1 (to 4 instead of 3) because of the extra . I put after the \/?. I could change my quantifier to account for it, but it quickly becomes really difficult.
(?=.+\/?.+)[a-z0-9]{2,5}\/?(?<!3\/|3)$
This only gives me the last slash or non-slash followed by 2 to 5 characters.
(?=.+\/?.+)[a-z0-9]{2,62}\/?.*
or
(?=.+\/?.+)[a-z0-9]{2,62}\/?.?+
simply ignores my ending rule of not being able to end with 3/ or 3. It also allows more than 5 characters before the slash. Definitely not what I want :)
Is there a way to make an optional field still maintain length and ending rules?
I am testing this on regexr.com, https://www.w3schools.com/jsref/tryit.asp?filename=tryjsref_regexp and Git Bash, and I am not getting the results I would like.
Try:
^[a-z0-9]{3,5}(?<!3)(?:$|\/.*)
^ - beginning of the string
[a-z0-9]{3,5} - match a-z or 0-9 between 3 and 5 times
(?<!3) - the last character matched should not be 3
(?:$|\/.*) - match either the end of the string ($) or a / followed by any number of characters.
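A quick way to try this pattern against a few inputs is Python's re module, which supports the same lookbehind syntax. This is a sketch; is_valid and the sample strings are made up for illustration. One Python-specific caveat: $ also matches just before a final newline, so use \Z instead if trailing newlines matter.

```python
import re

# ^[a-z0-9]{3,5}  3-5 lowercase letters/digits at the start
# (?<!3)          ...the last of which is not a 3
# (?:$|\/.*)      then either the end of the string, or a slash and anything after it
PATTERN = re.compile(r'^[a-z0-9]{3,5}(?<!3)(?:$|\/.*)')

def is_valid(s: str) -> bool:
    return PATTERN.match(s) is not None

for sample in ['abc', 'abc3', 'abcd/anything', 'abc3/x', 'abcdef', 'ab', 'abc/']:
    print(sample, is_valid(sample))
```

Note how 'abc3' is rejected: after the lookbehind forces the run back to 'abc', the next character is 3, which is neither the end of the string nor a slash.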
If the last character in the range [a-z0-9] should not be a 3, you can exclude it with [a-z0-24-9]:
^[a-z0-9]{2,4}[a-z0-24-9](?:\/.*)?$
Explanation
^ Start of string
[a-z0-9]{2,4} Match 2-4 chars in the ranges a-z 0-9
[a-z0-24-9] Match a single char a-z, 0-2 or 4-9 (anything but 3)
(?:\/.*)? Optionally match / and the rest of the line
$ End of string
If a 3 must not appear anywhere in the first part:
^[a-z0-24-9]{3,5}(?:\/.*)?$
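Both patterns from this answer can be exercised the same way in Python (a sketch; the sample inputs are made up). The character class is written [a-z0-24-9] so that only 3 is excluded; a class such as [a-z124-9] would also reject a trailing 0.

```python
import re

# 3-5 chars from [a-z0-9] whose last char is not 3, optionally followed by /anything.
ENDS_NOT_3 = re.compile(r'^[a-z0-9]{2,4}[a-z0-24-9](?:\/.*)?$')
# 3-5 chars with no 3 anywhere, optionally followed by /anything.
NO_3_AT_ALL = re.compile(r'^[a-z0-24-9]{3,5}(?:\/.*)?$')

for sample in ['abc0', 'abc3', 'a3c', 'abcd/x3', 'abcdef']:
    print(sample, bool(ENDS_NOT_3.match(sample)), bool(NO_3_AT_ALL.match(sample)))
```

'a3c' shows the difference between the two: it passes the first pattern (it does not end in 3) but fails the second.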

Pattern to match everything except a string of 5 digits

I only have access to a function that can match a pattern and replace it with some text:
Syntax
regexReplace('text', 'pattern', 'new text')
And I need to return only the 5-digit string from text in the following format:
CRITICAL - 192.111.6.4: rta nan, lost 100%
Created Time Tue, 5 Jul 8:45
Integration Name CheckMK Integration
Node 192.111.6.4
Metric Name POS1
Metric Value DOWN
Resource 54871
Alert Tags 54871, POS1
So from this text, I want to replace everything with "" except the "54871".
I have come up with the following:
regexReplace("{{ticket.description}}", "\w*[^\d\W]\w*", "")
Which almost works, but it doesn't match the symbols. How can I change it to match any word that includes a letter or a symbol?
As you can see, the pattern I have is very close; I just need it to match special characters as well as letters, whereas currently it only matches letters.
You can match the whole string but capture the 5-digit number into a capturing group and replace with the backreference to the captured group:
regexReplace("{{ticket.description}}", "^(?:[\w\W]*\s)?(\d{5})(?:\s[\w\W]*)?$", "$1")
Details:
^ - start of string
(?:[\w\W]*\s)? - an optional substring of any zero or more chars as many as possible and then a whitespace char
(\d{5}) - Group 1 ($1 contains the text captured by this group pattern): five digits
(?:\s[\w\W]*)? - an optional substring of a whitespace char and then any zero or more chars as many as possible.
$ - end of string.
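The whole-string-plus-backreference idea can be checked in Python, used here only as a stand-in for the asker's regexReplace (whose regex flavor is not stated). Python writes the backreference as \1 rather than $1; the sample text is the one from the question.

```python
import re

TEXT = """CRITICAL - 192.111.6.4: rta nan, lost 100%
Created Time Tue, 5 Jul 8:45
Integration Name CheckMK Integration
Node 192.111.6.4
Metric Name POS1
Metric Value DOWN
Resource 54871
Alert Tags 54871, POS1"""

# [\w\W] matches any character, including newlines, so no DOTALL flag is needed.
PATTERN = r'^(?:[\w\W]*\s)?(\d{5})(?:\s[\w\W]*)?$'

# The pattern matches the entire text, so replacing it with group 1 leaves only the number.
result = re.sub(PATTERN, r'\1', TEXT)
print(result)
```

Because [\w\W]* is greedy, the captured number is the last 5-digit run that is surrounded by whitespace (or the string boundaries).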
The easiest regex is probably:
^(.*\D)?(\d{5})(\D.*)?$
You can then replace the string with "$2" ("\2" in other languages) to only place the contents of the second capture group (\d{5}) back.
The only issue is that . doesn't match newline characters by default. Normally you can pass a flag to change . to match ALL characters. For most regex variants this is the s (single line) flag (PCRE, Java, C#, Python). Other variants use the m (multi line) flag (Ruby). Check the documentation of the regex variant you are using for verification.
However, the question suggests that you cannot pass flags separately, in which case you can pass them as part of the regex itself:
(?s)^(.*\D)?(\d{5})(\D.*)?$
(?s) - Set the s (single line) flag for the remainder of the pattern, which enables . to match newline characters ((?m) for Ruby).
^ - Match the start of the string (\A for Ruby).
(.*\D)? - [optional] Match anything followed by a non-digit and store it in capture group 1.
(\d{5}) - Match 5 digits and store it in capture group 2.
(\D.*)? - [optional] Match a non-digit followed by anything and store it in capture group 3.
$ - Match the end of the string (\z for Ruby).
This regex stores the last 5-digit number in capture group 2. If you want the first 5-digit number instead, use a lazy quantifier in (.*\D)?, i.e. (.*?\D)?.
(?s) is supported by most regex variants, but not all. Refer to the regex variant documentation to see if it's available for you.
JavaScript is an example where inline flags are not available. In such a scenario you need to replace . with something that matches ALL characters. In JavaScript [^] can be used; in other variants this might not work and you need to use [\s\S].
With all this out of the way: assuming a language that can use "$2" as the replacement, where you do not need to escape backslashes, and a regex variant that supports an inline (?s) flag, the answer would be:
regexReplace("{{ticket.description}}", "(?s)^(.*\D)?(\d{5})(\D.*)?$", "$2")
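A sketch of both variants in Python, used as a stand-in because it accepts a leading inline (?s) and writes the second group as \2. The made-up string 'a 11111 b 22222 c' shows the difference between the greedy and the lazy version.

```python
import re

TEXT = """CRITICAL - 192.111.6.4: rta nan, lost 100%
Created Time Tue, 5 Jul 8:45
Integration Name CheckMK Integration
Node 192.111.6.4
Metric Name POS1
Metric Value DOWN
Resource 54871
Alert Tags 54871, POS1"""

GREEDY = r'(?s)^(.*\D)?(\d{5})(\D.*)?$'   # keeps the last 5-digit number
LAZY = r'(?s)^(.*?\D)?(\d{5})(\D.*)?$'    # keeps the first 5-digit number

last = re.sub(GREEDY, r'\2', TEXT)
print(last)
print(re.sub(GREEDY, r'\2', 'a 11111 b 22222 c'))
print(re.sub(LAZY, r'\2', 'a 11111 b 22222 c'))
```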

Regex to truncate last 4 characters and make sure it doesn't contain specific words

I get strings such as "BumpCardV2Resource.getInstance.Time" or "BumpCardWebResource.getInstance.Time" or "BumpCardResource.getInstance.Time". I need a regex expression to obtain only "BumpCardResource.getInstance"
Currently, I'm using a negative lookahead to make sure the string does not contain V2 or Web, but I'm not sure how to also truncate the last 5 characters (.Time).
Regex I'm using: /^(?!.*V2|.*WebResource).*$/
PS - The resource and the API endpoint keep changing; they are not necessarily BumpCard or getInstance.
You can use a regex where you match the whole string and capture any text from the start of the string till .Time at the end of it:
/^(?!.*V2|.*WebResource)(.*)\.Time$/
Details:
^ - start of string
(?!.*V2|.*WebResource) - no V2 or WebResource allowed anywhere in the string
(.*) - Capturing group 1: any zero or more chars other than line break chars as many as possible
\.Time - .Time string
$ - end of string.
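A Python sketch of the same pattern, using re.match and group(1). The helper resource_name and the input OrderResource.create.Time are hypothetical, the latter showing that the resource and endpoint names themselves do not matter.

```python
import re

# Reject anything containing V2 or WebResource, capture everything before a final ".Time".
PATTERN = re.compile(r'^(?!.*V2|.*WebResource)(.*)\.Time$')

def resource_name(metric: str):
    m = PATTERN.match(metric)
    return m.group(1) if m else None

for s in ["BumpCardV2Resource.getInstance.Time",
          "BumpCardWebResource.getInstance.Time",
          "BumpCardResource.getInstance.Time",
          "OrderResource.create.Time"]:
    print(s, '->', resource_name(s))
```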

Regex to match an unlimited repeating pattern between two strings

I have a dataset with a repeating pattern in the middle:
YM10a15b5c27
and
YM1b5c17
How can I get what is between "YM" and the last two numbers?
I'm using this, but it captures one extra number at the end that it should not:
/([A-Z]+)([0-9a-z]+)([0-9]+)/
Capture exactly two characters in the last group:
/([A-Z]+)([0-9a-z]+)([0-9]{2})/
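In Python, for example, the greedy middle group backtracks just enough to leave exactly two digits for the last group (a sketch; middle is a hypothetical helper name).

```python
import re

PATTERN = re.compile(r'([A-Z]+)([0-9a-z]+)([0-9]{2})')

def middle(s: str):
    m = PATTERN.search(s)
    return m.group(2) if m else None

for s in ['YM10a15b5c27', 'YM1b5c17']:
    print(s, '->', middle(s))
```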
You should use:
/^(?:([a-z]+))([0-9a-z]+)(?=\1)/
^ matches the start of the string. This is really important, because if your code is aaaa1234aaaa, then without the ^ the pattern could also match the aaaa at the end.
(?:([a-z]+)) is a non-capturing group wrapping capturing group 1, which matches one or more letters from a to z
(?=\1) is a lookahead: the match only succeeds if it is followed by the same code that appeared at the start.
All you have to do is extract the code with group(2)
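A Python sketch of this approach on the aaaa1234aaaa example. Note that the lookahead only checks that the starting code follows the middle part; it does not require that code to sit at the very end of the string, as the second input shows.

```python
import re

PATTERN = re.compile(r'^(?:([a-z]+))([0-9a-z]+)(?=\1)')

m = PATTERN.search('aaaa1234aaaa')
code = m.group(2)
print(code)

# Trailing text after the repeated code does not prevent a match.
print(PATTERN.search('aaaa1234aaaaxyz').group(2))
```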
Solution
If you want to match these strings as whole words, use \b(([a-z])\2)([0-9a-z]+)(\1)\b. If you need to match them as separate strings, use ^(([a-z])\2)([0-9a-z]+)(\1)$.
Explanation
\b - a word boundary (or if ^ is used, start of string)
(([a-z])\2) - Group 1: any lowercase ASCII letter, exactly two occurrences (aa, bb, etc.)
([0-9a-z]+) - Group 3: 1 or more digits or lowercase ASCII letters
(\1) - Group 4: the same text as stored in Group 1
\b - a word boundary (or if $ is used, end of string).
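A Python sketch of both variants; the sample strings are made up. finditer is used for the word-boundary version so that every whole-word match is returned, and group(3) holds the middle part.

```python
import re

WHOLE_WORDS = re.compile(r'\b(([a-z])\2)([0-9a-z]+)(\1)\b')
WHOLE_STRING = re.compile(r'^(([a-z])\2)([0-9a-z]+)(\1)$')

words = [m.group(3) for m in WHOLE_WORDS.finditer('aa12aa bb9x9bb cd5cd')]
print(words)
print(WHOLE_STRING.match('aa1234aa').group(3))
print(WHOLE_STRING.match('ab1234ab'))  # the first two letters differ, so no match
```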

How do you specify multiples in negative character classes in regular expressions?

I am trying to write a regular expression to search for anything but digits or the * or - characters, with one caveat. Where I'm hitting a wall is that I need to allow runs of three or fewer digits to be found, but not four or more, though even a single * or - shouldn't be found.
This is what I have so far (for three matches):
.*?([^0-9\*-]+).*?([^0-9\*-]+).*?([^0-9\*-]+).*?
I have no idea where to insert {4,} for the digits (I've tried and it doesn't seem to work anywhere) or how to change it to do as I want.
For instance, in "Jack has* 777 1883874 -sheep-" I'd like it to return "Jack has 777 sheep". Or in "2343klj-3***.net" I'd like it to return "klj 3 .net"
You may use the following regex, replacing each match with a literal space (" "):
(?:[-*\s]|\d{4,})+
Details
(?:[-*\s]|\d{4,})+ - a non-capturing group matching one or more consecutive repetitions of
[-*\s] - a whitespace char, - or *
| - or
\d{4,} - 4+ digits.
Next, to remove all leading and trailing whitespace you may use
^\s+|\s+$
and replace with an empty string. ^\s+ matches 1+ whitespaces at the start of the string and \s+$ matches 1+ whitespaces at the end of the string.
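The two steps can be chained, for example in Python (a sketch; clean is a hypothetical helper name). It reproduces the two examples from the question.

```python
import re

def clean(s: str) -> str:
    # Step 1: collapse every run of whitespace, '-', '*' and 4+-digit numbers into one space.
    s = re.sub(r'(?:[-*\s]|\d{4,})+', ' ', s)
    # Step 2: strip leading and trailing whitespace.
    return re.sub(r'^\s+|\s+$', '', s)

print(clean('Jack has* 777 1883874 -sheep-'))
print(clean('2343klj-3***.net'))
```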
With the help here, this is what works. It may be impossible to do it all in one regex because of the conflict of needing no spaces at the beginning and end but spaces in between each remaining grouping.
First, find ([-*\h]|\d{4,})+ and replace it with a space.
Second, find ^\s*(.*?)\s*$ and replace it with $1 (the lazy .*? keeps the trailing whitespace out of the group).
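Python's re module has no \h, so this sketch substitutes the class [ \t] for it (an approximation: in flavors that support \h it matches horizontal whitespace, which can include more than space and tab). It also shows why the trim step needs a lazy group: with a greedy (.*), the trailing space ends up inside the group.

```python
import re

# Step 1: [ \t] stands in for \h, which Python's re does not support.
step1 = re.sub(r'(?:[-* \t]|\d{4,})+', ' ', 'Jack has* 777 1883874 -sheep-')

# Step 2, greedy: (.*) swallows the trailing space, and \s* then matches nothing.
greedy = re.sub(r'^\s*(.*)\s*$', r'\1', step1)
# Step 2, lazy: (.*?) stops as soon as \s*$ can match the rest.
lazy = re.sub(r'^\s*(.*?)\s*$', r'\1', step1)

print(repr(greedy))  # 'Jack has 777 sheep '
print(repr(lazy))    # 'Jack has 777 sheep'
```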