Regex to extract last period and md5 string


I have the following regular expression:
/^[a-f0-9]{8}$/ --- This expression extracts an 8 character string as a md5 hash, for example: if I have the following string "hello world .305eef9f x1xxx 304ccf9f test1232" it will return "304ccf9f"
I also have the following regular expression:
/.[^.]*$/ --- This expression extracts a string after the last period (included), for example, if I have "hello world.this.is.atest.case9.23919sd3xxxs" it will return ".23919sd3xxxs"
The thing is, I've read a bit about regex, but I can't combine both expressions to find the md5 string after the last period (included), for example:
topLeftLogo.93f02a9d.controller.99f06a7s ----> must return ".99f06a7s"
Thanks in advance for your time and help!

/^[a-f0-9]{8}$/ --- This expression extracts an 8 character string as a md5 hash
Yes, but it doesn't return "304ccf9f" from "hello world .305eef9f x1xxx 304ccf9f test1232", because ^ in regex means start of string and $ means end of string. With both anchors the pattern cannot match in the middle of a string.
/.[^.]*$/ --- This expression extracts a string after the last period
No. It only does that if you escape the first dot: \.
To combine these two, replace ^ with \.:
\.[a-f0-9]{8}$
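Both claims can be checked by running the patterns. The sketch below uses Python's re module (an assumption, since the question does not name a language), and the file name ending in a purely hexadecimal segment is a made-up input.

```python
import re

sentence = "hello world .305eef9f x1xxx 304ccf9f test1232"

# Anchored with ^ and $, the pattern must cover the whole string,
# so it finds nothing inside a longer sentence.
anchored = re.search(r"^[a-f0-9]{8}$", sentence)

# Replacing ^ with an escaped dot ties the 8 characters to a dot at the end.
combined = re.compile(r"\.[a-f0-9]{8}$")

# Hypothetical file name whose last segment is purely hexadecimal.
hex_match = combined.search("topLeftLogo.93f02a9d.controller.99f06a7e")

# The sample from the question ends in "s", which is outside a-f,
# so the hexadecimal class does not match it.
question_match = combined.search("topLeftLogo.93f02a9d.controller.99f06a7s")

print(anchored, hex_match.group(), question_match)
```

If the last segment may contain letters beyond f, the character class has to be widened accordingly.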

To match 8 characters from the range [a-f0-9] after the last dot, you might use (if supported) a negative lookahead (?!.*\.) to assert that what follows the match does not contain a dot:
\.[a-f0-9]{8}(?!.*\.)
If you want to match characters from a-z instead of a-f like 99f06a7s you could use [a-z0-9]
About the first example
This regex ^[a-f0-9]{8}$ will match one of the ranges in the character class 8 times from the start until the end of the string due to the anchors ^ and $. It would not find a match in hello world .305eef9f x1xxx 304ccf9f test1232 on the same line.
About the second example
.[^.]*$ will match any single character (the unescaped dot), followed by zero or more characters that are not a dot, up to the end of the string. That would, for example, also match a single a; it is not bound to start at a dot, because you have to escape the dot to match it literally.
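A minimal sketch of the points above, assuming Python's re module (the question does not name a language). It runs the negative lookahead against the sample from the question and shows what the unescaped dot does.

```python
import re

name = "topLeftLogo.93f02a9d.controller.99f06a7s"

# (?!.*\.) is a negative lookahead: no further dot may follow the match.
# [a-z0-9] is used because the sample suffix contains an "s".
last_segment = re.search(r"\.[a-z0-9]{8}(?!.*\.)", name)

# With the strict hexadecimal class the same input yields no match.
hex_only = re.search(r"\.[a-f0-9]{8}(?!.*\.)", name)

# The unescaped dot in .[^.]*$ matches any character, so a lone "a" matches.
unescaped = re.search(r".[^.]*$", "a")

# Escaping the dot requires a literal period, so "a" no longer matches.
escaped = re.search(r"\.[^.]*$", "a")

print(last_segment.group(), hex_only, unescaped.group(), escaped)
```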

I'm adding this in case people need to solve a similar case:
Case 1: we want to get the 8-character hexadecimal ([a-f0-9]) string that sits between the last period of the name and the file extension, for example in order to remove that "hashed" part:
Example:
file.name2222.controller.2567d667.js ------> returns .2567d667
We will need to use the following regex:
\.[a-f0-9]{8}(?=\.\w+$)
Case 2: for example, we want the same as above but ignoring the first period:
Example:
file.name2222.controller.2567d667.js ------> returns 2567d667
We will need to use the following regex
[a-f0-9]{8}(?=\.\w+$)
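Both cases can be tried out as follows. This is a sketch in Python's re module (an assumption; the answer does not name a language), and the final substitution is one possible way to do the "remove the hashed part" step mentioned in Case 1.

```python
import re

filename = "file.name2222.controller.2567d667.js"

# Case 1: the hash with its leading period, just before the extension.
with_dot = re.search(r"\.[a-f0-9]{8}(?=\.\w+$)", filename).group()

# Case 2: the same hash without the leading period.
without_dot = re.search(r"[a-f0-9]{8}(?=\.\w+$)", filename).group()

# The lookahead does not consume the extension, so deleting the
# Case 1 match leaves the rest of the name intact.
stripped = re.sub(r"\.[a-f0-9]{8}(?=\.\w+$)", "", filename)

print(with_dot, without_dot, stripped)
```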

Related

Regular Expression: Find a specific group within other groups in VB.Net

I need to write a regular expression that has to replace everything except for a single group.
E.g.
IN ------> OUT
OK THT PHP This is it 06222021 ------> This is it
NO MTM PYT Get this content 111111 ------> Get this content
I wrote the following Regular Expression: (\w{0,2}\s\w{0,3}\s\w{0,3}\s)(.*?)(\s\d{6}(\s|))
This RegEx creates 4 groups, using the first entry as an example the groups are:
OK THT PHP
This is it
06222021
Space character
I need a way to:
Replace Group 1,2,4 with String.Empty
OR
Get Group 3, ONLY

You don't need 4 groups, you can use a single group 1 to be in the replacement and match 6-8 digits for the last part instead of only 6.
Note that this \w{0,2} will also match an empty string, you can use \w{1,2} if there has to be at least a single word char.
^\w{0,2}\s\w{0,3}\s\w{0,3}\s(.*?)\s\d{6,8}\s?$
^ Start of string
\w{0,2}\s\w{0,3}\s\w{0,3}\s Match three runs of word characters (each with its own quantifier), each followed by a whitespace char
(.*?) Capture group 1, match any char as few times as possible
\s\d{6,8} Match a whitespace char and 6-8 digits
\s? Match an optional whitespace char
$ End of string
Example code
Dim s As String = "OK THT PHP This is it 06222021"
Dim result As String = Regex.Replace(s, "^\w{0,2}\s\w{0,3}\s\w{0,3}\s(.*?)\s\d{6,8}\s?$", "$1")
Console.WriteLine(result)
Output
This is it
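For comparison, the same replacement can be sketched in Python (an assumption; the answer itself uses VB.NET). Python writes the group reference as \1 where .NET uses $1, and the helper name extract_content is hypothetical.

```python
import re

# Same pattern as in the answer; group 1 is the part to keep.
PATTERN = re.compile(r"^\w{0,2}\s\w{0,3}\s\w{0,3}\s(.*?)\s\d{6,8}\s?$")

def extract_content(line: str) -> str:
    # Replace the whole match with the contents of group 1.
    return PATTERN.sub(r"\1", line)

first = extract_content("OK THT PHP This is it 06222021")
second = extract_content("NO MTM PYT Get this content 111111")
print(first)
print(second)
```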

My approach uses neither groups nor a Replace operation. The match itself yields the desired result.
It uses look-around expressions. To find a pattern between two other patterns, you can use the general form
(?<=prefix)find(?=suffix)
This will only return find as match, excluding prefix and suffix.
If we insert your expressions, we get
(?<=\w{0,2}\s\w{0,3}\s\w{0,3}\s).*?(?=\s\d{6}\s?)
where I simplified (\s|) as \s?. We can also drop it completely, since we don't care about trailing spaces.
(?<=\w{0,2}\s\w{0,3}\s\w{0,3}\s).*?(?=\s\d{6})
Note that this also works if we have more than 6 digits, because the lookahead is satisfied as soon as it has found 6 digits and does not care about what follows.
This also gives a match if other things precede our pattern like in 123 OK THT PHP This is it 06222021. We can exclude such results by specifying that the search must start at the beginning of the string with ^.
If the exact length of the words and numbers does not matter, we simply write
(?<=^\w+\s\w+\s\w+\s).*?(?=\s\d+)
If the find part can contain numbers, we must specify that we want to match until the end of the line with $ (and include a possible space again).
(?<=^\w+\s\w+\s\w+\s).*?(?=\s\d+\s?$)
Finally, we use a quantifier for the 3 occurrences of word-space:
(?<=^(\w+\s){3}).*?(?=\s\d+\s?$)
This is compact and will only return This is it or Get this content.
// input is assumed to hold the line to search, e.g. "OK THT PHP This is it 06222021"
string result = Regex.Match(input, @"(?<=^(\w+\s){3}).*?(?=\s\d+\s?$)").Value;
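The general (?<=prefix)find(?=suffix) form can also be tried outside .NET, with one caveat. The sketch below assumes Python's re module, which only accepts fixed-width lookbehind, so it uses a literal prefix for illustration and then shows that the variable-width pattern from this answer is rejected there.

```python
import re

text = "OK THT PHP This is it 06222021"

# General form (?<=prefix)find(?=suffix) with a fixed-width literal prefix.
match = re.search(r"(?<=OK THT PHP ).*?(?=\s\d{6})", text)

# The variable-width lookbehind works in .NET, but Python's re refuses
# to compile it because the prefix has no fixed length.
try:
    re.compile(r"(?<=^(\w+\s){3}).*?(?=\s\d+\s?$)")
    variable_width_ok = True
except re.error:
    variable_width_ok = False

print(match.group(), variable_width_ok)
```

Engines with this restriction usually need the capture-group approach instead.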

Unable to replace some values in regex

This is my input:
0,0,0,1
1,023,1230,1,0
,1,0,01-09-2018,1,
I want to replace the values 0 and 1 only when they make up a whole field (length 1). Everything else should stay as it is.
I already tried with javascript code i.e. split all the strings with "," as delimiter. Then, checking for strings with length 1 and replacing them as per logic. But that's a tedious method which consumes a lot of time.
I want a Regex that can do the replacements in entire input.
I have already tried this regex: ((0|1)(?<=,))|((0|1)(?=,)), but the output is wrong.
The expected output is:
N,N,N,Y
Y,023,1230,Y,N
,Y,N,01-09-2018,Y,

You can use the following regexps with comma word boundaries:
(?<![^,])1(?![^,])
(?<![^,])0(?![^,])
Replace with the appropriate substring.
They match 1 or 0 only when it is enclosed by commas or by the start/end of the string.
(?<![^,]) - a negative lookbehind that matches a position not immediately preceded by a char other than ,
(?![^,]) - a negative lookahead that matches a position not immediately followed by a char other than ,.
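A minimal sketch of the replacement, assuming Python's re module and merging the two patterns into one with a lookup table (the names FLAGS, FIELD and replace_flags are hypothetical). Rows are processed one at a time, because a newline counts as a character other than a comma and would block a match at the start or end of a row.

```python
import re

# Mapping inferred from the expected output in the question.
FLAGS = {"0": "N", "1": "Y"}

# A single 0 or 1 that is neither preceded nor followed by a non-comma char.
FIELD = re.compile(r"(?<![^,])[01](?![^,])")

def replace_flags(row: str) -> str:
    return FIELD.sub(lambda m: FLAGS[m.group()], row)

rows = ["0,0,0,1", "1,023,1230,1,0", ",1,0,01-09-2018,1,"]
converted = [replace_flags(row) for row in rows]
print("\n".join(converted))
```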

Regex for a string with alpha numeric containing a '.' character

I have not been able to find a proper regex to match any string not starting and ending with some condition.
This matches
AS.E
23.5
3.45
This doesn't match
.263
321.
.ASD
The string can contain alphanumeric characters with an optional '.' character, and it has to be within a range of 2-4 characters (minimum 2 chars and maximum 4 chars).
I was able to create one ->
^[^\.][A-Z|0-9|\.]{2,4}$
but with this I couldn't exclude a '.' character at the end of the string.
Thanks.

Maybe not the most optimized but a working one. Created step by step:
The first character should be alphanumeric
^[a-zA-Z0-9]
0, 1 or 2 characters, each alphanumeric or ., not yet at the end of the string
[a-zA-Z0-9\.]{0,2}
an alphanumeric character matching end of string
[a-zA-Z0-9]$
Concatenate all of this to obtain your regex
^[a-zA-Z0-9][a-zA-Z0-9\.]{0,2}[a-zA-Z0-9]$
Edit: This regex allows multiple dots (up to 2)

If I guessed correctly, you want to match all words that are
Between 2 and 4 characters long ...
... and start and end with a character from [A-Z0-9] ...
... and have characters from [A-Z0-9.] in the middle ...
... and are not preceded or followed by a ..
Try this regex to match all these substrings in a text:
(?<=^|[^.])[A-Z0-9][A-Z0-9.]{0,2}[A-Z0-9](?=$|[^.])
However, note that this will match the AA in .AAAA.. If you don't want this match, then please give more details on your requirements.
When you are only interested in the number of matches, but not the matched strings, then you could use
(^|[^.])[A-Z0-9][A-Z0-9.]{0,2}[A-Z0-9]($|[^.])
If you have one string, and want to know whether that string completely matches or not, then use
^[A-Z0-9][A-Z0-9.]{0,2}[A-Z0-9]$
If there may be at most one . inside the match, replace the part [A-Z0-9.]{0,2} with ([A-Z0-9]?[A-Z0-9.]?|[A-Z0-9.]?[A-Z0-9]?).
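The whole-string variants can be checked against the samples from the question. This is a sketch assuming Python's re module; the lookbehind variant is left out because Python's re requires fixed-width lookbehind, and the input A..B is a made-up case for the dot limit.

```python
import re

# Whole-string check that allows up to two dots in the middle.
ANY_DOTS = re.compile(r"^[A-Z0-9][A-Z0-9.]{0,2}[A-Z0-9]$")

# Variant with the middle part replaced so that at most one dot fits.
ONE_DOT = re.compile(
    r"^[A-Z0-9]([A-Z0-9]?[A-Z0-9.]?|[A-Z0-9.]?[A-Z0-9]?)[A-Z0-9]$"
)

accepted = ["AS.E", "23.5", "3.45"]
rejected = [".263", "321.", ".ASD"]

any_results = [bool(ANY_DOTS.search(s)) for s in accepted + rejected]
one_results = [bool(ONE_DOT.search(s)) for s in accepted + rejected]

# Hypothetical input with two dots: only the first pattern accepts it.
two_dots = (bool(ANY_DOTS.search("A..B")), bool(ONE_DOT.search("A..B")))

print(any_results, one_results, two_dots)
```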

You can use this pattern to match what you say,
^[^\.][a-zA-Z0-9\.]{2,4}[^\.]$
Check the result here:
https://regex101.com/r/8BNdDg/3
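Counting the pieces of this pattern suggests it needs 4 to 6 characters (1 + 2..4 + 1) rather than 2 to 4, and that [^\.] accepts any character other than a dot. The sketch below, assuming Python's re module and a few made-up inputs, makes that visible.

```python
import re

pattern = re.compile(r"^[^\.][a-zA-Z0-9\.]{2,4}[^\.]$")

# The 4-character samples from the question do match.
four_chars = bool(pattern.search("AS.E"))

# Hypothetical inputs probing the length and character limits.
two_chars = bool(pattern.search("A1"))      # too short for this pattern
six_chars = bool(pattern.search("ABCDEF"))  # longer than the stated maximum
symbols = bool(pattern.search("#AB!"))      # non-alphanumeric first/last char

print(four_chars, two_chars, six_chars, symbols)
```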

Regular Expression beginning of string with special characters

Using this for an example string
+$43073$7
and I need the 5-digit sequence from it. I'm using the regex
@"\$+(?<lot>\d{5})"
which is matching up any +$ in the string. I tried
@"^\$+(?<lot>\d{5})"
as the +$ is always at the beginning of the string. What will work?

If you use the anchor ^, you need to put the + symbol first, and don't forget to escape it, because + is a regex metacharacter that repeats the previous token one or more times.
@"^\+\$(?<lot>\d{5})"
And without the anchor, it would be like
@"\$(?<lot>\d{5})"
Then get the 5-digit number you want from group index 1.
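The difference between the attempts can be sketched in Python (an assumption; the question uses C#). Python spells the named group (?P<lot>...) where .NET uses (?<lot>...); the rest of each pattern is unchanged.

```python
import re

sample = "+$43073$7"

# The question's anchored attempt: ^\$+ demands "$" as the first character,
# but the string starts with "+", so nothing matches.
attempt = re.search(r"^\$+(?P<lot>\d{5})", sample)

# Escaping the plus matches the literal "+$" prefix.
anchored = re.search(r"^\+\$(?P<lot>\d{5})", sample)

# Without the anchor, a single escaped "$" before the digits is enough.
unanchored = re.search(r"\$(?P<lot>\d{5})", sample)

print(attempt, anchored.group("lot"), unanchored.group(1))
```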

I would match what you want:
\d+
or if you only want digits after "special" characters at the start of input:
^\W+(\d+)
grabbing group 1
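A short sketch of both patterns, assuming Python's re module. Note that \d+ takes the whole run of digits, so a longer run (the second input is made up) yields more than five digits.

```python
import re

sample = "+$43073$7"

# First run of digits anywhere in the string.
first_digits = re.search(r"\d+", sample).group()

# Digits that directly follow non-word characters at the start of the input.
after_prefix = re.search(r"^\W+(\d+)", sample).group(1)

# Hypothetical input with a six-digit run: \d+ does not stop at five.
longer_run = re.search(r"^\W+(\d+)", "+$430731$7").group(1)

print(first_digits, after_prefix, longer_run)
```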

C# regular expression to match square brackets

I'm trying to use a regular expression in C# to match a software version number that can contain:
a 2 digit number
a 1 or 2 digit number (not starting in 0)
another 1 or 2 digit number (not starting in 0)
a 1, 2, 3, 4 or 5 digit number (not starting in 0)
an optional letter at the end enclosed in square brackets.
Some examples:
10.1.23.26812
83.33.7.5
10.1.23.26812[d]
83.33.7.5[q]
Invalid examples:
10.1.23.26812[
83.33.7.5]
10.1.23.26812[d
83.33.7.5q
I have tried the following:
string rex = @"[0-9][0-9][.][1-9]([0-9])?[.][1-9]([0-9])?[.][1-9]([0-9])?([0-9])?([0-9])?([0-9])?([[][a-zA-Z][]])?";
(note: if I try without the "@" and just escape the square brackets by doing "\[" I get an error saying "Unrecognised escape sequence")
I can get to the point where the version number is validating correctly, but it accepts anything that comes after (for example: "10.1.23.26812thisShouldBeWrong" is being matched as correct).
So my question is: is there a way of using a regular expression to match / check for square brackets in a string, or would I need to convert them to a different character (eg: change [a] to *a* and match for *s instead)?

This happens because the regex matches part of the string, and you haven't told it to force the entire string to match. Also, you can simplify your regex a lot (for example, you don't need all those capturing groups):
string rex = @"^[0-9]{2}\.[1-9][0-9]?\.[1-9][0-9]?\.[1-9][0-9]{0,4}(?:\[[a-zA-Z]\])?$";
The ^ and $ are anchors that match the start and end of the string.
The error message you mentioned has to do with the fact that you need to escape the backslash, too, if you don't use a verbatim string. So a literal opening bracket can be matched in a regex as "[[]" or "\\[" or @"\[". The latter form is preferred.
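The anchored pattern can be checked against the examples from the question. This sketch assumes Python's re module in place of C#; a raw string (r"...") plays the role of the C# verbatim string, and the helper name is_valid is hypothetical.

```python
import re

VERSION = re.compile(
    r"^[0-9]{2}\.[1-9][0-9]?\.[1-9][0-9]?\.[1-9][0-9]{0,4}(?:\[[a-zA-Z]\])?$"
)

def is_valid(version: str) -> bool:
    return VERSION.search(version) is not None

valid = ["10.1.23.26812", "83.33.7.5", "10.1.23.26812[d]", "83.33.7.5[q]"]
invalid = [
    "10.1.23.26812[",
    "83.33.7.5]",
    "10.1.23.26812[d",
    "83.33.7.5q",
    "10.1.23.26812thisShouldBeWrong",
]

valid_results = [is_valid(v) for v in valid]
invalid_results = [is_valid(v) for v in invalid]
print(valid_results, invalid_results)
```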

You need to anchor the regex with ^ and $:
string rex = @"^[0-9][0-9][.][1-9]([0-9])?[.][1-9]([0-9])?[.][1-9]([0-9])?([0-9])?([0-9])?([0-9])?([[][a-zA-Z][]])?$";
The reason 10.1.23.26812thisShouldBeWrong matches is that the unanchored regex matches the substring 10.1.23.26812.
The regex can be simplified slightly for readability:
string rex = @"^\d{2}\.([1-9]\d?\.){2}[1-9]\d{0,4}(\[[a-zA-Z]\])?$";
In response to TimCross warning - updated regex
string rex = @"^[0-9]{2}\.([1-9][0-9]?\.){2}[1-9][0-9]{0,4}(\[[a-zA-Z]\])?$";
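The answer does not say what the warning was. One difference that may be relevant is that \d can match digits outside 0-9; the sketch below shows this in Python's re module, where \d matches any Unicode decimal digit in str patterns. That this was the concern raised is an assumption, and the Arabic-Indic input is made up.

```python
import re

WITH_D = re.compile(r"^\d{2}\.([1-9]\d?\.){2}[1-9]\d{0,4}(\[[a-zA-Z]\])?$")
WITH_RANGE = re.compile(
    r"^[0-9]{2}\.([1-9][0-9]?\.){2}[1-9][0-9]{0,4}(\[[a-zA-Z]\])?$"
)

ascii_version = "10.1.23.26812[d]"
# Hypothetical input whose first two digits are Arabic-Indic (U+0661, U+0660).
unicode_version = "\u0661\u0660.1.23.26812[d]"

ascii_ok = (bool(WITH_D.search(ascii_version)),
            bool(WITH_RANGE.search(ascii_version)))
unicode_ok = (bool(WITH_D.search(unicode_version)),
              bool(WITH_RANGE.search(unicode_version)))

print(ascii_ok, unicode_ok)
```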
