Regex expression for digit followed by dot (.) - regex

I want to find a text with with digit followed by a dot and replace it with the same text (digit with dot) and "xyz" string.
For ex.
1. This is a sample
2. test
3. string
**I want to change it to**
1.xyz This is a sample
2.xyz test
3.xyz string
I learnt how to find the matching text (\d.) but the challenge is to find the replace with text.
I'm using notepad ++ editor for this, can anyone suggest the "Replace with" string.

First of all, you need to escape the dot since it means "match anything (except newline depending if the s modifier is set)": (\d\.).
Second, you need to add a quantifier in case you have a 2 digit number or more: (\d+\.).
Third, we don't need group 1 in this case: \d+\..
In the replacement, it's quite simple: just use $0xyz. $0 will refer to group 0 which is the whole match.

For notepad++...
You must escape the period/dot character in the expression - precede it with a backslash:
\.
In my case, I needed to find all instances of "{EnvironmentName}.api.mycompany.com"
(dev.api.mycompany.com, stage.api.mycompany.com, prod.api.mycompany, etc.)
I used this search expression:
.*\.api.mycompany.com
Notepad++ RegEx Search Screenshot

I think the right answer is as follow:
Find: ^(\d)([.])(\s)
Replace: $1$2XYZ
That will work with "n. " being "n" a digit [0-9]. If the input should accept digits with different lengths like 10, 100, 1000... or multiples dots "." after the digit or multiple spaces after the dot, then the answer is:
Find: ^(\d*)([.])([.]*)(\s*)
Replace: $1$2XYZ
Input:
1. This is a sample
2. test
3. string
30. string
10..... string
50005... string
Output:
1.XYZ This is a sample
2.XYZ test
3.XYZ string
30.XYZ string
10.XYZ string
50005.XYZ string

Related

Regular Expression: Find a specific group within other groups in VB.Net

I need to write a regular expression that has to replace everything except for a single group.
E.g
IN
OUT
OK THT PHP This is it 06222021
This is it
NO MTM PYT Get this content 111111
Get this content
I wrote the following Regular Expression: (\w{0,2}\s\w{0,3}\s\w{0,3}\s)(.*?)(\s\d{6}(\s|))
This RegEx creates 4 groups, using the first entry as an example the groups are:
OK THT PHP
This is it
06222021
Space Charachter
I need a way to:
Replace Group 1,2,4 with String.Empty
OR
Get Group 3, ONLY
You don't need 4 groups, you can use a single group 1 to be in the replacement and match 6-8 digits for the last part instead of only 6.
Note that this \w{0,2} will also match an empty string, you can use \w{1,2} if there has to be at least a single word char.
^\w{0,2}\s\w{0,3}\s\w{0,3}\s(.*?)\s\d{6,8}\s?$
^ Start of string
\w{0,2}\s\w{0,3}\s\w{0,3}\s Match 3 times word characters with a quantifier and a whitespace in between
(.*?) Capture group 1 match any char as least as possible
\s\d{6,8} Match a whitespace char and 6-8 digits
\s? Match an optional whitespace char
$ End of string
Regex demo
Example code
Dim s As String = "OK THT PHP This is it 06222021"
Dim result As String = Regex.Replace(s, "^\w{0,2}\s\w{0,3}\s\w{0,3}\s(.*?)\s\d{6,8}\s?$", "$1")
Console.WriteLine(result)
Output
This is it
My approach does not work with groups and does use a Replace operation. The match itself yields the desired result.
It uses look-around expressions. To find a pattern between two other patterns, you can use the general form
(?<=prefix)find(?=suffix)
This will only return find as match, excluding prefix and suffix.
If we insert your expressions, we get
(?<=\w{0,2}\s\w{0,3}\s\w{0,3}\s).*?(?=\s\d{6}\s?)
where I simplified (\s|) as \s?. We can also drop it completely, since we don't care about trailing spaces.
(?<=\w{0,2}\s\w{0,3}\s\w{0,3}\s).*?(?=\s\d{6})
Note that this works also if we have more than 6 digits because regex stops searching after it has found 6 digits and doesn't care about what follows.
This also gives a match if other things precede our pattern like in 123 OK THT PHP This is it 06222021. We can exclude such results by specifying that the search must start at the beginning of the string with ^.
If the exact length of the words and numbers does not matter, we simply write
(?<=^\w+\s\w+\s\w+\s).*?(?=\s\d+)
If the find part can contain numbers, we must specify that we want to match until the end of the line with $ (and include a possible space again).
(?<=^\w+\s\w+\s\w+\s).*?(?=\s\d+\s?$)
Finally, we use a quantifier for the 3 ocurrences of word-space:
(?<=^(\w+\s){3}).*?(?=\s\d+\s?$)
This is compact and will only return This is it or Get this content.
string result = Regex.Match(#"(?<=^(\w+\s){3}).*?(?=\s\d+\s?$)").Value;

Regex for "replace all after first 2 digits after comma"

I would like to replace all characters after the first 2 digits after a comma.
E.g. having a string of 1234,56789 should result into 1234,56.
Using [^,]*$ has led me to the right path, but deleting everything after the comma.
A [^,]..$ doesnt give me a correct result too, thus I need a way to tell my expression that "the first 2 digits after the comma" got to be deleted, not "the last 2 digits" since thats what the ".." seems to do in my expression.
You can use
(,\d{2}).*
The regex matches and captures into Group 1 a comma and two digits, and just matches the rest of the line with .*.
To remove only after last comma:
(.*,\d{2}).*
Here, .* at the start captures also everything at the start of the string.
A more retrictive pattern will be
^(\d+,\d{2})\d*$
It matches start of string (with ^), then one or more digits (with \d+), a comma, two digits, all captured into Group 1, and then just matches zero or more digits (with \d*) at the end of the string ($).
Replace with $1 (or \1 depending on the regex engine). See the regex demo (also this one and this one, too).
You can use:
import re
re.sub(r',(\d{2}).*', r',\1', a)

Regex for a string with alpha numeric containing a '.' character

I have not been able to find a proper regex to match any string not starting and ending with some condition.
This matches
AS.E
23.5
3.45
This doesn't match
.263
321.
.ASD
The regex can be alpha-numeric character with optional '.' character and it has to be with in range of 2-4(minimum 2 chars & maximum 4 chars).
I was able to create one ->
^[^\.][A-Z|0-9|\.]{2,4}$
but with this I couldn't achieve mask '.' character at the end of regex.
Thanks.
Maybe not the most optimized but a working one. Created step by step:
The first character should be alphanumeric
^[a-zA-Z0-9]
0, 1 or 2 character alphanumeric or . but not matching end of string
[a-zA-Z0-9\.]{0,2}
an alphanumeric character matching end of string
[a-zA-Z0-9]$
Concatenate all of this to obtain your regex
^[a-zA-Z0-9][a-zA-Z0-9\.]{0,2}[a-zA-Z0-9]$
Edit: This regex allows multiple dots (up to 2)
If I guessed correctly, you want to match all words that are
Between 2 and 4 characters long ...
... and start and end with a character from [A-Z0-9] ...
... and have characters from [A-Z0-9.] in the middle ...
... and are not preceded or followed by a ..
Try this regex to match all these substrings in a text:
(?<=^|[^.])[A-Z0-9][A-Z0-9.]{0,2}[A-Z0-9](?=$|[^.])
However, note that this will match the AA in .AAAA.. If you don't want this match, then please give more details on your requirements.
When you are only interested in the number of matches, but not the matched strings, then you could use
(^|[^.])[A-Z0-9][A-Z0-9.]{0,2}[A-Z0-9]($|[^.])
If you have one string, and want to know whether that string completely matches or not, then use
^[A-Z0-9][A-Z0-9.]{0,2}[A-Z0-9]$
If there may be at most one . inside the match, replace the part [A-Z0-9.]{0,2} with ([A-Z0-9]?[A-Z0-9.]?|[A-Z0-9.]?[A-Z0-9]?).
You can use this pattern to match what you say,
^[^\.][a-zA-Z0-9\.]{2,4}[^\.]$
Check the result here..
https://regex101.com/r/8BNdDg/3

vim match group of numbers and replace

I have a large file with data in this format:
regabc123456_user_domain_application_env_id
regdef789101_user_domain_application_env_id
in vim I want to do a search and replace ("_" for ", ") and match the machine name (regabc123456).
i am trying this:
:%s/^reg.*\{6}_/^reg.*\{6},\ /g
^ for beginning of the line 'reg' because all start with this then '.*' for anything after that but before the six digit code starts which I am tryign to catch with {6}.
This doesn't seem to be doing what I want. I can match the machine name, but I can't replace it with what I want. Is there an easier way to identify the machine name with regular expressions? example:
'reg' followed by three lower case letter followed by six numbers followed by an underscore, then replace?
Thanks.
The below regex would replace regabc123456_ to regabc123456,
:%s/^\(reg.*[0-9]\{6\}\)_/\1,/g
OR
:%s/^\(reg[a-z]\{3\}[0-9]\{6\}\)_/\1,/g
If you want a space after the comma then add space after comma in the replacement part.
%s/^\(reg[a-z]\{3\}[0-9]\{6\}\)_/\1, /g
To match a 6 digit number , you need to use [0-9]\{6\}. It repeats the previous token exactly 6 times.

changing RegEx from 3 digits to 4

I'm not that great at RegEx, and have the following piece of code on my hands:
value.replace(/\s*.*(\d+[,\.]\d+)[^\d]*/m, "$1");
Now it works great at reducing this "\r\n\t\t\t\t& #36;0.05 USD\t\t\t" (please note I've intentionally left a space between the & and # as removing it converts it to a dollar sign on the site) to this "0.05". The issue I have is that if the number is a double digit (10.05 rather than 0.05) the expression removes the digit from the front and still outputs 0.05 rather than 10.05.
From what I can see in the expression, it's hard coded to pick up just 3 digits, so I was wondering if there's a way to amend it to also work in cases where there are 4 digits.
The . after /\s* is matching the first digit if there are 2 or more digits. Remove that and see if it works...
value.replace(/\s*(\d+[,.]\d+)[^\d]/m, "$1");
Given your example of the regex:
/\s*.*(\d+[,.]\d+)[^\d]/m
And the data:
\r\n\t\t\t\t$0.05 USD\t\t\t
\r\n\t\t\t\t$10.05 USD\t\t\t
In the regex, the leading "/" (forward-slash), and the "/" before the "m" delimits the regex and is not part of the matching.
The "\s" in the regex is shorthand for [ \t\r\n\f] which matches whitespace (space, tab, Carriage-return, Line-feed, Form-feed). So, "\s*" will match "\r\n\t\t\t\t"
The "." (dot) in the regex matches any single character (generally any character except "\n").
The "*" following the "." says to match any 0 or more characters. So, together the ".*", matches the "$" (and possibly, additionally, one or more digits... see below).
Next, the "(" in the regex starts the part of the regex that will "capture" part of your data.
The "\d" in the regex will match any 1 number. Actually "\d" matches [0-9] and other digit characters, like Eastern Arabic numerals "??????????".
The "+" following the "\d" says to match any 1 or more numbers (digits).
The "[,.]" in the regex will match one of either a literal "." (dot), or a "," (comma), to match the "decimal" separator.
Another "\d+" to match any 1 or more numbers (digits).
Next, the ")" in the regex closes the part of the regex that will "capture" part of your data.
The "[^\d]" will match any 1 character that is not a number (digit). So, in this case, it will match the
" " (space).
The "m" at the end of the regex (following the second "/"): "m" changes the behavior of the "^" and "$" anchors, which are not used in your regex, so the "m" should have no effect. But, if you're using Ruby, "m" changes the behavior of the "." (dot).
Now, the "problem"... the ".*" (before the "("), is in regex terms, "greedy". This means it will match as "early" as possible, and for as "long" as possible. So, if there is more than 1 digit following the ";", then the ".*" will consume some digits.
Note: Using ".*" can cause all sorts of problems, especially with "/m" under Ruby. It's best to avoid using ".*" if possible.
There are 2 ways to fix this.
1) If the part before the number you want to capture is always "$", then specify that in regex instead of the ".*". So like this:
/\s*$(\d+[,.]\d+)[^\d]/m
or, if it will always be "$" or something very similar to that:
/\s*[^;]+;(\d+[,.]\d+)[^\d]/m
Here, "[^;]+;" means any string of 1 or more characters that does not contain a ";" followed by a "[;]".
2) If the part before the number you want to capture which is shown as "$", could be totally different in the data, then you just need to make sure that the part of the regex that is currently ".*" will not match a digit in the last position. So like this:
/\s[^.,]*[^\d](\d+[,.]\d+)[^\d]/m
Here, "[^.,]*[^\d]" means any string of 0 or more characters that does not contain a "." (dot) or a "," (comma) where the last character does not contain a digit.
Try this
value.replace( /\s*.(\d+[,.]\d+)[^\d]/m, "$1");
WORKING REGEX
Output:
The .* matches greedily and therefore matches as many characters, including digits, as it can, as long as the rest of the pattern can still match.
The rest of the pattern can still match if just one digit is left for the /d+ to match, so you only end up with one digit there.
If the semicolon in your example is always in that position in the strings you wish to match, use it as a marker like this
value.replace(/.*;(\d+[,\.]\d+).*/m, "$1");