Replace or remove period/dot from expression - regex

I have this expression in PCRE and I want to leave/remove the . (period) out of klantnummer.
Expression:
^h:\/Klant\/(?<klantnummer>[^\/]+)\/(?<folder1>[^\/]+)\/(?<folder2>[^\/]+)
Input:
h:/Klant/12345678.9/map 1/map 2
Outcome: 12345678.9
Desired result: 123456789
https://regex101.com/r/EVv47V/1
So Klantnummer should have 123456789 as result

You can't do that in one step. You could catch it in two Capture Groups:
^h:\/Klant\/(?<klantnummer1>[^\.\/]+)\.(?<klantnummer2>[^\/]+)\/(?<folder1>[^\/]+)\/(?<folder2>[^\/]+)
and join the two by string concatenation afterwards, or use two regex steps and filter out the period in the second, as stated in the comments.
The regex above assumes there is always a period. This one works for zero or one period in the number (klantnummer2 is simply not set when there is no period):
^h:\/Klant\/(?<klantnummer1>[^.\/]+)(?:\.(?<klantnummer2>[^.\/]+))?\/(?<folder1>[^\/]+)\/(?<folder2>[^\/]+)
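The concatenation step could be sketched as follows. This is only an illustration, assuming Python's re module, which spells named groups (?P<name>...) instead of (?<name>...). The pattern makes the period and the part after it optional, and the function name extract_klantnummer is hypothetical.

```python
import re

# Python spells named groups (?P<name>...).
# The period and the part after it are optional.
PATTERN = re.compile(
    r"^h:/Klant/(?P<klantnummer1>[^./]+)(?:\.(?P<klantnummer2>[^./]+))?"
    r"/(?P<folder1>[^/]+)/(?P<folder2>[^/]+)"
)


def extract_klantnummer(path):
    """Return klantnummer without the period, or None if the path does not match."""
    m = PATTERN.match(path)
    if m is None:
        return None
    # klantnummer2 is None when the number contains no period.
    return m.group("klantnummer1") + (m.group("klantnummer2") or "")


print(extract_klantnummer("h:/Klant/12345678.9/map 1/map 2"))  # 123456789
print(extract_klantnummer("h:/Klant/123456789/map 1/map 2"))   # 123456789
```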

As already discussed, you can't do this in one step.
Using two regex stages, or splitting klantnummer into two groups (before and after the period), will both work.
However, I believe the simplest and most efficient approach, both in computing power and in code to write, is to replace the . with an empty string '' after the regex match and before using the value.
You haven't said which programming language you are using, so I can't give you the exact syntax.
If all you are doing is splitting the string on the slashes, you will probably find it easier to split it into an array.
For example, in Python:
s = "h:/Klant/12345678.9/map 1/map 2"
array = s.split('/')
Klantnummer=array[2].replace('.','')
folder1=array[3]
folder2=array[4]
print(Klantnummer)
print(folder1)
print(folder2)
Output:
123456789
map 1
map 2
Tested on https://www.online-python.com/
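If you would rather keep the regex from the question and strip the period afterwards, a minimal sketch could look like this. It assumes Python's re module (hence the (?P<name>...) group syntax), and the helper name parse_path is made up for the example.

```python
import re

# The pattern from the question, rewritten with Python's named-group syntax.
PATTERN = re.compile(
    r"^h:/Klant/(?P<klantnummer>[^/]+)/(?P<folder1>[^/]+)/(?P<folder2>[^/]+)"
)


def parse_path(path):
    """Return the named groups as a dict, with the period removed from klantnummer."""
    m = PATTERN.match(path)
    if m is None:
        return None
    fields = m.groupdict()
    # Post-process the captured value instead of complicating the regex.
    fields["klantnummer"] = fields["klantnummer"].replace(".", "")
    return fields


print(parse_path("h:/Klant/12345678.9/map 1/map 2"))
```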

Related

How to pad zeroes into String using regex - using only regex101.com

I need to come up with "Regular expression" and a "Substitute" to pad any string that's shorter than 10 characters with zeros. It has to work on regex101.com, PHP flavor. This is all I need.
Example Input:
123
12345
1234567891
Expected output:
0000000123
0000012345
1234567891
I wish it were as simple as searching for ([0-9]{1,9}) and replacing it with 000000000$1, but obviously the string would then exceed 10 characters. So I have been trying lookahead syntax, but with no luck.
As you mentioned in the comments below your question, I provided a .NET method using a catalog to pad a string in regex without using a conditional replacement (see my answer here).
This answer can be adapted to PCRE by using a branch reset group (?|...).
See regex in use here
Options gJsm and substitution of ${x}$1
^((?|[1-9](?=.*1\t+(?<x>0+))|[1-9]\d(?=.*2\t+(?<x>0+))|[1-9]\d{2}(?=.*3\t+(?<x>0+))|[1-9]\d{3}(?=.*4\t+(?<x>0+))|[1-9]\d{4}(?=.*5\t+(?<x>0+))|[1-9]\d{5}(?=.*6\t+(?<x>0+))|[1-9]\d{6}(?=.*7\t+(?<x>0+))|[1-9]\d{7}(?=.*8\t+(?<x>0+))))\b
The result:
1
12
123
12345678
123456789
Becomes...
000000001
000000012
000000123
012345678
123456789

Regex for Phone Numbers allowing for only 6 to 20 characters

Regex beginner here. I've been trying to tackle this rule for phone numbers to no avail and would appreciate some advice:
Minimum 6 characters
Maximum 20 characters
Must contain numbers
Can contain these symbols ()+-.
Do not match if all the numbers included are the same (ie. 111111)
I managed to build two of the following pieces but I'm unable to put them together.
Here's what I've got:
(^(\d)(?!\1+$)\d)
([0-9()-+.,]{6,20})
Many thanks in advance!
I'd go about it by first getting a list of all possible phone numbers (thanks @CAustin for the suggested improvements):
lst_phone_numbers = re.findall('[0-9+()-]{6,20}',your_text)
And then filtering out the ones that do not comply with statement 5, using whatever programming language you're most comfortable with.
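The two-step idea could be sketched in Python roughly as follows. The character class is the one from the findall call above; the function name find_phone_numbers and the sample text are made up for illustration.

```python
import re

# Candidate pattern taken from the findall call above.
CANDIDATE = re.compile(r"[0-9+()-]{6,20}")


def find_phone_numbers(text):
    """Return candidates that contain at least two different digits."""
    results = []
    for candidate in CANDIDATE.findall(text):
        digits = re.sub(r"\D", "", candidate)  # keep only the digits
        # Drop candidates with no digits, or where every digit is the same.
        if len(set(digits)) > 1:
            results.append(candidate)
    return results


print(find_phone_numbers("call 111111 or +31(0)20-1234567"))
# ['+31(0)20-1234567']
```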
Try this RegEx:
(?:([\d()+-])(?!\1+$)){6,20}
Explained:
1. (?: creates a non-capturing group
2. ([\d()+-]) creates a capturing group that matches a digit, parenthesis, +, or -
3. (?!\1+$) fails the match if the character captured in #2 repeats one or more times up to the end of the string
4. {6,20} requires 6 to 20 repetitions of the non-capturing group from #1
Try this:
((?:([0-9()+\-])(?!\2{5})){6,20})
The (?!\2{5}) part controls how many consecutive repeats of the same character are allowed (for example 22222). I used 5 as an example; you can change it as you want.

Regular Expression Extracting Text from a group

I have a filename like this:
0296005_PH3843C5_SEQ_6210_QTY_BILLING_D_DEV_0000000000000183.PS.
I needed to break the name down into groups separated by underscores, which I did like this:
(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)(\d{16})(.*)
So far so good.
Now I need to extract characters from one of the groups; for example, in group 2 I need the first 3 and 8 digits (keep in mind they could be letters too).
So I tried something like this:
(.*?)_([38]{2})(.*?) _(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)(\d{16})(.*)
It didn't work, but if I do this:
(.*?)_([PH]{2})(.*?) _(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)(\d{16})(.*)
it pulls the PH into a group, but not the 38. So I'm lost at this point.
Any help would be great.
Try the regex below to match any 3 letters/digits followed by one digit:
(.?)_([A-Z0-9]{3}[0-9]{1})(.?)(.*?)(.?)_(.?)(.*?)(.?)_(.?)
Try the regex below to match any 3 letters/digits followed by one letter/digit:
(.?)_([A-Z0-9]{3}[A-Z0-9]{1})(.?)(.*?)(.?)_(.?)(.*?)(.?)_(.?)
It will match any 3 letters/digits followed by 1 letter/digit.
If the first two letters are constant, like "PH", then try the one below:
(.?)_([PH]+[0-9A-Z]{2})(.?)(.*?)(.?)_(.?)(.*?)(.?)_(.?)
I am assuming that you are trying to match group 2 starting with numbers. If that is the case, then you have to change the source string, such as:
0296005_383843C5_SEQ_6210_QTY_BILLING_D_DEV_0000000000000183.PS.
It works, check it out at https://regex101.com/r/zem3vt/1
Using [^_]* performs much better in your case than .*? since it doesn't backtrack. So changing your original regex from:
(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)_(.*?)(\d{16})(.*)
to:
([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*?)(\d{16})(.*)
reduces the number of steps from 114 to 42 for your given string.
The best method might be to actually split your string on _ and then test the second element to see if it contains 38. Since you haven't specified a language, I can't show how to do it in yours, but most languages provide a contains or indexOf method that can be used to determine whether a substring exists in a string.
Using regex alone, however, this can be accomplished using the following regular expression.
See regex in use here
Ensuring 38 exists in the second part:
([^_]*)_([^_]*38[^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*?)(\d{16})(.*)
Capturing the 38 in the second part:
([^_]*)_([^_]*)(38)([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_([^_]*)_(.*?)(\d{16})(.*)
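For comparison, the split-based approach mentioned earlier in this answer could look like this in Python. This is only a sketch; the function name second_part_contains is hypothetical, and the filename is the one from the question without the sentence-ending period.

```python
def second_part_contains(filename, needle="38"):
    """Split on underscores and test whether the second part contains needle."""
    parts = filename.split("_")
    return len(parts) > 1 and needle in parts[1]


name = "0296005_PH3843C5_SEQ_6210_QTY_BILLING_D_DEV_0000000000000183.PS"
print(second_part_contains(name))  # True

# str.partition splits the second part around the first occurrence of "38".
print(name.split("_")[1].partition("38"))  # ('PH', '38', '43C5')
```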

REGEX : Extract group of number where digits are more than 3

HI I have a question regarding REGEX.
This sounds very simple and I remember doing it but somehow it got deleted and I am finding it hard to get it back.
I want to extract group of numbers from one line.
If the count of digits > 3 - select that.
EG:
ga3rdparty/phpMyAdmin/i0ndex.php?&t0oken=abf540063shakk
This line can be different everytime but there will be only 1 group of digits with more than 2 digits.
OUTPUT: 540063
Thank you in advance
You can use \d{3,} where 3 is the minimum number of digits. You can take a look at the following Python code:
import re
var = "ga3rdparty/phpMyAdmin/i0ndex.php?&t0oken=abf540063shakk"
# Match every run of three or more consecutive digits
pattern = re.compile(r'\d{3,}')
for match in pattern.findall(var):
    print(match)

What is wrong with this Regular Expression?

I am a beginner and have some problems with regex.
Input text is : something idUser=123654; nick="Tom" something
I need to extract the value of idUser -> 123654
I tried this:
//idUser is already 8 digits number
MatchCollection matchsID = Regex.Matches(pk.html, @"\bidUser=(\w{8})\b");
Text = matchsID[1].Value;
but in the output I get idUser=123654, and I need only the number.
The second problem is with nick="Tom": how can I get only the text Tom from this expression?
You don't show your output code, where you get the group from your match collection.
Hint: you will need group 1 and not group 0 if you want to have only what is in the parentheses.
.*?idUser=([0-9]+).*?
That regex should work for you :o)
Here's a pattern that should work:
\bidUser=(\d{3,8})\b|\bnick="(\w+)"
Given the input string:
something idUser=123654; nick="Tom" something
This yields 2 matches (as seen on rubular.com):
First match is idUser=123654, group 1 captures 123654
Second match is nick="Tom", group 2 captures Tom
Some variations:
In .NET regex, you can also use named groups for better readability.
If nick always appears after idUser, you can match the two at once instead of using alternation as above.
I've used {3,8} repetition to show how to match at least 3 and at most 8 digits.
API links
Match.Groups property
This is how you get what individual groups captured in a match
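To illustrate how the two groups behave, here is a small sketch using Python's re module rather than .NET; the pattern is the one above, and the helper name extract_fields is made up. In each match only one of the two groups is set, and group 0 is always the whole match.

```python
import re

PATTERN = re.compile(r'\bidUser=(\d{3,8})\b|\bnick="(\w+)"')


def extract_fields(text):
    """Return (idUser, nick) taken from groups 1 and 2 of the alternation."""
    id_user = nick = None
    for m in PATTERN.finditer(text):
        # Group 0 is the whole match, e.g. idUser=123654; groups 1 and 2
        # hold only the part inside the parentheses.
        if m.group(1) is not None:
            id_user = m.group(1)
        if m.group(2) is not None:
            nick = m.group(2)
    return id_user, nick


print(extract_fields('something idUser=123654; nick="Tom" something'))
# ('123654', 'Tom')
```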
Use look-around
(?<=idUser=)\d{1,8}(?=(;|$))
To fix length of digits to 6, use (?<=idUser=)\d{6}(?=($|;))
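With look-around the whole match is already the number, so no group is needed. A quick check of the first pattern, again using Python's re module instead of .NET as an assumption for the example (the function name id_user_via_lookaround is hypothetical):

```python
import re

# Lookbehind requires idUser= before the digits; lookahead requires ; or end of string.
LOOKAROUND = re.compile(r"(?<=idUser=)\d{1,8}(?=(;|$))")


def id_user_via_lookaround(text):
    """Return the digits after idUser=, or None if there is no match."""
    m = LOOKAROUND.search(text)
    return m.group(0) if m else None


print(id_user_via_lookaround('something idUser=123654; nick="Tom" something'))
# 123654
```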