Get Number Grouping with Regex

I have strings that look like this:
get_a_string_A14_for_1.23.87.19_A12_and_others
get_a_string_A14_for_1.23.827.19_A12_and_others
get_a_string_A14_for_1.23.87.1_A12_and_others
get_a_string_A14_for_2.23.87.19_A12_and_others
I want to pull the numbers 1.23.87.19, 1.23.827.19, 1.23.87.1, and 2.23.87.19. The numbers will change, but this is the basic structure of the numbers.
I have tried doing:
([0-9]\.[0-9])
[0-9]\.[0-9]{1,4}
[0-9]\.[0-9]\.[0-9]{1,4}
[0-9]\.[0-9]\.[0-9]
And more, but have not had any luck. Can someone please help, and explain what I need to do to get these number groupings?

You can use this regex:
[0-9]+(?:\.[0-9]+)+
RegEx Breakup:
[0-9]+ # Match 1 OR more digits
(?: # start of non-capturing group
\. # match a literal dot
[0-9]+ # Match 1 OR more digits
) # group close
(?:\.[0-9]+)+ # Match 1 OR more of the expression in the group
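As a quick check of the pattern above, here is a minimal Python sketch run against the four sample strings from the question. Python is my choice here (the question does not name a language), and the names VERSION_RE, samples and found are only illustrative.

```python
import re

# Pattern from the answer: one or more digits, then one or more ".digits" groups.
VERSION_RE = re.compile(r"[0-9]+(?:\.[0-9]+)+")

samples = [
    "get_a_string_A14_for_1.23.87.19_A12_and_others",
    "get_a_string_A14_for_1.23.827.19_A12_and_others",
    "get_a_string_A14_for_1.23.87.1_A12_and_others",
    "get_a_string_A14_for_2.23.87.19_A12_and_others",
]

# findall returns whole matches because the only group is non-capturing.
found = [VERSION_RE.findall(s) for s in samples]

for hits in found:
    print(hits)
```

The 14 in A14 and the 12 in A12 are skipped because the pattern requires at least one dot followed by digits.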

Regex re negative lookahead doesn't exclude multiple characters successfully

There are five examples below, and I am trying to match examples 3, 4, and 5 while excluding 1 and 2.
1. ABC-abc
2. abc-ABC
3. ABC-ABC
4. ABC
5. vABC-ABC-ABCv
The current expression I use is:
(?!(\w*[A-Z]{2,}-[a-z]+\w*|\w*[a-z]+-[A-Z]{2,}\w*))(\w*-?[A-Z]{2,}-?\w*)
I use (\w*-?[A-Z]{2,}-?\w*) to match all of the examples first.
I then use (?!...|...) to add two exclusion conditions.
The first exclusion condition is \w*[A-Z]{2,}-[a-z]+\w* and the second is \w*[a-z]+-[A-Z]{2,}\w*.
This expression excludes example 1 (ABC-abc) but not example 2 (abc-ABC).
I searched a lot and found that some people say this is not something regex is "good" at. Is there any solution or improvement that would let me exclude abc-ABC?
I appreciate any help or opinions.
As I understand it, strings are to be rejected if they contain a hyphen that is preceded by a lower-case letter and followed by an upper-case letter, or vice versa; otherwise they are to be accepted. If so, the following regular expression could be used.
^(?!.*(?:[a-z]-[A-Z]|[A-Z]-[a-z]))
The regex engine performs the following operations.
^ # match beginning of line
(?! # begin a negative lookahead
.* # match 0+ characters
(?: # begin a non-capture group
[a-z]-[A-Z] # match a lc letter, '-', uc letter
| # or
[A-Z]-[a-z] # match an uc letter, '-', lc letter
) # end non-capture group
) # end negative lookahead
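To see the lookahead in action, here is a small Python sketch (Python is an assumption; the question only mentions "re") that filters the five examples. The names REJECT_MIXED_CASE_HYPHEN, examples and accepted are illustrative.

```python
import re

# Pattern from the answer: succeeds at the start of the string only if no
# hyphen sits between a lower-case and an upper-case letter (in either order).
REJECT_MIXED_CASE_HYPHEN = re.compile(r"^(?!.*(?:[a-z]-[A-Z]|[A-Z]-[a-z]))")

examples = ["ABC-abc", "abc-ABC", "ABC-ABC", "ABC", "vABC-ABC-ABCv"]

# The pattern itself matches an empty string, so it is used as a yes/no test.
accepted = [s for s in examples if REJECT_MIXED_CASE_HYPHEN.match(s)]

print(accepted)
```

Because the pattern consumes nothing, it works as a filter; to extract the text as well, something like .* could be appended after the lookahead.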

Regex code, Python-2 alphanumeric [duplicate]

My regex knowledge is pretty limited, but I'm trying to write/find an expression that will capture the following string types in a document:
DO match:
ADY123
AD12ADY
1HGER_2
145-DE-FR2
Bicycle1
2Bicycle
128D
128878P
DON'T match:
BICYCLE
183-329-193
3123123
Is such an expression possible? Basically, it should find any string containing letters AND digits, regardless of whether the string contains a dash or underscore. I can find the first two using the following two regex:
/([A-Z][0-9])\w+/g
/([0-9][A-Z])\w+/g
But searching for possible dashes and underscores makes it more complicated...
Thanks for any help you can provide! :)
MORE INFO:
I've made slight progress with: ([A-Z|a-z][0-9]+-*_*\w+) but it doesn't capture strings with more than one hyphen.
I had a document with a lot of text strings and number strings, which I don't want to capture. What I do want is any product code, which could be any length string with or without hyphens and underscores but will always include at least one digit and at least one letter.
You can use the following expression with the case-insensitive mode:
\b((?:[a-z]+\S*\d+|\d+\S*[a-z]+)[a-z\d_-]*)\b
Explanation:
\b # Assert position at a word boundary
( # Beginning of capturing group 1
(?: # Beginning of the non-capturing group
[a-z]+\S*\d+ # Match letters, then any non-whitespace chars, then digits
| # OR
\d+\S*[a-z]+ # Match digits, then any non-whitespace chars, then letters
) # End of the group
[a-z\d_-]* # Match letter, digit, '_', or '-' 0 or more times
) # End of capturing group 1
\b # Assert position at a word boundary
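Here is a hedged Python sketch that runs this expression over the sample strings from the question, joined into one line of text. The names PRODUCT_CODE_RE, text and codes are made up for the example, and re.IGNORECASE stands in for the case-insensitive mode.

```python
import re

# Expression from the answer, compiled in case-insensitive mode.
PRODUCT_CODE_RE = re.compile(
    r"\b((?:[a-z]+\S*\d+|\d+\S*[a-z]+)[a-z\d_-]*)\b",
    re.IGNORECASE,
)

# The "DO match" samples followed by the "DON'T match" samples.
text = (
    "ADY123 AD12ADY 1HGER_2 145-DE-FR2 Bicycle1 2Bicycle 128D 128878P "
    "BICYCLE 183-329-193 3123123"
)

# With a single capturing group, findall returns the contents of group 1.
codes = PRODUCT_CODE_RE.findall(text)
print(codes)
```

Note that \S* accepts any non-whitespace character, so in real documents punctuation attached to a token could be pulled into a match; a narrower class such as [a-z\d_-]* may be safer there.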

Regex between a string

Example:
I have the following string
a125A##THISSTRING##.test123
I need to find THISSTRING. There are many strings which are nearly the same so I'd like to check if there is a digit or letter before the ## and also if there is a dot (.) after the ##.
I have tried something like:
([a-zA-Z0-9]+##?)(.+?)(.##)
But I am unable to get it working
You can use a lookbehind and a lookahead:
(?<=[a-zA-Z0-9]##).*?(?=##\.)
https://regex101.com/r/i3RzFJ/2
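A minimal sketch of the lookaround approach in Python (an assumed language; the names BETWEEN_RE and inner are illustrative). Python requires a fixed-width lookbehind, which [a-zA-Z0-9]## satisfies.

```python
import re

# Lookbehind: a letter or digit followed by "##" must precede the match.
# Lookahead: "##." must follow it. Neither is part of the matched text.
BETWEEN_RE = re.compile(r"(?<=[a-zA-Z0-9]##).*?(?=##\.)")

s = "a125A##THISSTRING##.test123"
m = BETWEEN_RE.search(s)
inner = m.group(0) if m else None

print(inner)
```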
You said you were unable to get it working.
Let's deconstruct what your regex ([a-zA-Z0-9]+##?)(.+?)(.##) says.
([a-zA-Z0-9]+##?) matches one or more characters from [a-zA-Z0-9], followed by a #, followed by an optional second #.
(.+?) matches one or more of any character, but as few as possible (a lazy quantifier).
(.##) matches any single character followed by two #. Here the . consumes the G and then ## matches. Hence THISSTRING is not completely captured in a group; group 2 holds only THISSTRIN.
Lookaround assertions are great but can be a little expensive.
You can find such patterns more simply by matching both the wanted and the unwanted parts and capturing only the wanted part in a capturing group.
Regex: (?:[a-zA-Z0-9]##)([^#]+)(?:##\.)
Explanation:
(?:[a-zA-Z0-9]##) Non-capturing group matching ## preceded by a letter or digit.
([^#]+) Capturing as many characters other than #. Stops before a # is met.
(?:##\.) Non-capturing group matching ##. literally.
Javascript Example
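var myString = "a125A##THISSTRING##.test123";
// Group 1 captures the text between "<letter or digit>##" and "##."
var myRegexp = /(?:[a-zA-Z0-9]##)([^#]+)(?:##\.)/g;
var match = myRegexp.exec(myString);
// exec() returns null when nothing matches, so guard before indexing
console.log(match ? match[1] : null);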
You wrote:
check if there is a digit or letter before the ##
I assume you mean a digit / letter before the first ## and
check for a dot after the second ## (as in your example).
You can use the following regex:
[a-z0-9]+ # Chars before "##", except the last
(?: # Last char before "##"
(\d) # either a digit - group 1
| # or
([a-z]) # a letter - group 2
)
\#\#? # 1 or 2 hash chars (escaped, since a bare # starts a comment in x mode)
([^#]+) # "Central" part - group 3
\#\#? # 1 or 2 hash chars
(?: # Check for a dot
(\.) # Captured - group 4
| # or nothing captured
)
[a-z0-9]+ # The last part
# Flags:
# i - case insensitive
# x - ignore blanks and comments
How it works:
Group 1 or 2 captures the last char before the first ##
(either group 1 captures a digit or group 2 captures a letter).
Group 3 catches the "central" part (THISSTRING,
a sequence of chars other than #).
Group 4 catches a dot, if any.
You can test it at https://regex101.com/r/ATjprp/1
Your regex has an error: an unescaped dot matches any character.
If you want to check for a literal dot, you must escape it
with a backslash (compare with group 4 in my solution).
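For illustration, here is how the commented pattern might look in Python with re.VERBOSE and re.IGNORECASE standing in for the x and i flags. This is a sketch, not the answer's original code: the names PARTS_RE and groups are invented, and the hash signs are escaped because a bare # starts a comment in verbose mode.

```python
import re

PARTS_RE = re.compile(
    r"""
    [a-z0-9]+      # chars before the first hash pair, except the last
    (?:            # last char before the hash pair
        (\d)       #   either a digit - group 1
      |            #   or
        ([a-z])    #   a letter       - group 2
    )
    \#\#?          # 1 or 2 hash chars (escaped for verbose mode)
    ([^#]+)        # "central" part   - group 3
    \#\#?          # 1 or 2 hash chars
    (?:(\.)|)      # optional dot     - group 4
    [a-z0-9]+      # the last part
    """,
    re.IGNORECASE | re.VERBOSE,
)

m = PARTS_RE.search("a125A##THISSTRING##.test123")
groups = m.groups() if m else None

print(groups)
```

Group 1 is None here because the character before the first ## is a letter, which lands in group 2.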

How to ignore strings where a specific pattern occurs between two strings?

I am trying to write a RegExp pattern that excludes a certain sub-pattern from the middle of the match. For example, I would like my pattern to start with ABC and end with XYZ, and to exclude any string that has 123 in between ABC and XYZ. Please note that if 123 is anywhere in between ABC and XYZ there will be no match.
So for instance:
ABC45123XYZ (No-Match)
ABCfg12XYZ (Match)
ABC9321%$XYZ (Match)
ABC123XYZ (No-Match)
ABC001234XYZ (No-Match)
I have tried the following pattern with a negative lookahead
rex.Pattern = "ABC.+?(?!123).+?XYZ"
but that didn't work. What's the correct way to achieve this?
You can achieve this with a negative lookahead:
ABC(?:(?!123).)*XYZ
Explanation:
ABC # Match literal chars 'ABC'
(?: # Begin non-capturing group
(?! # Negative lookahead: if not followed by
123 # Match Literal chars '123'
) # End of negative lookahead
. # Advance one character at a time
)* # Repeat the group zero or more times
XYZ # Match literal chars 'XYZ'
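The explanation above can be checked with a short Python sketch (the question uses a VBA-style rex.Pattern, so Python is only a stand-in, and the names NO_123_BETWEEN, cases and results are illustrative).

```python
import re

# Each character between ABC and XYZ is consumed only if "123" does not
# start at that position.
NO_123_BETWEEN = re.compile(r"ABC(?:(?!123).)*XYZ")

cases = ["ABC45123XYZ", "ABCfg12XYZ", "ABC9321%$XYZ", "ABC123XYZ", "ABC001234XYZ"]

results = {s: bool(NO_123_BETWEEN.search(s)) for s in cases}

for s, ok in results.items():
    print(s, "Match" if ok else "No-Match")
```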
(?!ABC.*?123.*?XYZ)ABC.*?XYZ
This uses a negative lookahead to check for 123 before consuming any of the string.
http://regex101.com/r/mD7jN1/1
Here you go:
ABC(.(?!123))+?XYZ
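You need to put it in parentheses and place a dot (which matches any character) in front of the lookahead; the group then matches any character that is not followed by 123 ;) Note that the lookahead is only checked after a character has been consumed, so a 123 that immediately follows ABC is not rejected: this pattern still matches ABC123XYZ.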
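The position of the lookahead relative to the dot matters. The Python sketch below compares the pattern from this answer with a variant that tests the lookahead before consuming each character; the variant and the names DOT_FIRST, LOOKAHEAD_FIRST and outcome are my own illustration, not part of the answer.

```python
import re

# Pattern from this answer: consume a character, then check what follows it.
DOT_FIRST = re.compile(r"ABC(.(?!123))+?XYZ")
# Illustrative variant: check for "123" first, then consume the character.
LOOKAHEAD_FIRST = re.compile(r"ABC(?:(?!123).)+?XYZ")

cases = ["ABC45123XYZ", "ABCfg12XYZ", "ABC123XYZ"]

outcome = {
    s: (bool(DOT_FIRST.search(s)), bool(LOOKAHEAD_FIRST.search(s)))
    for s in cases
}

for s, (dot_first, lookahead_first) in outcome.items():
    print(s, dot_first, lookahead_first)
```

The two patterns differ only on ABC123XYZ, where 123 comes directly after ABC. Both use +, so both also require at least one character between ABC and XYZ.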

Check the specific number of occurrences of a single character in a string with regex

I'm trying to create a regex pattern for my PowerShell code. I've never worked with regex before, so I'm a total noob.
The regex should check whether the string contains exactly two dots.
Examples that SHOULD work:
3.1.1
5.10.12
10.1.15
Examples that SHOULD NOT work:
3
3.1
5.10.12.1
The string must have two dots in it; the number of digits doesn't matter.
I've tried something like this, but it doesn't really work and I think it's far from the right solution...
([\d]*.[\d]*.[\d])
In your current regex, I think you need to escape the dot as \. because otherwise the dot matches any character.
You could also add anchors for the start ^ and the end $ of the string and update your regex to ^\d*\.\d*\.\d*$
That would also match ..4 and ..
Or if you want to match one or more digits, I think you could use ^\d+(?:\.\d+){2}$
That would match
^ # From the beginning of the string
\d+ # Match one or more digits
(?: # Non capturing group
\.\d+ # Match a dot and one or more digits
){2} # Close non capturing group and repeat 2 times
$ # The end of the string
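To compare the two anchored patterns from this answer, here is a small Python sketch. The question is about PowerShell, so Python is only a stand-in for trying the patterns, and the names TWO_DOTS_STRICT, TWO_DOTS_LOOSE, strict_ok and loose_ok are illustrative.

```python
import re

# Requires digits in all three positions.
TWO_DOTS_STRICT = re.compile(r"^\d+(?:\.\d+){2}$")
# Allows empty positions, so ".." and "..4" also pass.
TWO_DOTS_LOOSE = re.compile(r"^\d*\.\d*\.\d*$")

inputs = ["3.1.1", "5.10.12", "10.1.15", "3", "3.1", "5.10.12.1", "..", "..4"]

strict_ok = [s for s in inputs if TWO_DOTS_STRICT.search(s)]
loose_ok = [s for s in inputs if TWO_DOTS_LOOSE.search(s)]

print(strict_ok)
print(loose_ok)
```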
Use a lookahead:
^\d(?=(?:[^.]*\.[^.]*){2}$)[\d.]*$
Broken down, this says:
^ # start of the line
\d # exactly one digit (the string must start with a digit)
(?= # start of lookahead
(?:[^.]*\.[^.]*){2} # not a dot, a dot, not a dot - twice
$ # anchor it to the end of the string
)
[\d.]* # only digits and dots, 0+ times
$ # the end of the string
See a demo on regex101.com.
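A hedged Python sketch of the lookahead version follows, again using Python only as a stand-in for PowerShell. The names TWO_DOTS_LOOKAHEAD, inputs and passed are illustrative, and the last three inputs are extra edge cases that are not in the question.

```python
import re

# One leading digit, then a lookahead that demands exactly two dots in the
# rest of the string, then only digits and dots up to the end.
TWO_DOTS_LOOKAHEAD = re.compile(r"^\d(?=(?:[^.]*\.[^.]*){2}$)[\d.]*$")

inputs = ["3.1.1", "5.10.12", "10.1.15", "3", "3.1", "5.10.12.1", "3..", ".1.1", "3.a.1"]

passed = [s for s in inputs if TWO_DOTS_LOOKAHEAD.search(s)]

print(passed)
```

This version counts dots rather than checking that digits sit between them, so 3.. is accepted while .1.1 is rejected for not starting with a digit.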