Fetch one out of two Numbers out of String - regex

I hav a list of strings, such as: Ø20X400
I need to extract the first of the numbers - between Ø and X
I've come so far to match the numbers in general with \d+ - as simple as it is...
But I need an expression to get the first value separated, not both of them...

You can use lookarounds (?<=..) and (?=..):
(?<=Ø)\d+(?=X)
or in Java style:
(?<=Ø)\\d+(?=X)
A second way is to use a capture group:
Ø(\d+)X
or
Ø(\\d+)X
Then you can extract the content of the group.

The regex engines I know parse \n as a newline. \d is used for numbers.
The following regex gives you the first number between a Ø and a X in a capture group:
^.*?Ø(\d+)X.*
Edit live on Debuggex

This Regex will do it for you, (\d+?)X, and here is a Rubular to prove it. See, you want to group digits together, but make it non-greedy, ending the evaluation on X.

Try this one:
\d+(?=\D)
Should find first number wich has some not a number ahead

With normal regular expressions, I would say:
Ø(\d+)X
This finds the Ø character, followed by one or more numbers, followed by an X. Also, the numbers will be stored in the first capture group. Capture groups differ from one regex implementation to another, but this would typically be denoted by \1. Capture group zero, \0, is usually the matched string itself. In this version, \d denotes digits 0-9, but if your regex engine uses \n for that purpose, use:
Ø(\n+)X

Related

Regex expression for [number2] in [number],[number2][word]

I'm trying to find a regular expression to find [number2] in [number],[number2][word].
So far I've tried with [,](\d*), but it also gets me the comma.
Demo: https://regexr.com/59eqa
You may use:
(?<=,)(\d*)
Regex Demo
Detail:
(?<=,): positive look behind that doesn't consume character but indicate that the number must have , before it
The previous answers do not handle the case that the second (or two numbers) is matched.
If the second number must be captured, this can be done with
\b\d+,(\d+)[A-Za-z]
where the "number2" is contained in captured group 1.
If you want to get the match only, you could use 2 lookarounds, asserting a comma to the left and a char a-zA-Z to the right.
Use \d+ to match 1 or more digits.
(?<=,)\d+(?=[a-zA-Z])
Regex demo
If there should be a digit before the comma as well:
(?<=\d,)\d+(?=[a-zA-Z])
Regex demo

Capture number between two whitespaces (RegEx)

I have the following data:
SOMEDATA .test 01/45/12 2.50 THIS IS DATA
and I want to extract the number 2.50 out of this. I have managed to do this with the following RegEx:
(?<=\d{2}\/\d{2}\/\d{2} )\d+.\d+
However that doesn't work for input like this:
SOMEDATA .test 01/45/12 2500 THIS IS DATA
In this case, I want to extract the number 2500.
I can't seem to figure out a regex rule for that. Is there a way to extract something between two spaces ? So extract the text/number after the date until the next whitespace ? All I know is that the date will always have the same format and there will always be a space after the text and then a space after the number I want to extract.
Can someone help me out on this ?
Capture number between two whitespaces
A whitespace is matched with \s, and non-whitespace with \S.
So, what you can use is:
\d{2}\/\d{2}\/\d{2} +(\S+)
^^^
See the regex demo
The 1+ non-whitespace symbols are captured into Group 1.
If - for some reason - you need to only get the value as a whole match, use your lookbehind approach:
(?<=\d{2}\/\d{2}\/\d{2} )\S+
Or - if you are using PCRE - you may leverage the match reset operator \K:
\d{2}\/\d{2}\/\d{2} +\K\S+
^^
See another demo
NOTE: the \K and a capture group approaches allow 1 or more spaces after the date and are thus more flexible.
I see some people helped you already, but if you would want an alternative working one for some reason, here's what works too :)
.+ \d+\/\d+\/\d+ (\d+[\.\d]*)
So the .+ matches anything plus the first space
then the \d+/\d+/\d+ is the date parsing plus a space
the capturing group is the number, as you can see I made the last part optional, so both floating point values and normal values can be matched. Hope this helped!
Proof: https://regex101.com/r/fY3nJ2/1
Just make the fractal part optional:
(?<=\d{2}\/\d{2}\/\d{2} )\d+(?:\.\d+)?
Demo: https://regex101.com/r/jH3pU7/1
Update following clarifications in comments:
To match anything (but space) surrounded by spaces and prepended by date use:
(?<=\d{2}\/\d{2}\/\d{2} )\S+
Demo: https://regex101.com/r/jH3pU7/3
Rather than capture, you can make your entire match be the target text by using a look behind:
(?<=\d\d(\/\d\d){2} )\S+
This matches the first series of non-whitespace that follows a "date like" part.
Note also the reduction in the length of the "date like" pattern. You may consider using this part of the regex in whatever solution you use.

Match only exact numbers, not pre of suffixed with slash/dash etc

I need a regular expression that matches only numbers of length 7 (they can have leading zeros). I used the following super easy regex: \b[0-9]{7}\b. However, this regex also matches numbers in e.g. 5254-6408499 and (0241)4013999 (see https://regex101.com/r/zF5hV7/1).
How can I prevent them from being matched? I only want numbers of length 7 having leading and/or trailing spaces.
Depending on the regular expression flavor, you could create your own boundaries:
(?<=^| )\d{7}(?= |$)
This asserts that either the beginning of the string or a space precedes moving on to matching exactly 7 digits only if the engine asserts that either a space or the end of string follows.
You can use this regex:
(?:^|\s)([0-9]{7})(?:\s|$)
and grab captured group #1
Updated RegEx Demo

Regular expression to match non-integer values in a string

I want to match the following rules:
One dash is allowed at the start of a number.
Only values between 0 and 9 should be allowed.
I currently have the following regex pattern, I'm matching the inverse so that I can thrown an exception upon finding a match that doesn't follow the rules:
[^-0-9]
The downside to this pattern is that it works for all cases except a hyphen in the middle of the String will still pass. For example:
"-2304923" is allowed correctly but "9234-342" is also allowed and shouldn't be.
Please let me know what I can do to specify the first character as [^-0-9] and the rest as [^0-9]. Thanks!
This regex will work for you:
^-?\d+$
Explanation: start the string ^, then - but optional (?), the digit \d repeated few times (+), and string must finish here $.
You can do this:
(?:^|\s)(-?\d+)(?:["'\s]|$)
^^^^^ non capturing group for start of line or space
^^^^^ capture number
^^^^^^^^^ non capturing group for end of line, space or quote
See it work
This will capture all strings of numbers in a line with an optional hyphen in front.
-2304923" "9234-342" 1234 -1234
++++++++ captured
^^^^^^^^ NOT captured
++++ captured
+++++ captured
I don't understand how your pattern - [^-0-9] is matching those strings you are talking about. That pattern is just the opposite of what you want. You have simply negated the character class by using caret(^) at the beginning. So, this pattern would match anything except the hyphen and the digits.
Anyways, for your requirement, first you need to match one hyphen at the beginning. So, just keep it outside the character class. And then to match any number of digits later on, you can use [0-9]+ or \d+.
So, your pattern to match the required format should be:
-[0-9]+ // or -\d+
The above regex is used to find the pattern in some large string. If you want the entire string to match this pattern, then you can add anchors at the ends of the regex: -
^-[0-9]+$
For a regular expression like this, it's sometimes helpful to think of it in terms of two cases.
Is the first character messed up somehow?
If not, are any of the other characters messed up somehow?
Combine these with |
(^[^-0-9]|^.+?[^0-9])

match the same unknown character multiple times

I have a regex problem I can't seem to solve. I actually don't know if regex can do this, but I need to match a range of characters n times at the end of a pattern.
eg. blahblah[A-Z]{n}
The problem is whatever character matches the ending range need to be all the same.
For example, I want to match
blahblahAAAAA
blahblahEEEEE
blahblahQQQQQ
but not
blahblahADFES
blahblahZYYYY
Is there some regex pattern that can do this?
You can use this pattern: blahblah([A-Z])\1+
The \1 is a back-reference to the first capture group, in this case ([A-Z]). And the + will match that character one or more times. To limit it you can replace the + with a specific number of repetitions using {n}, such as \1{3} which will match it three times.
If you need the entire string to match then be sure to prefix with ^ and end with $, respectively, so that the pattern becomes ^blahblah([A-Z])\1+$
You can read more about back-references here.
In most regex implementations, you can accomplish this by referencing a capture group in your regex. For your example, you can use the following to match the same uppercase character five times:
blahblah([A-Z])\1{4}
Note that to match the regex n times, you need to use \1{n-1} since one match will come from the capture group.
blahblah(.)\1*\b should work in nearly all language flavors. (.) captures one of anything, then \1* matches that (the first match) any number of times.
blahblah([A-Z]|[a-z])\1+
This should help.