Regex: remove any chars or numbers before a needle

I am building a regex pattern to extract a number from a string whose format is unknown and can be different every time.
Since I never know what the string will look like, here are some common examples:
12cm iamtext 311
iamtext 311 12 cm iamtext 311
iamtext 311 12cm
Summed up: what I am aiming for is the number directly before cm, or before a space followed by cm. The number can have any number of digits, so it could also be something like 12414 cm; in that case I want to get 12414.
But if there is something like iamtext311 cm, I don't want to get anything back, because in this case the number belongs to the text. If there is a space between the text and the number, I want to get the 311.
This is what I got so far:
.*?\d+.*?(\d+)
But this isn't working for letters, and I don't know how to proceed, because it is such a complex situation, especially with all the different cases with and without a space.
Would appreciate any kind of help!

How about using \b with an optional space character?
\b\d+\s?cm\b
DEMO: https://regex101.com/r/fsp3FS/10

Split the problem.
The number is obtained with the obvious \d+.
You don't want it preceded by anything other than whitespace (or the start of the string): (?<!\S).
Must be followed by an optional space then characters cm: (?=\s?cm).
Put it together: (?<!\S)\d+(?=\s?cm).
Demo.
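As a minimal Java sketch of the lookaround pattern above (the class and method names are hypothetical), you can collect every match with Matcher.find(). Because both lookarounds are zero-width, the match is just the digits, so no capturing group is needed. Java's java.util.regex supports both the (?<!...) lookbehind and the (?=...) lookahead used here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CmNumberExtractor {
    // (?<!\S)   : not preceded by a non-whitespace character (so: start of input or whitespace)
    // \d+       : the number itself
    // (?=\s?cm) : followed by an optional whitespace character and "cm", without consuming them
    private static final Pattern NUMBER_BEFORE_CM = Pattern.compile("(?<!\\S)\\d+(?=\\s?cm)");

    static List<String> extract(String input) {
        List<String> numbers = new ArrayList<>();
        Matcher m = NUMBER_BEFORE_CM.matcher(input);
        while (m.find()) {
            numbers.add(m.group()); // the whole match is only the digits
        }
        return numbers;
    }

    public static void main(String[] args) {
        String[] samples = {
            "12cm iamtext 311", "iamtext 311 12 cm iamtext 311",
            "iamtext 311 12cm", "12414 cm", "iamtext311 cm"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + extract(s));
        }
    }
}
```

Note that this pattern has no \b after cm, so a number followed by a word such as cmx would still be returned.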

In your pattern .*?\d+.*?(\d+) you don't account for the cm part.
What you might do instead is assert the start of the string or match one or more whitespace characters, and use a capturing group for the digits.
To prevent cm from being part of a longer word, you could add a word boundary \b:
(?:^|\s+)(\d+) ?cm\b
regex101 demo
If you don't want \s+ to match newlines, you could use the character class [ \t] (a space or a tab) instead, e.g. (?:^|[ \t]+).
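With this variant the whitespace in front of the number is consumed as part of the match, so the digits have to be read from group 1 rather than from the whole match. A minimal Java sketch (class and method names are illustrative only):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CmCaptureExtractor {
    // (?:^|\s+) : start of input or one or more whitespace characters (consumed, not captured)
    // (\d+)     : group 1, the number we want
    //  ?cm\b    : an optional literal space, then "cm" not followed by another word character
    private static final Pattern NUMBER_CM = Pattern.compile("(?:^|\\s+)(\\d+) ?cm\\b");

    static List<String> extract(String input) {
        List<String> numbers = new ArrayList<>();
        Matcher m = NUMBER_CM.matcher(input);
        while (m.find()) {
            numbers.add(m.group(1)); // group(0) would also contain the leading whitespace and "cm"
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(extract("iamtext 311 12 cm iamtext 311"));
        System.out.println(extract("iamtext311 cm"));
    }
}
```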

Related

Regex : Match digits with hyphens and white spaces only

I'm trying to match digits with at least 5 characters (for the whole string) connected by a hyphen or space (like a bank account number).
e.g.
"12345-62436-223434"
"12345 6789 123232"
I should also be able to match
"123-4567-890"
The current pattern I'm using is
(\d[\s-]*){5,}[\W]
But I'm running into these problems:
The pattern also matches all the whitespace after the digits once at least 5 digit characters have matched.
I'm going to replace the match, so I only want to match the digits, not the whitespace and hyphens.
Once I have the match, I want to mask it like this:
from "12345-67890-11121" to "*****-*****-*****"
or
from "12345 67890 11121" to "***** ***** *****"
My only problem is that I can't get it to match the way I want.
Thanks!
This one might work for you (though it probably produces some false positives):
\d[ \d-]{3,}\d
See a demo on regex101.com.
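Since the goal in the question is masking rather than just matching, one way to use this pattern, sketched in Java with hypothetical names, is to let it find each candidate and then replace only the digits inside the match with *, so the spaces and hyphens survive. This relies on Matcher.replaceAll(Function<MatchResult, String>), which requires Java 9 or later; the false positives mentioned above apply here too.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AccountMasker {
    // The answer's pattern: a digit, at least three digits/spaces/hyphens, then a digit.
    private static final Pattern CANDIDATE = Pattern.compile("\\d[ \\d-]{3,}\\d");

    // Replaces every digit inside each candidate with '*', leaving spaces and hyphens untouched.
    static String mask(String text) {
        Matcher m = CANDIDATE.matcher(text);
        // quoteReplacement keeps '$' and '\' from being treated specially in the replacement.
        return m.replaceAll(r -> Matcher.quoteReplacement(r.group().replaceAll("\\d", "*")));
    }

    public static void main(String[] args) {
        System.out.println(mask("12345-67890-11121"));      // *****-*****-*****
        System.out.println(mask("12345 67890 11121"));      // ***** ***** *****
        System.out.println(mask("Account: 123-4567-890.")); // Account: ***-****-***.
    }
}
```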
Maybe you want something like this:
(\d{5,})(?:-|\s)(\d{5,})(?:-|\s)(\d{5,})
Demo
EDIT:
(\d+)(?:-|\s)(\d+)(?:-|\s)(\d+)
Demo
One option here is to take your existing pattern and then add a positive lookahead which asserts that there are seven or more characters in the string. Assuming that there are two spaces or dashes in the account number, this guarantees that there are five or more digits.
You can try using the following regex (in a Java string literal, each backslash must be doubled, as in \\d):
^(?=.{7,}$)((\d+ \d+ \d+)|(\d+-\d+-\d+))$
Test code:
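public class AccountNumberTest {
    public static void main(String[] args) {
        String input = "123-4567-890";
        // String.matches() must match the entire input, so ^ and $ are redundant here but harmless
        boolean match = input.matches("^(?=.{7,}$)((\\d+ \\d+ \\d+)|(\\d+-\\d+-\\d+))$");
        if (match) {
            System.out.println("Match!");
        }
    }
}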
If you need to first fish out the account numbers from a larger document/source, then do so and afterwards you can apply the regex logic above.

Regular Expression find space delimited numbers

I have a string that comes from user input through a messaging system. It can contain a series of 4-digit numbers, but since users are likely to type things incorrectly, the matching needs to be a little flexible.
Therefore I want to allow them to type in the numbers, or pepper their message with any string of characters, and then just take the numbers that match the formats
=nnnn or nnnn
For this I have the Regular Expression:
(^|=|\s)\d{4}(\s|$)
This almost works. However, because it requires each group of 4 digits to start with an =, a space, or the start of the string, it misses every other set of numbers.
I tried this:
(^|=|\s*)\d{4}(\s|$)
But that means that any four digits followed by a space get matched - which is incorrect.
How can I match the groups of numbers so that a single space can count both as the end of one group and as the beginning of the next? For example, this string:
Ack 9876 3456 3467 4578 4567
Should produce the matches:
9876
3456
3467
4578
4567
Here you need to use lookarounds which won't consume any characters.
(?:^|[=\s])\K\d{4}(?=\s|$)
OR
(?:^|[=\s])(\d{4})(?=\s|$)
DEMO
Your regex (^|=|\s)\d{4}(\s|$) fails because it first matches <space>9876<space> and then looks for another space, equals sign or start of the line. The next match it finds is therefore <space>3467<space>: it can't match 3456, because the space before 3456 was already consumed by the first match. To get overlapping matches like this, you need to put the pattern inside positive lookarounds. When you put the last part (\s|$) inside a lookahead, it doesn't consume the space; it only asserts that the match must be followed by a space or the end of the line.
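The difference can be seen by running both patterns over the sample message. This Java sketch (class and method names are hypothetical) uses the capturing-group variant, since, as far as I know, Java's java.util.regex does not support \K. The question's own pattern gets an extra group around \d{4} only so the digits can be read back; that does not change what it consumes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FourDigitGroups {
    // The question's pattern, digits wrapped in group 2. Its trailing (\s|$) consumes the space,
    // so the following group of digits has no leading space left to match.
    static final Pattern CONSUMING = Pattern.compile("(^|=|\\s)(\\d{4})(\\s|$)");
    // The answer's pattern: the trailing space is only asserted by a lookahead, not consumed.
    static final Pattern LOOKAHEAD = Pattern.compile("(?:^|[=\\s])(\\d{4})(?=\\s|$)");

    static List<String> findAll(Pattern p, int group, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group(group));
        }
        return out;
    }

    public static void main(String[] args) {
        String msg = "Ack 9876 3456 3467 4578 4567";
        System.out.println(findAll(CONSUMING, 2, msg)); // [9876, 3467, 4567]
        System.out.println(findAll(LOOKAHEAD, 1, msg)); // [9876, 3456, 3467, 4578, 4567]
    }
}
```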
\b\d+\b
\b asserts position at a word boundary (^\w|\w$|\W\w|\w\W). It is a 0-width anchor, much like ^ and $. It doesn't consume any characters.
Demo
or
(?:^|(?<=[=\s]))\d{4}\b
Demo

RegEx - 1 to 10 Alphanumeric Spaces Okay

New to Regular Expressions. Thanks in advance!
I need to validate that a field is 1-10 characters of mixed-case alphanumerics, with spaces allowed. The first character must be alphanumeric, not a space.
Good Examples:
"Larry King"
"L King1"
"1larryking"
"L"
Bad Example:
" LarryKing"
This is what I have, and it works as long as the data is exactly 10 characters long. The problem is that it does not allow fewer than 10 characters.
[0-9a-zA-Z][0-9a-zA-Z ][0-9a-zA-Z ][0-9a-zA-Z ][0-9a-zA-Z ][0-9a-zA-Z ][0-9a-zA-Z ][0-9a-zA-Z ][0-9a-zA-Z ][0-9a-zA-Z ]
I've read and tried many different things but am just not getting it.
Thank you,
Justin
I don't know which environment or regex engine you are using, so I'll assume PCRE (typical for PHP).
This small regex does exactly what you want: ^(?i)(?!\s)[a-z\d ]{1,10}$
What's going on?!
the ^ marks the start of the string (remove it if the expression doesn't need to match the whole string)
the (?i) tells the engine to be case-insensitive, so there's no need to list every letter in both lower and upper case later in the expression
the (?!\s) ensures the next character is not whitespace (\s); it's a so-called negative lookahead
the [a-z\d ]{1,10} matches any letter (a-z), any digit (\d) and spaces ( ) in a row, with a minimum of 1 and a maximum of 10 occurrences ({1,10})
the $ at the end marks the end of the string (remove it if the expression doesn't need to match the whole string)
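A minimal Java sketch checking this pattern against the examples from the question (class and method names are hypothetical). Java's java.util.regex accepts the inline (?i) flag and the negative lookahead, so the pattern can be used unchanged apart from doubling the backslashes in the string literal.

```java
import java.util.regex.Pattern;

class FieldValidator {
    // (?i)           : case-insensitive, so [a-z] also covers A-Z
    // (?!\s)         : the first character must not be whitespace
    // [a-z\d ]{1,10} : 1 to 10 letters, digits or spaces
    private static final Pattern FIELD = Pattern.compile("^(?i)(?!\\s)[a-z\\d ]{1,10}$");

    static boolean isValid(String field) {
        return FIELD.matcher(field).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"Larry King", "L King1", "1larryking", "L", " LarryKing", "LarryKing12"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + isValid(s));
        }
    }
}
```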
Here's also a small visualization for better understanding.
Debuggex Demo
Try this: [0-9a-zA-Z][0-9a-zA-Z ]{0,9}
The {x,y} syntax means between x and y times inclusive. {x,} means at least x times.
You want something like this.
[a-zA-Z0-9][a-zA-Z0-9 ]{0,9}
The first part ensures that the first character is alphanumeric. The second part matches an alphanumeric character or a space, and {0,9} allows anywhere from 0 to 9 occurrences of it. Together this gives you 1-10 characters.
Try this: ^[a-zA-Z0-9][a-zA-Z0-9 ]*
An alphanumeric (not space) first character, followed by zero or more alphanumeric characters or spaces. It won't cap the length at 10 characters, but it will accept any valid string of 1-10 characters.
The following is probably the most semantically correct:
(?=^[0-9a-zA-Z])(?=.*[0-9a-zA-Z]$)^[0-9a-zA-Z ]{1,10}$
It asserts that the first and last characters are alphanumeric and that the entire string is 1 to 10 characters in length (including spaces).
I assume that a space is not allowed at the end either.
^[a-zA-Z0-9](?:[a-zA-Z0-9 ]{0,8}[a-zA-Z0-9])?$
or with POSIX character classes:
^[[:alnum:]](?:[[:alnum:] ]{0,8}[[:alnum:]])?$
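To see the difference between restricting only the first character and restricting both the first and last characters, here is a small Java sketch (names are hypothetical). Java's java.util.regex does not treat [[:alnum:]] as a POSIX class; its closest equivalent is \p{Alnum}, which by default covers the ASCII letters and digits. Matcher.matches() requires the whole input to match, so the ^ and $ anchors are left out.

```java
import java.util.regex.Pattern;

class TrailingSpaceCheck {
    // Only the first character is restricted, so a trailing space is accepted.
    static final Pattern FIRST_ONLY = Pattern.compile("[0-9a-zA-Z][0-9a-zA-Z ]{0,9}");
    // First and last characters must both be alphanumeric, with up to 8 characters
    // (letters, digits or spaces) in between: 1 to 10 characters in total.
    static final Pattern FIRST_AND_LAST = Pattern.compile("\\p{Alnum}(?:[\\p{Alnum} ]{0,8}\\p{Alnum})?");

    public static void main(String[] args) {
        for (String s : new String[] {"L", "Larry King", "Larry ", " Larry"}) {
            System.out.println("\"" + s + "\" first-only=" + FIRST_ONLY.matcher(s).matches()
                    + " first-and-last=" + FIRST_AND_LAST.matcher(s).matches());
        }
    }
}
```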
I think the simplest way is to go with \w[\s\w]{0,9}
Note that \w means [A-Za-z0-9_], so replace it with [A-Za-z0-9] if you don't want _
Note that \s matches any whitespace character, so replace it with a literal space if you don't want tabs, newlines and the like

Word count regex that only allows alphanumeric and maximum length

I've spent the whole morning on gSkinner trying to change this regex. It correctly allows at most 15 words, but how do I further limit the input to alphanumeric characters only, with no word longer than 25 characters?
I understand [a-z0-9], but the word boundaries seem to confuse me, because whatever I change breaks it.
^\W*(?:\w+\b\W*){1,15}$
It's for use in javascript/php.
Try this regex: ^((\w{1,25}))((\W\w{1,25}){1,14}|)
The first word must not be preceded by a space, and (\w{1,25}) checks this. Next I want a non-word character (such as a space) followed by a word, (\W\w{1,25}), repeated 1 to 14 times: (\W\w{1,25}){1,14}. But if the input has only one word, that second part would fail, so instead of a separator followed by a word I also allow nothing, which is why I added the |: ((\W\w{1,25}){1,14}|).
EDIT
The pattern had a glitch with characters such as -, so I updated it to this: ^([^ ]{1,25})(([ ]{1,}([^ ]{1,25}|)){1,14}|)
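The answer's structure (one word, then up to 14 more separator-plus-word pieces) can be sketched in Java. This is a variant, not the answer's exact pattern: it assumes words are ASCII letters and digits separated by single spaces (to meet the alphanumeric-only requirement from the question), and it adds a closing $ so that a word longer than 25 characters makes the whole check fail instead of matching only its first 25 characters. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class WordLimitValidator {
    // Assumption: words are runs of ASCII letters/digits separated by single spaces.
    // First word: 1-25 alphanumerics. Then 0-14 more words, each a space plus 1-25 alphanumerics.
    // {0,14} plays the role of the answer's "{1,14} or nothing" alternation.
    private static final Pattern UP_TO_15_WORDS =
            Pattern.compile("^[A-Za-z0-9]{1,25}(?: [A-Za-z0-9]{1,25}){0,14}$");

    static boolean isValid(String input) {
        return UP_TO_15_WORDS.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("hello world 123"));            // true
        System.out.println(isValid("abcdefghijklmnopqrstuvwxyz")); // false: a 26-character word
        System.out.println(isValid("hello, world"));               // false: ',' is not alphanumeric
    }
}
```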

Regex allow a string to only contain numbers 0 - 9 and limit length to 45

I am trying to create a regex that allows a string to contain only the characters 0-9, with a length of at least 1 and at most 45 characters. For example, 00303039 would be a match, and 039330a29 would not.
So far this is what I have but I am not sure that it is correct
[0-9]{1,45}
I have also tried
^[0-9]{45}*$
but that does not seem to work either. I am not very familiar with regex so any help would be great. Thanks!
You are almost there, all you need is start anchor (^) and end anchor ($):
^[0-9]{1,45}$
\d is short for the character class [0-9] in most engines (in some, such as .NET, it also matches other Unicode digits by default). You can use it as:
^\d{1,45}$
The anchors force the pattern to match entire input, not just a part of it.
Your regex [0-9]{1,45} looks for 1 to 45 digits anywhere, so a string like foo1 also gets matched because it contains 1.
^[0-9]{1,45} looks for 1 to 45 digits but these digits must be at the beginning of the input. It matches 123 but also 123foo
[0-9]{1,45}$ looks for 1 to 45 digits but these digits must be at the end of the input. It matches 123 but also foo123
^[0-9]{1,45}$ looks for 1 to 45 digits but these digits must be both at the start and at the end of the input, effectively it should be entire input.
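These four cases can be checked directly. In this Java sketch (names are hypothetical), Matcher.find() searches for the pattern anywhere in the input, which is how an unanchored regex behaves in most search functions:

```java
import java.util.regex.Pattern;

class AnchorDemo {
    // true if the regex occurs anywhere in the input
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String[] regexes = {"[0-9]{1,45}", "^[0-9]{1,45}", "[0-9]{1,45}$", "^[0-9]{1,45}$"};
        String[] inputs = {"123", "foo1", "123foo", "foo123"};
        for (String r : regexes) {
            for (String in : inputs) {
                System.out.println(r + " on \"" + in + "\": " + found(r, in));
            }
        }
    }
}
```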
The first matches 1 to 45 digits anywhere within your string (so other characters are allowed too, e.g. "039330a29"). The second requires runs of exactly 45 digits, so shorter inputs fail. So just take the best of both:
^\d{1,45}$
where \d is the same as [0-9].
Use this regular expression if you don't want the number to start with zero:
^[1-9][0-9]{0,44}$
If you don't mind starting with zero, use:
^[0-9]{1,45}$
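Both validators, sketched in Java with hypothetical names. The no-leading-zero version takes one digit from 1-9 and then 0 to 44 more digits, so the total length stays within 1 to 45. The test strings use String.repeat, which needs Java 11 or later.

```java
import java.util.regex.Pattern;

class DigitsOnly {
    // 1 to 45 digits; leading zeros allowed.
    static final Pattern ANY_DIGITS = Pattern.compile("^[0-9]{1,45}$");
    // One digit from 1-9, then 0 to 44 more digits: still 1 to 45 digits in total.
    static final Pattern NO_LEADING_ZERO = Pattern.compile("^[1-9][0-9]{0,44}$");

    static boolean isDigits(String s) {
        return ANY_DIGITS.matcher(s).matches();
    }

    static boolean isDigitsWithoutLeadingZero(String s) {
        return NO_LEADING_ZERO.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isDigits("00303039"));                       // true
        System.out.println(isDigits("039330a29"));                      // false
        System.out.println(isDigitsWithoutLeadingZero("00303039"));     // false
        System.out.println(isDigitsWithoutLeadingZero("1".repeat(45))); // true
        System.out.println(isDigitsWithoutLeadingZero("1".repeat(46))); // false
    }
}
```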
codaddict has provided the right answer. As for what you've tried, I'll explain why they don't make the cut:
[0-9]{1,45} is almost there, however it matches a 1-to-45-digit string even if it occurs within another longer string containing other characters. Hence you need ^ and $ to restrict it to an exact match.
^[0-9]{45}*$ matches an exactly-45-digit string, repeated 0 or any number of times (*). That means the length of the string can only be 0 or a multiple of 45 (90, 135, 180...).
A combination of both attempts is probably what you need:
^[0-9]{1,45}$
^[0-9]{1,45}$ is correct.
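Rails doesn't like ^ and $ for security reasons: in Ruby they always match at the start and end of every line, not only of the whole string, so input that contains a newline can slip past the check as long as one line is valid. It's better to use \A and \z, which anchor the beginning and end of the entire string.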
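Java's ^ and $ match at internal line breaks only when the MULTILINE flag is set, so it does not share Ruby's default. The Java sketch below (names are hypothetical) uses (?m) to imitate Ruby's behavior and shows why \A and \z are the safer choice:

```java
import java.util.regex.Pattern;

class AnchorSafety {
    // true if the regex occurs anywhere in the input
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String sneaky = "<script>\n123";
        // (?m) makes ^ and $ match at line breaks, which is how ^ and $ always behave in Ruby.
        System.out.println(found("(?m)^[0-9]{1,45}$", sneaky)); // true: the second line is all digits
        // \A and \z anchor to the very start and end of the whole input.
        System.out.println(found("\\A[0-9]{1,45}\\z", sneaky)); // false
        System.out.println(found("\\A[0-9]{1,45}\\z", "123"));  // true
    }
}
```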
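A word boundary (\b) can also be used instead of the start anchor (^) and end anchor ($) if the digits only need to stand as a separate word rather than form the entire input:
\b\d{1,45}\b
\b is a position between a word character (\w) and a non-word character (\W), or at the beginning or end of the string when the character there is a word character. Note that this does not reject other text around the number: abc 123 still matches.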