Regex: Match digits with hyphens and white spaces only

I'm trying to match groups of digits joined by hyphens or spaces, at least 5 characters long in total (like a bank account number).
For example:
"12345-62436-223434"
"12345 6789 123232"
I should also be able to match
"123-4567-890"
The current pattern I'm using is
(\d[\s-]*){5,}[\W]
But I'm getting these problems:
The match also includes all the white space that follows a run of at least 5 digit characters.
I'm going to replace the match, so I only want to match the digits, not the white space and hyphens.
Once I have the match, I want to mask it like this:
from "12345-67890-11121" to "*****-*****-*****"
or
from "12345 67890 11121" to "***** ***** *****"
My only problem is that I can't get it to match the way I want.
Thanks!

This one might work for you (probably some false-positives, though):
\d[ \d-]{3,}\d
See a demo on regex101.com.
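To get from a match to the masked form the question asks for, one option is to replace only the digits inside each match and leave the separators alone. A minimal Python sketch, assuming the pattern above is acceptable for your data (the name mask_account_numbers is made up for illustration):

```python
import re

# Pattern from the answer above: a digit, 3+ digits/spaces/hyphens, a digit.
ACCOUNT_RE = re.compile(r"\d[ \d-]{3,}\d")

def mask_account_numbers(text):
    """Replace every digit inside a match with '*', keeping separators."""
    def mask(match):
        return re.sub(r"\d", "*", match.group())
    return ACCOUNT_RE.sub(mask, text)

print(mask_account_numbers("12345-67890-11121"))  # *****-*****-*****
print(mask_account_numbers("12345 67890 11121"))  # ***** ***** *****
```

Note that the pattern counts characters, not digits, so something like "1 - 2" would be masked too. That is the false-positive risk mentioned above.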

Maybe you want something like this:
(\d{5,})(?:-|\s)(\d{5,})(?:-|\s)(\d{5,})
Demo
EDIT:
(\d+)(?:-|\s)(\d+)(?:-|\s)(\d+)
Demo

One option here is to take your existing pattern, and then add a positive lookahead which asserts that there are seven or more characters in the input. Assuming that there are two spaces or dashes in the account number, this will guarantee that there are five or more digits.
You can try the following regex (written as a Java string literal, hence the doubled backslashes):
^(?=.{7,}$)((\\d+ \\d+ \\d+)|(\\d+-\\d+-\\d+))$
Test code:
String input = "123-4567-890";
boolean match = input.matches("^(?=.{7,}$)((\\d+ \\d+ \\d+)|(\\d+-\\d+-\\d+))$");
if (match) {
    System.out.println("Match!");
}
If you need to first fish out the account numbers from a larger document/source, then do so and afterwards you can apply the regex logic above.
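For comparison, here is the same check as a minimal Python sketch. The helper name is_account_number is hypothetical. In a Python raw string the backslashes are not doubled.

```python
import re

# Same pattern as above, written as a Python raw string (single backslashes).
ACCOUNT_RE = re.compile(r"^(?=.{7,}$)((\d+ \d+ \d+)|(\d+-\d+-\d+))$")

def is_account_number(s):
    """True if s is three digit groups with one separator type, 7+ chars."""
    return ACCOUNT_RE.match(s) is not None

print(is_account_number("123-4567-890"))  # True
print(is_account_number("1-2-3"))         # False: only 5 characters
print(is_account_number("12 345-67"))     # False: mixed separators
```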

Related

How to find digits in String by regular expression? [duplicate]

I would like to match positive and negative numbers (no decimal or thousand separators) inside a string using .NET, but I want to match whole words only.
So if a string looks like
redeem: -1234
paid: 234432
then I'd like to match -1234 and 234432
But if text is
LS022-1234-5678
FA123245
then I want no match returned. I tried
\b\-?\d+\b
but it will only match 1234 in the first scenario, not returning the "-" sign.
Any help is appreciated. Thank you.
Well, I'm sure this is far from perfect, but it works for your examples:
(?<=\W)-?(?<!\w-)\d+
If you want to allow underscores just before the number, then I'd use this modification:
(?i)(?<=[^a-z0-9])-?(?<![a-z0-9]-)\d+
Let me know of any issues and I'll try and help. If you'd like me to explain either of them, let me know that too.
EDIT
To only match if there is a space or tab just before the number / negative sign (as noted in the comment below), this could be used:
(?<=[ \t])-?\d+
Note that it will match e.g. on the first number series of a telephone number, time or date value, and will not match if the number is at the beginning of the line (after a newline) - make sure this is what you intend :D
There is no word boundary between a space and -, thus you can't use \b there.
You could use:
(?<!\S)-?\d+\b
or
(?<![\w-])-?\d+\b
depending on your requirements (which aren't fully specified).
Both will work for your examples, though.
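A quick way to see how the two lookbehinds differ is to run them side by side. This is only a sketch in Python (the question is about .NET, but both patterns use syntax that Python's re module also accepts). The sample string "total:(42)" is an assumption added to show the difference.

```python
import re

text = "redeem: -1234\npaid: 234432\nLS022-1234-5678\nFA123245"

# (?<!\S): no non-whitespace character directly before the number.
after_whitespace = re.findall(r"(?<!\S)-?\d+\b", text)
# (?<![\w-]): no word character or hyphen directly before the number.
not_after_word_or_hyphen = re.findall(r"(?<![\w-])-?\d+\b", text)

print(after_whitespace)          # ['-1234', '234432']
print(not_after_word_or_hyphen)  # ['-1234', '234432']

# The two differ once punctuation precedes the number:
print(re.findall(r"(?<!\S)-?\d+\b", "total:(42)"))     # []
print(re.findall(r"(?<![\w-])-?\d+\b", "total:(42)"))  # ['42']
```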
The \b-?\d+\b pattern is wrong because \b before an optional -? pattern will require a word char to appear immediately to the left of the hyphen. In general, do not use word boundaries next to optional patterns (unless you know what you are doing of course).
You might use -?\b\d+\b to match 123 or -123 like numbers as whole words. However, here, you are looking for something a bit different, because the 1234 and 5678 are whole words inside LS022-1234-5678 since they are enclosed with non-word chars (namely, a hyphen).
In this case, you need to extend whole word matching \b with extra lookbehind check on the left:
-?\b(?<!\d-)\d+\b
See the regex demo. Details:
-? - an optional hyphen
\b - a word boundary
(?<!\d-) - a negative lookbehind that fails the match if there is a digit + - immediately to the left of the current location.
\d+ - one or more digits
\b - a word boundary.
See the C# demo:
using System;
using System.Linq;
using System.Text.RegularExpressions;
public class Test
{
    public static void Main()
    {
        var text = "LS022-1234-5678, FA123245, redeem: -1234, paid: 234432";
        // Verbatim string (@"...") keeps the regex backslashes literal.
        var matches = Regex.Matches(text, @"-?\b(?<!\d-)\d+\b").Cast<Match>().Select(x => x.Value).ToList();
        foreach (var s in matches)
            Console.WriteLine(s);
    }
}
Output:
-1234
234432
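To see why each step of the reasoning above is needed, the three patterns can be compared on the same input. This is a sketch in Python rather than C#. The pattern syntax is the same, but the variable names are made up for illustration.

```python
import re

text = "LS022-1234-5678, FA123245, redeem: -1234, paid: 234432"

# \b before the optional hyphen: the sign after a space is dropped.
boundary_first = re.findall(r"\b-?\d+\b", text)
# Hyphen moved before \b: the sign is kept, but parts of LS022-1234-5678 match.
hyphen_first = re.findall(r"-?\b\d+\b", text)
# Extra lookbehind: reject digits that directly follow "digit + hyphen".
with_lookbehind = re.findall(r"-?\b(?<!\d-)\d+\b", text)

print(boundary_first)   # ['-1234', '-5678', '1234', '234432']
print(hyphen_first)     # ['-1234', '-5678', '-1234', '234432']
print(with_lookbehind)  # ['-1234', '234432']
```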

Regex how can i get only exact part in a string

I need to capture only numbers that fit these rules.
Rules:
it should have 16 digits
the first 11 digits can be any number
the next 3 digits should all be zero
the last two digits can be any number.
This is what I did:
([0-9]{11}[0]{3}[0-9]{2})
Example number:
1234567890100012
Now I want to get the number even if there are letters at the beginning or end of the string, like " abc1234567890100012abc".
My output should be just the number, like "1234567890100012".
When I add [a-zA-Z]* it returns the whole string.
Another point: if there are extra digits at the beginning or end of the string, like "999912345678901000129999", the program shouldn't accept it. I mean it should return none or nothing. How can I write this with regex?
You can use lookarounds to exclude the cases where there are more digits before or after:
(?<!\d)\d{11}000\d\d(?!\d)
On regex101
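As a minimal sketch of how the lookarounds behave on the two inputs from the question (Python is assumed here, since the question does not name a language, and extract_number is a hypothetical helper):

```python
import re

# 11 digits, then 000, then 2 digits, with no digit directly before or after.
NUMBER_RE = re.compile(r"(?<!\d)\d{11}000\d\d(?!\d)")

def extract_number(s):
    """Return the 16-digit number found in s, or None if there is none."""
    match = NUMBER_RE.search(s)
    return match.group() if match else None

print(extract_number(" abc1234567890100012abc"))   # 1234567890100012
print(extract_number("999912345678901000129999"))  # None
```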
You can use a capture group, and match optional chars a-zA-Z before and after the group.
To prevent a partial match, you can use word boundaries \b, or, if the string should match from the start to the end of the line, anchors ^ and $:
\b[a-zA-Z]*([0-9]{11}000[0-9]{2})[a-zA-Z]*\b
Regex demo

Regex: remove any chars or numbers before a needle

I am trying to build a regex pattern to extract a number from a string whose content is unknown and can be different every time.
Because I never know what the string will look like, here are some common examples:
12cm iamtext 311
iamtext 311 12 cm iamtext 311
iamtext 311 12cm
Summed up: what I am aiming for is the number directly before cm, with or without a space in between. The number can have any number of digits, so it could also be something like 12414 cm. In this case I want to get the 12414.
But if there is something like iamtext311 cm, I don't want to get anything back, because in this case the number belongs to the text. But if there is a space between the number and the text, I want to get the 311.
This is what I got so far:
.*?\d+.*?(\d+)
But this isn't working for chars, and I don't know how to proceed at the moment, because it is such a complex situation, especially with all the different cases with and without a space.
Would appreciate any kind of help!
How about using \b with an optional whitespace character?
\b\d+\s?cm\b
DEMO: https://regex101.com/r/fsp3FS/10
Split the problem.
The number is obtained with the obvious \d+.
You don't want it preceded by any character but spacing characters: (?<!\S).
Must be followed by an optional space then characters cm: (?=\s?cm).
Put it together: (?<!\S)\d+(?=\s?cm).
Demo.
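The assembled pattern can be checked against the examples from the question. This is a minimal sketch in Python (an assumption, as the question does not name a language). The name CM_RE is made up for illustration.

```python
import re

# Number preceded by whitespace (or the start of the string) and
# followed by an optional whitespace character plus "cm".
CM_RE = re.compile(r"(?<!\S)\d+(?=\s?cm)")

samples = [
    "12cm iamtext 311",
    "iamtext 311 12 cm iamtext 311",
    "iamtext 311 12cm",
    "iamtext311 cm",
]
for s in samples:
    print(repr(s), CM_RE.findall(s))
```

The first three samples yield ['12'] and the last one yields an empty list, because 311 is glued to the preceding text.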
In your pattern .*?\d+.*?(\d+) you don't account for the cm part.
What you might do instead is assert the start of the string or match 1+ times a whitespace character and use a capturing group for the digits.
To prevent cm to be part of a longer word, you could add a word boundary \b:
(?:^|\s+)(\d+) ?cm\b
regex101 demo
If you don't want to match newlines using \s+ you could use a character class to match a space and/or a tab [ \t]

Regex to match 10 digit exactly with specific pattern

Say I give a pattern 123* or 1234*; I would like to match any 10-digit number that starts with that pattern. It should have exactly 10 digits.
Example:
Pattern : 123 should match 1234567890 but not 12345678
I tried this regex: (^(123)(\d{0,10}))(?(1)\d{10}), but obviously it didn't work. I tried to group the pattern and the remaining digits as two different groups. It matches 10 digits after the captured group (https://regex101.com/). How do I check that the captured group is exactly 10 digits? Or is there a better trick for this? Please guide me.
Sounds like a case for the positive lookahead:
(?=123)\d{10}
This will match a sequence of 10 digits, but only if it starts with 123. Test it here.
Similarly for prefix 1234:
(?=1234)\d{10}
Of course, if you know the prefix length upfront, you can use 123\d{7}, but then you'll have to change range limits with each prefix change (for example: 1234\d{6}).
Additionally, to ensure only isolated groups of 10 digits are captured, you might want to anchor the above expression with a (zero-length) word boundary \b:
\b(?=123)\d{10}\b
or, if your sequence can appear inside a word, you might want to use negative lookbehind and lookahead on \d (as suggested in the comments by @Wiktor):
(?<!\d)(?=123)\d{10}(?!\d)
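Since the prefix changes while the total length stays at 10, the pattern can be built from the prefix at run time. A minimal Python sketch, assuming the prefix consists of digits only (ten_digit_pattern is a hypothetical helper name):

```python
import re

def ten_digit_pattern(prefix):
    """Build a regex for isolated 10-digit numbers starting with prefix."""
    # re.escape guards against regex metacharacters in the prefix.
    return re.compile(r"(?<!\d)(?=" + re.escape(prefix) + r")\d{10}(?!\d)")

pattern = ten_digit_pattern("123")
print(pattern.findall("1234567890"))      # ['1234567890']
print(pattern.findall("12345678"))        # []
print(pattern.findall("id=1234567890;"))  # ['1234567890']
print(pattern.findall("12345678901"))     # []
```

The lookahead keeps \d{10} unchanged whatever the prefix length is, so only the prefix itself varies.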
I would keep it simple:
import re
text = "1234567890"
# Raw string so that \d reaches the regex engine unchanged.
match = re.search(r"^123\d{7}$|^1111\d{6}$", text)
if match:
    print("matched")
Just throw your 2 patterns in as such and it should be good to go! Note that 123* would catch 1234* so I'm using 1111\d{6} as an example

How to extract internal words using regex

I am trying to match only the street name from a series of addresses. The addresses might look like:
23 Barrel Rd.
14 Old Mill Dr.
65-345 Howard's Bluff
I want to use a regex to match "Barrel", "Old Mill", and "Howard's". I need to figure out how to exclude the last word. So far I have a lookbehind to exclude the digits, and I can include the words and spaces and "'" by using this:
(?<=\d\s)(\w|\s|\')+
How can I exclude the final word (which may or may not end in a period)? I figure I should be using a lookahead, but I can't figure out how to formulate it.
You don't need a look-behind for this:
/^[-\d]+ ([\w ']+) \w+\.?$/
Match one or more digits and hyphens
space
match letters, digits, spaces, apostrophes into capture group 1
space
match a final word and an optional period
An example Ruby implementation:
regex = /^[-\d]+ ([\w ']+) \w+\.?$/
tests = [ "23 Barrel Rd.", "14 Old Mill Dr.", "65-345 Howard's Bluff" ]
tests.each do |test|
  p test.match(regex)[1]
end
Output:
"Barrel"
"Old Mill"
"Howard's"
I believe the lookahead you want is (?=\s\w+\.?$).
\s: you don't want to include the last space
\w+: at least one word character (A-Z, a-z, 0-9, or '_')
\.?: optional period (for abbreviations such as "St.")
$: make sure this is the last word
If there's a possibility that there might be additional whitespace before the newline, just change this to (?=\s\w+\.?\s*$).
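Combined with the lookbehind from the question, the lookahead can be tried on the sample addresses. This is a sketch in Python (an assumption, as the question does not name a language). The name street_name is hypothetical and the alternation in the question's pattern is written as a character class.

```python
import re

# Lookbehind from the question plus the lookahead suggested above.
STREET_RE = re.compile(r"(?<=\d\s)[\w\s']+(?=\s\w+\.?$)")

def street_name(address):
    """Return the words between the house number and the final word."""
    match = STREET_RE.search(address)
    return match.group() if match else None

for address in ["23 Barrel Rd.", "14 Old Mill Dr.", "65-345 Howard's Bluff"]:
    print(street_name(address))
```

This prints Barrel, Old Mill and Howard's, one per line.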
Why not just match what you want? If I have understood well you need to get all the words after the numbers excluding the last word. Words are separated by space so just get everything between numbers and the last space.
Example
\d+(?:-\d+)? ((?:.)+)  Note: there's a space at the end.
That will capture what you want in \1 for each match.
If you want the match itself to be just that text, you may use \K (not supported by every regex engine). Example:
With the regex \d+(?:-\d+)? \K.+(?= )
Another option is to use the split() function provided in most scripting languages. Here's the Python version of what you want:
stname = " ".join(address.split()[1:-1])
(Here address is the original address line, and stname is the name of the street, i.e., what you're trying to extract. split() returns a list of words, so the slice is joined back into a single string.)