REGEX Search and keep specific characters - regex

I have hundreds of References in the following format
HCVSAM0123BK
c35UNI0321RS
scruni0321
XXXXXX ZZZZ WW
6 characters 4 digits 2 characters
I want to keep the 4 digits after the first 6 characters. In some cases the reference does not have the last 2 characters.
My goal is to get only ZZZZ (the 4 digits)
ex: from HCVSAM0123BK to 0123
Thank You

You can match the following:
^\w{6}(\d+)(\w{2})?$
and the first captured group \1 is what you want.
Demo: http://regex101.com/r/qT0lY8
Answer to updated question:
^(?!\d+$)\w{6}(\d+)(\w{2})?$
(?!\d+$) is a negative lookahead that fails the match if the line consists only of digits, and \w stands for [0-9a-zA-Z_].
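As a rough illustration, the pattern can be tried in Python with the standard re module. The helper name extract_digits is made up for this sketch, and the all-digit input is only an assumed example of what the lookahead rejects:

```python
import re

# Pattern from the answer: 6 word characters, a run of digits,
# and an optional 2-character suffix. The negative lookahead
# rejects references that consist only of digits.
PATTERN = re.compile(r"^(?!\d+$)\w{6}(\d+)(\w{2})?$")

def extract_digits(reference):
    """Return the captured digits, or None when there is no match (hypothetical helper)."""
    match = PATTERN.match(reference)
    return match.group(1) if match else None

for ref in ["HCVSAM0123BK", "c35UNI0321RS", "scruni0321", "1234567890"]:
    print(ref, "->", extract_digits(ref))
```

If the digit block is always exactly 4 digits, \d{4} in place of \d+ would be a stricter choice.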

search : ^.{6}(.{4}).*
and replace with : \1
demo here : http://regex101.com/r/kZ7dS8
output :
0123
0321
0321
using branch reset :
search : (?|.*(\d{4}).*)
and replace with : \1
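The same search-and-replace can be sketched in Python with re.sub. Python's built-in re module does not accept the branch-reset syntax (?|...), so this sketch uses only the first pattern; the variable names are illustrative:

```python
import re

references = ["HCVSAM0123BK", "c35UNI0321RS", "scruni0321"]

# Skip the first 6 characters, capture the next 4,
# and replace the whole line with that capture.
digits = [re.sub(r"^.{6}(.{4}).*", r"\1", ref) for ref in references]

print(digits)
```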

Related

Reg Expression: GB Vat Number with spaces

I currently have the following regular expression:
^GB([0-9]{9}([0-9]{3})?|[A-Z]{2}[0-9]{3})$
This works fine for:
GB999999973
GBGD001
GBHA599
As can be tested here: https://regex101.com/r/jU980W/1
However the problem is that it does not validate with:
GB999 9999 73
I tried adding space indicators to the regular expression but then the other formats aren't supported anymore.
Does anyone know a way to have this regular expression both accept with and without spaces for the GB VAT Number?
Thanks in advance!
See regex in use here
^GB(?:\d{3} ?\d{4} ?\d{2}(?:\d{3})?|[A-Z]{2}\d{3})$
^ Assert position at the start of the line
GB Match this literally
(?:\d{3} ?\d{4} ?\d{2}(?:\d{3})?|[A-Z]{2}\d{3}) Match either of the following options
\d{3} ?\d{4} ?\d{2}(?:\d{3})? Option 1:
\d{3} Match exactly 3 digits
" ?" Optionally match a space
\d{4} Match exactly 4 digits
" ?" Optionally match a space
\d{2} Match exactly 2 digits
(?:\d{3})? Optionally match exactly 3 digits
[A-Z]{2}\d{3} Option 2:
[A-Z]{2} Match exactly 2 uppercase ASCII letters
\d{3} Match exactly 3 digits
$ Assert position at the end of the line
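A small Python check of this pattern against the formats from the question might look like the following. The function name is_gb_vat and the two rejected samples are assumptions added for illustration:

```python
import re

# Pattern from the answer: 9 or 12 digits with optional spaces after
# the 3rd and 7th digit, or 2 uppercase letters followed by 3 digits.
VAT_PATTERN = re.compile(r"^GB(?:\d{3} ?\d{4} ?\d{2}(?:\d{3})?|[A-Z]{2}\d{3})$")

def is_gb_vat(number):
    """Return True when the string matches the pattern (hypothetical helper)."""
    return VAT_PATTERN.match(number) is not None

samples = ["GB999999973", "GB999 9999 73", "GBGD001", "GBHA599",
           "GB99 99999 73", "GB999 9999 7"]
for sample in samples:
    print(repr(sample), is_gb_vat(sample))
```

This only checks the shape of the number; it does not verify any check digits.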

Regular Expression allow whitespace without counting them

How to get
[\d ]{6}
to match:
1 23456
1 2 3456
1 2 3 456
1 2 3 4 56
1 2 3 4 5 6
In other words, I would like the space to not be counted towards the char limit. Something like [\d]{6 + but allow spaces you can eat}
The following will match 6 numbers, with any amount of space characters between them.
(?:\d\s*){5}\d
?: at the beginning there makes the group non-capturing. It's not necessary if all you wish to do is a simple match.
A live example:
https://regex101.com/r/PZJ8DO/2
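The pattern can also be checked offline with a short Python sketch. It uses fullmatch so that the whole string has to fit, which is an assumption on top of the unanchored pattern in the answer; the five-digit sample is added as a negative case:

```python
import re

# Five "digit plus optional whitespace" units, then a final digit.
SIX_DIGITS = re.compile(r"(?:\d\s*){5}\d")

samples = ["1 23456", "1 2 3456", "1 2 3 456", "1 2 3 4 56",
           "1 2 3 4 5 6", "1 2 3 4 5"]

# fullmatch requires the entire string to match the pattern.
results = {s: SIX_DIGITS.fullmatch(s) is not None for s in samples}

for sample, ok in results.items():
    print(repr(sample), ok)
```

Note that \s also accepts tabs and line breaks; a literal space in the pattern would restrict it to spaces only.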
Just to put my two cents in: you could use the opposite of \d which is \D in most flavors:
^(?:\d\D*){6}$
See a demo on regex101.com.
Note that this would even allow something like
1a2b3c4d5e6
If this is not what you want (meaning you only want to allow spaces, nothing else), use \s* instead of \D*.
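The difference between the two variants can be seen in a short Python sketch. The constant and function names are illustrative, and the seven-digit sample is an added negative case:

```python
import re

# \D* lets any non-digit characters sit between the six digits.
ANY_SEPARATOR = re.compile(r"^(?:\d\D*){6}$")
# \s* only lets whitespace sit between the six digits.
WHITESPACE_ONLY = re.compile(r"^(?:\d\s*){6}$")

def accepts(pattern, text):
    """Return True when the pattern matches the text (hypothetical helper)."""
    return pattern.match(text) is not None

for text in ["1 2 3 4 5 6", "1a2b3c4d5e6", "1234567"]:
    print(repr(text), accepts(ANY_SEPARATOR, text), accepts(WHITESPACE_ONLY, text))
```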
You can try to use
(?<=).*6.*
This will match any line that contains '6' even if there are some white spaces or other characters in the line.
The (?<=) is a positive lookbehind; it is empty here, so it always succeeds.
The . matches any character except line breaks.
The * matches 0 or more of the preceding token.
And 6 matches a "6" Character.
You can test Regular Expression here: RegExr
Note that the positive look behind feature is not supported in all flavors of RegEx.

Regex IF/Then expression with required characters

I'm quite new to regex and very much stuck with the following expression. I'm looking for a regex code that allows the following combinations:
AA1A 1AA
AA 12
A 12
A 1
Requirements:
String can not start with a number
Only capital letters (A-Z)
1 space is required at predetermined places (see above examples)
Numbers 0-9 can be used
I am currently working on/with the following string
([A-Z]{1,2}|[A-Z0-9]{1,4})([ ]{1})([0-9A-Z]{1,3})
The issue with this one is that it does not allow the AA1A 1AA string.
Any ideas?
Based on the spec, the examples, and the fact you want the first example even though there's a space before it, it seems you need a regex like this one:
^[ ]*([A-Z][A-Z0-9]{0,3})[ ]([A-Z0-9]{1,3})$
You can test it here
Note that the ^ and $ are added to the regex. But I have a premonition that you're using the regex in some tool or functionality that implicitly assumes the regex needs to match for the whole line. Because otherwise your original regex would have matched "AA1A 1AA" in the string " AA1A 1AA".
If that's the case, the ^ and $ should be redundant for your purpose and you can remove them.
Explain:
^ : // Matches the beginning of the string
// or the beginning of a line if the multiline flag (m) is enabled.
[ ]* : // 0 or more spaces
[A-Z] : // an upper case ascii letter
[A-Z0-9]{0,3} : // between 0 and 3 upper case letters or digits
[ ] : // A character class with a space. Which matches 1 space.
// You don't actually need to put a single character in a character class.
// But here it's done to make the space stand out more.
[A-Z0-9]{1,3} : // Between 1 and 3 upper case letters or digits
$ : // Matches the end of the string
// or the end of a line if the multiline flag (m) is enabled.
The space in the middle isn't put in a capture group (...). Because what would be the purpose of that? It's not like one would validate later that the capture group indeed contains a space.
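To see the two capture groups in action, here is a minimal Python sketch. The helper name split_code is invented for the example, and the lowercase and digit-first inputs are assumed negative cases:

```python
import re

# Pattern from the answer: optional leading spaces, a part that starts
# with a letter (1 to 4 characters), one space, then 1 to 3 characters.
CODE = re.compile(r"^[ ]*([A-Z][A-Z0-9]{0,3})[ ]([A-Z0-9]{1,3})$")

def split_code(text):
    """Return the two captured parts, or None when there is no match (hypothetical helper)."""
    match = CODE.match(text)
    return match.groups() if match else None

for text in ["AA1A 1AA", "AA 12", "A 12", "A 1", " AA1A 1AA", "11 AB", "aa 12"]:
    print(repr(text), split_code(text))
```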
If you want to search for those in a longer string you can use word boundaries instead.
\b([A-Z][A-Z0-9]{0,3})[ ]([A-Z0-9]{1,3})\b
The \b is a word boundary: it marks a transition between a word character [A-Za-z0-9_] and a non-word character. It's useful for making sure that your match is preceded and followed by a non-word character (such as a space) or by the start or end of the line.
For example, if you have a string like "ABC DE", then the regex /[A-Z]{2}/g would match "AB" and "DE". But with word boundaries, /\b[A-Z]{2}\b/g would only match "DE", and not part of a longer word like the "AB" in "ABC".
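The "ABC DE" example can be reproduced in Python, where re.findall plays the role of the g flag:

```python
import re

text = "ABC DE"

# Without boundaries, any two adjacent uppercase letters match.
without_boundary = re.findall(r"[A-Z]{2}", text)

# With \b on both sides, only a standalone two-letter word matches.
with_boundary = re.findall(r"\b[A-Z]{2}\b", text)

print(without_boundary)
print(with_boundary)
```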
You just have to refine your first group to handle both:
One or two letters (AA)
Two letters followed by two letters or digits (AA1A)
Change from (demo here):
/([A-Z]{1,2}|[A-Z0-9]{1,4})([ ]{1})([0-9A-Z]{1,3})/g
to
/([A-Z]{1,2}|[A-Z]{2}[A-Z0-9]{2})([ ]{1})([0-9A-Z]{1,3})/g
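The matches (shown in bold in the demo) change from:
AA1A 1AA (match)
AA 12 (match)
11 AB (match)
A 12 (match)
11 A (match)
A 1 (match)
to:
AA1A 1AA (match)
AA 12 (match)
11 AB (no match)
A 12 (match)
11 A (no match)
A 1 (match)
(notice that 11 AB and 11 A are no longer matched)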
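The before/after comparison can be reproduced with a short Python sketch that runs both patterns over the six sample lines. The names ORIGINAL, REFINED and matched_lines are made up for the example, and search is used because the patterns are unanchored:

```python
import re

ORIGINAL = re.compile(r"([A-Z]{1,2}|[A-Z0-9]{1,4})([ ]{1})([0-9A-Z]{1,3})")
REFINED = re.compile(r"([A-Z]{1,2}|[A-Z]{2}[A-Z0-9]{2})([ ]{1})([0-9A-Z]{1,3})")

lines = ["AA1A 1AA", "AA 12", "11 AB", "A 12", "11 A", "A 1"]

def matched_lines(pattern):
    """Return the sample lines in which the pattern finds a match (hypothetical helper)."""
    return [line for line in lines if pattern.search(line)]

print("original:", matched_lines(ORIGINAL))
print("refined: ", matched_lines(REFINED))
```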

Regex: limit both the whole match AND character classes within

Is it possible with regex to allow a match to have a maximum number of characters, divided between different character classes?
I need to match a number of up to 4 digits in total, with or without decimal digits. So these should all match the regex:
123
1234
12.34
123.4
But these should not:
12345
12.345
In concept, something like this should work, except it doesn't:
([0-9]{0,4}([.]?[0-9]{0,4})){0,4}
Use look-aheads to assert that there's at most 1 dot and no run of 5 digits:
^(?!([^.]*\.){2})(?!\d{5})[\d.]{3,5}$
(?!([^.]*\.){2}) means "looking ahead, there aren't 2 dots anywhere"
(?!\d{5}) means "looking ahead, there aren't 5 straight digits"
[\d.]{3,5} means "3-5 of digits and dots"
See live demo.
To restrict decimal digits to a maximum of 2, add (?!.*\.\d{3,}$), which is a negative look-ahead for "dot then 3+ digits at the end", i.e.:
^(?!([^.]*\.){2})(?!\d{5})(?!.*\.\d{3,}$)[\d.]{3,5}$
See live demo.
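A look-ahead pattern of this kind can be checked against the question's samples with a short Python sketch. The pattern below is written with [^.]* inside the first look-ahead, so that a second dot anywhere in the input is rejected, and with a single backslash before d; the function name is_valid and the last two samples are assumptions added for illustration:

```python
import re

# At most one dot, no run of 5 digits at the start, at most 2 decimal
# digits, and 3 to 5 characters in total.
NUMBER = re.compile(r"^(?!([^.]*\.){2})(?!\d{5})(?!.*\.\d{3,}$)[\d.]{3,5}$")

def is_valid(text):
    """Return True when the text matches the pattern (hypothetical helper)."""
    return NUMBER.match(text) is not None

for text in ["123", "1234", "12.34", "123.4", "12345", "12.345", "1.2.3", "1.234"]:
    print(text, is_valid(text))
```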
It's not pretty, but you can do it like this:
(\d{1,4}|\d{0,3}\.\d|\d{0,2}\.\d{0,2}|\d\.\d{0,3})
Just make sure that you have some boundary control character around it.
Say like this:
(?:^|[^\d.])(\d{1,4}|\d{0,3}\.\d|\d{0,2}\.\d{0,2}|\d\.\d{0,3})(?:$|[^\d.])
You can see here that it works as intended.
I would, however, advise using another tool for this specific case.
The following regex should do it ...
\b(?:\d{1,3}\.\d{1,2}|\d{1}\.\d{1,3}|(?<!\.)\d{1,4}(?!\.))\b
see regex demo / explanation
If your regex flavor accepts it, you could use a lookahead like:
Edit: allow max 2 decimal
^(?:\d{1,4}|(?=.{1,5}$)\d+\.\d{1,2})$
Explanation:
^ : beginning of string
(?: : start non capture group
\d{1,4} : 1 up to 4 digits
| : OR
(?= : lookahead
.{1,5}$ : 1 up to 5 characters (it could be .{3,5} if at least 1 digit is mandatory on each side of the dot)
) : end lookahead
\d+ : 1 or more digits, integer part
\. : dot
\d{1,2} : 1 or 2 digits, decimal part
) : end group
$ : end of string
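// Candidate values as strings, so the regex sees exactly what is written
var test = [
  '123',
  '1234',
  '12.34',
  '123.4',
  '12345',
  '12.345',
  '1.234'
];
// Log each value together with whether the regex accepts it
console.log(test.map(function (a) {
  return a + ' : ' + /^(?:\d{1,4}|(?=.{1,5}$)\d+\.\d{1,2})$/.test(a);
}));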

Need regex to match 1 or more of exactly n-digit numbers

I need a regex to match a series of one or more n-digit numbers, separated by comma, ie:
abc12345def returns 12345
abc12345,23456def returns 12345,23456
so far I got this: \d{5}(,\d{5})*
problem is it also matches in cases like these:
123456 returns 12345, but I need it not to match if the number is longer than 5. So I need numbers of exactly 5 digits, and if a number is shorter or longer it's a no-match
Thanks
Which language are you using for your regexes? You want to put non-digit markers around your \d{5}'s; here is the Perl syntax (with a negative look-ahead/look-behind fix by Lukasz):
(?<![\d,])\d{5}(,\d{5})*(?![\d,])
Actually I think I got it! (?<!\d)\d{5}(?!\d)(,(?<!\d)\d{5}(?!\d))*
I used the look-ahead and look-behind
Thanks.
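The look-behind/look-ahead version can be tried in Python, whose re module supports both assertions. The helper name find_run and the four-digit sample are assumptions added for this sketch:

```python
import re

# A run of comma-separated 5-digit numbers that is not preceded or
# followed by another digit or comma.
FIVE_DIGIT_RUN = re.compile(r"(?<![\d,])\d{5}(,\d{5})*(?![\d,])")

def find_run(text):
    """Return the first matching run, or None when there is none (hypothetical helper)."""
    match = FIVE_DIGIT_RUN.search(text)
    return match.group(0) if match else None

for text in ["abc12345def", "abc12345,23456def", "123456", "abc1234def"]:
    print(text, "->", find_run(text))
```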
You could use this one:
/\D?\d{5}(?:,\d{5})?\D?/
explanation:
/ : regex delimiter
\D? : optional non-digit
\d{5} : 5 digits
(?: : beginning of non-capture group
,\d{5} : comma and 5 digits
)? : end of group, optional
\D? : optional non-digit
/ : regex delimiter