Regex exact length of whole string

I want to match a string of exactly 3 characters. I am using the following regex:
("\\d?[A-Za-z]{2,3}\\d?")
The string can have one digit either at the start or at the end, or it can consist of 3 letters. Is there any way to define the length of the whole matching string, like this:
("(\\d?[A-Za-z]{2,3}\\d?){3}") // it does not work
I have another solution:
("(\\d[A-Za-z]{2})|([A-Za-z]{2}\\d)|([A-Za-z]{3})")
But I just want to know whether there is any way to define the length of the whole matching string.

^.{3}$
If this isn't really your answer, you need to specify the problem better. You have zero solutions, not several. What exactly are you trying to match? Give a couple of examples.
http://www.regexplanet.com/advanced/java/index.html
^(\d[a-zA-Z]{2}|[a-zA-Z]{2}\d|[a-zA-Z]{3})$
Use this if you want the letters-and-digits rule.
If more text may follow the three characters, so that the string does not end there, require whitespace after them instead of the end of the string:
^(\d[a-zA-Z]{2}|[a-zA-Z]{2}\d|[a-zA-Z]{3})\s
From the comments:
So it's
^[^\s]{3}\s\d{7}\s.\d{6}
? That reads as: '^' is the start of the line, '[^\s]' is any character that is not whitespace, '{3}' means three of those, '\s' is a whitespace character, '\d' is a digit, '{7}' means seven of those, '\s' is whitespace again, '.' is any single character, '\d' is a digit, and '{6}' means six of those.
A regex is basically just a programmatic way of describing what you're looking for. If you can state precisely what you want to match, it's easy to write that directly as a regex.
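The piece-by-piece reading above maps naturally onto a verbose-mode pattern. Here is a minimal Python sketch of that idea; the sample lines are made up, since the thread never shows the real input:

```python
import re

# Same pattern as above, written in verbose mode so each piece carries its own comment.
RECORD = re.compile(r"""
    ^          # start of line
    [^\s]{3}   # three non-whitespace characters
    \s         # one whitespace character
    \d{7}      # seven digits
    \s         # one whitespace character
    .          # any single character
    \d{6}      # six digits
""", re.VERBOSE)

good = "ABC 1234567 X123456"   # hypothetical input that fits the description
bad = "AB 1234567 X123456"     # hypothetical input with only two leading characters

print(bool(RECORD.match(good)))
print(bool(RECORD.match(bad)))
```

Writing the pattern this way keeps the description and the regex side by side, which makes it easier to spot where the two disagree.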

Your three solutions will also match longer strings. I suggest using a word boundary (\b) or line anchors (^ and $):
\b([a-zA-Z]{2}\d|\d[a-zA-Z]{2}|[a-zA-Z]{3})\b
or
^([a-zA-Z]{2}\d|\d[a-zA-Z]{2}|[a-zA-Z]{3})$
based on the specific usage.
EDIT: fixed the regex, matching also 3 digits.
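To see why the anchors matter, here is a small Python sketch (the thread itself uses Java-style string literals, so the Python translation and the names ALTERNATIVES and is_three_char_code are assumptions for illustration). It compares the bare alternation with the anchored version:

```python
import re

# The three allowed shapes: digit + 2 letters, 2 letters + digit, or 3 letters.
ALTERNATIVES = r"\d[A-Za-z]{2}|[A-Za-z]{2}\d|[A-Za-z]{3}"

unanchored = re.compile(ALTERNATIVES)
anchored = re.compile(r"^(?:" + ALTERNATIVES + r")$")


def is_three_char_code(text):
    """Return True only if the whole string has one of the three shapes."""
    return anchored.match(text) is not None


for candidate in ["1ab", "ab1", "abc", "abcd", "1a2", "123"]:
    found_somewhere = unanchored.search(candidate) is not None
    print(candidate, is_three_char_code(candidate), found_somewhere)
```

In Python, re.fullmatch with the bare alternation is another way to get the same whole-string behavior without writing ^ and $.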

Related

Regular expression for extracting substring in the middle of a string

Looking for a regular expression that extracts multiple characters, at different locations, in my string. For example, the string I'm working with is 5490028400316201600008 and it will always be this same length, but the numbers can change.
I would like to extract the first 9 characters, then skip the next 8, extract the next 4, then ignore the last character. The resulting string would be 5490028400000 in this case. I can't seem to find an easy way to do this and I'm fairly new to regular expressions. Thanks in advance for your advice/help.
First of all, this seems more appropriate for substring functions, which are usually faster and less error-prone. However, for learning purposes, you could come up with something like:
(.{9}).{8}(.{4}).
This matches any character 9 times (not only digits; use \d instead if you want digits only) and saves them in a group, then matches another 8 characters which are not saved, then matches another 4 characters into the second group, and finally matches the last character without capturing it.
Concatenate $1 and $2 (5490028400000 in your case) and you should be fine.
See this demo on regex101.com.
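A short Python sketch of both approaches, using the sample string from the question. The variable names are illustrative only, and the question does not say which language is in use:

```python
import re

code = "5490028400316201600008"

# Regex version: capture 9 characters, skip 8, capture 4, ignore the last one.
match = re.fullmatch(r"(.{9}).{8}(.{4}).", code)
via_regex = match.group(1) + match.group(2) if match else None

# Substring version: the same positions expressed as slices.
via_slices = code[:9] + code[17:21]

print(via_regex)   # 5490028400000
print(via_slices)  # 5490028400000
```

The slice version does not check the length of the input, so the regex (or an explicit length check) is still useful when the input needs validating.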

Regex : Find a number between space

I am trying to extract a zip code of six numbers starting with the number 4 from a string. Right now I am using [4][0-9]{5}, but it is also matching starting from other numbers, like 020-25468811 and it's returning 468811. I don't want it to search in the middle of a number, only full numbers.
Try to use the following:
(?<!\d)4\d{5}(?!\d)
That is, find a 6-digit number starting with 4 that is neither preceded nor followed by a digit.
Your expression currently matches any 4 followed by five digits, wherever it occurs, even in the middle of a longer number. To fix this behavior you should add word boundaries as per Jon's suggestion.
\b[4][0-9]{5}\b
More on word boundaries here: http://www.regular-expressions.info/wordboundaries.html
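The lookaround and word-boundary versions behave slightly differently when the number touches a letter or underscore. A minimal Python sketch, using a made-up sample string (only the phone number comes from the question):

```python
import re

# Hypothetical text: a phone number, a plain zip code, and a zip glued to "id_".
text = "call 020-25468811, zip 411001, ref id_412345"

plain = re.findall(r"4[0-9]{5}", text)                 # original expression
lookaround = re.findall(r"(?<!\d)4\d{5}(?!\d)", text)  # not next to another digit
boundary = re.findall(r"\b4[0-9]{5}\b", text)          # whole "word" only

print(plain)       # ['468811', '411001', '412345']
print(lookaround)  # ['411001', '412345']
print(boundary)    # ['411001']
```

The lookaround form only cares about neighboring digits, while \b also rejects numbers attached to letters or underscores, so which one fits depends on the surrounding text.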
You could simply add a space to the beginning of your regular expression: " 4[0-9]{5}". If you need a more general way of finding the beginning of the number (it could also be preceded by a tab, a newline, etc.), have a look at the predefined character class \s. Also have a look at boundary matchers. I don't know which language you are using, but regexes work very similarly in most languages. Check this Java regex documentation.
There is a start of line character in regex: ^
You could do:
^4[0-9]{5}
If the numbers are not always at the beginning of a line, you can more generally use:
\<4[0-9]{5}\>
To match only whole words.
Both examples work with egrep.

Regex: Accepting space anywhere but at the beginning

I'm working with Python bindings for Qt4.8 on OS X.
I want to accept any digit, a few other chars, AND white space.
The string can be empty or of any length.
What I don't want is for the string to begin or end with white space.
My working example: '[0-9pqw\+\-\*\#\(\)\.][0-9pqw\+\-\*\# \(\)\.]*'
However, I don't want to repeat two blocks, one containing a space and one not. There should be a better way, I guess, employing [^ ], but how?
Second question:
If I want to limit the string's total length, how would I do it?
Thank you.
You could use negative lookarounds at the beginning and end of the pattern:
^(?![ ])[0-9pqw+*# ().-]*(?<![ ])$
Note that the square brackets around the space are not necessary but aid readability. Neither are any of your escapes (as long as you put the - at the end of the character class).
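Here is a rough Python sketch of the lookaround pattern with plain `re` rather than the Qt classes the question mentions. The MAX_LEN value and the bounded-quantifier variant for the second question are assumptions added for illustration, not part of the answer above:

```python
import re

ALLOWED = r"[0-9pqw+*# ().-]"   # character class from the answer above

# No space at the start (lookahead) and no space at the end (lookbehind).
no_edge_spaces = re.compile(r"^(?![ ])" + ALLOWED + r"*(?<![ ])$")

# Assumed variant: cap the total length by replacing * with a bounded quantifier.
MAX_LEN = 20  # hypothetical limit
limited = re.compile(r"^(?![ ])" + ALLOWED + r"{0," + str(MAX_LEN) + r"}(?<![ ])$")

for candidate in ["", "+1 (555) 010-9999", " 123", "123 ", "12a"]:
    print(repr(candidate), no_edge_spaces.match(candidate) is not None)

print(limited.match("1" * 21) is not None)
```

Because the lookarounds consume nothing, the single character class can still include the space, which avoids repeating the class twice.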
Does this not do what you want?
import re
re.match(r'^[^\W].*[^\W]$', ' aaa ')  # None: the test string starts and ends with a space
(Where the last arg is your test string).
If you want to ensure the length stays below a certain amount, use curly braces. Two characters are already consumed by the [^\W] tests for the first and last characters, so the quantifier only counts the characters in between. In this example, there is a match when there are no spaces at either end and the test string is no longer than 4 characters (and at least 3).
re.match(r'^[^\W].{1,2}[^\W]$', 'aaaa')  # matches: 4 characters, no spaces at either end

How to optimise this regex to match string (1234-12345-1)

I've got this RegEx example: http://regexr.com?34hihsvn
I'm wondering if there's a more elegant way of writing it, or perhaps a more optimised way?
Here are the rules:
Digits and dashes only.
Must not contain more than 10 digits.
Must have two hyphens.
Must have at least one digit between each hyphen.
Last number must only be one digit.
I'm new to this so would appreciate any hints or tips.
In case the link expires, the text to search is
----------
22-22-1
22-22-22
333-333-1
333-4444-1
4444-4444-1
4444-55555-1
55555-4444-1
666666-7777777-1
88888888-88888888-1
1-1-1
88888888-88888888-22
22-333-
333-22
----------
My regex is: \b((\d{1,4}-\d{1,5})|(\d{1,5}-\d{1,4}))-\d{1}\b
I'm using this site for testing: http://gskinner.com/RegExr/
Thanks for any help,
Nick
Here is a regex I came up with:
(?=\b[\d-]{3,10}-\d\b)\b\d+-\d+-\d\b
This uses a look-ahead to validate the overall length before attempting the match. It looks for between 3 and 10 characters in the class [\d-] followed by a dash and a digit. After that comes the actual match, which confirms that the string really has the format digits(dash)digits(dash)digit.
From your sample strings this regex matches:
22-22-1
333-333-1
333-4444-1
4444-4444-1
4444-55555-1
55555-4444-1
1-1-1
It also matches the following strings:
22-7777777-1
1-88888888-1
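The list above can be reproduced with a few lines of Python. This is only a checking sketch: the question was tested on RegExr, and the variable names here are illustrative.

```python
import re

# Sample lines from the question.
samples = [
    "22-22-1", "22-22-22", "333-333-1", "333-4444-1", "4444-4444-1",
    "4444-55555-1", "55555-4444-1", "666666-7777777-1",
    "88888888-88888888-1", "1-1-1", "88888888-88888888-22",
    "22-333-", "333-22",
]

# Look-ahead checks the overall length, the rest checks the shape.
pattern = re.compile(r"(?=\b[\d-]{3,10}-\d\b)\b\d+-\d+-\d\b")

matched = [s for s in samples if pattern.search(s)]
print(matched)
```

Running the two extra strings (22-7777777-1 and 1-88888888-1) through the same pattern is an easy way to confirm the 10-digit limit is enforced on the total rather than per group.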
Your regexp only allows a first and second group of digits with a maximum length of 5. Therefore, valid strings like 1-12345678-1 or 123456-1-1 won't be matched.
This regexp works for the given requirements:
\b(?:\d\-\d{1,8}|\d{2}\-\d{1,7}|\d{3}\-\d{1,6}|\d{4}\-\d{1,5}|\d{5}\-\d{1,4}|\d{6}\-\d{1,3}|\d{7}\-\d{1,2}|\d{8}\-\d)\-\d\b
(RegExr)
You can use this with the m modifier (which switches multiline mode on):
^\d(?!.{12})\d*-\d+-\d$
or this one without the m modifier:
\b\d(?!.{12})\d*-\d+-\d\b
By design, these two patterns require at least three digits separated by hyphens (so there is no need to add a {5,n} quantifier anywhere; it would be useless).
The patterns are also built to fail fast:
I chose to start them with a digit \d; this way, every line start or word boundary not followed by a digit is discarded immediately. In addition, having consumed exactly one digit, I know how much of the string may remain.
Then I test the upper limit of the string length with a negative lookahead that checks whether there is one character more than the maximum length (if there are 12 characters after this position, there are at least 13 characters in the string). There is no need for anything more descriptive than the dot metacharacter here; the goal is to test the length quickly.
Finally, I describe the rest of the string without doing anything special. That is probably the slowest part of the pattern, but it doesn't matter since the overwhelming majority of unnecessary positions have already been discarded.
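A small Python sketch of both variants, run against the sample block from the question. It assumes that Python's re.MULTILINE corresponds to the m modifier mentioned above:

```python
import re

# Sample block from the question, one candidate per line.
text = """22-22-1
22-22-22
333-333-1
333-4444-1
4444-4444-1
4444-55555-1
55555-4444-1
666666-7777777-1
88888888-88888888-1
1-1-1
88888888-88888888-22
22-333-
333-22"""

# After the first digit, at most 11 more characters may follow on the line.
multiline = re.compile(r"^\d(?!.{12})\d*-\d+-\d$", re.MULTILINE)
bounded = re.compile(r"\b\d(?!.{12})\d*-\d+-\d\b")

print(multiline.findall(text))
print(bounded.findall(text))
```

The dot does not match a newline by default, so the length test in the lookahead stays confined to the current line even when the whole block is searched at once.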

How to make a regular expression looking for a list of extensions separated by a space

I want to be able to take a string of text from the user that should be formatted like this:
.ext1 .ext2 .ext3 ...
Basically, I am looking for a dot, a string of alphanumeric characters of any length, a space, and rinse and repeat. I am a little confused about how to say "I need a period, a string of characters, and a space". Also, the last extension could be followed by nothing, by a space, or by a series of spaces. And I guess the extensions could be separated by any number of spaces?
EDIT: I made it clearer what I was looking for.
Thanks!
Try this:
^(?:\.[A-Za-z0-9]+ +)*\.[A-Za-z0-9]+ *$
(Rubular)
In a Java string literal you need to escape the backslashes:
"^(?:\\.[A-Za-z0-9]+ +)*\\.[A-Za-z0-9]+ *$"
(\.\w+)\s* Match this repeatedly to get your results.
^((\.\w+)\s*)*$ Check this; if it matches, your string is exactly what you want.
As for the last part, you can't (AFAIK) both extract all the extensions (separately) and check what follows the last one with a single pattern. Either you validate your string, or you extract the extensions from it.
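The validate-then-extract split can be sketched in Python as two passes. This uses the stricter pattern from the first answer, which requires spaces between extensions; the function name parse_extensions is made up for illustration:

```python
import re

# Pass 1: the whole string must be dot-extensions separated by spaces.
VALID = re.compile(r"^(?:\.[A-Za-z0-9]+ +)*\.[A-Za-z0-9]+ *$")
# Pass 2: pull out each individual extension.
EXTENSION = re.compile(r"\.[A-Za-z0-9]+")


def parse_extensions(user_input):
    """Return the list of extensions, or raise ValueError on malformed input."""
    if VALID.match(user_input) is None:
        raise ValueError("expected something like '.ext1 .ext2 .ext3'")
    return EXTENSION.findall(user_input)


print(parse_extensions(".ext1 .ext2   .ext3  "))  # ['.ext1', '.ext2', '.ext3']
```

Keeping the two patterns separate means the validation rule can be tightened or loosened without touching the extraction step.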
I'd start with something like: ^\.[a-z0-9]+([\t\n\v ]+\.[a-z0-9]+)*$