Regex extract first portion of four numbers starting from specific position - regex

How do I use regex to extract the four digits that come right after the first eight digits (the values are dynamic) from the following strings?
20190715171712904_10008_file_activate_10.20.30.4000233223456_name.unl
20190715141712904_10008_runco_activate_10.20.30.40_name.unl
From the first string I want 1717.
From the second string I want 1417.
I have been testing my attempts on https://regex101.com/.
I tried ^\d{8}([0-9]{4})$ but it does not match.

Drop the $. It forces the match to end at the end of the string right after your 4 digits, but your strings continue past them. The answer will be in the first capture group. Note that you can write the [0-9] as \d as well.
If your language supports look-behinds, you can capture your digits as the main capture, instead of a subgroup:
(?<=^\d{8})\d{4}
This is really not a problem for a regular expression though: taking the substring from index 8 to index 11 inclusive (0-indexed) is basic, and typically faster, in any language.
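A minimal sketch of both approaches, written in Python (an assumption, since the question does not name a language; the helper names by_regex and by_slice are made up for illustration). It uses the capture-group pattern without the trailing $ and compares it with a plain slice:

```python
import re

# Sample file names taken from the question.
names = [
    "20190715171712904_10008_file_activate_10.20.30.4000233223456_name.unl",
    "20190715141712904_10008_runco_activate_10.20.30.40_name.unl",
]

# Skip the first 8 digits, capture the next 4. There is no trailing $,
# because the string continues after the captured digits.
PATTERN = re.compile(r"^\d{8}(\d{4})")


def by_regex(name):
    """Return the captured four digits, or None if the name does not match."""
    m = PATTERN.match(name)
    return m.group(1) if m else None


def by_slice(name):
    """Return the four characters that follow the first eight."""
    return name[8:12]


for name in names:
    print(by_regex(name), by_slice(name))
# prints "1717 1717" and then "1417 1417"
```

The slice does no validation, so the regex version is the safer choice if some inputs might not start with twelve digits.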

Related

Regular expression for extracting substring in the middle of a string

Looking for a regular expression that extracts multiple characters, at different locations, in my string. For example, the string I'm working with is 5490028400316201600008 and it will always be this same length, but the numbers can change.
I would like to extract the first 9 characters, then skip the next 8, extract the next 4, then ignore the last character. The resulting string would be 5490028400000 in this case. I can't seem to find an easy way to do this and I'm fairly new to regular expressions. Thanks in advance for your advice/help.
First of all, this seems more appropriate for substring functions; they are usually faster and less error-prone. However, for learning purposes, you could come up with something like:
(.{9}).{8}(.{4}).
This matches any 9 characters (not only digits; use \d instead of . to restrict it to digits) and saves them in the first group, matches another 8 characters that are not saved, captures the next 4 characters in the second group, and finally matches one last character that is not saved either.
Concatenate $1 and $2 (5490028400000 in your case) and you should be fine.
See this demo on regex101.com.
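As a rough illustration in Python (the choice of language and the variable names are assumptions, not part of the answer), the two groups can be concatenated like this, with the slicing alternative alongside for comparison:

```python
import re

s = "5490028400316201600008"

# Keep 9 characters, skip 8, keep 4, then require exactly one trailing character.
m = re.fullmatch(r"(.{9}).{8}(.{4}).", s)
via_regex = m.group(1) + m.group(2) if m else None

# The same extraction with slicing: fixed positions, no pattern needed.
via_slice = s[:9] + s[17:21]

print(via_regex, via_slice)  # prints "5490028400000 5490028400000"
```

re.fullmatch is used here so that a string of the wrong length is rejected instead of being partially matched.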

Backreferencing something without putting it in the rest of the expression

I am trying to make a regular expression that will match all words that have a letter that repeats at least an arbitrary number of times.
For example, if I want to match words that have a letter that repeats at least 3 times, I would want to match words like
applepie banana insidious
I want to be able to change the number of repeats I'm looking for by just changing one number in my expression, so expressions that only work for a certain number of repeats are not what I'm looking for.
Currently, this is what I'm using
^(?=.*(.))(?=(.*\1){4}).*$
Where 4 is the number of repeats, a number that I can change to whatever number of repeats I'm looking for.
The above regular expression appears to work, but using a lookahead just so I can use a capturing group seems very unwieldy, and so I'm looking for a better way to solve this problem.
This will eliminate one lookahead:
\b(?=\w*(\w)(\w*\1){2})\w*
Start at a word boundary, then look ahead for any number of word characters, a particular word character (captured in group 1), and then that same character again at least twice more, each time preceded by any number of word characters. The trailing \w* then consumes the whole word.
For four repetitions, use {3} (for n repetitions, use n-1).
Also, feel free to replace \b... with ^...$ as you were doing if you meant to match whole lines and not words in text.
You can use this regex:
\b\w*?(\w)(?=(?:\w*?\1){2})\w*\b
Where 2 is n-1 for n repetitions you're trying to find in a complete word.
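To make the "change one number" requirement concrete, here is a small Python sketch built on the single-lookahead pattern above. The function name words_with_repeated_letter and the extra word "hello" in the sample are hypothetical additions for illustration:

```python
import re


def words_with_repeated_letter(text, n):
    """Return the words in which some word character occurs at least n times."""
    # (\w) captures a candidate character; (?:\w*\1){n-1} requires n-1 further
    # occurrences of that character later in the same word.
    pattern = re.compile(r"\b(?=\w*(\w)(?:\w*\1){%d})\w*" % (n - 1))
    return [m.group(0) for m in pattern.finditer(text)]


sample = "applepie banana insidious hello"
print(words_with_repeated_letter(sample, 3))
# prints ['applepie', 'banana', 'insidious']
```

Only the argument n changes between calls; the n-1 adjustment is hidden inside the function.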

Regex : Find a number between space

I am trying to extract a zip code of six numbers starting with the number 4 from a string. Right now I am using [4][0-9]{5}, but it is also matching starting from other numbers, like 020-25468811 and it's returning 468811. I don't want it to search in the middle of a number, only full numbers.
Try to use the following:
(?<!\d)4\d{5}(?!\d)
That is, find a 6-digit number that starts with 4 and is neither preceded nor followed by a digit.
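A quick Python check of the difference (the language is an assumption; the phone number comes from the question, while the rest of the sample text, including the zip code 411001, is made up for illustration):

```python
import re

text = "Call 020-25468811 or write to Pune 411001."

# Original pattern: it also matches inside a longer run of digits.
loose = re.findall(r"[4][0-9]{5}", text)

# Lookaround version: the match must not touch a digit on either side.
strict = re.findall(r"(?<!\d)4\d{5}(?!\d)", text)

print(loose)   # prints ['468811', '411001']
print(strict)  # prints ['411001']
```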
Your expression right now matches a 4 followed by any five digits, wherever that occurs, including in the middle of a longer number. To fix this behavior you should add word boundaries as per Jon's suggestion.
\b[4][0-9]{5}\b
More on word boundaries here: http://www.regular-expressions.info/wordboundaries.html
You could simply add a space to the beginning of your regular expression: " 4[0-9]{5}". If you need a more general way of finding the beginning of the number (it could also be preceded by a tab, a newline, etc.), you should have a look at the predefined character class \s. Also have a look at boundary matchers. I don't know which language you are using, but regexes work very similarly in most languages. Check this Java regex documentation.
There is a start of line character in regex: ^
You could do:
^4[0-9]{5}
If the numbers are not always in the beginning of a line, you can more generally use:
\<4[0-9]{5}\>
To match only whole words.
Both examples work with egrep.

Adding minimum characters to this regex

I currently have the following regex
".*[0-9].*"
The above makes sure that the text has a number in it. I would also like to add a minimum-length condition, say 8 characters. How can I add another condition to the above expression to make sure that there are at least 8 characters in the text?
Solution
You can use Positive Lookaheads to validate a string before capturing it.
Regex
(?=^.{8,}$)(?=^.*\d)^.*$
Explanation
The syntax for a Positive Lookahead is like so: (?=REGEX)
In the regex above, I have specified ^.{8,}$ inside the first Lookahead. This means that the string MUST have a MINIMUM of 8 characters from start to finish in order to pass validation.
The second positive lookahead has ^.*\d. This means that the string can begin with any characters, but there must be a digit somewhere in the string, otherwise it will not pass validation.
The last bit is simply "match everything" ^.*$, because if it passed the initial validation, then we want to capture it.
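Here is a short Python sketch of the pattern in use (the language, the helper name is_valid and the sample inputs are assumptions for illustration). It assumes single-line input, because . does not match a newline by default:

```python
import re

# Two lookaheads validate from the start of the string; ^.*$ then matches it all.
PATTERN = re.compile(r"(?=^.{8,}$)(?=^.*\d)^.*$")


def is_valid(text):
    """True if text has at least 8 characters and contains a digit."""
    return PATTERN.match(text) is not None


print(is_valid("abc1"))        # False: has a digit but is too short
print(is_valid("abcdefgh"))    # False: long enough but has no digit
print(is_valid("abcdefg1"))    # True: 8 characters and a digit
print(is_valid("1234567890"))  # True
```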
Demonstration
Regex101 Example
The quantifiers for a minimum and maximum number of repetitions are as follows:
{2} matches exactly 2 repetitions
{2,5} matches 2 to 5 repetitions
{2,} matches 2 or more repetitions
You can also check this PDF file for more information
Good luck

How to optimise this regex to match string (1234-12345-1)

I've got this RegEx example: http://regexr.com?34hihsvn
I'm wondering if there's a more elegant way of writing it, or perhaps a more optimised way?
Here are the rules:
Digits and dashes only.
Must not contain more than 10 digits.
Must have two hyphens.
Must have at least one digit between each hyphen.
Last number must only be one digit.
I'm new to this so would appreciate any hints or tips.
In case the link expires, the text to search is
----------
22-22-1
22-22-22
333-333-1
333-4444-1
4444-4444-1
4444-55555-1
55555-4444-1
666666-7777777-1
88888888-88888888-1
1-1-1
88888888-88888888-22
22-333-
333-22
----------
My regex is: \b((\d{1,4}-\d{1,5})|(\d{1,5}-\d{1,4}))-\d{1}\b
I'm using this site for testing: http://gskinner.com/RegExr/
Thanks for any help,
Nick
Here is a regex I came up with:
(?=\b[\d-]{3,10}-\d\b)\b\d+-\d+-\d\b
This uses a look-ahead to validate the input before attempting the match. It looks for 3 to 10 characters from the class [\d-], followed by a dash and a single digit. After that comes the actual match, which confirms that the format of your string really is digits, dash, digits, dash, digit.
From your sample strings this regex matches:
22-22-1
333-333-1
333-4444-1
4444-4444-1
4444-55555-1
55555-4444-1
1-1-1
It also matches the following strings:
22-7777777-1
1-88888888-1
Your regexp only allows a first and second group of digits with a maximum length of 5. Therefore, valid strings like 1-12345678-1 or 123456-1-1 won't be matched.
This regexp works for the given requirements:
\b(?:\d\-\d{1,8}|\d{2}\-\d{1,7}|\d{3}\-\d{1,6}|\d{4}\-\d{1,5}|\d{5}\-\d{1,4}|\d{6}\-\d{1,3}|\d{7}\-\d{1,2}|\d{8}\-\d)\-\d\b
(RegExr)
You can use this with the m modifier (switch the multiline mode on):
^\d(?!.{12})\d*-\d+-\d$
or this one without the m modifier:
\b\d(?!.{12})\d*-\d+-\d\b
By design these two patterns match at least three digits separated by hyphens (so no need to put a {5,n} quantifier somewhere, it's useless).
The patterns are also built to fail fast:
I chose to start them with a single digit \d, so every line start or word boundary that is not followed by a digit is discarded immediately. Consuming exactly one digit first also tells me how much of the string may remain.
Then I test the upper limit of the string length with a negative lookahead that checks whether there is one more character than the maximum allowed (if there are 12 characters after this position, the string has at least 13 characters). There is no need for anything more descriptive than the dot metacharacter here; the goal is only to test the length quickly.
Finally, I describe the rest of the string without doing anything special. That is probably the slowest part of the pattern, but it doesn't matter, since the overwhelming majority of unwanted positions have already been discarded.
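The multiline variant can be checked against the sample lines from the question with a short Python sketch (the language and the names SAMPLE, PATTERN and matches are assumptions for illustration; re.MULTILINE plays the role of the m modifier):

```python
import re

# Sample lines from the question.
SAMPLE = """\
22-22-1
22-22-22
333-333-1
333-4444-1
4444-4444-1
4444-55555-1
55555-4444-1
666666-7777777-1
88888888-88888888-1
1-1-1
88888888-88888888-22
22-333-
333-22
"""

# One digit, then a length check: fail if 12 or more characters follow on the
# same line (10 digits + 2 hyphens = 12 characters at most in total).
PATTERN = re.compile(r"^\d(?!.{12})\d*-\d+-\d$", re.MULTILINE)

matches = PATTERN.findall(SAMPLE)
print(matches)
# prints ['22-22-1', '333-333-1', '333-4444-1', '4444-4444-1',
#         '4444-55555-1', '55555-4444-1', '1-1-1']
```

Note that the pattern limits the total length, not the digit count directly; the 10-digit limit follows because a match always contains exactly two hyphens.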