How can I make regex find a string a certain distance away?

For example, this is what I came up with so far
lasts{0,1}.*?(\d).*?doggs
The beginning part could be either last or lasts with an s.
Now, I want to look a maximum of 10 characters ahead of wherever it finds lasts{0,1}. If it finds a digit within those 10 characters, it should look again to see whether the string doggs appears within a maximum of 10 characters after that digit.
Is this even possible?
This is an example
So I figure if I use them about 7-8 hours a day they should last about 5.8 doggs. That works out
I want to only get the 5

You can use some more limiting quantifiers:
lasts?.{0,10}?(\d).{0,10}doggs
See the regex demo
Pattern explanation:
lasts? - match either last or lasts
.{0,10}? - match 0 to 10 characters as few as possible other than a newline (use DOTALL modifier to also match a newline)
\d - a digit
.{0,10} - match 0 to 10 characters other than a newline, this time as many as possible (greedy)
doggs - match a literal character sequence doggs.
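
As a quick check of the pattern above, here is a minimal Python sketch run against the sentence from the question. The helper name first_digit_near is made up for illustration; only the regex comes from the answer.

```python
import re

# Pattern from the answer: a digit within 10 characters of "last"/"lasts",
# then "doggs" within 10 characters of that digit.
PATTERN = re.compile(r"lasts?.{0,10}?(\d).{0,10}doggs")


def first_digit_near(text):
    """Return the captured digit, or None if no match within the limits."""
    match = PATTERN.search(text)
    return match.group(1) if match else None


sample = ("So I figure if I use them about 7-8 hours a day "
          "they should last about 5.8 doggs. That works out")

print(first_digit_near(sample))                          # 5
# The digit sits 12 characters after "last", so nothing matches.
print(first_digit_near("last " + "x" * 11 + "5 doggs"))  # None
```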

`lasts{0,1}.{0,10}\d.{0,10}doggs`
The lasts{0,1} can be replaced by lasts?.

Related

.net Regex to look ahead and eliminate strings in advance that dont contain certain characters

I am using the .NET flavor of regex.
Suppose I have a string 123456789AB
and I want to match AB (could be any two capital letters) only if the part of the string containing numbers (123456789) has 5 and 8 in it.
So what I came up with was
(?=5)(?=8)([A-Z]{2})
But this is not working.
After some trial and error on RegexStorm
I got to
(?=(.*5))(?=(.*8))[A-Z]{2}
What I am expecting is that it will start matching from the start of the string, as a lookahead does not consume any characters.
But the part "[A-Z]{2}" does not move ahead to match AB in the input string.
My question is: why is that so?
I know replacing it with .*[A-Z]{2} will make it move ahead, but then the matched string contains the entire input.
What is the solution in this case, other than putting the letter part ([A-Z]{2}) in a separate group and then reading only that group?

Lookaheads check for a pattern match immediately to the right of the current position in the string. (?=(.*5))(?=(.*8)) matches a location that is immediately followed by any 0 or more chars other than line break chars, as many as possible, and then 5; then, at the same position, another similar check is performed, but requiring 8 after any zero or more chars, as many as possible. Once the regex index has moved past the 5, there is no 5 left to the right, so the lookahead can no longer succeed at the position where AB starts.
You may use as many lookbehinds as there are required substrings before the two letters:
(?s)(?<=5.*?)(?<=8.*?)[A-Z]{2}
See the regex demo
Details
(?s) - makes the . match newline characters, too
(?<=5.*?) - a location that is immediately preceded with 5 and then 0 or more chars as few as possible
(?<=8.*?) - a location that is immediately preceded with 8 and then 0 or more chars as few as possible
[A-Z]{2} - two ASCII uppercase letters.
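
Python's standard re module rejects variable-width lookbehinds, so the .NET pattern above cannot be run there as written. The sketch below first reproduces the failure of the lookahead-only attempt, then shows one possible workaround that does use a capture group (which the question hoped to avoid). It assumes the input is a run of digits followed by two capital letters, as in the example; the helper name two_letters is hypothetical.

```python
import re

s = "123456789AB"

# Lookahead-only attempt: at the position of "AB" neither 5 nor 8 lies
# to the right any more, and at earlier positions [A-Z]{2} sees digits.
print(re.search(r"(?=.*5)(?=.*8)[A-Z]{2}", s))  # None

# Assumed workaround for engines without variable-width lookbehind:
# run both checks from the start of the string, then capture the letters.
WORKAROUND = re.compile(r"^(?=\d*5)(?=\d*8)\d*([A-Z]{2})")


def two_letters(text):
    """Return the two capitals if the digit run contains both 5 and 8."""
    match = WORKAROUND.search(text)
    return match.group(1) if match else None


print(two_letters("123456789AB"))  # AB
print(two_letters("12346789AB"))   # None (no 5 in the digits)
```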

An alternative would be to "unfold" what you expect to match using exclusionary character classes and alternation of match order. Not pretty, but pretty fast:
(?<=\b[^58]*?(?:5[^8]*8|8[^5]*5)[^A-Z]*?)[A-Z]{2}

RegEx to check 24 hours time format fails

I have the following RegEx that is supposed to do 24 hours time format validation, which I'm trying out in https://rubular.com
/^[0-23]{2}:[0-59]{2}:[0-59]{2}$/
But the following times fails to match even if they look correct
02:06:00
04:05:00
Why this is so?

In character classes, you're supposed to denote the range of characters allowed (in contrast to the numbers you want to match in your example). For minutes and seconds, this is relatively straight-forward - the following expression
[0-5][0-9]
...will match any numerical string from "00" to "59".
But for the hours, you need two separate expressions:
[01][0-9]|2[0-3]
...one to match "00" to "19" and one to match "20" to "23". Due to the alternative used (| character), these need to be grouped, which adds another bit of syntax (?:...). Finally we're just adding the anchors ^ and $ for beginning and end of string, which you already had where they belong.
^(?:[01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]$
You can check this solution out at regex101, if you like.
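
A minimal Python sketch of the final pattern used as a validator. The function name is_valid_time is an assumption for illustration; the regex is the one given above.

```python
import re

# Hours 00-23, minutes 00-59, seconds 00-59.
TIME_24H = re.compile(r"^(?:[01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]$")


def is_valid_time(text):
    """True if text is exactly HH:MM:SS in 24-hour format."""
    return TIME_24H.fullmatch(text) is not None


for candidate in ["02:06:00", "04:05:00", "23:59:59", "24:00:00", "12:60:00"]:
    print(candidate, is_valid_time(candidate))
```

The first three candidates are accepted; 24:00:00 and 12:60:00 are rejected.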

Your problem is that you misunderstand character ranges: [0-23] doesn't mean "match any number from 0 to 23". It means: match one character that is either in the range 0-2 (0, 1 or 2) or is 3.
Try this pattern: (?:[01][0-9]|2[0-3])(?::[0-5][0-9]){2}
Explanation:
(?:...) - non-capturing group
[01][0-9]|2[0-3] - alternation: match either 0 or 1 followed by any digit from 0 to 9, OR 2 followed by 0, 1, 2 or 3 (a number from 00 to 23)
(?::[0-5][0-9]){2} - match : and [0-5][0-9] (basically number from 00-59) twice
Demo
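
To see what the original character classes actually accept, the following Python sketch lists the single characters matched by [0-23] and [0-59], then tries the compact pattern on one of the failing inputs. The variable names are only illustrative.

```python
import re

digits = "0123456789"

# A character class matches ONE character; "0-2" is a range, "3" a literal.
hours_class = [d for d in digits if re.fullmatch(r"[0-23]", d)]
minutes_class = [d for d in digits if re.fullmatch(r"[0-59]", d)]
print(hours_class)    # ['0', '1', '2', '3']
print(minutes_class)  # ['0', '1', '2', '3', '4', '5', '9']

# "04" and "06" fail in the original because 4 and 6 are in neither class.
COMPACT = re.compile(r"(?:[01][0-9]|2[0-3])(?::[0-5][0-9]){2}")
print(bool(COMPACT.fullmatch("02:06:00")))  # True
```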

Use this (add ^ and $ around it if the whole string must be a time): (([0-1]\d|[2][0-3])):(([0-5][0-9])):(([0-5][0-9]))
Online demo

Regex ignore first x characters and then match pattern

String = '11111111111110000000000000000000110000000000000011111111111111111111111111111111110011111111111110000011110000011111111111110000000000011111111111111111010001111111111111111111110011111111111111111111111111110111112111121111111111111111111000011000001011111111111101022111101111001111111111110000001000000111111111111111000000000000011111111111111100011111111001011111111100000000000000000000000000000000100111001000000000000000000011000000000000001111111000000000000000000000000000000000001111100000000000000000000011000000000000000000000010000000000333333333'
I want a pattern that takes out the 10 characters after the first 100 (so characters 100 - 110), and then I want to check whether that 10-character string has 4 zeros in a row.
How can I do this with regex only? I have been using substring so far.

You could use this:
^.{100}(?=.{0,6}0000)(.{10})
Explanation:
^: matches the start of the string to avoid that the pattern is used anywhere in the input
.{100}: match 100 characters
(?= ): look ahead. This does not capture, but just verifies something that is still ahead.
.{0,6}: 0 to 6 characters
0000: literally 4 zeroes
(.{10}): 10 characters, this time they are captured and can be referenced back with \1 or $1 depending on the flavour of regex.
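
A small Python sketch of this pattern on synthetic input (the strings hit and miss are invented for the test, and window_with_zeros is a hypothetical helper name). It assumes the input contains no newlines, since . does not match them by default.

```python
import re

# Skip 100 characters, require "0000" to start within the next 7 positions
# (so it fits inside the 10-character window), then capture the window.
WINDOW = re.compile(r"^.{100}(?=.{0,6}0000)(.{10})")


def window_with_zeros(text):
    """Return characters 100-109 if they contain four zeros in a row."""
    match = WINDOW.search(text)
    return match.group(1) if match else None


hit = "1" * 100 + "1100001111" + "1" * 20
# Here the run of zeros starts at offset 7 of the window and spills past it.
miss = "1" * 100 + "1111111000" + "0" + "1" * 19

print(window_with_zeros(hit))   # 1100001111
print(window_with_zeros(miss))  # None
```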

The above answer is perfect, but the overall match includes the first 100 characters as well.
To leave the first 100 out of the match, we can use
(?<=.{100})
To check the required pattern in last 10 characters after first 100 only, we can use
(?<=.{100})(?=.{0,6}0000)(.{10})
You can test it here
Update: I checked the link today. It now leads somewhere else.
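
One caveat that may be worth checking, sketched in Python: (?<=.{100}) only requires that at least 100 characters precede the match, so an unanchored search can start later than position 100. The test string below is invented to show that drift. Passing a start position to a compiled pattern's match() method is one way (an assumption about what is wanted, not part of the answer) to pin the check to index 100 exactly.

```python
import re

# Zeros occupy indexes 107-110, so the window at 100-109 holds only three.
s = "x" * 100 + "1111111000" + "0111111111"

drifting = re.search(r"(?<=.{100})(?=.{0,6}0000)(.{10})", s)
print(drifting.start(), drifting.group(1))  # 101 1111110000

# Pinned variant: match() with pos=100 tries index 100 only.
WINDOW_AT = re.compile(r"(?=.{0,6}0000)(.{10})")
pinned = WINDOW_AT.match(s, 100)
print(pinned)  # None
```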

Regex - matching while ignoring some characters

I am trying to write a regex to match a sequence of numbers that is 5 digits long or over, but I want to ignore any spaces, dashes, parens, or hashes when doing that analysis. Here's what I have so far.
(\d|\(|\)|\s|#|-){5,}
The problem with this is that it will match any sequence of 5 characters, including the characters I want to ignore, so something like "#123 " would match. While I do want to ignore the # and space character, I still need the number itself to be 5 digits or more in order to qualify as a match.
To be clear, these would match:
1-2-3-4-5
123 45
2(134) 5
Bonus points if the matching begins and ends with a number rather than with one of those "special characters" I am excluding.
Any tips for doing this kind of matching?

If I understood requirements right you can use:
^\d(?:[()\s#-]*\d){4,}$
RegEx Demo
It always matches a digit at start. Then it is followed by 4 or more of a non-capturing group i.e. (?:[()\s#-]*\d) which means 0 or more of any listed special character followed by a digit.
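
A short Python sketch that runs this pattern over the examples from the question plus two inputs that should be rejected. The helper name has_five_digits is made up for illustration.

```python
import re

# A digit, then at least four more digits, each optionally preceded by
# any run of the ignorable characters ( ) whitespace # -
DIGITS_5_PLUS = re.compile(r"^\d(?:[()\s#-]*\d){4,}$")


def has_five_digits(text):
    """True if text starts and ends with a digit and has 5+ digits."""
    return DIGITS_5_PLUS.search(text) is not None


for candidate in ["1-2-3-4-5", "123 45", "2(134) 5", "#123 ", "12-34"]:
    print(repr(candidate), has_five_digits(candidate))
```

The first three candidates match; "#123 " and "12-34" do not.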

So just repeat a digit, followed by any other sequence of allowed characters 5 or more times:
^(\d[()\s#-]*){5,}$
You can ensure it ends on a digit if you subtract one of the repetitions and add an explicit digit at the end:
^(\d[()\s#-]*){4,}\d$

You can match non-digits with \D, so it would be something like:
(\d\D*){5,}
Here is a guide.
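
Note that \D accepts every non-digit, not only the characters listed in the question. A brief Python comparison, where the names LOOSE and STRICT and the test strings are assumptions chosen for illustration:

```python
import re

LOOSE = re.compile(r"(\d\D*){5,}")               # any non-digits between digits
STRICT = re.compile(r"\d(?:[()\s#-]*\d){4,}")    # only the listed characters

text = "1a2b3c4d5"
print(bool(LOOSE.search(text)))          # True: letters count as \D
print(bool(STRICT.search(text)))         # False: letters are not allowed
print(bool(STRICT.search("1-2-3-4-5")))  # True
```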

Regex to capture some ID from URL if that URL does not contain banned text

I have the following regex that I created to locate a 10-digit id (ideally it would not consider a set of digits longer than 10, e.g. id=12345678901). After it finds the last set of 10 digits, it should discard everything that comes after it, except when it hits brackets or quotes. In that case it should just stop.
www.site1\.com\/((?!someid\=12345name).)*([0-9]{10})[^\"\'\[\]\n\s]*
However, in examples like the one below, it does not stop at a bracket or quotation mark after the 10-digit number and keeps going until it finds another one:
[URL='http://www.site1.com/path/445-453/L?test=3456&test2=333629710&item=1058371930']Some Title of This URL[/URL]or [URL='http://www.site1.com/path/445-453/L?test=3456&test2=333629710&item=2932475321']Some url title 2[/URL]
See live url for more examples: http://regex101.com/r/pG5fA4/2
FYI - notice that some links have the same parameters with 10-digit ids in them. As it is now, I would like it to select only the last set of 10 digits, as long as it does not go looking past brackets or quotation marks.
Thanks!

* is a greedy operator. Because of the greedy operator, .* will match all characters (except newline) until it reaches the last set of digits at the very end of the string. Use *? for a non-greedy match. This guarantees that the quantified dot will only match as many characters as needed for the pattern to succeed.
((?!someid\=12345name).)*?([0-9]{10})
If you want the set of digits before the last &, ', [ or ], you can use a lookahead.
www\.site1\.com/((?!someid=12345name).)*?([0-9]{10})(?=[\[\]'\s]|&[^&]*\n)
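
The difference between the greedy and the lazy version can be checked with a short Python sketch on the sample line from the question. The names GREEDY and LAZY are only illustrative, and the backslash before = is dropped because it is not needed.

```python
import re

text = ("[URL='http://www.site1.com/path/445-453/L?test=3456&test2=333629710"
        "&item=1058371930']Some Title of This URL[/URL]or "
        "[URL='http://www.site1.com/path/445-453/L?test=3456&test2=333629710"
        "&item=2932475321']Some url title 2[/URL]")

# Greedy: the tempered dot runs to the end of the line and backtracks,
# so the first match already ends at the LAST 10-digit run.
GREEDY = re.compile(r"www\.site1\.com/((?!someid=12345name).)*([0-9]{10})")

# Lazy plus lookahead: stop at the first 10-digit run that is followed
# by a bracket, quote or whitespace.
LAZY = re.compile(
    r"www\.site1\.com/((?!someid=12345name).)*?([0-9]{10})"
    r"(?=[\[\]'\s]|&[^&]*\n)"
)

print(GREEDY.search(text).group(2))                # 2932475321
print([m.group(2) for m in LAZY.finditer(text)])   # ['1058371930', '2932475321']
```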
