We need to use a regex to remove all text in a string before the first instance of four digits in a row. We have a regex that "sort of" works:
^((?!\d{4}\w).)*
Given this string:
foo-bar-spring_06-2006_02_25.rm
the desired output is:
2006_02_25.rm
That works, but only if there's a single instance of a four-digit pattern. The string
batt-fall_01-2001-11-10_0200-0400.rm
produces this result:
0400.rm
It should produce:
2001-11-10_0200-0400.rm
Note: long story, but we cannot use a - or _ as a delimiter.
I feel like we're close. Does anyone have any suggestions?
Thanks!
You can use a positive lookahead pattern after a lazily repeated . instead:
^.*?(?=\d{4})
Demo: https://regex101.com/r/8DZDQp/1
Alternatively, you can group the 4 digits:
^.*?(\d{4})
and substitute the match with the first group $1.
Demo: https://regex101.com/r/8DZDQp/3
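For illustration, here is a minimal sketch of both substitutions using Python's re module. The question doesn't name a language, so Python is an assumption, and the helper names are made up. Note that Python writes the group reference in the replacement as \1 rather than $1.

```python
import re

def strip_with_lookahead(s: str) -> str:
    # Delete the shortest prefix that is immediately followed by four digits.
    return re.sub(r"^.*?(?=\d{4})", "", s)

def strip_with_group(s: str) -> str:
    # Match the prefix plus the four digits, then write the digits back via group 1.
    return re.sub(r"^.*?(\d{4})", r"\1", s)

for s in ["foo-bar-spring_06-2006_02_25.rm", "batt-fall_01-2001-11-10_0200-0400.rm"]:
    print(strip_with_lookahead(s), strip_with_group(s))
```

Both helpers return 2006_02_25.rm and 2001-11-10_0200-0400.rm for the two sample strings, because the lazy .*? stops at the first four-digit run.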
A likely faster option would be to skip the undesired beginning altogether and, without any lookarounds, capture the rest with a simple expression similar to:
(\d{4}.*\..+)$
or:
(\d{4}.*\.[a-z]+)$
The end anchor $ is also unnecessary; the expressions still work without it.
Demo
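The same capture-only idea can be sketched in Python, assuming the standard re module. The name keep_tail is a hypothetical helper. re.search returns the leftmost match, which is why the capture starts at the first four-digit run and not at a later one:

```python
import re
from typing import Optional

# Leftmost match wins, so group 1 starts at the first run of four digits.
TAIL = re.compile(r"(\d{4}.*\..+)$")

def keep_tail(s: str) -> Optional[str]:
    m = TAIL.search(s)
    return m.group(1) if m else None

print(keep_tail("foo-bar-spring_06-2006_02_25.rm"))
print(keep_tail("batt-fall_01-2001-11-10_0200-0400.rm"))
```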
What regex could I use to match bar-100 in foo/bar-100-baz? The original string could be longer, with more hyphens.
I'm a total regex beginner and don't really know where to start.
\/([^-]+) matches bar, but I somehow need the match to continue up to the second hyphen.
If you want the full match (rather than a capture group) to be bar-100, then
(?<=/)[a-z]+-\d+
Demo 1
or,
[a-z]+-\d+(?=-)
Demo 2
or,
[^/]+(?=-)
Demo 3
might also work OK.
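A quick way to compare the three patterns, sketched with Python's re module (an assumption; PATTERNS is just an illustrative name). The second input adds one more hyphenated part to show how the patterns behave on longer strings:

```python
import re

PATTERNS = [
    r"(?<=/)[a-z]+-\d+",   # lookbehind for the slash
    r"[a-z]+-\d+(?=-)",    # lookahead for the following hyphen
    r"[^/]+(?=-)",         # greedy run of non-slash characters, ending before a hyphen
]

for text in ["foo/bar-100-baz", "foo/bar-100-baz-qux"]:
    print(text, [re.search(p, text).group(0) for p in PATTERNS])
```

On foo/bar-100-baz all three return bar-100. On foo/bar-100-baz-qux the first two still return bar-100, while [^/]+(?=-) extends to bar-100-baz, because the greedy run stops before the last hyphen it can reach.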
I'm trying to match against a list of strings where I want to make sure the character before == is not another equals sign, without capturing that character. So, for a list (excerpted from pip freeze) like:
ply==3.10
powerline-status===2.6.dev9999-git.b-e52754d5c5c6a82238b43a5687a5c4c647c9ebc1-
psutil==4.0.0
ptyprocess==0.5.1
I want the captured output to look like this:
==3.10
==4.0.0
==0.5.1
I first thought using a negative lookahead (?![^=]) would work, but with a regular expression of (?![^=])==[0-9]+.* it ends up capturing the line I don't want:
==3.10
==2.6.dev9999-git.b-e52754d5c5c6a82238b43a5687a5c4c647c9ebc1-
==4.0.0
==0.5.1
I also tried using a non-capturing group (?:[^=]) with a regex of (?:[^=])==[0-9]+.* but that ends up capturing the first character which I also don't want:
y==3.10
l==4.0.0
s==0.5.1
So the question is this: How can one match but not capture a string before the rest of the regex?
A negative lookbehind would be the way to go:
(?<!=)==[0-9.]+
Also, here is the site I like to use:
http://www.rubular.com/
Of course, it sometimes helps if you tell us which engine/software you are using, so we know what limitations there might be.
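A minimal check of the negative lookbehind against the pip freeze excerpt from the question, assuming Python's standard re module:

```python
import re

FREEZE = """ply==3.10
powerline-status===2.6.dev9999-git.b-e52754d5c5c6a82238b43a5687a5c4c647c9ebc1-
psutil==4.0.0
ptyprocess==0.5.1"""

# (?<!=) rejects any "==" that is preceded by another "=", which rules out the "===" line.
versions = re.findall(r"(?<!=)==[0-9.]+", FREEZE)
print(versions)  # ['==3.10', '==4.0.0', '==0.5.1']
```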
If you want to remove the version numbers from the text, you could capture a character that is not an equals sign ([^=]) in the first capturing group, followed by matching == and the version number \d+(?:\.\d+)+. Then use your capturing group in the replacement.
Regex
([^=])==\d+(?:\.\d+)+
Replacement
Group 1 $1
Note
You could also use ==[0-9]+.* or ==[0-9.]+ to match the double equals sign and the version number, but those are very broad matches: the first would also match ====1test, and the latter would also match ==.. (dots without any digits).
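Here is the substitution as a Python sketch. It assumes the standard re module, where group 1 is written \1 in the replacement instead of $1. The === line is left untouched and the other lines are reduced to their package names:

```python
import re

FREEZE = """ply==3.10
powerline-status===2.6.dev9999-git.b-e52754d5c5c6a82238b43a5687a5c4c647c9ebc1-
psutil==4.0.0
ptyprocess==0.5.1"""

# Group 1 keeps the character before "==", so only "==" and the version are dropped.
stripped = re.sub(r"([^=])==\d+(?:\.\d+)+", r"\1", FREEZE)
print(stripped)
```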
There's another regex construct called a lookbehind assertion (specifically a positive lookbehind), written (?<=...). Using it in my example above, the expression (?<=[^=])==[0-9]+.* produces the expected output:
==3.10
==4.0.0
==0.5.1
It took me a while to discover this, partly because, at the time of writing, lookbehind assertions weren't supported in the popular regex tool regexr.
If there are alternatives to using a lookbehind here, I'd love to hear them.
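Here is the positive-lookbehind version as a Python sketch, assuming the standard re module. It also shows one behavioral difference from the negative lookbehind (?<!=) that is worth knowing:

```python
import re

FREEZE = """ply==3.10
powerline-status===2.6.dev9999-git.b-e52754d5c5c6a82238b43a5687a5c4c647c9ebc1-
psutil==4.0.0
ptyprocess==0.5.1"""

# (?<=[^=]) requires some non-"=" character immediately before "==".
found = re.findall(r"(?<=[^=])==[0-9]+.*", FREEZE)
print(found)  # ['==3.10', '==4.0.0', '==0.5.1']

# A positive lookbehind needs an actual character to inspect, so "==" at the
# very start of the input is never matched; the negative lookbehind allows it.
print(re.findall(r"(?<=[^=])==[0-9]+.*", "==1.0"))  # []
print(re.findall(r"(?<!=)==[0-9.]+", "==1.0"))      # ['==1.0']
```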
I have the following data:
SOMEDATA .test 01/45/12 2.50 THIS IS DATA
and I want to extract the number 2.50 out of this. I have managed to do this with the following RegEx:
(?<=\d{2}\/\d{2}\/\d{2} )\d+.\d+
However that doesn't work for input like this:
SOMEDATA .test 01/45/12 2500 THIS IS DATA
In this case, I want to extract the number 2500.
I can't seem to figure out a regex rule for that. Is there a way to extract something between two spaces? That is, extract the text/number after the date until the next whitespace? All I know is that the date will always have the same format, and there will always be a space after the text and then a space after the number I want to extract.
Can someone help me out with this?
Capture number between two whitespaces
A whitespace is matched with \s, and non-whitespace with \S.
So, what you can use is:
\d{2}\/\d{2}\/\d{2} +(\S+)
See the regex demo
One or more non-whitespace characters are captured into Group 1.
If - for some reason - you need to only get the value as a whole match, use your lookbehind approach:
(?<=\d{2}\/\d{2}\/\d{2} )\S+
Or - if you are using PCRE - you may leverage the match reset operator \K:
\d{2}\/\d{2}\/\d{2} +\K\S+
See another demo
NOTE: the \K and capture group approaches allow one or more spaces after the date, so they are more flexible.
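Here is a Python sketch of the capture-group and lookbehind forms. It assumes the standard re module, where the slashes need no escaping. Python's built-in re does not support \K, so that variant is left out:

```python
import re

ROWS = [
    "SOMEDATA .test 01/45/12 2.50 THIS IS DATA",
    "SOMEDATA .test 01/45/12 2500 THIS IS DATA",
]

# Capture-group form: the value ends up in group 1; " +" allows several spaces.
CAPTURE = re.compile(r"\d{2}/\d{2}/\d{2} +(\S+)")
# Lookbehind form: the value is the whole match; the fixed-width lookbehind
# expects exactly one space after the date.
LOOKBEHIND = re.compile(r"(?<=\d{2}/\d{2}/\d{2} )\S+")

for row in ROWS:
    print(CAPTURE.search(row).group(1), LOOKBEHIND.search(row).group(0))
# 2.50 2.50
# 2500 2500
```

With two spaces after the date, the capture-group form still finds the value while the lookbehind form finds nothing. That is the flexibility the note above refers to.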
I see some people have already helped you, but if for some reason you want an alternative, this works too :)
.+ \d+\/\d+\/\d+ (\d+[\.\d]*)
So .+ followed by a space matches everything up to the space before the date,
then \d+\/\d+\/\d+ matches the date, followed by a space,
and the capturing group is the number. As you can see, I made the last part optional, so both floating-point values and integers can be matched. Hope this helps!
Proof: https://regex101.com/r/fY3nJ2/1
Just make the fractional part optional:
(?<=\d{2}\/\d{2}\/\d{2} )\d+(?:\.\d+)?
Demo: https://regex101.com/r/jH3pU7/1
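Here is a Python sketch of the same idea, assuming the standard re module. VALUE is an illustrative name. The optional non-capturing group covers both inputs from the question:

```python
import re

# Integer part, then an optional ".digits" fraction.
VALUE = re.compile(r"(?<=\d{2}/\d{2}/\d{2} )\d+(?:\.\d+)?")

for row in [
    "SOMEDATA .test 01/45/12 2.50 THIS IS DATA",
    "SOMEDATA .test 01/45/12 2500 THIS IS DATA",
]:
    print(VALUE.search(row).group(0))
# 2.50
# 2500
```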
Update following clarifications in comments:
To match anything except spaces, surrounded by spaces and preceded by the date, use:
(?<=\d{2}\/\d{2}\/\d{2} )\S+
Demo: https://regex101.com/r/jH3pU7/3
Rather than capturing, you can make the entire match the target text by using a lookbehind:
(?<=\d\d(\/\d\d){2} )\S+
This matches the first run of non-whitespace characters that follows a "date-like" part.
Note also the shorter "date-like" pattern. You may consider using this part of the regex in whatever solution you use.
I just started learning regex, and I'm struggling to get the first date from the following string:
gs://dcdt_-dcm_account/dcm_account_click_2016070510_20160631_165654_2592254.csv.gz
I want to get 20160705
Any ideas?
Try using this regex:
^.*dcm_account_click_(\d{8}).*$
The (\d{8}) term is a capture group; it tells the regex engine to extract that part and make it available.
\d{8} matches 8 digits in sequence, which is what you are after.
Demo:
Regex101
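Here is the same pattern in Python, as a minimal sketch assuming the standard re module:

```python
import re

PATH = "gs://dcdt_-dcm_account/dcm_account_click_2016070510_20160631_165654_2592254.csv.gz"

# Group 1 holds the 8 digits right after the "dcm_account_click_" prefix.
m = re.search(r"^.*dcm_account_click_(\d{8}).*$", PATH)
print(m.group(1) if m else None)  # 20160705
```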
You can use a regex like this:
k_(\d{8})
or with a positive lookbehind
(?<=k_)(\d{8})
Then access the capturing group.
Working demo
By the way, if you just use (\d{8}) and take the first match, that will also work.
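Here is a Python sketch that runs all three patterns from this answer against the path from the question and reads group 1. It assumes the standard re module:

```python
import re

PATH = "gs://dcdt_-dcm_account/dcm_account_click_2016070510_20160631_165654_2592254.csv.gz"

# Each pattern, as written in the answer, has one capture group holding the date.
for pattern in (r"k_(\d{8})", r"(?<=k_)(\d{8})", r"(\d{8})"):
    print(pattern, re.search(pattern, PATH).group(1))  # 20160705 each time
```

The bare (\d{8}) works here only because no other 8-digit run appears before the date in this path.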
If there are no digits before the first date, you might very well get along with:
^\D*(\d{8}) # beginning of the line/string, followed by NON-digits
            # capture the 8 digits (the date) afterwards
See a demo on regex101.com.
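Here is a quick Python check, assuming the standard re module. re.VERBOSE lets a pattern carry # comments like the ones above. It shows why the digit count matters for this path: a bare \d+ after ^\D* captures the whole ten-digit run 2016070510, while \d{8} stops at the date 20160705.

```python
import re

PATH = "gs://dcdt_-dcm_account/dcm_account_click_2016070510_20160631_165654_2592254.csv.gz"

# re.VERBOSE ignores the whitespace and the "#" comments inside the pattern.
ALL_DIGITS = re.compile(r"""
    ^\D*     # beginning of the string, followed by non-digits
    (\d+)    # capture the whole run of digits
""", re.VERBOSE)
FIRST_EIGHT = re.compile(r"^\D*(\d{8})")  # capture only the first eight digits

print(ALL_DIGITS.search(PATH).group(1))   # 2016070510
print(FIRST_EIGHT.search(PATH).group(1))  # 20160705
```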
I need some help with a RegEx pattern match.
How do I write a regex that matches
N-NN-N-NN-NN-N-NNN
but also
N-NN-NN-NN
Example:
10pcs- ratchet spanner combination wrench 6-8-10-11-12-13-14-15-17-19
Cr-v,heated 12pcs-1/4dr 4-4.5-5-5.5-6-7-8-9-10-11-12-13 Cr-v,heated
17pcs-1/2dr 10-11-12-13-14-15-16-17-18-19-20-21-22-23-24-27-30
Cr-v,heated 1-2-33 Cr-V heater 1-.2-1-4
It should match only where there are at least two hyphens (-) in the string, so a phone number like 020-11223344 must not be matched.
The strings almost always look like 6-8-10-11-12-13-14-15-17-19, except that sometimes a . can appear before a number. They also differ in length. Is this possible?
I've come up with this so far, but it also matches phone numbers, and when a . appears it doesn't match at all:
(\d-[^>])
On this page you can find the different patterns: http://www.cazoom.nl/en/partij-aanbod/186-pcs-working-tools-trolly-3
What about this pattern:
[\d.]+(?:-[\d.]+){2,}
This matches [\d.]+ if it is followed by at least two repetitions of -[\d.]+.
(?: is a non-capturing group, used only for the repetition.
test at regex101
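Here is a Python sketch that applies this pattern to lines taken from the question's examples. It assumes the standard re module; RUNS and LINES are illustrative names:

```python
import re

# A run of digits/dots, then at least two more "-digits/dots" chunks,
# so every match contains two or more hyphens.
RUNS = re.compile(r"[\d.]+(?:-[\d.]+){2,}")

LINES = [
    "10pcs- ratchet spanner combination wrench 6-8-10-11-12-13-14-15-17-19",
    "Cr-v,heated 12pcs-1/4dr 4-4.5-5-5.5-6-7-8-9-10-11-12-13 Cr-v,heated",
    "Cr-v,heated 1-2-33 Cr-V heater 1-.2-1-4",
    "020-11223344",  # phone number with a single hyphen
]

for line in LINES:
    print(RUNS.findall(line))
# ['6-8-10-11-12-13-14-15-17-19']
# ['4-4.5-5-5.5-6-7-8-9-10-11-12-13']
# ['1-2-33', '1-.2-1-4']
# []
```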
The following regex will match these sequences:
(?:\.?\d\.?\d?-){2,}\.?\d\.?\d?
Debuggex Demo
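Here is the same kind of check for this pattern, again as a Python sketch assuming the standard re module. Each number in this pattern can have at most two digits, so it is stricter than [\d.]+:

```python
import re

# Two or more "number-" chunks (each at most two digits, dots optional), then a final number.
SHORT_RUNS = re.compile(r"(?:\.?\d\.?\d?-){2,}\.?\d\.?\d?")

LINES = [
    "10pcs- ratchet spanner combination wrench 6-8-10-11-12-13-14-15-17-19",
    "Cr-v,heated 12pcs-1/4dr 4-4.5-5-5.5-6-7-8-9-10-11-12-13 Cr-v,heated",
    "Cr-v,heated 1-2-33 Cr-V heater 1-.2-1-4",
    "020-11223344",  # phone number with a single hyphen
]

for line in LINES:
    print(SHORT_RUNS.findall(line))
# ['6-8-10-11-12-13-14-15-17-19']
# ['4-4.5-5-5.5-6-7-8-9-10-11-12-13']
# ['1-2-33', '1-.2-1-4']
# []
```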
Just try the following regex:
^(?:\d-\d{2}-\d-\d{2}-\d{2}-\d-\d{3}|\d-\d{2}-\d{2}-\d{2})$
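In a pattern of the form ^A|B$, the ^ applies only to A and the $ only to B, because alternation has the lowest precedence. The Python sketch below uses hypothetical names and the standard re module. It assumes each candidate is checked as a separate string against the two literal shapes from the question (N-NN-N-NN-NN-N-NNN and N-NN-NN-NN), and it contrasts an ungrouped version with a grouped one:

```python
import re

# Ungrouped: ^ binds only to the first branch, $ only to the second.
UNGROUPED = re.compile(r"^\d-\d{2}-\d-\d{2}-\d{2}-\d-\d{3}|\d-\d{2}-\d{2}-\d{2}$")
# Grouped: both anchors apply to every alternative, so only whole strings match.
GROUPED = re.compile(r"^(?:\d-\d{2}-\d-\d{2}-\d{2}-\d-\d{3}|\d-\d{2}-\d{2}-\d{2})$")

for s in ["1-23-4-56-78-9-012", "1-23-45-67", "x 1-23-45-67", "1-23-4-56-78-9-012 extra"]:
    print(s, bool(UNGROUPED.search(s)), bool(GROUPED.search(s)))
# 1-23-4-56-78-9-012 True True
# 1-23-45-67 True True
# x 1-23-45-67 True False
# 1-23-4-56-78-9-012 extra True False
```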