regex doesn't catch the group - regex

Could anyone tell me why group 1 below only catches "400" and not "123"?
Thanks!
pattern:
((\d+)\s*)+.*LC\s*$
text:
123 400 LC
Having "123 " match "((\d+)\s*)+" and the rest match ".*LC\s*$" seems like it would work too, so why doesn't the regex engine do this?

Your current approach doesn't capture each sequence of digits separately because a repeated capturing group keeps only the text of its last iteration. Besides, even without the repetition, the regex engine would not retry the match for the remaining numbers once the characters that follow have all been consumed by .*.
If your flavor supports lookaheads you are in luck:
(\d+)\s*(?=.*LC\s*$)
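To see both behaviours side by side, here is a minimal sketch using Python's re module (Python is an assumption here; the question does not name a flavor). It shows that the repeated group keeps only its last iteration, while the lookahead version returns every number as its own match.

```python
import re

text = "123 400 LC"

# Original pattern: the repeated group is overwritten on every iteration,
# so only the last iteration ("400") is left in the inner group.
m = re.search(r"((\d+)\s*)+.*LC\s*$", text)
last_only = m.group(2)

# Lookahead version: each number is a separate match; the lookahead only
# checks that "LC" still ends the text and consumes nothing.
all_numbers = re.findall(r"(\d+)\s*(?=.*LC\s*$)", text)

print(last_only)    # 400
print(all_numbers)  # ['123', '400']
```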

You are specifying that after the repeated group (one or more digits, then optional whitespace) there may be any characters, and then an "LC".
If you want to catch all of the numbers, you can just use (\d+).
You can test your patterns here: http://pythex.org/

/(\d+)/g
This regex will work if you only want to capture the numbers in the given text.
Edit: If you want to consider LC and space as well, then try this regex:
/(\d+[^LC\s*])/g

Related

Regex: match only first instance of a pattern

Using a regex for a string, we need to remove all text before the first instance of four digits in a row. We have a regex that "sort of" works:
^((?!\d{4}\w).)*
Given this string:
foo-bar-spring_06-2006_02_25.rm
the desired output is:
2006_02_25.rm
That works if there's only one instance of a four-digit pattern. The string:
batt-fall_01-2001-11-10_0200-0400.rm
produces this result:
0400.rm
It should produce:
2001-11-10_0200-0400.rm
Note: long story, but we cannot use a - or _ as a delimiter.
I feel like we're close. Does anyone have any suggestions?
Thanks!
You can use a positive lookahead pattern after a lazily repeated . instead:
^.*?(?=\d{4})
Demo: https://regex101.com/r/8DZDQp/1
Alternatively, you can group the 4 digits:
^.*?(\d{4})
and substitute the match with the first group $1.
Demo: https://regex101.com/r/8DZDQp/3
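As a rough illustration of the two patterns above, here is a sketch in Python (the language is an assumption; the question does not say which one is in use). Both variants strip everything before the first run of four digits.

```python
import re

names = [
    "foo-bar-spring_06-2006_02_25.rm",
    "batt-fall_01-2001-11-10_0200-0400.rm",
]

# Variant 1: lazy prefix up to, but not including, the first four digits.
with_lookahead = [re.sub(r"^.*?(?=\d{4})", "", n) for n in names]

# Variant 2: consume the four digits as well, then put them back
# through the group 1 reference (written \1 in Python).
with_group = [re.sub(r"^.*?(\d{4})", r"\1", n) for n in names]

print(with_lookahead)  # ['2006_02_25.rm', '2001-11-10_0200-0400.rm']
print(with_group)      # ['2006_02_25.rm', '2001-11-10_0200-0400.rm']
```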
A likely faster option is to ignore the unwanted leading part altogether and, without using lookarounds, match only the part you want with a simple expression such as:
(\d{4}.*\..+)$
or:
(\d{4}.*\.[a-z]+)$
The end anchor $ is also unnecessary; the expression still works without it.
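A small sketch of this "match what you want" approach, again assuming Python. Here the match itself is the result, so no substitution is needed.

```python
import re

name = "batt-fall_01-2001-11-10_0200-0400.rm"

# search() scans left to right, so the match starts at the first run of
# four digits; the greedy .* then extends to the last dot in the name.
m = re.search(r"\d{4}.*\..+", name)
wanted = m.group(0) if m else None

print(wanted)  # 2001-11-10_0200-0400.rm
```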

Capture number between two whitespaces (RegEx)

I have the following data:
SOMEDATA .test 01/45/12 2.50 THIS IS DATA
and I want to extract the number 2.50 out of this. I have managed to do this with the following RegEx:
(?<=\d{2}\/\d{2}\/\d{2} )\d+.\d+
However that doesn't work for input like this:
SOMEDATA .test 01/45/12 2500 THIS IS DATA
In this case, I want to extract the number 2500.
I can't seem to figure out a regex rule for that. Is there a way to extract something between two spaces? That is, extract the text/number after the date up to the next whitespace? All I know is that the date will always have the same format, and there will always be a space after the text and then a space after the number I want to extract.
Can someone help me out with this?
Capture number between two whitespaces
A whitespace is matched with \s, and non-whitespace with \S.
So, what you can use is:
\d{2}\/\d{2}\/\d{2} +(\S+)
                      ^^^
The 1+ non-whitespace symbols are captured into Group 1.
If - for some reason - you need to only get the value as a whole match, use your lookbehind approach:
(?<=\d{2}\/\d{2}\/\d{2} )\S+
Or - if you are using PCRE - you may leverage the match reset operator \K:
\d{2}\/\d{2}\/\d{2} +\K\S+
                     ^^
NOTE: the \K and capture-group approaches allow one or more spaces after the date and are thus more flexible.
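For illustration, the capture-group variant could be used like this in Python (an assumed language; the question does not name one). Python's built-in re module does not support \K, so the group is the natural choice there.

```python
import re

lines = [
    "SOMEDATA .test 01/45/12 2.50 THIS IS DATA",
    "SOMEDATA .test 01/45/12 2500 THIS IS DATA",
]

# Date, one or more spaces, then the next run of non-whitespace
# characters captured into group 1. "/" needs no escaping in Python.
pattern = re.compile(r"\d{2}/\d{2}/\d{2} +(\S+)")

values = [pattern.search(line).group(1) for line in lines]
print(values)  # ['2.50', '2500']
```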
I see some people helped you already, but if you want an alternative for some reason, here's one that works too :)
.+ \d+\/\d+\/\d+ (\d+[\.\d]*)
The .+ matches everything up to the space before the date,
then \d+\/\d+\/\d+ matches the date, followed by a space.
The capturing group is the number. As you can see, I made the last part optional, so both floating-point values and whole numbers can be matched. Hope this helps!
Proof: https://regex101.com/r/fY3nJ2/1
Just make the fractional part optional:
(?<=\d{2}\/\d{2}\/\d{2} )\d+(?:\.\d+)?
Demo: https://regex101.com/r/jH3pU7/1
Update following clarifications in comments:
To match anything (but space) surrounded by spaces and prepended by date use:
(?<=\d{2}\/\d{2}\/\d{2} )\S+
Demo: https://regex101.com/r/jH3pU7/3
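The difference between the two lookbehind patterns can be sketched as follows, assuming Python. The third sample line is hypothetical and only there to show a non-numeric token; it does not come from the question.

```python
import re

# Number with an optional fractional part, right after the date.
number_after_date = re.compile(r"(?<=\d{2}/\d{2}/\d{2} )\d+(?:\.\d+)?")
# Any run of non-whitespace right after the date.
token_after_date = re.compile(r"(?<=\d{2}/\d{2}/\d{2} )\S+")

samples = [
    "SOMEDATA .test 01/45/12 2.50 THIS IS DATA",
    "SOMEDATA .test 01/45/12 2500 THIS IS DATA",
    "SOMEDATA .test 01/45/12 ABC-7 THIS IS DATA",  # hypothetical input
]

def first_match(pattern, line):
    """Return the matched text, or None when the pattern does not match."""
    m = pattern.search(line)
    return m.group(0) if m else None

numbers = [first_match(number_after_date, s) for s in samples]
tokens = [first_match(token_after_date, s) for s in samples]

print(numbers)  # ['2.50', '2500', None]
print(tokens)   # ['2.50', '2500', 'ABC-7']
```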
Rather than capture, you can make your entire match be the target text by using a look behind:
(?<=\d\d(\/\d\d){2} )\S+
This matches the first series of non-whitespace that follows a "date like" part.
Note also the reduction in the length of the "date like" pattern. You may consider using this part of the regex in whatever solution you use.

Trying to figure out how to capture text between slashes regex

I have a regex
/([/<=][^/]*[/=?])$/g
I'm trying to capture text between the last slashes in a file path
/1/2/test/
but this regex matches "/test/" instead of just test. What am I doing wrong?
You need to use lookaround assertions.
(?<=\/)[^\/]*(?=\/[^\/]*$)
Or use the regex below and then grab the string you want from group 1.
\/([^\/]*)\/[^\/]*$
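Both patterns can be tried out with a short sketch like the one below, written in Python as an assumption (the question's /.../g syntax suggests another language). In Python the slash needs no escaping, so the patterns look slightly lighter.

```python
import re

path = "/1/2/test/"

# Lookarounds: the whole match is the segment itself.
by_lookaround = re.search(r"(?<=/)[^/]*(?=/[^/]*$)", path).group(0)

# Capture group: the overall match includes the slashes, group 1 does not.
by_group = re.search(r"/([^/]*)/[^/]*$", path).group(1)

print(by_lookaround)  # test
print(by_group)       # test
```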
The easy way
Match:
(1) every character that is not a "/"
Capture what was matched here. This is done by creating a capturing group, i.e. putting it inside parentheses.
(2) followed by "/" and then the end of string $
Code:
([^/]*)/$
Get the text in group(1)
Harder to read, only if you want to avoid groups
Match exactly the same as before, except now we're telling the regex engine not to consume characters when trying to match (2). This is done with a lookahead: (?= ).
Code:
[^/]*(?=/$)
Get what is returned by the match object.
The issue with your code is your opening and closing slashes are part of your capture group.
Demo
text: /1/2/test/
regex: /\/([^\/]*?)(?=\/)/g
captures a list of three: "1", "2", "test"
The language you're using affects the results. For instance, JavaScript might not have certain lookarounds, or may actually capture something in a non-capture group. However, the above should work as intended. In PHP, all / match characters must be escaped (according to regex101.com), which is why the cleaner [/] wasn't used.
If you're only after the last match (i.e., test), you don't need the positive lookahead:
/\/([^\/]*?)\/$/
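A quick way to check both claims is a sketch like this one in Python (an assumed language, chosen only for illustration). findall plays the role of the g flag here.

```python
import re

path = "/1/2/test/"

# Every segment that is followed by a slash. The lookahead does not
# consume that slash, so it can open the next match.
segments = re.findall(r"/([^/]*?)(?=/)", path)

# Only the last segment: anchor on the trailing slash instead.
last = re.search(r"/([^/]*?)/$", path).group(1)

print(segments)  # ['1', '2', 'test']
print(last)      # test
```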

Regex Greediness

I have a Perl regex that I'm fairly certain should work, but it is being too greedy:
regex:
(?:.*serial[^\d]+?(\d+).*)
Test string:
APPLICATIONSERIALNO123456Plnsn123456te20140728tdrnserialnun12hou
Desired group 1 match:
123456
Actual group 1 Match:
12
I've tried every permutation of lookahead and behind and laziness and I can't get the damn thing to work.
WHAT AM I MISSING.
Thanks!
The Problem is Not Greediness, but Case-Sensitivity
Currently your regex matches the 12 at the end of serialnun12, probably because it is case-sensitive. We have two options: using upper-case, or making the pattern case-insensitive.
Option 1: Use Upper-Case
If you only want 123456, you can use:
SERIALNO\K\d+
The \K tells the engine to drop what was matched so far from the final match it returns.
If you want to match the whole string and capture 123456 to Group 1, use:
.*?SERIAL\D+(\d+).*
Option 2: Turning Case-Insensitivity On Using (?i) Inline or the i Flag
To only match 123456, you can use:
(?i)serial\D+\K\d+
Note that if you use the g flag, this would match both numbers.
If you want to match the whole string and capture 123456 to Group 1, use:
(?i).*?serial\D+(\d+).*
A few tips
You can turn on case-insensitivity either with the (?i) inline modifier or with the i flag at the end of the pattern: /serial\D+\K\d+/i
Instead of [^\d], use \D
There is no need for a lazy quantifier in something like \D+\d+ because the two tokens are mutually exclusive: there is no danger that the \D will run over the \d
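The effect of case-sensitivity on this test string can be reproduced with a short sketch. It is written in Python rather than Perl, which is an assumption made purely for illustration. Python's built-in re module has no \K, so the capture-group forms are used.

```python
import re

s = "APPLICATIONSERIALNO123456Plnsn123456te20140728tdrnserialnun12hou"

# Case-sensitive, lower-case pattern: only the trailing "serialnun12" fits.
case_sensitive = re.search(r"serial\D+(\d+)", s).group(1)

# Option 1: spell the pattern in upper case.
upper = re.search(r"SERIAL\D+(\d+)", s).group(1)

# Option 2: ignore case, via a flag or the inline (?i) modifier.
with_flag = re.search(r"serial\D+(\d+)", s, re.IGNORECASE).group(1)
with_inline = re.search(r"(?i)serial\D+(\d+)", s).group(1)

print(case_sensitive)                # 12
print(upper, with_flag, with_inline)  # 123456 123456 123456
```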
The problem is not greediness; it's case-sensitivity.
Currently your regex matches the 12 at the end of serialnun12 because those are the only digits following serial. The ones you want follow SERIAL. S and s are different characters.
There are two solutions.
Use the uppercase characters in the pattern.
my ($serial) = $string =~ /SERIAL\D*(\d+)/;
Use case-insensitive matching.
my ($serial) = $string =~ /serial\D*(\d+)/i;
There's probably no need for this, but I thought I'd mention it just in case.

Regex to match N-NN-NN

I need some help with a RegEx pattern match.
How do I write a regex if I want it to match
N-NN-N-NN-NN-N-NNN
but also
N-NN-NN-NN
Example:
10pcs- ratchet spanner combination wrench 6-8-10-11-12-13-14-15-17-19
Cr-v,heated 12pcs-1/4dr 4-4.5-5-5.5-6-7-8-9-10-11-12-13 Cr-v,heated
17pcs-1/2dr 10-11-12-13-14-15-16-17-18-19-20-21-22-23-24-27-30
Cr-v,heated 1-2-33 Cr-V heater 1-.2-1-4
It needs to match only where there are at least two - characters in the string, so a phone number like 020-11223344 must not be matched.
The strings almost always look like 6-8-10-11-12-13-14-15-17-19, except that sometimes a . can appear before a number. They also differ in length. Is this possible?
I came up with this so far, but it also matches phone numbers, and when a . appears it doesn't match at all.
(\d-[^>])
On this page you can find the different patterns: http://www.cazoom.nl/en/partij-aanbod/186-pcs-working-tools-trolly-3
What about this pattern:
[\d.]+(?:-[\d.]+){2,}
This matches [\d.]+ only if it is followed by at least two repetitions of -[\d.]+.
The (?: ... ) is a non-capturing group, used here for the repetition.
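Here is a small sketch of that pattern run against the question's sample text, using Python as an assumed language. The phone number is rejected because it contains only one hyphen.

```python
import re

# A number (digits and dots), then at least two "-number" repetitions.
pattern = re.compile(r"[\d.]+(?:-[\d.]+){2,}")

samples = [
    "10pcs- ratchet spanner combination wrench 6-8-10-11-12-13-14-15-17-19",
    "12pcs-1/4dr 4-4.5-5-5.5-6-7-8-9-10-11-12-13 Cr-v,heated",
    "Cr-v,heated 1-2-33 Cr-V heater 1-.2-1-4",
    "020-11223344",
]

# The pattern has no capturing groups, so findall returns whole matches.
found = [pattern.findall(s) for s in samples]

for sample, matches in zip(samples, found):
    print(matches)
```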
The following regex will match the thing.
(?:\.?\d\.?\d?-){2,}\.?\d\.?\d?
Just try the following regex:
^\d-\d{2}-\d(\d-\d{2})|(\d-\d{2}-\d-\d{3})$