Regex to match a specific number in a string - regex

I'm trying to fix a regex I created.
I have a URL like this:
http://www.demo.it/prodotti/822/Panasonic-TXP46G20E.html
and I have to match the product ID (822).
I wrote this regex:
(?<=prodotti\/).*(?<=\/)
and the result is "822/".
The value I want to match is always a group of digits between two slashes.

You're almost there!
Simply use:
(?<=prodotti\/).*?(?=\/)
instead of:
(?<=prodotti\/).*(?<=\/)
And you're good ;)
See it working here on regex101.
I've actually changed just two things:
Replaced your lookbehind (?<=\/) with the corresponding lookahead (?=\/), so it asserts that a / comes right AFTER the last character consumed by .*?.
Made the quantifier lazy, by using .*? instead of .*. Without that change, for a URL that has several / after prodotti/, the match would not stop at the first one.
For example, given the input string http://www.demo.it/prodotti/822/Panasonic/TXP46G20E.html, it would have matched 822/Panasonic.
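For anyone who wants to try this outside regex101, here is a minimal Python sketch of both patterns. The question does not name a language, so Python's re module and the helper name product_id are assumptions made purely for illustration.

```python
import re

# The lookbehind pins the start right after "prodotti/"; the lazy .*? stops
# at the first position where the lookahead sees a "/".
LAZY = re.compile(r"(?<=prodotti/).*?(?=/)")
# Greedy variant of the same pattern, kept only for comparison.
GREEDY = re.compile(r"(?<=prodotti/).*(?=/)")

def product_id(url, pattern=LAZY):
    """Return the text matched after 'prodotti/', or None if nothing matches."""
    m = pattern.search(url)
    return m.group(0) if m else None

one_slash = "http://www.demo.it/prodotti/822/Panasonic-TXP46G20E.html"
two_slashes = "http://www.demo.it/prodotti/822/Panasonic/TXP46G20E.html"

print(product_id(one_slash))            # 822
print(product_id(two_slashes))          # 822
print(product_id(two_slashes, GREEDY))  # 822/Panasonic
```

If the ID is known to be numeric, \d+ in place of .*? would be a stricter choice, but that goes beyond what the answer proposes.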

Related

Regex formation and Issue in Negation

I need to create two regexes.
One, for catching these types of strings:
/xyz-courses/test/test
/abc-courses/test-abc/test-xyz
/abc-courses/test-abc/test-xyz?itsok=yes
But I don't want to match strings where the word fixed comes right before -courses:
/fixed-courses/test/test
/fixed-courses/test-abc/test-xyz
/fixed-courses/test-abc/test-xyz?itsok=yes
I have created the following regex, which works fine for the first set, but I am not sure how to exclude the case where the word before -courses is fixed:
/([^/]+)-courses/([^/]+)/([^/]+)$
Second, I need to create a regex that negates the one created in the previous step, i.e. matches everything the first one does not.
I tried:
[^/([^/]+)-courses/([^/]+)/([^/]+)]$
But this is showing invalid on all REGEX checkers.
You may use this regex to disallow fixed- before courses:
^/((?!fixed-)[^/-]+)-courses/([^/]+)/([^/]+)$
RegEx Demo
(?!fixed-) is a negative lookahead that will fail the match if fixed- appears right after / and before courses/.
For the second part, use this to negate the first regex:
^/(?!((?!fixed-)[^/-]+)-courses/([^/]+)/([^/]+)$).+
RegEx Demo 2
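As a quick check of both patterns, here is a small Python sketch run against the sample paths from the question. Python and the names ALLOWED and NEGATED are assumptions for illustration; the answer itself only links to online demos.

```python
import re

# First pattern: "<word>-courses/<segment>/<segment>", where <word> is not "fixed".
ALLOWED = re.compile(r"^/((?!fixed-)[^/-]+)-courses/([^/]+)/([^/]+)$")
# Second pattern: the first one wrapped in a negative lookahead, so it
# matches exactly the paths the first pattern rejects.
NEGATED = re.compile(r"^/(?!((?!fixed-)[^/-]+)-courses/([^/]+)/([^/]+)$).+")

paths = [
    "/xyz-courses/test/test",
    "/abc-courses/test-abc/test-xyz",
    "/abc-courses/test-abc/test-xyz?itsok=yes",
    "/fixed-courses/test/test",
    "/fixed-courses/test-abc/test-xyz",
    "/fixed-courses/test-abc/test-xyz?itsok=yes",
]

for path in paths:
    print(path, bool(ALLOWED.match(path)), bool(NEGATED.match(path)))
```

For each of these paths exactly one of the two patterns matches: the first three satisfy ALLOWED, the last three satisfy NEGATED.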

Select Northings from a 1 Line String

I have the following string:
Start: 738392E, 6726376N
I extracted 738392 OK using (?<=.art\:\s)([0-9A-Z]*). This gave me a one-group match, allowing me to extract it as a column value.
I want to extract 6726376 the same way, with only one group appearing, because I am parsing that into a column value.
I'm not sure why (?=(art\:\s\s*))(?=[,])*(.*[0-9]*) is giving me the entire line after S.
Helping me get it right, with an explanation, will go a long way.
Because you used positive lookaheads. Those just make some assertions, but don't "move the head along".
(?=(art\:\s\s*)) makes sure you're before "art: ...". The next thing is another positive lookahead that you quantify with a star to make it optional. Finally you match anything, so you get the rest of the line in your capture group.
I propose a simpler regex:
(?<=(art\:\s))(\d+)\D+(\d+)
Demo
First we use a positive lookbehind that makes sure we're after "art: ", then we match two numbers, separated by non-digits.
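The difference between asserting and consuming can be seen in a short Python sketch. It uses simplified variants of the patterns above (no capturing group inside the lookaround, and no quantified lookahead), so it illustrates the idea rather than reproducing the original expressions exactly.

```python
import re

line = "Start: 738392E, 6726376N"

# A lookahead only asserts: the match position stays in front of "art: ",
# so (.*) still captures "art: ..." and everything after it.
ahead = re.search(r"(?=art:\s)(.*)", line)
print(ahead.group(1))   # art: 738392E, 6726376N

# A lookbehind asserts that "art: " is already behind us, so the two
# capturing groups start directly at the numbers.
behind = re.search(r"(?<=art:\s)(\d+)\D+(\d+)", line)
print(behind.groups())  # ('738392', '6726376')
```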
There is no need for you to make it this complicated. Just use something like
Start: (\d+)E, (\d+)N
or
\b\d+(?=[EN]\b)
if you need to match each bit separately.
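Both suggestions can be tried with a few lines of Python. The variable names easting and northing are assumptions based on the E and N suffixes in the sample line.

```python
import re

line = "Start: 738392E, 6726376N"

# Variant 1: one pattern, two capturing groups.
m = re.search(r"Start: (\d+)E, (\d+)N", line)
easting, northing = m.groups()
print(easting, northing)  # 738392 6726376

# Variant 2: each number is its own match; the lookahead requires a trailing
# E or N at a word boundary but leaves it out of the match.
numbers = re.findall(r"\b\d+(?=[EN]\b)", line)
print(numbers)            # ['738392', '6726376']
```

With the second variant the northing is simply the second match, which may be easier to map onto a single column than a multi-group pattern.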
Your expression (?=(art\:\s\s*))(?=[,])*(.*[0-9]*) has several problems besides the ones already mentioned: 1) your first and second lookahead match at different locations, 2) your second lookahead is quantified, which, in 25 years, I have never seen someone do, so kudos. ;), 3) your capturing group matches about anything, including any line or the empty string.
You match the whole part after it because you use .* which will match until the end of the line.
Note that the [0-9]* at the end of the pattern never matches any digits: it is optional, and the preceding .* has already consumed everything up to the end of the line.
You could get the match without any lookarounds:
(art:\s)(\d+)[^,]+,\s(\d+)
Regex demo
If you want the matches only, you could make use of the PyPI regex module, which supports variable-length lookbehind:
(?<=\bStart:(?:\s+\d+[A-Z],)* )\d+(?=[A-Z])
Regex demo (For example only, using a different engine) | Python demo

Regex about url encoded string

I would like to write a regex to get the URL-encoded string in the line below:
<topicref href="%E4%BA%B0.txt"/>
When I used a regex like (%[A-Z][0-9])+\.txt, it only got %B0.txt. What can I do if I want to get the whole URL-encoded string, such as %E4%BA%B0.txt?
Thanks a lot.
Proper URL encoding uses hex digits only, A-F not A-Z. The encoded URL could contain non-encoded characters anywhere. Also, you should escape the full stop.
((%[0-9A-F]{2}|[^<>'" %])+)\.txt
is a quick ad-hoc fix for your regex, though obviously for any production code, probably don't use a regex for this at all, or at the very least try a well-defined and properly tested URL regex like the one you can find in the HTTP RFC.
Putting the + quantifier outside the capturing parentheses will only return the last repetition. I added a second set of parentheses to put the quantifier inside the first capture group, which assumes you are doing something to extract the first capture group in particular. (If your regex dialect has non-capturing groups, you could make the second opening parenthesis non-capturing, i.e. write (?: instead of a plain opening parenthesis.)
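The effect of the quantifier's position can be sketched in Python. The choice of Python is an assumption, and the inner group is written as non-capturing, which is the variation suggested in the parenthetical above.

```python
import re

line = '<topicref href="%E4%BA%B0.txt"/>'

# Quantifier outside the capturing group: the group is overwritten on each
# repetition, so only the last escape survives in group 1.
outside = re.search(r"(%[0-9A-F]{2})+\.txt", line)
print(outside.group(1))  # %B0

# Quantifier inside the capturing group: group 1 holds the whole run.
inside = re.search(r"((?:%[0-9A-F]{2}|[^<>'\" %])+)\.txt", line)
print(inside.group(1))   # %E4%BA%B0
```

In both cases group(0) is the full %E4%BA%B0.txt; the difference only shows up when a specific capture group is extracted.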
You need to change your regex to
([%\dA-Z]+)\.txt
([%\dA-Z]+) - Match %, digits and uppercase letters one or more times
\.txt - Match .txt
whereas your regex means:
(%[A-Z][0-9])+.txt
(%[A-Z][0-9])+
% - Match %
[A-Z] - Match A to Z one time
[0-9] - Match any digit one time
+ - Match the captured group one or more times
.txt - Match any single character (anything except a newline) followed by txt
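The breakdown above explains why the original pattern stops at %B0.txt: %BA is a letter followed by a letter, which [A-Z][0-9] cannot match. A small Python check, with Python itself being an assumption for illustration:

```python
import re

line = '<topicref href="%E4%BA%B0.txt"/>'

# Original pattern: each escape must be "%", one letter, one digit.
# "%BA" breaks that rule, so only the final "%B0" can lead into ".txt".
original = re.search(r"(%[A-Z][0-9])+\.txt", line)
print(original.group(0))  # %B0.txt

# Loosened pattern: any run of "%", digits and uppercase letters.
loosened = re.search(r"([%\dA-Z]+)\.txt", line)
print(loosened.group(0))  # %E4%BA%B0.txt
```

The loosened class also accepts runs that are not valid percent-escapes (for example ABC.txt), so it is a quick fix rather than a validator.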

Regex matching Cisco interface

I am trying to match Cisco interface names and split them up. The regex I have so far is:
(\D+)(\d+)(?:\/)?(\d+)?(?:\.)?(\d+)?
This matches:
FastEthernet9
FastEthernet9/5
FastEthernet9/5.10
The problem I have is that it also matches:
FastEthernet9.10
Any ideas on how to make it not match that? Bonus points if it can also match:
tengigabitethernet0/0/0.20
Edit:
Okay. I am trying to split this string up into groups for use in Python. In the Cisco world, the first part of the string (FastEthernet) is the type of interface, the first zero is the slot in the equipment, the zero after the slash is the port number, and the number after the dot is a sub-interface.
Because of how regex works, I can't get a repeated group like (?:\/?\d+)+ to capture each number in /0/0/0 separately; I only get the last match.
My current regex (\D+)(\d+)(?:((?:\/?\d+)+)?(?:(?:\.)?(\d+))?) builds on murgatroid99's but groups all of /0/0/0 together, so it can be split in Python.
My current result in Python with this regex is [('tengigabitethernet', '0', '/0/0', '10')]. This seems to be as close as I can get.
The regular expression for matching these names (Removing unnecessary capturing groups for clarity) is:
\D+\d+((/\d+)+(\.\d+)?)?
To break it up, \D+ matches the part of the string before the first number (such as FastEthernet) and \d+ matches the first number (such as 10). Then the rest of the pattern is optional. /\d+ matches a forward slash followed by a number, so (/\d+)+ matches one or more repetitions of that (such as /0/0). Finally, (\.\d+)? optionally matches the period followed by a number at the end.
The important difference that makes this pattern match your specification is that, in the final optional group, at least one (/\d+) is required before the (\.\d+).
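Since the question mentions Python, here is a hedged sketch of how the pattern could be used there. fullmatch is used so that a prefix match such as FastEthernet9 inside FastEthernet9.10 does not count. The second pattern, its group names (kind, slot, ports, sub) and the helper split_interface are hypothetical additions for illustration, not part of the answer.

```python
import re

# Validation pattern from the answer, applied to the whole string.
IFACE = re.compile(r"\D+\d+((/\d+)+(\.\d+)?)?")

# Hypothetical named-group variant for splitting the name into parts.
IFACE_PARTS = re.compile(
    r"(?P<kind>\D+)(?P<slot>\d+)"
    r"(?:(?P<ports>(?:/\d+)+)(?:\.(?P<sub>\d+))?)?"
)

def split_interface(name):
    """Split an interface name into its parts, or return None if invalid."""
    m = IFACE_PARTS.fullmatch(name)
    if m is None:
        return None
    ports = m.group("ports")
    return {
        "kind": m.group("kind"),
        "slot": m.group("slot"),
        # A repeated group keeps only its last repetition, so the whole
        # "/0/0" run is captured once and split in plain Python.
        "ports": ports.strip("/").split("/") if ports else [],
        "sub": m.group("sub"),
    }

for name in ["FastEthernet9", "FastEthernet9/5", "FastEthernet9/5.10",
             "FastEthernet9.10", "tengigabitethernet0/0/0.20"]:
    print(name, bool(IFACE.fullmatch(name)), split_interface(name))
```

Capturing the slash-separated run as one group and splitting it afterwards sidesteps the "only the last repetition is kept" limitation described in the question.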

Regex to match N-NN-NN

I need some help with a regex pattern match.
How do I write a regex if I want it to match
N-NN-N-NN-NN-N-NNN
but also
N-NN-NN-NN
Example:
10pcs- ratchet spanner combination wrench 6-8-10-11-12-13-14-15-17-19
Cr-v,heated 12pcs-1/4dr 4-4.5-5-5.5-6-7-8-9-10-11-12-13 Cr-v,heated
17pcs-1/2dr 10-11-12-13-14-15-16-17-18-19-20-21-22-23-24-27-30
Cr-v,heated 1-2-33 Cr-V heater 1-.2-1-4
It needs to match where there are at least 2 - characters in the number sequence, so a phone number like 020-11223344 should not be matched.
The strings almost always look like this: 6-8-10-11-12-13-14-15-17-19, except that sometimes a . can appear before a number. They also differ in length. Is it possible?
I came up with this so far, but it also matches phone numbers, and when a . appears it doesn't match at all.
(\d-[^>])
On this page you can find the different patterns: http://www.cazoom.nl/en/partij-aanbod/186-pcs-working-tools-trolly-3
What about this pattern:
[\d.]+(?:-[\d.]+){2,}
Match [\d.]+ if it is followed by at least two repetitions of -[\d.]+
(?: opens a non-capturing group, used here for the repetition.
test at regex101
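A minimal Python sketch of this pattern, run against lines taken from the question. Python and the names SIZES and find_size_runs are assumptions for illustration.

```python
import re

# A run of digits/dots, followed by at least two more "-<digits/dots>" parts.
SIZES = re.compile(r"[\d.]+(?:-[\d.]+){2,}")

def find_size_runs(text):
    """Return every dash-separated run that contains at least two dashes."""
    return SIZES.findall(text)

samples = [
    "10pcs- ratchet spanner combination wrench 6-8-10-11-12-13-14-15-17-19",
    "Cr-v,heated 12pcs-1/4dr 4-4.5-5-5.5-6-7-8-9-10-11-12-13 Cr-v,heated",
    "Cr-v,heated 1-2-33 Cr-V heater 1-.2-1-4",
    "020-11223344",
]

for text in samples:
    print(find_size_runs(text))
```

The phone number yields an empty list because it contains only one dash, while the line with 1-2-33 and 1-.2-1-4 yields two separate matches.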
The following regex will match these sequences:
(?:\.?\d\.?\d?-){2,}\.?\d\.?\d?
Debuggex Demo
Just try the following regex:
^\d-\d{2}-\d(\d-\d{2})|(\d-\d{2}-\d-\d{3})$