How to get only the digits from a URL using a regex? - regex

I need some help with a regex, as it is very new to me.
I have a URL that contains an item number or product ID.
What I want to achieve is to trim off the leading part of the URL and the extra part that starts at the % symbol.
Here is what the URL looks like.
https://www.test.com/test-test/test/test-demo-demo-demo-demo.html?piid=12345678%2C24753325#seemoreoptions-b0uksl51j4m
OR
https://www.test.com/test-test/test/test-demo-demo-demo-demo.html?piid=12345678
So from the above URL I want to trim https://www.test.com/test-test/test/test-demo-demo-demo-demo.html?piid= and this part: %2C24753325#seemoreoptions-b0uksl51j4m
So, this should give me only 12345678.
I have used the following regex:
(.*)(\=) Replace with $2
This regex does trim the first part of the URL, but it does not trim the part after the % symbol.
I tried to find a solution on
https://regexr.com/
So for both of the above URL examples, I should get the result
12345678
Thank you in advance

Instead of trimming the parts before and after the digits you want, try another approach: extract the digits directly.
You can use groups (parentheses) in a regex to capture the matched data.
piid=([0-9]+)
It means:
piid= - literal text to find
[0-9]+ - one or more digits
() - capture group
You can refer to the first group as $1 (or \1, etc., depending on the language you use).
Example: https://regexr.com/758d9
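As a rough illustration of the capture-group approach, here is a minimal Python sketch. The helper name extract_piid is hypothetical, and the sketch assumes the ID always directly follows piid= in the URL.

```python
import re

# Capture the run of digits that directly follows "piid=".
PIID_RE = re.compile(r"piid=([0-9]+)")

def extract_piid(url):
    """Return the digits after 'piid=', or None if absent (hypothetical helper)."""
    match = PIID_RE.search(url)
    return match.group(1) if match else None

urls = [
    "https://www.test.com/test-test/test/test-demo-demo-demo-demo.html?piid=12345678%2C24753325#seemoreoptions-b0uksl51j4m",
    "https://www.test.com/test-test/test/test-demo-demo-demo-demo.html?piid=12345678",
]
for url in urls:
    print(extract_piid(url))
```

Because [0-9]+ stops at the first non-digit, the %2C... suffix never needs to be trimmed separately.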

Related

Regex - Skip characters to match

I'm having an issue with Regex.
I'm trying to match T0000001 (2, 3 and so on).
However, some of the lines it searches have what I can only describe as positioners. These appear as a question mark followed by 2 digits, such as ?21.
These positioners describe a new position if the document were to be printed off the website.
Example:
T123?214567
T?211234567
I need to disregard ?21 and match T1234567.
From what I can see, this is not possible.
I have looked everywhere and tried numerous attempts.
All we have to work from is the linked image. The creators can't even confirm which flavour of regex it is. They believe it's Python, but I'm unsure.
Regex Image
Update
Unfortunately, none of the patterns below have worked so far. I tested each one live (rather than only in a regex tester, thinking it might behave differently), but they still didn't work.
There is no replace feature, and as mentioned before, I'm not sure whether it is Python. I appreciate your help.
Do two regex operations
First do the regex replace to replace the positioners with an empty string.
(\?[0-9]{2})
Then do the regex match
T[0-9]{7}
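The two operations above could be sketched in Python as follows. This assumes a replace step is available and that the engine is Python, neither of which the question confirms; find_codes is a hypothetical helper name.

```python
import re

# Step 1 pattern: a positioner is a question mark followed by two digits.
POSITIONER_RE = re.compile(r"\?[0-9]{2}")
# Step 2 pattern: T followed by exactly seven digits.
TARGET_RE = re.compile(r"T[0-9]{7}")

def find_codes(text):
    """Strip positioners, then return every T-code found (hypothetical helper)."""
    cleaned = POSITIONER_RE.sub("", text)
    return TARGET_RE.findall(cleaned)

lines = ["T123?214567", "T?211234567", "T1234567"]
results = [find_codes(line) for line in lines]
print(results)
```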
If there's only one occurrence of the 'positioners' in each match, something like this should work: (T.*?)\?\d{2}(.*)
This can be tested here: https://regex101.com/r/XhQXkh/2
Basically, match two capture groups before and after the '?21' sequence. You'll need to concatenate these two matches.
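A minimal sketch of the concatenation step, assuming Python and at most one positioner per line; join_around_positioner is a hypothetical name used only for illustration.

```python
import re

# Group 1: everything from T up to the positioner; group 2: everything after it.
SPLIT_RE = re.compile(r"(T.*?)\?\d{2}(.*)")

def join_around_positioner(line):
    """Concatenate the text before and after the positioner (hypothetical helper)."""
    match = SPLIT_RE.search(line)
    if match is None:
        # No positioner found, so the line is returned unchanged.
        return line
    return match.group(1) + match.group(2)

joined = [join_around_positioner(s) for s in ["T123?214567", "T?211234567", "T1234567"]]
print(joined)
```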
First, match the ?21 and replace it with a distinctive character, such as #:
\?21
Demo
Then you can try this regex to find what you want:
(T(?:\d{7}|[\#\d]{8}))\s
Demo, in which the target string is captured in group 1 (or \1).
Finally, replace # with ?21 or whatever you like.
A Python script may look like this:
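import re
ss = """T123?214567
T?211234567
T1234567
T1234434?21
T5435433"""
# Step 1: swap every ?21 positioner for a single placeholder character.
rexpre = re.compile(r'\?21')
# Step 2: match T plus 7 digits, or T plus 8 digits/placeholders, followed by whitespace.
regx = re.compile(r'(T(?:\d{7}|[\#\d]{8}))\s')
for m in regx.findall(rexpre.sub('#', ss)):
    print(m)
print()
# Restore the original positioner text in each match.
for m in regx.findall(rexpre.sub('#', ss)):
    print(re.sub('#', r'?21', m))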
Output is
T123#4567
T#1234567
T1234567
T1234434#
T123?214567
T?211234567
T1234567
T1234434?21
If using a replace functionality is an option for you then this might be an approach to match T0000001 or T123?214567:
Capture a T followed by zero or more digits before the optional part in group 1 (T\d*)
Make the question mark followed by 2 digits part optional (?:\?\d{2})?
Capture one or more digits after in group 2 (\d+).
Then, in the replacement, use group 1 followed by group 2: \1\2.
Using word boundaries \b (or assertions for the start and the end of the line, ^ and $), this could look like:
\b(T\d*)(?:\?\d{2})?(\d+)\b
Example Python
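A short sketch of that replacement using Python's re.sub, assuming the engine is Python and that a replace step is available. The sample text is made up from the examples in the question.

```python
import re

# Group 1: T plus any leading digits; optional positioner; group 2: trailing digits.
CODE_RE = re.compile(r"\b(T\d*)(?:\?\d{2})?(\d+)\b")

text = "T123?214567\nT?211234567\nT0000001"
# Keep only the two captured groups, which drops the positioner if present.
cleaned = CODE_RE.sub(r"\1\2", text)
print(cleaned)
```

Codes without a positioner still match, because \d* backtracks by one digit so that \d+ has something to capture, and the replacement then reproduces them unchanged.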
Is this what you want?
Use RegExReplace with the multiline flag (m) and enable replacing all occurrences.
Pattern = (T\d*)\?\d{2}(\d*)
replace = $1$2
Usage Example:

Regex: Negative lookahead after list match

Consider the following input string (part of css file):
url('...');
url(example.png);
The objective is to take the url part using regex and do something with it. So the first part is easy:
url\(['"]?(.+?)['"]?\)
Basically, it takes contents from inside url(...) with optional quotes symbols. Using this regexp I get the following matches:
...
example.png
So far so good. Now I want to exclude the urls which include 'data:image' in their text. I think negative lookahead is the proper tool for that but using it like this:
url\(['"]?(?!data:image)(.+?)['"]?\)
gives me the following result for the first url:
'...
Not only does it fail to exclude this match, but the matched string itself now includes the quote character at the beginning. If I use + instead of the first ? like this:
url\(['"]+(?!data:image)(.+?)['"]?\)
it works as expected, url is not matched. But this doesn't allow the optional quote in url (since + is 1 or more). How should I change the regex to exclude given url?
You can use negative lookahead like this:
url\((['"]?)((?:(?!data:image).)+?)\1?\)
RegEx Demo
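To see how this pattern behaves, here is a small Python sketch. The CSS snippet is invented for illustration, and the data URI stands in for the elided first URL from the question. Group 1 holds the optional opening quote and group 2 holds the URL itself.

```python
import re

# Group 1: optional opening quote. Group 2: URL characters, each one checked
# so that "data:image" never starts at that position. \1? closes the quote.
URL_RE = re.compile(r"""url\((['"]?)((?:(?!data:image).)+?)\1?\)""")

# Hypothetical CSS input used only for this sketch.
css = """
a { background: url('data:image/png;base64,AAAA'); }
b { background: url(example.png); }
c { background: url("img/logo.png"); }
"""

found = [m.group(2) for m in URL_RE.finditer(css)]
print(found)
```

Capturing the quote in its own group means the engine cannot fall back to treating the quote as the first character of the URL and still reach the closing parenthesis, which is what happened with the original lookahead attempt.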

Regex validation to disallow submission of numbers which start with a given sequence

We have an issue with spammers, and I'd like to add a validation regex to the phone field in my form so that it rejects input which starts with a particular sequence of numbers.
I am using a WordPress plugin to build the form, and I can add custom regex validation to each field.
At the moment I am using a text field for the phone number, with this regex to allow only numbers: /^\d+$/
The prefixes I'd like to block are these:
+44704, +44714, 0704, 0714, 0044704, 0044714
Is it possible to create a regex which checks whether the input starts with one of these sequences and, if so, blocks it?
If possible, I need it to keep allowing only numbers, and additionally to allow the input only if it does not start with one of those sequences.
I hope someone will be able to help me, as I really don't understand regex at all.. :(
Thank You!
You can make use of optional groups, like this:
^\+?(?:(?:00)?44|0)7[01]4
regex101 demo
This regex matches only strings that begin with the patterns you described. To negate it, you could use a negative lookahead with the pattern above:
^(?!\+?(?:(?:00)?44|0)7[01]4)
^ matches the beginning of the line
\+? matches an optional + sign.
(?:(?:00)?44|0) matches either of: 0044, or 44, or 0
7[01]4 matches either 704 or 714.
To validate the whole input string while rejecting those prefixes, add the part you already had, with an optional + sign:
/^(?!\+?((00)?44|0)7[01]4)\+?\d+$/
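The final pattern can be tried out with a short Python sketch. This assumes Python's re module rather than the WordPress plugin's engine, so the surrounding slashes are dropped; is_allowed and the sample numbers are hypothetical.

```python
import re

# Negative lookahead rejects the blocked prefixes; the rest allows an
# optional leading + followed by digits only.
PHONE_RE = re.compile(r"^(?!\+?((00)?44|0)7[01]4)\+?\d+$")

def is_allowed(number):
    """Return True if the number passes validation (hypothetical helper)."""
    return PHONE_RE.search(number) is not None

samples = ["+447041234567", "07141234567", "0044704123456",
           "07511234567", "+441234567890", "12ab34"]
for sample in samples:
    print(sample, is_allowed(sample))
```

The first three samples start with a blocked prefix and the last contains letters, so only 07511234567 and +441234567890 are accepted.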

Regex substring

I'm trying to select a substring using regex and I'm going round in circles. I need to select everything before the first "_".
Example URL - GI_2013_JUNE_10_VOL3_LASTCHANCE
So the result I'm looking for from the URL above would be "GI". The text before the first "_" can vary in length.
Any help would be much appreciated
The regex would be:
^[^_]+
and grab the whole regex match. But as a comment says, using a substring function is more efficient!
^[^_]*
...is the expression you're looking for.
It basically says: Select everything that is not an underscore, starting at the beginning of the string.
http://regexr.com?356in
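For comparison, here is a minimal Python sketch of both the regex and the plain substring approach mentioned above. The choice of Python and of str.partition is an assumption, since the question does not name a language.

```python
import re

value = "GI_2013_JUNE_10_VOL3_LASTCHANCE"

# Regex approach: one or more non-underscore characters from the start.
match = re.match(r"^[^_]+", value)
prefix_regex = match.group(0) if match else ""

# Substring approach: split once at the first underscore, no regex needed.
prefix_plain = value.partition("_")[0]

print(prefix_regex, prefix_plain)
```

The two patterns differ only when the string starts with an underscore: ^[^_]+ finds no match, while ^[^_]* matches an empty string.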

Correct regex / mod-rewrite syntax for this url

Hi I am having a little difficulty working out this mod rewrite rule / regex correctly.
I have a url format like this:
www.site.com/some-page-title-here-cb384
www.site.com/another-page-title-here-cb385
And I'd like to find only the numbers after each 'cb' only if the url contains a 'cb' after the last hyphen in each string.
I have:
.*?([0-9]+)$
Which matches the last set of numbers but I need to be more specific in saying only if the last section of the url contains the pattern '-cb'.
Try this:
.*-cb(\d+)$
This one should work and you should find the numbers in $1.
Let me be more specific. Your regexp (and mine above) doesn't match only the last part of the string, but matches the whole string. If you want to match only the last part, you should write it without .*:
-cb(\d+)$
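The pattern itself can be checked outside mod_rewrite with a small Python sketch. This only illustrates the regex, not the rewrite rule syntax; extract_cb_id is a hypothetical helper and the third URL is an invented counterexample.

```python
import re

# Match "-cb" followed by digits at the very end of the string.
CB_RE = re.compile(r"-cb(\d+)$")

def extract_cb_id(url):
    """Return the digits after the trailing '-cb', or None (hypothetical helper)."""
    match = CB_RE.search(url)
    return match.group(1) if match else None

urls = [
    "www.site.com/some-page-title-here-cb384",
    "www.site.com/another-page-title-here-cb385",
    "www.site.com/some-page-title-here-384",
]
ids = [extract_cb_id(u) for u in urls]
print(ids)
```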