I'm looking to match strings like WS-810-REFERENCE-1, where the string must contain hyphens,
and I can't think of a pattern that works perfectly.
[a-zA-Z0-9\-]+
That will match but will also match words that do not have the - character
Thought of maybe this ([a-zA-Z0-9\-]+\-)+
But that will match WS-810-REFERENCE- missing the final segment.
Thoughts?
Used a modified version of the second attempt just to grab that extra missing section
((?:[a-zA-Z0-9]+\-)+[a-zA-Z0-9]+)
I believe you're looking for a lookahead to make sure a hyphen is present in the string. You can use:
\b(?=\w*?-)[a-zA-Z0-9-]+(?= |$)
Online Demo: http://regex101.com/r/pZ6hV6
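To compare the two patterns above side by side, here is a minimal Python sketch. The names SEGMENTED, LOOKAHEAD, find_hyphenated and find_with_lookahead are made up for illustration; the thread does not say which regex engine is in use, so Python's re module is an assumption.

```python
import re

# Pattern from the question's follow-up: one or more "segment-" groups,
# then a final segment with no trailing hyphen.
SEGMENTED = re.compile(r"(?:[a-zA-Z0-9]+-)+[a-zA-Z0-9]+")

# Pattern from the answer: a lookahead requires a hyphen somewhere after
# the leading word characters, then the whole token is consumed.
LOOKAHEAD = re.compile(r"\b(?=\w*?-)[a-zA-Z0-9-]+(?= |$)")


def find_hyphenated(text):
    """Return every hyphen-separated token found by the segmented pattern."""
    return SEGMENTED.findall(text)


def find_with_lookahead(text):
    """Return every token found by the lookahead-based pattern."""
    return [m.group(0) for m in LOOKAHEAD.finditer(text)]


sample = "ref WS-810-REFERENCE-1 plain"
print(find_hyphenated(sample))
print(find_with_lookahead(sample))
```

One difference worth noting: the segmented pattern requires a non-empty final segment, while the lookahead pattern's character class also accepts a trailing hyphen.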
I am in the process of learning Regex and have been stuck on this case. I have a url that can be in two states EXAMPLE 1:
spotify.com/track/1HYcYZCOpaLjg51qUg8ilA?si=Nf5w1q9MTKu3zG_CJ83RWA
OR EXAMPLE 2:
spotify.com/track/1HYcYZCOpaLjg51qUg8ilA
I need to extract the 1HYcYZCOpaLjg51qUg8ilA ID
So far I am using this: (?<=track\/)(.*)(?=\?)? which works well for Example 2 but it includes the ?si=Nf5w1q9MTKu3zG_CJ83RWA when matching with Example 1.
BUT if I remove the ? at the end of the expression then it works for Example 1 but not Example 2! Doesn't that mean that last group (?=\?) is optional and should match?
Where am I going wrong?
Thanks!
I searched a handful of "Questions that may already have your answer" suggestions from SO, and didn't find this case, so I hope asking this is okay!
The capturing group in your regular expression is trying to match anything (.) as much as possible due to the greediness of the quantifier (*).
When you use:
(?<=track\/)(.*)(?=\?)
only 1HYcYZCOpaLjg51qUg8ilA from the first example is captured; the second example does not match at all, because it contains no question mark.
When using:
(?<=track\/)(.*)(?=\??)
You are effectively making the positive lookahead optional, so the capturing group will try to match as much as possible (including the question mark), so that 1HYcYZCOpaLjg51qUg8ilA?si=Nf5w1q9MTKu3zG_CJ83RWA and 1HYcYZCOpaLjg51qUg8ilA are matched, which is not the desired output.
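The greedy behaviour described above can be reproduced with a short Python sketch. The helper name capture and the constant names are hypothetical, and Python's re module is assumed as the engine.

```python
import re

WITH_QUERY = "spotify.com/track/1HYcYZCOpaLjg51qUg8ilA?si=Nf5w1q9MTKu3zG_CJ83RWA"
WITHOUT_QUERY = "spotify.com/track/1HYcYZCOpaLjg51qUg8ilA"

# Lookahead that insists on a literal "?" after the captured text.
REQUIRED = r"(?<=track/)(.*)(?=\?)"
# Lookahead whose content is optional, so it also succeeds on an empty match.
OPTIONAL = r"(?<=track/)(.*)(?=\??)"


def capture(pattern, url):
    """Return the first capture group, or None if the pattern does not match."""
    m = re.search(pattern, url)
    return m.group(1) if m else None


for pattern in (REQUIRED, OPTIONAL):
    for url in (WITH_QUERY, WITHOUT_QUERY):
        print(pattern, "->", capture(pattern, url))
```

With the optional lookahead, the greedy .* never has a reason to backtrack, which is why the query string ends up inside the capture.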
Rather than matching anything, it is perhaps more appropriate to match only word characters with \w (letters, digits, and the underscore).
(?<=track\/)(\w*)(?=\??)
Alternatively, if you are expecting other characters, let's say a hyphen - or an underscore _, you may use a character class.
(?<=track\/)([a-zA-Z0-9_-]*)(?=\??)
Or you might want to capture everything except a question mark ? with a negated character class.
(?<=track\/)([^?]*)(?=\??)
As pointed out by gaganso, a lookbehind is not necessary in this situation (nor is the lookahead), but it is still a good idea to start playing around with them. Lookaround assertions do not consume characters in the string. As you can see here, the full match in both cases consists only of what is captured by the capture group. You may find more information here.
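As a rough check of the three restricted variants, the following Python sketch runs each one against both example URLs and also confirms that the lookarounds contribute nothing to the full match. The names VARIANTS and extract are illustrative only.

```python
import re

URLS = [
    "spotify.com/track/1HYcYZCOpaLjg51qUg8ilA?si=Nf5w1q9MTKu3zG_CJ83RWA",
    "spotify.com/track/1HYcYZCOpaLjg51qUg8ilA",
]

VARIANTS = [
    r"(?<=track/)(\w*)(?=\??)",             # word characters only
    r"(?<=track/)([a-zA-Z0-9_-]*)(?=\??)",  # explicit character class
    r"(?<=track/)([^?]*)(?=\??)",           # anything except a question mark
]


def extract(pattern, url):
    """Return (full match, capture group 1), or None when nothing matches."""
    m = re.search(pattern, url)
    return (m.group(0), m.group(1)) if m else None


for pattern in VARIANTS:
    for url in URLS:
        # Lookarounds are zero-width, so both tuple entries should be equal.
        print(pattern, "->", extract(pattern, url))
```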
This should work:
track\/(\w+)
Please see here.
Since track is part of both the strings, and the ID is formed from alphanumeric characters, the above regex which matches the string "track/" and captures the alphanumeric characters after that string, should provide the required ID.
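For comparison with the lookaround versions, here is a small Python sketch of the capture-group approach. The function name track_id is an assumption made for this example.

```python
import re

TRACK = re.compile(r"track/(\w+)")


def track_id(url):
    """Return the ID captured after 'track/', or None if the URL has no track segment."""
    m = TRACK.search(url)
    return m.group(1) if m else None


print(track_id("spotify.com/track/1HYcYZCOpaLjg51qUg8ilA?si=Nf5w1q9MTKu3zG_CJ83RWA"))
print(track_id("spotify.com/track/1HYcYZCOpaLjg51qUg8ilA"))
```

Here the full match (group 0) includes the literal track/ prefix, so the ID has to be read from group 1 rather than from the whole match.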
Regex : (\w+(?=\?))|(\w+$)
See the demo for the regex, https://regexr.com/3s4gv .
This will first try to find a word that has '?' immediately after it and, if that's unsuccessful, it will fetch the last word.
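A minimal Python sketch of this alternation follows, assuming the second alternative is meant to be anchored at the end of the string (\w+$), which is what the description "fetch the last word" suggests. The function name id_by_alternation is hypothetical.

```python
import re

# First alternative: a word directly followed by "?".
# Second alternative (assumed): the last word in the string.
ALTERNATION = re.compile(r"(\w+(?=\?))|(\w+$)")


def id_by_alternation(url):
    """Return the first overall match, whichever alternative produced it."""
    m = ALTERNATION.search(url)
    return m.group(0) if m else None


print(id_by_alternation("spotify.com/track/1HYcYZCOpaLjg51qUg8ilA?si=Nf5w1q9MTKu3zG_CJ83RWA"))
print(id_by_alternation("spotify.com/track/1HYcYZCOpaLjg51qUg8ilA"))
```

This relies on the ID being the first word followed by a question mark, so it is more fragile than anchoring on track/.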
I have the following regex query I'm trying to use to exclude assets from being cached:
^((?!(\.css|\.js|\.|\.json|\.xml|\.svg|\.ico|\.png|\.mp3|\.jpg|\.svg|\.woff|\.woff2|\.eot|\.ttf|\/api\/play\/add|\/api\/favorite|\/Listen\/channel|getAccountInfo)).)*$
Except it doesn't match https://exampl.com/home for some reason. Does anyone know how I can fix this? Also, is there any way I can make the regex better?
Your regex contains a |\.| part (after |\.js). That alternative makes your regex fail the match with any string containing a dot. You need to remove that alternative:
^((?!(\.css|\.js|\.json|\.xml|\.svg|\.ico|\.png|\.mp3|\.jpg|\.svg|\.woff|\.woff2|\.eot|\.ttf|\/api\/play\/add|\/api\/favorite|\/Listen\/channel|getAccountInfo)).)*$
See the regex demo
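The effect of the stray \. alternative can be checked with a short Python sketch. The list EXCLUDED, the helper build_pattern and the function passes_filter are names invented here; the thread does not say which engine or caching layer consumes the regex, so this only illustrates the matching logic.

```python
import re

# Tokens that must not appear anywhere in the URL (taken from the pattern above).
EXCLUDED = [
    r"\.css", r"\.js", r"\.json", r"\.xml", r"\.svg", r"\.ico", r"\.png",
    r"\.mp3", r"\.jpg", r"\.woff", r"\.woff2", r"\.eot", r"\.ttf",
    r"/api/play/add", r"/api/favorite", r"/Listen/channel", r"getAccountInfo",
]


def build_pattern(alternatives):
    """Consume one character at a time, but only where no excluded token starts."""
    return re.compile(r"^(?:(?!(?:" + "|".join(alternatives) + r")).)*$")


FIXED = build_pattern(EXCLUDED)
BROKEN = build_pattern(EXCLUDED + [r"\."])  # same list plus the bare dot alternative


def passes_filter(url, pattern=FIXED):
    """True when the URL contains none of the excluded tokens."""
    return pattern.match(url) is not None


print(passes_filter("https://exampl.com/home"))
print(passes_filter("https://exampl.com/home", BROKEN))
print(passes_filter("https://exampl.com/static/app.js"))
```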
I'm trying to write a rule to match on a top-level domain followed by five digits. My problem is that my existing PCRE matches what I have described, but much later in the URL than where I want it to. I want it to match on the first occurrence of a TLD, not anywhere else. The easy way to check for this is to match on the TLD only when it has not been preceded at some point by the "/" character. I tried using a negative lookbehind, but that doesn't work because it only looks back one single character.
e.g.: How it is currently working
domain.net/stuff/stuff=www.google.com/12345
matches .com/12345 even though I do not want this match because it is not the first TLD in the URL
e.g.: How I want it to work
domain.net/12345/stuff=www.google.com/12345
matches on .net/12345 and ignores the later match on .com/12345
My current expression
(\.[a-z]{2,4})/\d{5}
EDIT: rewrote it so perhaps the problem is clearer in case anyone in the future has this same issue.
You're pretty close :)
You just need to be sure that before matching what you're looking for (i.e: (\.[a-z]{2,4})/\d{5}), you haven't met any / since the beginning of the line.
I would suggest simply prepending ^[^\/]* to your current regex and moving the \. out of the capturing group.
Thus, the resulting regex would be:
^[^\/]*\.([a-z]{2,4})/\d{5}
How does it work?
^ asserts that this is the beginning of the tested String
[^\/]* accepts any sequence of characters that doesn't contain /
\.([a-z]{2,4})/\d{5} is the pattern you want to match (a . followed by 2 to 4 lowercase letters, then a / and exactly 5 digits).
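The breakdown above can be tried out with a minimal Python sketch. The names ANYWHERE, FIRST_SEGMENT and first_tld are made up for this example, and Python's re module stands in for the PCRE engine mentioned in the question.

```python
import re

# Original expression: finds a TLD followed by five digits anywhere in the URL.
ANYWHERE = re.compile(r"(\.[a-z]{2,4})/\d{5}")

# Anchored expression: only "/"-free text may precede the TLD.
FIRST_SEGMENT = re.compile(r"^[^/]*\.([a-z]{2,4})/\d{5}")


def first_tld(url):
    """Return the TLD (without the dot) if five digits follow the first path slash."""
    m = FIRST_SEGMENT.search(url)
    return m.group(1) if m else None


unwanted = "domain.net/stuff/stuff=www.google.com/12345"
wanted = "domain.net/12345/stuff=www.google.com/12345"

print(ANYWHERE.search(unwanted).group(1))  # the late match the question wants to avoid
print(first_tld(unwanted))
print(first_tld(wanted))
```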
Here is a permalink to a working example on regex101.
Cheers!
You can use this regex:
'|^(\w+://)?([\w-]+\.)+\w+/\d{5}|'
Online Demo: http://regex101.com/
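Here is a small Python sketch of this second pattern. It assumes the surrounding | characters are pattern delimiters (as used by some regex APIs) and therefore drops them; the names HOST_THEN_DIGITS and host_match are hypothetical.

```python
import re

# Optional scheme, one or more dot-terminated host labels, a final label,
# then a slash and five digits, all anchored at the start of the string.
HOST_THEN_DIGITS = re.compile(r"^(\w+://)?([\w-]+\.)+\w+/\d{5}")


def host_match(url):
    """Return the matched prefix of the URL, or None if the host is not followed by five digits."""
    m = HOST_THEN_DIGITS.search(url)
    return m.group(0) if m else None


print(host_match("domain.net/12345/stuff=www.google.com/12345"))
print(host_match("domain.net/stuff/stuff=www.google.com/12345"))
print(host_match("http://www.domain.net/12345"))
```

Unlike the previous pattern, this one does not restrict the final label to 2 to 4 lowercase letters.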
Is it possible to write a regex that matches [a-zA-Z]{2,4} but not the word test? Or do I need to filter this in several steps?
Sure, you can use a negative lookahead.
(?!test)[a-zA-Z]{2,4}
I don't know if you'll need it for what you're doing, but note that you may need to use start and end anchors (^ and $) if you're checking that an entire input matches that pattern. Otherwise, it could match something like ouaeghAEtest because it will still find four chars somewhere that aren't "test".
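To illustrate the point about anchoring, here is a minimal Python sketch. The function name is_valid is an assumption; re.fullmatch is used in place of explicit ^ and $ anchors.

```python
import re

PATTERN = r"(?!test)[a-zA-Z]{2,4}"


def is_valid(word):
    """True when the whole word is 2 to 4 letters and is not exactly 'test'."""
    return re.fullmatch(PATTERN, word) is not None


# Unanchored search: the engine simply skips ahead to a position where the
# lookahead passes, so even "test" yields a match.
print(re.search(PATTERN, "test").group(0))
print(re.search(PATTERN, "ouaeghAEtest").group(0))

# Anchored check: the lookahead is only evaluated at the start of the word.
for word in ("test", "best", "Test", "te", "tests"):
    print(word, is_valid(word))
```

The check is case-sensitive, so Test still passes; an inline flag or re.IGNORECASE would be needed if that is not wanted.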
[A-Za-su-z][A-Za-df-z]{0,1}[A-Za-rt-z]{0,1}[A-Za-su-z]{0,1}
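Just an idea; I haven't tried it in real code. Note that, as written, it excludes t, e, s and t at each respective position, so it also rejects words such as best, and it accepts a single letter.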
Hi all, I was hoping someone could help me with some basic regex I am really struggling with.
Basically, I need to match a URL for redirection. I have been using
^~/abc(/)?
However, I need to change the end part to check only the last optional character, as this will also match ~/abcd
How about ^~/abc(/?)
or more generally: ^~/[a-zA-Z0-9]+/?
Assuming PCRE, you will want:
^~/abc(.)?$
Which will match "~/abc" followed (optionally) by any single character, which will be captured. Leave the () off if you don't need to capture said character.
Just like ^ matches the beginning of string (or line, depending upon mode), $ matches the end of string (or line).
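The effect of the end anchor can be seen in a short Python sketch. The variant with /? assumes the optional character is meant to be a slash, which is a guess based on the original pattern; all constant names are illustrative.

```python
import re

UNANCHORED = re.compile(r"^~/abc(/)?")    # pattern from the question
ANCHORED_ANY = re.compile(r"^~/abc(.)?$")  # pattern from this answer
ANCHORED_SLASH = re.compile(r"^~/abc/?$")  # assumed variant: only a slash may follow

for path in ("~/abc", "~/abc/", "~/abcd", "~/abcde"):
    print(
        path,
        bool(UNANCHORED.search(path)),
        bool(ANCHORED_ANY.search(path)),
        bool(ANCHORED_SLASH.search(path)),
    )
```

Since . accepts any single character, the anchored (.)? form still matches ~/abcd; only the slash-specific variant rejects it.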
I'll do something like this :
^~/([a-zA-Z0-9]+/?)*$