Problem with basic regex to match ending optional character - regex

Hi all, I was hoping someone could help me with some basic regex I am really struggling with.
Basically, I need to match a URL for redirection. I have been using
^~/abc(/)?
However, I need to change the end part so that it only checks for the optional last character, as this pattern also matches ~/abcd.

How about ^~/abc(/?)
or more generally: ^~/[a-zA-Z0-9]+/?

Assuming PCRE, you will want:
^~/abc(.)?$
Which will match "~/abc" followed (optionally) by any single character, which will be captured. Leave the () off if you don't need to capture said character.
Just like ^ matches the beginning of string (or line, depending upon mode), $ matches the end of string (or line).
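A minimal sketch of what the $ anchor changes, using Python's re module rather than PCRE (the constructs used here behave the same way in both). The names LOOSE, STRICT and matches are made up for illustration, and STRICT narrows the optional character to a slash, which is an assumption about what the redirect needs:

```python
import re

# Original pattern: nothing pins the end, so any path that merely
# starts with "~/abc" is accepted.
LOOSE = re.compile(r"^~/abc(/)?")

# Anchored variant: "~/abc", an optional trailing slash, then end of string.
STRICT = re.compile(r"^~/abc/?$")


def matches(pattern, path):
    """Return True if the compiled pattern matches the given path."""
    return pattern.search(path) is not None


for path in ("~/abc", "~/abc/", "~/abcd"):
    print(path, matches(LOOSE, path), matches(STRICT, path))
```

Only the anchored pattern rejects ~/abcd; both accept ~/abc and ~/abc/.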

I would do something like this:
^~/([a-zA-Z0-9]+/?)*$

Extract specific string using regular expression

I want to extract a specific string only if it matches.
Example input:
13.10.0/
13.10.1/
13.10.2/
13.10.3/
13.10.4.2/
13.10.4.4/
13.10.4.5/
I'm using the regex [0-9]+.[0-9]+.[0-9] to extract only digit.digit.digit from a string when it matches,
but this is the wrong output that my regex produces:
13.10.0
13.10.1
13.10.2
13.10.3
13.10.4.2 (no need to match this string 13.10.4)
13.10.4.4 (no need to match this string 13.10.4)
13.10.4.5 (no need to match this string 13.10.4)
The correct output that I need:
13.10.0
13.10.1
13.10.2
13.10.3
It's hard to say without knowing how you're passing these strings in -- are they lines in a file? An array of strings in a programming language?
If you're searching a file using grep or a similar tool, it will give you all lines that match anywhere, even if only part of the line matches.
Normally, you'd deal with this using anchors to specify that the regex must start on the first character of the line and end on the last (e.g. ^[0-9]+\.[0-9]+\.[0-9]+$). ^ matches the start of the line, and $ matches the end. The dots are escaped because an unescaped . matches any character, and the final [0-9] gets a + so that multi-digit components are allowed.
In your case, you've got slashes at the end of all the lines, so the easiest fix is to match that final slash, with ^[0-9]+\.[0-9]+\.[0-9]+/.
You could also use lookahead or groups to match the slash without returning it -- but that depends a bit more on what tool you're running this regex in and how you're processing it.
If your strings are separated by whitespace (other than newlines), replacing ^ with (^|\s) (either the beginning of the string, or some whitespace character) may work -- but it will add a leading space to some of your results.
You may also need to set your regex tool to match multiple times in a line (e.g. the -o flag in grep). Again, it's hard to give useful advice about this without knowing what regular-expression tool you're using, or how you're processing the results.
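As a sketch of the anchoring approach, assuming the strings arrive as lines of one text block and are processed with Python's re module (the question names no tool, so both are assumptions, as are the names SAMPLE, VERSION_LINE and extract_versions). The group captures the version while the trailing slash is matched outside it:

```python
import re

# Sample input taken from the question, one entry per line.
SAMPLE = """13.10.0/
13.10.1/
13.10.2/
13.10.3/
13.10.4.2/
13.10.4.4/
13.10.4.5/
"""

# ^ and $ anchor to line boundaries because of re.MULTILINE.
# The slash sits outside the group, so it is required but not returned.
VERSION_LINE = re.compile(r"^(\d+\.\d+\.\d+)/$", re.MULTILINE)


def extract_versions(text):
    """Return every line that is exactly digits.digits.digits plus a slash."""
    return VERSION_LINE.findall(text)


print(extract_versions(SAMPLE))
```

The four-component lines are skipped because the character after the third number is a dot rather than the required slash.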
I think you want:
^\d+\.\d+\.\d+$
Which is exactly 3 groups of digit(s) separated by (literal) dots.
Some tools (like grep) match all lines that contain your regex, and may have additional characters before/after.
Use $ character to match end of line after your regex. (Also note, that . matches any character, not literal dot)
[0-9]+\.[0-9]+\.[0-9]$

How to extract characters from a string with optional string afterwards using Regex?

I am in the process of learning Regex and have been stuck on this case. I have a url that can be in two states EXAMPLE 1:
spotify.com/track/1HYcYZCOpaLjg51qUg8ilA?si=Nf5w1q9MTKu3zG_CJ83RWA
OR EXAMPLE 2:
spotify.com/track/1HYcYZCOpaLjg51qUg8ilA
I need to extract the 1HYcYZCOpaLjg51qUg8ilA ID
So far I am using this: (?<=track\/)(.*)(?=\?)? which works well for Example 2 but it includes the ?si=Nf5w1q9MTKu3zG_CJ83RWA when matching with Example 1.
BUT if I remove the ? at the end of the expression then it works for Example 1 but not Example 2! Doesn't that mean that last group (?=\?) is optional and should match?
Where am I going wrong?
Thanks!
I searched a handful of "Questions that may already have your answer" suggestions from SO, and didn't find this case, so I hope asking this is okay!
The capturing group in your regular expression is trying to match anything (.) as much as possible due to the greediness of the quantifier (*).
When you use:
(?<=track\/)(.*)(?=\?)
only 1HYcYZCOpaLjg51qUg8ilA from the first example is captured, as there is no question mark in your second example.
When using:
(?<=track\/)(.*)(?=\??)
You are effectively making the positive lookahead optional, so the capturing group will try to match as much as possible (including the question mark), so that 1HYcYZCOpaLjg51qUg8ilA?si=Nf5w1q9MTKu3zG_CJ83RWA and 1HYcYZCOpaLjg51qUg8ilA are matched, which is not the desired output.
Rather than matching anything, it is perhaps more appropriate for you to match only word characters \w (letters, digits and the underscore).
(?<=track\/)(\w*)(?=\??)
Alternatively, if you are expecting other characters, let's say a hyphen - or an underscore _, you may use a character class.
(?<=track\/)([a-zA-Z0-9_-]*)(?=\??)
Or you might want to capture everything except a question mark ? with a negated character class.
(?<=track\/)([^?]*)(?=\??)
As pointed out by gaganso, a look-behind is not necessary in this situation (nor indeed is the lookahead), but it is still a good idea to start playing around with them. Look-around assertions do not actually consume characters in the string, so the full match for both examples consists only of what is captured by the capture group.
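The difference between the greedy .* and the negated character class can be checked with a short sketch. It assumes Python's re module, which supports the fixed-width look-behind used here. The names URLS, GREEDY, NEGATED and track_id are illustrative only:

```python
import re

# The two URL shapes from the question.
URLS = [
    "spotify.com/track/1HYcYZCOpaLjg51qUg8ilA?si=Nf5w1q9MTKu3zG_CJ83RWA",
    "spotify.com/track/1HYcYZCOpaLjg51qUg8ilA",
]

# Greedy .* runs to the end of the string; the optional lookahead
# then succeeds on the empty string, so the query part is kept.
GREEDY = re.compile(r"(?<=track/)(.*)(?=\??)")

# [^?]* cannot cross a question mark, so it stops right before the query.
NEGATED = re.compile(r"(?<=track/)([^?]*)(?=\??)")


def track_id(pattern, url):
    """Return the first captured group, or None when nothing matches."""
    match = pattern.search(url)
    return match.group(1) if match else None


for url in URLS:
    print(track_id(GREEDY, url), track_id(NEGATED, url))
```

With NEGATED both URLs yield the bare ID, while GREEDY returns the ID plus the ?si=... part for the first URL.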
This should work:
track\/(\w+)
Please see here.
Since track is part of both the strings, and the ID is formed from alphanumeric characters, the above regex which matches the string "track/" and captures the alphanumeric characters after that string, should provide the required ID.
Regex: (\w+(?=\?))|(\w+$)
See the demo for the regex, https://regexr.com/3s4gv .
This will first try to find a word that has a '?' right after it, and if that is unsuccessful it will fetch the last word.

Ant regex expression

Quite a simple one in theory but can't quite get it!
I want a regex in ant which matches anything as long as it has a slash on the end.
Below is what I expect to work
<regexp id="slash.end.pattern" pattern="*/"/>
However this throws back
java.util.regex.PatternSyntaxException: Dangling meta character '*' near index 0
*/
^
I have also tried escaping this to \*, but that matches a literal *.
Any help appreciated!
Your original regex pattern didn't work because * is a special character in regex that is only used to quantify other characters.
The pattern (.)*/$, which you mentioned in your comment, will match any string of characters that ends in a slash and contains no newlines; however, it uses a possibly unnecessary capturing group. .*/$ should work just as well.
If you need to match newline characters, the dot . won't be enough. You could try something like [\s\S]*/$
On that note, it should be mentioned that you might not want to use $ in this pattern. Suppose you have the following string:
abc/def/
Should this be evaluated as two matches, abc/ and def/? Or is it a single match containing the whole thing? Your current approach creates a single match. If instead you would like to search for strings of characters and then stop the match as soon as a / is found, you could use something like this: [\s\S]*?/.
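The single-match versus two-match behavior can be sketched as follows. Ant uses java.util.regex, but this illustration uses Python's re module instead, on the assumption that the constructs involved (a dangling *, greedy and lazy quantifiers, $) behave the same way in both. The names TEXT and is_valid are hypothetical:

```python
import re

TEXT = "abc/def/"


def is_valid(pattern):
    """Return True if the pattern compiles, False if the syntax is rejected."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# A leading * has nothing to quantify, so the pattern is rejected.
print(is_valid("*/"))    # False
print(is_valid(".*/$"))  # True

# Greedy and anchored: one match covering the whole string.
greedy = re.findall(r"[\s\S]*/$", TEXT)

# Lazy and unanchored: each match stops at the first slash it reaches.
lazy = re.findall(r"[\s\S]*?/", TEXT)

print(greedy, lazy)
```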

RegEx: a group to match any string beginning with a specific character

I'm creating some reports in Google Analytics.
I am trying to write a RegEx that will match
www.website.com/
www.website.com/?_string_begins_with_question_mark
But will not match
www.website.com/string_doesnt_begin_with_question_mark
Using Reggy (with POSIX Extended), I tried to create an optional group that would match a string beginning with a question mark, followed by any number of characters. I thought
(\?.+)?
would do the trick, but it ignores the question mark requirement, and matches any string.
I tried some variations:
www.website.com/(\?(.+))?
www.website.com/(\?.+)?
www.website.com/(?.+)?
Et cetera.
Any help is appreciated - Sorry if this has already been asked! I'm new to RegEx.
Thank you!
Your regexp
www.website.com/(\?.+)?
will still match anything that contains www.website.com/, no matter what comes after the slash. Have you tried appending a $ (end-of-input marker)?
www\.website\.com/(?:\?.+)?$
(Escape the dots for more precision; the ?: is just a way of indicating that the group is of no special meaning and does not have to be remembered -- if you omit this, you can access the contents of the group by \1, e.g., in a replace operation.)
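A quick check of the anchored pattern against the three URLs from the question, as a sketch in Python's re module rather than in Google Analytics or Reggy. Note that (?: ... ) is not part of POSIX Extended syntax, so a plain group would be needed there. The names HOME_PAGE and is_home_page are made up for this example:

```python
import re

# Escaped dots, an optional "?..." query part, then end of input.
HOME_PAGE = re.compile(r"www\.website\.com/(?:\?.+)?$")


def is_home_page(url):
    """Return True for the bare home page or the home page with a query string."""
    return HOME_PAGE.search(url) is not None


for url in (
    "www.website.com/",
    "www.website.com/?_string_begins_with_question_mark",
    "www.website.com/string_doesnt_begin_with_question_mark",
):
    print(url, is_home_page(url))
```

Without the $, the third URL would also be accepted, because the optional group can simply match nothing.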
Try this one
www.website.com/\?(.+)
I would suggest
www.website.com/($|\?.+$)
Either empty or question mark plus some string

Negative integer Regex doesn't match

I have Googled it, and found the following results:
http://icfun.blogspot.com/2008/03/regular-expression-to-handle-negative.html
http://regexlib.com/DisplayPatterns.aspx?cattabindex=2&categoryId=3
With some (very basic) Regex knowledge, I figured this would work:
r\.(^-?\d+)\.(^-?\d+)\.mcr
For parsing such strings:
r.0.0.mcr
r.-1.5.mcr
r.20.-1.mcr
r.-1.-1.mcr
But I don't get a match on these.
Since I'm learning (or trying to learn) Regex, could you please explain why my pattern doesn't match (instead of just writing a new working one for me)? From what I understood, it goes like so:
Match r
Match a period
Match a prefix negative sign or not, and store the group
Match a period
Match a prefix negative sign or not, and store the group
Match a period
Match mcr
But I'm wrong, apparently :).
You are very close. ^ matches the start of a string, so it should only be located at the start of a pattern (if you want to use it at all - that depends on whether you will also accept e.g. abcr.0.0.mcr or not). Similarly, one can use $ (but only at the end of the pattern) to indicate that you will only accept strings that do not contain anything after what the pattern matches (so that e.g. r.0.0.mcrabc won't be accepted). Otherwise, I think it looks good.
The ^ characters are telling it to match only at the beginning of a line; since it's obviously not at the beginning of a line in either case, it fails to match. In this case, you just need to remove both ^s. (I think what you're trying to say is "don't let anything else be in between these", but that's the default except at the start of the regex; you would need something like .* to make it allow additional characters between them.)
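Note that ^ only means 'not' when it is the first character inside a character class such as [^-]. Outside brackets, as in this pattern, it is still the start-of-line anchor, so the pattern demands a line start in the middle of the string and the match fails.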
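A small sketch comparing the original pattern with one that has the inner ^ removed. It uses Python's re module, which is an assumption since the question does not name a regex flavor. BROKEN, FIXED, NAMES and parse are illustrative names, and the outer ^ and $ in FIXED are optional extras that reject surrounding text:

```python
import re

# Original pattern: each inner ^ demands a line start in the middle
# of the string, which can never be satisfied here.
BROKEN = re.compile(r"r\.(^-?\d+)\.(^-?\d+)\.mcr")

# Inner anchors removed; ^ and $ now wrap the whole pattern.
FIXED = re.compile(r"^r\.(-?\d+)\.(-?\d+)\.mcr$")

NAMES = ["r.0.0.mcr", "r.-1.5.mcr", "r.20.-1.mcr", "r.-1.-1.mcr"]


def parse(name):
    """Return the two integers as a tuple, or None if the name does not fit."""
    match = FIXED.search(name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


for name in NAMES:
    print(name, BROKEN.search(name) is not None, parse(name))
```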