Extract String Between Slashes, But Second Slash May Not Exist [duplicate] - regex

This question already has answers here:
In regex, match either the end of the string or a specific character
(2 answers)
Closed 2 years ago.
I'm trying to figure out how to extract usernames from a URL that's captured in a form. I do have the below regex, but the issue is that the second forward slash may not exist. Here are the examples:
Sample URLs
https://test.site.com/u/username
https://test.site.com/u/username/pref/summary
I'm trying to extract the username.
Current Regex
/u/(.*?)/
The current one I have above successfully extracts the username, but only when there is another / after the username. The second / needs to be optional; it may or may not be there, and there may or may not be more after that.
I couldn't find the right regex for this. Simply making the second / optional (by adding ? at the end) didn't help: the match should stop at the second / if there is one, and at the end of the string otherwise.
Thanks in advance!

/u/([^/]*) will match as many non-/ characters after /u/ as possible.
It will not match pref and summary,
because [^/] matches any character other than /,
so [^/]* matches a string (as long as possible)
of characters other than /. 
Consider: if your pattern is B[aeiou]*
and your input is Beetles (or Beethoven),
it will match only Bee,
stopping at (before) the first character that isn’t a vowel. 
Similarly, [^/]* stops at (before) the first occurrence of /.
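
As a minimal sketch of the pattern above in Python (the helper name extract_username is made up for illustration; the two URLs are the ones from the question):

```python
import re

# Pattern from the answer: capture everything after /u/ up to the next / (or the end).
USERNAME_RE = re.compile(r"/u/([^/]*)")

def extract_username(url):
    """Return the path segment after /u/, or None if the URL has no /u/ segment."""
    match = USERNAME_RE.search(url)
    return match.group(1) if match else None

print(extract_username("https://test.site.com/u/username"))               # username
print(extract_username("https://test.site.com/u/username/pref/summary"))  # username
```

Note that [^/]* only stops at a slash, so a query string or fragment directly after the username would be captured too; whether that matters depends on the URLs the form accepts.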

Related

Regex for alphanumeric with at least one digit [duplicate]

This question already has answers here:
RegEx for an invoice format
(5 answers)
Closed 2 years ago.
I'm looking for a regex for an invoice number in VBScript.
It can be alphanumeric, but at least one numeric digit is a must.
I'm using the regex below, but it also matches the purely alphabetic string INVOICE. It needs to have at least one digit.
\b(?=.*\d)[A-Z0-9\-]{5,12}\b
Expected Match String
1233444
M62899M
M828828
783838PTE
A751987
Expected Unmatch String
INVOICE
ubb62727
XYZ
123
If we use ([A-Z0-9]*[0-9]+[A-Z0-9]*), I can't specify the length.
Please suggest a proper regex. Please note it's totally different from the suggested duplicate, as the requirements and format are different.
The blanket .* in your lookahead will happily skip past the trailing \b if it has to. Make it more constrained, so it can't.
\b(?=[-A-Z]*\d)[A-Z0-9-]{5,12}\b
(I removed the backslash before the -; if you really want to allow a literal backslash, obviously add it back, to the character class in the lookahead also. A dash at beginning or end of a character class is unambiguous and doesn't require a backslash escape; this is also the only way to have a literal dash in a character class in many regex dialects.)
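
The difference between the two lookaheads can be checked with a short sketch. It uses Python's re module rather than VBScript, on the assumption that both engines treat \b, lookaheads and character classes the same way for these patterns; the sample text "INVOICE 1233444" is made up to show the lookahead skipping past a word boundary.

```python
import re

original = re.compile(r"\b(?=.*\d)[A-Z0-9\-]{5,12}\b")        # pattern from the question
constrained = re.compile(r"\b(?=[-A-Z]*\d)[A-Z0-9-]{5,12}\b")  # pattern from the answer

# With .* the lookahead finds a digit in the *next* token, so INVOICE slips through.
text = "INVOICE 1233444"
print(original.findall(text))     # ['INVOICE', '1233444']
print(constrained.findall(text))  # ['1233444']

# The sample strings from the question, tested one at a time.
should_match = ["1233444", "M62899M", "M828828", "783838PTE", "A751987"]
should_not_match = ["INVOICE", "ubb62727", "XYZ", "123"]
matched = [s for s in should_match if constrained.search(s)]
rejected = [s for s in should_not_match if not constrained.search(s)]
print(matched == should_match, rejected == should_not_match)  # True True
```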

Regex validation: Allow new lines but not spaces [duplicate]

This question already has answers here:
How to matches anything except space and new line?
(3 answers)
Closed 3 years ago.
I know there are loads of questions on this site around regex and validating against white spaces, believe me I've spent the past few hours looking. I've been unable to create a regex validation that matches the following requirements:
Fail validation if there are any spaces in this text (including at the start and end of the text)
Allow validation if the text has new lines in it
I very quickly found that \s was not a good option, as it fails on my second point. The closest I've managed to get is [A-Z]*[a-z]*[\" *"$], which flags exactly the reverse (spaces pass but everything else fails).
I've tried reversing it somehow but not having much success.
Anchor to the beginning of the string, repeatedly match anything but a space with [^ ]*, and anchor to the end of the string:
^[^ ]*$
This matches 0 or more non-space characters, but will permit newlines. (If you want 1 or more non-space characters, use + instead of *)
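
A quick way to try this out, sketched in Python (the function name is_valid is hypothetical, and the question does not say which regex engine is in use):

```python
import re

# Pattern from the answer: the whole text must consist of non-space characters.
NO_SPACES = re.compile(r"^[^ ]*$")

def is_valid(text):
    """True if the text contains no space character; newlines are allowed."""
    return NO_SPACES.search(text) is not None

print(is_valid("line1\nline2"))  # True
print(is_valid("two words"))     # False
print(is_valid(" leading"))      # False
print(is_valid("trailing "))     # False
```

The class [^ ] excludes only the ordinary space character, so tabs and other whitespace still pass; add them to the class if they should fail as well.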

Regular Expression - Need Help Matching Everything Except For A Certain String [duplicate]

This question already has answers here:
Regex: match everything but a specific pattern
(6 answers)
Closed 4 years ago.
After countless hours of trying to get this regex to work (including looking all over StackOverflow), I thought I'd reach out for help on here, as I have not been successful.
I have tried creating a regex to match everything and to not match any parameters that look like this:
text=3242ffs3F34
The data after the = sign can be random (it's a mixture of digits and letters) and is never the same. So far I have created the regex below, which comes close to what I am after but does not work.
\b(?!text=.*)\b\S+
Assistance is much appreciated!
EDIT:
I will be using the regex to match everything in a file but to filter out all parameters that look like this:
text=3242ffs3F34
Below is an example of what the config file will look like:
This is a test
test=asda
test2=22rr2
text=3242ffs3F34
test5=hello
To match everything except strings containing LAST_DOMINO_TIME= as substring you can use the expression:
(?!.*\bLAST_DOMINO_TIME=.*$)^.*$
(?! Negative lookahead.
.* Match anything.
\b Word boundary.
LAST_DOMINO_TIME= Literal substring.
.*$ Anything up to end of string.
) Close lookahead.
^.*$ Assert position beginning of line, match anything up to end of line.
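
The breakdown above can be tried against the sample config from the question. This is only a sketch in Python: it swaps the literal LAST_DOMINO_TIME= for the question's text=, and it assumes the file is processed as one string with multi-line mode enabled so that ^ and $ apply per line.

```python
import re

# Same structure as the expression above, with text= as the literal substring.
# re.MULTILINE makes ^ and $ match at the start and end of every line.
KEEP_LINE = re.compile(r"(?!.*\btext=.*$)^.*$", re.MULTILINE)

config = "\n".join([
    "This is a test",
    "test=asda",
    "test2=22rr2",
    "text=3242ffs3F34",
    "test5=hello",
])

kept = KEEP_LINE.findall(config)
print(kept)  # ['This is a test', 'test=asda', 'test2=22rr2', 'test5=hello']
```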

Regex for string containing one string, but not another [duplicate]

This question already has answers here:
Regular expression for a string containing one word but not another
(5 answers)
Closed 3 years ago.
We have a regex in our project that matches any URL that contains the string
"/pdf/":
(.+)/pdf/.+
I need to modify it so that it won't match URLs that also contain "help".
Example:
Shouldn't match: "/dealer/help/us/en/pdf/simple.pdf"
Should match: "/dealer/us/en/pdf/simple.pdf"
If lookarounds are supported, this is very easy to achieve:
(?=.*/pdf/)(?!.*help)(.+)
See a demo on regex101.com.
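
A minimal Python sketch of the lookaround version, using the two URLs from the question (the helper name is_allowed is made up). It relies on re.match, which only tries the pattern at the start of the string; that is an assumption about how the pattern is applied, and it matters because the pattern itself has no ^ anchor.

```python
import re

PDF_NOT_HELP = re.compile(r"(?=.*/pdf/)(?!.*help)(.+)")

def is_allowed(url):
    """True if the URL contains /pdf/ and does not contain help anywhere."""
    # match() anchors at position 0, so both lookaheads see the whole URL.
    return PDF_NOT_HELP.match(url) is not None

print(is_allowed("/dealer/us/en/pdf/simple.pdf"))       # True
print(is_allowed("/dealer/help/us/en/pdf/simple.pdf"))  # False

# An unanchored search can still succeed on a suffix that starts after the "h" of "help".
print(PDF_NOT_HELP.search("/dealer/help/us/en/pdf/simple.pdf") is not None)  # True
```

If the engine in use only offers an unanchored search, adding ^ in front of the first lookahead should give the same effect as match() here.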
(?:^|\s)((?:[^h ]|h(?!elp))+\/pdf\/\S*)(?:$|\s)
First, match either whitespace or the start of a line:
(?:^|\s)
Then we match anything that is not a space or h, OR any h that is not followed by elp, one or more times (+), until we find a /pdf/. After that we match non-space characters (\S) any number of times (*).
((?:[^h ]|h(?!elp))+\/pdf\/\S*)
If we want to detect help after the /pdf/, we can duplicate matching from the start.
((?:[^h ]|h(?!elp))+\/pdf\/(?:[^h ]|h(?!elp))+)
Finally, we match whitespace or the end of the line/string ($):
(?:$|\s)
The full match will include leading/trailing spaces, and should be stripped. If you use capture group 1, you don't need to strip the ends.
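
The capture-group advice can be sketched in Python as follows (the helper name find_pdf_url is hypothetical; the URLs are the ones from the question, with surrounding spaces added to show that group 1 excludes them):

```python
import re

# Lookahead-only variant from above; group 1 holds the URL without surrounding whitespace.
PDF_URL = re.compile(r"(?:^|\s)((?:[^h ]|h(?!elp))+\/pdf\/\S*)(?:$|\s)")

def find_pdf_url(text):
    """Return the first /pdf/ URL whose part before /pdf/ does not contain help, else None."""
    match = PDF_URL.search(text)
    return match.group(1) if match else None

print(find_pdf_url(" /dealer/us/en/pdf/simple.pdf "))    # /dealer/us/en/pdf/simple.pdf
print(find_pdf_url("/dealer/help/us/en/pdf/simple.pdf"))  # None
```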
Example on regex101

Regex to get domain from email [duplicate]

This question already has answers here:
Regex get domain name from email
(9 answers)
Closed 4 years ago.
I am using the below regex for getting domain from email address:
re.findall('@(.+?)',str), which is not giving me the intended result.
I have got the correct regex: re.findall('@(\w.+)',str).
Please explain the difference between the patterns.
The main difference is the way it's matching the actual domain.
.+?
. Matches any non-newline character.
+ Matches the previous element (.) one or more times.
? In this case, as it's after a repeater, it makes it "lazy." Without this, it would be "greedy", matching as many times as possible. When a repeater is "lazy" it matches as few times as possible.
\w.+
\w Matches any "word character" (generally upper- and lower-case letters, digits, and underscores).
. Matches any non-newline character.
+ Repeats the previous element (.) one or more times. And because there is no ?, it will match as many times as possible.
That should outline the differences between the two. If you have examples that you wanted to match or not match, and add them to the original post, I can help with a further explanation on why one works while the other doesn't for those cases.
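
To make the lazy versus greedy difference concrete, here is a small Python sketch. The address user@example.com is made up, and the patterns assume the separator is the @ of an ordinary email address.

```python
import re

address = "user@example.com"  # hypothetical sample address

# Lazy: .+? is satisfied with a single character, and nothing after it forces more.
lazy = re.findall(r"@(.+?)", address)
# Greedy: \w takes one word character, then .+ runs to the end of the line.
greedy = re.findall(r"@(\w.+)", address)

print(lazy)    # ['e']
print(greedy)  # ['example.com']
```

A lazy quantifier only expands when something later in the pattern requires it, which is why it is usually followed by a delimiter or an anchor.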