Get n chars after last / in url path [duplicate] - regex

This question already has answers here:
Get 5 Characters After Last Slash
(4 answers)
Closed 5 days ago.
I want to match the 10 characters after the last / in URLs, but when the URL contains extra / characters my pattern is not able to ignore them. Can someone help me strictly select the 10 chars after the last /?
([a-z0-9]+)(?:\/?$)
Using this regex I'm able to get the last part, but I only want 10 chars.
([a-z0-9]{10})(?:\/?$)
Using this I'm getting the last 10 chars, but I need the first 10 chars. I also want to ignore the last / if there is no path after it.
Example
https://www.facebook.com/reel/1a1c6e99h60a3169h86816
https://www.facebook.com/reel/0e2c4a1a1c6e6990eac186/
Output
1a1c6e99h6
0e2c4a1a1c

I need first 10 chars
Ok, good.
So ask for them.
We anchor against a / slash so we get the beginning of a path element.
\/([a-z0-9]{10})
If you wanted anywhere between eight and a dozen letters, then [a-z]{8,12} would work.

A slight improvement:
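(?<!\/)\/([^\/]{10})(?!.*\/.+)
Explained:
(?<!\/): make sure there is no preceding slash
\/: a slash
([^\/]{10}): capture 10 non-slash characters (the quantifier goes inside the group, because ([^\/]){10} would keep only the last character in group 1)
(?!.*\/.+): make sure there is not another slash with chars after it.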
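As a rough check of the lookaround approach, here is a minimal Python sketch. The helper name first_ten_of_last_segment is made up for illustration, and the pattern is written with {10} inside the capture group so that group 1 holds all ten characters; it assumes Python's re module, where / needs no escaping.

```python
import re

# Lookaround pattern, with the {10} quantifier inside the capture group.
LAST_SEGMENT_PREFIX = re.compile(r"(?<!/)/([^/]{10})(?!.*/.+)")


def first_ten_of_last_segment(url):
    """Return the first 10 chars of the last non-empty path segment, or None."""
    m = LAST_SEGMENT_PREFIX.search(url)
    return m.group(1) if m else None


urls = [
    "https://www.facebook.com/reel/1a1c6e99h60a3169h86816",
    "https://www.facebook.com/reel/0e2c4a1a1c6e6990eac186/",
]
for url in urls:
    print(first_ten_of_last_segment(url))
```

The negative lookbehind is what keeps the second slash of https:// from being treated as the start of a path element.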

You could try this pattern: (?:\/)([a-z0-9]{10})(?:[^\/]*\/?$)
Your expected output should be captured in matched group 1.
This pattern matches only the first 10 characters of the last path segment after a /, instead of matching every group of 10 characters that follows a /, for example in this case:
https://www.facebook.com/reel/g012345678/g123456789/0e2c4a1a1c6e6990eac186/
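The difference can be seen by running both patterns over that URL. This is only a sketch in Python (an assumption, since the question names no language); the variable names are illustrative.

```python
import re

url = "https://www.facebook.com/reel/g012345678/g123456789/0e2c4a1a1c6e6990eac186/"

# Unanchored: every slash followed by 10 [a-z0-9] characters produces a hit.
every_segment = re.findall(r"/([a-z0-9]{10})", url)

# Anchored to the end of the string: only the last path segment can match.
last_only = re.findall(r"(?:/)([a-z0-9]{10})(?:[^/]*/?$)", url)

print(every_segment)
print(last_only)
```

With a single capture group in each pattern, re.findall returns just the captured strings, so the two lists can be compared directly.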

You can use
.*\/(.{10})
This reads, "match zero or more characters, as many as possible, followed by a forward slash, followed by 10 characters that are saved to capture group 1". Capture group 1 then holds the desired string of 10 characters.
Capture group 1 will contain the 10 characters following the last forward slash that is followed by at least 10 characters. Here are four examples. The contents of capture group 1 are indicated by the position of the 10-segment centipede below.
abc/1234567890123/1234567890123
                  ^^^^^^^^^^
abc/1234567890123/1234567
    ^^^^^^^^^^
abc/1234567/1234567
    ^^^^^^^^^^
abc/1234/1234
Demo.
.* is greedy, meaning that it will consume as many characters as possible, including forward slashes, so long as the rest of the regular expression is satisfied.
In the first example the last forward slash is followed by 10 characters, so those 10 characters are saved to capture group 1.
In the second example the last forward slash is followed by fewer than 10 characters so the 10 characters following the next-to-last forward slash are captured.
The third example is the same as the second, except the 10 characters captured includes a forward slash.
In the fourth example no forward slash is followed by 10 characters so no match is made.
I cannot be sure that the behaviour of this regex in the three cases other than the first is what the OP wants because the question does not speak to those situations.
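The four cases can be reproduced with a short Python sketch. The helper name captured is hypothetical, and Python's re module is assumed.

```python
import re

# Greedy .* backtracks from the end until a slash followed by 10 chars is found.
GREEDY_LAST = re.compile(r".*/(.{10})")


def captured(text):
    """Return capture group 1, or None when no slash is followed by 10 chars."""
    m = GREEDY_LAST.match(text)
    return m.group(1) if m else None


samples = [
    "abc/1234567890123/1234567890123",
    "abc/1234567890123/1234567",
    "abc/1234567/1234567",
    "abc/1234/1234",
]
for s in samples:
    print(s, "->", captured(s))
```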

Related

capturing values after an optional slash

I am trying to write a regex that matches
an alphanumeric string of length no longer than 5 (as an example) [a-z0-9]{3,5}
followed by an optional forward slash /?
that cannot end in a 3
I want to capture any group of at least 3, with or without a slash, and then anything after it.
And I am having a very hard time accomplishing this. If I require the slash / it is much easier to do so.
When I try
(?=.+\/?.+)[a-z0-9]{2,5}\/?(?<!3\/|3)
I can capture what I want - up until the slash, but can't crack how to get anything after IF legit things occur
(?=.+\/?.+)[a-z0-9]{2,62}\/?.?
My requirement for length goes up by 1 - to 4 instead of 3 - due to the additional . I put after the \/?. I could change my match to account for it, but it becomes really difficult.
(?=.+\/?.+)[a-z0-9]{2,5}\/?(?<!3\/|3)$
This only gives me the last slash or non-slash followed by 2 to 5 characters.
(?=.+\/?.+)[a-z0-9]{2,62}\/?.*
or
(?=.+\/?.+)[a-z0-9]{2,62}\/?.?+
simply ignores my ending rule of not being able to close with 3/ or 3. It also allows more than 5 characters before the slash. Def not what I want :)
Is there a way to make an optional field still maintain length and ending rules?
I am running this script on regexr.com, on https://www.w3schools.com/jsref/tryit.asp?filename=tryjsref_regexp and in Git Bash, and not getting the results I would like.
Try:
^[a-z0-9]{3,5}(?<!3)(?:$|\/.*)
Regex demo.
^ - beginning of the string
[a-z0-9]{3,5} - match a-z0-9 between 3 and 5 times
(?<!3) - the last character should not be 3
(?:$|\/.*) - match either end of string $ or / and any number of characters.
If the last character in this range [a-z0-9] should not be a 3 you can exclude it like [a-z124-9]
^[a-z0-9]{2,4}[a-z124-9](?:\/.*)?$
Explanation
^ Start of string
[a-z0-9]{2,4} Match 2-4 chars in the ranges a-z 0-9
[a-z124-9] Match a single char in a-z, 1, 2 or 4-9 (anything in the class except 3)
(?:\/.*)? Optionally match / and the rest of the line
$ End of string
See a regex101 demo.
If a 3 must not be matched at all:
^[a-z124-9]{3,5}(?:\/.*)?$
See another regex101 demo
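To compare the lookbehind answer with the character-class answer, here is a small Python sketch. The sample strings and the expected verdicts are my own illustrations, not taken from the question, and Python's re module is assumed.

```python
import re

# 3-5 alphanumerics not ending in 3, then end of string or "/" plus anything.
LOOKBEHIND = re.compile(r"^[a-z0-9]{3,5}(?<!3)(?:$|/.*)")
CHAR_CLASS = re.compile(r"^[a-z0-9]{2,4}[a-z124-9](?:/.*)?$")

# Hypothetical inputs mapped to whether they should be accepted.
samples = {
    "abc": True,
    "abc/xyz": True,
    "abc3d/": True,
    "ab": False,       # too short
    "ab3": False,      # ends in 3
    "ab3/xyz": False,  # ends in 3 before the slash
    "abcdef": False,   # more than 5 chars before the slash
    "a3b3": False,     # backtracking to "a3b" leaves a stray "3"
}

verdicts = {
    s: (bool(LOOKBEHIND.search(s)), bool(CHAR_CLASS.search(s))) for s in samples
}
for s, v in verdicts.items():
    print(s, v)
```

On these inputs both patterns give the same verdict, which suggests the choice between them is mostly a matter of readability and of whether the regex flavour supports lookbehind.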

Regex match everything before n-th occurrence [duplicate]

This question already has answers here:
Regex get nth value separated with slash
(2 answers)
Closed last month.
I have a regex pattern to use on working directory paths. The objective is to grab everything up to the fourth backslash and never capture anything after it.
I have attempted two approaches:
[^\\].*[\\]
Which grabs everything up to the last backslash, for example:
C:\Users\testing\again\later
#I will grab
C:\Users\testing\again\
However, if there is a leading backslash, this will capture it, regardless of whether it occurs four times or not. I have also attempted:
(?=[\\]){4}.*[\\]
However, this again will grab any number of leading backslashes.
^(?:[^\\]*\\){4}
Will grab everything up to the 4th backslash.
^ - Matches start of line
(?: ) - Non-capturing group:
    [^\\]* - Matches any number (including zero) of characters except backslashes
    \\ - Matches literal backslash
{4} - Repeats non-Capturing group 4 times
(.+)(?:\\.*){4}
Will grab everything before the four backslashes. In your case C:
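Both answers can be tried on the path from the question with a short Python sketch. The helper names are hypothetical, and raw strings are used so the backslashes reach the regex engine unchanged.

```python
import re

UP_TO_FOURTH = re.compile(r"^(?:[^\\]*\\){4}")
BEFORE_LAST_FOUR = re.compile(r"(.+)(?:\\.*){4}")


def up_to_fourth_backslash(path):
    """Everything up to and including the 4th backslash, or None."""
    m = UP_TO_FOURTH.match(path)
    return m.group(0) if m else None


def before_last_four_backslashes(path):
    """Everything before the last four backslashes, or None."""
    m = BEFORE_LAST_FOUR.match(path)
    return m.group(1) if m else None


path = r"C:\Users\testing\again\later"
prefix = up_to_fourth_backslash(path)
head = before_last_four_backslashes(path)
print(prefix)  # C:\Users\testing\again\
print(head)    # C:
```

Because (.+) is greedy, the second pattern keeps as much as it can while still leaving four backslashes behind it, so on a deeper path it returns more than just the drive.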

Regex to match specific string + optional space + 8 digits [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 3 years ago.
I need a regular expression to validate strings with the prefix 'CON' followed by an optional space followed by 8 digits.
I've tried various expressions, I got tangled up and now I'm lost.
^(CON+s\?d{8})$
\bCON\b\S?D{8}
The syntax is a bit off.
^(CON\s?\d{8})
( starts a capturing group
CON is exactly matched
\s matches any white space character and the ? makes it optional
\d{8} matches 8 digits
) ends the capturing group
You were pretty close to start with. Hope this helps :)
Keeping in mind that if there is no space, then there shouldn't be 8 more digits:
^CON(\ \d{8})?
If the string you are looking for can be part of a larger string (note that in this case it may be preceded or followed by anything, even other digits):
CON\s?\d{8}
If the string must match in full, use ^$ to designate that:
^CON\s?\d{8}$
You can add variations to it: if, say, you want it to begin or end at a word boundary, use \b to indicate that. If you want it to end in a non-digit, use \D+ at the end instead of $.
Finally, if you want the string to end with an EOL or a non-digit, you may use an expression like this:
CON\s?\d{8}(\D+|$) or the same with a non-capturing group: CON\s?\d{8}(?:\D+|$)
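A quick way to see the difference between the anchored and unanchored forms is a sketch like the following, assuming Python's re module. The test strings are made up for illustration.

```python
import re

FULL = re.compile(r"^CON\s?\d{8}$")   # the whole string must match
EMBEDDED = re.compile(r"CON\s?\d{8}")  # may appear inside a longer string


def is_valid(code):
    """True when code is 'CON', an optional whitespace char, then 8 digits."""
    return FULL.match(code) is not None


for code in ["CON12345678", "CON 12345678", "CON  12345678", "CON123456789"]:
    print(repr(code), is_valid(code))

# The unanchored form finds a hit even when extra characters surround it.
hit = EMBEDDED.search("xCON 123456789")
print(hit.group(0))
```

Note that \s accepts any whitespace character, so a tab between CON and the digits would also pass; a literal space in the pattern would be stricter.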

Regex ignore first x characters and then match pattern

String = '11111111111110000000000000000000110000000000000011111111111111111111111111111111110011111111111110000011110000011111111111110000000000011111111111111111010001111111111111111111110011111111111111111111111111110111112111121111111111111111111000011000001011111111111101022111101111001111111111110000001000000111111111111111000000000000011111111111111100011111111001011111111100000000000000000000000000000000100111001000000000000000000011000000000000001111111000000000000000000000000000000000001111100000000000000000000011000000000000000000000010000000000333333333'
I want a pattern that takes out the 10 characters after the first 100 (positions 100 to 110), and then checks whether that 10-character string contains 4 zeros in a row.
How can I do this with only Regex? I have been using substring before.
You could use this:
^.{100}(?=.{0,6}0000)(.{10})
Explanation:
^: matches the start of the string, so that the pattern cannot match anywhere else in the input
.{100}: match 100 characters
(?= ): look ahead. This does not capture, but just verifies something that is still ahead.
.{0,6}: 0 to 6 characters
0000: literally 4 zeroes
(.{10}): 10 characters, this time they are captured and can be referenced back with \1 or $1 depending on the flavour of regex.
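The above answer works, but its overall match includes the first 100 characters.
To leave the first 100 characters out of the match, we can use a lookbehind anchored to the start of the string:
(?<=^.{100})
To check for the required pattern only in the 10 characters that follow the first 100, we can use
(?<=^.{100})(?=.{0,6}0000)(.{10})
Without the ^ inside the lookbehind, the pattern could match at any offset of 100 or more.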
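The two forms can be compared with a small Python sketch. The input strings are synthetic stand-ins for the long string in the question, and the lookbehind is written here with ^ inside it, since a lookbehind of just .{100} is satisfied at every offset of 100 or more. It assumes Python's re module, which accepts this fixed-width lookbehind.

```python
import re

ANCHORED = re.compile(r"^.{100}(?=.{0,6}0000)(.{10})")
LOOKBEHIND = re.compile(r"(?<=^.{100})(?=.{0,6}0000)(.{10})")
UNANCHORED = re.compile(r"(?<=.{100})(?=.{0,6}0000)(.{10})")

prefix = "1" * 100  # synthetic first 100 characters

hit = prefix + "1100001111" + "1111"       # 0000 inside the window
miss = prefix + "1100011111" + "0000"      # 0000 only after the window
straddle = prefix + "1111111000" + "0111"  # 0000 crosses the window's edge

# Group 1 is the 10-char window; group 0 differs between the two forms.
m1 = ANCHORED.search(hit)
m2 = LOOKBEHIND.search(hit)
print(len(m1.group(0)), m1.group(1))  # whole match includes the prefix
print(len(m2.group(0)), m2.group(1))  # whole match is only the window

# Without ^ in the lookbehind, a later offset can satisfy the pattern.
late = UNANCHORED.search(miss)
print(late.start(), late.group(1))
```

The .{0,6} bound is what keeps the four zeros inside the window: a run starting at offset 7 or later would spill past the tenth character.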

Regex to capture some ID from URL if that URL does not contain banned text

I have the following regex that I created to locate a 10-digit id (ideally it would not consider a set of digits longer than 10, e.g. id=12345678901). After it finds the last set of 10 digits, it should discard everything that comes after it, except when it hits brackets or quotes. In that case it should just stop.
www.site1\.com\/((?!someid\=12345name).)*([0-9]{10})[^\"\'\[\]\n\s]*
However, in examples like the one below, it does not stop at a bracket or quotation mark after the 10-digit number and keeps going until it finds another one:
[URL='http://www.site1.com/path/445-453/L?test=3456&test2=333629710&item=1058371930']Some Title of This URL[/URL]or [URL='http://www.site1.com/path/445-453/L?test=3456&test2=333629710&item=2932475321']Some url title 2[/URL]
See live url for more examples: http://regex101.com/r/pG5fA4/2
FYI - notice that some links have the same parameters with 10-digit ids in them. As it is now, I would like it to select only the last set of 10 digits, as long as it does not look past brackets or quotation marks.
Thanks!
* is a greedy operator. Because of the greedy operator, .* will match all characters (except newline) until it reaches the last set of digits at the very end of the string. Use *? for a non-greedy match. This guarantees that the quantified dot will only match as many characters as needed for the pattern to succeed.
((?!someid\=12345name).)*?([0-9]{10})
                         ^
If you want the set of digits before the last &, ', [ or ], you can use a lookahead.
www\.site1\.com/((?!someid=12345name).)*?([0-9]{10})(?=[\[\]'\s]|&[^&]*\n)
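Here is a minimal Python sketch of the lookahead version run against the sample text from the question. The banned URL is a made-up example to exercise the (?!someid=12345name) guard, and Python's re module is assumed.

```python
import re

ID_PATTERN = re.compile(
    r"www\.site1\.com/((?!someid=12345name).)*?([0-9]{10})"
    r"(?=[\[\]'\s]|&[^&]*\n)"
)

text = (
    "[URL='http://www.site1.com/path/445-453/L?test=3456&test2=333629710"
    "&item=1058371930']Some Title of This URL[/URL]"
    "or [URL='http://www.site1.com/path/445-453/L?test=3456&test2=333629710"
    "&item=2932475321']Some url title 2[/URL]"
)

# Group 2 holds the 10-digit id; group 1 is only the last tempered character.
ids = [m.group(2) for m in ID_PATTERN.finditer(text)]
print(ids)

# Hypothetical URL containing the banned text: the guard blocks the match.
banned = "http://www.site1.com/path?someid=12345name&item=1234567890'"
print(ID_PATTERN.search(banned))
```

The 9-digit test2 value is skipped because [0-9]{10} needs ten consecutive digits, and the lazy quantifier stops at the first id that is followed by a quote or bracket.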