Regex to find any kind of UUID or randomly generated text

I want to find any kind of UUID or randomly generated text in a URL path and replace it with <random>. Examples:
/test/ajs1d5haFkajs1dhasdd2as345sdAS3+Ddas9 = /test/<random>
/test/akKd9Ja3/ajs1d5haFkajs1ddasd623ha5sdAS3Ddas9=/30 = /test/<random>/<random>/30
/test/akKd9Ja3/Example-ASDAdddasd-108174.js = /test/<random>/Example-108174.js.
/test/akKd9Ja3-ASj83asj-dask92qwe_ke = /test/<random>
I'm looking for a solution that will match a string that:
starts with / AND
ends with / or $ AND
contains [0-9] AND
contains [a-z] OR [A-Z]
CAN contain -, =, _, +, \s (space)
DOES NOT contain an extension, i.e. .<something>
is 7 characters or longer: {7,}
This is what I have used so far:
/[a-zA-Z0-9-=_+\s]{30,}
This works for most cases, since UUIDs are often longer than 30 characters. But it doesn't catch the short ones, e.g. /5c88148/ or /6qdkKdk5/. It also matches things like Example-ASDAddasd-108174.js.

Update: if the match must contain at least one digit, you can use this:
(?<=\/)(?=[\w-+=\s]+[0-9])[\w-+=\s]{7,}(?!\.)(?=\/|\n)
You can try this.
(?<=\/)[\w-+=\s]{7,}(?!\.)(?=\/|\n)
Explanation
(?<=\/) - Positive lookbehind. Asserts that the match is immediately preceded by '/'.
[\w-+=\s]{7,} - Matches 7 or more word characters, '-', '+', '=' or whitespace characters.
(?!\.) - Negative lookahead. Asserts that the next character is not '.'.
(?=\/|\n) - Positive lookahead. Asserts that the next character is '/' or '\n' (newline).
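As a rough illustration of the replacement step the question asks for, here is a minimal JavaScript sketch built on the digit-requiring pattern. The helper name maskRandomSegments is hypothetical. Two details differ from the answer and are assumptions of this sketch: the hyphen is placed last in the character class so it cannot be read as a range, and $ is used instead of \n so that a segment at the very end of the string is also caught.

```javascript
// Sketch: mask URL path segments that look randomly generated.
// Assumptions (not from the original answer): the helper name, `$` in place
// of `\n`, and the hyphen placed last in the character class.
const RANDOM_SEGMENT = /(?<=\/)(?=[\w+=\s-]*\d)[\w+=\s-]{7,}(?=\/|$)/g;

function maskRandomSegments(path) {
  // A segment followed by '.' never matches: '.' is not in the class and the
  // final lookahead only accepts '/' or the end of the string.
  return path.replace(RANDOM_SEGMENT, "<random>");
}

console.log(maskRandomSegments("/test/akKd9Ja3/ajs1d5haFkajs1ddasd623ha5sdAS3Ddas9=/30"));
console.log(maskRandomSegments("/test/akKd9Ja3/Example-ASDAdddasd-108174.js"));
console.log(maskRandomSegments("/5c88148/"));
```

Note that a segment made only of digits (7 or more) is also masked by this pattern. If a letter must be present as well, a second lookahead such as (?=[\w+=\s-]*[A-Za-z]) could be added.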

Related

How to find digits in String by regular expression? [duplicate]

I would like to match positive and negative numbers (no decimal or thousand separators) inside a string using .NET, but I want to match whole words only.
So if a string looks like
redeem: -1234
paid: 234432
then I'd like to match -1234 and 234432
But if text is
LS022-1234-5678
FA123245
then I want no match returned. I tried
\b\-?\d+\b
but it will only match 1234 in the first scenario, not returning the "-" sign.
Any help is appreciated. Thank you.
Well, I'm sure this is far from perfect, but it works for your examples:
(?<=\W)-?(?<!\w-)\d+
If you want to allow underscores just before the number, then I'd use this modification:
(?i)(?<=[^a-z0-9])-?(?<![a-z0-9]-)\d+
Let me know of any issues and I'll try and help. If you'd like me to explain either of them, let me know that too.
EDIT
To only match if there is a space or tab just before the number / negative sign (as noted in the comment below), this could be used:
(?<=[ \t])-?\d+
Note that it will match e.g. on the first number series of a telephone number, time or date value, and will not match if the number is at the beginning of the line (after a newline) - make sure this is what you intend :D
There is no word boundary between a space and -, thus you can't use \b there.
You could use:
(?<!\S)-?\d+\b
or
(?<![\w-])-?\d+\b
depending on your requirements (which aren't fully specified).
Both will work for your examples tho.
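To make the difference concrete, here is a small JavaScript sketch (the question is about .NET, but these particular patterns behave the same way in JavaScript engines that support lookbehind). The findAll helper and the sample text layout are assumptions made for illustration.

```javascript
// Sketch: compare the asker's pattern with the two lookbehind-based ones.
// `findAll` is a hypothetical helper, not part of any answer.
const text = "redeem: -1234\npaid: 234432\nLS022-1234-5678\nFA123245";

function findAll(pattern, input) {
  // String.prototype.match returns null when a global regex finds nothing.
  return input.match(pattern) || [];
}

// \b cannot match between a space and '-', so the sign is lost.
const naive = findAll(/\b-?\d+\b/g, "redeem: -1234");
// Number must start at the beginning of the input or right after whitespace.
const afterWhitespace = findAll(/(?<!\S)-?\d+\b/g, text);
// Number must not be preceded by a word character or a hyphen.
const notAfterWordOrHyphen = findAll(/(?<![\w-])-?\d+\b/g, text);

console.log(naive);
console.log(afterWhitespace);
console.log(notAfterWordOrHyphen);
```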
The \b-?\d+\b pattern is wrong because \b before an optional -? pattern will require a word char to appear immediately to the left of the hyphen. In general, do not use word boundaries next to optional patterns (unless you know what you are doing of course).
You might use -?\b\d+\b to match 123 or -123 like numbers as whole words. However, here, you are looking for something a bit different, because the 1234 and 5678 are whole words inside LS022-1234-5678 since they are enclosed with non-word chars (namely, a hyphen).
In this case, you need to extend whole word matching \b with extra lookbehind check on the left:
-?\b(?<!\d-)\d+\b
Details:
-? - an optional hyphen
\b - a word boundary
(?<!\d-) - a negative lookbehind that fails the match if there is a digit + - immediately to the left of the current location.
\d+ - one or more digits
\b - a word boundary.
See the C# demo:
using System;
using System.Linq;
using System.Text.RegularExpressions;

public class Test
{
    public static void Main()
    {
        var text = "LS022-1234-5678, FA123245, redeem: -1234, paid: 234432";
        // Verbatim string (@"...") so the backslashes reach the regex engine as written.
        var matches = Regex.Matches(text, @"-?\b(?<!\d-)\d+\b")
            .Cast<Match>()
            .Select(x => x.Value)
            .ToList();
        foreach (var s in matches)
            Console.WriteLine(s);
    }
}
Output:
-1234
234432

.NET regex to look ahead and eliminate strings in advance that don't contain certain characters

I am using the .NET flavor of regex.
Suppose I have the string 123456789AB
and I want to match AB (it could be any two capital letters) only if the numeric part of the string (123456789) contains both 5 and 8.
So what I came up with was
(?=5)(?=8)([A-Z]{2})
But this is not working.
After some trial and error on RegexStorm
I got to
(?=(.*5))(?=(.*8))[A-Z]{2}
What I am expecting is that it will start matching from the start of the string, as a lookahead does not consume any characters.
But the part "[A-Z]{2}" does not move ahead to match AB in the input string.
My question is: why is that so?
I know that replacing it with .*[A-Z]{2} will make it move ahead, but then the match contains the entire string.
What is the solution in this case, other than putting the letter part ([A-Z]{2}) in a separate group and then capturing only that group?
Lookaheads check for a pattern match immediately to the right of the current position in the string. (?=(.*5))(?=(.*8)) matches a location that is followed by any 0 or more chars other than line break chars, as many as possible, and then 5; then, at the same position, a similar check is performed that requires 8 after any zero or more chars. Since lookaheads consume nothing, [A-Z]{2} is tried at that very same location, and at the location where AB starts there is no 5 or 8 left to the right, so the lookaheads fail there.
You may use as many lookbehinds as there are required substrings before the two letters:
(?s)(?<=5.*?)(?<=8.*?)[A-Z]{2}
Details
(?s) - makes the . match newline characters, too
(?<=5.*?) - a location that is immediately preceded with 5 and then 0 or more chars as few as possible
(?<=8.*?) - a location that is immediately preceded with 8 and then 0 or more chars as few as possible
[A-Z]{2} - two ASCII uppercase letters.
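The behaviour can be checked outside .NET as well. The following is a minimal JavaScript sketch, assuming an engine with lookbehind support; JavaScript has no inline (?s), so the s flag is used instead, and the helper name lettersAfter5And8 is hypothetical.

```javascript
// Sketch: JavaScript port of the lookbehind pattern from the answer.
// The `s` flag stands in for the inline (?s) modifier used in .NET.
const TWO_LETTERS_AFTER_5_AND_8 = /(?<=5.*?)(?<=8.*?)[A-Z]{2}/s;

function lettersAfter5And8(input) {
  // Returns the two letters, or null when 5 or 8 is missing before them.
  const m = input.match(TWO_LETTERS_AFTER_5_AND_8);
  return m ? m[0] : null;
}

console.log(lettersAfter5And8("123456789AB")); // "AB"
console.log(lettersAfter5And8("12346789AB"));  // null, there is no 5

// The lookahead-based attempt from the question never reaches AB:
const lookaheadAttempt = /(?=.*5)(?=.*8)[A-Z]{2}/;
console.log(lookaheadAttempt.test("123456789AB")); // false
```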
An alternative would be to "unfold" what you expect to match using exclusionary character classes and alternation of match order. Not pretty, but pretty fast:
(?<=\b[^58]*?(?:5[^8]*8|8[^5]*5)[^A-Z]*?)[A-Z]{2}

Regular Expression to extract alphanumeric parts of a URL?

Given any URL, like:
https://stackoverflow.com/v1/summary/1243PQ/details/P1/9981
How do I extract the numeric or alphanumeric part of the URL? I.e. the following strings from the url given above:
1. v1
2. 1243PQ
3. P1
4. 9981
To rephrase, a regex to extract strings from a string (URL) which have at least 1 digit and 0 or more alphabet characters, separated by '/'.
I tried to capture a repeating group with (^[a-zA-Z0-9]+)+ and ([a-zA-Z]{0,100}[0-9]{1,100})+ but neither worked. In hindsight, intuition does say this shouldn't work. I am unsure how to match patterns over a group and not just a single character.
If I understand what you really want:
extracting parts that contain only digits, or digits with letters before or after them,
then I can suggest this regex:
\b[a-zA-Z]*[0-9]+[a-zA-Z]*\b
I use \b to assert a word boundary at each end of a part.
Since digits are required and letters can come before or after them, I use the regex above.
If letters and digits may be interleaved (e.g. 1a2b), then I can suggest this regex:
\b[a-zA-Z0-9]*[0-9]+[a-zA-Z0-9]*\b
I believe this should work for you:
(\d*\w+\d+\w*)
EDIT: actually, this should be sufficient
(\w+\d+\w*)
or
(\w*\d+\w*)
Well, you could do this:
(\w*\d+\w*) with the g (global) regex option
On the example URL, it would look like this:
const regex = /(\w*\d+\w*)/g;
const url = 'https://stackoverflow.com/v1/summary/1243PQ/details/P1/9981';
console.log(url.match(regex))
Try \/[a-zA-Z]*\d+[a-zA-Z0-9]*
Explanation:
\/ - match / literally
[a-zA-Z]* - 0+ letters
\d+ - 1+ digits; this is what requires at least one digit
[a-zA-Z0-9]* - 0+ letters or digits
It will capture the leading / together with the match, so you need to trim it.
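One way to avoid the trimming step is to wrap everything after the slash in a capture group and read only that group. A minimal JavaScript sketch, where the function name extractParts is an assumption made for illustration:

```javascript
// Sketch: same pattern as above, with a capture group around the part after '/'.
const SEGMENT_WITH_DIGIT = /\/([a-zA-Z]*\d+[a-zA-Z0-9]*)/g;

function extractParts(url) {
  // matchAll yields one match object per hit; index 1 is the capture group.
  return Array.from(url.matchAll(SEGMENT_WITH_DIGIT), (m) => m[1]);
}

const url = "https://stackoverflow.com/v1/summary/1243PQ/details/P1/9981";
console.log(extractParts(url)); // [ 'v1', '1243PQ', 'P1', '9981' ]
```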

Need a regex for ONLY Alphanumeric (no pure numbers or letters) AND limit to exactly 10 characters?

I've run into some issues with this one and cannot find it in past questions.
Criteria:
Reject pure digits
Reject pure letters
Reject any symbols
Accept ONLY Alphanumeric combo
MUST be equal to 10 characters total
Here is what I have made and the problems with each:
^(?!^\d*$)[a-zA-Z\d]{10}$
This fails criteria #2
^[a-zA-Z0-9]{10}$
This fails criteria #1
I have tried some others that meet all criteria but fail the 10 char limit.
Any help is appreciated.
You may use a second lookahead:
^(?!\d+$)(?![a-zA-Z]+$)[a-zA-Z\d]{10}$
Details
^ - start of string
(?!\d+$) - a negative lookahead that makes sure the whole string is not composed of just digits
(?![a-zA-Z]+$) - the whole string cannot be all letters
[a-zA-Z\d]{10} - 10 letters or digits
$ - end of string.
Try this:
(?=^.{10}$)^([a-z]+\d[a-z0-9]*|\d+[a-z][a-z0-9]*)$
Explanation:
(?=^.{10}$)^([a-z]+\d[a-z0-9]*|\d+[a-z][a-z0-9]*)$
(?=^.{10}$)           # exactly 10 characters follow
^(       |       )$   # we match the entire string, containing either:
[a-z]+\d[a-z0-9]*     # letters, followed by a digit, followed by alphanumerics, or
\d+[a-z][a-z0-9]*     # digits, followed by a letter, followed by alphanumerics
Note that this pattern only accepts lowercase letters unless it is used with a case-insensitive option.
Use lookahead to find at least one char of each type you require, and specify the length and char limitation in the "regular" part of your regex:
^(?=.*[a-zA-Z])(?=.*\d)[0-9a-zA-Z]{10}$
(?=.*[a-zA-Z]) - look ahead and find a letter
(?=.*\d) - look ahead and find a digit
[0-9a-zA-Z]{10} - exactly 10 digit/letter chars
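A quick way to sanity-check the criteria is to run a few sample strings through the patterns. The sketch below is JavaScript; the sample values and the helper name isMixedAlnum10 are made up for illustration. It also runs the two-negative-lookahead pattern from the earlier answer to show that both agree on these inputs.

```javascript
// Sketch: validate "exactly 10 alphanumerics, not all digits, not all letters".
const WITH_POSITIVE_LOOKAHEADS = /^(?=.*[a-zA-Z])(?=.*\d)[0-9a-zA-Z]{10}$/;
const WITH_NEGATIVE_LOOKAHEADS = /^(?!\d+$)(?![a-zA-Z]+$)[a-zA-Z\d]{10}$/;

function isMixedAlnum10(value) {
  return WITH_POSITIVE_LOOKAHEADS.test(value);
}

// Hypothetical sample inputs: mixed, pure digits, pure letters,
// too short, contains a symbol, too long.
const samples = ["abc1234567", "1234567890", "abcdefghij", "abc123", "abc12345!7", "abc12345678"];
for (const s of samples) {
  console.log(s, isMixedAlnum10(s), WITH_NEGATIVE_LOOKAHEADS.test(s));
}
```

Only the first sample is accepted by either pattern.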

Regex to match number(s) or UUID

I need regex which loosely matches UUIDs and numbers. I expect my filename to be formatted like:
results_SOMETHING.csv
This something ideally should be a number (a count of how many times a script has run) or a UUID.
This regex encompasses a huge set of filenames:
^results_?.*.csv$
and this one:
^results_?[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}.csv$
matches only UUIDs. I want a regex whose range is somewhere in between. Mostly I don't want matches like result__123.csv.
Note: This doesn't directly answer the OP question, but given the title, it will appear in searches.
Here's a proper regex to match a uuid based on this format without the hex character constraint:
(\w{8}(-\w{4}){3}-\w{12}?)
If you want it to match only hex characters, use:
/([a-f\d]{8}(-[a-f\d]{4}){3}-[a-f\d]{12}?)/i
(Note the / delimiters used in Javascript and the /i flag to denote case-insensitivity; depending on your language, you may need to write this differently, but you definitely want to handle both lower and upper case letters).
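For instance, a minimal JavaScript sketch of the hex-only variant might look like this. The helper name findUuid is hypothetical, and the sketch drops the trailing ? and the capturing groups from the pattern above; neither changes what is matched, but that simplification is an assumption of this example rather than part of the answer.

```javascript
// Sketch: find a UUID-shaped run of hex characters, case-insensitively.
const UUID_HEX = /[a-f\d]{8}(?:-[a-f\d]{4}){3}-[a-f\d]{12}/i;

function findUuid(text) {
  // Returns the first UUID-shaped substring, or null if there is none.
  const m = text.match(UUID_HEX);
  return m ? m[0] : null;
}

console.log(findUuid("results_0A0B0C0D-884F-0099-AA95-1234567890AB.csv"));
console.log(findUuid("results_123.csv")); // null
```

Because the pattern is not anchored, it finds a UUID anywhere inside a longer string, which is why the filename prefix and suffix do not get in the way here.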
If you're prepending results_ and appending .csv to it, that would look like:
^results_([a-z\d]{8}(-[a-z\d]{4}){3}-[a-z\d]{12}?)\.csv$
-----EDITED / UPDATED-----
Based on the comments you left, there are some other patterns you want to match (this was not clear to me from the question). This makes it a little more challenging - to summarize my current understanding:
results.csv - match (NEW)
results_1A.csv - match (NEW)
results_ABC.csv - ? no match (I assume)
result__123.csv - no match
results_123.csv - match
Results_123.cvs - ? no match
results_0a0b0c0d-884f-0099-aa95-1234567890ab.csv - match
You will find the following modification works according to the above "specification":
results(?:_(?:[0-9a-f]{8}-(?:[0-9a-f]{4}-){3}[0-9a-f]{12}|(?=.*[0-9])[A-Z0-9]+))?\.csv
Breaking it down:
results                  matches the characters "results" literally
(?:_ ….)?                non-capturing group, repeated zero or one time:
                         "this is either there, or there is nothing"
[0-9a-f]{8}-             exactly 8 characters from the group [0-9a-f],
                         followed by a hyphen "-"
(?:[0-9a-f]{4}-){3}      ditto, but a group of 4, repeated three times
[0-9a-f]{12}             ditto, but a group of 12
|                        OR...
(?=.*[0-9])              at least one digit somewhere after this point
[A-Z0-9]+                one or more capital letters or digits
\.csv                    the literal string ".csv" (the '.' has to be escaped)
demonstration on regex101.com
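The specification list above can also be turned into a small self-check. The following JavaScript sketch wraps the pattern in ^ and $ (an assumption carried over from the patterns in the question; the answer's pattern is unanchored) and compares each filename with the expected outcome. The names RESULTS_FILE and expectations are hypothetical.

```javascript
// Sketch: check the answer's pattern against the listed filenames.
// The ^ and $ anchors are an assumption added for whole-name matching.
const RESULTS_FILE =
  /^results(?:_(?:[0-9a-f]{8}-(?:[0-9a-f]{4}-){3}[0-9a-f]{12}|(?=.*[0-9])[A-Z0-9]+))?\.csv$/;

// Expected outcomes copied from the list above.
const expectations = {
  "results.csv": true,
  "results_1A.csv": true,
  "results_ABC.csv": false,
  "result__123.csv": false,
  "results_123.csv": true,
  "Results_123.cvs": false,
  "results_0a0b0c0d-884f-0099-aa95-1234567890ab.csv": true,
};

for (const [name, expected] of Object.entries(expectations)) {
  const actual = RESULTS_FILE.test(name);
  console.log(name, actual === expected ? "ok" : "MISMATCH");
}
```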