Struggling with regular expression - regex

I'm struggling to find a regular expression that I can use to classify data matching a certain pattern.
Here are a few examples:
pli:06e9b616-5712-d0e9-1bc2-000012e61393
pli:6fdd187d-cbdc-3028-4a8d-000020f3449a
pli:0472def9-ccf3-e4e9-ca05-00005fecf9f8
As you can see, each string begins with pli: and they all follow the same pattern even though the characters differ. The groups of characters are separated by a '-' at the same positions.

Looks like it has the form pli:UUID where UUID is a universally unique identifier. Try this one:
pli:[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}
Where I've allowed upper case letters too.
See http://en.wikipedia.org/wiki/Universally_unique_identifier
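As a quick check, here is a minimal Python sketch that applies the pattern above to the sample strings from the question. The helper name is_pli_id is only an illustration, and fullmatch is used on the assumption that the whole string must have this shape.

```python
import re

# Pattern from the answer above: "pli:" followed by a UUID in 8-4-4-4-12 hex form.
PLI_PATTERN = re.compile(
    r"pli:[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def is_pli_id(text: str) -> bool:
    """Return True only if the entire string has the pli:UUID shape."""
    return PLI_PATTERN.fullmatch(text) is not None


samples = [
    "pli:06e9b616-5712-d0e9-1bc2-000012e61393",
    "pli:6fdd187d-cbdc-3028-4a8d-000020f3449a",
    "pli:0472def9-ccf3-e4e9-ca05-00005fecf9f8",
    "pli:not-a-uuid",  # made-up negative example
]

for sample in samples:
    print(sample, is_pli_id(sample))
```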

This does it in as short an expression as I could think of:
pli:(?i)[\da-f]{8}-([\da-f]{4}-){3}[\da-f]{12}
The (?i) means "ignore case" (saves having to type a-fA-F everywhere), and I've abbreviated the regex by recognising that the middle consists of 3 groups of 4 hex digits, each followed by a hyphen.
See a live demo on rubular
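If you try this in Python rather than Ruby, note that recent Python versions reject a global (?i) that is not at the very start of the pattern. A rough equivalent, sketched below, uses a scoped (?i:...) group so that the pli: prefix stays case-sensitive. The name SHORT_PLI is just an illustration, and the repeated group is made non-capturing.

```python
import re

# "pli:" is matched case-sensitively; the scoped (?i:...) group makes only
# the UUID part case-insensitive, so a-f also covers A-F.
SHORT_PLI = re.compile(r"pli:(?i:[0-9a-f]{8}-(?:[0-9a-f]{4}-){3}[0-9a-f]{12})")

lower = "pli:06e9b616-5712-d0e9-1bc2-000012e61393"
upper = "pli:06E9B616-5712-D0E9-1BC2-000012E61393"

print(SHORT_PLI.fullmatch(lower) is not None)
print(SHORT_PLI.fullmatch(upper) is not None)
```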

Related

How to extract values after a 2nd time repeated symbol using regular expression

For example, I have a value like this:
"ConFst-DxtExBS1-NtrB1".
This is the regular expression I used to extract the value between '-' that is "DxtExBS1":
-(.*?)-
Now I want to write a regular expression that extracts the value after the second '-', that is "NtrB1". I cannot use 1-(.*?)$ because the digit is not static, but the overall pattern will not change.
I think I only need to say:
begin after the second hyphen
then
(.*?)$
Do this:
"-[^-\s]*-([^-\s]*)"
We match every non-whitespace character that is not a - up to the second -, then capture everything after it up to the next - or whitespace, if there is one.
The captured group will have the text present after the second -.
Demo
As per your comments, you can use this:
-[^-]*-(.*)
This matches until the second -, then captures the rest.
Demo
You may try the following regex pattern:
=REGEXP_EXTRACT("ConFst-DxtExBS1-NtrB1", "[^-]+$")
Demo
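For comparison, here is a small Python sketch of two of the patterns above: the capture after the second hyphen, and the "everything after the last hyphen" variant. The function names are made up for illustration. The two approaches only agree when the value contains exactly two hyphens.

```python
import re
from typing import Optional

# Skip to the second hyphen, then capture the rest of the string.
AFTER_SECOND_HYPHEN = re.compile(r"-[^-]*-(.*)")
# Alternative: take the trailing run of non-hyphen characters.
LAST_SEGMENT = re.compile(r"[^-]+$")


def after_second_hyphen(value: str) -> Optional[str]:
    match = AFTER_SECOND_HYPHEN.search(value)
    return match.group(1) if match else None


def last_segment(value: str) -> Optional[str]:
    match = LAST_SEGMENT.search(value)
    return match.group(0) if match else None


value = "ConFst-DxtExBS1-NtrB1"
print(after_second_hyphen(value))
print(last_segment(value))
```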

how to make a regex to validate a username

I've written this regex
/(?=.*[a-z])(?!.*[A-Z])([\w\_\-\.].{3,10})/g
to check the following conditions
>has a minimum of 3 and a maximum of 10 characters.
>must contain at least one lowercase letter.
>must contain only lowercase letters, '_', '-', '.' and digits.
This works but returns true even if there are more than 10 characters.
I would like a new or modified regular expression that checks the conditions above.
Add anchors.
Remove the last dot.
The negative lookahead is useless if you use a correct character class.
This regex will work:
^(?=.*[a-z])[a-z0-9_.-]{3,10}$
Demo & explanation
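A minimal Python sketch of that pattern, checked against the three conditions in the question. The function name is an illustration only. fullmatch is used because $ on its own would also accept a trailing newline in Python.

```python
import re

# Lookahead requires at least one lowercase letter; the character class
# and the {3,10} quantifier enforce the allowed characters and the length.
USERNAME = re.compile(r"^(?=.*[a-z])[a-z0-9_.-]{3,10}$")


def is_valid_username(name: str) -> bool:
    return USERNAME.fullmatch(name) is not None


for candidate in ["usr", "username10", "username100", "12345", "User1", "ab"]:
    print(candidate, is_valid_username(candidate))
```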
You can use this regex (REGEX Demo):
([a-z]{1}[0-9a-z_.-]{2,9})
Tested text:
username77
usr
username10
user_test
usr.1000
There are many ways of doing this. I believe the common characteristic is they will all have a positive lookahead. Here is another.
^(?=.{3,10}$)[a-z\d_.-]*[a-z][a-z\d_.-]*$
Demo
Notice that [a-z\d_.-]* appears twice. Some regex engines support subroutines (or subexpressions) that allow one to save a repeated part of the regex to a numbered or named capture group for reuse later in the string. When using the PCRE engine, for example, you could write
^(?=.{3,10}$)([a-z\d_.-]*)[a-z](?1)$
Demo
(?1) re-runs the regex tokens of capture group 1, ([a-z\d_.-]*), at that point in the pattern, as contrasted with \1, which matches the text that capture group 1 captured. Subroutines can shorten the regex but, more importantly, they reduce the chance of errors when the repeated tokens are changed.
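Python's standard re module does not support (?1). A similar "write it once" effect can be had by composing the pattern from a shared string, as in this sketch. This is plain string reuse, not a PCRE subroutine, and the names NAME_CHARS and is_valid_username are illustrative.

```python
import re

# The repeated sub-pattern lives in one place and is interpolated twice.
NAME_CHARS = r"[a-z\d_.-]*"

# Doubled braces ({{3,10}}) produce a literal {3,10} inside the f-string.
# re.ASCII restricts \d to the digits 0-9.
USERNAME = re.compile(
    rf"(?=.{{3,10}}$){NAME_CHARS}[a-z]{NAME_CHARS}",
    re.ASCII,
)


def is_valid_username(name: str) -> bool:
    return USERNAME.fullmatch(name) is not None


print(USERNAME.pattern)
print(is_valid_username("usr.1000"))
```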

Regular Expression allow only numbers, commas and dashes

I'm trying to come up with a Data Annotation regular expression to match the following formats.
34
38-30
100,25-30
4-5,5,1-5
Basically, the expression should only allow numbers, - (dash) and , (comma) in any order.
I tried the following but couldn't get it working.
[RegularExpression(@"(0-9 .&'-,]+)", ErrorMessage ="Lot numbers are invalid.")]
It's ^[0-9,-]*$. Check out this demo.
I think your use case is having a CSV list of numbers, or ranges of numbers (identified as a number followed by a dash followed by another number). We can use the following regex:
[0-9]+(?:-[0-9]+)?(,[0-9]+(?:-[0-9]+)?)*
This regex matches a number, optionally followed by a dash and another number; that term may then be followed by a comma and another term of the same form, any number of times.
In the demo below I added anchors on both sides of the regex. Whether you need to do this depends on how you plan to use the pattern.
Demo
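To see how the two answers differ, here is a short Python sketch that runs both patterns against the question's samples and a few malformed inputs. The negative inputs and the variable names are made up for illustration. fullmatch plays the role of the anchors.

```python
import re

# Permissive: any mix of digits, commas and dashes (including none at all).
ANY_ORDER = re.compile(r"[0-9,-]*")
# Structured: comma-separated numbers or number ranges.
LOT_LIST = re.compile(r"[0-9]+(?:-[0-9]+)?(?:,[0-9]+(?:-[0-9]+)?)*")

inputs = ["34", "38-30", "100,25-30", "4-5,5,1-5", "1,,2", "-5", "1-2-3"]

for text in inputs:
    loose = ANY_ORDER.fullmatch(text) is not None
    strict = LOT_LIST.fullmatch(text) is not None
    print(f"{text!r}: any-order={loose}, structured={strict}")
```

Which one is appropriate depends on whether inputs such as "1,,2" should be rejected.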

Regular expression to correct email address

I need help writing a regular expression that removes unwanted characters at the start and end of an email address. For example:
z>user1@hotmail.com<kt
z>user2@hotmail.pk<kt
z>puser3@yahoo.com<kt
z>npuser4@yaoo.uk<kt
After applying the regular expression, my emails should look like:
user1@hotmail.com
user2@hotmail.pk
puser3@yahoo.com
npuser4@yaoo.uk
The regular expression should not be applied if the email address is already correct.
You can try deleting matches of
^[^>]*>|<[^>]*$
(demo)
Debuggex Demo
Find ^[^>]*>([^<]*)<*.*$ and replace it with \1
Here's an example on regex101
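Both suggestions above can be tried with re.sub in Python. This is a minimal sketch with illustrative function names. An address without the surrounding junk passes through unchanged in both cases, because neither pattern matches it.

```python
import re

# Approach 1: delete a leading "...>" and a trailing "<...".
JUNK = re.compile(r"^[^>]*>|<[^>]*$")
# Approach 2: match the whole string and keep only the captured middle.
WRAPPED = re.compile(r"^[^>]*>([^<]*)<*.*$")


def strip_junk(raw: str) -> str:
    return JUNK.sub("", raw)


def keep_middle(raw: str) -> str:
    return WRAPPED.sub(r"\1", raw)


for raw in ["z>user1@hotmail.com<kt", "z>npuser4@yaoo.uk<kt", "user2@hotmail.pk"]:
    print(raw, "->", strip_junk(raw), "|", keep_middle(raw))
```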
I think you might be missing the point of a regular expression slightly. A regular expression defines the 'shape' of a string and returns whether or not the string conforms to that shape. A simple expression for an email address might be something like:
[a-zA-Z0-9]*\.?[a-zA-Z0-9]+@[a-zA-Z0-9]*\.[a-z]+
But it is not simple to write one catch-all regular expression for an email address. Really, what you need to do to check it properly is:
Ensure there is one and only one '@' sign.
Check that the part before the @ sign conforms to a regular expression for this part:
Characters
Digits
Extended characters: .-'_ (that list may not be complete)
Check that the part after the @ sign conforms to the regex for domain names:
Characters
Digits
Extended characters: . -
Must start with a character or digit and must end with a proper domain name ending.
Try using a capturing group on anything between the characters you don't want. For example,
/>([\w\d]+@[\w\d]+\.\w+)</
Basically, any part that the regexp inside () matches is saved in a capturing group. This one matches anything inside >here< that starts with a run of word or digit characters, has an @, then one or more word or digit characters, then a period, then some word characters. It should match simple addresses like the ones in your examples.
If you need characters besides >< to be matched, make a character class. That's what those square bracketed bits are. If you replace > with [.,></?;:'"] it'll match any of those characters.
Demo (Look at the match groups)

Regex - how to match everything except a particular pattern

How do I write a regex to match any string that doesn't meet a particular pattern? I'm faced with a situation where I have to match an (A and ~B) pattern.
You could use a look-ahead assertion:
(?!999)\d{3}
This example matches three digits other than 999.
But if you happen not to have a regular expression implementation with this feature (see Comparison of Regular Expression Flavors), you probably have to build a regular expression with the basic features on your own.
A compatible regular expression with basic syntax only would be:
[0-8]\d\d|\d[0-8]\d|\d\d[0-8]
This also matches any three-digit sequence that is not 999.
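The equivalence of the two patterns can be checked exhaustively over all three-digit strings. This Python sketch does so, with variable names chosen for illustration.

```python
import re

LOOKAHEAD = re.compile(r"(?!999)\d{3}")
BASIC = re.compile(r"[0-8]\d\d|\d[0-8]\d|\d\d[0-8]")

# Every string from "000" to "999".
three_digit = [f"{n:03d}" for n in range(1000)]

lookahead_hits = {s for s in three_digit if LOOKAHEAD.fullmatch(s)}
basic_hits = {s for s in three_digit if BASIC.fullmatch(s)}

print(lookahead_hits == basic_hits)
print(sorted(set(three_digit) - basic_hits))
```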
Suppose you want to match lines that contain a word A but not a word B. For example, given this text:
1. I have a two pets - dog and a cat
2. I have a pet - dog
To find the lines that mention a dog as a pet and don't mention a cat, you can use this regular expression:
^(?=.*?\bdog\b)((?!cat).)*$
It will find only the second line:
2. I have a pet - dog
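Here is the same check as a small Python sketch, using the two sample lines from above. The variable names are illustrative.

```python
import re

# The lookahead requires the word "dog"; the tempered dot ((?!cat).)*
# refuses to step over any position where "cat" begins.
DOG_NOT_CAT = re.compile(r"^(?=.*?\bdog\b)((?!cat).)*$")

lines = [
    "1. I have a two pets - dog and a cat",
    "2. I have a pet - dog",
]

matching = [line for line in lines if DOG_NOT_CAT.match(line)]
print(matching)
```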
Match against the pattern and use the host language to invert the boolean result of the match. This will be much more legible and maintainable.
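A minimal Python sketch of that idea: two simple searches combined with ordinary boolean logic. The function name and the sample lines are hypothetical.

```python
import re


def matches_a_not_b(text: str, pattern_a: str, pattern_b: str) -> bool:
    """True when pattern_a is found in text and pattern_b is not."""
    has_a = re.search(pattern_a, text) is not None
    has_b = re.search(pattern_b, text) is not None
    return has_a and not has_b


lines = [
    "I have two pets - a dog and a cat",
    "I have a pet - dog",
    "I have no pets",
]

kept = [line for line in lines if matches_a_not_b(line, r"\bdog\b", r"\bcat\b")]
print(kept)
```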
notnot, resurrecting this ancient question because it had a simple solution that wasn't mentioned. (Found your question while doing some research for a regex bounty quest.)
I'm faced with a situation where I have to match an (A and ~B) pattern.
The basic regex for this is frighteningly simple: B|(A)
You just ignore the overall matches and examine the Group 1 captures, which will contain A.
An example (with all the disclaimers about parsing HTML with regex): A is digits, B is digits within an <a> tag
The regex: <a.*?<\/a>|(\d+)
Demo (look at Group 1 in the lower right pane)
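The B|(A) trick can be sketched in Python as follows. The sample HTML string is made up for illustration. Matches where group 1 is empty are the B cases and are simply discarded.

```python
import re

# Left alternative (B) swallows whole <a>...</a> elements without capturing;
# right alternative (A) captures digit runs that appear anywhere else.
SKIP_LINKS = re.compile(r"<a.*?</a>|(\d+)")

html = 'call 555 or see <a href="/page/42">item 7</a> and 99'

digits = [
    match.group(1)
    for match in SKIP_LINKS.finditer(html)
    if match.group(1) is not None
]
print(digits)
```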
Reference
How to match pattern except in situations s1, s2, s3
How to match a pattern unless...
The complement of a regular language is also a regular language, but to construct it you have to build the DFA for the regular language and swap its accepting and non-accepting states. See this for an example. What the page doesn't say is that it converted /(ac|bd)/ into /(a[^c]?|b[^d]?|[^ab])/. The conversion from a DFA back to a regular expression is not trivial. It is easier if you can use the regular expression unchanged and change the semantics in code, as suggested before.
If the pattern is re, then
str.split(/re/g)
will return everything except the pattern.
Test here
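The snippet above is JavaScript. A rough Python counterpart uses re.split, as in this sketch. The sample text and the percentage pattern are assumptions chosen for illustration. Note that re.split keeps empty strings at the edges, and that any capturing groups in the pattern would be included in the result.

```python
import re

text = "50% of 50% is 25%"

# Split on the pattern we want to exclude; what remains is everything else.
pieces = re.split(r"\d+%", text)
print(pieces)

# Drop the empty strings produced where a match sits at the start or end.
non_empty = [piece for piece in pieces if piece]
print(non_empty)
```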
My answer here might solve your problem as well:
https://stackoverflow.com/a/27967674/543814
Instead of Replace, you would use Match.
Instead of group $1, you would read group $2.
Group $2 was made non-capturing there, which you would avoid.
Example:
Regex.Match("50% of 50% is 25%", @"(\d+\%)|(.+?)");
The first capturing group specifies the pattern that you wish to avoid. The last capturing group captures everything else. Simply read out that group, $2.
(B)|(A)
then use what group 2 captures...