Regular expression to correct email address - regex

I need help writing a regular expression that removes unwanted characters from the start and end of an email address. For example:
z>user1#hotmail.com<kt
z>user2#hotmail.pk<kt
z>puser3#yahoo.com<kt
z>npuser4#yaoo.uk<kt
After applying the regular expression, my emails should look like:
user1#hotmail.com
user2#hotmail.pk
puser3#yahoo.com
npuser4#yaoo.uk
The regular expression should not change an email address that is already correct.

You can try deleting matches of
^[^>]*>|<[^>]*$
(demo)
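As a minimal sketch in Python (the language and the helper name strip_wrappers are assumptions, not something the question specifies), deleting every match is the same as substituting the empty string:

```python
import re

# Matches either everything up to and including the first '>' at the start,
# or a '<' and everything after it at the end of the string.
WRAPPER = re.compile(r"^[^>]*>|<[^>]*$")

def strip_wrappers(text):
    """Delete the leading 'junk>' and trailing '<junk' parts, if present."""
    return WRAPPER.sub("", text)

for raw in ["z>user1#hotmail.com<kt", "user2#hotmail.pk"]:
    print(strip_wrappers(raw))
```

An address with no > or < in it contains nothing for the pattern to match, so it comes back unchanged.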

Find ^[^>]*>([^<]*)<*.*$ and replace it with \1
Here's an example on regex101
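For illustration, here is roughly how that find-and-replace could look in Python. The function name extract_address is made up for this sketch; \1 refers to the text captured by the parentheses.

```python
import re

# Skip everything up to the first '>', capture up to the next '<',
# then swallow whatever is left on the line.
PATTERN = re.compile(r"^[^>]*>([^<]*)<*.*$")

def extract_address(text):
    # Replace the whole match with just the captured group.
    return PATTERN.sub(r"\1", text)

print(extract_address("z>puser3#yahoo.com<kt"))
```

Because the pattern requires a >, an address without the surrounding junk does not match at all and is left as it is.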

I think you might be missing the point of a regular expression slightly. A regular expression defines the 'shape' of a string and tells you whether or not a string conforms to that shape. A simple expression for an email address might be something like:
[a-zA-Z0-9]*\.?[a-zA-Z0-9]+#[a-zA-Z0-9]+\.[a-z]+
But it is not simple to write one catch-all regular expression for an email address. Really, what you need to do to check it properly is:
1. Ensure there is one and only one '#'-sign.
2. Check that the part before the at sign conforms to a regular expression for this part:
   - Characters
   - Digits
   - Extended characters: .-'_ (that list may not be complete)
3. Check that the part after the #-sign conforms to the reg-ex for domain names:
   - Characters
   - Digits
   - Extended characters: . -
   - Must start with character or digit and must end with a proper domain name ending.
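The checks above could be sketched in Python along these lines. This is only an illustration under assumptions: the names are invented, the separator is '#' because that is how the addresses in this thread are written, and "a proper domain name ending" is approximated as two or more letters after the last dot.

```python
import re

SEPARATOR = "#"  # the thread's examples use '#' between name and domain

# Name part: letters, digits and the extended characters . - ' _
LOCAL_RE = re.compile(r"[A-Za-z0-9.\-'_]+")

# Domain part: starts with a letter or digit, may contain . and -,
# and ends with a dot followed by at least two letters (an assumption).
DOMAIN_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9.-]*[A-Za-z0-9])?\.[A-Za-z]{2,}")

def looks_like_address(text):
    # Step 1: exactly one separator.
    if text.count(SEPARATOR) != 1:
        return False
    local, domain = text.split(SEPARATOR)
    # Steps 2 and 3: each part must match its own pattern in full.
    return bool(LOCAL_RE.fullmatch(local)) and bool(DOMAIN_RE.fullmatch(domain))

print(looks_like_address("user1#hotmail.com"))
print(looks_like_address("z>user1#hotmail.com<kt"))
```

Splitting first and checking each part separately keeps the two patterns short, which is the point of doing it in steps rather than with one catch-all expression.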

Try using a capturing group on anything between the characters you don't want. For example,
/>(\w+#\w+\.\w+)</
Basically, any part that the regexp inside () matches is saved in a capturing group. This one matches anything inside >here< that starts with one or more word characters (letters, digits, or underscores), has a #, then one or more word characters, then a period, then some more word characters. That covers simple addresses like the examples above, but not every valid email address: dots or hyphens in the name, or multi-part domains, need a wider character class.
If you need characters besides >< to be matched, make a character class. That's what those square bracketed bits are. If you replace > with [.,></?;:'"] it'll match any of those characters.
Demo (Look at the match groups)
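A rough Python sketch of the character-class idea, with the dot in the address part escaped. The names DELIMITERS and find_address are hypothetical, and the address part only covers simple addresses like the ones in the question.

```python
import re

# Any one of these characters may sit directly before or after the address.
DELIMITERS = "[" + re.escape(".,></?;:'\"") + "]"
ADDRESS = r"(\w+#\w+\.\w+)"  # the capturing group
PATTERN = re.compile(DELIMITERS + ADDRESS + DELIMITERS)

def find_address(text):
    match = PATTERN.search(text)
    # group(1) is whatever the parentheses captured.
    return match.group(1) if match else None

print(find_address("z>user1#hotmail.com<kt"))
print(find_address(";user2#hotmail.pk,"))
```

Note that this approach finds nothing when the address has no delimiter on either side, so an already clean address would have to be handled separately.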

Related

regular expression which can treat a string containing '#' as illegal input

I wrote a regular expression (https?:\/\/)+([a-x]*)?.[a-z]*.(com|io|cn|net) that can achieve:
Must start with http or https
Must end with com,cn,io or net
Domain names can only consist of numbers, letters, and underscores
Subdomain can be empty
A correct input would be 'http://123.cn' or 'https://www.123.cn',
but it also accepts 'http://ww#.123.com' as correct.
I wonder what's wrong with my expression and how to reject input containing '#'.
If you use a RegEx tester online (like regex101.com) it will tell you that it's matching because the . is not escaped as \. so it will match the # character.
Try: ^(https?:\/\/)([a-z0-9_]*\.)?[a-z0-9_]*\.(com|io|cn|net)$ and you may get what you're looking for.
Note your original RegEx did not include digits or the underscore in the domain names.
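A quick way to check the suggested expression is a small Python sketch like the one below (the function name is_allowed_url is made up for the example):

```python
import re

# The suggested pattern: anchored at both ends, dots escaped,
# digits and underscore allowed in the name parts.
URL = re.compile(r"^(https?:\/\/)([a-z0-9_]*\.)?[a-z0-9_]*\.(com|io|cn|net)$")

def is_allowed_url(text):
    return URL.match(text) is not None

for candidate in ["http://123.cn", "https://www.123.cn", "http://ww#.123.com"]:
    print(candidate, is_allowed_url(candidate))
```

With the dots escaped, '#' can no longer slip through as "any character", and the ^ and $ anchors stop the pattern from matching only part of the input.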

Regular Expression for an alphanumeric after a text

This is my regular expression
(\b(serial|sheet))+(\s(number|code|no))+?\b
For the input :
Serial no
sheet no
Sheet Number
Requirement is to parse the text which contain:
Serial no : 2424ABC
Sheet No 5 (Without colon)
Sheet No : 5
Serial No = 5335ABC
How do I skip an assignment character (if present) and parse the alphanumeric value that follows?
This should work:
(\b(serial|sheet))+(\s(number|code|no))+?\b\s*[:=#~– ]*(.*)
You can try it here : https://regex101.com/r/rO2cX1/1
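As a sketch of how the value could be pulled out in Python: the case-insensitive flag is an assumption here (the sample inputs mix "Serial" and "serial"), and value_after_label is an invented name. The value ends up in group 5 because groups are numbered by their opening parenthesis.

```python
import re

PATTERN = re.compile(
    r"(\b(serial|sheet))+(\s(number|code|no))+?\b\s*[:=#~– ]*(.*)",
    re.IGNORECASE,  # assumed, so that "Serial No" and "sheet no" both match
)

def value_after_label(line):
    m = PATTERN.search(line)
    # Group 5 is the trailing (.*), i.e. whatever follows the separator.
    return m.group(5) if m else None

for line in ["Serial no : 2424ABC", "Sheet No 5", "Sheet No : 5", "Serial No = 5335ABC"]:
    print(value_after_label(line))
```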
To escape an assignment character, do \=.
To parse the alphanumeric characters, do [a-zA-Z0-9]* or simply \w*.
If the = is optional, you could replace the \s in the regular expression with [=\s] to allow either a space or an equals. Perhaps better and matching your example try \s=?\s*.
If any of several characters might appear between the word and the number, then perhaps use \s[-=#~_]?\s*. Note the - goes at the start, otherwise it will be interpreted as a range of characters. Namely, [a-f] means [abcdef], i.e. any of those six characters, whereas [-af] means any of those three characters.
Hence the regular expression becomes:
(\b(serial|sheet))+(\s[-=#~_]?\s*(number|code|no))+?\b
Try the following pattern:
(serial\s+no|sheet\s*no)(\s*\:\s*)([a-z0-9]+)
Demo.
You can add further cases to the pattern in first group. I covered two cases separated by |.
You can find the alphanumeric value in last group of this pattern.
Please note that this pattern must be used in case-insensitive mode.

Weird in a regular expression

I tried the following regular expression:
Pattern: ((.[^[0-9])+)(([0-9]{1,3}([.][0-9]{3})+)|([0-9]+))
My goal is to match any string of non-digit characters followed by a number, e.g. MG2999, dasdassa33232
I used the above regular expression.
The results are weird:
V375 (not matched)
Vv375 (matched)
Vvv375 (matched, but the first character is not part of the match)
Vvvv375 (matched)
...
I don't understand why the first character is not matched. Could you help?
For your quick test, please try: http://regex101.com/
Thanks in advance!
--
Vu
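((.[^[0-9])+) matches any character (.), followed by any character except digits and [, repeatedly. In other words, it consumes the text two characters at a time, so it can only cover an even number of characters before the digits.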
You probably want [^0-9]+ here – or, simpler, \D+.
The rest of the regular expression has similar problems, but since I don’t know the number format you want to match I cannot correct that.
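The effect is easy to reproduce with a small Python sketch. Two assumptions: the inner [ of the question's pattern is written as \[ (same meaning, it only avoids ambiguity), and the simplified version reduces the number part to plain \d+ because the intended number format is unknown.

```python
import re

# The question's pattern: the first group eats characters in pairs.
ORIGINAL = re.compile(r"((.[^\[0-9])+)(([0-9]{1,3}([.][0-9]{3})+)|([0-9]+))")

# Simplified: one or more non-digits, then one or more digits.
SIMPLER = re.compile(r"(\D+)(\d+)")

def matched_text(pattern, text):
    m = pattern.search(text)
    return m.group(0) if m else None

for sample in ["V375", "Vv375", "Vvv375", "Vvvv375"]:
    print(sample, matched_text(ORIGINAL, sample), matched_text(SIMPLER, sample))
```

With an odd number of letters the original pattern either fails or starts one character late, which is exactly the behaviour described in the question.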

Struggling with regular expression

I'm struggling to find a regular expression I can use to classify data that matches a certain pattern.
Here are a few examples:
pli:06e9b616-5712-d0e9-1bc2-000012e61393
pli:6fdd187d-cbdc-3028-4a8d-000020f3449a
pli:0472def9-ccf3-e4e9-ca05-00005fecf9f8
As you can see each string begins with pli: and they all have the same pattern even though the characters are different. Each set of characters is separated by a '-' at the same position.
Looks like it has the form pli:UUID where UUID is a universally unique identifier. Try this one:
pli:[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}
Where I've allowed upper case letters too.
See http://en.wikipedia.org/wiki/Universally_unique_identifier
This does it in as short an expression as I could think of:
pli:(?i)[\da-f]{8}-([\da-f]{4}-){3}[\da-f]{12}
The (?i) means "ignore case" (saves having to type a-zA-Z everywhere), and I've abbreviated the regex by treating the middle as 3 repeated groups of 4 hex digits, each followed by a hyphen.
See a live demo on rubular
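For anyone classifying these in Python, a sketch could look like this. The name is_pli_id is invented, and the case flag is passed as an argument because Python's re module expects global inline flags at the very start of a pattern.

```python
import re

# 8-4-4-4-12 hex digits after the literal prefix "pli:".
PLI_ID = re.compile(
    r"pli:[0-9a-f]{8}-(?:[0-9a-f]{4}-){3}[0-9a-f]{12}",
    re.IGNORECASE,  # note: this also lets the "pli:" prefix match in upper case
)

def is_pli_id(text):
    # fullmatch rejects strings with anything before or after the identifier.
    return PLI_ID.fullmatch(text) is not None

print(is_pli_id("pli:06e9b616-5712-d0e9-1bc2-000012e61393"))
print(is_pli_id("pli:06e9b616-5712"))
```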

Get text using Regular Expression

I have the sentence as below:
First learning of regular expression.
And I want to extract only First learning and expression by means of regular expressions.
Where would I start?
Regular expressions are for pattern matching, which means we'd need to know a pattern that is to be matched.
If you literally just want those strings, you'd just use First learning and expression as your patterns.
As @orique says, this is kind of pointless; you don't need RegEx for that. If you want something more complicated, you'd need to explain what you're trying to match.
Regex is not usually used to match literal text like what you're doing, but instead is used to match patterns of text. If you insist on using regex, you'll have to match the trivial expression
(First learning|expression)
As already pointed out, it is unusual to match a literal string like you are asking, but more common to match patterns such as several word characters followed by a space character etc...
Here is a pattern to match several word characters (which are a-z, A-Z, 0-9 and _) followed by a space, followed by several more word characters, and so on. It captures three groups: the first group matches the first two words, the second group the next two words, and the third group the fifth word and the space before it.
$words = "First learning of regular expression.";
// The pattern must be a quoted string including its / delimiters.
preg_match('/(\w+\s\w+)\s(\w+\s\w+)(\s\w+)/', $words, $matches);
// Strings are joined with "." in PHP; this gives "First learning expression".
$result = $matches[1] . $matches[3];
I hope this matches your requirement.