Check string for email with regular expressions or other way - regex

I've tried the following code, but it gives me nomatch.
re:run("qw#qc.com", "\b[A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,4}\b").
I got the regexp from http://www.regular-expressions.info/email.html
EDITED:
The following doesn't work either:
re:run("345345", "\b[0-9]+\b").
If the string contains nothing but an email address, then this one will match:
re:run("qw#qc.com", "^[A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,4}$").

I hesitate to answer this question, since I believe it relies on an incorrect assumption - that you can determine whether an email address is valid or not with a regular expression. See this question for more details; from a short glance I'd note that the regexp in your question doesn't accept the .museum and .рф top-level domains.
That said, you need to escape the backslashes. You want the string to contain backslashes, but in Erlang, backslashes are used inside strings to escape various characters, so any literal backslash needs to be written as \\. Try this:
3> re:run("qw#qc.com", "\\b[a-z0-9._%+-]+#[a-z0-9.-]+\\.[a-z]{2,4}\\b").
{match,[{0,9}]}
Or even better, this:
8> re:run("qw#qc.com", "\\b[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+#[a-zA-Z0-9-]+(?:\\.[a-zA-Z0-9-]+)*\\b").
{match,[{0,9}]}
That's the regexp used in the HTML 5 standard, modified to use \\b instead of ^ and $.
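The same pitfall exists in other languages whose string literals process backslash escapes. As a rough illustration in Python rather than Erlang (the variable names are made up for this sketch), "\b" in an ordinary string literal is a backspace character, so the regex engine never sees a word boundary unless the backslash is doubled:

```python
import re

# In an ordinary (non-raw) string literal, "\b" is a backspace character,
# so the regex engine never receives the word-boundary token.
unescaped = "\b[0-9]+\b"
# Doubling the backslash puts a literal backslash into the string.
escaped = "\\b[0-9]+\\b"

no_match = re.search(unescaped, "345345")      # None
number_match = re.search(escaped, "345345")    # matches the whole string

# The lowercase pattern from the answer above, applied to the sample string.
email_pattern = "\\b[a-z0-9._%+-]+#[a-z0-9.-]+\\.[a-z]{2,4}\\b"
email_match = re.search(email_pattern, "qw#qc.com")
```

Python also offers raw strings (r"\b...") to avoid the doubling; the Erlang string literals used in this question have no such shortcut, hence the \\ in the answer.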

Looks like you need a case-insensitive match?
Currently [A-Z0-9._%+-] (for example) only matches upper-case characters (plus numbers etc).
One solution is to specify [A-Za-z]. Another solution is to convert your email address to uppercase prior to matching.
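A small sketch of those options, written in Python only for illustration (in Erlang, I believe the equivalent of the flag is the caseless option of re:run, but check the re documentation). The pattern is the upper-case-only one from the question:

```python
import re

address = "qw#qc.com"
upper_only = r"^[A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,4}$"

# The classes only list A-Z, so a lower-case address is rejected.
strict = re.search(upper_only, address)
# Option 1: ask the engine to ignore case.
ignoring_case = re.search(upper_only, address, re.IGNORECASE)
# Option 2: upper-case the input before matching.
uppercased = re.search(upper_only, address.upper())
```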

Valid name cannot accept only hyphen

I am weak in regex but I am learning. Currently I have a requirement to validate a name, and I am not able to write a valid regex for it. A valid name contains letters only, or letters with hyphens or spaces.
Example of valid name would be
jones
jones-smiht
a loreal jones
but if the name contains digits, it is invalid. The following regex
^[-\\sa-zA-Z]+$ works fine, but a lone - is also considered a valid name.
How do I modify it so that a valid name must contain letters, regardless of whether it contains hyphens and spaces?
I think you're looking for this regex:
^[a-zA-Z][-\\sa-zA-Z]*$
This will make sure your name always starts with a letter instead of starting with hyphen or space.
Note: In Java you can also make use of (?i) for ignore case and shorten your regex as follows:
(?i)^[a-z][-\\sa-z]*$
The literal answer for you would be ^[a-zA-Z][-\sa-zA-Z]*$.
There are better answers: for instance,
([a-zA-Z]+)([-\s][a-zA-Z]+)*
will allow any number of words separated by single space or dash, allowing for simon peyton-jones, but disallowing silliness like --jumbo-spaz--.
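To compare the two patterns, here is a quick check in Python (my choice for the sketch; the question is about Java). The helper name is_valid_name is made up, and I use fullmatch because the word-based pattern above has no ^ and $ of its own:

```python
import re

# "Starts with a letter" pattern from the earlier answer.
STARTS_WITH_LETTER = re.compile(r"^[a-zA-Z][-\sa-zA-Z]*$")
# Word-based pattern: words separated by exactly one space or hyphen.
WORDS = re.compile(r"[a-zA-Z]+(?:[-\s][a-zA-Z]+)*")

def is_valid_name(name: str) -> bool:
    # fullmatch requires the pattern to cover the entire string.
    return WORDS.fullmatch(name) is not None
```

The difference shows on input such as jumbo--spaz-, which the first pattern accepts and the word-based one rejects.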
And copied from the response I tried to publish on the deleted answer:
A regexp uses a single backslash. However, since regexps in Java are constructed from strings, you need to escape the backslash; that is a feature of string literals, not of regexps. So the regexp is \s, but in Java you need to write Pattern.compile("\\s"). Not all languages have this twist, so it is useful to keep the rules of string literals separate from the rules of the regexp itself.

Regex match anything that is not sub-pattern

I have cookies in my HTTP header like so:
Set-Cookie: frontend=ovsu0p8khivgvp29samlago1q0; adminhtml=6df3s767g199d7mmk49dgni4t7; external_no_cache=1; ZDEDebuggerPresent=php,phtml,php3
and I need to extract the 26 character string that comes after frontend (e.g. ovsu0p8khivgvp29samlago1q0). The following regular expression matches that for me:
(?<=frontend=)(.*)(?=;)
However, I am using Varnish Cache and can only use a regex replace. Therefore, to extract that cookie value (26 character frontend string) I need to match all characters that do not match that pattern (so I can replace them with '').
I've done a fair bit of Googling but so far have drawn a blank. I've tried the following
Match characters that do not match the pattern I want: [^((?<=frontend=)[A-Za-z0-9]{26}(?=;))] which matches random characters, including the ones I want to preserve
I'd be grateful if someone could point me in the right direction, or note where I might have gone wrong.
The Set-Cookie response header is a bit magical in Varnish, since the backends tend to send multiple headers with the same name. This is prohibited by the RFC, but it is the de facto way of doing it.
If you are using Varnish 3.0 you can use the Header VMOD, it can parse the response and extract what you need:
https://github.com/varnish/libvmod-header
Use regex pattern
^Set-Cookie:.*?\bfrontend=([^;]*)
and the "26 character string that comes after frontend" will be in group 1 (usually referred to in the replacement string as $1)
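If only a regex replace is available, one way to use the group is to let the pattern swallow the rest of the header as well and replace the whole match with the group. A sketch in Python, not Varnish VCL: the trailing .*$ is my addition, and Python spells the backreference \1:

```python
import re

header = ("Set-Cookie: frontend=ovsu0p8khivgvp29samlago1q0; "
          "adminhtml=6df3s767g199d7mmk49dgni4t7; external_no_cache=1; "
          "ZDEDebuggerPresent=php,phtml,php3")

# Pattern from above plus a trailing .*$ so the match covers the whole header;
# replacing the match with group 1 then leaves only the cookie value.
pattern = r"^Set-Cookie:.*?\bfrontend=([^;]*).*$"
value = re.sub(pattern, r"\1", header)
```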
Do you have control over the replacement string? If so, you can go with Ωmega's answer, and use $1 in your replacement string to write the frontend value back.
Otherwise, you could use this:
^Set-Cookie:.*?(?=frontend=)|(?<=frontend=.{26}).*$
This will match everything from the start of the string until frontend= is encountered, or everything that has frontend= exactly 26 characters to the left of it, up to the end of the string. Replacing all matches with '' leaves frontend= followed by the 26-character value. If the value were of variable length, it would get significantly more complicated, because few regex engines (.NET is one) support variable-length lookbehind.
For your last question. Let's have a look at your regex:
[^((?<=frontend=)[A-Za-z0-9]{26}(?=;))]
Well, firstly, the negative character class [^...] you tried to surround your pattern with doesn't really work like this. It is still a character class, so it matches only a single character that is not inside that class. But it gets even more complicated (and I wonder why it matches at all). The character class is closed by the first ], so it matches anything that is not (, ?, <, =, ), a letter or a digit. Then the {26} is applied to that, so we are trying to find 26 of those characters. Then comes the (?=;), which asserts that those 26 characters are followed by ;. Now comes what should not work: the closing ) should actually throw an error, and the final ] would just be interpreted as a literal ].
There are some regex flavors which allow for nesting of character classes (Java does). In this case, you would simply have a character class equivalent to [^a-zA-Z0-9(){}?<=;]. But as far as I could google it, Varnish uses PCRE, and in PCRE your regex should simply not compile.
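The "still a character class" point is easy to see with a smaller example. A sketch in Python (not PCRE, and with a made-up sample string): putting a group inside [^...] does not negate the group, it just adds the parentheses and letters to the set of excluded characters.

```python
import re

# Inside [...], parentheses and letters are ordinary members of the set, so this
# means "any single character except (, a, b or )", not "anything but ab".
negated_class = re.compile(r"[^(ab)]")
singles = negated_class.findall("a(b)c")   # only "c" survives
```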

What is wrong with my simple regex that accepts empty strings and apartment numbers?

So I wanted to limit a textbox which contains an apartment number which is optional.
Here is the regex in question:
([0-9]{1,4}[A-Z]?)|([A-Z])|(^$)
Simple enough eh?
I'm using these tools to test my regex:
Regex Analyzer
Regex Validator
Here are the expected results:
Valid
"1234A"
"Z"
"(Empty string)"
Invalid
"A1234"
"fhfdsahds527523832dvhsfdg"
Obviously, since I'm here, the invalid ones are accepted by the regex. The goal of this regex is to accept either 1 to 4 digits with an optional letter, or a single letter, or an empty string.
I just can't seem to figure out what's not working, I mean it is a simple enough regex we have here. I'm probably missing something as I'm not very good with regexes, but this syntax seems ok to my eyes. Hopefully someone here can point to my error.
Thanks for all help, it is greatly appreciated.
You need to use the ^ and $ anchors for your first two options as well. Also you can include the second option into the first one (which immediately matches the third variant as well):
^[0-9]{0,4}[A-Z]?$
Without the anchors your regular expression matches because it will just pick a single letter from anywhere within your string.
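A quick way to see the difference is to run both patterns over the test values from the question. This is a sketch in Python (the question doesn't say which language is in use), using search so that the unanchored pattern is free to match anywhere in the string:

```python
import re

original = re.compile(r"([0-9]{1,4}[A-Z]?)|([A-Z])|(^$)")
anchored = re.compile(r"^[0-9]{0,4}[A-Z]?$")

samples = ["1234A", "Z", "", "A1234", "fhfdsahds527523832dvhsfdg"]

# search() succeeds as soon as any part of the string matches.
accepted_by_original = [s for s in samples if original.search(s)]
# With ^ and $ the pattern has to account for the whole string.
accepted_by_anchored = [s for s in samples if anchored.search(s)]
```

The original pattern accepts all five samples; the anchored one accepts only the three valid ones.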
Depending on the language, you can also use a negative look ahead.
^[0-9]{0,4}[A-Za-z](?!.*[0-9])
Breakdown:
^[0-9]{0,4} = This looks for a digit, zero to four times, at the beginning of the string
[A-Za-z] = This looks for a single letter (either case)
(?!.*[0-9]) = This only allows the letter if there are no digits anywhere after it.
I haven't quite figured out how to validate against an empty (null) string, but that might be easier to do with tools from whatever language you are using. Something along these lines:
if String doesn't equal $null then check the Regex
Just adjust that for however you would do it in your language.
I used RegEx Skinner to validate the answers.
Edit: Fixed error from comments

Regex to match any strings containing Cyrillic symbols, except comments marked with //, ///, ///, etc

I want to find all strings containing at least 1 Cyrillic character (basically /.*[А-я].*/) but with exception of comments.
Comment is a string or part of a string which starts with 2 or more / characters.
Currently I have this regex, which does part of the trick:
^(?=^.*?[А-я]+).*?((?=[\/]{2,})|(^(?:(?![\/]{2,}).)*$))
But I'd like a less bloated and faster expression.
As an additional question: could anyone explain why this one works? I put it together by trial and error, and I'm not sure I completely understand it, because whenever I change any part of it, it stops working.
The following regex will match any Cyrillic character that is not preceded by a double forward slash:
(?<!/{2}.*)[А-я]
It specifies that it should not be preceded by a double slash by using a negative lookbehind.
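You haven't specified which flavour of regex you're using, but be aware that not every flavour supports every kind of lookaround. For example, older JavaScript engines have no lookbehind at all, and PCRE does not allow an unbounded lookbehind like the one above. You are using three lookaheads in your regex, so I presume those, at least, are OK.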
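If lookbehind is not available, one lookbehind-free alternative is to walk forward from the start of the line and refuse to step over a //. This is only a sketch, in Python, applied one line at a time; the function name is made up, and the [А-я] range is taken as-is from the question:

```python
import re

# (?:(?!//).)* consumes characters only while no "//" starts at the current
# position, so the Cyrillic letter has to appear before any comment begins.
CYRILLIC_BEFORE_COMMENT = re.compile(r"^(?:(?!//).)*[А-я]")

def has_cyrillic_outside_comment(line: str) -> bool:
    return CYRILLIC_BEFORE_COMMENT.search(line) is not None
```

A single / (as in a / б) does not count as a comment start, which matches the "2 or more / characters" rule from the question.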

do we ever use regex to find regex expressions?

Let's say I have a very long string. The string has regular expressions at random locations. Can I use a regex to find the regexes?
(Assuming that you are looking for a JavaScript regexp literal, delimited by /.)
It would be simple enough to just look for everything in between /, but that might not always be a regexp. For example, such a search would return /2 + 3/ of the string var myNumber = 1/2 + 3/4. This means that you will have to know what occurs before the regular expression. The regexp should be preceded by something other than a variable or number. These are the cases that I can think of:
/regex/;
var myVar = /regex/;
myFunction(/regex/,/regex/);
return /regex/;
typeof /regex/;
case /regex/:
throw /regex/;
void /regex/;
"global" in /regex/;
In some languages you can use lookbehind, which might look like this (untested!):
(?<=^|\n|[^\s\w\/]|\breturn|\btypeof|\bcase|\bthrow|\bvoid|\bin)\s*\/(?:\\\/|[^\/\*\n])(?:\\\/|[^\/\n])*\/
However, JavaScript has traditionally not supported lookbehind. I would recommend imitating lookbehind by putting the portion of the regexp designed to match the literal itself in a capturing group and accessing that. All cases of which I am aware can be matched by this regexp:
(?:^|\n|[^\s\w\/]|\breturn|\btypeof|\bcase|\bthrow|\bvoid|\bin)\s*(\/(?:\\\/|[^\/\*\n])(?:\\\/|[^\/\n])*\/)
NOTE: This regex sometimes results in false positives in comments.
If you want to also grab modifiers (e.g. /regex/gim), use
(?:^|\n|[^\s\w\/]|\breturn|\btypeof|\bcase|\bthrow|\bvoid|\bin)\s*(\/(?:\\\/|[^\/\*\n])(?:\\\/|[^\/\n])*\/\w*)
If there are any reserved words I am missing that may be followed by a regexp literal, simply add this to the end of the first group: |\bkeyword
All that remains then is to access the capturing group, using code similar to the following:
var codeString = "function(){typeof /regex/;}";
var searchValue = /(?:^|\n|[^\s\w\/]|\breturn|\btypeof|\bcase|\bthrow)\s*(\/(?:\\\/|[^\/\*\n])(?:\\\/|[^\/\n])*\/)/g;
// the global modifier is necessary!
var match = searchValue.exec(codeString); // "['typeof /regex/','/regex/']"
match = match[1]; // "/regex/"
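For comparison, here is the same capturing-group idea as a Python sketch (a port for illustration only, with a made-up function name; it uses the variant that also grabs modifiers). findall returns just the captured group, so no separate indexing step is needed, and the 1/2 + 3/4 example from above yields nothing:

```python
import re

# Non-capturing prefix: what may legally precede a regex literal.
# Capturing group: the literal itself, plus any trailing modifiers.
REGEX_LITERAL = re.compile(
    r"(?:^|\n|[^\s\w/]|\breturn|\btypeof|\bcase|\bthrow|\bvoid|\bin)"
    r"\s*(/(?:\\/|[^/*\n])(?:\\/|[^/\n])*/\w*)"
)

def find_regex_literals(source: str) -> list:
    # With exactly one capturing group, findall returns that group's text.
    return REGEX_LITERAL.findall(source)
```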
UPDATE
I just fixed an error with the regexp concerning escaped slashes that would have caused it to get only /\/ of a regexp like /\/hello/
UPDATE 4/6
Added support for void and in. You can't blame me too much for not including this at first, as even Stack Overflow doesn't, if you look at the syntax coloring in the first code block.
What do you mean by "regular expression"? aaaa is a valid regular expression. This is also a regular expression. If you mean a regular expression literal you might need something like this: /\/(?:[^\\\/]|\\.)*\// (adapted from here).
UPDATE
slebetman makes a good point; regular-expression literals don't need to start with /. In Perl or sed, they can start with whatever you want. Essentially, what you're trying to do is risky and probably won't work for all cases.
It's not the best way to go about this.
You can attempt to do so with some degree of confidence (using EOL to break the string up into substrings and finding ones that look like regular expressions, perhaps delimited by quotation marks). However, don't forget that a very long string CAN be a regex, so you will never have complete confidence using this approach.
Yes, if you know whether (and how!) your regex is delimited. Say, for example, that your string is something like
aaaaa...aaa/b/aaaaa
where 'b' is the 'regular expression' delimited by the character / (this is a near-basic scenario); what you have to do is scan the string for the expected delimiter, extract whatever is in between the delimiters (paying attention to escape chars) and you should be set.
This works if your delimiter is a known character and if you are sure that it appears an even number of times, or you want to discard the rest (for example, which set of delimiters are you considering in the following string: aaa/b/aaa/c/aaa/d?)
If this is the case then you need to follow the same reasoning you'd do to find any substring in a given string. Once you've found the first regexp, keep parsing until you hit the end of the string or you find another regexp, and so on.
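A minimal sketch of that scan in Python, assuming / is the delimiter and backslash is the escape character (both are assumptions, as is the function name). It pairs delimiters left to right and silently drops an unpaired trailing one, which is only one of the possible answers to the "which set of delimiters" question above:

```python
import re

# One delimited chunk: "/", then escaped characters or anything that is
# neither "/" nor a backslash, then the closing "/".
DELIMITED = re.compile(r"/((?:\\.|[^/\\])*)/")

def extract_delimited(text: str) -> list:
    # findall scans left to right and returns the text between each pair.
    return DELIMITED.findall(text)
```

For aaa/b/aaa/c/aaa/d this yields b and c; the final /d has no closing delimiter and is discarded.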
I suspect, however, that you are looking for a 'general rule' to find any string that, once parsed, would result in a valid regular expression (say we're talking about POSIX regexp-- try man re_format if you're under *BSD). If that is the case you could try every possible substring of every length of the given string and feed it to a regexp parser for syntax correctness. Still, you have proven nothing of the validity of the regexp, i.e. on what they actually match.
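To make the brute-force idea concrete, here is a sketch in Python. It checks candidates against Python's own regex syntax rather than POSIX, and both function names are invented for the example. It also shows why the approach is of little use: almost every plain substring is itself a syntactically valid regex.

```python
import re

def is_valid_regex(candidate: str) -> bool:
    # "Valid" here only means that the regex compiler accepts the syntax.
    try:
        re.compile(candidate)
    except re.error:
        return False
    return True

def valid_regex_substrings(text: str) -> list:
    # Try every non-empty substring; this is quadratic in the number of
    # candidates, so it is only practical for short inputs.
    return [text[i:j]
            for i in range(len(text))
            for j in range(i + 1, len(text) + 1)
            if is_valid_regex(text[i:j])]
```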
If that is what you're trying to do I strongly recommend finding another way or explaining better what you are trying to accomplish here.