I'm new to this website and to regular expressions as well.
I want to bookmark the emails in a list that have no value after the colon ":", as highlighted in the picture below.
Here is an example:
abcdef#gmail.com:123456
abcdEF452#gmail.com:test123##NEW
abcdef#gmail.com:
abcdef#gmail.com:
I only want to bookmark the last two, so the result would look like this:
abcdef#gmail.com:
abcdef#gmail.com:
The following regex will match the "pre-colon" pattern if and only if it is followed by nothing but whitespace until the end of the line:
\w+#\w+\.\w+:\s*$
View on regex101
Note that matching email addresses with 100% correctness is more complicated than this, but this will likely do for your use case.
If you only want to find strings that end with a colon, then all you need is :$.
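As a rough illustration of the pattern above, here is a minimal Python sketch that keeps only the lines whose colon is followed by nothing but whitespace. The function name `lines_with_empty_value` and the `sample` list are made up for this example; the `#` separator is taken from the sample data in the question.

```python
import re

# Pattern from the answer: word chars, '#', word chars, a dot, word chars,
# a colon, then only optional whitespace up to the end of the line.
EMPTY_VALUE = re.compile(r"\w+#\w+\.\w+:\s*$")

def lines_with_empty_value(lines):
    """Return the lines that have nothing after the colon."""
    return [line for line in lines if EMPTY_VALUE.search(line)]

sample = [
    "abcdef#gmail.com:123456",
    "abcdEF452#gmail.com:test123##NEW",
    "abcdef#gmail.com:",
    "abcdef#gmail.com:",
]
print(lines_with_empty_value(sample))
```

Only the last two sample lines should survive the filter.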
I find this request a bit odd. If you could elaborate a bit more on your use case, I may be able to provide a better approach or solution.
Now, I think that this expression should work the way you expect:
[\w\.]+#[a-z0-9][a-z0-9-]*[a-z0-9]?
Add the colon sign at the end if you need to match for the colon sign as well.
I noticed that the other proposed expressions don't account for email addresses with a dot in the username part or with dashes in the domain part. You may use a combination of all the solutions if you are more familiar with RegEx. I highly recommend you test the expression before moving it to production. You can easily run further tests on this page: https://regexr.com/.
[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*#(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?
The expression above would be a more adequate RegEx, since the Internet Engineering Task Force established limits on how an email address can be formatted and this expression accounts for those additional characters. More details on this page: https://www.mailboxvalidator.com/resources/articles/acceptable-email-address-syntax-rfc/.
As a friendly reminder, Stack Overflow can be best used when you have already invested some effort in fixing some problem, rather than having a community member provide you with a straight answer. This and other suggestions are listed on this other page https://stackoverflow.com/tour.
Try this:
[a-zA-Z]+#[a-zA-Z]+: # Only a-zA-Z, numbers are not accepted
Note: the last character is a space " "
[\w+]+#[\w+]+: # [\w+] = matches one character of [A-Za-z0-9_] or a literal +
Without the space, and with the $ anchor used in the next pattern, it matches only those with no character after the colon.
[\w+]+#[\w+]+.*:$ # Matches only when there is also .XXX. For example: .com or .de
Given this:
abcdEF452#gmail.com:test123##NEW
There are three parts to this:
1. Before the #.
2. Between the # and the :
3. After the :
If we assume (1) has to be there and not empty.
If we assume (2) has to be there and not empty.
If we assume (3) the ':' is required but the trailing part can be empty.
I don't want to make assumptions about other requirements.
Then I would use:
[^#]+#[^:]+:.*$
Meaning:
[^#] => Anything apart from the '#' character.
[^#]+ => The above 1 or more times.
[^#]+# => The above followed by '#' character.
[^:] => Anything apart from the ':' character.
[^:]+ => The above 1 or more times.
[^:]+: => The above followed by ':' character.
.* => Any character 0 or more times.
$ => End of line.
So if we want to make sure we only find things that don't have anything after the ':' we can simplify a bit.
[^#]+#[^:]+:$
Make sure we have the '#' and ':' parts and that they are non-empty, but the colon is followed by the end of the line.
If you don't care about part (1) or (2) we can simplify even more.
[^:]+:$
The line must contain a ':'. We don't care what is in front, as long as there is at least one character before the ':' and zero after.
Final simplification.
:$
If you don't care about anything except that the colon is not followed by anything.
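To compare the three simplified patterns side by side, here is a small Python sketch. The dictionary keys and the helper name `matching_patterns` are invented for this illustration; the patterns themselves are copied from the answer unchanged.

```python
import re

# The three "nothing after the colon" patterns, from strictest to loosest.
PATTERNS = {
    "hash_and_colon": re.compile(r"[^#]+#[^:]+:$"),
    "any_prefix": re.compile(r"[^:]+:$"),
    "colon_only": re.compile(r":$"),
}

def matching_patterns(line):
    """Return the names of the patterns that match the given line."""
    return [name for name, rx in PATTERNS.items() if rx.search(line)]

for line in ["abcdef#gmail.com:", "abcdef#gmail.com:123456", "nohash:", ":"]:
    print(repr(line), matching_patterns(line))
```

A line such as `nohash:` is rejected only by the strictest pattern, and a bare `:` is accepted only by the loosest one.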
I have the following set of sample email IDs:
EmailAddress
1123
123.123
123_123
123#123.123
123#123.com
123#abc.com
123mbc#abc.com
123mbc#123abc.com
123mbc#123abc.123com
123mbc123#cc123abc.c123com
I need to eliminate the email IDs whose part before the # consists entirely of digits.
Expected output:
123mbc#abc.com
123mbc#123abc.com
123mbc#123abc.123com
123mbc123#cc123abc.c123com
I used the Java regex below, but it eliminates everything. I have only basic knowledge of writing these expressions. Please help me correct it. Thanks in advance.
[^0-9]*#.*
Do you mean something like this? (.*[a-zA-Z].*[#]\w*\.\w*)
Breakdown:
.* = 0 or more characters
[a-zA-Z] = one letter
.* = 0 or more characters
[#] = the # sign
\w*\.\w* = any number of [a-zA-Z0-9_] characters with a single . in between
This way you get the emails that contain at least one letter before the #.
See the test at https://regex101.com/r/qV1bU4/3
Edited as suggested by ccf, with an updated breakdown.
The following regex only lets email addresses pass that meet your specs:
(?m)^.*[^0-9#\r\n].*#
Observe that you have to specify multi-line matching (the m flag). The solution employs the embedded flag syntax (?m). You can also call Pattern.compile with the Pattern.MULTILINE argument.
Live demo at regex101.
Explanation
Strategy: Define a basically sound email address as a single-line string containing a #, exclude strictly numerical prefixes.
^: start-of-line anchor
#: a basically sound email address must match the at-sign
[^...]: before the at-sign, at least one character must be neither a digit nor a CR/LF. # is also included in the negated class, so that the non-digit character tested for cannot be the first at-sign itself.
.*: before and after the non-digit tested for, arbitrary strings are permitted (well, actually they aren't, but true syntactic validation of the email address should probably not happen here and should definitely not be regex based, for reasons of reliability and code maintainability). The strings need to be represented in the pattern, because the pattern is anchored.
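The question is about Java, but the same idea can be tried quickly in Python. The following is only a sketch: the helper name `keep_non_numeric_prefix` is made up, and because each ID is tested on its own, the multi-line flag is not needed here.

```python
import re

# Same pattern as above, without the (?m) flag: each ID is tested separately.
# It requires at least one character that is neither a digit nor '#'
# somewhere before a '#'.
NON_NUMERIC_PREFIX = re.compile(r"^.*[^0-9#\r\n].*#")

def keep_non_numeric_prefix(ids):
    """Keep the IDs that contain '#' and a non-digit character before it."""
    return [s for s in ids if NON_NUMERIC_PREFIX.match(s)]

sample_ids = [
    "1123", "123.123", "123_123", "123#123.123", "123#123.com",
    "123#abc.com", "123mbc#abc.com", "123mbc#123abc.com",
    "123mbc#123abc.123com", "123mbc123#cc123abc.c123com",
]
print(keep_non_numeric_prefix(sample_ids))
```

On the sample list from the question this should leave exactly the four expected addresses.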
Try this one:
[^\d\s].*#.+
It will match emails that have at least one letter or symbol before the # sign.
I have tried using ^[a-zA-Z0-9 `.]*$, but it allows more than one space. Also, can anyone please explain what "closed" means in this context? Any help is appreciated.
Try this:
/^[a-z0-9\-\.`]+\s{0,1}[a-z0-9\-\.`]+$/gmi
Regex live here.
Explaining:
^[a-z0-9\-\.\`]+ # starts with at least one letter/number/-/./`
\s{0,1} # may contain at most one space - same as: '\s?'
[a-z0-9\-\.\`]+$ # ends with at least one letter/number/-/./`
This one should do a pretty good job:
/*#!(?#!js valid Rev:20150715_1300)
# Validate alphabets numbers `-. and only one space.
^ # Anchor to start of string.
(?=[^ ]+(?:[ ][^ ]+)*$) # Only one space between words.
[a-zA-Z0-9 `.-]* # Zero or more allowed chars (the lookahead already requires at least one).
$ # Anchor to end of string.
!#*/
var valid = /^(?=[^ ]+(?: [^ ]+)*$)[a-zA-Z0-9 `.-]*$/;
Your brackets have a space in them and are also at the beginning of your regex, right after the caret, so you need to exclude spaces at the beginning and end of the text:
/^([a-z0-9\-]+\s{0,1}[a-z0-9\-]+)+$/gmi
You also want to include the '-' character by escaping it and adding it to the character class.
https://regex101.com/
A nice website for testing regex
What I am interpreting from your question would look something like this.
^([a-zA-Z0-9`.]+ ?)*[a-zA-Z0-9`.]+$
It means that for every space in your string, there must be a series of the characters you suggested, and it must end with at least one of those characters as well.
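For a quick check outside the browser, the lookahead-based pattern from one of the answers above can be exercised in Python. This is a sketch only; the helper name `is_valid` and the test strings are assumptions made for illustration.

```python
import re

# Lookahead: runs of non-space characters separated by exactly one space,
# with no leading or trailing space. The main class then restricts the
# characters to letters, digits, space, backtick, dot and hyphen.
VALID = re.compile(r"^(?=[^ ]+(?: [^ ]+)*$)[a-zA-Z0-9 `.-]*$")

def is_valid(text):
    """True if text uses only allowed chars and never two spaces in a row."""
    return VALID.match(text) is not None

for candidate in ["abc def", "abc  def", " abc", "abc ", "a.b-c `d", "abc_def"]:
    print(repr(candidate), is_valid(candidate))
```

Strings with a doubled, leading or trailing space are rejected by the lookahead, while a string like `abc_def` is rejected by the character class.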
I want to remove all numbers from a paragraph except from some words.
My attempt is using a negative look-ahead:
gsub('(?!ami.12.0|allo.12)[[:digit:]]+','',
c('0.12','1245','ami.12.0 00','allo.12 1'),perl=TRUE)
But this doesn't work. I get this:
"." "" "ami.. " "allo."
But my expected output is:
"." "" 'ami.12.0','allo.12'
You can't really use a negative lookahead here, since it will still replace when the cursor is at some point after ami.
What you can do is put back some matches:
(ami.12.0|allo.12)|[[:digit:]]+
gsub('(ami.12.0|allo.12)|[[:digit:]]+',"\\1",
c('0.12','1245','ami.12.0 00','allo.12 1'),perl=TRUE)
I kept the . since I'm not 100% sure what you have, but keep in mind that . is a wildcard and will match any character (except newlines) unless you escape it.
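The same "match the protected words first, then put them back" idea can be sketched in Python. Note the assumptions: the dots are escaped here so they match literal dots only, and the helper name `strip_digits` is made up for this example.

```python
import re

# Group 1 captures the protected words; the second alternative matches
# any other run of digits.
PROTECT_OR_DIGITS = re.compile(r"(ami\.12\.0|allo\.12)|[0-9]+")

def strip_digits(text):
    """Remove digit runs, except inside the protected words."""
    # If group 1 matched, put it back unchanged; otherwise the match was
    # a plain digit run, which is replaced with the empty string.
    return PROTECT_OR_DIGITS.sub(lambda m: m.group(1) or "", text)

inputs = ["0.12", "1245", "ami.12.0 00", "allo.12 1"]
print([strip_digits(s) for s in inputs])
```

Because the alternation tries the protected words first at every position, the digits inside them are consumed as part of group 1 and never reach the second alternative.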
Your regex is actually finding every digit sequence that is not the start of "ami.12.0" or "allo.12". So for example, in your third string, it gets to the 12 in ami.12.0 and looks ahead to see if that 12 is the start of either of the two ignored strings. It is not, so it goes ahead and replaces it. It would be best to generalize this, but in your specific case, you can probably achieve this by instead doing a negative lookbehind for any prefixes of the words (that can be followed by digit sequences) that you want to skip. A second lookbehind keeps a match from starting in the middle of a digit run; without it, the engine would skip the 1 in ami.12 but still remove the 2. So, you would use something like this:
gsub('(?<!ami\\.|ami\\.12\\.|allo\\.)(?<![[:digit:]])[[:digit:]]+','',
c('0.12','1245','ami.12.0 00','allo.12 1'),perl=TRUE)
I'm really bad at regex, I have:
/(#[A-Za-z-]+)/
which finds words after the # symbol in a textbox, however I need it to ignore email addresses, like:
foo#things.com
however it finds #things
I also need it to include numbers, like:
#He2foo
however it only finds the #He part.
Help is appreciated, and if you feel like explaining regex in simple terms, that'd be great :D
/(?:^|(?<=\s))#([A-Za-z0-9]+)(?=[.?]?\s)/
#This (matched) regex ignores#this but matches on #separate tokens as well as tokens at the end of a sentence like #this. or #this? (without picking the . or the ?) And yes email#addresses.com are ignored too.
The regex while matching on # also lets you quickly access what's after it (like userid in #userid) by picking up the regex group(1). Check PHP documentation on how to work with regex groups.
You can just add 0-9 to your regex, like so:
/(#[A-Za-z0-9-]+)/
I don't think any more explanation is needed since you've been able to come this far by yourself. 0-9 is just like a-z (though numeric, of course).
In order to ignore email addresses you will need to provide more specific requirements. You could try preceding # with (^| ), which states that your value MUST be preceded by either the start of the string or a space.
Extending this, you can also use ($| ) at the end to require the value to be followed by the end of the string or a space (which means no period is allowed, and a period is required in a valid email address).
Update
$subject = "#a #b a#b a# #b";
preg_match_all("/(^| )#[A-Za-z0-9-]+/", $subject, $matches);
print_r($matches[0]);
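For comparison, here is a rough Python version of the same idea. It swaps the (^| ) group for a negative lookbehind, (?<!\S), so that the leading space does not end up in the match; that substitution and the helper name `find_mentions` are choices made for this sketch, not part of the answer above.

```python
import re

# '#' must be at the start of the text or preceded by whitespace;
# the name after it may contain letters, digits and hyphens.
MENTION = re.compile(r"(?<!\S)#([A-Za-z0-9-]+)")

def find_mentions(text):
    """Return the names that follow a standalone '#'."""
    return MENTION.findall(text)

print(find_mentions("#a #b a#b a# #b"))
print(find_mentions("mail foo#things.com or ping #He2foo"))
```

Since the pattern has one capturing group, `findall` returns just the names without the leading `#`.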
Is there a simple way to ignore the white space in a target string when searching for matches using a regular expression pattern? For example, if my search is for "cats", I would want "c ats" or "ca ts" to match. I can't strip out the whitespace beforehand because I need to find the begin and end index of the match (including any whitespace) in order to highlight that match and any whitespace needs to be there for formatting purposes.
You can stick optional whitespace characters \s* in between every other character in your regex. Although granted, it will get a bit lengthy.
/cats/ -> /c\s*a\s*t\s*s/
While the accepted answer is technically correct, a more practical approach, if possible, is to just strip whitespace out of both the regular expression and the search string.
If you want to search for "my cats", instead of:
myString.match(/m\s*y\s*c\s*a\s*t\s*s\s*/g)
Just do:
myString.replace(/\s*/g,"").match(/mycats/g)
Warning: You can't automate this on the regular expression by just replacing all spaces with empty strings because they may occur in a negation or otherwise make your regular expression invalid.
Addressing Steven's comment to Sam Dufel's answer
Thanks, sounds like that's the way to go. But I just realized that I only want the optional whitespace characters if they follow a newline. So for example, "c\n ats" or "ca\n ts" should match. But wouldn't want "c ats" to match if there is no newline. Any ideas on how that might be done?
This should do the trick:
/c(?:\n\s*)?a(?:\n\s*)?t(?:\n\s*)?s/
See this page for all the different variations of 'cats' that this matches.
You can also solve this using conditionals, but they are not supported in the JavaScript flavor of regex.
You could put \s* in between every character in your search string, so if you were looking for cats you would use c\s*a\s*t\s*s
It's long but you could build the string dynamically of course.
You can see it working here: http://www.rubular.com/r/zzWwvppSpE
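Building the pattern dynamically might look like the following Python sketch. The helper names `spaced_pattern` and `find_span` are invented for this illustration; `re.escape` is used so that characters with a special meaning in the search word are treated literally.

```python
import re

def spaced_pattern(word):
    """Build a regex matching `word` with optional whitespace between characters."""
    return re.compile(r"\s*".join(re.escape(ch) for ch in word))

def find_span(word, text):
    """Return (start, end) of the first match in the original text, or None."""
    match = spaced_pattern(word).search(text)
    return match.span() if match else None

print(find_span("cats", "a c ats b"))
```

The returned indices refer to the original string, whitespace included, which is what the highlighting use case in the question needs.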
If you only want to allow spaces, then
\bc *a *t *s\b
should do it. To also allow tabs, use
\bc[ \t]*a[ \t]*t[ \t]*s\b
Remove the \b anchors if you also want to find cats within words like bobcats or catsup.
This approach can be used to automate this
(the following example solution is in Python, although it can be ported to any language):
You can strip the whitespace beforehand AND save the positions of the non-whitespace characters, so you can use them later to find the boundaries of the matched string in the original string, like the following:
import re

def regex_search_ignore_space(regex, string):
    no_spaces = ''
    char_positions = []
    for pos, char in enumerate(string):
        if re.match(r'\S', char):  # upper \S matches non-whitespace chars
            no_spaces += char
            char_positions.append(pos)
    match = re.search(regex, no_spaces)
    if not match:
        return match
    if match.start() == match.end():
        return ''  # empty match: there is nothing to map back
    # match.start() and match.end() are indices of start and end
    # of the found string in the spaceless string
    # (as we have searched in it).
    start = char_positions[match.start()]  # in the original string
    # match.end() is exclusive, so map the last matched character
    # and step one past it in the original string
    end = char_positions[match.end() - 1] + 1
    matched_string = string[start:end]
    # the match WITH spaces is returned.
    return matched_string

with_spaces = 'a li on and a cat'
print(regex_search_ignore_space('lion', with_spaces))
# prints: li on
If you want to go further, you can construct a match object and return it instead, which makes this helper handier to use.
The performance of this function can of course also be optimized. This example is just meant to show the path to a solution.
The accepted answer will not work if and when you are passing a dynamic value (such as the "current value" in an array loop) as the regex test value. You would not be able to insert the optional whitespace without ending up with some really ugly regex.
Konrad Hoffner's solution is therefore better in such cases, as it strips whitespace from both the regex and the test string. The test is then conducted as though neither contains whitespace.