Use regex to strip out emails

I know that this is a notoriously difficult topic. The best regex that I've found after trawling many different answers is the one at http://emailregex.com/
It works great at validating an email address, but I'm struggling to alter this regex to find all email addresses in a string.
I'm using the PHP version of the regex.
How would I go about using this regex to find all of the email addresses in a string?
I know about the preg functions; my PHP code is less of a problem than adapting that regex.
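$redacted = preg_replace_callback(
    // $emailRegex must not contain the ^ and $ anchors, or unescaped / characters (/ is the delimiter here)
    "/$emailRegex/i",
    function ($matches) {
        // Replace each matched address with a bracketed hash of it
        return '[' . $this->getHashedValue($matches[0]) . ']';
    },
    $input
);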
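For comparison, the same callback-style replacement in Python is `re.sub` with a function as the replacement. This is only a sketch. The `EMAIL` pattern is a deliberately simplified stand-in, not the emailregex.com pattern. `hashed_value()` is a hypothetical substitute for the asker's `getHashedValue()`; here it returns the first 12 hex digits of a SHA-256 digest.

```python
import hashlib
import re

# Simplified illustrative pattern (assumption), NOT the emailregex.com one.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", re.IGNORECASE)

def hashed_value(value):
    # Hypothetical stand-in for getHashedValue(): first 12 hex characters
    # of the SHA-256 digest of the matched address.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]

def redact(text):
    # Like preg_replace_callback: the function receives each match object,
    # and its return value replaces that match.
    return EMAIL.sub(lambda m: "[" + hashed_value(m.group(0)) + "]", text)

print(redact("Mail alice@example.com, then alice@example.com again."))
```

The same address always produces the same bracketed hash, so repeated mentions stay linkable without exposing the address.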

If you already have a working regular expression, you can use PHP's preg_replace to replace all (non-overlapping) matches with a given string, in this case the empty string "" (to remove them).
preg_replace($your_regex, "", $your_string)
This should strip all matches from your string.
Also, as @MonkeyZeus commented, if your regex contains the start anchor (^) or the end anchor ($), remove them before using preg_replace. Otherwise the pattern can only match when the entire string is a single email address.
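Here is a sketch of the same idea in Python. It assumes a deliberately simplified `EMAIL` pattern, not the emailregex.com one, and that pattern will miss some valid addresses. `re.sub` and `re.findall` play the roles of `preg_replace` and `preg_match_all`. The last lines show why the `^` and `$` anchors have to go.

```python
import re

# Simplified, unanchored illustrative pattern (assumption).
EMAIL = r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"

def strip_emails(text):
    # re.sub replaces every non-overlapping match, like preg_replace.
    return re.sub(EMAIL, "", text)

def find_emails(text):
    # re.findall returns every non-overlapping match, like preg_match_all.
    return re.findall(EMAIL, text)

text = "Contact alice@example.com or bob.smith@mail.example.org today."
print(find_emails(text))   # ['alice@example.com', 'bob.smith@mail.example.org']
print(strip_emails(text))  # prints: Contact  or  today.

# With ^...$ anchors the pattern only matches a string that is one address.
anchored = "^" + EMAIL + "$"
print(re.findall(anchored, text))  # []
```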

Related

How can I match multiple hits between 2 delimiters?

Hi, my fellow RegEx'ers ;)
I'm trying to match each piece of text between a pair of double quotes.
Here's my text:
...random code
someArray[] = ["Come and",
"get me,",
"or fail",
"trying!",
"Yours truly"]
random code...
So far, I get the correct matches with two patterns applied one after the other:
(?s)someArray\[\].*?=.*?\[(.*?)\]
This extracts the text between the two brackets; on that result I then use:
"(.*?)"
This works just fine, but I'd love to get the texts with a single regex.
Any help is highly appreciated!
Consider using \G, which matches at the position where the previous match ended. With it, you can match "(.*?)" when it is preceded either by someArray[] = [ or by the previous match of "(.*?)" (strictly speaking, by the end of the previous match of the entire regex). Then collect the first capture group from every match:
(?:(?s).*someArray\[\].*?=.*?\[|\G[^"\]]+)"(.*?)"
Demo: https://regex101.com/r/eBQWdU/3
How you collect the first capture groups depends on the language you're using the regex in. In PHP, for example, you could do something like this:
preg_match_all('/(?:(?s).*someArray\[\].*?=.*?\[|\G[^"\]]+)"(.*?)"/', $input, $matches);
$array_items = $matches[1];
Demo: https://ideone.com/mZgU1x
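Python's built-in re module has no \G anchor (the third-party regex package does have one). So a Python sketch falls back to the two-pass approach from the question. First it isolates the bracketed array body, then it collects every quoted item inside it.

```python
import re

text = '''...random code
someArray[] = ["Come and",
"get me,",
"or fail",
"trying!",
"Yours truly"]
random code...'''

# Pass 1: the text between the brackets after "someArray[] =" (DOTALL lets
# the lazy .*? parts cross newlines).
body = re.search(r"someArray\[\].*?=.*?\[(.*?)\]", text, re.DOTALL)
# Pass 2: every double-quoted item inside that body.
items = re.findall(r'"(.*?)"', body.group(1)) if body else []
print(items)  # ['Come and', 'get me,', 'or fail', 'trying!', 'Yours truly']
```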

Regex in Yahoo Pipes: append only if match found

In Yahoo Pipes, I'm trying to append content to existing string, but only if a match is found in that string.
Specifically, I want to append "#Fracking" to the end of the line, but only if "Hydraulic Fracturing" is found in the item.title.
My regex is as follows (doesn't work): replace (?(Hydraulic Fracturing)(.*)) with $1 $#Fracking
What am I doing wrong? Does Yahoo Pipes not support conditional regex? I could not find an answer to that.
Thanks for any help!
If I understood correctly, you could do it like this:
pattern: .*Hydraulic Fracturing.*
replacement: $0#Fracking
It seems that in Yahoo Pipes the whole matched string can be back-referenced with $0.
Here, the leading and trailing .* make the pattern match the entire string whenever it contains "Hydraulic Fracturing". The whole string is then replaced with itself, with "#Fracking" appended. If the string doesn't contain "Hydraulic Fracturing", there is no match and nothing is replaced.
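Here is the same trick outside Yahoo Pipes, sketched in Python. `re.sub` writes the whole match as `\g<0>` rather than `$0`. The `tag_fracking` name and the separating space before #Fracking are my own choices.

```python
import re

def tag_fracking(title):
    # The leading and trailing .* make the whole title the match whenever it
    # contains the phrase; \g<0> puts it back with the tag appended.
    # Titles without the phrase do not match, so they come back unchanged.
    return re.sub(r".*Hydraulic Fracturing.*", r"\g<0> #Fracking", title)

print(tag_fracking("New rules on Hydraulic Fracturing in Ohio"))
print(tag_fracking("Wind farm approved"))
```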

How do I craft a regular expression to exclude strings with parentheses

I have the following SDDL:
O:BAG:BAD:(A;;CCDCLCSWRP;;;BA)(A;;CCDCSW;;;WD)(A;;CCDCLCSWRP;;;S-1-5-32-562)(A;;CCDCLCSWRP;;;LU)(A;;CCLCRP;;;S-1-5-21-4217728705-3687557540-3107027809-1003)
Unfortunately I keep getting this:
(A;;CCDCLCSWRP;;;BA)(A;;CCDCSW;;;WD)
And what I want is just (A;;CCDCSW;;;WD).
My regex is: (\(A;.+;WD\)) : find "(A;" some characters ending in ";WD)"
I've tried making the match lazy, and I've tried excluding the ")(" pair of characters, based on a search of the Stack Overflow regex tag for examples where others answered similar questions.
I'm really confused why the exclusion of the parens isn't working:
(\(A;.+[^\(\)]*.+;WD\)) : find "(A;" followed by some characters, none of which are ")(", followed by other characters ending in ";WD)"
And this was my guess at using negative look around:
(\(A;.+^((?!\)\().).+;WD\))
which didn't match anything.
I'm also doing this in PowerShell v3.0 with the following code:
$RegExPattern = [regex]"(\($ACE_Type;.*;$ACE_SID\))+?"
if ($SDDL -match $RegExPattern) {
    # $Matches[0] holds the whole text matched by the last -match
    $MatchingACE = $Matches[0]
}
Where in this instance $ACE_Type = "A" and $ACE_SID = "WD".
You almost had the solution with your second regex pattern. The problem is the extra . wildcards. The .+ before and after [^\(\)]* can still match across ")(", so the exclusion never takes effect. This should be all you need:
A;[^()]+;WD
And of course if you just want to capture the string in between A; and ;WD:
A;([^()]+);WD
Then just replace with \1 (or, after -match in PowerShell, read $Matches[1]).
I simplified this a lot and then added lookarounds so that only the intended string (between A; and ;WD) is matched. The pattern looks behind for A;, then matches one or more non-parenthesis characters, while looking ahead for ;WD.
(?<=A;)[^()]+(?=;WD)
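Here is a quick Python check of the patterns discussed above against the SDDL string from the question. The first search shows the greedy .+ spanning two ACEs. The second wraps the answer's A;[^()]+;WD in literal parentheses (my addition), so the result is the full (A;;CCDCSW;;;WD) entry. The last two return the text between A; and ;WD, once via a capture group and once via lookarounds.

```python
import re

sddl = ("O:BAG:BAD:(A;;CCDCLCSWRP;;;BA)(A;;CCDCSW;;;WD)(A;;CCDCLCSWRP;;;S-1-5-32-562)"
        "(A;;CCDCLCSWRP;;;LU)(A;;CCLCRP;;;S-1-5-21-4217728705-3687557540-3107027809-1003)")

# Greedy .+ runs across ")(" boundaries, so the match spans two ACEs.
greedy = re.search(r"(\(A;.+;WD\))", sddl).group(1)

# [^()]+ cannot leave the current ACE, so only the WD entry matches.
whole_ace = re.search(r"\(A;[^()]+;WD\)", sddl).group(0)

# Capture-group and lookaround variants: only the text between A; and ;WD.
captured = re.search(r"A;([^()]+);WD", sddl).group(1)
looked = re.search(r"(?<=A;)[^()]+(?=;WD)", sddl).group(0)

print(greedy)     # (A;;CCDCLCSWRP;;;BA)(A;;CCDCSW;;;WD)
print(whole_ace)  # (A;;CCDCSW;;;WD)
print(captured)   # ;CCDCSW;;
print(looked)     # ;CCDCSW;;
```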

parsing url for specific param value

I'm looking to use a regular expression to extract a specific section of a URL, and to get nothing if the pattern isn't found.
An example URL is:
/te/file/value/jifle?uil=testing-cdas-feaw:jilk:&jklfe=https://value-value.jifels/temp.html/topic?id=e997aad4-92e0-j30e-a3c8-jfkaliejs5#c452fds-634d-f424fds-cdsa&bf_action=jildape
I want to get the value of the id parameter, e997aad4-92e0-j30e-a3c8-jfkaliejs5 (the part before the #).
Currently I'm using the regex "d=([^#]*)", but I'm also running across URLs of this pattern:
/te/file/value/jifle?uil=testing-cdas-feaw:jilk:&jklfe=https://value-value.jifels/temp.html/topic?id=e997aad4-92e0-j30e-a3c8-jfkaliejs5&bf_action=jildape
and for these I still get a match: everything after id=, i.e. e997aad4-92e0-j30e-a3c8-jfkaliejs5&bf_action=jildape.
I would prefer no match at all for this URL, because it doesn't contain the #.
Regexes are not a magic tool that you should always use just because the problem involves a string. In this case, your language probably has a tool to break apart URLs for you. In PHP, this is parse_url(). In Perl, it's the URI::URL module.
You should almost always prefer an existing, well-tested solution to a common problem like this rather than writing your own.
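To illustrate that advice in Python, urllib.parse can pull the pieces apart. This sketch rests on my reading of the examples. I assume the id lives in the query of the URL stored in the outer jklfe parameter, and that "contains a #" means the outer URL has a non-empty fragment. The function name id_if_fragment is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

with_hash = ("/te/file/value/jifle?uil=testing-cdas-feaw:jilk:&jklfe=https://value-value.jifels"
             "/temp.html/topic?id=e997aad4-92e0-j30e-a3c8-jfkaliejs5#c452fds-634d-f424fds-cdsa"
             "&bf_action=jildape")
without_hash = ("/te/file/value/jifle?uil=testing-cdas-feaw:jilk:&jklfe=https://value-value.jifels"
                "/temp.html/topic?id=e997aad4-92e0-j30e-a3c8-jfkaliejs5&bf_action=jildape")

def id_if_fragment(url):
    """Return the nested id value, or None when the outer URL has no '#'."""
    # Assumption: a '#' in the question means the outer URL has a fragment.
    outer = urlsplit(url)
    if not outer.fragment:
        return None
    # Assumption: the id sits in the query of the URL held by 'jklfe'.
    inner = parse_qs(outer.query).get("jklfe")
    if not inner:
        return None
    ids = parse_qs(urlsplit(inner[0]).query).get("id")
    return ids[0] if ids else None

print(id_if_fragment(with_hash))     # e997aad4-92e0-j30e-a3c8-jfkaliejs5
print(id_if_fragment(without_hash))  # None
```

The parsing library handles splitting at the first ? and #, so there are no edge cases around & or = to reason about in a pattern.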
So you want to match the value of the id parameter, but only if it has a trailing section containing a '#' symbol (without matching the '#' or what's after it)?
Without knowing which regex flavor you're using, how about something like:
id=([^#&]*)#
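Here is a quick Python check of id=([^#&]*)# against both URLs from the question. [^#&]* stops at the first # or &. The trailing # then requires a # right after the value, so the URL without one yields no match. The helper name id_before_hash is illustrative.

```python
import re

with_hash = ("/te/file/value/jifle?uil=testing-cdas-feaw:jilk:&jklfe=https://value-value.jifels"
             "/temp.html/topic?id=e997aad4-92e0-j30e-a3c8-jfkaliejs5#c452fds-634d-f424fds-cdsa"
             "&bf_action=jildape")
without_hash = ("/te/file/value/jifle?uil=testing-cdas-feaw:jilk:&jklfe=https://value-value.jifels"
                "/temp.html/topic?id=e997aad4-92e0-j30e-a3c8-jfkaliejs5&bf_action=jildape")

ID_BEFORE_HASH = re.compile(r"id=([^#&]*)#")

def id_before_hash(url):
    # Group 1 is the id value; None when no '#' directly follows it.
    m = ID_BEFORE_HASH.search(url)
    return m.group(1) if m else None

print(id_before_hash(with_hash))     # e997aad4-92e0-j30e-a3c8-jfkaliejs5
print(id_before_hash(without_hash))  # None
```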
regex = "id=([\\w-]+?)#"
This will capture everything between 'id=' and '#', assuming every character in between is in the character class [a-zA-Z_0-9-] (i.e. if an '&' is in there, the regex will fail).
id=
- Self-explanatory: this looks for an exact match of 'id='.
[\\w-]
- This defines a character class. The \\w is an escaped \w (the backslash is doubled inside a Java string literal). '\w' is a predefined Java character class equal to [a-zA-Z_0-9]. I added '-' to this class because of the pattern in your examples.
+?
- This is a reluctant quantifier: it takes one or more characters from the class, as few as possible while the rest of the regex can still match. Keeping it inside the parentheses makes group 1 hold the whole run. Written as ([\\w-])+?, the group would only keep the last character.
#
- The end of the regex, the last character needed to complete the match.
If you want every character between 'id=' and the first '#' following it, the following works. It uses the same logic as above, but replaces the character class [\\w-] with ., which matches any character except a line terminator.
regex = "id=(.+?)#"
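To see why the placement of the parentheses matters, here is a Python comparison. It checks ([\w-])+?, with the quantifier outside the group, against ([\w-]+?) and (.+?). Python's \w also matches non-ASCII word characters, unlike Java's default, but that makes no difference for this ASCII URL.

```python
import re

url = ("/te/file/value/jifle?uil=testing-cdas-feaw:jilk:&jklfe=https://value-value.jifels"
       "/temp.html/topic?id=e997aad4-92e0-j30e-a3c8-jfkaliejs5#c452fds-634d-f424fds-cdsa"
       "&bf_action=jildape")

# Quantifier outside the group: the group is re-entered for each character,
# so it only keeps the last one.
last_char = re.search(r"id=([\w-])+?#", url).group(1)
# Quantifier inside the group: the group holds the whole run.
whole_run = re.search(r"id=([\w-]+?)#", url).group(1)
# Any characters up to the first '#'.
any_chars = re.search(r"id=(.+?)#", url).group(1)

print(last_char)  # 5
print(whole_run)  # e997aad4-92e0-j30e-a3c8-jfkaliejs5
print(any_chars)  # e997aad4-92e0-j30e-a3c8-jfkaliejs5
```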

I want to modify this regex to include apostrophe

This regex is used for validating email addresses, but it doesn't handle the apostrophe ('), which is a valid character in the first (local) part of an email address.
I have tried modifying it myself and using some examples I found, but they don't work.
^([\w-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$
How do I modify it slightly to support the ' character (apostrophe)?
Per the email address specification (RFC 5322), the apostrophe can appear anywhere in the local part, before the @ symbol. In your current regex, that part is matched by:
^([\w-\.]+)@
You should be able to add the apostrophe to the bracketed set of valid characters:
^([\w-\.']+)@
This would make the entire regex:
^([\w-\.']+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$
EDIT (regex contained in single-quotes)
If you're using this regex inside a string with single-quotes, such as in PHP with $regex = '^([\w ..., you will need to escape the single-quote in the regex with \':
^([\w-\.\']+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$
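Here is a Python sketch to check the change, using the full pattern with the apostrophe added. One adjustment is worth flagging: Python rejects \w-\. as a character range. So the hyphen is moved to the end of the first class ([\w.'-]), which has the same meaning. Inside a double-quoted raw string the ' needs no escaping, unlike in a PHP single-quoted string.

```python
import re

# Original pattern, with the hyphen moved to the end of the first class.
ORIGINAL = re.compile(
    r"^([\w.-]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))"
    r"([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$"
)
# Same pattern with the apostrophe added to the local-part class.
WITH_APOSTROPHE = re.compile(
    r"^([\w.'-]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))"
    r"([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$"
)

for address in ["jane.doe@example.com", "o'reilly@example.com"]:
    # Prints whether each pattern accepts the address.
    print(address, bool(ORIGINAL.match(address)), bool(WITH_APOSTROPHE.match(address)))
```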
You need to update the first part as follows:
^([\'\w-\.]+)