I have a simple regular expression for finding email addresses in a text, but even though I don't see an error, it doesn't work.
$addr=array();
$t='Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean fermentum risus id tortor. Morbi leo mi, nonummy eget tristique non, rhoncus non leo. Donec quis nibh at felis congue commodo. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos hymenaeos aaa#bbb.com. Aliquam ccc#ddd.net ornare wisi eu metus.';
if(preg_match_all('~[A-Z0-9._%-]+#[A-Z0-9.-]+\.[A-Z]{2,4}~',$t,$addr, PREG_SET_ORDER)){
echo 'found';
}
I have also tried this version I found, but it didn't work either:
if(preg_match_all('/^[A-Z0-9._%-]+#[A-Z0-9._%-]+\.[A-Z]{2,4}$/',$t,$addr, PREG_SET_ORDER)){
Your character classes only contain uppercase letters, so only all-uppercase addresses can match. Either use [A-Za-z] in each class, or add the case-insensitive modifier (i) after the closing delimiter of the pattern.
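To illustrate the effect of the flag, here is a minimal sketch in Python rather than PHP (my choice for the example, not the asker's setup). It reuses the pattern from the question on a shortened copy of the sample text, and re.IGNORECASE stands in for PHP's i modifier:

```python
import re

# Shortened copy of the sample text; '#' is the separator used in the question.
text = "per inceptos hymenaeos aaa#bbb.com. Aliquam ccc#ddd.net ornare wisi eu metus."

# The pattern from the question, unchanged.
pattern = r"[A-Z0-9._%-]+#[A-Z0-9.-]+\.[A-Z]{2,4}"

# Without a flag, the uppercase-only classes find nothing in lowercase text.
strict = re.findall(pattern, text)

# With the case-insensitive flag, both addresses are found.
relaxed = re.findall(pattern, text, re.IGNORECASE)

print(strict)
print(relaxed)
```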
This would also work for the match pattern and is shortest:
[0-z.%-]+#[0-z.-]+\.[A-z]{2,4}
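This works because 0-z covers A-Z, a-z, 0-9 and _, and A-z covers A-Z and a-z. Be aware that these ASCII ranges are looser than the explicit classes: 0-z also matches : ; < = > ? @ [ \ ] ^ and `, and A-z also matches [ \ ] ^ _ and `.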
I develop in Ruby, and you can see this in action at http://rubular.com/r/PdbH1BjWMs
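If you want to check exactly what the ranges 0-z and A-z accept, a small Python sketch like the following lists every ASCII character they match beyond letters, digits and underscore. The variable names are only for illustration:

```python
import re

def extras(range_class, expected_class):
    """ASCII characters matched by range_class but not by expected_class."""
    return [
        chr(code)
        for code in range(128)
        if re.fullmatch(range_class, chr(code))
        and not re.fullmatch(expected_class, chr(code))
    ]

# Characters 0-z lets through in addition to letters, digits and underscore.
extra_0_z = extras(r"[0-z]", r"[A-Za-z0-9_]")

# Characters A-z lets through in addition to letters.
extra_A_z = extras(r"[A-z]", r"[A-Za-z]")

print(extra_0_z)
print(extra_A_z)
```

So the short pattern is fine for a quick search, but it is not equivalent to the explicit classes if the text can contain that punctuation.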
Include the lower case letter class:
if(preg_match_all('~[a-zA-Z0-9._%-]+#[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}~',$t,$addr, PREG_SET_ORDER)){
echo 'found';
}
...note a-z.
So I have this text that I am trying to parse with Regex:
Name: Test Data 1
Description: Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec feugiat nulla id nisi venenatis blandit.
Donec blandit egestas orci, at tristique dui vehicula in. Maecenas fringilla fringilla enim, in pulvinar ex gravida
in. Nam cursus facilisis ante, sed tristique nisl sagittis sed. In auctor felis id neque suscipit ullamcorper. Nunc
faucibus elit sed metus vestibulum, ullamcorper pulvinar nisi auctor. Praesent sodales orci mauris, eget dapibus
mauris sodales in. Ut iaculis, ante vitae ullamcorper semper, metus tortor auctor purus, eu convallis nulla lacus
in tellus. Phasellus feugiat tempus neque, in fringilla nisi scelerisque sed. Donec elementum diam nec mattis dignissim.
I am trying to parse it to load it into a database.
With this expression, I am trying to get a match on the "Name" and "Description" parameters but also trying to get a match on the parameter value as well (which can sometimes be multi-line).
(.*):\s(.*)
I have been searching for a while now and I cannot seem to be able to make it match the whole paragraph but stop when it hits a blank line.
I would like the result to be as follows:
1st Match
Group 1: Name
Group 2: Test Data 1
2nd Match
Group 1: Description
Group 2: Description value with multi-line
https://regex101.com/r/mG2ms9/3
Thanks
You can use the following:
(.*?):\s([\s\S]*?)(?=\n(?:\n|\w|$))
Here it is on regex101.
[\s\S] matches any character, even a new line (whereas '.' does not, by default).
Then we're matching as few characters as possible (*?) up until the point where the next line is either blank (\n), starts with a word character (\w), or is the end of the string ($).
We can get away with the \w option since all of the new lines in the description parameter are followed by a space. If this isn't always the case, you could replace \w with something like .*: to check instead if the next line contains ':' and stop if so.
Note that I disabled multi-line mode; it's not suitable here.
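Here is a rough Python sketch of the same pattern, in case you want to try it outside regex101. The sample text is made up for the example; it assumes continuation lines start with a space and that records are separated by a blank line, as described above:

```python
import re

# Hypothetical input: continuation lines begin with a space,
# records are separated by a blank line, and the text ends with a newline.
text = (
    "Name: Test Data 1\n"
    "Description: Lorem ipsum dolor sit amet,\n"
    " consectetur adipiscing elit.\n"
    " Donec feugiat nulla.\n"
    "\n"
    "Name: Test Data 2\n"
    "Description: Single line.\n"
)

# No re.MULTILINE here, so $ only matches at the very end of the input.
pattern = re.compile(r"(.*?):\s([\s\S]*?)(?=\n(?:\n|\w|$))")

# findall returns one (key, value) tuple per match.
pairs = pattern.findall(text)

for key, value in pairs:
    print(repr(key), repr(value))
```

The multi-line value comes back with its line breaks intact, so you may want to collapse whitespace before loading it into the database.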
I need to remove all anchors (anchor text remains) from the string except those anchors that have href="/"
This is example text:
Fusce imperdiet nulla ut sapien aliquet, congue varius dui consectetur. This link remains et blandit nisl. Curabitur euismod volutpat urna, eget dignissim libero cursus rhoncus. Nulla ac test sollicitudin link from this text should be removed. Maecenas sodales vel lorem eu placerat.
Here is regex that I think should work (using negative lookahead):
/<a.*?(?!href=["']\/["'])>(.*?)</a>/gi
Yet it selects both anchors.
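Try the regex <a(?!.*href=["']\/["']).*?>(.*?)<\/a>
In your version the lookahead sits right before the >, where href can never start, so it always succeeds. Placed directly after <a, the negative lookahead (?!.*href=["']\/["']) makes the match fail for a tag with href="/", so that anchor is left alone. Note that .* scans to the end of the line, so a later anchor with href="/" on the same line would also block the match; using [^>]* in place of .* restricts the check to the current tag.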
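As a sketch of how the replacement could look, here is a Python version. Note that it is a variation on the pattern above, not the same one: it uses [^>]* instead of .* so the lookahead only inspects the current tag, and the sample HTML is invented for the example:

```python
import re

# Variation on the pattern above: [^>]* keeps the lookahead inside one tag.
pattern = re.compile(
    r"""<a\b(?![^>]*href=["']/["'])[^>]*>(.*?)</a>""",
    re.IGNORECASE | re.DOTALL,
)

# Made-up sample: the first anchor points at "/", the second does not.
html = 'Keep <a href="/">home</a> but strip <a href="http://example.com/x">this link</a>.'

# Replace each matched anchor with its inner text (group 1).
result = pattern.sub(r"\1", html)

print(result)
```

For anything beyond simple, well-formed markup an HTML parser is usually the safer tool.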
I’m having a hard time figuring out the regex code in Google Sheets to check a cell then return everything including new lines \n and returns \r before a certain pattern \*+.
A little more background: I'm using REGEXEXTRACT(A:A,"...") format inside a bigger ArrayFormula so that it automatically updates when a new row is added. This one’s working properly. It’s only the regex part I’m having trouble with.
So, for the purpose of this question, let's say I'm only worried about extracting the data from the A1 cell before a certain pattern and return that value in cell B1. Which brings us to this code in cell B1:
REGEXEXTRACT(A1,"...")
For example, this is what my A1 cell looks like:
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus accumsan risus id ex dapibus sodales.
Curabitur dui lacus, tincidunt vel ligula quis, volutpat mattis eros.
In quis metus at ex auctor lobortis. Aliquam sed nisi purus. Sed cursus odio erat, ut tristique sapien interdum interdum. Morbi vel sollicitudin ante, non pellentesque libero.
***********
Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Aenean egestas urna facilisis massa posuere, quis accumsan erat ornare.
Curabitur at dapibus nibh. Nam nec vestibulum ligula. Phasellus bibendum mi urna, ac hendrerit libero interdum non. Suspendisse semper non elit aliquam auctor.
Morbi vel sem tortor. Donec a sapien quis erat condimentum consequat in ut sem. Quisque in tellus sed est lobortis ultricies sed vitae enim.
I want to return this value in B1:
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus accumsan risus id ex dapibus sodales.
Curabitur dui lacus, tincidunt vel ligula quis, volutpat mattis eros.
In quis metus at ex auctor lobortis. Aliquam sed nisi purus. Sed cursus odio erat, ut tristique sapien interdum interdum. Morbi vel sollicitudin ante, non pellentesque libero.
Which is basically anything before the pattern *******. In Python, I can add the re.DOTALL to the .* but I can't get this to work in Google Sheets.
To make a dot match line breaks, you need to add (?s) to the pattern. To match any char, you may use a dot (.). To match up to the leftmost occurrence, use a lazy quantifier, *?. To actually extract the substring you need, wrap the part of the pattern you are interested in with capturing parentheses.
So, to match up to the first ******* substring, you may use
(?s)^(.*?)\*\*\*\*\*\*\*
or (?s)^(.*?)\*{7}. See the regex demo (note that Go regex engine is also RE2, so you may test your patterns there, at regex101.com).
(?s) - a DOTALL modifier
^ - start of string
(.*?) - Group 1: any 0+ chars as few as possible
\*\*\*\*\*\*\* - 7 literal asterisk symbols.
Note you cannot rely on a negated character class (that matches line breaks) if your substring may contain * chars, that is, ^([^*]*)\*\*\*\*\*\*\* won't work in those cases.
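To see the difference between the two approaches, here is a small sketch in Python. Python's re module is not RE2, but it accepts the same inline (?s) flag and lazy quantifier; the cell content is invented and deliberately contains a single * before the separator:

```python
import re

# Invented cell content with one stray '*' before the separator line.
cell = (
    "First paragraph with a * star.\n"
    "Second paragraph.\n"
    "\n"
    "***********\n"
    "Text after the separator.\n"
)

# (?s) lets the dot cross line breaks; the lazy .*? stops at the first run of 7 stars.
lazy = re.search(r"(?s)^(.*?)\*{7}", cell).group(1)

# The negated class stops at the stray '*', which is not followed by 6 more stars,
# so the whole pattern fails to match.
negated = re.search(r"^([^*]*)\*{7}", cell)

print(repr(lazy))
print(negated)
```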
If you just want to match any chars up to the first * in the string, your regex will simplify greatly to
^([^*]+)
It matches
^ - start of string
([^*]+) - Capturing group 1: one or more chars other than *.
The re.DOTALL flag in Python corresponds to the (?s) single-line mode flag in RE2.
Python:
(Dot.) In the default mode, this matches any character except a newline. If the DOTALL flag has been specified, this matches any character including a newline.
re2:
Flags: s let . match \n (default false)
So,
=REGEXEXTRACT(A1,"(?s)(.*?)\*")
This corresponds to re.findall()
This is not regex, but it might suit someone who wants the same result and is less particular about the method:
=ArrayFormula(LEFT(A1:A,Find("***********",A1:A)-3))
If you really only want to match everything before the first *:
=REGEXEXTRACT(A1;"[^*]*")
If you want to allow a single star in the text and only stop at multiple (2 or more) stars (possibly divided by newlines) at the beginning of a line, you could try:
=REGEXEXTRACT(A1;"(?s)^(.*)\n(\*\n?){2,}")
But you would have to strip the stars. E.g.
=REGEXREPLACE(REGEXEXTRACT(A1;"(?s)^(.*)\n(\*\n?){2,}"); "\n(\*\n?){2,}"; "")
Lookaheads do not work in Google Sheets, because its RE2 engine does not support them.
I'm trying to match until the first occurrence of ] is found, but I can't seem to make it work. Could someone help me figure this out?
The string I'm matching against:
[plugin:tabs][tab title="test"]Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nam sit amet nisl nisl. Ut interdum libero vitae quam ultricies et lacinia elit aliquet. Praesent tincidunt, sem tempus feugiat feugiat, turpis tellus scelerisque erat, sit amet feugiat neque arcu ac lectus. Sed at mi et elit interdum scelerisque vitae eu felis.[/tab][/plugin]
What it should match:
[plugin:tabs]
What it keeps matching:
[plugin:tabs][tab title="test"]
The regex:
(\[plugin:(?<identifier>[^\s]+)(?<parameters>.*?)\])
EDIT:
What it should also match:
[plugin:tabs test="test"]
You just need to add ? like so (lazy match, will match as few characters as possible):
(\[plugin:(?<identifier>[^\s]+?)(?<parameters>.*?)\])
                              ^
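However, with a lazy identifier the group captures only the first character and (?<parameters>.*?) swallows the rest up to the ], so it is better to drop that part (the pattern then no longer matches tags that carry parameters).
So your final regex would look like this: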
(\[plugin:(?<identifier>[^\s]+?)\])
Edit: See @stema's answer.
Try this here
(\[plugin:(?<identifier>[^\]\s]+)(?<parameters>.*?)\])
See it here on Regexr
In addition to whitespace, this also excludes the ] character from the first named group.
If you don't need the first capturing group, you can make it a non-capturing group by adding ?: right after the opening parenthesis.
(?:\[plugin:(?<identifier>[^\]\s]+)(?<parameters>.*?)\])
To keep the space in between from being captured by the second group, just match optional whitespace between the two groups:
(?:\[plugin:(?<identifier>[^\]\s]+)\s*(?<parameters>.*?)\])
See it here on Regexr
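For a quick check against both inputs from the question, here is a sketch in Python. One assumption to flag: Python's re module spells named groups (?P<name>...) instead of (?<name>...), so the pattern is adapted accordingly; the rest is the last pattern above without the outer non-capturing group:

```python
import re

# Same logic as the last pattern above, with Python's (?P<name>...) group syntax.
pattern = re.compile(
    r"\[plugin:(?P<identifier>[^\]\s]+)\s*(?P<parameters>.*?)\]"
)

# Shortened versions of the two inputs from the question.
without_params = '[plugin:tabs][tab title="test"]Lorem ipsum[/tab][/plugin]'
with_params = '[plugin:tabs test="test"]'

m1 = pattern.search(without_params)
m2 = pattern.search(with_params)

print(m1.group(0), m1.group("identifier"), repr(m1.group("parameters")))
print(m2.group(0), m2.group("identifier"), repr(m2.group("parameters")))
```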
In any language that supports lookbehinds, this is probably the simplest solution:
/^.*?(?<=\])/
I'm trying to find a way to make an array of matched patterns out of a string.
I'll explain myself with an example.
From a string like
Lorem ipsum dolor **sit** amet, consectetur adipiscing elit.
Nulla elementum euismod mi. Morbi eu eros eget augue vestibulum semper.
Curabitur sapien purus, **semper** in consequat eu, gravida vitae purus.
I need to apply a regexp to extract the words sit and semper
and I really don't know how to manage it.
I would think a regex such as \*{2}(.*?)\*{2} would take care of it. For using regular expressions in Objective-C (assuming you're on an Apple platform), look at the NSRegularExpression documentation for iOS or Mac.
You can do it like this:
\*{2}([^*]+)\*{2}
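To show what a pattern of this kind returns, here is a short Python sketch (the question seems to target Objective-C, so treat this only as a demonstration of the regex itself). It uses the lazy variant \*{2}(.*?)\*{2} on the text from the question:

```python
import re

# The sample text from the question.
text = (
    "Lorem ipsum dolor **sit** amet, consectetur adipiscing elit.\n"
    "Nulla elementum euismod mi. Morbi eu eros eget augue vestibulum semper.\n"
    "Curabitur sapien purus, **semper** in consequat eu, gravida vitae purus.\n"
)

# Two literal asterisks, a lazy capture, then two more asterisks.
words = re.findall(r"\*{2}(.*?)\*{2}", text)

print(words)
```

The unmarked "semper" at the end of the second line is not picked up, because only text wrapped in double asterisks matches.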