How to Keep Some Words as Exceptions When a Regex Removes a List of Words in Google Sheets? - regex

UPDATE:
I just learned that the Golang flavor on the regex101 site is the right one for RE2. I also checked the GitHub syntax page for RE2 and found that:
(?!re) before text not matching re (NOT SUPPORTED)
I tested with the Golang flavor on regex101 and confirmed that the negative lookahead is causing the error:
So my updated question became:
Is there an alternative to negative lookahead that RE2 supports?
I found these two SO questions on this topic:
PCRE to RE2 regex conversion with negative lookahead
Negative look-ahead in Go regular expressions
The third comment on the second question seems to give a working hint, with a regex example here:
https://regex101.com/r/aM5oU3/4
Using this regex:
BBB[^B]*?EEE
I adapted it to my case like this:
(?i)\b([^(or|be)])\b(?i)\b(my|be|or)\b
The results are better, but still not as expected on the minimal example with the updated formula:
=ArrayFormula(IF(REGEXMATCH(trim(regexreplace(regexreplace(D2:D,"(?:([A-Z]([a-z]){1,}))|.", " $1"), "(\s)([A-Z])","$1 $2")),"(?i)\b(my|be|or)\b"),
trim(regexreplace(regexreplace(regexreplace(D2:D,"(?:([A-Z]([a-z]){1,}))|.", " $1"), "(\s)([A-Z])","$1 $2"),"\b[^(or|my)]\b(?i)\b(my|be|or)\b", " ")),
trim(regexreplace(regexreplace(D2:D,"(?:([A-Z]([a-z]){1,}))|.", " $1"), "(\s)([A-Z])","$1 $2"))))
The correct expected result would be:
Testing Match
Or Match
And Be
(I now notice that my earlier expected output was partly wrong: it should not return the lowercase "be" or "or" occurrences, as per the replace function specification. Sorry, I missed that before; with all the new concepts and logic acrobatics my attention got scattered.)
But with "Be" as the input in D2, the formula still returns "Be", whereas I expect it not to.
Any suggestions for solving this new issue?
Sample Sheet Update
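A side note on the adapted pattern: inside square brackets, [^(or|be)] is a negated character class, so it matches exactly one character that is not (, o, r, |, b, e or ); it does not exclude the words "or" and "be". The quick check below uses Python's re purely as a convenient tester (it is not RE2, but single-character classes behave the same way in both); the names are made up for illustration.

```python
import re

# [^(or|be)] is ONE character that is not in the set ( o r | b e )
NOT_OR_BE = re.compile(r"[^(or|be)]")


def class_matches(text: str) -> bool:
    # fullmatch: the whole string must be matched by the one-character class
    return NOT_OR_BE.fullmatch(text) is not None


print([c for c in "xo|my" if class_matches(c)])  # ['x', 'm', 'y']
print(class_matches("my"))  # False: two characters, but the class matches only one
```

So the class silently removes single letters rather than skipping whole words, which is why it cannot stand in for (?!or|be).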
I have this formula that removes some words with the corresponding regex:
=ArrayFormula(IF(REGEXMATCH(trim(regexreplace(regexreplace(D2:D,"(?:([A-Z]([a-z]){1,}))|.", " $1"), "(\s)([A-Z])","$1 $2")),"(?i)\b(my|be|or)\b"),
trim(regexreplace(regexreplace(regexreplace(D2:D,"(?:([A-Z]([a-z]){1,}))|.", " $1"), "(\s)([A-Z])","$1 $2"),"\b(?i)(?!or|be)\b(?i)\b(my|be|or)\b", " ")),
trim(regexreplace(regexreplace(D2:D,"(?:([A-Z]([a-z]){1,}))|.", " $1"), "(\s)([A-Z])","$1 $2"))))
=ArrayFormula(
IF(
REGEXMATCH(
trim(
regexreplace(
regexreplace(
D2:D,
"(?:([A-Z]([a-z]){1,}))|.",
" $1"
),
"(\s)([A-Z])",
"$1 $2"
)
),
"(?i)\b(my|be|or)\b"
),
trim(
regexreplace(
regexreplace(
regexreplace(
D2:D,
"(?:([A-Z]([a-z]){1,}))|.",
" $1"
),
"(\s)([A-Z])",
"$1 $2"
),
"\b(?i)(?!or|be)\b(?i)\b(my|be|or)\b",
" "
)
),
trim(
regexreplace(
regexreplace(
D2:D,
"(?:([A-Z]([a-z]){1,}))|.",
" $1"
),
"(\s)([A-Z])",
"$1 $2"
)
)
)
)
I need it to allow some exceptions (or|be) on this minimal test sample:
Testing for my Match.
Or for or Match.
And Be for be.
The expected result should be:
Testing Match
Or or
Be be
I tried "\b(?i)(?!or|be)\b(?i)\b(my|be|or)\b", but it doesn't work in Google Sheets, while it does work in this tester: https://regex101.com/r/Liw6hg/1.
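Because RE2 has no lookahead, one common workaround is to list the exceptions first in an alternation, capture them, and write the capture back in the replacement: an exception is put back unchanged, while any other match leaves the group empty and is deleted. The Python sketch below assumes a hypothetical rule (keep the capitalized "Or" and "Be", remove my/be/or in any other casing), since the exact rule is still being worked out in the question. The pattern only uses syntax RE2 also documents (alternation, a capturing group, \b and a scoped (?i:...) flag); in REGEXREPLACE the back-reference would be written $1 instead of \1, and whether Sheets fills an unset group with an empty string is assumed here, not verified.

```python
import re

# Hypothetical exception rule (an assumption, not from the question):
# capitalized "Or"/"Be" are kept, my/be/or in other casings are removed.
# Branch 1 is tried first at each position and is case-sensitive;
# branch 2 uses a scoped (?i:...) flag so it matches any casing.
PATTERN = re.compile(r"\b(Or|Be)\b|(?i:\b(?:my|be|or)\b)")


def remove_words(text: str) -> str:
    # An exception match fills group 1, so \1 writes it back unchanged;
    # a removable match leaves group 1 unset, which Python 3.5+ replaces with "".
    replaced = PATTERN.sub(r"\1", text)
    # Collapse leftover runs of whitespace, similar to TRIM in Sheets.
    return " ".join(replaced.split())


for line in ["Testing for my Match.", "Or for or Match.", "And Be for be."]:
    print(remove_words(line))
# Testing for Match.
# Or for Match.
# And Be for .
```

With this rule, "for" stays because it is not in the word list, and the final period of the third line is left on its own once "be" is removed.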
I also looked at these other answers but could not adapt them successfully:
A regular expression to exclude a word/string
Convert regular expression into re2 that works in Google Spreadsheets?
How to exclude a specific string constant?
Any solution tip is greatly appreciated.
Sample Sheet

Related

Google Sheets REGEXEXTRACT between two quotes

I'm trying to extract data between two quotes using the Google Sheets REGEXEXTRACT function.
The regex works perfectly:
(?<=actor_email":")(.*?)(?=")
Data in the cell is:
{"account_name":"Test","actor_email":"test#test.com","user_email":"anyone#test.com"}
However, placing it within the Google Sheet gives an error.
I have tried a number of combinations with no luck.
For example, I tried: (?<=actor_email""":""")(.*?)(?=""")
The output should be: test#test.com
You may use
=REGEXEXTRACT(A1, "actor_email"":""([^""]+)""")
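Google Sheets uses RE2, which supports neither lookbehind nor lookahead, so the value is captured with a group instead, and each literal " inside the formula string is written as "". The underlying pattern is actor_email":"([^"]+)", where: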
actor_email":" - a literal substring
([^"]+) - Capturing group 1 (the value extracted): any 1+ chars other than "
" - a " char (may be removed if this " can be missing)
Alternatively, remove the quotes first:
=REGEXEXTRACT(SUBSTITUTE(A1, """", ), "actor_email:(.+),user_")
=REGEXEXTRACT(SUBSTITUTE(A1, """", " "), "actor_email : ([^ ]+)")
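Outside Sheets, both approaches can be checked with a short Python sketch (Python's re is only a stand-in tester here, and the function names are illustrative). The first function reads the value from capturing group 1, as the REGEXEXTRACT answer does; the second drops every quote first, like SUBSTITUTE(A1, """", ), and then captures up to ",user_".

```python
import re

# The cell content from the question
CELL = '{"account_name":"Test","actor_email":"test#test.com","user_email":"anyone#test.com"}'


def extract_with_group(cell: str):
    # Same idea as actor_email":"([^"]+)": no lookarounds, value taken from group 1
    m = re.search(r'actor_email":"([^"]+)"', cell)
    return m.group(1) if m else None


def extract_without_quotes(cell: str):
    # Remove all quotes, then capture everything between "actor_email:" and ",user_"
    m = re.search(r"actor_email:(.+),user_", cell.replace('"', ""))
    return m.group(1) if m else None


print(extract_with_group(CELL))      # test#test.com
print(extract_without_quotes(CELL))  # test#test.com
```

The group-based version is the more robust of the two, since it does not depend on which key follows actor_email.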

Jupyter Notebook's search and replace not as greedy as regex101's javascript

I have a number of logger.*() calls that I want to convert to simple print() statements in a Jupyter Notebook. I already changed the beginning of the lines (logger.*( ). Now I need to fix the tail and change ", e to " % (e:
print("(%s):\n"
" Failed to load logger" % (e, ))
logger.error("(%s):\n"
" Validation Testing errors occurred. '%s'",
e, report_file)
logger.critical("(%s):\n"
" Failed to return Parsed Report "
" in debug mode.", e)
logger.critical("(%s):\n"
" Error loading template.", e)
Using Regex101 to test my JavaScript regex, I wrote:
print\("[\s\S.]*(", e)
But in Jupyter's find and replace, this only captures up to print("(%s)\n".
The search and replace preview shows only a single line. Nonetheless, regex replace works across multiple lines. Your sample string can be replaced as suggested:
print\("[\s\S]*?", e\)
While the dialog shows 0 matches, the replacement works anyway:
Note: I've modified your search pattern. The modified dot, [\s\S]*?, should match lazily to avoid matching too much. I also removed the capture group, since it looks like you do not need it.
Update: As it turned out, the capture group had to be inverted, i.e. wrap the part to keep rather than ", e, so that only the string in question is replaced (kudos to xtian):
Search:
(print\("[\s\S.]*)", e
Replace:
$1 " % (e,
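The lazy search-and-replace idea can be reproduced with Python's re.sub as a sketch. Two assumptions: only single-argument calls ending in ", e) are covered (the e, report_file call breaks the line after the comma, so this pattern does not handle it, and a print( without a ", e) tail would let the lazy match run on into the next statement), and the replacement here writes both closing parentheses, so the (e,) tuple and the print( call are each closed. Python writes the group as \1 in the replacement where Jupyter's dialog uses $1.

```python
import re

# Two of the question's statements, with logger.*( already turned into print(.
# A raw string keeps the \n escape inside the literals as two characters.
SOURCE = r'''print("(%s):\n"
      " Failed to return Parsed Report "
      " in debug mode.", e)
print("(%s):\n"
      " Error loading template.", e)'''

# Lazy [\s\S]*? stops at the FIRST ", e) after each print(" rather than the last.
TAIL = re.compile(r'(print\("[\s\S]*?)", e\)')


def fix_tails(src: str) -> str:
    # Keep group 1, then add the % formatting and close the tuple and the print( call.
    return TAIL.sub(r'\1" % (e,))', src)


print(fix_tails(SOURCE))
```

With a greedy [\s\S]* instead, the first match would stretch from the first print(" to the last ", e) in the text, merging both statements into one replacement.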

Regex find incorrect "

I am trying to build a regex to find wrong " characters in a CSV file, for example:
,"Nori",,,,,896282962,23.07.2013,,,,"Lady Love "Karo","w",
The " before Karo is wrong, but there can be multiple " inside a ,"", column.
So every ," and ", is correct, but a " with a character before or after it and no comma directly before or after it is incorrect.
Can anyone help me find the correct regex pattern?
Regards.
You can use the following to match:
(?<!,|^)"(?!,|$)
See the regex demo.
Explanation:
(?<!,|^) : Negative lookbehind; the quote must not be preceded by , or the start of the string
" : Match the quote
(?!,|$) : Negative lookahead; the quote must not be followed by , or the end of the string
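A quick check of the same idea in Python, as a sketch: Python's re only accepts fixed-width lookbehind, so (?<!,|^) is rewritten here as (?<=[^,]) (a preceding character that is not a comma, which also rules out the start of the line), and the lookahead as (?=[^,]) for symmetry. For a single CSV line the two forms select the same quotes; the function name is illustrative.

```python
import re

LINE = ',"Nori",,,,,896282962,23.07.2013,,,,"Lady Love "Karo","w",'

# (?<=[^,]) : a preceding character that is not a comma (so not at the start)
# "         : the quote itself
# (?=[^,])  : a following character that is not a comma (so not at the end)
BAD_QUOTE = re.compile(r'(?<=[^,])"(?=[^,])')


def bad_quote_positions(line: str):
    # Index of every quote that neither opens nor closes a field
    return [m.start() for m in BAD_QUOTE.finditer(line)]


positions = bad_quote_positions(LINE)
print(positions)
print(LINE[positions[0]:positions[0] + 5])  # "Karo
```

Only the quote before Karo is reported; the quotes around Nori, Lady Love "Karo and w all touch a comma.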

Regular expression for extracting excerpt from long String

I want to extract excerpts from a long string using a regular expression.
Example string: "" Is it possible that Germany, which beat Argentina 1-0 today to win the World Cup, that will end up as a loser in terms of economic growth? ""
String to search: " that "
Expected result from the regex:
" possible that Germany "
" rd Cup, that will end "
I want to find the search string together with the 9 characters before and the 9 characters after each occurrence. The search string can occur multiple times within the given string.
I am working on an iOS app using iOS 7.
With my limited knowledge of regular expressions, I have created this expression so far, but it does not give the desired result:
" (.){0,9} (that) {0,9} "
Remove the spaces from your regex. If you want to capture the matches, enclose the pattern in a capturing group, i.e. ():
.{9}that.{9}
OR
(?:.{9}|.{0,9})that(?:.{9}|.{0,9})
DEMO
The second version makes the preceding and following characters optional, so it also matches a line that starts with the search word, such as "that will change history".
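A small Python sketch of the .{0,9} approach (the question targets iOS, where NSRegularExpression would be the native API; Python's re is only a stand-in here, and the names are illustrative). Using {0,9} instead of {9} keeps occurrences near the start or end of the text, and the spaces around "that" follow the question's search string " that ".

```python
import re

TEXT = ("Is it possible that Germany, which beat Argentina 1-0 today to win "
        "the World Cup, that will end up as a loser in terms of economic growth?")

# Up to 9 characters on each side of " that "
EXCERPT = re.compile(r".{0,9} that .{0,9}")


def excerpts(text: str):
    # finditer scans left to right and returns non-overlapping matches
    return [m.group(0) for m in EXCERPT.finditer(text)]


print(excerpts(TEXT))
# [' possible that Germany, ', 'orld Cup, that will end ']
```

Because the matches do not overlap, two occurrences closer together than the window can end up inside one excerpt or lose part of their context.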
Well, in your expression you were just missing the second "." and maybe the "?" for spaces.
.{0,9} ?that ?.{0,9}
Try that.
You can add ( ) to create groups if you want. I added the "?" so that it also works with your other example:
" that will change history"

Regex Valid Twitter Mention

I'm trying to find a regex that matches if a tweet is a true mention. To be a mention, the string can't start with "#", can't contain "RT" (case-insensitive), and "#" must start the word.
Some examples, with the desired output in the comments:
function search($strings, $regexp) {
    foreach ($strings as $string) {
        echo "Sentence: \"$string\" <- " .
            (preg_match($regexp, $string) ? "MATCH" : "NO MATCH") . "\n";
    }
}
$strings = array(
    "Hi #peter, I like your car ", // <- MATCH
    "#peter I don't think so!", // <- NO MATCH: the string starts with #, so it's a reply
    "Helo!! :# how are you!", // <- NO MATCH: it's not a word, we need #(word)
    "Yes #peter i'll eat them this evening! RT #peter: hey #you, do you want your pancakes?", // <- NO MATCH: "RT"/"rt" in the string, it's an RT
    "Helo!! ineed#aser.com how are you!", // <- NO MATCH: the word doesn't start with #
    "#peter is the best friend you could imagine. RT #juliet: #you do you know if #peter it's awesome?" // <- NO MATCH: starts with # (a reply) and contains RT
);
echo "Example 1:\n";
search($strings, "/(?:[[:space:]]|^)#/i");
Current output:
Example 1:
Sentence: "Hi #peter, I like your car " <- MATCH
Sentence: "#peter I don't think so!" <- MATCH
Sentence: "Helo!! :# how are you!" <- NO MATCH
Sentence: "Yes #peter i'll eat them this evening! RT #peter: hey #you, do you want your pancakes?" <- MATCH
Sentence: "Helo!! ineed#aser.com how are you!" <- MATCH
Sentence: "#peter is the best friend you could imagine. RT #juliet: #you do you know if #peter it's awesome?" <- MATCH
EDIT:
I need it as a regex because it can also be used in MySQL and other languages. I am not looking for any username; I only want to know whether the string is a mention or not.
This regexp might work a bit better: /\B\#([\w\-]+)/gim
Here's a jsFiddle example of it in action: http://jsfiddle.net/2TQsx/96/
Here's a regex that should work:
/^(?!.*\bRT\b)(?:.+\s)?#\w+/i
Explanation:
/^ //start of the string
(?!.*\bRT\b) //Verify that RT (as a whole word) is not in the string.
(?:.+\s)? //Optionally match one or more chars followed by whitespace.
//Note: (?: ) makes the group non-capturing.
#\w+ //Find # followed by one or more word chars.
/i //Make it case insensitive.
I have found that this is the best way to find mentions inside a string in JavaScript. I don't know exactly how I would handle the RTs, but I think this might help with part of the problem.
var str = "#jpotts18 what is up man? Are you hanging out with #kyle_clegg";
var pattern = /#[A-Za-z0-9_-]*/g;
str.match(pattern); // ["#jpotts18", "#kyle_clegg"]
I guess something like this will do it:
^(?!.*?RT\s).+\s#\w+
Roughly translated: at the beginning of the string, look ahead to check that RT\s is not present, then match one or more characters, followed by whitespace, a #, and at least one letter, digit or underscore.
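To check the idea against the question's samples, here is a Python sketch. It is a variant, not the answer verbatim: RT is written as a whole word (\bRT\b, borrowed from the earlier answer) instead of RT\s, and the case-insensitive flag the question asks for is added. The sample strings are copied as they appear in the question.

```python
import re

# Lookahead rejects any string containing RT as a whole word (any casing);
# .+\s#\w+ then needs at least one character and a whitespace before the #word.
MENTION = re.compile(r"^(?!.*\bRT\b).+\s#\w+", re.IGNORECASE)

SAMPLES = [
    "Hi #peter, I like your car ",
    "#peter I don't think so!",
    "Helo!! :# how are you!",
    "Yes #peter i'll eat them this evening! RT #peter: hey #you, do you want your pancakes?",
    "Helo!! ineed#aser.com how are you!",
    "#peter is the best friend you could imagine. RT #juliet: #you do you know if #peter it's awesome?",
]

for s in SAMPLES:
    print("MATCH" if MENTION.search(s) else "NO MATCH", "<-", s)
```

Only the first sample prints MATCH. The reply starting with # fails because .+\s requires at least one character and a whitespace before the #, and ":#" and "ineed#" fail because the # is not preceded by whitespace.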
Twitter has published the regex they use in their twitter-text library. They have other language versions posted as well on GitHub.
A simple pattern that works correctly even when the scraping tool sometimes appends special characters: (?<![\w])#[\S]*\b. This worked for me.