I've got a long google sheets QUERY, part of which is this:
=QUERY(LOOKUP!$A$4:$H,"Select count(B) where UPPER(D) matches 'OK' and UPPER(H) matches '.*(?:^|,|,\s)"&REGEXEXTRACT(REGEXREPLACE($Q3,"\s|-","")," \w+ ")&"(?:,\s|,|$).*' and (UPPER(C) contains '"&REGEXEXTRACT($Q3, "\{(\w+)\}")&"' or UPPER(F) contains '"&REGEXEXTRACT($Q3, "\{(\w+)\}")&"') limit 1 label count(B) ''",0)
Basically, if I have an entry like apple {pear}, I only want the apple bit to be matched as part of the query. This works absolutely fine except when I put an & in the bit to match, e.g. with apple&banana {pear} the match fails even though apple&pear is definitely present in the lookup, so I think the issue is with my RegEx. I've tried replacing the \w+ selector in the RegEx above with .* but no luck.
Any help would be much appreciated
It would make more sense to replace \w+ with \w+(?:&\w+)*.
The \w+(?:&\w+)* pattern matches
\w+ - 1 or more word chars, letters, digits or _
(?:&\w+)* - matches 0 or more occurrences of:
& - a & char
\w+ - 1 or more word chars, letters, digits or _
If you do not want to match _, use
[A-Za-z0-9]+(?:&[A-Za-z0-9]+)*
Note that [A-Za-z0-9&]+ can also be used if you do not care whether there are consecutive & chars in the input (and want to match them).
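As a quick check outside Sheets, here is a minimal Python sketch comparing the original \w+ with the suggested patterns. It assumes the cell contains apple&banana {pear} and uses Python's re module rather than the Google Sheets regex engine, so treat it as an illustration of the patterns only. The variable names are made up for this example.

```python
import re

cell = "apple&banana {pear}"  # hypothetical cell value

# Original selector: stops at the first non-word char, the "&".
plain = re.match(r"\w+", cell).group()

# Suggested selector: word chars, then any number of "&word" repetitions.
with_amp = re.match(r"\w+(?:&\w+)*", cell).group()

# Variant that does not allow "_" in the matched text.
no_underscore = re.match(r"[A-Za-z0-9]+(?:&[A-Za-z0-9]+)*", cell).group()

print(plain)          # apple
print(with_amp)       # apple&banana
print(no_underscore)  # apple&banana
```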
I used an online regex learning site (regexr) and created something that works, but with my very limited regex experience I could do with some help/advice.
In IIS10 logs, there is a list for time, date... but I am only interested in the cs(User-Agent) field.
My Regex:
(scan\-\d+)(?:\w)+\.shadowserver\.org
which matches these:
scan-02.shadowserver.org
scan-15n.shadowserver.org
scan-42o.shadowserver.org
scan-42j.shadowserver.org
scan-42b.shadowserver.org
scan-47m.shadowserver.org
scan-47a.shadowserver.org
scan-47c.shadowserver.org
scan-42a.shadowserver.org
scan-42n.shadowserver.org
scan-42o.shadowserver.org
but what I would like it to do is:
Match a single number with the option of capturing more than one: scan-2 or scan-02 with an optional letter: scan-2j or scan-02f
Append the rest of the User Agent: .shadowserver.org to the regex.
I will then add it to an existing URL Rewrite rule (as a condition) to abort the request.
Any advice/help would be very much appreciated
Tried:
To write a regex for IIS10 to block requests from a certain user-agent
Expected:
It to work on single numbers as well as double/triple numbers with or without a letter.
(scan\-\d+)(?:\w)+\.shadowserver\.org
Input Text:
scan-2.shadowserver.org
scan-02.shadowserver.org
scan-2j.shadowserver.org
scan-02j.shadowserver.org
scan-17w.shadowserver.org
scan-101p.shadowserver.org
UPDATE:
I eventually came up with this:
scan\-[0-9]+[a-z]{0,1}\.shadowserver\.org
This is an explanation of your regex pattern. If you only want the solution, go directly to the end.
(scan\-\d+)(?:\w)+
(scan\-\d+) - Group 1: matches the word scan followed by a literal -. You escaped the hyphen with a \, but outside a character class an unescaped - is already a literal -, so the escape is not needed here. The - is followed by \d+, which matches one or more digits (0-9), so there must be at least one digit. Whatever this group matches is saved in the first capturing group.
(?:\w)+ - a non-capturing group containing \w, which matches one character from [A-Za-z0-9_]. The + after the group means the whole group is matched one or more times, so it matches one or more word characters. Note that the non-capturing group is redundant here; \w+ can be used directly.
Taking two examples:
The first example: scan-02.shadowserver.org
(scan\-\d+)(?:\w)+
scan matches the word scan in scan-02 and \- matches the hyphen after it, giving scan-. \d+ (one or more digits) first matches 02, so the match so far is scan-02. Then (?:\w)+ must match at least one word character, but the next character is the period ., which is not a word character, so it fails. Is it over at this point? No. The regex engine backtracks into \d+, which this time matches only the 0 in scan-02, so scan-0 is saved in the first capturing group and (?:\w)+ matches the 2. Why does the engine go back to \d+? Because both \d+ and (?:\w)+ use +, the pattern requires at least one digit and at least one word character, and the engine gives characters back from \d+ until both requirements are met.
The second example: scan-2.shadowserver.org
(scan\-\d+)(?:\w)+
(scan\-\d+) matches scan-2, then (?:\w)+ tries to match the period after scan-2 and fails. This is the important point: \d+ holds only one digit and cannot give any up, so backtracking does not help. The engine then retries the whole pattern from the next position in scan-2.shadowserver.org, the character c, where the s in (scan\-\d+) fails to match c. It keeps advancing and retrying, and in the end the match fails.
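The two examples above can be reproduced with a minimal Python sketch. It assumes Python's re module behaves like the engine in question for this pattern (both are backtracking engines), and the variable names are only illustrative.

```python
import re

# The original pattern from the question.
pattern = re.compile(r"(scan\-\d+)(?:\w)+\.shadowserver\.org")

# Two digits: \d+ gives back the "2" so that (?:\w)+ has something to match.
two_digits = pattern.search("scan-02.shadowserver.org")
print(two_digits.group(0))  # scan-02.shadowserver.org
print(two_digits.group(1))  # scan-0

# One digit: \d+ cannot give anything back, so there is no match at all.
one_digit = pattern.search("scan-2.shadowserver.org")
print(one_digit)  # None
```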
Simple solution:
(scan-\d+[a-z]?)\.shadowserver\.org
Explanation
(scan-\d+[a-z]?) - Group 1: captures the word scan, followed by a literal -, followed by \d+ (one or more digits), followed by an optional lowercase letter [a-z]?. The ? makes the [a-z] part optional; without it, [a-z] would require exactly one lowercase letter.
See regex demo
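To check the simple solution against the input text from the question, here is a small Python sketch. Using fullmatch to test the whole string is my assumption for this illustration; inside a URL Rewrite condition the pattern would be applied to the full User-Agent value instead.

```python
import re

pattern = re.compile(r"(scan-\d+[a-z]?)\.shadowserver\.org")

# Input text from the question.
agents = [
    "scan-2.shadowserver.org",
    "scan-02.shadowserver.org",
    "scan-2j.shadowserver.org",
    "scan-02j.shadowserver.org",
    "scan-17w.shadowserver.org",
    "scan-101p.shadowserver.org",
]

# Group 1 holds the "scan-<digits><optional letter>" part of each host name.
captured = [pattern.fullmatch(agent).group(1) for agent in agents]
print(captured)
# ['scan-2', 'scan-02', 'scan-2j', 'scan-02j', 'scan-17w', 'scan-101p']

# At least one digit is still required.
no_digit = pattern.fullmatch("scan-.shadowserver.org")
print(no_digit)  # None
```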
A colleague has written some C# code that outputs GUIDs to a CSV file. The code has been running for a while but it has been discovered that the GUIDs contain underscore characters, instead of hyphens :-(
There are several files which have been produced already and rather than regenerate these, I'm thinking that we could use the Search and Replace facility in Notepad++ to search across the files for "GUIDs" in this format:
{89695C16_C0FF_4E7C_9BB2_8B50FAC9D371}
and replace it with a properly formatted GUID like this:
{89695C16-C0FF-4E7C-9BB2-8B50FAC9D371}.
I have a RegEx to find the offending GUIDs (probably not very efficient):
(([A-Z]|[0-9]){8}_)(([A-Z]|[0-9]){4})_(([A-Z]|[0-9]){4})_(([A-Z]|[0-9]){4}_(([A-Z]|[0-9]){12}))
but I don't know what RegEx to use to replace the underscores with. Does anybody know how to do this?
You can use the following solution:
Find What: (?:\G(?!\A)|{(?=[a-f\d]{8}(?:_[a-f\d]{4}){4}[a-f\d]{8}\}))[a-f\d]*\K_
Replace with: -
Match case: OFF
See the regex demo online. Details:
(?:\G(?!\A)|{(?=[a-f\d]{8}(?:_[a-f\d]{4}){4}[a-f\d]{8}\})) - either the end of the previous match or a { char immediately followed by eight hex chars, four repetitions of an underscore plus four hex chars, then eight more hex chars and a } char
[a-f\d]* - zero or more hex chars
\K - match reset operator that discards the text matched so far from the overall match memory buffer
_ - an underscore.
You can match the pattern with 5 capture groups where you would match the underscores in between.
Then you can use the capture groups in the replacement with $1-$2-$3-$4-$5
{\K([A-Z0-9]{8})_([A-Z0-9]{4})_([A-Z0-9]{4})_([A-Z0-9]{4})_([A-Z0-9]{12})(?=})
{ Match {
\K Clear the match buffer (forget what is matched so far)
([A-Z0-9]{8})_ Capture group 1, match 8 times a char A-Z0-9
([A-Z0-9]{4})_ Capture 4 times a char A-Z0-9 in group 2
([A-Z0-9]{4})_ Same for group 3
([A-Z0-9]{4})_ Same for group 4
([A-Z0-9]{12}) Capture 12 times a char A-Z0-9 in group 5
(?=}) Positive lookahead, assert } to the right
Regex demo
If the pattern should also match without the curly braces { and }, you can use word boundaries instead
\b([A-Z0-9]{8})_([A-Z0-9]{4})_([A-Z0-9]{4})_([A-Z0-9]{4})_([A-Z0-9]{12})\b
Regex demo
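If the files are processed with a script rather than Notepad++, the capture-group approach carries over to Python, as in this minimal sketch. Python's re module does not support \K, so this version matches the braces and writes them back in the replacement; the function name fix_guids and the sample CSV line are made up for illustration.

```python
import re

# Five capture groups separated by underscores, wrapped in literal braces.
GUID_WITH_UNDERSCORES = re.compile(
    r"\{([A-Z0-9]{8})_([A-Z0-9]{4})_([A-Z0-9]{4})_([A-Z0-9]{4})_([A-Z0-9]{12})\}"
)

def fix_guids(text: str) -> str:
    """Replace the underscores inside brace-wrapped GUIDs with hyphens."""
    return GUID_WITH_UNDERSCORES.sub(r"{\1-\2-\3-\4-\5}", text)

# Hypothetical CSV line; underscores outside the GUID are left untouched.
line = "some_column,{89695C16_C0FF_4E7C_9BB2_8B50FAC9D371},other_value"
print(fix_guids(line))
# some_column,{89695C16-C0FF-4E7C-9BB2-8B50FAC9D371},other_value
```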
I want to extract [games, games, things, things] from
the following array.
Today_games
Today_games_freq
Today_things
Today_things_freq
I have tried Today_(\w+)(?=_freq)?
Which will give me the extra "freq"
I also tried some other combinations, but I couldn't figure out how to get just the part after the first underscore.
You can use
Today_(\w+?)(?:_freq)?$
See the regex demo. This matches Today_, then captures any one or more word chars (as few as possible) into Group 1 (with (\w+?)), and then (?:_freq)?$ matches an optional occurrence of a _freq substring and asserts the position at the end of string.
Or,
Today_([^\W_]+)
See this regex demo.
Here, Today_ is matched and the ([^\W_]+) pattern captures one or more alphanumeric chars into Group 1 (same as \w+ with _ subtracted from \w).
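Both patterns can be tried on the four sample strings with a short Python sketch. Python's re module is an assumption here, since the question does not name a language, and the variable names are illustrative.

```python
import re

names = ["Today_games", "Today_games_freq", "Today_things", "Today_things_freq"]

# Pattern 1: lazy capture, optional "_freq" suffix, anchored at the end.
lazy = [re.search(r"Today_(\w+?)(?:_freq)?$", n).group(1) for n in names]

# Pattern 2: capture word chars other than "_" right after "Today_".
no_underscore = [re.search(r"Today_([^\W_]+)", n).group(1) for n in names]

print(lazy)           # ['games', 'games', 'things', 'things']
print(no_underscore)  # ['games', 'games', 'things', 'things']

# The two differ when the middle part itself contains an underscore.
tricky = "Today_big_games_freq"  # hypothetical extra input
lazy_tricky = re.search(r"Today_(\w+?)(?:_freq)?$", tricky).group(1)
short_tricky = re.search(r"Today_([^\W_]+)", tricky).group(1)
print(lazy_tricky)   # big_games
print(short_tricky)  # big
```

Which of the two fits depends on whether the part after Today_ may contain underscores of its own.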
I am trying to use replace in Sublime using regular expressions but I'm stuck. I tried various combinations but don't seem to be getting there.
This is the input and my desired output:
Input: N_BBP_c_46137_n
Output : BBP
I tried combinations of:
[^BBP]+\b
\*BBP*+\g
But none of the above (or the many others I tried) seem to work.
To turn N_BBP_c_46137_n into BBP (according to the comment, you want the entire long name starting with N_BBP_ to be replaced by only BBP), you might also use a capture group to keep BBP.
\bN_(BBP)_\S*
\bN_ Match N_ preceded by a word boundary
(BBP) Capture group 1, match BBP (or use [A-Z]+ to match 1+ uppercase chars)
_\S* Match _ followed by 0+ times a non whitespace char
In the replacement use the first capturing group $1
Regex demo
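The same find-and-replace can be sketched in Python to see the result. This is an illustration only: Sublime writes the replacement as $1, while Python's re.sub writes it as \1.

```python
import re

name = "N_BBP_c_46137_n"

# Keep only capture group 1 (BBP) and drop the rest of the name.
result = re.sub(r"\bN_(BBP)_\S*", r"\1", name)
print(result)  # BBP

# Variant from the explanation: any run of uppercase letters instead of BBP.
generic = re.sub(r"\bN_([A-Z]+)_\S*", r"\1", name)
print(generic)  # BBP
```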
You may use
(N_)[^_]*(_c_\d+_n)
Replace with ${1}some new value$2.
Details
(N_) - Group 1 ($1 or ${1} if the next char is a digit): N_
[^_]* - any 0 or more chars other than _
(_c_\d+_n) - Group 2 ($2): _c_, 1 or more digits and then _n.
See the regex demo.
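For comparison, here is a minimal Python sketch of the same replacement. Python spells the unambiguous group reference ${1} as \g<1>; the replacement texts used here are placeholders.

```python
import re

name = "N_BBP_c_46137_n"

# Keep the N_ prefix (group 1) and the _c_<digits>_n suffix (group 2),
# replacing only the part in between.
replaced = re.sub(r"(N_)[^_]*(_c_\d+_n)", r"\g<1>some new value\2", name)
print(replaced)  # N_some new value_c_46137_n

# \g<1> keeps the group number separate from digits that follow it.
with_digits = re.sub(r"(N_)[^_]*(_c_\d+_n)", r"\g<1>123\2", name)
print(with_digits)  # N_123_c_46137_n
```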
I have a spreadsheet of tweets and want to isolate username mentions in Google Sheets. Somehow, regexps that work in R or other languages are not doing the job there.
An example:
RT #Neromoto: #cazainfractor inconsciente agresiva y poco ciudadana conductora
Desired output:
#Neromoto
#cazainfractor
I have tried this: REGEXEXTRACT(B1,(^|[^#\w])#(\w{1,15})\b).
First of all, your (^|[^#\w])#(\w{1,15})\b regex pattern must be put inside a string literal, i.e. double quotes. Then note that every capturing group will be output, so you may want to make the first group non-capturing by replacing ( with (?:. Also, the last \b is mostly redundant: once \w has consumed all the word chars it can, what follows is either the end of string or a non-word char. It only makes a difference for names longer than 15 word chars, which it rejects.
I'd rather suggest
=REGEXEXTRACT(B1,"\B#\w{1,15}")
Or
=REGEXREPLACE(B1,"(\B#\w{1,15})\s*|.","$1 ")
Details:
\B - a non-word boundary (that is, before #, there can be either start of string or a non-word char)
# - a # char
\w{1,15} - 1 to 15 word chars (if you do not care about the length, replace {1,15} with +)
And the second regex details:
(\B#\w{1,15})\s* - Group 1 capturing # at the non-word boundary position, 1 to 15 word chars and then 0+ whitespaces (in the replacement, the $1 backreference inserts the found mentions back into the resulting string)
| - or
. - any 1 char.
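The first pattern can be tried on the sample tweet with a short Python sketch. Note the difference in behaviour: REGEXEXTRACT returns only the first match, whereas re.findall, used here for illustration, returns every match. The variable names are made up for this example.

```python
import re

tweet = ("RT #Neromoto: #cazainfractor inconsciente agresiva "
         "y poco ciudadana conductora")

# \B requires that the char before "#" is not a word char (or is the start).
mentions = re.findall(r"\B#\w{1,15}", tweet)
print(mentions)  # ['#Neromoto', '#cazainfractor']

# A "#" glued to the end of a word is not at a \B position, so it is skipped.
glued = re.findall(r"\B#\w{1,15}", "foo#bar")
print(glued)  # []
```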