Match the word "bar" if found anywhere in a field - regex

I am trying to use a CASE statement in Google Data Studio to return a Boolean result if a given string is found within an existing field.
As Google Data Studio uses RE2 regex syntax, I believed the following would work, but it returns a "could not parse formula" error:
CASE
WHEN REGEXP_MATCH(Foo, '(\W|^)bar(\W|$)') THEN 1
ELSE 0
END
I have tried many different combinations of RegEx syntax, but can't work it out. Any help would be much appreciated as this should be a simple REGEXP_MATCH?
The Boolean result should be true if the string is found anywhere within the field:
+---------------------------+----------------+
| Foo                       | Boolean Result |
+---------------------------+----------------+
| blah bar / boo doo        | True           |
| but is / should not match | False          |
| but match / here bar      | True           |
+---------------------------+----------------+

REGEXP_MATCH only succeeds when the pattern matches the whole field, so the pattern has to account for everything before and after the word. Also, regex escapes must be double escaped (\\W rather than \W):
CASE WHEN REGEXP_MATCH(Foo, '(.*\\W|^)bar(\\W.*|$)') THEN 1 ELSE 0 END
If there are line breaks in Foo, add (?s) at the start of the pattern.
Details
(.*\\W|^) - either any 0+ chars, as many as possible, followed by a non-word char, or the start of the string
bar - the word
(\\W.*|$) - either a non-word char followed by any 0+ chars, as many as possible, or the end of the string
See the regex demo.
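To sanity-check the pattern outside Data Studio, here is a minimal Python sketch. It assumes `re.fullmatch` is a fair stand-in for the whole-field matching of REGEXP_MATCH; Python's `re` is not RE2, and the single backslashes below are what the doubled `\\W` in the formula resolve to. The helper name `contains_word_bar` is made up for illustration, and `rows` repeats the sample table from the question.

```python
import re

# Pattern from the answer above. REGEXP_MATCH must match the entire field,
# which re.fullmatch imitates here (an assumption, not Data Studio itself).
PATTERN = re.compile(r"(.*\W|^)bar(\W.*|$)")

def contains_word_bar(field):
    """Return True when 'bar' appears as a standalone word in field."""
    return PATTERN.fullmatch(field) is not None

rows = [
    "blah bar / boo doo",
    "but is / should not match",
    "but match / here bar",
]
for row in rows:
    print(f"{row!r}: {contains_word_bar(row)}")
```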

A Boolean field can be created using the single REGEXP_MATCH Calculated Field below, where \\b on either side of bar represents a Word Boundary thus matching bar but not bark, embark or embar:
REGEXP_MATCH(Foo, ".*(\\bbar\\b).*")
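The word-boundary behaviour can be checked the same way. This is a rough Python sketch, again assuming `re.fullmatch` approximates REGEXP_MATCH; `has_word_bar` is a hypothetical helper name.

```python
import re

# \b on both sides of "bar" requires a word boundary before and after it.
WORD_BAR = re.compile(r".*(\bbar\b).*")

def has_word_bar(text):
    """Return True when the whole text matches and contains the word 'bar'."""
    return WORD_BAR.fullmatch(text) is not None

for text in ["bar", "blah bar / boo doo", "bark", "embark", "embar"]:
    print(f"{text!r}: {has_word_bar(text)}")
```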

Regex pattern for mm/dd/yyyy and mmddyyyy in Scala

I have dates in my .txt file that come in either of the formats below:
mmddyyyy
OR
mm/dd/yyyy
Below is the regex which works fine for mm/dd/yyyy.
^02\/(?:[01]\d|2\d)\/(?:19|20)(?:0[048]|[13579][26]|[2468][048])|(?:0[13578]|10|12)\/(?:[0-2]\d|3[01])\/(?:19|20)\d{2}|(?:0[469]|11)\/(?:[0-2]\d|30)\/(?:19|20)\d{2}|02\/(?:[0-1]\d|2[0-8])\/(?:19|20)\d{2}$
However, I am unable to build the regex for mmddyyyy. Is there a generic regex that would work for both cases?
Why use regex for this? Seems like a case of "Now you have two problems"
It would be more effective (and easier to understand) to use a DateTimeFormatter (assuming you are on the JVM and not using scala-js)
The format patterns support using [] to surround optional sections, such as the /, and the formatters inherently perform input validation so if you plug in a month or day that can't exist, it'll throw an exception.
import java.time.format.DateTimeFormatter
import java.time.LocalDate
val mdy = DateTimeFormatter.ofPattern("MM[/]dd[/]yyyy")
def parse(rawDate: String) = LocalDate.parse(rawDate, mdy)
scala> parse("12252022")
res7: java.time.LocalDate = 2022-12-25
scala> parse("12/25/2022")
res8: java.time.LocalDate = 2022-12-25
scala> parse("25/12/2022")
java.time.format.DateTimeParseException: Text '25/12/2022' could not be parsed: Invalid value for MonthOfYear (valid values 1 - 12): 25
scala> parse("abc123")
java.time.format.DateTimeParseException: Text 'abc123' could not be parsed at index 0
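For comparison, the same "parse instead of regex" idea can be sketched in Python. This is not the Scala code above: `parse_mdy` is a hypothetical helper, and the up-front shape check is an added assumption because `datetime.strptime` is lenient about digit counts.

```python
import re
from datetime import date, datetime

# Accept exactly mm/dd/yyyy or mmddyyyy, then let strptime reject
# impossible dates such as 02/30/2023.
SHAPE = re.compile(r"\d{2}/\d{2}/\d{4}|\d{8}")

def parse_mdy(raw):
    """Parse mm/dd/yyyy or mmddyyyy into a date, raising ValueError otherwise."""
    if SHAPE.fullmatch(raw) is None:
        raise ValueError(f"not in mm/dd/yyyy or mmddyyyy form: {raw!r}")
    fmt = "%m/%d/%Y" if "/" in raw else "%m%d%Y"
    return datetime.strptime(raw, fmt).date()

print(parse_mdy("12252022"))
print(parse_mdy("12/25/2022"))
```

Inputs such as "25/12/2022" or "abc123" raise ValueError, so validation and parsing happen in one step.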
If you want to match all those variations with either 2 forward slashes or only digits, you can use a positive lookahead to assert either only digits or 2 forward slashes surrounded by digits.
Then in the pattern itself you can make matching the / optional.
Note that you don't have to escape the / in the pattern.
^(?=\d+(?:/\d+/\d+)?$)(?:02/?(?:[01]\d|2\d)/?(?:19|20)(?:0[048]|[13579][26]|[2468][048])|(?:0[13578]|10|12)/?(?:[0-2]\d|3[01])/?(?:19|20)\d{2}|(?:0[469]|11)/?(?:[0-2]\d|30)/?(?:19|20)\d{2}|02/?(?:[0-1]\d|2[0-8])/?(?:19|20)\d{2})$
Regex demo
Another option is to write an alternation | matching the same pattern without the / in it.
First of all, there is a tiny shortcoming in your regex: the ^ anchor only applies to the first part of your regex, not to the other alternatives that are separated by |. Similarly the final $ applies only to the final alternative. You should put all alternatives in a non-capturing group, like ^(?: | | | )$
Then for the question itself, you could make the forward slash that follows the month optional and put it in a capture group. Then what comes between the day and the year could be a backreference to that capture group. So (\/?) and \1.
^(?:02(\/?)(?:[01]\d|2\d)\1(?:19|20)(?:0[048]|[13579][26]|[2468][048])|(?:0[13578]|10|12)(\/?)(?:[0-2]\d|3[01])\2(?:19|20)\d{2}|(?:0[469]|11)(\/?)(?:[0-2]\d|30)\3(?:19|20)\d{2}|02(\/?)(?:[0-1]\d|2[0-8])\4(?:19|20)\d{2})$
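A quick way to see the effect of the backreference is to run the pattern against a few inputs. The sketch below uses Python's `re` purely as a test bench (the question is about Scala, whose regex engine also supports backreferences); the slashes are written unescaped, and `is_mdy` is a hypothetical helper name.

```python
import re

# Each alternative captures the optional slash after the month and requires
# the same text (slash or nothing) between the day and the year.
MDY = re.compile(
    r"^(?:"
    r"02(/?)(?:[01]\d|2\d)\1(?:19|20)(?:0[048]|[13579][26]|[2468][048])"  # Feb, years the pattern treats as leap
    r"|(?:0[13578]|10|12)(/?)(?:[0-2]\d|3[01])\2(?:19|20)\d{2}"          # 31-day months
    r"|(?:0[469]|11)(/?)(?:[0-2]\d|30)\3(?:19|20)\d{2}"                  # 30-day months
    r"|02(/?)(?:[0-1]\d|2[0-8])\4(?:19|20)\d{2}"                         # Feb, any year, up to day 28
    r")$"
)

def is_mdy(text):
    """Return True for mm/dd/yyyy or mmddyyyy, False for mixed or invalid input."""
    return MDY.match(text) is not None

for sample in ["12252022", "12/25/2022", "12/252022", "02/29/2023"]:
    print(f"{sample!r}: {is_mdy(sample)}")
```

The mixed form "12/252022" is rejected because the backreference demands a second slash once the first one has been captured.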

Regex to select NOT and operand

I am trying to break a string into an array using regex in C#.
I have for example the string
{([Field] = '100' OR [LaneDescription] LIKE '%DENTINPALEUW%'
OR [LaneDescription] = 'asdf' OR ([ObjectID] = 1) AND [ITEM_HEIGHT] >=
10 AND [SENDER_COMPANY] NOT LIKE '%DHL%'}
(Generated from Telerik RadFilter)
and I need it broken up so I can pass it to a custom object with types: open parenthesis, field, comparator, value, close parenthesis.
So far, with the help of http://regexr.com, I have arrived at
\[([^\[\]]*)\]+|[\w'%]+|[()=]
but I need to get '>=' and 'NOT LIKE' each as a single token (and similar operators such as <>, != etc.)
You can see my late night attempts at http://regexr.com/39g6b
Any help would be much appreciated.
(PS: There are no newline characters at the string)
Try
\(|\)|\[[a-zA-Z0-9_]+\]|'.*?'|\d+|NOT LIKE|\w+|[=><!]+
Demo.
Explanation:
\( // match "(" literally
| // or
\) // ")"
| // or
\[[a-zA-Z0-9_]+\] // any words inside square braces []
|
'.*?' // strings enclosed in single quotes '' (escape sequences can easily trip this up though)
|
\d+ // digits
|
NOT LIKE // "NOT LIKE", because this is the only token that can contain whitespace
|
\w+ // words like "NOT", "AND", etc
|
[=><!]+ // operators like ">", "!=", etc
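To illustrate how the alternation splits a filter expression into tokens, here is a small Python sketch. The question targets C#, so treat this only as a demonstration of the pattern; `tokenize` is a hypothetical helper, and the sample expression is a shortened piece of the string from the question.

```python
import re

# Alternatives are tried left to right, so "NOT LIKE" must come before \w+
# or it would be split into "NOT" and "LIKE".
TOKEN = re.compile(r"\(|\)|\[[a-zA-Z0-9_]+\]|'.*?'|\d+|NOT LIKE|\w+|[=><!]+")

def tokenize(expr):
    """Return the list of tokens found in expr, skipping whitespace."""
    return TOKEN.findall(expr)

expr = "([ObjectID] = 1) AND [ITEM_HEIGHT] >= 10 AND [SENDER_COMPANY] NOT LIKE '%DHL%'"
for token in tokenize(expr):
    print(token)
```

Each token could then be classified (parenthesis, field, comparator, value) by checking its first character or comparing it against a list of known operators.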

Confusion in regex pattern for search

I am learning regex in bash and trying to fetch all lines that end with .com
Initially I did:
cat patternNpara.txt | egrep "^[[:alnum:]]+(.com)$"
Why: + matches one or more occurrences, so placing it after [[:alnum:]] should match any run of digits, letters or signs, but apparently this logic is failing.
Then I did this (purely trial and error, not applying any logic really) and it worked:
cat patternNpara.txt | egrep "^[[:alnum:]].+(.com)$"
What's confusing me: . matches only a single character, so how am I getting the output? How is it really matching the pattern?
Question: what's the difference between [[:alnum:]]+ and [[:alnum:]].+ (this one has a . in it) in the patterns above, and how does it work?
PS: I am looking for an explanation, not a "try it this way" answer. :)
Some test lines for the file patternNpara.txt which are fetched as output!
valid email = abc#abc.com
invalid email = ab#abccom
another invalid = abc#.com
1 : abc,s,11#gmail.com
2: abc.s.11#gmail.com
Looking at your screenshot, it seems you're trying to match email addresses that also contain a # character, which is not included in your regex. You can use this regex:
egrep "[#[:alnum:]]+(\.com)" patternNpara.txt
Difference between the 2 regexes:
[[:alnum:]] matches only [a-zA-Z0-9]. If you have # or , then you need to include them in character class as well.
Your 2nd case is including .+ pattern which means 1 or more matches of ANY CHARACTER
If you want to match any lines that end with '.com', you should use
egrep ".*\.com$" file.txt
To match all the following lines
valid email = abc#abc.com
invalid email = ab#abccom
another invalid = abc#.com
1 : abc,s,11#gmail.com
2: abc.s.11#gmail.com
^[[:alnum:]].+(.com)$ will work, but ^[[:alnum:]]+(.com)$ will not. Here are the reasons:
^[[:alnum:]].+(.com)$ matches strings that start with one character from a-zA-Z or 0-9, followed by two or more of any character, and end with 'com' (not necessarily '.com', because the unescaped . matches any character).
^[[:alnum:]]+(.com)$ matches strings that start with one or more characters from a-zA-Z or 0-9, followed by exactly one character that could be anything, and end with 'com' (again, not necessarily '.com').
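The difference is easy to see by running both patterns over the sample lines. The following is a Python sketch rather than egrep: `[A-Za-z0-9]` stands in for `[[:alnum:]]`, which Python's `re` does not support, and `matching_lines` is a hypothetical helper. A third pattern with an escaped dot is included for contrast.

```python
import re

lines = [
    "valid email = abc#abc.com",
    "invalid email = ab#abccom",
    "another invalid = abc#.com",
    "1 : abc,s,11#gmail.com",
    "2: abc.s.11#gmail.com",
]

patterns = {
    "alnum+ then .com": r"^[A-Za-z0-9]+(.com)$",         # only alnum chars before the final 4
    "alnum, .+, then .com": r"^[A-Za-z0-9].+(.com)$",    # anything may sit in the middle
    "escaped dot": r"\.com$",                            # literal ".com" at the end
}

def matching_lines(pattern):
    """Return the sample lines in which pattern is found."""
    return [line for line in lines if re.search(pattern, line)]

for name, pattern in patterns.items():
    print(name, "->", len(matching_lines(pattern)), "line(s)")
```

The first pattern matches none of the lines because each contains spaces or punctuation before the ending; the second matches all five, including "ab#abccom", since the unescaped dot accepts the "c" before "com".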
Try this (with "positive-lookahead") :
.+(?=\.com)
Demo :
http://regexr.com?38bo0

Regex to get password from a long string of mess

I am using PowerShell and am getting the below output from my program.
I am having problems getting the password out of the mess of other things. Ideally I need to get Hiva!!66 by itself. I am using regex to accomplish this and it's just not working. The password will always be 8 characters and have an uppercase letter, a lowercase letter and a special character. I have created the split and everything else I need, but the regex part is giving me trouble.
I am aware that there are a lot of questions about regex and passwords, but those don't seem to have a lot of mess before and after the password. Any help would be appreciated.
My best attempt so far is:
"(?=.*\d)(?=.*[A-Z])(?=.*[!##\$%\^&\*\~()_\+\-={}\[\]\\:;`"'<>,./]).{8}$"
C:\Users\<username>\AppData\Roaming\Crystal Point\OutsideView\Macro\CONNECTEXP.VCB:5:For intTmp = 1 To 4
C:\Users\<username>\AppData\Roaming\Crystal Point\OutsideView\Macro\CONNECTEXP.VCB:8:cboCOMPort.SelectString 1, "1"
C:\Users\<username>\AppData\Roaming\Crystal Point\OutsideView\Macro\CONNECTEXP.VCB:11:str2CRLF = Chr(13) & Chr(10) & Chr(13) & Chr(10)
C:\Users\<username>\AppData\Roaming\Crystal Point\OutsideView\Macro\CONNECTEXP.VCB:14: & "include emulation type (currently Tandem), the I/O method (currently Async) and host connection information
for the session (currently COM9, 8N1)" _
C:\Users\<username>\AppData\Roaming\Crystal Point\OutsideView\Macro\CONNECTEXP.VCB:15: & " to the correct values for your target host (e.g., TCP/IP and host IP name or address) and save the
IOSet "CHARSIZE", "8"
PASS="Hiva!!66" If DDEAppReturnCode() <> 0 Then
If DDEAppReturnCode() <> 0 Then
C:\Users\<username>\AppData\Roaming\Crystal Point\OutsideView\Macro\DDEtoXL.vcb:28: MsgBox "Could not load " & txtWorkSheet.text, 48
C:\Users\<username>\AppData\Roaming\Crystal Point\OutsideView\Macro\DDEtoXL.vcb:37:DDESheetChan = -1
C:\Users\<username>\AppData\Roaming\Crystal Point\OutsideView\Macro\DDEtoXL.vcb:38:DDESystemChan = -2
If you can't count on the quotes or the PASS= being there, you'll have to rely on the password's composition to do everything. The following regex matches a string of eight consecutive characters of the allowed types, with the lookahead and lookbehind to make sure there aren't more than eight.
$regex = [regex] @'
(?x)
(?<![!@#$%^&*~()_+\-={}\[\]\\:;`<>,./A-Za-z0-9])
(?:
[!@#$%^&*~()_+\-={}\[\]\\:;`<>,./]()
|
[A-Z]()
|
[a-z]()
|
[0-9]()
){8}
\1\2\3\4
(?![!@#$%^&*~()_+\-={}\[\]\\:;`<>,./A-Za-z0-9])
'@
It also verifies that there's at least one of each character type: uppercase letter, lowercase letter, digit and special. The lookahead approach used in your regex won't work because it can look too far ahead, beyond the end of the word you're trying to match. Instead, I put an empty group in each branch to act like check boxes. If a backreference to one of those groups fails, it means that branch didn't participate in the match, meaning in turn that the associated character type was not present.
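The check-box idea can be tried out in Python, whose `re` module also fails a backreference to a group that did not participate in the match. This is a sketch under assumptions: the `SPECIAL` set only approximately follows the answer's character class, and `find_passwords` and the sample line are made up for illustration.

```python
import re

# Assumed set of "special" characters, roughly following the answer.
SPECIAL = r"!@#$%^&*~()_+\-={}\[\]\\:;`<>,./"
ALLOWED = SPECIAL + "A-Za-z0-9"

# Each branch ends in an empty group that acts as a check box; the
# backreferences \1\2\3\4 only succeed if all four branches were used.
PASSWORD = re.compile(
    rf"(?<![{ALLOWED}])"
    rf"(?:[{SPECIAL}]()|[A-Z]()|[a-z]()|[0-9]()){{8}}"
    r"\1\2\3\4"
    rf"(?![{ALLOWED}])"
)

def find_passwords(text):
    """Return every 8-character run that contains all four character types."""
    return [m.group(0) for m in PASSWORD.finditer(text)]

sample = 'IOSet "CHARSIZE", "8" PASS="Hiva!!66" If DDEAppReturnCode() <> 0 Then'
print(find_passwords(sample))
```

"CHARSIZE" is also eight allowed characters between quotes, but it only ticks the uppercase box, so the backreference to the other groups fails and it is skipped.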
Did you try the following regex:
^PASS="(.{8})"
?
Just use this
(?<=PASS=").+(?=")
You can extract the password from that output with something like this:
... | ? { $_ -cmatch 'PASS="(.{8})"' } | % { $matches[1] }
or like this (in PowerShell v3):
... | Select-String -Case 'PASS="(.{8})"' | % { $_.Matches.Groups[1].Value }
In PowerShell v2 you'll have to do something like this if you want to use Select-String:
... | Select-String -Case 'PASS="(.{8})"' | select -Expand Matches |
select -Expand Groups | select -Last 1 | % { $_.Value }

extract a variable value from the middle of a string

I have been trying to figure this out for quite some time. How do I get the PID value from the following string using PowerShell? I thought regex was the way to go, but I can't quite figure out the syntax.
For what it is worth, everything except the PID will remain the same.
$foo = '<VALUE>I am just a string and the string is the thing. PID:25973. After this do that and blah blah.</VALUE>'
I have tried the following in regex
[regex]::Matches($foo, 'PID:.*') | % {$_.Captures[0].Groups[1].value}
[regex]::Matches($foo, 'PID:*?>') | % {$_.Captures[0].Groups[1].value}
[regex]::Matches($foo, 'PID:*?>(.+).') | % {$_.Captures[0].Groups[1].value}
For your regex you'll want to indicate what's before and after the portion you're looking for. PID:.* will find everything from the PID to the end of the string.
And to use a capture group you'll want to have some ( and ) in your regex, which defines a group.
So try this on for size:
[regex]::Matches($foo,'PID:(\d+)') | % {$_.Captures[0].Groups[1].value}
I'm using a regex of PID:(\d+). The \d+ means "one or more digits". The parentheses around that (\d+) identifies it as a group I can access using Captures[0].Groups[1].
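The same capture-group idea, sketched in Python for comparison (the PowerShell line above is the actual answer; `extract_pid` is a hypothetical helper name):

```python
import re

foo = ("<VALUE>I am just a string and the string is the thing. "
       "PID:25973. After this do that and blah blah.</VALUE>")

def extract_pid(text):
    """Return the digits following 'PID:' as a string, or None if absent."""
    match = re.search(r"PID:(\d+)", text)
    return match.group(1) if match else None

print(extract_pid(foo))
```

Because the group contains only `\d+`, the capture stops at the first non-digit, so the trailing period is not included.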
Here's another option. Basically it replaces everything with the first capture group (which is the digits after 'PID:'):
$foo -replace '^.+PID:(\d+).+$','$1'
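One thing to keep in mind with the replace approach is that when the pattern does not match, the input comes back unchanged rather than empty. A small Python sketch of the same substitution shows this (`pid_via_replace` is a hypothetical helper; `re.sub` behaves like -replace in this respect):

```python
import re

def pid_via_replace(text):
    """Replace the whole string with the digits captured after 'PID:'."""
    # If the pattern does not match, re.sub returns text unchanged.
    return re.sub(r"^.+PID:(\d+).+$", r"\1", text)

print(pid_via_replace("<VALUE>some text. PID:25973. more text.</VALUE>"))
print(pid_via_replace("no pid in here"))
```

So it may be worth checking that the result consists only of digits before using it as a PID.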