C# Regex Expression to extract field name and values in SQL Condition - regex

Consider the following two SQL conditions.
1.) AssetView.[PROPTYPE] NOT IN ('B15/30','SFD','SFA')
2.) AssetView.[FICO] IN (500,600,700)
I want to break this SQL apart using a regex so that I get the table name, field name, function type and field values as 4 separate parts.
e.g.
Table Name - AssetView
Field Name - PROPTYPE
Function - NOT IN
Field Values (Together or separate): B15/30, SFD, SFA
Here is the regex I tried (https://rubular.com/r/WGiyz0oGrooyiA) but I am not able to split the table name, field name and function type into their own groups.
(.*?)[^=]['(]+(.*?)[')]

Your pattern (.*?)[^=]['(]+(.*?)[')] uses the character classes ['(] and [')], each of which matches any one of the listed characters. That means the pattern can also match an opening ' first and then a closing )
For your example data, you might use:
(\w+)\.\[(\w+)\] +(\w+(?: \w+)*) +\(([^)\n]+)\)
(\w+) Capture 1+ word chars in group 1
\. Match a dot
\[(\w+)\] + Capture 1+ word chars between square brackets in group 2 and 1+ spaces
(\w+(?: \w+)*) + Capture 1+ word chars followed by repeating 0+ times matching a space and 1+ word chars in group 3 and 1+ spaces
\(([^)\n]+)\) Capture 1+ times not a closing parenthesis or newline between parentheses in group 4
If you want to allow more characters than \w matches, you could extend that using a character class.
For example, to also allow a hyphen and a space, use [\w -]+. To match everything between the brackets, you could use a negated character class, for example \[([^\]]+)\]
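As a rough illustration of how the four groups can be read out, here is a minimal sketch. It uses Python rather than C# purely for demonstration; the pattern is the one from the answer, while the function name parse_condition and the way the value list is split are assumptions added for this example.

```python
import re

# Pattern from the answer: table, field, function, raw value list.
CONDITION = re.compile(r"(\w+)\.\[(\w+)\] +(\w+(?: \w+)*) +\(([^)\n]+)\)")

def parse_condition(sql):
    """Return (table, field, function, values), or None if nothing matches.

    parse_condition is a hypothetical helper, not part of the original answer.
    """
    m = CONDITION.search(sql)
    if m is None:
        return None
    table, field, function, raw_values = m.groups()
    # Split the list on commas, then drop surrounding spaces and single quotes.
    values = [v.strip().strip("'") for v in raw_values.split(",")]
    return table, field, function, values

print(parse_condition("AssetView.[PROPTYPE] NOT IN ('B15/30','SFD','SFA')"))
print(parse_condition("AssetView.[FICO] IN (500,600,700)"))
```

Splitting on commas is only safe while the quoted values themselves contain no commas or closing parentheses, which holds for the two sample conditions.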

Related

Regex groups for dash delimited filename in URL

I have a URL that is structured like so: <domain>/<subdirectory>/<filename>-<semantic_version>-<hash>.<filetype>
For example, it could look like: https://cdn.example.com/sample_files/some_file-1.2.3-56857cfc709d3996f057252c16ec4656f5292802.css
So far I have the following regex which gives me the entire filename. However, I'd like to individually get the filename, semantic_version, and hash as defined above. You can assume that the filename will not have dashes in the name.
([^/\\&\?]+)$(?<=(?:.js))
You could match the protocol and then until the last forward slash.
After that, capture 1+ word chars in group 1 for the file name, a repeating part in group 2 to capture digits divided by dots and in the third group a character class which would match all the characters in the hash.
^http\S+\/(\w+)-(\d+(?:\.\d+)+)-([0-9a-f]+)\.\w+$
Explanation
^ Start of string
http\S+\/ Match the protocol followed by 1+ non whitespace chars, then backtrack till the last /
(\w+)- Capture group 1, match 1+ word chars followed by -
(\d+(?:\.\d+)+)- Capture group 2, match digits divided by dots followed by -
([0-9a-f]+)\.\w+ Capture group 3, match 1+ times the chars from the hash followed by . and 1+ word chars
$ End of string
If the hash always has 40 characters, you could match [0-9a-f]{40} instead of [0-9a-f]+ to be a bit more precise.
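To see the three groups in action, here is a small sketch in Python (chosen only for illustration; the question does not name a language). The regex is the one explained above, and split_asset_url is a hypothetical helper name.

```python
import re

# Same pattern as above: file name, semantic version, hash.
ASSET_URL = re.compile(r"^http\S+\/(\w+)-(\d+(?:\.\d+)+)-([0-9a-f]+)\.\w+$")

def split_asset_url(url):
    """Return (filename, version, hash) or None when the URL does not fit."""
    m = ASSET_URL.match(url)
    return m.groups() if m else None

url = ("https://cdn.example.com/sample_files/"
       "some_file-1.2.3-56857cfc709d3996f057252c16ec4656f5292802.css")
print(split_asset_url(url))
```

Because \w does not match a dash, the first group stops at the first dash after the last slash, which relies on the stated assumption that file names contain no dashes.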
Use multiple capture groups that don't match - characters.
([^-/\\&\?]+)-([^-/\\&\?]+)-([^-/\\&\?]+)\.[a-z]+$(?<=(?:.js))

Regex to pull first two fields from a comma separated file

I want to pull the second string in a comma-delimited list where the first value is numeric and the second is alphabetic.
I'm using \d[^,]+(?=,) to pull the numeric value in the first field and just need help with pulling the second value from the "Name" column.
Here's part of a sample file that I'm trying to extract data from:
Address Number,Name,Employee Master Exist(Y/N),Auto-Deposit Exists(Y/N),Supplier Master Exists(Y/N),Supplier Master Created,ACH Account Exists(Y/N),ACH Account Created,ACH Same as Auto-deposit(Y/N)
//line break here is for clarity and does not exist in file//
4398,Presley Elvis Aaron,Y,N,Y,N,Y,N,N
10154,Shepard Alan Barrett,Y,Y,Y,N,Y,N,N
You could make use of a capturing group if you want to match the second string by first matching 1+ digits and a comma.
Then capture in a group matching 1+ chars a-zA-Z and match the trailing comma.
^\d+,([a-zA-Z]+(?: [a-zA-Z]+)*),
^ Start of string
\d+, Match 1+ digits and a comma (Or use (\d+), if the digits should also be a group)
( Capture group 1
[a-zA-Z]+ Match 1+ chars a-zA-Z
(?: [a-zA-Z]+)* Repeat matching the same as previous preceded by a space
), Close capturing group and match trailing comma
To get a bit broader match you could use this pattern to match at least a single char a-zA-Z
\d+,([a-zA-Z ]*[a-zA-Z][a-zA-Z ]*),
Note that the \d[^,]+ part of your pattern does not match only digits: it matches 1 digit followed by 1+ times any char except a comma, so it would also match 4a$, for example.
You could try this regex:
^\d+,([^,]+),
This will look for lines:
starting with one or more digits
followed by a comma
capture anything that is not a comma
followed by a comma
If not all lines contain a name, then change the + to a *:
^\d+,([^,]*),
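A quick way to try this on the sample data is sketched below in Python (an assumption, since the question does not say which tool runs the regex). It uses the ^\d+,([^,]+), pattern with the digits also captured, as suggested earlier, and a shortened copy of the header line.

```python
import re

# Header shortened for brevity; the two data rows are from the question.
SAMPLE = """\
Address Number,Name,Employee Master Exist(Y/N),Auto-Deposit Exists(Y/N)
4398,Presley Elvis Aaron,Y,N,Y,N,Y,N,N
10154,Shepard Alan Barrett,Y,Y,Y,N,Y,N,N
"""

# re.MULTILINE makes ^ match at the start of every line, not just the text.
ROW = re.compile(r"^(\d+),([^,]+),", re.MULTILINE)

# findall returns one (number, name) tuple per matching line.
rows = ROW.findall(SAMPLE)
for number, name in rows:
    print(number, name)
```

The header line is skipped because it does not start with a digit.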

Match names joined with a delimiter except last

Suppose a text file has many rows, each containing multiple names joined with a ";" delimiter, except that the last name is not followed by one.
We can try the following regex:
^(\w+;)+$ // Not good
The previous regex won't work because it forces the last name, and hence the whole row, to end with a ";" as well.
You could add matching a single \w+ after it. If you don't need the capturing group, you might make it non capturing.
This way you are repeating matching word characters followed by a ; and end the match with word characters.
^(?:\w+;)+\w+$
Explanation
^ Start of string
(?: Non capturing group
\w+; Match 1+ word chars followed by ;
)+ Close non capturing group and repeat 1+ times
\w+ Match 1+ word chars
$ End of string
If a single word should also match, you could repeat the group 0+ times using * instead of +
^(?:\w+;)*\w+$
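The difference between the two variants can be checked with a short sketch. Python is used here only as an example language; is_valid_row and its allow_single flag are hypothetical names, and rows are assumed to have their trailing newline already stripped.

```python
import re

# + requires at least one "name;" before the final name, * allows none.
AT_LEAST_TWO = re.compile(r"^(?:\w+;)+\w+$")
ONE_OR_MORE = re.compile(r"^(?:\w+;)*\w+$")

def is_valid_row(row, allow_single=False):
    """Check a row of ;-joined names whose last name has no trailing ';'."""
    pattern = ONE_OR_MORE if allow_single else AT_LEAST_TWO
    return pattern.match(row) is not None

for row in ["anna;bob;carl", "anna;bob;", "anna"]:
    print(row, is_valid_row(row), is_valid_row(row, allow_single=True))
```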

Regex Extract a string between two words containing a particular string

I have the below string
abc-12d-ef-oy-5678-xyz--**--20190120075439322am--**--ghi-66d-ef-oy-8877-sdf--**--sfdfdsgfg--**--20190120075765487am
It is a kind of multi-character delimited string, delimited by '--**--'. I am trying to extract the first and second words that have the -oy- tag in them. This is a column in a table. I am using the regex_extract method but I am not able to extract the string which contains a given string and ends with a given string.
Here is one pattern that I tried: .*(.*oy.*)--
If the -oy- can not be at the start or at the end, you could use this pattern to match the 2 hyphen delimited strings with -oy-:
[a-z0-9]+(?:-[a-z0-9]+)*-oy(?:-[a-z0-9]+)+
Regex details
[a-z0-9]+ Match 1+ times a-z0-9
(?: Non capturing group
-[a-z0-9]+ Match - and 1+ times a-z0-9
)* Close group and repeat 0+ times
-oy Match literally
(?:-[a-z0-9]+)+ Repeat 1+ times a group which will match - and 1+ times a-z0-9
You can extend the character class, for example to [A-Za-z0-9], to allow whatever else you want to match, such as uppercase chars.
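Applied to the sample string, the pattern above can be exercised with a sketch like the following. Python's re module stands in for the asker's regex_extract method here, which is an assumption; the variable names are illustrative only.

```python
import re

TEXT = ("abc-12d-ef-oy-5678-xyz--**--20190120075439322am--**--"
        "ghi-66d-ef-oy-8877-sdf--**--sfdfdsgfg--**--20190120075765487am")

# Hyphen-separated runs of a-z0-9 that contain -oy- away from both ends.
OY_FIELD = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*-oy(?:-[a-z0-9]+)+")

# With no capturing groups, findall returns the full text of each match.
matches = OY_FIELD.findall(TEXT)
print(matches)
```

The delimiter is never matched because neither * nor a doubled hyphen can be produced by the pattern, so each match stays inside one field.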
If the matches should be between delimiters, you could use a positive lookbehind and a positive lookahead with an alternation (the backslashes are doubled here, as they would be in a Java string literal):
(?<=^|--\\*\\*--)[a-z0-9]+(?:-[a-z0-9]+)*-oy(?:-[a-z0-9]+)+(?=--\\*\\*--|$)
You can use this regex which will match string containing -oy- and capture them in group1 and group2.
^.*?(\w+(?:-\w+)*-oy-\w+(?:-\w+)*).*?(\w+(?:-\w+)*-oy-\w+(?:-\w+)*)
This regex matches two delimiter-separated strings containing -oy-, using (\w+(?:-\w+)*-oy-\w+(?:-\w+)*) twice to capture each of them.
Are you able to select values from capture groups?
(?:--\*\*--|^)(.*?-oy-.*?)(?:--\*\*--|$)
?: - Non-capture group, matches the delimiter, begin of line, or end of line but does not create a capture group
*? - Lazy match so you only grab the contents of the field
https://regex101.com/r/aUAvcx/1
--- Second stab at this follows ---
This is convoluted. Hopefully you can use Lookahead and Lookbehind. The last problem I had was the final record was being "Greedy" and sucking up the field before it too. So I had to add an exclusion in the capture group for your delimiter.
See if this works for you.
(?<=--\*\*--|^)((?:(?:(?!--\*\*--).)*)-oy-(?:(?:(?!--\*\*--).)*))(?=--\*\*--|$)
https://regex101.com/r/aUAvcx/3
Basically the (?: are so we are not getting too many capture groups to work with.
There are three parts to this:
The lookbehind - Make sure the field is framed by the delimiter (or start of line)
The capture group - Grab the contents of the field, making sure a delimiter isn't sucked up into it
The lookahead - Make sure the field is framed by the delimiter (or end of line)
As far as the capture group goes, I check the left and right side of the -oy- to make sure the delimiter isn't there.
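The three parts can be tried out with the sketch below, with one caveat: it is written in Python, whose re module rejects lookbehinds whose alternatives differ in width, so (?<=--\*\*--|^) is swapped for a consuming non-capturing group (?:--\*\*--|^). That substitution is an assumption made for this example, not part of the answer; the capture group and the lookahead are unchanged.

```python
import re

TEXT = ("abc-12d-ef-oy-5678-xyz--**--20190120075439322am--**--"
        "ghi-66d-ef-oy-8877-sdf--**--sfdfdsgfg--**--20190120075765487am")

FIELD_WITH_OY = re.compile(
    r"(?:--\*\*--|^)"             # leading delimiter or start (consumed here)
    r"((?:(?!--\*\*--).)*-oy-"    # left side: any chars that start no delimiter
    r"(?:(?!--\*\*--).)*)"        # right side: same exclusion
    r"(?=--\*\*--|$)"             # trailing delimiter or end (not consumed)
)

# findall returns the contents of the single capture group per match.
fields = FIELD_WITH_OY.findall(TEXT)
print(fields)
```

Consuming the leading delimiter is harmless here because the trailing one is only looked at, so it is still available to start the next match.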

Regex capture group ( ) within character set [ ]

I would like to match space characters ( ) only if they are followed by a hash (#).
This is what ( #) below is trying to do, which is a capture group. (I tried escaping the brackets, otherwise the brackets are not recognised properly within a character set.) However, this is not working.
The below regex
/#[a-zA-Z\( #\)]+/g
matches all of the below
#CincoDeMayo #Derby party with UNLIMITED #seafood towers
while I would like to match #CincoDeMayo #Derby and separately #seafood
Is there any way to specify capture groups () within a character set []?
Character classes are meant to match a single character, thus, it is not possible to define a character sequence inside a character class.
I think you want to match specific consecutive hashtags. Use
/#[a-zA-Z]+(?: +#[a-zA-Z]+)*/g
or
/#[a-zA-Z]+(?:\s+#[a-zA-Z]+)*/g
Details
#[a-zA-Z]+ - a # followed with 1+ ASCII letters
(?: - start of a non-capturing group...
\s+ - 1+ whitespaces
#[a-zA-Z]+ - a # followed with 1+ ASCII letters
)* - ... that repeats 0 or more times.
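For a quick check of the grouping behaviour, here is a sketch using Python's re.findall in place of the JavaScript-style /g flag from the question. The choice of Python and the variable names are assumptions made for illustration.

```python
import re

# A hashtag, optionally followed by more hashtags separated by whitespace.
HASHTAG_RUN = re.compile(r"#[a-zA-Z]+(?:\s+#[a-zA-Z]+)*")

text = "#CincoDeMayo #Derby party with UNLIMITED #seafood towers"

# findall plays the role of the global flag: all non-overlapping matches.
runs = HASHTAG_RUN.findall(text)
print(runs)
```

The space before "party" is not included in the first match because the repeated group only accepts whitespace that is followed by another hashtag.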