How can I match everything between 2 commas? - regex

I want to match basically any text that has a comma separated list of weekdays.
(?i)(every (mon|tue|wed|thu|fri|sat|sun)[A-Za-z]{3,5}, .*+,
(mon|tue|wed|thu|fri|sat|sun)[A-Za-z]{3,5})
Above is what I have, and I want it to match the following strings. I don't need help with the case where only 2 weekdays are supplied.
Every mon, tue, wednesday
Every wed, Saturday, Friday, sun.

Try pattern: (?<=,|^)[^,\n]+
Explanation
(?<=,|^) - positive lookbehind: assert that what precedes is a comma , or the beginning of the string ^
[^,\n]+ - match one or more characters other than comma , or newline \n
Demo
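In Python, a minimal sketch of this idea could look like the following. Python's built-in re module rejects (?<=,|^) because the lookbehind alternatives differ in width, so the sketch assumes the equivalent form (?:(?<=,)|^). The sample string is taken from the question.

```python
import re

# Python's re needs fixed-width lookbehinds, so (?<=,|^) is written
# as "preceded by a comma" OR "at the start of the string".
pattern = re.compile(r"(?:(?<=,)|^)[^,\n]+")

text = "Every mon, tue, wednesday"
parts = pattern.findall(text)

# The space after each comma is part of the match; strip it if unwanted.
stripped = [p.strip() for p in parts]
print(stripped)
```

Note that this splits the text at commas in general; it does not check that the pieces are weekdays.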

You might list the abbreviations and optionally match the full name by listing them using an alternation followed by a comma and a space.
Add that to a group and repeat that 0+ times. After that add the group without a comma to make sure you match at least a single day.
(?i)\bevery (?:(?:mon(?:day)?|tue(?:sday)?|wed(?:nesday)?|thu(?:rsday)?|fri(?:day)?|sat(?:urday)?|sun(?:day)?), )*(?:mon(?:day)?|tue(?:sday)?|wed(?:nesday)?|thu(?:rsday)?|fri(?:day)?|sat(?:urday)?|sun(?:day)?)\b
Explanation
(?i)\bevery Case insensitive modifier, a word boundary, then match every and a space
(?: Non capturing group
(?:mon(?:day)?|tue(?:sday)?|wed(?:nesday)?|thu(?:rsday)?|fri(?:day)?|sat(?:urday)?|sun(?:day)?), Match any of the listed followed by a comma and space
)* Close non capturing group and repeat 0+ times
(?: Non capturing group
mon(?:day)?|tue(?:sday)?|wed(?:nesday)?|thu(?:rsday)?|fri(?:day)?|sat(?:urday)?|sun(?:day)? Match any of the listed
)\b Close non capturing group and add a word boundary to prevent being part of a larger word
Regex demo
To match only multiple days, you could change the * quantifier of the first non capturing group to, for example, + or {2,}.
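As a rough Python sketch of the same pattern, the alternation can be stored once and reused. The names DAY and find_weekday_list are made up for this illustration, and the {2,} quantifier is assumed because the question asks for three or more weekdays.

```python
import re

# One alternation for all weekday abbreviations with optional full names.
DAY = (r"(?:mon(?:day)?|tue(?:sday)?|wed(?:nesday)?|thu(?:rsday)?"
       r"|fri(?:day)?|sat(?:urday)?|sun(?:day)?)")

# {2,} requires at least two "day, " items before the final day,
# i.e. three or more weekdays in total.
pattern = re.compile(rf"\bevery (?:{DAY}, ){{2,}}{DAY}\b", re.IGNORECASE)

def find_weekday_list(text):
    """Return the matched weekday list, or None if there is no match."""
    m = pattern.search(text)
    return m.group(0) if m else None

print(find_weekday_list("Every mon, tue, wednesday"))
print(find_weekday_list("Every wed, Saturday, Friday, sun."))
print(find_weekday_list("Every mon, tue"))
```

The trailing dot in the second string is not part of the match, because the pattern ends at the word boundary after the last weekday.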

Related

Regex: Replace certain part of the matched characters

I want to be able to match with a certain condition, and keep certain parts of it. For example:
June 2021 9 Feature Article Three-Suiters Via Puppets Kai-Ching Lin
should turn into:
Jun 2021 Three-Suiters Via Puppets Kai-Ching Lin
So, everything up to the end of the word Article should be matched; then only the first three characters of the month are kept, as well as the year, and this part replaces the matched characters.
My strong regex knowledge got me as far as:
.+Article(?)
You could use 2 capture groups and use those in the replacement (for example $1$2 or \1\2, depending on the tool):
\b([A-Z][a-z]{2})[a-z]*(\s+\d{4})\b.*?\bArticle\b
\b A word boundary to prevent a partial word match
([A-Z][a-z]{2}) Capture group 1, match a single uppercase char and 2 lowercase chars (the first three characters of the month)
[a-z]* Match the rest of the month name, 0+ chars a-z
(\s+\d{4})\b Capture group 2, match 1+ whitespace chars and 4 digits followed by a word boundary
.*?\bArticle\b Match as few chars as possible until Article
Regex demo
The replaced value will be
Jun 2021 Three-Suiters Via Puppets Kai-Ching Lin
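For illustration, here is how the two-group replacement might look in Python, where the groups are written as \1\2 in the replacement. The sketch uses [A-Z][a-z]{2} for the first group so that exactly three letters are kept for any month name; the function name shorten is hypothetical.

```python
import re

# Group 1: first three letters of the month; group 2: whitespace + year.
# Everything up to and including the word "Article" is consumed.
pattern = re.compile(r"\b([A-Z][a-z]{2})[a-z]*(\s+\d{4})\b.*?\bArticle\b")

def shorten(line):
    """Replace the matched prefix with the short month and the year."""
    return pattern.sub(r"\1\2", line)

result = shorten(
    "June 2021 9 Feature Article Three-Suiters Via Puppets Kai-Ching Lin"
)
print(result)
```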
You could use positive lookbehinds and replace the matches with an empty string:
(?<=^[A-Z][a-z]{2})[a-z]*|(?<=\d{4}).*Article
(?<=^[A-Z][a-z]{2}) - behind me is the start of a line and 3 chars; presumably the first three chars of the month
[a-z]* - optionally, match the rest of the month
| - or
(?<=\d{4}) - behind me is 4 digits; presumably a year
.*Article - match everything leading up to and including "Article"
https://regex101.com/r/fbYdpH/1
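A small Python sketch of the lookbehind variant, assuming the matched parts are simply deleted (replaced with an empty string). Both lookbehinds have a fixed width, so Python's re module accepts them as written.

```python
import re

# Alternative 1 removes the rest of the month after its first 3 letters.
# Alternative 2 removes everything after the year up to "Article".
pattern = re.compile(r"(?<=^[A-Z][a-z]{2})[a-z]*|(?<=\d{4}).*Article")

line = "June 2021 9 Feature Article Three-Suiters Via Puppets Kai-Ching Lin"
cleaned = pattern.sub("", line)
print(cleaned)
```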

What is the proper regex for capturing everything after "String" and between two delimiters ('=' and a non-alphanumeric character)?

Details={
AwsEc2SecurityGroup={GroupName=m.com-rds, OwnerId=123, VpcId=vpc-123,
IpPermissions=[{FromPort=3306, ToPort=3306, IpProtocol=tcp, IpRanges=[{CidrIp=1.1.1.1/32}, {CidrIp=2.2.2.2/32}, {CidrIp=0.0.0.0/0}, {CidrIp=3.3.3.3/32}],
UserIdGroupPairs=[{UserId=123, GroupId=sg-123abc}]}], IpPermissionsEgress=[{IpProtocol=-1, IpRanges=[{CidrIp=0.0.0.0/0}]}], GroupId=sg-123abc}},
Region=us-east-1, Id=arn:aws:ec2:us-east-1:123:security-group/sg-123abc}]
}
I want to capture exactly arn:aws:ec2:us-east-1:123:security-group/sg-123abc in this example. Generically, I want to capture the value of Id regardless of placement. My current solution is /Details={.*Id=(.*\w)/, but this only works if it's the last object in the data. How can I take into account the following potential scenario:
Id=arn:aws:ec2:us-east-1:123:security-group/sg-123abc, Thing=123abc}]
Your pattern uses .* twice. The first .* matches to the end of the line/string (depending on whether the dot matches a newline) and then backtracks to the last occurrence where the Id=(.*\w) part of the pattern can match.
If you want to use a capture group, you can make the format and the allowed characters a bit more specific:
\bId=(\w+(?:[:\/-]\w+)+)
The pattern in parts
\b A word boundary to prevent a partial word match
Id= Match literally
( Capture group 1
\w+ Match 1+ word chars
(?:[:\/-]\w+)+ Repeat 1+ times either : / - and 1+ word chars
) Close group 1
Regex demo
Or if you know that it starts with Id=arn:
\bId=(arn:[\w:\/-]+)
Regex demo
Note that you only have to escape the / as \/ when the delimiters of the regex are forward slashes, but there is no language tagged.
You can use look-behind to check that there is the Id= prefix, and then match anything that is not a space, comma or closing brace:
(?<=\bId=)[^,}\s]*
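Since no language is tagged, the following Python sketch is only one possible way to try both suggestions. The sample string is a shortened, made-up variant of the data in the question, with Id placed in the middle.

```python
import re

sample = ("Region=us-east-1, OwnerId=123, "
          "Id=arn:aws:ec2:us-east-1:123:security-group/sg-123abc, "
          "Thing=123abc}]")

# Capture-group variant: \b keeps OwnerId=, VpcId= etc. from matching.
# In Python the / does not need escaping.
m = re.search(r"\bId=(\w+(?:[:/-]\w+)+)", sample)
captured = m.group(1) if m else None

# Lookbehind variant: the whole match is the value itself.
m2 = re.search(r"(?<=\bId=)[^,}\s]*", sample)
looked_behind = m2.group(0) if m2 else None

print(captured)
print(looked_behind)
```

Both variants stop at the comma here, so the value is found regardless of whether Id is the last key.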

Regex to pull first two fields from a comma separated file

I want to pull the second string in a comma delimited list where the first value is numeric and the second is alpha.
I'm using \d[^,]+(?=,) to pull the numeric value in the first field and just need help with pulling the second value from the "Name" column.
Here's part of a sample file that I'm trying to extract data from:
Address Number,Name,Employee Master Exist(Y/N),Auto-Deposit Exists(Y/N),Supplier Master Exists(Y/N),Supplier Master Created,ACH Account Exists(Y/N),ACH Account Created,ACH Same as Auto-deposit(Y/N)
//line break here is for clarity and does not exist in file//
4398,Presley Elvis Aaron,Y,N,Y,N,Y,N,N
10154,Shepard Alan Barrett,Y,Y,Y,N,Y,N,N
You could make use of a capturing group if you want to match the second string by first matching 1+ digits and a comma.
Then capture in a group matching 1+ chars a-zA-Z and match the trailing comma.
^\d+,([a-zA-Z]+(?: [a-zA-Z]+)*),
^ Start of string
\d+, Match 1+ digits and a comma (Or use (\d+), if the digits should also be a group)
( Capture group 1
[a-zA-Z]+ Match 1+ chars a-zA-Z
(?: [a-zA-Z]+)* Repeat matching the same as previous preceded by a space
), Close capturing group and match trailing comma
Regex demo
To get a bit broader match you could use this pattern to match at least a single char a-zA-Z
\d+,([a-zA-Z ]*[a-zA-Z][a-zA-Z ]*),
Regex demo
Note that this part in your pattern \d[^,]+ matches not only digits, but 1 digit followed by 1+ times any char except a comma which would for example also match 4a$ .
You could try this regex:
^\d+,([^,]+),
This will look for lines:
starting with one or more digits
followed by a comma
capture anything that is not a comma
followed by a comma
See it at Regex 101
If not all lines contain a name, then change the + to a *:
^\d+,([^,]*),
See alternative regex
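As a sketch of how these patterns could be applied to the whole file in Python: the re.MULTILINE flag lets ^ match at the start of every line, and a second group is added around the digits so both fields come back together. The sample text below is a shortened copy of the data in the question.

```python
import re

sample = """Address Number,Name,Employee Master Exist(Y/N)
4398,Presley Elvis Aaron,Y,N,Y,N,Y,N,N
10154,Shepard Alan Barrett,Y,Y,Y,N,Y,N,N
"""

# Group 1: the numeric first field; group 2: the second field (may be empty).
pattern = re.compile(r"^(\d+),([^,]*),", re.MULTILINE)

# The header line is skipped because it does not start with a digit.
rows = pattern.findall(sample)
for number, name in rows:
    print(number, name)
```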

Regex (PCRE) exclude certain words from match result

I need to get only the string with names that is in Bold:
author={Trainor, Sarah F and Calef, Monika and Natcher, David and Chapin, F Stuart and McGuire, A David and Huntington, Orville and Duffy, Paul and Rupp, T Scott and DeWilde, La'Ona and Kwart, Mary and others},
Is there a way to skip all 'and' 'others' words from match result?
I tried lots of things, but nothing works as I expect.
(?<=\{).+?(?<=and\s).+(?=\})
Instead of using omission, you could be better off by implementing rules which expect a specific format in order to match the examples you've provided:
([A-Z]+[A-Za-z]*('[A-Za-z]+)*, [A-Z]? ?[A-Z]+[A-Za-z]*('[A-Za-z]+)*( [A-Z])?)
https://regex101.com/r/9LGqn3/3
You could make use of \G and a capturing group to get you the matches.
The values are in capturing group 1.
(?:author={|\G(?!^))([^\s,]+,(?:\h+[^\s,]+)+)\h+and\h+(?=[^{}]*\})
About the pattern
(?: Non capturing group
author={ Match literally
| Or
\G(?!^) Assert position at the end of previous match, not at the start
) Close non capturing group
( Capture group 1
[^\s,]+, Match not a whitespace char or comma, then match a comma
(?:\h+[^\s,]+)+ Repeat 1+ times matching 1+ horizontal whitespace chars followed by matching any char except a whitespace char and a comma
) Close group 1
\h+and\h+ Match and between 1+ horizontal whitespaces
(?=[^{}]*\}) Assert what is on the right is a closing }
Regex demo
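The \G anchor and \h are PCRE features that Python's built-in re module does not support. For readers working in Python, the sketch below swaps in a different technique: it first extracts the text between the braces and then splits it on the word and, dropping others. The shortened sample line and the variable names are assumptions for illustration.

```python
import re

line = ("author={Trainor, Sarah F and Calef, Monika and "
        "DeWilde, La'Ona and Kwart, Mary and others},")

# Step 1: grab everything between the braces after "author=".
m = re.search(r"author=\{([^{}]*)\}", line)
inner = m.group(1) if m else ""

# Step 2: split on "and" surrounded by whitespace, then drop "others".
names = [n for n in re.split(r"\s+and\s+", inner) if n and n != "others"]
print(names)
```

Requiring whitespace on both sides of and keeps surnames that merely contain those letters intact.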

Capturing groups and money symbol in Regex

I am trying to write a regular expression that takes a string and parses it into three different capturing groups:
$3.99 APP DOWNLOAD – 200 11/19 – 1/21 3.99
Group 1: $3.99 APP DOWNLOAD – 200
Group 2: 11/19 – 1/21
Group 3: 3.99
Does anyone have any ideas???
I do not have much experience with capturing groups and do not know how to create them.
i.e. I believe this expression would work for identifying the dates?
/(\d{2}\/\d{2})/
Any help would be greatly appreciated!
Regex:
([$]\d+[.]\d{2}.*?)\s*(\d{1,2}/\d{2}.*?\d{1,2}/\d{2})\s(\d+[.]\d{2})
So with this we have 3 capture groups (()) separated by whitespace: \s* (0+ whitespace characters) between the first two and \s (exactly one whitespace character) before the last. The \s* isn't strictly necessary, but it removes trailing spaces from the first captured group.
The first capture group [$]\d+[.]\d{2}.*? matches a dollar sign, followed by 1+ digits, followed by a period, followed by 2 digits, followed by a lazy match of 0+ characters (.*?). What this lazy match does is match anything up until the next match in our expression (in this case, our next capture group).
Our second capture group \d{1,2}/\d{2}.*?\d{1,2}/\d{2} matches 1-2 digits, a slash, and 2 digits. Then we use another lazy match of any characters followed by another date.
Our final capture group \d+[.]\d{2} looks for 1+ digits, a period, and 2 more digits.
Note: I used ~ as delimiters so that we do not need to escape our / in the dates. Also, I put $ and . in character classes because I think it looks cleaner than escaping them ([$] vs \$)..either works though :)
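To see the three groups in action, here is a small Python sketch using the same expression. Python needs no delimiters around the pattern, so the / can stay unescaped; the sample string is the one from the question.

```python
import re

pattern = re.compile(
    r"([$]\d+[.]\d{2}.*?)"               # group 1: price plus description
    r"\s*(\d{1,2}/\d{2}.*?\d{1,2}/\d{2})"  # group 2: the date range
    r"\s(\d+[.]\d{2})"                   # group 3: the trailing amount
)

text = "$3.99 APP DOWNLOAD – 200 11/19 – 1/21 3.99"
m = pattern.search(text)
groups = m.groups() if m else None
print(groups)
```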