Regex to match multiple string conditions - regex

I have a need to capture two different string types returned by a query. The first string has data that needs to be trimmed off while the second string is just text.
Review / Sign Changes (Doe,John Howard - 555-00-5555)
City & State for Current Visit
I tried:
(?<Group1>(?:(.) ())|.*
(?<Group1>*.[a-zA-Z] )|(.*)
I expected:
Review / Sign Changes
City & State for Current Visit
I'm not strong in Regex but I try :) Any help would be appreciated.

This regex:
^[^(]+
The regular expression matches as follows:
Node     Explanation
^        the beginning of the string
[^(]+    any character except "(" (1 or more times, matching as many as possible)

If the formatting of the first string is consistent, you can skip regex and just split the string by two spaces and take the first part. For the second string, I see no changes at all so you shouldn't need a regex there.
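As a minimal sketch of how ^[^(]+ behaves on the two sample strings, here it is in Python (the choice of Python and the helper names are assumptions; the question does not say which tool runs the regex). Note that the match keeps the space in front of the parenthesis, so it is stripped afterwards. The second helper is a variation on the split suggestion that cuts at the first "(" rather than at a run of spaces.

```python
import re

# The two sample strings from the question.
samples = [
    "Review / Sign Changes (Doe,John Howard - 555-00-5555)",
    "City & State for Current Visit",
]

# ^[^(]+ takes everything from the start up to (not including) the first "(".
LEADING_TEXT = re.compile(r"^[^(]+")

def leading_text(value: str) -> str:
    """Return the text before the first "(", without trailing whitespace."""
    match = LEADING_TEXT.match(value)
    # No match only happens when the string is empty or starts with "(".
    return match.group().rstrip() if match else ""

def leading_text_no_regex(value: str) -> str:
    """Same result without a regex: cut at the first "(" and trim."""
    return value.partition("(")[0].rstrip()

results = [leading_text(s) for s in samples]
print(results)
```

Both helpers leave the second string unchanged because it contains no parenthesis.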

Related

Regex check for name Initials

I am trying to create a regex that checks if one or more middle-name initials have the following structure:
INITIAL.[BLANK]INITIAL.[BLANK]INITIAL.
There can be multiple Initials as long as they are followed by a dot (.) - blank spaces are only allowed between two initials (e.g. L. B.)
It should not be possible to have a space after an initial if there's no other initial following.
At the moment, I have the following Regex which doesn't work perfectly as of now:
([A-Z]\. (?=[A-Z]|$))+
Using regex101, this is an example:
As you can see, it still matches the string even though there's a blank space at the end, without having another Initial following.
I am not sure why this is happening. I am just learning regex and would be glad if anyone could provide me with a solution to my problem :)
The error you're seeing is because, on the last step, your expression reads in "[A-Z]\. " (including the trailing space) and then looks ahead for $ (and finds it). I would express the pattern this way: (?:[A-Z]\. )*[A-Z]\.$. Treat the last initial specially because it does not have a final space.
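To see the difference between the two patterns, here is a small Python sketch (Python is an assumption, the question only mentions regex101; the helper name is made up for illustration). It uses fullmatch so that the whole input has to be consumed.

```python
import re

# Pattern from the question: every initial must be followed by a space.
ORIGINAL = re.compile(r"([A-Z]\. (?=[A-Z]|$))+")
# Pattern from the answer: the last initial is matched without a space.
FIXED = re.compile(r"(?:[A-Z]\. )*[A-Z]\.$")

def is_valid_initials(text: str) -> bool:
    """True when text is initials like "L." or "L. B." with no trailing space."""
    return FIXED.fullmatch(text) is not None

for candidate in ["L.", "L. B.", "L. B. ", "L.B.", "L"]:
    print(repr(candidate), is_valid_initials(candidate))
```

With the original pattern the situation is reversed: it accepts "L. B. " (trailing space) and rejects "L. B.", because its last iteration still insists on a space.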
The pattern you tried ([A-Z]\. (?=[A-Z]|$))+ uses a repeated capturing group which will give you the value of the last iteration.
In that repetition, the part "[A-Z]\. " ends with a literal space, which means a space has to be present after every initial, including the last one.
You could instead repeat, 0 or more times, a char [A-Z] followed by a dot and a space to match the leading initials.
Then match a final char [A-Z] and a dot, asserting that what is on the right is not a non-whitespace char.
\b(?:[A-Z]\. )*[A-Z]\.(?!\S)
If there can be multiple spaces but it should not match a newline:
\b(?:[A-Z]\.[^\S\r\n]*)*[A-Z]\.(?!\S)
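Because these two patterns use \b and (?!\S) instead of anchors, they can also pick initials out of a longer text. A rough Python sketch, assuming Python's re module and made-up sample strings:

```python
import re

# Exactly one space between initials.
SINGLE_SPACE = re.compile(r"\b(?:[A-Z]\. )*[A-Z]\.(?!\S)")
# Any run of spaces or tabs (but no newline) between initials.
ANY_SPACES = re.compile(r"\b(?:[A-Z]\.[^\S\r\n]*)*[A-Z]\.(?!\S)")

def find_initials(pattern, text):
    """Return the first run of initials found in text, or None."""
    match = pattern.search(text)
    return match.group() if match else None

# Hypothetical inputs, not taken from the question.
first = find_initials(SINGLE_SPACE, "John L. B. Smith")
second = find_initials(ANY_SPACES, "A.   B. C.")
third = find_initials(SINGLE_SPACE, "no initials here")
print(first, second, third)
```

The space after the last initial is never part of the match, since (?!\S) only looks at the next character without consuming it.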

I need a regex result that does not include the substring at the beginning and end of the matched pattern

I have a pattern I need to match that's always a date "_YYYYMMDD.". However, I don't want to include the "_" and the "." in the result. I have a regex pattern that successfully matches the above. It's too complicated to include here because I would have to write it out by hand and would mess it up.
Suffice it to say I have a pattern:
[_](lots of stuff in the middle)[.]
It works fine but I don't want to include the "_" and "."
Any answers are greatly appreciated. Thanks!
To require the underscore and the dot without including them in the matched text, you need lookarounds in the regex pattern. The following regex matches a date that is preceded by _ and followed by .
(?<=_)\d{8}(?=\.)
Additionally, if you want to capture the year, month and date part into their own capture groups, you can use this regex and capture year part from group1, month from group2 and date from group3,
(?<=_)(\d{4})(\d{2})(\d{2})(?=\.)
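A short Python sketch of the lookaround version with the three capture groups. The language is an assumption (the other answer uses shell syntax), and the file name is invented for illustration.

```python
import re

# Lookbehind for "_" and lookahead for "." keep both out of the match.
DATE_IN_NAME = re.compile(r"(?<=_)(\d{4})(\d{2})(\d{2})(?=\.)")

filename = "report_20240131.csv"  # hypothetical input

match = DATE_IN_NAME.search(filename)
if match:
    whole_date = match.group(0)         # the 8 digits only
    year, month, day = match.groups()   # groups 1, 2 and 3
    print(whole_date, year, month, day)
```

The full match (group 0) contains only the eight digits, so nothing has to be sliced off afterwards.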
Easiest way would be to slice the first and last characters off the result. You can do it either by string length:
result="${result:1:${#result}-2}"
(or result="${result:1:8}" since the length will be constant)
Or by specific character:
result="${result#_}"
result="${result%.}"

I want a regex code that accepts only a list of characters that are separated by a comma or a space

So my problem is that I have a text field and I want the user to type a list of days only, and not accept any other word. For example:
monday tuesday saturday
or monday,tuesday,saturday
This is what I wrote:
"\b(monday|tuesday|wednesday|thursday|friday|saturday|sunday|\b"
but this didn't work and I don't know why. I'm a regex beginner and I need some help, thank you guys.
^((monday|tuesday|wednesday|thursday|friday|saturday|sunday)[, ])*(monday|tuesday|wednesday|thursday|friday|saturday|sunday)$
The ^ will anchor the pattern to match the start of the value, and the $ anchors at the end of the value. The combination of those two means the pattern will only match if the entire value matches. Without the anchors, the pattern would match anything which contains the pattern.
The pattern is saying that it must be zero or more dayname-followed-by-space-or-comma, followed by a dayname.
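Here is a minimal sketch of that anchored pattern in Python, assuming the text field value arrives as a plain string (the language and the helper name are not from the question). The day alternation is written once and reused to keep the pattern readable.

```python
import re

DAY = r"(?:monday|tuesday|wednesday|thursday|friday|saturday|sunday)"
# Zero or more "day + separator" pairs, then one final day.
DAY_LIST = re.compile(rf"^(?:{DAY}[, ])*{DAY}$")

def is_day_list(value: str) -> bool:
    """True when value is only day names separated by single commas or spaces."""
    return DAY_LIST.fullmatch(value) is not None

for value in ["monday tuesday saturday", "monday,tuesday,saturday",
              "monday funday", "monday,", ""]:
    print(repr(value), is_day_list(value))
```

Note that this version also accepts a mix of separators such as "monday tuesday,saturday".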
In your pattern the last pipe | of the alternation should be a closing parenthesis to close the group and you are not taking a comma or a space into account.
\b(monday|tuesday|wednesday|thursday|friday|saturday|sunday|\b
                                                           ^
If you are not referring to the capturing groups in your code or tool, you could make them non capturing using (?: instead of (
You might update your pattern to use the anchors ^ and $ to assert the start and the end of the string. Then match 1 day and repeat, 0 or more times, another day preceded by a comma or a space.
^(?:mon|tues|wednes|thurs|fri|satur|sun)day(?:[, ](?:mon|tues|wednes|thurs|fri|satur|sun)day)*$
If you want to allow only the specified formats and for example not monday tuesday,saturday using a space AND a comma you could capture the space or comma the first time and then make use of a backreference using \1:
^(?:mon|tues|wednes|thurs|fri|satur|sun)day(?:([, ])(?:mon|tues|wednes|thurs|fri|satur|sun)day)?(?:\1(?:mon|tues|wednes|thurs|fri|satur|sun)day)*$
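The backreference version can be tried out with a sketch like the following. It assumes Python's re module, where a backreference to a group that did not take part in the match simply fails; some other flavours (JavaScript, for instance) treat that case differently, so the results below should not be assumed to carry over.

```python
import re

DAY = r"(?:mon|tues|wednes|thurs|fri|satur|sun)day"
# Group 1 remembers the first separator; \1 forces every later one to be the same.
SAME_SEPARATOR = re.compile(rf"^{DAY}(?:([, ]){DAY})?(?:\1{DAY})*$")

def is_consistent_day_list(value: str) -> bool:
    """True for day lists that use only commas or only spaces."""
    return SAME_SEPARATOR.fullmatch(value) is not None

for value in ["monday", "monday,tuesday,saturday",
              "monday tuesday saturday", "monday tuesday,saturday"]:
    print(repr(value), is_consistent_day_list(value))
```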

Need help constructing a regex

I need to write a regex which matches strings representing comma separated days of week, like:
"Sun,Mon,Tue,Wed,Thu,Fri,Sat"
Each day can appear in the string at most once. The order of days is important.
So far I have tried the following patterns:
1) (Sun,|Mon,|Tue,|Wed,|Thu,|Fri,|Sat,)*(Sun|Mon|Tue|Wed|Thu|Fri|Sat)
This one is very bad: allows multiple presence of days, also doesn't watch over the days order.
2) (Sun)?([,^]Mon)?([,^]Tue)?([,^]Wed)?([,^]Thu)?([,^]Fri)?([,^]Sat)?
This is the best I got so far. The only problem here is that it matches strings starting with comma, e.g. ,Mon,Tue,Fri. My question is how to filter out the comma starting string matching this pattern.
Thanks in advance.
Agreed that regex is possibly not the best option. However, if the only problem with your current version is that it matches strings beginning with a comma, you could just bung a check for a starting comma at the beginning of the regex:
(?!,)(Sun)?([,^]Mon)?([,^]Tue)?([,^]Wed)?([,^]Thu)?([,^]Fri)?([,^]Sat)?
However, I don't think [,^] does what you think it does - in the regex flavours I'm familiar with, ^ inside square brackets matches a literal ^ when it's not the first character in the list - it doesn't match the beginning of the string. You could replace it with (^|,):
(?!,)(Sun)?((^|,)Mon)?((^|,)Tue)?((^|,)Wed)?((^|,)Thu)?((^|,)Fri)?((^|,)Sat)?
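Since this answer agrees that regex may not be the best option, here is one possible non-regex check, sketched in Python (the language and the function name are assumptions, not something the question specifies). It splits on commas and requires the day positions to be strictly increasing, which covers both the ordering rule and the "at most once" rule.

```python
DAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]

def is_ordered_day_list(value: str) -> bool:
    """True when value is a comma-separated list of distinct days in week order."""
    parts = value.split(",")
    try:
        positions = [DAYS.index(part) for part in parts]
    except ValueError:
        # Unknown token, including the empty string left by a stray comma.
        return False
    # Strictly increasing positions rule out repeats and wrong order.
    return all(a < b for a, b in zip(positions, positions[1:]))

for value in ["Sun,Mon,Tue,Wed,Thu,Fri,Sat", "Mon,Fri", ",Mon,Tue,Fri",
              "Mon,Sun", "Mon,Mon", ""]:
    print(repr(value), is_ordered_day_list(value))
```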
This is a bit complicated, but it fulfills all of your specifications. Maybe regex isn't the best solution for this...
^(Sun(,(?=.)|$))?(Mon(,(?=.)|$))?(Tue(,(?=.)|$))?(Wed(,(?=.)|$))?(Thu(,(?=.)|$))?(Fri(,(?=.)|$))?(Sat)?$
As a verbose regex:
^                     # start of string
(                     # Try to match...
  Sun                 # Sun
  (                   # followed by either
    ,                 # a comma
    (?=.)             # but only if more text follows
    |                 # or
    $                 # end of string
  )
)?                    # make it optional.
(Mon(,(?=.)|$))?      # same for Mon-Fri
(Tue(,(?=.)|$))?
(Wed(,(?=.)|$))?
(Thu(,(?=.)|$))?
(Fri(,(?=.)|$))?
(Sat)?                # never a comma after Sat
$                     # end of string
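In a flavour that supports a verbose mode, the commented form can be used directly. Below is a sketch using Python's re.VERBOSE flag (an assumption; the question does not name a language). One thing worth knowing is that the pattern as written also matches the empty string, so the hypothetical helper rejects that case separately.

```python
import re

ORDERED_DAYS = re.compile(r"""
    ^                    # start of string
    (Sun(,(?=.)|$))?     # Sun, then a comma (only if more text follows) or the end
    (Mon(,(?=.)|$))?
    (Tue(,(?=.)|$))?
    (Wed(,(?=.)|$))?
    (Thu(,(?=.)|$))?
    (Fri(,(?=.)|$))?
    (Sat)?               # never a comma after Sat
    $                    # end of string
""", re.VERBOSE)

def is_ordered_days(value: str) -> bool:
    """True for a non-empty, ordered, duplicate-free list of day abbreviations."""
    # The empty-string check is an addition; the regex alone would accept "".
    return bool(value) and ORDERED_DAYS.fullmatch(value) is not None

for value in ["Sun,Mon,Tue,Wed,Thu,Fri,Sat", "Mon,Fri", ",Mon,Tue,Fri",
              "Mon,Sun", "Sun,", "SunMon"]:
    print(repr(value), is_ordered_days(value))
```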
Another option is a creative use of word boundaries:
^\b(?:Sun)?,?\b(?:Mon)?,?\b(?:Tue)?,?\b(?:Wed)?,?\b(?:Thu)?,?\b(?:Fri)?,?\b(?:Sat)?$
Or, if you don't mind each day being captured as a group, you can shorten that a little by dropping the ?: markers:
^\b(Sun)?,?\b(Mon)?,?\b(Tue)?,?\b(Wed)?,?\b(Thu)?,?\b(Fri)?,?\b(Sat)?$
\b only matches between a word character and a non-word character. In this case, between a day and a comma or the edge of the string (start or end).
The word boundaries make sure each comma is surrounded by letters: it will never match a comma near the edge of the string. Similarly, it will never match between two days if the comma isn't there, as in SunMon.
Example: http://rubular.com/r/mTCU0ZWtMm
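The word-boundary pattern can be checked against the edge cases mentioned above with a sketch like this one, written in Python as an assumption (the linked example uses Rubular, a Ruby tester).

```python
import re

# Non-capturing version of the word-boundary pattern.
BOUNDARY_DAYS = re.compile(
    r"^\b(?:Sun)?,?\b(?:Mon)?,?\b(?:Tue)?,?\b(?:Wed)?"
    r",?\b(?:Thu)?,?\b(?:Fri)?,?\b(?:Sat)?$"
)

def is_day_sequence(value: str) -> bool:
    """True when value is an ordered, comma-separated list of day abbreviations."""
    return BOUNDARY_DAYS.fullmatch(value) is not None

for value in ["Sun,Mon", "Mon,Fri,Sat", ",Mon", "Sun,", "SunMon", "Mon,Sun", ""]:
    print(repr(value), is_day_sequence(value))
```

Unlike the previous pattern, this one rejects the empty string by itself, because \b cannot match in a string that has no word characters.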

Extracting movie name and year from string where year is optional

I'm missing a really obvious thing here, but I'm new to regex so be kind ;-)
I have a number of films in an arbitrary format that may or may not have the year attached.
My Movie Name 2010
Some.Other.Super.Cool.Movie
The~Third|Movie.2010
Now, using (.+)\W(\d{4}) I can extract the two movies with dates into two groups one containing the name and the other the year, but the middle one gets ignored? I'm just a little unsure on how to actually make the year segment optional.
Ideally, ;-), I could use a single expression to return the names with \W converted into spaces, but that's a different conversation.
Thanks in advance
Using a ? after a group makes it optional, so in your case put it after the (\d{4}):
(.+)\W(\d{4})?
That is because you are using greedy matching on (.+), and \W includes the newline character in its set (I think it does, at least). Strip your string of trailing whitespace, and if that doesn't work make (.+) lazy with a ? of its own: (.+?). Also consider that \W may be the wrong delimiter for this problem.
Adding $ to the end may also help, as that requires the digits to sit at the end of the string when they are present. Try lazy matching together with $:
(.+?)\W(\d{4})?$
? Makes it optional
(.+?)\W?(\d{4})?$
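Putting the last pattern to work on the three sample titles, here is a rough Python sketch (Python and the function name are assumptions). It also handles the asker's side wish by replacing runs of \W characters in the name with a single space in a second step, rather than in the same expression.

```python
import re

# Lazy name, optional separator, optional 4-digit year at the very end.
TITLE_YEAR = re.compile(r"(.+?)\W?(\d{4})?$")

def parse_title(raw: str):
    """Return (name, year); year is None when the title carries no year."""
    match = TITLE_YEAR.match(raw.strip())
    if match is None:  # only happens for an empty string
        return None, None
    name, year = match.groups()
    # Second step: turn separators such as ".", "~" and "|" into spaces.
    return re.sub(r"\W+", " ", name).strip(), year

titles = [
    "My Movie Name 2010",
    "Some.Other.Super.Cool.Movie",
    "The~Third|Movie.2010",
]
parsed = [parse_title(t) for t in titles]
print(parsed)
```

Since \W does not cover the underscore, a title that uses "_" as a separator would need its own handling.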