Need help constructing a regex - regex

I need to write a regex which matches strings representing comma separated days of week, like:
"Sun,Mon,Tue,Wed,Thu,Fri,Sat"
Each day can appear in the string at most once. The order of days is important.
So far I have tried the following patterns:
1) (Sun,|Mon,|Tue,|Wed,|Thu,|Fri,|Sat,)*(Sun|Mon|Tue|Wed|Thu|Fri|Sat)
This one is very bad: it allows a day to appear more than once and doesn't enforce the order of the days.
2) (Sun)?([,^]Mon)?([,^]Tue)?([,^]Wed)?([,^]Thu)?([,^]Fri)?([,^]Sat)?
This is the best I've got so far. The only problem is that it matches strings starting with a comma, e.g. ,Mon,Tue,Fri. My question is how to reject strings that start with a comma while still using this pattern.
Thanks in advance.

Agreed that regex is possibly not the best option. However, if the only problem with your current version is that it matches strings beginning with a comma, you could just bung a check for a starting comma at the beginning of the regex:
(?!,)(Sun)?([,^]Mon)?([,^]Tue)?([,^]Wed)?([,^]Thu)?([,^]Fri)?([,^]Sat)?
However, I don't think [,^] does what you think it does - in the regex flavours I'm familiar with, ^ inside square brackets matches a literal ^ when it's not the first character in the list - it doesn't match the beginning of the string. You could replace it with (^|,):
(?!,)(Sun)?((^|,)Mon)?((^|,)Tue)?((^|,)Wed)?((^|,)Thu)?((^|,)Fri)?((^|,)Sat)?
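To sanity-check that last pattern, here is a minimal sketch in Python. The choice of Python and of re.fullmatch is an assumption (the question doesn't name a regex flavour), and the helper name is_valid_days is made up for illustration.

```python
import re

# Pattern from the answer above: (?!,) rejects a leading comma,
# (^|,) lets each later day sit at the start or right after a comma.
DAYS_RE = re.compile(
    r"(?!,)(Sun)?((^|,)Mon)?((^|,)Tue)?((^|,)Wed)?"
    r"((^|,)Thu)?((^|,)Fri)?((^|,)Sat)?"
)

def is_valid_days(text: str) -> bool:
    """Return True if the entire string matches the pattern (hypothetical helper)."""
    return DAYS_RE.fullmatch(text) is not None

for sample in ["Sun,Mon,Tue", "Mon,Fri", ",Mon,Tue,Fri", "Mon,Sun", "Sun,Sun"]:
    print(f"{sample!r}: {is_valid_days(sample)}")
```

Note that every group is optional, so the empty string passes as well; add a separate check if that matters to you.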

This is a bit complicated, but it fulfills all of your specifications. Maybe regex isn't the best solution for this...
^(Sun(,(?=.)|$))?(Mon(,(?=.)|$))?(Tue(,(?=.)|$))?(Wed(,(?=.)|$))?(Thu(,(?=.)|$))?(Fri(,(?=.)|$))?(Sat)?$
As a verbose regex:
^ # start of string
( # Try to match...
Sun # Sun
( # followed by either
, # a comma
(?=.) # but only if more text follows
| # or
$ # end of string
)
)? # make it optional.
(Mon(,(?=.)|$))? # same for Mon-Fri
(Tue(,(?=.)|$))?
(Wed(,(?=.)|$))?
(Thu(,(?=.)|$))?
(Fri(,(?=.)|$))?
(Sat)? # never a comma after Sat
$ # end of string
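If your flavour supports a verbose mode, the commented version can be used as-is. A rough sketch with Python's re.VERBOSE flag (Python is an assumption here, and matches_days is a made-up helper name):

```python
import re

DAYS_VERBOSE = re.compile(
    r"""
    ^                    # start of string
    (Sun(,(?=.)|$))?     # Sun, then a comma (only if more text follows) or the end
    (Mon(,(?=.)|$))?     # same for Mon-Fri
    (Tue(,(?=.)|$))?
    (Wed(,(?=.)|$))?
    (Thu(,(?=.)|$))?
    (Fri(,(?=.)|$))?
    (Sat)?               # never a comma after Sat
    $                    # end of string
    """,
    re.VERBOSE,
)

def matches_days(text: str) -> bool:
    """Hypothetical helper: True if text is an ordered, comma-separated day list."""
    return DAYS_VERBOSE.match(text) is not None

for sample in ["Sun,Mon,Tue,Wed,Thu,Fri,Sat", "Tue", "Sun,", ",Mon", "Fri,Mon"]:
    print(f"{sample!r}: {matches_days(sample)}")
```

Only the first two samples should print True. As with the other patterns, the empty string is accepted too, because every day is optional.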

Another option is a creative use of word boundaries:
^\b(?:Sun)?,?\b(?:Mon)?,?\b(?:Tue)?,?\b(?:Wed)?,?\b(?:Thu)?,?\b(?:Fri)?,?\b(?:Sat)?$
Or, if you don't mind each day being captured in its own group, you can drop the ?: and simplify that a little further:
^\b(Sun)?,?\b(Mon)?,?\b(Tue)?,?\b(Wed)?,?\b(Thu)?,?\b(Fri)?,?\b(Sat)?$
\b only matches between a word character and a non-word character. In this case, between a day and a comma or the edge of the string (start or end).
The word boundaries make sure each comma is surrounded by letters: it will never match a comma near the edge of the string. Similarly, it will never match between two days if the comma isn't there, as in SunMon.
Example: http://rubular.com/r/mTCU0ZWtMm
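The same cases can be checked offline. This is just a sketch in Python (an assumption, the linked example uses Ruby's flavour), using the non-capturing version of the pattern; boundary_ok is a name invented for the example.

```python
import re

BOUNDARY_RE = re.compile(
    r"^\b(?:Sun)?,?\b(?:Mon)?,?\b(?:Tue)?,?\b(?:Wed)?,?"
    r"\b(?:Thu)?,?\b(?:Fri)?,?\b(?:Sat)?$"
)

def boundary_ok(text: str) -> bool:
    """Hypothetical helper: True if the word-boundary pattern accepts text."""
    return BOUNDARY_RE.search(text) is not None

# Valid: ordered days with single commas between them.
print(boundary_ok("Sun,Mon"), boundary_ok("Sun,Wed,Sat"))
# Invalid: leading comma, trailing comma, missing comma, empty string.
print(boundary_ok(",Mon"), boundary_ok("Sun,"), boundary_ok("SunMon"), boundary_ok(""))
```

One difference from the lookahead version: the leading \b needs a word character next to it, so the empty string is rejected here.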

Related

How to allow spaces in between words?

EDIT: I've been experimenting, and it seems like putting this:
\(\w{1,12}\s*\)$
works, however, it only allows space at the end of the word.
example,
Matches
(stuff )
(stuff )
Does not
(st uff)
Regexp:
\(\w{1,12}\)
This matches the following:
(stuff)
But not:
(stu ff)
I want to be able to match spaces too.
I've tried putting \s but it just broke the whole thing, nothing would match. I saw one post on here that said to enclose the whole thing in a ^[]*$ with space in there. That only made the regex match everything.
This is for Google Forms validation if that helps. I'm completely new to regex, so go easy on me. I looked up my problem but could not find anything that worked with my regex. (Is it because of the parentheses?)
For matching text like (st uff) or (st uff some more) you will need to write your regex like this,
\(\w{1,12}(?:\s+\w{1,12})*\)
Regex explanation:
\( - Literal start parenthesis
\w{1,12} - Match a word of length 1 to 12 like you wanted
(?:\s+\w{1,12})* - Matches one or more whitespace characters followed by a word of length 1 to 12, with the whole group repeated zero or more times
\) - Literal closing parenthesis
Demo
Now if you want to optionally also allow spaces just after starting parenthesis and ending parenthesis, you can just place \s* in the regex like this,
\(\s*\w{1,12}(?:\s+\w{1,12})*\s*\)
  ^^^                        ^^^
Demo with optional spaces
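If you want to try this outside the form, here is a small sketch in Python. That is an assumption: Google Forms uses its own regex engine, so treat this only as a way to see which inputs the pattern accepts. The helper name accepts is made up.

```python
import re

# Words of 1 to 12 word characters, separated by whitespace,
# with optional whitespace just inside the parentheses.
WORDS_IN_PARENS = re.compile(r"\(\s*\w{1,12}(?:\s+\w{1,12})*\s*\)")

def accepts(text: str) -> bool:
    """Hypothetical helper: True if the whole text matches the pattern."""
    return WORDS_IN_PARENS.fullmatch(text) is not None

samples = ["(stuff)", "(st uff)", "( st uff some more )", "(st-uff)", "()", "(abcdefghijklmnop)"]
for sample in samples:
    print(f"{sample!r}: {accepts(sample)}")
```

The first three samples are accepted. The last three are rejected: a hyphen is neither a word character nor whitespace, empty parentheses contain no word, and a single 16-letter word exceeds the 12-character limit.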
If you are trying to get between 1 and 12 characters between parentheses:
\([^\)]{1,12}\)
The [^\)] segment is a character class that represents all characters that aren't closing parentheses (^ inverts the class).
If you want some specific characters, like alphanumeric and spaces, group that into the character class instead:
\([\w ]{1,12}\)
Or
\([\w\s]{1,12}\)
If you want up to 12 word characters with an arbitrary number of spaces anywhere in between:
\(\s*(?:\w\s*){1,12}\)
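The difference between these variants is easiest to see side by side. A rough Python sketch (Python is an assumption; the labels and the results helper are invented for the example):

```python
import re

PATTERNS = {
    "any chars except )": re.compile(r"\([^\)]{1,12}\)"),
    "word chars or spaces": re.compile(r"\([\w ]{1,12}\)"),
    "spaces not counted": re.compile(r"\(\s*(?:\w\s*){1,12}\)"),
}

def results(text: str) -> dict:
    """Hypothetical helper: which patterns accept the whole text."""
    return {label: bool(p.fullmatch(text)) for label, p in PATTERNS.items()}

for sample in ["(st uff)", "(a b c d e f g h)", "(st-uff)"]:
    print(sample, results(sample))
```

(st uff) passes all three. (a b c d e f g h) has 15 characters inside the parentheses but only 8 word characters, so only the last pattern accepts it. (st-uff) passes only the first one, since a hyphen is neither a word character nor a space.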

Regex with start and end match

I'm having trouble matching the start and end of a regex on Python.
Essentially I'm confused about when to use word boundaries \b and when to use the start/end anchors ^ and $.
My regex of
^[A-Z]{2}\d{2}
matches 4-character strings (two uppercase letters, two digits), which is what I'm after
Matches AJ99, RD22, CP44 etc
However, I also noted that AJAJAJAJAJAJAJAJAJSJHS99 could be matched as well. I've tried using ^ and $ together to match the whole string. This doesn't work
^[A-Z]{2}\d{2}$ # this doesn't work
but
^[A-Z]{2}\d{2} # this is fine
[A-Z]{2}\d{2}$ # this is fine
The string I'm matching against is 4 characters long, but in the first two examples the regex could pick the start and end of a longer string respectively.
s = "NZ43" # 4 characters, match perfect! However....
s = "AM27272727" # matches the first example
s = "HAHSHSHSHDS57" # matches the second example
The position anchors ^ and $ place a restriction on the position of your matched chars:
Analyzing your complete regex:
^[A-Z]{2}\d{2}$
^ matches only at the beginning of the text
[A-Z]{2} exactly 2 uppercase ASCII alphabetic characters
\d{2} exactly 2 digits (equivalent to [0-9]{2} for ASCII input; in Python 3, \d on a str also accepts other Unicode decimal digits)
$ matches only at the end of the text
If you remove one or both of the 2 position anchors (^ or $) you can match a substring starting from the beginning or the end as you stated above.
If you want to match exactly a word without using the start/end of the string use the \b anchor, like this:
\b[A-Z]{2}\d{2}\b
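\b matches at a position between a word character (in regex a word char \w is one of [a-zA-Z0-9_]) and a non-word character (available as \W). The start and end of the text count as non-word positions, so \b also matches there when the adjacent character is a word character.
The regex above finds a match in all of the following strings (WS24 in the first four, NZ43 in the last):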
WS24 alone
before WS24
WS24 after
before WS24 after
NZ43
It doesn't match:
AM27272727 (it would if the string were AM27 272727 or AM27"272727)
HAHSHSHSHDS57 (it would if the string were HAHSHSHSH DS57 or... you get it)
A demo online (the site will be useful to you also to experiment with regex).
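Since the question is about Python, here is a short sketch of the cases above using the standard re module. The constant name CODE is made up for the example.

```python
import re

CODE = r"[A-Z]{2}\d{2}"  # two uppercase letters, then two digits

# Only the start is anchored, so a longer string still matches on its prefix.
print(bool(re.search("^" + CODE, "AM27272727")))        # True
# Only the end is anchored, so a longer string matches on its suffix.
print(bool(re.search(CODE + "$", "HAHSHSHSHDS57")))     # True
# Both anchors: nothing may come before or after the four characters.
print(bool(re.search("^" + CODE + "$", "NZ43")))        # True
print(bool(re.search("^" + CODE + "$", "AM27272727")))  # False
# re.fullmatch is a similar whole-string check without writing the anchors.
print(bool(re.fullmatch(CODE, "NZ43")))                 # True
# \b finds the code as a standalone word inside longer text.
print(re.findall(r"\b" + CODE + r"\b", "before WS24 after AM27272727 AM27 272727"))
# ['WS24', 'AM27']
```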
The behaviour you show is exactly what is supposed to happen, so your question suggests that you may not have fully understood how regular expressions work.
As an addition to the very good and informative answer of GsusRecovery, here's a site that guides you through the concepts of regular expressions and tries to teach you the basics with a lesson-based system. To be clear, I do not want to tout this website, as there are plenty of those, but I got real use out of this one, so it's the one I'm suggesting.

regex match till a character from a second occurrence of a different character

My question is pretty similar to this question and the answer is almost fine. Only I need a regexp not just from one character to another, but from the second occurrence of one character up to another character.
My purpose is to get password from uri, example:
http://mylogin:mypassword#mywebpage.com
So in fact I need the text between the second ":" and the "#".
You could give the following regex a go:
(?<=:)[^:]+?(?=#)
It matches any consecutive string not containing any : character, prefixed by a : and suffixed by a #.
Depending on your flavour of regex you might need something like:
:([^:]+?)#
This doesn't use lookarounds. It includes the : and # in the overall match, but the password will be in the first capturing group.
The ? makes the quantifier lazy, in case there are further # characters in the actual URL string; as such it is optional. Please note that this will match any character other than : between the : and the #, even newlines and so on.
Here's an easy one that does not need look-aheads or look-behinds:
.*:.*:([^#]+)#
Explanation:
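.*:.*: matches everything up to (and including) the second colon (:) in the example URI; because .* is greedy, an input with more than two colons is consumed up to a later colon instead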
([^#]+) matches the longest possible series of non-# characters
# - matches the # character.
If you run this regex, the first capturing group (the expression between parentheses) will contain the password.
Here it is in action: http://regex101.com/r/fT6rI0
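For comparison, here are the three patterns from this thread run against the example URI. This is only a sketch in Python (the question doesn't name a language, so that is an assumption), and the variable names are made up.

```python
import re

URI = "http://mylogin:mypassword#mywebpage.com"

# Lookbehind/lookahead: the match itself is the password.
lookaround = re.search(r"(?<=:)[^:]+?(?=#)", URI)
# No lookarounds: the password is in capturing group 1.
group_only = re.search(r":([^:]+?)#", URI)
# Skip past two colons, then capture everything up to the #.
two_colons = re.search(r".*:.*:([^#]+)#", URI)

print(lookaround.group(0))   # mypassword
print(group_only.group(1))   # mypassword
print(two_colons.group(1))   # mypassword
```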

How to use regex for field validation on whole string?

I've been working for many hours trying to do a "simple thing": use a regex to validate a text field.
I need to make sure of:
1- Only use (a-z), (A-Z) and (0-9) values
2- Add a SINGLE wildcard only at the end.
Ex.
Match
MICHE*
Match
JAMES
No match
MICHE**
No match
MIC_HEAL*
I have this regex till now:
[a-zA-Z0-9\s-]+.\z*?
The problem is it still matches when I introduce an invalid character as long as I have a matching sub-string See my REGEX
What can I do to force a match on the whole string? What am I missing?
Thx!
Use ^ (start of line) and $ (end of line) to only match the whole string:
^[a-zA-Z0-9\s-]+.\z*?$
(If you have a multiline input you can also use \A and \z - start and end of string)
On a second look, I don't understand the end of your regex: . (anything) followed by \z*? (end of string, zero or more times, lazily). This regex will match something like:
Ikdflfdf&
Is that correct? If you only want the character *, you should use:
^[a-zA-Z0-9\s-]+\*?$
Also, as Robbie pointed out, you're including spaces and the - in your list of accepted characters. If you only want letters and digits, a shortcut would be using \w (word characters), keeping the * optional:
^\w+\*?$
However, \w also matches the underscore, so this would accept MIC_HEAL*. And depending on whether the matcher is Unicode-aware or not, \w will also match non-ASCII letters and digits, which may or may not be what you want.
Try this one :
^[a-zA-Z0-9]+\*?$
^ string start
$ string end
* is a metacharacter, so it must be escaped as \* to match a literal asterisk
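A quick check of this pattern against the examples from the question, as a sketch in Python (an assumption, since the question doesn't say which engine the field validation uses). The helper name is_valid_field is made up.

```python
import re

# Letters and digits only, with at most one trailing asterisk.
FIELD_RE = re.compile(r"[a-zA-Z0-9]+\*?")

def is_valid_field(value: str) -> bool:
    """Hypothetical helper: True if the whole value matches."""
    # fullmatch anchors both ends, so no explicit ^ and $ are needed.
    return FIELD_RE.fullmatch(value) is not None

for sample in ["MICHE*", "JAMES", "MICHE**", "MIC_HEAL*"]:
    print(f"{sample!r}: {is_valid_field(sample)}")
```

In Python specifically, $ also matches just before a trailing newline, so ^[a-zA-Z0-9]+\*?$ with re.search would accept "JAMES\n", while re.fullmatch rejects it.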
I think you just need ^ at the beginning and $ at the end, and to drop the . (it matches any character, so it would let MICHE** through):
^[a-zA-Z0-9\s-]+\*?$
Also, you don't need the \z
Also, you haven't mentioned that you want to allow spaces and dashes - but you have included them in your allowed character set.

Regex matching beginning AND end strings

This seems like it should be trivial, but I'm not so good with regular expressions, and this doesn't seem to be easy to Google.
I need a regex that starts with the string 'dbo.' and ends with the string '_fn'
So far as I am concerned, I don't care what characters are in between these two strings, so long as the beginning and end are correct.
This is to match functions in a SQL server database.
For example:
dbo.functionName_fn - Match
dbo._fn_functionName - No Match
dbo.functionName_fn_blah - No Match
If you're searching for hits within a larger text, you don't want to use ^ and $ as some other responders have said; those match the beginning and end of the text. Try this instead:
\bdbo\.\w+_fn\b
\b is a word boundary: it matches a position that is either preceded by a word character and not followed by one, or followed by a word character and not preceded by one. This regex will find what you're looking for in any of these strings:
dbo.functionName_fn
foo dbo.functionName_fn bar
(dbo.functionName_fn)
...but not in this one:
foodbo.functionName_fnbar
\w+ matches one or more "word characters" (letters, digits, or _). If you need something more inclusive, you can try \S+ (one or more non-whitespace characters) or .+? (one or more of any characters except linefeeds, non-greedily). The non-greedy +? prevents it from accidentally matching something like dbo.func1_fn dbo.func2_fn as if it were just one hit.
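To illustrate both points (word boundaries inside a larger text, and greedy versus non-greedy matching), here is a rough sketch in Python. The language is an assumption, since the question is about SQL Server function names and doesn't say where the regex runs; the sample text is made up from the examples above.

```python
import re

FN_RE = re.compile(r"\bdbo\.\w+_fn\b")

text = (
    "foo dbo.functionName_fn bar (dbo.other_fn) "
    "foodbo.functionName_fnbar dbo.functionName_fn_blah dbo._fn_functionName"
)
print(FN_RE.findall(text))
# ['dbo.functionName_fn', 'dbo.other_fn']

two_hits = "dbo.func1_fn dbo.func2_fn"
# Greedy .+ runs to the last _fn and reports a single hit.
print(re.findall(r"\bdbo\..+_fn\b", two_hits))
# ['dbo.func1_fn dbo.func2_fn']
# Non-greedy .+? stops at the first _fn, so both functions are found.
print(re.findall(r"\bdbo\..+?_fn\b", two_hits))
# ['dbo.func1_fn', 'dbo.func2_fn']
```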
^dbo\..*_fn$
This should work for you.
Well, the simple regex is this:
/^dbo\..*_fn$/
It would be better, however, to use the string manipulation functionality of whatever programming language you're using to slice off the first four and the last three characters of the string and check whether they're what you want.
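In Python, for instance (the language is an assumption, use whatever your environment offers), the string-based check could look like the sketch below. The function name is made up.

```python
def is_dbo_function(name: str) -> bool:
    """Hypothetical helper: True if name starts with 'dbo.' and ends with '_fn'."""
    return name.startswith("dbo.") and name.endswith("_fn")

for sample in ["dbo.functionName_fn", "dbo._fn_functionName", "dbo.functionName_fn_blah"]:
    print(f"{sample}: {is_dbo_function(sample)}")
```

Only the first sample passes. Like the regex ^dbo\..*_fn$, this also accepts the bare string dbo._fn, since nothing is required in between.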
\bdbo\..*fn
I was looking through a ton of java code for a specific library: car.csclh.server.isr.businesslogic.TypePlatform (although I only knew car and Platform at the time). Unfortunately, none of the other suggestions here worked for me, so I figured I'd post this.
Here's the regex I used to find it:
\bcar\..*Platform
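import java.util.Scanner;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        Scanner scanner = new Scanner(System.in);
        String part = Pattern.quote(scanner.nextLine()); // treat the input as literal text
        String line = scanner.nextLine();
        // YES if some word in line starts or ends with part (case-insensitive)
        Pattern pattern = Pattern.compile("\\b" + part + "|" + part + "\\b", Pattern.CASE_INSENSITIVE);
        System.out.println(pattern.matcher(line).find() ? "YES" : "NO");
    }
}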
If you need to determine whether any of the words of this text start or end with the sequence, you can use this regex: \bsubstring|substring\b. Against these inputs:
anythingsubstring - Match
substringanything - Match
anythingsubstringanything - No Match
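The same check can be sketched in Python, purely as an illustration of the pattern. The function name is made up, and re.escape is used on the assumption that the sequence is meant as literal text.

```python
import re

def word_starts_or_ends_with(part: str, line: str) -> bool:
    """Hypothetical helper: True if some word in line starts or ends with part."""
    literal = re.escape(part)  # treat part as plain text, not as a regex
    pattern = rf"\b{literal}|{literal}\b"
    return re.search(pattern, line, re.IGNORECASE) is not None

for sample in ["anythingsubstring", "substringanything", "anythingsubstringanything"]:
    print(f"{sample}: {word_starts_or_ends_with('substring', sample)}")
```

The third sample fails because the sequence sits in the middle of the word, so there is no word boundary on either side of it.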
The simplest thing that you can do is:
dbo.*_fn$
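It matches dbo, followed by any characters, and requires _fn at the very end of the string. Note that the . is unescaped here, so together with * it means "any characters" and does not require the literal dot after dbo.
If you know which character follows the final n, for example a space, you can replace $ with that character.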