I'm trying to write a regex that matches the first 3 lines below (the rest are test cases which I do NOT want to catch).
Sample text for testing:
10:00:00+10:00/mon,thu
10:00:00+10:00/mon-thu
10:00:00+10:00/mon
10:00:00+10:00/monday-thu
10:00:00+10:00/mon-thursday
10:00:00+10:00/mon,,,thu
10:00:00+10:00/mon,
10:00:00+10:00/mon+thu
10:00:00+10:00/monthu
10:00:00+10:00/
21:00:00+10:00\sat-sun
So far I have come up with:
[0-9]{2}[:][0-9]{2}[:][0-9]{2}[+][0-9]{2}[:][0-9]{2}([/][a-z]{3}){1}([,-][a-z]{3})?
But, as you can see, while it matches the lines I want, it also matches lines that have trailing characters. When there are trailing characters, it should not be a match.
Add $ to the end of the regexp. This matches the end of the line, so it will prevent matches if there's anything after it.
You should also put ^ at the beginning so it doesn't match if there's anything before the time.
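A quick way to check the answer is the Python sketch below. It runs the question's pattern against the question's sample lines twice, once unanchored and once wrapped in ^...$. Only the variable names are new here. The pattern and the test lines are copied from the question.

```python
import re

# The question's pattern, unchanged.
CORE = r"[0-9]{2}[:][0-9]{2}[:][0-9]{2}[+][0-9]{2}[:][0-9]{2}([/][a-z]{3}){1}([,-][a-z]{3})?"

loose = re.compile(CORE)                # unanchored: trailing junk is ignored
anchored = re.compile("^" + CORE + "$")  # anchored as the answer recommends

# Sample lines from the question (raw string keeps the backslash in the last one).
samples = [
    "10:00:00+10:00/mon,thu",
    "10:00:00+10:00/mon-thu",
    "10:00:00+10:00/mon",
    "10:00:00+10:00/monday-thu",
    "10:00:00+10:00/mon-thursday",
    "10:00:00+10:00/mon,,,thu",
    "10:00:00+10:00/mon,",
    "10:00:00+10:00/mon+thu",
    "10:00:00+10:00/monthu",
    "10:00:00+10:00/",
    r"21:00:00+10:00\sat-sun",
]

loose_hits = [line for line in samples if loose.search(line)]
anchored_hits = [line for line in samples if anchored.search(line)]
print(len(loose_hits), "unanchored matches")
print(anchored_hits)
```

In Python, $ also matches just before a trailing newline. If your lines may still carry "\n", re.fullmatch(CORE, line) is a stricter alternative.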
I have the following regular expression:
(=)(?<!\\\\)(')(.*?)(?<!\\\\)(')(.*?)
It should match an equals sign, followed by any characters between single quotes, followed by anything that comes after.
But when I test it with the sample text ='abc'xyz, it only matches ='abc'.
I also tested the code here: https://regexr.com/61gof
Any ideas as to why that is?
The ? makes the last (.*?) match lazily, so it matches as few characters as possible, which here is zero. Remove the ?, or put a $ at the end of the regex to tell it that it should match until the end of the line (if that is what you want).
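Here is a small Python sketch of both fixes. It copies the question's pattern verbatim into a raw string. As written, the lookbehinds check for two literal backslashes, which does not affect this input. Group 5 is the final group that comes out empty in the lazy version.

```python
import re

text = "='abc'xyz"

# The question's pattern: the final (.*?) is lazy, so it settles on "".
lazy = re.search(r"(=)(?<!\\\\)(')(.*?)(?<!\\\\)(')(.*?)", text)

# Fix 1: drop the ? so the last group is greedy.
greedy = re.search(r"(=)(?<!\\\\)(')(.*?)(?<!\\\\)(')(.*)", text)

# Fix 2: keep it lazy, but force it to reach the end of the line with $.
anchored = re.search(r"(=)(?<!\\\\)(')(.*?)(?<!\\\\)(')(.*?)$", text)

for name, m in [("lazy", lazy), ("greedy", greedy), ("anchored", anchored)]:
    print(name, m.group(0), repr(m.group(5)))
```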
I want to exclude words that contain a four-character palindrome of the form abba. For example, I want to exclude 'fitting', 'hollow', and 'trillion',
but not 'hello' or 'pattern'.
I already got the following to work:
(.)(.)\2\1
which matches 'hollow' or 'fitting', but I have trouble negating this.
The closest I have gotten is:
^.(?!(.)(.)\2\1)
which excludes 'fitting' and 'hollow', but not 'trillion'.
It's a little different from what you have. Your current regex only checks for the palindrome starting at the second character. Since you want to check the whole string, you need to change it slightly to:
^(?!.*(.)(.)\2\1)
The ^ anchor ensures that the check is made only at the beginning (otherwise, the regex can claim a match at a later position, such as the end of the string).
Then the .* inside the negative lookahead lets the check run at any position in the string. If the lookahead finds a palindrome anywhere, the entire match fails.
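To compare the two patterns, the Python sketch below filters the question's example words. It keeps a word when the pattern matches, meaning no abba palindrome was found where the pattern looked.

```python
import re

words = ["fitting", "hollow", "trillion", "hello", "pattern"]

# The question's attempt: the lookahead runs only at index 1 (after one character).
attempt = re.compile(r"^.(?!(.)(.)\2\1)")
# The answer's fix: .* inside the lookahead lets the check start at any index.
fixed = re.compile(r"^(?!.*(.)(.)\2\1)")

kept_by_attempt = [w for w in words if attempt.search(w)]
kept_by_fix = [w for w in words if fixed.search(w)]
print(kept_by_attempt)  # ['trillion', 'hello', 'pattern']
print(kept_by_fix)      # ['hello', 'pattern']
```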
It doesn't exclude trillion because ^. means exactly one character must come before the palindrome check, counted from the beginning. In your first two cases, that character is h or f. If you change it to ^..(?!(.)(.)\2\1), it will work for trillion, but it then misses fitting and hollow, because the check now starts at the third character.
So in general the regex will be:
(?!.*(.)(.)\2\1)
Here .* matches any number of characters (other than \n).
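The Python sketch below checks the points made so far against the same example words. A fixed-width prefix such as ^.. only moves the single check. The general lookahead still needs to be anchored at the start. Here re.match does that anchoring, just as ^ would.

```python
import re

words = ["fitting", "hollow", "trillion", "hello", "pattern"]

# ^.. moves the single check to index 2: it now catches "illi" in trillion,
# but misses "itti" and "ollo", which start at index 1.
two_char_prefix = [w for w in words if re.search(r"^..(?!(.)(.)\2\1)", w)]

# re.match only tries index 0, which is what ^ does in ^(?!.*(.)(.)\2\1).
anchored = [w for w in words if re.match(r"(?!.*(.)(.)\2\1)", w)]

# With no anchor, search() retries later indexes until the lookahead passes
# (once it is past the palindrome's start), so every word "matches".
unanchored = [w for w in words if re.search(r"(?!.*(.)(.)\2\1)", w)]

print(two_char_prefix)  # ['fitting', 'hollow', 'hello', 'pattern']
print(anchored)         # ['hello', 'pattern']
print(unanchored)       # all five words
```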
I'm having trouble doing simple things with regex in .NET.
Suppose I want to find all lines that contain the word "pizza". I would expect to do the following:
^ .* pizza .* $
The idea is that ^ indicates the start of a line, the dollar sign indicates the end of the line, and the dot-star matches any number of characters.
This doesn't seem to work.
Then I tried something else that doesn't work either. I wanted to find all routines in my Visual Basic project that start with "Sub Page_Load" and end with "End Sub". I searched for:
Sub Page_Load .* End Sub
But this found pretty much EVERY subroutine in the project.
In other words, it didn't limit itself to the Page_Load sub.
So I thought I'd be smart and notice that every End Sub is at the end of a line, so all I have to do is put a $ after it like this:
Sub Page_Load .* End Sub$
But that finds absolutely zero strings.
So what am I doing wrong? (One note: I put extra blanks around .* here so you can see it, but normally the blanks would not be there.)
You may need a non-greedy approach. Try this:
^.*?pizza.*$
So, here is a completely new answer.
Search for the word "pizza" (not "pizzas"):
If you have a multiline string and want to find a single row, you need to use the Multiline option (RegexOptions.Multiline). It changes the behaviour of the anchors ^ and $ so that they match at the start and end of each row.
To match only the complete word "pizza" and not a partial match, use word boundaries (\b).
If you don't use the Singleline option, you don't need to worry about greediness, because . does not match a newline, so .* cannot run past the end of the row.
So your regex would be:
Regex optionRegex = new Regex(@"^.*\bpizza\b.*$", RegexOptions.Multiline);
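This answer uses .NET, so the following is only a rough Python equivalent, not the answer's own code. Python's re.MULTILINE plays the role of RegexOptions.Multiline. The sample text is made up to show the word boundary excluding "pizzas" and the anchors failing when the flag is left off.

```python
import re

# Hypothetical sample text with one row per line.
text = "I like pizza\nno pizzas here\npizza, please\nnothing"

# re.MULTILINE makes ^ and $ match at the start and end of every row.
per_line = re.findall(r"^.*\bpizza\b.*$", text, re.MULTILINE)

# Without the flag, ^ and $ refer to the whole string, and . cannot cross "\n".
whole_input = re.findall(r"^.*\bpizza\b.*$", text)

print(per_line)     # ['I like pizza', 'pizza, please']
print(whole_input)  # []
```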
For the Sub Page_Load.*End Sub thing, you need to match more than one line:
Use the Singleline option so that . also matches newline characters.
Make the quantifier lazy (.*?) so that the match stops at the first End Sub instead of the last one.
So your regex would be:
Regex optionRegex = new Regex(@"Sub Page_Load.*?End Sub", RegexOptions.Singleline);
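For comparison, here is a Python sketch, not the answer's .NET code. Python's re.DOTALL stands in for RegexOptions.Singleline. The two-subroutine source text is a hypothetical stand-in for a VB file. The sketch shows why the greedy version runs on to later subroutines and why a pattern without the flag finds nothing.

```python
import re

# Hypothetical VB source with two subroutines.
source = """Sub Page_Load(sender, e)
    x = 1
End Sub

Sub Other()
    y = 2
End Sub
"""

# DOTALL lets . cross newlines; the lazy .*? stops at the first End Sub.
lazy = re.findall(r"Sub Page_Load.*?End Sub", source, re.DOTALL)

# Greedy .* runs on to the last End Sub in the file.
greedy = re.findall(r"Sub Page_Load.*End Sub", source, re.DOTALL)

# Without DOTALL, . stops at the first newline, so nothing matches.
no_dotall = re.findall(r"Sub Page_Load.*?End Sub", source)

print(lazy)
print(greedy)
print(no_dotall)  # []
```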
I want to match a combination of expressions that is optional. In this specific example, I want to match the word through. Also, if the word jump or swim precedes through (separated by whitespace), I want to match the whole phrase. So the combination of expressions preceding through must be optional.
I want all the following lines to be positive matches:
swim through <-- match entire phrase
jump through <-- match entire phrase
hike through <-- match only the word "through"
To do this, I can use the following expression:
(jump\W|swim\W)?through
However, is it possible to accomplish the same thing without having to add \W after jump and swim? I was trying something like this:
(jump|swim)?\W?through
But that wasn't working properly because it would include the space that precedes through on the 3rd example. I only want the word through, not the whitespace around it.
What about this one: (?:(jump|swim)\W)?through
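The Python sketch below runs the answer's pattern and the question's second attempt on the three example phrases. Grouping the word with its \W makes the separator optional only together with the word. In the attempt, \W? can match the space on its own.

```python
import re

phrases = ["swim through", "jump through", "hike through"]

# The answer's pattern: (jump|swim) and its trailing \W are optional as one unit.
answer = re.compile(r"(?:(jump|swim)\W)?through")
# The question's attempt: \W? is optional independently of the word.
attempt = re.compile(r"(jump|swim)?\W?through")

answer_hits = [answer.search(p).group(0) for p in phrases]
attempt_hits = [attempt.search(p).group(0) for p in phrases]
print(answer_hits)   # ['swim through', 'jump through', 'through']
print(attempt_hits)  # ['swim through', 'jump through', ' through']
```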
I was wondering how to match a line that contains neither of two words.
For example, I would like to match lines that contain neither Chapter nor Part. So neither of these two lines should match:
("Chapter 2 The Economic Problem 31" "#74")
("Part 2 How Markets Work 51" "#94")
while this one should match:
("Scatter Diagrams 21" "#64")
My Python-style regex would be something like (?<!(Chapter|Part)).*?\n. I know it is not right and would appreciate your help.
Try this:
^(?!.*(Chapter|Part)).*
@MRAB's solution will work, but here's another option:
(?m)^(?:(?!\b(?:Chapter|Part)\b).)*$
The . matches one character at a time, after the lookahead checks that it's not the first character of Chapter or Part. The word boundaries (\b) make sure it doesn't incorrectly match part of a longer word, like Partition.
The ^ and $ are start- and end anchors; they ensure that you match a whole line. $ is better than \n because it also matches the end of the last line, which won't necessarily have a linefeed at the end. The (?m) at the beginning modifies the meaning of the anchors; without that, they only match at the beginning and end of the whole input, not of individual lines.
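The Python sketch below applies both answers to the question's lines. The MULTILINE flag is added to the first answer's pattern so that it can work line by line. The fourth line, about "Partition", is hypothetical and is included to show the word-boundary difference described above. finditer with group(0) is used because the first pattern has a capturing group, and findall would return that group instead of the whole line.

```python
import re

lines = [
    '("Chapter 2 The Economic Problem 31" "#74")',
    '("Part 2 How Markets Work 51" "#94")',
    '("Scatter Diagrams 21" "#64")',
    '("Partition Basics 10" "#50")',  # hypothetical extra line
]
text = "\n".join(lines)  # no trailing newline, so no empty final "line"

# First answer: reject any line where Chapter or Part appears anywhere.
first = [m.group(0) for m in re.finditer(r"^(?!.*(Chapter|Part)).*", text, re.MULTILINE)]

# Second answer: consume one character at a time, refusing to start a whole
# word Chapter or Part; \b lets "Partition" through.
second = [m.group(0) for m in re.finditer(r"(?m)^(?:(?!\b(?:Chapter|Part)\b).)*$", text)]

print(first)   # ['("Scatter Diagrams 21" "#64")']
print(second)  # ['("Scatter Diagrams 21" "#64")', '("Partition Basics 10" "#50")']
```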