I am trying to match the start of the domain with a regex, where the position depends on whether the string includes the # symbol or not.
So for:
test#hotmail.com I want to match the position after the #
hotmail.com I want to match at the start of the line
What I have is:
(?(?=[#])(?<=#)|^)
I understand that if else conditions work like this
(?(?=regex)then|else)
Do a positive lookahead for the # symbol: (?=[#])
If it matches, do a positive lookbehind for the # symbol: (?<=#)
else/or |
Match start of line ^
However for test#hotmail.com and hotmail.com it always matches at the start of the line.
I would use the following :
^(?!.*#)|(?<=#)
It matches the start of the string only if there's no # in the rest of the string, and the 0-width space after the # otherwise.
Explanation :
the whole pattern is an alternation between "start of the string if it isn't followed by #" and "the space after an #"
^(?!.*#) matches the start of a string that isn't followed by an # anywhere in the string by using a negative lookahead.
(?<=#) matches the space after a # using a positive lookbehind
You can try it here.
The problem with your if/else is that the predicate doesn't check the whole string but only the next character, so when the first character isn't an # it will match the start of the string. Moreover, your "predicate" and your "then" don't go together : the predicate tests that the following character is an #, and the then tests that the previous character is. I don't think that if/else hack works well with lookarounds.
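A minimal sketch of the alternation above, written in Python purely as an assumption (the question does not name a regex flavor). The helper name domain_start is hypothetical; it reports the index at which the zero-width match occurs:

```python
import re

# Start of the string when no '#' follows anywhere,
# otherwise the zero-width position right after a '#'.
DOMAIN_START = re.compile(r"^(?!.*#)|(?<=#)")

def domain_start(text):
    """Return the index where the domain starts, or None if nothing matches."""
    match = DOMAIN_START.search(text)
    return match.start() if match else None

print(domain_start("test#hotmail.com"))  # 5
print(domain_start("hotmail.com"))       # 0
```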
This is a good question, and +1 to both the OP and the accepted answer. I see #Aaron's answer has already been accepted, but here is how to do it with a conditional (if/else), as you asked in the question.
It may not be as clean as the accepted answer; I am just presenting a different solution.
/(?(?!.*#)(.*)|(?:#(.*)))/gm
Here is the solution on regex101
Explanation:
(? -- conditional
(?!.*#)(.*) -- if the string contains no #, capture the whole string (the hotmail.com case)
| -- OR, the else branch, taken when the string contains a #
(?:#(.*)) -- match the # and capture what follows it, so "test#" is excluded and hotmail.com is captured
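For readers whose engine lacks lookaround conditionals: Python's re module only accepts a group number or name as the conditional test, so the sketch below swaps in a plain alternation that aims for the same two captures. This is a substitute technique, not the pattern above, and the helper name domain is hypothetical:

```python
import re

# Group 1: the whole string when it contains no '#'.
# Group 2: everything after a '#' otherwise.
PATTERN = re.compile(r"^(?!.*#)(.*)|#(.*)")

def domain(text):
    """Return the captured domain, or None if nothing matches."""
    match = PATTERN.search(text)
    if match is None:
        return None
    # Only one of the two groups takes part in any given match.
    return match.group(1) if match.group(1) is not None else match.group(2)

print(domain("test#hotmail.com"))  # hotmail.com
print(domain("hotmail.com"))       # hotmail.com
```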
I'd like to find the word RADU3_ or RADU3- in a line that begins with xlink:href= and ends with .svg
How can I do this?
I've tried the following, but it does not give the result I'm expecting.
(?=\wxlink:href=|\wsvg\b)|\bRADU3_|\bRADU3-
Only the last line in the example is a good result (RADU3_)
ProductionGraphics\GP1**RADU3-**11_HeatingFurnaceF1.svg
PB:ExpressionText id="RADU3_FUEL GAS _SUM_EX" PBD:LinkUses
xlink:href="C:\ProcBookImport\MaintenanceGraphics\RADU3_AI.svg"
Example...
Not sure exactly how you want to use it, but the pattern below finds the string. I put the RADU3 part in its own group, which matches RADU3 followed by - or _ ([_-])
(xlink:href=.*)(RADU3[_-]*)(.*\.svg)
Edit: handle multiple occurrences
If a string might contain the pattern several times, add ? after the quantifiers to make them lazy, so that each match stops at the nearest .svg instead of the last one
(RADU3[_-]*?)(.*?\.svg?)
The above could be used in a replace expression like
\1someotherword\3
Where \2 is the second group that is replaced
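A rough sketch of that replacement using the first three-group pattern, written in Python as an assumption (the thread does not name a language). The input is the xlink:href line from the question, and \g<1> and \g<3> are Python's spelling of \1 and \3:

```python
import re

# Three groups: everything before RADU3, the RADU3 part, the rest up to .svg
PATTERN = re.compile(r"(xlink:href=.*)(RADU3[_-]*)(.*\.svg)")

line = r'xlink:href="C:\ProcBookImport\MaintenanceGraphics\RADU3_AI.svg"'

# Keep groups 1 and 3 and put new text where group 2 was.
result = PATTERN.sub(r"\g<1>someotherword\g<3>", line)
print(result)
# xlink:href="C:\ProcBookImport\MaintenanceGraphics\someotherwordAI.svg"
```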
If you want to make sure that the string starts with xlink:href= and ends with \.svg you could use anchors to assert the start ^ and the end $ of the string.
Use one capturing group to make sure xlink:href= comes before RADU3 followed by an underscore or a hyphen. Then you can match it and, in the replacement, use that capturing group followed by your replacement.
You could use a positive lookahead to assert that the string ends with \.svg
That will match:
^(xlink:href=.*)\bRADU3[_-](?=.*\.svg$)
^ Assert the start of the string
(xlink:href=.*) Capturing group, match up until the last occurrence of ..
\bRADU3[_-] Word boundary to prevent matching part of a larger word. Match RADU3 followed by an underscore or hyphen
(?=.*\.svg$) Positive lookahead to assert the string ends with .svg
See the regex demo
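The anchored pattern can be tried out with a short Python sketch (Python is an assumption here; replace_prefix and the replacement text UNIT7_ are hypothetical). Note that the sample line in the question ends with a closing quote after .svg, which the $ in the lookahead rejects, so the sketch also uses an unquoted variant of that line:

```python
import re

PATTERN = re.compile(r"^(xlink:href=.*)\bRADU3[_-](?=.*\.svg$)")

def replace_prefix(line, replacement):
    """Swap the RADU3_/RADU3- prefix while keeping the text captured before it."""
    return PATTERN.sub(lambda m: m.group(1) + replacement, line)

unquoted = r"xlink:href=C:\ProcBookImport\MaintenanceGraphics\RADU3_AI.svg"
quoted = r'xlink:href="C:\ProcBookImport\MaintenanceGraphics\RADU3_AI.svg"'

print(replace_prefix(unquoted, "UNIT7_"))
# xlink:href=C:\ProcBookImport\MaintenanceGraphics\UNIT7_AI.svg
print(replace_prefix(quoted, "UNIT7_") == quoted)  # True: trailing quote, no match
```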
It sounds like you only want the word (substring) if it is in a specific context?
In your case, you can restart the match midway: this lets you impose starting and ending conditions (multiple conditions) on a string while using them only as "if-statements", not as part of the result.
The following uses this method, and utilizes restarts (\K) in order to only extract the substring you are looking for.
# The string has to start with "xlink:href="
xlink:href=
# Fetch everything up to our match, and then restart the regex
.*\K
# The strings we are looking for
(RADU3[-_])
# String has to end with ".svg"
(?=(.*\.svg))
If you want the entire string matching our rules you are looking for something like this:
#The string has to start with "xlink:href"
^(xlink:href=).*
# The strings we are looking for
(RADU3[-_])
# String has to end with ".svg"
(\w+\.svg)
#Get everything after .svg too
.*
If you only want the ending " after the .svg, you'd want to modify the last part where I just take everything after .svg
You can play around with what I have come up with at regex101 (no affiliation, just love their site): https://regex101.com/r/g0v07V/3/
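In engines without \K the same effect can be approximated with a capturing group. The sketch below assumes Python, whose built-in re module does not support \K, so it reads group 1 instead of restarting the match; find_tag is a hypothetical helper name and the inputs are lines from the question:

```python
import re

# Same conditions as the \K version, but the wanted part is captured in group 1.
PATTERN = re.compile(r"xlink:href=.*(RADU3[-_])(?=.*\.svg)")

def find_tag(line):
    """Return the RADU3_/RADU3- substring if the line satisfies both conditions."""
    match = PATTERN.search(line)
    return match.group(1) if match else None

print(find_tag(r'xlink:href="C:\ProcBookImport\MaintenanceGraphics\RADU3_AI.svg"'))  # RADU3_
print(find_tag('PB:ExpressionText id="RADU3_FUEL GAS _SUM_EX" PBD:LinkUses'))        # None
```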
I am trying to write a regular expression that would check if a pattern exists and, if it does, matches everything following it, and if (and only if) it does not, matches everything after another pattern.
example lines:
http://example.com/contact
www.example.com/contact
http://www.example.com/contact
expected output in all 3 cases: example
Here is the regular expression I expected would do the job:
(?(?<=www\.).+|(?<=http:\/\/).+)(?=\.com)
which I assumed would:
check if "www." is to be found
if yes, would match everything following it
if not, match everything following "http://"
restrict match to everything before the occurrence of ".com "
For the first two lines, the expression worked well, but in the third line www.example is matched instead of just example. Does this mean that for some reason the else command is executed although the if condition is met?
How can I change the above expression so that it only does the http:// lookbehind if the www. part was not found?
Converting my comment to answer.
You may use this regex:
^(?:https?://(?:www\.)?|www\.)\K\S+?(?=\.com(?:/|$))
RegEx Description:
^: Start
(?:https?://(?:www\.)?|www\.): Match http:// or https://, optionally followed by www., or a bare www.
\K: Reset matched information
\S+?: Match 1+ non-space characters (lazy)
(?=\.com(?:/|$)): Lookahead asserting that .com comes next, followed by either / or the end of the line
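A quick check of this idea against the three example lines, sketched in Python as an assumption (the question does not name a language). Python's re module has no \K, so the sketch captures the wanted part in group 1 instead; site_name is a hypothetical helper name:

```python
import re

# Optional scheme and/or www. prefix, then the lazily matched name before .com
PATTERN = re.compile(r"^(?:https?://(?:www\.)?|www\.)(\S+?)(?=\.com(?:/|$))")

def site_name(url):
    """Return the part between the prefix and .com, or None if there is no match."""
    match = PATTERN.search(url)
    return match.group(1) if match else None

for url in ("http://example.com/contact",
            "www.example.com/contact",
            "http://www.example.com/contact"):
    print(site_name(url))  # example (for each line)
```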
Given the following 3 example paths representing server paths, I am trying to create a skiplist for my FTP client via PCRE regular expressions, but I can't seem to get the desired result.
/subdir-level-1/subdir-level-2/.../Author1_-_Title1-(1234)-Publisher1
/subdir-level-1/subdir-level-2/.../Author2_-_Title2_(5678)-PUBLiSHER2
/subdir-level-1/subdir-level-2/.../Author3_-_Title3-4951-publisher3
I want to skip all folders (not paths) that do not end with
-Publisher1
I am trying to create a working pattern with the help of this online help and this regex tester, but I haven't got any further than this negative lookahead pattern
.*-(?!Publisher1)
But with this pattern all lines match, because in each of them the substring up to the lookahead does not contain the pattern.
/subdir/subdir/.../Author1_-_Title1-(1234) -Publisher1
/subdir/subdir/.../Author2_-_Title2_(5678) -PUBLiSHER2
/subdir/subdir/.../Author3_-_Title3-4951 -publisher3
What is my mistake, and what would the correct pattern be to match only the second and third lines (as lines to be skipped) while keeping the first line?
EDIT to make it clearer what to highlight and what not.
Everything from the beginning of the path to the last slash must be ignored (allowed).
Everything after the last slash that matches the defined regex must be skipped.
EDIT to present an advanced pattern matching only the red part
[^/]*(?<!-Publisher2)$
The regex which you have used is:
.*-(?!Publisher1)
Here is what is wrong with it.
This regex matches any line that contains a - not followed by Publisher1. Notice the other hyphens in your text, between the author and the title or after the title: each of them is a - not followed by Publisher1, so every string satisfies the condition. If instead the hyphen is part of the negative lookahead together with Publisher1, the match has a chance of working.
So you might move the hyphen inside the parentheses and write your regex like this:
^.*(?!-Publisher1)
But this will not work either, because .* is greedy and consumes everything up to the end of the string, so when the lookahead runs there is no character left for it to look at and it trivially succeeds. So we use a negative lookbehind instead:
.*(?<!-Publisher1)
What now? This still does not work. Why?
Because a negative lookbehind looks back and checks that the current position is not preceded by -Publisher1.
This is subtle, so bear with me:
Suppose your string is
/subdir/subdir/.../Author1_-_Title1-(1234)-Publisher1
We do a negative lookbehind for -Publisher1. From the position after the final 1, i.e. at the end of the string, -Publisher1 is visible when we look back, so the negative lookbehind fails there. The engine then backtracks: .* gives up one character, moving one position to the left, where looking back shows only "-Publisher" and no longer "-Publisher1". The condition is now satisfied, so the regex still matches, just without the last character of the string.
So it is essential to anchor the lookbehind to the end of the string with $, so that the engine cannot move one character to the left to find a match.
final regex:
.*(?<!-Publisher1)$
demo here : http://regex101.com/r/lE1vW2
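The effect of the final $ can be checked with a small Python sketch (Python is an assumption; the question targets PCRE in an FTP client). It runs the pattern with and without the anchor over the three paths from the question:

```python
import re

paths = [
    "/subdir-level-1/subdir-level-2/.../Author1_-_Title1-(1234)-Publisher1",
    "/subdir-level-1/subdir-level-2/.../Author2_-_Title2_(5678)-PUBLiSHER2",
    "/subdir-level-1/subdir-level-2/.../Author3_-_Title3-4951-publisher3",
]

UNANCHORED = re.compile(r".*(?<!-Publisher1)")
ANCHORED = re.compile(r".*(?<!-Publisher1)$")

# Without $ the engine backtracks one character and every path still matches.
unanchored_hits = [p for p in paths if UNANCHORED.search(p)]
# With $ only the paths that do not end in -Publisher1 match (these are skipped).
skipped = [p for p in paths if ANCHORED.search(p)]

print(len(unanchored_hits))  # 3
print(skipped == paths[1:])  # True
```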
This should suit your needs:
^.*(?<!-Publisher1)$
I want to skip all folders that do not end with -Publisher1
You can use this negative lookahead based regex:
^(?!.*?-Publisher1$).+$
You could use the following regex in order to exclude lines containing Publisher1:
^((?!Publisher1).)*$
Online demo: http://regex101.com/r/gD8jK0
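The last two patterns are not interchangeable, which a small Python sketch can illustrate (Python is an assumption, and the third path is a hypothetical example that has Publisher1 in a parent folder name). The first pattern only looks at how the line ends, while the second rejects any line containing Publisher1 anywhere:

```python
import re

ENDS_CHECK = re.compile(r"^(?!.*?-Publisher1$).+$")    # skip unless the line ends with -Publisher1
CONTAINS_CHECK = re.compile(r"^((?!Publisher1).)*$")   # skip unless Publisher1 occurs anywhere

paths = [
    "/subdir-level-1/subdir-level-2/.../Author1_-_Title1-(1234)-Publisher1",
    "/subdir-level-1/subdir-level-2/.../Author2_-_Title2_(5678)-PUBLiSHER2",
    "/Publisher1-archive/Author4_-_Title4-(0000)-publisher4",  # hypothetical path
]

ends_result = [bool(ENDS_CHECK.search(p)) for p in paths]
contains_result = [bool(CONTAINS_CHECK.search(p)) for p in paths]

print(ends_result)      # [False, True, True]
print(contains_result)  # [False, True, False]
```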
How can I match a path only if there is no "?" plus zero or more characters at the end?
I have the following path:
/something/contentimg/coast03.jpg?itok=ABC
I want the filename, but only if there is no "?something" after the file extension.
I tried:
/^.*\/(.*)(?!\?.*)$/
But it matches anyway. This is the result. What am I doing wrong?
Array
(
[0] => /something/contentimg/coast03.jpg?itok=ABC
[1] => coast03.jpg?itok=ABC
)
Using php.
Use parse_url:
print_r(parse_url('/something/contentimg/coast03.jpg?itok=ABC'))
(
[path] => /something/contentimg/coast03.jpg
[query] => itok=ABC
)
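The same non-regex idea can be sketched in Python, used here only as an illustration since the question is about PHP: urllib.parse.urlsplit stands in for parse_url, and posixpath.basename extracts the filename once the query string has been separated:

```python
import posixpath
from urllib.parse import urlsplit

parts = urlsplit("/something/contentimg/coast03.jpg?itok=ABC")

print(parts.path)   # /something/contentimg/coast03.jpg
print(parts.query)  # itok=ABC

# An empty query means there was no "?something" after the file extension.
has_query = parts.query != ""
filename = posixpath.basename(parts.path)
print(has_query, filename)  # True coast03.jpg
```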
The * quantifier behaves greedily and matches everything up to the end of the input, so the negative lookahead kicks in at the end of the input (and of course doesn't find what it's looking for). The regex should be done a little differently:
/^.*\/([^?]+)$/
This expression matches one or more non-question-mark characters and then asserts that it has reached the end of the input string, which is what you want to do.
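A minimal check of that expression, written in Python as an assumption (the question uses PHP's preg functions, where the same pattern goes between the / delimiters). The helper name filename_without_query is hypothetical:

```python
import re

# Everything up to the last '/', then one or more non-'?' characters to the end.
PATTERN = re.compile(r"^.*/([^?]+)$")

def filename_without_query(path):
    """Return the filename, or None when a '?' follows the last slash."""
    match = PATTERN.search(path)
    return match.group(1) if match else None

print(filename_without_query("/something/contentimg/coast03.jpg?itok=ABC"))  # None
print(filename_without_query("/something/contentimg/coast03.jpg"))           # coast03.jpg
```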
^.*\/([^?]+)(?![?].+)$
Your expression does not work, because (.*) matches everything after the last /, so there is nothing left that could be considered as negative lookahead input.
This is how it's currently matching:
.* - greedily matches up to before the last / - /something/contentimg
\/ - matches /
(.*) - matches the rest of the string - coast03.jpg?itok=ABC
(?!\?.*) - checks that the following characters don't match \?.*; since we are already at the end of the string, nothing follows and the negative lookahead trivially succeeds.
What you should do:
It seems like you can just check if a ? exists in the string, so try:
/^(?!.*\?)/
Or match up to the last /, then check for a ? from there:
/^(?!.*\/.*\?)/
Explanation:
You already know (?!...) is negative look-ahead, you're just not entirely sure how to use it. Wherever you put it, it tries its best to match the given pattern from that position onwards. If it succeeds, the regex doesn't match. So it might be a good idea to put this at the very beginning and try to match the rest of the string.
So the basic format for this example is:
/^(?!...).*$/
where (?!...) contains a pattern for the strings you want to exclude.
The .*$ at the end shouldn't be required, and if you want to check the entire string, remember the $ at the end of the look-ahead.
/^(?!...$)/
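Putting the look-ahead at the very beginning can be sketched as follows, in Python as an assumption (the question uses PHP). Combining ^(?!.*\?) with a capture for the filename is an illustrative choice, not a pattern given above, and filename_if_no_query is a hypothetical name:

```python
import re

# Reject the whole string up front if a '?' occurs anywhere,
# then capture whatever follows the last '/'.
PATTERN = re.compile(r"^(?!.*\?).*/(.*)$")

def filename_if_no_query(path):
    """Return the filename, or None when the path contains a '?'."""
    match = PATTERN.search(path)
    return match.group(1) if match else None

print(filename_if_no_query("/something/contentimg/coast03.jpg?itok=ABC"))  # None
print(filename_if_no_query("/something/contentimg/coast03.jpg"))           # coast03.jpg
```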
I am working in VB.Net and trying to use Regex.Replace to format a string I am using to query SQL. What I'm going for is to cut out comments ("--"). I've found that in most cases the code below does what I need.
string = Regex.Replace(command, "--.*\n", "")
and
string = Regex.Replace(command, "--.*$", "")
However, I have run into a problem. If I have a string inside my query that contains the double dash, it doesn't work: the replace just cuts out the rest of the line starting at the double dash. It makes sense to me why, but I can't figure out the regular expression I need to match on.
Logically, I need to match a string that starts with "--", is not preceded by "'", and is not followed by "'", with any number of characters in between. But I'm not sure how to express that in a regular expression. I have tried variations of:
string = Regex.Replace(cmd, "[^('.*)]--.*\n[^(.*')]", "")
Which I know is obviously wrong. I have looked at a couple of online resources including http://www.codeproject.com/KB/dotnet/regextutorial.aspx
but due to my lack of understanding I can't seem to figure this one out.
I think you meant "match on a string that starts with -- and is not preceded by ' and not followed by ' with any number of characters in between"
If so, then this is what you are looking for:
string = Regex.Replace(cmd, "(?<!'.*?--)--(?!.*?').*(?=\r\n)", "")
'EDIT: modified a little
Of course, it means you can't have apostrophes in your comments... and would be exceedingly easy to hack if someone wanted to (you aren't thinking of using this to protect against injection attacks, are you? ARE YOU!??! :D )
I can break down the expression if you'd like, but it's essentially the same as my modified quote above!
EDIT:
I modified the expression a little, so it does not consume any carriage return, only the comment itself... the expression says:
(?<! # negative lookbehind assertion*
' # match a literal single quote
.*? # followed by anything (reluctantly*)
-- # two literal dashes
) # end assertion
-- # match two literal dashes
(?! # negative lookahead assertion
.*? # match anything (reluctant)
' # followed by a literal single quote
) # end assertion
.* # match anything
(?= # positive lookahead assertion
\r\n # match carriage-return, line-feed
) # end assertion
negative lookbehind assertion means at this point in the match, look backward here and assert that this cannot be matched
negative lookahead assertion means look forward from this point and assert this cannot be matched
positive lookahead asserts the following expression CAN be matched
reluctant means only consume a match for the previous atom (the . which means everything in this case) if you cannot match the expression that follows. Thus the .*? in .*?-- (when applied against the string abc--) will consume a, then check to see if the -- can be matched and fail; it will then consume ab, but stop again to see if the -- can be matched and fail; once it consumes abc and the -- can be matched (success), it will finally consume the entire abc--
non-reluctant or "greedy" which would be .* without the ? will match abc-- with the .*, then try to match the end of the string with -- and fail; it will then backtrack until it can match the --
one additional note is that the . "anything" does not by default include newlines (carriage-return/line-feed), which is needed for this to work properly (there is a switch that will allow . to match newlines and it will break this expression)
A good resource - where I've learned 90% of what I know about regex - is Regular-Expressions.info
Tread carefully and good luck!
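The expression above relies on .NET's variable-length lookbehind, which many engines reject. As a rough alternative sketch, assuming Python (whose re module only allows fixed-width lookbehind), one can match either a quoted literal or a comment and put the literal back unchanged. This is a different technique from the answer's, it treats '' as an escaped quote inside a literal, and strip_comments is a hypothetical name:

```python
import re

# Alternative 1 (group 1): a single-quoted SQL literal, with '' as an escaped quote.
# Alternative 2: a -- comment running to the end of the line (line break not consumed).
TOKEN = re.compile(r"('(?:[^']|'')*')|--[^\r\n]*")

def strip_comments(sql):
    """Remove -- comments while leaving quoted literals untouched."""
    return TOKEN.sub(lambda m: m.group(1) or "", sql)

query = "SELECT '--keep' AS s -- drop me\nFROM t"
print(repr(strip_comments(query)))  # "SELECT '--keep' AS s \nFROM t"
```

Like the original, this is a formatting aid only and not a defence against injection.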
OK what you are doing here is not right :
/[^('.*)]--.*\n[^(.*')]/
You are saying the following :
Match one character that is not (, ), ', . or *, then match --, then match anything up to a newline, and finally match one more character that is not in that same set.
What you probably meant to do is this :
/(?<!['"])\s*--.*[\r\n]*/
Which says: make sure the position is not preceded by a ' or ", match any whitespace, match --, then match everything else up to the end of the line, plus any carriage-return or line-feed characters that follow.
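To see how this last pattern behaves, here is a small Python sketch (Python is an assumption; the question uses VB.Net, and the sample queries are made up for illustration). It suggests two things worth knowing before adopting the pattern: the trailing [\r\n]* also removes the line break, and the lookbehind only protects a -- that directly follows a quote character:

```python
import re

COMMENT = re.compile(r"""(?<!['"])\s*--.*[\r\n]*""")

# An ordinary trailing comment: removed together with the line break.
plain = COMMENT.sub("", "SELECT 1 -- note\nFROM t")
# A literal that starts with --: left alone because a quote precedes the dashes.
quoted = COMMENT.sub("", "SELECT '--x' AS s")
# A literal with -- further inside: still cut, since no quote directly precedes it.
inner = COMMENT.sub("", "SELECT 'a--x' AS s")

print(repr(plain))   # 'SELECT 1FROM t'
print(repr(quoted))  # "SELECT '--x' AS s"
print(repr(inner))   # "SELECT 'a"
```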