Regex if then else confusion - regex

I have a problem with the Regex-If-then-else logic:
I am trying to achieve the following:
If the string contains the substring PubDSK then do the Regex Expression
^[\s\S]{24}(?=.{10}([\s\S]*))0*(.*?)(?=\1)[\s\S]*
If it does NOT contain the substring PubDSK then do a different Regex Expression, namely ^[\s\S]{48}(?=.{10}([\s\S]*))0*(.*?)(?=\1)[\s\S]*
I am using this Regex Expression (?(?=^.*PubDSK.*$)^[\s\S]{24}(?=.{10}([\s\S]*))0*(.*?)(?=\1)[\s\S]*|^[\s\S]{48}(?=.{10}([\s\S]*))0*(.*?)(?=\1)[\s\S]*)
The affirmative case works great: https://regex101.com/r/ab9yOv/
But the negative case does not work: https://regex101.com/r/azxGvh/1
I assume that it does not match, so it cannot do the replacement. How can I tell the regex to do the replacement on the complete string in the ELSE case?
I understand that this problem can easily be solved in any programming language, but for this use case I can only use pure regex.

The second \1 backreference refers to the first capturing group of the entire regex, because groups are numbered from left to right across both branches. So it does not refer to the capturing group defined in the else branch. The second \1 must be replaced with \3, since the group it is meant to refer to is the third capturing group.
Also, note that the (?=\1) and (?=\3) lookaheads make little sense here, as they are followed by the consuming pattern [\s\S]*. Just remove the lookaheads and use consuming backreferences instead.
The fixed pattern looks like
(?(?=^.*PubDSK.*$)^[\s\S]{24}(?=.{10}([\s\S]*))0*(.*?)\1[\s\S]*|^[\s\S]{48}(?=.{10}([\s\S]*))0*(.*?)\3[\s\S]*)
See the regex demo.
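The numbering rule can be checked in isolation. The following is a minimal sketch in Python, whose re module does not support lookahead conditionals of the (?(?=...)...|...) form, so it uses a plain alternation instead. The two branch patterns are hypothetical and much simpler than the ones in the question; they only mirror the layout of two groups per branch.

```python
import re

# Hypothetical, simplified branches: capture a letter and a dash, then
# require the same letter again. Groups are numbered 1-2 in the first
# branch and 3-4 in the second.
wrong = re.compile(r"(x)(-)\1|(y)(-)\1")  # second branch reuses \1 by mistake
fixed = re.compile(r"(x)(-)\1|(y)(-)\3")  # second branch refers to its own group

def branch_groups(pattern, text):
    """Return the tuple of captured groups, or None if there is no match."""
    m = pattern.fullmatch(text)
    return m.groups() if m else None

# With "y-y" only the second branch can apply. In `wrong`, group 1 never
# took part in the match, so \1 cannot succeed; in `fixed`, \3 can.
print(branch_groups(wrong, "y-y"))
print(branch_groups(fixed, "y-y"))
```

Engines differ in how they treat a backreference to a group that did not participate, so the failure mode shown here is specific to Python.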

Related

Skip quotes in regex

I have the following regex
val=\"(?<val>.*?)\"
it works ok for val="value"
Now I need regex that will match val="value" and val=value
Could you please help? I don't understand how to build such regex. I have tried the following but no success
val=[^"](?<val>.*?)[^"]
update
It seems that val=(?:[^"])*(?<val>.*?)(?:[^"]|")* works, but I'm not sure that it is correct.
You can capture the optional opening quote, and require it to be present at the end of the match.
val=(\"?)(?<val>.*?)\1
The back-reference \1 recalls the text which matched the first parenthesized expression.
Obviously, if you have code which depends on the order of grouped parentheses, you need to refer to the second group to get val; but of course you are likely referring to it by name already (otherwise why use a named group?)
The expression [^"] matches a character which isn't a quote, so it's completely wrong here.
Of course, when there aren't any quotes, the expression .*? will match the empty string if there isn't a trailing context which forces it to match something longer. Perhaps you can use something like
val=(\"?)(?<val>.*?)\1(\s|$)
but this will obviously depend on what exactly you are hoping to match and in what context. If not this then maybe you can constrain the value so that you can use a greedy match instead? For instance,
val=(\"?)(?<val>[^\"]*)\1
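As a rough illustration, here is the variant with the trailing (\s|$) context tried out in Python. Python spells named groups (?P<val>...) rather than (?<val>...), and the helper name extract_val and the sample inputs are made up for this sketch.

```python
import re

# Group 1 captures the optional opening quote; \1 then requires the same
# text (a quote, or nothing) after the value, followed by whitespace or
# the end of the input.
VAL_RE = re.compile(r'val=("?)(?P<val>.*?)\1(\s|$)')

def extract_val(text):
    """Return the value after val=, or None if the pattern does not match."""
    m = VAL_RE.search(text)
    return m.group("val") if m else None

for sample in ['val="value"', 'val=value', 'val="a b" x', 'val=value other']:
    print(sample, "->", extract_val(sample))
```

Reading the value by name keeps the code independent of the extra numbered group that holds the quote.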

Regex: how do I match a character before other capture characters?

I'm trying to match against a list of strings where I want to make sure that the character before the match is not an equals sign, without capturing that character. So, for a list (excerpted from pip freeze) like:
ply==3.10
powerline-status===2.6.dev9999-git.b-e52754d5c5c6a82238b43a5687a5c4c647c9ebc1-
psutil==4.0.0
ptyprocess==0.5.1
I want the captured output to look like this:
==3.10
==4.0.0
==0.5.1
I first thought using a negative lookahead (?![^=]) would work, but with a regular expression of (?![^=])==[0-9]+.* it ends up capturing the line I don't want:
==3.10
==2.6.dev9999-git.b-e52754d5c5c6a82238b43a5687a5c4c647c9ebc1-
==4.0.0
==0.5.1
I also tried using a non-capturing group (?:[^=]) with a regex of (?:[^=])==[0-9]+.* but that ends up capturing the first character which I also don't want:
y==3.10
l==4.0.0
s==0.5.1
So the question is this: How can one match but not capture a string before the rest of the regex?
A negative lookbehind would be the way to go:
(?<!=)==[0-9.]+
Also, here is the site I like to use:
http://www.rubular.com/
Of course, it sometimes helps if you say which engine or software you are using, so we know what limitations there might be.
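A quick check of the negative lookbehind against the sample from the question, sketched in Python (which supports fixed-width lookbehind). The names FREEZE and pinned_versions are chosen only for this example.

```python
import re

# A sample of the pip freeze output from the question.
FREEZE = """\
ply==3.10
powerline-status===2.6.dev9999-git.b-e52754d5c5c6a82238b43a5687a5c4c647c9ebc1-
psutil==4.0.0
ptyprocess==0.5.1
"""

# (?<!=) asserts that the character before the match is not "=",
# without making that character part of the match.
VERSION_RE = re.compile(r"(?<!=)==[0-9.]+")

def pinned_versions(text):
    """Return every ==version match that is not preceded by another '='."""
    return VERSION_RE.findall(text)

print(pinned_versions(FREEZE))
```

The === line is skipped for two reasons: starting at its first "=", the "==" is followed by a third "=" rather than a digit, and starting at its second "=", the lookbehind sees an "=" in front.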
If you want to remove the version numbers from the text, you could capture a character that is not an equals sign, ([^=]), in the first capturing group, followed by a match of == and the version number, \d+(?:\.\d+)+. Then in the replacement you would use your capturing group.
Regex
([^=])==\d+(?:\.\d+)+
Replacement
Group 1 $1
Note
You could also use ==[0-9]+.* or ==[0-9.]+ to match the double equals signs and version numbers but that would be a very broad match. The first would also match ====1test and the latter would also match ==..
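The capture-and-put-back approach can be sketched in Python as follows. Python writes the group reference in the replacement as \1 where the answer uses $1, and strip_pin is a hypothetical helper name.

```python
import re

# Group 1 keeps the character in front of "==" so that the replacement
# can put it back; everything else in the match is dropped.
PIN_RE = re.compile(r"([^=])==\d+(?:\.\d+)+")

def strip_pin(line):
    """Remove a ==x.y[.z] version pin from the line, if there is one."""
    return PIN_RE.sub(r"\1", line)

for line in ["ply==3.10", "psutil==4.0.0", "powerline-status===2.6.dev9999"]:
    print(strip_pin(line))
```

The === line is left untouched, because "==" there is never followed directly by a digit when it is preceded by a non-equals character.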
There's another regex operator called a 'lookbehind assertion' (also called positive lookbehind) ?<= - and in my above example using it in the expression (?<=[^=])==[0-9]+.* results in the expected output:
==3.10
==4.0.0
==0.5.1
It took me a while to discover this; notably, at the time of this writing the lookbehind assertion isn't supported in the popular regex tool regexr.
If there are alternatives to using lookbehind to solve this, I'd love to hear them.

Regular expression to find specific string and add characters when they're not already there in Notepad++

Okay, I have zero knowledge of regular expressions so if someone can direct me to a better way to figure this out then by all means please do.
I figured out that a series of files are missing a particular naming convention for the database they will write to. So some might be dbname1, dbname2, dbname3, abcdbname4, abcdbname5 and they all need to have that abc at the beginning. I want to write a regular expression that will find every tag in the file whose database name does not start with abc and add abc to it. Any ideas how I can do this?
Again, forgive me if this is poorly worded/expressed. I really have absolutely zero knowledge of regular expressions. I can't find any questions that are asking this. I know that there are questions asking how to add strings to lines but not how to add only to lines that are missing the string when some already have it.
I thought I had written this in but I'm looking at lines that look like this
<Name>dbname</Name>
or
<Name>abcdbname</Name>
and I need to get them all to have that abc at the beginning
Cameron's answer will work, but so will this. It's called a negative lookbehind.
(?<!abc)(dbname\d+)
This regex looks for any occurrence of dbname followed by one or more digits and not immediately preceded by the string "abc". So it will match dbname113, but not the dbname113 inside abcdbname113. The original name is in capture group \1, so you can replace the match with abc\1 and all your files will be properly prefixed.
Not every program/language that implements regex supports lookbehinds (JavaScript, famously, lacked them before ES2018), but most do and Notepad++ certainly does. Lookarounds (lookbehinds and lookaheads) are exceedingly handy once you get the hang of them.
?<! negative lookbehind, ?<= positive lookbehind, ?! negative lookahead, and ?= positive lookahead must all be used within parentheses, as I did above. They are not capturing, so they do not create capture groups, which is why the second set of parentheses can be referenced as \1 (or $1, depending on the language).
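To see the lookbehind replacement in action outside Notepad++, here is a small Python sketch. The function name add_prefix and the sample string are assumptions made for the example; the regex and the abc\1 replacement are the ones given above.

```python
import re

def add_prefix(text):
    """Prefix every dbname<digits> that is not already preceded by abc."""
    # (?<!abc) rejects a position directly preceded by "abc". The
    # lookbehind is not a capture group, so (dbname\d+) is group 1.
    return re.sub(r"(?<!abc)(dbname\d+)", r"abc\1", text)

sample = "dbname1 dbname2 abcdbname4"
print(add_prefix(sample))
```

Running the replacement a second time changes nothing, since every name is then preceded by abc.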
Edit: Given some better example criteria, this is possibly more what you're looking for.
Find: (<Name>)(.*?(?<!abc)dbname\d+)(</Name>)
Replace: \1abc\2\3
Alternatively, something a bit easier to understand, you can do this or something like this:
Find: (<Name>)(abc)?(dbname\d+)(</Name>)
Replace: \1abc\3\4
What this does is:
Matches <Name>, captures as backreference 1.
Looks for abc and captures it as backreference 2 if it's there; otherwise 2 contains nothing. The ? after (abc) means match 0 or 1 times.
Looks for the dbname and captures it as backreference 3.
Matches </Name>, captures as backreference 4.
By replacing with \1abc\3\4, you drop abc off dbname if it exists and then replace dbname with abcdbname in all instances.
You can take this a step further and prefix the abc with ?: to create a non-capturing group, so that the backreferences used in the replacement are sequential:
Find: (<Name>)(?:abc)?(dbname\d+)(</Name>)
Replace: \1abc\2\3
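The same find/replace pair, sketched in Python for anyone who wants to test it outside the editor. NAME_RE and normalize are names invented for this example, and the replacement uses \g<1> etc., which is Python's unambiguous spelling of \1.

```python
import re

# (?:abc)? swallows an existing prefix without capturing it, so the
# three capturing groups stay numbered 1, 2, 3.
NAME_RE = re.compile(r"(<Name>)(?:abc)?(dbname\d+)(</Name>)")

def normalize(line):
    """Rewrite <Name>[abc]dbnameN</Name> so that it always carries abc."""
    return NAME_RE.sub(r"\g<1>abc\g<2>\g<3>", line)

for line in ["<Name>dbname1</Name>", "<Name>abcdbname5</Name>"]:
    print(normalize(line))
```

Note that the pattern requires at least one digit after dbname, so a line such as <Name>dbname</Name> would be left as it is; dropping the \d+ would be one way to cover that case, if it occurs in the real files.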
Replace \bdbname(\d+) with abcdbname\1.
The \b means "word boundary", so it won't match the abc versions, but will match the others. The (...) parentheses represent a capturing group, which capture everything that's matched in-between into a numbered variable that can be later referenced (there's only one here so it goes in \1). The \d+ matches one or more digit characters.

Regular Expression Words stuck together

Is there a way to write regular expressions to stop right before a particular word or characters?
For example, I have a text like:
Advisor:HarrisTeamTeamRole
So I want to write a regular expression that makes the advisor name dynamic, but only capture Harris. How do I write a regular expression to stop right before Team?
You could use a lookbehind and lookahead like this:
(?<=Advisor:).*?(?=Team)
Debuggex Demo
This will only capture from "Advisor:" up to the first "Team", and the regex will not capture anything else after (including "Team") in a capture group or otherwise. This will require a type of regex that can do lookbehinds... if you are not using that, you'll have to use grouping... which could be as simple as:
Advisor:(.*?)Team
and then just get the capture group #1
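Both variants can be compared side by side. This is a small Python sketch using the sample text from the question; the two function names are made up for the example.

```python
import re

TEXT = "Advisor:HarrisTeamTeamRole"

def advisor_lookaround(text):
    """Whole match is the name: lookbehind and lookahead consume nothing."""
    m = re.search(r"(?<=Advisor:).*?(?=Team)", text)
    return m.group(0) if m else None

def advisor_group(text):
    """Match includes the surrounding text; the name is read from group 1."""
    m = re.search(r"Advisor:(.*?)Team", text)
    return m.group(1) if m else None

print(advisor_lookaround(TEXT), advisor_group(TEXT))
```

In both cases the lazy .*? stops at the first Team, which is why the second Team and Role are not included.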
Try this regular expression:
:([A-Z][a-z]*)
This one captures only the first word after the colon as long as it's in CamelCase, meaning it doesn't have to be the word Team it could be Advisor:HarrisNetworkSomething as well.
You can also use a lazy quantifier and read the result from the captured group at index 1:
^Advisor:(.*?)Team
Here is online demo

Regex - how to match everything except a particular pattern

How do I write a regex to match any string that doesn't meet a particular pattern? I'm faced with a situation where I have to match an (A and ~B) pattern.
You could use a look-ahead assertion:
(?!999)\d{3}
This example matches three digits other than 999.
But if you happen not to have a regular expression implementation with this feature (see Comparison of Regular Expression Flavors), you probably have to build a regular expression with the basic features on your own.
A compatible regular expression with basic syntax only would be:
[0-8]\d\d|\d[0-8]\d|\d\d[0-8]
This also matches any three-digit sequence that is not 999.
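The claim that the two patterns accept the same strings can be verified exhaustively, since there are only 1000 three-digit strings. A minimal Python sketch, with names chosen for the example:

```python
import re

WITH_LOOKAHEAD = re.compile(r"(?!999)\d{3}")
BASIC_ONLY = re.compile(r"[0-8]\d\d|\d[0-8]\d|\d\d[0-8]")

def accepted(pattern):
    """Return the set of three-digit strings the pattern matches in full."""
    candidates = (f"{n:03d}" for n in range(1000))
    return {s for s in candidates if pattern.fullmatch(s)}

# Both patterns should accept exactly the same strings.
print(accepted(WITH_LOOKAHEAD) == accepted(BASIC_ONLY))
```

The basic version works because a three-digit string differs from 999 exactly when at least one of its digits is in the range 0-8, and each alternative covers one digit position.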
Suppose you want to match a word A in a string and not match a word B. For example, if you have this text:
1. I have a two pets - dog and a cat
2. I have a pet - dog
If you want to search for lines of text that HAVE a dog for a pet and DON'T have a cat, you can use this regular expression:
^(?=.*?\bdog\b)((?!cat).)*$
It will find only the second line:
2. I have a pet - dog
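Here is the same pattern applied line by line in Python, as a sketch. LINES reproduces the two sample lines from above, and dog_only is a hypothetical helper name.

```python
import re

LINES = [
    "1. I have a two pets - dog and a cat",
    "2. I have a pet - dog",
]

# (?=.*?\bdog\b) requires the word "dog" somewhere on the line.
# ((?!cat).)* then consumes the line one character at a time and refuses
# to step onto any position where "cat" starts, so $ is only reachable
# when the line contains no "cat" at all.
DOG_NOT_CAT = re.compile(r"^(?=.*?\bdog\b)((?!cat).)*$")

def dog_only(lines):
    """Return the lines that mention dog and do not contain cat."""
    return [line for line in lines if DOG_NOT_CAT.match(line)]

print(dog_only(LINES))
```

Because (?!cat) has no word boundaries, a line containing a word such as "catalog" is rejected as well; (?!\bcat\b) would be a stricter alternative if only the whole word should count.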
Match against the pattern and use the host language to invert the boolean result of the match. This will be much more legible and maintainable.
notnot, resurrecting this ancient question because it had a simple solution that wasn't mentioned. (Found your question while doing some research for a regex bounty quest.)
I'm faced with a situation where I have to match an (A and ~B) pattern.
The basic regex for this is frighteningly simple: B|(A)
You just ignore the overall matches and examine the Group 1 captures, which will contain A.
An example (with all the disclaimers about parsing HTML with regex): A is digits, B is digits within an <a> tag.
The regex: <a.*?<\/a>|(\d+)
Demo (look at Group 1 in the lower right pane)
Reference
How to match pattern except in situations s1, s2, s3
How to match a pattern unless...
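A minimal Python sketch of the B|(A) technique with the example regex above. The sample text and the function name digits_outside_links are invented for illustration, and the usual caveats about handling HTML with regex apply.

```python
import re

# The unwanted context B (a whole <a>...</a> element) is matched and
# consumed without capturing, so digits inside it are never seen by the
# capturing branch (\d+), which is the A we actually want.
B_OR_A = re.compile(r"<a.*?</a>|(\d+)")

def digits_outside_links(text):
    """Return the Group 1 captures, ignoring matches where it is unset."""
    return [m.group(1) for m in B_OR_A.finditer(text) if m.group(1) is not None]

sample = 'call 42 or <a href="x1">link 7</a> then 99'
print(digits_outside_links(sample))
```

The overall matches include the link element, which is why only Group 1 is examined.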
The complement of a regular language is also a regular language, but to construct it you have to build the DFA for the regular language, and make any valid state change into an error. See this for an example. What the page doesn't say is that it converted /(ac|bd)/ into /(a[^c]?|b[^d]?|[^ab])/. The conversion from a DFA back to a regular expression is not trivial. It is easier if you can use the regular expression unchanged and change the semantics in code, as suggested before.
pattern - re
str.split(/re/g)
will return everything except the pattern.
Test here
My answer here might solve your problem as well:
https://stackoverflow.com/a/27967674/543814
Instead of Replace, you would use Match.
Instead of group $1, you would read group $2.
Group $2 was made non-capturing there, which you would avoid.
Example:
Regex.Match("50% of 50% is 25%", @"(\d+\%)|(.+?)");
The first capturing group specifies the pattern that you wish to avoid. The last capturing group captures everything else. Simply read out that group, $2.
(B)|(A)
then use what group 2 captures...