Regex issue - fine backwards and not forwards Textmate - regex

I'm not very good with regular expressions but in Textmate, I'm trying to clear out some multi-lines in an XML file that looks like
<comments>
<sub_node>....
....
</comments>
and I'm using this in the find/replace with regex,
<comments>(?m:.*)</comments>
There're multiple occurences of the above, but if I do a find, it finds the first and then selects everything in between including outside nodes till the last in the file.
If I do a find previous (backwards) from the last line it captures a block correctly. I'm not sure what I'm doing wrong here and if anyone might even suggest a far more efficient way of doing this.
Thanks.

You need to use non-greedy qualifiers. I don't know anything about Textmate, so I don't know if it supports them. If it doesn't, you can search for <comments> followed by any number of things that isn't </comments> followed by <comments>. (This would be more specific help, but your posted example is unfamiliar and must be some Textmate weirdness.)

Sounds like perfectly normal behavior to me. You just need to use a reluctant quantifier, which means adding a ?, like so:
<comments>(?m:.*?)</comments>
The only weirdness here is the m (for "multiline") modifier, which allows the . metacharacter to match newlines. Most regex flavors call that "single-line" or "dot-matches-all" mode, and use s to specify it. Those flavors tend to support an m/"multiline" mode as well, which changes the behavior of the ^ and $ anchors. In TextMate that's the default mode, and can't be changed.

Related

EditPad: How to replace multiple search criteria with multiple values?

I did some searching and found tons of questions about multiple replacements with Regex, but I'm working in EditPadPro and so need a solution that works with the regex syntax of that environment. Hoping someone has some pointers as I haven't been able to work out the solution on my own.
Additional disclaimer: I suck with regex. I mean really... it's bad. Like I barely know wtf I'm doing.So that being said, here is what I need to do and how I'm currently approaching it...
I need to replace two possible values, with their corresponding replacements. My two searches are:
(.*)-sm
(.*)-rad
Currently I run these separately and replace each with simple strings:
sm
rad
Basically I need to lop off anything that comes prior to "sm" so I just detect everything up to and including sm, and then replace it all with that string (and likewise for "rad").
But it seems like there should be a way to do this in a single search/replace operation. I can do the search part fine with:
(.*)-sm|(.*)-rad
But then how to replace each with it's matching value? That's where I'm stuck. I tried:
sm|rad
but alas, that just becomes the literal complete string that is used for replacement.
Jonathan, first off let me congratulate you for using EPP Pro for regex in your text. It's my main text editor, and the main reason I chose it, as a regex lover, is that its support of regex syntax is vastly superior to competing editors. For instance Notepad++ is known for its shoddy support of regular expressions. The reason of course is that EPP's author Jan Goyvaerts is the author of the legendary RegexBuddy.
A picture is worth a thousand words... So here is how I would do your replacement. Just hit the "replace all button". The expression in the regex box assumes that anything before the dash that is not a whitespace character can be stripped, so if this is not what you want, we need to tune it.
Search for:
(.*)-(sm|rad)
Now, when you put something in parenthesis in Regex, those matches are stored in temporary variables. So whatever matched (.*) is stored in \1 and whatever matched (sm|rad) is stored in \2. Therefore, you want to replace with:
\2
Note that the replacement variable may be different depending on what programming language you are using. In Perl, for example, I would have to use $2 instead.

Confusion regarding the *? regular expression operator

So I want to search a string, using the below regular expression:
border-.*\.5pt
to find all border-top, border-bottom, etc CSS properties in a file with a border thickness of .5pt. It generally works great, but it's too greedy.
For example all of the below comes back as a single match:
border-top:solid #1F497D .5pt;border-bottom:solid #1F497D .5pt
I want those two CSS properties to be two separate matches.
So I tried to modify my regular expression to:
border-.*?\.5pt
Using ? to make it non-greedy. However, after that modification, nothing matches.
Can anyone explain why I see this behavior? What am I missing?
(If it's worth knowing, I'm using Microsoft Expression Web's 'find with regular expressions' when doing this search.)
There is no one "regular expression" language. While there are broad commonalities, details differ from implementation to implementation. Many regexes use - to be the non-greedy "0 or more", others use *?. Apparently Microsoft Expression Web uses #.
In short, regexes can differ, so you'll often need to RTM for the one you're using to find its range of capabilities and detailed syntax (i.e. support for alteration/backtracking/etc., grouping character, set shorthand, etc.)
.*? is the badest, so to say "antipattern" for Regular Expressions. It is commonly used as a "Match-something-until-the-string-i-want" Pattern - but it isn't.
Especially when combining multiple .*? within ONE pattern, it may lead to very wrong and unexpected results.
For your Case - as stated in the comments - It works. (Maybe you did something wrong?)
However, it is ALWAYS a good idea to be more specific, when generating a regex pattern.
ALWAYS KEEP IN MIND that .*? can be ANYTHING. Also Stuff you really don't want to match!
In your example, i would use something like this: border-(?:[^:]+):\s*(?:[^\s]+)\s+(?:\#[a-fA-F0-9]{6})\s+(?:\d*(?:\.\d+)?)pt;?
It is more specific, but matches the given Requirements, ignores all whitespaces that dont make sence, and even matches border widths, regardles if they are written as .2, 3 or 4.1. If you remove the ?: from the single match Groups you can also match every single attribute, if required. : Position, Border type, Color and thickness.
The pattern border-([^:]+):\s*([^\s]+)\s+(\#[a-fA-F0-9]{6})\s+(\d*(?:\.\d+)?)pt;? with your string border-top:solid #1F497D .5pt;border-bottom:solid #1F497D .5pt will match:
First Match:
1.top
2.solid
3.#1F497D
4..5
Second Match:
1.bottom
2.solid
3.#1F497D
4..5

Notepad++ masschange using regular expressions

I have issues to perform a mass change in a huge logfile.
Except the filesize which is causing issues to Notepad++ I have a problem to use more than 10 parameters for replacement, up to 9 its working fine.
I need to change numerical values in a file where these values are located within quotation marks and with leading and ending comma: ."123,456,789,012.999",
I used this exp to find and replace the format to:
,123456789012.999, (so that there are no quotation marks and no comma within the num.value)
The exp used to find is:
([,])(["])([0-9]+)([,])([0-9]+)([,])([0-9]+)([,])([0-9]+)([\.])([0-9]+)(["])([,])
and the exp to replace is:
\1\3\5\7\9\10\11\13
The problem is parameters \11 \13 are not working (the chars eg .999 as in the example will not appear in the changed values).
So now the question is - is there any limit for parameters?
It seems for me as its not working above 10. For shorter num.values where I need to use only up to 9 parameters the string for serach and replacement works fine, for the example above the search works but not the replacement, the end of the changed value gets corrupted.
Also, it came to my mind that instead of using Notepad++ I could maybe change the logfile on the unix server directly, howerver I had issues to build the correct perl syntax. Anyone who could help with that maybe?
After having a little play myself, it looks like back-references \11-\99 are invalid in notepad++ (which is not that surprising, since this is commonly omitted from regex languages.) However, there are several things you can do to improve that regular expression, in order to make this work.
Firstly, you should consider using less groups, or alternatively non-capture groups. Did you really need to store 13 variables in that regex, in order to do the replacement? Clearly not, since you're not even using half of them!
To put it simply, you could just remove some brackets from the regex:
[,]["]([0-9]+)[,]([0-9]+)[,]([0-9]+)[,]([0-9]+)[.]([0-9]+)["][,]
And replace with:
,\1\2\3\4.\5,
...But that's not all! Why are you using square brackets to say "match anything inside", if there's only one thing inside?? We can get rid of these, too:
,"([0-9]+),([0-9]+),([0-9]+),([0-9]+)\.([0-9]+)",
(Note I added a "\" before the ".", so that it matches a literal "." rather than "anything".)
Also, although this isn't a big deal, you can use "\d" instead of "[0-9]".
This makes your final, optimised regex:
,"(\d+),(\d+),(\d+),(\d+)\.(\d+)",
And replace with:
,\1\2\3\4.\5,
Not sure if the regex groups has limitations, but you could use lookarounds to save 2 groups, you could also merge some groups in your example. But first, let's get ride of some useless character classes
(\.)(")([0-9]+)(,)([0-9]+)(,)([0-9]+)(,)([0-9]+)(\.)([0-9]+)(")(,)
We could merge those groups:
(\.)(")([0-9]+)(,)([0-9]+)(,)([0-9]+)(,)([0-9]+)(\.)([0-9]+)(")(,)
^^^^^^^^^^^^^^^^^^^^
We get:
(\.)(")([0-9]+)(,)([0-9]+)(,)([0-9]+)(,)([0-9]+\.[0-9]+)(")(,)
Let's add lookarounds:
(?<=\.)(")([0-9]+)(,)([0-9]+)(,)([0-9]+)(,)([0-9]+\.[0-9]+)(")(?=,)
The replacement would be \2\4\6\8.
If you have a fixed length of digits at all times, its fairly simple to do what you have done. Even though your expression is poorly written, it does the job. If this is the case, look at Tom Lords answer.
I played around with it a little bit myself, and I would probably use two expressions - makes it much easier. If you have to do it in one, this would work, but be pretty unsafe:
(?:"|(\d+),)|(\.\d+)"(?=,) replace by \1\2
Live demo: http://regex101.com/r/zL3fY5

How to search (using regex) for a regex literal in text?

I just stumbled on a case where I had to remove quotes surrounding a specific regex pattern in a file, and the immediate conclusion I came to was to use vim's search and replace util and just escape each special character in the original and replacement patterns.
This worked (after a little tinkering), but it left me wondering if there is a better way to do these sorts of things.
The original regex (quoted): '/^\//' to be replaced with /^\//
And the search/replace pattern I used:
s/'\/\^\\\/\/'/\/\^\\\/\//g
Thanks!
You can use almost any character as the regex delimiter. This will save you from having to escape forward slashes. You can also use groups to extract the regex and avoid re-typing it. For example, try this:
:s#'\(\\^\\//\)'#\1#
I do not know if this will work for your case, because the example you listed and the regex you gave do not match up. (The regex you listed will match '/^\//', not '\^\//'. Mine will match the latter. Adjust as necessary.)
Could you avoid using regex entirely by using a nice simple string search and replace?
Please check whether this works for you - define the line number before this substitute-expression or place the cursor onto it:
:s:'\(.*\)':\1:
I used vim 7.1 for this. Of course, you can visually mark an area before (onto which this expression shall be executed (use "v" or "V" and move the cursor accordingly)).

regexIssueTracker not working in CruiseControl.net

I am trying to get an issueUrlBuilder to work in my CruiseControl.NET config, and cannot figure out why they aren't working.
The first one I tried is this:
<cb:define name="issueTracker">
<issueUrlBuilder type="regexIssueTracker">
<find>^.*Issue (\d*).|\n*$</find>
<replace>https://issuetracker/ViewIssue.aspx?ID=$1</replace>
</issueUrlBuilder>
</cb:define>
Then, I reference it in the sourceControl block:
<sourcecontrol type="vaultplugin">
...
<issueTracker/>
</sourcecontrol>
My checkin comments look like this:
[Issue 1234] This is a test comment
I cannot find anywhere in the build reports/logs/etc. where that issue link is converted to a link. Is my regex wrong?
I've also tried the default issueUrlBuilder:
<cb:define name="issueTracker">
<issueUrlBuilder type="defaultIssueTracker">
<url>https://issuetracker/ViewIssue.aspx?ID={0}</url>
</issueUrlBuilder>
</cb:define>
Again, same comments and no links anywhere.
Anyone have any ideas.
It looks like you're trying to match a potentially multiline comment by using .|\n instead of just ., which doesn't match newlines by default. Your first problem is that | has the lowest associativity of all regex constructs, so it's dividing your whole regex into the alternatives ^.*Issue (\d*). or \n*$. You would need to enclose the alternation in a group: (?:.|\n)*.
Another potential problem is that the lines might be separated by \r\n (carriage-return plus linefeed) instead of just \n. If CCNET uses the .NET regex engine under the hood, that won't be a problem because the dot matches \r. But that's not true of all flavors, and anyway, there's always a better way to match anything including newlines than (?:.|\n)*. I suggest you try
<find>^.*Issue (\d*)(?s:.*)$</find>
or
<find>(?s)^.*Issue (\d*).*$</find>
(?s) and (?s:...) are inline modifiers which allow the dot to match line separator characters.
EDIT: It looks like this is a known bug in CCNET. If the inline modifier doesn't work, try replacing . with [\s\S], as you would in a JavaScript regex. Example:
<find>^.*Issue (\d*)[\s\S]*$</find>