Replacing with regex: How to insert a number right after a group match

How can I insert a number right after a group match in a find-and-replace regex? For example:
mat367 -> mat0367
fis434 -> fis0434
chm185 -> chm0185
I was renaming those files with the rename command-line tool, and I tried the following regex:
rename 's/([a-z]{3})(.+)/$1\0$2/g' *
s at the beginning means replace
* at the end means every file.
([a-z]{3})(.+) is the regex to match the name of the files.
$1\0$2 is the replacement.
I thought the regex above would insert a 0 after the first group match ($1), but it doesn't insert anything. So I tried:
rename 's/([a-z]{3})(.+)/$10$2/g' *
However, this makes it look like I'm referring to $10 (group number ten), which throws errors.
I'd like to know whether it is possible to accomplish this with a single regex, in other words without running the rename command two or more times. For example, I could use rename to insert a letter instead of 0 and then replace that letter with 0, but that would require two regexes and two commands. Using only one regex may also be useful in contexts other than renaming files.
Note: it seems the regex used by rename is based on Perl. That may help if someone knows Perl.

The argument is evaluated as Perl code, and you are correct about Perl seeing $10.
In a double-quoted string literal (which the replacement expression is), you can only safely escape non-word characters. Like letters, digits are word characters. Specifically, \0 produces the NUL character, so \0 can't be used here.
The solution is to use curly braces to delimit the variable name.
rename 's/([a-z]{3})(.+)/${1}0$2/g' *
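For comparison only (this is not part of the original answers), Python's re.sub has the same ambiguity, and its \g<N> syntax plays the role of Perl's ${N}. The following is a minimal sketch that assumes the names look like the three examples in the question. The function name insert_zero is illustrative.

```python
import re


def insert_zero(name: str) -> str:
    # \g<1> closes the group reference explicitly, so the "0" after it is
    # literal text (the counterpart of Perl's ${1}0). Writing r"\10\2" instead
    # would be read as a reference to group 10.
    return re.sub(r"([a-z]{3})(.+)", r"\g<1>0\2", name)


for old in ["mat367", "fis434", "chm185"]:
    print(old, "->", insert_zero(old))
# mat367 -> mat0367
# fis434 -> fis0434
# chm185 -> chm0185
```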
Another way to address the problem in this case is to side-step it. Since there's no need to replace the text before the insertion point, we don't need to capture it: \K tells Perl to leave everything matched before it out of the text being replaced.
rename 's/[a-z]{3}\K(.+)/0$1/g' *
We can further simplify the second solution.
The .+ ensures there's going to be at most one match per line, so the above can be simplified to the following (assuming none of the file names contain a line feed):
rename 's/[a-z]{3}\K(.)/0$1/' *
We could even avoid the remaining capture with a look-ahead.
rename 's/[a-z]{3}\K(?=.)/0/' *
But is there really a reason to look-ahead? The following isn't equivalent as it doesn't require anything to follow the letters, but I don't think that's a problem.
rename 's/[a-z]{3}\K/0/' *
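Python's built-in re module has no \K. As a substitute technique (this is not what the answer uses), a fixed-width lookbehind marks the same insertion point without consuming the letters. The sketch below mirrors the look-ahead variant, with count=1 standing in for the missing /g. The function name is illustrative.

```python
import re


def insert_zero(name: str) -> str:
    # (?<=[a-z]{3}) : the position is preceded by three lowercase letters
    #                 (fixed-width lookbehind standing in for [a-z]{3}\K)
    # (?=.)         : at least one character follows, as in the look-ahead variant
    # The match is empty, so the replacement "0" is inserted at that position;
    # count=1 mirrors the Perl substitution without /g.
    return re.sub(r"(?<=[a-z]{3})(?=.)", "0", name, count=1)


print(insert_zero("mat367"))  # mat0367
print(insert_zero("abc"))     # abc (nothing follows the letters)
```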
Finally, if the goal is to add a zero before the number (and thus before the first digit encountered), I'd use
rename 's/(?=\d)/0/' *
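The digit-based variant carries over directly to Python's re.sub, because lookaheads are widely supported. Here is a small sketch. The function name is hypothetical.

```python
import re


def zero_before_first_digit(name: str) -> str:
    # (?=\d) matches the empty position just before a digit; count=1 limits the
    # insertion to the first such position, like s/(?=\d)/0/ without /g.
    return re.sub(r"(?=\d)", "0", name, count=1)


for old in ["mat367", "fis434", "chm185", "readme"]:
    print(old, "->", zero_before_first_digit(old))
```

Names without digits, such as readme, come back unchanged.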

You can wrap your variable name $1 in curly braces.
$ rename 's/([a-z]{3})(.+)/${1}0$2/g' *
This is Perl's way to enclose variable names inside strings.

Related

Vim S&R to remove number from end of InstallShield file

I've got a practical application for a Vim regex where I'd like to remove numbers from the end of file location links. For example, if the developer is sloppy, just adds files and doesn't reuse file locations, you'll end up with something awful like this:
PATH_TO_MY_FILES&gt
PATH_TO_MY_FILES1&gt
...
PATH_TO_MY_FILES22&gt
PATH_TO_MY_FILES_ELSEWHERE&gt
PATH_TO_MY_FILES_ELSEWHERE1&gt
...
So all I want to do is a search and replace (S&R) that turns PATH_TO_MY_FILES*\d+ into PATH_TO_MY_FILES* using a regex. Obviously I am not doing it quite right, so rather than being spoon-fed the answer, I was hoping someone could throw a regex buzzword my way to get me on track.
Here's what I have tried:
:%s:\(PATH_TO_MY_FILES\w*\)\(\d+\)&gt:\1&gt:gc
But this doesn't work, i.e. if I just do a vim search on that, it doesn't find anything. However, if I use this:
:%s:\(PATH_TO_MY_FILES\w*\)\(\d\)&gt:\1&gt:gc
It will match the string, but the grouping is off, as expected. For example, the string PATH_TO_MY_FILES22 will be grouped as (PATH_TO_MY_FILES2)(2), presumably because the \d only matches the 2, and the \w match includes the first 2.
Question 1: Why doesn't \d+ work?
If I go ahead and use the second string (which is wrong), Vim appears to find a match (even though the grouping is wrong), but then does the replacement incorrectly.
For example, given that we know the \d will only match the last number in the string, I would expect PATH_TO_MY_FILES22&gt to get replaced with PATH_TO_MY_FILES2&gt. However, instead it replaces it with this:
PATH_TO_MY_FILES2PATH_TO_MY_FILES22&gtgt
So basically, it looks like it finds PATH_TO_MY_FILES22&gt, but then replaces only the & with group 1, which is PATH_TO_MY_FILES2.
I tried another regex at Regexr.com to see how it would interpret my grouping, and the grouping looked correct, although it may just be a hack around my lack of regex understanding:
(PATH_TO_\D*)(\d*)&gt
This correctly broke my target string into the PATH part and the entire number, so I was happy. But then when I used this in Vim, it found the match, but still replaced only the &.
Question 2: Why is Vim only replacing the &?
Answer 1:
In Vim's default ("magic") mode, an unescaped + matches a literal plus sign, so you need to write \+ to mean "one or more". For example, \d\+ works correctly.
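The grouping described in the question comes from greediness. \w* also matches digits, so it takes as much as it can and leaves \d+ only the last digit. The following Python illustration shows the effect. In Python, + needs no escaping. The lazy \w*? is an addition here and does not come from the answers. In Vim, the non-greedy counterpart of * is written \{-}.

```python
import re

line = "PATH_TO_MY_FILES22&gt"

# Greedy: \w* grabs "22", then gives back just enough for \d+ to match one digit.
greedy = re.search(r"(PATH_TO_MY_FILES\w*)(\d+)&gt", line)
print(greedy.groups())  # ('PATH_TO_MY_FILES2', '2')

# Lazy: \w*? takes as little as possible, so \d+ receives the whole number.
lazy = re.search(r"(PATH_TO_MY_FILES\w*?)(\d+)&gt", line)
print(lazy.groups())  # ('PATH_TO_MY_FILES', '22')

# With the lazy form, the trailing number can be dropped in one substitution.
print(re.sub(r"(PATH_TO_MY_FILES\w*?)\d+&gt", r"\1&gt", line))  # PATH_TO_MY_FILES&gt
```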
Answer 2:
An unescaped & in the replacement portion of a substitution means "the entire matched text". You need to escape it (\&) if you want a literal ampersand.
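Python's re.sub treats & in the replacement as ordinary text and writes the whole match as \g<0>. This makes it easy to reproduce the Vim behaviour side by side. The sketch assumes the replacement used in the question was \1&gt, which is what the output PATH_TO_MY_FILES2PATH_TO_MY_FILES22&gtgt implies.

```python
import re

line = "PATH_TO_MY_FILES22&gt"
pattern = r"(PATH_TO_MY_FILES\w*)(\d)&gt"

# Emulate Vim's "\1&gt": Vim expands & to the whole match, i.e. \g<0> in Python.
vim_like = re.sub(pattern, r"\1\g<0>gt", line)
print(vim_like)  # PATH_TO_MY_FILES2PATH_TO_MY_FILES22&gtgt

# In Python, & in the replacement is literal, so "\1&gt" means what it says.
literal = re.sub(pattern, r"\1&gt", line)
print(literal)  # PATH_TO_MY_FILES2&gt
```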

Specific search pattern using regex

I would like to search for a pattern in the following kinds of strings.
I have both of these forms:
"<deliveries!ntg5!intel!api!ntg5!avt!tuner!src>CDAVTTunerTVProxy.cpp"
and
"<.>api/sys/mocca/pf/comm/component/src\HBServices.hpp"
I would like to extract the file names from the strings above.
I tried the following
if(m/(\|>[0-9a-zA-Z_]\.cpp"$|\.hpp"$|\.h"$|\.c")$/){
The expression above does not match file names like >xxxxx.cpp (or .hpp, .h or .c).
Any idea would be of great help.
There are a few mistakes in your regex
if(m/(\|>[0-9a-zA-Z_]\.cpp"$|\.hpp"$|\.h"$|\.c")$/){
I assume that \|> is supposed to match either \ or >, but this is incorrect. It will try to match a pipe | followed by >. Backslash is used to escape characters, and so if you want to match a literal backslash, you need to escape it: \\. This is the wrong way to use an alternation, though (see more below), and there is a better way, which is to use a character class: [\\>].
[0-9a-zA-Z_] is a character class that \w covers (for ASCII input), so it makes sense to use \w instead to make your regex more readable. Also, you are only matching one character. If you want to match more than that, you need to supply a quantifier, such as +, which is suitable in this case. The quantifier + means to match 1 or more times.
Your alternations (|) are mixed up. Unless you group them properly, they apply to the entire pattern. Your regex as it is now would match strings like:
|>A.cpp"
.hpp"
.c"
Which is not what you want. If you want to apply the different extensions to the main file name body, you have to group the alternate extensions properly:
\w+\.(?:cpp|hpp|h|c)"$
Using parentheses that do not capture (?: ... ) are suitable for grouping. As you can also see, there is no need to repeat the parts of the string which are identical for all extensions.
So what do we end up with?
/([\\>]\w+\.(?:cpp|hpp|h|c)")$/
Although I do not think that you really want to include the leading [\\>] in the match, or the trailing ". So more properly it would be
/[\\>](\w+\.(?:cpp|hpp|h|c))"$/
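The final pattern can be checked quickly in Python, whose re module handles it the same way for these inputs. The sketch assumes the double quotes shown in the question are part of the data, because the pattern requires a closing " at the end of the string.

```python
import re

# Backslash or '>' before the name, one of four extensions, then a closing
# double quote at the very end of the string; group 1 is the file name.
FILE_RE = re.compile(r'[\\>](\w+\.(?:cpp|hpp|h|c))"$')

samples = [
    '"<deliveries!ntg5!intel!api!ntg5!avt!tuner!src>CDAVTTunerTVProxy.cpp"',
    '"<.>api/sys/mocca/pf/comm/component/src\\HBServices.hpp"',
]

for s in samples:
    m = FILE_RE.search(s)
    print(m.group(1) if m else None)
# CDAVTTunerTVProxy.cpp
# HBServices.hpp
```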
Note that, as I said in the comment, there is a module to use if these are paths and you want to extract the file name: File::Basename, which has been part of the Perl core since Perl 5.
Please try this regex:
m/([0-9a-zA-Z_]+\.(?:cpp|hpp|h|c))$/
This one looks for the extension cpp, hpp, h or c at the end of the string (using $), and then for the file name just before the period (.) that precedes the extension.

Regex for SublimeText Snippet

I've been stuck for a while on this Sublime Snippet now.
I would like to display the correct package name when creating a new class, using TM_FILEPATH and TM_FILENAME.
When printing TM_FILEPATH variable, I get something like this:
/Users/caubry/d/[...]/src/com/[...]/folder/MyClass.as
I would like to transform this output, so I could get something like:
com.[...].folder
This includes:
Removing anything before /com/[...]/folder/MyClass.as;
Removing the TM_FILENAME, with its extension; in this example MyClass.as;
And finally finding all the slashes and replacing them by dots.
So far, this is what I've got:
${1:${TM_FILEPATH/.+(?:src\/)(.+)\.\w+/\l$1/}}
and this displays:
com/[...]/folder/MyClass
I do understand how to replace slashes with dots, such as:
${1:${TM_FILEPATH/\//./g/}}
However, I'm having difficulty combining this logic with the previous one, as well as removing the TM_FILENAME at the end.
I'm really inexperienced with Regex, thanks in advance.
:]
EDIT: [...] indicates variable number of folders.
We can do this in a single replacement with some trickery. What we'll do is, we put a few different cases into our pattern and do a different replacement for each of them. The trick to accomplish this is that the replacement string must contain no literal characters, but consist entirely of "backreferences". In that case, those groups that didn't participate in the match (because they were part of a different case) will simply be written back as an empty string and not contribute to the replacement. Let's get started.
First, we want to remove everything up until the last src/ (to mimic the behaviour of your snippet - use an ungreedy quantifier if you want to remove everything until the first src/):
^.+/src/
We just want to drop this, so there's no need to capture anything - nor to write anything back.
Now we want to match subsequent folders until the last one. We'll capture the folder name and also match the trailing /, but write back only the folder name followed by a period. But I said no literal text in the replacement string! So the . has to come from a capture as well. This is where we rely on the assumption that your file always has an extension: we can grab the period from the file name with a lookahead. We'll also use that lookahead to make sure that there's at least one more folder ahead:
^.+/src/|\G([^/]+)/(?=[^/]+/.*([.]))
And we'll replace this with $1$2. Now if the first alternative matches, groups $1 and $2 will be empty, and the leading bit is still removed. If the second alternative matches, $1 will be the folder name, and $2 will have captured a period. Sweet. The \G anchor only matches where the previous match ended (or at the start of the string), which ensures that all matches are adjacent to one another.
Finally, we'll match the last folder and everything that follows it, and only write back the folder name:
^.+/src/|\G([^/]+)/(?=[^/]+/.*([.]))|\G([^/]+)/[^/]+$
And now we'll replace this with $1$2$3 for the final solution.
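Python's built-in re module does not support \G, but the idea can still be checked by emulating it. Pattern.match(string, pos) only matches at exactly pos, so calling it repeatedly from the end of the previous match keeps the matches adjacent. The sketch below relies on that emulation. The paths are made-up stand-ins for the elided [...] folders, and package_name is a hypothetical helper.

```python
import re

# \G is dropped from the pattern: Pattern.match(path, pos) anchors each attempt
# at pos instead. ^ only matches at index 0 here (no MULTILINE flag).
PACKAGE_RE = re.compile(
    r"^.+/src/"                    # drop everything up to the last src/
    r"|([^/]+)/(?=[^/]+/.*([.]))"  # folder + '/', borrow a '.' from the extension
    r"|([^/]+)/[^/]+$"             # last folder: keep the name, drop '/' and file
)


def package_name(path: str) -> str:
    out, pos = [], 0
    while pos < len(path):
        m = PACKAGE_RE.match(path, pos)
        if m is None:
            break  # the chain of adjacent matches is broken
        # "$1$2$3": groups that did not take part in the match contribute ""
        out.append("".join(g or "" for g in m.groups()))
        pos = m.end()
    out.append(path[pos:])  # any unmatched tail is copied through unchanged
    return "".join(out)


print(package_name("/Users/caubry/d/project/src/com/example/folder/MyClass.as"))
# com.example.folder
```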
A conceptually similar variant would be:
^.+/src/|\G([^/]+)/(?:(?=[^/]+/.*([.]))|[^/]+$)
replaced with $1$2. I've really only factored out the beginning of the second and third alternatives.
Finally, if Sublime is using Boost's extended format string syntax, it is actually possible to get characters into the replacement conditionally (without magically conjuring them from the file extension):
^.+/src/|\G/[^/]+$|\G(/)?([^/]+)
Now we have the first alternative for everything up to src (which is to be removed), the second alternative for the last slash and file name (which is to be removed), and the third alternative for all folders you want to keep. The file-name alternative has to come before the folder alternative: alternatives are tried left to right, and (/)?([^/]+) would otherwise also match /MyClass.as. This time I put the slash to be replaced optionally at the beginning. With a conditional replacement we can write a . there if and only if that slash was matched:
(?1.:)$2
Unfortunately, I can't test this right now and I don't know an online tester that uses Boost's regex engine. But this should do the trick just fine.
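A Python replacement string cannot express the conditional either, but a replacement function can make the same choice. This sketch emulates \G with Pattern.match at the previous end position. It puts the file-name alternative before the folder alternative because alternation is tried left to right. It only illustrates the logic and says nothing about how Sublime itself evaluates format strings. The paths and the helper name are made up.

```python
import re

# Order matters: "/[^/]+$" (last slash + file name) must be tried before
# "(/)?([^/]+)", or the folder alternative would also consume "/MyClass.as".
PACKAGE_RE = re.compile(r"^.+/src/|/[^/]+$|(/)?([^/]+)")


def package_name(path: str) -> str:
    out, pos = [], 0
    while pos < len(path):
        m = PACKAGE_RE.match(path, pos)  # anchored at pos, standing in for \G
        if m is None:
            break
        # "(?1.:)$2": write "." only if group 1 (the slash) matched, then group 2
        out.append(("." if m.group(1) else "") + (m.group(2) or ""))
        pos = m.end()
    out.append(path[pos:])
    return "".join(out)


print(package_name("/Users/caubry/d/project/src/com/example/folder/MyClass.as"))
# com.example.folder
```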

how to delete lines containing a string AND also NOT containing some other string?

I need some regular expression help.
Using the Firefox Firebug extension "css-usage", I was able to export a new CSS file in which the utility prepended "UNUSED" to every class that was not referenced in the page.
I would now like to remove every style that contains UNUSED selectors; however, there are some complexities with that. First, some selectors are comma-separated with other tags/selectors that may still be used, so I don't want to delete any lines that have a comma in them. Second, some styles are specified in a long multi-line block in curly braces, so I don't want to delete any lines that do not have a closing curly brace '}'.
I'm using a Mac, so any solution with sed, awk or vi is acceptable. I would like to delete all lines in a CSS file that start with "UNUSED", contain no commas and have a closing '}' curly brace.
To match UNUSED without commas and ensure that it ends with }:
UNUSED [^,]*}
To test if a string contains a character without capturing that character or moving the pointer forward, use a lookahead.
(?=.*[,])
Or, to make sure the string does not contain that character, use a negative lookahead.
(?!.*[,])
If you want to make sure that a string does not contain a character but does contain other characters, you can combine negative and positive lookaheads.
(?!.*[,])(?=.*[}])(?=.*UNUSED)
Finally, to actually select the entire string where this match occurs, use .* after the lookaheads.
^(?!.*[,])(?=.*[}])(?=.*UNUSED).*$
I am mostly familiar with .NET, but I believe many regex engines have options that allow you to specify whether ^ and $ will match the beginning/ending of the entire input string or the beginning/ending of a line. You'd want the second option.
Finally, you could use a regex replace to replace lines that match the given expression with an empty string (thus "deleting" those lines).
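The following Python sketch runs the lookahead version over a small made-up CSS sample. It uses the pattern above with the final $ replaced by \n?, so that deleting a line also removes its line break.

```python
import re

css = (
    "UNUSED.foo { color: red; }\n"
    "UNUSED.bar, .baz { color: blue; }\n"
    "UNUSED.qux {\n"
    "  margin: 0;\n"
    "}\n"
    ".keep { padding: 0; }\n"
)

# re.M lets ^ match at every line start; "." never crosses a newline, so each
# lookahead only inspects the current line.
LINE_RE = re.compile(r"^(?!.*[,])(?=.*[}])(?=.*UNUSED).*\n?", re.M)

print(LINE_RE.sub("", css), end="")
```

Only the first line is deleted. The comma line, the multi-line block and the used rule all survive.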
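Use the regex pattern
^UNUSED[^,]*}[^,]*$
with the multi-line modifier, so that ^ and $ match at the start and end of each line. Note that [^,] also matches a newline, so when the whole file is processed as one string, [^,\n] is safer; a line-by-line tool such as sed never sees the newline and can use the pattern as is.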
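The following Python sketch applies this pattern to a small made-up CSS sample processed as one string. It also shows why the character class matters. [^,] matches newlines, so the naive version can run from UNUSED.qux { across several lines to a later }. The [^,\n] version cannot.

```python
import re

css = (
    "UNUSED.foo { color: red; }\n"
    "UNUSED.bar, .baz { color: blue; }\n"
    "UNUSED.qux {\n"
    "  margin: 0;\n"
    "}\n"
    ".keep { padding: 0; }\n"
)

# Excluding "\n" keeps every match on a single line; \n? removes the line break.
SINGLE_LINE_RE = re.compile(r"^UNUSED[^,\n]*}[^,\n]*$\n?", re.M)
# For comparison: plain [^,] may cross line boundaries.
NAIVE_RE = re.compile(r"^UNUSED[^,]*}[^,]*$\n?", re.M)

print(SINGLE_LINE_RE.sub("", css), end="")
print("---")
print(NAIVE_RE.sub("", css), end="")
```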

Explain this Regular Expression please

Regular expressions are a complete mystery to me.
I'm dealing with one right now in TextMate that does what I want it to do...but I don't know WHY it does what I want it to do.
/[[:alpha:]]+|( )/(?1::$0)/g
This is used in a TextMate snippet and what it does is takes a Label and outputs it as an id name. So if I type "First Name" in the first spot, this outputs "FirstName".
Previously it looked like this:
/[[:alpha:]]+|( )/(?1:_:/L$0)/g (it might have been \L instead)
This would turn "First Name" into "first_name".
So I get that the underscore adds an underscore for a space, and that the /L lowercases everything...but I can't figure out what the rest of it does or why.
Someone care to explain it piece by piece?
EDIT
Here is the actual snippet in question:
<column header="$1"><xmod:field name="${2:${1/[[:alpha:]]+|( )/(?1::$0)/g}}"/></column>
This regular expression (regex) format is basically:
/matchthis/replacewiththis/settings
The "g" setting at the end means do a global replace: every match is replaced, not just the first one.
Breaking it down further...
[[:alpha:]]+|( )
That matches either a run of one or more alphabetic characters ([[:alpha:]]+) or a single space, which is captured in group 1 ($1). In either case $0 holds the entire match.
(?1::$0)
As Roger says, the ? indicates that this part is a conditional. If group 1 ($1) matched, the match is replaced with the text between the two colons (::), in this case nothing. If $1 did not match, the match is replaced with $0, i.e. the run of letters is output unchanged.
This explains why the spaces are removed in the first example, and the spaces get replaced with underscores in your second example.
In the second expression the \L is used to lowercase the text.
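Python's re.sub has no (?1:…:…) conditional in the replacement string, but a replacement function can make the same decision. This sketch assumes ASCII input and uses [A-Za-z] in place of the POSIX class [[:alpha:]], which Python's re does not support. The function names are illustrative.

```python
import re

WORD_OR_SPACE = re.compile(r"[A-Za-z]+|( )")


def to_id(label: str) -> str:
    # (?1::$0): if group 1 (the space) matched, write nothing; otherwise write $0
    return WORD_OR_SPACE.sub(lambda m: "" if m.group(1) else m.group(0), label)


def to_snake(label: str) -> str:
    # (?1:_:\L$0): a space becomes "_"; a run of letters is written back lowercased
    return WORD_OR_SPACE.sub(lambda m: "_" if m.group(1) else m.group(0).lower(), label)


print(to_id("First Name"))     # FirstName
print(to_snake("First Name"))  # first_name
```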
The extra question in the comment was how to run this expression outside of TextMate. Using vi as an example, I would break it into multiple steps:
:0,$s/ //g
:0,$s/\u/\L\0/g
The first part of the above commands tells vi to run a substitution starting on line 0 and ending at the end of the file (that's what $ means).
The rest of the expression uses the same sorts of rules as explained above, although some of the notation in vi is a bit custom (see :help pattern in Vim).
I find RegexBuddy a good tool for dealing with regexes. I pasted your first regex into RegexBuddy, and it produced a piece-by-piece explanation of it.
I use it to help me understand existing regexes, build my own, test regexes against strings, etc. I've become better at regexes because of it. FYI, I'm running it under Wine on Ubuntu.
It's searching for a run of one or more alphabetic characters, [[:alpha:]]+, or a single space, ( ).
/[[:alpha:]]+|( )/(?1::$0)/g
The (?1 is a conditional and used to strip the match if group 1 (a single space) was matched, or replace the match with $0 if group 1 wasn't matched. As $0 is the entire match, it gets replaced with itself in that case. This regex is the same as:
/ //g
I.e. remove all spaces.
/[[:alpha:]]+|( )/(?1:_:\L$0)/g
This regex is still using the same condition, except that now, if group 1 was matched, the match is replaced with an underscore, and otherwise the full match ($0) is used, modified by \L. \L lowercases all text that comes after it, so \LABC would result in abc; think of it as a special control code.