Regex to batch rename files in OS X Terminal - regex

I'm after a way to batch rename files with a regex, e.g.
s/123/onetwothree/g
I recall I can use awk and sed with a regex, but couldn't figure out how to pipe them together for the desired output.

You can install the Perl-based rename utility:
brew install rename
and then use it like this:
rename 's/123/onetwothree/g' *
To test your regex without renaming any files, add the -n switch.

An efficient way to perform the rename operation is to construct the rename commands in a sed pipeline and feed them into the shell. Because the leading \(.*\) is greedy, this replaces only the last 123 in each name.
ls |
sed -n 's/\(.*\)\(123\)\(.*\)/mv "\1\2\3" "\1onetwothree\3"/p' |
sh
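The "build the rename commands first, run them later" idea above can be sketched in Python as a dry-run planner. This is a minimal sketch, not part of the original answer: `plan_renames` and the sample file names are hypothetical, and nothing is renamed on disk.

```python
import re

def plan_renames(names, pattern, replacement):
    """Return (old, new) pairs for the names the substitution would change."""
    regex = re.compile(pattern)
    plan = []
    for name in names:
        new_name = regex.sub(replacement, name)
        # Skip names that do not change or that would become empty.
        if new_name and new_name != name:
            plan.append((name, new_name))
    return plan

# Hypothetical file names, used only for illustration.
sample = ["a123.txt", "123-123.log", "notes.md"]
for old, new in plan_renames(sample, r"123", "onetwothree"):
    print('mv "{}" "{}"'.format(old, new))
```

Unlike the sed pipeline, `re.sub` replaces every occurrence in a name, which matches the s/123/onetwothree/g from the question. Feeding the pairs to `os.rename` would perform the actual renames once the plan looks right.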

Namechanger is super nice. It supports regular expressions for search and replace. For example, I am doing a fairly complex rename with the following regex:
\.sync-conflict-.*\.
That's a life saver.
Regex capture groups (see Diomidis's answer) are the way to go on the command line. With rename they are available as the variables $1 and $2, so rename -nv 's/^(\d{2})\.(\d{2}).*/s$1e$2.mp4/' *.mp4 becomes possible. Notice the $1 and $2? Those come from capture groups one (\d{2}) and two (\d{2}) in my example.
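For comparison, here is a small Python sketch of the same capture-group substitution. Python's replacement syntax uses \1 and \2 where Perl uses $1 and $2. The function name `episode_name` and the sample file name are made up for illustration.

```python
import re

def episode_name(filename):
    # \1 and \2 refer to the two (\d{2}) groups, like $1 and $2 in Perl.
    return re.sub(r'^(\d{2})\.(\d{2}).*', r's\1e\2.mp4', filename)

# Hypothetical file name, used only for illustration.
print(episode_name("01.02 pilot.mp4"))
```

Names that do not start with two digits, a dot, and two more digits are returned unchanged, which mirrors how rename leaves non-matching files alone.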

My take on a friendly recursive regex file name renamer which by default only emulates the replacement and shows what the resulting file names would be.
Use -w to actually write changes when you are satisfied with the dry run result, -s to suppress displaying non-matching files; -h or --help will show usage notes.
Simplest usage:
# replace all occurrences of 'foo' with 'bar'
# "foo-foo.txt" >> "bar-bar.txt"
ren.py . 'foo' 'bar' -s
# only replace 'foo' at the beginning of the filename
# "foo-foo.txt" >> "bar-foo.txt"
ren.py . '^foo' 'bar' -s
Matching groups (e.g. \1, \2 etc) are supported too:
# rename "spam.txt" to "spam-spam-spam.py"
ren.py . '(.+)\.txt' '\1-\1-\1.py' -s
# rename "12-lovely-spam.txt" to "lovely-spam-12.txt"
# (assuming two digits at the beginning and a 3 character extension)
ren.py . '^(\d{2})-(.+)\.(.{3})' '\2-\1.\3' -s
NOTE: don't forget to add -w once you have tested the results and want to actually write the changes.
Works both with Python 2.x and Python 3.x.
#!/usr/bin/python
# -*- coding: utf-8 -*-
from __future__ import print_function
import argparse
import os
import fnmatch
import sys
import shutil
import re
def rename_files(args):
    pattern_old = re.compile(args.search_for)
    for path, dirs, files in os.walk(os.path.abspath(args.root_folder)):
        for filename in fnmatch.filter(files, "*.*"):
            if pattern_old.findall(filename):
                new_name = pattern_old.sub(args.replace_with, filename)
                filepath_old = os.path.join(path, filename)
                filepath_new = os.path.join(path, new_name)
                if not new_name:
                    print('Replacement regex {} returns empty value! Skipping'.format(args.replace_with))
                    continue
                print(new_name)
                if args.write_changes:
                    shutil.move(filepath_old, filepath_new)
            else:
                if not args.suppress_non_matching:
                    print('Name [{}] does not match search regex [{}]'.format(filename, args.search_for))
if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Recursive file name renaming with regex support')
    parser.add_argument('root_folder',
                        help='Top folder for the replacement operation',
                        nargs='?',
                        action='store',
                        default='.')
    parser.add_argument('search_for',
                        help='string to search for',
                        action='store')
    parser.add_argument('replace_with',
                        help='string to replace with',
                        action='store')
    parser.add_argument('-w', '--write-changes',
                        action='store_true',
                        help='Write changes to files (otherwise just simulate the operation)',
                        default=False)
    parser.add_argument('-s', '--suppress-non-matching',
                        action='store_true',
                        help='Hide files that do not match',
                        default=False)
    args = parser.parse_args(sys.argv[1:])
    print(args)
    rename_files(args)

for f in *; do
    newname=$(printf '%s\n' "$f" | sed 's/123/onetwothree/g')
    # skip files whose name does not change
    [ "$f" != "$newname" ] && mv -- "$f" "$newname"
done

Related

Regex if then without else in ripgrep

I am trying to match some methods in a bunch of Python scripts if certain conditions are met. The first thing I am looking at is whether import re exists in a file, and if it does, I want to find all cases of re.sub(something). I tried following the documentation here on how to use if-then-without-else regexes, but can't seem to make it work with ripgrep, with or without PCRE2.
My next approach was to use groups, so rg -n "(^import.+re)|(re\.sub.+)" -r '$2', but the issue with this approach is that because the first import group matches, I get a lot of empty files back in my output. The $2 is being handled correctly.
I am hoping to avoid an "or" group capture and to use the regex "if" option if possible.
To summarize, what I am hoping for is: if import re appears anywhere in a file, then search for re\.sub.+ and output only the matching files and lines using ripgrep. Using ripgrep is a hard dependency.
Some sample code:
import re
for i in range(10):
    re.match(something)
    print(i)
re.sub(something)
This can be accomplished pretty easily with a shell pipeline and xargs. The idea is to use the first regex as a filter for which files to search in, and the second regex to show the places where re.sub occurs.
Here are three Python files to test with.
import-without-sub.py has an import re but no re.sub:
import re
for i in range(10):
    re.match(something)
    print(i)
import-with-sub.py has both an import re and an re.sub:
import re
for i in range(10):
    re.match(something)
    print(i)
re.sub(something)
And finally, no-import.py has no import re but does have a re.sub:
for i in range(10):
    re.match(something)
    print(i)
re.sub(something)
And now here's the command to show only matches of re.sub in files that contain import re:
rg '^import\s+re$' --files-with-matches --null | xargs -0 rg -F 're.sub('
--files-with-matches and --null print out all matching file paths separated by a NUL byte. xargs -0 then reads those file paths and turns them into arguments to be given to rg -F 're.sub('. (We use --null and -0 in order to correctly handle file names that contain spaces.)
Its output in a directory with all three of the above files is:
import-with-sub.py
7:re.sub(something)
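The two-stage logic of that pipeline (first filter files, then search inside the survivors) can be sketched in Python purely as an illustration. ripgrep remains the tool that does the real work. The function `sub_calls` and the in-memory `sources` dictionary are hypothetical names for this example, and the sample text assumes blank lines after the import and after the loop.

```python
import re

# Stage 1 pattern: a line consisting of "import re", as in the rg command.
IMPORT_RE = re.compile(r'^import\s+re$', re.MULTILINE)

def sub_calls(sources):
    """sources maps a file name to its text; returns (name, line_no, line) hits."""
    hits = []
    for name, text in sorted(sources.items()):
        if not IMPORT_RE.search(text):
            continue  # stage 1: skip files without "import re"
        for line_no, line in enumerate(text.splitlines(), start=1):
            if 're.sub(' in line:  # stage 2: fixed-string match, like rg -F
                hits.append((name, line_no, line))
    return hits

# Assumed sample contents, modeled on the test files described above.
with_sub = ("import re\n\nfor i in range(10):\n    re.match(something)\n"
            "    print(i)\n\nre.sub(something)\n")
no_import = ("for i in range(10):\n    re.match(something)\n"
             "    print(i)\n\nre.sub(something)\n")
print(sub_calls({"import-with-sub.py": with_sub, "no-import.py": no_import}))
```

Only files that pass the first check are scanned line by line, which is the role xargs plays between the two rg invocations.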

Trouble getting regex to work with grep/sed

I've been working on a script to update the PRODUCT_BUNDLE_IDENTIFIER in a pbxproj file with a new value using a build script. The regex I've come up with selects everything between 'PRODUCT_BUNDLE_IDENTIFIER = ' and any text following up to 2 occurrences of '.' which is what I want.
The regex I've put together to find these occurrences is shown here:
(?<=PRODUCT_BUNDLE_IDENTIFIER = )([a-zA-Z0-9_]+(?:\.[a-zA-Z0-9_]+){0,2})
I've tested it with a validator here: https://regex101.com/r/jUhJm7/1
In the validator, the regex selects the bundle id portion of my test examples as desired, so it seems to be working there.
The issue I'm experiencing is that when using this regex with grep, grep -e, egrep, or sed it doesn't seem to be working in the same manner. I would like to use sed to run the string replacement and have tried the following methods to achieve this:
# variable definitions
BUNDLE_ID='mynew.bundle.id'
PBXFILE="$SRCROOT/myproject.xcodeproj/project.pbxproj"
# check if the test bundle id is currently in the file
if grep -Fq "REPLACEABLE_BUNDLE_ID" $PBXFILE; then
    # this commented version works as expected as it's using simple string replacement
    #sed -i '' "s/REPLACEABLE_BUNDLE_ID/$BUNDLE_ID/g" $PBXFILE
    # these are the versions of the regex I've tried with sed #
    # basic version working in validator & testing with sublime text regex engine
    (?<=PRODUCT_BUNDLE_IDENTIFIER = )([a-zA-Z0-9_]+(?:\.[a-zA-Z0-9_]+){0,2})
    # added extra parentheses around product id first portion
    (?<=(PRODUCT_BUNDLE_IDENTIFIER = ))([a-zA-Z0-9_]+(?:\.[a-zA-Z0-9_]+){0,2})
    # escaped version
    \(?<=PRODUCT_BUNDLE_IDENTIFIER = \)\([a-zA-Z0-9_]+\(?:\.[a-zA-Z0-9_]+\){0,2}\)
    # try replacing the current bundle id using the regex
    sed -i -E '' "s/I put the regex here/$BUNDLE_ID/g" $PBXFILE
fi
I'm fairly new with regex and have not used sed before. I've read about extended regular expressions here: http://www.grymoire.com/Unix/Regular.html#uh-12 and feel like I'm just failing to put the pieces together properly.
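sed and grep -E use POSIX regular expressions, which support neither lookbehind (?<=...) nor non-capturing groups (?:...). Capture the prefix instead and put it back with \1. Try this with GNU sed (use -E instead of -r on macOS/BSD sed):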
$ sed -r "s/(PRODUCT_BUNDLE_IDENTIFIER = )[a-zA-Z0-9_]+(\.[a-zA-Z0-9_]+){0,2}/\1${BUNDLE_ID}/"
for example:
$ cat test.txt
PRODUCT_BUNDLE_IDENTIFIER = com.test.mybundle.keyboard;
PRODUCT_BUNDLE_IDENTIFIER = com.test.mybundle.iMessage;
PRODUCT_BUNDLE_IDENTIFIER = com.test;
PRODUCT_BUNDLE_IDENTIFIER = replaceable;
$ BUNDLE_ID='mynew.bundle.id'
$ sed -r "s/(PRODUCT_BUNDLE_IDENTIFIER = )[a-zA-Z0-9_]+(\.[a-zA-Z0-9_]+){0,2}/\1${BUNDLE_ID}/" test.txt
PRODUCT_BUNDLE_IDENTIFIER = mynew.bundle.id.keyboard;
PRODUCT_BUNDLE_IDENTIFIER = mynew.bundle.id.iMessage;
PRODUCT_BUNDLE_IDENTIFIER = mynew.bundle.id;
PRODUCT_BUNDLE_IDENTIFIER = mynew.bundle.id;
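To see why the original pattern works in a validator but not in sed, here is a hedged Python sketch that runs both forms side by side. Python's re module accepts the lookbehind, and the capture-group form mirrors the sed command above. The helper names `with_lookbehind` and `with_capture` are invented for this example.

```python
import re

BUNDLE_ID = "mynew.bundle.id"

def with_lookbehind(line):
    # Works in Python's re (and PCRE), but not in sed or grep -E.
    pattern = r'(?<=PRODUCT_BUNDLE_IDENTIFIER = )([a-zA-Z0-9_]+(?:\.[a-zA-Z0-9_]+){0,2})'
    return re.sub(pattern, BUNDLE_ID, line)

def with_capture(line):
    # Same idea as the sed command: keep the prefix via group 1.
    pattern = r'(PRODUCT_BUNDLE_IDENTIFIER = )[a-zA-Z0-9_]+(\.[a-zA-Z0-9_]+){0,2}'
    return re.sub(pattern, r'\g<1>' + BUNDLE_ID, line)

line = "PRODUCT_BUNDLE_IDENTIFIER = com.test.mybundle.keyboard;"
print(with_lookbehind(line))
print(with_capture(line))
```

Both calls produce the same line as the sed output above. Note that {0,2} stops after three dot-separated components, so a fourth component such as .keyboard is left in place.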

sed: capture group not expanding

Can someone explain to me why my sed command isn't working? I'm sure I'm doing something stupid. Here's a small text file that demonstrates my issue:
#!/usr/bin/env python
class A:
    def candy(self):
        print "cane"
Put that in a file and call it test.py
My goal is to add #profile before the def line with the same indentation as the function declaration. I try with this:
$ sed -i '/\( *\)def /i \
\1#profile' test.py
Note that the capture group should be the set of spaces before the def and I'm referencing the group with \1.
Here's my result:
#!/usr/bin/env python
class A:
1#profile
    def candy(self):
        print "cane"
Why is that 1 being placed in there literally instead of being replaced by my capture group (four spaces)?
Thanks!
I don't know this for certain, but I assume that sed doesn't carry captures from an address pattern over into inserted text, and it may not evaluate back-references inside "literal" text at all.
Try sed -e 's/\( *\)def /\1#profile\n&/' test.py instead.
What about this:
sed -i -e 's/^\(.*\)\(def.*\)/\1#profile\n\2/' test.py
Just use awk:
$ awk '{orig=$0} sub(/def.*/,"#profile"); {print orig}' file
#!/usr/bin/env python
class A:
    #profile
    def candy(self):
        print "cane"
simple, portable, easily extendable, debuggable, etc., etc....
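The substitution approach from the answers (capture the indentation, then emit it again in front of the new line) can also be sketched in Python. This is only an illustration of the technique. The function name `add_profile_markers` is hypothetical.

```python
import re

def add_profile_markers(text):
    # Group 1 captures the indentation before "def ".
    # \g<0> re-emits the whole match, so the def line itself is preserved.
    return re.sub(r'^([ \t]*)def ', r'\g<1>#profile\n\g<0>', text, flags=re.MULTILINE)

source = (
    "#!/usr/bin/env python\n"
    "class A:\n"
    "    def candy(self):\n"
    '        print "cane"\n'
)
print(add_profile_markers(source))
```

The key point is the same as with sed: back-references are only expanded inside a substitution's replacement text, not in text added by an insert command.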

Perl from command line: substitute regex only once in a file

I'm trying to replace a line in a configuration file. The problem is that I want to replace only one occurrence. Part of the file looks like this. It is the gitolite default config file:
# -----------------------------------------------------------------
# suggested locations for site-local gitolite code (see cust.html)
# this one is managed directly on the server
# LOCAL_CODE => "$ENV{HOME}/local",
# or you can use this, which lets you put everything in a subdirectory
# called "local" in your gitolite-admin repo. For a SECURITY WARNING
# on this, see http://gitolite.com/gitolite/cust.html#pushcode
# LOCAL_CODE => "$rc{GL_ADMIN_BASE}/local",
# ------------------------------------------------------------------
I would like to set LOCAL_CODE to something else from the command line. I thought I might do it in Perl to get PCRE convenience. I'm new to Perl though and can't get it working.
I found this:
perl -i.bak -p -e 's/old/new/' filename
The problem is that -p seems to make it loop over the file line by line, so an 'o' modifier won't have any effect. However, without the -p option it doesn't seem to work...
A compact way to do this is
perl -i -pe '$done ||= s/old/new/' filename
Yet another one-liner:
perl -i.bak -p -e '$i = s/old/new/ if !$i' filename
There are probably a large number of perl one liners that will do this, but here is one.
perl -i.bak -p -e '$x++ if $x==0 && s/old/new/;' filename
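All three one-liners use a flag variable to remember that a substitution already happened. For comparison, a minimal Python sketch of the same "replace only the first occurrence in the whole file" behavior uses the `count` argument of `re.sub`. The function name `replace_first` and the sample text are made up for illustration.

```python
import re

def replace_first(text, pattern, replacement):
    # count=1 stops after the first successful substitution in the whole text.
    return re.sub(pattern, replacement, text, count=1)

# Hypothetical file contents, used only for illustration.
sample = "old old\nold\n"
print(replace_first(sample, r"old", "new"))
```

Because the whole file is treated as one string here, no per-line flag is needed. The Perl versions need one only because -p processes the file one line at a time.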

grep two words near each other

Say I have a line in a file "This is perhaps the easiest place to add new functionality." and I want to grep two words close to each other. I do
grep -ERHn "\beasiest\W+(?:\w+\W+){1,6}?place\b" *
that works and gives me the line. But when I do
grep -ERHn "\beasiest\W+(?:\w+\W+){1,10}?new\b" *
it fails, defeating the whole point of the {1,10}?
This regex is listed on the regular-expressions.info site and also in a couple of regex books. They do not describe it with grep, but that should not matter.
Update
I put the regex into a python script. Works, but doesn't have the nice grep -C thing ...
#!/usr/bin/python
import re
import sys
import os
word1 = sys.argv[1]
word2 = sys.argv[2]
dist = sys.argv[3]
regex_string = (r'\b(?:'
    + word1
    + r'\W+(?:\w+\W+){0,'
    + dist
    + '}?'
    + word2
    + r'|'
    + word2
    + r'\W+(?:\w+\W+){0,'
    + dist
    + '}?'
    + word1
    + r')\b')
regex = re.compile(regex_string)
def findmatches(PATH):
    for root, dirs, files in os.walk(PATH):
        for filename in files:
            fullpath = os.path.join(root,filename)
            with open(fullpath, 'r') as f:
                matches = re.findall(regex, f.read())
                for m in matches:
                    print "File:",fullpath,"\n\t",m
if __name__ == "__main__":
    findmatches(sys.argv[4])
Calling it as
python near.py charlie winning 6 path/to/charlie/sheen
works for me.
Do you really need the (?:...) construct?
Maybe this is enough:
grep -ERHn "\beasiest\W+(\w+\W+){1,10}new\b" *
Here is what I get:
echo "This is perhaps the easiest place to add new functionality." | grep -EHn "\beasiest\W+(\w+\W+){1,10}new\b"
(standard input):1:This is perhaps the easiest place to add new functionality.
Edit
As Camille Goudeseune said:
To make it easily usable, this can be added to a .bashrc:
grepNear() {
    grep -EHn "\b$1\W+(\w+\W+){1,10}$2\b"
}
Then at a bash prompt: echo "..." | grepNear easiest new
grep does not support the non-capturing groups of Python regular expressions. When you write something like (?:\w+\W+), you are asking grep to match a question mark ? followed by a colon : followed by one or more word chars \w+ followed by one or more non-word chars \W+. ? is a special character for grep regexes, for sure, but since it is following the beginning of a group, it is automatically escaped (in the same way that the regex [?] matches the question mark).
Let us test it. I have the following file:
$ cat file
This is perhaps the easiest place to add new functionality.
grep does not match it with the expression you used:
$ grep -ERHn "\beasiest\W+(?:\w+\W+){1,10}?new\b" file
Then, I created the following file:
$ cat file2
This is perhaps the easiest ?:place ?:to ?:add new functionality.
Note that each word is preceded by ?:. In this case, your expression matches the file:
$ grep -ERHn "\beasiest\W+(?:\w+\W+){1,10}?new\b" file2
file2:1:This is perhaps the easiest ?:place ?:to ?:add new functionality.
The solution is to remove the ?: of the expression:
$ grep -ERHn "\beasiest\W+(\w+\W+){1,10}?new\b" file
file:1:This is perhaps the easiest place to add new functionality.
Since you do not need a non-capturing group here (at least as far as I can see), dropping it causes no problem.
Bonus point: you can simplify your expression by changing {1,10} to {0,10} and removing the ? that follows it:
$ grep -ERHn "\beasiest\W+(\w+\W+){0,10}new\b" file
file:1:This is perhaps the easiest place to add new functionality.
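For readers who prefer the Python route from the question's update, here is a hedged line-oriented sketch of the same proximity match. It reports line numbers the way grep -n does. The names `near_regex` and `matching_lines` are hypothetical, and `re.escape` is an addition that the original script does not have.

```python
import re

def near_regex(word1, word2, max_gap):
    # re.escape guards against regex metacharacters in the words themselves.
    w1, w2 = re.escape(word1), re.escape(word2)
    # Up to max_gap intermediate words are allowed between the two words.
    gap = r'\W+(?:\w+\W+){0,%d}?' % max_gap
    return re.compile(r'\b(?:%s%s%s|%s%s%s)\b' % (w1, gap, w2, w2, gap, w1))

def matching_lines(text, word1, word2, max_gap):
    """Return (line_number, line) pairs where the words occur near each other."""
    regex = near_regex(word1, word2, max_gap)
    return [(n, line) for n, line in enumerate(text.splitlines(), 1)
            if regex.search(line)]

text = "This is perhaps the easiest place to add new functionality."
print(matching_lines(text, "easiest", "new", 10))
```

Python's re module understands (?:...), so the non-capturing group is fine here. With grep -E it has to be written as a plain group, as explained above.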