Apply regular expression substitution globally to many files with a script - regex

I want to apply a certain regular expression substitution globally to about 40 Javascript files in and under a directory. I'm a vim user, but doing this by hand can be tedious and error-prone, so I'd like to automate it with a script.
I tried sed, but handling more than one line at a time is awkward, especially if there is no limit to how many lines the pattern might match.
I also tried this script (on a single file, for testing):
ex $1 <<EOF
gs/,\(\_\s*[\]})]\)/\1/
EOF
The pattern will eliminate a trailing comma in any Perl/Ruby-style list, so that "[a, b, c,]" will come out as "[a, b, c]". This is to satisfy Internet Explorer, which, alone among browsers, chokes on such lists.
The pattern works beautifully in vim but does nothing if I run it in ex, as per the above script.
Can anyone see what I might be missing?

You asked for a script, but you mentioned that you are a vim user. I tend to do project-wide find and replace inside of vim, like so:
:args **/*.js | argdo %s/,\(\_\s*[\]})]\)/\1/ge | update
This is very similar to the :bufdo solution mentioned by another commenter, but it will use your args list rather than your buflist (and thus doesn't require a brand new vim session nor for you to be careful about closing buffers you don't want touched).
:args **/*.js - sets your arglist to contain all .js files in this directory and subdirectories
| - pipe is vim's command separator, letting us have multiple commands on one line
:argdo - run the following command(s) on all arguments. it will "swallow" subsequent pipes
% - a range representing the whole file
:s - substitute command, which you already know about
:s_flags, ge - global (substitute as many times per line as possible) and suppress errors (i.e. "No match")
| - this pipe is "swallowed" by the :argdo, so the following command also operates once per argument
:update - like :write but only when the buffer has been modified
This pattern will obviously work for any vim command which you want to run on multiple files, so it's a handy one to keep in mind. For example, I like to use it to remove trailing whitespace (%s/\s\+$//), set uniform line-endings (set ff=unix) or file encoding (set fileencoding=utf8), and retab my files.

1) Open all the files with vim:
bash$ vim $(find . -name '*.js')
2) Apply substitute command to all files:
:bufdo %s/,\(\_\s*[\]})]\)/\1/ge
3) Save all the files and quit:
:wall
:q
I think you'll need to recheck your search pattern, it doesn't look right. I think where you have \_\s* you should have \_s* instead.
Edit: You should also use the /ge options for the :s... command (I've added these above).

You can automate the actions of both vi and ex by passing the argument +'command' from the command line, which enables them to be used as text filters.
In your situation, the following command should work fine:
find /path/to/dir -name '*.js' | xargs ex +'%s/,\(\_\s*[\]})]\)/\1/g' +'wq!'

you can use a combination of the find command and sed
find /path -type f -iname "*.js" -exec sed -i.bak 's/,[ \t]*]/]/' "{}" +;
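The one-line sed pattern above only sees a comma and a closing bracket that sit on the same line. A minimal sketch of a multi-line-aware variant follows, assuming GNU sed (the -z option is a GNU extension; it splits input on NUL bytes, so an ordinary text file is handled as one record). strip_trailing_commas is a hypothetical helper name, not something from the answers above.

```bash
# Remove a comma that is followed only by whitespace (including newlines)
# and then a closing ], } or ). Reads stdin, writes stdout.
# Assumption: GNU sed. With -z the whole file is one record, so
# [[:space:]] can match the newline between the comma and the bracket.
strip_trailing_commas() {
    sed -z -E 's/,([[:space:]]*[]})])/\1/g'
}

# Example: a trailing comma on the line before the closing brace.
printf 'var o = {\n  a: 1,\n  b: 2,\n};\n' | strip_trailing_commas
```

To edit files in place, the same expression could be handed to find, for example with -exec sed -i.bak -z -E '...' {} +, keeping the .bak copies until the result has been checked. Like the original pattern, it does not know about string literals, so a comma before a bracket inside a quoted string would be removed as well.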

If you are on windows, Notepad++ allows you to run simple regexes on all opened files.
Search for ,\s*\] and replace with ]
should work for the type of lists you describe.

Related

Mass rename in shell script

I have a bunch of files which are of this format:
blabla.log.YYYY.MM.DD
Where YYYY.MM.DD is something like (2016.01.18)
I have quite a few folders with about 1000 files in each, so I wanted to have a simple script to rename them. I want to rename them to
blabla.log
So basically, I'm just stripping the date at the end. Here is what I have:
for f in [a-zA-Z]*.log.[0-9][0-9][0-9][0-9].[0-9][0-9].[0-9][0-9]; do
mv -v $f ${f#[0-9][0-9][0-9][0-9].[0-9][0-9].[0-9][0-9]};
done
This script outputs this:
mv: `blabla.log.2016.01.18' and `blabla.log.2016.01.18' are the same file
For more information:
I'm on windows, but I run this script in gitbash
For some reason, my gitbash doesn't recognize the "rename" command
Some regex patterns (like [0-9]{4} don't seem to work)
I'm really at a loss. Thanks.
EDIT: I need to rename every single file that has a date at the end and that is of the form: *.log.2016.01.18. They all need to keep their original names. All that should change is the removal of the date.
You have to use % instead of #: you want to remove from the end, not the start of your string.
Also, you're missing a . in what has to be removed, you don't want to end up with blabla.log..
Quoting the variable names prevents surprises when file names contain special characters.
Together:
mv -v "$f" "${f%.[0-9][0-9][0-9][0-9].[0-9][0-9].[0-9][0-9]}"
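Put together as a loop, this could look like the sketch below. It is a dry run that only prints the mv commands; strip_date is a hypothetical helper name used for illustration.

```bash
# Strip a trailing .YYYY.MM.DD from a name using suffix removal (%).
# strip_date is a hypothetical helper, not an existing command.
strip_date() {
    printf '%s\n' "${1%.[0-9][0-9][0-9][0-9].[0-9][0-9].[0-9][0-9]}"
}

# Dry run: print the mv commands instead of executing them.
for f in *.log.[0-9][0-9][0-9][0-9].[0-9][0-9].[0-9][0-9]; do
    [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
    echo mv -v -- "$f" "$(strip_date "$f")"
done
```

Remove the echo once the output looks right. If one folder holds several dated files with the same base name, they would all be renamed to the same target, so using mv -i (or checking for that case first) is worth considering.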

bulk file renaming in bash, to remove name with spaces, leaving trailing digits

Can a bash/shell expert help me in this? Each time I use PDF to split large pdf file (say its name is X.pdf) into separate pages, where each page is one pdf file, it creates files with this pattern
"X 1.pdf"
"X 2.pdf"
"X 3.pdf" etc...
The file name "X" above is the original file name, which can be anything. It then adds one space after the name, then the page number. Page numbers always start from 1 and up to how many pages. There is no option in adobe PDF to change this.
I need to run a shell command to simply remove/strip out all the "X " part, and just leave the digits, like this
1.pdf
2.pdf
3.pdf
....
100.pdf ...etc..
Not being good in pattern matching, not sure what regular expression I need.
I know I need something like
for i in *.pdf; do mv "$i$" ........; done
And it is the ....... part I do not know how to do.
This only needs to run on Linux/Unix system.
Use sed..
for i in *.pdf; do mv "$i" "$(sed 's/.*[[:blank:]]//' <<< "$i")"; done
And it would be simple through rename
rename 's/.*\s//' *.pdf
You can remove everything up to (including) the last space in the variable with this:
${i##* }
That's "star space" after the double hash, meaning "anything followed by space". ${i#* } would remove up to the first space.
So run this to check:
for i in *.pdf; do echo mv -i -- "$i" "${i##* }" ; done
and remove the echo if it looks good. The -i suggested by Gordon Davisson will prompt you before overwriting, and -- signifies end of options, which prevents things from blowing up if you ever have filenames starting with -.
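The difference between the single and the double hash is easiest to see on a name that contains more than one space. A small sketch, using a made-up file name:

```bash
# A hypothetical file name with two spaces in it.
i='My Report 12.pdf'

# One hash removes the shortest prefix matching "* " (up to the first space).
echo "${i#* }"     # Report 12.pdf

# Two hashes remove the longest prefix matching "* " (up to the last space).
echo "${i##* }"    # 12.pdf
```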
If you just want to do bulk renaming of files (or directories) and don't mind using external tools, then here's mine: rnm
The command to do what you want would be:
rnm -rs '/.*\s//' *.pdf
.*\s selects the part before (and with) the last white space and replaces it with empty string.
Note:
It doesn't overwrite any existing files (throws warning if it finds an existing file with the target name).
And this operation is failsafe. You can get back the changes made by last rnm command with rnm -u.
Here's a list of documents for rnm.

export filenames to temp file bash

I have a lot of files in multiple directories that all have the following setup for the filename:
prob123456_01
I want to delete the trailing "_01" off of each file name and export them to a temp file. How exactly would I delete the trailing "_01" as well as export? I am rather new to scripting so any help would be greatly appreciated!
As you've tagged with bash, I'll assume that you can use globstar
shopt -s globstar # enable globstar
for f in **/*_[0-9][0-9]; do echo "${f%_*}"; done > tmp
With globstar enabled, the pattern **/*_[0-9][0-9] matches any file whose name ends in _ followed by two digits, in the current directory and any subdirectories. (The ** only recurses when it is a path component of its own, so a pattern written as **_[0-9][0-9] would behave like *_[0-9][0-9] and search the current directory only.) ${f%_*} removes the end of the file name using bash's built-in string manipulation functionality.
Better yet, as Charles Duffy suggests (thanks), you can use an array instead of a loop:
files=( **/*_[0-9][0-9] ); printf '%s\n' "${files[@]%_*}"
The array is filled with the filenames that match the same pattern as before. ${files[@]%_*} removes the last part from each element of the array and passes them all as arguments to printf, which prints each result on a separate line.
Either of these approaches is likely to be quicker than using find as everything is done in the shell, without executing any separate processes.
Previously I had suggested to use the pattern **/*_{00..99}, although this is not ideal for a couple of reasons. It is less efficient, as it expands to **/*_00, **/*_01, **/*_02, ..., **/*_99. Also, any of those 100 patterns that don't match will be included literally in the output unless another option, nullglob, is enabled.
It's up to you whether you use [0-9] or [[:digit:]] but the advantage of the latter is that it matches all characters defined to be a digit, which may vary depending on your locale. If this isn't a concern, I would go with the former.
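The array approach can be wrapped in a function, as in the sketch below. It assumes bash 4 or later (globstar does not exist in older versions) and writes the pattern as **/*_[0-9][0-9], because ** only recurses when it stands alone as a path component. list_stripped is a hypothetical name, and enabling nullglob is an extra choice made here so that an empty result prints nothing.

```bash
#!/usr/bin/env bash
# list_stripped DIR
# Print every path under DIR whose name ends in _NN, without that suffix.
# The function body is a subshell, so the cd and the shopt settings
# do not leak into the calling shell.
list_stripped() (
    cd "$1" || exit 1
    shopt -s globstar nullglob
    files=( **/*_[0-9][0-9] )
    if (( ${#files[@]} > 0 )); then
        # Strip the shortest suffix matching _* from every element.
        printf '%s\n' "${files[@]%_*}"
    fi
)
```

Calling list_stripped . > tmp would then write the stripped names to the temp file.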
If I understand you correctly, you want a list of the filenames without the trailing _01. The following would do that:
find . -type f -name '*_01' | sed 's/_01$//' > tmp.lst
find . -type f -name '*_01' searches the current directory and its descendant directories for files with names ending in _01.
| is the so-called pipe, handing the results of the left-hand call to the right-hand call.
sed 's/_01$//' removes the _01 from the end of each filename.
> tmp.lst writes the result into the file tmp.lst
These are all pretty basic parts of working with bash and its likes, so it might be a good idea to look at a tutorial or two and familiarize yourself with those and a few others ;)

Compounding switch regexes in Vim

I'm working on refactoring a bunch of PHP code for an instructor. The first thing I've decided to do is to update all the SQL files to be written in Drupal SQL coding conventions, i.e., to have all-uppercase keywords. I've written a few regular expressions:
:%s/create table/CREATE TABLE/gi
:%s/create database/CREATE DATABASE/gi
:%s/primary key/PRIMARY KEY/gi
:%s/auto_increment/AUTO_INCREMENT/gi
:%s/not null/NOT NULL/gi
Okay, that's a start. Now I just open every SQL file in Vim, run all five regular expressions, and save. This feels like five times the work it should be. Can they be compounded in to one obnoxiously long but easily copy-pastable regex?
why do you have to do it in vim? how about sed/awk?
e.g. with sed
sed -e 's/create table/\U&/g' -e's/not null/\U&/g' -e 's/.../\U&/' *.sql
btw, in vi you may do
:%s/create table/\U&/g
to change case, which will save some typing.
update
if you really want a long command to execute in vi, maybe you could try:
:%s/create table\|create database\|foo\|bar\|blah/\U&/g
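The same single-alternation idea can be sketched with sed for the five keywords from the question. This assumes GNU sed: both the \U case conversion and the I (ignore case) flag are GNU extensions. uppercase_sql_keywords is a hypothetical helper name.

```bash
# Uppercase the SQL keywords from the question in one pass.
# Assumption: GNU sed (\U& uppercases the match, I ignores case).
# Reads stdin, writes stdout.
uppercase_sql_keywords() {
    sed -E 's/create (table|database)|primary key|auto_increment|not null/\U&/gI'
}

# Example on a single statement.
echo 'create table t (id int not null auto_increment, Primary Key (id));' |
    uppercase_sql_keywords
```

Running the same expression with sed -i.bak -E '...' *.sql would edit the files in place and keep a backup of each.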
Open the file containing those substitution commands.
Copy its contents (to the unnamed register, by default):
:%y
If there is only one file where the substitutions should be
performed, open it as usual and run the contents of that register
as Ex commands:
:@"
If there are several files to edit automatically, open those
files as arguments:
:args *.sql
Execute the yanked substitutions for each file in the argument list:
:argdo @"|up
(The :update command running after the substitutions, writes
the buffer to file if it has been changed.)
While sed can handle what you want (however it can be interactive, as you requested by the flag 'i'), vim is still much more powerful. Once I needed to change the last argument of a function call throughout a 1M SLOC code base. The arguments could be on one line or spread over several lines. In vim I achieved it pretty easily.
You can open all php files in vim at once:
vim *.php
After that run in ex mode:
:bufdo! %s/create table/CREATE TABLE/gi
Repeat the rest of commands. At the end save all the files and exit vim:
:xall

Controlling shell command line wildcard expansion in C or C++

I'm writing a program, foo, in C++. It's typically invoked on the command line like this:
foo *.txt
My main() receives the arguments in the normal way. On many systems, argv[1] is literally *.txt, and I have to call system routines to do the wildcard expansion. On Unix systems, however, the shell expands the wildcard before invoking my program, and all of the matching filenames will be in argv.
Suppose I wanted to add a switch to foo that causes it to recurse into subdirectories.
foo -a *.txt
would process all text files in the current directory and all of its subdirectories.
I don't see how this is done, since, by the time my program gets a chance to see the -a, the shell has already done the expansion and the user's *.txt input is lost. Yet there are common Unix programs that work this way. How do they do it?
In Unix land, how can I control the wildcard expansion?
(Recursing through subdirectories is just one example. Ideally, I'm trying to understand the general solution to controlling the wildcard expansion.)
Your program has no influence over the shell's command line expansion. Which program will be called is determined after all the expansion is done, so it's already too late to change anything about the expansion programmatically.
The user calling your program, on the other hand, has the possibility to create whatever command line he likes. Shells allow you to easily prevent wildcard expansion, usually by putting the argument in single quotes:
program -a '*.txt'
If your program is called like that it will receive two parameters -a and *.txt.
On Unix, you should just leave it to the user to manually prevent wildcard expansion if it is not desired.
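To make the quoting approach concrete, here is a minimal shell sketch of a program that receives the pattern unexpanded and does the matching itself. The foo function is a hypothetical stand-in for the program in the question, and -maxdepth is assumed to be available (GNU and BSD find have it, but it is not in POSIX).

```bash
# foo [-a] PATTERN
# Hypothetical stand-in for the program in the question. PATTERN must
# arrive unexpanded, so the caller has to quote it: foo -a '*.txt'
# The function body is a subshell, so its variables stay local.
foo() (
    recursive=no
    if [ "$1" = "-a" ]; then
        recursive=yes
        shift
    fi
    pattern=$1
    if [ "$recursive" = yes ]; then
        # Match in the current directory and all subdirectories.
        find . -type f -name "$pattern"
    else
        # Assumption: -maxdepth is a GNU/BSD extension to find.
        find . -maxdepth 1 -type f -name "$pattern"
    fi
)
```

Called as foo -a *.txt without quotes, the function would only see whatever names the shell had already expanded, which is exactly the problem described in the question.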
As the other answers said, the shell does the wildcard expansion - and you stop it from doing so by enclosing arguments in quotes.
Note that options -R and -r are usually used to indicate recursive - see cp, ls, etc for examples.
Assuming you organize things appropriately so that wildcards are passed to your program as wildcards and you want to do recursion, then POSIX provides routines to help:
nftw - file tree walk (recursive access).
fnmatch, glob, wordexp - to do filename matching and expansion
There is also ftw, which is very similar to nftw but it is marked 'obsolescent' so new code should not use it.
Adrian asked:
But I can say ls -R *.txt without single quotes and get a recursive listing. How does that work?
To adapt the question to a convenient location on my computer, let's review:
$ ls -F | grep '^m'
makefile
mapmain.pl
minimac.group
minimac.passwd
minimac_13.terminal
mkmax.sql.bz2
mte/
$ ls -R1 m*
makefile
mapmain.pl
minimac.group
minimac.passwd
minimac_13.terminal
mkmax.sql.bz2
mte:
multithread.ec
multithread.ec.original
multithread2.ec
$
So, I have a sub-directory 'mte' that contains three files. And I have six files with names that start 'm'.
When I type 'ls -R1 m*', the shell notes the metacharacter '*' and uses its equivalent of glob() or wordexp() to expand that into the list of names:
makefile
mapmain.pl
minimac.group
minimac.passwd
minimac_13.terminal
mkmax.sql.bz2
mte
Then the shell arranges to run '/bin/ls' with 9 arguments (program name, option -R1, plus 7 file names and terminating null pointer).
The ls command notes the options (recursive and single-column output), and gets to work.
The first 6 names (as it happens) are simple files, so there is nothing recursive to do.
The last name is a directory, so ls prints its name and its contents, invoking its equivalent of nftw() to do the job.
At this point, it is done.
This uncontrived example doesn't show what happens when there are multiple directories, and so the description above over-simplifies the processing.
Specifically, ls processes the non-directory names first, and then processes the directory names in alphabetic order (by default), and does a depth-first scan of each directory.
foo -a '*.txt'
Part of the shell's job (on Unix) is to expand command line wildcard arguments. You prevent this with quotes.
Also, on Unix systems, the "find" command does what you want:
find . -name '*.txt'
will list all files recursively from the current directory down.
Thus, you could do
foo `find . -name '*.txt'`
I wanted to point out another way to turn off wildcard expansion. You can tell your shell to stop expanding wildcards with the noglob option.
With bash use set -o noglob:
> touch a b c
> echo *
a b c
> set -o noglob
> echo *
*
And with csh, use set noglob:
> echo *
a b c
> set noglob
> echo *
*
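In a script it is usually safer to limit noglob to the one command that needs it, so that later commands still get normal expansion. A small sketch for bash and other POSIX shells, where set -f is the short form of set -o noglob; literal_star is a hypothetical name.

```bash
# Print an unexpanded * regardless of which files exist.
# The function body is a subshell, so set -f (same as set -o noglob)
# is switched off again as soon as the function returns.
literal_star() (
    set -f
    echo *
)

literal_star    # prints: *
```

Note that this only helps when the person typing the command sets the option in their own shell first. A program cannot turn it on for its own command line, because the expansion has already happened before the program starts.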