How to rename a file using regex capture group in Linux?

I want to rename a_1.0.tgz to b_1.0.tgz. The version number (1.0 here) can be anything, so it has to be carried over to the new name. How can I achieve that?
For example, I can use mv a*.tgz b.tgz if I don't need to keep the version number.

zsh comes with the utility zmv, which is intended for exactly that. While zmv does not support regex, it does provide capture groups for filename generation patterns (aka globbing).
First, you might need to enable zmv. This can be done by adding the following to your ~/.zshrc:
autoload -Uz zmv
You can then use it like this:
zmv 'a_(*)' 'b_$1'
This renames every file matching a_* so that a_ is replaced by b_. If you want to be less general, you can of course adjust the pattern:
to rename only .tgz files:
zmv 'a_(*.tgz)' 'b_$1'
to rename only .tgz files while changing the extension to .tar.gz
zmv 'a_(*).tgz' 'b_$1.tar.gz'
to only rename a_1.0.tgz:
zmv 'a_(1.0.tgz)' 'b_$1'
To be on the safe side, you can run zmv with the option -n first. This only prints what would happen without actually changing anything. For more information, see man zshcontrib.

I'm not too familiar with zsh so I don't know if it supports regular expressions but I don't think you really need them here.
You can match the file using a glob and use a substitution:
for file in a_[0-9].[0-9].tgz; do
echo "$file" "${file/a/b}"
done
In the glob pattern, [0-9] matches any single digit. ${file/a/b} substitutes the first occurrence of a with b.
Change the echo to mv if you're happy with the result.
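If an actual regex capture group is wanted, one option is to let sed compute the new name. This is only a sketch: new_name is a made-up helper, and the pattern assumes names of the form a_<version>.tgz that contain no newlines.

```shell
# new_name is a hypothetical helper: it prints the target name for one file.
# The parentheses capture the version; \1 reuses it in the replacement.
new_name() {
    printf '%s\n' "$1" | sed -E 's/^a_(.+)\.tgz$/b_\1.tgz/'
}

# Dry run: print the mv commands instead of executing them.
for file in a_*.tgz; do
    [ -e "$file" ] || continue    # the glob stays literal when nothing matches
    echo mv -- "$file" "$(new_name "$file")"
done
```

As with the loop above, drop the echo once the printed commands look right.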

Assuming you would like to replace the first character in all files matching a*.tgz with the letter b:
for f in a*.tgz; do
echo mv "$f" "b${f:1}"
done
Remove the echo when you are certain that this does what you want it to do.
The ${f:1} uses the ${name:offset} parameter expansion. From the zshexpn manual (on OS X):
If offset is non-negative, then if the variable name is a
scalar substitute the contents starting offset characters
from the first character of the string, [...]

Related

Sed - How to read a file line by line and go the path mentioned in the file then replace string?

I am on a new project where I need to append a string to the names of all exported APIs.
Someone hinted that this can be done with simple sed commands.
What is needed, as an example:
My project has about 100 files, and many of them contain something like the pattern below.
In file1, on some line: export(xyz);
In file2, on some line: export (abc);
What is needed here is to replace
xyz with xyz_temp and
abc with abc_temp.
Now the problem is these APIs are in different folders and different files.
Fortunately, I learned that the results of the cscope tool can be redirected to a file.
So I searched for the string "export" and redirected the result to a file, export_api.txt, which looks like this:
/path1/file1.txt export(xyz);
/path2/file2.txt export(abc);
Now I am not sure how to use sed to automate
reading this export_api.txt,
reading each line, and
replacing the string as above.
Any direction would be highly appreciated.
Thanks in advance.

If you have a list of files which need to be changed and your replacement only needs to append _tmp, then this can be accomplished with a single sed call:
sed -i 's/export(\(abc\|xyz\));/export(\1_tmp);/' files...
-i will modify the files in-place, overwriting them.
If you don't care which identifiers are affected and want to append a suffix to all export expressions, match any identifier. Here is one such pattern:
export(\([^)]*\))
Depending on your expressions and valid identifier names, you might want to or need to change this to one of:
export(\(.*\))
export(\([_a-zA-Z][_a-zA-Z0-9]*\))
export(\([_a-zA-Z"'][_a-zA-Z0-9"']*\))
export(\([_a-zA-Z]*\))
…
Another option would be to only match lines containing "export(" and then replace the closing parenthesis (given that your input lines contain the token ");" only once):
sed -i '/export(/s/);/_tmp);/' files...
# or reusing the complete match:
sed -i '/export(/s/);/_tmp&/' files...
This avoids the backreference and makes the regular expression simpler, because it now only has to match fixed strings.
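Before touching real files with -i, it may help to try the substitution on a few sample lines through a pipe. The sketch below assumes the generic "any identifier" pattern and also allows spaces before the parenthesis, since the question shows export (abc); append_tmp is a hypothetical name.

```shell
# append_tmp is a hypothetical filter: text on stdin, rewritten text on stdout.
# \1 keeps "export(" including any spaces before the parenthesis,
# \2 is the identifier that receives the _tmp suffix.
append_tmp() {
    sed 's/\(export *(\)\([^)]*\));/\1\2_tmp);/'
}

printf '%s\n' 'export(xyz);' 'export (abc);' 'int unrelated;' | append_tmp
```

Lines without an export expression pass through unchanged.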

You can use the read builtin to parse the line in your export_api.txt file, then call sed on each file. Pattern match the export snippet to choose the correct sed invocation. The way read is invoked here assumes that your path and snippet are delimited by IFS and that path does not contain any whitespace or separators:
while read -r path snippet; do
case "$snippet" in
*abc*) sed -i 's/export(abc);/export(abc_tmp);/' "$path" ;;
*xyz*) sed -i 's/export(xyz);/export(xyz_tmp);/' "$path" ;;
esac
done < export_api.txt
NOTE: this will change/overwrite any of your files. Your files might be left in a broken state.
PS I wonder why you cannot use your IDE to search/replace those occurrences?
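If the names are not known in advance, the list file and the generic pattern can be combined. The following is a sketch under several assumptions: rewrite_exports is a made-up helper, paths contain no whitespace, and sed -i is avoided in favour of a temporary file because its syntax differs between GNU and BSD sed. The rewritten file does not keep the permissions of the original.

```shell
# rewrite_exports is a hypothetical helper. $1 is a list file whose lines look
# like "/path/to/file export(name);".
rewrite_exports() {
    # sort -u: a file that is listed several times must be rewritten only
    # once, otherwise its names would end up as name_tmp_tmp.
    cut -d ' ' -f 1 "$1" | sort -u | while read -r path; do
        [ -f "$path" ] || continue
        sed 's/\(export *(\)\([^)]*\));/\1\2_tmp);/' "$path" > "$path.new" &&
            mv -- "$path.new" "$path"
    done
}
```

It would be called as rewrite_exports export_api.txt, ideally on a copy of the tree or under version control first.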

regular expression for "11th to 16th letter"

I am new to regular expressions and need help listing files on a Unix system. I want to apply a regular expression to the ls command.
I have below files :
DLERMS08001708161708209683.csv.gz
DLERMS13001708161330170816.csv.gz
DLERMS13001708171330170816.csv.gz
and would like to list the files which have 170816 in the 11th to 16th characters.
I tried ls *170816*.gz. However, I get all 3 filenames instead of two. I want only the first two. Could you please help?
Note that the third filename, DLERMS13001708171330170816.csv.gz, contains 170816 at the end. I want to exclude it from the ls output.

Using bash parameter-expansion alone,
for file in *.csv.gz; do
[ -e "$file" ] || continue
[ "${file:10:6}" == "170816" ] && printf "%s\n" "$file"
done
${PARAMETER:OFFSET:LENGTH}
This one can expand only a part of a parameter's value, given a position to start and maybe a length. If LENGTH is omitted, the parameter will be expanded up to the end of the string. If LENGTH is negative, it's taken as a second offset into the string, counting from the end of the string
Based on the comments below, the OP apparently wants to copy the matching files to another path, in which case the printf should be replaced with cp and the necessary arguments:
[ "${file:10:6}" == "170816" ] && cp -- "$file" path/to/destination

Firstly, be careful not to confuse regular expressions with shell glob patterns (which is what you want here).
Your glob could be:
??????????170816*.gz
This matches any 10 characters followed by the sequence you specified.
Depending on your next step, you might not need to use ls at all, for example you can loop over these files like this:
for file in ??????????170816*.gz; do
something_with "$file"
done
Or output the files that match using one of the following:
echo ??????????170816*.gz
printf '%s\n' ??????????170816*.gz
If there is a possibility that no files match, then you may wish to consider enabling nullglob (using shopt -s nullglob), which would expand to nothing in that case.
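The same glob pattern can be tried against plain strings with case, which does not need any files to exist. This is a small sketch using the three names from the question; matches_date is a hypothetical helper name.

```shell
# matches_date is a hypothetical helper: it succeeds when characters 11-16 of
# the name are 170816 and the name ends in .gz.
matches_date() {
    case $1 in
        ??????????170816*.gz) return 0 ;;
        *) return 1 ;;
    esac
}

for name in DLERMS08001708161708209683.csv.gz \
            DLERMS13001708161330170816.csv.gz \
            DLERMS13001708171330170816.csv.gz; do
    if matches_date "$name"; then
        printf '%s\n' "$name"
    fi
done
```

Only the first two names are printed, because the third has 170817 at that position.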

If you want to use globbing, it's not the same as using regular expression.
In your example you can use "?" as a placeholder for matching a single character:
Hence to achieve what you want as output, use ls with pattern below -
ls ??????????170816*

You want to use the wildcard (not regex) ?, which matches any single character, the appropriate number of times.
ls DLERMS????170816*.csv.gz
Regexes are much more flexible/powerful and overkill for this simple use case.
But as far as I know, ls does not support them, so you would have to go via other bash tools to identify the files in case you ever need to actually use regexes for anything.
I also reflected what I perceive to be another common feature of your filenames, the DLERMS at the beginning. If that is NOT common, replace those letters with ? too.

Try this:
ls ??????????170816*

A solution with find and regex
find . -regextype egrep -regex "^.{12}170816.*\.gz"
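find matches against the whole path, ./xxxxxxxxxx..., so .{12} covers the leading ./ plus the first ten characters of the name. 170816 is therefore matched at the 13th to 18th characters of the path, which are the 11th to 16th characters of the filename.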

I don't think you can use regular expressions with ls directly, but with egrep, it works fine.
ls * | egrep "DLERMS[0-9]{4}170816[0-9]{10}\.csv\.gz"
[0-9]{4} - any digit, four times.
[0-9]{10} - any digit, ten times.
The dots are escaped so that they match a literal . rather than any character.
Instead of egrep you can also use grep -E; the -E option enables extended regular expressions, in which characters such as { } and | act as operators without a preceding backslash.
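To check such a pattern without depending on the directory contents, the names can be fed to grep -E directly. This sketch adds ^ and $ anchors so the whole name has to match; filter_names is a hypothetical wrapper.

```shell
# filter_names is a hypothetical wrapper around the extended regex.
filter_names() {
    grep -E '^DLERMS[0-9]{4}170816[0-9]{10}\.csv\.gz$'
}

printf '%s\n' \
    DLERMS08001708161708209683.csv.gz \
    DLERMS13001708161330170816.csv.gz \
    DLERMS13001708171330170816.csv.gz | filter_names
```

The third name is filtered out, since 170817 follows the first four digits.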

grep and regex stored in string

my question is quite short:
a="'[0-9]*'"
grep -E '[0-9]*' #for example, line containing 000 will be recognized and printed
but
grep -E $a #line containing 000 WILL NOT be printed, why is that?
Does substitution for grep regex change the command's behaviour or have I missed something from a syntactic point of view? In other words, how do I make it so that grep accepts regex from a string stored in a variable.
Thank you in advance.

Quotes go around data, not in data. That means, when you store data (in this case, a regular expression) in a variable, don't embed quotes in the variable; instead, put double-quotes around the variable when you use it:
a="[0-9]*"
grep -E "$a"
You can sometimes get away with leaving the double-quotes off when using variables (as in Avinash Raj's comment), but it's not generally safe. In this case, it'll work fine provided there are no files or subdirectories in the current working directory with names that happen to start with a digit. You see, without double-quotes around $a, the shell will take its value, try to split it into multiple words (not a problem here), try to expand each word that contains shell wildcards into a list of matching files (potential problem here), and pass that to the command (grep) as its list of arguments. That means that if you happen to have files that start with digits in the current directory, grep thinks you ran a command like this:
grep -E 1file.txt 2file.jpg 3file.etc
... and it treats the first filename as the pattern to search for, and any other filenames as files to be searched. And you'll be scratching your head wondering why your script works or fails depending on which directory you happen to be in.
Note: the pattern [0-9]* is a valid regular expression, and a valid shell glob (wildcard) pattern, but it means very different things in the two contexts. As a regex, it means 0 or more digits in a row. As a shell glob, it means something that starts with a digit. Speaking of which, grep -E '[0-9]*' is not actually going to be very useful, since everything contains strings of 0 or more digits, so it'll match every line of every file you feed it.
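The effect of embedded quotes can be seen in a short experiment. This sketch uses [0-9]+ instead of [0-9]* so that the pattern does not match every line; the variable names are arbitrary.

```shell
pattern='[0-9]+'        # just the regex, no quote characters inside the value
printf '%s\n' abc 000 x7 | grep -E "$pattern"     # prints 000 and x7

quoted="'[0-9]+'"       # here the single quotes are part of the regex itself
printf '%s\n' abc 000 "'42'" | grep -E "$quoted"  # prints only '42'
```

With the quotes stored in the variable, grep looks for digits surrounded by literal ' characters, which is why a line containing just 000 is not printed.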

How to parse input parameter using regex in bash script?

I have the following bash script (I execute it using msysgit). The file is named git-open:
#!/usr/bin/env bash
tempfile=`mktemp` || exit 1
git show $1 > $tempfile
notepad++ -multiInst -notabbar -nosession -noPlugin $tempfile
rm $tempfile
I invoke it through git like so:
git open master:Applications/Survey/Source/Controller/SurveyManager.cpp
Before I open this in notepad++, I want it to append the extension to the temporary file so that the editor automatically applies the correct syntax highlighting. If there is no extension specified, then mktemp shouldn't have to add an extension.
How can I modify my script above to work like this? I have very little experience with linux scripting, so I'm not sure how to implement a regex for this (assuming regex is necessary).

You can pass mktemp a template for your file name.
tempfile=$(mktemp -t git-open.XXXXXXXX.${1##*.}) || exit 1

Regular expressions are overkill for this. Glob patterns in parameter expansion with prefix removal is completely sufficient.
tempfile=`mktemp`.${1##*.}
The ${1##*.} means "expand $1 but remove the longest prefix that matches the globbing pattern *.". * matches everything and . matches itself, so this removes everything up to and including the last dot. What remains is the extension.
Instead of ## you can also use # for the shortest prefix, % for the shortest suffix and %% for the longest suffix.
Ok, you probably want to handle cases where there is no extension. That can be done with help of case and more glob patterns:
case ${1##*/} in
*.*) suffix=.${1##*.};;
*) suffix='';;
esac
tempfile=`mktemp`$suffix
This will take the filename without leading directories, test whether it contains . and use the suffix only if it does. Or you can compare the expansion with the original as devnull suggests.
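The four prefix and suffix removal operators mentioned above can be compared on one sample value. This is a small illustration; the names archive.tar.gz and Makefile are made up for the example.

```shell
name=archive.tar.gz

echo "${name#*.}"     # shortest prefix matching '*.' removed -> tar.gz
echo "${name##*.}"    # longest prefix matching '*.' removed  -> gz
echo "${name%.*}"     # shortest suffix matching '.*' removed -> archive.tar
echo "${name%%.*}"    # longest suffix matching '.*' removed  -> archive

plain=Makefile
echo "${plain##*.}"   # no dot, nothing is removed            -> Makefile
```

The last line shows why the no-extension case needs separate handling: when the pattern does not match, the expansion returns the value unchanged.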

Rename Files Mac Command Line

I have a bunch of files in a directory that were produced with rather unfortunate names. I want to change two of the characters in the name.
For example I have:
>ch:sdsn-sdfs.txt
and I want to remove the ">" and change the ":" to a "_".
Resulting in
ch_sdsn-sdfs.txt
I tried to just say mv \\>ch\:* ch_* but that didn't work.
Is there a simple solution to this?

For a command-line script to rename files, this stackoverflow question has good answers.
On a Mac, Finder comes with bulk rename capabilities in the GUI. They are very handy if the file names share a pattern to find and replace.
Select all the files that need to be renamed, right-click and select Rename.
In the rename dialog, enter the find and replace strings.
Other options in the rename dialog let you number the file names in sequence, or add a prefix or suffix.

First, I should say that the easiest way to do this is to use the
prename or rename commands.
Homebrew package rename, MacPorts package renameutils :
rename s/0000/000/ F0000*
That's a lot more understandable than the equivalent sed command.
But as for understanding the sed command, the sed manpage is helpful. If
you run man sed and search for & (using the / command to search),
you'll find it's a special character in s/foo/bar/ replacements.
s/regexp/replacement/
Attempt to match regexp against the pattern space. If success‐
ful, replace that portion matched with replacement. The
replacement may contain the special character & to refer to that
portion of the pattern space which matched, and the special
escapes \1 through \9 to refer to the corresponding matching
sub-expressions in the regexp.
Therefore, \(.\) matches the first character, which can be referenced by \1.
Then . matches the next character, which is always 0.
Then \(.*\) matches the rest of the filename, which can be referenced by \2.
The replacement string puts it all together using & (the original
filename) and \1\2 which is every part of the filename except the 2nd
character, which was a 0.
This is a pretty cryptic way to do this, IMHO. If for
some reason the rename command was not available and you wanted to use
sed to do the rename (or perhaps you were doing something too complex
for rename?), being more explicit in your regex would make it much
more readable. Perhaps something like:
ls F00001-0708-*|sed 's/F0000\(.*\)/mv & F000\1/' | sh
Being able to see what's actually changing in the
s/search/replacement/ makes it much more readable. Also it won't keep
sucking characters out of your filename if you accidentally run it
twice or something.
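For the file names in the question (a leading > and a : to be replaced by _), a plain shell loop may already be enough. The following is a dry-run sketch; clean_name is a hypothetical helper, and the loop assumes the files are in the current directory.

```shell
# clean_name is a hypothetical helper: it drops one leading '>' and turns
# every ':' into '_'.
clean_name() {
    printf '%s\n' "${1#">"}" | tr ':' '_'
}

# Dry run: print the mv commands instead of executing them.
for f in ./\>*; do
    [ -e "$f" ] || continue     # the glob stays literal when nothing matches
    name=${f#./}
    echo mv -- "$f" "./$(clean_name "$name")"
done
```

Remove the echo once the printed commands look right.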
