rename files by removing the first 171 characters?

I have thousands of files downloaded from the internet with a naming convention like this:
HTTP_services.cgi?FILENAME=%2Fdata%2FGPM_L3%2FGPM_3IMERGM.06%2F2020%2F3B-MO.MS.MRG.3IMERG.20200301-S000000-E235959.03.V06B.HDF5&FORMAT=bmM0Lw&BBOX=-9,114.3,-8,115.8&LABEL=3B-MO.MS.MRG.3IMERG.20200301-S000000-E235959.03.V06B.HDF5.SUB.nc4
I want to rename all the files by removing the first 171 characters of each filename, so that I end up with a file named "3B-MO.MS.MRG.3IMERG.20200301-S000000-E235959.03.V06B.HDF5.SUB.nc4"
Is there any one-liner solution that I can use? I am using the terminal on a Mac.

You may try the below regex:
^.{171}
Explanation of the above regex:
^ - Anchors the match to the start of the filename.
. - A metacharacter that matches any character except a newline.
{171} - A quantifier that repeats the preceding token exactly 171 times.
You can find the demo of the above regex in here.
You can use the Perl-based rename utility to run the command below. Add the -n option first for a dry run that only prints the planned renames.
rename 's/^.{171}//' *.nc4
Worth Reading: I can't run rename command on MACOS. What to do?
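To check what the substitution does before touching any files, the same expression can be tried on a short string. This is a minimal sketch, assuming a sed that accepts -E (both GNU and BSD sed do); drop_prefix is a hypothetical helper name, and 3 stands in for 171:

```shell
# Hypothetical helper: remove the first N characters of a string with sed.
# $1 = string, $2 = number of leading characters to drop.
drop_prefix() {
    printf '%s\n' "$1" | sed -E "s/^.{$2}//"
}

drop_prefix 'abcdefgh' 3    # prints: defgh
drop_prefix 'ab' 3          # too short to match, so it prints: ab
```

A name shorter than the requested prefix is left unchanged, because the regex does not match at all.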

rename is the best solution, but you can also use bash substring expansion:
for file in *IMERG*; do
mv "$file" "${file:171}"
done
or alternatively using cut (cut counts characters from 1, so the new name starts at character 172):
for file in *IMERG*; do
mv "$file" "$(printf '%s\n' "$file" | cut -c 172-)"
done
if you are sure that exactly 171 characters is the right prefix length for every file name.
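Counting characters breaks as soon as one prefix has a different length. A possible alternative, sketched here under the assumption that every downloaded name contains LABEL= immediately before the wanted part (as in the example name in the question), is to strip everything up to the last LABEL= with a POSIX parameter expansion. label_name is a hypothetical helper name:

```shell
# Hypothetical helper: print the part of a name that follows the last "LABEL=".
# Assumption: every downloaded name contains "LABEL=" right before the real name.
label_name() {
    printf '%s\n' "${1##*LABEL=}"
}

# Dry run: print the mv commands instead of running them.
for f in ./*LABEL=*; do
    [ -e "$f" ] || continue        # the glob matched nothing
    new=$(label_name "$f")
    [ -n "$new" ] || continue      # nothing left after the prefix
    echo mv -i "$f" "./$new"       # remove echo to rename for real
done
```

The quotes around "$f" matter here because the names contain ? and &, which the shell would otherwise treat specially.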

Related

Use sed/regex to rename a file - bash with macOS

I have a list of files that have had a date added to the end of their names.
ex: Chorus Left Octave (consolidated) (2020_10_14 20_27_18 UTC). The files will end with .wav or .mp3
I want to leave the (consolidated) but take out the date. I have come up with the regex and tested with regexr.com. It does format the text correctly there.
The regex is: /(\([0-9]+(.*)(?=.wav|.mp3))+/g
Now, I am trying to actually rename the files. In my terminal I have cd'ed into the folder with the files. Based on other answers here I have tried:
rename -n '/(\([0-9]+(.*)(?=.wav|.mp3))+/g' *.wav|*.mp3 - using rename installed with homebrew
sed '/(\([0-9]+(.*))+/g' *.wav|*.mp3
for f in *.wav|*.mp3; do mv "$f" "${f/(\([0-9]+(.*)(?=.wav|.mp3))+/g}” done
The first two do not throw any errors, but do not do any renames (I know that the -n after rename just prints out the files that will be changed, it doesn't actually change the files)
The last one starts a bash session.
I'd rather use rename or sed, which seems simpler to me. But what am I doing wrong?

In plain bash:
#!/bin/bash
pat='([0-9][0-9][0-9][0-9]_[0-9][0-9]_[0-9][0-9] [0-9][0-9]_[0-9][0-9]_[0-9][0-9] UTC)'
for f in *.mp3 *.wav; do echo mv "$f" "${f/$pat}"; done
Remove the echo preceding the mv after making sure it will work as intended. You may also consider adding the -i option to the mv in order to avoid clobbering an existing file unintentionally.
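Since the question asked about sed, the same idea can also be sketched with sed computing the new name and mv doing the rename. This assumes a sed that accepts -E and that the stamp always has the form (YYYY_MM_DD HH_MM_SS UTC); strip_stamp is a hypothetical helper name. Unlike the pattern above, it also removes the space in front of the stamp:

```shell
# Hypothetical helper: strip " (YYYY_MM_DD HH_MM_SS UTC)" from a file name.
strip_stamp() {
    printf '%s\n' "$1" |
        sed -E 's/ ?\([0-9]{4}(_[0-9]{2}){2} [0-9]{2}(_[0-9]{2}){2} UTC\)//'
}

# Dry run over the audio files in the current directory.
for f in ./*.wav ./*.mp3; do
    [ -e "$f" ] || continue                       # the glob matched nothing
    new=$(strip_stamp "$f")
    [ "$new" = "$f" ] || echo mv -i "$f" "$new"   # remove echo to rename
done
```

sed only edits text, so it cannot rename anything by itself; that is why the attempts in the question printed no errors and changed nothing.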

Rename command with regular expression not working

I want to rename files using regex.
For example: replace pattern Mod[0-9][0-9] to Mod[0-9][0-9]_temp in files N_Mod10_m.bdf and N_Mod11_n.bdf using below command:
rename 's/\(.*Mod[0-9][0-9]\)\(.*\.bdf\)/$1_temp$2 *
but this is not working.

rename takes a Perl regular expression, in which capture groups are written with unescaped parentheses; \( and \) match literal parentheses. Also, add the missing final / and the closing quote '.
I guess this is what you are looking for:
rename 's/(.*?Mod[0-9][0-9])(.*?\.bdf)/${1}_temp$2/' *
Have a look at Rename Multiple Files in a Shell Prompt and Renaming files to have lower case extensions with rename.
For CentOS, you can insert _temp into file name like this:
for i in *.bdf; do j=$(echo "$i" | sed -r 's/(.*Mod[0-9][0-9])(.*\.bdf)/\1_temp\2/'); mv "$i" "$j"; done
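A dry-run version of the sed loop may be easier to verify before renaming anything. This is a sketch, assuming a sed that accepts -E; add_temp is a hypothetical helper name:

```shell
# Hypothetical helper: insert "_temp" after the first "ModNN" in a name.
add_temp() {
    printf '%s\n' "$1" | sed -E 's/(Mod[0-9][0-9])/\1_temp/'
}

for f in ./*Mod[0-9][0-9]*.bdf; do
    [ -e "$f" ] || continue                 # the glob matched nothing
    case $f in *_temp*) continue ;; esac    # already renamed on an earlier run
    echo mv -i "$f" "$(add_temp "$f")"      # dry run: remove echo to rename
done
```

For N_Mod10_m.bdf this prints a command that renames it to N_Mod10_temp_m.bdf.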

Remove duplicate filename extensions

I have thousands of files named something like filename.gz.gz.gz.gz.gz.gz.gz.gz.gz.gz.gz
I am using the find command like this find . -name "*.gz*" to locate these files and either use -exec or pipe to xargs and have some magic command to clean this mess, so that I end up with filename.gz
Someone please help me come up with this magic command that would remove the unneeded instances of .gz. I had tried experimenting with sed 's/\.gz//' and sed 's/(\.gz)//' but they do not seem to work (or to be more honest, I am not very familiar with sed). I do not have to use sed by the way, any solution that would help solve this problem would be welcome :-)

one way with find and awk:
find "$(pwd)" -name '*.gz.gz'|awk '{n=$0;sub(/(\.gz)+$/,".gz",n);print "mv",$0,n}'|sh
Note:
I assume there are no special characters (such as spaces) in your filenames. If there are, you need to quote the filenames in the mv command.
I added $(pwd) so that find prints absolute paths.
You can remove the trailing |sh to check the generated mv commands first.
If everything looks good, add the |sh back to execute them.

You may use
ls a.gz.gz.gz |sed -r 's/(\.gz)+/.gz/'
or without the regex flag
ls a.gz.gz.gz |sed 's/\(\.gz\)\+/.gz/'

ls *.gz | perl -ne '/((.*?\.gz).*)/; print "mv $1 $2\n"'
It will print shell commands to rename your files, it won't execute those commands. It is safe. To execute it, you can save it to file and execute, or simply pipe to shell:
ls *.gz | ... | sh
sed is great for replacing text inside files.

You can do that with bash string substitution:
for file in *.gz.gz; do
mv "${file}" "${file%%.*}.gz"
done

This might work for you (GNU sed):
printf '%s\n' *.gz | sed -r 's/^([^.]*)(\.gz){2,}$/mv -v & \1\2/e'

find . -name "*.gz.gz" |
while IFS= read -r f; do echo mv "$f" "$(sed -r 's/(\.gz)+$/.gz/' <<<"$f")"; done
This only previews the renaming (mv) command; remove the echo to perform actual renaming.
Processes matching files in the current directory tree, as in the OP (and not just files located directly in the current directory).
Limits matching to files that end in at least 2 .gz extensions (so as not to needlessly process files that end in just one).
When determining the new name with sed, makes sure that substring .gz doesn't just match anywhere in the filename, but only as part of a contiguous sequence of .gz extensions at the end of the filename.
Handles filenames with special chars. such as embedded spaces correctly (with the exception of filenames with embedded newlines.)

Using bash string substitution:
for f in *.gz.gz; do
mv "$f" "${f%%.gz.gz*}.gz"
done
This is a slight modification of jaypal's nice answer (which would fail if any of your files had a period as part of its name, such as foo.c.gz.gz). (Mine is not perfect, either) Note the use of double-quotes, which protects against filenames with "bad" characters, such as spaces or stars.
If you wish to use find to process an entire directory tree, the variant is:
find . -name \*.gz.gz | \
while read f; do
mv "$f" "${f%%.gz.gz*}.gz"
done
And if you are fussy and need to handle filenames with embedded newlines, change the while read to while IFS= read -r -d $'\0', and add a -print0 to find; see How do I use a for-each loop to iterate over file paths output by the find utility in the shell / Bash?.
But is this renaming a good idea? How was your filename.gz.gz created? gzip has guards against accidentally doing so. If you circumvent these via something like gzip -c $1 > $1.gz, buried in some script, then renaming these files will give you grief.
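The suffix-stripping idea can be made independent of periods elsewhere in the name by removing one trailing .gz at a time until only one is left. This is a minimal sketch in POSIX sh; collapse_gz is a hypothetical helper name, and the find loop is a dry run that still assumes no newlines in file names:

```shell
# Hypothetical helper: reduce any run of trailing ".gz" extensions to one.
collapse_gz() {
    name=$1
    while :; do
        case $name in
            *.gz.gz) name=${name%.gz} ;;   # drop one trailing ".gz"
            *)       break ;;              # zero or one ".gz" left
        esac
    done
    printf '%s\n' "$name"
}

# Dry run over the current directory tree.
find . -type f -name '*.gz.gz' | while IFS= read -r f; do
    echo mv -i "$f" "$(collapse_gz "$f")"   # remove echo to rename
done
```

Because only the end of the name is examined, foo.c.gz.gz.gz becomes foo.c.gz rather than foo.gz.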

Another way with rename:
find . -iname '*.gz.gz' -exec rename -n 's/(\.\w+)\1+$/$1/' {} +
When happy with the results remove -n (dry-run) option.

Copy and Rename Multiple Files with Regular Expressions in bash

I've got a file structure that looks like:
A/
    2098765.1ext
    2098765.2ext
    2098765.3ext
    2098765.4ext
    12345.1ext
    12345.2ext
    12345.3ext
    12345.4ext
B/
    2056789.1ext
    2056789.2ext
    2056789.3ext
    2056789.4ext
    54321.1ext
    54321.2ext
    54321.3ext
    54321.4ext
I need to rename all the files that begin with 20 to start with 10; i.e., I need to rename B/2022222.1ext to B/1022222.1ext
I've seen many of the other questions regarding renaming multiple files, but couldn't seem to make it work for my case. Just to see if I can figure out what I'm doing before I actually try to do the copy/renaming I've done:
for file in "*/20?????.*"; do
echo "{$file/20/10}";
done
but all I get is
{*/20?????.*/20/10}
Can someone show me how to do this?

You just have a little bit of incorrect syntax is all:
for file in */20?????.*; do mv "$file" "${file/20/10}"; done
Remove the quotes from the argument to in. Otherwise, the filename expansion does not occur.
The $ in the substitution should go before the brace.

Here is a solution which use the find command:
find . -name '20*' | while read oldname; do echo mv "$oldname" "${oldname/20/10}"; done
This command does not actually do your bidding, it only prints out what should be done. Review the output and if you are happy, remove the echo command and run it for real.
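Replacing the first 20 in the whole path works for directories named A and B, but it would change the directory part instead if a directory name itself contained 20. A sketch that only touches the base name, using POSIX parameter expansion, could look like this; new_path is a hypothetical helper name:

```shell
# Hypothetical helper: print the new path for a file whose base name starts
# with "20", leaving the directory part untouched.
new_path() {
    case $1 in
        */*) dir=${1%/*}; base=${1##*/} ;;
        *)   dir=.;       base=$1 ;;
    esac
    case $base in
        20*) printf '%s/10%s\n' "$dir" "${base#20}" ;;
        *)   printf '%s\n' "$1" ;;          # not a 20* file: unchanged
    esac
}

# Dry run over the layout from the question.
for f in ./*/20?????.*; do
    [ -e "$f" ] || continue                 # the glob matched nothing
    echo mv -i "$f" "$(new_path "$f")"      # remove echo to rename
done
```

With this helper, B/2022222.1ext maps to B/1022222.1ext, and a path such as 20dir/2056789.1ext keeps its directory name.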

Just wanna add to Explosion Pill's answer.
On OS X though, you must say
mv "${file}" "${file_expression}"
Or the mv command does not recognize it.

Parameter expansions like:
${file/20/10}
need the $ before the opening brace, and the glob after in must not be surrounded by quotes.
Instead, try doing (with Perl rename):
rename 's{/20}{/10}' */20?????.*
You can do this using the Perl tool rename from the shell prompt. (There are other tools with the same name which may or may not be able to do this, so be careful.)
If you want to do a dry run to make sure you don't clobber any files, add the -n switch to the command.
note
If you run the following command (linux)
$ file $(readlink -f $(type -p rename))
and you have a result like
.../rename: Perl script, ASCII text executable
then this seems to be the right tool =)
This seems to be the default rename command on Ubuntu.
To make it the default on Debian and derivatives like Ubuntu:
sudo update-alternatives --set rename /path/to/rename

The glob behavior of * is suppressed in double quotes. Try:
for file in */20?????.*; do
echo "${file/20/10}";
done

How to grep for a file extension

I am currently trying to a make a script that would grep input to see if something is of a certain file type (zip for instance), although the text before the file type could be anything, so for instance
something.zip
this.zip
that.zip
would all fall under the category. I am trying to grep for these using a wildcard, and so far I have tried this
grep ".*.zip"
But whenever I do that, it will find the .zip files just fine, but it will still display output if there are additional characters after the .zip so for instance .zippppppp or .zipdsjdskjc would still be picked up by grep. Having said that, what should I do to prevent grep from displaying matches that have additional characters after the .zip?

Test for the end of the line with $ and escape the second . with a backslash so it only matches a period and not any character.
grep ".*\.zip$"
However ls *.zip is a more natural way to do this if you want to list all the .zip files in the current directory or find . -name "*.zip" for all .zip files in the sub-directories starting from (and including) the current directory.
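The effect of the escaped dot and the $ anchor can be checked on a few sample lines. This is a small sketch; zip_only is a hypothetical helper name and the sample names are made up:

```shell
# Hypothetical helper: keep only input lines that end in a literal ".zip".
zip_only() {
    grep '\.zip$'
}

# Sample input: only the first and last names end in ".zip".
printf '%s\n' something.zip that.zippppppp myzip this.zip | zip_only
```

that.zippppppp is rejected by the $ anchor, and myzip is rejected because the escaped dot requires a real period before zip.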

On UNIX, try:
find . -type f -name \*.zip

You can also use grep to find all files with a specific extension:
find .|grep -e "\.gz$"
The . means the current folder.
If you want to specify a folder other than the current folder, just replace the . with the path of the folder.
Here is an example: Let's find all files that end with .gz and are in the folder /var/log
find /var/log/ |grep -e "\.gz$"
The output is something similar to the following:
$ find /var/log/ | grep -e "\.gz$"
/var/log//mail.log.1.gz
/var/log//mail.log.0.gz
/var/log//system.log.3.gz
/var/log//system.log.7.gz
/var/log//system.log.6.gz
/var/log//system.log.2.gz
/var/log//system.log.5.gz
/var/log//system.log.1.gz
/var/log//system.log.0.gz
/var/log//system.log.4.gz
The $ sign anchors the match to the end of the line, so only names that end in .gz are printed.

I use this to get a listing of the file types inside a folder.
find . -type f | grep -E -o "\.\w*$" | sort -u
Outputs for example:
.DS_Store
.MP3
.aif
.aiff
.asd
.doc
.flac
.jpg
.m4a
.m4p
.m4r
.mp3
.pdf
.png
.txt
.wav
.wma
.zip
BONUS: with
find . -type f | grep -E -o "\.\w*$" | sort | uniq -c
You'll get the file count:
106 .DS_Store
35 .MP3
89 .aif
5 .aiff
525 .asd
1 .doc
60 .flac
48 .jpg
149 .m4a
11 .m4p
1 .m4r
12844 .mp3
1 .pdf
5 .png
9 .txt
108 .wav
44 .wma
2 .zip

You need to do a couple of things. It should look like this:
grep '.*\.zip$'
You need to escape the second dot, so it will just match a dot, and not any character. Using single quotes makes the escaping a bit easier.
You need the dollar sign at the end of the line to indicate that you want the "zip" to occur at the end of the line.

grep -r pattern --include="*.txt" /path/to/dir/

Try: grep -o -E "(\\.([A-Za-z])+)+"
I used this to get multi-dotted/multiple extensions. So if the input was hello.tar.gz, then it would output .tar.gz.
For single dotted, use grep -o -E "\\.([A-Za-z])+$".
Tested on Cygwin/MingW+MSYS.

One more fix/addon of the above example:
# multi-dotted/multiple extensions
grep -oEi "(\\.([A-Za-z0-9])+)+" file.txt
# single dotted
grep -oEi "\\.([A-Za-z0-9])+$" file.txt
This will get file extensions like '.mp3' and etc.
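The two patterns can be compared on a single name. This sketch assumes a grep that supports -o (GNU and BSD grep both do); last_ext and all_exts are hypothetical helper names. It spells the character class as A-Za-z0-9 because, in ASCII, the range A-z also covers the characters [ \ ] ^ _ and the backtick:

```shell
# Hypothetical helpers that read names on stdin, one per line.
last_ext() { grep -oE '\.[A-Za-z0-9]+$'; }       # final extension only
all_exts() { grep -oE '(\.[A-Za-z0-9]+)+$'; }    # whole trailing run of extensions

printf '%s\n' hello.tar.gz | last_ext    # prints: .gz
printf '%s\n' hello.tar.gz | all_exts    # prints: .tar.gz
```

Names without any period produce no output from either helper.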

Just reviewing some of the other answers. The .* isn't necessary, and if you're looking for a certain file extension, it's best to include -i so that it's case-insensitive, in case the file is HELLO.ZIP, for example. The quotes are necessary, though: without them the shell strips the backslash and grep sees .zip$, where the dot matches any character.
grep -i '\.zip$'

If you just want to find in the current folder, why not with this simple command without grep ?
ls *.zip

Simply do:
grep ".*\.zip$"
The "$" indicates the end of the line, and the backslash makes the dot match a literal period.
