Zcat multiple compressed files in a for loop - zcat

I'm fairly new to bash scripting, and I've searched Stack Overflow for an answer that matches what I'm looking for but can't find an exact match. Apologies if this has already been answered.
I have multiple folders that each contain multiple compressed files; some have a gzip extension, some have no extension.
The only way I've been able to see the content of a compressed file is by doing
zcat filename.gzip > filename
My goal is to create a for loop that will:
zcat every file and write the output to a new file with an identifier added at the end, something like "-read"
delete the compressed file
Thank you!

To decompress every file that gzip can handle, using find and a while loop:
find . -type f | while IFS= read -r f; do
set -e # Stop the loop (it runs in a subshell) on any unhandled error.
# If zcat fails, e.g. because the file is not gzip data, skip the file and keep it.
zcat < "$f" > "$f-new" || continue
rm "$f"
done
IFS= and read -r keep leading whitespace and backslashes in file names intact, and the double quotes handle spaces. The while loop runs in a subshell, and set -e sets the stop-on-error option in it. A file that zcat cannot decompress is skipped by || continue and never reaches rm (an empty "$f-new" may be left behind for it); if rm itself fails, the loop stops instead of plowing ahead and destroying everything in sight.
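The loop above attempts every file and relies on zcat failing for input that is not gzip data. A minimal sketch of a stricter variant follows, assuming gzip is on the PATH and that the "-read" suffix from the question is wanted; unpack_tree is a hypothetical helper name, not something from the answer. It checks each file with gzip -t first, so no empty output file is created for non-gzip files, and it deletes the original only after decompression succeeds.

```shell
#!/usr/bin/env bash
# Sketch: decompress every gzip file under a directory to "<name>-read",
# then delete the compressed original. unpack_tree is a hypothetical name.
unpack_tree() {
    # "! -name '*-read'" skips outputs from an earlier run.
    # -exec sh -c '...' sh {} + passes the names as arguments, so spaces
    # and even newlines in file names are handled.
    find "${1:-.}" -type f ! -name '*-read' -exec sh -c '
        for f do
            # gzip -t exits non-zero for anything that is not valid gzip
            # data, whatever the extension says. Reading from stdin avoids
            # any dependence on the file suffix.
            gzip -t < "$f" 2>/dev/null || continue
            if gzip -dc < "$f" > "$f-read"; then
                rm -- "$f"          # remove the original only on success
            else
                rm -f -- "$f-read"  # drop the partial output, keep the original
            fi
        done' sh {} +
}
```

Calling unpack_tree with no argument processes the current directory tree; unpack_tree /some/dir processes another one.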

Related

Use sed/regex to rename a file - bash with macOS

I have a list of files that have had a date appended to their names.
Example: Chorus Left Octave (consolidated) (2020_10_14 20_27_18 UTC). The files end with .wav or .mp3.
I want to leave the (consolidated) but take out the date. I have come up with the regex and tested with regexr.com. It does format the text correctly there.
The regex is: /(\([0-9]+(.*)(?=.wav|.mp3))+/g
Now, I am trying to actually rename the files. In my terminal I have cd'ed into the folder with the files. Based on other answers here I have tried:
rename -n '/(\([0-9]+(.*)(?=.wav|.mp3))+/g' *.wav|*.mp3 - using rename installed with homebrew
sed '/(\([0-9]+(.*))+/g' *.wav|*.mp3
for f in *.wav|*.mp3; do mv "$f" "${f/(\([0-9]+(.*)(?=.wav|.mp3))+/g}” done
The first two do not throw any errors, but do not do any renames (I know that the -n after rename just prints out the files that will be changed, it doesn't actually change the files)
The last one starts a bash session.
I'd rather use rename or sed, which seem simpler to me. But what am I doing wrong?
In plain bash:
#!/bin/bash
pat='([0-9][0-9][0-9][0-9]_[0-9][0-9]_[0-9][0-9] [0-9][0-9]_[0-9][0-9]_[0-9][0-9] UTC)'
for f in *.mp3 *.wav; do echo mv "$f" "${f/$pat}"; done
Remove the echo preceding the mv after making sure it will work as intended. You may also consider adding the -i option to the mv in order to avoid clobbering an existing file unintentionally.
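One detail worth checking in the dry run: the pattern above starts at the opening parenthesis, so the space in front of the date stays in the name ("... (consolidated) .wav"). A minimal sketch that folds the leading space into the pattern is below. It assumes bash and the exact date layout shown in the question; strip_stamp is a hypothetical helper that only prints the new name and renames nothing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print NAME with a " (YYYY_MM_DD hh_mm_ss UTC)" stamp removed.
strip_stamp() {
    local d='[0-9][0-9]'
    # The leading space is part of the pattern so no stray blank is left behind.
    local pat=" (${d}${d}_${d}_${d} ${d}_${d}_${d} UTC)"
    # $pat is deliberately unquoted inside the expansion so the brackets
    # act as glob character classes.
    printf '%s\n' "${1/$pat}"
}

strip_stamp 'Chorus Left Octave (consolidated) (2020_10_14 20_27_18 UTC).wav'
# prints: Chorus Left Octave (consolidated).wav
```

In a rename loop this could be used as mv -i "$f" "$(strip_stamp "$f")", again after an echo dry run.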

use regex to specify output filename

I have a folder with many files where I only need some columns so I tried this to extract what I need:
mkdir ./raw_data/selection
doit() {
csvfix read_dsv -f 1,3,7 -s \; $1 > $1 | sed 's/raw_data/raw_data\/selection/'
}
export -f doit
Files_To_Parse=`ls ./raw_data/*csv`
parallel doit ::: $Files_To_Parse
This doesn't work.
But if I do this:
cd ./raw_data
doit() {
csvfix read_dsv -f 1,3,7 -s \; $1 > selection/$1
}
export -f doit
Files_To_Parse=`ls -1 *csv`
parallel doit ::: $Files_To_Parse
it works, but I'd like to be able to run this from the top folder of this project (i.e. to put this in a file named brief_csv.sh and call it from IDEs).
If you used Bash, you could:
for f in raw_data/*.csv
do
csvfix ... "$f" > raw_data/selection/"${f##*/}"
done
Also, instead of csvfix for extracting columns you could use cut:
$ cut -d \; -f 1,3,7 "$f" ...
I don't know the commands you are using, but this line:
csvfix read_dsv -f 1,3,7 -s \; $1 > $1 | sed ...
redirects the output into the same file it is reading, which cannot work: the shell truncates the file before csvfix gets to read it, and because stdout goes to the file, sed receives nothing through the pipe. That is why your modified version, which writes to a different file, works. You could use temporary files to store intermediate results, and don't be afraid to use many of them: debugging is easier (you can inspect each intermediate step) and the system doesn't suffer. /tmp is a good place for those intermediate files.
Use csvfix for the first step and redirect its output to /tmp/my-csvfix-intermediate; then have sed read /tmp/my-csvfix-intermediate and write to /tmp/my-sed-intermediate. After the last step, you can take the last intermediate result and overwrite the original file, perhaps after backing it up. You can move files wherever you need; I don't see any problem in running your script from an IDE. Just use as many steps as you need.
Avoid parallelizing while debugging; once the script works, you can add parallelization.
When two or more parallel processes try to write to the same file (/tmp/my-...-intermediate), you have one more problem. To overcome it, use a different file for every process. The shell variable $$ helps here: use file names like "/tmp/my-$$-blablabla". $$ expands to the PID of the shell, and two processes running at the same time cannot have the same PID. (Inside subshells such as ( ... ) or jobs started with &, $$ still holds the parent shell's PID, so it only separates independently started shells.)
Hope it helps, regards.
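As an alternative to building names from $$, mktemp creates a unique temporary file per call. The sketch below is only an illustration under stated assumptions: select_columns is a hypothetical name, cut stands in for csvfix (as the other answer suggests), and the column list 1,3,7 and the ";" separator are taken from the question.

```shell
#!/usr/bin/env bash
# Sketch: extract columns via a private temp file, then write the result
# under DEST using the input's base name. select_columns is a hypothetical name.
select_columns() {
    local src=$1 dest=$2 tmp
    tmp=$(mktemp) || return 1
    # Step 1: write to the intermediate file, never to the input itself.
    if cut -d ';' -f 1,3,7 "$src" > "$tmp"; then
        # Step 2: publish the finished result; ${src##*/} strips the directory part.
        mkdir -p -- "$dest" && cat -- "$tmp" > "$dest/${src##*/}"
    fi
    local status=$?
    rm -f -- "$tmp"
    return "$status"
}
```

From the project's top folder this could be driven by a plain loop, for f in raw_data/*.csv; do select_columns "$f" raw_data/selection; done, and parallelized later once the output looks right.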

Remove duplicate filename extensions

I have thousands of files named something like filename.gz.gz.gz.gz.gz.gz.gz.gz.gz.gz.gz
I am using the find command like this find . -name "*.gz*" to locate these files and either use -exec or pipe to xargs and have some magic command to clean this mess, so that I end up with filename.gz
Someone please help me come up with this magic command that would remove the unneeded instances of .gz. I had tried experimenting with sed 's/\.gz//' and sed 's/(\.gz)//' but they do not seem to work (or to be more honest, I am not very familiar with sed). I do not have to use sed by the way, any solution that would help solve this problem would be welcome :-)
one way with find and awk:
find $(pwd) -name '*.gz'|awk '{n=$0;sub(/(\.gz)+$/,".gz",n);print "mv",$0,n}'|sh
Note:
I assume there are no special characters (such as spaces) in your file names. If there are, you need to quote the file names in the mv command.
I added $(pwd) to get the absolute path of each found name.
You can remove the trailing |sh to check that the generated mv commands are correct.
If everything looks good, add the |sh back to execute the mv commands.
You may use
ls a.gz.gz.gz |sed -r 's/(\.gz)+/.gz/'
or without the regex flag
ls a.gz.gz.gz |sed 's/\(\.gz\)\+/.gz/'
ls *.gz | perl -ne '/((.*?\.gz).*)/; print "mv $1 $2\n"'
It will print shell commands to rename your files, it won't execute those commands. It is safe. To execute it, you can save it to file and execute, or simply pipe to shell:
ls *.gz | ... | sh
sed is great for replacing text inside files.
You can do that with bash string substitution:
for file in *.gz.gz; do
mv "${file}" "${file%%.*}.gz"
done
This might work for you (GNU sed):
printf '%s\n' *.gz | sed -r 's/^([^.]*)(\.gz){2,}$/mv -v & \1\2/e'
find . -name "*.gz.gz" |
while IFS= read -r f; do echo mv "$f" "$(sed -r 's/(\.gz)+$/.gz/' <<<"$f")"; done
This only previews the renaming (mv) command; remove the echo to perform actual renaming.
Processes matching files in the current directory tree, as in the OP (and not just files located directly in the current directory).
Limits matching to files that end in at least 2 .gz extensions (so as not to needlessly process files that end in just one).
When determining the new name with sed, makes sure that substring .gz doesn't just match anywhere in the filename, but only as part of a contiguous sequence of .gz extensions at the end of the filename.
Handles filenames with special chars. such as embedded spaces correctly (with the exception of filenames with embedded newlines.)
Using bash string substitution:
for f in *.gz.gz; do
mv "$f" "${f%%.gz.gz*}.gz"
done
This is a slight modification of jaypal's nice answer (which would fail if any of your files had a period as part of its name, such as foo.c.gz.gz). (Mine is not perfect, either) Note the use of double-quotes, which protects against filenames with "bad" characters, such as spaces or stars.
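To make the difference between the two expansions concrete, here is a small sketch that only computes the new names and renames nothing. The two function names are hypothetical, and foo.c.gz.gz.gz is just an illustrative input.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: print the name each expansion would produce.
strip_first_dot() { printf '%s\n' "${1%%.*}.gz"; }       # cuts at the first "."
strip_gz_run()    { printf '%s\n' "${1%%.gz.gz*}.gz"; }  # cuts at the first ".gz.gz"

strip_first_dot foo.c.gz.gz.gz   # prints: foo.gz    (the ".c" part is lost)
strip_gz_run    foo.c.gz.gz.gz   # prints: foo.c.gz
```

The second form still cuts at the first ".gz.gz" anywhere in the name, which is why it is not perfect either.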
If you wish to use find to process an entire directory tree, the variant is:
find . -name \*.gz.gz | \
while read f; do
mv "$f" "${f%%.gz.gz*}.gz"
done
And if you are fussy and need to handle filenames with embedded newlines, change the while read to while IFS= read -r -d $'\0', and add a -print0 to find; see How do I use a for-each loop to iterate over file paths output by the find utility in the shell / Bash?.
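Put together, the NUL-delimited variant described above might look like the sketch below. It assumes bash (read -d is not POSIX) and a find that supports -print0; collapse_gz is a hypothetical name. Note that the expansion is applied to the whole path, so a directory whose own name contains ".gz.gz" would confuse it.

```shell
#!/usr/bin/env bash
# Sketch: collapse repeated .gz extensions for every file under a directory.
# collapse_gz is a hypothetical name.
collapse_gz() {
    local f
    find "${1:-.}" -type f -name '*.gz.gz' -print0 |
    while IFS= read -r -d '' f; do
        # -d '' makes read stop at NUL bytes, so names containing
        # newlines arrive in one piece.
        mv -- "$f" "${f%%.gz.gz*}.gz"
    done
}
```

Replacing mv with echo mv gives a dry run first.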
But is this renaming a good idea? How was your filename.gz.gz created? gzip has guards against accidentally doing so. If you circumvent these via something like gzip -c $1 > $1.gz, buried in some script, then renaming these files will give you grief.
Another way with rename:
find . -iname '*.gz.gz' -exec rename -n 's/(\.\w+)\1+$/$1/' {} +
When happy with the results remove -n (dry-run) option.

Copying html files to create erb versions with bash script

I'm trying to write a bash (OSX) script that finds all html files in a directory and copies them to create erb files with underscores at the beginning of the file name. So test1.html would become _test1.html.erb for instance.
I was trying to do it a bit like this but there's probably a better way (and this way isn't finished)
find . -regex '.*/[^_].*\.html$' | while read file; do [need to do the copy X.html file to create new _X.html.erb file in here]; done
Any ideas?
Thanks!
Here is a for loop version:
for file in *.html ; do
cp "${file}" "_${file}.erb"
done
and here is a find version:
find ./ -name "*.html" -exec sh -c 'cp "$1" "_$(basename "$1").erb"' sh {} \;
find *.html | while read files
do
newname="_${files}.erb"
cp -v "${files}" "${newname}"
done
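The snippets above write the copies into the current directory or only look at one directory. A minimal sketch that walks a whole tree, puts each _X.html.erb next to its X.html, and skips files that already start with an underscore (as the regex in the question does) could look like this. make_partials is a hypothetical name, and the starting point is assumed to be a directory.

```shell
#!/usr/bin/env bash
# Sketch: copy X.html to _X.html.erb in the same directory, for a whole tree.
# make_partials is a hypothetical name.
make_partials() {
    find "${1:-.}" -type f -name '*.html' ! -name '_*' -exec sh -c '
        for f do
            # ${f%/*} is the directory part, ${f##*/} the file name.
            cp -- "$f" "${f%/*}/_${f##*/}.erb"
        done' sh {} +
}
```

Running make_partials with no argument starts from the current directory; the original .html files are left in place.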

Move all images in folder to subfolder, and update all references in text files to those images to their new location?

I have a folder which contains ~50 text files (PHP) and hundreds of images. I would like to move all the images to a subfolder, and update the PHP files so that any reference to those images points to the new subfolder.
I know I can move all the images quite easily (mv *.jpg /image, mv *.gif /image, etc...), but don't know how to go about updating all the text files - I assume a Regex has to be created to match all the images in a file, and then somehow the new directory has to be appended to the image file name? Is this best done with a shell script? Any help is appreciated (Server is Linux/CentOs5)
Thanks!
sed with the -i switch is probably what you're looking for. -i tells sed to edit the file in-place.
Something like this should work:
find /my/php/location -name '*.php' -print0 | xargs -0 sed -i -e 's,/old/location/,/new/location/,g'
You could do it like this:
#!/bin/sh
mkdir -p gfx
for f in *.jpg *.png *.gif; do
[ -e "$f" ] || continue # skip patterns that matched nothing
mv "$f" gfx/
for p in *.txt; do
# GNU sed: the backup suffix must be attached directly to -i
sed -i.bak "s,$f,gfx/$f,g" "$p"
done
done
It finds all jpg/png/gif files and moves them to the "gfx" subfolder, then for each txt file (or whatever kind of file you want edited, e.g. *.php) it uses sed in-place to alter the path.
By the way, it creates backup copies of the edited files with the extra extension ".bak". With GNU sed this can be avoided by using plain -i instead of -i.bak.
This will move all images to a subdir called 'images' and then change only links to image files by adding 'images/' just before the basename.
mkdir images
mv -f *.{jpg,gif,png,jpeg} images/
sed -i 's%[^/"'\'']\+\.\(gif\|jpg\|jpeg\|png\)%images/\0%g' *.php
If you have thousands of files, you may need to use find and xargs instead, which is a bit slower:
find ./ -maxdepth 1 -regex '.*\.\(gif\|jpg\|png\|jpeg\)' -exec mv {} images/ \;
find ./ -name \*.php -print0 | \
xargs -0 sed -i 's%[^/"'\'']\+\.\(gif\|jpg\|jpeg\|png\)%images/\0%g'
Caution, it will also change the path to images with remote urls. Also, make sure you have a full backup of your directory, php syntax and variable names might cause problems.
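Before touching real files, the substitution can be tried on sample lines. The sketch below wraps an equivalent expression in a filter; prefix_images is a hypothetical name, and it uses sed -E with & (assuming a sed that accepts -E) rather than the GNU-specific \+, \| and \0 forms above. The second call reproduces the remote-URL caveat.

```shell
#!/usr/bin/env bash
# Hypothetical filter: read text on stdin and insert "images/" in front of
# every image file name, mirroring the sed expression used above.
prefix_images() {
    sed -E 's%[^/"'\'']+\.(gif|jpg|jpeg|png)%images/&%g'
}

printf '%s\n' '<img src="logo.png">' | prefix_images
# prints: <img src="images/logo.png">
printf '%s\n' '<img src="http://example.com/a.png">' | prefix_images
# prints: <img src="http://example.com/images/a.png">
```

Piping a few representative lines from the PHP files through the filter shows which references would change before sed -i is run for real.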