Bash: go through a list of dirs and generate md5 - regex

What would be the bash script that:
Goes through a directory, and puts all the sub-directories in an array
For each dir, generate an md5 sum of a file inside that dir
Also, the file whose md5sum has to be generated doesn't always have the same name and path. However, the pattern is always the same:
/var/mobile/Applications/{ the dir name here is taken from the array }/{some name}.app/{ binary, whose name is the same as its parent dir, but without the .app extension }
I've never worked with bash before (and have never needed to), so this may be something really simple and nooby. Anybody got an idea? As can be seen from the path, this is designed to be run on an iDevice.

for dir in /var/mobile/Applications/*; do
    for app in "$dir"/*.app; do
        appdirname=${app##*/}       # e.g. SomeName.app
        appname=${appdirname%.app}  # e.g. SomeName
        binary="$app/$appname"
        if [ -f "$binary" ]; then
            echo "I: dir=$dir appname=$appname binary=$binary"
        fi
    done
done
Try this; I hope the code is straightforward. The two things worth explaining are:
${app##*/}, which uses the ## operator to strip off the longest prefix matching the expression */.
${appdirname%.app}, which uses the % operator to strip off the shortest suffix matching the expression .app. (You could have also used %% (strip longest suffix) instead of %, since the pattern .app is always four characters long.)
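To actually generate the checksums rather than just echoing, swap the echo for an md5 call. A minimal sketch, assuming md5sum is available on the device (jailbroken iDevices often ship md5 or openssl md5 instead; substitute whichever exists):
for dir in /var/mobile/Applications/*; do
    for app in "$dir"/*.app; do
        appdirname=${app##*/}
        appname=${appdirname%.app}
        binary="$app/$appname"
        # hash the binary only if it actually exists
        [ -f "$binary" ] && md5sum "$binary"
    done
done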

Try something like:
ls -1 /Applications/*/Contents/Info.plist | while read name; do md5 -r "$name"; done
The above will show the md5 checksum of the Info.plist file for every application, like:
d3bde2b76489e1ac081b68bbf18a7c29 /Applications/Address Book.app/Contents/Info.plist
6a093349355d20d4af85460340bc72b2 /Applications/Automator.app/Contents/Info.plist
f1c120d6ccc0426a1d3be16c81639ecb /Applications/Calculator.app/Contents/Info.plist
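Adapted to the iDevice layout from the question, the same pipeline might look like this (a sketch; it assumes an md5 tool exists on the device and that no path contains a newline):
ls -1d /var/mobile/Applications/*/*.app | while read -r app; do
    # the binary shares its name with the .app directory, minus the extension
    md5 -r "$app/$(basename "$app" .app)"
done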

Bash itself is easy, but you need to know the command-line tools of your system.
To print the md5 hash of all files in a directory recursively:
find /yourdirectory/ -type f | xargs md5sum
If you only want to list the tree of directories:
find /tmp/ -type d
You can generate a list with:
MYLIST=$( find /tmp/ -type d )
Use "for" for iterate the list:
for i in $MYLIST; do
echo $i;
done
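Note that for i in $MYLIST relies on word splitting, so it breaks on paths containing spaces. A safer sketch pipes find straight into a read loop:
find /tmp/ -type d | while IFS= read -r i; do
    echo "$i"
done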
If you are new to bash, these guides may help:
http://tldp.org/LDP/Bash-Beginners-Guide/html/
http://tldp.org/HOWTO/Bash-Prog-Intro-HOWTO.html

For the love of BASH, regex, locate & find - contains A not B

Goal: Regex pattern for use with find and locate that "Contains A but not B"
So I have a bash script that manipulates a few video files.
In its current form, I create a variable to act on later with a for loop that works well:
if [ "$USE_FIND" = true ]; then
vid_files=$(find "${DIR}" -type f -regex ".*\.\(mkv\|avi\|ts\|mp4\|m2ts\)")
else
vid_files=$(locate -ir "${DIR}.*\.\(mkv\|avi\|ts\|mp4\|m2ts\)")
fi
So "contains A" is any one of the listed extensions.
I'd like to add a condition where, if a certain string (B) is contained, the file isn't added to the array (B can be part of a directory name or a filename).
I've spent some time with lookaheads trying to implement this to no avail. So an example of "not contains B" as "Robot" - I've used different forms of .*(?!Robot).*
e.g. ".*\(\?\!Robot\).*\.\(mkv\|avi\|ts\|mp4\|m2ts\)" for find but it doesn't work.
I've sort of exhausted regex101.com, the terminal, and chmod +x at this point and would welcome some help. I think it's the fact that it's called through a bash script that's causing me the difficulty.
One of my many sources of reference in trying to sort this:
Ref: Is there a regex to match a string that contains A but does not contain B
You may want to avoid using find inside a command substitution to build a list of files, as filenames can, while this is admittedly rare, contain newlines. (As an aside, find's default -regex flavour is emacs regular expressions, which have no perl-style lookaheads; that is why the .*(?!Robot).* attempts fail.)
You could use an array instead, which will handle file names without issues (assuming the array is later expanded properly).
declare -a vid_files=()
while IFS= read -r -d '' file
do
    [[ "$file" =~ Robot ]] && continue   # skip anything containing B
    vid_files+=("$file")
done < <(find "${DIR}" -type f -regex ".*\.\(mkv\|avi\|ts\|mp4\|m2ts\)" -print0)
The -print0 option of find generates a null byte to separate the file names, and the -d '' option of read allows a null byte to be used as a record separator (both obviously go together).
You can get the list of files using "${vid_files[@]}" (double quotes are important to prevent word splitting). You can also iterate over the list easily:
for file in "${vid_files[@]}"
do
    echo "$file"
done
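As an alternative sketch, find itself can do the exclusion, so no post-filtering is needed (this assumes B, "Robot" here, may appear anywhere in the path):
declare -a vid_files=()
while IFS= read -r -d '' file
do
    vid_files+=("$file")
done < <(find "${DIR}" -type f -regex ".*\.\(mkv\|avi\|ts\|mp4\|m2ts\)" ! -path '*Robot*' -print0)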

locate command with regular expression to locate multiple files at a time

There are files like compile.x86.log, compile.x86.log-1, compile.x86.log-2, compile.x86.log-3 and error.log, error.log_1, error.log_2, error.log_3. I want to use the locate command to match only compile.x86.log and error.log among them.
So far I tried
echo $(/usr/bin/locate -ir '^/\([^.][^/]\+/\)\+compile.x86\.log$')
echo $(/usr/bin/locate -ir '^/\([^.][^/]\+/\)\+error\.log$')
With the above individual approach, each search takes about 0m18.068s.
How can I combine the above two?
Also, please suggest a better solution, using only the locate command, preferably with the locate -b option, to search for the exact names (compile.x86.log and error.log) in less time.
I have tried echo $(/usr/bin/locate -i -b "compile.x86.log")
It takes only 0m1.887s but returns compile.x86.log-1, compile.x86.log-2, and compile.x86.log-3 in the result as well, instead of only compile.x86.log.
Is there any way to grep the locate result so that only compile.x86.log and error.log are returned in this approach?
Because the locate database outputs entries as absolute path names, and the pattern match test applies to the whole path name (the globbing character * does not treat / specially),
locate -i '*/compile.x86.log' '*/error.log'
does what you want.
By the way, wrapping a command in echo $(…) is wasted effort; just run the command directly.
Following your logic, the simplest answer is:
echo $(/usr/bin/locate -ir '^/\([^.][^/]\+/\)\+\(compile\.x86\|error\)\.log$')
where \(compile\.x86\|error\) is the alternation meaning "this or that" (note that in the basic regular expressions locate -r uses, grouping and alternation must be written \( \| \)).
Otherwise, using the find command would be better:
find . -type f \( -name "compile.x86.log" -o -name "error.log" \)
The parentheses are needed; without them, -type f would bind only to the first -name.
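To answer the grep part of the question directly: you can keep the fast locate -b lookup and filter its output down to exact basenames. A sketch (adjust the alternation to your file names):
/usr/bin/locate -i -b "compile.x86.log" "error.log" | grep -iE '/(compile\.x86|error)\.log$'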

batch renaming of files with perl expressions

This should be a basic question for a lot of people, but I am a biologist with no programming background, so please excuse my question.
What I am trying to do is rename about 100,000 gzipped data files that have existing name of a code (example: XG453834.fasta.gz). I'd like to name them to something easily readable and parseable by me (example: Xanthomonas_galactus_str_453.fasta.gz).
I've tried to use sed, rename, and mmv, to no avail. If I use any of those commands in a one-off script they work fine; it's just when I try to incorporate variables into a shell script that I run into problems. I'm not getting any errors, just no names are changed, so I suspect it's an I/O error.
Here's what my files look like:
#! /bin/bash
# change a bunch of file names
file=names.txt
while IFS=' ' read -r r1 r2;
do
mmv ''$r1'.fasta.gz' ''$r2'.fasta.gz'
# or I tried many versions of: sed -i 's/"$r1"/"$r2"/' *.gz
# and I tried many versions of: rename -i 's/$r1/$r2/' *.gz
done < "$file"
...and here are the first lines of my txt file, with a single-space delimiter:
cat names.txt
#find #replace
code1 name1
code2 name2
code3 name3
I know I can do this with python or perl, but since I'm stuck here working on this particular script I want to find a simple solution to fixing this bash script and figure out what I am doing wrong. Thanks so much for any help possible.
Also, I tried to cat the names file (see comment from Ashoka Lella below) and then use awk to move/rename. Some of the files have variable names (but will always start with the code), so I am looking for a find & replace option to just replace the "code" with the "name" and preserve the file name structure.
I suspect I am not escaping the variable within the single quotes of the perl expression, but I have pored over a lot of manuals and I can't find the way to do this.
If you're absolutely sure that the filenames don't contain spaces or tabs, you can try the following:
xargs -n2 < names.txt echo mv
This is a DRY run (it will only print what would be done); if you're satisfied with the result, remove the echo.
If you want to check for the existence of the target, use
xargs -n2 < names.txt echo mv -i
If you NEVER want to allow overwriting of the target, use
xargs -n2 < names.txt echo mv -n
Again, remove the echo once you're satisfied.
I don't think that you need to be using mmv, a simple mv will do. Also, there's no need to specify the IFS, the default will work for you:
while read -r src dest; do mv "$src" "$dest"; done < names.txt
I have double quoted the variable names as it is generally considered good practice but in this case, a space in either of the filenames will result in read not working as you expect.
You can put an echo before the mv inside the loop to ensure that the correct command will be executed.
Note that in your file names.txt, the .fasta.gz suffix is already included, so you shouldn't be adding it inside the loop as well. Perhaps that was your problem?
This should rename all files from column 1 to column 2 of names.txt, provided they are in the same folder as names.txt:
cat names.txt | awk '{print "mv "$1" "$2}' | sh
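Since some of the files only start with the code (per the edit in the question), a prefix-replacement loop preserves the rest of the file name. A minimal sketch, assuming one space-separated pair per line and no header line in names.txt:
while read -r code name; do
    for f in "$code"*; do
        # rename only if a match exists; -n refuses to overwrite
        [ -e "$f" ] && mv -n -- "$f" "$name${f#"$code"}"
    done
done < names.txt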

unix find filenames that are lexicographically less than a given filename

I have a list of files in a directory that are automatically generated by a system with the date in the filename. Some examples are: audit_20111020, audit_20111021, audit_20111022, etc.
I want to clean up files older than 18 months, so I want to put together a unix find command that will find files lexicographically less than audit_20100501 and delete them.
Does anyone know how to use lexicographical order as a criterion in the find command?
Another Perl variant:
perl -E'while(<audit_*>) { say if /(\d{8})/ && $1 < 20100501}'
Replace say by unlink if it prints expected filenames.
Note: < performs numerical comparison, use lt if you want string comparison.
With Perl it's easy. Type perl and:
for (glob "*")
{
my($n) = /(\d+)/;
unlink if ($n < 20100501);
}
^D
Test before using. Note that I'm assuming this is a fixed format and that the directory only contains these files.
It is possible to sort find's result using the sort command:
find . -name "audit*" | sort -n
... and then split the sorted list at the cutoff name.
But for what you want to do, i.e. delete files older than a certain date (18 months is ~547 days), you could use this instead (note the + sign: -ctime +547 matches files changed more than 547 days ago, and -type f matches files rather than directories):
find . -type f -ctime +547 -name 'audit_*' | xargs -I{} rm -rf {}
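If you specifically want the lexicographic comparison, plain bash can do it: inside [[ ]] the < operator compares strings. A sketch assuming the audit_YYYYMMDD naming (keep the echo until the output looks right, then drop it to actually delete):
for f in audit_*; do
    if [[ "$f" < "audit_20100501" ]]; then
        echo rm -- "$f"
    fi
done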

find and replace within file

I have a requirement to search for a pattern which is something like:
timeouts = {default = 3.0; };
and replace it with
timeouts = {default = 3000.0;.... };
i.e. multiply the timeout by a factor of 1000.
Is there any way to do this for all files in a directory?
EDIT:
Please note that some of the files in the directory are symlinks. Is there any way to get this done for symlinks also?
Please note that "timeouts" also exists as a substring elsewhere in the files, so I want to make sure that only this line gets replaced. Any solution using sed, awk, or perl is acceptable.
Give this a try:
for f in *
do
    sed -i 's/\(timeouts = {default = [0-9]\+\)\(\.[0-9]\+;\)\( };\)/\1000\2....\3/' "$f"
done
It will make the replacements in place for each file in the current directory. Some versions of sed require a backup extension after the -i option. You can supply one like this:
sed -i .bak ...
Some versions don't support in-place editing. You can do this:
sed '...' "$f" > tmpfile && mv tmpfile "$f"
Note that this is obviously not actually multiplying by 1000, so if the number is 3.1 it would become "3000.1" instead of 3100.0.
You can do this:
perl -pi -e 's/(timeouts\s*=\s*\{default\s*=\s*)([0-9.-]+)/$1 . $2*1000/e' *
The /e makes the replacement an evaluated expression: group 1 concatenated with the number multiplied by 1000. Note the numeric result drops a trailing .0 (3.0 becomes 3000, not 3000.0); the sprintf variant below preserves the decimal formatting.
One suggestion for whichever solution above you decide to use - it may be worth it to think through how you could refactor to avoid having to modify all of these files for a change like this again.
Do all of these scripts have similar functionality?
Can you create a module that they would all use for shared subroutines?
In the module, could you have a single line that would allow you to have a multiplier?
For me, anytime I need to make similar changes in more than one file, it's the perfect time to be lazy to save myself time and maintenance issues later.
$ perl -pi.bak -e 's/\w+\s*=\s*{\s*\w+\s*=\s*\K(-?[0-9.]+)/sprintf "%0.1f", 1000 * $1/eg' *
Notes:
The regex matches just the number (see \K in perlre)
The /e means the replacement is evaluated
I include a sprintf in the replacement just in case you need finer control over the formatting
Perl's -i can operate on a bunch of files
EDIT
It has been pointed out that some of the files are symbolic links. Given that this process is not idempotent (running it twice on the same file is bad), you had better generate a unique list of files in case one of the links points to a file that appears elsewhere in the list. Here is an example with find, though the code for a pre-existing list should be obvious.
$ find -L . -type f -exec realpath {} \; | sort -u | xargs -d '\n' perl ...
(Assumes none of your filenames contain a newline!)