Recursively go through directories and files in bash + use wc - regex

I need to go through directories recursively. The first argument must be the directory to start from; the second argument is a regex describing the file names to match.
ex. ./myscript.sh directory "regex"
While the script recursively walks the directories, it must use wc -l to count the lines in every file whose name matches the regex.
How can I use find with -exec to do that? Or is there maybe some other way to do it? Please help.
Thanks

Yes, you can use find (note that -iname matches a shell glob pattern, case-insensitively, not a full regex):
$ find DIR -iname "regex" -type f -exec wc -l '{}' \;
Or, if you want to count the total number of lines, in all files:
$ find DIR -iname "regex" -type f -exec wc -l '{}' \; | awk '{ SUM += $1 } END { print SUM }'
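As a side note, the `\;` form runs wc once per file. With the POSIX `-exec … +` form, find batches many files into a single wc invocation, and wc then prints its own grand-total line (one per batch on very large file sets). A small self-contained sketch, using a throwaway `demo` directory:

```shell
mkdir -p demo/sub
printf 'one\ntwo\n' > demo/a.txt
printf 'three\n'    > demo/sub/b.txt

# -exec ... + batches the files into one wc call, so wc itself
# appends a "total" line (one per batch on huge file sets)
find demo -iname "*.txt" -type f -exec wc -l '{}' +
```

The last line of the output is the grand total, so for small trees this replaces the awk pipeline above.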
Your script would then look like:
#!/bin/bash
# $1 - name of the directory - first argument
# $2 - regex - second argument
if [ $# -lt 2 ]; then
    echo "Usage: ./myscript.sh DIR \"REGEX\"" >&2
    exit 1
fi
find "$1" -iname "$2" -type f -exec wc -l '{}' \;
Edit: if you need fancier regular expressions, use -regextype posix-extended with -regex instead of -iname, as noted by @sudo_O in his answer.
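For example, to count lines only in files whose names are all digits followed by `.log` (a sketch; `-regextype` is a GNU find extension, and `-regex` matches against the whole path, not just the basename, so the pattern is anchored with `.*/`):

```shell
mkdir -p logs
printf 'a\nb\n' > logs/12.log
printf 'c\n'    > logs/app.log

# -regex tests the full path, hence the leading '.*/'
find logs -regextype posix-extended -type f -regex '.*/[0-9]+\.log' -exec wc -l '{}' \;
```

Only logs/12.log is counted; logs/app.log does not match the regex.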

Strip unwanted characters from a series of text files

Hi, I've got a list of CSV files that need to be cleaned up by removing some unwanted characters.
original:
9: ["2019-4-24",-7.101458109105941]
10: ["2019-5-6",-7.050609022950812]
100: ["2019-5-6",-7.050609022950812]
I'd like to modify as:
2019-4-24,-7.101458109105941
2019-5-6,-7.050609022950812
2019-5-6,-7.050609022950812
There are dozens of files in this format, and I was thinking of writing a sed command that substitutes the unwanted characters with nothing for all the files in the directory, but the following don't seem to work.
find ./ -type f -exec sed -i '' -e "s/^[[:space:]]*//" {} \;
find ./ -type f -exec sed -i '' -e "s/\[//" {} \;
find ./ -type f -exec sed -i '' -e "s/\]//" {} \;
Many thanks for suggestions.
I found this to work on my Linux machine:
find ./ -type f -exec sed -i "s/^.\+\[//;s/\"//g;s/\]//" {} \;
Which, from what I gather, is equivalent to the following on macOS:
find ./ -type f -exec sed -i '' "s/^.\+\[//;s/\"//g;s/\]//" {} \;
It comprises three substitutions, separated by semicolons:
s/^.\+\[// deletes everything from the start of the line up to and including the "[" character.
s/\"//g deletes all occurrences of the double-quote character.
s/\]// deletes the final "]" at the end.
And please make a backup or something if you are going to use sed -i.
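Before running the in-place version, you can preview what the three substitutions do by piping one sample line through the same sed program (no files are touched; with GNU sed, `-i.bak` is another cheap safety net, keeping a `.bak` copy of each file):

```shell
# Dry run of the three substitutions on a single sample line
echo '9: ["2019-4-24",-7.101458109105941]' | sed 's/^.\+\[//;s/\"//g;s/\]//'
# → 2019-4-24,-7.101458109105941
```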

Nesting Find and sed command in if else

I am using a find and sed command to replace characters in a file; see code 1 below:
find . -type f -exec sed -i '/Subject/{:a;s/(Subject.*)Subject/\1SecondSubject/;tb;N;ba;:b}' {} +
I have multiple files to process, and in some of them the Subject I am trying to replace is not present.
Is there a way I can first check whether the file contains 'Subject', and if not, execute another command? I.e.:
Check if the file contains the string 'Subject'
If true, execute code 1 above
If there is no instance of Subject, execute code 2 below
find . -name "*.html" -exec rename 's/.html$/.xml/' {} \;
Any ideas? Thanks in advance
Something like this should work.
find . -type f \( \
-exec grep -q "Subject" {} \; \
-exec sed -i '/Subject/{:a;s/(Subject.*)Subject/\1SecondSubject/;tb;N;ba;:b}' {} \; \
-o \
-exec rename 's/.html$/.xml/' {} \; \)
-exec evaluates to the exit status of the command it runs, so -exec grep -q "Subject" {} \; is true only if grep finds a match. And since the short-circuiting -o (or) has lower precedence than the implied -a (and) between the other operators, the rename branch conversely only gets executed when the grep is false.
You can use find in a process substitution like this:
while IFS= read -r -d '' file; do
    echo "processing $file ..."
    if grep -q "Subject" "$file"; then
        sed -i ':a;s/(Subject.*)Subject/\1SecondSubject/;tb;N;ba;:b' "$file"
    elif [[ $file == *.html ]]; then
        rename 's/.html$/.xml/' "$file"
    fi
done < <(find . -type f -print0)

How to recursively change files in directories whose name matches a string in Perl?

I have many directories for different projects. Under some project directories, there are subdirectories named "matlab_programs". In only subdirectories named matlab_programs, I would like to replace the string 'red' with 'blue' in files ending with *.m.
The following perl code will recursively replace the strings in all *.m files, regardless of what subdirectories the files are in.
find . -name "*.m" | xargs perl -p -i -e "s/red/blue/g"
And to find the full paths of all directories called matlab_programs,
find . -type d -name "matlab_programs"
How can I combine these so I only replace strings if the files are in a subdirectory called matlab_programs?
Perl has the excellent File::Find module, which lets you specify a callback to be called on each file.
So you can specify complex compound criteria, like this:
#!/usr/bin/env perl
use strict;
use warnings;
use File::Find;
sub find_files {
    return unless m/\.m\z/;    # skip any files that don't end in .m
    if ( $File::Find::dir =~ m/matlab_programs$/ ) {
        print $File::Find::name, " found\n";
    }
}
find( \&find_files, "." );
And then you can do whatever you wish with the files you find - like opening/text replacing and closing.
You want to find all directories named matlab_programs using
find . -type d -name "matlab_programs"
and then execute
find $f -name "*.m" | xargs perl -p -i -e "s/red/blue/g"
on all results $f. Judging by your use of xargs, I'll assume there are no special characters such as spaces in your file names, so the following should work:
find `find . -type d -name "matlab_programs"` -name "*.m" |
xargs perl -p -i -e "s/red/blue/g"
or
find . -type d -name "matlab_programs" |
while read f
do
find $f -name "*.m" | xargs perl -p -i -e "s/red/blue/g"
done |
xargs perl -p -i -e "s/red/blue/g"
Incidentally, I'd use single quotes here; I always use them whenever the quoted string is to be taken literally.
Do you have bash? The $(...) syntax works like backticks (the way both the shell and Perl use them) but they can be nested.
perl -pi -e s/red/blue/g $(find $(find . -type d -name matlab_programs) -type f -name \*.m)
Many flavors of find also support a -path pattern test, so you can just combine your filename conditions into that argument
perl -pi -e s/red/blue/g $(find . -type f -path \*/matlab_programs/\*.m)
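If file names with spaces ever do show up, letting find hand the list straight to perl sidesteps the word-splitting that both the xargs and `$(...)` variants suffer from (`-exec … +` batches files just like xargs does). A self-contained sketch using a throwaway `proj` tree:

```shell
mkdir -p proj/matlab_programs proj/other
echo 'red green' > proj/matlab_programs/plot.m
echo 'red green' > proj/other/notes.m

# find passes file names as real arguments, so spaces never split;
# only .m files under a matlab_programs directory are rewritten
find proj -type f -path '*/matlab_programs/*.m' -exec perl -pi -e 's/red/blue/g' '{}' +

cat proj/matlab_programs/plot.m   # → blue green
cat proj/other/notes.m            # → red green (untouched)
```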

List the files in which i'm replacing with sed

I have the command
find . -type f \( ! -name "*.png" \) -print0 | \
xargs -0 sed -i 's#word#replace#g'
This command works so far but i want to show the files in which sed replaces text.
Is there some parameter which allows that?
You can use the -print and -exec actions together to print each file as it is processed (note this lists every file sed touches, not just the ones where a replacement actually occurred):
find . -type f \( ! -name "*.png" \) -print -exec sed -i 's#word#replace#g' {} \; 2>/dev/null
You are missing a cmp to compare the file before and after the change. Something along these lines should work:
find . -type f \( ! -name "*.png" \) -exec sh -c 'sed "s/word/replace/g" "$0" > "$0.$$"; if ! cmp -s "$0.$$" "$0"; then echo "$0"; mv "$0.$$" "$0"; else rm "$0.$$"; fi' {} \;
find - selects, under the current directory, all files whose names do not match *.png
exec - runs sed, replacing word with replace in the file found by find, writing the output to a temp file named after the original plus the shell's process id
if ! cmp - compares the new and old files; if they differ, prints the file name and moves the temp file over the original, otherwise deletes the temp file
I don't know what platform you are on, but here is one possibility. Change your xargs to run a shell so you can do multiple commands, and inside that shell test whether the file is newer than a marker file you created at the start - i.e. it has been changed:
touch /tmp/go
find . -type f \( ! -name "*.png" \) -print0 | xargs -0 -I % sh -c 'sed -i "s#word#replace#g" "%"; [ "%" -nt /tmp/go ] && echo "%"'

Search for lines with specific text in files and output file names (file command and grep)

I know this question may be simple to some of you, but I have tried several combinations and googled a lot, without success.
Problem: I have a bunch of files with a given file name, but in different directories.
For example, I have a file called 'THEFILE.txt' in directories a, b, c, d. I am in a directory that has these as subdirectories. In each THEFILE.txt I am looking for lines with the following pattern: 'Has this property blah blah blah _apple'. So what I know for sure about the line is that it starts with 'Has this property' and ends with '_apple'.
I tried:
find . -name 'THEFILE.txt' -exec grep -l 'Has this property' {} \;
This works, but I get every file containing 'Has this property', even when the line doesn't end with _apple. I only want the ones with _apple at the end.
So I tried:
find . -name 'THEFILE.txt' -exec grep -l 'Has this property*_apple' {} \; //Does not work, and from my google searches, I don't expect it to.
So, next I tried:
find . -name 'THEFILE.txt' -exec grep -l 'Has this property[!-~]*_apple' {} \;
//DOES NOT WORK
find . -name 'THEFILE.txt' -exec grep 'Has this property' {} \; | grep '_apple$'
//This outputs all matching lines, but not the file names
find . -name 'THEFILE.txt' -exec grep 'Has this property' {} \; | grep -l '_apple$'
//Says file is stdin
Expected output: (say files a and c have desired lines)
./a/THEFILE.txt
./c/THEFILE.txt
Your second attempt was almost there. With a little adjustment:
find . -name THEFILE.txt -exec grep -q '^Has this property.*_apple$' '{}' ';' -print
it is more precise than recursive grepping and simpler (no pipeline).
The reason the above works (as opposed to grep -l) is that the -exec action evaluates to the exit status of its command.
grep exits with status 0 (i.e. true) when it finds what it looked for; that makes -exec yield true, which in turn causes the next action (-print) to run as well.
You can just use grep -r, and pipe it into awk if you only want the filenames.
grep -r "_apple$" . | awk -F: '{print $1}'
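Note that the awk variant prints the file name once per matching line, so a file with several `_apple` lines appears several times. GNU grep can do the whole job alone: `-l` prints each matching file exactly once, and `--include` (a GNU extension) restricts the search to the wanted file name. A self-contained sketch reproducing the question's layout:

```shell
mkdir -p a c
echo 'Has this property blah blah blah _apple' > a/THEFILE.txt
echo 'Has this property blah blah blah _pear'  > c/THEFILE.txt

# -l: one line per matching file; --include: only search THEFILE.txt
grep -rl --include='THEFILE.txt' '^Has this property.*_apple$' .
# → ./a/THEFILE.txt
```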