How can I make this script more concise? - regex

I wrote a little script which prints the names of files containing problematic character sequences.
#!/bin/bash
# Finds all files in the repository that contain
# undesired characters or sequences of characters
pushd .. >/dev/null
# Find Windows newlines
find . -type f | grep -v ".git/" | grep -v ".gitmodules" | grep -v "^./lib" | xargs grep -l $'\r'
# Find tabs (should be spaces)
find . -type f | grep -v ".git/" | grep -v ".gitmodules" | grep -v "^./lib" | xargs grep -l $'\t'
# Find trailing spaces
find . -type f | grep -v ".git/" | grep -v ".gitmodules" | grep -v "^./lib" | xargs grep -l " $"
popd >/dev/null
I'd like to combine this into one line, i.e. by having grep look for \r OR \t OR trailing spaces. How would I construct a regex to do this? It seems that escape characters need a special quoting sequence ($'\X'), and I'm not sure how to combine these...
I'm running OS X, and am looking for a solution that works on both BSD and GNU based systems.

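find . -type f | grep -E -v ".git/|.gitmodules|^./lib" | xargs grep -E -l $'\r|\t| $'
The $ has to sit outside the quotes, as in $'\r|\t| $'. That is bash's ANSI-C quoting, which turns \r and \t into the actual carriage-return and tab characters before grep sees them. Written as '$\r|$\t| $', the $ becomes an end-of-line anchor inside the pattern and only the trailing-space branch can match.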
I'm using grep's -E (extended regular expression) option, which allows OR-ing together multiple search targets with |.
Older Unix systems may or may not support the -E option, so if you get an error message flagging it, replace every grep -E with egrep.
I hope this helps.
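A minimal sketch of the combined check, assuming bash (needed for the $'...' quoting). The function name find_bad_files is made up for illustration, and the find tests only approximate the grep -v filters from the question: they skip the top-level .git and lib directories and any file named .gitmodules. Both -path and -exec ... {} + are available in BSD and GNU find.

```shell
#!/bin/bash
# find_bad_files is a hypothetical helper name, not part of the original script.
# Lists files under DIR (default: current directory) that contain a carriage
# return, a tab, or a line ending in a space.
find_bad_files() {
    (
        cd "${1:-.}" || exit 1
        # Skip ./.git, ./lib and .gitmodules, then let grep test all three
        # conditions in a single extended-regex alternation.
        find . -type f \
            ! -path './.git/*' \
            ! -name '.gitmodules' \
            ! -path './lib/*' \
            -exec grep -lE $'\r|\t| $' {} +
    )
}
```

Calling find_bad_files .. from the script's directory would replace the three pipelines and the pushd/popd pair, since the function changes directory inside a subshell.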

Related

Find & Replace String in All Found Files with LaTeX $sim$ -> $\sim$

I know this sort of question has been asked many times before, but I'm running into an odd circumstance where my feeble brain forgot to include a \ while calling $\sim$ in some markdown files. I need to go through and replace all instances of $sim$ with $\sim$. My code is running but not actually replacing any of the words that I want. Here are some variations I have tried:
grep -rl '\$sim\$' . | xargs sed -i 's/\$sim\$/$\sim$/g'
grep -rlF '$sim$' . | xargs sed -i 's/\$sim\$/$\sim$/g'
grep -rlF '$sim$' . | xargs sed -i 's/$sim$/$\sim$/g'
grep -rlF '$sim$' . | xargs sed -i '' -e 's/$sim$/$\sim$/g'
And other odd variations on a theme. The code just runs with no output, but when I check the files nothing has changed. I figure this is either a sed issue (I'm on macOS) or a regex issue.
Like this:
grep -rlF '$sim$' . | xargs sed -i 's/$sim\$/$\\sim$/g'
For macOS:
grep -rlF '$sim$' . | xargs sed -i '' 's/$sim\$/$\\sim$/g'
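sed -i changes files in place, and xargs hands the file names from grep to sed as arguments, so sed does get files to work on. The more likely culprit is the replacement text: a single backslash before s is not kept as a literal backslash, so it has to be doubled. On macOS, sed -i also expects a backup suffix (possibly empty) as its next argument.
If you would rather skip grep and xargs, you can use something like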
find . -type f -exec sed -i 's/\$sim\$/$\\sim\$/g' {} \;
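A rough sketch of a variant that should behave the same on GNU and BSD sed, under the assumption that a temporary backup file is acceptable: -i.bak, with the suffix attached to the option, is accepted by both. The helper name fix_sim is made up for illustration, and file names containing newlines are not handled.

```shell
# fix_sim is a hypothetical helper name. It rewrites $sim$ to $\sim$ in every
# file under DIR (default: current directory) that contains the literal $sim$.
fix_sim() {
    # Collect the file list first so backups created below are never scanned.
    list=$(grep -rlF '$sim$' "${1:-.}") || return 0
    printf '%s\n' "$list" | while IFS= read -r file; do
        # \$ matches a literal dollar sign; \\ in the replacement emits one
        # backslash. The .bak backup is removed once sed succeeds.
        sed -i.bak 's/\$sim\$/$\\sim$/g' "$file" && rm -f "$file.bak"
    done
}
```

Running fix_sim . from the top of the markdown tree would apply the replacement everywhere below it.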

Find files with regex match and different regex not match

I have three files foo1.txt, foo2.txt and foo3.txt, which contain the following lines
# foo1.txt
JOBDONE
and
# foo2.txt
Execution halted
and
# foo3.txt
Execution halted
JOBDONE
I have been able to find the ones with both JOBDONE and Execution halted using:
find ./foo*.txt | xargs grep -l 'Execution halted' | xargs grep -l "JOBDONE"
But have not been able to find those files which have JOBDONE or Execution halted but not both. I have tried:
find ./foo*.txt | xargs grep -lv "JOBDONE" | xargs grep -l "Execution halted"
find ./foo*.txt -exec grep -lv "JOBDONE" {} \; | xargs grep -l "Execution halted"
but both incorrectly (to my understanding) return
./foo2.txt
./foo3.txt
What is wrong with my understanding of how xargs and exec works with grep and how do I use grep or another portable command to select those logs that have JOBDONE but not Execution halted or vice versa?
Here is a GNU awk solution (GNU awk is needed because RS is more than one character):
awk -v RS="#-#-#" '/JOBDONE/ && /Execution halted/ {print FILENAME}' foo*
foo3.txt
Setting RS to something that does not occur in the file makes awk treat the whole file as a single record.
The test then checks whether that record contains both strings and, if so, prints the file name.
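For the "one but not both" case the question asks about, a plain shell loop is one possible approach. The attempts with grep -lv fail because -v together with -l lists every file that has at least one non-matching line, and foo3.txt has such a line. The sketch below tests each string separately with grep -q, which is in POSIX; the function name one_but_not_both is made up for illustration.

```shell
# one_but_not_both is a hypothetical helper name.
# Prints each FILE that contains exactly one of the two strings.
one_but_not_both() {
    for f in "$@"; do
        has_done=0
        has_halt=0
        grep -q 'JOBDONE' "$f" && has_done=1
        grep -q 'Execution halted' "$f" && has_halt=1
        # The flags differ only when one string is present and the other is not.
        if [ "$has_done" -ne "$has_halt" ]; then
            printf '%s\n' "$f"
        fi
    done
}
```

With the three files from the question, one_but_not_both ./foo*.txt should list foo1.txt and foo2.txt and leave out foo3.txt.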

Using pcregrep for multiple files

I am trying to run a pcregrep multiline match on a set of files. The files themselves come from a search of the current directory, something like this:
l | grep -P "\d\.mt.+" | cut -d":" -f 2 | cut -d" " -f 2 | xargs
I want to run pcregrep on this set of files with a multiline match, as below:
pcregrep -Mi "index(.+\n)+" list of files
I don't know whether it's possible to pass the list of file names like this.
Can someone help?
Regards,
Manu
Try this:
l | grep -P "\d\.mt.+" | cut -d":" -f 2 | cut -d" " -f 2 | xargs pcregrep -Mi "index(.+\n)+"
Your command ends with xargs but gives it no command to run.
With pcregrep added, xargs appends the file names it reads to that command, so what actually runs is
pcregrep -Mi "index(.+\n)+" <*list of all found files*>
That's the idea behind xargs.
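To illustrate how xargs turns lines on stdin into command arguments, here is a small sketch. It uses grep -l as a stand-in for pcregrep, so it does not depend on pcregrep being installed, and the helper name list_matching is made up. File names containing spaces or newlines would need extra care.

```shell
# list_matching is a hypothetical helper; grep -l stands in for pcregrep here.
# Reads file names from stdin (one per line) and prints those that contain
# PATTERN, exactly as if the names had been typed after the command.
list_matching() {
    xargs grep -l -- "$1"
}
```

For example, printf '%s\n' a.txt b.txt | list_matching index ends up running grep -l -- index a.txt b.txt.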

grep with extended regex over multiple lines

I'm trying to get a pattern over multiple lines. I would like to ensure the line I'm looking for ends in \r\n and that there is specific text that comes after it at some point. The two problems I've had are I often get unmatched parenthesis in groupings or I get a positive match when there is none. Here are two simple examples.
echo -e -n "ab\r\ncd" | grep -U -c -z -E $'(\r\n)+.*TEST'
grep: Unmatched ( or \(
What exactly is unmatched there? I don't get it.
echo -e -n "ab\r\ncd" | grep -U -c -z -E $'\r\n.*TEST'
1
There is no TEST in the string, so why does this return a count of 1 for matches?
I'm using grep (GNU grep) 2.16 on Ubuntu 14. Thanks
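Instead of -E you can use -P for PCRE support in GNU grep, which understands escapes such as \r and \n inside the pattern itself (ggrep below is GNU grep as commonly installed on macOS; on Ubuntu it is plain grep):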
echo -ne "ab\r\ncd" | ggrep -UczP '\r\n.*TEST'
0
echo -ne "ab\r\ncd" | ggrep -UczP '\r\n.*cd'
1
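With -E and $'...' quoting, the pattern that reaches grep contains a literal newline, and grep treats a newline in the pattern as a separator between alternative patterns. So $'(\r\n)+.*TEST' is read as the two patterns (\r and )+.*TEST, which produces the unmatched-parenthesis error, and $'\r\n.*TEST' is read as \r or .*TEST, which matches because the input contains a carriage return.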
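The way grep handles a newline inside a pattern can be checked with a small experiment. This sketch assumes bash for the $'...' quoting; the strings foo, bar and baz are arbitrary sample data.

```shell
# A pattern holding a literal newline is read by grep as a list of patterns,
# here "foo" and "bar", and a line matching either one counts as a match.
pattern=$'foo\nbar'
count=$(printf 'foo\nbar\nbaz\n' | grep -c -E "$pattern")
echo "$count"   # prints 2: the lines "foo" and "bar" each match one pattern
```

The same splitting happens to any pattern built with $'...\n...', which is why the grouping parentheses in the question end up in different patterns.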

Exclude multiple lines from in Bash (Version: 3.2.48)

The idea is to calculate SHA256 hashes for all files in a directory (including all subdirectories), but exclude some files specified in another text file.
The problem is if I specify the following files (see below the code) to exclude, only one of them is excluded, not both.
Here is my code:
line_count=0
while read line
do
    if [ $line_count -eq 0 ]
    then
        exclude_files=".*/$line$"
    else
        exclude_files="${exclude_files}\|.*/$line$"
    fi
    line_count=$(( $line_count + 1 ))
done < exclude-files.txt
find . -type f -print0 | xargs -0 shasum -a 256 | grep -v -P "${exclude_files}" > ~/out.txt
Contents of the file exclude-files.txt:
Icon\\r
.DS_Store
--- empty line ---
The file Icon\r is a special file for changing a folder's icon, its name contains a CR. (I'm on Mac OS X 10.7.4)
The problem is the \| in this line:
exclude_files="${exclude_files}\|.*/$line$"
Inside double quotes bash passes \| through unchanged, so grep receives a backslash followed by a pipe. That means alternation only in basic regular expressions. You are also using grep's -P option, and in PCRE \| is a literal | character, so the two patterns never act as alternatives.
With -P, use the pipe without a backslash:
exclude_files="${exclude_files}|.*/$line$"
You have to choose one approach: \| with plain grep, or | with -P. Mixing the two will not work.
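As a sketch of the unescaped-pipe variant, the loop can be written so that it joins the names with | and skips blank lines. This version assumes grep -E rather than -P, and the function name build_exclude_regex is made up for illustration. The names are used as regex fragments as they are, so a dot in .DS_Store still matches any character.

```shell
# build_exclude_regex is a hypothetical helper name.
# Reads one file name per line from the file given as $1 and prints an
# extended regex that matches any path ending in /NAME.
build_exclude_regex() {
    regex=''
    while IFS= read -r name || [ -n "$name" ]; do
        [ -z "$name" ] && continue          # ignore empty lines
        if [ -z "$regex" ]; then
            regex=".*/${name}\$"
        else
            regex="${regex}|.*/${name}\$"   # plain | is alternation in ERE
        fi
    done < "$1"
    printf '%s\n' "$regex"
}
```

The result can then be passed to grep -v -E "$(build_exclude_regex exclude-files.txt)" at the end of the shasum pipeline.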
Filtering with grep is not safe if file names contain characters that have a special meaning in regular expressions. Building the find command in an array may help:
cmd=(find . -type f \( )
while read line; do cmd=("${cmd[@]}" \! -name "$line" -a); done < exclude-files.txt
cmd[${#cmd[*]}-1]=\)
echo "${cmd[@]}" | cat -v
"${cmd[@]}" -print0 | xargs -0 shasum -a 256