How to properly run find | parallel with grep + escape characters? - regex

I have approximately 1500 files of about 2 GB each in a folder and would like to extract lines from them based on a regex. I tried:
find . -regex "filename pattern" -exec grep -P "pattern1\t|pattern2\t|pattern3\t|...|patternN\t" {} +
which works perfectly, but is pretty slow. I then read about running grep with GNU parallel, but couldn't figure out how to properly use it. Here's what I tried:
find . -regex "filename pattern" | parallel grep -P "pattern1\t|pattern2\t|pattern3\t|...|patternN\t" {}
along with a few variations of this command. However, I get in return:
/bin/bash: pattern1t: command not found
/bin/bash: pattern3t: command not found
/bin/bash: pattern2t: command not found
...
It seems the problem lies with the \t I use to ensure I match an entire string in a column of a TSV file. The grep command without parallel works perfectly with this regex.
How can I use escape characters in the grep regex with parallel?

As @Mark Setchell pointed out, I missed the --quote argument! This solution works:
find . -regex "filename pattern" -print0 | parallel -0 --quote grep -P "pattern1\t|pattern2\t|pattern3\t|...|patternN\t"
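As a quick sanity check (no parallel involved), PCRE's \t really does match the TSV delimiter, so `pattern1\t` only hits rows whose column value is exactly "pattern1". The file and pattern names below are made up for the demo:

```shell
# Demo: \t in a grep -P pattern matches a literal tab, so it anchors the
# match to the end of a TSV column. Sample data is invented.
dir=$(mktemp -d)
printf 'pattern1\tvalue1\npattern1x\tvalue2\n' > "$dir/sample.tsv"
hits=$(grep -P 'pattern1\t' "$dir/sample.tsv")   # only the first row matches
echo "$hits"
rm -rf "$dir"
```

With --quote, parallel passes that pattern to grep intact instead of letting a second shell strip the quotes and interpret the | as a pipe.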

Related

UNIX: error when I try to use array

I have this code, and when I run it on FreeBSD it shows me a lot of errors. How can I fix it? I check directories to see whether they match the variable IGN; NAME_d should be an array.
max_d=$(find "${DIR}" -type d | wc -l)
for i in `seq 1 $max_d`
do
    check_d=$(find "${DIR}" -type d | head -n "${i}" | tail -n -1 | tr '\/' '\n' | egrep -n "${IGN}")
    if [ ! -z "$check_d" ]; then
        NAME_d+=$i"d "
    fi
done
directory_d=${NAME_d[*]}
sedCmds_d=${directory_d// /;}
Arrays are a bashism, not supported by the Almquist shell, the default Bourne-style shell on FreeBSD (i.e. /bin/sh). An advantage of the Almquist shell is that most scripts run about three times faster under it.
If you want to use bashisms, use bash to execute your script, e.g. invoke it as bash dirstat.sh or change the shebang.
This is the correct one for FreeBSD.
#!/usr/local/bin/bash
This is the portable version but requires PATH to be set:
#!/usr/bin/env bash
You also might have to install bash first: pkg install bash
Not sure if it will solve your problem, but if you're using bash, you should initialize NAME_d as an array:
NAME_d=()
and when adding to the array, you should also use parentheses, e.g.
NAME_d+=("${i}d ")
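If you would rather stay with FreeBSD's /bin/sh, the usual workaround is to accumulate a plain string instead of an array. A minimal sketch, with the loop indices below standing in for the iterations whose directory matched IGN:

```shell
#!/bin/sh
# POSIX-sh sketch: build the "Nd " tokens in a plain string, no arrays.
# The indices 1 and 3 are placeholders for whichever iterations matched.
NAME_d=""
for i in 1 3; do
    NAME_d="${NAME_d}${i}d "
done
directory_d=$NAME_d
# ${var// /;} is also a bashism; tr does the space-to-semicolon pass portably.
sedCmds_d=$(printf '%s' "$directory_d" | tr ' ' ';')
echo "$sedCmds_d"
```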

How to escape a parenthesis in a perl pie one-liner?

I want to replace all occurrences of "foo" with "bar(" in all files that contain "foo". I have tried
perl -pie 's/foo/bar\(/g' `grep -ril foo .`
but that just hangs and nothing happens. I have tried varying the number of escape backslashes in front of the opening parenthesis, but without success. I'm working in bash 4.1.5.
The replacement works fine if I remove the opening parenthesis. Does anyone know how to escape the opening parenthesis?
What you posted exits immediately, as Perl tries to open s/foo/bar\(/g as a source file since the e is treated as the argument of -i.
$ perl -pie 's/foo/bar\(/g' `grep -ril foo .`
Can't open perl script "s/foo/bar\(/g": No such file or directory
I'm guessing you ran the following instead:
perl -i -pe's/foo/bar\(/g' `grep -ril foo .`
This will hang when grep finds nothing. When no arguments are given, the -i is effectively ignored. The program will read from STDIN and write to STDOUT. So, when grep returns nothing, this program will block waiting for input from STDIN.
Solution:
grep -ril foo . | xargs perl -i -pe's/foo/bar\(/ig'

pattern matching while using ls command in bash script

In an sh script, I am trying to loop over all files that match the following pattern:
abc.123, i.e. abc. followed by digits only; the number after the . can be of any length.
Using
$ shopt -s extglob
$ ls abc.+([0-9])
does the job, but only in an interactive terminal and not through the script. How can I get only the files that match the pattern?
If I understood you right, the pattern could be translated into this regex:
^abc\.[0-9]+$
so you could either keep using ls and grep the output, for example:
ls *.* | xargs -n1 | grep -E '^abc\.[0-9]+$'
or use find, which has a -regex option.
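A hedged sketch of the find route, on invented file names: note that -regex tests the whole path, so the pattern has to cover the leading directory part, and -regextype posix-extended is GNU find syntax.

```shell
# Demo: find -regex matches the full path, not just the basename.
dir=$(mktemp -d)
touch "$dir/abc.123" "$dir/abc.12x" "$dir/abcd.5"
found=$(find "$dir" -regextype posix-extended -regex '.*/abc\.[0-9]+')
echo "$found"      # only .../abc.123 survives
rm -rf "$dir"
```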
If you're using sh and not bash, and presumably you also want to be POSIX compliant, you can use:
for f in ./*
do
echo "$f" | grep -Eq '^\./abc\.[0-9]+$' || continue
echo "Something with $f here"
done
It will work fine with filenames with spaces, quotes and such, but may match some filenames with line feeds in them that it shouldn't.
If you tagged your question bash because you're using bash, then just use extglob like you described.
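extglob does work inside a script, as long as shopt -s extglob runs on a line before the one that uses the pattern (the pattern is rejected at parse time otherwise). A minimal sketch on invented file names:

```shell
#!/usr/bin/env bash
# Demo: enable extglob first, then glob with abc.+([0-9]) inside the script.
shopt -s extglob
dir=$(mktemp -d)
touch "$dir/abc.123" "$dir/abc.42" "$dir/abc.12x"
matches=()
for f in "$dir"/abc.+([0-9]); do
    matches+=("${f##*/}")        # keep just the basename
done
printf '%s\n' "${matches[@]}"    # abc.123 and abc.42; abc.12x is skipped
rm -rf "$dir"
```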

How to go from a multiple line sed command in command line to single line in script

I have sed running fine with the following argument if I copy and paste it into an open shell:
cat test.txt | sed '/[,0-9]\{0,\}[0-9]\{1,\}[acd][0-9]\{1,\}[,0-9]\{0,\}/{N
s/[,0-9]\{0,\}[0-9]\{1,\}[acd][0-9]\{1,\}[,0-9]\{0,\}\n\-\-\-//}'
The problem is that when I try to move this into a KornShell (ksh) script, ksh throws errors because of what I think is that newline character. Can anyone give me a hand with this? FYI: the regular expression is supposed to be a multiple-line replacement.
Thank you!
This: \{0,\} can be replaced by this: *
This: \{1,\} can be replaced by this: \+
It's not necessary to escape hyphens.
The newline can be replaced by a second -e expression (or by a semicolon)
The cat can be replaced by using the filename as an argument to sed
The result:
sed -e '/[,0-9]*[0-9]\+[acd][0-9]\+[,0-9]*/{N' -e 's/[,0-9]*[0-9]\+[acd][0-9]\+[,0-9]*\n---//}' test.txt
or
sed '/[,0-9]*[0-9]\+[acd][0-9]\+[,0-9]*/{N;s/[,0-9]*[0-9]\+[acd][0-9]\+[,0-9]*\n---//}' test.txt
(untested)
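A quick GNU sed check of the one-line form on diff-style input (the sample text below is invented; the original regexes are kept verbatim):

```shell
# Demo: the semicolon form of the multi-line sed, fed diff-like input.
out=$(printf '5c5\n---\n< old\n> new\n' \
  | sed '/[,0-9]*[0-9]\+[acd][0-9]\+[,0-9]*/{N;s/[,0-9]*[0-9]\+[acd][0-9]\+[,0-9]*\n---//}')
echo "$out"
# "5c5" and the "---" beneath it are removed; an empty line remains
# because s/// leaves an empty pattern space behind, which sed still prints.
```

Note that \+ is a GNU sed extension; on a BSD sed you would need \{1,\} as in the original.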
Can you try putting your regex in a file and calling sed with the -f option?
cat test.txt | sed -f file.sed
Can you try replacing the newline character with `echo -e \\r`?
The Korn shell - unlike the C shell - has no problem with newlines in strings, so the newline is very unlikely to be your problem. The same comments apply to the Bourne and POSIX shells, and to bash. I've copied your example and run it on Linux under both bash and the Korn shell without any problem.
If you use C Shell for your work, are you sure you're running 'ksh ./script' and not './script'?
Otherwise, there is some other problem - an unbalanced quote somewhere, perhaps.
Check out the '-v' and '-n' options, as well as the '-x' option, of the Korn shell. They may tell you more about where the problem is.

error /usr/local/bin/perl: Argument list too long

I executed this command to remove malware from all of my websites' files while keeping a backup of each file, but about a minute after executing it I got this error:
/usr/local/bin/perl: Argument list too long
Can anyone suggest a way to avoid this error? P.S. I have a huge number of files. :)
perl -e "s/<script.*PaBUTyjaZYg.*script>//g;" -pi.save $(find /home/ -type f -name '*php*')
Use the xargs command, which reads file names from STDIN and runs the target command multiple times, passing as many filenames as it can to each invocation:
find /home/ -type f -name '*php*' -print0 | xargs -0 perl -pi.save -e "s/<script.*PaBUTyjaZYg.*script>//g;"
The -print0 argument to find works with the -0 argument to xargs to ensure that file names are terminated with a null character. This prevents filenames with embedded spaces from causing an error.