pattern matching while using ls command in bash script - regex

In a sh script, I am trying to loop over all files whose names match a pattern like
abc.123, that is, abc. followed only by digits, where the run of digits after the . can be of any length.
Using
$ shopt -s extglob
$ ls abc.+([0-9])
does the job in an interactive terminal, but not from the script. How can I get only the files that match the pattern?

If I understood you right, the pattern could be translated into this regex:
^abc\.[0-9]+$
So you could keep using ls and grep the output (ls prints one name per line when its output is a pipe), for example:
ls | grep -E '^abc\.[0-9]+$'
Or use find, which has a -regex option.
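A minimal sketch of the find variant, assuming GNU find (neither -regextype nor -maxdepth is POSIX; BSD find spells the extended-regex switch differently). The function name find_abc_numeric is hypothetical. Note that -regex is matched against the whole path, not just the file name, so the pattern has to account for the leading ./.

```shell
# Hypothetical helper: list regular files in directory $1 named abc.<digits>.
# Assumes GNU find.
find_abc_numeric() {
    (
        cd "$1" || exit 1
        # -regex must match the entire path, so the leading ./ is part of it.
        find . -maxdepth 1 -regextype posix-extended -type f \
            -regex '\./abc\.[0-9]+' | sort
    )
}
```

Usage: find_abc_numeric /some/dir prints matches such as ./abc.123, one per line.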

If you're using sh and not bash, and presumably you also want to be POSIX compliant, you can use:
for f in ./*
do
echo "$f" | grep -Eq '^\./abc\.[0-9]+$' || continue
echo "Something with $f here"
done
It will work fine with filenames with spaces, quotes and such, but may match some filenames with line feeds in them that it shouldn't.
If you tagged your question bash because you're using bash, then just use extglob like you described.
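For a plain sh script, the same filter can also be written without grep, using only a glob and a case pattern. This is a sketch under the assumption that "only numbers" means one or more ASCII digits; the function name match_abc_numeric is hypothetical. Because no file name is ever passed through a pipe, names containing line feeds are handled correctly too.

```shell
# Hypothetical helper: print the names of files in directory $1
# that look like abc.<digits>, using POSIX sh features only.
match_abc_numeric() {
    for f in "$1"/abc.*; do
        [ -e "$f" ] || continue          # the glob matched nothing
        name=${f##*/}                    # strip the directory part
        suffix=${name#abc.}              # text after "abc."
        case $suffix in
            ''|*[!0-9]*) continue ;;     # empty, or contains a non-digit
        esac
        printf '%s\n' "$name"
    done
}
```

Usage: match_abc_numeric . lists the matching names in the current directory; replace the printf with whatever should be done per file.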

Related

How to properly run find | parallel with grep + escape characters?

I have approximately 1500 2GB files in a folder and would like to extract lines from them based on a regex. I tried:
find . -regex "filename pattern" -exec grep -P "pattern1\t|pattern2\t|pattern3\t|...|patternN\t" {} +
which works perfectly, but is pretty slow. I then read about running grep with GNU parallel, but couldn't figure out how to properly use it. Here's what I tried:
find . -regex "filename pattern" | parallel grep -P "pattern1\t|pattern2\t|pattern3\t|...|patternN\t" {}
along with a few variations of this command. However, I get in return:
/bin/bash: pattern1t: command not found
/bin/bash: pattern3t: command not found
/bin/bash: pattern2t: command not found
...
It seems the problem lies with the \t I use to ensure I match an entire string in a column of a TSV file. The grep command without parallel works perfectly with this regex.
How can I use escape characters in the grep regex with parallel?
As @Mark Setchell pointed out, I missed the "--quote" argument! This solution works:
find . -regex "filename pattern" -print0 | parallel -0 --quote grep -P "pattern1\t|pattern2\t|pattern3\t|...|patternN\t"
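The error messages are consistent with the command line being parsed a second time by a shell, which is what --quote prevents. The sketch below reproduces that effect with plain sh -c instead of parallel, so it is an illustration of the mechanism rather than of parallel itself; the function names and the shortened two-alternative pattern are assumptions.

```shell
pattern='pattern1\t|pattern2\t'

# Unquoted: the inner shell sees   printf '%s\n' pattern1\t|pattern2\t
# so | becomes a pipe and \t loses its backslash, giving a
# "pattern2t: not found" style error.
reparse_unquoted() {
    sh -c "printf '%s\n' $pattern"
}

# Quoted: the pattern travels as a positional parameter and is never
# re-parsed, which is the effect --quote has on the grep arguments.
reparse_quoted() {
    sh -c 'printf "%s\n" "$1"' sh "$pattern"
}
```

reparse_quoted prints the pattern unchanged, while reparse_unquoted fails because the inner shell tries to run pattern2t as a command.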

renaming a file with regex in bash MacOSX

I have file names like this
223h123.sdsdas.png
which I would like to rename to
sdsdas.png
I am using this command
for i in *.png;do mv "$i" "${i/[a-zA-Z0-9]*/}";done
which gives me this instead
png,
I am using bash on MacOS X.
You're confusing regular expressions with Parameter Expansion. They look a bit similar, but they are not the same.
A regular expression parser might allow you to make a regex-based substitution, for example:
[ghoti@pc ~]$ i="223h123.sdsdas.png"
[ghoti@pc ~]$ echo "$i" | sed 's/^[^.]*\.\([^.]*\)/\1/'
sdsdas.png
[ghoti@pc ~]$
Sed, the Stream Editor, has a substitute command that takes a Basic Regular Expression and replaces \1 with the first bracketed atom of the regex.
Alternately, you could use parameter expansion to strip off text to the first dot.
[ghoti@pc ~]$ i="223h123.sdsdas.png"
[ghoti@pc ~]$ echo "${i#*.}"
sdsdas.png
[ghoti@pc ~]$
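Applied to the rename loop from the question, the parameter expansion could look like the sketch below. It assumes that only names with at least two dots should be renamed, so that a file like plain.png is not turned into png; the function name strip_first_field is hypothetical, and existing target files are not protected against being overwritten.

```shell
# Hypothetical helper: in directory $1, rename PREFIX.rest.png to rest.png.
strip_first_field() {
    (
        cd "$1" || exit 1
        for i in *.*.png; do
            [ -e "$i" ] || continue      # the glob matched nothing
            # ${i#*.} removes the shortest prefix ending in a dot.
            mv -- "$i" "${i#*.}"
        done
    )
}
```

Usage: strip_first_field . renames 223h123.sdsdas.png to sdsdas.png in the current directory.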

SED command matches regex but does not substitute

I am working on building a .sed file to start scripting the setup of multiple Apache servers. I am trying to get sed to match the default webmaster email addresses in the .conf file, which works great with this egrep. However, when I use sed to do a search and replace, I get no errors back, but it also does not substitute anything. I test this by running the same egrep command again.
egrep -o '\b[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+(\.[A-Za-z]{2,4})?\b' /home/test/httpd.conf
returns
admin@your-domain.com
root@localhost
webmaster@dummy-host.example.com
The sed command I'm trying to use is
sed -i '/ServerAdmin/ s/\b[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+(\.[A-Za-z]{2,4})?\b/MY_ADMIN_ADDRESS@gmail.com/g' /home/test/httpd.conf
After running it, I try to verify the results by running the egrep again, and it returns the same three email addresses, indicating that nothing was replaced.
Don't assume that any two tools use the same regular expression syntax. If you're going to be doing replacements with sed, use sed to test - not egrep. It's easy to use sed as if it were a grep command: sed -ne '/pattern/p'.
sed must be told to use extended regular expressions with the -r option (GNU sed also accepts -E, which is the spelling BSD/macOS sed uses). Keep it separate from -i: GNU sed parses -ir as -i with the backup suffix r, which leaves extended syntax switched off. The command then becomes:
sed -i -r '/ServerAdmin/ s/\b[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+(\.[A-Za-z]{2,4})?\b/MY_ADMIN_ADDRESS@gmail.com/g' /home/test/httpd.conf
Much thanks to Kent for pointing out that the address it was missing wasn't following a ServerName
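A small sketch of the difference, reading from standard input instead of editing a file in place. The function names and the replacement address admin@example.com are placeholders, and \b is left out because it is a GNU extension. With -E the + means "one or more"; without it, + is a literal plus sign and the substitution silently does nothing. Lines that do not contain ServerAdmin are never touched in either case.

```shell
# Extended syntax (-E): + means "one or more", so the address is replaced.
replace_admin_ere() {
    sed -E '/ServerAdmin/ s/[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+/admin@example.com/g'
}

# Basic syntax (no -E): + is a literal plus sign, so nothing matches.
replace_admin_bre() {
    sed '/ServerAdmin/ s/[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+/admin@example.com/g'
}
```

Usage: replace_admin_ere < httpd.conf prints the edited configuration without modifying the file, which makes it easy to check the pattern before adding -i.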

Making a mistake with | (or) in regular expressions

I'm sure I'm doing something really obviously wrong here, but I can't figure out what. Using grep from a bash shell, I have a file test2.txt:
ABC123
ABC456
ABC789
DEF123
DEF456
DEF789
Now at the command line:
$ grep ABC test2.txt
ABC123
ABC456
ABC789
$ grep DEF test2.txt
DEF123
DEF456
DEF789
So those work great. Now, I expect the following command to print the whole file, but:
$ grep ABC\|DEF test2.txt
$ grep (ABC)\|(DEF) test2.txt
-bash: syntax error near unexpected token `ABC'
$ grep \(ABC\)\|\(DEF\) test2.txt
$ grep 'ABC|DEF' test2.txt
What am I doing wrong?
Turn on the extended regex with -E:
grep -E "ABC|DEF" test2.txt
As others have pointed out, the standard grep command does not support the | (or) syntax in its default basic regular expressions. Unfortunately, from there, things are a mishmash.
Some systems have an egrep that does offer the or syntax.
Some systems can use a -E or -P (for Perl) flag to extend grep syntax.
Some systems have both the -E flag and egrep, and they do the same thing. This implies that there are systems out there where grep -E and egrep are not the same (sad but true).
Some systems now use extended regular expressions in their standard grep command. Apparently, your system doesn't.
Read your manpages to see what your system does support. Some systems have a manpage for re_format that explains what they do and don't support in extended format.
Then again, you could always just use a Perl one-liner:
$ perl -ne "print if /(ABC)|(DEF)/" test2.txt
At least then you know exactly which syntax is supported.
I don't think the standard syntax supports it. You could use the -P switch if available:
grep -P "(ABC|DEF)" test2.txt
Use egrep instead, which is the same as using grep -E:
egrep 'ABC|DEF' test2.txt
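The answers above can be compared side by side with a short sketch. The function names and the extra XYZ000 line are illustrative assumptions; the other sample lines come from the question. Besides -E, passing several -e patterns is a portable way to get "or" behavior without any alternation syntax at all.

```shell
# Sample data: three lines from the question plus one non-matching line.
sample() { printf '%s\n' ABC123 ABC456 DEF123 XYZ000; }

# Extended regular expressions: | is alternation.
count_ere() { sample | grep -cE 'ABC|DEF'; }

# Portable alternative: several -e patterns are ORed together.
count_multi() { sample | grep -c -e ABC -e DEF; }

# Basic regular expressions: | is an ordinary character, so nothing matches.
# grep exits non-zero when there is no match; || true keeps the count usable.
count_bre() { sample | grep -c 'ABC|DEF' || true; }
```

The first two functions count the three ABC/DEF lines, while the basic-syntax version counts none, which mirrors the empty output seen in the question.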

How to go from a multiple line sed command in command line to single line in script

I have sed running with the following argument fine if I copy and paste this into an open shell:
cat test.txt | sed '/[,0-9]\{0,\}[0-9]\{1,\}[acd][0-9]\{1,\}[,0-9]\{0,\}/{N
s/[,0-9]\{0,\}[0-9]\{1,\}[acd][0-9]\{1,\}[,0-9]\{0,\}\n\-\-\-//}'
The problem is that when I try to move this into a KornShell (ksh) script, ksh throws errors, which I think are caused by the newline character. Can anyone give me a hand with this? FYI: the regular expression is supposed to be a multiple-line replacement.
Thank you!
This: \{0,\} can be replaced by this: *
This: \{1,\} can be replaced by this: \+ (a GNU sed extension; keep \{1,\} if the script must run on other seds)
It's not necessary to escape hyphens.
The newline can be replaced by -e (or by a semicolon)
The cat can be replaced by using the filename as an argument to sed
The result:
sed -e '/[,0-9]*[0-9]\+[acd][0-9]\+[,0-9]*/{N' -e 's/[,0-9]*[0-9]\+[acd][0-9]\+[,0-9]*\n---//}' test.txt
or
sed '/[,0-9]*[0-9]\+[acd][0-9]\+[,0-9]*/{N;s/[,0-9]*[0-9]\+[acd][0-9]\+[,0-9]*\n---//}' test.txt
(untested)
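A more portable way to spell the same idea is to give every sed command its own -e argument and keep the POSIX \{1,\}-style syntax. The sketch below is a simplified variant, not the exact expression from the question: it assumes the header lines look like diff change commands such as 2c2 or 5,7d4, and the function name strip_change_headers is hypothetical.

```shell
# Hypothetical helper: when a line looks like a diff change command,
# append the next line (N) and, if that next line is ---, delete both.
strip_change_headers() {
    sed -e '/^[0-9,]*[acd][0-9,]*$/{' \
        -e 'N' \
        -e 's/^[0-9,]*[acd][0-9,]*\n---$//' \
        -e '}'
}
```

Usage: strip_change_headers < test.txt. As with the original command, a matched pair is replaced by an empty line rather than removed entirely.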
Can you try putting your sed commands in a file and calling sed with the -f option?
sed -f file.sed test.txt
Can you try to replace the new line character with `echo -e \\r`
The Korn Shell - unlike the C Shell - has no problem with newlines in strings. The newline is very unlikely to be your problem, therefore. The same comments apply to Bourne and POSIX shells, and to Bash. I've copied your example and run it on Linux under both Bash and Korn shell without any problem.
If you use C Shell for your work, are you sure you're running 'ksh ./script' and not './script'?
Otherwise, there is some other problem - an unbalanced quote somewhere, perhaps.
Check out the '-v' and '-n' options as well as the '-x' option to the Korn Shell. That may tell you more about where the problem is.