bash ~ How to search through files in a directory - regex

I have a list of files:
shopping-list.txt
our-shopping-list.txt
test.txt
my-test.txt
I want to run myscript shopping and get the two files that have the word shopping. I want to run myscript our list and get just the one file.
At the moment I have this:
if [[ $fs =~ .*${*}*.* ]]; then
echo $fs
fi
It partly works, but it doesn't return our-shopping-list when the search words are separated by spaces, i.e. myscript our list; it only works if I type myscript our - list.
I have a big list of files and want to find the one I need by guessing the name.
My attempt to apply @pacholik's code:
snippetdir="~/my_snippets/"
for filename in $snippetdir*; do
  file=`basename "$filename"`
  fs=${file%.*}
  for i in ${*}; do
    for j in *${i}*; do
      if [[ $fs =~ .*$j*.* ]]; then
        echo $fs
      fi
    done
  done
done

Here's a simple brute-force loop.
for file in *; do
  t=true
  for word in "$@"; do
    # Give up on this file as soon as one word is missing from its name.
    case $file in
      *"$word"*) ;;
      *) t=false; break;;
    esac
  done
  $t && echo "$file"
done
I believe this should be portable to any POSIX shell, and potentially beyond (old Solaris, etc.).

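Using just Bash expansion (only you have to evaluate the pattern twice). The Bash join idiom comes from another answer: with IFS=*, "${*}" joins the arguments with *.
files=`IFS=* eval 'echo *"${*}"*'`
Then you can iterate over $files. A * that came from the quoted "${*}" is literal during the first expansion, so with several words echo typically prints the pattern itself (e.g. *our*list*); the unquoted $files in the loop then globs it a second time: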
for i in $files; do
  echo "$i"
done
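A sketch of the same idea that globs only once, assuming bash: join the words with *, then expand the pattern unquoted with IFS emptied (so it is globbed but never word-split) and nullglob on (so no match prints nothing). The function name find_by_words is made up for illustration, and characters such as * or [ inside the search words are treated as glob syntax.

```shell
#!/usr/bin/env bash
# find_by_words WORD...: print the names in the current directory that contain
# every WORD in the order given (for example "our list" -> *our*list*).
find_by_words() {
  local pattern f restore
  # Join the arguments with '*'; IFS is only changed inside the subshell.
  pattern="*$(IFS='*'; printf '%s' "$*")*"
  restore=$(shopt -p nullglob)   # remember the caller's nullglob setting
  shopt -s nullglob              # a pattern with no match expands to nothing
  local IFS=                     # no word splitting; globbing still happens
  for f in $pattern; do
    printf '%s\n' "$f"
  done
  eval "$restore"                # put nullglob back the way it was
}

# Usage: find_by_words our list
```

Using an empty IFS instead of quoting is what lets a search word contain a space while the * between words still acts as a wildcard.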

Related

Pattern matching in if statement in bash

I'm trying to count the words with at least two vowels in all the .txt files in the directory. Here's my code so far:
#!/bin/bash
wordcount=0
for i in $HOME/*.txt
do
  cat $i |
  while read line
  do
    for w in $line
    do
      if [[ $w == .*[aeiouAEIOU].*[AEIOUaeiou].* ]]
      then
        wordcount=`expr $wordcount + 1`
        echo $w ':' $wordcount
      else
        echo "In else"
      fi
    done
  done
  echo $i ':' $wordcount
  wordcount=0
done
Here is a sample from one of my .txt files:
Last modified: Sun Aug 20 18:18:27 IST 2017
To remove PPAs
sudo apt-get install ppa-purge
sudo ppa-purge ppa:
The problem is that the pattern in the if statement doesn't match any of the words in the text file; it always goes straight to the else branch. Secondly, the wordcount printed by echo $i ':' $wordcount is 0, when it should be some positive number.
Immediate Issue: Glob vs Regex
[[ $string = $pattern ]] doesn't perform regex matching; instead, it's a glob-style pattern match. While . means "any character" in regex, it matches only itself in glob.
You have a few options here:
Use =~ instead to perform regular expression matching:
[[ $w =~ .*[aeiouAEIOU].*[AEIOUaeiou].* ]]
Use a glob-style expression instead of a regex:
[[ $w = *[aeiouAEIOU]*[aeiouAEIOU]* ]]
Note the use of = rather than == here; while either is technically valid, the former avoids building muscle memory that would lead to bugs when writing code for a POSIX implementation of test / [, where = is the only valid string comparison operator.
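To see both fixes next to the original test, here is a small sketch; the three function names are hypothetical and the test words come from the sample file above.

```shell
#!/usr/bin/env bash
# Regex match: =~ is unanchored, so the leading and trailing .* are unnecessary.
has_two_vowels_re()   { [[ $1 =~ [aeiouAEIOU].*[aeiouAEIOU] ]]; }
# Glob match: in a glob, * (not .*) means "any run of characters".
has_two_vowels_glob() { [[ $1 = *[aeiouAEIOU]*[aeiouAEIOU]* ]]; }
# The question's test: '.' is literal in a glob, so ordinary words never match.
original_test()       { [[ $1 == .*[aeiouAEIOU].*[AEIOUaeiou].* ]]; }

for w in remove PPAs install; do
  if has_two_vowels_re "$w"; then r=yes; else r=no; fi
  if has_two_vowels_glob "$w"; then g=yes; else g=no; fi
  if original_test "$w"; then o=yes; else o=no; fi
  printf '%s: regex=%s glob=%s original=%s\n' "$w" "$r" "$g" "$o"
done
# prints:
# remove: regex=yes glob=yes original=no
# PPAs: regex=no glob=no original=no
# install: regex=yes glob=yes original=no
```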
Larger Issue: Properly Reading Word-By-Word
Using for w in $line is inherently unsafe: besides splitting on whitespace, it also expands any glob characters in the line. Use read -a to read a line into an array of words:
#!/usr/bin/env bash
wordcount=0
for i in "$HOME"/*.txt; do
  while read -r -a words; do
    for word in "${words[@]}"; do
      if [[ $word = *[aeiouAEIOU]*[aeiouAEIOU]* ]]; then
        (( ++wordcount ))
      fi
    done
  done <"$i"   # redirect instead of cat | while, so wordcount survives the loop
  printf '%s: %s\n' "$i" "$wordcount"
  wordcount=0
done
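The count of 0 in the question comes from cat $i | while ...: in bash, each part of a pipeline runs in a subshell unless the lastpipe option is enabled, so the increments go to a copy of wordcount that disappears when the loop ends. The answer above avoids this with done <"$i". A minimal sketch contrasting the two (the function names are hypothetical):

```shell
#!/usr/bin/env bash
# Counting inside a pipeline: the while loop runs in a subshell,
# so its copy of n is thrown away when the loop ends.
count_with_pipe() {
  local n=0 line
  printf '%s\n' "$@" | while read -r line; do
    (( ++n ))
  done
  echo "$n"
}

# Counting with a redirection: the loop runs in the current shell.
count_with_redirect() {
  local n=0 line
  while read -r line; do
    (( ++n ))
  done < <(printf '%s\n' "$@")
  echo "$n"
}

count_with_pipe a b c       # prints 0
count_with_redirect a b c   # prints 3
```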
Try (this needs GNU awk, since ENDFILE is a gawk extension):
awk '/[aeiouAEIOU].*[AEIOUaeiou]/{n++} ENDFILE{print FILENAME":"n; n=0}' RS='[[:space:]]' *.txt
Sample output looks like:
$ awk '/[aeiouAEIOU].*[AEIOUaeiou]/{n++} ENDFILE{print FILENAME":"n; n=0}' RS='[[:space:]]' *.txt
one.txt:1
sample.txt:9
How it works:
/[aeiouAEIOU].*[AEIOUaeiou]/{n++}
Every time we find a word with two vowels, we increment variable n.
ENDFILE{print FILENAME":"n; n=0}
At the end of each file, we print the name of the file and the 2-vowel word count n. We then reset n to zero.
RS='[[:space:]]'
This tells awk to use any whitespace as a word separator. This makes each word into a record. Awk reads the input one record at a time.
Shell issues
The use of awk avoids a multitude of shell issues. For example, consider the line for w in $line. This will not work the way you hope. Consider a directory with these files:
$ ls
one.txt sample.txt
Now, let's take line='* Item One' and see what happens:
$ line='* Item One'
$ for w in $line; do echo "w=$w"; done
w=one.txt
w=sample.txt
w=Item
w=One
The shell treats the * in line as a wildcard and expands it into a list of files. Odds are you didn't want this. The awk solution avoids a variety of issues like this.
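For comparison, splitting the same line with read -r -a, as in the bash answer above, does not perform pathname expansion, so the * survives as a word. A minimal sketch:

```shell
#!/usr/bin/env bash
# read -a splits on IFS whitespace and stores the fields in an array;
# unlike an unquoted $line, the fields are not glob-expanded.
line='* Item One'
read -r -a words <<< "$line"
for w in "${words[@]}"; do
  echo "w=$w"
done
# prints:
# w=*
# w=Item
# w=One
```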
Using grep, this is pretty simple to do:
#!/bin/bash
wordcount=0
for file in ./*.txt
do
  # One word per line, then count the lines with at least two vowels.
  count=$(xargs -n1 < "$file" | grep -ic "[aeiou].*[aeiou]")
  wordcount=$(( wordcount + count ))
done
echo "$wordcount"
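One caveat with xargs -n1: xargs treats quotes and backslashes specially, so a word like don't makes it fail with an unmatched-quote error. A sketch that splits on whitespace with tr instead and reports a count per file (count_two_vowel_words is a hypothetical name; -i keeps the vowel test case-insensitive, as above):

```shell
#!/usr/bin/env bash
# count_two_vowel_words FILE: print how many words in FILE contain two or more
# vowels. tr turns each run of whitespace into one newline (one word per line),
# then grep -ci counts the matching lines, ignoring case.
count_two_vowel_words() {
  tr -s '[:space:]' '\n' < "$1" | grep -ci '[aeiou].*[aeiou]'
}

total=0
for file in ./*.txt; do
  [ -f "$file" ] || continue      # no .txt files: skip the unexpanded pattern
  n=$(count_two_vowel_words "$file")
  printf '%s: %s\n' "$file" "$n"
  total=$(( total + n ))
done
echo "total: $total"
```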

Native bash regexp [[ $f =~ "^[^\.]+$" ]] never matching

I'm currently trying to loop through all files in a certain directory using bash. If the file matches the following regular expression, it outputs the filename. If it doesn't, it outputs 'not' and then the filename. The regular expression is supposed to filter out any files that have a '.' in them.
for f in * ; do
  if [[ $f =~ "^[^\.]+$" ]]; then
    echo "$f"
  else
    echo "not $f"
  fi
done
It correctly loops through all the files, but for a reason that has stumped me for quite a while, I cannot get it to only exclude files with a '.' in them. For example, in a directory with the following files:
bashrc
gitconfig
install.sh
README.md
vimrc
the output of the script is such:
not bashrc
not gitconfig
not install.sh
not README.md
not vimrc
I validated the regular expression here. Any thoughts?
Don't quote the right-hand side of your expression.
if [[ $f =~ ^[^.]+$ ]]; then
Quotes make the string a literal substring, rather than a regular expression.
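A quick way to see what the quoted version actually does: it only matches names that contain the literal text ^[^\.]+$. A minimal sketch with hypothetical function names:

```shell
#!/usr/bin/env bash
# Quoted right-hand side: bash matches it as a literal substring.
quoted_rhs()   { [[ $1 =~ "^[^\.]+$" ]]; }
# Unquoted right-hand side (kept in a variable): parsed as an ERE.
unquoted_rhs() { local re='^[^.]+$'; [[ $1 =~ $re ]]; }

for f in bashrc install.sh 'x^[^\.]+$y'; do
  if quoted_rhs "$f"; then q=match; else q=no; fi
  if unquoted_rhs "$f"; then u=match; else u=no; fi
  printf '%s: quoted=%s unquoted=%s\n' "$f" "$q" "$u"
done
# prints:
# bashrc: quoted=no unquoted=match
# install.sh: quoted=no unquoted=no
# x^[^\.]+$y: quoted=match unquoted=no
```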
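For better portability across bash versions, put your regex in a variable, single-quoted so that the shell passes every character (including any backslash) through literally. No backslash is needed before the dot, since . is already literal inside a bracket expression:
re='^[^.]+$'
if [[ $f =~ $re ]]; then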
That said, you could do this with an extglob as well:
shopt -s extglob # enable extended globs
for f in +([!.]); do
  printf 'Matched %q\n' "$f"
done
...or with a general-purpose pattern match:
for f in *; do
  if [[ $f = *.* ]]; then
    printf '%q contains a dot\n' "$f"
  else
    printf '%q does not contain a dot\n' "$f"
  fi
done

Bash Script sed command not working correctly with file passed through command line

Problem
I am trying to write a script that renames a large number of files according to a regex. The command works fine in my iTerm2, but the same command fails inside the script.
Also, some of my file names include Chinese and Korean characters (I don't know whether that is the problem).
code
My code takes three inputs: the old regex, the new regex, and the files to rename.
Here is the code:
#!/bin/bash
# Fewer than 3 arguments: print the help text.
if [ $# -lt 3 ] ; then
  cat << HELP
ren -- renames a number of files using sed regular expressions
USAGE: ren 'regexp' 'replacement' files...
EXAMPLE: rename all *.HTM files into *.html:
  ren 'HTM' 'html' *.HTM
HELP
  exit 0
fi
OLD="$1"
NEW="$2"
# The shift command removes one argument from the list of
# command line arguments.
shift
shift
# "$@" now contains all the files:
for file in "$@"; do
  if [ -f "$file" ] ; then
    newfile=`echo "$file" | sed "s/${OLD}/${NEW}/g"`
    if [ -f "$newfile" ]; then
      echo "ERROR: $newfile exists already"
    else
      echo "renaming $file to $newfile ..."
      mv "$file" "$newfile"
    fi
  fi
done
I registered the script in my .profile as an alias:
alias ren="bash /pathtothefile/ren.sh"
Test
The original file name is "제01과.mp3" and I want it to become "第01课.mp3".
So with my script I use:
$ ren "제\([0-9]*\)과" "第\1课" *.mp3
But the sed inside the script doesn't seem to do anything.
Yet the following, which should be equivalent, does replace the name:
$ echo "제01과.mp3" | sed s/"제\([0-9]*\)과\.mp3"/"第\1课\.mp3"/g
Any thoughts? Thanks.
Print the result
I made the following change to the script so that it prints what it is doing:
newfile=`echo "$file" | sed "s/${OLD}/${NEW}/g"`
echo "The ${file} is changed to ${newfile}"
And the result for my test is:
The 제01과.mp3 is changed to 제01과.mp3
ERROR: 제01과.mp3 exists already
So the file name itself reaches sed fine; sed just doesn't change it.
Update (all done under bash 4.2.45(2), Mac OS 10.9)
Testing
When I run the commands directly in interactive bash, with the for loop, something interesting happens. I first stored all the names in a files.txt file using:
$ ls | grep mp3 > files.txt
Then I ran sed on them. A single command in interactive mode, like:
$ file="제01과.mp3"
$ echo $file | sed s/"제\([0-9]*\)과\.mp3"/"第\1课\.mp3"/g
gives
第01课.mp3
But the following, also in interactive mode:
files=`cat files.txt`
for file in $files
do
  echo $file | sed s/"제\([0-9]*\)과\.mp3"/"第\1课\.mp3"/g
done
makes no changes!
At this point:
echo $file
gives:
$ 제30과.mp3
(There are only 30 files)
Problem Part
And I tried the first command which worked before:
$ echo $file | sed s/"제\([0-9]*\)과\.mp3"/"第\1课\.mp3"/g
It makes no change:
$ 제30과.mp3
So I created a new variable, newfile, and tried again:
$ newfile="제30과.mp3"
$ echo $newfile | sed s/"제\([0-9]*\)과\.mp3"/"第\1课\.mp3"/g
And this time it correctly gives:
$第30课.mp3
Wow... why?! I checked whether file and newfile are the same, and of course they are not:
if [[ $file == "$newfile" ]]; then
  echo True
else
  echo False
fi
gives:
False
My guess
I suspect some encoding problem, but I haven't found any reference on it. Could anyone help? Thanks again.
Update 2
It seems there is a big difference between a typed string and a file name. Specifically, if I assign the variable directly, like:
file="제30과.mp3"
in the script, sed works fine. However, if the variable comes from "$@", or is set like:
file=./*mp3
then sed fails to work, and I don't know why. By the way, Mac sed has no -r option, and on Ubuntu -r does not solve the problem described above.
A few errors combined:
To write groups as ( ) rather than \( \), you need extended regex: -r in GNU sed (-E in BSD/macOS sed, which GNU sed also accepts), -E in grep
Escaping correctly is a beast :)
Example
files="제2과.mp3 제30과.mp3"
for file in $files
do
  echo "$file" | sed -E 's/제([0-9]*)과\.mp3/第\1课.mp3/g'
done
outputs
第2课.mp3
第30课.mp3
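If you only need this one renaming, bash's own =~ can capture the number through BASH_REMATCH, with no sed and no extended-regex flag. A sketch, assuming bash: rename_lessons is a hypothetical name, the pattern is hard-coded for the 제NN과.mp3 to 第NN课.mp3 case, and mv -n (supported by GNU and BSD mv) is an extra guard against overwriting. Pass bare names (rename_lessons *.mp3), because the regex is anchored at the start of the name.

```shell
#!/usr/bin/env bash
# rename_lessons FILE...: rename 제NN과.mp3 to 第NN课.mp3, leaving other names alone.
rename_lessons() {
  local re='^제([0-9]+)과\.mp3$' file new
  for file in "$@"; do
    [[ -f $file && $file =~ $re ]] || continue
    new="第${BASH_REMATCH[1]}课.mp3"      # BASH_REMATCH[1] is the captured number
    if [[ -e $new ]]; then
      echo "ERROR: $new exists already" >&2
    else
      echo "renaming $file to $new ..."
      mv -n -- "$file" "$new"
    fi
  done
}

# Usage: rename_lessons *.mp3
```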
If you are not doing this as a programming project, but want to skip ahead to the part where it just works, I found these resources listed at http://www.tldp.org/LDP/GNU-Linux-Tools-Summary/html/x4055.htm:
MMV (and MCP, MLN, ...) utilities use a specialized syntax to perform bulk file operations on paths. (http://linux.maruhn.com/sec/mmv.html)
mmv before\*after.mp3 Before\#1After.mp3
Esomaniac, a Java alternative that also works on Windows, is apparently dead (home page is parked).
rename is a perl script you can download from CPAN: https://metacpan.org/release/File-Rename
rename 's/\.JPG$/.jpg/' *.JPG

bash regular expression test: if vs grep

I need to scan each line of a file looking for any characters above hex \x7E. The file has several million rows, so improving efficiency would be great. So far, reading each line in a while loop, this works and finds lines with invalid characters:
echo "$line" | grep -P "[\x7F-\xFF]" > /dev/null 2>&1
if [ $? -eq 0 ]; then...
But this doesn't:
if [[ "$line" =~ [\x7F-\xFF] ]]; then...
I'm assuming it would be more efficient the second way, if I could get it to work. What am I missing?
If you're interested in efficiency, you shouldn't write your loop in bash. You should rethink your program in terms of pipes and use efficient tools.
That said, you can do this with
LC_CTYPE=C LC_COLLATE=C   # treat the line and the range as single bytes
if [[ "$line" =~ [$'\x7f'-$'\xff'] ]]
then
  echo "It contains bytes \x7F or up"
fi
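Since the goal is to split the file, the same =~ test can drive a single loop that writes each line to one of two files. A sketch under these assumptions: bash, a regex library that compares raw bytes in the C locale, and a hypothetical split_records INPUT VALID INVALID interface. The body is wrapped in ( ) instead of { } so that LC_ALL=C only applies inside a subshell. As noted above, a dedicated tool such as perl or grep will usually beat a bash read loop on millions of lines.

```shell
#!/usr/bin/env bash
# split_records INPUT VALID INVALID
# Lines containing any byte from 0x7F to 0xFF go to INVALID, all others to VALID.
# The ( ) body runs in a subshell, so LC_ALL=C does not leak to the caller.
split_records() (
  LC_ALL=C                  # match single bytes, not multibyte characters
  re=$'[\x7f-\xff]'         # bracket expression built from the raw bytes
  while IFS= read -r line || [[ -n $line ]]; do
    if [[ $line =~ $re ]]; then
      printf '%s\n' "$line" >&3
    else
      printf '%s\n' "$line"
    fi
  done <"$1" >"$2" 3>"$3"
)

# Usage: split_records file valid invalid
```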
I basically have to split the file. Valid records go to one file, invalid records go to another.
sed -n '/[^\x0-\x7e]/w badrecords
//! w goodrecords' file
If you're already using Perl regular expressions, you might as well use perl for the task:
perl -ne '
if (/[\x7F-\xFF]/) {print STDERR $_} else {print}
' file > valid 2> invalid
I'd bet that's faster than a bash loop.
I suspect this would be more efficient, even though it processes the file twice:
grep -P "[\x7F-\xFF]" file > invalid
grep -vP "[\x7F-\xFF]" file > valid
You'd want to write your grep code as
if grep -qP "[\x7F-\xFF]" <<< "$line"; then...

Edit Windows .reg Files with sed / grep: search for a FilePath output RegPath Key Value

I am trying to edit a full-dump .reg file on Linux Mint. The goal is to find a given path in the values and then print the corresponding registry path, the key, and the full value.
I know that I can achieve this with regex patterns in grep or sed; unfortunately, I am pretty new to these tools.
Here's one example. I am searching for C:\\ProgramData:
[HKEY_LOCAL_MACHINE\...]
"noPath0"="1.1.9103.0"
"path0Key"="C:\\ProgramData\\..."
"noPath1"="2.1.9103.0"
"path1Key"="...C:\\ProgramData\\..."
[HKEY_LOCAL_MACHINE\...]
"noPath0"=dword:00000000
The output should be the following:
[HKEY_LOCAL_MACHINE\...]
"path0Key"="C:\\ProgramData\\..."
"path1Key"="...C:\\ProgramData\\..."
I've figured out the following two regex patterns:
Regpath: ^\[.[^\]]*\n
Key+Value: .*C\:\\\\ProgramData.*
The problem is how to combine both patterns and use them in grep, sed, or whatever is more suitable for this task.
A sed script would be a more elegant way, but for a quick-and-dirty solution I'd write a script that runs csplit on your first regex, then greps each split file with your second regex. For example, as a Windows batch file:
if exist xx* del /q xx*
csplit myfile.reg.txt /^\[/ {*}
for %%f in (xx*) do call :search %%f
goto :EOF
:search
grep ".*C\:\\\\ProgramData.*" %1 >nul
if not "%errorlevel%"=="0" goto :EOF
grep "^\[\|.*C\:\\\\ProgramData.*" %1
goto :EOF
Because of ty733420's help, I was able to create a bash script that does the job... really, really slowly... but at least it works:
#!/bin/bash
# Remove split files left over from a previous run.
if [ -e xx00 ]; then
  rm xx*
fi
# Split the dump into xx00, xx01, ... at every line that starts with [
csplit -s ./regBackup.reg.txt "/^\[/" '{*}'
for name in xx*
do
  # Only print keys that have at least one matching value.
  if grep -q 'C:\\\\Program Files' "$name"; then
    grep '^\[' "$name"
    grep 'C:\\\\Program Files' "$name"
    echo ""
  fi
done
exit 0
Run it with:
./sedS.sh > out.txt
Thanks for your help, ty733420 :)
If someone knows how to speed this task up, I would appreciate it.
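Much of the time in the script above likely goes into csplit plus up to three grep runs per key. An alternative is a single awk pass that remembers the latest [key] line and prints it just before the first value under it that contains the search string. A sketch, not tested on a real registry dump: find_reg_values is a hypothetical name, the file is assumed to be in an ASCII-compatible encoding such as UTF-8 (as the grep-based script implies), and the search string goes through ENVIRON rather than awk -v, because -v would turn \\ into \.

```shell
#!/usr/bin/env bash
# find_reg_values NEEDLE FILE
# Print each [key] line followed by the value lines under it that contain
# NEEDLE as a plain substring; keys with no matching value are skipped.
find_reg_values() {
  NEEDLE=$1 awk '
    /^\[/ { key = $0; shown = 0; next }   # remember the current key header
    index($0, ENVIRON["NEEDLE"]) {        # substring test, no regex escaping
      if (!shown) { print key; shown = 1 }
      print
    }
  ' "$2"
}

# Usage (single quotes keep both backslashes, as they appear in the file):
#   find_reg_values 'C:\\ProgramData' regBackup.reg.txt > out.txt
```

Using index() instead of a regex means the backslashes and the colon in the path need no escaping at all.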