Getting specific text from filenames in Bash script - regex

I'm trying to write a bash script that extracts a specific piece of text from the filenames in a directory. For example,
I have these files in my folder:
.\file1.txt
.\file2.hex
.\DSConf-11.22.33[4444].pkg
.\file3.cpp
I need to extract the number 4444 from the filename above. I wrote a script, but it doesn't work and I can't find the mistake:
#!/usr/bin/bash
FILENAME=$(find . -maxdepth 1 -name "DSConf-*.*.*\[*\].pkg")
PATTERN='^DSConf-(\d+).(\d+).(\d+)\[([^)]+)\]'
[[ $FILENAME =~ $PATTERN ]]
echo "${BASH_REMATCH[4]}"

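for f in ./*; do echo "$f" | grep -Eo "DSConf-[0-9]+\.[0-9]+\.[0-9]+\[[^]]+\]"; done
will print the matching part of the name, e.g. DSConf-11.22.33[4444]. Note that grep -E does not understand \d, so [0-9] is used instead.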

You can look for files whose names start with DSConf- and end with .pkg using find -name "DSConf-*.pkg", and then get the digits between the square brackets using sed:
find . -maxdepth 1 -name "DSConf-*.pkg" | sed -n 's/.*\[\([0-9]*\)].*/\1/p'
The sed -n 's/.*\[\([0-9]*\)].*/\1/p' command suppresses sed line output using -n, then s/.*\[\([0-9]*\)].*/\1/p substitutes the whole string with just the 0 or more digits between [ and ] and p prints that number.
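For completeness, here is a minimal sketch of how the original [[ =~ ]] approach could be made to work. It rests on two observations: bash matches with POSIX extended regular expressions, which have no \d, and find prints names with a leading ./, so a pattern anchored with ^DSConf cannot match its output. The function name extract_build is hypothetical, chosen only for this illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the text between [ and ] in a DSConf file name.
extract_build() {
    local name=${1##*/}    # drop any leading directory part such as ./
    # [0-9] replaces \d, the dots are escaped, and [^]]+ means "anything up to ]".
    local pattern='^DSConf-[0-9]+\.[0-9]+\.[0-9]+\[([^]]+)\]'
    [[ $name =~ $pattern ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}

extract_build './DSConf-11.22.33[4444].pkg'
```

Since the pattern now has only one capture group, the number is in BASH_REMATCH[1] rather than BASH_REMATCH[4].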

Perl regex is not matching

I'm trying to pipe the output of a find command to a perl one-liner to replace a line that ends with ?> with RedefineForDocker::standardizeXmlmc() but for some reason the value isn't being replaced. I've checked the output of the find command and it is performing as expected, and I've double checked my regex and it should match.
find . -name *.php -exec ggrep -Ezl 'class XmlMethodCall.*([?]>)$' {} \; \
| xargs perl -ewpn -i.bak2 \
"s/[?]>\s*?$/RedefineForDocker::standardizeXmlmc()\n/gm"
I get no warnings and no indication that it isn't working, the backups are created, but the file remains unchanged. The list of matched files run from the find command is below.
./swsupport/clisupp/trending/services/data.helpers.php
./swsupport/clisupp/_bpmui/arch/service/data.helpers.php
./swsupport/clisupp/_bpmui/itsm/service/data.helpers.php
./swsupport/clisupp/_bpmui/itsm_default/service/data.helpers.php
./webclient_code/php/session.php
./webclient_code/service/storedquery/helpers.php
./php/_phpinclude/itsm/xmlmc/xmlmc.php
./php/_phpinclude/itsmf/xmlmc/xmlmc.php
./php/_phpinclude/itsm_default/xmlmc/xmlmc.php
Here is an example of one of the files it should match
https://regex101.com/r/BUoCif/1
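Run your perl command like this:
perl -i.bak2 -wpe 's/\?>\h*$/RedefineForDocker::standardizeXmlmc()\n/gm'
The order of the command-line options matters here: -e must come last in a bundle, because perl treats whatever follows it (wpn in your -ewpn) as the program text.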
Full pipeline should be like this:
find . -name '*.php' -exec ggrep -PZzl '(?ms)class XmlMethodCall.*\?>\h*$' {} + |
xargs -0 perl -i.bak2 -wpe 's/\?>\h*$/RedefineForDocker::standardizeXmlmc()\n/gm'
Note the use of the -Z option in grep and the -0 option in xargs, which handle filenames containing whitespace and other special characters.
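The same NUL-delimited idea can be sketched in plain bash, which may be handy for checking which files would be touched before running the in-place edit. This is only an illustration: list_matching_files is a hypothetical name, it searches for a fixed string rather than a regex, and it assumes a find that supports -print0 (GNU and BSD find do).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every *.php file under a directory that
# contains a fixed string, coping with spaces in file names.
list_matching_files() {
    local dir=$1 needle=$2 file
    # -print0 ends each name with a NUL byte; read -d '' splits on it.
    find "$dir" -type f -name '*.php' -print0 |
        while IFS= read -r -d '' file; do
            if grep -qF -- "$needle" "$file"; then
                printf '%s\n' "$file"
            fi
        done
}

# Example call: list_matching_files . 'class XmlMethodCall'
```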

Replace multiline string in all files in terminal

I have a couple of files in a directory, in which I have a piece of text between two separators.
Text to keep
//###==###
Text to remove
//###==###
Text to keep
After an extensive search, I found the following Mac OS X Terminal command, with which I can remove the separators themselves.
perl -pi -w -e 's|//###==###||g' `find . -type f`
However, I need something with a regex that does not only remove the separators themselves, but also what is in between. Something like this, although this line doesn't do anything.
perl -pi -w -e 's|//###==###(.*)//###==###||g' `find . -type f`
EDIT AFTER DUPLICATE FLAG
I see something similar here, using the scalar range operator, but I cannot make it work for me. Failed attempts include:
perl -pi -w -e 's|//###==###..//###==###||g' `find . -type f`
perl -pi -w -e 's|//###==###(..)//###==###||g' `find . -type f`
perl -pi -w -e 's|//###==###[..]//###==###||g' `find . -type f`
SOLUTION
With the help of dawg below, the following oneliner will do exactly what I want:
$ perl -0777 -p -i -e 's/(^\s*^\/\/###==###.*?\/\/###==###\s*)//gms' `find . -type f -name "index.php"`
You can use:
s/(^\s*^\/\/###==###.*?\/\/###==###\s*)//gms
To try it in the terminal with Perl, given:
$ echo "$tgt"
Text to keep
//###==###
Text to remove
//###==###
Text to keep
Use the -0777 command flag to slurp the whole file and then:
$ echo "$tgt" | perl -0777 -ple 's/(^\s*^\/\/###==###.*?\/\/###==###\s*)//gms'
Text to keep
Text to keep
Or, you can use the range operator. If done this way, you cannot remove the leading and trailing blank lines if that is your intent:
$ echo "$tgt" | perl -lne 'print unless (/\/\/###==###/ ... /\/\/###==###/)'
Text to keep
Text to keep
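If Perl is not a requirement, a similar line-range deletion can be sketched with sed instead. This swaps in a different tool, so treat it as an alternative rather than as the answer's method; strip_marked_block is a hypothetical name. It assumes each separator sits alone on its own line, and like the range operator it leaves surrounding blank lines in place.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete everything from one //###==### line through
# the next one, separators included. \|...| uses | as the regex delimiter
# so the slashes in the separator need no escaping.
strip_marked_block() {
    sed '\|^//###==###$|,\|^//###==###$|d' "$@"
}

printf '%s\n' 'Text to keep' '//###==###' 'Text to remove' '//###==###' 'Text to keep' |
    strip_marked_block
```

With no file arguments the function reads standard input; given file names, it prints the filtered content of those files without modifying them.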

How do I search files containing a specific string using grep?

For instance, if a file has a line "blahblah myID=1234567 blahblah", I want to search all files containing 1234567 somewhere in the whole file.
I tried grep -r '.* 1234567.* ' directory, but it didn't work.
Do the following:
grep -rw 'directory' -e "pattern"
-r searches recursively and -w matches whole words only.
Example:
grep -rw '/home/lib/foldername/' -e "1234567"
You can also add -n, which prints the line number of each match.
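If only the names of the matching files are wanted, -l can be combined with -r. A minimal sketch, assuming a grep that supports -r (GNU and BSD grep do); files_containing is a hypothetical name, and -F treats the search text as a fixed string rather than a regular expression.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the files under a directory that contain a
# fixed string anywhere in their contents.
files_containing() {
    local needle=$1 dir=$2
    # -r: recurse, -l: print file names only, -F: fixed string, not a regex.
    grep -rlF -- "$needle" "$dir"
}

# Example call: files_containing 1234567 /home/lib/foldername
```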
for i in /dir/*
do
    [ -f "$i" ] && grep "1234567" "$i" >> output.txt
done
This loops over the regular files in /dir, greps each one for "1234567", and appends the matching lines to output.txt.

find works in terminal but fails in bash script

So I'm writing a bash script that counts the number of files in a directory and outputs a number. The function takes a directory argument as well as an optional file-type extension argument.
I am using the following lines to set the dir variable to the directory and ext variable to a regular expression that will represent all the file types to count.
dir=$1
[[ $# -eq 2 ]] && ext="*.$2" || ext="*"
The problem I am encountering occurs when I attempt to run the following line:
echo $(find $dir -maxdepth 1 -type f -name $ext | wc -l)
Running the script from the terminal works when I provide the second file-type argument but fails when I don't.
harrison@Luminous:~$ bash Documents/howmany.sh Documents/ sh
3
harrison@Luminous:~$ bash Documents/howmany.sh Documents/
find: paths must precede expression: Desktop
Usage: find [-H] [-L] [-P] [-Olevel] [-D help|tree|search|stat|rates|opt|exec] [path...] [expression]
0
I have searched for this error and I know it's an issue with the shell expanding my wildcard as explained here. I've tried experimenting with single quotes, double quotes, and backslashes to escape the asterisk but nothing seems to work. What's particularly interesting is that when I try running this directly through the terminal, it works perfectly fine.
harrison@Luminous:~$ echo $(find Documents/ -maxdepth 1 -type f -name "*" | wc -l)
6
Simplified:
dir=${1:-.} #if $1 not set use .
name=${2+*.$2} #if $2 is set use *.$2 for name
name=${name:-*} #if name still isn't set, use *
find "$dir" -name "$name" -print #use quotes
or
name=${2+*.$2} #if $2 is set use *.$2 for name
find "${1:-.}" -name "${name:-*}" -print #use quotes
Also, as @John Kugelman says, you could use:
name=${2+*.$2}
find "${1:-.}" ${name:+-name "$name"} -print
find . -name "*" -print is the same as find . -print, so if $name isn't set, there's no need to specify -name "*".
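Putting these pieces together, the counting script could be sketched as follows. This is one possible arrangement, not the only one: count_files and name_args are hypothetical names, and a bash array is used so that -name is passed to find only when an extension was given. File names containing newlines would be miscounted by wc -l.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count regular files directly inside a directory,
# optionally restricted to one extension.
count_files() {
    local dir=${1:-.}
    local -a name_args=()
    # Add the -name test only when an extension argument was supplied.
    if [[ -n ${2-} ]]; then
        name_args=(-name "*.$2")
    fi
    # Quoting keeps the shell from expanding the * before find sees it.
    find "$dir" -maxdepth 1 -type f "${name_args[@]}" | wc -l
}

# Example calls: count_files Documents/ sh
#                count_files Documents/
```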
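Try quoting the variables where they are used, so that the shell passes the pattern to find unexpanded:
dir="$1"
[[ $# -eq 2 ]] && ext="*.$2" || ext='*'
find "$dir" -maxdepth 1 -type f -name "$ext" | wc -l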
If that doesn't work, you can just switch to an if statement, where you use the -name pattern in a branch and you don't in the other.
A couple more points:
Those are not regular expressions, but rather shell patterns.
echo $(command) is just equivalent to command.

Pass sed output to mv

I'm trying to batch rename text files according to a string they contain.
I used sed to isolate the pattern with \( and \) as I couldn't get this to work in grep.
sed -i '' 's/<title>\(.*\)<\/title>/&/g' *.txt | mv *.txt $sed.txt
(the text I want to use as the filename is between HTML title tags)
Where I wrote $sed would be the output of sed.
Hope that's clear!
A simple loop in bash can accomplish this. If each file is valid HTML, meaning you have only one <title> tag in the file, you can rename them all this way:
for file in *.txt; do
    mv "$file" "$(sed -n 's/<title>\([^<]*\)<\/title>/\1/p' "$file" | sed -e 's/[[:space:]][[:space:]]*/_/g').txt"
done
So, if you have files 1.txt, 2.txt and 3.txt, with cat, dog and my hippo in their respective <title> tags, you'll end up with cat.txt, dog.txt and my_hippo.txt after the above loop.
EDIT: quoted $file in case there are spaces in filenames, and added a second sed to convert any whitespace in the <title> text to _ in the resulting filenames. NOTE: [[:space:]] in the second sed command matches both spaces and tabs.
You can enclose a command in backticks (`) to insert its output at the place you want. For a single file, try:
mv file.txt "`sed -n 's/.*<title>\(.*\)<\/title>.*/\1/p' file.txt`.txt"
It is not very flexible, since it handles one file at a time, but it should work.
(I haven't used it in a while and cannot test it now, so I might be wrong).
Here is the command I would use:
for i in *.txt ; do
    sed "s=<title>\(.*\)</title>=mv '$i' '\1'=e" "$i"
done
The sed substitution searches for the pattern in each of your .txt files. For each file it builds the string mv 'file_name' 'found_pattern'.
With the e flag (a GNU sed extension) at the end of the substitution, the resulting string is executed directly as a shell command, which renames your files.
Some hints:
Note the use of = instead of / as the delimiter for the sed substitution: it's more readable because the pattern already contains a / (you could use many other symbols if you don't like =), and this way you don't have to escape the / in your pattern.
The e command for sed executes the created string.
(I'm speaking of this one below:
sed "s=<title>\(.*\)</title>=mv '$i' '\1'=e" "$i"
                                          ^
)
So use it with caution! I would recommend first running the line without the final e: it won't execute any mv command, but will just print what would be executed if you added the e.
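The same "preview first, then execute" idea can be sketched without the e flag, which may help where GNU sed is not available. Everything here is illustrative: title_of and rename_by_title are hypothetical names, the sketch assumes the opening and closing title tags are on the same line, and it does not handle titles containing a /.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the text of the first <title>...</title> found.
title_of() {
    sed -n 's/.*<title>\([^<]*\)<\/title>.*/\1/p' "$1" | head -n 1
}

# Hypothetical helper: rename each .txt file in a directory after its title.
# First argument "yes" only prints the mv commands; anything else runs them.
rename_by_title() {
    local dry_run=$1 dir=$2 file title target
    for file in "$dir"/*.txt; do
        [ -f "$file" ] || continue
        title=$(title_of "$file")
        [ -n "$title" ] || continue               # skip files without a title
        target=$dir/${title//[[:space:]]/_}.txt   # whitespace becomes _
        if [ "$dry_run" = yes ]; then
            printf 'mv %s %s\n' "$file" "$target"
        else
            mv -- "$file" "$target"
        fi
    done
}

# Example calls: rename_by_title yes .   (preview)
#                rename_by_title no .    (rename)
```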
What I read from your question is:
you have a number of text (html) files in a directory
each file contains at least the tag <title> ... </title>
you want to extract the content (elements.text) and use it as filename
last you want to rename that file to the extracted filename
Is this correct?
So you need to loop through the files, e.g. with xargs or find:
ls *.txt | xargs -i\{\} command "{}" ...
find . -maxdepth 1 -type f -name '*.txt' -exec command "{}" ... \;
I always set the xargs placeholder with -i\{\}, because the resulting command then stays compatible with find, which uses {} as its placeholder.
The -maxdepth option keeps find from descending into subdirectories; if there are no subdirectories, you can leave it out.
command could be something very simple like echo "Testing File: {}" or a really small script if you use it with bash:
find . -name '*.txt' -exec bash -c 'CUR_FILE=$1; echo "Working on: $CUR_FILE"; ls -l "$CUR_FILE"' _ {} \;
The big decision for your question is: how to get the text from title element.
A simple solution (suitable if opening and closing tag is on same textline) would be by grep
A more solid solution is to use a HTML Parser and navigate by DOM operation
The simple solution is based on two steps:
get the title line
remove everything before and after the title content
So, putting it together:
ls *.txt | xargs -i\{\} bash -c 'TITLE=$(grep -E "<title>[^<]*</title>" "$1"); NEW_FNAME=$(echo "$TITLE" | sed -e "s#.*<title>\([^<]*\)</title>.*#\1#"); mv -v "$1" "$NEW_FNAME.txt"' _ "{}"
Same with usage of find:
find . -maxdepth 1 -type f -name '*.txt' -exec bash -c 'TITLE=$(grep -E "<title>[^<]*</title>" "$1"); NEW_FNAME=$(echo "$TITLE" | sed -e "s#.*<title>\([^<]*\)</title>.*#\1#"); mv -v "$1" "$NEW_FNAME.txt"' _ {} \;
Hopefully it is what you expected.