I have a file hoge.txt like this:
case $1 in
[ $input = "q" ] && exit
if [ -s $filename ]
if [ ! -f $1 -o -f $2 ]
echo $list
rm -f ${BKDIR}
BKDIR=/${HOME}/backup
And I want to find all alphabetic variables, excluding positional parameters like "$1", and output them to a new file like this:
$input
$filename
$list
The best I can do so far is:
cat hoge.txt | awk '{for(i=1;i<=NF;i++){ if($i=="$/[a-zA-Z]/"){print $i} } }'
But it doesn't return any results.
You don't need Awk for such a trivial task; just use grep's extended regular expression support (the -E flag) and print only the matching part of each line with -o:
grep -Eo '\$[a-zA-Z]+' file
produces
$input
$filename
$list
and write to a new file using the redirection operator (>):
grep -Eo '\$[a-zA-Z]+' file > variablesList
Or, saving two keystrokes (as suggested in the comments below), enable the case-insensitive flag -i:
grep -iEo '\$[a-z]+' file
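A runnable end-to-end sketch, recreating the sample hoge.txt from the question. Note that brace-wrapped forms such as ${BKDIR} are not matched by this pattern; a wider pattern like \$\{?[a-zA-Z_]+\}? would be needed for those.

```shell
# Recreate the sample input from the question
cat > hoge.txt <<'EOF'
case $1 in
[ $input = "q" ] && exit
if [ -s $filename ]
if [ ! -f $1 -o -f $2 ]
echo $list
rm -f ${BKDIR}
BKDIR=/${HOME}/backup
EOF

# Positional parameters like $1 are skipped automatically,
# since [a-zA-Z]+ requires at least one letter after the $.
grep -Eo '\$[a-zA-Z]+' hoge.txt > variablesList
cat variablesList
```

Running this prints $input, $filename and $list, one per line.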
I want to format data from this
header1|header2|header3
"ID001"|"""TEST"""|"
TEST TEST TEST"|"TEST 4"
"ID002"|"TEST"|"TESTTESTTEST"|"TEST 5"
into
header1|header2|header3
"ID001"|"TEST"|"TEST TEST TEST"|"TEST 4"
"ID002"|"TEST"|"TESTTESTTEST"|"TEST 5"
So the logic is:
keep the header as the original
if a line does not start with ", append it to the end of the previous line
replace """ with "
I want to format this with a bash script. I've written the script below, but it still isn't working:
#!/bin/bash
if [ $# -eq 0 ]
then
    echo "No arguments supplied"
    exit;
fi
FOLD=$1"*"
CHECK=$1"/bix.done"
if test -f $CHECK; then
    date > /result.txt
    echo "starting Covert.... "
    echo "from folder : " $1
    for file in $FOLD
    do
        if [[ $file != *History* ]]; then
            if [[ $file == *.csv ]]; then
                FILETEMP=$file".temp"
                mv $file $FILETEMP
                awk '/^"/ {if (f) print f; f=$0; next} {f=f FS $0} END {print f}' $FILETEMP > $file
                #rm $FILETEMP
            fi
        fi
    done
    date > /home/result.txt
fi
#ls $1 -l
This might work for you (GNU sed):
sed '1b;:a;N;/\n"/!s/\n//;ta;s/"""/"/g;P;D' file
Always print the first (header) line. Append the next line to the current line; if the appended line does not begin with a ", remove the newline and repeat until one does. Then substitute a single " for each """ globally, print up to the first newline, and repeat.
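As a sketch, assuming GNU sed (its N command prints the pattern space at end-of-input instead of discarding it) and the sample data from the question:

```shell
# Recreate the broken input from the question
cat > file <<'EOF'
header1|header2|header3
"ID001"|"""TEST"""|"
TEST TEST TEST"|"TEST 4"
"ID002"|"TEST"|"TESTTESTTEST"|"TEST 5"
EOF

# Join continuation lines and squeeze """ down to "
sed '1b;:a;N;/\n"/!s/\n//;ta;s/"""/"/g;P;D' file
```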
Specific to joining the 2nd line and condensing the multiple double quotes to a single double quote, you could do:
sed '2{s/""*/"/g;h;N;s/\n//}' file
print all lines by default, except for
the second line (2), then
s/""*/"/g substitute a single double quote for each run of double quotes,
h copy the pattern space to the hold space,
N append the next line to the pattern space, and
s/\n// substitute the '\n' with nothing, joining the lines.
Example Use/Output
With your data in file you could do:
$ sed '2{s/""*/"/g;h;N;s/\n//}' file
header1|header2|header3
"ID001"|"TEST"|"TEST TEST TEST"|"TEST 4"
"ID002"|"TEST"|"TESTTESTTEST"|"TEST 5"
(note: if you need to condense multiple double quotes to single double quotes in all lines, you can turn the command around and use sed 's/""*/"/g;2{h;N;s/\n//}')
It's been resolved with the code below:
if test -f $CHECK; then
    date > /home/startconvert.txt
    echo "starting Convert.... "
    echo "from folder : " $1
    for file in $FOLD
    do
        if [[ $file != *History* ]]; then
            if [[ $file == *.csv ]]; then
                #FILETEMP=$file".temp"
                #mv $file $FILETEMP
                #awk '/^"/ {if (f) print f; f=$0; next} {f=f FS $0} END {print f}' $FILETEMP > $file
                #rm $FILETEMP
                perl -i -0777pe 's/\r\n([^"])/ $1/g' $file;
                perl -i -0777pe 's/\n"""/"/' $file;
                perl -i -0777pe 's/\r("\|)/ $1/g' $file;
                sed -i -e 's/"""/"/g' $file;
                perl -i -0777pe 's/\n([^"])/ $1/g' $file;
                perl -i -0777pe 's/\n("\|)/ $1/g' $file;
                sed -i -e 's/""-/-/g' $file;
                perl -i -0777pe 's/\n([^"])/ $1/g' $file;
                perl -i -0777pe 's/\r([^"])/ $1/g' $file;
                perl -i -0777pe 's/\r\n([^"])/ $1/g' $file;
            fi
        fi
    done
    date > /home/endconvert.txt
fi
Not sure about the bash part; this expression, though,
[\r\n]^([^"])
with a replacement of $1, might be somewhat close.
If you wish to explore/simplify/modify the expression, it's been explained on the top right panel of regex101.com. If you'd like, you can also watch in this link how it would match against some sample inputs.
I have some basic knowledge on using regular expressions with grep (bash).
But I want to use regular expressions the other way around.
For example I have a file containing the following entries:
line_one=[0-3]
line_two=[4-6]
line_three=[7-9]
Now I want to use bash to figure out to which line a particular number matches.
For example:
grep 8 file
should return:
line_three=[7-9]
Note: I am aware that the example of "grep 8 file" doesn't make sense, but I hope it helps to understand what I am trying to achieve.
Thanks for your help,
Marcel
As others have pointed out, awk is the right tool for this:
awk -F'=' '8~$2{print $0;}' file
... and if you want this tool to feel more like grep, a quick bash wrapper:
#!/bin/bash
awk -F'=' -v seek_value="$1" 'seek_value~$2{print $0;}' "$2"
Which would run like:
./not_exactly_grep.sh 8 file
line_three=[7-9]
My first impression is that this is not a task for grep, maybe for awk.
Trying to do things with grep I only see this:
for line in $(cat file); do echo 8 | grep "${line#*=}" && echo "${line%=*}" ; done
Using while to read the file (following the comments):
while IFS= read -r line; do echo 8 | grep "${line#*=}" && echo "${line%=*}" ; done < file
This can be done in native bash using the syntax [[ $value =~ $regex ]] to test:
find_regex_matching() {
    local value=$1
    while IFS= read -r line; do          # read from input line-by-line
        [[ $line = *=* ]] || continue    # skip lines not containing an =
        regex=${line#*=}                 # prune everything before the = for the regex
        if [[ $value =~ $regex ]]; then  # test whether we match...
            printf '%s\n' "$line"        # ...and print if we do.
        fi
    done
}
...used as:
find_regex_matching 8 <file
...or, to test it with your sample input inline:
find_regex_matching 8 <<'EOF'
line_one=[0-3]
line_two=[4-6]
line_three=[7-9]
EOF
...which properly emits:
line_three=[7-9]
You could replace printf '%s\n' "$line" with printf '%s\n' "${line%%=*}" to print only the key (contents before the =), if so inclined. See the bash-hackers page on parameter expansion for a rundown on the syntax involved.
This is not built-in functionality of grep, but it's easy to do with awk, with a change in syntax:
/[0-3]/ { print "line one" }
/[4-6]/ { print "line two" }
/[7-9]/ { print "line three" }
If you really need to, you could programmatically change your input file to this syntax, if it doesn't contain any characters that need escaping (mainly / in the regex or " in the string):
sed -e 's#\(.*\)=\(.*\)#/\2/ { print "\1" }#'
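For instance, a sketch using the sample file from the question, feeding the generated program straight to awk via process substitution (bash-specific):

```shell
# Recreate the sample regex file
cat > file <<'EOF'
line_one=[0-3]
line_two=[4-6]
line_three=[7-9]
EOF

# Generate one awk rule per line, then match "8" against them
echo 8 | awk -f <(sed -e 's#\(.*\)=\(.*\)#/\2/ { print "\1" }#' file)
```

This prints line_three.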
As I understand it, you are looking for a range that includes some value.
You can do this in gawk:
$ cat /tmp/file
line_one=[0-3]
line_two=[4-6]
line_three=[7-9]
$ awk -v n=8 'match($0, /([0-9]+)-([0-9]+)/, a){ if (a[1]<=n && a[2]>=n) print $0 }' /tmp/file
line_three=[7-9]
Since the digits are being treated as numbers (vs a regex) it supports larger ranges:
$ cat /tmp/file
line_one=[0-3]
line_two=[4-6]
line_three=[75-95]
line_four=[55-105]
$ awk -v n=92 'match($0, /([0-9]+)-([0-9]+)/, a){ if (a[1]<=n && a[2]>=n) print $0 }' /tmp/file
line_three=[75-95]
line_four=[55-105]
If you are just looking to interpret the right hand side of the = as a regex, you can do:
$ awk -F= -v tgt=8 'tgt~$2' /tmp/file
You could do something like
grep -Ef <(cut -d= -f2 file) <(echo 8)
This will grep what you want, but will not display where it matched.
With grep you can show some message:
echo "8" | sed -n '/[7-9]/ s/.*/Found it in line_three/p'
Now you can transform your regexp file into such commands:
sed 's#\(.*\)=\(.*\)#/\2/ s/.*/Found at \1/p#' file
Store these commands in a virtual command file and you will have
echo "8" | sed -nf <(sed 's#\(.*\)=\(.*\)#/\2/ s/.*/Found at \1/p#' file)
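With the sample file from the question, the whole pipeline would look like this (a sketch; the process substitution requires bash):

```shell
# Recreate the sample regex file
cat > file <<'EOF'
line_one=[0-3]
line_two=[4-6]
line_three=[7-9]
EOF

# Generate sed commands from the regex file, then run them against "8"
echo "8" | sed -nf <(sed 's#\(.*\)=\(.*\)#/\2/ s/.*/Found at \1/p#' file)
```

This prints Found at line_three.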
If I have a string:
s='path/to/my/foo.txt'
and an array
declare -a include_files=('foo.txt' 'bar.txt');
how can I check the string for matches in my array efficiently?
You could loop through the array and use a bash substring check
for file in "${include_files[@]}"
do
    if [[ $s = *${file} ]]; then
        printf "%s\n" "$file"
    fi
done
Alternately, if you want to avoid the loop and you only care whether a file name matches or not, you could use the @(...) form of bash extended globbing. The following example assumes that array file names do not contain |.
shopt -s extglob
declare -a include_files=('foo.txt' 'bar.txt');
s='path/to/my/foo.txt'
printf -v pat "%s|" "${include_files[@]}"
pat="${pat%|}"
printf "%s\n" "${pat}"
#prints foo.txt|bar.txt
if [[ ${s##*/} = @(${pat}) ]]; then echo yes; fi
For an exact match to the file name:
#!/bin/bash
s="path/to/my/foo.txt";
ARR=('foo.txt' 'bar.txt');
for str in "${ARR[@]}";
do
    # if [ $(echo "$s" | awk -F"/" '{print $NF}') == "$str" ]; then
    if [ "$(basename "$s")" == "$str" ]; then # A better option than awk for sure...
        echo "match";
    else
        echo "no match";
    fi;
done
Let's say I have a file like this:
adsf(2)
af(3)
g5a(65)
aafg(1245)
a(3)df
How can I get only the numbers between ( and ) from this, using bash?
A couple of solutions come to mind. Some of them handle the empty lines correctly, others do not; it's trivial to remove those, though, using either grep -v '^$' or sed '/^$/d'.
sed
sed 's|.*(\([0-9]\+\).*|\1|' input
awk
awk -F'[()]' '/./{print $2}' input
2
3
65
1245
3
pure bash
#!/bin/bash
IFS="()"
while read a b; do
    if [ -z "$b" ]; then
        continue
    fi
    echo $b
done < input
and finally, using tr
cat input | tr -d '[a-z()]'
while read line; do
    if [ -z "$line" ]; then
        continue
    fi
    line=${line#*(}
    line=${line%)*}
    echo $line
done < file
Positive lookaround:
$ echo $'a1b(2)c\nd3e(456)fg7' | grep -Poe '(?<=\()[0-9]*(?=\))'
2
456
Another one:
while read line ; do
    [[ $line =~ .*\(([[:digit:]]+)\).* ]] && echo "${BASH_REMATCH[1]}"
done < file
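A self-contained sketch of this BASH_REMATCH approach, using the sample data from the question (with the blank lines the other answers allude to; those simply fail the match and are skipped):

```shell
# Recreate the sample input
cat > file <<'EOF'
adsf(2)

af(3)

g5a(65)

aafg(1245)

a(3)df
EOF

# Extract the digits captured between ( and ) on each line
while read -r line; do
    [[ $line =~ .*\(([[:digit:]]+)\).* ]] && echo "${BASH_REMATCH[1]}"
done < file
```

This prints 2, 3, 65, 1245 and 3, one per line.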
How can I match md5 hashes with the grep command?
In php I used this regular expression pattern in the past:
/^[0-9a-f]{32}$/i
But I tried:
grep '/^[0-9a-f]{32}$/i' filename
grep '[0-9a-f]{32}$/' filename
grep '[0-9a-f]{32}' filename
And other variants, but I am not getting any output, and I know for sure the file contains MD5 hashes.
You want this:
grep -e "[0-9a-f]\{32\}" filename
Or more like, based on your file format description, this:
grep -e ":[0-9a-f]\{32\}" filename
Well, given the format of your file, the first variant won't work because you are trying to match the beginning of the line.
Given the following file contents:
a1:52:d048015ed740ae1d9e6998021e2f8c97
b2:667:1012245bb91c01fa42a24a84cf0fb8f8
c3:42:
d4:999:85478c902b2da783517ac560db4d4622
The following should work to show you which lines have the md5:
grep -E -i '[0-9a-f]{32}$' input.txt
a1:52:d048015ed740ae1d9e6998021e2f8c97
b2:667:1012245bb91c01fa42a24a84cf0fb8f8
d4:999:85478c902b2da783517ac560db4d4622
-E for extended regular expression support, and -i to ignore case in the pattern and the input file.
If you want to find the lines that don't match, try
grep -E -i -v '[0-9a-f]{32}$' input.txt
The -v inverts the match, so it shows you the lines that don't have an MD5.
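A runnable sketch with the sample lines from above:

```shell
# Recreate the sample input
cat > input.txt <<'EOF'
a1:52:d048015ed740ae1d9e6998021e2f8c97
b2:667:1012245bb91c01fa42a24a84cf0fb8f8
c3:42:
d4:999:85478c902b2da783517ac560db4d4622
EOF

grep -E -i '[0-9a-f]{32}$' input.txt      # the three lines ending in a hash
grep -E -i -v '[0-9a-f]{32}$' input.txt   # only the hashless c3:42: line
```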
Meh.
#!/bin/sh
while IFS=: read filename filesize hash
do
    if [ -z "$hash" ]
    then
        echo "$filename"
    fi
done < hashes.lst
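For example, with a hashes.lst in the filename:filesize:hash layout of the sample lines shown earlier (a sketch; the field layout is an assumption based on that sample):

```shell
# Recreate a sample hashes.lst
cat > hashes.lst <<'EOF'
a1:52:d048015ed740ae1d9e6998021e2f8c97
b2:667:1012245bb91c01fa42a24a84cf0fb8f8
c3:42:
d4:999:85478c902b2da783517ac560db4d4622
EOF

# Print the names of entries whose hash field is empty
while IFS=: read filename filesize hash
do
    if [ -z "$hash" ]
    then
        echo "$filename"
    fi
done < hashes.lst
```

This prints c3, the only entry with no hash.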
A little one-liner which works cross platform on Linux and OSX, only returning the MD5 hash value (replace YOURFILE with your filename):
[ "$(uname)" = "Darwin" ] && { MD5CMD=md5; } || { MD5CMD=md5sum; } \
&& { ${MD5CMD} YOURFILE | grep -o "[a-fA-F0-9]\{32\}"; }
Example:
$ touch YOURFILE
$ [ "$(uname)" = "Darwin" ] && { MD5CMD=md5; } || { MD5CMD=md5sum; } && { ${MD5CMD} YOURFILE | grep -o "[a-fA-F0-9]\{32\}"; }
d41d8cd98f00b204e9800998ecf8427e