Grep regex contained in a file (not grep -f option!)

I am reading some equipment configuration output and checking whether the configuration is correct for the HW configuration. The template configurations are stored as files with all the params, and the lines contain regular expressions (basically just to account for a variable number of spaces between "object", "param" and "value" in the output, plus some index variance).
First of all, I cannot use grep -f $template $output, since I have to process each line of the template separately. I have something like this running:
while read line
do
attempt=`grep -E "$line" $file`
# ...etc
done < $template
Which works just fine if the template doesn't contain regexes.
Problem: grep interprets the search pattern literally when it is read from the file. I tested the regexes themselves; they work fine from the command line.
With this background, the question is:
How do I read regexes from a file (line by line) and have grep not interpret them literally?

Using the following script:
#!/usr/bin/env bash
# multi-grep
regexes="$1"
file="$2"
while IFS= read -r rx ; do
result="$(grep -E "$rx" "$file")"
grep -q -E "$rx" "$file" && printf 'Look ma, a match: %s!\n' "$result"
done < "$regexes"
And files with the following contents:
$ cat regexes
RbsLocalCell=S.C1.+eulMaxOwnUuLoad.+100
$ cat data
RbsLocalCell=S1C1 eulMaxOwnUuLoad 100
I get this result:
$ ./multi-grep regexes data
Look ma, a match: RbsLocalCell=S1C1 eulMaxOwnUuLoad 100!
This works for different spacing as well
$ cat data
RbsLocalCell=S1C1 eulMaxOwnUuLoad 100
$ ./multi-grep regexes data
Look ma, a match: RbsLocalCell=S1C1 eulMaxOwnUuLoad 100!
Seems okay to me.
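One plausible cause of the original problem, offered as an assumption since the question does not show the template file: a plain read treats backslashes as escape characters and strips them, so a pattern such as \s+ reaches grep mangled. The script above avoids this with IFS= read -r. A minimal sketch of the difference (the pattern line is made up for illustration):

```shell
# A template line containing a backslash (hypothetical example pattern).
pattern_line='eulMaxOwnUuLoad\s+100'

# Plain read: the backslash is treated as an escape character and dropped.
plain=$(printf '%s\n' "$pattern_line" | { read line; printf '%s' "$line"; })

# IFS= read -r: the line is kept exactly as it appears in the input.
raw=$(printf '%s\n' "$pattern_line" | { IFS= read -r line; printf '%s' "$line"; })

printf 'read:    %s\n' "$plain"
printf 'read -r: %s\n' "$raw"
```

If the two printed lines differ for your template, the loop in the question needs -r.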

Use the -F option, or fgrep.
What's more, you seem to want to match full lines: add the -x option as well.
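To see what those options do, here is a small sketch with made-up data. Note that -F switches regex interpretation off entirely, so it only helps when the template lines are meant to be taken literally:

```shell
# Hypothetical data file with three lines.
data=$(mktemp)
printf '%s\n' 'a.c' 'abc' 'a.c trailing' > "$data"

# -E: "." is a wildcard, so all three lines match.
regex_hits=$(grep -c -E 'a.c' "$data")

# -F: the pattern is a fixed string, so "abc" no longer matches.
fixed_hits=$(grep -c -F 'a.c' "$data")

# -F -x: the fixed string must match the whole line.
exact_hits=$(grep -c -F -x 'a.c' "$data")

printf 'regex=%s fixed=%s exact=%s\n' "$regex_hits" "$fixed_hits" "$exact_hits"
rm -f "$data"
```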

Another point: put "$line" in quotes so that the shell does not reinterpret the pattern.
All in all, it looks like you would be better off writing a Perl script than a shell script.

Related

Replacing string in linux using sed/awk based

I want to replace this
#!/usr/bin/env bash
with this
#!/bin/bash
I have tried two approaches:
Approach 1
original_str="#!/usr/bin/env bash"
replace_str="#!/bin/bash"
sed s~${original_str}~${replace_str}~ filename
Approach 2
line=`grep -n "/usr/bin" filename`
awk NR==${line} {sub("#!/usr/bin/env bash"," #!/bin/bash")}
But neither of them works.

In Bash, an unescaped ! inside double quotes triggers history expansion when that feature is enabled (the default in interactive shells).
You can just do:
original_str='/usr/bin/env bash'
replace_str='/bin/bash'
sed "s~$original_str~$replace_str~" file
#!/bin/bash

Using escape characters :
$ cat z.sh
#!/usr/bin/env bash
$ sed -i "s/\/usr\/bin\/env bash/\/bin\/bash/g" z.sh
$ cat z.sh
#!/bin/bash

Try this out in the terminal:
echo "#!/usr/bin/env bash" | sed 's:#!/usr/bin/env bash:#!/bin/bash:g'
In cases like this I use : as the delimiter, because with / sed can no longer tell which slash separates the parts of the command and which one is part of the text.
Plus it looks really clean.
The cool thing is that you can use almost any character as a separator (anything other than a backslash or a newline).
For example a semicolon ; or the pipe symbol | .
With the escape character \ I think the code would look too messy and be hard to read, since you have to put it before every forward slash in the command.
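A short sketch comparing delimiters on the same substitution. All three commands are meant to be equivalent; only the separator changes:

```shell
line='#!/usr/bin/env bash'

# Colon as the delimiter: no escaping needed for the slashes in the paths.
with_colon=$(printf '%s\n' "$line" | sed 's:/usr/bin/env bash:/bin/bash:')

# Pipe symbol as the delimiter.
with_pipe=$(printf '%s\n' "$line" | sed 's|/usr/bin/env bash|/bin/bash|')

# Default slash delimiter: every slash in the text must be escaped.
with_escapes=$(printf '%s\n' "$line" | sed 's/\/usr\/bin\/env bash/\/bin\/bash/')

printf '%s\n' "$with_colon" "$with_pipe" "$with_escapes"
```

Whichever delimiter you pick must not appear unescaped in the search or replacement text.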
The command above just prints the replaced line. If you want to modify the file, then you need to specify the input and output files, like this:
sed 's:#!/usr/bin/env bash:#!/bin/bash:g' <inputfile >outputfile-new
Remember to give the output file a different name (here with the -new suffix): if the input and output are the same file, the shell truncates it before sed reads it and the original content is lost. This happened to me in the past, and it is not pleasant. For example:
<test.txt >test-new.txt
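If the goal is to end up with the change in the original file, one common pattern is to write to a temporary file and move it over the original only when sed succeeds. This is a sketch under that assumption; the file created here is a stand-in for your real script:

```shell
# Hypothetical file standing in for the script to be edited.
script=$(mktemp)
printf '%s\n' '#!/usr/bin/env bash' 'echo hello' > "$script"

# Write the edited text to a temporary file first, and replace the
# original only if sed succeeded.
tmp=$(mktemp)
sed 's:#!/usr/bin/env bash:#!/bin/bash:' "$script" > "$tmp" && mv "$tmp" "$script"

first_line=$(head -n 1 "$script")
printf '%s\n' "$first_line"
rm -f "$script"
```

Note that the replaced file takes on the temporary file's permissions, so an executable bit would have to be restored with chmod.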

Troubles with regular expressions

I would like some help with extended regular expressions.
I have been trying to figure this out, but in vain.
I have a file conflicts.txt which looks like this. Please note that this is only part of the file; there are many lines like these:
Server/core/wildSetting.json
Server/core
Client/arcade/src/assets
Client/arcade/src/assets/
Client/arcade/src/assets
Client/arcade/src/Game/
I am writing a shell script which goes through this file line by line:
if [ -s "$CONFLICTS" ] ; then
count=0
while read LINE
do
let count++
echo -e "\n $LINE \n"
done < $CONFLICTS
fi
The above prints the file line by line. What I am trying to do now is redirect the lines that contain a certain text into another file. For that I have modified the echo line of the code to:
echo -e "\n $LINE \n" | grep -E "Server/game" > newfile.txt
My query:
As we can see, there are many lines of the form Server/core...
I want to write a regular expression and use it in grep, which matches two kinds of lines:
1) lines containing ONLY the string "Server/core", preceded and followed by any number of spaces
2) all the lines containing the string "assets"
I have written a regular expression for this, but it doesn't work.
Here is my regex:
grep -E '[^' '*Server/core$] | [assets]'
Can you please tell me what the right way of doing it is?
Please note that there can be any number of spaces before and after "Server/core" as this file is a result of parsing a previous file.
Thanks !

Based on what's asked in the comments:
1) the lines containing the string "assets"
$ grep "assets" file
Client/arcade/src/assets
Client/arcade/src/assets/
Client/arcade/src/assets
2) lines that contain only the string "Server/core", preceded and followed by any amount of space
$ grep "^[ ]*Server/core[ ]*$" file
Server/core

sed (Stream EDitor) can solve your problem perfectly.
Try this command (the \| alternation is a GNU sed extension): sed -n '/^ *Server\/core\|assets/p' conflicts.txt
There is something wrong with your grep -E '[^' '*Server/core$] | [assets]'.
A ^ at the start of a bracket expression negates it: the expression then matches any single character that is not listed in the brackets.
If you want to modify the file in place, add the -i option as a separate flag (with GNU sed, -in would be read as -i with the backup suffix n):
sed -n -i '/^ *Server\/core\|assets/p' conflicts.txt
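A small sketch of how bracket expressions behave, which is why [assets] and a leading [^ do not do what the question intended. The input words are made up for illustration:

```shell
# [assets] is a set: it matches any ONE of the characters a, s, e, t.
# "lamb" contains an "a", so the line matches even without the word "assets".
set_hits=$(printf '%s\n' 'lamb' | grep -c '[assets]')

# [^co] matches any single character that is NOT c or o.
# -o prints each match on its own line; tr joins them back together.
negated=$(printf '%s\n' 'core' | grep -o '[^co]' | tr -d '\n')

printf 'set_hits=%s negated=%s\n' "$set_hits" "$negated"
```

To match the literal word, write it without brackets; to anchor at the start of the line, put ^ outside any brackets.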

Your regex just needs to be this:
assets|^\s*Server/core\s*$
I think sed or awk would be a better tool than grep, although you would need to escape the forward slash if you used one of these.
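As a sketch of that alternation used directly with grep -E on the whole file, with no loop. It uses the POSIX class [[:space:]] in place of \s (an assumption made for portability), and the sample file is a trimmed copy of the one in the question with spaces added around one line:

```shell
# Hypothetical sample standing in for conflicts.txt.
conflicts=$(mktemp)
printf '%s\n' \
  'Server/core/wildSetting.json' \
  '  Server/core  ' \
  'Client/arcade/src/assets' \
  'Client/arcade/src/assets/' \
  'Client/arcade/src/Game/' > "$conflicts"

# Either the line contains "assets", or it is nothing but "Server/core"
# surrounded by optional whitespace.
matches=$(grep -E 'assets|^[[:space:]]*Server/core[[:space:]]*$' "$conflicts")

printf '%s\n' "$matches"
rm -f "$conflicts"
```

The Server/core/wildSetting.json line is rejected because the second alternative is anchored at both ends.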

BASH shell use regex to get value from file into a parameter

I've got a file that I need to get a piece of text from using regex. We'll call the file x.txt. What I would like to do is open x.txt, extract the regex match from the file and set that into a parameter. Can anyone give me some pointers on this?
EDIT
So in x.txt I have the following line
$variable = '1.2.3';
I need to extract the 1.2.3 from the file into my bash script to then use for a zip file

Use sed to do it efficiently† in a single pass:
var=$(sed -ne "s/\\\$variable *= *['\"]\([^'\"]*\)['\"] *;.*/\1/p" file)
The above works whether your value is enclosed in single or double quotes.
Also see Can GNU Grep output a selected group?.
$ cat dummy.txt
$bla = '1234';
$variable = '1.2.3';
blabla
$variable="hello!"; #comment
$ sed -ne "s/\\\$variable *= *['\"]\([^'\"]*\)['\"] *;.*/\1/p" dummy.txt
1.2.3
hello!
$ var=$(sed -ne "s/\\\$variable *= *['\"]\([^'\"]*\)['\"] *;.*/\1/p" dummy.txt)
$ echo $var
1.2.3 hello!
† or at least as efficiently as sed can churn through data when compared to grep on your platform of choice. :)

You can use the grep-chop-chop technique
var="$(grep -F -m 1 '$variable =' file)"; var="${var#*\'}"; var="${var%\'*}"
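The two parameter expansions in that one-liner can be followed step by step. A sketch, with the matched line written out by hand instead of coming from grep:

```shell
# The line grep would return (hard-coded here for illustration).
line="\$variable = '1.2.3';"

# ${line#*\'} removes the shortest prefix that ends in a single quote.
after_quote=${line#*\'}

# ${after_quote%\'*} removes the shortest suffix that starts with a single quote.
value=${after_quote%\'*}

printf '%s\n' "$after_quote" "$value"
```

Both expansions are done by the shell itself, so only the initial grep starts an external process.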

If all the file lines have that format ($<something> = '<value>'), then you can use cut like this:
value=$(cut -d"'" -f2 file)

Save part of matching pattern to variable

I want to extract a substring matching a pattern and save it to a file. An example string:
Apr 12 19:24:17 PC_NMG kernel: sd 11:0:0:0: [sdf] Attached SCSI removable disk
I want to extract the part between the brackets, in this case [sdf].
I tried to do something like grep -e '[$subtext]' to save the text in the brackets to a variable. Of course it doesn't work, but I am looking for a way similar to this. It would be very elegant to include a variable in a regex like this. What is the best way to do this?
Thanks!

BASH_REMATCH is an array containing groups matched by the shell.
$ line='Apr 12 19:24:17 PC_NMG kernel: sd 11:0:0:0: [sdf] Attached SCSI removable disk'
$ [[ $line =~ \[([^]]+)\] ]]; echo "${BASH_REMATCH[1]}"
sdf
If you want to put this in a loop, you can do that; here's an example:
while read -r line; do
if [[ $line =~ \[([^]]+)\] ]] ; then
drive="${BASH_REMATCH[1]}"
do_something_with "$drive"
fi
done < <(dmesg | egrep '\[([hsv]d[^]]+)\]')
This approach puts no external calls into the loop -- so the shell doesn't need to fork and exec to start external programs such as sed or grep. As such, it is arguably significantly cleaner than other approaches offered here.
BTW, your initial approach (using grep) was not that far off; using grep -o will output only the matching substring:
$ subtext=$(egrep -o "\[[^]]*\]" <<<"$line")
...though this includes the brackets inside the capture, and thus is not 100% correct.

There's probably a better way using bash only, but:
echo 'Apr 12 19:24:17 PC_NMG kernel: sd 11:0:0:0: [sdf] Attached SCSI removable disk' \
| sed -s 's/.*\[\(.*\)\].*/\1/'
As Jurgen points out, this also prints lines that do not match, unchanged. If you don't want to output non-matching lines, use -n to suppress the automatic printing and the p flag to print the result only when the substitution succeeds:
| sed -n 's/.*\[\(.*\)\].*/\1/p'

Match against regex, replace using grouping and only print if regex matched:
sed -n "s/.*\[\(.*\)\].*/\1/p"

sed's regular expressions are greedy, so the sed answers will miss some of the data if there is more than one [] pair on a line. Use the grep+tr solution, or you can use awk:
$ cat file
[sss]Apr 12 19:24:17 PC_NMG kernel: sd 11:0:0:0: [sdf] Attached SCSI removable disk [tag] blah blah
$ awk -F"[" '{for(i=2;i<=NF;i++){if($i~/\]/){sub("].*","",$i)};print $i}}' file
sss
sdf
tag
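The grep+tr answer referred to above is not shown in this thread, so the following is only a guess at what it might look like, next to the greedy sed command for comparison:

```shell
line='[sss]Apr 12 19:24:17 PC_NMG kernel: sd 11:0:0:0: [sdf] Attached SCSI removable disk [tag] blah blah'

# Greedy: the leading .* swallows everything up to the LAST "[",
# so only the final bracketed value survives.
greedy=$(printf '%s\n' "$line" | sed -n 's/.*\[\(.*\)\].*/\1/p')

# grep -o prints every bracketed match on its own line;
# tr then deletes the bracket characters.
all_tags=$(printf '%s\n' "$line" | grep -o '\[[^]]*]' | tr -d '[]')

printf 'greedy: %s\n' "$greedy"
printf '%s\n' "$all_tags"
```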

Using regular expressions in shell script

What is the correct way to parse a string using regular expressions in a linux shell script? I wrote the following script to print my SO rep on the console using curl and sed (not solely because I'm rep-crazy - I'm trying to learn some shell scripting and regex before switching to linux).
json=$(curl -s http://stackoverflow.com/users/flair/165297.json)
echo $json | sed 's/.*"reputation":"\([0-9,]\{1,\}\)".*/\1/' | sed s/,//
But somehow I feel that sed is not the proper tool to use here. I heard that grep is all about regex and explored it a bit. But apparently it prints the whole line whenever a match is found - I am trying to extract a number from a single line of text. Here is a downsized version of the string that I'm working on (returned by curl).
{"displayName":"Amarghosh","reputation":"2,737","badgeHtml":"\u003cspan title=\"1 silver badge\"\u003e\u003cspan class=\"badge2\"\u003e●\u003c/span\u003e\u003cspan class=\"badgecount\"\u003e1\u003c/span\u003e\u003c/span\u003e"}
I guess my questions are:
What is the correct way to parse a string using regular expressions in a linux shell script?
Is sed the right thing to use here?
Could this be done using grep?
Is there any other command that's easier/more appropriate?

The grep command will select the desired line(s) from many but it will not directly manipulate the line. For that, you use sed in a pipeline:
someCommand | grep 'Amarghosh' | sed -e 's/foo/bar/g'
Alternatively, awk (or perl if available) can be used. It's a far more powerful text processing tool than sed in my opinion.
someCommand | awk '/Amarghosh/ { do something }'
For simple text manipulations, just stick with the grep/sed combo. When you need more complicated processing, move on up to awk or perl.
My first thought is to just use:
echo '{"displayName":"Amarghosh","reputation":"2,737","badgeHtml"' \
| sed -e 's/.*tion":"//' -e 's/".*//' -e 's/,//g'
which keeps the number of sed processes to one (you can give multiple commands with -e).

You may be interested in using Perl for such tasks. As a demonstration, here is a Perl script which prints the number you want:
#!/usr/local/bin/perl
use warnings;
use strict;
use LWP::Simple;
use JSON;
my $url = "http://stackoverflow.com/users/flair/165297.json";
my $flair = get ($url);
my $parsed = from_json ($flair);
print "$parsed->{reputation}\n";
This script requires you to install the JSON module, which you can do with just the command cpan JSON.

For working with JSON in a shell script, use jsawk, which is like awk, but for JSON.
json=$(curl -s http://stackoverflow.com/users/flair/165297.json)
echo $json | jsawk 'return this.reputation' # 2,747

My proposition:
$ echo $json | sed 's/,//g;s/^.*reputation...\([0-9]*\).*$/\1/'
I put two commands in sed argument:
s/,//g is used to remove all commas, in particular the ones that are present in the reputation value.
s/^.*reputation...\([0-9]*\).*$/\1/ locates the reputation value in the line and replaces the whole line by that value.
In this particular case, I find that sed provides the most compact command without loss of readability.
Other tools for manipulating strings (not only regex) include:
grep, awk, perl mentioned in most of other answers
tr for replacing characters
cut, paste for handling multicolumn inputs
bash itself with its rich ${...} syntax for accessing variables
tail, head for keeping last or first lines of a file
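As a sketch of a few of those tools applied to the same task, here is the reputation value pulled out with parameter expansion, tr, and cut. The JSON string is a shortened, hand-written copy of the one in the question, and the approach assumes the key appears exactly once:

```shell
json='{"displayName":"Amarghosh","reputation":"2,737","badgeHtml":""}'

# Parameter expansion: drop everything up to and including "reputation":"
rest=${json#*\"reputation\":\"}
# ...then drop everything from the next double quote onwards.
rep=${rest%%\"*}

# tr: delete the thousands separator.
rep_plain=$(printf '%s' "$rep" | tr -d ',')

# cut: split on double quotes and take the 8th field.
rep_cut=$(printf '%s\n' "$json" | cut -d'"' -f8)

printf '%s\n' "$rep" "$rep_plain" "$rep_cut"
```

The cut variant depends on the field position, so it breaks if the order of keys changes; the parameter-expansion variant only depends on the key name.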

sed is appropriate, but you'll spawn a new process for every sed you use (which may be too heavyweight in more complex scenarios). grep is not really appropriate. It's a search tool that uses regexps to find lines of interest.
Perl is one appropriate solution here, being a scripting language with powerful regexp features. It'll do almost everything you need without spawning separate processes (unlike normal Unix shell scripting) and has a huge library of additional functions.

You can do it with grep. There is a -o switch in grep which extracts only the matching string, not the whole line.
$ echo $json | grep -o '"reputation":"[0-9,]\+"' | grep -o '[0-9,]\+'
2,747

1) What is the correct way to parse a string using regular expressions in a linux shell script?
Tools that include regular expression capabilities include sed, grep, awk, Perl, Python, to mention a few. Even newer versions of Bash have regex capabilities. All you need to do is look up the docs on how to use them.
2) Is sed the right thing to use here?
It can be, but it is not necessary.
3) Could this be done using grep?
Yes, it can. You just construct a similar regex as you would with sed or the others. Note that grep only searches; if you want to modify any files, it will not do that for you.
4) Is there any other command that's easier/more appropriate?
Of course. Regex can be powerful, but it's not necessarily the best tool to use every time. It also depends on what you mean by "easier/appropriate".
Another method, with minimal fuss over regex, is the fields/delimiter approach: you look for patterns that the input can be split on. For example, in your case (I have downloaded the 165297.json file instead of using curl, but it's the same):
awk 'BEGIN{
FS="reputation" # split on the word "reputation"
}
{
m=split($2,a,"\",\"") # field 2 will contain the value you want plus the rest
# Then split on "," (quote, comma, quote) and save the pieces to array "a"
gsub(/[:\",]/,"",a[1]) # now, get rid of the redundant characters
print a[1]
}' 165297.json
output:
$ ./shell.sh
2747

sed is a perfectly valid command for your task, but it may not be the only one.
grep may be useful too, but as you say it prints the whole line. It's most useful for filtering the lines of a multi-line file, and discarding the lines you don't want.
Efficient shell scripts can use a combination of commands (not just the two you mentioned), exploiting the talents of each.

Blindly:
echo $json | awk -F\" '{print $8}'
Similar (the field separator can be a regex):
awk -F'{"|":"|","|"}' '{print $5}'
Smarter (look for the key and print its value):
awk -F'{"|":"|","|"}' '{for(i=2; i<=NF; i+=2) if ($i == "reputation") print $(i+1)}'

You can use a proper library (as others noted):
E:\Home> perl -MLWP::Simple -MJSON -e "print from_json(get 'http://stackoverflow.com/users/flair/165297.json')->{reputation}"
or
$ perl -MLWP::Simple -MJSON -e 'print from_json(get "http://stackoverflow.com/users/flair/165297.json")->{reputation}, "\n"'
depending on OS/shell combination.

Simple RegEx via Shell
Disregarding the specific code in question, there may be times when you want to do a quick regex replace-all from stdin to stdout using shell, in a simple way, using a string syntax similar to JavaScript.
Below are some examples for anyone looking for a way to do this. Perl is a better bet on Mac, since the sed shipped there lacks some options of GNU sed. If you want to get stdin as a variable you can use MY_VAR=$(cat);.
echo 'text' | perl -pe 's/search/replace/g'; # using perl
echo 'text' | sed -e 's/search/replace/g'; # using sed
And here's an example of a custom, reusable regex function. Arguments are source string (or -- for stdin), search, replace, and options.
regex() {
case "$#" in
( '0' ) return 1 ;; ( '1' ) echo "$1"; return 0 ;;
( '2' ) REP='' ;; ( '3' ) REP="$3"; OPT='' ;;
( * ) REP="$3"; OPT="$4" ;;
esac
TXT="$1"; SRCH="$2";
if [ "$1" = "--" ]; then [ ! -t 0 ] && read -r TXT; fi
echo "$TXT" | perl -pe 's/'"$SRCH"'/'"$REP"'/'"$OPT";
}
echo 'text' | regex -- search replace g;
