Grep and Egrep options - regex

When I use grep -ow, it changes what the regex matches, so I'm wondering what the regex would be without these options.
I know that:
-o means print only the part of each line that matches the pattern
-w means select only matches that form whole words
I'd like to convert egrep -ow '[1-9][0-9][0-9]+' text
into egrep '[1-9][0-9][0-9]+' text, but without the options this regex no longer gives the same result.

You need to add word boundaries (\b) to the pattern. -o is still needed so that only the matched numbers are printed.
egrep -o '\b[1-9][0-9][0-9]+\b' file
OR
Since egrep is deprecated, it's better to use grep with the -E option:
grep -Eo '\b[1-9][0-9][0-9]+\b' file
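To check that the two forms agree, run both on the same input. This is a sketch: the file name text and its contents are made up, and \b is a GNU grep extension rather than POSIX ERE syntax.

```shell
# Work in a scratch directory so the sample file does not clutter anything.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample: "45" is too short, and "x1234" is glued to a letter.
printf '%s\n' 'a 123 b 45 x1234 99999' > text

# Original form: -w enforces whole-word matches.
grep -Eow '[1-9][0-9][0-9]+' text

# Equivalent form: \b word boundaries in the pattern take over the job of -w.
grep -Eo '\b[1-9][0-9][0-9]+\b' text
```

On this input both commands should print 123 and 99999 on separate lines: 45 has only two digits, and the 1234 in x1234 is not a whole word.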


grep strings between "{{_(" and ")}}"

I want to parse HTML files to extract strings between "{{_(" and ")}}" using grep. I tried something like this:
grep '"[^{{_(|)}}$]"' *.html
but it didn't work.
Can someone help me please?
Thanks!
You may use
grep -oP '(?<={{_\().+?(?=\)}})' file
Details
-o - output only matched substrings
-P - enable the PCRE regex engine
(?<={{_\().+?(?=\)}}) matches:
(?<={{_\() - a location that is immediately preceded by {{_(
.+? - one or more characters other than line break characters, as few as possible
(?=\)}}) - a location that is immediately followed by )}}
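A small sketch of the command on a hypothetical template (demo.html and its contents are illustrative only). It assumes GNU grep built with PCRE support, since -P is not available in every grep.

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical template; the second line holds two markers.
cat > demo.html <<'EOF'
<p>{{_("Hello")}}</p>
<a>{{_(Save)}} or {{_(Cancel)}}</a>
<p>no marker here</p>
EOF

# -o prints each match on its own line; the lazy .+? stops at the first )}},
# so the two markers on one line come out as two separate matches.
grep -oP '(?<={{_\().+?(?=\)}})' demo.html
```

This should print "Hello" (with its quotes, since they sit inside the marker), Save and Cancel on separate lines.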
Wiktor Stribiżew's answer works really well. However, if you have multiple files, grep prefixes each match with the name of the file it came from, so you get output like this:
foo.html: content abc
foo.html: test 123
bar.html: first match
bar.html: second match
So, if you are only interested in the matched strings as output, you can try sed instead (note that this command prints at most one match per line):
sed -n 's/.*{{_(\(.*\))}}.*/\1/p' *.html
You can also post-process this output, for example to count the unique occurrences of each match.
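For example, a sketch that counts how often each extracted string occurs by piping the sed output through sort and uniq -c. The files foo.html and bar.html are hypothetical, and each holds at most one marker per line because this sed command extracts only one match per line.

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical files, one marker per line.
printf '%s\n' '<h1>{{_(Title)}}</h1>' '<b>{{_(Save)}}</b>' > foo.html
printf '%s\n' '<b>{{_(Save)}}</b>' > bar.html

# Extract the strings, count each distinct one, most frequent first.
sed -n 's/.*{{_(\(.*\))}}.*/\1/p' *.html | sort | uniq -c | sort -rn
```

Here Save should be reported twice and Title once; the exact column padding of uniq -c varies between implementations.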
Update:
Or just add the -h (--no-filename) option to the grep command that Wiktor Stribiżew provided:
grep -h -oP '(?<={{_\().+?(?=\)}})' *.html
Or use the -c flag to display the number of matching lines per file (-c counts lines, not individual matches, even when combined with -o):
grep -c -oP '(?<={{_\().+?(?=\)}})' *.html
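Because -c counts matching lines, a line holding two markers contributes only 1. A sketch for counting individual matches per file instead, assuming GNU grep with -P and a hypothetical foo.html, pipes grep -o into wc -l for each file:

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical file: two markers on the same line.
printf '%s\n' '<a>{{_(Save)}} or {{_(Cancel)}}</a>' > foo.html

# -c reports matching lines, so this prints 1.
grep -c -oP '(?<={{_\().+?(?=\)}})' foo.html

# Print each file name with its number of individual matches.
for f in *.html; do
    n=$(grep -oP '(?<={{_\().+?(?=\)}})' "$f" | wc -l | tr -d ' ')
    printf '%s: %s\n' "$f" "$n"
done
```

The tr -d ' ' strips the leading padding that some wc implementations add.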
As in the posts above, it is also possible to grep the value of an HTML attribute:
placeholder="SOME TEXT_HERE" -> grep -> SOME TEXT_HERE
grep -oP '(?<=placeholder=").+?(?=")' *html
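A sketch of the placeholder extraction on a hypothetical form.html (GNU grep with PCRE support assumed):

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical form markup.
cat > form.html <<'EOF'
<input name="q" placeholder="SOME TEXT_HERE">
<input name="e" placeholder="you@example.com" type="email">
EOF

# The lookbehind anchors on placeholder=", and the lazy .+? stops
# right before the next double quote.
grep -oP '(?<=placeholder=").+?(?=")' form.html
```

Each value is printed without its surrounding quotes, one per line.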

How to do a grep regex search for single-quotes?

How do you use grep to do a text file search for a pattern like ABC='123'?
I'm currently using:
grep -rnwi some/path -e "ABC\s*=\s*[\'\"][^\'\"]+[\'\"]"
but this only finds text like ABC="123". It misses any instances that use single-quotes. What's wrong with my regex?
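Your pattern uses Perl-style syntax (\s, and + as a "one or more" quantifier), but without -E or -P grep treats it as a basic regular expression, in which + is a literal character. So, use the -P flag: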
grep -rnwi some/path -P "ABC\s*=\s*[\'\"][^\'\"]+[\'\"]"
You don't need a backslash before the single quotes inside the character classes. So, your regex can also be written as:
"ABC\s*=\s*['\"][^'\"]+['\"]"
Input file:
ABC="123"
ABC='123'
Run grep with your PCRE:
grep -P "ABC\s*=\s*['\"][^'\"]+['\"]" input.txt
Output:
ABC="123"
ABC='123'
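If grep -P is unavailable (it is a GNU extension, and some builds lack PCRE support), a portable sketch can use POSIX extended regular expressions instead, with the [[:space:]] class in place of \s. The input is the same hypothetical sample as above, plus an unquoted value that should not match.

```shell
cd "$(mktemp -d)" || exit 1

printf '%s\n' 'ABC="123"' "ABC='123'" 'ABC=123' > input.txt

# -E makes + a quantifier. Inside double quotes the shell turns \" into a
# plain double quote, while the single quote needs no escaping at all.
grep -E "ABC[[:space:]]*=[[:space:]]*['\"][^'\"]+['\"]" input.txt
```

This should print the two quoted lines and skip ABC=123.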

How to grep file to find lines like <version>1.1.9-beta</version>?

Looking for a suggestion for cat file | grep REGEX to get the lines with <version>anything</version>.
grep -F '<version>1.1.9-beta</version>' file
-F matches your pattern as literal text.
You don't need that useless cat.
If you really mean anything, try grep '<version>.*</version>' file or grep -P '<version>.*?</version>' file. However, searching XML with regex is a bad idea.
Use the -E option to match an extended regular expression:
grep -E "<version>.*</version>" file
Refer to these rules for the regular expression: https://www.gnu.org/savannah-checkouts/gnu/grep/manual/grep.html#Regular-Expressions
For example, to match the typical version format (3.14, 13.14, or 0.1458) you can type:
grep -E "<version>[0-9]+\.[0-9]+</version>" file
You can do:
grep '<version>[^<]*</version>' file.xml
[^<]* matches zero or more characters other than <, i.e. everything up to the next <.
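The [^<]* form also matters with -o when one line holds several elements, because .* is greedy. A sketch with a hypothetical single-line document:

```shell
line='<dep><version>1.0</version></dep><dep><version>2.0</version></dep>'

# Greedy .* runs to the last </version>, so both elements become one match.
printf '%s\n' "$line" | grep -o '<version>.*</version>'

# [^<]* cannot cross a tag, so each element is matched separately.
printf '%s\n' "$line" | grep -o '<version>[^<]*</version>'
```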

Extract a few matching strings from matching lines in a file using sed

I have a file with strings similar to this:
abcd u'current_count': u'2', u'total_count': u'3', u'order_id': u'90'
I have to find current_count and total_count for each line of the file. I am trying the command below, but it's not working. Please help.
grep current_count file | sed "s/.*\('current_count': u'\d+'\).*/\1/"
It is outputting the whole line but I want something like this:
'current_count': u'2', 'total_count': u'3'
It's printing the whole line because the pattern in the s command doesn't match, so no substitution happens.
sed regexes don't support \d for digits, or x+ for xx*. GNU sed has a -r option to enable extended-regex support so + will be a meta-character, but \d still doesn't work. GNU sed also allows \+ as a meta-character in basic regex mode, but that's not POSIX standard.
So anyway, this will work:
echo -e "foo\nabcd u'current_count': u'2', u'total_count': u'3', u'order_id': u'90'" |
sed -nr "s/.*('current_count': u'[0-9]+').*/\1/p"
# output: 'current_count': u'2'
Notice that I skip the grep by using sed -n s///p. I could also have used /current_count/ as an address:
sed -r -e '/current_count/!d' -e "s/.*('current_count': u'[0-9]+').*/\1/"
Or, with grep printing only the matching part of the line instead of the whole line:
grep -E -o "'current_count': u'[[:digit:]]+'" file
(or egrep instead of grep -E). Note that -o is a common extension (GNU and BSD grep both support it), but it is not required by POSIX.
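To get both values in the format asked for, one sketch uses two capture groups in a single ERE substitution. It assumes a sed that accepts -E (GNU sed and BSD sed both do) and that current_count comes before total_count on each line.

```shell
line="abcd u'current_count': u'2', u'total_count': u'3', u'order_id': u'90'"

# -E enables ERE, so + is a quantifier; -n plus the p flag prints only
# lines where the substitution succeeded.
printf '%s\n' "$line" |
sed -nE "s/.*u('current_count': u'[0-9]+').*u('total_count': u'[0-9]+').*/\1, \2/p"
```

This should print 'current_count': u'2', 'total_count': u'3'.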
To me this looks like some sort of serialized Python data. Ideally, I would find out where the data comes from and parse it properly.
However, although it is hackish, sed can also be used here:
sed "s/.*current_count': [a-z]'\([0-9]\+\).*/\1/" input.txt
sed "s/.*total_count': [a-z]'\([0-9]\+\).*/\1/" input.txt

Regex to match string between quotes

I'm using a shell script to read in a file and then piping the output to grep and trying to extract the string contained between two quotes (while excluding the quotes).
./readFile.sh | grep -e "[\^\"]*[\?\"]"
This returns the entire contents of the file that I'm reading.
My file is organized this way:
TITLE="foo"
DATA="bar"
SERVER="foo.bar.server"
I read the regex tutorial here http://www.regular-expressions.info/lookaround.html and tried to use the lookahead and lookbehind as best as I could, but I don't understand what's wrong here.
Check this example using grep with a lookbehind:
kent$ echo 'TITLE="foo"
DATA="bar"
SERVER="foo.bar.server"'|grep -Po '(?<=")[^"]*'
foo
bar
foo.bar.server
An alternative is grep -Po '"\K[^"]*'
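A sketch of the \K variant on the sample from the question (vars.txt is a hypothetical file name). \K is PCRE syntax, so GNU grep with -P is assumed:

```shell
cd "$(mktemp -d)" || exit 1

printf '%s\n' 'TITLE="foo"' 'DATA="bar"' 'SERVER="foo.bar.server"' > vars.txt

# \K drops the opening quote from the reported match;
# [^"]* then stops at the closing quote.
grep -Po '"\K[^"]*' vars.txt
```

This should print foo, bar and foo.bar.server, one per line.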
If you want to give awk a chance, it is pretty simple:
awk -F '"' 'NF>2{print $2}' inFile
I don't understand why you use a script for file reading, since grep works with files, but it's your own choice (maybe you do some preprocessing).
This extracts everything between double quotes, including the quotes:
$ grep -o '".*"' <file>
"foo"
"bar"
"foo.bar.server"
If you need to get rid of '"':
$ grep -o '".*"' <file> | tr -d '"'
foo
bar
foo.bar.server
If you want grep to return only the matching strings (and not the entire line) you should use the -o (or --only-matching) option.
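Since .* is greedy, '".*"' only works cleanly while each line holds a single quoted value. A sketch with a hypothetical line containing two values, using [^"]* so the match cannot run past a closing quote:

```shell
line='NAME="foo" ALIAS="bar"'

# Greedy: a single match spanning from the first quote to the last one.
printf '%s\n' "$line" | grep -o '".*"'

# [^"]* cannot cross a quote, so each value is matched separately.
printf '%s\n' "$line" | grep -o '"[^"]*"' | tr -d '"'
```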