Detect Russian characters with grep


I'm trying to detect Russian characters with grep, but what I have at the moment does not appear to be doing anything:
echo "Ёё" | grep -Eo "/[А-Яа-яЁё]/u"
No output is returned. Is there anything I have to do to tell grep to return the output?

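There is no output because grep has no /.../u delimiter syntax. It treats the slashes and the u as literal characters, so it looks for a slash, one of your letters, and then /u.
Drop the delimiters and try this (assuming your locale uses UTF-8):
echo "Ёё" | grep -Eo "[А-Яа-яЁё]*"
Test: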
kent$ echo "Ёё" | grep -Eo "[А-Яа-яЁё]*"
Ёё

Related

using linux grep with look ahead regexp

I have string in txt file
cat list.txt
userone#ex.com, usertwo#ex.com, userthree#ex.com
and I want to print every user login without #ex.com, each on its own line. I tried a regexp with Linux grep:
grep -oe '[a-z](?=#ex.com,)' list.txt
but nothing happens. Why? The output should look like this:
userone
usertwo
userthree
Thanks.

Without grep -P, you can use grep + cut:
grep -oE '[^# ]+#ex\.com' list.txt | cut -d# -f1
userone
usertwo
userthree
With GNU grep (-P enables Perl-compatible regex, which supports lookahead):
grep -oP '[^# ]+(?=#ex\.com)' list.txt
userone
usertwo
userthree
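
If neither grep -P nor grep -o is available, a similar result can be sketched with tr and sed alone. The function name extract_logins is an assumption made for this illustration, and the sample line is the one from the question.

```shell
# extract_logins is a hypothetical helper name, not part of the original answers.
# It reads comma-separated addresses on stdin and prints each login on its own line.
extract_logins() {
  # Turn commas and spaces into newlines (squeezing repeats) so there is
  # one address per line, then strip the trailing "#ex.com" and print only
  # the lines where that substitution succeeded.
  tr -s ', ' '\n\n' | sed -n 's/#ex\.com$//p'
}

printf '%s\n' 'userone#ex.com, usertwo#ex.com, userthree#ex.com' | extract_logins
```

Splitting first and trimming second avoids lookahead entirely, at the cost of one extra process.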

grep within nested brackets

How do I grep strings in between nested brackets using bash? Is it possible without the use of loops? For example, if I have a string like:
[[TargetString1:SomethingIDontWantAfterColon[[TargetString2]]]]
I wish to grep only the two target strings inside the [[]]:
TargetString1
TargetString2
I tried the following command which cannot get TargetString2
grep -o -P '(?<=\[\[).*(?=\]\])'|cut -d ':' -f1

With GNU's grep P option:
grep -oP "(?<=\[\[)[\w\s]+"
The regex matches a sequence of word and whitespace characters ([\w\s]+) when it is preceded by two opening brackets ([[). This works for your sample string, but will not work for more complicated constructs like:
[[[[TargetString1]]TargetString2:SomethingIDontWantAfterColon[[TargetString3]]]]
where only TargetString1 and TargetString3 are matched.

To extract from nested [[]] brackets, you can use sed:
#!/bin/bash
str="[[TargetString1:SomethingIDontWantAfterColon[[TargetString2]]]]"
# Quote "$str" so the shell does not treat the brackets as a glob pattern
echo "$str" | grep -o -P '(?<=\[\[).*(?=\]\])' | cut -d ':' -f1
echo "$str" | sed 's/.*\[\([^]]*\)\].*/\1/g' # works only if a string exists between [ and ]
Output:
TargetString1
TargetString2

You can combine grep -Eo '\[\[\w+' with sed 's/\[\[//g' to do this:
[root#localhost ~]# echo "[[TargetString1:SomethingIDontWantAfterColon[[TargetString2]]]]" | grep -Eo '\[\[\w+' | sed 's/\[\[//g'
TargetString1
TargetString2
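
The \w shorthand is a GNU extension in ERE, so a variant with a POSIX character class may travel better. This is only a sketch: bracket_targets is a hypothetical name, and grep -o itself is assumed to be available (GNU and BSD grep both provide it).

```shell
# bracket_targets is a hypothetical helper name used only for this illustration.
bracket_targets() {
  # Match "[[" followed by letters, digits or underscores,
  # then drop the leading "[[" from each match.
  grep -oE '\[\[[[:alnum:]_]+' | sed 's/^\[\[//'
}

printf '%s\n' '[[TargetString1:SomethingIDontWantAfterColon[[TargetString2]]]]' | bracket_targets
```

Like the lookbehind version, this only finds names that directly follow [[, so text that follows a closing ]] is still skipped.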

How do I grep for all words that contain two consecutive e’s, and also contains two y’s

I want to find the set of words that contain two consecutive e’s, and also contains two y’s.
So far I have only got as far as /eeyy/.

Alternation with ERE:
$ echo evyyree | grep -E '.*ee.*yy|.*yy.*ee'
evyyree
$ echo eveeryy | grep -E '.*ee.*yy|.*yy.*ee'
eveeryy
If the match needs to be in the same word, you can do:
$ echo "eee yyyy" | grep -E 'ee[^[:space:]]*yy|yy[^[:space:]]*ee' # no match
$ echo "eeeyyyy" | grep -E 'ee[^[:space:]]*yy|yy[^[:space:]]*ee'
eeeyyyy
Then only that word:
$ echo 'eeeyy heelo' | grep -Eo 'ee[^[:space:]]*yy|yy[^[:space:]]*ee'
eeeyy

Pipe it:
$ echo eennmmyy | grep ee | grep yy
eennmmyy

awk approach to match all words that contain both ee and yy:
s="eennmmyy heello thees-whyy someyy"
echo $s | awk '{for(i=1;i<=NF;i++) if($i~/ee/ && $i~/yy/) print $i}'
The output:
eennmmyy
thees-whyy

The only sensible and extensible way to do this is with awk:
awk '/ee/&&/yy/' file
Imagine trying to do it the grep way if you also had to find zz. Here's awk:
awk '/ee/&&/yy/&&/zz/' file
and here's grep:
grep -E 'ee.*yy.*zz|ee.*zz.*yy|yy.*ee.*zz|yy.*zz.*ee|zz.*yy.*ee|zz.*ee.*yy' file
Now add a 4th additional string to search for and see what that looks like!
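
If you would rather stay with grep, chaining one grep per pattern also scales without listing every ordering. The sketch below wraps that chain in a small recursive function. The name contains_all and the sample words are assumptions for illustration only.

```shell
# contains_all PATTERN... (hypothetical helper)
# Reads lines on stdin and prints those that match every pattern given.
contains_all() {
  # With no patterns left, pass the remaining lines through unchanged.
  if [ "$#" -eq 0 ]; then
    cat
    return
  fi
  first=$1
  shift
  # Keep lines matching the first pattern, then filter by the remaining ones.
  grep -e "$first" | contains_all "$@"
}

printf '%s\n' eennmmyy heello someyy zzeeyy | contains_all ee yy zz
```

Each extra pattern adds one more grep process, so for long pattern lists the single awk command is likely to be cheaper.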

How to display part of matched pattern in grep?

I want to extract 12 from text like "abc_12_1". This is what I tried:
echo "abc_12_1" | grep -Eo '[a-zA-Z]+_[0-9]+_1'
abc_12_1
But I cannot select only the digits after the first _ in the string; the command above outputs the whole string. I am looking for a grep equivalent of the following Perl pattern match:
perl -e '"abc_55_1" =~ m/[a-zA-Z]+_([0-9]+)_1/ ; print $1'
55
Is it possible with grep?

Using perl:
$ echo "abc_12_1" | perl -lne 'print /_(\d+)_/'
12
or grep:
$ echo "abc_12_1" | grep -oP '(?<=_)\d+(?=_)'
12

You could use cut:
cut -d_ -f2 <<< "abc_12_1"
Using grep:
grep -oP '(?<=_).*?(?=_)' <<< "abc_12_1"
Both would yield 12.

One way is to use awk
echo "abc_12_1" | awk -F_ '{print $2}'
12
Or grep
echo "abc_12_1" | grep -o "[0-9][0-9]"
12
Using grep with extended regex
grep -oE "[0-9]{2}" # Get only hits with two digits
grep -oE "[0-9]{2,}" # Get hits with two or more digits
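
For something closer to the Perl capture group in the question, sed can print just the captured part. This is a sketch: middle_number is a hypothetical name, and -E is assumed to be supported (it is in both GNU and BSD sed).

```shell
# middle_number is a hypothetical helper name used only for this illustration.
middle_number() {
  # -n suppresses default output; the trailing p prints only lines where
  # the substitution happened, replaced by the first capture group.
  sed -nE 's/^[a-zA-Z]+_([0-9]+)_1$/\1/p'
}

printf '%s\n' 'abc_12_1' 'abc_55_1' 'no_match_here' | middle_number
```

Unlike the lookaround patterns, this keeps the full original pattern, so lines that do not fit it produce no output at all.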

Can not extract the capture group with either sed or grep

I want to extract the value from a key-value pair, but I cannot.
Here is what I tried:
echo employee_id=1234 | sed 's/employee_id=\([0-9]+\)/\1/g'
But this gives employee_id=1234 and not 1234, which is what the capture group should contain.
What am I doing wrong here? I also tried:
echo employee_id=1234| egrep -o employee_id=([0-9]+)
but no success.

1. Using grep -Eo (egrep is deprecated):
echo 'employee_id=1234' | grep -Eo '[0-9]+'
1234
2. Using grep -oP (PCRE):
echo 'employee_id=1234' | grep -oP 'employee_id=\K([0-9]+)'
1234
3. Using sed:
echo 'employee_id=1234' | sed 's/^.*employee_id=\([0-9][0-9]*\).*$/\1/'
1234

To expand on anubhava's answer number 2, the general pattern to have grep return only the capture group is:
$ regex="$precedes_regex\K($capture_regex)(?=$follows_regex)"
$ echo "$some_string" | grep -oP "$regex"
so
# matches and returns b
$ echo "abc" | grep -oP "a\K(b)(?=c)"
b
# no match
$ echo "abc" | grep -oP "z\K(b)(?=c)"
# no match
$ echo "abc" | grep -oP "a\K(b)(?=d)"

Using awk
echo 'employee_id=1234' | awk -F= '{print $2}'
1234

Use sed -E for extended regex:
echo employee_id=1234 | sed -E 's/employee_id=([0-9]+)/\1/g'
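
The reason the original sed command left the input unchanged is that in a basic regular expression an unescaped + is an ordinary character, not a repetition operator. The short comparison below is a sketch of the three spellings, using the sample string from the question.

```shell
# Basic regex with a bare "+": the pattern looks for a digit followed by a
# literal plus sign, so nothing is replaced and the input is printed as is.
echo 'employee_id=1234' | sed 's/employee_id=\([0-9]+\)/\1/'

# Basic regex with an interval expression: \{1,\} means "one or more".
echo 'employee_id=1234' | sed 's/employee_id=\([0-9]\{1,\}\)/\1/'

# Extended regex: + and unescaped parentheses work as expected.
echo 'employee_id=1234' | sed -E 's/employee_id=([0-9]+)/\1/'
```

The interval form is the most portable of the three, since it relies only on basic regex syntax.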

You are specifically asking for sed, but in case you may use something else - any POSIX-compliant shell can do parameter expansion which doesn't require a fork/subshell:
foo='employee_id=1234'
var=${foo%%=*}
value=${foo#*=}
 
$ echo "var=${var} value=${value}"
var=employee_id value=1234
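
The same two expansions can be applied line by line when there are several pairs to split. This is a minimal sketch: split_pairs and the second sample line are assumptions for illustration.

```shell
# split_pairs is a hypothetical helper name used only for this illustration.
# It reads key=value lines on stdin and prints "key -> value" for each.
split_pairs() {
  while IFS= read -r line; do
    key=${line%%=*}    # remove the longest suffix starting at the first "="
    value=${line#*=}   # remove the shortest prefix ending at the first "="
    printf '%s -> %s\n' "$key" "$value"
  done
}

printf '%s\n' 'employee_id=1234' 'path=/a=b' | split_pairs
```

Because the split happens at the first =, a value that itself contains = is kept whole.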

We Keep Coding
