Regular Expression for "D210" for Linux?

The title says it all. Right now I'm using:
grep "^D[\d][\d][\d]" file.txt
to no avail.

Assuming GNU grep, \d is not recognized unless the -P (--perl-regexp) option is given:
$ echo D210 | grep '^D\d\d\d'
$ echo D210 | grep -P '^D\d\d\d'
D210
$ echo D210 | grep -P '^D\d{3}'
D210
If your grep does not accept -P, use [0-9] or [[:digit:]]:
$ echo D210 | grep '^D[0-9][0-9][0-9]'
D210
$ echo D210 | grep '^D[[:digit:]][[:digit:]][[:digit:]]'
D210
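The patterns above have no end anchor, so they also accept longer strings such as D2100. If the whole line must be exactly D plus three digits, a minimal sketch using the portable -E syntax could look like this (match_d_code is a hypothetical helper name, not something from the question):

```shell
# Hypothetical helper: print only the lines on stdin that consist of
# exactly "D" followed by three digits.
match_d_code() {
    grep -E '^D[0-9]{3}$'
}

# Only the first candidate satisfies both anchors.
printf '%s\n' D210 D21 D2100 xD210 | match_d_code
```

This prints D210 only. Drop the trailing $ to get the prefix-matching behaviour of the commands above.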

Related

Parse Args with R.E

Can you help me?
I want to parse these: {'$', '$0', '$qwerty', '$123'} # $Previous_Character_Or_Group_Repeated_0_Or_More_Time
In a shell script:
echo "$" | grep '^\$.*$'
$
This works.
echo "$1" | grep '^\$.*$'
echo "$hello" | grep '^\$.*$'
echo "$Qwerty123" | grep '^\$.*$'
These don't work.
Thanks for any reply.
Just use single quotes instead of double quotes. Inside double quotes the shell expands $1, $hello and $Qwerty123 before grep ever sees them:
echo '$1' | grep '^\$.*$'
$1
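To see the difference between the two quoting styles side by side, here is a small sketch. The helper name starts_with_dollar and the value assigned to hello are assumptions made only for this illustration:

```shell
# Hypothetical helper: succeeds when its argument starts with a literal "$".
starts_with_dollar() {
    printf '%s\n' "$1" | grep -q '^\$'
}

hello=world   # assumed value, only for the demonstration

# Single quotes: grep receives the literal text $hello.
if starts_with_dollar '$hello'; then
    echo "single quotes: match"
fi

# Double quotes: the shell expands $hello to "world" before grep runs.
if starts_with_dollar "$hello"; then
    echo "double quotes: match"
else
    echo "double quotes: no match"
fi
```

The first test reports a match and the second does not, because grep only ever sees the text world in the second case.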

sed regular expression extraction

I have a range of strings which conform to one of the two following patterns:
("string with spaces",4)
or
(string_without_spaces,4)
I need to extract the "string" via a bash command, and so far have found a pattern that works for each, but not for both.
echo "(\"string with spaces\",4)" | sed -n 's/("\(.*\)",.*)/\1/ip'
output:string with spaces
echo "(string_without_spaces,4)" | sed -n 's/(\(.*\),.*)/\1/ip'
output:string_without_spaces
I have tried using "\?, but then the closing " stays in the result when it is present:
echo "(SIM,0)" | sed -n 's/("\?\(.*\)"\?,.*)/\1/ip'
output: SIM
echo "(\"SIM\",0)" | sed -n 's/("\?\(.*\)"\?,.*)/\1/ip'
output: SIM"
Can anyone suggest a pattern that would extract the string in both scenarios? I am not tied to sed but would prefer to not have to install perl in this environment.
How about using [^"] instead of . so that " is excluded from the match?
$ echo '("string with spaces",4)' | sed -n 's/("\?\([^"]*\)"\?,.*)/\1/p'
string with spaces
$ echo "(string_without_spaces,4)" | sed -n 's/("\?\([^"]*\)"\?,.*)/\1/p'
string_without_spaces
$ echo "(SIM,0)" | sed -n 's/("\?\([^"]*\)"\?,.*)/\1/p'
SIM
$ echo '("SIM",0)' | sed -n 's/("\?\([^"]*\)"\?,.*)/\1/p'
SIM
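The \? operator used above is a GNU extension to basic regular expressions. If the script may run with another sed, the POSIX interval \{0,1\} expresses the same optional quote. The following is a sketch under that assumption; extract_name is a hypothetical helper name:

```shell
# Hypothetical helper: print the first field of "(name,N)" or ("name",N).
# \{0,1\} is the POSIX BRE spelling of "zero or one", used instead of GNU's \?.
extract_name() {
    printf '%s\n' "$1" | sed -n 's/^("\{0,1\}\([^"]*\)"\{0,1\},.*)$/\1/p'
}

extract_name '("string with spaces",4)'
extract_name '(string_without_spaces,4)'
```

Both calls print the bare string without quotes. As with the commands above, an unquoted name that itself contains a comma is not handled specially.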

Grep and sed returning only first match

I am trying to extract the title and description from an RSS feed. I have written the following script to return all the titles in the feed, but it returns only the first title from the XML:
curl "http://www.dailystar.com.lb/RSS.aspx?id=113" 2>/dev/null | grep -E -o "<title>(.*)</title>" |sed -e 's,.*<title>\(.*\)</title>.*,\1,g' | less
How can I also find the description?
You can use grep -P:
curl "http://www.dailystar.com.lb/RSS.aspx?id=113" 2>/dev/null |\
grep -oP "<title>\K[\s\S]*?(?=</title>)"
First put each title and description on its own line. Here is an example:
curl "http://www.dailystar.com.lb/RSS.aspx?id=113" 2>/dev/null | \
grep -E -o "<title>(.*)</title>" | \
sed -e 's,<\(title\|description\)>,\n<\1>,g' |
sed -n 's,.*<title>\(.*\)</title>.*,\1,gp'
For the description:
curl "http://www.dailystar.com.lb/RSS.aspx?id=113" 2>/dev/null | \
grep -E -o "<title>(.*)</title>" | \
sed -e 's,<\(title\|description\)>,\n<\1>,g' | \
sed 's,<title>\([^<]*\)</title>,T:\1,' | \
sed 's,<description>\([^<]*\)</description>,D:\1,' | \
sed -n 's/[DT]://p'
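Use a non-greedy match (.*?) instead of a greedy match (.*) to get all the titles. Non-greedy quantifiers are a Perl-compatible feature, so grep needs -P rather than -E, and the sed pattern keeps the plain .* because grep -o already puts each title on its own line:
curl "http://www.dailystar.com.lb/RSS.aspx?id=113" 2>/dev/null | grep -P -o "<title>(.*?)</title>" | sed -e 's,.*<title>\(.*\)</title>.*,\1,' | less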
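When grep -P is not available, a negated character class gives a similar effect to a non-greedy match. The sketch below runs on a made-up one-line sample instead of the live feed; extract_tag and the sample data are assumptions for illustration, and the approach does not cope with CDATA sections or markup nested inside the element:

```shell
# Hypothetical helper: print the text of every <tag>...</tag> found on stdin.
# [^<]* cannot run past the next "<", so each element is matched separately.
extract_tag() {
    grep -o "<$1>[^<]*</$1>" | sed -e "s,<$1>,," -e "s,</$1>,,"
}

# Made-up sample standing in for the feed, all on one line.
sample='<item><title>First</title><description>One</description></item><item><title>Second</title><description>Two</description></item>'

printf '%s\n' "$sample" | extract_tag title
printf '%s\n' "$sample" | extract_tag description
```

The first call prints First and Second on separate lines, the second prints One and Two. For real feeds a proper XML parser is the more robust choice.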

Can not extract the capture group with either sed or grep

I want to extract the value from a key=value pair, but I cannot.
Example I tried:
echo employee_id=1234 | sed 's/employee_id=\([0-9]+\)/\1/g'
But this gives employee_id=1234 and not 1234 which is actually the capture group.
What am I doing wrong here? I also tried:
echo employee_id=1234| egrep -o employee_id=([0-9]+)
but no success.
1. Using grep -Eo (egrep is deprecated):
echo 'employee_id=1234' | grep -Eo '[0-9]+'
1234
2. Using grep -oP (PCRE):
echo 'employee_id=1234' | grep -oP 'employee_id=\K([0-9]+)'
1234
3. Using sed:
echo 'employee_id=1234' | sed 's/^.*employee_id=\([0-9][0-9]*\).*$/\1/'
1234
To expand on anubhava's answer number 2, the general pattern to have grep return only the capture group is:
$ regex="$precedes_regex\K($capture_regex)(?=$follows_regex)"
$ echo "$some_string" | grep -oP "$regex"
so
# matches and returns b
$ echo "abc" | grep -oP "a\K(b)(?=c)"
b
# no match
$ echo "abc" | grep -oP "z\K(b)(?=c)"
# no match
$ echo "abc" | grep -oP "a\K(b)(?=d)"
Using awk:
echo 'employee_id=1234' | awk -F= '{print $2}'
1234
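Use sed -E for extended regular expressions. In a basic regular expression + is a literal character, which is why the original command left the input unchanged: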
echo employee_id=1234 | sed -E 's/employee_id=([0-9]+)/\1/g'
You are specifically asking for sed, but in case you can use something else: any POSIX-compliant shell can do this with parameter expansion, which doesn't require a fork or subshell:
foo='employee_id=1234'
var=${foo%%=*}
value=${foo#*=}
 
$ echo "var=${var} value=${value}"
var=employee_id value=1234
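When there are several key=value lines rather than a single string, the same fork-free idea can be sketched with read and a custom IFS. The function name parse_pairs and the sample input are assumptions for illustration:

```shell
# Hypothetical helper: split each "key=value" line on stdin at the first "=".
# read assigns everything after the first separator to the last variable,
# so values that contain "=" stay intact.
parse_pairs() {
    while IFS='=' read -r key value; do
        [ -n "$key" ] || continue      # skip empty lines
        printf '%s -> %s\n' "$key" "$value"
    done
}

printf '%s\n' 'employee_id=1234' 'name=a=b' | parse_pairs
```

This prints employee_id -> 1234 followed by name -> a=b.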

Replace string if first letter is uppercase using sed

I am trying to write a sed answer to the question "Edit a file using sed/awk" using:
sed -e 's/^[A-Z]/$:$&/' file.txt
but the result is:
wednesday
$:$Weekday
$:$thursday
$:$Weekday
$:$friday
$:$Weekday
$:$saturday
$:$MaybeNot
$:$sunday
$:$MaybeNot
$:$monday
$:$Weekday
$:$tuesday
$:$Weekday
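Why does it replace when the first character is lowercase?
This is a "feature" according to this bug report, caused by unexpected character ordering in the locale, and further explained here and here. In locales such as en_GB.UTF-8 the collation order interleaves upper and lower case, so the range [A-Z] also covers most lowercase letters. Setting LC_ALL=C or using [[:upper:]] avoids the problem: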
$ locale
LANG=en_GB.UTF-8
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_ALL=
$ echo "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" | sed -e 's/[A-Z]/./g'
..........................a.........................
$ echo "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" | sed -e 's/[a-z]/./g'
.........................Z..........................
$ echo "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" | LC_ALL=C sed -e 's/[A-Z]/./g'
..........................abcdefghijklmnopqrstuvwxyz
$ echo "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" | LC_ALL=C sed -e 's/[a-z]/./g'
ABCDEFGHIJKLMNOPQRSTUVWXYZ..........................
$ echo "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" | sed -e 's/[[:upper:]]/./g'
..........................abcdefghijklmnopqrstuvwxyz
$ echo "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" | sed -e 's/[[:lower:]]/./g'
ABCDEFGHIJKLMNOPQRSTUVWXYZ..........................
$ sed --version
GNU sed version 4.2.1
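Applied to the original command, the locale-independent fix might look like the sketch below. The function name mark_capitalized and the four sample lines are assumptions for illustration:

```shell
# Hypothetical helper: prefix "$:$" to lines that start with an uppercase letter.
# [[:upper:]] is a character class, so it does not depend on collation order.
mark_capitalized() {
    sed -e 's/^[[:upper:]]/$:$&/'
}

printf '%s\n' wednesday Weekday thursday MaybeNot | mark_capitalized
```

Only Weekday and MaybeNot receive the $:$ prefix; the lowercase lines pass through unchanged.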