regex match specific pattern - regex

I have
[root#centos64 ~]# cat /tmp/out
[
"i-b7a82af5",
"i-9d78f4df",
"i-92ea58d0",
"i-fa4acab8"
]
I would like to pipe through sed or grep to match the format "x-xxxxxxxx", i.e. a mix of a-z and 0-9, always one character, a hyphen, then 8 characters, and omit everything else
[root#centos64 ~]# cat /tmp/out| sed s/x-xxxxxxxx/
i-b7a82af5
i-9d78f4df
i-92ea58d0
i-fa4acab8
I know this is basic, but I can only find examples of text substitution.

grep -Eo '[a-z0-9]-[a-z0-9]{8}' file
The -E option makes it recognize extended regular expressions, so it can use {8} to match 8 repetitions.
The -o option makes it only print the part of the line that matches the regexp.
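To see the two options working together, here is a small sketch that rebuilds the sample input from the question in a variable (sample and ids are just illustrative names) and feeds it to the same grep command. Note that -o is not part of POSIX grep, although GNU and BSD grep both provide it.

```shell
# Sample input copied from the question.
sample='[
"i-b7a82af5",
"i-9d78f4df",
"i-92ea58d0",
"i-fa4acab8"
]'

# -E: extended regex, so {8} is an interval expression.
# -o: print only the matched text, one match per output line.
ids=$(printf '%s\n' "$sample" | grep -Eo '[a-z0-9]-[a-z0-9]{8}')
printf '%s\n' "$ids"
```

The bracket lines and the surrounding quotes and commas produce no output because nothing on them matches the pattern.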

Why not just print whatever's between the quotes:
$ sed -n 's/[^"]*"\([^"]*\).*/\1/p' file
i-b7a82af5
i-9d78f4df
i-92ea58d0
i-fa4acab8
$ awk -F\" 'NF>1{print $2}' file
i-b7a82af5
i-9d78f4df
i-92ea58d0
i-fa4acab8

Through GNU sed,
$ sed -nr 's/.*([a-z0-9]-[a-z0-9]{8}).*/\1/p' file
i-b7a82af5
i-9d78f4df
i-92ea58d0
i-fa4acab8

I think this is all you need: [0-9a-zA-Z]-[0-9a-zA-Z]{8}.

This should work ^[a-z0-9]-[a-zA-Z0-9]{8}$
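One caveat with the anchored form: ^ and $ require the whole line to be the ID, so on the raw input (with the quotes and trailing comma) it matches nothing. A minimal sketch of the difference, assuming tr is used to delete the quotes and commas first; raw_hits and clean_hits are illustrative names only.

```shell
line='"i-b7a82af5",'

# Anchored pattern on the raw line: the quote and comma prevent a match.
# grep -c prints the count; "|| true" absorbs grep's non-zero exit on no match.
raw_hits=$(printf '%s\n' "$line" | grep -Ec '^[a-z0-9]-[a-zA-Z0-9]{8}$' || true)

# Delete the quotes and commas, then the anchored pattern matches the line.
clean_hits=$(printf '%s\n' "$line" | tr -d '",' | grep -E '^[a-z0-9]-[a-zA-Z0-9]{8}$')

echo "raw matches: $raw_hits"
echo "after cleanup: $clean_hits"
```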

Related

Extract version using grep/regex in bash

I have a file that has a line stating
version = "12.0.08-SNAPSHOT"
The word version and quoted strings can occur on multiple lines in that file.
I am looking for a single line bash statement that can output the following string:
12.0.08-SNAPSHOT
The version can have RELEASE tag too instead of SNAPSHOT.
So to summarize, given
version = "12.0.08-SNAPSHOT"
expected output: 12.0.08-SNAPSHOT
And given
version = "12.0.08-RELEASE"
expected output: 12.0.08-RELEASE
The following command prints strings enquoted in version = "...":
grep -Po '\bversion\s*=\s*"\K.*?(?=")' yourFile
-P enables perl regexes, which allow us to use features like \K and so on.
-o only prints matched parts instead of the whole lines.
\b ensures that version starts at a word boundary and we do not match things like abcversion.
\s stands for any kind of whitespace.
\K makes grep forget that it matched the part before \K. The forgotten part will not be printed.
.*? matches as few characters as possible (the matching part will be printed) ...
(?=") ... until we see a ", which won't be included in the match either (this is called a lookahead).
Not all grep implementations support the -P option. Alternatively, you can use perl, as described in this answer:
perl -nle 'print $& if m{\bversion\s*=\s*"\K.*?(?=")}' yourFile
Seems like a job for cut:
$ echo 'version = "12.0.08-SNAPSHOT"' | cut -d'"' -f2
12.0.08-SNAPSHOT
$ echo 'version = "12.0.08-RELEASE"' | cut -d'"' -f2
12.0.08-RELEASE
Portable solution:
$ echo 'version = "12.0.08-RELEASE"' |sed -E 's/.*"(.*)"/\1/g'
12.0.08-RELEASE
or even:
$ perl -pe 's/.*"(.*)"/\1/g'
$ awk -F"\"" '{print $2}'
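The cut, sed, perl and awk one-liners above take the quoted string from whatever line they are given. Since the question says quoted strings can occur on other lines too, here is a sketch that only prints the value from lines starting with version = "...". It assumes a sed that accepts -E (GNU and BSD sed do); extract_version is a hypothetical helper name.

```shell
extract_version() {
  # -n suppresses default output; the p flag prints only lines
  # where the substitution actually matched a version assignment.
  sed -nE 's/^[[:space:]]*version[[:space:]]*=[[:space:]]*"([^"]*)".*/\1/p'
}

# The first line has a quoted string but is not a version line, so it is skipped.
printf '%s\n' 'name = "demo"' 'version = "12.0.08-SNAPSHOT"' | extract_version
```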

regex -- grepping for alphabetic characters only

I have a quick regex question.
Let's say I have a list of packages:
packageA-0:8.39-6.fc24.x86_64
packageB-0:6.4-1.fc24.x86_64
packageB-utils-0:3.63-2.fc24.x86_64
What I want returned is:
packageA
packageB
packageB-utils
I've tried
grep -oP '^[a-z]*' myfile.txt
and
awk -F"[_-]" '{print $1}' myfile.txt
Any ideas? I think I'm sort of close, but I just can't get packageB-utils
.*?(?=-\d)
.*? => everything non greedy
(?=-\d) => until "-" followed by a digit
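Note that the lazy quantifier and the lookahead need a PCRE-capable tool (for example grep -oP). A rough equivalent without lookahead, swapped in here as a sketch rather than the same regex: delete everything from the first hyphen that is followed by a digit. This assumes the package name itself never contains a hyphen followed by a digit; strip_version is a hypothetical helper name.

```shell
strip_version() {
  # Remove the first "-<digit>" and everything after it on each line.
  sed 's/-[0-9].*//'
}

printf '%s\n' \
  'packageA-0:8.39-6.fc24.x86_64' \
  'packageB-0:6.4-1.fc24.x86_64' \
  'packageB-utils-0:3.63-2.fc24.x86_64' | strip_version
```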
Try this. It selects everything up to the last letter:
grep -o "^[a-zA-Z-]*[a-zA-Z]" file.txt
Or, if your package name also contains digits, you can use sed to trim out everything after -0:...:
sed 's|-[0-9]*:.*||' file.txt
With sed using grouping:
sed -rn 's/([A-Za-z\-]+)\-(.*)/\1/p' packages.txt
Should yield:
#packageA
#packageB
#packageB-utils
packages.txt contains:
packageA-0:8.39-6.fc24.x86_64
packageB-0:6.4-1.fc24.x86_64
packageB-utils-0:3.63-2.fc24.x86_64

sed regex with alternative on Solaris doesn't work

Currently I'm trying to use sed with a regex on Solaris, but it doesn't work.
I need to show only the lines matching my regex.
sed -n -E '/^[a-zA-Z0-9]*$|^a_[a-zA-Z0-9]*$/p'
input file:
grtad
a_pitr
_aupa
a__as
baman
12353
ai345
ki_ag
-MXx2
!!!23
+_)#*
I want to show only the lines matching the above regex:
grtad
a_pitr
baman
12353
ai345
Is there another way to write the alternation? Is it possible in Perl?
Thanks for any solutions.
With Perl
perl -ne 'print if /^(a_)?[a-zA-Z0-9]*$/' input.txt
The (a_)? matches a_ one-or-zero times, so optionally. It may or may not be there.
The (a_) also captures the match, which is not needed. So you can use (?:a_)? instead. The ?: makes () only group what is inside (so ? applies to the whole thing), but not remember it.
with grep
$ grep -xiE '(a_)?[a-z0-9]*' ip.txt
grtad
a_pitr
baman
12353
ai345
-x match whole line
-i ignore case
-E extended regex; if not available, use grep -xi '\(a_\)\?[a-z0-9]*' (\? in a basic regex is a GNU extension)
(a_)? zero or one time match a_
[a-z0-9]* zero or more letters or digits
With sed
sed -nE '/^(a_)?[a-zA-Z0-9]*$/p' ip.txt
or, with GNU sed
sed -nE '/^(a_)?[a-z0-9]*$/Ip' ip.txt
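If neither -E nor alternation is available, another option is to avoid the | entirely and give grep two basic regexes with separate -e options. This is only a sketch: it relies on POSIX grep behaviour, and whether the default grep on a particular Solaris release accepts multiple -e options is an assumption you would need to check. match_lines is a hypothetical helper name.

```shell
match_lines() {
  # Each -e is its own basic regex; a line is printed if either one matches.
  grep -e '^[a-zA-Z0-9]*$' -e '^a_[a-zA-Z0-9]*$'
}

# Input lines taken from the question.
printf '%s\n' grtad a_pitr _aupa a__as baman 12353 ai345 ki_ag -MXx2 '!!!23' '+_)#*' | match_lines
```

Run on the question's input, this should print grtad, a_pitr, baman, 12353 and ai345.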

Extract few matching strings from matching lines in file using sed

I have a file with strings similar to this:
abcd u'current_count': u'2', u'total_count': u'3', u'order_id': u'90'
I have to find current_count and total_count for each line of the file. I am trying the command below, but it's not working. Please help.
grep current_count file | sed "s/.*\('current_count': u'\d+'\).*/\1/"
It is outputting the whole line but I want something like this:
'current_count': u'2', 'total_count': u'3'
It's printing the whole line because the pattern in the s command doesn't match, so no substitution happens.
sed regexes don't support \d for digits, or x+ for xx*. GNU sed has a -r option to enable extended-regex support so + will be a meta-character, but \d still doesn't work. GNU sed also allows \+ as a meta-character in basic regex mode, but that's not POSIX standard.
So anyway, this will work:
echo -e "foo\nabcd u'current_count': u'2', u'total_count': u'3', u'order_id': u'90'" |
sed -nr "s/.*('current_count': u'[0-9]+').*/\1/p"
# output: 'current_count': u'2'
Notice that I skip the grep by using sed -n s///p. I could also have used /current_count/ as an address:
sed -r -e '/current_count/!d' -e "s/.*('current_count': u'[0-9]+').*/\1/"
Or with just grep printing only the matching part of the pattern, instead of the whole line:
grep -E -o "'current_count': u'[[:digit:]]+'"
(or egrep instead of grep -E). I forget if grep -o is POSIX-required behaviour.
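The question asked for both fields on one output line. A sketch of that with two capture groups, assuming a sed that accepts -E and assuming current_count always appears before total_count on the line; counts is a hypothetical helper name.

```shell
counts() {
  # \1 is the current_count pair, \2 the total_count pair.
  # -n plus the p flag means lines without both fields print nothing.
  sed -nE "s/.*('current_count': u'[0-9]+').*('total_count': u'[0-9]+').*/\1, \2/p"
}

printf '%s\n' "abcd u'current_count': u'2', u'total_count': u'3', u'order_id': u'90'" | counts
# output: 'current_count': u'2', 'total_count': u'3'
```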
For me this looks like some sort of serialized Python data. Basically I would try to find out the origin of that data and parse it properly.
However, while being hackish, sed can also be used here:
sed "s/.*current_count': [a-z]'\([0-9]\+\).*/\1/" input.txt
sed "s/.*total_count': [a-z]'\([0-9]\+\).*/\1/" input.txt

Getting defined substring with help of sed or egrep

Everyone!!
I want to get specific substring from stdout of command.
stdout:
{"response":
{"id":"110200dev1","success":"true","token":"09ad7cc7da1db13334281b84f2a8fa54"},"success":"true"}
I need to get the hex string after token, without quotation marks; the hex string is 32 characters long. I suppose it can be done with sed or egrep. I don't want to use awk here, because the stdout changes very often.
This is an alternate gnu-awk solution when grep -P isn't available:
awk -F: '{gsub(/"/, "")} NF==2&&$1=="token"{print $2}' RS='[{},]' <<< "$string"
09ad7cc7da1db13334281b84f2a8fa54
grep's nature is extracting things:
grep -Po '"token":"\K[^"]+'
-P option interprets the pattern as a Perl regular expression.
-o option shows only the matching part that matches the pattern.
\K throws away everything that it has matched up to that point.
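If your grep has no -P, a similar result is possible with -Eo plus cut. This is a sketch rather than an exact equivalent: it matches the whole "token":"..." pair, additionally insists on exactly 32 lowercase hex characters, and then keeps the fourth double-quote-delimited field. get_token is a hypothetical helper name.

```shell
get_token() {
  # Split "token":"<value>" on double quotes: field 2 is token, field 4 is the value.
  grep -Eo '"token":"[a-f0-9]{32}"' | cut -d'"' -f4
}

# Input taken from the question.
printf '%s\n' '{"response":' \
  '{"id":"110200dev1","success":"true","token":"09ad7cc7da1db13334281b84f2a8fa54"},"success":"true"}' | get_token
```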
Or an option using sed...
sed 's/.*"token":"\([^"]*\)".*/\1/'
With sed:
your-command | sed 's/.*"token":"\([^"]*\)".*/\1/'
YourStreamOrFile | sed -n 's/.*"token":"\([a-f0-9]\{32\}\)".*/\1/p'
With -n and the p flag, this prints nothing (instead of the whole line) when the input has no token of exactly 32 hex characters.