Extract a few matching strings from matching lines in a file using sed - regex

I have a file with strings similar to this:
abcd u'current_count': u'2', u'total_count': u'3', u'order_id': u'90'
I have to find current_count and total_count for each line of the file. I am trying the command below, but it's not working. Please help.
grep current_count file | sed "s/.*\('current_count': u'\d+'\).*/\1/"
It is outputting the whole line but I want something like this:
'current_count': u'3', 'total_count': u'3'

It's printing the whole line because the pattern in the s command doesn't match, so no substitution happens.
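sed's default basic regular expressions (BRE) don't support \d for digits, and a bare + is a literal plus sign rather than shorthand for xx*. GNU sed has a -r option (also spelled -E) to enable extended-regex support so that + becomes a meta-character, but \d still doesn't work; use [0-9] or [[:digit:]] instead. GNU sed also allows \+ as a meta-character in basic regex mode, but that's not POSIX standard.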
So anyway, this will work:
echo -e "foo\nabcd u'current_count': u'2', u'total_count': u'3', u'order_id': u'90'" |
sed -nr "s/.*('current_count': u'[0-9]+').*/\1/p"
# output: 'current_count': u'2'
Notice that I skip the grep by using sed -n s///p. I could also have used /current_count/ as an address:
sed -r -e '/current_count/!d' -e "s/.*('current_count': u'[0-9]+').*/\1/"
Or with just grep printing only the matching part of the pattern, instead of the whole line:
grep -E -o "'current_count': u'[[:digit:]]+'"
(or egrep instead of grep -E, although egrep is deprecated). As far as I know, grep -o is not required by POSIX, but both GNU and BSD grep support it.
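The question asks for both current_count and total_count, while the commands above print only the first. A minimal sketch with two capture groups, assuming current_count always appears before total_count on the line (extract_counts is a hypothetical helper name, not something from the question):

```shell
# extract_counts: print both count fields from every line that contains them.
# Assumes 'current_count' comes before 'total_count' on the line.
# -n plus the p flag means lines without a match print nothing.
extract_counts() {
  sed -nE "s/.*('current_count': u'[0-9]+').*('total_count': u'[0-9]+').*/\1, \2/p"
}

printf '%s\n' "foo" \
  "abcd u'current_count': u'2', u'total_count': u'3', u'order_id': u'90'" |
  extract_counts
# output: 'current_count': u'2', 'total_count': u'3'
```

If the field order can vary, two separate grep -o or sed passes are probably simpler than one combined pattern.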

To me this looks like some sort of serialized Python data. I would first try to find out where that data comes from and parse it properly.
However, while it is hackish, sed can also be used here:
sed "s/.*current_count': [a-z]'\([0-9]\+\).*/\1/" input.txt
sed "s/.*total_count': [a-z]'\([0-9]\+\).*/\1/" input.txt

Related

General solutions to replace string regex preceded and followed by '\n'

I have a file in CentOS which looks like following
[root@localhost nn]# cat -A excel.log
real1$
0.5^I0.5^I0.5^I1^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I^I0.5^I0.5^I0.5^I1^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I^I0.5^I0.5^I0.5^I1^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I$
real2$
0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I^I0.5^I0.5^I0.5^I1^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I^I0.5^I0.5^I0.5^I1^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I$
real3$
0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I1^I0.5^I0.5^I0.5^I0.5^I^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I1^I0.5^I0.5^I0.5^I0.5^I$
real4$
0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I1^I0.5^I1^I0.5^I0.5^I0.5^I1^I0.5^I0.5^I0.5^I0.5^I$
real5$
0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I1^I0.5^I0.5^I0.5^I0.5^I^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I1^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I0.5^I1^I0.5^I0.5^I0.5^I1^I0.5^I0.5^I0.5^I0.5^I$
real6$
I would like to replace \nreal[2-6]\n with \t\t\t and have tried the following without success:
sed -i 's/\nreal[2-6]\n/\t\t\t/g' file
It seems that sed has difficulty dealing with line breaks. Any idea how to apply this regex on CentOS?
Much appreciated!
If you want to consider perl then use:
perl -i -0777 -pe 's/\n(?:51[23]real|real[2-6])(?:\n|\z)/\t\t\t/g' file
If you want to avoid replacing the last real\d+ line with \t\t\t, then use:
perl -i -0777 -pe 's/\n(?:51[23]real|real[2-6])\n(?!\z)/\t\t\t/g' file
(?!\z) is a negative lookahead that fails the match when the end of the input is immediately ahead of us.
With GNU sed, you need to use the -z option:
sed -i -z 's/\nreal[2-6]\n/\t\t\t/g' file
#      ^^
Since you also want to handle specific alternations, you need to enable the POSIX ERE syntax with either the -r or the -E option:
sed -i -Ez 's/\n(51[23]real|real[2-6])\n/\t\t\t/g' file
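sed's ERE has no lookahead, so the commands above also replace a marker that sits at the very end of the file. One possible workaround, sketched here for GNU sed only, is to require one more character after the marker's newline, capture it, and put it back (join_real_blocks is a hypothetical helper name and the sample data is shortened):

```shell
# join_real_blocks: replace "\nrealN\n" (N in 2-6) with three tabs, but only
# when at least one more character follows, so a trailing marker is kept.
# Requires GNU sed: -z reads the whole input as a single record.
join_real_blocks() {
  sed -Ez 's/\nreal[2-6]\n(.)/\t\t\t\1/g'
}

printf 'real1\n0.5\t1\nreal2\n0.5\t0.5\nreal3\n' | join_real_blocks | cat -A
# output:
# real1$
# 0.5^I1^I^I^I0.5^I0.5$
# real3$
```

Because -z loads the entire file into memory as one record, this is best suited to files of moderate size.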

sed remove lines that starts with a specific pattern

I'm trying to use the sed command with a regex pattern that works fine with grep, but it doesn't match anything with sed.
I have a text file and want to delete each line that starts with wow or waw.
This is the command I'm using, but it's not working:
sed -i '/^w\(o\|a\)w/d' text.txt
I tried using the same pattern with grep and it works fine:
grep '^w\(o\|a\)w' text.txt
Is anything wrong with the regex in the sed command?
With GNU sed, you can use
sed -i '/^w[oa]w/d' file
With FreeBSD sed, use
sed -i '' '/^w[oa]w/d' file
Here, [oa] is a bracket expression matching either o or a.
A quick demo:
sed '/^w[oa]w/d' <<< "wow 1
waw 2
wiw 3"
Output: wiw 3.
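The bracket expression works because the two words differ in a single character. When the alternatives differ by more than that, extended-regex alternation is one option. A small sketch, assuming a sed that accepts -E (GNU and BSD sed both do); drop_wow_waw is a hypothetical helper name:

```shell
# drop_wow_waw: delete lines that start with either "wow" or "waw".
# -E enables extended regexes, so ( | ) need no backslashes.
drop_wow_waw() {
  sed -E '/^(wow|waw)/d'
}

printf '%s\n' 'wow 1' 'waw 2' 'wiw 3' 'now wow' | drop_wow_waw
# output:
# wiw 3
# now wow
```

The last sample line survives because the ^ anchor only matches "wow" at the start of a line.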

Removing bullet point characters from text file with sed

I have a large text file in which some lines start with a bullet point (•). I'd like to remove those. I've tried
sed 's/\u2022//g' filename.txt
but that doesn't match the bullets. I've also tried pasting the bullet into my sed command, but also with no success.
E: The output of
sed --version
is
sed (GNU sed) 4.2.2
E2: If it helps figure out how to capture the bullet characters, they were originally added in Access.
E3: As suggested in the comments,
echo -n '•' | hexdump -C
returns
00000000 95 |.|
00000001
I suggest with GNU sed:
sed 's/\x95//g' file
This is a working command for me:
# Force paste the bullet into the command line
sed 's/^•//g' filename.txt
If it doesn't work, try escaping with echo:
sed 's/^'"$(echo -ne '\u2022')"'//g' filename.txt
As PesaThe suggests, you can also use printf for escaping:
sed 's/^'"$(printf '\u2022')"'//g' filename.txt
It looks like sed doesn't understand \u sequences.
According to the user manual, sed regexes follow POSIX.2 BRE, and BRE does not define a \u escape, so the pattern never matches the bullet character.
You can instead match the bullet's UTF-8 byte sequence (which I got using hexdump -C):
sed 's/^\xe2\x80\xa2//g' filename.txt
Or, alternatively, you could let bash expand the escape before sed sees it by using ANSI-C quoting: just add a $ before the quoted string (the \u escape needs bash 4.2 or later).
sed $'s/\u2022//g' filename.txt
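Which answer works depends on how the bullet is encoded: the hexdump in the question shows the single byte 0x95 (the Windows-1252 bullet), while the \u2022 and \xe2\x80\xa2 variants assume UTF-8. A hedged sketch that handles both by matching raw bytes; strip_bullets is a hypothetical helper name, and the bytes are built with printf octal escapes so that no sed-specific escape syntax is needed:

```shell
# strip_bullets: remove a leading bullet and any spaces after it.
# Covers two encodings of the bullet:
#   byte 0x95            (Windows-1252, as in the question's hexdump)
#   bytes 0xE2 0x80 0xA2 (UTF-8 encoding of U+2022)
strip_bullets() {
  cp1252_bullet=$(printf '\225')        # octal for 0x95
  utf8_bullet=$(printf '\342\200\242')  # octal for 0xE2 0x80 0xA2
  # LC_ALL=C makes sed treat input and pattern as plain bytes.
  LC_ALL=C sed -e "s/^${cp1252_bullet} *//" -e "s/^${utf8_bullet} *//"
}

printf '\225 first item\n\342\200\242 second item\nplain line\n' | strip_bullets
# output:
# first item
# second item
# plain line
```

Running hexdump -C on a few lines of the actual file, rather than on a pasted character, is the most reliable way to find out which byte sequence to match.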

regex match specific pattern

I have
[root@centos64 ~]# cat /tmp/out
[
"i-b7a82af5",
"i-9d78f4df",
"i-92ea58d0",
"i-fa4acab8"
]
I would like to pipe this through sed or grep to match the format "x-xxxxxxxx", i.e. a mix of a-z and 0-9, always one character, a hyphen, then eight characters, and omit everything else:
[root@centos64 ~]# cat /tmp/out| sed s/x-xxxxxxxx/
i-b7a82af5
i-9d78f4df
i-92ea58d0
i-fa4acab8
I know this is basic, but I can only find examples of text substitution.
grep -Eo '[a-z0-9]-[a-z0-9]{8}' file
The -E option makes it recognize extended regular expressions, so it can use {8} to match 8 repetitions.
The -o option makes it only print the part of the line that matches the regexp.
Why not just print whatever's between the quotes:
$ sed -n 's/[^"]*"\([^"]*\).*/\1/p' file
i-b7a82af5
i-9d78f4df
i-92ea58d0
i-fa4acab8
$ awk -F\" 'NF>1{print $2}' file
i-b7a82af5
i-9d78f4df
i-92ea58d0
i-fa4acab8
Through GNU sed,
$ sed -nr 's/.*([a-z0-9]-[a-z0-9]{8}).*/\1/p' file
i-b7a82af5
i-9d78f4df
i-92ea58d0
i-fa4acab8
I think this is all you need: [0-9a-zA-Z]-[0-9a-zA-Z]{8}.
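This should work: ^[a-z0-9]-[a-zA-Z0-9]{8}$ (the ^ and $ anchors require the ID to be the whole line, so the surrounding quotes and commas have to be stripped first).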
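An anchored pattern such as the last one only matches when the ID is the entire line, which is not the case for the quoted, comma-terminated lines in /tmp/out. One way to use whole-line matching anyway, sketched here with a hypothetical helper name only_ids, is to delete the punctuation first and then let grep -x do the anchoring:

```shell
# only_ids: strip quotes, commas and spaces, then keep only lines that are
# exactly one character, a hyphen, and eight more characters.
# grep -x matches whole lines, so no explicit ^ and $ are needed.
only_ids() {
  tr -d '", ' | grep -Ex '[a-z0-9]-[a-z0-9]{8}'
}

printf '%s\n' '[' '"i-b7a82af5",' '"i-9d78f4df",' ']' | only_ids
# output:
# i-b7a82af5
# i-9d78f4df
```

Unlike grep -o, this rejects lines where the ID is embedded in a longer token, which may or may not be what you want.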

Getting defined substring with help of sed or egrep

Hello everyone!
I want to get a specific substring from the stdout of a command.
stdout:
{"response":
{"id":"110200dev1","success":"true","token":"09ad7cc7da1db13334281b84f2a8fa54"},"success":"true"}
I need to get the hex string after token, without the quotation marks; the hex string is 32 characters long. I suppose it can be done with sed or egrep. I don't want to use awk here, because the stdout changes very often.
This is an alternate gnu-awk solution when grep -P isn't available:
awk -F: '{gsub(/"/, "")} NF==2&&$1=="token"{print $2}' RS='[{},]' <<< "$string"
09ad7cc7da1db13334281b84f2a8fa54
grep's nature is extracting things:
grep -Po '"token":"\K[^"]+'
-P option interprets the pattern as a Perl regular expression.
-o option prints only the part of the line that matches the pattern.
\K throws away everything that it has matched up to that point.
Or an option using sed...
sed 's/.*"token":"\([^"]*\)".*/\1/'
With sed:
your-command | sed 's/.*"token":"\([^"]*\)".*/\1/'
YourStreamOrFile | sed -n 's/.*"token":"\([a-f0-9]\{32\}\)".*/\1/p'
Because of -n and the p flag, this prints nothing when a line has no token of exactly 32 hex characters, instead of echoing the whole line unchanged.
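The difference between the two sed variants shows up on input that does not contain a valid token. A short sketch wrapping the stricter command in a hypothetical helper named get_token, fed with the sample response from the question and with a deliberately malformed one:

```shell
# get_token: print the 32-character lowercase hex token, or nothing at all
# when no such token is present on a line.
get_token() {
  sed -n 's/.*"token":"\([a-f0-9]\{32\}\)".*/\1/p'
}

# Sample response from the question.
printf '%s\n' '{"response":' \
  '{"id":"110200dev1","success":"true","token":"09ad7cc7da1db13334281b84f2a8fa54"},"success":"true"}' |
  get_token
# output: 09ad7cc7da1db13334281b84f2a8fa54

# A token that is too short produces no output.
printf '%s\n' '{"token":"09ad7cc7"}' | get_token
```

An empty result is easy to test for in a script, for example with [ -z "$token" ], which is harder when the unmodified line is passed through.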