Bash sed match a string with newlines above and below - regex

Here's an excerpt of a text file.
http_server = Server(
uuid = "9a44b850-c54f-11e3-9c1a-0800200c9a66",
)
# https_server = Server(
# uuid = "0c9cb0c0-c55e-11e3-9c1a-0800200c9a66",
# )
I want to use sed (or something similar) to extract the: "0c9cb0c0-c55e-11e3-9c1a-0800200c9a66" out of the file.
I've tried cat server.conf | sed -n 's/.*uuid = "\(.*\)",/\1/p' but it gives me both uuids. When I put in newlines like \n the sed doesn't work at all.
The unique marker for the uuid is https_server, the regex must make sure the uuid was inside the https_server.

Try this instead:
cat server.conf | sed -n -e '/https_server/{N;p}' | sed -n -e 's/.*uuid = "\([^ ]*\)",/\1/p'
Or this invoking sed once only:
cat server.conf | sed -n -e '/https_server/{N;s/.*uuid = "\([^ ]*\)",/\1/p}'
Or if there is chance of multiple empty lines between the https_server and uuid line inside the block:
cat server.conf | sed -n -e '/https_server/,/uuid/p' | sed -n -e 's/.*uuid = "\([^ ]*\)",/\1/p'

For making sure the uuid is inside the https_server, you can skip all lines until you reach that string:
cat server.conf | sed -n '/https_server/,//{//!p}' | sed -n 's/.*uuid = "\(.*\)",/\1/p'

This one looks for a https_server line which is possibly commented out, then extracts any uuid in the block which follows before the next closing parenthesis.
sed -n '/^#* *https_server *=.*(/,/)/!d;/^#* *uuid *= *"/!d;s///;s/",//p' server.conf
This avoids the useless use of cat and the silly multiple sed invocations.
/regex/,/regex/
selects a region. The action !d simply discards any lines outside of this region.
/^#* *uuid *= *"/
selects any line in the region matching this pattern. Again, !d discards any line which is not selected.
s///
has an empty pattern, so it reuses the most recently applied regex (the uuid prefix just matched) and replaces it with nothing.
Finally,
s/",//
removes the quote and the comma at the end of the string.
Some sed dialects might want you to backslash the literal parentheses.
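As a quick check, the region-based command can be run against a throwaway copy of the excerpt from the question. This is a minimal sketch assuming GNU sed; the temporary file and the variable names are illustrative only. The second command is a more explicit variant (not part of the answer above) that captures the value with a group instead of relying on the empty-regex shorthand.

```shell
# Recreate the excerpt from the question in a temporary file (hypothetical path).
conf=$(mktemp)
cat > "$conf" <<'EOF'
http_server = Server(
uuid = "9a44b850-c54f-11e3-9c1a-0800200c9a66",
)
# https_server = Server(
# uuid = "0c9cb0c0-c55e-11e3-9c1a-0800200c9a66",
# )
EOF

# The command from the answer: keep only the https_server region, then the uuid line.
uuid_region=$(sed -n '/^#* *https_server *=.*(/,/)/!d;/^#* *uuid *= *"/!d;s///;s/",//p' "$conf")

# Explicit variant: restrict to the region and capture the quoted value.
uuid_group=$(sed -n '/https_server/,/)/{s/.*uuid *= *"\([^"]*\)".*/\1/p;}' "$conf")

echo "$uuid_region"
echo "$uuid_group"
rm -f "$conf"
```

Both commands should print only the second uuid, because the first block starts with http_server rather than https_server.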

This might work for you (GNU sed):
sed -rn '/https_server/{n;s/.*uuid = "([^"]*)".*/\1/p;q}' server.conf
If the uuid is not on the next line but some other following line, use:
sed -rn '/https_server/{:a;n;/\)\s*$/b;s/.*uuid = "([^"]*)".*/\1/p;Ta;q}' server.conf

Delete any special character using Sed

I have yet another list of subdomains. I want to remove any wildcard subdomain that includes one of these special characters:
()!&$#*+?
The special characters mostly appear as a prefix, but they can also occur in the middle. Here is some sample data:
(www.imgur.com
***************diet.blogspot.com
*-1.gbc.criteo.com
------------------------------------------------------------i.imgur.com
This has been quite an inconvenience while scanning through the list. As always, I'm trying sed to fix it:
sed -i "/[!()#$&?+]/d" foo.txt ###Didn't work
sed -i "/[\!\(\)\#\$\&\?\+]/d" ###Escaping char didn't work
After running the commands above, the list is unchanged and the file is still in its original state. I'm thinking the fix is to chain a series of sed commands that remove the characters one by one:
cat foo.txt | sed -e "/!/d" -e "/#/d" -e "/\*/d" -e "/\$/d" -e "/(/d" -e "/)/d" -e "/+/d" -e "/\'/d" -e "/&/d" >> foo2.txt
cat foo.txt | sed -e "/\!/d" | sed -e "/\#/d" | sed -e "/\*/d" | sed -e "/\$/d" | sed -e "/\+/d" | sed -e "/\'/d" | sed -e "/\&/d" >> foo2.txt
If escaping every special character doesn't work, my logic must be wrong. I also tried /g, which didn't help either.
As a side note: I don't want - to be deleted as some valid subdomain can have - character:
line-apps.com
line-apps-beta.com
line-apps-rc.com
line-apps-dev.com
Any help would be cherished.
Using sed
$ sed '/[[:punct:]]/d' input_file
This deletes every line that contains a punctuation character. Note that [[:punct:]] also includes . and -, so it would remove ordinary hostnames as well; it would help if you provided sample data.
To do what you're trying to do in your answer (which adds [ and ] and more to the set of characters in your question) would be:
sed '/[][!?+,#$&*() ]/d'
or just:
grep -v '[][!?+,#$&*() ]'
Per POSIX to include ] in a bracket expression it must be the first character otherwise it indicates the end of the bracket expression.
Consider printing lines you want instead of deleting lines you do not want, though, e.g.:
grep -E '^[[:alnum:]_.-]+$' file
to print lines that only contain letters, numbers, underscores, dashes, and/or periods.
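The two approaches (a deny-list bracket expression versus an allow-list pattern) can be compared on a few lines taken from the question. This is only a sketch; the variable names are made up for illustration, and the allow-list uses grep -E with + so that a line must consist entirely of permitted characters.

```shell
# A few sample lines from the question, held in a variable instead of foo.txt.
list='(www.imgur.com
***************diet.blogspot.com
*-1.gbc.criteo.com
line-apps.com
line-apps-beta.com'

# Deny-list: delete lines containing any listed character (] comes first in the set).
denied=$(printf '%s\n' "$list" | sed '/[][!?+,#$&*() ]/d')

# Allow-list: keep lines made up only of letters, digits, _, . and -.
allowed=$(printf '%s\n' "$list" | grep -E '^[[:alnum:]_.-]+$')

printf '%s\n' "$denied"
```

On this sample both variables end up holding the same two lines, line-apps.com and line-apps-beta.com. The allow-list is usually the safer choice because unexpected characters are rejected by default.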

sed: struggling with substitution and regex for ^*=

I am running a Linux bash script. From stdout lines like /gpx/trk/name=MyTrack1, I want to keep only the end of the line after =.
I am struggling to understand why the following sed command is not working as I expect:
echo "/gpx/trk/name=MyTrack1" | sed -e "s/^*=//"
(I also tried)
echo "/gpx/trk/name=MyTrack1" | sed -e "s/^*\=//"
The return is always /gpx/trk/name=MyTrack1 and not MyTrack1
An even simpler way if this is the only structure you are concerned about:
echo "/gpx/trk/name=MyTrack1" | cut -d = -f 2
Simply try:
echo "/gpx/trk/name=MyTrack1" | sed 's/.*=//'
Solution 2nd: With another sed.
echo "/gpx/trk/name=MyTrack1" | sed 's/\(.*=\)\(.*\)/\2/'
Explanation: As per OP's request adding explanation for this code here:
s: Means telling sed to do substitution operation.
\(.*=\): The first capture group. It holds everything from the start of the line up to and including the last =, so /gpx/trk/name= ends up in group 1.
\(.*\): The second capture group. It holds everything after that =, which here is MyTrack1.
/\2/: Replaces the whole matched line with the contents of group 2 only, i.e. MyTrack1.
Solution 3rd: Or with awk considering that your Input_file is same as shown samples.
echo "/gpx/trk/name=MyTrack1" | awk -F'=' '{print $2}'
Solution 4th: With awk's match.
echo "/gpx/trk/name=MyTrack1" | awk 'match($0,/=.*$/){print substr($0,RSTART+1,RLENGTH-1)}'
$ echo "/gpx/trk/name=MyTrack1" | sed -e "s/^.*=//"
MyTrack1
The regular expression ^.*= matches anything up to and including the last = in the string.
Your regular expression ^*= would match the literal string *= at the start of a string, e.g.
$ echo "*=/gpx/trk/name=MyTrack1" | sed -e "s/^*=//"
/gpx/trk/name=MyTrack1
The * character in a regular expression usually modifies the immediately previous expression so that zero or more of it may be matched. When * occurs at the start of an expression on the other hand, it matches the character *.
Not to take you off the sed track, but this is easy with Bash alone:
$ s="/gpx/trk/name=MyTrack1"
$ echo "$s"
/gpx/trk/name=MyTrack1
$ echo "${s##*=}"
MyTrack1
The ##*= pattern removes the maximal pattern from the beginning of the string to the last =:
$ s="1=2=3=the rest"
$ echo "${s##*=}"
the rest
The equivalent in sed would be:
$ echo "$s" | sed -E 's/^.*=(.*)/\1/'
the rest
Where #*= would remove the minimal pattern:
$ echo "${s#*=}"
2=3=the rest
And in sed:
$ echo "$s" | sed -E 's/^[^=]*=(.*)/\1/'
2=3=the rest
Note the difference in * in Bash string functions vs a sed regex:
The * in Bash (in this context) is a glob: by itself it matches any string of characters
The * in a regex repeats the previous expression zero or more times; for 'any string of characters' you need .*
Bash has extensive string manipulation functions. You can read about Bash string patterns in BashFAQ.

How to remove a space between matching words?

I've read a lot of questions about how to replace spaces from a file but I have the following problem:
I have a file like so:
<foo>"crazy foo"</foo> <bar>dull-bar</bar>
and I'm trying to remove spaces between > < and only those ones so the file would be like:
`<foo>"crazy foo"</foo><bar>dull-bar</bar>`
So far I've tried to remove then by using sed and tr. Sed is not working by any chance and using tr '> <' '><' outputs:
<foo>"crazy foo"</foo><<bar>dull-bar</bar>
sed -i -e "s/> *</></g" YourFile
-i means YourFile is modified. Remove this option to test your command and display the result in shell output.
" *" (a space followed by *) matches zero or more spaces.
The g at the end of sed expression means "Replace all the occurrences".
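The difference between the tr attempt and the sed substitution can be seen side by side. This is a small sketch with illustrative variable names: tr maps single characters to single characters (here every space becomes <), while sed replaces the whole sequence "> spaces <".

```shell
input='<foo>"crazy foo"</foo> <bar>dull-bar</bar>'

# tr translates character by character: '>' stays '>', ' ' becomes '<', '<' stays '<'.
with_tr=$(printf '%s\n' "$input" | tr '> <' '><')

# sed only touches spaces that sit between '>' and '<'.
with_sed=$(printf '%s\n' "$input" | sed 's/> *</></g')

echo "$with_tr"
echo "$with_sed"
```

The tr result also damages the space inside "crazy foo", which is why a pattern-based tool is needed here.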
You could try something like this
echo '<foo>"crazy foo"</foo> <bar>dull-bar</bar>' | sed 's/>[[:space:]]*</></g'
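Another suggestion splits the line on double quotes with awk; note that it prints only the part after the closing quote, not the whole line:
awk -F"\"" '{print $3}' file.txt | sed 's/ //g'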

Extract few matching strings from matching lines in file using sed

I have a file with strings similar to this:
abcd u'current_count': u'2', u'total_count': u'3', u'order_id': u'90'
I have to find current_count and total_count for each line of the file. I am trying the command below but it's not working. Please help.
grep current_count file | sed "s/.*\('current_count': u'\d+'\).*/\1/"
It is outputting the whole line but I want something like this:
'current_count': u'3', 'total_count': u'3'
It's printing the whole line because the pattern in the s command doesn't match, so no substitution happens.
sed regexes don't support \d for digits, or x+ for xx*. GNU sed has a -r option to enable extended-regex support so + will be a meta-character, but \d still doesn't work. GNU sed also allows \+ as a meta-character in basic regex mode, but that's not POSIX standard.
So anyway, this will work:
echo -e "foo\nabcd u'current_count': u'2', u'total_count': u'3', u'order_id': u'90'" |
sed -nr "s/.*('current_count': u'[0-9]+').*/\1/p"
# output: 'current_count': u'2'
Notice that I skip the grep by using sed -n s///p. I could also have used /current_count/ as an address:
sed -r -e '/current_count/!d' -e "s/.*('current_count': u'[0-9]+').*/\1/"
Or with just grep printing only the matching part of the pattern, instead of the whole line:
grep -E -o "'current_count': u'[[:digit:]]+'" file
(or egrep instead of grep -E). I forget if grep -o is POSIX-required behaviour.
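To get both fields on one line, as the question asks, two capture groups can be combined in a single substitution. The sketch below sticks to basic regular expressions ([0-9][0-9]* instead of \d+ or +), so it should not depend on GNU extensions; the variable names are illustrative.

```shell
line="abcd u'current_count': u'2', u'total_count': u'3', u'order_id': u'90'"

# Capture the two key/value pairs and print them joined by ", ".
both=$(printf '%s\n' "$line" |
  sed -n "s/.*\('current_count': u'[0-9][0-9]*'\).*\('total_count': u'[0-9][0-9]*'\).*/\1, \2/p")

echo "$both"
```

Because of -n and the p flag, lines that do not contain both fields produce no output at all.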
For me this looks like some sort of serialized Python data. Basically I would try to find out the origin of that data and parse it properly.
However, while being hackish, sed can also be used here:
sed "s/.*current_count': [a-z]'\([0-9]\+\).*/\1/" input.txt
sed "s/.*total_count': [a-z]'\([0-9]\+\).*/\1/" input.txt

Sed : print all lines after match

I got my research result after using sed :
zcat file* | sed -e 's/.*text=\(.*\)status=[^/]*/\1/' | cut -f 1 - | grep "pattern"
But it only shows the part that I cut. How can I print all lines after a match ?
I'm using zcat so I cannot use awk.
Thanks.
Edited :
This is my log file :
[01/09/2015 00:00:47] INFO=54646486432154646 from=steve idfrom=55516654455457 to=jone idto=5552045646464 guid=100021623456461451463 num=6 text=hi my number is 0 811 22 1/12 status=new survstatus=new
My aim is to find all users that spam my site with their telephone numbers (using grep "pattern") then print all the lines to get all the information about each spam. The problem is there may be matches in INFO or id, so I use sed to get the text first.
Printing all lines after a match in sed:
$ sed -ne '/pattern/,$ p'
# alternatively, if you don't want to print the match:
$ sed -e '1,/pattern/ d'
Filtering lines when pattern matches between "text=" and "status=" can be done with a simple grep, no need for sed and cut:
$ grep 'text=.*pattern.* status='
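The two sed range forms can be tried on a few made-up lines. This is a minimal sketch; the input and variable names are hypothetical, and an awk range with an always-false end condition is included for comparison.

```shell
input='one
two pattern
three
four'

# Print from the first matching line to the end of input.
from_match=$(printf '%s\n' "$input" | sed -n '/pattern/,$p')

# Delete everything up to and including the first match, keep the rest.
after_match=$(printf '%s\n' "$input" | sed '1,/pattern/d')

# awk range whose end condition (an unset variable) is never true.
with_awk=$(printf '%s\n' "$input" | awk '/pattern/,EOF')

printf '%s\n' "$from_match"
```

Note that with 1,/pattern/ the end of the range is only checked from line 2 onward, so a match on the very first line is not treated as the end; GNU sed accepts 0,/pattern/ for that case.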
You can use awk
awk '/pattern/,EOF'
n.b. don't be fooled: EOF is just an uninitialized variable, and by default 0 (false). So that condition cannot be satisfied until the end of file.
Perhaps this could be combined with all the previous answers using awk as well.
Maybe this is what you actually want? Find lines matching "pattern" and extract the field after text= up through just before status=?
zcat file* | sed -e '/pattern/s/.*text=\(.*\)status=[^/]*/\1/'
You are not revealing what pattern actually is -- if it's a variable, you cannot use single quotes around it.
Notice that \(.*\)status=[^/]* would match up through survstatus=new in your example. That is probably not what you want? There doesn't seem to be a status= followed by a slash anywhere -- you really should explain in more detail what you are actually trying to accomplish.
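The greedy match can be demonstrated on the tail of the sample log line. This is a sketch only; the second command is one possible tightening (an assumption, not something from the question) that requires a space before status= so that survstatus= is not picked up.

```shell
line='num=6 text=hi my number is 0 811 22 1/12 status=new survstatus=new'

# Original expression: \(.*\) runs up to the LAST "status=", inside survstatus=.
greedy=$(printf '%s\n' "$line" | sed 's/.*text=\(.*\)status=[^/]*/\1/')

# Tighter variant: anchor on " status=" (with a leading space) and drop the rest.
text_only=$(printf '%s\n' "$line" | sed 's/.*text=\(.*\) status=.*/\1/')

echo "$greedy"
echo "$text_only"
```

The first result still contains "status=new surv", while the second keeps only the message text.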
Your question title says "all line after a match" so perhaps you want everything after text=? Then that's simply
sed 's/.*text=//'
i.e. replace up through text= with nothing, and keep the rest. (I trust you can figure out how to change the surrounding script into zcat file* | sed '/pattern/s/.*text=//' ... oops, maybe my trust failed.)
The seldom used branch command will do this for you. Until you match, use n for next then branch to beginning. After match, use n to skip the matching line, then a loop copying the remaining lines.
cat file | sed -n -e ':start; /pattern/b match; n; b start; :match; n; :copy; p; n; b copy'
Starting from your pipeline:
zcat file* | sed -e 's/.*text=\(.*\)status=[^/]*/\1/' | cut -f 1 - | grep "pattern"
change the last two segments so that it reads:
zcat file* | sed -e 's/.*text=\(.*\)status=[^/]*/\1/' | awk '$1 ~ "pattern" {print $0}'