How Can I Match the Last String Using This Regex

I'm working on a bash script and need to use sed and a regex to match this line in a text file:
database.system = "pgsql://hostaddr=127.0.0.1 port=5432 dbname=mydb user=myuser password=mypassword options='' application_name='myappname'";
This is the regex I've come up with:
database.system\s=\s((?=")(.*)(?=;))
So far my regex matches everything except the final semicolon. How do I modify the regex to catch the semicolon as well?

You're using look-ahead assertions in your regular expression ((?=...)), which sed doesn't support.
However, you don't need them if all you're trying to do is extract the string inside the double quotes (using GNU sed syntax):
line=$'database.system = "pgsql://hostaddr=127.0.0.1 port=5432 dbname=mydb user=myuser password=mypassword options=\'\' application_name=\'myappname\'";'
sed -rn 's/database\.system\s*=\s*"(.*)";/\1/p' <<<"$line"
# use var=$(sed ...) to capture command output in a variable.
This extracts:
pgsql://hostaddr=127.0.0.1 port=5432 dbname=mydb user=myuser password=mypassword options='' application_name='myappname'
-r activates support for extended regular expressions, which function more like regular expressions in other languages (without -r, sed only supports basic regular expressions, whose feature set is limited and whose escaping rules are different).
-n suppresses printing of each input line by default, so that an explicit output command is needed to produce output.
s/<regex>/<replacement>/p matches each input line against <regex>, replaces it with <replacement>, and prints the result (p), but only if a match was found; \1 refers to the first (and, in this case, only) capture group ((...)) defined in <regex>.
The basic approach is to match the entire line, yet limit the (one and only) capture group to the substring of interest, and then replace the line with only the capture group, which effectively outputs only the substring of interest for each matching line.
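A minimal sketch of the same approach as a reusable bash function, assuming GNU sed (for \s; -E is the same as -r). The function name extract_db_url is hypothetical. The anchors and the trailing ; in the pattern make the whole line, semicolon included, part of the match, which is what the question was after; only the quoted text is kept in group 1.

```shell
#!/usr/bin/env bash
# Sketch only: extract_db_url is a hypothetical helper; assumes GNU sed (\s).
extract_db_url() {
  # -E: extended regex (same as -r); -n plus the p flag: print only matching lines.
  # The whole line, including the trailing ';', is matched, but only the text
  # between the double quotes is kept in capture group 1.
  sed -En 's/^\s*database\.system\s*=\s*"(.*)";\s*$/\1/p'
}

line=$'database.system = "pgsql://hostaddr=127.0.0.1 port=5432 dbname=mydb user=myuser password=mypassword options=\'\' application_name=\'myappname\'";'

# Capture the command output in a variable.
url=$(extract_db_url <<<"$line")
printf '%s\n' "$url"
```

Lines that do not match print nothing, so the variable stays empty for them.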

Related

How to comment an include line using sed [duplicate]

I am using sed in a shell script to edit filesystem path names. Suppose I want to replace
/foo/bar
with
/baz/qux
However, sed's s/// command uses the forward slash / as the delimiter. If I do that, I see an error message emitted, like:
▶ sed 's//foo/bar//baz/qux//' FILE
sed: 1: "s//foo/bar//baz/qux//": bad flag in substitute command: 'b'
Similarly, sometimes I want to select line ranges, such as the lines between a pattern foo/bar and baz/qux. Again, I can't do this:
▶ sed '/foo/bar/,/baz/qux/d' FILE
sed: 1: "/foo/bar/,/baz/qux/d": undefined label 'ar/,/baz/qux/d'
What can I do?
You can use an alternative regex delimiter in an address (search pattern) by preceding its first occurrence with a backslash:
sed '\,some/path,d'
And just use it as is for the s command:
sed 's,some/path,other/path,'
You probably want to protect other metacharacters, though; this is a good place to use Perl and quotemeta, or equivalents in other scripting languages.
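A short demonstration of both forms on made-up input lines. It assumes a sed that accepts the \cREGEXc address form described in the man page excerpt (GNU and BSD sed both do). Note that a range address includes the lines matching its start and end patterns.

```shell
#!/usr/bin/env bash
# Sketch: the edits from the question, written with ',' and '|' as delimiters
# so the slashes in the paths need no escaping. Input lines are made up.

input=$'keep 1\n/foo/bar\nkeep 2'
# s command: the character right after 's' becomes the delimiter.
printf '%s\n' "$input" | sed 's,/foo/bar,/baz/qux,'

ranges=$'a\n/foo/bar\nb\n/baz/qux\nc'
# Address: a custom delimiter must be introduced with a backslash.
# Deletes from the line matching /foo/bar through the line matching /baz/qux.
printf '%s\n' "$ranges" | sed '\|/foo/bar|,\|/baz/qux|d'
```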
From man sed:
/regexp/
Match lines matching the regular expression regexp.
\cregexpc
Match lines matching the regular expression regexp. The c may be any character other than backslash or newline.
s/regular expression/replacement/flags
Substitute the replacement string for the first instance of the regular expression in the pattern space. Any character other than backslash or newline can be used instead of a slash to delimit the RE and the replacement. Within the RE and the replacement, the RE delimiter itself can be used as a literal character if it is preceded by a backslash.
Perhaps the closest to a standard, the POSIX/IEEE Open Group Base Specification says:
[2addr] s/BRE/replacement/flags
Substitute the replacement string for instances of the BRE in the
pattern space. Any character other than backslash or newline can
be used instead of a slash to delimit the BRE and the replacement.
Within the BRE and the replacement, the BRE delimiter itself can be
used as a literal character if it is preceded by a backslash.
When there is a slash / in the original string or the replacement string, you need to escape it with \. The following command works on Ubuntu 16.04 (sed 4.2.2):
sed 's/\/foo\/bar/\/baz\/qux/' file
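Escaping by hand gets tedious when the path comes from a variable. The sketch below is one way to protect the metacharacters mentioned above without Perl, assuming the result is used only on the pattern side of an s/// command delimited by /. escape_sed_pattern is a hypothetical helper name; it escapes /, \ and the BRE metacharacters . * [ ] ^ $, but not the replacement-side specials (\, & and the delimiter), which need separate handling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: escape a literal string for use as a sed BRE pattern
# delimited by '/'. This inner sed uses '|' as its own delimiter so that
# '/' can sit inside the bracket expression without being escaped.
escape_sed_pattern() {
  printf '%s\n' "$1" | sed 's|[][\\/.^$*]|\\&|g'
}

old=$(escape_sed_pattern '/foo/bar')   # yields \/foo\/bar
printf '%s\n' '/foo/bar' | sed "s/$old/\/baz\/qux/"
```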

How can I use SED to replace a specific character in a substring

So, I have a CSV file with multiple lines like:
"ABC-DEF-d98263","12345678","176568981","","588","ABC-DEF-11947","","GAUZE PACKING STRIPS 1/4"","","","2019-02-04T19:09:00-05:00","","XXX","XXX","2019-02-12T23:57:48-06:00","XXX-XXX-176568981"
"ABC-DEF-d1494751","98765432","98765432","1073552394","284","ABC-DEF-77997","","ACE WRAP 3"","","","2015-10-29T18:45:00-07:00","Sent","XXX","XXX","2018-04-05T19:38:41-05:00","XXX-XXX-76954940"
I would like to replace "", with ", only in column 8, i.e. where it comes after GAUZE PACKING STRIPS 1/4 or ACE WRAP 3, without touching the other "", in the line.
I have tried sed 's/[[:alnum:]]""//g' file.csv, but it also removes the character before the "" (e.g. the 4 in 1/4"").
Any ideas? Much appreciated!
You can use capture groups to match and replace anything that is between double quotes and followed immediately by double quotes.
The regex to match would look something like this: ("[^",]*")". Two things to note: first, the " characters are matched literally; second, the bracket expression [^",]* in the middle matches any run of characters other than " or ,. This prevents the matched string from containing a quote (or crossing into the next field).
Lastly, the parentheses form a capture group, and whatever the sub-regex between ( and ) matched can be referenced with a backslash and a number. For example, \1 is replaced by the match of the first capture group, \3 by the third, and so on.
The sed script for what you need may look something like this:
sed -re 's/("[^",]*")"/\1/g'
Note that the last double quote is outside the capture group, so it is not part of \1 and is dropped from the output.
With -r, sed uses Extended Regular Expressions (ERE), where plain (...) defines a capture group; without it, sed uses Basic Regular Expressions (BRE), where a group must be written as \(...\).
Notice also the g flag at the end. It is needed for sed to match and replace more than one occurrence on the same line.
Example:
$ cat test
"ABC-DEF-d98263","12345678","176568981","","588","ABC-DEF-11947","","GAUZE PACKING STRIPS 1/4"","","","2019-02-04T19:09:00-05:00",""","XXX","XXX","2019-02-12T23:57:48-06:00"","XXX-XXX-176568981"
$ cat test | sed -re 's/("[^",]*")"/\1/g'
"ABC-DEF-d98263","12345678","176568981","","588","ABC-DEF-11947","","GAUZE PACKING STRIPS 1/4","","","2019-02-04T19:09:00-05:00","","XXX","XXX","2019-02-12T23:57:48-06:00","XXX-XXX-176568981"
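Applied to the two sample lines from the question, the same command gives the expected result. This sketch uses sed -E, which GNU sed accepts as a synonym for -r; fix_doubled_quotes is a hypothetical function name.

```shell
#!/usr/bin/env bash
# Sketch: the answer's substitution applied to the question's sample rows.
fix_doubled_quotes() {
  # ("[^",]*") : opening quote, a run of non-quote/non-comma chars, closing quote
  # "          : the stray extra quote, outside the group and therefore dropped
  sed -E 's/("[^",]*")"/\1/g'
}

fixed=$(fix_doubled_quotes <<'EOF'
"ABC-DEF-d98263","12345678","176568981","","588","ABC-DEF-11947","","GAUZE PACKING STRIPS 1/4"","","","2019-02-04T19:09:00-05:00","","XXX","XXX","2019-02-12T23:57:48-06:00","XXX-XXX-176568981"
"ABC-DEF-d1494751","98765432","98765432","1073552394","284","ABC-DEF-77997","","ACE WRAP 3"","","","2015-10-29T18:45:00-07:00","Sent","XXX","XXX","2018-04-05T19:38:41-05:00","XXX-XXX-76954940"
EOF
)
printf '%s\n' "$fixed"
```

Empty fields ("") are left alone, because after "" the next character is a comma rather than a third quote.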
Using awk:
$ awk '
BEGIN { FS=OFS="," } # set delimiters
{
if($8!="\"\"") # if $8 is not empty ie. ""
sub(/\"\"$/,"\"",$8) # replace trailing double quotes with a single double quote
}1' file # 1 is an always-true pattern, so every (possibly modified) record is printed
Output:
"ABC-DEF-d98263","12345678","176568981","","588","ABC-DEF-11947","","GAUZE PACKING STRIPS 1/4","","","2019-02-04T19:09:00-05:00","","XXX","XXX","2019-02-12T23:57:48-06:00","XXX-XXX-176568981"
"ABC-DEF-d1494751","98765432","98765432","1073552394","284","ABC-DEF-77997","","ACE WRAP 3","","","2015-10-29T18:45:00-07:00","Sent","XXX","XXX","2018-04-05T19:38:41-05:00","XXX-XXX-76954940"
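If only column 8 should ever be touched, as the question asks, the match can also be anchored to that field in sed. A minimal sketch, assuming every field is double-quoted and no field contains a comma or quote of its own (true for the sample data); fix_column8 is a hypothetical name. Lines whose 8th field has no extra quote, or where the extra quote is in another column, are left unchanged.

```shell
#!/usr/bin/env bash
# Sketch: remove the extra quote only when it closes the 8th field.
fix_column8() {
  # ^(("[^"]*",){7}  : the first seven quoted fields, each followed by a comma
  # "[^"]*")         : opening quote, contents and first closing quote of field 8
  # "                : the extra quote, which is dropped
  sed -E 's/^(("[^"]*",){7}"[^"]*")"/\1/'
}

col8=$(fix_column8 <<'EOF'
"ABC-DEF-d98263","12345678","176568981","","588","ABC-DEF-11947","","GAUZE PACKING STRIPS 1/4"","","","2019-02-04T19:09:00-05:00","","XXX","XXX","2019-02-12T23:57:48-06:00","XXX-XXX-176568981"
"ABC-DEF-d1494751","98765432","98765432","1073552394","284","ABC-DEF-77997","","ACE WRAP 3"","","","2015-10-29T18:45:00-07:00","Sent","XXX","XXX","2018-04-05T19:38:41-05:00","XXX-XXX-76954940"
EOF
)
printf '%s\n' "$col8"
```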

Recursively wrapping a regular expression with given text

For a given path, I wish to wrap a given regular expression in all files in that path or that path's sub-directories with some given text using standard Linux shell commands.
More specifically, I want to wrap all my syslog calls in an assert call, so that syslog(LOG_INFO,json_encode($obj)); becomes assert(syslog(LOG_INFO,json_encode($obj)));.
I thought the following might work, but it fails with the error sed: -e expression #1, char 47: Invalid preceding regular expression.
sed -i -E "s/(?<=syslog\()(.*)(?=\);)/assert(syslog(\1));/" /path/to/somewhere
BACKUP INFO IN RESPONSE TO Wiktor Stribiżew's ANSWER
I've never used sed before. Please confirm my understanding of your answer:
sed -i "s/syslog(\(.*\));/assert(syslog(\1));/g" /path/to/somewhere
-i edits files in place. One could leave it out at first to see on screen what would be changed.
s substitute text
The three slashes surrounding the pattern and replacement (i.e. /pattern/replacement/) are delimiters and can be any single character (other than backslash or newline), not just /.
syslog(\(.*\)); The pattern with one capture group, written with escaped parentheses.
assert(syslog(\1)); The replacement, using \1 (or \2, \3, etc.) to insert the captured substrings.
g Replace all matches on each line, not just the first.
Would sed -i "s/syslog(.*);/assert(&);/g" /path/to/somewhere work as well?
sed patterns do not support lookarounds like (?<=...) and (?=...).
You may use a capturing group/replacement backreference:
sed -i "s/syslog(\(.*\));/assert(syslog(\1));/g" /path/to/somewhere
The pattern is of BRE POSIX flavor (no -E option is passed), so to define a capturing group you need to use escaped parentheses, and unescaped ones will match literal parentheses.
Details
syslog( - syslog( substring
\(.*\) - Group 1: any 0+ chars as many as possible
); - a ); substring
The replacement is assert(syslog(\1));, that is, the match is replaced with assert(syslog(, the contents of Group 1, and then ));.
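If you need Perl-compatible regex constructs, you can use Perl (sic). Because the whole syslog(...) call has to end up inside assert(...), capture the call itself (with a lazy .*?) and keep only the trailing ; in a look-ahead:
perl -i -pe 's/(syslog\(.*?\))(?=;)/assert($1)/g' /path/to/somewhere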
Regardless of this specific solution I switched to single quotes on the assumption that you are on a Unix-ish platform. Backslashes inside double quotes are pesky (sometimes you need to double them, sometimes not).
Perl prefers $1 over \1 in the replacement pattern, though the latter will also technically work.
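Neither command recurses on its own: given a directory, GNU sed -i reports an error instead of descending into it. One way to cover the sub-directories mentioned in the question is to let find pass the files to sed. A minimal sketch, assuming GNU find and sed; the *.php filter and the wrap_syslog_calls name are assumptions for illustration. Run it once only, since a second run would wrap the calls again.

```shell
#!/usr/bin/env bash
# Sketch: apply the substitution to every *.php file under a directory tree.
# The '*.php' filter and the function name are illustrative assumptions.
wrap_syslog_calls() {
  # find walks the tree; '-exec ... {} +' hands many files to one sed run.
  find "$1" -type f -name '*.php' \
    -exec sed -i 's/syslog(\(.*\));/assert(syslog(\1));/g' {} +
}

# Demo on a throwaway directory tree.
demo=$(mktemp -d)
mkdir -p "$demo/sub"
printf '%s\n' 'syslog(LOG_INFO,json_encode($obj));' > "$demo/sub/a.php"
wrap_syslog_calls "$demo"
cat "$demo/sub/a.php"
```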

What exactly does the following sed pattern do?

I am working on a CGI script, and the developer who worked on it before me used a sed pattern.
COMMAND=`echo "$QUERY_STRING" | sed -n 's/^.*com_tex=\([^&]*\).*$/\1/p' | sed "s/%20/ /g"`
Here com_tex is the name of the text box in HTML.
What this line does is take a value from the HTML text box and assign it to a shell variable. The sed pattern is apparently (I'm not sure) needed to extract the value from the HTML without the other accompanying data.
I'll also mention why I am asking this. The same pattern is used for a text area where I enter a command, and I need to retrieve it exactly as entered. However, it's getting jumbled up. For example, if I enter the following command in the text box:
/usr/bin/free -m >> /home/admin/memlog.txt
The value that gets stored in the variable is:
%2Fusr%2Fbin%2Ffree+-m+%3E%3E+%2Fhome%2Fadmin%2Fmemlog.txt
Clearly / is being replaced by %2F, a space by + and the > sign by %3E.
But I just cannot figure out how this is specified in the above pattern! Could someone please tell me how that pattern works, or what pattern I should use instead so that I get the command I entered rather than the output I am getting?
sed -n
-n switch means "don't print each line automatically"
's/
s is for substitutions, / is a delimiter so the command looks like
s/thing to substitute/substitution/optional flags
^.*com_tex=
^ means the start of the line
.* means match 0 or more of any character
So it will match the longest string from the start of the line up to com_tex=
\(\)
This is a capture group, whatever is matched inside these brackets is saved and can be used later
[^&]*
[^] When ^ is the first character inside square brackets, it negates the set: match any character that is not listed inside the brackets
* The same as before means 0 or more matches
The capture group combined with this means capture any character except &.
.*$
The same as the first bit except $ means the end of the line, so this matches everything until the end
/\1/p'
After the second / comes the substitution. \1 is the capture group from before, so this replaces everything matched in the first part (the whole line) with the capture group.
p means print; it must be stated explicitly because the -n switch was used, and since it only prints after a successful substitution, lines without com_tex= produce no output.
|
PIPE
s/%20/ /g
Replace %20 with a space; g means global, so do it for every match on the line
HTH :)
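Running the original pipeline on a sample query string shows what each stage contributes. The QUERY_STRING value and the extract_com_tex name below are made up for illustration; printf replaces echo so that a value beginning with - cannot be mistaken for an option.

```shell
#!/usr/bin/env bash
# Sketch: the original two-stage pipeline, wrapped in a hypothetical function.
extract_com_tex() {
  # Stage 1: keep only the text after "com_tex=" up to the next '&'.
  # Stage 2: turn every %20 into a space; no other escape is decoded.
  printf '%s\n' "$1" | sed -n 's/^.*com_tex=\([^&]*\).*$/\1/p' | sed "s/%20/ /g"
}

# Made-up query string, encoded the way the question shows.
QUERY_STRING='submit=Run&com_tex=%2Fusr%2Fbin%2Ffree+-m+%3E%3E+%2Fhome%2Fadmin%2Fmemlog.txt&x=1'
COMMAND=$(extract_com_tex "$QUERY_STRING")
printf '%s\n' "$COMMAND"
```

The value comes out still encoded, because %2F, + and %3E are never touched by either stage.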
This escaping is not performed by any of the patterns. It is URL (percent) encoding, which the browser applies to form data when it submits the form, so QUERY_STRING already contains the encoded text.
I will try to explain the patterns a little at a time
sed -n
-n specifies that sed should not automatically print each input line (here, the query string) after applying the commands.
The command following is of the form 's/regexp/replacement/flags'
^.*com_tex=\([^&]*\).*$
^ matches the beginning of the line
.* matches zero to many of any character
com_tex= matches the characters literally
\([^&]*\) '\(' specifies the beginning of a group that can later be backreferenced via its index. '[^&]*' matches zero to many characters which are not '&'. '\)' specifies the end of the group.
.* See above
$ matches the end of the line
\1
The above replacement is a backreference to the first (and only) group in the regexp i.e. '[^&]*'. So the replacement replaces the entire line with all characters immediately following 'com_tex=' till the first '&'.
The p flag specifies that if a substitution took place, the current line post substitution should be printed.
sed "s/%20/ /g"
The above is much simpler: it replaces all (not just the first) occurrences of '%20' with a space ' '.
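Because the value arrives percent-encoded (%2F, %3E) with + for spaces, getting the command back unchanged requires decoding every %HH sequence and every +, not just %20. A minimal bash sketch, assuming well-formed application/x-www-form-urlencoded input; urldecode is a hypothetical helper, not a standard command. It turns each %HH into \xHH with sed and lets bash's printf %b expand those escapes.

```shell
#!/usr/bin/env bash
# Sketch (bash-specific): decode form-encoded text; urldecode is hypothetical.
urldecode() {
  local s="${1//+/ }"   # '+' encodes a space in form data
  # %2F -> \x2F, which printf %b then expands to '/'.
  printf '%b' "$(printf '%s' "$s" | sed 's/%/\\x/g')"
}

QUERY_STRING='submit=Run&com_tex=%2Fusr%2Fbin%2Ffree+-m+%3E%3E+%2Fhome%2Fadmin%2Fmemlog.txt'
raw=$(printf '%s\n' "$QUERY_STRING" | sed -n 's/^.*com_tex=\([^&]*\).*$/\1/p')
COMMAND=$(urldecode "$raw")
printf '%s\n' "$COMMAND"
```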

How do I write a SED regex to extract a string delimited by another string?

I am using GNU sed version 4.2.1 and I am trying to write a non-greedy sed regex to extract a string that is delimited by two other strings. This is easy when the delimiters are single characters:
s:{\([^}]*\)}:\1:g
In that example the string is delimited by '{' on the left and '}' on the right.
If the delimiting strings are multiple characters, say '{{{' and '}}}' I can adjust the above expression like this:
s:{{{\([^}}}]*\)}}}:\1:g
so the centre expression matches anything not containing the '}}}' closing string. But this only works if the match string does not contain '}' at all. Something like:
{{{cannot match {this broken} example}}}
will not work but
{{{can match this example}}}
does work. Of course
s:{{{\(.*\)}}}:\1:g
always works but is greedy so isn't suitable where multiple patterns occur on the same line.
I understand [^a] to mean anything except a and [^ab] to mean anything except a or b so, despite it appearing to work, I don't think [^}}}] is the correct way to exclude that sequence of 3 consecutive characters.
So how do I write a regex for sed that matches a string delimited by two other strings?
You are correct that [^}}}] doesn't work. A negated character class matches any single character that is not one of the characters inside it; repeating characters doesn't change the logic. So what you wrote is the same as [^}]. (That is why it appears to work when there are no braces inside the expression.)
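A quick check of that claim on the two examples from the question: the [^}}}] and [^}] versions behave identically. In a BRE an unescaped { or } is an ordinary character, so the braces need no escaping here.

```shell
#!/usr/bin/env bash
# Sketch: [^}}}] lists the single character '}' three times, so it selects
# the same set as [^}]; both commands print the same thing for each input.
for s in '{{{can match this example}}}' '{{{cannot match {this broken} example}}}'; do
  printf '%s\n' "$s" | sed 's:{{{\([^}}}]*\)}}}:\1:g'
  printf '%s\n' "$s" | sed 's:{{{\([^}]*\)}}}:\1:g'
done
```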
In Perl and compatible regular expressions, you can use ? to make a * or + non-greedy:
s:{{{(.*?)}}}:$1:g
This will always match the first }}} after the opening {{{.
However, sed has no non-greedy quantifier, and it also lacks look-ahead, the other usual way to express "anything up to the first }}}", so there is no direct sed equivalent of this match.
You can easily use Perl in a sed-like fashion with the -pe options, which cause it to take a single line of code from the command line (-e) and automatically loop over each line and print the result (-p).
perl -pe 's:{{{(.*?)}}}:$1:g'
The -i option for in-place editing of files is also useful, but make sure your regex is correct first!
For more information see perlrun.
With sed you could do something like:
sed -e :a -e 's/\(.*\){{{\(.*\)}}}/\1\2/ ; ta'
With:
{{{can match this example}}} {{{can match this 2nd example}}}
This gives:
can match this example can match this 2nd example
It is not lazy matching, but by replacing from right to left we can make use of sed's greediness.
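The same loop as a reusable function, checked against both examples from the question. strip_triple_braces is a hypothetical name. Since the inner .* is still greedy, a stray }}} after the last pair (as in {{{a}}} b}}}) ends up inside the extracted text.

```shell
#!/usr/bin/env bash
# Sketch: the right-to-left loop from the answer, wrapped in a function.
strip_triple_braces() {
  # :a defines a label; 'ta' jumps back to it after a successful s///,
  # so each pass strips the rightmost {{{...}}} pair until none is left.
  sed -e :a -e 's/\(.*\){{{\(.*\)}}}/\1\2/' -e ta
}

printf '%s\n' '{{{can match this example}}} {{{can match this 2nd example}}}' \
              '{{{cannot match {this broken} example}}}' | strip_triple_braces
```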