PCRE regex to sed regex

First of all, sorry for my bad English; I'm German.
The code given below is working fine in PHP:
$string = preg_replace('/href="(.*?)(\.|\,)"/i','href="$1"',$string);
Now I need the same for sed. I thought it should be:
sed 's/href="(.*?)(\.|\,)"/href="{$\1}"/g' test.htm
But that gives me this error:
sed: -e expression #1, char 36: invalid reference \1 on `s' command's RHS

sed does not support non-greedy regex matches such as .*?. Use negated bracket expressions instead:

sed -e 's|href=\"\(.[^"][^>]*\)\([.,]\)\">|href="\1">|g' file
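
The usual workaround for the missing lazy quantifier can be sketched as follows. This is only an illustration: the sample HTML and the variable names are made up, and [^"]* stands in for .*? because it cannot run past the closing quote.

```shell
#!/bin/sh
# Sample input (hypothetical): two links whose href values end in stray punctuation.
html='<a href="http://example.com/page.">x</a> <a href="http://example.com/b,">y</a>'

# [^"]* cannot cross the closing quote, so it plays the role of PCRE's .*? here.
# \( \) is the BRE capturing group; \1 refers to it in the replacement.
fixed=$(printf '%s\n' "$html" | sed 's/href="\([^"]*\)[.,]"/href="\1"/g')

printf '%s\n' "$fixed"
```

Only a "." or "," that sits directly before the closing quote is removed; href values without trailing punctuation are left alone.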

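In a basic regular expression you need a backslash in front of the parentheses you want to reference. The lazy .*? and the {$...} wrapper are not sed syntax, so replace them as well:
sed 's/href="\([^"]*\)[.,]"/href="\1"/g' test.htm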

You have to escape the grouping characters ( and ) as follows. In a basic regex the alternation is easier to write as a bracket expression:
sed 's/href="\([^"]*\)\([.,]\)"/href="\1"/g' test.htm

Here is a solution. It is not perfect: it only handles a single extra "," or ".".
sed -r -e 's/href="([^"]*)([.,]+)"/href="\1"/g' test.htm
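
If several trailing characters have to go, one possible extension (not part of the answer above) is to repeat the substitution with a sed label and the t command until nothing changes. The sample input is hypothetical, and -E is used in place of -r on the assumption that your sed accepts it.

```shell
#!/bin/sh
# Sample input (hypothetical) with several trailing "." and "," characters.
html='<a href="x.html.,.">x</a> <a href="y.html">y</a>'

# Each pass removes one trailing "." or "," from every href value;
# "ta" jumps back to label "a" as long as the last pass changed something.
cleaned=$(printf '%s\n' "$html" |
  sed -E -e ':a' -e 's/href="([^"]*)[.,]"/href="\1"/g' -e 'ta')

printf '%s\n' "$cleaned"
```

The label, the substitution, and the branch are passed as separate -e expressions, which avoids differences in how sed implementations parse a label followed by a semicolon.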

If you want to match a literal ".", you need to escape it or put it in a bracket expression. As an alternative to backslash-escaping the capturing parentheses (which you need to do with basic REs), you can use the -E option to tell sed to use extended REs. Lastly, the REs used by sed use \N to refer to subpatterns, where N is a digit.
sed -E "s/href=([\"'])([^\"']*)[.,]\1/href=\1\2\1/i"
This has its own limitation: it will not match href values that contain both types of quotes.
man sed and man re_format will give more information on REs as used in sed.
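
A small sketch of the quote-matching backreference, including the limitation just mentioned. Both input strings are invented for the example, and the i flag from the answer is left out to keep the sketch minimal.

```shell
#!/bin/sh
# Hypothetical inputs: a single-quoted href, and one whose value contains the other quote type.
plain="<a href='a.html,'>A</a>"
mixed="<a href=\"it's.\">B</a>"

# ([\"']) captures the opening quote; \1 inside the pattern requires the same
# quote character to close the value, and \2 is the value without the last "." or ",".
script="s/href=([\"'])([^\"']*)[.,]\1/href=\1\2\1/"

out_plain=$(printf '%s\n' "$plain" | sed -E "$script")
out_mixed=$(printf '%s\n' "$mixed" | sed -E "$script")

printf '%s\n%s\n' "$out_plain" "$out_mixed"
```

The first line loses its trailing comma; the second comes back unchanged, because [^"']* stops at the apostrophe inside the double-quoted value.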

Related

The sed command is not working with regex

I'm parsing the output of an HTTP GET request with sed to retrieve the contents of a given HTML tag. The result of that request is like this:
"<!DOCTYPE html><html><body><h1>Hello!</h1><p>v1.0.4-b</p></body></html>"
And I want to retrieve the version number inside the p element.
However, sed seems to have a bug in regex parsing.
When I use:
sed 's/.*<p>//'
It correctly replaces the text at the left of the version (i.e., it outputs "v1.0.4-b</p></body></html>"). But, when I try to use regex groups, with
sed 's/.*<p>(.*)<\/p>.*/\1/'
It fails to match and gives an error:
sed: -e expression #1, char 20: invalid reference \1 on `s' command's RHS.
Despite that, when I test the regex on online regex validators it works.
Thank you in advance

You need to use either of these:
sed -n 's~.*<p>\([^<]*\)</p>.*~\1~p'
sed -n -E 's~.*<p>([^<]*)</p>.*~\1~p'
See the online demo:
#!/bin/bash
sed -n 's~.*<p>\([^<]*\)</p>.*~\1~p' <<< \
"<!DOCTYPE html><html><body><h1>Hello!</h1><p>v1.0.4-b</p></body></html>"
## => v1.0.4-b
The sed 's/.*<p>(.*)<\p>.*/\1/' command would not work for these reasons:
You are using a POSIX BRE pattern, where unescaped ( and ) are treated as literal parenthesis characters, not as a capturing group. In POSIX BRE, you need \(...\) to define a capturing group (this is why you get the invalid reference \1 error).
If you add the -E option to enable POSIX ERE, you can use (...) to define a capturing group.
You are not matching /p; you have \p in the pattern.
As there are slashes in the pattern, it is more convenient to choose a regex delimiter other than /; I chose ~ here.
Also, I used the -n option to suppress the default line output and the p flag to print only the result of the substitution.
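
To see the points above side by side, here is a sketch that runs the BRE and the ERE form on the same input, plus a line without a p element. The variable names and the second input are made up for the illustration.

```shell
#!/bin/sh
page='<!DOCTYPE html><html><body><h1>Hello!</h1><p>v1.0.4-b</p></body></html>'

# BRE: the capturing group needs backslashes.
bre=$(printf '%s\n' "$page" | sed -n 's~.*<p>\([^<]*\)</p>.*~\1~p')

# ERE (-E): plain parentheses form the group.
ere=$(printf '%s\n' "$page" | sed -n -E 's~.*<p>([^<]*)</p>.*~\1~p')

# With -n and the p flag, a line that does not match prints nothing at all.
none=$(printf '%s\n' '<html><body>no paragraph</body></html>' |
  sed -n 's~.*<p>\([^<]*\)</p>.*~\1~p')

printf 'BRE: %s\nERE: %s\nno match: [%s]\n' "$bre" "$ere" "$none"
```

Both forms should give the same version string, and the empty result for the last input is what makes -n plus p convenient in scripts: an empty output means the tag was not found.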

Regex with sed to search in files

I want to search recursively in files for a given pattern and replace it. The search is for a string like "['DB']['1']['HOST'] = 'localhost'". When I test the regex, the following command doesn't print anything, and I can't see the error in it. Could anyone help?
sed -n '/\[\'HOST\'\]\s?=\s?(?:\'|")(.+)(?:\'|")/p' /path/to/file

POSIX regex does not support non-capturing groups. Besides, you have not specified the -E option, so the pattern is parsed as a POSIX BRE pattern, where capturing parentheses must be escaped. Also, a single quote cannot be escaped inside a single-quoted shell string; use \x27 (supported by GNU sed) instead.
Use
sed -En '/\[\x27HOST\x27\]\s?=\s?[\x27"][^\x27"]+[\x27"]/p'
See an online demo:
s="a string like ['DB']['1']['HOST'] = 'localhost'."
sed -En '/\[\x27HOST\x27\]\s?=\s?[\x27"][^\x27"]+[\x27"]/p' <<< "$s"
Besides, instead of \s, it might be a good idea to use [[:space:]].
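
If \x27 and \s are not available in your sed, a variant that sticks to [[:space:]] and keeps the single quote in a shell variable might look like the sketch below. The variable q and the second command, which extracts the value with a capturing group, are additions for illustration and not part of the answer above.

```shell
#!/bin/sh
s="a string like ['DB']['1']['HOST'] = 'localhost'."
q="'"   # a single quote, kept in a variable so the double-quoted sed script stays readable

# Print lines that assign a quoted value to ['HOST'].
line=$(printf '%s\n' "$s" |
  sed -En "/\[${q}HOST${q}\][[:space:]]?=[[:space:]]?[${q}\"][^${q}\"]+[${q}\"]/p")

# Extract only the quoted value with a capturing group.
host=$(printf '%s\n' "$s" |
  sed -En "s/.*\[${q}HOST${q}\][[:space:]]*=[[:space:]]*[${q}\"]([^${q}\"]+)[${q}\"].*/\1/p")

printf 'line: %s\nhost: %s\n' "$line" "$host"
```

Because the script is in double quotes, the shell expands ${q} before sed sees it, while \[ and \1 reach sed unchanged.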

Printing a matched regexp with sed

So I'm trying to match a regexp with any string in the middle of it and then print out just that string. The syntax is sort of like this...
sed -n 's/<title>.*</title>/"what do I put here"/p' input.file
and I just want to print out whatever .* is where I typed "what do I put here". I'm not very comfortable with sed at this point so this is likely a very simple answer and I'm having trouble finding one in any of the other questions. Thanks in advance!

Capture the pattern you want to extract within \(...\), and then you can refer to it as \1 in the replacement string (the / in </title> must be escaped because / is also the delimiter):
sed -n 's/<title>\(.*\)<\/title>/\1/p' input.file
You can have multiple \(...\) expressions, and refer to them with \1, \2, \3, and so on.
If you have the GNU version of sed, or gsed, then you could simplify a bit:
sed -rn 's/<title>(.*)<\/title>/\1/p' input.file
With the -r flag, sed can use "extended regular expressions", which in practice lets you write (...) instead of \(...\), + instead of \+, and other goodies.
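
As a sketch of the capture-and-print idea, the example below uses | as the delimiter so the slash in the closing tag needs no escaping. The input line is invented; it also shows that text around the tags survives unless the pattern consumes it with .* on both sides.

```shell
#!/bin/sh
line='<html><head><title>My Page</title></head></html>'

# Only the matched part is replaced, so the rest of the line stays in the output.
partial=$(printf '%s\n' "$line" | sed -n 's|<title>\(.*\)</title>|\1|p')

# With .* on both sides the whole line is replaced by the captured text.
title=$(printf '%s\n' "$line" | sed -n 's|.*<title>\(.*\)</title>.*|\1|p')

printf '%s\n%s\n' "$partial" "$title"
```

When the title element is alone on its line, both commands behave the same; the second form matters for minified pages where everything sits on one line.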

Backreferences in sed returning wrong value

I am trying to replace an expression using sed. The regex works in vim but not in sed. I'm replacing the last dash before the number with a slash so
/www/file-name-1
should return
/www/file-name/1
I am using the following command but it keeps outputting /www/file-name/0 instead
sed 's/-[0-9]/\/\0/g' input.txt
What am I doing wrong?

You must enclose in parentheses the data you want to reference later, and sed starts counting groups at 1. To refer to the entire match without any parentheses, use the & symbol.
sed 's/-\([0-9]\)/\/\1/g' input.txt
That yields:
/www/file-name/1

You need to capture using parentheses before you can backreference (backreferences start at \1). Try sed -r 's|(.*)-|\1/|':
$ sed -r 's|(.*)-|\1/|' <<< "/www/file-name-1"
/www/file-name/1
You can use any delimiter with sed, so / isn't the best choice when the substitution contains /. The -r option enables extended regexps, so the parentheses don't need to be escaped.

It seems sed under OS X starts counting backreferences at 1. Try \1 instead of \0
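
The difference between a numbered group and & can be checked with a short sketch like this one. The angle brackets in the second command are only there to make the whole match visible.

```shell
#!/bin/sh
path='/www/file-name-1'

# \1 is the first capturing group: only the digit is kept, the dash becomes a slash.
with_group=$(printf '%s\n' "$path" | sed 's|-\([0-9]\)|/\1|')

# & stands for the entire match, here the dash together with the digit.
with_amp=$(printf '%s\n' "$path" | sed 's|-[0-9]|<&>|')

printf '%s\n%s\n' "$with_group" "$with_amp"
```

Using \1 and & instead of \0 avoids depending on how a particular sed treats \0.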

using backreferences regex in sed

I would like to replace multiple spaces in a file with a single character.
Example
cat kill rat
dog kill cat
I used the following regex, which seemed to match at http://www.regexpal.com/ but wasn't working in sed.
([^ ])*([ ])*
I used the sed command like so:
sed s/\(\[\^\ \]\)*\(\[\ \]\)*/\$1\|/g < inputfile
I expect,
cat|kill|rat
dog|kill|cat
But I couldn't get it to work. Any help would be much appreciated. Thanks.
Edit:
Kindly note that cat/dog could be any characters other than whitespace.

sed writes backreferences with backslashes, so use \1 instead of $1.
Surround your expressions with quotes:
sed 's/match/replace/g' < inputfile
Manpages are the best invention in Linux world: man sed
Watch out for *: it can also match the empty string.
If you want to replace multiple spaces with a '|', use this RE:
sed -r 's/ +/\|/g'
From man sed:
-r, --regexp-extended
use extended regular expressions in the script.
You don't need any backreferences if you just want to replace all spaces.
Replace the space with \s if you want to match tabs too.
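
The warning about * is easy to reproduce. In the sketch below (sample input made up), the first command lets * match the empty string between letters, while the second requires at least one space by writing the space twice, which is the BRE way to say "one or more".

```shell
#!/bin/sh
input='cat   kill  rat'

# " *" also matches the empty string, so separators appear where no space was.
star=$(printf '%s\n' "$input" | sed 's/ */|/g')

# "  *" (space, then space-star) needs at least one space to match.
plus=$(printf '%s\n' "$input" | sed 's/  */|/g')

printf '%s\n%s\n' "$star" "$plus"
```

The second form does the same job as / +/ with -r, but needs no extended-regex option.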

I know the OP wanted sed and that the question is old, but what about tr -s ' ' '|' < inputfile?

What about (with sed -r):
s/\s+/|/g

You can use:
sed -e 's/[[:blank:]][[:blank:]]*/|/g' < inputfile
where [:blank:] matches space and tab characters.
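
For comparison, here is a sketch that collapses runs of spaces and tabs with both sed and tr. The input is made up and deliberately contains a tab; -E is assumed to be accepted by your sed.

```shell
#!/bin/sh
# Two sample lines with runs of spaces and one tab (hypothetical data).
input=$(printf 'cat   kill \t rat\ndog  kill    cat')

# [[:blank:]]+ matches one or more spaces or tabs in extended-regex mode.
with_sed=$(printf '%s\n' "$input" | sed -E 's/[[:blank:]]+/|/g')

# tr maps space and tab to "|" and then squeezes repeated "|" into one.
with_tr=$(printf '%s\n' "$input" | tr -s ' \t' '||')

printf '%s\n---\n%s\n' "$with_sed" "$with_tr"
```

One difference to keep in mind: tr -s would also squeeze any doubled "|" that was already present in the input, whereas sed only touches the whitespace runs.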
