PCRE Regex to SED


I am trying to take PCRE regex and use it in SED, but I'm running into some issues. Please note that this question is representative of a bigger issue (how to convert PCRE regex to work with SED) so the question is not simply about the example below, but about how to use PCRE regex in SED regex as a whole.
This example is extracting an email address from a line, and replacing it with "[emailaddr]".
echo "My email is abc#example.com" | sed -e 's/[a-zA-Z0-9]+[#][a-zA-Z0-9]+[\.][A-Za-z]{2,4}/[emailaddr]/g'
I've tried the following replace regex:
([a-zA-Z0-9]+[#][a-zA-Z0-9]+[\.][A-Za-z]{2,4})
[a-zA-Z0-9]+[#][a-zA-Z0-9]+[\.][A-Za-z]{2,4}
([a-zA-Z0-9]+[#][a-zA-Z0-9]+[.][A-Za-z]{2,4})
[a-zA-Z0-9]+[#][a-zA-Z0-9]+[.][A-Za-z]{2,4}
I've tried changing the delimiter of sed from s/find/replace/g to s|find|replace|g as outlined here (stack overflow: pcre regex to sed regex).
I am still not able to figure out how to use PCRE regex in SED, or how to convert PCRE regex to SED. Any help would be great.

Want PCRE (Perl Compatible Regular Expressions)? Why don't you use perl instead?
perl -pe 's/[a-zA-Z0-9]+[#][a-zA-Z0-9]+[\.][A-Za-z]{2,4}/[emailaddr]/g' \
<<< "My email is abc#example.com"
Output:
My email is [emailaddr]
Write output to a file with tee:
perl -pe 's/[a-zA-Z0-9]+[#][a-zA-Z0-9]+[\.][A-Za-z]{2,4}/[emailaddr]/g' \
<<< "My email is abc#example.com" | tee /path/to/file.txt > /dev/null

Use the -r flag to enable extended regular expressions (-E instead of -r on OS X; recent versions of GNU sed accept -E as well).
echo "My email is abc#example.com" | sed -r 's/[a-zA-Z0-9]+#[a-zA-Z0-9]+\.[A-Za-z]{2,4}/[emailaddr]/g'
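To see why the flag matters, here is a minimal sketch that runs the same pattern with and without extended regular expressions. It assumes a sed that accepts -E (GNU and BSD sed both do); the variable names are only for illustration.

```shell
line="My email is abc#example.com"

# Without -E the pattern is parsed as a BRE: + and {2,4} are literal
# characters, so nothing matches and the line comes back unchanged.
bre=$(printf '%s\n' "$line" | sed 's/[a-zA-Z0-9]+#[a-zA-Z0-9]+\.[A-Za-z]{2,4}/[emailaddr]/g')

# With -E the same pattern is parsed as an ERE and the address is replaced.
ere=$(printf '%s\n' "$line" | sed -E 's/[a-zA-Z0-9]+#[a-zA-Z0-9]+\.[A-Za-z]{2,4}/[emailaddr]/g')

printf 'BRE: %s\nERE: %s\n' "$bre" "$ere"
```

The first result is the untouched input, which matches the symptom described in the question.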

GNU sed uses basic regular expressions or, with the -r flag, extended regular expressions.
Your regex as a POSIX basic regex (thanks mklement0):
[[:alnum:]]\{1,\}#[[:alnum:]]\{1,\}\.[[:alpha:]]\{2,4\}
Note that this expression will not match all email addresses (not by a long shot).
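On the broader conversion question, a few common PCRE constructs have straightforward POSIX counterparts. The sketch below is not a complete mapping, just three typical rewrites, and the sample line is made up for illustration: \d becomes a POSIX class, a non-capturing group becomes a plain group, and a lazy quantifier becomes a negated character class.

```shell
line='id=42 url=http://host/path/x'

# PCRE \d+            -> [[:digit:]]+      (POSIX ERE has no \d)
# PCRE (?:http|ftp)   -> (http|ftp)        (no non-capturing groups)
# PCRE host.+?/       -> host[^/]*/        (no lazy quantifiers)
out=$(printf '%s\n' "$line" | sed -E \
  -e 's/[[:digit:]]+/N/' \
  -e 's#(http|ftp)://#<scheme>#' \
  -e 's#host[^/]*/#H/#')

printf '%s\n' "$out"
```

Look-arounds have no equivalent in sed; those usually have to be rewritten with capture groups and backreferences, or handed to perl.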

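For multiline matching, use perl's -0 switch, which makes perl read the whole input as a single record (as long as it contains no NUL bytes):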
perl -0pe 's/search/replace/gms' file

Sometimes this might be helpful too as a work-around:
str=$(grep -Poh "pcre-pattern" file)
sed -i "s/$str/$something_else/" file
-o, --only-matching:
Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.

Related

Regex with sed to search in files

I want to search files recursively for a given pattern and replace it. The search is for a string like "['DB']['1']['HOST'] = 'localhost'". When I test the regex, the following command prints nothing, and I can't see the error in it. Could anyone help?
sed -n '/\[\'HOST\'\]\s?=\s?(?:\'|")(.+)(?:\'|")/p' /path/to/file

POSIX regex does not support non-capturing groups. Besides, you have not specified the -E option, so the pattern is parsed as a POSIX BRE pattern, where capturing parentheses must be escaped. Also, a single quote cannot be backslash-escaped inside a single-quoted shell string; with GNU sed, use \x27 instead.
Use
sed -En '/\[\x27HOST\x27\]\s?=\s?[\x27"][^\x27"]+[\x27"]/p'
See an online demo:
s="a string like ['DB']['1']['HOST'] = 'localhost'."
sed -En '/\[\x27HOST\x27\]\s?=\s?[\x27"][^\x27"]+[\x27"]/p' <<< "$s"
Besides, instead of \s, it might be a good idea to use [[:space:]].
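As a sketch of that suggestion, the same match can be written with [[:space:]] and with the sed script in double quotes, so the single quotes can appear literally and \x27 (a GNU extension) is not needed. This assumes a sed that accepts -E; the sample string is the one from the demo above.

```shell
s="a string like ['DB']['1']['HOST'] = 'localhost'."

# Inside double quotes only \" needs escaping for the shell;
# \[ and \] are passed through to sed unchanged.
out=$(printf '%s\n' "$s" | sed -En "/\['HOST'\][[:space:]]?=[[:space:]]?['\"][^'\"]+['\"]/p")

printf '%s\n' "$out"
```

The line is printed only when the pattern matches, so an empty result would mean no match.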

Bash: sed regex pattern won't match strings

I have tested this particular regex in RegExr.com:
/(\*)*((\s)?(\w)*)/g
to match the following:
* Global Links contained...etc
* Change User, contact list...etc
(everything from ... on is just extra words in the sentence, not a literal ...etc)
I tried to use this regex in a sed command as part of a bash script like so:
sed "/(\*)*((\s)?(\w)*)/d" test.txt > stripped.txt
But these two lines still remain in stripped.txt. Is there something I'm not accounting for in the regex or in the file? Before these two lines is the start of a block comment (/**) and after them is the block comment end (*/); both of these are on their own lines. Am I missing something obscure with newlines, or is the sed command/regex wrong?

You aren't accounting for the dialect of regex in use by sed by default. That's not a valid BRE (basic regular expression).
You need to tell sed to use EREs (extended regular expressions).
For GNU sed that is the -r flag and for BSD sed that is the -E flag (though -r is often available as a compat flag).
sed -r "/(\*)*((\s)?(\w)*)/d" test.txt > stripped.txt
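One thing worth checking once the pattern is parsed as an ERE: every part of it is optional, so it also matches the empty string, and /regex/d then deletes every line. The sketch below shows that effect and one possible anchored alternative. The sample input is a hypothetical stand-in for test.txt, and the anchored pattern is only an assumption about what the comment lines look like (optional indentation, a star, whitespace, then a word character).

```shell
# Hypothetical stand-in for test.txt
input='/**
 * Global Links contained
 * Change User, contact list
 */
int x;'

# Same structure as the original pattern, with POSIX classes for \s and \w.
# It can match the empty string, so every line is deleted.
all_gone=$(printf '%s\n' "$input" | sed -E '/(\*)*(([[:space:]])?([[:alnum:]_])*)/d')

# Anchored variant: only lines that begin with "* word" are removed.
out=$(printf '%s\n' "$input" | sed -E '/^[[:space:]]*\*[[:space:]]+[[:alnum:]_]/d')

printf '%s\n' "$out"
```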

BASH: replacing PERL with SED for in-place substitution

I would like to replace this perl statement:
perl -pe "s|(?<=://).+?(?=/)|$2:80|"
with
sed -e "s|<regex>|$2:80|"
Since sed has a much less powerful regex engine (for example, it does not support look-arounds), the task boils down to writing a sed-compatible regex that matches only the domain name in a fully qualified URL. Examples:
http://php2-mindaugasb.c9.io/Testing/JS/displayName.js
http://php2-mindaugasb.c9.io?a=Testing.js
http://www.google.com?a=Testing.js
Should become:
http://$2:80/Testing/JS/displayName.js
http://$2:80?a=Testing.js
http://$2:80?a=Testing.js
A solution like this would be ok:
sed -e "s|<regex>|http://$2:80|"
Thanks :)

Use the below sed command.
$ sed "s~//[^/?]\+\([?/]\)~//\$2:80\1~g" file
http://$2:80/Testing/JS/displayName.js
http://$2:80?a=Testing.js
http://$2:80?a=Testing.js
You need to escape the $ in the replacement part, because the command is in double quotes and the shell would otherwise expand $2.

sed 's|http://[^/?]*|http://$2:80|' file
Output:
http://$2:80/Testing/JS/displayName.js
http://$2:80?a=Testing.js
http://$2:80?a=Testing.js
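Both answers keep $2 literal in the output. If the intent is for the shell to substitute a real value, as in the original perl command, a sketch could look like the following. The host variable is a hypothetical stand-in for the script's $2, and the value is assumed not to contain |, & or backslashes, which would need escaping in a sed replacement.

```shell
host="example.test"   # hypothetical stand-in for the script's $2
url="http://www.google.com?a=Testing.js"

# Double quotes let the shell expand ${host} before sed sees the script.
out=$(printf '%s\n' "$url" | sed "s|://[^/?]*|://${host}:80|")

printf '%s\n' "$out"
```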

Why doesn't this simple RegEx work with sed?

This is a really simple RegEx that isn't working, and I can't figure out why. According to this, it should work.
I'm on a Mac (OS X 10.8.2).
script.sh
#!/bin/bash
ZIP="software-1.3-licensetypeone.zip"
VERSION=$(sed 's/software-//g;s/-(licensetypeone|licensetypetwo).zip//g' <<< $ZIP)
echo $VERSION
terminal
$ sh script.sh
1.3-licensetypeone.zip

Looking at the regex documentation for OS X 10.7.4 (but should apply to OP's 10.8.2), it is mentioned in the last paragraph that
Obsolete (basic) regular expressions differ in several respects. | is an ordinary character and there is no equivalent for its functionality...
... The parentheses for nested subexpressions are `\(' and `\)'...
sed, without any options, uses basic regular expression (BRE).
To use | in OS X or BSD's sed, you need to enable extended regular expression (ERE) via -E option, i.e.
sed -E 's/software-//g;s/-(licensetypeone|licensetypetwo).zip//g'
P.S.: \| in BRE is a GNU extension.
Alternative ways to extract version number
chop-chop (parameter expansion)
VERSION=${ZIP#software-}
VERSION=${VERSION%-license*.zip}
sed
VERSION=$(sed 's/software-\(.*\)-license.*/\1/' <<< "$ZIP")
You don't necessarily have to match strings word-by-word with shell patterns or regex.

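By default, sed works with basic regular expressions. With GNU sed you have to backslash the parentheses and the vertical bar to make it work (\| is a GNU extension, so this form may not work with the BSD sed on OS X).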
sed 's/software-//g;s/-\(licensetypeone\|licensetypetwo\)\.zip//g'
Note that I backslashed the dot, too. Otherwise, it would have matched any character.
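Putting the two points together, here is a sketch of a variant that should behave the same on GNU and BSD sed, assuming -E is available: alternation is written in ERE syntax and the dot is escaped. The anchors ^ and $ are an addition for illustration, not part of the original command.

```shell
ZIP="software-1.3-licensetypeone.zip"

# ERE alternation with an escaped dot; anchored to the start and end of the name.
VERSION=$(printf '%s\n' "$ZIP" | sed -E 's/^software-//; s/-(licensetypeone|licensetypetwo)\.zip$//')

echo "$VERSION"   # prints 1.3
```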

You can do this in the shell, don't need sed, parameter expansion suffices:
shopt -s extglob
ZIP="software-1.3-licensetypeone.zip"
tmp=${ZIP#software-}
VERSION=${tmp%-licensetype@(one|two).zip}
With a recent version of bash (may not ship with OSX) you can use regular expressions
if [[ $ZIP =~ software-([0-9.]+)-licensetype(one|two).zip ]]; then
VERSION=${BASH_REMATCH[1]}
fi
or, if you just want the 2nd word in a hyphen-separated string
VERSION=$(IFS=-; set -- $ZIP; echo $2)

$ man sed | grep "regexp-extended" -A2
-r, --regexp-extended
use extended regular expressions in the script.

Add a prefix to all media links in a html file

I'm trying to insert an absolute path before all images in an HTML file, like this:
<img src="/media/some_path/some_image.png"> to <img src="{ABS_PATH}/some_path/some_image.png">
I tried the following regex to identify the lines:
egrep '(src|href)="/media([^"]*)"'
I want to use sed to make these changes, but the above regexp doesn't work, any hints?
sed 's#(src|href)="/media([^"]*)"##g'
sed: -e expression #1, char 32: unknown option to `s'
EDIT:
OK, now I have:
echo 'src="/media/some_image.png"' | egrep -o '(src|href)="/media([^"]*)"' | sed 's/(src|href)=\"\/media([^"]*)\"//g'
sed should match the string, but it doesn't.

By default, sed doesn't understand ERE (extended regular expressions), only BRE (basic regular expressions). GNU sed has a -r option which turns on ERE.
You should change delimiters for regular expressions, because you have slash in the regex, like this:
sed -r 's#(src|href)="/media([^"]*)"##g'
You can use almost any punctuation for delimiters.

You must escape / in sed if using it as a delimiter for the pattern.
So:
sed 's/(src|href)="/media([^"]*)"//g'
becomes:
sed 's/(src|href)="\/media([^"]*)"//g'
Perhaps what is confusing is that egrep (which uses extended regular expressions) has different rules from sed and vanilla grep (which use basic regular expressions) when it comes to what must be escaped.
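Neither answer shows the actual prefix replacement the question asks for, so here is a sketch of one way it could be done with capture groups and backreferences. It assumes a sed that accepts -E; abs_path is a hypothetical value, and it is assumed not to contain #, & or backslashes, which would need escaping in the sed script.

```shell
abs_path="/srv/www"   # hypothetical value for {ABS_PATH}
html='<img src="/media/some_path/some_image.png">'

# \1 keeps the attribute name, \2 keeps the rest of the path after /media.
out=$(printf '%s\n' "$html" | sed -E "s#(src|href)=\"/media([^\"]*)\"#\1=\"${abs_path}\2\"#g")

printf '%s\n' "$out"
```

Using # as the delimiter avoids having to escape the slashes in the path.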
