Sed regex not matching 'either or' inner group

I would like to match multiple file extensions passed through a pipe using sed and regex.
The following works:
sed '/.\(rb\)\$/!d'
But if I want to allow multiple file extensions, the following does not work.
sed '/.\(rb\|js\)\$/!d'
sed '/.\(rb|js\)\$/!d'
sed '/.(rb|js)\$/!d'
Any ideas on how to do either/or inner groups?
Here is the whole block of code:
#!/bin/sh
files=`git diff-index --check --cached $against | # Find all changed files
sed '/.\(rb\|js\)\$/!d' | # Only process .rb and .js files
uniq` # Remove duplicate files

I am using a Mac OSX 10.8.3 and the previous answer does not work for me, but this does:
sed -E '/\.(rb|js)$/!d'
Note: use -E to "interpret regular expressions as extended (modern) regular expressions rather than basic regular expressions (BRE's)". This enables the alternation operator |. Other versions of sed (GNU sed, for example) seem to want the -r flag to enable extended regular expressions.
Note that the initial . must be escaped and the trailing $ must not be.
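As a minimal sketch of the filter above, assuming a sed that accepts -E (BSD sed does, and recent GNU sed versions do as well). The file names are made up for illustration and stand in for the git output:

```shell
# Hypothetical file list; only names ending in .rb or .js should survive.
filtered=$(printf '%s\n' app.rb main.js notes.txt rbfile | sed -E '/\.(rb|js)$/!d')
printf '%s\n' "$filtered"
```

Here "rbfile" is dropped because the escaped dot requires a literal "." before the extension.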

Try something like this:
sed '/\.\(rb\|js\)$/!d'
or, if your sed supports it, use the -r option to enable extended regular expressions and avoid escaping the special characters.

Related

Bash: sed regex pattern won't match strings

I have tested this particular regex in RegExr.com:
/(\*)*((\s)?(\w)*)/g
to match the following:
* Global Links contained...etc
* Change User, contact list...etc
(everything from ... on is just extra words in the sentence, not a literal ...etc)
I tried to use this regex in a sed command as part of a bash script like so:
sed "/(\*)*((\s)?(\w)*)/d" test.txt > stripped.txt
But these two lines still remain in stripped.txt. Is there something I'm not accounting for in the regex or in the file? Before these two lines is the start of a block comment (/**), and the end of the block comment (*/) comes after them; both are on their own lines. Am I missing something obscure about newlines, or is the sed command or regex wrong?

You aren't accounting for the dialect of regex in use by sed by default. That's not a valid BRE (basic regular expression).
You need to tell sed to use ERE's (extended regular expressions).
For GNU sed that is the -r flag and for BSD sed that is the -E flag (though -r is often available as a compat flag).
sed -r "/(\*)*((\s)?(\w)*)/d" test.txt > stripped.txt
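One caveat: every part of that pattern is optional, so once it is read as an ERE it can match the empty string and therefore matches every line. A more anchored pattern may be closer to the intent. The following is only a sketch, assuming the goal is to delete comment lines of the form "* text"; the sample input is made up, and POSIX character classes are used instead of \s so that it does not depend on GNU extensions:

```shell
# Hypothetical sample standing in for test.txt.
stripped=$(printf '%s\n' '/**' ' * Global Links contained' ' * Change User, contact list' ' */' 'int x;' |
  sed -E '/^[[:space:]]*\*[[:space:]]/d')
printf '%s\n' "$stripped"
```

The opening /** and closing */ lines are kept because neither has a "*" followed by whitespace at the start of the line.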

Why doesn't this simple RegEx work with sed?

This is a really simple RegEx that isn't working, and I can't figure out why. According to this, it should work.
I'm on a Mac (OS X 10.8.2).
script.sh
#!/bin/bash
ZIP="software-1.3-licensetypeone.zip"
VERSION=$(sed 's/software-//g;s/-(licensetypeone|licensetypetwo).zip//g' <<< $ZIP)
echo $VERSION
terminal
$ sh script.sh
1.3-licensetypeone.zip

Looking at the regex documentation for OS X 10.7.4 (but should apply to OP's 10.8.2), it is mentioned in the last paragraph that
Obsolete (basic) regular expressions differ in several respects. | is an ordinary character and there is no equivalent for its functionality...
... The parentheses for nested subexpressions are \( and \)...
sed, without any options, uses basic regular expression (BRE).
To use | in OS X or BSD's sed, you need to enable extended regular expression (ERE) via -E option, i.e.
sed -E 's/software-//g;s/-(licensetypeone|licensetypetwo).zip//g'
p/s: \| in BRE is a GNU extension.
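A quick way to check the -E form, as a sketch that reuses the ZIP value from the question. The anchors (^ and $) and the escaped dot are additions for illustration, not part of the original command:

```shell
ZIP="software-1.3-licensetypeone.zip"
# -E enables ERE, so ( ) group and | alternates without backslashes.
VERSION=$(printf '%s\n' "$ZIP" | sed -E 's/^software-//; s/-(licensetypeone|licensetypetwo)\.zip$//')
echo "$VERSION"
```

With this input the script prints 1.3.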
Alternative ways to extract version number
chop-chop (parameter expansion)
VERSION=${ZIP#software-}
VERSION=${VERSION%-license*.zip}
sed
VERSION=$(sed 's/software-\(.*\)-license.*/\1/' <<< "$ZIP")
You don't necessarily have to match strings word-by-word with shell patterns or regex.

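sed works with basic regular expressions by default. With GNU sed, you have to backslash the parentheses and the vertical bar to make it work (\| is a GNU extension, so this will not help with the BSD sed on OS X).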
sed 's/software-//g;s/-\(licensetypeone\|licensetypetwo\)\.zip//g'
Note that I backslashed the dot, too. Otherwise, it would have matched any character.

You can do this in the shell, don't need sed, parameter expansion suffices:
shopt -s extglob
ZIP="software-1.3-licensetypeone.zip"
tmp=${ZIP#software-}
VERSION=${tmp%-licensetype@(one|two).zip}
With a recent version of bash (may not ship with OSX) you can use regular expressions
if [[ $ZIP =~ software-([0-9.]+)-licensetype(one|two)\.zip ]]; then
VERSION=${BASH_REMATCH[1]}
fi
or, if you just want the 2nd word in a hyphen-separated string
VERSION=$(IFS=-; set -- $ZIP; echo $2)

$ man sed | grep "regexp-extended" -A2
-r, --regexp-extended
use extended regular expressions in the script.

Add a prefix to all media links in a html file

I'm trying to insert an absolute path before all images in an HTML file, like this:
<img src="/media/some_path/some_image.png"> to <img src="{ABS_PATH}/some_path/some_image.png">
I tried the following regex to identify the lines:
egrep '(src|href)="/media([^"]*)"'
I want to use sed to make these changes, but the above regexp doesn't work there. Any hints?
sed 's#(src|href)="/media([^"]*)"##g'
sed: -e expression #1, char 32: unknown option to `s'
EDIT:
OK, now I have:
echo 'src="/media/some_image.png"' | egrep -o '(src|href)="/media([^"]*)"' | sed 's/(src|href)=\"\/media([^"]*)\"//g'
sed should match the string, but it doesn't.

By default, sed understands only BRE (basic regular expressions), not ERE (extended regular expressions). GNU sed has a -r option which turns on ERE.
You should also change the delimiter of the s command, because you have a slash in the regex, like this:
sed -r 's#(src|href)="/media([^"]*)"##g'
You can use almost any punctuation for delimiters.
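The command above deletes the match; to insert a prefix instead, the captured groups can be reused in the replacement. A minimal sketch, assuming a sed that accepts -E and keeping {ABS_PATH} as the literal placeholder from the question:

```shell
# Sample tag taken from the question; {ABS_PATH} is a placeholder, not a real path.
html='<img src="/media/some_path/some_image.png">'
# \1 is the attribute name (src or href), \2 is everything after /media.
result=$(printf '%s\n' "$html" | sed -E 's#(src|href)="/media([^"]*)"#\1="{ABS_PATH}\2"#g')
printf '%s\n' "$result"
```

Using # as the delimiter means the slashes in the pattern need no escaping.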

You must escape / in sed if using it as a delimiter for the pattern.
So:
sed 's/(src|href)="/media([^"]*)"//g'
becomes:
sed 's/(src|href)="\/media([^"]*)"//g'
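Perhaps what is confusing is that egrep (which uses extended regular expressions) has different rules from sed and vanilla grep (which use basic regular expressions) when it comes to what must be escaped. For the parentheses and | to act as grouping and alternation in sed, you also need to enable extended regular expressions (-r or -E).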

Regex in sed to replace parts of an url given a specific format

I'm having some issues with a simple regex in sed.
I have to do some replacements in an SQL file, and I'm trying to use sed for it.
I need to replace the URL of some links. The links are in the following format:
www.site1.com/blog/2012/12/12
I would like to replace site1 with site2 in all links.
To find these links I've written the following regex:
(site1.com)\/blog\/\d{4}\/\d{2}\/\d{2}
And it seems to work properly.
To do the replacement with sed, I've written the following code:
cat back.sql | sed 's:(site1.com)\/blog\/\d{4}\/\d{2}\/\d{2}:site2.com:' > fixed.sql
But it does not seem to work.

sed does not support \d (not to my knowledge, at least), and supports {4} only with extended regular expressions:
sed -r 's:site1\.com(/blog/[0-9]{4}/[0-9]{2}/[0-9]{2}):site2.com\1:'
As a basic regular expression (requires lots of escaping):
sed 's:site1\.com\(/blog/[0-9]\{4\}/[0-9]\{2\}/[0-9]\{2\}\):site2.com\1:'
PS: the captured group already starts with a slash, so the replacement is site2.com\1. You don't need to escape slashes if you use a different delimiter (here, :).
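To check the ERE form on sample data, here is a small sketch. It assumes a sed that accepts -E, and the second input line is a made-up non-matching URL:

```shell
# First line matches the /blog/YYYY/MM/DD shape; the second (hypothetical) one does not.
fixed=$(printf '%s\n' 'www.site1.com/blog/2012/12/12' 'www.site1.com/about' |
  sed -E 's:site1\.com(/blog/[0-9]{4}/[0-9]{2}/[0-9]{2}):site2.com\1:')
printf '%s\n' "$fixed"
```

Only the dated blog link is rewritten; the other line passes through unchanged.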

Looks to be a straight substitution to me:
$ sed -i 's/\.site1\./.site2./g' afile.txt
... where afile.txt contains your list of sites.
If you want to output to another file, remove the -i and redirect the output using > .

Java regex and sed aren't the same...?

Get these strings:
00543515703528
00582124628575
0034911320020
0034911320020
005217721320739
0902345623
067913187056
00543515703528
Apply this expression in Java: ^(06700|067|00)([0-9]*).
My intention is to remove a leading "06700", "067" or "00" from the beginning of the string.
This works fine in Java, where group 2 always holds the number I want, but sed does not behave the same way:
$ cat strings|sed -e 's/^\(06700|067|00\)\([0-9]*\)/\2/g'
00543515703528
00582124628575
0034911320020
0034911320020
005217721320739
0902345623
067913187056
00543515703528
What the heck am I missing?
Cheers,
f.

When using extended regular expressions, you also need to omit the \ before ( and ). This works for me:
sed -r 's/^(06700|067|00)([0-9]*)/\2/g' strings
Note also that there's no need for a separate call to cat.

I believe your problem is this:
sed defaults to BRE: The default behaviour of sed is to support Basic Regular Expressions (BRE). To use all the features described on this page set the -r (Linux) or -E (BSD) flag to use Extended Regular Expressions.
Source
Without this flag, the | character is interpreted literally. Try this example:
echo "06700|067|0055555" | sed -e 's/^\(06700|067|00\)\([0-9]*\)/\2/g'
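The two behaviours can be compared side by side. This is a sketch, assuming a sed that accepts -E; the inputs are the literal test string above and one number from the question:

```shell
# BRE: | is an ordinary character, so only the literal text "06700|067|00" is stripped.
bre=$(echo '06700|067|0055555' | sed 's/^\(06700|067|00\)\([0-9]*\)/\2/')
# ERE: | is alternation, so the leading 00 is stripped.
ere=$(echo '00543515703528' | sed -E 's/^(06700|067|00)([0-9]*)/\2/')
printf '%s\n' "$bre" "$ere"
```

The first command leaves 55555 and the second leaves 543515703528.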
