Bash: sed regex pattern won't match strings

I have tested this particular regex in RegExr.com:
/(\*)*((\s)?(\w)*)/g
to match the following:
* Global Links contained...etc
* Change User, contact list...etc
(everything from ... on is just extra words in the sentence, not a literal ...etc)
I tried to use this regex in a sed command as part of a bash script like so:
sed "/(\*)*((\s)?(\w)*)/d" test.txt > stripped.txt
But these two lines still remain in stripped.txt. Is there something I'm not accounting for in the regex or in the file? Before these two lines is the start of a block comment (/**), and the block comment ends after them (*/); both delimiters are on their own lines. Am I missing something obscure with newlines, or is the sed command/regex wrong?

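You aren't accounting for the regex dialect sed uses by default. Your pattern is written as an ERE, but sed defaults to BRE (basic regular expressions), where unescaped (, ) and ? are ordinary characters, so the pattern doesn't mean what it means in RegExr.
You need to tell sed to use EREs (extended regular expressions).
For GNU sed that is the -r flag (current GNU sed also accepts -E), and for BSD sed it is the -E flag (though -r is often available as a compatibility flag). Also note that \s and \w are GNU extensions and may not work in other seds; [[:space:]] and [[:alnum:]_] are the POSIX equivalents.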
sed -r "/(\*)*((\s)?(\w)*)/d" test.txt > stripped.txt
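Even with -r there is a second problem worth checking: every piece of that pattern is optional ((\*)*, (\s)? and (\w)* can all match zero characters), so it matches the empty string and /.../d deletes every line. A minimal sketch, assuming the goal is to drop only the " * ..." body lines of the block comment, anchors the pattern at the start of the line and uses POSIX classes instead of \s. The sample input is made up for illustration.

```shell
# The original pattern can match the empty string, so every line is deleted
# (this prints nothing).
printf '/**\n * Global Links contained\n */\n' | sed -E '/(\*)*((\s)?(\w)*)/d'

# Anchored version: delete lines whose first non-blank character is '*'
# followed by a blank, i.e. the body lines of a /** ... */ comment.
printf '/**\n * Global Links contained\n * Change User, contact list\n */\ncode();\n' \
  | sed -E '/^[[:space:]]*\*[[:space:]]/d'
```

The second command keeps /**, */ and code(); and removes only the two " * " lines.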

Related

PCRE Regex to SED

I am trying to take PCRE regex and use it in SED, but I'm running into some issues. Please note that this question is representative of a bigger issue (how to convert PCRE regex to work with SED) so the question is not simply about the example below, but about how to use PCRE regex in SED regex as a whole.
This example is extracting an email address from a line, and replacing it with "[emailaddr]".
echo "My email is abc#example.com" | sed -e 's/[a-zA-Z0-9]+[#][a-zA-Z0-9]+[\.][A-Za-z]{2,4}/[emailaddr]/g'
I've tried the following patterns:
([a-zA-Z0-9]+[#][a-zA-Z0-9]+[\.][A-Za-z]{2,4})
[a-zA-Z0-9]+[#][a-zA-Z0-9]+[\.][A-Za-z]{2,4}
([a-zA-Z0-9]+[#][a-zA-Z0-9]+[.][A-Za-z]{2,4})
[a-zA-Z0-9]+[#][a-zA-Z0-9]+[.][A-Za-z]{2,4}
I've tried changing the sed delimiter from s/find/replace/g to s|find|replace|g as outlined here (stack overflow: pcre regex to sed regex).
I am still not able to figure out how to use PCRE regex in SED, or how to convert PCRE regex to SED. Any help would be great.
Want PCRE (Perl Compatible Regular Expressions)? Why don't you use perl instead?
perl -pe 's/[a-zA-Z0-9]+[#][a-zA-Z0-9]+[\.][A-Za-z]{2,4}/[emailaddr]/g' \
<<< "My email is abc#example.com"
Output:
My email is [emailaddr]
Write output to a file with tee:
perl -pe 's/[a-zA-Z0-9]+[#][a-zA-Z0-9]+[\.][A-Za-z]{2,4}/[emailaddr]/g' \
<<< "My email is abc#example.com" | tee /path/to/file.txt > /dev/null
Use the -r flag to enable extended regular expressions (-E instead of -r on OS X):
echo "My email is abc#example.com" | sed -r 's/[a-zA-Z0-9]+#[a-zA-Z0-9]+\.[A-Za-z]{2,4}/[emailaddr]/g'
GNU sed uses basic regular expressions or, with the -r flag, extended regular expressions.
Your regex as a POSIX basic regex (thanks mklement0):
[[:alnum:]]\{1,\}#[[:alnum:]]\{1,\}\.[[:alpha:]]\{2,4\}
Note that this expression will not match all email addresses (not by a long shot).
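Put together as a complete command, the POSIX basic regex above needs no extra flag, because \{m,n\} is the BRE spelling of the interval operator:

```shell
# Basic regex (no -r/-E): intervals are written \{m,n\}; [[:alnum:]] and
# [[:alpha:]] are POSIX character classes.
echo "My email is abc#example.com" \
  | sed 's/[[:alnum:]]\{1,\}#[[:alnum:]]\{1,\}\.[[:alpha:]]\{2,4\}/[emailaddr]/g'
```

This prints "My email is [emailaddr]", the same result as the perl and sed -r versions.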
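For multi-line matches, add perl's -0 switch, which makes input without NUL bytes arrive as a single record so the pattern can span lines: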
perl -0pe 's/search/replace/gms' file
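Sometimes this might be helpful too as a workaround (it assumes the pattern matches exactly once, and that the matched text contains no characters that are special to sed, such as / or *):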
str=$(grep -Poh "pcre-pattern" file)
sed -i "s/$str/$something_else/" file
-o, --only-matching:
Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
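Because -o prints each match on its own line, $str in the workaround above contains a newline as soon as the pattern matches more than once, and the sed s command then breaks. A quick way to see this, using -E rather than -P since -P is only available in GNU grep builds with PCRE support:

```shell
# Two matches on one input line come out as two separate output lines.
printf 'id 12 and 345\n' | grep -Eo '[0-9]+'
```

This prints 12 and 345 on separate lines.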

Sed regex not matching 'either or' inner group

I would like to match multiple file extensions passed through a pipe using sed and regex.
The following works:
sed '/.\(rb\)\$/!d'
But if I want to allow multiple file extensions, the following does not work.
sed '/.\(rb\|js\)\$/!d'
sed '/.\(rb|js\)\$/!d'
sed '/.(rb|js)\$/!d'
Any ideas on how to do either/or inner groups?
Here is the whole block of code:
#!/bin/sh
files=`git diff-index --check --cached $against | # Find all changed files
sed '/.\(rb\|js\)\$/!d' | # Only process .rb and .js files
uniq` # Remove duplicate files
I am using a Mac OSX 10.8.3 and the previous answer does not work for me, but this does:
sed -E '/\.(rb|js)$/!d'
Note: -E tells sed to "Interpret regular expressions as extended (modern) regular expressions rather than basic regular expressions (BRE's)", which enables the alternation operator |. GNU sed traditionally uses the -r flag for this (current versions also accept -E).
Note that the initial . must be escaped and the trailing $ must not be.
Try something like this (note that \| in a basic regex is a GNU sed extension):
sed '/\.\(rb\|js\)$/!d'
or, if your sed supports it, use the -r option (or -E) to enable extended regular expressions, so you don't have to escape the special characters.
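A portable sketch on made-up file names. -E is accepted by BSD/macOS sed and by current GNU sed. The second form avoids alternation altogether by using several -e expressions: b with no label jumps to the end of the script (so the line is printed), and d deletes every line that didn't branch.

```shell
# Extended regex: ( | ) work unescaped; the dot is escaped, the $ anchor is not.
printf 'app.rb\nmain.js\nREADME.md\n' | sed -E '/\.(rb|js)$/!d'

# No alternation needed: branch to the end of the script for wanted lines,
# delete everything else.
printf 'app.rb\nmain.js\nREADME.md\n' | sed -e '/\.rb$/b' -e '/\.js$/b' -e d
```

Both commands print app.rb and main.js.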

Replace certain strings from text with SED and REGEX

I have the following strings in a text file (big one, more like these and different):
79A18D7F-1517-5981-8446-3A0452727B06
7842A72D-1517-5281-84E4-EAEF09B743F7
6040BEE7-1517-5982-84C1-419B224E647E
615F2747-1517-5981-84AF-787C34967FB2
7468A3E3-1517-5931-84B3-3FC3F701C269
I can find them using grep and regex:
'[0-9A-F]{8}-[0-9]{4}-[0-9]{4}-[0-9A-F]{4}-[0-9A-F]{12}'
What's the sed regex syntax to delete them? The following
sed "s/[0-9A-F]{8}-[0-9]{4}-[0-9]{4}-[0-9A-F]{4}-[0-9A-F]{12}//g"
doesn't seem to work.
Thanks!
Use sed -r. You are relying on extended regular expression syntax features without escaping them, but with sed -r you don't have to. If you want to actually delete the lines instead of just clearing them, you can use:
sed -r "/regex/d"
Alternatively, with plain sed (BRE) you need to escape the curly braces:
sed 's/[0-9A-F]\{8\}-[0-9]\{4\}-[0-9]\{4\}-[0-9A-F]\{4\}-[0-9A-F]\{12\}//g' file
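Both variants on sample data: the ERE form deletes whole lines, the BRE form blanks out only the match. The "keep me" and "id=...;" lines are hypothetical inputs; the IDs are taken from the question.

```shell
# ERE (-E): delete every line that contains an ID of this shape.
printf '79A18D7F-1517-5981-8446-3A0452727B06\nkeep me\n' \
  | sed -E '/[0-9A-F]{8}-[0-9]{4}-[0-9]{4}-[0-9A-F]{4}-[0-9A-F]{12}/d'

# BRE (no flag): braces escaped; remove just the ID, keep the rest of the line.
echo 'id=7842A72D-1517-5281-84E4-EAEF09B743F7;' \
  | sed 's/[0-9A-F]\{8\}-[0-9]\{4\}-[0-9]\{4\}-[0-9A-F]\{4\}-[0-9A-F]\{12\}//g'
```

The first command prints "keep me", the second prints "id=;".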

sed remove digits at end of the line

I need to find out how to delete up to 10 digits that are at the end of the line in my text file using sed.
For example if I have this:
ajsdlfkjasldf1234567890
asdlkjfalskdjf123456
adsf;lkjasldfkjas123
it should become:
ajsdlfkjasldf
asdlkjfalskdjf
adsf;lkjasldfkjas
Can anyone help?
I have this, but it's not working:
sed 's/[0-9]{10}$//g'
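Have you tried this (with -E, or -r on older GNU sed, because + is an ordinary character in a basic regex)?
sed -E 's/[0-9]+$//'
Note that this removes all trailing digits, not just up to 10. Your command would only match and delete exactly 10 digits at the end of the line, and only if you enabled extended regular expressions (-E or -r, depending on your version of sed).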
You should try
sed -r 's/[0-9]{1,10}$//'
The following should work:
sed 's/[0-9]\{1,10\}$//' file
Regex syntax in sed requires backslashes before the brackets to use them for repetition, unless you use an extended regex option.
A quick look here suggests you should try this:
$ sed 's/[0-9]\{0,10\}$//g'
{ } should be escaped, unless you switch to extended regex syntax:
$ sed -r 's/[0-9]{0,10}$//g'
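A quick check of the bounded BRE version on the sample lines, plus one hypothetical line with 11 trailing digits to show that at most 10 are removed:

```shell
# \{1,10\} is the BRE interval: between 1 and 10 digits, anchored at end of line.
printf 'ajsdlfkjasldf1234567890\nasdlkjfalskdjf123456\nadsf;lkjasldfkjas123\nabc12345678901\n' \
  | sed 's/[0-9]\{1,10\}$//'
```

The output is ajsdlfkjasldf, asdlkjfalskdjf, adsf;lkjasldfkjas and abc1; the last line keeps its 11th digit from the end.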

Add a prefix to all media links in a html file

I'm trying to insert an absolute path before all images in an HTML file, like this:
<img src="/media/some_path/some_image.png"> to <img src="{ABS_PATH}/some_path/some_image.png">
I tried the following regex to identify the lines:
egrep '(src|href)="/media([^"]*)"'
I want to use sed to make these changes, but the above regexp doesn't work, any hints?
sed 's#(src|href)="/media([^"]*)"##g'
sed: -e expression #1, char 32: unknown option to `s'
EDIT:
OK, now I have:
echo 'src="/media/some_image.png"' | egrep -o '(src|href)="/media([^"]*)"' | sed 's/(src|href)=\"\/media([^"]*)\"//g'
sed should match the string, but it doesn't.
By default sed doesn't understand ERE (extended regular expressions), only BRE (basic regular expressions). GNU sed has the "-r" option, which turns on ERE.
You should also change the delimiter of the s command, because your regex contains a slash, like this:
sed -r 's#(src|href)="/media([^"]*)"##g'
You can use almost any punctuation for delimiters.
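Since the goal is to rewrite the path rather than delete it, the captured groups can be reused in the replacement with \1 and \2. A sketch with -E (use -r on older GNU sed), where {ABS_PATH} is the literal placeholder from the question:

```shell
# \1 is src or href; \2 is the path after /media. With '#' as the delimiter,
# the slashes in the pattern need no escaping.
echo '<img src="/media/some_path/some_image.png">' \
  | sed -E 's#(src|href)="/media([^"]*)"#\1="{ABS_PATH}\2"#g'
```

This prints <img src="{ABS_PATH}/some_path/some_image.png">, which is the transformation asked for.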
You must escape / in sed if using it as a delimiter for the pattern.
So:
sed 's/(src|href)="/media([^"]*)"//g'
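becomes:
sed 's/(src|href)="\/media([^"]*)"//g'
That fixes the "unknown option to `s'" error, but (, | and ) are still literal characters in a basic regex, so you also need -r (GNU) or -E for the group and the alternation to work.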
Perhaps what is confusing is that egrep (which uses extended regular expressions) has different rules from sed and from vanilla grep (both of which use basic regular expressions by default) when it comes to what must be escaped.
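To illustrate the escaping difference with a POSIX-only example (alternation is left out because \| is a GNU extension in basic regexes): the same capture group is written \( \) in a basic regex and ( ) in an extended one, and grep -E uses the same extended syntax as sed -E. The "X" replacement is just a marker for illustration.

```shell
line='src="/media/some_image.png"'
# Basic regex: grouping needs backslashes.
echo "$line" | sed 's/\(src\)="\/media/\1="X/'
# Extended regex: the same group without backslashes.
echo "$line" | sed -E 's/(src)="\/media/\1="X/'
# grep -E shares the extended syntax with sed -E.
echo "$line" | grep -Eo '(src)="/media[^"]*"'
```

Both sed commands print src="X/some_image.png", and grep prints the full src="/media/some_image.png" match.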