sed not performing expected substitution - regex

I have a bash variable holding a file path (with spaces) and a filename, e.g.:
$ echo $tmp
/home/xyz/some/path/with spaces/AlbumArt_{random-number-sequence}_Large.jpg
When I attempt to identify the filename part with grep, e.g:
$ echo "$tmp" | egrep 'AlbumArt.*Large.jpe?g$'
/home/xyz/some/path/with spaces/**AlbumArt_{random-number-sequence}_Large.jpg**
The filename part appears to be identified correctly, but when I attempt to convert this to a sed substitution expression, e.g:
$ echo "$tmp" | sed 's#AlbumArt.*Large.jpe?g$#NewString#'
/home/xyz/some/path/with spaces/AlbumArt_{random-number-sequence}_Large.jpg
The expected substitution isn't happening. Thanks in advance for any help.

egrep is equivalent to grep -E, which enables extended regular expressions (see https://en.wikipedia.org/wiki/Regular_expression#Standards). By default sed uses basic regular expressions, in which ? is a literal character rather than a quantifier.
Thus, you just need to pass the same option to sed:
echo "$tmp" | sed -E 's#AlbumArt.*Large.jpe?g$#NewString#'
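The difference is easy to check side by side. The following is a minimal sketch, assuming a sed that accepts -E (GNU and BSD sed do). The numeric part of the sample filename is made up for illustration, and in the second command the dot before jpe?g is escaped so that it only matches a literal dot:

```shell
#!/usr/bin/env bash
# Hypothetical sample path; "12345" stands in for the random number sequence.
tmp='/home/xyz/some/path/with spaces/AlbumArt_12345_Large.jpg'

# BRE (default): '?' is an ordinary character, so the pattern would need a
# literal "jpe?g" and the line is printed unchanged.
bre_result=$(printf '%s\n' "$tmp" | sed 's#AlbumArt.*Large.jpe?g$#NewString#')

# ERE (-E): '?' makes the preceding 'e' optional, so the filename is replaced.
ere_result=$(printf '%s\n' "$tmp" | sed -E 's#AlbumArt.*Large\.jpe?g$#NewString#')

printf '%s\n' "$bre_result" "$ere_result"
```

The first line of output should be the untouched path and the second the path ending in NewString.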

Related

Transform a dynamic alphanumeric string

I have a Build called 700-I20190808-0201. I need to convert it to 7.0.0-I20190808-0201. I can do that with regular expression:
sed 's/\([0-9]\)\([0-9]\)\([0-9]\)\(.\)/\1.\2.\3\4/' abc.txt
But the solution does not work when the build ID is 7001-I20190809-0201. Can we make the regular expression dynamic so that it works for both (700 and 7001)?
Could you please try the following:
awk 'BEGIN{FS=OFS="-"}{gsub(/[0-9]/,"&.",$1);sub(/\.$/,"",$1)} 1' Input_file
If you have Perl available, lookahead regular expressions make this straightforward:
$ cat foo.txt
700-I20190808-0201
7001-I20190809-0201
$ perl -ple 's/(\d)(?=\d+-I)/$1./g' foo.txt
7.0.0-I20190808-0201
7.0.0.1-I20190809-0201
You can implement a simple loop in sed using a label and a conditional branch:
$ echo '7001-I20190809-0201' | sed ':1; s/^\([0-9]\{1,\}\)\([0-9][-.]\)/\1.\2/; t1'
7.0.0.1-I20190809-0201
$ echo '700-I20190809-0201' | sed ':1; s/^\([0-9]\{1,\}\)\([0-9][-.]\)/\1.\2/; t1'
7.0.0-I20190809-0201
If your sed supports the -E flag:
sed -E ':1; s/^([0-9]+)([0-9][-.])/\1.\2/; t1'
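To make the loop easier to follow, here is the same idea spelled out with comments. This is only a sketch: dot_leading_digits and the label name again are hypothetical names chosen for illustration, and a sed that accepts -E is assumed.

```shell
#!/usr/bin/env bash
# dot_leading_digits is a hypothetical helper name, not part of the answers above.
dot_leading_digits() {
  sed -E '
    # Label marking the top of the loop.
    :again
    # Split the leading digit run: \1 takes all but its last digit,
    # \2 is that last digit plus the "-" or "." that follows it.
    s/^([0-9]+)([0-9][-.])/\1.\2/
    # t jumps back to the label only if the s command changed the line.
    t again
  '
}

printf '%s\n' 700-I20190808-0201 7001-I20190809-0201 | dot_leading_digits
```

Each pass inserts one dot, working from the right end of the leading digit run, and the loop stops once a single digit is left in front of the first dot.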
sed -e 's/\([0-9]\)\([0-9]\)\([0-9]\)\(.\)/\1.\2.\3.\4/' -e 's/\.\-/\-/' abc.txt
This worked for me and is very simple. I just needed to adapt it to my Ant script using a replaceregex pattern.

Replace string with another string based on backreference with sed

I'm trying to replace a predefined string %c#, where # can be some number, with another string. The catch is that the other string must be truncated to # characters.
Ideally this set of commands would work:
FORMAT="%c10"
LAST_COMMIT="5189e42b14797b1e36ffb7fc5657c7eea08f1c0f"
echo $FORMAT | sed "s/%c\([0-9]\+\)/${LAST_COMMIT:0:\1}/g"
but clearly there is a syntax error on the \1. You can replace it with a number to see what I'm trying to get as output.
I'm open to using a program other than sed to achieve this, but ideally it should be one that is available on most Linux installations by default.
Thanks!
This is my idea.
echo ${LAST_COMMIT} | head -c $(echo ${FORMAT} | sed -e 's/%c//')
Extract the number with sed, then take that many leading characters with head.
EDIT1
This might be better.
echo ${LAST_COMMIT} | head -c $(echo ${FORMAT} | sed -e 's/%c\([0-9]\+\)/\1/')
EDIT2
I wrote a script because the one-liner is hard to read. Please try this.
$ cat sample.sh
#!/bin/bash
FORMAT="%b-%t-%c10-%c5"
LAST_COMMIT="5189e42b14797b1e36ffb7fc5657c7eea08f1c0f"
## List numbers
lengths=$(echo ${FORMAT} | sed -e "s/%[^c]//g" -e "s/-//g" -e "s/%c/ /g")
## Substitute %cXX to first XX characters of LAST_COMMIT
for n in ${lengths}
do
to_str=${LAST_COMMIT:0:${n}}
FORMAT=$(echo ${FORMAT} | sed "s/%c${n}/${to_str}/")
done
## Print result
echo ${FORMAT}
This is the result.
$ ./sample.sh
%b-%t-5189e42b14-5189e
The same thing as a one-line command (same contents, but long and hard to read):
for n in $(echo ${FORMAT} | sed -e "s/%[^c]//g" -e "s/-//g" -e "s/%c/ /g"); do to_str=${LAST_COMMIT:0:${n}}; FORMAT=$(echo ${FORMAT} | sed "s/%c${n}/${to_str}/"); done; echo ${FORMAT}
The value of $LAST_COMMIT gets interpolated before sed runs, so there is no backreference to refer back to yet. There is an /e extension in GNU sed which would support something like this, but I would simply use a slightly more capable tool.
perl -e '$fmt = shift; $fmt=~ s/%c(\d+)/%.$1s/g; printf("$fmt\n", @ARGV)' '%c10' "$LAST_COMMIT"
Of course, if you can let go of your own ad-hoc format string specifier, and switch to a printf-compatible format string altogether, just use the printf shell command straight off.
length=$(echo $FORMAT | sed "s/%c\([0-9]\+\)/\1/g")
echo "${LAST_COMMIT:0:$length}"
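For a format string containing several %c# tokens, the substitution can also be done in bash alone, without sed or perl. The following is a minimal sketch under a few assumptions: expand_format is a hypothetical helper name, bash 3.1 or later is available (for [[ =~ ]] and +=), and other specifiers such as %b are simply left untouched.

```shell
#!/usr/bin/env bash
# expand_format is a hypothetical helper: it replaces each %cN in a format
# string with the first N characters of the given commit hash.
expand_format() {
  local rest=$1 commit=$2 out='' token n
  local re='%c([0-9]+)'
  while [[ $rest =~ $re ]]; do
    token=${BASH_REMATCH[0]}              # the matched token, e.g. %c10
    n=$((10#${BASH_REMATCH[1]}))          # force base 10 so 08 is not read as octal
    out+=${rest%%"$token"*}               # copy the text before the token
    out+=${commit:0:n}                    # append the truncated hash
    rest=${rest#*"$token"}                # continue after the token
  done
  printf '%s\n' "$out$rest"
}

LAST_COMMIT="5189e42b14797b1e36ffb7fc5657c7eea08f1c0f"
expand_format '%c10' "$LAST_COMMIT"
expand_format '%b-%t-%c10-%c5' "$LAST_COMMIT"
```

The format string is consumed from left to right, so text that has already been substituted is never scanned again for %c tokens.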

Sed replace asterisk symbols

I am trying to replace a series of asterisk symbols in a text file with -999.9 using sed. However, I can't figure out how to properly escape the wildcard symbol.
e.g.
$ echo "2006.0,1.0,************,-5.0" | sed 's/************/-999.9/g'
sed: 1: "s/************/-999.9/g": RE error: repetition-operator operand invalid
Doesn't work. And
$ echo "2006.0,1.0,************,-5.0" | sed 's/[************]/-999.9/g'
2006.0,1.0,-999.9-999.9-999.9-999.9-999.9-999.9-999.9-999.9-999.9-999.9-999.9-999.9,-5.0
puts a -999.9 for every * which isn't what I intended either.
Thanks!
Use this:
echo "2006.0,1.0,************,-5.0" | sed 's/[*]\+/-999.9/g'
Test:
$ echo "2006.0,1.0,************,-5.0" | sed 's/[*]\+/-999.9/g'
2006.0,1.0,-999.9,-5.0
Any of these (and more) is a regexp that will modify that line as you want:
$ echo "2006.0,1.0,************,-5.0" | sed 's/\*\**/999.9/g'
2006.0,1.0,999.9,-5.0
$ echo "2006.0,1.0,************,-5.0" | sed 's/\*\+/999.9/g'
2006.0,1.0,999.9,-5.0
$ echo "2006.0,1.0,************,-5.0" | sed -r 's/\*+/999.9/g'
2006.0,1.0,999.9,-5.0
$ echo "2006.0,1.0,************,-5.0" | sed 's/\*\{12\}/999.9/g'
2006.0,1.0,999.9,-5.0
$ echo "2006.0,1.0,************,-5.0" | sed -r 's/\*{12}/999.9/g'
2006.0,1.0,999.9,-5.0
$ echo "2006.0,1.0,************,-5.0" | sed 's/\*\{1,\}/999.9/g'
2006.0,1.0,999.9,-5.0
$ echo "2006.0,1.0,************,-5.0" | sed -r 's/\*{1,}/999.9/g'
2006.0,1.0,999.9,-5.0
sed operates on regular expressions, not strings, so you need to learn regular expression syntax if you're going to use sed, in particular the difference between BREs (which sed uses by default), EREs (which some seds can be told to use instead) and PCREs (which sed never uses but some other tools and "regexp checkers" do). The first solution above is a BRE that will work on all seds on all platforms; the \{12\} and \{1,\} interval forms are also standard BRE syntax, whereas \+ is a GNU extension and -r is not specified by POSIX. Google is your friend.
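As a quick comparison of the two dialects, the sketch below runs the same replacement once as a BRE and once as an ERE. It assumes the local sed accepts -E (GNU and BSD sed do), and it uses the -999.9 value from the question:

```shell
#!/usr/bin/env bash
line='2006.0,1.0,************,-5.0'

# BRE: \* is a literal asterisk, \{1,\} means "one or more" (POSIX interval).
bre=$(printf '%s\n' "$line" | sed 's/\*\{1,\}/-999.9/g')

# ERE: + needs no backslash; -E is assumed to be supported by the local sed.
ere=$(printf '%s\n' "$line" | sed -E 's/\*+/-999.9/g')

printf '%s\n' "$bre" "$ere"
```

Both commands should print the same line, with the whole run of asterisks replaced once.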
* is a regex symbol that needs to be escaped.
You can even use BASH string replacement:
s="2006.0,1.0,************,-5.0"
echo "${s/\**,/-999.9,}"
2006.0,1.0,-999.9,-5.0
Using sed:
sed 's/\*\+/999.9/g' <<< "$s"
2006.0,1.0,999.9,-5.0
Yes, * is a special metacharacter that repeats the previous token zero or more times. Escape each * in order to match literal * characters.
sed 's/\*\*\*\*\*\*\*\*\*\*\*\*/-999.9/g'
gawk also accepts the unescaped run of asterisks; I have no idea when this possibility was introduced:
gawk -F, '{sub(/************/,"-999.9",$3)}1' OFS=, file
2006.0,1.0,-999.9,-5.0

How to cut a string from a string

My script gets this string for example:
/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file
Let's say I don't know how long the string is before /importance.
I want a new variable that will keep only the /importance/lib1/lib2/lib3/file from the full string.
I tried to use sed 's/.*importance//' but it's giving me the path without the importance....
Here is the command in my code:
find <main_path> -name file | sed 's/.*importance//'
I am not familiar with regex, so I need your help please :)
Sorry, my friends, I got my question wrong.
I don't need the output /importance/lib1/lib2/lib3/file but /importance/lib1/lib2/lib3 with no /file in the output.
Can you help me?
I would use awk:
$ echo "/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file" | awk -F"/importance/" '{print FS$2}'
/importance/lib1/lib2/lib3/file
Which is the same as:
$ awk -F"/importance/" '{print FS$2}' <<< "/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file"
/importance/lib1/lib2/lib3/file
That is, we set the field separator to /importance/, so that the first field is what comes before it and the 2nd one is what comes after. To print /importance/ itself, we use FS!
All together, and to save it into a variable, use:
var=$(find <main_path> -name file | awk -F"/importance/" '{print FS$2}')
Update
I don't need the output /importance/lib1/lib2/lib3/file but
/importance/lib1/lib2/lib3 with no /file in the output.
Then you can use something like dirname to get the path without the name itself:
$ dirname $(awk -F"/importance/" '{print FS$2}' <<< "/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file")
/importance/lib1/lib2/lib3
Instead of substituting all until importance with nothing, replace with /importance:
~$ echo $var
/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file
~$ sed 's:.*importance:/importance:' <<< $var
/importance/lib1/lib2/lib3/file
As noted by @lurker, if importance can also appear as part of another directory name, you can add the slashes to be safe:
~$ sed 's:.*/importance/:/importance/:' <<< "/dir1/dirimportance/importancedir/..../importance/lib1/lib2/lib3/file"
/importance/lib1/lib2/lib3/file
With GNU sed:
echo '/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file' | sed -E 's#.*(/importance.*)#\1#'
Output:
/importance/lib1/lib2/lib3/file
pure bash
kent$ a="/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file"
kent$ echo ${a/*\/importance/\/importance}
/importance/lib1/lib2/lib3/file
external tool: grep
kent$ grep -o '/importance/.*' <<<$a
/importance/lib1/lib2/lib3/file
I tried to use sed 's/.*importance//' but it's giving me the path without the importance....
You were very close. All you had to do was substitute /importance back in:
sed 's/.*importance/\/importance/'
However, I would use Bash's built in pattern expansion. It's much more efficient and faster.
The pattern expansion ${foo##pattern} says to take the shell variable ${foo} and remove the largest matching glob pattern from the left side of the shell variable. Because that removes /importance as well, prepend it again:
file_name="/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file"
file_name="/importance${file_name##*/importance}"
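Parameter expansion can also drop the trailing /file that the updated question asks for. The following is a minimal sketch; the variable names from_importance and without_file are made up for illustration:

```shell
#!/usr/bin/env bash
path='/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file'

# ## removes the longest prefix matching */importance; prepend it again to keep it.
from_importance="/importance${path##*/importance}"

# % removes the shortest suffix matching /*, i.e. the last path component.
without_file=${from_importance%/*}

printf '%s\n' "$from_importance" "$without_file"
```

No external process is started here, which is where the speed advantage over sed or awk comes from when this runs in a loop.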
Removing the /file at the end, as you asked:
echo '<path>' | sed -r 's#.*(/importance.*)/[^/]*#\1#'
Input /dir1/dir2/dir3.../importance/lib1/lib2/lib3/file
Returns: /importance/lib1/lib2/lib3
See this "Match groups" tutorial.

Applying regex in bash

I'm trying to get my filename without its extension using a regex I found on Stack Overflow. The regex is:
(.+?)(\.[^.]*$|$)
I try this on the command line
echo TestFileName.1.0.0.2.zip | grep "(.+?)(\.[^.]*$|$)"
And I get nothing in the command line. If I try it with this regex:
echo TestFileName.1.0.0.2.zip | grep "Test"
I do see the TestFileName.1.0.0.2.zip gets printed to the console with Test highlighted in red. When I tried my data in this website: http://rubular.com/r/LNrI4inMU1
It does seem to work. Am I applying the regex wrong in Bash?
Your pattern uses extended regular expression syntax, but grep interprets patterns as basic regular expressions by default, where (, ), + and | are literal characters. Change grep to grep -E, which tells it that your regex is an extended one, and the match will work.
$ echo TestFileName.1.0.0.2.zip | grep -E "(.+?)(\.[^.]*$|$)"
TestFileName.1.0.0.2.zip
See this link for more information on the distinction between regular and extended regex.
Using BASH regex:
s='TestFileName.1.0.0.2.zip'
[[ "$s" =~ ^(.*)\.[^.]+$ ]] && echo "${BASH_REMATCH[1]}"
TestFileName.1.0.0.2
Add -P (Perl-regexp) parameter to your grep along with -o (only-matching).
$ echo TestFileName.1.0.0.2.zip | grep -oP "(.+)(?=\.)"
TestFileName.1.0.0.2
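If only the last extension has to be removed, bash parameter expansion can do it without any regex. A minimal sketch, with stem and ext as made-up variable names:

```shell
#!/usr/bin/env bash
name='TestFileName.1.0.0.2.zip'

# % removes the shortest suffix matching ".*", i.e. the last dot and what follows.
stem=${name%.*}

# ## removes the longest prefix matching "*.", leaving only the extension.
ext=${name##*.}

printf '%s\n' "$stem" "$ext"
```

A name without any dot is left unchanged by ${name%.*}, because the pattern then matches nothing.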