Regex with sed to parse archive name - regex

I'd like to parse different kinds of Java archive with the sed command line tool.
Archives can have the following extensions:
.jar, .war, .ear, .esb
What I'd like to get is the name without the extension, e.g. for Foobar.jar I'd like to get Foobar.
This seems fairly simple, but I cannot come up with a solution that works and is also robust.
I tried something along the lines of sed s/\.+(jar|war|ear|esb)$//, but could not make it work.

You were nearly there:
sed -E 's/\.+(jar|war|ear|esb)$//' file
You just needed to add the -E flag so that sed treats the pattern as an extended regular expression, and to wrap the expression in quotes following the sed 's/something/new/' syntax.
Test
$ cat a
aaa.jar
bb.war
hello.ear
buuu.esb
hello.txt
$ sed -E 's/\.+(jar|war|ear|esb)$//' a
aaa
bb
hello
buuu
hello.txt

Using sed:
s='Foobar.jar'
sed -r 's/\.(jar|war|ear|esb)$//' <<< "$s"
Foobar
Or, better, do it in Bash itself, anchoring the match at the end of the string:
echo "${s%.[jwe]ar}"
Foobar
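The Bash expansion above covers .jar, .war and .ear but not .esb. A minimal sketch that handles all four extensions with plain POSIX shell features follows; the function name strip_archive_ext is hypothetical and not part of the question:

```shell
# Hypothetical helper: strip a trailing Java archive extension, if present.
strip_archive_ext() {
  case "$1" in
    # Only these four extensions are removed; ${1%.*} drops the last ".xxx".
    *.jar|*.war|*.ear|*.esb) printf '%s\n' "${1%.*}" ;;
    # Anything else is printed unchanged.
    *)                       printf '%s\n' "$1" ;;
  esac
}

strip_archive_ext Foobar.jar
strip_archive_ext my.app.esb
strip_archive_ext notes.txt
```

The case statement keeps the check anchored at the end of the name, so a name such as my.jar.txt is left alone.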

Without an option like -r or -E, you need to escape the |, the ( and the ), and you also need to wrap the expression in single quotes:
echo "test.jar" | sed 's/\.\(jar\|war\|ear\|esb\)$//'
test
The + is also not needed, since you normally have only one dot.

On traditional UNIX (tested with AIX/KSH):
File='Foobar.jar'
echo "${File%.*}"
From a list containing only your kinds of files:
YourList | sed 's/\....$//'
From a list containing all kinds of files:
YourList | sed -n 's/\.[jew]ar$//p
t
s/\.esb$//p'

Related

how to replace continuous pattern in text

I have text like 1|2|3||| and am trying to replace each || with |0|. My command is the following:
echo '1|2|3|||' | sed -e 's/||/|0|/g'
but the result is 1|2|3|0||; the pattern is only replaced once.
Could someone help me improve the command? Thanks.
Just do it twice:
l_replace='s#||#|0|#g'
echo '1|2|3||||||||4||5|||' | sed -e "$l_replace;$l_replace"
Using any sed or any awk in any shell on every Unix box:
$ echo '1|2|3|||' | sed -e 's/||/|0|/g; s/||/|0|/g'
1|2|3|0|0|
$ echo '1|2|3|||' | awk '{while(gsub(/\|\|/,"|0|"));}1'
1|2|3|0|0|
This might work for you (GNU sed):
sed 's/||/|0|/g;s//|0|/g' file
or:
sed ':a;s/||/|0|/g;ta' file
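The replacement needs to be applied twice because matches cannot overlap: with the g flag, sed resumes scanning after the text it has just matched, so in ||| the middle | is consumed by the first match and cannot start a second one.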
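The loop variant can be wrapped in a small function. This is only a sketch: it uses separate -e expressions because some non-GNU sed implementations do not accept a label followed by ; on the same line, and the name fill_empty_fields is hypothetical:

```shell
# Hypothetical helper: put a 0 into every empty |-delimited field.
fill_empty_fields() {
  # No g flag here: replace one || per pass and branch back to :a
  # (via t) for as long as a substitution was made.
  sed -e ':a' -e 's/||/|0|/' -e 'ta' "$@"
}

echo '1|2|3|||' | fill_empty_fields
# prints 1|2|3|0|0|
```

Because the loop only stops when no || is left, it handles runs of any length, not just runs that two passes can cover.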

Transform a dynamic alphanumeric string

I have a Build called 700-I20190808-0201. I need to convert it to 7.0.0-I20190808-0201. I can do that with regular expression:
sed 's/\([0-9]\)\([0-9]\)\([0-9]\)\(.\)/\1.\2.\3\4/' abc.txt
But the solution does not work when the build ID is 7001-I20190809-0201. Can we make the regular expression dynamic so that it works for both (700 and 7001)?
Could you please try following.
awk 'BEGIN{FS=OFS="-"}{gsub(/[0-9]/,"&.",$1);sub(/\.$/,"",$1)} 1' Input_file
If you have Perl available, lookahead regular expressions make this straightforward:
$ cat foo.txt
700-I20190808-0201
7001-I20190809-0201
$ perl -ple 's/(\d)(?=\d+-I)/$1./g' foo.txt
7.0.0-I20190808-0201
7.0.0.1-I20190809-0201
You can implement a simple loop using labels and branching using sed:
$ echo '7001-I20190809-0201' | sed ':1; s/^\([0-9]\{1,\}\)\([0-9][-.]\)/\1.\2/; t1'
7.0.0.1-I20190809-0201
$ echo '700-I20190809-0201' | sed ':1; s/^\([0-9]\{1,\}\)\([0-9][-.]\)/\1.\2/; t1'
7.0.0-I20190809-0201
If your sed supports the -E flag:
sed -E ':1; s/^([0-9]+)([0-9][-.])/\1.\2/; t1'
sed -e 's/\([0-9]\)\([0-9]\)\([0-9]\)\(.\)/\1.\2.\3.\4/' -e 's/\.\-/\-/' abc.txt
This worked for me and is very simple. I just needed to port it to my Ant script using a replaceregex pattern.
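Another way to handle a leading version of any length is to split the build ID at the first hyphen and dot only the part before it. A minimal sketch, assuming the version is everything before the first - and consists of single-digit components; the function name dot_build_id is hypothetical:

```shell
# Hypothetical helper: 700-I20190808-0201 -> 7.0.0-I20190808-0201
# The body runs in a subshell ( ... ) so its variables do not leak.
dot_build_id() (
  prefix=${1%%-*}        # text before the first hyphen, e.g. 7001
  rest=${1#"$prefix"}    # remainder, e.g. -I20190809-0201
  # Append a dot to every character, then drop the trailing dot.
  dotted=$(printf '%s\n' "$prefix" | sed 's/./&./g; s/\.$//')
  printf '%s%s\n' "$dotted" "$rest"
)

dot_build_id 700-I20190808-0201
dot_build_id 7001-I20190809-0201
```

Splitting first means the date and build number after the hyphen are never touched, whatever digits they contain.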

Extract few matching strings from matching lines in file using sed

I have a file with strings similar to this:
abcd u'current_count': u'2', u'total_count': u'3', u'order_id': u'90'
I have to find current_count and total_count for each line of the file. I am trying the command below, but it's not working. Please help.
grep current_count file | sed "s/.*\('current_count': u'\d+'\).*/\1/"
It is outputting the whole line but I want something like this:
'current_count': u'3', 'total_count': u'3'
It's printing the whole line because the pattern in the s command doesn't match, so no substitution happens.
sed regexes don't support \d for digits, or x+ for xx*. GNU sed has a -r option to enable extended-regex support so + will be a meta-character, but \d still doesn't work. GNU sed also allows \+ as a meta-character in basic regex mode, but that's not POSIX standard.
So anyway, this will work:
echo -e "foo\nabcd u'current_count': u'2', u'total_count': u'3', u'order_id': u'90'" |
sed -nr "s/.*('current_count': u'[0-9]+').*/\1/p"
# output: 'current_count': u'2'
Notice that I skip the grep by using sed -n s///p. I could also have used /current_count/ as an address:
sed -r -e '/current_count/!d' -e "s/.*('current_count': u'[0-9]+').*/\1/"
Or with just grep printing only the matching part of the pattern, instead of the whole line:
grep -E -o "'current_count': u'[[:digit:]]+'" file
(or egrep instead of grep -E). I forget if grep -o is POSIX-required behaviour.
For me this looks like some sort of serialized Python data. Basically I would try to find out the origin of that data and parse it properly.
However, while being hackish, sed can also be used here:
sed "s/.*current_count': [a-z]'\([0-9]\+\).*/\1/" input.txt
sed "s/.*total_count': [a-z]'\([0-9]\+\).*/\1/" input.txt
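To get both fields on one line, as in the output the question asks for, two capture groups can be combined in a single substitution. This is a sketch that assumes current_count always appears before total_count on a line and that sed accepts -E (GNU and BSD sed both do); the function name extract_counts is hypothetical:

```shell
# Hypothetical helper: print both counters from each matching line.
extract_counts() {
  # -n with the p flag prints only lines where the substitution succeeded.
  sed -n -E "s/.*('current_count': u'[0-9]+').*('total_count': u'[0-9]+').*/\1, \2/p" "$@"
}

printf '%s\n' "abcd u'current_count': u'2', u'total_count': u'3', u'order_id': u'90'" |
  extract_counts
```

Lines that lack either field are skipped entirely, which also removes the need for the initial grep.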

How to cut a string from a string

My script gets this string for example:
/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file
Let's say I don't know how long the part of the string before /importance is.
I want a new variable that will keep only the /importance/lib1/lib2/lib3/file from the full string.
I tried to use sed 's/.*importance//' but it's giving me the path without the importance....
Here is the command in my code:
find <main_path> -name file | sed 's/.*importance//'
I am not familiar with regex, so I need your help please :)
Sorry, my friends, I got my question wrong:
I don't need the output /importance/lib1/lib2/lib3/file but /importance/lib1/lib2/lib3 with no /file in the output.
Can you help me?
I would use awk:
$ echo "/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file" | awk -F"/importance/" '{print FS$2}'
/importance/lib1/lib2/lib3/file
Which is the same as:
$ awk -F"/importance/" '{print FS$2}' <<< "/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file"
/importance/lib1/lib2/lib3/file
That is, we set the field separator to /importance/, so that the first field is what comes before it and the 2nd one is what comes after. To print /importance/ itself, we use FS!
All together, and to save it into a variable, use:
var=$(find <main_path> -name file | awk -F"/importance/" '{print FS$2}')
Update
I don't need the output /importance/lib1/lib2/lib3/file but
/importance/lib1/lib2/lib3 with no /file in the output.
Then you can use something like dirname to get the path without the name itself:
$ dirname "$(awk -F"/importance/" '{print FS$2}' <<< "/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file")"
/importance/lib1/lib2/lib3
Instead of substituting all until importance with nothing, replace with /importance:
~$ echo $var
/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file
~$ sed 's:.*importance:/importance:' <<< $var
/importance/lib1/lib2/lib3/file
As noted by @lurker, if importance can also appear as part of another directory name, you can add the surrounding slashes to be safe:
~$ sed 's:.*/importance/:/importance/:' <<< "/dir1/dirimportance/importancedir/..../importance/lib1/lib2/lib3/file"
/importance/lib1/lib2/lib3/file
With GNU sed:
echo '/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file' | sed -E 's#.*(/importance.*)#\1#'
Output:
/importance/lib1/lib2/lib3/file
pure bash
kent$ a="/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file"
kent$ echo ${a/*\/importance/\/importance}
/importance/lib1/lib2/lib3/file
external tool: grep
kent$ grep -o '/importance/.*' <<<$a
/importance/lib1/lib2/lib3/file
I tried to use sed 's/.*importance//' but it's giving me the path without the importance....
You were very close. All you had to do was substitute /importance back in:
sed 's/.*\/importance/\/importance/'
However, I would use Bash's built-in pattern expansion. It avoids starting an external process, so it is faster.
The pattern expansion ${foo##pattern} takes the shell variable foo and removes the longest match of the glob pattern from the left side of its value. Since that also removes /importance itself, prepend it again:
file_name="/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file"
file_name="/importance${file_name##*/importance}"
Removing the /file at the end, as you ask:
echo '<path>' | sed -r 's#.*(/importance.*)/[^/]*#\1#'
Input /dir1/dir2/dir3.../importance/lib1/lib2/lib3/file
Returns: /importance/lib1/lib2/lib3
See this "Match groups" tutorial.
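The same result, /importance/lib1/lib2/lib3 without the trailing file name, can also be had with parameter expansion alone. A minimal sketch, assuming the path contains /importance as a complete directory name and ends in a file name; the function name importance_dir is hypothetical:

```shell
# Hypothetical helper: keep everything from /importance on, minus the file name.
# The body runs in a subshell ( ... ) so its variable does not leak.
importance_dir() (
  dir=${1%/*}    # drop the last path component (the file)
  # Remove the longest prefix ending in /importance, then put it back.
  printf '/importance%s\n' "${dir##*/importance}"
)

importance_dir "/dir1/dir2/dir3.../importance/lib1/lib2/lib3/file"
```

With find, this could be applied to each result in a while read loop instead of piping through sed.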

Rewrite URL using sed while maintaining filename

I would like to find all instances of a URL in a file and replace them with a different link structure.
An example would be convert http://www.domain.com/wp-content/uploads/2013/03/Security_Panda.png to /images/Security_Panda.png.
I am able to identify the link using a regular expression such as:
^(http:)|([/|.|\w|\s])*\.(?:jpg|gif|png)
but need to rewrite using sed so that the file name is maintained. I understand that I will need to use s/${PATTERN}/${REPLACEMENT}/g.
I tried sed -i 's#(http:)|([/|.|\w|\s])*\.(?:jpg|gif|png)#/dir/$1#g' test without success. Any thoughts on how to improve the approach?
In basic sed, you need to escape the () symbols like \(..\) to mean a capturing group.
sed 's~http://[.a-zA-Z0-9_/-]*\/\(\w\+\.\(jpg\|gif\|png\)\)~/images/\1~g' file
Example:
$ echo 'http://www.domain.com/wp-content/uploads/2013/03/Security_Panda.png' | sed 's~http://[.a-zA-Z0-9_/-]*\/\(\w\+\.\(jpg\|gif\|png\)\)~/images/\1~g'
/images/Security_Panda.png
You can use:
sed 's~^.*/\([^/]\{1,\}\)$~/images/\1~' file
/images/Security_Panda.png
Testing:
s='http://www.domain.com/wp-content/uploads/2013/03/Security_Panda.png'
sed 's~^.*/\([^/]\{1,\}\)$~/images/\1~' <<< "$s"
/images/Security_Panda.png
An easier way, if you are open to doing it in the shell instead of sed:
#!/usr/bin/env bash
URL="http://www.domain.com/wp-content/uploads/2013/03/Security_Panda.png"
echo "/images/${URL##*/}"
Another way, on the command line:
sed 's#^http:.*/\(.*\)$#/images/\1#'
Example:
echo "http://www.domain.com/wp-content/uploads/2013/03/Security_Panda.png" | sed 's#^http:.*/\(.*\)$#/images/\1#'
Result:
/images/Security_Panda.png
An awk version:
awk -F\/ '/(jpg|gif|png) *$/ {print "/images/"$NF}' file
/images/Security_Panda.png
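Most of the answers above assume the URL is the whole line. When URLs are embedded in other text, for example inside HTML attributes, a pattern that stops at whitespace and double quotes can rewrite each one in place. This is a sketch using only POSIX character classes with sed -E; the function name rewrite_image_urls and the choice of delimiters that end a URL are assumptions:

```shell
# Hypothetical helper: rewrite every image URL on a line to /images/<filename>.
rewrite_image_urls() {
  # [^[:space:]"]* consumes host and directories, up to the last / that is
  # followed by a file name ending in .jpg, .gif or .png.
  sed -E 's~https?://[^[:space:]"]*/([^/[:space:]"]+\.(jpg|gif|png))~/images/\1~g' "$@"
}

echo '<img src="http://www.domain.com/wp-content/uploads/2013/03/Security_Panda.png">' |
  rewrite_image_urls
```

To edit a file in place as in the question, the same expression could be passed to sed -i, keeping in mind that -i is not available in every sed.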