How to use sed to remove part of file path - regex

I have lines like these:
my_list=cloning/cloning-1.7.jar,commons/commons-lang-2.5.jar
my_lib_list=antlr/antlr-1.0.jar,aopa/aopa-1.0.jar
and I need to remove each directory part, up to and including the '/', like this:
my_list=cloning-1.7.jar,commons-lang-2.5.jar
my_lib_list=antlr-1.0.jar,aopa-1.0.jar
I tried this:
sed -i -e "s/(?<=\/).*?(\.jar)//g"
Nothing happens. The regex seems to be right (it might need to be inverted), but at least something should change in the file, right?

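Your pattern - (?<=\/).*?(\.jar) - contains a lookbehind ((?<=...)) and a lazy quantifier (*?). Neither is supported by sed, which only knows POSIX basic and extended regexes. In basic regex syntax the (, ?, < and = are plain literal characters, so the pattern never matches and the file is left unchanged.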
You can use
sed -E 's/[[:alnum:]]+\///g'
Pattern details:
[[:alnum:]]+ - one or more alphanumeric characters
\/ - a literal /
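The [[:alnum:]]+ class stops at characters such as -, _ or ., so a directory like my-libs/ is only partly stripped (=my-libs/x.jar becomes =my-x.jar). A slightly more general sketch, assuming the segments to strip never contain =, a comma or /, deletes every run of such characters that ends in a slash. The function name strip_dirs is illustrative only:

```shell
#!/bin/sh
# strip_dirs (hypothetical helper name): delete every "segment/" prefix,
# where a segment is any run of characters other than '=', ',' and '/'.
# '#' is used as the s delimiter so the '/' needs no escaping.
# Nested paths such as a/b/c.jar lose both a/ and b/.
strip_dirs() {
  sed -E 's#[^=,/]*/##g'
}

printf '%s\n' \
  'my_list=cloning/cloning-1.7.jar,commons/commons-lang-2.5.jar' \
  'my_lib_list=antlr/antlr-1.0.jar,aopa/aopa-1.0.jar' \
  'other=my-libs/x.jar' |
  strip_dirs
# my_list=cloning-1.7.jar,commons-lang-2.5.jar
# my_lib_list=antlr-1.0.jar,aopa-1.0.jar
# other=x.jar
```

With GNU sed, the same expression can be applied in place by passing -i -E and the file name.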

You can do (this pattern assumes exactly two comma-separated entries per line):
sed -r 's#^([^=]+=)[^/]+/([^,]+,)[^/]+/(.*)#\1\2\3#'
Example:
$ sed -r 's#^([^=]+=)[^/]+/([^,]+,)[^/]+/(.*)#\1\2\3#' <<<'my_list=cloning/cloning-1.7.jar,commons/commons-lang-2.5.jar'
my_list=cloning-1.7.jar,commons-lang-2.5.jar
$ sed -r 's#^([^=]+=)[^/]+/([^,]+,)[^/]+/(.*)#\1\2\3#' <<<'my_lib_list=antlr/antlr-1.0.jar,aopa/aopa-1.0.jar'
my_lib_list=antlr-1.0.jar,aopa-1.0.jar


Extract string between underscores and dot

I have strings like these:
/my/directory/file1_AAA_123_k.txt
/my/directory/file2_CCC.txt
/my/directory/file2_KK_45.txt
So basically, the number of underscores is not fixed. I would like to extract the string between the first underscore and the dot. So the output should be something like this:
AAA_123_k
CCC
KK_45
I found this solution that works:
string='/my/directory/file1_AAA_123_k.txt'
tmp="${string%.*}"
echo "$tmp" | sed 's/^[^_:]*[_:]//'
But I am wondering if there is a more 'elegant' solution (e.g. a one-liner).
With bash version >= 3.0 and a regex:
[[ "$string" =~ _(.+)\. ]] && echo "${BASH_REMATCH[1]}"
You can use a single sed command, either in POSIX BRE:
sed -n 's~^.*/[^_/]*_\([^/]*\)\.[^./]*$~\1~p' <<< "$string"
or in POSIX ERE:
sed -nE 's~^.*/[^_/]*_([^/]*)\.[^./]*$~\1~p' <<< "$string"
Details:
^ - start of string
.* - any text
/ - a / char
[^_/]* - zero or more chars other than / and _
_ - a _ char
\([^/]*\) (POSIX BRE) / ([^/]*) (POSIX ERE, enabled with the -E option) - Group 1: zero or more chars other than /
\. - a dot
[^./]* - zero or more chars other than . and /
$ - end of string.
With -n, the default output of every line is suppressed, and the p flag prints the result only when the substitution succeeded.
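The <<< here-string used above is a bash (also ksh and zsh) feature and is not available in plain POSIX sh. A sketch of the same ERE command fed through printf instead, applied to all three sample paths at once; middle_part is a hypothetical function name:

```shell
#!/bin/sh
# middle_part (hypothetical name): print the text between the first "_" of
# the file name and its last dot; lines that do not match print nothing (-n).
middle_part() {
  sed -nE 's~^.*/[^_/]*_([^/]*)\.[^./]*$~\1~p'
}

# printf replaces the bash-only <<< here-string.
printf '%s\n' \
  '/my/directory/file1_AAA_123_k.txt' \
  '/my/directory/file2_CCC.txt' \
  '/my/directory/file2_KK_45.txt' |
  middle_part
# AAA_123_k
# CCC
# KK_45
```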
With your shown samples, using GNU grep, you could try the following code:
grep -oP '.*?_\K([^.]*)' Input_file
Explanation: GNU grep's -o option prints only the matched part of each line, and -P enables PCRE regex. The regex .*?_\K([^.]*) gets the value between the first _ and the first occurrence of a dot (.).
Explanation of regex:
.*?_ ##Match from the start of the line up to the first occurrence of _, using the lazy match .*?
\K ##Forget everything matched so far, so that only the part matched after it is printed.
([^.]*) ##Match everything up to the first occurrence of a dot.
A simpler sed solution without any capturing group:
sed -E 's/^[^_]*_|\.[^.]*$//g' file
AAA_123_k
CCC
KK_45
If you need to process the file names one at a time (e.g. within a while read loop), you can perform two parameter expansions, e.g.:
$ string='/my/directory/file1_AAA_123_k.txt.2'
$ tmp="${string#*_}"
$ tmp="${tmp%%.*}"
$ echo "${tmp}"
AAA_123_k
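A sketch of the while read loop mentioned above, using only POSIX parameter expansions. The function name extract_middle is illustrative; it reads one path per line from standard input:

```shell
#!/bin/sh
# extract_middle (hypothetical name): for each path read from stdin, drop
# everything up to the first "_" and everything from the first "." after it.
extract_middle() {
  while IFS= read -r path; do
    tmp=${path#*_}   # remove the shortest prefix ending in "_"
    tmp=${tmp%%.*}   # remove the longest suffix starting with "."
    printf '%s\n' "$tmp"
  done
}

extract_middle <<'EOF'
/my/directory/file1_AAA_123_k.txt.2
/my/directory/file2_CCC.txt
/my/directory/file2_KK_45.txt
EOF
```

IFS= keeps leading and trailing blanks in each path, and -r stops read from treating backslashes as escapes.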
One idea to parse a list of file names at the same time:
$ cat file.list
/my/directory/file1_AAA_123_k.txt.2
/my/directory/file2_CCC.txt
/my/directory/file2_KK_45.txt
$ sed -En 's/[^_]*_([^.]+).*/\1/p' file.list
AAA_123_k
CCC
KK_45
Using sed
$ sed 's/[^_]*_//;s/\..*//' input_file
AAA_123_k
CCC
KK_45
This is easy, except that it includes the initial underscore:
ls | grep -o "_[^.]*"
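If the leading underscore is unwanted, one option is to pipe the matches through cut -c2-, which drops the first character of each line. A sketch, with printf standing in for ls so the input is explicit and a hypothetical function name after_underscore:

```shell
#!/bin/sh
# after_underscore (hypothetical name): grep -o keeps the text from the
# first "_" up to the next "."; cut -c2- then removes that leading "_".
after_underscore() {
  grep -o '_[^.]*' | cut -c2-
}

printf '%s\n' file1_AAA_123_k.txt file2_CCC.txt file2_KK_45.txt |
  after_underscore
# AAA_123_k
# CCC
# KK_45
```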

Why does this regex work in grep but not sed?

I have two regular expressions:
$ grep -E '\-\- .*$' *.sql
$ sed -E '\-\- .*$' *.sql
(I am trying to grep the lines in SQL files that contain comments, and to remove those lines with sed.)
The grep command works using this regex; however, the sed returns the following error:
sed: -e expression #1, char 7: unterminated address regex
What am I doing incorrectly with sed?
(If you are unfamiliar with this type of MySQL comment: the space after the two hyphens is required.)
You're trying to use:
sed -E '\-\- .*$' *.sql
This sed command is not correct because you're not actually telling sed what to do with the matching lines.
It should be:
sed -n '/-- /p' *.sql
and equivalent grep would be:
grep -- '-- ' *.sql
or even better with a fixed string search:
grep -F -- '-- ' *.sql
The -- marks the end of grep's options, so the pattern '-- ' is not mistaken for an option.
There is no need to escape - in a regex when it is outside a bracket expression (character class), i.e. [...].
Based on the comments below, the OP's intent seems to be to remove the commented section, starting with two hyphens, from all *.sql files.
You may use this sed for that:
sed -i 's/-- .*//g' *.sql
The problem here is not the regex, the problem is that sed requires a command. The equivalent of your grep would be:
sed -n '/\-\- .*$/p'
-n suppresses the default output of every line, the slashes around the regex turn it into an address that selects matching lines, and p (after the closing slash) prints them.
P.S.: As Anub pointed out, escaping the hyphens inside the regex is unnecessary.
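To check that the grep and sed -n commands select the same lines, a small sketch that runs both on a made-up SQL snippet:

```shell
#!/bin/sh
# Made-up SQL snippet: a full-line comment, a trailing comment, and a line
# containing "--" without a following space (not a MySQL comment).
sample='SELECT 1;
-- full-line comment
SELECT 2; -- trailing comment
SELECT 3--4;'

# grep: "--" ends option parsing, so '-- ' is read as the pattern.
printf '%s\n' "$sample" | grep -- '-- '
# sed: -n suppresses normal output, /-- / is the address, p prints the match.
printf '%s\n' "$sample" | sed -n '/-- /p'
# Both commands print the second and third lines of the snippet.
```

With several *.sql files, grep prefixes each match with its file name, while sed treats the files as one continuous stream and prints the bare lines.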
You are using sed's \cregexpc address syntax: with \-... you tell sed that the delimiter you want to use is a dash (-), but you never close the regex with a second dash, hence the "unterminated address regex" error. It has to be \-...-, and you also need to add the d command to delete those lines:
sed '\-\-\-.*$-d' infile
From man sed:
\cregexpc
Match lines matching the regular expression regexp. The c may be any character.
If the default / delimiter is used, none of this is required, so:
sed '/--.*$/d' infile
or simply:
sed '/^--/d' infile
and more accurately:
sed '/^[[:blank:]]*--/d' infile
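The address forms above do not all delete the same lines. A sketch comparing them on a made-up snippet; the \-...- form with a custom delimiter behaves exactly like /--.*$/:

```shell
#!/bin/sh
sample='SELECT 1;
-- full-line comment
   -- indented comment
SELECT 2; -- trailing comment'

# Custom delimiter "-": each \- inside the address is a literal "-".
printf '%s\n' "$sample" | sed '\-\-\-.*$-d'        # keeps only "SELECT 1;"
# Default delimiter: every line containing "--" is deleted.
printf '%s\n' "$sample" | sed '/--.*$/d'           # keeps only "SELECT 1;"
# Only lines that start with "--" (the indented comment survives).
printf '%s\n' "$sample" | sed '/^--/d'             # deletes one line
# Lines whose first non-blank characters are "--".
printf '%s\n' "$sample" | sed '/^[[:blank:]]*--/d' # keeps lines 1 and 4
```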

Can grep delete context, but not a full line?

I am removing keys from a config file by the following command:
cat showrunningconfig.txt | grep -v '[ \t\r\n\v\f]*[A-Fa-f0-9][A-Fa-f0-9][A-Fa-f0-9][A-Fa-f0-9][A-Fa-f0-9][A-Fa-f0-9][A-Fa-f0-9][A-Fa-f0-9]'
This removes the whole line.
But I want to remove only the relevant patterns.
grep has the -o option, which shows only the relevant pattern and not the whole line.
But the -o option does not work in combination with -v.
Any idea?
Thanks a lot!
grep selects or rejects whole lines, and -o cannot be combined with -v to print only the non-matching parts. When you need to remove just part of a line, use sed:
sed -i 's/[[:space:]]*[[:xdigit:]]\{8\}//g' showrunningconfig.txt
s="Text A1f4E3D4 and more text"
sed 's/[[:space:]]*[[:xdigit:]]\{8\}//g' <<< "$s"
# => Text and more text
Details
-i - in-place replacement (GNU sed option)
s/[[:space:]]*[[:xdigit:]]\{8\}//g:
s - substitute command
[[:space:]]* - zero or more whitespace characters
[[:xdigit:]]\{8\} - exactly eight hex digits (A-F, a-f, 0-9)
g - replace every match on the line, not just the first
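Because of the g flag, several keys on the same line are all removed while the rest of the line stays. A sketch with a made-up config line and a hypothetical function name strip_keys, writing the cleaned text to standard output instead of editing a file in place:

```shell
#!/bin/sh
# strip_keys (hypothetical name): remove every run of 8 hex digits together
# with any whitespace directly before it; all other text is kept.
strip_keys() {
  sed 's/[[:space:]]*[[:xdigit:]]\{8\}//g'
}

printf '%s\n' 'username admin password 0123abcd key DEADBEEF' | strip_keys
# username admin password key
```

A longer hex run is only partly removed: on the line x abcdef0123 (ten hex digits), the space and the first eight digits are deleted, leaving x23.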

Replace a string between regular expressions

I have a csv file with the following contents:
INTERB-MNT,2008-09-10T21:05:38Z,2008-09-10T21:05:38Z,MARIA
How can I use sed to replace the characters 'T' and 'Z', such that the contents of the file are changed to the following?:
INTERB-MNT,2008-09-10,21:05:38,UTC,2008-09-10,21:05:38,UTC,MARIA
I tried the following, but obviously I'm missing something because that does not produce the desired results:
sed -e 's/[0-9]{4}-[0-9]{2}-[0-9]{2}.T.[0-9]{2}:[0-9]{2}:[0-9]{2}Z/[0-9]{4}-[0-9]{2}-[0-9]{2},[0-9]{2}:[0-9]{2}:[0-9]{2}UTC/g'
To keep your text after substitution, you have to capture the input with parentheses and then use \1 through \9 in the replacement to refer to the captured text. To write the parentheses and {n} intervals unescaped, as in your attempt, you need the -E or -r option; in basic regex syntax they must be written as \( \) and \{ \}.
The command will look like this:
sed -r 's/(.+)T(.+)Z/\1,\2,UTC/g'
But this can't be used: (.+) is greedy, so T matches the last T on the line and only the second timestamp gets converted. So your idea of matching the 2008-09-10 and 21:05:38 patterns explicitly is good. Combined with capture groups, you end up with this:
sed -r 's/([0-9]{4}-[0-9]{2}-[0-9]{2})T([0-9]{2}:[0-9]{2}:[0-9]{2})Z/\1,\2,UTC/g'
This works. You could also use this simpler command:
sed -r 's/(....-..-..)T(..:..:..)Z/\1,\2,UTC/g'
It is easier to read and write. Since . matches any character it is less strict, but a false positive is very unlikely with this data. Which one to use depends on your needs.
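Capture groups and \1 back-references also work without -r or -E, as long as the parentheses and intervals are escaped in basic regex syntax. A sketch of the same substitution in BRE, which should also run with a sed that lacks -r; to_csv_utc is a hypothetical function name:

```shell
#!/bin/sh
# to_csv_utc (hypothetical name): BRE form, where \( \) capture and \{n\}
# repeats; \1 is the date and \2 the time of each timestamp.
to_csv_utc() {
  sed 's/\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}\)T\([0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\}\)Z/\1,\2,UTC/g'
}

printf '%s\n' 'INTERB-MNT,2008-09-10T21:05:38Z,2008-09-10T21:05:38Z,MARIA' | to_csv_utc
# INTERB-MNT,2008-09-10,21:05:38,UTC,2008-09-10,21:05:38,UTC,MARIA
```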

Using sed to replace IP using regex

Assuming a simple text file:
123.123.123.123
I would like to replace the IP inside it with 222.222.222.222. I have tried the command below, but nothing changes, even though the same regex seems to work on Regexr:
sed -i '' 's/(\d{1,3}\.){3}\d{1,3}/222.222.222.222/' file.txt
Am I missing something?
Two problems here:
sed does not support the PCRE shorthand \d; use the range [0-9] or the POSIX class [[:digit:]] instead.
You also need the -r flag (extended regex), so that ( ) and { } act as grouping and interval operators.
This should work:
s='123.123.123.123'
sed -r 's/([0-9]{1,3}\.){3}[0-9]{1,3}/222.222.222.222/' <<< "$s"
222.222.222.222
Better still, use anchors so that only a line consisting entirely of an IP address is replaced:
sed -r 's/^([0-9]{1,3}\.){3}[0-9]{1,3}$/222.222.222.222/' <<< "$s"
PS: On OS X (BSD sed), use -E instead of -r:
sed -E 's/^([0-9]{1,3}\.){3}[0-9]{1,3}$/222.222.222.222/' <<< "$s"
222.222.222.222
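The original command used sed -i '', which is the BSD/macOS form; GNU sed does not take a separate argument after -i, so it would read '' as the (empty) script. A form that both GNU and BSD sed accept attaches a backup suffix directly to -i. A sketch, assuming mktemp is available to create a throwaway file:

```shell
#!/bin/sh
# Create a throwaway file containing only an IP address.
file=$(mktemp)
printf '%s\n' '123.123.123.123' > "$file"

# -i.bak (suffix attached, no space) works with GNU and BSD sed alike;
# it leaves a backup copy in "$file.bak", removed here once sed succeeds.
sed -i.bak -E 's/^([0-9]{1,3}\.){3}[0-9]{1,3}$/222.222.222.222/' "$file" &&
  rm -f "$file.bak"

cat "$file"   # 222.222.222.222
```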
You'd better use -r, as indicated by anubhava.
But in case you don't have it, you have to escape every (, ), { and }, and still use [0-9] instead of \d:
$ sed 's/\([0-9]\{1,3\}\.\)\{3\}[0-9]\{1,3\}/222.222.222.222/' <<< "123.123.123.123"
222.222.222.222