sed regex substitution with backreference: why is it not working?

I am trying to get the complete filenames with extension (abc.jpg) using sed.
input.txt:
sfdb/asdjfj/abc.jpg
asdfj/asdfj/abd.gif
sfdb/asdjfj/abc.jpg
sfdb/asdjfj/abc-2.jpg
asdfjk/asdjf/asdf_?/sfdb/asdjfj/abc_12.jpg
asdfj/asdfj/abdasdfj
Current command:
grep ".jpg" input.txt|sed 's:/\([^/]+\.jpg\):\1:gi'
The grep is just an example of selecting the relevant lines (it is not necessary here). I already tested my regex, which matches only the last '/' + filename + .jpg: https://www.regex101.com/r/5GTgak/2
Expected output:
abc.jpg
abc.jpg
abc-2.jpg
abc_12.jpg
But I am still getting the same input file. What am I doing wrong?

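Your command fails because sed uses POSIX basic regular expressions (BRE) by default, in which an unescaped + is a literal character, so [^/]+\.jpg never matches and every line is printed unchanged. Even with a matching pattern, only the /abc.jpg part would be replaced, leaving the directory prefix in place (sfdb/asdjfjabc.jpg). Use the following command instead, which matches the whole line and keeps only the captured filename: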
sed -rn 's/^.*\/([^/]+\.jpg)$/\1/gp' input.txt
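-r - enable extended regular expressions, so + and unescaped ( ) act as operators (GNU and BSD sed also accept -E)
-n - suppress the automatic printing of every input line
/g - apply the replacement to all matches of the regexp, not just the first (redundant here, because the anchored pattern can match only once per line)
/p - if a substitution was made, print the new pattern space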
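As a point of comparison, here is a minimal sketch that stays within POSIX basic regular expressions, so it should not need -r at all. The helper name jpg_names is hypothetical, [^/][^/]* is the BRE spelling of [^/]+, and the case-insensitive i flag is dropped, so only lowercase .jpg is matched. Lines without any slash (a bare abc.jpg) are not printed.

```shell
#!/bin/sh
# jpg_names: hypothetical helper that prints the filename of every line ending in .jpg.
# POSIX BRE only: \( \) groups, and [^/][^/]* stands in for ERE's [^/]+.
jpg_names() {
  # .*/ is greedy, so it consumes everything up to the LAST slash.
  # -n suppresses default output; the p flag prints only substituted lines.
  sed -n 's:.*/\([^/][^/]*\.jpg\)$:\1:p'
}

# Sample input taken from the question.
jpg_names <<'EOF'
sfdb/asdjfj/abc.jpg
asdfj/asdfj/abd.gif
sfdb/asdjfj/abc.jpg
sfdb/asdjfj/abc-2.jpg
asdfjk/asdjf/asdf_?/sfdb/asdjfj/abc_12.jpg
asdfj/asdfj/abdasdfj
EOF
```

With the sample input this should print abc.jpg, abc.jpg, abc-2.jpg and abc_12.jpg, matching the expected output.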

Related

Convert regex positive look ahead to sed operation

I would like to use sed to find and replace every occurrence of - with _, but only before the first occurrence of = on each line.
Here is a dataset to work with:
ke-y_0-1="foo"
key_two="bar"
key_03-three="baz-jazz-mazz"
key-="rax_foo"
key-05-five="craz-"
In the end the dataset should look like this:
ke_y_0_1="foo"
key_two="bar"
key_03_three="baz-jazz-mazz"
key_="rax_foo"
key_05_five="craz-"
I found that this regex matches properly:
\-(?=.*=)
However, the regex uses a positive lookahead, and it appears that sed (even with -E, -e or -r) does not support lookaheads.
I tried the following, but I keep getting "Invalid preceding regular expression":
cat dataset.txt | sed -r "s/-(?=.*=)/_/g"
Is it possible to convert this in a usable way with sed?
Note, I do not want to use perl. However I am open to awk.
You can use
sed ':a;s/^\([^=]*\)-/\1_/;ta' file
See the online demo:
#!/bin/bash
s='ke-y_0-1="foo"
key_two="bar"
key_03-three="baz-jazz-mazz"
key-="rax_foo"
key-05-five="craz-"'
sed ':a; s/^\([^=]*\)-/\1_/;ta' <<< "$s"
Output:
ke_y_0_1="foo"
key_two="bar"
key_03_three="baz-jazz-mazz"
key_="rax_foo"
key_05_five="craz-"
Details:
:a - setting a label named a
s/^\([^=]*\)-/\1_/ - from the start of the string, matches zero or more chars other than = (capturing them into Group 1, \1) followed by a - char, and replaces the match with the Group 1 value (\1) and a _ (which replaces the found - char). Since [^=]* cannot cross an =, only hyphens before the first = are affected.
ta - jump to the label a if the substitution succeeded. Otherwise, stop processing the line.
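The :a;...;ta one-liner relies on GNU sed ending a label name at the ;. POSIX only guarantees that a newline ends a label, so a sketch that should also run on BSD/macOS sed passes each command as a separate -e argument. The helper name dash_to_underscore is hypothetical:

```shell
#!/bin/sh
# dash_to_underscore: hypothetical helper; replaces every - before the first = on each line.
dash_to_underscore() {
  # Each -e ends with an implicit newline, which terminates the label name
  # portably (GNU sed also accepts ':a;...', but POSIX does not require it).
  # [^=]* cannot cross an =, so hyphens after the first = are never touched.
  # Because [^=]* is greedy, each pass replaces the LAST such hyphen; ta loops
  # back to :a until a pass makes no substitution.
  sed -e ':a' -e 's/^\([^=]*\)-/\1_/' -e 'ta'
}

dash_to_underscore <<'EOF'
ke-y_0-1="foo"
key_two="bar"
key_03-three="baz-jazz-mazz"
key-="rax_foo"
key-05-five="craz-"
EOF
```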
You might also use awk, setting the field separator to = and replacing all - with _ in the first field.
To print only the replaced lines:
awk 'BEGIN{FS=OFS="="}gsub("-", "_", $1)' file
Output:
ke_y_0_1="foo"
key_03_three="baz-jazz-mazz"
key_="rax_foo"
key_05_five="craz-"
If you want to print all lines:
awk 'BEGIN{FS=OFS="="}{gsub("-", "_", $1);print}' file
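One detail worth checking is what happens when the value itself contains =. Because OFS is also =, modifying $1 makes awk rebuild the record by joining all fields with =, so later = signs survive. A quick sketch with a hypothetical sample line:

```shell
#!/bin/sh
# Hypothetical sample line: the value itself contains '=' and '-'.
line='my-key="a=b-c"'

# Modifying $1 (via gsub) makes awk rebuild $0 from all fields joined
# with OFS, so the second '=' is restored and only the key is changed.
printf '%s\n' "$line" | awk 'BEGIN{FS=OFS="="}{gsub("-", "_", $1); print}'
```

This should print my_key="a=b-c": the hyphen inside the quoted value is left alone.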

Use "sed" to Remove Capture Group 1 From All Lines In a File

I currently have a file with lines like the below:
ABCD123RTY,steve_tyler#gmail.com,10.20.30.142,2021-08-20T14:49:51.035Z
ABCD123QWE,thisguy#hotmail.com,10.20.30.245,2021-08-20T14:10:22.254Z
ABCD123DFG,calvin_hobbes2#netnet,10.20.30.l6,2021-08-20T15:30:34.480Z
My goal is to remove everything from the "#" to the next comma, such that it instead looks like the below:
ABCD123RTY,steve_tyler,10.20.30.142,2021-08-20T14:49:51.035Z
ABCD123QWE,thisguy,10.20.30.245,2021-08-20T14:10:22.254Z
ABCD123DFG,calvin_hobbes2,10.20.30.l6,2021-08-20T15:30:34.480Z
I'm not very experienced with sed and regular expressions. While experimenting on a testing website, I came up with the RegEx below, in which capture group 1 matches exactly what I want to remove:
regex101.com Test
How would I go about putting this in a sed command that reads a given input file and writes the results to a new output file? Most recently I tried:
sed 's/(#.+?),//' input.csv > input_Corrected.csv
Just as another note, I'm doing this in a bash script in which I have an API call generating the "input.csv" file, and then want to run this sed command to clean up the data format to match my needs.
You can use
sed 's/#[^,]*,/,/' input.csv > input_Corrected.csv
sed 's/#[^,]*//' input.csv > input_Corrected.csv
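The #[^,]*, POSIX BRE pattern matches a #, then zero or more chars other than a comma, and then a comma; the replacement puts the comma back. Use the first command if there MUST be a comma after the match. The second command drops the trailing comma from the pattern, so its replacement is empty and it also handles a # part at the very end of a line. Your attempt did not work because in BRE the characters (, ), + and ? are literal, and POSIX regular expressions have no lazy quantifiers such as +?; a negated bracket expression like [^,]* is the usual way to stop at the next comma.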
See the online demo:
s='ABCD123RTY,steve_tyler#gmail.com,10.20.30.142,2021-08-20T14:49:51.035Z
ABCD123QWE,thisguy#hotmail.com,10.20.30.245,2021-08-20T14:10:22.254Z
ABCD123DFG,calvin_hobbes2#netnet,10.20.30.l6,2021-08-20T15:30:34.480Z'
sed 's/#[^,]*,/,/' <<< "$s"
Output:
ABCD123RTY,steve_tyler,10.20.30.142,2021-08-20T14:49:51.035Z
ABCD123QWE,thisguy,10.20.30.245,2021-08-20T14:10:22.254Z
ABCD123DFG,calvin_hobbes2,10.20.30.l6,2021-08-20T15:30:34.480Z
You can use the regular expression below to remove the part after # only when it looks like an email domain (name.tld):
sed -E 's/#[a-zA-Z0-9_.-]+\.[a-zA-Z]{2,5}//g' input.csv > input_Corrected.csv
(-E is needed for + and {2,5}. Inside a bracket expression a backslash is literal, so - is placed last instead of being escaped.)
To match your stated requirement, use the command below instead. It removes everything from # up to the next comma on every line, including values such as "calvin_hobbes2#netnet" that are not valid email addresses.
sed "s/#[^,]*//g" input.csv > input_Corrected.csv
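To compare the email-only approach with the catch-all one on the question's data, here is a sketch. It assumes a sed that supports -E (GNU and BSD sed both do) and writes the domain pattern as a valid ERE, with - placed last in the bracket expression instead of escaped:

```shell
#!/bin/sh
data='ABCD123QWE,thisguy#hotmail.com,10.20.30.245,2021-08-20T14:10:22.254Z
ABCD123DFG,calvin_hobbes2#netnet,10.20.30.l6,2021-08-20T15:30:34.480Z'

# Removes only '#name.tld'-shaped parts; '#netnet' has no dot, so line 2 is kept as-is.
printf '%s\n' "$data" | sed -E 's/#[a-zA-Z0-9_.-]+\.[a-zA-Z]{2,5}//g'

# Removes everything from '#' up to the next comma on every line.
printf '%s\n' "$data" | sed 's/#[^,]*//g'
```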

regex that works with sed not honored with ${var//search/replace}

I am trying to simply do a regular expression replace in bash but cannot figure it out. In my test, I would like the following string transformed:
test_data(123)
to
test_xyz
I've tried the following:
echo "test_data(123)" | sed -e 's/.*\(data(.*)\).*/xyz/g'
And that gets me: xyz
Then I tried:
var=${"test_data(123)"//.*\(data(.*)\).*/xyz}
But I get a "bad substitution" error.
How do I get my desired results on the regex replace in bash?
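${foo//$match/$replace} uses fnmatch (glob-style) patterns, not BRE, ERE, PCRE or any other conventional regex syntax. It also only works on a parameter: ${"test_data(123)"//...} is a bad substitution because a literal string is not a variable name. Store the string in a variable and use a glob pattern instead: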
input="test_data(123)"
match='data(*)'
replace='xyz'
result=${input//$match/$replace}
echo "$result"
...properly emits test_xyz.
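If you do need real regular expressions in bash, one option (a different technique from parameter expansion) is the [[ string =~ regex ]] test, which uses POSIX extended regexes and stores capture groups in the BASH_REMATCH array. A minimal sketch, where the function name regex_replace_data is hypothetical and the pattern is tailored to the data(...) example:

```shell
#!/usr/bin/env bash
# regex_replace_data: hypothetical helper that replaces one "data(...)" occurrence.
# With several occurrences, the greedy leading (.*) makes it the LAST one.
regex_replace_data() {
  local input=$1 replacement=$2
  # Keep the regex in a variable: quoting it inside [[ ]] would make it literal.
  local re='^(.*)data\([^)]*\)(.*)$'
  if [[ $input =~ $re ]]; then
    # BASH_REMATCH[1] is the text before the match, [2] the text after it.
    printf '%s\n' "${BASH_REMATCH[1]}${replacement}${BASH_REMATCH[2]}"
  else
    # No match: print the input unchanged.
    printf '%s\n' "$input"
  fi
}

regex_replace_data 'test_data(123)' 'xyz'
```

This should print test_xyz, the same result as the glob-based expansion.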

sed regular expression does not work as expected. Differs on pipe and file

I have a string in a text file in which I want to replace the version number. The quotation marks can be either ' or ", and there may or may not be spaces around =:
$data['MODULEXXX_VERSION'] = "1.0.0";
For testing i use
echo "_VERSION'] = \"1.1.1\"" | sed "s/\(_VERSION.*\)[1-9]\.[1-9]\.[1-9]/\11.1.2/"
which works perfectly.
When I change it to search in the file (which contains the same string):
sed "s/\(_VERSION.*\)[1-9]\.[1-9]\.[1-9]/\11.1.2/" -i test.php
it does not find anything.
After playing with the search part of the regex, I found one more odd thing:
sed "s/\(_VERSION.*\)[1-9]\./\1***/" -i test.php
works and changes the string to $data['MODULEXXX_VERSION'] = "***0.0";, but
sed "s/\(_VERSION.*\)[1-9]\.[1-9]/\1***/" -i test.php
does not find anything anymore. Why?
I am using Ubuntu 17.04 desktop.
Can anyone explain what I am doing wrong? What would be the best command for replacing the version number in the file for the string $data['MODULEXXX_VERSION'] = "***0.0";?
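The main problem is that [1-9] doesn't match the 0s in the version number; you need [0-9]. Your echo test worked only because "1.1.1" contains no zeros, whereas the file contains "1.0.0". The same explains your later experiments: [1-9]\. matches the "1." of "1.0.0", but [1-9]\.[1-9] would need a non-zero digit after that dot.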
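Keeping the original approach but with [0-9], a sketch could look like this. The helper name set_version is hypothetical, and the new version is assumed to contain only digits and dots, because characters such as / or & are special in a sed replacement. [^0-9]* is used instead of .* so that the group stops right before the first digit and multi-digit parts such as 10.2.3 are matched in full:

```shell
#!/bin/sh
# set_version: hypothetical helper; reads stdin, writes stdout.
# $1 is the new version and is assumed to contain only digits and dots.
set_version() {
  # [0-9][0-9]* is the BRE spelling of "one or more digits", including 0.
  # In the replacement, \1 is followed by the literal new version
  # (sed back-references are a single digit, so \11.0.1 means \1 + "1.0.1").
  sed "s/\(_VERSION[^0-9]*\)[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*/\1$1/"
}

set_version 1.0.1 <<'EOF'
$data['MODULEXXX_VERSION'] = "1.0.0";
$data['MODULEXXX_VERSION']='10.2.3';
EOF
```

For the file itself, the same sed expression can be combined with -i, as in the question.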
Besides that, you may use the following sed command:
sed -r 's/(.*_VERSION['\''"]]\s*=\s*).*/\1"1.0.1";/' conf.php
This doesn't look at the current value; it simply replaces everything after the =.
I've used -r, which enables POSIX extended regular expressions and makes the pattern a bit simpler to write. Note that \s is a GNU sed extension.
Another, probably cleaner approach is to store conf.php as a template such as conf.php.tpl and render the file with a template engine. If you really want to use sed, the template may look like:
$data['FOO_VERSION'] = "FOO_VERSION_TPL";
Then just use:
sed 's/FOO_VERSION_TPL/1.0.1/' conf.php.tpl > conf.php
If there are multiple values to replace:
sed \
-e 's/FOO/BAR/' \
-e 's/HELLO/WORLD/' \
conf.php.tpl > conf.php
But I recommend a template engine instead of sed. That becomes more important when the content of the variables to replace may contain characters special to regular expressions.

search and replace substring in string in bash

I have the following task:
I have to replace several links, but only the links that end with .do.
Important: the files have also other links within, but they should stay untouched.
<li>Einstellungen verwalten</li>
to
<li>Einstellungen verwalten</li>
So I have to search for links ending in .do, take the part before .do and remember it (for example as $a), replace the whole link with
<s:url action=' '/>
and paste $a between the quotes.
I thought about sed, but as far as I know sed only searches for a whole string and replaces it completely.
I also tried bash parameter expansions in combination with sed, but ran into several problems with the quotes and the variables.
cat ./src/main/webapp/include/stoBox2.jsp | grep -e '<a href=".*\.do">' | while read a;
do
b=${a#*href=\"};
c=${b%.do*};
sed -i 's/href=\"$a.do\"/href=\"<s:url action=\'$a\'/>\"/g' ./src/main/webapp/include/stoBox2.jsp;
done;
Any ideas?
Thanks a lot.
sed -i 's#href="\(.*\)\.do"#href="<s:url action='"'\1'"'/>"#g' ./src/main/webapp/include/stoBox2.jsp
The escaped parentheses \( \) capture the link without .do. The alternating single and double quotes split the sed script into three adjacent shell words, which the shell joins into one argument; the middle, double-quoted word is what puts literal single quotes around \1 in the replacement:
's#href="\(.*\)\.do"#href="<s:url action='
"'\1'"
'/>"#g'
The -i option modifies your file directly. If you don't want that, remove it and redirect the results to a temporary file with > tmp.
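Here is a sketch of the same quoting trick wrapped in a hypothetical helper that reads stdin instead of editing the file in place. It swaps \(.*\) for \([^"]*\) so the capture cannot run past the closing quote of href when a line holds several links; the sample link path is made up:

```shell
#!/bin/sh
# rewrite_do_links: hypothetical helper; reads HTML on stdin, writes the rewritten HTML to stdout.
rewrite_do_links() {
  # [^"]* (rather than .*) stops the capture at the closing quote of href.
  # The '...'"'\1'"'...' quoting yields the sed script:
  #   s#href="\([^"]*\)\.do"#href="<s:url action='\1'/>"#g
  sed 's#href="\([^"]*\)\.do"#href="<s:url action='"'\1'"'/>"#g'
}

printf '%s\n' '<li><a href="/admin/settings.do">Settings</a></li>' | rewrite_do_links
```

This should print <li><a href="<s:url action='/admin/settings'/>">Settings</a></li>, while links that do not end in .do pass through untouched.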
Try this one:
sed -i "s%\(href=\"\)\([^\"]\+\)\.do%\1<s:url action='\2'/>%g" \
./src/main/webapp/include/stoBox2.jsp;
You can capture patterns with escaped parentheses (\( and \)) and use them in the replacement pattern.
Here I capture a string that contains no " and precedes .do (\([^\"]\+\)\.do), and insert it without the .do suffix (\2). Note that \+ is a GNU sed extension to BRE.
There is a / in the replacement, so I used % instead of the traditional / to delimit the expressions.