Pattern match and add a word at the end or start of a line in a text file using sed - regex

I have a text file which contains:
First link https://cdn.shopify.com/s/files/1/0151/0741/products/2c60070615ceaa44c934ca876fe4ccc0_f304e840-bb1d-4bcf-a993-d966c0b99ae3.jpeg?v=1452842355
Second link https://cdn.shopify.com/s/files/1/0151/0741/products/549542c704da78a0e5208b9f8c2cd26e.jpeg?v=1452842263
Third link https://cdn.shopify.com/s/files/1/0151/0741/products/2c60070615ceaa44c934ca876fe4ccc0_70e7e6b9-bedd-40a7-b322-542facf94c05.jpeg?v=1452842230
Fourth link https://cdn.shopify.com/s/files/1/0151/0741/products/2c60070615ceaa44c934ca876fe4ccc0_5485fd04-c852-4fd7-b142-92595329568a.jpeg?v=1452841841
Last link https://cdn.shopify.com/s/files/1/0151/0741/products/2c60070615ceaa44c934ca876fe4ccc0_fb613b45-fbbb-4b6d-b9c0-45d7f069879e.jpeg?v=1452841831
I want to match the last URL and append a word at the start or end of its line using sed.
But it is not working. The command below gives this error:
$sed -e 's_https://cdn.shopify.com/s/files/1/0151/0741/products/2c60070615ceaa44c934ca876fe4ccc0_f304e840-bb1d-4bcf-a993-d966c0b99ae3.jpeg\?v=1452842355 .*_& NOTFOUND_'
sed: -e expression #1, char 148: unknown option to `s'

Unfortunately, sed is not the best tool for this task. There is no way to pass a plain, non-regex string as a sed pattern without doing all the escaping beforehand.
Better to use awk for this:
awk 'index($0, "https://cdn.shopify.com/s/files/1/0151/0741/products/2c60070615ceaa44c934ca876fe4ccc0_fb613b45-fbbb-4b6d-b9c0-45d7f069879e.jpeg?v=1452841831"){
$0 = $0 " NOTFOUND"} 1' file
The index function checks whether the given URL occurs in the record. If it does, the string " NOTFOUND" is appended at the end. The trailing 1 then prints every record.
The equivalent working sed command would be:
sed 's~https://cdn\.shopify\.com/s/files/1/0151/0741/products/2c60070615ceaa44c934ca876fe4ccc0_fb613b45-fbbb-4b6d-b9c0-45d7f069879e\.jpeg?v=1452841831.*~& NOTFOUND~' file
As you can see, it requires you to escape all the dots and to pick a regex delimiter that does not already appear in the input string.
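If you do want to stay with sed, the escaping can be automated. Below is a minimal sketch, not a robust library. The helper names sed_escape_bre and append_if_contains are hypothetical. It assumes that the search string contains no newline, that / is the s delimiter (so / is escaped as well), and that the appended word contains no /, & or \.

```shell
#!/bin/sh
# Hypothetical helper: backslash-escape the characters that are special in a
# POSIX BRE ( [ \ . * ^ $ ) plus '/', the delimiter used below.
# A lone ']' and '?' are ordinary characters in a BRE, so they are left alone.
sed_escape_bre() {
  printf '%s\n' "$1" | sed 's|[[\.*^$/]|\\&|g'
}

# Hypothetical helper: read stdin and append " $2" to every line that
# contains the literal string $1 (the word in $2 must not contain / & or \).
append_if_contains() {
  needle=$(sed_escape_bre "$1")
  sed "s/$needle.*/& $2/"
}
```

For example, `append_if_contains "$url" NOTFOUND < file` would tag the line holding that URL. Because the dots are escaped, a line containing xzy is not mistaken for a search string of x.y.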

Why are you using _ as your regex delimiter, when that char shows up in the URLs?
[..snip..]/products/2c60070615ceaa44c934ca876fe4ccc0_fb613b45-fb
                                                    ^---
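So sed ends the regex at that underscore and takes everything up to the next _ as the replacement. It then tries to parse the rest (& NOTFOUND_) as flags of the s command.
& is not a valid flag for s, which is what "unknown option to `s'" at char 148 is reporting.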

The pattern contains an underscore (...fe4ccc0_f304...), and you used that same character as the delimiter for the substitute command. Use some other delimiter that does not appear unescaped in the pattern or the replacement string.
Try using the | character instead, as in s|https://... .*$|& NOT_FOUND|.
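Here is that advice applied to the last URL from the question. This is a minimal sketch that assumes the input arrives on stdin and that NOTFOUND is the word to add. The function names append_word and prepend_word are hypothetical. The dots in the URL are escaped so that they match only a literal dot.

```shell
#!/bin/sh
# '|' never appears in these URLs, so it is a safe delimiter for s.
# Dots are escaped; '?' is an ordinary character in a BRE.
url='https://cdn\.shopify\.com/s/files/1/0151/0741/products/2c60070615ceaa44c934ca876fe4ccc0_fb613b45-fbbb-4b6d-b9c0-45d7f069879e\.jpeg?v=1452841831'

# Hypothetical helper: add the word at the end of the matching line.
append_word() {
  sed "s|.*$url.*|& NOTFOUND|"
}

# Hypothetical helper: add the word at the start of the matching line.
prepend_word() {
  sed "s|.*$url.*|NOTFOUND &|"
}
```

Because the regex covers the whole line (.* on both sides), & is the entire line. Placing the word before or after & therefore decides whether it lands at the start or the end.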


Convert regex positive look ahead to sed operation

I would like to use sed to find and replace every occurrence of - with _, but only before the first occurrence of = on each line.
Here is a dataset to work with:
ke-y_0-1="foo"
key_two="bar"
key_03-three="baz-jazz-mazz"
key-="rax_foo"
key-05-five="craz-"
In the end the dataset should look like this:
ke_y_0_1="foo"
key_two="bar"
key_03_three="baz-jazz-mazz"
key_="rax_foo"
key_05_five="craz-"
I found that this regex matches properly:
\-(?=.*=)
However, the regex uses a positive lookahead, and it appears that sed (even with -E, -e or -r) does not support lookaheads.
I tried the following, but I keep getting "Invalid preceding regular expression":
cat dataset.txt | sed -r "s/-(?=.*=)/_/g"
Is it possible to convert this in a usable way with sed?
Note: I do not want to use perl, but I am open to awk.
You can use
sed ':a;s/^\([^=]*\)-/\1_/;ta' file
See the online demo:
#!/bin/bash
s='ke-y_0-1="foo"
key_two="bar"
key_03-three="baz-jazz-mazz"
key-="rax_foo"
key-05-five="craz-"'
sed ':a; s/^\([^=]*\)-/\1_/;ta' <<< "$s"
Output:
ke_y_0_1="foo"
key_two="bar"
key_03_three="baz-jazz-mazz"
key_="rax_foo"
key_05_five="craz-"
Details:
:a - sets a label named a
s/^\([^=]*\)-/\1_/ - matches zero or more chars other than = from the start of the string (capturing them into Group 1, \1), followed by a - char. It replaces the match with the Group 1 value (\1) and a _ (which replaces the found - char). Because [^=]* is greedy, each pass replaces the last - before the first =.
ta - jumps to label a if a substitution was made. Otherwise, the loop ends and the line is printed.
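The one-liner above relies on GNU sed accepting ; after a label. Some other seds (BSD/macOS sed, for example) read a label name up to the end of the line. A more portable way to write the same loop is to pass each command with its own -e. Here is a sketch, wrapped in a hypothetical function that reads stdin:

```shell
#!/bin/sh
# Same loop as above: while there is a '-' before the first '=',
# replace the last such '-' with '_' and branch back to label a.
dash_to_underscore_in_key() {
  sed -e ':a' -e 's/^\([^=]*\)-/\1_/' -e 'ta'
}
```

Multiple -e arguments are joined with newlines, so the label a is terminated cleanly in every POSIX sed.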
You might also use awk, setting the field separator to = and replacing all - with _ in the first field.
To print only the replaced lines:
awk 'BEGIN{FS=OFS="="}gsub("-", "_", $1)' file
Output
ke_y_0_1="foo"
key_03_three="baz-jazz-mazz"
key_="rax_foo"
key_05_five="craz-"
If you want to print all lines:
awk 'BEGIN{FS=OFS="="}{gsub("-", "_", $1);print}' file

How to find and replace a pattern string using sed/perl/awk?

I have a file foo.properties with contents like
foo=bar
# another property
test=true
allNames=alpha:.02,beta:0.25,ph:0.03,delta:1.0,gamma:.5
In my script, I need to replace whatever value is set for ph (the current value is unknown to the bash script) and change it to 0.5, so the file should look like:
foo=bar
# another property
test=true
allNames=alpha:.02,beta:0.25,ph:0.5,delta:1.0,gamma:.5
I know it can easily be done if the current value is known, by using:
sed "s/\,ph\:0.03\,/\,ph\:0.5\,/" foo.properties
But in my case, I have to read the contents of allNames, search for the value and then replace it within a for loop. Everything else is taken care of, but I can't figure out the sed/perl command for this.
I tried using sed "s/\,ph\:.*\,/\,ph\:0.5\,/" foo.properties and some variations, but it didn't work.
A simpler sed solution:
sed -E 's/([=,]ph:)[0-9.]+/\10.5/g' file
foo=bar
# another property
test=true
allNames=alpha:.02,beta:0.25,ph:0.5,delta:1.0,gamma:.5
Here we match ([=,]ph:) (i.e. , or = followed by ph:) and capture it in group #1. It must be followed by one or more [0-9.] characters, which match any number. In the replacement, we put \1 back, followed by 0.5.
With your shown samples, please try the following awk code.
awk -v new_val="0.5" '
match($0,/,ph:[0-9]+(\.[0-9]+)?/){
val=substr($0,RSTART+1,RLENGTH-1)
sub(/:.*/,":",val)
print substr($0,1,RSTART) val new_val substr($0,RSTART+RLENGTH)
next
}
1
' Input_file
Detailed explanation: Create an awk variable named new_val that holds the new value to put in. In the main program, use awk's match function to match the regex ,ph:[0-9]+(\.[0-9]+)? on each line. If a match is found, store the matched text (without its leading comma) in the variable val. Then replace everything from : to the end of val with a single :, leaving ph:. Next, print three pieces: the part of the line up to and including the comma, the edited val followed by the new value, and the rest of the line. next skips the remaining code for that line, and the final 1 prints all other lines (those without a match) unchanged.
2nd solution: using awk's sub function.
awk -v newVal="0.5" '/^allNames=/{sub(/,ph:[^,]*/,",ph:"newVal)} 1' Input_file
Would you please try a perl solution:
perl -pe '
s/(?<=\bph:)[\d.]+(?=,|$)/0.5/;
' foo.properties
The -pe option makes perl read the input line by line, perform
the operation, then print it, as sed does.
The regex (?<=\bph:) is a zero-length lookbehind which matches
the string ph: preceded by a word boundary.
The regex [\d.]+ will match a decimal number.
The regex (?=,|$) is a zero-length lookahead which matches
a comma or the end of the string.
As the lookbehind and the lookahead have zero length, they are not
substituted by the s/../../ operator.
[Edit]
As Dave Cross comments, the lookahead (?=,|$) is unnecessary as long as the input file is correctly formatted.
Works with decimal place or not, or no value, anywhere in the line.
sed -E 's/(^|[^-_[:alnum:]])ph:[0-9]*(\.[0-9]+)?/\1ph:0.5/g'
Or possibly:
sed -E 's/(^|[=,[:space:]])ph:[0-9]+(\.[0-9]+)?/\1ph:0.5/g'
The top one uses "not other naming characters" to describe the character immediately before a name. The bottom one uses delimiter characters (you could add more characters to either). The purpose is to avoid clashing with other_ph or autograph. The character before ph: is captured in group 1 and put back with \1, so the , or = in front of ph: is preserved.
Here you go
#!/usr/bin/perl
use strict;
use warnings;
print "\nPerl Starting ... \n\n";
while (my $recordLine =<DATA>)
{
chomp($recordLine);
if (index($recordLine, "ph:") != -1)
{
$recordLine =~ s/ph:.*?,/ph:0.5,/g;
print "recordLine: $recordLine ...\n";
}
}
print "\nPerl End ... \n\n";
__DATA__
foo=bar
# another property
test=true
allNames=alpha:.02,beta:0.25,ph:0.03,delta:1.0,gamma:.5
output:
Perl Starting ...
recordLine: allNames=alpha:.02,beta:0.25,ph:0.5,delta:1.0,gamma:.5 ...
Perl End ...
Using any sed in any shell on every Unix box (the other sed solutions posted that use sed -E require GNU or BSD seds):
a) if ph: is never the first tag in the allNames list (as shown in your sample input):
$ sed 's/\(,ph:\)[^,]*/\10.5/' foo.properties
foo=bar
# another property
test=true
allNames=alpha:.02,beta:0.25,ph:0.5,delta:1.0,gamma:.5
b) or if it can be first:
$ sed 's/\([,=]ph:\)[^,]*/\10.5/' foo.properties
foo=bar
# another property
test=true
allNames=alpha:.02,beta:0.25,ph:0.5,delta:1.0,gamma:.5
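The question mentions setting values from a for loop. The sketch below wraps option b) in a shell function so that the tag name and the new value can come from variables. The function name set_tag_value is hypothetical. It assumes the tag name contains no regex metacharacters and the new value contains no /, & or \ characters.

```shell
#!/bin/sh
# Hypothetical helper: on stdin, replace the value of tag $1 inside a
# key=tag:value,tag:value,... line with $2, using only POSIX BRE features.
# [,=] in front of the tag avoids matching names such as "graph:".
set_tag_value() {
  tag=$1 new=$2
  # \1 keeps the ",ph:" (or "=ph:") prefix; [^,]* is the old value.
  sed "s/\([,=]$tag:\)[^,]*/\1$new/"
}
```

For example, set_tag_value ph 0.5 < foo.properties prints the edited file. Redirect it to a temporary file and move that back if foo.properties itself must change.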

Can sed replace words only inside a pattern-matched substring on one line?

Original line in file sed.txt:
outer_string_PATTERN_string(PATTERN_And_PATTERN_PATTERN_i)PATTERN_outer_string(i_PATTERN_inner)_outer_string
I only need to replace PATTERN with pattern where it appears inside brackets. This is not just about lowercasing: the replacement could be some other word.
Expected result:
outer_string_PATTERN_string(pattern_And_pattern_pattern_i)PATTERN_outer_string(i_pattern_inner)_outer_string
I can use the ([^)]*) pattern to find the substrings in which words should be replaced. However, I can't use it to address only those substrings, so my attempt replaces every PATTERN on the whole line with pattern:
:/tmp$ sed 's/([^)]*)/---/g' sed.txt
outer_string_PATTERN_string---PATTERN_outer_string---_outer_string
:/tmp$ sed '/([^)]*)/s/PATTERN/pattern/g' sed.txt
outer_string_pattern_string(pattern_And_pattern_pattern_i)pattern_outer_string(i_pattern_inner)_outer_string
I also tried to use the regex group in sed to capture and replace the words, but I can't figure out the command.
Can sed do that, and if so, how? Thanks.
Can sed implement that?
It can be done using GNU sed and basic regular expressions (BRE):
sed '
s/)/)\n/g
:1
s/\(([^)]*\)PATTERN\([^)]*)\n\)/\1pattern\2/
t1
s/\n//g
' < file
where
1st s inserts a newline after each )
2nd s replaces the last (* is greedy) PATTERN inside ()s with pattern
t loops back if a substitution was made
3rd s strips all inserted newlines
EDIT
2nd substitute command edited according to OP's suggestion
since there is no need to match \n inside ().
Can sed implement that?
Yes. But you do not want to do it in sed. Use another programming language, like Python, Perl, or awk.
how to achieve that?
Implementing a non-greedy regex is not simple in sed. In general, it consists of:
taking a chunk of the input
processing the chunk
putting it in the hold space
shuffling the hold and pattern spaces to separate what has already been processed from what has not
repeating
shuffling with the hold space
outputting the result
Anyway, the following script:
#!/bin/bash
sed <<<'outer_string_PATTERN_string(PATTERN_i_PATTERN_PATTERN_i)PATTERN_outer_string(i_PATTERN_inner)_outer_string' '
:loop;
/\([^(]*\)\(([^)]*)\)\(.*\)/{
# Lowercase the second part.
s//\1\L\2\E\n\3/;
# Mix with hold space.
G;
s/\(.*\)\n\(.*\)\n\(.*\)/\3\1\n\2/;
# Put processed stuff into hold space
h; s/\n.*//; x;
# Process the other stuff again.
s/.*\n//;
bloop;
};
# Is hold space empty?
x; /^$/!{
# Pattern space has trailing stuff - add it.
G; s/\n//;
# We will print it.
h;
# Clear hold space
s/.*//
};x;
'
outputs:
PATTERN_outer_string(i_pattern_inner)outer_string_PATTERN_string(pattern_i_pattern_pattern_i)_outer_string
As an alternative, it is easier to do this in GNU awk with an RS that matches a (...) substring:
awk -v RS='\\([^)]+)' '{gsub(/PATTERN/, "pattern", RT); ORS=RT} 1' file
outer_string_PATTERN_string(pattern_i_pattern_pattern_i)PATTERN_outer_string(i_pattern_inner)_outer_string
Steps:
RS='\\([^)]+)' sets the record separator to a regex that matches a (...) string, and GNU awk stores the text it matched in RT
gsub then replaces PATTERN with pattern in the matched text, i.e. RT
ORS=RT sets ORS as the new modified RT
1 prints each record to stdout
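RT is a GNU awk extension. If only a POSIX awk is available, you can get a similar result by walking each line with match() and editing only the (...) parts. The function name replace_in_parens is hypothetical. It assumes parentheses are not nested. It treats its first argument as a regex and its second as a gsub replacement, so an & in the replacement would insert the matched text.

```shell
#!/bin/sh
# Hypothetical helper: inside every (...) group on a line, replace regex $1
# with $2. Text outside the parentheses is copied unchanged.
replace_in_parens() {
  awk -v from="$1" -v to="$2" '{
    out = ""
    rest = $0
    # Find the next "(" ... ")" group in the not-yet-processed text.
    while (match(rest, /\([^)]*\)/)) {
      seg = substr(rest, RSTART, RLENGTH)
      gsub(from, to, seg)
      # Keep the text before the group as-is, then the edited group.
      out = out substr(rest, 1, RSTART - 1) seg
      rest = substr(rest, RSTART + RLENGTH)
    }
    print out rest
  }'
}
```

For example, replace_in_parens PATTERN pattern < sed.txt should produce the expected result shown in the question.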
Another alternative solution using lookahead assertion in a perl regex:
perl -pe 's/PATTERN(?=[^()]*\))/pattern/g' file
Solved by this:
:/tmp$ sed 's/(/\n(/g' sed.txt | sed 's/)/)\n/g' | sed '/([^)]*)/s/PATTERN/pattern/g' | sed ':a;N;$!ba;s/\n//g'
outer_string_PATTERN_string(pattern_And_pattern_pattern_i)PATTERN_outer_string(i_pattern_inner)_outer_string
put each (...) group on its own line
on the lines containing (...), replace PATTERN with pattern
join all the lines back into one line
Thanks to the answers to "How can I replace a newline (\n) using sed?"

How to find/extract a pattern from a file?

Here are the contents of my text file named 'temp.txt'
---start of file ---
HEROKU_POSTGRESQL_AQUA_URL (DATABASE_URL) ----backup---> b687
Capturing... done
Storing... done
---end of file ----
I want to write a bash script that captures the string 'b687' in a variable. This is really a pattern: the letter 'b' followed by n digits. I can do it the hard way by looping through the file and extracting the desired string (b687 in the example above). Is there an easy way to do so, perhaps using awk or sed?
Try using grep
v=$(grep -oE '\bb[0-9]{3}\b' file)
This searches for a whole word made of b followed by exactly 3 digits.
Using sed
v=$(sed -nr 's/.*\b(b[0-9]{3})\b.*/\1/p' file)
varname=$(awk '/HEROKU_POSTGRESQL_AQUA_URL/{print $4}' filename)
This reads the file and, for each line matching HEROKU_POSTGRESQL_AQUA_URL, prints the 4th whitespace-separated field, in this case b687.
Your other option is to use sed:
varname=$(sed -n 's/.* \(b[0-9][0-9]*\)/\1/p' filename)
In this case we look for the pattern you mentioned, b followed by digits, and print only that. The -n option tells sed not to print lines by default, so lines without the pattern produce no output. The rest of the command is a substitution. .* matches any string at the beginning of the line. It is followed by a space and a \(...\) group containing the regex for b and one or more digits. The replacement \1 keeps only group 1, and the p flag at the end prints the result (since -n disabled the default printing).
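The grep and sed answers above assume either exactly 3 digits or a space before the b. If the number of digits can vary, a portable alternative is to scan the whitespace-separated fields with awk and stop at the first one that is b followed only by digits. The function name first_backup_id is hypothetical.

```shell
#!/bin/sh
# Print the first whitespace-separated field that is "b" followed by
# one or more digits, then stop reading. Reads the given files or stdin.
first_backup_id() {
  awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^b[0-9]+$/) { print $i; exit } }' "$@"
}
```

Usage: v=$(first_backup_id temp.txt)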

Search and replace patterns on multiple line

I have a pattern like
Fixed pattern
text which can change(world)
I want to replace this with
Fixed pattern
text which can change(hello world)
What I am trying to use
cat myfile | sed -e "s#\(Fixed Pattern$A_Z_a_z*\(\)#\1 hello#g > newfile
UPDATE:
The word world above is also a variable and will change.
Basically, I want to add hello after the first opening parenthesis that follows the expression.
Thanks in advance.
Assuming your goal is to add 'hello ' right after the first opening parenthesis on the line after 'Fixed pattern', here is a solution that should work:
sed -e '/^Fixed pattern$/!b' -e 'n' -e 's/(/(hello /' myfile
Here is an explanation of each portion:
/^Fixed pattern$/!b # skip all of the following commands if 'Fixed pattern'
# doesn't match
n # if 'Fixed pattern' did match, read the next line
s/(/(hello / # replace the first '(' with '(hello '
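The same three commands can be wrapped in a function so that the marker line and the inserted word come from shell variables. This is a sketch: add_after_paren is a hypothetical name, the marker is used as a regex anchored at both ends, and the word must not contain /, & or \.

```shell
#!/bin/sh
# Hypothetical helper: on the line that follows a line exactly matching $1,
# insert "$2 " right after the first "(".
add_after_paren() {
  # The '$/!b' part is single-quoted so that an interactive bash
  # does not apply history expansion to '!'.
  sed -e "/^$1"'$/!b' -e 'n' -e "s/(/($2 /"
}
```

For example, add_after_paren 'Fixed pattern' hello < myfile > newfile works whatever word follows the parenthesis.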
To do this with sed, use n:
sed '/Fixed pattern/{n; s/world/hello world/;}' myfile
You may need to be more careful, but this should work for most situations. Whenever sed sees the Fixed pattern (you may want to use line anchors ^ and $), it will read the next line and then apply the substitution to it.