Why those two sed commands get different result? - regex

A csv file example.csv, it has
hello,world,wow
this,is,amazing
I want to get the first column elements, at the beginning I wrote a sed command like:
sed -n 's/\([^,]*\),*/\1/p' example.csv
output:
helloworld,now
thisis,amazing
Then I modified my command to the following and get what I want:
sed -n 's/\([^,]*\).*/\1/p' example.csv
output:
hello
this
command1 I used comma(,) and command2 I replaced comma with dot(.), and it works as expected, can anyone explain how sed really works to get the 1st output? What's the story behind? Is it because of the dot(.) or because of the substitution group & back-reference?

In both regexes, ([^,]*) will consume the same part of the string - all the symbols preceding the first encountered comma. Apparently the difference is how are the remaining parts of those regexes treated.
In the first one, it's ,* - zero or more comma symbols. Obviously all it might consume is
the comma itself - the rest of the line isn't covered by a pattern.
In the second one, it's .* - zero or more of any symbols. It's not a big surprise that'll cover the remaining string completely - as it has nothing to stop at; any is, well, any. )
In both cases the pattern-covered part of the string is replaced by the contents of the capturing group (and that's, as I said already, 'all the symbols before the first comma') - and what's covered by the remaining part of the regex is just removed. So in first case the very first comma is erased, in the second - the comma and the rest of the string.

The reason behind that is that the pattern matches only to the first part of the word, i.e. only the Hello, part is replaced. The part ,* takes arbitrary amount of commas, and then nothing is set to be next, i.e. nothing else matches the pattern. For example:
hello,,,,,,,,,,,,,,,,,,world
would be replaced to
helloworld
A good example would be
sed -n 's/\([^,]*\),*$/\1/p' example.csv
This will work if and only if all the commas are at the end of the line and will trim them, e.g.
hello,,,,,,
Hope this makes the problem a bit clearer.

On regex the . (dot) is a place holder for one, single character.

Can I suggest not using sed?
cut -d, -f1 example.csv
Personally, I'm a huge sed fan, but cut is much more appropriate in this instance.

If you like first word, why not use awk
awk -F, '{print $1}' file
hello
this
Using sed with back reference
sed -nr 's/([^,]*),.*/\1/p' file
hello
this
It seems that to make it work you need the .* so it get the whole line.
The r option make you not need to escape the parentheses \(

Related

regex in sed removing only the first occurrence from every line

I have the following file I would like to clean up
cat file.txt
MNS:N+ GYPA*01 or GYPA*M
MNS:M+ GYPA*02 or GYPA*N
MNS:Mc GYPA*08 or GYP*Mc
MNS:Vw GYPA*09 or GYPA*Vw
MNS:Mg GYPA*11 or GYPA*Mg
MNS:Vr GYPA*12 or GYPA*Vr
My desired output is:
MNS:N+ GYPA*01 or GYPA*M
MNS:M+ GYPA*02 or GYPA*N
MNS:Mc GYPA*08 or GYP*Mc
MNS:Vw GYPA*09 or GYPA*Vw
MNS:Mg GYPA*11 or GYPA*Mg
MNS:Vr GYPA*12 or GYPA*Vr
I would like to remove everything between ":" and the first occurence of "or"
I tried sed 's/MNS:d*?or /MNS:/g' though it removes the second "or" as well.
I tried every option in https://www.geeksforgeeks.org/sed-command-in-linux-unix-with-examples/
to no avail. should I create alias sed='perl -pe'? It seems that sed does not properly support regex
perl should be more suitable here because we need Lazy match logic here.
perl -pe 's|(:.*?or +)(.*)|:\2|' Input_file
by using .*?or we are checking for the first nearest match for or string in the line.
This might work for you (GNU sed):
sed '/:.*\<or\>/{s/\<or\>/\n/;s/:.*\n//}' file
If a line contains : followed by the word or, then substitute the first occurrence of the word or with a unique delimiter (e.g.\n) and then remove everything between : and the unique delimiter.
Wrt I would like to remove everything between ":" and the first occurence of "or" - no you wouldn't. The first occurrence of or in the 2nd line of sample input is as the start of orweqqwe. That text immediately after : looks like it could be any set of characters so couldn't it contain a standalone or, e.g. MNS:2 or eqqwe or M+ GYPA*02 or GYPA*N
Given that and the fact it's apparently a fixed number of characters to be removed on every line, it seems like this is what you should really be using:
$ sed 's/:.\{14\}/:/' file
MNS:N+ GYPA*01 or GYPA*M
MNS:M+ GYPA*02 or GYPA*N
MNS:Mc GYPA*08 or GYP*Mc
MNS:Vw GYPA*09 or GYPA*Vw
MNS:Mg GYPA*11 or GYPA*Mg
MNS:Vr GYPA*12 or GYPA*Vr
If it is sure the or always occurs twice a line as provided example, please try:
sed 's/\(MNS:\).\+ or \(.\+ or .*\)/\1\2/' file.txt
Result:
MNS:N+ GYPA*01 or GYPA*M
MNS:M+ GYPA*02 or GYPA*N
MNS:Mc GYPA*08 or GYP*Mc
MNS:Vw GYPA*09 or GYPA*Vw
MNS:Mg GYPA*11 or GYPA*Mg
MNS:Vr GYPA*12 or GYPA*Vr
Otherwise using perl is a better solution which supports the shortest match as RavinderSingh13 answers.
ex supports lazy matching with \{-}:
ex -s '+%s/:\zs.\{-}or //g|wq' input_file
The pattern :\zs.\{-}or matches any character after the first : up to the first or.

Using grep to extract very specific strings from binary file

I have a large binary file. I want to extract certain strings from it and copy them to a new text file.
For example, in:
D-wM-^?^#^#^#^#^#^#^#^Y^#^#^#^#^#^#^#M-lM-FM-MM-[o#^B^#M-lM-FM MM-[o#^B^#^#^#^#^#E7cacscKLrrok9bwC3Z64NTnZM-^G
I want to take the number '7' (after the #^#^#E) and every character after it stopping at the Z ('ignoring the M-^G).
I want to copy this 7cacscKLrrok9bwC3Z64NTnZ to a new file.
There will be multiple such strings in one file. The end will always be denoted by the M- (which I don't want copied). The start will always be denoted by a 7 (which I do want copied).
Unfortunately, my knowledge of grep, sed, etc, does not extend to this level. Can someone please suggest a viable way to achieve this?
cat -v filename | grep [7][A-Z,a-z] will show all strings with a '7' followed by a letter but that's not much.
Thank you.
I've noticed that my requirements are rather more complicated.
(I've performed the correct - I hope - formatting this time). Thanks to 'tshiono' for his (?) answer to the earlier submission.
I want to check the ending of a string and, if it ends in M-, grep another string that follows it (with junk in between). If the string does not end in M-, then I don't want it copied (let alone any other strings).
So what I would like is:
grep -a -Po "7[[:alnum:]]+(?=M-)" file_name and if the ending is M- then grep -a -Po "5x[[:alnum:]]+(?=\^)" file_name to copy the string that starts with 5x and ends with a ^.
In this example:
D-wM-^?^#^#^#^#^#^#^#^Y^#^#^#^#^#^#^#M-lM-FM-MM-[o#^B^#M-lM-FM MM-[o#^B^#^#^#^#^#E7cacscKLrrok9bwC3Z64NTnZM-^GwM-^?^#^#^#^#^#^#^#^Y^#^#^#^#^#^#^#M-lM-FM-MM-[o#^B^#M-lM5x8w09qewqlkcklwnlkewflewfiewjfoewnflwenfwlkfwelk^89038432nowefe
The outcome would be:
7cacscKLrrok9bwC3Z64NTnZ
5x8w09qewqlkcklwnlkewflewfiewjfoewnflwenfwlkfwelk
However, if the ending is not M- (more precisely, if the ending is ^S), then do not try the second grep and do not record anything at all.
In this example:
D-wM-^?^#^#^#^#^#^#^#^Y^#^#^#^#^#^#^#M-lM-FM-MM-[o#^B^#M-lM-FM MM-[o#^B^#^#^#^#^#E7cacscKLrrok9bwC3Z64NTnZ^SGwM-^?^#^#^#^#^#^#^#^Y^#^#^#^#^#^#^#M-lM-FM-MM-[o#^B^#M-lM5x8w09qewqlkcklwnlkewflewfiewjfoewnflwenfwlkfwelk^89038432nowefe
The outcome would be null (nothing copied) as the 7cacs... string ends in ^S.
Is grep the correct tool? Grep a file and if the condition in the grep command is 'yes' then issue a different grep command but if the condition is 'no' then do nothing.
Thanks again.
I have noticed one addition modification.
Can one add an OR command to the second part? Grep if the second string starts with 5x OR 6x?
In the example below, grep -aPo "7[[:alnum:]]+M-.*?5x[[:alnum:]]+\^" filename | grep -aPo "7[[:alnum:]]+(?=M-)|5x[[:alnum:]]+(?=\^)" will extract the strings starting with 7 and the strings starting with 5x.
How can one change the 5x to 5x or 6x?
D-wM-^?^#^#^#^#^#^#^#^Y^#^#^#^#^#^#^#M-lM-FM-MM-[o#^B^#M-lM-FM MM-[o#^B^#^#^#^#^#E7cacscKLrrok9bwC3Z64NTnZM-^GwM-^?^#^#^#^#^#^#^#^Y^#^#^#^#^#^#^#M-lM-FM-MM-[o#^B^#M-lM5x8w09qewqlkcklwnlkewflewfiewjfoewnflwenfwlkfwelk^89038432nowefe
D-wM-^?^#^#^#^#^#^#^#^Y^#^#^#^#^#^#^#M-lM-FM-MM-[o#^B^#M-lM-FM MM-[o#^B^#^#^#^#^#E7AAAAAscKLrrok9bwC3Z64NTnZM-^GwM-^?^#^#^#^#^#^#^#^Y^#^#^#^#^#^#^#M-lM-FM-MM-[o#^B^#M-lM6x8w09qewqlkcklwnlkewflewfiewjfoewnflwenfwlkfwelk^89038432nowefe
In this example, the desired outcome would be:
7cacscKLrrok9bwC3Z64NTnZ
5x8w09qewqlkcklwnlkewflewfiewjfoewnflwenfwlkfwelk
7AAAAAscKLrrok9bwC3Z64NTnZ
6x8w09qewqlkcklwnlkewflewfiewjfoewnflwenfwlkfwelk
UPDATE MARCH 09:
I need to create a series of complex grep (or perl) commands to extract strings from a series of binary files.
I need two strings from the binary file.
The first string will always start with a 1.
The first string will end with a letter or number. The next letter will always be a lower case k. I do not want this k character.
The difficulty is that the ending k will not always be the first k in the string. It might be the first k but it might not.
After the k, there is a second string. The second string will always start with an A or a B.
The ending of the second string will be in one of two forms:
a) it will end with a space then display the first three characters from the first string in lower case followed by a )
b) it will end with a ^K then display the first three characters from the first string in lower case.
For example:
1pppsx9YPar8Rvs75tJYWZq3eo8PgwbckB4m4zT7Yg042KIDYUE82e893hY ppp)
Should be:
1pppsx9YPar8Rvs75tJYWZq3eo8Pgwbc and B4m4zT7Yg042KIDYUE82e893hY - delete the k and the space then ppp.
For example:
1zzzsx9YPkr8Rvs75tJYWZq3eo8PgwbckA2m4zT7Yg042KIDYUE82e893hY^Kzzz
Should be:
1zzzsx9YPkar8Rvs75tJYWZq3eo8Pgwbc and A4m4zT7Yg042KIDYUE82e893hY - delete the second k and the ^Kzzz.
In the second example, we see that the first k is part of the first string. It is the k before the A that breaks up the first and second strings.
I hope there is a super grep expert who can help! Many thanks!
If your grep supports -P option, would you please try:
grep -a -Po "7[[:alnum:]]+(?=M-)" file
The -a option forces grep to read the input as a text file.
The -P option enables the perl-compatible regex.
The -o option tells grep to print only the matched substring(s).
The pattern (?=M-) is a zero-width lookahead assertion (introduced in
Perl) without including it in the result.
Alternatively you can also say with sed:
sed 's/M-/\n/g' file | sed -n 's/.*\(7[[:alnum:]]\+\).*/\1/p'
The first sed command splits the input file into miltiple lines by
replacing the substring M- with a newline.
It has two benefits: it breaks the lines to allow multiple matches with
sed and excludes the unnecessary portion M- from the input.
The next sed command extracts the desired pattern from the input.
It assumes your sed accepts \n in the replacement, which is
a GNU extension (not POSIX compliant). Otherwise please try (in case you are working on bash):
sed 's/M-/\'$'\n''/g' file | sed -n 's/.*\(7[[:alnum:]]\+\).*/\1/p'
[UPDATE]
(The requirement has been updated by the OP and the followings are solutions according to it.)
Let me assume the string which starts with 7 and ends with M- is always followed
by another (no more and no less than one) string which starts with 5x and ends
with ^ (ascii caret character) with junks in between.
Then would you please try the following:
grep -aPo "7[[:alnum:]]+M-.*?5x[[:alnum:]]+\^" file | grep -aPo "7[[:alnum:]]+(?=M-)|5x[[:alnum:]]+(?=\^)"
It executes the task in two steps (two cascaded greps).
The 1st grep narrows down the input data into the candidate substring
which will include the desired two sequences and junks in between.
The regex .*? in between matches any (ascii or binary) characters
except for a newline character.
The trailing ? enables the shortest match
which avoids the overrun due to the greedy nature of regex. The regex is intended to match junks in between.
The 2nd grep includes two regex's merged with a pipe | meaning logical OR.
Then it extracts two desired sequences.
A potential problem of grep solution is that grep is a line oriented command
and cannot include the newline character in the matched string.
If a newline character is included in the junks in between (I'm not sure about the possibility), the above solution will fail.
As a workaround, perl will provide flexible manipulations with binary data.
perl -0777 -ne '
while (/(7[[:alnum:]]+)M-.*?(5x[[:alnum:]]+)\^/sg) {
printf("%s\n%s\n", $1, $2);
}
' file
The regex is mostly same as that of grep because the -P option of grep means
perl-compatible.
It can capture multiple patterns at once in variables $1 and $2 hence just one regex is enough.
The -0777 option to the perl command tells perl to slurp all data
at once.
The s option at the end the regex makes a dot match a newline character.
The g option enables the global (multiple) match.
[UPDATE2]
In order to make the regex match either 5x or 6x, replace 5x with (5|6)x.
Namely:
grep -aPo "7[[:alnum:]]+M-.*?(5|6)x[[:alnum:]]+\^" file | grep -aPo "7[[:alnum:]]+(?=M-)|(5|6)x[[:alnum:]]+(?=\^)"
As mentioned before, the pipe | means OR. The OR operator has the lowest priority in the evaluation, hence you need to enclose them with parens in this case.
If there is a possibility any other number than 5 or 6 may appear, it will be safer to put [[:digit:]] instead, which matches any one digit betweeen 0 and 9:
grep -aPo "7[[:alnum:]]+M-.*?[[:digit:]]x[[:alnum:]]+\^" file | grep -aPo "7[[:alnum:]]+(?=M-)|[[:digit:]]x[[:alnum:]]+(?=\^)"
[UPDATE3]
(Answering the OP's requirement on March 9th)
Let me start with a perl code which regex will be relatively easier
to explain.
perl -0777 -ne 'while (/(1(.{3}).+)k([AB].*)[\013 ]\2/g){print "$1 $3\n"}' file
Output:
1pppsx9YPar8Rvs75tJYWZq3eo8Pgwbc B4m4zT7Yg042KIDYUE82e893hY
1zzzsx9YPkr8Rvs75tJYWZq3eo8Pgwbc A2m4zT7Yg042KIDYUE82e893hY
[Explanation of regex]
(1(.{3}).+)k([AB].*)[\013 ]\2
( start of the 1st capture group referred by $1 later
1 literal "1"
( start of the 2nd capture group referred by \2 later
.{3} a sequence of the identical three characters such as ppp or zzz
) end of the 2nd capture group
.+ followed by any characters with "greedy" match which may include the 1st "k"
) end of the 1st capture group
k literal "k"
( start of the 3rd capture group referred by $3 later
[AB].* the character "A" or "B" followed by any characters
) end of the 3rd capture group
[\013 ] followed by ^K or a whitespace
\2 followed by the capture group 2 previously assigned
When implementing it with grep, we will encounter a limitation of grep.
Although we want to extract multiple patterns from the input file,
the -e option (which can specify multiple search patterns) does not
work with -P option. Then we need to split the regex into two patterns
such as:
grep -Po "(1(.{3}).+)(?=k([AB].*)[\013 ]\2)" file
grep -Po "(1(.{3}).+)k\K([AB].*)(?=[\013 ]\2)" file
And the result will be:
1pppsx9YPar8Rvs75tJYWZq3eo8Pgwbc
1zzzsx9YPkr8Rvs75tJYWZq3eo8Pgwbc
B4m4zT7Yg042KIDYUE82e893hY
A2m4zT7Yg042KIDYUE82e893hY
Please be noted the order of output is not same as the order of appearance in the original file.
Another option will be to introduce ripgrep or rg which is a fast
and versatile version of grep. You may need to install ripgrep with
sudo apt install ripgrep or using other package handling tool.
An advantage of ripgrep is it supports -r (replace) option in which
you can make use of the backreferences:
rg -N -Po "(1(.{3}).+)k([AB].*)[\013 ]\2" -r '$1 $3' file
The -r '$1 $3' option prints the 1st and the 3rd capture groups and the result will be the same as perl.
In the general case, you can use the strings utility to pluck out ASCII from binary files; then of course you can try to grep that output for patterns that you find interesting.
Many traditional Unix utilities like grep have internal special markers which might get messed up by binary input. For example, the character \xFF was used for internal purposes by some versions of GNU grep so you can't grep for that character even if you can figure out a way to represent it in the shell (Bash supports $'\xff' for example).
A traditional approach would be to run hexdump or a similar utility, and then grep that for patterns. However, more modern scripting languages like Perl and Python make it easy to manipulate arbitrary binary data.
perl -ne 'print if m/\xff\xff/' </dev/urandom
This might work for you (GNU sed):
sed -En '/\n/!{s/M-\^G/\n/;s/7[^\n]*\n/\n&/};/^7[^\n]*/P;D' file
Split each line into zero or more lines that begin with 7 and end just before M-^G and only print such lines.

bash regexp to extract part of URL

From the following URL:
https://console.developers.google.com/storage/browser/test-lab-acteghe53j0sf-jrf3f8u8p12n4/2017-09-27_15:23:07.566833_MPoy/]
I need to extract the following part:
test-lab-acteghe53j0sf-jrf3f8u8p12n4/2017-09-27_15:23:07.566833_MPoy/
I'm pretty bad at regex. I came up with the following but it doesn't work:
sed -n "s/^.*browser\(test-lab.*/.*/\).*$/\1/p"
Can anyone help with what I'm doing wrong?
Could you please try with awk solution also and let me know if this helps you.
echo "https://console.developers.google.com/storage/browser/test-lab-acteghe53j0sf-jrf3f8u8p12n4/2017-09-27_15:23:07.566833_MPoy/" | awk '{sub(/.*browser\//,"");sub(/\/$/,"");print}'
Explanation: Simply, substituting everything till browser/ then substituting last / with NULL.
EDIT1: Adding a sed solution here too.
sed 's/\(.[^//]*\)\/\/\(.[^/]*\)\(.[^/]*\)\(.[^/]*\)\/\(.*\)/\5/' Input_file
Output will be as follows.
test-lab-acteghe53j0sf-jrf3f8u8p12n4/2017-09-27_15:23:07.566833_MPoy/
Explanation of sed command: Dividing the whole line into parts and using sed's ability to keep the matched regex into memory so here are the dividers I used.
(.[^//]):* Which will have the value till https: in it and if anyone wants to print it you could use \1 for it because this is very first buffer for sed.
//: Now as per URL // comes to mentioning them now.
(.[^/]):* Now comes the 2nd part for sed's buffer which will have value console.developers.google.com in it, because REGEX looks for very first occurrence of / and stops matching there itself.
(.[^/]) && (.[^/]) && /(.):* These next 3 occurrences works on same method of storing buffers like they will look for first occurrence of / and keep the value from last matched letter's next occurrence to till 1st / comes.
/\5/: Now I am substituting everything with \5 means 5th buffer which contains values as per OP's instructions.
Use a different sed delimiter and don't forget to escape the braces.
avinash:~/Desktop$ echo 'https://console.developers.google.com/storage/browser/test-lab-acteghe53j0sf-jrf3f8u8p12n4/2017-09-27_15:23:07.566833_MPoy/]' | sed 's~.*/browser/\([^/]*/[^/]*/\).*~\1~'
test-lab-acteghe53j0sf-jrf3f8u8p12n4/2017-09-27_15:23:07.566833_MPoy/
OR
Use grep with oP parameters.
avinash:~/Desktop$ echo 'https://console.developers.google.com/storage/browser/test-lab-acteghe53j0sf-jrf3f8u8p12n4/2017-09-27_15:23:07.566833_MPoy/]' | grep -oP '/browser/\K[^/]*/[^/]*/'
test-lab-acteghe53j0sf-jrf3f8u8p12n4/2017-09-27_15:23:07.566833_MPoy/

my sed is close... but not quite there, can you help please?

I want to print only the lines that meet the criteria : "worde:" and "wordo;"
I got this far:
sed -n '/\([a-z]*\)\1e:\1o;/p;'
But it doesn't quite work.
Can someone please perfect it and tell me exactly how its a fixed version/what was wrong with mine?
(Please note there are no capital letters ever, hence why I didn't bother including that within my initial character range)
Thanks heaps,
This will handle lines where "worde:wordo;" (nothing between the words) appears:
sed -n '/\([a-z]*\)e:\1o;/p;'
If you need to allow for characters BETWEEN the words, you'll need something like this:
sed -n '/\([a-z]*\)e:.*\1o;/p;'
My interpretation of your question is that you want to match lines which contain both worde: and wordo;
sed -n '/worde:/{/wordo;/p}' infile
The -n parameter prevents sed from printing the pattern space (infile), the first regex matches, then control flows into the block, if the regex isn't matched, then the line is ignored. Inside the block, the if the second regex is matched, the line is printed.
One way using alternation:
sed -n '/word\(e:\|o;\)/ p' infile
Is it a requirement to use capture groups? I went without them.
$ sed -n '/[\w]*[oe][:;]/p'
[\w]* - Match any word character. (if you really want only [a-z], swap
that back in)
[oe] - Those word characters must end in an e or
o
[:;] - And then have a : or ;
This might work for you:
sed '/^\(.*\)[eE]:\s*\1[oO];/!d' file

using sed to copy lines and delete characters from the duplicates

I have a file that looks like this:
#"Afghanistan.png",
#"Albania.png",
#"Algeria.png",
#"American_Samoa.png",
I want it to look like this
#"Afghanistan.png",
#"Afghanistan",
#"Albania.png",
#"Albania",
#"Algeria.png",
#"Algeria",
#"American_Samoa.png",
#"American_Samoa",
I thought I could use sed to do this but I can't figure out how to store something in a buffer and then modify it.
Am I even using the right tool?
Thanks
You don't have to get tricky with regular expressions and replacement strings: use sed's p command to print the line intact, then modify the line and let it print implicitly
sed 'p; s/\.png//'
Glenn jackman's response is OK, but it also doubles the rows which do not match the expression.
This one, instead, doubles only the rows which matched the expression:
sed -n 'p; s/\.png//p'
Here, -n stands for "print nothing unless explicitely printed", and the p in s/\.png//p forces the print if substitution was done, but does not force it otherwise
That is pretty easy to do with sed and you not even need to use the hold space (the sed auxiliary buffer). Given the input file below:
$ cat input
#"Afghanistan.png",
#"Albania.png",
#"Algeria.png",
#"American_Samoa.png",
you should use this command:
sed 's/#"\([^.]*\)\.png",/&\
#"\1",/' input
The result:
$ sed 's/#"\([^.]*\)\.png",/&\
#"\1",/' input
#"Afghanistan.png",
#"Afghanistan",
#"Albania.png",
#"Albania",
#"Algeria.png",
#"Algeria",
#"American_Samoa.png",
#"American_Samoa",
This commands is just a replacement command (s///). It matches anything starting with #" followed by non-period chars ([^.]*) and then by .png",. Also, it matches all non-period chars before .png", using the group brackets \( and \), so we can get what was matched by this group. So, this is the to-be-replaced regular expression:
#"\([^.]*\)\.png",
So follows the replacement part of the command. The & command just inserts everything that was matched by #"\([^.]*\)\.png", in the changed content. If it was the only element of the replacement part, nothing would be changed in the output. However, following the & there is a newline character - represented by the backslash \ followed by an actual newline - and in the new line we add the #" string followed by the content of the first group (\1) and then the string ",.
This is just a brief explanation of the command. Hope this helps. Also, note that you can use the \n string to represent newlines in some versions of sed (such as GNU sed). It would render a more concise and readable command:
sed 's/#"\([^.]*\)\.png",/&\n#"\1",/' input
I prefer this over Carles Sala and Glenn Jackman's:
sed '/.png/p;s/.png//'
Could just say it's personal preference.
or one can combine both versions and apply the duplication only on lines matching the required pattern
sed -e '/^#".*\.png",/{p;s/\.png//;}' input