Find and Replace Specific characters in a variable with sed - regex

Problem: I have a string stored in a variable, and I'd like to prepend another character (a backslash) to certain characters within it.
Ex. "[blahblahblah]" ---> "\[blahblahblah\]"
Current solution: I currently do this in two steps, each step handling one bracket.
Ex.
temp='[blahblahblah]'
firstEscaped=$(echo "$temp" | sed 's#\[#\\[#g')
fullyEscaped=$(echo "$firstEscaped" | sed 's#\]#\\]#g')
This gives me the result I want, but I feel like I could do it in one line using capturing groups. I've just had no luck, and I'm getting burnt out. Most examples I come across extract the text between the brackets instead of doing what I'm trying to do. My attempts so far have been to no avail. Any ideas?

There may be more efficient ways (a single s/search/replace/ with a fancier regex), but this works, given your sample input:
fully=$(echo "$temp" | sed 's/\([[]\)/\\\1/g;s/\([]]\)/\\\1/g') ; echo "$fully"
output
\[blahblahblah\]
Note that it is quite OK to chain multiple sed commands together, separated by ; or, in a sed script file, by newlines.
Read about sed capture groups using \(...\) pairs, and referencing them by number, e.g. \1.
IHTH
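If you specifically want one substitution with a capturing group, you can put a bracket expression that matches either bracket inside the group. A minimal sketch, assuming a POSIX sh and sed; escape_brackets is a hypothetical helper name, not part of any tool:

```shell
#!/bin/sh
# Hypothetical helper: prefix every [ and ] in $1 with a backslash.
# \([][]\) captures one bracket (']' must come first inside [...] to be literal),
# and \\\1 writes a literal backslash followed by the captured character.
escape_brackets() {
    printf '%s\n' "$1" | sed 's/\([][]\)/\\\1/g'
}

temp='[blahblahblah]'
escape_brackets "$temp"    # prints \[blahblahblah\]
```

The g flag makes it handle every bracket on the line, not just the first of each kind.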

$ temp=[blahblahblah]
$ fully=$(echo "$temp" |sed 's/\[\|\]/\\&/g'); echo "$fully"
\[blahblahblah\]
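Brief explanation:
\[\|\]: matches '[' or ']'. The brackets are escaped so they are taken literally, and '|' is escaped because \| is alternation in GNU sed's basic regex syntax (a GNU extension, not POSIX).
&: in the replacement, & stands for the text that matched; the preceding \\ is an escaped backslash, so \\& outputs a literal backslash followed by the matched bracket.
As Gordon Davisson suggested, you can also use a bracket expression, which avoids the non-portable \| alternation: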
sed 's/[][]/\\&/g'
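The bracket expression works because a ] placed first inside [...] is taken literally, so [][] is the set containing ] and [. A small sketch (plain POSIX sh and sed; the sample strings are made up) contrasting it with [[]], which is the set containing only [ followed by a literal ], and therefore matches the two-character sequence []:

```shell
#!/bin/sh
# [][] : bracket expression whose members are ']' and '['; matches either bracket.
printf '%s\n' '[a]' | sed 's/[][]/\\&/g'    # prints \[a\]

# [[]] : bracket expression [[] (just '[') followed by a literal ']',
# so it only matches the two-character sequence "[]".
printf '%s\n' '[a]' | sed 's/[[]]/X/g'      # prints [a] (no "[]" to match)
printf '%s\n' 'x[]y' | sed 's/[[]]/X/g'     # prints xXy
```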

Related

How to replace a repeated string of variable length with another string in bash?

I have files where missing data is inserted as '+'. So lines look like this:
substring1+++++substring2++++++++++++++substring3+substring4
I want to replace every run of more than 5 '+' characters with 'MISSING'. This makes the data more readable for my team and makes it easier to tell missing data apart from data entered as '+' (up to 5 is allowed).
So far I have:
while read l; do
echo "${l//['([+])\1{5}']/'MISSING'}"
done < /path/file.txt
but this replaces every '+' with 'MISSING'. I need it to say 'MISSING' just once.
Thanks in advance.
You can't use a regex in Bash parameter expansion: ${var//pattern/replacement} takes a glob pattern, not a regular expression.
In your loop, you can use
sed 's/+\{6,\}/MISSING/g' <<< "$l"
Or you can run sed directly on the file:
sed 's/+\{6,\}/MISSING/g' /path/file.txt
The +\{6,\} POSIX BRE pattern matches a literal + (+) 6 or more times (\{6,\}), so runs of up to 5 are left alone. Because the whole run is matched at once, each long run is replaced by a single MISSING.
Example:
sed 's/+\{6,\}/MISSING/g' <<< "substring1+++++substring2++++++++++++++substring3+substring4"
# => substring1+++++substring2MISSINGsubstring3+substring4
If you need to change the file in place, use any of the techniques described in "sed edit file in place" (for example, GNU sed's -i option).
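Since the question allows runs of up to 5 '+', the interval needs a lower bound of 6 to leave those runs alone. A minimal sketch, with a hypothetical mark_missing helper whose threshold parameter is purely illustrative:

```shell
#!/bin/sh
# Hypothetical helper: replace each run of $1 (default 6) or more '+'
# characters with the word MISSING; shorter runs are left untouched.
mark_missing() {
    min=${1:-6}
    # Inside double quotes, \{ and \} keep their backslashes for sed.
    sed "s/+\{$min,\}/MISSING/g"
}

printf '%s\n' 'x+++++y++++++z+w' | mark_missing    # prints x+++++yMISSINGz+w
```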

How to replace arbitrary combinations of (special) characters and numbers using sed and regular expressions

I have a CSV file with nearly arbitrarily filled columns like this:
"bla","","blabla","bla::bla::blabla",19.05.16 12:00:03,123456789,"bla::38594f-47849-h945f",""
and now I want to replace the comma between the two numbers with a point:
"bla","","blabla","bla::bla::blabla",19.05.16 12:00:03.123456789,"bla::38594f-47849-h945f",""
I tried a lot but nothing helped. :-(
sed s/[0-9],[0-9]/./g data.csv
works, but it deletes the digits before and after the comma. So I tried things like
sed s/\(\.[0-9]\),\([0-9]\.\)/\1.\2/g data.csv
but that changed nothing.
Try with s/\([0-9]\),\([0-9]\)/\1.\2/g:
$ echo '"bla","","blabla","bla::bla::blabla",19.05.16 12:00:03,123456789,"bla::38594f-47849-h945f",""' | sed 's/\([0-9]\),\([0-9]\)/\1.\2/g'
"bla","","blabla","bla::bla::blabla",19.05.16 12:00:03.123456789,"bla::38594f-47849-h945f",""
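You don't need the extra dot \. in the capturing groups: \(\.[0-9]\) requires a dot right before the digit, but in 03,123456789 the character before 3 is 0, so nothing matched. Also quote the sed script; unquoted, the shell strips the backslashes from \( and \), and sed then sees literal parentheses.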
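The same substitution can be written with extended regular expressions, where ( ) group without backslashes. A sketch assuming a sed that accepts -E (GNU and BSD sed both do):

```shell
#!/bin/sh
# ERE version: ([0-9]) and ([0-9]) capture the digits around the comma,
# and \1.\2 puts them back with a period in between.
line='"bla","","blabla","bla::bla::blabla",19.05.16 12:00:03,123456789,"bla::38594f-47849-h945f",""'
printf '%s\n' "$line" | sed -E 's/([0-9]),([0-9])/\1.\2/g'
```

One design caveat: matches never overlap, so a run such as 1,2,3 becomes 1.2,3 (the 2 is consumed by the first match). That is fine for this data, where only one digit,digit pair occurs per line.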

How can I use sed to extract a number from a string in a bash script?

I want to separate the text from the numbers in a file to get a specific number in a bash script. For example, from this line:
Branches executed:75.38% of 1190
I want to get only the number 75.38. I have tried the code below:
$new_value=value | sed -r 's/.*_([0-9]*)\..*/\1/g'
but it doesn't work.
How should I do it? Thanks in advance for your help.
You can use the following regex to extract the first number in a line:
^[^0-9]*\([0-9.]*\).*$
Usage:
% echo 'Branches executed:75.38% of 1190' | sed 's/^[^0-9]*\([0-9.]*\).*$/\1/'
75.38
Give this a try:
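value=$(sed -n "s/^Branches executed:\([0-9][.0-9]*[0-9]*\)%.*$/\1/p" afile)
It is assumed that the line appears only once in afile. The -n option suppresses sed's automatic printing, and the p flag prints only the line where the substitution succeeded, so the other lines of afile don't end up in the result.
The value is stored in the value variable.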
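A sketch of the -n plus p-flag approach on a small made-up file (the temporary file stands in for afile; its contents are invented for the demo), showing that the unrelated line does not end up in the variable:

```shell
#!/bin/sh
# Sample input file (contents made up for the demo).
tmp=$(mktemp)
printf '%s\n' 'Lines executed:80.00% of 2000' 'Branches executed:75.38% of 1190' > "$tmp"

# -n suppresses sed's automatic printing; the p flag prints only the line
# on which the substitution succeeded, so other lines never reach $value.
value=$(sed -n 's/^Branches executed:\([0-9][.0-9]*\)%.*$/\1/p' "$tmp")
echo "$value"    # prints 75.38
rm -f "$tmp"
```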
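There are several things here that we could improve. One is that in sed's default basic regex syntax you need to escape the parentheses: \(...\) (with -r, as in your attempt, plain ( ) already group). Also, $new_value=value is not an assignment (write new_value=$(...)), and .*_ requires an underscore that your input doesn't contain.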
Another one is that it would be good to have a full specification of the input strings as well as a good script that can help us to play with this.
Anyway, this is my first attempt:
Update: I added a little more bash around this regex so it's easier to play with:
value='Branches executed:75.38% of 1190'
new_value=$(echo "$value" | sed -e 's/[^0-9]*\([0-9]*\.[0-9]*\).*/\1/g')
echo "$new_value"
Update 2: as John pointed out, this matches only numbers that contain a decimal point. We can fix it with an optional group: \(\.[0-9]\+\)\? (\+ and \? are GNU sed extensions to basic regex syntax).
An explanation for the optional group:
\(...\) is a group.
\(...\)\? is a group that appears zero or one times (mind the escaped question mark).
\.[0-9]\+ is the pattern for a dot and one or more digits.
Putting it all together:
value='Branches executed:75.38% of 1190'
new_value=$(echo "$value" | sed -e 's/[^0-9]*\([0-9]\+\(\.[0-9]\+\)\?\).*/\1/g')
echo "$new_value"
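For comparison, here is the same optional-group idea in POSIX extended regex syntax, where ?, + and ( ) need no backslashes, so it does not depend on GNU's \+ and \? extensions. A sketch assuming a sed that accepts -E; first_number is a hypothetical helper name:

```shell
#!/bin/sh
# Extract the first number (integer or decimal) from a line.
# ERE (-E): ? makes the (\.[0-9]+) group optional, + means "one or more".
first_number() {
    printf '%s\n' "$1" | sed -E 's/[^0-9]*([0-9]+(\.[0-9]+)?).*/\1/'
}

first_number 'Branches executed:75.38% of 1190'   # prints 75.38
first_number 'Branches executed:75% of 1190'      # prints 75
```

If a line contains no digit at all, the substitution fails and sed prints the line unchanged, so callers may want to check for that.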

gsub regex pattern

I am using gsub to substitute tabs with commas
gsub(/\t/,\",\")
a\tb will be a,b
In some instances I have two consecutive tabs.
For example
a\t\tb
In that case gsub converts it to a,,b
I want that in cases like that, the string should be converted to a,-,b (a minus sign in between).
I tried writing two separate gsubs
gsub(/\t/,\",\") // for tab
gsub(/,,/,\"/,-,/\") // for consecutive commas
The second doesn't seem to work.
What's wrong with it? Is there a way to combine both into one gsub?
I take it you're asking about awk?
I don't think it can be done with a single gsub; in fact, I needed three:
$ abc=$(echo 'a.b..c...d....e.....f' | tr . '\t')
$ echo "$abc" | awk '{gsub(/\t/, ","); gsub(/,,/, ",-,"); gsub(/,,/, ",-,"); print}'
a,b,-,c,-,-,d,-,-,-,e,-,-,-,-,f
The problem is that a single gsub on /,,/ consumes both commas of each match, so the second comma can't also start the next match; with three or more consecutive commas, a pair is left unreplaced after one pass. In a more powerful regexp engine, such as Perl's, it can be done in a single pass using a lookahead:
$ echo "$abc" | perl -pe 's/\t/,/g; s/,(?=,)/,-/g;'
a,b,-,c,-,-,d,-,-,-,e,-,-,-,-,f
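If you'd rather stay in awk without guessing how many passes are needed, you can loop until gsub reports no more replacements, since gsub returns the number of substitutions it made. A minimal sketch; tabs_to_csv is a hypothetical helper name:

```shell
#!/bin/sh
# Convert tabs to commas, then keep applying gsub(/,,/, ",-,") until it
# reports zero replacements (gsub returns the number of substitutions made).
tabs_to_csv() {
    awk '{
        gsub(/\t/, ",")
        n = 1
        while (n > 0)
            n = gsub(/,,/, ",-,")
        print
    }'
}

printf 'a\tb\t\tc\t\t\td\n' | tabs_to_csv    # prints a,b,-,c,-,-,d
```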

using sed to copy lines and delete characters from the duplicates

I have a file that looks like this:
#"Afghanistan.png",
#"Albania.png",
#"Algeria.png",
#"American_Samoa.png",
I want it to look like this
#"Afghanistan.png",
#"Afghanistan",
#"Albania.png",
#"Albania",
#"Algeria.png",
#"Algeria",
#"American_Samoa.png",
#"American_Samoa",
I thought I could use sed to do this but I can't figure out how to store something in a buffer and then modify it.
Am I even using the right tool?
Thanks
You don't have to get tricky with regular expressions and replacement strings: use sed's p command to print the line intact, then modify the line and let it print implicitly:
sed 'p; s/\.png//'
Glenn Jackman's response is OK, but it also doubles the rows that do not match the expression.
This one, instead, doubles only the rows which matched the expression:
sed -n 'p; s/\.png//p'
Here, -n means "print nothing unless explicitly printed", and the p in s/\.png//p prints the line only if a substitution was made.
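A quick check of that behaviour on a made-up input containing one line without .png: the p command prints every line once, and the p flag adds the stripped copy only where the substitution happened.

```shell
#!/bin/sh
# Sample input with one line that has no .png (contents made up for the demo).
input='#"Afghanistan.png",
#"README",
#"Albania.png",'

# p first prints the original line; with -n, the trailing p flag on s///
# prints the modified copy only when the substitution actually happened.
printf '%s\n' "$input" | sed -n 'p; s/\.png//p'
# prints:
# #"Afghanistan.png",
# #"Afghanistan",
# #"README",
# #"Albania.png",
# #"Albania",
```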
That is pretty easy to do with sed, and you don't even need the hold space (sed's auxiliary buffer). Given the input file below:
$ cat input
#"Afghanistan.png",
#"Albania.png",
#"Algeria.png",
#"American_Samoa.png",
you should use this command:
sed 's/#"\([^.]*\)\.png",/&\
#"\1",/' input
The result:
$ sed 's/#"\([^.]*\)\.png",/&\
#"\1",/' input
#"Afghanistan.png",
#"Afghanistan",
#"Albania.png",
#"Albania",
#"Algeria.png",
#"Algeria",
#"American_Samoa.png",
#"American_Samoa",
This command is just a substitution command (s///). It matches #" followed by non-period characters ([^.]*) and then by .png",. It also captures those non-period characters with the group delimiters \( and \), so we can refer to what the group matched. So this is the regular expression to be replaced:
#"\([^.]*\)\.png",
Then comes the replacement part of the command. The & inserts everything matched by #"\([^.]*\)\.png", into the output; if it were the only element of the replacement, nothing would change. However, the & is followed by a newline, written as a backslash \ followed by an actual newline, and on the new line we add the #" string followed by the content of the first group (\1) and then the string ",.
This is just a brief explanation of the command; I hope it helps. Also, note that some versions of sed (such as GNU sed) accept \n for a newline in the replacement, which gives a more concise and readable command:
sed 's/#"\([^.]*\)\.png",/&\n#"\1",/' input
I prefer this over Carles Sala's and Glenn Jackman's answers:
sed '/\.png/p;s/\.png//'
I'd say it's just personal preference.
Or you can combine both versions and duplicate only the lines matching the required pattern:
sed -e '/^#".*\.png",/{p;s/\.png//;}' input