sed replace : delete a character between two strings while preserving a regex - regex

I have a csv file which is delimited by #~#.
there is a field which contains 0 and then n(more than 1) number of '.'(dot).
I need to remove the zero and preserve the later dots. I have to also take care that floating numbers are not affected.
So effectively replace #~#0.....#~# to #~#.....#~# (dots can be from 1 to any)

To limit the replacement with fields matching the pattern use this
$ echo "#~#0.12#~#0.....#~#0.1#~#0.#~#" | sed -r 's/#~#0(\.+)#~#/#~#\1#~#/g'
will preserve 0.12 and 0.1 but replace 0..... and 0.
#~#0.12#~#.....#~#0.1#~#.#~#
+ in regex means one or more. Anchoring with the field delimiters will make sure nothing else will be replaced.

Using sed you can do:
s='#~#0.....#~#'
sed -r 's/(^|#~#)0(\.+($|#~#))/\1\2/g' <<< "$s"
#~#.....#~#
sed -r 's/(^|#~#)0(\.+($|#~#))/\1\2/g' <<< "#~#0.00#~#"
#~#0.00#~#

]$ echo "#~#0.....#~#" | sed 's/#0/#/g'
#~#.....#~#

Escape the dots ans include all characters that should match:
echo "#~#0.1234#~#0.....#~#" | sed 's/#~#0\.\./#~#../g'
Using var's will not improve much:
delim="#~#"
echo "#~#0.1234#~#0.....#~#" | sed "s/${delim}0\.\./${delim}../g"

Related

How to extract text between first 2 dashes in the string using sed or grep in shell

I have the string like this feature/test-111-test-test.
I need to extract string till the second dash and change forward slash to dash as well.
I have to do it in Makefile using shell syntax and there for me doesn't work some regular expression which can help or this case
Finally I have to get smth like this:
input - feature/test-111-test-test
output - feature-test-111- or at least feature-test-111
feature/test-111-test-test | grep -oP '\A(?:[^-]++-??){2}' | sed -e 's/\//-/g')
But grep -oP doesn't work in my case. This regexp doesn't work as well - (.*?-.*?)-.*.
Another sed solution using a capture group and regex/pattern iteration (same thing Socowi used):
$ s='feature/test-111-test-test'
$ sed -E 's/\//-/;s/^(([^-]*-){3}).*$/\1/' <<< "${s}"
feature-test-111-
Where:
-E - enable extended regex support
s/\//-/ - replace / with -
s/^....*$/ - match start and end of input line
(([^-]-){3}) - capture group #1 that consists of 3 sets of anything not - followed by -
\1 - print just the capture group #1 (this will discard everything else on the line that's not part of the capture group)
To store the result in a variable:
$ url=$(sed -E 's/\//-/;s/^(([^-]*-){3}).*$/\1/' <<< "${s}")
$ echo $url
feature-test-111-
You can use awk keeping in mind that in Makefile the $ char in awk command must be doubled:
url=$(shell echo 'feature/test-111-test-test' | awk -F'-' '{gsub(/\//, "-", $$1);print $$1"-"$$2"-"}')
echo "$url"
# => feature-test-111-
See the online demo. Here, -F'-' sets the field delimiter as -, gsub(/\//, "-", $1) replaces / with - in Field 1 and print $1"-"$2"-" prints the value of --separated Field 1 and 2.
Or, with a regex as a field delimiter:
url=$(shell echo 'feature/test-111-test-test' | awk -F'[-/]' '{print $$1"-"$$2"-"$$3"-"}')
echo "$url"
# => feature-test-111-
The -F'[-/]' option sets the field separator to - and /.
The '{print $1"-"$2"-"$3"-"}' part prints the first, second and third value with a separating hyphen.
See the online demo.
To get the nth occurrence of a character C you don't need fancy perl regexes. Instead, build a regex of the form "(anything that isn't C, then C) for n times":
grep -Eo '([^-]*-){2}' | tr / -
With sed and cut
echo feature/test-111-test-test| cut -d'-' -f-2 |sed 's/\//-/'
Output
feature-test-111
echo feature/test-111-test-test| cut -d'-' -f-2 |sed 's/\//-/;s/$/-/'
Output
feature-test-111-
You can use the simple BRE regex form of not something then that something which is [^-]*- to get all characters other than - up to a -.
This works:
echo 'feature/test-111-test-test' | sed -nE 's/^([^/]*)\/([^-]*-[^-]*-).*/\1-\2/p'
feature-test-111-
Another idea using parameter expansions/substitutions:
s='feature/test-111-test-test'
tail="${s//\//-}" # replace '/' with '-'
# split first field from rest of fields ('-' delimited); do this 3x times
head="${tail%%-*}" # pull first field
tail="${tail#*-}" # drop first field
head="${head}-${tail%%-*}" # pull first field; append to previous field
tail="${tail#*-}" # drop first field
head="${head}-${tail%%-*}-" # pull first field; append to previous fields; add trailing '-'
$ echo "${head}"
feature-test-111-
A short sed solution, without extended regular expressions:
sed 's|\(.*\)/\([^-]*-[^-]*\).*|\1-\2|'

How to match a regex 1 to 3 times in a sed command?

Problem
I want to get any text that consists of 1 to three digits followed by a % but without the % using sed.
What I tried
So i guess the following regex should match the right pattern : [0-9]{1,3}%.
Then i can use this sed command to catch the three digits and only print them :
sed -nE 's/.*([0-9]{1,3})%.*/\1/p'
Example
However when i run it, it shows :
$ echo "100%" | sed -nE 's/.*([0-9]{1,3})%.*/\1/p'
0
instead of
100
Obviously, there's something wrong with my sed command and i think the problem comes from here :
[0-9]{1,3}
which apparently doesn't do what i want it to do.
edit:
Solution
The .* at the start of sed -nE 's/.*([0-9]{1,3})%.*/\1/p' "ate" the two first digits.
The right way to write it, according to Wicktor's answer, is :
sed -nE 's/(.*[^0-9])?([0-9]{1,3})%.*/\2/p'
The .* grabs all digits leaving just the last of the three digits in 100%.
Use
sed -nE 's/(.*[^0-9])?([0-9]{1,3})%.*/\2/p'
Details
(.*[^0-9])? - (Group 1) an optional sequence of any 0 or more chars up to the non-digit char including it
([0-9]{1,3}) - (Group 2) one to three digits
% - a % char
.* - the rest of the string.
The match is replaced with Group 2 contents, and that is the only value printed since n suppresses the default line output.
It will be easier to use a cut + grep option:
echo "abc 100%" | cut -d% -f1 | grep -oE '[0-9]{1,3}'
100
echo "100%" | cut -d% -f1 | grep -oE '[0-9]{1,3}'
100
Or else you may use this awk:
echo "100%" | awk 'match($0, /[0-9]{1,3}%/){print substr($0, RSTART, RLENGTH-1)}'
100
Or else if you have gnu grep then use -P (PCRE) option:
echo "abc 100%" | ggrep -oP '[0-9]{1,3}(?=%)'
100
This might work for you (GNU sed):
sed -En 's/.*\<([0-9]{1,3})%.*/\1/p' file
This is a filtering exercise, so use the -n option.
Use a back reference to capture 1 to 3 digits, followed by % and print the result if successful.
N.B. The \< ensures the digits start on a word boundary, \b could also be used. The -E option is employed to reduce the number of back slashes which would normally be necessary to quote (,),{ and } metacharacters.

sed: struggling with substitution and regex for ^*=

I am running a linux bash script. From stout lines like: /gpx/trk/name=MyTrack1, I want to keep only the end of line after =.
I am struggling to understand why the following sed command is not working as I expect:
echo "/gpx/trk/name=MyTrack1" | sed -e "s/^*=//"
(I also tried)
echo "/gpx/trk/name=MyTrack1" | sed -e "s/^*\=//"
The return is always /gpx/trk/name=MyTrack1 and not MyTrack1
An even simpler way if this is the only structure you are concerned about:
echo "/gpx/trk/name=MyTrack1" | cut -d = -f 2
Simply try:
echo "/gpx/trk/name=MyTrack1" | sed 's/.*=//'
Solution 2nd: With another sed.
echo "/gpx/trk/name=MyTrack1" | sed 's/\(.*=\)\(.*\)/\2/'
Explanation: As per OP's request adding explanation for this code here:
s: Means telling sed to do substitution operation.
\(.*=\): Creating first place in memory to keep this regex's value which tells sed to keep everything in 1st place of memory from starting to till = so text /gpx/trk/name= will be in 1 place.
\(.*\): Creating 2nd place in memory for sed telling it to keep everything now(after the match of 1st one, so this will start after =) and have value in it as MyTrack1
/\2/: Now telling sed to substitute complete line with only 2nd memory place holder which is MyTrack1
Solution 3rd: Or with awk considering that your Input_file is same as shown samples.
echo "/gpx/trk/name=MyTrack1" | awk -F'=' '{print $2}'
Solution 4th: With awk's match.
echo "/gpx/trk/name=MyTrack1" | awk 'match($0,/=.*$/){print substr($0,RSTART+1,RLENGTH-1)}'
$ echo "/gpx/trk/name=MyTrack1" | sed -e "s/^.*=//"
MyTrack1
The regular expression ^.*= matches anything up to and including the last = in the string.
Your regular expression ^*= would match the literal string *= at the start of a string, e.g.
$ echo "*=/gpx/trk/name=MyTrack1" | sed -e "s/^*=//"
/gpx/trk/name=MyTrack1
The * character in a regular expression usually modifies the immediately previous expression so that zero or more of it may be matched. When * occurs at the start of an expression on the other hand, it matches the character *.
Not to take you off the sed track, but this is easy with Bash alone:
$ echo "$s"
/gpx/trk/name=MyTrack1
$ echo "${s##*=}"
MyTrack1
The ##*= pattern removes the maximal pattern from the beginning of the string to the last =:
$ s="1=2=3=the rest"
$ echo "${s##*=}"
the rest
The equivalent in sed would be:
$ echo "$s" | sed -E 's/^.*=(.*)/\1/'
the rest
Where #*= would remove the minimal pattern:
$ echo "${s#*=}"
2=3=the rest
And in sed:
$ echo "$s" | sed -E 's/^[^=]*=(.*)/\1/'
2=3=the rest
Note the difference in * in Bash string functions vs a sed regex:
The * in Bash (in this context) is glob like - itself means 'any character'
The * in a regex refers to the previous pattern and for 'any character' you need .*
Bash has extensive string manipulation functions. You can read about Bash string patterns in BashFAQ.

Change CSV Delimiter with sed

I've got a CSV file that looks like:
1,3,"3,5",4,"5,5"
Now I want to change all the "," not within quotes to ";" with sed, so it looks like this:
1;3;"3,5";5;"5,5"
But I can't find a pattern that works.
If you are expecting only numbers then the following expression will work
sed -e 's/,/;/g' -e 's/\("[0-9][0-9]*\);\([0-9][0-9]*"\)/\1,\2/g'
e.g.
$ echo '1,3,"3,5",4,"5,5"' | sed -e 's/,/;/g' -e 's/\("[0-9][0-9]*\);\([0-9][0-9]*"\)/\1,\2/g'
1;3;"3,5";4;"5,5"
You can't just replace the [0-9][0-9]* with .* to retain any , in that is delimted by quotes, .* is too greedy and matches too much. So you have to use [a-z0-9]*
$ echo '1,3,"3,5",4,"5,5",",6","4,",7,"a,b",c' | sed -e 's/,/;/g' -e 's/\("[a-z0-9]*\);\([a-z0-9]*"\)/\1,\2/g'
1;3;"3,5";4;"5,5";",6";"4,";7;"a,b";c
It also has the advantage over the first solution of being simple to understand. We just replace every , by ; and then correct every ; in quotes back to a ,
You could try something like this:
echo '1,3,"3,5",4,"5,5"' | sed -r 's|("[^"]*),([^"]*")|\1\x1\2|g;s|,|;|g;s|\x1|,|g'
which replaces all commas within quotes with \x1 char, then replaces all commas left with semicolons, and then replaces \x1 chars back to commas. This might work, given the file is correctly formed, there're initially no \x1 chars in it and there're no situations where there is a double quote inside double quotes, like "a\"b".
Using gawk
gawk '{$1=$1}1' FPAT="([^,]+)|(\"[^\"]+\")" OFS=';' filename
Test:
[jaypal:~/Temp] cat filename
1,3,"3,5",4,"5,5"
[jaypal:~/Temp] gawk '{$1=$1}1' FPAT='([^,]+)|(\"[^\"]+\")' OFS=';' filename
1;3;"3,5";4;"5,5"
This might work for you:
echo '1,3,"3,5",4,"5,5"' |
sed 's/\("[^",]*\),\([^"]*"\)/\1\n\2/g;y/,/;/;s/\n/,/g'
1;3;"3,5";4;"5,5"
Here's alternative solution which is longer but more flexible:
echo '1,3,"3,5",4,"5,5"' |
sed 's/^/\n/;:a;s/\n\([^,"]\|"[^"]*"\)/\1\n/;ta;s/\n,/;\n/;ta;s/\n//'
1;3;"3,5";4;"5,5"

sed for removing trailing zeroes - regex - nongreedy

I have a file which has few lines as below
ABCD|100.19000|90.100|1000.000010|SOMETHING
BCD|10.100|90.1|100.019900|SOMETHING
Now, after applying sed on this, I would like the output to be as below (To use it for further processing)
ABCD|100.19|90.1|1000.00001|SOMETHING
BCD|10.1|90.1|100.0199|SOMETHING
i.e. I would like all the trailing zeros (the ones before the |) to be removed from the result.
I tried the following: (regtest is the file containing the original data as shown above)
cat regtest | sed 's/|\([0-9]*\)\.\([0-9]*\)0*|/|\1\.\2|/g'
Did not work as I think it's greedy.
cat regtest | sed 's/|\([0-9]*\)\.\([0-9]*\)0|/|\1\.\2|/g'
Will work. But, I will have to apply this sed command repeatedly on the same file to remove the zeros one after another. Does not make sense.
How can I go about it? Thanks!
$ echo "ABCD100|100.19000|90.100|1000.000010|STH" | \
sed -r -e 's/\|/||/g' -e 's/(\|[0-9.]+[1-9])0+\|/\1|/g' -e 's/\|\|/|/g'
ABCD100|100.19|90.1|1000.00001|STH
If you want to depend on the | following the zeroes to be removed
cat regtest | sed -r 's/(00*)(\|)/\2/g'
If you want to remove zeroes not trailed by a . or a digit
cat regtest | sed -r 's/(00*)([^.0-9])/\2/g'
(Note I'm using the 00* instead of 0+ to avoid unique features of GNU sed not available in other versions)
Edit: answer to comment request for removing trailing zeroes only between a decimal point and a pipe:
cat regtest | sed -r 's/(\.[1-9])*(00*)(\|)/\1\3/g'
Using Perl's extended regular expressions
perl -pe 's{\.\d*?\K0*(\||$)}{$1}g'
This removes zeroes that occur between (a dot and optionally some digits) and (a pipe or the end of the line).