I've been trying to clean up the data in a csv file which contains data similar to this:
8979880, Number One : Exclusive Mix, 387387, http://www.smashhits.com
4844404, Top 40 : 1988, 3893938, http://www.best80s.com
48094940, Highlander:The Return, 489494, http://www.instantaccess.com
My goal is to replace the colon in field 2 with a space. Initially I used sed to replace the : with a space, like so:
sed -i "s/:/ /g" file.csv
This works in removing the colon but unfortunately this also removes the colon in the url which is not what I want. How can I specify that I only want the command to affect the data in field 2?
Using awk you can do
awk '/:/{sub(/:/, " ")} 1' file.csv
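The /:/ pattern selects the lines that contain a :
sub(/:/, " ") replaces only the first : on the line with a space, which works for the sample data because the first colon is always in field 2
1 simply prints the line.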
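If a colon can also appear before field 2, or more than once inside it, the replacement can be limited to the second field explicitly. A minimal sketch, assuming a simple CSV with no quoted commas inside fields (fix_field2 is a hypothetical helper name, not something from the question):

```shell
# Hypothetical helper: replace every colon in CSV field 2 with a space.
# Assumes fields never contain quoted commas.
fix_field2() {
  awk 'BEGIN { FS = OFS = "," } { gsub(/:/, " ", $2) } 1' "$@"
}

printf '%s\n' \
  '8979880, Number One : Exclusive Mix, 387387, http://www.smashhits.com' \
  '48094940, Highlander:The Return, 489494, http://www.instantaccess.com' |
  fix_field2
```

Because gsub is applied to $2 only, the colon in the URL field is never touched.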
You can use gnu sed like this:
sed -r 's/^([^,]*,[^,]*):/\1 /g' file.csv
Explanation
^ anchors the expression at the start of each line
[^,]*, matches the first field including the separator
[^,]*: then matches the second field up to a :
the parentheses in ^(...): make sure that everything up to, but not including, the : in the second field is captured into \1
finally, the replacement \1 followed by a space puts back the captured text and replaces the : with a space on each line where the regex matched
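Since the expression is anchored with ^, it can match only once per line, so the g flag has no effect and at most one colon in field 2 is replaced. A sketch of a looping variant for fields with several colons, assuming a sed that accepts -E for extended regular expressions (colons_to_spaces is a hypothetical helper name):

```shell
# Hypothetical helper: repeat the substitution until field 2 has no colon left.
# :a defines a label; ta jumps back to it after a successful substitution.
colons_to_spaces() {
  sed -E -e ':a' -e 's/^([^,]*,[^,]*):/\1 /' -e 'ta' "$@"
}

printf '%s\n' '1, a:b:c, 2, http://example.com' | colons_to_spaces
```

Each pass replaces one colon in field 2; the loop stops once the pattern no longer matches, and the colon in the URL is left alone because [^,]* cannot cross a comma.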
Related
Is there a way to use sed (possibly with another command) to transform all the keys in a file that lists key-values like this:
a.key.one-example=a_value_one
a.key.two-example=a_value_two
and I want this:
A_KEY_ONE_EXAMPLE=a_value_one
A_KEY_TWO_EXAMPLE=a_value_two
What I did so far :
sed -e 's/^[^=]*/\U&/'
it produced this :
A.KEY.ONE-EXAMPLE=a_value_one
A.KEY.TWO-EXAMPLE=a_value_two
But I still need to replace the "." and "-" on left part of the "=". I don't think it is the right way to do it.
This can be done very easily in awk. awk is the better tool IMHO for this task, it keeps it simple and easy.
awk 'BEGIN{FS=OFS="="} {$1=toupper($1);gsub(/[.-]/,"_",$1)} 1' Input_file
Simple explanation:
Set the field separator and the output field separator to =
Use awk's built-in toupper function to make $1 (the first field) upper case and save the result back into $1.
Use gsub to substitute every . or - in $1 with _, as required.
Use 1, the idiomatic way to print a line in awk.
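A small sketch of the same command on sample input, including a value that itself contains ., - and = (an assumed edge case, not part of the question) to show that only the key is rewritten; keys_to_upper is a hypothetical helper name:

```shell
# Hypothetical helper wrapping the awk command from the answer.
keys_to_upper() {
  awk 'BEGIN { FS = OFS = "=" } { $1 = toupper($1); gsub(/[.-]/, "_", $1) } 1' "$@"
}

printf '%s\n' \
  'a.key.one-example=a_value_one' \
  'a.key.two-example=x.y-z=1' |
  keys_to_upper
```

Because both FS and OFS are =, the remaining fields are joined back unchanged when awk rebuilds the line. Note that a line with no = at all would be treated entirely as $1 and converted as a whole.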
This might work for you (GNU sed):
sed -E 'h;y/.-/__/;s/.*/\U&/;G;s/=.*=/=/' file
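Make a copy of the current line in the hold space.
Translate . and - to _.
Uppercase the whole line.
Append the copy.
Remove the centre portion, from the first = to the last =, which leaves the converted key and the original value (this assumes the value itself contains no =).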
You can use
sed ':a;s/^\([^=]*\)[.-]\([^=]*\)/\U\1_\2/g;ta' file > newfile
Details:
:a - sets an a label
s/^\([^=]*\)[.-]\([^=]*\)/\U\1_\2/g - replaces the ^\([^=]*\)[.-]\([^=]*\) pattern, which matches
^ - start of string
\([^=]*\) - Group 1 (\1): any zero or more chars other than =
[.-] - a dot or hyphen
\([^=]*\) - Group 2 (\2): any zero or more chars other than =
with Group 1 + _ + Group 2, converted to upper case by \U
ta - jumps back to the a label position upon successful replacement
See the online demo:
#!/bin/bash
s='a.key.one-example=a_value_one
a.key.two-example=a_value_two'
sed ':a;s/^\([^=]*\)[.-]\([^=]*\)/\U\1_\2/g;ta' <<< "$s"
Output:
A_KEY_ONE_EXAMPLE=a_value_one
A_KEY_TWO_EXAMPLE=a_value_two
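Note that the loop above only upper-cases a key that contains at least one . or -, since the pattern requires one of them to match. A variant sketch that upper-cases first (reusing the command from the question) and then loops over the separators; like the original, it relies on the GNU sed \U extension, and upper_keys is a hypothetical helper name:

```shell
# Hypothetical helper (GNU sed assumed because of \U):
# 1. upper-case everything before the first =
# 2. replace one . or - in the key per pass, looping until none is left
upper_keys() {
  sed -e 's/^[^=]*/\U&/' -e ':a' -e 's/^\([^=]*\)[.-]/\1_/' -e 'ta' "$@"
}

printf '%s\n' \
  'a.key.one-example=a_value_one' \
  'plainkey=some.value' |
  upper_keys
```

The value is never modified because [^=]* cannot extend past the first =.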
I would like to use sed to find and replace every occurrence of - with _ but only before the first occurrence of = on every line.
Here is a dataset to work with:
ke-y_0-1="foo"
key_two="bar"
key_03-three="baz-jazz-mazz"
key-="rax_foo"
key-05-five="craz-"
In the end the dataset should look like this:
ke_y_0_1="foo"
key_two="bar"
key_03_three="baz-jazz-mazz"
key_="rax_foo"
key_05_five="craz-"
I found this regex will match properly.
\-(?=.*=)
However the regex uses positive lookaheads and it appears that sed (even with -E, -e or -r) does not know how to work with positive lookaheads.
I tried the following but keep getting Invalid preceding regular expression
cat dataset.txt | sed -r "s/-(?=.*=)/_/g"
Is it possible to convert this in a usable way with sed?
Note, I do not want to use perl. However I am open to awk.
You can use
sed ':a;s/^\([^=]*\)-/\1_/;ta' file
See the online demo:
#!/bin/bash
s='ke-y_0-1="foo"
key_two="bar"
key_03-three="baz-jazz-mazz"
key-="rax_foo"
key-05-five="craz-"'
sed ':a; s/^\([^=]*\)-/\1_/;ta' <<< "$s"
Output:
ke_y_0_1="foo"
key_two="bar"
key_03_three="baz-jazz-mazz"
key_="rax_foo"
key_05_five="craz-"
Details:
:a - setting a label named a
s/^\([^=]*\)-/\1_/ - matches any zero or more chars other than = from the start of string (capturing them into Group 1 (\1)), then a - char, and replaces the match with the Group 1 value (\1) and a _ (which replaces the matched - char)
ta - jump to the label a location upon successful replacement. Else, stop.
You might also use awk setting the field separator to = and replace all - with _ for the first field.
To print only the replaced lines:
awk 'BEGIN{FS=OFS="="}gsub("-", "_", $1)' file
Output
ke_y_0_1="foo"
key_03_three="baz-jazz-mazz"
key_="rax_foo"
key_05_five="craz-"
If you want to print all lines:
awk 'BEGIN{FS=OFS="="}{gsub("-", "_", $1);print}' file
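One difference from the lookahead regex in the question: on a line with no = at all, the whole line is $1, so its hyphens would be replaced too. A sketch that skips such lines by checking the field count, assuming that is the desired behaviour (hyphens_in_key is a hypothetical helper name):

```shell
# Hypothetical helper: only touch lines that actually contain an =,
# i.e. lines that split into more than one field.
hyphens_in_key() {
  awk 'BEGIN { FS = OFS = "=" } NF > 1 { gsub(/-/, "_", $1) } 1' "$@"
}

printf '%s\n' \
  'key_03-three="baz-jazz-mazz"' \
  'no-equals-sign-here' |
  hyphens_in_key
```

The trailing 1 still prints every line, changed or not.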
I have a file which I want to modify into a new file using cat.
So the file contains lines like:
name "myName"
place "xyz"
and so on....
I want these lines to be changed to
name "Jon"
place "paris"
I tried to do it like this but it's not working:
cat originalFile | sed 's/^name\*/name "Jon"/' > tempFile
I tried using all sorts of special characters and it did not work. I am unable to recognize the space characters after name and then "myName".
You may match the rest of the line using .*, and you may match a space with a space, or [[:blank:]] or [[:space:]]:
sed 's/^\(name[[:space:]]\).*/\1"Jon"/;s/^\(place[[:space:]]\).*/\1"paris"/' originalFile > tempFile
Note there are two replace commands here joined with a semicolon. The first part of each pattern is wrapped in a capturing group: the space POSIX character class is not a literal, so in order to keep the matched whitespace after replacing, the \1 backreference is used (it inserts the text captured with Group 1).
See the online demo:
s='name "myName"
place "xyz"'
sed 's/^\(name[[:space:]]\).*/\1"Jon"/;s/^\(place[[:space:]]\).*/\1"paris"/' <<< "$s"
Output:
name "Jon"
place "paris"
An awk alternative:
awk '$1=="name"{$0="name \"Jon\""} $1=="place"{$0="place \"paris\""} 1' originalFile
It also works when there are space(s) before name or place.
It's not a regex match here but just a string comparison.
By default, awk separates fields on runs of whitespace (spaces or tabs).
Append > tempFile to it when the results seem correct to you.
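If the new values should not be hard-coded in the awk program, they can be passed in as awk variables. A minimal sketch, assuming the values contain no backslashes (awk -v interprets escape sequences); set_values, new_name and new_place are hypothetical names:

```shell
# Hypothetical helper: $1 is the new name, $2 the new place; reads stdin.
set_values() {
  awk -v new_name="$1" -v new_place="$2" '
    $1 == "name"  { $0 = "name \"" new_name "\"" }
    $1 == "place" { $0 = "place \"" new_place "\"" }
    1'
}

printf '%s\n' 'name "myName"' 'place "xyz"' 'other "keep"' |
  set_values Jon paris
```

Lines whose first field is neither name nor place are printed unchanged.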
I need to search for each instance of a colon ":" and then prepend a string to the word before that colon.
Example:
some data here word:number
Desired outcome:
some data here prepend_word:number
I've tried:
sed "s/:/s/^/prepend_/g"
This adds prepend_ to the beginning of the line: prepend_some data here word:number
sed "s/:/prepend_&/g"
this adds prepend_ right before the colon: some data here wordprepend_:number
You need to use
sed 's/[^[:space:]]*:/prepend_&/g'
The [^[:space:]]*: pattern searches for 0 or more non-whitespace chars and a : after them, and the prepend_& replacement pattern will replace the match with itself (see &) and insert prepend_ before it.
See an online sed demo:
sed 's/[^[:space:]]*:/prepend_&/g' <<< "some data here word:number more:here"
Output: some data here prepend_word:number prepend_more:here.
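Because * also allows zero characters, a colon with no word directly in front of it still gets the prefix. If that is not wanted, a sketch that requires at least one non-whitespace character before the colon (prepend_words is a hypothetical helper name):

```shell
# Original pattern: a bare ":" preceded by whitespace also matches.
printf '%s\n' 'a :b word:1' | sed 's/[^[:space:]]*:/prepend_&/g'

# Hypothetical helper: [^[:space:]][^[:space:]]* means "one or more",
# written without + so it works in basic regular expressions.
prepend_words() {
  sed 's/[^[:space:]][^[:space:]]*:/prepend_&/g' "$@"
}

printf '%s\n' 'a :b word:1' | prepend_words
```

The first command prefixes the lone colon as well, while the second only prefixes word:.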
I am attempting to change all coordinate information in a fastq file to zeros. My input file is composed of millions of entries in the following repeating 4-line structure:
#HWI-SV007:140:C173GACXX:6:2215:16030:89299 1:N:0:CAGATC
GATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAG
+
###FFFDFHGGDHIIHGIJJJJJJJJJJJGIJJJJJJJIIIDHGHIGIJJIIIJJIJ
I would like to replace the two numeric strings in the first line 16030:89299 with zeros in a generic way, such that any numeric string between the colons, before the space, is replaced. I would like the output to appear as follows, replacing the two strings globally throughout the file with zeros:
#HWI-SV007:140:C173GACXX:6:2215:0:0 1:N:0:CAGATC
GATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAG
+
###FFFDFHGGDHIIHGIJJJJJJJJJJJGIJJJJJJJIIIDHGHIGIJJIIIJJIJ
I am attempting to do this using the following sed:
sed 's/:^[0-9]+$:^[0-9]+$\s/:0:0 /g'
However, this does not behave as expected.
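You need the -r option (extended regular expressions) for + to act as a quantifier.
Also, ^ matches the beginning of the line and $ matches the end of the line, so they cannot be used to delimit a field in the middle of the line.
This command line works against your sample: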
sed -r 's/:[0-9]+:[0-9]+\s/:0:0 /g'
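The same idea can be sketched without the GNU-specific \s, using -E and the POSIX class [[:space:]], and capturing the whitespace so it is put back as-is. This assumes a sed that accepts -E; zero_coords is a hypothetical helper name:

```shell
# Hypothetical helper: zero the two numeric fields that precede the whitespace.
zero_coords() {
  sed -E 's/:[0-9]+:[0-9]+([[:space:]])/:0:0\1/' "$@"
}

printf '%s\n' \
  '#HWI-SV007:140:C173GACXX:6:2215:16030:89299 1:N:0:CAGATC' \
  'GATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAG' \
  '+' \
  '###FFFDFHGGDHIIHGIJJJJJJJJJJJGIJJJJJJJIIIDHGHIGIJJIIIJJIJ' |
  zero_coords
```

Only the header line contains a colon-number-colon-number sequence followed by whitespace, so the other three lines pass through unchanged.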
Some alternatives:
awk -F ":" 'BEGIN{ OFS = ":" }{ if ( NF > 1 ) {$6 = 0; sub( /^[0-9]*/, 0, $7)}; print $0 }' YourFile
using columns separated by :
sed 's/^\(\([^:]*:\)\{5\}\)[^[:blank:]]*/\10:0/' YourFile
using the first 5 elements separated by :, then a blank as the delimiter (\10 here is \1 followed by a literal 0)
For your sed (GNU sed; without -r the quantifier has to be written \+):
sed 's/:[0-9]\+:[0-9]\+\(\s\)/:0:0\1/'
^ and $ are relative to the whole line, not to the current word
the \(\s\) group keeps the original whitespace character instead of replacing it with a blank space (in case it is something else, like \t)
g is not needed (and better not used here) because normally there is only 1 occurrence per line
you need to be sure that the pattern cannot occur somewhere else (never a space after an earlier pair of numbers), because it is a short one
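The NF > 1 test in the awk alternative would also fire on a quality line that happens to contain colons. Since the question describes a strict repeating 4-line structure, a sketch that edits only the first line of each record, under that assumption (zero_header_coords is a hypothetical helper name):

```shell
# Hypothetical helper: with 4-line records, header lines are lines 1, 5, 9, ...
# so NR % 4 == 1 selects them regardless of what the other lines contain.
zero_header_coords() {
  awk 'BEGIN { FS = OFS = ":" }
       NR % 4 == 1 { $6 = 0; sub(/^[0-9]+/, "0", $7) }
       1' "$@"
}

printf '%s\n' \
  '#HWI-SV007:140:C173GACXX:6:2215:16030:89299 1:N:0:CAGATC' \
  'GATTACAG' \
  '+' \
  'II:II:II:II:II:12:34II' |
  zero_header_coords
```

The fourth line of this sample is made up to contain colons; it is printed unchanged because only header lines are modified.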