I want to find every single quote ' between double quotes and replace it with \' ' (backslash, single quote, space, single quote) using the sed command.
input = 'gender':"Men's",'colour':'Red','name':"Men's levi's"
output = 'gender':"Men\' 's",'colour':'Red','name':"Men\' 's levi\' 's"
I tried this where I can replace comma with pipe but when trying to replace single quote with \' ' it doesn't work:
sed 's/(\"[^"\'']\{1,\}),([^"\'']\{1,\}\")/\1 | \2/g' test.csv
Here is a way to do that using awk:
awk 'BEGIN{FS=OFS=","} {
for (i=1; i<=NF; i++)
if (split($i, a, / *: */) == 2 && a[2] ~ /^"/) {
gsub("\047", "\\\047 \047", a[2])
$i=a[1] ":" a[2]
}
} 1' file
'gender':"Men\' 's",'colour':'Red','name':"Men\' 's levi\' 's"
With GNU awk for multi-char RS and RT, all you need is:
$ awk -v RS='"[^"]+"' '{gsub(/\047/,"\\\047 \047",RT); ORS=RT} 1' file
'gender':"Men\' 's",'colour':'Red','name':"Men\' 's levi\' 's"
With sed you could do this:
sed -e ":a"
-e "s/'\([^\\\":]*\(\\.[^\\\":]*\)*\"\)/\\\\\f \f\1/"
-e "ta"
-e "s/\\\\\f \f/\\\' '/g" file
Line breaks and indentation are for readability only. The idea is to first match a single quote that is followed (not necessarily immediately) by a double quote and replace it with the placeholder \\\f \f (\\ is a literal backslash, \f a form feed), repeat that in a loop (t) until nothing more matches, and then replace every placeholder with the desired string. The main regex also takes care of escaped double quotation marks inside a double-quoted string, but it fails if the string contains colons : or commas ,.
One-liner:
sed -e ":a" -e "s/'\([^\\\":]*\(\\.[^\\\":]*\)*\"\)/\\\\\f \f\1/" -e "ta" -e "s/\\\\\f \f/\\\' '/g" file
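If the quoted strings may contain colons or commas, a character-by-character scan avoids that limitation. The following is only a sketch in portable awk, not the sed technique above. The function name escape_inner_single_quotes is made up for illustration, and it assumes that double quotes are balanced on each line and never escaped.

```shell
# Hypothetical helper: inside double-quoted strings, turn ' into \' '
# Assumes balanced, unescaped double quotes on every line.
escape_inner_single_quotes() {
    awk '{
        out = ""; inside = 0
        n = length($0)
        for (i = 1; i <= n; i++) {
            c = substr($0, i, 1)
            if (c == "\"") inside = !inside           # toggle on every double quote
            if (inside && c == "\047") c = "\\\047 \047"  # \047 is the single quote
            out = out c
        }
        print out
    }' "$@"
}

# usage: escape_inner_single_quotes file
```

Single quotes outside double quotes (such as those around 'colour') are left alone because the flag is off there.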
I'm converting a double quoted CSV to pipeline delimited txt file in Unix.
I have used the following sed command to replace "," into | then remove starting and ending double quote.
sed -e 's/","/|/g' -e 's/"//g' filenm.csv > filenm.txt
But the file seems to have consecutive commas without double quotes and they are not getting replaced.
Col1|col2|col3|col4|col5|col6|col7|col8
Val1|val2|val3,,,,val7|val8
Now I want to convert all these consecutive commas to consecutive pipelines as they indicate empty or null fields.
And other fields also have commas inside field values which should not be altered.
I tried the following for that, but it does not work.
sed -e 's/,{1,\}/|{1,\}/g' filenm.csv > filenm.txt
sample csv file opened in notepad:
"ID","Name","DOB","Age","Address","City","State","Country","Phone number"
"123","ABC","12/20/2020","15","No.38,3rd st, RRR NNN, TRT",,,,"9999999999"
"456","DEF","12/20/2020",,,,,"test-country","9999999999"
"465","XYZ",,,"No.38,3rd st, RRR NNN, TRT",,,,"9999999999"
I hope this helps to reproduce the issue and resolve.
Thanks in advance....
This might work for you (GNU sed):
sed -E ':a;s/^(("[^",]*",+)*"[^",]*),/\1\n/;ta;y/,\n/|,/' file
Iteratively replace ,'s between "'s with newlines, then translate ,'s for |'s and newlines for ,'s.
You can use perl:
perl -pe 's/"([^"]*)"|,/defined($1) ? $1 : "|"/ge' filenm.csv > filenm.txt
Details:
"([^"]*)"|, - the regex pattern that matches ", then captures into Group 1 any zero or more chars other than " and then matches a ", or just matches a , in all other contexts
defined($1) ? $1 : "|" - RHS, replacement, that replaces the match either with Group 1 value (if Group 1 was matched) or with a | (if the , was matched)
ge - g stands for global (replaces all occurrences) and e makes Perl treat the RHS as a Perl expression.
See an online test:
#!/bin/bash
s='"ID","Name","DOB","Age","Address","City","State","Country","Phone number"
"123","ABC","12/20/2020","0","No.38,3rd st, RRR NNN, TRT",,,,"9999999999"'
perl -pe 's/"([^"]*)"|,/defined($1) ? $1 : "|"/ge' <<< "$s"
Output:
ID|Name|DOB|Age|Address|City|State|Country|Phone number
123|ABC|12/20/2020|0|No.38,3rd st, RRR NNN, TRT||||9999999999
Using awk:
awk -F \" '{ for(i=1;i<=NF;i++) { if ($i ~ /^[,]{2,}$/) { $i="," } } OFS="\"";gsub("\",\"","\"|\"",$0)}1' sample.csv
Explanation:
awk -F \" '{ # Set the field delimiter to double quote
for(i=1;i<=NF;i++) {
if ($i ~ /^[,]{2,}$/) {
$i="," # Loop through each field and, if it consists of two or more commas, set that field to a single comma
}
}
OFS="\"";
gsub("\",\"","\"|\"",$0) # Substitute "," for "|"
}1' sample.csv
I would use GNU AWK for that following way. Let file.txt content be
"ID","Name","DOB","Age","Address","City","State","Country","Phone number"
"123","ABC","12/20/2020","15","No.38,3rd st, RRR NNN, TRT",,,,"9999999999"
"456","DEF","12/20/2020",,,,,"test-country","9999999999"
"465","XYZ",,,"No.38,3rd st, RRR NNN, TRT",,,,"9999999999"
then
awk 'BEGIN{FS="\"";OFS=""}{for(i=1;i<=NF;i+=2){$i=gensub(/,/,"|","g",$i)};print $0}' file.txt
output
ID|Name|DOB|Age|Address|City|State|Country|Phone number
123|ABC|12/20/2020|15|No.38,3rd st, RRR NNN, TRT||||9999999999
456|DEF|12/20/2020|||||test-country|9999999999
465|XYZ|||No.38,3rd st, RRR NNN, TRT||||9999999999
I assumed that the first and last columns are never empty. I use " as the field separator, and then in every odd field (these contain only commas) I change every , to |. Finally I print the whole altered line.
(tested in GNU Awk 5.0.1)
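The same idea can be sketched without gensub, which is specific to GNU awk. This is a hedged variant, assuming (as above) that every non-empty field is double-quoted and that double quotes never appear inside a field. The function name csv_to_pipes is hypothetical.

```shell
# Hypothetical helper: quoted CSV -> pipe-delimited text, POSIX awk only.
# Splitting on " puts the separators (runs of commas) in the odd fields.
csv_to_pipes() {
    awk 'BEGIN { FS = "\""; OFS = "" }
    {
        $1 = $1                      # force the record to be rebuilt without quotes
        for (i = 1; i <= NF; i += 2)
            gsub(/,/, "|", $i)       # commas outside quotes become pipes
        print
    }' "$@"
}

# usage: csv_to_pipes file.txt
```

The $1 = $1 assignment is there because gsub only rebuilds the line when it actually replaces something, so a line with a single quoted value would otherwise keep its quotes.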
I am trying to remove a constant string of characters that match a pattern. I can match the pattern via awk, is there a combination of awk and sed or perhaps just awk that can delete the string in place?
Example:
I need to match the 14th, 15th and 16th "|" symbol and delete the content in between.
Before:
00000000,003377fdh,,BLUE,YELLOW,ORANGE,UANGTANG,||57000000|1250000000|2|ramp|CAR|||||||24000|11000|apples,12-15-2017
After:
00000000,003377fdh,,BLUE,YELLOW,ORANGE,UANGTANG,||57000000|1250000000|2|ramp|CAR||||||,12-15-2017
awk -F '|' -v 'OFS=|' '{$13 = $16; NF -= 3; print}' file
or
perl -F'\|' -ne 'splice(@F, 12, 3); print join("|", @F)' file
You can try this sed too
sed -E 's/(([^|]*\|){3})//5' infile
I want to replace blanks with newline characters in a file. Bunch of other things I tried from the answers to other questions here didn't work:
sed -e 's/\s\+/\n/g' file
sed -e 's/[[:blank:]]\+/\n/g' file
These both return the file as it is. I tried the following:
sed -e 's/[[:blank:]]/\n/g' file
which replaces the blanks with ns.
I assume the difference is due to the difference between gnu sed and the one in OS X. How can I achieve this in OS X?
The trick is to insert a literal newline in the replacement (an actual line break, escaped with a backslash).
$ echo 'this will replace blanks with new lines' | sed 's/ /\
/g'
sed on OS X doesn't recognize \n in the replacement; you need to use a literal newline, and you have to escape it with a backslash to prevent it from ending the command. It also doesn't understand \s or \+, so use [[:blank:]]\{1,\} to match one or more blanks.
sed -e 's/[[:blank:]]\{1,\}/\
/g' file
The tr command is easier/more-suitable IMO:
tr ' ' '\n' < "$FILE_PATH"
or:
echo 'this will replace blanks with new lines' | tr ' ' '\n'
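Note that tr ' ' '\n' turns every single space into its own newline, so a run of spaces produces empty lines, and tabs are left untouched. A possible variant, offered as a sketch: the -s option squeezes repeated output characters and the [:blank:] class covers both spaces and tabs. The function name blanks_to_newlines is invented for this example, and the squeeze also collapses blank lines that were already in the input.

```shell
# Hypothetical helper: each run of spaces/tabs becomes exactly one newline.
# Side effect: existing empty lines are squeezed away as well.
blanks_to_newlines() {
    tr -s '[:blank:]' '\n'
}

printf 'this  will\treplace blanks\n' | blanks_to_newlines
```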
I have a file with the condition like this :
"one","two","three"" four","five"
So I want to remove the quotes mark within the double quotes, so the output be like this :
"one","two","three four ","five"
How can I do that with awk function and regular expression on ubuntu? Thanks...
You can simply look for "" and replace it by an empty string.
Like:
sed -i 's/""//' *.txt
For example:
echo '"one","two","three"" four","five"' | sed 's/""//'
"one","two","three four","five"
sed is the right tool for this.
$ echo '"one","two","three"" four","five"' | sed 's/\([^,]\)"\+\([^,]\)/\1\2/g'
"one","two","three four","five"
The above regex captures the non-comma character that exists before, and the one that exists after, a run of one or more double quotes. So it matches only the double quotes that sit in the middle of a field.
OR
$ echo '"one","two","three"" four","five"' | sed -r 's/([^,])"+([^,])/\1\2/g'
"one","two","three four","five"
[^,] matches any character but not of a comma.
([^,]) the matched character is captured into group 1. It's like a temporary storage area.
"+ matches one or more double quotes.
([^,]) captures the following character which won't be a comma.
\1\2 all the matched chars are replaced with the characters stored inside group index 1 and the group index 2.
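A single pass with the g flag can miss matches that overlap, because the character captured as group 2 cannot also serve as group 1 of the next match. Repeating the substitution in a loop until nothing changes avoids guessing the number of passes. This is a sketch in POSIX basic regular expressions (\{1,\} instead of the GNU-only \+), assuming that field values contain no commas. The function name strip_inner_quotes is invented here.

```shell
# Hypothetical helper: drop double quotes that are not next to a comma
# or a line boundary, looping until no such quote is left.
strip_inner_quotes() {
    sed -e ':a' \
        -e 's/\([^,]\)"\{1,\}\([^,]\)/\1\2/' \
        -e 'ta' "$@"
}

echo '"one","two","three"" four","five"' | strip_inner_quotes
```

Each successful substitution removes at least one character, so the loop always terminates.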
Update:
$ echo '"one","two","three" vg " "gfh" four","five"' | sed -r 's/([^,])"+([^,])/\1\2/g;s/([^,])"+([^,])/\1\2/g'
"one","two","three vg gfh four","five"
Using awk you can do:
s='"one","two","three"" four","five"'
awk 'BEGIN{FS=OFS=","} {for (i=1; i<=NF; i++) gsub(/""/, "", $i)} 1' <<< "$s"
"one","two","three four","five"
I'm new to SED and have what may be a simple question. I've used it before to replace and delete characters but this is a little different. I need to eliminate commas within quotations, then eliminate the quotations in a csv file. So this:
"5,196,386","99,017",493,21
should end up looking like this:
5196386,99017,493,21
gnu awk one-liner:
awk -v FPAT='([^,]+|"[^"]+")' -v OFS="," '{for(i=1;i<=NF;i++)gsub(/[",]/,"",$i)}7'
with your example:
kent$ awk -v FPAT='([^,]+|"[^"]+")' -v OFS="," '{for(i=1;i<=NF;i++)gsub(/[",]/,"",$i)}7' <<< '"5,196,386","99,017",493,21'
5196386,99017,493,21
You'll need to do that with multiple s/// operations. The first will eliminate the commas between pairs of quotes when there are only commas and digits between the quotes; the second will eliminate the quotes (which by now have only digits between them):
sed -e 's/"\([0-9][0-9]*\),\([0-9,][0-9,]*\)"/"\1\2"/g' \
-e 's/"\([0-9][0-9]*\),\([0-9,][0-9,]*\)"/"\1\2"/g' \
-e 's/"\([0-9][0-9]*\)"/\1/g'
You have to repeat the first operation as often as the maximum number of commas that can appear between quotes. If your values go into the billions, you'll need a third copy of it.
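Instead of copying the first expression once per possible comma, the substitution can be repeated with a label and the t command until it stops matching. This is a sketch under the same assumption as above (only digits and commas between the quotes); strip_quoted_commas is a hypothetical name.

```shell
# Hypothetical helper: remove commas inside quoted numbers, then the quotes.
strip_quoted_commas() {
    sed -e ':a' \
        -e 's/"\([0-9][0-9]*\),\([0-9,][0-9,]*\)"/"\1\2"/' \
        -e 'ta' \
        -e 's/"\([0-9][0-9]*\)"/\1/g' "$@"
}

echo '"5,196,386","99,017",493,21' | strip_quoted_commas
```

Every pass removes one comma from one quoted value, so values in the billions or beyond need no extra copies of the expression.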
I'd use a language with a proper CSV parser. For example:
echo '"5,196,386","99,017",493,21' |
ruby -rcsv -ne 'CSV.parse($_) do |row|
puts CSV.generate_line(row.map {|e| e.delete(",")})
end'
5196386,99017,493,21
This should work with nearly all awk
echo '"5,196,386","99,017",493,21' | awk 'BEGIN {FS=OFS=""} {for (i=1;i<=NF;i++) {if ($i=="\"") {f=!f;$i=""}; if (f && $i==",") $i=""}}1'
5196386,99017,493,21
How does it work:
awk '
BEGIN { # Begin block
FS=OFS=""} # Set input and output Field separator to "" (nothing) makes loop work on every characters
{for (i=1;i<=NF;i++) { # Loop through the line, one character at a time
if ($i=="\"") { # If a double quote is found do:
f=!f # Toggle the flag "f" (if "f" is true, you are inside a double-quoted string)
$i=""} # Delete the double quote
if (f && $i==",") # If "f" is true and we find a comma "," (inside a double quote string):
$i=""} # Delete the comma
}
1 # Print the line.
' file
This might work for you (GNU sed):
sed -r ':a;s/"[0-9,]+"/\n&\n/;T;h;s/[,"]//g;G;s/.*\n(.*)\n.*\n(.*)\n.*\n/\2\1/;ta' file
This puts \n markers round a double quoted field, makes a copy of the whole line, removes double quotes and commas, then puts the line back together again and repeats till no more changes are needed.
An alternative method:
sed -r 's/^/\n/;ta;:a;s/\n+$//;t;s/\n\n"/\n/;ta;s/\n"/\n\n/;ta;s/\n\n,/\n\n/;ta;s/(\n+)(.)/\2\1/;ta' file
Passes character by character through the string using a \n as a marker. Two \n's mark that the next character is within a quoted field.
awk '{gsub(/"5,196,386","99,017"/,"5196386,99017")}1' file
5196386,99017,493,21