How to replace multiple columns in a file?

Let's say I have a file where I want to replace the 1st, 3rd and 5th columns with AAAA. The file is delimited by | (pipe).
So far I am using a for loop to do that, but I think it consumes a lot of space because I copy the result to a temporary file after each column replacement (the input file filename is huge):
ARRAY=(1 3 5)
for i in "${ARRAY[@]}"
do
sed "s/[^|]*/AAAA/$i" filename > /tmp/tempfile
cp /tmp/tempfile filename
rm /tmp/tempfile
done
Please suggest a smarter way

Why don't you just use awk for this? It is straightforward:
awk -v repl="AAAA" 'BEGIN{FS=OFS="|"} {$1=$3=$5=repl}1' file > new_file
Test
$ cat a
1|2|3|4|5|6|7|8
1|2|3|4|5|6|7|8
1|2|3|4|5|6|7|8
1|2|3|4|5|6|7|8
1|2|3|4|5|6|7|8
$ awk -v repl="AAAA" 'BEGIN{FS=OFS="|"} {$1=$3=$5=repl}1' a
AAAA|2|AAAA|4|AAAA|6|7|8
AAAA|2|AAAA|4|AAAA|6|7|8
AAAA|2|AAAA|4|AAAA|6|7|8
AAAA|2|AAAA|4|AAAA|6|7|8
AAAA|2|AAAA|4|AAAA|6|7|8

A generic version, for several columns (set in Col) and the replacement pattern (set in Pat):
awk -v Col='1,3,5' -v Pat='AAAA' 'BEGIN{OFS=FS="|"} {for(i=1;i<=NF;i++) if ("," Col "," ~ "," i ",") $i=Pat; print}' YourFile > NewFile
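For example, run against one line of the sample data from above it gives:
$ echo '1|2|3|4|5|6|7|8' | awk -v Col='1,3,5' -v Pat='AAAA' 'BEGIN{OFS=FS="|"} {for(i=1;i<=NF;i++) if ("," Col "," ~ "," i ",") $i=Pat; print}'
AAAA|2|AAAA|4|AAAA|6|7|8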

Related

Remove a line from file that starts with a number in bash

I am trying to create a simple CSV editor in bash,
and I struggle with removing a line. The user passes in the ID of
the line to remove (each row is defined with an ID as the first column).
This is an example file structure:
ID,Name,Surname
0,Mark,Twain
1,Cristopher,Jones
So, having the id saved in a variable and the file name in another variable (say it's file.csv), I attempt to remove it from bash with this line:
read -p "Pass the object's ID: " idtoremove
fname=file.csv
sed -i -e "'/^$idtoremove*,/d'" $fname
However, this has no effect on the file. What could be wrong with this line?
Also, how can I replace a line starting with a given ID with a string from a variable? This is another problem I will have to face, but I have no idea how to approach it.
The following script could help you. It asks the user to enter an id.
cat script.ksh
echo "Please enter the id to be removed:"
read value
awk -v val="$value" -F, '$1!=val' Input_file
In case you want to save the output into Input_file itself, append > tmp_file && mv tmp_file Input_file to the above awk command.
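That is, the assembled command would be:
awk -v val="$value" -F, '$1!=val' Input_file > tmp_file && mv tmp_file Input_file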
With sed:
cat script.ksh
echo "Please enter the id to be removed:"
read value
sed "/^$value,/d" Input_file
Use the -i.bak option with the above sed to save the output into Input_file itself and keep a backup of the original Input_file too.
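For example, this keeps the pre-change contents in Input_file.bak:
sed -i.bak "/^$value,/d" Input_file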
This is best done in awk:
awk -v id="$idtoremove" -F, '$1 != id' file.csv
If you're using GNU awk then you can also save in-place:
awk -i inplace -v id="$idtoremove" -F, '$1 != id' file.csv
For other awk versions use:
awk -v id="$idtoremove" -F, '$1 != id' file.csv > $$.csv &&
mv $$.csv file.csv
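The second part of the question, replacing the line that starts with a given ID by a string from a variable, can be handled the same way; a minimal sketch, assuming the replacement row text is in a hypothetical variable $newrow:
awk -v id="$idtoremove" -v row="$newrow" -F, '$1 == id {$0 = row} 1' file.csv > $$.csv &&
mv $$.csv file.csv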

sed: remove all lines containing the word password, but don't keep the empty line

My file has lines
Database Name:Mydb
DatabaseServer:DbServer
Password:Example
Username:User1
Database Name:Mydb1
DatabaseServer:DbServer1
Password:Example1
Username:User11
I used sed -i "s/password.//gI" file, but that leaves an empty line (in place of the password line), which I don't want.
Desired result:
Database Name:Mydb
DatabaseServer:DbServer
Username:User1
Database Name:Mydb1
DatabaseServer:DbServer1
Username:User11
give this a try:
sed -i '/Password:/d' file
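If the match has to be case-insensitive, as the I flag in the question's attempt suggests, GNU sed also accepts an I modifier on the address:
sed -i '/password:/Id' file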
Solution 1: grep can help here.
grep -v '^Password' Input_file
In case you need to save into the same Input_file, you could do the following.
grep -v '^Password' Input_file > temp && mv temp Input_file
Solution 2: using awk.
awk '!/^Password/' Input_file > temp_file && mv temp_file Input_file

sed & regex expression

I'm trying to add a 'chr' string to the lines where it is not there. This operation is necessary only on the lines that do not have '##'.
At first I used grep + sed commands, as follows, but I want to run the command overwriting the original file.
grep -v "^#" 5b110660bf55f80059c0ef52.vcf | grep -v 'chr' | sed 's/^/chr/g'
So, to run the command on the file itself I write this:
sed -i -E '/^#.*$|^chr.*$/ s/^/chr/' 5b110660bf55f80059c0ef52.vcf
This is the content of the vcf file.
##FORMAT=<ID=DP4,Number=4,Type=Integer,Description="#ref plus strand,#ref minus strand, #alt plus strand, #alt minus strand">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 24430-0009S21_GM17-12140
1 955597 95692 G T 1382 PASS VARTYPE=1;BGN=0.00134309;ARL=150;DER=53;DEA=55;QR=40;QA=39;PBP=1091;PBM=300;TYPE=SNP;DBXREF=dbSNP:rs115173026,g1000:0.2825,esp5400:0.2755,ExAC:0.2290,clinvar:rs115173026,CLNSIG:2,CLNREVSTAT:mult,CLNSIGLAB:Benign;SGVEP=AGRN|+|NM_198576|1|c.45G>T|p.:(p.Pro15Pro)|synonymous GT:DP:AD:DP4 0/1:125:64,61:50,14,48,13
chr1 957898 82729935 G T 1214 off_target VARTYPE=1;BGN=0.00113362;ARL=149;DER=50;DEA=55;QR=38;QA=40;PBP=245;PBM=978;NVF=0.53;TYPE=SNP;DBXREF=dbSNP:rs2799064,g1000:0.3285;SGVEP=AGRN|+|NM_198576|2|c.463+56G>T|.|intronic GT:DP:AD:DP4 0/1:98:47,51:9,38,10,41
If I understand your expected result, try:
sed -ri '/^(#|chr)/! s/^/chr/' file
Your question isn't clear and you didn't provide the expected output, so we can't test a potential solution, but if all you want is to add chr to the start of lines where it's not already present and which don't start with #, then that's just:
awk '!/^(#|chr)/{$0="chr" $0} 1' file
To overwrite the original file using GNU awk would be:
awk -i inplace '!/^(#|chr)/{$0="chr" $0} 1' file
and with any awk:
awk '!/^(#|chr)/{$0="chr" $0} 1' file > tmp && mv tmp file
This can be done with a single sed invocation. The script itself is something like the following.
If you have an input of format
$ echo -e '#\n#\n123chr456\n789chr123\nabc'
#
#
123chr456
789chr123
abc
then prepending chr to non-commented, chr-less lines is done as
$ echo -e '#\n#\n123chr456\n789chr123\nabc' | sed '/^#/ {p
d
}
/chr/ {p
d
}
s/^/chr/'
which prints
#
#
123chr456
789chr123
chrabc
(Note the multiline sed script.)
Now you only need to run this script on the file in-place (-i in modern sed versions).
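Applied in place to the question's file, that would look like this (a sketch using GNU sed's -i; the script body is the same as above):
sed -i '/^#/ {p
d
}
/chr/ {p
d
}
s/^/chr/' 5b110660bf55f80059c0ef52.vcf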

Removing spaces for all the columns of a CSV file in bash/unix

I have a CSV file in which every column contains unnecessary extra spaces added to it before the actual value. I want to create a new CSV file by removing all the spaces.
For example
One line in input CSV file
123, ste hen, 456, out put
Expected output CSV file
123,ste hen,456,out put
I tried using awk to trim each column but it didn't work.
This sed should work:
sed -i.bak -E 's/(^|,)[[:blank:]]+/\1/g; s/[[:blank:]]+(,|$)/\1/g' file.csv
This will remove leading spaces, trailing spaces and spaces around commas.
Update: Here is an awk command to do the same:
awk -F '[[:blank:]]*,[[:blank:]]*' -v OFS=, '{
gsub(/^[[:blank:]]+|[[:blank:]]+$/, ""); $1=$1} 1' file
awk is your friend.
Input
$ cat 38609590.txt
Ted Winter, Evelyn Salt, Peabody
Ulrich, Ethan Hunt, Wallace
James Bond, Q, M
(blank line)
Script
$ awk '/^$/{next}{sub(/^[[:blank:]]*/,"");gsub(/[[:blank:]]*,[[:blank:]]*/,",")}1' 38609590.txt
Output
Ted Winter,Evelyn Salt,Peabody
Ulrich,Ethan Hunt,Wallace
James Bond,Q,M
Note
This one removes the blank lines too - /^$/{next}.
See the awk manual for more information.
To remove leading blank chars with sed:
$ sed -E 's/(^|,) +/\1/g' file
123,ste hen,456,out put
With GNU awk:
$ awk '{$0=gensub(/(^|,) +/,"\\1","g")}1' file
123,ste hen,456,out put
With other awks:
$ awk '{sub(/^ +/,""); gsub(/, +/,",")}1' file
123,ste hen,456,out put
To remove blank chars before and after the values with sed:
$ sed -E 's/ *(^|,|$) */\1/g' file
123,ste hen,456,out put
With GNU awk:
$ awk '{$0=gensub(/ *(^|,|$) */,"\\1","g")}1' file
123,ste hen,456,out put
With other awks:
$ awk '{gsub(/^ +| +$/,""); gsub(/ *, */,",")}1' file
123,ste hen,456,out put
Change ' ' (a single blank char) to [[:blank:]] if you can have tabs as well as blank chars.
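For example, the sed command that trims blanks before and after the values would then read:
$ sed -E 's/[[:blank:]]*(^|,|$)[[:blank:]]*/\1/g' file
123,ste hen,456,out put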
echo " 123, ste hen, 456, out put" | awk '{sub(/^ +/,""); gsub(/, /,",")}1'
123,ste hen,456,out put
Another way to remove multiple leading white-spaces with awk is as below:
$ awk 'BEGIN{FS=OFS=","} {s = ""; for (i = 1; i <= NF; i++) gsub(/^[ \t]+/,"",$i);} 1' <<< "123, ste hen, 456, out put"
123,ste hen,456,out put
FS=OFS="," sets the input and output field separator to ,
s = ""; for (i = 1; i <= NF; i++) loops across each column entry up to the end (i.e. from $1,$2...NF) and the gsub(/^[ \t]+/,"",$i) trims only the leading white-space and not anywhere else (one ore more white-space, note the +) from each column.
If you want to do this for an entire file, consider using a simple script like the one below:
#!/bin/bash
# Output written to the file 'output.csv' in the same path
while IFS= read -r line || [[ -n "$line" ]]; do # Not setting IFS here, all done in 'awk', || condition for handling empty lines
awk 'BEGIN{FS=OFS=","} {s = ""; for (i = 1; i <= NF; i++) gsub(/^[ \t]+/,"",$i);} 1' <<< "$line" >> output.csv
done <input.csv
$ cat > test.in
123, ste hen, 456, out put
$ awk -F',' -v OFS=',' '{for (i=1;i<=NF;i++) gsub(/^ +| +$/,"",$i); print $0}' test.in
123,ste hen,456,out put
or written out loud:
BEGIN {
FS="," # set the input field separator
OFS="," # and the output field separator
}
{
for (i=1;i<=NF;i++) # loop thru every field on record
gsub(/^ +| +$/,"",$i) # remove leading and trailing spaces
print $0 # print out the trimmed record
}
Run with:
$ awk -f test.awk test.in
awk -F' *, *' '$1=$1' OFS=, file_path
You could try, with your file at ~/path/file.csv:
sed "s/, */,/g" ~/path/file.csv
(A plain tr -d ' ' is not suitable here, since it would also delete the spaces inside values such as ste hen.)

Pipe awk's results to sed (deletion)

I am using an awk command (someawkcommand) that prints these lines (awkoutput):
>Genome1
ATGCAAAAG
CAATAA
and then, I want to use this output (awkoutput) as the input of a sed command. Something like that:
someawkcommand | sed 's/awkoutput//g' file1.txt > results.txt
file1.txt:
>Genome1
ATGCAAAAG
CAATAA
>Genome2
ATGAAAAA
AAAAAAAA
CAA
>Genome3
ACCC
The final objective is to delete all lines in a file (file1.txt) containing the exact pattern found previously by awk.
The file results.txt contains (output of sed):
>Genome2
ATGAAAAA
AAAAAAAA
CAA
>Genome3
ACCC
How should I write the sed command? Is there any simple way that sed will recognize the output of awk as its input?
Using GNU awk for multi-char RS:
$ cat file1
>Genome1
ATGCAAAAG
CAATAA
$ cat file2
>Genome1
ATGCAAAAG
CAATAA
>Genome2
ATGAAAAA
AAAAAAAA
CAA
>Genome3
ACCC
$ gawk -v RS='^$' -v ORS= 'NR==FNR{rmv=$0;next} {sub(rmv,"")} 1' file1 file2
>Genome2
ATGAAAAA
AAAAAAAA
CAA
>Genome3
ACCC
The things that might be non-obvious to newcomers but are very common awk idioms:
-v RS='^$' tells awk to read the whole file as one string (instead of its default of one line at a time).
-v ORS= sets the Output Record Separator to the null string (instead of its default newline) so that when the file is printed as a string awk doesn't add a newline after it.
NR==FNR is a condition that is only true for the first input file.
1 is a true condition invoking the default action of printing the current record.
Here is a possible sed solution:
someawkcommand | sed -n 's_.*_/&/d;_;H;${x;s_\n__g p}' | sed -f - file1.txt
First sed command turns output from someawkcommand into a sed expression.
Concretely, it turns
>Genome1
ATGCAAAAG
CAATAA
into:
/>Genome1/d;/ATGCAAAAG/d;/CAATAA/d;
(in sed language: delete lines containing those patterns; mind that you will have to escape /,[,],*,^,$ in your awk output if there are some, with another substitution for instance).
The second sed command reads that as its script (-f - reads sed commands from file -, i.e. gets them from the pipe) and applies it to file file1.txt.
Remark for other readers:
OP wants to use sed, but as noted in the comments, it may not be the easiest way to solve this question. Deleting lines with awk could be simpler. Another (easy) solution could be to use grep with the -v (invert match) and -f (read patterns from file) options, in this way:
someawkcommand | grep -v -f - file1.txt
Edit: Following #rici's comments, here is a new command that takes output from awk as a single multiline pattern.
Disclaimer: It gets dirty. Kids, don't do it at home. Grown-ups are strongly encouraged to consider avoiding sed for that.
someawkcommand | \
sed -n 'H;${x;s_\n__;s_\n_\\n_g;s_.*_H;${x;s/\\n//;s/&//g p}_ p}' | \
sed -n -f - file1.txt
Output from inner sed is:
H;${x;s/\n//;s/>Genome1\nATGCAAAAG\nCAATAA//g p}
Additional drawback: it will leave an empty line in place of the removed pattern. This can't be fixed easily (there are problems if the pattern is at the beginning/end of the file). Add a substitution to remove it if you really feel like it.
This can be done more easily in awk, but the usual "eliminate duplicates" code is not correct. As I understand the question, the goal is to remove entire stanzas from the file.
Here's a possible solution which assumes that the first awk script outputs a single stanza:
awk 'NR == FNR {stanza[nstanza++] = $0; next}
$0 == stanza[i] {++i; next}
/^>/ && i == nstanza {i=0}
i {for (j=0; j<i; ++j) print stanza[j]; i=0}
{print $0;}
' <(someawkcommand) file1.txt
This might work for you (GNU sed):
sed '1{h;s/.*/:a;$!{N;ba}/p;d};/^>/!{H;$!d};x;s/\n/\\n/g;s|.*|s/&\\n*//g|p;$s|.*|s/\\n*$//|p;x;h;d' file1 |
sed -f - file2
This builds a script from file1 and then runs it against file2.
The script slurps in file2 and then does a global substitution using the contents of file1. Finally it removes any blank lines at the end of the file caused by the content deletion.
To see the script produced from file1, remove the pipe and the second sed command.
An alternative way would be to use diff and sed:
diff -e file2 file1 | sed 's/d/p/g' | sed -nf - file2