sed conditional replace single character csv file - replace

The problem is fixing stray commas in text files.
The error is the ',' after M.D:
xxx","M.D,","abc","xxx
The goal is to replace the single , after the D with a . so that the line reads:
xxx","M.D.","abc","xxx
There are over 30 fields in each line.

I don't know all your cases, but for initials this may work:
$ echo 'xxx","M.D,","abc","xxx' | sed -r 's/([A-Z]),/\1./'
xxx","M.D.","abc","xxx

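As an alternative, try the following sed command. The dots in the pattern are escaped because an unescaped . matches any character, not only a literal dot.
sed 's/M\.D,/M.D./g' filename
Output:
$ sed 's/M\.D,/M.D./g' sample
xxx","M.D.","abc","xxx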
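If the same letters could also appear in the middle of a longer field, one option is to include the closing quote in the match so that only a comma at the very end of a field is touched. This is a minimal sketch, not a tested fix for the real data: the second sample line is made up, and the temporary file stands in for the real CSV.

```shell
# Two made-up sample lines: the first has the stray comma,
# the second must stay untouched.
sample=$(mktemp)
printf '%s\n' 'xxx","M.D,","abc","xxx' 'xxx","MAD,","abc","xxx' > "$sample"

# \. matches a literal dot; the trailing " restricts the match
# to a comma that sits right before the end of a field.
fixed=$(sed 's/\(M\.D\),"/\1."/g' "$sample")
printf '%s\n' "$fixed"

rm -f "$sample"
```

Only the first line changes, because MAD does not match M\.D.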

Deleting everything between two string matches in a file

I got this text in file.txt:
Osmun.Prez#mail.com:c7lB2m6b#3.a.a:tt_webid_v2=6990226111024612869; tt_webid=6990226111024612869; tt_csrf_token=VD5Nb_TQFH4RKhoJeSe2nzLB; R6kq3TV7=AHkh4PB6AQAA3LIS90nWf2ss0Q7ZTCQjUat4axctvhQY68DdUEz92RwpmVSX|1|0|e9d6917c2fe555827dcf5ee916ba9778079ab2a9; ttwid=1%7CAFodeNF0iZM2fyy-ZeiZ6HTpZoG_MSx6SmXHgGVQ-V4%7C1627538859%7C59ca1e4a56f9f537b55e655a6dabff88e44eb48502b164ed6b4199f5a5263cb0; passport_csrf_token_default=6f7653c3ce946a6ce5444723fb0c509b; passport_csrf_token=6f7653c3ce946a6ce5444723fb0c509b; sid_guard=0483b7d37f4e4bd20ab3046e29724798%7C1627538893%7C5184000%7CMon%2C+27-Sep-2021+06%3A08%3A13+GMT; uid_tt=27b52febe6222486b9f6b6a90ef4ffeace5ea25c09d29a1583be5a1ecf760996; uid_tt_ss=27b52febe6222486b9f6b6a90ef4ffeace5ea25c09d29a1583be5a1ecf760996; sid_tt=0483b7d37f4e4bd20ab3046e29724798; sessionid=0483b7d37f4e4bd20ab3046e29724798; sessionid_ss=0483b7d37f4e4bd20ab3046e29724798; store-idc=maliva; store-country-code=us; odin_tt=294845c8f7711db177f7c549a9f44edb1555031b27a2a485df809cd92c4e544ac0772bf462df5b7a100f6e488c45303cd62df3b6b950f0842520cd887850137b035d990f29cc8b752765e594560c977f; cmpl_token=AgQQAPNSF-RMpbE89z5HYF0_-2PcrxjXf4fZYP5_ZA
How can I delete everything from :tt_ to _ZA (the first and only instance in the string) in file.txt, keeping only Osmun.Prez#mail.com:c7lB2m6b#3.a.a, using bash on Linux?
Thank you
Something like:
sed -i "s/:tt_.*//" file.txt
if you want to edit the file in place. If not, remove the -i switch.
The sed command means: in each line of file.txt, replace (s) the pattern :tt_ and every character after it (.*) with an empty string (//).
Or the command:
sed -i "s/:tt_.*_ZA//" file.txt
which is closer to what you asked for, but produces the same output here.
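To see why both commands give the same result, here is a small sketch on a shortened, made-up line (the variable names and the sample value are assumptions, not the real data). Because .* is greedy, the second pattern runs to the last _ZA on the line, which here is also the end of the line.

```shell
# Shortened, made-up stand-in for the long line in file.txt.
line='user#mail.com:secret#1:tt_webid=1; cmpl_token=abc_ZA'

# Delete from :tt_ to the end of the line.
to_eol=$(printf '%s\n' "$line" | sed 's/:tt_.*//')

# Delete from :tt_ to the last _ZA on the line (.* is greedy).
to_za=$(printf '%s\n' "$line" | sed 's/:tt_.*_ZA//')

printf '%s\n' "$to_eol" "$to_za"
```

The two results would differ only if more text followed the last _ZA on the line.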
Use pattern substitution:
i=$(cat file.txt)
echo "${i/:tt*_ZA}"
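Note that $(cat file.txt) loads the whole file into one string, and * in a shell pattern also matches newlines, so with several lines this would cut from the first :tt in the file to the last _ZA. A per-line variant might look like the sketch below. It swaps in POSIX suffix removal (${line%%pattern}) instead of bash pattern substitution, so it deletes through the end of each line rather than stopping at _ZA; the two sample lines are made up.

```shell
# Made-up two-line sample; a temporary file stands in for file.txt.
tmp=$(mktemp)
printf '%s\n' 'user#mail.com:secret#1:tt_webid=1; token=abc_ZA' \
              'some.one#home.com:pw#2:tt_webid=2; token=def_ZA' > "$tmp"

# Process one line at a time; %% removes the longest suffix
# matching :tt_* from each line.
result=$(
  while IFS= read -r line; do
    printf '%s\n' "${line%%:tt_*}"
  done < "$tmp"
)
printf '%s\n' "$result"

rm -f "$tmp"
```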
Assuming the general requirement is to remove everything after the 2nd : ...
Sample data:
$ cat file.txt
Osmun.Prez#mail.com:c7lB2m6b#3.a.a:tt_webid_v ... to end of line
some.one#home.com:B52_m6b#9_az.more.stuff:delete from here ... to end of line
One sed idea:
$ sed -En 's/^([^:]*:[^:]*).*$/\1/p' file.txt
Osmun.Prez#mail.com:c7lB2m6b#3.a.a
some.one#home.com:B52_m6b#9_az.more.stuff
Using awk
awk 'BEGIN{FS=OFS=":"}{print $1,$2}' file.txt
Using : as the delimiter, it is easy to extract the columns before :tt
This deletes all chars from ":tt_" to the last "_ZA", inclusive, in file.txt
Mac_3.2.57$cat file.txt | sed 's/\(\)[:]tt.*_ZA\(.*\)/\1\2/'
Osmun.Prez#mail.com:c7lB2m6b#3.a.a
Mac_3.2.57$
Or, if it is always the first 2 values, separated by a colon (as per your example):
cat file.txt | cut -f1,2 -d':'

Sed matches unwanted extra characters

I want to replace parts of file paths in a configuration file using sed in Cygwin. The file paths are in form of \\\\some\\constant\\path\\2018-03-20_2030.1\\Release\\base\\some_dll.dll (yes, double backslashes in the file) and the beginning part containing date should be replaced.
For matching I've written following regex: \\\\\\\\some\\\\constant\\\\path\\\\[0-9_\.-]* with a character set supposed to match only date, consisting of digits and "-", "_" and "." symbols. This results into following command for replacement: sed 's/\\\\\\\\some\\\\constant\\\\path\\\\[0-9_\.-]*/bla/g' file.txt
The problem is that, after replacement, I get blaRelease\\base\\some_dll.dll instead of bla\\Release\\base\\some_dll.dll, which is what the same regex produces in Regexr.
Why does sed behave this way and how can I fix it?
The problem is that the character class [0-9_\.-] is matching backslashes. Inside a POSIX bracket expression a backslash is an ordinary character, so \. adds both a backslash and a dot to the set. If you replace the class with [0-9_.-], it will do what you expect.
Note that in a character class, . isn't special and doesn't need quoting. For example, from my Cygwin command line:
$ echo '\.' | sed 's/[\.]/x/g'
xx
$ echo '\.' | sed 's/[.]/x/g'
\x
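Applied to a path like the one in the question, the corrected class stops at the backslashes before Release. This is a small sketch; the variable names are made up, and the path is the example from the question rather than a real configuration file.

```shell
# Single quotes keep every backslash literal, so this is the
# text exactly as it appears in the configuration file.
path='\\\\some\\constant\\path\\2018-03-20_2030.1\\Release\\base\\some_dll.dll'

# Same command as in the question, with [0-9_\.-] changed to [0-9_.-].
replaced=$(printf '%s\n' "$path" |
  sed 's/\\\\\\\\some\\\\constant\\\\path\\\\[0-9_.-]*/bla/g')
printf '%s\n' "$replaced"
```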
A simpler sed command may also help:
sed 's/.*Release/bla\\\\Release/' Input_file
To keep a backup of Input_file and write the result back to Input_file itself:
sed -i.bak 's/.*Release/bla\\\\Release/' Input_file
To write the result back to Input_file without creating a backup of the original:
sed -i 's/.*Release/bla\\\\Release/' Input_file

Sed: how do I replace all "#" characters with "%" (but in a batch file)

I tried:
sed "s/#/\%/g"
but the batch file stripped out the %, and sed gave an error
sed "s/#/\x37/g"
didn't work either, it just put the text x37 in there
Note I need this to work in a batch file, not the command line.
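Doubling the percent sign (%%) does the trick, since a batch file treats a single % as the start of a variable reference: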
echo "foo ## bar ##" | sed 's/#/%%/g'

Replace wrong lines in csv

I have a csv file like that :
0;test1;description;toto
1;test2;description;tata
2;test3;desc
ription;tutu
3;test4;description;tete
In shell, I would like to repair every line that doesn't start with a number.
In this example I want to replace \nription with ription
I can't find the correct expression with sed, grep... :(
I want this result :
0;test1;description;toto
1;test2;description;tata
2;test3;description;tutu
3;test4;description;tete
Thanks a lot
EDIT 1 :
I have tried something like this :
LC_ALL=C tr '(\n)[0-9]' ' ' < hotels.csv > test.csv
Or this :
sed ':a;N;$!ba;s/\r\n?![0-ç-9]/ /g' hotels.csv
But I think my regex is wrong, and it doesn't work :(
With awk this seems feasible:
awk -F ';' '{if (NR>1 && match($1,/^[0-9]+$/)) printf("\n"); printf("%s",$0);} END{printf("\n")}' infile.csv
What it does:
from the second line on: if the first field is a number, print a newline first, which ends the previous record
on every line: print the entire line ($0) without a trailing newline
at the end: print a final newline
Input comes from infile.csv; output is sent to STDOUT
EDIT: Sorry, I forgot to copy the match(...) call
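As a quick check, the same logic can be run against the sample from the question. The sketch below rewrites the awk program in pattern-action form (same behavior, different layout) and uses a temporary file as a stand-in for infile.csv. One assumption to keep in mind: a continuation line whose first field happens to be purely numeric would be treated as a new record.

```shell
# Recreate the sample from the question in a temporary file.
csv=$(mktemp)
printf '%s\n' '0;test1;description;toto' '1;test2;description;tata' \
              '2;test3;desc' 'ription;tutu' '3;test4;description;tete' > "$csv"

joined=$(awk -F ';' '
  # A numeric first field starts a new record: end the previous one.
  NR > 1 && $1 ~ /^[0-9]+$/ { printf "\n" }
  # Print every line without its newline, so fragments get glued together.
  { printf "%s", $0 }
  END { printf "\n" }
' "$csv")
printf '%s\n' "$joined"

rm -f "$csv"
```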
Using grep -P
grep -P "^\d" file.csv
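This uses grep to keep only the lines that begin with a digit. Note that it drops the continuation line (ription;tutu) instead of joining it to the previous line.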
Due to peculiarities of sed's pattern space processing, you will have to use something like this.
Note: ~ must be a character not present in your text
$ cat file
0;test1;description;toto
1;test2;description;tata
2;test3;desc
ription;tutu
3;test4;description;tete
$ sed 'N;s/\n/~/' file | sed -r 's/~([0-9])/\n\1/g;s/~//g'
0;test1;description;toto
1;test2;description;tata
2;test3;description;tutu
3;test4;description;tete
PS: if your input file has Windows line endings you will have to use \r\n instead of \n
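The pairing approach above joins lines two at a time, so it relies on the broken line and its continuation landing in the same pair. A sketch that does not depend on position is the N/P/D loop below, which appends the next line and glues it on whenever it does not start with a digit. It is written for GNU sed (other implementations are an untested assumption), and the sample deliberately puts the break across a pair boundary.

```shell
# Made-up sample where the broken record spans lines 2 and 3.
csv=$(mktemp)
printf '%s\n' '0;test1;description;toto' '1;test2;desc' \
              'ription;tata' '2;test3;description;tutu' > "$csv"

# :a          label for the loop
# $!N         append the next input line (unless on the last line)
# s/.../\1/   drop the newline if the appended line starts with a non-digit
# ta          after a successful join, try to join the following line too
# P;D         print and delete the first line, keep the rest for the next cycle
repaired=$(sed -e ':a' -e '$!N;s/\n\([^0-9]\)/\1/;ta' -e 'P;D' "$csv")
printf '%s\n' "$repaired"

rm -f "$csv"
```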
awk '{sub(/3;desc/,"3;description;tutu")}NR == 4 {next}1' file
0;test1;description;toto
1;test2;description;tata
2;test3;description;tutu
3;test4;description;tete

Bash - how to put each line within quotation

I want to put each line within quotation marks, such as:
abcdefg
hijklmn
opqrst
convert to:
"abcdefg"
"hijklmn"
"opqrst"
How can I do this in a Bash shell script?
Using awk
awk '{ print "\""$0"\""}' inputfile
Using pure bash
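# IFS= and -r keep leading whitespace and backslashes intact;
# printf avoids the escape processing that echo -e would apply.
while IFS= read -r line; do
  printf '"%s"\n' "$line"
done < inputfile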
where inputfile would be a file containing the lines without quotes.
If your file has empty lines, awk is definitely the way to go:
awk 'NF { print "\""$0"\""}' inputfile
NF tells awk to only execute the print command when the Number of Fields is more than zero (line is not empty).
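A small sketch of the NF guard on made-up input with an empty line in the middle. One detail worth knowing: with the default field separator, a line containing only spaces or tabs also has zero fields, so it is skipped as well.

```shell
# Three input lines, the second one empty.
quoted=$(printf '%s\n' 'abcdefg' '' 'hijklmn' |
  awk 'NF { print "\"" $0 "\"" }')
printf '%s\n' "$quoted"
```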
I use the following command:
xargs -I{lin} echo \"{lin}\" < your_filename
xargs takes standard input (redirected from your file), passes one line at a time to the {lin} placeholder, and then runs the command that follows, in this case an echo with escaped double quotes.
With GNU xargs you can use the -i option to omit the name of the placeholder (it is deprecated in favor of -I{}), like this:
xargs -i echo \"{}\" < your_filename
In both cases, note that xargs itself interprets quotes and backslashes in its input, so this is best suited to lines that do not contain those characters.
This sed should work for ignoring empty lines as well:
sed -i.bak 's/^..*$/"&"/' inFile
or
sed 's/^.\{1,\}$/"&"/' inFile
Use sed:
sed -e 's/^\|$/"/g' file
More effort is needed if the file contains empty lines. Note that \| is a GNU sed extension.
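I think sed and awk are the best solutions, but if you want to use just the shell, here is a small script for you.
#!/bin/bash
chr="\""
file="file.txt"
# Keep a backup of the original file.
cp "$file" "$file._backup"
# IFS= and -r preserve leading whitespace and backslashes in each line.
while IFS= read -r line
do
printf '%s\n' "${chr}$line${chr}"
done <"$file" > newfile
mv newfile "$file"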
paste -d\" /dev/null your-file /dev/null
(not the nicest looking, but probably the fastest)
Now, if the input may contain quotes, you may need to escape them with backslashes (and then escape backslashes as well) like:
sed 's/["\]/\\&/g; s/.*/"&"/' your-file
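For illustration, here is that command on two made-up lines, one containing double quotes and one containing a backslash. Inside the bracket expression the backslash is a literal character, and \\& in the replacement emits a backslash followed by the matched character.

```shell
# Escape embedded " and \ first, then wrap the whole line in quotes.
escaped=$(printf '%s\n' 'say "hi"' 'back\slash' |
  sed 's/["\]/\\&/g; s/.*/"&"/')
printf '%s\n' "$escaped"
```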
This answer worked for me in the Mac terminal.
$ awk '{ printf "\"%s\",\n", $0 }' your_file_name
Note that the text with double quotes and commas was only printed to the terminal; the file itself was unaffected.
I used sed with two expressions to replace the start and end of the line, since in my particular use case I wanted to place HTML tags around only the lines that contained particular words.
So I searched the text file inputfile for lines containing the word held in the bla variable and replaced the beginning with <P> and the end with </P> (actually I did some longer HTML tagging in the real thing, but this will serve fine as an example).
Similar to:
$ bla=foo
$ sed -e "/${bla}/s#^#<P>#" -e "/${bla}/s#\$#</P>#" inputfile
<P>foo</P>
bar
$