Remove double quotes around integers only in a csv file, regex?

I have a csv file with fields like so:
"231444","344","some string","222"
I have been trying, without success, to remove the double quotes from around the integers in the csv. I have tried a bit of sed and attempted awk/gawk, but I am really having trouble with this one. Expected output would be:
231444,344,"some string",222
There are no negative integers. Any help would be much appreciated, and thank you in advance.

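Just for reference, this can also be done with a Perl one-liner (-i.bak edits input.txt in place and keeps the original as input.txt.bak).
On Linux:
perl -i.bak -p -e 's/"(\d+)"/$1/g' input.txt
On Windows (cmd.exe does not treat single quotes as quoting characters, so use double quotes and escape the inner ones):
perl -i.bak -p -e "s/\"(\d+)\"/$1/g" input.txt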

Your regex would be /"(\d+)"/g, with each match replaced by \1 (the captured digits).
In sed, \d is not supported, and without -E the parentheses ( ) are literal characters, so use extended regex mode and a [0-9] bracket expression:
sed -E 's/"([0-9]+)"/\1/g' inputFileName > outputFileName
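As a sketch, the sed approach can be wrapped in a small function and checked against the sample line from the question. The name unquote_ints is hypothetical, and the sketch assumes a quoted string field never contains a digit-only substring wrapped in its own inner double quotes (such a substring would be unquoted too).

```shell
# Hypothetical helper: unquote_ints.
# Reads CSV on stdin and drops the double quotes around digit-only fields.
# Assumption: quoted strings never contain an inner "123"-style substring.
unquote_ints() {
  # -E: ( ) group and + means "one or more"; [0-9] because sed has no \d.
  sed -E 's/"([0-9]+)"/\1/g'
}

# Usage:
# printf '%s\n' '"231444","344","some string","222"' | unquote_ints
```

Quoted strings that merely contain digits, such as "a 12 b", are left alone because the digits are not directly enclosed by quotes.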

Related

Prefix all numbers in a file with a string using sed

Using sed, awk or possibly something else, I would like to prefix all numbers in a file with a string, e.g.:
input:
sometext(0, 456)
sometext(01, 10)
output:
sometext(somestring0, somestring456)
sometext(somestring01, somestring10)
I have attempted using sed, but my skills are limited, so I have not managed to produce any meaningful output.
I'm using OS X 10.11, so I know sed behaves slightly differently on BSD than on other *nixes.
I also have perl and python at hand if that solves this better, but sed and awk are preferred.
You can use this sed command, which matches each run of digits and, in the replacement, prefixes it (& stands for the whole match):
sed -E 's/[[:digit:]]+/somestring&/g' file
sometext(somestring0, somestring456)
sometext(somestring01, somestring10)
Keep in mind that somestring must not contain characters that are special in a sed replacement, such as &, \1, \2, \3, a backslash, or the / delimiter.
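If the prefix comes from a variable and might contain those special characters, one option is to escape them first. This is a sketch, not part of the original answer: the function name prefix_numbers is hypothetical, and it assumes the prefix is a single line (no embedded newline).

```shell
# Hypothetical helper: prefix every run of digits on stdin with "$1".
# Assumption: "$1" is a single line of text.
prefix_numbers() {
  # Escape \, / and & so sed treats them literally in the replacement.
  # "|" is used as the delimiter here so the "/" in the bracket needs no escape.
  esc=$(printf '%s\n' "$1" | sed 's|[\\/&]|\\&|g')
  # & in the replacement is the matched number itself.
  sed -E "s/[[:digit:]]+/${esc}&/g"
}

# Usage:
# printf '%s\n' 'sometext(0, 456)' | prefix_numbers somestring
```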

How to convert CSV with Double Quotes into OpenCSV using SED or AWK?

I want to convert CSV with double quotes into OpenCSV format (no double quotes, and commas escaped with a backslash) using the Unix utilities sed or awk. I do find examples with Perl or Java online, but I am looking for one that is done with basic sed or awk.
I'm not sure about the OpenCSV conventions, but going by your description you can do a find and replace with sed:
sed -i -e 's/FINDME/REPLACEWITH/g' folder/file.csv
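Multiple find/replace commands can be separated with a semicolon (;). -i edits the file in place, and -e adds the script that follows it.
For your particular example, backslashes and commas make it a little tricky, but this should get you started. Note that it turns every double quote into a single quote and escapes every comma, including the commas that separate fields: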
sed -i -e 's/"/'\''/g;s/,/\\,/g' file.csv
From your description, this might be what you are after:
awk -F'" *, *"|^ *"|" *$' '{a="";for(i=2;i<=NF-1;i++){gsub(/,/,"\\,",$i); if(i>2){a=a","$i}else{a=$i}};print a}' file
A bash example:
awk -F'" *, *"|^ *"|" *$' '{a="";for(i=2;i<=NF-1;i++){gsub(/,/,"\\,",$i); if(i>2){a=a","$i}else{a=$i}};print a}'<<<$'"a","b",","\n"d", "e" ,",,,"'
a,b,\,
d,e,\,\,\,
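The same field-splitting idea can be packaged as a reusable function. This is a sketch under stated assumptions: the name to_opencsv is hypothetical, every field is assumed to be wrapped in double quotes, and no field contains an escaped quote (""). Keying the join on the field position, rather than on whether the accumulated string is empty, keeps a first field of 0 or an empty first field.

```shell
# Hypothetical helper: to_opencsv.
# Assumes every field is double-quoted and contains no "" escapes.
to_opencsv() {
  # FS splits on '","' (with optional spaces), a leading '"' and a trailing '"',
  # so $1 and $NF are empty and the real fields are $2 .. $(NF-1).
  awk -F'" *, *"|^ *"|" *$' '{
    out = ""
    for (i = 2; i < NF; i++) {
      gsub(/,/, "\\,", $i)              # escape commas inside the field
      out = (i == 2) ? $i : out "," $i  # join fields with plain commas
    }
    print out
  }'
}

# Usage:
# to_opencsv < input.csv > output.csv
```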

how to select lines containing several words using sed?

I am learning to use sed on Unix.
I have a file with many lines, and I want to delete all lines except lines containing strings(e.g) alex, eva and tom.
I think I can use
sed '/alex|eva|tom/!d' filename
However, it doesn't work: it doesn't match the lines I want, only lines containing the literal text "alex|eva|tom".
Only
sed '/alex/!d' filename
works.
Does anyone know how to select lines containing more than one word using sed?
Also, using parentheses, as in "sed '/(alex)|(eva)|(tom)/!d' file", doesn't work. I want the lines containing all three words.
sed is an excellent tool for simple substitutions on a single line; for anything else, just use awk:
awk '/alex/ && /eva/ && /tom/' file
delete all lines except lines containing strings(e.g) alex, eva and tom
As worded, you're asking to keep lines containing all of those words, but your samples keep lines containing any of them. In case "all" wasn't a slip: a POSIX regular expression can't express an any-order search without spelling out every ordering, but fortunately sed lets you chain multiple matches:
sed -n '/alex/{/eva/{/tom/p}}'
or you could just delete them serially:
sed '/alex/!d; /eva/!d; /tom/!d'
The above works on GNU-based systems; with BSD-based userlands (such as macOS) you'll have to insert newlines or pass the commands as separate expressions:
sed -n '/alex/ {
/eva/ {
/tom/ p
}
}'
or
sed -e '/alex/!d' -e '/eva/!d' -e '/tom/!d'
You can use:
sed -r '/alex|eva|tom/!d' filename
Or, on macOS:
sed -E '/alex|eva|tom/!d' filename
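A minimal check of the "any of the words" version, written as a function so it can be piped into. The name keep_any is hypothetical; -E is used because both GNU sed and the BSD sed on macOS accept it, and in ERE mode | is the alternation operator.

```shell
# Hypothetical helper: keep only lines containing alex, eva or tom.
keep_any() {
  # /re/!d deletes every line that does NOT match the alternation.
  sed -E '/alex|eva|tom/!d'
}

# Usage:
# keep_any < filename
```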
Use -i.bak for in-place editing (keeping a backup as filename.bak):
sed -i.bak -r '/alex|eva|tom/!d' filename
You should be using \| instead of |.
Edit: this works in GNU sed, where \| is an alternation extension in basic regular expressions, but not in every sed variant.
This might work for you (GNU sed):
sed -nr '/alex/G;/eva/G;/tom/G;s/\n{3}//p' file
This method also allows a range of counts; e.g. if you wanted lines containing 2 or more of the words, use:
sed -nr '/alex/G;/eva/G;/tom/G;s/\n{2,3}//p' file
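The G-counting trick above relies on GNU sed (-r and \n{2,3}). As a swapped-in alternative, not the answer's method, a portable awk sketch can count how many of the given words appear on each line and keep the line when the count reaches a minimum: 1 means "any", the number of words means "all". The name keep_if_count is hypothetical, and words are matched as plain substrings (so alex also matches alexander).

```shell
# Hypothetical helper: keep_if_count MIN WORD...
# Prints lines of stdin that contain at least MIN of the given words.
# Words are matched as plain substrings via index(), not as regexes.
keep_if_count() {
  min=$1
  shift
  awk -v min="$min" -v words="$*" '
    BEGIN { n = split(words, w, " ") }   # split the word list on whitespace
    {
      hits = 0
      for (i = 1; i <= n; i++)
        if (index($0, w[i])) hits++      # index() > 0 means "found"
      if (hits >= min) print
    }'
}

# Usage:
# keep_if_count 2 alex eva tom < file
```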

Replace certain strings from text with SED and REGEX

I have strings like the following in a large text file, mixed with other, different content:
79A18D7F-1517-5981-8446-3A0452727B06
7842A72D-1517-5281-84E4-EAEF09B743F7
6040BEE7-1517-5982-84C1-419B224E647E
615F2747-1517-5981-84AF-787C34967FB2
7468A3E3-1517-5931-84B3-3FC3F701C269
I can find them using grep and regex:
'[0-9A-F]{8}-[0-9]{4}-[0-9]{4}-[0-9A-F]{4}-[0-9A-F]{12}'
What's the sed regex syntax to delete them? This:
sed "s/[0-9A-F]{8}-[0-9]{4}-[0-9]{4}-[0-9A-F]{4}-[0-9A-F]{12}//g"
doesn't seem to work.
Thanks!
Use sed -r (or -E, which BSD/macOS sed also accepts). Your pattern relies on extended regular expression syntax, such as the {8} intervals, without escaping it; with sed -r you don't have to escape it. If you want to delete the whole lines instead of just removing the matched text, you can use:
sed -r "/regex/d"
Alternatively, with plain sed (basic regular expressions, BRE), you need to escape the curly braces:
sed 's/[0-9A-F]\{8\}-[0-9]\{4\}-[0-9]\{4\}-[0-9A-F]\{4\}-[0-9A-F]\{12\}//g' file
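Both variants (removing only the IDs versus deleting the lines that contain them) can be sketched as two small functions sharing one pattern. The names ID_RE, strip_ids and drop_id_lines are hypothetical; the pattern is the one from the question, used in ERE mode via -E.

```shell
# Hypothetical names throughout. Pattern from the question:
# 8 hex, 4 digits, 4 digits, 4 hex, 12 hex (uppercase only).
ID_RE='[0-9A-F]{8}-[0-9]{4}-[0-9]{4}-[0-9A-F]{4}-[0-9A-F]{12}'

# Remove every ID but keep the rest of each line.
strip_ids() {
  sed -E "s/$ID_RE//g"
}

# Delete every line that contains an ID.
drop_id_lines() {
  sed -E "/$ID_RE/d"
}

# Usage:
# strip_ids < big.txt > cleaned.txt
```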

How to use regular expression in sed command

I have some strings with this pattern in some files:
domain.com/page-10
domain.com/page-15
....
and I want to replace them with something like:
domain.com/apple-10.html
domain.com/apple-15.html
I have found that I can use sed to replace them all at once, but since something has to be added after the numbers, I guess I need a regular expression. I don't know how to write it, though.
sed -i.bak -r 's/page-([0-9]+)/apple-\1.html/' file
sed 's/page-\([0-9][0-9]*\)/apple-\1.html/' file > t && mv t file
Besides sed, you can also use gawk's gensub() (a GNU awk extension, so call gawk explicitly):
gawk '{b=gensub(/page-([0-9]+)/,"apple-\\1.html","g",$0); print b}' file
sed -i 's/page-\([0-9][0-9]*\)/apple-\1.html/' <filename>
The \([0-9][0-9]*\) group captures one or more digits; the \1 in the replacement string references that capture and inserts it as part of the replacement.
You may want to use something like -i.backup if you need to keep a copy of the file without the replacements, or just omit -i and use I/O redirection instead.
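Here is a small portable sketch of the same substitution as a function. The name rename_pages is hypothetical; it uses plain BRE (no -r/-E needed) and the g flag so several page-N occurrences on one line are all rewritten, and requiring at least one digit leaves a bare "page-" untouched.

```shell
# Hypothetical helper: rewrite page-N as apple-N.html on every line.
rename_pages() {
  # [0-9][0-9]* = one or more digits in basic regex syntax.
  sed 's/page-\([0-9][0-9]*\)/apple-\1.html/g'
}

# Usage (portable in-place edit via a temp file):
# rename_pages < file > file.tmp && mv file.tmp file
```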
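One more way to solve the problem:
sed -i.bak 's/\(^.*\)\(page-\)\(.*\)/\1apple-\3.html/' Files
Here the matched parts are captured with \( \) groups and reused via the back-references \1 and \3 (\2 holds page-, which is dropped). Because \3 captures everything after page-, this only works when page-N is at the end of the line.
If the only change needed were appending .html to the end of every line (this does not rename page- to apple-), this would work:
sed 's/$/.html/' file.txt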