RegEx - How to change two double quotes to one double quote?


I have a bunch of strings:
pipe 1/4"" square
3" bar
3/16"" spanner
nozzle 2""
1/2"" tube pipe with 6"" cut out
I want to replace each pair of double quotation marks in a string with a single one, using a regex. I've tried some code based on a few references but can't get it right.
Ideally, once the substitution is done, I would like to store the result in a $var that I can use later in my script.
Q: What is the Regex that will do this with Bash?

You can use sed:
sed 's/""/"/g' input_file > output_file
Or, process the input line by line and use parameter expansion:
while IFS= read -r line ; do
    line=${line//\"\"/\"}
    echo "$line"
done < input_file
/g in sed and // in the expansion serve the same purpose: they'll apply the substitution on all occurrences on a line.
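To see the difference the flag makes, and to keep the result in a variable for later use in a script, here is a minimal sketch. The variable names raw, first_only, and all_fixed are made up for illustration; the sample string is taken from the question.

```shell
# Sample string from the question; the variable names are illustrative only.
raw='1/2"" tube pipe with 6"" cut out'

# Without /g, sed replaces only the first "" on the line.
first_only=$(printf '%s\n' "$raw" | sed 's/""/"/')

# With /g, every "" on the line is replaced.
all_fixed=$(printf '%s\n' "$raw" | sed 's/""/"/g')

printf '%s\n' "$first_only"   # 1/2" tube pipe with 6"" cut out
printf '%s\n' "$all_fixed"    # 1/2" tube pipe with 6" cut out
```

Command substitution is used here so the sketch does not depend on Bash-only expansions; in Bash, all_fixed=${raw//\"\"/\"} should give the same result without starting sed.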

Using Bash parameter expansion:
echo "${var//\"\"/\"}"
sample output:
pipe 1/4" square

You can use gawk:
echo "$varName" | gawk '{ gsub(/""/,"\"") } 1'
or the sed command:
echo "$varName" | sed 's/""/"/g'
I assumed your variable is named varName.
If you need to do this for a file instead:
gawk '{ gsub(/""/,"\"") } 1' fileName
or
sed 's/""/"/g' fileName

Related

unix sed not backtracking to finish the job

I'm trying to make a script to convert Postgres CSV dumps into Oracle CSV dumps. That is, I'm trying to replace "true" with "Y" and "false" with "N".
So I want a script called to_oracle like this:
echo "false,false,false,true" | to_oracle
N,N,N,Y
So here is my attempt:
sed -E -e 's:(,|^)true(,|$):\1Y\2:g' -e 's:(,|^)false(,|$):\1N\2:g' "$@"
The logic is that a field in a CSV file either starts with beginning of line or a comma "," and it ends with either the end of line or a comma ","
The problem with this script is that each match consumes the trailing comma, so the next field no longer has a leading comma to match against, and every second field is skipped:
echo "false,false,false,true" | to_oracle
N,false,N,Y
Now I suppose I could pipe it to the script twice, and that would do the job, but I'm wondering is there a more elegant solution?

An awk version:
echo "false,false,false,true" | awk -F, -v OFS=, '{for(i=1;i<=NF;i++) $i=$i=="true"?"Y":"N"}1'
N,N,N,Y
It tests the fields one by one: if a field is true it becomes Y, otherwise it becomes N.
If you'd like to test for false as well:
echo "false,false,false,true" | awk -F, -v OFS=, '{for(i=1;i<=NF;i++) $i=($i=="true"?"Y":($i=="false"?"N":"other"))}1'
N,N,N,Y
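Note that the first command turns every field that is not true into N, and the second rewrites any unrecognised field to other. If other values should pass through unchanged, a variant along these lines may be closer to what is wanted. This is only a sketch; the function name to_oracle_awk is made up here.

```shell
# Hypothetical helper: rewrite only fields that are exactly true or false,
# and leave every other field as it is.
to_oracle_awk() {
    awk -F, -v OFS=, '{
        for (i = 1; i <= NF; i++) {
            if ($i == "true")       $i = "Y"
            else if ($i == "false") $i = "N"
        }
        print
    }'
}

echo "false,abc,true,untrue" | to_oracle_awk   # N,abc,Y,untrue
```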

With GNU sed, you may use
sed -E ':a;s/(,|^)false(,|$)/\1N\2/;ta; :b;s/(,|^)true(,|$)/\1Y\2/;tb'
Details
-E enables POSIX ERE syntax
:a;s/(,|^)false(,|$)/\1N\2/;ta; repeatedly replaces false between commas or at the start/end of the line with N, jumping back to label a until no further substitution succeeds
:b;s/(,|^)true(,|$)/\1Y\2/;tb does the same for true, replacing it with Y.
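The same loop can also be written with one -e per command, which avoids relying on sed accepting a label and further commands on one line separated by semicolons. A sketch, wrapped in a shell function named to_oracle after the script in the question (the wrapper itself is an assumption, not part of the answer):

```shell
to_oracle() {
    # Each substitution is retried from its label until it no longer matches,
    # so fields that share a comma with a previous match are still handled.
    sed -E \
        -e ':a' -e 's/(^|,)false(,|$)/\1N\2/' -e 'ta' \
        -e ':b' -e 's/(^|,)true(,|$)/\1Y\2/'  -e 'tb' \
        "$@"
}

echo "false,false,false,true" | to_oracle   # N,N,N,Y
```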

How to parse every match of sed command

I have a string [u'SOMEVALUE1', u'SOMEVALUE2', u'SOMEVALUE3'], and I would like to process every element matched by my sed command. The elements to match are the ones in single quotes. Here is my script:
#!/bin/bash
ARR="[u'SOMEVALUE1', u'SOMEVALUE1', u'SOMEVALUE1']"
for id in $(sed -n "s/^.*'\(.*\)'.*$/\1/ p" <<< ${ARR});
do
echo "$id"
done
I have only the first value returned.

The wildcard .* will match the longest leftmost possible string. If your intention is to match the individual substrings which are in single quotes, try
grep -o "'[^']*'" <<<"$ARR"
To remove the single quotes around the values, simply pipe to sed "s/'//g" and to loop over the lines printed by a pipe, do
... commands ... |
while read -r id; do
: things with "$id"
done
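Put together, the pieces above might look like the following sketch. The function name extract_ids is made up for illustration, and the input is the string from the question. printf is used instead of a here-string only so that the sketch does not depend on Bash.

```shell
# Hypothetical helper: print each single-quoted value on its own line.
extract_ids() {
    printf '%s\n' "$1" | grep -o "'[^']*'" | sed "s/'//g"
}

ARR="[u'SOMEVALUE1', u'SOMEVALUE2', u'SOMEVALUE3']"

extract_ids "$ARR" | while IFS= read -r id; do
    printf 'id=%s\n' "$id"
done
# id=SOMEVALUE1
# id=SOMEVALUE2
# id=SOMEVALUE3
```

Because the while loop sits at the end of a pipeline, Bash runs it in a subshell by default, so variables assigned inside the loop are not visible after it.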

Bash can match regular expressions with the =~ operator (see man bash). Matching more than once is a bit painful, but in your case we can split the input on whitespace and match once per item:
ARR="[u'SOMEVALUE1', u'SOMEVALUE1', u'SOMEVALUE1']"
for A in $ARR
do
[[ $A =~ u\'(.+)\' ]] && echo ${BASH_REMATCH[1]}
done
results in
SOMEVALUE1
SOMEVALUE1
SOMEVALUE1

is this what you're trying to do?
$ ARR="[u'SOMEVALUE1', u'SOMEVALUE1', u'SOMEVALUE1']"
$ awk -v RS="'" '!(NR%2)' <<< "$ARR"
SOMEVALUE1
SOMEVALUE1
SOMEVALUE1
$ awk -v RS="'" '!(NR%2)' <<< "$ARR" |
while IFS= read -r id; do echo "id=$id"; done
id=SOMEVALUE1
id=SOMEVALUE1
id=SOMEVALUE1

How to use sed or awk to extract substring

I have a file that contains the following:
[class:ABC_DEF_GHI]
[class:ABC_DEF_GHI:app:ABC_DEF_GHI]
My goal is to extract ABC_DEF_GHI
Here is the script I'm trying to write so far.
eval sed -n 's/.*app://p' file.txt >> $file

You can get this value by using multiple delimiters in awk:
awk -F':|]' '{print $2}' $file

with sed
$ sed -E 's/.*:(.+)]/\1/' file
ABC_DEF_GHI
ABC_DEF_GHI
This extracts the content between a colon and the right square bracket; because .* is greedy, the colon used is the last one on the line.
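If only the value after app: is wanted, as in the attempt in the question, a variant that prints just the matching lines and drops the closing bracket could look like this. It is a sketch; the function name extract_app is hypothetical.

```shell
extract_app() {
    # -n suppresses the default output; the p flag prints only lines where
    # the substitution succeeded, i.e. lines containing "app:".
    sed -n 's/.*app:\([^]]*\)].*/\1/p'
}

printf '%s\n' '[class:ABC_DEF_GHI]' '[class:ABC_DEF_GHI:app:ABC_DEF_GHI]' | extract_app
# ABC_DEF_GHI
```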

SED replace expression "within" a regular expression

I have to change a CSV file column (the date) which is written in the following format:
YYYY-MM-DD
and I would like it to be
YYYY.MM.DD
I can write a succession of 2 sed rules piped one to the other like :
sed 's/-/./' file.csv | sed 's/-/./'
but this is not clean. My question is: is there a way of assigning variables in sed, telling it that YYYY-MM-DD should be parsed as year=YYYY ; month=MM ; day=DD, and then telling it to
write $year.$month.$day
or something similar? Maybe with awk?

You could use groups and access the year, month, and day directly via backreferences:
sed 's#\([0-9][0-9][0-9][0-9]\)-\([0-9][0-9]\)-\([0-9][0-9]\)#\1.\2.\3#g'

Here's an alternative solution with awk:
awk 'BEGIN { FS=OFS="," } { gsub("-", ".", $1); print }' file.csv
BEGIN { FS=OFS="," } tells awk to break the input lines into fields by , (variable FS, the [input] Field Separator), as well as to also use , when outputting modified input lines (variable OFS, the Output Field Separator).
gsub("-", ".", $1) replaces all - instances with . in field 1
The assumption is that the data is in the 1st field, $1; if the field index is a different one, replace the 1 in $1 accordingly.
print simply outputs the modified input line, terminated with a newline.

What you are doing is equivalent to supplying the "global" replacement flag:
sed 's/-/./g' file.csv
sed has no variables, but it does have numbered groups:
sed -r 's/([0-9]{4})-([0-9]{2})-([0-9]{2})/\1.\2.\3/g' file.csv
or, if your sed has no -r:
sed 's/\([0-9]\{4\}\)-\([0-9]\{2\}\)-\([0-9]\{2\}\)/\1.\2.\3/g' file.csv
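The two commands differ once a line contains hyphens outside the date. A small sketch with a made-up CSV line (the sample data is an assumption, not from the question):

```shell
line='2014-05-01,foo-bar'

# The g flag alone rewrites every hyphen on the line.
all_hyphens=$(printf '%s\n' "$line" | sed 's/-/./g')

# The grouped pattern rewrites only hyphens inside a YYYY-MM-DD date.
dates_only=$(printf '%s\n' "$line" |
    sed 's/\([0-9]\{4\}\)-\([0-9]\{2\}\)-\([0-9]\{2\}\)/\1.\2.\3/g')

printf '%s\n' "$all_hyphens"   # 2014.05.01,foo.bar
printf '%s\n' "$dates_only"    # 2014.05.01,foo-bar
```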

You may also try this sed command:
sed 's/\([0-9]\{4\}\)-\([0-9]\{2\}\)-\([0-9]\{2\}\)/\1.\2.\3/g' file
Example:
$ (echo '2056-05-15'; echo '2086-12-15'; echo 'foo-bar-go') | sed 's/\([0-9]\{4\}\)-\([0-9]\{2\}\)-\([0-9]\{2\}\)/\1.\2.\3/g'
2056.05.15
2086.12.15
foo-bar-go

Change CSV Delimiter with sed

I've got a CSV file that looks like:
1,3,"3,5",4,"5,5"
Now I want to change all the "," not within quotes to ";" with sed, so it looks like this:
1;3;"3,5";4;"5,5"
But I can't find a pattern that works.

If you are expecting only numbers then the following expression will work
sed -e 's/,/;/g' -e 's/\("[0-9][0-9]*\);\([0-9][0-9]*"\)/\1,\2/g'
e.g.
$ echo '1,3,"3,5",4,"5,5"' | sed -e 's/,/;/g' -e 's/\("[0-9][0-9]*\);\([0-9][0-9]*"\)/\1,\2/g'
1;3;"3,5";4;"5,5"
You can't just replace [0-9][0-9]* with .* to preserve any , that is delimited by quotes: .* is too greedy and matches too much. So you have to use a narrower class such as [a-z0-9]*:
$ echo '1,3,"3,5",4,"5,5",",6","4,",7,"a,b",c' | sed -e 's/,/;/g' -e 's/\("[a-z0-9]*\);\([a-z0-9]*"\)/\1,\2/g'
1;3;"3,5";4;"5,5";",6";"4,";7;"a,b";c
This approach has the advantage of being simple to understand: we replace every , with ; and then change every ; inside quotes back to a ,

You could try something like this:
echo '1,3,"3,5",4,"5,5"' | sed -r 's|("[^"]*),([^"]*")|\1\x1\2|g;s|,|;|g;s|\x1|,|g'
which replaces all commas within quotes with the \x1 character, then replaces all remaining commas with semicolons, and then turns the \x1 characters back into commas. This should work provided the file is well formed, it contains no \x1 characters to begin with, and there are no double quotes inside double quotes, like "a\"b".

Using gawk
gawk '{$1=$1}1' FPAT="([^,]+)|(\"[^\"]+\")" OFS=';' filename
Test:
[jaypal:~/Temp] cat filename
1,3,"3,5",4,"5,5"
[jaypal:~/Temp] gawk '{$1=$1}1' FPAT='([^,]+)|(\"[^\"]+\")' OFS=';' filename
1;3;"3,5";4;"5,5"

This might work for you:
echo '1,3,"3,5",4,"5,5"' |
sed 's/\("[^",]*\),\([^"]*"\)/\1\n\2/g;y/,/;/;s/\n/,/g'
1;3;"3,5";4;"5,5"
Here's an alternative solution which is longer but more flexible:
echo '1,3,"3,5",4,"5,5"' |
sed 's/^/\n/;:a;s/\n\([^,"]\|"[^"]*"\)/\1\n/;ta;s/\n,/;\n/;ta;s/\n//'
1;3;"3,5";4;"5,5"
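Another way to express the same idea, sketched here with plain awk rather than taken from the answers above, is to split each line on the double quote: the odd-numbered pieces then lie outside quotes, so only those have their commas replaced. It assumes every quoted field is closed on the same line. The function name csv_to_semicolon is made up.

```shell
csv_to_semicolon() {
    awk -F'"' -v OFS='"' '{
        # Fields 1, 3, 5, ... are the text outside double quotes.
        for (i = 1; i <= NF; i += 2)
            gsub(/,/, ";", $i)
        print
    }'
}

echo '1,3,"3,5",4,"5,5"' | csv_to_semicolon   # 1;3;"3,5";4;"5,5"
```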
