I am trying to translate a huge MySQL database dump file from MySQL syntax into SQLite syntax.
At https://regex101.com/ I have successfully created an ECMAScript-flavor regex to turn something like:
,'foo\'s bar!',
into:
,"foo\'s bar!"
with this regular expression:
/,'([^']+)\\'([^']+)',/"$1\\'$2"/g
testing against this short file:
(1058,'gpl5q0x51349lmdq3e0ijm4k9b6n','Henry\'s_1.csv','text/csv','{\"identified\":true,\"analyzed\":true}',33854,'mUVk0/XGX+afIpkrqBm7LQ==','2021-01-06 03:07:23'),
(1059,'xzj8mivsenkakkrurfjytxjsaj1h','Henry\'s_2.csv','text/csv','{\"identified\":true,\"analyzed\":true}',33555,'KfRYqfAWtSIYXZ6oQZyYbA==','2021-01-06 03:07:23'),
Resulting in:
(1058,'gpl5q0x51349lmdq3e0ijm4k9b6n'"Henry\'s_1.csv"'text/csv','{\"identified\":true,\"analyzed\":true}',33854,'mUVk0/XGX+afIpkrqBm7LQ==','2021-01-06 03:07:23'),
(1059,'xzj8mivsenkakkrurfjytxjsaj1h'"Henry\'s_2.csv"'text/csv','{\"identified\":true,\"analyzed\":true}',33555,'KfRYqfAWtSIYXZ6oQZyYbA==','2021-01-06 03:07:23'),
but for the life of me I cannot translate this into a GNU sed flavor regex.
For example, this command does not make any substitutions in the output:
sed -r s/,'([^']+)\\'([^']+)',/"$1\\'$2"/g <test.sql
...
sed -r s/,'([^']+)\\'([^']+)',/"\1\\'\2"/g <test.sql: doesn't work either.
I have looked for a regex tool online that translates between different flavors of regex but cannot find one that works on GNU sed (shipped with Git: sed (GNU sed) 4.8). PCRE seems to be close to what sed has but that doesn't work. I tried perl as well, no luck.
Does anyone know a regex that works, or a translator tool that works?
I am just about ready to write a nodejs program to do this for me.
Also, for extra credit, how can I write a sed script to handle any number of escaped quotes within a quoted string? I have that issue to deal with as well in my DB dump file.
Examples:
'foo\'-bar' // one instance
'foo\'and\'bar' // two instances
'foo\'and\'bar\'s on the deck' // three instances
and so on...
Thanks!
You can use
sed -E "s/,'([^']+)\\\\'([^']+)',/"'"'"\\1\\\\'\\2"'"'/g test.sql
The "s/,'([^']+)\\\\'([^']+)',/"'"'"\\1\\\\'\\2"'"'/g consists of
"s/,'([^']+)\\\\'([^']+)',/" - a s/,'([^']+)\\'([^']+)',/ part (inside double quotes, so backslashes need doubling)
'"' - a " char (inside single quotes)
"\\1\\\\'\\2" - \1\\'\2 pattern (inside double quotes, so backslashes are doubled)
'"' - a " char (inside single quotes)
/g - the global flag (no need quoting here).
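To check how the shell assembles those pieces, one can print the script instead of running it directly. This is only a minimal sketch: the variable names script and out and the sample input are chosen for illustration, and GNU sed is assumed for -E.

```shell
# Build the sed script from the same quoted pieces as in the command above.
script="s/,'([^']+)\\\\'([^']+)',/"'"'"\\1\\\\'\\2"'"'/g

# Show the script exactly as sed receives it, after the shell removed the quoting.
printf '%s\n' "$script"

# Apply it to the sample string from the question.
out=$(printf '%s\n' ",'foo\\'s bar!'," | sed -E "$script")
printf '%s\n' "$out"
```

The first printf shows the expression sed actually sees, with the doubled backslashes already halved by the shell.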
First look at your command
sed -r s/,'([^']+)\\'([^']+)',/"\1\\'\2"/g test.sql
I prefer writing the whole sed command in single quotes. When you need a single quote, you must close the string ('), use an escaped single quote (\') and open the next string with a ', all joined: '\''.
I also added two , characters.
sed -r 's/,'\''([^'\'']+)\\'\''([^'\'']+)'\'',/,"\1\\'\''\2",/g' test.sql
# Shorter
sed -r 's/,'\''([^'\'']+\\'\''[^'\'']+)'\'',/,"\1",/g' test.sql
# Using another way to write the single quotes, with the hex notation
sed -r 's/,\x27([^\x27]+\\\x27[^\x27]+)\x27,/,"\1",/g' test.sql
This works for simple cases, not for 'foo\'and\'bar\'s on the deck'.
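As a quick check that the two ways of writing a single quote are interchangeable, the following sketch runs both shorter variants on one shortened sample line and prints the results. The sample line and the variable names are made up for illustration; GNU sed is assumed for -E and \x27.

```shell
# Sample line in the format of the dump from the question (shortened).
line="(1058,'abc','Henry\\'s_1.csv','text/csv'),"

# Variant 1: every literal single quote is written as '\''.
a=$(printf '%s\n' "$line" | sed -E 's/,'\''([^'\'']+\\'\''[^'\'']+)'\'',/,"\1",/g')

# Variant 2: every literal single quote is written as \x27 (GNU sed extension).
b=$(printf '%s\n' "$line" | sed -E 's/,\x27([^\x27]+\\\x27[^\x27]+)\x27,/,"\1",/g')

# Both lines printed here should be identical.
printf '%s\n' "$a" "$b"
```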
I think you want to replace the quotes in the simple fields too.
Suppose you want to transform
(1058,'gpl5q0x51349lmdq3e0ijm4k9b6n','Henry\'s_1.csv','text/csv','{\"identified\":true,\"analyzed\":true}',33854,'mUVk0/XGX+afIpkrqBm7LQ==','2021-01-06 03:07:23'),
(1059,'xzj8mivsenkakkrurfjytxjsaj1h','Henry\'s_2.csv','text/csv','{\"identified\":true,\"analyzed\":true}',33555,'KfRYqfAWtSIYXZ6oQZyYbA==','2021-01-06 03:07:23'),
(2000,'extra credit from question','foo\'and\'bar\'s on the deck','text/csv','{\"identified\":true,\"analyzed\":true}',33999,'KgSBFstbdthdsssssstvbA==','2022-01-02 13:07:23'),
into
(1058,"gpl5q0x51349lmdq3e0ijm4k9b6n","Henry\'s_1.csv","text/csv","{\"identified\":true,\"analyzed\":true}",33854,"mUVk0/XGX+afIpkrqBm7LQ==","2021-01-06 03:07:23"),
(1059,"xzj8mivsenkakkrurfjytxjsaj1h","Henry\'s_2.csv","text/csv","{\"identified\":true,\"analyzed\":true}",33555,"KfRYqfAWtSIYXZ6oQZyYbA==","2021-01-06 03:07:23"),
(2000,"extra credit from question","foo\'and\'bar\'s on the deck","text/csv","{\"identified\":true,\"analyzed\":true}",33999,"KgSBFstbdthdsssssstvbA==","2022-01-02 13:07:23"),
In this answer I don't use the '\'' but the hexadecimal notation \x27.
First "backup" the \' combinations (replace them by an unused character like \r), replace all normal quotes by double quotes and "restore the backup" (change back the \r).
sed 's/\\\x27/\r/g; s/\x27/"/g; s/\r/\\\x27/g' test.sql
# or hex value for double quote "
sed 's/\\\x27/\r/g; s/\x27/\x22/g; s/\r/\\\x27/g' test.sql
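The same backup/replace/restore idea can be wrapped in a small function and tried on the three-escape example from the question. This is a sketch under two assumptions: GNU sed is used (for \x27 and \r), and the input contains no carriage returns of its own, since \r serves as the temporary placeholder. The function name convert_quotes is hypothetical.

```shell
# Hypothetical wrapper around the three-step substitution shown above.
convert_quotes() {
  # 1) hide every \' behind a carriage return
  # 2) turn the remaining single quotes into double quotes
  # 3) put \' back where the placeholder is
  sed 's/\\\x27/\r/g; s/\x27/"/g; s/\r/\\\x27/g'
}

# Input is: 'foo\'and\'bar\'s on the deck'
converted=$(printf '%s\n' "'foo\\'and\\'bar\\'s on the deck'" | convert_quotes)
printf '%s\n' "$converted"
```

Because the escaped quotes are hidden before the outer quotes are replaced, the number of \' sequences inside a string does not matter.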
I have a file which contains multiple latex equations like this :
...
\begin{equation}
\beq{x}=x^{1}\beq{e_{1}}+x^{2}\beq{e_{2}}+x^{3}\beq{e_{3}}
\end{equation}
...
\begin{equation}
\beq{y}=y^{1}\beq{e_{1}}+y^{2}\beq{e_{2}}+y^{3}\beq{e_{3}}
\end{equation}
...
I want to insert just before the "\end{equation}" the string "\tag{number}" where I can successfully get number variable.
To insert this string at the line identified by "$(($line)-1)", I do :
gsed -i "$(($line)-1)i \tag{$number}" file
But I get only :
...
\begin{equation}
\beq{x}=x^{1}\beq{e_{1}}+x^{2}\beq{e_{2}}+x^{3}\beq{e_{3}}
tag{1}
\end{equation}
...
\begin{equation}
\beq{y}=y^{1}\beq{e_{1}}+y^{2}\beq{e_{2}}+y^{3}\beq{e_{3}}
tag{2}
\end{equation}
...
As you can see, I can't print the backslash character at the beginning of the "\tag" string.
I tried with :
gsed -i "$(($line)-1)i '\'tag{$number}" file
or
gsed -i "$(($line)-1)i \\tag{$number}" file
but no good results,
if someone could see what's wrong ...
Thanks
PS: I am on MacOS X, that's why I used gsed
You need five backslashes:
gsed -i "$(($line)-1)i \\\\\tag{$number}" file
Let me explain starting with a single quoted command:
gsed -i '1i \\\test'
You need three backslashes in that case.
The first one separates the i command from the text to be inserted. The second one escapes the third, because otherwise \t would be expanded to a tab. The third, now escaped, backslash is inserted as a literal \ at the start of the new line.
If we additionally use double quotes to enclose the command,
gsed -i "1i\\\\\test"
the string is additionally parsed by the shell. Both escaping backslashes from the single-quoted command therefore need to be escaped as well. This makes five backslashes.
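The backslash counting can be checked without touching any file by printing what the shell hands over to sed. A minimal sketch, using printf instead of gsed; the variable names single and double are only illustrative.

```shell
number=1

# Single quotes: the shell passes the text through unchanged.
single='2i \\\tag{1}'

# Double quotes: the shell turns each \\ into \ and leaves \t alone,
# so five backslashes are needed to end up with three.
double="2i \\\\\tag{$number}"

# Both lines printed here are the same sed script.
printf '%s\n' "$single" "$double"
```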
So far so good. But since you are interpolating shell variables into the command, you need to make sure that any backslashes in them are escaped as well.
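One possible way to do that is to double every backslash in the value before interpolating it. The helper name escape_backslashes and the sample value are assumptions made for this sketch, not part of the original command.

```shell
# Hypothetical helper: double every backslash so that sed's i command
# treats each one as a literal character.
escape_backslashes() {
  printf '%s\n' "$1" | sed 's/\\/\\\\/g'
}

label='eq:\alpha'
escaped=$(escape_backslashes "$label")

# Prints the value with each backslash doubled.
printf '%s\n' "$escaped"
```

The escaped value, rather than the raw one, would then be placed inside the double-quoted sed command.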
I'm trying to write a bash function that would escape all double quotes within single quotes, eg:
'I need to escape "these" quotes with backslashes'
would become
'I need to escape \"these\" quotes with backslashes'
My take on it was:
Find pairs of single quotes in the input and extract them with grep
Pipe into sed, escape double quotes
Sed again the whole input and replace grep match with sedded match
I managed to get as far as producing the correctly escaped quoted section, but substituting it back into the whole input fails.
The script code copypaste:
# $1 - Full name, $2 - minified name
adjust_quotes ()
{
SINGLE_QUOTES=`grep -Eo "'.*'" $2`
ESCAPED_QUOTES=`echo $SINGLE_QUOTES | sed 's|"|\\\\"|g'`
sed -r "s|'.*'|$ESCAPED_QUOTES|g" "$2" > "$2.escaped"
mv "$2.escaped" $2
echo "Quotes escaped within single quotes on $2"
}
Random additional questions:
In the console, escaping the quote with only two backslashes works, but when the code is put in the script I need four. I'd love to know why.
Could I modify this code into a loop to escape all pairs of single quotes, one after another until EOF?
Thanks!
P.S. I know this would probably be easier to do in eg. python, but I really need to keep it in bash.
Using BASH string replacement:
s='I need to escape "these" quotes with backslashes'
r="${s//\"/\\\"}"
echo "$r"
I need to escape \"these\" quotes with backslashes
Here's a pure bash solution, which does the transformation on stdin, printing to stdout. It reads the entire input into memory, so it won't work with really enormous files.
escape_enclosed_quotes() (
IFS=\'
read -d '' -r -a fields
for ((i=1; i<${#fields[@]}; i+=2)); do
fields[i]=${fields[i]//\"/\\\"}
done
printf %s "${fields[*]}"
)
I deliberately enclosed the body of the function in parentheses rather than braces, in order to force the body to run in a subshell. That limits the modification of IFS to the body, as well as implicitly making the variables used local.
The function uses the read builtin to read the entire input (since the line delimiter is set to NUL with -d '') into an array (-a) using a single quote as the field separator (IFS=\'). The result is that the parts of the input surrounded with single quotes are in the odd positions of the array, so the function loops over the odd indices to do the substitution only for those fields. I use bash's find-and-replace syntax instead of deferring to an external utility like sed.
This being bash, there are a couple of gotchas:
If the file contains a NUL, the rest of the file will be ignored.
If the last line of the file does not end with a newline, and the last character of that line is a single quote, it will not be output.
Both of the above conditions are impossible in a portable text file, so it's probably OK. All the same, worth taking note.
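If reading the whole input into memory is a concern, the same odd-field idea can be applied one line at a time. Note that this swaps in a different technique: it calls awk from the shell instead of using pure bash, and it assumes that no single-quoted section spans more than one line. The function name escape_enclosed_quotes_by_line is hypothetical.

```shell
# Hypothetical line-by-line variant: split each line on single quotes,
# so the even-numbered awk fields are the parts inside quotes.
escape_enclosed_quotes_by_line() {
  awk -F"'" -v OFS="'" '{
    for (i = 2; i <= NF; i += 2)
      gsub(/"/, "\\\"", $i)   # put a backslash in front of each double quote
    print
  }'
}

printf '%s\n' "'I need \"these\"' and \"not these\"" | escape_enclosed_quotes_by_line
```

awk numbers fields from 1, so the quoted parts land in the even fields here, where the bash array above has them at the odd indices.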
The supplementary question: why are the extra backslashes needed in
ESCAPED_QUOTES=`echo $SINGLE_QUOTES | sed 's|"|\\\\"|g'`
Answer: It has nothing to do with that line being in a script. It has to do with your use of backticks (`...`) for command substitution, and the idiosyncratic and often unpredictable handling of backslashes inside backticks. This syntax is deprecated. Do not use it. (Not even if you see someone else using it in some random example on the internet.) If you had used the recommended $(...) syntax for command substitution, it would have worked as expected:
ESCAPED_QUOTES=$(echo $SINGLE_QUOTES | sed 's|"|\\"|g')
(More information is in the Bash FAQ linked above.)
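The difference can be seen side by side in a small sketch. The sample input and the variable names old and new are chosen for illustration only.

```shell
input='say "hi"'

# Backticks: the shell halves \\\\ to \\ before sed runs, so four are needed.
old=`printf '%s\n' "$input" | sed 's|"|\\\\"|g'`

# $(...): the single-quoted sed script is passed through untouched, so two suffice.
new=$(printf '%s\n' "$input" | sed 's|"|\\"|g')

# Both lines printed here should be identical.
printf '%s\n' "$old" "$new"
```

In both cases sed ends up receiving the same script, with two backslashes in the replacement.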
I want to change single line of config started with "option timezone" to a line with single quotes: "option timezone 'EST-10'". However when I do this
sed -i '/option timezone/c\option timezone 'EST-10'' /etc/config/system
single quotes missed and result is like this:
head /etc/config/system
config system
option timezone EST-10
Of course a backslash before the quotes doesn't help. Can I achieve it somehow with the c\ command?
P.S. sed is from openwrt busybox, limited, supports only e,f,i,n,r.
Try this:
sed -i '/option timezone/c\option timezone '\'EST-10\' /etc/config/system
Adjacent strings are automatically concatenated by bash, so this closes the first string, adds a single quote (which needs to be escaped), EST-10, then another escaped single quote.
If the "EST-10" part contained spaces, then you would need to put it into single quotes too:
sed -i '/option timezone/c\option timezone '\''EST - 10'\' /etc/config/system
Double quotes are also an option but personally I prefer not to use them as there are a whole load of other characters that Bash will interpret, such as $ and !, that then need escaping.
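To see why the original attempt loses its quotes, it can help to print the argument the shell builds rather than passing it to sed. A minimal sketch; the variable names broken and fixed are only illustrative.

```shell
# Original attempt: the inner quotes only close and reopen the shell string,
# so they never reach sed.
broken='/option timezone/c\option timezone 'EST-10''

# Fixed: each literal quote is an escaped \' placed outside the quoted string.
fixed='/option timezone/c\option timezone '\'EST-10\'

printf '%s\n' "$broken" "$fixed"
```

Only the second line printed contains the single quotes around EST-10.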
You can use single quotes inside a double-quoted sed command without having to escape them:
sed -i "/option timezone/c\option timezone 'EST-10'" /etc/config/system
What I want to achieve:
Suppose I have a file file with the following content:
ENV_VAR='/foo/`whoami`/bar/'
sh my_script.sh 'LOL'
I want to replace, using sed, the single quotes that surround the directory names, but not the ones that surround things that do not look like a directory, for example the arguments of a script.
That is, after running the sed command, I would expect the following output:
ENV_VAR="/foo/`whoami`/bar/"
sh my_script.sh 'LOL'
The idea is to make this happen without using tr to replace ' with ", nor sed like s/'/"/g, as I don't want to replace the quotes on lines that do not seem to contain directories.
Please note that sed is running on AIX, so no GNU sed is available.
What I have tried:
If I use sed like this:
sed "s;='.*/.*';&;g" file
... the & variable holds the text matched by the regex, that is: ='/foo/`whoami`/bar/'. However, I can't figure out how to write the replacement so that the single quotes get transformed into double quotes.
I wonder if there's a way to make this work using sed only, via a one-liner.
This will do the job:
/usr/bin/sed -e "/='.*\/.*'/ s/'/\"/g" file
Basically, you just want the plain ' => " replacement, but not for all lines, just for those that match the pattern ='.*\/.*'. In the s command you only need to escape the ", because the whole script is inside double quotes.
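A short sketch of that command applied to the sample file from the question, fed through a here-document so that no file is needed. The function name fix_dir_quotes is hypothetical; the sed expression is the one shown above and uses only POSIX features.

```shell
# Hypothetical wrapper: replace quotes only on lines matching ='...'
# with a slash somewhere between the quotes.
fix_dir_quotes() {
  sed -e "/='.*\/.*'/ s/'/\"/g"
}

# The quoted 'EOF' keeps the backticks and quotes in the sample literal.
result=$(fix_dir_quotes <<'EOF'
ENV_VAR='/foo/`whoami`/bar/'
sh my_script.sh 'LOL'
EOF
)
printf '%s\n' "$result"
```

Only the ENV_VAR line is rewritten; the script argument 'LOL' keeps its single quotes.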
This should work:
sed "s/'\(.*\/.*\)'/\"\1\"/g"
Captures the part between ' and uses a backreference.
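Because .* is greedy, a line holding several quoted strings could be matched from its first quote to its last. A possible refinement, which is not part of the answer above, is to use [^']* so that a match cannot run across a quote. The function name quote_paths and the sample line are assumptions for this sketch.

```shell
# Hypothetical variant: [^']* keeps each match inside one pair of quotes.
quote_paths() {
  sed "s/'\([^']*\/[^']*\)'/\"\1\"/g"
}

# Only the quoted string that contains a slash is converted.
out=$(printf '%s\n' "A='/x/y' B='z'" | quote_paths)
printf '%s\n' "$out"
```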