GNU sed regex to fix MySQL DB inserts for SQLite

I am trying to translate a huge MySQL database dump file from MySQL syntax into SQLite syntax.
At https://regex101.com/ I have successfully created an ECMAScript flavor regex to turn something like:
,'foo\'s bar!',
into:
,"foo\'s bar!"
with this regular expression:
/,'([^']+)\\'([^']+)',/"$1\\'$2"/g
testing against this short file:
(1058,'gpl5q0x51349lmdq3e0ijm4k9b6n','Henry\'s_1.csv','text/csv','{\"identified\":true,\"analyzed\":true}',33854,'mUVk0/XGX+afIpkrqBm7LQ==','2021-01-06 03:07:23'),
(1059,'xzj8mivsenkakkrurfjytxjsaj1h','Henry\'s_2.csv','text/csv','{\"identified\":true,\"analyzed\":true}',33555,'KfRYqfAWtSIYXZ6oQZyYbA==','2021-01-06 03:07:23'),
Resulting in:
(1058,'gpl5q0x51349lmdq3e0ijm4k9b6n'"Henry\'s_1.csv"'text/csv','{\"identified\":true,\"analyzed\":true}',33854,'mUVk0/XGX+afIpkrqBm7LQ==','2021-01-06 03:07:23'),
(1059,'xzj8mivsenkakkrurfjytxjsaj1h'"Henry\'s_2.csv"'text/csv','{\"identified\":true,\"analyzed\":true}',33555,'KfRYqfAWtSIYXZ6oQZyYbA==','2021-01-06 03:07:23'),
but for the life of me I cannot translate this into a GNU sed flavor regex.
For example, this command does not make any substitutions in the output:
sed -r s/,'([^']+)\\'([^']+)',/"$1\\'$2"/g <test.sql
...
sed -r s/,'([^']+)\\'([^']+)',/"\1\\'\2"/g <test.sql: doesn't work either.
I have looked for an online tool that translates between regex flavors but cannot find one that works with GNU sed (the one shipped with Git: sed (GNU sed) 4.8). PCRE seems to be close to what sed has, but that doesn't work. I tried perl as well, with no luck.
Does anyone know a regex that works, or a translator tool that works?
I am just about ready to write a nodejs program to do this for me.
Also, for extra credit, how can I write a sed script to handle any number of escaped quotes within a quoted string? I have that issue to deal with as well in my DB dump file.
Examples:
'foo\'-bar' // one instance
'foo\'and\'bar' // two instances
'foo\'and\'bar\'s on the deck' // three instances
and so on...
Thanks!

You can use
sed -E "s/,'([^']+)\\\\'([^']+)',/"'"'"\\1\\\\'\\2"'"'/g test.sql
The "s/,'([^']+)\\\\'([^']+)',/"'"'"\\1\\\\'\\2"'"'/g consists of
"s/,'([^']+)\\\\'([^']+)',/" - a s/,'([^']+)\\'([^']+)',/ part (inside double quotes, so backslashes need doubling)
'"' - a " char (inside single quotes)
"\\1\\\\'\\2" - \1\\'\2 pattern (inside double quotes, so backslashes are doubled)
'"' - a " char (inside single quotes)
/g - the global flag (no quoting needed here).
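A minimal self-check sketch, assuming GNU sed and a POSIX shell: it prints the script that sed actually receives once the shell has removed the quotes, then applies it to the first row from the question. The helper name fix_row and the temporary files are illustrative only. Because the pattern consumes the surrounding commas, the output should match the regex101 result shown in the question.

```sh
#!/bin/sh
# Sketch (GNU sed assumed). fix_row is a hypothetical helper that reads rows on stdin.
script="s/,'([^']+)\\\\'([^']+)',/"'"'"\\1\\\\'\\2"'"'/g

# Print the script as sed sees it: s/,'([^']+)\\'([^']+)',/"\1\\'\2"/g
printf '%s\n' "$script"

fix_row() {
  sed -E "$script"
}

# A quoted heredoc delimiter ('EOF') keeps \' and \" exactly as in the dump.
input=$(mktemp)
cat > "$input" <<'EOF'
(1058,'gpl5q0x51349lmdq3e0ijm4k9b6n','Henry\'s_1.csv','text/csv','{\"identified\":true,\"analyzed\":true}',33854,'mUVk0/XGX+afIpkrqBm7LQ==','2021-01-06 03:07:23'),
EOF

fix_row < "$input"
```

To keep the commas, put them back into the replacement, as the next answer does.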

First, look at your command:
sed -r s/,'([^']+)\\'([^']+)',/"\1\\'\2"/g test.sql
I prefer writing the whole sed command in single quotes. When you need a single quote, you must close the string ('), use an escaped single quote (\') and open the next string with a ', all joined: '\''.
I also added the two , characters back into the replacement, because the pattern consumes them.
sed -r 's/,'\''([^'\'']+)\\'\''([^'\'']+)'\'',/,"\1\\'\''\2",/g' test.sql
# Shorter
sed -r 's/,'\''([^'\'']+\\'\''[^'\'']+)'\'',/,"\1",/g' test.sql
# Using another way to write the single quotes, with the hex notation
sed -r 's/,\x27([^\x27]+\\\x27[^\x27]+)\x27,/,"\1",/g' test.sql
This works for simple cases, but not for 'foo\'and\'bar\'s on the deck'.
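A sketch that checks this, assuming GNU sed (the \x27 escape is a GNU extension): the three commands above are wrapped in hypothetical helper functions and run on the first row from the question and on the extra-credit row. All three should convert Henry\'s_1.csv and leave the row with three escaped quotes untouched.

```sh
#!/bin/sh
# Sketch (GNU sed assumed). The helper names are hypothetical; each reads rows on stdin.
long_form()  { sed -r 's/,'\''([^'\'']+)\\'\''([^'\'']+)'\'',/,"\1\\'\''\2",/g'; }
short_form() { sed -r 's/,'\''([^'\'']+\\'\''[^'\'']+)'\'',/,"\1",/g'; }
hex_form()   { sed -r 's/,\x27([^\x27]+\\\x27[^\x27]+)\x27,/,"\1",/g'; }

# Quoted heredoc delimiter: the rows reach sed byte for byte.
rows=$(mktemp)
cat > "$rows" <<'EOF'
(1058,'gpl5q0x51349lmdq3e0ijm4k9b6n','Henry\'s_1.csv','text/csv','{\"identified\":true,\"analyzed\":true}',33854,'mUVk0/XGX+afIpkrqBm7LQ==','2021-01-06 03:07:23'),
(2000,'extra credit from question','foo\'and\'bar\'s on the deck','text/csv','{\"identified\":true,\"analyzed\":true}',33999,'KgSBFstbdthdsssssstvbA==','2022-01-02 13:07:23'),
EOF

for f in long_form short_form hex_form; do
  echo "== $f"
  "$f" < "$rows"
done
```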
I think you want to replace the quotes in the simple fields too.
Suppose you want to transform
(1058,'gpl5q0x51349lmdq3e0ijm4k9b6n','Henry\'s_1.csv','text/csv','{\"identified\":true,\"analyzed\":true}',33854,'mUVk0/XGX+afIpkrqBm7LQ==','2021-01-06 03:07:23'),
(1059,'xzj8mivsenkakkrurfjytxjsaj1h','Henry\'s_2.csv','text/csv','{\"identified\":true,\"analyzed\":true}',33555,'KfRYqfAWtSIYXZ6oQZyYbA==','2021-01-06 03:07:23'),
(2000,'extra credit from question','foo\'and\'bar\'s on the deck','text/csv','{\"identified\":true,\"analyzed\":true}',33999,'KgSBFstbdthdsssssstvbA==','2022-01-02 13:07:23'),
into
(1058,"gpl5q0x51349lmdq3e0ijm4k9b6n","Henry\'s_1.csv","text/csv","{\"identified\":true,\"analyzed\":true}",33854,"mUVk0/XGX+afIpkrqBm7LQ==","2021-01-06 03:07:23"),
(1059,"xzj8mivsenkakkrurfjytxjsaj1h","Henry\'s_2.csv","text/csv","{\"identified\":true,\"analyzed\":true}",33555,"KfRYqfAWtSIYXZ6oQZyYbA==","2021-01-06 03:07:23"),
(2000,"extra credit from question","foo\'and\'bar\'s on the deck","text/csv","{\"identified\":true,\"analyzed\":true}",33999,"KgSBFstbdthdsssssstvbA==","2022-01-02 13:07:23"),
In this part I don't use '\'' but the hexadecimal notation \x27.
First "back up" the \' combinations (replace them with an unused character such as \r), then replace all remaining single quotes with double quotes, and finally "restore the backup" (change the \r back to \').
sed 's/\\\x27/\r/g; s/\x27/"/g; s/\r/\\\x27/g' test.sql
# or hex value for double quote "
sed 's/\\\x27/\r/g; s/\x27/\x22/g; s/\r/\\\x27/g' test.sql
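A minimal sketch of the backup/restore approach, assuming GNU sed (\x27 and \r are GNU escapes); requote is a hypothetical helper name. It runs the command on the three sample rows and should print the double-quoted rows shown above.

```sh
#!/bin/sh
# Sketch (GNU sed assumed: \x27 and \r are GNU escapes).
# requote is a hypothetical helper that reads the dump on stdin.
requote() {
  # 1) hide every \' as a carriage return,
  # 2) turn every remaining ' into ",
  # 3) turn the carriage returns back into \'.
  sed 's/\\\x27/\r/g; s/\x27/"/g; s/\r/\\\x27/g'
}

dump=$(mktemp)
cat > "$dump" <<'EOF'
(1058,'gpl5q0x51349lmdq3e0ijm4k9b6n','Henry\'s_1.csv','text/csv','{\"identified\":true,\"analyzed\":true}',33854,'mUVk0/XGX+afIpkrqBm7LQ==','2021-01-06 03:07:23'),
(1059,'xzj8mivsenkakkrurfjytxjsaj1h','Henry\'s_2.csv','text/csv','{\"identified\":true,\"analyzed\":true}',33555,'KfRYqfAWtSIYXZ6oQZyYbA==','2021-01-06 03:07:23'),
(2000,'extra credit from question','foo\'and\'bar\'s on the deck','text/csv','{\"identified\":true,\"analyzed\":true}',33999,'KgSBFstbdthdsssssstvbA==','2022-01-02 13:07:23'),
EOF

requote < "$dump"
```

The trick relies on two assumptions about the dump: it contains no literal carriage returns (a dump saved with Windows line endings does, and those would turn into \'), and no string ends with an escaped backslash such as 'C:\\', whose \\' would be taken for an escaped quote.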

Related

replace line with single quotes by sed

I want to change a single config line starting with "option timezone" to a line with single quotes: "option timezone 'EST-10'". However, when I do this
sed -i '/option timezone/c\option timezone 'EST-10'' /etc/config/system
the single quotes are lost and the result looks like this:
head /etc/config/system
config system
option timezone EST-10
Of course, a backslash before the quotes doesn't help. Can I achieve this somehow with the c\ command?
P.S. The sed is from OpenWrt BusyBox; it is limited and supports only the e, f, i, n and r options.
Try this:
sed -i '/option timezone/c\option timezone '\'EST-10\' /etc/config/system
Adjacent strings are automatically concatenated by bash, so this closes the first string, adds a single quote (which needs to be escaped), EST-10, then another escaped single quote.
If the "EST-10" part contained spaces, then you would need to put it into single quotes too:
sed -i '/option timezone/c\option timezone '\''EST - 10'\' /etc/config/system
Double quotes are also an option but personally I prefer not to use them as there are a whole load of other characters that Bash will interpret, such as $ and !, that then need escaping.
You can use a single-quoted string inside a double-quoted sed command without escaping the quotes:
sed -i "/option timezone/c\option timezone 'EST-10'" /etc/config/system
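To see that these quoting styles hand sed the same kind of script, the sketch below prints each script with printf and then runs the first one on a small sample config. It assumes a sed that accepts the one-line c\text form (GNU sed does, and the BusyBox sed from the question evidently does too); the sample input is hypothetical and is piped through sed instead of editing /etc/config/system in place.

```sh
#!/bin/sh
# Print the script each quoting style produces, as sed will receive it.
printf '%s\n' '/option timezone/c\option timezone '\'EST-10\'
printf '%s\n' '/option timezone/c\option timezone '\''EST - 10'\'
printf '%s\n' "/option timezone/c\option timezone 'EST-10'"

# Hypothetical stand-in for /etc/config/system, processed without -i.
printf 'config system\noption timezone UTC\n' |
  sed '/option timezone/c\option timezone '\'EST-10\'
```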

sed replace exact match

I want to change some names in a file using sed. This is what the file looks like:
#! /bin/bash
SAMPLE="sample_name"
FULLSAMPLE="full_sample_name"
...
Now I only want to change sample_name and not full_sample_name using sed.
I tried this
sed s/\<sample_name\>/sample_01/g ...
I thought \< and \> could be used to find an exact match, but when I use them, nothing is changed.
Adding single quotes ('') helped to change only sample_name. However, there is another problem now: my situation was a bit more complicated than explained above, since my sed command is embedded in a loop:
while read SAMPLE
do
name=$SAMPLE
sed -e 's/\<sample_name\>/$SAMPLE/g' /path/coverage.sh > path/new_coverage.sh
done < $1
So sample_name should be replaced with the value of $SAMPLE. However, when running the command, sample_name is replaced with the literal text $SAMPLE and not with its value.
I believe \< and \> work with GNU sed; you just need to quote the sed command:
sed -i.bak 's/\<sample_name\>/sample_01/g' file
In GNU sed, the following command works:
sed 's/\<sample_name\>/sample_01/' file
The only difference here is that I've enclosed the command in single quotes. Even when it is not necessary to quote a sed command, I see very little disadvantage to doing so (and it helps avoid these kinds of problems).
Another way of achieving what you want more portably is by adding the quotes to the pattern and replacement:
sed 's/"sample_name"/"sample_01"/' script.sh
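A quick sketch of this portable variant on the first lines of the file from the question (written to a temporary file here). It needs no GNU extensions, because the double quotes in the file mark where the value begins and ends.

```sh
#!/bin/sh
# Portable: only a plain s command, no \< or \>.
file=$(mktemp)
cat > "$file" <<'EOF'
#! /bin/bash
SAMPLE="sample_name"
FULLSAMPLE="full_sample_name"
EOF

# "sample_name" (with its quotes) does not occur inside "full_sample_name".
sed 's/"sample_name"/"sample_01"/' "$file"
```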
Alternatively, the syntax you have proposed also works in GNU awk:
awk '{sub(/\<sample_name\>/, "sample_01")}1' file
If you want to use a variable in the replacement string, you will have to use double quotes instead of single, for example:
sed "s/\<sample_name\>/$var/" file
Variables are not expanded within single quotes, which is why you are getting the name of your variable rather than its contents.
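A sketch of the question's loop with the expression in double quotes, assuming GNU sed for \< and \>. The template file, the sample names and the output directory are hypothetical; each sample also gets its own output file, since the original loop writes to the same new_coverage.sh on every pass and keeps only the last result.

```sh
#!/bin/sh
# Sketch (GNU sed assumed for \< and \>). All file and sample names are hypothetical.
template=$(mktemp)
cat > "$template" <<'EOF'
SAMPLE="sample_name"
FULLSAMPLE="full_sample_name"
EOF

outdir=$(mktemp -d)
# Double quotes let the shell expand $SAMPLE before sed sees the script.
printf 'sample_01\nsample_02\n' | while read -r SAMPLE; do
  sed "s/\<sample_name\>/$SAMPLE/g" "$template" > "$outdir/coverage_$SAMPLE.sh"
done

cat "$outdir/coverage_sample_01.sh"
```

Because the value is pasted into the s command, a sample name containing /, & or a backslash would need escaping first.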
@user1987607
You can do this the following way:
sed 's/"sample_name"/"sample_01"/g' file
where including the double quotes of the value in the pattern (inside single quotes, so the shell passes them on to sed) matches only the exact quoted value.
/g is for global replacement.
If sample_name also occurs inside a longer word such as ifsample_name and you want to replace that as well,
match the name followed by a space instead:
sed 's/sample_name /sample_01 /g' file
That way it replaces only the desired word: for example, the same approach with "the " replaces the word "the" in a text file but not the start of words like thereby.
If you are interested in replacing only the first occurrence on each line, drop the g flag:
sed 's/"sample_name"/"sample_01"/' file
Hope it helps

How can I use `sed` to replace the single quotes enclosing a directory with double quotes

What I want to achieve:
Suppose I have a file file with the following content:
ENV_VAR='/foo/`whoami`/bar/'
sh my_script.sh 'LOL'
I want to use sed to replace the single quotes that surround directory names, but not the ones that surround things that do not look like directories, for example the arguments of a script.
That is, after running the sed command, I would expect the following output:
ENV_VAR="/foo/`whoami`/bar/"
sh my_script.sh 'LOL'
The idea is to make this happen without using tr to replace ' with ", or sed with s/'/"/g, as I don't want to change the lines that do not seem to contain directories.
Please note that sed is running on AIX, so no GNU sed is available.
What I have tried:
If I use sed like this:
sed "s;'=.*/.*';&;g" file
... & holds the text previously matched, that is ='/foo/`whoami`/bar/'. However, I can't figure out how to make the replacement so that the single quotes get transformed into double quotes.
I wonder if there's a way to make this work using sed only, via a one-liner.
This will do the job:
/usr/bin/sed -e "/='.*\/.*'/ s/'/\"/g" file
Basically, you just want the plain ' => " replacement, but not on all lines, only on those that match the pattern ='.*\/.*'. Because the whole command is in double quotes, the only thing you need to escape in the s command is the ".
This should work:
sed "s/'\(.*\/.*\)'/\"\1\"/g"
It captures the part between the single quotes and reinserts it with a backreference (\1).
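A sketch that runs both suggestions on the sample file from the question. Both commands use only POSIX BRE features (\(, \), \1, \/), so any sed should accept them; this is an assumption, since AIX sed itself is not tested here.

```sh
#!/bin/sh
# The quoted heredoc keeps the backticks literal.
file=$(mktemp)
cat > "$file" <<'EOF'
ENV_VAR='/foo/`whoami`/bar/'
sh my_script.sh 'LOL'
EOF

echo '== address + plain substitution'
sed -e "/='.*\/.*'/ s/'/\"/g" "$file"

echo '== capture + backreference'
sed "s/'\(.*\/.*\)'/\"\1\"/g" "$file"
```

Note that .* is greedy: on a line with several quoted strings, the second command captures from the first single quote to the last one.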

search and replace substring in string in bash

I have the following task:
I have to replace several links, but only the links that end with .do
Important: the files also contain other links, which should stay untouched.
<li>Einstellungen verwalten</li>
to
<li>Einstellungen verwalten</li>
So I have to search for links ending in .do, take the part before .do and remember it, for example as $a, replace the whole link with
<s:url action=' '/>
and paste $a between the quotes.
I thought about sed, but as far as I know sed only searches for a whole string and replaces it completely.
I also tried bash parameter expansions in combination with sed but ran into several problems with the quotes and the variables.
cat ./src/main/webapp/include/stoBox2.jsp | grep -e '<a href=".*\.do">' | while read a;
do
b=${a#*href=\"};
c=${b%.do*};
sed -i 's/href=\"$a.do\"/href=\"<s:url action=\'$a\'/>\"/g' ./src/main/webapp/include/stoBox2.jsp;
done;
any ideas ?
Thanks a lot.
sed -i 's#href="\(.*\)\.do"#href="<s:url action='"'\1'"'/>"#g' ./src/main/webapp/include/stoBox2.jsp
The \( \) group captures the link without the .do suffix. The single and double quotes split the sed script into three shell strings, which the shell joins into one argument; this is how the single quotes in your text reach sed:
's#href="\(.*\)\.do"#href="<s:url action='
"'\1'"
'/>"#g'
The -i option modifies your file directly. If you don't want that, remove it and redirect the output to a temporary file with > tmp.
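A sketch of this answer's command, starting with a single sed and run without -i so the result goes to standard output. The page content is hypothetical, because the original links did not survive in the question. The printf line shows the single script that the three quoted pieces join into.

```sh
#!/bin/sh
# Hypothetical page content: one .do link and one link that must stay untouched.
page=$(mktemp)
cat > "$page" <<'EOF'
<li><a href="settings.do">Einstellungen verwalten</a></li>
<li><a href="help.html">Help</a></li>
EOF

# Prints: s#href="\(.*\)\.do"#href="<s:url action='\1'/>"#g
printf '%s\n' 's#href="\(.*\)\.do"#href="<s:url action='"'\1'"'/>"#g'

sed 's#href="\(.*\)\.do"#href="<s:url action='"'\1'"'/>"#g' "$page"
```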
Try this one:
sed -i "s%\(href=\"\)\([^\"]\+\)\.do%\1<s:url action='\2'/>%g" \
./src/main/webapp/include/stoBox2.jsp;
You can capture patterns with parenthesis (\(,\)) and use it in the replacement pattern.
Here I capture a string that contains no " and precedes .do (\([^\"]\+\)\.do), and insert it without the .do suffix (\2).
There is a / in the replacement, so I used % to delimit the expressions instead of the traditional /.
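A sketch of this command on hypothetical input, assuming GNU sed (\+ in a basic regex is a GNU extension) and run without -i. The second line holds two .do links, to show that [^"]\+ cannot run past a closing quote.

```sh
#!/bin/sh
# Hypothetical page content; the original links were lost from the question.
page=$(mktemp)
cat > "$page" <<'EOF'
<li><a href="settings.do">Einstellungen verwalten</a></li>
<p><a href="a.do">A</a> <a href="b.do">B</a></p>
EOF

# Inside double quotes, \" becomes ", so sed sees [^"]\+ as the link name.
sed "s%\(href=\"\)\([^\"]\+\)\.do%\1<s:url action='\2'/>%g" "$page"
```

By contrast, the greedy \(.*\) in the previous answer would capture from the first href=" to the last .do" on the second line.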

egrep regular expression works within PHP, but doesn't work at unix shell - escaping issues?

I think my problem has something to do with escaping differences between using a regex within PHP versus using it at Bash commandline.
Here is my regex that is working in PHP:
$emailregex = '^[_a-z0-9-]+(\.[_a-z0-9-]+)*#[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,6})$';
So I try giving the following at commandline and it doesn't seem to match anything.
(where emails.txt is a long plain text file with thousands of (possibly badly-formed) email addresses, one per line).
[root#host dir]# egrep '^[_a-z0-9-]+(\.[_a-z0-9-]+)*#[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,6})$' emails.txt
I have tried surrounding the regex with double-quotemarks instead of single-quotemarks, but it made no difference.
Do I need to add some backslashes into the regex?
SOLVED! Thank you!
My file was created on Windows, and the extra CR in its end-of-line markers kept the $ anchor in the regex from matching.
Single quotes should work with bash...
It works for me with this simple case:
echo test#test.com | egrep '^[_a-z0-9-]+(\.[_a-z0-9-]+)*#[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,6})$'
In your text file, the line has to only contain the email address. Any additional spaces on the line will throw it off. For example this doesn't print anything:
echo " test#test.com" | egrep '^[_a-z0-9-]+(\.[_a-z0-9-]+)*#[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,6})$'
Your problem might be that you have a DOS-formatted file. In that case the extra \r makes the regex fail, because $ sees an extra character at the end of the line. You can run dos2unix on the file, or make your regex less restrictive by removing the ^ and $ anchors:
egrep '[_a-z0-9-]+(\.[_a-z0-9-]+)*#[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,6})'
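A small sketch of the carriage-return effect, using the regex exactly as written in the question and grep -E (the standard spelling of egrep). The sample addresses are hypothetical; tr -d '\r' stands in for dos2unix.

```sh
#!/bin/sh
re='^[_a-z0-9-]+(\.[_a-z0-9-]+)*#[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,6})$'

# Hypothetical file with DOS (CRLF) line endings.
emails=$(mktemp)
printf 'aa#bb.com\r\nnot an email\r\ncc#dd.ee.ff\r\n' > "$emails"

# The trailing \r sits between the address and the end of the line, so $ fails.
echo "matches with CRLF endings: $(grep -cE "$re" "$emails")"
# Strip the carriage returns first and the anchored regex matches again.
echo "matches after stripping CR: $(tr -d '\r' < "$emails" | grep -cE "$re")"
```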
Works for me:
JPP-MacBookPro-4:tmp jpp$ cat emails.txt
aa#bb.com
bb#cc.com
not an email
cc#dd.ee.ff
JPP-MacBookPro-4:tmp jpp$ egrep '^[_a-z0-9-]+(\.[_a-z0-9-]+)*#[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,6})$' emails.txt
aa#bb.com
bb#cc.com
cc#dd.ee.ff
JPP-MacBookPro-4:tmp jpp$
Beware of trailing whitespace, tabs and carriage returns; they have a way of biting regexes.
There is a great ref on shell quoting here http://www.mpi-inf.mpg.de/~uwe/lehre/unixffb/quoting-guide.html