Add backslash before single and double quote - regex

I am trying to add a backslash before single and double quotes. The problem is that I want to exclude triple quotes.
What I have done so far:
for single quote:
sed -e s/\'/\\\\\'/g test.txt > test1.txt
for double quote:
sed -e s/\"/\\\\\"/g test.txt > test1.txt
I have text like:
1,"""Some text XM'SD12X""","""Some text XM'SD12X""","""Auto " Moto " Some text"Some text"""
What I want is:
1,"""Some text XM\'SD12X""","""Some text XM\'SD12X""","""Auto \" Moto \" Some text\"Some text"""

If perl is okay:
perl -pe 's/"{3}(*SKIP)(*F)|[\x27"]/\\$&/g'
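"{3}(*SKIP)(*F)  don't change triple double quotes: they are matched, the match is then abandoned, and the search resumes after them
use (\x27{3}|"{3})(*SKIP)(*F) instead if triple single quotes must not be changed either
|[\x27"]  otherwise, match a single or double quote (\x27 is the hex escape for a single quote, which avoids closing the shell's single-quoted string)
\\$&  prefix \ to the matched portion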
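Python's re module does not support PCRE's (*SKIP)(*F) verbs, but the same effect can be sketched with a replacement callback: match the triple quote as its own alternative and return it unchanged. This is a swapped-in technique for illustration, not the answer's perl code, and the function name escape_quotes is made up.

```python
import re

# Alternation order matters: """ is tried before a lone quote, so a triple
# quote is always consumed as one unit (like the (*SKIP) branch in perl).
PATTERN = re.compile(r'"{3}|[\'"]')

def escape_quotes(line):
    def repl(m):
        text = m.group(0)
        if text == '"""':
            return text        # triple double quote: leave it alone
        return "\\" + text     # a single ' or ": prefix a backslash
    return PATTERN.sub(repl, line)

src = '''1,"""Some text XM'SD12X""","""Some text XM'SD12X""","""Auto " Moto " Some text"Some text"""'''
expected = r'''1,"""Some text XM\'SD12X""","""Some text XM\'SD12X""","""Auto \" Moto \" Some text\"Some text"""'''
print(escape_quotes(src))
```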
With sed, you can replace the triple quotes with a newline character (a newline cannot otherwise be present in the pattern space when sed processes input line by line), then escape the single/double quote characters, and finally change the newlines back to triple quotes. This relies on GNU sed, which understands \n in the replacement and \x27 in the regex.
# assuming only triple double quotes are present
sed 's/"""/\n/g; s/[\x27"]/\\&/g; s/\n/"""/g'
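The same placeholder idea, sketched in Python for comparison (escape_quotes_via_placeholder is a hypothetical name). Like the sed version, it assumes the line has no newline of its own, because the newline is the temporary stand-in for the triple quote.

```python
import re

def escape_quotes_via_placeholder(line):
    # Assumption carried over from the sed answer: no newline inside the line.
    if "\n" in line:
        raise ValueError("line must not contain a newline")
    tmp = line.replace('"""', "\n")                            # s/"""/\n/g
    tmp = re.sub(r"['\"]", lambda m: "\\" + m.group(0), tmp)   # s/[\x27"]/\\&/g
    return tmp.replace("\n", '"""')                            # s/\n/"""/g

src = '''1,"""Some text XM'SD12X""","""Some text XM'SD12X""","""Auto " Moto " Some text"Some text"""'''
print(escape_quotes_via_placeholder(src))
```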

Related

How to use sed to add double quotes around every word, excluding colons and commas

I want to alter a string so that every "word" is surrounded by double quotes, excluding colons and commas (':' and ',').
For example, my input may look like:
[ANALYSIS:true, RESTRICTED:false, STRING_PARAMETER:World,
JOB_NAME:Hello_Jenkins]
but I want it to appear as
["ANALYSIS":"true", "RESTRICTED":"false", "STRING_PARAMETER":"World",
"JOB_NAME":"Hello_Jenkins"]
I've been using something like (using '_' as the delimiter)
'echo ${params} | sed -i "s_\'/\\([^:]*\\):/i\'_\'"$1" :\'_g" '
based off of what I've found online, yet it makes no changes to my string.
$ sed -r 's/[^], :[]+/"&"/g' file
["ANALYSIS":"true", "RESTRICTED":"false", "STRING_PARAMETER":"World", "JOB_NAME":"Hello_Jenkins"]
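The sed above excludes colons, commas, the brackets and spaces, as your example requires. If your case is not fully represented by your example, you can modify the excluded characters, but the order of the brackets in the expression is important: ] must come first (right after ^) to be taken literally, and [ must not be followed by :, or [: would start a character class such as [:alpha:].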
$ echo '[ANALYSIS:true, RESTRICTED:false, STRING_PARAMETER:World, JOB_NAME:Hello_Jenkins]' |
sed 's/[[:alnum:]_]\+/"&"/g'
["ANALYSIS":"true", "RESTRICTED":"false", "STRING_PARAMETER":"World", "JOB_NAME":"Hello_Jenkins"]
Or, if you have to exclude characters instead of including them in the regexp:
$ echo '[ANALYSIS:true, RESTRICTED:false, STRING_PARAMETER:World, JOB_NAME:Hello_Jenkins]' |
sed 's/[^][,: ]\+/"&"/g'
["ANALYSIS":"true", "RESTRICTED":"false", "STRING_PARAMETER":"World", "JOB_NAME":"Hello_Jenkins"]
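For comparison, both regexes translate almost directly to Python's re.sub (a sketch; the variable names are illustrative). [[:alnum:]_] is written out as its ASCII equivalent, and the brackets inside the negated set are escaped for clarity.

```python
import re

line = "[ANALYSIS:true, RESTRICTED:false, STRING_PARAMETER:World, JOB_NAME:Hello_Jenkins]"

# Include approach: quote every run of ASCII letters, digits and underscores.
included = re.sub(r"[A-Za-z0-9_]+", r'"\g<0>"', line)

# Exclude approach: quote every run of characters that are not ], [, comma,
# colon or space.
excluded = re.sub(r"[^\]\[,: ]+", r'"\g<0>"', line)

print(included)
print(excluded)
```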

How can I escape a single quote in ag (without using double quotes)

How can I escape a single quote in ag when searching for an expression like this one?
ag ''react-redux''
I'm aware that "'react-redux'" is one solution in this scenario, but I'd like a solution that lets me use single quotes. That way I don't have to worry about the complex escape sequences required by $, %, etc. when using double quotes.
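You can use \x27 to represent a single quote (0x27 is the ASCII code for the single quote). The shell passes \x27 through untouched inside single quotes, and ag's regex engine (PCRE) turns it into a quote.
Thus, you can use: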
ag '\x27react-redux\x27'
ref: How to escape single quote in sed? --stackoverflow
While \x27 works, it's obscure. The standard shell approach is to put a backslash-escaped single quote outside the single-quoted part: the shell joins adjacent pieces into one argument, so \''react-redux'\' reaches ag as 'react-redux'. Note that tripling the quotes, as in '''react-redux''', does not do this: the shell reads it as an empty string, 'react-redux', and another empty string, so ag receives plain react-redux.
For example:
$ ag \''DD-MON-YYYY'\' users.sql
19: TO_CHAR(lock_date, 'DD-MON-YYYY') AS lock_date,
20: TO_CHAR(expiry_date, 'DD-MON-YYYY') AS expiry_date,
23: TO_CHAR(created, 'DD-MON-YYYY') AS created,
So, a trivial example that matches only the quoted item:
$ touch test.txt
$ echo value >> test.txt
$ echo \''value'\' >> test.txt
$ echo value >> test.txt
$ cat test.txt
value
'value'
value
$ ag \''value'\' test.txt
2:'value'
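Python's shlex module tokenizes a command line using POSIX shell quoting rules, so it is a handy way to check which argument ag actually receives from several quoting forms. This is a sketch: the commands are only split, never run.

```python
import shlex

# Each entry is a command line as typed at a POSIX shell prompt.
commands = [
    r"ag '''react-redux'''",      # '' + 'react-redux' + ''
    r"ag \''react-redux'\'",      # \' + 'react-redux' + \'
    r"""ag "'react-redux'" """,   # single quotes inside double quotes
    r"ag '\x27react-redux\x27'",  # literal \x27, decoded later by ag's regex engine
]
for cmd in commands:
    print(f"{cmd!r:32} -> {shlex.split(cmd)[1]!r}")
```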

Regex for replacing space with comma-space, except at end of line

I am trying to convert input file content like this:
NP_418770.2: 257-296 344-415 503-543 556-592 642-707
YP_026226.4: 741-779 811-890 896-979 1043-1077
to this:
NP_418770.2: 257-296, 344-415, 503-543, 556-592, 642-707
YP_026226.4: 741-779, 811-890, 896-979, 1043-1077
i.e., replace each space with a comma and a space (excluding the newline).
For that, I have tried:
perl -pi.bak -e "s/[^\S\n]+/, /g" input.txt
but it gives:
NP_418770.2:, 257-296, 344-415, 503-543, 556-592, 642-707
YP_026226.4:, 741-779, 811-890, 896-979, 1043-1077
How can I stop the additional comma that appears after ":" (I want ":" and a single space) without writing another regex?
Thanks
Try using a regex negative lookbehind. It checks whether the character before the space is a colon (:); if it is, that space is not matched.
s/(?<!:)[^\S\n]+/, /g
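A quick way to try the lookbehind outside perl is Python's re module, which supports the same fixed-width (?<!...) syntax and the [^\S\n] class. The function name below is hypothetical.

```python
import re

def add_commas_lookbehind(line):
    # (?<!:) rejects a whitespace run whose preceding character is a colon.
    # [^\S\n]+ is "whitespace except newline", exactly as in the question.
    return re.sub(r"(?<!:)[^\S\n]+", ", ", line)

for line in ["NP_418770.2: 257-296 344-415 503-543 556-592 642-707",
             "YP_026226.4: 741-779 811-890 896-979 1043-1077"]:
    print(add_commas_lookbehind(line))
```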
You can use a word boundary to skip the space that follows the colon (there is no word boundary between ":" and a space, since neither is a word character): s/\b\h+/, /g
It can be done with perl:
perl -pe's/\b\h+/, /g' file
but also with sed:
sed -E 's/\b[ \t]+/, /g' file
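Another approach uses the field separator:
perl -F'\b\h+' -lane 'BEGIN{$,=", "} print @F' file
or do the same with GNU awk, which spells the word boundary \y and needs $1=$1 to rebuild the line with OFS:
gawk -F'\\y[ \t]+' -v OFS=', ' '{$1=$1} 1' file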
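Python's re has no \h, so this sketch spells horizontal whitespace as [ \t]. It shows both the substitution and the split-and-join (field separator) variants using the same \b pattern; the function names are hypothetical.

```python
import re

SEP = re.compile(r"\b[ \t]+")  # whitespace run preceded by a word character

def add_commas_boundary(line):
    # The space after ':' is not at a boundary (both sides are non-word),
    # so only the spaces after the digit ranges are replaced.
    return SEP.sub(", ", line)

def add_commas_split(line):
    # Field-separator variant: split on the pattern, then join with ", ".
    return ", ".join(SEP.split(line))

line = "NP_418770.2: 257-296 344-415 503-543 556-592 642-707"
print(add_commas_boundary(line))
print(add_commas_split(line))
```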
You were close. That should do the trick:
s/(\d+-\d+)[^\S\n]+/$1, /g
The idea is to look at the parts that should get a comma after them, which fit the pattern "digits, then a dash, more digits, then whitespace that is not a newline". The funny thing is that "whitespace that is not a newline" is written as [^\S\n]+, which means "neither a non-whitespace character nor a newline" (because \S is everything that is not \s, and we want to exclude the newline too). If you have trailing whitespace, you can trim it with s/\s+$// before applying the regex above; just don't forget to add the newline character back afterwards.
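The capture-group version, including the optional trailing-whitespace trim, sketched in Python (hypothetical function name). re.sub uses \1 for the captured range where perl uses $1.

```python
import re

def add_commas_capture(line):
    # Optional first step (the s/\s+$// trim): drop trailing whitespace.
    # rstrip() also removes the newline, so a caller writing lines back
    # out has to append "\n" again.
    line = line.rstrip()
    # \1 re-inserts the captured "digits-digits" range, followed by ", ".
    return re.sub(r"(\d+-\d+)[^\S\n]+", r"\1, ", line)

print(add_commas_capture("NP_418770.2: 257-296 344-415 503-543 556-592 642-707  \n"))
```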

String replace on a very large file

I have a giant text file that is JSON. You can see it here: http://api.mtgdb.info/cards/. I have saved this JSON to a file called cards.json.
In cards.json, I need to escape every single quote ' with a backslash \.
So I need to replace ' with \'.
Usually this is trivial in any editor; however, the file is too large. How can I escape all single quotes in this string?
What I've tried:
I tried using sed. My command was sed s/\'/\\\'/ cards.json > cards_cleaned.json. However, the cards_cleaned.json file did not have any escaped ', it was just an exact copy of cards.json. sed works when I do sed s/\'/foobar/ cards.json > cards_cleaned.json, so I'm assuming something is wrong with my escaping backslashes.
I tried using vim. I opened cards.json in vim $ vi cards.json. Then I tried a global string replace using :%s/'/\'/g. This did not change anything in the file.
While @anubhava's and @gboffi's answers work, they produce INVALID JSON.
JSON allows only a few characters after the backslash:
\"
\\
\/
\b
\f
\n
\r
\t
\u four-hex-digits
For example, from this part of the original (correct) JSON
[
{
"description" : "Whenever a land enters the battlefield, Ankh of Mishra deals 2 damage to that land's controller.",
"rarity" : "Rare",
"name" : "Ankh of Mishra"
}
]
you want to get
[
{
"description" : "Whenever a land enters the battlefield, Ankh of Mishra deals 2 damage to that land\'s controller.",
"rarity" : "Rare",
"name" : "Ankh of Mishra"
}
]
i.e. instead of land's you want land\'s
But this is INVALID JSON.
So, if you (for some strange reason) want to have the backslash, you need to double it (\\), like this:
[
{
"description" : "Whenever a land enters the battlefield, Ankh of Mishra deals 2 damage to that land\\'s controller.",
"rarity" : "Rare",
"name" : "Ankh of Mishra"
}
]
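To check the claim about which escapes JSON allows, both variants can be fed to a JSON parser. This sketch uses Python's standard json module on a short string taken from the example above; try_parse is a hypothetical helper name.

```python
import json

# Raw strings, so each backslash below is exactly what the file would contain.
single_bs = r'''"that land\'s controller"'''   # the land\'s variant
double_bs = r'''"that land\\'s controller"'''  # the land\\'s variant

def try_parse(text):
    """Return the decoded string, or None if the text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None

print(try_parse(single_bs))  # None: \' is not one of JSON's escapes
print(try_parse(double_bs))  # decodes to: that land\'s controller
```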
Solution (for both)
with perl
perl -pE "s/'/\\\'/g" < mtg_cards.json > cards.malformed.json
#changes "land's" to wrong "land\'s"
and
perl -pE "s/'/\\\\'/g" < mtg_cards.json > card_with_double_BS.json
#changes "land's" to "land\\'s"
P.S. Because your file is only one long (30MB) line, vim has some problems with it. You can pretty-print (fold and indent) the JSON before editing. There are many tools for this; I'm using the json_xs command from the JSON::XS Perl package. After pretty-printing, you can use vim safely.
You need to use double quotes in the shell so that the single quote doesn't have to be quoted, but then you have to be careful, because inside a double-quoted string the shell uses the backslash as a quoting character:
$ echo "eoieriou'iouou'oiuiouiuo"|sed "s/'/\\'/g"
eoieriou'iouou'oiuiouiuo
The script sed actually receives is s/'/\'/g, but sed's quoting character is also the backslash, so you end up substituting each single quote with a single quote...
We have to quote the backslash again so that one survives until it reaches sed, so let's try
$ echo "eoieriou'iouou'oiuiouiuo"|sed "s/'/\\\\'/g" # Four (4) backslashes in a row
eoieriou\'iouou\'oiuiouiuo
$
That's OK, isn't it? This time sed is instructed to do s/'/\\'/g, so the quoted character, from sed's point of view, is the backslash itself...
Please note that quotes, single or double, are not special characters from sed's point of view; they're special only to the shell.
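To see what sed actually receives once the shell has processed the double-quoted script, the two command lines can be split with Python's shlex module, which applies the same rule for backslashes inside double quotes in these cases (\\ becomes \, while \' keeps its backslash). A sketch; nothing is executed.

```python
import shlex

two = shlex.split(r'''sed "s/'/\\'/g"''')[1]     # typed with two backslashes
four = shlex.split(r'''sed "s/'/\\\\'/g"''')[1]  # typed with four backslashes

print(two)   # s/'/\'/g   (sed reads \' as a quoted ', so nothing changes)
print(four)  # s/'/\\'/g  (sed reads \\ as a literal backslash)
```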
In Vi you will need to escape the \ character.
Try using
:%s/'/\\'/g
For me it worked.
Test.txt
\'\'\' \'\'\'
You need to double-escape the backslash, so use:
sed -i.bak "s/'/\\\\'/g" cards.json
In vim, you can use:
:%s/'/\\\'/g
In sed,
sed "s/'/\\\'/g" filename
Here is an awk version:
cat file
hi'more data here'
awk '{gsub(g,"\\"g)}1' g="'" file
hi\'more data here\'
Or if you need double backslash:
awk '{gsub(g,"\\\\"g)}1' g="'" file
hi\\'more data here\\'
sed "s/'/\\\\&/g" cards.json > cards_cleaned.json
There is no need for your first escape (the \' in the search pattern).
You should surround the script with double quotes (single quotes would do if the single quote were not the character to change), and in this case escape the escape, because the double quotes are processed at the shell level.

sed - remove quotes within quotes in large csv files

I am using the stream editor sed to convert a large set of text files (400MB of data) into CSV format.
I have come very close to finishing, but the outstanding problem is quotes within quotes, in data like this:
1,word1,"description for word1","another text",""text contains "double quotes" some more text"
2,word2,"description for word2","another text","text may not contain double quotes, but may contain commas ,"
3,word3,"description for "word3"","another text","more text and more"
The desired output is:
1,word1,"description for word1","another text","text contains double quotes some more text"
2,word2,"description for word2","another text","text may not contain double quotes, but may contain commas ,"
3,word3,"description for word3","another text","more text and more"
I have searched around for help but am not getting close to a solution. I have tried the following sed commands with regex patterns:
sed -i 's/(?<!^\s*|,)""(?!,""|\s*$)//g' *.txt
sed -i 's/(?<=[^,])"(?=[^,])//g' *.txt
These are from the questions below, but they do not seem to work in sed:
Related question for perl
Related question for SISS
The original files are *.txt and I am trying to edit them in place with sed.
Here's one way using GNU awk and the FPAT variable:
gawk 'BEGIN { FPAT="([^,]+)|(\"[^\"]+\")"; OFS=","; N="\"" } { for (i=1;i<=NF;i++) if ($i ~ /^\".*\"$/) { gsub(/\"/,"", $i); $i=N $i N } }1' file
Results:
1,word1,"description for word1","another text","text contains double quotes some more text"
2,word2,"description for word2","another text","text may not contain double quotes, but may contain commas ,"
3,word3,"description for word3","another text","more text and more"
Explanation:
Using FPAT, a field is defined as either "one or more characters that are not commas" or "a double quote, one or more characters that are not double quotes, and a closing double quote". Then, on every line of input, the script loops through each field, and if the field starts and ends with a double quote, it removes all quotes from the field and finally adds double quotes around it again.
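The FPAT logic can be sketched in Python for testing or when gawk isn't available (function names are hypothetical). Python's re tries alternatives in order instead of choosing the longest match, so this sketch tries both alternatives at each position and keeps the longer one. That is what the third sample line needs: "description for "word3"" has to be taken whole, not cut at its second quote, as the gawk results above show. Like the + in the original FPAT, it does not produce empty fields.

```python
import re

# The two alternatives of the gawk FPAT, used as separate patterns.
UNQUOTED = re.compile(r'[^,]+')
QUOTED = re.compile(r'"[^"]+"')

def split_fields(line):
    """At each position keep the longer of the two candidate matches;
    characters that start no field (the commas here) act as separators."""
    fields, pos = [], 0
    while pos < len(line):
        matches = [m for m in (UNQUOTED.match(line, pos), QUOTED.match(line, pos)) if m]
        if not matches:
            pos += 1
            continue
        best = max(matches, key=lambda m: m.end())
        fields.append(best.group(0))
        pos = best.end()
    return fields

def clean_line(line):
    out = []
    for field in split_fields(line):
        # Same test as $i ~ /^".*"$/: starts and ends with a double quote.
        if len(field) >= 2 and field[0] == '"' and field[-1] == '"':
            field = '"' + field.replace('"', '') + '"'
        out.append(field)
    return ",".join(out)  # OFS=","

print(clean_line('3,word3,"description for "word3"","another text","more text and more"'))
```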
sed -e ':r s:["]\([^",]*\)["]\([^",]*\)["]\([^",]*\)["]:"\1\2\3":; tr' FILE
This looks for strings of the form "STR1 "STR2" STR3 " and converts them to "STR1 STR2 STR3". If a substitution was made, the t command jumps back to the :r label and tries again, so nested strings at a depth greater than 2 are removed as well.
It also makes sure that none of the STRx parts contains a comma.