sed with multi line selecting with regex

I need to process a file that was converted during a firewall migration; the conversion adds a huge comments section. I want to replace everything between the double quotes ("").
So, in this example, I want the output to say
set comments " "
Here is what I have tried:
sed 's/.*set comments* ".*"/set comment "" /' %filename% >> %outputfilename%
The problem is that some of the "comments" span multiple lines, and my command does not take that into account. The ones without a \r or \n in them work fine.
Actual File
set action accept
set comments "access-list inside_access_in extended permit udp host 10.2.55.131 host 192.168.0.65 eq snmp
This policy is disabled as not allowed by NAT-Control."
next

With GNU sed for -z and using -E to enable EREs:
$ sed -Ez 's/(set comment)s? "[^"]*"/\1 " "/g' file
set action accept
set comment " "
next
The above will fail if your comments can include double quotes, escaped or not. If that can happen then you should include it in your sample input.

This might work for you (GNU sed):
sed '/set comments "/!b;:a;/"[^\\"]*\(\\.[^\\"]*\)*"/bb;N;ba;:b;s//" "/' file
This ignores lines other than those that contain the string set comments ". It then checks to see if the line contains a closing unquoted double quote and if not accumulates lines until the condition is met. Finally it removes all characters between the starting/ending double quotes and replaces them with a single space.
P.S. I suspect that the OP did not mean to replace comments with comment; however, if that is intended, it is a trivial change to the second regexp and the RHS of the substitution command.
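The accumulate-until-closed idea described above can be written out as a multi-line script, which may be easier to follow than the one-liner. This is only a simplified sketch: it assumes comments never contain double quotes (escaped or not), and the function name blank_comments is made up for illustration.

```shell
# Hypothetical helper: blank out every `set comments "..."` value,
# even when the quoted text spans several lines.
blank_comments() {
  sed '
    # Lines without an opening comment are printed unchanged.
    /set comments "/!b
    :a
    # No closing quote yet: append the next line and test again.
    /set comments "[^"]*"/!{
      N
      ba
    }
    # Opening and closing quote are both in the pattern space now.
    s/\(set comments "\)[^"]*"/\1 "/
  ' "$@"
}

printf '%s\n' 'set action accept' \
  'set comments "access-list inside_access_in extended permit udp' \
  'This policy is disabled as not allowed by NAT-Control."' \
  'next' | blank_comments
```

Because the loop only pulls in extra lines while a comment is still open, the rest of the file is processed one line at a time rather than being read into memory at once.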

Are you required to use sed?
perl -0777pe 's/set comments "[^"]*"/set comments " "/gm' input.txt
produces
set action accept
set comments " "
next
from your sample input.
(If your comment string can include escaped quotes itself, it gets a lot harder.)

Related

Remove double quotes from the substring preceding the first dot

I previously asked the question in which I required help with removing double quotes from a string after a . (dot). I kindly received an answer however I am unsure as to how it works exactly.
I am now attempting to remove double quotes from around a string before the . (dot). I have attempted through trial and error to edit the original command, however I haven't had much luck, the closest I have come so far I have left below.
Could someone please explain how and why the first command works and if possible aid me editing my attempt to allow it to remove the double quotes from around the string on the left of the . (dot).
Original Command - removes " " from the right of the dot:
sed 's/\."\([^"]*\)"/.\1/g' file
Sample Before:
"A".HELLO
A."HELLO"
"A"."HELLO"
Required Result:
A.HELLO
A."HELLO"
A."HELLO"
Attempt:
sed -i 's/"*"\.\([^"]*\)"/.\1/g' $(2)
After:
"A".HELLO
A."HELLO"
"A.HELLO"
Link to original post: UNIX Bash - Removing double quotes from specific strings within a file
Credit to user potong for original answer.

Could you please try the following (in case you are OK with awk); written and tested with the shown samples in GNU awk.
awk 'BEGIN{FS=OFS="."} {gsub(/"/,"",$1)} 1' Input_file
Explanation: The BEGIN section sets both the field separator and the output field separator to a dot. The main program then globally replaces " with an empty string in the first field only. Since the dot is the field separator and the OP wants to remove double quotes only before the first dot, working on the first field does the trick. The trailing 1 prints the current line of Input_file.

Using a primitive loop:
$ sed -e ':L' -e 's/\([^.]*\)"\([^.]*\.\)/\1\2/' -e 'tL' file
A.HELLO
A."HELLO"
A."HELLO"
This works with all POSIX-compliant seds, not only GNU sed.
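Since the question also asks how the original command works, here is an annotated reading of it, followed by a mirrored variant for the left-hand side. The mirrored command is a sketch that assumes the quoted part starts the line, as in the samples; the function names strip_right and strip_left are made up for illustration.

```shell
# Original command (quotes to the right of the dot):
#   \."          a literal dot followed by the opening quote
#   \([^"]*\)    group 1: everything up to the next quote
#   "            the closing quote
#   .\1          write back the dot and group 1, dropping both quotes
strip_right() { sed 's/\."\([^"]*\)"/.\1/g' "$@"; }

# Mirrored sketch (quotes to the left of the dot): anchor at the start
# of the line, capture the quoted text, and require the dot right after
# the closing quote.
strip_left() { sed 's/^"\([^"]*\)"\./\1./' "$@"; }

printf '%s\n' '"A".HELLO' 'A."HELLO"' '"A"."HELLO"' | strip_left
```

The attempt in the question fails because its pattern still demands a closing quote after the text on the right of the dot, so it can only match lines where the right-hand side is quoted.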

Insert newline before/after match for TSV

I'm going grey trying to figure out how to accomplish some regex matching to insert new lines. Example input/output below...
Example TSV Data:
Name Monitoring Tags
i-RBwPyvq8wPbUhn495 enabled "some:tags:with:colons=some:value:with:colons-and-dashes/and/slashes/yay606-values-001 some:other:tag:with-colons-and-hypens=MACHINE NAME Name=NAMETAG backup=true"
i-sMEwh2MXj3q47yWWP enabled "description=RANDOM BUSINESS INT01 backup=true Name=SOMENAME"
Desired Output:
Name Monitoring Tags
i-RBwPyvq8wPbUhn495 enabled "some:tags:with:colons=some:value:with:colons-and-dashes/and/slashes/yay606-values-001
some:other:tag:with-colons-and-hypens=MACHINE NAME
Name=NAMETAG
backup=true"
i-sMEwh2MXj3q47yWWP enabled "description=RANDOM BUSINESS INT01
backup=true
Name=SOMENAME"
I can guarantee that each key=value pair within those quotes is separated by a hard/literal tab, although it may not appear that way in how the StackOverflow code block is displayed in HTML (the tabs did carry over into the code block editor). The data under the Tags column is in quotes so that, even though the pairs are tab separated, they stay within the Tags column. For whatever reason I'm not able to get the desired results.
In my measly attempts, I've basically been capturing everything between the "" as if the pairs weren't tab separated. Because of my use of wildcards, a search like [TAB].*=.*[TAB] is obviously not working: it swallows everything between the first and last occurrence on each line. I've attempted storing the pairs in capture groups without any success.
I'm looking for a unix toolset solution (sed, awk, perl and the like). Any/All help is appreciated!

This will work using any awk in any shell on any UNIX box:
$ awk 'match($0,/".*"/){str=substr($0,RSTART,RLENGTH); gsub(/\t/,"\n",str); $0=substr($0,1,RSTART-1) str substr($0,RSTART+RLENGTH)} 1' file
Name Monitoring Tags
i-RBwPyvq8wPbUhn495 enabled "some:tags:with:colons=some:value:with:colons-and-dashes/and/slashes/yay606-values-001
some:other:tag:with-colons-and-hypens=MACHINE NAME
Name=NAMETAG
backup=true"
i-sMEwh2MXj3q47yWWP enabled "description=RANDOM BUSINESS INT01
backup=true
Name=SOMENAME"
It just extracts a string between "s from the current record, replaces all tabs with newlines within that string, then puts the record back together before it's printed.
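Because the literal tabs do not survive copying from the page, it can help to generate test input with printf. The sketch below wraps the same awk program in a function (the name split_tags and the sample row are made up for illustration) and feeds it one header line and one data row.

```shell
# Hypothetical wrapper around the awk program shown above.
split_tags() {
  awk 'match($0,/".*"/){
         str = substr($0, RSTART, RLENGTH)   # the quoted Tags column
         gsub(/\t/, "\n", str)               # tabs inside it become newlines
         $0 = substr($0, 1, RSTART-1) str substr($0, RSTART+RLENGTH)
       } 1' "$@"
}

# \t in the printf format produces real tab characters.
printf 'Name\tMonitoring\tTags\ni-1\tenabled\t"a=1\tb=MACHINE NAME\tc=3"\n' | split_tags
```

The header line contains no quotes, so match() returns 0 and the line is printed untouched; tabs outside the quoted column are never altered.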

You can try this sed (GNU sed) 4.4
sed -E ':A;s/(".*)\t(.*")/\1\n\2/;tA' TSV_Data_File
With OSX sed, you can try this one.
I think the \t is ok.
sed -E '
:A
s/(".*)\t(.*")/\1\
\2/
tA
' TSV_Data_File
Brief explanation:
Capture the text inside the double quotes.
Substitute the last \t with \n.
If a substitution occurred, jump to A; otherwise continue.
With awk :
awk -v RS='"' 'NR%2==0{gsub("\t","\n")}1' ORS='"' TSV_Data_File

This is basically ctac_'s sed answer converted to perl:
perl -pe'1 while s/(".*)\t(.*")/$1\n$2/s' file.tsv
Where the \t might be replaced by \t\s* if you want just one newline out of each tab-and-then-some.

This might work for you (GNU sed):
sed 's/\S\+=\S\+/\n&/2g' file
This inserts a newline before the second and every subsequent run of non-space characters containing an =.

BASH escaping double quotes within single quotes

I'm trying to write a bash function that would escape all double quotes within single quotes, eg:
'I need to escape "these" quotes with backslashes'
would become
'I need to escape \"these\" quotes with backslashes'
My take on it was:
Find pairs of single quotes in the input and extract them with grep
Pipe into sed, escape double quotes
Sed again the whole input and replace grep match with sedded match
I managed to get it working to the part of having correctly escaped quotes section, but replacing it in the whole input fails.
The script code copypaste:
# $1 - Full name, $2 - minified name
adjust_quotes ()
{
SINGLE_QUOTES=`grep -Eo "'.*'" $2`
ESCAPED_QUOTES=`echo $SINGLE_QUOTES | sed 's|"|\\\\"|g'`
sed -r "s|'.*'|$ESCAPED_QUOTES|g" "$2" > "$2.escaped"
mv "$2.escaped" $2
echo "Quotes escaped within single quotes on $2"
}
Random additional questions:
In the console, escaping the quote with only two backslashes works, but when the code is put in the script I need four. I'd love to know why.
Could I modify this code into a loop to escape all pairs of single quotes, one after another until EOF?
Thanks!
P.S. I know this would probably be easier to do in eg. python, but I really need to keep it in bash.

Using BASH string replacement:
s='I need to escape "these" quotes with backslashes'
r="${s//\"/\\\"}"
echo "$r"
I need to escape \"these\" quotes with backslashes

Here's a pure bash solution, which does the transformation on stdin, printing to stdout. It reads the entire input into memory, so it won't work with really enormous files.
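escape_enclosed_quotes() (
  IFS=\'
  # Read all of stdin into an array, splitting on single quotes.
  read -d '' -r -a fields
  # Odd indices hold the text that was inside single quotes.
  for ((i=1; i<${#fields[@]}; i+=2)); do
    fields[i]=${fields[i]//\"/\\\"}
  done
  printf %s "${fields[*]}"
)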
I deliberately enclosed the body of the function in parentheses rather than braces, in order to force the body to run in a subshell. That limits the modification of IFS to the body, as well as implicitly making the variables used local.
The function uses the read builtin to read the entire input (since the line delimiter is set to NUL with -d '') into an array (-a) using a single quote as the field separator (IFS=\'). The result is that the parts of the input surrounded with single quotes are in the odd positions of the array, so the function loops over the odd indices to do the substitution only for those fields. I use bash's find-and-replace syntax instead of deferring to an external utility like sed.
This being bash, there are a couple of gotchas:
If the file contains a NUL, the rest of the file will be ignored.
If the last line of the file does not end with a newline, and the last character of that line is a single quote, it will not be output.
Both of the above conditions are impossible in a portable text file, so it's probably OK. All the same, worth taking note.
The supplementary question: why are the extra backslashes needed in
ESCAPED_QUOTES=`echo $SINGLE_QUOTES | sed 's|"|\\\\"|g'`
Answer: It has nothing to do with that line being in a script. It has to do with your use of backticks (`...`) for command substitution, and the idiosyncratic and often unpredictable handling of backslashes inside backticks. This syntax is deprecated. Do not use it. (Not even if you see someone else using it in some random example on the internet.) If you had used the recommended $(...) syntax for command substitution, it would have worked as expected:
ESCAPED_QUOTES=$(echo $SINGLE_QUOTES | sed 's|"|\\"|g')
(More information is in the Bash FAQ linked above.)
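The difference can be checked directly. In this minimal sketch (the variable names s, old and new are made up), the backtick form needs four backslashes because the shell halves them before sed ever runs, while the $(...) form passes the text through unchanged.

```shell
s='say "hi"'

# Backticks: every \\ collapses to \ first, so sed receives s|"|\\"|g.
old=`printf '%s\n' "$s" | sed 's|"|\\\\"|g'`

# $(...): no extra backslash processing, so two backslashes are enough.
new=$(printf '%s\n' "$s" | sed 's|"|\\"|g')

printf '%s\n' "$old" "$new"
```

With only two backslashes inside backticks, sed would receive \" as the replacement, which it treats as a plain double quote, so the input would come out unchanged.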

Sed with both " and ' in insert string

I am using sed command in Ubuntu for making shell script.
I have a problem because the string I am inserting contains both single and double quotes, and dashes as well. This is the example:
sed -i "16i$('#myTable td:contains("Trunk do SW-BG-26,
GigabitEthernet0/22")').parents("tr").remove();" proba.txt
It should insert
$('#myTable td:contains("Trunk do SW-BG-26, GigabitEthernet0/22")').parents("tr").remove();
in line 16 of the file proba.txt
but instead it inserts
$('#myTable td:contains(
because it exits prematurely. How can I resolve this? I cannot find a solution here on the site, because I have both kinds of quotation marks and the explanations only cover one kind.
2nd try
I set \ in front every double quote except the outermost ones but I still didn't get what I want. Result is:
.parents("tr").remove();
Then I put \ in front of every ' too but the result was an error in script. This is the 4th row:
sed -i "16i$(\'#myTable td:contains(\"QinQ tunnel - SCnet wireless\")\').parents(\"tr\").remove();" proba.txt
This is the error:
4: skripta.sh: Syntax error: "(" unexpected (expecting ")")
Maybe there is easier way to insert line into the file at the exact line if that line has ", ', /?
3rd time is a charm
Inserting many lines last day I came across another problem using sed. I want to insert this text:
$(document).ready( function() {
with command:
sed -i "16i$(document).ready( function() {" proba.txt
and I get as result this text inserted as document is something special or because of the $:
.ready( function() {
Any thoughts about that?

There are two ways around this. The easy way out is to put the script into a file and use that on the command line. For example, sed.script contains:
16i\
$('#myTable td:contains("Trunk do SW-BG-26, GigabitEthernet0/22")').parents("tr").remove();
and you run:
sed -f sed.script ...
If you want to do it without the file, then you have to decide whether to use single quotes or double quotes around your sed -e expression. Using single quotes is usually easier; there are no other special characters to worry about. Each embedded single quote is replaced by '\'':
sed -e '16i\
$('\''#myTable td:contains("Trunk do SW-BG-26, GigabitEthernet0/22")'\'').parents("tr").remove();' ...
If you want to use double quotes, then each embedded double quote needs to be replaced by \", but you also have to escape embedded back quotes `, dollar signs $ and backslashes \:
sed -e "16i\\
\$('#myTable td:contains(\"Trunk do SW-BG-26, GigabitEthernet0/22\")').parents(\"tr\").remove();" ...
(To the point: I forgot to escape the $ before I checked the script with double quotes; I got the script with single quotes right first time.)
Because of all the extra checking, I almost invariably use single quotes, unless I need to get shell variables substituted into the script.
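One way to avoid surprises is to build the line in a variable and print it before handing anything to sed. The sketch below (the variable names js_single and js_double are made up) applies both quoting styles from this answer to the same text and confirms they produce identical strings.

```shell
# Single quotes: only the embedded single quotes need work ('\'' each).
js_single='$('\''#myTable td:contains("Trunk do SW-BG-26, GigabitEthernet0/22")'\'').parents("tr").remove();'

# Double quotes: the $ and every embedded " must be escaped instead.
js_double="\$('#myTable td:contains(\"Trunk do SW-BG-26, GigabitEthernet0/22\")').parents(\"tr\").remove();"

# Inspect what the shell actually produced before touching any file.
printf '%s\n' "$js_single"
[ "$js_single" = "$js_double" ] && echo "both quoting styles agree"
```

The same inspection explains the $(document) case from the question: inside double quotes an unescaped $( starts a command substitution, so the shell tries to run a command named document and substitutes its (empty) output.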

sed -i "6 i\\
\$('#myTable td:contains(\"Trunk do SW-BG-26, GigabitEthernet0/22\")').parents(\"tr\").remove();" proba.txt
Escape each double quote, the backslash (which, together with a newline, is required after the i instruction) and the $, because the shell interprets all of them inside double quotes.

Substitute words not in double quotes

$cat file0
"basic/strong/bold"
" /""?basic""/strong/bold"
"^/))basic"
basic
I want a Unix sed command that changes only the basic that is not in quotes (change basic to ring).
Expected output:
$cat file0
"basic/strong/bold"
" /""?basic""/strong/bold"
"^/))basic"
ring

If we disallow escaping quotes, then any basic that is not within " is preceded by an even number of ". So this should do the trick:
sed -r 's/^([^"]*("[^"]*){2}*)basic/\1ring/' file
And as ДМИТРИЙ МАЛИКОВ mentioned, adding the --in-place option will immediately edit the file, instead of returning the new contents.
How does this work?
We anchor the regular expression to the beginning of each line with ^. Then we allow an arbitrary number of non-" characters (with [^"]*). Then we start a new subpattern "[^"]* that consists of one " and arbitrarily many non-" characters. We repeat that an even number of times (with {2}*). And then we match basic. Because we matched all of that stuff in the line before basic, we would replace that as well. That's why this part is wrapped in another pair of parentheses, capturing the start of the line and writing it back in the replacement with \1, followed by ring.
One caveat: if you have multiple basic occurrences in one line, this will only replace the last one that is not enclosed in double quotes, because regex matches cannot overlap. A solution would be a lookbehind, but it would have to be a variable-length lookbehind, which very few regex engines (.NET being one) support, and sed is not among them. So if that is the case in your actual input, run the command multiple times until all occurrences are replaced.
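Instead of running the command several times by hand, sed can repeat the substitution itself with a label and the t command. The sketch below is a variant rather than the command above: it spells the "even number of quotes" prefix as whole "..." pairs, assumes quotes are balanced and never escaped, and uses a made-up function name, replace_unquoted.

```shell
# Hypothetical helper: replace every unquoted `basic` with `ring`.
replace_unquoted() {
  sed -E \
    -e ':again' \
    -e 's/^([^"]*("[^"]*"[^"]*)*)basic/\1ring/' \
    -e 't again' "$@"
}

printf '%s\n' '"basic/strong/bold"' 'basic "basic" basic' | replace_unquoted
```

Each pass replaces the last unquoted occurrence on the line, and t jumps back to the label for as long as a substitution succeeded, so the loop stops once no unquoted basic is left.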

$> sed -r 's/^([^\"]*)(basic)([^\"]*)$/\1ring\3/' file0
"basic/strong/bold"
" /""?basic""/strong/bold"
"^/))basic"
ring
If you wanna edit file in place use --in-place option.

This might work for you (GNU sed):
sed -r 's/^/\n/;ta;:a;s/\n$//;t;s/\n("[^"]*")/\1\n/;ta;s/\nbasic/ring\n/;ta;s/\n([^"]*)/\1\n/;ta' file

Not a sed solution, but it substitutes words not in quotes
Assuming that there are no escaped quotes in the strings (e.g. "This is a trap \" hehe"), awk can solve this problem:
awk -F\" 'BEGIN {OFS=FS}
{
for(i=1; i<=NF; i++){
if(i%2)
gsub(/basic/,"ring",$i)
}
print
}' inputFile
Basically the words that are not in quotes are in odd-numbered fields, and the word "basic" is replaced by "ring" in these fields.
This can be written as a one-liner, but for clarity's sake I've written it in multiple lines.

If basic is at the beginning of line:
sed -e 's/^basic/ring/' file0
