Remove double quotes from the substring preceding the first dot

I previously asked a question about removing double quotes from a string after a . (dot). I kindly received an answer, but I am unsure exactly how it works.
I am now trying to remove the double quotes from around a string before the . (dot). I have tried editing the original command through trial and error without much luck; my closest attempt so far is below.
Could someone please explain how and why the first command works and, if possible, help me edit my attempt so that it removes the double quotes from around the string to the left of the . (dot)?
Original Command - removes " " from the right of the dot:
sed 's/\."\([^"]*\)"/.\1/g' file
Sample Before:
"A".HELLO
A."HELLO"
"A"."HELLO"
Required Result:
A.HELLO
A."HELLO"
A."HELLO"
Attempt:
sed -i 's/"*"\.\([^"]*\)"/.\1/g' $(2)
After:
"A".HELLO
A."HELLO"
"A.HELLO"
Link to original post: UNIX Bash - Removing double quotes from specific strings within a file
Credit to user potong for original answer.

If you are OK with awk, please try the following, written and tested with the shown samples in GNU awk.
awk 'BEGIN{FS=OFS="."} {gsub(/"/,"",$1)} 1' Input_file
Explanation: The BEGIN section sets both the field separator and the output field separator to . (dot). The main block then globally substitutes every " in the first field with an empty string; because . is the field separator, the first field is exactly the text before the first dot, which is where the OP wants the quotes removed. The trailing 1 prints the current line of Input_file.

Using a primitive loop:
$ sed -e ':L' -e 's/^\([^.]*\)"\([^.]*\.\)/\1\2/' -e 'tL' file
A.HELLO
A."HELLO"
A."HELLO"
This works with all POSIX-compliant seds, not only GNU sed.
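On the "how and why" part of the question, here is one reading of the original command. In s/\."\([^"]*\)"/.\1/g, the \." matches a literal dot followed by an opening quote, \([^"]*\) captures everything up to the next quote, the final " matches the closing quote, and the replacement .\1 writes back the dot and the captured text without the quotes. The sketch below mirrors that for the left side. The function name strip_left_quotes is made up for illustration, and it assumes the quoted string starts the line and contains no double quotes of its own.

```shell
# Hypothetical helper: remove the quotes around a quoted string that
# starts the line and is immediately followed by a dot.
strip_left_quotes() {
  # ^"          opening quote at the start of the line
  # \([^"]*\)   capture everything up to the next quote
  # "\.         closing quote followed by a literal dot
  sed 's/^"\([^"]*\)"\./\1./' "$@"
}

# Usage with the sample from the question:
printf '%s\n' '"A".HELLO' 'A."HELLO"' '"A"."HELLO"' | strip_left_quotes
```

Unlike the loop, this makes a single substitution per line, so it leaves stray quotes that are not a matched pair at the start of the line untouched.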

Related

sed with multi line selecting with regex

I need to clean up a file that was converted (firewall migration); the conversion adds huge comments sections. I want to replace everything between the double quotes ("").
So, in this example, I want to have the output say
set comments " "
Here is what I have tried:
sed 's/.*set comments* ".*"/set comment "" /' %filename% >> %outputfilename%
The problem is that some of the comments span multiple lines, and my command does not take that into account. The ones without a \r or \n in them work fine.
Actual File
set action accept
set comments "access-list inside_access_in extended permit udp host 10.2.55.131 host 192.168.0.65 eq snmp
This policy is disabled as not allowed by NAT-Control."
next

With GNU sed for -z and using -E to enable EREs:
$ sed -Ez 's/(set comment)s? "[^"]*"/\1 " "/g' file
set action accept
set comment " "
next
The above will fail if your comments can include double quotes, escaped or not. If that can happen then you should include it in your sample input.

This might work for you (GNU sed):
sed '/set comments "/!b;:a;/"[^\\"]*\(\\.[^\\"]*\)*"/bb;N;ba;:b;s//" "/' file
This ignores lines other than those that contain the string set comments ". It then checks to see if the line contains a closing unquoted double quote and if not accumulates lines until the condition is met. Finally it removes all characters between the starting/ending double quotes and replaces them with a single space.
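The same accumulate-until-closed idea may be easier to follow as a commented multi-line script. This is only a sketch: the function name blank_comments is made up, and it is deliberately simplified to ignore backslash-escaped quotes, which the one-liner above does handle.

```shell
# Hypothetical helper: blank out (possibly multi-line) comments strings.
blank_comments() {
  sed '
    # Leave lines without an opening comment untouched.
    /set comments "/!b
    :a
    # Two double quotes in the pattern space: the comment is closed.
    /"[^"]*"/bb
    # Otherwise append the next input line and check again.
    N
    ba
    :b
    # Replace everything between the quotes with a single space.
    s/"[^"]*"/" "/
  ' "$@"
}

# Usage: blank_comments file
```

If a comment is never closed before the end of the file, what happens at the failing N differs between sed implementations, so treat that case as unhandled.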
P.S. I suspect that the OP did not mean to replace comments with comment; however, it is a trivial change to the second regexp and the RHS of the substitution command if that is intended.

Are you required to use sed?
perl -0777pe 's/set comments "[^"]*"/set comments " "/gm' input.txt
produces
set action accept
set comments " "
next
from your sample input.
(If your comment string can include escaped quotes itself, it gets a lot harder.)

sed backreferences returning their numerical index rather than their value

Weird problem here that I don't seem to see repeated anywhere else, so posting here. Thanks in advance.
I have the following multiline sed code that is printing further sed and copy commands into a script (yep, using a script to insert code into a script). The code looks like this:
sed -i -r '/(rpub )([$][a-zA-Z0-9])/i\
sed -i '\''/#PBS -N/d'\'' \1\
cp \1 '"$filevariable"'' $masterscript
which is supposed to do the following:
1.) Open the master script
2.) Navigate to each instance of rpub $[a-zA-Z0-9] in the script
3.) Insert the second line (sed) and third line (cp) as lines before the rpub instance, using \1 as a backreference of the matched $[a-zA-Z0-9] from step 2.
This works great; all lines print well enough in relation to each other. However, all of my \1 references are appearing explicitly, minus their backslashes. So all of my \1's are appearing as 1.
I know my pattern match specifications are working correctly, as they nail all instances of rpub $[a-zA-Z0-9] well enough, but I guess I'm just not understanding the use of backreferences. Anyone see what is going on here?
Thanks.
EDIT 1
Special thanks to Ed Morton below, implemented the following, which gets me 99% closer, but I still can't close the gap with unexpected behavior:
awk -v fv="$filevariable" '
match($0, /rpub( [$][[:alnum:]])/, a)
{
print "sed -i '\''/#PBS -N/d'\''", a[1]
}
1' "$masterscript" > tmpfile && mv tmpfile "$masterscript"
Note: I removed one of the multiline print statements, as it isn't important here. But, as I said, though this gets me much closer I am still having an issue where the printed lines appear between every line in the masterscript; it is as if the matching function is considering every line to be a match. This is my fault, as I should probably have specified that I'd like the following to occur:
stuff here
stuff here
rpub $name
stuff here
rpub $othername
stuff here
would become:
stuff here
stuff here
inserted line $name
rpub $name
stuff here
inserted line $othername
rpub $othername
Any help would be greatly appreciated. Thanks!

It LOOKS like what you're trying to do could be written simply in awk as:
awk -i inplace -v fv="$filevariable" '
match($0,/rpub ([$][[:alnum:]])/,a) {
print "sed -i \"/#PBS -N/d\"", a[1]
print "cp", a[1], fv
}
1' "$masterscript"
but without sample input and expected output it's just a guess.
The above uses GNU awk for inplace editing and the 3rd arg for match().
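If GNU awk is not available, roughly the same thing can be sketched with only POSIX awk features (RSTART and RLENGTH instead of the third argument to match()). The function name insert_before_rpub and the text "inserted line" are placeholders taken from the expected output in the question. Note where the opening brace sits: if it is moved to a line of its own, as in the question's edit, awk treats the block as an unconditional action and runs it for every line.

```shell
# Hypothetical helper: print an extra line before every "rpub $name" line.
insert_before_rpub() {
  awk '
    # The opening brace is on the same line as the pattern, so the
    # action only runs for lines that match.
    match($0, /rpub [$][A-Za-z0-9_]+/) {
      # Skip the 5 characters of "rpub " to keep just the $name part.
      name = substr($0, RSTART + 5, RLENGTH - 5)
      print "inserted line " name
    }
    1   # print every input line unchanged
  ' "$@"
}

# Usage: insert_before_rpub masterscript > tmpfile && mv tmpfile masterscript
```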

For a backreference to work, the part of the regular expression it refers to must be enclosed in parentheses. Also, your second line is a second invocation of sed, so nothing is saved from the first line.
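A related point: the i command inserts its text literally, so \1 is never expanded there; backreferences are only expanded in the replacement part of an s command. A sketch of getting the same effect with s follows. The function name add_cp_line and the target name backup are placeholders, not anything from the question.

```shell
# Hypothetical helper: put a "cp $name backup" line before each rpub line.
add_cp_line() {
  # \1 is the captured $name and & is the whole match. The backslash at
  # the end of the first line puts a newline into the replacement.
  sed 's/^rpub \([$][A-Za-z0-9_]*\)/cp \1 backup\
&/' "$@"
}

# Usage with one line of sample input:
printf '%s\n' 'rpub $name' | add_cp_line
```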

Sed with both " and ' in insert string

I am using sed command in Ubuntu for making shell script.
I have a problem because the string I am inserting has both single and double quotes, and dashes as well. This is the example:
sed -i "16i$('#myTable td:contains("Trunk do SW-BG-26,
GigabitEthernet0/22")').parents("tr").remove();" proba.txt
It should insert
$('#myTable td:contains("Trunk do SW-BG-26, GigabitEthernet0/22")').parents("tr").remove();
in line 16 of the file proba.txt
but instead it inserts
$('#myTable td:contains(
because it exits prematurely. How can I resolve this? I cannot find a solution here on the site because I have both kinds of quotation marks and the explanations only cover one kind.
2nd try
I put \ in front of every double quote except the outermost ones, but I still didn't get what I want. The result is:
.parents("tr").remove();
Then I put \ in front of every ' too but the result was an error in script. This is the 4th row:
sed -i "16i$(\'#myTable td:contains(\"QinQ tunnel - SCnet wireless\")\').parents(\"tr\").remove();" proba.txt
This is the error:
4: skripta.sh: Syntax error: "(" unexpected (expecting ")")
Maybe there is an easier way to insert a line into the file at an exact line number if that line contains ", ', /?
3rd time is a charm
While inserting many lines yesterday I came across another problem using sed. I want to insert this text:
$(document).ready( function() {
with command:
sed -i "16i$(document).ready( function() {" proba.txt
and the result is this inserted text, as if document were something special, or perhaps because of the $:
.ready( function() {
Any thoughts about that?

There are two ways around this. The easy way out is to put the script into a file and use that on the command line. For example, sed.script contains:
16i\
$('#myTable td:contains("Trunk do SW-BG-26, GigabitEthernet0/22")').parents("tr").remove();
and you run:
sed -f sed.script ...
If you want to do it without the file, then you have to decide whether to use single quotes or double quotes around your sed -e expression. Using single quotes is usually easier; there are no other special characters to worry about. Each embedded single quote is replaced by '\'':
sed -e '16i\
$('\''#myTable td:contains("Trunk do SW-BG-26, GigabitEthernet0/22")'\'').parents("tr").remove();' ...
If you want to use double quotes, then each embedded double quote needs to be replaced by \", but you also have to escape embedded back quotes `, dollar signs $ and backslashes \:
sed -e "16i\\
\$('#myTable td:contains(\"Trunk do SW-BG-26, GigabitEthernet0/22\")').parents(\"tr\").remove();" ...
(To the point: I forgot to escape the $ before I checked the script with double quotes; I got the script with single quotes right first time.)
Because of all the extra checking, I almost invariably use single quotes, unless I need to get shell variables substituted into the script.
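A variation on the script-file approach is to write the sed script through a quoted here-document, so the shell performs no expansion at all and nothing needs escaping. That also covers the $(document) case from the question, where the shell runs $(document) as a command substitution inside double quotes. This is a sketch only: the function name insert_ready_line is made up, and it inserts before line 2 rather than 16 to keep the demonstration short.

```shell
# Hypothetical helper: insert a jQuery line before line 2 of the input.
insert_ready_line() {
  script=$(mktemp) || return 1
  # The quoted delimiter 'EOF' stops the shell from expanding $( ), " or '.
  cat > "$script" <<'EOF'
2i\
$(document).ready( function() {
EOF
  sed -f "$script" "$@"
  rm -f "$script"
}

# Usage: insert_ready_line proba.txt > proba.new
```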

sed -i "16 i\\
\$('#myTable td:contains(\"Trunk do SW-BG-26, GigabitEthernet0/22\")').parents(\"tr\").remove();" proba.txt
Escape the double quotes, supply the backslash and newline required after the i command, and escape the $, because the shell interprets all of these inside double quotes.

Regular Expression to parse Common Name from Distinguished Name

I am attempting to parse (with sed) just First Last from the following DN(s) returned by the DSCL command in OSX terminal bash environment...
CN=First Last,OU=PCS,OU=guests,DC=domain,DC=edu
I have tried multiple regexs from this site and others with questions very close to what I wanted... mainly this question... I have tried following the advice to the best of my ability (I don't necessarily consider myself a newbie...but definitely a newbie to regex..)
DSCL returns a list of DNs, and I would like to only have First Last printed to a text file. I have attempted using sed, but I can't seem to get the correct function. I am open to other commands to parse the output. Every line begins with CN= and then there is a comma between Last and OU=.
Thank you very much for your help!

I think all of the regular expression answers provided so far are buggy, insofar as they do not properly handle quoted ',' characters in the common name. For example, consider a distinguishedName like:
CN=Doe\, John,CN=Users,DC=example,DC=local
Better to use a real library able to parse the components of a distinguishedName. If you're looking for something quick on the command line, try piping your DN to a command like this:
echo "CN=Doe\, John,CN=Users,DC=activedir,DC=local" | python -c 'import ldap; import sys; print(ldap.dn.explode_dn(sys.stdin.read().strip(), notypes=1)[0])'
(depends on having the python-ldap library installed). You could cook up something similar with PHP's built-in ldap_explode_dn() function.

Two cut commands is probably the simplest (although not necessarily the best):
DSCL | cut -d, -f1 | cut -d= -f2
First, split the output from DSCL on commas and print the first field ("CN=First Last"); then split that on equal signs and print the second field.

Using sed:
sed 's/^CN=\([^,]*\).*/\1/' input_file
^ matches start of line
CN= literal string match
\([^,]*\) everything until a comma
.* rest
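The expression above stops at the first comma, so it would cut a name such as Doe\, John short. If a library is not an option, a sketch that tolerates backslash-escaped commas could look like this. The function name extract_cn is made up, it uses sed -E for extended regular expressions, and it does not attempt to handle quoted values or any other DN escaping rules.

```shell
# Hypothetical helper: print the leading CN of each DN read from input.
extract_cn() {
  # (\\.|[^,\\])*  repeats either "a backslash plus any character" or
  #                "any character that is neither a comma nor a backslash"
  sed -E -n 's/^CN=((\\.|[^,\\])*).*/\1/p' "$@"
}

# Usage with two sample DNs:
printf '%s\n' 'CN=First Last,OU=PCS,OU=guests,DC=domain,DC=edu' \
              'CN=Doe\, John,CN=Users,DC=example,DC=local' | extract_cn
```

The escaped comma is kept as written (Doe\, John); removing the backslash would need a further substitution.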

http://www.gnu.org/software/gawk/manual/gawk.html#Field-Separators
awk -v RS='[,\n]' -v FS='=' '$1=="CN"{print $2}' foo.txt

I like awk too, so I print the substring from the fourth char:
DSCL | awk -F, '{print substr($1,4)}' > filterednames.txt

This regex will parse a distinguished name, giving name and val as capture groups for each match.
When DN strings contain commas, they are meant to be quoted - this regex correctly handles both quoted and unquoted strings, and also handles escaped quotes in quoted strings:
(?:^|,\s?)(?:(?<name>[A-Z]+)=(?<val>"(?:[^"]|"")+"|[^,]+))+
Here it is nicely formatted:
(?:^|,\s?)
(?:
(?<name>[A-Z]+)=
(?<val>"(?:[^"]|"")+"|[^,]+)
)+
Here's a link so you can see it in action:
https://regex101.com/r/zfZX3f/2
If you want a regex to get only the CN, then this adapted version will do it:
(?:^|,\s?)(?:CN=(?<val>"(?:[^"]|"")+"|[^,]+))

replacing doublequotes in csv

I have the following problem and couldn't find a solution. This could be my CSV file structure:
1223;"B630521 ("L" fixed bracket)";"2" width";"length: 5"";2;alternate A
1224;"B630522 ("L" fixed bracket)";"3" width";"length: 6"";2;alternate B
As you can see there are some " written for inch and "L" in the enclosing ".
Now I'm looking for a UNIX shell script to replace the " (inch) and "L" double quotes with 2 single quotes, like the following example:
sed "s/$OLD/$NEW/g" $QFILE > $TFILE && mv $TFILE $QFILE
Can anyone help me?

Update (using perl this is easy, since you get full lookahead and lookbehind support)
perl -pe 's/(?<!^)(?<!;)"(?!(;|$))/'"'"'/g' file
Output
1223;"B630521 ('L' fixed bracket)";"2' width";"length: 5'";2;alternate A
1224;"B630522 ('L' fixed bracket)";"3' width";"length: 6'";2;alternate B
Using sed, grep only
Just by using grep, sed (and not perl, php, python etc) a not so elegant solution can be:
grep -o '[^;]*' file | sed 's/"/`/; s/"$/`/; s/"/'"'"'/g; s/`/"/g'
Output - for your input file it gives:
1223
"B630521 ('L' fixed bracket)"
"2' width"
"length: 5'"
2
alternate A
1224
"B630522 ('L' fixed bracket)"
"3' width"
"length: 6'"
2
alternate B
grep -o is basically splitting the input by ;
sed first replaces " at start of line by `
then it replaces " at end of line by another `
it then replaces all remaining double quotes " by a single quote '
finally it puts back all " at the start and end
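The grep/sed pipeline above prints one field per line. If each record should stay on one line, the same idea can be sketched in awk: split on ;, and in every field that starts and ends with a double quote, turn the inner double quotes into single quotes. The function name fix_inner_quotes is made up, and the sketch assumes that no field contains a ; of its own.

```shell
# Hypothetical helper: replace double quotes inside quoted CSV fields.
fix_inner_quotes() {
  awk 'BEGIN { FS = OFS = ";" }
  {
    for (i = 1; i <= NF; i++) {
      n = length($i)
      # Only touch fields wrapped in a pair of enclosing double quotes.
      if (n >= 2 && substr($i, 1, 1) == "\"" && substr($i, n, 1) == "\"") {
        inner = substr($i, 2, n - 2)
        gsub(/"/, "\047", inner)   # \047 is the single quote character
        $i = "\"" inner "\""
      }
    }
    print
  }' "$@"
}

# Usage: fix_inner_quotes "$QFILE" > "$TFILE" && mv "$TFILE" "$QFILE"
```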

Maybe this is what you want:
sed "s/\([0-9]\)\"\([^;]\)/\1''\2/g"
I.e.: Find double quotes (") following a number ([0-9]) but not followed by a semicolon ([^;]) and replace it with two single quotes.
Edit:
I can extend my command (it's becoming quite long now):
sed "s/\([0-9]\)\"\([^;]\)/\1''\2/g;s/\([^;]\)\"\([^;]\)/\1\'\2/g;s/\([^;]\)\"\([^;]\)/\1\'\2/g"
As you are using SunOS I guess you cannot use extended regular expressions (sed -r)? Therefore I did it that way: The first s command replaces all inch " with '', the second and the third s are the same. They substitute all " that are not a direct neighbor of a ; with a single '. I have to do it twice to be able to substitute the second " of e.g. "L" because there's only one character between both " and this character is already matched by \([^;]\). This way you would also substitute "" with ''. If you have """ or """" etc. you have to put one more (but only one more) s.

For the "L" try this:
sed "s/\"L\"/'L'/g"
For inches you can try:
sed "s/\([0-9]\)\"\"/\1''\"/g"
I am not sure it is the best option, but I have tried and it works. I hope this is helpful.
