sed match dollar and single quote characters


I have the following string in my file:
"sequence A$_{0}$B$_{}$C$_{'0}$"
I want to move any single quotes that appear after a $_{ to go before it, i.e.
"sequence A$_{0}$B$_{}$C'$_{0}$"
This is my sed command (using # as a delimiter) for just the part with the quote:
$ echo "$_{'0}$" | sed "s#$_{'#'\$_{#g"
'$_{0}$
So this works. However my text contains strings that shouldn't be matched, e.g.
$ echo "$_{0}$" | sed "s#$_{'#'\$_{#g"
/home/ppatest/texlive/2010/texmf{0}$
I understand that $_ gives the last argument of previous command. I checked:
$ echo $_
/home/ppatest/texlive/2010/texmf
But I don't understand why $_{' matches "$_{0}$"
Furthermore, I found that to prevent the Unix shell from interpreting the dollar sign as a shell variable, the script should be put in single quotes. But I can't do that as I am also matching on single quotes.

Your current approach wraps the sed script in double quotes so that it can contain single quotes. However, as you can see, the shell then expands $_ inside the double quotes, which leads to broader problems.
What I recommend is to use a sed expression with single quotes. To match and replace a single quote, you need to close the leading ', then enclose the ' within " and then open the expression again:
$ echo "he'llo" | sed 's#'"'"'#X#'
heXllo
In your case:
sed 's#$_{'"'"'#'"'"'$_{#g' file
This way, you keep using single quotes and prevent the expansion of $.
Test
$ cat a
hello $_{'0}$ bye
$_{'0}$
yeah
$ sed 's#$_{'"'"'#'"'"'$_{#g' a
hello '$_{0}$ bye
'$_{0}$
yeah
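Another way to sidestep the quoting, sketched here under the assumption of a POSIX shell, is to read the sed script from a quoted here-document, where neither $ nor ' needs escaping. The names script and move_quote are hypothetical and used only for illustration.

```shell
# Fill a variable from a quoted here-document: the shell performs no
# expansion inside it, so $ and ' are stored literally.
script=$(cat <<'EOF'
s#$_{'#'$_{#g
EOF
)

# move_quote is a hypothetical helper name used only for this sketch.
move_quote() {
  sed "$script"
}

# First line contains $_{' and is rewritten; second line has no quote
# and passes through unchanged.
printf '%s\n' "hello \$_{'0}\$ bye" '$_{0}$' | move_quote
```

The trade-off is one extra variable in exchange for a sed script that reads exactly as sed sees it.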

echo "\$_{'0}\$" | sed "s#\(\$_{\)'#'\1#g"
Escape the $ when using double quotes.
Use a capture group to avoid repeating several confusing \$ where possible.
Use double quotes when single quotes are part of the pattern.
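To check what sed actually receives from a double-quoted script, it can help to print the script first. This is a small sketch; the variable name script is only illustrative.

```shell
# Inside double quotes the shell turns \$ into a literal $,
# while \( \) and \1 are passed through to sed untouched.
script="s#\(\$_{\)'#'\1#g"
printf '%s\n' "$script"

# Inputs are single-quoted or escaped so the shell leaves $_ alone.
# The line without a quote after $_{ is left unchanged.
printf '%s\n' '$_{0}$' "C\$_{'0}\$" | sed "$script"
```

Printing the script this way shows quickly whether a stray expansion such as $_ has slipped into the pattern.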

Related

add a command in the beginning of quotation mark by using bash

I would like to use sed to do the following steps:
before: test="testabc"
after: test="quiz testabc"
How can I add quiz followed by a space in the beginning of the quotation mark?
Thank you for your help.

You can just use the following sed command
$ echo 'test="testabc"' | sed 's/="\([^"]*\)"/="quiz \1"/'
test="quiz testabc"
Explanations:
s/pattern/replacement/ use sed in search/replace mode.
="\([^"]*\)" this regex will fetch ="some string".
then you can use a capturing group ( ) and back reference \1 to it in order to keep the string content and add your quiz
s/pattern/replacement/g use the global replacement mode if you need to search and replace more than one occurrence of this pattern
or the following perl solution works as well:
$ echo 'test="testabc"' | perl -pe 's/(?<==")([^"]*)(?=")/quiz $1/'
test="quiz testabc"
For regex details: http://www.rexegg.com/regex-quickstart.html
Improvements:
sed 's/test="\([^"]*\)"/test="quiz \1"/' add the variable name to be sure to change only that variable.
sed 's/="/&quiz /g' or if you don't care about the variable names and want to change every assignment.
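As a quick check of the variable-specific form, the sketch below runs it on two lines. The helper name add_quiz, the second input line, and the ^ anchor are assumptions added for illustration; the anchor restricts the match to assignments that start a line.

```shell
# add_quiz is a hypothetical helper; ^ is an added assumption that the
# assignment starts at the beginning of the line.
add_quiz() {
  sed 's/^test="\([^"]*\)"/test="quiz \1"/'
}

# Only the test= line is changed; other assignments pass through.
printf '%s\n' 'test="testabc"' 'other="keep"' | add_quiz
```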

Regex for string with double quotes and environment variable

Using sed, I need to replace a string that contains double quotes with an environment variable:
BUCKET_FOLDER=\"dev\"
(or any derivative of 'dev') needs to convert to:
BUCKET_FOLDER=bucket1/$ID
where $ID = abcde, ie
BUCKET_FOLDER=bucket1/abcde
To expand the $ID environment variable, I need to put double quotes around the sed substitution expression:
sed -e "s/BUCKET_FOLDER=\\"(.*?)\\"/BUCKET_FOLDER=bucket1\/$ID/g" $string
but this is then preventing a match on the double quotes in the source string.
Would appreciate any advice. I can make it work with 2 steps, but would prefer 1.

ID=abcde
echo 'A=\"x\" BUCKET_FOLDER=\"dev\" B=\"y\"' |sed -r "s|(.*)(BUCKET_FOLDER=)([^ ]+)(.*)|\1\2bucket1/$ID\4|g"
A=\"x\" BUCKET_FOLDER=bucket1/abcde B=\"y\"
This uses | as the separator in sed. As you mentioned, double quotes are used to expand $ID, and BUCKET_FOLDER= is captured as the second group.

I believe you escaped the quote correctly in the sed command. On the other hand, the way you specified the rest of the regex isn't how it's supposed to be. Here is my take on the story:
echo BUCKET_FOLDER='"'dev'"' | \
sed -e "s/BUCKET_FOLDER=\"\(.*\?\)\"/BUCKET_FOLDER=bucket1\/$ID/g"
Explanation:
I made the assumption that the \" parts in your $string variable are just escape sequences. Thus I used the echo BUCKET_FOLDER='"'dev'"' command. Its output is BUCKET_FOLDER="dev".
sed has no non-greedy quantifier. In GNU sed's basic syntax, \? means "zero or one", so .*\? matches the same text as .* here; to stop at the first closing quote, [^"]* is the usual choice.
You need to escape the parentheses too: \(...\). This should work without the group too, because you don't use backreferences like \1.
Alternatives:
If you want to eliminate the capturing group, then the sed expression becomes this:
sed -e "s/BUCKET_FOLDER=\".*\?\"/BUCKET_FOLDER=bucket1\/$ID/g"
If the backslash is really part of your string, you can match them with the character class [\] :
echo BUCKET_FOLDER=\\'"'dev\\'"' | \
sed -e "s/BUCKET_FOLDER=[\]\".*\?[\]\"/BUCKET_FOLDER=bucket1\/$ID/g"
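A one-step variant can keep the regex in single quotes and switch to double quotes only for the variable. This is a sketch assuming the backslashes are literally present in the input, and that $ID contains no |, & or backslash characters, which would need escaping for sed. The helper name set_bucket is hypothetical.

```shell
ID=abcde

# set_bucket is a hypothetical helper. The regex stays in single quotes;
# only "$ID" is double-quoted, so just that part is expanded by the shell.
# \\" matches a literal backslash followed by a double quote, and [^"]*
# stops at the first closing quote.
set_bucket() {
  sed 's|BUCKET_FOLDER=\\"[^"]*\\"|BUCKET_FOLDER=bucket1/'"$ID"'|g'
}

printf '%s\n' 'A=\"x\" BUCKET_FOLDER=\"dev\" B=\"y\"' | set_bucket
```

Using | as the delimiter avoids having to escape the / in bucket1/.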

Extracting Substring from String with Multiple Special Characters Using Sed

I have a text file with a line that reads:
<div id="page_footer"><div><? print('Any phrase's characters can go here!'); ?></div></div>
And I'm wanting to use sed or awk to extract the substring above between the single quotes so it just prints ...
Any phrase's characters can go here!
I want the phrase to be delimited as I have above, starting after the single quote and ending at the single-quote immediately followed by a parenthesis and then semicolon. The following sed command with a capture group doesn't seem to be working for me. Suggestions?
sed '/^<div id="page_footer"><div><? print(\'\(.\+\)\');/ s//\1/p' /home/foobar/testfile.txt

Incorrect would be using cut like
grep "page_footer" /home/foobar/testfile.txt | cut -d "'" -f2
It will go wrong with single quotes inside the string. Counting the number of single quotes first will change this from a simple to an over-complicated solution.
A solution with sed is better: remove everything until the first single quote and everything after the last one. Writing a single quote inside a single-quoted sed script is messy: you close the quoted string, add an escaped single quote, and open the string again:
grep page_footer /home/foobar/testfile.txt | sed -e 's/[^'\'']*//' -e 's/[^'\'']*$//'
And this is not the full solution, you want to remove the first/last quotes as well:
grep page_footer /home/foobar/testfile.txt | sed -e 's/[^'\'']*'\''//' -e 's/'\''[^'\'']*$//'
Writing the sed parameters in double-quoted strings and using the . wildcard for matching the single quote will make the line shorter:
grep page_footer /home/foobar/testfile.txt | sed -e "s/^[^\']*.//" -e "s/.[^\']*$//"
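The same extraction can also be sketched as a single substitution with a capture group, which follows the delimiters described in the question (the text between print(' and ');). The helper name extract_phrase is hypothetical, and the sample line is stored through a quoted here-document so no shell escaping is needed.

```shell
# Store the sample line verbatim; the quoted here-document prevents
# any shell interpretation of the quotes and punctuation.
line=$(cat <<'EOF'
<div id="page_footer"><div><? print('Any phrase's characters can go here!'); ?></div></div>
EOF
)

# extract_phrase is a hypothetical helper. -n with the p flag prints only
# lines where the substitution succeeded. The greedy \(.*\) runs up to the
# last '); on the line, so quotes inside the phrase are kept.
extract_phrase() {
  sed -n "s/.*print('\(.*\)');.*/\1/p"
}

printf '%s\n' "$line" | extract_phrase
```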

Using advanced grep (such as in Linux), this might be what you are looking for
grep -Po "(?<=').*?(?='\);)"

Perl regex single quote

Can someone help me to run the command below? I also tried to escape the single quotes but had no luck.
perl -pi.bak -e 's/Object\.prototype\.myString='q'//' myfile.html

The problem is not with Perl, but with your shell. To see what's happening, try this:
$ echo 's/Object\.prototype\.myString='q'//'
s/Object\.prototype\.myString=q//
To make it work, you can replace each single quote with '\'', like this:
$ echo 's/Object\.prototype\.myString='\''q'\''//'
s/Object\.prototype\.myString='q'//
or you can save a few characters by writing just:
$ echo 's/Object\.prototype\.myString='\'q\''//'
s/Object\.prototype\.myString='q'//
or even just:
$ echo 's/Object\.prototype\.myString='\'q\'//
s/Object\.prototype\.myString='q'//
or even:
$ echo s/Object\\.prototype\\.myString=\'q\'//
s/Object\.prototype\.myString='q'//
Double quotes, as suggested by mu is too short, will work here too, but can cause unwanted surprises in other situations, since many characters commonly found in Perl code, like $, ! and \, have special meaning to the shell even inside double quotes.
Of course, an alternative solution is to replace the single quotes in your regexp with the octal or hex codes \047 or \x27 instead:
$ perl -pi.bak -e 's/Object\.prototype\.myString=\x27q\x27//' myfile.html
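The equivalence of these spellings can be checked directly in the shell. The sketch below stores each form in a variable (the names a to d are illustrative only) and compares them.

```shell
# Four spellings of the same string, as discussed above.
a='s/Object\.prototype\.myString='\''q'\''//'   # close, escaped quote, reopen
b='s/Object\.prototype\.myString='\'q\''//'     # shorter variant
c="s/Object\.prototype\.myString='q'//"         # double quotes
d=s/Object\\.prototype\\.myString=\'q\'//       # no quotes, everything escaped

# Print each one; all four lines of output are identical.
printf '%s\n' "$a" "$b" "$c" "$d"
```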

Double quotes should work:
perl -pi.bak -e "s/Object\.prototype\.myString='q'//" myfile.html
You may or may not want a g modifier on that regex. And you'll probably want to do a diff after to make sure you didn't mangle the HTML.

Replace all whitespace with a line break/paragraph mark to make a word list

I am trying to make a vocab list for a Greek text we are translating in class. I want to replace every space or tab character with a paragraph mark so that every word appears on its own line. Can anyone give me the sed command, and explain what it is that I'm doing? I’m still trying to figure sed out.

For reasonably modern versions of sed, edit the standard input to yield the standard output with
$ echo 'τέχνη βιβλίο γη κήπος' | sed -E -e 's/[[:blank:]]+/\n/g'
τέχνη
βιβλίο
γη
κήπος
If your vocabulary words are in files named lesson1 and lesson2, redirect sed’s standard output to the file all-vocab with
sed -E -e 's/[[:blank:]]+/\n/g' lesson1 lesson2 > all-vocab
What it means:
The character class [[:blank:]] matches either a single space character or
a single tab character.
Use [[:space:]] instead to match any single whitespace character (commonly space, tab, newline, carriage return, form-feed, and vertical tab).
The + quantifier means match one or more of the previous pattern.
So [[:blank:]]+ is a sequence of one or more characters that are all space or tab.
The \n in the replacement is the newline that you want.
The /g modifier on the end means perform the substitution as many times as possible rather than just once.
The -E option tells sed to use POSIX extended regex syntax and in particular for this case the + quantifier. Without -E, your sed command becomes sed -e 's/[[:blank:]]\+/\n/g'. (Note the use of \+ rather than simple +.)
Perl Compatible Regexes
For those familiar with Perl-compatible regexes and a PCRE-capable sed, use \s+ to match runs of at least one whitespace character, as in
sed -E -e 's/\s+/\n/g' old > new
or
sed -e 's/\s\+/\n/g' old > new
These commands read input from the file old and write the result to a file named new in the current directory.
Maximum portability, maximum cruftiness
Going back to almost any version of sed since Version 7 Unix, the command invocation is a bit more baroque.
$ echo 'τέχνη βιβλίο γη κήπος' | sed -e 's/[ \t][ \t]*/\
/g'
τέχνη
βιβλίο
γη
κήπος
Notes:
Here we do not even assume the existence of the humble + quantifier and simulate it with a single space-or-tab ([ \t]) followed by zero or more of them ([ \t]*).
Similarly, assuming sed does not understand \n for newline, we have to include it on the command line verbatim.
The \ at the end of the first line of the command is a continuation marker that escapes the immediately following newline, and the remainder of the command is on the next line.
Note: There must be no whitespace preceding the escaped newline. That is, the end of the first line must be exactly backslash followed by end-of-line.
This error prone process helps one appreciate why the world moved to visible characters, and you will want to exercise some care in trying out the command with copy-and-paste.
Note on backslashes and quoting
The commands above all used single quotes ('') rather than double quotes (""). Consider:
$ echo '\\\\' "\\\\"
\\\\ \\
That is, the shell applies different escaping rules to single-quoted strings as compared with double-quoted strings. You typically want to protect all the backslashes common in regexes with single quotes.

The portable way to do this is:
sed -e 's/[ \t][ \t]*/\
/g'
That's an actual newline between the backslash and the slash-g. Many sed implementations don't know about \n, so you need a literal newline. The backslash before the newline prevents sed from getting upset about the newline. (in sed scripts the commands are normally terminated by newlines)
With GNU sed you can use \n in the substitution, and \s in the regex:
sed -e 's/\s\s*/\n/g'
GNU sed also supports "extended" regular expressions (that's egrep style, not perl-style) if you give it the -r flag, so then you can use +:
sed -r -e 's/\s+/\n/g'
If this is for Linux only, you can probably go with the GNU command, but if you want this to work on systems with a non-GNU sed (eg: BSD, Mac OS-X), you might want to go with the more portable option.
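One way to make the literal-newline form less fragile, sketched here as an assumption rather than part of the answer above, is to keep the newline in a shell variable and use the POSIX class [[:blank:]] for space-or-tab. The names nl and split_words are hypothetical.

```shell
# A variable holding exactly one newline character.
nl='
'

# split_words is a hypothetical helper. Inside double quotes, \\ becomes a
# single backslash and $nl expands to a newline, giving sed the
# backslash-newline sequence it needs in the replacement.
split_words() {
  sed "s/[[:blank:]][[:blank:]]*/\\$nl/g"
}

# Input has a space and a tab between words; each word ends up on its own line.
printf 'a b\tc\n' | split_words
```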

All of the examples listed above for sed break on one platform or another. None of them work with the version of sed shipped on Macs.
However, Perl's regex works the same on any machine with Perl installed:
perl -pe 's/\s+/\n/g' file.txt
If you want to save the output:
perl -pe 's/\s+/\n/g' file.txt > newfile.txt
If you want only unique occurrences of words:
perl -pe 's/\s+/\n/g' file.txt | sort -u > newfile.txt

Option 1
echo $(cat testfile)
Option 2
tr ' ' '\n' < testfile
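The tr form above only handles single spaces. A slightly extended sketch, assuming a tr that pads the second set as GNU and BSD tr do, also covers tabs and squeezes runs of blanks. The helper name words is hypothetical.

```shell
# words is a hypothetical helper. tr maps every space or tab to a newline,
# and -s squeezes repeated newlines into one; the sed step drops an empty
# first line that leading blanks would otherwise leave behind.
words() {
  tr -s '[:blank:]' '\n' | sed '/^$/d'
}

printf 'a  b\tc\n' | words
```

For a list of unique words, the output can be piped through sort -u, as in the Perl answer above.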

This should do the work:
sed -E -e 's/[ \t]+/\n/g'
The -E option enables extended regexes so that + works as a quantifier; the \t and \n escapes assume GNU sed.
[ \t] means a space OR a tab. If you want any kind of whitespace, you could also use \s.
[ \t]+ means as many spaces OR tabs as you want (but at least one)
s/x/y/ means replace the pattern x by y (here \n is a new line)
The g at the end means that the replacement is repeated for every match on each line.

You could use POSIX [[:blank:]] to match a horizontal white-space character.
sed 's/[[:blank:]]\+/\n/g' file
or you may use [[:space:]] instead of [[:blank:]] also.
Example:
$ echo 'this is a sentence' | sed 's/[[:blank:]]\+/\n/g'
this
is
a
sentence

You can also do it with xargs:
cat old | xargs -n1 > new
or
xargs -n1 < old > new

Using gawk:
gawk '{$1=$1}1' OFS="\n" file
