How to replace all dollar signs before all variables inside a double-quoted string with sed?

I have problems replacing the variables that are inside strings in bash. For example, I want to replace
"test$FOO1=$FOO2" $BAR
with:
"test" .. FOO1 .. "=" .. FOO2 .. "" $BAR
I tried:
sed 's/\$\([A-Z0-9_]\+\)\b/" .. \1 .. "/g'
But I don't want to replace variables the same way outside of double-quoted strings, e.g. like:
if [ $VARIABLE = 1 ]; then
Has to be replaced by just
if VARIABLE then
Is there a way to replace only inside of double-quotes?
Background:
I want to convert a bash script into Lua script.
I am aware, that it will not be easily possible to convert all possible shell scripts this way, but what I want to achieve is to replace all basic language constructs with Lua commands and replace all variables and conditionals. An automation here will save much work when translating bash into Lua by hand

This with GNU awk for multi-char RS, RT, and gensub() shows one way to separate and then manipulate quoted (in RT) and unquoted (in $0) strings as a starting point:
$ cat tst.awk
BEGIN { RS="\"[^\"]*\""; ORS="" }
{
    $0 = gensub(/\[\s+[$]([[:alnum:]_]+)\s+=\s+\S+\s+];/,"\\1","g",$0)
    RT = gensub(/[$]([[:alnum:]_]+)"/,"\" .. \\1","g",RT)
    RT = gensub(/[$]([[:alnum:]_]+)/,"\" .. \\1 .. \"","g",RT)
    print $0 RT
}
$ awk -f tst.awk file
"count: " .. FOO .. " times " .. BAR
if VARIABLE then
The above was run on this input file:
$ cat file
"count: $FOO times $BAR"
if [ $VARIABLE = 1 ]; then
NOTE: this approach of matching strings with regexps will always just be a best effort based on the samples provided, you'd need a shell language parser to do the job robustly.

bash lexer for shell!?
I'm sorry: I post this answer only to warn you about a wrong way!
Reading a language is a job for a consistent lexer, not for sed nor any other regex-based tool!!!
See GNU Bison, Berkeley Yacc (byacc).
You could have a look at bash's sources in order to see how scripts are actually read!
Persisting in this way will quickly give you a big script, and then lead you to unsolvable problems.

using a group and recursion
sed -e ':a' -e 's/^\(\([^"]*\("[^"]*"\)*\)*\)\("[^$"]*\)[$]\([A-Z0-9_]\{1,\}\)/\1\4 .. \5 .. /;t a'
isolate the string from the part already treated with ^\(\([^"]*\("[^"]*"\)*\)*\) in group 1
select the variable inside the isolated string with \("[^$"]*\)[$]\([A-Z0-9_]\{1,\}\) in group 4 (prefix) and group 5 (variable name)
change it the way you want with \1\4 .. \5 ..
repeat this operation while a change occurs, using the label :a and the branch t a
With GNU sed you can reduce the command to (no separate -e needed to reach the label a):
sed ':a;s/^\(\([^"]*\("[^"]*"\)*\)*\)\("[^$"]*\)[$]\([A-Z0-9_]\{1,\}\)/\1\4 .. \5 .. /;t a'
This assumes there are no escaped quotes inside the strings. If there are, a first pass is needed to hide them and a final pass to put them back after the main modification.
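A first-pass/last-pass sketch of that idea, assuming GNU sed (for the \x01 escapes) and that the control character 0x01 never occurs in the input: escaped quotes are hidden first, then restored after the main substitution.
sed -e 's/\\"/\x01/g' \
    -e ':a' -e 's/^\(\([^"]*\("[^"]*"\)*\)*\)\("[^$"]*\)[$]\([A-Z0-9_]\{1,\}\)/\1\4 .. \5 .. /;t a' \
    -e 's/\x01/\\"/g' file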

This might work for you (GNU sed):
sed -E ':a;s/^([^"]*("[^"$]*"[^"]*)*"[^"$]*)\$([^" ]*) /\1" .. \3 .. " /;ta;s/^([^"]*("[^"$]*"[^"]*)*"[^"$]*)\$([^"]*)"/\1" .. \3/;ta' file
When changing things within double quotes, we must first sail past any double-quoted strings that do not need changing. This means anchoring the regexp to the start of the line using the ^ metacharacter and iterating the regexp until no such cases remain.
First, eliminate zero or more characters which are not double quotes from the start of the line.
Second, eliminate double-quoted strings which do not contain the character of interest (TCOI), i.e. $, followed by zero or more characters which are not double quotes, zero or more times.
Third, eliminate a double quote followed by zero or more characters which are neither double quotes nor the TCOI.
The following character (if it exists) must be the TCOI. The entire collection of strings before it is grouped in back reference \1.
Following the TCOI, one or more conditions may be grouped. In the above example the first condition is a variable (beginning with the TCOI) followed by a space; the second condition is a variable followed directly by ". Hence this entails two substitution commands; the ta command branches to the label a when a substitution was successful.
N.B. The if [ $VARIABLE = 1 ]; then situation can be treated in the same vein; here the [ plays the role of the opening double quote and the ] the role of the closing one.
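For example, a sketch that handles just that bracketed test (GNU sed, -E as above; the variable-name class mirrors the one already used):
echo 'if [ $VARIABLE = 1 ]; then' | sed -E 's/\[ \$([[:alnum:]_]+) = [^]]* \]; /\1 /'
if VARIABLE then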
P.S. The TCOI here was $, which is also a regexp metacharacter representing the end of a line; it therefore must be quoted, e.g. \$.
P.P.S. Don't forget to quote the ['s and ]'s too. If quoting isn't your thing, then enclose the character in [x], where x is the character to be quoted.
EDIT:
sed -E ':a;s/^([^"]*("[^"$]*"[^"]*)*"[^"$]*)\$([[:alnum:]]*)/\1" .. \3 .. "/;ta' file
Since the original example has been replaced by the OP, here is a solution based on the new example.

Related

Bash - use variable in perl regex together with matching groups

This is my first post on stackoverflow, please forgive me if I missed something important.
I am currently stuck with the following issue. The goal is to replace port numbers dynamically, based on a file list I prepared with find. All of the ports in those files start with the number "4" and have 5 digits.
Now the tricky part: I am replacing only digits #2 and #3 and keeping positions 1, 4 and 5. Examples:
old port in file: 40380, 40381
new port in file: 41580, 41581
I am working on Sun Solaris 5.10, therefore I prefer perl for in-place replacements.
Finally, the key question: how can I combine $1 (group 1) + $NEW_PINNO + $3 (group 3) so that the result would be 41580?
NEW_PINNO=15
LOGI=$HOME/filelist.txt
# port replacement
for file in `cat $LOGI`
do
perl -pe 's/[\:\>\=]\s*(4)(\d{2})(\d{2})\b/$1${NEW_PINNO}$3/g' $file
done
many thanks in advance
perl -pse 's/ [:>=]\s* \K (\d)\d\d(\d\d) \b/$1$pin$2/gx' -- -pin="$new_pinno" file
Your regex will match the [colon, greater than, equal sign] and the spaces, but you don't include them in the substitution. I'm using the \K directive to match those characters but then forget about them (ref: http://perldoc.perl.org/perlre.html#Lookaround-Assertions)
I'm using the -s option to enable "rudimentary switch parsing" to pass the shell variable into perl without playing quoting games. (ref: http://perldoc.perl.org/perlrun.html)
Testing
new_pinno=15
perl -pse 's/ [:>=]\s* \K (\d)\d\d(\d\d) \b/$1$pin$2/gx' -- -pin="$new_pinno" <<END
var1=40380
var2=40381
END
var1=41580
var2=41581
Notes
You should not use ALL_CAPS_VARNAMES in the shell; leave those reserved for the shell itself. One day you'll use PATH=something and then wonder why your script is broken.
Also, @123's comment is valid. This is the safe way to read lines from a file:
while read -r file; do
perl ... "$file"
done < "$LOGI"
ref: http://mywiki.wooledge.org/DontReadLinesWithFor
perl -pe 's/[\:\>\=]\s*(4)(\d{2})(\d{2})\b/$1'${NEW_PINNO}'$3/g' $file
is ok.
The difference between single and double quotes is how bash treats its variables: inside single quotes it won't expand them, but inside double quotes it will. You can open and close quotes as often as you want within a single command-line argument; only a space outside any pair of quotes (single or double) starts a new argument.
So you close the single quote, let bash expand the variable, and then re-open the single quote. Enclosing the variable in double quotes ensures that any spaces in its value won't split the argument:
perl -pe 's/[\:\>\=]\s*(4)(\d{2})(\d{2})\b/$1'"${NEW_PINNO}"'$3/g' $file

How to swap text based on patterns at once with sed?

Suppose I have 'abbc' string and I want to replace:
ab -> bc
bc -> ab
If I try two replaces the result is not what I want:
echo 'abbc' | sed 's/ab/bc/g;s/bc/ab/g'
abab
So what sed command can I use to replace like below?
echo abbc | sed SED_COMMAND
bcab
EDIT:
Actually the text could have more than 2 patterns and I don't know how many replacements I will need. Since there was an answer saying that sed is a stream editor and its replacements are greedy, I think I will need to use some scripting language for that.
Maybe something like this:
sed 's/ab/~~/g; s/bc/ab/g; s/~~/bc/g'
Replace ~ with a character that you know won't be in the string.
I always use multiple statements with "-e"
$ sed -e 's:AND:\n&:g' -e 's:GROUP BY:\n&:g' -e 's:UNION:\n&:g' -e 's:FROM:\n&:g' file > readable.sql
This will insert a '\n' before every AND, GROUP BY, UNION and FROM; '&' means the matched string, and '\n&' means you replace the match with a '\n' followed by the match itself.
sed is a stream editor. It searches and replaces greedily. The only way to do what you asked for is using an intermediate substitution pattern and changing it back in the end.
echo 'abcd' | sed -e 's/ab/xy/;s/cd/ab/;s/xy/cd/'
Here is a variation on ooga's answer that works for multiple search and replace pairs without having to check how values might be reused:
sed -i '
s/\bAB\b/________BC________/g
s/\bBC\b/________CD________/g
s/________//g
' path_to_your_files/*.txt
Here is an example:
before:
some text AB some more text "BC" and more text.
after:
some text BC some more text "CD" and more text.
Note that \b denotes word boundaries, which is what prevents the ________ from interfering with the search (I'm using GNU sed 4.2.2 on Ubuntu). If you are not using a word boundary search, then this technique may not work.
Also note that this gives the same results as removing the s/________//g and appending && sed -i 's/________//g' path_to_your_files/*.txt to the end of the command, but doesn't require specifying the path twice.
A general variation on this would be to use \x0 or _\x0_ in place of ________ if you know that no nulls appear in your files, as jthill suggested.
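For example, the _\x0_ variant might look like this (a sketch, assuming GNU sed's \xHH escapes and input files that contain no NUL bytes):
sed -i '
s/\bAB\b/_\x0_BC_\x0_/g
s/\bBC\b/_\x0_CD_\x0_/g
s/_\x0_//g
' path_to_your_files/*.txt
The underscores around \x0 keep \b from matching inside the marker, just as the long run of underscores does above.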
Here is an excerpt from the SED manual:
-e script
--expression=script
Add the commands in script to the set of commands to be run while processing the input.
Prepend each substitution with -e option and collect them together. The example that works for me follows:
sed < ../.env-turret.dist \
-e "s/{{ name }}/turret$TURRETS_COUNT_INIT/g" \
-e "s/{{ account }}/$CFW_ACCOUNT_ID/g" > ./.env.dist
This example also shows how to use environment variables in your substitutions.
This might work for you (GNU sed):
sed -r '1{x;s/^/:abbc:bcab/;x};G;s/^/\n/;:a;/\n\n/{P;d};s/\n(ab|bc)(.*\n.*:(\1)([^:]*))/\4\n\2/;ta;s/\n(.)/\1\n/;ta' file
This uses a lookup table which is prepared and held in the hold space (HS) and then appended to each line. A unique marker (in this case \n) is prepended to the start of the line and used to bump the search along the length of the line. Once the marker reaches the end of the line the process is finished; the result is printed, and the lookup table and markers are discarded.
N.B. The lookup table is prepped at the very start, and a second unique marker (in this case :) is chosen so as not to clash with the substitution strings.
With some comments:
sed -r '
# initialize hold with :abbc:bcab
1 {
x
s/^/:abbc:bcab/
x
}
G # append hold to patt (after a \n)
s/^/\n/ # prepend a \n
:a
/\n\n/ {
P # print patt up to first \n
d # delete patt & start next cycle
}
s/\n(ab|bc)(.*\n.*:(\1)([^:]*))/\4\n\2/
ta # goto a if sub occurred
s/\n(.)/\1\n/ # move one char past the first \n
ta # goto a if sub occurred
'
The table works like this: in :abbc:bcab, each : introduces an entry whose first two characters are the pattern and whose remaining characters are the replacement, i.e. ab maps to bc and bc maps to ab.
Tcl has a builtin for this
$ tclsh
% string map {ab bc bc ab} abbc
bcab
This works by walking the string a character at a time doing string comparisons starting at the current position.
In perl:
perl -E '
sub string_map {
    my ($str, %map) = @_;
    my $i = 0;
    while ($i < length $str) {
        KEYS:
        for my $key (keys %map) {
            if (substr($str, $i, length $key) eq $key) {
                substr($str, $i, length $key) = $map{$key};
                $i += length($map{$key}) - 1;
                last KEYS;
            }
        }
        $i++;
    }
    return $str;
}
say string_map("abbc", "ab"=>"bc", "bc"=>"ab");
'
bcab
Maybe a simpler approach for a single occurrence of the pattern; you can try as below:
echo 'abbc' | sed 's/ab/bc/;s/bc/ab/2'
My output:
~# echo 'abbc' | sed 's/ab/bc/;s/bc/ab/2'
bcab
For multiple occurrences of pattern:
sed 's/\(ab\)\(bc\)/\2\1/g'
Example
~# cat try.txt
abbc abbc abbc
bcab abbc bcab
abbc abbc bcab
~# sed 's/\(ab\)\(bc\)/\2\1/g' try.txt
bcab bcab bcab
bcab bcab bcab
bcab bcab bcab
Hope this helps !!
echo "C:\Users\San.Tan\My Folder\project1" | sed -e 's/C:\\/mnt\/c\//;s/\\/\//g'
replaces
C:\Users\San.Tan\My Folder\project1
to
mnt/c/Users/San.Tan/My Folder/project1
in case someone needs to convert Windows paths to Windows Subsystem for Linux (WSL) paths
If the string is being replaced by a variable, the solution above doesn't work as-is.
The sed command needs to be in double quotes instead of single quotes:
#sed -e "s/#replacevarServiceName#/$varServiceName/g" -e "s/#replacevarImageTag#/$varImageTag/g" deployment.yaml
Here is an awk based on ooga's sed
echo 'abbc' | awk '{gsub(/ab/,"xy");gsub(/bc/,"ab");gsub(/xy/,"bc")}1'
bcab
I believe this should solve your problem. I may be missing a few edge cases, please comment if you notice one.
You need a way to exclude previous substitutions from future patterns, which really means making outputs distinguishable, as well as excluding these outputs from your searches, and finally making outputs indistinguishable again. This is very similar to the quoting/escaping process, so I'll draw from it.
s/\\/\\\\/g escapes all existing backslashes
s/ab/\\b\\c/g substitutes raw ab for escaped bc
s/bc/\\a\\b/g substitutes raw bc for escaped ab
s/\\\(.\)/\1/g substitutes all escaped X for raw X
I have not accounted for backslashes in ab or bc, but intuitively, I would escape the search and replace terms the same way - \ now matches \\, and substituted \\ will appear as \.
Until now I have been using backslashes as the escape character, but it's not necessarily the best choice. Almost any character should work, but be careful with the characters that need escaping in your environment, sed, etc. depending on how you intend to use the results.
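Assembled into a single sed call, the four steps above might look like this (a sketch; as noted, backslashes inside the search and replace terms themselves are not accounted for):
echo 'abbc' | sed 's/\\/\\\\/g; s/ab/\\b\\c/g; s/bc/\\a\\b/g; s/\\\(.\)/\1/g'
bcab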
Every answer posted thus far seems to agree with the statement by kuriouscoder made in his above post:
The only way to do what you asked for is using an intermediate
substitution pattern and changing it back in the end
If you are going to do this, however, and your usage might involve more than some trivial string (maybe you are filtering data, etc.), the best character to use with sed is a newline. This is because sed is 100% line-based, so a newline is the one and only character you are guaranteed never to receive when a new line is fetched (forget about GNU multi-line extensions for this discussion).
To start with, here is a very simple approach to solving your problem using newlines as an intermediate delimiter:
echo "abbc" | sed -E $'s/ab|bc/\\\n&/g; s/\\nab/bc/g; s/\\nbc/ab/g'
With simplicity comes some trade-offs... if you had more than a couple variables, like in your original post, you have to type them all twice. Performance might be able to be improved a little bit, too.
It gets pretty nasty to do much beyond this using sed. Even with some of the more advanced features like branching control and the hold buffer (which is really weak IMO), your options are pretty limited.
Just for fun, I came up with this one alternative, but I don't think I would have any particular reason to recommend it over the one from earlier in this post... You have to essentially make your own "convention" for delimiters if you really want to do anything fancy in sed. This is way-overkill for your original post, but it might spark some ideas for people who come across this post and have more complicated situations.
My convention below was: use multiple newlines to "protect" or "unprotect" the part of the line you're working on. One newline denotes a word boundary. Two newlines denote alternatives for a candidate replacement. I don't replace right away, but rather list the candidate replacement on the next line. Three newlines mean that a value is "locked in", like your original post was trying to do with ab and bc. After that point, further replacements are undone, because they are protected by the newlines. A little complicated, if I do say so myself... sed isn't really meant for much more than the basics.
# Newlines
NL=$'\\\n'
NOT_NL=$'[\x01-\x09\x0B-\x7F]'
# Delimiters
PRE="${NL}${NL}&${NL}"
POST="${NL}${NL}"
# Un-doer (if a request was made to modify a locked-in value)
tidy="s/(\\n\\n\\n${NOT_NL}*)\\n\\n(${NOT_NL}*)\\n(${NOT_NL}*)\\n\\n/\\1\\2/g; "
# Locker-inner (three newlines means "do not touch")
tidy+="s/(\\n\\n)${NOT_NL}*\\n(${NOT_NL}*\\n\\n)/\\1${NL}\\2/g;"
# Finalizer (remove newlines)
final="s/\\n//g"
# Input/Commands
input="abbc"
cmd1="s/(ab)/${PRE}bc${POST}/g"
cmd2="s/(bc)/${PRE}ab${POST}/g"
# Execute
echo ${input} | sed -E "${cmd1}; ${tidy}; ${cmd2}; ${tidy}; ${final}"

Egrep expression: how to unescape single quotes when reading from file?

I need to use egrep to obtain an entry in an index file.
In order to find the entry, I use the following command:
egrep "^$var_name" index
$var_name is the variable read from a var list file:
while read var_name; do
egrep "^$var_name" index
done < list
One of the possible keys comes usually in this format:
$ERROR['SOME_VAR']
My index file is in the form:
$ERROR['SOME_VAR'] --> n
Where n is the line where the variable is found.
The problem is that $var_name is automatically escaped when read. When I enable the debug mode, I get the following command being executed:
+ egrep '^$ERRORS['\''SELECT_COUNTRY'\'']' index
The command above doesn't work, because egrep will try to interpret the pattern.
If I don't use the extended version, using grep or fgrep, the command will work only if I remove the ^ anchor:
grep -F "$var_name" index # this actually works
The problem is that I need to ensure that the match is made at the beginning of the line.
Ideas?
set -x shows the command being executed in shell notation.
The backslashes you see do not become part of the argument, they're just printed by set -x to show the executed command in a copypastable format.
Your problem is not too much escaping, but too little: $ in regex means "end of line", so ^$ERROR will never match anything. Similarly, [ ] is a character range, and will not match literal square brackets.
The correct regex to match your pattern would be ^\$ERROR\['SOME_VAR'], equivalent to the shell argument in egrep "^\\\$ERROR\['SOME_VAR']".
Your options to fix this are:
If you expect to be able to use regex in your input file, you need to include regex escapes like above, so that your patterns are valid.
If you expect to be able to use arbitrary, literal strings, use a tool that can match flexibly and literally. This requires jumping through some hoops, since UNIX tools for legacy reasons are very sloppy.
Here's one with awk:
while IFS= read -r line
do
export line
gawk 'BEGIN{var=ENVIRON["line"];} substr($0, 1, length(var)) == var' index
done < list
It passes the string in through the environment (because -v mangles backslash escapes) and then matches it literally against the start of each input line.
Here's an example invocation:
$ cat script
while IFS= read -r line
do
export line
gawk 'BEGIN{var=ENVIRON["line"];} substr($0, 1, length(var)) == var' index
done < list
$ cat list
$ERRORS['SOME_VAR']
\E and \Q
'"'%##%*'
$ cat index
hello world
$ERRORS['SOME_VAR'] = 'foo';
\E and \Q are valid strings
'"'%##%*' too
etc
$ bash script
$ERRORS['SOME_VAR'] = 'foo';
\E and \Q are valid strings
'"'%##%*' too
You can use printf "%q":
while read -r var_name; do
egrep "^$(printf "%q\n" "$var_name")" index
done < list
Update: With grep's PCRE mode (grep -P, where available) you can also do:
while read -r var_name; do
grep -P "^\Q$var_name\E" index
done < list
Here \Q and \E make the string in between a literal string, removing the special meaning of all regex symbols. Note that \Q...\E is a PCRE feature, so it won't work with plain egrep/ERE.

"sed" special characters handling

We have a sed command in our script to replace the file content with values from variables,
for example:
export value="dba01upc\Fusion_test"
sed -i "s%{"sara_ftp_username"}%$value%g" /home_ldap/user1/placeholder/Sara.xml
The sed command ignores special characters like '\' and replaces the placeholder with the string "dba01upcFusion_test", without the '\'.
It works if I do the export like export value='dba01upc\Fusion_test' (with '\' surrounded by ''), but unfortunately our client wants to export the original text dba01upc\Fusion_test with single/double quotes and doesn't want to add any extra characters to the text.
Can anyone let me know how to make sed place the text with the special characters?
Before Replacement : Sara.xml
<?xml version="1.0" encoding="UTF-8"?>
<ser:service-account >
<ser:description/>
<ser:static-account>
<con:username>{sara_ftp_username}</con:username>
</ser:static-account>
</ser:service-account>
After Replacement : Sara.xml
<?xml version="1.0" encoding="UTF-8"?>
<ser:service-account>
<ser:description/>
<ser:static-account>
<con:username>dba01upcFusion_test</con:username>
</ser:static-account>
</ser:service-account>
Thanks in advance
You cannot robustly solve this problem with sed. Just use awk instead:
awk -v old="string1" -v new="string2" '
idx = index($0,old) {
$0 = substr($0,1,idx-1) new substr($0,idx+length(old))
}
1' file
Ah, @mklement0 has a good point - to stop escapes from being interpreted you need to pass the values in on the arg list along with the file names and then assign the variables from that, rather than assigning values to the variables with -v (see the summary I wrote a LONG time ago for the comp.unix.shell FAQ at http://cfajohnson.com/shell/cus-faq-2.html#Q24 but apparently had forgotten!).
The following will robustly make the desired substitution (a\ta -> e\tf) on every search string found on every line:
$ cat tst.awk
BEGIN {
    old=ARGV[1]; delete ARGV[1]
    new=ARGV[2]; delete ARGV[2]
    lgthOld = length(old)
}
{
    head = ""; tail = $0
    while ( idx = index(tail,old) ) {
        head = head substr(tail,1,idx-1) new
        tail = substr(tail,idx+lgthOld)
    }
    print head tail
}
$ cat file
a\ta a a a\ta
$ awk -f tst.awk 'a\ta' 'e\tf' file
e\tf a a e\tf
The white space in file is tabs. You can shift ARGV[3] down and adjust ARGC if you like but it's not necessary in most cases.
Update with the benefit of hindsight, to present options:
Update 2: If you're intent on using sed, see the - somewhat cumbersome, but now robust and generic - solution below.
If you want a robust, self-contained awk solution that also properly handles both arbitrary search and replacement strings (but cannot incorporate regex features such as word-boundary assertions), see Ed Morton's answer.
If you want a pure bash solution and your input files are small and preserving multiple trailing newlines is not important, see Charles Duffy's answer.
If you want a full-fledged third-party templating solution, consider, for instance, j2cli, a templating CLI for Jinja2 - if you have Python and pip, install with sudo pip install j2cli.
Simple example (note that since the replacement string is provided via a file, this may not be appropriate for sensitive data; note the double braces ({{...}})):
value='dba01upc\Fusion_test'
echo "sara_ftp_username=$value" >data.env
echo '<con:username>{{sara_ftp_username}}</con:username>' >tmpl.xml
j2 tmpl.xml data.env # -> <con:username>dba01upc\Fusion_test</con:username>
If you use sed, careful escaping of both the search and the replacement string is required, because:
As Ed Morton points out in a comment elsewhere, sed doesn't support use of literal strings as replacement strings - it invariably interprets special characters/sequences in the replacement string.
Similarly, the search string literal must be escaped in a way that its characters aren't mistaken for special regular-expression characters.
The following uses two generic helper functions that perform this escaping (quoting) that apply techniques explained at "Is it possible to escape regex characters reliably with sed?":
#!/usr/bin/env bash
# SYNOPSIS
# quoteRe <text>
# DESCRIPTION
# Quotes (escapes) the specified literal text for use in a regular expression,
# whether basic or extended - should work with all common flavors.
quoteRe() { sed -e 's/[^^]/[&]/g; s/\^/\\^/g; $!a\'$'\n''\\n' <<<"$1" | tr -d '\n'; }
# '
# SYNOPSIS
# quoteSubst <text>
# DESCRIPTION
# Quotes (escapes) the specified literal string for safe use as the substitution string (the 'new' in `s/old/new/`).
quoteSubst() {
IFS= read -d '' -r < <(sed -e ':a' -e '$!{N;ba' -e '}' -e 's/[&/\]/\\&/g; s/\n/\\&/g' <<<"$1")
printf %s "${REPLY%$'\n'}"
}
# The search string.
search='{sara_ftp_username}'
# The replacement string; a demo value with characters that need escaping.
value='&\1%"'\'';<>/|dba01upc\Fusion_test'
# Use the appropriately escaped versions of both strings.
sed "s/$(quoteRe "$search")/$(quoteSubst "$value")/g" <<<'<el>{sara_ftp_username}</el>'
# -> <el>&\1%"';<>/|dba01upc\Fusion_test</el>
Both quoteRe() and quoteSubst() correctly handle multi-line strings.
Note, however, given that sed reads a single line at a time by default, use of quoteRe() with multi-line strings only makes sense in sed commands that explicitly read multiple (or all) lines at once.
quoteRe() is always safe to use with a command substitution ($(...)), because it always returns a single-line string (newlines in the input are encoded as '\n').
By contrast, if you use quoteSubst() with a string that has trailing newlines, you mustn't use $(...), because the latter will remove the last trailing newline and therefore break the encoding (since quoteSubst() \-escapes actual newlines, the string returned would end in a dangling \).
Thus, for strings with trailing newlines, use IFS= read -d '' -r escapedValue < <(quoteSubst "$value") to read the escaped value into a separate variable first, then use that variable in the sed command.
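A minimal sketch of that pattern, reusing the helper functions above; the value here is just an illustration of a replacement string whose trailing newline must be preserved:
search='{sara_ftp_username}'
value=$'first line\nsecond line\n'   # note the trailing newline
IFS= read -d '' -r escapedValue < <(quoteSubst "$value")
sed "s/$(quoteRe "$search")/$escapedValue/g" <<<'<el>{sara_ftp_username}</el>'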
This can be done with bash builtins alone -- no sed, no awk, etc.
orig='{sara_ftp_username}' # put the original value into a variable
new='dba01upc\Fusion_test' # ...no need to 'export'!
contents=$(<Sara.xml) # read the file's content into
new_contents=${contents//"$orig"/$new} # use parameter expansion to replace
printf '%s' "$new_contents" >Sara.xml # write new content to disk
See the relevant part of BashFAQ #100 for information on using parameter expansion for string substitution.

In GNU Grep or another standard bash command, is it possible to get a resultset from regex?

Consider the following:
var="text more text and yet more text"
echo $var | egrep "yet more (text)"
It should be possible to get the result of the regex as the string: text
However, I don't see any way to do this in bash with grep or its siblings at the moment.
In perl, php or similar regex engines:
$output = preg_match('/yet more (text)/', 'text more text yet more text');
$output[1] == "text";
Edit: To elaborate on why I can't just use multiple regexes: in the end I will have a regex with several of these (pictured below), so I need to be able to get all of them. This also eliminates the option of using lookahead/lookbehind (as they are all variable length).
egrep -i "([0-9]+) +$USER +([0-9]+).+?(/tmp/Flash[0-9a-z]+) "
Example input as requested, straight from lsof (Replace $USER with "j" for this input data):
npviewer. 17875 j 11u REG 8,8 59737848 524264 /tmp/FlashXXu8pvMg (deleted)
npviewer. 17875 j 17u REG 8,8 16037387 524273 /tmp/FlashXXIBH29F (deleted)
The end goal is to cp /proc/$var1/fd/$var2 ~/$var3 for every line, which ends up "Downloading" flash files (Flash used to store in /tmp but they drm'd it up)
So far I've got:
#!/bin/bash
regex="([0-9]+) +j +([0-9]+).+?/tmp/(Flash[0-9a-zA-Z]+)"
echo "npviewer. 17875 j 11u REG 8,8 59737848 524264 /tmp/FlashXXYOvS8S (deleted)" |
sed -r -n -e " s%^.*?$regex.*?\$%\1 \2 \3%p " |
while read -a array
do
echo /proc/${array[0]}/fd/${array[1]} ~/${array[2]}
done
It cuts off the first digits of the first value to return, and I'm not familiar enough with sed to see what's wrong.
End result for downloading flash 10.2+ videos (Including, perhaps, encrypted ones):
#!/bin/bash
lsof | grep "/tmp/Flash" | sed -r -n -e " s%^.+? ([0-9]+) +$USER +([0-9]+).+?/tmp/(Flash[0-9a-zA-Z]+).*?\$%\1 \2 \3%p " |
while read -a array
do
cp /proc/${array[0]}/fd/${array[1]} ~/${array[2]}
done
Edit: look at my other answer for a simpler bash-only solution.
So, here is the solution using sed to fetch the right groups and split them up. You still have to use bash to read them afterwards. (And in this form it only works if the groups themselves do not contain any spaces - otherwise we would have to use another divider character and adapt read by setting $IFS to that value.)
#!/bin/bash
USER=j
regex=" ([0-9]+) +$USER +([0-9]+).+(/tmp/Flash[0-9a-zA-Z]+) "
sed -r -n -e " s%^.*$regex.*\$%\1 \2 \3%p " |
while read -a array
do
cp /proc/${array[0]}/fd/${array[1]} ~/${array[2]}
done
Note that I had to adapt your last regex group to allow uppercase letters, and added a space at the beginning to be sure to capture the whole block of numbers. Alternatively here a \b (word limit) would have worked, too.
I forgot to mention that you should pipe the text to this script, like this:
./grep-result.sh < grep-result-test.txt
(provided your files are named like this). Alternatively, you can add < grep-result-test.txt after the sed call (before the |), or prepend the line with cat grep-result-test.txt |.
How does it work?
sed -r -n calls sed in extended-regexp-mode, and without printing anything automatically.
-e " s%^.*$regex.*\$%\1 \2 \3%p " gives the sed program, which consists of a single s command.
I'm using % instead of the normal / as parameter separator, since / appears inside the regex and I don't want to escape it.
The regex to search is prefixed by ^.* and suffixed by .*$ to grab the whole line (and avoid printing parts of the rest of the line).
Note that this .* grabs greedily, so we have to insert a space into our regexp to avoid it grabbing the start of the first digit group too.
The replacement text consists of the three parenthesized groups, separated by spaces.
the p flag at the end of the command says to print out the pattern space after replacement. Since we grabbed the whole line, the pattern space consists of only the replacement text.
So, the output of sed for your example input is this:
5 11 /tmp/FlashXXu8pvMg
5 17 /tmp/FlashXXIBH29F
This is much more friendly for reuse, obviously.
Now we pipe this output as input to the while loop.
read -a array reads a line from standard input (which is the output from sed, due to our pipe), splits it into words (at spaces, tabs and newlines), and puts the words into an array variable.
We could also have written read var1 var2 var3 instead (preferably using better variable names), then the first two words would be put to $var1 and $var2, with $var3 getting the rest.
If read succeeded reading a line (i.e. not end-of-file), the body of the loop is executed:
${array[0]} is expanded to the first element of the array and similarly.
When the input ends, the loop ends, too.
This isn't possible using grep or another tool called from a shell prompt/script, because a child process can't modify the environment of its parent process. If you're using bash 3.0 or better, then you can use in-process regular expressions. The syntax is perl-ish (=~) and the match groups are available via ${BASH_REMATCH[x]}, where x is the match group.
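For the example in the question, a minimal sketch (the regex is kept in a variable, which also avoids bash-version quoting quirks):
var="text more text and yet more text"
re='yet more (text)'
if [[ $var =~ $re ]]; then
    echo "${BASH_REMATCH[1]}"    # prints: text
fi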
After creating my sed solution, I also wanted to try the pure-bash approach suggested by Mark. It works quite fine for me.
#!/bin/bash
USER=j
regex=" ([0-9]+) +$USER +([0-9]+).+(/tmp/Flash[0-9a-zA-Z]+) "
while read
do
if [[ $REPLY =~ $regex ]]
then
echo cp /proc/${BASH_REMATCH[1]}/fd/${BASH_REMATCH[2]} ~/${BASH_REMATCH[3]}
fi
done
(If you upvote this, you should think about also upvoting Mark's answer, since it is essentially his idea.)
The same as before: pipe the text to be filtered to this script.
How does it work?
As said by Mark, the [[ ... ]] special conditional construct supports the binary operator =~, which interprets its right operand (after parameter expansion) as an extended regular expression (just as we want), and matches the left operand against it. (We have again added a space at the front to avoid matching only the last digit.)
When the regex matches, the [[ ... ]] returns 0 (= true), and also puts the parts matched by the individual groups (and the whole expression) into the array variable BASH_REMATCH.
Thus, when the regex matches, we enter the then block, and execute the commands there.
Here again ${BASH_REMATCH[1]} is an array-access to an element of the array, which corresponds to the first matched group. ([0] would be the whole string.)
Another note: both my scripts accept multi-line input and work on every line which matches; non-matching lines are simply ignored. If you are inputting only one line, you don't need the loop: a simple if read ; then ... or even read && [[ $REPLY =~ $regex ]] && ... would be enough, as sketched below.
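For instance, a single-line sketch of that variant, assuming $regex is set as in the script above and one matching lsof line arrives on standard input:
read -r && [[ $REPLY =~ $regex ]] &&
    cp "/proc/${BASH_REMATCH[1]}/fd/${BASH_REMATCH[2]}" ~/"${BASH_REMATCH[3]}"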
echo "$var" | pcregrep -o "(?<=yet more )text"
Well, for your simple example, you can do this:
var="text more text and yet more text"
echo $var | grep -e "yet more text" | grep -o "text"