sed: capture group not expanding - regex

Can someone explain to me why my sed command isn't working? I'm sure I'm doing something stupid. Here's a small text file that demonstrates my issue:
#!/usr/bin/env python
class A:
    def candy(self):
        print "cane"
Put that in a file and call it test.py.
My goal is to add #profile before the def line, with the same indentation as the function declaration. I tried this:
$ sed -i '/\( *\)def /i \
\1#profile' test.py
Note that the capture group should be the set of spaces before the def and I'm referencing the group with \1.
Here's my result:
#!/usr/bin/env python
class A:
1#profile
    def candy(self):
        print "cane"
Why is that 1 being placed in there literally instead of being replaced by my capture group (four spaces)?
Thanks!

sed doesn't carry captures from an address selector over into the text of an i command. That text is inserted literally, and back-references such as \1 are only expanded in the replacement part of an s command.
Do the insertion with a substitution instead (\n in the replacement is a GNU sed extension):
sed -e 's/^\( *\)def /\1#profile\n&/' test.py
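For comparison, here is a minimal Python sketch of the same substitution. It assumes the def line is indented with spaces only; the function name add_profile_marker is made up for illustration. As in the sed version, the captured indentation is only available inside the replacement of a substitution.

```python
import re

def add_profile_marker(source):
    """Insert '#profile' above every def line, reusing the def's indentation."""
    # ^( *) captures the leading spaces; \1 reuses them in the replacement,
    # and \g<0> re-emits the whole matched text (like & in sed).
    return re.sub(r'^( *)def ', r'\1#profile\n\g<0>', source, flags=re.MULTILINE)

sample = 'class A:\n    def candy(self):\n        print("cane")\n'
print(add_profile_marker(sample))
```

re.MULTILINE is what lets ^ match at the start of each line when the whole file is held in one string.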

What about this:
sed -i -e 's/^\(.*\)\(def.*\)/\1#profile\n\2/' test.py

Just use awk:
$ awk '{orig=$0} sub(/def.*/,"#profile"); {print orig}' file
#!/usr/bin/env python
class A:
    #profile
    def candy(self):
        print "cane"
Simple, portable, easily extensible, debuggable, etc.

Related

Extracting string from html file or curl output

I have HTML files, some of which are "minified", meaning a whole page can be on a single line.
I want to extract the value of ?idsite=, which is numeric. The HTML contains something like this: img src="//stats.domains.com/piwik.php?idsite=44.
So the plain output should be "44".
I tried grep, but it echoes the whole line and just highlights the value.
With perl it could be something like:
echo "Whole bunch of stuff \
img src=\"stats.domains.com/piwik.php?idsite=44\" " \
| perl -nE 'say /.*idsite=(..)\"/ '
(This assumes that idsite is always two characters! :-) Your regex will most likely need to be more sophisticated than this.)
Putting the snippet from the page you reference above in an HTML file (non-minified) and substituting 44 for the parameter variable, this bit of perl will extract the "44":
perl -nE 'say /.*idsite=(..)/ if /idsite/ ' idsite.html
Translating the one liner to a sed command line would be similar:
echo "Whole bunch of stuff \
img src=\"stats.domains.com/piwik.php?idsite=44\" " \
| sed -En "s/^.*idsite=(..)\"/\1/p"
This is POSIX sed from FreeBSD (it should work on OS X); the -E switch enables "modern" (extended) regexes.
Doing it in awk is left as an exercise for another community member :-)
Here is a perl way to extract only the trailing digits of strings like src="//stats.domains.com/piwik.php?idsite=44" and run on a bash command line:
echo "$src" | perl -ne 'print $1 if /(\d+)$/'
Here is a python way to do the same thing:
import re
print ', '.join( re.findall(r'\d+$', src))
If there will be a lot of src strings to process, it would be best to compile the regex when using Python as follows:
import re
p = re.compile(r'\d+$')
print ', '.join(p.findall(src))
The import and the compilation only have to be done once.
Here is a Ruby way to do it:
puts src.scan( /\d+$/ ).first
In all cases the regexes end with "$" which matches the end of the string. That is why they match and extract only digits (\d+) at the end of the string.
If you don't need to check whether the idsite is in the value of a src attribute, then all you need is
perl -nE'say $1 if /\bidsite=(\d+)/' myfile.html
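The same idea in Python, as a rough sketch: anchor on the parameter name rather than on the end of the string, so it also works when the value sits in the middle of a minified line. The function name extract_idsite and the sample markup are made up for illustration.

```python
import re

def extract_idsite(html):
    """Return every numeric idsite value found in the text, as strings."""
    # \b keeps the pattern from matching inside a longer name such as 'myidsite'.
    return re.findall(r'\bidsite=(\d+)', html)

minified = '<p>Whole bunch of stuff</p><img src="//stats.domains.com/piwik.php?idsite=44">'
print(extract_idsite(minified))
```

findall returns a list, so a page with several idsite parameters yields all of them in order.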
$ cat site.html
lorem ipsum idsite='4934' fasdf a
other line
$ sed -n "/idsite/ { s/.*idsite='\{0,1\}\([0-9]\+\).*$/\1/; p }" < site.html
4934
Let me know in case you need an explanation of what is going on.

Uncommenting a config line based on regex with string replace

I have a config file with a bunch of URLs for repos that are commented out. I need to uncomment a specific one and thought sed would make it easy to match a regex and then do a string replacement on that line.
I was wondering whether my regex is correct for sed syntax, or whether the sed command itself is wrong.
mirrorRegex="^# http.*vendor.*distroARCH-1.1\/"
sed '/$mirrorRegex/s/# //' /etc/repos
Before:
# ftp://mirrors.example.com/distro/distroARCH-1.1/
# http://mirrors.example.com/distro/distroARCH-1.1/
# ftp://packages.vendor.org/distro/distro/distroARCH-1.1/
# http://packages.vendor.org/distro/distro/distroARCH-1.1/
# http://mirror.school.edu/pub/distro/distroARCH-1.1/
# http://system.site3.com/distroARCH-1.1/
After: What is expected.
# ftp://mirrors.example.com/distro/distroARCH-1.1/
# http://mirrors.example.com/distro/distroARCH-1.1/
# ftp://packages.vendor.org/distro/distro/distroARCH-1.1/
http://packages.vendor.org/distro/distro/distroARCH-1.1/
# http://mirror.school.edu/pub/distro/distroARCH-1.1/
# http://system.site3.com/distroARCH-1.1/
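You need to use double quotes in order to expand shell variables. Inside single quotes the shell passes $mirrorRegex to sed literally, so the address never matches:
sed "/$mirrorRegex/s/# //" /etc/repos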
You can use awk like this to do the same:
awk '$0~var {sub(/^# /,x)}1' var="$mirrorRegex" file
sed 's|^#[[:blank:]]*\(http.*vendor.*distroARCH-1.1/.*\)|\1|' YourFile
Using a separator other than the default / (| in this case) saves you from escaping the slashes in the URL.
And a version with a variable:
Content='http.*vendor.*distroARCH-1.1/'
sed "s|^#[[:blank:]]*\(${Content}.*\)|\1|" YourFile
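If the shell quoting gets in the way, the same edit can be sketched in Python, where the pattern is an ordinary string and no expansion layer is involved. This is only an illustration: the function name uncomment_matching is invented, and the sample lines are taken from the question.

```python
import re

def uncomment_matching(text, pattern):
    """Strip the leading '#' (and following spaces) from lines matching pattern."""
    regex = re.compile(pattern)
    out = []
    for line in text.splitlines():
        if regex.search(line):
            # Only the comment marker at the start of the line is removed.
            line = re.sub(r'^#\s*', '', line)
        out.append(line)
    return '\n'.join(out)

repos = '\n'.join([
    '# ftp://packages.vendor.org/distro/distro/distroARCH-1.1/',
    '# http://packages.vendor.org/distro/distro/distroARCH-1.1/',
    '# http://mirror.school.edu/pub/distro/distroARCH-1.1/',
])
print(uncomment_matching(repos, r'^# http.*vendor.*distroARCH-1\.1/'))
```

Only the http line containing "vendor" loses its comment marker; the ftp line and the school mirror stay commented.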

Perl from command line: substitute regex only once in a file

I'm trying to replace a line in a configuration file. The problem is that I want to replace only one occurrence. Part of the file looks like this; it is the gitolite default config file:
# -----------------------------------------------------------------
# suggested locations for site-local gitolite code (see cust.html)
# this one is managed directly on the server
# LOCAL_CODE => "$ENV{HOME}/local",
# or you can use this, which lets you put everything in a subdirectory
# called "local" in your gitolite-admin repo. For a SECURITY WARNING
# on this, see http://gitolite.com/gitolite/cust.html#pushcode
# LOCAL_CODE => "$rc{GL_ADMIN_BASE}/local",
# ------------------------------------------------------------------
I would like to set LOCAL_CODE to something else from the command line. I thought I might do it in perl to get pcre convenience. I'm new to perl though and can't get it working.
I found this:
perl -i.bak -p -e 's/old/new/' filename
The problem is that -p seems to loop over the file line by line, so an 'o' modifier won't have any effect. However, without the -p option it doesn't seem to work...
A compact way to do this is
perl -i -pe '$done ||= s/old/new/' filename
Yet another one-liner:
perl -i.bak -p -e '$i = s/old/new/ if !$i' filename
There are probably a large number of Perl one-liners that will do this, but here is one:
perl -i.bak -p -e '$x++ if $x==0 && s/old/new/;' filename
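All three one-liners keep a flag across lines so that the substitution stops after its first success. If the whole file is read as one string instead, a substitution count does the same job. A minimal Python sketch, assuming that approach; the function name replace_first and the path /srv/gitolite/local are placeholders, not anything from gitolite:

```python
import re

def replace_first(text, pattern, replacement):
    """Replace only the first match of pattern in text."""
    # count=1 stops after the first substitution; MULTILINE lets ^ and $
    # match at line boundaries, since the whole file is one string here.
    return re.sub(pattern, replacement, text, count=1, flags=re.MULTILINE)

config = (
    '# LOCAL_CODE => "$ENV{HOME}/local",\n'
    '# LOCAL_CODE => "$rc{GL_ADMIN_BASE}/local",\n'
)
# The new value below is a hypothetical path used only for the demo.
print(replace_first(config, r'^# LOCAL_CODE\s*=>.*$', 'LOCAL_CODE => "/srv/gitolite/local",'))
```

The second LOCAL_CODE line is left untouched because the count is exhausted after the first match.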

how to print part of regex match in sed when parsing source code?

I would like to print part between regex match like this:
echo "this is foo and another foo quux" | sed 's/this\(.*\)another.*/\1/'
which prints
is foo and
which is perfectly OK, as I want to get the part between this and another printed.
But if I want to parse my source code and use:
cat source_code | sed 's/.*AdulterateFood\(.*\)DangerousFood.*/\1/'
and I do know that AdulterateFood and DangerousFood each occur only once in the source code, it still prints the whole file. I am wondering why. AdulterateFood and DangerousFood are on different lines.
Thank you for your suggestions.
sed prints each input line by default. If you don't want that behavior you need to add the -n option. If you then want it to print the lines that match your RE you have to add a "p" to the end of the substitution command to tell sed TO print that line. So this:
sed -n 's/.*AdulterateFood\(.*\)DangerousFood.*/\1/p' source_code
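seems to be what you're asking for, but since you didn't provide any sample input and expected output it's just a guess. Note that sed applies the pattern to one line at a time, so this only prints something when both names are on the same line.
To print all lines from AdulterateFood through DangerousFood, which is what you need when they are on different lines: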
sed -n '/AdulterateFood/,/DangerousFood/p' file
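If you want only the text between the two names rather than whole lines, one option is to match across line breaks on the full input. A minimal Python sketch of that, with an invented function name (text_between) and made-up sample source:

```python
import re

def text_between(source, start, end):
    """Return the text between the first start marker and the next end marker, or None."""
    # re.escape treats the markers literally; DOTALL lets '.' cross newlines,
    # and the lazy '.*?' stops at the first end marker.
    match = re.search(re.escape(start) + r'(.*?)' + re.escape(end), source, flags=re.DOTALL)
    return match.group(1) if match else None

code = 'int a;\nAdulterateFood(x);\nint b;\nDangerousFood(y);\n'
print(repr(text_between(code, 'AdulterateFood', 'DangerousFood')))
```

Unlike the single-line sed substitution, this sees the input as one string, so the markers may be any number of lines apart.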

Regex to batch rename files in OS X Terminal

I'm after a way to batch rename files with a regex, e.g.
s/123/onetwothree/g
I recall I can use awk and sed with a regex, but couldn't figure out how to pipe them together for the desired output.
You can install the Perl-based rename utility:
brew install rename
and then just use it like:
rename 's/123/onetwothree/g' *
If you'd like to test your regex without renaming any files, just add the -n switch.
An efficient way to perform the rename operation is to construct the rename commands in a sed pipeline and feed them into the shell.
ls |
sed -n 's/\(.*\)\(123\)\(.*\)/mv "\1\2\3" "\1onetwothree\3"/p' |
sh
NameChanger is super nice. It supports regular expressions for search and replace. For example, I am doing a fairly complex rename with the following regex:
\.sync-conflict-.*\.
That's a lifesaver.
On the command line (compare Diomidis's answer), regex capture groups are available in the replacement as the variables $1 and $2, so rename -nv 's/^(\d{2})\.(\d{2}).*/s$1e$2.mp4/' *.mp4 becomes possible. The $1 and $2 come from capture group one (\d{2}) and capture group two (\d{2}) in my example.
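To see what the capture groups do without touching any files, here is a small Python sketch that only computes the old and new names. Python writes the groups as \1 and \2 where Perl uses $1 and $2. The function name plan_renames and the file names are made up.

```python
import re

def plan_renames(names, pattern, replacement):
    """Return (old, new) pairs for names the pattern changes; nothing is renamed."""
    regex = re.compile(pattern)
    plan = []
    for name in names:
        new_name = regex.sub(replacement, name)
        if new_name != name:
            plan.append((name, new_name))
    return plan

files = ['01.02 pilot.mp4', 'notes.txt']
for old, new in plan_renames(files, r'^(\d{2})\.(\d{2}).*', r's\1e\2.mp4'):
    print(old, '->', new)
```

This mirrors the dry run you get from rename -n: names that do not match the pattern are simply left out of the plan.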
My take on a friendly recursive regex file name renamer, which by default only simulates the replacement and shows what the resulting file names would be.
Use -w to actually write changes when you are satisfied with the dry run result, -s to suppress displaying non-matching files; -h or --help will show usage notes.
Simplest usage:
# replace all occurrences of 'foo' with 'bar'
# "foo-foo.txt" >> "bar-bar.txt"
ren.py . 'foo' 'bar' -s
# only replace 'foo' at the beginning of the filename
# "foo-foo.txt" >> "bar-foo.txt"
ren.py . '^foo' 'bar' -s
Matching groups (e.g. \1, \2 etc) are supported too:
# rename "spam.txt" to "spam-spam-spam.py"
ren.py . '(.+)\.txt' '\1-\1-\1.py' -s
# rename "12-lovely-spam.txt" to "lovely-spam-12.txt"
# (assuming two digits at the beginning and a 3 character extension)
ren.py . '^(\d{2})-(.+)\.(.{3})' '\2-\1.\3' -s
NOTE: don't forget to add -w when you tested the results and want to actually write the changes.
Works both with Python 2.x and Python 3.x.
#!/usr/bin/python
# -*- coding: utf-8 -*-
from __future__ import print_function
import argparse
import os
import fnmatch
import sys
import shutil
import re
def rename_files(args):
    pattern_old = re.compile(args.search_for)
    for path, dirs, files in os.walk(os.path.abspath(args.root_folder)):
        for filename in fnmatch.filter(files, "*.*"):
            if pattern_old.findall(filename):
                new_name = pattern_old.sub(args.replace_with, filename)
                filepath_old = os.path.join(path, filename)
                filepath_new = os.path.join(path, new_name)
                if not new_name:
                    print('Replacement regex {} returns empty value! Skipping'.format(args.replace_with))
                    continue
                print(new_name)
                if args.write_changes:
                    shutil.move(filepath_old, filepath_new)
            else:
                if not args.suppress_non_matching:
                    print('Name [{}] does not match search regex [{}]'.format(filename, args.search_for))
if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Recursive file name renaming with regex support')
    parser.add_argument('root_folder',
                        help='Top folder for the replacement operation',
                        nargs='?',
                        action='store',
                        default='.')
    parser.add_argument('search_for',
                        help='string to search for',
                        action='store')
    parser.add_argument('replace_with',
                        help='string to replace with',
                        action='store')
    parser.add_argument('-w', '--write-changes',
                        action='store_true',
                        help='Write changes to files (otherwise just simulate the operation)',
                        default=False)
    parser.add_argument('-s', '--suppress-non-matching',
                        action='store_true',
                        help='Hide files that do not match',
                        default=False)
    args = parser.parse_args(sys.argv[1:])
    print(args)
    rename_files(args)
for f in *; do
    newname=$(echo "$f" | sed 's/123/onetwothree/g')
    # only rename when the name actually changes
    [ "$f" != "$newname" ] && mv "$f" "$newname"
done