Where is this Regex expression not closed in sed (apostrophe parenthesis)? - regex

I'm trying to update some setting for wordpress and I need to use sed. When I run the below command, it seems to think the line is not finished. What am I doing wrong?
$ sed -i 's/define\( \'DB_NAME\', \'database_name_here\' \);/define\( \'DB_NAME\', \'wordpress\' \);/g' /usr/share/nginx/wordpress/wp-settings.php
> ^C
Thanks.

Single quotes in most shells don't support any escaping. If you want to include a single quote, you need to close the single quotes and add the single quote - either in double quotes, or backslashed:
sed 's/define\( '\''DB_NAME'\'', '\''database_name_here'\'' \);/define\( '\''DB_NAME'\'', '\''wordpress'\'' \);/g'
I fear it still wouldn't work for you, as \( is special in sed. You probably want just a simple ( instead.
sed 's/define( '\''DB_NAME'\'', '\''database_name_here'\'' );/define( '\''DB_NAME'\'', '\''wordpress'\'' );/g'
or
sed 's/define( '"'"'DB_NAME'"'"', '"'"'database_name_here'"'"' );/define( '"'"'DB_NAME'"'"', '"'"'wordpress'"'"' );/g'

Normally, using single quotes around the script of a sed script is sensible. This is a case where double quotes would be a better choice — there are no shell metacharacters other than single quotes in the sed script:
sed -e "s/define( 'DB_NAME', 'database_name_here' );/define( 'DB_NAME', 'wordpress' );/g" /usr/share/nginx/wordpress/wp-settings.php
or:
sed -e "s/\(define( 'DB_NAME', '\)database_name_here' );/\1wordpress' );/g" /usr/share/nginx/wordpress/wp-settings.php
or even:
sed -e "/define( 'DB_NAME', 'database_name_here' );/s/database_name_here/wordpress/g" /usr/share/nginx/wordpress/wp-settings.php
One other option to consider is using sed's -f option to provide the script as a file. That saves you from having to escape the script contents from the shell. The downside may be that you have to create the file, run sed using it, and then remove the file. It is likely that's too painful for the current task, but it can be sensible — it can certainly make life easier when you don't have to worry about shell escapes.
I'm not convinced the g (global replace) option is relevant; how many single lines are you going to find in the settings file containing two independent define DB_NAME operations with the default value?
You can add the -i option when you've got the basic code working. Do note that if you might ever work on macOS or a BSD-based system, you'll need to provide a suffix as an extra argument to the -i option (e.g. -i '' for a null suffix or no backup; or -i.bak to be able to work reliably on both Linux (or, more accurately, with GNU sed) and macOS and BSD (or, more accurately, with BSD sed). Appealing to POSIX is no help; it doesn't support an overwrite option.
Test case (first example):
$ echo "define( 'DB_NAME', 'database_name_here' );" |
> sed -e "s/\(define( 'DB_NAME', '\)database_name_here' );/\1wordpress' );/g"
define( 'DB_NAME', 'wordpress' );
$
If the spacing around 'DB_NAME' is not consistent, then you'd end up with more verbose regular expressions, using [[:space:]]* in lieu of blanks, and you'd find the third alternative better than the others, but the second could capture both the leading and trailing contexts and use both captures in the replacement.
Parting words: this technique works this time because the patterns don't involve shell metacharacters like $ or  ` . Very often, the script does need to match those, and then using mainly single quotes around the script argument is sensible. Tackling a different task — replace $DB_NAME in the input with the value of the shell variable $DB_NAME (leaving $DB_NAMEORHOST unchanged):
sed -e 's/$DB_NAME\([^[:alnum:]]\)/'"$DB_NAME"'\1/'
There are three separate shell strings, all concatenated with no spaces. The first is single-quoted and contains the s/…/ part of a s/…/…/ command; the second is "$DB_NAME", the value of the shell variable, double-quoted so that if the value of $DB_NAME is 'autonomous vehicle recording', you still have a single argument to sed; the third is the '\1/' part, which puts back whatever character followed $DB_NAME in the input text (with the observation that if $DB_NAME could appear at the end of an input line, this would not match it).
Most regexes do fuzzy matching; you have to consider variations on what might be in the input to determine how hard your regular expressions have to work to identify the material accurately.

Related

Trying to do a simple sed substitution but I'm confused about what needs to be escaped

I have this string: '$'nwnwnwnnn
And want to change it to: { bitset<9>(0bnwnwnwnnn), '$'},
I've looked at many similar questions for different shells using their methods but nothing has worked. I'm generally in zsh but I can use bash or another shell.
The general form I've been trying is this:
sed -E -i new s/(\'.\')([nw]+)/{ bitset<9>(0b\2), \1},/g thing.txt
It should work for any character other than $ and any sequence of n or w.
I'm generally confused as to what I need to escape here. Some answers on this site said to escape the parenthesis in the first part of the substitution.
Am I using -i incorrectly?
You need to escape the parentheses to create a capture group if you're using basic regexp, you don't escape them if you're using extended regexp. The -E option to GNU sed, and the -r option to standard sed, enable extended regexp, so you don't need to escape them.
If you only want to match $ rather than allow any character in the quotes, you need an escaped $.
You need to put the entire s/// command inside quotes, as it must be a single argument to the sed command.
When using -i, it's conventional to put a . before the suffix. Also, the suffix is put on the saved copy of the original file, not the new file that you're creating with the changes, so new is a poor suffix.
sed -E -i .bak "s/('\$')([nw]+)/{ bitset<9>(0b\2), \1},/g" thing.txt

sed regexp, number reformatting: how to escape for bash

I have a working (in macOS app Patterns) RegExp that reformats GeoJSON MultiPolygon coordinates, but don't know how to escape it for sed.
The file I'm working on is over 90 Mb in size, so bash terminal looks like the ideal place and sed the perfect tool for the job.
Search Text Example:
[[[379017.735,6940036.7955],[379009.8431,6940042.5761],[379000.4869,6940048.9545],[378991.5455,6940057.8128],[378984.0665,6940066.0744],[378974.7072,6940076.2152],[378962.8639,6940090.5283],[378954.5822,6940101.4028],[378947.9369,6940111.3128],[378941.4564,6940119.5094],[378936.2565,6940128.1229],[378927.6089,6940141.4764],[378919.6611,6940154.0312],[378917.21,6940158.7053],[378913.7614,6940163.4443],[378913.6515,6940163.5893],[378911.4453,6940166.3531],
Desired outcome:
[[[37.9017735,69.400367955],[37.90098431,69.400425761],[37.90004869,69.400489545],[37.89915455,69.400578128],[37.89840665,69.400660744],[37.89747072,69.400762152],[37.89628639,69.400905283],[37.89545822,69.401014028],[37.89479369,69.401113128],[37.89414564,69.401195094],[37.89362565,69.401281229],[37.89276089,69.401414764],[37.89196611,69.401540312],[37.891721,69.401587053],[37.89137614,69.401634443],[37.89136515,69.401635893],[37.89114453,69.401663531],
My current RegExp:
((?:\[)[0-9]{2})([0-9]+)(\.)([0-9]+)(,)([0-9]{2})([0-9]+)(\.)([0-9]+(?:\]))
and reformatting:
$1\.$2$4,$6.$7$9
The command should be something along these lines:
sed -i -e 's/ The RegExp escaped /$1\.$2$4,$6.$7$9/g' large_file.geojson
But what should be escaped in the RegExp to make it work?
My attempts always complain of being unbalanced.
I'm sorry if this has already been answered elsewhere, but I couldn't find even after extensive searching.
Edit: 2017-01-07: I didn't make it clear that the file contains properties other than just the GPS-points. One of the other example values picked from GeoJSON Feature properties is "35.642.1.001_001", which should be left unchanged. The braces check in my original regex is there for this reason.
That regex is not legal in sed; since it uses Perl syntax, my recommendation would be to use perl instead. The regular expression works exactly as-is, and even the command line is almost the same; you just need to add the -p option to get perl to operate in filter mode (which sed does by default). I would also recommend adding an argument suffix to the -i option (whether using sed or perl), so that you have a backup of the original file in case something goes horribly wrong. As for quoting, all you need to do is put the substitution command in single quotation marks:
perl -p -i.bak -e \
's/((?:\[)[0-9]{2})([0-9]+)(\.)([0-9]+)(,)([0-9]{2})([0-9]+)(\.)([0-9]+(?:\]))/$1\.$2$4,$6.$7$9/g' \
large_file.geojson
If your data is just like you showed, you needn't worry about the brackets. You may use a POSIX ERE enabled with -E (or -r in some other distributions) like this:
sed -i -E 's/([0-9]{2})([0-9]*)\.([0-9]+)/\1.\2\3/g' large_file.geojson
Or a POSIX BRE:
sed -i 's/\([0-9]\{2\}\)\([0-9]*\)\.\([0-9]\+\)/\1.\2\3/g' large_file.geojson
See an online demo.
You may see how this regex works here (just a demo, not proof).
Note that in POSIX BRE you need to escape { and } in limiting / range quantifiers and ( and ) in grouping constructs, and the + quantifier, else they denote literal symbols. In POSIX ERE, you do not need to escape the special chars to make them special, this POSIX flavor is closer to the modern regexes.
Also, you need to use \n notation inside the replacement pattern, not $n.
A simple sed will do it:
$ echo "$var"
[[[379017.735,6940036.7955],[379009.8431,6940042.5761],[379000.4869,6940048.9545],[378991.5455,6940057.8128],[378984.0665,6940066.0744],[378974.7072,6940076.2152],[378962.8639,6940090.5283],[378954.5822,6940101.4028],[378947.9369,6940111.3128],[378941.4564,6940119.5094],[378936.2565,6940128.1229],[378927.6089,6940141.4764],[378919.6611,6940154.0312],[378917.21,6940158.7053],[378913.7614,6940163.4443],[378913.6515,6940163.5893],[378911.4453,6940166.3531],
$ echo "$var" | sed 's/\([0-9]\{3\}\)\./.\1/g'
[[[379.017735,6940.0367955],[379.0098431,6940.0425761],[379.0004869,6940.0489545],[378.9915455,6940.0578128],[378.9840665,6940.0660744],[378.9747072,6940.0762152],[378.9628639,6940.0905283],[378.9545822,6940.1014028],[378.9479369,6940.1113128],[378.9414564,6940.1195094],[378.9362565,6940.1281229],[378.9276089,6940.1414764],[378.9196611,6940.1540312],[378.91721,6940.1587053],[378.9137614,6940.1634443],[378.9136515,6940.1635893],[378.9114453,6940.1663531],

egrep: how to search for text that includes double quotes (win7 cmd window)

I'm trying to make a TOC in my HTML file by searching for all HTML tags that contain one of three classes: article, section, and subsection.
I'm using GNU grep 2.4.2 in a Windows 7 cmd window. Now I've read at least 12 pages from my Google search and tried 20+ permutations of my grep command. I'm trying to find classes in my HTML file. Luckily in my HTML file there is only one HTML tag per line in the HTML file, which simplifies things.
I made a cmd batch file and tried running this and got various errors. I've tried escaping the double quotes, and not escaping them. I tried escaping the parens and not escaping them. I've tried different switches, with and without -E, etc. This is the regex I need to search for on every line and print the lines that match.
/class="\(article\|section\|subsection\)"/
This is one of my later grep attempts.
grep -i -E 'class="\(article\|section\|subsection\)"' ch18IP.htm
In this example I'm not getting any lines returned nor any error message. What am I doing wrong here?
Thank you!
You have three problems:
1) double quote " literals must be escaped as \" when using grep on windows.
2) meta-characters (, ), and | should only be escaped as \(, \), and \| when using basic mode. The -E exended regex option uses the more traditional unescaped form. This is documented at http://www.gnu.org/software/grep/manual/html_node/Basic-vs-Extended.html
3) If a parameter requires quoting on Windows, then double quotes are used, not single quotes. But in this case, enclosing quotes are not required, and would actually get in the way. I'll explain this later in the answer.
I also suggest that you add a word boundry assertion \b before class so that you don't mistakenly match something like subclass.
So either of the following should work:
grep -i -E \bclass=\"(article|section|subsection)\" ch18IP.htm
grep -i \bclass=\"\(article\|section\|subsection\)\" ch18IP.htm
It gets tricky if you want to enclose your search argument in quotes because the search term also includes quote literals, as well as poison characters like | that have special meaning to the cmd "shell". So you may end up having to escape some characters for both grep and cmd.exe. See https://stackoverflow.com/a/19816688/1012053 for more info.
In your case, here are two options for how you could quote your search term for Windows.
grep -i -E ^"\bclass=\"(article|section|subsection)\"^" ch18IP.htm
grep -i -E "\bclass=\"(article^|section^|subsection)\"" ch18IP.htm
That last form looks mighty weird if you decide to use the basic regex:
grep -i "\bclass=\"\(article\^|section\^|subsection\)\"" ch18IP.htm
Getting double-quotes as input on Windows cmd.exe command line is notoriously problematic. See if this works for you: https://www.gnu.org/software/gawk/manual/html_node/DOS-Quoting.html

Replace more than 150000 character with sed

I want to replace this LONG string with sed
And I got the string from grep which I store it into variable var
Here is my grep command and my var variable :
var=$(grep -P -o "[^:]//.{0,}" /home/lazuardi/project/assets/static/admin/bootstrap3/css/bootstrap.css.map | grep -P -o "//.{0,})
Here is the output from grep : string
Then I try to replace it with sed command
sed -i "s|$var||g" /home/lazuardi/project/assets/static/admin/bootstrap3/css/bootstrap.css.map
But it give me output bash: /bin/sed: Argument list too long
How can I replace it?
NB : That string has 183544 character in one line.
What are you actually trying to accomplish here? sed is line-oriented, so you cannot replace a multi-line string (not even if you replace literal newlines with \n .... Well, there are ways to write a sed script which effectively replaces a sequence of lines, but it gets tortured quickly).
bash$ var=$(head -n 2 /etc/mtab)
bash$ sed "s|$var||" /etc/mtab
sed: -e expression #1, char 25: unterminated `s' command
bash$ sed "s|${var//$'\n'/\\n}||" /etc/mtab | diff -u /etc/mtab -
bash$ # (didn't replace anything, so no output)
As a workaround, what you probably want could be approached by replacing the newlines in $var with \| (or possibly just |, depending on your sed dialect) similarly to what was demonstrated above, but you'd still be bumping into the ARG_MAX limit and have a bunch of other pesky wrinkles to iron out, so let's not go there.
However, what you are attempting can be magnificently completed by sed itself, all on its own. You don't need a list of the strings; after all, sed too can handle regular expressions (and nothing in the regex you are using actually requires Perl extensions, so the -P option is by and large superfluous).
sed -i 's%\([^:]\)//.*%\1%' file
There is a minor caveat -- if there are strings which occur both with and without : in front, your original command would have replaced them all (if it had worked), whereas this one will only replace the occurrences which do not have a colon in front. That means comments at beginning of line will not be touched -- if you want them removed too, just add a line anchor as an alternative; sed -i 's%\(^\|[^:]\)//.*%\1%' file
If you want the comments in var for other reasons, the grep can be cleaned up significantly, too. (Obviously, you'd run this before performing the replacement.)
var=$(grep -P -o '[^:]\K//.*' file)
(The \K extension is one which genuinely requires -P. And of course, the common, clear, standard, readable, portable, obvious, simple way to write {0,} is *.)
On most systems these days, the value of ARG_MAX is big enough to handle 150k without problems, but it is important to note that while the limit is called ARG_MAX and the error message indicates that the command line is too long, the real limit is the sum of the sizes of the arguments and all (exported) environment variables. Also, Linux imposes a limit of 128k (131,072 bytes) for a single argument string. Exceeding any of these limits triggers an error return of E2BIG, which is printed as "Argument list too long".
In any case, bash built-ins are exempt from the limit, so you should be able to feed the command into sed as a command file:
echo "s|$var||g" | sed -f - -i /home/lazuardi/project/assets/static/admin/bootstrap3/css/bootstrap.css.map
That may not help you much, though. Your variable is full of regex metacharacters, so it will not match the string itself. You'll need to clean it up in order to be able to use it as a regular expression.
There's probably a cleaner way to do that edit, though.

Simplest, Safe Method for Trimming File Paths

I have a script that does a lot of file-processing, and it's good enough to receive its paths using null-characters as a separator for safety.
However, it process all paths as absolute (saves some headaches), but these are a bit unwieldy for output purposes, so I'd like to remove a chunk of the path from my output. Now, plenty of options spring to mind, but the difficulty is in using these in a way that's safe for any arbitrary path that I might encounter, which is where things get a bit trickier.
Here's a quick example:
#!/bin/sh
TARGET="$1"
find "$TARGET" -print0 | while IFS= read -rd '' path; do
# Process path for output here
path_str="$path"
echo "$path_str"
done
So in the above script I want to take path and remove TARGET from it, in the most compatible way possible (e.g - nothing bash specific), it needs to be able to remove only from the start of the string, i.e - /foo/bar becomes bar, /foo/bar/foo becomes bar/foo and /bar/foo remains /bar/foo. It should also cope with any possible characters in a file-name, including characters that some file-systems support such as tildes, colons etc., as well as pesky inverted quotation characters.
I've hacked together some messy solutions using sed by first escaping any characters that might break my regular expression, but this is a very messy way of doing things, so I'm hoping there are some simpler methods out there. In case there isn't, here's by solution so far:
SAFE_CHARS='s:\([[/.*]\):\\\1:g'
target_safe=$(printf '%s' "$TARGET" | sed "$SAFE_CHARS")
path_str=$(printf '%s' "$path" | sed "s/^$target_safe//g')
There's probably a few characters missing that I should be escaping in addition to those ones, and apologies for any typos.
To remove a prefix from a string,
$ TARGET=/foo/
$ path=/foo/bar
$ echo "${path#$TARGET}"
bar
The # operator for parameter expansion is part of the POSIX standard and will work in any POSIX-compliant shell.
You can try this simple find:
export TARGET="$1"
find "$TARGET" -exec bash -c 'sed "s|^$TARGET\/||" <<< "$1"' - '{}' \;