sed error "unterminated 's' command" troubleshooting - regex

I am building a script that will, among other things, replace a pattern in an XML file with a folder path.
The sed command I am trying to use is:
SEDCMD="s|PATHTOEXPORT|$2|"
where $2 is the command-line parameter that has the folder path in it.
This is later called:
sed -e $SEDCMD $FILTER > $TEMPFILTER
However, on running the command, I am getting an "unterminated 's' command" error.
How can I get around this? I've tried changing the characters used to separate the regex (from / to |). And I've tried quoting (in different ways) the command-line parameter.

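The shell is seeing and parsing the contents of $SEDCMD: because the expansion is unquoted, any whitespace in the folder path splits the command into several arguments, and sed receives only the first piece as its script. If you’re using this from a shell script, including a Makefile, you should always protect all your expanded variables with double quotes. The double quotes still allow variable interpolation but protect any shell metacharacters in the result from further interpretation.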
sed -e "$SEDCMD" "$FILTER" > "$TEMPFILTER"
I assume that $FILTER and $TEMPFILTER are filenames? I’ve quoted them, too, just in case they contain evil things like whitespace or other sorts of shell metacharacters; bizarre, yes, but it’s been known to happen. I regularly run rename 's/\s+/_/g' on filenames to clean them of whitespace, but for the others, you'll have to take a more careful approach; e.g., what to do with stars vs question marks vs brackets and parens, etc.
If you add -x and/or -v to your shell command line, you’ll get some trace debugging, which I think would likely have shown where you went amiss here.
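To make the effect of the quotes concrete, here is a minimal sketch. It assumes the folder path contains spaces, and the names export_path, FILTER and TEMPFILTER plus the sample XML line are made-up stand-ins for the question's real values.

```shell
# Hypothetical stand-ins for the question's $2, $FILTER and $TEMPFILTER.
export_path='/tmp/my export dir'
FILTER=$(mktemp)
TEMPFILTER=$(mktemp)
printf '%s\n' '<folder>PATHTOEXPORT</folder>' > "$FILTER"

SEDCMD="s|PATHTOEXPORT|$export_path|"

# Unquoted, the shell would split $SEDCMD at each space, and sed would
# receive only "s|PATHTOEXPORT|/tmp/my" as its script:
#   sed -e $SEDCMD $FILTER > $TEMPFILTER

# Quoted, the whole command reaches sed as a single argument.
sed -e "$SEDCMD" "$FILTER" > "$TEMPFILTER"
result=$(cat "$TEMPFILTER")
printf '%s\n' "$result"

rm -f "$FILTER" "$TEMPFILTER"
```

Note that a path containing the | delimiter itself would still break the command, so the delimiter has to be a character that cannot occur in the path.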

Related

Linux: rename files containing ASCII-Code for capital letters

I have a collection of files where the capital letters are replaced by their ASCII-code (example ;065 for A). How can I most effectively recursively rename them from the command line?
Since I don't want to make the mess worse, I unfortunately don't know how to test any commands...
For me it would be no problem to modify the command for each letter.
Many Linux distributions ship some variant or another of the Perl rename script, sometimes as prename, sometimes as rename. Any variant will do, but not the Linux rename utility that isn't written in Perl (run it with no argument and see if the help text mentions perl anywhere). This script runs Perl code on file names, typically a regex replacement.
prename -n 's/;(03[2-9]|0[4-9][0-9]|1[01][0-9]|12[0-6])/chr($1)/eg' *
I made a regular expression that matches three-digit numbers that are the character code of a printable ASCII character. You may need to adjust it depending on exactly what can follow a semicolon. The * at the end says to rename all files in the current directory; it's just a normal shell wildcard. It's ok to include files that don't contain anything to rename: prename will just skip them.
The -n option says to show what would be done, but don't actually rename any file. Review the output. If you're happy with it, run the command again without -n to actually rename the files.

Where is this Regex expression not closed in sed (apostrophe parenthesis)?

I'm trying to update some setting for wordpress and I need to use sed. When I run the below command, it seems to think the line is not finished. What am I doing wrong?
$ sed -i 's/define\( \'DB_NAME\', \'database_name_here\' \);/define\( \'DB_NAME\', \'wordpress\' \);/g' /usr/share/nginx/wordpress/wp-settings.php
> ^C
Thanks.
Single quotes in most shells don't support any escaping. If you want to include a single quote, you need to close the single quotes and add the single quote - either in double quotes, or backslashed:
sed 's/define\( '\''DB_NAME'\'', '\''database_name_here'\'' \);/define\( '\''DB_NAME'\'', '\''wordpress'\'' \);/g'
I fear it still wouldn't work for you, as \( is special in sed. You probably want just a simple ( instead.
sed 's/define( '\''DB_NAME'\'', '\''database_name_here'\'' );/define( '\''DB_NAME'\'', '\''wordpress'\'' );/g'
or
sed 's/define( '"'"'DB_NAME'"'"', '"'"'database_name_here'"'"' );/define( '"'"'DB_NAME'"'"', '"'"'wordpress'"'"' );/g'
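A quick way to convince yourself that both spellings produce the same text is to assign them to variables and print them. This is only a sketch with a shortened string, and the variable names are arbitrary.

```shell
# Close the single quotes, add a backslashed quote, reopen the single quotes.
backslashed='define( '\''DB_NAME'\'' );'

# Close the single quotes, add a double-quoted quote, reopen the single quotes.
double_quoted='define( '"'"'DB_NAME'"'"' );'

printf '%s\n' "$backslashed" "$double_quoted"
```

Both lines print the same string, with literal single quotes around DB_NAME.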
Normally, using single quotes around a sed script is sensible. This is a case where double quotes would be a better choice, since there are no shell metacharacters other than single quotes in the sed script:
sed -e "s/define( 'DB_NAME', 'database_name_here' );/define( 'DB_NAME', 'wordpress' );/g" /usr/share/nginx/wordpress/wp-settings.php
or:
sed -e "s/\(define( 'DB_NAME', '\)database_name_here' );/\1wordpress' );/g" /usr/share/nginx/wordpress/wp-settings.php
or even:
sed -e "/define( 'DB_NAME', 'database_name_here' );/s/database_name_here/wordpress/g" /usr/share/nginx/wordpress/wp-settings.php
One other option to consider is using sed's -f option to provide the script as a file. That saves you from having to escape the script contents from the shell. The downside may be that you have to create the file, run sed using it, and then remove the file. It is likely that's too painful for the current task, but it can be sensible — it can certainly make life easier when you don't have to worry about shell escapes.
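As a rough sketch of the -f approach, the script can be written with a quoted here-document so that the shell leaves its contents alone. The temporary file name comes from mktemp, and the input line is just the default value from the question.

```shell
script=$(mktemp)   # hypothetical temporary script file

# The quoted 'EOF' stops the shell from interpreting anything in the body,
# so the sed command can be written exactly as sed should see it.
cat > "$script" <<'EOF'
s/define( 'DB_NAME', 'database_name_here' );/define( 'DB_NAME', 'wordpress' );/
EOF

out=$(printf '%s\n' "define( 'DB_NAME', 'database_name_here' );" | sed -f "$script")
printf '%s\n' "$out"

rm -f "$script"
```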
I'm not convinced the g (global replace) option is relevant; how many single lines are you going to find in the settings file containing two independent define DB_NAME operations with the default value?
You can add the -i option when you've got the basic code working. Do note that if you might ever work on macOS or a BSD-based system, you'll need to provide a suffix as an extra argument to the -i option (e.g. -i '' for a null suffix, meaning no backup). Alternatively, use -i.bak to be able to work reliably on both Linux (or, more accurately, with GNU sed) and macOS and BSD (or, more accurately, with BSD sed). Appealing to POSIX is no help; it doesn't support an overwrite option.
Test case (second example):
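A small sketch of the -i.bak form, run against a scratch file rather than the real settings file (the file created by mktemp is a stand-in):

```shell
conf=$(mktemp)   # hypothetical stand-in for the real settings file
printf '%s\n' "define( 'DB_NAME', 'database_name_here' );" > "$conf"

# The suffix is attached directly to -i, which both GNU sed and BSD sed accept.
sed -i.bak -e "s/database_name_here/wordpress/" "$conf"

edited=$(cat "$conf")
backup=$(cat "$conf.bak")
printf 'edited: %s\nbackup: %s\n' "$edited" "$backup"

rm -f "$conf" "$conf.bak"
```

The .bak copy keeps the original line, which is handy while the expression is still being tuned.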
$ echo "define( 'DB_NAME', 'database_name_here' );" |
> sed -e "s/\(define( 'DB_NAME', '\)database_name_here' );/\1wordpress' );/g"
define( 'DB_NAME', 'wordpress' );
$
If the spacing around 'DB_NAME' is not consistent, then you'd end up with more verbose regular expressions, using [[:space:]]* in lieu of blanks, and you'd find the third alternative better than the others, but the second could capture both the leading and trailing contexts and use both captures in the replacement.
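As an illustration of that last point, here is one possible spelling of the second alternative with [[:space:]]* and two captures. The input line with irregular spacing is invented, and the exact places where spacing may vary are an assumption.

```shell
# Hypothetical input with no space after "(" and extra spaces after the comma.
line="define('DB_NAME',   'database_name_here');"

# \1 captures everything up to the opening quote of the value,
# \2 captures the closing quote and the rest of the statement.
out=$(printf '%s\n' "$line" |
  sed -e "s/\(define([[:space:]]*'DB_NAME'[[:space:]]*,[[:space:]]*'\)database_name_here\('[[:space:]]*);\)/\1wordpress\2/")

printf '%s\n' "$out"
```

Because both contexts are put back from the captures, the original spacing of the line is preserved.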
Parting words: this technique works this time because the patterns don't involve shell metacharacters like $ or `. Very often, the script does need to match those, and then using mainly single quotes around the script argument is sensible. Tackling a different task: replace $DB_NAME in the input with the value of the shell variable $DB_NAME (leaving $DB_NAMEORHOST unchanged):
sed -e 's/$DB_NAME\([^[:alnum:]]\)/'"$DB_NAME"'\1/'
There are three separate shell strings, all concatenated with no spaces. The first is single-quoted and contains the s/…/ part of a s/…/…/ command; the second is "$DB_NAME", the value of the shell variable, double-quoted so that if the value of $DB_NAME is 'autonomous vehicle recording', you still have a single argument to sed; the third is the '\1/' part, which puts back whatever character followed $DB_NAME in the input text (with the observation that if $DB_NAME could appear at the end of an input line, this would not match it).
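A runnable sketch of that command, using an invented input line and the example value mentioned above:

```shell
DB_NAME='autonomous vehicle recording'

# Single quotes keep the literal text $DB_NAME in the sample input.
input='name=$DB_NAME; host=$DB_NAMEORHOST;'

out=$(printf '%s\n' "$input" |
  sed -e 's/$DB_NAME\([^[:alnum:]]\)/'"$DB_NAME"'\1/')

printf '%s\n' "$out"
```

This only behaves well while the value contains no /, & or backslash, since sed would treat those specially in the replacement.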
Most regexes do fuzzy matching; you have to consider variations on what might be in the input to determine how hard your regular expressions have to work to identify the material accurately.

egrep: how to search for text that includes double quotes (win7 cmd window)

I'm trying to make a TOC in my HTML file by searching for all HTML tags that contain one of three classes: article, section, and subsection.
I'm using GNU grep 2.4.2 in a Windows 7 cmd window. Now I've read at least 12 pages from my Google search and tried 20+ permutations of my grep command. I'm trying to find classes in my HTML file. Luckily in my HTML file there is only one HTML tag per line in the HTML file, which simplifies things.
I made a cmd batch file and tried running this and got various errors. I've tried escaping the double quotes, and not escaping them. I tried escaping the parens and not escaping them. I've tried different switches, with and without -E, etc. This is the regex I need to search for on every line and print the lines that match.
/class="\(article\|section\|subsection\)"/
This is one of my later grep attempts.
grep -i -E 'class="\(article\|section\|subsection\)"' ch18IP.htm
In this example I'm not getting any lines returned nor any error message. What am I doing wrong here?
Thank you!
You have three problems:
1) double quote " literals must be escaped as \" when using grep on windows.
2) meta-characters (, ), and | should only be escaped as \(, \), and \| when using basic mode. The -E extended regex option uses the more traditional unescaped form. This is documented at http://www.gnu.org/software/grep/manual/html_node/Basic-vs-Extended.html
3) If a parameter requires quoting on Windows, then double quotes are used, not single quotes. But in this case, enclosing quotes are not required, and would actually get in the way. I'll explain this later in the answer.
I also suggest that you add a word boundary assertion \b before class so that you don't mistakenly match something like subclass.
So either of the following should work:
grep -i -E \bclass=\"(article|section|subsection)\" ch18IP.htm
grep -i \bclass=\"\(article\|section\|subsection\)\" ch18IP.htm
It gets tricky if you want to enclose your search argument in quotes because the search term also includes quote literals, as well as poison characters like | that have special meaning to the cmd "shell". So you may end up having to escape some characters for both grep and cmd.exe. See https://stackoverflow.com/a/19816688/1012053 for more info.
In your case, here are two options for how you could quote your search term for Windows.
grep -i -E ^"\bclass=\"(article|section|subsection)\"^" ch18IP.htm
grep -i -E "\bclass=\"(article^|section^|subsection)\"" ch18IP.htm
That last form looks mighty weird if you decide to use the basic regex:
grep -i "\bclass=\"\(article\^|section\^|subsection\)\"" ch18IP.htm
Getting double-quotes as input on Windows cmd.exe command line is notoriously problematic. See if this works for you: https://www.gnu.org/software/gawk/manual/html_node/DOS-Quoting.html
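Point 2 can be checked independently of cmd.exe quoting. The sketch below assumes a POSIX shell with GNU grep (not a Windows cmd window), and the three HTML lines are made up. It shows that combining -E with backslashed parentheses and bars finds nothing, which matches the symptom in the question.

```shell
# Hypothetical sample input: one tag per line, as in the question.
html='<h2 class="section">Intro</h2>
<p class="body">text</p>
<h3 class="subsection">Details</h3>'

# -E with unescaped ( | ): these are grouping and alternation.
ere_hits=$(printf '%s\n' "$html" |
  grep -c -E 'class="(article|section|subsection)"') || true

# -E with backslashes: \( \| \) now stand for literal characters,
# so no line matches and grep -c reports 0.
mixed_hits=$(printf '%s\n' "$html" |
  grep -c -E 'class="\(article\|section\|subsection\)"') || true

printf 'unescaped: %s, escaped: %s\n' "$ere_hits" "$mixed_hits"
```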

Create directory based on part of filename

First of all, I'm not a programmer — just trying to learn the basics of shell scripting and trying out some stuff.
I'm trying to create a function for my bash script that creates a directory based on a version number in the filename of a file the user has chosen in a list.
Here's the function:
lav_mappe () {
shopt -s failglob
echo "[--- Choose zip file, or x to exit ---]"
echo ""
echo ""
select zip in $SRC/*.zip
do
[[ $REPLY == x ]] && . $HJEM/build
[[ -z $zip ]] && echo "Invalid choice" && continue
echo
grep ^[0-9]{1}\.[0-9]{1,2}\.[0-9]{1,2}$ $zip; mkdir -p $MODS/out/${ver}
done
}
I've tried messing around with some other commands too:
for ver in $zip; do
grep "^[0-9]{1}\.[0-9]{1,2}\.[0-9]{1,2}$" $zip; mkdir -p $MODS/out/${ver}
done
And also find | grep — but I'm doing it wrong :(
But it ends up saying "no match" for my regex pattern.
I'm trying to take the filename the user has selected, then grep it for the version number (ALWAYS x.xx.x somewhere in the filename), and finally create a directory with just that.
Could someone give me some pointers what the command chain should look like? I'm very unsure about the structure of the function, so any help is appreciated.
EDIT:
OK, this is what the complete function looks like now. (Please note: the sed(1) commands, apart from the directory creation, were not written by me, just implemented in my code.)
Pastebin (Long code.)
I've got news for you. You are writing a Bash script, you are a programmer!
Your Regular Expression (RE) is of the "wrong" type. Vanilla grep uses a form known as "Basic Regular Expressions" (BRE), but your RE is in the form of an Extended Regular Expression (ERE). BREs are used by vanilla grep, vi, more, etc. EREs are used by just about everything else: awk, Perl, Python, Java, .Net, etc. Another problem is that you are trying to look for that pattern in the file's contents, not in the filename!
There is an egrep command, or you can use grep -E, so:
echo $zip|grep -E '^[0-9]\.[0-9]{1,2}\.[0-9]{1,2}$'
(note that single quotes are safer than double). By the way, you use ^ at the front and $ at the end, which means the filename ONLY consists of a version number, yet you say the version number is "somewhere in the filename". You don't need the {1} quantifier, that is implied.
BUT, you don't appear to be capturing the version number either.
You could use sed (we also need the -E):
ver=$(echo $zip| sed -E 's/.*([0-9]\.[0-9]{1,2}\.[0-9]{1,2}).*/\1/')
The \1 on the right means "replace everything (that's why we have the .* at front and back) with what was matched in the parentheses group".
That's a bit clunky, I know.
Now we can do the mkdir (there is no merit in putting everything on one line, and it makes the code harder to maintain):
mkdir -p "$MODS/out/$ver"
${ver} is unnecessary in this case, but it is a good idea to enclose path names in double quotes in case any of the components have embedded white-space.
So, good effort for a "non-programmer", particularly in generating that RE.
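Putting the two steps together, a minimal sketch might look like this. The MODS directory is a temporary stand-in and the zip name is invented; in the real function $zip would come from select.

```shell
MODS=$(mktemp -d)                   # hypothetical stand-in for the script's $MODS
zip='/src/mymod-1.12.3-final.zip'   # hypothetical selected file name

# Extract the version number from the name, then create the directory.
ver=$(printf '%s\n' "$zip" |
  sed -E 's/.*([0-9]\.[0-9]{1,2}\.[0-9]{1,2}).*/\1/')
mkdir -p "$MODS/out/$ver"

if [ -d "$MODS/out/$ver" ]; then made=yes; else made=no; fi
printf 'version=%s created=%s\n' "$ver" "$made"

rm -rf "$MODS"
```

If the name contains no version number, sed passes it through unchanged, so a real script would want to check for that before calling mkdir.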
Now for Lesson 2
Be careful about using this solution in a general loop. Your question specifically uses select, so we cannot predict which files will be used. But what if we wanted to do this for every file?
Using the solution above in a for or while loop would be inefficient. Calling external processes inside a loop is always bad. There is nothing we can do about the mkdir without using a different language like Perl or Python. But sed, by its nature, is iterative, and we should use that feature.
One alternative would be to use shell pattern matching instead of sed. This particular pattern would not be impossible in the shell, but it would be difficult and raise other questions. So let's stick with sed.
A problem we have is that echo output places a space between each field. That gives us a couple of issues. sed delimits each record with a newline "\n", so echo on its own won't do here. We could replace each space with a new-line, but that would be an issue if there were spaces inside a filename. We could do some trickery with IFS and globbing, but that leads to unnecessary complications. So instead we will fall back to good old ls. Normally we would not want to use ls, shell globbing is more efficient, but here we are using the feature that it will place a new-line after each filename (when used redirected through a pipe).
while read ver
do
mkdir "$ver"
done < <(ls $SRC/*.zip|sed -E 's/.*([0-9]{1}\.[0-9]{1,2}\.[0-9]{1,2}).*/\1/')
Here I am using process substitution, and this loop will only call ls and sed once. BUT, it calls the mkdir program n times.
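For shells without process substitution, a possible variation (not the approach above, just a sketch) is to let the printf builtin emit one glob result per line and pipe that through sed. SRC and OUT here are temporary directories created for the demonstration, and the zip names are invented.

```shell
SRC=$(mktemp -d)   # hypothetical source directory
OUT=$(mktemp -d)   # hypothetical output directory
: > "$SRC/pack-1.12.3.zip"
: > "$SRC/pack-2.0.1.zip"

# printf repeats its format for every argument, so each name gets its own line;
# sed still runs only once for the whole list.
printf '%s\n' "$SRC"/*.zip |
  sed -E 's/.*([0-9]\.[0-9]{1,2}\.[0-9]{1,2}).*/\1/' |
  while IFS= read -r ver
  do
    mkdir -p "$OUT/$ver"
  done

if [ -d "$OUT/1.12.3" ] && [ -d "$OUT/2.0.1" ]; then created=yes; else created=no; fi
printf 'created=%s\n' "$created"

rm -rf "$SRC" "$OUT"
```

File names containing newlines would still confuse this, just as they would with ls.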
Lesson 3
Sorry, but that's still inefficient. We are creating a child process for each iteration. Creating a directory needs only one kernel API call, yet we are creating a process just for that? Let's use a more sophisticated language like Perl:
#!/usr/bin/perl
use warnings;
use strict;
my $SRC = '.';
for my $file (glob("$SRC/*.zip"))
{
$file =~ s/.*([0-9]{1}\.[0-9]{1,2}\.[0-9]{1,2}).*/$1/;
mkdir $file or die "Unable to create $file; $!";
}
You might like to note that your RE has made it through to here! But now we have more control, and no child processes (mkdir in Perl is a built-in, as is glob).
In conclusion, for small numbers of files, the sed loop above will be fine. It is simple, and shell based. Calling Perl just for this from a script will probably be slower since perl is quite large. But shell scripts which create child processes inside loops are not scalable. Perl is.

Perl regex: replace all backslashes with double-backslashes

Within a set of large files, I need to replace all occurrences of "\" with "\\". I'd like to use Perl for this purpose. Right now, I have the following:
perl -spi.bak -e '/s/\\/\\\\/gm' inputFile
This command was suggested to me, but it results in no change to inputFile (except an updated timestamp). Thinking that the problem might be that the "\"s were not surrounded by blanks, I tried
perl -spi.bak -e '/s/.\\./\\\\/gm' inputFile
Again, this had no effect on the file. Finally, I thought I might be missing a semicolon, so I tried:
perl -spi.bak -e '/s/.\\./\\\\/gm;' inputFile
This also has no effect. I know that my file contains "\"s, for example in the following line:
("C:\WINDOWS\system32\iac25_32.ax","Indeo audio)
I'm not sure whether there is a problem with the regex, or if something is wrong with the way I'm invoking Perl. I have a basic understanding of regexes, but I'm an absolute beginner when it comes to Perl.
Is there anything obviously wrong here? One thing I notice is that the command returns quite quickly, despite the fact that inputFile is ~10MB in size.
The hard part with handling backslashes in command lines is knowing how many processes are going to manipulate the command line - and what their quoting rules are.
On Unix, under any shell, the first command line you show would work once the stray / in front of the s is removed.
You appear to be on Windows, and there, you have the DOS command 'shell' to deal with.
I would put the replacement into a file and pass that to Perl:
#!/bin/perl -spi.bak
s/\\/\\\\/g;
That should do the trick - save as 'subber.pl' and then run:
perl subber.pl file1 ...
How about this? It should replace every \ with two \s.
s/\\/\\\\/g
perl -pi -e 's/\\/\\\\/g' inputfile
will replace all of them in one file
this
s/\\/\\\\/g
works for me
You've got a renegade / in front of the s at the beginning of the substitution
don't use
.\\.
otherwise it will trash whatever's before and after the \ in the file
perl -spi.bak -e 's/.\\./\\\\/gm;' inputFile
maybe?
Why did you type that leading /?
"You appear to be on Windows, and there, you have the DOS command 'shell' to deal with."
Hopefully I am not splitting hairs, but Windows hasn't come with DOS for a long time now. I think ME (circa 1999/2000) was the last version that still came with DOS or was built on DOS. The "command" shell has been controlled by cmd.exe since XP (a sort of DOS simulation), but the effects of running a Perl one-liner in a command shell might still be the same as running it in a DOS shell. Maybe someone can verify that.