Enclosing strings with forward slashes using AWK - regex

I have a PHP file in which the split() function was used extensively. I replaced it with preg_split using sed and find. The problem now is that preg_split requires the regex pattern to be enclosed in delimiters, while split does not.
I have tried using sed to enclose the strings in delimiters, but as far as I know sed cannot do it. I have been told that AWK can solve this problem.
I want
preg_split('\r\n', $some_string);
to be modified as
preg_split('/\r\n/', $some_string);
where the forward slashes work as delimiters. How can this be done using AWK?

sed is perfectly capable of this.
sed "s:\(preg_split('\)\([^']*\)':\1/\2/':g" file.php
Your sed dialect might want a different mix of backslashes; or use Perl (or, ugh, PHP):
perl -pi~ -e "s:(preg_split\(')([^']*)':\$1/\$2/':g" file.php
(Notice the -i flag for in-place editing; perhaps your sed supports that, too? The \$ keeps the shell from expanding $1 and $2 inside the double quotes.)
I'm imagining your problem was with quoting rather than with the actual sed regex. Getting single quotes properly quoted in the shell can be a challenge. (In the worst case, put your shell script in a file so the shell won't see it.) And of course, using a different delimiter instead of slash makes the expression simpler.

That should work as you expect:
sed "s#preg_split('\(.*\)'#preg_split('/\1/'#g"
As @Stephen P mentioned in a comment, you can use different delimiters with sed. If your delimiter occurs in the regex or the replacement string, you have to escape it with \. It is always simpler to pick a delimiter that does not appear in either. Here, I used #. Note that .* is greedy, so on a line containing several quoted strings it matches up to the last single quote.
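To make the quoting issue from the answers above concrete, here is a minimal sketch that wraps a sed substitution in a shell function and runs it on the line from the question. The function name add_delims is made up for illustration, and the sketch assumes the patterns contain no single quotes or forward slashes of their own and have not been wrapped already.

```shell
#!/bin/sh
# add_delims (hypothetical name): read PHP source on stdin and wrap the
# single-quoted first argument of every preg_split( call in forward slashes.
add_delims() {
    # The sequence '\'' closes the single-quoted shell string, emits a literal
    # single quote, and reopens the string, so sed receives:
    #   s#\(preg_split('\)\([^']*\)'#\1/\2/'#g
    sed 's#\(preg_split('\''\)\([^'\'']*\)'\''#\1/\2/'\''#g'
}

# Demo on the line from the question.
printf '%s\n' "preg_split('\\r\\n', \$some_string);" | add_delims
# prints: preg_split('/\r\n/', $some_string);
```

Using # as the s delimiter means the forward slashes in the replacement need no escaping. Running the function twice on the same file would wrap the pattern twice, so it is worth keeping a backup.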

Related

Expand environment variable inside Perl regex

I am having trouble with a short bash script. It seems like all forward slashes need to be escaped. How can the required characters in expanded (environment) variables be escaped before perl reads them? Or is there some other method that perl understands?
This is what I am trying to do, but this will not work properly.
eval "perl -pi -e 's/$HOME\/_TV_rips\///g'" '*$videoID.info.json'
That is part of a longer script where videoID=$1. (And for some reason perl expands variables both within single and double quotes.)
This simple workaround with no forward slash in the expanded environment variable $USER works. But I would like to not have /Users/ hard coded:
eval "perl -pi -e 's/\/Users\/$USER\/_TV_rips\///g'" '*$videoID.info.json'
This is probably solvable in some better way fetching home dir for files or something else. The goal is to remove the folder name in youtube-dl's json data.
I am using perl just because it can handle extended regex. But perl is not required. Any better substitute for extended regex on macOS is welcome.
You are building the following Perl program:
s//home/username\/_TV_rips\///g
That's quite wrong.
You shouldn't be attempting to build Perl code from the shell in the first place. There are a few ways you could pass values to the Perl code instead of generating Perl code. Since the value is conveniently in the environment, we can use
perl -i -pe's/\Q$ENV{HOME}\E\/_TV_rips\///' *"$videoID.info.json"
or better yet
perl -i -pe's{\Q$ENV{HOME}\E/_TV_rips/}{}' *"$videoID.info.json"
(Also note the lack of eval and the fixed quoting on the glob.)
Just assembling the ideas in comments, this should achieve what you expected :
perl -pi -e 's{$ENV{HOME}/_TV_rips/}{}g' *$videoID.info.json
@ikegami thanks for your comment! It is indeed safer with \Q...\E, in case $HOME contains characters like $.
All regex delimiters must of course be escaped in the input string.
But as Stefen stated, you can use other delimiters in Perl, like % or §. That makes it easier if you have many slashes in your string.
Special characters:
# starts a Perl comment; don't use this.
?, [], {}, $, ^, . are regex control characters and must be escaped in the regex.
You should always write a comment to make clear you are using different delimiters, because this makes your regex hard to read for inexperienced users.
Try out your RegEx here: https://regex101.com/r/cIWk1o/1

How can I translate a regex within vim to work with sed?

I have a string that exists within a text file that I am trying to modify with regex.
"configuration_file_for_wks_33-40"
and I want to modify it so that it looks like this
"configuration_file_for_wks_33-40_6ks"
Within vim I can accomplish this with the following regex command
%s/33-\(\d\d\)/33-\1_6ks/
But if I try to pass that regex command to sed such as
sed 's/33-\(\d\d\)/33-\1_6ks/' input_file.json
The string is not changed, even if I include the -e parameter.
I have also tried to do this using ex as
echo '%s/33-\(\d\d\)/33-\1_6ks/' | ex input_file.json
If I use
sed 's/wks_33-\(\d\d\)*/wks_33-\1_6ks/' input_file.json
then I get
configuration_file_for_wks_33-_6ks40
For that, I've tried various different escaping patterns without any luck.
Can someone help me understand why these changes are not working?
vim has a different syntax for regular expressions (which is even configurable). Unfortunately, sed doesn't understand \d (see https://unix.stackexchange.com/a/414230/304256). You can match digits with [0-9] or [[:digit:]] instead, with or without -E:
$ sed -E 's/33-[0-9][0-9]/&_6ks/' input_file.json
configuration_file_for_wks_33-40_6ks
Note that you can use & in the replacement for adding the entire matched string.
So why is this:
$ sed 's/wks_33-\(\d\d\)*/wks_33-\1_6ks/' input_file.json
configuration_file_for_wks_33-_6ks40
Here, (\d\d)* is simply matched 0 times, so you replace wks_33- by wks_33-_6ks (\1 is a zero-length string) and 40 remains where it was before.
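The zero-repetition effect described above can be reproduced without \d at all. In this minimal sketch the literal XY is only a stand-in for any sub-pattern that cannot match at that position, and the function name demo_zero_match is made up.

```shell
#!/bin/sh
# demo_zero_match (hypothetical name): the starred group \(XY\)* is allowed to
# match zero times, so the regex matches just "wks_33-" and \1 is empty.
demo_zero_match() {
    sed 's/wks_33-\(XY\)*/wks_33-\1_6ks/'
}

printf '%s\n' 'configuration_file_for_wks_33-40' | demo_zero_match
# prints: configuration_file_for_wks_33-_6ks40
```

The suffix lands before 40 because the digits were never part of the match.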
Translation from one language to another is best done with some reference material on hand:
sed BRE syntax
sed ERE syntax
sed classes
sed RE extensions
The superficial reading of which shows that sed doesn't support \d.
Possible alternatives to \d\d:
[[:digit:]]\{2\}
[0-9]\{2\}
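Putting one of these alternatives into the original command might look like the sketch below. It uses the basic-regex interval \{2\} and the & replacement; the function name add_suffix is made up for illustration.

```shell
#!/bin/sh
# add_suffix (hypothetical name): append _6ks after "33-" plus two digits.
# & in the replacement stands for the whole matched text.
add_suffix() {
    sed 's/33-[0-9]\{2\}/&_6ks/'
}

printf '%s\n' '"configuration_file_for_wks_33-40"' | add_suffix
# prints: "configuration_file_for_wks_33-40_6ks"
```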
How can I translate a regex within vim to work with sed?
Since you write "a regex", I think you refer to any regex.
Translating a Vim regex to a Sed regex is not always possible, because a Vim regex can have lookarounds, whereas a Sed regex has no such things.

Extract string between single quotes with sed

I have a thousand Delphi files (.pas), and I need to extract text from them.
The text I need is between single quotes (Pascal strings), and I only need the strings called from a particular function. E.g.: my_function('This is the string I need')
I have extracted all the lines in which the function appears and added them to a text file, using find and grep, but I'm unable to extract the strings.
I've been looking around the Internet for a regex to extract these strings, but I don't know how to do this. I'm trying with this:
sed "s/.*my_function\('(.*)'\).*/\1/" all_the_strings.txt > my_out_file.txt
But it doesn't work (I'm not an expert with regex...).
Can you help me with this?
This might work for you (GNU sed):
sed -nr "s/.*my_function\('([^']*)'\).*/\1/p" all_the_strings.txt > my_out_file.txt
You can try this:
sed 's/.*my_function(.\(.*\).).*/\1/;'
Your solution doesn't escape the parentheses in the right place. In sed's default (basic) regex syntax, unescaped parentheses are not metacharacters, so they match literally.
You must escape them to do grouping, so escape the inner pair and leave the outer pair bare, like:
sed "s/.*my_function('\(.*\)').*/\1/" all_the_strings.txt > my_out_file.txt
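A minimal sketch combining the points from the answers above: basic-regex grouping with \( \), [^']* instead of a greedy .*, and -n with the p flag so that lines without a match are dropped. The function name extract_strings and the sample Pascal lines are made up, and the sketch assumes one call per line with no doubled quotes ('') inside the string.

```shell
#!/bin/sh
# extract_strings (hypothetical name): print only the text between the single
# quotes of my_function('...'). After shell quoting, sed receives:
#   s/.*my_function('\([^']*\)').*/\1/p
extract_strings() {
    sed -n 's/.*my_function('\''\([^'\'']*\)'\'').*/\1/p'
}

printf '%s\n' \
    "  x := my_function('This is the string I need');" \
    "  y := other_function('skip me');" | extract_strings
# prints: This is the string I need
```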

Making regular expressions look nice in shell scripts

I often use grep and sed in my bash scripts.
For example, I use a script to remove comments from a template.
In this example the comments look like:
/*# my comments contain text and ascii art:
*#
*# [box1] ------> [box2]o
*#
#*/
My sed chain to remove these lines looks like:
sed '/^\/\*#/d' | sed '/^\s*\*#/d' | sed '/^\s*#\*\//d'
In my scripts, I have to escape chars such as \ and /, which makes the code less readable. Therefore, my question is: How can I write nice-to-read regular expressions for sed in bash scripts?
One way I can think of is using another separator instead of /, as in vim, where you can natively use %s#search/text#replace/text#gc (using # as the separator) and therefore allow / as an unescaped character. Defining an alternative escape char would also help. I would be interested in how you solve this problem. I am also open to alternative tools in case you think it is only a sed problem.
You can specify different separators, as detailed here.
Note that Perl allows you to do this too, along with splitting your regexp across several lines for better readability.
I think trying to make regex (which a lot of times is a sequence of symbols) nice to read is pretty hard.
However there are a few things you can do:
1. Use -r (or -E in some systems) so that you don't have to escape regex operators (), {}, +, ?
2. Use alternative separators, e.g. for s command
sed 's#regex#replacement#' file
For address ranges (you'll need '\')
sed '\#pattern# d' file
3. Leave spaces between address range and command (like d above).
4. Leave comments explaining what the regex matches (you can even include an example).
3 and 4 are more of an indirect approach but they should help.
Anyway what you are doing can be done in a single sed expression:
sed '\:^/\*#:,\:^#\*/: d' file
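The single-expression version above can be tried on the comment block from the question with a sketch like this. The function name strip_template_comments and the two surrounding template lines are made up for the demo.

```shell
#!/bin/sh
# strip_template_comments (hypothetical name): delete every line from one that
# starts with /*# through the next line that starts with #*/.
# \: ... : uses a colon as the address delimiter, so / needs no escaping.
strip_template_comments() {
    sed '\:^/\*#:,\:^#\*/: d'
}

strip_template_comments <<'EOF'
first line of template
/*# my comments contain text and ascii art:
*#
*# [box1] ------> [box2]o
*#
#*/
last line of template
EOF
# prints the first and last line only
```

Because the range covers everything between the opening and closing markers, the middle lines do not need a pattern of their own.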
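In addition to using alternative separators you may use extended regular expressions where appropriate. They invert the escaping rules for grouping and repetition: (), {}, +, ? and | are special when written bare, and you backslash-escape them only to match them literally. Square brackets behave the same in both syntaxes.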

What is the best way to do string manipulation in a shell script?

I have a path as a string in a shell-script, could be absolute or relative:
/usr/userName/config.cfg
or
../config.cfg
I want to extract the file name (part after the last /, so in this case: "config.cfg")
I figure the best way to do this is with some simple regex?
Is this correct? Or should I use sed or awk instead?
Shell-scripting's string manipulation features seem pretty primitive by themselves, and appear very esoteric.
Any example solutions are also appreciated.
If you're okay with using bash, you can use bash string expansions:
FILE="/path/to/file.example"
FILE_BASENAME="${FILE##*/}"
It's a little cryptic, but the braces start the variable expansion, and the double hash does a greedy removal of the specified glob pattern from the beginning of the string.
Double %% does the same thing from the end of a string, and a single percent or hash does a non-greedy removal.
Also, a simple replace construct is available too:
FILE=${FILE// /_}
would replace all spaces with underscores for instance.
A single slash replaces only the first match.
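The prefix and suffix removal forms described above can be sketched as follows, using the path from the question. Only the # and % forms are shown because they are part of POSIX sh; the ${var//pattern/replacement} form is a bash extension. The variable names are chosen for illustration.

```shell
#!/bin/sh
path="/usr/userName/config.cfg"

name=${path##*/}   # remove the longest prefix matching */   -> file name
dir=${path%/*}     # remove the shortest suffix matching /*  -> directory
stem=${name%.*}    # remove the shortest suffix matching .*  -> name without extension
ext=${name##*.}    # remove the longest prefix matching *.   -> extension

rel="../config.cfg"
rel_name=${rel##*/}   # works the same way for a relative path

printf '%s\n' "$name" "$dir" "$stem" "$ext" "$rel_name"
# prints, one per line: config.cfg, /usr/userName, config, cfg, config.cfg
```

These expansions run inside the shell itself, so no external process such as sed or basename is started.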
Instead of string manipulation I'd just use
file=`basename "$filename"`
Edit:
Thanks to unwind for some newer syntax for this (which assumes your filename is held in $filename):
file=$(basename "$filename")
Most environments have access to perl and I'm more comfortable with that for most string manipulation.
But as mentioned, for something this simple, you can use basename.
I typically use sed with a simple regex, like this:
echo "/usr/userName/config.cfg" | sed -e 's+^.*/++'
result:
>echo "/usr/userName/config.cfg" | sed -e 's+^.*/++'
config.cfg