Extract string between single quotes with sed - regex

I have a thousand Delphi files (.pas), and I need to extract text from them.
The text I need is between single quotes (Pascal strings), and I only need the strings called from a particular function. E.g.: my_function('This is the string I need')
I have extracted all the lines in which the function appears and saved them to a text file, using find and grep, but I'm unable to extract the strings.
I've been looking around the Internet for a regex to extract these strings, but I don't know how to do this. I'm trying with this:
sed "s/.*my_function\('(.*)'\).*/\1/" all_the_strings.txt > my_out_file.txt
But it doesn't work (I'm not an expert with regex...).
Can you help me with this?

This might work for you (GNU sed):
sed -nr "s/.*my_function\('([^']*)'\).*/\1/p" all_the_strings.txt > my_out_file.txt

You can try this:
sed 's/.*my_function(.\(.*\).).*/\1/;'

Your solution doesn't escape the parentheses in the right place. In sed's default (basic) regex syntax, plain parentheses are not metacharacters, so they match literally.
You must escape them to do grouping, so escape the inner pair and leave the outer pair unescaped, like:
sed "s/.*my_function('\(.*\)').*/\1/" all_the_strings.txt > my_out_file.txt
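To make the grouping rule concrete, here is a minimal sketch in basic regex syntax. The sample line is made up for illustration and stands in for one line of all_the_strings.txt; [^']* is used instead of .* so the capture stops at the next quote, and -n with the p flag prints only lines where the substitution succeeded.

```shell
# Hypothetical sample line standing in for one line of all_the_strings.txt
line="  x := my_function('This is the string I need');"

# BRE: plain ( ) match literal parentheses, \( \) capture.
# [^']* stops at the next single quote instead of running to the last one.
extracted=$(printf '%s\n' "$line" | sed -n "s/.*my_function('\([^']*\)').*/\1/p")

printf '%s\n' "$extracted"
```

This sketch does not handle Pascal's doubled-quote escape ('') inside a string; that case would need a more careful pattern.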

Related

How can I translate a regex within vim to work with sed?

I have a string that exists within a text file that I am trying to modify with regex.
"configuration_file_for_wks_33-40"
and I want to modify it so that it looks like this
"configuration_file_for_wks_33-40_6ks"
Within vim I can accomplish this with the following regex command
%s/33-\(\d\d\)/33-\1_6ks/
But if I try to pass that regex command to sed such as
sed 's/33-\(\d\d\)/33-\1_6ks/' input_file.json
The string is not changed, even if I include the -e parameter.
I have also tried to do this using ex as
echo '%s/33-\(\d\d\)/33-\1_6ks/' | ex input_file.json
If I use
sed 's/wks_33-\(\d\d\)*/wks_33-\1_6ks/' input_file.json
then I get
configuration_file_for_wks_33-_6ks40
For that, I've tried various different escaping patterns without any luck.
Can someone help me understand why these changes are not working?
vim has a different syntax for regular expressions (which is even configurable). Unfortunately, sed doesn't understand \d (see https://unix.stackexchange.com/a/414230/304256). You can match digits with [0-9] or [[:digit:]] instead, with or without -E:
$ sed -E 's/33-[0-9][0-9]/&_6ks/'
configuration_file_for_wks_33-40_6ks
Note that you can use & in the replacement for adding the entire matched string.
So why is this:
$ sed 's/wks_33-\(\d\d\)*/wks_33-\1_6ks/' input_file.json
configuration_file_for_wks_33-_6ks40
Here, (\d\d)* is simply matched 0 times, so you replace wks_33- by wks_33-_6ks (\1 is a zero-length string) and 40 remains where it was before.
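The zero-times match can be reproduced with any starred group that fails to match. In this sketch \(x\)* is a hypothetical stand-in for the group that never matches, and the input is a shortened version of the string from the question.

```shell
# \(x\)* is a stand-in for a starred group that matches zero times here,
# so \1 expands to an empty string in the replacement.
result=$(printf '%s\n' 'wks_33-40' | sed 's/wks_33-\(x\)*/wks_33-\1_6ks/')

printf '%s\n' "$result"
```

The suffix lands directly after "wks_33-" and the digits stay where they were, which is the same shape as the output seen in the question.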
Translation from one language to another is best done with some reference material on hand:
sed BRE syntax
sed ERE syntax
sed classes
sed RE extensions
Even a superficial reading of these shows that sed doesn't support \d.
Possible alternatives to \d\d:
[[:digit:]]\{2\}
[0-9]\{2\}
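As a quick check of these alternatives, the following sketch applies the interval form to the string from the question, fed in with printf rather than read from input_file.json. It uses & in the replacement to reinsert the whole match.

```shell
# The string from the question, including its double quotes
input='"configuration_file_for_wks_33-40"'

# BRE interval: [0-9]\{2\} matches exactly two digits; & is the whole match
result=$(printf '%s\n' "$input" | sed 's/33-[0-9]\{2\}/&_6ks/')

printf '%s\n' "$result"
```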
How can I translate a regex within vim to work with sed?
Since you write "a regex", I think you refer to any regex.
Translating a Vim regex to a Sed regex is not always possible, because a Vim regex can have lookarounds, whereas a Sed regex has no such things.

Enclosing strings with forward slashes using AWK

I have a PHP file in which the split() function was used extensively. I replaced it with preg_split using sed and find commands. The problem now is that preg_split requires the regex pattern to be enclosed in delimiters, while split does not.
I have tried using sed to enclose the strings in delimiters, but as far as I know sed is unable to do it. I have heard that AWK can solve this problem.
I want
preg_split('\r\n', $some_string);
to be modified as
preg_split('/\r\n/', $some_string);
where the forward slashes work as delimiters. How can this be done using AWK?
sed is perfectly capable of this.
sed "s:\(preg_split('\)\([^']*\)':\1/\2/':g" file.php
Your sed dialect might want a different mix of backslashes; or use Perl (or, ugh, PHP). Inside double quotes the shell would expand $1 and $2, so they are escaped here:
perl -pi~ -e "s:(preg_split\(')([^']*)':\$1/\$2/':g" file.php
(Notice the -i flag for in-place editing; perhaps your sed supports that, too?)
I'm imagining your problem was with quoting rather than with the actual sed regex. Getting single quotes properly quoted in the shell can be a challenge. (In the worst case, put your sed script in a file so the shell won't see it.) And of course, using a different delimiter instead of slash makes the expression simpler.
That should work as you expect:
sed "s#preg_split('\(.*\)'#preg_split('/\1/'#g"
As @Stephen P mentioned in a comment, you can use different delimiters with sed. If your delimiter appears in the regex or replacement string, you have to escape it with \. It's always simpler to use a delimiter that does not occur in your regex or replacement string. Here, I used #.
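A small sketch of the alternate-delimiter approach, run against the line from the question through a quoted here-document so the shell leaves the backslashes and $ alone. It uses [^']* rather than .* as an assumption that the pattern itself contains no single quote; this keeps the match from running on to a later quote on the same line.

```shell
# '#' is the s-command delimiter, so the slashes in the replacement
# need no escaping. In BRE the plain ( is a literal parenthesis.
fixed=$(sed "s#preg_split('\([^']*\)'#preg_split('/\1/'#g" <<'EOF'
preg_split('\r\n', $some_string);
EOF
)

printf '%s\n' "$fixed"
```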

sed to remove character pattern from end of line?

I've seen several examples on here of something close to what I'm asking, but not quite.
I have some pipe-delimited flat files which have some extraneous column data that I want to strip out using sed. the basic structure looks like this:
Column1|Column2|Column3|ignore
data1|data2|data3|ignore
data4|data5|data6|ignore
I want an expression using sed that will produce:
Column1|Column2|Column3
data1|data2|data3
data4|data5|data6
This should be stupid easy, but as always regular expressions and sed manage to hurt my brain. I thought this would work:
sed "s/\|ignore//" table1.txt >filtered.txt
but this seems to do nothing. What am I doing wrong?
NOTE: This is GNU sed for Windows.
Don't escape the pipe.
$ sed 's/|ignore//' table1.txt > filtered.txt
works on my machine. (GNU sed on Cygwin.)
The idea here is that \| is the regex alternation operator (in GNU sed's basic syntax), not a literal pipe. In basic regex, (, {, and | are literal characters unless you escape them, so escaping them is what turns on their special meaning. [ is the opposite: it is special when unescaped, and you escape it only when you want the literal character.
Change \| to |. You don’t want an alternative, you want a literal pipe.
Or, if you use \|, pass -r to sed to indicate you want the extended syntax.
Several possible solutions here. Also, why not use cut?
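For comparison, here is a sketch of both approaches on the sample data from the question. The sed version anchors the match with $ so only a trailing |ignore is removed; the cut version assumes the wanted data is always the first three fields.

```shell
# Sample rows from the question
table='Column1|Column2|Column3|ignore
data1|data2|data3|ignore
data4|data5|data6|ignore'

# Unescaped | is a literal pipe in basic regex; $ anchors to end of line
with_sed=$(printf '%s\n' "$table" | sed 's/|ignore$//')

# cut: split on | and keep fields 1 through 3
with_cut=$(printf '%s\n' "$table" | cut -d'|' -f1-3)

printf '%s\n' "$with_sed"
```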

Match single character between Start string and End string

I can't seem to understand regular expression at all. How can I match a character which resides between a START and END string. For Example
#START-EDIT
#ValueA=0
#ValueB=1
#END-EDIT
I want to match any # which is between the #START-EDIT and #END-EDIT.
Specifically, I want to use sed to replace the matched # characters with nothing (delete them) in various files which may or may not have multiple START-EDIT and END-EDIT sections.
^#START-EDIT.*(#) *. *#END-EDIT$
sed is line-based. You can easily search and replace with a regex within one line, but there is no really easy way to match a single regex across multiple lines. AWK might do the trick.
If the whole section is on one line, the following command could be what you are looking for (it deletes the matched text):
sed -e "s/^#START-EDIT.*#END-EDIT$//" myInput.txt
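When each commented setting sits on its own line, as in the question's example, one possible approach is a sed range address: it limits an ordinary per-line substitution to the lines between the two markers. The sketch below assumes the markers appear alone on their lines and that the # to strip is the first character; the lines outside the block are made up for illustration.

```shell
# Range address: apply the inner commands only from a START line to the
# next END line. The nested negations skip the marker lines themselves.
cleaned=$(sed '/^#START-EDIT$/,/^#END-EDIT$/{
/^#START-EDIT$/!{
/^#END-EDIT$/!s/^#//
}
}' <<'EOF'
#keep
#START-EDIT
#ValueA=0
#ValueB=1
#END-EDIT
#also keep
EOF
)

printf '%s\n' "$cleaned"
```

Because the range restarts at every START marker, files with several START-EDIT and END-EDIT sections are handled the same way.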

What is the best way to do string manipulation in a shell script?

I have a path as a string in a shell-script, could be absolute or relative:
/usr/userName/config.cfg
or
../config.cfg
I want to extract the file name (part after the last /, so in this case: "config.cfg")
I figure the best way to do this is with some simple regex?
Is this correct? Or should I use sed or awk instead?
Shell scripting's string manipulation features seem pretty primitive by themselves, and appear very esoteric.
Any example solutions are also appreciated.
If you're okay with using bash, you can use bash string expansions:
FILE="/path/to/file.example"
FILE_BASENAME="${FILE##*/}"
It's a little cryptic, but the braces start the variable expansion, and the double hash does a greedy removal of the specified glob pattern from the beginning of the string.
Double %% does the same thing from the end of a string, and a single percent or hash does a non-greedy removal.
Also, a simple replace construct is available too:
FILE=${FILE// /_}
would replace all spaces with underscores for instance.
A single slash replaces only the first match.
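The greedy and non-greedy removals can be compared side by side. This sketch uses a hypothetical path with two dots in the file name so the difference is visible; the four forms shown are the prefix and suffix removals described above.

```shell
# Hypothetical path chosen so that greedy and non-greedy results differ
path="/usr/userName/config.tar.gz"

base="${path##*/}"    # greedy prefix removal: everything up to the last /
dir="${path%/*}"      # non-greedy suffix removal: the last / and what follows
ext="${base##*.}"     # greedy prefix removal: everything up to the last .
stem="${base%%.*}"    # greedy suffix removal: the first . and what follows
short="${base%.*}"    # non-greedy suffix removal: only the last extension

printf '%s\n' "$base" "$dir" "$ext" "$stem" "$short"
```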
Instead of string manipulation I'd just use
file=`basename "$filename"`
Edit:
Thanks to unwind for some newer syntax for this (which assumes your filename is held in $filename):
file=$(basename "$filename")
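Quoting the variable matters as soon as a path contains whitespace. A minimal sketch, using a made-up path with a space in a directory name:

```shell
# Hypothetical path with a space in one component
filename="/usr/user Name/config.cfg"

# Quoted, the whole path reaches basename as a single argument
file=$(basename "$filename")

printf '%s\n' "$file"
```

Without the quotes the shell would split the path at the space and basename would receive two arguments instead of one.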
Most environments have access to perl and I'm more comfortable with that for most string manipulation.
But as mentioned, for something this simple, you can use basename.
I typically use sed with a simple regex, like this:
echo "/usr/userName/config.cfg" | sed -e 's+^.*/++'
result:
>echo "/usr/userName/config.cfg" | sed -e 's+^.*/++'
config.cfg