Find all text within square brackets using regex

Because of a PHP version change, I need to rewrite my code from $array[stringindex] to $array['stringindex'].
I want to find every occurrence with a regex and replace them all. How can I match all strings that look like $array[stringindex]?

Here's a solution in PHP:
$re = "/(\\$[[:alpha:]][[:alnum:]]+\\[)([[:alpha:]][[:alnum:]]+)(\\])/";
$str = "here is \$array[stringindex] but not \$array['stringindex'] nor \$3array[stringindex] nor \$array[4stringindex]";
$subst = "$1'$2'$3";
$result = preg_replace($re, $subst, $str);
I only match variable names and indexes that begin with a letter; otherwise things like $foo[42] would be converted to $foo['42'], which might not be desirable.
Note that none of the solutions here handles every case correctly.
Looking at the Sublime Text regex help, it seems you could just paste (\$[[:alpha:]][[:alnum:]]+\[)([[:alpha:]][[:alnum:]]+)(\]) into the Search box and $1'$2'$3 into the Replace field. The doubled backslashes in the PHP code are only string escaping; the search box takes single backslashes.
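For comparison, here is a minimal sketch of the same substitution in Python. The function name quote_string_indexes is made up for illustration, and the POSIX classes are spelled out as explicit ranges because Python's re module does not support [[:alpha:]] or [[:alnum:]]. Like the PHP pattern, it skips names and indexes that start with a digit and does not handle underscores.

```python
import re

# Same shape as the PHP pattern above, with [[:alpha:]] / [[:alnum:]]
# written as explicit ASCII ranges (Python's re has no POSIX classes).
PATTERN = re.compile(r"(\$[A-Za-z][A-Za-z0-9]+\[)([A-Za-z][A-Za-z0-9]+)(\])")


def quote_string_indexes(source: str) -> str:
    """Wrap bare string indexes in single quotes: $a[b] becomes $a['b']."""
    return PATTERN.sub(r"\1'\2'\3", source)


sample = "here is $array[stringindex] but not $array['stringindex'] nor $foo[42]"
print(quote_string_indexes(sample))
```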

It depends on the tool you want to use for the replacement.
With sed, for example, it would be something like this:
sed "s/\(\$array\)\[\([^]]*\)\]/\1['\2']/g"

If sed is allowed you could simply do:
sed -i -E "s/(\\\$[^[]*\[)([^]]*)\]/\1'\2']/g" file
Explanation:
sed "s/pattern/replace/g" is a sed command which searches for pattern and replaces it with replace. The g option means replace every match on a line, not just the first.
-E enables extended regular expressions, so that plain ( and ) form groups. Inside the double quotes the shell turns \\\$ into \$, so sed sees the pattern (\$[^[]*\[)([^]]*)\]
This pattern consists of two groups (in parentheses). The first is a literal dollar sign followed by a series of non-[ characters and an opening square bracket. The second is a series of non-] characters. A closing square bracket follows, outside both groups.
\1'\2'] is the replacement string: \1 inserts the first captured group (and \2 the second). Basically we wrap \2 in quotes (which is what you wanted). Note that this also wraps numeric, variable and already-quoted indexes, so review the result.
The -i option means that the changes are applied to the original file, which is supplied at the end.
For more information, see man sed.
This can be combined with the find command, as follows:
find . -name '*.php' -exec sed -i -E "s/(\\\$[^[]*\[)([^]]*)\]/\1'\2']/g" '{}' \;
This will apply the sed command to all PHP files found.
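If you would rather not depend on find and sed, the same walk-and-rewrite idea can be sketched in Python. This is only an illustration under stated assumptions: the function name quote_indexes_in_tree is hypothetical, files are assumed to be UTF-8, and the pattern is deliberately stricter than the sed one (it only quotes indexes that look like identifiers, so numeric, variable and already-quoted indexes are left alone). It would still quote a bare constant used as an index, so check the result.

```python
import re
from pathlib import Path

# Assumption: only identifier-like indexes are bare strings worth quoting.
INDEX = re.compile(r"(\$\w+\[)([A-Za-z_]\w*)(\])")


def quote_indexes_in_tree(root: str) -> int:
    """Rewrite every *.php file under root in place; return how many changed."""
    changed = 0
    for path in Path(root).rglob("*.php"):
        original = path.read_text(encoding="utf-8")
        updated = INDEX.sub(r"\1'\2'\3", original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed += 1
    return changed
```

Calling quote_indexes_in_tree(".") plays the role of the find command above; run it on a copy or under version control first, since it overwrites files without a backup.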

Related

I need to use sed to comment out two lines in a text file

I am running a custom kernel build and have created a custom config file in a bash script, now I need to comment out two lines in Kbuild in order to prevent the bc compiler from running. The lines are...
$(obj)/$(timeconst-file): kernel/time/timeconst.bc FORCE
$(call filechk,gentimeconst)
Using Expresso, I have a regex that matches the first line...
^\$\(obj\)\/\$\(timeconst-file\): kernel\/time\/timeconst\.bc FORCE
But I can't get sed to actually insert a # in front of the line.
Any help would be much appreciated.
sed -i "/<Something that matches the lines to be replaced>/s/^#*/#/" <filename>
This uses a regex address, /<something>/, to select the lines you want to comment, then substitutes (s) the start of the line (^, plus any #s already there, #*) with a single #. So you can run it on lines that are already commented without a problem. The command is applied to every line the address matches, so you can do mass commenting; the /g flag is not needed because ^ can match only once per line.
I have a bash script that does the reverse, mass uncommenting, with:
sed -i.bkp "/$1/s/^#\+\s*//" "$2"
-i.bkp edits the file in place and keeps a backup of the original with the suffix .bkp
The script is called as ./comment.sh <match> <filename>
The match does not have to match the entire line, just enough to make it only hit lines you want.
You can use the following sed command for the replacement:
sed 's,^\($(obj)/$(timeconst-file): kernel/time/timeconst.bc FORCE\),#\1,'
You don't need to escape ( ) or $: in sed without -r they are treated as literals, and \( \) is used for grouping.
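The "select lines by pattern, then normalise the leading #" idea can also be sketched in Python. The function name comment_out is made up for illustration; it mirrors s/^#*/#/ by stripping any existing leading # characters before adding one, so running it twice gives the same result.

```python
import re


def comment_out(text: str, pattern: str) -> str:
    """Prefix a single '#' to every line that matches pattern (a regex)."""
    out = []
    for line in text.splitlines(keepends=True):
        if re.search(pattern, line):
            # Drop any leading '#' first so repeated runs do not stack them.
            line = "#" + line.lstrip("#")
        out.append(line)
    return "".join(out)


kbuild = (
    "$(obj)/$(timeconst-file): kernel/time/timeconst.bc FORCE\n"
    "\t$(call filechk,gentimeconst)\n"
)
print(comment_out(kbuild, r"timeconst\.bc FORCE|filechk,gentimeconst"), end="")
```

For a purely literal match, re.escape() on the search text avoids having to escape $, ( and ) by hand.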

search and replace substring in string in bash

I have the following task:
I have to replace several links, but only the links that end with .do
Important: the files also contain other links, and those should stay untouched.
<li>Einstellungen verwalten</li>
to
<li>Einstellungen verwalten</li>
So I have to search for links ending in .do, take the part before it and remember it, for example as $a, replace the whole link with
<s:url action=' '/>
and paste $a between the quotes.
I thought about sed, but as far as I know sed only searches for a whole string and replaces it completely.
I also tried bash parameter expansion in combination with sed, but ran into several problems with the quotes and the variables.
cat ./src/main/webapp/include/stoBox2.jsp | grep -e '<a href=".*\.do">' | while read a;
do
b=${a#*href=\"};
c=${b%.do*};
sed -i 's/href=\"$a.do\"/href=\"<s:url action=\'$a\'/>\"/g' ./src/main/webapp/include/stoBox2.jsp;
done;
Any ideas?
Thanks a lot.
sed -i 's#href="\(.*\)\.do"#href="<s:url action='"'\1'"'/>"#g' ./src/main/webapp/include/stoBox2.jsp
Use a pattern with parentheses to capture the link without .do. To get the single quotes from your text into the command, the sed script is written as three adjacent quoted parts (single-, double-, single-quoted), which the shell joins into one argument:
's#href="\(.*\)\.do"#href="<s:url action='
"'\1'"
'/>"#g'
The -i parameter modifies your file directly. If you don't want that, just remove it and save the result to a temporary file with > tmp.
Try this one:
sed -i "s%\(href=\"\)\([^\"]\+\)\.do%\1<s:url action='\2'/>%g" \
./src/main/webapp/include/stoBox2.jsp;
You can capture patterns with parentheses (\(, \)) and use them in the replacement pattern.
Here I capture a string that contains no " and precedes .do (\([^\"]\+\)\.do), and insert it without the .do suffix (\2).
There is a / in the replacement, so I used % to delimit the expressions instead of the traditional /.
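The capture-and-reinsert step can be sketched in Python as well, where the quoting is easier to read. The function name to_struts_url and the sample link settings.do are made up for illustration; the pattern follows the [^"] approach so that a match cannot run past the closing quote of the attribute.

```python
import re

# Capture everything inside href="..." that precedes the .do suffix.
DO_LINK = re.compile(r'href="([^"]+)\.do"')


def to_struts_url(html: str) -> str:
    """Replace href="name.do" with href="<s:url action='name'/>"."""
    return DO_LINK.sub(r'''href="<s:url action='\1'/>"''', html)


# Hypothetical input; links that do not end in .do are left untouched.
line = '<a href="settings.do">A</a> <a href="index.html">B</a>'
print(to_struts_url(line))
```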

regexp greediness: shrinking long path

Please have a look at my mind-breaker.
I am stuck trying to shorten a long path with a regex, like this:
/12345/123456/1234/123/12/1/1234567/13245678/123456789/1234567890
I'd like to transform this path to the following form:
/123/123/123/123/12/1/123/123/123/123
Each "directory" in the path is abbreviated to its first 3 characters.
LONG_PATH="/12345/123456/1234/123/12/1/1234567/13245678/123456789/1234567890"
perl -pe "s#/(.{1,3})[^/]*?(/|$)#/\1\2#g" <<<$LONG_PATH
/123/123456/123/123/12//1234567/132/123456789/123
sed -E "s#/(.{1,3})[^/]*?(/|$)#/\1\2#g" <<<$LONG_PATH
/123/123456/123/123/12//1234567/132/123456789/123
I have tried also:
perl -pe "s,/(.)(.)?(.)?[^/]*+,/\1\2\3,g" <<<$LONG_PATH
/123/123/123/123/12//123/132/123/123
and many others, with no luck. I still have no idea how to do it.
Please point me in the right direction.
Match a slash followed by three non-slash characters and capture them. Then match the rest until the next slash. Replace by the capture. Segments shorter than three characters do not match and are left unchanged:
"s#(/[^/]{3})[^/]*#\1#g"
There is no need for ungreediness or anything here, because the negated character class is mutually exclusive with the / or $.
EDIT: Although you seem to know this I should probably clarify for future visitors that this will work with either perl -pe... or sed -E... as you have used it in your question. The regex could also be used as is with sed -r.... If you leave out the -E or -r option, then (as usual) you will need to escape both the parentheses and curly brackets:
sed "s#\(/[^/]\{3\}\)[^/]*#\1#g" filename
Note also as ikegami points out that in Perl you should rather use $1 in the replacement than \1.
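As a cross-check, the same substitution can be sketched with Python's re module; the function name shrink_path is made up for illustration. No non-greedy quantifier is needed, for the reason given above: [^/] can never consume a slash.

```python
import re


def shrink_path(path: str) -> str:
    """Keep the slash and first three characters of each segment."""
    return re.sub(r"(/[^/]{3})[^/]*", r"\1", path)


long_path = "/12345/123456/1234/123/12/1/1234567/13245678/123456789/1234567890"
print(shrink_path(long_path))
```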
You could do it like this:
perl -pe's#[^/]{3}\K[^/]*##g'
/12345/123456/1234/123/12/1/1234567/13245678/123456789/1234567890
/123/123/123/123/12/1/123/132/123/123
Find 3 non-slashes, and keep (\K) them, remove the following characters up until the next slash.
As ikegami pointed out, it is not required to match less than three characters, in which case a lookbehind assertion can be used instead of \K. The benefit is that \K requires perl v5.10, and I believe look-around assertions predate that.
perl -pe 's#(?<=[^/]{3})[^/]*##g'
The best way seems to be to use the File::Spec module to split and recombine the path. An intermediate call to map reduces each path segment to its first three characters. This program demonstrates:
use strict;
use warnings;
use File::Spec;
my $path = '/12345/123456/1234/123/12/1/1234567/13245678/123456789/1234567890';
my $newpath = File::Spec->catdir(map substr($_, 0, 3), File::Spec->splitdir($path));
print $newpath;
output
/123/123/123/123/12/1/123/132/123/123
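The same split, truncate and rejoin idea can be sketched in Python without any regex. The function name shrink_path_split is made up for illustration, and it assumes / as the separator rather than asking the platform, which File::Spec does.

```python
def shrink_path_split(path: str) -> str:
    """Truncate each '/'-separated segment to its first three characters."""
    # The empty string before the leading slash survives the slice,
    # so an absolute path stays absolute after rejoining.
    return "/".join(segment[:3] for segment in path.split("/"))


long_path = "/12345/123456/1234/123/12/1/1234567/13245678/123456789/1234567890"
print(shrink_path_split(long_path))
```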

using sed to copy lines and delete characters from the duplicates

I have a file that looks like this:
#"Afghanistan.png",
#"Albania.png",
#"Algeria.png",
#"American_Samoa.png",
I want it to look like this
#"Afghanistan.png",
#"Afghanistan",
#"Albania.png",
#"Albania",
#"Algeria.png",
#"Algeria",
#"American_Samoa.png",
#"American_Samoa",
I thought I could use sed to do this but I can't figure out how to store something in a buffer and then modify it.
Am I even using the right tool?
Thanks
You don't have to get tricky with regular expressions and replacement strings: use sed's p command to print the line intact, then modify the line and let it print implicitly
sed 'p; s/\.png//'
Glenn Jackman's response is OK, but it also doubles the rows that do not match the expression.
This one, instead, doubles only the rows that match the expression:
sed -n 'p; s/\.png//p'
Here, -n stands for "print nothing unless explicitly printed", and the p in s/\.png//p forces a print if the substitution was done, but not otherwise.
That is pretty easy to do with sed, and you do not even need to use the hold space (the sed auxiliary buffer). Given the input file below:
$ cat input
#"Afghanistan.png",
#"Albania.png",
#"Algeria.png",
#"American_Samoa.png",
you should use this command:
sed 's/#"\([^.]*\)\.png",/&\
#"\1",/' input
The result:
$ sed 's/#"\([^.]*\)\.png",/&\
#"\1",/' input
#"Afghanistan.png",
#"Afghanistan",
#"Albania.png",
#"Albania",
#"Algeria.png",
#"Algeria",
#"American_Samoa.png",
#"American_Samoa",
This command is just a substitution command (s///). It matches anything starting with #" followed by non-period characters ([^.]*) and then by .png",. The non-period characters before .png", are enclosed in the group brackets \( and \), so we can refer to what this group matched. So, this is the to-be-replaced regular expression:
#"\([^.]*\)\.png",
Next comes the replacement part of the command. The & just inserts everything that was matched by #"\([^.]*\)\.png", into the changed content. If it were the only element of the replacement part, nothing would change in the output. However, following the & there is a newline character, represented by a backslash \ followed by an actual newline, and on the new line we add the #" string followed by the content of the first group (\1) and then the string ",.
This is just a brief explanation of the command. Hope this helps. Also, note that you can use the \n string to represent newlines in some versions of sed (such as GNU sed). It would render a more concise and readable command:
sed 's/#"\([^.]*\)\.png",/&\n#"\1",/' input
I prefer this over Carles Sala's and Glenn Jackman's:
sed '/\.png/p;s/\.png//'
It may just be personal preference.
or one can combine both versions and apply the duplication only on lines matching the required pattern
sed -e '/^#".*\.png",/{p;s/\.png//;}' input
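The "print the line, then print a modified copy only if it matched" logic can be sketched in Python as a generator. The function name add_stripped_copies is made up for illustration; it removes only the first .png in a line, as s/\.png// does without the g flag.

```python
def add_stripped_copies(lines):
    """Yield each line, followed by a copy without '.png' if it had one."""
    for line in lines:
        yield line
        if ".png" in line:
            yield line.replace(".png", "", 1)


rows = ['#"Afghanistan.png",', '#"Albania.png",']
for row in add_stripped_copies(rows):
    print(row)
```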

Is there a truly universal wildcard in Grep? [duplicate]

Really basic question here. So I'm told that a dot . matches any character EXCEPT a line break. I'm looking for something that matches any character, including line breaks.
All I want to do is to capture all the text in a website page between two specific strings, stripping the header and the footer. Something like HEADER TEXT(.+)FOOTER TEXT and then extract what's in the parentheses, but I can't find a way to include all text AND line breaks between header and footer, does this make sense? Thanks in advance!
When I need to match several characters, including line breaks, I do:
[\s\S]*?
Note I'm using a non-greedy pattern
You could do it with Perl:
$ perl -ne 'print if /HEADER TEXT/ .. /FOOTER TEXT/' file.html
To print only the text between the delimiters, use
$ perl -0777 -ne 'print "$1\n" while /HEADER TEXT(.+?)FOOTER TEXT/sg' file.html
The -0777 switch slurps the whole file into a single string, so a match can span blank lines. The /s switch makes the regular expression matcher treat the entire string as a single line, which means dot matches newlines, and /g means match as many times as possible.
The examples above assume you're cranking on HTML files on the local disk. If you need to fetch them first, use get from LWP::Simple:
$ perl -MLWP::Simple -le '$_ = get "http://stackoverflow.com";
print $1 while m!<head>(.+?)</head>!sg'
Please note that parsing HTML with regular expressions as above does not work in the general case! If you're working on a quick-and-dirty scanner, fine, but for an application that needs to be more robust, use a real parser.
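The same non-greedy, dot-matches-newline extraction can be sketched in Python, where re.DOTALL plays the role of Perl's /s. The function name between is made up for illustration, and the same caveat applies: this is a quick scanner, not an HTML parser.

```python
import re


def between(text: str, start: str, end: str) -> list:
    """Return every shortest span of text found between start and end."""
    # re.escape keeps the markers literal; DOTALL lets '.' cross newlines.
    pattern = re.compile(re.escape(start) + r"(.+?)" + re.escape(end), re.DOTALL)
    return pattern.findall(text)


page = "intro HEADER TEXT\nfirst line\n\nsecond line\nFOOTER TEXT outro"
for chunk in between(page, "HEADER TEXT", "FOOTER TEXT"):
    print(chunk)
```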
By definition, grep looks for lines which match; it reads a line, sees whether it matches, and prints the line.
One possible way to do what you want is with sed:
sed -n '/HEADER TEXT/,/FOOTER TEXT/p' "$@"
This prints from the first line that matches 'HEADER TEXT' to the first line that matches 'FOOTER TEXT', and then iterates; the '-n' stops the default 'print each line' operation. This won't work well if the header and footer text appear on the same line.
To do what you want, I'd probably use perl (but you could use Python if you prefer). I'd consider slurping the whole file, and then use a suitably qualified regex to find the matching portions of the file. However, the Perl one-liner given by '@gbacon' is an almost exact transliteration into Perl of the 'sed' script above and is neater than slurping.
The man page of grep says:
grep, egrep, fgrep, rgrep - print lines matching a pattern
grep is not made for matching more than a single line. You should try to solve this task with perl or awk.
As this is tagged with 'bbedit' and BBedit supports Perl-Style Pattern Modifiers you can allow the dot to match linebreaks with the switch (?s)
(?s).
will match ANY character. And yes,
(?s).+
will match the whole text.
As pointed out elsewhere, grep works on single lines.
For multiple-lines (in ruby with Regexp::MULTILINE, or in python, awk, sed, whatever), "\s" should also capture line breaks, so
HEADER TEXT(.*\s*)FOOTER TEXT
might work ...
here's one way to do it with gawk, if you have it
awk -vRS="FOOTER" '/HEADER/{gsub(/.*HEADER/,"");print}' file