Regex - replace wrappers around strings

What would be a regex expression that could turn this:
{{ Form::label('events', 'Events') }}
Into this:
<label for="events">Events</label>
I need the strings "events" and "Events" to remain intact.

Try this:
s/.*::(.*?)\('(.*?)',\s'(.*?)'.*/<$1 for="$2">$3</$1>/

This would also work for your sample; here it is written as a sed command:
sed -E "s#[^']+'([^']+)', '([^']+)'.*#<label for=\"\1\">\2</label>#"
If you want it in two pieces, the pattern and the replacement are:
[^']+'([^']+)', '([^']+)'.*
<label for="\1">\2</label>
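For comparison, the same substitution can be sketched with Python's re module. The function name convert_label is hypothetical, and the pattern assumes the helper is always called with exactly two single-quoted arguments, as in the sample.

```python
import re

# Matches {{ Form::label('name', 'Text') }} and captures both quoted strings.
# Assumes exactly two single-quoted arguments without embedded quotes.
LABEL_RE = re.compile(r"\{\{\s*Form::label\('([^']*)',\s*'([^']*)'\)\s*\}\}")

def convert_label(line: str) -> str:
    """Replace each Form::label call with the equivalent <label> element."""
    return LABEL_RE.sub(r'<label for="\1">\2</label>', line)

print(convert_label("{{ Form::label('events', 'Events') }}"))
# prints: <label for="events">Events</label>
```

Unlike the looser patterns above, this one anchors on the literal Form::label, so other Form:: helpers on the same line are left untouched.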

Related

Regex to avoid all words between {{ and }}

I am using https://github.com/tbroadley/spellchecker-cli.
I have a JSON file that I'd like to run spellChecker on and it looks like this:
{
"abc.editGroupsMaxLengthError": "Maximum {{charLen}} characters"
}
I would like to know how all words between {{ and }} can be ignored by the spellchecker.
I tried the ignore regex
[A-Za-z]+}}
as documented at https://github.com/tbroadley/spellchecker-cli#ignore-regexes, but it doesn't seem to handle }} or {{ for some reason.
How can this be fixed?
You can wrap your {{...}} substrings with <!-- spellchecker-disable --> / <!-- spellchecker-enable --> tags; see this GitHub issue.
So, make sure your JSON looks like
{
"abc.editGroupsMaxLengthError": "Maximum <!-- spellchecker-disable -->{{charLen}}<!-- spellchecker-enable --> characters"
}
And the result will be
C:\Users\admin\Documents\1>spellchecker spellchecker -f spellchecker_test.json
Spellchecking 1 file...
spellchecker_test.json: no issues found
To wrap the {{...}} strings in a certain file in Windows you could use PowerShell, e.g., for a spellchecker_test.json file:
powershell -Command "& {(Get-Content spellchecker_test.json -Raw) -replace '(?s){{.*?}}','<!-- spellchecker-disable -->$&<!-- spellchecker-enable -->' | Set-Content spellchecker_test.json}"
In *nix, Perl is preferable:
perl -0777 -i -pe 's/\{\{.*?}}/<!-- spellchecker-disable -->$&<!-- spellchecker-enable -->/gs' spellchecker_test.json
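If neither PowerShell nor Perl is at hand, the same wrapping step could be sketched in Python. The function name wrap_placeholders is an assumption for illustration; the two comment strings are the ones shown above.

```python
import re

DISABLE = "<!-- spellchecker-disable -->"
ENABLE = "<!-- spellchecker-enable -->"

def wrap_placeholders(text: str) -> str:
    """Surround every {{...}} span with the disable/enable comments."""
    # Non-greedy match so each placeholder is wrapped separately;
    # DOTALL lets a placeholder span several lines.
    return re.sub(
        r"\{\{.*?\}\}",
        lambda m: DISABLE + m.group(0) + ENABLE,
        text,
        flags=re.DOTALL,
    )

sample = '"abc.editGroupsMaxLengthError": "Maximum {{charLen}} characters"'
print(wrap_placeholders(sample))
```

Note that running the function twice on the same text wraps the placeholders twice, so it is best applied once to an unmodified file.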

Regex -> detect a pattern -> move it to the start of the line

This is my first time using this platform; I am asking because I have not been able to find a solution.
I have this html code:
<img ...></img><a ...><span ...
I need this:
<a ...><img ...></img><span ...
Where ... would be the content of the pattern (like <img.*.</img>) because it will be done in a bulk way and the information changes. The file has this format:
<img ...></img><a ...><span ...
.....
<img ...></img><a ...><span ...
.....
<img ...></img><a ...><span ...
.....
<img ...></img><a ...><span ...
As you can guess, I need to put the <img> tag inside the <a> tag. I tried to take the pattern <a.*.> and move it to the beginning of the line but I have not succeeded.
You generally should not use regex to manipulate HTML content, which might be nested and have other complexities. However, assuming your <img> and <a> tags are always just one level deep, you could try the following find and replace in sed:
echo "<img ...></img><a ...><span ..." | sed 's/\(<img[^>]*><\/img>\)\(<a[^>]*>\)/\2\1/'
This prints:
<a ...><img ...></img><span ...
Here is a more general solution, also easier to read:
Find: (<img[^>]*><\/img>)(<a[^>]*>)
Replace: $2$1
This solution simply captures, in two separate groups $1 and $2, the <img> and <a> tags. Then, in the replacement, it swaps the two tags to give you the order you want.
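The same capture-and-swap idea can be sketched in Python. The function name and the attribute values in the sample line are made up for illustration; the pattern is the one given in the find/replace above.

```python
import re

# Group 1: an <img ...></img> pair; group 2: the <a ...> opening tag after it.
# [^>]* keeps each match inside a single tag.
SWAP_RE = re.compile(r"(<img[^>]*></img>)(<a[^>]*>)")

def move_img_into_anchor(line: str) -> str:
    """Swap the two captured tags so the <a> tag comes first."""
    return SWAP_RE.sub(r"\2\1", line)

print(move_img_into_anchor('<img src="x.png"></img><a href="#"><span>text</span></a>'))
# prints: <a href="#"><img src="x.png"></img><span>text</span></a>
```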
In the end I solved it like this:
sed -i -E "s/(<img.*)(<a .*.>)/\2\1/" file.txt

How to insert an arbitrary string after pattern with sed

It must be really easy, but somehow I don't get it… I want to process an HTML file via a bash script and insert an HTML string into a certain node:
org.html: <div id="wrapper"></div>
MYTEXT=$(phantomjs capture.js www.somesite.com)
# MYTEXT will look something like this:
# <div id="test" style="top: -1.9%;">Something</div>
sed -i "s/\<div id=\"wrapper\"\>/\<div id=\"wrapper\"\>$MYTEXT/" org.html
I always get this error: bad flag in substitute command: 'd' which is probably because sed interprets the content of $MYTEXT as a pattern as well – which is not what I want…
By the way: Duplicating \<div id=\"wrapper\"\> is probably also not necessary?
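The / in the </div> part of $MYTEXT is indeed interpreted as the closing / of the s command, so sed reads the d that follows as a flag. You can choose another delimiter that does not appear in $MYTEXT, for instance |. The < and > characters need no escaping (in GNU sed, \< and \> are word-boundary anchors, not literal brackets), and & in the replacement stands for the matched text, so the pattern does not have to be repeated. A literal & or backslash inside $MYTEXT would still be treated specially.
sed -i "s|<div id=\"wrapper\">|&$MYTEXT|" org.html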
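Because sed gives special meaning to the delimiter, & and backslashes in the replacement text, a truly arbitrary string is easier to insert with a plain-text replace. A minimal sketch in Python, with a hypothetical helper name and the sample strings from the question:

```python
def insert_after(html: str, marker: str, payload: str) -> str:
    """Insert payload right after the first literal occurrence of marker."""
    # str.replace treats both arguments as plain text, so '/', '&' or '|'
    # in the payload need no escaping.
    return html.replace(marker, marker + payload, 1)

org = '<div id="wrapper"></div>'
my_text = '<div id="test" style="top: -1.9%;">Something</div>'
print(insert_after(org, '<div id="wrapper">', my_text))
# prints: <div id="wrapper"><div id="test" style="top: -1.9%;">Something</div></div>
```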

Regex in perl/sed replacement not matching whitespace/characters

Given this file, I'm trying to do a super primitive sed or perl replacement of a footer.
Typically I use a DOM parser for HTML files, but time matters, and so far sed/perl has caused no issues with the primitive HTML files I'm dealing with.
All I need is to replace the <div id="footer"> which contains whitespace, an element that has another element, and the closing </div> with <?php include 'footer.php';?>.
For some reason I can't even get this pattern to match up to the <div id="stupid">. I know there are whitespace characters, so I used \s*:
perl -pe 's|<div id="footer">.*\s*.*\s*|<?php include INC_PATH . 'includes/footer.php'; ?>|' file.html | less
But that only matches the first line. The replacement looks like this:
<?php include INC_PATH . includes/footer.php; ?>
<div id="stupid"><img src="file.gif" width="206" height="252"></div>
</div>
Am I forgetting something simple, or should I specify some sort of flag to deal with a multiline match?
perl -v is 5.14.2 and I'm only using the pe flags.
You probably want -0777, which will force perl to read the entire file at once.
perl -0777 -pe 's|something|else|g' file
Also, your strategy of doing .*\s*.*\s* is pretty fragile. It'll match e.g. <div id="foo", which is just a fragment...
Are you forgetting that almost all regex parsing works on a line-by-line basis?
I've always had to use tr to convert the newlines into some other character, and then back again after the regex.
Just found this: http://www.perlmonks.org/?node_id=17947
The /m option alone will not help here: it only changes where ^ and $ match. To let . match across newlines you need the /s option, and the regex must also be applied to a string that contains all of the lines.
perl -p works on the file line by line (see perl.com). That means your regex never sees all of the lines at once: it only matches on the line that starts with <div id="footer">, and it will not match on the following lines.
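To illustrate the whole-file approach the answers point to, here is a minimal Python sketch. The sample HTML is hypothetical, modeled on the fragments shown in the question, and the pattern assumes the footer contains exactly one nested <div>.

```python
import re

# Hypothetical sample modeled on the fragments shown in the question.
html = """<body>
<div id="footer">
  <div id="stupid"><img src="file.gif" width="206" height="252"></div>
</div>
</body>
"""

# DOTALL lets '.' cross newlines; the pattern assumes the footer contains
# exactly one nested <div>, so the second </div> closes the footer.
FOOTER_RE = re.compile(r'<div id="footer">.*?</div>\s*</div>', re.DOTALL)

replacement = "<?php include INC_PATH . 'includes/footer.php'; ?>"
# A function replacement inserts the text verbatim (no backslash processing).
result = FOOTER_RE.sub(lambda m: replacement, html)
print(result)
```

The key point is that the pattern is applied to the entire file contents as one string, which is what -0777 arranges for perl.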

grab value between two strings with sed?

I have the following data on one line:
Go to start of metadata
<div id="page-metadata-end" class="assistive"></div>
<fieldset class="hidden parameters">
<input type="hidden" title="browsePageTreeMode" value="view">
</fieldset>
<div class="wiki-content">
<p>(openissues)81(/openissues)</p><p>(assignstoday)0(/assignstoday)</p><p>(assignsweek)2(/assignsweek)</p><p>(replyissues)6(/replyissues)</p><p>(wrapissues)26(/wrapissues)</p>
</div>
I'd like to grab the value for "openissues", for example, but I can't figure out how to retrieve it properly. One of the things I tried is the following command:
sed -n '/(assignstoday)/,/(\/assignstoday)/p' ~/test.txt
Any help?
sed 's/.*(openissues)\(.*\)(\/openissues).*/\1/' test.txt
A quick hack that may meet your edited requirement:
sed -n '/openissues/p' test.txt | sed 's/.*(openissues)\(.*\)(\/openissues).*/\1/'
But regexes are really not the way to go when parsing HTML.
I'd try
VALUE=openissues
sed 's#.*('"$VALUE"')\([^(]\+\).*#\1#'
That is, replace the whole line with just the captured content you are searching for.
Edit: Now I see Neil's answer, which is practically the same; accept his. I am leaving my answer up because it lets you choose which value to extract.
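The parameterized extraction can also be sketched in Python. The helper name extract_value is hypothetical; the markers follow the (name)...(/name) convention from the sample data.

```python
import re
from typing import Optional

def extract_value(text: str, name: str) -> Optional[str]:
    """Return the text between (name) and (/name), or None if absent."""
    tag = re.escape(name)
    match = re.search(r"\(" + tag + r"\)(.*?)\(/" + tag + r"\)", text, re.DOTALL)
    return match.group(1) if match else None

line = "<p>(openissues)81(/openissues)</p><p>(assignstoday)0(/assignstoday)</p>"
print(extract_value(line, "openissues"))    # prints: 81
print(extract_value(line, "assignstoday"))  # prints: 0
```

Returning None for a missing marker avoids the sed behavior of printing the unmodified line when nothing matches.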