complex sed multiline match and replace

<Placemark id="051314">
<name>HI Hostel</name>
<description><![CDATA[<div style="color: #404040;font-size: 12px"><a "#book"style="color:#295181;font-size: 12px" target="_top" href="http://www.hihostels.com/dba/hostel051314.de.htm?himap=Y#book" >Girona - Equity Point Girona</a><img style="margin: 5px 0px 5px 0px; border-color:#909090; padding:2px; display:block; clear:both;" src="http://www.hihostels.com/pics/ES/051314_pic_main.jpeg" width="96" height="72" border="1">Plaça Catalunya, 23<br>Girona<br>17002<br><b>Spanien</b><br><div style="margin-top:3px;"><img style="vertical-align:middle;margin-right:5px;" src="http://www.hihostels.com/imgfront/pegsmall.png" /><a style="color:#295181;font-size: 12px;" href="http://www.hihostels.com/openSVwindow(41.981658,2.823057)">Street View</a></div></div> ]]></description>
My source files look like the one above (they come from http://www.hihostels.com/mapcoord/ES.en.kml). I want to replace the useless name tag "HI Hostel", which is identical for every placemark, with the hostel's real name. The real name appears in the description tag one line below; in the example above it is "Girona - Equity Point Girona".
Any clever ideas on how to do this? Thanks for reading.

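Something like this, using awk? It drops the existing name line, replaces every tag in the description line with a comma, and uses the fourth comma-separated field as the new name: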
awk -F, '/^<name>/ {next} /^<description/ {s=$0;gsub(/<[^>]*>/, ",");$0="<name>" $4 "</name>\n" s} 1' file
<Placemark id="051314">
<name>Girona - Equity Point Girona</name>
<description><![CDATA[<div style="color: #404040;font-size: 12px"><a "#book"style="color:#295181;font-size: 12px" target="_top" href="http://www.hihostels.com/dba/hostel051314.de.htm?himap=Y#book" >Girona - Equity Point Girona</a><img style="margin: 5px 0px 5px 0px; border-color:#909090; padding:2px; display:block; clear:both;" src="http://www.hihostels.com/pics/ES/051314_pic_main.jpeg" width="96" height="72" border="1">Plaça Catalunya, 23<br>Girona<br>17002<br><b>Spanien</b><br><div style="margin-top:3px;"><img style="vertical-align:middle;margin-right:5px;" src="http://www.hihostels.com/imgfront/pegsmall.png" /><a style="color:#295181;font-size: 12px;" href="http://www.hihostels.com/openSVwindow(41.981658,2.823057)">Street View</a></div></div> ]]></description>
This may also work; it splits each line on < and >, so the link text ends up in field 8:
awk -F"<|>" '/^<name>/ {next} /^<description/ {$0="<name>" $8 "</name>\n" $0} 1' file
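For comparison, here is a minimal Python sketch of the same idea. It assumes, as in the question, that each description line directly follows its name line and that the hostel name is the text of the first link in the description. The sample data is a shortened, hypothetical version of the placemark above, and fix_names is an illustrative helper name, not part of any library.

```python
import re

# Shortened, hypothetical sample with the same structure as the question's data.
SAMPLE = '''<Placemark id="051314">
<name>HI Hostel</name>
<description><![CDATA[<div style="color: #404040"><a style="color:#295181" target="_top" href="http://www.hihostels.com/dba/hostel051314.de.htm?himap=Y#book" >Girona - Equity Point Girona</a><br>Girona</div> ]]></description>
'''

# Captures the text of the first <a ...>...</a> element on a line.
ANCHOR_TEXT = re.compile(r'<a\b[^>]*>([^<]*)</a>')


def fix_names(kml_text):
    """Replace each <name> line with the link text of the following description line."""
    lines = kml_text.splitlines()
    for i in range(len(lines) - 1):
        if lines[i].startswith('<name>') and lines[i + 1].startswith('<description'):
            match = ANCHOR_TEXT.search(lines[i + 1])
            if match:  # leave the name untouched if no link text is found
                lines[i] = '<name>' + match.group(1).strip() + '</name>'
    return '\n'.join(lines)


print(fix_names(SAMPLE))
```

Unlike the comma-based awk version, this does not depend on counting fields, so a hostel name that itself contains a comma should not be a problem.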

Related

ack-grep Regex not returning consistent results

I am performing the following ack-grep search inside a bash script and, while it mostly works, I am getting inconsistent results. The line I am using is:
ack-grep '(?<=imageserver).*(?=png)'
This is a Perl-style regex (supported by ack-grep, and by GNU grep with -P). I am searching for everything between imageserver and png. It mostly works, but the results are inconsistent:
It matches the first several lines correctly, but then it matches a block that should, in theory, contain two or three separate matches. The last block should have ended at the first occurrence of png; instead the match skipped several occurrences and ended at the last one.
So the first few results are what I want, and the last highlighted block is the "bad" result. How do I get consistent results here? The text below reproduces the problem: if you copy and paste it into a text file, you should get the same results I am getting.
Is this a syntax error, a misunderstanding, or a bug? I hate it when things should work but don't.
.mobile_menu_icon { display:block;cursor:pointer;width:100%;height:40px;margin:0 auto;background-image:url('/imageserver/default_images/four_lines_40x19.png');
.button-error { display:inline-block;width:14px;height:13px;background:url('/imageserver/GlobalMedia/Icons/deleteIcon.png') no-repeat;background-size:16px 16px;background-position:center;opacity:1;transition:all ease-in-out 150ms; }
.button-finished { display:inline-block;width:14px;height:13px;background:url('/imageserver/GlobalMedia/Icons/checkmark.png') no-repeat;background-size:16px 16px;background-position:center; }
background:url('/imageserver/confirm/ie.png');
background:url('/imageserver/confirm/buttons.png') no-repeat;
background:url('/imageserver/confirm/buttons.png') no-repeat;
.capItem { width:30px;height:30px;background:url('/imageserver/styles/captchaShapesWhite.png');background-repeat:no-repeat;background-size:auto 35px;display:inline-block;margin:0 3px; }
.form_button_error { display:inline-block;width:14px;height:13px;background:url('/imageserver/GlobalMedia/Icons/deleteIcon.png') no-repeat;background-size:13px 13px;background-position:center;opacity:1;transition:all ease-in-out 150ms; }
.form_button_finished { display:inline-block;width:14px;height:13px;background:url('/imageserver/GlobalMedia/Icons/checkmark.png') no-repeat;background-size:16px 16px;background-position:center; }
#mega_slider_wrapper,.shadow{width:100%;position:relative}.nav-arrows,.nav-dots,.shadow{display:none}.nav-arrows a,.nav-dots span,.nav-options span{cursor:pointer;border-radius:50%}#mega_slider_wrapper{background:0 0;overflow:hidden}#mega_slider_wrapper img,.mega_slide_image{width:100%}.shadow{height:168px;margin-top:-110px;background:url(/imageserver/AdminMedia/moduleImages/megaslider/shadow.png) bottom center no-repeat;background-size:100% 100%;z-index:-1}.sb-description h3{text-shadow:1px 1px 1px rgba(0,0,0,.3)}.sb-description h3 a{color:#4a3c27;text-shadow:0 1px 1px rgba(255,255,255,.5)}.nav-arrows a{width:42px;height:42px;background:url(/imageserver/AdminMedia/moduleImages/megaslider/nav.png) top left no-repeat #cbbfae;position:absolute;top:50%;left:2px;text-indent:-9000px;opacity:.9;box-shadow:0 1px 1px rgba(255,255,255,.8)}.nav-arrows a:first-child{left:auto;right:2px;background-position:top right}.nav-arrows a:hover{opacity:1}.nav-dots{text-align:center;position:absolute;height:30px;width:100%;left:0}.nav-dots span{display:inline-block;width:16px;height:16px;margin:3px;box-shadow:0 1px 1px rgba(255,255,255,.6),inset 0 1px 1px rgba(0,0,0,.1)}.nav-dots span.nav-dot-current{box-shadow:0 1px 1px rgba(255,255,255,.6),inset 0 1px 1px rgba(0,0,0,.1),inset 0 0 0 3px #cbbfae,inset 0 0 0 8px #fff}.nav-options{width:70px;height:30px;position:absolute;right:70px;bottom:0;display:none}.nav-options span{width:30px;height:30px;background:url(/imageserver/AdminMedia/moduleImages/megaslider/options.png) top left no-repeat #cbbfae;text-indent:-9000px;opacity:.7;display:inline-block}.sb-slider,.sb-slider li>img{width:100%}.nav-options span:first-child{background-position:-30px 0;margin-right:3px}.nav-options span:hover{opacity:1}.sb-slider{margin:0 auto;position:relative;overflow:hidden;list-style-type:none;padding:0;max-width:2000px!important}.sb-slider li{margin:0;padding:0;display:none}.sb-slider li>a{outline:0}.sb-slider 
img{max-width:100%;display:block}.sb-description{width:100%;max-width:1124px;margin:0 auto;padding:30px 10px 10px;height:900px;top:0;left:10px;right:10px;z-index:10;position:absolute;color:#fff;-webkit-transition:all .2s;-moz-transition:all .2s;-o-transition:all .2s;-ms-transition:all .2s;transition:all .2s;background:rgba(40,40,40,.2);text-shadow:#000 0 0 7px}.sb-description h2,.sb-description h3{line-height:1.1;margin:4px 0;padding:4px 0}.nav-dots span,.slider_button{transition:all ease-in-out 180ms}.sb-description h2{font-size:42px}.sb-description h3{font-size:22px}.sb-perspective{position:relative}.sb-perspective>div{position:absolute;-webkit-transform-style:preserve-3d;-moz-transform-style:preserve-3d;-o-transform-style:preserve-3d;-ms-transform-style:preserve-3d;transform-style:preserve-3d;-webkit-backface-visibility:hidden;-moz-backface-visibility:hidden;-o-backface-visibility:hidden;-ms-backface-visibility:hidden;backface-visibility:hidden}.sb-side{margin:0;display:block;position:absolute;-moz-backface-visibility:hidden;-webkit-transform-style:preserve-3d;-moz-transform-style:preserve-3d;-o-transform-style:preserve-3d;-ms-transform-style:preserve-3d;transform-style:preserve-3d}.nav-arrows,.nav-arrows a,.nav-dots{z-index:11!important}.nav-arrows a{margin-top:-60px!important;background-color:rgba(0,0,0,.8);margin-left:10px;margin-right:10px}.nav-dots{bottom:0!important;background:rgba(0,0,0,.8);padding:8px}.nav-dots span{background:#777}.nav-dots span:hover{background:#aaa}.slider_button{position:relative;display:inline-block;line-height:1;width:auto;padding:10px 16px;background:#1F1E1E;border-radius:5px;color:#fff;text-decoration:none;margin:12px 0 0;font-size:16px}.slider_button:hover{background:#333}#media (max-width:1170px){.sb-description{width:85%!important;min-width:auto!important;margin:0 60px;box-sizing:border-box}}#media (max-width:850px){.sb-description{width:80%!important;min-width:auto!important;margin:0 60px;box-sizing:border-box}.sb-description 
h2,.sb-description h3{line-height:1.1;margin:4px 0;padding:4px 0}.sb-description h2{font-size:28px}.sb-description h3{font-size:14px}}#media (max-width:650px){.sb-description{width:75%!important;min-width:auto!important;margin:0 60px;box-sizing:border-box}}#media (max-width:600px){.hide_in_mobile{display:none}}
<div class="logo"><img src="/imageserver/UserMedia/zakattack/Logo.png" /></div>
<div class="mobile_logo"><img src="/imageserver/UserMedia/zakattack/mobile.png" alt="Logo" /></div>
<div class="powered_by">Powered by <img src="/imageserver/UserMedia/ywpgallery/ywpLogo.png" style="max-height:25px;vertical-align:middle;" alt="Your Web Pro | Roofing and Contractor Websites" title="On-Line Showrooms for Roofers & Contractors"></div>

The problem with your regex is that it is greedy: .* consumes every character up to the end of the line and then backtracks until the lookahead for 'png' succeeds. That is why, in the broken case (the long yellow line), the expression matches everything between the first 'imageserver' and the last 'png'.
A small modification makes the regex non-greedy: add a ? after the quantifier. The new regex still requires a preceding 'imageserver', but it extends the match one character at a time and checks after each step whether 'png' follows. It therefore matches only the text up to the first 'png'.
The example with the new regex (?<=imageserver).*?(?=png) and your text can be found here: https://regex101.com/r/FvSwg4/1
It is also worth looking at the regex debugger view for that example; it shows the individual steps performed during matching.
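The difference can also be reproduced outside ack-grep. The following is a small sketch using Python's re module, which supports the same lookbehind and lookahead syntax for this pattern; the input line is a made-up, shortened stand-in for the long CSS line in the question.

```python
import re

# Hypothetical single line containing two image URLs, like the long CSS line.
line = ("background:url(/imageserver/a/shadow.png) bottom;"
        "background:url(/imageserver/a/nav.png) top")

# Greedy: .* runs to the end of the line, then backtracks to the LAST 'png'.
greedy = re.findall(r'(?<=imageserver).*(?=png)', line)

# Non-greedy: .*? stops as soon as 'png' follows, so each URL matches separately.
lazy = re.findall(r'(?<=imageserver).*?(?=png)', line)

print(greedy)  # one long match spanning both URLs
print(lazy)    # ['/a/shadow.', '/a/nav.']
```

Lines with only one png look identical under both patterns, which is why the problem only shows up on the long line.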

Need regex to remove unnecessary strings in Notepad++

I have a big CSS file and need a regex (for Notepad++) that extracts only the selectors and properties of rules containing a specific CSS value. In the following example I need the selector and property for the value 123456:
header #objectnav nav a {
border-right: solid 1px #c0c0c0;
border-left: solid 1px #f4f9ff;
color: #123456;
}
a:hover {
color: #654321;
}
#hints .hint {
background-color: #f4f9ff;
border: 1px solid #e0f0ff;
color: #123456;
margin: 0 0 30px 0;
position: relative;
}
As output I expect the following:
header #objectnav nav a
color
#hints .hint
color
or, if possible
header #objectnav nav a^color
#hints .hint^color

I did this just for the challenge:
The following regex will find all the rules containing the text 123456 as a value:
[^{}\s][^{}]*\{[^}]*?[-\w]+\s*:[^;}]*?123456[^}]*\}
But that's just a basic regex. The more challenging part is that I wondered if it's possible to generate a report such as the one you asked for using nothing but Notepad++. It turns out it's possible.
Replace the following pattern:
\s*([^{}]+?)\s*\{[^}]*?(?(?=([-\w]+)\s*:[^;}]*?123456)[^}]*|[^}])*\}\s*
With the following replacement string:
(?2$1^$2:)
Or this one depending on the output you prefer:
(?2$1\r\n$2\r\n\r\n:)
I didn't test it extensively but it works for the test cases you provided.
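If leaving Notepad++ is an option, the same report can be produced with a short script. This is only a sketch in Python, not a translation of the pattern above: Python's re module does not support lookahead conditionals, so it splits the work into two simpler regexes. It assumes flat CSS without nested blocks or comments, and report is a hypothetical helper name.

```python
import re

CSS = """header #objectnav nav a {
border-right: solid 1px #c0c0c0;
border-left: solid 1px #f4f9ff;
color: #123456;
}
a:hover {
color: #654321;
}
#hints .hint {
background-color: #f4f9ff;
border: 1px solid #e0f0ff;
color: #123456;
margin: 0 0 30px 0;
position: relative;
}
"""

# One rule: selector text, then a brace-delimited body (no nesting assumed).
RULE = re.compile(r'([^{}]+)\{([^}]*)\}')
# One declaration inside a body: property name, colon, value up to ';' or '}'.
DECL = re.compile(r'([-\w]+)\s*:\s*([^;}]*)')


def report(css, needle):
    """Return 'selector^property' for every declaration whose value contains needle."""
    rows = []
    for selector, body in RULE.findall(css):
        for prop, value in DECL.findall(body):
            if needle in value:
                rows.append(selector.strip() + '^' + prop)
    return rows


print('\n'.join(report(CSS, '123456')))
```

For the sample from the question this prints the two lines in the second requested format (selector^property).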

mediawiki template editor sensitive on new lines

I am trying to create a quoting template (cquote) in MediaWiki 1.19. The parser seems to be very picky about text flow: the same code is rendered either as garbage or correctly, depending on where the line breaks fall.
For example:
{| style="margin:auto; border-collapse:collapse; border-style:none;class="cquote" {{#if: {{{bgcolor|}}} | border: 1px solid #AAAAAA;}}
| width="20" valign="top" style="color:#B2B7F2;font-size:35px; font-family:'Times New Roman',serif;font-weight:bold;text-align:left;padding:10px 10px;" | “
| valign="top" style="padding:4px 10px; font-style: italic;" | {{{1|Insert the text of the quote here, without quotation marks.}}}
| width="20" valign="bottom" style="color:#B2B7F2;font-size:35px; font-family:'Times New Roman',serif;font-weight:bold;text-align:right;padding:10px 10px;" | ”
|-
|}<!-- {{subst:FULLPAGENAME}} -->
This renders fine, but when I move the line breaks a little the output becomes junk, and I cannot work out the logic of where lines may be broken. I assume there should be no such sensitivity to line-break positions, but I am not sure where to look.
{| style="margin:auto; border-collapse:collapse; border-style:none;class="cquote" {{#if: {{{bgcolor|}}} | border: 1px solid #AAAAAA;}}
| width="20" valign="top" style="color:#B2B7F2;
font-size:35px; font-family:'Times New Roman',serif;font-weight:bold;text-align:left;padding:10px 10px;" | “
| valign="top" style="padding:4px 10px; font-style: italic;" | {{{1|Insert the text of the quote here, without quotation marks.}}}
| width="20" valign="bottom" style="color:#B2B7F2;font-size:35px; font-family:'Times New Roman',serif;font-weight:bold;text-align:right;padding:10px 10px;" | ”
|-
|}<!-- {{subst:FULLPAGENAME}} -->

It turns out I did not have the ParserFunctions extension enabled. After enabling it, the template was parsed correctly.

Remove all <br/> tags from CKEditor Output html

I am using CKEditor in my application. When I save the content of the editor, the output contains tags like this:
<B>Summary:</B>
<P><BR><SPAN style="TEXT-ALIGN: left; WIDOWS: 2; TEXT-TRANSFORM: none; BACKGROUND-COLOR: rgb(255,255,255); TEXT-INDENT: 0px; LETTER-SPACING: normal; DISPLAY: inline !important; FONT: 15px/20px Helvetica, Arial, sans-serif; WHITE-SPACE: normal; ORPHANS: 2; FLOAT: none; COLOR: rgb(0,0,0); WORD-SPACING: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px">The company's latest tweet simply states that "our team continues to investigate, but at this time, we're still unable to confirm that any security breach has occurred. Stay tuned here."</SPAN></P><BR>
<P><BR>Facebook : http://www.facebook.com</P><BR>
How can I remove all the "break" tags from the sample above using a regular expression in JavaScript?
On save, the text should be appended to "Summary:", like this:
Summary: Call back the department if you have not heard from them.The initial story was triggered after a user in a Russian forum
claimed that he hacked and uploaded almost 6.5 millionThe initial
story was triggered after a user in a Russian forum claimed that he
hacked and uploaded almost 6.5 million
But at the moment it comes out like this:
Summary:
The initial story was triggered after a user in a Russian forum
claimed that he hacked and uploaded almost 6.5 millionThe initial
story was triggered after a user in a Russian forum claimed that he
hacked and uploaded almost 6.5 million
I am using replace(/[\n\r\f]/g, ' ') ;
replace(/\<!>[\s\S]*?\<!>/ig, '')
but neither helps. Please help.

Finally
CKEDITOR.instances.editor1.getData().replace(/(\r\n|\n|\r)/gm,"");
worked perfectly for my issue.
Thanks.

replace(/[\n\r\f]/g, ' ') ;
removes only literal newline characters, not the <BR> tags in the markup.
Try
replace(/<br\s*\/?>/gi, '') ;
which matches <BR>, <br> and <br/> regardless of case, or, if you want to remove all tags,
replace(/<[^>]*?>/g, ' ') ;
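The two patterns behave the same way in most regex flavours. As a quick way to check them outside the browser, here is a sketch in Python; the HTML is a shortened, hypothetical version of the editor output from the question, and the whitespace collapsing at the end is an extra step added for readability rather than part of the answer.

```python
import re

# Shortened, hypothetical version of the CKEditor output from the question.
html = ('<B>Summary:</B>\n'
        '<P><BR><SPAN style="COLOR: rgb(0,0,0)">The company\'s latest tweet.</SPAN></P><BR>\n'
        '<P><BR>Facebook : http://www.facebook.com</P><BR>')

# Matches <BR>, <br>, <br/> and <br /> in any letter case.
BR_TAG = re.compile(r'<br\s*/?>', re.IGNORECASE)
# Matches any tag at all.
ANY_TAG = re.compile(r'<[^>]*>')

# Option 1: drop only the break tags, keep the rest of the markup.
without_br = BR_TAG.sub('', html)

# Option 2: drop every tag, then collapse runs of whitespace into single spaces.
text_only = re.sub(r'\s+', ' ', ANY_TAG.sub(' ', html)).strip()

print(without_br)
print(text_only)
```

With the second option the text ends up on one line directly after "Summary:", which is the layout the question asks for.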

Add style attribute to images in divs with PHP

I have a newsletter that contains a few images inside a div with the class nieuwsbrief-tekst. I want to find those images and add inline CSS to them. I can find the div with preg_match, and I can also find the image tag itself, but adding style="" to the image tag hasn't worked so far.
There is also more than one nieuwsbrief-tekst div; these divs are the different content blocks, so there are 3 or 4 of them. I tried preg_replace, but it has no effect.
Any tips or suggestions on how to handle this?
The HTML looks like this, and I only want to add the style attribute to the images inside the div.
HTML Code:
<div class="nieuwsbrief-tekst">lorum ipmsum</div>
<div class="nieuwsbrief-tekst"><img src="#"></div>
<div class="nieuwsbrief-tekst">lorum ipmsum</div>
<div class="nieuwsbrief-tekst"><img src="#"></div>
PHP Code:
if(preg_match_all('/<div class="nieuwsbrief-tekst">(.*?)<\/div>/is', $var, $matches)) {
    foreach($matches[0] as $match) {
        if(preg_match('/<img[^>]+>/is', $match, $match_img)) {
            echo 'image found';
            $pattern = '/<img[^>]+>/is';
            $replacement = '<img style="float:left; margin:0 10px 10px 0;';
            $test = preg_replace($pattern, $replacement, $match_img);
        }
    }
    echo '<pre>';
    print($test);
    echo '</pre>';
}
Thanks :)

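This should accomplish what you want:
// Match a div containing optional text, an img tag, and optional text.
$pattern = "/<div class=\"tmp\">(\\w+)?<img ([^>]+)>(\\w+)?<\/div>/is";
// Re-emit the div with a style attribute (note its closing quote) placed before the original img attributes.
$replacement = "<div class=\"tmp\">\\1 <img style=\"float:left; margin:0 10px 10px 0;\" \\2 /> \\3</div>";
$str = preg_replace($pattern, $replacement, $str);
echo $str;
Just change tmp to the class name you need :)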
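The same two-step idea (find each div, then patch only the img tags inside it) can be sketched with a replacement callback. The example below is in Python rather than PHP and is only an illustration: it assumes the divs are not nested and that the images have no style attribute yet, and add_style is a hypothetical helper name. In PHP, preg_replace_callback offers the same callback mechanism.

```python
import re

# Hypothetical input: two content blocks plus one image outside any block.
HTML = '''<div class="nieuwsbrief-tekst">lorum ipmsum</div>
<div class="nieuwsbrief-tekst"><img src="#"></div>
<img src="outside.png">'''

STYLE = 'float:left; margin:0 10px 10px 0;'

# Opening tag, inner content (non-greedy), closing tag of one content block.
DIV = re.compile(r'(<div class="nieuwsbrief-tekst">)(.*?)(</div>)', re.S | re.I)
# Start of an img tag; the style attribute is inserted right after it.
IMG = re.compile(r'<img\b', re.I)


def add_style(html):
    """Add an inline style to every img inside a nieuwsbrief-tekst div."""
    def fix_div(match):
        inner = IMG.sub('<img style="%s"' % STYLE, match.group(2))
        return match.group(1) + inner + match.group(3)
    return DIV.sub(fix_div, html)


print(add_style(HTML))
```

Images outside the content blocks are left alone, and the text inside a div may contain spaces or other markup, which the (\w+)? groups in the pattern above would not match.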
