Please suggest correct regular expression for below situation

I have the following response section.
06S
</td><td>
<img src="/Media/Images/Opr/2.png" title="Pa" />
</td><td style="0">
4
</td>
I want to extract 06S and 2 only if the value in the last table cell (4 in this sample) is NOT 0.
I wrote the following regular expression, but it does not work. Could anyone please help?
(?s)(.+?)
</td><td>
<img src="/Media/Images/Opr/(.+?).png" title="(.+?)" />
</td><td style="0">
([1-9]{1})
</td>

I didn't put a lot of time into making this pretty or anything but this will do what you asked.
<a href="\/Public\/Details\/(.+?)\?OID=(.+?)" .*?>.*?<\/a>\s*<\/td><td>\s*<img.*?\/>\s*<\/td><td.*?>\s*(?:[^\s0]|[^\s]{2,})?\s*<\/td>
This assumes your data arrives in a form very close to what you posted in the question. I'm a little preoccupied, so I couldn't make it much better.
obligatory regex101 link
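For the exact snippet posted in the question, here is a minimal Python sketch of the same idea. It assumes the cells are separated only by whitespace and that the last cell holds digits; the names SAMPLE, PATTERN and extract are made up for illustration. A negative lookahead rejects a cell whose whole value is 0, so values such as 10 still pass.

```python
import re

# Sample response copied from the question.
SAMPLE = """06S
</td><td>
<img src="/Media/Images/Opr/2.png" title="Pa" />
</td><td style="0">
4
</td>"""

PATTERN = re.compile(
    r'(\S+)\s*</td><td>\s*'                                   # group 1: e.g. 06S
    r'<img src="/Media/Images/Opr/([^"]+?)\.png" title="[^"]*" />\s*'  # group 2: e.g. 2
    r'</td><td style="0">\s*'
    r'(?!0\s*</td>)(\d+)\s*</td>'                             # reject a cell that is exactly 0
)

def extract(html):
    """Return (code, image_number), or None when the last cell is 0."""
    m = PATTERN.search(html)
    return (m.group(1), m.group(2)) if m else None

print(extract(SAMPLE))
```

The original attempt used [1-9]{1}, which only accepts a single non-zero digit; the lookahead keeps the "not 0" rule without limiting the value to one digit.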

Related

Regex to add style attribute to a specific <table> tag

** >> Please see Update near the bottom**
I am having to deal with a large amount of imported HTML code that is poorly formatted.
I have around 200 similar (but not identical) instances of the code, and each instance includes a specific set of <img> tags. In some instances, the <img> tags run from one to the next, with no line breaks in between. In other instances there are line breaks in the code, and these result in <br> tags being inserted into the final code sent to the browser.
This will make more sense once I illustrate what I mean:
Example #1: There are no breaks between the <img> tags...
<table align="center" border="0px"> <tbody><tr> <td> <img src="http://simplicitywebsitedesign.com/iOutlet/images/buttons/CustomerSatisfaction.png" alt="100% Customer Satisfaction" height="60" align="middle" width="140"> <img src="http://simplicitywebsitedesign.com/iOutlet/images/buttons/PaypalVerified.png" alt="Paypal Verified" height="60" align="middle" width="140"> <img src="http://simplicitywebsitedesign.com/iOutlet/images/buttons/FastDelivery.png" alt="Fast Delivery" height="60" align="middle" width="140"> <img src="http://simplicitywebsitedesign.com/iOutlet/images/buttons/Recycled.png" alt="100% Recyled Pre-owned Products" height="60" align="middle" width="140"> <img src="http://simplicitywebsitedesign.com/iOutlet/images/buttons/TopSellerRated.png" alt="Top Seller Rated" height="60" align="middle" width="140"> <img src="http://simplicitywebsitedesign.com/iOutlet/images/buttons/PhoneSupport.png" alt="Phone Support" height="60" align="middle" width="140"> </td> </tr> </tbody></table>
Example #2: There are breaks between the <img> tags...
<table align="center" border="0px">
<tbody><tr>
<td>
<img src="http://simplicitywebsitedesign.com/iOutlet/images/buttons/CustomerSatisfaction.png" alt="100% Customer Satisfaction" align="middle" height="60" width="140">
<img src="http://simplicitywebsitedesign.com/iOutlet/images/buttons/PaypalVerified.png" alt="Paypal Verified" align="middle" height="60" width="140">
<img src="http://simplicitywebsitedesign.com/iOutlet/images/buttons/FastDelivery.png" alt="Fast Delivery" align="middle" height="60" width="140">
<img src="http://simplicitywebsitedesign.com/iOutlet/images/buttons/Recycled.png" alt="100% Recyled Pre-owned Products" align="middle" height="60" width="140">
<img src="http://simplicitywebsitedesign.com/iOutlet/images/buttons/TopSellerRated.png" alt="Top Seller Rated" align="middle" height="60" width="140">
<img src="http://simplicitywebsitedesign.com/iOutlet/images/buttons/PhoneSupport.png" alt="Phone Support" align="middle" height="60" width="140">
</td>
</tr>
</tbody></table>
As mentioned, for reasons unknown to me, the WordPress site that uses this code inserts <br> tags when code example #2 is sent to the browser.
That results in the images displaying as follows (on Firefox):
Code sample #1 displays like this:
I am thinking the best way to resolve this is to do a search/replace via MySQL on the DB, using a regular expression that will identify instances of code example #2 and make them like code example #1. In other words, the line breaks will be removed from between the relevant <img> tags.
Two questions:
1) Is that in fact the best way to go about this, or is there a potentially better way?
2) If that is a valid and suitable way to do it, could you suggest a suitable regular expression?
(With question 2, I am not sure which regex engine applies. The regex will be run within MySQL, using the Mac app Sequel Pro (http://www.sequelpro.com/).)
My take on the possible Regex logic
My guess is that we need to:
1) Find instance of <table...> ... </table>
2) Find instances of </img> (soft line break) <img ...> within code identified by #1 above
3) Remove (soft line break)
There is one other <table> ... </table> set within the code that will be searched. There is only one <img> within that instance. There are exactly 6 <img> instances within the <table> ... </table>
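The three steps above can be sketched in Python, assuming the post HTML has been exported from the database as a string. This is only an illustration, not a MySQL solution: the function name join_imgs is hypothetical, tables are assumed not to be nested, and the "exactly 6 images" rule from the question is used to leave the other table alone.

```python
import re

# Step 1: find each <table> ... </table> block (assumes tables are not nested).
TABLE_RE = re.compile(r"<table\b.*?</table>", re.DOTALL | re.IGNORECASE)
# Step 2: an <img> tag followed by a line break and then another <img> tag.
BREAK_BETWEEN_IMGS = re.compile(r"(<img\b[^>]*>)\s*\n\s*(?=<img\b)", re.IGNORECASE)

def join_imgs(html):
    """Remove line breaks between <img> tags inside the six-image table only."""
    def fix_table(m):
        table = m.group(0)
        if len(re.findall(r"<img\b", table, re.IGNORECASE)) != 6:
            return table  # not the icon table, leave it untouched
        # Step 3: replace the line break with a single space.
        return BREAK_BETWEEN_IMGS.sub(r"\1 ", table)
    return TABLE_RE.sub(fix_table, html)
```

Working table by table keeps the replacement from touching line breaks anywhere else in the post.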
Update, taking comments into account
It has been suggested that I use the flex CSS display attribute, and apply it to the table row. I've done that, and it works well. I am a little concerned about compatibility on older browsers, as I gather it's a relatively recent CSS addition.
I do, however, still need to do a search/replace to locate the correct <table> in the HTML.
In most of the HTML instances, there are two <table> ... </table> blocks. So I suspect the regex would need a negative lookahead for something like /stars/, which appears in a URL inside the <table> I don't want modified. Then it would be a matter of replacing <table> with <table id="green-icons">.
Thanks.
Jonathan
P.S. I am aware there is a LOT of contention around whether or not regex is a valid way to make changes to HTML. As this is a relatively fixed and known set of HTML, I suspect it'll be okay. But I am also open to other suggestions.
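As a rough sketch of the update's idea, the Python below adds id="green-icons" to every table whose body does not contain /stars/. It checks the body in a callback rather than with a lookahead inside the pattern, which is easier to read. It assumes the tables are not nested, and the function name tag_icon_table is hypothetical.

```python
import re

# Group 1: attributes of the opening tag; group 2: everything up to </table>.
TABLE_RE = re.compile(r"<table\b([^>]*)>(.*?)</table>", re.DOTALL | re.IGNORECASE)

def tag_icon_table(html, table_id="green-icons"):
    """Add an id to each <table> that does not contain '/stars/'."""
    def add_id(m):
        attrs, body = m.group(1), m.group(2)
        # Skip the ratings table and any table that already has an id.
        if "/stars/" in body or re.search(r"\bid\s*=", attrs, re.IGNORECASE):
            return m.group(0)
        return '<table id="%s"%s>%s</table>' % (table_id, attrs, body)
    return TABLE_RE.sub(add_id, html)
```

Because tables that already carry an id are skipped, running it twice over the same post should not add the attribute a second time.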

Regex, wordpress, and Search Regex plugin - removing affiliate links

I have several WordPress blogs, and I am trying to use a plugin named 'Search Regex' to remove part of the post text that I placed as an advertisement at the bottom of each post (don't ask).
I have been going in circles for a few months trying to find a proper answer to this. I know MySQL doesn't fully support regex, and it would probably be painful to even try doing it that way, so I decided to use this plugin.
My WordPress blog has a couple of thousand posts, with almost the same code at the bottom, and the code looks something like this:
<!--more-->
<br />
<center>
<table width="100%">
<tbody>
<tr>
<td bgcolor="#000000" style="text-align: center; font-size: 16px; font-weight: bold;">
<a href="http://myaffiliate.com/?q2=affiliateid" target="_blank" rel="nofollow" ><img title="blabla" src="http://someimage.com/somewhere></a><br />
<b>Some random Blah</b>
</td>
</tr>
</tbody>
</table>
</center>
Everything outside is fixed and doesn't change, and everything inside the tags changes with each post.
This is multiline text, and I am SERIOUSLY struggling to find a matching pattern, or even a tool, that could help me solve this puzzle. I have this spread across several blogs, and I figured that instead of filling my blog with the same repetitive code, which makes up a third of the content in my posts, I can simply include it through single.php.
So, I want this permanently deleted from my database.
Thanks in advance for your help.

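I'm not an expert on blogs or regex, but couldn't you use a Python script to strip that block out of each post?
Something like this:
import re
# The wrapper is fixed, so match it literally; (?s) lets .*? span the lines that change per post.
ad_block = re.compile(r'(?s)<!--more-->\s*<br />\s*<center>\s*<table width="100%">.*?</table>\s*</center>')
# post_content is assumed to hold the text of one post.
cleaned = ad_block.sub("", post_content)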
If I'm not wrong, that should find all the cases where it finds an ad and replace it with a blank string :)

You could try something like this (always take a backup before testing)
global $wpdb;
// LIMIT 0,1 processes a single post while testing.
$posts = $wpdb->get_results("SELECT ID, post_content FROM {$wpdb->posts} LIMIT 0,1");
foreach ($posts as $p) {
    $pos = strpos($p->post_content, '<!--more-->');
    if ($pos === false || $p->ID <= 0) {
        continue; // no marker in this post
    }
    $more_content = substr($p->post_content, $pos);
    if (strpos($more_content, 'myaffiliate.com') !== false) {
        // Keep only the text before <!--more-->.
        $content = substr($p->post_content, 0, $pos);
        $wpdb->query($wpdb->prepare("UPDATE {$wpdb->posts} SET post_content=%s WHERE ID=%d", $content, $p->ID));
    }
}
This is untested, but you get the idea.
This will remove the post content part after <!--more--> if it contains the string "myaffiliate.com" (this is somewhat lazy matching, but we could refine it with preg_match or preg_replace if you need it).
You could use LIMIT 0,1 while testing and then increase it to your needs.
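One possible refinement, sketched in Python rather than PHP: instead of cutting everything after <!--more-->, remove only the block from <!--more--> through </center>, and only when it contains the affiliate domain. This assumes the wrapper looks like the sample in the question; the names AD_BLOCK and strip_ad are hypothetical.

```python
import re

# From the <!--more--> marker through the closing </center> of the ad wrapper.
AD_BLOCK = re.compile(r"<!--more-->\s*<br />\s*<center>.*?</center>\s*", re.DOTALL)

def strip_ad(post_content, marker="myaffiliate.com"):
    """Remove the ad block only when it contains the affiliate domain."""
    def drop_if_ad(m):
        return "" if marker in m.group(0) else m.group(0)
    return AD_BLOCK.sub(drop_if_ad, post_content)
```

Posts whose <!--more--> section is not an ad are returned unchanged, so the script can be run over every post.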

XSL Links to pages

I'm a novice in XSL, so excuse me if my question is too easy.
Look at this code:
<table class="foot_table">
<tr>
<td>
<div id="open_all">
show all
</div>
</td>
<td>
<div id="producers_footer">
close
</div>
</td>
</tr>
</table>
So I have a page, showall.xsl. How can I connect the XSLT template with that page?
Right now my page (showall) is empty.
Can you show me some examples?

Not exactly sure, but I think you are looking for an XSLT processor. An XSLT processor takes the source document (HTML in your case), applies showall.xslt to it, and produces a new output.
Some references:
http://www.xml.com/pub/a/2000/08/30/xsltandhtml/index.html
http://en.wikipedia.org/wiki/Category:XSLT_processors

You can download a free copy of Visual Studio Express 2012 and process the xslt and xml in there.

XSS Cross Site Scripting - Jsp <Input> tag

The following piece of code in my JSP caused a cross site scripting vulnerability on the input tag.
<form name="acctFrm" method="post" action="<%=contextPath%>/form/acctSummary?rpt_nm=FIMM_ACCT_SUMM_RPT">
<table>
<tr>
<td>Account Id:</td>
<td>
<input class="tbl1" type="text" id="acctId" name="acctId" size="20" maxlength="10" value="<%=rptBean.getAcctId()%>"/>
<img class="tbl1" src="<%=contextPath%>/img/Submit.gif" border="0" />
</td>
</tr>
</table>
</form>
During penetration testing, the testers were able to show an arbitrary alert to the user by injecting a script into the value attribute of the tag, as follows:
<input class="tbl1" type="text" id="acctId" name="acctId" size="20" maxlength="10" value="1"><script>alert(12345)</script>" />
What is the problem here, and what would be the fix?
I read through some online references on XSS, but I still wasn't 100% sure what the issue could be.
Any help would be greatly appreciated.
Thanks,
Deena

I have used the following solution,
The scriptlet in the value attribute is the problem. I replaced it with a JSTL tag; I read somewhere that JSTL tags have a built-in escaping mechanism to avoid XSS issues.
<input class="tbl1" type="text" id="acctId" name="acctId" size="20" maxlength="10" value="<c:out value="${rptBean.acctId}"/>"/>
This works well for my issue.
Thanks

It seems the penetration testers were able to manipulate their session such that rptBean.getAcctId() would return an arbitrary string. If they could inject quotes and a right bracket, they could "force close" the input tag and insert their own script tag.
It looks like penetration testers got the method to return the string 1"><script>alert(12345)</script>.
This indicates that you need to escape the data when writing to the page. I would suggest taking a look at the answer on escaping HTML in jsp.
Also, remember that code does not have to be "perfectly" formatted for a browser to render it "correctly". Here are some links on how attackers may try evade XSS filters:
http://blog.whitehatsec.com/tag/filter-evasion/
http://ha.ckers.org/xss.html
Always treat user data as "dangerous" and take care when rendering it on a page.
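To make the escaping step concrete, here is a small Python illustration of the idea (not JSP): the standard library's html.escape turns the quote and angle brackets from the payload into entities, so the value can no longer close the attribute. The render_input helper is made up for this example.

```python
import html

def render_input(acct_id):
    """Build the input tag with the value escaped for an HTML attribute."""
    safe = html.escape(acct_id, quote=True)  # escapes &, <, >, " and '
    return '<input type="text" name="acctId" value="%s"/>' % safe

# The string the penetration testers managed to inject.
payload = '1"><script>alert(12345)</script>'
print(render_input(payload))
```

The browser then shows the payload as text inside the field instead of running it, which is the same effect the escaping in <c:out> is meant to give.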

It seems that using the JSTL tag <c:out value=""> in a value attribute causes errors in JSTL <form options> tags.
For more information, see "XSS prevention in JSP/Servlet web application".

If the data returned by getAcctId() comes from the DB, you can validate it before sending it to the client, for example by checking that it is a number.
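A minimal sketch of that kind of check, in Python for illustration. It assumes account IDs are purely numeric and at most 10 characters long (the maxlength on the form); neither rule is confirmed by the question, and the function name is hypothetical.

```python
import re

# Assumption: an account id is 1 to 10 ASCII digits.
ACCT_ID_RE = re.compile(r"[0-9]{1,10}")

def is_valid_acct_id(value):
    """Return True only when the whole value is 1 to 10 digits."""
    return ACCT_ID_RE.fullmatch(value) is not None
```

Validation like this narrows what can reach the page, but the output should still be escaped when it is written.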

How to use a regular expression to extract a substring?

I'm trying to figure out regular expressions in Flex, but I can't for the life of me figure out how to do the following.
From the sample below, I need to extract only "Mike_Mercury".
So I have to somehow strip out everything around it with RegExp, or whatever's best. I also need it to work with other samples. I'm getting this from the reddit API, so I'd have to extract that same section from a whole bunch of these. Thanks!
<table>
<tr>
<td>
<a href="http://www.reddit.com/r/atheism/comments/q2sfe/barack_obamas_insightful_words_on_abortion/">
<img src="http://d.thumbs.redditmedia.com/9StfiHi7hEbf8v73.jpg" alt="Barack Obama's insightful words on abortion"
title="Barack Obama's insightful words on abortion" /></a>
</td>
<td>
submitted by Mike_Mercury
to atheism
<br />
[link] <a href="http://www.reddit.com/r/atheism/comments/q2sfe/barack_obamas_insightful_words_on_abortion/">
[1722 comments]</a>
</td>
</tr>
</table>

Try this regex; the capture group takes the run of non-whitespace characters after "submitted by":
submitted by (\S+)
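Here is a quick way to check a pattern like this against the sample, sketched in Python rather than ActionScript. It assumes the user name contains no whitespace and uses \S+ so the capture stops at the end of the name; the function name submitter is made up.

```python
import re

# Fragment of the sample from the question.
SAMPLE = """<td>
submitted by Mike_Mercury
to atheism
<br />"""

def submitter(html):
    """Return the user name that follows 'submitted by', or None."""
    m = re.search(r"submitted by\s+(\S+)", html)
    return m.group(1) if m else None

print(submitter(SAMPLE))
```

The same pattern should carry over to Flex's RegExp class, since \s and \S are standard there too, but I have not tested it in Flex.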
