replacing special placeholders in HTML file with Qt - c++

Good day to everybody!
I have this sort of HTML file:
<tr>
<td>
<p>First name: </p>
</td>
<td>
<p> %first_name% </p>
</td>
</tr>
<tr>
<td>
<p>Last name: </p>
</td>
<td>
<p> %last_name% </p>
</td>
</tr>
I'm looking for a way to replace special markers of the form %smth% with concrete data. The project is being developed with Qt, so I wonder whether Qt provides methods that can do this.
Thanks!

The simplest solution might be QString & QString::replace ( const QString & before, const QString & after, Qt::CaseSensitivity cs = Qt::CaseSensitive ), which replaces every occurrence of the string before with the string after and returns a reference to this string.
Read the contents of your HTML file into a QString, then call QString::replace() once per marker to substitute the concrete data. For example:
// html is a QString holding the contents of the HTML file.
QString firstName("John");
html.replace("%first_name%", firstName);

Since regexps are not a reliable tool for this, I recommend using
XSLT, which is supported by the QtXmlPatterns library.
EDIT
Since someone thinks HTML can still be handled with regexps in this case, here are some examples that show where they fail:
You have a marker in an attribute (and you don't want it to be replaced):
<p class="%first name">
Someone decides to inject markup through a value:
map: %firstname -> <script language="javascript">....</script>
With XSLT, the substituted value will be escaped automatically.
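As a rough illustration of the escaping concern above, here is a minimal sketch in Python rather than Qt. The function name fill_template and the %name% marker pattern are assumptions made for illustration and are not part of any Qt API. It escapes each value before substituting it, so injected markup ends up as inert text; it does not address markers that sit inside attributes.

```python
import html
import re

# Assumed marker syntax: %name%, where name consists of word characters.
PLACEHOLDER = re.compile(r"%(\w+)%")


def fill_template(template, values):
    """Replace %name% markers with HTML-escaped values from a dict."""

    def substitute(match):
        key = match.group(1)
        if key not in values:
            # Leave unknown markers untouched rather than guessing.
            return match.group(0)
        return html.escape(values[key])

    return PLACEHOLDER.sub(substitute, template)


page = fill_template(
    "<p> %first_name% </p><p> %last_name% </p>",
    {"first_name": "John", "last_name": "<script>alert(1)</script>"},
)
```

Here the first marker becomes John, while the script tag in the second value is emitted as escaped text instead of live markup.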

Related

Find a table's last cell by regular expression

I want to use a regular expression (PCRE-compatible) to select a table
cell in an XML or HTML file. The cell spans several lines and contains
other elements with their attributes and values. The cell is supposed to be in the last column.
For some reasons I can't, and don't want to, use the ". matches newline" option.
For example, in this code:
EDITED:
<table colcount="4">
<tr>
<td colspan="2">
<para><text> Mike</text></para>
</td>
<td>
<tab />
</td>
<td1>
<para><text>Jack</text></para>
<para><text>Sarah</text></para>
</td>
</tr1>
<tr>
<td>
<para><text>Bob</text></para>
<para><text>Rita</text></para>
</td>
<td2 colspan="3" with>
<para><text>Helen</text></para>
</td>
</tr2>
<tr>
<td style="with:445px;">
<para><text>Sam</text></para>
</td>
<td>
<para><text>Emma</text></para>
<para><text>George</text></para>
</td>
<td>
</td>
<td3 colspan="">
<tab />
</td>
</tr3>
</table>
/EDITED
I want to find and select the whole last cell together with its start and end tags (<td and </td>)
and the end tag of the corresponding row(</tr>), that is:
EDITED:
Here is what I want to select in the table like above using RegEx:
Either from <td1 to </tr1> - or from <td2 to </tr2> - or from <td3 to </tr3>
/EDITED
The format has to be preserved (indentation and new lines); I mean I can't put, for example,
</tr> in front of the closing tag of the cell (</td>).
Indentation consists only of space characters.
Thanks for any help...
Best you can do with regex is:
<td(([^<]|<(?!\/td>))*)<\/td>\s*<\/tr>(?!(.|\r|\n)*<tr)
But this is kinda ugly, resource intensive and breaks when you have nested tables. A better route is indeed to use an XML or HTML parser for whichever programming language you're using.
If you want to select the last cell from EVERY row, as your updated question suggests, leave out the negative lookahead like so:
<td(([^<]|<(?!\/td>))*)<\/td>\s*<\/tr>
Working example here: http://refiddle.com/gt2
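To see the per-row pattern in action, here is a small sketch using Python's re module, which is not PCRE itself but accepts this pattern. The sample markup is a shortened version of the table from the question, and the helper name last_cells is made up for illustration. The slashes are left unescaped because Python patterns have no delimiters.

```python
import re

# Same pattern as above: a <td ...> whose body never crosses a </td>,
# followed by </td>, optional whitespace and the closing </tr>.
LAST_CELL = re.compile(r"<td(([^<]|<(?!/td>))*)</td>\s*</tr>")

SAMPLE = """<table colcount="4">
    <tr>
        <td colspan="2">
            <para><text> Mike</text></para>
        </td>
        <td>
            <para><text>Jack</text></para>
            <para><text>Sarah</text></para>
        </td>
    </tr>
    <tr>
        <td>
            <para><text>Bob</text></para>
        </td>
        <td colspan="3">
            <para><text>Helen</text></para>
        </td>
    </tr>
</table>"""


def last_cells(markup):
    """Return the last cell of every row, including the closing </tr>."""
    return [m.group(0) for m in LAST_CELL.finditer(markup)]


cells = last_cells(SAMPLE)
```

No "dot matches newline" flag is needed, because the negated character class [^<] already matches line breaks.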

How to handle dynamically changing id's with similar starting name using Webdriver

I am automating tests for a web application. I have a scenario for creating an admin, for which I have to fill in the name, email address and phone number text boxes. But the ids of these text boxes are dynamic.
userName, id='oe-field-input-41'
Email, id='oe-field-input-42'
phone number, id='oe-field-input-43'
First query:
The numbers in the ids are dynamic; they keep changing.
I tried to use an XPath to handle the dynamic value:
xpath = //*[starts-with(@id,'oe-field-input-')]
With this, the text is entered into the first text box successfully.
Second query:
I am not able to use the same XPath for the next two text boxes, as it enters the email and phone number into the name field only.
Please help me resolve this dynamic value handling.
Edited: added the html code,
<table class="oe_form_group " cellspacing="0" cellpadding="0" border="0">
<tbody>
<tr class="oe_form_group_row">
<td class="oe_form_group_cell oe_form_group_cell_label" width="1%" colspan="1">
<td class="oe_form_group_cell" width="99%" colspan="1">
<span class="oe_form_field oe_form_field_many2one oe_form_field_with_button">
<a class="oe_m2o_cm_button oe_e" tabindex="-1" href="#" draggable="false" style="display: inline;">/</a>
<div>
</span>
</td>
</tr>
<tr class="oe_form_group_row">
<td class="oe_form_group_cell oe_form_group_cell_label" width="1%" colspan="1">
<td class="oe_form_group_cell" width="99%" colspan="1">
<span class="oe_form_field oe_form_field_email">
<div>
<input id="oe-field-input-35" type="text" maxlength="240">
</div>
</span>
</td>
</tr>
<tr class="oe_form_group_row">
<td class="oe_form_group_cell oe_form_group_cell_label" width="1%" colspan="1">
<td class="oe_form_group_cell" width="99%" colspan="1">
<span class="oe_form_field oe_form_field_char">
<input id="oe-field-input-36" type="text" maxlength="32">
</span>
</td>
</tr>
<tr class="oe_form_group_row">
<td class="oe_form_group_cell oe_form_group_cell_label" width="1%" colspan="1">
<td class="oe_form_group_cell" width="99%" colspan="1">
<span class="oe_form_field oe_form_field_char">
<input id="oe-field-input-37" type="text" maxlength="32">
</span>
</td>
</tr>
<tr class="oe_form_group_row">
</tbody>
You can try an alternate way of locating a unique element, for example by its label:
css=.oe_form_group_row:contains(case_sensitive_text) input
xpath=//tr[@class = 'oe_form_group_row'][contains(.,'case_sensitive_text')]//input
If you are using ISFW, you should create a custom component for such form fields.
You do have some classes which are good for identification, e.g. oe_form_field_email, oe_form_field_char. It's a little complicated to use them because they're not on the input fields themselves, and the second one is not unique; but it's quite possible:
.//span[contains(@class, 'oe_form_field_email')]//input
That is an xpath which identifies the Email field as being the input which is a descendant of a span with the oe_form_field_email class. You could also use the same logic in a css selector like this, more efficiently:
span.oe_form_field_email input
For the two other fields, there is no unique class which can tell them apart so you're going to have to rely on the order (I'm assuming username comes before phone number), and that means you have to use xpaths:
(//tr//span[contains(@class, 'oe_form_field_char')])[1]//input
(//tr//span[contains(@class, 'oe_form_field_char')])[2]//input
Those xpaths pick out the first and second fields respectively, which are inputs which are descendants of a span of class oe_form_field_char.
P.S. I used Firepath in firefox to verify the xpath and css locators.
The problem here is that your XPath does the correct selection, but Selenium will always pick the first element if multiple results are returned for your query.
You can select each of the input fields directly by position. Note the parentheses: //input[1] without them would match every input that is the first input child of its own parent, which here is all of them.
(//input)[1]
(//input)[2]
(//input)[3]
If there are other input fields, you can tighten your selection by selecting only input nodes whose id attribute starts with oe-field-input-, like this:
(//input[starts-with(@id,'oe-field-input-')])[1]
(//input[starts-with(@id,'oe-field-input-')])[2]
(//input[starts-with(@id,'oe-field-input-')])[3]
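The idea of collecting all matching inputs in document order and then picking one by position can be tried without a browser. The sketch below uses Python's xml.etree.ElementTree on a simplified, well-formed version of the form. ElementTree's XPath subset has no starts-with() or contains(), so those checks are done in plain Python. The helper names are hypothetical, and this is not Selenium code.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed stand-in for the form in the question.
FORM = """<table>
  <tr><td>
    <span class="oe_form_field oe_form_field_email">
      <div><input id="oe-field-input-35" type="text"/></div>
    </span>
  </td></tr>
  <tr><td>
    <span class="oe_form_field oe_form_field_char">
      <input id="oe-field-input-36" type="text"/>
    </span>
  </td></tr>
  <tr><td>
    <span class="oe_form_field oe_form_field_char">
      <input id="oe-field-input-37" type="text"/>
    </span>
  </td></tr>
</table>"""


def inputs_with_prefix(root, prefix):
    """All <input> elements whose id starts with prefix, in document order."""
    return [el for el in root.iter("input") if el.get("id", "").startswith(prefix)]


def input_in_span_with_class(root, css_class):
    """First <input> below a <span> that carries css_class, or None."""
    for span in root.iter("span"):
        if css_class in span.get("class", "").split():
            return span.find(".//input")
    return None


root = ET.fromstring(FORM)
fields = inputs_with_prefix(root, "oe-field-input-")
email = input_in_span_with_class(root, "oe_form_field_email")
```

Indexing into fields plays the same role as the positional predicate on the parenthesized XPath, and the class lookup mirrors the span-based locator.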
The following XPaths work like a charm, although I don't recommend this kind of XPath. Since there is no text next to the text boxes, there is no other choice.
//div/input[contains(@id, 'oe-field-input')] - First text box
//tr[@class = 'oe_form_group_row'][2]//input - Second text box
//tr[@class = 'oe_form_group_row'][3]//input - Third text box
You can use the XPaths below.
//tr[@class = 'oe_form_group_row'][2]//input for the first text box
//tr[@class = 'oe_form_group_row'][3]//input for the second text box
//tr[@class = 'oe_form_group_row'][4]//input for the third text box.
I have tested the above XPaths.
But the better way, if you have access to the developers, is to ask them to standardize this and add attributes like "name" or "value", or attach label text, e.g. Email:, Password. Then you can use these in your XPath.

Regex, wordpress, and Search Regex plugin - removing affiliate links

I have WordPress blog(s) on which I am trying to use a plugin named 'Search Regex' to remove a part of the post text that I placed as an advertisement at the bottom (don't ask).
I've been going in circles for a few months trying to find a proper answer to this. I know MySQL doesn't fully support regex, and it would probably be painful to even try doing it that way, so I decided to use this plugin.
My WordPress blog has a couple of thousand posts, with almost the same code at the bottom, and the code looks something like this:
<!--more-->
<br />
<center>
<table width="100%">
<tbody>
<tr>
<td bgcolor="#000000" style="text-align: center; font-size: 16px; font-weight: bold;">
<a href="http://myaffiliate.com/?q2=affiliateid" target="_blank" rel="nofollow" ><img title="blabla" src="http://someimage.com/somewhere"></a><br />
<b>Some random Blah</b>
</td>
</tr>
</tbody>
</table>
</center>
Everything outside the tags is fixed and doesn't change, and everything inside the tags changes with each post.
Now, this is multiline text, and I SERIOUSLY have a problem finding a matching pattern, or even a tool, that could help me solve this puzzle. I have this spread over several blogs, and I figured that instead of spamming my blog with the same repetitive code that makes up a third of all the content in my posts, I can simply include it through single.php.
So, I want this permanently deleted from my database.
Thanks in advance for your help.
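I'm not an expert on blogs or regex, but couldn't you use a Python script to strip all that text out?
Something like this:
import re

# The wrapper around the ad is fixed; the cell contents vary, so match them lazily.
AD_BLOCK = re.compile(
    r'<!--more-->\s*<br />\s*<center>\s*<table width="100%">\s*<tbody>\s*<tr>'
    r'.*?'
    r'</tr>\s*</tbody>\s*</table>\s*</center>',
    re.DOTALL,  # let . match newlines, since the block spans several lines
)

# post_content is assumed to hold the text of one post.
cleaned = AD_BLOCK.sub("", post_content)
If I'm not wrong, that should find every place where an ad occurs and replace it with an empty string :)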
You could try something like this (always take a backup before testing)
global $wpdb;
// LIMIT 0,1 processes a single post while testing.
$posts = $wpdb->get_results("SELECT ID, post_content FROM {$wpdb->posts} LIMIT 0,1");
foreach ($posts as $p) {
    $pos = strpos($p->post_content, '<!--more-->');
    if ($pos === false || $p->ID <= 0) {
        continue; // no marker in this post
    }
    $more_content = substr($p->post_content, $pos);
    if (strpos($more_content, 'myaffiliate.com') !== false) {
        // Keep only the text before the <!--more--> marker.
        $content = substr($p->post_content, 0, $pos);
        $wpdb->query($wpdb->prepare("UPDATE {$wpdb->posts} SET post_content = %s WHERE ID = %d", $content, $p->ID));
    }
}
This is untested, but you get the idea.
This will remove the post content part after <!--more--> if it contains the string "myaffiliate.com" (this is somewhat lazy matching, but we could refine it with preg_match or preg_replace if you need it).
You could use LIMIT 0,1 while testing and then increase it to your needs.
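The cut-at-the-marker logic can be checked in isolation before touching the database. Here is a minimal sketch in Python, assuming the post text is available as a plain string; the function name strip_ad and the sample posts are made up for illustration.

```python
MARKER = "<!--more-->"


def strip_ad(post_content, needle="myaffiliate.com"):
    """Drop everything from the marker onwards if the tail contains needle."""
    pos = post_content.find(MARKER)
    if pos == -1:
        return post_content  # no marker, nothing to remove
    if needle not in post_content[pos:]:
        return post_content  # marker present, but no affiliate link after it
    return post_content[:pos]


with_ad = (
    "Post body.\n"
    "<!--more-->\n"
    '<center><a href="http://myaffiliate.com/?q2=affiliateid">ad</a></center>'
)
without_ad = "Post body.\n<!--more-->\nMore regular text."

cleaned = strip_ad(with_ad)
untouched = strip_ad(without_ad)
```

Posts that use the marker for ordinary "read more" text are left alone, which is the reason for checking the tail for the affiliate domain.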

How to use a regular expression to extract a substring?

So I'm trying to figure out regular expressions in Flex, but can't for the life of me figure out how to do the following.
From the sample below, I need to extract only "Mike_Mercury".
So I have to somehow strip out everything around it with RegExp, or whatever's best. Also, I need it to work with other samples as well. I'm getting this from the reddit API, so I'd have to extract that same section from a whole bunch of these. Thanks!
<table>
<tr>
<td>
<a href="http://www.reddit.com/r/atheism/comments/q2sfe/barack_obamas_insightful_words_on_abortion/">
<img src="http://d.thumbs.redditmedia.com/9StfiHi7hEbf8v73.jpg" alt="Barack Obama's insightful words on abortion"
title="Barack Obama's insightful words on abortion" /></a>
</td>
<td>
submitted by Mike_Mercury
to atheism
<br />
[link] <a href="http://www.reddit.com/r/atheism/comments/q2sfe/barack_obamas_insightful_words_on_abortion/">
[1722 comments]</a>
</td>
</tr>
</table>
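Try this regex (a trailing lazy group such as (.*?) would match the empty string, so capture the run of non-whitespace characters instead):
submitted by\s+(\S+)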
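One detail worth checking is what a capture group at the very end of a pattern actually returns. The sketch below uses Python's re module rather than Flex's RegExp class; the helper name extract_author and the shortened sample are assumptions for illustration. A lazy group with nothing after it stops as early as possible and captures an empty string, while \S+ captures the whole name.

```python
import re

SAMPLE = """<td>
submitted by Mike_Mercury
to atheism
<br />
</td>"""

# Capture the run of non-whitespace characters after "submitted by".
AUTHOR = re.compile(r"submitted by\s+(\S+)")


def extract_author(text):
    """Return the name following 'submitted by', or None if absent."""
    match = AUTHOR.search(text)
    return match.group(1) if match else None


author = extract_author(SAMPLE)

# For comparison: a lazy group at the end of the pattern captures nothing.
lazy_capture = re.search(r"submitted by (.*?)", SAMPLE).group(1)
```

If the name may be followed directly by markup, a narrower class such as [\w-]+ would be a safer choice than \S+.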

Notepad++ search & replace

I'm trying to convert an HTML file with 100 entries like this one:
<table>
<tr>
<td valign="top" width="30">
1.</td>
<td>
TEXT DESCRIPTION
</td>
</tr>
</table>
<table><tr><td></td></tr></table>
where the number "1." goes from 1 to 100, into this:
<li>
TEXT DESCRIPTION
</li>
I haven't found a way to do this, either with regexp or with extended search mode. Any ideas?
You could start with this:
Replace
.*<td>(.*[A-Za-z]+.*)<\/td>.*
with
<li>\1</li>
This will match one chunk of code of the form you posted. You must modify it to match multiple chunks of the same form in the same file.
Moreover, for it to work correctly, it should match lazily, i.e. use .*? in place of .* so that each match stops at the first closing tag rather than the last.
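Lazy matching over several chunks can be sketched outside Notepad++ with Python's re module. The pattern below is an assumption based on the entry layout shown in the question (a numbered cell, a text cell, then an optional empty spacer table); the name to_list_items is made up for illustration.

```python
import re

# One numbered entry, optionally followed by the empty spacer table.
ENTRY = re.compile(
    r"<table>\s*<tr>\s*<td[^>]*>\s*\d+\.\s*</td>\s*"
    r"<td>\s*(.*?)\s*</td>\s*</tr>\s*</table>"
    r"(?:\s*<table><tr><td></td></tr></table>)?",
    re.DOTALL,  # the lazy (.*?) may need to span several lines
)

SAMPLE = """<table>
<tr>
<td valign="top" width="30">
1.</td>
<td>
FIRST
</td>
</tr>
</table>
<table><tr><td></td></tr></table>
<table>
<tr>
<td valign="top" width="30">
2.</td>
<td>
SECOND
</td>
</tr>
</table>
<table><tr><td></td></tr></table>
"""


def to_list_items(markup):
    """Rewrite every numbered table entry as an <li> element."""
    return ENTRY.sub(r"<li>\n\1\n</li>", markup)


converted = to_list_items(SAMPLE)
```

Because the capture group is lazy, each match ends at the first closing cell tag, so the two entries are converted separately instead of being swallowed as one.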