I have set up a basic webpage
testim.html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>Members</title>
</head>
<body>Done</body>
</html>
----------
I have then set up the following iMacros script. It's very basic, but every time I run it the if condition is not true. If I do an iimDisplay of the iimGetExtract value it shows Done, and still the if condition is not true.
var macro_var;
macro_var="CODE:VERSION BUILD=8530828 RECORDER=FX\n";
macro_var+="TAB T=1\n";
macro_var+="URL GOTO=http://www.example.com/testim.html\n";
macro_var+="TAG POS=1 TYPE=BODY ATTR=TXT:* EXTRACT=TXT\n";
iimPlay(macro_var);
if(iimGetExtract(0)=="Done"){
iimDisplay("success");
}
Thanks
Try this.
var macro_var;
macro_var="CODE:VERSION BUILD=8530828 RECORDER=FX\n";
macro_var+="TAB T=1\n";
macro_var+="URL GOTO=http://www.example.com/testim.html\n";
macro_var+="TAG POS=1 TYPE=BODY ATTR=TXT:* EXTRACT=TXT\n";
iimPlay(macro_var);
var result=iimGetLastExtract(); // read the text extracted by the TAG command after the macro finishes
if(result=="Done"){
iimDisplay("success");
}
Related
I am using BeautifulSoup v4 to parse out a string of HTML that looks like this:
<!DOCTYPE HTML>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office">
<head></head>
<body><p>Hello, world</p></body>
</html>
Here is how I am parsing it:
soup = BeautifulSoup(html)
Where html is the pasted HTML above. For whatever reason, BS keeps replacing the <html> tag with a plain <html> tag without the extra xmlns attributes. Any way I can tell BS not to do this?
I was able to figure it out by passing html5lib as the HTML parser to BS. But now it keeps dropping in a stray HTML comment in place of the DOCTYPE:
<!--<!DOCTYPE HTML-->
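For reference, which of these quirks you see depends on the tree builder you hand to BeautifulSoup. Here's a minimal sketch (assuming bs4 with the lxml and html5lib parsers installed; none of this is from the original post) that prints which attributes each parser keeps on the <html> tag and whether the DOCTYPE survives as a Doctype node or gets turned into a Comment:

from bs4 import BeautifulSoup
from bs4.element import Doctype, Comment

html = """<!DOCTYPE HTML>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office">
<head></head>
<body><p>Hello, world</p></body>
</html>"""

for parser in ("html.parser", "lxml", "html5lib"):
    soup = BeautifulSoup(html, parser)
    # Which attributes survived on the <html> tag?
    print(parser, soup.html.attrs)
    # Did the DOCTYPE come through as a Doctype node, or get mangled into a Comment?
    print([node for node in soup.contents if isinstance(node, (Doctype, Comment))])

Comparing the output of the three parsers makes it easy to pick the one whose behaviour is closest to what you need.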
I am running an iMacros macro that navigates a web site and saves a page to a file. I'm using a simple script as follows:
URL GOTO=http://myurl.com/
SAVEAS TYPE=HTM FOLDER=* FILE=*
The issue is that the saved HTML page is different from the one I get when saving from Firefox using File -> Save Page As... and selecting "Web Page, HTML only". It seems like iMacros does some processing on the page. For instance, this line
<meta charset="utf-8" />
becomes
<meta charset="utf-8">
This looks minor, but on some occasions I had elements that were reversed, which hid an issue with a wrongly closed tag. For example, where my page had
</form></div>
it was saved as
</div></form>
by iMacros.
Unfortunately I cannot find any reference to this issue on the iMacros forum. Any ideas?
Works for me.
Before starting the script, create a directory such as d:\reports:
URL GOTO=http://your_url
' Uncomment the next line if you don't want the extraction popup to show
'SET !EXTRACT_TEST_POPUP NO
TAG POS=1 TYPE=HTML ATTR=CLASS:* EXTRACT=HTM
SAVEAS TYPE=HTM FOLDER=d:\reports FILE=1.html
Try this code.
URL GOTO=http://myurl.com/
TAG POS=1 TYPE=HTML ATTR=CLASS:* EXTRACT=HTM
SAVEAS TYPE=HTM FOLDER=* FILE=*
Also, a good and easy way to test extracted data is www.jsbin.com.
I'm trying to make the XHTML, Twitter Cards, and Facebook Open Graph markup all validate for http://www.theyact.com/acting-classes/los-angeles/.
I've managed to get my code to come up valid everywhere... save for one error on http://validator.w3.org/:
there is no attribute "property"
Of the many meta elements with a property attribute in the code, only the one below seems to ruffle the validator's feathers:
<meta name="og:description" property="og:description" content="...
I'd like the code to be completely valid in validator.w3.org's eyes. What am I missing?
If you remove this element, the validator will complain about the next one containing the property attribute.
The property attribute is part of RDFa, but your DOCTYPE doesn’t allow the use of RDFa.
If you want to keep using XHTML 1.1, you could change it to:
for RDFa 1.0: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
for RDFa 1.1: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.1//EN" "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-2.dtd">
Or simply switch to (X)HTML5, which comes with RDFa 1.1 support.
I'm interested in using pugixml to parse HTML documents, but HTML has some optional closing tags. Here is an example: <meta http-equiv="Content-Type" content="text/html; charset=US-ASCII">
Pugixml stops reading the HTML as soon as it encounters a tag that's not closed, but in HTML missing a closing tag does not necessarily mean that there is a start-end tag mismatch.
A simple test of parsing the HTML documentation of pugixml fails because the meta tag is the second line of the HTML document: http://pugixml.googlecode.com/svn/tags/latest/docs/quickstart.html
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=US-ASCII">
<title>pugixml 1.0</title>
<link rel="stylesheet" href="pugixml.css" type="text/css">
<meta name="generator" content="DocBook XSL Stylesheets V1.75.2">
<link rel="home" href="quickstart.html" title="pugixml 1.0">
</head>
<!--- etc... -->
A lot of HTML documents in the wild would fail if I tried to parse them with pugixml. Is there a way to avoid that? If there is no way to "fix" that, is there another HTML parsing tool that's about as fast as pugixml?
Update
It would also be great if the HTML parser supported XPath.
I ended up taking pugixml, converting it into an HTML parser, and creating a GitHub project for it: https://github.com/rofldev/pugihtml
For now it's not fully compliant with the HTML specifications, but it does a decent enough job at parsing HTML that I can use it. I'm working on making it fully compliant.
One way to address this is to do some pre-processing that converts the HTML to XHTML, then it would "officially" be considered XML and usable with XML tools. If you want to go that route, see this question: How to convert HTML to XHTML?
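As a rough illustration of that pre-processing route, here is a minimal sketch, assuming Python with the html5lib package installed (the file names are just examples, and this is only one possible approach, not necessarily what the linked question recommends). It re-serialises tag-soup HTML as well-formed XML, which a strict XML parser such as pugixml can then load and query with XPath:

import html5lib
import xml.etree.ElementTree as ET

with open("quickstart.html", "rb") as f:
    # html5lib follows the HTML5 parsing algorithm, so void elements such as
    # <meta> and <link> are handled the way a browser would handle them.
    root = html5lib.parse(f, treebuilder="etree", namespaceHTMLElements=False)

# Serialise the repaired tree as XML; every element is now properly closed,
# so the output can be fed to pugixml (or any other XML/XPath tool).
ET.ElementTree(root).write("quickstart.xhtml", encoding="utf-8")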
Look at this situation:
www.websitea.com displays an img tag with a src attribute of www.websiteb.com/image.aspx?id=5 and style="display:none"
www.websiteb.com returns a clear image, along with a cookie named referrer with a value of 5 (created server-side from the validated querystring).
Would the cookie be created on domain www.websitea.com or www.websiteb.com?
Currently I use a series of redirects with querystrings to achieve cross-domain cookies, but I came up with this image idea a little while ago. I guess I could also use an iframe.
Thanks!
Check out:
cross-domain-user-tracking
Someone mentions using a 1x1 image for tracking across domains.
The cookie would be created for websiteb.com.
The cookie is created from the request to websiteb.com, so yeah, the cookie falls under websiteb's scope.
You're on the right track. As others have mentioned, the cookie would be created for websiteb.com.
To overcome issues with IE you'll probably need to add a Compact Privacy Policy (P3P).
Start here: http://msdn.microsoft.com/en-us/library/ms537342.aspx and Google for the rest.
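To make the mechanism concrete, here's a rough sketch of what the image endpoint on websiteb.com could look like. It's written in Python/Flask purely for illustration (the framework, route, and handler name are my own choices, not from the original code); the cookie name, id validation, and compact policy string mirror what's discussed in this thread:

from flask import Flask, request, make_response

app = Flask(__name__)

@app.route("/image.aspx")  # path kept from the question just for illustration
def tracking_pixel():
    # The response body would normally be the bytes of a 1x1 transparent GIF.
    resp = make_response(b"")
    resp.headers["Content-Type"] = "image/gif"
    # Compact privacy policy so older IE versions accept the third-party cookie.
    resp.headers["P3P"] = 'CP="CAO PSA OUR"'
    referrer_id = request.args.get("id", "")
    if referrer_id.isdigit():  # validate the querystring before storing it
        # Set-Cookie is sent by websiteb.com, so the cookie is scoped to
        # websiteb.com even though the <img> tag lives on a websitea.com page.
        resp.set_cookie("referrer", referrer_id)
    return resp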
OK, looks good. Tested in all browsers. I added a P3P header for IE6; not sure if it was necessary though.
<%@ Page Language="VB" %>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<script runat="server">
Protected Sub Page_Load(ByVal sender As Object, ByVal e As System.EventArgs)
Response.AddHeader("P3P", "CP=""CAO PSA OUR""")
Dim passedlocalizeID As String = Request.QueryString("id")
Dim localizeID As Integer
If passedlocalizeID IsNot Nothing AndAlso Int32.TryParse(passedlocalizeID, localizeID) Then
Dim localizer As New Localizer
localizer.LocalizeTo(localizeID)
End If
End Sub
</script>
<html xmlns="http://www.w3.org/1999/xhtml">
<head runat="server">
<title>Redirecting . . .</title>
<meta http-equiv="refresh" content="0;URL=/" />
</head>
<body>
<form id="form1" runat="server">
<div>
</div>
</form>
</body>
</html>