Getting WebPage to use a specific URL to download HTML resources - c++

I have a Qt program that downloads webpages (HTML), parses them and then generates its own HTML, which is then displayed with QWebPage. Sometimes the HTML that I download contains IMG tags, which work fine when the src attribute contains a full URL. However, sometimes the IMG tag might use a relative path like:
<IMG SRC="images/foo.png" />
Since I know the URL that should be prepended to the SRC my first thought was to just tack it onto my resulting HTML when I'm parsing. However, this is proving more difficult than I anticipated and now I'm wondering if there's a better way.
Is there any mechanism/property in QWebPage with which I can say "use this URL for relative paths"? Or maybe someone can suggest a better way to accomplish what I want?
Thanks!

In the comments, you mentioned that you're using QWebView::setHtml(). The second, optional parameter of this method sets the URL to use for resolving relative paths. According to the documentation:
External objects such as stylesheets or images referenced in the HTML
document are located relative to baseUrl.
Setting that parameter should be all that's needed here.
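To illustrate what resolving against a base URL means, here is a minimal sketch in Python (not Qt code) using the standard library's urljoin. The URLs are made-up placeholders, and the exact resolution QWebView performs is assumed to follow the same general rules:

```python
from urllib.parse import urljoin

# Hypothetical base URL, standing in for the page the HTML was downloaded from.
base_url = "http://example.com/articles/page.html"


def resolve(src: str) -> str:
    """Resolve an IMG src against the base URL, as a browser would."""
    return urljoin(base_url, src)


# A relative path is resolved against the directory of the base URL.
relative = resolve("images/foo.png")

# A full URL is left unchanged.
absolute = resolve("http://cdn.example.org/bar.png")

print(relative)
print(absolute)
```

With a base URL set, the relative src in the question would not need to be rewritten during parsing.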

Related

Broken image paths on GitHub Pages (without Jekyll)

I recently pushed a static HTML site to GitHub Pages. Since it's not a blog, I opted not to use Jekyll. Now, of course, all of my relative image links are broken, and I've yet to find a fix that isn't specific to Jekyll.
Any ideas for a fix?
Will that fix continue to work once I switch from the username.github.io URL to a custom URL?
Not sure if I'm understanding the question correctly here, but could you not just move the images to the relevant place? For example, if in index.html you had
<img src="images/photo.png">
could you not just move photo.png to a directory /images in the same folder as index.html?
Alternatively, you could change the img tags' src attribute to instead point to the relevant location.
Both of these would continue to work, so long as the images are in the same directory as the html file, or a subdirectory of that directory.

What does this URL mean?

http://localhost/students/index.cfm/register?action=studentreg
I do not understand the use of 'register' after index.cfm. Can anyone please help me understand what it could mean? There is an index.cfm file in the students folder. Could register be a folder name?
They might be using special commands within their .htaccess files to modify the URL to point to something else.
Things like pointing home.html -> index.php?p=home
ColdFusion will execute index.cfm. It is up to the script to decide what to do with the /register that comes after.
This trick is used to build SEO-friendly URLs. For example, in http://www.ohnuts.com/buy.cfm/bulk-nuts-seeds/almonds/roasted-salted, buy.cfm uses the /bulk-nuts-seeds/almonds/roasted-salted part to determine which page to show.
What's nice about this is that it avoids custom 404 error handlers and URL rewrites. This makes it easier for your application to directly manage the URLs used.
I don't know if it works on all platforms, as I've only used it on IIS.
You want to look into the cgi.PATH_INFO variable. It is populated automatically by the CF server when this URL format is used.
A better real-life example would look something like this.
I have a URL which I want to make prettier:
http://mybikesite/index.cfm?category=bicycles&manufacturer=cannondale&model=trail-sl-4
I can rewrite it this way:
http://mybikesite/index.cfm/category/bicycles/manufacturer/cannondale/model/trail-sl-4
Our cgi.PATH_INFO value is then: /category/bicycles/manufacturer/cannondale/model/trail-sl-4
We can parse it using list functions to get the same data that the original URL gives us automatically.
The second part of your URL is a plain GET variable; it is pushed into the URL scope as usual.
Both formats can be mixed: GET vars may be used for paging or any other secondary stuff.
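The list-style parsing described above can be sketched like this. It is written in Python rather than CFML purely for illustration, and parse_path_info is a hypothetical helper name, not part of ColdFusion:

```python
def parse_path_info(path_info: str) -> dict:
    """Turn '/key1/value1/key2/value2' into {'key1': 'value1', 'key2': 'value2'}."""
    # Split on '/' and drop the empty items caused by leading or doubled slashes.
    parts = [part for part in path_info.split("/") if part]
    # Pair alternating items as key/value; a trailing unpaired key is ignored.
    return dict(zip(parts[0::2], parts[1::2]))


params = parse_path_info(
    "/category/bicycles/manufacturer/cannondale/model/trail-sl-4"
)
print(params)
```

This recovers the same three values that the query-string version of the URL carries.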
index.cfm is using either a CFIF IsDefined("register") statement or a CFIF #cgi.Path_Info# CONTAINS statement to execute a function or perform a logic step.

Plot a graph in the html file using Django

I am doing a monitoring system using Django. In my views file, I have defined one class called showImage which collects the information necessary to plot a graph using matplotlib.
At the beginning, I just stored the image in a string buffer to represent it with HttpResponse:
buffer = StringIO.StringIO()
canvas = pylab.get_current_fig_manager().canvas
canvas.draw()
pilImage = PIL.Image.fromstring("RGB", canvas.get_width_height(), canvas.tostring_rgb())
pilImage.save(buffer, "PNG")
# Send the buffer in an HTTP response to the browser with the MIME type image/png set
return HttpResponse(buffer.getvalue(), mimetype="image/png")
However, I need to add some JavaScript to the HTML file to add more features. For that reason, I have decided to pass the image to the template in a variable and display it in the HTML file:
# serialize to HTTP response
response = HttpResponse(buffer.getvalue(), mimetype="image/png")
return render_to_response('eQL/dev/showImage.html', {'response':response})
My problem is that I don't really know how to display it in the HTML file, because I couldn't find any example of doing it. Does anyone know the answer?
Thanks in advance!
Do you mean that in your first implementation, your response was a PNG file, but now you wish to make the response an HTML file instead, containing the image?
Well firstly, you need to change the response MIME type from image/png to text/html or similar.
Secondly, I'm not sure why you are passing a HttpResponse object (containing the PNG data) into the template. Can the template even read that? Surely you just want to be passing the raw PNG data, not a HttpResponse object.
Finally, how to do it. As you may know, HTML isn't so great at embedding images. As with normal websites, you can include text in the page, but if you want an image, you need a separate file and must link to it using the <img src="..." /> element. This is tricky to do dynamically: you need to set up two separate URLs (one for the PNG and one for the HTML), served by two independent handlers (one generating the PNG, the other generating the HTML), and have the HTML link to the PNG URL.
If that is too hard, there is another way out, but it is a bit hacky: data URLs. They let you include image data in the HTML page itself, so you only need to produce one response. Unfortunately it is not well supported in Internet Explorer pre-9. IE8 supports images less than 32K, IE7 and below don't work. See the example on Wikipedia -- you are aiming to generate something like this:
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUA
AAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO
9TXL0Y4OHwAAAABJRU5ErkJggg==" alt="Red dot" />
Basically, take the PNG data, and Base64-encode it (use Python's base64 library). Then just put "data:image/png;base64," in front of it, and set that as the URL for the img src. In other words, pass the Base64-encoded string to Django's template engine, and construct the URL as part of the img tag in the template.
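A minimal sketch of that encoding step, using only Python's standard base64 module. The function name png_to_data_url is hypothetical, and the sample bytes are just the 8-byte PNG signature standing in for real image data such as buffer.getvalue():

```python
import base64


def png_to_data_url(png_bytes: bytes) -> str:
    """Base64-encode raw PNG bytes and prefix them to form a data URL."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return "data:image/png;base64," + encoded


# Stand-in data: only the PNG file signature, not a complete image.
sample = b"\x89PNG\r\n\x1a\n"
data_url = png_to_data_url(sample)
print(data_url)
```

In the view you could pass either the full data URL or just the encoded string to the template, and then use it as the src of the img tag.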

How to simply parse html references

How can I simply parse the links in an HTML document? For example, I receive an HTTP response containing HTML, which has links to other files that need to be downloaded, such as JPGs, CSS files and JS files. What is the simplest way to parse all these references?
Use an HTML parser for your platform/language.
There are some recommendations for c++ ones here.
Once you have a parsed document, you will need to look at each src and href in it - you will also need to remember the base tag, if one exists and add logic for external, relative and absolute paths.
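As a rough sketch of that approach, here it is in Python with the standard library's HTMLParser, rather than one of the C++ parsers. The class name ReferenceCollector and the sample URLs are made up for illustration:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ReferenceCollector(HTMLParser):
    """Collect every src/href value and resolve it against the base URL."""

    def __init__(self, page_url: str):
        super().__init__()
        self.base_url = page_url  # overridden if a <base href> tag is seen
        self.raw_refs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "base" and attrs.get("href"):
            # Remember the base tag; it changes how relative paths resolve.
            self.base_url = urljoin(self.base_url, attrs["href"])
            return
        for name in ("src", "href"):
            if attrs.get(name):
                self.raw_refs.append(attrs[name])

    def references(self):
        # urljoin handles relative, root-relative and absolute URLs alike.
        return [urljoin(self.base_url, ref) for ref in self.raw_refs]


html = (
    '<html><head><base href="http://example.com/site/">'
    '<link rel="stylesheet" href="css/main.css"></head>'
    '<body><img src="/img/a.jpg">'
    '<script src="http://cdn.example.org/app.js"></script></body></html>'
)
collector = ReferenceCollector("http://example.com/index.html")
collector.feed(html)
refs = collector.references()
print(refs)
```

Note that this collects every href, including ordinary links on anchor tags, so you would probably want to filter by tag name before downloading anything.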

Multiple pages html output from a .rst document in Django

I'm writing a Django app to serve some documentation written in reStructuredText.
I have many documents written in *.rst, each of them quite long, with many sections, subsections and so on.
Displaying the whole document in a single page is not a problem using Django filters, but I'd rather have just the topic index on a first page, with links to a URL where I can display a single section / subsection (which will need some 'previous | up | home | next' links, I guess...). In a way, this is similar to a 'multiple HTML page output' as in a DocBook / XML to HTML conversion.
Can anyone point me in some direction to build a document tree of a *.rst document and parse a single section of it, or suggest a clever way to obtain a similar result?
Choice 1. Include URL links to the other parts of the document.
You write an index.rst, part1.rst, part2.rst, etc. And your index.rst has links to the other parts. This requires almost no work, except careful planning to make sure that your RST HTML links are correct.
There's no "parse". You just break your document into sections. Manually.
[This seems so obvious, I'm afraid to mention it.]
Choice 2. Use Sphinx. It manages table-of-contents and inter-document connections very nicely.
However, the Sphinx extensions to RST aren't handled directly by Django, so you'd need to save the Sphinx output and then display that in Django. We use the JSON HTML Builder (http://sphinx.pocoo.org/builders.html?highlight=json#sphinx.builders.html.JSONHTMLBuilder) output from Sphinx. Then we render these documents through a template.
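A small sketch of the loading step, assuming each page written by the JSON builder is a JSON document with at least "title" and "body" keys (the body being an HTML fragment). The helper name extract_page and the sample content are made up, and the Django template rendering itself is left out:

```python
import json


def extract_page(raw_json: str) -> dict:
    """Pull out the fields a template would need from one Sphinx JSON page."""
    page = json.loads(raw_json)
    return {
        "title": page.get("title", ""),
        "body": page.get("body", ""),  # HTML fragment, to be marked safe in the template
    }


# Stand-in for the text of one generated page file.
sample = json.dumps({"title": "Part 1", "body": "<h1>Part 1</h1><p>Text</p>"})
context = extract_page(sample)
print(context["title"])
```

The resulting dictionary could then be passed as the context to render_to_response or a similar call.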