Does libcurl load a complete page in a single shot?

I'm using libcurl to fire HTTP requests.
Does libcurl load the complete page in a single shot, or does it issue separate requests for sub-links on the page, e.g. .css or .png files?

libcurl does not automatically send any sub-requests for any links in the requested resource. This would be a completely unreasonable behaviour for any linked media.
To retrieve linked media, you have to extract the links from the resource you initially retrieve, and then do separate requests for them as needed (just like a web browser does behind the scenes).
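A sketch of that extract-then-request pattern with libcurl's easy interface (the URLs and file names here are illustrative, and the HTML parsing itself is left to you):

#include <curl/curl.h>

// libcurl fetches exactly one resource per request; it never follows
// <img>, <link> or <script> references on its own.
static void fetch(CURL *curl, const char *url)
{
    curl_easy_setopt(curl, CURLOPT_URL, url);
    curl_easy_perform(curl); // response body goes to stdout by default
}

int main()
{
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL *curl = curl_easy_init();

    fetch(curl, "http://www.example.com/");          // 1. the HTML page itself
    // ... parse the HTML and extract the linked resources ...
    fetch(curl, "http://www.example.com/style.css"); // 2. one explicit request
    fetch(curl, "http://www.example.com/logo.png");  //    per linked resource

    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return 0;
}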

Related

Static content on CloudFront is cached incorrectly over time

I have set up a CloudFront distribution on top of multiple S3 buckets (in different regions) to provide a fast, stable version of my webapp. This webapp is implemented with React, which means it's all one single HTML file and one single Javascript file.
Using the routing mechanism of React, all the paths in the URL are handled within the code. This means that if I click on a link like www.example.com/users, no request is sent to the server. Instead, the client code renders the appropriate page without consulting the server (I'm just talking about the HTML, not the data). So if a user types in that URL, the server should return index.html (the only HTML file I have), which will then take care of the URL on the client side. In other words, every request sent to the server should return either the HTML file or the Javascript file I mentioned earlier, even requests that point to non-existent files.
In order to implement this requirement, I asked this question and I got an answer like this:
I need to set up an error page for my distribution on CloudFront and redirect all the 403 (Forbidden) requests to the /index.html file. This is because when the request points to a non-existing file on S3, S3 will return 403 to CloudFront due to the lack of listing permission. Or I can grant the listing permission and instead handle the 404 error (I didn't test this latter option).
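(For reference, in CloudFront's API that setup corresponds to a custom error response on the distribution, roughly like this fragment of a DistributionConfig; the caching TTL is just an illustrative value:)

"CustomErrorResponses": {
    "Quantity": 1,
    "Items": [
        {
            "ErrorCode": 403,
            "ResponsePagePath": "/index.html",
            "ResponseCode": "200",
            "ErrorCachingMinTTL": 300
        }
    ]
}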
Anyway, I set this up and it works perfectly - for a few hours. But then, for some unknown reason, the request for the Javascript file also starts returning the HTML file. And of course, everything I get back is actually coming from CloudFront's cache, which means that no matter how many times I send the request, it keeps returning the same value. That is, until I invalidate the cache on CloudFront, which solves the problem for a few more hours. And round and round we go.
I'm not sure why this happens, but my guess is that at some point the S3 bucket is inaccessible to CloudFront, which results in CloudFront caching the index.html. What can I do about this?
I think I found the problem:
MAKE SURE THE STATIC CONTENT ON ALL THE S3 BUCKETS IS IDENTICAL!!!
In my case, the Javascript filename is automatically generated by Webpack, which means it's random. And since the builds for different regions were "compiled" separately, their filenames differed.

What's the http request for the page source?

I've managed to make a file downloader in C++ (using winsock). It can download any direct link to a file, like: www.page.com/image.png
I want to make it download all of the images from an entire page, such as all the images from a 4chan thread, but I don't know what I should send in the http request to get the page's source. How can I request the source of a webpage?
You don't send anything in the http request, in the manner you're thinking.
An http client sends a single request, for a single document, and the server returns a single document in response.
To download an entire page, you will have to parse the downloaded HTML document, extract all the relative links from the HTML source, then issue a separate http request for every image, css, js, etc... referenced from the main document.
This is how tools like wget (with its --recursive option) download entire pages.
If the page is located at the root of the http://www.page.com server, you would send a GET request to the www.page.com server asking for the / resource:
GET / HTTP/1.1
Host: www.page.com
Let's say the page was actually located at http://www.page.com/thepage.html. You would send a GET request asking for /thepage.html instead:
GET /thepage.html HTTP/1.1
Host: www.page.com
Either way, you would then have to parse the resulting HTML to get the individual URLs of all the <img> tags that are on the page.
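Since you're already using winsock, here's a minimal sketch of sending that second request by hand and reading back the page source (host and path taken from the example above; error handling kept minimal):

#include <winsock2.h>
#include <ws2tcpip.h>
#include <cstdio>
#include <string>
#pragma comment(lib, "ws2_32.lib")

int main()
{
    WSADATA wsa;
    if (WSAStartup(MAKEWORD(2, 2), &wsa) != 0) return 1;

    // Resolve the server name to an address, port 80 for plain http.
    addrinfo hints = {};
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_protocol = IPPROTO_TCP;
    addrinfo *result = 0;
    if (getaddrinfo("www.page.com", "80", &hints, &result) != 0) return 1;

    SOCKET sock = socket(result->ai_family, result->ai_socktype, result->ai_protocol);
    if (sock == INVALID_SOCKET ||
        connect(sock, result->ai_addr, (int)result->ai_addrlen) == SOCKET_ERROR)
        return 1;
    freeaddrinfo(result);

    // The request itself: ask for the resource path, not a file on disk.
    const std::string request =
        "GET /thepage.html HTTP/1.1\r\n"
        "Host: www.page.com\r\n"
        "Connection: close\r\n"
        "\r\n";
    send(sock, request.c_str(), (int)request.size(), 0);

    // Read until the server closes the connection.
    char buf[4096];
    int n;
    while ((n = recv(sock, buf, sizeof(buf), 0)) > 0)
        fwrite(buf, 1, n, stdout);

    closesocket(sock);
    WSACleanup();
    return 0;
}

What you read back is the raw HTTP response: a status line and headers, followed by the HTML body you would parse for the <img> URLs.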

Does qtwebkit load a complete page in a single shot?

I'm using qtwebkit for HTTP requests.
Does qtwebkit load an http page in a single shot? Does it create separate requests for Javascript (.js), style sheet (.css) and image links?
If it creates separate requests for these links, do we have access to / control over those requests?
First of all, a single HTTP request fetches a single file, so there are multiple requests in any case.
Second, a single TCP connection can be re-used for them - a persistent connection.
See QNetworkAccessManager and HTTP persistent connection for more info.
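And you do have a hook into those per-resource requests: QtWebKit routes every one of them through the page's QNetworkAccessManager, so a subclass can observe or modify each request. A minimal sketch (Qt 4 style API; the class name is illustrative):

#include <QNetworkAccessManager>
#include <QNetworkRequest>
#include <QNetworkReply>
#include <QWebView>
#include <QDebug>

// Sees every sub-request WebKit issues: the page itself, .js, .css, images.
class LoggingNetworkAccessManager : public QNetworkAccessManager
{
protected:
    QNetworkReply *createRequest(Operation op,
                                 const QNetworkRequest &request,
                                 QIODevice *outgoingData = 0)
    {
        qDebug() << "requesting" << request.url();
        // Headers could be inspected or rewritten here before forwarding.
        return QNetworkAccessManager::createRequest(op, request, outgoingData);
    }
};

// Usage: install it on the page before loading anything.
// QWebView view;
// view.page()->setNetworkAccessManager(new LoggingNetworkAccessManager);
// view.load(QUrl("http://www.example.com/"));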

How to open the default browser in background and get the source code of a web page?

I'm using Dev-C++ and I'm looking for a way to open the default browser (for example I.E.) in the background - or better, to load a browser instance in the background - and send a request to get the source code of the page I requested.
Can I do something like this in C++?
Thank you!
P.S. I need this for Windows
You seem to have imagined the wrong solution for your problem. If you want to get the HTML source for a web page, you don't need to somehow do it through the browser. You need to do whatever the browser does to get it.
When you enter an address into a browser, the browser sends a HTTP GET request to the server that hosts the resource you're requesting (often a web page) and the server sends a HTTP response back containing the resource content (often HTML) back.
You want to do the same in your application. You need to send a HTTP request to the server and read the response. A popular library for doing this is libcurl.
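A minimal libcurl sketch that collects a page's source into a string (the URL is illustrative):

#include <curl/curl.h>
#include <cstdio>
#include <string>

// Called by libcurl for each chunk of the response body.
static size_t write_cb(char *data, size_t size, size_t nmemb, void *userdata)
{
    static_cast<std::string *>(userdata)->append(data, size * nmemb);
    return size * nmemb;
}

int main()
{
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL *curl = curl_easy_init();
    if (!curl) return 1;

    std::string html;
    curl_easy_setopt(curl, CURLOPT_URL, "http://www.example.com/");
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_cb);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &html);
    curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L); // follow redirects

    if (curl_easy_perform(curl) == CURLE_OK)
        printf("%s\n", html.c_str()); // the page's HTML source
    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return 0;
}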
If you don't have to send a POST request (i.e. it's just a simple web request, possibly with parameters passed on the URL (GET)), then you could just use URLDownloadToFile().
If you don't want to use callbacks etc. and just want to download the file, you can call it rather simply (include urlmon.h and link against urlmon.lib):
URLDownloadToFile(0, "http://myserver/myfile", "C:\\mytempfile", 0, 0);
There are also a few other functions provided that will automatically push the downloaded data to a stream rather than a file.
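For example, URLOpenBlockingStream() hands you a COM stream you can read the content from directly, along these lines (same illustrative URL as above):

#include <windows.h>
#include <urlmon.h>
#include <cstdio>
#pragma comment(lib, "urlmon.lib")

int main()
{
    IStream *stream = 0;
    // Downloads into a COM stream instead of a temporary file.
    if (FAILED(URLOpenBlockingStream(0, TEXT("http://myserver/myfile"), &stream, 0, 0)))
        return 1;

    char buf[4096];
    ULONG read = 0;
    // The stream yields the resource content itself (no HTTP headers).
    while (SUCCEEDED(stream->Read(buf, sizeof(buf), &read)) && read > 0)
        fwrite(buf, 1, read, stdout);

    stream->Release();
    return 0;
}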
It can't be done in pure C++. You should use a native Windows library or another framework (like the Qt Framework) and use its capabilities for fetching and parsing websites. In Qt, you'd use QtWebkit.
edit: also, if you only want the source code of a page, you can do this without a browser or its engine - you can use Winsock.

api request returns json files and not html/xml browser content

I am sending GET HttpWebRequests to the Facebook Graph API. All was working fine until I deployed to the production server; now a module that expects an html/xml response is not working, and when I test the URL in Internet Explorer, a save-file dialog pops up and the file has to be saved.
Other modules also send requests to the Facebook Graph API and differ only in the form of the requests, so I'm not sure what is going on here.
Any ideas appreciated
Edit:
Let me try and rephrase this. On my production server, the HttpWebRequest was not returning the correct result. To test it, I copied the URL http://graph.facebook.com/pepsi, an example that should return profile info viewable in the browser. The server has Internet Explorer v8, and I am not sure why it tries to download the file instead of displaying it in the browser. The same thing is happening in my code; and when I make a request to a different part of the API, it works in my app but not in the browser.
Your question is not very clear. From what I gather, you want the display the JSON response in a browser. Instead, you are being asked to download a file by the browser.
Well, this is normal behaviour. The response you get from Facebook will most likely have a MIME type of application/json. Most newer web browsers display the text in the browser itself. Some browsers, however, don't know how to handle this content type and just ask you to download the file.
You mentioned that your module expects an html/xml response. Try changing it to expect application/json instead.
You also said that it works in your app but not in your browser. I don't know what you're building, but generally you wouldn't show raw JSON to the user in a browser anyway, right?