How to download compressed files using the curl C API? - c++

I want to download a compressed file from a URL using libcurl C API. I have the following code:
// CurlGet.h
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <stdexcept>
#include <curl/curl.h>
struct memory {
char *response;
size_t size;
};
size_t callBackWrite(void *data, size_t size, size_t nmemb, void *userp) {
size_t written = fwrite(data, size, nmemb, (FILE *) userp);
return written;
}
int curlGetC(const char *url, const char* output_filename) {
CURL *curl_handle;
curl_global_init(CURL_GLOBAL_ALL);
/* init the curl session */
curl_handle = curl_easy_init();
if (!curl_handle) {
throw std::logic_error("You no curl");
}
/* set URL to get here */
curl_easy_setopt(curl_handle, CURLOPT_URL, url);
/* Switch on full protocol/debug output while testing */
curl_easy_setopt(curl_handle, CURLOPT_VERBOSE, 1L);
/* enable the progress meter; set to 1L to disable it */
curl_easy_setopt(curl_handle, CURLOPT_NOPROGRESS, 0L);
/* send all data to this function */
curl_easy_setopt(curl_handle, CURLOPT_WRITEFUNCTION, callBackWrite);
/* open the file */
FILE *f = fopen(output_filename, "wb");
if (!f) {
throw std::invalid_argument("You no got file");
}
/* write the page body to this file handle */
curl_easy_setopt(curl_handle, CURLOPT_WRITEDATA, f);
/* get it! */
curl_easy_perform(curl_handle);
/* close the output file */
fclose(f);
/* cleanup curl stuff */
curl_easy_cleanup(curl_handle);
curl_global_cleanup();
return 0;
}
Then using this code to download a web page works as expected but downloading an omex file (which is actually just a zip file with the omex extension name) does not:
#include "CurlGet.h"
#include <iostream>
// works as expected
std::string url1 = "https://isocpp.org/wiki/faq/mixing-c-and-cpp";
std::string output_filename1 = "/mnt/d/libsemsim/semsim/example.html";
curlGetC(url1.c_str(), output_filename1.c_str());
// downloaded file is 0 bytes.
std::string url2 = "https://auckland.figshare.com/ndownloader/files/17432333";
std::string output_filename2 = "/mnt/d/libsemsim/semsim/example.omex";
curlGetC(url2.c_str(), output_filename2.c_str());
Could anybody suggest how to modify my code to get it to download the compressed file?
Edit: showing the verbose trace:
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Trying 52.48.88.255...
* TCP_NODELAY set
* Connected to auckland.figshare.com (52.48.88.255) port 443 (#0)
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/ssl/certs/ca-certificates.crt
CApath: /etc/ssl/certs
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use http/1.1
* Server certificate:
* subject: C=GB; L=London; O=figshare LLP; CN=*.figshare.com
* start date: Mar 20 00:00:00 2019 GMT
* expire date: Jul 9 12:00:00 2020 GMT
* subjectAltName: host "auckland.figshare.com" matched cert's "*.figshare.com"
* issuer: C=US; O=DigiCert Inc; CN=DigiCert SHA2 Secure Server CA
* SSL certificate verify ok.
> GET /ndownloader/files/17432333 HTTP/1.1
Host: auckland.figshare.com
Accept: */*
< HTTP/1.1 302 Found
< Date: Sun, 12 Apr 2020 10:43:10 GMT
< Content-Type: application/octet-stream
< Content-Length: 0
< Connection: keep-alive
< Server: nginx
< X-Storage-Protocol: https
< X-Filename: BIOMD0000000204_new.omex
< Location: https://objectext.auckland.ac.nz/figshare/17432333/BIOMD0000000204_new.omex
< X-Storage-Host: objectext.auckland.ac.nz
< X-Storage-File: 17432333/BIOMD0000000204_new.omex
< X-Storage-Bucket: figshare
< Content-Disposition: attachment;filename=BIOMD0000000204_new.omex
< Cache-Control: no-cache, no-store
< Set-Cookie: fig_tracker_client=0975a192-4ec5-4a63-a800-c598eb7ca6b5; Max-Age=31536000; Path=/; expires=Mon, 12-Apr-2021 10:43:10 GMT; secure; HttpOnly
< X-Robots-Tag: noindex
< X-Frame-Options: SAMEORIGIN
< X-XSS-Protection: 1; mode=block
< Strict-Transport-Security: max-age=31536000; includeSubDomains;
< Cache-Control: public, must-revalidate, proxy-revalidate
< Access-Control-Allow-Credentials: true
< Access-Control-Allow-Methods: GET, OPTIONS
< Access-Control-Allow-Headers: Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Authorization,Range
< Access-Control-Expose-Headers: Location,Accept-Ranges,Content-Encoding,Content-Length,Content-Range
<
0 0 0 0 0 0 0 0 --:--:-- 0:00:01 --:--:-- 0
* Connection #0 to host auckland.figshare.com left intact

This really has nothing to do with the fact that the target file is compressed. Zip files are archives whose components are compressed individually; it is not possible to decompress a zip file into a single meaningful object. That's different from gzipped tar archives, for example. (However, it is not generally desirable for a user agent to automatically decompress a .tgz file into a .tar file, even though it could.)
Your problem stems from the fact that you didn't provide the full URI for the file. The web server responded by sending a redirect (302) return code. That tells the user agent to make a new request for the resource, using the URI provided in the Location response header.
You need to tell libcurl to follow redirects.
curl_easy_setopt(curl_handle, CURLOPT_FOLLOWLOCATION, 1L);
302 redirects differ from 301 redirects in that the redirection is marked as temporary. The 301 return code suggests to the user agent that it should remember the redirection and not attempt to use the original URL in the future. A 302 response should not be cached; it might, for example, be used to provide the location of what is currently the most recent version of a resource.

Here is (probably) what happened:
You sent a request without an Accept-Encoding header, and the server (foolishly, in my opinion) assumed that, since you didn't list any encodings, you probably support gzip. That sounds stupid, I know, but the proper way to say "I don't support any encodings" is to send the header Accept-Encoding: identity, and you didn't do that. The server then answered with Content-Encoding: gzip, which your code ignored, so gzip-compressed data was saved to your output_filename.
To tell curl to deal with encodings automatically (the easiest solution the vast majority of the time), set CURLOPT_ACCEPT_ENCODING to an empty string. This tells curl to request a compressed transfer and to decompress the response automatically before writing it:
curl_easy_setopt(curl_handle, CURLOPT_ACCEPT_ENCODING, "");
That should fix your problem. curl will now send a header such as Accept-Encoding: gzip, deflate, br (the exact list depends on what your libcurl was compiled to support). The server will choose one of those encodings or, if it supports none of them, should send the data uncompressed, and curl in turn will decompress the data before passing it to your CURLOPT_WRITEFUNCTION callback.
You can find the relevant documentation here: https://curl.haxx.se/libcurl/c/CURLOPT_ACCEPT_ENCODING.html

Related

libcurl HTTP request set content-disposition and content-type in MIME data

I'm trying to upload an image to Twitter using libcurl. I used the twurl command-line tool to generate an HTTP request and see what it should look like. What I get is this:
POST /1.1/media/upload.json HTTP/1.1
Accept: */*
Content-Type: multipart/form-data, boundary="00Twurl342528555775455418lruwT99"
Authorization: OAuth oauth_body_hash="XXX", oauth_consumer_key="XXX", oauth_nonce="XXX", oauth_signature="XXX", oauth_signature_method="HMAC-SHA1", oauth_timestamp="1603308767", oauth_token="XXX", oauth_version="1.0"
Connection: close
Host: upload.twitter.com
Content-Length: 612739
--00Twurl342528555775455418lruwT99
Content-Disposition: form-data; name="media"; filename="image.png"
Content-Type: application/octet-stream
binary data of image.png
--00Twurl342528555775455418lruwT99--
The request that I can generate via libcurl (got it using curl verbose) for the moment is this one:
POST /1.1/media/upload.json HTTP/2
Host: upload.twitter.com
accept: */*
authorization: OAuth oauth_consumer_key="XXX",oauth_nonce="XXX",oauth_signature="XXX",oauth_signature_method="HMAC-SHA1",oauth_timestamp="1603372043",oauth_token="XXX",oauth_version="1.0"
content-length: 268
content-type: multipart/form-data; boundary=------------------------d1b0fc28e693c24a
Using the following code:
curl_mime *mime = nullptr;
curl_mimepart *part = nullptr;
mime = curl_mime_init(request_handle);
part = curl_mime_addpart(mime);
curl_mime_name(part, "media");
curl_mime_filename(part, "image.png");
curl_easy_setopt(request_handle, CURLOPT_MIMEPOST, mime);
The problem is that I don't know how to make my libcurl request look like the first one. How do I specify Content-Type and Content-Disposition?
Edit: solution
Full code
curl_mime* mime = nullptr;
curl_mimepart* part = nullptr;
/* initialize mime part */
mime = curl_mime_init(request_handle);
part = curl_mime_addpart(mime);
/* content-disposition: form-data; name="media"; filename="image.png" */
curl_mime_name(part, "media");
curl_mime_filename(part, "image.png");
/* add file content */
curl_mime_filedata(part, "image.png");
/* content-type: application/octet-stream */
curl_mime_type(part, "application/octet-stream");
/* link the MIME data to your curl handle */
curl_easy_setopt(request_handle, CURLOPT_MIMEPOST, mime);
I left out error handling to highlight the functions to use, but you should check each function's return value.
how do I specify Content-Type and Content-Disposition ?
Just read the fine manual (which you can navigate to from the fine example postit2.c)
CURLcode curl_mime_type(curl_mimepart * part, const char * mimetype);
curl_mime_type sets a mime part's content type.
CURLcode curl_mime_filename(curl_mimepart * part, const char * filename);
curl_mime_filename sets a mime part's remote file name. When remote file name is set, content data is processed as a file, whatever is the part's content source. A part's remote file name is transmitted to the server in the associated Content-Disposition generated header.
The official libcurl tutorial is also a nice read.

How to send JSON data to a REST API?

I'm sending data to a Wordpress site with the WooCommerce plugin installed using libcurl in C++ and the WooCommerce REST API. The data seems to get sent but the expected result is not shown on the website. The purpose of it is to update (modify) the product. My code is based on the WooCommerce documentation, found here.
I have managed to get the CURLOPT_VERBOSE text from the program in a separate txt file.
Here is my C++ code using cURL :
std::string URL = main_domain + "wp-json/wc/v3/products/" + product_id + "?consumer_key=" + consumer_key + "&consumer_secret=" + consumer_secret;
curl_slist* headers = NULL;
headers = curl_slist_append(headers, "Transfer-Encoding: chunked");
headers = curl_slist_append(headers, "Accept:application/json");
headers = curl_slist_append(headers, "Content-Type:application/json");
headers = curl_slist_append(headers, "charsets: utf-8");
// log file
FILE* filep = fopen("logfichier.txt", "w");
std::string toUpdate = "{\"id\":\"" + product_id + ",\"name\":\"" + product_name + "\",\"description\":\"" + product_description + "\",\"price\":\"" + product_price + "\"}";
curl_global_init(CURL_GLOBAL_ALL);
curl = curl_easy_init();
if (curl) {
readBuffer = "";
curl_easy_setopt(curl, CURLOPT_URL, URL.c_str());
curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
curl_easy_setopt(curl, CURLOPT_CUSTOMREQUEST, "PUT");
curl_easy_setopt(curl, CURLOPT_POST, 1);
curl_easy_setopt(curl, CURLOPT_POSTFIELDS, toUpdate.c_str());
curl_easy_setopt(curl, CURLOPT_POSTFIELDSIZE, toUpdate.length());
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, WriteCallback);
curl_easy_setopt(curl, CURLOPT_WRITEDATA, &readBuffer);
curl_easy_setopt(curl, CURLOPT_VERBOSE, true);
curl_easy_setopt(curl, CURLOPT_STDERR, filep);
res = curl_easy_perform(curl);
// Check for errors
if (res != CURLE_OK) {
// error handling and cleanup
}
else {
// code and cleanup
}
}
else {
// error handling and cleanup
}
I've literally put every header I found on the internet that seemed relevant to what I'm trying to accomplish into my code.
Here is the returned debug text :
* STATE: INIT => CONNECT handle 0x10870278; line 1428 (connection #-5000)
* Added connection 0. The cache now contains 1 members
* STATE: CONNECT => WAITRESOLVE handle 0x10870278; line 1464 (connection #0)
* Trying 192.XX.XX.XX...
* TCP_NODELAY set
* STATE: WAITRESOLVE => WAITCONNECT handle 0x10870278; line 1545 (connection #0)
* Connected to mywebsite.com (192.XX.XX.XX) port 443 (#0)
* STATE: WAITCONNECT => SENDPROTOCONNECT handle 0x10870278; line 1599 (connection #0)
* Marked for [keep alive]: HTTP default
* schannel: SSL/TLS connection with mywebsite.com port 443 (step 1/3)
* schannel: checking server certificate revocation
* schannel: sending initial handshake data: sending 176 bytes...
// (here was just a bunch of connection attempt log text...)
* schannel: SSL/TLS handshake complete
* schannel: SSL/TLS connection with mywebsite.com port 443 (step 3/3)
* schannel: stored credential handle in session cache
* STATE: PROTOCONNECT => DO handle 0x10870278; line 1634 (connection #0)
> PUT /wp-json/wc/v3/products/111867?consumer_key=(the actual key)&consumer_secret=(the actual secret) HTTP/1.1
Host: mywebsite.com
Transfer-Encoding: chunked
Accept:application/json
Content-Type:application/json
charsets: utf-8
4b
* upload completely sent off: 82 out of 75 bytes
* STATE: DO => DO_DONE handle 0x10870278; line 1696 (connection #0)
* STATE: DO_DONE => WAITPERFORM handle 0x10870278; line 1823 (connection #0)
* STATE: WAITPERFORM => PERFORM handle 0x10870278; line 1838 (connection #0)
* schannel: client wants to read 16384 bytes
* schannel: encdata_buffer resized 17408
* schannel: encrypted data buffer: offset 0 length 17408
// (a few decrypting data attempts...)
* schannel: decrypted data returned 536
* schannel: decrypted data buffer: offset 0 length 16384
* HTTP 1.1 or later with persistent connection, pipelining supported
< HTTP/1.1 200 OK
< Date: Tue, 18 Jun 2019 15:27:42 GMT
* Server Apache is not blacklisted
< Server: Apache
< X-Robots-Tag: noindex
< Link: <https://mywebsite.com/wp-json/>; rel="https://api.w.org/"
< X-Content-Type-Options: nosniff
< Access-Control-Expose-Headers: X-WP-Total, X-WP-TotalPages
< Access-Control-Allow-Headers: Authorization, Content-Type
< Expires: Wed, 11 Jan 1984 05:00:00 GMT
< Cache-Control: no-transform, no-cache, must-revalidate, max-age=0
< Allow: GET, POST, PUT, PATCH, DELETE
< Transfer-Encoding: chunked
< Content-Type: application/json; charset=UTF-8
<
* schannel: client wants to read 16384 bytes
* schannel: encrypted data buffer: offset 835 length 17408
// (a few decrypting data attempts...)
* schannel: decrypted data returned 1986
* schannel: decrypted data buffer: offset 0 length 16384
* STATE: PERFORM => DONE handle 0x10870278; line 2011 (connection #0)
* multi_done
* Connection #0 to host axanti.info left intact
I took out a few redundant parts of the original text and kept what I think is the main information. It seems that my JSON data is actually sent to the server, but the intended result doesn't show up on my website (a product should be modified, but it isn't).
Is there any way this code could be wrong? Or is the problem on the server side? I literally apply the same steps that are described in the official documentation.
Looks like your payload is off. The id portion is redundant, as you're already specifying the product to update via the URL, so you can drop that. Additionally, you're attempting to set the price incorrectly. Per the REST docs, you need to use the regular_price attribute instead of price (price is read-only). The proper payload should look like this:
{
"name": "My product name",
"description": "my product description",
"regular_price": "3.50"
}
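Building that payload by hand-concatenating strings is easy to get wrong (a single missing escaped quote makes the JSON invalid). Below is a minimal sketch of a safer way to assemble it in C++ without a JSON library. The helper names jsonEscape and buildProductUpdate are hypothetical, and the escaping covers only quotes, backslashes, and control characters; a real project would more likely use a JSON library.

```cpp
#include <cstdio>
#include <string>

// Escapes a string so it can be placed inside a JSON string literal.
std::string jsonEscape(const std::string &in) {
    std::string out;
    for (unsigned char c : in) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:
                if (c < 0x20) {
                    // Remaining control characters become \u00XX escapes
                    char buf[7];
                    std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(c));
                    out += buf;
                } else {
                    out += static_cast<char>(c);
                }
        }
    }
    return out;
}

// Builds the update body with the writable fields only: no id (it is already
// in the URL) and regular_price instead of the read-only price.
std::string buildProductUpdate(const std::string &name,
                               const std::string &description,
                               const std::string &regularPrice) {
    return "{\"name\":\"" + jsonEscape(name) +
           "\",\"description\":\"" + jsonEscape(description) +
           "\",\"regular_price\":\"" + jsonEscape(regularPrice) + "\"}";
}
```

The resulting string can be passed to CURLOPT_POSTFIELDS in place of the hand-built toUpdate string, as long as it stays alive until the transfer completes.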

Node.js web server receives garbled data sent by POST via libcurl

I implemented a simple web server in Node.js that handles data
sent by POST via libcurl.
Client: c++ code based on libCURL
Server: Node.js
Result:
Client
###Client####
>>>
* Found bundle for host 109.123.121.146: 0x1b13f00 [can pipeline]
* Re-using existing connection! (#0) with host 109.123.121.146
* Connected to 109.123.121.146 (109.123.121.146) port 8124 (#0)
> POST / HTTP/1.1
Host: 109.123.121.146:8124
Accept: */*
Content-Type: text/xml
Content-Length: 11
* upload completely sent off: 11 out of 11 bytes
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
< Content-Type: text/plain;charset=utf-8
Content-Type: text/plain;charset=utf-8
< Date: Thu, 05 Jan 2017 21:14:05 GMT
Date: Thu, 05 Jan 2017 21:14:05 GMT
< Connection: keep-alive
Connection: keep-alive
< Transfer-Encoding: chunked
Transfer-Encoding: chunked
<
* Connection #0 to host 109.123.121.146 left intact
###Server####
####### 'POST' ######
Partial body: B=~� size:11
Body: B=~� size:11 ---->> why the data is invalid char, size is right.
Source Code
Client.cpp
string strUrl = "http://109.123.121.146:8124/";
string strLog = "Hello,CURL!";
curl_easy_setopt(g_curl_handle, CURLOPT_URL, strUrl.c_str());
curl_easy_setopt(g_curl_handle, CURLOPT_POST, 1);
curl_easy_setopt(g_curl_handle, CURLOPT_POSTFIELDS, strLog.c_str());
Server.js
req.addListener('data', function (data) {
body += data;
console.log("Partial body: " + data + " size:"+data.toString().length);
});
req.addListener('end', function () {
console.log("Body: " + body + " size:"+body.toString().length);
});
I wonder why the request data arrives as invalid characters ("B=~�") instead of the expected string "Hello,CURL!".
If anyone knows, please help explain this to me. Thanks!

sf::Http::sendRequest never returns

I've written a simple web service using pistache. I'm sending requests to it using the sf::Http and sf::Http::Request classes. However, the call to sf::Http::sendRequest never returns, even though I specified a 250 ms timeout. This happens only with requests to my own service. If I send a GET request to www.google.com, the method returns a correct response quite quickly.
Here's the client-side code sample:
sf::Http http("http://192.168.1.10", 8080);
sf::Http::Request request("/highscores", sf::Http::Request::Method::Get);
request.setHttpVersion(1, 1);
//the call below never returns
auto response = http.sendRequest(request, sf::seconds(0.25f));
std::cout << response.getBody();
The service response seems correct in browser and in curl:
$ curl -v 192.168.1.10:8080/highscores
* Trying 192.168.1.10...
* Connected to 192.168.1.10 (192.168.1.10) port 8080 (#0)
> GET /highscores HTTP/1.1
> Host: 192.168.1.10:8080
> User-Agent: curl/7.47.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: application/json
< Connection: Keep-Alive
< Content-Length: 2
<
* Connection #0 to host 192.168.1.10 left intact
[]%
Using strace on my application shows that it sends correct request and even at some point it receives the correct response:
$ strace -s 192 ./sfmlApplication
...
sendto(20, "GET /highscores HTTP/1.1\r\nconnection: close\r\ncontent-length: 0\r\ncontent-type: application/json\r\nfrom: user#sfml-dev.org\r\nhost: 192.168.1.10\r\nuser-agent: libsfml-network/2.x\r\n\r\n", 176, MSG_NOSIGNAL, NULL, 0) = 176
recvfrom(20, "HTTP/1.1 200 OK\r\nContent-Type: application/json\r\nConnection: Keep-Alive\r\nContent-Length: 2\r\n\r\n[]", 1024, MSG_NOSIGNAL, NULL, NULL) = 96
recvfrom(20,
These are the last lines from strace output, after recvfrom(20, the program stops responding and has to be killed.
And the top of stack trace of blocked operation is:
recv() at 0x7ffff7bcd10f
sf::TcpSocket::receive() at 0x7ffff77b12c0
sf::Http::sendRequest() at 0x7ffff77ad5ed
SFML Version: 2.3.2
System: Fedora 4.8.4-200.fc24.x86_64
Any ideas why the sf::Http::sendRequest method call never returns?

cURL requests changed

I've started working with the cURL library, which I compiled myself first. When I send a request, I run into a problem. This is the C++ code I use with cURL:
CURL *curl=NULL;
CURLcode res;
struct curl_slist *headers=NULL; // init to NULL is important
curl_slist_append(headers, "POST /oauth/authorize HTTP/1.1");
curl_slist_append(headers, "Host: sp-money.yandex.ru");
curl_slist_append(headers, "Content-Type: application/x-www-form-urlencoded");
curl_slist_append(headers, "charset: UTF-8");
curl_slist_append(headers, "Content-Length: 12345");
curl = curl_easy_init();
if(!curl)
return 0;
curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
curl_easy_setopt(curl, CURLOPT_URL, "sp-money.yandex.ru");
curl_easy_setopt(curl, CURLOPT_PROXY, "127.0.0.1:8888");
if( curl_easy_perform(curl)!=CURLE_OK)
return 1;
I used a proxy, Fiddler2, to check what data is sent to the server. When I check the sent data,
I get this result:
POST HTTP://sp-money.yandex.ru/ HTTP/1.1
Host: sp-money.yandex.ru
Accept: */*
Connection: Keep-Alive
Content-Length: 151
Content-Type: application/x-www-form-urlencoded
I also checked this data using Wireshark; the result is the same.
Do you know why cURL wrote this in the first line:
POST HTTP://sp-money.yandex.ru/ HTTP/1.1
when I sent
POST /oauth/authorize HTTP/1.1
I'm using VS 2010, and I'm not using any framework.
POST /oauth/authorize HTTP/1.1 is not a header; it is the request line, made up of the HTTP verb (POST), the URL, and the protocol version.
I suppose you need to put that elsewhere (I'll look at the docs for a second now).
The POST line doesn't belong in the headers. It should be set with CURLOPT_URL, CURLOPT_POST, and CURLOPT_HTTP_VERSION for the protocol version. The same goes for the Host: header: it is inferred from the URL.