Request JSON Data from HTTPS with C++?

I'm writing a program in C++ that needs to download JSON data from an HTTPS URL. The program is based on wxWidgets, and the URL points to the translation service at Glosbe.
So I've tried multiple different libraries including:
libcurl
Boost.Asio
the HTTP functionality included in wxWidgets
wxCurl
Urdl
However, it always throws an error saying it can't connect, or I get a reply that says "Moved Permanently".
When I copy and paste the URL I am testing with into a browser, it returns the JSON data perfectly.
Does anyone know the correct way to do this?
Any help would be great!

301 Moved Permanently is what the server responds with when you try to access the page over HTTP instead of HTTPS. Here's a complete response I just received from the server:
HTTP/1.1 301 Moved Permanently
Server: nginx
Date: Thu, 16 Jul 2015 20:25:01 GMT
Content-Type: text/html
Content-Length: 178
Connection: keep-alive
Location: https://en.glosbe.com/a-api
It means exactly that: "The content you are looking for is really at https://en.glosbe.com/a-api." Your browser simply adheres to the HTTP protocol by following the server's hint and automatically proceeding to request https://en.glosbe.com/a-api when you try to access http://en.glosbe.com/a-api. It works seamlessly for you as a user.
You will have to read more documentation to create HTTPS requests yourself. Each of the libraries you mentioned will have a different way of supporting HTTPS (or may not support it at all). For example, have a look at http://www.boost.org/doc/libs/1_58_0/doc/html/boost_asio/overview/ssl.html, especially the "Notes" section, where it says that "OpenSSL is required to make use of Boost.Asio's SSL support."
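For illustration, here is a minimal sketch of what an HTTPS GET with Boost.Asio and OpenSSL can look like. It is only an outline under stated assumptions: Boost 1.66 or newer (for io_context and the resolve(host, service) overload), a placeholder request path because the exact Glosbe API URL is not shown in the question, and certificate verification and error handling omitted for brevity:

// Minimal HTTPS GET sketch with Boost.Asio + OpenSSL (assumes Boost >= 1.66).
// The request path is a placeholder; verification and error handling are omitted.
#include <boost/asio.hpp>
#include <boost/asio/ssl.hpp>
#include <iostream>

int main() {
    namespace asio = boost::asio;
    namespace ssl  = asio::ssl;
    using asio::ip::tcp;

    asio::io_context io;
    ssl::context ctx(ssl::context::sslv23_client);    // negotiates the best available TLS version
    ctx.set_default_verify_paths();                   // use the system CA store

    ssl::stream<tcp::socket> stream(io, ctx);
    // Send the hostname via SNI; many HTTPS servers require it.
    SSL_set_tlsext_host_name(stream.native_handle(), "en.glosbe.com");

    tcp::resolver resolver(io);
    asio::connect(stream.next_layer(), resolver.resolve("en.glosbe.com", "443"));
    stream.handshake(ssl::stream_base::client);       // TLS handshake

    std::string request =
        "GET /placeholder/json/endpoint HTTP/1.1\r\n" // substitute the real API path here
        "Host: en.glosbe.com\r\n"
        "Connection: close\r\n"
        "\r\n";
    asio::write(stream, asio::buffer(request));

    boost::system::error_code ec;
    asio::streambuf response;
    asio::read(stream, response, ec);                 // read until the server closes the connection
    std::cout << &response << std::endl;
}

Link against OpenSSL (-lssl -lcrypto) in addition to your usual Boost flags.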

Related

Google Cloud Run: Webhook POST causes 400 Response

We are catching a BigCommerce webhook event in our Google Cloud Run application. The request looks like:
Headers
host: abc-123-ue.a.run.app
AccountId: ABC
Content-Type: application/json
Password: Goodbye
Platform: BC
User-Agent: akka-http/10.1.10
Username: Hello
Content-Length: 197
Connection: keep-alive
Body
{"created_at":1594914374,"store_id":"1001005173","producer":"stores/gy68868uk5","scope":"store/product/created","hash":"139fab64ded23b3e1b8473ba24ab21bedd3f535b","data":{"type":"product","id":132}}
For some reason, this causes a 400 response from Google Cloud Run. Our application doesn't even seem to receive the request. All other endpoints work (including other POST requests).
Any ideas?
Edit
In the original post I had the path in the host header. That was a mistake I made while writing this post, not the actual value passed to us. We can only inspect the request via Requestbin (I can't find the request values anywhere in the Google logs), so I was speculating on the host value and wrote it out incorrectly here.
Research so far...
So upon further testing, it seems that BigCommerce webhooks also fail to reach any Google Cloud Function we set up. As a workaround, I'm having Pipedream catch the webhook and send the payload to our application. No problems there. This endpoint also works with mirrored payloads from local and Zapier, which seems to rule out authentication errors.
We are running FastAPI on Google Cloud Run and the simplest possible function on Google Cloud Functions. This seems to be an issue with how Google's serverless platforms and BigCommerce webhook events communicate with each other. I'm just not sure how...
Here are the headers we managed to capture on one of the only times a BigCommerce Webhook Event came through to our Google Cloud Function:
Content-Length: 197
Content-Type: application/json
Host: us-central1-abc-123.cloudfunctions.net
User-Agent: akka-http/10.1.10
Forwarded: for="0.0.0.0";proto=https
Function-Execution-Id: unes7v34vzyo
X-Appengine-Country: ZZ
X-Appengine-Default-Version-Hostname: f696ddc1d56c3fd66p-tp.appspot.com
X-Appengine-Https: on
X-Appengine-Request-Log-Id: 5f10e15c00ff082ecbb02ee3a70001737e6636393664646331643536633366643636702d7470000165653637393633633164376565323033383131366437343031613365613263303a36000100
X-Appengine-Timeout-Ms: 599999
X-Appengine-User-Ip: 0.0.0.0
X-Cloud-Trace-Context: a62207698d141465d0f38488492d088b/9870406606828581415
X-Forwarded-For: 0.0.0.0
X-Forwarded-Proto: https
Accept-Encoding: gzip
Connection: close
> host: abc-123-ue.a.run.app/bigcommerce/webhooks/
This is most likely the issue. Host headers must contain only the hostname, not the request path.
You can clearly see this will fail:
$ curl -IvH 'Host: pdf-2wvlk7vg3a-uc.a.run.app/foo' https://pdf-2wvlk7vg3a-uc.a.run.app
...
HTTP/2 400
However, if you don't craft the Host header yourself, it will work.

JMETER This site does not specify a policy in the P3P header ERROR

I am trying to hit the URL https://subdomain.example.com in JMeter. The test was recorded using the BlazeMeter Chrome extension and has all the necessary config elements, but I get an error:
HTTP/1.1 429 Too Many Requests
Content-Type: text/html; charset=utf-8
Content-Length: 1031
Connection: keep-alive
Cache-Control: private, no-cache, no-store, must-revalidate
Date: Tue, 20 Aug 2019 01:21:35 GMT
Expires: 0
p3p: CP="This site does not specify a policy in the P3P header"
I have tried copying the header cookies from the browser's response headers, which works for some time but then starts throwing an error.
As per the HTTP status code 429 Too Many Requests description:
The HTTP 429 Too Many Requests response status code indicates the user has sent too many requests in a given amount of time ("rate limiting").
A Retry-After header might be included in this response indicating how long to wait before making a new request.
So there are the following options:
Your server is overloaded; in this case there is nothing you can do apart from reporting the error as the bottleneck.
Your script doesn't have proper correlation implemented, i.e. you're sending recorded hard-coded values instead of extracting dynamic parameters.
Your server doesn't allow that many requests from a single IP address within the given timeframe; you could try implementing IP spoofing so your server "thinks" the requests are coming from different machines.
Thanks for your reply. In the end I figured out that no limitation on the number of calls was implemented.
Now, coming to the answer, this is how I managed to make it work:
I opened the page in Chrome and copied all the header elements from the header section into the HTTP Header Manager, hard-coded.
The first time it fails and returns p3p: CP="This site does not specify a policy in the P3P header", but it also returns the updated variable value needed for the next request, which I extract and use in the next and subsequent requests. I was able to find out which variable was changing by doing a string comparison of two response headers.
This was a difficult one, but it somehow worked with a very minor change. I also added the Header Manager to each request, to be on the safe side.

Validation of Akamai caching

We have cached HTML and PNG pages in Akamai by changing the WAA config, but we are unable to validate it through Fiddler, Live HTTP Headers, or curl commands. Below are the captures. Please help if I missed any headers.
Fiddler and Live HTTP Headers: (screenshots not reproduced here)
Curl command:
$ curl -H "Pragma: akamai-x-cache-on, akamai-x-cache-remote-on, akamai-x-check-cacheable, akamai-x-get-cache-key, akamai-x-get-extracted-values, akamai-x-get-nonces, akamai-x-get-ssl-client-session-id, akamai-x-get-true-cache-key, akamai-x-serial-no" -IXGET "url"
Response :
HTTP/1.1 200 OK
Date: Thu, 20 Oct 2016 19:15:53 GMT
Content-Length: 19836
Content-Type: image/png
X-FRAME-OPTIONS: DENY
Well, in the Akamai WAF (Web Application Firewall) there are a few WAF rules that prevent the display of Pragma headers to users. You need to create an exception for those WAF rules and add only trusted IPs to it. Then you will be able to see the information you are looking for.
I would double check that you are definitely sending that Pragma header with the request and that it is also properly formed. I've seen a lot of problems with people trying to set this up and not getting it right.
Also, it's worth reviewing your Akamai configuration, because it is possible to switch this off; some clients prefer to do this for security reasons.

URL forbidden 403 when using a tool but fine from browser

I have some images for which I need to do an HttpRequestMethod.HEAD request in order to find out some details of the image.
When I go to the image url on a browser it loads without a problem.
When I attempt to get the header info via my code or via online tools, it fails.
An example URL is http://www.adorama.com/images/large/CHHB74P.JPG
As mentioned, I have used the online tool Hurl.it to try to obtain the HEAD response, but I am getting the same 403 Forbidden message that I am getting in my code.
I have tried adding many various headers to the Head request (User-Agent, Accept, Accept-Encoding, Accept-Language, Cache-Control, Connection, Host, Pragma, Upgrade-Insecure-Requests) but none of this seems to work.
It also fails to do a normal GET request via Hurl.it. Same 403 error.
If it is relevant, my code is a C# web service and is running on the AWS cloud (just in case the Adorama servers have something against AWS that I don't know about). To test this I have also spun up an EC2 instance (a Linux box) and run curl, which also returned the 403 error. Running curl locally on my personal computer returns the binary image, which is presumably just the image data.
And just to remove the obvious thoughts: my code works successfully for many, many other websites; it is just this one where there is an issue.
Any idea what is required for me to download the image headers and not get the 403?
Same problem here.
Locally it works smoothly; doing it from an AWS instance I get the very same problem.
I thought it was a DNS resolution problem (redirecting to a malfunctioning node). I therefore tried to specify the same IP address as resolved by my client, but that didn't fix the problem.
My guess is that Akamai (the service is provided by an Akamai CDN in this case) is blocking AWS. It is somewhat understandable: customers pay for CDN by traffic, and by abusing it people can generate huge bills.
Connecting to www.adorama.com (www.adorama.com)|104.86.164.205|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 403 Forbidden
Server: AkamaiGHost
Mime-Version: 1.0
Content-Type: text/html
Content-Length: 301
Cache-Control: max-age=604800
Date: Wed, 23 Mar 2016 09:34:20 GMT
Connection: close
2016-03-23 09:34:20 ERROR 403: Forbidden.
I tried that URL from Amazon and it didn't work for me. wget did work from other servers that weren't on Amazon EC2, however. Here is the wget output on EC2:
wget -S http://www.adorama.com/images/large/CHHB74P.JPG
--2016-03-23 08:42:33-- http://www.adorama.com/images/large/CHHB74P.JPG
Resolving www.adorama.com... 23.40.219.79
Connecting to www.adorama.com|23.40.219.79|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.0 403 Forbidden
Server: AkamaiGHost
Mime-Version: 1.0
Content-Type: text/html
Content-Length: 299
Cache-Control: max-age=604800
Date: Wed, 23 Mar 2016 08:42:33 GMT
Connection: close
2016-03-23 08:42:33 ERROR 403: Forbidden.
But from another Linux host it did work. Here is output
wget -S http://www.adorama.com/images/large/CHHB74P.JPG
--2016-03-23 08:43:11-- http://www.adorama.com/images/large/CHHB74P.JPG
Resolving www.adorama.com... 23.45.139.71
Connecting to www.adorama.com|23.45.139.71|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.0 200 OK
Content-Type: image/jpeg
Last-Modified: Wed, 23 Mar 2016 08:41:57 GMT
Server: Microsoft-IIS/8.5
X-AspNet-Version: 2.0.50727
X-Powered-By: ASP.NET
ServerID: C01
Content-Length: 15131
Cache-Control: private, max-age=604800
Date: Wed, 23 Mar 2016 08:43:11 GMT
Connection: keep-alive
Set-Cookie: 1YDT=CT; expires=Wed, 20-Apr-2016 08:43:11 GMT; path=/; domain=.adorama.com
P3P: CP="NON DSP ADM DEV PSD OUR IND STP PHY PRE NAV UNI"
Length: 15131 (15K) [image/jpeg]
Saving to: “CHHB74P.JPG”
100%[=====================================>] 15,131 --.-K/s in 0s
2016-03-23 08:43:11 (460 MB/s) - “CHHB74P.JPG” saved [15131/15131]
I would guess that the image provider is deliberately blocking requests from EC2 address ranges.
The reason the wget outgoing IP address is different in the two examples is the DNS resolution done by the CDN provider that Adorama is using.
Web servers may implement ways to check particular fingerprint attributes to prevent automated bots. Here are a few of the things they can check:
GeoIP / IP address
Browser headers
User agents
Plugin info
Browser fonts
You may simulate the browser headers and learn some fingerprinting "attributes" here: https://panopticlick.eff.org
You can try to replicate how a browser behaves and inject similar headers and a similar user agent. Plain curl/wget is not likely to satisfy those conditions, and even tools like PhantomJS occasionally get blocked. There is a reason some prefer tools like Selenium WebDriver that launch an actual browser.
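As a rough illustration of injecting browser-like headers, here is a minimal libcurl sketch in C++ that issues a HEAD request for the image from the question. The User-Agent is the Firefox string listed in the results below, the Accept/Accept-Language values are illustrative, and there is no guarantee the CDN will accept the request:

// Minimal libcurl sketch: HEAD request with browser-like headers.
// The header values are illustrative; acceptance depends on the CDN's rules.
#include <curl/curl.h>
#include <iostream>

int main() {
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL* curl = curl_easy_init();
    if (!curl) return 1;

    struct curl_slist* headers = nullptr;
    headers = curl_slist_append(headers, "Accept: image/jpeg,image/*;q=0.8,*/*;q=0.5");
    headers = curl_slist_append(headers, "Accept-Language: en-US,en;q=0.5");

    curl_easy_setopt(curl, CURLOPT_URL, "http://www.adorama.com/images/large/CHHB74P.JPG");
    curl_easy_setopt(curl, CURLOPT_NOBODY, 1L);         // HEAD request: headers only, no body
    curl_easy_setopt(curl, CURLOPT_HEADER, 1L);         // print response headers to stdout
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
    curl_easy_setopt(curl, CURLOPT_USERAGENT,
        "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:70.0) Gecko/20100101 Firefox/70.0");
    curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);

    CURLcode res = curl_easy_perform(curl);
    long status = 0;
    curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &status);
    if (res != CURLE_OK)
        std::cerr << "curl error: " << curl_easy_strerror(res) << "\n";
    std::cout << "HTTP status: " << status << std::endl;

    curl_slist_free_all(headers);
    curl_easy_cleanup(curl);
    curl_global_cleanup();
}

Even with these headers the request may still be rejected if the block is based on the source IP range rather than on fingerprinting.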
I found that another URL also protected by AkamaiGHost was being blocked due to certain parts of the user agent. In particular, a user agent containing a link with a protocol was blocked:
Using curl -H 'User-Agent: some-user-agent' https://some.website I found the following results for different user agents:
Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:70.0) Gecko/20100101 Firefox/70.0: okay
facebookexternalhit/1.1 (+http\://www.facebook.com/externalhit_uatext.php): 403
https ://bar: okay
https://bar: 403
All I could find for now is this (downvoted) answer, https://stackoverflow.com/a/48137940/230422, stating that colons (:) are not allowed in header values. That is clearly not the only thing happening here, as the Mozilla example also contains colons, just not as part of a link.
I guess that most web servers don't care and allow Facebook's bot and other bots that have a contact URL in their user agent, but apparently AkamaiGHost does block it.

C++ HTTP always 301 using sockets

I'm sick of this. ALWAYS when I make an HTTP GET request from a C/C++ program using just plain sockets, I get a 301 Moved Permanently. Normally I'd use libcurl, but in this case I don't want to add another library; I just need to download one flat identification file from one fixed server.
This is my current query:
GET /game/getversion.jsp?user=nightcracker&password=yeahright&version=12 HTTP/1.1\r\n
Connection: close\r\n
Host: www.minecraft.net\r\n
Accept-Encoding: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2\r\n
\r\n
I have tried EVERYTHING, and everything just gets answered with this funny message:
HTTP/1.1 301 Moved Permanently
Server: nginx/0.6.32
Date: Tue, 15 Mar 2011 02:18:11 GMT
Content-Type: text/html
Content-Length: 185
Connection: close
Location: http://www.minecraft.net/game/getversion.jsp?user=nightcracker&password=yeahright&version=12
<html>
<head><title>301 Moved Permanently</title></head>
<body bgcolor="white">
<center><h1>301 Moved Permanently</h1></center>
<hr><center>nginx/0.6.32</center>
</body>
</html>
I remember running into this issue before and rage-quitting. Now I want to fix this damn bugger. So tell me, SO, why do all my HTTP queries always give back a 301?
Alright, besides the issue with the Accept-Encoding header, the query was fine. The problem was that in my socket code I resolved "minecraft.net" instead of "www.minecraft.net". RAAAAH. Fixed.
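For reference, here is a minimal sketch of the corrected plain-socket version (POSIX sockets assumed, error handling trimmed). It resolves www.minecraft.net rather than the bare domain and uses Accept instead of Accept-Encoding, as pointed out in the answer below; the service has changed since 2011, so the exact response will differ today:

// Minimal POSIX-socket sketch of the corrected request.
#include <iostream>
#include <string>
#include <netdb.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

int main() {
    addrinfo hints{}, *res = nullptr;
    hints.ai_family   = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    // Resolve "www.minecraft.net"; resolving the bare "minecraft.net" was the bug.
    if (getaddrinfo("www.minecraft.net", "80", &hints, &res) != 0) return 1;

    int sock = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (sock < 0 || connect(sock, res->ai_addr, res->ai_addrlen) != 0) return 1;

    std::string request =
        "GET /game/getversion.jsp?user=nightcracker&password=yeahright&version=12 HTTP/1.1\r\n"
        "Host: www.minecraft.net\r\n"
        "Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2\r\n"
        "Connection: close\r\n"
        "\r\n";
    send(sock, request.c_str(), request.size(), 0);

    char buf[4096];
    ssize_t n;
    while ((n = recv(sock, buf, sizeof(buf), 0)) > 0)
        std::cout.write(buf, n);                      // dump the raw HTTP response

    close(sock);
    freeaddrinfo(res);
}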
I can't see anything obviously wrong, since the redirected URI appears to be the same as the original GET request URI, so I would suggest downloading the command-line curl and running it in verbose mode against the same target. Perhaps its output will show something that can point you in the right direction. There's a chance that this is a badly configured server or a badly written JSP, so keep that in mind.
I don't know if this is the problem you have on the Minecraft server (I don't have an account), but
Accept-Encoding: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2\r\n
what the heck is that? Header fields that might go in requests include:
Accept: MIME types (e.g. what you have there)
Accept-Charset: charsets (e.g. utf-8)
Accept-Encoding: encodings (e.g. gzip)
Accept-Language: languages (e.g. en)
and you seem to be mixing them up.
Well, the server is redirecting the client to another location. You just have to issue another request to the URL coming back in the "Location" header of the 3xx response.
Oops, I realized that the redirect location is the same as the original URI. Does this URL work from the browser? If so, you might try adding a User-Agent header to the request that contains the same User-Agent the browser is sending.
You can either specify the correct URL (www.minecraft.net) or tell libcurl to follow redirects automatically:
curl_easy_setopt(curl_handle, CURLOPT_FOLLOWLOCATION, 1L);
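For context, here is a minimal sketch of where that option sits in a complete libcurl transfer. The URL is the one from the question (whether it still resolves today is not guaranteed), and the response body simply goes to stdout:

// Minimal libcurl sketch showing CURLOPT_FOLLOWLOCATION in context.
#include <curl/curl.h>
#include <iostream>

int main() {
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL* curl_handle = curl_easy_init();
    if (!curl_handle) return 1;

    curl_easy_setopt(curl_handle, CURLOPT_URL,
        "http://www.minecraft.net/game/getversion.jsp?user=nightcracker&password=yeahright&version=12");
    curl_easy_setopt(curl_handle, CURLOPT_FOLLOWLOCATION, 1L); // follow 301/302 automatically
    curl_easy_setopt(curl_handle, CURLOPT_MAXREDIRS, 10L);     // guard against redirect loops

    CURLcode res = curl_easy_perform(curl_handle);             // body is written to stdout by default
    if (res != CURLE_OK)
        std::cerr << "curl error: " << curl_easy_strerror(res) << "\n";

    curl_easy_cleanup(curl_handle);
    curl_global_cleanup();
}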