URL forbidden 403 when using a tool but fine from browser

I have some images for which I need to send a HEAD request (HttpRequestMethod.HEAD) in order to find out some details about the image.
When I go to the image URL in a browser, it loads without a problem.
When I attempt to get the header info via my code or via online tools, it fails.
An example URL is http://www.adorama.com/images/large/CHHB74P.JPG
As mentioned, I have used the online tool Hurl.it to try to obtain the HEAD response, but I get the same 403 Forbidden message that I am getting in my code.
I have tried adding many different headers to the HEAD request (User-Agent, Accept, Accept-Encoding, Accept-Language, Cache-Control, Connection, Host, Pragma, Upgrade-Insecure-Requests), but none of them seem to work.
A normal GET request via Hurl.it also fails with the same 403 error.
If it is relevant, my code is a C# web service running on the AWS cloud (just in case the Adorama servers have something against AWS that I don't know about). To test this I also spun up an EC2 instance (a Linux box) and ran curl, which returned the same 403 error. Running curl locally on my personal computer returns the binary image data.
And just to rule out the obvious: my code works successfully for many other websites; it is only this one that has an issue.
Any idea what is required for me to download the image headers and not get the 403?
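For what it's worth, the failing request can be reproduced outside the C# service. Below is a minimal libcurl sketch of the same HEAD request with browser-like headers; the header values are illustrative guesses and may still be rejected if the block is based on the source IP range rather than the request shape.

#include <curl/curl.h>
#include <stdio.h>

int main(void) {
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL *curl = curl_easy_init();
    if (!curl) return 1;

    /* Browser-like headers; illustrative values only */
    struct curl_slist *headers = NULL;
    headers = curl_slist_append(headers, "Accept: image/webp,image/*,*/*;q=0.8");
    headers = curl_slist_append(headers, "Accept-Language: en-US,en;q=0.5");

    curl_easy_setopt(curl, CURLOPT_URL, "http://www.adorama.com/images/large/CHHB74P.JPG");
    curl_easy_setopt(curl, CURLOPT_NOBODY, 1L);   /* HEAD: fetch headers only */
    curl_easy_setopt(curl, CURLOPT_HEADER, 1L);   /* write response headers to stdout */
    curl_easy_setopt(curl, CURLOPT_USERAGENT,
        "Mozilla/5.0 (X11; Linux x86_64; rv:70.0) Gecko/20100101 Firefox/70.0");
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);

    CURLcode res = curl_easy_perform(curl);
    if (res != CURLE_OK)
        fprintf(stderr, "curl error: %s\n", curl_easy_strerror(res));

    curl_slist_free_all(headers);
    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return 0;
}

If this returns 403 from EC2 but 200 from a residential connection, the block is on the source address, not on anything you can fix in the request.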

Same problem here.
Locally it works smoothly; doing it from an AWS instance I get the very same problem.
I thought it was a DNS resolution problem (redirecting to a malfunctioning node), so I tried specifying the same IP address that my local client resolved, but that didn't fix the problem.
My guess is that Akamai (the site is served by an Akamai CDN in this case) is blocking AWS. That is somewhat understandable: customers pay for CDN traffic, so abusive automated requests can generate huge bills.
Connecting to www.adorama.com (www.adorama.com)|104.86.164.205|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 403 Forbidden
Server: AkamaiGHost
Mime-Version: 1.0
Content-Type: text/html
Content-Length: 301
Cache-Control: max-age=604800
Date: Wed, 23 Mar 2016 09:34:20 GMT
Connection: close
2016-03-23 09:34:20 ERROR 403: Forbidden.
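For reference, the pinned-IP test can be reproduced with curl's --resolve option (available since curl 7.21.3), which forces the hostname to a specific edge address; the IP below is the one from the log above and may no longer be valid:

curl -sI --resolve www.adorama.com:80:104.86.164.205 http://www.adorama.com/images/large/CHHB74P.JPG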

I tried that URL from Amazon and it didn't work for me either; wget did work from servers that weren't on Amazon EC2, however. Here is the wget output on EC2:
wget -S http://www.adorama.com/images/large/CHHB74P.JPG
--2016-03-23 08:42:33-- http://www.adorama.com/images/large/CHHB74P.JPG
Resolving www.adorama.com... 23.40.219.79
Connecting to www.adorama.com|23.40.219.79|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.0 403 Forbidden
Server: AkamaiGHost
Mime-Version: 1.0
Content-Type: text/html
Content-Length: 299
Cache-Control: max-age=604800
Date: Wed, 23 Mar 2016 08:42:33 GMT
Connection: close
2016-03-23 08:42:33 ERROR 403: Forbidden.
But from another Linux host it did work. Here is the output:
wget -S http://www.adorama.com/images/large/CHHB74P.JPG
--2016-03-23 08:43:11-- http://www.adorama.com/images/large/CHHB74P.JPG
Resolving www.adorama.com... 23.45.139.71
Connecting to www.adorama.com|23.45.139.71|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.0 200 OK
Content-Type: image/jpeg
Last-Modified: Wed, 23 Mar 2016 08:41:57 GMT
Server: Microsoft-IIS/8.5
X-AspNet-Version: 2.0.50727
X-Powered-By: ASP.NET
ServerID: C01
Content-Length: 15131
Cache-Control: private, max-age=604800
Date: Wed, 23 Mar 2016 08:43:11 GMT
Connection: keep-alive
Set-Cookie: 1YDT=CT; expires=Wed, 20-Apr-2016 08:43:11 GMT; path=/; domain=.adorama.com
P3P: CP="NON DSP ADM DEV PSD OUR IND STP PHY PRE NAV UNI"
Length: 15131 (15K) [image/jpeg]
Saving to: “CHHB74P.JPG”
100%[=====================================>] 15,131 --.-K/s in 0s
2016-03-23 08:43:11 (460 MB/s) - “CHHB74P.JPG” saved [15131/15131]
I would guess that the image provider is deliberately blocking requests from EC2 address ranges.
The reason the resolved IP address differs between the two examples is DNS-based routing by the CDN provider that Adorama uses.

Web servers may check particular fingerprint attributes to block automated bots. Here are a few things they can check:
GeoIP / IP address
Browser headers
User agents
Plugin info
Browser font list
You can simulate the browser headers and learn about some fingerprinting "attributes" here: https://panopticlick.eff.org
You can try to replicate how a browser behaves by injecting similar headers and a matching user agent, as in the sketch below. Plain curl/wget defaults are unlikely to satisfy those conditions, and even tools like PhantomJS occasionally get blocked. There is a reason some people prefer tools like Selenium WebDriver, which drive an actual browser.
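For example (header values copied from a real browser session are more convincing; this may still fail if the block is IP-based):

curl -sI \
  -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:70.0) Gecko/20100101 Firefox/70.0' \
  -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' \
  -H 'Accept-Language: en-US,en;q=0.5' \
  http://www.adorama.com/images/large/CHHB74P.JPG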

I found that another URL protected by AkamaiGHost was blocking requests due to certain parts of the user agent. In particular, a user agent containing a URL with a protocol prefix was blocked.
Using curl -H 'User-Agent: some-user-agent' https://some.website I found the following results for different user agents:
Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:70.0) Gecko/20100101 Firefox/70.0: okay
facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php): 403
https ://bar (the space breaks the protocol): okay
https://bar: 403
All I could find for now is this (downvoted) answer, https://stackoverflow.com/a/48137940/230422, stating that colons (:) are not allowed in header values. That is clearly not the whole story, as the Mozilla example also contains colons, just not as part of a link.
I guess most web servers don't care and allow Facebook's bot and other bots that carry a contact URL in their user agent, but apparently AkamaiGHost does block it.

Related

How to send a valid HTTP request to a Google Apps Script Web App?

We sent an HTTP request from a C++ app (an Arduino sketch) to a Google Apps Script web app, but we got the HTTP response HTTP/1.1 301 Moved Permanently (shown below). The URL with the HTTP request works fine from a browser.
The same code also works fine with other websites, like www.google.com; it does not work with script.google.com.
The Google Apps Script web app is published as public; anyone, even anonymous users, can access it.
Here is the code we used:
client.println("GET /macros/s/AKfycbyQnmHekk4_NNy3Bl5ILzuSRkykMWaXQ7Rtojk7fFieDUbVqNM/exec?valore=7 HTTP/1.1");
client.println("Host: script.google.com");
client.println("Connection: close");
client.println();
The response was:
HTTP/1.1 301 Moved Permanently
Content-Type: text/html; charset=UTF-8
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Pragma: no-cache
Expires: Mon, 01 Jan 1990 00:00:00 GMT
Date: Wed, 03 Feb 2021 09:29:02 GMT
Location: https://script.google.com/macros/s/AKfycbyQnmHekk4_NNy3Bl5ILzuSRkykMWaXQ7Rtojk7fFieDUbVqNM/exec?valore=7
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
Content-Security-Policy: frame-ancestors 'self'
X-XSS-Protection: 1; mode=block
Server: GSE
Accept-Ranges: none
Vary: Accept-Encoding
Connection: close
Transfer-Encoding: chunked
11e
<HTML>
<HEAD>
<TITLE>Moved Permanently</TITLE>
</HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000">
<H1>Moved Permanently</H1>
The document has moved here.
</BODY>
</HTML>
0
disconnecting from server.
The URL is correct (http://script.google.com/macros/s/AKfycbyQnmHekk4_NNy3Bl5ILzuSRkykMWaXQ7Rtojk7fFieDUbVqNM/exec?valore=7), but it seems that the Google Apps Script web app redirects the request to the same URL over the https protocol.
Using the same code, we made other HTTP requests from the Arduino, and they worked fine.
For example we did:
client.println("GET /search?q=arduino HTTP/1.1");
client.println("Host: www.google.com");
client.println("Connection: close");
client.println();
And we got the response HTTP/1.1 200 OK, and the HTML body contained the search results for the query q=arduino.
Any suggestion on how we can send a valid HTTP/HTTPS request to a Google Apps Script web app?
Thanks.
As you have noticed, the Google Apps Script app is redirecting you from HTTP to HTTPS. Some Google sites are accessible via HTTP; they don't have to redirect to HTTPS if they don't want to. In your example, http://www.google.com/search?q=arduino does redirect in a browser, to https://www.google.com/search?q=arduino&gws_rd=ssl. But your client is not sending a User-Agent header in the request, so Google knows your client is not a browser and may not be issuing that redirect in your case.
Putting the URL http://script.google.com/macros/s/AKfycbyQnmHekk4_NNy3Bl5ILzuSRkykMWaXQ7Rtojk7fFieDUbVqNM/exec?valore=7 into a browser does redirect to https://script.google.com/macros/s/AKfycbyQnmHekk4_NNy3Bl5ILzuSRkykMWaXQ7Rtojk7fFieDUbVqNM/exec?valore=7. A real browser follows that redirect automatically; a user might not even notice the difference.
But your client will have to follow the redirect manually. That means extracting the Location header from the response, closing the existing connection (to script.google.com on port 80), connecting to the specified server (script.google.com on port 443), and initiating an SSL/TLS encrypted session with the server before you can finally send the HTTP request.
SSL/TLS is complex, and HTTP has a lot of rules to it. Don't try to implement them manually. You are best off using an existing HTTP library that has HTTPS support built in. Let it handle all of these details for you.
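For example, on an ESP8266/ESP32-class board (plain AVR Arduinos lack the resources for TLS), the core's HTTPClient library can handle both the TLS session and the redirects. A rough sketch, assuming an ESP32 core and an already-connected WiFi session; certificate validation is skipped here and should be added for anything real:

#include <WiFi.h>
#include <WiFiClientSecure.h>
#include <HTTPClient.h>

void fetchFromAppsScript() {
    WiFiClientSecure client;
    client.setInsecure();  // sketch only: skips certificate validation

    HTTPClient http;
    // Request the HTTPS URL directly, so the HTTP->HTTPS redirect never happens
    http.begin(client, "https://script.google.com/macros/s/AKfycbyQnmHekk4_NNy3Bl5ILzuSRkykMWaXQ7Rtojk7fFieDUbVqNM/exec?valore=7");
    // Apps Script typically answers /exec with another 302 to a
    // script.googleusercontent.com host, so let the library follow redirects too
    http.setFollowRedirects(HTTPC_STRICT_FOLLOW_REDIRECTS);

    int code = http.GET();
    if (code == HTTP_CODE_OK) {
        Serial.println(http.getString());
    } else {
        Serial.printf("HTTP error: %d\n", code);
    }
    http.end();
}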

Django on Gunicorn/Nginx - Stripe Webhooks Always Getting 400

Production setup: Django v3.0.5 on Nginx / Gunicorn / Supervisor (I followed directions from here).
(I don't think this is the issue, but I am using dj-stripe for the Django/Stripe integration.)
In development (Django's built-in HTTP server) everything seems to work (i.e. Stripe can send webhook events just fine); however, in production I get emails saying that Stripe can't reach my server.
When I run
curl -D - -d "user=user1&pass=abcd" -X POST https://my.server/stripe/webhook/
I get this response
HTTP/1.1 400 Bad Request
Server: nginx/1.15.9 (Ubuntu)
Date: Thu, 18 Jun 2020 19:44:07 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 0
Connection: keep-alive
X-Frame-Options: SAMEORIGIN
Vary: Cookie
However, non-webhook traffic (i.e. visiting the website via a browser) works normally; it is just the webhooks that fail.
Any idea where this is going wrong?
Your test request doesn't carry a valid Stripe signature, which is needed for authentication. dj-stripe verifies the Stripe-Signature header on every webhook request against your endpoint's signing secret, so a hand-rolled curl POST is expected to get a 400. Real events from Stripe will only validate if the signing secret configured in dj-stripe matches the one shown for the endpoint in the Stripe dashboard.
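If you want to exercise the endpoint with properly signed test events, the Stripe CLI can generate and forward them (the local address below is an assumption about your Gunicorn setup):

stripe listen --forward-to localhost:8000/stripe/webhook/
stripe trigger payment_intent.succeeded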

Validation of Akamai caching

We have cached HTML and PNG pages in Akamai by changing the WAA config, but we are unable to validate the caching through Fiddler, Live HTTP Headers, or curl commands. Below are the results. Please help if I missed any headers.
(Fiddler and Live HTTP Headers screenshots not shown.)
curl command:
$ curl -H "Pragma: akamai-x-cache-on, akamai-x-cache-remote-on, akamai-x-check-cacheable, akamai-x-get-cache-key, akamai-x-get-extracted-values, akamai-x-get-nonces, akamai-x-get-ssl-client-session-id, akamai-x-get-true-cache-key, akamai-x-serial-no" -IXGET "url"
Response :
HTTP/1.1 200 OK
Date: Thu, 20 Oct 2016 19:15:53 GMT
Content-Length: 19836
Content-Type: image/png
X-FRAME-OPTIONS: DENY
In the Akamai WAF (Web Application Firewall) there are a few WAF rules that prevent the display of the Pragma debug headers to users. You need to create an exception for those WAF rules and add only trusted IPs to it. Then you will be able to see the information you are looking for.
I would double check that you are definitely sending that Pragma header with the request and that it is also properly formed. I've seen a lot of problems with people trying to set this up and not getting it right.
Also it's worth reviewing your Akamai configuration because it is also possible to switch this off - some clients prefer to do this for security reasons.
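As a sanity check, a shorter form of the debug request is below. When the Pragma values are honored, the response typically carries extra headers such as X-Check-Cacheable and X-Cache; the exact set depends on the configuration, so treat the header names as illustrative rather than guaranteed:

curl -sI -H "Pragma: akamai-x-cache-on, akamai-x-check-cacheable, akamai-x-get-cache-key" "url"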

Request JSON Data from HTTPS with C++?

I'm writing a program in C++ that needs to download JSON data from an HTTPS URL. The program is based on wxWidgets. The URL is for the translation service at Glosbe.
So I've tried multiple different libraries including:
libcurl
Boost.Asio
the http functionality included in wxWidgets
wxCurl
Urdl
However, each library either throws an error saying it can't connect, or I get a reply that says "Moved Permanently".
When I copy and paste the URL I am testing with into a browser, it returns the JSON data perfectly.
Does anyone know the correct way to do this?
Any help would be great!
301 Moved Permanently is what the server responds when you try to access the page with HTTP instead of HTTPS. Here's a complete response I just received from the server:
HTTP/1.1 301 Moved Permanently
Server: nginx
Date: Thu, 16 Jul 2015 20:25:01 GMT
Content-Type: text/html
Content-Length: 178
Connection: keep-alive
Location: https://en.glosbe.com/a-api
It means exactly that: "The content you are looking for is really at https://en.glosbe.com/a-api." Your browser simply adheres to the HTTP protocol by following the server's hint and automatically proceeding to request https://en.glosbe.com/a-api when you try to access http://en.glosbe.com/a-api. It works seamlessly for you as a user.
You will have to read more documentation to create HTTPS requests yourself. Each of the libraries you mentioned will have a different way of supporting HTTPS (or not support it at all). For example, have a look at http://www.boost.org/doc/libs/1_58_0/doc/html/boost_asio/overview/ssl.html, especially the "Notes" section where it says that "OpenSSL is required to make use of Boost.Asio's SSL support."
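To illustrate, here is a minimal libcurl sketch of an HTTPS GET. It assumes libcurl was built with SSL support, and the Glosbe endpoint shown is just the style of URL from the question and may have changed since:

#include <curl/curl.h>
#include <iostream>
#include <string>

// Append each chunk libcurl hands us onto a std::string
static size_t writeCallback(char *ptr, size_t size, size_t nmemb, void *userdata) {
    static_cast<std::string *>(userdata)->append(ptr, size * nmemb);
    return size * nmemb;
}

int main() {
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL *curl = curl_easy_init();
    if (!curl) return 1;
    std::string body;

    // Use the https:// URL directly so the 301 never happens;
    // CURLOPT_FOLLOWLOCATION covers any remaining redirects.
    curl_easy_setopt(curl, CURLOPT_URL,
        "https://glosbe.com/gapi/translate?from=eng&dest=fra&format=json&phrase=hello");
    curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, writeCallback);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &body);

    CURLcode res = curl_easy_perform(curl);
    if (res != CURLE_OK)
        std::cerr << "curl error: " << curl_easy_strerror(res) << "\n";
    else
        std::cout << body << "\n";

    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return 0;
}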

Use libCurl with Bluecoat cookie proxy

I am trying to connect through a Bluecoat proxy which uses a cookie during the proxy authentication.
I have been completely unable to find a combination of CURLOPT_ settings that will get CURL to present the cookie during proxy authentication.
So: the proxy responds with:
HTTP/1.1 407 Proxy Authentication Required
Proxy-Authenticate: NTLM
Cache-Control: no-cache
Pragma: no-cache
Content-Type: text/html; charset=utf-8
Proxy-Connection: close
Set-Cookie: BCSI-CS-EDD688431754D715=2; Path=/
Connection: close
Content-Length: 825
But curl does not present the cookie in subsequent authentication attempts, no matter what I set for CURLOPT_COOKIEFILE or CURLOPT_COOKIEJAR.
NOTE: I am also using (because I must)
CURLOPT_PROXYTYPE = CURLPROXY_HTTP
CURLOPT_PROXYAUTH = CURLAUTH_ANY
CURLOPT_HTTPPROXYTUNNEL = 1
CURLOPT_CONNECT_ONLY = 1
Is it reasonable to expect CURL to present a cookie with a Proxy-Authorization request?
I am using curl_easy_*, would moving to the multi interface help?
Finally, I am building with libcurl 7.19.7.
The CONNECT request is handled a bit separately in the code from the "regular" requests, and it seems there's no cookie handling done there! I consider it a libcurl bug.
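If upgrading libcurl is an option: version 7.37.0 added CURLOPT_PROXYHEADER, which lets you attach headers (including a Cookie) to the CONNECT request itself. A rough sketch under that assumption; the proxy address is hypothetical, and the cookie value is the one from the 407 response above:

#include <curl/curl.h>

void setupProxyCookie(CURL *curl) {
    /* Requires libcurl >= 7.37.0; the 7.19.7 build mentioned above predates this. */
    struct curl_slist *connect_headers = NULL;
    connect_headers = curl_slist_append(connect_headers,
        "Cookie: BCSI-CS-EDD688431754D715=2");

    curl_easy_setopt(curl, CURLOPT_PROXY, "http://proxy.example.com:8080"); /* hypothetical */
    curl_easy_setopt(curl, CURLOPT_HTTPPROXYTUNNEL, 1L);
    /* Send these headers to the proxy only, not to the origin server */
    curl_easy_setopt(curl, CURLOPT_HEADEROPT, CURLHEADER_SEPARATE);
    curl_easy_setopt(curl, CURLOPT_PROXYHEADER, connect_headers);
    /* Free connect_headers with curl_slist_free_all() after the transfer. */
}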
It is possible to create a tunnel through a Blue Coat proxy, but my advice is not to use a network with a Blue Coat proxy at all. In a free country it should not be a problem to buy a SIM card and use a mobile network instead.
Read more at https://bluecoatproxy.wordpress.com