I have a Django app using the built-in setting ALLOWED_HOSTS, which whitelists request Host headers. This is needed because Django uses the Host header provided by the client to construct URLs in certain cases.
ALLOWED_HOSTS=djangoapp.com,subdomain.djangoapp.com
I made ten requests with a fake host header (let's call it fakehost.com) to the Django endpoint: /example.
curl -i -s -k -X $'GET' \
-H $'Host: fakehost.com' -H $'Accept-Encoding: gzip, deflate' -H $'Accept: */*' -H $'Accept-Language: en' -H $'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36' -H $'Connection: close' \
$'https://subdomain.djangoapp.com/example'
In the application logs I see the django.security.DisallowedHost error was raised ten times. However, according to the logs of fakehost.com, it did receive one request for /example.
As I understand it, this is a server-side request forgery (SSRF) vulnerability, since the Django server can apparently be made to send a request to an arbitrary URL.
It makes debugging hard that the issue doesn't occur consistently. It is also strange that Django seems to recognise and reject the fake host, yet one request still somehow reached fakehost.com.
Does anyone have any ideas what I could investigate further in order to fix this apparent vulnerability? Is the problem potentially at the server level rather than the application level?
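For reference, Django's Host-header check works roughly like the sketch below. This is a simplified, illustrative version of what `django.http.request.validate_host` does (the real implementation also strips ports and handles IPv6 literals):

```python
# Simplified sketch of Django's ALLOWED_HOSTS check (illustrative only;
# the real code lives in django.http.request.validate_host).
def host_allowed(host, allowed_hosts):
    host = host.lower().rstrip(".")  # normalize case and trailing dot
    for pattern in allowed_hosts:
        pattern = pattern.lower()
        if pattern == "*" or pattern == host:
            return True
        # a leading dot matches the domain itself and any subdomain
        if pattern.startswith(".") and (host == pattern[1:] or host.endswith(pattern)):
            return True
    return False
```

A Host of fakehost.com fails this check and raises DisallowedHost before any view runs, which suggests the one request that reached fakehost.com was handled by something in front of Django (a proxy, CDN, or catch-all virtual host) rather than by the app itself.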
Related
I have recently created an API for internal use in my company. Only my colleagues and I have the URL.
A few days ago, I noticed that random requests were occurring to a given method of the API (less than once per day), so I logged accesses to that method and this is what I am getting:
2017-06-18 17:10:00,359 INFO (default task-427) 85.52.215.80 - Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.75 Safari/537.36 Google Favicon
2017-06-20 07:25:42,273 INFO (default task-614) 85.52.215.80 - Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.75 Safari/537.36 Google Favicon
The request to the API is performed with the full set of parameters (I mean, it's not just to the root of the webservice)
Any idea of what could be going on?
I have several theories:
A team member that has a browser tab with the method request URL open, which reloads every time they open the browser. --> This is my favourite, but everybody claims it's not their fault
A team member having the service URL (with all parameters) in their browser History, with the browser randomly querying it to retrieve the favicon
A team member having the service URL (with all parameters) in their browser Favourites/Bookmarks, with the browser randomly querying it to retrieve the favicon
While the User-Agent (Google Favicon) seems to suggest one of the two latter options, the IP (located near our own city, on the Orange Spain ISP) seems to suggest the first: after a quick search on the Internet, I've found that everybody else seeing such requests seems to get them from a Google IP in California.
I know I could just block that User-Agent or IP, but I'd really like to get to the bottom of this issue.
Thanks!
Edit:
Now I am getting User Agents as:
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko; Google Web Preview) Chrome/41.0.2272.118 Safari/537.36
as well :/
Both of these User-Agents are associated with Google's Fetch and Render tool in Google Search Console. These user agents make requests when someone asks Google to Fetch and Render a given page for SEO validation. That doesn't quite make sense here, since you are asking about an API and not a page; but perhaps a page that was submitted to the Fetch and Render service called the API?
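If you do decide to filter these hits out of your logs (or block them), a simple User-Agent substring check is enough. A minimal sketch, assuming the two UA strings quoted above are the only Google preview/favicon fetchers you see:

```python
# Markers taken from the User-Agent strings observed in the logs above.
GOOGLE_PREVIEW_MARKERS = ("Google Favicon", "Google Web Preview")

def is_google_preview(user_agent):
    """True if the User-Agent looks like a Google preview/favicon fetcher."""
    return any(marker in user_agent for marker in GOOGLE_PREVIEW_MARKERS)
```

You could apply this in whatever request filter or log post-processing your stack uses, without touching the underlying mystery of who submitted the URL to Google in the first place.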
Specifically, I'm trying to scrape this entire page, but am only getting a portion of it. If I use:
r = requests.get('http://store.nike.com/us/en_us/pw/mens-shoes/7puZoi3?ipp=120')
it only gets the "visible" part of the page, since more items load as you scroll downwards.
I know there are some solutions in PyQT such as this, but is there a way to have python requests continuously scroll to the bottom of a webpage until all items load?
You could monitor the page's network activity with the browser development console (F12 - Network in Chrome) to see what requests the page makes when you scroll down, then reproduce those requests with requests. Alternatively, you can use selenium to control a browser programmatically, scrolling down until the page ends, and then save its HTML.
I think I found the right request:
Request URL:http://store.nike.com/html-services/gridwallData?country=US&lang_locale=en_US&gridwallPath=mens-shoes/7puZoi3&pn=3
Request Method:GET
Status Code:200 OK
Remote Address:87.245.221.98:80
Request Headers
Provisional headers are shown
Accept:application/json, text/javascript, */*; q=0.01
Referer:http://store.nike.com/us/en_us/pw/mens-shoes/7puZoi3?ipp=120
User-Agent:Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36
X-NewRelic-ID:VQYGVF5SCBAJVlFaAQIH
X-Requested-With:XMLHttpRequest
It seems the query parameter pn is the current "subpage" (the page number). You still need to parse the response correctly.
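The paging loop itself is straightforward. Here is a sketch with the fetch step injected as a function, so the same loop works whether you hit the gridwallData URL above with requests or anything else; the stop condition (an empty page) is an assumption about how the endpoint behaves once pn runs past the last page:

```python
def fetch_all_pages(fetch_page, start=1):
    """Collect items from numbered pages until an empty page is returned.

    fetch_page(pn) should return the list of items for page `pn`
    (e.g. parsed out of the gridwallData JSON), or [] when exhausted.
    """
    items, pn = [], start
    while True:
        page = fetch_page(pn)
        if not page:
            break
        items.extend(page)
        pn += 1
    return items
```

With requests, fetch_page would do something like `requests.get(url, params={"pn": pn, ...}).json()` and pull the product list out of the response (which JSON key holds it is something you'd need to inspect yourself).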
I am trying to serve all assets through CloudFront on my Rails 4.2 app hosted on Heroku. I have been successful with CloudFront before on Heroku apps under somename.herokuapp.com, but this one has a custom domain and a wildcard SSL certificate. I cannot get any of the assets to serve; they all return 403.
I have tried uploading my SSL on AWS & also tried using the Default CloudFront Certificate (*.cloudfront.net) (which works for my non-custom domain apps).
I have made sure my SSL is in the region on AWS that AWS wants it to be (N.Virginia).
I have made sure I'm only using HTTP/1.1, HTTP/1.0.
I have made sure my distribution is 'enabled'.
My SSL is a wildcard so it looks like this '*.mydomain.com'
When I uploaded it to AWS and added it to my distribution, I see that it is in use.
I have made sure that my aws_id/aws_key are valid. There is also some sort of CloudFront key pair, but I don't know where I would put that in my app; I only have ENV variables for aws_id/aws_secret_key.
Request URL:https://mycloudfrontdistn.cloudfront.net/assets/subfolder/secondfolder/gift-52db27eb2ced10800db38fbd74ec2ef40704d8c55d49b2654f7fe014e4bd1eff.png
Request Method:GET
Status Code:403 Forbidden
Remote Address:REDACTED (I don't know if this is sensitive)
Response Headers
view source
Connection:keep-alive
Content-Length:146
Content-Type:text/xml
Date:Fri, 04 Nov 2016 23:06:47 GMT
Server:CloudFront
Via:1.1 somebignumber.cloudfront.net (CloudFront)
X-Amz-Cf-Id:myAmazonKey==
X-Cache:Error from cloudfront
Request Headers
view source
Accept:image/webp,image/*,*/*;q=0.8
Accept-Encoding:gzip, deflate, sdch, br
Accept-Language:en-US,en;q=0.8
Cache-Control:max-age=0
Connection:keep-alive
Host:mycloudfrontdist.cloudfront.net
Referer:https://www.mysite.com
User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36
This error occurs if you enable Restrict Viewer Access on the cache behavior on the distribution, but you are not actually using CloudFront signed URLs because the content is public.
If you want requests for objects that match the PathPattern for this cache behavior to use public URLs, choose No.
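If you'd rather verify this from code than click through the console, the flag lives under TrustedSigners in each cache behavior of the distribution config. A sketch, assuming you have already fetched the DistributionConfig dict (e.g. via boto3's cloudfront client and get_distribution_config):

```python
def restricted_behaviors(distribution_config):
    """Return the cache behaviors whose 'Restrict Viewer Access' is on.

    Expects the DistributionConfig structure returned by CloudFront's
    GetDistributionConfig API; 'default' stands for the default behavior.
    """
    behaviors = [("default", distribution_config["DefaultCacheBehavior"])]
    for b in distribution_config.get("CacheBehaviors", {}).get("Items", []):
        behaviors.append((b["PathPattern"], b))
    return [name for name, b in behaviors
            if b.get("TrustedSigners", {}).get("Enabled")]
```

Any behavior this returns will answer public (unsigned) asset URLs with exactly the 403 shown above.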
I have a web service that contains a method DoWork(). This method will be used to retrieve data from a database and pass the data back to the caller in JSON format.
[OperationContract]
[WebInvoke(Method = "GET", UriTemplate = "doWork")]
public Stream DoWork()
{
    return new MemoryStream(Encoding.UTF8.GetBytes("<html><body>WORK DONE</body></html>"));
}
I have composed a fiddler request just to verify my method is available.
If I execute this from Fiddler, the method in my web service gets called, but I can't seem to figure out how to construct a cURL command that will do the same thing.
Perhaps the easiest approach to this is having Chrome create that curl command line for you, especially when the request involves many headers and complicated POST data.
Open the developer tools by pressing F12 and going to Network. Then run whatever call you want to monitor.
(In my example you can see what happens when you open questions here on stack overflow)
Then right-click on the relevant line and select Copy as cURL (cmd) if you are on Windows (on Linux use the other option).
This will give you a command line similar to this:
curl "http://stackoverflow.com/questions" -H "Accept-Encoding: gzip, deflate, sdch" -H "Accept-Language: de-DE,de;q=0.8,en-US;q=0.6,en;q=0.4" -H "Upgrade-Insecure-Requests: 1" -H "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36" -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8" -H "Referer: ..." -H "Cookie: ..." -H "Connection: keep-alive" --compressed
If you experience problems, add -v to see more details; for a detailed explanation of the options, see the curl manual.
Perhaps all you need to add to your already existing curl command line are those browser specific headers (User-Agent, Accept, ...)
I realize this is more of a server question (since all media requests bypass Django via NGINX), but I want to know how other Django developers have been doing this, more so than I care about the specifics of how to do it in NGINX. I don't care about the bandwidth of HTML page requests served via Django; only the bandwidth of static media files. Are those of you out there using Django and its DB to do this, or are you using web-server-specific methods? If the latter is the case, I'll head over to ServerFault.
I want to do this so I can measure the bandwidth usage on a per-subdomain (or similar method) basis.
Sorry for the non-Django approach, but since we're talking about static files, good practice is to serve them directly from the web server without ever hitting WSGI at all.
Apache access logs include the response size, so what you could do is grep out your media files and directories (e.g. cat access_log | grep "/images/\|/media/thumbs/\|jpg") and parse/sum that number with a regexp and/or awk. Here's an example access log entry (45101 being the file size):
10.0.0.123 - - [09/Sep/2010:13:30:05 -0400] "GET /media/images/mypic.jpg HTTP/1.1" 200 45101 "http://10.0.0.123/myapp" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.11) Gecko/20100701 Firefox/3.5.11"
That should get you going.
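The same grep/awk idea in Python, as a sketch: the regex assumes the combined log format shown above, and the path prefixes are examples you would replace with your own media directories.

```python
import re

# Pull the request path and response size out of a combined-format line.
LOG_RE = re.compile(r'"[A-Z]+ (?P<path>\S+) HTTP/[\d.]+" \d{3} (?P<size>\d+|-)')

def media_bytes(log_lines, prefixes=("/media/", "/images/")):
    """Sum response sizes (bytes) for requests under the given path prefixes."""
    total = 0
    for line in log_lines:
        m = LOG_RE.search(line)
        if m and m.group("path").startswith(prefixes) and m.group("size") != "-":
            total += int(m.group("size"))
    return total
```

Feed it the log (e.g. `media_bytes(open("access_log"))`), or group by subdomain first if your log format includes the virtual host.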