I'm having a problem allowing users to download items from the Sitecore media library.
I have a link to a media item (xls, pdf etc.) on a page. When a user clicks on the link, the file should be downloaded.
This works fine on our test Sitecore instance, but when we try it on our live instance, the file starts to download OK but then seems to be truncated. (Both instances are located on the same IIS server.)
Using Fiddler, I can see that the download's response body is truncated at 784 KB.
HTTP/1.1 200 OK
Cache-Control: public, max-age=604800
Content-Type: application/vnd.ms-excel
Expires: Fri, 25 Mar 2011 14:12:48 GMT
Last-Modified: Fri, 18 Mar 2011 11:17:45 GMT
ETag: 050b2f8a408b47c49fefbe28b5ec9661
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET; Sitecore CMS
X-Powered-By: ASP.NET
Content-Disposition: attachment; filename="Filename.xls"
Date: Fri, 18 Mar 2011 14:12:48 GMT
Content-Length: 804795
(the file is actually 5019136 bytes!)
If anybody can shed any light, I'd be eternally grateful!
Yours in desperation!
Pete
--UPDATE--
Think I might be getting closer to the cause of this.
On closer examination, this is the response I'm getting back from the server:
Response denied by WatchGuard HTTP proxy.
Reason: chunk-size line too large line='‰u¯^%|\x0c\x04‡V–\x15ÿ\x00¾c*?Jã5]cW×o[R×5K›Ë‡ûóÝÎÒ;}Y‰&«Q]4èQ¥ðE/D‘ÅW\x13ˆ®ïRn^¿Ì(¢ŠÔÄ(¢Š\x00(¢Š\x00(¢Š\x00(¢¾…ý•?àœ¿\x1bi\x19'
Mystery 1 Solved - The reason it was working on the test site was because I wasn't going via the proxy server!
Mystery 2 - Why is the chunk size too large!!!!?
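For context on what that error message refers to: in HTTP/1.1 chunked transfer encoding, each chunk starts with a short line holding the chunk length in hexadecimal, followed by CRLF and then the data. A strict parser that finds a long run of binary data where it expects that size line would reject the response in a similar way. The sketch below only illustrates the framing; `decode_chunked` and the 32-byte limit are assumptions, not WatchGuard's actual implementation or threshold.

```python
MAX_SIZE_LINE = 32  # hypothetical limit; the proxy's real limit is not known


def decode_chunked(raw: bytes, max_size_line: int = MAX_SIZE_LINE) -> bytes:
    """Decode a chunked HTTP body, rejecting over-long chunk-size lines."""
    body = bytearray()
    pos = 0
    while True:
        eol = raw.find(b"\r\n", pos)
        if eol == -1:
            raise ValueError("missing CRLF after chunk-size line")
        size_line = raw[pos:eol]
        if len(size_line) > max_size_line:
            raise ValueError("chunk-size line too large")
        # Drop optional chunk extensions after ';' and parse the hex length.
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        pos = eol + 2
        if size == 0:
            return bytes(body)  # last chunk; trailers are ignored in this sketch
        body += raw[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF


well_formed = b"5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n"
decoded = decode_chunked(well_formed)

# Binary data where a size line is expected triggers the same kind of error.
try:
    decode_chunked(b"\x89u\xaf^%" * 20 + b"\r\n")
    error = None
except ValueError as exc:
    error = str(exc)
```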
Pete
The first thing I would check is the <httpRuntime maxRequestLength="" /> element/attribute in your web.config file. You'll want to ensure that the 'maxRequestLength' attribute value is set to a value large enough to accommodate the size of the files that you're serving.
Beyond that, are you generating the response headers yourself (i.e. in your code)? For instance, are you explicitly setting the Content-Disposition header and the Content-Length headers? If so, I would suggest verifying that the method you're using to compute the Content-Length is accurate.
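As a rough illustration of that check, the sketch below compares the Content-Length value from the response with the real size of the file. The function name `compare_lengths` is hypothetical, and the file size from the question is taken here to be in bytes, which is an assumption.

```python
def compare_lengths(content_length_header: str, actual_size: int) -> dict:
    """Compare a declared Content-Length with the real file size in bytes."""
    declared = int(content_length_header.strip())
    return {
        "declared": declared,
        "actual": actual_size,
        "difference": actual_size - declared,
        "matches": declared == actual_size,
    }


# 804795 is the Content-Length from the response in the question;
# 5019136 is the file size quoted there, assumed to be bytes.
result = compare_lengths("804795", 5019136)
print(result)
```

A non-zero difference means that either the header is computed from something other than the full file, or the body is being cut somewhere between the server and the client.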
Lastly, verify that the IIS configuration is the same between both Sitecore instances. Are you using IIS6, IIS7 or IIS7.5?
cheers,
adam
Have you done a web.config comparison between the two instances?
Have you asked about this on the SDN forums?
This problem sounds so familiar. I'm sure that I've seen it before... Are you sure it's truncating the return? Or is it that there is 16kb of unwanted junk in the header/data? I want to say that's the issue I've seen before... but can't remember for sure. Brain cells are tingling, give me some time.
I am trying to hit the URL https://subdomain.example.com in JMeter. The script was recorded using the BlazeMeter Chrome extension and has all the necessary config elements, but I get an error:
HTTP/1.1 429 Too Many Requests
Content-Type: text/html; charset=utf-8
Content-Length: 1031
Connection: keep-alive
Cache-Control: private, no-cache, no-store, must-revalidate
Date: Tue, 20 Aug 2019 01:21:35 GMT
Expires: 0
p3p: CP="This site does not specify a policy in the P3P header"
I have tried copying the cookie headers from the browser's response headers, which works for some time but then starts throwing an error.
As per HTTP Status Code 429 Too Many Requests description:
The HTTP 429 Too Many Requests response status code indicates the user has sent too many requests in a given amount of time ("rate limiting").
A Retry-After header might be included to this response indicating how long to wait before making a new request.
So there are the following options:
Your server is overloaded. In this case there is nothing you can do apart from reporting the error as the bottleneck.
Your script doesn't have proper correlation implemented, i.e. you're sending recorded hard-coded values instead of getting dynamic parameters.
Your server doesn't allow that many requests from a single IP address within the given timeframe. You could try implementing IP spoofing so your server would "think" that the requests are coming from different machines.
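If the server does send a Retry-After header with the 429, the client can work out how long to pause before retrying. The following is a minimal sketch in plain Python rather than JMeter; `retry_after_seconds` and the 1-second default are hypothetical choices. The header may hold either a number of seconds or an HTTP date.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(headers: dict, now: datetime, default: float = 1.0) -> float:
    """Return how many seconds to wait, based on an optional Retry-After header."""
    value = headers.get("Retry-After")
    if value is None:
        return default  # header absent, fall back to a fixed pause
    value = value.strip()
    if value.isdigit():
        return float(value)  # delta-seconds form, e.g. "120"
    retry_at = parsedate_to_datetime(value)  # HTTP-date form
    return max(0.0, (retry_at - now).total_seconds())


now = datetime(2019, 8, 20, 1, 21, 35, tzinfo=timezone.utc)

# The 429 response in the question carries no Retry-After header.
from_question = retry_after_seconds({"Content-Length": "1031", "Expires": "0"}, now)
as_seconds = retry_after_seconds({"Retry-After": "120"}, now)
as_date = retry_after_seconds({"Retry-After": "Tue, 20 Aug 2019 01:22:35 GMT"}, now)
```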
Thanks for your reply. In the end I figured out that no limit on the number of calls is implemented.
As for the answer, this is how I managed to get it working:
I opened the page in Chrome and copied all the header elements from the headers section into the Header Manager as hard-coded values.
The first time, the request fails and returns p3p: CP="This site does not specify a policy in the P3P header", but it also returns the updated variable value needed for the next request, which I extract and use in the next and subsequent requests. I was able to find out which variable changes by doing a string comparison of two sets of response headers.
This was a difficult one, but it worked with a very minor change. I also added a Header Manager to each request to be on the safe side.
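The header comparison described above can be sketched as follows. This is an illustration only: `changed_headers` is a hypothetical helper, and the `X-Token` header with its values is made up for the example, since the real variable name is not given.

```python
def changed_headers(first: dict, second: dict) -> dict:
    """Return headers whose values differ between two responses."""
    changed = {}
    for name in sorted(set(first) | set(second)):
        before, after = first.get(name), second.get(name)
        if before != after:
            changed[name] = (before, after)
    return changed


# Hypothetical header sets from two consecutive responses.
response_1 = {"Content-Type": "text/html; charset=utf-8", "X-Token": "abc123"}
response_2 = {"Content-Type": "text/html; charset=utf-8", "X-Token": "def456"}

diff = changed_headers(response_1, response_2)
print(diff)
```

Whatever shows up in the result is a candidate for extraction and correlation in the next request.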
I'm not seeing my isomorphic-fetch based XHRs show up in the mini-profiler.
My page response headers:
Content-Type:text/html; charset=utf-8
Date:Fri, 14 Jul 2017 11:23:07 GMT
Server:Kestrel
Transfer-Encoding:chunked
Vary:Accept-Encoding
X-MiniProfiler-Ids:["16d0cc1e-9881-403e-a73c-85103e74a52f","803894bc-219e-4011-92c4-9838d8005827","58ee3691-2e1d-4592-b4b1-a1a2f0eb4b61"]
X-Powered-By:ASP.NET
X-SourceFiles:=?UTF-8?B?QzpcY29kZVxvdGhlclxwbGF5LXNzclxmZXRjaGRhdGFcNQ==?=
My fetch response headers:
Content-Type:application/json; charset=utf-8
Date:Fri, 14 Jul 2017 11:23:19 GMT
Server:Kestrel
Transfer-Encoding:chunked
X-MiniProfiler-Ids:["6bcaaaa2-9ad8-42b1-8123-5c12d22a243e","fdfddce8-fc0f-4106-bbab-8de03b22c2e5","dc24b210-8079-41ee-a231-d84d6d1401e3"]
X-Powered-By:ASP.NET
X-SourceFiles:=?UTF-8?B?QzpcY29kZVxvdGhlclxwbGF5LXNzclxhcGlcU2FtcGxlRGF0YVxXZWF0aGVyRm9yZWNhc3Rz?=
Should I be expecting some type of overlap between the two X-MiniProfiler-Ids?
If so, any suggestions for tracking this down further?
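A quick way to test for overlap is to parse both header values as JSON arrays and intersect them. The sketch below uses the exact values shown above; the variable names are arbitrary.

```python
import json

page_header = (
    '["16d0cc1e-9881-403e-a73c-85103e74a52f",'
    '"803894bc-219e-4011-92c4-9838d8005827",'
    '"58ee3691-2e1d-4592-b4b1-a1a2f0eb4b61"]'
)
fetch_header = (
    '["6bcaaaa2-9ad8-42b1-8123-5c12d22a243e",'
    '"fdfddce8-fc0f-4106-bbab-8de03b22c2e5",'
    '"dc24b210-8079-41ee-a231-d84d6d1401e3"]'
)

# X-MiniProfiler-Ids holds a JSON array of profiler IDs.
page_ids = set(json.loads(page_header))
fetch_ids = set(json.loads(fetch_header))

shared = page_ids & fetch_ids
print(sorted(shared))
```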
The issue here is we're not listening to the fetch API in general (but are for popular JS frameworks) in the MiniProfiler client-side JS. In effect, we just never observe that header coming back to trigger a fetch on.
I think the best route here would be starting a discussion in a MiniProfiler issue so we can decide the best generic way to support this case. I'm 100% for it, we just need to make sure we don't break anyone in the process.
Since 4am this morning, two of my WebJobs that have been running quite happily every 2 minutes for months are now broken. The error is:
Http Action - Response from host
'*******************.scm.azurewebsites.net': 'NotFound' Response
Headers: Pragma: no-cache x-ms-request-id:
d719e8d0-429d-4ba3-86de-a732e54dbd4f Cache-Control: no-cache Date:
Wed, 21 Sep 2016 21:20:01 GMT Set-Cookie:
ARRAffinity=8f119d7b3e71f6a6a4d78b9eebbac59d8f13ae47ad9ddc5efdc9151826e5ad57;Path=/;Domain=********************.scm.azurewebsites.net
Server: Microsoft-IIS/8.0 X-AspNet-Version: 4.0.30319 X-Powered-By:
ASP.NET Body: "No route registered for
'/api/triggeredwebjobs/batch/run%3Farguments=job-steve'"
https://github.com/projectkudu/kudu/wiki/WebJobs-API#invoke-a-triggered-job
http://blog.davidebbo.com/2015/05/scheduled-webjob.html
I am using David Ebbo's solution in the above link and also adding parameters as outlined on the project website.
As we discovered, the root issue was that the '?' in the URL was encoded as %3F, instead of just being ?. Fixing the URL in the scheduler addressed the issue.
What's not clear is what caused it to be that way if it used to work. That could be some kind of scheduler or portal issue. But at least we know it's not something related to WebJob itself.
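To illustrate why the encoded '?' produces a "No route registered" error, the sketch below uses Python's standard `urllib.parse` to show how the two forms are split. With a literal '?', the arguments land in the query string. With %3F, everything becomes part of the path, so no route matches.

```python
from urllib.parse import quote, urlsplit

good = "/api/triggeredwebjobs/batch/run?arguments=job-steve"

# Percent-encoding the whole string (keeping only '/' and '=' as-is)
# reproduces the broken form from the error message.
bad = quote(good, safe="/=")

good_parts = urlsplit(good)
bad_parts = urlsplit(bad)

print(bad)
print(good_parts.path, good_parts.query)
print(bad_parts.path, bad_parts.query)
```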
It looks like you have a mismatch between the name of the WebJob that your scheduler is trying to invoke (batch), and the actual name of your WebJob in your Web App (PyramisBatch). So the error is expected.
Can you change your scheduler to hit the right WebJob?
I had the same issue with my web api app. Restarting the app fixed it.
I have some images for which I need to make a HEAD request (HttpRequestMethod.HEAD) in order to find out some details of the image.
When I go to the image url on a browser it loads without a problem.
When I attempt to get the header info via my code or via online tools, it fails.
An example URL is http://www.adorama.com/images/large/CHHB74P.JPG
As mentioned, I have used the online tool Hurl.it to try to make the HEAD request, but I am getting the same 403 Forbidden message that I am getting in my code.
I have tried adding many various headers to the Head request (User-Agent, Accept, Accept-Encoding, Accept-Language, Cache-Control, Connection, Host, Pragma, Upgrade-Insecure-Requests) but none of this seems to work.
It also fails to do a normal GET request via Hurl.it. Same 403 error.
If it is relevant, my code is a C# web service and is running on the AWS cloud (just in case the Adorama servers have something against AWS that I don't know about). To test this I have also spun up an EC2 instance (Linux box) and run curl, which also returned the 403 error. Running curl locally on my personal computer returns binary output, which is presumably just the image data.
And just to remove the obvious thoughts, my code works successfully for many, many other websites. It is just this one where there is an issue.
Any idea what is required for me to download the image headers and not get the 403?
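For anyone who wants to reproduce the check from different hosts, here is a minimal sketch of a HEAD request using Python's standard library (the original code is C#, which is not shown). The helper names and the default User-Agent value are assumptions, and this does not get around the 403. It only reports the status and headers the server returns.

```python
import urllib.error
import urllib.request


def build_head_request(url: str, user_agent: str = "Mozilla/5.0") -> urllib.request.Request:
    """Build a HEAD request with an explicit User-Agent header."""
    return urllib.request.Request(url, method="HEAD", headers={"User-Agent": user_agent})


def head_status(url: str, timeout: float = 10.0):
    """Send the HEAD request and return (status code, response headers)."""
    request = build_head_request(url)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, dict(response.headers)
    except urllib.error.HTTPError as err:
        # Error statuses such as 403 arrive here; the headers are still useful.
        return err.code, dict(err.headers)


request = build_head_request("http://www.adorama.com/images/large/CHHB74P.JPG")
print(request.get_method(), request.full_url)
```

Calling `head_status(...)` with the same URL from a local machine and from an EC2 instance makes the difference in responses easy to compare.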
Same problem here.
Locally it works smoothly. Doing it from an AWS instance I get the very same problem.
I thought it was a DNS resolution problem (redirecting to a malfunctioning node). I therefore tried specifying the same IP address that my local client resolved, but that didn't fix the problem.
My guess is that Akamai (the service is provided by an Akamai CDN in this case) is blocking AWS. That is somewhat understandable: customers pay for CDN traffic, so people abusing it can generate huge bills.
Connecting to www.adorama.com (www.adorama.com)|104.86.164.205|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 403 Forbidden
Server: AkamaiGHost
Mime-Version: 1.0
Content-Type: text/html
Content-Length: 301
Cache-Control: max-age=604800
Date: Wed, 23 Mar 2016 09:34:20 GMT
Connection: close
2016-03-23 09:34:20 ERROR 403: Forbidden.
I tried that URL from Amazon and it didn't work for me. wget did work from other servers that weren't on Amazon EC2 however. Here is the wget output on EC2
wget -S http://www.adorama.com/images/large/CHHB74P.JPG
--2016-03-23 08:42:33-- http://www.adorama.com/images/large/CHHB74P.JPG
Resolving www.adorama.com... 23.40.219.79
Connecting to www.adorama.com|23.40.219.79|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.0 403 Forbidden
Server: AkamaiGHost
Mime-Version: 1.0
Content-Type: text/html
Content-Length: 299
Cache-Control: max-age=604800
Date: Wed, 23 Mar 2016 08:42:33 GMT
Connection: close
2016-03-23 08:42:33 ERROR 403: Forbidden.
But from another Linux host it did work. Here is output
wget -S http://www.adorama.com/images/large/CHHB74P.JPG
--2016-03-23 08:43:11-- http://www.adorama.com/images/large/CHHB74P.JPG
Resolving www.adorama.com... 23.45.139.71
Connecting to www.adorama.com|23.45.139.71|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.0 200 OK
Content-Type: image/jpeg
Last-Modified: Wed, 23 Mar 2016 08:41:57 GMT
Server: Microsoft-IIS/8.5
X-AspNet-Version: 2.0.50727
X-Powered-By: ASP.NET
ServerID: C01
Content-Length: 15131
Cache-Control: private, max-age=604800
Date: Wed, 23 Mar 2016 08:43:11 GMT
Connection: keep-alive
Set-Cookie: 1YDT=CT; expires=Wed, 20-Apr-2016 08:43:11 GMT; path=/; domain=.adorama.com
P3P: CP="NON DSP ADM DEV PSD OUR IND STP PHY PRE NAV UNI"
Length: 15131 (15K) [image/jpeg]
Saving to: “CHHB74P.JPG”
100%[=====================================>] 15,131 --.-K/s in 0s
2016-03-23 08:43:11 (460 MB/s) - “CHHB74P.JPG” saved [15131/15131]
I would guess that the image provider is deliberately blocking requests from EC2 address ranges.
The IP address that wget connects to differs between the two examples because of DNS resolution by the CDN provider that Adorama uses.
A web server may check particular fingerprint attributes to block automated bots. Here are a few of the things it can check:
Geoip, IP
Browser headers
User agents
plugin info
Fonts reported by the browser
You can simulate the browser headers and learn about some fingerprinting "attributes" here: https://panopticlick.eff.org
You can try to replicate how a browser behaves and inject similar headers and a similar user agent. Plain curl/wget is unlikely to satisfy those conditions, and even tools like PhantomJS occasionally get blocked. There is a reason why some prefer tools like Selenium WebDriver that launch an actual browser.
I found that another URL, also protected by AkamaiGHost, was being blocked because of certain parts of the user agent. In particular, a user agent containing a link with a protocol was blocked:
Using curl -H 'User-Agent: some-user-agent' https://some.website I found the following results for different user agents:
Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:70.0) Gecko/20100101 Firefox/70.0: okay
facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php): 403
https ://bar: okay
https://bar: 403
All I could find for now is this (downvoted) answer https://stackoverflow.com/a/48137940/230422 stating that colons (:) are not allowed in header values. That is clearly not the only thing happening here as the Mozilla example also has a colon, only not a link.
I guess that most web servers don't care and allow Facebook's bot and other bots that have a contact URL in their user agent. But apparently AkamaiGHost does block it.
I've done plenty of ASP.NET and PHP development, but I'm less familiar with how to track this sort of thing down in CF. My naive first angle of attack was to search for any reference to Google in any of the source code. No luck.
I'm running the site on IIS7. Google, Bing and Yahoo all apparently "see" nothing on my site.
Update: I ran Fetch as Googlebot and got the following:
HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8
Server: Microsoft-IIS/7.0
Set-Cookie: CFID=1638251;expires=Sat, 14-Apr-2040 15:51:41 GMT;path=/
Set-Cookie: CFTOKEN=35688222;expires=Sat, 14-Apr-2040 15:51:41 GMT;path=/
Set-Cookie: LANGUAGEID=1;expires=Sat, 14-Apr-2040 15:51:41 GMT;path=/
Set-Cookie: CFGLOBALS=urltoken%3DCFID%23%3D1638251%26CFTOKEN%23%3D35688222%23lastvisit%3D%7Bts%20%272010%2D04%2D22%2008%3A51%3A41%27%7D%23timecreated%3D%7Bts%20%272010%2D04%2D22%2008%3A51%3A41%27%7D%23hitcount%3D2%23cftoken%3D35688222%23cfid%3D1638251%23;expires=Sat, 14-Apr-2040 15:51:41 GMT;path=/
X-Powered-By: ASP.NET
Date: Thu, 22 Apr 2010 15:51:40 GMT
Use Google Webmaster Tools "Fetch as Googlebot" (it's in Labs) to see exactly what your server is returning to Google.
It turned out to be a convoluted application.cfm page.
It turns out it didn't work without cookies. Oh the joys of maintaining an old, rusty website! It's not the type of website (in terms of content and overall purpose) I would have expected to completely fail if cookies were disabled.
Being a newbie to CF, I mistakenly assumed that my simple "example.cfm" would only execute code on that page. I wasn't aware of the application.cfm. I checked for includes and saw nothing. That's when I hunted through the trace using IIS7's Failed Request Tracing capability. By comparing the Googlebot request with a normal browser request, I became certain that nothing strange was happening at that level. There wasn't any rogue module being loaded that was messing with my request.