Downloading a large file from Jetty (Ambari WebHDFS) is slow

I have a file of about 5 GB. Downloading it from HDFS using a Python client runs at about 12 MB/s, but my network can reach 500 MB/s, and smaller files work fine. I then reproduced the problem with curl.
Here is curl debug log:
curl -v -X GET http://x.x.x.x/file
> GET /webhdfs/v1/user/sohuvideo/online/srcFile/188/718/188718791/dat1_188718791_2020_4_11_17_4_172647e6e60.mp4?op=OPEN&user.name=sohuvideo&namenoderpcaddress=sotocyon&offset=0 HTTP/1.1
> User-Agent: curl/7.29.0
> Host: x.x.x.com:50075
> Accept: */*
>
< HTTP/1.1 200 OK
< Cache-Control: no-cache
< Expires: Tue, 21 Apr 2020 03:01:26 GMT
< Date: Tue, 21 Apr 2020 03:01:26 GMT
< Pragma: no-cache
< Expires: Tue, 21 Apr 2020 03:01:26 GMT
< Date: Tue, 21 Apr 2020 03:01:26 GMT
< Pragma: no-cache
< Content-Type: application/octet-stream
< Access-Control-Allow-Methods: GET
< Access-Control-Allow-Origin: *
< Transfer-Encoding: chunked
< Server: Jetty(6.1.26)
<
{ [data not shown]
100 119M 0 119M 0 0 13.0M 0 --:--:-- 0:00:09 --:--:-- 12.1M^C
After some digging, I found that if I attach a Connection: close header to the request, it finishes much faster.
curl -v -H "Connection: close" -X GET http://x.x.x.x/file
> GET /webhdfs/v1/user/sohuvideo/online/srcFile/188/718/188718791/dat1_188718791_2020_4_11_17_4_172647e6e60.mp4?op=OPEN&user.name=sohuvideo&namenoderpcaddress=sotocyon&offset=0 HTTP/1.1
> User-Agent: curl/7.29.0
> Host: x.x.x.com:50075
> Accept: */*
> Connection: close
>
< HTTP/1.1 200 OK
< Cache-Control: no-cache
< Expires: Tue, 21 Apr 2020 03:00:13 GMT
< Date: Tue, 21 Apr 2020 03:00:13 GMT
< Pragma: no-cache
< Expires: Tue, 21 Apr 2020 03:00:13 GMT
< Date: Tue, 21 Apr 2020 03:00:13 GMT
< Pragma: no-cache
< Content-Type: application/octet-stream
< Access-Control-Allow-Methods: GET
< Access-Control-Allow-Origin: *
< Connection: close
< Server: Jetty(6.1.26)
<
{ [data not shown]
100 4517M 0 4517M 0 0 138M 0 --:--:-- 0:00:32 --:--:-- 153M
* Closing connection 0
I think this is probably caused by the server using Transfer-Encoding: chunked when the file is large. The server chooses chunked encoding because the file size is not yet known when the transfer starts, and the chunked stream adds a lot of overhead. When given Connection: close, the server does not need Transfer-Encoding: chunked to signal the end of the stream; it just closes the connection instead.
Is there any way to fix this from the server side?
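Until then, the same workaround can be applied from the Python client side. A minimal sketch using the requests library, reusing the URL and query parameters from the curl example above (these are copied from my own transcript and would need adjusting for another cluster):
import shutil

import requests

# URL and parameters copied from the curl example above; adjust as needed.
url = ("http://x.x.x.com:50075/webhdfs/v1/user/sohuvideo/online/srcFile/"
       "188/718/188718791/dat1_188718791_2020_4_11_17_4_172647e6e60.mp4")
params = {"op": "OPEN", "user.name": "sohuvideo",
          "namenoderpcaddress": "sotocyon", "offset": 0}

# "Connection: close" makes the datanode delimit the body by closing the
# socket instead of using chunked transfer encoding, avoiding the slow path.
with requests.get(url, params=params, stream=True,
                  headers={"Connection": "close"}) as r:
    r.raise_for_status()
    with open("dat1.mp4", "wb") as f:
        shutil.copyfileobj(r.raw, f)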

Related

Getting GSSException: Defective token detected error while calling HDFS API on a kerberised cluster

I have a kerberised CDH v5.14 cluster with 3 nodes. I am trying to call the HDFS API using Python as below:
import kerberos
import requests

baseurl = "http://<host_name>:50070/webhdfs/v1/prod/?op=LISTSTATUS"
__, krb_context = kerberos.authGSSClientInit("HTTP/<host_name>")
#kerberos.authGSSClientStep(krb_context, "")
negotiate_details = kerberos.authGSSClientResponse(krb_context)
headers = {"Authorization": "Negotiate " + str(negotiate_details)}
r = requests.get(baseurl, headers=headers)
print r.status_code
The below error is returned:
GSSException: Defective token detected (Mechanism level: GSSHeader did not find the right tag)
HTTP ERROR 403
But the same works fine when I run it using curl:
curl -i --negotiate -u: http://<host_name>:50070/webhdfs/v1/prod/?op=LISTSTATUS
HTTP/1.1 401 Authentication required
Cache-Control: must-revalidate,no-cache,no-store
Date: Wed, 30 May 2018 02:50:04 GMT
Pragma: no-cache
Date: Wed, 30 May 2018 02:50:04 GMT
Pragma: no-cache
Content-Type: text/html; charset=iso-8859-1
X-FRAME-OPTIONS: SAMEORIGIN
WWW-Authenticate: Negotiate
Set-Cookie: hadoop.auth=; Path=/; HttpOnly
Content-Length: 1409

HTTP/1.1 200 OK
Cache-Control: no-cache
Expires: Wed, 30 May 2018 02:50:04 GMT
Date: Wed, 30 May 2018 02:50:04 GMT
Pragma: no-cache
Expires: Wed, 30 May 2018 02:50:04 GMT
Date: Wed, 30 May 2018 02:50:04 GMT
Pragma: no-cache
Content-Type: application/json
X-FRAME-OPTIONS: SAMEORIGIN
WWW-Authenticate: Negotiate YGYGCSqGSIb3EgECAgIAb1cwVaADAgEFoQMCAQ+iSTBHoAMCAReiQAQ+6Seu0SSYGmoqN4hdykSQ55ZcP+juBO/jk8/BGjoK5NCmdlBRFPMSbCZXvVjNHLg9iPACGvM8V0jqXTM5UfQ=
Set-Cookie: hadoop.auth="u=XXXX&p=XXXX#HOSTNAME&t=kerberos&e=1527684604664&s=tVsrEsDMBGV0To8hOPp8mLxyiSo="; Path=/; HttpOnly
Transfer-Encoding: chunked
and it gives the correct response. What am I missing here? Any help is appreciated.
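One likely cause, offered as a guess rather than a confirmed fix: kerberos.authGSSClientResponse() returns None until an authGSSClientStep() call has produced a token, so with that line commented out the request is sent as "Authorization: Negotiate None", which the server rejects as a defective token. A minimal sketch with the step restored (otherwise the same flow as in the question):
import kerberos
import requests

baseurl = "http://<host_name>:50070/webhdfs/v1/prod/?op=LISTSTATUS"

__, krb_context = kerberos.authGSSClientInit("HTTP/<host_name>")
# This step is what actually generates the base64 Negotiate token;
# without it, authGSSClientResponse() returns None.
kerberos.authGSSClientStep(krb_context, "")
negotiate_details = kerberos.authGSSClientResponse(krb_context)

headers = {"Authorization": "Negotiate " + negotiate_details}
r = requests.get(baseurl, headers=headers)
print(r.status_code)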

Can't adjust buffer to fit data

I am trying to make an HTTP request with the EtherCard library, then get the full response. Using the code from the examples, I'm only able to capture the headers, which are then abruptly cut off. The issue seems to be that I can't make the buffer big enough to hold the data, which is why it's cut off. But the data is only 292 bytes.
Here is another question I asked trying to understand what the example code was doing: What is happening in this C/Arduino code?
Here is the data I'm trying to GET: http://jsonplaceholder.typicode.com/posts/1
String response;
byte Ethernet::buffer[800]; // if i raise this to 1000, response will be blank

static void response_handler (byte status, word off, word len) {
  Serial.println("Response:");
  Ethernet::buffer[off + 400] = 0; // if i raise 400 much higher, response will be blank
  response = String((char*) Ethernet::buffer + off);
  Serial.println(response);
}
See the comments above for what I've attempted.
Here is the output from the code above:
Response:
HTTP/1.1 404 Not Found
Date: Fri, 20 Jan 2017 12:15:19 GMT
Content-Type: application/json; charset=utf-8
Content-Length: 2
Connection: close
Set-Cookie: __cfduid=d9714bd94284b999ceb0e87bc91705d501484914519; expires=Sat, 20-Jan-18 12:15:19 GMT; path=/; domain=.typicode.com; HttpOnly
X-Powered-By: Express
Vary: Origin, Accept-Encoding
Access-Control-Allow-Credentials: true
Cache-Control: no
As you can see, it's not the complete data, only some of the headers.
There are several problems here:
1) You get an HTTP 404 response, which means the resource was not found on the server, so you need to check your request.
2) You are cutting off the string at pos 400:
Ethernet::buffer[off + 400] = 0; // if i raise 400 much higher, response will be blank
That's why it stops after Cache-Control: no, which is exactly 400 bytes (bytes 0-399).
You probably want Ethernet::buffer[off + len] = 0;, but you also need to check that it is not out of bounds (i.e. larger than your buffer size; that's probably why you get a 'blank' response).
For example, a 404 response from that server looks like this:
HTTP/1.1 404 Not Found
Date: Mon, 23 Jan 2017 07:00:00 GMT
Content-Type: application/json; charset=utf-8
Content-Length: 2
Connection: keep-alive
x-powered-by: Express
Vary: Accept-Encoding
Access-Control-Allow-Credentials: true
Cache-Control: no-cache
Pragma: no-cache
Expires: -1
x-content-type-options: nosniff
Etag: W/"2-mZFLkyvTelC5g8XnyQrpOw"
Via: 1.1 vegur
CF-Cache-Status: MISS
Server: cloudflare-nginx
CF-RAY: 32595301c275445d-xxx
{}
and the 200 response headers (from a browser):
HTTP/1.1 200 OK
Date: Mon, 23 Jan 2017 07:00:00 GMT
Content-Type: application/json; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
x-powered-by: Express
Vary: Accept-Encoding
Access-Control-Allow-Credentials: true
Cache-Control: public, max-age=14400
Pragma: no-cache
Expires: Mon, 23 Jan 2017 10:59:01 GMT
x-content-type-options: nosniff
Etag: W/"124-yv65LoT2uMHrpn06wNpAcQ"
Via: 1.1 vegur
CF-Cache-Status: HIT
Server: cloudflare-nginx
CF-RAY: 32595c4ff39b445d-xxx
Content-Encoding: gzip
So your buffer needs to be big enough to hold both the response headers and the data.
3) In the 200 response we see two things: the transfer is chunked and gzipped (the latter only happens when there is an Accept-Encoding: gzip header in the request).
The easiest way to handle this is to send an HTTP/1.0 request instead of HTTP/1.1 (chunked transfer and gzip are not allowed/available in HTTP/1.0).
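To make the HTTP/1.0 suggestion concrete outside of the Arduino code, here is a hedged Python sketch at the socket level (the typicode server's behavior may have changed since 2017, e.g. it may now redirect to HTTPS): with HTTP/1.0 the body cannot arrive chunked or gzipped, and it ends when the server closes the connection.
import socket

def fetch_http10(host, path):
    # A bare HTTP/1.0 request: no chunked transfer, no gzip; the body
    # is delimited by the server closing the connection.
    sock = socket.create_connection((host, 80))
    request = ("GET %s HTTP/1.0\r\n"
               "Host: %s\r\n"
               "\r\n" % (path, host))
    sock.sendall(request.encode("ascii"))
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:  # connection closed: end of response
            break
        chunks.append(data)
    sock.close()
    return b"".join(chunks)

# Prints the raw status line, headers, and body in one pass.
print(fetch_http10("jsonplaceholder.typicode.com", "/posts/1").decode())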

fail2ban regex rule

Just for testing, I would like to use fail2ban to block all traffic to my website that does not come from an Android browser.
This is the string in the log file:
GET http://www.aaaaa.com/video/09_12_2014_spot_app.mp4 - ORIGINAL_DST/171.171.171.171 video/mp4
[
User-Agent: stagefright/1.2 (Linux;Android 5.0)
Cookie: _gat=1; _ga=GA1.2.909922659.1455111791
Range: bytes=705201-
Connection: Keep-Alive
Accept-Encoding: gzip
Host: www.aaaaa.com
]
[
HTTP/1.1 206 Partial Content
Date: Thu, 26 May 2016 15:27:16 GMT
Server: Apache/2.2.15 (CentOS)
Last-Modified: Tue, 09 Dec 2014 19:55:17 GMT
ETag: "2b739f-ec2b1-509cdec1610e2"
Accept-Ranges: bytes
Content-Length: 262144
Content-Range: bytes 705201-967344/967345
Connection: close
Content-Type: video/mp4
]
Any help? Thank you in advance!
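Not a complete filter, but a sketch of the matching idea: fail2ban failregexes are Python regular expressions, so a negative lookahead can flag User-Agent lines that do not mention Android. The pattern below is hypothetical and leaves out fail2ban's required <HOST> capture and the multiline handling this proxy-style log would need:
import re

# Hypothetical failregex core: match User-Agent lines lacking "Android".
pattern = re.compile(r"^User-Agent: (?!.*Android).*$", re.MULTILINE)

android = "User-Agent: stagefright/1.2 (Linux;Android 5.0)"
desktop = "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)"

print(bool(pattern.search(android)))  # False: Android client, not banned
print(bool(pattern.search(desktop)))  # True: would be banned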

cURL connection closes on HTTP status 200 (OK)

I'm trying to connect to a server and get a response with curl, but when the data starts to transfer, curl drops the connection. Can anyone tell me what's going wrong?
< HTTP/1.1 200 OK
* Server nginx is not blacklisted
< Server: nginx
< Date: Sat, 15 Feb 2014 02:08:27 GMT
< Content-Type: application/json; charset=utf-8
< Transfer-Encoding: chunked
< Connection: keep-alive
< Status: 200 OK
< X-UA-Compatible: IE=Edge,chrome=1
< Cache-Control: max-age=0, private, must-revalidate
< ETag: "e283c3fc75aa2172d77a717ecbb49b41"
< Set-Cookie: remember_token=BAhbB2kCyVsiRTUxMmI2MzQwOTk5MGU2ZDU3NmMxMmRjY2UxZTI0ODgzNzJmOGRlNzFlNzZjMWExZWM3NTA3OWJjY2E0NmQ5Mjc%3D--1e016eb43ab4803ac65211ddd383ddeeaf9b53f2; path=/; expires=Wed, 15-Feb-2034 02:08:26 GMT
< Set-Cookie: _gameserver2250_session=BAh7BiIPc2Vzc2lvbl9pZCIlYzIxNGNlNTExNjYwNDg0MTU0YWJkOGQ2ZWI1ZDI3ZTk%3D--f9c25e0cd8d94283b01056e9aeff46a1feb48d6b; path=/; HttpOnly
< X-Runtime: 1.249006
< Content-Encoding: gzip
<
* Recv failure: Connection reset by peer
* Closing connection 0

writing proper "HEAD" and "GET" requests in winsock c++

I was writing code to download files over HTTP using Winsock in C++, and to get the file details I sent a "HEAD" request.
(This is what I actually did:)
HEAD /files/ODBC%20Programming%20in%20C%2B%2B.pdf HTTP/1.0
Host: devmentor-unittest.googlecode.com
The response was:
HTTP/1.0 404 Not Found
Content-Type: text/html; charset=UTF-8
Set-Cookie: PREF=ID=feeed8106df5e5f1:TM=1370157208:LM=1370157208:S=10bN4nrXqkcCDN5n; expires=Tue, 02-Jun-2015 07:13:28 GMT; path=/; domain=devmentor-unittest.googlecode.com
X-Content-Type-Options: nosniff
Date: Sun, 02 Jun 2013 07:13:28 GMT
Server: codesite_downloads
Content-Length: 974
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
But if I do:
GET /files/ODBC%20Programming%20in%20C%2B%2B.pdf HTTP/1.0
Host: devmentor-unittest.googlecode.com
The file successfully gets downloaded.
Then, after the download, if I fire the HEAD request again, it also brings up the following:
HTTP/1.0 200 OK
Content-Length: 320381
Content-Type: application/pdf
Content-Disposition: attachment; filename="ODBC Programming in C++.pdf"
Accept-Ranges: bytes
Date: Sun, 02 Jun 2013 05:47:11 GMT
Last-Modified: Sun, 11 Nov 2007 03:17:59 GMT
Expires: Sun, 09 Jun 2013 05:47:11 GMT
Cache-Control: public, max-age=604800
Server: DFE/largefile
//something like this.....
Question: why does "HEAD" return a false "not found" error at first, while the file downloads fine using "GET", and why does "HEAD" also return the details I need after the download? Where have I gone wrong?
The file I am trying to download here is "http://devmentor-unittest.googlecode.com/files/ODBC%20Programming%20in%20C%2B%2B.pdf" (just for example)
The problem is not on your end. Google Code simply does not implement HEAD correctly. This was reported 5 years ago and is still an open issue:
Issue 660: support HTTP HEAD method for file download urls
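For completeness, the mismatch is easy to observe from Python; a hedged sketch (Google Code has since shut down, so this exact URL no longer resolves; substitute any server you want to test):
import requests

url = ("http://devmentor-unittest.googlecode.com/files/"
       "ODBC%20Programming%20in%20C%2B%2B.pdf")

# Per the HTTP spec, HEAD should return the same status and headers as
# GET, just without a body; differing status codes reveal a broken server.
head = requests.head(url)
get = requests.get(url, stream=True)
print(head.status_code, get.status_code)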