I am trying to implement web service and web client applications using Ruby on Rails 3, and I am considering using SSL. I would like to know: how "heavy" is it for servers to handle a lot of HTTPS connections instead of HTTP? What is the difference in response time and overall performance?
The cost of the SSL/TLS handshake (which accounts for most of the overall "slowdown" SSL/TLS adds) is nowadays much less than the cost of TCP connection establishment and the other actions associated with session establishment (logging, user lookup, etc.). And if you worry about speed and want to save every nanosecond, there are hardware SSL accelerators you can install in your server.
Going with HTTPS is several times slower, but most of the time that's not what is actually going to slow your app down. Especially if you're running on Rails, your performance scaling is going to be bottlenecked elsewhere in the system. If you are passing secrets of any kind over the wire (including a shared session cookie), SSL is the only way to go, and you probably won't notice the cost. If you do scale up to the point where you start to see a performance hit from encryption, there are hardware acceleration appliances out there that help tremendously. However, Rails is likely to fall over long before that point.
I have a client I developed a Rails app for. The app relies on his customers uploading various images, files, and PDFs, with sizes ranging from 1 MB to 100 MB.
He's been telling me that many of his customers are complaining about slow and unstable upload speeds.
I use a direct connection to Amazon S3 to handle the uploads. I explained to him that there are factors affecting upload speed that are out of my control.
But he insists that there is something we can do to improve it.
I'm running out of ideas and expertise here. Does anyone have a solution?
On the surface, there are two answers: no, of course there's nothing you can do, the Internet is a best-effort transport, etc. ... and no, there really shouldn't be a problem, because S3 uploads perform quite well.
There is an option worth considering, though.
You can deploy a global network of proxy servers in front of S3 and use geographic DNS to route those customers to their nearest proxy. Then install high-speed, low-latency optical circuits from the proxies back to S3, reducing the amount of "unknown" in the path, as well as the round-trip time and packet-loss potential between the browser and the chosen proxy node at the edge of your network, improving throughput.
I hope the previous paragraph is amusing on first reading, since it sounds like a preposterously grandiose plan for improving uploads to S3... but of course, I'm referring to CloudFront.
You don't actually have to use it for downloads; you can, if you want, just use it for uploads.
Your users can now benefit from accelerated content uploads. After you enable the additional HTTP methods for your application's distribution, PUT and POST operations will be sent to the origin (e.g. Amazon S3) via the CloudFront edge location, improving efficiency, reducing latency, and allowing the application to benefit from the monitored, persistent connections that CloudFront maintains from the edge locations to the origin servers.
https://aws.amazon.com/blogs/aws/amazon-cloudfront-content-uploads-post-put-other-methods/
To illustrate that the benefit here does have a solid theoretical basis...
Back in the day when we still used telnet, when T1s were fast Internet connections and 33.6kbps was a good modem, I discovered that I had far better responsiveness from home, making a telnet connection to a distant system, if I first made a telnet connection to a server immediately on the other side of the modem link, then made a telnet connection to the distant node from within that server.
A direct telnet connection to the distant system followed exactly the same path, through all the same routers and circuits, and yet, it was so sluggish as to be unusable. Why the stark difference, and what caused the substantial improvement?
The explanation was that making the intermediate connection to the server meant there were two independent TCP connections, with only their payload tied together: me to the server... and the server to the distant system. Both connections were bad in their own way -- high latency on my modem link, and congestion/packet loss on the distant link (which had much lower round-trip times, but was overloaded with traffic). The direct connection meant I had a TCP connection that had to recover from packet loss while dealing with excessive latency. Making the intermediate connection meant that the recovery from the packet loss was not further impaired by the additional latency added by my modem connection, because the packet loss was handled only on the 2nd leg of the connection.
Using CloudFront in front of S3 promises to solve the same sort of problem in reverse -- improving the responsiveness, and therefore the throughput, of a connection of unknown quality by splitting the TCP connection into two independent connections, at the user's nearest CloudFront edge.
I have a large number of machines (thousands and more), each of which performs an HTTP request to a Jetty server every X seconds to notify that it is alive. For what values of X should I use persistent HTTP connections (which limits the number of monitored machines to the number of concurrent connections), and for what values of X should the client re-establish a TCP connection each time (which in theory would allow monitoring more machines with the same Jetty server)?
How would the answer change for HTTPS connections? (Assuming CPU is not a constraint)
This question ignores scaling-out with multiple Jetty web servers on purpose.
Update: Basically, the question reduces to: what is the smallest recommended value of lowResourcesMaxIdleTime?
I would say that this is less a Jetty scaling issue and more a network scaling issue, in which case "it depends" on your network infrastructure. Only you really know how your network is laid out and what sort of latencies are involved, so only you can come up with a value of X.
From an overhead perspective, persistent HTTP connections will of course have some minor effect (well, I say minor, but it depends on your network), and HTTPS will have a larger impact, but only from a volume-of-traffic perspective, since you are assuming CPU is not a constraint.
So from a Jetty perspective, it really doesn't need to be involved in the question. You ultimately seem to be asking for help optimizing bytes of traffic on the wire, so really you are looking for the best protocol at this point. Since with HTTP you have to send headers with each request, you may be well served looking at something like SPDY or WebSocket, which will give you persistent connections optimized for low round-trip network overhead. But... they seem sort of overkill for a heartbeat. :)
How about just making them send requests at different times? When the first machine checks in, pick a time to return in the response as that machine's next heartbeat time (and keep the id/time on the Jetty server); when the second machine checks in, pick another time for it.
This way, each machine performs its heartbeat request at a different time, so there is no concurrency issue.
You can also use a random time for the first heartbeat, in case all machines start up at the same time.
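The server-side bookkeeping this describes can be sketched in a few lines. This is only an illustration of the staggering idea, not any real Jetty API; the class and method names are made up for the example:

```ruby
# Hands each machine its next heartbeat delay, spreading first check-ins
# evenly across the interval so machines that boot together drift apart.
class HeartbeatScheduler
  def initialize(interval:, slots:)
    @interval  = interval.to_f  # X, the heartbeat period in seconds
    @slots     = slots          # how many distinct phase offsets to use
    @next_slot = 0
    @offsets   = {}             # machine_id => assigned phase offset
  end

  # Returns how long this machine should wait before its next heartbeat.
  # A machine's first check-in gets a one-time phase offset; later
  # check-ins just get the plain interval.
  def next_delay_for(machine_id)
    return @interval if @offsets.key?(machine_id)

    slot = @next_slot
    @next_slot = (@next_slot + 1) % @slots
    @offsets[machine_id] = slot * (@interval / @slots)
    @interval + @offsets[machine_id]
  end
end
```

With `interval: 60, slots: 4`, the first four machines get initial delays of 60, 75, 90, and 105 seconds, after which every machine settles into a plain 60-second cadence on its own phase.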
When deploying web applications, a common approach is to implement the actual application logic as a series of services and expose them via HTTP, then put some front ends between the services and the users. Typically those front ends will take care of things like SSL, session data, load balancing, routing requests, possibly caching, and so on.
AFAIK, that usually means that every time a new request comes in, one or more new requests must be made to one or more of the backends, each with its own TCP handshake, HTTP overhead, etc.
Don't those additional connections add measurable latency and/or a performance hit?
If so, what techniques are in common practice to get the best performance from those kind of deployments?
Latency on a local connection will be minimal: probably single-digit milliseconds at most. There will be some occupancy overhead for the extra HTTP sessions, but then it's spread out among the different apps.
The advantage of the approach you describe is that it distributes the load among different apps, so you can have lots of front-end bits doing the heavy lifting like SSL and fewer backend apps handling more sessions. And you can pick and mix the apps you need.
A single monolithic app will probably be a bit faster, until it runs out of capacity, at which point you have a problem because it's hard to scale up.
I've got a short-lived client process that talks to a server over SSL. The process is invoked frequently and only runs for a short time (typically less than 1 second). It is intended to be used as part of a shell script that performs larger tasks, so it may be invoked pretty frequently.
The SSL handshaking it performs each time it starts up is showing up as a significant performance bottleneck in my tests and I'd like to reduce this if possible.
One thing that comes to mind is taking the session id and storing it somewhere (kind of like a cookie), then re-using it on the next invocation. However, this makes me uneasy, as I think there could be some security concerns around doing it.
So, I've got a couple of questions:
Is this a bad idea?
Is this even possible using OpenSSL?
Are there any better ways to speed up the SSL handshaking process?
After the handshake, you can get the SSL session information from your connection with SSL_get_session(). You can then use i2d_SSL_SESSION() to serialise it into a form that can be written to disk.
When you next want to connect to the same server, you can load the session information from disk, then unserialise it with d2i_SSL_SESSION() and use SSL_set_session() to set it (prior to SSL_connect()).
The on-disk SSL session should be readable only by the user that the tool runs as, and stale sessions should be overwritten and removed frequently.
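For what it's worth, the same flow is reachable from Ruby's OpenSSL bindings, which wrap the exact C calls named above (`Session#to_der` is `i2d_SSL_SESSION`, `Session.new` on a DER string is `d2i_SSL_SESSION`, and `SSLSocket#session=` is `SSL_set_session`). A self-contained sketch, with a throwaway self-signed server standing in for the real one and TLS 1.2 pinned so the session resumes in an abbreviated handshake:

```ruby
require "openssl"
require "socket"

# Throwaway self-signed server, only so the example runs standalone.
key  = OpenSSL::PKey::RSA.new(2048)
cert = OpenSSL::X509::Certificate.new
cert.version    = 2
cert.serial     = 1
cert.subject    = cert.issuer = OpenSSL::X509::Name.parse("/CN=localhost")
cert.public_key = key.public_key
cert.not_before = Time.now
cert.not_after  = Time.now + 3600
cert.sign(key, OpenSSL::Digest.new("SHA256"))

server_ctx = OpenSSL::SSL::SSLContext.new
server_ctx.cert = cert
server_ctx.key  = key
server_ctx.session_cache_mode = OpenSSL::SSL::SSLContext::SESSION_CACHE_SERVER

tcp_server = TCPServer.new("127.0.0.1", 0)
port       = tcp_server.addr[1]
ssl_server = OpenSSL::SSL::SSLServer.new(tcp_server, server_ctx)
Thread.new { loop { (ssl_server.accept rescue break).close } }

client_ctx = OpenSSL::SSL::SSLContext.new
client_ctx.verify_mode = OpenSSL::SSL::VERIFY_NONE  # self-signed cert
client_ctx.max_version = OpenSSL::SSL::TLS1_2_VERSION

# First invocation: full handshake, then serialise the session to DER.
# This DER blob is what you would write to a file readable only by the
# user the tool runs as.
ssl = OpenSSL::SSL::SSLSocket.new(TCPSocket.new("127.0.0.1", port), client_ctx)
ssl.sync_close = true
ssl.connect
der = ssl.session.to_der
ssl.close

# Second invocation: deserialise and set the session before connecting,
# so the client offers it for resumption.
ssl2 = OpenSSL::SSL::SSLSocket.new(TCPSocket.new("127.0.0.1", port), client_ctx)
ssl2.sync_close = true
ssl2.session = OpenSSL::SSL::Session.new(der)
ssl2.connect
reused = ssl2.session_reused?  # abbreviated handshake if true
ssl2.close
```

The same on-disk hygiene applies: restrict the session file's permissions and expire stale sessions.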
You should be able to use a session cache securely (which OpenSSL supports); see the documentation on SSL_CTX_set_session_cache_mode, SSL_set_session and SSL_session_reused for more information on how this is achieved.
Could you perhaps use a persistent connection, so the setup is a one-time cost?
You could abstract away the connection logic so your client code still thinks it's doing a connect/process/disconnect cycle.
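A minimal sketch of that abstraction, using a plain TCP socket as a stand-in for the SSL connection (the class and method names are illustrative, not any particular library's API):

```ruby
require "socket"

# Presents a connect/process/disconnect interface to callers while
# quietly keeping one socket alive underneath, so repeated "connections"
# pay the setup cost only once.
class PersistentClient
  def initialize(host, port)
    @host, @port = host, port
    @socket = nil
  end

  # Reuses the existing socket if one is already open.
  def connect
    @socket ||= TCPSocket.new(@host, @port)
  end

  # Sends one line and returns the server's one-line reply.
  def process(line)
    connect
    @socket.puts(line)
    @socket.gets.chomp
  end

  # A no-op from the caller's point of view: the socket stays open so
  # the next connect/process cycle is free.
  def disconnect
    # intentionally keep @socket alive
  end

  # Actually tears the connection down, e.g. at end of script.
  def really_close
    @socket&.close
    @socket = nil
  end
end
```

For the SSL case, `TCPSocket.new` would be replaced with the SSL handshake, and `really_close` called once when the whole batch of work is done.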
Interestingly enough, I encountered an issue with OpenSSL handshakes just today. The implementation of RAND_poll on Windows uses the Windows heap APIs as a source of random entropy.
Unfortunately, due to a "bug fix" in Windows 7 (and Server 2008), the heap enumeration APIs (which are debugging APIs, after all) can now take over a second per call once the heap is full of allocations. This means that both SSL connects and accepts can take anywhere from one second to more than a few minutes.
The ticket contains some good suggestions on how to patch OpenSSL to achieve far, FAR faster handshakes.
Inside large companies, is it standard practice to use SSL (e.g. HTTPS) for running corporate apps over the LAN? I am thinking of ERP systems, SFA systems, HR systems, etc. But I am also thinking of SOA: web service providers and consumers.
In other words, is there any concern that something on the LAN could be sniffing plaintext info going around? If not SSL, how is this security threat dealt with?
What's your experience?
Inside large companies, is it standard practice to use SSL (e.g. HTTPS) for running corporate apps over the LAN?
Generally, SSL for LAN-only internal applications is not common practice. Historically, the LAN has been viewed as a "trusted" network, so SSL for LAN apps hasn't been a priority.
Also, connection to internal application servers is usually via an authenticated proxy, which in itself mitigates some of the risk.
This is slowly changing, however, as organisations increasingly treat the LAN with less trust.
If not SSL, how is this security threat dealt with?
Most enterprises do monitor what is attached to their LAN and record events when new devices are added.
If the device doesn't correspond to something planned (e.g. a new desktop or printer), then it is investigated.
Unauthorised devices are seen as a much greater risk (than not using SSL) because they pose additional threats, like introducing a virus, an external network connection, or some other kind of attack vector.
It really depends on what you consider a "large company". The company I work for has over 50,000 employees, so our corporate network is really not a great deal more trustworthy than the Internet.
We do use SSL on corporate Intranet web applications. We have our own internal CA certificate installed on all corporate PCs, so we can issue our own internal SSL certificates in-house.
Unfortunately, no, it's not standard practice.
What's done and what should be done are not necessarily the same here...
Without a doubt, any system with confidential information should be secured, especially on a LAN, as that's where most attacks originate (disgruntled employees, etc.).
Unfortunately, it's often not the case.
Yep, pretty standard practice in a lot of places I've seen.
I think the reasons why should be obvious:
Extra security against common attacks
Pretty much no reason not to
Inside large companies, is it standard practice to use SSL (e.g. HTTPS) for running corporate apps over the LAN? I am thinking of ERP systems, SFA systems, HR systems, etc. But I am also thinking of SOA: web service providers and consumers.
I would feel very uncomfortable if such apps weren't secured. In many places I've worked, they were. In some others they weren't, which I consider unprofessional.
In other words, is there any concern that something on the LAN could be sniffing plaintext info going around?
For me, the answer is obviously YES.
If not SSL, how is this security threat dealt with?
One-time passwords (with RSA SecurID).
I wonder if one of the problems is that going to SSL always seems just a bit more complicated than it should be. If one could enable SSL with a single switch, without having to worry about certificates, perhaps at least the encryption part could become the default.
Obviously you wouldn't get endpoint authentication without taking the extra step of setting up certificates, but then at least there would be truly no reason to go without encryption.