How to make a superfast webserver for "check for updates"?

What is the best approach for getting a fast response when a client application asks a web server to "check for updates"?
Skype, for example, takes about 1 second to answer. How can I achieve the same?

I assume you are running one or more web servers and one or more back-end servers (with business logic).
One possible approach that I have seen: keep a change counter in each web server, and when the back-end state changes, have the business logic notify all web servers of the new counter value.
Each web browser regularly polls the web server for the counter value and compares it to the previous value. If old_value != new_value, the web browser asks the web server for the new content.
This allows the regular polling to be super-fast (around 1 ms) and cheap. Only when something has really changed does the browser ask for the more resource-expensive content generation.
The other option would be to use some asynchronous HTTP magic (CometD), but the approach outlined above is simpler, more understandable and easier to troubleshoot.
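A minimal sketch of the counter idea in Python, with everything in one process for illustration. The class and method names (ChangeCounter, PollingClient, poll) are hypothetical; in a real deployment set() would be triggered by the back-end notification and get() would sit behind a tiny HTTP endpoint.

```python
import threading


class ChangeCounter:
    """In-memory counter held by each web server (hypothetical helper)."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def set(self, new_value: int) -> None:
        # Called when the back end notifies this web server of a change.
        with self._lock:
            self._value = new_value

    def get(self) -> int:
        # The cheap polling endpoint would return only this integer.
        with self._lock:
            return self._value


class PollingClient:
    """Client side: remembers the last counter value it saw."""

    def __init__(self, counter: ChangeCounter) -> None:
        self._counter = counter
        self._last_seen = counter.get()

    def poll(self) -> bool:
        # Returns True only when the counter moved, i.e. when the
        # expensive "fetch new content" request is worth making.
        current = self._counter.get()
        changed = current != self._last_seen
        self._last_seen = current
        return changed
```

The client only pays for the expensive request when poll() returns True; every other poll costs the server one integer read.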

The simple approach is to just have a flat text or XML file on the server, containing the details of the most recent version. The client app fetches it via HTTP GET, compares the version, and reacts accordingly. The HTTP server is simply returning a small static file, which is what HTTP servers are designed to do. You should be able to handle hundreds of requests per second this way.
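As a rough illustration of the client side, here is a Python sketch that assumes the file contains nothing but a dotted version number such as 1.10.2. The URL and the function names are made up for the example.

```python
import urllib.request
from typing import Tuple

# Hypothetical location of the flat version file; replace with your own.
VERSION_URL = "http://example.com/latest-version.txt"


def parse_version(text: str) -> Tuple[int, ...]:
    # "1.10.2\n" -> (1, 10, 2). Comparing tuples of ints avoids the
    # string-comparison trap where "1.10" sorts before "1.9".
    return tuple(int(part) for part in text.strip().split("."))


def update_available(installed: str, latest: str) -> bool:
    # True when the server's version is newer than the installed one.
    return parse_version(latest) > parse_version(installed)


def fetch_latest(url: str = VERSION_URL, timeout: float = 5.0) -> str:
    # One small GET; the server only has to return a static file.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

A client would call update_available(installed_version, fetch_latest()) and prompt the user only when it returns True.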

Use a large distributed system, depending on the number of users. Put your web server(s) closer to clients to avoid long latencies. Use cluster and load balancing software to enhance performance. Use reverse proxies to cache data.
But is it really important that a "check for updates" is that fast? You can also check in a background thread. I would improve performance for other tasks instead.

Related

What is the modern programming standard for synchronizing data between a web service and a client?

The question is a little general, so to help narrow the focus, I'll share my current setup that is motivating this question. I have a LAMP web service running a RESTful API. We have two client implementations: one browser-based javascript client (local storage store) and one iOS-based client (core data store). Obviously these two clients store data very differently, but the data itself needs to be kept in two-way sync with the remote server as often as possible.
Currently, our "sync" process is a little dumb (as in, non-smart). Conceptually, it looks like:
Client periodically asks the server for ALL of the most-recent data.
Server sends down the remote data, which overwrites the current set of local data in the client's store.
Any local creates/updates/deletes after this point are treated as gold, and immediately sent to the server.
The data itself is stored relationally, and updated occasionally by client users. The clients in my specific case don't care too much about the relationships themselves (which is why we can get away with local storage in the browser client for now).
Obviously this isn't true synchronization. I want to move to a system where, conceptually, a "diff" of the most recent changes is sent to the server periodically, and the server sends back a "diff" of the most recent changes it knows about. It seems very difficult to get to this point, but maybe I just don't understand the problem very well.
REST feels like a good start, but REST only talks about the way two data stores talk to each other, not how the data itself is synchronized between them. (This sync process is left up to the implementer of each store.) What is the best way to implement this process? Is there a modern set of programming design patterns that apply to inform a specific solution to this problem? I'm mostly interested in a general (technology agnostic) approach if possible... but specific frameworks would be useful to look at too, if they exist.

Multi-master replication is always (and will always be) difficult and bespoke, because how conflicts are handled will be specific to your application.
IMO a more robust approach is to use master-slave replication, with your web service as the master and the clients as slaves. To keep the clients in sync, use an archived Atom feed of the changes (see event sourcing) as per RFC 5005. This is the closest you'll get to a modern standard for this type of replication, and it's RESTful.
When the clients are online, they do not update their replica directly, instead they send commands to the server and have their replica updated via the atom feed.
When the clients are offline things get difficult. Each client will need to have a model of how your web service behaves. It will also need an offline copy of its replica, which should be copied on write from the online replica (the online replica is the one that is updated by the Atom feed). When the client executes commands that modify the data, it should store the command (for later replay against the web service) and the expected result (for verification during replay), and update the offline replica.
When the client goes back online, it should replay the commands, compare the result with the expected result and notify the client of any variances. How these variances are handled will vary based on your application. The offline replica can then be discarded.
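To make the replay step concrete, here is a small Python sketch. PendingCommand, replay and the send_to_server callable are hypothetical names, and what counts as a "result" is left entirely to your application.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class PendingCommand:
    name: str       # what the user did while offline, e.g. "rename"
    args: tuple     # arguments to resend to the web service
    expected: Any   # result the client's offline model predicted


def replay(
    pending: List[PendingCommand],
    send_to_server: Callable[[str, tuple], Any],
) -> List[Tuple[PendingCommand, Any]]:
    """Replay queued commands in order and collect any variances."""
    variances = []
    for command in pending:
        actual = send_to_server(command.name, command.args)
        if actual != command.expected:
            # The server disagreed with the offline prediction; the
            # application decides how to present or resolve this.
            variances.append((command, actual))
    return variances
```

An empty list back from replay() means the offline predictions held and the offline replica can simply be thrown away.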

CouchDB replication works over HTTP and does what you are looking to do. Once databases are synced on either end it will send diffs for adds/updates/deletes.
Couch can do this with other Couch machines or with a mobile framework like TouchDB.
https://github.com/couchbaselabs/TouchDB-iOS
I've done a fair amount of it, but you can always set up CouchDB on one machine, set up TouchDB on a mobile device and then watch the HTTP traffic go back and forth to get an idea of how they do it.
Or read this: http://guide.couchdb.org/draft/replication.html
Maybe something from the link above will help you get an idea of how to do your own diffs for your REST service. (Since they both work over HTTP, I thought it could be useful.)

You may want to look into the Dropbox Datastore API:
https://www.dropbox.com/developers/datastore
It sounds like it might be a very good fit for your purposes. They have iOS and javascript clients.

Lately, I've been interested in Meteor.
The platform sets up Mongo on the server and minimongo in the browser. The client subscribes to some data and when that data changes, the platform automatically sends down the new data to the client.
It's a clever solution to the syncing problem, and it solves several other problems as well. It will be interesting to see if more platforms do this in the future.

Why are RESTful Applications easier to scale

I always read that one reason to choose a RESTful architecture is (among others) better scalability for web applications with a high load.
Why is that? One reason I can think of is that because of the defined resources, which are the same for every client, caching is made easier. After the first request, subsequent requests are served from a memcached instance, which also scales well horizontally.
But couldn't you also accomplish this with a traditional approach where actions are encoded in the URL, e.g. booking.php/userid=123&travelid=456&foobar=789?

A part of REST is indeed the URL part (it's the R in REST) but the S is more important for scaling: state.
The server end of REST is stateless, which means that the server doesn't have to store anything across requests. This means that there doesn't have to be (much) communication between servers, making it horizontally scalable.
Of course, there's a small bonus in the R (representational) in that a load balancer can easily route the request to the right server if you have nice URLs, and GET could go to a slave while POSTs go to masters.

I think what Tom said is very accurate; however, another problem with scalability is the barrier to change upon scaling. One of the biggest tenets of REST as it was intended is hypermedia. Basically, the server owns the paths and passes them to the client at runtime. This allows you to change your code without breaking existing clients. However, you will find most implementations of REST to simply be RPC hiding behind the guise of REST, which is not scalable.

"Scalable" or "web scale" is one of the most abused terms when it comes to the web, the cloud and REST, and mainly used to convince management to get their support for moving their development team on board the REST train.
It is a buzzword that holds no value. If you search the web for "REST scalability" you'll find a lot of people parroting each other without any concrete evidence.
A REST service is exactly as scalable as a service exposed over a SOAP interface. Both are just HTTP interfaces to an application service. How well this service actually scales depends entirely on how it was implemented. It's possible to write a service that cannot scale at all in both REST and SOAP.
Yes, you can do things with SOAP that make it scale worse, like relying on state and sessions. SOAP out of the box does not do this. Relying on state requires you to use a smarter load balancer, which you want anyway if you're really concerned with any form of scaling.
One thing that REST allows that SOAP doesn't, and that some other answers here address, is caching cacheable responses through an HTTP caching proxy or at the client side. This may make a REST service somewhat more lightly loaded than a SOAP service when a lot of operations' responses are cacheable. All this means is that fewer requests end up in your service.
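For illustration, this is roughly what a cacheable response looks like from the server side, as a Python sketch of a conditional GET using an ETag. It is simplified (a real If-None-Match header may carry several tags or "*"), and make_etag and respond are hypothetical helpers, not part of any framework.

```python
import hashlib
from typing import Optional, Tuple


def make_etag(body: bytes) -> str:
    # A validator derived from the response body, quoted as ETags are.
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, bytes]:
    """Return (status, payload) for a GET on a cacheable resource."""
    if if_none_match == make_etag(body):
        # 304 Not Modified: the cache or client reuses its stored copy,
        # so the full body never has to be sent again.
        return 304, b""
    return 200, body
```

A caching proxy in front of the service can answer repeat requests itself or revalidate them with this cheap 304 path, which is where the reduced load comes from.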

The main reason for saying a REST application is scalable is that it is built on HTTP, and HTTP is stateless. Stateless means nothing is shared between requests, so any request can go to any server in a load-balanced cluster. Nothing forces a given user's requests to go to a given server. Where a request needs per-user context, it can be carried in a token.
Because of this statelessness, REST applications are very easy to scale. But if you want high throughput (the number of requests handled per second) on each server, you should remove blocking operations from the application. Some tips:
Make each REST resource a small entity. Don't read data from joins across many tables.
Read data from nearby databases.
Use caches (Redis) instead of databases (this saves disk I/O).
Always keep data sources as close as possible, because blocking calls leave server resources (CPU) idle, and no other request can use those resources while they are idle.
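The caching tip can be sketched as a cache-aside lookup. In this Python example a plain dict stands in for Redis so that it runs without any server; CacheAside and load_from_db are hypothetical names, and a real cache would also need expiry and invalidation.

```python
from typing import Callable, Dict


class CacheAside:
    """Check the cache first; fall back to the database on a miss."""

    def __init__(self, load_from_db: Callable[[str], str]) -> None:
        self._load_from_db = load_from_db
        self._cache: Dict[str, str] = {}  # stand-in for Redis
        self.db_reads = 0                 # counts the slow path

    def get(self, key: str) -> str:
        if key in self._cache:
            return self._cache[key]       # fast path, no disk I/O
        self.db_reads += 1
        value = self._load_from_db(key)   # slow, blocking call
        self._cache[key] = value
        return value
```

Only the first request for a key blocks on the database; later requests for the same key are answered from memory.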

A reason (perhaps not the reason) is that RESTful services are sessionless. This means you can easily use a load balancer to direct requests to various web servers without having to replicate session state among all of your web servers or making sure all requests from a single session go to the same web server.

Django node.js socket.io

I am trying to make a real-time messaging application. There will be two distinct servers (node.js and Django). When a user sends a message to another user, the message will be stored in the database, and then node.js will send the receiver a notification like "You have a new message!". For that I am planning to call a URL that node.js serves, so node.js and Django will interact with each other. And what is the best way to send a message to a specific client? (I keep clients with their IDs in an associative array.)
What do you think about that? Is it efficient, or do you suggest a better way to do this?

Now that I understand more about what you're trying to do, here is my answer. Keep in mind that this only reflects my opinion, and I bet that many others would argue about it.
It all depends on how much traffic you expect to have in your application. If it's not a high-traffic application, then run-time efficiency is insignificant compared to development efficiency, so choose the technology you feel most comfortable with.
If, though, you do aim for a high-traffic application, then I believe that this setup is not a good one.
First of all, while HTTP-based communication between servers might seem comfortable, you are dealing with the overhead of HTTP on top of TCP (since HTTP is based on TCP), so plain TCP sockets scale better. On the other hand, if you write the socket server in Python then you can run it in the same process as Django and just use it as an object from Django (you're entering the realm of threads here). But that's problematic if you have several web instances; again, it depends on how much traffic you expect.
As for your choice for implementing the messaging server, I've never tested node.js, but I believe that in benchmark tests it won't compare to something written in Erlang or Java NIO. For example: JAVA AIO (NIO.2) VS NODEJS

Ideal way/architecture to deliver large data over Web Services

We are trying to design 6 web services that will serve another client component. The client component requires data from the web services we are implementing.
The problem is that we are not implementing just 1 web service. The client component hits one web service, which initiates a series of 5 more web services. These gather data from their respective data stores and finally provide it back to the original web service, which then delivers the data to the client component.
So, if the requested data becomes huge, this will be a serious problem for our internal communication channel.
So, what do you guys suggest? What can be done to avoid overloading the communication channel between the internal web services while still delivering the data to the client component?
Update 1
Using 5 web services, where each one knows only about the next one, is a business requirement. In effect, the "small services" of 5 companies are being integrated.
We use Java and Axis2

We've had a similar problem. Apart from trying to avoid it (e.g. for internal communication go direct to the DB instead of a web service), you can mitigate it by at least not performing the 5 or so tasks in series. Make new threads to collect them all in parallel and process them at the end to reduce latency (except where they might contend for the same resource and bottleneck).
But before doing anything, load test it to see if it is even an issue, and get some baseline stats so you can see what improvement each change makes. Also, sometimes you might be better off tweaking network settings or the actual network rather than trying to optimise the code, but again, test and see.
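The thread suggestion is shown here in Python for brevity, although the question uses Java and Axis2; the same shape applies with an ExecutorService. gather_parallel and the source callables are hypothetical, and this only helps where the calls do not depend on each other's output.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def gather_parallel(sources: Dict[str, Callable[[], str]]) -> Dict[str, str]:
    """Call every source concurrently and return results keyed by name."""
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        # Submit all calls first so they run at the same time.
        futures = {name: pool.submit(fetch) for name, fetch in sources.items()}
        # result() blocks until that call finishes and re-raises its errors.
        return {name: future.result() for name, future in futures.items()}
```

Total latency then approaches that of the slowest single call rather than the sum of all of them.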

Put all the data in a temporary compressed file and return the FTP URL of the file.
The client fetches the big data chunk, uncompresses it and reads it (perhaps with some authentication mechanism on the FTP server).

How good and/or necessary are Stateful Web Services?

What kind of server do you people see in real projects?
1) Web Services MUST be stateless: Basically you must send username/password with every request, every request must use HTTPS, and I will authenticate and load the User object every time if needed.
2) A Session for Web Services: like in a web container so I can at least save the authenticated User object and have something similar to a session ID so I don't need to authenticate, load and check the User on every request.
3) Sticky Service (persistent service across requests): https://jax-ws.dev.java.net/nonav/2.1/docs/statefulWebservice.html
I understand the scalability problems of stateful services (and of web application sessions), but sometimes you must have some kind of state, for example for a shopping cart. But you can also put this state in the database (using the back-end as a kind of session, argh) or pass the entire state to the client (the client becomes responsible for resending the entire shopping cart).
The truth is, at least for web applications, the session helps a lot in many situations. Scalability issues can be ignored if your system accepts that "the user must start over doing whatever he is doing if his web server happens to go down" or you can try a session cluster if that's unacceptable.
How is it for web services? I am inclined to conclude that web services are very different from web applications and to accept option 1) (always stateless), but it would be nice to hear other opinions based on real project experience.

It's only a small difference, but it should still be mentioned:
It's not state in web services that kills scalability; rather, it's state on the app server hosting the web services. The moment you say that this user needs to access this server (as done in sticky sessions), you are effectively limiting your scalability options. The point you want to get to is that any of your free load-balanced app servers can handle this web service request, and if you add one more app server you should be able to handle proportionally more users.
It's totally fine (and personally recommended), if you want to maintain state, to pass in an authentication token and on each request have the service retrieve your state from a data store (preferably a redundant and partitioned one, e.g. a distributed and replicated key/value data store). That's how Amazon does it with SimpleDB and Google with Bigtable.
eBay takes a slightly different approach and stores most of the client's state in a cookie so it gets passed in with every request. Although it generates a lot more traffic, it is still scalable, as any of their servers can still handle the request.
If you want a scalable data store I would recommend looking at Redis; it has speed and features that can't be beat in a key/value data store.
You should also check out highscalability.com if you want access to good material on how to build fast and scalable services.
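A sketch of the token-plus-data-store pattern in Python. SessionStore is a hypothetical in-memory stand-in for a shared key/value store that every app server can reach, and handle_request plays the part of any web service operation.

```python
import secrets
from typing import Dict, Optional


class SessionStore:
    """Stand-in for a shared key/value store reachable by every server."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}

    def create(self, state: dict) -> str:
        # Hand the client an unguessable token that keys its state.
        token = secrets.token_urlsafe(32)
        self._data[token] = state
        return token

    def load(self, token: str) -> Optional[dict]:
        return self._data.get(token)


def handle_request(store: SessionStore, token: str) -> str:
    # Any app server can run this: it keeps nothing in local memory
    # between requests and looks the state up by token each time.
    state = store.load(token)
    if state is None:
        return "401 authentication required"
    return "cart has %d items" % len(state["cart"])
```

Because the state lives in the store rather than on one server, the load balancer is free to send each request to whichever server is idle.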

Ideally webservices (and web sites) should be stateless.
Unfortunately this takes a very well thought-out problem domain and a clear separation of concerns.
I've found that in practice most real-world web sites depend on state even though this limits their scalability.
I've also found that many real-world web-services also rely on state.
Ultimately the 'right' decision is the one that works for the specific problem, so it's probably okay to write a webservice that relies on state, and refactor it later if scalability becomes an issue.

It is highly dependent on whether the service is single-transaction oriented (say, getting stock quotes) or whether the output from the service depends on data provided by a particular client across multiple transactions (in which case state must be maintained).
As far as scalability issues go, storing state in a database isn't actually a bad way to go (in fact it's probably the only way to go if you're load balancing your service across a server farm).

I think with Flex clients the state is moved out of the service and into the client tier. Keep the services stateless and let the clients maintain the state needed. The services stay simple, and the clients are free to mash them together as they wish.

You seem to be equating state and authentication. Perhaps you're accustomed to storing username and password in session state?
This is not necessary, even with old ASMX web services. Simply pass whatever information you need to your "Login" operation. This operation will be defined to return an "Authentication Ticket" header.
All other operations that require authentication will require this "Authentication Ticket" header. They will each check the header to see if it represents a valid, authenticated user. If so, then they will perform their task. If not, then they will return a SOAP Fault indicating that authentication is required.
No state is required. Simply make sure that the authentication ticket can be validated on any server your service runs on (for instance, in a web farm), and you'll be fine.
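One way a ticket can be made verifiable on any server is to sign it with a key that all servers share. The Python sketch below uses an HMAC for that; it is an assumption about how such a ticket could be built, not how ASMX does it, and a real ticket would also carry an expiry time. SECRET, issue_ticket and validate_ticket are hypothetical names.

```python
import hashlib
import hmac
from typing import Optional

# Assumption: the same secret key is configured on every server in the farm.
SECRET = b"shared-secret-for-illustration-only"


def issue_ticket(username: str) -> str:
    # Returned by the Login operation after the credentials are checked.
    signature = hmac.new(SECRET, username.encode(), hashlib.sha256).hexdigest()
    return username + ":" + signature


def validate_ticket(ticket: str) -> Optional[str]:
    """Return the username if the ticket is genuine, else None."""
    username, _, signature = ticket.rpartition(":")
    expected = hmac.new(SECRET, username.encode(), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing.
    if username and hmac.compare_digest(signature.encode(), expected.encode()):
        return username
    return None
```

No server has to remember who logged in; each one recomputes the signature from the ticket itself and rejects anything that does not match.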
