Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
Under what conditions would one favor apps talking via a message queue instead of via web services (I just mean XML or JSON or YAML or whatever over HTTP here, not any particular type)?
I have to talk between two apps on a local network. One will be a web app and have to request commands on another app (running on different hardware). The requests are things like creating users, moving files around, and creating directories. Under what conditions would I prefer XML Web Services (or straight TCP or something) to using a Message queue?
The web app is Ruby on Rails, but I think the question is broader than that.
When you use a web service you have a client and a server:
If the server fails the client must take responsibility to handle the error.
When the server is working again the client is responsible of resending it.
If the server gives a response to the call and the client fails the operation is lost.
You don't have contention, that is: if million of clients call a web service on one server in a second, most probably your server will go down.
You can expect an immediate response from the server, but you can handle asynchronous calls too.
When you use a message queue like RabbitMQ, Beanstalkd, ActiveMQ, IBM MQ Series, Tuxedo you expect different and more fault tolerant results:
If the server fails, the queue persist the message (optionally, even if the machine shutdown).
When the server is working again, it receives the pending message.
If the server gives a response to the call and the client fails, if the client didn't acknowledge the response the message is persisted.
You have contention, you can decide how many requests are handled by the server (call it worker instead).
You don't expect an immediate synchronous response, but you can implement/simulate synchronous calls.
Message Queues has a lot more features but this is some rule of thumb to decide if you want to handle error conditions yourself or leave them to the message queue.
There's been a fair amount of recent research in considering how REST HTTP calls could replace the message queue concept.
If you introduce the concept of a process and a task as a resource, the need for middle messaging layer starts to evaporate.
Ex:
POST /task/name
- Returns a 202 accepted status immediately
- Returns a resource url for the created task: /task/name/X
- Returns a resource url for the started process: /process/Y
GET /process/Y
- Returns status of ongoing process
A task can have multiple steps for initialization, and a process can return status when polled or POST to a callback URL when complete.
This is dead simple, and becomes quite powerful when you realize that you can now subscribe to an rss/atom feed of all running processes and tasks without any middle layer. Any queuing system is going to require some sort of web front end anyway, and this concept has it built in without another layer of custom code.
Your resources exist until you delete them, which means you can view historical information long after the process and task complete.
You have built in service discovery, even for a task that has multiple steps, without any extra complicated protocols.
GET /task/name
- returns form with required fields
POST (URL provided form's "action" attribute)
Your service discovery is an HTML form - a universal and human readable format.
The entire flow can be used programmatically or by a human, using universally accepted tools. It's a client driven, and therefore RESTful. Every tool created for the web can drive your business processes. You still have alternate message channels by POSTing asynchronously to a separate array of log servers.
After you consider it for a while, you sit back and start to realize that REST may just eliminate the need for a messaging queue and an ESB altogether.
http://www.infoq.com/presentations/BPM-with-REST
Message queues are ideal for requests which may take a long time to process. Requests are queued and can be processed offline without blocking the client. If the client needs to be notified of completion, you can provide a way for the client to periodically check the status of the request.
Message queues also allow you to scale better across time. It improves your ability to handle bursts of heavy activity, because the actual processing can be distributed across time.
Note that message queues and web services are orthogonal concepts, i.e. they are not mutually exclusive. E.g. you can have a XML based web service which acts as an interface to a message queue. I think the distinction your looking for is Message Queues versus Request/Response, the latter is when the request is processed synchronously.
Message queues are asynchronous and can retry a number of times if delivery fails. Use a message queue if the requester doesn't need to wait for a response.
The phrase "web services" make me think of synchronous calls to a distributed component over HTTP. Use web services if the requester needs a response back.
I think in general, you'd want a web service for a blocking task (this tasks needs to be completed before we execute more code), and a message queue for a non-blocking task (could take quite a while, but we don't need to wait for it).
Related
Background:
I've a local application that process the user input for 3 second (approximately) and then return an answer (output) to the user.
(I don't want to go into details about my application in purpose of not complicate the question and keep it a pure architectural question)
My Goal:
I want to make my application a service in the cloud and expose API
(for the upcoming website and for clients that will connect the service without install the software locally)
Possible Solutions:
Deploy WCF on the cloud and use my application there, so clients can invoke the service and use my application on the cloud. (RPC style)
Use a Web-API that will insert the request into queue and then a worker role will dequeue requests and post the results to a DB, so the client will send one request for creating a request in the queue, and another request for getting the result (which the Web-API will get from the DB).
The Problems:
If I go with the WCF solution (#1) I cant handle great loads of requests, maybe 10-20 simultaneously.
If I go with the WebAPI-Queue-WorkerRole solution (#2) sometimes the client will need to request the results multiple times its can be a problem.
If I go with the WebAPI-Queue-WorkerRole solution (#2) the process isn't sync, the client will not get the result once the process of his request is done, he need to request the result.
Questions:
In the WebAPI-Queue-WorkerRole solution (#2), can I somehow alert the client once his request has processed and done ? so I can save the client multiple request (for the result).
Asking multiple times for the result isn't old stuff ? I remmemeber that 10 - 15 years ago its was accepted but now ? I know that VirusTotal API use this kind of design.
There is a better solution ? one that will handle great loads and will be sync or async (returning result to the client once it done) ?
Thank you.
If you're using Azure, why not simply fire up more servers and use load balancing to handle more load? In that way, as your load increases, you have more servers to handle the requests.
Microsoft recently made available the Azure Service Fabric, which gives you a lot of control over spinning up and shutting down these services.
(Edited to try to explain better)
We have an agent, written in C++ for Win32. It needs to periodically post information to a server. It must support disconnected operation. That is: the client doesn't always have a connection to the server.
Note: This is for communication between an agent running on desktop PCs, to communicate with a server running somewhere in the enterprise.
This means that the messages to be sent to the server must be queued (so that they can be sent once the connection is available).
We currently use an in-house system that queues messages as individual files on disk, and uses HTTP POST to send them to the server when it's available.
It's starting to show its age, and I'd like to investigate alternatives before I consider updating it.
It must be available by default on Windows XP SP2, Windows Vista and Windows 7, or must be simple to include in our installer.
This product will be installed (by administrators) on a couple of hundred thousand PCs. They'll probably use something like Microsoft SMS or ConfigMgr. In this scenario, "frivolous" prerequisites are frowned upon. This means that, unless the client-side code (or a redistributable) can be included in our installer, the administrator won't be happy. This makes MSMQ a particularly hard sell, because it's not installed by default with XP.
It must be relatively simple to use from C++ on Win32.
Our client is an unmanaged C++ Win32 application. No .NET or Java on the client.
The transport should be HTTP or HTTPS. That is: it must go through firewalls easily; no RPC or DCOM.
It should be relatively reliable, with retries, etc. Protection against replays is a must-have.
It must be scalable -- there's a lot of traffic. Per-message impact on the server should be minimal.
The server end is C#, currently using ASP.NET to implement a simple HTTP POST mechanism.
(The slightly odd one). It must support client-side in-memory queues, so that we can avoid spinning up the hard disk. It must allow flushing to disk periodically.
It must be suitable for use in a proprietary product (i.e. no GPL, etc.).
How is your current solution showing its age?
I would push the logic on to the back end, and make the clients extremely simple.
Messages are simply stored in the file system. Have the client write to c:/queue/{uuid}.tmp. When the file is written, rename it to c:/queue/{uuid}.msg. This makes writing messages to the queue on the client "atomic".
A C++ thread wakes up, scans c:\queue for "*.msg" files, and if it finds one it then checks for the server, and HTTP POSTs the message to it. When it receives the 200 status back from the server (i.e. it has got the message), then it can delete the file. It only scans for *.msg files. The *.tmp files are still being written too, and you'd have a race condition trying to send a msg file that was still being written. That's what the rename from .tmp is for. I'd also suggest scanning by creation date so early messages go first.
Your server receives the message, and here it can to any necessary dupe checking. Push this burden on the server to centralize it. You could simply record every uuid for every message to do duplication elimination. If that list gets too long (I don't know your traffic volume), perhaps you can cull it of items greater than 30 days (I also don't know how long your clients can remain off line).
This system is simple, but pretty robust. If the file sending thread gets an error, it will simply try to send the file next time. The only time you should be getting a duplicate message is in the window between when the client gets the 200 ack from the server and when it deletes the file. If the client shuts down or crashes at that point, you will have a file that has been sent but not removed from the queue.
If your clients are stable, this is a pretty low risk. With the dupe checking based on the message ID, you can mitigate that at the cost of some bookkeeping, but maintaining a list of uuids isn't spectacularly daunting, but again it does depend on your message volume and other performance requirements.
The fact that you are allowed to work "offline" suggests you have some "slack" in your absolute messaging performance.
To be honest, the requirements listed don't make a lot of sense and show you have a long way to go in your MQ learning. Given that, if you don't want to use MSMQ (probably the easiest overall on Windows -- but with [IMO severe] limitations), then you should look into:
qpid - Decent use of AMQP standard
zeromq - (the best, IMO, technically but also requires the most familiarity with MQ technologies)
I'd recommend rabbitmq too, but that's an Erlang server and last I looked it didn't have usuable C or C++ libraries. Still, if you are shopping MQ, take a look at it...
[EDIT]
I've gone back and reread your reqs as well as some of your comments and think, for you, that perhaps client MQ -> server is not your best option. I would maybe consider letting your client -> server operations be HTTP POST or SOAP and allow the HTTP endpoint in turn queue messages on your MQ backend. IOW, abstract away the MQ client into an architecture you have more control over. Then your C++ client would simply be HTTP (easy), and your HTTP service (likely C# / .Net from reading your comments) can interact with any MQ backend of your choice. If all your HTTP endpoint does is spawn MQ messages, it'll be pretty darned lightweight and can scale through all the traditional load balancing techniques.
Last time I wanted to do any messaging I used C# and MSMQ. There are MSMQ libraries available that make using MSMQ very easy. It's free to install on both your servers and never lost a message to this day. It handles reboots etc all by itself. It's a thing of beauty and 100,000's of message are processed daily.
I'm not sure why you ruled out MSMQ and I didn't get point 2.
Quite often for queues we just dump record data into a database table and another process lifts rows out of the table periodically.
How about using Asynchronous Agents library from .NET Framework 4.0. It is still beta though.
http://msdn.microsoft.com/en-us/library/dd492627(VS.100).aspx
I already wrote here about the http chat server I want to create: Alternative http port?
This http server should stream text to every user in the same chat room on the website. The browser will stay connected and wait for further html code. (yes that works, the browser won't reject the connection).
I got a new question: Because this chat server doesn't need to receive information from the client, it's not necessary to listen to the client after the server sent its first response. New chat messages will be send to the server on a new connection.
So I can open 2 threads, one waiting for new clients (or new messages) and one for the html streaming.
Is this a good idea or should I use one thread per client? I don't think it's good to have one thread/client when there are many chat users online, since the server should handle multiple different chats with their own rooms.
3 posibilities:
1. One thread for all clients, send text to each client successive - there shouldn't be much lag since it's only text
this will be like: user1.send("text");user2.send("text"),...
2. One thread per chat or chatroom
3. One thread per chat user - ... many...
Thank you, I haven't done much with sockets yet ;).
Right now, you seem to be thinking in terms of a given thread always carrying out a given (type of) task. While that basic design can make sense, to produce a scalable server like this, it generally doesn't work very well.
Often a slightly more abstract viewpoint works out better: you have tasks that need to get done, and threads that do those tasks -- but a thread doesn't really "care" about what task it executes.
With this viewpoint, you simply need to create some sort of data structure that describes each task that needs to be done. When you have a task you want done, you fill in a data structure to describe the task, and hand it off to get done. Somewhere, there are some threads that do the tasks.
In this case, the exact number of threads becomes mostly irrelevant -- it's something you can (and do) adjust to fit the number of CPU cores available, the type of tasks, and so on, not something that affects the basic design of the program.
I think easiest pattern for this simple app is to have pool of threads and then for each client pick available thread or make it wait until one becomes available.
If you want serious understanding of http server architecture concepts google following:
apache architecture
nginx architecture
I am debugging an ASMX web service that receives "bursts" of requests. i.e., it is likely that the web service will receive 100 asynchronous requests within about 1 or 2 seconds. Each request seems to take about a second to process (this is expected and I'm OK with this performance). What is important however, is that each request is dealt with sequentially and no parallel processing takes places. I do not want any concurrent request processing due to the external components called by the web service. Is there any way I can force the web service to only handle each response sequentially?
I have seen the maxconnection attribute in the machine.config but this seems to only work for outbound connections, where as I wish to throttle the incoming connections.
Please note that refactoring into WCF is not an option at this point in time.
We are usinng IIS6 on Win2003.
What I've done in the past is to simply put a lock statement around any access to the external resource I was using. In my case, it was a piece of unmanaged code that claimed to be thread-safe, but which in fact would trash the C runtime library heap if accessed from more than one thread at a time.
Perhaps you should be queuing the requests up internally and processing them one by one?
It may cause the clients to poll for results (if they even need them), but you'd get the sequential pipeline you wanted...
In IIS7 you can set up a limit of connections allowed to a web site. Can you use IIS7?
What would be a more standard way of notifying a web service consumer of a data change?
Having the consumer periodically calling the web service to pull change notification.
Consumer setting up a call back web service that can be invoked to forward notification about the change.
Other?
Both of these are options. There is also something called "comet" which is like setting up a stream between between the consumer and producer - messages can then be passed back and forth between the two. Wikipedia is probably the best place to start investigating to see if it will work for you project: http://en.wikipedia.org/wiki/Comet_(programming)
Depends on the scenario. If you're working in a closed environment with only a few consumers of your service, you could switch to a COMET style service which allows a callback from the service to the client. More about that here:
Wikipedia - COMET
From what I've read, that method doesn't scale well in larger environments so I'd be careful.
The more traditional method is your first option of polling the service for changes. As long as your service performs well and you have the appropriate hardware to serve up the requests, it's probably your best bet for a public facing web service.
In case you weren't aware of it, and in case it helps: WCF can work with a Duplex contract that in effect creates a callback service contract on the client. It's fairly transparent.