I have a C++ application that reads a variety of sensors and then acts on them as required. Currently the sensors run in their own threads and have get/set methods for their values.
I'm trying to integrate a web socket server using POCO libraries to display the state of the sensors.
How do I go about getting the sensor information into the HTTPRequestHandler?
Should I be using the Poco::Util::Application class and defining the sensors and server as subsystems? Is there another approach that I should be taking?
You can derive from HTTPRequestHandler, override handleRequest(), and give the handler access to the sensor information, for example by storing a reference to your sensor info object as a member of the derived class:
class SensorStateRequestHandler : public Poco::Net::HTTPRequestHandler
{
public:
SensorStateRequestHandler(SensorInfo &sensorInfo)
: sensorInfo_(sensorInfo)
{}
virtual void handleRequest(Poco::Net::HTTPServerRequest &request, Poco::Net::HTTPServerResponse &response) override
{
// receive request websocket frame
sensorInfo_.get_state(); // must be thread safe
// send response websocket frame with sensor state
}
private:
SensorInfo &sensorInfo_;
};
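The "must be thread safe" comment above is the important part, since the sensor thread writes the state while the handler thread reads it. A minimal sketch of one way to do that, assuming a hypothetical SensorInfo that holds a single double (the real class is not shown in the question), is to guard the value with a mutex and return copies:

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for the asker's SensorInfo; only the locking pattern matters.
class SensorInfo
{
public:
    // Called from the sensor thread.
    void set_state(double value)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        state_ = value;
    }

    // Called from the request handler thread; returns a copy made under the lock.
    double get_state() const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return state_;
    }

private:
    mutable std::mutex mutex_;  // mutable so that const getters can lock
    double state_ = 0.0;
};

// Small usage check (single-threaded, just to show the interface).
inline void sensor_info_demo()
{
    SensorInfo info;
    info.set_state(21.5);
    assert(info.get_state() == 21.5);
}
```

Returning by value keeps the lock scope short; for larger state, a snapshot struct copied under the lock follows the same idea.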
See how WebEventService in macchina.io is implemented - using Poco::Net::HTTPServer, WebSocket and Poco::NotificationQueue.
The design, in a nutshell, is a pub/sub pattern: a client subscribes to notifications and receives them through a WebSocket; in-process subscriptions and notifications (using Poco events) are also supported. A single short-lived thread (the HTTP handler) is launched at subscription time, and the rest of the communication goes through WebSocket reactor-like functionality, so performance and scalability are reasonably good (although there is room for improvement, depending on the target platform).
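To make the pub/sub idea concrete, here is a rough in-process sketch. It is not the macchina.io or POCO API: EventBus, subscribe() and publish() are names made up for illustration, and the callback stands in for "write a frame to this client's WebSocket". It is also not thread safe as written.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-process event bus (illustration only, not a POCO class).
class EventBus
{
public:
    using Callback = std::function<void(const std::string&)>;

    // Register a callback for a named subject, e.g. "sensor/temperature".
    void subscribe(const std::string& subject, Callback callback)
    {
        assert(callback != nullptr);
        subscribers_[subject].push_back(std::move(callback));
    }

    // Deliver the payload to every subscriber of the subject.
    // Returns the number of subscribers that were notified.
    std::size_t publish(const std::string& subject, const std::string& payload) const
    {
        auto it = subscribers_.find(subject);
        if (it == subscribers_.end()) return 0;
        for (const auto& callback : it->second) callback(payload);
        return it->second.size();
    }

private:
    std::map<std::string, std::vector<Callback>> subscribers_;
};

// Usage: the vector stands in for frames sent to one WebSocket client.
inline std::vector<std::string> event_bus_demo()
{
    EventBus bus;
    std::vector<std::string> sent;
    bus.subscribe("sensor/temperature",
                  [&sent](const std::string& payload) { sent.push_back(payload); });
    bus.publish("sensor/temperature", "21.5");
    bus.publish("sensor/humidity", "40");  // no subscriber, nothing is delivered
    return sent;
}
```

In a real server the sensor threads would publish and each WebSocket session would subscribe, with a queue or lock between them.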
You may consider using macchina.io itself (Apache license); it is based on POCO/OSP and targets the type of application you have. WebEvent functionality will be part of Poco::NetEx in the 1.7.0 release (scheduled for September this year).
So the only way that I know how to find which client I received from is by comparing the received endpoint in a loop of all the clients, and I was wondering if there was a more elegant way of handling this.
In TCP, every client has its own socket, so the server knows instantly which client it received from. If I make every client have its own socket in UDP, will it be more or less efficient?
I was also thinking of making a global socket and having every client object listen only to its own endpoint, but I don't think that's possible, or efficient, in asio.
The application code is responsible for demultiplexing. At a high-level, there are two options:
Use a single endpoint to conceptually function as an acceptor. Upon receiving a handshake message, the server would instantiate a new local endpoint and inform the client to use the newly constructed endpoint for the remainder of the client's session. This results in a socket per client, and with connected UDP sockets, a client object can be guaranteed to only receive messages from the expected remote endpoint. This should be no less efficient than the same approach used with TCP sockets. However, it requires making changes to the application protocol on both the sender and receiver.
Use a single socket. Upon receiving a message, the remote endpoint is used to demultiplex to the client object. If the application depends upon the demultiplex abstraction, then the implementation may be freely changed to best suit the application's usage. This requires no changes to the application protocol.
The first option will more easily support higher concurrency levels, as each client can control the lifetime of its asynchronous call chain. While it is possible to have a call chain per client in the second option, controlling the lifetime introduces complexity, as all asynchronous call chains are bound to the same I/O object.
On the other hand, as concurrency increases, so does memory usage. Hence, the first option is likely to use more memory than the second option. Furthermore, controlling overall memory is easier in the second, as the concurrency level will not be completely dynamic. In either case, reactor style operations can be used to mitigate the overall memory usage.
In the end, abstract the application from the implementation whilst keeping the code maintainable. Once the application is working, profile, identify bottlenecks, and make choices based on actual data.
To expand slightly on the second option, here is a complete minimal example of a basic client_manager that associates endpoints to client objects:
#include <cassert>
#include <functional>
#include <memory>
#include <sstream>
#include <string>
#include <unordered_map>
#include <boost/asio.hpp>
namespace ip = boost::asio::ip;
/// @brief Mockup client.
class client:
public std::enable_shared_from_this<client>
{
public:
explicit client(ip::udp::endpoint endpoint)
: endpoint_(endpoint)
{}
const ip::udp::endpoint& endpoint() const { return endpoint_; }
private:
ip::udp::endpoint endpoint_;
};
/// @brief Basic class that manages clients. Given an endpoint, the
/// associated client, if any, can be found.
class client_manager
{
private:
// The underlying implementation used by the manager.
using container_type = std::unordered_map<
ip::udp::endpoint, std::shared_ptr<client>,
std::size_t (*)(const ip::udp::endpoint&)>;
/// @brief Return a hash value for the provided endpoint.
static std::size_t get_hash(const ip::udp::endpoint& endpoint)
{
std::ostringstream stream;
stream << endpoint;
std::hash<std::string> hasher;
return hasher(stream.str());
}
public:
using key_type = container_type::key_type;
using mapped_type = container_type::mapped_type;
/// @brief Constructor.
client_manager()
: clients_(0, &client_manager::get_hash)
{}
// The public abstraction upon which the application will depend.
public:
/// @brief Add a client to the manager.
void add(mapped_type client)
{
clients_[client->endpoint()] = client;
}
/// @brief Given an endpoint, retrieve the associated client. Return
/// an empty shared pointer if one is not found.
mapped_type get(key_type key) const
{
auto result = clients_.find(key);
return clients_.end() != result
? result->second // Found client.
: mapped_type(); // No client found.
}
private:
container_type clients_;
};
int main()
{
// Unique endpoints.
ip::udp::endpoint endpoint1(ip::address::from_string("11.11.11.11"), 1111);
ip::udp::endpoint endpoint2(ip::address::from_string("22.22.22.22"), 2222);
ip::udp::endpoint endpoint3(ip::address::from_string("33.33.33.33"), 3333);
// Create a client for each endpoint.
auto client1 = std::make_shared<client>(endpoint1);
auto client2 = std::make_shared<client>(endpoint2);
auto client3 = std::make_shared<client>(endpoint3);
// Add the clients to the manager.
client_manager manager;
manager.add(client1);
manager.add(client2);
manager.add(client3);
// Locate a client based on the endpoint.
auto client_result = manager.get(endpoint2);
assert(client1 != client_result);
assert(client2 == client_result);
assert(client3 != client_result);
}
Note that because the application only depends upon the client_manager abstraction (i.e. the pre and post conditions for client_manager::add() and client_manager::get()), the client_manager implementation can be changed without affecting the application, as long as the implementation maintains those pre and post conditions. For instance, instead of using std::unordered_map, it could be implemented with a sequence container, such as std::vector, or an ordered associative container, such as std::map. Choose a container that best fits the expected usage. After profiling, if the container choice is an identified bottleneck, then change the implementation of client_manager to use a more suitable container based on the actual usage.
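As a rough illustration of swapping the container, here is a sketch of the same add()/get() interface backed by std::map. So that it builds without Boost, a hypothetical endpoint_key struct (address string plus port) stands in for ip::udp::endpoint; with the real endpoint type, the ordering would come from the endpoint's own operator<, which I believe Boost.Asio provides, so no custom hash function would be needed.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical stand-in for ip::udp::endpoint (assumption, to avoid the Boost dependency).
struct endpoint_key
{
    std::string address;
    std::uint16_t port;

    // Strict weak ordering required by std::map.
    bool operator<(const endpoint_key& other) const
    {
        return std::tie(address, port) < std::tie(other.address, other.port);
    }
};

class client
{
public:
    explicit client(endpoint_key endpoint) : endpoint_(std::move(endpoint)) {}
    const endpoint_key& endpoint() const { return endpoint_; }
private:
    endpoint_key endpoint_;
};

// Same public abstraction as before; only the container changed.
class client_manager
{
public:
    using key_type = endpoint_key;
    using mapped_type = std::shared_ptr<client>;

    void add(mapped_type c)
    {
        assert(c != nullptr);
        clients_[c->endpoint()] = c;
    }

    // Returns an empty shared pointer if no client is associated with the key.
    mapped_type get(const key_type& key) const
    {
        auto result = clients_.find(key);
        return clients_.end() != result ? result->second : mapped_type();
    }

private:
    std::map<key_type, mapped_type> clients_;
};
```

Code that only calls add() and get() does not need to change when the container does, which is the point of depending on the abstraction.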
I'm trying to embed a telnet server in a data-capture program I've written. I've got both the data capture, and the telnet server working in their own classes, but now I want to transfer data from one to another, and I'm not sure where to start.
In the example below, I want to be able to send a command to the telnet server to request a data packet from the data capture thread.
So, in code (C++) this is what I want to do:
#include <thread>
void StartTelnetServer()
{
MyTelnetClass tnet;
tnet.Start(); // In here, server starts listening for connections.
}
void StartDataCapture()
{
MyDataCapture dCap;
dCap.Start(); // In here, data capture begins
}
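int main()
{
std::thread tnetThread(StartTelnetServer);
std::thread dCapThread(StartDataCapture);
// This will run until killed; join so main() does not return while the threads are running
tnetThread.join();
dCapThread.join();
}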
I then want to telnet into it with a string command such as "SIZE", and have the telnet class query the latest dCap.GetSize(). There are a dozen or so bits of data that I'll want to access in this way. Do I need to declare a static structure of some sort that both classes access? Am I way off base?!
This needs to run on Linux, if that matters to anything.
If the telnet handler should be able to access the data-capture object, but not the other way around, you can create both objects in the main function, passing the data-capture object by reference to the telnet handler constructor. Then start the threads using the Start member functions instead.
Something like
...
class MyDataCapture;
class MyTelnetClass
{
public:
MyTelnetClass(MyDataCapture& dc)
: dCap(dc)
{}
...
private:
MyDataCapture& dCap;
...
};
...
int main()
{
MyDataCapture dCap;
MyTelnetClass tnet{dCap};
// Pass pointers so the threads run on these objects rather than on copies
std::thread dCapThread(&MyDataCapture::Start, &dCap);
std::thread tnetThread(&MyTelnetClass::Start, &tnet);
...
}
This way the telnet handler can just call functions in the data-capture object when needed. Be careful, though, to avoid data races: protect shared data with mutexes and locks.
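To make the locking concrete, here is a small sketch of how the "SIZE" command could be served. OnPacket() and HandleCommand() are hypothetical members added for illustration; the real classes are not shown in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>

// Hypothetical stand-in for the data-capture class.
class MyDataCapture
{
public:
    // Called from the capture thread whenever a packet arrives.
    void OnPacket(std::size_t bytes)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        size_ = bytes;
    }

    // Called from the telnet thread; the lock makes the read race-free.
    std::size_t GetSize() const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return size_;
    }

private:
    mutable std::mutex mutex_;
    std::size_t size_ = 0;
};

// Hypothetical stand-in for the telnet class, holding a reference as suggested above.
class MyTelnetClass
{
public:
    explicit MyTelnetClass(MyDataCapture& dc) : dCap(dc) {}

    // Map one received command line to a reply string.
    std::string HandleCommand(const std::string& command) const
    {
        if (command == "SIZE") return std::to_string(dCap.GetSize());
        return "ERROR unknown command";
    }

private:
    MyDataCapture& dCap;
};
```

With a dozen or so values, one option is to copy them all into a small struct under a single lock rather than locking once per getter.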
If you want the data-capture object to call functions in the telnet handler object as well you can't use references but have to use pointers.
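A sketch of the two-way case, assuming hypothetical SetTelnet(), OnCaptureFinished() and Notify() members: the telnet object still takes a reference at construction, while the data-capture object receives a pointer afterwards, because the telnet object does not exist yet when the data-capture object is constructed.

```cpp
#include <cassert>
#include <string>

class MyTelnetClass;  // forward declaration, enough for declaring a pointer

class MyDataCapture
{
public:
    // The pointer is set after both objects exist; it may stay null.
    void SetTelnet(MyTelnetClass* telnet) { telnet_ = telnet; }
    void OnCaptureFinished();           // defined once MyTelnetClass is complete
    int GetSize() const { return 42; }  // placeholder value for the sketch
private:
    MyTelnetClass* telnet_ = nullptr;
};

class MyTelnetClass
{
public:
    explicit MyTelnetClass(MyDataCapture& dc) : dCap(dc) {}
    void Notify(const std::string& message) { lastMessage = message; }
    int QuerySize() const { return dCap.GetSize(); }
    std::string lastMessage;
private:
    MyDataCapture& dCap;
};

inline void MyDataCapture::OnCaptureFinished()
{
    if (telnet_) telnet_->Notify("capture finished");
}
```

The same caveat about data races applies in both directions once the two objects run on different threads.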
I am using a protocol, which is basically a request & response protocol over TCP, similar to other line-based protocols (SMTP, HTTP etc.).
The protocol has about 130 different request methods (e.g. login, user add, user update, log get, file info, files info, ...). These methods do not map well to the broad methods used in HTTP (GET, POST, PUT, ...). Such broad methods would introduce some inconsistent twists of the actual meaning.
But the protocol methods can be grouped by type (e.g. user management, file management, session management, ...).
Current server-side implementation uses a class Worker with methods ReadRequest() (reads request, consisting of method plus parameter list), HandleRequest() (see below) and WriteResponse() (writes response code & actual response data).
HandleRequest() will call a function for the actual request method - using a hash map of method name to member function pointer to the actual handler.
The actual handler is a plain member function; there is one per protocol method. Each one validates its input parameters, does whatever it has to do and sets the response code (success yes/no) and response data.
Example code:
class Worker {
typedef bool (Worker::*CommandHandler)();
typedef std::map<UTF8String,CommandHandler> CommandHandlerMap;
// handlers will be initialized once
// e.g. m_CommandHandlers["login"] = &Worker::Handle_LOGIN;
static CommandHandlerMap m_CommandHandlers;
bool HandleRequest() {
CommandHandlerMap::const_iterator ihandler;
if( (ihandler=m_CommandHandlers.find(m_CurRequest.instruction)) != m_CommandHandlers.end() ) {
// call actual handler
return (this->*(ihandler->second))();
}
// error case:
m_CurResponse.success = false;
m_CurResponse.info = "unknown or invalid instruction";
return true;
}
//...
bool Handle_LOGIN() {
const UTF8String username = m_CurRequest.parameters["username"];
const UTF8String password = m_CurRequest.parameters["password"];
// ....
if( success ) {
// initialize some state...
m_Session.Init(...);
m_LogHandle.Init(...);
m_AuthHandle.Init(...);
// set response data
m_CurResponse.success = true;
m_CurResponse.Write( "last_login", ... );
m_CurResponse.Write( "whatever", ... );
} else {
m_CurResponse.Write( "error", "failed, because ..." );
}
return true;
}
};
So. The problem is: My worker class now has about 130 "command handler methods". And each one needs access to:
request parameters
response object (to write response data)
different other session-local objects (like a database handle, a handle for authorization/permission queries, logging, handles to various sub-systems of the server etc.)
What is a good strategy for a better structuring of those command handler methods?
One idea was to have one class per command handler, and to initialize it with references to the request, response objects etc. - but the overhead is IMHO not acceptable (actually, it would add an indirection for every single access to everything the handler needs: request, response, session objects, ...). It could be acceptable if it provided an actual advantage. However, that doesn't sound very reasonable:
class HandlerBase {
protected:
Request &request;
Response &response;
Session &session;
DBHandle &db;
FooHandle &foo;
// ...
public:
HandlerBase( Request &req, Response &rsp, Session &s, ... )
: request(req), response(rsp), session(s), ...
{}
//...
virtual bool Handle() = 0;
};
class LoginHandler : public HandlerBase {
public:
LoginHandler( Request &req, Response &rsp, Session &s, ... )
: HandlerBase(req,rsp,s, ...)
{}
//...
virtual bool Handle() {
// actual code for handling "login" request ...
}
};
Okay, the HandlerBase could just take a reference (or pointer) to the worker object itself (instead of refs to request, response etc.). But that would also add another indirection (this->worker->session instead of this->session). That indirection would be ok, if it would buy some advantage after all.
Some info about the overall architecture
The worker object represents a single worker thread for an actual TCP connection to some client. Each thread (so, each worker) needs its own database handle, authorization handle etc. These "handles" are per-thread-objects that allow access to some sub-system of the server.
This whole architecture is based on some kind of dependency injection: e.g. to create a session object, one has to provide a "database handle" to the session constructor. The session object then uses this database handle to access the database. It will never call global code or use singletons. So, each thread can run undisturbed on its own.
But the cost is, that - instead of just calling out to singleton objects - the worker and its command handlers must access any data or other code of the system through such thread-specific handles. Those handles define its execution context.
Summary & Clarification: My actual question
I am searching for an elegant alternative to the current ("worker object with a huge list of handler methods") solution: It should be maintainable, have low-overhead & should not require writing too much glue-code. Additionally, it MUST still allow each single method control over very different aspects of its execution (that means: if a method "super flurry foo" wants to fail whenever full moon is on, then it must be possible for that implementation to do so). It also means, that I do not want any kind of entity abstraction (create/read/update/delete XFoo-type) at this architectural layer of my code (it exists at different layers in my code). This architectural layer is pure protocol, nothing else.
In the end, it will surely be a compromise, but I am interested in any ideas!
The AAA bonus: a solution with interchangeable protocol implementations (instead of just that current class Worker, which is responsible for parsing requests and writing responses). There maybe could be an interchangeable class ProtocolSyntax, that handles those protocol syntax details, but still uses our new shiny structured command handlers.
You've already got most of the right ideas, here's how I would proceed.
Let's start with your second question: interchangeable protocols. If you have generic request and response objects, you can have an interface that reads requests and writes responses:
class Protocol {
public:
virtual ~Protocol() {}
virtual Request *readRequest() = 0;
virtual void writeResponse(Response *response) = 0;
};
and you could have an implementation called HttpProtocol for example.
As for your command handlers, "one class per command handler" is the right approach:
class Command {
public:
virtual ~Command() {}
virtual void execute(Request *request, Response *response, Session *session) = 0;
};
Note that I rolled up all the common session handles (DB, Foo etc.) into a single object instead of passing around a whole bunch of parameters. Also making these method parameters instead of constructor arguments means you only need one instance of each command.
Next, you would have a CommandFactory which contains the map of command names to command objects:
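class CommandFactory {
public:
Command *getCommand(const UTF8String &name) {
// find() avoids inserting a null entry for an unknown command name
std::map<UTF8String, Command *>::const_iterator it = handlers.find(name);
return it != handlers.end() ? it->second : 0;
}
private:
std::map<UTF8String, Command *> handlers;
};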
If you've done all this, the Worker becomes extremely thin and simply coordinates everything:
class Worker {
public:
void handleRequest() {
Request *request = protocol->readRequest();
Response response;
Command *command = commandFactory->getCommand(request->getCommandName());
if (command) {
command->execute(request, &response, session);
} // else: report an unknown command in the response
protocol->writeResponse(&response);
}
private:
Protocol *protocol;
CommandFactory *commandFactory;
Session *session;
};
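Put together, the pieces above could look roughly like the following self-contained sketch. It makes several assumptions: Request, Response and Session are minimal stand-ins for the question's types, std::string replaces UTF8String, LoginCommand is a hypothetical example command, and a free function plays the role of Worker::handleRequest().

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Minimal stand-ins for the question's types (assumptions).
struct Request { std::string instruction; std::map<std::string, std::string> parameters; };
struct Response { bool success = false; std::string info; };
struct Session { bool loggedIn = false; };

class Command
{
public:
    virtual ~Command() = default;
    virtual void execute(const Request& request, Response& response, Session& session) = 0;
};

// Hypothetical command: succeeds when a non-empty username is supplied.
class LoginCommand : public Command
{
public:
    void execute(const Request& request, Response& response, Session& session) override
    {
        auto user = request.parameters.find("username");
        session.loggedIn = user != request.parameters.end() && !user->second.empty();
        response.success = session.loggedIn;
        response.info = session.loggedIn ? "welcome" : "missing username";
    }
};

class CommandFactory
{
public:
    void add(const std::string& name, std::unique_ptr<Command> command)
    {
        assert(command != nullptr);
        handlers_[name] = std::move(command);
    }

    // Returns nullptr for an unknown command name.
    Command* getCommand(const std::string& name) const
    {
        auto it = handlers_.find(name);
        return it != handlers_.end() ? it->second.get() : nullptr;
    }

private:
    std::map<std::string, std::unique_ptr<Command>> handlers_;
};

// Plays the role of Worker::handleRequest(): look up the command and run it.
inline Response handleRequest(const CommandFactory& factory, const Request& request,
                              Session& session)
{
    Response response;
    if (Command* command = factory.getCommand(request.instruction))
        command->execute(request, response, session);
    else
        response.info = "unknown or invalid instruction";
    return response;
}
```

Because the commands keep no per-request state, one instance of each can be shared; the per-thread handles travel in through the Session argument.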
If it were me I would probably use a hybrid solution of the two in your question.
Have a worker base class that can handle multiple related commands, and can allow your main "dispatch" class to probe for supported commands. For the glue, you would simply need to tell the dispatch class about each worker class.
class HandlerBase
{
public:
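HandlerBase(HandlerDispatch & dispatch) : m_dispatch(dispatch) {
// Caution: a virtual call made from a constructor runs HandlerBase::PopulateCommands(),
// not a derived override, so derived classes must register their own commands.
PopulateCommands();
}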
virtual ~HandlerBase();
bool CommandSupported(UTF8String & cmdName);
virtual bool HandleCommand(UTF8String & cmdName, Request & req, Response & res);
virtual void PopulateCommands();
protected:
CommandHandlerMap m_CommandHandlers;
HandlerDispatch & m_dispatch;
};
class AuthenticationHandler : public HandlerBase
{
public:
AuthenticationHandler(HandlerDispatch & dispatch) : HandlerBase(dispatch) {
PopulateCommands(); // register this class's commands once the derived object exists
}
bool HandleCommand(UTF8String & cmdName, Request & req, Response & res) {
CommandHandlerMap::const_iterator ihandler;
if( (ihandler=m_CommandHandlers.find(req.instruction)) != m_CommandHandlers.end() ) {
// call actual handler
return (this->*(ihandler->second))(req,res);
}
// error case:
res.success = false;
res.info = "unknown or invalid instruction";
return true;
}
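void PopulateCommands() {
// Member function pointers need the &Class::member form; a cast may be needed
// depending on how CommandHandlerMap is declared.
m_CommandHandlers["login"]=&AuthenticationHandler::Handle_LOGIN;
m_CommandHandlers["logout"]=&AuthenticationHandler::Handle_LOGOUT;
}
bool Handle_LOGIN(Request & req, Response & res) {
Session & session = m_dispatch.GetSessionForRequest(req);
// ...
return true;
}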
};
class HandlerDispatch
{
public:
HandlerDispatch();
virtual ~HandlerDispatch() {
// delete all handlers
}
void AddHandler(HandlerBase * pHandler);
bool HandleRequest() {
std::vector<HandlerBase *>::iterator i;
for ( i=m_handlers.begin() ; i != m_handlers.end(); ++i ) {
if ((*i)->CommandSupported(m_CurRequest.instruction)) {
return (*i)->HandleCommand(m_CurRequest.instruction,m_CurRequest,m_CurResponse);
}
}
// error case:
m_CurResponse.success = false;
m_CurResponse.info = "unknown or invalid instruction";
return true;
}
protected:
std::vector<HandlerBase*> m_handlers;
};
And then to glue it all together you would do something like this:
// Init
m_handlerDispatch.AddHandler(new AuthenticationHandler(m_handlerDispatch));
As for the transport (TCP) specific part, did you have a look at the ZMQ library that supports various distributed computing patterns via messaging sockets/queues? IMHO you should find an appropriate pattern that serves your needs in their Guide document.
For the implementation of the protocol messages, I would personally favor Google Protocol Buffers, which work very well with C++; we are using them for a couple of projects now.
In the end you'll boil it down to dispatcher and handler implementations for specific requests and their parameters, plus the necessary return parameters. Google protobuf message extensions allow you to do this in a generic way.
EDIT:
To get a bit more concrete, using protobuf messages the main difference of the dispatcher model vs yours will be that you don't need to do the complete message parsing before dispatch, but you can register handlers that tell themselves if they can handle a particular message or not by the message's extensions. The (main) dispatcher class doesn't need to know about the concrete extensions to handle, but just ask the registered handler classes. You can easily extend this mechanism to have certain sub-dispatchers to cover deeper message category hierarchies.
Because the protobuf compiler can already see your messaging data model completely, you don't need any kind of reflection or dynamic class polymorphism tests to figure out the concrete message content. Your C++ code can statically ask for possible extensions of a message and won't compile if such doesn't exist.
I don't know how to explain this in a better way, or how to show a concrete example of improving your existing code with this approach. I'm afraid you have already spent some effort on the de-/serialization code of your message formats, which could have been avoided by using Google protobuf messages (or what kind of classes are Request and Response?).
The ZMQ library might help to implement your Session context to dispatch requests through the infrastructure.
Certainly you shouldn't end up with a single interface that handles all kinds of possible requests, but with a number of interfaces that specialize in message categories (extension points).
I think this is an ideal case for a REST-like implementation. One other way could also be grouping the handler methods based on category/any-other-criteria to several worker classes.
If the protocol methods can only be grouped by type but methods of the same group do not have anything common in their implementation, possibly the only thing you can do to improve maintainability is distributing methods between different files, one file for a group.
But it is very likely that methods of the same group have some of the following common features:
There may be some data fields in the Worker class that are used by only one group of methods, or by several (but not all) groups. For example, m_AuthHandle may be used only by the user management and session management methods.
There may be some groups of input parameters, used by every method of some group.
There may be some common data, written to the response by every method of some group.
There may be some common methods, called by several methods of some group.
If any of these is true, there is a good reason to group these features into different classes: not one class per command handler, but one class per event group. Or, if some features are common to several groups, a hierarchy of classes.
It may be convenient to group instances of all these group classes in one place:
class UserManagement: public IManagement {...};
class FileManagement: public IManagement {...};
class SessionManagement: public IManagement {...};
struct Handlers {
smartptr<IManagement> userManagement;
smartptr<IManagement> fileManagement;
smartptr<IManagement> sessionManagement;
...
Handlers():
userManagement(new UserManagement),
fileManagement(new FileManagement),
sessionManagement(new SessionManagement),
...
{}
};
Instead of new SomeClass, some template like make_unique may be used. Or, if "interchangeable protocol implementations" are needed, one of the possibilities is to use factories instead of some (or all) new SomeClass operators.
m_CommandHandlers.find() should be split into two map searches: one - to find appropriate handler in this structure, other (in the appropriate implementation of IManagement) - to find a member function pointer to the actual handler.
In addition to finding a member function pointer, HandleRequest method of any IManagement implementation may extract common parameters for its event group and pass them to event handlers (one by one if there are just several of them, or grouped in a structure if there are many).
An IManagement implementation may also contain a WriteCommonResponse method to simplify writing the response fields common to all of its event handlers.
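The two-step lookup can be sketched as follows. This is only an illustration under assumptions: Request carries a hypothetical group field next to the method name, std::string and std::unique_ptr replace UTF8String and smartptr, and only a UserManagement group with two made-up handlers is shown.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Minimal stand-ins for the question's Request and Response (assumptions).
struct Request { std::string group; std::string method; };
struct Response { bool success = false; std::string info; };

class IManagement
{
public:
    virtual ~IManagement() = default;
    // Returns false if this group has no handler for req.method.
    virtual bool HandleRequest(const Request& req, Response& res) = 0;
};

class UserManagement : public IManagement
{
public:
    UserManagement()
    {
        handlers_["add"] = &UserManagement::HandleAdd;
        handlers_["update"] = &UserManagement::HandleUpdate;
    }

    bool HandleRequest(const Request& req, Response& res) override
    {
        auto it = handlers_.find(req.method);  // second lookup: method within the group
        if (it == handlers_.end()) return false;
        WriteCommonResponse(res);
        (this->*(it->second))(res);
        return true;
    }

private:
    using Handler = void (UserManagement::*)(Response&);

    // Fields written for every handler of this group.
    void WriteCommonResponse(Response& res) { res.success = true; res.info = "user "; }
    void HandleAdd(Response& res) { res.info += "added"; }
    void HandleUpdate(Response& res) { res.info += "updated"; }

    std::map<std::string, Handler> handlers_;
};

class Handlers
{
public:
    Handlers() { groups_["user"].reset(new UserManagement); }

    void Dispatch(const Request& req, Response& res) const
    {
        auto it = groups_.find(req.group);  // first lookup: the group
        if (it != groups_.end() && it->second->HandleRequest(req, res)) return;
        res.success = false;
        res.info = "unknown or invalid instruction";
    }

private:
    std::map<std::string, std::unique_ptr<IManagement>> groups_;
};
```

Each group class then owns only the handles and helpers its own methods need, which is where the maintainability gain comes from.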
The Command Pattern is your solution to both aspects of this problem.
Use it to implement your protocol handler with a generalised IProtocol interface (and/or abstract base class), with a different class specialised for each protocol.
Then implement your commands the same way, with an ICommand interface and each command method implemented in a separate class. You are nearly there with this: split your existing methods into new specialised classes.
Wrap your requests and responses as Memento objects.
I'm looking for the best way to modify the Boost Asio HTTP Server 3 example to maintain a list of the currently connected clients.
If I modify server.hpp from the example as:
class server : private boost::noncopyable
{
public:
typedef std::vector< connection_ptr > ConnectionList;
// ...
ConnectionList::const_iterator GetClientList() const
{
return connection_list_.begin();
};
void handle_accept(const boost::system::error_code& e)
{
if (!e)
{
connection_list_.push_back( new_connection_ );
new_connection_->start();
// ...
}
}
private:
ConnectionList connection_list_;
};
Then I mess up the lifetime of the connection object such that it doesn't go out of scope and disconnect from the client because it still has a reference maintained in the ConnectionList.
If instead my ConnectionList is defined as typedef std::vector< boost::weak_ptr< connection > > ConnectionList; then I run the risk of the client disconnecting and nullifying its pointer while somebody is using it from GetClientList().
Anybody have a suggestion on a good & safe way to do this?
Thanks,
PaulH
HTTP is stateless. That makes it difficult even to define what "currently connected client" means, let alone keep track of which clients are connected at any given time. The only time there's really a "current client" is from the time a request is received to the time that request is serviced (often only a few milliseconds). A connection is not maintained even for the duration of downloading one page; rather, each item on the page is requested and sent separately.
The typical method for handling this is to use a fairly simple timeout -- a client is considered "connected" for some arbitrary length of time (a few minutes) after they send in a request. A cookie of some sort is used to identify the client sending in a particular request.
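A minimal sketch of that timeout idea, assuming a hypothetical session_tracker keyed by the cookie value; the caller passes in the current time, which keeps the logic easy to test:

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical tracker: a client counts as "connected" until it has been idle too long.
class session_tracker
{
public:
    using clock = std::chrono::steady_clock;

    explicit session_tracker(clock::duration timeout) : timeout_(timeout)
    {
        assert(timeout > clock::duration::zero());
    }

    // Record a request from the client identified by its cookie value.
    void touch(const std::string& cookie, clock::time_point now)
    {
        last_seen_[cookie] = now;
    }

    // Drop clients idle for longer than the timeout; return how many remain.
    std::size_t active(clock::time_point now)
    {
        for (auto it = last_seen_.begin(); it != last_seen_.end();)
        {
            if (now - it->second > timeout_) it = last_seen_.erase(it);
            else ++it;
        }
        return last_seen_.size();
    }

private:
    clock::duration timeout_;
    std::map<std::string, clock::time_point> last_seen_;
};
```

In a multi-threaded server the tracker would also need a lock around touch() and active().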
The rest of what you're talking about is just a matter of making sure the collection you use to hold connection information is thread safe. You have one thread that adds connections, one thread that deletes them, and N threads that use the data currently in the list. The standard collections don't guarantee any thread safety, but there are others around that do.
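For the lifetime worry in the question, one possible approach is to store weak pointers under a mutex and hand out strong pointers from lock(), so a connection cannot be destroyed while a caller is still using it. The sketch below uses std:: smart pointers and a hypothetical connection_registry; boost::shared_ptr and boost::weak_ptr offer the same lock() semantics.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <vector>

struct connection { int id; };  // stand-in for the example's connection class
using connection_ptr = std::shared_ptr<connection>;

class connection_registry
{
public:
    // Stored as weak_ptr, so the registry does not keep a connection alive.
    void add(const connection_ptr& c)
    {
        assert(c != nullptr);
        std::lock_guard<std::mutex> lock(mutex_);
        connections_.push_back(c);
    }

    // Returns strong references to the connections that are still alive
    // and forgets the expired ones.
    std::vector<connection_ptr> snapshot()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        std::vector<connection_ptr> alive;
        std::vector<std::weak_ptr<connection>> kept;
        for (const auto& weak : connections_)
        {
            if (auto strong = weak.lock())
            {
                alive.push_back(strong);
                kept.push_back(weak);
            }
        }
        connections_.swap(kept);
        return alive;
    }

private:
    std::mutex mutex_;
    std::vector<std::weak_ptr<connection>> connections_;
};
```

A caller holding the snapshot extends the lifetime of those connections only for as long as it keeps the vector, which avoids both problems described in the question.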