How to 'web enable' a legacy C++ application

I am working on a system that splits users by organization. Each user belongs to an organization. Each organization stores its data in its own database which resides on a database server machine. A db server may manage databases for 1 or more organizations.
The existing (legacy) system assumes there is only one organization. However, I want to 'scale' the application by running an 'instance' of it tied to one organization, and running several such instances on the server machine (i.e. multiple instances of the 'single organization' application, one instance per organization).
I will provide a RESTful API for each instance that is running on the server, so that a thin client can be used to access the services provided by the instance running on the server machine.
Here is a simple schematic that demonstrates the relationships:
Server 1 -> N databases (each organization has one database)
Organization 1 -> N users
My question relates to how to 'direct' RESTful requests from a client to the appropriate instance, i.e. the one handling requests from users of that organization.
More specifically, when I receive a RESTful request, it will be from a user (who belongs to an organization). How (or indeed, what is the best way) do I 'route' the request to the appropriate application instance running on the server?

From what I can gather, this is essentially a sharding problem. Regardless of how you split the instances at a hardware level (using VMs, multiple servers, all on one powerful server, etc), you need a central registry and brokering layer in your overall architecture that maps given users to the correct destination instance per request.
There are many ways to implement this, of course, so just choose one that you know, that is fast, and that will scale, since all requests will come through it. I would suggest a lightweight stateless web application backed by a simple read-only database that does the appropriate client identifier -> instance mapping, which you would load into memory/cache. To add flexibility on hardware and instance location, use (assuming Java) JNDI to store the hardware/port/etc. information for each instance, and in your identifier mapping, map the client identifier to the appropriate JNDI lookup key.
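To make that concrete, here's a rough sketch of the brokering layer, written in Python purely to keep it short (in a Java shop the lookup table would sit behind JNDI as described above). The client identifiers, addresses and ports are all invented for illustration:

    import urllib.request

    # Loaded once at startup from the read-only mapping database, then cached.
    INSTANCE_MAP = {
        "org-acme":    "http://10.0.0.5:7331",
        "org-globex":  "http://10.0.0.5:7332",
        "org-initech": "http://10.0.0.6:7331",
    }

    def route_request(client_id, path, body=None):
        """Look up the destination instance for this client and proxy the call."""
        base_url = INSTANCE_MAP.get(client_id)
        if base_url is None:
            raise KeyError("No instance registered for client %r" % client_id)
        req = urllib.request.Request(base_url + path, data=body)
        with urllib.request.urlopen(req) as resp:
            return resp.read()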

Letting the public API only specify the user sounds a little fragile to me. I would change the public API so that requests specify the organization as well as the user, and then have something trivial server-side that maps organizations to instances (e.g. organization foo -> the instance listening on port 7331).
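For example, the 'something trivial' could be little more than a lookup table plus a forward; a minimal sketch in Python, with organization names and ports made up:

    # Each organization's instance listens on its own local port.
    ORG_TO_PORT = {
        "foo": 7331,
        "bar": 7332,
    }

    def backend_for(organization):
        port = ORG_TO_PORT[organization]   # KeyError here means unknown organization
        return "http://127.0.0.1:%d" % port

    # A request for /orgs/foo/users/42 would then be forwarded to
    # backend_for("foo") + "/users/42"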

That is a very tough question indeed; simply because there are many possible answers, and which one is the best can only be determined by you and your environment.
I would write an apache module in C++ to do that. Using this book, I managed to start writing very efficient modules.
To be able to give you more solutions (maybe just setting up a Squid proxy?), you'll need to specify how you will be able to determine which server the client should be redirected to: by IP, through a GET param, through a POST XML param (like SOAP), etc.

As the other answer says, there are many ways to approach this issue. Let's assume that you DON'T have access to the legacy software's source code, which means you cannot modify it to listen on different ports for different instances.
Writing an Apache module seems VERY extreme for solving this issue (and as someone who actually just finished writing a production Apache module, I suggest avoiding it unless you are making serious money).
The approach can be as esoteric as you like. For instance, if your legacy software runs on normal Intel architecture and you have the hardware capacity, there are VM solutions where you should be able to create thin virtual machines, each running a single instance of the software, with a multiplexer to tie them all together.
If, on the other hand, you are running something like HPUX, well :-) there are other approaches. How about you give a bit more detail?
Ahmed.

Related

Create a Distributed Application using AWS

Using AWS cloud services, I would like to create a truly distributed application (i.e. where every node of the network is actually located in a different place), based on a distributed consensus algorithm (similar to Raft). Unfortunately, I don't know AWS services very well; I only have in mind what I have to do, and I still have a lot of doubts about which services I should use (after having understood how they work).
In summary (as in the figure) what I would like to do is this:
1) A user can call an API, called the "main API" in the image (providing certain parameters). This API "starts the algorithm" by communicating directly with one or more remote machines.
2) Each machine has its own API (with its own HTTP address), and for this reason it is able to communicate with all the other machines. Each of these machines runs the same consensus algorithm.
3) At the end of the algorithm (i.e. when a machine chooses the value), it writes this value into a database (DB in the image). For example, there may be a database for every single machine, but that is not mandatory.
So my question is about which AWS services are right for this purpose and above all how to best use these services for my task.
What is the best service for creating your own machines and exposing them to the world via an API? I only know (very poorly) the API Gateway service, and I have absolutely no idea how to "own" remote machines and databases.
What is the best way to associate the algorithm code with a single machine? I was thinking of using AWS Lambda (a program is called in response to a certain HTTP call on the respective API). As for the output, I thought I would write to a DynamoDB database.
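For what it's worth, the rough shape of the Lambda handler I have in mind is something like the following (the table name and attributes are just placeholders, not an existing setup):

    import json
    import boto3

    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table("ConsensusResults")   # hypothetical table

    def lambda_handler(event, context):
        # Called via API Gateway; the body carries this node's chosen value.
        body = json.loads(event.get("body") or "{}")
        node_id = body["node_id"]
        chosen_value = body["chosen_value"]      # produced by the consensus round

        table.put_item(Item={"node_id": node_id, "chosen_value": chosen_value})
        return {"statusCode": 200, "body": json.dumps({"stored": chosen_value})}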
I would like to send the main API parameters that specify the number of machines (and DBs) that will be used (maybe they could be "dynamically allocated"). Is that possible? For example, I could tell the main API that I want to execute the algorithm with 3 machines, and therefore only 3 remote machines (I don't care which) will do the work.
Thanks a lot!

Building Erlang applications for the cloud

I'm working on a socket server that'll be deployed to AWS. So far we have the basic OTP application set up, following a structure similar to the sample project in Erlang in Practice, but we wanted to avoid having a global message router because that's not going to scale well.
Having looked through the OTP design guide on Distributed Applications and the corresponding chapters (Distribunomicon and Distributed OTP) in Learn You Some Erlang, it seems the built-in distributed application mechanism is geared towards on-premise solutions where you have known hostnames and IPs and the cluster configuration is determined ahead of time. In our intended setup, however, the application will need to scale dynamically up and down, and the IP addresses of the nodes will be random.
Sorry, that's a bit of a long-winded build-up; my question is whether there are design guidelines for distributed Erlang applications that are deployed to the cloud and need to deal with all this dynamic scaling.
Thanks,
There are a few possible approaches:
In Erlang and OTP in Action, one method presented is to use one or two central nodes with known domains or IPs, and have all the other nodes connect to them to discover each other
Applications like https://github.com/heroku/redgrid/tree/logplex instead require a central Redis node where all Erlang nodes register themselves, and handle membership management that way
Third party services like Zookeeper and whatnot to do something similar
Whatever else people may recommend
Note that you're going to need to protect your communication, either by switching the distribution protocol to use SSL, or by using AWS security groups and whatnot to restrict who can access your network.
I'm just learning Erlang so I can't offer any practical advice of my own, but it sounds like your situation might require a "Resource Discovery" type of approach, as I've read about in Erlang & OTP in Action.
Erlware also have an application to help with this: https://github.com/erlware/resource_discovery
Other stupid answers in addition to Fred's smart answers include:
Using Route53 and targeting a name instead of an IP
Keeping an IP address in AWS KMS or AWS Secrets Manager, and connecting to that (nice thing about this is it's updatable without a rebuild)
Environment variables: scourge or necessary evil?
Stuffing it in a text file in an obscured, password protected s3 bucket
VPNs
Hardcoding and updating the build in CI/CD
I mostly do #2
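For what it's worth, reading the stored address back out of Secrets Manager at startup or deploy time looks roughly like this (sketched in Python with boto3 just for brevity; the secret name is made up):

    import boto3

    def lookup_peer_address(secret_id="erlang-cluster/bootstrap-node"):
        # Updating the secret changes the address without a rebuild.
        client = boto3.client("secretsmanager")
        response = client.get_secret_value(SecretId=secret_id)
        return response["SecretString"]   # e.g. "10.0.3.17" or a hostname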

Is it possible to use Django and Node.Js?

I have a Django backend set up for user logins and user management, along with my entire set of templates, which are used by visitors to the site to display HTML files. However, I am trying to add real-time functionality to my site, and I found a perfect library within Node.js that allows two users to type in a text box and have the text appear on both their screens. Is it possible to merge the two backends?
It's absolutely possible (and sometimes extremely useful) to run multiple back-ends for different purposes. However it opens up a few cans of worms, depending on what kind of rigour your system is expected to have, who's in your team, etc:
State. You'll want session state to be shared between different app servers. The easiest way to do this is to store external session state in a framework-agnostic way (see the sketch after this list). I'd suggest JSON objects in a key/value store, and you'll probably benefit from JSON Schema.
Domains/routing. You'll need your login cookie to be available to both app servers, which means either a single domain routed by Apache/Nginx or separate subdomains routed via DNS. I'd suggest separate subdomains, for the following reason:
Websockets. I may be out of date, but to my knowledge neither Apache nor Nginx supports proxying of websockets, which means if you want to use them you'll sacrifice the flexibility of using an HTTP server as an app proxy and instead expose Node directly via a subdomain.
Non-specified requirements. Things like monitoring, logging, error notification, build systems, testing, continuous integration/deployment, documentation, etc. all need to be extended to support a new type of component
Skills. You'll have to pay in time or money for the skill-sets required to manage a more complex application architecture
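To illustrate the 'State' point above, here is a minimal sketch of framework-agnostic session storage, assuming a shared Redis instance that both backends can reach (key names and TTL are illustrative):

    import json
    import redis   # pip install redis; assumes a reachable Redis server

    store = redis.Redis(host="localhost", port=6379)

    def save_session(session_id, data, ttl_seconds=3600):
        # Plain JSON blobs so both the Django and Node.js processes can read them.
        store.setex("session:" + session_id, ttl_seconds, json.dumps(data))

    def load_session(session_id):
        raw = store.get("session:" + session_id)
        return json.loads(raw) if raw else {}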
So, my advice would be to think very carefully about whether you need this. There can be a lot of time and thought involved.
Update: There are actually companies springing around who specialise in adding real-time to existing sites. I'm not going to name any names, but if you look for 'real-time' on the add-on marketplace for hosting platforms (e.g. Heroku) then you'll find them.
Update 2: Nginx now has support for Websockets
You can't merge them. You can send messages from Django to Node.js through some queue system like Redis.
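A minimal sketch of what the Django side of that might look like, assuming a shared Redis instance (the channel name and payload are made up):

    import json
    import redis

    bus = redis.Redis(host="localhost", port=6379)

    def notify_text_changed(room_id, text):
        # The Node.js process subscribes to this channel and pushes the
        # update out to connected browsers.
        bus.publish("textbox-updates", json.dumps({"room": room_id, "text": text}))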
If you really want to use two backends, you could use a database that is supported by both backends.
Though I would not recommend it.

Web Services Architecture - Multiple services & multiple database connections?

Could someone please direct me to some good documentation or feedback on best practices for implementing web services in an application that handles different concerns? For example, should I create different services, one that handles security (AuthService), one that handles data entry for customer service reps (CRUDService), a BillingService, and so on, or should I just encapsulate all these "services" into one, e.g. ApplicationService? Basically, I am asking if it is bad design to create multiple services (files) within one application. Can some of you comment on your experiences here?
Also, let's say three of the listed services from above connect to the same database, but are actually hitting totally different concerns, e.g. one is for all transactions like CRUD, and the other one is for purely reporting purposes. Should I create two services here, one CRUDService and the other for ReportingService? Is it bad to create two different database connections via these 2 services? Or how can I share the same database connection with different services?
I think there is a tendency among publicly available services to just dump everything into one service, which may not be a bad idea for a publicly available API; it just makes things easier for developers. However, for any project I work on, I try to break things down into logical groups. This way your client doesn't need to inherit functionality it may not need. Updating services also becomes a slightly easier task, because you're only affecting a certain subset of your web service framework and not everything. So if your service contract breaks and your clients no longer support it, they may still be able to use other parts of your system, just not that particular one. Whereas if you break a contract on your aggregated service, everything fails. Finally, if you have to implement something like fail-over support, you have more flexibility to choose which service requires more fail-over nodes, allowing you to better manage your resource allocation.
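As a purely illustrative sketch (not part of the answer above): splitting into logical services does not have to mean duplicate database connections, since the separate services can share one connection pool. Shown here in Python with SQLAlchemy; the connection string and table are placeholders:

    from sqlalchemy import create_engine, text

    # One pooled engine shared by every service in the process.
    engine = create_engine("postgresql://user:pass@dbhost/appdb", pool_size=10)

    class CRUDService:
        def update_customer(self, customer_id, name):
            with engine.begin() as conn:   # transactional connection from the pool
                conn.execute(text("UPDATE customers SET name = :n WHERE id = :i"),
                             {"n": name, "i": customer_id})

    class ReportingService:
        def customer_count(self):
            with engine.connect() as conn:  # read-only work, same pool
                return conn.execute(text("SELECT count(*) FROM customers")).scalar()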
If you want best practices, take a look at the SOA Design Pattern Catalog

Best approach(es) or technolog(y/ies) for this specific problem?

I have a web-based interface for handling invoices, customer records and other transaction records, which currently interacts with a database, stored on the same machine, containing all of the aforementioned. As you can imagine, this is quite a simple set-up consisting of a web app (PHP) and a database (MySQL). However, the ideal scenario is to keep the records on the machine they are currently on (easy) and move the web app to another server within the same network (again, easy), but in addition, provide facilities on a public-facing website for managing accounts by customers and so forth. The problem is this: the public-facing web server is located in a completely separate location, as it is a dedicated server provided by a well-known ISP.
What would be the best way to make the records accessible from this other server whilst ensuring that all communications are secure? Speed is not a huge factor, although any outages on either side should be handled gracefully. Initially my thoughts went towards web services (XML-RPC/SOAP/Hessian), but these options seem to present difficulties (security being the main one, overcomplexity as well).
The web-app must remain PHP-based. The public-facing site is likely to be PHP-based as well, although Python (likely using Django) is another option. The introduction of any other technologies (Java etc) is not a problem, although it is preferred if they be Linux-friendly (so .NET would not be the best fit here).
Apologies if this question is somewhat verbose and vague. I am testing the water somewhat in regards to this kind of problem. Any advice or suggestions gratefully received.
I've done something similar. You can expose a web service to the internet that will do the database access, but requests to the service must match a strong hashed and salted password (which will be secured on the ISP's server in the DMZ).
Either this or some sort of public/private key encryption scheme.
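A minimal sketch of that check, written in Python just for brevity (the real service could equally be PHP); the salt size and iteration count are illustrative:

    import hashlib
    import hmac
    import os

    def make_credential(secret):
        # Run once when provisioning; store the salt and hash, never the secret itself.
        salt = os.urandom(16)
        digest = hashlib.pbkdf2_hmac("sha256", secret.encode(), salt, 100000)
        return salt, digest

    def request_is_authorised(presented_secret, salt, stored_digest):
        candidate = hashlib.pbkdf2_hmac("sha256", presented_secret.encode(), salt, 100000)
        return hmac.compare_digest(candidate, stored_digest)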
OK, this might seem a bit silly, but what if you just used MySQL replication?
Instead of using all sorts of fancy web services, just have a master SQL server on one machine, then have it replicate to another server that holds the slave SQL server as well as the web app.