Create a Distributed Application using AWS - amazon-web-services

Using AWS cloud services, I would like to create a truly distributed application (i.e. one where every node of the network is actually located in a different place), based on a distributed consensus algorithm (similar to Raft). Unfortunately I don't know AWS services very well; I only have in mind what I need to do, and I still have a lot of doubts about which services I should use (after understanding how they work).
In summary (as in the figure) what I would like to do is this:
1) A user can call an API, labelled "main API" in the image (providing certain parameters). This API "starts the algorithm" by communicating directly with one or more remote machines.
2) Each machine has its own API (with its own HTTP address), so it is able to communicate with all the other machines. Each of these machines runs the same consensus algorithm.
3) At the end of the algorithm (i.e. when a machine chooses the value), it writes this value into a database (DB in the image). For example, there could be a database for every single machine, but that is not mandatory.
So my question is about which AWS services are right for this purpose and, above all, how best to use these services for my task.
What is the best service for creating my own machines and exposing them to the world via an API? I only know (very poorly) the API Gateway service, and I have no idea how to "own" remote machines and databases.
What is the best way to associate the algorithm code with a single machine? I was thinking of using AWS Lambda (a program is invoked in response to a certain HTTP call on the respective API). As for the output, I was thinking of writing it to a DynamoDB table.
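Roughly, I imagine each node's Lambda looking something like this (just a sketch of the idea; the table name, key names and the consensus stub are placeholders I made up):

    # Sketch of a per-node Lambda: run the algorithm, persist the decided value.
    # Assumes a hypothetical DynamoDB table "consensus-results" with partition key "node_id".
    import boto3

    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table("consensus-results")

    def run_consensus(event):
        # placeholder for the Raft-like algorithm; here it just echoes a proposal
        return event.get("proposed_value")

    def handler(event, context):
        chosen_value = run_consensus(event)
        # once this node has decided, write the value to its DB
        table.put_item(Item={"node_id": event["node_id"], "value": chosen_value})
        return {"statusCode": 200, "body": str(chosen_value)}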
I would like to send the main API parameters that specify the number of machines (and DBs) to be used (maybe they could be "dynamically allocated"). Is that possible? For example, I could tell the main API that I want to execute the algorithm with 3 machines, and therefore only 3 remote machines (I don't care which) will do the work.
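And the main API would do something like this (again only a sketch; the worker function name and payload fields are invented):

    # Sketch of the "main API": fan out to N worker Lambdas asynchronously.
    import json
    import boto3

    lambda_client = boto3.client("lambda")

    def main_handler(event, context):
        n = int(event.get("machines", 3))
        for node_id in range(n):
            lambda_client.invoke(
                FunctionName="consensus-node",   # hypothetical worker function
                InvocationType="Event",          # asynchronous, fire-and-forget
                Payload=json.dumps({"node_id": node_id, "peers": n}),
            )
        return {"statusCode": 202, "body": "started {} nodes".format(n)}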
Thanks a lot!

Related

Design service on GCP

In Google Cloud Platform I want to write an application that will take an HTTP request, hit APIs in a chain, and then show a template based on the responses received from the APIs, populating it with the data they return. There are many templates.
What is the best way to design this on GCP, considering the below?
1. The application will receive huge traffic.
2. Some APIs will return dynamic URLs that the template needs.
I was thinking of writing it in Java and putting it on Kubernetes, which would manage the traffic. But what should be the choice of database?
The data is mostly key-value pairs and should be highly available; in case it goes down, some backup should be there.
Yes, Kubernetes is one option. Something else you may want to consider to handle huge app traffic is Google App Engine (GAE); since you mentioned Java development, you can use the GAE Standard environment, which is easy to build and deploy on and runs reliably even under heavy load (fully managed).
You may also want to consider Cloud Datastore since, based on your description, it is the best fit for the application's needs (a NoSQL database that automatically handles sharding and replication). You can also use the storage-options decision diagram in the GCP documentation to choose the best storage option.
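To get a feel for how simple the store/retrieve side is, here is a rough sketch with the Datastore Python client (the Java client follows the same shape; the kind and property names are just examples):

    # Save and load a key-value style entity in Cloud Datastore.
    from google.cloud import datastore

    client = datastore.Client()

    def save(kind, name, values):
        entity = datastore.Entity(key=client.key(kind, name))
        entity.update(values)
        client.put(entity)

    def load(kind, name):
        return client.get(client.key(kind, name))

    save("Template", "landing-page", {"url": "https://example.com/t/1"})
    print(load("Template", "landing-page"))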

Amazon Web Services and non-amazon website

I'm confused about some facets of the Amazon Web Services stuff. Here is what I want to do.
My site lets users enter equations and solve them. Some of the equations will deal with large data sets and math that is too computationally expensive for the browser.
My site will look at each equation and determine if it should be solved in the browser or on a server.
If it needs to be solved on the server, I want to do one of two things: either send the data and a function and have AWS run that code on the data, or have preset code on the server that is simply given the data.
AWS then runs the code and returns a JSON of the solution.
For example, let's say that a user has a numeric matrix of 1,000 by 1,000 and wants to take the inverse or do Gaussian elimination. My code would look at the size of the matrix and decide that it needs to be run on the server. The code would then call my function on AWS to solve this, send it the data, and AWS would return the answer.
From what I've read, I don't understand exactly how to set up EC2 so that a function can be called from a server or from an AJAX call. Does AWS not do what I think it does? Do I need to host my site on AWS to do this?
If it matters, I am running a LAMP stack on Hostmonster.
You can use Amazon EC2 to create a server (eg a web server) that is accessible on the Internet. What you load on the server, and how you use the server, is up to you.
There is no functionality provided by Amazon EC2 that would help you for your specific stated use case. Anything you would run on a "normal" server can be run on Amazon EC2, since it is just a virtual machine running an operating system and whatever software you configure.
From your description, you will need to develop a web app that runs mostly in the browser (eg with JavaScript), but also makes calls to a back-end server. How you do that is totally in your control.
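For instance, the back-end piece could be as small as a JSON endpoint running on the EC2 instance; a sketch (assuming Flask and NumPy are installed there, route and port chosen arbitrarily):

    # Minimal compute endpoint: accept a matrix as JSON, return its inverse as JSON.
    from flask import Flask, request, jsonify
    import numpy as np

    app = Flask(__name__)

    @app.route("/invert", methods=["POST"])
    def invert():
        matrix = np.array(request.get_json()["matrix"], dtype=float)
        return jsonify({"inverse": np.linalg.inv(matrix).tolist()})

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=8080)

Your page's JavaScript would then POST the matrix to that URL with an AJAX call and render the returned JSON.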

C++ runtime API

I want to create an application that, when executed, has runtime functions that are accessible by other applications.
For example, a C++ application that stores values in files and retrieves that information. While this application is running, any other C++ application could access its save and retrieve functionality to save and retrieve data, but would have no other connection to the system.
Sounds like a simple job for web services, or a remote database, or even an LDAP server.
Store and retrieve are operations common to all of these.
If the goal is to learn some specific technology, then ask a more specific question. Otherwise, don't reinvent any wheels. There are plenty of things out there for store and retrieve.
One of the simplest "store and retrieve" APIs I know of is Berkeley DB or Sleepycat.
We built a giant, clustered, simple key based database for a major telecom company using LDAP on top of Berkeley DB (aka Sleepycat). All open-source software and commodity hardware and it supports mission critical operations for millions of customers.
A more modern rendition of this might use memcached as well.
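To get a feel for that store/retrieve model without leaving the standard library, here is the same idiom with Python's dbm module (Berkeley DB proper exposes the equivalent API in C/C++/Java):

    # A local key-value store in a few lines; dbm picks whatever DBM-style
    # backend (GDBM, NDBM, ...) is available on the system.
    import dbm

    with dbm.open("store.db", "c") as db:   # "c" = create the file if missing
        db[b"customer:42"] = b'{"name": "Alice", "balance": 10.5}'
        print(db[b"customer:42"])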
If you go HTTP-based, you can use something as simple as libcurl against an Apache web server to implement "RESTful" services with GET and PUT commands.
If you run it locally (same server) and access it via localhost (127.0.0.1), there is very little latency in the TCP stack, and it amounts to little more than memcpys at the kernel level.
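As a sketch of that GET/PUT shape (shown with Python's standard library just to keep it short; the client side could equally be libcurl from C++):

    # Tiny localhost key-value service: PUT stores a value under the request
    # path, GET reads it back.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    store = {}

    class KVHandler(BaseHTTPRequestHandler):
        def do_PUT(self):
            length = int(self.headers.get("Content-Length", 0))
            store[self.path] = self.rfile.read(length)
            self.send_response(204)
            self.end_headers()

        def do_GET(self):
            value = store.get(self.path)
            self.send_response(200 if value is not None else 404)
            self.end_headers()
            if value is not None:
                self.wfile.write(value)

    HTTPServer(("127.0.0.1", 8000), KVHandler).serve_forever()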
Simple message passing would also do: say, JSON over ØMQ, or, all in all, something like msgpack-rpc, protobuf-remote, or Cap'n Proto RPC.

How to 'web enable' a legacy C++ application

I am working on a system that splits users by organization. Each user belongs to an organization. Each organization stores its data in its own database which resides on a database server machine. A db server may manage databases for 1 or more organizations.
The existing (legacy) system assumes there is only one organization. However, I want to 'scale' the application by running an 'instance' of it (tied to one organization) and running several instances on the server machine (i.e. multiple instances of the 'single organization' application, one per organization).
I will provide a RESTful API for each instance that is running on the server, so that a thin client can be used to access the services provided by the instance running on the server machine.
Here is a simple schematic that demonstrates the relationships:
Server 1 -> N databases (each organization has one database)
Organization 1 -> N users
My question relates to how to 'direct' RESTful requests from a client, to the appropriate instance that is handling requests from users for that organization.
More specifically, when I receive a RESTful request it will come from a user (who belongs to an organization); what is the best way to 'route' that request to the appropriate application instance running on the server?
From what I can gather, this is essentially a sharding problem. Regardless of how you split the instances at a hardware level (using VMs, multiple servers, all on one powerful server, etc), you need a central registry and brokering layer in your overall architecture that maps given users to the correct destination instance per request.
There are many ways to implement this of course, so just choose one that you know and is fast, and will scale, as all requests will come through it. I would suggest a lightweight stateless web application backed by a simple read only database that does the appropriate client identifier -> instance mapping, which you would load into memory/cache. To add flexibility on hardware and instance location, use (assuming Java) JNDI to store the hardware/port/etc information for each instance, and in your identifier mapping map the client identifier to the appropriate JNDI lookup key.
Letting the public API only specify the user sounds a little fragile to me. I would change the public API so that requests specify organization as well as user, and then have something trivial server-side that maps organizations to instances (eg. organization foo -> instance listening on port 7331).
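A sketch of that trivial mapping, written as a small routing front end (the organizations, ports and framework choice are only illustrative):

    # Look up the organization from the URL and forward the request to the
    # instance registered for it.
    from flask import Flask, Response, request
    import requests

    ORG_TO_PORT = {"foo": 7331, "bar": 7332}   # would come from a registry/DB in practice

    app = Flask(__name__)

    @app.route("/<org>/<path:rest>", methods=["GET", "POST", "PUT", "DELETE"])
    def route(org, rest):
        port = ORG_TO_PORT.get(org)
        if port is None:
            return Response("unknown organization", status=404)
        upstream = requests.request(
            request.method,
            "http://127.0.0.1:{}/{}".format(port, rest),
            params=request.args,
            data=request.get_data(),
        )
        return Response(upstream.content, status=upstream.status_code)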
That is a very tough question indeed; simply because there are many possible answers, and which one is the best can only be determined by you and your environment.
I would write an Apache module in C++ to do that. Using this book, I managed to start writing very efficient modules.
To be able to give you more solutions (maybe just setting up a Squid proxy?), you'll need to specify how you will determine which server the client should be redirected to: by IP, through a GET param, through a POST XML param (like SOAP), etc.
As the other answer says, there are many ways to approach this issue. Let's assume that you DON'T have access to the legacy software's source code, which means you cannot modify it to listen on different ports for different instances.
Writing an Apache module seems VERY extreme for solving this (and as someone who just finished writing a production Apache module, I suggest avoiding it unless you are making serious money).
The approach can be as esoteric as you like. For instance, if your legacy software runs on normal Intel architecture and you have the hardware capacity, there are VM solutions: you could create thin virtual machines, each running a single instance of the software, with a multiplexer to tie them all together.
If, on the other hand, you are running something like HP-UX, well :-) there are other approaches. How about you give a bit more detail?
Ahmed.

Best approach(es) or technolog(y/ies) for this specific problem?

I have a web-based interface for handling invoices, customer records and other transaction records, which currently interacts with a database of all the aforementioned stored on the same machine. As you can imagine, this is quite a simple set-up consisting of a web app (PHP) and a database (MySQL). However, the ideal scenario is to keep the records on the machine they are currently on (easy) and move the web app to another server within the same network (again, easy), but in addition to provide facilities on a public-facing website for customers to manage their accounts and so forth. The problem is this: the public-facing web server is located in a completely separate location, as it is a dedicated server provided by a well-known ISP.
What would be the best way to make the records accessible from this other server whilst ensuring that all communications are secure? Speed is not a huge factor, although any outages on either side should be handled gracefully. Initially my thoughts went towards web services (XML-RPC/SOAP/Hessian), but these options seem to present difficulties (security being the main one, over-complexity as well).
The web app must remain PHP-based. The public-facing site is likely to be PHP-based as well, although Python (likely using Django) is another option. The introduction of other technologies (Java etc.) is not a problem, although it is preferred that they be Linux-friendly (so .NET would not be the best fit here).
Apologies if this question is somewhat verbose and vague. I am testing the water somewhat in regards to this kind of problem. Any advice or suggestions gratefully received.
I've done something similar. You can expose a web service to the internet that will do the database access, but requests to the service must match a strong hashed and salted password (which will be secured on the ISP's server in the DMZ.)
Either this or some sort of public/private key encryption scheme.
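For example, the two servers could share a secret and sign every request with it; a rough sketch using HMAC (the names are illustrative, and PHP's hash_hmac can produce and verify the same signatures):

    # Sign a request body with a shared secret and verify it on the other side.
    import hashlib
    import hmac

    SHARED_SECRET = b"stored-securely-on-both-servers"

    def sign(body):
        return hmac.new(SHARED_SECRET, body, hashlib.sha256).hexdigest()

    def verify(body, signature):
        # constant-time comparison to avoid timing attacks
        return hmac.compare_digest(sign(body), signature)

    token = sign(b'{"customer_id": 42}')
    assert verify(b'{"customer_id": 42}', token)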
OK, this might seem a bit silly, but what if you just used mysql replication?
Instead of using all sorts of fancy web services, just have a master SQL server on one machine, then have it replicate to another server that holds the slave SQL server as well as the web app.
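A sketch of the classic setup (host names, user and password are placeholders, and the exact statements vary a little between MySQL versions):

    # master my.cnf
    [mysqld]
    server-id = 1
    log_bin   = mysql-bin

    -- on the master
    GRANT REPLICATION SLAVE ON *.* TO 'repl'@'slave-host' IDENTIFIED BY 'secret';

    # slave my.cnf
    [mysqld]
    server-id = 2

    -- on the slave (file/position come from SHOW MASTER STATUS on the master)
    CHANGE MASTER TO MASTER_HOST='master-host', MASTER_USER='repl',
      MASTER_PASSWORD='secret', MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=4;
    START SLAVE;

The web app on the slave's machine can then read from its local copy, while writes would still have to go back to the master.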