Sharing data between isolated microservices - web-services

I'd like to use the microservices architectural pattern for a new system, but I'm having trouble figuring out how to share and merge data between the services when the services are isolated from each other. In particular, I'm thinking of returning consolidated data to populate a web app UI over HTTP.
For context, I'm intending to deploy each service to its own isolated environment (Heroku) where I won't be able to communicate internally between services (e.g. via //localhost:PORT. I plan to use RabbitMQ for inter-service communication, and Postgres for the database.
The decoupling of services makes sense for CREATE operations:
Authenticated user with UserId submits 'Join group' webform on the frontend
A new GroupJoinRequest including the UserId is added to the RabbitMQ queue
The Groups service picks up the event and processes it, referencing the user's UserId
However, READ operations are much harder if I want to merge data across tables/schemas. Let's say I want to get details for all the users in a certain group. In a monolithic design, I'd just do a SQL JOIN across the Users and the Groups tables, but that loses the isolation benefits of microservices.
My options seem to be as follows:
Database per service, public API per service
To view all the Users in a Group, a site visitor gets a list of UserIDs associated with a group from the Groups service, then queries the Users service separately to get their names.
Pros:
very clear separation of concerns
each service is entirely responsible for its own data
Cons:
requires multiple HTTP requests
a lot of postprocessing has to be done client-side
multiple SQL queries can't be optimized
Database-per-service, services share data over HTTP, single public API
A public API server handles request endpoints. Application logic in the API server makes requests to each service over a HTTP channel that is only accessible to other services in the system.
Pros:
good separation of concerns
each service is responsible for an API contract but can do whatever it wants with schema and data store, so long as API responses don't change
Cons:
non-performant
HTTP seems a weird transport mechanism to be using for internal comms
ends up exposing multiple services to the public internet (even if they're notionally locked down), so security threats grow from greater attack surface
Database-per-service, services share data through message broker
Given I've already got RabbitMQ running, I could just use it to queue requests for data and then to send the data itself. So for example:
client requests all Users in a Group
the public API service sends a GetUsersInGroup event with a RequestID
the Groups service picks this up, and adds the UserIDs to the queue
The `Users service picks this up, and adds the User data onto the queue
the API service listens for events with the RequestID, waits for the responses, merges the data into the correct format, and sends back to the client
Pros:
Using existing infrastructure
good decoupling
inter-service requests remain internal (no public APIs)
Cons:
Multiple SQL queries
Lots of data processing at the application layer
harder to reason about
Seems strange to pass large quantities around data via event system
Latency?
Services share a database, separated by schema, other services read from VIEWs
Services are isolated into database schemas. Schemas can only be written to by their respective services. Services expose a SQL VIEW layer on their schemas that can be queried by other services.
The VIEW functions as an API contract; even if the underlying schema or service application logic changes, the VIEW exposes the same data, so that
Pros:
Presumably much more performant (single SQL query can get all relevant data)
Foreign key management much easier
Less infrastructure to maintain
Easier to run reports that span multiple services
Cons:
tighter coupling between services
breaks the idea of fundamentally atomic services that don't know about each other
adds a monolithic component (database) that may be hard to scale (in contrast to atomic services which can scale databases independently as required)
Locks all services into using the same system of record (Postgres might not be the best database for all services)
I'm leaning towards the last option, but would appreciate any thoughts on other approaches.

To evaluate the pros and cons I think you should focus on what microservices architecture is aiming to achieve. In my opinion Microservices is architectural style aiming to build loosely couple applications. It is not designed to build high performance application so scarification of performance and data redundancy are something we are ready accept when we decided to build applications in a microservices way.
I don't think you services should share database. Tighter coupling scarify the main objective of the microservices architecture. My suggestion is to create a consolidated data service which pick up the data changes events from all the other services and update the database behind it. You might want to design the database behind the consolidated data service in a way that is optimised for query (like a data warehouse) because that's all this service will be used for. You might want to consider using a NoSQL database to support your consolidated data service.

Related

Distributed transaction using web services on different platforms

is it possible to achieve transactionality using web services that are implemented in different technologies?
For example: let's imagine a case where we want to offer an integrated service of 2 different organizations, each of which already have their different systems implemented using different technologies and located in different countries.
Organization A has a Java server exposing Rest services that allow consumers to insert data in a table, then Organization B has a .Net server exposing Rest services that also allow consumers to insert data in a table.
Then I want to create a new server to integrate both these services in one, allowing consumers to insert in both organization's databases. So from this new server I have to invoke those 2 rest services in a transactional way (meaning that both organizations will insert or none will insert if there is a failure, it will rollback).
Is that possible to achieve even tho server 1 and server 2 are implemented with different technologies? What if there were n servers all implemented in different technologies and all exposing Rest services?
It is not possible to have real transactions in case of microservice architecture. You have to implement compensations which roll back changes in case of failures. See Saga pattern for one of the approaches.
Look at the Cadence Workflow that makes implementing Sagas trivial. Here is an example.

Implementing a simple Restful service to store and retrieve data using AWS API Gateway/Lambda

I'm new to AWS, so apologies in advance if this question is missing some important considerations, or has incorrect assumptions.
But basically I want to implement a service on AWS to store and retrieve data from multiple clients, which may be Android apps, Windows applications, websites etc. The way I've considered doing this is via a RESTful service using API Gateway front end, with a Lambda back end and maybe an S3 bucket to hold the data.
The basic requirements are:
(1) Clients can publish data to the server, where it is stored, perhaps with some kind of key/value structure.
(2) Clients can retrieve said data by key.
(3) If it is possible, clients to be able to subscribe to events from the service, so that they are notified if the value of a piece of data changes. This would avoid the need to poll the service, which would presumably start racking up unnecessary charges if the data doesn't change often.
Any pointers on how to get started with this welcome!
Creating a RESTful API on top of Lambda and API Gateway is one of the main use cases for this architecture. You can think of Lambda functions as controllers with methods and API Gateway as a router that forwards requests to functions based on the URL pattern. There are many frameworks and approaches that can help out here if you don't want to write from scratch:
Lambdasync
https://medium.com/#fredrikanderzon/create-a-rest-api-on-aws-lambda-using-lambdasync-e46c68f8043f
Serverless
https://serverless.com/framework/docs/providers/aws/events/apigateway/
Swagger
https://cloudonaut.io/create-a-serverless-restful-api-with-api-gateway-swagger-lambda-and-dynamodb/
As far as event subscriptions go (requirement #3) you can model this in many datastores, certainly in a relational/SQL database, with a table like this:
Subscription (key_of_interest, user_id, events_of_interest)
I'm leaving out data types for you to figure out, but you get the idea hopefully. After each data modification on a particular key, see if that key is of interest in the subscription table, then wire up a response to the user's who indicated interest. The details of this of course depend on your particular requirements. A caution though: this approach will increase the cost of data modifications because of the additional overhead needed to process subscriptions.
EDIT: One other thing I forgot. S3 is better suited for non-structured data (think 'files'). For relational databases, checkout RDS. For a simple NoSQL database you might use DynamoDB, or host your own NoSQL database of choice on an EC2 instance.

Is DAO microservice good approach in microservices architecture?

I'm creating a web-application and decided to use micro-services approach. Would you please tell me what is the best approach or at least common to organize access to the database from all web-services (login, comments and etc. web-services). Is it well to create DAO web-service and use only it to to read/write values in the database of the application. Or each web-service should have its own dao layer.
Each microservice should be a full-fledged application with all necessary layers (which doesn't mean there cannot be shared code between microservices, but they have to run in separate processes).
Besides, it is often recommended that each microservice have its own database. See http://microservices.io/patterns/data/database-per-service.html https://www.nginx.com/blog/microservices-at-netflix-architectural-best-practices/ Therefore, I don't really see the point of a web service that would only act as a data access facade.
Microservices are great, but it is not good to start with too many microservices right away. If you have doubt about how to define the boundaries between microservices in your application, start by a monolith (all the time keeping the code clean and a good object-oriented with well designed layers and interfaces). When you get to a more mature state of the application, you will more easily see the right places to split to independently deployable services.
The key is to keep together things that should really be coupled. When we try to decouple everything from everything, we end up creating too many layers of interfaces, and this slows us down.
I think it's not a good approach.
DB operation is critical in any process, so it must be in the DAO layer inside de microservice. Why you don't what to implement inside.
Using a service, you loose control, and if you have to change the process logic you have to change DAO service (Affecting to all the services).
In my opinion it is not good idea.
I think that using Services to expose data from a database is ideal due to the flexibility it provides. Development of a REST service to expose some or all of your data as a service provides flexibility to consume the data directly to the UI via AJAX or by other services which can process the data and generate new information. These consumers do not need to implement a DAO and can be in any language. While a REST Service of your entire database is probably not a Micro-Service, a case could be made for breaking this down as Read only for Students, Professors and Classes for exposing on the School Web site(s), with different services for Create, Update and Delete (CUD) available only to the Registrars office desktop applications.
For example building a Service to exposes a statistical value on data will protect the data from examination by a user/program who only needs a statistical value without the requirement of having the service implement an entire DAO for the components of that statistic. Full function databases like SQL Server or Oracle provide a lot of functionality that application developers can use, including complex queries(using indexes), statistics the application of set operations on data.
Having a database service is a completely valid pattern. In fact, this is one of the key examples of where to start to export aspects of a monolith to a micro service in the Building Microservices book.
How to organize your code around such idea is a different issue. Yes, from the db client programmer's stand point, having the same DAO layer on each DB client makes a lot of sense.
The DAO pattern may be suitable to bind your DB to one programming language that you use. But then you need to ask yourself why you are exposing your database as a web service if all access to it will be mediated by the same DAO infrastructure. Or are you going to create one DAO pattern for each client programming language binding?
If all database clients are going to be written on the same programming language, then are you sure you really need to wrap your DB as a microservice? After all, the DB is usually already a remote service with a well-defined network protocol optimized to transfer data fast and reliably. Why adding HTTP on top of it? What are you expecting to gain from adding such complexity?
Another problem with using the DAO pattern is that the DAO structure does not necessarily follow the evolution of the web service. The web service may evolve in a way that does not make old clients incompatible. You may have different clients using different features of the micro service. In this case you are not sharing the same DAO layer structure on each client.
Make sure you are not using RPC-style programming over web services, which does not make much sense. You will be basically throwing away one of the key advantages of micro services, which is the decoupling between service and client.

Creating API for application interaction

We have an internal application. As time went on and new applications were requested, that exchange data between eachother, the interaction became bound to the database schema. Meaning changes in the database require changes everywhere else. As we plan to build even more applications that will depend on the same data this quickly will become and unmanagable mess.
Now i'm looking to abstract that interaction behind an API. Currently i have trouble choosing the right tool.
Interaction at times could be complex, meaning data is posted to one service and if the action has been completed it should notify the sender of that.
Another example would be that some data does not have context without the data from other services. Lets say there is one service for [Schools] and one for [Students]. So if the [School] gets deleted or changed the [Student] needs to be informed about it immeadetly and not when he comes to [School].
Advice? Suggestions? SOAP/REST/?
I don't think you need an API. In my opinion you need an architecture which decouples your database from the domain logic and other parts of the application. Such an architecture is for example clean architecture, onion architecture and hexagonal architecture (ports&adapters by new name). They share the same concepts, you have a domain logic, which does not depend from any framework, external lib, delivery method, data storage solutions, etc... This domain logic communicates with the outside world through adapters having well defined interfaces. If you first design the inside of your domain logic, and the interfaces of the adapters, and just after the outside components, then it is called domain driven design (DDD).
So for example if you want to move from MySQL to MongoDB you already have a DataStorageInterface, and the only thing you need is writing a MongoDBAdapter which implements this interface, and ofc migrate the data...
To design the adapters you can use two additional concepts; command and query segregation (CQRS) and event sourcing (ES). CQRS is for connecting delivery methods like REST, SOAP, webapplications, etc... to the domain logic. For example you can raise a CreateUserCommand from your REST API. After that the proper listener in the domain logic processes that command, and by success it raises a domain event, like UserCreatedEvent. Your REST API can listen to that event and respond with a success message to the REST client. The UserCreatedEvent can be listened by one or more storage adapter too. So they can process that event and persist the new user. You don't necessary use only a single database. For example if a relational database is faster by a specific type of query, then you can use that, but if a noSQL database suites better to the job, then you can use that too. So you can use as many databases as you want for your queries, the only thing you need is writing a storage adapter for them. For example if your REST client wants to retrieve the profile of a specific user, then it can raise a GetUserProfileByIdQuery and the domain logic can ask the adapter of a database which can serve the query. After that the adapter can send for example an SQL query to a MySQL database and return the response. By ES you add EventStorage to your system, which stores the raised domain events. It can be very useful if you want to migrate your data from one query database to another. In that case you create a new storage adapter to your new database, and replay all of the domain events from the EventStorage in historical order to that adapter, so it can fill the new database with the relevant data. That's all, you don't have to write complicated migration scripts...
In your case I think your should create at least domain events, and use event sourcing. That will totally decouple your database from the other parts of your application. Adding a REST or SOAP API can have a similar effect, but building HTTP connections to access your database can slow down your application.

What is the difference between a web service and application layer of code in an application server

Hello I am a newbie to n-tier architecture and was trying to find out the difference between what an application server hosting application layer of code does, and what a web service does?
So I'll tell you people my understanding of the whole n-tier concept, we have the UI -> Web Server -> business logic/application logic on an Application server -> Database Server. (Of course load balancers and multiple server instances would also be existent to fasten and store the state of processes)
But to be specific, the business logic layer would not be tied to a UI, so it is more or less independent and can be reused.
A web service on an other hand too provides functionality similar to the business logic, where it is not tied to a UI, and can be reused for different cases.
Could anybody explain if what I just explained above is right? And as I did mention earlier, I am a newbie to this, so if this sounds stupid or naive please do not bash me :)
Here's a quick, dirty, and very general explanation of a 4-tier architecture, which I'm assuming may best apply to your application:
Presentation Layer : The interface to the outside world (web site)
Application Layer : The mechanics necessary to create the interface(s) to the outside world (web application frameworks, web services)
Business Logic Layer : The actual logic that embodies/simulates/emulates your business's processes and workflows (algorithms, transformations, approval processes, etc.)
Database Layer : The database and the logic needed to query information from it
In general, web services are not part of the business logic layer. That layer is usually protected as much as the database layer, because there could be trade secrets or confidential ways of doing things in there, and you usually don't want anyone accessing that directly, except programatically or through approved interfaces (such as web services).
Web services, application layers, and business logic can be aptly compared to Coca-Cola and it's business. Bottles and cans are usually how Joe Blow consumes Coke's product (e.g. web site in the presentation layer), but other businesses want to be able to serve Coca-Cola to their customers as well, so Coke lets them use carbonated water and Coca-Cola syrup (e.g. web services in the application layer). Coke's secret formula (e.g. business logic layer), and Coke's distribution processes to get it into the store (e.g. application layer) are all hidden from the consumer. Joe Blow doesn't care how it gets into the store, he just knows he can get Coke from a variety of sources (web site, mobile client, etc.). And Coke doesn't want people knowing its secret formula (business logic). If you want a Coke, you have to go through a store or a restaurant (approved interfaces).