Web services, architectural design advice for central logging - web-services

We have a number of SOAP and REST web services that provide legal information to clients. Management wants us to log all the information requested through these services, so that the logs can be used to collect statistics and bill clients.
My colleague proposed using a central relational database for logging.
I don't like this solution, because the number of services is growing and I think such an architecture will become a bottleneck.
Can you advise what architectural design would suit this kind of task?

When you say the central database will be a bottleneck, do you mean that it will be too slow to keep up with the logging requests? Or are you saying you are expecting database changes for each logging request type?
I would define a more generic payload for logging (figure out your minimum standardized fields), and then create a database for those logs.
<log><loglevel>INFO</loglevel><systemName>ClientValueActualizer</systemName><userIp>123.123.123.432</userIp><logpayload><![CDATA[useful payload for billing]]></logpayload></log>
If you are worried about capacity, you could throw a queue in front of it, which would have the advantage of not bogging down the client if the logs are busy.
You can decouple the consumption of these messages into separate systems, each of which can understand the various payloads. The risk here is that if you want to add new attributes, it will be difficult to control which systems are sending what. But that's just a general issue with decoupled services.
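A minimal sketch of the "queue in front of the database" idea, assuming an in-process queue and SQLite purely for illustration (a real setup would more likely use a message broker and your central database). The names LogEntry and QueuedLogWriter are hypothetical; the fields mirror the XML example above.

```python
import queue
import sqlite3
import threading
from dataclasses import dataclass


@dataclass
class LogEntry:
    # Minimum standardized fields, mirroring the XML example.
    log_level: str
    system_name: str
    user_ip: str
    payload: str  # opaque, service-specific content used for billing


class QueuedLogWriter:
    """Callers enqueue entries and return at once; one worker writes to the DB."""

    def __init__(self, db_path=":memory:"):
        self._queue = queue.Queue()
        # The connection is created here but used by the worker thread,
        # so SQLite's same-thread check has to be disabled.
        self._db = sqlite3.connect(db_path, check_same_thread=False)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS logs "
            "(log_level TEXT, system_name TEXT, user_ip TEXT, payload TEXT)"
        )
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def log(self, entry):
        # Cheap for the calling service: no database round trip here.
        self._queue.put(entry)

    def _drain(self):
        while True:
            entry = self._queue.get()
            if entry is None:  # sentinel pushed by close()
                break
            self._db.execute(
                "INSERT INTO logs VALUES (?, ?, ?, ?)",
                (entry.log_level, entry.system_name, entry.user_ip, entry.payload),
            )
            self._db.commit()

    def close(self):
        # Ask the worker to stop once everything queued so far is written.
        self._queue.put(None)
        self._worker.join()

    def count(self):
        return self._db.execute("SELECT COUNT(*) FROM logs").fetchone()[0]
```

The services only pay the cost of an enqueue; if the database is slow, the backlog grows in the queue instead of slowing down the client-facing request.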

You could consider Apache Kafka, which acts as a distributed commit log. It is a good fit performance-wise because it scales out horizontally, and it delivers messages only when a consumer pulls them, so a slow consumer is not flooded.

Related

Display real time data on website that scales?

I am starting a project where I want to create a website which will display LIVE flight information and status, like the boards we have all seen at airports. An example is given here - http://www.computronics.biz/productimages/prodairport4.jpg. As you can see, this information changes continuously. The website will talk to a backend API and this backend API will talk to the database. The important part is that the flight information in the database will be updated by the airlines themselves. There could be several airlines, each updating its own data. I have drawn a diagram and uploaded it here - https://imgur.com/a/ssw1S.
Now those airlines will obviously have an interface (website talking to some backend API) through which they will update the database.
Now here is my attempt to solve it. We need some sort of trigger so that if any airline updates a flight detail in the database for a flight between current time - 1 hour and current time + 4 hours (the website will only display a few hours of flights), we call the web API and then send the update to the website in real time. The user must not have to refresh the page at all. At the same time the website needs to scale well, i.e. if 1 million users are on the website and there is an update in the database in the correct time range, all 1 million users' pages should be updated within a decent amount of time.
I did some research and it looks like we need to have an event based approach. For example - we need to create a function (AWS lambda or Azure function) that should be called whenever there is an update in the database (Dynamo DB for example) within the correct time range. This function then should call an API which should then update the website through web socket technology for example.
I am not looking for any code but just some alternative suggestions on how this can be solved in a scalable way. Also how do we test scalability?
Don't use serverless functions (Lambda/Azure Functions)
Although I am a huge fan of serverless functions, and am currently running a full web app in Lambda, I don't think they are needed for your use case, and they don't make sense economically. As you've answered in the comments, the airlines will not write directly to the database; they'll push to an API, meaning you are explicitly told when flights have changed. When an airline has sent you new data you can simply propagate it to all the browser endpoints via websockets. This keeps the design very simple. There is no need to artificially create a database event that then triggers a function that will then tell you a flight has been updated. That's like removing your doorbell and replacing it with a motion detector that triggers a doorbell :)
Cost
Money always deserves its own section. Lambda is more of an economic breakthrough than a technological one. You have to know when it's cost effective. You pay per request, so if you're dealing with a process that handles 10,000 operations a month, or something that only fires 1,000 times a day, then Lambda is dirt cheap and practically free. You also pay for the length of time the function is executing and the memory allocated to it. Generally, it makes sense to use Lambda functions where a dedicated server would be sitting idle most of the time. So instead of a whole EC2 instance, AWS provides you with a container on demand. There are points at which high request rates and constantly running processes make Lambda more expensive than EC2. This article discusses how it is generally cheaper to use Lambda up to a point -> https://www.trek10.com/blog/lambda-cost/ The same applies to Azure Functions and Google's equivalent. They are all just containers offered on demand.
If you're dealing with flight information I would imagine you will have thousands of flights being updated every minute so your lambda functions will be firing constantly as if you were running an EC2 instance. You will end up paying a lot more than EC2. When you have a service that needs to stay up 24/7 and run 24/7 with high activity that is most certainly a valid use case for a dedicated server or servers.
Proposed Solution
These are the components I would use:
Message Queue of some sort (RabbitMQ or AWS SQS with SNS perhaps)
Web Socket Backend (The choice will depend on programming language)
Airline input API (REST,GraphQL, or maybe AWS Kinesis Data Firehose)
The airlines publish their data to a back-end API. The updates are stored on a message queue, and the web application that actually displays the results to users, via websockets, reads from the queue.
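To make the fan-out step concrete, here is a rough sketch using only Python's asyncio. It is not tied to any particular websocket library: each connected browser is modeled as an asyncio.Queue, and the assumption is that a real websocket handler would read from its queue and write to its socket. FlightBroadcaster and the update fields are hypothetical names.

```python
import asyncio


class FlightBroadcaster:
    """Fans each flight update out to every connected client.

    Each client is represented by its own asyncio.Queue. In a real
    deployment (assumption) one websocket handler per browser would
    await items from its queue and send them down the socket.
    """

    def __init__(self):
        self._clients = set()

    def connect(self):
        # Called when a browser opens a connection.
        client_queue = asyncio.Queue()
        self._clients.add(client_queue)
        return client_queue

    def disconnect(self, client_queue):
        # Called when the browser goes away.
        self._clients.discard(client_queue)

    def publish(self, update):
        # Called by whatever consumes the message queue of airline updates.
        for client_queue in self._clients:
            client_queue.put_nowait(update)
        return len(self._clients)


async def demo():
    broadcaster = FlightBroadcaster()
    first = broadcaster.connect()
    second = broadcaster.connect()
    broadcaster.publish({"flight": "XY123", "status": "DELAYED"})
    # Both simulated clients receive the same update.
    return [await first.get(), await second.get()]
```

Running asyncio.run(demo()) returns the update once per connected client. Each application instance would hold its own broadcaster for the sockets it serves.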
Scalability
For scalability you can run the websocket application on multiple EC2 instances (all reading from the same queuing service) in an autoscaling group, so that under extra load more instances are created automatically, hence the name "autoscaling". Those instances can sit behind an elastic load balancer. There is lots of AWS documentation on how to do this, and it's their flagship design pattern. If you use AWS SQS you don't have to manage the scalability details yourself; AWS handles that. The only real components to scale are your websocket application and the flight data input endpoint. You can run the flight API in an autoscaling group as well, but AWS does offer an additional tool for high-traffic data processing. I detail that below.
Testing Scalability
It would be fairly easy to have a mock airline blast your service with thousands and thousands of fake updates and on the other end you can easily run multiple threads of selenium tests simulating browser clicks and validating that the UI is still operational.
Additional tools
If it ends up being large amounts of data, rather than using a conventional REST API for your flight update service you could consider a service AWS offers specifically for dealing with large amounts of real-time updates (Kinesis Data Firehose): https://aws.amazon.com/kinesis/data-firehose/ But I've never used it.
First, please don't overthink this. This is a trivial problem to solve and doesn't require any special techniques, technologies or trendy patterns & frameworks.
You actually have three functional areas you can address almost separately.
Ingestion - Collection and normalization of the data from the various sources. For this, you'll need a process and transformation engine, LogicApps or such.
Your databases. You'll quickly learn that not all flights are the same ;). While it might seem so, the amount of data isn't that much. Instances of MySQL/SQL Server tuned for a particular function will work just fine. Hint, you don't need to have data for every movement ready to present all the time.
Presentation. The data API and UIs. This, really, is the easy part. I would suggest you use basic polling at first. For reasons you will never have any control over, the SLA for flight data is ~5 minutes so a real-time client notification system is time you should spend elsewhere at first.

Microservice Composition Approaches

I have a question for the microservices community. I'll give an example from the educational field but it applies to every microservices architecture.
Let's say I have student-service and licensing-service with a business requirement that the number of students is limited by a license. So every time a student is created a licensing check has to be made. There are multiple types of licenses so the type of the license would have to be included in the operation.
My question is which approach have you found is better in practice:
1. Build a composite service that calls the 2 services
2. Couple student-service to licensing-service, so that when createStudent is called the student-service makes a call to licensing-service and only when that completes will the student be created
3. Use an event-based architecture
People talk about microservice architectures being more like a graph than a hierarchy, and option 1 kinda turns this into a hierarchy where you get increasingly coarse composites. Other downsides are that it creates confusion as to which service clients should actually use, and there's some duplication going on because the composite's API would have to include all of the parameters that are needed to call the downstream services.
It does have a big benefit because it gives you a natural place to do failure handling, choreography and handle consistency.
Option 2 seems like it has disadvantages too:
the API of licensing would have to leak into the student API so that you can specify licensing restrictions.
it puts a lot of burden on the student-service because it has to handle consistency across all of the dependent services
as more services need to react when a student is created I could see the dependency graph quickly getting out of control and the service would have to handle that complexity in addition to the one from its own logic for managing students.
Option 3, while being decoupling heaven, I don't really think would work, because this is all triggered from a UI and people aren't really used to a "go do something else until this new student shows up" approach.
Thank you
Options 1 and 2 create tight coupling, which should be avoided as much as possible because you want your services to be independent. So the question becomes:
How do we do this with an event-based architecture?
Use events to keep track of licensing information from the license service in the student service, which is effectively data duplication. The drawback here is that you only have eventual consistency, as the data duplication is asynchronous.
Use asynchronous events to trigger an event chain which ultimately triggers a student creation. From your question, it looks like you already have the idea but have an issue dealing with the UI. You have two possible options here: wait for the student creation (or failure) event with a short timeout, or (even better) make your system completely reactive (use a server-to-client push mechanism for the UI).
Application licensing and creating students are orthogonal so option 2 doesn't make sense.
Option 1 is more sensible but I would try not to build another service. Instead I would try to "filter" calls to student service through licensing middleware.
This way you could use this middleware for other service calls (e.g. classes service) and changes in API of both licensing and students can be done independently as those things are really independent. It just happens that licensing is using number of students but this could easily change.
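As a rough illustration of "filtering" calls through licensing middleware, here is a sketch that uses a Python decorator. Everything in it is hypothetical: LicenseChecker stands in for a client of the licensing service, and create_student stands in for the student-service handler. The point is only that the handler itself contains no licensing code.

```python
from functools import wraps


class LicenseExceeded(Exception):
    """Raised when no seats are left for the requested license type."""


class LicenseChecker:
    """Hypothetical stand-in for a client of the licensing service."""

    def __init__(self, limits):
        self._limits = dict(limits)
        self._used = {license_type: 0 for license_type in limits}

    def reserve(self, license_type):
        if self._used.get(license_type, 0) >= self._limits.get(license_type, 0):
            raise LicenseExceeded(license_type)
        self._used[license_type] = self._used.get(license_type, 0) + 1

    def release(self, license_type):
        self._used[license_type] -= 1


def licensed(checker, license_type):
    """Middleware: reserve a seat before the handler runs, undo it on failure."""

    def decorator(handler):
        @wraps(handler)
        def wrapper(*args, **kwargs):
            checker.reserve(license_type)
            try:
                return handler(*args, **kwargs)
            except Exception:
                # The handler failed, so give the seat back.
                checker.release(license_type)
                raise

        return wrapper

    return decorator


# Usage: the same middleware could wrap other services' handlers too.
checker = LicenseChecker({"basic": 1})
students = []


@licensed(checker, "basic")
def create_student(name):
    students.append(name)
    return name
```

With a limit of one "basic" seat, the first create_student call succeeds and the second raises LicenseExceeded before the handler body runs.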
I'm not sure how option 3, an event-based approach can help here. It can solve other problems though.
IMHO, I would go with option 2. A couple of things to consider. If you are buying completely into SOA, and furthermore microservices, you can't flinch every time a service needs to contact another service. Get comfortable with that... remember, that's the point. What I really like about option 2 is that a successful student-service response is not sent until the license-service request succeeds. Treat the license-service as any other external service, where you might wrap the license-service in a client object that can be published by the license-service JAR.
the API of licensing would have to leak into the student API so that you can specify licensing restrictions.
Yes the license-service API will be used. You can call it leakage (someone has to use it) or encapsulation so that the client requesting the student-service need not worry about licensing.
it puts a lot of burden on the student-service because it has to handle consistency across all of the dependent services
Some service has to take on this burden. But I would manage it organically. We are talking about 1 service needing another one. If this grows and becomes concretely troublesome then a refactoring can be done. If the number of services that student-service requires grows, I think it can be elegantly refactored: maybe the student-service becomes the composite service, and groups of independently used services may be consolidated into new services if required. But if the list of dependency services that student-service uses is only used by student-service, then I do not know if it's worth grouping them off into their own service. I think instead of burden and leakage you can look at it as encapsulation and ownership... where student-service is the owner of that burden so it need not leak to other clients/services.
as more services need to react when a student is created I could see the dependency graph quickly getting out of control and the service would have to handle that complexity in addition to the one from its own logic for managing students.
The alternative would be various composite services. Like my response for the previous bullet point, this can be tackled elegantly if it surfaces as a real problem.
If forced, each of your options can be turned into a viable solution. I am making an opinionated case for option 2.
I recommend option 3. You have to choose between availability and consistency - and availability is most often desired in microservices architecture.
Your 'Student' aggregate should have a 'LicenseStatus' attribute. When a student is created, its license status is set to 'Unverified' and the student service publishes a 'StudentCreated' event. The LicenseService should then react to this event and attempt to reserve a license for this student. It would then publish a 'Reserved' or 'Rejected' event accordingly. The student service would update the student's status by subscribing to these events.
When the UI calls your API gateway to create a student, the gateway would simply call the Student service for creation and return a 202 Accepted or 200 OK response without having to wait for the student to be properly licensed. The UI can notify the user when the student is licensed through asynchronous communication (e.g. via long-polling or web sockets).
In case the license service is down or slow, only licensing would be affected. The student service would still be available and would continue to handle requests successfully. Once the license service is healthy again, the service bus will push any pending 'StudentCreated' events from the queue (Eventual consistency).
This approach also encourages expansion. A new microservice added in the future can subscribe to these events without having to make any changes to the student or license microservices (Decoupling).
With option 1 or option 2, you do not get any of these benefits and many of your microservices would stop working due to one unhealthy microservice.
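The event flow described in this answer can be sketched roughly as follows. This is a toy, in-process and synchronous event bus, used only to show who publishes and who subscribes; a real service bus would deliver events asynchronously and durably. The event names follow the answer; the class names and the status values other than 'Unverified' are assumptions.

```python
from collections import defaultdict


class EventBus:
    """Toy in-process bus; handlers run synchronously in publish()."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        for handler in list(self._handlers[event_type]):
            handler(payload)


class StudentService:
    def __init__(self, bus):
        self._bus = bus
        self.license_status = {}  # student id -> LicenseStatus
        bus.subscribe("Reserved", self._on_reserved)
        bus.subscribe("Rejected", self._on_rejected)

    def create_student(self, student_id):
        # The student exists immediately; licensing is settled later.
        self.license_status[student_id] = "Unverified"
        self._bus.publish("StudentCreated", {"student_id": student_id})

    def _on_reserved(self, event):
        self.license_status[event["student_id"]] = "Licensed"

    def _on_rejected(self, event):
        self.license_status[event["student_id"]] = "Rejected"


class LicenseService:
    def __init__(self, bus, seats):
        self._bus = bus
        self._seats = seats
        bus.subscribe("StudentCreated", self._on_student_created)

    def _on_student_created(self, event):
        # Try to reserve a seat and report the outcome as an event.
        if self._seats > 0:
            self._seats -= 1
            self._bus.publish("Reserved", event)
        else:
            self._bus.publish("Rejected", event)
```

Neither service calls the other directly; a further service could subscribe to 'StudentCreated' without touching either class.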
I know the question has been asked a while ago, but I think I have something to say that might be of value here.
First of all, your approach will depend on the overall size of your final product. I tend to go with a rule of thumb: if I would have too many dependencies between individual micro-services, I tend to use something that would simplify and possibly remove these dependencies. I don't want to end up with a spider-web of services! A good thing to look at here are Message queues, like RabbitMQ for example.
However, if I have just a few services that talk to each other, I will just make them call each other directly, as any alternative solutions whilst simplifying the architecture, add some computing and infrastructure overhead.
Whichever approach you decide to go with, design your services with a hexagonal architecture in mind! This will save you trouble when you decide to migrate from one solution to another. What I tend to do is design my DAOs as "adapters", so a DAO that calls Service A will either call it directly or via a message queue, independent of the business logic. When I need to change it, I can just swap this DAO for another one, without having to touch any of the business logic (at the end of the day, business logic doesn't care how it gets the data). Hexagonal architecture fits really well with microservices, TDD and black-box testing.
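A small sketch of the adapter idea, assuming Python and made-up names throughout (LicensePort, the two adapters, enroll_student). The business function depends only on the port, so switching from a direct call to a message queue means passing in a different adapter.

```python
import queue
from typing import Protocol


class LicensePort(Protocol):
    """The port: all the business logic knows about licensing."""

    def reserve(self, student_id: str) -> None: ...


class InMemoryLicenseService:
    """Hypothetical stand-in for the remote service."""

    def __init__(self):
        self.reserved = []

    def reserve(self, student_id):
        self.reserved.append(student_id)


class DirectLicenseAdapter:
    """Adapter that calls the service directly."""

    def __init__(self, service):
        self._service = service

    def reserve(self, student_id):
        self._service.reserve(student_id)


class QueueLicenseAdapter:
    """Adapter that puts a message on a queue instead of calling the service."""

    def __init__(self, message_queue):
        self._queue = message_queue

    def reserve(self, student_id):
        self._queue.put({"type": "ReserveLicense", "student_id": student_id})


def enroll_student(student_id: str, licenses: LicensePort) -> str:
    # Business logic: unaware of how the reservation is transported.
    licenses.reserve(student_id)
    return student_id
```

For example, enroll_student("s1", DirectLicenseAdapter(service)) and enroll_student("s1", QueueLicenseAdapter(queue.Queue())) run the same business code over different transports, which also makes the function easy to test with a fake adapter.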

Logical Layer to connect multiple .Net Services

I am not sure if this is the appropriate place for this, but I have come up with a "conceptual" modular design architecture that separates the logic out into individual services to allow an almost plug and play type scenario whereby there are no dependencies between the services. Think a list of features and only enabling the ones that you want.
To facilitate this I realise that I will need some type of middleware that will connect these all together and control the flow of data. However I am not sure of the specifics around what would be appropriate to achieve this.
I plan on implementing the services using .NET soap based services, so is this a case of using something like Tibco?
Any suggestions around what would be most appropriate or even where to start looking would be great.
If the above description didn't make sense hopefully this image is a bit clearer in describing the relationship between the services.
Thanks.
Depending on your needs you could use NServiceBus (http://particular.net/nservicebus). NServiceBus is communication middleware which can be used with different types of queuing systems like MSMQ, RabbitMQ and others. It is essentially a service bus which is very developer friendly and focused. It not only facilitates asynchronous, message-based distributed communication but also offers:
Publish / Subscribe that is transport agnostic using automatic registration
Transports: Can be used with MSMQ, RabbitMQ, Azure Storage Queues, etc.
Security: Supports encryption of messages
BLOBs: Has support for storing large message payloads transparently with the data bus, to allow communicating messages larger than the transport allows.
Scalability: Out and upscaling to increase throughput
Reliability: Deduplication, idempotent processing without having distributed transactions.
Orchestration: Sagas can help in controlling message flow and routing.
Exception handling: Exceptions get automatically retried in two different stages.
Monitoring: Tools like Service Pulse, Service Insight and Windows Performance monitors to monitor performance and errors. See what errors occurred and
Serialization: Can use different serializers that support formats like xml, json, binary
Open Source: All source code is available
Auditing: Can move all processed message to an audit queue for archiving or audit requirements
Community: Has a large community of developers that are active on the forums but also supply additional transports, serializers and other features.
I must mention that I work for Particular, but also that there are other options to consider. NServiceBus does not use SOAP for message exchange but a lightweight message in a format of your choice, as mentioned in the serialization bullet. It can integrate with services that require SOAP. It has the ability to expose a service (endpoint) as a WCF service for easy integration, and it can use SOAP from within code to call external SOAP services using the features that the .NET Framework and Visual Studio provide.
Good luck in choosing the right technology for your project.

How do the big boys deal with api limits?

We have an app that interacts with Facebook a lot, intensive enough to make us worry about the api limits that we know are there. My question is : How is it that some applications have like millions of users while they proactively engage with facebook and never face the api limits ? One such application is "hootsuite".
Do they implement sophisticated load-reduction mechanisms? (queues, batches and caches come to mind)
Does facebook somehow treat them specially? (partnership perhaps?)
Both options are possible.
I would recommend some form of load-reduction mechanism. This could be accomplished with caching data or executing heavy queries ahead of time (possibly in a cron job of sorts).
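As a sketch of the caching side of this, here is a small time-to-live cache in Python. It is generic and assumes nothing about Facebook's API: fetch is a hypothetical function that performs the real (rate-limited) call, and the TTL value is whatever staleness your app can tolerate.

```python
import time


class TTLCache:
    """Serves repeated lookups from memory until an entry is older than the TTL."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch  # hypothetical function doing the real API call
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the expiry logic can be tested
        self._entries = {}  # key -> (time stored, value)

    def get(self, key):
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # fresh enough: no API call made
        value = self._fetch(key)
        self._entries[key] = (now, value)
        return value
```

Every request that hits the cache is one call that does not count against the rate limit; a cron job could also call get() ahead of time to warm entries for heavy queries.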
Facebook provides some good suggestions with regard to application API rate limiting here.
You can also get more information on rate limiting that is being enforced on your application by visiting this dashboard:
https://developers.facebook.com/apps/<app_id>/insights?ref=nav&sk=ae_<app_id>

How does one go about breaking a monolithic application into web-services?

Not having dealt much with creating web-services, either from scratch, or by breaking apart an existing application, where does one start? Should a web-service encapsulate an entity, much like a class does, or should the service have more/less to it?
I realize that much of this is based on a case by case analysis of what the needs are, but are there any general guide-lines or best practices or even small nuggets of information that web-service veterans can impart to a relative newbie?
Our web services are built around functional areas. Sometimes this is just for a single entity, sometimes it's more than that.
For example, if you have a CRM, one of your web services might revolve around managing Contacts. Creating, updating, searching for, etc. If you do some type of batch type processing, a web service might exist to create and submit a job.
As far as best practices, bear in mind that web services add to the processing overhead. Mainly in serializing / deserializing the data as it goes across the wire. Because of this the main upside is solely in scalability. Meaning that you trade an increased per transaction processing time for the ability to run the service through multiple machines.
The main parts to pull out into a web service are those areas which are common across multiple applications, or which you intend to expose publicly, or which would benefit from greater load balancing.
Of course, you need to analyze your application to see where any bottlenecks really are. In some cases it doesn't make sense. For example, if you have a single application that isn't sharing its code and/or the bottleneck is primarily database related.
Web Services are exactly what they sound like: Services for the Web.
A web service should be built as an API for the service layer of your app.
A service usually encapsulates an entity larger than a single class.
To learn more about service layers and refactoring to add a service layer read about DDD.
Good Luck
The number 1 question is: To what end are you refactoring your application functionality to be consumed as a bunch of web services?