Design a System to monitor web services - c++

I want to know how to design a system that would monitor my web services' status, such as CPU usage and whether the service is up or not. I searched on the internet, but it only shows me existing tools. I want to design my own system. Very basic guidance would help me a lot.

It depends how deep you want to go. In principle you need to store time-series data (rrdtool can be used for that, for example), then you need to collect the data at some time interval, and last but not least you should be able to present the data, for example as graphs.
Whether it makes sense to reinvent the wheel is up to you, but for this kind of problem there are open source systems based on rrdtool, like Cacti or Nagios.
InfluxDB with Grafana is also powerful in this respect, just to name a few.
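As a rough illustration of the collection step only, here is a minimal sketch in Python (assuming the psutil and requests libraries, a hypothetical local SQLite file as the time-series store, and a hypothetical health endpoint; in practice you would swap in rrdtool, InfluxDB, or similar):

    import sqlite3
    import time

    import psutil    # assumed dependency for CPU metrics
    import requests  # assumed dependency for the HTTP up/down check

    DB_PATH = "metrics.db"                        # hypothetical local store for this sketch
    SERVICE_URL = "http://localhost:8080/health"  # hypothetical health endpoint
    INTERVAL_SECONDS = 60

    def collect_sample():
        """Collect one sample: system CPU usage and whether the service responds."""
        cpu_percent = psutil.cpu_percent(interval=1)
        try:
            up = requests.get(SERVICE_URL, timeout=5).ok
        except requests.RequestException:
            up = False
        return time.time(), cpu_percent, up

    def main():
        conn = sqlite3.connect(DB_PATH)
        conn.execute(
            "CREATE TABLE IF NOT EXISTS samples (ts REAL, cpu_percent REAL, up INTEGER)"
        )
        while True:
            ts, cpu, up = collect_sample()
            conn.execute("INSERT INTO samples VALUES (?, ?, ?)", (ts, cpu, int(up)))
            conn.commit()
            time.sleep(INTERVAL_SECONDS)

    if __name__ == "__main__":
        main()

The presentation layer would then read from the store and render graphs, which is exactly the part that tools like Grafana or Cacti already do well.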

Related

Correct Architecture for Micro Services with Multiple Customer Interfaces

I am new to micro services and I am keen to use this architecture. I am interested to know what architecture structure should be used for systems with multiple customer interfaces where customer systems may use one or many of the available services. Here is a simple illustration of a couple of ways I think it would be used:
An example of this type of system could be:
Company with multiple staff using system for quotes of products
using products, quotes and users micro services
Company with website to display products
using products micro service
Company with multiple staff using system for own quotes
using quotes and users micro services
Each of these companies would have their own custom-built interface, only displaying the relevant services.
As in the illustrations, all quotes, products and users could be stored local to the micro services, using unique references to identify the records for each company. I don't know if this is advisable, as the data could grow fast and become difficult to manage.
Alternatively, I could store data such as users and quotes local to the client system and reference the micro services only for data that's generic. Here the micro services would be used just to handle common logic and return results. This does feel somewhat illogical and problematic to me.
I've not been able to find anything online to explain the best course of action for this scenario and would be grateful for any experienced feedback on this.
I am afraid you will not find many useful recipes or patterns for microservice architectures yet. I think the relative quiet on your question is because it doesn't have enough detail for anybody to readily grasp. I will hazard a guess:
From first principles, you have the concept of a quote which would have to interrogate the product to get a price and other details. It might need to access users to produce commission information, and customers for things like discounts and lead times. Similar concepts may be used in different applications; for example inventory, catalog, ordering [ slightly different from quote ].
The idea in microservices is to reduce the overlap between these concepts by dispatching the common operations as their own (micro) services, and constructing the aggregate services in terms of them. Just because something exists as a service does not mean it has to be publicly available. It can be private to just these services.
When you have decomposed your system into these single-function services, the resulting system will communicate more, but will be able to be deployed more flexibly. For example, more resources and/or redundancy might be applied to the product service if it is overtaxed by requests from many services. In the end, infrastructure like a service mesh helps to isolate the implementation of these micro services from these sorts of deployment considerations.
Don't be misled into thinking there is a free lunch. Micro service architectures require more upfront work in defining the service boundaries. A failure in this critical area can yield much worse problems than a poorly scaled monolithic app. Even when you have defined your services well, you might find they rely upon external services that are not as well considered. The only solace there is that it is much easier to insulate yourself from these if you have already insulated the rest of your system from its parts.
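To make the aggregation idea concrete, here is a minimal sketch in Python of a quote service composing results from private product and user services. The service URLs, field names and discount logic are all hypothetical; the real contracts would be whatever you define for your domain:

    import requests  # assumed HTTP client for service-to-service calls

    # Hypothetical internal endpoints; in a real deployment these would be
    # service-discovery names or mesh routes, not hard-coded URLs.
    PRODUCT_SERVICE = "http://products.internal/api/products"
    USER_SERVICE = "http://users.internal/api/users"

    def build_quote(company_id: str, user_id: str, product_id: str, quantity: int) -> dict:
        """Aggregate a quote from the (private) product and user services."""
        product = requests.get(f"{PRODUCT_SERVICE}/{product_id}", timeout=5).json()
        user = requests.get(f"{USER_SERVICE}/{user_id}", timeout=5).json()

        unit_price = product["price"]         # assumed field on the product service
        discount = user.get("discount", 0.0)  # assumed field on the user service
        total = unit_price * quantity * (1 - discount)

        return {
            "company_id": company_id,  # scopes the record to one customer
            "user_id": user_id,
            "product_id": product_id,
            "quantity": quantity,
            "total": round(total, 2),
        }

The quote service only ever talks to the product and user services through their APIs, so it can be deployed and scaled independently, and the product service can stay private even though several aggregates depend on it.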
After much research, following various courses online, video tutorials and some documentation provided by Netflix, I have come to understand that the first structure in the diagram is the best solution.
Each service should be able to effectively function independently, with the exception of referencing other services for additional information. Each service should in effect be able to be picked up and put into another system without any need to be aware of anything beyond the API layer of the architecture.
I hope this is of some use to someone trying to get to grips with this architecture.

Microservice architecture for ETL

I am redesigning a small monolith ETL software written in Python. I find a microservice architecture suitable as it will give us the flexibility to use different technologies if needed (Python is not the nicest language for enterprise software in my opinion). So if we had three microservices (call them Extract, Transform, Load), we could use Java for Transform microservice in the future.
The problem is, it is not feasible here to pass the result of a service call in an API response (say HTTP). The output from Extract is going to be gigabytes of data.
One idea is to call Extract and have it store the results in a database (which is really what that module is doing in the monolith, so easy to implement). In this case, the service will return only a yes/no response (was the process successful or not).
I was wondering if there is a better way to approach this. What would be a better architecture? Is what I'm proposing reasonable?
If your ETL process works on individual records (some parallelizable units of computation), then there are a lot of options you could go with; here are a few:
Messaging System-based
You could base your processing around a messaging system, like Apache Kafka. It requires a careful setup and configuration (depending on durability, availability and scalability requirements of your specific use-cases), but may give you a better fit than a relational db.
In this case, the ETL steps would work completely independently, and just consume some topics, produce into some other topics. Those other topics are then picked up by the next step, etc. There would be no direct communication (calls) between the E/T/L steps.
It's a clean and easy-to-understand solution, with independent components.
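As a rough sketch of that topic-chaining idea, here is what the Transform step might look like in Python, assuming the kafka-python package and hypothetical topic and field names (the same pattern applies with confluent-kafka or any other client):

    import json

    from kafka import KafkaConsumer, KafkaProducer  # assumed: kafka-python package

    # Hypothetical topics: Extract produces into "extracted", Transform consumes
    # it and produces into "transformed", which Load then picks up.
    consumer = KafkaConsumer(
        "extracted",
        bootstrap_servers="localhost:9092",
        group_id="transform-service",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    def transform(record: dict) -> dict:
        """Placeholder transformation; the real business logic lives here."""
        record["amount_eur"] = record["amount_usd"] * 0.92  # assumed fields and rate
        return record

    for message in consumer:
        producer.send("transformed", transform(message.value))

Each step only knows about its input and output topics, so Extract, Transform and Load can be deployed, scaled and even rewritten (for example Transform in Java) independently.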
Off-the-shelf processing solutions
There are a couple of OTS solutions for data processing/computation and transformation: Apache Flink, Apache Storm, Apache Spark.
Although these solutions would obviously confine you to one particular technology, they may be better than building a similar system from scratch.
Non-persistent
If the actual data is streaming/record-based, and it is not required to persist the results between steps, you could just get away with long-polling the HTTP output of the previous step.
You say it is just too much data, but that data doesn't have to go to the database (if it's not required), and could just go to the next step instead. If the data is produced continuously (not everything in one batch), on the same local network, I don't think this would be a problem.
This would be technically very easy to do, very simple to validate and monitor.
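For instance, a minimal sketch of that streaming hand-off in Python with the requests library, assuming the previous step exposes a hypothetical endpoint that emits one JSON record per line, might look like:

    import json

    import requests  # assumed HTTP client

    EXTRACT_URL = "http://extract.internal/records"  # hypothetical streaming endpoint

    def stream_records():
        """Consume the previous step's output record by record, without buffering it all."""
        with requests.get(EXTRACT_URL, stream=True, timeout=(5, None)) as response:
            response.raise_for_status()
            for line in response.iter_lines():
                if line:  # skip keep-alive blank lines
                    yield json.loads(line)

    for record in stream_records():
        # Transform/load each record as it arrives; nothing is persisted in between.
        print(record)

Because the records are consumed as they are produced, the gigabytes of data never have to sit in memory or in an intermediate database.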
I would suggest you have a look at Apache Flink. It is very similar to the mappings in big enterprise apps like Informatica, Talend and DataStage, but it processes on a smaller scale and continuously. It actually helps you compute and transform the data on the fly, as it arrives, and then store/load it into a file/DB.
The current infra we have with Flink processes close to 28.5 GB every 4 hours, and it just works. In the initial days, we had to run our daily batch and the Flink stream side by side to ensure both of them were producing consistent results; eventually most of the streams were left active and the daily batches were retired gradually.
Hope it helps someone.
There's nothing preventing you from having an SFTP server containing CSV files, or a database storing the results. You can do whatever makes sense. Using messaging to pass gigabytes of data, or streaming through HTTP, may or may not make sense for your case.
This is an interesting problem. The best solution for this could be reactive Spring Boot. You can make your Extract service a reactive Spring Boot app and, instead of sending GBs of data in one response, stream the data to the required service.
Now you might be wondering whether streaming like this would tie up the working thread. The answer is no: it works at the OS level and doesn't hold up any request thread to stream the results. That's the beauty of reactive Spring Boot.
Go through this and explore:
https://spring.io/blog/2016/07/28/reactive-programming-with-spring-5-0-m1

How to do large-scale, batch reverse geo-coding?

I have a very large list of lat/lon coordinate pairs (>50 million). I want to attach address information to each one. Most geo/revgeo services have strict call limits. Assuming computing power isn't the issue, how can I accomplish this? Also note that time/speed are not the primary concern.
One place to start might be the dedicated AWS geocoders, which you can get for unlimited-volume processing: https://aws.amazon.com/marketplace/search/results?x=0&y=0&searchTerms=geocoder
Intro
I have experience working with SmartyStreets' batch processing tool. They don't have call limits (paid version), but they also don't have a Reverse Geocode API (yet!). Their batch processing is strictly for flexibility and ease of use, in addition to normal calls. However, I am aware of a couple of services that do Reverse Geocoding, and they mention batch processing on their websites.
How they work
Batch processing services generally allow you to upload your data, even arbitrarily large files. You probably want to put your data in a CSV file (a plain-text, spreadsheet-like format) as latitude and longitude pairs. Then their servers will process the data and alert you when you can download the results. It's common practice to charge money for this download, but maybe TAMU's is free?
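As a rough sketch of preparing such an upload in Python, assuming the coordinates live in a list of (lat, lon) tuples and the provider accepts plain latitude/longitude CSV files, you might split the 50+ million rows into manageable chunks like this (chunk size and file names are arbitrary choices for illustration):

    import csv
    from pathlib import Path

    # Hypothetical input: a (very long) list of (lat, lon) tuples.
    coordinates = [(40.7128, -74.0060), (34.0522, -118.2437)]  # >50 million in practice

    CHUNK_SIZE = 1_000_000  # rows per upload file; adjust to the provider's limits
    OUT_DIR = Path("geocode_batches")
    OUT_DIR.mkdir(exist_ok=True)

    def write_batches(coords, chunk_size=CHUNK_SIZE):
        """Write lat/lon pairs into numbered CSV files ready for batch upload."""
        for batch_no, start in enumerate(range(0, len(coords), chunk_size)):
            path = OUT_DIR / f"batch_{batch_no:05d}.csv"
            with path.open("w", newline="") as f:
                writer = csv.writer(f)
                writer.writerow(["latitude", "longitude"])
                writer.writerows(coords[start:start + chunk_size])
            yield path

    for batch_file in write_batches(coordinates):
        print(f"ready to upload: {batch_file}")

Each resulting file can then be submitted to whichever batch service you pick, and the returned address columns joined back to your original records.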
Suggestions on who to use
Texas A&M Geoservices
MapLarge
Both of these services have demos and developer portals to guide you along if there is something you want to research before using them.
(Full disclosure: I have worked for SmartyStreets.)

Is there a software platform for generic stats tracking?

Tracking and graphing statistics that happen in an application is a very common thing. Are there any open source solutions out there that let you define arbitrary statistics, log events against those statistics, and then it handles the storage and display of those stats?
If it had pretty graphs, that's a bonus.
Obviously, it's easy to custom-develop a solution. But this seems like one of those infrastructure things that doesn't have to be re-invented every time.
Since asking this a few years back, I've found the ELK stack.
Elastic Search - https://www.elastic.co/products/elasticsearch
LogStash - https://www.elastic.co/products/logstash
Kibana - https://www.elastic.co/products/kibana
It's perfect for this sort of thing, and can also consolidate ALL of your logging needs (HTTP logs, etc.).
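As a rough sketch of how an application might log arbitrary stat events into that stack, here is a Python example assuming the official elasticsearch client, a local cluster, and a hypothetical app-stats index (in many setups you would ship these through Logstash or Filebeat instead of indexing directly):

    from datetime import datetime, timezone

    from elasticsearch import Elasticsearch  # assumed: official Python client

    es = Elasticsearch("http://localhost:9200")  # assumed local cluster

    def track(stat_name: str, value: float = 1.0, **tags):
        """Index one stat event; Kibana can then aggregate and graph it."""
        es.index(
            index="app-stats",  # hypothetical index name
            document={
                "@timestamp": datetime.now(timezone.utc).isoformat(),
                "stat": stat_name,
                "value": value,
                **tags,
            },
        )

    # Example events: arbitrary statistics defined by the application.
    track("user.signup")
    track("report.render_ms", value=182.5, report="monthly")

Kibana can then filter on the stat field and plot counts or values over time, which covers the "pretty graphs" bonus.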

What type of database is best for high number of users and high concurrency?

We are building a web-based application that needs to support a large number of users in a very high-concurrency environment. Users will be attempting to change the same record at the same time. In terms of data volume in the database, we expect it to be very low (we're not trying to build the next Facebook); instead, we need to provide each user a very quick turnaround time for each request, so from the database perspective we need a solution that scales very easily as we add more users and records.
We are currently looking at relational and object-based databases, and also distributed database systems such as Cassandra and Hypertable. We prefer the open source solutions over commercial.
We're just looking for some direction, we don't need details on how to build the solution. Any suggestions would be greatly appreciated.
Amazon's SimpleDB supports conditional puts and consistent reads, but at that point you're defeating the purpose and might as well just use MySQL/Percona and scale vertically.
Do you really need ACID? Something's gotta give, and eventual consistency isn't all that bad, right? :)
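For what it's worth, here is a minimal sketch of the conditional-update idea this answer alludes to, in Python with sqlite3 purely for illustration; the same version-column pattern works with MySQL/Percona, and a conditional put in SimpleDB plays the same role:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, data TEXT, version INTEGER)")
    conn.execute("INSERT INTO records VALUES (1, 'initial', 0)")
    conn.commit()

    def update_record(record_id: int, new_data: str, expected_version: int) -> bool:
        """Optimistic concurrency: the write only succeeds if nobody changed the row first."""
        cur = conn.execute(
            "UPDATE records SET data = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (new_data, record_id, expected_version),
        )
        conn.commit()
        return cur.rowcount == 1  # False means another user won the race; re-read and retry

    # Two users read version 0, then both try to write the same record.
    print(update_record(1, "user A's change", expected_version=0))  # True
    print(update_record(1, "user B's change", expected_version=0))  # False: conflict detected

Whichever database you pick, this is the kind of check you will need when many users edit the same record at once; the choice is mostly about where the check runs and how the losing writer retries.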