We are evaluating Istio for one of our projects. The project has around 200 containers and around 200 services. Each container might be brought up with a replica count of 2, so there could be 400 containers in total and hence 400 Istio sidecars.
With default settings, each Istio sidecar uses about 300 MB. With 400 sidecars, that comes to 120 GB (300 MB × 400) of memory for sidecars alone, which seems to be even higher than the memory the application itself requires.
There are two observations:
1. Memory usage seems to grow with the number of cores. The system we are testing on has 88 cores, and our current understanding is that each core accounts for around 1.5 to 2 MB.
2. Memory usage seems to grow with the number of services. We expect around 200 services in this project, and each service seems to occupy around 3 MB.
On (1), we have found a solution (from googling): set the Envoy concurrency to a smaller number.
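For reference, a hedged sketch of one way to set this in more recent Istio versions, via the pod annotation that overrides the proxy configuration (the exact mechanism varies by Istio version, and the pod name/image here are made up):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app                 # illustrative
  annotations:
    proxy.istio.io/config: |
      concurrency: 2           # cap Envoy worker threads instead of one per core
spec:
  containers:
    - name: my-app
      image: my-app:latest     # illustrative
```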
On (2), is there any solution? In our project, it is not the case that every container talks to every other container/service. A given container talks to a small set of destination services (typically around 5). Is it possible for a given container's sidecar to allocate memory only for the services that the container actually uses?
I am wondering whether anybody has run into similar memory consumption issues with Istio and how they got around them.
You are right that #2 is a problem. There has been some discussion about this on the Istio mailing list: https://groups.google.com/forum/#!topic/istio-users/gRP4roSnHtQ
The bottom line is that the current implementation, where every service is configured to talk with every other service, is O(N^2) which doesn't scale. So far, there has only been some (mostly internal) early discussion about what the various options might be for pruning the sidecar config, but I think there will probably be some work starting in this area relatively soon.
UPDATE: Istio 1.1 includes a new Sidecar config resource that can be used for this purpose.
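For illustration, a minimal sketch of such a Sidecar resource (namespace, labels, and host names are made up), scoping a workload's egress configuration to the handful of services it actually calls:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: Sidecar
metadata:
  name: my-app-sidecar
  namespace: prod
spec:
  workloadSelector:
    labels:
      app: my-app
  egress:
    - hosts:
        - "prod/billing.prod.svc.cluster.local"    # only the ~5 services this app calls
        - "prod/inventory.prod.svc.cluster.local"
        - "istio-system/*"                         # control-plane traffic
```

With egress scoped like this, the sidecar only receives (and allocates memory for) configuration for the listed services rather than the full mesh.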
As far as I can tell, on Google Cloud (and presumably elsewhere) each vCPU equals one hyperthread by default (3rd paragraph in the intro). From my perspective, that would suggest that unless one changes this setting to 2 or 4 vCPUs, concurrency in the code running on the Docker image achieves nothing. Is there some multi-threading knowledge I'm missing that means concurrency on a single hyperthread accomplishes something? Scaling up the vCPU count isn't very attractive, since the minimum memory setting is already forced to 2 GB for 4 vCPUs.
This question is framed around the Google Cloud tech stack, but is meant to cover all providers.
Do Serverless solutions ever really benefit from concurrency?
EDIT:
The accepted answer is a great first look, but I realized my assumptions above ignored the idle time that context switching can fill. For example:
If we write a backend which talks to a database, a lot of our compute time might be spent idling while waiting for database request results. Context switching to the next request in this case would allow us to fill the CPU more efficiently.
Therefore, depending on the use case, even on a single-threaded vCPU our Serverless app can benefit from concurrency.
I wrote this. From my experience: YES, you can handle several threads in parallel, and performance increases with the number of CPUs. However, you need a process that supports multithreading.
In the case of Cloud Run, each request can be processed in a thread, so parallelization is easy.
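To make the IO-bound case from the question's edit concrete, here is a hypothetical Java sketch (the 200 ms "database call" is simulated): running the requests on separate threads lets the idle waits overlap, so total time drops to roughly one wait even on a single hyperthread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class IoConcurrencyDemo {
    // Simulates a request that spends ~200 ms waiting on a database result.
    static void handleRequest() throws InterruptedException {
        Thread.sleep(200); // the CPU is idle during this wait
    }

    public static void main(String[] args) throws Exception {
        final int requests = 10;

        // Sequential: total time is roughly requests * 200 ms.
        long t0 = System.nanoTime();
        for (int i = 0; i < requests; i++) handleRequest();
        System.out.printf("sequential: %d ms%n", (System.nanoTime() - t0) / 1_000_000);

        // Concurrent: the waits overlap, so total time is roughly 200 ms,
        // even though no extra CPU work happens in parallel on one hyperthread.
        ExecutorService pool = Executors.newFixedThreadPool(requests);
        CountDownLatch done = new CountDownLatch(requests);
        long t1 = System.nanoTime();
        for (int i = 0; i < requests; i++) {
            pool.submit(() -> {
                try { handleRequest(); } catch (InterruptedException ignored) {}
                done.countDown();
            });
        }
        done.await();
        System.out.printf("concurrent: %d ms%n", (System.nanoTime() - t1) / 1_000_000);
        pool.shutdown();
    }
}
```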
I have a few applications to be deployed in ECS with the Fargate launch type.
How do I determine the number of CPU units and the memory required for an application? What parameters do I need to consider for this? Can someone please help?
Ultimately this will come down to testing and validating your application's resource usage. You need to consider whether your application is CPU- or memory-heavy; this will help guide some initial estimations.
I would suggest that you perform some basic load testing against each container to try and determine its bounds. Try to keep this testing realistic whilst accounting for near term growth.
When you have these figures, set them as limits within your container's task definition. Keep an eye on these over the first few days of rollout and you should get an idea of how realistic the values are. You can then start right-sizing as you get a better idea of how the applications are performing.
If in doubt give it more than you think it requires during launch. You can always adjust later as you become more confident in the boundaries.
Take a look at AWS's general tips for right-sizing, which might help inform your decisions. These are fairly general across most AWS services.
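To make the task-definition limits mentioned above concrete, here is a hypothetical Fargate task definition fragment (names, image, and values are illustrative; note that Fargate only accepts certain task-level CPU/memory combinations, such as 512 CPU units with 1024 MB):

```json
{
  "family": "my-app",
  "requiresCompatibilities": ["FARGATE"],
  "networkMode": "awsvpc",
  "cpu": "512",
  "memory": "1024",
  "containerDefinitions": [
    {
      "name": "my-app",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest",
      "memoryReservation": 512,
      "essential": true
    }
  ]
}
```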
My use case is as follows:
We have about 500 servers running in an autoscaling EC2 cluster that need to access the same configuration data (laid out in a key/value fashion) several million times per second.
The configuration data isn't very large (1 or 2 GB) and doesn't change much (a few dozen updates/deletes/inserts per minute during peak time).
Latency is critical for us, so the data needs to be replicated and kept in memory on every single instance running our application.
Eventual consistency is fine. However, we need to make sure that every update will be propagated at some point (knowing that the servers can be shut down at any time).
The update propagation across the servers should be reliable and easy to set up (we can't have static IPs for our servers, and we don't want to go the route of "faking" multicast on AWS, etc.).
Here are the solutions we've explored in the past:
Using regular Java maps and our custom-built system to propagate updates across the cluster (obviously, it doesn't scale that well).
Using EhCache and its replication feature. But setting it up on EC2 is very painful and somewhat unreliable.
Here are the solutions we're thinking of trying out:
Apache Ignite (https://ignite.apache.org/) with a REPLICATED strategy.
Hazelcast's Replicated Map feature. (http://docs.hazelcast.org/docs/latest/manual/html-single/index.html#replicated-map)
Apache Geode on every application node. (http://geode.apache.org/)
I would like to know whether each of those solutions would work for our use case and, if so, what issues I'm likely to face with each of them.
Here is what I found so far:
Hazelcast's Replicated Map is somewhat recent and still a bit unreliable (async updates can be lost in case of scaling down).
It seems like Geode became "stable" fairly recently (even though it has supposedly been in development since the early 2000s).
Ignite looks like it could be a good fit, but I'm not too sure how its S3-based discovery system will work out if we keep adding/removing nodes regularly.
Thanks!
Geode should work for your use case. You should be able to use a Geode Replicated region on each node. You can choose to do synchronous OR asynchronous replication. In case of failures, the replicated region gets an initial copy of the data from an existing member in the system, while making sure that no in-flight operations are lost.
In terms of configuration, you will have to start a couple/few member discovery processes (Geode locators) and point each member to these locators. (We recommend that you start one locator per AZ and use 3 AZs to protect against network partitioning.)
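A minimal sketch of what that might look like with the Geode Java API (locator addresses and the region name are made up; older GemFire releases use the com.gemstone.gemfire packages instead):

```java
import org.apache.geode.cache.Cache;
import org.apache.geode.cache.CacheFactory;
import org.apache.geode.cache.Region;
import org.apache.geode.cache.RegionShortcut;

public class ConfigCacheNode {
    public static void main(String[] args) {
        // Join the cluster via the locators (one per AZ, as recommended above).
        Cache cache = new CacheFactory()
                .set("locators", "10.0.1.10[10334],10.0.2.10[10334],10.0.3.10[10334]")
                .create();

        // A REPLICATE region keeps a full copy of the data in this member's
        // memory; a new member gets its initial copy from an existing member.
        Region<String, String> config = cache
                .<String, String>createRegionFactory(RegionShortcut.REPLICATE)
                .create("config");

        config.put("feature.flag", "on");          // propagated to all members
        String value = config.get("feature.flag"); // served from local memory
    }
}
```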
Geode/GemFire has been stable for a while, powering low-latency, high-scalability requirements for reservation systems at the Indian and Chinese railways, among other users, for a very long time.
Disclosure: I am a committer on Geode.
Ignite provides native AWS integration for discovery over S3 storage: https://apacheignite-mix.readme.io/docs/amazon-aws. It solves the main issue: you don't need to change the configuration when instances are restarted. In a nutshell, any node that successfully joins the topology writes its coordinates to a bucket (and removes them when it fails or leaves). When you start a new node, it reads this bucket and connects to one of the listed addresses.
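A rough sketch of that setup with the Ignite Java API, combined with a REPLICATED cache for the configuration data (bucket name and credentials are placeholders):

```java
import com.amazonaws.auth.BasicAWSCredentials;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.s3.TcpDiscoveryS3IpFinder;

public class IgniteS3Node {
    public static void main(String[] args) {
        // S3-based discovery: each node registers its address in the bucket,
        // so nothing needs reconfiguring as instances come and go.
        TcpDiscoveryS3IpFinder ipFinder = new TcpDiscoveryS3IpFinder();
        ipFinder.setBucketName("my-ignite-discovery-bucket");                 // placeholder
        ipFinder.setAwsCredentials(new BasicAWSCredentials("KEY", "SECRET")); // placeholder

        TcpDiscoverySpi discovery = new TcpDiscoverySpi();
        discovery.setIpFinder(ipFinder);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setDiscoverySpi(discovery);

        // REPLICATED mode keeps a full copy of the data on every node.
        CacheConfiguration<String, String> cacheCfg = new CacheConfiguration<>("config");
        cacheCfg.setCacheMode(CacheMode.REPLICATED);
        cfg.setCacheConfiguration(cacheCfg);

        Ignite ignite = Ignition.start(cfg);
        IgniteCache<String, String> cache = ignite.cache("config");
        cache.put("feature.flag", "on");
    }
}
```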
Hazelcast's Replicated Map will not work for your use case. Note that it is a map that is replicated across all of the cluster's nodes, not onto the client nodes/servers. Also, as you said, it is not fully reliable yet.
Here is the Hazelcast solution:
Create a Hazelcast cluster with a set of nodes depending upon the size of the data.
Create a distributed map (IMap) and tweak the count and eviction configurations based on the size/number of key/value pairs. The data gets partitioned across all the nodes.
Set the backup count based on how critical the data is and how much time it takes to pull the data from the actual source (DB/files). Distributed maps have 1 backup by default.
On the client side, set up a NearCache and attach it to the distributed map. The NearCache will hold the key/value pairs on the local/client side itself, so get operations complete in milliseconds.
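A hedged sketch of the client side of this setup with the Hazelcast 3.x Java API (the map name and tuning values are illustrative):

```java
import com.hazelcast.client.HazelcastClient;
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.config.InMemoryFormat;
import com.hazelcast.config.NearCacheConfig;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

public class ConfigClient {
    public static void main(String[] args) {
        ClientConfig clientConfig = new ClientConfig();

        // The NearCache keeps entries in the client's own memory, so reads are
        // local; invalidation events from the cluster evict stale entries.
        NearCacheConfig nearCache = new NearCacheConfig("config");
        nearCache.setInMemoryFormat(InMemoryFormat.OBJECT);
        nearCache.setInvalidateOnChange(true); // evict on cluster-side updates
        nearCache.setTimeToLiveSeconds(300);   // safety net against stale reads
        clientConfig.addNearCacheConfig(nearCache);

        HazelcastInstance client = HazelcastClient.newHazelcastClient(clientConfig);

        // The IMap itself is partitioned across the cluster members,
        // with 1 backup per partition by default.
        IMap<String, String> config = client.getMap("config");
        String v = config.get("feature.flag"); // first read: network; later reads: NearCache
    }
}
```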
Things to consider with the NearCache solution:
The first get operation will be slower, as it has to go over the network to get the data from the cluster.
Cache invalidation is not fully reliable, as there will be a delay in synchronization with the cluster, and you may end up reading stale data. Again, this is the same across all the cache solutions.
It is the client's responsibility to set up timeouts and invalidation of NearCache entries, so that future pulls get fresh data from the cluster. This depends on how often the data gets refreshed or a value is replaced for a key.
I am using Django for my project and I'll be hosting it on Linode or another hosting service. If I want to use memcached, will I require a separate Linode for it? That is, will one server be OK, or will I have to host my site on two servers, one for memcached and one for Django? Is it the same for Redis? Also, will I require a separate server for MySQL?
I don't think you understand that nobody is a fortune-telling wizard. Nobody knows how many requests you will receive per second, nor how CPU/memory-intensive each request will be. Nobody knows how optimized your code is. Nobody knows if your application is read-heavy or write-heavy. Your use case is your own, and you're probably the only one who can estimate it.
My only actual advice to you is to estimate your server data and server load and benchmark your setup on one machine. If you are unsatisfied with the performance, then scale up. You can either scale vertically, by increasing the size of your Linode, or scale horizontally by adding more Linode instances. In the latter case, you will most likely put your DB on a machine of its own and have multiple Django instances fed by a load balancer. These Django instances could each share the same memcached on one machine, or they could each have their own memcached on their own machines. Which one is better? I can't tell you. It again depends on your use case.
If I were you, I would set it all up on one linode instance. I would create test data that I assume would be close to real world. Then I would try to test my response times with an estimated number of requests per second. I would measure response times, cache hits, and memory usage. I would then decide based on that if my use case is satisfied with this level of performance or not because I'm really the only one who would know what is satisfactory performance. Additionally, adding more linode resources is not necessarily where I would first try and improve performance.
Some great tips on optimizing and benchmarking can be found here:
https://docs.djangoproject.com/en/1.8/topics/performance/
http://blog.disqus.com/post/62187806135/scaling-django-to-8-billion-page-views
http://scottbarnham.com/blog/2008/04/28/django-performance-testing-a-real-world-example/
Late night reading about scaling up Django can be found in many books, I like this one:
https://highperformancedjango.com/
Sorry if I sound a bit blunt, I just want you to understand that nobody can walk in here and give you an answer with a large degree of confidence. This question doesn't have a straight-forward answer.
TL;DR Start with one instance and scale up only if you've convinced yourself you need to.
You say Memcached or Redis, so I assume Redis would be deployed without persistence, with a purely in-memory configuration.
In that case, both Memcached and Redis are unlikely to get saturated even if you run them on one server, since the limiting factor is more likely to be a single Django instance if your requests/second get high.
However, you should make sure to have enough memory and to configure an appropriate max memory usage for Memcached / Redis (there are different ways to accomplish this in the two services). Note that under memory pressure the Linux OOM killer may otherwise kill your cache, so if you go for a single instance, which seems to me a sensible first step, make sure your Django memory usage plus the memory you allocate for caching do not come near the limits of the instance's free memory.
CPU is hardly going to be an issue, as I said, since Memcached / Redis are pretty good at using little CPU, so I can't foresee a setup where Django is OK serving pages but the instance is in trouble because the CPU is burned by the cache.
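To make the max-memory configuration above concrete, hedged examples of the usual knobs (sizes are illustrative):

```
# redis.conf -- cap Redis at 1 GB and evict like a pure cache
maxmemory 1gb
maxmemory-policy allkeys-lru

# memcached -- the -m flag caps item memory at 1024 MB
memcached -m 1024 -d
```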
I'm searching for information on how ElasticSearch would scale with the amount of data in its indexes and am surprised how little I can find on that topic. Maybe some experience from the crowd here can help me.
We are currently using CloudSearch to index ≈ 7 million documents; in CloudSearch this results in 2 instances of type m2.xlarge. We are considering switching to ElasticSearch instead to reduce the cost. But all I find on the scaling of ElasticSearch is that it does scale well, can be distributed over several instances etc.
But what kind of machine (memory, disc) would I need for this kind of data?
How would that change if I increased the amount of data by the factor of 12 (≈ 80 million documents)?
As Javanna said, it depends. Mostly on: (1) rate of indexing; (2) size of documents; (3) rate and latency requirements for searches; and (4) type of searches.
Given that, the best way we can help is by giving examples.
On our site (news monitoring) we:
Index more than 100 docs per minute. We currently have nearly 50 million documents. I've also heard of ES indexes with hundreds of millions of documents.
Documents are news articles with some metadata, not short but not that large.
Our search latency varies between ~50 ms (for normal and rare terms) and ~800 ms for common terms (stopwords; we index them). This variation is largely due to our custom scoring (thanks to Lucene/ES support for customizing it) and to the fact that the dataset (the inverted lists) does not fit entirely in memory (the OS cache). So when a query hits a cached inverted list, it's faster.
We do OR queries with a lot of terms, which are among the hardest. We also do faceting on two single-valued fields, and have some experiments with a date facet (to show the rate of publication over time).
We do all this with 4 EC2 m1.large instances. We're now planning to move to the just-released ES 0.90 to get all the goodies and performance improvements of Lucene 4.0.
Now leaving examples aside. ElasticSearch is pretty scalable. It is very simple to create an index with N shards and M replicas, and then create X machines with ES. It will distribute all shards and replicas accordingly. You can change the number of replicas anytime you want (for each index).
One downside is that you can't change the number of shards after the index creation. But you can still "overshard" it beforehand to leave room for scaling when needed. Or create a new index with the right number of shards and reindex everything (we do this).
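For instance, a minimal sketch with the REST API (index name and counts are illustrative):

```
# Create an index with 4 primary shards and 2 replicas of each
curl -XPUT 'http://localhost:9200/articles' -d '{
  "settings": {
    "number_of_shards": 4,
    "number_of_replicas": 2
  }
}'

# The replica count can be changed at any time; the shard count cannot
curl -XPUT 'http://localhost:9200/articles/_settings' -d '{
  "index": { "number_of_replicas": 3 }
}'
```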
Finally, ElasticSearch (and also Solr) uses the Lucene search library under the hood, which is a very mature and well-known library.
I've actually recently switched from using CloudSearch to a hosted ElasticSearch service at the company I work for. Our specific application has a little over 100 million documents and is growing daily. So far, our experience with ElasticSearch has been absolutely wonderful. Search performance averages at ~250ms, even with all the sorting, filtering, and faceting. Indexing documents is also relatively fast, despite the several MB load we pass through HTTP with the bulk API every couple of hours. Refresh rates seem to be near instant, as well.
For our ~100M doc / 12GB index, we used 4 shards / 2 replicas (will bump to 3 replicas if performance degrades) spread across 4 nodes. Prior to setting up the index, our team spent a couple of days researching ElasticSearch cluster deployment/maintenance, and opted to use http://qbox.io to save money and time. We were paralyzingly afraid of performance and scale issues choosing to host our index on a dedicated cluster like Qbox, but so far the experience has been seriously fantastic.
Since our index lives on a dedicated cluster, we don't have access to nuts-and-bolts node-level configuration settings, so my technical expertise with ES deployment is still pretty limited. That being said, I can't be sure of exactly what performance tweaks are needed for the performance we've experienced on our index. However, I do know Qbox's cluster uses SSDs... so that could definitely have a significant impact.
Case in point: ElasticSearch has scaled seamlessly. I highly, highly recommend the switch (even if it's just to save $$; CloudSearch is crazy expensive). Hope this information helps!
CloudSearch recently dropped prices and may now be a cheaper alternative than maintaining your own search infrastructure on EC2 - http://aws.amazon.com/blogs/aws/cloudsearch-price-reduction-plus-features/