QuestDB: How to configure memory usage?

I'm going to use QuestDB as an embedded database.
But what I found is pretty high memory usage right after startup (no queries were performed).
If there is some sort of startup cache, how can it be tuned?
Are there any config options?

Related

Recommended minimum system requirements to try out the QuestDB binaries?

I'm trying out QuestDB using the binaries, running them in an Ubuntu container under Proxmox. The docs for the binaries don't say what resources you need, so I guesstimated. Looking at the performance metrics for the container when running some of the CRUD examples with 10,000,000 rows, I still managed to over-provision — by a lot.
Provisioned the container with 4 CPU cores, 4GB RAM & swap, and 8GB SSD. It would probably be fine with a fraction of that: CPU usage during queries is <1%, RAM usage <1.25GB, and storage is <25%.
There is some good info in the capacity planning section of the QuestDB docs (e.g. 8 GB RAM for light workloads), but my question is really about the low end of the scale — what’s the least you can get away with and still be performant when getting started with the examples from the docs?
(I don't mind creating a pull request with this and some other docs additions. Most likely, 2 cores, 2 GB of RAM and 4 GB of storage would be plenty and still give you a nice 'wow, this is quick' factor, with the proviso that this is for evaluation purposes only.)
In QuestDB, ingestion and querying are separated by design, meaning that if you plan to ingest data at medium/high throughput while also running queries, you want a dedicated core for ingestion and another for the shared pool.
The shared pool is used for queries, but also for internal tasks QuestDB needs to run. If you are just running a demo, you can probably do well with just one core for the shared pool, but for production scenarios you would likely want to increase that depending on your access patterns.
Regarding disk capacity and memory, it all depends on the size of the data set. QuestDB queries will be faster if the working dataset fits in memory. 2GB of RAM and 4GB of disk storage as you suggested should be more than enough for the examples, but for most production scenarios you would probably want to increase both.

What caching backends can be implemented in Django

Which of the caching strategies can be implemented in Django?
What are the pros and cons of each cache backend (in terms of ease of use and ease of development)?
Which backend should be preferred for production, etc.?
There are several backends that Django supports.
Listing a few here with comments on the positives and negatives of each; a minimal settings sketch follows the list.
Memcached: The gold standard for caching. An in-memory service that can return keys at a very fast rate. Not a good choice if your cached values are very large (Memcached caps item size at 1MB by default).
Redis: A good alternative to Memcached when you want to cache very large values (for example, large chunks of rendered JSON for an API).
DynamoDB: Another good alternative to Memcached when you want to cache very large values; it requires a third-party backend but scales very well with little IT overhead.
Localmem: Only use for local testing; don't go into production with this cache type.
Database: It's rare that you'll find a use case where database caching makes sense. It may be useful for local testing, but otherwise, avoid it.
File system: Can be a trap. Although reading and writing files can be faster than making SQL queries, it has some pitfalls. Each cache is local to the application server (not shared), and if you have a lot of cache keys, you can theoretically hit the file system's limit on the number of files allowed.
Dummy: A great backend to use for local testing when you want your data changes to be made immediately without caching. Be warned: permanently using dummy caching locally can hide bugs from you until they hit an environment where caching is enabled.
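As a concrete starting point, here is a minimal settings.py sketch for a few of these backends. The dotted backend paths are Django's built-in ones (PyMemcacheCache needs Django 3.2+, RedisCache needs Django 4.0+), while the hostnames, ports, and timeout values are placeholders to replace with your own.

```python
# settings.py (sketch) -- hostnames, ports and timeouts are placeholders.
CACHES = {
    # Memcached via pymemcache; older Django releases ship different
    # Memcached backends, so check the docs for your version.
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        "LOCATION": "127.0.0.1:11211",
        "TIMEOUT": 300,  # default per-key expiry in seconds
    },
    # The Redis backend is built in from Django 4.0; earlier versions need
    # a third-party package such as django-redis.
    "large_objects": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": "redis://127.0.0.1:6379/1",
        "TIMEOUT": 60 * 60,
    },
    # Local-memory cache: fine for development, not shared across processes.
    "dev_only": {
        "BACKEND": "django.core.cache.backends.locmem.LocMemCache",
    },
}
```

In application code you then read and write through django.core.cache.cache (the "default" alias) or django.core.cache.caches["large_objects"].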

mmap performance of Amazon EBS

I am looking at porting an application to the cloud, more specifically to Amazon EC2 or Google GCE.
My app heavily uses Linux's mmap to memory-map large read-only files, and I would like to understand how mmap would actually work when a file is on an EBS volume.
I would specifically like to know what happens when I call mmap, as EBS appears to be a black box. Also, are the benefits of mmap negated?
I can speak for GCE Persistent Disks. It behaves pretty much in the same way a physical disk would. At a high level, pages are faulted in from disk as mapped memory is accessed. Depending on your access pattern these pages might be loaded one by one, or in a larger quantity when readahead kicks in. As the file system cache fills up, old pages are discarded to give space to new pages, writing out dirty pages if needed.
One thing to keep in mind with Persistent Disk is that performance is proportional to disk size. So you'd need to estimate your throughput and IOPS requirements to ensure you get a disk with enough performance for your application. You can find more details here: Persistent disk performance.
Is there any aspect of mmap that you're worried about? I would recommend writing a small app that simulates your workload and testing it before deciding to migrate your application.
~ Fabricio.
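Following that suggestion, here is a minimal sketch of the kind of simulation you could run against a file stored on an EBS or Persistent Disk volume before migrating. The path and stride are placeholders for your real workload, and the madvise hint needs Python 3.8+ on a system that supports it.

```python
# mmap_probe.py -- rough sketch for timing page faults against a block device.
import mmap
import os
import time

PATH = "/mnt/data/large_readonly_file.bin"  # placeholder: a file on the volume under test
STRIDE = 64 * 4096                          # jump between pages to limit readahead help

with open(PATH, "rb") as f:
    size = os.fstat(f.fileno()).st_size
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # Hint the kernel that access is random (Python 3.8+, where available).
        if hasattr(mm, "madvise") and hasattr(mmap, "MADV_RANDOM"):
            mm.madvise(mmap.MADV_RANDOM)

        start = time.perf_counter()
        touched = 0
        for offset in range(0, size, STRIDE):
            _ = mm[offset]  # touching one byte faults the whole page in
            touched += 1
        elapsed = time.perf_counter() - start

print(f"touched {touched} pages in {elapsed:.2f}s ({touched / elapsed:.0f} pages/s)")
```

Run it twice: the first pass measures cold reads from the volume, while the second should be served largely from the page cache, which is the behaviour described above.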

How does caching affect memory consumption?

I have an app which has a search feature. This feature looks up the search term in a giant object (dictionary) that I cache for 24 hours. The object has about 50,000 keys and weighs roughly 10MB.
When I profile the memory usage on my hosting, I notice that after a few queries, the memory usage goes from around 50MB to over 450MB, prompting my hosting provider to kill the app.
So I'm wondering what is going on here. Specifically, how does the cache utilize the memory on each request and what can I do to fix this?
Django's FileBasedCache is known for having performance issues. You can get the big picture from the following links:
A smarter filebasedcache for Django
Bug: File based cache not very efficient with large amounts of cached files
The bug was closed as wontfix with the following reasoning:
I'm going to wontfix, on the grounds that the filesystem cache is intended as an easy way to test caching, not as a serious caching strategy. The default cache size and the cull strategy implemented by the file cache should make that obvious.
Consider using a key-value store like Memcached or Redis as a caching backend, since they both support key expiry; a minimal sketch follows the links below. Also consider a dedicated search engine like Elasticsearch if more of the anticipated features will be search-related.
Tools and howtos are available:
Installing memcached for a django project
http://code.google.com/p/memcached/wiki/NewStart
http://redis.io/commands/expire
https://github.com/bartTC/django-memcache-status
http://www.elasticsearch.org/guide/reference/index-modules/cache.html
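To make the Memcached/Redis suggestion concrete, here is a hedged sketch of caching small per-term entries with an expiry rather than one 10MB object. It assumes a Memcached or Redis backend is already configured in CACHES, and load_search_index_entry plus the key scheme are made-up names standing in for however you build the data today.

```python
# views.py (sketch) -- cache one small entry per search term for 24 hours
# instead of loading a single 10MB dictionary on every request.
from django.core.cache import cache

DAY = 60 * 60 * 24


def lookup(term):
    key = f"search:{term}"  # hypothetical key scheme
    value = cache.get(key)
    if value is None:
        # Hypothetical loader: hits the database/file only on a cache miss.
        value = load_search_index_entry(term)
        cache.set(key, value, timeout=DAY)  # expires after 24 hours
    return value
```

This keeps each cache entry small, lets Memcached or Redis handle expiry and eviction themselves, and avoids pulling a 10MB blob into the worker process on every request, which is likely what is pushing memory from 50MB towards 450MB.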

Tuning postgresql (for fast reads with django)

We have a Django & PostgreSQL setup running on EC2. Our application is always writing to the DB in the background, but this is not initiated by user action.
The problem is that when a user does use the system, we need to do a great big read, sometimes with full-text search, of around 20k items. Any tips on tuning for this scenario?
20k items is not that big a read. :)
On EC2, the main things to do are:
Get as much memory as you can rationally afford; EBS performance is terrible, and you want as much cache as you can manage.
Make sure your shared_buffers setting is correct; 25% of available RAM is a good starting point.
Look at the big read with EXPLAIN ANALYZE to find opportunities to create indexes (but don't create indexes without a practical reason; they're expensive if they are not being used for anything). An example follows this list.
If changing EBS configuration is an option, consider moving to an 8-stripe soft-RAID configuration.
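For the EXPLAIN ANALYZE and full-text-search point above, here is a minimal Django-side sketch. The Item model and its fields are made-up names, the SearchVectorField must be kept populated by a trigger or periodic update (not shown), and QuerySet.explain(analyze=True) requires Django 2.1+ on PostgreSQL.

```python
# models.py (sketch) -- hypothetical Item model with a GIN-indexed search vector.
from django.contrib.postgres.indexes import GinIndex
from django.contrib.postgres.search import SearchVectorField
from django.db import models


class Item(models.Model):
    title = models.TextField()
    body = models.TextField()
    # Populated by a trigger or a periodic update job (not shown here).
    search_vector = SearchVectorField(null=True)

    class Meta:
        indexes = [GinIndex(fields=["search_vector"])]


# In a shell: inspect what the big read actually does before and after indexing.
# from django.contrib.postgres.search import SearchQuery
# qs = Item.objects.filter(search_vector=SearchQuery("user terms"))[:20000]
# print(qs.explain(analyze=True))  # runs EXPLAIN ANALYZE on PostgreSQL
```

If the plan still shows sequential scans over the big table, that, together with shared_buffers at roughly 25% of RAM as suggested above, is usually where the largest wins are.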