AppFabric LocalCache - Least Used Eviction

If I am caching something in local cache and using it on a regular basis does the distributed cache know that the local cache is using it?
The reason I ask is that the distributed cache has a Least Used eviction policy. If I am not using it from the distributed cache, and the distributed cache doesn't know I am using it, then it will be evicted at some stage.
This is a large piece of data that rarely changes, so I will cache it for a long period of time. I don't want to have to drag 2 or 3 MB across the wire more often than I have to because it was evicted from the distributed cache on a least-used basis.
Hence my question - does the distributed cache have knowledge that it is being used and therefore not evict it as least used?

Interesting.
The documentation says that when an object is requested from the cache, if it exists in the local cache then "the reference to the object is returned immediately without contacting the server" (MSDN), in which case the remote cache cannot possibly be notified that the object is being used, and this would seem to support your thought that the object could become a candidate for eviction.
There's some discussion here that suggests the local cache also uses an LRU algorithm, and I'd be interested to know whether the two caches do any communication/synchronisation of their LRU timings.

Related

Multiple rocksdb Instances: Use a Single Shared Cache or Multiple Independent Caches?

We are opening multiple rocksdb instances in a single process and they are all accessed equally. When using BlockBasedTableOptions::block_cache, is there any benefit to allocating a single large cache over several smaller caches?
With NewLRUCache it appears that num_shard_bits allows a single large shared cache to reduce resource contention, just like having multiple smaller caches each with no sharding. From the outside they appear equal.
Edit
I think it best for someone to close/delete this. There isn't a programming answer to this question. I was attempting to understand conceptually how rocksdb works. This is a question for the rocksdb Google Group, not SO.
A Cache object can be shared by multiple RocksDB instances in the same process, allowing users to control the overall cache capacity.
https://github.com/facebook/rocksdb/wiki/Block-Cache
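For illustration, here is a minimal sketch (paths and sizes are made up for the example) of what sharing one block cache across two DB instances looks like via BlockBasedTableOptions::block_cache:

#include <rocksdb/cache.h>
#include <rocksdb/db.h>
#include <rocksdb/options.h>
#include <rocksdb/table.h>

int main()
{
    // One shared LRU block cache: 512 MB capacity, 2^6 shards to reduce contention.
    std::shared_ptr<rocksdb::Cache> cache = rocksdb::NewLRUCache(512 << 20, 6);

    rocksdb::BlockBasedTableOptions table_options;
    table_options.block_cache = cache;               // both DBs point at the same cache

    rocksdb::Options options;
    options.create_if_missing = true;
    options.table_factory.reset(rocksdb::NewBlockBasedTableFactory(table_options));

    rocksdb::DB *db1 = 0, *db2 = 0;
    rocksdb::DB::Open(options, "/tmp/db1", &db1);    // hypothetical paths
    rocksdb::DB::Open(options, "/tmp/db2", &db2);

    // ... use db1 and db2: their data blocks now compete for one capacity budget ...

    delete db1;
    delete db2;
    return 0;
}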

Does a multiple reader, single writer implementation in g++-4.4 (not C++11/14) via boost::shared_mutex impact performance?

Usage: In our production environment we have around 100 threads which can access the cache we are trying to implement. If the cache is missed, the information will be fetched from the database and the cache will be updated via the writer thread.
To achieve this we are planning to implement multiple readers and a single writer. We cannot update the g++ version since we are using g++-4.4.
Update: Each worker thread can perform both reads and writes. If the cache is missed, the information is fetched from the DB and cached.
Problem Statement:
We need to implement a cache to enhance performance.
Reads from the cache are much more frequent than writes to it.
I think we can use the boost::shared_mutex, boost::shared_lock, boost::upgrade_lock and boost::upgrade_to_unique_lock implementation.
But we learnt that boost::shared_mutex has performance issues:
Performance comparison on reader writer locks
Lib boost devel
Questions
Does boost::shared_mutex impact performance when reads are much more frequent?
What other constructs and design approaches can we take, considering the compiler version g++-4.4?
Is there a workaround so that the design makes reads lock-free?
Also, we intend to use a map to keep the information for the cache.
If writes were non-existent, one possibility would be a 2-level cache where you first have a thread-local cache, and then the normal cache with a mutex or reader/writer lock.
If writes are extremely rare, you can do the same, but have some lock-free way of invalidating the thread-local cache, e.g. an atomic int updated with every write; when it changes, clear the thread-local cache.
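A rough sketch of that idea follows, written with C++11 primitives (std::atomic, thread_local) for brevity; on g++-4.4 the same structure could be built with boost::mutex, gcc's __sync atomic builtins, and __thread instead of thread_local. The names lookup/update are invented for the example.

#include <atomic>
#include <map>
#include <mutex>
#include <string>

// Shared state: the real cache plus a version counter bumped on every write.
std::map<std::string, std::string> shared_cache;
std::mutex shared_cache_mutex;
std::atomic<unsigned> cache_version(0);

// Per-thread state: a local copy that is discarded whenever the version changes.
struct LocalCache {
    unsigned version;
    std::map<std::string, std::string> entries;
    LocalCache() : version(0) {}
};
thread_local LocalCache local_cache;

bool lookup(const std::string& key, std::string& value) {
    unsigned current = cache_version.load(std::memory_order_acquire);
    if (local_cache.version != current) {            // a writer invalidated us
        local_cache.entries.clear();
        local_cache.version = current;
    }
    std::map<std::string, std::string>::const_iterator it = local_cache.entries.find(key);
    if (it != local_cache.entries.end()) {           // lock-free fast path
        value = it->second;
        return true;
    }
    std::lock_guard<std::mutex> lock(shared_cache_mutex);   // slow path
    std::map<std::string, std::string>::const_iterator sit = shared_cache.find(key);
    if (sit == shared_cache.end())
        return false;
    local_cache.entries[key] = sit->second;          // populate the thread-local copy
    value = sit->second;
    return true;
}

void update(const std::string& key, const std::string& value) {
    std::lock_guard<std::mutex> lock(shared_cache_mutex);
    shared_cache[key] = value;
    cache_version.fetch_add(1, std::memory_order_release);  // invalidate all local copies
}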
You need to profile it.
In case you're stuck because you don't have a "similar enough" environment where you can actually test things, you can probably write a simple wrapper (sketched after this list) using pthreads: pthread_rwlock_t
pthread_rwlock_rdlock
pthread_rwlock_wrlock
pthread_rwlock_unlock
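For example, a thin wrapper over those calls might look like this (an illustrative sketch only, not a drop-in library; it works on g++-4.4 without boost):

#include <pthread.h>

// Reader/writer lock wrapper around pthread_rwlock_t.
class RWLock {
public:
    RWLock()  { pthread_rwlock_init(&lock_, NULL); }
    ~RWLock() { pthread_rwlock_destroy(&lock_); }
    void read_lock()  { pthread_rwlock_rdlock(&lock_); }   // many readers may hold this
    void write_lock() { pthread_rwlock_wrlock(&lock_); }   // exclusive
    void unlock()     { pthread_rwlock_unlock(&lock_); }
private:
    pthread_rwlock_t lock_;
    RWLock(const RWLock&);              // non-copyable
    RWLock& operator=(const RWLock&);
};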
Of course you can design things to be lock-free. The most obvious solution would be to not share state. (If you do share state, you'll have to check that your target platform supports atomic instructions.) However, without any knowledge of your application domain, I feel very safe suggesting you do not want lock-free. See e.g. Do lock-free algorithms really perform better than their lock-full counterparts?
It all depends on the frequency of the updates, the size of the cache and how much is changed in the update.
Let's assume you have a rather big cache with a lot of changes on each update. Then I would use a read-copy-update pattern, which is lock-free.
If your cached data is pretty small and read in one go (e.g. a single integer), RCU is also a good choice.
For a big cache with small updates, or a big cache whose updates are too frequent for RCU, a read-write lock is a good choice.
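As an illustration of the read-copy-update idea with a single writer, here is a sketch only: it relies on C++11 std::atomic_load/std::atomic_store for shared_ptr, so on g++-4.4 you would substitute boost::shared_ptr plus an equivalent publication mechanism. Readers take a snapshot of an immutable map; the writer builds a modified copy and publishes it atomically.

#include <map>
#include <memory>
#include <string>

typedef std::map<std::string, std::string> Snapshot;

// The currently published snapshot; readers never see a half-updated map.
std::shared_ptr<const Snapshot> current_snapshot(new Snapshot());

bool read_value(const std::string& key, std::string& value) {
    std::shared_ptr<const Snapshot> snap = std::atomic_load(&current_snapshot);
    Snapshot::const_iterator it = snap->find(key);
    if (it == snap->end())
        return false;
    value = it->second;
    return true;
}

// Single writer only: copy the current snapshot, modify the private copy, publish it.
void write_value(const std::string& key, const std::string& value) {
    std::shared_ptr<Snapshot> copy(new Snapshot(*std::atomic_load(&current_snapshot)));
    (*copy)[key] = value;
    std::atomic_store(&current_snapshot, std::shared_ptr<const Snapshot>(copy));
}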
Alongside other answers suggesting you profile it, a large benefit can be had if you can somehow structure or predict the type, order and size of the requests.
If particular types of data are requested in a typical cycle, it would be better to split up the cache per data type. You will increase cache-hit/miss ratios and the size of each cache can be adapted to the type. You will also reduce possible contention.
Likewise, the size of the requests is important when choosing your update approach. Smaller data fragments may be stored longer or even pooled together, while larger chunks may be requested less frequently.
Even with a basic prediction scheme in place that covers only the most frequent fetch patterns, you may already improve performance quite a bit. It's definitely worth it to try and train e.g. a NN (Neural Network) to guess the next request in advance.

Which Key value, Nosql database can ensure no data loss in case of a power failure?

At present, we are using Redis as a fast, in-memory cache. It is working well. The problem is, once Redis is restarted, we need to re-populate it by fetching data from our persistent store. This overloads our persistent store beyond its capacity and hence the recovery takes a long time.
We looked at Redis persistence options. The best option (without compromising performance) is to use AOF with 'appendfsync everysec'. But with this option, we can lose the last second of data. That is not acceptable. Using AOF with 'appendfsync always' has a considerable performance penalty.
So we are evaluating single-node Aerospike. Does it guarantee no data loss in case of power failures? i.e. in response to a write operation, once Aerospike sends success to the client, the data should never be lost, even if I pull the power cable of the server machine. As I mentioned above, I believe Redis can give this guarantee with the 'appendfsync always' option, but we are not considering it as it has a considerable performance penalty.
If Aerospike can do it, I would want to understand in detail how persistence works in Aerospike. Please share some resources explaining the same.
We are not looking for a distributed system as strong consistency is a must for us. The data should not be lost in node failures or split brain scenarios.
If not aerospike, can you point me to another tool that can help achieve this?
This is not a database problem, it's a hardware and risk problem.
All databases (that have persistence) work the same way, some write the data directly to the physical disk while others tell the operating system to write it. The only way to ensure that every write is safe is to wait until the disk confirms the data is written.
There is no way around this and, as you've seen, it greatly decreases throughput. This is why databases use a memory buffer and write batches of data from the buffer to disk in short intervals. However, this means that there's a small risk that a machine issue (power, disk failure, etc) happening after the data is written to the buffer but before it's written to the disk will cause data loss.
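To make that trade-off concrete, here is a hedged POSIX sketch (file name and record format are invented) of the difference between acknowledging a write once it reaches the OS buffer and acknowledging it only after fsync() confirms it is on the device:

#include <fcntl.h>
#include <unistd.h>

// Append one record and only report success once the device confirms the write.
// Calling fsync per record is the 'appendfsync always' style: safe but slow.
// Batching many records per fsync is the 'everysec' style: fast, but whatever
// was buffered since the last sync can be lost on power failure.
bool append_durable(int fd, const char* record, size_t len)
{
    if (write(fd, record, len) != (ssize_t)len)
        return false;                 // so far the data may only be in the OS page cache
    return fsync(fd) == 0;            // block until the disk (or its controller) has it
}

int main()
{
    int fd = open("/tmp/append.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return 1;
    const char rec[] = "key=42 value=hello\n";
    bool ok = append_durable(fd, rec, sizeof(rec) - 1);
    close(fd);
    return ok ? 0 : 1;
}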
On a single server, you can buy protection through multiple power supplies, battery backup, and other safeguards, but this gets tricky and expensive very quickly. This is why distributed architectures are so common today for both availability and redundancy. Distributed systems do not mean you lose consistency, rather they can help to ensure it by protecting your data.
The easiest way to solve your problem is to use a database that allows for replication so that every write goes to at least 2 different machines. This way, one machine losing power won't affect the write going to the other machine and your data is still safe.
You will still need to protect against a power outage at a higher level that can affect all the servers (like your entire data center losing power) but you can solve this by distributing across more boundaries. It all depends on what amount of risk is acceptable to you.
Between tweaking the disk-write intervals in your database and using a proper distributed architecture, you can get the consistency and performance requirements you need.
I work for Aerospike. You can choose to have your namespace stored in memory, on disk or in memory with disk persistence. In all of these scenarios we perform favourably in comparison to Redis in real world benchmarks.
Considering storage on disk: when a write happens, it hits a buffer before being flushed to disk, and the ack does not go back to the client until that buffer has been successfully written to. It is plausible that if you yank the power cable before the buffer flushes, in a single-node cluster the write might have been acked to the client and subsequently lost.
The answer is to have more than one node in the cluster and a replication-factor >= 2. The write then goes to the buffer on the master and the replica and has to succeed on both before being acked to the client as successful. If the power is pulled from one node, a copy would still exist on the other node and no data would be lost.
So, yes, it is possible to make Aerospike as resilient as it is reasonably possible to be at low cost with minimal latencies. The best thing to do is to download the community edition and see what you think. I suspect you will like it.
I believe Aerospike would serve your purpose. You can configure it for hybrid storage at the namespace (i.e. DB) level in aerospike.conf, which is present at /etc/aerospike/aerospike.conf.
For details please refer official documentation here: http://www.aerospike.com/docs/operations/configure/namespace/storage/
I believe you're going to be at the mercy of the latency of whatever the storage medium is, or the latency of the network fabric in the case of cluster, regardless of what DBMS technology you use, if you must have a guarantee that the data won't be lost. (N.B. Ben Bates' solution won't work if there is a possibility that the whole physical plant loses power, i.e. both nodes lose power. But, I would think an inexpensive UPS would substantially, if not completely, mitigate that concern.) And those latencies are going to cause a dramatic insert/update/delete performance drop compared to a standalone in-memory database instance.
Another option to consider is to use NVDIMM storage for either the in-memory database or for the write-ahead transaction log used to recover from. It will have the absolute lowest latency (comparable to conventional DRAM). And, if your in-memory database will fit in the available NVDIMM memory, you'll have the fastest recovery possible (no need to replay from a transaction log) and comparable performance to the original IMDB performance because you're back to a single write versus 2+ writes for adding a write-ahead log and/or replicating to another node in a cluster. But, your in-memory database system has to be able to support direct recovery of an in-memory database (not just from a transaction log). But, again, two requirements for this to be an option:
1. The entire database must fit in the NVDIMM memory
2. The database system has to be able to support recovery of the database directly after system restart, without a transaction log.
More in this white paper http://www.odbms.org/wp-content/uploads/2014/06/IMDS-NVDIMM-paper.pdf

Is it a good idea to store operational data in memcached?

I am writing a data processor in C++ which should process a lot of requests and do a lot of calculations; the requests are connected with each other. Now I am thinking about easy horizontal scalability.
Is it a good idea to use memcached with replication (an instance on every processor) to store operational data, so that every processor instance could process any request in roughly equal time?
How fast and stable is memcached replication?
Very fast. One major potential shortcoming of memcached is that it is not persistent. While a common design consideration when using a cache layer is that "data in cache may go away at any point", this can result in painful warm-up time and/or costly cache stampedes.
I would check out Couchbase. http://www.couchbase.com/ It stores the cached data in RAM, but also flushes it out to disk periodically so if a machine gets restarted, the data is still there.
It's very easy to add nodes on the fly as well.
Just for fun you could also check out Riak: http://basho.com/riak/. Very easy to add nodes as your cache needs grow and very easy to get up and running. Also focused on key/value storage, which is good for caching objects.

How can I know my array is in cache?

Let's say my array is 32 KB and L1 is 64 KB. Does Windows use some of it while my program is running? Maybe I am not able to use L1 because Windows is running other programs? Should I set the priority of my program so it can use all of the cache?
for (int i = 0; i < 8192; i++)
{
    array_3[i] += clock() * (rand() % 256); // are clock() and rand() in cache too?
    // How many times do I need to use a variable to make it stay in cache?
    // Or is the cache only for reading? Please see below.
    temp_a += array_x[i] * my_function();
}
The program is in C/C++.
Same thing for L2 too, please.
Also, are functions kept in cache? Is the cache read-only? (If I change my array, does it lose its place in the cache?)
Does the compiler generate asm code that makes better use of the cache?
Thanks
How can I know my array is in cache?
In general, you can't. Generally speaking, the cache is managed directly by hardware, not by Windows. You also can't control whether data resides in the cache (although it is possible to specify that an area of memory shouldn't be cached).
Does Windows use some of it while my program is running? Maybe I am not able to use L1 because Windows is running other programs? Should I set the priority of my program so it can use all of the cache?
The L1 and L2 caches are shared by all processes running on a given core. When your process is running, it will use all of the cache (if it needs it). When there's a context switch, some or all of the cache will be evicted, depending on what the second process needs. So the next time there's a context switch back to your process, the cache may have to be refilled all over again.
But again, this is all done automatically by the hardware.
Also, are functions kept in cache?
On most modern processors, there is a separate cache for instructions. See e.g. this diagram which shows the arrangement for the Intel Nehalem architecture; note the shared L2 and L3 caches, but the separate L1 caches for instructions and data.
Is the cache read-only? (If I change my array, does it lose its place in the cache?)
No. Caches can handle modified data, although this is considerably more complex (because of the problem of synchronising multiple caches in a multi-core system.)
Does the compiler generate asm code that makes better use of the cache?
As cache activity is generally all handled automatically by the hardware, no special instructions are needed.
The cache is not directly controlled by the operating system; it is managed in hardware.
In case of a context switch, another application may modify the cache, but you should not care about this. It is more important to handle cases where your own program behaves in a cache-unfriendly way.
Functions are kept in cache (the I-cache, i.e. the instruction cache).
The cache is not read-only: when you write something, it goes to [memory and] the cache.
The cache is primarily controlled by the hardware. However, I know that Windows scheduler tends to schedule execution of a thread to the same core as before specifically because of the caches. It understands that it will be necessary to reload them on another core. Windows is using this behavior at least since Windows 2000.
As others have stated, you generally cannot control what is in cache. If you are writing code for high-performance and need to rely on cache for performance, then it is not uncommon to write your code so that you are using about half the space of L1 cache. Methods for doing so involve a great deal of discussion beyond the scope of StackOverflow questions. Essentially, you would want to do as much work as possible on some data before moving on to other data.
As a matter of what works practically, using about half of cache leaves enough space for other things to occur that most of your data will remain in cache. You cannot rely on this without cooperation from the operating system and other aspects of the computing platform, so it may be a useful technique for speeding up research calculations but it cannot be used where real-time performance must be guaranteed, as in operating dangerous machinery.
There are additional caveats besides how much data you use. Using data that maps to the same cache lines can evict data from cache even though there is plenty of cache unused. Matrix transposes are notorious for this, because a matrix whose row length is a multiple of a moderate power of two will have columns in which elements map to a small set of cache lines. So learning to use cache efficiently is a significant job.
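As a concrete illustration of "do as much work as possible on some data before moving on", here is a sketch of cache blocking (tiling) applied to a matrix transpose; N and BLOCK are hypothetical sizes and would need tuning so that a pair of tiles fits comfortably within (say, half of) your L1 data cache:

#include <cstddef>

const std::size_t N = 1024;    // matrix dimension (illustrative)
const std::size_t BLOCK = 32;  // 32x32 floats = 4 KB per tile

// Transpose src into dst tile by tile so each tile stays cache-resident
// while it is being read and written, instead of streaming whole rows.
void transpose_blocked(const float* src, float* dst)
{
    for (std::size_t ii = 0; ii < N; ii += BLOCK)
        for (std::size_t jj = 0; jj < N; jj += BLOCK)
            for (std::size_t i = ii; i < ii + BLOCK; ++i)
                for (std::size_t j = jj; j < jj + BLOCK; ++j)
                    dst[j * N + i] = src[i * N + j];
}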
As far as I know, you can't control what will be in the cache. You can declare a variable as register var_type a and then access to it will take a single cycle (or a small number of cycles). Moreover, the number of cycles it takes to access a chunk of memory also depends on virtual memory translation and the TLB.
It should be noted that the register keyword is merely a suggestion and the compiler is perfectly free to ignore it, as was suggested by the comment.
Even though you may not know which data is in the cache and which is not, you can still get an idea of how much of the cache you are utilizing. Modern processors have quite a few performance counters, and some of them are related to the cache. Intel's processors can tell you how many L1 and L2 misses there were. Check this for more details on how to do it: How to read performance counters on i5, i7 CPUs
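On Linux, one way to read such counters from inside the program is the perf_event_open syscall. The sketch below is a hedged example (event availability and kernel permissions such as perf_event_paranoid vary by machine); it counts L1 data-cache read misses around a loop over a 32 KB array like the one in the question:

#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Thin wrapper: glibc does not export perf_event_open directly.
static long perf_event_open(perf_event_attr* attr, pid_t pid, int cpu,
                            int group_fd, unsigned long flags)
{
    return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int main()
{
    perf_event_attr attr;
    std::memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HW_CACHE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_CACHE_L1D |
                  (PERF_COUNT_HW_CACHE_OP_READ << 8) |
                  (PERF_COUNT_HW_CACHE_RESULT_MISS << 16);
    attr.disabled = 1;
    attr.exclude_kernel = 1;

    int fd = perf_event_open(&attr, 0, -1, -1, 0);   // this process, any CPU
    if (fd < 0) { std::perror("perf_event_open"); return 1; }

    static int array_3[8192];                        // ~32 KB of int data
    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    long long sum = 0;
    for (int i = 0; i < 8192; ++i)
        sum += array_3[i];                           // the measured region
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    std::uint64_t misses = 0;
    read(fd, &misses, sizeof(misses));               // default read_format: one u64 count
    std::printf("L1D read misses: %llu (sum=%lld)\n",
                (unsigned long long)misses, sum);
    close(fd);
    return 0;
}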