I am fairly new to Google Cloud Platform and am considering using it.
Can I route a URL to an HTML string through an API?
The idea is to create some static cache mechanism for heavy pages.
I would also need to remove or update the cache.
Maybe I'm not understanding what you need, but wouldn't it make more sense to use the already available Cloud Memorystore?
What it's good for
Cloud Memorystore for Redis provides a fast, in-memory store for use cases that require fast, real-time processing of data. From simple caching use cases to real time analytics, Cloud Memorystore for Redis provides the performance you need.
Caching: Cache is an integral part of modern application
architectures. Cloud Memorystore for Redis provides low latency access
and high throughput for heavily accessed data, compared to accessing
the data from a disk based backend store. Session management,
frequently accessed queries, scripts, and pages are common examples of
caching.
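The page-cache idea from the question (store rendered HTML per URL, then remove or update it) could be sketched roughly as follows. This is only an illustration: a plain dict stands in for a Memorystore/Redis instance, and `PageCache`, `render_heavy_page`, and the TTL value are hypothetical names chosen for the example, not part of any Google Cloud API.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class PageCache:
    """Caches rendered HTML strings by URL path, with a time-to-live."""

    def __init__(self, ttl_seconds: float = 300.0) -> None:
        self.ttl_seconds = ttl_seconds
        # Maps path -> (expiry timestamp, html). A dict stands in for Redis here.
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, path: str) -> Optional[str]:
        entry = self._store.get(path)
        if entry is None:
            return None
        expires_at, html = entry
        if time.monotonic() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._store[path]
            return None
        return html

    def set(self, path: str, html: str) -> None:
        # Also serves as "update": a second set() overwrites the old HTML.
        self._store[path] = (time.monotonic() + self.ttl_seconds, html)

    def remove(self, path: str) -> None:
        self._store.pop(path, None)

    def get_or_render(self, path: str, render: Callable[[str], str]) -> str:
        html = self.get(path)
        if html is None:
            html = render(path)
            self.set(path, html)
        return html


def render_heavy_page(path: str) -> str:
    # Hypothetical placeholder for an expensive page render.
    return f"<html><body>Rendered {path}</body></html>"
```

A request handler would call `get_or_render(path, render_heavy_page)` and call `remove(path)` whenever the underlying content changes.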
I am new to web development in general.
I am developing a social media website (very much like Twitter) using Django REST framework as the backend and React as the front end. I am going to deploy the app to Heroku.
I just heard about Memcached and Redis. What is the use case here? Should I use them, or are they just for high-traffic websites?
A cache usually means an in-memory cache (such as Memcached or Redis), which stores data primarily in memory and provides faster data access under heavy traffic.
Cache-database consistency has always been an issue, because you then have multiple data sources. There are good solutions that improve it, but the two are still never perfectly in sync.
So it depends on your read/write traffic: if the database can handle the traffic without performance issues, you don't need to consider a cache (most production databases, such as MySQL or DynamoDB, also do their own caching). If the database cannot handle your traffic, you should consider using a cache.
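One common way to limit the consistency problem mentioned above is the cache-aside pattern with invalidation on write. The sketch below is a simplified illustration only: two dicts stand in for the database and for Redis/Memcached, and `UserStore` and its fields are hypothetical names.

```python
from typing import Dict, Optional


class UserStore:
    """Cache-aside wrapper: reads check the cache first, writes invalidate it."""

    def __init__(self) -> None:
        self.db: Dict[int, str] = {}     # stand-in for the real database
        self.cache: Dict[int, str] = {}  # stand-in for Redis or Memcached
        self.db_reads = 0                # counts how often the database is hit

    def read(self, user_id: int) -> Optional[str]:
        if user_id in self.cache:
            return self.cache[user_id]
        # Cache miss: fall through to the database and remember the result.
        self.db_reads += 1
        value = self.db.get(user_id)
        if value is not None:
            self.cache[user_id] = value
        return value

    def write(self, user_id: int, value: str) -> None:
        self.db[user_id] = value
        # Invalidate rather than update, so the next read reloads from the database.
        self.cache.pop(user_id, None)
```

Repeated reads of the same record hit the database only once until the record is written again.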
Any recommendation on how to make superset faster?
The cache seems to load the full data set from the cache. I thought it would load only old data from the cache and real-time data from the database. Isn't that how it works?
What about some parallel processing?
This answer is valid as of Superset 0.37.0.
At the moment, dashboard performance is affected by a few different factors. I'll enumerate them below along with methods to improve performance:
Database concurrency limits can have an impact on dashboard performance. Dashboards load their information in parallel via concurrent web requests. Make sure that the database user provided allows enough concurrency that queries aren't being queued at the database layer.
Cache performance: your caching layer should be able to return multiple results, if not in parallel, extremely quickly. We've had success leveraging S3 for our cache.
Cache hit percentage: Superset will hit the cache only for queries that exactly match one that has been run recently. Otherwise the full query will fall through to the underlying analytical DB (Druid in this case). You can reduce the query load on Druid by using a less granular resolution on your dashboard. If it's possible to have it update less frequently, say a couple of times a day rather than in real time, this can hit the cache for all requests other than the first request in the new period under consideration.
Python web process concurrency limits: make sure that your web application server can handle enough parallel requests. The browser will request multiple charts' data at the same time, and the system will need to be able to handle these requests in parallel.
Chart query performance: as data is frequently requested, especially for real-time data from a database like Druid, optimizing the queries run by the charts can be very useful. I'd take a look at any virtual datasources that are being leveraged to see if they can be materialized or made more efficient.
Web browser concurrent request limits: by default, most web browsers limit the number of concurrent requests that can be made to the same FQDN. If you have more than 6 charts on the same dashboard, it can be helpful to balance requests across multiple FQDNs running Superset to get around this browser limitation. There's more information on the approach in the issue history on GitHub, but Superset does support this type of configuration.
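To illustrate why a coarser time resolution improves the cache hit percentage, here is a small sketch of the principle. It is not Superset's actual cache-key logic; `floor_to_bucket` and `cache_key` are hypothetical helpers. Rounding the end of the time range down to a bucket boundary makes every request within the same bucket produce an identical key.

```python
import hashlib
from datetime import datetime, timedelta, timezone


def floor_to_bucket(ts: datetime, bucket: timedelta) -> datetime:
    """Round a timezone-aware timestamp down to the start of its bucket."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    elapsed = ts - epoch
    # timedelta // timedelta yields an int: the number of whole buckets elapsed.
    return epoch + (elapsed // bucket) * bucket


def cache_key(query: str, end: datetime, bucket: timedelta) -> str:
    """Build a cache key from the query text and the bucketed end time."""
    bucketed_end = floor_to_bucket(end, bucket)
    raw = f"{query}|{bucketed_end.isoformat()}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()
```

With a 12-hour bucket, two requests made at 13:05 and 23:59 UTC on the same day map to the same key, so only the first one in each period reaches the database.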
The community is very interested in improving performance over time, and as such there have been recommendations to move all analytical queries to Celery as well as making other architectural changes to improve performance. I hope this description helps and that something in here will help you track down the issue!
Which of the caching strategies can be implemented in django?
What are the pros and cons of each cache backend (in terms of ease of use and ease of development)?
Which backend should be preferred for production?
There are several backends that Django supports.
Listing a few here with comments around the positives and negatives of each.
Memcached: The gold standard for caching. An in-memory service that can return keys at a very fast rate. Not a good choice if your keys are very large in size.
Redis: A good alternative to Memcached when you want to cache very large keys (for example, large chunks of rendered JSON for an API).
DynamoDB: Another good alternative to Memcached when you want to cache very large keys. Also scales very well with little IT overhead.
Localmem: Only use for local testing; don't go into production with this cache type.
Database: It’s rare that you’ll find a use case where the database caching makes sense. It may be useful for local testing, but otherwise, avoid.
File system: Can be a trap. Although reading and writing files can be faster than making sql queries, it has some pitfalls. Each cache is local to the application server (not shared), and if you have a lot of cache keys, you can theoretically hit the file system limit for number of files allowed.
Dummy: A great backend to use for local testing when you want your data changes to be made immediately without caching. Be warned: permanently using dummy caching locally can hide bugs from you until they hit an environment where caching is enabled.
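As a rough sketch of how the choice above might look in a settings module, the function below returns a `CACHES` dict per environment. The environment names, the `cache_settings` helper, and the Memcached address are assumptions made for the example; the backend paths are Django's built-in ones, and `PyMemcacheCache` assumes Django 3.2 or later.

```python
from typing import Any, Dict


def cache_settings(environment: str) -> Dict[str, Dict[str, Any]]:
    """Return a Django CACHES dict for the given (hypothetical) environment name."""
    if environment == "production":
        return {
            "default": {
                "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
                "LOCATION": "127.0.0.1:11211",  # assumed Memcached address
            }
        }
    if environment == "test":
        return {
            "default": {
                "BACKEND": "django.core.cache.backends.locmem.LocMemCache",
            }
        }
    # Local development: caching disabled, so data changes show up immediately.
    return {
        "default": {
            "BACKEND": "django.core.cache.backends.dummy.DummyCache",
        }
    }


# In settings.py one might then write:
CACHES = cache_settings("development")
```

Keeping the switch in one place makes it easier to run with real caching enabled now and then, which helps surface the bugs that permanent dummy caching can hide.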
I am starting to learn and use AppFabric Cache.
From the whitepaper, http://msdn.microsoft.com/en-us/library/gg186017%28v=azure.10%29.aspx, it says:
Bulk get calls result in better network utilization. Direct cache access is much faster than proxies (ASP.NET, WCF).
I am not sure what this means. What is a proxy in the AppFabric world?
We build websites based on ASP.NET MVC, so if we write some logic to access our AppFabric cluster, will it be called from ASP.NET MVC code?
Many Thanks
If you look at the document referenced by that page, it explains what is meant by a proxy:
In some cases, the cache client is wrapped and accessed via a proxy
with additional application or domain logic. Oftentimes, performance
of such applications is much different from the Windows Server
AppFabric Cache cluster itself. The goal of the tests in this category
is to show performance of a middle tier application with additional
logic and compare it with performance of direct access to the cache.
To accomplish the goal, a simple WCF application was implemented that
provided access to the cache and contained additional logic of
populating the cache from an external data source if the requested
object is not yet in the cache.
The document contains details on how this affected performance, but if you need more detail the source code used is available.
Using the DataCacheFactory (and/or AppFabric Session provider) from your MVC site will access the cache cluster directly, once you've granted access to the Application Pool user.
So if for example I am trying to implement something that looks like Facebook's Graph API that needs to be very quick and support millions of users, what is the disadvantage of just using Redis instead of a RDBMS?
Thanks!
Jonathan
There are plenty of potential benefits and potential drawbacks of using Redis instead of a classical RDBMS. They are very different beasts indeed.
Focusing only on the potential drawbacks:
Redis is an in-memory store: all your data must fit in memory. An RDBMS usually stores the data on disk and caches part of it in memory. With an RDBMS, you can manage more data than you have memory. With Redis, you cannot.
Redis is a data structure server. There is no query language (only commands) and no support for a relational algebra. You cannot submit ad-hoc queries (like you can using SQL on a RDBMS). All data accesses should be anticipated by the developer, and proper data access paths must be designed. A lot of flexibility is lost.
Redis offers two options for persistence: regular snapshotting and append-only files. Neither is as safe as a real transactional server providing redo/undo logging, block checksumming, point-in-time recovery, flashback capabilities, etc.
Redis only offers basic security (in terms of access rights) at the instance level. RDBMS all provide fine-grained per-object access control lists (or role management).
A single Redis instance is not scalable. It runs on only one CPU core, in single-threaded mode. To get scalability, several Redis instances must be deployed and started. Distribution and sharding are done on the client side (i.e. the developer has to take care of them). Compared to a single Redis instance, most RDBMS provide more scalability (typically by providing parallelism at the connection level). They are multi-process (Oracle, PostgreSQL, ...) or multi-threaded (MySQL, Microsoft SQL Server, ...), taking advantage of multi-core machines.
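The client-side sharding described in the last point can be sketched as follows. This is a minimal illustration, not a real Redis client: each dict stands in for a connection to a separate Redis instance, and `ShardedStore` is a hypothetical name.

```python
import zlib
from typing import Dict, List, Optional


class ShardedStore:
    """Routes each key to one of several stores by hashing the key."""

    def __init__(self, shard_count: int) -> None:
        # Each dict stands in for a connection to a separate Redis instance.
        self.shards: List[Dict[str, str]] = [{} for _ in range(shard_count)]

    def _shard_for(self, key: str) -> Dict[str, str]:
        # CRC32 is stable across processes, unlike Python's built-in hash().
        index = zlib.crc32(key.encode("utf-8")) % len(self.shards)
        return self.shards[index]

    def set(self, key: str, value: str) -> None:
        self._shard_for(key)[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._shard_for(key).get(key)
```

Simple modulo routing like this remaps most keys when the number of shards changes; consistent hashing is the usual way to reduce that, at the cost of more client-side logic.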
Here, I have only described the main drawbacks, but keep in mind there are also plenty of benefits in using Redis (very fast, good concurrency support, low latency, protocol pipelining, easy to implement optimistic concurrency patterns, good usability/complexity ratio, excellent support from Salvatore and Pieter, pragmatic no-nonsense approach, ...).
For your specific problem (graph), I would suggest having a look at Neo4j or OrientDB, which are specifically designed to store graph-oriented data.
I have some additions:
Redis also limits value length. When using Redis, always keep the size of your keys and values in mind, especially in Redis Cluster.