Using Hibernate in memory to cache web service results and run queries

I'm considering caching web service results in an in-memory database (accessed through Hibernate) and running queries on the cache to perform filtering and joins on data from multiple web services.
Is this a good idea, especially from a performance point of view, given that I have to return a response to a client? Has anyone done something similar? Are there any alternatives?
Thanks.
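A minimal sketch of the idea in the question, using Python's built-in sqlite3 in-memory database as a stand-in for a Hibernate-backed in-memory database (an assumption: Hibernate itself is a Java ORM that would sit on top of an embedded in-memory database). fetch_customers and fetch_orders are hypothetical placeholders for the web service calls.

```python
import sqlite3

def fetch_customers():
    # Hypothetical stand-in for the first web service call.
    return [(1, "Alice"), (2, "Bob")]

def fetch_orders():
    # Hypothetical stand-in for the second web service call.
    return [(100, 1, 25.0), (101, 1, 40.0), (102, 2, 15.0)]

def build_cache():
    """Load the web service results into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?)", fetch_customers())
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", fetch_orders())
    # An index on the join column keeps the join cheap as the cache grows.
    conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    return conn

def big_spenders(conn, min_total):
    """Filter and join across data that came from two different services."""
    return conn.execute(
        """
        SELECT c.name, SUM(o.total) AS spent
        FROM customers c JOIN orders o ON o.customer_id = c.id
        GROUP BY c.id, c.name
        HAVING SUM(o.total) >= ?
        ORDER BY spent DESC
        """,
        (min_total,),
    ).fetchall()

conn = build_cache()
print(big_spenders(conn, 20.0))  # [('Alice', 65.0)]
```

The cached tables are only a snapshot of the services' data, so you would still need to decide when to refresh them; the filtering and joining themselves run entirely in memory.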

Related

Do I need to use a caching technology like memcached or Redis?

I am new to web development in general.
I am developing a social media website (very much like Twitter) using Django REST framework as the backend and React as the frontend. I am going to deploy the app to Heroku.
I just heard about memcached and Redis. What is the use case for them? Should I use one, or are they just for high-traffic websites?
A cache here generally means an in-memory cache, which stores data primarily in memory (like memcached and Redis) and provides faster data access under heavy traffic.
Cache-database consistency is always an issue, because you now have multiple data sources. There are good techniques to improve it, but the two are never perfectly in sync.
So it depends on your read/write traffic. If the database handles the traffic without performance issues, you don't need to consider a cache (most production databases, such as MySQL or DynamoDB, also have caching). If the database cannot handle your traffic, you should consider adding a cache.
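A minimal cache-aside sketch of the pattern described above. TTLCache is a hypothetical in-process stand-in for memcached or Redis, and the plain dict passed as db stands in for the database; both are assumptions for illustration. The expiry time bounds how long a stale value can survive, which is one common way to limit the cache-database consistency problem.

```python
import time

class TTLCache:
    """Tiny in-process cache with per-entry expiry (stand-in for memcached/Redis)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            # Expired: drop it so the next read goes to the database.
            del self._data[key]
            return None
        return value

    def set(self, key, value):
        self._data[key] = (value, self.clock() + self.ttl)

    def delete(self, key):
        self._data.pop(key, None)

def get_profile(user_id, cache, db):
    """Cache-aside read: try the cache, fall back to the database, then fill the cache."""
    key = f"profile:{user_id}"
    profile = cache.get(key)
    if profile is None:
        profile = db[user_id]  # hypothetical database lookup
        cache.set(key, profile)
    return profile

def update_profile(user_id, profile, cache, db):
    """Write to the database first, then invalidate the cached copy."""
    db[user_id] = profile
    cache.delete(f"profile:{user_id}")
```

Writes that go through update_profile invalidate the cache immediately; a change made to the database behind the cache's back is only picked up once the entry expires.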

Apache Superset is very slow

Any recommendations on how to make Superset faster?
The cache seems to load the full data from the cache. I thought it loaded only old data from the cache and real-time data from the database. Isn't that how it works?
What about some parallel processing?
This answer is valid as of Superset 0.37.0.
At the moment, dashboard performance is affected by a few different factors. I'll enumerate them below along with methods to improve performance:
Database concurrency limits: dashboards load their information in parallel via concurrent web requests, so the database's concurrency limits can have an impact on dashboard performance. Make sure that the database user you provide allows enough concurrency that queries aren't queued at the database layer.
Cache performance: your caching layer should be able to return multiple results in parallel, or at least extremely quickly. We've had success leveraging S3 for our cache.
Cache hit percentage: Superset will hit the cache only for queries that exactly match one that has been run recently. Otherwise the full query falls through to the underlying analytical database (Druid in this case). You can reduce the query load on Druid by using a less granular time resolution on your dashboard. If it is acceptable for the dashboard to update less frequently, say a couple of times a day rather than in real time, every request other than the first one in each new period can be served from the cache.
Python web process concurrency limits: make sure that your web application server can handle enough parallel requests. The browser requests data for multiple charts at the same time, and the system needs to be able to handle these requests in parallel.
Chart query performance: since data is requested frequently, especially real-time data from a database like Druid, optimizing the queries run by the charts can be very useful. I'd look at any virtual datasources being used to see whether they can be materialized or made more efficient.
Web browser concurrent request limits: by default, most web browsers limit the number of concurrent requests that can be made to the same FQDN. If you have more than 6 charts on the same dashboard, it can help to balance requests across multiple FQDNs running Superset to get around this browser limitation. There's more information on this approach in the issue history on GitHub, and Superset does support this type of configuration.
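The cache hit percentage point can be illustrated with a simplified model of an exact-match cache. This is a sketch, not Superset's actual cache-key logic: the events table, the 7-day window, and the MD5-of-SQL key are all assumptions made for illustration. Rounding the query's end time down to a coarser bucket makes every request within the same window produce identical SQL, and therefore the same cache key.

```python
import hashlib
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def bucket_start(now, granularity):
    """Round `now` down to the start of its time bucket (e.g. a 12-hour window)."""
    return EPOCH + ((now - EPOCH) // granularity) * granularity

def chart_query(now, granularity):
    """SQL for a hypothetical 'last 7 days' chart, ending at a bucket boundary."""
    end = bucket_start(now, granularity)
    start = end - timedelta(days=7)
    return (
        "SELECT day, COUNT(*) FROM events "
        f"WHERE ts >= '{start:%Y-%m-%d %H:%M:%S}' AND ts < '{end:%Y-%m-%d %H:%M:%S}' "
        "GROUP BY day"
    )

def cache_key(sql):
    """Exact-match cache key: only byte-identical SQL maps to the same entry."""
    return hashlib.md5(sql.encode("utf-8")).hexdigest()

t1 = datetime(2020, 9, 1, 9, 15)
t2 = datetime(2020, 9, 1, 11, 40)
for label, step in [("1 second", timedelta(seconds=1)), ("12 hours", timedelta(hours=12))]:
    same = cache_key(chart_query(t1, step)) == cache_key(chart_query(t2, step))
    print(f"{label}: second request hits cache? {same}")
```

With one-second resolution the two requests produce different SQL and the second one misses; with 12-hour buckets both fall in the same window, so only the first request in each window reaches the database.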
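The workaround for browser connection limits amounts to spreading chart requests across several hostnames that serve the same Superset deployment. This sketch only shows the round-robin assignment; the hostnames and the /api/chart/<id>/data URL pattern are hypothetical, and in practice you would use Superset's own configuration for this rather than building URLs by hand.

```python
from itertools import cycle

# Hypothetical hostnames that all point at the same Superset deployment.
DOMAINS = ["superset1.example.com", "superset2.example.com", "superset3.example.com"]

def assign_chart_urls(chart_ids, domains=DOMAINS):
    """Spread chart data requests across several FQDNs, round-robin.

    Browsers cap concurrent connections per hostname, so spreading charts
    over N hostnames raises the number of requests that can be in flight.
    """
    hosts = cycle(domains)
    return {
        chart_id: f"https://{next(hosts)}/api/chart/{chart_id}/data"
        for chart_id in chart_ids
    }
```

With three hostnames, a dashboard of 7 charts ends up with at most 3 requests per hostname instead of 7 on one.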
The community is very interested in improving performance over time, and there have been proposals to move all analytical queries to Celery and to make other architectural changes to improve performance. I hope this description helps you track down the issue!

How to configure a Sitecore processing server?

I just installed Sitecore Experience Platform and configured it according to the Sitecore scaling recommendations for processing servers.
I'd like to know the following:
1. How can I use the Sitecore processing server?
2. How can I check whether the processing server is working correctly?
3. How is collection database data processed and sent to the reporting server?
The processing server is one part of the analytics (xDB) side of the Sitecore solution. More info can be found here.
Snippet:
"The processing and aggregation component extracts information from captured, raw analytics data and transforms it into a form suitable for use in reporting applications. It also performs specific tasks on the collection database that involve mass updates. You implement processing and aggregation on a Sitecore application server connected to both the collection and reporting databases. A processing server can run independently on a dedicated server, or on the same server together with other Sitecore components. By implementing multiple processing or aggregation servers, it is possible to achieve higher performance on high-traffic solutions."
In short: the processing server aggregates the data in MongoDB and processes it into the reporting database. It can be put on a separate server to spare resources on your other servers. I'm not quite sure what it all does behind the scenes or how to check exactly that part of the process in isolation, but you could check the reporting tools in the Sitecore backend, such as Experience Analytics. If those are working, you are probably fine. Also check the logs on the processing server; they will give you an indication of what it is doing and whether any errors occur.

Pattern for sharing a large amount of data between the web application and a backend service in a Service Oriented Application

I have a web application which performs CRUD operations on a database. At times, I have to run a backend job to do a fair amount of number crunching/analytics on this data. This backend job will be written as a different service in a concurrent language, which will be independent of the main web application.
But sharing the database between the two applications is probably not best practice, since it leads to tight coupling. What is the right pattern to use here? Since this data might amount to millions of rows, I'm not sure a message queue or REST APIs would be the best way to go.
This is perhaps a very common scenario and many companies/devs have already solved this problem. Any pointers will be helpful.
From the question, it would seem that the background job does not modify the state of the database.
The simplest way to avoid a performance hit on the main application while the background job is running is to take a database dump and perform the analysis on that dump.
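A small sketch of the dump-and-analyze idea, using SQLite's online backup API from Python's sqlite3 module (Connection.backup, Python 3.7+) as a stand-in for a real dump; with PostgreSQL this might instead be pg_dump restored into a separate instance. The database engine and the orders table are assumptions for illustration. Only the copy step touches the live database; the heavy queries run against the copy.

```python
import sqlite3

def snapshot(live_conn):
    """Copy the live database into a separate in-memory database for analysis."""
    snap = sqlite3.connect(":memory:")
    # Online backup API: copies all pages of the live database into `snap`.
    live_conn.backup(snap)
    return snap

def average_order_value(conn):
    """The 'number crunching' job, run only against the snapshot."""
    (avg,) = conn.execute("SELECT AVG(total) FROM orders").fetchone()
    return avg

live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
live.executemany("INSERT INTO orders (total) VALUES (?)", [(10.0,), (20.0,), (30.0,)])
live.commit()

snap = snapshot(live)
# The web application keeps writing to the live database; the snapshot is unaffected.
live.execute("INSERT INTO orders (total) VALUES (?)", (1000.0,))
live.commit()
print(average_order_value(snap))  # 20.0
```

The trade-off is freshness: the analysis sees the data as of the dump, which is usually acceptable for a periodic analytics job.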

Using Redis as intermediary cache for REST API

We have an iOS app that talks to a Django server via a REST API. Most of the data consists of rather large Item objects that involve a few related models and render into a single flat dictionary; this data changes rarely.
We've found that querying this is not a problem for Postgres, but generating the JSON responses takes a noticeable amount of time. On the other hand, item collections vary per user.
I thought about a rendering system where we build a dictionary for each Item object and save it into Redis as a JSON string. This way we can serve the API directly from Redis (e.g. HMGET with the ids of the items in the user's library), which is fast and makes it relatively easy to regenerate the "rendered instances": basically just a couple of post_save signals.
How good is this design? Are there any major flaws in it? Is there a better way to do this?
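A rough sketch of the proposed rendering scheme. FakeRedisHash is a dict-backed stand-in for a single Redis hash so the example runs without a server (with the redis-py client the corresponding calls would be hset and hmget); the item fields and render_item are hypothetical. on_item_saved plays the role of the post_save signal handler, and the response is assembled by joining the stored JSON strings rather than serializing each item again.

```python
import json

class FakeRedisHash:
    """Dict-backed stand-in for one Redis hash (HSET / HMGET semantics only)."""

    def __init__(self):
        self._fields = {}

    def hset(self, field, value):
        self._fields[field] = value

    def hmget(self, fields):
        # Like Redis HMGET: order is preserved and missing fields come back as None.
        return [self._fields.get(f) for f in fields]

RENDERED = FakeRedisHash()

def render_item(item):
    """Flatten an Item and its related models into one dict (hypothetical fields)."""
    return {"id": item["id"], "title": item["title"], "author": item["author"]["name"]}

def on_item_saved(item):
    """What a post_save handler would do: re-render the item and store it as JSON."""
    rendered = json.dumps(render_item(item))
    RENDERED.hset(str(item["id"]), rendered)
    return rendered

def library_response(item_ids, load_item):
    """Build a user's library response from pre-rendered JSON strings."""
    cached = RENDERED.hmget([str(i) for i in item_ids])
    parts = []
    for item_id, raw in zip(item_ids, cached):
        if raw is None:
            # Not rendered yet: render on demand and store it for next time.
            raw = on_item_saved(load_item(item_id))
        parts.append(raw)
    # Splice the stored JSON strings together instead of re-serializing objects.
    return "[" + ",".join(parts) + "]"
```

Because the per-user part is just the list of ids, the expensive rendering is shared across all users who have the same item.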
Sure, we do the same at our firm, using Redis to store not JSON but large XML strings which are generated from backend databases for RESTful requests, and it saves lots of network hops and overhead.
A few things to keep in mind if this is the first time you're using Redis...
Dedicated Redis Server
Redis executes commands on a single thread, so it should be deployed on a dedicated server with sufficient CPU power. Don't make the mistake of deploying it on your app or database server.
High Availability
Set up Redis with master/replica (formerly "master/slave") replication for high availability. There has been a lot of progress with Redis Cluster, so you may want to look into that for HA too.
Cache Hit/Miss
When checking Redis for a cache "hit", if the connection is dead or any exception occurs, don't fail the request; just fall back to the database. Caching should always be "best effort", since the database can always be used as a last resort.
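A small sketch of the best-effort read described above. cache_get and db_get are hypothetical callables standing in for the Redis client and the database query; any exception from the cache is logged and treated as a miss, so the request still succeeds.

```python
import logging

log = logging.getLogger(__name__)

def get_with_fallback(key, cache_get, db_get):
    """Best-effort cache read: any cache failure degrades to a database read."""
    try:
        value = cache_get(key)
    except Exception:  # e.g. connection refused or a timeout from the cache client
        log.warning("cache unavailable for %r, falling back to database", key)
        value = None
    if value is None:
        value = db_get(key)
    return value
```

Catching broadly here is deliberate: the cache is an optimization, so no cache error should be allowed to turn into a failed request.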