Apollo Server - Confusion about cache/datasource options

The docs (https://www.apollographql.com/docs/apollo-server/features/data-sources.html#Using-Memcached-Redis-as-a-cache-storage-backend) show code like this:
const { RedisCache } = require('apollo-server-cache-redis');

const server = new ApolloServer({
  typeDefs,
  resolvers,
  cache: new RedisCache({
    host: 'redis-server',
    // Options are passed through to the Redis client
  }),
  dataSources: () => ({
    moviesAPI: new MoviesAPI(),
  }),
});
I was wondering how that cache key is used, considering the caching seems to be custom-implemented in something like MoviesAPI() and then used via context.dataSources.moviesAPI.someFunc(). For example, say I wanted to implement my own cache for a SQL database. It'd look like:
  cache: new RedisCache({
    host: 'redis-server',
  }),
  dataSources: () => ({
    SQL: new SQLCache(),
  }),
});
where SQLCache has my own function that connects to the RedisCache like:
getCached(id, query, ttl) {
  const cacheKey = `sqlcache:${id}`;
  return redisCache.get(cacheKey).then(entry => {
    if (entry) {
      console.log('CACHE HIT!');
      return Promise.resolve(JSON.parse(entry));
    }
    console.log('CACHE MISS!');
    return query.then(rows => {
      // Note: the KeyValueCache interface takes the TTL in an options object
      if (rows) redisCache.set(cacheKey, JSON.stringify(rows), { ttl });
      return Promise.resolve(rows);
    });
  });
}
So that means I have RedisCache in both the ApolloServer cache key and the dataSource implementation. Clearly the RedisCache is used in the dataSource implementation, but then what does that ApolloServer cache key actually do?
Also, on the client, examples mostly show use of InMemoryCache rather than a Redis cache. Should the client-side Apollo cache be a different cache from the server-side one, or should the same cache, like RedisCache, be used in both places?

The cache passed to the ApolloServer is, to my knowledge, strictly used in the context of a RESTDataSource. When fetching resources from the REST endpoint, the server will examine the Cache-Control header on the response, and if one exists, will cache the resource appropriately. That means if the header is max-age=86400, the response will be cached with a TTL of 24 hours, and until the cache entry expires, it will be used instead of calling the same REST url.
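As a minimal sketch of that mechanism (the base URL and endpoint are made up; RESTDataSource comes from the apollo-datasource-rest package):

const { RESTDataSource } = require('apollo-datasource-rest');

class MoviesAPI extends RESTDataSource {
  constructor() {
    super();
    this.baseURL = 'https://movies-api.example.com/'; // hypothetical endpoint
  }

  async getMovie(id) {
    // If the response carries e.g. Cache-Control: max-age=86400, the
    // data source writes it to the cache passed to ApolloServer (the
    // RedisCache above) and serves it from there until the TTL expires.
    return this.get(`movies/${id}`);
  }
}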
This is different from the caching mechanism you've implemented, since your code caches the response from the database. Their intent is the same, but they work with different resources. The only way your code would effectively duplicate what ApolloServer's cache already does is if you had written a similar DataSource for a REST endpoint instead.
While both of these caches reduce the time it takes to process your GraphQL response (fetching from cache is noticeably faster than from the database), client-side caching reduces the number of requests that have to be made to your server. Most notably, the InMemoryCache lets you reuse one query across different places in your site (like different components in React) while only fetching the query once.
Because the client-side cache is normalized, it also means if a resource is already cached when fetched through one query, you can potentially avoid refetching it when it's requested with another query. For example, if you fetch a list of Users with one query and then fetch a user with another query, your client can be configured to look for the user in the cache instead of making the second query.
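For example, with Apollo Client 2's InMemoryCache, a cacheRedirect can resolve a user(id) query from a User that a list query already cached (the field and type names here are illustrative):

import { InMemoryCache } from 'apollo-cache-inmemory';

const cache = new InMemoryCache({
  cacheRedirects: {
    Query: {
      // Look up user(id: ...) in the normalized cache if a User with
      // that id was already fetched by another query (e.g. a users list).
      user: (_, args, { getCacheKey }) =>
        getCacheKey({ __typename: 'User', id: args.id }),
    },
  },
});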
It's important to note that while resources cached server-side typically have a TTL, the InMemoryCache does not. Instead, it uses "fetch policies" to determine the behavior of individual queries. This lets you, for example, have a query that always fetches from the server, regardless of what's in the cache.
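For instance (a sketch; GET_USER stands in for whatever query document you use):

client.query({
  query: GET_USER,
  // Always go to the server for this query, ignoring any cached result
  fetchPolicy: 'network-only',
});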
Hopefully that helps to illustrate that both server-side and client-side caching are useful but in very different ways.

Related

AWS API gateway cache - invalidate cache for all users of a tenant

I have a resource called Sites.
I am planning to have an endpoint as follows:
/tenant/:tenantId/users/:userId/sites/:siteId
The endpoint is to return a site's tree, which will vary based on the tenantId, userId and siteId.
The sites tree returned by this endpoint will also change based on updates to another resource (i.e. users/groups).
How should the cache be discarded for all users of a given tenant whenever there is a change in the site's resource itself or when there is a change in groups?
I understand that client Cache-Control headers can be used, but I'm not sure how they would apply in this situation. I am also aware of the stage cache purge, but I do not need to purge for all tenants here, so I'm not too keen on it.
The following approach can be followed.
Token Storage & Validation
Maintain a token list in an auth-details table (id, token, user_id, tenant_id)
Maintain the token in a cache with a time-to-live
While processing a request, validate the token against both the auth-details table (from the first step) and the token-validation API
Invalidating the token from the cache
Use AmazonMQ and define a publisher that publishes a message when the invalidation condition is met. Define a consumer which is responsible for purging the cache for a particular tenant. Use the @CacheEvict annotation to clear the data for that tenant:
@CacheEvict(value = "your-cache-name", key = "#tenantId + '_details'")
public Long deleteCacheForTenant(Long tenantId) {
    return tenantId;
}

CubeJS Multitenant: How to use COMPILE_CONTEXT as users access the server with different Tokens?

We've been getting started with CubeJS. We are using BigQuery, with the following hierarchy:
Project (all clients)
Dataset (corresponding to a single client)
Tables (different data types for a single client)
We'd like to use COMPILE_CONTEXT to allow different clients to access different Datasets based on the JWT that we issue them after authentication. The JWT includes the user info that'd cause our schema to select a different dataset:
const {
  securityContext: { dataset_id },
} = COMPILE_CONTEXT;

cube(`Sessions`, {
  sql: `SELECT * FROM ${dataset_id}.sessions_export`,

  measures: {
    // Count of all session objects
    count: {
      sql: `Status`,
      type: `count`,
    },
  },
});
In testing, we've found that the COMPILE_CONTEXT global variable is set when the server is launched, meaning that even if a different client submits a request to Cube with a different dataset_id, the old one is used by the server, sending info from the old dataset. The Cube docs on Multi-tenancy state that COMPILE_CONTEXT should be used in our scenario (at least, this is my understanding):
Multitenant COMPILE_CONTEXT should be used when users in fact access different databases. For example, if you provide SaaS ecommerce hosting and each of your customers have a separate database, then each ecommerce store should be modelled as a separate tenant.
SECURITY_CONTEXT, on the other hand, is set at Query time, so we tried to also access the appropriate data from SECURITY_CONTEXT like so:
cube(`Sessions`, {
  sql: `SELECT * FROM ${SECURITY_CONTEXT.dataset_id}.sessions_export`,
});
But the query being sent to the database (found in the error log in the Cube dev server) is (SELECT * FROM [object Object].sessions_export) AS sessions.
I'd love to inspect the SECURITY_CONTEXT variable, but I'm having trouble finding out how to do this, as, to my knowledge, it's only accessible within our cube's sql.
Any help would be appreciated! We are open to other routes besides those described above. In a nutshell, how can we deliver a specific dataset to a client using a unique JWT?
Given that all your datasets are in the same BigQuery database, I think your use-case reflects the Multiple DB Instances with Same Schema part of the documentation (that title could definitely be improved):
// cube.js
const PostgresDriver = require('@cubejs-backend/postgres-driver');

module.exports = {
  contextToAppId: ({ securityContext }) =>
    `CUBEJS_APP_${securityContext.dataset_id}`,
  driverFactory: ({ securityContext }) =>
    new PostgresDriver({
      database: `${securityContext.dataset_id}`,
    }),
};
// schema/Sessions.js
cube(`Sessions`, {
  sql: `SELECT * FROM sessions_export`,
});
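If you'd rather keep a single compiled schema and resolve the dataset at query time, note that SECURITY_CONTEXT.dataset_id is an accessor object rather than a plain string, which is why interpolating it directly printed [object Object]. A sketch using its unsafeValue() method (assuming your JWT carries dataset_id):

cube(`Sessions`, {
  // unsafeValue() extracts the raw value from the security context,
  // bypassing Cube's filter helpers, so validate it before interpolating.
  sql: `SELECT * FROM ${SECURITY_CONTEXT.dataset_id.unsafeValue()}.sessions_export`,
});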

Path based AWS API Caching Keys Issue

I have several API paths set up in a test API Gateway setup with a simple 'api' stage. I am using AWS Lambda and wish to cache the results of the lambda call.
There are three test paths (no authentication)
/a/{thing} (GET Caching turned on in stage)
/b/{thing} (GET Caching turned off in stage)
/c/{thing} (GET Caching turned off in stage)
They all map to the same lambda function. The lambda function returns the current time and the value of {thing}.
If I request /a/0000 through /a/1000 I get back the same result for a function that ran for thing=0000.
If I request /b/0000 through /b/1000 (or /c/) I get back uncached results.
thing is selected as 'cache' in the /a/{thing} resource settings. Nothing else has 'cache' set.
It is my understanding that selecting 'cache' next to a path element, query element, or header would construct a cache key - possibly a multi-key cache key hash. That would be ideal!
Ideally /a/0000 and /a/1234 would return a cached version keyed to the {thing} value.
What did I do wrong or misread or step over? Am I hitting a bug when it comes to AWS Lambda? Is caching keyed to authorization - these URLs are public and unauthenticated. I'm just using curl to request these and nothing is being cached on the client side of course.
Honestly, I've also tried using a query argument as the only cache key, let the cache flush, and waited 30 minutes to try again. It's still not giving the results I would expect.
Pro Tip:
You still have to deploy from resources to stage when you set up cache keys. This makes sense of course, but it would be good if the management console showed more about the method parameters than it does.
I am using Chalice, which is why I wasn't deploying in the normal fashion.
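For reference, the redeploy can also be done programmatically; a sketch with the AWS SDK for JavaScript (the API ID and region are placeholders):

const AWS = require('aws-sdk');
const apigateway = new AWS.APIGateway({ region: 'us-east-1' }); // placeholder region

// Redeploy the API so the cache-key settings on /a/{thing} take effect
apigateway.createDeployment(
  { restApiId: 'abc123xyz0', stageName: 'api' }, // placeholder IDs
  (err, data) => (err ? console.error(err) : console.log(data))
);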

How to invalidate AWS APIGateway cache

We have a service which inserts certain values into DynamoDB. For the sake of this question, let's say it's a key:value pair, i.e., customer_id:customer_email. The inserts don't happen that frequently, and once they are done, that specific key doesn't get updated.
What we have done is create a client library which, provided with a customer_id, will fetch the customer_email from DynamoDB.
Given that the customer_id data is static, we were thinking of adding a cache to the table, but one thing we are not sure about is what will happen in the following use case:
client_1 uses our library to fetch customer_email for customer_id = 2.
The customer doesn't exist so API Gateway returns not found
APIGateway will cache this response
For any subsequent calls, this cached response will be sent
Now another system inserts customer_id = 2 with its email id. This system doesn't know whether this response has been cached previously; it doesn't even know that any other system has fetched this specific data. How can we invalidate the cache for this specific customer_id when it gets inserted into DynamoDB?
You can send a request to the API endpoint with a Cache-Control: max-age=0 header, which will cause it to refresh.
This could open your application up to attack, as a bad actor could simply flood an expensive endpoint with lots of traffic and buckle your servers/database. In order to safeguard against that, it's best to use a signed request.
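As a minimal unauthenticated sketch (the URL is a placeholder; in production you'd sign the request as described above):

// Ask API Gateway to refresh its cache entry for this exact key
fetch('https://abc123.execute-api.us-east-1.amazonaws.com/api/customers/2', {
  headers: { 'Cache-Control': 'max-age=0' },
});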
In case it's useful to people, here's .NET code to create the signed request:
https://gist.github.com/secretorange/905b4811300d7c96c71fa9c6d115ee24
We've built a Lambda which takes care of re-filling the cache with updated results. It's quite a manual process, with very little reusable code, but it works.
The Lambda is triggered by the application itself, following application needs. For example, in CRUD operations the Lambda is triggered upon successful execution of POST, PATCH and DELETE on a specific resource, in order to clear the general GET request (i.e. clear GET /books whenever POST /book succeeds).
Unfortunately, if you have a view with a server-side paginated table, you are going to face all sorts of issues, because invalidating /books is not enough: you may actually have /books?page=2, /books?page=3 and so on... a nightmare!
I believe APIG should allow for more granular control of cache entries, otherwise many use cases aren't covered. It would be enough if they allowed choosing a root cache group for each request, so that we could manage cache entries by group rather than by single request (which, imho, is also less common).
Did you look at this https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-caching.html ?
There is a way to invalidate the entire cache or a particular cache entry.

How to use Ember Adapter

Why does replacing Todos.ApplicationAdapter = DS.FixtureAdapter.extend(); with
Todos.ApplicationAdapter = DS.LSAdapter.extend({
  namespace: "todos-emberjs"
});
give me local storage?
What is the meaning of namespace: "todos-emberjs"?
How many kinds of adapters are there? How should I use them? How do I define an adapter?
I just went through the EmberJS tutorial recently, and from what I understood:
1) What are EmberJS adapters?
The adapters are objects that take care of communication between your application and a server. Whenever your application asks the store for a record that it doesn't have cached, it will ask the adapter for it. If you change a record and save it, the store will hand the record to the adapter to send the appropriate data to your server and confirm that the save was successful.
2) What types of EmberJS adapters are available?
Right now I am only aware of DS.RESTAdapter, which is used by default by the store (it communicates with an HTTP server by transmitting JSON via XHR), DS.FixtureAdapter (something like in-memory storage, which is not persistent) and DS.LSAdapter (something like local storage, which is persistent).
3) Why LSAdapter instead of FixtureAdapter in the Todos tutorial?
FixtureAdapter stores data in memory, so whenever you refresh your page the data gets reassigned to its initial values. LSAdapter, available on GitHub, uses persistent storage to store and retrieve data, enabling you to retain all your changes even after you refresh the page.
4) Why namespace: "todos-emberjs"?
If your JSON API lives somewhere other than on the host root, you can set a prefix that will be added to all requests. For example, if your JSON APIs are available at /todo-emberjs/ you would want it to be used as a prefix to all the URLs that you are going to call. In that case, set namespace property to todo-emberjs.
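For instance, the same idea with the default RESTAdapter (a sketch; the prefix is taken from the example above):

Todos.ApplicationAdapter = DS.RESTAdapter.extend({
  namespace: 'todo-emberjs'
});

// A find for a todo with id 1 is now requested from /todo-emberjs/todos/1
// instead of /todos/1.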
(Hope it helps; loving EmberJS, btw!)