I find this hard to believe, but it seems that when implementing a Redis-backed cache in Apollo Server, if the Redis client is unable to connect to the server, the Apollo request is blocked until it can connect.
I've been digging through Apollo Server's source code, and it seems the behavior probably comes down to the Redis client itself.
This may be some configuration in ioredis, but I'm not able to find the right combination. Hoping someone can help.
When Apollo does its fetch through its HTTPCache class, this is the problematic code:
async fetch(
request: Request,
options: {
cacheKey?: string;
cacheOptions?:
| CacheOptions
| ((response: Response, request: Request) => CacheOptions | undefined);
} = {},
): Promise<Response> {
const cacheKey = options.cacheKey ? options.cacheKey : request.url;
const entry = await this.keyValueCache.get(cacheKey);
The very first thing it does is a get. And if that never resolves, it never continues on.
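Worst case, I could guard the cache myself with a timeout (a minimal sketch, assuming I control the KeyValueCache instance handed to Apollo; the wrapper class and the 50 ms budget are my own, not part of Apollo's API):

import type { KeyValueCache } from 'apollo-server-caching';

// Race every cache call against a timer so that a dead Redis connection
// degrades into a cache miss instead of blocking the request forever.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T | undefined> {
  const timer = new Promise<undefined>((resolve) => setTimeout(() => resolve(undefined), ms));
  return Promise.race([promise, timer]);
}

class TimeoutCache implements KeyValueCache {
  constructor(private inner: KeyValueCache, private ms = 50) {}

  get(key: string): Promise<string | undefined> {
    // Resolves with undefined (a miss) if Redis does not answer in time.
    return withTimeout(this.inner.get(key), this.ms);
  }

  async set(key: string, value: string, options?: { ttl?: number }): Promise<void> {
    await withTimeout(this.inner.set(key, value, options), this.ms);
  }

  async delete(key: string): Promise<boolean | void> {
    return withTimeout(this.inner.delete(key), this.ms);
  }
}

But I would rather fix this with the right client configuration.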
And here is my semi-random attempt at finding the right configuration for ioredis:
const cluster = new Redis.Cluster(
[
{
host: config.redis.endpoint,
port: config.redis.port,
},
],
{
retryDelayOnTryAgain: 0,
retryDelayOnClusterDown: 0,
retryDelayOnFailover: 0,
slotsRefreshTimeout: 0,
clusterRetryStrategy: (times, reason) => {
// Function should return how long to wait before retrying to connect to redis.
const maxRetryDelay = 30000;
const delay = Math.min(times * 1000, maxRetryDelay);
logger.info(`Redis`, `Connection retry. Try number ${times}. Delay: ${delay}`);
if (reason) logger.error(reason.message);
return delay; // Steadily increase retry times until max which is defined above.
},
redisOptions: {
tls: {
rejectUnauthorized: false,
},
autoResendUnfulfilledCommands: false,
retryStrategy: () => {
return;
},
disconnectTimeout: 0,
reconnectOnError: () => false,
connectTimeout: 0,
commandTimeout: 0,
maxRetriesPerRequest: 0,
connectionName: 'Tank Dev',
username: config.redis.auth.username,
password: config.redis.auth.password,
},
slotsRefreshInterval: 60000,
},
);
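Two ioredis options that look like they target exactly this failure mode, which I plan to try next (this is my reading of the docs, so treat it as an assumption): enableOfflineQueue, which when set to false makes commands fail immediately while no connection is available instead of being queued, and commandTimeout, which rejects any command that receives no reply within the given number of milliseconds. A minimal sketch:

const cluster = new Redis.Cluster(
  [{ host: config.redis.endpoint, port: config.redis.port }],
  {
    // Fail fast instead of queueing commands while the cluster is unreachable.
    enableOfflineQueue: false,
    redisOptions: {
      // Reject any command that has not been answered within 100 ms.
      commandTimeout: 100,
    },
  },
);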
I need to create a k8s resource which takes some time until it becomes available.
For this I use the following:
https://pkg.go.dev/sigs.k8s.io/controller-runtime/pkg/controller/controllerutil#example-CreateOrUpdate
op, err := controllerutil.CreateOrUpdate(context.TODO(), c, deploy, func() error {
// mutate deploy here; the closure must return nil on success
return nil
})
func2()
Now I need to call func2 right after the creation of the object is done (it may take 2-3 minutes until it finishes).
How should I do this correctly?
I found this, but I'm not sure how to combine them:
https://pkg.go.dev/sigs.k8s.io/controller-runtime/pkg#hdr-Watching_and_EventHandling
I'm using kubebuilder.
The above approach is more for CLI usage.
When you are using kubebuilder or the Operator SDK, you need to handle this in your reconcile function.
Usually you have a custom resource that triggers your controller's reconcile function. When the custom resource is being created, you create the deployment, and instead of returning an empty reconcile.Result (which marks it as done) you can return a reconcile.Result with the Requeue attribute set:
reconcile.Result{Requeue: true}
So during the next run you check if the deployment is ready. If not then you requeue again. Once it is ready you return an empty reconcile.Result struct.
Also keep in mind that the reconcile function always needs to be idempotent as it will be run again for every custom resource during a restart of the controller and also every 10 hours by default.
Alternatively you could also use an owner reference on the created deployment and then set up the controller to reconcile the owner resource (your custom resource) whenever an update happens on the owned resource (the deployment). With the Operator SDK this can be configured in the SetupWithManager function, which by default only uses the For option function. Here you need to add the Owns option function.
// SetupWithManager sets up the controller with the Manager.
func (r *YourReconciler) SetupWithManager(mgr ctrl.Manager) error {
return ctrl.NewControllerManagedBy(mgr).
For(&yourapigroup.YourCustomResource{}).
Owns(&appv1.Deployment{}).
Complete(r)
}
I have never used that approach myself, though, so it might require additional code to work.
Using an owner reference can also come in handy if you do not require any finalizer code, because Kubernetes will delete the owned resource (the deployment) automatically when the custom resource is deleted.
Here is an example of how to create a deployment and check whether it has at least 1 ready replica.
It might be even better to check the conditions in the status and look for a condition of type Available with a status of "True".
package main
import (
"context"
"fmt"
v1 "k8s.io/api/apps/v1"
podv1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/fields"
"k8s.io/client-go/kubernetes"
"k8s.io/client-go/tools/cache"
"sigs.k8s.io/controller-runtime/pkg/client"
"sigs.k8s.io/controller-runtime/pkg/client/config"
"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
"time"
)
const (
namespace = "default"
deploymentName = "nginx"
)
func main() {
cfg, err := config.GetConfig()
if err != nil {
panic(err)
}
clientset, err := kubernetes.NewForConfig(cfg)
if err != nil {
panic(err)
}
client, err := client.New(cfg, client.Options{})
if err != nil {
panic(err)
}
d := &v1.Deployment{
ObjectMeta: metav1.ObjectMeta{
Name: deploymentName,
Namespace: namespace,
},
Spec: v1.DeploymentSpec{
Replicas: toInt32Ptr(2),
Selector: &metav1.LabelSelector{
MatchLabels: map[string]string{
"app": "nginx",
},
},
Template: podv1.PodTemplateSpec{
ObjectMeta: metav1.ObjectMeta{
Labels: map[string]string{
"app": "nginx",
},
},
Spec: podv1.PodSpec{
Containers: []podv1.Container{
{
Name: "nginx",
Image: "nginx",
},
},
},
},
},
}
fmt.Println("Deploying")
_, err = controllerutil.CreateOrUpdate(context.TODO(), client, d, func() error {
return nil
})
if err != nil {
panic(err)
}
stop := make(chan struct{})
watchList := cache.NewListWatchFromClient(clientset.AppsV1().RESTClient(), "deployments", namespace, fields.Everything())
_, ctrl := cache.NewInformer(watchList, &v1.Deployment{}, time.Second, cache.ResourceEventHandlerFuncs{
UpdateFunc: func(o, n interface{}) {
newDeployment := n.(*v1.Deployment)
if newDeployment.Name != deploymentName {
return
}
if newDeployment.Status.ReadyReplicas > 0 {
close(stop)
return
}
return
},
})
ctrl.Run(stop)
fmt.Println("Deployment has at least 1 ready replica")
}
func toInt32Ptr(i int32) *int32 {
return &i
}
This example, from operator-sdk documentation, might help: https://sdk.operatorframework.io/docs/building-operators/golang/references/client/#example-usage. It is based on Create() and Update() functions, which might lead to a simpler algorithm.
This one, from kubebuilder documentation, also provides an interesting track, although it is deprecated: https://book-v1.book.kubebuilder.io/basics/simple_controller.html
I'm following the tutorial https://docs.aws.amazon.com/appsync/latest/devguide/tutorial-lambda-resolvers.html
And I have some doubts about using just a switch statement to handle GraphQL queries.
Is there a better approach to handle more complicated requests?
The choice is yours as to how to set up Lambda within your AppSync API. It is entirely reasonable to have a Lambda function per resolver, with each function responsible for a single resolver. You can alternatively take an approach like the tutorial and use a single function with some lightweight routing code that calls the correct resolver. Using a single function can often offer some performance benefits because of how Lambda's container warming works (especially for Java & C#, where VM startup time can add up), but it has less separation of concerns.
Here are some approaches I have taken in the past:
Option 1: JS
This approach uses JavaScript and should feel familiar to those who have run their own GraphQL servers before.
const Resolvers = {
Query: {
me: (source, args, identity) => getLoggedInUser(args, identity)
},
Mutation: {
login: (source, args, identity) => loginUser(args, identity)
}
}
exports.handler = (event, context, callback) => {
// We are going to wire up the resolver to give all this information in this format.
const { TypeName, FieldName, Identity, Arguments, Source } = event
const typeResolver = Resolvers[TypeName]
if (!typeResolver) {
return callback(new Error(`No resolvers found for type: "${TypeName}"`))
}
const fieldResolver = typeResolver[FieldName]
if (!fieldResolver) {
return callback(new Error(`No resolvers found for field: "${FieldName}" on type: "${TypeName}"`), null)
}
// Handle promises as necessary.
const result = fieldResolver(Source, Arguments, Identity);
return callback(null, result)
};
You can then use a standard lambda resolver from AppSync. For now we have to provide the TypeName and FieldName manually.
#**
The value of 'payload' after the template has been evaluated
will be passed as the event to AWS Lambda.
*#
{
"version" : "2017-02-28",
"operation": "Invoke",
"payload": {
"TypeName": "Query",
"FieldName": "me",
"Arguments": $util.toJson($context.arguments),
"Identity": $util.toJson($context.identity),
"Source": $util.toJson($context.source)
}
}
Option 2: Go
For the curious, I have also used Go Lambda functions successfully with AppSync. Here is one approach that has worked well for me.
package main
import (
"context"
"fmt"
"github.com/aws/aws-lambda-go/lambda"
"github.com/fatih/structs"
"github.com/mitchellh/mapstructure"
)
type GraphQLPayload struct {
TypeName string `json:"TypeName"`
FieldName string `json:"FieldName"`
Arguments map[string]interface{} `json:"Arguments"`
Source map[string]interface{} `json:"Source"`
Identity map[string]interface{} `json:"Identity"`
}
type ResolverFunction func(source, args, identity map[string]interface{}) (data map[string]interface{}, err error)
type TypeResolverMap = map[string]ResolverFunction
type SchemaResolverMap = map[string]TypeResolverMap
func resolverMap() SchemaResolverMap {
return map[string]TypeResolverMap{
"Query": map[string]ResolverFunction{
"me": getLoggedInUser,
},
}
}
func Handler(ctx context.Context, event GraphQLPayload) (map[string]interface{}, error) {
// Almost the same as the JS option.
resolvers := resolverMap()
typeResolver := resolvers[event.TypeName]
if typeResolver == nil {
return nil, fmt.Errorf("No type resolver for type " + event.TypeName)
}
fieldResolver := typeResolver[event.FieldName]
if fieldResolver == nil {
return nil, fmt.Errorf("No field resolver for field " + event.FieldName)
}
return fieldResolver(event.Source, event.Arguments, event.Identity)
}
func main() {
lambda.Start(Handler)
}
/**
* Resolver Functions
*/
/**
* Get the logged in user
*/
func getLoggedInUser(source, args, identity map[string]interface{}) (data map[string]interface{}, err error) {
// Decode the map[string]interface{} into a struct I defined
var typedArgs myModelPackage.GetLoggedInUserArgs
err = mapstructure.Decode(args, &typedArgs)
if err != nil {
return nil, err
}
// ... do work
res, err := auth.GetLoggedInUser()
if err != nil {
return nil, err
}
// Map the struct back to a map[string]interface{}
return structs.Map(res), nil
}
// ... Add as many more as needed
You can then use the same resolver template as used in option 1. There are many other ways to do this but this is one method that has worked well for me.
Hope this helps :)
You are not forced to use one single AWS Lambda to handle every request. For the tutorial it is easier for newcomers to grasp the idea, which is why they used this approach.
But it's up to you how to implement it in the end. An alternative would be to create a separate AWS Lambda for each resolver, which eliminates the switch and follows the Single Responsibility Principle (SRP).
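For illustration, with one function per resolver the handler body collapses to just the resolver logic (a sketch; getLoggedInUser is a hypothetical helper, and the event shape assumes a resolver template like the one shown above):

declare function getLoggedInUser(args: unknown, identity: unknown): Promise<unknown>; // hypothetical helper

// Hypothetical Lambda dedicated to the Query.me field: AppSync invokes
// this function for this one resolver only, so no routing code is needed.
export const handler = async (event: { Arguments: unknown; Identity: unknown }) =>
  getLoggedInUser(event.Arguments, event.Identity);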
You can proxy all the queries to a GraphQL server.
Apollo GraphQL Server provides a very good setup to deploy a GraphQL server on AWS Lambda.
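A minimal sketch of that setup, assuming the apollo-server-lambda package (the schema and resolver here are placeholders):

import { ApolloServer, gql } from 'apollo-server-lambda';

const typeDefs = gql`
  type Query {
    me: String
  }
`;

const resolvers = {
  Query: {
    // Placeholder resolver; real logic would live here.
    me: () => 'logged-in user',
  },
};

const server = new ApolloServer({ typeDefs, resolvers });

// Single Lambda entry point; the GraphQL schema does the routing.
export const handler = server.createHandler();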
I am using the aws-sdk to push data to a Kinesis stream.
I am using PutRecord to achieve real-time data pushes.
I am observing the same delay with PutRecords as well, in the case of batch writes.
I have tried this with 4 records, where I am not crossing any shard limit.
Below is my Node.js HTTP agent configuration. The default maxSockets value is set to Infinity.
Agent {
domain: null,
_events: { free: [Function] },
_eventsCount: 1,
_maxListeners: undefined,
defaultPort: 80,
protocol: 'http:',
options: { path: null },
requests: {},
sockets: {},
freeSockets: {},
keepAliveMsecs: 1000,
keepAlive: false,
maxSockets: Infinity,
maxFreeSockets: 256 }
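One thing I notice in this dump is keepAlive: false: if every putRecord call sets up a fresh TCP (and TLS) connection, that alone could account for a large part of the delay. Here is a sketch of what I understand enabling connection reuse looks like with the v2 SDK's httpOptions.agent parameter (the timeout values are guesses on my part):

import AWS from 'aws-sdk';
import https from 'https';

// Reuse sockets across putRecord calls instead of reconnecting every time.
const agent = new https.Agent({ keepAlive: true, keepAliveMsecs: 1000 });

const kinesis = new AWS.Kinesis({
  httpOptions: { agent },
});

Newer releases of the v2 SDK can reportedly also enable this via the AWS_NODEJS_CONNECTION_REUSE_ENABLED=1 environment variable.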
Below is my code.
I am using the following code to trigger the putRecord call:
event.Records.forEach(function(record) {
var payload = new Buffer(record.kinesis.data, 'base64').toString('ascii');
// put record request
var evt = transformEvent(payload);
promises.push(writeRecordToKinesis(kinesis, streamName, evt));
});
The event structure is:
evt = {
Data: new Buffer(JSON.stringify(payload)),
PartitionKey: payload.PartitionKey,
StreamName: streamName,
SequenceNumberForOrdering: dateInMillis.toString()
};
This event is used in the put request:
function writeRecordToKinesis(kinesis, streamName, evt ) {
console.time('WRITE_TO_KINESIS_EXECUTION_TIME');
var deferred = Q.defer();
try {
kinesis.putRecord(evt , function(err, data) {
if (err) {
console.warn('Kinesis putRecord %j', err);
deferred.reject(err);
} else {
console.log(data);
deferred.resolve(data);
}
console.timeEnd('WRITE_TO_KINESIS_EXECUTION_TIME');
});
} catch (e) {
console.error('Error occurred while writing data to Kinesis: ' + e);
deferred.reject(e);
}
return deferred.promise;
}
Below is the output for 3 messages.
WRITE_TO_KINESIS_EXECUTION_TIME: 2026ms
WRITE_TO_KINESIS_EXECUTION_TIME: 2971ms
WRITE_TO_KINESIS_EXECUTION_TIME: 3458ms
Here we can see a gradual increase in response time and function execution time.
I have added counters in the aws-sdk request.js class, and I can see the same pattern there as well.
Below is the code snippet from the aws-sdk request.js class that executes the put request.
send: function send(callback) {
console.time('SEND_REQUEST_TO_KINESIS_EXECUTION_TIME');
if (callback) {
this.on('complete', function (resp) {
console.timeEnd('SEND_REQUEST_TO_KINESIS_EXECUTION_TIME');
callback.call(resp, resp.error, resp.data);
});
}
this.runTo();
return this.response;
},
Output for send request:
SEND_REQUEST_TO_KINESIS_EXECUTION_TIME: 1751ms
SEND_REQUEST_TO_KINESIS_EXECUTION_TIME: 1816ms
SEND_REQUEST_TO_KINESIS_EXECUTION_TIME: 2761ms
SEND_REQUEST_TO_KINESIS_EXECUTION_TIME: 3248ms
Here you can see it is increasing gradually.
Can anyone please suggest how I can reduce this delay?
3 seconds to push a single record to Kinesis is not at all acceptable.
How can we make our MapReduce Queries Faster?
We have built an application using a five node Riak DB cluster.
Our data model is composed of three buckets: matches, leagues, and teams.
Matches contains links to leagues and teams:
Model
var match = {
id: matchId,
leagueId: meta.leagueId,
homeTeamId: meta.homeTeamId,
awayTeamId: meta.awayTeamId,
startTime: m.match.startTime,
firstHalfStartTime: m.match.firstHalfStartTime,
secondHalfStartTime: m.match.secondHalfStartTime,
score: {
goals: {
a: 1*safeGet(m.match, 'score.goals.a'),
b: 1*safeGet(m.match, 'score.goals.b')
},
corners: {
a: 1*safeGet(m.match, 'score.corners.a'),
b: 1*safeGet(m.match, 'score.corners.b')
}
}
};
var options = {
index: {
leagueId: match.leagueId,
teamId: [match.homeTeamId, match.awayTeamId],
startTime: match.startTime || match.firstHalfStartTime || match.secondHalfStartTime
},
links: [
{ bucket: 'leagues', key: match.leagueId, tag: 'league' },
{ bucket: 'teams', key: match.homeTeamId, tag: 'home' },
{ bucket: 'teams', key: match.awayTeamId, tag: 'away' }
]
};
match.model = 'match';
modelCache.save('matches', match.id, match, options, callback);
Queries
We need to write a query that returns results from several buckets. One way is to query each bucket separately; the other is to use links to combine results from a single query.
Both versions of the query we tried take over a second, no matter how small our buckets are.
The first version uses two map phases, which we modeled after this post (Practical Map-Reduce: Forwarding and Collecting).
#!/bin/bash
curl -X POST \
-H "content-type: application/json" \
-d @- \
http://localhost:8091/mapred \
<<EOF
{
"inputs":{
"bucket":"matches",
"index":"startTime_bin",
"start":"2012-10-22T23:00:00",
"end":"2012-10-24T23:35:00"
},
"query": [
{"map":{"language": "javascript", "source":"
function(value, keydata, arg){
var match = Riak.mapValuesJson(value)[0];
var links = value.values[0].metadata.Links;
var result = links.map(function(l) {
return [l[0], l[1], match];
});
return result;
}
"}
},
{"map":{"language": "javascript", "source": "
function(value, keydata, arg) {
var doc = Riak.mapValuesJson(value)[0];
return [doc, keydata];
}
"}
},
{"reduce":{
"language": "javascript",
"source":"
function(values) {
var merged = {};
values.forEach(function(v) {
if(!merged[v.id]) {
merged[v.id] = v;
}
});
var results = [];
for(key in merged) {
results.push(merged[key]);
}
return results;
}
"
}
}
]
}
EOF
In the second version we do four separate Map-Reduce queries to get the objects from the three buckets:
async.series([
//First get all matches
function(callback) {
db.mapreduce
.add(inputs)
.map(function (val, key, arg) {
var data = Riak.mapValuesJson(val)[0];
if(arg.leagueId && arg.leagueId != data.leagueId) {
return [];
}
var d = new Date();
var date = data.startTime || data.firstHalfStartTime || data.secondHalfStartTime;
d.setFullYear(date.substring(0, 4));
d.setMonth(date.substring(5, 7) - 1);
d.setDate(date.substring(8, 10));
d.setHours(date.substring(11, 13));
d.setMinutes(date.substring(14, 16));
d.setSeconds(date.substring(17, 19));
d.setMilliseconds(0);
startTimestamp = d.getTime();
var short = {
id: data.id,
l: data.leagueId,
h: data.homeTeamId,
a: data.awayTeamId,
t: startTimestamp,
s: data.score,
c: startTimestamp
};
return [short];
}, {leagueId: query.leagueId, page: query.page}).reduce(function (val, key) {
return val;
}).run(function (err, matches) {
matches.forEach(function(match) {
result.match[match.id] = match; //Should maybe filter this
leagueIds.push(match.l);
teamIds.push(match.h);
teamIds.push(match.a);
});
callback();
});
},
//Then get all leagues, teams and lines in parallel
function(callback) {
async.parallel([
//Leagues
function(callback) {
db.getMany('leagues', leagueIds, function(err, leagues) {
if (err) { callback(err); return; }
leagues.forEach(function(league) {
visibleLeagueIds[league.id] = true;
result.league[league.id] = {
r: league.regionId,
n: league.name,
s: league.name
};
});
callback();
});
},
//Teams
function(callback) {
db.getMany('teams', teamIds, function(err, teams) {
if (err) { callback(err); return; }
teams.forEach(function(team) {
result.team[team.id] = {
n: team.name,
h: team.name,
s: team.stats
};
});
callback();
});
}
], callback);
}
], function(err) {
if (err) { callback(err); return; }
_.each(regionModel.getAll(), function(region) {
result.region[region.id] = {
id: region.id,
c: 'https://d1goqbu19rcwi8.cloudfront.net/icons/silk-flags/' + region.icon + '.png',
n: region.name
};
});
var response = {
success: true,
result: {
modelRecords: result,
paging: {
page: query.page,
pageSize: 50,
total: result.match.length
},
time: moment().diff(a)/1000.00,
visibleLeagueIds: visibleLeagueIds
}
};
callback(null, JSON.stringify(response, null, '\t'));
});
How do we make these queries faster?
Additional info:
We are using riak-js and node.js to run our queries.
One way to make it at least a bit faster would be to deploy the JavaScript mapreduce functions to the servers instead of passing them through as part of the job (see the description of the js_source_dir parameter here). This is usually recommended if you have JavaScript functions that you run repeatedly.
As there is some overhead associated with running JavaScript mapreduce functions compared to native ones implemented in Erlang, using non-JavaScript functions where possible may also help.
The two map phase functions in your first query appear to be designed to work around the limitation that a normal linking phase (which I believe is more efficient) does not pass on the record being processed (the matches record). The first function includes all the links and passes on the match data as additional data in JSON form, while the second passes on the data of the match as well as the linked record in JSON form.
I have written a simple Erlang function that includes all links as well as the ID of the record passed in. This could be used together with the native Erlang function riak_kv_mapreduce:map_object_value to replace the two map phase functions in your first example, removing some of the JavaScript usage. As in the existing solution, I would expect you to receive a number of duplicates as several matches may link to the same league/team.
-module(riak_mapreduce_example).
-export([map_link/3]).
%% @spec map_link(riak_object:riak_object(), term(), term()) ->
%%      [{{Bucket :: binary(), Key :: binary()}, Props :: term()}]
%% @doc map phase function for adding linked records to result set
map_link({error, notfound}, _, _) ->
[];
map_link(RiakObject, Props, _) ->
Bucket = riak_object:bucket(RiakObject),
Key = riak_object:key(RiakObject),
Meta = riak_object:get_metadata(RiakObject),
Current = [{{Bucket, Key}, Props}],
Links = case dict:find(<<"Links">>, Meta) of
{ok, List} ->
[{{B, K}, Props} || {{B, K}, _Tag} <- List];
error ->
[]
end,
lists:append([Current, Links]).
The results of these can either be sent back to the client for aggregation or passed into a reduce phase function as in the example you provided.
The example function would need to be compiled and installed on all nodes, and may require a restart.
Another way to improve performance (which very well may not be an option for you) would be to alter the data model in order to avoid having to use mapreduce for performance-critical queries altogether.
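For example, a hypothetical denormalized match document could embed the league and team names that the queries above join for, so a single 2i range query on matches returns everything a page needs, without links or mapreduce (the embedded names and their source variables are assumptions; the cost is keeping the copies in sync on writes):

var match = {
  id: matchId,
  // Copied from the league/team objects at write time; must be updated
  // if those names ever change.
  league: { id: meta.leagueId, name: leagueName },
  homeTeam: { id: meta.homeTeamId, name: homeTeamName },
  awayTeam: { id: meta.awayTeamId, name: awayTeamName },
  startTime: m.match.startTime,
  score: { /* ... as before ... */ }
};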