k8s get resources from cluster takes too much time

I need to get all resources based on a label. I used the following code, which works, but it takes too much time (~20 sec) to get the response, even when I restrict it to only one namespace (vrf). Any idea what I'm doing wrong here?
// Collect all resources matching the label in the vrf namespace into one object.
obj, err := resource.NewBuilder(flags).
    Unstructured().
    ResourceTypes(res...).
    NamespaceParam("vrf").AllNamespaces(false).
    LabelSelectorParam("a=b").SelectAllParam(selector == "").
    Flatten().
    Latest().
    Do().
    Object()
https://pkg.go.dev/k8s.io/cli-runtime@v0.26.1/pkg/resource#Builder
As I'm already filtering by label and namespace, I'm not sure what else I should do in this case.
I've checked the cluster connection and everything seems fine; regular kubectl commands get very fast responses, it's just this query that takes so long.

The search may be heavy due to the sheer number of resources the query has to scan. Have you looked into this possibility and tried to further reduce the result set with one more label or filter on top of the current ones?
Also check the performance of your Kubernetes API server while the operation is being performed, and optimize it.

Related

AWS "state file" solution for Lambda

I'm using a library in Lambda where a "state file" is persisted.
This is what it looks like in code:
def initialize
  @config = '/tmp/dogscaler.yaml'
  @state = self.load
end
If you need to look at the whole logic
https://github.com/cvent/dogscaler/blob/master/lib/dogscaler/state.rb#L5
My issue is that this won't work in Lambda (it being serverless). I'm looking for a solution where I don't have to change the logic of how the file is read and modified.
Can this be achieved with S3?
Would something like this pseudo code work?
read s3://path/to/file
write s3://path/to/file
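In boto3 terms, I imagine that pseudocode would look roughly like this (the bucket and key names are made up):
import boto3

s3 = boto3.client("s3")
BUCKET, KEY = "my-dogscaler-bucket", "state/dogscaler.yaml"  # hypothetical names

# read s3://path/to/file
body = s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read()

# ... check/update the timestamp in memory ...

# write s3://path/to/file
s3.put_object(Bucket=BUCKET, Key=KEY, Body=body)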
Are there better solutions than S3?
Additional Context
The file is needed for cooldown-period logic. Every time the application runs, it checks a timestamp from that file to make a judgement on whether to change an element or not. The file is less than 1 KB.
Based on the updated information you could store the data in a number of places.
S3 would be perfectly fine, but might be overkill if this is all you're using it for.
The same can be said of DynamoDB.
Parameter Store is a solid option for your use case. Bear in mind that if you call it often you may need to increase your TPS limit, though it doesn't sound like that will be an issue for you. Also keep in mind that there is no protection here against multiple instances of your Lambda function writing to the parameter at the "same time": the last write wins. If you need to protect against that, DynamoDB is probably the best option.
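Here is a rough sketch of the Parameter Store approach in Python; the parameter name, cooldown length, and handler shape are all hypothetical:
import json
import time
import boto3

ssm = boto3.client("ssm")
PARAM_NAME = "/dogscaler/state"  # hypothetical parameter name
COOLDOWN_SECONDS = 300           # hypothetical cooldown period

def load_state():
    # Read the persisted state; fall back to an empty dict on first run.
    try:
        resp = ssm.get_parameter(Name=PARAM_NAME)
        return json.loads(resp["Parameter"]["Value"])
    except ssm.exceptions.ParameterNotFound:
        return {}

def save_state(state):
    # Persist the state. Concurrent invocations race: the last write wins.
    ssm.put_parameter(Name=PARAM_NAME, Value=json.dumps(state),
                      Type="String", Overwrite=True)

def handler(event, context):
    state = load_state()
    now = time.time()
    if now - state.get("last_change", 0) >= COOLDOWN_SECONDS:
        # ... change the element here ...
        state["last_change"] = now
        save_state(state)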

Amazon CloudSearch : first search after some idle time is slow

I am currently evaluating whether I can use Amazon CloudSearch for our search needs instead of Elasticsearch.
Right now I have only about 4K small documents for testing purposes. If I run a search after a good 2-3 hours of idle time, the first search is about 8 to 10 times slower than the subsequent searches: the first search after some idle time takes about 300 ms, whereas subsequent searches take about 40 ms. I am not using the same search terms in the first and subsequent searches, so I don't think the subsequent searches are faster due to cached results.
Please note that if I change my instance type to search.m3.xlarge or search.m3.2xlarge instead of the default, the response time of the first search is not all that bad. I looked through the documentation to see whether this is expected behavior, but could not find anything. Can someone throw some light on this, please?

Kibana mapping conflict: how to make sure this error won't get repeated

I am new to Kibana; we are using AWS ES 5.5. I set up the dashboards yesterday and they were working fine, but this morning all the dashboards were empty, with no data. I found it was due to a mapping conflict. On Google, one answer I found was to reindex the data. How can we prevent this type of error in the future?
Any answers would be greatly appreciated.
Probably you have the same field mapped twice with different types, for example gender defined as a string in one place and as a number in another.
You need to check for that and prevent it next time.
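One common way to prevent this is to pin the field types up front with an index template, so every new index gets the same mapping before any document can set it differently. A rough sketch with the Python client, assuming the ES 5.x template syntax (the endpoint, index pattern, and field name are hypothetical):
from elasticsearch import Elasticsearch

es = Elasticsearch("https://my-aws-es-endpoint:443")  # hypothetical endpoint

# Pin 'gender' to a single type for every index matching the pattern, so it
# can no longer be mapped as a string in one index and a number in another.
es.indices.put_template(
    name="pin-gender-type",
    body={
        "template": "logstash-*",  # hypothetical index pattern
        "mappings": {
            "_default_": {
                "properties": {
                    "gender": {"type": "keyword"}
                }
            }
        },
    },
)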

When does an action not run on the driver in Apache Spark?

I have just started with Spark and am struggling with the concept of tasks.
Can anyone please help me understand when an action (say reduce) does not run in the driver program?
From the spark tutorial,
"Aggregate the elements of the dataset using a function func (which
takes two arguments and returns one). The function should be
commutative and associative so that it can be computed correctly in
parallel. "
I'm currently experimenting with an application which reads a directory of 'n' files and counts the number of words.
From the web UI, the number of tasks is equal to the number of files, and all the reduce functions appear to take place on the driver node.
Can you describe a scenario where the reduce function won't execute at the driver? Does a task always include "transformation + action", or only "transformation"?
All the actions are performed on the cluster and results of the actions may end up on the driver (depending on the action).
Generally speaking, the Spark code you write around your business logic is not the program that actually runs; rather, Spark uses it to create a plan which will execute your code on the cluster. The plan creates a task out of all the operations that can be done on a partition without the need to shuffle data around. Every time Spark needs the data arranged differently (e.g. after sorting), it will create a new task, with a shuffle between the former and the latter tasks.
I'll take a stab at this, although I may be missing part of the question. A task is indeed always transformation(s) plus an action. The transformations are lazy and would not submit anything, hence the need for an action. You can always call .toDebugString on your RDD to see where each job splits; each level of indentation is a new stage. I think the reduce function showing on the driver is a bit of a misnomer, as it will run first in parallel and then merge the results. So I would expect that the task does indeed run on the workers as far as it can.
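To see this concretely, here is a rough PySpark sketch of the word-count experiment (the input path is hypothetical); reduceByKey runs on the executors, and reduce only merges the per-partition partials on the driver:
from pyspark import SparkContext

sc = SparkContext(appName="word-count-sketch")

# Each input file typically yields at least one partition, hence one task.
lines = sc.textFile("hdfs:///data/input/")  # hypothetical path
counts = lines.flatMap(lambda line: line.split()) \
              .map(lambda word: (word, 1)) \
              .reduceByKey(lambda a, b: a + b)  # runs on the executors

# toDebugString shows the lineage; each indentation level is a new stage.
print(counts.toDebugString().decode("utf-8"))

# reduce computes partial sums in parallel on the executors; only the final
# merge of those partials happens on the driver.
total = counts.map(lambda kv: kv[1]).reduce(lambda a, b: a + b)
print(total)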

Updating a field in all records in elasticsearch

I'm new to Elasticsearch, so this is probably something quite trivial, but I haven't figured out anything better than fetching everything, processing it with a script, and updating the documents one by one.
I want to make something like a simple SQL update:
UPDATE RECORD SET SOMEFIELD = SOMEXPRESSION
My intent is to replace the actual bogus data with some data that makes more sense (so the expression is basically randomly choosing from a pool of valid values).
There are a couple of open issues about making possible to update documents by query.
The technical challenge is that Lucene (the text search engine library that Elasticsearch uses under the hood) segments are read-only. You can never modify an existing document. What you need to do is delete the old version of the document (which, by the way, will only be marked as deleted until a segment merge happens) and index the new one. That's what the existing update API does. Therefore, an update by query might take a long time and lead to issues; that's why it's not released yet. A mechanism that allows interrupting running queries would be nice to have for this case too.
But there's the update by query plugin that exposes exactly that feature. Just beware of the potential risks before using it.
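Until update by query is available to you, the fetch-and-update loop from the question can at least be batched with scan and bulk. A rough sketch with the Python client (the index, field, and pool of valid values are hypothetical):
import random
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")
VALID_VALUES = ["alpha", "beta", "gamma"]  # hypothetical pool of valid values

def updates():
    # Scan all matching documents and emit a partial-update action for each.
    for hit in helpers.scan(es, index="records", query={"query": {"match_all": {}}}):
        yield {
            "_op_type": "update",
            "_index": hit["_index"],
            "_id": hit["_id"],
            "doc": {"somefield": random.choice(VALID_VALUES)},
        }

# Apply the updates in bulk instead of one request per document.
helpers.bulk(es, updates())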