How to set SLO for operations that are dependent on file size? - sre

I have an endpoint POST /upload that uploads a file into my storage.
The response time depends on the file size (the bigger the file, the longer it takes to respond with 200).
How should I set a Service Level Objective (SLO) for this endpoint?
Any suggestions?

I would suggest looking at it at a higher level first. Usually you measure response time from the server, where it mostly depends on server-side work, whereas uploading files to storage mostly depends on the client (network bandwidth). So it depends on whether you want to measure client upload performance or not.
But if you still want a performance SLO, I'd suggest measuring against a specific size. Say, if you know the average upload is 500 KB and the 90th percentile is 1 MB, then measure performance only for files up to 1 MB.
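To make that concrete, here is a minimal sketch of a size-bucketed latency SLI in plain Java. The 1 MB cutoff and 2-second target are assumptions, and recordUpload is a hypothetical hook you would call from your own handler:

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: track latency only for uploads up to a chosen size cutoff. */
public class UploadSlo {
    private static final long SIZE_CUTOFF_BYTES = 1_000_000;   // ~1 MB, assumed 90th-percentile size
    private static final long TARGET_LATENCY_MS = 2_000;       // assumed latency target for "small" uploads

    private final List<Long> latenciesMs = new ArrayList<>();

    /** Call this from the upload handler once the response has been sent. */
    public synchronized void recordUpload(long sizeBytes, long latencyMs) {
        if (sizeBytes <= SIZE_CUTOFF_BYTES) {   // ignore uploads dominated by client bandwidth
            latenciesMs.add(latencyMs);
        }
    }

    /** Fraction of "small" uploads that met the latency target (the SLI). */
    public synchronized double sli() {
        if (latenciesMs.isEmpty()) return 1.0;
        long good = latenciesMs.stream().filter(ms -> ms <= TARGET_LATENCY_MS).count();
        return (double) good / latenciesMs.size();
    }
}
```

The SLO would then be stated over this SLI, e.g. "99% of uploads up to 1 MB respond within 2 seconds over a 30-day window." An alternative is to normalize latency by size (ms per MB) instead of filtering by size.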

Related

should I do many smaller requests, or fewer but larger requests using s3 to pass data

I'm working on a project that requires data entries to be inserted into an RDS instance. We're using a serverless stack (cognito, api gateway, lambda, rds) to accomplish this. Our application requires a large amount of data to be read off of an embedded device, prior to insertion. That data must then be inserted immediately.
Based on our current setup, a single batch of data could be in excess of 60KB, but that's a worst case scenario.
Is there an accepted best practice or ideal way of sending/accessing data this large in my lambda function? As of right now, I'm planning on shipping it off with my API request. I've seen S3 mentioned as an intermediary for large quantities of data, but I'm not sure if it's really necessary for something like this.
In my experience it depends on a number of factors. What communication channel are you using? What is the drop rate? Do you experience corrupt packets? What is your embedded device?
If you can send the data in one go with a 97% success rate, then I don't see a reason to split the data. If packets take a long time and connections can drop, then it's good to send multiple smaller packets and resend the failed ones.
For the network, 60KB is a small amount of data. If you have a slow 2G embedded device, then that's your bottleneck and you need to experiment to find the most efficient way to get the data out of it. A single stream of data would probably be the most efficient.
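For illustration, here is a minimal sketch of splitting a payload into chunks and resending only the failed ones (plain Java; Transport.sendChunk is a hypothetical stand-in for your actual API Gateway call, and the chunk size and retry count are assumptions):

```java
import java.util.Arrays;

public class ChunkedSender {
    private static final int CHUNK_SIZE = 16 * 1024;  // assumed chunk size
    private static final int MAX_RETRIES = 3;         // assumed retry budget per chunk

    /** Hypothetical transport call, e.g. an HTTPS POST to API Gateway. Returns true on success. */
    interface Transport {
        boolean sendChunk(byte[] chunk, int index);
    }

    /** Split the payload, send each chunk, and retry only the chunks that fail. */
    public static boolean sendAll(byte[] payload, Transport transport) {
        for (int offset = 0, index = 0; offset < payload.length; offset += CHUNK_SIZE, index++) {
            byte[] chunk = Arrays.copyOfRange(payload, offset,
                    Math.min(offset + CHUNK_SIZE, payload.length));
            boolean sent = false;
            for (int attempt = 0; attempt < MAX_RETRIES && !sent; attempt++) {
                sent = transport.sendChunk(chunk, index);
            }
            if (!sent) return false;  // give up after MAX_RETRIES on a single chunk
        }
        return true;
    }
}
```

For a reliable link and a ~60 KB worst case, a single request (effectively one chunk) is simpler and usually fine.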

bigstore increasing almost linearly Google Cloud

I use many APIs from Google Cloud. Recently I noticed that the bigstore is gradually increasing on a daily basis. I am worried that if this continues I won't be able to pay the bill.
I do not know however how to check where this increase is coming from. Is there a way to see which cloud functions are causing this increased traffic?
The reason I am surprised about the increase in bigstore traffic is that I have cron jobs running multiple times per day to store data in BigQuery. I have not changed these settings, so I would assume that this traffic should not increase as shown on the chart.
One other explanation I can think of is that the amount of data that I am storing has increased, which is indeed true on a daily basis. But why does this increase the traffic?
What is the way to check this?
There are two main data sources you should use:
GCP-wide billing export. This will tell you an exact breakdown of your costs. This is important to make sure you target your effort where the cost is largest to you. It also provides some level of detail about what the usage is.
Enable access & storage logging. The access log will give you an exact accounting of incoming requests down to the number of bytes transferred. The storage logs give you similar granularity into the cost of storage itself.
In addition, if you have a snapshot of your bigstore, then as time goes on and you replace or even rename files, your storage charges will increase: where you once had two views of the same storage, each changed file forks into two copies (one in the current view of your storage, one in the snapshot).
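As a concrete example, once the billing export to BigQuery is enabled, you can break costs down per service and SKU with a query like the sketch below, run here through the BigQuery Java client. The project/dataset/table names are placeholders for your own export table; the column names follow the standard billing export schema:

```java
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.bigquery.TableResult;

public class BillingBreakdown {
    public static void main(String[] args) throws InterruptedException {
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

        // Placeholder table name: replace with your own billing export table.
        String sql =
            "SELECT service.description AS service, sku.description AS sku, SUM(cost) AS total_cost "
          + "FROM `my_project.billing_dataset.gcp_billing_export_v1_XXXXXX` "
          + "WHERE usage_start_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY) "
          + "GROUP BY service, sku "
          + "ORDER BY total_cost DESC";

        TableResult result = bigquery.query(QueryJobConfiguration.newBuilder(sql).build());
        result.iterateAll().forEach(row ->
            System.out.printf("%s / %s: %.2f%n",
                row.get("service").getStringValue(),
                row.get("sku").getStringValue(),
                row.get("total_cost").getDoubleValue()));
    }
}
```

The access and storage logs can be loaded into BigQuery and queried the same way to see which buckets and objects generate the traffic.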

Amazon S3 multipart upload

I am trying to upload a .bak file (24 GB) to Amazon S3 using the multipart upload low-level API approach in Java. I was able to write the file successfully, but it took around 7-8 hours. I want to know what the average/ideal time to upload a file of this size is: is the time it took expected, or can it be improved? If there is scope for improvement, what could the approach be?
If you are using the default settings of Transfer Manager, then for multipart uploads the DEFAULT_MINIMUM_UPLOAD_PART_SIZE is 5 MB, which is too low for a 24 GB file. This essentially means that you'll end up having thousands of small parts uploaded to S3. Since each part is uploaded by a different worker thread, your application will spend too much time on network communication. This will not give you optimal upload speed.
You must increase the minimum upload part size to something between 100 MB and 500 MB. Use this setting: setMinimumUploadPartSize.
Official documentation for setMinimumUploadPartSize:
Decreasing the minimum part size will cause multipart uploads to be split into a larger number of smaller parts. Setting this value too low can have a negative effect on transfer speeds since it will cause extra latency and network communication for each part.
I am certain you'll see improvement in upload throughput by tuning this setting if you are currently using default settings. Let me know if this improves the throughput.
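For reference, a sketch with the v1 AWS SDK for Java's TransferManagerBuilder (the bucket, key, and file path are placeholders; the 100 MB part size is just an illustrative value):

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.TransferManagerBuilder;
import com.amazonaws.services.s3.transfer.Upload;
import java.io.File;

public class LargeFileUpload {
    public static void main(String[] args) throws InterruptedException {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

        // Raise the minimum part size so a 24 GB file is split into a few hundred
        // large parts instead of thousands of 5 MB parts.
        TransferManager tm = TransferManagerBuilder.standard()
                .withS3Client(s3)
                .withMinimumUploadPartSize(100L * 1024 * 1024)      // 100 MB parts
                .withMultipartUploadThreshold(100L * 1024 * 1024)   // use multipart above 100 MB
                .build();

        Upload upload = tm.upload("my-bucket", "backups/db.bak", new File("/data/db.bak"));
        upload.waitForCompletion();   // blocks until the multipart upload finishes
        tm.shutdownNow(false);        // keep the underlying S3 client alive
    }
}
```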
Happy Uploading !!!

Determine available upload/download bandwidth

I have an application which does file upload and download. I also am able to limit upload/download speed to a desired level (CONFIGURABLE), so that my application does not consume the whole available bandwidth. I am able to achieve this using the libcurl (http) library.
But my question is: if I have to limit my upload speed to, say, 75% of the available upload bandwidth, how do I find out my available upload bandwidth programmatically, preferably in C/C++? If it is pre-configured, I have no issues, but if it has to be learnt and adapted each time, like I said, to 75% of the available upload limit, I do not know how to figure it out. The same applies to download. Any pointers would be of great help.
There's no way to determine the absolute network capacity between two points on a regular network.
The reason is that the traffic can be rerouted in between, other data streams appear or disappear or links can be severed.
What you can do is figure out what is the available bandwidth right now. One way to do it is to upload/download a chunk of data (say 1MB) as fast as possible (no artificial caps), and measure how long it takes. From there you can figure out what bandwidth is available now and go from there.
You could periodically measure the bandwidth again to make sure you're not too far off.
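Here is a minimal sketch of that measure-then-cap idea (shown in Java for brevity; with libcurl the equivalent cap is CURLOPT_MAX_SEND_SPEED_LARGE). The probe URL and the 1 MB probe size are placeholder assumptions:

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class BandwidthProbe {
    /** Upload a 1 MB probe as fast as possible and return the observed bytes/second. */
    public static double measureUploadBps(String probeUrl) throws Exception {
        byte[] probe = new byte[1024 * 1024];           // 1 MB of dummy data
        HttpURLConnection conn = (HttpURLConnection) new URL(probeUrl).openConnection();
        conn.setDoOutput(true);
        conn.setRequestMethod("POST");
        conn.setFixedLengthStreamingMode(probe.length); // stream instead of buffering the body

        long start = System.nanoTime();
        try (OutputStream out = conn.getOutputStream()) {
            out.write(probe);
        }
        conn.getResponseCode();                         // force the request to complete
        long elapsedNanos = System.nanoTime() - start;

        conn.disconnect();
        return probe.length / (elapsedNanos / 1_000_000_000.0);
    }

    public static void main(String[] args) throws Exception {
        double bps = measureUploadBps("https://example.com/upload");  // placeholder endpoint
        System.out.printf("Available ~%.0f B/s, cap at %.0f B/s (75%%)%n", bps, bps * 0.75);
    }
}
```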

Estimate minimum hardware requirement of a webservice at design time

Is it possible to estimate the minimum hardware requirement of a web service at design time?
I.e., based on estimates of input size, response time, etc.
IMO, you will have to do some analysis of the expected requests per second: how many requests do you want to be able to handle?
Also, you can scale and distribute a web service easily, so a little skew wouldn't be a problem.
Without knowing your application, it's very hard to guess what kind of server you will need. A blade server with a Xeon processor easily handles about 2K requests per second. I worked on an application where we were able to process 3K requests per second on a blade server. Note: we had minimal data access; we were using an in-memory cache and a distributed cache.
So there are a lot of factors to consider while you are doing capacity planning. That's why you can start with small hardware and then scale your application/hardware horizontally or vertically based on your needs.
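As a rough sketch, the sizing arithmetic itself is simple; the peak-load, per-server throughput, and headroom numbers below are assumptions you would replace with your own estimates or measurements:

```java
public class CapacityEstimate {
    public static void main(String[] args) {
        // Assumed inputs -- replace with your own estimates/measurements.
        double peakRequestsPerSecond = 5_000;   // expected peak load
        double requestsPerServer = 2_000;       // measured or quoted per-server throughput
        double headroom = 0.7;                  // plan to run servers at ~70% of capacity

        int servers = (int) Math.ceil(peakRequestsPerSecond / (requestsPerServer * headroom));
        System.out.printf("Estimated servers needed at peak: %d%n", servers);
    }
}
```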