How to lower Data Transfer costs on an AWS course platform?

How to lower Data Transfer costs on an AWS course platform? - amazon-web-services

I am calculating the operation costs for a platform we want to develop for a client in AWS. The platform is an online course solution where users can subscribe and access different multimedia contents.
My initial thought was to store the videos in an S3 bucket and simply "feed" them to my back end solution so that the front end can access and show them.
My problem is that when doing the cost estimate I am getting huge cost estimates in Outbound Data Transfer. I don't really know how much traffic the client is expecting so I estimated traffic the following way (this platform is gonna be supported by the state so it will have some traffic):
20MB for every minute of video:
2 hours per week for every user
200 users per month
20MB/m * 60 m/h * 2h/week * 4weeks * 200 = 1.92 TB
This, at 0.09 USD / GB gives me 184.23 USD per month...
I don't know if I am not designing a well made solution, if my estimate is wrong... but I find this to be very expensive. Adding other costs this means I have to pay nearly 2 USDs per user. If someone finds a way to reduce costs please let me know!
Thank you

Related

Amazon obtain current buybox prices

I have a list containing Amazon products (ASINS). I want to update the buybox price in my list like every 5 hours. I am an registered Amazon seller so I do have access to the Selling Partner API - Amazon-Services-API. But the issue here is the rate limit. It is only 0.5 requests per second.
I have like 500k products in my list, it would take like multiple days with a rate limit of 0.5 requests per second.
There are serveral tools like scanunlimited or analyzer.tools which are able to obtain the current buybox price of a product way faster. Where are they getting their live data from? Am I missing out on some API?
Does anyone have an idea, how I can gather the data more quickly then 0.5 requests per seconds?
Kind regards

Dataflow resource usage

After following the dataflow tutorial, I used the pub/sub topic to big query template to parse a JSON record into a table. The Job has been streaming for 21 days. During that time I have ingested about 5000 JSON records, containing 4 fields (around 250 bytes).
After the bill came this month I started to look into resource usage. I have used 2,017.52 vCPU hr, memory 7,565.825 GB hr, Total HDD 620,407.918 GB hr.
This seems absurdly high for the tiny amount of data I have been ingesting. Is there a minimum amount of data I should have before using dataflow? It seems over powered for small cases. Is there another preferred method for ingesting data from a pub sub topic? Is there a different configuration when setting up a Dataflow Job that uses less resources?

It seems that the numbers you mentioned, correspond to not customizing the job resources. By default streaming jobs use a n1-standar-4 machine:
3 Streaming worker defaults: 4 vCPU, 15 GB memory, 400 GB Persistent Disk.
4 vCPU x 24 hrs x 21 days = 2,016
15 GB x 24 hrs x 21 days = 7,560
If you really need streaming in Dataflow, you will need to pay for resources allocated even if there is nothing to process.
Options:
Optimizing Dataflow
Considering that the number and size of the JSON string you need to process are really small, you can reduce the cost to aprox 1/4 of current charge. You just need to set the job to use a n1-standard-1 machine, which has 1vCPU and 3.75GB memory. Just be careful with max nodes, unless you are planning increase the load, one node may be enough.
Your own way
If you don't really need streaming (not likely), you can just create a function that pulls using Synchronous Pull, and add the part that writes to BigQuery. You can schedule according to your needs.
Cloud functions (my recommendation)
You can create a serverless Event-Driven Cloud Function with a Cloud Pub/Sub trigger. This way, considering your low volume, you can take advantage of the Free Tier and keep the real time processing:
"Cloud Functions provides a perpetual free tier for compute-time resources, which includes an allocation of both GB-seconds and GHz-seconds. In addition to the 2 million invocations, the free tier provides 400,000 GB-seconds, 200,000 GHz-seconds of compute time and 5GB of Internet egress traffic per month."[1]
[1] https://cloud.google.com/functions/pricing

How to calculate the average user concurrency for below task in a Load runner scenario? Can someone help me?

Total number of simultaneous users - 200,
Test duration - 2 hours
Load Profile:
Script 1: Browse Catalogue -> 10 steps > 2000 expected rate of business processes/hour > 100 users
Script 2: Search Product -> 6 steps > 1400 expected rate of business processes/hour > 60 users
Script 3: Buy Product -> 12 steps > 600 expected rate of business processes/hour > 40 users
With only this data, how to find out the average user concurrency (per sec)?

Concurrency is about collision in a give time frame. Simultaneity is about same request, same time.
Concurrency over the course of an hour is different for a second. For each of your steps, it is also impossible to understand how many requests are made to the servers under test and what resources are in use. As an example, it is not uncommon for public facing web pages to be made up of hundreds of individual requested elements.
Concurrency on which tier? Web? If web, then I can distort that highly with a cache model, a cache appliance (Varnish), a CDN. App Server? If my requests are all under 100ms apiece to be satisfied I might never have a concurrency in a second above but a few. Database server? If queries are the same across users then some of that might be distorted by caching of the results, either living in the query cache or a front end cache which takes the load off of the DB.
Run your test, report on it. That will be the easiest way.

You could refer to this web, it describe Performance Test Workload Modeling, could help you to calculate the Percent Load Distribution.
Then you can measure the manual operation time of each transaction and average time of each script. When you have Percent Load Distribution and average time, you could calculate the minimum amount of virtual user.

AWS Personalize in Cost Explorer

I am using for 4 dataset group for example:-
Movies
Mobile
Laptops
AC
And in each datasetGroup, we have 3 datasets with name Users, Item and Item_User_INTERACTIONS
And we also have one solution and Campaigns for each dataset group.
I am also sending the real-time event to AWS Personalize using API (putEvent)
The above things cost me about 100USD in two days and showing 498 TPS hours used and I am unable to find the real reason for this much cost.
Or does AWS Personalize simply cost this much?

As your billing tells you, you have used 498 TPS hours, let's calculate if it should be $100.
According to official Amazon Personalize pricing:
https://aws.amazon.com/personalize/pricing/
For first 20K TPS-hour per month you have to pay $0.20 per TPS-hour.
You have used 498 TPS hours in two days, it gives us:
$0.2 * 498 = $99.6 in total.
The answer is: yes, it's expensive.
Another question is:
How TPS usage is calculated?
They charge you for each TPS that is currently reserved. So if you have a campaign with 1 TPS and it's created for 24 hours, then you will be charged for 24[h] x 1[TPS] = 24 TPS hours = $4.8.
The problem is, that $0.2 doesn't look expensive, but if you multiply it by hours, it becomes very expensive.
For testing purposes you should always set TPS to 1, since you cannot set it to 0. 1 TPS allows you to get 3600 recommendations per hour, which is a lot anyways.

The reason for such high price is because of created Campaign which exists and therefore running (this part of AWS Personalize uses more resources than uploading data to s3/creating a model. It is based on TPS-hour per month metric)
E.g. suppose you uploaded a dataset with 100000 rows
Training will cost you about $0.24*2=0.5$ (assuming training took 2h)
Uploading to s3 and dataset - almost free
A created campaign which allows 1 request per second will cost $0.2*24*30=144$ per month
If in the production environment you will set a campaign to support 20 requests per second, it will be 2880$ per month.
So definitely, if these are your first steps with AWS Personalize, create campaigns only that support 1 request per second and verify that you delete unused resources on time.
In case of the SIMS recipe, there is also another way which might save you some money. Try to check how much it will cost for you just to retrain the model every 3d, for example, and to create batch recommendations for your items. Using this strategy we are spending now only 50$ per month per e-Shop instead of 1000$ per month.
Find more data in AWS docs

How to see # of requests per hour on Google Cloud Platform APIs reports

Google API metrics only shows 1 hour to 30 days metrics from today/now. It shows the total but when you narrow the graph it wont update total for that gap of time.
How do I see total amount of request for specific point in time.
Besides, it only provides requests per second, which is a constant variation form my app.
I have tried using "Traffic by API" graph on Google Cloud Platform and narrowing it to shorter time.
Expected the results at the bottom to update with a count of requests of the shorter period of time.
Screen cap of one day of metrics adding up 34,238 requests
Screen cap of graph narrowed from 18hs to 21hs but counts still on 34,238

Looking into it, it seems you can go into the desired API and then into metrics where you will find a drop down.
In that drop down you can choose "Traffic by API" and then filter the graph for the period you are interested. Then download/export that graph and process it on Excel.
In the output file you will find request per second, that you can multiply by 60000 and then create a pivot table to add up everything.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js