I am using 4 dataset groups, for example:
Movies
Mobile
Laptops
AC
In each dataset group we have 3 datasets, named Users, Item and Item_User_INTERACTIONS.
We also have one solution and one campaign per dataset group.
I am also sending real-time events to AWS Personalize via the PutEvents API.
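Roughly, the call I make looks like this (a minimal boto3 sketch; the tracking ID, user/session IDs and item ID are placeholders):

import time
import boto3

# Client for the Personalize real-time events endpoint
personalize_events = boto3.client("personalize-events")

# One click event; trackingId comes from the dataset group's event tracker.
personalize_events.put_events(
    trackingId="YOUR_EVENT_TRACKER_ID",
    userId="user-123",
    sessionId="session-456",
    eventList=[{
        "eventType": "click",
        "itemId": "item-789",
        "sentAt": time.time(),
    }],
)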
The above setup cost me about 100 USD in two days, with 498 TPS-hours shown as used, and I am unable to find the real reason for such a high cost.
Or does AWS Personalize simply cost this much?
Your billing tells you that you have used 498 TPS-hours, so let's calculate whether that should come to $100.
According to official Amazon Personalize pricing:
https://aws.amazon.com/personalize/pricing/
For the first 20K TPS-hours per month you pay $0.20 per TPS-hour.
You have used 498 TPS-hours in two days, which gives:
$0.20 * 498 = $99.60 in total.
The answer is: yes, it's expensive.
Another question is: how is TPS usage calculated?
They charge you for each TPS that is currently reserved. So if you have a campaign provisioned at 1 TPS that exists for 24 hours, you will be charged 24 [h] x 1 [TPS] = 24 TPS-hours = $4.80.
The problem is that $0.20 doesn't look expensive, but once you multiply it by hours, it becomes very expensive.
For testing purposes you should always set TPS to 1, since you cannot set it to 0. 1 TPS still allows you to get 3600 recommendations per hour, which is a lot anyway.
The reason for such a high price is the created campaign, which exists and is therefore running (this part of AWS Personalize uses more resources than uploading data to S3 or training a model, and it is billed on the TPS-hours-per-month metric).
E.g. suppose you uploaded a dataset with 100,000 rows:
Training will cost you about $0.24 * 2 ≈ $0.50 (assuming training took 2 hours).
Uploading to S3 and creating the dataset - almost free.
A created campaign that allows 1 request per second will cost $0.20 * 24 * 30 = $144 per month.
If in a production environment you set a campaign to support 20 requests per second, it will be $2,880 per month.
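The same arithmetic as a small helper, using the $0.20/TPS-hour rate quoted above (assuming you stay within the first 20K TPS-hours tier):

PRICE_PER_TPS_HOUR = 0.20  # first 20K TPS-hours per month

def campaign_cost(provisioned_tps: float, hours: float) -> float:
    # A campaign is billed for its reserved TPS for every hour it exists.
    return provisioned_tps * hours * PRICE_PER_TPS_HOUR

print(campaign_cost(1, 24))        # one day at 1 TPS    -> 4.8
print(campaign_cost(1, 24 * 30))   # one month at 1 TPS  -> 144.0
print(campaign_cost(20, 24 * 30))  # one month at 20 TPS -> 2880.0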
So definitely, if these are your first steps with AWS Personalize, create campaigns that support only 1 request per second, and make sure you delete unused resources in time.
In the case of the SIMS recipe, there is also another way that might save you some money. Check how much it would cost to simply retrain the model every 3 days, for example, and create batch recommendations for your items. Using this strategy we are now spending only $50 per month per e-shop instead of $1,000 per month.
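A rough sketch of that batch route with boto3 (the ARNs, role and S3 paths are placeholders); you run a job like this after each retraining instead of keeping a campaign provisioned around the clock:

import boto3

personalize = boto3.client("personalize")

# Score a list of items from S3 against the latest SIMS solution version
# and write the recommendations back to S3.
personalize.create_batch_inference_job(
    jobName="sims-batch-recommendations",
    solutionVersionArn="arn:aws:personalize:eu-west-1:123456789012:solution/my-sims/abcd1234",
    roleArn="arn:aws:iam::123456789012:role/PersonalizeBatchRole",
    jobInput={"s3DataSource": {"path": "s3://my-bucket/batch/input.json"}},
    jobOutput={"s3DataDestination": {"path": "s3://my-bucket/batch/output/"}},
)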
You can find more details in the AWS docs.
Related
I have a list containing Amazon products (ASINs). I want to update the buybox price in my list roughly every 5 hours. I am a registered Amazon seller, so I do have access to the Selling Partner API - Amazon-Services-API. But the issue here is the rate limit: it is only 0.5 requests per second.
I have around 500k products in my list; at 0.5 requests per second that is 500,000 / 0.5 = 1,000,000 seconds, i.e. more than 11 days for a single full pass.
There are several tools like scanunlimited or analyzer.tools that are able to obtain the current buybox price of a product much faster. Where are they getting their live data from? Am I missing out on some API?
Does anyone have an idea how I can gather the data more quickly than 0.5 requests per second?
Kind regards
I am calculating the operation costs for a platform we want to develop for a client in AWS. The platform is an online course solution where users can subscribe and access different multimedia contents.
My initial thought was to store the videos in an S3 bucket and simply "feed" them to my back end solution so that the front end can access and show them.
My problem is that when doing the cost estimate I get huge figures for Outbound Data Transfer. I don't really know how much traffic the client is expecting, so I estimated it the following way (this platform is going to be supported by the state, so it will have some traffic):
20 MB for every minute of video
2 hours per week for every user
200 users per month
20 MB/min * 60 min/h * 2 h/week * 4 weeks * 200 users = 1.92 TB per month
This, at 0.09 USD/GB, gives me 184.23 USD per month...
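For reference, the raw arithmetic in a few lines (numbers from above; at $0.09/GB the plain math comes to about $172.80, so the 184.23 USD figure presumably reflects the AWS calculator's own unit handling):

MB_PER_MINUTE = 20
HOURS_PER_WEEK = 2
WEEKS_PER_MONTH = 4
USERS = 200
PRICE_PER_GB_USD = 0.09  # outbound data transfer

mb_per_month = MB_PER_MINUTE * 60 * HOURS_PER_WEEK * WEEKS_PER_MONTH * USERS
print(mb_per_month / 1_000_000)                 # 1.92 TB per month
print(mb_per_month / 1_000 * PRICE_PER_GB_USD)  # ~172.8 USD per month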
I don't know if I am not designing a well-made solution or if my estimate is wrong... but I find this to be very expensive. Adding other costs, it means I have to pay nearly 2 USD per user. If someone finds a way to reduce costs, please let me know!
Thank you
The biggest chunk of my BigQuery billing comes from query consumption. I am trying to optimize this by understanding which datasets/tables consume the most.
I am therefore looking for a way to track my BigQuery usage, ideally something closer to real time (so that I don't have to wait a day for the final results). The best would be, for instance, how much each table/dataset consumed in the last hour.
So far I managed to find the Monitoring dashboard, but this only displays the queries in flight per project and the stored bytes per table, which is not what I am after.
What other solutions are there to retrieve this kind of information?
Using Stackdriver logs, you could create a sink with a Pub/Sub topic as target for real-time analysis, filtering only the relevant BigQuery logs like this:
resource.type="bigquery_resource" AND
proto_payload.method_name="jobservice.jobcompleted" AND
proto_payload.service_data.job_completed_event.job.job_statistics.total_billed_bytes:*
(see example queries here: https://cloud.google.com/logging/docs/view/query-library?hl=en_US#bigquery-filters)
You could create the sink on a specific project, a folder or even an organization. This will retrieve all the queries done in BigQuery in that specific project, folder or organization.
The field proto_payload.service_data.job_completed_event.job.job_statistics.total_billed_bytes will give you the number of bytes processed by the query.
Based on on-demand BigQuery pricing (as of now, $5/TB for most regions, but check for your own region), you can easily estimate the billing in real time. You could create a Dataflow job that aggregates the results in BigQuery, or simply consume the destination Pub/Sub topic with any job you want and make the pricing calculation:
jobPriceInUSD = totalBilledBytes / 1_000_000_000_000 * pricePerTB
because 1 TB = 1_000_000_000_000 B. As said before, pricePerTB depends on the region (see https://cloud.google.com/bigquery/pricing#on_demand_pricing for the exact price). For example, as of time of writing:
$5/TB for us-east1
$6/TB for asia-northeast1
$9/TB for southamerica-east1
Also, for each month, as of now, the 1st TB is free.
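As an illustration, a minimal Python consumer for such a sink (assuming a subscription named bq-billing-sub attached to the sink's topic; the field names follow the JSON form of the audit log entry):

import json
from google.cloud import pubsub_v1

PRICE_PER_TB_USD = 5.0  # adjust to your region, see pricing link above

def callback(message):
    entry = json.loads(message.data)
    # Walk down to the billed-bytes field of the jobcompleted audit entry.
    stats = (entry.get("protoPayload", {})
                  .get("serviceData", {})
                  .get("jobCompletedEvent", {})
                  .get("job", {})
                  .get("jobStatistics", {}))
    billed_bytes = int(stats.get("totalBilledBytes", 0))
    cost = billed_bytes / 1_000_000_000_000 * PRICE_PER_TB_USD
    print(f"query billed {billed_bytes} bytes -> ${cost:.4f}")
    message.ack()

subscriber = pubsub_v1.SubscriberClient()
subscription = subscriber.subscription_path("my-project", "bq-billing-sub")
future = subscriber.subscribe(subscription, callback=callback)
future.result()  # block, processing log entries as they arrive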
It might be easier to use the INFORMATION_SCHEMA.JOBS_BY_* views, because you don't have to set up the Stackdriver logging and can use them right away.
Example taken & modified from How to monitor query costs in Google BigQuery
DECLARE gb_divisor INT64 DEFAULT 1024*1024*1024;
DECLARE tb_divisor INT64 DEFAULT gb_divisor*1024;
DECLARE cost_per_tb_in_dollar INT64 DEFAULT 5;
DECLARE cost_factor FLOAT64 DEFAULT cost_per_tb_in_dollar / tb_divisor;

SELECT
  ROUND(SUM(total_bytes_processed) / gb_divisor, 2) AS bytes_processed_in_gb,
  ROUND(SUM(IF(cache_hit != true, total_bytes_processed, 0)) * cost_factor, 4) AS cost_in_dollar,
  user_email
FROM (
  (SELECT * FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_USER)
  UNION ALL
  (SELECT * FROM `other-project.region-us`.INFORMATION_SCHEMA.JOBS_BY_USER)
)
WHERE
  DATE(creation_time) BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY) AND CURRENT_DATE()
GROUP BY
  user_email
Some caveats:
you need to UNION ALL all of the projects that you use explicitly
JOBS_BY_USER did not work for me on my private account (supposedly because my login email is @googlemail and BigQuery stores my email as @gmail)
the WHERE condition needs to be adjusted for your billing period (instead of the last 30 days)
doesn't provide the "bytes billed" information, so we need to determine those based on the cache usage
doesn't include the "if less than 10MB use 10MB" condition
data is only retained for the past 180 days
DECLARE cost_per_tb_in_dollar INT64 DEFAULT 5; reflects only US costs - other regions might have different costs - see https://cloud.google.com/bigquery/pricing#on_demand_pricing
you can only query one region at a time
Total number of simultaneous users: 200
Test duration: 2 hours
Load Profile:
Script 1: Browse Catalogue -> 10 steps, 2000 expected business processes/hour, 100 users
Script 2: Search Product -> 6 steps, 1400 expected business processes/hour, 60 users
Script 3: Buy Product -> 12 steps, 600 expected business processes/hour, 40 users
With only this data, how do I find the average user concurrency (per second)?
Concurrency is about collision within a given time frame. Simultaneity is about the same request at the same time.
Concurrency over the course of an hour is different from concurrency within a second. For each of your steps it is also impossible to know how many requests are made to the servers under test and what resources are in use. As an example, it is not uncommon for public-facing web pages to be made up of hundreds of individually requested elements.
Concurrency on which tier? Web? If web, then I can distort that heavily with a cache model, a cache appliance (Varnish), or a CDN. App server? If my requests are all satisfied in under 100 ms apiece, I might never see per-second concurrency above more than a few. Database server? If queries are the same across users, then some of that load may be absorbed by caching of the results, either in the query cache or in a front-end cache that takes the load off the DB.
Run your test, report on it. That will be the easiest way.
You could refer to this article describing Performance Test Workload Modeling; it can help you calculate the Percent Load Distribution.
Then you can measure the manual operation time of each transaction and the average time of each script. Once you have the Percent Load Distribution and the average times, you can calculate the minimum number of virtual users, as in the sketch below.
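As a rough illustration of that calculation, here is Little's Law (concurrent users ≈ arrival rate * duration) applied to the load profile above; the per-script durations are made-up placeholders that you would replace with your measured averages:

# Little's Law: L = lambda * W, where lambda is the rate at which business
# processes start per second and W is the average duration of one process.
scripts = [
    # (name, business processes per hour, assumed avg duration in seconds)
    ("Browse Catalogue", 2000, 120),
    ("Search Product",   1400,  90),
    ("Buy Product",       600, 180),
]

total = 0.0
for name, bp_per_hour, duration_s in scripts:
    arrival_rate = bp_per_hour / 3600        # processes starting per second
    concurrency = arrival_rate * duration_s  # L = lambda * W
    total += concurrency
    print(f"{name}: {arrival_rate:.3f}/s * {duration_s}s = {concurrency:.1f} concurrent")

print(f"estimated average concurrency: {total:.1f} users")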
I am going to create a simple blog app that only needs read access (list/search/get). So, what is the quota limitation for reading in the Blogger API? My Quotas section shows the following:
Queries per day 10,000
Queries per 100 seconds per user 100
I want to know: what is the read-data quota limit given the above stats?
10,000 requests (list/search/get) per day, with a limit of 100 requests per 100 seconds for each unique IP address.
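Each list/search/get call counts as one query against both limits. A minimal sketch with the Google API Python client (blog ID, post ID and API key are placeholders):

from googleapiclient.discovery import build

service = build("blogger", "v3", developerKey="YOUR_API_KEY")

# Each of these counts as one query against the daily and per-100-seconds quotas.
posts = service.posts().list(blogId="1234567890").execute()                     # list
hits = service.posts().search(blogId="1234567890", q="some term").execute()     # search
post = service.posts().get(blogId="1234567890", postId="9876543210").execute()  # get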
These numbers follow the general scheme stated here:
Depending on the API, Quota information may include requests per day,
requests per minute, and requests per minute per user.
Additionally, according to this you may modify these limits to suit your preference and performance needs, as well as monitor your API usage following this.