Estimating and measuring CloudKit database usage

CloudKit has some pretty stringent data limits: 50 MB of database storage plus 1 MB per user, and 250 KB/day of database bandwidth plus 5 KB per user.
Since I don't know how CloudKit incorporates database structure and protocol overhead into these numbers, is there a dashboard where I can see the size of a record or table, or the amount of bandwidth consumed?

Unfortunately there isn't any way to view usage statistics in the CloudKit dashboard.
Note though that the data limits mentioned apply only to the public database. If you use the private database, the usage is counted against the individual user's quota (every iCloud account gets 5 GB of storage free, and users can pay for additional storage), and the transfer limits are high enough that you shouldn't run into them in practice.
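To make the public/private distinction concrete, here is a minimal sketch using the CloudKit Web Services REST API from Python. The container identifier, API token, record type, and request shape below are assumptions to double-check against Apple's CloudKit Web Services Reference; from a native iOS app you would use CKContainer's public and private database properties instead.

```python
# Minimal sketch: saving a record to the public vs. private CloudKit database
# via CloudKit Web Services. Container ID, environment, API token, and record
# type are placeholders -- verify the request/response shapes against Apple's
# CloudKit Web Services Reference before relying on this.
import requests

CONTAINER = "iCloud.com.example.MyApp"   # hypothetical container identifier
ENVIRONMENT = "development"              # or "production"
API_TOKEN = "YOUR_CKAPITOKEN"            # created in the CloudKit dashboard

def save_record(database: str, record_type: str, fields: dict) -> dict:
    """database is 'public' (counts against the app's quota) or
    'private' (counts against the signed-in user's iCloud quota)."""
    url = (f"https://api.apple-cloudkit.com/database/1/{CONTAINER}/"
           f"{ENVIRONMENT}/{database}/records/modify")
    payload = {
        "operations": [{
            "operationType": "create",
            "record": {
                "recordType": record_type,
                "fields": {k: {"value": v} for k, v in fields.items()},
            },
        }]
    }
    # The private database additionally requires a ckWebAuthToken for the
    # signed-in user; that authentication step is omitted here.
    resp = requests.post(url, params={"ckAPIToken": API_TOKEN}, json=payload)
    resp.raise_for_status()
    return resp.json()

# Writes to "public" count toward the app's free quota;
# writes to "private" count toward the user's own iCloud storage.
# save_record("public", "Note", {"title": "Hello"})
```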

Related

Which EC2 instance size should I use if I have 100 concurrent users daily and 300 max users? I need to save cost

I have made a Business Card ordering portal and its backend is in NodeJS.
I am currently using a t2.micro and getting around 50 daily users and 15-20 concurrent users, but before long the user count will go up to 300 daily users and 100 concurrent users. I don't want to spend much either.
It has a single database and we don't use threads.
I am confused about whether I should change my instance type or use Auto Scaling Groups.
I am not a pro in AWS. Please help!!
Nobody can give you an answer to your question because every application is different.
Some applications need more CPU (eg for encoding/encrypting). Some need lots of RAM (for calculations). Some need lots of disk access (eg for file manipulation).
Only you know how your application behaves and what resources it would need.
You could either pick something and then monitor it (CPU, RAM, Disk) in production to see where the 'bottleneck' lies, or you could create a test that simulates the load of users and pushes it to breaking point to discover the bottleneck.
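If you go the monitoring route, a small sketch like the one below (Python with boto3; the instance ID and region are placeholders) pulls recent CPU utilization for the instance from CloudWatch. Memory and disk metrics additionally require the CloudWatch agent on the instance, so only CPU is shown here.

```python
# Sketch: pull recent CPU utilization for an EC2 instance from CloudWatch
# to see whether the current instance size is actually the bottleneck.
# The instance ID and region are placeholders.
from datetime import datetime, timedelta, timezone
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

end = datetime.now(timezone.utc)
start = end - timedelta(days=1)

resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    StartTime=start,
    EndTime=end,
    Period=300,                 # 5-minute buckets
    Statistics=["Average", "Maximum"],
)

for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], f"avg={point['Average']:.1f}%",
          f"max={point['Maximum']:.1f}%")
```

If the maximum stays well below 100% at your current load, the instance size is probably not your first problem; if it pegs at 100%, a load test will tell you how much headroom you need.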

bigstore increasing almost linearly (Google Cloud)

I use many APIs from Google Cloud. Recently I noticed that bigstore usage is gradually increasing on a daily basis. I am worried that if this continues I won't be able to pay the bill.
However, I do not know how to check where this increase is coming from. Is there a way to see which Cloud Functions are causing this increased traffic?
The reason I am surprised about the increase in bigstore traffic is that I have cron jobs running multiple times per day to store data in BigQuery. I have not changed these settings, so I would assume that this traffic should not increase the way the chart shows.
One other explanation I can think of is that the amount of data I am storing has increased, which is indeed true on a daily basis. But why would that increase the traffic?
What is the way to check this?
There are two main data sources you should use:
GCP-wide billing export. This will tell you an exact breakdown of your costs. This is important to make sure you target your effort where the cost is largest to you. It also provides some level of detail about what the usage is.
Enable access & storage logging. The access log will give you an exact accounting of incoming requests down to the number of bytes transferred. The storage logs give you similar granularity into the cost of storage itself.
In addition, if you have a snapshot of your bigstore, your storage charges will increase over time as you replace or even rename files: where you once had two views of the same underlying storage, each changed file forks into two copies (one for the current view of your storage, one for the snapshot).
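If the billing export to BigQuery is enabled, a query along these lines (Python; the project and export table name are placeholders for your own) breaks recent cost down by service and SKU, which usually shows exactly which product the "bigstore" line item belongs to.

```python
# Sketch: break down the last 30 days of cost by service and SKU from the
# GCP billing export in BigQuery. The table name is a placeholder for your
# own export table; columns follow the standard billing export schema.
from google.cloud import bigquery

client = bigquery.Client()

query = """
SELECT
  service.description AS service,
  sku.description AS sku,
  ROUND(SUM(cost), 2) AS total_cost
FROM `my-project.billing_dataset.gcp_billing_export_v1_XXXXXX`
WHERE usage_start_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY service, sku
ORDER BY total_cost DESC
LIMIT 20
"""

for row in client.query(query).result():
    print(f"{row.service:30s} {row.sku:50s} {row.total_cost:>10.2f}")
```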

AWS DynamoDB free tier practical limit

As per AWS DynamoDB pricing, the free tier allows 25 read capacity units, which translates to 50 GetItem requests per second (with eventual consistency and each item being less than 4 KB).
Free Tier*
As part of AWS’s Free Tier, AWS customers can get started with Amazon DynamoDB for free. DynamoDB customers get 25 GB of free storage, as well as up to 25 write capacity units and 25 read capacity units of ongoing throughput capacity (enough throughput to handle up to 200 million requests per month) and 2.5 million read requests from DynamoDB Streams for free.
How does this translate to an online web site? If more than 50 users make GET calls at the same time, do the requests get throttled, and is a 400 response eventually returned? Does that mean the practical limit of the free tier is that, during bursts, if 100 users log in to the site and make GET calls at the same time, we may see 400 responses from DynamoDB? Is this a valid conclusion?
I understand that there may not be 100 requests every second. But if the site has more than 200 active users at any time, does the DynamoDB free tier still work?
First, read up on DynamoDB burst capacity. Your DynamoDB tables should be able to sustain short bursts of higher throughput without throttling.
Second, the question of how database throughput capacity will "translate to online web site" is way too broad to answer directly. It entirely depends on how many database calls your web application makes per web request. Your question sounds like you are assuming every page load on your website results in exactly one DynamoDB request. That seems like an extremely unrealistic assumption.
You should be using a CDN to prevent as many requests as possible from even hitting your web server. You should be using a cache like Redis to prevent as many data lookups from hitting your database as possible. Then you should perform some benchmarking to determine the database throughput you are going to need, and evaluate that against the DynamoDB free tier.
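As a rough sketch of the caching idea (Python with redis-py and boto3; the table name, key, and TTL are made up), a read-through cache means DynamoDB only gets hit, and read capacity only gets consumed, on a cache miss:

```python
# Sketch of a read-through cache in front of DynamoDB: the item is served
# from Redis when possible and only fetched from DynamoDB (consuming read
# capacity) on a cache miss. Table name, key, and TTL are illustrative.
import json
import boto3
import redis

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("BusinessCards")     # hypothetical table
cache = redis.Redis(host="localhost", port=6379)

CACHE_TTL_SECONDS = 300

def get_card(card_id: str):
    cache_key = f"card:{card_id}"
    cached = cache.get(cache_key)
    if cached is not None:
        return json.loads(cached)           # cache hit: no DynamoDB read

    resp = table.get_item(Key={"card_id": card_id})   # cache miss: 1 read
    item = resp.get("Item")
    if item is not None:
        cache.setex(cache_key, CACHE_TTL_SECONDS,
                    json.dumps(item, default=str))
    return item
```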
How often does your web app content change? Do users submit changes to your content and if so, how often? These questions will directly affect how well your site can be cached, which will directly affect how often your app will have to go back to the database to get the latest content.
As for what response is returned to the user in the scenario where the DynamoDB request is throttled, that would be entirely dependent how you capture and handle errors in your web app.
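For what it's worth, a throttled DynamoDB request surfaces in the AWS SDKs as a ProvisionedThroughputExceededException (an HTTP 400 with that error code), and the SDKs retry it with backoff a few times before your code ever sees it. A minimal sketch of catching it and degrading gracefully (boto3; the table and key names are made up):

```python
# Sketch: catching a throttled DynamoDB read and degrading gracefully
# instead of passing a raw 400 back to the browser. Table and key names
# are illustrative; the SDK itself already retries throttled calls with
# exponential backoff before this exception is raised.
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("BusinessCards")     # hypothetical table

def fetch_card(card_id: str):
    try:
        return table.get_item(Key={"card_id": card_id}).get("Item")
    except ClientError as err:
        if err.response["Error"]["Code"] == "ProvisionedThroughputExceededException":
            # Throttled: serve stale/cached data, queue a retry, or return
            # a friendly "busy, try again" response -- your choice.
            return None
        raise
```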

AWS S3 Standard Infrequent Access vs Reduced Redundancy storage class when coupled with CloudFront?

I'm using CloudFront to cache and distribute all of my thumbnails currently stored on S3 in Standard storage class. Since CloudFront caches originals and accesses them only every 24 hours, it makes sense to use a cheaper storage class than Standard: either Standard Infrequent Access (IA) or Reduced Redundancy (RR). But I'm not sure which one would be more suitable and cost effective.
Standard-IA has the cheapest storage of all (58% cheaper than the Standard class and 47% cheaper than RR), but its requests are 60% more expensive than those of both Standard and RR. However, all files under 128 KB stored in the Standard-IA class are rounded up to 128 KB when calculating cost, which would apply to most of my thumbnail images.
Meanwhile, storage in the RR class is only 20% cheaper than Standard, but its request cost is 60% cheaper than that of Standard-IA.
I'm unsure which one would be most cost effective in practice and would appreciate anyone with experience using both to give some feedback.
There's a problem with the premise of your question. The fact that CloudFront may cache your objects for some period of time actually has little relevance when selecting an S3 storage class.
REDUCED_REDUNDANCY is sometimes less expensive¹ because S3 stores your data on fewer physical devices, reducing the reliability somewhat in exchange for lower pricing... and in the event of failures, the object is statistically more likely to be lost by S3. If S3 loses the object because of the reduced redundancy, CloudFront will at some point begin returning errors.
The deciding factor in choosing this storage class is whether the object is easily replaced.
Reduced Redundancy Storage (RRS) is an Amazon S3 storage option that enables customers to reduce their costs by storing noncritical, reproducible data at lower levels of redundancy than Amazon S3’s standard storage. It provides a cost-effective, highly available solution for distributing or sharing content that is durably stored elsewhere, or for storing thumbnails, transcoded media, or other processed data that can be easily reproduced.
https://aws.amazon.com/s3/reduced-redundancy/
STANDARD_IA (infrequent access) is less expensive for a different reason: the storage savings are offset by retrieval charges. If an object is downloaded more than once per month, the combined charge will exceed the cost of STANDARD. It is intended for objects that will genuinely be accessed infrequently. Since CloudFront has multiple edge locations, each with its own independent cache,² whether an object is "currently stored in" CloudFront is not a question with a simple yes/no answer. It is also not possible to "game the system" by specifying large Cache-Control: max-age values. CloudFront has no charge for its cache storage, so it's only sensible that an object can be purged from the cache before the expiration time you specify. Indeed, anecdotal observations confirm what the docs indicate, that objects are sometimes purged from CloudFront due to a relative lack of "popularity."
The deciding factor in choosing this storage class is whether the increased data transfer (retrieval) charges will be low enough to justify the storage charge savings that they offset. Unless the object is expected to be downloaded less than once or twice a month, this storage class does not represent a cost savings.
Standard/Infrequent Access should be reserved for things you really don't expect to be needed often, like tarballs and database dumps and images unlikely to be reviewed after they are first accessed, such as (borrowing an example from my world) a proof-of-purchase/receipt scanned and submitted by a customer for a rebate claim. Once the rebate has been approved, it's very unlikely we'll need to look at that receipt again, but we do need to keep it on file. Hello, Standard_IA. (Note that S3 does this automatically for me, after the file has been stored for 30 days, using a lifecycle policy on the bucket).
Standard - IA is ideally suited for long-term file storage, older data from sync and share, backup data, and disaster recovery files.
https://aws.amazon.com/s3/faqs/#sia
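For reference, the "after 30 days" transition mentioned above is just a bucket lifecycle rule; with boto3 it might look roughly like this (the bucket name and prefix are placeholders):

```python
# Sketch: a lifecycle rule that transitions objects under a prefix to
# STANDARD_IA after 30 days. Bucket name and prefix are placeholders.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-receipts-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "receipts-to-standard-ia",
                "Filter": {"Prefix": "receipts/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                ],
            }
        ]
    },
)
```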
Side note: one alternative mechanism for saving some storage cost is to gzip -9 the content before storing, and set Content-Encoding: gzip. I have been doing this for years with S3 and am still waiting for my first support ticket to come in reporting a browser that can't handle it. Even content that is allegedly already compressed -- such as .xlsx spreadsheets -- will often shrink a little bit, and every byte you squeeze out means slightly lower storage and download bandwidth charges.
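If it helps, the gzip-and-upload step might look like this with boto3 (the file name, bucket, key, and content type are placeholders); the essential parts are compressing at level 9 and setting Content-Encoding: gzip so clients decompress transparently:

```python
# Sketch: gzip content at maximum compression before uploading to S3 and
# set Content-Encoding so clients decompress it transparently.
# File name, bucket, key, and content type are placeholders.
import gzip
import boto3

s3 = boto3.client("s3")

with open("report.xlsx", "rb") as f:
    body = gzip.compress(f.read(), compresslevel=9)   # the "gzip -9" step

s3.put_object(
    Bucket="my-assets-bucket",
    Key="reports/report.xlsx",
    Body=body,
    ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    ContentEncoding="gzip",
)
```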
Fundamentally, if your content is easily replaceable, such as resized images where you still have the original... or reports that can easily be rerun from source data... or content backed up elsewhere (AWS is essentially always my first choice for cloud services, but I do have backups of my S3 assets stored in another cloud provider's storage service, for example)... then reduced redundancy is a good option.
¹ REDUCED_REDUNDANCY is sometimes less expensive only in some regions as of late 2016. Prior to that, it was priced lower than STANDARD, but in an odd quirk of the strange world of competitive pricing, as a result of S3 price reductions announced in November, 2016, in some AWS regions, the STANDARD storage class is now slightly less expensive than REDUCED_REDUNDANCY ("RRS"). For example, in us-east-1, Standard was reduced from $0.03/GB to $0.023/GB, but RRS remained at $0.024/GB... leaving no obvious reason to ever use RRS in that region. The structure of the pricing pages leaves the impression that RRS may no longer be considered a current-generation offering by AWS. Indeed, it's an older offering than both STANDARD_IA and GLACIER. It is unlikely to ever be fully deprecated or eliminated, but they may not be inclined to reduce its costs to a point that lines up with the other storage classes if it's no longer among their primary offerings.
² "CloudFront has multiple edge locations, each with its own independent cache" is still a technically true statement, but CloudFront quietly began to roll out and then announced some significant architectural changes in late 2016, with the introduction of the regional edge caches. It is now, in a sense, "less true" that the global edge caches are indepenent. They still are, but it makes less of a difference, since CloudFront is now a two-tier network, with the global (outer tier) edge nodes sometimes fetching content from the regional (inner tier) edge nodes, instead of directly from the origin server. This should have the impact of increasing the likelihood of an object being considered to be "in" the cache, since a cache miss in the outer tier might be transformed into a hit by the inner tier, which is also reported to have more available cache storage space than some or all of the outer tier. It is not yet clear from external observations how much of an impact this has on hit rates on S3 origins, as the documentation indicates the regional edges are not used for S3 (only custom origins) but it seems less than clear that this universally holds true, particularly with the introduction of Lambda#Edge. It might be significant, but as of this writing, I do not believe it to have any material impact on my answer to the question presented here.
"Since CloudFront caches originals and accesses them only every 24 hours"
You can actually make CloudFront cache things for much longer if you want. You just need to add metadata to your objects that sets a Cache-Control header, and according to the S3 documentation you can specify an age of up to 100 years. You simply set a max-age in seconds, so if you really want your objects cached for 100 years:
Cache-Control: max-age=3153600000
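On existing objects you can apply that header by copying the object over itself with replaced metadata; a rough boto3 sketch (the bucket and key are placeholders; new uploads can simply pass CacheControl to put_object):

```python
# Sketch: set a long Cache-Control header on an existing S3 object by
# copying it over itself with replaced metadata. Bucket and key are
# placeholders; ContentType must be restated when metadata is replaced.
import boto3

s3 = boto3.client("s3")

s3.copy_object(
    Bucket="my-thumbnails-bucket",
    Key="thumbs/cat.jpg",
    CopySource={"Bucket": "my-thumbnails-bucket", "Key": "thumbs/cat.jpg"},
    MetadataDirective="REPLACE",
    CacheControl="max-age=31536000",   # one year, in seconds
    ContentType="image/jpeg",
)
```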
As for your main question regarding SIA vs. RR, you've pretty much hit on all the differences between the two; it's just a matter of calculating the cost of using one vs. the other. If you have 100 thumbnails all under 128 KB, then SIA will charge you for 100 * 128 KB of storage, whereas RR will just charge you for the total size of those 100 thumbnails. Similarly, if you set a fairly long cache timeout in CloudFront then you may see only 10 fetches from S3 each day, so SIA would charge you for retrieval of 10 * 128 KB each day, while RR would only charge you for the cost of the size of those 10 thumbnails.
Using some real numbers based on the size & quantity of your thumbnails and the amount of traffic you anticipate it should be pretty easy to come up with cost estimates.
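To make that concrete, here is a rough back-of-the-envelope sketch of the calculation. The object counts, sizes, and fetch rates are made-up inputs, and the per-GB prices are the approximate 2016-era us-east-1 figures implied by the question, so substitute your own numbers and current pricing.

```python
# Back-of-the-envelope monthly cost comparison: Standard-IA vs. Reduced
# Redundancy for a pile of thumbnails. Inputs and prices are illustrative
# placeholders, not current pricing.
GB = 1024 ** 3

num_thumbnails = 100_000
avg_size_bytes = 40 * 1024             # 40 KB average thumbnail
sia_min_bytes = 128 * 1024             # Standard-IA bills each object as >= 128 KB
fetches_per_day = 10                   # CloudFront cache misses reaching S3

PRICE_RR = 0.024                       # $/GB-month (assumed)
PRICE_SIA = 0.0125                     # $/GB-month (assumed)
PRICE_SIA_RETRIEVAL = 0.01             # $/GB retrieved (assumed)

actual_gb = num_thumbnails * avg_size_bytes / GB
sia_billed_gb = num_thumbnails * max(avg_size_bytes, sia_min_bytes) / GB

rr_monthly = actual_gb * PRICE_RR
sia_storage_monthly = sia_billed_gb * PRICE_SIA
# Following the calculation described above, retrievals are also counted
# at the 128 KB minimum per object.
sia_retrieval_monthly = (fetches_per_day * 30 *
                         max(avg_size_bytes, sia_min_bytes) / GB *
                         PRICE_SIA_RETRIEVAL)

print(f"RR storage:            ${rr_monthly:7.2f}/month")
print(f"SIA storage (rounded): ${sia_storage_monthly:7.2f}/month")
print(f"SIA retrieval:         ${sia_retrieval_monthly:7.2f}/month")
```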
FYI, you might also want to take a look at some of these slideshows and/or these videos. These are all from Amazon's re:Invent conferences, and these links should provide you with S3-specific presentations at those conferences.

iCloud versus iCloud Drive versus CloudKit

Would I be correct in assuming that a user of my iOS app that uses CloudKit would not need to pay a monthly fee to Apple to use my CloudKit app? Apple just changed from annual to monthly billing for anything above 5 GB of iCloud storage... or is that unrelated? I've been hunting around Apple's developer site and I cannot find a reference for who gets charged a fee for the use of CloudKit. WWDC made brief reference to a limit that can be reached, and it seemed to indicate that the developer would have to pay Apple if their user base exceeded a certain point. If that's the case, a developer would have to be very careful to make sure they charge enough for the app. It could get expensive if there is a sudden increase in app use by a small number of users.
When you store data in the public database, it counts as data for your app. If you store data in the private database, it is counted against the user's account (currently a maximum of 5 GB for free). If you want to know how much data is available for free for your app, have a look here: https://developer.apple.com/icloud/documentation/cloudkit-storage/