My team is using Amazon Kinesis to output the results of queries on other datasets to our own S3 bucket.
Although we were not running the queries often at all, we saw in our billing console that we had already used 24,000 shard hours in December alone.
Does anyone know if Kinesis charges when the shards are actually running, or if they are just up and existing?
You are charged for each shard at an hourly rate. If you enable extended data retention, you are charged an additional rate per shard hour. It does not matter whether you are actually using the shard.
Amazon Kinesis Data Streams Pricing
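As a rough illustration of how idle shards add up (a sketch with assumed numbers; the per-shard-hour rate varies by region, so check the pricing page):

```python
# Back-of-the-envelope check: shards are billed for every hour they exist,
# whether or not data flows through them.
HOURS_SO_FAR = 24 * 30            # assume ~30 days into the month
SHARD_HOURS_BILLED = 24_000       # figure from the billing console

idle_shards = SHARD_HOURS_BILLED / HOURS_SO_FAR
print(f"~{idle_shards:.0f} shards existing around the clock")   # ~33 shards

# Cost at an assumed rate of $0.015 per shard-hour (verify for your region):
print(f"~${SHARD_HOURS_BILLED * 0.015:,.2f} for the month")     # ~$360.00
```

So roughly 30+ shards sitting idle all month is enough to explain that bill, even with no traffic.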
Is there any data limitation on AWS CloudWatch Logs for sending logs? In my case, my application produces about 6 million log records every 3 days. Will CloudWatch Logs be able to handle that much data?
Check out the AWS quotas page. I'm not sure what you mean by "60lac" (60 lakh, i.e. 6 million), but the limits on CloudWatch are more than adequate for the majority of use cases.
There is no published limit on the overall data volume held. There will be a practical limit somewhere, but it won't be hit by a single AWS customer. If you're using the PutLogEvents API you could be constrained by the limit of 5 requests per second per log stream, in which case consider using more streams or larger batches of events (up to 1 MB per batch).
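If you are calling PutLogEvents yourself, batching is the easiest way to stay under that per-stream request limit. A minimal boto3 sketch (the log group and stream names are placeholders; each batch must also stay under the 1 MB / 10,000-event limits):

```python
import time
import boto3

logs = boto3.client("logs")

def put_in_batches(messages, group="my-log-group", stream="my-log-stream",
                   batch_size=5000):
    """Send log events in large batches instead of one request per record."""
    # Each event needs a millisecond timestamp and a message string.
    records = [{"timestamp": int(time.time() * 1000), "message": m}
               for m in messages]
    for i in range(0, len(records), batch_size):
        logs.put_log_events(
            logGroupName=group,
            logStreamName=stream,
            logEvents=records[i:i + batch_size],
        )
```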
I have an EC2 instance that is running an Apache application.
I have to store my Apache logs somewhere. For this, I have used two approaches:
CloudWatch agent to push logs to CloudWatch
Cron job to push the log file to S3
I have used both of these methods and both work fine for me. But I am a little worried about the cost.
Which of these will have the minimum cost?
S3 pricing is basically based upon three factors:
The amount of storage.
The amount of data transferred every month.
The number of requests made monthly.
The cost for data transfer between S3 and AWS resources within the same region is zero.
According to CloudWatch pricing for logs:
All log types: There is no Data Transfer IN charge for any of CloudWatch. Data Transfer OUT from CloudWatch Logs is priced.
Pricing details for CloudWatch Logs:
Collect (Data Ingestion): $0.50/GB
Store (Archival): $0.03/GB
Analyze (Logs Insights queries): $0.005/GB of data scanned
Refer to CloudWatch pricing for more details.
Similarly, according to AWS, S3 pricing differs by region.
e.g. for N. Virginia:
S3 Standard Storage
First 50 TB / month: $0.023 per GB
Next 450 TB / month: $0.022 per GB
Over 500 TB / month: $0.021 per GB
Refer to S3 pricing for more details.
Hence, we can conclude that sending logs to S3 will be more cost-effective than sending them to CloudWatch Logs.
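As a rough worked comparison using the rates quoted above (assumed log volume; request and transfer charges ignored since they are negligible here):

```python
# Hypothetical month of logs to compare the two options.
log_gb = 100

# CloudWatch Logs: pay to ingest every GB, plus a monthly storage rate.
cloudwatch = log_gb * 0.50 + log_gb * 0.03     # ~$53.00

# S3 Standard (N. Virginia, first 50 TB tier): storage only; transfer in
# from the same region is free.
s3 = log_gb * 0.023                            # ~$2.30

print(f"CloudWatch Logs: ${cloudwatch:.2f}  vs  S3: ${s3:.2f}")
```

The ingestion charge dominates, which is why S3 comes out far cheaper for plain log archival.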
They both have similar storage costs, but CloudWatch Logs has an additional ingest charge.
Therefore, it would be lower cost to send straight to Amazon S3.
See: Amazon CloudWatch Pricing – Amazon Web Services (AWS)
I have messages being put into SQS on a cron job at a rate of about 1,000 per minute.
I am looking to run a Lambda function periodically that will grab some of the messages and put them into DynamoDB, with the required throughput changing over time.
You can go with 'on-demand' pricing for your use case (see the AWS link). The pricing is different from the provisioned capacity method.
With on-demand capacity mode, you pay per request for the data reads and writes your application performs on your tables. You do not need to specify how much read and write throughput you expect your application to perform as DynamoDB instantly accommodates your workloads as they ramp up or down.
With this approach, you don't need to configure WCUs (or RCUs).
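For illustration, a table can be created in on-demand mode by setting the billing mode to PAY_PER_REQUEST; a minimal boto3 sketch (the table and key names are just placeholders):

```python
import boto3

dynamodb = boto3.client("dynamodb")

# With BillingMode=PAY_PER_REQUEST there is no ProvisionedThroughput block;
# you are billed per read/write request instead of per provisioned WCU/RCU.
dynamodb.create_table(
    TableName="messages",
    AttributeDefinitions=[{"AttributeName": "id", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "id", "KeyType": "HASH"}],
    BillingMode="PAY_PER_REQUEST",
)
```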
I'm attempting to price out a streaming data / analytic application deployed to AWS and looking at using Kinesis Firehose to dump the data into S3.
My question is that, when pricing out the S3 costs for this, I need to figure out how many PUTs I will need.
So, I know that Firehose buffers the data and then flushes it out to S3; however, I'm unclear on whether it will write a single "file" with all of the records accumulated up to that point or whether it will write each record individually.
So, assuming I set the buffer size / interval to an optimal amount based on the size of the records, does the number of S3 PUTs still equal the number of records, OR the number of flushes that Firehose performs?
Having read a substantial amount of AWS documentation, I respectfully disagree with the assertion that S3 will not charge you.
You will be billed separately for charges associated with Amazon S3 and Amazon Redshift usage including storage and read/write requests. However, you will not be billed for data transfer charges for the data that Amazon Kinesis Firehose loads into Amazon S3 and Amazon Redshift. For further details, see Amazon S3 pricing and Amazon Redshift pricing. [emphasis mine]
https://aws.amazon.com/kinesis/firehose/pricing/
What they are saying is that you will not be charged anything additional by Kinesis Firehose for the transfers, other than the $0.035/GB, but you will pay for the interactions with your bucket. (Data inbound to a bucket is always free of actual per-gigabyte transfer charges.)
In the final analysis, though, you appear to be in control of the rough number of PUT requests against your bucket, based on some tunable parameters:
Q: What is buffer size and buffer interval?
Amazon Kinesis Firehose buffers incoming streaming data to a certain size or for a certain period of time before delivering it to destinations. You can configure buffer size and buffer interval while creating your delivery stream. Buffer size is in MBs and ranges from 1MB to 128MB. Buffer interval is in seconds and ranges from 60 seconds to 900 seconds.
https://aws.amazon.com/kinesis/firehose/faqs/#creating-delivery-streams
Unless it is collecting and aggregating the records into large files, I don't see why there would be a point in the buffer size and buffer interval... however, without firing up the service and taking it for a spin, I can (unfortunately) only really speculate.
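To put rough numbers on it (a sketch assuming the stream flushes on the buffer interval rather than the size threshold, and an assumed S3 Standard PUT price of $0.005 per 1,000 requests):

```python
# Worst case: one flush (and therefore one PUT) every time the interval expires.
buffer_interval_s = 900                                    # maximum interval
flushes_per_month = 30 * 24 * 3600 / buffer_interval_s     # ~2,880

# Assumed PUT price: $0.005 per 1,000 requests (check S3 pricing for your region).
put_cost = flushes_per_month / 1000 * 0.005
print(f"~{flushes_per_month:.0f} PUTs/month, ~${put_cost:.3f} in PUT charges")
```

If PUTs were charged per record instead, the same 1,000-records-per-minute workload would be tens of millions of requests a month, which is why the per-flush behaviour matters for pricing.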
I don't believe you pay anything extra for the write operation to S3 from Firehose.
You will be billed separately for charges associated with Amazon S3 and Amazon Redshift usage including storage and read/write requests. However, you will not be billed for data transfer charges for the data that Amazon Kinesis Firehose loads into Amazon S3 and Amazon Redshift. For further details, see Amazon S3 pricing and Amazon Redshift pricing.
https://aws.amazon.com/kinesis/firehose/pricing/
The cost is one S3 PUT for each delivery operation performed by Kinesis, not one per object.
So one Firehose flush is one PUT:
https://docs.aws.amazon.com/whitepapers/latest/building-data-lakes/data-ingestion-methods.html
https://forums.aws.amazon.com/thread.jspa?threadID=219275&tstart=0
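For reference, the buffer size and interval that govern how often Firehose flushes (and therefore how many PUTs you pay for) are set when the delivery stream is created. A hedged boto3 sketch with placeholder ARNs:

```python
import boto3

firehose = boto3.client("firehose")

firehose.create_delivery_stream(
    DeliveryStreamName="example-stream",
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",
        "BucketARN": "arn:aws:s3:::example-bucket",
        # Flush when either limit is hit first: 128 MB of data or 900 seconds.
        # Larger values mean fewer, bigger objects and fewer S3 PUTs.
        "BufferingHints": {"SizeInMBs": 128, "IntervalInSeconds": 900},
    },
)
```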
I started working with Amazon CloudWatch Logs. The question is: is AWS using Glacier or S3 to store the logs? They are using Kinesis to process the logs using filters. Can anyone please tell me the answer?
AWS is likely to use S3, not Glacier.
Glacier would cause problems if you wanted to access older logs, as retrieving data stored in Amazon Glacier can take a few hours, and that is definitely not the response time one expects from a CloudWatch log-analysing solution.
Also, the price set for storing 1 GB of ingested logs seems to be derived from the price of 1 GB stored on AWS S3.
The S3 price for one GB stored for a month is $0.03, and the price for storing 1 GB of logs per month is also $0.03.
On the CloudWatch pricing page there is a note:
*** Data archived by CloudWatch Logs includes 26 bytes of metadata per log event and is compressed using gzip level 6 compression. Archived data charges are based on the sum of the metadata and compressed log data size.
According to Henry Hahn's (AWS) presentation on CloudWatch, it is "3 cents per GB and we compress it," ... "so you get 3 cents per 10 GB".
This makes me believe they store it on AWS S3.
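Putting that quoted note into numbers (a sketch with an assumed event count, event size, and compression ratio):

```python
# Hypothetical: 10 million log events of ~200 bytes each, compressing ~10:1.
events = 10_000_000
raw_bytes = events * 200
compressed_bytes = raw_bytes / 10

# Per the pricing note: archived size = compressed data + 26 bytes of
# metadata per event; storage billed at ~$0.03 per GB-month.
archived_gb = (compressed_bytes + events * 26) / 1e9
print(f"{archived_gb:.2f} GB archived, ~${archived_gb * 0.03:.3f}/month")
```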
They are probably using DynamoDB. S3 (and Glacier) would not be good for files that are appended to on a very frequent basis.