I am using CloudFront to distribute HLS video streams. The original video files are broken down into thousands of .ts segment files stored in S3 buckets. CloudFront Reports only seem to show total bytes transferred for the top 50 .ts files. Is it possible to find the total bytes transferred from CloudFront for an entire video? I am not interested in the amount of data transferred for only a selection of .ts files; I'd like to see the total bytes transferred for the whole video folder in which those .ts files are stored.
You can find statistics under CloudFront -> Usage Reports:
CloudFront Usage Reports - Data Transferred by Destination
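If the report's top-50 breakdown isn't granular enough, another option is to enable standard access logging on the distribution and total the sc-bytes column per folder yourself. Here is a minimal Python sketch under that assumption; the logs/ directory is a placeholder, and the field positions (sc-bytes 4th, cs-uri-stem 8th in the standard log layout) should be verified against your own log files:

```python
import gzip
import glob
from collections import defaultdict

# Sum bytes served per video folder from CloudFront standard access logs
# downloaded locally. Assumes the standard tab-separated log layout, where
# field 4 is sc-bytes and field 8 is cs-uri-stem; adjust if yours differs.
totals = defaultdict(int)

for path in glob.glob("logs/*.gz"):          # hypothetical local log directory
    with gzip.open(path, "rt") as fh:
        for line in fh:
            if line.startswith("#"):         # skip #Version / #Fields headers
                continue
            fields = line.rstrip("\n").split("\t")
            sc_bytes, uri = int(fields[3]), fields[7]
            folder = "/".join(uri.split("/")[:-1])   # strip the .ts file name
            totals[folder] += sc_bytes

for folder, total in sorted(totals.items()):
    print(f"{folder}\t{total:,} bytes")
```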
For my web application I am uploading all the files of a selected directory. If the total size of the files in that directory is less than 50 GB, all files are uploaded correctly, but if it goes beyond that, some of the uploaded files' sizes do not match the actual file sizes (they are smaller than the actual files).
I am using AWS JavaScript SDK for this.
Any help/input appreciated.
Thanks!
If a single PUT operation exceeds 5 GB, you may observe such inconsistencies.
What AWS says:
The total volume of data and number of objects you can store are unlimited. Individual Amazon S3 objects can range in size from a minimum of 0 bytes to a maximum of 5 terabytes. The largest object that can be uploaded in a single PUT is 5 gigabytes.
For uploads larger than 5 GB, use multipart upload instead of a single PUT.
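As a minimal sketch of the fix (the question used the AWS JavaScript SDK; this uses Python and boto3 to illustrate the same idea, and the bucket and file names are placeholders):

```python
import boto3
from boto3.s3.transfer import TransferConfig

# Switch to multipart for anything over 100 MB, uploading in 100 MB parts.
config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,
    multipart_chunksize=100 * 1024 * 1024,
    max_concurrency=4,
)

s3 = boto3.client("s3")

# upload_file transparently performs a multipart upload above the threshold,
# so files larger than the 5 GB single-PUT limit upload intact.
s3.upload_file("big_file.bin", "my-bucket", "backups/big_file.bin", Config=config)
```

In the JavaScript SDK, the managed uploader (s3.upload() in v2, or the Upload class from @aws-sdk/lib-storage in v3) does the equivalent chunking for you.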
I am doing some POC with Amazon Macie. I understand from the documentation that it identifies PII data such as credit card numbers. I even ran an example where I put some valid credit card numbers in a CSV file, uploaded it to an S3 bucket, and Macie identified them.
I want to know: if the same PII data is inside a database backup/dump file stored in an S3 bucket, will Macie be able to identify it? I didn't find anything about this in the documentation.
So a couple of things are important here:
Macie can only handle certain types of files and certain compression formats
If you specify S3 buckets that include files of a format that isn't supported in Macie, Macie doesn't classify them.
Compression formats
https://docs.aws.amazon.com/macie/latest/userguide/macie-compression-archive-formats.html
Encrypted Objects
Macie can only handle certain types of encrypted Amazon S3 objects
See the following link for more details:
https://docs.aws.amazon.com/macie/latest/userguide/macie-integration.html#macie-encrypted-objects
Macie Limits
Macie has a default limit on the amount of data that it can classify in an account. After this data limit is reached, Macie stops classifying the data. The default data classification limit is 3 TB. This can be increased if requested.
Macie's content classification engine processes up to the first 20 MB of an S3 object.
So specifically, if your dump is compressed in a supported format and the content inside is a supported file type, then yes, Macie can classify it. But note that it will only classify the first 20 MB of each file, which is a problem if the file is large.
Typically I use Lambda to split a large file into files just under 20 MB (a sketch of the splitting step follows). You still need to think about how, given X number of files, you take a record that has been classified as PII in one of the split files and map it back to something usable.
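A minimal sketch of that splitting step, assuming a plain-text dump (the bucket, key, and prefix names are placeholders). Splitting on line boundaries keeps individual records intact, which makes mapping a finding back to a record easier:

```python
import boto3

LIMIT = 19 * 1024 * 1024   # stay safely under Macie's 20 MB classification window

s3 = boto3.client("s3")

def split_text_object(bucket, key, dest_prefix):
    """Split a large text object into parts under 20 MB, on line boundaries."""
    body = s3.get_object(Bucket=bucket, Key=key)["Body"]
    part, buf, size = 0, [], 0
    for line in body.iter_lines():          # yields lines without the newline
        buf.append(line + b"\n")
        size += len(line) + 1
        if size >= LIMIT:
            s3.put_object(Bucket=bucket,
                          Key=f"{dest_prefix}/part-{part:05d}",
                          Body=b"".join(buf))
            part, buf, size = part + 1, [], 0
    if buf:                                  # flush the final partial chunk
        s3.put_object(Bucket=bucket,
                      Key=f"{dest_prefix}/part-{part:05d}",
                      Body=b"".join(buf))

split_text_object("my-bucket", "dumps/db_dump.sql", "dumps/split")
```

For binary dump formats you would need a format-aware splitter instead of a line-based one.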
I need to transfer a bunch of CSV files, around 4 TB in total, to AWS.
What is the preferred internet connection from my ISP that can handle this transfer, or does the link not play any role? My link is 70 Mbps upload/download, dedicated. Is this enough, or do I need to increase my link speed?
Thanks.
4 TB = 4,194,304 MB
70 Mbit/s ≈ 8.75 MB/s (approximate, because there will be network overheads)
Dividing gives 479,349 seconds, or about 5.55 days
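The same arithmetic in a form you can re-run with different link speeds (the numbers are the ones above, not measurements):

```python
# Back-of-the-envelope transfer time; plug in your own link speed.
size_mb = 4 * 1024 * 1024        # 4 TB expressed in MB (binary units, as above)
link_mbps = 70                   # link speed in megabits per second
throughput = link_mbps / 8       # ~8.75 MB/s, ignoring protocol overhead

seconds = size_mb / throughput
print(f"{seconds:,.0f} seconds = {seconds / 86400:.2f} days")  # ~479,349 s = 5.55 days
```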
Increasing your link speed will certainly improve this, but you'll probably find that you get more improvement using compression (CSV implies text with a numeric bias, which compresses extremely well).
You don't say what you'll be uploading to, nor how you'll be using the results.

If you're uploading to S3, I'd suggest using GZip (or another compression format) to compress the files before uploading, and then let the consumers decompress as needed (a sketch follows). If you're uploading to EFS, I'd create an EC2 instance to receive the files and use rsync with the -z option (which compresses over the wire but leaves the files uncompressed on the destination). Of course, you may still prefer pre-compressing the files, to save on long-term storage costs.
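A minimal sketch of the compress-then-upload approach for the S3 case (the file and bucket names are placeholders, and gzip is one choice among several):

```python
import gzip
import shutil
import boto3

# Compress locally first; CSV text usually shrinks dramatically under gzip.
with open("data.csv", "rb") as src, gzip.open("data.csv.gz", "wb") as dst:
    shutil.copyfileobj(src, dst)

# Upload the compressed file; consumers decompress on read.
boto3.client("s3").upload_file("data.csv.gz", "my-bucket", "incoming/data.csv.gz")
```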
I'm using Amazon S3 to store videos and some audio files (average size 25 MB each), and users of my web and Android app (so far) can access them with no problem. But I want to know how much I'll pay once I exceed the S3 free tier, so I checked the S3 monthly cost calculator.
I saw that there are five fields:
Storage: I put 3 GB because right now there are 130 files (videos and audio)
PUT/COPY/POST/LIST Requests: I put 15 because I'll upload around 10-15 files manually each month
GET/SELECT and Other Requests: I put 10,000 because a projection tells me that users will watch/listen to those files around 10,000 times monthly
Data Returned by S3 Select: I put 250 GB (10,000 x 25 MB)
Data Scanned by S3 Select: I don't know what to put because I don't need Amazon to scan or analyze those files.
Am I using that calculator in a proper way?
What do I need to put in "Data Scanned by S3 Select"?
Can I put only zero?
For audio and video, you can definitely specify 0 for S3 Select -- both data scanned and data returned.
S3 Select is an optional feature that only works with certain types of text files -- like CSV and JSON -- where you make specific requests for S3 to scan through the files and return matching values, rather than you downloading the entire file and filtering it yourself.
This would not be used with audio or video files.
Also, don't overlook "Data Transfer Out." In addition to the GET requests, you're billed for bandwidth when files are downloaded, so this needs to reflect the total size of all downloads. This line item covers data downloaded from S3 via the Internet.
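As a rough worked example of that line item, using the question's own numbers and an illustrative per-GB rate (actual data-transfer-out pricing varies by region and usage tier, so check the current AWS price list):

```python
# Rough estimate for "Data Transfer Out" -- the $0.09/GB rate is ILLUSTRATIVE.
plays_per_month = 10_000
avg_file_gb = 25 / 1024                       # 25 MB expressed in GB

transfer_gb = plays_per_month * avg_file_gb   # ~244 GB out per month
print(f"{transfer_gb:.0f} GB/month -> ${transfer_gb * 0.09:.2f}/month")
```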
I have a web app with download buttons to download objects from S3 buckets. I also have plot buttons that read the contents of CSV files in an S3 bucket using pandas read_csv, to read the columns and make visualizations. I want to understand whether the price for S3 data transfer out to the internet applies only to actual downloads of files, or whether it also includes just reading the contents, since the bytes are transferred over the internet in that case as well.
S3 does not operate like a file system. When pandas read_csv reads an object, the object's bytes are still downloaded over the internet before being parsed, so it is billed exactly like a download. (You can fetch just a byte range of an object with a ranged GET, but you are still billed for whatever bytes are actually transferred.) That is why AWS prices both cases the same way: as data transfer out.
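A small sketch of what both buttons look like behind the scenes (the bucket and key names are placeholders): either way, GetObject moves the object's bytes out of S3, and a ranged GET moves only the requested range.

```python
import boto3
import pandas as pd
from io import BytesIO

s3 = boto3.client("s3")

# "Plot" button: read_csv still pulls the full object over the internet,
# so it is billed as data transfer out, same as the "download" button.
obj = s3.get_object(Bucket="my-bucket", Key="data/report.csv")
df = pd.read_csv(BytesIO(obj["Body"].read()))

# A ranged GET transfers (and bills) only the requested bytes:
head = s3.get_object(Bucket="my-bucket", Key="data/report.csv",
                     Range="bytes=0-1023")["Body"].read()
```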