Looking for usage details on Amazon S3

I'm trying to find reports on my Amazon S3 usage, but Amazon only provides a simple summary of usage, such as the amount of storage / transfer in a particular month. I need a breakdown of this data by file, for example:
abc.mp3 : 123 GET requests / 0.12 MB transferred
hello.mp4 : 345 GET requests / 0.32 MB transferred
fun.docx : 834 GET requests / 0.20 MB transferred
Also, I need to know where these GET requests are coming from, so I can better monitor and control S3 usage. For example:
abc.mp3:
53 GET requests from http://www.example.com/music/page1.html
70 GET requests from http://www.example.com/music/page2.html
Any tools / methods for achieving this? Thanks!

You need to enable S3 access logs: http://docs.aws.amazon.com/AmazonS3/latest/dev/ServerLogs.html
Then you should be able to parse the logs for the information you want. Once you start getting logs, there are many options for parsing them; here are a few found with a quick search:
s3stat
s3-logs-analyzer
Loggly S3 support
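Once the logs are flowing, a breakdown like the one in the question can also be produced with a small script. Below is a minimal Python sketch, assuming the standard S3 server access log format and a local directory of downloaded log files (the directory name is hypothetical; newer log fields are appended to the end of each line, so the positions used here still hold):

import os
from collections import Counter, defaultdict

LOG_DIR = "s3-access-logs"        # hypothetical: a local copy of the logging bucket's files

get_counts = Counter()            # object key -> number of GET requests
bytes_sent = Counter()            # object key -> total bytes sent
referrers = defaultdict(Counter)  # object key -> referrer -> count

for name in os.listdir(LOG_DIR):
    with open(os.path.join(LOG_DIR, name), encoding="utf-8") as fh:
        for line in fh:
            # The quoted fields are the request URI, the referrer and the user agent,
            # so splitting on '"' separates them from the space-delimited fields.
            parts = line.split('"')
            if len(parts) < 4:
                continue
            prefix = parts[0].split()
            # prefix: owner, bucket, [time, offset], ip, requester, request id, operation, key
            operation, key = prefix[7], prefix[8]
            if operation != "REST.GET.OBJECT":
                continue
            status = parts[2].split()  # HTTP status, error code, bytes sent, object size, ...
            get_counts[key] += 1
            if status[2] != "-":
                bytes_sent[key] += int(status[2])
            referrers[key][parts[3]] += 1

for key, count in get_counts.most_common():
    print(f"{key}: {count} GET requests / {bytes_sent[key] / 1e6:.2f} MB transferred")
    for ref, n in referrers[key].most_common():
        print(f"  {n} GET requests from {ref}")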

Related

Seeking advice: AWS API Gateway for serving data that is updated every day

My boss wants me to build an API that returns the daily currency exchange rate between USD and JPY.
For this information, my boss wants to use a specific website. This website publishes a daily exchange rate at 10 AM every day, which is available from a certain public API.
Maybe the simplest solution is to invoke this public API from my API. The catch is that this public API has a limit of 1000 invocations daily, but we expect our customers to invoke my API way more than that.
I can run a cron job to fetch the latest information at 10 AM every day, but I don't know how to transfer this information to my API in the AWS environment.
A database is clearly overkill, as it would only have to store one entry for the daily info.
Can anybody suggest a better solution for this use case?
There are tons of ways to implement this. Get the data via an API call and use any of the following ways to store it:
Store the data in S3 in any format (txt, csv, json, yml, etc). Read the data from this S3 bucket via your API call
If you're planning to use API Gateway, you can cache the API call. Use this cache to serve the data so you don't have to persist it anywhere else; with caching enabled you're unlikely to hit the 1k limit. https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-caching.html
DynamoDB is also a good place to store such data. It will be cheap if the data is not huge, and it is very performant (see the sketch after the S3 example below)
ElastiCache (Redis) is another place to store the data for a day
CloudFront in front of S3 is also a great option for not-so-dynamic data. Cache the data for a day and just read it from CloudFront
SSM Parameter Store is also an option, but it is not meant to be a persistent database
Storing to S3 should be easy.
// Assumes the AWS SDK for JavaScript v2 and an enclosing async function
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

let xr = 5.2838498;
// Write the daily rate as a small plain-text object
await s3.putObject({
  Bucket: 'mybucket',
  Key: 'mydataobject',
  Body: xr.toString(),
  ContentType: 'text/plain;charset=utf-8',
}).promise();

// Read it back and parse it as a number
xr = Number((await s3.getObject({
  Bucket: 'mybucket',
  Key: 'mydataobject',
}).promise()).Body?.toString('utf-8'));
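For the DynamoDB option above, a similarly small sketch (Python/boto3 here rather than the JavaScript SDK; the daily_rates table and its key schema are assumptions):

import boto3
from decimal import Decimal

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("daily_rates")  # hypothetical table with partition key "pair"

# Written once a day by the scheduled job
table.put_item(Item={
    "pair": "USDJPY",
    "rate": Decimal("5.2838498"),      # DynamoDB numbers must be Decimal, not float
    "as_of": "2021-06-01T10:00:00+09:00",
})

# Read on every call to your API
item = table.get_item(Key={"pair": "USDJPY"})["Item"]
rate = float(item["rate"])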

Hasura on Google Cloud Run - Monitoring

I would like to have monitoring on my Hasura API on Google Cloud Run. Currently I'm using Google Cloud's monitoring, but it is not ideal: I get the count of 200-status requests, but I want, for example, the number of requests per query / mutation endpoint.
I want:
count 123 : /graphql/user
count 234 : /graphql/profil
I have:
count 357 : /graphql
If you have an idea, please share it.
Thanks
You can't do this with GraphQL, unfortunately. All queries are sent to the /v1/graphql endpoint on Hasura, and the only way to distinguish the operations is by parsing the query property of the HTTP request body and grabbing the operation name.
If Google Cloud allows you to query properties in logs of HTTP requests, you can set up filters on the body, something like:
"Where [request params].query includes 'MyQueryName'"
Otherwise your two options are:
Use Hasura Cloud (https://hasura.io/cloud), which gives you a count of all operations and detailed metrics (response time, variables, etc) on your console dashboard
Write and deploy a custom middleware server or a script for a reverse proxy that handles this
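As a sketch of the second option, a tiny reverse proxy can read the operation name out of the GraphQL request body, log or count it, and forward the request to Hasura unchanged. This is a minimal Python (Flask + requests) sketch; the upstream Hasura URL, the listening port, and the header handling are assumptions:

import logging
from collections import Counter

import requests
from flask import Flask, Response, request

logging.basicConfig(level=logging.INFO)
app = Flask(__name__)

HASURA_URL = "http://localhost:8080/v1/graphql"  # hypothetical upstream Hasura endpoint
op_counts = Counter()

@app.route("/v1/graphql", methods=["POST"])
def proxy():
    body = request.get_json(silent=True) or {}
    # Most clients send operationName alongside the query; fall back otherwise
    op = body.get("operationName") or "anonymous"
    op_counts[op] += 1
    logging.info("graphql operation=%s count=%d", op, op_counts[op])

    upstream = requests.post(
        HASURA_URL,
        json=body,
        headers={"Authorization": request.headers.get("Authorization", "")},
    )
    return Response(
        upstream.content,
        status=upstream.status_code,
        content_type=upstream.headers.get("Content-Type", "application/json"),
    )

if __name__ == "__main__":
    app.run(port=8000)

With the operation name in each log line, Cloud Logging's log-based metrics (or any log parser) can then count requests per operation instead of per path.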

Apache Beam Streaming data from Kafka to GCS Bucket (Not using pubsub)

I have seen a lot of examples of Apache Beam where you read data from Pub/Sub and write to a GCS bucket; however, is there any example of using KafkaIO and writing to a GCS bucket?
One where I can parse the message and put it in the appropriate bucket based on the message content?
For e.g.
message = {type="type_x", some other attributes....}
message = {type="type_y", some other attributes....}
type_x --> goes to bucket x
type_y --> goes to bucket y
My use case is streaming data from Kafka to a GCS bucket, so if someone suggests a better way to do it in GCP, that's welcome too.
Thanks.
Regards,
Anant.
You can use Secor to load messages into a GCS bucket. Secor is also able to parse incoming messages and put them under different paths in the same bucket.
You can take a look at the example here - https://github.com/0x0ece/beam-starter/blob/master/src/main/java/com/dataradiant/beam/examples/StreamWordCount.java
Once you have read the data elements, if you want to write to multiple destinations based on a specific data value, you can look at multiple outputs using TupleTagList; the details can be found here - https://beam.apache.org/documentation/programming-guide/#additional-outputs
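TupleTagList is the Java API; a roughly equivalent sketch in the Beam Python SDK uses ReadFromKafka (a cross-language transform, so it needs Java available for the expansion service) together with tagged outputs, one per message type. The broker address, topic, bucket names, and the "type" field layout are assumptions:

import json

import apache_beam as beam
from apache_beam.io.fileio import TextSink, WriteToFiles
from apache_beam.io.kafka import ReadFromKafka
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms import window


class RouteByType(beam.DoFn):
    """Emit each parsed message on an output tagged with its "type" field."""
    def process(self, kv):
        _, value = kv  # ReadFromKafka yields (key, value) pairs of bytes
        message = json.loads(value.decode("utf-8"))
        yield beam.pvalue.TaggedOutput(message.get("type", "type_x"), json.dumps(message))


def run():
    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as p:
        routed = (
            p
            | ReadFromKafka(
                consumer_config={"bootstrap.servers": "broker:9092"},  # hypothetical broker
                topics=["events"],                                     # hypothetical topic
            )
            | beam.WindowInto(window.FixedWindows(60))  # window before file writes in streaming
            | beam.ParDo(RouteByType()).with_outputs("type_x", "type_y")
        )
        routed.type_x | "WriteX" >> WriteToFiles(path="gs://bucket-x/events/",
                                                 sink=lambda dest: TextSink())
        routed.type_y | "WriteY" >> WriteToFiles(path="gs://bucket-y/events/",
                                                 sink=lambda dest: TextSink())


if __name__ == "__main__":
    run()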

CloudFront Top Referrers Report - ALL referrer URLs

In AWS I can find under:
Cloudfront >> Reports & Analytics >> Top Referrers (CloudFront Top Referrers Report)
There I get the top 25 items. How can I get ALL of them?
I have turned on logging in my bucket, but it seems that the referrer is not part of the log file. Any idea how Amazon collects its top 25, and how I can get the whole list accordingly?
Thanks for your help, in advance.
Amazon's built-in analytics are, as you've noticed, rather basic. The data you're looking for all lives in the log files that you can set CloudFront up to export (in the cs(Referer) field). If you know what you're looking for, you can set up a little pipeline to download logs, pull out the numbers you care about, and generate reports.
Amazon also makes it easy[1] to set up Athena or Redshift to look directly at CloudFront or S3 log files in their target bucket. After a one-time setup, you could query them directly for the numbers you need.
There are also paid services built to fill in the holes in Amazon's default reports. S3stat (https://www.s3stat.com/), for example, will give you a Top 200 Referrer list in its reports, with the ability to export complete lists.
[1] "easy", using Amazon's definition of the word, meaning really really hard.

How to check Allowed Attachment File Size in Amazon SES?

I am using Amazon SES for sending emails in a custom PHP project. I am facing a couple of issues.
1) Amazon SES allows me to send small-sized PDF files. Where can I change the file size limit? I am unable to find it.
2) Amazon SES only allows PDF files to be sent. Whenever I try to send any other file type, it says illegal file name. Please tell me how to fix this.
Thanks in advance.
Any help would be highly appreciated.
The AWS SES mail size limit is 10 MB. It will allow PDFs and many other file types, but there are restrictions.
You can read more here: http://aws.amazon.com/ses/faqs/#49
If you need to send a restricted file type, you can rename the file before it goes out and the recipient would have to know enough to rename it when it arrives (which is a pain), so I use a backup SMTP server in those cases.
While the default is 10 MB, as of 2021 it is possible to request that Amazon increase your maximum message size to up to 40 MB, as per https://aws.amazon.com/ses/faqs/#49.
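To check a message against that limit before sending, build the raw MIME message and compare its size to the quota. The question's project is PHP, but the flow is the same; here is a minimal Python/boto3 sketch (addresses and file names are placeholders):

import boto3
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

MAX_MESSAGE_BYTES = 10 * 1024 * 1024  # default SES limit; higher if you've requested an increase

msg = MIMEMultipart()
msg["Subject"] = "Report"
msg["From"] = "sender@example.com"
msg["To"] = "recipient@example.com"
msg.attach(MIMEText("Report attached."))

with open("report.pdf", "rb") as fh:
    part = MIMEApplication(fh.read())  # base64-encodes the attachment, as SES will see it
part.add_header("Content-Disposition", "attachment", filename="report.pdf")
msg.attach(part)

raw = msg.as_bytes()
if len(raw) > MAX_MESSAGE_BYTES:
    raise ValueError(f"Message is {len(raw)} bytes, over the SES message size limit")

boto3.client("ses").send_raw_email(RawMessage={"Data": raw})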