Amazon S3 access log files: incorrect value »Bytes Sent« - amazon-web-services

While analyzing our S3 access log files, I noticed that the »data transfer out per month« value derived from them (via S3stat and our own log analysis) differs considerably from the values on our AWS bills.
I then ran a test by downloading files from one of our buckets, and it looks like the access log files are incorrect.
On 03/02/2015 I uploaded a ZIP file to our bucket and then downloaded the complete file successfully over two different internet connections.
One day later, on 04/02/2015, I analyzed the log files. Unfortunately, both entries have the value "-" for "Bytes Sent".
Amazon's »Server Access Log Format« documentation (http://docs.aws.amazon.com/AmazonS3/latest/dev/LogFormat.html) says:
»The number of response bytes sent, excluding HTTP protocol overhead, or "-" if zero.«
The corresponding entries look like this:
BucketOwner Bucket [03/Feb/2015:10:28:41 +0000] RemoteIP - RequestID REST.GET.OBJECT Download.zip "GET /Bucket/Download.zip HTTP/1.1" 200 - - 760542 2228865159 58 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0" -
BucketOwner Bucket [03/Feb/2015:10:28:57 +0000] RemoteIP - RequestID REST.GET.OBJECT Download.zip "GET /Bucket/Download.zip HTTP/1.1" 200 - - 860028 2228865159 23 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0" -
As you can see, both log entries have quite long connection durations (»Total Time«): 0:12:40 and 0:14:20.
Based on these findings, I then checked the log files of our main buckets for December 2014. Among 2332 relevant entries (all ZIP files in our bucket) I found 860 entries with this error.
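For reference, this is roughly how I tally the affected entries. It is a simplified sketch of our own analysis, assuming the documented field order; the log path is a placeholder.
import glob
import shlex

LOG_GLOB = "logs/2014-12/*"  # placeholder: local copies of the access log files

total = 0
missing_bytes_sent = 0

for path in glob.glob(LOG_GLOB):
    with open(path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            try:
                fields = shlex.split(line)  # respects the quoted request and user-agent fields
            except ValueError:
                continue  # skip lines shlex cannot tokenize
            if len(fields) < 16 or fields[7] != "REST.GET.OBJECT":
                continue
            if not fields[8].lower().endswith(".zip"):
                continue  # only the ZIP downloads are relevant for us
            total += 1
            if fields[12] == "-":  # "Bytes Sent" field; "-" should mean zero
                missing_bytes_sent += 1

print(f"{missing_bytes_sent} of {total} ZIP download entries have Bytes Sent = '-'")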
Thus, the Amazon S3 access log files seem flawed and unusable for our analysis.
Can anybody help me? Am I making a mistake, and if so, how can these log files be reliably evaluated?
Thanks
Peter

After two months of inquiries with Amazon, it looks like Amazon has fixed this issue. My first test, for the period 13.03. to 16.03., no longer shows such errors, and our S3stat analysis shows a massive (now correct) jump in »Daily Bandwidth« since 12.03.2015.
For more information you can look here:
https://forums.aws.amazon.com/thread.jspa?messageID=606654
Peter

Related

How to identify the object that consumes the most bandwidth in an AWS S3 bucket?

What is the best way to identify the object that consumes the most bandwidth in an S3 bucket containing thousands of other objects?
By "bandwidth" I will assume that you mean the bandwidth consumed by delivering files from S3 to some place on the Internet (as when you use S3 to serve static assets).
To track this, you'll need to enable S3 access logging, which creates log files in a different bucket showing all of the operations against your primary bucket (or a path in it).
Here are two examples of logged GET operations. The first is from anonymous Internet access using a public S3 URL, while the second uses the AWS CLI to download the file. I've redacted or modified any identifying fields, but you should be able to figure out the format from what remains.
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx com-example-mybucket [04/Feb/2020:15:50:00 +0000] 3.4.5.6 - XXXXXXXXXXXXXXXX REST.GET.OBJECT index.html "GET /index.html HTTP/1.1" 200 - 90 90 9 8 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0" - xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx - ECDHE-RSA-AES128-GCM-SHA256 - com-example-mybucket.s3.amazonaws.com TLSv1.2
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx com-example-mybucket [05/Feb/2020:14:51:44 +0000] 3.4.5.6 arn:aws:iam::123456789012:user/me XXXXXXXXXXXXXXXX REST.GET.OBJECT index.html "GET /index.html HTTP/1.1" 200 - 90 90 29 29 "-" "aws-cli/1.17.7 Python/3.6.9 Linux/4.15.0-76-generic botocore/1.14.7" - xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx SigV4 ECDHE-RSA-AES128-GCM-SHA256 AuthHeader com-example-mybucket.s3.amazonaws.com TLSv1.2
So, to get what you want:
Enable logging
Wait for a representative amount of data to be logged: at least 24 hours unless you're a high-volume website (and note that it can take up to an hour for log records to appear).
Extract all the lines that contain REST.GET.OBJECT
From these, extract the filename and the number of bytes (in this case, the file is 90 bytes).
For each file, multiply the number of bytes by the number of times it appears in a given period (see the sketch below).
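Putting those steps together, here's a minimal Python sketch, assuming the log files have already been copied locally (the glob below is a placeholder). It sums the bytes-sent field per object key, which for complete downloads amounts to the same thing as multiplying the object size by the download count:
import glob
import shlex
from collections import Counter

LOG_GLOB = "s3-access-logs/*"  # placeholder: local copies of the log files

bytes_by_key = Counter()

for path in glob.glob(LOG_GLOB):
    with open(path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            try:
                fields = shlex.split(line)  # handles the quoted request and user-agent fields
            except ValueError:
                continue  # skip lines shlex cannot tokenize
            if len(fields) < 14 or fields[7] != "REST.GET.OBJECT":
                continue
            key, bytes_sent = fields[8], fields[12]
            if bytes_sent.isdigit():  # "-" means zero bytes were sent
                bytes_by_key[key] += int(bytes_sent)

# print the ten objects that delivered the most bytes
for key, total in bytes_by_key.most_common(10):
    print(f"{total:>15,}  {key}")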
Beware: because every access is logged, the log files can grow quite large, quite fast, and you will incur storage charges for them. You should create a lifecycle rule on the destination bucket to delete old logs.
Update: you could also use Athena to query this data. Here's an AWS blog post that describes the process.

Problem Uploading Files Using Chilkat sFTP

A file upload returns a "Status Code 8 - Invalid Parameter" response. I'm looking for advice on what might be causing this.
I'm using Chilkat sFTP to transfer and receive files to and from multiple partners without issue, but for a new partner I'm seeing the following error. The partner's tech team is asking whether a passive connection is being invoked, but I can't see any properties within Chilkat that would let me change this.
Log message:
ChilkatLog:
OpenFile:
DllDate: Jul 31 2014
ChilkatVersion: 9.5.0.43
UnlockPrefix: NORVICSSH
Username: LVPAPP005:scheduleradminprod
Architecture: Little Endian; 64-bit
Language: .NET 2.0 / x64
VerboseLogging: 0
SshVersion: SSH-2.0-FTP Server ready
SftpVersion: 3
sftpOpenFile:
remotePath: \GIB_DAILY_CENTAUR_POSITIONS_20190403.CSV
access: writeOnly
createDisposition: createTruncate
v3Flags: 0x1a
Sent FXP_OPEN
StatusResponseFromServer:
Request: FXP_OPEN
InformationReceivedFromServer:
StatusCode: 8
StatusMessage: Invalid parameter
--InformationReceivedFromServer
--StatusResponseFromServer
--sftpOpenFile
Failed.
--OpenFile
--ChilkatLog
You're confusing the SSH/SFTP protocol with the FTP protocol. The two are entirely different protocols. The concept of "passive" data transfers does not exist in SSH/SFTP as it does in the FTP protocol.

Gzip compression with CloudFront doesn't work

I have an Angular app which, even when built in prod mode, has multiple large files (more than 1 MB).
I want to compress them with the gzip compression feature that CloudFront offers.
I activated the "Compress Objects Automatically" option in the CloudFront console. The origin of my distribution is an S3 bucket.
However, the bundles downloaded when I load the page in my browser are not compressed with gzip.
Here's an example of a request/response.
Request headers:
:authority:dev.test.com
:method:GET
:path:/vendor.cc93ad5b987bea0611e1.bundle.js
:scheme:https
accept:*/*
accept-encoding:gzip, deflate, br
accept-language:fr-FR,fr;q=0.8,en-US;q=0.6,en;q=0.4
cache-control:no-cache
pragma:no-cache
referer:https://dev.test.com/console/projects
user-agent:Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36
Response headers:
accept-ranges:bytes
age:17979
content-length:5233622
content-type:text/javascript
date:Tue, 07 Nov 2017 08:42:08 GMT
etag:"6dfe6e16901c5ee5c387407203829bec"
last-modified:Thu, 26 Oct 2017 09:57:15 GMT
server:AmazonS3
status:200
via:1.1 9b307acf1eed524f97301fa1d3a44753.cloudfront.net (CloudFront)
x-amz-cf-id:9RpiXSuSGszUaX7hBA4ZaEO949g76UDoCaxzwFtiWo7C-wla-PyBsA==
x-cache:Hit from cloudfront
According to the AWS documentation, everything is OK:
Accept-Encoding: gzip
Content-Length present
File between 1,000 and 10,000,000 bytes
...
Do you have any idea why CloudFront doesn't compress my files?
This response was cached several hours ago.
age:17979
CloudFront won't go back and gzip what has already been cached.
CloudFront compresses files in each edge location when it gets the files from your origin. When you configure CloudFront to compress your content, it doesn't compress files that are already in edge locations. In addition, when a file expires in an edge location and CloudFront forwards another request for the file to your origin, CloudFront doesn't compress the file if your origin returns an HTTP status code 304, which means that the edge location already has the latest version of the file. If you want CloudFront to compress the files that are already in edge locations, you'll need to invalidate those files.
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/ServingCompressedFiles.html
Do a cache invalidation, wait for it to complete, and try again.
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/Invalidation.html
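If you prefer to script it, the same invalidation can be issued from code. Here's a minimal Python sketch, assuming boto3 is configured with credentials; the distribution ID is a placeholder:
import time
import boto3

cloudfront = boto3.client("cloudfront")

response = cloudfront.create_invalidation(
    DistributionId="E1234567890ABC",  # placeholder: your distribution ID
    InvalidationBatch={
        "Paths": {"Quantity": 1, "Items": ["/*"]},  # invalidate everything
        "CallerReference": str(time.time()),  # any unique string
    },
)
print(response["Invalidation"]["Id"], response["Invalidation"]["Status"])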
Dynamic gzip compression is handled by CloudFront on a best-effort basis, depending on capacity and availability at the edge locations.
To get predictable compression, gzip the files yourself before uploading them to S3.
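For example, here's a minimal Python/boto3 sketch of that approach, assuming a local dist/ build directory; the bucket name is a placeholder. It compresses each file and sets Content-Encoding so browsers decompress it transparently:
import gzip
import mimetypes
import pathlib

import boto3

s3 = boto3.client("s3")
BUCKET = "my-angular-app-bucket"  # placeholder bucket name

for path in pathlib.Path("dist").rglob("*"):
    if not path.is_file():
        continue
    content_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    s3.put_object(
        Bucket=BUCKET,
        Key=str(path.relative_to("dist")),
        Body=gzip.compress(path.read_bytes()),
        ContentType=content_type,
        ContentEncoding="gzip",  # tells browsers and CloudFront the body is already gzipped
    )
Objects stored this way are delivered compressed regardless of what the edge location decides, though keep in mind that clients which don't send Accept-Encoding: gzip will receive the gzipped bytes as-is.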

Why doesn't CloudFront return the cached version of these identical URL's?

I have a server on EB (running a Tomcat application), and I also have a CloudFront cache set up to cache duplicate requests so that they don't go to the server.
I have two behaviours set up
/artist/search
/Default(*)
and Default(*) is set to:
Allowed HTTP Methods: GET, PUT
Forward Headers: None
Headers: Customize
Timeout: 84,0000
Forward Cookies: None
Forward Query Strings: Yes
Smooth Streaming: No
Restrict Viewer Access: No
So there is no timeout, and the only thing it forwards is query strings.
Yet I can see from the localhost_access_log file that my server is receiving duplicate requests:
127.0.0.1 - - [22/Apr/2015:10:58:28 +0000] "GET /artist/cee3e39e-fb10-414d-9f11-b50fa7d6fb7a HTTP/1.1" 200 1351114
127.0.0.1 - - [22/Apr/2015:10:58:29 +0000] "GET /artist/cee3e39e-fb10-414d-9f11-b50fa7d6fb7a HTTP/1.1" 200 1351114
127.0.0.1 - - [22/Apr/2015:10:58:38 +0000] "GET /artist/cee3e39e-fb10-414d-9f11-b50fa7d6fb7a HTTP/1.1" 200 1351114
I can also see from my CloudFront Popular Objects page that many objects sometimes hit and sometimes miss, including these artist URLs. I was expecting only one miss and all the rest to be hits.
Why would this be?
Update
Looking more carefully, it seems (although I am not sure about this) that pages are less likely to be cached as the size of the artist page increases. Even more strangely, when the main artist page is larger, CloudFront also seems to re-fetch everything referenced in that page, such as icons (PNGs), but not when the artist page is small. This is the worst outcome for me because it is the large artist pages that need the most processing to create on the server; that is why I am using CloudFront, to avoid recreating these pages in the first place.
What you are seeing is a combination of two factors:
Each individual CloudFront POP requests the object separately, so if your viewers are in different locations you can expect multiple requests to your origin server (and they will be misses).
I'm not sure about the report date range you are looking at, but CloudFront eventually evicts less popular objects to make room in the cache for new objects.

S3 PUT Bucket to a location endpoint results in a MalformedXML exception

I'm trying to create an AWS S3 bucket using libcurl, as follows:
Location endpoint:
curl_easy_setopt(curl, CURLOPT_URL, "http://s3-us-west-2.amazonaws.com/");
Assembled RESTful HTTP headers:
PUT / HTTP/1.1
Date:Fri, 18 Apr 2014 19:01:15 GMT
x-amz-content-sha256:ce35ff89b32ad0b67e4638f40e1c31838b170bbfee9ed72597d92bda6d8d9620
host:tempviv.s3-us-west-2.amazonaws.com
x-amz-acl:private
content-type:text/plain
Authorization: AWS4-HMAC-SHA256 Credential=AKIAISN2EXAMPLE/20140418/us-west-2/s3/aws4_request, SignedHeaders=date;x-amz-content-sha256;host;x-amz-acl;content-type, Signature=e9868d1a3038d461ff3cfca5aa29fb5e4a4c9aa3764e7ff04d0c689d61e6f164
Content-Length: 163
The body contains the bucket configuration:
<CreateBucketConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><LocationConstraint>us-west-2</LocationConstraint></CreateBucketConfiguration>
I get the following error back:
MalformedXML: The XML you provided was not well-formed or did not validate against our published schema.
I've been able to carry out the same operation through the AWS CLI.
Things I've also tried:
1) In the XML, using \ to escape the quotes (i.e., xmlns=\"http:.../\").
2) Not providing a CreateBucketConfiguration at all (although the S3 documentation suggests this is not allowed when sending the request to a location endpoint).
3) A GET Service call to the same endpoint lists all the provisioned buckets correctly.
Please do let me know if there is anything else I might be missing here.
OK, the problem was that I was not transferring the entire XML body, as a Wireshark trace revealed. Once I fixed that, the problem went away.
By the way, escaping the quotes with \ works, but using &quot; does not.
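Incidentally, if anyone just needs a quick cross-check that the region constraint itself is accepted, the same PUT Bucket request can be issued through a high-level SDK. A hypothetical Python/boto3 sketch (the bucket name is a placeholder):
import boto3

s3 = boto3.client("s3", region_name="us-west-2")
s3.create_bucket(
    Bucket="tempviv-example",  # placeholder bucket name
    CreateBucketConfiguration={"LocationConstraint": "us-west-2"},  # same body the raw request sends
)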