How to configure the Jetty request.log date format?

Can someone advise? I have an issue with request.log on some of my Jetty instances.
It looks like the date in the log record is locale dependent; in the example below it is formatted with the Russian locale (meaning 18 February), despite the fact that the system locale on this RHEL 6.6 + Jetty 9.2.1 instance is set to en_US.UTF-8.
10.1.182.45 - - [18/фев/2017:16:17:11 +0200] "GET /auth/ HTTP/1.0"
10.1.182.45 - - [18/фев/2017:16:17:23 +0200] "GET /auth/ HTTP/1.0"
10.1.182.45 - - [18/фев/2017:16:17:59 +0200] "GET /auth/ HTTP/1.0"
I would like to change the format to "18/Feb/2017", because on other, similar instances it is in English and I can't determine which factor affects this.
I didn't find such an option in the Jetty configuration files for request.log; there is only a time-zone setting, and the system locale is already en_US.UTF-8.

The NCSA request log has a Locale, and it uses the Java Locale.getDefault() to figure it out for your system.
Locale logLocale = Locale.getDefault();
As for how to change it, you can either ...
Set up your default Java Locale to be more appropriate for everything running in your JVM.
Or, in your chosen NCSA log configuration, use .setLogLocale(Locale) to set the Locale you want it to use.
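For example, a minimal embedded-Jetty sketch of the second option against the Jetty 9.2 API (the log file path and time zone below are placeholders, adjust them to your setup):

import java.util.Locale;

import org.eclipse.jetty.server.NCSARequestLog;
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.handler.RequestLogHandler;

public class RequestLogSetup {
    public static void main(String[] args) throws Exception {
        Server server = new Server(8080);

        // Placeholder path; Jetty replaces "yyyy_mm_dd" with the current date on rollover.
        NCSARequestLog requestLog = new NCSARequestLog("/var/log/jetty/yyyy_mm_dd.request.log");
        requestLog.setLogLocale(Locale.ENGLISH);  // "18/Feb/2017" instead of "18/фев/2017"
        requestLog.setLogTimeZone("GMT+02:00");   // keep whatever time zone you already configured

        RequestLogHandler logHandler = new RequestLogHandler();
        logHandler.setRequestLog(requestLog);
        server.setHandler(logHandler);

        server.start();
        server.join();
    }
}

If you prefer the first option, you can set the JVM default at startup with -Duser.language=en -Duser.country=US, or call Locale.setDefault(Locale.ENGLISH) early in your own code. If you use the distribution's etc/jetty-requestlog.xml rather than embedded code, the same logLocale property should be reachable there with a Set element.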

Related

What does "←[37m" mean in the terminal when running Flask?

When I type flask run and go to 127.0.0.1:5000/myfirstpage, I can see the following output in my terminal:
127.0.0.1 - - [29/Apr/2021 14:55:34] "←[37mGET /myfirstpage HTTP/1.1←[0m" 200 -
I understand that 127.0.0.1 is my localhost server, myfirstpage the path, HTTP/1.1 the version of the hypertext transfer protocol and 200 the HTTP status code for 'successfully responded to request'.
But what do ←[37m and ←[0m stand for?
Those look a lot like terminal escape sequences that your console is not rendering.
According to https://www.lihaoyi.com/post/BuildyourownCommandLinewithANSIescapecodes.html they are ANSI colour codes:
White: \u001b[37m
Reset: \u001b[0m
Also have a look at the ANSI escape code table on Wikipedia.
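A tiny demonstration (a sketch, not Flask's own code): on an ANSI-capable terminal the sequence colours the text white and then resets it, while a console that does not interpret ANSI shows the escape character (0x1B) as a literal glyph, which on Windows looks like the "←" arrow you are seeing.

public class AnsiDemo {
    public static void main(String[] args) {
        String white = "\u001b[37m"; // ESC [ 3 7 m -> white foreground
        String reset = "\u001b[0m";  // ESC [ 0 m   -> reset all attributes
        // ANSI-capable terminal: the request line prints in white.
        // Otherwise: you see something like ←[37m ... ←[0m around it.
        System.out.println(white + "GET /myfirstpage HTTP/1.1" + reset + " 200 -");
    }
}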

How to identify the object that consumes the most bandwidth in an AWS S3 bucket?

What is the best way to identify the object that consumes the most bandwidth in an S3 bucket containing thousands of other objects?
By "bandwidth" I will assume that you mean the bandwidth consumed by delivering files from S3 to some place on the Internet (as when you use S3 to serve static assets).
To track this, you'll need to enable S3 access logs, which creates logfiles in a different bucket that show all of the operations against your primary bucket (or a path in it).
Here are two examples of logged GET operations. The first is from anonymous Internet access using a public S3 URL, while the second uses the AWS CLI to download the file. I've redacted or modified any identifying fields, but you should be able to figure out the format from what remains.
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx com-example-mybucket [04/Feb/2020:15:50:00 +0000] 3.4.5.6 - XXXXXXXXXXXXXXXX REST.GET.OBJECT index.html "GET /index.html HTTP/1.1" 200 - 90 90 9 8 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0" - xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx - ECDHE-RSA-AES128-GCM-SHA256 - com-example-mybucket.s3.amazonaws.com TLSv1.2
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx com-example-mybucket [05/Feb/2020:14:51:44 +0000] 3.4.5.6 arn:aws:iam::123456789012:user/me XXXXXXXXXXXXXXXX REST.GET.OBJECT index.html "GET /index.html HTTP/1.1" 200 - 90 90 29 29 "-" "aws-cli/1.17.7 Python/3.6.9 Linux/4.15.0-76-generic botocore/1.14.7" - xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx SigV4 ECDHE-RSA-AES128-GCM-SHA256 AuthHeader com-example-mybucket.s3.amazonaws.com TLSv1.2
So, to get what you want:
Enable logging
Wait for a representative amount of data to be logged. At least 24 hours unless you're a high-volume website (and note that it can take up to an hour for log records to appear).
Extract all the lines that contain REST.GET.OBJECT
From these, extract the filename and the number of bytes (in this case, the file is 90 bytes).
For each file, multiply the number of bytes by the number of times that it appears in a given period.
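A rough sketch of steps 3-5 (not an official tool; it assumes you have downloaded the log objects to a local directory, and that keys and request URIs contain no embedded spaces, so a plain whitespace split is good enough):

import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

public class TopS3Objects {
    public static void main(String[] args) throws IOException {
        Map<String, Long> bytesPerKey = new HashMap<>();

        // args[0] is a local directory containing the downloaded access-log files
        try (DirectoryStream<Path> logFiles = Files.newDirectoryStream(Paths.get(args[0]))) {
            for (Path logFile : logFiles) {
                for (String line : Files.readAllLines(logFile)) {
                    if (!line.contains("REST.GET.OBJECT")) continue;
                    String[] f = line.split("\\s+");
                    // 0 owner, 1 bucket, 2-3 [timestamp tz], 4 ip, 5 requester, 6 request id,
                    // 7 operation, 8 key, 9-11 "request line", 12 status, 13 error, 14 bytes sent
                    String key = f[8];
                    long bytesSent = "-".equals(f[14]) ? 0L : Long.parseLong(f[14]);
                    bytesPerKey.merge(key, bytesSent, Long::sum);
                }
            }
        }

        // Print the ten keys responsible for the most bytes delivered
        bytesPerKey.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(10)
                .forEach(e -> System.out.println(e.getValue() + "\t" + e.getKey()));
    }
}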
Beware: because every access is logged, the logfiles can grow quite large, quite fast, and you will pay storage charges for them. You should create a life-cycle rule on the destination bucket to delete old logs.
Update: you could also use Athena to query this data. Here's an AWS blog post that describes the process.

ELB Keeps Inconsistently Failing Health Checks, but EC2 Status Checks OK

The servers are accessible normally. The health check requests / (the default page).
When we have some load, the instances respond a little more slowly than the ELB likes, and it then takes them out of the load balancer.
Because the application doesn't actually fail, I cannot "reboot" the instances via EC2, and I can often access the web page / IP directly myself while the instance is "out of service".
This isn't a general failure or a misconfiguration: an instance can be up for 12-2400 hours and then randomly fail the check 3 times in 3 hours, under medium-low load.
The health check is set to a 10 s response timeout, a 30 s interval, 5 failures to mark unhealthy and 2 successes to mark healthy.
Any ideas?
The health check log entries look normal, and there is nothing in the error log. Here's a sample from the access log:
10.0.100.30 - - [25/Nov/2016:06:49:22 +0000] "GET /index.html HTTP/1.1" 200 11415 "-" "ELB-HealthChecker/1.0"
::1 - - [25/Nov/2016:06:49:26 +0000] "OPTIONS * HTTP/1.0" 200 126 "-" "Apache/2.4.20 (Ubuntu) (internal dummy connection)"

Why doesn't CloudFront return the cached version of these identical URLs?

I have a server on EB (running a Tomcat application), and I also have a CloudFront cache set up to cache duplicate requests so that they don't go to the server.
I have two behaviours set up
/artist/search
/Default(*)
and Default(*) is set to:
Allowed HTTP Methods: GET, PUT
Forward Headers: None
Headers: Customize
Timeout: 84,0000
Forward Cookies: None
Forward Query Strings: Yes
Smooth Streaming: No
Restricted View Access: No
so there is effectively no timeout, and the only thing it forwards is query strings.
Yet I can see from looking at the localhost_access_log file that my server is receiving duplicate requests:
127.0.0.1 - - [22/Apr/2015:10:58:28 +0000] "GET /artist/cee3e39e-fb10-414d-9f11-b50fa7d6fb7a HTTP/1.1" 200 1351114
127.0.0.1 - - [22/Apr/2015:10:58:29 +0000] "GET /artist/cee3e39e-fb10-414d-9f11-b50fa7d6fb7a HTTP/1.1" 200 1351114
127.0.0.1 - - [22/Apr/2015:10:58:38 +0000] "GET /artist/cee3e39e-fb10-414d-9f11-b50fa7d6fb7a HTTP/1.1" 200 1351114
I can also see from my CloudFront Popular Objects page that many objects sometimes hit and sometimes miss, including these artist URLs. I was expecting only one miss and all the rest to be hits.
Why would this be?
Update
Looking more carefully, it seems (although I'm not sure about this) that a page is less likely to be cached as the size of the artist page increases. Even more strangely, when the main artist page is large, CloudFront also re-fetches everything referenced in that page, such as icons (PNGs), but it doesn't when the artist page is small. This is the worst outcome for me, because it is the large artist pages that need the most processing to create on the server; avoiding recreating these pages is why I'm using CloudFront in the first place.
What you are seeing is a combination of two things:
Each individual CloudFront POP requests the object separately, so if your viewers are in different locations you can expect multiple queries to your origin server (and they will be misses).
I'm not sure about the report date range you are looking at, but CloudFront eventually evicts less popular objects to make room in the cache for new objects.
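One way to see this from the outside (a hedged sketch; the distribution hostname below is a placeholder) is to request the same artist URL a couple of times through CloudFront and look at the X-Cache response header, which CloudFront sets to values such as "Miss from cloudfront" or "Hit from cloudfront". Repeating this from networks in different regions shows you the per-POP misses described above.

import java.net.HttpURLConnection;
import java.net.URL;

public class CacheCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder CloudFront URL; substitute your distribution and an artist path.
        String url = "https://dxxxxxxxxxxxxx.cloudfront.net/artist/cee3e39e-fb10-414d-9f11-b50fa7d6fb7a";
        for (int i = 0; i < 2; i++) {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            // A second request from the same location should normally report a hit.
            System.out.println("attempt " + (i + 1) + ": HTTP " + conn.getResponseCode()
                    + ", X-Cache: " + conn.getHeaderField("X-Cache"));
            conn.disconnect();
        }
    }
}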

Amazon S3 access log files: incorrect value »Bytes Sent«

Analyzing our S3 access log files, I have noticed that the »data transfer out per month« value derived from the access log files (via S3stat and our own log-file analysis) differs strongly from the values on our bills.
So I ran a test, downloading files from one of our buckets, and it looks like the access log files are incorrect.
On 03/02/2015 I uploaded a zip file to our bucket and then downloaded the complete file successfully over two different internet connections.
One day later, on 04/02/2015, I analyzed the log files. Unfortunately, both entries have the value "-" for "Bytes Sent".
Amazon's »Server Access Log Format« documentation (http://docs.aws.amazon.com/AmazonS3/latest/dev/LogFormat.html) says:
»The number of response bytes sent, excluding HTTP protocol overhead, or "-" if zero.«
The corresponding entries look like this:
Bucket Owner Bucket [03/Feb/2015:10:28:41 +0000] RemoteIP - RequestID REST.GET.OBJECT Download.zip "GET /Bucket/Download.zip HTTP/1.1" 200 - - 760542 2228865159 58 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0" -
Bucket Owner Bucket [03/Feb/2015:10:28:57 +0000] RemoteIP - RequestID REST.GET.OBJECT Download.zip "GET /Bucket/Download.zip HTTP/1.1" 200 - - 860028 2228865159 23 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0" -
As you can see, both log entries have quite a long connection duration (»Total Time«): 0:12:40 and 0:14:20.
Based on these findings, I then checked the log files of our main buckets for December 2014. Out of 2332 relevant entries (all ZIP files in our bucket) I found 860 entries with this error.
Thus, the Amazon S3 access log files seem flawed and useless for our analysis.
Can anybody help me? Am I making a mistake, and if so, how can these log files be reliably evaluated?
Thanks
Peter
After two months of inquiries with Amazon, it looks like Amazon has fixed this issue. My first test, for the period 13.03. to 16.03., shows no such errors anymore, and our S3stat analysis shows a massive (now correct) jump in »Daily Bandwidth« since 12.03.2015.
For more information you can look here:
https://forums.aws.amazon.com/thread.jspa?messageID=606654
Peter