Is there any better solution to get AWS CloudTrail logs into Kibana? I am using the Elasticsearch Service from AWS.
Here's the Logstash input that I use with 1.4.2. It works well, though I suspect it is noisy (it requires a lot of S3 GET/HEAD/LIST requests).
input {
  s3 {
    bucket => "bucketname"
    delete => false
    interval => 60 # seconds
    prefix => "cloudtrail/"
    type => "cloudtrail"
    codec => "cloudtrail"
    credentials => "/etc/logstash/s3_credentials.ini"
    sincedb_path => "/opt/logstash_cloudtrail/sincedb"
  }
}
filter {
  if [type] == "cloudtrail" {
    mutate {
      gsub => [ "eventSource", "\.amazonaws\.com$", "" ]
      add_field => {
        "document_id" => "%{eventID}"
      }
    }
    if ! [ingest_time] {
      ruby {
        code => "event['ingest_time'] = Time.now.utc.strftime '%FT%TZ'"
      }
    }
    ruby {
      code => "event.cancel if (Time.now.to_f - event['@timestamp'].to_f) > (60 * 60 * 24 * 1)"
    }
    ruby {
      code => "event['ingest_delay_hours'] = (Time.now.to_f - event['@timestamp'].to_f) / 3600"
    }
    # drop events more than a day old, we're probably catching up very poorly
    if [ingest_delay_hours] > 24 {
      drop {}
    }
    # example of an event that is noisy and I don't care about
    if [eventSource] == "elasticloadbalancing" and [eventName] == "describeInstanceHealth" and [userIdentity][userName] == "deploy-s3" {
      drop {}
    }
  }
}
The credentials.ini format is explained on the s3 input page; it's just this:
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
I also have a search that sends results to our #chatops but I'm not posting that here.
If you haven't tried it already, you can use CloudTrail and CloudWatch Logs together: deliver the CloudTrail data to a CloudWatch Logs group, then create a CloudWatch Logs subscription that sends it to Elasticsearch.
Once that is done you should be able to define a time-based Kibana index pattern that starts with cwl*.
Cheers-
Related
How can I create a Logstash (https://www.elastic.co/logstash/) pipeline that transfers each individual Kafka message to an individual file in an AWS S3 bucket, with the file name taken from one of the attributes of the Kafka message?
I am able to set up a simple pipeline using the following.
I am using the s3 output plugin:
https://www.elastic.co/guide/en/logstash/current/plugins-outputs-s3.html
and the Kafka input plugin:
https://www.elastic.co/guide/en/logstash/current/plugins-inputs-kafka.html
input {
  kafka {
    bootstrap_servers => "mykafkaserver:9092"
    topics => "document"
    group_id => "xLogAna1"
    auto_offset_reset => "earliest"
    max_poll_records => 1
    fetch_max_bytes => 10
  }
}
output {
  s3 {
    access_key_id => "XXXXXXXXXXXXXXX"
    secret_access_key => "SSSSSSSSSSSSSSS"
    region => "eu-west-1"
    bucket => "<my-documnt-bucket>"
    size_file => 1
    time_file => 5
    codec => "plain"
  }
}
Since I want individual messages in individual files, I tweaked the max_poll_records and fetch_max_bytes parameters to fetch individual messages, but it did not help: the resulting S3 files still contain numerous Kafka messages.
I am trying to use Logstash to push messages to GCS using the output plugin below. I am able to see the messages in the bucket, however they appear every hour and not in real time. Where can I change the frequency of the log uploads?
https://www.elastic.co/guide/en/logstash/current/plugins-outputs-google_cloud_storage.html
P.S.: I tried adding this to my config file, but to no avail:
flush_interval_secs => 2
My config looks something like this:
input {
  kafka {
    zk_connect => "xxxxxxxxxxxxxxxxxxxxxx"
    group_id => "yyyyyyyyyyyyyyyyyyyyyyyyyyyy"
    topic_id => "zzzzzzzzzzzzzzzzz"
    reset_beginning => true
    auto_offset_reset => "smallest"
  }
}
output {
  google_cloud_storage {
    bucket => "aaaa/bbb"
    flush_interval_secs => 15
  }
  stdout {
    codec => rubydebug
  }
}
From the documentation for uploader_interval_secs:
Uploader interval when uploading new files to GCS. Adjust time based on your time pattern (for example, for hourly files, this interval can be around one hour).
The default value is 60.
Example:
output {
  google_cloud_storage {
    bucket => "my_bucket"              # required
    date_pattern => "%Y-%m-%dT%H:00"   # optional
    uploader_interval_secs => 60       # optional
  }
}
Additionally, you can set date_pattern, which is the time pattern used for the log file names.
I am able to use the Athena API with startQueryExecution() to create a CSV file of the results in S3. However, I would like to return a JSON response to my application so I can process the data further. How can I get the results back as a JSON response after running startQueryExecution() via the API?
I am using the AWS PHP SDK [https://aws.amazon.com/sdk-for-php/], however this is relevant to any language, since I cannot find any answers on actually getting a response back; it just saves a CSV file to S3.
$athena = AWS::createClient('athena');

$queryx = 'SELECT * FROM elb_logs LIMIT 20';

$result = $athena->startQueryExecution([
    'QueryExecutionContext' => [
        'Database' => 'sampledb',
    ],
    'QueryString' => 'SELECT request_ip FROM elb_logs LIMIT 20', // REQUIRED
    'ResultConfiguration' => [ // REQUIRED
        'EncryptionConfiguration' => [
            'EncryptionOption' => 'SSE_S3' // REQUIRED
        ],
        'OutputLocation' => 's3://xxxxxx/', // REQUIRED
    ],
]);

// check completion : getQueryExecution()
$exId = $result['QueryExecutionId'];

sleep(6);

$checkExecution = $athena->getQueryExecution([
    'QueryExecutionId' => $exId, // REQUIRED
]);

if ($checkExecution["QueryExecution"]["Status"]["State"] == 'SUCCEEDED') {
    $dataOutput = $athena->getQueryResults([
        'QueryExecutionId' => $result['QueryExecutionId'], // REQUIRED
    ]);

    while (($data = fgetcsv($dataOutput, 1000, ",")) !== FALSE) {
        $num = count($data);
        echo "<p> $num fields in line $row: <br /></p>\n";
        $row++;
        for ($c = 0; $c < $num; $c++) {
            echo $data[$c] . "<br />\n";
        }
    }
}
The Amazon Athena SDK will return the results of a query, and you can then write (send) them out as JSON yourself; the SDK will not do this for you.
The API startQueryExecution() returns a QueryExecutionId. Use this to call getQueryExecution() to determine whether the query is complete. Once the query completes, call getQueryResults().
You can then process each row in the result set.
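For example, a minimal sketch in PHP, reusing the $athena client and $result from the question above (this assumes a plain SELECT, where the first row returned by getQueryResults() holds the column headers):
$dataOutput = $athena->getQueryResults([
    'QueryExecutionId' => $result['QueryExecutionId'],
]);
// For large result sets you would page through using the 'NextToken' element.
$rows = $dataOutput['ResultSet']['Rows'];

// The first row of a SELECT result contains the column names.
$headers = array_map(function ($datum) {
    return isset($datum['VarCharValue']) ? $datum['VarCharValue'] : null;
}, $rows[0]['Data']);

// The remaining rows are data; build associative arrays keyed by column name.
$records = [];
foreach (array_slice($rows, 1) as $row) {
    $values = array_map(function ($datum) {
        return isset($datum['VarCharValue']) ? $datum['VarCharValue'] : null;
    }, $row['Data']);
    $records[] = array_combine($headers, $values);
}

// Return the result set to the application as JSON.
echo json_encode($records);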
I set up Logstash on my Windows box and run 7 instances of it, each with a folder of log files as input. I ran them at the same time, directed at an AWS ES cluster with 7 r3.xlarge data nodes and 3 r3.xlarge master nodes. All input files combined take around 9 GB. After all the Logstash instances stopped running, I only had 6 million events in Elasticsearch; there should be around 30 million. I went back to one of my Logstash cmd windows and looked at the last event it printed. It did not correspond to the last log line in the file it came from; it was roughly the 50th line from the bottom. The second-to-last event in the same window also did not correspond to the log line right before that one; I found it about 30 log lines further up in the file. So it is apparent that my Logstash is skipping log lines.
Now I checked my Elasticsearch thread pool stats and they show all zeros, so nothing got rejected? (I looked at bulk.rejected in particular.)
_cat/thread_pool?v
Is this data cumulative or does it get refreshed?
Which brings me to my second question: if Logstash itself dropped the log lines for some reason, where and how can I troubleshoot it? I know that none of my Logstash instances crashed. All I know is that it happily dropped 70% of all my logs, and I have no error log or clue to go on as to what happened.
Edit:
My Logstash configuration:
(It is as if it is ignoring all my logs for Friday, Saturday and Sunday, and only processing Monday (3/21).)
input {
  file {
    type => "apache_logs"
    path => "D:/logs/apache_logs/all/ssl_access.*"
    start_position => "beginning"
    sincedb_path => "NUL"
  }
}
filter {
  grok {
    match => ["message","%{IPORHOST:client_ip} (?<username>[-]) (?<password>[-]) \[(?<timestamp>\d{2}[/][a-zA-Z]{3}[/]\d{4}:\d{2}:\d{2}:\d{2}\s-\d{0,4})\] \"%{GREEDYDATA:request}\" %{NOTSPACE:obssocookie} %{NOTSPACE:ps_sso_uid_in} %{NOTSPACE:ps_sso_uid_out} (?<status>[0-9]{3}) (?<bytes>[0-9]{1,}|-) %{NOTSPACE:protocol} %{NOTSPACE:ciphers} \"%{GREEDYDATA:referrer}\" \"%{GREEDYDATA:user_agent}\""]
    match => [ "path", "(?<app_node>webpr[0-9]{2}[a-z]{0,1})" ]
    add_field => { "server_node" => "%{app_node}" }
    break_on_match => false
  }
  mutate {
    gsub => ["obssocookie","^.*=",""]
  }
  mutate {
    gsub => ["ps_sso_uid_in","^.*=",""]
  }
  mutate {
    gsub => ["ps_sso_uid_out","^.*=",""]
  }
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    remove_field => "timestamp"
  }
  geoip {
    source => "client_ip"
  }
  if [geoip] {
    mutate {
      add_field => {
        "ip_type" => "public"
      }
    }
  } else {
    mutate {
      add_field => {
        "ip_type" => "private"
      }
    }
  }
}
output {
  stdout { codec => rubydebug }
  amazon_es {
    hosts => ["apache-logs-xxxxxxxxxxxxxxxxxxxxxxxxxx.us-west-2.es.amazonaws.com"]
    region => "us-west-2"
    aws_access_key_id => 'xxxxxxxxxxxxxxxxxx'
    aws_secret_access_key => 'xxxxxxxxxxxxxxxxxxxxx'
    index => "logstash-apache-friday"
  }
}
How can I know how many events Logstash itself dropped, as opposed to how many Elasticsearch rejected? I already checked through the API and bulk.rejected = 0.
Found my culprit. I have to include this in my file input; it looks like Logstash skips any files older than 24 hours by default:
ignore_older => 0
Kind of surprising: I would expect to add settings when I want to narrow my input; otherwise Logstash should process any file, whether it is older than 24 hours or not. Really not something that was obvious.
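For reference, here is the file input from my config above with the setting added:
input {
  file {
    type => "apache_logs"
    path => "D:/logs/apache_logs/all/ssl_access.*"
    start_position => "beginning"
    sincedb_path => "NUL"
    # without this, files older than 24 hours are silently skipped
    ignore_older => 0
  }
}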
I have been trying to use Logstash, Elasticsearch, and Kibana to monitor my Django server.
I have set up the conf file as given below:
input {
  tcp { port => 5000 codec => json }
  udp { port => 5000 type => "syslog" }
}
output {
  elasticsearch_http {
    host => "127.0.0.1"
    port => 9200
  }
  stdout { codec => rubydebug }
}
But the logged messages are too lengthy, and I could not find a way to parse them.
Any help is appreciated.
As far as I can tell, there is not a pattern or built-in that will directly parse Django exceptions.
You need to tell the forwarding agent to target the Django log files that you're generating, marking them as "type": "django".
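For instance, if the shipper is itself a Logstash file input reading the Django logs, a minimal sketch could look like the following (the log path is only a placeholder, not from the original setup):
input {
  file {
    # placeholder path; point this at the Django log files you are generating
    path => "/var/log/django/*.log"
    type => "django"
  }
}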
Then, on the Logstash server, you can use the following:
Pattern (add this to a grok patterns file):
DJANGO_LOGLEVEL (DEBUG|INFO|ERROR|WARNING|CRITICAL)
DJANGO_LOG %{DJANGO_LOGLEVEL:log_level}\s+%{TIMESTAMP_ISO8601:log_timestamp}\s+%{TZ:log_tz}\s+%{NOTSPACE:logger}\s+%{WORD:module}\s+%{POSINT:proc_id}\s+%{GREEDYDATA:content}
Filter:
filter {
  if [type] == "django" {
    grok {
      match => ["message", "%{DJANGO_LOG}" ]
    }
    date {
      match => [ "log_timestamp", "ISO8601", "YYYY-MM-dd HH:mm:ss,SSS"]
      target => "@timestamp"
    }
  }
}
If you don't want to add the pattern file, you can expand the DJANGO_LOGLEVEL pattern inline in place of %{DJANGO_LOGLEVEL:log_level}, and put the rest of the DJANGO_LOG pattern directly into the grok match.
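For example, a minimal sketch of the same grok filter with DJANGO_LOGLEVEL expanded inline, so no separate patterns file is needed:
filter {
  if [type] == "django" {
    grok {
      # DJANGO_LOGLEVEL is inlined as a named capture; the rest of DJANGO_LOG goes directly into the match
      match => ["message", "(?<log_level>DEBUG|INFO|ERROR|WARNING|CRITICAL)\s+%{TIMESTAMP_ISO8601:log_timestamp}\s+%{TZ:log_tz}\s+%{NOTSPACE:logger}\s+%{WORD:module}\s+%{POSINT:proc_id}\s+%{GREEDYDATA:content}" ]
    }
    date {
      match => [ "log_timestamp", "ISO8601", "YYYY-MM-dd HH:mm:ss,SSS"]
      target => "@timestamp"
    }
  }
}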