Multiple remote logging in airflow - amazon-web-services

How can I set multiple remote logging destinations in Airflow? Is it possible to do the same in AWS MWAA? By default MWAA sends the logs to CloudWatch, but I want to store a copy of the logs in an S3 bucket as well.
I tried
remote_base_log_folder = s3://paths/to/logs1
remote_base_log_folder_2 = s3://paths/to/logs2
But it's storing the logs only in the first folder.
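For reference, the stock Airflow config only honours a single remote_base_log_folder; there is no remote_base_log_folder_2 option, which is why the second line is ignored. On self-managed Airflow the usual workaround is a custom logging config that registers an extra task handler. Below is a minimal sketch of that idea; the paths, handler name, and handler kwargs are placeholders, and the exact S3TaskHandler arguments depend on the Airflow/Amazon-provider version. Whether MWAA permits overriding logging_config_class at all needs to be checked against the MWAA documentation; if it doesn't, exporting the CloudWatch log groups to S3 is the fallback.

# log_config.py - sketch of a custom Airflow logging config that adds a second
# remote destination. All names and paths below are placeholders.
from copy import deepcopy

from airflow.config_templates.airflow_local_settings import DEFAULT_LOGGING_CONFIG

LOGGING_CONFIG = deepcopy(DEFAULT_LOGGING_CONFIG)

# Extra handler: write task logs to a second S3 location as well.
LOGGING_CONFIG["handlers"]["secondary_s3_task"] = {
    "class": "airflow.providers.amazon.aws.log.s3_task_handler.S3TaskHandler",
    "formatter": "airflow",
    "base_log_folder": "/usr/local/airflow/logs",  # local staging dir (placeholder)
    "s3_log_folder": "s3://paths/to/logs2",        # second destination (placeholder)
}

# Attach the extra handler to the task logger alongside the default handler.
LOGGING_CONFIG["loggers"]["airflow.task"]["handlers"].append("secondary_s3_task")

The module would then be referenced via logging_config_class = log_config.LOGGING_CONFIG in airflow.cfg (or the equivalent configuration override, where permitted).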

Related

AWS Amplify send logs to Splunk

I'd like to send Amplify monitoring data (access logs, metrics) to Splunk - that would be the best-case scenario. But for a start it would be fine if I could at least store them in another service like S3, or even better link them with CloudWatch, as I haven't found whether those logs are somehow taken from CloudWatch log groups.
My question is: is there a way to get those metrics out of the Amplify service?
There's a way you can send CloudWatch logs to your third-party apps.
Two major steps:
Export the CloudWatch logs to S3.
Configure a Lambda trigger on the S3 bucket and write your logic to read and send the logs to the third-party app every time a file is written to the bucket.
CloudWatch allows you to export logs to S3.
From AWS docs:
To export data to Amazon S3 using the CloudWatch console:
1. Sign in as the IAM user that you created in Step 2: Create an IAM user with full access to Amazon S3 and CloudWatch Logs.
2. Open the CloudWatch console at https://console.aws.amazon.com/cloudwatch/.
3. In the navigation pane, choose Log groups.
4. On the Log Groups screen, choose the name of the log group.
5. Choose Actions, Export data to Amazon S3.
6. On the Export data to Amazon S3 screen, under Define data export, set the time range for the data to export using From and To.
7. If your log group has multiple log streams, you can provide a log stream prefix to limit the log group data to a specific stream. Choose Advanced, and then for Stream prefix, enter the log stream prefix.
8. Under Choose S3 bucket, choose the account associated with the Amazon S3 bucket.
9. For S3 bucket name, choose an Amazon S3 bucket.
10. For S3 Bucket prefix, enter the randomly generated string that you specified in the bucket policy.
11. Choose Export to export your log data to Amazon S3.
12. To view the status of the log data that you exported to Amazon S3, choose Actions and then View all exports to Amazon S3.
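If you'd rather script this than click through the console, the same export can also be started with the CloudWatch Logs API. A minimal sketch with boto3 follows; the log group, bucket, and prefix are placeholders, and the bucket must already carry the policy that allows the CloudWatch Logs service to write to it.

# export_cw_logs.py - sketch of the console export flow via the CloudWatch Logs API.
# Log group, bucket, and prefix are placeholders.
import time

import boto3

logs = boto3.client("logs")

now_ms = int(time.time() * 1000)
response = logs.create_export_task(
    taskName="amplify-logs-export",
    logGroupName="/aws/amplify/my-app",      # placeholder log group
    fromTime=now_ms - 24 * 60 * 60 * 1000,   # last 24 hours
    to=now_ms,
    destination="my-log-export-bucket",      # placeholder bucket
    destinationPrefix="amplify-exports",
)
print(response["taskId"])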
Once you have exported the logs to S3, you can set up a simple S3 Lambda trigger to read and send those logs to the third-party application (Splunk in this case) using its APIs.
This way you also have the logs saved in S3 for future use.
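A sketch of what that Lambda could look like, assuming Splunk's HTTP Event Collector (HEC) as the third-party endpoint; the HEC URL, token, and bucket layout are assumptions, and note that CloudWatch exports land in S3 as gzip-compressed objects. In practice you'd batch events rather than posting line by line.

# lambda_to_splunk.py - sketch of an S3-triggered Lambda that forwards exported
# CloudWatch log objects to Splunk's HTTP Event Collector. The HEC URL and token
# are placeholders and would normally come from environment variables or Secrets Manager.
import gzip
import json
import os
import urllib.request

import boto3

s3 = boto3.client("s3")
SPLUNK_HEC_URL = os.environ.get("SPLUNK_HEC_URL", "https://splunk.example.com:8088/services/collector/event")
SPLUNK_HEC_TOKEN = os.environ.get("SPLUNK_HEC_TOKEN", "changeme")

def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        if not key.endswith(".gz"):
            continue  # skip the write-test marker and other non-export objects
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        text = gzip.decompress(body).decode("utf-8")  # exported objects are gzip-compressed
        for line in text.splitlines():
            payload = json.dumps({"event": line, "source": key}).encode("utf-8")
            req = urllib.request.Request(
                SPLUNK_HEC_URL,
                data=payload,
                headers={"Authorization": "Splunk " + SPLUNK_HEC_TOKEN,
                         "Content-Type": "application/json"},
            )
            urllib.request.urlopen(req)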

Accessing Amazon S3 via FTP?

I have done a number of searches and can't seem to work out whether this is doable at all.
I have a data logger that has an FTP-push function. The FTP-push function has the following settings:
FTP server
Port
Upload directory
User name
Password
In general, I understand that a FileZilla client (I have a Pro edition) is able to drop files into my AWS S3 bucket, and I have done this successfully on my local PC.
Is it possible to remove the FileZilla client requirement and input my S3 information directly into my data logger? Something like the diagram below:
Data logger ----FTP----> S3 bucket
If not, what would be the most sensible method to have my data logger's JSON files dropped into AWS S3 via FTP?
Frankly, you'd be better off with:
Logging to local files
Using a schedule to copy the log files to Amazon S3 using the aws s3 sync command
The schedule could be triggered by cron (Linux) or a Scheduled Task (Windows).
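A minimal sketch of that scheduled copy, assuming the AWS CLI is installed and the machine has credentials for the bucket; the local directory and bucket are placeholders. On Linux the same aws s3 sync one-liner could simply live in a crontab entry instead.

# sync_logs.py - sketch of the scheduled copy step: push the local log directory
# to S3 with the AWS CLI's sync behaviour. Paths and bucket are placeholders;
# run it from cron (Linux) or a Scheduled Task (Windows).
import subprocess

LOCAL_LOG_DIR = "/var/log/datalogger"               # placeholder local path
S3_DESTINATION = "s3://my-datalogger-bucket/logs/"  # placeholder bucket/prefix

subprocess.run(
    ["aws", "s3", "sync", LOCAL_LOG_DIR, S3_DESTINATION],
    check=True,
)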
Amazon also recently added FTP support to AWS Transfer. This provides an integration with Amazon S3 via FTP without setting up any additional infrastructure; however, you should review the pricing before committing.
As an alternative, you could create an intermediary server that syncs between itself and Amazon S3 using the aws s3 sync CLI command.

How to use Spark to read data from one AWS account and write to another AWS account?

I have Spark jobs running on an EKS cluster to ingest AWS logs from S3 buckets.
Now I have to ingest logs from another AWS account. I have managed to use the settings below to successfully read cross-account data with the Hadoop AssumedRoleCredentialProvider.
But how do I save the DataFrame back to an S3 bucket in my own AWS account? There seems to be no way to set the Hadoop S3 config back to my own AWS account.
// Credentials for the cross-account read: assume a role in the other account,
// using the instance profile as the base credentials.
spark.sparkContext.hadoopConfiguration.set("fs.s3a.assumed.role.external.id", "****")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.assumed.role.credentials.provider", "com.amazonaws.auth.InstanceProfileCredentialsProvider")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.assumed.role.arn", "****")

val data = spark.read.json("s3a://cross-account-log-location")
data.count

// Changing back to InstanceProfileCredentialsProvider before the write is not working.
spark.sparkContext.hadoopConfiguration.set("fs.s3a.aws.credentials.provider", "com.amazonaws.auth.InstanceProfileCredentialsProvider")
data.write.parquet("s3a://bucket-in-my-own-aws-account")
As per the Hadoop documentation, different S3 buckets can be accessed with different S3A client configurations by using a per-bucket configuration that includes the bucket name.
E.g. fs.s3a.bucket.<bucket name>.access.key
See the URL below: http://hadoop.apache.org/docs/r2.8.0/hadoop-aws/tools/hadoop-aws/index.html#Configurations_different_S3_buckets
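A sketch of that per-bucket idea in PySpark (the same properties apply from Scala): only the cross-account bucket gets the assumed-role provider, while every other bucket, including the one in your own account, keeps the instance profile, so nothing has to be switched before the write. The bucket name and role ARN below are placeholders.

# per_bucket_s3a.py - sketch of per-bucket S3A configuration. The cross-account
# bucket uses AssumedRoleCredentialProvider; all other buckets fall back to
# InstanceProfileCredentialsProvider. Names and ARN are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # Default credentials for all buckets: the node's instance profile.
    .config("spark.hadoop.fs.s3a.aws.credentials.provider",
            "com.amazonaws.auth.InstanceProfileCredentialsProvider")
    # Overrides that apply only to the cross-account bucket.
    .config("spark.hadoop.fs.s3a.bucket.cross-account-logs.aws.credentials.provider",
            "org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider")
    .config("spark.hadoop.fs.s3a.bucket.cross-account-logs.assumed.role.arn",
            "arn:aws:iam::111122223333:role/log-reader")  # placeholder ARN
    .config("spark.hadoop.fs.s3a.bucket.cross-account-logs.assumed.role.credentials.provider",
            "com.amazonaws.auth.InstanceProfileCredentialsProvider")
    .getOrCreate()
)

data = spark.read.json("s3a://cross-account-logs/")            # placeholder source bucket
data.write.parquet("s3a://bucket-in-my-own-aws-account/out/")  # written with instance profile creds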

How to put custom application logs in s3 bucket from AWS EMR

I have a custom application/service for AWS EMR that I'm installing via a bootstrap action on all nodes. I want to put that application's log files into the same S3 bucket that I'm using for EMR logs.
Can anyone suggest where I have to configure my log path in logpusher so the logs are pushed to the S3 bucket at a fixed interval, the same as for a Hadoop application.
You can configure it in /etc/logpusher/hadoop.config and restart the logpusher on all nodes.

How to access AWS ElasticSearch Service automated snapshots/backups?

I was under the impression that the AWS Elasticsearch Service comes with automated snapshots/backups. That's what I find in the documentation. It suggests they happen once a day and are stored in S3, but I do not see any backups in any of my S3 buckets. How do you get access to the automated snapshots?
It probably doesn't matter, but I used the following template to create my Elasticsearch domain, explicitly indicating that I want automated backups.
CloudFormation
"SnapshotOptions": {
"AutomatedSnapshotStartHour": "0"
}
You can't get to the S3 bucket itself, but you can restore from the backups stored inside it by using curl or another HTTP client to communicate directly with your cluster, telling it to rebuild from the "cs-automated" repository, which is linked to the S3 snapshots. To be able to communicate with your ES cluster directly via HTTP, you'll have to temporarily open an IP-based access policy on your cluster.
http://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/es-managedomains-snapshots.html#es-managedomains-snapshot-restore
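A sketch of that interaction with Python's requests, assuming the placeholder domain endpoint below and an access policy that temporarily allows unsigned requests from your IP (with an IAM-based policy you would have to sign the requests instead).

# restore_from_cs_automated.py - sketch of listing the automated snapshots in the
# AWS-managed "cs-automated" repository and restoring one. Endpoint and snapshot
# name are placeholders.
import requests

ES_ENDPOINT = "https://search-mydomain-abc123.us-east-1.es.amazonaws.com"  # placeholder endpoint

# List the snapshots the service has taken automatically.
snapshots = requests.get(ES_ENDPOINT + "/_snapshot/cs-automated/_all").json()
print([s["snapshot"] for s in snapshots.get("snapshots", [])])

# Restore a chosen snapshot. Indices with the same names must be closed or
# deleted first, or restored under renamed indices.
snapshot_name = "2017-07-31t18-00-00"  # placeholder snapshot name
resp = requests.post(ES_ENDPOINT + "/_snapshot/cs-automated/" + snapshot_name + "/_restore")
print(resp.json())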