HDFS command to convert to AWS S3

We have many HDFS commands in our ETL scripts. Due to a cloud migration, we need to convert the HDFS commands to AWS S3 commands.
We have a command as below:
hdfs dfs -text ${OUTPUT_PATH}/part* | hdfs dfs -cp ${INPUT_PATH}/${FILENAME}
I need help converting this to an S3 command, mainly the first part with '-text'.
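One possible direction (a sketch only, assuming the part files sit in S3 as plain uncompressed text, and using s3://my-bucket/output/ as a stand-in for ${OUTPUT_PATH}): aws s3 cp can stream an object to stdout when the destination is -, which roughly plays the role of hdfs dfs -text for text files. Note that -text also decodes SequenceFiles and compressed files, which a plain S3 copy does not.
# Hypothetical sketch: stream every part file under an S3 prefix to stdout,
# then pipe the combined output wherever the original command sent it.
for key in $(aws s3 ls s3://my-bucket/output/ | awk '{print $4}' | grep '^part'); do
  aws s3 cp "s3://my-bucket/output/${key}" -
done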

Related

Write AWS Elasticache (Redis) Query Output To S3 Path

I want to export all the keys from an AWS Redis DB to an S3 path as a text file so that I can analyze the keys that are being created in the db.
How can I export all the keys from redis-cli to an S3 path?
Can this command be used to write to an S3 path?
redis-cli -h 10.1.xx.xx -n 1 keys * >s3://bucket/path/filename.txt
After installing the AWS CLI, you can copy a local file to S3 from the command line.
# copy a single file to a specified bucket and key
aws s3 cp test.txt s3://mybucket/test2.txt
# upload a local file stream from standard input to a specified bucket and key
aws s3 cp - s3://mybucket/stream.txt
So you can upload your stdout to S3 like this:
# quote the * so the shell doesn't glob-expand it before redis-cli sees it
redis-cli -h 10.1.xx.xx -n 1 keys '*' | aws s3 cp - s3://bucket/path/filename.txt
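As a variant (not part of the original answer), recent redis-cli versions also have a --scan option, which iterates keys with SCAN instead of the blocking KEYS command; the host and S3 path below are the same placeholders as above.
# Sketch: export keys via SCAN, one key per line, straight to S3
redis-cli -h 10.1.xx.xx -n 1 --scan | aws s3 cp - s3://bucket/path/filename.txt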

How to `cp` multiple files from S3 to local machine in one command?

I am using aws-vault and want to cp multiple files from S3 to my local machine. The filenames do not follow any particular pattern. I was hoping for a command of the form
aws-vault exec <ROLE> aws s3 cp s3://path_to_file1 ~/file1 | aws s3 cp s3://path_to_file2 ~/file2
but a pipe like this doesn't work. The main reason I want to get this in one command is so that I only have to authenticate once, instead of for every single aws-vault call.
In your case, the second command in that pipeline is not run through aws-vault, so it won't be authenticated and the copy won't work.
Try this command instead; it should not ask you to authenticate every single time.
aws-vault exec <ROLE> aws s3 cp s3://path_to_file1 ~/file1 | aws-vault exec <ROLE> aws s3 cp s3://path_to_file2 ~/file2
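Another sketch, assuming aws-vault's exec <ROLE> -- <command> form: run both copies inside a single shell invocation so the credentials are only resolved once (the S3 paths are the same placeholders as in the question).
# One aws-vault session; the two copies run sequentially inside one bash -c call
aws-vault exec <ROLE> -- bash -c 'aws s3 cp s3://path_to_file1 ~/file1 && aws s3 cp s3://path_to_file2 ~/file2'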

How to redirect AWS S3 sync output to a file?

When I run aws s3 sync "local drive" "s3bucket", I see a bunch of logs generated on my AWS CLI console. Is there a way to direct these logs to an output/log file for future reference?
I am trying to schedule a SQL job that executes a PowerShell script that syncs backups from a local drive to an S3 bucket. The backups are getting synced to the bucket successfully. However, I am trying to figure out a way to direct the sync progress to an output file. Help appreciated. Thanks!
Simply redirect the output of the command into a file using the ">" operator.
The file does not have to exist beforehand (and in fact will be overwritten if it does exist).
aws s3 sync . s3://mybucket > log.txt
If you wish to append to an existing file instead, use the ">>" operator.
aws s3 sync . s3://mybucket >> existingLogFile.txt
To test this command, you can use the --dryrun argument to the sync command:
aws s3 sync . s3://mybucket --dryrun > log.txt
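Note that ">" only captures standard output; if you also want error messages in the same log, redirect stderr as well (bash syntax shown, with mybucket as a placeholder):
# Capture both stdout and stderr in one log file
aws s3 sync . s3://mybucket > log.txt 2>&1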

Run HDFS command as EMR Step

How can I issue an hdfs command as a step in an EMR cluster? Adding the step as a script_runner.jar task seems to fail oddly.
Use command-runner.jar and call out to bash to issue your hdfs command.
aws emr add-steps --cluster-id j-XXXXXXXXX --steps Name="Command Runner",Jar="command-runner.jar",Args=[/bin/bash,-c,"\"hdfs dfs -mkdir /tmp/foo\""]
Note that the final argument is passed to bash as a single escaped, quoted string.
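To confirm that the step actually ran, you can describe it afterwards; the cluster and step IDs below are placeholders.
# Check the state of a submitted step (PENDING / RUNNING / COMPLETED / FAILED)
aws emr describe-step --cluster-id j-XXXXXXXXX --step-id s-XXXXXXXXX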

Download multiple files S3

I wonder if there is a way to download (via REST or an SDK) multiple files (a whole bucket or a folder) in ZIP format (or another compression format).
As Edward pointed out in the comments, there is no straightforward way to download these files from S3 in a compressed format.
I was solving a similar issue recently and ended up with a bash script that does the job for me:
#!/bin/bash
aws s3 sync s3://bucket1 /destination/bucket1
aws s3 sync s3://bucket2 /destination/bucket2
...
tar -zcvf bucket1.tar.gz /destination/bucket1
tar -zcvf bucket2.tar.gz /destination/bucket2
...
The workflow is as follows:
Install the AWS CLI
Set up the AWS credentials
Execute the script
Adjust the commands to get the compression format, destinations, etc. that you want.
Another way to download multiple files:
aws s3 cp s3://<your bucket name> . --recursive
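If you only need a subset of a bucket rather than everything, aws s3 cp --recursive also accepts --exclude/--include filters, so you can pull a hand-picked set of objects in one command (bucket and file names below are placeholders).
# Exclude everything, then whitelist only the objects you want to download
aws s3 cp s3://mybucket . --recursive --exclude "*" --include "file1.txt" --include "reports/*.csv"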