I have some tables in my S3 bucket and I want to create a backup for them via the EMR command line.
Here is a breakdown of what I want to do:
copy S3 objects into a backup S3 location
create metadata for the backup table (use DDL from the original table but read data in from the backup S3 location)
validate row count between the main and the backup table
So far I have been able to write a script to copy objects from the main table's external location to the backup table's external location:
backup_table() {
    db=${1}
    table=${2}
    # Look up the table's S3 location in the Glue Data Catalog (--output text strips the JSON quotes)
    s3_location=$(aws glue get-table --database-name "${db}" --name "${table}" \
        --query "Table.StorageDescriptor.Location" --output text)
    # Copy every object under the table's location to the backup location
    aws s3 cp --recursive "${s3_location}" "${s3_location}_bkp"
}
backup_table "database" "table"
I can't figure out how to access the main table's DDL from the CLI (the equivalent of SHOW CREATE TABLE db.table in Athena).
Will greatly appreciate any help.
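For reference, one possible way to get the DDL from the CLI is to run SHOW CREATE TABLE through the Athena API and read the result back. This is only a sketch: it assumes ${db}/${table} are set as in the function above, that an s3://my-athena-results/ output location exists, and a production script would poll the query status instead of sleeping.
# Sketch: fetch the table DDL by running SHOW CREATE TABLE via Athena
query_id=$(aws athena start-query-execution \
    --query-string "SHOW CREATE TABLE ${db}.${table}" \
    --result-configuration "OutputLocation=s3://my-athena-results/" \
    --query "QueryExecutionId" --output text)
sleep 5   # crude wait; poll aws athena get-query-execution for real use
aws athena get-query-results --query-execution-id "${query_id}" \
    --query "ResultSet.Rows[].Data[].VarCharValue" --output text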
Related
I want to get all the keys from an AWS Redis DB exported to an S3 path as a text file so that I can analyze the keys that are getting created in the DB.
How can I export all the keys from redis-cli to an S3 path?
Can this command be used to write to an S3 path?
redis-cli -h 10.1.xx.xx -n 1 keys * >s3://bucket/path/filename.txt
After installing the AWS CLI, you can copy your local file to your S3 bucket from the command line.
# copy a single file to a specified bucket and key
aws s3 cp test.txt s3://mybucket/test2.txt
# upload a local file stream from standard input to a specified bucket and key
aws s3 cp - s3://mybucket/stream.txt
So, you can pipe your stdout to S3 like this (quoting the keys pattern so the shell doesn't expand it):
redis-cli -h 10.1.xx.xx -n 1 keys '*' | aws s3 cp - s3://bucket/path/filename.txt
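If the database is large, redis-cli's --scan option (which iterates with SCAN instead of blocking the server on KEYS) can be piped the same way; a sketch using the same host and bucket path:
# Same idea, but SCAN-based so the Redis server is not blocked by KEYS
redis-cli -h 10.1.xx.xx -n 1 --scan --pattern '*' | aws s3 cp - s3://bucket/path/filename.txt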
I have a script that runs daily and saves a CSV file with a daily timestamp. Below is the AWS CLI command I am using to move the file to an S3 bucket and then delete it from the source.
aws s3 mv /path/NW_test_Export_03_05_2020_14_37_24.csv s3://bucket-name/folder/ --acl public-read-write
I want to automate this daily move to the S3 bucket using a cron job. How do I make the file name variable in the AWS CLI command?
OK, by trial and error, the command below is working perfectly for me.
aws s3 mv /path-to-file/ s3://bucket-name/folder/ --exclude "*" --include "*.csv" --recursive
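If you ever need to move only today's file rather than every CSV in the folder, a small wrapper script can rebuild the date part of the name before calling aws s3 mv. This is just a sketch: the NW_test_Export_ prefix and the DD_MM_YYYY date format are assumptions taken from the example filename above.
#!/usr/bin/env bash
# Move only today's export; schedule from cron, e.g.:
#   0 18 * * * /path-to-file/move_export.sh
today=$(date +%d_%m_%Y)
aws s3 mv /path-to-file/ s3://bucket-name/folder/ \
    --exclude "*" --include "NW_test_Export_${today}_*.csv" \
    --recursive --acl public-read-write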
I am trying to get the AWS S3 API to list the objects that I have stored in my S3 buckets. I have successfully used the command below to pull some of the keys from my S3 buckets.
aws s3api list-objects --bucket my-bucket --query Contents[].[Key] --output text
The problem is that the output in my command prompt does not list the entire S3 bucket inventory. Is it possible to alter this command so that the output in my CLI lists the full inventory? If not, is there a way to target specific file names within the S3 bucket? For example, all the file names in my bucket are dates, so I would try to pull all the keys from the folder titled 3_15_20 Videos within the "my-bucket" bucket. Thanks in advance!
From list-objects — AWS CLI Command Reference:
list-objects is a paginated operation. Multiple API calls may be issued in order to retrieve the entire data set of results. You can disable pagination by providing the --no-paginate argument.
Therefore, try using --no-paginate and see whether it returns all objects.
If you are regularly listing a bucket that contains a huge number of objects, you could also consider using Amazon S3 Inventory, which can provide a daily CSV file listing the contents of a bucket.
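To target just the objects whose keys start with a particular name (for example the 3_15_20 Videos folder mentioned in the question), list-objects also accepts a --prefix filter; a sketch assuming that key prefix:
# List only the keys under the "3_15_20 Videos" prefix in the bucket
aws s3api list-objects --bucket my-bucket --prefix "3_15_20 Videos" \
    --query "Contents[].[Key]" --output text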
I'd like to transfer data from S3 to a table in Redshift (Postgres).
It is stored as a CSV in my bucket on S3.
I set up the AWS CLI, already ran aws configure and added my credentials, and gave myself an IAM user with access to the S3 bucket and the Redshift instance.
I ran the command aws copy tmp3 from s3://[mybucket]/[mycsv].csv
But I got back the error aws: error: argument command: Invalid choice, valid choices are ..., and copy is not in the list of valid commands that it offers.
copy is not a supported command; what you're looking for is cp. Have a look at the AWS CLI Documentation for S3.
What you're looking for is probably this:
aws s3 cp s3://[mybucket]/[mycsv].csv tmp3
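Note that aws s3 cp only copies the file to your local machine. If the end goal is to load the CSV into a Redshift table, that is normally done with Redshift's COPY SQL command run against the cluster, for example via psql. The sketch below assumes placeholder values for the cluster endpoint, database, user, and IAM role ARN:
# Sketch: load the CSV straight from S3 into the tmp3 table with Redshift COPY
psql "host=my-cluster.abc123.us-east-1.redshift.amazonaws.com port=5439 dbname=mydb user=myuser" \
    -c "COPY tmp3 FROM 's3://[mybucket]/[mycsv].csv' IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole' CSV IGNOREHEADER 1;"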
I have a technical question about migrating GCP resources (DAGs, Datasets, BigQuery tables) from one source GCP project into another target GCP project (BOTH PROJECTS ARE LOCATED IN DIFFERENT GCP ORGANIZATIONS).
I mean, I already know that we can migrate / transfer BigQuery Datasets and DAGs between regions and also between projects (within the same organization):
for example:
To migrate Datasets and DAGs between REGIONS:
Create a Cloud Composer Environment (in US region).
Create two Cloud Storage buckets; one located in the source region and another in the target region.
Create the BigQuery destination dataset (in EU).
Define the composer workflow (basically create a dummy task -> export BQ table to the bucket -> import table from the bucket).
Upload the DAGs and dependencies to the bucket.
Trigger the DAG manually.
To migrate Datasets and DAGs between PROJECTS:
Use bq command line tool to copy a table from one project to another.
You can have a look at the following sample command:
Source:
projectid: 123456789123
dataset: dataset1
table: table1
Destination:
projectid: 0987654321098
dataset: dataset2
table: table2
Command:
bq cp 123456789123:dataset1.table1 0987654321098:dataset2.table2
Via shell script (shell script + bq tool):
export SOURCE_DATASET=$1   # project1:dataset
export DEST_PREFIX=$2      # project2:dataset2.any_prefix_
for f in $(bq ls "$SOURCE_DATASET" | grep TABLE | awk '{print $1}')
do
    export CP_COMMAND="bq cp $SOURCE_DATASET.$f $DEST_PREFIX$f"
    echo $CP_COMMAND
    echo $($CP_COMMAND)
done
But how could I migrate those GCP resources BETWEEN ORGANIZATIONS? I mean, I have some Datasets and DAGs in a source organization which I need to copy / transfer to a target project created in a different GCP organization.
How could I do it?
Thanks a lot,
Regards,
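One possible route for the cross-organization case (not verified end to end): what matters is IAM rather than the organization boundary, so as long as the account running bq and gsutil has the appropriate BigQuery and Storage roles granted in both projects, the tables can be staged through a GCS bucket that both sides can read, and the DAG files can be copied between the Composer environments' buckets. A rough sketch with placeholder project IDs, dataset/table names, and bucket names:
# Export from the source project to a bucket both projects can access,
# then load into the target project. All names below are placeholders.
bq --project_id=source-project extract --destination_format=AVRO \
    dataset1.table1 "gs://shared-transfer-bucket/table1/part-*.avro"
bq --project_id=target-project load --source_format=AVRO \
    dataset2.table1 "gs://shared-transfer-bucket/table1/part-*.avro"
# DAG files live in each Composer environment's bucket and can be copied with gsutil
gsutil -m cp -r gs://source-composer-bucket/dags gs://target-composer-bucket/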