Use SQL Workbench to import a CSV file into an AWS Redshift database - amazon-web-services

I'm looking for both a manual and an automatic way to use SQL Workbench to import/load a LOCAL CSV file into an AWS Redshift database.
The manual way could be clicking through a menu and selecting an option.
The automatic way could be a query or command that loads the data when I run it.
Here's my attempt:
It fails with the error "my target table in AWS is not found", but I'm sure the table exists. Does anyone know why?
WbImport -type=text
-file ='C:\myfile.csv'
-delimiter = ,
-table = public.data_table_in_AWS
-quoteChar=^
-continueOnError=true
-multiLine=true

You can use WbImport in SQL Workbench/J to import data.
For more info: http://www.sql-workbench.net/manual/command-import.html
As mentioned in the comments, the COPY command provided by Redshift is the optimal solution. You can COPY from S3, EC2, etc.
S3 example:
copy <your_table>
from 's3://<bucket>/<file>'
access_key_id 'XXXX'
secret_access_key 'XXXX'
region '<your_region>'
delimiter '\t';
For more examples:
https://docs.aws.amazon.com/redshift/latest/dg/r_COPY_command_examples.html
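Since the question asks for something that can simply be run, here is a minimal Python sketch that assembles a COPY statement shaped like the S3 example above. It only builds the SQL text; the local CSV would still have to be uploaded to S3 first, and the statement executed over a normal Redshift connection. The function names and the bucket, key, and region values are hypothetical placeholders, not part of the original answer.

```python
def sql_literal(value):
    """Quote a Python string as a SQL string literal, doubling single quotes."""
    return "'" + str(value).replace("'", "''") + "'"


def build_copy_statement(table, s3_uri, access_key_id, secret_access_key,
                         region, delimiter=","):
    """Return a Redshift COPY statement shaped like the S3 example above."""
    return "\n".join([
        f"COPY {table}",
        f"FROM {sql_literal(s3_uri)}",
        f"ACCESS_KEY_ID {sql_literal(access_key_id)}",
        f"SECRET_ACCESS_KEY {sql_literal(secret_access_key)}",
        f"REGION {sql_literal(region)}",
        f"DELIMITER {sql_literal(delimiter)};",
    ])


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    print(build_copy_statement(
        table="public.data_table_in_AWS",
        s3_uri="s3://my-bucket/myfile.csv",
        access_key_id="XXXX",
        secret_access_key="XXXX",
        region="us-east-1",
    ))
```

The resulting text can be pasted into SQL Workbench/J or passed to any client that runs SQL against the cluster.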

Related

COPY command: AWS Aurora Postgresql, s3 extension

I'm trying to import a CSV file from S3 (the result of an UNLOAD command from Redshift) into an Amazon Aurora database, following "Using the aws_s3.table_import_from_s3 Function to Import Amazon S3 Data", but I don't know how to specify the quote character in the command.
SELECT aws_s3.table_import_from_s3(
'hr.person',
'',
'(FORMAT CSV,HEADER true,QUOTES ''"'')',
aws_commons.create_s3_uri('redshift-unload-tmp','resul_file.csv','us-east-2')
);
Thanks
I'll explain my use case in more detail:
Source
I used the UNLOAD command to "export" data from a table in Redshift. This is the command:
UNLOAD('SELECT * FROM schema.table')
TO 's3://bucket-name/prefix_'
HEADER
CSV
NULL AS '\000'
IAM_ROLE 'arn:aws:iam::accountNumber:role/aRoleWithRedshiftAndS3Permissions';
Target
I need to load the Redshift data (now a file in S3) into an RDS database (Aurora PostgreSQL). Before importing, I renamed the files in S3 to add the .csv extension. I used pgAdmin 4 as the PostgreSQL client and opened a query editor to execute the following commands:
Add the aws_s3 extension to the database:
CREATE EXTENSION aws_s3 CASCADE;
NOTICE: installing required extension "aws_commons"
Execute the function to import the file from S3:
select aws_s3.table_import_from_s3(
'schema.table_name',
'',
'(FORMAT CSV, HEADER true)',
aws_commons.create_s3_uri('sample_s3_bucket_name','source_file_name.csv','aws-region')
);
Note: With the CSV format, the default quote character is ", so you don't need to pass it as an option. You only need to specify it if your file uses a different quote character.
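For files that do use a different quote character, the options string follows PostgreSQL's COPY syntax, where the option is spelled QUOTE rather than QUOTES. The sketch below builds the full call in Python and takes care of doubling the single quotes inside the options literal; the helper names and the example bucket, key, and region are assumptions for illustration.

```python
def sql_literal(value):
    """Quote a Python string as a SQL string literal, doubling single quotes."""
    return "'" + str(value).replace("'", "''") + "'"


def build_s3_import(table, bucket, key, region, quote='"', header=True):
    """Return a SELECT that calls aws_s3.table_import_from_s3 with CSV options."""
    header_text = "true" if header else "false"
    # COPY-style options; QUOTE takes a single-character string literal.
    options = f"(FORMAT csv, HEADER {header_text}, QUOTE {sql_literal(quote)})"
    return "\n".join([
        "SELECT aws_s3.table_import_from_s3(",
        f"  {sql_literal(table)},",
        "  '',",
        f"  {sql_literal(options)},",
        "  aws_commons.create_s3_uri("
        f"{sql_literal(bucket)}, {sql_literal(key)}, {sql_literal(region)})",
        ");",
    ])


if __name__ == "__main__":
    # Placeholder names; '^' stands in for any non-default quote character.
    print(build_s3_import("schema.table_name", "sample_s3_bucket_name",
                          "source_file_name.csv", "us-east-2", quote="^"))
```

Building the statement this way avoids hand-counting the nested quotes, which is the easiest part of this call to get wrong.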

How to export data from table as CSV from Greenplum database to AWS s3 bucket

I have data in a table
select * from my_table
It contains 10k observations. How do I export the data in the table as CSV to an S3 bucket?
(I don't want to export the data to my local machine and then push it to S3.)
Please, please, please STOP labeling your questions with both PostgreSQL and Greenplum. The answer to your question is very different if you are using Greenplum versus PostgreSQL. I can't stress this enough.
If you are using Greenplum, you should use the S3 protocol in External Tables to read and write data to S3.
So your table:
select * from my_table;
And your external table (it must be WRITABLE so that you can INSERT into it):
CREATE WRITABLE EXTERNAL TABLE ext_my_table (LIKE my_table)
LOCATION ('s3://s3_endpoint/bucket_name')
FORMAT 'TEXT' (DELIMITER '|' NULL AS '' ESCAPE AS E'\\');
And then writing to your s3 bucket:
INSERT INTO ext_my_table SELECT * FROM my_table;
You will also need to configure your Greenplum cluster with an s3 configuration file. This file goes in every segment directory:
gpseg_data_dir/gpseg-prefixN/s3/s3.conf
Example of the file contents:
[default]
secret = "secret"
accessid = "user access id"
threadnum = 3
chunksize = 67108864
More information on S3 can be found here: http://gpdb.docs.pivotal.io/5100/admin_guide/external/g-s3-protocol.html#amazon-emr__s3_config_file
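Because the same file has to exist in every segment directory, it can help to generate it once and copy it out. Below is a small Python sketch that renders the file contents using only the keys shown in the example above; the function name is hypothetical, and distributing the result to the segments is left out.

```python
def render_s3_conf(access_id, secret, threadnum=3, chunksize=64 * 1024 * 1024):
    """Return the text of an s3.conf file with the keys from the example above."""
    lines = [
        "[default]",
        f'secret = "{secret}"',
        f'accessid = "{access_id}"',
        f"threadnum = {threadnum}",
        f"chunksize = {chunksize}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder credentials, matching the example file.
    print(render_s3_conf("user access id", "secret"), end="")
```

The default chunksize here is 64 MiB, which is the 67108864 used in the example file.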
I'd suggest first loading the data onto your master node using WinSCP or another file transfer tool.
Then move the file from the master node to S3 storage.
Moving data from the master node to S3 uses Amazon's bandwidth, so it will be much faster than transferring the file from a local machine to S3 over a local connection.

Run Redshift Queries Periodically

I have started researching Redshift. It is described as a "Database" service in AWS. From what I have learned so far, we can create tables and ingest data from S3 or from external sources like Hive into a Redshift database (cluster). We can also use a JDBC connection to query these tables.
My questions are:
Is there a place within the Redshift cluster where we can store our queries and run them periodically (e.g. daily)?
Can we store our query in an S3 location and use it to write output to another S3 location?
Can we load a DB2 table unload file with a mixture of binary and string fields into Redshift directly, or do we need an intermediate process to convert the data into something like a CSV?
I have done some Googling about this. If you have links to resources, that would be very helpful. Thank you.
I used a cursor from the psycopg2 library in Python. The sample code is given below. You have to set all the Redshift credentials in the env_vars file.
You run your queries with cursor.execute. Here I show one UPDATE query; put your own query in its place (you can run multiple queries). After that, schedule this Python file with crontab or any other scheduler to run your queries periodically.
import psycopg2
import env_vars  # local module holding the Redshift credentials

rs = env_vars.RedshiftVariables
conn_string = "dbname=%s port=%s user=%s password=%s host=%s" % (
    rs.REDSHIFT_DW, rs.REDSHIFT_PORT, rs.REDSHIFT_USERNAME,
    rs.REDSHIFT_PASSWORD, rs.REDSHIFT_HOST)
conn = psycopg2.connect(conn_string)
cursor = conn.cursor()
# Replace this with your own query, or add more cursor.execute calls.
cursor.execute("""UPDATE database.demo_table SET Device_id = '123' WHERE Device = 'IPHONE' OR Device = 'Apple';""")
conn.commit()
conn.close()
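To keep the queries outside the script, one option is to store them in a .sql file and run each statement in turn. The sketch below assumes a DB-API connection (psycopg2 for Redshift; the demo uses the standard library's sqlite3 only so it runs anywhere) and splits naively on semicolons, so it would break on a semicolon inside a string literal. The function name and the demo table are hypothetical.

```python
import sqlite3


def run_sql_script(conn, script_text):
    """Run each semicolon-terminated statement in script_text, then commit.

    Returns the number of statements executed. The split is naive and does
    not handle semicolons inside string literals.
    """
    statements = [s.strip() for s in script_text.split(";") if s.strip()]
    cursor = conn.cursor()
    for statement in statements:
        cursor.execute(statement)
    conn.commit()
    return len(statements)


if __name__ == "__main__":
    # sqlite3 stands in for a Redshift connection in this demo.
    demo_conn = sqlite3.connect(":memory:")
    script = """
        CREATE TABLE demo_table (Device TEXT, Device_id TEXT);
        INSERT INTO demo_table VALUES ('IPHONE', NULL);
        UPDATE demo_table SET Device_id = '123'
        WHERE Device = 'IPHONE' OR Device = 'Apple';
    """
    print(run_sql_script(demo_conn, script), "statements executed")
    demo_conn.close()
```

A crontab entry such as `0 6 * * * python /path/to/run_queries.py` (path hypothetical) would then run the file every day at 06:00.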

How to export SQL Output directly to CSV on Amazon RDS

Amazon doesn't give direct access to the RDS server (it is exposed only through the RDS service), so "select into outfile" doesn't work.
Even the master user does not have the FILE privilege.
I opened a ticket with Amazon and talked with them at length. They suggested a few workarounds, such as using Data Pipeline, but all of them are too complicated.
One option is to use a tool like MySQL Workbench: execute the query, then export to CSV. The only problem with this approach is that the same query has to be executed twice on the server, which is problematic if the output has thousands of rows.
Just write the query in a file, a.sql. The SQL should be in this format:
select concat( '"',Product_id,'","', Subcategory,'","', ifnull(Product_type,''),'","', ifnull(End_Date,''), '"') as data from tablename
Then run it with the mysql client and redirect the output to a file:
mysql -h xyz.abc7zdltfa3r.ap-southeast-1.rds.amazonaws.com -u query -pxyz < a.sql > deepak.csv
The output will be in the file deepak.csv.
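As an alternative to building the quoted line with concat, the quoting can be left to a CSV writer on the client side. This is a different technique from the one above: a Python sketch that runs the query once over a DB-API connection and writes every row with the standard csv module, turning NULLs into empty strings as the ifnull calls do. A MySQL driver would supply the connection in practice (an assumption); the demo uses sqlite3 only so that it runs without a server, and the function name is hypothetical.

```python
import csv
import sqlite3
import sys


def export_query_to_csv(conn, query, out_file, batch_size=1000):
    """Run query once and write the header and all rows as fully quoted CSV.

    NULL values are written as empty strings. Returns the number of data rows.
    """
    cursor = conn.cursor()
    cursor.execute(query)
    writer = csv.writer(out_file, quoting=csv.QUOTE_ALL)
    writer.writerow([column[0] for column in cursor.description])
    count = 0
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        for row in rows:
            writer.writerow(["" if value is None else value for value in row])
        count += len(rows)
    return count


if __name__ == "__main__":
    # sqlite3 stands in for the RDS MySQL connection in this demo.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE tablename (Product_id INTEGER, Subcategory TEXT, "
                 "Product_type TEXT, End_Date TEXT)")
    demo.execute("INSERT INTO tablename VALUES (1, 'Toys', NULL, '2020-01-01')")
    export_query_to_csv(demo, "SELECT * FROM tablename", sys.stdout)
    demo.close()
```

Fetching in batches keeps memory use flat even when the result has many thousands of rows.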

CSV file in amazon s3 to amazon SQL Server rds

Is there a sample showing how to copy data from a CSV file in Amazon S3 into a Microsoft SQL Server instance on Amazon RDS?
The documentation only mentions importing data from a local database into RDS.
One approach: spin up an EC2 instance, copy the S3 CSV files onto it, and then run a BULK INSERT command from there. Example:
BULK INSERT SchoolsTemp
FROM 'Schools.csv'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR = ',', --CSV field delimiter
ROWTERMINATOR = '\n', --Use to shift the control to next row
TABLOCK
)
All of this can be stitched together in AWS Data Pipeline.
It looks like AWS has since set up SQL Server RDS integration with S3. I found an AWS docs article that explains it in good detail.
Once the proper credentials are set up, there are specific stored procedures to download files to (and upload or delete files from) a D:\S3 directory. I haven't personally done this, but I thought I would share it since the comment on the other post mentions that BULK INSERT isn't supported. This would provide a way for BULK INSERT to work with a file from S3.
Copy the file to the RDS instance:
exec msdb.dbo.rds_download_from_s3
@s3_arn_of_file='arn:aws:s3:::bucket_name/bulk_data.csv',
@rds_file_path='D:\S3\seed_data\data.csv',
@overwrite_file=1;
Then run the BULK INSERT:
BULK INSERT MyData
FROM 'D:\S3\seed_data\data.csv'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)
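The two steps above can be scripted by generating both statements from the same file path, so the download target and the BULK INSERT source cannot drift apart. This Python sketch only builds the T-SQL text, with the procedure parameters written using T-SQL's @ prefix; running it against the instance is left to whichever SQL Server client you use. The function names are hypothetical.

```python
def tsql_literal(value):
    """Quote a Python string as a T-SQL string literal, doubling single quotes."""
    return "'" + str(value).replace("'", "''") + "'"


def build_s3_bulk_load(table, s3_arn, rds_path, first_row=2):
    """Return (download_sql, bulk_insert_sql) for loading an S3 CSV into table."""
    download = "\n".join([
        "exec msdb.dbo.rds_download_from_s3",
        f"    @s3_arn_of_file={tsql_literal(s3_arn)},",
        f"    @rds_file_path={tsql_literal(rds_path)},",
        "    @overwrite_file=1;",
    ])
    bulk_insert = "\n".join([
        f"BULK INSERT {table}",
        f"FROM {tsql_literal(rds_path)}",
        "WITH",
        "(",
        f"    FIRSTROW = {int(first_row)},",
        "    FIELDTERMINATOR = ',',",
        "    ROWTERMINATOR = '\\n'",  # emits the two characters \n for T-SQL
        ");",
    ])
    return download, bulk_insert


if __name__ == "__main__":
    download_sql, insert_sql = build_s3_bulk_load(
        "MyData",
        "arn:aws:s3:::bucket_name/bulk_data.csv",
        "D:\\S3\\seed_data\\data.csv",
    )
    print(download_sql)
    print(insert_sql)
```

The BULK INSERT should only be run once the download has finished, since the file must already exist under D:\S3.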