Is there any sample showing how to copy data from a CSV file in Amazon S3 into a Microsoft SQL Server Amazon RDS instance?
The documentation only mentions importing data from a local database into RDS.
The approach would be: spin up an EC2 instance, copy the CSV files from S3 onto it, and then use the BULK INSERT command from there. Example:
BULK INSERT SchoolsTemp
FROM 'Schools.csv'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR = ',', --CSV field delimiter
ROWTERMINATOR = '\n', --Use to shift the control to next row
TABLOCK
)
All this can be stitched together in AWS Data Pipeline.
It looks like AWS has set up SQL Server RDS integration with S3. I found this AWS docs article, which explains it in good detail.
After you've set up the proper credentials, it appears they added specific stored procedures to download files to (and upload to or delete from) a D:\S3 directory. I haven't personally done this, but I thought I would share it since a comment on the other post mentions that BULK INSERT isn't supported. This would provide a way for BULK INSERT to work using a file from S3.
Copy the file to the RDS instance:
exec msdb.dbo.rds_download_from_s3
@s3_arn_of_file='arn:aws:s3:::bucket_name/bulk_data.csv',
@rds_file_path='D:\S3\seed_data\data.csv',
@overwrite_file=1;
Then run the BULK INSERT:
BULK INSERT MyData
FROM 'D:\S3\seed_data\data.csv'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)
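For completeness, the article also mentions an upload procedure for going the other direction. A minimal sketch, assuming its parameters mirror the download procedure (I haven't run this):
-- Copy a file from the RDS instance back to S3 (parameter names assumed to mirror rds_download_from_s3)
exec msdb.dbo.rds_upload_to_s3
@rds_file_path='D:\S3\seed_data\data.csv',
@s3_arn_of_file='arn:aws:s3:::bucket_name/bulk_data.csv',
@overwrite_file=1;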
How to insert data into Snowflake using an S3 stage integration
Create a storage integration between AWS and Snowflake.
Create an external stage between S3 and Snowflake (see the sketch of both steps below), then copy and merge:
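A minimal sketch of those two steps, assuming placeholder names for the integration, role ARN, bucket, and stage (the file format matches the parquet COPY that follows):
-- 1. Storage integration between AWS and Snowflake (placeholder role ARN and bucket)
CREATE STORAGE INTEGRATION s3_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::<account_id>:role/<snowflake_role>'
  STORAGE_ALLOWED_LOCATIONS = ('s3://<bucket>/<path>/');

-- 2. External stage that reads through the integration (placeholder names)
CREATE STAGE schema_name.Stagename
  STORAGE_INTEGRATION = s3_int
  URL = 's3://<bucket>/<path>/'
  FILE_FORMAT = (TYPE = PARQUET);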
USE DATABASE Database_name;
USE SCHEMA schema_name;
truncate table table_name_tmp;
COPY INTO table_name_tmp FROM
(
SELECT $1:emp_id::INTEGER,$1:col2::DATE,$1:col3::string
FROM @Stagename.schema_name.{0}/{1}
)
on_error = 'continue'
file_format = (type = parquet, null_if = ('NULL'), trim_space = true);
MERGE INTO table_name dst USING table_name_tmp srs
ON (dst.emp_id=srs.emp_id)
WHEN MATCHED THEN
UPDATE SET
dst.emp_id=srs.emp_id,dst.col2=srs.col2,dst.col3=srs.col3
WHEN NOT MATCHED THEN INSERT (emp_id,col2,col3 )
values
(srs.emp_id,srs.col2,srs.col3
);
truncate table table_name_tmp;
I think the question could be worded better: an S3 storage integration is a method for connecting Snowflake to an external stage; you don't extract from an integration, you still COPY into Snowflake from the external stage. The alternative method is to use secrets and keys, although Snowflake recommends using a storage integration, because it is a one-time activity and means you don't have to mess about with access keys.
S3, by the way, is AWS's blob store. A step-by-step guide is in the docs: https://docs.snowflake.com/en/user-guide/data-load-s3-config-storage-integration.html
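In other words, once the integration-backed stage exists, only the stage appears in the COPY statement; a minimal CSV sketch with hypothetical table and stage names:
-- The stage (not the integration) is what COPY reads from
COPY INTO my_table
FROM @my_s3_stage
FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);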
I'm looking for both a manual and an automatic way to use SQL Workbench to import/load a LOCAL CSV file into an AWS Redshift database.
The manual way could be clicking through a navigation bar and selecting an option.
The automatic way could be some query code that loads the data, which I can just run.
Here's my attempt:
There's an error, "my target table in AWS is not found", but I'm sure the table exists. Does anyone know why?
WbImport -type=text
-file ='C:\myfile.csv'
-delimiter = ,
-table = public.data_table_in_AWS
-quoteChar=^
-continueOnError=true
-multiLine=true
You can use WbImport in SQL Workbench/J to import the data.
For more info: http://www.sql-workbench.net/manual/command-import.html
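For reference, a minimal WbImport sketch for the same file and table, with the parameter assignments written without spaces and assuming the CSV has a header row:
WbImport -type=text
         -file='C:\myfile.csv'
         -delimiter=','
         -table=public.data_table_in_AWS
         -header=true
         -quoteChar=^
         -continueOnError=true
         -multiLine=true;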
As mentioned in the comments, the COPY command provided by Redshift is the optimal solution. You can COPY from S3, EC2, etc.
S3 Example:
copy <your_table>
from 's3://<bucket>/<file>'
access_key_id 'XXXX'
secret_access_key 'XXXX'
region '<your_region>'
delimiter '\t';
For more examples:
https://docs.aws.amazon.com/redshift/latest/dg/r_COPY_command_examples.html
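If the file is a comma-delimited CSV with a header row and you prefer an IAM role over access keys, a variant along these lines should work (the role ARN and names are placeholders):
copy <your_table>
from 's3://<bucket>/<file>'
iam_role 'arn:aws:iam::<account_id>:role/<redshift_role>'
region '<your_region>'
csv
ignoreheader 1;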
I have data in a table
select * from my_table
It contains 10k observations. How do I export the data in the table as CSV to an S3 bucket?
(I don't want to export the data to my local machine and then push it to S3.)
Please, please, please STOP labeling your questions with both PostgreSQL and Greenplum. The answer to your question is very different if you are using Greenplum versus PostgreSQL. I can't stress this enough.
If you are using Greenplum, you should use the S3 protocol in external tables to read and write data to S3.
So your table:
select * from my_table;
And your writable external table:
CREATE WRITABLE EXTERNAL TABLE ext_my_table (LIKE my_table)
LOCATION ('s3://s3_endpoint/bucket_name')
FORMAT 'TEXT' (DELIMITER '|' NULL AS '' ESCAPE AS E'\\');
And then writing to your s3 bucket:
INSERT INTO ext_my_table SELECT * FROM my_table;
You will need to do some configuration on your Greenplum cluster so that you have an S3 configuration file as well. It goes in every segment directory:
gpseg_data_dir/gpseg-prefixN/s3/s3.conf
Example of the file contents:
[default]
secret = "secret"
accessid = "user access id"
threadnum = 3
chunksize = 67108864
More information on S3 can be found here: http://gpdb.docs.pivotal.io/5100/admin_guide/external/g-s3-protocol.html#amazon-emr__s3_config_file
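Reading the exported data back from S3 works the same way in the other direction with a readable external table; a minimal sketch, assuming the same bucket and a hypothetical table name:
-- Readable external table over the same S3 location (hypothetical name)
CREATE EXTERNAL TABLE ext_my_table_read (LIKE my_table)
LOCATION ('s3://s3_endpoint/bucket_name')
FORMAT 'TEXT' (DELIMITER '|' NULL AS '' ESCAPE AS E'\\');

-- Pull the S3 data into a regular table
INSERT INTO my_table SELECT * FROM ext_my_table_read;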
I'd suggest first loading the data onto your master node using WinSCP or another file-transfer tool.
Then move the file from your master node to S3 storage.
Moving data from the master node to S3 uses Amazon's bandwidth, which is much faster than the local connection bandwidth used to transfer the file from your local machine to S3.
I have an Amazon Elastic MapReduce (EMR) job that I would like to use to process data unloaded from an Amazon Aurora MySQL table, much the same way I do from Amazon Redshift. That is, run a query such as:
unload ('select * from whatever where week = \'2011/11/21\'') to 's3://somebucket' credentials 'blah'
Then, the EMR job processes lines from the dumped data and writes back to S3.
Is this possible? How?
This feature now appears to be supported. The command is called SELECT INTO OUTFILE S3.
After this answer was originally written (the answer at that time was "no"), Aurora added this capability.
You can now use the SELECT INTO OUTFILE S3 SQL statement to query data from an Amazon Aurora database cluster and save it directly into text files in an Amazon S3 bucket. This means you no longer need the two-step process of bringing the data to the SQL client and then copying it from the client to Amazon S3. It’s an easy way to export data selectively to Amazon Redshift or any other application.
https://aws.amazon.com/about-aws/whats-new/2017/06/amazon-aurora-can-export-data-into-amazon-s3/
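Applied to the query in the question, a minimal sketch might look like the following (the bucket and prefix are placeholders, the cluster needs an IAM role that allows writing to the bucket, and depending on the Aurora version the URI may need the region-qualified s3-region:// form shown further down):
-- Export the week's rows straight from Aurora MySQL to S3 (placeholder bucket/prefix)
SELECT * FROM whatever
WHERE week = '2011/11/21'
INTO OUTFILE S3 's3://somebucket/whatever-2011-11-21'
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';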
Aurora for MySQL doesn't support this.
As you know, on conventional servers, MySQL has two complementary capabilities, LOAD DATA INFILE and SELECT INTO OUTFILE, which work with local (to the server) files. In late 2016, Aurora announced an S3 analog to LOAD DATA INFILE -- LOAD DATA FROM S3 -- but there is not, at least as of yet, the opposite capability.
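For the loading direction, a minimal LOAD DATA FROM S3 sketch (the bucket, file, and table names are placeholders, and the cluster again needs an IAM role that can read the bucket):
-- Load a CSV from S3 directly into an Aurora MySQL table (placeholder names)
LOAD DATA FROM S3 's3://somebucket/bulk_data.csv'
INTO TABLE my_table
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES;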
You can use the SELECT INTO OUTFILE S3 statement to query data from an Amazon Aurora MySQL DB cluster and save it directly into text files stored in an Amazon S3 bucket. This feature was added a long time ago.
Example:
SELECT * FROM employees INTO OUTFILE S3 's3-us-west-2://aurora-select-into-s3-pdx/sample_employee_data'
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';
And here are all the options that are supported:
SELECT
[ALL | DISTINCT | DISTINCTROW ]
[HIGH_PRIORITY]
[STRAIGHT_JOIN]
[SQL_SMALL_RESULT] [SQL_BIG_RESULT] [SQL_BUFFER_RESULT]
[SQL_CACHE | SQL_NO_CACHE] [SQL_CALC_FOUND_ROWS]
select_expr [, select_expr ...]
[FROM table_references
[PARTITION partition_list]
[WHERE where_condition]
[GROUP BY {col_name | expr | position}
[ASC | DESC], ... [WITH ROLLUP]]
[HAVING where_condition]
[ORDER BY {col_name | expr | position}
[ASC | DESC], ...]
[LIMIT {[offset,] row_count | row_count OFFSET offset}]
[PROCEDURE procedure_name(argument_list)]
INTO OUTFILE S3 's3_uri'
[CHARACTER SET charset_name]
[export_options]
[MANIFEST {ON | OFF}]
[OVERWRITE {ON | OFF}]
export_options:
[{FIELDS | COLUMNS}
[TERMINATED BY 'string']
[[OPTIONALLY] ENCLOSED BY 'char']
[ESCAPED BY 'char']
]
[LINES
[STARTING BY 'string']
[TERMINATED BY 'string']
]
You can find this in the AWS Documentation here: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/AuroraMySQL.Integrating.SaveIntoS3.html
Amazon doesn't give access to the RDS server directly (they expose it only through the RDS service), hence "select into outfile" doesn't work.
Even the master user does not have the FILE privilege.
I created a ticket with Amazon and talked at length with them. They suggested a few workarounds, like using Data Pipeline, but all of them are too complicated.
Surely one way could be to use a tool like MySQL Workbench: execute the query, then export to CSV. The only problem with this approach is that you need to execute the same query twice on the server, which is problematic if your output has thousands of rows.
Just write the query in a file a.sql. The SQL should be in this format:
select concat( '"',Product_id,'","', Subcategory,'","', ifnull(Product_type,''),'","', ifnull(End_Date,''), '"') as data from tablename
Then run:
mysql -h xyz.abc7zdltfa3r.ap-southeast-1.rds.amazonaws.com -u query -pxyz < a.sql > deepak.csv
The output will be in the file deepak.csv.