Possible to parametrize Redshift COPY command?

Below is my stored procedure, where I am trying to parametrize the COPY command in Redshift:
CREATE OR REPLACE PROCEDURE myproc (accountid varchar(50),rolename varchar(50)) LANGUAGE PLPGSQL AS $$ BEGIN
/*copy csv data from s3 into the table*/
COPY mydb.my_table
FROM 's3://extracts/raw/data.csv'
credentials
'aws_iam_role=arn:aws-us-gov:iam::<accountid>:role/<rolename>;'
IGNOREHEADER 1 CSV
FILLRECORD;
commit;
However, the accountid and rolename parameters are not being passed through to the COPY command. Is it possible to do something like this? Any ideas on what I am missing?

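Parameters are not substituted inside a string literal in a plain COPY statement, so the command has to be built as a string and run with EXECUTE. I followed an example that does this, and it worked correctly:
CREATE OR REPLACE PROCEDURE myproc (table_name IN VARCHAR, s3_path IN VARCHAR, iam_role IN VARCHAR) LANGUAGE plpgsql AS $$
BEGIN
/* copy CSV data from S3 into the table; CHR(39) is a single quote */
EXECUTE 'COPY ' || table_name || ' FROM ' || CHR(39) || s3_path || CHR(39) || ' IAM_ROLE ' || CHR(39) || iam_role || CHR(39) || ' IGNOREHEADER 1 CSV FILLRECORD;';
COMMIT;
END;
$$;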
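For the original two-parameter signature from the question, the same EXECUTE approach might look like the sketch below. It is a minimal, untested sketch: role_arn is a local variable introduced here for illustration, the account id and role name in the CALL are made-up placeholders, and QUOTE_LITERAL is used in place of CHR(39) for quoting.

```sql
CREATE OR REPLACE PROCEDURE myproc (accountid VARCHAR(50), rolename VARCHAR(50))
LANGUAGE plpgsql AS $$
DECLARE
    role_arn VARCHAR(200);  -- hypothetical helper variable, not in the question
BEGIN
    -- Assemble the role ARN from the two parameters.
    role_arn := 'arn:aws-us-gov:iam::' || accountid || ':role/' || rolename;

    -- Build the COPY as a string so the parameter values end up in the statement.
    -- Inside a string literal, '' is an escaped single quote.
    EXECUTE 'COPY mydb.my_table FROM ''s3://extracts/raw/data.csv'' IAM_ROLE '
         || quote_literal(role_arn)
         || ' IGNOREHEADER 1 CSV FILLRECORD';
    COMMIT;
END;
$$;

-- Placeholder values, for illustration only.
CALL myproc('123456789012', 'my-redshift-role');
```

Passing only the account id and role name keeps the fixed parts of the ARN out of the caller's hands, at the cost of hard-coding the table and S3 path in the procedure.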

Related

Creating a stored procedure in AWS Redshift

I am writing a stored procedure that creates a table in Redshift, but I am not able to create the table with the current date concatenated to its name. Could anyone please help?
Here is my code:
create or replace procedure stud.studentlists()
as $$
declare get_date date;
begin
execute 'CREATE TEMP TABLE stud.students_list_||"get_date"
as
select * from stud.students_list;
'
end
$$ LANGUAGE plpgsql;
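One way this could be approached is to build the table name first and then run the CREATE TABLE through EXECUTE. The sketch below makes a few assumptions that are not in the question: the date suffix is formatted as YYYYMMDD, the variable names suffix and ddl are illustrative, and TEMP is dropped because a temporary table cannot be created in a named schema such as stud.

```sql
CREATE OR REPLACE PROCEDURE stud.studentlists()
AS $$
DECLARE
    suffix VARCHAR(8);     -- hypothetical: current date as YYYYMMDD
    ddl    VARCHAR(1000);  -- hypothetical: the statement to execute
BEGIN
    suffix := to_char(current_date, 'YYYYMMDD');

    -- Concatenate the date outside the string literal, then quote the identifier.
    ddl := 'CREATE TABLE stud.' || quote_ident('students_list_' || suffix)
        || ' AS SELECT * FROM stud.students_list';

    EXECUTE ddl;
END;
$$ LANGUAGE plpgsql;

CALL stud.studentlists();
```

In the code from the question, the variable sits inside the quoted string, so it is treated as literal text rather than as the value of get_date, and get_date is never assigned.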

AWS Redshift dynamically select column name from RECORD

I have created the procedure below in AWS Redshift.
In query2 (at ????) I want to select the column from rec based on the value provided in the field input variable.
For example, if field = 'Fname', then query2 should insert rec.Fname.
Please let me know how to select column names dynamically from a RECORD in an open cursor.
CREATE OR REPLACE PROCEDURE test3(source_table varchar(100), target_table varchar(100), field varchar(100) )
LANGUAGE plpgsql
AS $$
declare
query1 text;
query2 text;
rec RECORD;
begin
query1 := 'SELECT id, ' || field ||', load_date, end_date FROM ' || source_table || ' ORDER BY id, load_date';
FOR rec IN execute query1
loop
query2 := 'insert into '|| target_table ||' values ('||quote_literal(rec.id)||', '||quote_literal(field)||','||????||','||quote_literal(rec.load_date)||')';
execute query2;
END LOOP;
RETURN;
END;
$$
;
It is early here, so let me just reference an answer I gave for a similar situation (inserting instead of selecting). This should get you started: How to join System tables or Information Schema tables with User defined tables in Redshift
The code looks like:
CREATE OR REPLACE procedure rewrite_data()
AS
$$
DECLARE
row record;
BEGIN
drop table if exists fred;
create table fred (schemaname varchar(256), tablename varchar(256), "column" varchar(256), "type" varchar(256));
for row in select "schemaname"::text, "tablename"::text, "column"::text, "type"::text from pg_table_def where "schemaname" <> 'pg_catalog' LOOP
INSERT INTO fred(schemaname,tablename,"column","type") VALUES (row.schemaname,row.tablename,row."column",row."type");
END LOOP;
END;
$$ LANGUAGE plpgsql;
call rewrite_data();
select * from fred;
Given that you have gotten this far on your stored procedure this should get you over the finish line.
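Applied to the test3 procedure from the question, one possible way to fill in the ???? is to alias the dynamic column to a fixed name in query1, so the record field can be referenced by that known name. This is a sketch under assumptions: the alias field_value is made up for illustration, the quote_literal calls simply mirror the question's code, and NULL values (which would make the whole concatenated string NULL) are not handled.

```sql
CREATE OR REPLACE PROCEDURE test3(source_table VARCHAR(100), target_table VARCHAR(100), field VARCHAR(100))
LANGUAGE plpgsql
AS $$
DECLARE
    query1 TEXT;
    query2 TEXT;
    rec RECORD;
BEGIN
    -- Alias the dynamic column so the record always exposes it as field_value.
    query1 := 'SELECT id, ' || quote_ident(field) || ' AS field_value, load_date, end_date FROM '
           || source_table || ' ORDER BY id, load_date';

    FOR rec IN EXECUTE query1
    LOOP
        -- rec.field_value holds whichever column was named in the field parameter.
        query2 := 'INSERT INTO ' || target_table || ' VALUES ('
               || quote_literal(rec.id) || ', '
               || quote_literal(field) || ', '
               || quote_literal(rec.field_value) || ', '
               || quote_literal(rec.load_date) || ')';
        EXECUTE query2;
    END LOOP;
END;
$$;
```

The record's field names are fixed by the query text, so giving the dynamic column a constant alias avoids having to look up a field by a name held in a variable.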

Save stored procedure result set in a table

I have created a stored procedure that returns the distinct table_schema names. My database is on Redshift and I am using SQL Workbench/J to create it.
As recommended by Amazon Redshift, we can use a cursor to return the result set. Below is my code.
CREATE OR REPLACE PROCEDURE get_dist_schema(rsout INOUT refcursor)
AS $$
BEGIN
OPEN rsout FOR SELECT DISTINCT table_schema FROM information_schema.tables;
END;
$$ LANGUAGE plpgsql;
BEGIN;
CALL get_dist_schema('sname');
FETCH ALL FROM sname;
commit;
Result from FETCH ALL FROM sname is as shown below:
table_schema
tableA
tableB
tableC
tableD
I want to save the result in a table so that a SELECT on that table returns the same result.
I have tried this:
BEGIN;
CALL get_dist_schema('sname');
FETCH ALL FROM sname INTO public.distTable
The error says:
Invalid operation: syntax error at or near "INTO"
Position: 22;
Is there any way I can save the result into a table?
Update: if I use SELECT INTO:
CREATE OR REPLACE PROCEDURE get_dist_schema(rsout INOUT refcursor)
AS $$
BEGIN
OPEN rsout FOR SELECT DISTINCT table_schema INTO public.distTable FROM information_schema.tables;
END;
$$ LANGUAGE plpgsql;
ERROR:
CALL get_dist_schema('sname')
[Amazon](500310) Invalid operation: Column "table_schema" has unsupported type "information_schema.sql_identifier".;
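One possible workaround, following the row-by-row pattern used in the rewrite_data answer above, is to loop over the query inside the procedure and insert each value into an ordinary table. This is an untested sketch under assumptions: the procedure name save_dist_schema and the VARCHAR(128) width are made up for illustration, and it assumes that casting table_schema to VARCHAR avoids the unsupported sql_identifier type.

```sql
CREATE OR REPLACE PROCEDURE save_dist_schema()
AS $$
DECLARE
    rec RECORD;
BEGIN
    DROP TABLE IF EXISTS public.disttable;
    CREATE TABLE public.disttable (table_schema VARCHAR(128));

    -- Cast the catalog column to a plain VARCHAR before it reaches the user table.
    FOR rec IN SELECT DISTINCT table_schema::VARCHAR(128) AS table_schema
               FROM information_schema.tables
    LOOP
        INSERT INTO public.disttable (table_schema) VALUES (rec.table_schema);
    END LOOP;
END;
$$ LANGUAGE plpgsql;

CALL save_dist_schema();
SELECT * FROM public.disttable;
```

This trades the cursor for a table that persists after the call, so no surrounding BEGIN and FETCH are needed to read the result.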

Unable to access Information_schema via stored procedure

I am writing a stored procedure that uses the information schema in AWS Redshift.
The following hold for everything below:
Same user
Same Redshift database (endpoint)
Stored procedure:
create or replace procedure dev.gp_information_schema_test
(tablename varchar(64))
as $$
declare
table_name varchar(64);
schema_name varchar(64);
counts int;
begin
table_name := split_part(tablename,'.',1);
schema_name:= split_part(tablename,'.',2);
raise info 'table_name - %,Schema_name - %',table_name,schema_name;
counts := (select count(*) from information_schema.tables where table_schema = schema_name);
raise info 'count is -%',counts;
end;
$$
language plpgsql
call dev.gp_information_schema_test('dev.abc');
Result :
Warnings:
table_name - dev,Schema_name - abc
count is -0
0 rows affected
call executed successfully
Execution time: 0.55s
But if I run the same query outside the stored procedure, then:
select count(*)
from information_schema.tables
where table_schema = 'dev'
Results:
I have already read the limitations of stored procedures in the AWS documentation, but there is no mention of any access restriction on system tables.
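You can't set the value of a variable from a query using :=. Instead you need to use the SELECT INTO variable form. https://docs.aws.amazon.com/redshift/latest/dg/c_PLpgSQL-structure.html
Also note that the split_part calls in your procedure are swapped: for 'dev.abc', part 1 is the schema (dev) and part 2 is the table (abc). That is why your output shows Schema_name - abc and a count of 0 for that schema.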
Try this SP:
CREATE OR REPLACE PROCEDURE gp_information_schema_test(tablename VARCHAR(64)) AS
$$
DECLARE
table_name VARCHAR(64);
schema_name VARCHAR(64);
counts INT;
BEGIN
schema_name := split_part(tablename, '.', 1);
table_name := split_part(tablename, '.', 2);
RAISE INFO 'table_name - % , Schema_name - %',table_name,schema_name;
SELECT INTO counts count(*) FROM information_schema.tables WHERE table_schema = schema_name;
RAISE INFO 'Tables in schema: %',counts;
END;
$$ LANGUAGE plpgsql;
Call:
CALL gp_information_schema_test('dev.abc');

How to copy specific columns from a csv into redshift table using lambda

I am trying to load a CSV file in S3 into Redshift using the COPY command from a Lambda function. The problem is that the CSV has more columns than the Redshift table,
so whenever I trigger the Lambda function I get the error "Extra columns found".
How can I load only specific columns from the CSV?
My CSV file has the form
year, month, description, category,SKU, sales(month)
and my Redshift table has the form
year month description category SKU
-----------------------------------
My COPY command is as follows:
COPY public.sales
FROM 's3://mybucket/sales.csv'
iam_role 'arn:aws:iam::99999999999:role/RedShiftRole'
delimiter ','
ignoreheader 1
acceptinvchars
You can specify the list of columns to import into your table - see COPY command documentation for more details.
COPY public.sales (year, month, description, category, SKU)
FROM 's3://mybucket/sales.csv'
iam_role 'arn:aws:iam::99999999999:role/RedShiftRole'
delimiter ','
ignoreheader 1
acceptinvchars
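If the column list alone still reports extra columns, a common fallback is to load the whole file into a staging table that has one column per CSV field and then insert only the wanted columns into the target. The sketch below is an assumption-laden illustration: the names sales_staging and sales_month and the VARCHAR(64) type for the extra field are made up, and the staging table borrows its first five columns from public.sales.

```sql
-- Staging table with the same columns as the target...
CREATE TEMP TABLE sales_staging (LIKE public.sales);

-- ...plus one trailing column for the extra CSV field (name and type are assumptions).
ALTER TABLE sales_staging ADD COLUMN sales_month VARCHAR(64);

-- Load all six CSV fields into the staging table.
COPY sales_staging
FROM 's3://mybucket/sales.csv'
IAM_ROLE 'arn:aws:iam::99999999999:role/RedShiftRole'
DELIMITER ','
IGNOREHEADER 1
ACCEPTINVCHARS;

-- Keep only the five columns the target table has.
INSERT INTO public.sales (year, month, description, category, sku)
SELECT year, month, description, category, sku
FROM sales_staging;
```

The staging step costs one extra insert, but it also works when the unwanted field is not the last one in the file.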