Redshift copy from S3 inside stored procedure

I would like to prepare a manifest file using Lambda and then execute the stored procedure, passing the manifest location as the input parameter manifest_location.
Stored procedure signature:
CREATE OR REPLACE PROCEDURE stage.sp_stage_user_activity_page_events(manifest_location varchar(256))
and I would like to use this parameter as follows:
COPY stage.user_activity_event
FROM manifest_location
IAM_ROLE 'arn:aws:iam::XXX:role/redshift-s3-read-only-role'
IGNOREHEADER 1
REMOVEQUOTES
DELIMITER ','
LZOP
MANIFEST;
but Redshift is giving me ERROR:
syntax error at or near "$1" Where: SQL statement in PL/PgSQL function "sp_stage_user_activity_page_events" near line 21
How can I achieve this?

Could you try the command below?
EXECUTE 'COPY '|| parameter_for_table ||' FROM '||CHR(39)|| parameter_for_s3_path ||CHR(39)||' IAM_ROLE '||CHR(39)|| parameter_for_iam_role ||CHR(39)||' FORMAT AS PARQUET;';
Please let me know if you have any more questions; I created a stored procedure like the one below today:
CREATE TABLE table_for_error_log (message varchar);
CREATE OR REPLACE procedure sp_proc_name(parameter_for_table in VARCHAR, parameter_for_s3_path in VARCHAR, parameter_for_iam_role in VARCHAR)
AS
$$
BEGIN
EXECUTE 'COPY '|| parameter_for_table ||' FROM '||CHR(39)|| parameter_for_s3_path ||CHR(39)||' IAM_ROLE '||CHR(39)|| parameter_for_iam_role ||CHR(39)||' FORMAT AS PARQUET;';
EXCEPTION WHEN OTHERS THEN
RAISE INFO 'An exception occurred.';
INSERT INTO table_for_error_log VALUES ('Error message: ' || SQLERRM);
END;
$$ LANGUAGE plpgsql;
It has already been tested and works.
Just replace the COPY statement with your own command.
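Applied to the manifest-based COPY from the question, a minimal sketch of the same dynamic-SQL approach could look like this (the IAM role ARN is the placeholder from the question; the manifest path in the CALL is a made-up example):
CREATE OR REPLACE PROCEDURE stage.sp_stage_user_activity_page_events(manifest_location varchar(256))
AS $$
BEGIN
  EXECUTE 'COPY stage.user_activity_event FROM '
    || CHR(39) || manifest_location || CHR(39)
    || ' IAM_ROLE ' || CHR(39) || 'arn:aws:iam::XXX:role/redshift-s3-read-only-role' || CHR(39)
    || ' IGNOREHEADER 1 REMOVEQUOTES DELIMITER '','' LZOP MANIFEST';
END;
$$ LANGUAGE plpgsql;
-- Example call from the Lambda-driven client (manifest path is a made-up placeholder):
CALL stage.sp_stage_user_activity_page_events('s3://my-bucket/manifests/page_events.manifest');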

I don't have enough reputation to upvote the above answer, but it works!
You can also use the quote_literal function, e.g.:
'DATEFORMAT as ' || quote_literal('auto') ||
' TIMEFORMAT as ' || quote_literal('MM/DD/YYYY HH24:MI:SS') ||
etc.
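As a brief sketch, the dynamic COPY from the accepted answer can be built with quote_literal instead of the CHR(39) concatenation (parameter names are the ones from that answer):
EXECUTE 'COPY ' || parameter_for_table
  || ' FROM ' || quote_literal(parameter_for_s3_path)
  || ' IAM_ROLE ' || quote_literal(parameter_for_iam_role)
  || ' FORMAT AS PARQUET;';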

Related

How to load data from CSV into an external table in impala

I am following this solution for loading an external table into Impala, since I get the same error if I load the data by referring to the file directly.
So, if I run:
[quickstart.cloudera:21000] > create external table Police2 (Priority string,Call_Type string,Jurisdiction string,Dispatch_Area string,Received_Date string,Received_Time int,Dispatch_Time int,Arrival_Time int,Cleared_Time int,Disposition string) row format delimited
> fields terminated by ','
> STORED as TEXTFILE
> location '/user/cloudera/rdpdata/rpd_data_all.csv' ;
I get:
Query: create external table Police2 (Priority string,Call_Type string,Jurisdiction string,Dispatch_Area string,Received_Date string,Received_Time int,Dispatch_Time int,Arrival_Time int,Cleared_Time int,Disposition string) row format delimited
fields terminated by ','
STORED as TEXTFILE
location '/user/cloudera/rdpdata/rpd_data_all.csv'
ERROR: ImpalaRuntimeException: Error making 'createTable' RPC to Hive Metastore:
CAUSED BY: MetaException: hdfs://quickstart.cloudera:8020/user/cloudera/rdpdata/rpd_data_all.csv is not a directory or unable to create one
and if I run the below, nothing gets imported:
[quickstart.cloudera:21000] > create external table Police2 (Priority string,Call_Type string,Jurisdiction string,Dispatch_Area string,Received_Date string,Received_Time int,Dispatch_Time int,Arrival_Time int,Cleared_Time int,Disposition string) row format delimited
> fields terminated by ','
> location '/user/cloudera/rdpdata' ;
Query: create external table Police2 (Priority string,Call_Type string,Jurisdiction string,Dispatch_Area string,Received_Date string,Received_Time int,Dispatch_Time int,Arrival_Time int,Cleared_Time int,Disposition string) row format delimited
fields terminated by ','
location '/user/cloudera/rdpdata'
Fetched 0 row(s) in 1.01s
and the content of the folder:
[cloudera@quickstart ~]$ hadoop fs -ls /user/cloudera/rdpdata
Found 1 items
-rwxrwxrwx 1 cloudera cloudera 75115191 2020-09-02 19:36 /user/cloudera/rdpdata/rpd_data_all.csv
and the content of the file:
[cloudera@quickstart ~]$ hadoop fs -cat /user/cloudera/rdpdata/rpd_data_all.csv
1,EMSP,RP,RC, 03/21/2013,095454,000000,000000,101659,CANC
and the screenshot of the Cloudera QuickStart VM.
The LOCATION option in the Impala CREATE TABLE statement specifies the hdfs_path, i.e. the HDFS directory where the data files are stored. Try giving the directory location instead of the file name; that should let you use the existing data.
For your reference: https://impala.apache.org/docs/build/html/topics/impala_tables.html
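A minimal sketch of the corrected statement, keeping the column list and delimiter from the question but pointing LOCATION at the directory rather than the file:
create external table Police2 (
  Priority string, Call_Type string, Jurisdiction string, Dispatch_Area string,
  Received_Date string, Received_Time int, Dispatch_Time int, Arrival_Time int,
  Cleared_Time int, Disposition string)
row format delimited
fields terminated by ','
stored as textfile
location '/user/cloudera/rdpdata';   -- the directory containing rpd_data_all.csv, not the file itself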

Snowflake - getting 'Error parsing JSON' while using the Copy command from S3 to snowflake

I'm trying to copy gz files from my S3 directory to Snowflake.
I created a table in Snowflake (notice that the 'extra' field is defined as VARIANT):
CREATE TABLE accesslog
(
loghash VARCHAR(32) NOT NULL,
logdatetime TIMESTAMP,
ip VARCHAR(15),
country VARCHAR(2),
querystring VARCHAR(2000),
version VARCHAR(15),
partner INTEGER,
name VARCHAR(100),
countervalue DOUBLE PRECISION,
username VARCHAR(50),
gamesessionid VARCHAR(36),
gameid INTEGER,
ingameid INTEGER,
machineuid VARCHAR(36),
extra variant,
ingame_window_name VARCHAR(2000),
extension_id VARCHAR(50)
);
I used this copy command in Snowflake:
copy INTO accesslog
FROM s3://XXX
pattern='.*cds_201911.*'
CREDENTIALS = (
aws_key_id='XXX',
aws_secret_key='XXX')
FILE_FORMAT=(
error_on_column_count_mismatch=false
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
TYPE = CSV
COMPRESSION = GZIP
FIELD_DELIMITER = '\t'
)
ON_ERROR = CONTINUE
I ran it and got this result (there are many error lines; this is an example of one):
(screenshots of the Snowflake results omitted)
a17589e44ae66ffb0a12360beab5ac12 2019-11-01 00:08:39 155.4.208.0 SE 0.136.0 3337 game_process_detected 0 OW_287d4ea0-4892-4814-b2a8-3a5703ae68f3 e9464ba4c9374275991f15e5ed7add13 765 19f030d4-f85f-4b85-9f12-6db9360d7fcc [{"Name":"file","Value":"wowvoiceproxy.exe"},{"Name":"folder","Value":"C:\\Program Files (x86)\\World of Warcraft\\_retail_\\Utils\\WowVoiceProxy.exe"}]
Can you please tell me what causes this error?
Thanks!
I'm guessing:
The 'Error parsing JSON' is certainly related to the extra VARIANT field.
The JSON looks fine, but there are potential problems with the backslashes \.
If you look at the successfully loaded lines, have the backslashes been removed?
This can (maybe) happen if you have STAGE settings involving escape characters.
The \\Utils substring in the Windows path value can then trigger a Unicode decode error, eg.
Error parsing JSON: hex digit is expected in \U???????? escape sequence, pos 123
UPDATE:
It turns out you have to turn off escape char processing by adding the following to the FILE_FORMAT:
ESCAPE_UNENCLOSED_FIELD = NONE
The alternative is to double-quote the fields or to double-escape the backslashes, e.g. C:\\\\Program Files.
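Applied to the COPY from the question, a sketch with the added option (bucket, pattern, and credentials are the placeholders from the question; the S3 URL is quoted here):
copy INTO accesslog
FROM 's3://XXX'
pattern='.*cds_201911.*'
CREDENTIALS = (
  aws_key_id='XXX',
  aws_secret_key='XXX')
FILE_FORMAT=(
  error_on_column_count_mismatch=false
  FIELD_OPTIONALLY_ENCLOSED_BY = '"'
  TYPE = CSV
  COMPRESSION = GZIP
  FIELD_DELIMITER = '\t'
  ESCAPE_UNENCLOSED_FIELD = NONE   -- turn off escape-character processing
)
ON_ERROR = CONTINUE;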

Redshift Copy with Newline Embedded in Quotes

Trying to copy data from S3 to Redshift with a newline within quotes.
Example CSV file:
Line 1 --> ID,Description,flag
Line 2 --> "1111","this is a test", "FALSE"
Line 3 --> "2222","I hope someone
could help", "TRUE"
Line 4 --> "3333", "NA", "FALSE"
Sample Table:
TEST_TABLE:
ID VARCHAR(100)
DESCRIPTION VARCHAR(100)
FLAG VARCHAR(100)
If you notice, in line 2 there is a linefeed, and I get the error "Delimited value missing end quote" when using the COPY command.
This is the Copy command I use:
copy table_name
from sample.csv
credentials aws_access_key_id= blah; aws_secret_access_key=blah
DELIMITER ','
removequotes
trimblanks
ESCAPE ACCEPTINVCHARS
EMPTYASNULL
IGNOREHEADER 1
COMPUPDATE OFF;
commit;
I've also tried the CSV option, but get "Extra column(s) found ":
copy table_name
from sample.csv
credentials aws_access_key_id= blah; aws_secret_access_key=blah
CSV
IGNOREHEADER 1
COMPUPDATE OFF;
commit;
I would expect the description column in Line 2 to be loaded with the linefeed.
Since the fields are delimited by quotes, use the CSV option.
Note: CSV cannot be used with FIXEDWIDTH, REMOVEQUOTES, or ESCAPE.
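A minimal sketch of the CSV-based COPY (the S3 path and credentials are placeholders mirroring the question; QUOTE defaults to a double quote but is shown explicitly):
copy table_name
from 's3://my-bucket/sample.csv'   -- placeholder path
credentials 'aws_access_key_id=blah;aws_secret_access_key=blah'
CSV QUOTE AS '"'
IGNOREHEADER 1
COMPUPDATE OFF;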

Amazon Redshift incorrectly rounding up Numeric(9,4) value

I was trying to load data such as 49.9999 into a numeric(9,4) column. However, through the COPY command it is rounding the values up to 50.00.
Copy command sample:
COPY <table_name> (PRICE_BAND_CODE,PRICE_BAND_DESC,PROD_LEVEL1_CODE,PRICE_BAND_LOWER,PRICE_BAND_UPPER,PRICE_BAND_SEQ)
FROM '<s3 path>/PriceBandDIM.gz'
credentials 'aws_access_key_id=xxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxx'
delimiter '|'
IGNOREBLANKLINES EMPTYASNULL GZIP NULL AS '\000'
ROUNDEC BLANKSASNULL TRIMBLANKS REMOVEQUOTES
STATUPDATE ON IGNOREHEADER 0;
PRICE_BAND_LOWER and PRICE_BAND_UPPER have data type numeric(9,4), but while processing the data COPY is rounding the values up.
Please let me know how to handle this scenario.
The ROUNDEC parameter has to go. ROUNDEC rounds up numeric values when the scale of the input value is greater than the scale of the column; by default, COPY truncates values when necessary to fit the scale of the column.
COPY <table_name> (PRICE_BAND_CODE,PRICE_BAND_DESC,PROD_LEVEL1_CODE,PRICE_BAND_LOWER,PRICE_BAND_UPPER,PRICE_BAND_SEQ)
FROM '<s3 path>/PriceBandDIM.gz'
credentials 'aws_access_key_id=xxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxx'
delimiter '|'
IGNOREBLANKLINES EMPTYASNULL GZIP NULL AS '\000'
BLANKSASNULL TRIMBLANKS REMOVEQUOTES
STATUPDATE ON IGNOREHEADER 0;
If I’ve made a bad assumption please comment and I’ll refocus my answer.

Registering a SAS library in Metadata - programmatically

I am writing a deployment script, and would like to programmatically register a simple (and empty) BASE library, such as the one below, in Metadata.
libname MYLIB 'C:\temp';
Sample XML syntax can be found here. I am just not sure how to combine that with proc metadata to perform the update (e.g. how do the metadata IDs get generated?).
@user2173800 Did you ever receive a solution to the question above?
Here is what I came up with:
The code below creates a SAS library called BASE_Metalib under the metadata
folder /Shared Data/Libraries/BASE_Metalib (this folder is assumed to already exist in metadata). The code also registers all tables under the Directory defined for this library. It uses metadata DATA step functions to interface with metadata.
/*Creating a Metadata Library with BASE Engine and register all the tables under it */
options metaserver="taasasf2"
metaport=8561
metauser="testuser"
metapass="test123"
metarepository="Foundation";
%Let MetaLibName=BASE_Metalib; /* Name of the SAS Library with BASE Engine to be created */
data _null_;
length luri uri muri $256;
rc=0;
Call missing(luri,uri,muri);
/* Create a SASLibrary object in the Shared Data folder. */
rc=metadata_newobj("SASLibrary",
luri,
"&MetaLibname.",
"Foundation",
"omsobj:Tree?#Name=%bquote('&Metalibname.')",
"Members");
put rc=;
put luri=;
/* Add PublicType,UsageVersion,Engine,Libref,IsDBMSLibname attribute values. */
rc=metadata_setattr(luri,
"PublicType",
"Library");
put rc=;
put luri=;
rc=metadata_setattr(luri,
"UsageVersion",
"1000000.0");
put rc=;
put luri=;
rc=metadata_setattr(luri,
"Engine",
"BASE");
put rc=;
put luri=;
rc=metadata_setattr(luri,
"Libref",
"SASTEST");
put rc=;
put luri=;
rc=metadata_setattr(luri,
"IsDBMSLibname",
"0");
put rc=;
put luri=;
/* Set Directory Object via UsingPackages Association for the SAS Library Object */
rc=metadata_newobj("Directory",
uri,
"");
put uri=;
rc=metadata_setassn(luri,
"UsingPackages",
"Replace",
uri);
put rc=;
rc=metadata_setattr(uri,"DirectoryName","/shrproj/files/ANA_AR2_UWCRQ/data");
put rc=;
/* Set Server Context Object via DeployedComponents Association for the SAS Library Object */
rc=metadata_getnobj("omsobj:ServerContext?#Name='SASApp'",1,muri);
put muri=;
rc=metadata_setassn(luri,
"DeployedComponents",
"Append",
muri);
put rc=;
Run;
proc metalib;
omr (library="&Metalibname.");
report;
run;
I finally got around to this - there are a few things to consider!
1) Making sure all the necessary objects exist (to avoid orphan metadata data)
2) Checking to ensure that objects are successfully created
3) Checking to avoid creating the library twice (idempotence)
4) General preference to avoid data step metadata functions and the corresponding risk of infinite loops
The XML part of the program looks like this:
/**
* Prepare the XML and create the library
*/
data _null_;
file &frefin;
treeuri=quote(symget('treeuri'));
serveruri=quote(symget('serveruri'));
directoryuri=quote(symget('directoryuri'));
libname=quote(symget('libname'));
libref=quote(symget('libref'));
IsPreassigned=quote(symget('IsPreassigned'));
prototypeuri=quote(symget('prototypeuri'));
/* escape description so it can be stored as XML */
libdesc=tranwrd(symget('libdesc'),'&','&amp;');
libdesc=tranwrd(libdesc,'<','&lt;');
libdesc=tranwrd(libdesc,'>','&gt;');
libdesc=tranwrd(libdesc,"'",'&apos;');
libdesc=tranwrd(libdesc,'"','&quot;');
libdesc=tranwrd(libdesc,'0A'x,'&#x0a;');
libdesc=tranwrd(libdesc,'0D'x,'&#x0d;');
libdesc=quote(trim(libdesc));
put "<AddMetadata><Reposid>$METAREPOSITORY</Reposid><Metadata> "/
'<SASLibrary Desc=' libdesc ' Engine="BASE" IsDBMSLibname="0" '/
' IsHidden="0" IsPreassigned=' IsPreassigned ' Libref=' libref /
' UsageVersion="1000000" PublicType="Library" name=' libname '>'/
' <DeployedComponents>'/
' <ServerContext ObjRef=' serveruri "/>"/
' </DeployedComponents>'/
' <PropertySets>'/
' <PropertySet Name="ModifiedByProductPropertySet" '/
' SetRole="ModifiedByProductPropertySet" UsageVersion="0" />'/
' </PropertySets>'/
" <Trees><Tree ObjRef=" treeuri "/></Trees>"/
' <UsingPackages> '/
' <Directory ObjRef=' directoryuri ' />'/
' </UsingPackages>'/
' <UsingPrototype>'/
' <Prototype ObjRef=' prototypeuri '/>'/
' </UsingPrototype>'/
'</SASLibrary></Metadata><NS>SAS</NS>'/
'<Flags>268435456</Flags></AddMetadata>';
run;
For full code, check out the github repo.