Trying to copy data from S3 to Redshift with a newline with in quotes
Example CSV file:
Line 1 --> ID,Description,flag
Line 2 --> "1111","this is a test", "FALSE"
Line 3 --> "2222","I hope someone
could help", "TRUE"
Line 4 --> "3333", "NA", "FALSE"
Sample Table:
TEST_TABLE:
ID VARCHAR(100)
DESCRIPTION VARCHAR(100)
FLAG VARCHAR(100)
If you notice in line 2 there is a linefeed and I get the error Delimited value missing end quote when using the COPY command.
This is the Copy command I use:
copy table_name
from sample.csv
credentials aws_access_key_id= blah; aws_secret_access_key=blah
DELIMITER ','
removequotes
trimblanks
ESCAPE ACCEPTINVCHARS
EMPTYASNULL
IGNOREHEADER 1
COMPUPDATE OFF;
commit;
I've also tried the CSV option, but get "Extra column(s) found ":
copy table_name
from sample.csv
credentials aws_access_key_id= blah; aws_secret_access_key=blah
CSV
IGNOREHEADER 1
COMPUPDATE OFF;
commit;
I would expect the description column in Line 2 to be loaded with the linefeed.
Since the field is delimited by quotes, use the CSV option.
Note: CSV cannot be used with FIXEDWIDTH, REMOVEQUOTES, or ESCAPE.
Related
I would like to prepare a manifest file using Lambda and then execute the stored procedure providing input parameter manifest_location.
Stored procedure signature:
CREATE OR REPLACE PROCEDURE stage.sp_stage_user_activity_page_events(manifest_location varchar(256))
and I would like to use this parameter as follows:
COPY stage.user_activity_event
FROM manifest_location
IAM_ROLE 'arn:aws:iam::XXX:role/redshift-s3-read-only-role'
IGNOREHEADER 1
REMOVEQUOTES
DELIMITER ','
LZOP
MANIFEST;
but Redshift is giving me ERROR:
syntax error at or near "$1" Where: SQL statement in PL/PgSQL function "sp_stage_user_activity_page_events" near line 21
How can I achieve this?
Could you try the command below?
EXECUTE 'COPY '|| parameter_for_table ||' FROM '||CHR(39)|| parameter_for_s3_path ||CHR(39)||' IAM_ROLE '||CHR(39)|| parameter_for_iam_role ||CHR(39)||' FORMAT AS PARQUET;';
Please, let me know if you have any more questions, because I created one stored_procedure, as below today:
CREATE TABLE table_for_error_log (message varchar);
CREATE OR REPLACE procedure sp_proc_name(parameter_for_table in VARCHAR, parameter_for_s3_path in VARCHAR, parameter_for_iam_role in VARCHAR)
AS
$$
BEGIN
EXECUTE 'COPY '|| parameter_for_table ||' FROM '||CHR(39)|| parameter_for_s3_path ||CHR(39)||' IAM_ROLE '||CHR(39)|| parameter_for_iam_role ||CHR(39)||' FORMAT AS PARQUET;';
EXCEPTION WHEN OTHERS THEN
RAISE INFO 'An exception occurred.';
INSERT INTO table_for_error_log VALUES ('Error message: ' || SQLERRM);
END;
$$ LANGUAGE plpgsql;
It's already tested and worked.
Just replace it with your own command.
i don't have enough reputation to upvote the above answer but it works!.
you can also use the quote_literal function eg:
DATEFORMAT as ' || quote_literal('auto') ||
'TIMEFORMAT as '|| quoteliteral('MM/DD/YYYY HH24:MI:SS') ||
etc..
i'm trying to copy gz files from my S3 directory to Snowflake.
i created a table in snowflake (notice that the 'extra' field is defined as 'Variant')
CREATE TABLE accesslog
(
loghash VARCHAR(32) NOT NULL,
logdatetime TIMESTAMP,
ip VARCHAR(15),
country VARCHAR(2),
querystring VARCHAR(2000),
version VARCHAR(15),
partner INTEGER,
name VARCHAR(100),
countervalue DOUBLE PRECISION,
username VARCHAR(50),
gamesessionid VARCHAR(36),
gameid INTEGER,
ingameid INTEGER,
machineuid VARCHAR(36),
extra variant,
ingame_window_name VARCHAR(2000),
extension_id VARCHAR(50)
);
i used this copy command in snowflake:
copy INTO accesslog
FROM s3://XXX
pattern='.*cds_201911.*'
CREDENTIALS = (
aws_key_id='XXX',
aws_secret_key='XXX')
FILE_FORMAT=(
error_on_column_count_mismatch=false
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
TYPE = CSV
COMPRESSION = GZIP
FIELD_DELIMITER = '\t'
)
ON_ERROR = CONTINUE
I run it, and got this result (i got many error lines, this is an example to one)
snowflake result
snowflake result -more
a17589e44ae66ffb0a12360beab5ac12 2019-11-01 00:08:39 155.4.208.0 SE 0.136.0 3337 game_process_detected 0 OW_287d4ea0-4892-4814-b2a8-3a5703ae68f3 e9464ba4c9374275991f15e5ed7add13 765 19f030d4-f85f-4b85-9f12-6db9360d7fcc [{"Name":"file","Value":"wowvoiceproxy.exe"},{"Name":"folder","Value":"C:\\Program Files (x86)\\World of Warcraft\\_retail_\\Utils\\WowVoiceProxy.exe"}]
can you please tell me what cause this error?
thanks!
I'm guessing;
The 'Error parsing JSON' is certainly related to the extra variant field.
The JSON looks fine, but there are potential problems with the backslashes \.
If you look at the successfully loaded lines, have the backslashes been removed?
This can (maybe) happen if you have STAGE settings involving escape characters.
The \\Utils substring in the Windows path value can then trigger a Unicode decode error, eg.
Error parsing JSON: hex digit is expected in \U???????? escape sequence, pos 123
UPDATE:
It turns out you have to turn off escape char processing by adding the following to the FILE_FORMAT:
ESCAPE_UNENCLOSED_FIELD = NONE
The alternative is to doublequote fields or to doubly escape backslash, eg. C:\\\\Program Files.
I am loading data to Redshift using Copy. The text file has NUL.
I have looked at several options and tried using options such as:
null as '\0' EMPTYASNULL ACCEPTINVCHARS TRIMBLANKS TRUNCATECOLUMNS escape
However, it still errors out.
Below sample records and the error message.
NUL is after Main St|
2278|2047|5|1|1|1|18 N Main St| |Bowman|1|39|16443|15811|58623|Y|544|2018-05-21 17:29:12.000||||
2491|2047|6|1|1|1|18 N Main| |Bowman|1|39|16443|15811|58623-9613|Y|920|2018-11-26 18:28:26.000||||
2491|2047|7|1|1|1|18 N Main| |Bowman|1|39|16443|15811|58623-9613|Y|920|2018-11-26 18:28:26.000||||
2408|2154|7|1|1|1|101 Main St| |Lakota|1|39|16469|15956|58344|Y|447|2018-08-17 08:10:54.000||||
copy table1 from 's3://....txt' iam_role xx delimiter '|' null as '\0' EMPTYASNULL ACCEPTINVCHARS TRIMBLANKS TRUNCATECOLUMNS escape;
Missing newline: Unexpected character 0x7d found at location nn
I am trying to load the data that includes a new line within a field:
001|myname|fav\
movie | myaddress| myphone|
There is a blank line between fav\movie.
I am loading the data with this command:
COPY catdemo
FROM 's3://tickit/catego.csv'
IAM_ROLE 'arn:aws:iam::<aws-account-id>:role/<role-name>'
REGION 'ap-south-1'
DELIMITER '|'
ESCAPE
ACCEPTINVCHARS
IGNOREBLANKLINES
NULL AS '\0';
I want to ignore this blank line, can anyone help me?
its showing delimiter not found between fav\ and movie, but its actually a single line.
fav\
movie
I was trying to load the data e.g. 49.9999 into numeric(9,4) column. How ever through copy command it is rounding up the values to 50.00.
Copy command sample:
COPY <table_name> (PRICE_BAND_CODE,PRICE_BAND_DESC,PROD_LEVEL1_CODE,PRICE_BAND_LOWER,PRICE_BAND_UPPER,PRICE_BAND_SEQ)
FROM '<s3 path>/PriceBandDIM.gz'
credentials 'aws_access_key_id=xxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxx'
delimiter '|'
IGNOREBLANKLINES EMPTYASNULL GZIP NULL AS '\000'
ROUNDEC BLANKSASNULL TRIMBLANKS REMOVEQUOTES
STATUPDATE ON IGNOREHEADER 0;
PRICE_BAND_LOWER and PRICE_BAND_UPPER are having data type as numeric(9,4) but while processing the data it is rounding up the data.
Please let me know how to handle this scenario.
The ROUNDEC parameter has to go. Rounds up numeric values when the scale of the input value is greater than the scale of the column. By default, COPY truncates values when necessary to fit the scale of the column.
COPY <table_name> (PRICE_BAND_CODE,PRICE_BAND_DESC,PROD_LEVEL1_CODE,PRICE_BAND_LOWER,PRICE_BAND_UPPER,PRICE_BAND_SEQ)
FROM '<s3 path>/PriceBandDIM.gz'
credentials 'aws_access_key_id=xxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxx'
delimiter '|'
IGNOREBLANKLINES EMPTYASNULL GZIP NULL AS '\000'
BLANKSASNULL TRIMBLANKS REMOVEQUOTES
STATUPDATE ON IGNOREHEADER 0;
If I’ve made a bad assumption please comment and I’ll refocus my answer.