Amazon Redshift incorrectly rounding up Numeric(9,4) value

I am trying to load values such as 49.9999 into a numeric(9,4) column. However, the COPY command rounds the values up to 50.00.
Sample COPY command:
COPY <table_name> (PRICE_BAND_CODE,PRICE_BAND_DESC,PROD_LEVEL1_CODE,PRICE_BAND_LOWER,PRICE_BAND_UPPER,PRICE_BAND_SEQ)
FROM '<s3 path>/PriceBandDIM.gz'
credentials 'aws_access_key_id=xxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxx'
delimiter '|'
IGNOREBLANKLINES EMPTYASNULL GZIP NULL AS '\000'
ROUNDEC BLANKSASNULL TRIMBLANKS REMOVEQUOTES
STATUPDATE ON IGNOREHEADER 0;
PRICE_BAND_LOWER and PRICE_BAND_UPPER have the data type numeric(9,4), but the values are rounded up while the data is loaded.
How can I handle this scenario?

The ROUNDEC parameter has to go. According to the documentation, ROUNDEC rounds up numeric values when the scale of the input value is greater than the scale of the column. By default, COPY truncates values when necessary to fit the scale of the column. The command without ROUNDEC:
COPY <table_name> (PRICE_BAND_CODE,PRICE_BAND_DESC,PROD_LEVEL1_CODE,PRICE_BAND_LOWER,PRICE_BAND_UPPER,PRICE_BAND_SEQ)
FROM '<s3 path>/PriceBandDIM.gz'
credentials 'aws_access_key_id=xxxxxxxxxxxx;aws_secret_access_key=xxxxxxxxxxxxxxxx'
delimiter '|'
IGNOREBLANKLINES EMPTYASNULL GZIP NULL AS '\000'
BLANKSASNULL TRIMBLANKS REMOVEQUOTES
STATUPDATE ON IGNOREHEADER 0;
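To make the difference between the two behaviours concrete, here is a minimal Python sketch using the standard decimal module. It is not Redshift code: fit_to_scale is a hypothetical helper, and modelling ROUNDEC as round-half-up is an assumption made for illustration only.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP


def fit_to_scale(value: str, scale: int, roundec: bool) -> Decimal:
    """Fit a numeric string to a column scale (hypothetical helper)."""
    quantum = Decimal(1).scaleb(-scale)  # Decimal('0.0001') for scale 4
    # Assumption: ROUNDEC is modelled as round-half-up, the default as truncation.
    mode = ROUND_HALF_UP if roundec else ROUND_DOWN
    return Decimal(value).quantize(quantum, rounding=mode)


# An input with five decimal places going into a scale-4 column.
print(fit_to_scale("49.99995", 4, roundec=True))   # 50.0000
print(fit_to_scale("49.99995", 4, roundec=False))  # 49.9999
```

An input that already fits the scale, such as 49.9999, comes out unchanged in both modes of this sketch.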
If I’ve made a bad assumption please comment and I’ll refocus my answer.

Related

Change the delimiter in AWS Glue Pyspark

abv_data = glueContext.create_dynamic_frame_from_options("s3", \
{'paths': ["s3://{}/{}".format(bucket, prefix)], \
"recurse":True, 'groupFiles': 'inPartition'},"csv",{'withHeader':True},separator='\t')
abv_df_1 = abv_data.toDF()
abv_df_2 = abv_df_1.withColumn("save_date", lit(datetime.utcnow().strftime("%Y-%m-%d %H:%M:%S")))
conparms_r = glueContext.extract_jdbc_conf("reporting", catalog_id = None)
abv_df_2.write\
.format("com.databricks.spark.redshift")\
.option("url", "jdbc:redshift://rs_cluster:8192/rptg")\
.option("dbtable", redshift_schema_table_output)\
.option("user", conparms_r['user'])\
.option("password", conparms_r['password'])\
.option("aws_iam_role", "arn:aws:iam::123456789:role/redshift_admin_role")\
.option("tempdir", args["TempDir"])\
.option("extracopyoptions","DELIMITER '\t' IGNOREHEADER 1 DATEFORMAT AS 'YYYY-MM-DD'")\
.mode("append")\
.save()
The CSV has a tab delimiter on read, but when I add the column to the DataFrame it uses a comma delimiter, which causes the Redshift load to fail.
Is there a way to add the column with a tab delimiter OR change the delimiter on the entire data frame?
This isn't necessarily the best way to do this, but here is what I ended up doing:
Bring the CSV in with a ',' separator:
glueContext.create_dynamic_frame_from_options("s3", \
{'paths': ["s3://{}/{}".format(bucket, prefix)], \
"recurse":True, 'groupFiles': 'inPartition'},"csv",{'withHeader':True}, separator = ',')
Then split the first column on tab, put each split into its own column, and add the extra column at the same time.
Drop the first column, because it still holds the combined value.
This gives you a comma-separated DataFrame to load.
Use spark.read.option("delimiter", "\t").csv(file), or use sep instead of delimiter.
For a special character, use a double backslash: spark.read.option("delimiter", "\\t").csv(file)
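The two spellings above differ only in what Python hands over as the option value. The following plain-Python sketch (no Spark required) shows the characters each literal contains. The variable names are illustrative, and how Spark interprets each value is not demonstrated here.

```python
# "\t" is processed by Python into a single TAB character.
tab_char = "\t"
# "\\t" stays as two characters: a backslash followed by the letter t.
backslash_t = "\\t"
# A raw string is another way to write the two-character form.
raw_backslash_t = r"\t"


def code_points(text: str) -> list:
    """Return the Unicode code points of each character in text."""
    return [ord(ch) for ch in text]


print(code_points(tab_char))       # [9]
print(code_points(backslash_t))    # [92, 116]
print(backslash_t == raw_backslash_t)  # True
```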

Snowflake - getting 'Error parsing JSON' while using the Copy command from S3 to snowflake

I'm trying to copy gz files from my S3 directory to Snowflake.
I created a table in Snowflake (note that the 'extra' field is defined as VARIANT):
CREATE TABLE accesslog
(
loghash VARCHAR(32) NOT NULL,
logdatetime TIMESTAMP,
ip VARCHAR(15),
country VARCHAR(2),
querystring VARCHAR(2000),
version VARCHAR(15),
partner INTEGER,
name VARCHAR(100),
countervalue DOUBLE PRECISION,
username VARCHAR(50),
gamesessionid VARCHAR(36),
gameid INTEGER,
ingameid INTEGER,
machineuid VARCHAR(36),
extra variant,
ingame_window_name VARCHAR(2000),
extension_id VARCHAR(50)
);
I used this COPY command in Snowflake:
copy INTO accesslog
FROM s3://XXX
pattern='.*cds_201911.*'
CREDENTIALS = (
aws_key_id='XXX',
aws_secret_key='XXX')
FILE_FORMAT=(
error_on_column_count_mismatch=false
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
TYPE = CSV
COMPRESSION = GZIP
FIELD_DELIMITER = '\t'
)
ON_ERROR = CONTINUE
I ran it and got many error lines. This is an example of one of the failing rows:
a17589e44ae66ffb0a12360beab5ac12 2019-11-01 00:08:39 155.4.208.0 SE 0.136.0 3337 game_process_detected 0 OW_287d4ea0-4892-4814-b2a8-3a5703ae68f3 e9464ba4c9374275991f15e5ed7add13 765 19f030d4-f85f-4b85-9f12-6db9360d7fcc [{"Name":"file","Value":"wowvoiceproxy.exe"},{"Name":"folder","Value":"C:\\Program Files (x86)\\World of Warcraft\\_retail_\\Utils\\WowVoiceProxy.exe"}]
Can you please tell me what causes this error?
Thanks!
I'm guessing:
The 'Error parsing JSON' is certainly related to the extra VARIANT field.
The JSON looks fine, but the backslashes (\) are a potential problem.
If you look at the successfully loaded lines, have the backslashes been removed?
This can (maybe) happen if you have STAGE settings involving escape characters.
Once the doubled backslashes are collapsed, the \\Utils substring in the Windows path becomes \Utils, which the JSON parser reads as the start of a Unicode escape sequence and rejects, e.g.:
Error parsing JSON: hex digit is expected in \U???????? escape sequence, pos 123
UPDATE:
It turns out you have to turn off escape character processing by adding the following to the FILE_FORMAT:
ESCAPE_UNENCLOSED_FIELD = NONE
The alternative is to double-quote the fields or to doubly escape the backslashes, e.g. C:\\\\Program Files.
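The failure mode can be reproduced outside Snowflake. The sketch below uses Python's json module as a stand-in for the JSON parser and a simple string replacement as a stand-in for escape character processing. Both are assumptions for illustration (parses_as_json is a hypothetical helper), and Python's error message differs from Snowflake's.

```python
import json

# The JSON value from the failing row, with the doubled backslashes intact.
raw = (
    r'[{"Name":"file","Value":"wowvoiceproxy.exe"},'
    r'{"Name":"folder","Value":"C:\\Program Files (x86)\\World of Warcraft'
    r'\\_retail_\\Utils\\WowVoiceProxy.exe"}]'
)

# Stand-in for escape processing: every doubled backslash becomes a single one.
collapsed = raw.replace("\\\\", "\\")


def parses_as_json(text: str) -> bool:
    """Return True if text is valid JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(parses_as_json(raw))        # True
print(parses_as_json(collapsed))  # False: \P, \W, \U are not valid JSON escapes
```

With the backslashes left alone the value parses. After they are collapsed, sequences such as \U are no longer valid JSON escapes, which is consistent with the fix of disabling escape processing.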

Redshift Copy with Newline Embedded in Quotes

I am trying to copy data from S3 to Redshift where a field contains a newline within quotes.
Example CSV file:
Line 1 --> ID,Description,flag
Line 2 --> "1111","this is a test", "FALSE"
Line 3 --> "2222","I hope someone
could help", "TRUE"
Line 4 --> "3333", "NA", "FALSE"
Sample Table:
TEST_TABLE:
ID VARCHAR(100)
DESCRIPTION VARCHAR(100)
FLAG VARCHAR(100)
Note that Line 3 contains a linefeed inside the quoted field, and I get the error "Delimited value missing end quote" when using the COPY command.
This is the Copy command I use:
copy table_name
from sample.csv
credentials aws_access_key_id= blah; aws_secret_access_key=blah
DELIMITER ','
removequotes
trimblanks
ESCAPE ACCEPTINVCHARS
EMPTYASNULL
IGNOREHEADER 1
COMPUPDATE OFF;
commit;
I've also tried the CSV option, but get "Extra column(s) found ":
copy table_name
from sample.csv
credentials aws_access_key_id= blah; aws_secret_access_key=blah
CSV
IGNOREHEADER 1
COMPUPDATE OFF;
commit;
I would expect the description column in Line 3 to be loaded with the linefeed.
Since the field is delimited by quotes, use the CSV option.
Note: CSV cannot be used with FIXEDWIDTH, REMOVEQUOTES, or ESCAPE.
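As an illustration of why quote-aware CSV parsing matters here, the sketch below runs the sample file through Python's csv module. This shows general CSV quoting semantics, not Redshift's parser. parse_rows is a hypothetical helper, and skipinitialspace=True is a choice made for Python's reader so that the space before an opening quote is ignored.

```python
import csv
import io

# The sample file from the question, including the linefeed inside quotes.
sample = (
    'ID,Description,flag\n'
    '"1111","this is a test", "FALSE"\n'
    '"2222","I hope someone\ncould help", "TRUE"\n'
    '"3333", "NA", "FALSE"\n'
)


def parse_rows(text: str) -> list:
    """Parse CSV text, keeping newlines that appear inside quoted fields."""
    reader = csv.reader(io.StringIO(text, newline=""), skipinitialspace=True)
    return list(reader)


rows = parse_rows(sample)
print(len(rows))   # 4 records: the header plus three data rows
print(rows[2])     # ['2222', 'I hope someone\ncould help', 'TRUE']
```

A quote-aware parser treats the two physical lines as one record, whereas a purely line-based split would see an unterminated quote on the first of them.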

Redshift Copy errors out when trying to load NUL

I am loading data into Redshift using COPY. The text file contains NUL characters.
I have looked at several options and tried using options such as:
null as '\0' EMPTYASNULL ACCEPTINVCHARS TRIMBLANKS TRUNCATECOLUMNS escape
However, it still errors out.
Below are sample records and the error message.
The NUL is after "Main St|":
2278|2047|5|1|1|1|18 N Main St| |Bowman|1|39|16443|15811|58623|Y|544|2018-05-21 17:29:12.000||||
2491|2047|6|1|1|1|18 N Main| |Bowman|1|39|16443|15811|58623-9613|Y|920|2018-11-26 18:28:26.000||||
2491|2047|7|1|1|1|18 N Main| |Bowman|1|39|16443|15811|58623-9613|Y|920|2018-11-26 18:28:26.000||||
2408|2154|7|1|1|1|101 Main St| |Lakota|1|39|16469|15956|58344|Y|447|2018-08-17 08:10:54.000||||
copy table1 from 's3://....txt' iam_role xx delimiter '|' null as '\0' EMPTYASNULL ACCEPTINVCHARS TRIMBLANKS TRUNCATECOLUMNS escape;
Missing newline: Unexpected character 0x7d found at location nn

Redshift - Load data which has newline in field

I am trying to load data that includes a newline within a field:
001|myname|fav\
movie | myaddress| myphone|
There is a blank line between fav\ and movie.
I am loading the data with this command:
COPY catdemo
FROM 's3://tickit/catego.csv'
IAM_ROLE 'arn:aws:iam::<aws-account-id>:role/<role-name>'
REGION 'ap-south-1'
DELIMITER '|'
ESCAPE
ACCEPTINVCHARS
IGNOREBLANKLINES
NULL AS '\0';
I want to ignore this blank line. Can anyone help me?
The load reports "delimiter not found" between fav\ and movie, but it is actually a single line:
fav\
movie