Cannot load lzop-compressed files from S3 into Redshift - amazon-web-services

I am attempting to copy an lzop-compressed file from S3 into Redshift. The file was originally generated by running S3DistCp with the --outputCodec lzo option.
The S3 file seems to be compressed correctly, since I can successfully download and inflate it at the command line:
lzop -d downloaded_file.lzo
But when I attempt to load it into Redshift, I get an error:
COPY atomic.events FROM 's3://path-to/bucket/' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' REGION AS 'eu-west-1' DELIMITER '\t' MAXERROR 1 EMPTYASNULL FILLRECORD TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS LZOP;
ERROR: failed to inflate with lzop: unexpected end of file.
DETAIL:
-----------------------------------------------
error: failed to inflate with lzop: unexpected end of file.
code: 9001
context: S3 key being read : s3://path-to/bucket/
query: 244
location: table_s3_scanner.cpp:348
process: query0_60 [pid=5615]
-----------------------------------------------
Any ideas on what might be causing the load to fail?

Try specifying the exact file name:
s3://path-to/bucket/THE_FILE_NAME.extension
The COPY you ran will iterate over every file under that prefix, and it looks like there may be other types of files in the same folder (e.g. a manifest).
COPY atomic.events
FROM 's3://path-to/bucket/THE_FILE_NAME.extension'
CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx'
REGION AS 'eu-west-1'
DELIMITER '\t'
MAXERROR 1
EMPTYASNULL
FILLRECORD
TRUNCATECOLUMNS
TIMEFORMAT 'auto'
ACCEPTINVCHARS
LZOP;
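If the folder legitimately holds several data files, another option is to list the exact lzop files in a manifest and point COPY at it with the MANIFEST option. This is only a sketch; the part file names and the manifest key below are made up.
files.manifest (uploaded to S3):
{
  "entries": [
    {"url": "s3://path-to/bucket/part-00000.lzo", "mandatory": true},
    {"url": "s3://path-to/bucket/part-00001.lzo", "mandatory": true}
  ]
}
COPY atomic.events
FROM 's3://path-to/bucket/files.manifest'
CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx'
REGION AS 'eu-west-1'
DELIMITER '\t'
LZOP
MANIFEST;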

Related

Athena Create Table FAILED: ParseException missing EOF

I am experiencing a weird scenario where running a query through the AWS CLI fails with the exception
FAILED: ParseException line 1:9 missing EOF at '-' near 'datas'
but running the exact same query in the Athena UI right after the failure (basically just hitting Run Again) works fine.
I run the AWS CLI with:
aws athena start-query-execution --query-string "CREATE EXTERNAL TABLE IF NOT EXISTS \`some_lake_tables\`.\`some_table\` (\`some_name\` STRING, \`symbol\` STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 's3://some-lake-poc/feeds_dir/temp_tables/input_table.csv' TBLPROPERTIES ('classification'='csv', 'skip.header.line.count'='1')" --query-execution-context "Database"="datas-data" --result-configuration "OutputLocation"="s3://some-lake-poc/athena-results"
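One way to rule out shell-quoting differences between the CLI and the UI is to pass the DDL from a file instead of an inline string, since the AWS CLI can read string parameters via the file:// prefix. This is only a sketch; query.sql is a made-up file name holding the CREATE EXTERNAL TABLE statement exactly as it runs in the Athena UI.
aws athena start-query-execution \
  --query-string file://query.sql \
  --query-execution-context "Database"="datas-data" \
  --result-configuration "OutputLocation"="s3://some-lake-poc/athena-results"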

Change location of the glue table

I've created a Glue table (external) via Terraform without specifying the table's location.
The location is supposed to be set after the app runs, but when the app runs it receives this exception:
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Error: ',', ':', or ';' expected at position 291 from 'bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:string:string:smallint:smallint:smallint:decimal(12,2):decimal(12,2):decimal(12,2):bigint:string:bigint:string:timestamp:timestamp:bigint:bigint:bigint:bigint:bigint:string:string:decimal(12,2) :bigint:timestamp:string:bigint:decimal(12,2):string:bigint:bigint:timestamp:int' [0:bigint, 6::, 7:bigint, 13::, 14:bigint, 20::, 21:bigint, 27::, 28:bigint, 34::, 35:bigint, 41::, 42:bigint, 48::, 49:bigint, 55::, 56:bigint, 62::, 63:bigint, 69::, 70:bigint, 76::, 77:bigint, 83::, 84:bigint, 90::, 91:bigint, 97::, 98:string, 104::, 105:string, 111::, 112:smallint, 120::, 121:smallint, 129::, 130:smallint, 138::, 139:decimal, 146:(, 147:12, 149:,, 150:2, 151:), 152::, 153:decimal, 160:(, 161:12, 163:,, 164:2, 165:), 166::, 167:decimal, 174:(, 175:12, 177:,, 178:2, 179:), 180::, 181:bigint, 187::, 188:string, 194::, 195:bigint, 201::, 202:string, 208::, 209:timestamp, 218::, 219:timestamp, 228::, 229:bigint, 235::, 236:bigint, 242::, 243:bigint, 249::, 250:bigint, 256::, 257:bigint, 263::, 264:string, 270::, 271:string, 277::, 278:decimal, 285:(, 286:12, 288:,, 289:2, 290:), 291: , 292::, 293:bigint, 299::, 300:timestamp, 309::, 310:string, 316::, 317:bigint, 323::, 324:decimal, 331:(, 332:12, 334:,, 335:2, 336:), 337::, 338:string, 344::, 345:bigint, 351::, 352:bigint, 358::, 359:timestamp, 368::, 369:int]
The exception essentially lists the fields that were defined in Terraform.
From the AWS console I couldn't set the location after the table was created, and when I connected to an AWS EMR cluster that uses the Glue metastore and tried to execute the same query, I got the same exception.
So I have several questions:
Does anybody know how to alter the empty location of an external Glue table?
The default location of a table should look like hive/warehouse/dbname.db/tablename, so what is the correct path in that case on EMR?
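For reference, the usual DDL for pointing an external table at a new location is ALTER TABLE ... SET LOCATION, issued against the Glue catalog from Athena or from the EMR Hive/Spark shell; the same change can also be made through Glue's UpdateTable API. This is just a sketch with a placeholder database, table, and bucket path.
ALTER TABLE some_db.some_table SET LOCATION 's3://your-bucket/path/to/table/';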

Loading entire Json blob as is from S3 in AWS Redshift using COPY gives error

I am trying to load S3 data into Redshift with the COPY command and a JSONPaths file. The source documents look like this:
{
  "_meta-id": 1,
  "payload": {..}
}
In my Redshift table, I want to store the entire JSON document as the second column, so my JSONPaths file is:
{
"jsonpaths": [
"$['_meta-id']",
"$"
]
}
This gives error
Invalid JSONPath format. Supported notations are 'dot-notation' and 'bracket-notation': $
Query:
copy table_name
from 's3://abc/2018/12/15/1'
json 's3://xyz/jsonPaths';
[Amazon](500310) Invalid operation: Invalid JSONPath format. Supported notations are 'dot-notation' and 'bracket-notation': $..
Details:
-----------------------------------------------
error: Invalid JSONPath format. Supported notations are 'dot-notation' and 'bracket-notation': $
code: 8001
context:
query: 21889
location: s3_utility.cpp:672
process: padbmaster [pid=11925]
-----------------------------------------------;
1 statement failed.
Can someone help?
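As the error says, Redshift JSONPaths entries have to be dot- or bracket-notation paths that resolve to individual elements, so a bare $ for the whole document is rejected. If your cluster supports the SUPER type, one possible workaround (only a sketch; the staging table name is made up, and FORMAT JSON 'noshred' requires a reasonably recent Redshift release) is to land the whole document in a single SUPER column and then split out the id:
-- staging table with one SUPER column holding the full document
CREATE TABLE staging_events (doc SUPER);
-- 'noshred' loads each JSON document as-is into the SUPER column
COPY staging_events
FROM 's3://abc/2018/12/15/1'
CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx'
FORMAT JSON 'noshred';
-- pull _meta-id back out; keep doc as SUPER, or use JSON_SERIALIZE(doc) if the target column is VARCHAR
INSERT INTO table_name
SELECT json_extract_path_text(json_serialize(doc), '_meta-id')::int, doc
FROM staging_events;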

AWS AMI - fail to create json import file

I created a bucket in Amazon S3, uploaded my FreePBX.ova, and created permissions, etc. When I run this command:
aws ec2 import-image --cli-input-json "{\"Description\":\"freepbx\", \"DiskContainers\":[{\"Description\":\"freepbx\",\"UserBucket\":{\"S3Bucket\":\"itbucket\",\"S3Key\":\"FreePBX.ova\"}}]}"
I get:
Error parsing parameter 'cli-input-json': Invalid JSON: Extra data: line 1 column 135 - line 1 column 136 (char 134 - 135)
JSON received: {"Description":"freepbx", "DiskContainers":[{"Description":"freepbx","UserBucket":{"S3Bucket":"itbucket","S3Key":"FreePBX.ova"}}]}?
And I can't continue the process. I tried Googling it, with no results.
What is wrong with this command? How can I solve it?
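The "Extra data ... char 134" part of the message points at a stray character immediately after the closing brace of the escaped JSON (note the trailing character in the JSON the CLI echoes back). One way to sidestep shell escaping entirely is to keep the disk-container definition in a file and reference it with file://. This is only a sketch; containers.json is a made-up file name.
containers.json:
[
  {
    "Description": "freepbx",
    "UserBucket": {
      "S3Bucket": "itbucket",
      "S3Key": "FreePBX.ova"
    }
  }
]
aws ec2 import-image --description "freepbx" --disk-containers file://containers.json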

How to export SnowFlake S3 data file to my AWS S3?

The Snowflake S3 data is in .txt.bz2. I need to export the data files present in this Snowflake S3 to my own AWS S3, and the exported results must be in the same format as in the source location. This is what I tried:
COPY INTO @mystage/folder from
(select $1||'|'||$2||'|'|| $3||'|'|| $4||'|'|| $5||'|'||$6||'|'|| $7||'|'|| $8||'|'|| $9||'|'|| $10||'|'|| $11||'|'|| $12||'|'|| $13||'|'|| $14||'|'||$15||'|'|| $16||'|'|| $17||'|'||$18||'|'||$19||'|'|| $20||'|'|| $21||'|'|| $22||'|'|| $23||'|'|| $24||'|'|| $25||'|'||26||'|'|| $27||'|'|| $28||'|'|| $29||'|'|| $30||'|'|| $31||'|'|| $32||'|'|| $33||'|'|| $34||'|'|| $35||'|'|| $36||'|'|| $37||'|'|| $38||'|'|| $39||'|'|| $40||'|'|| $41||'|'|| $42||'|'|| $43
from @databasename)
CREDENTIALS = (AWS_KEY_ID = '*****' AWS_SECRET_KEY = '*****' )
file_format=(TYPE='CSV' COMPRESSION='BZ2');
PATTERN='*/*.txt.bz2'
Right now Snowflake does not support exporting data to files in bz2.
My suggestion is to set COMPRESSION='gzip'; then you can export the data to your S3 in gzip.
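For instance, the query above would look roughly like this with gzip (a sketch; the stage name is kept from the question and the column list is shortened):
COPY INTO @mystage/folder FROM
  (SELECT $1 ||'|'|| $2 ||'|'|| $3   -- ...continue through $43 as in the original query
   FROM @databasename)
CREDENTIALS = (AWS_KEY_ID = '*****' AWS_SECRET_KEY = '*****')
FILE_FORMAT = (TYPE = 'CSV' COMPRESSION = 'gzip');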
If exporting files in bz2 is a high priority for you, please contact Snowflake support.
If you want to unload bz2 files that are already on a Snowflake stage to your own S3, you can do something like this:
COPY INTO @myS3stage/folder FROM
  (SELECT $1 ||'|'|| $2 ||'|'|| $3 ||'|'|| $4 ||'|'|| $5 ||'|'|| $6 ||'|'|| $7 ||'|'|| $8 ||'|'|| $9 ||'|'|| $10
       ||'|'|| $11 ||'|'|| $12 ||'|'|| $13 ||'|'|| $14 ||'|'|| $15 ||'|'|| $16 ||'|'|| $17 ||'|'|| $18 ||'|'|| $19 ||'|'|| $20
       ||'|'|| $21 ||'|'|| $22 ||'|'|| $23 ||'|'|| $24 ||'|'|| $25 ||'|'|| $26 ||'|'|| $27 ||'|'|| $28 ||'|'|| $29 ||'|'|| $30
       ||'|'|| $31 ||'|'|| $32 ||'|'|| $33 ||'|'|| $34 ||'|'|| $35 ||'|'|| $36 ||'|'|| $37 ||'|'|| $38 ||'|'|| $39 ||'|'|| $40
       ||'|'|| $41 ||'|'|| $42 ||'|'|| $43
   FROM @snowflakeStage (PATTERN => '*/*.txt.bz2'))
CREDENTIALS = (AWS_KEY_ID = '*****' AWS_SECRET_KEY = '*****')
FILE_FORMAT = (TYPE = 'CSV');