I've created an external Glue table via Terraform without specifying the table's location.
The location of the table should be updated when the application runs, but when the app runs it receives this exception:
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Error: ',', ':', or ';' expected at position 291 from 'bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:string:string:smallint:smallint:smallint:decimal(12,2):decimal(12,2):decimal(12,2):bigint:string:bigint:string:timestamp:timestamp:bigint:bigint:bigint:bigint:bigint:string:string:decimal(12,2) :bigint:timestamp:string:bigint:decimal(12,2):string:bigint:bigint:timestamp:int' [0:bigint, 6::, 7:bigint, 13::, 14:bigint, 20::, 21:bigint, 27::, 28:bigint, 34::, 35:bigint, 41::, 42:bigint, 48::, 49:bigint, 55::, 56:bigint, 62::, 63:bigint, 69::, 70:bigint, 76::, 77:bigint, 83::, 84:bigint, 90::, 91:bigint, 97::, 98:string, 104::, 105:string, 111::, 112:smallint, 120::, 121:smallint, 129::, 130:smallint, 138::, 139:decimal, 146:(, 147:12, 149:,, 150:2, 151:), 152::, 153:decimal, 160:(, 161:12, 163:,, 164:2, 165:), 166::, 167:decimal, 174:(, 175:12, 177:,, 178:2, 179:), 180::, 181:bigint, 187::, 188:string, 194::, 195:bigint, 201::, 202:string, 208::, 209:timestamp, 218::, 219:timestamp, 228::, 229:bigint, 235::, 236:bigint, 242::, 243:bigint, 249::, 250:bigint, 256::, 257:bigint, 263::, 264:string, 270::, 271:string, 277::, 278:decimal, 285:(, 286:12, 288:,, 289:2, 290:), 291: , 292::, 293:bigint, 299::, 300:timestamp, 309::, 310:string, 316::, 317:bigint, 323::, 324:decimal, 331:(, 332:12, 334:,, 335:2, 336:), 337::, 338:string, 344::, 345:bigint, 351::, 352:bigint, 358::, 359:timestamp, 368::, 369:int]
The exception appears to list the fields that were defined in Terraform.
From the AWS console I couldn't set the location after the table was created. When I connected to an AWS EMR cluster that uses Glue as its metastore and tried to execute the same query, I got the same exception.
So I have several questions:
Does anybody know how to alter the empty location of an external Glue table?
The default location of the table should look like hive/warehouse/dbname.db/tablename, so what is the correct path in that case on EMR?
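For reference, what I'm trying to do amounts to a Hive ALTER TABLE ... SET LOCATION; a minimal sketch with placeholder database, table, and bucket names (not the real ones):

-- placeholder names, for illustration only
ALTER TABLE mydb.mytable SET LOCATION 's3://my-bucket/path/to/data/';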
Using JMESPath and given the JSON below, how would I filter so that only the JobNames starting with "analytics" are returned?
For more context, the JSON was returned by the AWS CLI command aws glue list-jobs.
{
"JobNames": [
"analytics-job1",
"analytics-job2",
"team2-job"
]
}
I tried this:
JobNames[?starts_with(JobNames, `analytics`)]
but it failed with
In function starts_with(), invalid type for value: None, expected one
of: ['string'], received: "null"
Above I extracted just the JMESPath bit, but here is the entire AWS CLI command that I tried and that failed:
aws glue list-jobs --query '{"as_string": to_string(JobNames[?starts_with(JobNames, `analytics`)])}'
I couldn't test it on list-jobs, but the query part works on list-crawlers; just replace JobNames with CrawlerNames.
aws glue list-jobs --query 'JobNames[?starts_with(@, `analytics`) == `true`]'
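With the sample JSON above, that should return only the two analytics jobs, e.g. (default JSON output):

[
    "analytics-job1",
    "analytics-job2"
]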
I am trying to pass a SQL file as input to a shell script and execute an Athena command inside the script. Somehow, single quotes (') end up around the string (the SQL is select count(*) from table).
abc.sh <select_query.sql>
read_query=`cat select_query.sql`
dml=${read_query}
Here is the command:
aws athena start-query-execution --query-string "\"$dml\"" --result-configuration OutputLocation=s3://bucket1/ --output text
Error:
An error occurred (InvalidRequestException) when calling the StartQueryExecution operation: line 1:1: mismatched input '"select count(*) from ecrmsfdc_raw.user;"' expecting {'(', 'SELECT', 'DESC', 'USING', 'WITH', 'VALUES', 'CREATE', 'TABLE', 'INSERT', 'DELETE', 'DESCRIBE', 'GRANT', 'REVOKE', 'EXPLAIN', 'SHOW', 'USE', 'DROP', 'ALTER', 'SET', 'RESET', 'START', 'COMMIT', 'ROLLBACK', 'CALL', 'PREPARE', 'DEALLOCATE', 'EXECUTE'}
If you look at the error above, there are single quotes at the start and end of the select query. I've tried several options (tr, eval, etc.), but I still wasn't able to get rid of those quotes, and because of them the Athena command fails. Is there an easy way to strip these quotes before including the query in the command? Please help.
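For what it's worth, a minimal sketch of the same call without the escaped quotes around $dml, assuming the SQL file path is passed to the script as its first argument:

#!/bin/bash
# read the query from the file given as the first argument (hypothetical usage: ./abc.sh select_query.sql)
dml=$(cat "$1")

# pass $dml directly; wrapping it as "\"$dml\"" puts literal double quotes into the
# query text itself, which is what Athena rejects
aws athena start-query-execution \
    --query-string "$dml" \
    --result-configuration OutputLocation=s3://bucket1/ \
    --output text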
I am trying to load S3 data into Redshift with the COPY command and a jsonpaths file. The source JSON documents look like this:
{
    "_meta-id": 1,
    "payload": {..}
}
In my Redshift table I want to store the entire JSON doc as the second column, so my jsonpaths file is:
{
"jsonpaths": [
"$['_meta-id']",
"$"
]
}
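For context, the target table has just those two columns, roughly like this (the column names and types here are my assumption, not the actual DDL):

create table table_name (
    meta_id bigint,           -- filled from $['_meta-id']
    payload varchar(65535)    -- meant to hold the entire JSON doc ($)
);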
Running the COPY with these jsonpaths gives the error:
Invalid JSONPath format. Supported notations are 'dot-notation' and 'bracket-notation': $
Query:
copy table_name
from 's3://abc/2018/12/15/1'
json 's3://xyz/jsonPaths';
[Amazon](500310) Invalid operation: Invalid JSONPath format. Supported notations are 'dot-notation' and 'bracket-notation': $..
Details:
-----------------------------------------------
error: Invalid JSONPath format. Supported notations are 'dot-notation' and 'bracket-notation': $
code: 8001
context:
query: 21889
location: s3_utility.cpp:672
process: padbmaster [pid=11925]
-----------------------------------------------;
1 statement failed.
Can someone help?
I am attempting to copy an lzop-compressed file from S3 to Redshift. The file was originally generated by using S3DistCp with the --outputCodec lzo option.
The S3 file seems to be compressed correctly, since I can successfully download and inflate it at the command line:
lzop -d downloaded_file.lzo
But when I attempt to load it into Redshift, I get an error:
COPY atomic.events FROM 's3://path-to/bucket/' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' REGION AS 'eu-west-1' DELIMITER '\t' MAXERROR 1 EMPTYASNULL FILLRECORD TRUNCATECOLUMNS TIMEFORMAT 'auto' ACCEPTINVCHARS LZOP;
ERROR: failed to inflate with lzop: unexpected end of file.
DETAIL:
-----------------------------------------------
error: failed to inflate with lzop: unexpected end of file.
code: 9001
context: S3 key being read : s3://path-to/bucket/
query: 244
location: table_s3_scanner.cpp:348
process: query0_60 [pid=5615]
-----------------------------------------------
Any ideas on what might be causing the load to fail?
Try specifying the exact file name.
s3://path-to/bucket/THE_FILE_NAME.extension
The path you used is a prefix, so COPY will pick up every file available there. It looks like there may be other types of files in the same folder (e.g. a manifest):
COPY atomic.events
FROM 's3://path-to/bucket/THE_FILE_NAME.extension'
CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx'
REGION AS 'eu-west-1'
DELIMITER '\t'
MAXERROR 1
EMPTYASNULL
FILLRECORD
TRUNCATECOLUMNS
TIMEFORMAT 'auto'
ACCEPTINVCHARS
LZOP;
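To check what that prefix actually matches, it may help to list the objects first, for example:

aws s3 ls s3://path-to/bucket/

Anything listed there that is not one of the lzop data files (such as a manifest) would also be picked up by the prefix-based COPY and could explain the "unexpected end of file" error.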