Change location of the Glue table - amazon-web-services

I've created a Glue table (external) via Terraform, where I didn't set the location of the table.
The location of the table should be updated after the app runs. But when the app runs, it receives this exception:
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Error: ',', ':', or ';' expected at position 291 from 'bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:bigint:string:string:smallint:smallint:smallint:decimal(12,2):decimal(12,2):decimal(12,2):bigint:string:bigint:string:timestamp:timestamp:bigint:bigint:bigint:bigint:bigint:string:string:decimal(12,2) :bigint:timestamp:string:bigint:decimal(12,2):string:bigint:bigint:timestamp:int' [0:bigint, 6::, 7:bigint, 13::, 14:bigint, 20::, 21:bigint, 27::, 28:bigint, 34::, 35:bigint, 41::, 42:bigint, 48::, 49:bigint, 55::, 56:bigint, 62::, 63:bigint, 69::, 70:bigint, 76::, 77:bigint, 83::, 84:bigint, 90::, 91:bigint, 97::, 98:string, 104::, 105:string, 111::, 112:smallint, 120::, 121:smallint, 129::, 130:smallint, 138::, 139:decimal, 146:(, 147:12, 149:,, 150:2, 151:), 152::, 153:decimal, 160:(, 161:12, 163:,, 164:2, 165:), 166::, 167:decimal, 174:(, 175:12, 177:,, 178:2, 179:), 180::, 181:bigint, 187::, 188:string, 194::, 195:bigint, 201::, 202:string, 208::, 209:timestamp, 218::, 219:timestamp, 228::, 229:bigint, 235::, 236:bigint, 242::, 243:bigint, 249::, 250:bigint, 256::, 257:bigint, 263::, 264:string, 270::, 271:string, 277::, 278:decimal, 285:(, 286:12, 288:,, 289:2, 290:), 291: , 292::, 293:bigint, 299::, 300:timestamp, 309::, 310:string, 316::, 317:bigint, 323::, 324:decimal, 331:(, 332:12, 334:,, 335:2, 336:), 337::, 338:string, 344::, 345:bigint, 351::, 352:bigint, 358::, 359:timestamp, 368::, 369:int]
This exception roughly lists the fields that were defined in Terraform.
From the AWS console I couldn't set the location after the table was created. When I connected to an AWS EMR cluster that uses the Glue metastore and tried to execute the same query, I received the same exception.
So I have several questions:
Does anybody know how to alter the empty location of an external Glue table?
The default location of the table should look something like hive/warehouse/dbname.db/tablename. So what is the correct path in that case on EMR?
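One option, since the console and Terraform apparently won't let you change it after creation, is to patch the table's StorageDescriptor.Location directly through the Glue API. Below is a minimal boto3 sketch assuming placeholder database, table, and S3 path names; on EMR the Hive-level equivalent would be ALTER TABLE tablename SET LOCATION 's3://...', provided the DDL parses against the Glue schema.
import boto3

glue = boto3.client("glue")

db_name = "dbname"        # placeholder
table_name = "tablename"  # placeholder

# get_table returns read-only fields (CreateTime, DatabaseName, ...) that
# update_table rejects, so keep only the keys TableInput accepts.
table = glue.get_table(DatabaseName=db_name, Name=table_name)["Table"]
allowed = {
    "Name", "Description", "Owner", "Retention", "StorageDescriptor",
    "PartitionKeys", "TableType", "Parameters",
}
table_input = {k: v for k, v in table.items() if k in allowed}

# Point the external table at its data in S3 (placeholder path).
table_input["StorageDescriptor"]["Location"] = "s3://my-bucket/path/to/table/"

glue.update_table(DatabaseName=db_name, TableInput=table_input)
Separately, the parse error quoted above points at position 291 of the column-type string, exactly where it reads 'decimal(12,2) :bigint' with a stray space before the colon, so the whitespace in that column's type definition in Terraform may be worth checking as well.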

Related

BQ - No permission to INFORMATION_SCHEMA.JOBS

Issue description
I am trying to insert DML statistics from BigQuery system tables into BQ native tables for monitoring purposes, using an Airflow task.
For this I am using the query below:
INSERT INTO
  `my-project-id.my_dataset.my_table_metrics`
  (table_name,
   row_count,
   inserted_row_count,
   updated_row_count,
   creation_time)
SELECT
  b.table_name,
  a.row_count AS row_count,
  b.inserted_row_count,
  b.updated_row_count,
  b.creation_time
FROM
  `my-project-id.my_dataset`.__TABLES__ a
JOIN (
  SELECT
    tables.table_id AS table_name,
    dml_statistics.inserted_row_count AS inserted_row_count,
    dml_statistics.updated_row_count AS updated_row_count,
    creation_time AS creation_time
  FROM
    `my-project-id`.`region-europe-west3`.INFORMATION_SCHEMA.JOBS,
    UNNEST(referenced_tables) AS tables
  WHERE
    DATE(creation_time) = current_date ) b
ON
  a.table_id = b.table_name
WHERE
  a.table_id = 'my_bq_table'
The query works in the BigQuery console but not via Airflow.
Error as per Airflow:
python_http_client.exceptions.UnauthorizedError: HTTP Error 401:
Unauthorized
[2022-10-21, 12:18:30 UTC] {standard_task_runner.py:93}
ERROR - Failed to execute job 313922 for task load_metrics (403 Access
Denied: Table
my-project-id:region-europe-west3.INFORMATION_SCHEMA.JOBS: User does
not have permission to query table
my-project-id:region-europe-west3.INFORMATION_SCHEMA.JOBS, or perhaps
it does not exist in location europe-west3.
How can I fix this access issue?
Appreciate your help.
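The error suggests either a missing permission or a location mismatch on the Airflow side rather than a problem with the SQL itself. A hedged way to narrow it down is to run the same statement with the Python BigQuery client under the same service account the Airflow connection uses, passing the job location explicitly so it matches the region-europe-west3 qualifier; the role named in the comment below is an assumption about what project-level JOBS access typically requires, not something confirmed by the error.
from google.cloud import bigquery

# Assumes the service account behind the Airflow connection has
# bigquery.jobs.listAll on the project (for example via
# roles/bigquery.resourceViewer); without it, queries against
# INFORMATION_SCHEMA.JOBS fail with a 403 like the one above.
client = bigquery.Client(project="my-project-id")

sql = "..."  # the INSERT ... SELECT statement from the question goes here

# The job location must match the region in the INFORMATION_SCHEMA qualifier.
job = client.query(sql, location="europe-west3")
job.result()  # raises if access is still denied
If this works outside Airflow, the fix is usually granting the missing role to the Airflow service account and/or setting the location on the Airflow operator or hook, rather than changing the query.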

Athena Create Table FAILED: ParseException missing EOF

I am experiencing a weird scenario where using the AWS CLI fails with the exception
FAILED: ParseException line 1:9 missing EOF at '-' near 'datas'
but running the exact same query in the Athena UI after it failed, basically just hitting Run Again in the UI, works fine.
I run the AWS CLI with:
aws athena start-query-execution --query-string "CREATE EXTERNAL TABLE IF NOT EXISTS \`some_lake_tables\`.\`some_table\` (\`some_name\` STRING, \`symbol\` STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 's3://some-lake-poc/feeds_dir/temp_tables/input_table.csv' TBLPROPERTIES ('classification'='csv', 'skip.header.line.count'='1')" --query-execution-context "Database"="datas-data" --result-configuration "OutputLocation"="s3://some-lake-poc/athena-results"
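For comparison, the same request can be issued through boto3, which removes one layer of shell quoting. Below is a minimal sketch using the names from the question; the underscore in the database name is an assumption, since the parse error at '-' near 'datas' looks like the DDL parser rejecting the hyphen in the execution-context database datas-data.
import boto3

athena = boto3.client("athena")

ddl = """
CREATE EXTERNAL TABLE IF NOT EXISTS `some_lake_tables`.`some_table` (
  `some_name` STRING,
  `symbol` STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS
  INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://some-lake-poc/feeds_dir/temp_tables/input_table.csv'
TBLPROPERTIES ('classification'='csv', 'skip.header.line.count'='1')
"""

response = athena.start_query_execution(
    QueryString=ddl,
    QueryExecutionContext={"Database": "datas_data"},  # assumed rename, no hyphen
    ResultConfiguration={"OutputLocation": "s3://some-lake-poc/athena-results/"},
)
print(response["QueryExecutionId"])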

AWS unable to get query result because of ResourceNotFoundException

I'm trying to run a CloudWatch Logs Insights query with boto3, but I'm getting a ResourceNotFoundException.
import boto3

if __name__ == "__main__":
    client = boto3.client('logs')
    response = client.start_query(
        logGroupName='/aws/lambda/My-Stack-Name-SE349DJ',
        startTime=123,
        endTime=123,
        queryString="fields @message",
        limit=1
    )
I attempted to run the above code, and the error message is as follows.
botocore.errorfactory.ResourceNotFoundException: An error occurred (ResourceNotFoundException) when calling the StartQuery operation: Log group '/aws/lambda/My-Stack-Name-SE349DJ' does not exist for account ID '11111111' (Service: AWSLogs; Status Code: 400; Error Code: ResourceNotFoundException; Request ID: xxxxx-xxxx-xxx; Proxy: null)
What I tested is as below.
The log group exists. I tested it with Logs Insights in the AWS console, and also after pasting the log group name exactly as it is.
I added backslashes to test whether '/' is the problem (e.g. '\/aws\/lambda\/My-Stack-Name-SE349DJ'), and an InvalidParameterException appears instead.
The AWS account has administrator access privileges on the log group.
I got the same error message when I tested with the AWS CLI.
An error occurred (ResourceNotFoundException) when calling the StartQuery operation: Log group 'XXXXXXXXXXXXXX' does not exist for account ID '11111111' (Service: AWSLogs; Status Code: 400; Error Code: ResourceNotFoundException; Request ID: xxxxx-xxxx-xxx; Proxy: null)
How can I solve this problem?
Actually, the reason I'm trying this is that I need to get more than 500,000 records from the filtered log group, but 10,000 is the maximum per query. I think it's better to pull the data out in chunks by changing the start time and end time (see the windowed-query sketch at the end of this post).
There is a high possibility that there is too much data in certain time ranges, so I think it would be better to run this with boto3 rather than directly. Is there an easy way to extract more than 500,000 records, from the console or by other methods?
As @Marcin commented, it was because of the region configuration.
I added these lines before creating the AWS client.
from botocore.config import Config
...
my_config = Config(
    region_name='us-east-2',
)
...
client = boto3.client('logs', config=my_config)
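To get around the 10,000-result cap mentioned above, one approach is to split the overall time range into smaller windows and run one Logs Insights query per window. A sketch under stated assumptions (one-hour windows, us-east-2, epoch-second timestamps), not a tested solution:
import time

import boto3
from botocore.config import Config

client = boto3.client("logs", config=Config(region_name="us-east-2"))

LOG_GROUP = "/aws/lambda/My-Stack-Name-SE349DJ"
WINDOW = 3600  # one-hour windows (assumption); shrink this if a window still hits 10,000

def run_window(start, end):
    # Start one Logs Insights query for this time window.
    query_id = client.start_query(
        logGroupName=LOG_GROUP,
        startTime=start,
        endTime=end,
        queryString="fields @timestamp, @message",
        limit=10000,
    )["queryId"]
    # Poll until the query finishes, then return its rows.
    while True:
        resp = client.get_query_results(queryId=query_id)
        if resp["status"] in ("Complete", "Failed", "Cancelled", "Timeout"):
            return resp["results"]
        time.sleep(1)

def collect(overall_start, overall_end):
    # Walk the full range window by window and accumulate the rows.
    rows = []
    for start in range(overall_start, overall_end, WINDOW):
        rows.extend(run_window(start, min(start + WINDOW, overall_end)))
    return rows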

AWS Batch - Access denied 403

I am using AWS Batch with ECS to perform a job which needs to send a query to Athena. I use Python boto3 to send the query and then get the request status:
start_query_execution: works fine
get_query_execution: returns an error!
When I try to get the query execution, I get the following error:
{'QueryExecution': {'QueryExecutionId': 'XXXX', 'Query': "SELECT * FROM my_table LIMIT 10 ", 'StatementType': 'DML', 'ResultConfiguration': {'OutputLocation': 's3://my_bucket_name/athena-results/query_id.csv'}, 'QueryExecutionContext': {'Database': 'my_database'}, 'Status': {'State': 'FAILED', 'StateChangeReason': '**Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 4.**. ; S3 Extended Request ID: ....=)'
I have given all permissions to the container role (only to test):
s3:*
athena:*
glue:*
I face this problem only in the container in AWS Batch: with the same policy and code in a Lambda it works!
Any help will be appreciated.
For the Athena output location I have been using the Athena results bucket/prefix, not a file name,
since a result set will be generated that has its own ID:
'ResultConfiguration': {'OutputLocation': 's3://my_bucket_name/athena-results/'}
If you are not sure which bucket to use for query results, you can check it in the query console --> Settings.
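A minimal boto3 sketch of that flow, with the output location given as a prefix in the results bucket rather than a file name (bucket and database names are the placeholders from the question):
import time

import boto3

athena = boto3.client("athena")

qid = athena.start_query_execution(
    QueryString="SELECT * FROM my_table LIMIT 10",
    QueryExecutionContext={"Database": "my_database"},
    ResultConfiguration={"OutputLocation": "s3://my_bucket_name/athena-results/"},
)["QueryExecutionId"]

# Poll until Athena has finished writing the result object under the prefix.
while True:
    status = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]
    if status["State"] in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

print(status["State"], status.get("StateChangeReason", ""))
If the 403 persists even with a prefix, the S3 bucket policy or KMS key policy as seen by the Batch container role is another place to look.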

AWS AMI - fails to create JSON import file

I created a bucket in Amazon S3, uploaded my FreePBX.ova, and created permissions, etc. When I run this command:
aws ec2 import-image --cli-input-json "{\"Description\":\"freepbx\", \"DiskContainers\":[{\"Description\":\"freepbx\",\"UserBucket\":{\"S3Bucket\":\"itbucket\",\"S3Key\":\"FreePBX.ova\"}}]}"
I get:
Error parsing parameter 'cli-input-json': Invalid JSON: Extra data: line 1 column 135 - line 1 column 136 (char 134 - 135)
JSON received: {"Description":"freepbx", "DiskContainers":[{"Description":"freepbx","UserBucket":{"S3Bucket":"itbucket","S3Key":"FreePBX.ova"}}]}?
And I can't continue the process. I tried to Google it with no results.
What is wrong with this command? How can I solve it?
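The JSON itself is valid; the "Extra data" at char 134-135 points at a stray character right after the closing brace (note the trailing '?' in the echoed JSON), which usually comes from the shell or a copy-paste artifact rather than from the parameters. One way to take shell quoting out of the picture is to make the same call with boto3; a minimal sketch using the values from the question (region and credentials are assumed to come from the default configuration):
import boto3

ec2 = boto3.client("ec2")

# Same parameters as the CLI call above, passed as Python structures so no
# inline JSON quoting is involved.
response = ec2.import_image(
    Description="freepbx",
    DiskContainers=[
        {
            "Description": "freepbx",
            "UserBucket": {"S3Bucket": "itbucket", "S3Key": "FreePBX.ova"},
        }
    ],
)
print(response["ImportTaskId"])
Sticking with the CLI, writing the JSON to a file and passing --cli-input-json file://import.json avoids the same quoting problem.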