AWS Hive query for INPUT__FILE__NAME

I'm using AWS Hive with the Glue metastore and trying to run this query:
select partition_name from mytable where
substr(reverse(split(reverse(INPUT__FILE__NAME), '/')[0]),5,8) = '20200705';
The file names in S3 look like abc_20200705, and I get this error from Glue:
2020-08-11T23:20:55,496 FAILED: SemanticException InvalidObjectException(message:null (Service: AWSGlue; Status Code: 400; Error Code: InvalidInputException; Request ID: 7f35e813-b495-4137-8eb7-c43cd09d))
Is it possible to use the INPUT__FILE__NAME virtual column in a WHERE clause?

Looks like Glue doesn't support filtering INPUT__FILE__NAME on a partitioned table. You can achieve this by using a subquery as shown below:
select
partition_name
from
(
select
substr(reverse(split(reverse(INPUT__FILE__NAME), '/')[0]), 5, 8) as t,
*
from
mytable
)
tmp
where
t = '20200705';
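
To make the string expression easier to follow, here is a minimal Python sketch of what it extracts from a file path. The bucket and path are hypothetical; only the abc_20200705 naming comes from the question. It assumes Hive's substr is 1-indexed, so substr(s, 5, 8) starts at the fifth character and takes eight.

```python
def date_from_file_name(path: str) -> str:
    # Last path component: the same result as
    # reverse(split(reverse(path), '/')[0]) in the Hive query.
    file_name = path.rsplit("/", 1)[-1]
    # Hive's substr(s, 5, 8) is 1-indexed: start at the 5th character, take 8.
    return file_name[4:12]


# Hypothetical S3 path, following the abc_20200705 naming from the question.
example = "s3://my-bucket/mytable/partition_name=p1/abc_20200705"
print(date_from_file_name(example))
```

The sketch only mirrors the string handling; it says nothing about why Glue rejects the original WHERE clause.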

Related

How to query AWS Athena where the data is in JsonSerDe format?

I need to query some data in AWS Athena. The source data in S3 is in compressed JSON (.gz) format. The table was created with the parameter
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
If I just do 'select *' there's one column like this:
{userid={s=my_email#gmail.com}, timestamp=2022-07-21 10:00:00, appID={s=greatApp}, etc.}
I am trying to query like this:
with dataset as
(select * FROM "default"."my_table" limit 10)
select json_extract(item, '$.userid') as user
from dataset;
But I get an error:
Expected: json_extract(varchar(x), JsonPath) , json_extract(json, JsonPath)
Is there something wrong with my query?
I got it. The column appears to be a struct rather than a JSON string, which is why json_extract rejects it. You just use "dot" notation to access the keys:
select item.userid.s as user,
item.timestamp,
item.appID.s as appID
from my_table limit 10;
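
As a rough analogy in Python (not Athena itself), the sketch below shows why a JSON function fails on a value that is already structured, while direct field access works. The item value is a hypothetical stand-in shaped like the row shown in the question.

```python
import json

# Hypothetical row value shaped like the struct shown in the question.
item = {
    "userid": {"s": "my_email#gmail.com"},
    "timestamp": "2022-07-21 10:00:00",
    "appID": {"s": "greatApp"},
}

# A JSON parser expects text, so passing an already-structured value fails,
# much like json_extract rejecting an argument that is neither varchar nor json.
try:
    json.loads(item)
    parse_failed = False
except TypeError:
    parse_failed = True

# Field access works directly on the structured value
# (the counterpart of item.userid.s in the Athena query).
user = item["userid"]["s"]
app_id = item["appID"]["s"]
print(parse_failed, user, app_id)
```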

Rename Column Name in Athena AWS

I have tried several ways to rename a column in an Athena table after reading the following article:
https://docs.aws.amazon.com/athena/latest/ug/alter-table-replace-columns.html
But I have had no luck.
I tried
ALTER TABLE "users_data"."values_portions" REPLACE COLUMNS ('username/teradata' 'String', 'username_teradata' 'String')
and got this error:
no viable alternative at input 'alter table "users_data"."values_portions" replace' (service: amazonathena; status code: 400; error code: invalidrequestexception; request id: 23232ssdds.....; proxy: null)
You can refer to this document, which talks about renaming columns. The query that you are trying to run will replace all the columns in the existing table with the provided column list.
One strategy for renaming columns is to create a new table based on the same underlying data, but using new column names. The example mentioned in the link creates a new orders_parquet table called orders_parquet_column_renamed. The example changes the column o_totalprice name to o_total_price and then runs a query in Athena.
Another way of changing the column name is by simply going to AWS Glue -> Select database -> select table -> edit schema -> double click on column name -> type in new name -> save.

Unable to select data from AWS Athena table

I have created a table in Athena using the SQL below:
CREATE EXTERNAL TABLE IF NOT EXISTS xyzschema.my_table (
`col1` string,
`col2` string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = ',',
'field.delim' = ','
) LOCATION 's3://temp/my_table_data/'
TBLPROPERTIES ('has_encrypted_data'='false');
After creating the table, when I try to query it with
select 'col1' from "my_table"
I get the following error, and I'm not sure which permission is missing:
Your query has the following error(s):
Insufficient permissions to execute the query. Principal does not have any privilege on specified resource
If I run the following
select * from "gleif_data_master_csv"
I get the below error
SYNTAX_ERROR: line 1:8: SELECT * not allowed in queries without FROM clause
Any suggestions or ideas why this is breaking?
Insufficient permissions to execute the query. Principal does not have any privilege on specified resource
This is a Lake Formation permissions error: the table you are querying is part of a catalog managed by Lake Formation. Check in that service which permissions your user (the "principal", in AWS terms) has been granted.

AWS Athena select query to fetch error code from status column

In AWS Athena, I'm trying to run a select query to fetch an error code from the status column, but I get the errors below.
The queries I am trying:
select * from s3_accesslog where status = '404'
Error: SYNTAX_ERROR: line 1:78: '=' cannot be applied to integer, varchar(3)
select * from s3_accesslog where status like '%404%'
Error: SYNTAX_ERROR: line 1:71: Left side of LIKE expression must evaluate to a varchar (actual: integer)
Looks like your status codes are stored in the table as integers. If you remove the quotes, the query should work.
So try:
select * from s3_accesslog where status = 404
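
The same type mismatch can be sketched in Python with a few hypothetical log rows (the uri values are made up for illustration). One difference to keep in mind: Python quietly returns no matches when an integer is compared to a string, whereas Athena rejects the query with the SYNTAX_ERROR shown above.

```python
# Hypothetical access-log rows; status is an integer, as the Athena error reports.
rows = [
    {"uri": "/index.html", "status": 200},
    {"uri": "/missing", "status": 404},
    {"uri": "/old-page", "status": 404},
]


def with_status(rows, status):
    # Keep the uri of every row whose status equals the given value.
    return [row["uri"] for row in rows if row["status"] == status]


not_found = with_status(rows, 404)   # integer compared with integer
quoted = with_status(rows, "404")    # string compared with integer: never equal
print(not_found)
print(quoted)
```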

Trying to fetch only specific columns from a select query where the status column has an error code

I'm trying to fetch only specific columns (uri, hostheader) from an Athena query where the status column is 404.
When I execute the query below, I get the uri and hostheader columns in the output, but I am unable to fetch the results for status 404.
select
uri,
hostheader
from
accesslogs
where
CAST(status AS VARCHAR) like '%404%'
The solution was to not cast to varchar and instead use the native int type:
select
uri,
hostheader
from
accesslogs
where
status = 404