How to get length of a string column in Athena? - amazon-web-services

How do you get the length of a VARCHAR or STRING column in AWS Athena? The AWS documentation doesn't give any information on a length function equivalent to the LEN() function in Redshift.

Presto's length() function works for getting the size of a STRING/VARCHAR column.
Usage: length(column_name)
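A quick sketch of both uses (my_table and my_column are placeholder names, not from the question):

-- my_table / my_column are hypothetical placeholders
SELECT my_column, length(my_column) AS col_length
FROM my_table;

-- sanity check on a literal: returns 6
SELECT length('Athena');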

Related

Athena shows no value against boolean column, table created using glue crawler

I am using an AWS Glue CSV crawler to crawl an S3 directory containing CSV files. The crawler works fine in the sense that it creates the schema with the correct data type for each column; however, when I query the data from Athena, it doesn't show any value under the boolean-type column.
A CSV looks like this:
"val","ts","cond"
"1.2841974","15/05/2017 15:31:59","True"
"0.556974","15/05/2017 15:40:59","True"
"1.654111","15/05/2017 15:41:59","True"
And the table created by the crawler is:
Column name Data type
val string
ts string
cond boolean
However, when I run, say, select * from <table_name> limit 10, it returns:
val ts cond
1 "1.2841974" "15/05/2017 15:31:59"
2 "0.556974" "15/05/2017 15:40:59"
3 "1.654111" "15/05/2017 15:41:59"
Does anyone have any idea what might be the reason?
I forgot to add: if I change the data type of the cond column to string, it does show the data as strings, e.g. "True" or "False".
I don't know why Glue classifies the cond column as boolean, because Athena will not understand that value as a boolean. I think this is a bug in Glue, or an artefact of it not targeting Athena exclusively. Athena expects boolean values to be either true or false. I don't remember if that includes different capitalizations of the strings or not, but either way yours will fail because they are quoted. The actual bug is that Glue has not configured your table so that it strips the quotes from the strings, and therefore Athena sees a boolean column containing "True" with quotes and all, and that is not a supported boolean value. Instead you get NULL values.
You could try changing your table to use the OpenCSVSerDe instead; it supports quoted values.
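A sketch of such a table definition, assuming the CSV from the question sits under a placeholder S3 path. Note cond is declared as string here, since the quoted "True"/"False" values are not valid Athena booleans, so it is safer to compare against the text:

CREATE EXTERNAL TABLE my_csv_table (  -- hypothetical table name
  val string,
  ts string,
  cond string  -- kept as string; 'True'/'False' are not Athena boolean literals
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ('separatorChar' = ',', 'quoteChar' = '"')
LOCATION 's3://your-bucket/your-prefix/'  -- placeholder path
TBLPROPERTIES ('skip.header.line.count' = '1');

-- then filter on the text value:
SELECT * FROM my_csv_table WHERE cond = 'True';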
It's surprising that Glue continues to stumble on basic things like this. Glue is unfortunately rarely worth the effort over writing some basic scripts yourself.

Converting a datastore column from string to timestamp

I have a Datastore entity which has a column named timestamp. It was supposed to be a timestamp type, but it is a string type as of now. This column has values in 2 formats: YYYY-MM-DDTHH:MM:SSZ and YYYY-MM-DDTHH:MM:SS-offset_hours.
In our code we are sorting on timestamp, which is essentially sorting the "string". Now the question is: how can I convert this "string" column into "Timestamp"?
Do I have to do any conversion for the existing values which are in different formats? How can I do it in Terraform?
Google Datastore has no notion of schema migrations; you're going to have to write a task queue job to do it.
The proper way would be to create a new column called timestamp_2 and backfill it. Here is an article GCP wrote:
https://cloud.google.com/appengine/articles/update_schema

How to do MD5 hashing of a string in Athena?

The MD5 hashing function in Athena is not working for strings, even though the AWS documentation suggests it exists: https://docs.aws.amazon.com/redshift/latest/dg/r_MD5.html (note: that page is actually Redshift's MD5 documentation).
Not sure what I am missing here. If I transform the varchar to varbinary, the hash that gets generated is not correct.
I am getting this error:
SYNTAX_ERROR: line 1:8: Unexpected parameters (varchar(15)) for function md5. Expected: md5(varbinary)
This query ran against the "temp" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: dd959e8a-7fa4-4170-8895-ce7cf58be6ea.
The md5 function in Athena/Presto takes binary input. You can convert a string to a varbinary using the to_utf8 function:
SELECT md5(to_utf8('hello world'))
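If you are comparing against Redshift's MD5(), note that Redshift returns a 32-character lowercase hex string while Presto's md5 returns varbinary, which is likely why the generated hash looks "not correct". A sketch of producing the same hex form:

-- to_hex renders the varbinary digest as hex; lower() matches Redshift's lowercase output
SELECT lower(to_hex(md5(to_utf8('hello world'))));
-- returns 5eb63bbbe01eeed093cb22bb8f5acdc3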

Retrieving NULL values from S3 with a SELECT query in AWS Redshift Spectrum

I am able to unload data to S3 and query the results with Spectrum, but NOT when using the delimiter defined below. This is our standard delimiter, and it works with all of our processing today related to the Redshift COPY and UNLOAD commands, so I believe the UNLOAD is working fine. But somewhere between the table definition and the SQL query to retrieve the data, something is going wrong: we just receive NULLs for all of the fields. Can you look at our example below to determine next steps?
unload ('select * from db.test')
to 's3://awsbucketname/ap_cards/'
iam_role 'arn:aws:iam::123456789101:role/redshiftaccess'
delimiter '\325'
manifest;
CREATE EXTERNAL TABLE db_spectrum.test (
cost_center varchar(100) ,
fleet_service_flag varchar(1)
)
row format delimited
fields terminated by '\325'
stored as textfile
location 's3://awsbucketname/test/';
select * from db_spectrum.test
I got a response from the AWS Support center:
Unfortunately you will need to either process the data externally to change the delimiter or UNLOAD the data again with a different delimiter.
The docs say to specify a single ASCII character for 'delimiter'.
The ASCII range only goes up to 177 in octal (decimal 127), and '\325' (octal for decimal 213) falls outside it, which is why the fields come back as NULLs.
We will clarify the docs to note that 177 is the maximum permissible octal value for a delimiter. I can confirm that this is the same in Athena as well.
Thank you for bringing this to our attention.
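A sketch of the re-UNLOAD option, swapping in a delimiter inside the ASCII range (here octal '\001', the non-printing SOH character); the bucket, path, and role below are the placeholders from the question, and the external table's "fields terminated by" clause must be changed to match:

unload ('select * from db.test')
to 's3://awsbucketname/ap_cards/'
iam_role 'arn:aws:iam::123456789101:role/redshiftaccess'
delimiter '\001'
manifest;
-- ...and in the Spectrum table DDL:
-- fields terminated by '\001'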
You might try using Spectrify for this. It automates a lot of the nastiness currently involved in moving Redshift tables to Spectrum.

Redshift copy command failure

I am using the Amazon Redshift COPY command to insert new rows into a table.
The COPY command fails with the following error message:
index "pg_toast_16408_index" is not a btree
I have noticed that the problem occurs because of a description field that contains a long string. When I try to copy without this field, it works!
Does anyone know why that is? How can I overcome this issue?
Use the TRUNCATECOLUMNS parameter:
Truncates data in columns to the appropriate number of characters so that it fits the column specification. Applies only to columns with a VARCHAR or CHAR data type, and rows 4 MB or less in size.
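A minimal sketch of the COPY with that parameter (table name, S3 path, and IAM role are placeholders):

copy my_table
from 's3://awsbucketname/data/'
iam_role 'arn:aws:iam::123456789101:role/redshiftaccess'
truncatecolumns;

The trade-off is that over-length values are silently truncated; the alternative is to widen the description column (Redshift VARCHAR can go up to 65535 bytes) and reload.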