AWS Athena, Erro when Len of String - amazon-athena

I tried to show the len of one string..
SELECT length(fieldA) FROM "data_prod"."myscore" limit 10;
but I receive that error
Your query has the following error(s):
SYNTAX_ERROR: line 1:8: Unexpected parameters (bigint) for function length. Expected: length(varchar(x)) , length(char(x)) , length(varbinary)
This query ran against the "raw_public_data_prod" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: 92f6401b-108a-4671-951e-3a8e882f3b20.
enter image description here

This is what you are looking for.
SELECT length(cast(fieldA as varchar)) FROM "data_prod"."myscore" limit 10;
length function doesn't work for bigint data type. So you have to cast it as varchar to do so.

Related

Column does not exist AWS Timestream Query error

I am trying to apply WHERE clause on DIMENSION of the AWS Timestream records. However, I got the error: Column does not exist
Here is my table schema:
The table schema
The table measure
First, I will show all the sample data I put in the table
SELECT username, time, manual_usage
FROM "meter-reading"."meter-metrics"
ORDER BY time DESC
LIMIT 4
The result:
Result
What I wanted to do is to query and filter the records by the Dimension ("username" specifically).
SELECT *
FROM "meter-reading"."meter-metrics"
WHERE measure_name = "OnceADay"
ORDER BY time DESC LIMIT 10
Then I got the Error: Column 'OnceADay' does not exist
I tried to search for any quotas for Dimensions name and check for error in my schema:
https://docs.aws.amazon.com/timestream/latest/developerguide/ts-limits.html#limits.naming
https://docs.aws.amazon.com/timestream/latest/developerguide/ts-limits.html#limits.system_identifier
But I didn't find that my "username" for the dimension violate any of the above rules.
I checked for some other queries by AWS Blog, the author used the WHERE clause for the Dimension filter normally:
https://aws.amazon.com/blogs/database/effective-queries-for-common-query-patterns-in-amazon-timestream/
I figured it out after I tried with the sample code. Turn out it was a silly mistake I believe.
Using apostrophe (') instead of single quotation marks ("") solved my problem.
SELECT *
FROM "meter-reading"."meter-metrics"
WHERE username = 'OnceADay'
ORDER BY time DESC LIMIT 10

AWS Athena select query to fetch error code from status column

AWS Athena trying to run a select query as below to fetch error code from the status column, but getting the below error
The query which I am trying:
select * from s3_accesslog where status = '404'
Error: SYNTAX_ERROR: line 1:78: '=' cannot be applied to integer, varchar(3)
select * from s3_accesslog where status like '%404%'
Error: SYNTAX_ERROR: line 1:71: Left side of LIKE expression must evaluate to a varchar (actual: integer)
Looks like your status codes are stored in the table as integers, if you remove the quotes the query should work.
So try:
select * from s3_accesslog where status = 404

Adding LIMIT fixes "Invalid digit, Value N" error in Amazon Redshift. Why?

I have a standard listings table on Redshift table with all varchars (due to loading into database)
This query (simplified) gives me error:
with AL as (
select
L.price::int as price,
from listings L
where L.price <> 'NULL'
and L.listing_type <> 'NULL'
)
select price from AL
where price < 800
and the error:
-----------------------------------------------
error: Invalid digit, Value 'N', Pos 0, Type: Integer
code: 1207
context: NULL
query: 2422868
location: :0
process: query0_24 [pid=0]
-----------------------------------------------
If I remove the where price < 800 condition, the query returns just fine... but I need the where condition to be there.
I've also checked the number validity of the price field and all look good.
After playing around, this actually makes it work, and I can't quite explain why.
with AL as (
select
L.price::int as price,
from listings L
where L.price <> 'NULL'
and L.listing_type <> 'NULL'
limit 10000000000
)
select price from AL
where price < 800
Note that the table has far less records than the number stated in limit.
Can anyone (possibly from the Redshift engineer team) explain why this is the way it is? Possibly something to do with how the query plan being executed and parallelized?
I had query that could be expressed simply as:
SELECT TOP 10 field1, field2
FROM table1
INNER JOIN table2
ON table1.field3::int = table2.field3
ORDER BY table1.field1 DESC
Removing the explicit cast to ::int solved a similar error for me.
Meanwhile, postgresql locally requires the "::int" to work.
For what it's worth, my local postgresql version is
PostgreSQL 9.6.4 on x86_64-apple-darwin16.7.0, compiled by Apple LLVM version 8.1.0 (clang-802.0.42), 64-bit
Loading CSV data with NaN into AWS Redshift
I found this post while searching google but the above link had what I needed. I was importing a numeric column with value NaN, which is unsupported by redshift numeric.

substring match in redshift database

I have a redshift table "person" in which a particular column has data something like this
[{"attributeName":"name","attributeMetadata":null,"attributeValue":"KitchenAid - 7-Speed Hand Mixer - White","attributeImageType":"PRODUCT","attributeStatusCodes":[]},
{"attributeName":"title","attributeMetadata":null,"attributeValue":"KitchenAid","attributeImageType":"PRODUCT","attributeStatusCodes":[]},
{"attributeName":"address","attributeMetadata":null,"attributeValue":"address","attributeImageType":"PRODUCT","attributeStatusCodes":[]},
{"attributeName":"PIN CODE","attributeMetadata":null,"attributeValue":"32110","attributeImageType":"IMG","attributeStatusCodes":[]}]
I would like to extract only the dictionary/json/substring containing PIN CODE (see below)
{"attributeName":"PIN CODE","attributeMetadata":null,"attributeValue":"32110","attributeImageType":"IMG","attributeStatusCodes":[]}
I tried the following query and it is giving the following error
select distinct regexp_substr(attributes,'.*({.*?"attributeName":"PIN CODE".*?}).*') from person ;
ERROR: Invalid content of repeat range
DETAIL:
-----------------------------------------------
error: Invalid content of repeat range
code: 8002
context: T_regexp_init
query: 528401
location: funcs_expr.cpp:130
process: query2_40 [pid=12603]
-----------------------------------------------
I guess the problem is occurring because of multiple attributeName in a single column. Is their a way to achieve the desired result.
I am not sure if I understood you correctly, but you can try to use LIKE:
select * from person where attributes LIKE '%"attributeName":"PIN CODE"%';

Regex QueryString Parsing for a specific in BigQuery

So last week I was able to begin to stream my Appengine logs into BigQuery and am now attempting to pull some data out of the log entries into a table.
The data in protoPayload.resource is the page requested with the querystring paramters included.
The contents of protoPayload.resource looks like the following examples:
/service.html?device_ID=123456
/service.html?v=2&device_ID=78ec9b4a56
I am getting close, but when there is another entry before device_ID, I am not getting it. As you can see I am not great with Regex, but it is the only way I think I can parse the data in the query. To get just the device ID from the first example, I was able to use the following example. Works great. My next challenge is to the data when the second parameter exists. The device IDs can vary in length from about 10 to 26 characters.
SELECT
RIGHT(Regexp_extract(protoPayload.resource,r'[\?&]([^&]+)'),
length(Regexp_extract(protoPayload.resource,r'[\?&]([^&]+)'))-10) as Device_ID
FROM logs
What I would like is just the values from the querystring device_ID such as:
123456
78ec9b4a56
Assuming you have just 1 query string per record then you can do this:
SELECT REGEXP_EXTRACT(protoPayload.resource, r'device_ID=(.*)$') as device_id FROM mytable
The part within the parentheses will be captured and returned in the result.
If device_ID isn't guaranteed to be the last parameter in the string, then use something like this:
SELECT REGEXP_EXTRACT(protoPayload.resource, r'device_ID=([^\&]*)') as device_id FROM mytable
One approach is to split protoPayload.resource into multiple service entries, and then apply regexp - this way it will support arbitrary number of device_id, i.e.
select regexp_extract(service_entry, r'device_ID=(.*$)') from
(select split(protoPayload.resource, ' ') service_entry from
(select
'/service.html?device_ID=123456 /service.html?v=2&device_ID=78ec9b4a56'
as protoPayload.resource))