"Where clause" is not working in AWS Athena - amazon-web-services

I used the AWS Glue Console to create a table in Athena from an S3 bucket. You can see the relevant part in the screenshot above. I obfuscated the column name, so assume it is "a test column". I would like to select the records with value D in that column. The query I tried to run is:
SELECT
*
FROM
table
WHERE
"a test column" = "D"
Nothing is returned. I also tried to use IS instead of =, as well as to surround D with single quotes instead of double quotes within the WHERE clause:
-- Tried this
WHERE
"a test column" = 'D'
-- Tried this
WHERE
"a test column" IS "D"
-- Tried this
WHERE
"a test column" IS 'D'
Nothing works. Can someone help? Thank you.
The error message I got is
Mismatched input 'where' expecting (service: amazon athena; status code: 400; error code: invalid request exception; request id: 8f2f7c17-8832-4e34-8fb2-a78855e3c17d)

The problem is the query syntax. Use single quotes (') when you refer to a string value, because double quotes refer to a column name in your table.
SELECT
*
FROM
table
WHERE
"column_name" = 'D'

The unexpected answer (apologies if I did not make this clear in the original post) is that I cannot put "limit 200" in front of the WHERE clause; it has to go at the end of the query. Hope it helps others.
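The clause order described above can be checked with a small sketch. It uses Python's built-in sqlite3 module as a stand-in, on the assumption that Athena enforces the same WHERE-before-LIMIT order; the table name demo and its three rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for Athena (assumption).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE demo ("a test column" TEXT)')
conn.executemany("INSERT INTO demo VALUES (?)", [("D",), ("E",), ("D",)])

# Double quotes around the column name, single quotes around the string value,
# and LIMIT placed after WHERE.
good = """SELECT * FROM demo WHERE "a test column" = 'D' LIMIT 200"""
# Same query with LIMIT placed in front of WHERE.
bad = """SELECT * FROM demo LIMIT 200 WHERE "a test column" = 'D'"""

rows = conn.execute(good).fetchall()

try:
    conn.execute(bad)
    limit_before_where_ok = True
except sqlite3.OperationalError:
    # The parser rejects WHERE once LIMIT has already been seen.
    limit_before_where_ok = False

print(rows, limit_before_where_ok)
```

The second query fails at parse time, which matches the kind of "mismatched input 'where'" message quoted in the question.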

Related

Column does not exist AWS Timestream Query error

I am trying to apply a WHERE clause to a DIMENSION of my AWS Timestream records. However, I got the error: Column does not exist
Here is my table schema:
The table schema
The table measure
First, I will show all the sample data I put in the table
SELECT username, time, manual_usage
FROM "meter-reading"."meter-metrics"
ORDER BY time DESC
LIMIT 4
The result:
Result
What I wanted to do is to query and filter the records by the Dimension ("username" specifically).
SELECT *
FROM "meter-reading"."meter-metrics"
WHERE measure_name = "OnceADay"
ORDER BY time DESC LIMIT 10
Then I got the Error: Column 'OnceADay' does not exist
I tried to search for any quotas for Dimensions name and check for error in my schema:
https://docs.aws.amazon.com/timestream/latest/developerguide/ts-limits.html#limits.naming
https://docs.aws.amazon.com/timestream/latest/developerguide/ts-limits.html#limits.system_identifier
But I didn't find that my "username" dimension violates any of the above rules.
I also checked some other queries on the AWS Blog, where the author uses the WHERE clause to filter on a Dimension without any problem:
https://aws.amazon.com/blogs/database/effective-queries-for-common-query-patterns-in-amazon-timestream/
I figured it out after I tried the sample code. It turned out to be a silly mistake, I believe.
Using apostrophes (') instead of double quotation marks (") solved my problem. Double quotes mark an identifier, which is why the error referred to a column named OnceADay.
SELECT *
FROM "meter-reading"."meter-metrics"
WHERE username = 'OnceADay'
ORDER BY time DESC LIMIT 10

DAX problem, filter/relation is ignored when I use IF in the RETURN clause

I'm getting an unexpected result when I use an IF in the RETURN clause of a DAX expression. If I don't use the IF, but instead just a variable, then the result is ok.
I've created a test scenario to explain my problem:
I have two test tables:
Table: "Test Object"
Table: "Test Group"
These have a unidirectional relation on "Group code"
I have created a measure "Test measure":
This gives the correct result:
I have set a page filter to only show Group Code "G01".
This all works ok up to this point.
But it goes wrong when I use an IF function:
I then get the following (incorrect) result. Apparently the relation and/or page filter seems to be ignored now:
NB: The result is the same regardless of from which table I use the "Group code" field.
What am I missing here?
I've created a PBIX file that shows the problem:
https://www.dropbox.com/s/76ld1kv503ul6nm/DAX%20problem%20with%20IF.pbix?dl=0
This is called "Auto-Exist" in PBI:
https://www.sqlbi.com/articles/understanding-dax-auto-exist/
If you look closely at the results, you'll notice that your report shows all possible combinations of Group Codes and Object Codes.
This happens whenever you combine fields from different tables in a report: PBI first creates a cross-join between these fields, and then eliminates the combinations that result in blanks, so you only see meaningful combinations.
However, your IF statement overrides this logic: it always returns a result, even if a combination is blank (the test Blank < 40 returns "Low end" because blank is treated as zero).
To fix it, calculate the result only if the variable is not blank, i.e.:
Price category =
var lowestPrice = MIN(Object[Price])
var result = IF( NOT ISBLANK(lowestPrice), IF(lowestPrice < 40, "Low end", "High end"))
Return result
You will get:
P.S. Page filter is irrelevant here, it simply filters the table after it's calculated.
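The blank-handling point can be modelled outside DAX. The following is only a Python sketch of the logic, not DAX itself: None stands in for BLANK, and the function names naive_category and guarded_category are hypothetical.

```python
from typing import Optional


def naive_category(lowest_price: Optional[float]) -> str:
    # Models the comparison described above: a blank operand is treated as 0,
    # so a blank price still passes the "< 40" test.
    value = 0 if lowest_price is None else lowest_price
    return "Low end" if value < 40 else "High end"


def guarded_category(lowest_price: Optional[float]) -> Optional[str]:
    # Mirrors IF(NOT ISBLANK(lowestPrice), ...): stay blank when there is no price.
    if lowest_price is None:
        return None
    return "Low end" if lowest_price < 40 else "High end"


print(naive_category(None), guarded_category(None))
```

With the naive version every blank combination gets a label and so survives in the report; with the guarded version it stays blank and can be dropped.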

Extracting String Portions in SQL using Regular expressions

Hi All,
I have a query related to Regular expressions in SQL.
I have a case where a portion of string has to be extracted from a column. The portion of that column will be prefixed with my column A. Please see the screenshot for the sample data. I have also added the output expected in a separate column (highlighted in green).
Scenarios:
If a column value contains more than one unique number, the result should be Null.
Eg: To verify CAN06010025, CAN06010026 & CAN06010030 after the approval.
In the above string there is more than one number (the bold portion),
so this case should be ignored (meaning it has to give me a Null value).
If there is only one number, even if it is repeated, then I have to consider that case and extract that portion of the string.
Eg: Project USA12: Id USA12S001: Contact required -USA12S001- form to be updated
In this example, the portion I wanted to extract is repetitive and I am looking to extract the highlighted portion alone.
The same applies to the other cases as well.
I tried the SQL below. The challenge is that my Col A value can also appear on its own in Col B (line 2 in the screenshot). The REGEXP_COUNT function counts those occurrences too, so the result comes back as Null. My expectation is to extract the USA12S001 portion from the column.
Could you please help me achieve this so that both of the above conditions are satisfied?
SQL:
SELECT
ColA,
ColB,
case when REGEXP_COUNT(ColB,ColA) >2 THEN NULL
ELSE REPLACE(REPLACE(concat(regexp_substr(ColB,ColA||'([[:alnum:]]+\.?)'),
nvl(regexp_substr(ColB,ColA||'(\-[[:digit:]]+)'),
regexp_substr(ColB,ColA||'([[:space:]]\-[[:space:]][[:digit:]]+)'))),
' ',''),'.','')
END AS Result
FROM
table
Test Data:
Col A | Col B
CAN06 | to verify CAN06010025, CAN06010026 & CAN06010030 after the approval
USA12 | Project USA12: Id USA12S001: Contact required -USA12S001- form to be updated
USA27 | Project USA27: Id: USA27S001: Prod
HUN04 | To review id HUN04S002-HUN04S004 after the due date.
CAN05 | ID: CAN05S005 with the details as CAN05S005 are completed.
USA24 | Project USA24: Id: USA24S009: Data Issue
CAN06 | Project: Subject CAN06S009: V2 & V3- Id CAN06S010: V1
If the REGEXP_COUNT is the only issue, then the answer is simple: change
case when REGEXP_COUNT(ColB,ColA) >2
to:
case when REGEXP_COUNT(ColB,ColA || '[[:alnum:]]') >2
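The two rules from the question can also be sketched outside SQL. This is a Python illustration, not the answer's method: it assumes an ID is the Col A prefix immediately followed by alphanumeric characters, and it counts distinct matches rather than total occurrences. The function name extract_single_id is hypothetical.

```python
import re
from typing import Optional


def extract_single_id(col_a: str, col_b: str) -> Optional[str]:
    # An ID is assumed to be the Col A prefix followed by one or more
    # alphanumeric characters, so a bare prefix such as "USA12:" is skipped.
    pattern = re.escape(col_a) + r"[A-Za-z0-9]+"
    distinct_ids = set(re.findall(pattern, col_b))
    # Exactly one distinct ID (repeats allowed) is returned; anything else is None.
    return distinct_ids.pop() if len(distinct_ids) == 1 else None


# Three rows taken from the test data in the question.
rows = [
    ("CAN06", "to verify CAN06010025, CAN06010026 & CAN06010030 after the approval"),
    ("USA12", "Project USA12: Id USA12S001: Contact required -USA12S001- form to be updated"),
    ("HUN04", "To review id HUN04S002-HUN04S004 after the due date."),
]
results = [extract_single_id(a, b) for a, b in rows]
print(results)
```

Requiring at least one alphanumeric character after the prefix is the same idea as appending '[[:alnum:]]' to the pattern passed to REGEXP_COUNT.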

Adding LIMIT fixes "Invalid digit, Value N" error in Amazon Redshift. Why?

I have a standard listings table on Redshift in which every column is a varchar (because of how the data was loaded into the database).
This query (simplified) gives me error:
with AL as (
select
L.price::int as price
from listings L
where L.price <> 'NULL'
and L.listing_type <> 'NULL'
)
select price from AL
where price < 800
and the error:
-----------------------------------------------
error: Invalid digit, Value 'N', Pos 0, Type: Integer
code: 1207
context: NULL
query: 2422868
location: :0
process: query0_24 [pid=0]
-----------------------------------------------
If I remove the where price < 800 condition, the query returns just fine... but I need the where condition to be there.
I've also checked the number validity of the price field and all look good.
After playing around, this actually makes it work, and I can't quite explain why.
with AL as (
select
L.price::int as price
from listings L
where L.price <> 'NULL'
and L.listing_type <> 'NULL'
limit 10000000000
)
select price from AL
where price < 800
Note that the table has far fewer records than the number stated in the limit.
Can anyone (possibly from the Redshift engineering team) explain why this is the way it is? Possibly something to do with how the query plan is executed and parallelized?
I had a query that could be expressed simply as:
SELECT TOP 10 field1, field2
FROM table1
INNER JOIN table2
ON table1.field3::int = table2.field3
ORDER BY table1.field1 DESC
Removing the explicit cast to ::int solved a similar error for me.
Meanwhile, postgresql locally requires the "::int" to work.
For what it's worth, my local postgresql version is
PostgreSQL 9.6.4 on x86_64-apple-darwin16.7.0, compiled by Apple LLVM version 8.1.0 (clang-802.0.42), 64-bit
Loading CSV data with NaN into AWS Redshift
I found this post while searching Google, but the above link had what I needed. I was importing a numeric column containing the value NaN, which is not supported by the Redshift numeric type.

How to read columns with spaces from coldfusion queries?

I am reading data from a spreadsheet. One of the column names in the spreadsheet contains spaces.
For example, the column names are [first name,last name,roll].
I get a qryObj after reading the spreadsheet.
Now, when I try to read first name from the query
<cfquery dbtype="query" name="getName">
SELECT [first name]
FROM qryObj
</cfquery>
It throws a db error. I have also tried ['first name'], but it still throws an error.
The error is:
Query Of Queries syntax error.
Encountered "[. Incorrect Select List, Incorrect select column
I did crazy stuff like googling to see what people had done in other situations, and tried various SQL approaches to escaping non-standard column names (back ticks, square brackets, double quotes, combos thereof), and drew a blank. So I agree with #da_didi that QoQ/IMQ does not cater for this. You should raise a ticket in the Adobe bug tracker.
You could do SELECT *, which removes the need to reference the column name. Or you could serialize the query, use a string replace to rename the column, deserialize it again, then QoQ on the revised name. I'd only do this with a small amount of data though.
Or you could push back on the owner of the XLS file and say "no can do unless you revise your column names".
You could also perhaps suppress the column names as they stand in the XLS file using excludeHeaderRow, and then specify your own column names. How did I find out one could do that? By RTFMing the <cfspreadsheet> docs.
That's easy:
Query
Select [FIRST NAME]
in output loop of query
["FIRST NAME"]
Try this - set a variable works for me
<cfset first_name = #spreadsheetData['first name'][CurrentRow]#>
You cannot. Best practice: I always replace all spaces with an underscore.
Simple. Just alias the select. Select [FIRST NAME] as FIRSTNAME from qryObj