AWS Athena: Large Case Statement Internal Error - amazon-athena

I've got a very simple SELECT query running and working in Amazon Athena:
SELECT
campaign
FROM table;
However, we've got some data consistency issues: campaign is sometimes an ID and sometimes a name, so we have to map the values so they are all IDs.
To do this I added a large CASE expression which looks something like this:
SELECT
  CASE
    WHEN account = 'something' AND campaign = 'some name' THEN 'some id'
    WHEN account = 'something' AND campaign = 'some other name' THEN 'some other id'
    ELSE campaign
  END AS campaign_id
FROM table;
Except instead of only 2 WHEN branches, there are 2400 of them.
When I run the query it spins for a second, then I get an "Internal Error" and that's it.
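One common alternative to a very large CASE expression is to keep the mappings as rows in a lookup table and join against it. The sketch below illustrates that idea with Python's built-in sqlite3 module rather than Athena itself. The table and column names (`events`, `campaign_map`, `campaign_name`, `campaign_id`) are hypothetical, and whether this avoids the "Internal Error" in Athena is an assumption, not something the question confirms.

```python
import sqlite3


def map_campaigns(rows, mappings):
    """Resolve (account, campaign) rows to campaign ids via a lookup table.

    rows:     iterable of (account, campaign) tuples
    mappings: iterable of (account, campaign_name, campaign_id) tuples
    All table and column names here are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (account TEXT, campaign TEXT)")
    conn.execute(
        "CREATE TABLE campaign_map "
        "(account TEXT, campaign_name TEXT, campaign_id TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    conn.executemany("INSERT INTO campaign_map VALUES (?, ?, ?)", mappings)

    # LEFT JOIN keeps rows without a mapping; COALESCE falls back to the
    # original campaign value, like the ELSE branch of the CASE expression.
    query = """
        SELECT COALESCE(m.campaign_id, e.campaign) AS campaign_id
        FROM events AS e
        LEFT JOIN campaign_map AS m
          ON m.account = e.account AND m.campaign_name = e.campaign
        ORDER BY e.rowid
    """
    result = [row[0] for row in conn.execute(query)]
    conn.close()
    return result
```

With this shape, adding a mapping means adding a row to the lookup table rather than editing the query text.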

Related

Column does not exist AWS Timestream Query error

I am trying to apply a WHERE clause on a dimension of my AWS Timestream records. However, I got the error: Column does not exist
Here is my table schema:
The table schema
The table measure
First, I will show all the sample data I put in the table
SELECT username, time, manual_usage
FROM "meter-reading"."meter-metrics"
ORDER BY time DESC
LIMIT 4
The result:
Result
What I wanted to do is to query and filter the records by the Dimension ("username" specifically).
SELECT *
FROM "meter-reading"."meter-metrics"
WHERE measure_name = "OnceADay"
ORDER BY time DESC LIMIT 10
Then I got the error: Column 'OnceADay' does not exist
I searched for any quotas on dimension names and checked my schema for errors:
https://docs.aws.amazon.com/timestream/latest/developerguide/ts-limits.html#limits.naming
https://docs.aws.amazon.com/timestream/latest/developerguide/ts-limits.html#limits.system_identifier
But I didn't find that my "username" dimension violates any of the above rules.
I checked some other queries on the AWS blog, where the author used the WHERE clause to filter on a dimension normally:
https://aws.amazon.com/blogs/database/effective-queries-for-common-query-patterns-in-amazon-timestream/
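I figured it out after trying the sample code. It turned out to be a silly mistake.
Using single quotes (') instead of double quotation marks (") around the value solved my problem. Double quotes mark an identifier such as a column name, which is why the query looked for a column called OnceADay.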
SELECT *
FROM "meter-reading"."meter-metrics"
WHERE username = 'OnceADay'
ORDER BY time DESC LIMIT 10
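When such a query is assembled in code, the quoting rule can be made explicit with two small helpers: one for identifiers (double quotes) and one for string literals (single quotes). This is a minimal Python sketch. The helper names are hypothetical, and escaping by doubling the quote character is assumed from standard SQL rather than taken from the Timestream documentation.

```python
def quote_identifier(name: str) -> str:
    """Wrap a database, table, or column name in double quotes."""
    # Assumption: embedded double quotes are escaped by doubling them.
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Wrap a string value in single quotes."""
    # Assumption: embedded single quotes are escaped by doubling them.
    return "'" + value.replace("'", "''") + "'"


def build_filter_query(database, table, column, value, limit=10):
    """Build a SELECT that filters one column by a string value."""
    return (
        f"SELECT * FROM {quote_identifier(database)}.{quote_identifier(table)} "
        f"WHERE {quote_identifier(column)} = {quote_literal(value)} "
        f"ORDER BY time DESC LIMIT {int(limit)}"
    )
```

Calling `build_filter_query("meter-reading", "meter-metrics", "username", "OnceADay")` yields a query whose value is in single quotes and whose names are in double quotes.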

Netsuite suiteql how to get all available tables to query?

I am using Postman and Netsuite's SuiteQL to query some tables. I would like to write two queries. One is to return all items (fulfillment items) for a given sales order. Two is to return all sales orders that contain a given item. I am not sure what tables to use.
The sales order I can return from something like this.
"q": "SELECT * FROM transaction WHERE Type = 'SalesOrd' and id = '12345'"
The item I can get from this.
"q": "SELECT * FROM item WHERE id = 1122"
I can join transaction and transactionline for the sales order, but that does not give me the items.
"q": "SELECT * from transactionline tl join transaction t on tl.transaction = t.id where t.id in ('12345')"
The best reference I have found is the Analytics Browser, https://system.netsuite.com/help/helpcenter/en_US/srbrowser/Browser2021_1/analytics/record/transaction.html, but it does not show relationships like an ERD diagram.
What tables do I need to join to say, given this item id 1122, return me all sales orders (transactions) that have this item?
You are looking for TransactionLine.item. That will allow you to query transaction lines whose item is whatever internal id you specify.
{
"q": "SELECT Transaction.ID FROM Transaction INNER JOIN TransactionLine ON TransactionLine.Transaction = Transaction.ID WHERE type = 'SalesOrd' AND TransactionLine.item = 1122"
}
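The same join can be read in the other direction for the first query in the question (all items on a given sales order). The Python sketch below only builds the two request bodies and does not send them. The function names are hypothetical, and the second query is an assumption based on the `TransactionLine.item` and `TransactionLine.Transaction` fields used above. Its result may need further filtering, for example to drop lines that carry no item.

```python
import json


def orders_containing_item(item_id: int) -> str:
    """Request body: sales orders that have a line for the given item."""
    query = (
        "SELECT Transaction.ID FROM Transaction "
        "INNER JOIN TransactionLine "
        "ON TransactionLine.Transaction = Transaction.ID "
        "WHERE Transaction.type = 'SalesOrd' "
        f"AND TransactionLine.item = {int(item_id)}"
    )
    return json.dumps({"q": query})


def items_on_order(order_id: int) -> str:
    """Request body: item ids on the lines of the given sales order."""
    query = (
        "SELECT TransactionLine.item FROM TransactionLine "
        "INNER JOIN Transaction "
        "ON TransactionLine.Transaction = Transaction.ID "
        "WHERE Transaction.type = 'SalesOrd' "
        f"AND Transaction.ID = {int(order_id)}"
    )
    return json.dumps({"q": query})
```

Passing the ids through `int()` keeps arbitrary text out of the query string, which matters if the ids come from user input.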
If you are serious about getting all available tables to query take a look at the metadata catalog. It's not technically meant to be used for learning SuiteQL (supposed to make the normal API Calls easier to navigate), but I've found the catalog endpoints are the same as the SuiteQL tables for the most part.
https://{{YOUR_ACCOUNT_ID}}.suitetalk.api.netsuite.com/services/rest/record/v1/metadata-catalog/
Headers:
Accept: application/schema+json
You can review all the available records, fields and joins in the Record Catalog page (Customization > Record Catalog).

Joining based on email within strings on AWS Athena

I have two S3 buckets that I am looking to join on Athena. In the first bucket, I have an email address in a CSV file with an email column.
sample#email.com
In the other bucket, I have a JSON file with nested email addresses used by the client. The way this has been set up in Glue means the data looks like this:
[sample#email.com;email#sample.com;com#email.sample]
I am trying to join the data by finding the email from the first bucket inside the string from the second bucket. I have tried:
REGEXP_LIKE(lower("emailaddress"), lower("emails"))
with no success. I have also tried:
select "source".*, "target".*
FROM "source"
inner join "target"
on "membername" = "first_name"
and "memberlastname" = "last_name"
and '%'||lower("emailaddress")||'%' like lower("emails")
Also with no success. I am doing something wrong, but I can't see where.
It seems you need to reverse your like arguments:
-- sample data
WITH dataset (id, email) AS (
VALUES (1,'sample#email.com'),
(2,'non-present#email.com')
),
dataset2 (emails) as (
VALUES ('[sample#email.com;email#sample.com;com#email.sample]')
)
-- query
SELECT *
FROM dataset
INNER JOIN dataset2 on
lower(emails) like '%' || lower(email) || '%'
Output:
id | email            | emails
1  | sample#email.com | [sample#email.com;email#sample.com;com#email.sample]
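A substring match with LIKE also matches partial addresses: for example, `ample#email.com` is found inside `sample#email.com`. If exact matches are needed, the bracketed string can be split into its elements first. The Python sketch below shows that logic outside Athena. The function names are hypothetical, and the `[a;b;c]` layout is assumed from the sample data in the question.

```python
def parse_email_list(raw: str) -> set:
    """Split a '[a;b;c]' string into a set of lower-cased addresses."""
    inner = raw.strip().strip("[]")
    return {part.strip().lower() for part in inner.split(";") if part.strip()}


def email_in_list(email: str, raw_list: str) -> bool:
    """True only if the address equals one whole element of the list."""
    return email.strip().lower() in parse_email_list(raw_list)
```

Unlike the LIKE join, `email_in_list("ample#email.com", "[sample#email.com]")` is false, because the comparison is made per element rather than per substring.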

AWS Athena - How to handle GENERIC_INTERNAL_ERROR?

I have the following query used on one of my datasets in Athena.
CREATE TABLE clean_table
WITH (format='Parquet', external_location='s3://test123data') AS
SELECT npi,
       provider_last_name,
       provider_first_name,
       CASE
           WHEN REPLACE(num_entitlement_medicare_medicaid, ',', '') = '' THEN NULL
           ELSE CAST(REPLACE(num_entitlement_medicare_medicaid, ',', '') AS DECIMAL)
       END AS medicare_medicaid_entitlement,
       CASE
           WHEN REPLACE(total_submitted_charge_amount, ',', '') = '' THEN NULL
           ELSE CAST(REPLACE(total_submitted_charge_amount, ',', '') AS DECIMAL)
       END AS total_submitted_charge_amount
FROM cmsaggregatepayment2017
Unfortunately, after I run this query I get the error below:
GENERIC_INTERNAL_ERROR: Path is not absolute: s3://test123data. You may need to manually clean the data at location 's3://aws-athena-query-results-785609744360-us-east-1/Unsaved/2019/12/15/tables/03d3cedf-0101-43cb-91fd-cc8070db0e37' before retrying. Athena will not delete data in your account.
Can someone walk me through how to handle this?
What do I have to do on the bucket since it is empty?
Thanks in advance!
It appears that this message is referring to the Query result location in which Athena automatically stores the output of your queries.
This is useful for running queries on the results of queries, or for simply having a copy of the query output.
See: Working with Query Results, Output Files, and Query History - Amazon Athena
You can specify a new output location by clicking the settings link in the Athena console and then providing a Query result location path, such as: s3://my-bucket/athena-output/
I'm not sure what is causing your specific error, but make sure you append a trailing / to the location. You might also want to create a new bucket for that output.
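The trailing-slash advice can be applied in code before a location is passed to Athena. The error message quotes `s3://test123data`, the value given as `external_location`, so the same check may be worth applying there too. That is an assumption, not a confirmed cause. A minimal Python sketch, with a hypothetical helper name:

```python
def normalize_s3_location(location: str) -> str:
    """Return the S3 location with exactly one trailing slash added if missing."""
    prefix = "s3://"
    if not location.startswith(prefix):
        raise ValueError(f"not an S3 location: {location!r}")
    if not location[len(prefix):].strip("/"):
        raise ValueError("S3 location has no bucket name")
    return location if location.endswith("/") else location + "/"
```

For example, `normalize_s3_location("s3://test123data")` returns `s3://test123data/`, and a location that already ends in `/` is returned unchanged.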

FQL: Trouble joining tables where id is in format USERID_LINKID

I am running the following FQL via the PHP SDK:
{
"posts" : "SELECT post_id, actor_id, message, type FROM stream WHERE type = '80' AND message != '' AND filter_key in (SELECT filter_key FROM stream_filter WHERE uid='588853265' AND type='newsfeed') AND is_hidden = 0 ORDER BY created_time DESC LIMIT 50",
"actors" : "SELECT uid,name FROM user WHERE uid IN (SELECT actor_id FROM #posts)",
"links" : "SELECT owner, url FROM link WHERE link_id IN (SELECT post_id FROM #posts)"
}
I get results as expected for posts and for actors but not for links (results are empty). I believe the problem is that the link table uses a normal ID but my 'posts' from 'stream' show the ID in the format: USERID_LINKID.
I have played around with substr() and strlen() to get it to work with no luck.
There doesn't seem to be a way to get a link_id from a stream post. Your query is getting the id of the post, but not the link.
However, I don't think you need it. Get rid of the #links sub-query and add attachment to the SELECT statement in your #posts sub-query. You can get the link url at attachment->media->url.
One other critique: You might want to add additional sub-queries for #actors. The actor_id returned by the stream table can be a user, page, or group. Your FQL only resolves users. You'll end up with unknown IDs for links posted by groups and pages.