I'm a SQL guy getting pulled into setting up an SSAS cube and pulling it into Power BI. I'm experimenting with MDX and have run into something I haven't been able to find an answer for.
Scenario
I have a huge table full of data captured from our SQL Server cohort, and I'm using this to inform a decomposition tree showing a measure and then breaking it down by Instance Name, Server Name, Database Name, Client Hostname and Application Name. My starting MDX query is:
SELECT
{[Measures].[CPU]} ON COLUMNS,
{(
[Profile Data].[Instance Name].children
, [Profile Data].[Server Name].children
, [Profile Data].[Database Name].children
, [Profile Data].[Host Name].children
, [Profile Data].[Application Name].children
)} ON ROWS
FROM [MyCube]
Pretty simple, right? And the actual result is what I need it to be - the sum of CPU time in milliseconds over the capture period, aggregated by the five attributes.
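Coming at this from SQL, the way I read that result is roughly the following (an analogy only, against a hypothetical flat ProfileData table; without NON EMPTY the MDX cross product also returns empty combinations, so it's not an exact match):
-- hypothetical relational equivalent of the MDX above
SELECT InstanceName, ServerName, DatabaseName, HostName, ApplicationName,
       SUM(CPUMilliseconds) AS CPU
FROM ProfileData
GROUP BY InstanceName, ServerName, DatabaseName, HostName, ApplicationName;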
The Problem
When running the MDX query in SSMS, I get a nice caption for my CPU column and nothing for my five aggregators. When I run the same query in Power BI, I get a slightly more informative column name:
[Profile Data].[Instance Name].[Instance Name].[MEMBER_CAPTION]
And so on for the other four. What I'd like is to be able to control that member caption, whether that's in the Dimension/Cube, or just in the MDX query.
What I've Tried
I've tried declaring them as calculated sets like:
SET [Instance Name] AS [Profile Data].[Instance Name].children
And even tried appending:
SET [Instance Name] AS [Profile Data].[Instance Name].children, CAPTION = 'Instance Name'
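In full, that experiment looked roughly like this (a sketch; I'm not even sure CAPTION is a legal property on a named set, which may be the whole problem):
-- named-set variant of the original query; the CAPTION part is what I'm unsure about
WITH SET [Instance Name] AS [Profile Data].[Instance Name].children
SELECT
{[Measures].[CPU]} ON COLUMNS,
{[Instance Name]} ON ROWS
FROM [MyCube]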
To no avail. I suspect the actual answer lies somewhere in the SSAS database. Each Attribute has a caption in the default language, but I have no idea how to go about setting a caption for the members/children.
Any direction would be much appreciated.
I am trying to apply a WHERE clause on a DIMENSION of my AWS Timestream records. However, I get the error: Column does not exist.
Here is my table schema:
(screenshot of the table schema)
(screenshot of the table measures)
First, I will show all the sample data I put in the table:
SELECT username, time, manual_usage
FROM "meter-reading"."meter-metrics"
ORDER BY time DESC
LIMIT 4
The result:
(screenshot of the query results)
What I wanted to do was query and filter the records by the Dimension ("username" specifically).
SELECT *
FROM "meter-reading"."meter-metrics"
WHERE measure_name = "OnceADay"
ORDER BY time DESC LIMIT 10
Then I got the Error: Column 'OnceADay' does not exist
I searched for any quotas on Dimension names and checked my schema for errors:
https://docs.aws.amazon.com/timestream/latest/developerguide/ts-limits.html#limits.naming
https://docs.aws.amazon.com/timestream/latest/developerguide/ts-limits.html#limits.system_identifier
But I didn't find that my "username" dimension violated any of the above rules.
I also checked some other queries from an AWS blog post, where the author used a WHERE clause as a Dimension filter without any problem:
https://aws.amazon.com/blogs/database/effective-queries-for-common-query-patterns-in-amazon-timestream/
I figured it out after trying the sample code. Turns out it was a silly mistake, I believe.
Using single quotation marks (') instead of double quotation marks (") solved my problem.
SELECT *
FROM "meter-reading"."meter-metrics"
WHERE username = 'OnceADay'
ORDER BY time DESC LIMIT 10
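For anyone hitting the same error: Timestream's SQL follows the usual convention where double quotes delimit identifiers and single quotes delimit string literals, so a double-quoted value is parsed as a column reference:
-- "OnceADay" is parsed as a column name, hence: Column 'OnceADay' does not exist
WHERE username = "OnceADay"
-- 'OnceADay' is parsed as a string literal, which is what I wanted
WHERE username = 'OnceADay'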
I have a very large (3.5B records) table that I want to update/insert (upsert) using the MERGE statement in BigQuery. The source table is a staging table that contains only the new data, and I need to check if the record with a corresponding ID is in the target table, updating the row if so or inserting if not.
The target table is partitioned by an integer field called IdParent, and the matching is done on IdParent and another integer field called IdChild. My merge statement/script looks like this:
declare parentList array<int64>;
set parentList = array(select distinct IdParent from dataset.Staging);
merge into dataset.Target t
using dataset.Staging s
on
-- target is partitioned by IdParent, do this for partition pruning
t.IdParent in unnest(parentList)
and t.IdParent = s.IdParent
and t.IdChild = s.IdChild
when matched and t.IdParent in unnest(parentList) then
update
set t.Column1 = s.Column1,
t.Column2 = s.Column2,
...<more columns>
when not matched and IdParent in unnest(parentList) then
insert (<all the fields>)
values (<all the fields>)
;
So I:
Pull the IdParent list from the staging table to know which partitions to prune
Limit the partitions of the target table in the join predicate
Also limit the partitions of the target table in the matched/not-matched conditions
The total size of dataset.Target is ~250GB. If I put this script in my BQ editor and remove all the IdParent in unnest(parentList) predicates, the editor shows ~250GB to bill (as expected, since there's no partition pruning). If I add IdParent in unnest(parentList) back in, so the script is exactly as you see it above (i.e. attempting to partition prune), the editor shows ~97MB to bill. However, when I look at the query results, I see that it actually billed ~180GB.
The target table is also clustered on the two fields being matched, and I'm aware that the benefits of clustering are typically not reflected in the editor's estimate. However, my understanding is that clustering should only make the bytes billed smaller... I can't think of any reason why this would happen.
Is this a BQ bug, or am I just missing something? BigQuery doesn't even say "the script is estimated to process XX MB"; it says "This will process XX MB" and then processes far more.
That's very interesting. What you did seems totally correct.
It seems the BQ query planner interprets your SQL correctly and knows that partition pruning is available, but when the script executes, it fails to prune.
Try removing t.IdParent in unnest(parentList) from both the WHEN MATCHED and WHEN NOT MATCHED clauses to see if the issue still happens, that is:
declare parentList array<int64>;
set parentList = array(select distinct IdParent from dataset.Staging);
merge into dataset.Target t
using dataset.Staging s
on
-- target is partitioned by IdParent, do this for partition pruning
t.IdParent in unnest(parentList)
and t.IdParent = s.IdParent
and t.IdChild = s.IdChild
when matched then
update
set t.Column1 = s.Column1,
t.Column2 = s.Column2,
...<more columns>
when not matched then
insert (<all the fields>)
values (<all the fields>)
;
It would be a good idea to submit a bug to BigQuery if it couldn't be resolved.
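Either way, it's worth checking what the MERGE actually billed from the job metadata rather than from the editor. A minimal sketch, assuming your jobs run in the US multi-region (adjust the region qualifier to match):
-- recent MERGE jobs with their processed and billed bytes
SELECT job_id, creation_time, total_bytes_processed, total_bytes_billed
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE statement_type = 'MERGE'
  AND creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
ORDER BY creation_time DESC;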
I am running a query that works perfectly on AWS Athena; however, when I use Athena as a data source in QuickSight and try to run the query, it keeps giving me a "QuickSight could not generate any output column after applying transformation" error message.
Here is my query:
WITH register as (
select created_at as register_time
, serial_number
, node_name
, node_visible_time_name
from table1
where type = 'register'),
bought as (
select created_at as bought_time
, node_name
, serial_number
from table1
where type = 'bought')
SELECT r.node_name
, r.serial_number
, r.register_time
, b.bought_time
, r.node_visible_time_name
FROM register r
LEFT JOIN bought b
ON r.serial_number = b.serial_number
AND r.node_name = b.node_name
AND b.bought_time between r.register_time and date(r.register_time + INTERVAL '1' DAY)
LIMIT 11;
I did some searching and found a similar question, Quicksight custom query postgresql functions. In that case, adding INTERVAL '1' DAY caused the problem. I've tried other alternatives, such as the rewrite sketched below, but no luck. Furthermore, running the query without it still produces the same error message.
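A sketch of that kind of alternative, using Athena's date_add instead of the INTERVAL literal (illustrative only; column names as in the query above):
-- same one-day window, expressed without the INTERVAL literal
AND b.bought_time between r.register_time and date_add('day', 1, r.register_time)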
No other lines seem to be getting transformed in any other way.
Re-creating the dataset and running the exact same query works.
I think queries that have been run on an existing dataset transform the data. Please let me know if anyone knows why this is so.
I want to create a chart from a SQL join query between two tables in Superset.
For example, I go to SQL Lab and execute this query:
select film, count("film")
from rental r, payment p
where r.rental_id = p.rental_id
group by "film"
order by count("film")
limit 20;
This returns a result, but how do I get it into a chart?
How do I create a chart from a SQL query?
In order to visualize the results from a query executed in SQL Lab, you first need to click on Explore (underneath the Results tab).
Once you are in exploration mode, you can change the "Visualization Type", under "Datasource & Chart Type".
I believe I've done everything right when creating my graphite DB. Grafana can see the data but won't let me select all the fields when I try to "Add Query".
Output from my server shows that the DB is working:
show measurements
name: measurements
name
PORT
select * from "PORT"
name: PORT
time CardNo Counter Nodename PortNo value
---- ------ ------- -------- ------ -----
1511214407000000000 18 bcast_inpackets ALPRGAGQPN2 1 500
However, when I try to "Add Query" in Grafana, I can see PORT in "FROM" (which is what I want), but in the "WHERE" section, when I try to narrow my selection using CardNo, Counter, etc., it appears to behave randomly. If I select CardNo first, it will let me select 18 (see picture below), but then clicking "+" to add another criterion doesn't display the option for, say, "PortNo" (all I get is an empty dialog box). I can enter the field value manually (e.g. PortNo), but other users will be plotting graphs and won't necessarily know the underlying schema. Also, if I select Nodename first, then I can select CardNo (weird). I'd like the end user to be able to specify ALL the fields (in this case CardNo, Counter, Nodename and PortNo).
My graphite template is this:
"[[graphite]]
# Determines whether the graphite endpoint is enabled.
enabled = true
database = "graphite"
# retention-policy = ""
bind-address = ":2003"
protocol = "tcp"
# consistency-level = "one"
templates = [ "ASR.PORT.* .measurement.Nodename.CardNo.PortNo.Counter"
]
and the data I feed to InfluxDB to test my setup is:
echo "ASR.PORT.ALPRGAGQPN2.18.1.bcast_inpackets 500 `date +%s`" | nc localhost 2003
Firstly, the template is better written as:
"ASR.PORT.* .measurement.Nodename.CardNo.PortNo.field"
This makes bcast_inpackets, and any other value after PortNo, a field containing the data. It reduces series cardinality, which improves performance and scalability, by combining all counters into multiple fields on the same series rather than into separate series, each carrying unique tags and its own value field.
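To make the difference concrete, the test line from the question would be stored roughly as these line-protocol equivalents (a sketch; timestamps omitted):
# original template - Counter becomes a tag, the reading lands in the generic "value" field:
PORT,Nodename=ALPRGAGQPN2,CardNo=18,PortNo=1,Counter=bcast_inpackets value=500
# field template - the counter name becomes the field key on one series:
PORT,Nodename=ALPRGAGQPN2,CardNo=18,PortNo=1 bcast_inpackets=500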
Grafana's InfluxDB query builder filters tag values by the tags you have already selected. In other words, if you select PortNo=1 and try to select another tag, only tag keys present on series where PortNo=1 will be shown.
If you look at the queries Grafana runs in the browser, you will see something like show tag keys from PORT where PortNo='1' when PortNo=1 is already selected, and different queries for the other tags.
This is why you may not see other tags, and why the tags you see depend on the tags already selected. This is by design, so if you want different behaviour you will need to adjust the schema by, for example, making PortNo and CardNo into fields instead of tags.
You might also be interested in InfluxGraph, which can query InfluxDB via the Graphite API and supports the same template configuration as InfluxDB.