I have a Google Datastore table in this format:
"KEY","Country","PersonName"
"1","USA","Joe"
"2","USA","Dan"
"3","Canada","Willo"
I want to count how many registered people I have in each country: a combination of a count aggregation and distinct.
Expected output:
"USA", 2
"Canada", 1
The solution I'm looking for should be based on GQL.
Thanks for the help.
I believe this will require multiple queries.
The first one gets the list of countries [1], and then another one gets the count for a given country [2].
[1] SELECT DISTINCT ON (Country) Country FROM Test
[2] AGGREGATE COUNT(*) OVER (SELECT * FROM Test WHERE Country="USA")
This, I believe, is due to Datastore's limitations, as mentioned here.
Please note "Test" is my Google Datastore table ("Kind") name here.
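For instance (sticking to the answer's own queries; the country values are whatever [1] returns), you would then run the aggregation once per country:
AGGREGATE COUNT(*) OVER (SELECT * FROM Test WHERE Country="USA")
AGGREGATE COUNT(*) OVER (SELECT * FROM Test WHERE Country="Canada")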
I am trying to apply a WHERE clause on a dimension of AWS Timestream records. However, I get the error: Column does not exist.
Here is my table schema and its measures (screenshots omitted).
First, here is the sample data I put in the table:
SELECT username, time, manual_usage
FROM "meter-reading"."meter-metrics"
ORDER BY time DESC
LIMIT 4
The result (screenshot omitted).
What I wanted to do is query and filter the records by dimension ("username" specifically).
SELECT *
FROM "meter-reading"."meter-metrics"
WHERE username = "OnceADay"
ORDER BY time DESC LIMIT 10
Then I got the error: Column 'OnceADay' does not exist.
I searched for any quotas on dimension names and checked for errors in my schema:
https://docs.aws.amazon.com/timestream/latest/developerguide/ts-limits.html#limits.naming
https://docs.aws.amazon.com/timestream/latest/developerguide/ts-limits.html#limits.system_identifier
But I didn't find that my "username" dimension violates any of the above rules.
I also checked some other queries from an AWS blog post, where the author uses the WHERE clause for dimension filters normally:
https://aws.amazon.com/blogs/database/effective-queries-for-common-query-patterns-in-amazon-timestream/
I figured it out after trying the sample code. Turns out it was a silly mistake, I believe.
Using single quotes (') instead of double quotation marks (") solved my problem.
SELECT *
FROM "meter-reading"."meter-metrics"
WHERE username = 'OnceADay'
ORDER BY time DESC LIMIT 10
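The underlying reason, as far as I understand Timestream's SQL dialect: double quotes delimit identifiers (table and column names), while single quotes delimit string literals. So the two predicates are parsed very differently:
-- "OnceADay" in double quotes is read as a column reference, hence "Column 'OnceADay' does not exist"
WHERE username = "OnceADay"
-- 'OnceADay' in single quotes is a string literal, which is what was intended
WHERE username = 'OnceADay'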
I am trying to create on-the-fly currency conversion from many input currencies (InvoicesHeaders rows have different currencies, so each row has an amount and the currency code for that amount) to many output currencies (each affiliate wants to see figures in its own currency).
Therefore I end up with a many-to-many join between the invoice table and the currency table. To join them, I create in SQL a concatenated field combining the day and the currency code.
Then (reusing a tutorial from the internet) I created a calculation that does a lookup from the invoice to the rate:
Amount adj :=
SUMX (
    Invoices,
    Invoices[TotalInvoiceAmount]
        / LOOKUPVALUE (
            ExchangeRatesPerDay[Rate],
            ExchangeRatesPerDay[ToCurrencyConcatenatedday112],
            Invoices[CurrencyCodeConcatenateInvoiceDate112]
        )
)
However, when I try to use this measure in Excel (filtering on one currency at a time, of course), I get an error message saying that many rows were passed but only one was expected.
From the error message, it looks like the lookup is returning multiple values, which is strange because in Excel I am filtering on one currency. Therefore, for each combination of day + currency code there should be only one row. I checked the SQL side with this query:
with cte as (
SELECT [RateTypeName]
,[FromCurrency]
,[ToCurrency]
,[StartDate]
,[Rate]
,[EndDate]
,[ConversionFactor]
,[RateTypeDescription]
,[dday]
,[dday112]
,[ToCurrencyConcatenatedday112]
,[FromCurrencyConcatenatedday112]
, count(*) over (partition by [ToCurrencyConcatenatedday112],FromCurrency ) as co
FROM [stg].[ExchangeRatesPerDay]
)
select * from cte where co>1
And it doesn't return any records.
I will appreciate any ideas you may have.
Regards,
Vincent
I don't understand why my answer was deleted. Anyway, I am posting it back: I found this website, https://www.kasperonbi.com/currency-conversion-in-dax-for-power-bi-and-ssas/, that provides an answer. I have been using this logic in production for over a month and it works great.
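For completeness, the core of that article's approach is to convert at the day grain, and only when a single output currency is selected. A minimal sketch, not the article's exact code (the Currencies and Date table names and their relationships to Invoices and ExchangeRatesPerDay are assumptions):
Amount adj (selected currency) :=
IF (
    HASONEVALUE ( Currencies[CurrencyCode] ),   -- only convert when one output currency is selected
    SUMX (
        VALUES ( 'Date'[Date] ),                -- iterate day by day
        CALCULATE ( SUM ( Invoices[TotalInvoiceAmount] ) )
            * CALCULATE ( MIN ( ExchangeRatesPerDay[Rate] ) )   -- the one rate for that day and currency
    )
)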
I have 40 tables. One table has 20 rows, and one of the columns has 1385 distinct values.
I would like to use this in a relationship with another table.
TableName (1385 rows), column Name (1385 distinct values)
But when I try to do this in Power BI under Manage Relationships, it only accepts the "Many-to-many" relationship option. It reports that none of the columns are "unique".
Well, the data in the column is unique. So how can I configure this column to be unique, so I can use it in a "One-to-many" relationship?
Do I have to edit the DAX expression and put the "DISTINCT" keyword in the expression for that column? And how?
Now I have:
}, {"Columnname", Int64.Type}, {
What you can try is to perform Remove Duplicates on that table (I know it already contains distinct values, but you can give it a try) and/or just load the data again.
The best way would be to group your data in the Query Editor. This way your table has only distinct values and you can create your relationship.
In the Query Editor, under Home > Group By, you can group by your column.
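As a rough illustration, the step that Group By generates in M looks like this (the column name "Name" and the count aggregation are assumptions for the example):
// group on Name, keeping one row per distinct value plus a row count
#"Grouped Rows" = Table.Group(Source, {"Name"}, {{"RowCount", each Table.RowCount(_), Int64.Type}})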
Example (screenshots omitted): the source table, the grouped table, the one-to-many relationship, and the result.
I hope this helps.
I want to create an application that uses Extended Events to analyze the usage of Azure Analysis Services models.
Specifically, I want to know what measures and dimensions the end users are using.
I look at the QueryEnd event and try to parse the TextData field. Depending on the tool used for querying the model, I get either MDX or DAX in TextData.
I think I have managed to parse the MDX with this RegEx: (\[[\w ]+\]\.\[[\w ]+\](?:\.(?:Members|\[Q\d\]))?)
(from this post: Regular expression for extracting element from MDX Query)
Now, parsing the DAX is the problem. If I query the model from e.g. Power BI, I get DAX like this:
EVALUATE
TOPN(
502,
SUMMARIZECOLUMNS(
ROLLUPADDISSUBTOTAL('Product'[Color], "IsGrandTotalRowTotal"),
"Order_Count", 'SalesOrderDetail'[Order Count]
),
[IsGrandTotalRowTotal],
0,
[Order_Count],
0,
'Product'[Color],
1
)
ORDER BY
[IsGrandTotalRowTotal] DESC, [Order_Count] DESC, 'Product'[Color]
What I would like to match with the RegEx is:
'Product'[Color] and 'SalesOrderDetail'[Order Count]
And how would I know that Order Count is used as a measure while Color is an attribute of the Product dimension? I guess I won't?
Thanks a lot,
Nicolaj
I think I just found a possible solution for parsing both DAX and MDX queries:
([\[\'][\w ]+[\]\']\.?[\[\'][\w ]+[\]\'])(?!.*\1)
This gives me what I need, without duplicates. Improvements are welcome :-)
For disambiguating columns from measures, I would query the DMVs for the deployed model to get a list of columns and a list of measures. Then you can just look up your parsed tokens in the two lists:
DMV docs
Column DMV
Measure DMV.
Note that measure names are globally unique among the population of column and measure names, so that is an easy lookup (just throw away the table reference). Column names are only unique within the table they belong to.
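For example, something along these lines (a sketch; the TMSCHEMA DMVs below are the ones I'd expect to use here, queried against the deployed model):
SELECT [Name] FROM $SYSTEM.TMSCHEMA_MEASURES
SELECT [TableID], [ExplicitName] FROM $SYSTEM.TMSCHEMA_COLUMNS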
It looks like you've already got a regex that's working for you, so run with that.
I'll also note that almost all PBI visuals that return anything other than a simple list (slicers or one-column tables, e.g., will return just a list) will follow the pattern of the query you shared.
Measures should all be in later args to SUMMARIZECOLUMNS. The pattern is:
SUMMARIZECOLUMNS (
<grouping columns, optionally in a ROLLUPADDISSUBTOTAL>,
<filters, defined above in VARs>,
<measures as ("Alias", [Measure]) pairs>
)
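For a hand-written query of the same shape (a sketch reusing the tables from your query; the filter value "Red" is made up), the pieces land exactly where the pattern says:
DEFINE
    VAR __Filter = TREATAS ( { "Red" }, 'Product'[Color] )   -- filter, defined above in a VAR
EVALUATE
SUMMARIZECOLUMNS (
    'Product'[Color],                                   -- grouping column
    __Filter,                                           -- filter
    "Order_Count", 'SalesOrderDetail'[Order Count]      -- ("Alias", [Measure]) pair
)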
From
https://cloud.google.com/bigquery/docs/partitioned-tables:
you can shard tables using a time-based naming approach such as [PREFIX]_YYYYMMDD
This enables me to do:
SELECT count(*) FROM `xxx.xxx.xxx_*`
and query across all the shards. Is there a special notation that queries only the latest shard? For example say I had:
xxx_20180726
xxx_20180801
could I do something along the lines of
SELECT count(*) FROM `xxx.xxx.xxx_{{ latest }}`
to query xxx_20180801?
SINGLE QUERY, INSPIRED BY Mikhail Berlyant:
SELECT COUNT(*) AS c
FROM `XXX.PREFIX_*`
WHERE _TABLE_SUFFIX IN (
  SELECT SUBSTR(MAX(table_id), LENGTH('PREFIX_') + 1)
  FROM `XXX.__TABLES_SUMMARY__`
  WHERE table_id LIKE 'PREFIX_%'
)
If you do care about cost (meaning how many tables will be scanned by your query), the only way is to do it in two steps, like below.
First query
#standardSQL
SELECT SUBSTR(MAX(table_id), LENGTH('PREFIX_') + 1)
FROM `xxx.xxx.__TABLES_SUMMARY__`
WHERE table_id LIKE 'PREFIX_%'
Second Query
#standardSQL
SELECT COUNT(*)
FROM `xxx.xxx.PREFIX_*`
WHERE _TABLE_SUFFIX = '<result of first query>'
So, if the result of the first query is 20180801, the second query will obviously look like this:
#standardSQL
SELECT COUNT(*)
FROM `xxx.xxx.PREFIX_*`
WHERE _TABLE_SUFFIX = '20180801'
If you don't care about cost but rather just need the result, you can easily combine the above two queries into one. But again, remember: even though the result will come out of the last table only, the cost will be as if you queried all tables that match xxx.xxx.PREFIX_*.
I forgot to mention (even though it should be obvious): of course, when you have only COUNT(1) in your SELECT, the cost will be 0 (zero) for both options. But in reality, you will most likely have something more valuable than just COUNT(1).
I know this is kind of an old thread, but I was surprised that no one offered an answer using variables.
Héctor Neri already mentioned this in the comments, but I thought it might be better to have an actual answer with sample code posted.
#standardSQL
DECLARE SHARD_DATE STRING;
SET SHARD_DATE=(
SELECT MAX(REPLACE(table_name,'{TABLE}_',''))
FROM `{PRJ}.{DATASET}.INFORMATION_SCHEMA.TABLES`
WHERE table_name LIKE '{TABLE}_20%'
);
SELECT * FROM `{PRJ}.{DATASET}.{TABLE}_*`
WHERE _TABLE_SUFFIX = SHARD_DATE
Make sure to replace the {PRJ}, {DATASET}, and {TABLE} placeholders with your table location.
If you run this in the BigQuery web UI, you will see this message:
WARNING: Could not compute bytes processed estimate for script.
But you can see that the variable properly reduces the table scan to the latest shard and does not cause any extra cost after running the script.