I have a cluster that I use, among other things, for reporting via Power BI. For this I created views that expose only the required fields so the queries run faster. If the source table is sorted by date and the view is 'select fields from table;', will a query against the view still use the date sort order when I filter with WHERE on that field?
Any recommendations for better performance?
Thank you!
For better performance in Redshift, it is essential to set the sort key, distribution key, and column encoding properly. I assume you want to generate date-wise reports. In that case, the "date" column should be the distribution key. Do not compress the "date" column; that is, keep its ENCODING as RAW (no compression).
Then you can use the "date" column in a COMPOUND sort key. If there is another column you also filter on, use that column as the first key and the "date" column as the second key in the sort key order. Otherwise, define the sort key on the "date" column alone.
I have a BigQuery table, partitioned by date (for everyday there is one partition).
I would like to add various columns sometimes populated and sometimes missing and a column for a unique-id.
The data need to be searchable through a unique id. The other use case is to aggregate per column.
This unique id will have a cardinality of millions per day.
I would like to use the unique-id for clustering.
Is there any limitation on this? Has anyone tried it?
Clustering on an id column is a valid use case; the number of distinct values shouldn't cause any limitations.
My dimension tables contain more rows than my fact table. I would like my dimension table fields to show only the values that exist in the fact table when they are used as a filter in the filter panel. What cleaning or modeling steps are best to achieve this?
I also know how to write sql if that is an option for an answer.
Rather than using a table for your dimension, use a view that has an inner join to the fact table.
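A minimal sketch of that idea, using Python's built-in sqlite3 module only so it can be run anywhere. The table and column names (dim_product, fact_sales, dim_product_used) are made up for illustration, and your own database's SQL dialect may differ slightly. The DISTINCT matters: without it the inner join returns one dimension row per matching fact row.

```python
import sqlite3

# In-memory database with hypothetical dimension and fact tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY, product_id INTEGER, amount REAL);

INSERT INTO dim_product VALUES (1, 'Apple'), (2, 'Banana'), (3, 'Cherry');
INSERT INTO fact_sales VALUES (10, 1, 5.0), (11, 1, 7.5), (12, 3, 2.0);

-- The view keeps only dimension rows that appear in the fact table.
CREATE VIEW dim_product_used AS
SELECT DISTINCT d.product_id, d.product_name
FROM dim_product AS d
INNER JOIN fact_sales AS f ON f.product_id = d.product_id;
""")

used_products = conn.execute(
    "SELECT product_id, product_name FROM dim_product_used ORDER BY product_id"
).fetchall()
print(used_products)
```

Loading the view instead of the base dimension table means the filter panel only offers values that have at least one fact row; here 'Banana' has no sales and is left out.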
I want to display some raw records (i.e. not aggregated) in a chart table.
I don't see a way to control number formats in this chart.
I have tried the following:
When looking at the column definition in the dataset, the format field only applies in case of datetime.
Also, if I create metrics, I can only use them in an aggregate table.
Creating a calculated column doesn't help either.
As a result of this, here is the kind of table I get:
How should I proceed to solve this?
Use a calculated column and change it to a datatype with precision.
CAST(myRecord AS DECIMAL(10,2))
I want to create a second table from the first table using filters with dates and other variables as follows. How can I create this?
Following is the expected table and original table,
Go to Edit Queries. Let's say our base table is named RawData. Add a blank query and use this expression to copy your RawData table:
=RawData
The new table will be RawDataGrouped. Now select the new table and go to Home > Group By and use the following settings:
The result will be the following table. Note that I didn't use the exact values you used, to keep this sample to a minimum of effort:
You can now also create a relationship between these two tables (on the Index column) to use cross filtering between them.
You could show the grouped data and use the relationship to display the RawData rows in a subreport (or custom tooltip), for example.
I assume you are looking for a calculated table. Here is a workaround:
In Query Editor, create a duplicate of the existing (original) table, then click the drop-down arrow in the right corner of the Date column header in the duplicate and select Date Filters -> Is Earliest. The duplicate table should now contain only the rows that have the minimum date in that column.
Note: This table is dynamic and will give updated results as the data in the original table changes, but you have to refresh both tables.
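To make the behaviour of that filter concrete, here is a rough Python sketch of the same logic outside Power BI: keep only the rows whose date equals the minimum date in the column, and recompute when the source changes. The row layout and the helper name keep_earliest are hypothetical; this only mirrors the logic and is not how Power Query implements it.

```python
from datetime import date

def keep_earliest(rows, date_column="date"):
    """Return only the rows whose date equals the earliest date in the table."""
    if not rows:
        return []
    earliest = min(row[date_column] for row in rows)
    return [row for row in rows if row[date_column] == earliest]

# Hypothetical original table.
original = [
    {"item": "A", "date": date(2020, 1, 5)},
    {"item": "B", "date": date(2020, 1, 3)},
    {"item": "C", "date": date(2020, 1, 3)},
]
filtered = keep_earliest(original)

# Adding an older row to the source and "refreshing" changes the result.
original.append({"item": "D", "date": date(2020, 1, 1)})
refreshed = keep_earliest(original)

print([row["item"] for row in filtered])
print([row["item"] for row in refreshed])
```

The filter is re-evaluated against the whole column on every refresh, which is why the result follows changes in the original table.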
Original Table:
Desired Table:
When I added a new row to it and refreshed the dataset, I got the result below (which implies that it recalculates on each data change in the original source):
New data entry:
Output:
Working on a way to compare 2 tables in PowerBI.
I'm joining the 2 tables using the primary key and making custom columns that compare if the old and new are equal.
This doesn't seem like the most efficient way of doing things, and I can't even color code the matrix because some values aren't integers.
Any suggestions?
I did a big project like this last year, comparing two versions of a data warehouse (SQL database).
I tackled most of it in the Query Editor (actually using Power Query for Excel, but that's the same as PBI's Query Editor).
My key technique was to first create a Query for each table, and use Unpivot Other Columns on everything apart from the Primary Key columns. This transforms it into rows of Attribute, Value. You can filter Attribute to just the columns you want to compare.
Then in a new Query you can Merge & Expand the "old" and "new" Queries, joining on the Primary Key columns + the Attribute column. Then add Filter or Add Column steps to get to your final output.
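The unpivot-then-merge technique is easier to see on a tiny example. The sketch below reproduces the logic in plain Python rather than Power Query: each table is turned into (primary key, attribute) -> value entries, the two sides are joined on that pair, and only mismatches are kept. The function names and sample rows are made up, and the join is treated as a full outer join, so a row missing on one side shows up with None.

```python
def unpivot(rows, key_columns):
    """Turn each row into {(primary key, attribute): value} entries."""
    out = {}
    for row in rows:
        key = tuple(row[k] for k in key_columns)
        for attribute, value in row.items():
            if attribute not in key_columns:
                out[(key, attribute)] = value
    return out

def compare(old_rows, new_rows, key_columns):
    """Join old and new on key + attribute and return the differing values."""
    old = unpivot(old_rows, key_columns)
    new = unpivot(new_rows, key_columns)
    differences = []
    for key, attribute in sorted(old.keys() | new.keys()):
        old_value = old.get((key, attribute))  # None if missing in old
        new_value = new.get((key, attribute))  # None if missing in new
        if old_value != new_value:
            differences.append((key, attribute, old_value, new_value))
    return differences

# Hypothetical "old" and "new" versions of the same table.
old_rows = [
    {"id": 1, "name": "Ann", "city": "Oslo"},
    {"id": 2, "name": "Bob", "city": "Rome"},
]
new_rows = [
    {"id": 1, "name": "Ann", "city": "Bergen"},
    {"id": 2, "name": "Bob", "city": "Rome"},
    {"id": 3, "name": "Cy", "city": "Lima"},
]

differences = compare(old_rows, new_rows, ["id"])
for diff in differences:
    print(diff)
```

Because every column becomes an Attribute/Value row, the comparison does not depend on column types, which sidesteps the problem of non-numeric columns in the original approach.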