Is there a way to automatically add a large amount of columns to a QuickSight table without the manual procedure of draging and dropping them?
For instance in the picture below: I would like to add all numbered columns to the table.
The QuickSight support said that this is not possible for now. They have received this question multiple times and it will hopefully be possible in the future.
Related
I have a BigQuery table, partitioned by date (for everyday there is one partition).
I would like to add various columns sometimes populated and sometimes missing and a column for a unique-id.
The data need to be searchable through a unique id. The other use case is to aggregate per column.
This unique id will have a cardinality of millions per day.
I would like to use the unique-id for clustering.
Is there any limitation on this? Anyone has tried it?
It's a valid use case to enable clustering on an id column, the amount of values shouldn't cause any limitations.
My goal is to quickly & dynamically visualize a big data set (> 500 M rows) using QuickSight. To achieve quick query times, it's necessary to load all of the data into SPICE. However, AWS currently has a hard limit for the maximum number of rows that can be imported into SPICE for a single data set, which is 500 M rows. I currently don't see any option that could be used to visualize all of the data. Here are things that I already considered:
Splitting the full data set into individual QS datasets: the problem with this approach is that QuickSight requires that each visual has a single dataset as an input, so values from multiple datasets cannot be shown in the same visual. I'm aware that multiple datasets can be used within one dashboard but that would not suit the use-case of having a single plot visualizing the data.
Pivoting the table: the input table has a lot of rows, so changing the format from long to wide table would circumvent the SPICE row limitations. However, QuickSight doesn't seem to support using an array of columns a y-values to be plotted.
Creating a dataset per visualization: Certain visualizations can theoretically be defined using fewer values than in the original data set. For example, to create a box plot over a set of groups, we mainly need the quartile values for each of the groups to be plotted, rather than the full data set, which would allow us to be below the SPICE limitation. However, QuickSight doesn't allow creating custom plots such as creation of a box plot where quartiles are already pre-processed.
Currently, the only viable approach I see is to create a dashboard per user, since most users would only be interested in a subset of rows from the full data set.
Irrespective of the approach taken, unfortunately, this limitation forces us to do some compromises.
Depending on the number of users, creating a dataset per user might become a headache to manage. So, I would suggest that if possible you use datasets that capture groups of users (example by user group, or user's country).
Pivoting the table might make it harder to build some visuals. As you said, if you pivot multiple values from different rows into an array field, then you would not be able to extract these easily in analyses (you could use string functions and to to extract them that way but there are limitations around this approach too).
Also creating a dataset per visualisation has maintenance overhead in that you would need to update and re-ingest the dataset most times when changing visualisations.
Some other approaches you might consider:
Aggregate multiple rows together Example if your dataset has multiple rows for each user within the same minute, you could aggregate all these into 1 row and summing up values within that minute. The aggregation period should be as large as possible but keep in mind that this will affect the time granularity in your analyses/dashboards
Prune old data If you are more interested in recent data, then you could add a filter to only keep say 1 month of activity. You could then have other non-SPICE (Direct Query) datasets that do not have this restriction but reports would be slower on older data.
Cache in an external database You could load your data into some data warehousing database (such as AWS Redshift) and then not use SPICE in QuickSight. Of course, this will probably get more expensive.
I have a report in Power BI that cannot refresh because the data from the table is too large:
The amount of data on the gateway client has exceeded the limit for a single table. Please consider reducing the use of highly repetitive strings values through normalized keys, removing unused columns, or upgrading to Power BI Premium
I have tried to shrink the columns used in the data set to the best of my ability, but it is still too large to refresh. I did a test where, instead of using just a single query to retrieve the data, I made two queries that split the columns roughly half and half and then link them back together in Power BI using their ID column. It looked to me that the test data refresh started working upon splitting up the table's data into two separate queries.
Please correct me if there is a better method to trim the data down to allow the data set to refresh, but for now this is the best solution I see. What I am wondering is, since now my data is split into two separate queries, what is the best way to adapt the already existing visualizations I have that are linked up to the full, non-refreshable query to the split, refreshable queries? It looks to me like I would have to recreate the visuals from scratch, but if there is a way to simply do a mass replace of the fields that would save so much time. The split queries I created both have the same fields as the non-split query.
so, I got 3 xlsx full of data already treated, so I pretty much just got to display the data using the graphs. The problem seems to be, that Powerbi aggregates all numeric data (using: count, sum, etc.) In their community they suggest to create new measures, the thing is, in that case I HAVE TO CREATE A LOT OF MEASURES...Also, I tried to convert the data to text and even so, Powerbi counts it!!!
any help, pls?
There are several ways to tackle this:
When you pull a field into the field well for a visualisation, you can click the drop down in the field well and select "Don't summarize"
in the data model, select the column and on the ribbon select "don't summarize" as the summarization option in the Properties group.
The screenshot shows the field well option on the left and the data model options on the right, one for a numeric and one for a text field.
And, yes, you never want to use the implicit measures, i.e. the automatic calculations that Power BI creates. If you want to keep on top of what is being calculated, create your own measures, and yes, there will be many.
Edit: If by "aggregating" you are referring to the fact that text values will be grouped in a table (you don't see any duplicates), then you need to add a column with unique values to the table so all the duplicates of the text values show up. This can be done in the data source by adding an Index column, then using that Index column in the table and setting it to a very narrow with to make it invisible.
In Power BI, I've got some query tables generated from imported data. All the data comes in as type 'Any', and I'm trying to automatically detect the type of the data in each column.
Some of the queries generate tables with columns based on the in-coming data - I don't know what the columns are going to be until the query runs and sets up the table (data comes from an Azure blob). As I will have quite a few tables to maintain, which columns can change (possibly new columns being added) with any data refresh, it would be unmanageable to go through all of them each time and press 'Detect Data Type' on the columns.
So I'm trying to figure out how I can do a 'Detect Data Type' in the query formula language to attach to the end of the query that generates the table columns. I've tried grabbing the first entry in a column and do Value.Type(column{0}), however this seems to come out as 'Text' for a column which has integers in it. Pressing 'Detect Data Type' does however correctly identifies the type as 'Whole Number'.
Does anyone know how to detect a column's entry types?
P.S. I'm not too worried about a column possibly holding values of different data types
You seem to have multiple issues here. And your solution will be fragile, there's a better way. But let's first deal with column type detection. Power Query uses the 'any' data type as it's go to data type. You can write a function that samples the rows of a column in a table does a best match data type detection then explicitly sets the data type of the column. This is probably messy and tricky since you need to do it once per column. This might be workable for a fixed schema but for a dynamic schema you'll run into a couple of things very quickly. First you'll need to write some crazy PQ code to list all the columns and run you function on each. This will work the first time, but might break in subsequent refreshes because data model changes are not allowed during refresh. If you're using a tool like Power BI Desktop, you'll be able to fix things up. If you publish your report to the Power BI service, you'll just see refresh errors.
Dynamic Schemas will suffer the same data model change issue I mentioned above.
The alternate solution that you won't have problems with is using a Direct Query data source instead of using Power Query. If you load your data into Azure SQL or a Tabular Model, the reporting layer will get the updated fields automatically so you don't have to try to work around using PQ.