Is there a way to pass data source connection string as a parameter to power bi embedded? - powerbi

I have a pbix file that takes an Azure Storage account as a parameter and reads data from there accordingly. The next step is to be able to embed this powerbi dashboard on a webpage and let the end user specify the storage account. I see a lot of questions and answers surrounding passing in filter query parameters--this is different, we're trying to read from a completely different data source and not filtering on a static data source.
Another way to ask this question is: is there a way to embed powerbi template files, if not, is there a feature request somewhere we can upvote?

The short answer is no.
There is a reason to use filters in this case instead of parameters. Parameters are something that is part of the report itself. Each users that looks at your reports will get the same parameter values as the others. If one of them changes some parameter, this will affect all other users. Filters on the other hand, is something local for your session. You can filter the report the way you like, and this will not affect other users experience in any way.
You can't embed templates, because template is simply a state of the report on the disk. When you open it, it's not a template anymore, but becomes a report.
You can either combine the data from all of your data sources in a single report, adding one more column to indicate from where this data comes from, and then filter on this new column. Or create/modify ETL process (for example dataflows can be used for this) to combine these data sources into a single one.

Related

What is AWS S3 dataset?

Looking at documentation of awswrangler.s3.to_csv or awswrangler.s3.to_parquet, there is a dataset parameter.
From testing, it looks like setting dataset=True allows, among other things, to append new data to an already existing set. It also looks like when dataset=True, I can't specify the file name and AWS autogenerates the names for the files which are added to the specified path.
Apart from that, I can't find more information on what dataset means. Is it just referring to the general concept or is there a specific meaning within the context of AWS? What exactly is dataset and when should it be set to True?
The dataset=True option allows you to store the entire dataset, including all metadata, indexes, etc.
The dataset parameter documentation:
dataset (bool) – If True store as a dataset instead of ordinary file(s) If True, enable all follow arguments: partition_cols, mode, database, table, description, parameters, columns_comments, concurrent_partitioning, catalog_versioning, projection_enabled, projection_types, projection_ranges, projection_values, projection_intervals, projection_digits, catalog_id, schema_evolution.
Note all those extra things that get saved when you save a dataset. All that information, like columns_comments, concurrent_partitioning, projection_values, will be lost when you save to CSV or Parquet. But on the other hand, those values are probably only useful if you plan to do further manipulation of the data via awswrangler/pandas at some later date.
Also note that if you set dataset=True you have to give it a file name prefix instead of a single file name, because the output generated will be spread across multiple files.
If you want to use the data in any other tool besides Pandas, such as loading the CSV into Excel, then you most likely want to set dataset=False and output to a single file.

Alternatives to dynamically creating model fields

I'm trying to build a web application where users can upload a file (specifically the MDF file format) and view the data in forms of various charts. The files can contain any number of time based signals (various numeric data types) and users may name the signals wildly.
My thought on saving the data involves 2 steps:
Maintain a master table as an index, to save such meta information as file names, who uploaded it, when, etc. Records (rows) are added each time a new file is uploaded.
Create a new table (I'll refer to this as data tables) for each file uploaded, within the table each column will be one signal (first column being timestamps).
This brings the problem that I can't pre-define the Model for the data tables because the number, name, and datatype of the fields will differ among virtually all uploaded files.
I'm aware of some libs that help to build runtime dynamic models but they're all dated and questions about them on SO basically get zero answers. So despite the effort to make it work, I'm not even sure my approach is the optimal way to do what I want to do.
I also came across this Postgres specifc model field which can take nested arrays (which I believe fits the 2-D time based signals lists). In theory I could parse the raw uploaded file and construct such an array and basically save all the data in one field. Not knowing the limit of size of data, this could also be a nightmare for the queries later on, since to create the charts it usually takes only a few columns of signals at a time, compared to a total of up to hundreds of signals.
So my question is:
Is there a better way to organize the storage of data? And how?
Any insight is greatly appreciated!
If the name, number and datatypes of the fields will differ for each user, then you do not need an ORM. What you need is a query builder or SQL string composition like Psycopg. You will be programatically creating a table for each combination of user and uploaded file (if they are different) and programtically inserting the records.
Using postgresql might be a good choice, you might also create a GIN index on the arrays to speed up queries.
However, if you are primarily working with time-series data, then using a time-series database like InfluxDB, Prometheus makes more sense.

Oracle APEX - Reusable Pages?

We have some tables in our database that all have the same attributes but the table is named differently for each. I'm not sure of the Architect's original intent in creating them in this way, but this is what I have to work with.
My question for all the expert Oracle APEX developers: is there away to create a reusable page that I can pass the table name to and that table name would be used in the reporting region and DML processing of that page?
I've read up on templates and plugins and don't see a path forward with those options. Of course, I'm new to webdevelopment, so forgive my ignorance.
We are using version 18.2.
Thanks,
Brian
For reporting purposes, you could use a source which is a function that returns a query (i.e. a SELECT statement). Doing so, you'd dynamically decide which table to select from.
However, DML isn't that simple. Instead of default row processing, you should write your own process(es) so that you'd insert/update/delete rows in the right table. I've never done that, but I'd say that it is possible. Basically, you'd keep all logic in the database (for example, a package) and call those procedures from your Apex application.
You could have multiple regions on one page; one region per table. Then use dynamic actions to show/hide the regions and run the select query based on a table name selected by the user.
Select table name from a dropdown or list
Show the region that matches the table name (dynamic action)
Hide the any other regions that are visible (dynamic action)
Refresh the selected region so the data loads (dynamic action)
If that idea works let me know and I can provide a bit more guidance.
I never tried it with reports, but would it work to put all three reports in a single page, and set them via an Item to have Server-Side Conditions that decide what gets shown in the page? You'd likely need separate items with a determined value for the page to recognize and display.
I know I did that to set buttons such as Delete, Save and Create dynamically, rather than creating two or more separate pages for handling editing of certain information. In this case it regarded which buttons to shown based on a reports' primary key being sent to said "Edit" page. If the value was empty, it meant you wanted to create a new record (also because the create button/link sent no PK). If said PK was sent (via a edit button/link), then you'd have the page recognize it and hide the create button and rather show the edit button.

DataPrep: access to source filename

Is there a way to create a column with filename of the source that created each row ?
Use-Case: I would like to track which file in a GCS bucket resulted in the creation of which row in the resulting dataset. I would like a scheduled transformation of the files contained in a specific GCS bucket.
I've looked at the "metadata article" on GCP but it is pretty useless for my use-case.
UPDATED: I have opened a feature request with Google.
While they haven't closed that issue yet, this was part of the update last week.
There's now a source metadata reference called $filepath—which, as you would expect, stores the local path to the file in Cloud Storage (starting at the top-level bucket). You can use this in formulas or add it to a new formula column and then do anything you want in additional recipe steps.
There are some caveats, such as it not returning a value for BigQuery sources and not persisting through pivot, join, or unnest . . . but it covers the vast majority of use cases handily, and in other cases you just need to materialize it before some of those destructive transforms.
NOTE: If your data source sample was created before this feature, you'll need to generate a new sample in order to see it in the interface (instead of just NULL values).
Full notes for these metadata fields are available here: https://cloud.google.com/dataprep/docs/html/Source-Metadata-References_136155148

How do I manage SAS formats from various sources?

I am wondering how I can efficiently manage formats in SAS for a reporting office that takes in data from various sources, some with proper lookup tables / metadata, and some without.
For data sources that have proper metadata, joining tables for value descriptions works fine, but when metadata doesn't exist and needs to be maintained separately, how should that be done? Some straightforward examples/ideas:
Plain .sas files with a native PROC FORMAT step that is maintained separately.
External files (e.g., Excel, CSV) that are maintained separately and imported into SAS to create a format library.
Database tables maintained separately that can be read from to create a format library.
In addition to just the formatted values, managing value changes (i.e., effective dates for certian values) is also a concern.
Any help in conventions or standards that work well for this type of task is greatly appreciated.
I'm not sure there's a single best solution here - it depends largely on your environment, your users, etc.
If you have fairly naive users, then I'd definitely recommend a single complete repository if possible; whether that is a .sas7bcat file if you are using a single SAS version/OS/bitness, or a ready-made table/dataset to input into PROC FORMAT (and a .sas file included in their autoexec to do the importing). The biggest drawbacks to this are that you have to manage it actively (you cannot allow users to write their own formats to the master format dataset, for example, as they may overwrite other ones), and that there will be additional work to ensure format names do not conflict - YNF. might be 1=YES 2=NO or 1=YES 0=NO or something else. This also doesn't allow you to very easily handle effective dates; but it's possible this is better for your users (and then just handle the documentation separately).
If you have more advanced users, then you might consider a table/dataset that is more relational in nature. A hybrid approach might include a dataset with columns:
Dataset Name (qualified as needed to ensure uniqueness)
Format Name
Start
Label
Other elements (Type, HLO, etc.)
Effective date
That would allow users to make their own modifications (assuming you trust them enough to add dataset name properly, anyway - or set up a stored proc to do the adding from a temp table that checked for conflicts) and allow you to handle format names that conflicted. You'd still have to have a way for the user to handle using multiple datasets, if that's necessary (such as by adding some unique element to the format name by default, like 'dataset ID').
In my mind, however, the best option is using a data dictionary to handle the metadata, which combines self-documentation with metadata management. Similar to above, you have a table with dataset and format elements, but add columns for descriptive text (question description, for example) and other useful information, depending on your use cases. This can be maintained in a database table or dataset, or perhaps more usefully in an excel or similar document that can be shared with non-programmers and easily edited. I use this method for several projects, and it has paid off by allowing my users to help write the documentation for my code, keeping my programs accurate and up to date, while minimizing back-and-forth discussions of updates. I just import the spreadsheet and run a proc format each time I run my data.
You can then have one spreadsheet per dataset, one tab, or one full spreadsheet with all datasets in them - whichever is easiest to use. This easily handles 'effective date' type issues as well - or even versioning, as that can be handled in the spreadsheet.