I'm currently using Azure Data Factory to move over data from an Azure SQL database to an Azure DW instance.
This works fine for one table, but I have a lot of tables I'd like to move over. Using Azure Data Facory, it looks like I need to create a set of Source/Sink datasets and pipelines for every table in the database.
Is there a way to move multiple tables across without have to set up each table in the manner described above?
The copy operation allows you to select multiple tables to move in a single pipeline. From the Azure SQL Data Warehouse portal you can follow this process to setup a multi-table pipeline:
Click on the Load Data button
Select Azure Data Factory
Create a new data factory or use an existing one - ensure that the Load Data select is chosen
Select the Run once now option
Choose your Azure SQL Database source and enter the credentials
On the Select Tables screen, select multiple tables
Continue the Pipeline, save and execute
Related
I am trying to create PowerBI Datamart from Azure Analyis service. There is a datamodel available in the Azure Analysis Service and I can connect using URL and Database Name. The datamodel has ~100 tables present in it and relationship also setup. So my question is, if I want to create a PowerBI datamart from the Azure Analyis service datamode, I need to do the Get Data option of PowerBI datamart and connect to Azure Analyis service, select table, select fields 100 time for getting all the tables of Azure Analyis service datamode into my PowerBI datamart? Is there any import function available where I can import all the tables in a single time?
Why do you want to copy data from AAS into a database?
The reason you find it difficult is that it's an odd thing to do. The query designer for AAS/SSAS generates MDX queries which are indented to run aggregate queries that return a handful of rows, and are wholly unsuitable for extracting whole tables. If you try, the queries will just run forever and fail.
It's possible to extract data from AAS/SSAS tabular models, but you must use DAX not MDX, and so you need to the Power Query or "Transform Data" window, and use the advanced editor.
Each query to load a table should look like this, eg to load the 'Customer' table:
let
Dax = "evaluate Customer",
Source = AnalysisServices.Database("asazure://southcentralus.asazure.windows.net/myserver", "mydatabase", [Query=Dax])
in
Source
I'm trying to ingest data from different tables with in same database using Data fusion Multiple database tables plugin to bigquery tables using multiple big query tables sink. I write 3 different custom SQL and add them inside the plugin section which is under "Data Section Mode" > "Custom SQL Statements".
The problem is When I preview or deploy and run the pipeline I get the error "BigQuery Multi Table has no outputs. Please check that the sink calls addOutput at some point."
What I try to figure out this problem;
Run custom SQL on database and worked properly.
Create pipelines that are specific for custom SQLs but it's like 1 table ingestion from sql server to bigquery table as sink. it worked properly.
Try different Data Section Mode under multiple database tables plugin that is Table Allow List , works but it's just insert all data with no option to transform any column or filtering. Did that one to see if plugin can reach the database and able to read data ,it can read.
Data Pipeline - Multiple Database Tables Plugin Config - 1
Data Pipeline - Multiple Database Tables Plugin Config - 2
As a conclusion I would like to ingest data from one database with multiple tables with in one data pipeline. If possible I would like to do it with writing custom sqls for each tables.
Open for any advice and try.
Thank you.
This is what I am trying to do: I have various SQL server databases with data. I created views in all of them. All views will need to be imported, and I specify their relationships. I want this to be refreshed nightly. I want to build various reports of the same data source.
Do I have to use a PowerBI desktop application to import data into PowerBI Report Service? [I have done this so far, but then can create new reports in the cloud on existing data. It would make sense to connect directly from PowerBI report service to my SQL servers.]
Once I uploaded data using a desktop application (as I have done so far), how can I view the data model in the report service once it is uploaded in the cloud?
In order to get routinely refreshed data I need to setup a gateway. Is the local PowerBI desktop application still involved in this process, or could I [in theory] delete the local desktop application that pushed the data in initially?
For your questions:
You have two options, use PBI Desktop to connect to the data using import/direct query, then load it to the service. You can use dataflows to create an import based on your views, but you will then need to create reports from those. Using dataflows, you'll have to set up a refresh schedule, then for the dataset(s) built on top of those, you'll have to set another refresh schedule.
You will be limited to the dataset sizes of 1GB for the workspace if importing data. You cannot use direct query on dataflows (unless you have enhanced compute with PBI premium). Once the dataset is loaded, you can then create new reports in the service or via desktop on top of that dataset. If possible it is recommended to use direct query.
To see the data model, you can use desktop to connect to PBI Service Dataset. This will connect in 'Live Connection' mode, and will be limited to that one dataset, you can't add others to it, Excel, CSV, SQL etc. You can also use Analyse in Excel, a plugin for Excel, that can connect to the data model. You can create new reports in the service for existing data models as well.
When creating the report in PBI Desktop it does not use the Gateway, you connect to your data sources as normal, then once you load the dataset to Power BI it will match the data sources in the file to the ones set up in the Gateway Admin settings. So you will still need PBI Desktop to create reports, but the gateway is there for the refreshing. The Desktop is not used in the process for refreshing. You could delete the workbook or application, but if you have to make changes, what will you refer to? (You could download a copy of the report from the service).+ It is easier to make changes in the desktop app, then the service, as there is a feature difference between dataset creation in the desktop vs service.
Currently we have a problem with loading data when updating the report data with respect to the DB, since it has too many records and it takes forever to load all the data. The issue is how can I load only the data from the last year to avoid taking so long to load everything. As I see, trying to connect to the COSMO DB in the box allows me to place an SQL query, but I don't know how to do it in this type of non-relational database.
Example
Power BI has an incremental refresh feature. You should be able to refresh the current year only.
If that still doesn’t meet expectations I would look at a preview feature called Azure Synapse Link which automatically pulls all Cosmos DB updates out into analytical storage you can query much faster in Azure Synapse Analytics in order to refresh Power BI faster.
Depending on the volume of the data you will hit a number of issues. First is you may exceed your RU limit, slowing down the extraction of the data from CosmosDB. The second issue will be the transforming of the data from JSON format to a structured format.
I would try to write a query to specify the fields and items that you need. That will reduce the time of processing and getting the data.
For SQL queries it will be some thing like
SELECT * FROM c WHERE c.partitionEntity = 'guid'
For more information on the CosmosDB SQL API syntax please see here to get you started.
You can use the query window in Azure to run the SQL commands, or Azure Storage Explorer to test the query, then move it to Power BI.
What is highly recommended is to extract the data into a place where is can be transformed into a strcutured format like a table or csv file.
For example use Azure Databricks to extract, then turn the JSON format into a table formatted object.
You do have the option of using running Databricks notebook queries in CosmosDB, or Azure DataBricks in its own instance. One other option would to use change feed to send the data and an Azure Function to send and shred the data to Blob Storage and query it from there, using Power BI, DataBricks, Azure SQL Database etc.
In the Source of your Query, you can make a select based on the CosmosDB _ts system property, like:
Query ="SELECT * FROM XYZ AS t WHERE t._ts > 1609455599"
In this case, 1609455599 is the timestamp which corresponds to 31.12.2020, 23:59:59. So, only data from 2021 will be selected.
Is there an option to restore the deleted database in SQL DWH at a later time(more than a year )?
The documentation clearly indicates that when an Azure SQL Data Warehouse is dropped it keeps the final snapshot for seven days:
When you drop a data warehouse, SQL Data Warehouse creates a final
snapshot and saves it for seven days. You can restore the data
warehouse to the final restore point created at deletion.
The same article also mentions the fact you can vote for this feature here:
https://feedback.azure.com/forums/307516-sql-data-warehouse/suggestions/35114410-user-defined-retention-periods-for-restore-points
Even if you could do this, you are basically leaving it up to someone else to be in charge of your warehouse backups. What you could do instead is take control:
Store your Azure SQL Data Warehouse schema in source code control (eg git, Azure DevOps formerly VSTS, etc). If it isn't there already you can reverse engineer the schema using SQL Server Management Studio (SSMS) versions 17.x onwards or even use the SSDT preview feature
Export your data to Data Lake or Azure Blob Storage using CREATE EXTERNAL TABLE AS SELECT (CETAS). This will export your data as flat files to storage where it won't be deleted. Alternately use Azure Data Factory to export the data and zip it up to save space.
When you need to recreate the warehouse, simply redeploy the schema from source code control and redeploy the data, eg via CTAS in to staging tables, or use Azure Data Factory to re-import. If you saved your external tables in the schema you save to source code control then it will just be there when you redeploy. INSERT back in to the main tables from the external tables.
In this way you are in charge of your warehouse schema and your data to be recreated at any point you require, whether it be a day, a month or years.
A simple diagram of the proposed design: