Column-Level Encryption in Azure SQL Data Warehouse - azure-sqldw

Is there an option to do column-level encryption in Azure SQL DW similar to the one in SQL DB (symmetric, asymmetric, or Always Encrypted)? I can see there's transparent data encryption (TDE), but I need column-level encryption for PII.

We have recently introduced column-level security, which allows you to hide columns from users. Customers will often create policies that grant access based on role, eliminating the security risk.

Azure SQL Data Warehouse does not support column-level encryption at this time. Azure SQL Database and SQL Server 2017 (e.g. on IaaS) do, so if encryption is a requirement for you, consider those alternatives. If your data is not too big, consider Azure SQL DB, which also supports columnstore.
Alternatively, consider encrypting your data before inserting it into your data warehouse, e.g. write a custom encryption component and host it in Data Factory, or write a custom U-SQL outputter that writes the encrypted columns to a flat file, which could then be picked up by PolyBase.
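One variant of the pre-load approach (a swapped-in technique: keyed tokenization rather than reversible encryption, so joins and GROUP BY on the column still work) is to replace PII values before staging the flat file. A minimal Python sketch; the key and column names are illustrative, not from any real system:

```python
import hashlib
import hmac

# Hypothetical key; in practice fetch it from a secret store such as Azure Key Vault.
SECRET_KEY = b"replace-with-a-managed-secret"

def tokenize(value: str) -> str:
    """Deterministic keyed tokenization (HMAC-SHA256) of one PII value.
    Deterministic, so the column still supports joins and GROUP BY, but
    not reversible -- use a vetted encryption library if you need decryption."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()

def protect_rows(rows, pii_columns):
    """Replace PII columns with tokens before writing the flat file for PolyBase."""
    for row in rows:
        yield {col: tokenize(val) if col in pii_columns else val
               for col, val in row.items()}

rows = [{"id": "1", "email": "alice@example.com", "amount": "42"}]
protected = list(protect_rows(rows, pii_columns={"email"}))
```

Because the tokenization is deterministic, loading the same source value twice yields the same token, which keeps referential joins across tables intact.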

Related

Power BI Embedded Approach for 100s of SQL Targets

I'm trying to find the best approach to delivering a BI solution to 400+ customers which each have their own database.
I've got PowerBI Embedded working using service principal licensing, and I have the PowerBI service connected to my data through the on-premises data gateway.
I've built my first report pointing to one of the customer databases, and it works nicely.
What I want to do next, when embedding the report, is to tell PowerBI, for this session, to get the data from a different database.
I'm struggling to find somewhere where this is explained, or to understand if this is even possible.
I'm trying to avoid creating 400+ WorkSpaces or 400+ Data Sets.
If someone could point me in the right direction, it would be appreciated.
You can configure the report to use parameters and these parameters can be used to configure the source for your dataset:
https://www.phdata.io/blog/how-to-parameterize-data-sources-power-bi/
These parameters can be set by the app hosting the embedded report:
https://learn.microsoft.com/en-us/rest/api/power-bi/datasets/update-parameters-in-group
Because the app is setting the parameter, each user will only see their own data. Since this will be a live connection, you would need to think about how the underlying server can support the workload.
An alternative solution would be to consolidate the customer databases into a single database (just the relevant tables) and use row-level security to restrict access for each customer. The advantage of this design is that you take the burden off the underlying SQL instance and push it into a PBI dataset, which is built to handle huge datasets with sub-second response times.
More on that here: https://learn.microsoft.com/en-us/power-bi/enterprise/service-admin-rls
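To make the parameter approach concrete, here is a hedged Python sketch (stdlib only) of the request the embedding app would send to the Update Parameters In Group endpoint. The workspace/dataset IDs, the "ServerName"/"DatabaseName" parameter names, and the AAD token are placeholders you would supply:

```python
import json
import urllib.request

def build_update_parameters_request(group_id: str, dataset_id: str,
                                    parameters: dict, access_token: str):
    """Build the POST request for Datasets - Update Parameters In Group."""
    url = (f"https://api.powerbi.com/v1.0/myorg/groups/{group_id}"
           f"/datasets/{dataset_id}/Default.UpdateParameters")
    body = {"updateDetails": [{"name": name, "newValue": value}
                              for name, value in parameters.items()]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {access_token}",
                 "Content-Type": "application/json"},
        method="POST",
    )

# "ServerName"/"DatabaseName" must match whatever parameter names you defined
# in Power Query; the values here are made up for illustration.
req = build_update_parameters_request(
    "<workspace-id>", "<dataset-id>",
    {"ServerName": "sql-eu.example.com", "DatabaseName": "Customer042"},
    "<aad-token>",
)
# urllib.request.urlopen(req) would send it, given a valid service principal token.
```

Note that for an import-mode dataset the new parameter values only take effect after a refresh; with a live/DirectQuery connection they apply to subsequent queries.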

Query a dataset with Power Bi REST APIs using a Service Principal

Our goal is to query a dataset that is published to PowerBI via the REST APIs (https://learn.microsoft.com/en-us/rest/api/power-bi/datasets/execute-queries). I'm not talking about the metadata of the dataset; I mean the row-level data contained within the tables in the dataset.
We are going to write a service (probably on prem) that will need to query this data, format it, and push it to another system. From what we understood, we could use a service principal as the identity to query the PowerBI API and retrieve the data.
The very important factor here is that the service principal should not have access to the row-level data of any other dataset. If we have to separate the datasets into a different workspace, that is workable, but not preferred.
A service principal can be used to access the PBI API. It will have access to the data only if it is authorized on that workspace, so you need a separate workspace in order to manage access to the dataset.
From my experience, executing a DAX query through the PowerBI API can be quite slow, so keep that in mind if your integration requires a quick response from the API.
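A hedged Python sketch (stdlib only) of the executeQueries call and of pulling rows out of its response. The workspace/dataset IDs, the DAX query, and the token are placeholders; the sample response below is a hand-made, truncated illustration of the documented response shape:

```python
import json
import urllib.request

def build_execute_queries_request(group_id: str, dataset_id: str,
                                  dax: str, access_token: str):
    """Build the POST request for Datasets - Execute Queries In Group."""
    url = (f"https://api.powerbi.com/v1.0/myorg/groups/{group_id}"
           f"/datasets/{dataset_id}/executeQueries")
    body = {"queries": [{"query": dax}],
            "serializerSettings": {"includeNulls": True}}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {access_token}",
                 "Content-Type": "application/json"},
        method="POST",
    )

def extract_rows(response_body: dict) -> list:
    """Pull the row dicts out of the executeQueries response shape."""
    return response_body["results"][0]["tables"][0]["rows"]

req = build_execute_queries_request(
    "<workspace-id>", "<dataset-id>",
    "EVALUATE TOPN(100, 'Sales')", "<aad-token>")

# Hand-made, truncated example of the response shape:
sample = {"results": [{"tables": [{"rows": [{"Sales[Amount]": 42}]}]}]}
rows = extract_rows(sample)
```

The on-prem service described in the question could run this per table, reshape `rows`, and push the result to the downstream system.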

Restrict access to a table in SQL Lab in Superset

I have database with many tables. Users have full access to this database and tables to create various charts and dashboards. They use SQL Lab extensively to write custom queries.
However, I have added sensitive data in a separate table that needs to be accessed only by a few users. How can I achieve this?
I tried the ROW LEVEL SECURITY feature, but it applies only to virtual tables created by Superset; I want to restrict access during direct SQL Lab queries as well.
Possible solution:
Create an ACL at the database level and create a separate connection in Superset.
Cons: this requires two connections to the same database.
Ideal solution:
Restrict SQL Lab access to specific tables at the Superset level, e.g. Superset should check user roles and ACLs and decide whether a table can be queried.
Is this possible?
Maybe consider implementing proper access control for your data with Apache Ranger, and have Superset impersonate the logged-in user.

Issues while working with Amazon Aurora Database

My requirements:
I want to store real-time events data coming from e-commerce websites into a database.
In parallel to storing the data, I want to access the events data from the database.
I want to perform some sort of ad-hoc analysis (SQL).
Using some sort of built-in methods (either from Boto3 or the Java SDK), I want to access the events data.
I want to create some sort of custom APIs to access the events data stored in the database.
I recently came across the Amazon Aurora (MySQL) database.
I thought Aurora was a good fit for my requirements. But when I dug into Amazon Aurora (MySQL), I noticed that we can create a database using the AWS CDK,
BUT
1. No equivalent methods to create tables using the AWS CDK/Boto3
2. No equivalent methods in Boto3 or the Java SDK to store/access the database data
Can anyone tell me how I can create a table using IaC in an Aurora DB?
Can anyone tell me how I can store real-time data in Aurora?
Can anyone tell me how I can access real-time data stored in Aurora?
No equivalent methods to create tables using the AWS CDK/Boto3
This is because only Aurora Serverless can be accessed using the Data API, not a regular database instance.
You have to use regular MySQL tools (e.g. the mysql CLI, phpMyAdmin, MySQL Workbench, etc.) to create tables and populate them.
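If you are on Aurora Serverless, the Data API route looks roughly like the sketch below. The ARNs, database name, and table schema are placeholders; the function only assembles the keyword arguments for boto3's `rds-data` `execute_statement` call, so it runs without AWS access:

```python
def create_events_table_kwargs(cluster_arn: str, secret_arn: str,
                               database: str) -> dict:
    """Arguments for boto3.client("rds-data").execute_statement(**kwargs).
    This only works against Aurora Serverless with the Data API enabled."""
    ddl = """
        CREATE TABLE IF NOT EXISTS events (
            event_id   BIGINT AUTO_INCREMENT PRIMARY KEY,
            event_type VARCHAR(64) NOT NULL,
            payload    JSON,
            created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        )"""
    return {
        "resourceArn": cluster_arn,
        "secretArn": secret_arn,
        "database": database,
        "sql": ddl,
    }

# Placeholder ARNs for illustration only.
kwargs = create_events_table_kwargs(
    "arn:aws:rds:us-east-1:123456789012:cluster:events-cluster",
    "arn:aws:secretsmanager:us-east-1:123456789012:secret:events-db",
    "productcatalogueinfo",
)
# boto3.client("rds-data").execute_statement(**kwargs) would then run the DDL.
```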
No equivalent methods in Boto3 or the Java SDK to store/access the database data
Same reason and solution as for point 1.
Can anyone tell me how I can create a table using IaC in an Aurora DB?
Terraform has a MySQL provider, but it is for users and databases, not tables.
Can anyone tell me how I can store real-time data in Aurora?
There is no out-of-the-box solution for that, so you need a custom one. Maybe stream the data to Kinesis Data Streams or Firehose, then to a Lambda that populates your DB? That seems easiest to implement.
Can anyone tell me how I can access real-time data stored in Aurora?
If you stream the data to a Kinesis stream first, you can use Kinesis Data Analytics to analyze it in real time.
Since many of the above require custom solutions, other architectures are possible.
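The Kinesis-to-Lambda leg could be sketched as below. This is a hedged illustration, not a complete function: the `events` table and its columns are made up, and actually executing the statements against Aurora (e.g. with a MySQL client) is left out:

```python
import base64
import json

def handler(event, context=None):
    """Lambda handler sketch for the Kinesis -> Lambda -> Aurora path:
    decode each Kinesis record and build a parameterized INSERT for it.
    Executing the statements against Aurora is intentionally omitted."""
    statements = []
    for record in event["Records"]:
        # Kinesis delivers the record payload base64-encoded.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        statements.append((
            "INSERT INTO events (event_type, payload) VALUES (%s, %s)",
            (payload["type"], json.dumps(payload)),
        ))
    return statements

# A minimal fake Kinesis event for local testing:
data = base64.b64encode(json.dumps({"type": "click", "item": 7}).encode()).decode()
event = {"Records": [{"kinesis": {"data": data}}]}
statements = handler(event)
```

Parameterized statements (rather than string-formatted SQL) keep the ingest path safe against injection from event payloads.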
Create the connection manager as:
val con: Connection = DriverManager.getConnection(
    "jdbc:mysql://localhost:3306/$dbName", // replace with your endpoint & database name
    "root",
    "admin123"
)
then create a statement and run your SQL:
val stmt: Statement = con.createStatement()
stmt.execute("USE productcatalogueinfo;")
Whenever your Lambda is triggered, it opens this connection and performs the DDL operations too.

Migrate data to SQL DW for multiple tables

I'm currently using Azure Data Factory to move over data from an Azure SQL database to an Azure DW instance.
This works fine for one table, but I have a lot of tables I'd like to move over. Using Azure Data Factory, it looks like I need to create a set of source/sink datasets and pipelines for every table in the database.
Is there a way to move multiple tables across without have to set up each table in the manner described above?
The copy operation allows you to select multiple tables to move in a single pipeline. From the Azure SQL Data Warehouse portal you can follow this process to set up a multi-table pipeline:
Click on the Load Data button
Select Azure Data Factory
Create a new data factory or use an existing one, ensuring that the Load Data option is selected
Select the Run once now option
Choose your Azure SQL Database source and enter the credentials
On the Select Tables screen, select multiple tables
Continue the Pipeline, save and execute