How can Azure Data Factory access a custom data connector (Power BI)?

I've just started to look at Azure Data Factory as a possible way to get the data we currently consume in Power BI via custom connectors, primarily to access Graph APIs. I can't see whether the same data is available to Azure Data Factory. Is there any way to achieve this?

Azure Data Factory has a number of different features which may help:
Web activity - call REST APIs from an ADF pipeline; can only access public URLs
Webhook activity - call endpoints and pass a callback URL
Azure Function activity - run Azure Functions in the pipeline; Functions are very flexible, so they could probably do this (see the sketch after this list)
Custom activity via Azure Batch - run .NET code via Azure Batch; very customisable
Databricks notebook - call a notebook written in Scala, Python, R, Java or Spark SQL; completely customisable
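For example, a minimal sketch (Python) of an Azure Function an ADF pipeline could call to pull Graph API data; the tenant/app IDs are placeholders, and the client-credentials flow is an assumption about how your connector authenticates:

    import json
    import azure.functions as func
    import msal
    import requests

    # Placeholders: supply your own tenant, app registration and secret.
    TENANT_ID = "<tenant-id>"
    CLIENT_ID = "<app-client-id>"
    CLIENT_SECRET = "<app-client-secret>"

    def main(req: func.HttpRequest) -> func.HttpResponse:
        # Client-credentials flow against Azure AD for Microsoft Graph.
        app = msal.ConfidentialClientApplication(
            CLIENT_ID,
            authority=f"https://login.microsoftonline.com/{TENANT_ID}",
            client_credential=CLIENT_SECRET,
        )
        token = app.acquire_token_for_client(
            scopes=["https://graph.microsoft.com/.default"]
        )
        # Example Graph call; swap in whichever endpoint you consume today.
        resp = requests.get(
            "https://graph.microsoft.com/v1.0/users",
            headers={"Authorization": f"Bearer {token['access_token']}"},
        )
        return func.HttpResponse(json.dumps(resp.json()), mimetype="application/json")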
Alternatively, look at Power BI dataflows, which offer self-service ETL, but remember that the destination for your "L" (the load step in ETL) is really only Azure Data Lake Storage Gen2 and Power BI datasets.

We decided to use Logic Apps rather than Data Factory; they offer a convenient means to access Graph APIs, as Logic Apps support OAuth well, i.e. we're not using custom data connectors any more.
In addition, we put some of the more complicated logic into stored procedures, as Logic Apps, despite their name, can only handle basic logic.

Related

Can SSO be used to create a dataflow that will reside in the PBI service?

Our client would like us to use dataflows for data reuse and other reasons. We will be connecting to a Snowflake database from the Power BI service. However, they also want to be able to use SSO (Single Sign-On). So, when a user creates a dataset referencing a dataflow, they want the credentials of the currently logged-in user to be picked up via SSO and passed along to Snowflake when the dataflow retrieves data from Snowflake. I don't think this can be done, but I wanted to verify.
BTW, I know that SSO can be used with Power BI Desktop. Just curious whether dataflows can use it too.
Yes, it seems it is possible to use dataflows with SSO for Snowflake. I am drawing this conclusion from the following reference: https://learn.microsoft.com/en-us/power-query/connectors/snowflake
which lists Dataflows under Summary - Products.

How to serve Google BigQuery output to a client web application

I have exported Firestore collections to Google BigQuery to perform data analysis and aggregation.
What is the best practice (using Google Cloud products) for serving BigQuery outputs to a client web application?
Google provides seven client libraries for BigQuery. You can take any library and write a webserver that serves requests from the client web application. The webserver can use a GCP service account to access BigQuery on behalf of its clients.
One such sample is this project. It's written in TypeScript, uses the NodeJS library on the server and React for the client app. I'm the author.
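As a rough illustration of that pattern (not the linked project itself), a minimal Python webserver using the official google-cloud-bigquery client with a service account could look like this; the dataset and table names are hypothetical:

    from flask import Flask, jsonify
    from google.cloud import bigquery

    app = Flask(__name__)
    # Credentials are picked up from GOOGLE_APPLICATION_CREDENTIALS
    # (a service-account key), so clients never talk to BigQuery directly.
    bq = bigquery.Client()

    @app.route("/daily-totals")
    def daily_totals():
        # Hypothetical aggregation over an exported Firestore collection.
        query = """
            SELECT DATE(created_at) AS day, COUNT(*) AS orders
            FROM `my_project.firestore_export.orders`
            GROUP BY day
            ORDER BY day DESC
            LIMIT 30
        """
        rows = bq.query(query).result()
        return jsonify([dict(row) for row in rows])

    if __name__ == "__main__":
        app.run()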
You may also take a quick tour through Google Data Studio to explore the main features this Google analytics service offers. If your aim is to visualize data from BigQuery, Data Studio is a good option, as it provides a variety of informative dashboards and reports and allows the user to customize charts and graphs and share them publicly or via user collaboration groups.
Data Studio offers a lot of connectors to different data sources, and you can find a dedicated BigQuery connector for integration with data residing in the BigQuery warehouse.
You can track future product enhancements here.

Can Superset visualize data returned from a REST API call?

We are trying to use Apache Superset to visualize business data, some of which is stored in SQL-based databases, but some of which (think, for example, of external weather data) we need to access via public APIs (normally REST, but sometimes also push-based microservices like websockets and gRPC).
Can Superset surface data in this way, or is it tied to SQL or SQL-like queries/APIs?
Superset supports any database engine with a DB-API driver and SQLAlchemy dialect (https://superset.apache.org/#databases).
So, in theory, you could wrap your API calls in some custom-developed, SQLAlchemy-accessible endpoint, but unless you need access to data that's refreshed in real time, your best bet is probably to ETL the data from these public APIs into some type of reporting data lake.
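A minimal sketch of that ETL route, assuming a hypothetical weather REST endpoint and a Postgres table that Superset is already connected to:

    import requests
    import pandas as pd
    from sqlalchemy import create_engine

    # Hypothetical public API and target database; replace with your own.
    API_URL = "https://api.example.com/v1/weather?city=berlin"
    engine = create_engine("postgresql://superset:secret@reporting-db/analytics")

    def etl():
        payload = requests.get(API_URL, timeout=30).json()
        # Flatten the JSON records into a tabular frame.
        df = pd.json_normalize(payload["observations"])
        # Append into a table Superset can query like any other SQL source.
        df.to_sql("weather_observations", engine, if_exists="append", index=False)

    if __name__ == "__main__":
        etl()  # schedule with cron/Airflow for periodic refresh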

What is the "proper" way to use DynamoDB for an iOS app?

I've just started messing around with AWS DynamoDB in my iOS app and I have a few questions.
Currently, I have my app communicating directly with my DynamoDB database. I've been reading around lately, and people are saying this isn't the proper way to go about getting data from my database.
By this I mean I just have a function in my code that queries my DynamoDB database and returns the result.
The way I do it works, but is there a better way I should be going about this?
Amazon DynamoDB itself is a highly scalable service, and standing up another server in front of it means that server must also scale in line with the RCUs/WCUs configured for your tables, which you can and should avoid.
If your mobile application doesn't need a backend server and you can perform all the business functions from the mobile device, then you should probably think about:
Using the AWS DynamoDB SDK for iOS to write the client application that runs on the mobile device
Using the AWS Token Vending Machine to authenticate your mobile users and grant them temporary credentials for operations on DynamoDB tables (see the sketch below)
Controlling access (i.e. what operations should be allowed on which tables, etc.) using IAM policies.
HTH.
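The iOS SDK itself is Objective-C/Swift, but the flow is easier to show in a short Python/boto3 sketch: the client receives temporary credentials (from the token vending machine, or Amazon Cognito in newer setups) and runs a query scoped to its own data; the table and key names are hypothetical:

    import boto3
    from boto3.dynamodb.conditions import Key

    # Temporary credentials handed out by the token vending machine / Cognito;
    # the app never embeds long-lived AWS keys.
    session = boto3.Session(
        aws_access_key_id="<temp-access-key>",
        aws_secret_access_key="<temp-secret-key>",
        aws_session_token="<session-token>",
        region_name="us-east-1",
    )
    table = session.resource("dynamodb").Table("UserNotes")  # hypothetical table

    # The IAM policy attached to the temporary credentials should restrict
    # access to the calling user's own partition key.
    response = table.query(KeyConditionExpression=Key("user_id").eq("user-123"))
    for item in response["Items"]:
        print(item)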
From what you say, I guess you are asking about a way to distribute data to many clients (iOS apps).
There are a few integration patterns (a very good book on this: Enterprise Integration Patterns), one of which is called Shared Database. It is essentially about multiple clients using a common database to share data. The main drawback of that pattern (in your case) is that every client makes assumptions about what the database schema looks like. That can bring you headaches supporting the schema in the future if your business logic changes.
The more advanced approach would be to send events on every change in your data instead of writing changes to the database directly from the client apps. This way you can add processing to the events before the data they carry is written to the database. For example, you may want to change the event format in a new version of your app but still support legacy users, so you add a translation procedure that transforms both types of events into the format that fits the database schema. It's basically a question of whether to work with diffs vs. snapshots.
You should be aware of the added complexity of working with events; it can be overkill if your app is simple and changes to the schema are unlikely.
Also consider that you can do data preprocessing using DynamoDB Streams, which gives you some of the advantages of using events while keeping the implementation simple.
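For instance, a minimal Lambda handler (Python) attached to a DynamoDB Stream is where the translation/preprocessing described above could live; the attribute names and the v1/v2 split are made up for illustration:

    # Lambda handler wired to a DynamoDB Stream; record shape per AWS docs.
    def handler(event, context):
        for record in event["Records"]:
            if record["eventName"] not in ("INSERT", "MODIFY"):
                continue
            new_image = record["dynamodb"]["NewImage"]
            # Hypothetical translation: map a legacy v1 field name onto the
            # current schema before any downstream processing.
            note = new_image.get("body") or new_image.get("text")  # v2 vs v1 apps
            if note is not None:
                process(note["S"])  # stream attributes are typed, e.g. {"S": "..."}

    def process(text):
        # Placeholder for real downstream logic (write to another table,
        # push to a queue, update an aggregate, ...).
        print(text)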

WSO2 Stratos - Multi-tenant application development

I am exploring the product WSO2 Stratos and have watched some of the webinar recordings. I would like to create an application and expose it as SaaS. One of the WebEx recordings covers this in detail, but it does not explain multi-tenancy for data storage. Is there a tutorial available for this? I would like to use a shared schema for data storage. What kind of database can I use for this (e.g. MySQL, MongoDB, Cassandra)? Is it possible to use frameworks like Athena? I am just trying to do a kind of POC, and then I need to decide whether this platform really fits the application I am thinking of building.
You can create databases through WSO2 Storage Server in StratosLive, which can be accessed via storage.stratoslive.wso2.com. You need to create a database and attach a user to it. Then you can access that database from your webapp (you will get a JDBC URL) as you would in normal cases. You can also create Cassandra keyspaces in the Storage Server, but we don't have MongoDB support at the moment. There is no documentation on this yet.
Yes, you're right: multi-tenant data architecture is up to the user to decide. This white paper from Microsoft explains multi-tenant data architecture nicely. The whitepaper, however, is written assuming you're using an RDBMS. I haven't played around with Athena, so it's difficult to say how it would map onto what Stratos provides. The data architecture might be different when you're using a NoSQL DB, and different DBs have different ways of filtering a set of data by a given tenant (or an ID). So, going by the whitepaper, it would map to:
Different DBs -> Different keyspaces
Different tables -> Different column families
Shared schema -> Shared column family
Better to define your application's characteristics beforehand and then choose an appropriate DB.
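To make the shared-schema option concrete, here is a minimal Python/SQLAlchemy sketch (not Stratos-specific) in which every row carries a tenant discriminator and every query filters on it; the table and columns are hypothetical:

    from sqlalchemy import Column, Integer, String, create_engine, select
    from sqlalchemy.orm import Session, declarative_base

    Base = declarative_base()

    class Invoice(Base):
        __tablename__ = "invoices"  # one table shared by all tenants
        id = Column(Integer, primary_key=True)
        tenant_id = Column(String(64), index=True, nullable=False)  # discriminator
        amount = Column(Integer)

    # The JDBC-style URL you get from the Storage Server maps to a
    # SQLAlchemy URL like this one.
    engine = create_engine("mysql+pymysql://user:pass@host/saasdb")

    def invoices_for(tenant_id):
        # Every query must filter on the tenant discriminator column.
        with Session(engine) as session:
            return session.scalars(
                select(Invoice).where(Invoice.tenant_id == tenant_id)
            ).all()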