Can Superset visualize data returned from a REST API call? - apache-superset

We are trying to use Apache Superset to visualize business data, some of which is stored in SQL-based databases, but some of it (think, for example, of external weather data) we need to access via public APIs (normally REST, but sometimes also push-based services such as WebSockets or gRPC).
Can Superset surface data in this way, or is it tied to SQL or SQL-like queries/APIs?

Superset supports any database engine with a DB-API driver and SQLAlchemy dialect (https://superset.apache.org/#databases).
So, in theory, you could wrap your API calls in a custom-developed SQLAlchemy dialect and DB-API driver, but unless you need access to data that is refreshed in real time, your best bet is probably to ETL the data from these public APIs into some type of reporting database or data lake.
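If the data doesn't need to be real-time, that ETL step can be as small as a scheduled script. Below is a minimal sketch in Python: the weather API URL, the connection string and the table name are all placeholders, and the job simply lands the JSON response in a table that Superset can then query like any other dataset.

```python
import pandas as pd
import requests
from sqlalchemy import create_engine

# Hypothetical REST endpoint returning a JSON list of hourly weather records.
API_URL = "https://api.example-weather.com/v1/observations?city=Berlin"

# Any SQLAlchemy-compatible database that is registered in Superset (placeholder DSN).
engine = create_engine("postgresql://superset:secret@reporting-db:5432/analytics")

def load_weather_snapshot() -> int:
    """Pull the latest observations and land them in a table Superset can chart."""
    records = requests.get(API_URL, timeout=30).json()
    df = pd.json_normalize(records)               # flatten nested JSON into columns
    df["loaded_at"] = pd.Timestamp.now(tz="UTC")  # record when this snapshot ran
    df.to_sql("weather_observations", engine, if_exists="append", index=False)
    return len(df)

if __name__ == "__main__":
    print(f"Loaded {load_weather_snapshot()} rows")
```

Run something like this from cron or an orchestrator such as Airflow, then point a Superset dataset at the weather_observations table.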

Related

How can Azure Data Factory access a Custom Data Connector

I've just started to look at Azure Data Factory as a possible way to get data we are currently consuming for Power BI via custom connectors, primarily to access Graph APIs. I can't see if the same data is available to Azure Data Factory. Is there any way to achieve this?
Azure Data Factory has a number of different features which may help:
* Web activity - call REST APIs from an ADF pipeline; can only access public URLs
* Webhook activity - call endpoints and pass a callback URL
* Azure Function activity - run Azure Functions in the pipeline; functions are very flexible, so they could probably do this
* Custom activity via Azure Batch - run .NET code via Azure Batch; very customisable
* Databricks notebook activity - call a notebook written in Scala, Python, R, Java or Spark SQL; completely customisable
Alternatively, look at Power BI dataflows, which offer self-service ETL, but remember that the destination for your "L" is really only Azure Data Lake Storage Gen2 and Power BI datasets.
In the end we decided to use Logic Apps rather than Data Factory; they offer a convenient means to access Graph APIs because Logic Apps support OAuth well, so we are no longer using custom data connectors.
In addition, we moved some of the more complicated logic into stored procedures, since Logic Apps, despite their name, can only handle fairly basic logic.
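For what it's worth, the OAuth-plus-Graph call that a Logic App (or custom connector) performs can be reproduced in a few lines of code. The sketch below is illustrative only: the tenant ID, client ID and secret are placeholders for an Azure AD app registration that has been granted the relevant Graph application permissions.

```python
import requests

# Placeholders: your Azure AD tenant and an app registration with Graph permissions.
TENANT_ID = "your-tenant-id"
CLIENT_ID = "your-client-id"
CLIENT_SECRET = "your-client-secret"

def get_graph_token() -> str:
    """Client-credentials flow against Azure AD for Microsoft Graph."""
    resp = requests.post(
        f"https://login.microsoftonline.com/{TENANT_ID}/oauth2/v2.0/token",
        data={
            "grant_type": "client_credentials",
            "client_id": CLIENT_ID,
            "client_secret": CLIENT_SECRET,
            "scope": "https://graph.microsoft.com/.default",
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["access_token"]

def list_users():
    """Example Graph call; swap the endpoint for whatever data you feed to Power BI."""
    token = get_graph_token()
    resp = requests.get(
        "https://graph.microsoft.com/v1.0/users",
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["value"]
```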

Web services over HBase

I am new to the Hadoop environment, sorry if the question is obvious...
I need to develop a web service to record and read large volumes of data. Because of this requirement I thought of using a Hadoop cluster and HBase as my database.
I have designed my HBase schema to satisfy my requirements; so far, so good.
The thing is that since it is a service I am developing, I would like the users of the service not to know the internal representation of the data.
I do not want the users to have to invoke a Put to a certain table, for example, to the Clients table, but instead invoke a high-abstraction method, for example, createClient().
How do I add this abstraction layer on top of HBase while keeping the reliability, distribution and capacity to serve many simultaneous users that HBase itself offers?
Thanks a lot
Consider HBase Stargate (the HBase REST server). If you want to obscure the table names in the URI, you could put a proxying web server in front of Stargate.
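One common way to hide the internal representation is a thin service layer that exposes domain methods such as createClient() and translates them into HBase operations. Below is a minimal sketch using the happybase Thrift client; the table name, column family and field names are purely illustrative, and the same facade could just as well sit in front of Stargate's REST endpoints.

```python
import uuid
import happybase  # Thrift-based HBase client; requires the HBase Thrift server to be running

class ClientService:
    """Thin facade so callers use create_client()/get_client() instead of raw HBase Puts."""

    def __init__(self, hbase_host: str = "hbase-thrift.example.com"):
        self._conn = happybase.Connection(hbase_host)
        self._table = self._conn.table("clients")  # internal table name stays hidden

    def create_client(self, name: str, email: str) -> str:
        client_id = uuid.uuid4().hex
        self._table.put(client_id.encode(), {
            b"info:name": name.encode(),
            b"info:email": email.encode(),
        })
        return client_id

    def get_client(self, client_id: str) -> dict:
        row = self._table.row(client_id.encode())
        return {k.decode(): v.decode() for k, v in row.items()}
```

Expose create_client() and get_client() through your web framework of choice, and callers never see the underlying tables or column families.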

What is the "proper" way to use DynamoDB for an iOS app?

I've just started messing around with AWS DynamoDB in my iOS app and I have a few questions.
Currently, I have my app communicating directly with my DynamoDB database. I've been reading around lately and people are saying this isn't the proper way to go about getting data from my database.
By this I mean I just have a function in my code that queries my DynamoDB table and returns the result.
The way I do it works, but is there a better way I should be going about this?
Amazon DynamoDB itself is a highly scalable service, and standing up another server in front of it means you also have to scale that server in line with the RCU/WCU configured for your tables, which you can and should avoid.
If your mobile application doesn't need a backend server and you can perform all the business functions from the mobile device, then you should probably think about:
* using the AWS DynamoDB SDK for iOS to write the client application that runs on the device;
* using the AWS Token Vending Machine to authenticate your mobile users and grant them temporary credentials for operations on DynamoDB tables;
* controlling access (i.e. which operations are allowed on which tables) using IAM policies.
HTH.
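To make that list concrete, here is a rough sketch (in Python with boto3, standing in for the iOS SDK) of what the client side looks like once it holds temporary credentials from the Token Vending Machine; the table name, key schema and credential values are placeholders.

```python
import boto3
from boto3.dynamodb.conditions import Key

# Stand-in for the temporary credentials the Token Vending Machine would hand to the
# mobile client after authentication; never ship long-lived keys in the app.
session = boto3.Session(
    aws_access_key_id="ASIA...",    # temporary access key (placeholder)
    aws_secret_access_key="...",    # temporary secret key (placeholder)
    aws_session_token="...",        # session token issued with the temporary credentials
    region_name="us-east-1",
)

table = session.resource("dynamodb").Table("UserNotes")  # illustrative table name

def get_my_notes(user_id: str):
    # With an IAM policy scoping the credentials to this user's partition key,
    # the client can only read and write its own items.
    resp = table.query(KeyConditionExpression=Key("user_id").eq(user_id))
    return resp["Items"]
```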
From what you say, I guess you are talking about a way to distribute data to many clients (iOS apps).
There are a few integration patterns (a very good book on this: Enterprise Integration Patterns), one of which is called shared database. It is essentially about multiple clients sharing data through a common database. The main drawback of that pattern (in your case) is that every client makes assumptions about what the database schema looks like. That can give you headaches supporting the schema in the future if your business logic changes.
The more advanced approach is to send an event on every change in your data instead of writing changes to the database directly from the client apps. This way you can add processing to the events before the data they carry is written to the database. For example, you may want to change the event format in a new version of your app but still support legacy users, so you add a translation step that transforms both types of events into the format that fits the database schema. It's basically a question of whether to work with diffs or with snapshots.
Be aware of the added complexity of working with events; it can be overkill if your app is simple and schema changes are unlikely.
Also consider that you can do this kind of preprocessing with DynamoDB Streams, which gives you some of the advantages of events while keeping the implementation simple.
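As a rough illustration of that event-based preprocessing, the sketch below is an AWS Lambda handler attached to a table's DynamoDB Stream; the attribute names and the translation step are hypothetical and would be replaced by whatever your schema actually needs.

```python
# Minimal sketch of preprocessing change events from a DynamoDB Stream.
# Record shapes follow the standard DynamoDB Streams event format.

def handler(event, context):
    for record in event["Records"]:
        if record["eventName"] not in ("INSERT", "MODIFY"):
            continue
        new_image = record["dynamodb"].get("NewImage", {})
        # Example translation: accept both a legacy and a new attribute name
        # before writing the processed view downstream (another table, a queue, ...).
        payload = {
            "id": new_image.get("id", {}).get("S"),
            "name": new_image.get("full_name", new_image.get("name", {})).get("S"),
        }
        print("processed event:", payload)  # replace with your downstream write
```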

WSO2 Stratos - Multi-tenant application development

I am exploring WSO2 Stratos and have watched some of the webinar recordings. I would like to create an application and expose it as SaaS. One of the WebEx recordings covers this in detail, but it does not explain multi-tenancy for data storage. Is there any tutorial available for this? I would like to use a shared schema for data storage. What kind of database can I use for this (e.g. MySQL, MongoDB, Cassandra)? Is it possible to use frameworks like Athena? I am just trying to do a kind of POC and then decide whether this platform really fits the application I am thinking of building.
You can create databases through WSO2 Storage Server in StratosLive, which can be accessed via storage.stratoslive.wso2.com. You need to create a database and attach a user to it. Then you can access that database from your webapp (you will get a JDBC URL) as you would in the normal case. You can also create Cassandra keyspaces in the Storage Server, but we don't have MongoDB support at the moment. There is no documentation on this yet.
Yes, you're right. Multi-tenant data architecture is up to the user to decide. This white paper from Microsoft explains multi-tenant data architecture nicely. The whitepaper however is written assuming you're using an RDBMS. I haven't played around with Athena so it's difficult to say how it'll map with what Stratos provides. The data architecture might be different when you're using a NoSQL DB and different DBs have different ways of filtering a set of data by a given tenant (or an ID). So probably going by the whitepaper it'll map to,
Different DBs -> Different keyspaces
Different tables -> Different column families
Shared schema -> Shared column family
It is better to define your application's characteristics beforehand and then choose an appropriate DB.
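For the shared-schema option specifically, the usual pattern is that every row (or, in Cassandra, every row key) carries a tenant identifier and every query filters on it in the service layer. The sketch below uses SQLite purely for illustration; the table, column and tenant names are made up, but the same idea maps onto a shared column family with the tenant ID as part of the row key.

```python
import sqlite3

# Shared-schema multi-tenancy: one table, tenant_id on every row, filter on every read.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        tenant_id TEXT NOT NULL,
        order_id  TEXT NOT NULL,
        amount    REAL NOT NULL,
        PRIMARY KEY (tenant_id, order_id)
    )
""")

def add_order(tenant_id: str, order_id: str, amount: float):
    conn.execute("INSERT INTO orders VALUES (?, ?, ?)", (tenant_id, order_id, amount))

def orders_for_tenant(tenant_id: str):
    # The tenant filter is applied in the service layer on every query.
    return conn.execute(
        "SELECT order_id, amount FROM orders WHERE tenant_id = ?", (tenant_id,)
    ).fetchall()

add_order("acme", "o-1", 99.0)
add_order("globex", "o-2", 10.0)
print(orders_for_tenant("acme"))   # only acme's rows come back
```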

looking for a hosted back-end business data storage for analytics

I want a simple hosted data store licensed for business applications, with the following features:
* REST-like access for CRUD operations (primarily adding records)
* private and authenticated
* easy integration with a front-end charting client like the Google Visualization APIs
* easy to use and set up
What about:
* Google Fusion Tables
* Google Cloud Services
* Google BigQuery
* Google Cloud SQL
Or other, non-Google products, but I am imagining a cleaner integration between Google Charts and one of their back-end data services.
Pros, cons, advice?
First, since this is Stack Overflow, I won't attempt to provide a judgement about how "easy to use and set up" each option is - you can do that by reading the documentation for each product.
That being said, overall, the "right" answer really depends on what you are trying to do, and how much data you have. It also depends on what type of application you are building (this is Stack Overflow, so I am assuming you are a developer).
Relational databases (like Google Cloud SQL) are great for maintaining transactional consistency, but once your data grows massive it becomes difficult, expensive, or impossible to run analysis queries in a reasonable timeframe.
Google BigQuery is an analysis tool that lets developers ask questions about really, really big datasets using a SQL-like language. It is 100% cloud based and is accessed via a RESTful API - but it only allows appending data, not changing individual records.
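As a rough idea of how that looks from code, the sketch below uses the google-cloud-bigquery Python client to append rows and then run an aggregate query; the project, dataset, table and column names are placeholders, and it assumes application default credentials are configured in the environment.

```python
from google.cloud import bigquery  # pip install google-cloud-bigquery

# Illustrative project/dataset/table names.
client = bigquery.Client(project="my-analytics-project")
table_id = "my-analytics-project.reporting.page_views"

# BigQuery is append-oriented: stream new records in rather than updating rows in place.
errors = client.insert_rows_json(table_id, [
    {"page": "/pricing", "views": 42, "day": "2024-01-01"},
])
if errors:
    raise RuntimeError(errors)

# Analysis over the whole table with SQL.
query = f"SELECT page, SUM(views) AS total FROM `{table_id}` GROUP BY page"
for row in client.query(query).result():
    print(row["page"], row["total"])
```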