How to call a Web Service API and route data into Azure SQL Database? - web-services

Having configured an Azure SQL Database, I would like to feed some tables with data from an HTTP REST GET call.
I have tried Microsoft Flow (whose HTTP Request action is utterly botched) and I am now exploring Azure Data Factory, to no avail.
The only way I can currently think of is provisioning an Azure VM and installing Postman with Newman. But then, I would still need to create a Web Service interface to the Azure SQL Database.
Does Microsoft offer no HTTP call service to hook up to an Azure SQL Database?

Had the same situation a couple of weeks ago and I ended up building the API call management using Azure Functions. No problem using the Azure SDKs to upload the result to e.g. Blob storage or Data Lake. And you can add whatever assembly you need to perform the HTTP call.
From there you can easily pull it into an Azure SQL DB with Data Factory.
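For illustration, a minimal sketch of such a function in Python (the API URL, storage connection string, and container/blob names are placeholders, not anything from the original setup):

import json

import azure.functions as func
import requests
from azure.storage.blob import BlobServiceClient


def main(mytimer: func.TimerRequest) -> None:
    # Call the external REST API (placeholder URL)
    response = requests.get("https://example.com/api/data")
    response.raise_for_status()

    # Land the raw JSON in Blob storage so Data Factory can pick it up
    blob_service = BlobServiceClient.from_connection_string("<storage-connection-string>")
    blob_client = blob_service.get_blob_client(container="raw-api-data", blob="data.json")
    blob_client.upload_blob(json.dumps(response.json()), overwrite=True)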

I would suggest you write yourself an Azure Data Factory custom activity to achieve this. I've done this for a recent project.
Add a C# class library to your ADF solution and create a class that implements IDotNetActivity. Then, in the Execute method (which returns an IDictionary), make the HTTP web request to get the data. Land the downloaded file in blob storage first, then have a downstream activity load the data into SQL DB.
public class GetLogEntries : IDotNetActivity
{
    public IDictionary<string, string> Execute(
        IEnumerable<LinkedService> linkedServices,
        IEnumerable<Dataset> datasets,
        Activity activity,
        IActivityLogger logger)
    {
        // ... resolve the output dataset and linked service here (see below) ...

        // Make the HTTP request to the external REST API (placeholder URL)
        HttpWebRequest httpWebRequest = (HttpWebRequest)WebRequest.Create("https://example.com/api/logentries");
        HttpWebResponse myHttpWebResponse = (HttpWebResponse)httpWebRequest.GetResponse();

        // ... stream the response body to blob storage / Data Lake ...

        return new Dictionary<string, string>();
    }
}
You can use the ADF linked services to authenticate against the storage account and define which container and file name you want as the output, etc.
This is an example I used for Data Lake, but there is an almost identical class for blob storage:
Dataset outputDataset = datasets.Single(dataset => dataset.Name == activity.Outputs.Single().Name);

AzureDataLakeStoreLinkedService outputLinkedService;
outputLinkedService = linkedServices.First(
    linkedService =>
        linkedService.Name ==
        outputDataset.Properties.LinkedServiceName).Properties.TypeProperties
    as AzureDataLakeStoreLinkedService;
Don't bother with an input for the activity.
You will need an Azure Batch Service as well to handle the compute for the compiled classes. Check out my blog post on doing this.
https://www.purplefrogsystems.com/paul/2016/11/creating-azure-data-factory-custom-activities/
Hope this helps.

Related

Need recommendation to create an API by aggregating data from multiple source APIs

Before I start doing this I wanted to get advice from the community on the best and most efficient manner to go about doing it.
Here is what I want to do:
Ingest data from multiple APIs which return JSON
Store it in either S3 or DynamoDB
Modify the data to use my JSON structure
Pipe out the aggregate data as an API
The data will be updated twice a day, so I would pull in the data from the source APIs and put it through my pipeline twice a day.
So basically I want to create an API by aggregating data from multiple source APIs.
I've started playing with Lambda and created the following function using Python.
#https://stackoverflow.com/a/41765656
import requests
import json

def lambda_handler(event, context):
    #https://www.nylas.com/blog/use-python-requests-module-rest-apis/ USEFUL!!!
    #https://stackoverflow.com/a/65896274
    response = requests.get("https://remoteok.com/api")
    #print(response.json())
    return {
        'statusCode': 200,
        'body': response.json()
    }
#https://stackoverflow.com/questions/63733410/using-lambda-to-add-json-to-dynamodb DYNAMODB
This works and returns a JSON response.
Here are my questions:
Should I store the data on S3 or DynamoDB?
Which AWS service should I use to aggregate the data into my JSON structure?
Which service should I use to publish the aggregate data as an API, API Gateway?
However, before I go further I would like to know the best way to go about doing this.
If you have experience with this I would love to hear from you.
The answer will vary depending on the quantity of data you're planning to mine. Lambdas are designed for short-duration, high-frequency workloads and thus might not be suitable.
I would recommend looking into AWS Glue, as this seems like a fairly typical ETL (Extract, Transform, Load) problem. You can set up Glue jobs to run on a schedule, and as for data aggregation, that's the T in ETL.
It's simple to output the Glue DataFrame (the result of a transformation) as S3 files, which can then be queried directly by Amazon Athena (as if they were DB content).
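As a rough sketch of what such a Glue job could look like (the bucket paths and column names below are made up for illustration):

import sys

from awsglue.context import GlueContext
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Standard Glue boilerplate: resolve job arguments and build a Spark session
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session

# Extract: read the raw JSON that the ingestion step landed in S3
raw = spark.read.json("s3://my-raw-bucket/source-api/")

# Transform: keep only the fields needed for the aggregate structure
curated = raw.select("id", "title", "company", "date")

# Load: write query-friendly files that Athena can read directly
curated.write.mode("overwrite").parquet("s3://my-curated-bucket/jobs/")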
As for exposing that data via an API, the serverless framework or SST are great tools for taking the sting out of spinning up a serverless API and associated resources.
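Whichever framework you pick, the API handler itself can stay tiny; for example, an API Gateway-backed Lambda that simply serves the pre-aggregated JSON (bucket and key names are placeholders):

import boto3

s3 = boto3.client("s3")


def handler(event, context):
    # Read the aggregate JSON produced by the ETL step and return it as-is
    obj = s3.get_object(Bucket="my-curated-bucket", Key="aggregate/latest.json")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": obj["Body"].read().decode("utf-8"),
    }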

Which Google Cloud function is preferable to fetch data from an external API into GCP?

This should be a very easy question, but I can't wrap my head around what to use. I would like to create a data pipeline that fetches data from an outside/external API (for example, the Spotify API) and performs some rather simple data cleaning on it, while either creating a JSON file in Cloud Storage or entering the data into BigQuery.
As far as I understand I can use Composer to do it, using DAGs etc., but what I need here is something more simple/lightweight (mainly UI based) that doesn't cost as much as Composer and is easier to use. What I am looking for is something like Data Factory in Azure.
So, in brief:
Log in to a data source using username/password
Extract data from a well known format (CSV/Json)
Transform the data, such as removing columns or performing simple filtering like date filtering.
Reformat the data into another format (JSON/CSV/BigQuery)
...without having to code everything from scratch.
Can I handle all of this with one GCP application or do I need to use combinations like Cloud Scheduler, Cloud Functions etc?
As always, you have several options...
Cloud Scheduler seems to be a requirement to trigger the process regularly (as often as every minute).
Then, you have 2 options:
Code the process: API Call, transform/clean the data, sink the data into the destination
Use Cloud Workflows: you can define the API calls that you want to make:
Call the API
Store the raw data in BigQuery (also an API call; there are connectors to simplify the process)
Run a query in BigQuery to clean/format your data and store it in a final table (also an API call)
You can also mix the two: use Cloud Functions to get the data, and clean/format the data with a query in BigQuery.
Doing something specific like that without starting from scratch... difficult...
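If you go the Cloud Functions route, a minimal sketch could look like this (the API URL, dataset, and table names are placeholders; it assumes the requests and google-cloud-bigquery libraries and an API that returns a JSON array of flat objects):

import requests
from google.cloud import bigquery


def ingest(request):
    # Fetch the data from the external API (placeholder URL);
    # assumes the API returns a JSON array of flat objects
    rows = requests.get("https://api.example.com/tracks").json()

    # Stream the raw rows into a staging table; a scheduled query can then
    # clean/reshape them into the final table.
    client = bigquery.Client()
    errors = client.insert_rows_json("my_dataset.raw_tracks", rows)
    if errors:
        raise RuntimeError(f"BigQuery insert errors: {errors}")
    return "ok"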
EDIT 1
If you have a look at the documentation, you can see this sample:
- getCurrentTime:
    call: http.get
    args:
        url: https://us-central1-workflowsample.cloudfunctions.net/datetime
    result: currentTime
- readWikipedia:
    call: http.get
    args:
        url: https://en.wikipedia.org/w/api.php
        query:
            action: opensearch
            search: ${currentTime.body.dayOfTheWeek}
    result: wikiResult
- returnResult:
    return: ${wikiResult.body[1]}
The first step, getCurrentTime, performs an external call and stores the result in result: currentTime.
In the next step, you can reuse the result currentTime and pick out only the value that you want in another API call.
And you can chain steps like that.
If you need authentication, you can make a call to Secret Manager to get the secret values and then reuse the Secret Manager call result in subsequent steps.
For an easier connection to Google APIs, you can use connectors.

Executing WebJobs via SQL Server stored procedure

I have a very simple C# console exe. My code deletes a blob from a particular blob storage. It takes a couple of command-line arguments - container name & blob name and deletes the blob whenever triggered.
Now, I want to schedule this exe as a webjob.
I have a couple of questions -
How can I manually trigger this webjob since it takes command line arguments?
Is there any way that I can trigger this webjob via a SQL server stored procedure?
You can have the stored procedure send HTTP requests to fulfill your needs.
Steps:
Use the Kudu WebJobs API to invoke your WebJob.
Create an HTTP request method in SQL Server.
Related post:
How can I make HTTP request from SQL server?
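For reference, the Kudu call itself looks roughly like this (shown here in Python; the app name, WebJob name, and deployment credentials are placeholders, and the same POST can be issued from SQL Server once you have an HTTP request method there):

import requests

# Trigger a "triggered" WebJob through the Kudu REST API; the arguments
# query parameter is passed to the console exe as its command-line arguments.
resp = requests.post(
    "https://<app-name>.scm.azurewebsites.net/api/triggeredwebjobs/<job-name>/run",
    params={"arguments": "mycontainer myblob.json"},
    auth=("<deployment-user>", "<deployment-password>"),
)
resp.raise_for_status()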

AWS Lambda, multi tenant apps, separate databases

I have to create an app where each user has their own database and can log in on their own subdomain, but all users use the same API endpoints (Lambda functions).
API is in Node.js, Frontend in Angular 7.
Is it feasible? Can you give me instructions on how to configure AWS for it?
AWS has little role to play here. Your Node.js API and Lambda function design will handle this.
I did this a couple of years ago. I used a key called ClientID which is passed into every request. You can use anything as long as it's unique.
In your APIs, when you initialize your database, use that identifier to map your database connection (this means you have to initialize the database connection on each request).
e.g.
User A -> Database A
User B -> Database B
etc...
However, I encountered an issue when doing this:
when you update database A, you have to update database B as well.
(what happens when you have 100 databases?)
But it's doable.
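The question's stack is Node.js, but the pattern is the same in any language. A minimal sketch in Python, assuming the client ID arrives as a request header and a hypothetical mapping of tenants to database names:

import os

import pymysql  # any DB driver works; pymysql is only for illustration

# Hypothetical tenant map; in practice this could live in a config table,
# SSM Parameter Store, etc.
TENANT_DATABASES = {
    "client-a": "database_a",
    "client-b": "database_b",
}


def lambda_handler(event, context):
    # Resolve the tenant from the request (here: an HTTP header)
    client_id = event["headers"]["x-client-id"]
    db_name = TENANT_DATABASES[client_id]

    # Initialize the connection per request against that tenant's database
    conn = pymysql.connect(
        host=os.environ["DB_HOST"],
        user=os.environ["DB_USER"],
        password=os.environ["DB_PASSWORD"],
        database=db_name,
    )
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT COUNT(*) FROM users")
            (count,) = cur.fetchone()
        return {"statusCode": 200, "body": str(count)}
    finally:
        conn.close()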

Is there a way to map an object graph with @Query?

I am trying to migrate my SDN 3 embedded configuration to SDN 3.3.0 with a Neo4j instance in server mode (communicating via the REST API).
When the DB was embedded, making a lot of small hits to the DB was not a big deal, as Neo4j is capable of handling these kinds of queries super fast.
However, now that I run Neo4j separately from my application (i.e. in server mode), making a lot of small queries is not advisable because of the network overhead.
User user = userRespository.findOne(123);
user.fetch(user.getFriends());
user.fetch(user.getManager());
user.fetch(user.getAgency());
This will trigger quite a few queries, especially if I want to get, not a single user, but a list of users.
Can I use the @Query annotation to fetch the user and the related entities and map them into a User object?
I was thinking of something like this:
#Query("MATCH (u:User)-[r:FRIEND]->(f) RETURN u,r,f"
Is such a thing possible with Spring Data Neo4j? Will it be possible with Spring Data Neo4j 4?
You can define an interface for the query result using the @QueryResult annotation and let the method for the query return an object of that type, i.e.:
@QueryResult
public interface UserWithFriends {
    @ResultColumn("u")
    User getUser();

    @ResultColumn("f")
    List<User> friends();
}

@Query("MATCH (u:User)-[:FRIEND]->(f) WHERE u.name={name} RETURN u,f")
UserWithFriends getUserByName(@Param("name") String name);