How to integrate a WebJob within an Azure Data Factory Pipeline - azure-webjobs

I'm trying to integrate a WebJob inside an ADF pipeline.
The WebJob is a very simple console application:
using System;

namespace WebJob4
{
    class ReturnTest
    {
        static double CalculateArea(int r)
        {
            double area = r * r * Math.PI;
            return area;
        }

        static void Main()
        {
            int radius = 5;
            double result = CalculateArea(radius);
            Console.WriteLine("The area is {0:0.00}", result);
        }
    }
}
How do we call this WebJob from an ADF pipeline and store the response code (HTTP 200 in case of success) in Azure Blob storage?

Dec 2018 Update:
If you are thinking of doing this with an Azure Function, Azure Data Factory now provides an Azure Function step! The underlying principle is the same, as you still have to expose the Azure Function with an HTTP trigger. However, this provides better security, since you can restrict your Data Factory instance's access to the Azure Function using ACLs.
Reference : https://azure.microsoft.com/en-us/blog/azure-functions-now-supported-as-a-step-in-azure-data-factory-pipelines/
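For illustration, a minimal sketch of such an HTTP-triggered function, assuming the Python worker and the v2 programming model (the route name and query parameter are made up; the body just ports the WebJob's area calculation):

import json
import math

import azure.functions as func

app = func.FunctionApp(http_auth_level=func.AuthLevel.FUNCTION)

@app.route(route="calculate-area")
def calculate_area(req: func.HttpRequest) -> func.HttpResponse:
    # Same logic as the console WebJob: compute a circle's area.
    radius = int(req.params.get("radius", 5))
    area = radius * radius * math.pi
    # Return a JSON body with HTTP 200 so the pipeline step can read the result.
    return func.HttpResponse(
        json.dumps({"area": round(area, 2)}),
        mimetype="application/json",
        status_code=200,
    )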
Original Answer
From the comments posted I believe you don't want to go down the custom activities route.
You could try using a copy activity for this, even though this is probably not its intended purpose.
There is an HTTP connector available for copying data from a web source:
https://learn.microsoft.com/en-us/azure/data-factory/v1/data-factory-http-connector
The copy activity triggers an HTTP endpoint, and you can specify a variety of authentication mechanisms, from Basic to OAuth2.
Below I am using the endpoint to trigger the Azure Function process; the output is saved to a Data Lake folder for logging (you can obviously use other sinks; in your case it would be Blob storage, as in the sketch after the activity JSON below).
Basic linked Service
{
    "name": "linkedservice-httpEndpoint",
    "properties": {
        "type": "Http",
        "typeProperties": {
            "url": "https://azurefunction.api.com/",
            "authenticationType": "Anonymous"
        }
    }
}
Basic Input Dataset
{
    "name": "Http-Request",
    "properties": {
        "type": "Http",
        "linkedServiceName": "linkedservice-httpEndpoint",
        "availability": {
            "frequency": "Minute",
            "interval": 30
        },
        "typeProperties": {
            "relativeUrl": "/api/status",
            "requestMethod": "Get",
            "format": {
                "type": "TextFormat",
                "columnDelimiter": ","
            }
        },
        "structure": [
            {
                "name": "Status",
                "type": "String"
            }
        ],
        "published": false,
        "external": true,
        "policy": {}
    }
}
Output
{
    "name": "Http-Response",
    "properties": {
        "structure": [
            ...
        ],
        "published": false,
        "type": "AzureDataLakeStore",
        "linkedServiceName": "linkedservice-dataLake",
        "typeProperties": {
            ...
        },
        "availability": {
            ...
        },
        "external": false,
        "policy": {}
    }
}
Activity
{
    "type": "Copy",
    "name": "Trigger Azure Function or WebJob with Http Trigger",
    "scheduler": {
        "frequency": "Day",
        "interval": 1
    },
    "typeProperties": {
        "source": {
            "type": "HttpSource",
            "recursive": false
        },
        "sink": {
            "type": "AzureDataLakeStoreSink",
            "copyBehavior": "MergeFiles",
            "writeBatchSize": 0,
            "writeBatchTimeout": "00:00:00"
        }
    },
    "inputs": [
        {
            "name": "Http-Request"
        }
    ],
    "outputs": [
        {
            "name": "Http-Response"
        }
    ],
    "policy": {
        ...
    }
}
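For reference, a rough Python sketch (not part of the ADF setup; the endpoint, container, and connection string are illustrative) of the same flow done manually: call the HTTP endpoint and persist the response code and body to Blob storage, which is what the question asks for:

import requests
from azure.storage.blob import BlobServiceClient

# Illustrative values: the HTTP endpoint and storage connection string
# come from your own environment.
endpoint = "https://azurefunction.api.com/api/status"
conn_str = "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=..."

# Call the HTTP endpoint (this is what the HttpSource side of the copy activity does).
resp = requests.get(endpoint)

# Persist the HTTP status code (200 on success) and the body, mirroring the sink side.
blob = BlobServiceClient.from_connection_string(conn_str).get_blob_client(
    container="logs", blob="webjob-status.txt"
)
blob.upload_blob(f"{resp.status_code},{resp.text}", overwrite=True)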

Related

How to skip particular set of test cases in a collection in terminal using newman?

I have a Postman collection that consists of requests with test cases for each request. There are two test cases per request: one validates the status code and the other validates the response time. I need to run the status code test on every run, but the response time test only occasionally. How can I achieve this without modifying the collection for every run, and is there any option in the terminal that makes this possible?
collection.json
{
    "name": "Metadata",
    "item": [
        {
            "name": "info",
            "event": [
                {
                    "listen": "test",
                    "script": {
                        "id": "32cf67e7-5d42-4231-86fe-e7fffa32c855",
                        "exec": [
                            "pm.test(\"Status code is 200\", function () {",
                            " pm.response.to.have.status(200);",
                            "});",
                            "pm.test(\"Response time is less than 300ms\", function () {",
                            " pm.expect(pm.response.responseTime).to.be.below(300);",
                            "});"
                        ],
                        "type": "text/javascript"
                    }
                }
            ],
            "request": {
                "auth": {
                    "type": "bearer",
                    "bearer": [
                        {
                            "key": "token",
                            "value": "{{tokenAdmin}}",
                            "type": "string"
                        }
                    ]
                },
                "method": "GET",
                "header": [],
                "url": {
                    "raw": "{{url}}/api/m0/metadata/info",
                    "host": [
                        "{{url}}"
                    ],
                    "path": [
                        "api",
                        "m0",
                        "metadata",
                        "info"
                    ]
                }
            },
            "response": []
        }
    ],
    "protocolProfileBehavior": {},
    "_postman_isSubFolder": true
}
For a very basic flow, you can use moment to check which day it currently is; if that matches the condition, the responseTime test will run.
let moment = require('moment'),
    date = moment().format('dddd');

// Runs on each request
pm.test("Status code is 200", function () {
    pm.response.to.have.status(200);
});

// Only runs on a Friday
if (date === 'Friday') {
    pm.test("Response time is less than 1000ms", function () {
        pm.expect(pm.response.responseTime).to.be.below(1000);
    });
}
Moment has lots of different options available and might work if you only want to run that check at the end of the sprint or on a given day.

Alexa SmartHome Skills : Issue with device discovery

I have written an Alexa smart home skill.
When I try to discover devices using the Alexa test console or the mobile app, the Lambda is triggered.
The Lambda executes successfully, but I get the error below in the app and in the test tab of the Alexa console.
I couldn't find any new Smart Home devices. If you haven't already,
please enable the smart home skill for your device from the Alexa App.
What could be the possible issue?
Since the Lambda executes successfully, I don't think there is any issue with the language (English (IN)) or the AWS region (eu-west-1) where the Lambda is deployed.
I don't see any logs in the Alexa developer console.
Any pointers?
Response from Lambda function -
header = {
    namespace: 'Alexa.Discovery',
    name: 'Discover.Response',
    payloadVersion: '3',
    messageId: '785f0173-6ddb-41d8-a785-de7159c7f7ca'
}
payload = {
    "endpoints": [
        {
            "endpointId": "d4b87cbe6c8e490493733f260b8c2c25",
            "friendlyName": "Kitchen",
            "description": "Demo",
            "manufacturerName": "Man1",
            "displayCategories": [
                "LIGHT"
            ],
            "cookie": {
                "owner": "Owner1"
            },
            "capabilities": [
                {
                    "type": "AlexaInterface",
                    "version": "3",
                    "interface": "Alexa"
                },
                {
                    "type": "AlexaInterface",
                    "version": "3",
                    "interface": "Alexa.PowerController",
                    "properties": {
                        "supported": [
                            {
                                "name": "powerState"
                            }
                        ],
                        "proactivelyReported": true,
                        "retrievable": true
                    }
                },
                {
                    "type": "AlexaInterface",
                    "version": "3",
                    "interface": "Alexa.BrightnessController",
                    "properties": {
                        "supported": [
                            {
                                "name": "brightness"
                            }
                        ],
                        "proactivelyReported": true,
                        "retrievable": true
                    }
                }
            ]
        }
    ]
}
We are wrapping header and payload in the response event.
context.succeed({ event: { header: header, payload: payload } });
So far I haven't found a way to view the logs either.
I had the same problem, and I realized that I was putting wrong values in some properties or schema entities, such as the ids.
Likewise, another thing that solved it for me on one occasion was to structure the response in the following way:
context.succeed({
    "event": {
        "header": {
            "namespace": "Alexa.Discovery",
            "name": "Discover.Response",
            "payloadVersion": "3",
            "messageId": header.messageId
        },
        "payload": {
            "endpoints": [
                {
                    "endpointId": "demo_id",
                    ...
                    ,
                    "cookie": {},
                    "capabilities": [
                        {
                            "type": "AlexaInterface",
                            "interface": "Alexa",
                            "version": "3"
                        },
                        ...
                    ]
                }
            ]
        }
    }
});

Cloud API Vision Results not appearing

I'm making a request with the Google Vision API that appears to have worked: I get an operation number back. The problem I am having is that I am not sure how to interpret the results, and nothing appeared in the output folder after running the script.
This is the script I ran
https://vision.googleapis.com/v1/files:asyncBatchAnnotate
{
    "requests": [
        {
            "inputConfig": {
                "gcsSource": {
                    "uri": "gs://somebucket/1.pdf"
                },
                "mimeType": "application/pdf"
            },
            "features": [
                {
                    "type": "DOCUMENT_TEXT_DETECTION"
                }
            ],
            "outputConfig": {
                "gcsDestination": {
                    "uri": "gs://somebucket/output/"
                },
                "batchSize": 1
            }
        }
    ]
}
This returns:
{
    "name": "operations/8b7534d4b21b825e"
}
and when I look up the operation, I get this:
https://vision.googleapis.com/v1/operations/8b7534d4b21b825e
{
    "name": "operations/8b7534d4b21b825e",
    "metadata": {
        "@type": "type.googleapis.com/google.cloud.vision.v1.OperationMetadata",
        "state": "CREATED",
        "createTime": "2019-01-09T21:08:57.339363096Z",
        "updateTime": "2019-01-09T21:08:57.339363096Z"
    }
}
However, the output folder is completely empty, and I am not sure what to make of the CREATED state.
According to this answer by a Google engineer, latency on the order of minutes (~10 minutes) is somewhat expected. I've done some tests myself with small files, and at times the delay can be up to 25 minutes, though in some cases it is much less.
When the Vision API is done processing your request, you should get a response like the one below from the get method:
{
    "name": "operations/XXXxxxxXXXX",
    "metadata": {
        "@type": "type.googleapis.com/google.cloud.vision.v1.OperationMetadata",
        "state": "DONE",
        "createTime": "2019-01-09T23:08:37.312889645Z",
        "updateTime": "2019-01-09T23:08:59.169306747Z"
    },
    "done": true,
    "response": {
        "@type": "type.googleapis.com/google.cloud.vision.v1.AsyncBatchAnnotateFilesResponse",
        "responses": [
            {
                "outputConfig": {
                    "gcsDestination": {
                        "uri": "gs://somebucket/output/"
                    }
                }
            }
        ]
    }
}
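Until done is true, the operation stays in the CREATED or RUNNING state and the destination folder remains empty. As a hedged illustration (the operation name is taken from the question; the API key is a placeholder, and an OAuth bearer token works as well), a small polling loop against the operations endpoint could look like this:

import time
import requests

# Illustrative values: substitute your own operation name and API key.
operation = "operations/8b7534d4b21b825e"
api_key = "YOUR_API_KEY"

# Poll the long-running operation until Vision marks it as done.
while True:
    resp = requests.get(
        f"https://vision.googleapis.com/v1/{operation}",
        params={"key": api_key},
    )
    resp.raise_for_status()
    body = resp.json()
    if body.get("done"):
        # The JSON output files should now exist under gs://somebucket/output/.
        print(body["response"])
        break
    print("still processing, state:", body.get("metadata", {}).get("state"))
    time.sleep(30)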

Calling Webjob from Azure Data Factory pipeline throwing HTTP 409 conflict exception error

I have an on-demand triggered WebJob that I want to trigger through an ADF copy activity using an HTTP linked service. Here is the linked service:
{
    "name": "LS_WebJob",
    "properties": {
        "hubName": "yas-cdp-adf_hub",
        "type": "Http",
        "typeProperties": {
            "url": "https://cust-app.scm.azurewebsites.net/api/triggeredwebjobs/ConsoleApplication1/run",
            "authenticationType": "Basic",
            "username": "$custdata-app",
            "password": "**********"
        }
    }
}
Input Dataset
{
    "name": "ZZ_Inp_Webjob",
    "properties": {
        "published": false,
        "type": "Http",
        "linkedServiceName": "LS_WebJob",
        "typeProperties": {
            "requestMethod": "Post",
            "requestBody": "Hey Buddy"
        },
        "availability": {
            "frequency": "Day",
            "interval": 1,
            "style": "StartOfInterval"
        },
        "external": true,
        "policy": {}
    }
}
Output Dataset
{
    "name": "ZZ_Out_WebJob",
    "properties": {
        "published": false,
        "type": "AzureBlob",
        "linkedServiceName": "LS_ABLB",
        "typeProperties": {
            "fileName": "webjob.json",
            "folderPath": "yc-cdp-container/Dummy/temp",
            "format": {
                "type": "TextFormat"
            }
        },
        "availability": {
            "frequency": "Day",
            "interval": 1,
            "style": "StartOfInterval"
        }
    }
}
Pipeline
{
    "name": "ZZ-PL-WebJob",
    "properties": {
        "description": "This pipeline copies data from an HTTP Marina WiFi Source URL to Azure blob",
        "activities": [
            {
                "type": "Copy",
                "typeProperties": {
                    "source": {
                        "type": "HttpSource"
                    },
                    "sink": {
                        "type": "BlobSink",
                        "writeBatchSize": 0,
                        "writeBatchTimeout": "00:00:00"
                    }
                },
                "inputs": [
                    {
                        "name": "ZZ_Inp_Webjob"
                    }
                ],
                "outputs": [
                    {
                        "name": "ZZ_Out_Webjob"
                    }
                ],
                "policy": {
                    "timeout": "01:00:00",
                    "concurrency": 1
                },
                "scheduler": {
                    "frequency": "Day",
                    "interval": 1,
                    "style": "StartOfInterval"
                },
                "name": "WebjobSourceToAzureBlob",
                "description": "Copy from an HTTP source to an Azure blob"
            }
        ],
        "start": "2017-04-10T01:00:00Z",
        "end": "2017-04-10T01:00:00Z",
        "isPaused": false,
        "hubName": "yas-cdp-adf_hub",
        "pipelineMode": "Scheduled"
    }
}
My WebJob is a simple C# console application:
using System;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("My Team Rocks!");
        }
    }
}
When I execute the pipeline, the WebJob is triggered successfully. However, the pipeline fails with an HTTP 409 Conflict error:
Copy activity encountered a user error at Source side:
ErrorCode=UserErrorFailedToReadHttpFile,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Failed
to read data from http source
file.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Net.WebException,Message=The
remote server returned an error: (409) Conflict.,Source=System,'.
Try adding the gateway name to the linked service JSON. Refer to this link: How to integrate a WebJob within an Azure Data Factory Pipeline.
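It can also help to reproduce the call outside ADF and inspect exactly what the Kudu WebJobs endpoint returns, since the 409 comes from the remote server rather than from Data Factory itself. A minimal sketch, assuming the same URL and Basic credentials as in LS_WebJob (the password is a placeholder):

import requests

# Same Kudu endpoint and Basic credentials as in the LS_WebJob linked service.
url = "https://cust-app.scm.azurewebsites.net/api/triggeredwebjobs/ConsoleApplication1/run"

resp = requests.post(url, auth=("$custdata-app", "<password>"), data="Hey Buddy")

# Kudu typically answers 202 Accepted when it starts a triggered WebJob;
# inspecting the status code and body here shows what ADF is receiving.
print(resp.status_code)
print(resp.text)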

Unable to execute Lambda in Async mode via API Gateway POST request

Why is there currently no way to execute an AWS Lambda asynchronously via API Gateway without involving an intermediary Lambda just to call the invoke() method?
Even if I add an integration like this:
r = client.put_integration(
    restApiId=rest_api_id,
    resourceId=resource_id,
    httpMethod='POST',
    type='AWS',
    integrationHttpMethod='POST',
    uri=uri,
    requestParameters={
        'integration.request.header.X-Amz-Invocation-Type': "'Event'",
        'integration.request.header.Invocation-Type': "'Event'"
    }
)
It is still executed synchronously...
Is there some platform limitation, or something similar?
I have an example Swagger document with which you can switch the invocation type for a Lambda function. I guess you already figured out how to map the header to trigger the different invocation types, but I think you might have forgotten to deploy the API (see the small deployment sketch after the Swagger document below).
Swagger
{
    "swagger": "2.0",
    "info": {
        "version": "2016-02-11T22:00:31Z",
        "title": "LambdaAsync"
    },
    "host": "<placeholder>",
    "basePath": "<placeholder>",
    "schemes": [
        "https"
    ],
    "paths": {
        "/": {
            "get": {
                "produces": [
                    "application/json"
                ],
                "parameters": [
                    {
                        "name": "X-Amz-Invocation-Type",
                        "in": "header",
                        "required": false,
                        "type": "string"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "200 response",
                        "schema": {
                            "$ref": "#/definitions/Empty"
                        }
                    }
                },
                "x-amazon-apigateway-integration": {
                    "passthroughBehavior": "when_no_match",
                    "httpMethod": "POST",
                    "uri": "arn:aws:apigateway:us-east-1:lambda:path/2015-03-31/functions/arn:aws:lambda:us-east-1:<account>:function:<function_name>/invocations?Qualifier=$LATEST",
                    "responses": {
                        "default": {
                            "statusCode": "200"
                        }
                    },
                    "requestParameters": {
                        "integration.request.header.X-Amz-Invocation-Type": "method.request.header.X-Amz-Invocation-Type"
                    },
                    "type": "aws"
                }
            }
        }
    },
    "definitions": {
        "Empty": {
            "type": "object",
            "title": "Empty Schema"
        }
    }
}
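On the "deploy the API" point, here is a hedged boto3 sketch (the REST API id and stage name are illustrative): put_integration only changes the API's configuration, and callers keep hitting the previously deployed snapshot until a new deployment is created for the stage.

import boto3

apigw = boto3.client("apigateway", region_name="us-east-1")

# Illustrative id; use the same rest_api_id that put_integration was called with.
rest_api_id = "abc123"

# Publish the updated configuration so the header mapping takes effect for callers.
apigw.create_deployment(
    restApiId=rest_api_id,
    stageName="prod",
    description="Publish async (Event) invocation header mapping",
)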