Please help, how can this be achieved?
Requirement:
When new files are available in an AWS S3 bucket, a Lambda process will be triggered, and Pentaho job(s) to validate/process the files should be kicked off.
The Pentaho job should be executed on the server, not in the Lambda JVM
(to make use of the resources of the Linux server where the Pentaho 7.1 Client Community Edition is available).
Note: I followed the approach in https://dankeeley.wordpress.com/2017/04/25/serverless-aws-pdi/, but that executes the code in the Lambda JVM; per our requirement we need the job to run on the Linux server.
Infra Details:
Pentaho code will be in a file repository on the server; example mount location: /mnt/data
Pentaho version: Pentaho 7.1 Client Community Edition.
Server: Linux
Thanks in advance.
If you want your Pentaho job to be executed on the server and not in the Lambda JVM, you don't need AWS Lambda at all.
Instead you can:
1. use AWS SNS, and
2. provision an HTTP endpoint on your Linux server which then subscribes to the SNS topic.
Basically you will need to install an HTTP server and provision an HTTP endpoint that can be invoked when new files are available in S3.
So when new files are available in the AWS S3 bucket, you can point the bucket notification at AWS SNS instead of AWS Lambda, and then subscribe the HTTP endpoint you provisioned in step 2 above to that SNS topic.
So whenever a new file is uploaded, a notification will go to SNS, which in turn will push it to the HTTP endpoint; the endpoint can then read the file and execute your Pentaho job, as in the sketch below.
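A minimal sketch of such an endpoint, assuming Flask and boto3 are installed on the Linux server, that the PDI client lives under /opt/pentaho/data-integration, and that the job file and download directory under /mnt/data are hypothetical; it confirms the SNS subscription on first contact and then runs the job with kitchen.sh:

    import json
    import subprocess
    import urllib.request

    import boto3
    from flask import Flask, request

    app = Flask(__name__)
    s3 = boto3.client("s3")

    # Assumed paths -- adjust to your installation.
    KITCHEN = "/opt/pentaho/data-integration/kitchen.sh"   # PDI 7.1 client
    JOB_FILE = "/mnt/data/validate_files.kjb"              # hypothetical job
    LOCAL_DIR = "/mnt/data/incoming"

    @app.route("/s3-events", methods=["POST"])
    def s3_events():
        body = json.loads(request.data)
        msg_type = request.headers.get("x-amz-sns-message-type")

        if msg_type == "SubscriptionConfirmation":
            # One-time handshake: visiting SubscribeURL confirms the subscription.
            urllib.request.urlopen(body["SubscribeURL"])
        elif msg_type == "Notification":
            # The S3 event arrives as a JSON string in the SNS "Message" field.
            s3_event = json.loads(body["Message"])
            for record in s3_event.get("Records", []):
                bucket = record["s3"]["bucket"]["name"]
                key = record["s3"]["object"]["key"]
                local_path = f"{LOCAL_DIR}/{key.split('/')[-1]}"
                s3.download_file(bucket, key, local_path)
                # Run the job on this server, not in Lambda.
                subprocess.run(
                    [KITCHEN, f"-file={JOB_FILE}", f"-param:INPUT_FILE={local_path}"],
                    check=True,
                )
        return "", 200

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=8080)

Note that SNS has to be able to reach the endpoint, so it needs to be exposed publicly (ideally over HTTPS), and in production the SNS message signature should be verified before acting on it.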
Related
We are currently using AWS Lambda for some of our services with the following flow.
1. A Rails application (on Kubernetes) adds a message to an SQS queue.
2. A Lambda function is invoked via the SQS trigger.
3. The Lambda function publishes a notification to SNS.
4. SNS calls the configured HTTPS endpoint to notify the Rails application of the status.
This has been working well for us. The function takes about 15 seconds to run (it generates a PDF with headless Chrome).
Due to geographical data-security restrictions for a separate installation of our application, we are unable to use AWS, and the only feasible option is Oracle Cloud Infrastructure (OCI). OCI has cloud functions and also a Queue service; however, unlike AWS, OCI doesn't seem to have built-in integration between cloud functions and the Queue service.
One of the solutions we have discussed in the team is to deploy a service in Kubernetes that consumes the messages from the OCI Queue, invokes the cloud function, and sends the results to the Notifications service.
I would appreciate any input that can simplify this flow while maintaining its async nature and scalability.
Rather than using OCI Queues, you can send the events through OCI Streaming with a single subscriber; then you can link Functions to the stream easily, and the Notifications service is available from there. A producer sketch follows below.
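A minimal producer sketch with the OCI Python SDK, assuming the stream OCID, messages endpoint, and payload shown are placeholders; the Rails application would publish to the stream instead of SQS:

    import base64
    import json

    import oci  # OCI Python SDK

    config = oci.config.from_file()  # reads ~/.oci/config

    # Placeholders -- substitute your own stream OCID and messages endpoint.
    STREAM_OCID = "ocid1.stream.oc1..example"
    MESSAGES_ENDPOINT = "https://cell-1.streaming.eu-frankfurt-1.oci.oraclecloud.com"

    client = oci.streaming.StreamClient(config, service_endpoint=MESSAGES_ENDPOINT)

    payload = {"job": "generate_pdf", "document_id": 123}  # hypothetical message body
    entry = oci.streaming.models.PutMessagesDetailsEntry(
        # Keys and values must be base64-encoded strings.
        key=base64.b64encode(b"pdf-jobs").decode(),
        value=base64.b64encode(json.dumps(payload).encode()).decode(),
    )
    client.put_messages(
        STREAM_OCID,
        oci.streaming.models.PutMessagesDetails(messages=[entry]),
    )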
I guess the service in K8s you are talking about would be a 24/7 service, and you don't want to manage it through HPA/VPA.
If so, you can use https://knative.dev or one of the alternatives listed at https://ramitsurana.github.io/awesome-kubernetes/projects/projects/#serverless-implementations
I am trying to connect to an AWS SQS queue where my data is stored. I have the AWS key and secret with me. I also have the SQS Source Connector, which transfers data from the SQS queue to my Kafka topic. I would like to know how to do that. Do I need to work in the AWS console? How do I use the Source Connector to transfer the data?
You need to deploy your own infrastructure running Kafka Connect (no, the AWS console isn't required), then install the SQS connector plugin, and then HTTP POST the connector configuration to the Kafka Connect REST API, for example as sketched below.
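A minimal sketch of registering the connector, assuming Kafka Connect is reachable at localhost:8083 and that the connector class and property names shown match the SQS source connector plugin you installed (they vary by vendor, so check its documentation):

    import json

    import requests

    # Hypothetical connector config -- the class and property names depend on the plugin.
    connector = {
        "name": "sqs-to-kafka",
        "config": {
            "connector.class": "com.nordstrom.kafka.connect.sqs.SqsSourceConnector",
            "sqs.queue.url": "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue",
            "topics": "sqs-events",
            "tasks.max": "1",
            # AWS credentials are usually supplied to the Connect worker via the
            # environment (AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY), not here.
        },
    }

    resp = requests.post(
        "http://localhost:8083/connectors",   # Kafka Connect REST API
        headers={"Content-Type": "application/json"},
        data=json.dumps(connector),
    )
    resp.raise_for_status()
    print(resp.json())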
I'm working with an on-premises, older version of Bitbucket Server (v5.15.1) that does not have the Bitbucket Pipelines feature, and I need its webhooks to notify AWS Lambda via an HTTPS POST through AWS API Gateway after a commit is made to the master branch. Lambda then downloads a copy of the repo, zips it up, and places it into an S3 bucket, and of course this is where CodePipeline can finally be triggered. But I'm having issues getting this on-premises Bitbucket Server, located within my AWS account, to connect its webhook to Lambda.
I tried following the documentation below and launched the CloudFormation template with all the needed resources, but I'm assuming it is for Bitbucket Cloud, not the on-premises Bitbucket Server.
https://aws.amazon.com/blogs/devops/integrating-git-with-aws-codepipeline/
Anyone's help with this would be really appreciated.
I suppose you are following this blog post from AWS:
https://aws.amazon.com/blogs/devops/integrating-codepipeline-with-on-premises-bitbucket-server/
We have also implemented it. If the event is reaching Lambda, then make sure your Lambda is within a VPC and that its security group allows outbound HTTPS to the Bitbucket server; also make sure the Bitbucket server accepts inbound traffic from the VPC IP range. A quick connectivity check you can run from the Lambda is sketched below.
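A minimal sketch of such a check, assuming BITBUCKET_URL is a hypothetical environment variable pointing at your on-premises server:

    import os
    import urllib.request

    def handler(event, context):
        # Hypothetical env var pointing at the on-premises Bitbucket Server.
        url = os.environ.get("BITBUCKET_URL", "https://bitbucket.example.internal")
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                return {"reachable": True, "status": resp.status}
        except Exception as exc:
            # A timeout here usually points at security-group or routing issues.
            return {"reachable": False, "error": str(exc)}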
I have to read and process a file in an AWS Lambda function from an SFTP server that is not on AWS.
Some external source puts the file on the SFTP server (which is not in AWS), and whenever a file has been uploaded completely, we have to detect that via AWS CloudWatch and then trigger an AWS Lambda function to process the file.
Is this approach right? Is this even possible?
If this is possible, please suggest some steps. I checked AWS CloudWatch but was not able to find any trigger that watches files outside AWS.
You need to create some sort of job that monitors your SFTP directory (e.g., using inotify) and then invokes your AWS Lambda function, using the AWS access keys of an IAM user created with programmatic access enabled and sufficient permissions to invoke that Lambda function; see the sketch below.
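A minimal watcher sketch, assuming the inotify_simple package and boto3 are installed on the SFTP host and that the directory, region, and function name are placeholders; the IAM user's keys are picked up from the usual credentials file or environment variables:

    import json

    import boto3
    from inotify_simple import INotify, flags   # pip install inotify_simple

    WATCH_DIR = "/data/sftp/incoming"           # placeholder upload directory
    FUNCTION = "process-sftp-file"              # placeholder Lambda function name

    lambda_client = boto3.client("lambda", region_name="us-east-1")

    inotify = INotify()
    inotify.add_watch(WATCH_DIR, flags.CLOSE_WRITE)   # fires once a write has finished

    while True:
        for event in inotify.read():                  # blocks until something happens
            payload = {"path": f"{WATCH_DIR}/{event.name}"}
            lambda_client.invoke(
                FunctionName=FUNCTION,
                InvocationType="Event",               # asynchronous fire-and-forget
                Payload=json.dumps(payload).encode(),
            )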
You can also create an AWS CloudWatch Events rule that fires on a schedule (say, every 5 minutes) and triggers the Lambda function to check for any new files, keeping a history somewhere such as AWS DynamoDB; a sketch of wiring up that schedule follows. However, I would rather trigger the Lambda from the SFTP server on upload detection. Better still, if AWS Transfer for SFTP is used instead of the on-premises SFTP server, it uses AWS S3 as the SFTP store, and S3 can raise an event when files/objects are uploaded and trigger an AWS Lambda function directly.
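A sketch of creating that schedule with boto3, assuming the rule name, function ARN, and account details are placeholders:

    import boto3

    events = boto3.client("events")
    lambda_client = boto3.client("lambda")

    FUNCTION_ARN = "arn:aws:lambda:us-east-1:123456789012:function:check-sftp"  # placeholder

    # CloudWatch Events rule that fires every 5 minutes.
    rule = events.put_rule(
        Name="check-sftp-every-5-minutes",
        ScheduleExpression="rate(5 minutes)",
        State="ENABLED",
    )

    # Point the rule at the Lambda function and allow CloudWatch Events to invoke it.
    events.put_targets(
        Rule="check-sftp-every-5-minutes",
        Targets=[{"Id": "check-sftp-lambda", "Arn": FUNCTION_ARN}],
    )
    lambda_client.add_permission(
        FunctionName=FUNCTION_ARN,
        StatementId="allow-cloudwatch-events",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule["RuleArn"],
    )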
Can you modify the external source script?
If yes, you can publish an SNS notification to a specific topic using the AWS CLI or a language-specific SDK.
Then you can have a Lambda, triggered by that SNS topic, process your file; a minimal publish call is sketched below.
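A minimal sketch of the publish step with boto3, assuming the topic ARN and payload are placeholders (the AWS CLI equivalent is aws sns publish --topic-arn ... --message ...):

    import json

    import boto3

    sns = boto3.client("sns", region_name="us-east-1")

    # Placeholder topic ARN -- the Lambda that processes the file subscribes to it.
    sns.publish(
        TopicArn="arn:aws:sns:us-east-1:123456789012:file-uploaded",
        Message=json.dumps({"path": "/sftp/incoming/report.csv"}),  # hypothetical payload
    )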
Today I encountered an issue, and I'm not sure if there's a solution or if it's a bug.
In our project we use S3, SQS, and API Gateway as an interface for S3. Whenever a new file is uploaded via the gateway, a new event is published to SQS, and there are no problems.
Earlier today I deployed a new version of our service that consumes SQS messages. To test that everything works as expected, I created a new S3 bucket and a corresponding SQS queue. Then I started to copy objects from the production bucket to the newly created one using the boto3 Python library.
After a while I noticed that for some files no SQS event was published. After some research it turned out that all such files are larger than 8 MB.
I also tried to upload a file using the AWS CLI just in case, but the result was the same.
However, when I upload a file from the AWS web console, I can see the SQS event published.
So everything works when uploading to S3 via API Gateway or the AWS web console, but not via the AWS CLI or boto3 (and presumably other libraries).
It seems like a bug or some limitation, but I couldn't find any documentation on it.
Has anyone experienced similar behaviour?
Thanks in advance for any tips.
I believe 8MB is the size at which the CLI (and SDK) will start performing multi-part upload operations. You probably need to enable notifications for the s3:ObjectCreated:CompleteMultipartUpload event on your S3 bucket.
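A minimal sketch of enabling that with boto3, assuming the bucket name and queue ARN are placeholders; using s3:ObjectCreated:* covers PUT, POST, COPY, and CompleteMultipartUpload in one go. Note that this call replaces the bucket's entire notification configuration, so include any existing configurations too:

    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_notification_configuration(
        Bucket="my-test-bucket",   # placeholder bucket
        NotificationConfiguration={
            "QueueConfigurations": [
                {
                    "QueueArn": "arn:aws:sqs:us-east-1:123456789012:my-test-queue",  # placeholder
                    # Covers puts, copies, and multipart-upload completions alike.
                    "Events": ["s3:ObjectCreated:*"],
                }
            ]
        },
    )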