Use serverless here-api in lambda to extract traffic flow - amazon-web-services

I deployed a traffic service on aws that uses a lambda function, but I have a hard time understanding how it works. For instance, I see that the lambda function contains .js files for traffic, but I don't understand what part I can edit to create proper requests for traffic flow based on geo-coordinates.
I found an article about geocoding app, but it felt a bit different from what I am trying to achieve. Which is to create a function just to retrieve the traffic flow information.

After you deploy the traffic application in lambda, you can retrieve the traffic info via the Lambda API gateway.
https://.execute-api..amazonaws.com/Prod/traffic/api/traffic/
https://.execute-api..amazonaws.com/Prod/traffic/api/traffic.maps/
More details can be found in the below two online doc:
https://github.com/heremaps/here-aws-sar/tree/master/serverlessFunctions/traffic
https://developer.here.com/documentation/traffic/dev_guide/topics_v6.1/example-flow-intro.html
If you are looking for the rendered live-traffic map tiles, I would suggest using map tile application: https://github.com/heremaps/here-aws-sar/tree/master/serverlessFunctions/maptile

Related

How to Access documentDB from Lambda#Edge function?

I am trying to set up an event trigger lambda#Edge function from cloudFront.
This function needs to access the database and replace the url's metadata before distributing out to users.
Issues I am facing:
My DocumentDB is placed in a VPC private subnet. Can't be accessed outside the VPC.
My Lambda edge function can't connect to my VPC since they are both in different region.
The method I had in mind is to create an API in my web server(public subnet) for my lambda function to call, but this seems like not a very efficient method
Appreciate If you can give me some advice or an alternative way for implementation.
Thanks in Advance
Lambda#Edge has a few limitations you can read about here.
Among them is:
You can’t configure your Lambda function to access resources inside your VPC.
That means the VPC being in another region is not your problem, you just can't place a Lambda # Edge function in any VPC.
The only solution I can think of is making your DocumentDB available publicly on the internet, which doesn't seem like a great idea. You might be able to create a security group that only allows access from the CloudFront IP-Ranges although I couldn't find out if Lambda#Edge actually uses the same ranges :/
Generally I'd avoid putting too much business logic in Lambda#Edge functions - keep in mind they're run on every request (or at the very least every request to the origin) and increase the latency for these requests. Particularly network requests are expensive in terms of time, more so if you communicate across continents to your primary region with the database.
If the information you need to update the URLs metadata is fairly static, I'd try to serialize it and distribute it in the lambda package - reading from local storage is considerably cheaper and faster.

AWS consuming data from API REST

I'm writing to you because I'm quite a novice with AWS... I only worked before with EC2 instances for simple tasks...
I am currently looking for an AWS service for reciving data using REST API calls (to external AWS services).
So far I have used EC2 where I deployed my library (python) that made calls and stored data in S3.
What more efficient ways does AWS offer for this? some SaaS?
I know that they are still more details to know in order to choose a good services but I would like to know from where I can start looking.
Many thanks in advance :)
I make API requests using AWS Lambda. Specifically, I leave code that makes requests, writes the response to a file and pushes the response object (file) to AWS S3.
You'll need a relative/absolute path to push the files to wherever you want to ingest. By default lambda servers current working directory is: /var/task but you may want to write your files to /tmp/ instead.
You can automate the ingestion process by setting a CloudWatch rule to trigger your function. Sometimes I chain lambda functions when I need to loop requests with changing parameters instead of packing all requests within a single function,
i.e.
I leave the base request (parameterized) in one function and expose the function through an API Gateway endpoint.
I create a second function to call the base function once for each value I need by using the Event object (which is the JSON body of a regular request). This data will replace parameters within the base function.
I automate the second function.
Tip:
Lambda sometimes will run your requests inside the same server. So if you're continuously running these for testing the server may have files from past calls that you don't want, so I usually have a clean-up step at the beginning of my functions that iterates through my filesystem to make sure there are no files before making the requests.
Using python 3.8 as a runtime I use the requests module to send the request, I write the file and use boto3 to push the response object to an aws S3 bucket.
To invoke an external service you need some "compute resources" to run your client. Under compute resources in aws we understand ec2, ecs (docker container) or lambda (serverless - my favorite)
You had your code already running on EC2 so you should already know you need VPC with a public subnet and ip address to make an outbound call regardless the compute resource you choose

How would I create a Minecraft EC2 server that automaticaly starts when someone tries to use it

Currently, I have a working modded Minecraft server working running on a C5 EC2 instance. The problem is that I have to manually start and stop the server which can be annoying for my friends. I was wondering if it would be possible to automate the EC2 state so that it runs as soon as a player attempts to join the sever. This would be similar to how Minecraft realms behaves which I heard Mojang was using AWS for:
https://aws.amazon.com/blogs/aws/hosting-minecraft-realms-on-aws/
I have looked up tutorials for this and this is the best I could come across:
https://github.com/trevor-laher/OnDemandMinecraft
The problem with this solution is that it requires to make a separate website to log users in and start the EC2 instance while I want the startup and shutdown to be completely automatic.
I would appreciate any guidance.
If the server is off, it would not be possible to "connect" to the server. Therefore, another mechanism is required that can be used to start the server.
Combine that with your desire to minimise costs and the only real solution is to somehow trigger an AWS Lambda function, which could start the server.
There are a few ways you could have users trigger the AWS Lambda function:
Make a call to API Gateway
Upload an object to Amazon S3
Somehow put a message in an SNS topic or an SQS queue
Trigger an Amazon CloudWatch Alarm (which calls Lambda via SNS)
...and probably other ways
When considering a method to use, you should consider security implications such as:
Whether only authorized users should be able to trigger the Lambda function, or is it okay that anybody (eg a web crawler) might trigger it.
Whether you willing to give your friends AWS credentials (not a good idea) that they could use to start the server directly, or whether it should be an indirect method.
Frankly, I would recommend the following architecture:
Create an AWS Lambda function that turns on the server
Create an API Gateway that triggers the Lambda function
Give a URL to your friends that calls the API Gateway and passes a 'secret' (effectively a password)
The API Gateway will call the Lambda function, passing the secret
The Lambda function confirms that the secret is correct and starts the Amazon EC2 instance with Minecraft installed
Here is a tutorial that shows many of these concepts: Build an API Gateway API with Lambda Integration
The purpose of the secret is to avoid the server from starting if an unauthorized person (or a bot) happens to hit the API Gateway endpoint. They will not provide the secret, so the server will not be started.
Stopping the server after a period of non-use is a different matter. The library you referenced might be able to assist with finding a way to do this. You could have a script running on the Minecraft server that monitors the game and, after a period of inactivity, simply calls the operating system to perform a Shutdown.
You could use a BungeeCord hub server that then allows user to begin a connection to the main server and spin it up via. AWS.
This would require the bungee server to be always up however, but the cost of hosting a small bungee server should be relatively cheap.
I don't think there's any way you could do this without having a small server that receives the initial request to spin up the AWS machine.

How should microservices developed using AWS API Gateway + Lambda/ECS talk?

I am developing a "micro-services" application using AWS API Gateway with either Lambda or ECS for compute. The issue now is communication between services are via API calls through the API gateway. This feels inefficient and less secure than it can be. Is there a way to make my microservices talk to each other in a more performant and secure manner? Like somehow talk directly within the private network?
One way I thought of is multiple levels of API gateway.
1 public API gateway
1 private API gateway per microservice. And each microservice can call another microservice "directly" inside the private network
But in this way, I need to "duplicate" my routes in 2 levels of API ... this does not seem ideal. I was thinking maybe use {proxy+}. So anything /payment/{proxy+} goes to payment API gateway and so on - theres still 2 levels of API gateway ... but this seem to be the best I can go?
Maybe there is a better way?
There are going to be many ways to build micro-services. I would start by familiarizing yourself with the whitepaper AWS published: Microservices on AWS, Whitepaper - PDF version.
In your question you stated: "The issue now is communication between services are via API calls through the API gateway. This feels inefficient and less secure than it can be. Is there a way to make my microservices talk to each other in a more performant and secure manner?"
Yes - In fact, the AWS Whitepaper, and API Gateway FAQ reference the API Gateway as a "front door" to your application. The intent of API Gateway is to be used for external services communicating to your AWS services.. not AWS services communicating with each other.
There are several ways AWS resources can communicate with each other to call micro-services. A few are outlined in the whitepaper, and this is another resource I have used: Better Together: Amazon ECS and AWS Lambda. The services you use will be based on the requirements you have.
By breaking monolithic applications into small microservices, the communication overhead increases because microservices have to talk to each other. In many implementations, REST over HTTP is used as a communication protocol. It is a light-weight protocol, but high volumes can cause issues. In some cases, it might make sense to think about consolidating services that send a lot of messages back and forth. If you find yourself in a situation where you consolidate more and more of your services just to reduce chattiness, you should review your problem domains and your domain model.
To my understanding, the root of your problem is routing of requests to micro-services. To maintain the "Characteristics of Microservices" you should choose a single solution to manage routing.
API Gateway
You mentioned using API Gateway as a routing solution. API Gateway can be used for routing... however, if you choose to use API Gateway for routing, you should define your routes explicitly in one level. Why?
Using {proxy+} increases attack surface because it requires routing to be properly handled in another micro-service.
One of the advantages of defining routes in API Gateway is that your API is self documenting. If you have multiple API gateways it will become colluded.
The downside of this is that it will take time, and you may have to change existing API's that have already been defined. But, you may already be making changes to existing code base to follow micro-services best practices.
Lambda or other compute resource
Despite the reasons listed above to use API Gateway for routing, if configured properly another resource can properly handle routing. You can have API Gateway proxy to a Lambda function that has all micro-service routes defined or another resource within your VPC with routes defined.
Result
What you do depends on your requirements and time. If you already have an API defined somewhere and simply want API Gateway to be used to throttle, monitor, secure, and log requests, then you will have API Gateway as a proxy. If you want to fully benefit from API Gateway, explicitly define each route within it. Both approaches can follow micro-service best practices, however, it is my opinion that defining each public API in API Gateway is the best way to align with micro-service architecture. The other answers also do a great job explaining the trade-offs with each approach.
I'm going to assume Lambdas for the solution but they could just as well be ECS instances or ELB's.
Current problem
One important concept to understand about lambdas before jumping into the solution is the decoupling of your application code and an event_source.
An event source is a different way to invoke your application code. You mentioned API Gateway, that is only one method of invoking your lambda (an HTTP REQUEST). Other interesting event sources relevant for your solution are:
Api Gateway (As noticed, not effective for inter service communication)
Direct invocation (via AWS Sdk, can be sync or async)
SNS (pub/sub, eventbus)
There are over 20+ different ways of invoking a lambda. documentation
Use case #1 Sync
So, if your HTTP_RESPONSE depends on one lambda calling another and on that 2nd lambdas result. A direct invoke might be a good enough solution to use, this way you can invoke the lambda in a synchronous way. It also means, that lambda should be subscribed to an API Gateway as an event source and have code to normalize the 2 different types of events. (This is why lambda documentation usually has event as one of the parameters)
Use case #2 Async
If your HTTP response doesn't depend on the other micro services (lambdas) execution. I would highly recommend SNS for this use case, as your original lambda publishes a single event and you can have more than 1 lambda subscribed to that event execute in parallel.
More complicated use cases
For more complicated use cases:
Batch processing, fan-out pattern example #1 example #2
Concurrent execution (one lambda calls next, calls next ...etc) AWS Step functions
There are multiple ways and approaches for doing this besides being bound to your current setup and infrastructure without excluding the flexibility to implement/modify the existing code base.
When trying to communicate between services behind the API Gateway is something that needs to be carefully implemented to avoid loops, exposing your data or even worst, blocking your self, see the "generic" image to get a better understanding:
While using HTTP for communicating between the services it is often common to see traffic going out the current infrastructure and then going back through the same API Gateway, something that could be avoided by just going directly the other service in place instead.
In the previous image for example, when service B needs to communicate with service A it is advisable to do it via the internal (ELB) endpoint instead of going out and going back through the API gateway.
Another approach is to use "only" HTTP in the API Gateway and use other protocols to communicate within your services, for example, gRPC. (not the best alternative in some cases since depends on your architecture and flexibility to modify/adapt existing code)
There are cases in where your infrastructure is more complex and you may not communicate on demand within your containers or the endpoints are just unreachable, in this cases, you could try to implement an event-driven architecture (SQS and AWS Lambda)
I like going asynchronous by using events/queues when possible, from my perspective "scales" better and must of the services become just consumers/workers besides no need to listen for incoming request (no HTTP needed), here is an article, explaining how to use rabbitmq for this purpose communicating microservices within docker
These are just some ideas that hope could help you to find your own "best" way since is something that varies too much and every scenario is unique.
I don't think your question is strictly related to AWS but more like a general way of communication between the services.
API Gateway is used as an edge service which is a service at your backend boundary and accessible by external parties. For communication behind the API Gateway, between your microservices, you don't necessary have to go through the API Gateway again.
There are 2 ways of communication which I'd mention for your case:
HTTP
Messaging
HTTP is the most simplistic way of communication as it's naturally easier to understand and there are tons of libraries which makes it easy to use.
Despite the fact of the advantages, there are a couple of things to look out for.
Failure handling
Circuit breaking in case a service is unavailable to respond
Consistency
Retries
Using service discovery (e.g. Eureka) to make the system more flexible when calling another service
On the messaging side, you have to deal with asynchronous processing, infrastructure problems like setting up the message broker and maintaining it, it's not as easy to use as pure HTTP, but you can solve consistency problems with just being eventually consistent.
Overall, there are tons of things which you have to consider and everything is about trade-offs. If you are just starting with microservices, I think it's best to start with using HTTP for communication and then slowly going to the messaging alternative.
For example in the Java + Spring Cloud Netflix world, you can have Eureka with Feign and with that it's really easy to use logical address to the services which is translated by Eureka to actual IP and ports. Also, if you wanna use Swagger for your REST APIs, you can even generate Feign client stubs from it.
I've had the same question on my mind for a while now and still cannot find a good generic solutions... For what it's worth...
If the communication is one way and the "caller" does not need to wait for a result, I find Kinesis streams very powerful - just post a "task" onto the stream and have the stream trigger a lambda to process it. But obviously, this works in very limited cases...
For the response-reply world, I call the API Gateway endpoints just like an end user would (with the added overhead of marshaling and unmarshaling data to "fit" in the HTTP world, and unnecessary multiple authentications).
In rare cases, I may have a single backend lambda function which gets invoked by both the Gateway API lambda and other microservices directly. This adds an extra "hop" for "end users" (instead of [UI -> Gateway API -> GatewayAPI lambda], now I have [UI -> Gateway API -> GatewayAPI lambda -> Backend lambda]), but makes microservice originated calls faster (since the call and all associated data no longer need to be "tunneled" through an HTTP request). Plus, this makes the architecture more complicated (I no longer have a single official API, but now have a "back channel" direct dependencies).

Is amazon lambda suitable for web scraping?

If I create a function to get webpages. Will it execute it on different IP per execution so that my scraping requests dont get blocked?
I would use this AWS pipeline:
Where at source on the left you will have an EC2 instance with JAUNT which then feeds the URLS or HTML pages into a Kinesis Stream. The Lambda will do your HTML parsing and via Firehose stuff everything into S3 or Redshift.
The JAUNT can run via a standard WebProxy service with a rotating IP.
Yes, lambda by default executes with random IPs. You can trigger it using things like event bridge so you can have a schedule to execute the script every hour or similar. Others can possibly recommend using API Gateway, however, it is highly insecure to expose API endpoints available for anyone to trigger. So you have to write additional logic to protect it either by hard coded headers or say oauth.
AWS Lambda doesn't have a fixed IP source as mentioned here
however, I guess this will happen when it gets cooled down, not during the same invocation.
Your internal private IP address isn't what is seen by the web server, it's the public ip address from AWS. As long as you are setting the headers and signaling to the webserver that your application is a bot. See this link for some best practices on creating a web scrapper:
https://www.blog.datahut.co/post/web-scraping-best-practices-tips
Just be respectful of how much data you pull and how often is the main thing in my opinion.
Lambda is triggered when a file is placed in S3, or data is being added to Kinesis or DynamoDB. That is often backwards of what a web scraper needs, though certainly things like S3 could perform as a queue/job runner.
Scraping on different IPs? Certainly lambda is deployed on many machines, though that doesn't actually help you, since you can't control the machines or their IP.