Amazon Mechanical Turk task retrieval API - amazon-web-services

I'm writing an app wherein the premise is that people who can't directly fund a charity out of their pocket could automatically work on Amazon AWS HITs in order to bring clean water to the 3rd world - I'd known there was an Amazon AWS API but is very unclear on how to retrieve a HIT to be worked on, rather than just consume some data about some tasks I'm trying to complete.
Is there any way to retrieve a HIT to be worked on through the AWS API or otherwise?
Thanks ahead of time.

As far as I know, the only way to work on a HIT is through the mTurk website. i.e. - not via API.
There is a site that is trying to do something very similar to what you have described. http://www.sparked.com/

Related

High amount of requests for AWS S3 for a small mobile app

I am doing a mobile app with React Native using AWS Services, which is basically a CRUD app for images, and I am relatively new to it.
My fees seems very high for AWS S3 (look at the picture below). I know the amount is still very small but I am concerned on how did my app requested more than 91k times on barely 2 months. I tried to debug my app, but it's impossible that I requested the AWS server that many times. I looked for any loops in my project and there is nothing, I even looked at the network and didn't found anything suspicious.
I tried to analyze what could it be with cloud watch as well but I don't see any metrics that would represent these S3 requests.
To display images on my React Native app, I am using the react-native-fast-image module.
Is there a way to debug/find what is causing these calls? Has anyone experienced a similar thing with AWS S3? Are DynamoDB query calls based on S3 as well?

How to write Kafka connector to integrate with Facebook API?

I am trying to write a Kafka connector to fetch data from the facebook. The problems are,
How to fetch data from facebook through their API without exceeding the limit of API hit provided by facebook? The connector should call facebook API for data after a specific time interval so that the number of hits won't exceed.
Each user can hit the facebook API with their Access Token so users can't share the same topic partition. So how to handle this scenario. Do we have to create one partition for each user?
I read a few guides and blogs to understand Kafka connect and write a connector.
Confluent- https://docs.confluent.io/current/connect/index.html
Kafka Documentation- https://kafka.apache.org/documentation/#connect
Conceptually It gave me an idea about what is Kafka connect, how it works and what are the important classes to write a Kafka connector. But still, I am confused that practically how to write and run a connector. I tried to find step by step development guide but didn't get.
Any tutorial or pdf If you could suggest which have detailed step by step development guide to write and run Kafka connector.
The only "official guide" is in those links you have
https://docs.confluent.io/current/connect/devguide.html#developing-a-simple-connector
I personally have no experience with the Facebook API, but I assume it uses REST, so you could make start by forking the kafka-connect-rest project, but the simplest answer to not exceed the limit would be to not send more requests than you are allowed within a given time period (add a timer to the code that waits between requests)
Also, one connector would only have one set of access keys. How you create the ConnectRecord objects to ultimately partition the records is up to you, but I don't think having an access key per user will scale very well. It might make more sense to have one key tied to one application, then each user will accept that that application has access to read certain details from their account.

Display real time data on website that scales?

I am starting a project where I want to create a website which will display LIVE flight information and status. We all have seen this at airport. An example is given here - http://www.computronics.biz/productimages/prodairport4.jpg. As you can see this information changes continuously. The website will talk to a backend api and the this backend api will talk to database. Now the important part is that the flight information in the database will be updated by the airline itself. There could be several airlines and they will update their data respectively. I have drawn a diagram and uploaded here - https://imgur.com/a/ssw1S.
Now those airlines will obviously have an interface (website talking to some backend API) through which they will update the database.
Now here is my attempt to solve it. We need to have some sort of trigger such that if any airline updates a flight detail in the database between current time - 1 hour to current + 4 hours (website will only display few hours of flights), we need to call the web api and then send the update to the website in the real time. The user must not refresh the page at all. At the same time the website needs to scale well i.e. if 1 million users are on the website, and there is an update in the database in the correct time range, all 1 million user's website should get updated within a decent amount of time.
I did some research and it looks like we need to have an event based approach. For example - we need to create a function (AWS lambda or Azure function) that should be called whenever there is an update in the database (Dynamo DB for example) within the correct time range. This function then should call an API which should then update the website through web socket technology for example.
I am not looking for any code but just some alternative suggestions on how this can be solved in a scalable way. Also how do we test scalability?
Dont use serverless functions(Lambda/Azure functions)
Although I am a huge fan of serverless functions, and currently running a full web app in Lambda, I don't think its needed for your use case and doesn't make sense economically. As you've answered in the comments, each airline will not write directly to the database, they'll push to an API, meaning you are explicitly told when flights have changed. When an airline has sent you new data you can simply propagate this to all the browser endpoints via websockets. This keeps the design very simple. There is no need to artificially create a database event that then triggers a function that will then tell you a flight has been updated. Thats like removing your doorbell and replacing it with a motion detector that triggers a doorbell :)
Cost
Money always deserves its own section. Lambda is more of an economic break through than a technological one. You have to know when its cost effective. You pay per request so if your dealing with a process that handles 10,000 operations a month, or something that only fires 1,000 times a day, than lambda is dirt cheap and practically free. You also pay for the length of time the function is executing and the memory consumed while executing. Generally, it makes sense to use lambda functions where a dedicated server would be sitting idle for most of the time. So instead of a whole EC2 instance, AWS provides you with a container on demand. There are points at which high requests rates and constantly running processes makes lambda more expensive than EC2. This article discusses how generally its cheaper to use lambda up to a point -> https://www.trek10.com/blog/lambda-cost/ The same applies to Azure functions and googles equivalent. They are all just containers offered on demand.
If you're dealing with flight information I would imagine you will have thousands of flights being updated every minute so your lambda functions will be firing constantly as if you were running an EC2 instance. You will end up paying a lot more than EC2. When you have a service that needs to stay up 24/7 and run 24/7 with high activity that is most certainly a valid use case for a dedicated server or servers.
Proposed Solution
These are the components I would use below:
Message Queue of some sort (RabbitMQ or AWS SQS with SNS perhaps)
Web Socket Backend (The choice will depend on programming language)
Airline input API (REST,GraphQL, or maybe AWS Kinesis Data Firehose)
The airlines publish their data to a back-end api. The updates are stored on a message queue and the web applicaton that actually displays the results to users, via websockets, reads from the queue.
Scalability
For scalability you can run the websocket application on multiple EC2 instances (all reading from the same queuing service) in an autoscaling group, so with extra load more instances will be created automatically hence the name "autoscaling". And those instances can sit behind an elastic load balancer. Lots of AWS documentation on how to do this and its their flagship design pattern. If you use AWS SQS you don't have to manage the scalability details yourself, aws handles that. The only real components to scale are your websocket application and the flight data input endpoint. You can run the flight api in an autoscaling group as well but AWS does offer an additional tool for high traffic data processing. I detail that below.
Testing Scalability
It would be fairly easy to have a mock airline blast your service with thousands and thousands of fake updates and on the other end you can easily run multiple threads of selenium tests simulating browser clicks and validating that the UI is still operational.
Additional tools
If it ends up being large amounts of data, rather than using a conventional REST api for your flight update service you could consider a service AWS offers specifically for dealing with large amounts of real time updates (Kinessis Data Firehose) https://aws.amazon.com/kinesis/data-firehose/ But I've never used it.
First, please don't over think this. This is a trivial problem to solve and doesn't require any special techniques, technologies or trendy patterns & frameworks.
You actually have three functional areas you can address almost separately.
Ingestion - Collection and normalization of the data from the various sources. For this, you'll need a process and transformation engine, LogicApps or such.
Your databases. You'll quickly learn that not all flights are the same ;). While it might seem so, the amount of data isn't that much. Instances of MySQL/SQL Server tuned for a particular function will work just fine. Hint, you don't need to have data for every movement ready to present all the time.
Presentation. The data API and UIs. This, really, is the easy part. I would suggest you use basic polling at first. For reasons you will never have any control over, the SLA for flight data is ~5 minutes so a real-time client notification system is time you should spend elsewhere at first.

Can i open a website through an Amazon Web Service?

Is it possible to open a website,like facebook.com for example, on an amazon web service?
My objective is to automate a certain task in a game and to do so without having to be online on my computer. The point is to spend less time on that game, but to not be left behind on the progress. (I'm building a bot to automate the daily tasks there, just need to know if i can now leave everything running on amazon)
Another project i want to do is to automate access to my email account and perform certain tasks depending on the emails i receive.
You get the point, i tried searching on google but i only find results about creating or hosting your own website in there and not about accessing existing websites and using automation in them.
It sounds like what you want is a virtual private server - basically a computer in the cloud that you control and is always on.
AWS have a service called LightSail for this kind of purpose. Under the hood lightsail just uses EC2, but lightsail takes away a lot of the options and configuration to provide a simpler 'click and go' kind of service.
Once you have a server you can schedule regular tasks. Depending on the complexity of your needs, you could look at using Cron as a scheduler and curl for you http requests.
For the specifics of any project you have I would suggest opening a new question with details of what you are trying to do, the reading you have done, and examples of any code you have tried.

Integration of Google Calendar with amazon EC2

I am using Google calendar and based on calendar booking i want to start AWS EC2 instances.
Is there a way to achieve this interaction?
A very good use case actually; it is really cool.
This is one of the ways you can plan the design and architecture. You would be making use of the Google Calender API endpoint using REST or SDK to continuously (asynchronously) poll the events and activities; depending on how you specify the EC2 instance to be started or stopped. You can obtain the information from Calendar data. Based on this you can start or stop the instance via. the AWS SDK.
There AWS SDKs are available for almost all the popular programming languages - .net, java, php, ruby, python and also Google Go. If not you can always fall back with the REST API calls and the same.
I recommend you to take a look at Skeedly to get some ideas. - http://www.skeddly.com/