Large message Amazon SQS - amazon-web-services

I have an application that uses JMS to send files that are about a few megabytes large. Would it be possible to use Amazon SQS as the JMS provider for this application, as described here?
The problem here is that the max size of an SQS message is 256K. One way around this is to break up each file into multiple messages of 256K. But, if I do that, would having multiple producers send files at the same time break the architecture, as the messages from different producers become mixed up?

In this scenario you cannot use the original message with SQS, you will have to use a new message with a reference to the original message. The reference can be to a S3 object or a custom location on-prem or with-in AWS. S3 option probably involves least amount of work and has best cost efficiency (building and running).
If you consider S3 option, AWS Lambdas can be used to drop the message in SQS.
On a side note, the original message considered here seems to be self contained. May be it's a good idea to revisit the contents of the message, you may find ways to trim it and send only locations around which will result a smaller payload.

If everything is in the same region - the latency and data transfer cost is very minimal. Putting an item in S3 and having the object sent in the SQS should just turn your solution handle any sized data and take off your effort on scalability of the items and size of the each item.
While I said the data transfer costs are minimal, you might still incur data storage costs in S3; which you can use S3 life cycle rules to delete them.
#D.Luffy mentioned an important and excite solution with lambda - with that you can keep adding the items in S3 - enable S3 notifications, get that to the Queue and process the queue item (transfer it to another ec2 instance etc.) - making the solution fire and forget kind of stuff.
Please do not hesitate to leverage S3 alongside with SQS

Related

Using S3 instead of SQS for integration purposes

folks.
There's a question I've recently faced, which brought some concerns and hesitations. I'm creating an "almost serverless" micro-service using AWS. Here is its workflow: Workflows options
The thing is the input message may be large, but AWS SQS limits message size to 256 Kb. So, I decided to use S3 and S3 notifications to handle inputs: client PUTs an object; its creation triggers Lambda functions and so on. In that way, 256 Kb limits are not relevant, but on the other hand, I'm using storage service as an integration one. One of the concerns is a dead letter queue handling f.e.
Maybe someone has faced similar problems. One of the things is to keep "serverless". Are there any good solutions/improvements/advice?
Thanks in advance.
I would recommend combining the two approaches:
Write the data to Amazon S3
Create a message in the Amazon SQS queue that includes a reference to the data in S3
This way, you have the benefits of using a queue, with additional storage.
If all the data you require is already in the file, then you can configure an Amazon S3 Event to create the SQS message directly in the queue. The message will include the name of the bucket and the key of the object. Thus, putting the file in S3 will create the SQS message and trigger the AWS Lambda function. This is more scalable than directly triggering the Lambda function from S3.
Have you considered using Kinesis Stream, then attaching your lambda to the stream with a size of 1? You could also handle your dead-letter with shard time stamps etc.
Or if you are able to manipulate the original message place your timestamp inside the message then you can easily utilize kinesis firehose and bulk load messages. Limitation on Kinesis is 2mb so will give you almost 10x the message size on sqs which you could compress further.

Periodic Read from AWS S3 and publish to SQS

I have an S3 bucket with different files. I need to read those files and publish SQS msg for each row in the file.
I cannot use S3 events as the files need to be processed with a delay - put to SQS after a month.
I can write a scheduler to do this task, read and publish. But can I was AWS for this purpose?
AWS Batch or AWS data pipeline or Lambda.?
I need to pass the date(filename) of the data to be read and published.
Edit : The data volume to be dealt is huge
I can think of two ways to do this entirely using AWS serverless offerings without even having to write a scheduler.
You could use S3 events to start a Step Function that waits for a month before reading the S3 file and sending messages through SQS.
With a little more work, you could use S3 events to trigger a Lambda function which writes the messages to DynamoDB with a TTL of one month in the future. When the TTL expires, you can have another Lambda that listens to the DynamoDB streams, and when there’s a delete event, it publishes the message to SQS. (A good introduction to this general strategy can be found here.)
While the second strategy might require more effort, you might find it less expensive than using Step Functions depending on the overall message throughput and whether or not the S3 uploads occur in bursts or in a smooth distribution.
At the core, you need to do two things:
Enumerate all of the object in a bucket in S3, and perform some action on any object uploaded more than a month ago.
Can you use Lambda or Batch to do this? Sure. A Lambda could be set to trigger once a day, enumerate the files, and post the results to SQS.
Should you? No clue. A lot depends on your scale, and what you plan to do if it takes a long time to perform this work. If your S3 bucket has hundreds of objects, it won't be a problem. If it has billions, your Lambda will need to be able to handle being interrupted, and continuing paging through files from a previous run.
Alternatively, you could use S3 events to trigger a simple Lambda that adds a row to a database. Then, again, some Lambda could run on a cron job that asks the database for old rows, and publishes that set to SQS for others to consume. That's slightly cleaner, maybe, and can handle scaling up to pretty big bucket sizes.
Or, you could do the paging through files, deciding what to do, and processing old files all on a t2.micro if you just need to do some simple work to a few dozen files every day.
It all depends on your workload and needs.

AWS SQS and other services

my company has a messaging system which sends real-time messages in JSON format, and it's not built on AWS
our team is trying to use AWS SQS to receive these messages, which will then have DynamoDB to storage this messages
im thinking to use EC2 to read this messages then save them
any better solution ?? or how to do it i don't have a good experience
First of All EC2 is infrastructure on Cloud, It is similar to physical machine with OS on local setup. If you want to create any application that will fetch the data from Amazon SQS(Messages in Json Format) and Push it in dynamodb(No Sql database), Your design is correct as both SQS and DynamoDb have thorough Json Support. Once your application is ready then you deploy that application on EC2 machine.
For achieving this, your application must have the asyc Buffered SQS consumer that will consume the messages(limit of sqs messages is 256KB), Hence whichever application is publishing messages size of messages needs to be less thab 256Kb.
Please refer below link for sqs consumer
is putting sqs-consumer to detect receiveMessage event in sqs scalable
Once you had consumed the message from sqs queue you need to save it in dynamodb, that you can easily do it using crud repository. With Repository you can directly save the json in Dynamodb table but please sure to configure the provisioning write capacity based on requests, because more will be the provisioning capacity more will be the cost. Please refer below link for configuring the write capacity of table.
Dynamodb reading and writing units
In general, you'll have a setup something like this:
The EC2 instances (one or more) will read your queue every few seconds to see if there is anything there. If so, they will write this data to DynamoDB.
Based on what you're saying you'll have less than 1,000,000 reads from SQS in a month so you can start out on the free tier for that. You can have a single EC2 instance initially and that can be a very small instance - a T2.micro should be more than sufficient. And you don't need more than a few writes per second on DynamoDB.
The advantage of SQS is that if for some reason your EC2 instance is temporarily unavailable the messages continue to queue up and you won't lose any of them.
From a coding perspective, you don't mention your development environment but there are AWS libraries available for a pretty wide variety of environments. I develop in Java and the code to do this would be maybe 100 lines. I would guess that other languages would be similar. Make sure you look at long polling in the language you're using - it can help to speed up the processing and save you money.

AWS SQS Asynchronous Queuing Pattern (Request/Response)

I'm looking for help with an architectural design decision I'm making with a product.
We've got multiple producers (initiated by API Gateway calls into Lambda) that put messages on a SQS queue (the request queue). There can be multiple simultaneous calls, so there would be multiple Lambda instances running in parallel.
Then we have consumers (lets say twenty EC2 instances) who long-poll on the SQS for the message to process them. They take about 30-45 seconds to process a message each.
I would then ideally like to send the response back to the producer that issued the request - and this is the part I'm struggling with with SQS. I would in theory have a separate response queue that the initial Lambda producers would then be consuming, but there doesn't seem to be a way to cherry pick the specific correlated response. That is, each Lambda function might pick up another function's response. I'm looking for something similar to this design pattern: http://soapatterns.org/design_patterns/asynchronous_queuing
The only option that I can see is to create a new SQS Response queue for each Lambda API call, passing in its ARN in the message for the consumers to put the response on, but I can't imagine that's very efficient - especially when there's potentially hundreds of messages a minute? Am I missing something obvious?
I suppose the only other alternative would be setting up a bigger message broker (e.g. RabbitMQ/ApacheMQ) environment, but I'd like to avoid that if possible.
Thanks!
Create a (Temporary) Response Queue For Every Request
To late for the party, but i was thinking that i might find some help in what i want to achieve, #MattHouser #Zaheer Ally , or give an idea to someone working on a related issue.
I am facing a similar challenge. I have an API that upon request by a client, needs to communicate to multiple external APIs and collect (delayed) results.
Since my PHP API is synchronous, it can only perform these requests sequentially. So, i was thinking to use a request queue, where the producer (API) would send messages. Then, multiple workers would consume these messages, each of them performing one of these external API calls.
To get the results back, the producer would have created a temporary response queue, the name-identifier of which would be embedded in the message sent to workers. Hence, each worker would 'publish' his results on this temporary queue.
In the meantime, the producer would keep polling the temporary queue until he received the expected number of messages. Finally, he would delete the queue and send the collected results back to the client.
Yes, you could use RabbitMQ for a more "rpc" queue pattern.
But if you want to stay within AWS, try using something other than SQS for the response.
Instead, you could use S3 for the response. When your producer puts the item into SQS, include in the message an S3 destination for the response. When your consumer completes the tasks, put the response in the desired S3 location.
Then you can check S3 for the response.
Update
You may be able to accomplish an RPC-like message queue using Redis.
https://github.com/ServiceStack/ServiceStack/wiki/Messaging-and-redis
Then, you can use AWS ElastiCache for your Redis cluster. This would completely replace the use of SQS.
Another option would be to use Redis' pub/sub mechanism to asynchronously notify your lambda that the backend work is done. You can use AWS's Elasticache for Redis for an all-AWS-managed solution. Your lambda function would generate a UUID for each request, use that as the channel name to subscribe to, pass it along in the SQS message, and then the backend workers would publish a notification to that channel when the work is done.
I was facing this same problem so I tried it out, and it does work. Whether it's worth the effort over just polling S3 is another question. You have to configure the lambda functions to run inside your VPC, so they can access your Redis. I was going to have to do this anyway since I'd want the workers, in my case also lambda functions, to be able to access my Elasticsearch and RDS. But there are some considerations: most importantly, you need to use a private subnet with a NAT Gateway (or your own NAT Instance), so it can get out to the Internet and AWS managed services (including SQS).
One other thing I just stumbled across is that requests through API Gateway currently cannot take longer than 29 seconds, and this cannot be increased by AWS. You mentioned your jobs take 30 or more seconds, so this could be a showstopper for you using API Gateway and Lambda in this way anyway.
AWS now provides a Java client that supports temporary queues. This is useful for request/response patterns. I can't see a non-Java version.

Amazon sqs vs custom implementation

We need to sync data between different web servers. The idea is very basic: when one entity is created on one server, it should be sent to all the other servers. What's the right way to do it? We are currently evaluating 2 approaches: amazon's sqs and sns services and custom implementation with some key-value database (like memcached and memqueue). What are the common pitfalls of custom implementations? Any feedback will be highly appreciated.
SQS would work OK if you create a new queue for each server and write the data to each queue. The biggest downside is that you will need each server to poll for new messages.
SNS would work more efficiently because it allows you to broadcast a message to multiple locations. However, it's a one-shot try; if a machine can't receive its notification when SNS sends it SNS will not try again.
You don't specify how many messages you are sending or what your performance requirements are, but any SQS/SNS system will likely be much, much slower (mostly due to latencies between sending the message and the servers receiving it) then a local memcache/key-value server solution.
A mixed solution would be to use a persistant store (like SimpleDB) and use SNS to alert the servers that new data is available.