How do I pass a map object to an AWS SQS queue?
I have a map object with some data, and I want to pass that same map to an AWS SQS queue in order to persist it. How can I do that?
I imagine you want to store the map so you can consume it from SQS later, so I recommend serializing it. Serializing your data, rather than sending the map into SQS as-is and trying to parse it back into a map later, is much safer and likely easier.
JSON and MessagePack are popular formats for serialization and deserialization.
Here's a sketch in Python using boto3 (queue_url is the URL of your queue):

import json
import boto3

sqs = boto3.client("sqs")

some_map = {"name": "tom", "age": 31}
serialized_map = json.dumps(some_map)  # serialize the map to a JSON string
sqs.send_message(QueueUrl=queue_url, MessageBody=serialized_map)

To turn the JSON back into a map:

response = sqs.receive_message(QueueUrl=queue_url)
new_map = json.loads(response["Messages"][0]["Body"])
print(new_map)
Here is what I did (and it works):
First I tried to send the map values as message attributes using setMessageAttributes, but SQS allows at most 10 message attributes per message, so that did not suit my requirement (my map had more than 10 entries).
SQS also lets you send content in the message body, so I converted my map into a plain JSON string and sent it to SQS as the messageBody.
Thanks!
I think you need to serialize your map into XML, JSON, or unformatted text.
When you receive messages, deserialize back into the map.
JSON is a good choice, and you can use Jackson or Gson to serialize/deserialize it.
Here are two SQS limitations you should keep in mind:
Message content:
A message can include only XML, JSON, and unformatted text. The following Unicode characters are allowed: #x9 | #xA | #xD | #x20 to #xD7FF | #xE000 to #xFFFD | #x10000 to #x10FFFF
Any characters not included in this list are rejected. For more information, see the W3C specification for characters.
Message size:
The minimum message size is 1 byte (1 character). The maximum is 262,144 bytes (256 KB).
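Given those limits, it's worth validating a serialized payload before sending it. A minimal sketch in Python (note the 256 KB limit applies to the encoded bytes, not the character count):

```python
MAX_SQS_BYTES = 262_144  # 256 KB

def fits_in_sqs(body: str) -> bool:
    # SQS counts bytes of the encoded body, so non-ASCII characters
    # can take more than one byte each
    size = len(body.encode("utf-8"))
    return 1 <= size <= MAX_SQS_BYTES
```

You can then reject or split oversized payloads up front instead of handling an error from the service.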
Related
I get it that in a pure sense, JSON doesn't account for tuples, but I don't think it's unreasonable to treat tuples as lists in terms of the JSON encoding. Has anyone else faced and resolved this? I'd like to stay out of the business of pre-processing my error data to replace tuples with lists.
Perhaps I need to specify a different serialization approach?
EDIT: here is a practical example:
Here is some toy code.
the_data = {:ok, %{...}}
Sentry.capture_message(message, the_data)
All it does is attempt to send a message to Sentry with tuples in the data.
If you're unfamiliar with Sentry, the sentry-elixir library provides two functions (among many others, of course) that are used to explicitly send either exceptions or messages to Sentry. The functions are:
Sentry.capture_exception/2
Sentry.capture_message/2
In addition, errors are sent to Sentry when they bubble up to the "top". These can't be intercepted so I have to specify (and implement) a before_send_event "handler" in the configuration for Sentry.
This is what my configuration looks like for the environment I'm working in:
config :sentry,
dsn: "https://my_neato_sentry_key#sentry.io/33333343",
environment_name: :staging,
enable_source_code_context: true,
root_source_code_path: File.cwd!(),
tags: %{
env: "staging"
},
before_send_event: {SomeApplication.Utils.SentryLogger, :before_send},
included_environments: [:staging],
filter: SomeApplication.SentryEventFilter
My before_send function basically attempts to sanity-check the data and replace all tuples with lists. I haven't implemented this entirely yet, though; instead of replacing all tuples I am temporarily using Kernel.inspect/2 to convert the data to a string. This isn't ideal, of course, because then I can't manipulate the data in the Sentry views:
def before_send(sentry_event) do
IO.puts "------- BEFORE SEND TWO ---------------------------"
sentry_event
|> inspect(limit: :infinity)
end
This results in the following output:
{:invalid, {:ok, the_data}}
And the capture_message fails.
By default, Sentry uses jason to encode its JSON and, again by default, jason doesn't encode tuples. You can change that by implementing the Jason.Encoder protocol for Tuple:
defimpl Jason.Encoder, for: Tuple do
def encode(tuple, opts) do
Jason.Encode.list(Tuple.to_list(tuple), opts)
end
end
Be warned - this will have a global effect on how tuples are converted to JSON in your application.
When using AWS SQS (Simple Queue Service) you pay for each request you make to the service (push, pull, ...). Each message you send to a queue has a maximum size of 256 KB.
To save money I'd like to buffer messages in my Go application before sending them to SQS, until I have enough data to make efficient use of the 256 KB limit.
Since my Go application is a webserver, my current idea is to use a string mutex and append messages as long as I would exceed the 256kb limit and then issue the SQS push event. To save even more space I could gzip every single message before appending it to the string mutex.
I wonder if there is some kind of gzip stream that I could use for this. My assumption is that gzipping all concatenated messages together will result in a smaller size than gzipping every message before appending it to the string mutex. One way would be to gzip the string mutex after every append to validate its size. But that might be very slow.
Is there a better way? Or is there a total better approach involving channels? I'm still new to Go I have to admit.
I'd take the following approach:
Use a channel to accept incoming "internal" messages to a go routine
In that go routine keep the messages in a "raw" format, so 10 messages is 10 raw uncompressed items
Each time a new raw item arrives, compress all the raw messages into one. If the size with the new message > 256k then compress messages EXCEPT the last one and push to SQS
This is computationally expensive. Each individual message causes a full compression of all pending messages. However it is efficient for SQS use
You could guesstimate the size of the gzipped messages and calculate whether you've reached the max size threshold. Keep a message size counter and, for every new message, increment the counter by its expected compressed size. Do the actual compression and send to SQS only when your counter would exceed 256 KB. That way you avoid compressing every time a new message comes in.
For a use-case like this, running a few tests on a sample set of messages should give the rough percentage of compression expected.
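That estimate-then-flush idea can be sketched as follows (shown in Python for brevity; the same shape translates directly to a goroutine in Go). EST_RATIO is an assumed compression ratio you would measure on your own sample messages, and the send callable stands in for the real SQS push:

```python
import base64
import gzip

MAX_SQS_BYTES = 262_144
EST_RATIO = 0.4  # assumption: rough compressed/raw ratio from sample tests

class SqsBuffer:
    def __init__(self, send):
        self.send = send   # callable that performs the real SQS push
        self.msgs = []     # raw, uncompressed messages
        self.estimate = 0  # running estimate of the compressed size

    def add(self, msg: bytes):
        est = int(len(msg) * EST_RATIO)
        # flush early to leave headroom for base64 overhead (~33%)
        if self.msgs and self.estimate + est > MAX_SQS_BYTES * 3 // 4:
            self.flush()
        self.msgs.append(msg)
        self.estimate += est

    def flush(self):
        if not self.msgs:
            return
        # assumes individual messages contain no newlines
        blob = gzip.compress(b"\n".join(self.msgs))
        self.send(base64.b64encode(blob).decode("ascii"))
        self.msgs = []
        self.estimate = 0
```

The base64 step is needed because SQS bodies must be text; it inflates the compressed blob by about a third, hence the headroom in add().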
Before you get focused on compression, eliminate redundant data that is known on both sides. This is what encodings like msgpack, protobuf, AVRO, and so on do.
Let's say all of your messages are a struct like this:
type Foo struct {
bar string
qux int
}
and you were thinking of encoding it as JSON. Then the most efficient you could do is:
{"bar":"whatever","qux":123}
If you wanted to just append all of those together in memory, you might get something like this:
{"bar":"whatever","qux":123}{"bar":"hello world","qux":0}{"bar":"answer to life, the universe, and everything","qux":42}{"bar":"nice","qux":69}
A really good compression algorithm might look at hundreds of those messages and identify the repetitiveness of {"bar":" and ","qux":.
But compression has to do work to figure that out from your data each time.
If the receiving code already knows what "schema" (the {"bar": some_string, "qux": some_int} "shape" of your data) each message has, then you can just serialize the messages like this:
"whatever"123"hello world"0"answer to life, the universe, and everything"42"nice"69
Note that in this example encoding, you can't just start in the middle of the data and unambiguously find your place. If you have a bunch of messages such as {"bar":"1","qux":2}, {"bar":"2","qux":3}, {"bar":"3","qux":4}, then the encoding will produce: "1"2"2"3"3"4, and you can't just start in the middle and know for sure if you're looking at a number or a string - you have to count from the ends. Whether or not this matters will depend on your use case.
You can come up with other simple schemes that are more unambiguous or make the code for writing or reading messages easier or simpler, like using a field separator or message separator character which is escaped in your encoding of the other data (just like how \ and " would be escaped in quoted JSON strings).
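A minimal Python sketch of that schema-based encoding, assuming every message has the {"bar": some_string, "qux": some_int} shape; the decoder relies on unescaped quotes to find message boundaries:

```python
import json

def encode(msgs):
    # emit only the values: a JSON-quoted string followed by the bare int
    return "".join(json.dumps(m["bar"]) + str(m["qux"]) for m in msgs)

def decode(s):
    msgs, i = [], 0
    while i < len(s):
        # s[i] is an opening quote; find its unescaped closing quote
        j = i + 1
        while s[j] != '"':
            j += 2 if s[j] == "\\" else 1
        bar = json.loads(s[i : j + 1])
        # the integer runs until the next opening quote (or end of input)
        k = j + 1
        while k < len(s) and s[k] != '"':
            k += 1
        msgs.append({"bar": bar, "qux": int(s[j + 1 : k])})
        i = k
    return msgs
```

This is an illustration of the technique, not a robust format; as noted above, you can't start decoding from the middle of the data.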
If you can't have the receiver just know/hardcode the expected message schema - if you need the full flexibility of something like JSON and you always unmarshal into something like a map[string]interface{} or whatever - then you should consider using something like BSON.
Of course, you can't use msgpack, protobuf, AVRO, or BSON directly - they need a medium that allows arbitrary bytes like 0x0. And according to the AWS SQS FAQ:
Q: What kind of data can I include in a message?
Amazon SQS messages can contain up to 256 KB of text data, including XML, JSON and unformatted text. The following Unicode characters are accepted:
#x9 | #xA | #xD | [#x20 to #xD7FF] | [#xE000 to #xFFFD] | [#x10000 to #x10FFFF]
So if you want maximum space efficiency for your exact use case, you'd have to write your own code which uses the techniques from those encoding schemes, but emits only bytes that are allowed in SQS messages.
Relatedly, if you have a lot of integers, and you know most of them are small (or clump around a certain spot of the number line, so that by adding a constant offset to all of them you can make most of them small), you can use one of the variable length quantity techniques to encode all of those integers. In fact several of those common encoding schemes mentioned above use variable length quantities in their encoding of integers. If you use a "piece size" of six (6) bits (instead of the standard implicitly assumed piece size of eight (8) bits to match a byte) then you can use base64. Not full base64 encoding, because the padding will completely defeat the purpose - just map from the 64 possible values that fit in six bits to the 64 distinct ASCII characters that base64 uses.
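Here's a hedged sketch of that idea in Python: each character carries 5 data bits plus 1 continuation bit, and the resulting 6-bit value is mapped onto the 64 base64 characters, all of which are legal in an SQS body. This is an illustration, not a standard encoding:

```python
# the 64 characters of the base64 alphabet, used only as a 6-bit-to-ASCII map
ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz0123456789+/")
CONT = 0x20  # sixth bit marks "more pieces follow"

def encode_uvarint(n: int) -> str:
    chars = []
    while True:
        piece, n = n & 0x1F, n >> 5  # low 5 bits per character
        if n:
            chars.append(ALPHABET[piece | CONT])
        else:
            chars.append(ALPHABET[piece])
            return "".join(chars)

def decode_uvarint(s: str, i: int = 0):
    n, shift = 0, 0
    while True:
        v = ALPHABET.index(s[i])
        i += 1
        n |= (v & 0x1F) << shift
        shift += 5
        if not v & CONT:
            return n, i  # the value and the index of the next unread character
```

Small numbers cost one character; each additional 5 bits costs one more.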
Anyway, unless you know your data has a lot of repetition (but not the kind you can simply avoid sending, like the same field names in every message), I would start with all of that, and only then would I look at compression.
Even so, if you want minimal size I would aim for LZMA, and if you want minimal computing overhead I would use LZ4. Gzip is not bad per se - if it's much easier to use gzip, then just use it - but if you're optimizing for either size or speed, there are better options. I don't know if gzip is even a good "middle ground" of speed, output size, and working memory; it's pretty old, and there may be compression algorithms by now that are strictly superior in all three. I think gzip, depending on the implementation, also includes headers and framing information (version metadata, size, checksums, and so on), which you probably don't want if you really need to minimize size, and which you probably don't need in the context of SQS messages.
So we have 100 different types of messages coming into our Kinesis stream. We only want to save 4 types. I know Kinesis can transform messages, but can it filter as well? How is this done?
Filtering is just a transform in which you decide not to output anything. You indicate this by sending the result with a value "Dropped" as per the documentation.
You can find an example of a transform at this post. The logic covers several cases: letting records pass through without any transform (status "Ok"), transforming and outputting a record (again, status "Ok"), dropping - or filtering - a record (status "Dropped"), and signaling an error using the status "ProcessingFailed".
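As a sketch, a Kinesis Data Firehose transformation Lambda that keeps only certain message types might look like this in Python. The "type" field and the four kept values are assumptions about your messages; records arrive and must be returned base64-encoded:

```python
import base64
import json

KEEP = {"order", "payment", "refund", "signup"}  # hypothetical: your 4 types

def handler(event, context):
    out = []
    for rec in event["records"]:
        payload = base64.b64decode(rec["data"])
        msg = json.loads(payload)
        if msg.get("type") in KEEP:
            out.append({"recordId": rec["recordId"],
                        "result": "Ok",
                        "data": base64.b64encode(payload).decode("ascii")})
        else:
            # filtering is just a transform that marks the record as Dropped
            out.append({"recordId": rec["recordId"], "result": "Dropped"})
    return {"records": out}
```

Every incoming recordId must appear in the response, which is why dropped records are returned with the "Dropped" result rather than simply omitted.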
I have two Kinesis streams and I would like to create a third stream that is the intersection of these two streams. My goal is to have a stream processor respond to an event on the resulting third stream without having to write a consumer that performs this intersection.
A record on stream a would be:
{
"customer_id": 3,
"first_name":"Marcy",
"last_name":"Shurtleff"
}
and a record on stream b would be:
{
"payment_id": 10001,
"customer_id": 1,
"amount":234.56,
"date":"2018-09-07T10:25:43.511Z"
}
I would like to perform a join (like I can in KSQL with Kafka) that will join stream a.customer_id to stream b.customer_id resulting in:
{
"customer_id": 3,
"first_name":"Marcy",
"last_name":"Shurtleff",
"payment_id": 10001,
"amount":234.56,
"date":"2018-09-07T10:25:43.511Z"
}
(or whatever sql-like projection I choose).
I know this is possible with Kafka and KSQL, but is this possible with Kinesis?
Kinesis Data Analytics will not help as you cannot use more than one stream as a datasource in that product and you can only perform joins on 'in-application' streams.
I recently implemented a solution that does exactly what you are asking using Kinesis Data Analytics. Indeed, a KDA in-application stream takes only one stream as its input data source, so this limitation makes it necessary to standardize the schema of the data flowing into KDA when you are dealing with multiple sets of streams. To work around this, a Python snippet inside a Lambda can flatten and standardize any event by converting its entire payload to a JSON-encoded string. The image below shows how my whole solution is deployed:
The process of standardizing and flattening the streams is illustrated in detail below:
Note that after this stage both JSON events have the same schema and no nested fields. Yet, all information is preserved. In addition, the ssn field is placed on the header to be used as join key inside of the KDA application.
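A hedged sketch of that flatten-and-standardize step in Python (the envelope shape and the customer_id join key here are illustrative; my solution used ssn):

```python
import json

def standardize(event: dict, join_key: str) -> dict:
    # place the join key in a fixed header field and carry the original
    # event, whatever its schema, as a single JSON-encoded string
    return {
        "join_key": event[join_key],
        "payload": json.dumps(event),
    }
```

Both streams pass through this step, so KDA sees a single flat schema and can join on join_key, while the full original event remains recoverable from the payload string.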
For more information about this solution, check this article I wrote: https://medium.com/#guilhermeepassos/joining-and-enriching-multiple-sets-of-streaming-data-with-kinesis-data-analytics-24b4088b5846
import random
from time import strftime, gmtime

d = random.randint(1, 30)
data = [d, strftime("%Y%m%d %H%M%S", gmtime())]  # random num, system time
client.publish("gas", str(data))
This is part of my Python code (version 2).
I'm trying to send a list using MQTT.
However, if I use bytearray instead of str on the third line,
it says "ValueError: string must be of size 1".
So I used str to turn it into a string.
Can I send just a list, one that is NOT a string?
MQTT message payloads are just byte arrays; there is no inherent format to them. Strings tend to work as long as both ends of the transaction use the same character encoding.
If you want to send structured data (such as the list) then you need to decide on a way to encode that structure so the code receiving the message knows how to reconstruct it.
The usual solution to this problem is to encode the structure as JSON, but XML or something like Protocol Buffers are also good candidates.
The following question has some examples of converting Python lists to JSON objects
Serializing list to JSON
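For example, the publisher serializes the list to a JSON string and the subscriber parses it back. A sketch assuming the paho-mqtt client from the question (the publish call is shown commented out, since it needs a live broker):

```python
import json
import random
from time import gmtime, strftime

d = random.randint(1, 30)
data = [d, strftime("%Y%m%d %H%M%S", gmtime())]

payload = json.dumps(data)          # list -> JSON string
# client.publish("gas", payload)    # paho-mqtt accepts str or bytes

# on the subscriber side (e.g. inside an on_message callback):
restored = json.loads(payload)      # JSON string -> list
```

Unlike str(data), the JSON round trip gives the receiver back a real list with the original types intact, not a Python repr it would have to eval or parse by hand.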