Can we ingest compressed data into Kusto as plain text via Event Hub - azure-eventhub

I am trying to ingest a payload into Kusto/ADX via Event Hub.
Limitation: a standard-tier Event Hub can only support throughput up to 40 Mbps.
Goal: increase the maximum throughput by sending a compressed payload, without handling the translation manually.
Example: payload = {
a: 1,
b: 2
}
We compress this payload manually and send it to Event Hub, and Kusto should store it as 1 row with 2 columns, a and b, without us handling the decompression on our end.
I am expecting Event Hub to handle the compressed data and the translation on their end.

It's 40 MB (megabytes) per second, not 40 Mb (megabits).
You can compress your payload with gzip.
Kusto will open it automatically as part of the ingestion process.
See "Ingest data from event hub into Azure Data Explorer":
Setting: Compression
Suggested value: None
Field description: The compression type of the event hub messages payload. Supported compression types: None, Gzip.
Having said that, the right thing to do would probably be to switch to a higher Event Hubs tier.
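A minimal sketch, assuming the azure-messaging-eventhubs Java SDK (the connection string, hub name, and payload below are placeholders): gzip the JSON before handing it to the producer, and with the data connection's Compression setting switched to Gzip, Kusto should decompress the events during ingestion.

import com.azure.messaging.eventhubs.EventData;
import com.azure.messaging.eventhubs.EventDataBatch;
import com.azure.messaging.eventhubs.EventHubClientBuilder;
import com.azure.messaging.eventhubs.EventHubProducerClient;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class GzipToEventHub {

    // Gzip the JSON payload before handing it to the Event Hub client.
    static byte[] gzip(String json) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(json.getBytes(StandardCharsets.UTF_8));
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String connectionString = "<event-hub-namespace-connection-string>"; // placeholder
        String eventHubName = "<event-hub-name>";                            // placeholder

        EventHubProducerClient producer = new EventHubClientBuilder()
                .connectionString(connectionString, eventHubName)
                .buildProducerClient();

        // The whole event body is the gzipped JSON; with the ADX data connection's
        // Compression set to Gzip, Kusto unzips it during ingestion and maps
        // the JSON to columns a and b.
        String payload = "{\"a\": 1, \"b\": 2}";
        EventDataBatch batch = producer.createBatch();
        batch.tryAdd(new EventData(gzip(payload)));
        producer.send(batch);
        producer.close();
    }
}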

Related

How to slow down reads in Kinesis Consumer Library?

We have an aggregation system where the aggregator is a KDA application running Flink, which aggregates the data over a 6-hour time window and puts it all into an AWS Kinesis Data Stream.
We also have a consumer application that uses the KCL 2.x library, reads the data from KDS, and puts it into DynamoDB. We are using the default KCL configuration and have set the poll time to 30 seconds. The issue we are facing now is that the consumer application reads all the data in KDS within a few minutes, causing a burst of writes to DynamoDB in a short period of time and, in turn, scaling issues in DynamoDB.
We would like to consume the KDS data slowly and even out the consumption over time, allowing us to keep a lower provisioned capacity for WCUs.
One way we can do that is to increase the polling time for the KCL consumer application. Is there any configuration that can limit the number of records per poll, helping us reduce the write throughput to DynamoDB, or any other way to fix this problem?
Appreciate any responses
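A hedged sketch of the kind of knob being asked about, assuming KCL 2.x polling-mode retrieval: PollingConfig exposes maxRecords and idleTimeBetweenReadsInMillis, which cap how much each GetRecords call fetches and how long the fetcher waits between polls. The class and method names here are from the KCL 2.x API as I understand it, and the values are illustrative.

import software.amazon.awssdk.services.kinesis.KinesisAsyncClient;
import software.amazon.kinesis.common.ConfigsBuilder;
import software.amazon.kinesis.retrieval.RetrievalConfig;
import software.amazon.kinesis.retrieval.polling.PollingConfig;

public class ThrottledRetrieval {

    // Build a polling-mode retrieval config that fetches at most maxRecords per
    // GetRecords call and idles between polls, spreading the DynamoDB writes
    // done by the record processor over a longer window.
    static RetrievalConfig throttledRetrievalConfig(ConfigsBuilder configsBuilder,
                                                    KinesisAsyncClient kinesisClient,
                                                    String streamName) {
        PollingConfig pollingConfig = new PollingConfig(streamName, kinesisClient)
                .maxRecords(500)                        // fewer records per poll (illustrative)
                .idleTimeBetweenReadsInMillis(30_000L); // ~30s between polls (illustrative)

        return configsBuilder.retrievalConfig()
                .retrievalSpecificConfig(pollingConfig);
    }
}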

Record real-time audio from the browser and stream to Amazon S3 for storage

I want to record audio from my browser and live stream it to Amazon S3 for storage. I cannot wait until the recording is finished because the client can close the browser, so I would like to store what has been spoken so far (or at least up to the nearest 5-10 seconds).
The issue is that multipart upload does not support parts smaller than 5 MiB, and the audio files will for the most part be less than 5 MiB.
Ideally I would like to send the chunks every 5 seconds, so whatever has been said in the last 5 seconds gets uploaded.
Can this be supported by S3? Or should I use another AWS service to first hold the recording parts? I have heard about Kinesis streams but am not sure whether they can serve the purpose.
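For reference, the usual way the 5 MiB floor is handled is to buffer incoming chunks until a part reaches 5 MiB and let only the final part be smaller. A rough sketch with the AWS SDK for Java v2 follows; the class name and buffering scheme are illustrative, and it does not by itself protect audio that is still buffered when the browser closes.

import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.CompleteMultipartUploadRequest;
import software.amazon.awssdk.services.s3.model.CompletedMultipartUpload;
import software.amazon.awssdk.services.s3.model.CompletedPart;
import software.amazon.awssdk.services.s3.model.CreateMultipartUploadRequest;
import software.amazon.awssdk.services.s3.model.UploadPartRequest;
import software.amazon.awssdk.services.s3.model.UploadPartResponse;

public class BufferedAudioUploader {
    private static final int MIN_PART_SIZE = 5 * 1024 * 1024; // S3 minimum for all but the last part

    private final S3Client s3;
    private final String bucket;
    private final String key;
    private final String uploadId;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final List<CompletedPart> parts = new ArrayList<>();
    private int partNumber = 1;

    BufferedAudioUploader(S3Client s3, String bucket, String key) {
        this.s3 = s3;
        this.bucket = bucket;
        this.key = key;
        this.uploadId = s3.createMultipartUpload(
                CreateMultipartUploadRequest.builder().bucket(bucket).key(key).build()).uploadId();
    }

    // Called with each ~5-second audio chunk; uploads a part only once 5 MiB has accumulated.
    void addChunk(byte[] chunk) {
        buffer.write(chunk, 0, chunk.length);
        if (buffer.size() >= MIN_PART_SIZE) {
            flush();
        }
    }

    // The final part is allowed to be smaller than 5 MiB, so flush whatever is left and complete.
    void finish() {
        if (buffer.size() > 0) {
            flush();
        }
        s3.completeMultipartUpload(CompleteMultipartUploadRequest.builder()
                .bucket(bucket).key(key).uploadId(uploadId)
                .multipartUpload(CompletedMultipartUpload.builder().parts(parts).build())
                .build());
    }

    private void flush() {
        UploadPartResponse response = s3.uploadPart(
                UploadPartRequest.builder()
                        .bucket(bucket).key(key).uploadId(uploadId).partNumber(partNumber).build(),
                RequestBody.fromBytes(buffer.toByteArray()));
        parts.add(CompletedPart.builder().partNumber(partNumber).eTag(response.eTag()).build());
        partNumber++;
        buffer.reset();
    }
}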

How many lines is best to batch before sending the ILP data to QuestDB?

I read somewhere that Influx supports only 1000 metrics (lines of data) per ILP send. What is the maximum for QuestDB?
I am currently batching 1000 lines before calling socket.send(); will the speed go up if I send more in one go?
When you call send() on the socket, it does not do any application-level batching; it just starts sending the byte buffer over the network. QuestDB batches all incoming data using the parameters
commitLag
maxUncommittedRows
described at
https://questdb.io/docs/guides/out-of-order-commit-lag/
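On the client side, a minimal Java sketch of batching ILP lines into a single buffer before each socket write (the host, the default ILP TCP port 9009, and the table/column names are illustrative); a larger batch mainly amortizes per-write overhead, while commits are still governed by the server-side commitLag / maxUncommittedRows settings above.

import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class IlpBatchSender {
    public static void main(String[] args) throws Exception {
        try (Socket socket = new Socket("localhost", 9009); // QuestDB ILP over TCP
             OutputStream out = socket.getOutputStream()) {

            StringBuilder batch = new StringBuilder();
            int linesInBatch = 0;

            for (int i = 0; i < 100_000; i++) {
                // One ILP line: table, symbol, field, newline terminator.
                batch.append("sensors,device=d").append(i % 10)
                     .append(" temperature=").append(20.0 + i % 5).append('\n');

                // Flush every 1000 lines; a bigger batch mostly saves syscalls,
                // it does not change how QuestDB commits the rows.
                if (++linesInBatch == 1000) {
                    out.write(batch.toString().getBytes(StandardCharsets.UTF_8));
                    batch.setLength(0);
                    linesInBatch = 0;
                }
            }
            if (batch.length() > 0) {
                out.write(batch.toString().getBytes(StandardCharsets.UTF_8));
            }
            out.flush();
        }
    }
}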

How to use Kinesis to broadcast records?

I know that Kinesis' typical use case is event streaming; however, we'd like to use it to broadcast some information so that it's available in near real time in some apps, besides making it available for further stream processing. KCL seems to be the only viable option, as the Kinesis Streams API is too low level.
As far as I understand, to use KCL we'd have to generate a random applicationId so that all apps receive all the data, but this means creating a new DynamoDB table each time an application starts. Of course we can perform cleanup when the application stops, but when the application doesn't stop gracefully there would be a DynamoDB table left hanging around.
Is there a way/pattern to use Kinesis streams in a broadcast fashion?
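A hedged sketch of the pattern described above, assuming KCL 2.x: give each instance its own application name so that it gets its own lease table and therefore sees every record. The helper below is illustrative, and the orphaned-table problem remains whenever an instance dies without cleaning up.

import java.util.UUID;
import software.amazon.awssdk.services.cloudwatch.CloudWatchAsyncClient;
import software.amazon.awssdk.services.dynamodb.DynamoDbAsyncClient;
import software.amazon.awssdk.services.kinesis.KinesisAsyncClient;
import software.amazon.kinesis.common.ConfigsBuilder;
import software.amazon.kinesis.processor.ShardRecordProcessorFactory;

public class BroadcastConfig {

    // Each instance runs as its own KCL "application", hence its own lease table,
    // so every instance receives every record (broadcast semantics).
    static ConfigsBuilder broadcastConfigs(String streamName,
                                           KinesisAsyncClient kinesis,
                                           DynamoDbAsyncClient dynamo,
                                           CloudWatchAsyncClient cloudWatch,
                                           ShardRecordProcessorFactory factory) {
        String applicationName = "broadcast-consumer-" + UUID.randomUUID();
        String workerId = UUID.randomUUID().toString();
        return new ConfigsBuilder(streamName, applicationName,
                kinesis, dynamo, cloudWatch, workerId, factory);
    }
}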

Kafka message codec - compress and decompress

When using Kafka, I can set a codec by setting the kafka.compression.codec property of my Kafka producer.
Suppose I use snappy compression in my producer; when consuming the messages from Kafka using some Kafka consumer, should I do something to decode the data from snappy, or is it a built-in feature of the Kafka consumer?
In the relevant documentation I could not find any property that relates to decoding in the Kafka consumer (it only relates to the producer).
Can someone clear this up?
As per my understanding, the decompression is taken care of by the consumer itself, as mentioned on their official wiki page:
The consumer iterator transparently decompresses compressed data and only returns an uncompressed message
As found in this article, the way the consumer works is as follows:
The consumer has background “fetcher” threads that continuously fetch data in batches of 1MB from the brokers and add it to an internal blocking queue. The consumer thread dequeues data from this blocking queue, decompresses and iterates through the messages
And also on the doc page, under End-to-end Batch Compression, it's written that:
A batch of messages can be clumped together compressed and sent to the server in this form. This batch of messages will be written in compressed form and will remain compressed in the log and will only be decompressed by the consumer.
So it appears that the decompression part is handled in the consumer itself; all you need to do is provide a valid/supported compression type using the compression.codec ProducerConfig attribute when creating the producer. I couldn't find any example or explanation describing an approach for decompression on the consumer end. Please correct me if I am wrong.
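As an illustrative sketch with the current Java client, where the producer-side property is compression.type (the 0.8-era clients called it compression.codec); the broker address and topic are placeholders, and no consumer-side setting is needed:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class SnappyProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Compression is configured on the producer only; consumers decompress transparently.
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("my-topic", "key", "a snappy-compressed message"));
        }
    }
}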
I have the same issue with v0.8.1, and this compression/decompression in Kafka is poorly documented, other than saying the consumer should "transparently" decompress compressed data, which it NEVER did.
The example high-level consumer client using ConsumerIterator on the Kafka web site only works with uncompressed data. Once I enable compression in the producer client, the message never gets into the following while loop. Hopefully they fix this issue ASAP, or they shouldn't claim this feature, as some users may use Kafka to transport large messages that need batching and compression capabilities.
ConsumerIterator<byte[], byte[]> it = stream.iterator();
while (it.hasNext()) {
    String message = new String(it.next().message());
}