Is it possible to configure Azure Event Hubs Capture file names by PartitionKey rather than PartitionId? - azure-eventhub

When configuring an Azure Event Hub instance for Event Hubs Capture, the following example file name formats are provided, all of which use the PartitionId variable in some form.
{Namespace}/{EventHub}/{PartitionId}/{Year}/{Month}/{Day}/{Hour}/{Minute}/{Second}
{Year}/{Month}/{Day}/{Namespace}/{EventHub}/{PartitionId}/{Hour}/{Minute}/{Second}
{Year}/{Month}/{Day}/{Hour}/{Namespace}/{EventHub}/{PartitionId}/{Minute}/{Second}
Is it possible to include the PartitionKey in the file name path instead of (or as well as) the PartitionId?

No, the partition key is not available as a variable for building the Avro file name path.
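If you need the partition key downstream, note that Capture still writes each event's system properties into the Avro records themselves, and the partition key (when the sender set one) normally appears there under the x-opt-partition-key entry. A minimal Python sketch, assuming the fastavro package and a capture file downloaded locally as capture.avro (placeholder name):

from fastavro import reader

# Minimal sketch: recover the partition key from a downloaded Capture file.
# Assumes `pip install fastavro`; capture.avro is a placeholder file name.
with open("capture.avro", "rb") as f:
    for record in reader(f):
        system_props = record.get("SystemProperties") or {}
        # When the sender supplied a partition key it is usually stored here.
        partition_key = system_props.get("x-opt-partition-key")
        print(record["SequenceNumber"], record["EnqueuedTimeUtc"], partition_key)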

Related

How can I get server and stream key from AWS MediaLive

I need to configure live streaming using the OBS Studio software. Live streaming has already been set up on AWS using Elemental MediaLive (auto wizard), but I am unable to figure out how to find the server address and stream key that are required to configure OBS Studio.
Can someone please guide me on where I can find the above information in the AWS console?
Thanks
That is the "stream name" in the RTMP ingest point URL. The format would be something like
rtmp://domain:1935/live/streamname
The stream key in OBS is streamname.
For detail, please refer to page 7 and 8 of this pdf: https://d2908q01vomqb2.cloudfront.net/fb644351560d8296fe6da332236b1f8d61b2828a/2020/04/14/Connecting-OBS-Studio-to-AWS-Media-Services-in-the-Cloud-v2.pdf
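In other words, everything before the last slash goes into the OBS URL field and the final path segment is the stream key. A throwaway illustration in Python, using the placeholder endpoint from above:

endpoint = "rtmp://domain:1935/live/streamname"  # placeholder ingest endpoint
server_url, stream_key = endpoint.rsplit("/", 1)
print(server_url)  # rtmp://domain:1935/live  -> OBS "URL"
print(stream_key)  # streamname               -> OBS "Stream key"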
Have a look here:
https://aws.amazon.com/de/blogs/media/connecting-obs-studio-to-aws-media-services-in-the-cloud/
Especially this document:
https://d2908q01vomqb2.cloudfront.net/fb644351560d8296fe6da332236b1f8d61b2828a/2020/04/14/Connecting-OBS-Studio-to-AWS-Media-Services-in-the-Cloud-v2.pdf
Where it says:
STEP C: CONFIGURE THE APPLIANCE
Launch OBS Studio on the source system. Choose Settings to open the settings window. Choose Stream to access the streaming settings.
Complete the fields:
For Stream Type, choose Custom Streaming Server.
For URL, copy one of the endpoint URLs from the input you created in Step B. Remove the /<stream_name> at the end of the URL.
For Stream key, type the stream name.
Leave the Use authentication box unchecked.
Choose Apply to save your changes.
What OBS refers to as "stream key" is the App Instance.
Regards,

The file '************.avro' may not render correctly as it contains an unrecognized extension. Event Hub - Capture Container in Storage Account

I have an event hub which captures data in a container in a storage account.
I am sending messages from a java application.
When I open the message in the Event Hub capture container (in the storage account) and go to the .avro file blade, under the 'Edit' tab I see the file along with the message below:
The file '************.avro' may not render correctly as it contains an unrecognized extension.
The actual contents of the message show up in what looks like an encrypted format, and I am not able to read them.
Please help with how I can see the contents of the message.
I don't think the storage data explorer can parse Avro files without a proper schema provided for the body. Try opening the file with a tool such as AvroEditor. You can find the editor here: http://avroeditor.sourceforge.net/
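If you prefer a script over a GUI tool, you can also read the capture file in Python. A rough sketch, assuming the fastavro package, a file downloaded locally as capture.avro (placeholder name), and that your Java application sends UTF-8 text such as JSON as the event body:

from fastavro import reader

# Rough sketch: decode the event payloads inside a downloaded Capture .avro file.
with open("capture.avro", "rb") as f:
    for record in reader(f):
        body = record["Body"]        # raw bytes of the original event
        print(body.decode("utf-8"))  # readable only if the sender used a text encoding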
Data Explorer (preview) in ADLS Gen2 won't be able to show content for Parquet or Avro format files. If you wish to read the file content, create an external table in Azure Data Explorer, something like below:
.create external table ExTableavro (AppId:string, UserId:string, Email:string, TargetTitle:string, Params:string, EventEnqueuedUtcTime:datetime)
kind=blob
partition by
AppId,
bin(EventEnqueuedUtcTime, 1d)
dataformat=avro
(
h@'https://streamoutalds2.blob.core.windows.net/stream-api-raw-avro/logs/;secret Key'
)
with
(
folder = "ExternalTables"
)
Note that the dataformat is set to 'avro'.
Hope it Helps!

AWS IoT rule to S3

I have created a rule to send the incoming IoT messages to an S3 bucket.
The problem is that every time IoT receives a message, it is sent to S3 and stored in a new file with the same name, overwriting the previous one.
I want this S3 file to keep all the data from before and not be truncated each time a new message is stored.
How can I do that?
When you set up an IoT S3 rule action, you need to specify a bucket and a key. The key is what we might think of as a "path and file name". As the docs say, we can specify the key string by using a substitution template, which is just a fancy way of saying "build a path out of these pieces of information". When you are building your substitution template, you can reference fields inside the message as well as use a bunch of other functions.
Especially look at the functions topic() and timestamp(), as well as some of the string manipulation functions.
Let's say your topic names are something like things/thing-id-xyz/location and you just want to store each incoming JSON message in a "folder" for the thing-id it came in from. You might specify a key like:
${topic(2)}/${timestamp()}.json
it would evaluate to something like:
thing-id-xyz/1481825251155.json
where the timestamp part is the time the message came in. That will be different for each message, and then the messages would not overwrite each other.
You can also specify parts of the message itself. Let's imagine our incoming messages look something like this:
{
"time": "2022-01-13T10:04:03Z",
"latitude": 40.803274,
"longitude": -74.237926,
"note": "Great view!"
}
Let's say you want to use the nice ISO date value you have in your data instead of the timestamp of the file. You could reference the time field no problem, like:
${topic(2)}/${time}.json
Now the file would be written as the key:
thing-id-xyz/2022-01-13T10:04:03Z.json
You should be able to find some combination of values that works for your needs, and that most importantly, is UNIQUE for each message so they don't overwrite each other in S3.
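For reference, here is roughly how the first key above could be wired in when creating the rule programmatically. This is only a sketch using boto3 (the Python AWS SDK); the rule name, topic filter, bucket name, and role ARN are placeholders to replace with your own:

# Sketch: create an IoT topic rule whose S3 action key uses a substitution template.
import boto3

iot = boto3.client("iot")
iot.create_topic_rule(
    ruleName="store_location_messages",  # placeholder rule name
    topicRulePayload={
        "sql": "SELECT * FROM 'things/+/location'",  # placeholder topic filter
        "awsIotSqlVersion": "2016-03-23",
        "actions": [
            {
                "s3": {
                    "roleArn": "arn:aws:iam::123456789012:role/iot-s3-role",  # placeholder
                    "bucketName": "my-iot-bucket",                            # placeholder
                    # Unique key per message: thing id from the topic + arrival timestamp.
                    "key": "${topic(2)}/${timestamp()}.json",
                }
            }
        ],
    },
)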
You can do it using AWS IoT SQL variable expressions. For example, use the following as the key: ${newuuid()}. This will create a new S3 object for each message received.
See more about the SQL functions here: https://docs.aws.amazon.com/iot/latest/developerguide/iot-sql-functions.html
You can't do this with the S3 IoT Rule Action. You can get similar results using AWS Firehose, which will batch up several messages and write to one file. You will still end up with multiple files though.

AWS S3 storage and schema

I have an IoT sensor which sends the following message to an AWS IoT Core MQTT topic:
{"ID1":10001,"ID2":1001,"ID3":101,"ValueMax":123}
I have added an action/rule which stores the incoming messages in an S3 bucket with the timestamp as the key (each message is stored as a separate file/row in the bucket).
I have only worked with SQL databases before, so having them stored like this is new to me.
1) Is this the proper way to work with S3 storage?
2) How can I visualize the values in a schema instead of separate files?
3) I am trying to create an ML datasource from the S3 bucket, but get the error below when Amazon ML tries to create the schema:
"Amazon ML can't retrieve the schema. If you've just created this
datasource, wait a moment and try again."
Appreciate all advice there is!
1) Is this the proper way to work with S3 storage?
With only one sensor, using the [timestamp()](https://docs.aws.amazon.com/iot/latest/developerguide/iot-sql-functions.html#iot-function-timestamp) function in your IoT rule would be a way to name unique objects in S3, but there are issues that might come up.
With more than one sensor, you might have multiple messages arrive at the same timestamp and this would not generate unique object names in S3.
Timestamps from nearly the same time are going to have similar prefixes and designing your S3 keys this way may not give you the best performance at higher message rates.
Since you're using MQTT, you could use the traceid() function instead of the timestamp to avoid these two issues if they come up.
2) How can I visualize the values in a schema instead of separate files?
3) I am trying to create ML Datasource from the S3 Bucket, but get the error below when Amazon ML tries to create schema:
For the third question, I think you could be running into a data format problem in ML because your S3 objects contain the JSON data from your messages and not a CSV.
For the second question, I think you're trying to combine message data from successive messages into a CSV, or at least output the message data as a single line of a CSV file. I don't think this is possible with just the IoT SQL language, since it's intended to produce JSON.
One alternative is to configure your IoT rule with a Lambda action and use a Lambda function to do the JSON-to-CSV conversion and then write the CSV to your S3 bucket. If you go this direction, you may have to enrich your IoT message data with the timestamp (or trace id) when you call the Lambda.
A rule like select timestamp() as timestamp, traceid() as traceid, concat(ID1, ID2, ID3, ValueMax) as values, * as message would produce a JSON like
{"timestamp":1538606018066,"traceid":"abab6381-c369-4a08-931d-c08267d12947","values":[10001,1001,101,123],"message":{"ID1":10001,"ID2":1001,"ID3":101,"ValueMax":123}}
That would be straightforward to use as the source for a CSV row with the data from its values property.
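As a rough sketch of that Lambda approach (the bucket name and key layout are placeholders, and each invocation writes its own small CSV object, since S3 objects can't be appended to; batching rows into bigger files would need something like Kinesis Data Firehose in front):

# Sketch of a Lambda handler for the IoT rule's Lambda action.
import csv
import io

import boto3

s3 = boto3.client("s3")
BUCKET = "my-iot-csv-bucket"  # placeholder bucket name

def handler(event, context):
    # `event` is the rule's SQL output, shaped like the JSON example above.
    row = [event["timestamp"], event["traceid"]] + list(event["values"])
    buf = io.StringIO()
    csv.writer(buf).writerow(row)
    key = "sensor-data/{}.csv".format(event["traceid"])  # one object per message
    s3.put_object(Bucket=BUCKET, Key=key, Body=buf.getvalue())
    return {"written": key}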

How to read file in Apache Samza from local file system and hdfs system

I'm looking for an approach in Apache Samza to read a file from the local file system or HDFS,
then apply filters, aggregations, where conditions, order by, and group by over batches of data.
Please provide some help.
You should create a system for each source of data you want to use. For example, to read from a file, you should create a system with the FileReaderSystemFactory -- for HDFS, create a system with the HdfsSystemFactory. Then, you can use the regular process callback or windowing to process your data.
You can feed your Samza job using a standard Kafka producer. To make it easy for you, you can use Logstash; you need to create a Logstash script where you specify:
input as a local file or HDFS
filters (optional): here you can do basic filtering, aggregation, etc.
Kafka output with the specific topic you want to feed
I was using this approach to feed my Samza job from a local file.
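If you would rather not run Logstash, a plain Kafka producer script does the same job. A small sketch assuming the kafka-python package; the broker address, topic name, and file path are placeholders, and your Samza job would consume the same topic:

from kafka import KafkaProducer

# Stream a local file line by line into the Kafka topic the Samza job reads.
producer = KafkaProducer(bootstrap_servers="localhost:9092")  # placeholder broker

with open("/path/to/input.log", "rb") as f:
    for line in f:
        producer.send("samza-input-topic", value=line.rstrip(b"\n"))  # placeholder topic

producer.flush()  # make sure everything is delivered before exiting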
Another approach could be using Kafka Connect
http://docs.confluent.io/2.0.0/connect/