Event Hub - Capture Container in Storage Account - azure-eventhub

I have an event hub which captures data in a container in a storage account.
I am sending messages from a java application.
When I open the message in the Event Hub capture container (in the storage account) and go to the .avro file blade, under the 'Edit' tab I see the file along with the following message:
The file '************.avro' may not render correctly as it contains an unrecognized extension.
The actual contents of the message appear garbled (as if encrypted) and I am not able to read them.
How can I view the contents of the message?

I don't think Storage Data Explorer can parse Avro files without a proper schema provided for the body. Try opening the file with a tool such as AvroEditor. You can find the editor here - http://avroeditor.sourceforge.net/
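If you prefer to inspect the capture file programmatically instead, a minimal sketch using the Python avro package might look like this (the local file name is a placeholder; Event Hubs Capture stores each event's payload in the Body field of the Avro record):
# Minimal sketch: print the Body of each event in an Event Hubs Capture .avro file.
# Requires "pip install avro"; "capture.avro" is a placeholder file name.
from avro.datafile import DataFileReader
from avro.io import DatumReader

with open("capture.avro", "rb") as f:
    reader = DataFileReader(f, DatumReader())
    for record in reader:
        # The payload is stored as bytes in the "Body" field, next to metadata
        # such as EnqueuedTimeUtc, Offset and SequenceNumber.
        print(record["Body"].decode("utf-8", errors="replace"))
    reader.close()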

Data Explorer (preview) in ADLS Gen2 won't be able to show the content of Parquet or Avro format files. If you wish to read the file content, create an external table in Azure Data Explorer, something like below:
.create external table ExTableavro (AppId:string, UserId:string, Email:string, TargetTitle:string, Params:string, EventEnqueuedUtcTime:datetime)
kind=blob
partition by
    AppId,
    bin(EventEnqueuedUtcTime, 1d)
dataformat=avro
(
    h@'https://streamoutalds2.blob.core.windows.net/stream-api-raw-avro/logs/;secret Key'
)
with
(
    folder = "ExternalTables"
)
Note that dataformat is set to avro.
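Once the external table exists, you can also query it outside the portal. As a rough sketch (the cluster URL and database name are placeholders, and device-code authentication is just one option), the Python azure-kusto-data client could be used like this:
# Rough sketch: read the external table with the azure-kusto-data client.
# "<cluster>" and "<database>" are placeholders, not values from the answer.
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

kcsb = KustoConnectionStringBuilder.with_aad_device_authentication(
    "https://<cluster>.kusto.windows.net")
client = KustoClient(kcsb)

# external_table() reads straight from the blobs behind the external table definition.
response = client.execute("<database>", "external_table('ExTableavro') | take 10")
for row in response.primary_results[0]:
    print(row)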
Hope it Helps!

Related

Azure Data Factory HDFS dataset preview error

I'm trying to connect to HDFS from ADF. I created a folder, put a sample file (ORC format) in it, and then in ADF successfully created a linked service for HDFS using my Windows credentials (the same user that created the sample file).
But when trying to browse the data through a dataset, I'm getting the error: "The response content from the data store is not expected, and cannot be parsed."
Am I doing something wrong, or is it some kind of permissions issue? Please advise.
This appears to be a generic issue: point the dataset to a file with an appropriate extension rather than to the folder itself, and make sure you are using a supported data store and activity.
You can follow the official MS doc on using an HDFS server with Azure Data Factory.

Test data to requests for Postman monitor

I run my collection using test data from a CSV file. However, there is no option to upload the test data file when adding a monitor for the collection. Searching the internet, I saw that the test data file has to be provided via a URL (saved in the cloud, e.g. Google Drive), but I couldn't find how to provide this URL to the collection. Can anyone please help?
https://www.postman.com/praveendvd-public/workspace/postman-tricks-and-tips/request/8296678-d06b3fc0-6b8b-4370-9847-aee0f526e7db
You cannot use a CSV file in a monitor, but you can store the content of the CSV as a collection variable and use that to drive the monitor (for example, by parsing the variable in a pre-request or test script). An example can be seen in the public workspace linked above.

AWS Appflow <-> Salesforce integration

I'm trying to set up a workflow to back up Account & Contact objects from Salesforce to S3 via AWS AppFlow. I am able to set up the connection and back up the files on demand.
However, for restoration I would like to import the field mapping using a .csv file; below are the first 3 sample lines (comma-separated source and destination fields):
Name, Name
Type, Account Type
AccountNumber, Account Number
But AppFlow is unable to import it, failing with "Couldn't parse rows from the file". Am I missing something?
This was a bug on the AWS side and it has been taken up! The workaround is to do the mapping manually instead of importing an external CSV; make sure the source field attributes match the corresponding objects in Salesforce.

Is it possible to configure Azure Event Hubs Capture file names by PartitionKey rather than PartitionId?

When configuring an Azure Event Hub instance for Event Hubs Capture, the following example file name formats are provided, all of which use the PartitionId variable in some form.
{Namespace}/{EventHub}/{PartitionId}/{Year}/{Month}/{Day}/{Hour}/{Minute}/{Second}
{Year}/{Month}/{Day}/{Namespace}/{EventHub}/{PartitionId}/{Hour}/{Minute}/{Second}
{Year}/{Month}/{Day}/{Hour}/{Namespace}/{EventHub}/{PartitionId}/{Minute}/{Second}
Is it possible to include the PartitionKey in the file name path instead of (or as well as) the PartitionId?
No, PartitionKey is not available as a variable for building the Avro capture file path.

Why Transfer in GCP failed on csv file and where is the error log?

I am testing out the transfer function in GCP with this open-data CSV: https://www.stats.govt.nz/assets/Uploads/Annual-enterprise-survey/Annual-enterprise-survey-2018-financial-year-provisional/Download-data/annual-enterprise-survey-2018-financial-year-provisional-csv.csv
I configured the transfer in GCP to use that CSV's URL as the source, and the transfer failed.
Question 1: why did the transfer fail?
Question 2: where is the error log?
Thank you very much.
[Update]: I checked the log history; nothing was captured.
[Update 2]: Error details:
Details: First line in URL list must be TsvHttpData-1.0 but it is: Year,Industry_aggregation_NZSIOC,Industry_code_NZSIOC,Industry_name_NZSIOC,Units,Variable_code,Variable_name,Variable_category,Value,Industry_code_ANZSIC06
I noticed that in the Transfer Service, if you choose the third option for the source, it reads the URL of a TSV file. Essentially TSV and PSV are just variants of CSV, and I have no problem retrieving the source CSV file directly. The error details seem to be pointing at something unexpected there.
The problem is that in your example you are pointing to a data file as the source of the transfer. If we read the documentation on GCS Transfer, we find that we must specify a URL-list file which identifies the source URLs we want to copy.
The format of this file is tab-separated values (TSV), and each entry contains a number of parameters including:
The URL of the source of the file.
The size in bytes of the source file.
An MD5 hash of the content of the source file.
What you specified (just the URL of the source file) ... is not what is required.
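Purely as an illustration (not part of the original answer), a small Python sketch for producing such a URL list from the CSV in the question could look like this; it assumes the source URL is publicly downloadable, and the output file name url_list.tsv is my own placeholder:
# Sketch: build a TsvHttpData-1.0 URL list for the Storage Transfer Service.
# "url_list.tsv" is a placeholder output file name.
import base64
import hashlib

import requests

SOURCE_URL = ("https://www.stats.govt.nz/assets/Uploads/Annual-enterprise-survey/"
              "Annual-enterprise-survey-2018-financial-year-provisional/Download-data/"
              "annual-enterprise-survey-2018-financial-year-provisional-csv.csv")

content = requests.get(SOURCE_URL).content
size_bytes = len(content)                                            # size of the source file in bytes
md5_b64 = base64.b64encode(hashlib.md5(content).digest()).decode()   # Base64-encoded MD5 of the content

# The first line must be the literal header "TsvHttpData-1.0";
# each following line is: URL <TAB> size-in-bytes <TAB> Base64(MD5).
with open("url_list.tsv", "w") as f:
    f.write("TsvHttpData-1.0\n")
    f.write(f"{SOURCE_URL}\t{size_bytes}\t{md5_b64}\n")
The resulting url_list.tsv would itself need to be hosted at a publicly accessible URL, and that URL is what the transfer job's source should point to.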
One possible solution would be to use gsutil. It has an option of taking a stream as input and writing that stream to a given object. For example:
curl http://[URL]/[PATH] | gsutil cp - gs://[BUCKET]/[OBJECT]
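If you would rather do the same copy in code than with gsutil, a rough Python equivalent using the google-cloud-storage client could be the following (the bucket and object names are placeholders, and credentials are assumed to come from the environment):
# Rough sketch: download the CSV over HTTP and upload it to a GCS bucket,
# mirroring the curl | gsutil cp - pipeline above.
# BUCKET and OBJECT are placeholder names.
import requests
from google.cloud import storage

SOURCE_URL = ("https://www.stats.govt.nz/assets/Uploads/Annual-enterprise-survey/"
              "Annual-enterprise-survey-2018-financial-year-provisional/Download-data/"
              "annual-enterprise-survey-2018-financial-year-provisional-csv.csv")
BUCKET = "my-bucket"
OBJECT = "annual-enterprise-survey-2018.csv"

data = requests.get(SOURCE_URL).content

client = storage.Client()
blob = client.bucket(BUCKET).blob(OBJECT)
blob.upload_from_string(data, content_type="text/csv")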
References:
Creating a URL list
Can I upload files to google cloud storage from url?