Azure Event Hub receiver giving "Encountered error while fetching the list of EventHub PartitionIds" error - azure-eventhub

I am trying to implement the receiver part as per the tutorial
https://azure.microsoft.com/en-us/documentation/articles/event-hubs-java-ephjava-getstarted/
Failure while registering: com.microsoft.azure.eventprocessorhost.EPHConfigurationException:
Encountered error while fetching the list of EventHub PartitionIds
I have 16 partitions in my event hub, but when I send data I don't specify any partition. How do I know which partition my data is sent to? Am I getting the above error because of all the partitions?

Make sure that your consumer SAS policy includes "Manage" and not only "Listen".
I guess it has to have Manage rights to be able to list the partitions.
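As a quick sanity check, you can try listing the partition IDs yourself with the same connection string. This sketch uses the Python azure-eventhub SDK rather than the Java client from the tutorial, and the connection string and hub name are placeholders:

from azure.eventhub import EventHubConsumerClient

# Placeholders: substitute your namespace connection string and hub name.
client = EventHubConsumerClient.from_connection_string(
    conn_str="<namespace-connection-string>",
    consumer_group="$Default",
    eventhub_name="<your-hub>",
)

with client:
    # If the SAS policy lacks the required rights, this raises an
    # authorization error instead of returning the 16 partition IDs.
    print(client.get_partition_ids())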

Ideally, EPH should work with Listen-only claims. Right now we have a bug in the EventProcessorHost client code as a result of which it needs Manage claims. We are working on it.
The error "Encountered error while fetching the list of EventHub PartitionIds" is generic and is thrown by PartitionManager while querying for partitions. You ran into one of the exceptions in the catch block below. Please include the inner exception for completeness (SEO) and faster resolution.
catch (XPathExpressionException | ParserConfigurationException | IOException
        | InvalidKeyException | NoSuchAlgorithmException | URISyntaxException
        | SAXException exception)
{
    throw new EPHConfigurationException(
            "Encountered error while fetching the list of EventHub PartitionIds", exception);
}
EDIT:
This issue is fixed in version 0.7.7.

I was working on integrating Event Hub with the ELK stack and came across the same error.
To solve this, I found that in the settings for the Event Hub Namespace containing my event hub, I had to allow access from the VNET my ELK stack was in.
This can be done by going to the following page: Your EventHubNamespace > Settings - Firewalls and virtual networks.
Either allow access from all networks or add your specific VNET, Subnet, or IP range (for specific machines).
This, in addition to setting the consumer SAS policy to Manage, should resolve the "Encountered error while fetching the list of EventHub PartitionIds" error.

Related

ReadEventsAsync gets EventHubsException(ConsumerDisconnected) intermittently

I am using the EventHubConsumerClient.ReadEventsAsync method to read events from an event hub. It works perfectly when I use the default event hub. However, when I route to a new event hub I get an EventHubsException(ConsumerDisconnected) from time to time. From the documentation, this happens due to: "A client was forcefully disconnected from an Event Hub instance. This typically occurs when another consumer with higher OwnerLevel asserts ownership over the partition and consumer group." I get this exception almost every time; only occasionally does it work. Does anyone know how to resolve this, or is there a better way to read messages from an event hub? I don't want to use EventProcessorClient, since it requires a BlobContainerClient.
For the code, I followed the sample:
await using var consumerClient = new EventHubConsumerClient(
    EventHubConsumerClient.DefaultConsumerGroupName,
    eventHubConnectionString,
    eventHubName);

await foreach (PartitionEvent partitionEvent in consumerClient.ReadEventsAsync(cancelToken))
{
    ...
}
The error that you're seeing is very specific to a single scenario: another client has opened an AMQP link to one of the partitions you're reading from and has requested that the Event Hubs service give it exclusive access. This results in the Event Hubs service terminating your link with an AMQP error code of Stolen, which the Event Hubs SDK translates into the form that you're seeing. (source)
These requests for exclusive access are enforced at the consumer group level. In your snippet, you're using the default consumer group, which is apparently also used by other consumers. As a best practice, I'd recommend creating a unique consumer group for each application that reads from the Event Hub, unless you specifically want them to interact.
In your case, your client is not requesting exclusive access, so any client that is will take precedence. If you were to create a new consumer group and use it to configure your client, I would expect your disconnect errors to stop.
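To illustrate the idea, here is a rough sketch in the Python SDK (not the C# SDK from the question); the connection string, hub name, and the "my-app" consumer group are placeholders, and the consumer group must be created on the Event Hub beforehand:

from azure.eventhub import EventHubConsumerClient

# "my-app" is a hypothetical consumer group created on the hub beforehand;
# reading through it instead of "$Default" keeps other applications'
# exclusive readers from stealing your link.
client = EventHubConsumerClient.from_connection_string(
    conn_str="<connection-string>",
    consumer_group="my-app",
    eventhub_name="<hub-name>",
)

def on_event(partition_context, event):
    print(partition_context.partition_id, event.body_as_str())

with client:
    client.receive(on_event=on_event, starting_position="-1")  # "-1" = from the start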

How can I filter out errors on sentry to avoid consuming my quota?

I'm using Sentry to log my errors, but there are errors I'm not able to fix (or that cannot be fixed by me), like
OSError (write error)
or errors that come from RQ (each time I deploy my app)
or client errors (which are client.errors).
I can't just ignore them, because they consume all my quota. How can I filter out these errors?
Here are some references for interested people.
uwsgi: OSError: write error during GET request
Fixing broken pipe error in uWSGI with Python
https://github.com/unbit/uwsgi/issues/1623
I created a Gist for rate limiting the number of events that are sent to Sentry:
https://gist.github.com/jurrian/e22f8e724b8499a29c5537e956f0dc7f
It uses ratelimitingfilter, which can be configured to set a rate per minute and, additionally, a burst so that rate limiting only kicks in after a number of events.
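The core of the approach looks roughly like this (a sketch rather than the Gist verbatim; it assumes sentry_sdk plus the ratelimitingfilter package, and the rate/burst numbers are arbitrary):

import logging

import sentry_sdk
from ratelimitingfilter import RateLimitingFilter
from sentry_sdk.integrations.logging import EventHandler, LoggingIntegration

# Disable the automatic logging integration so we can attach our own handler.
sentry_sdk.init(
    dsn="<your-dsn>",
    integrations=[LoggingIntegration(level=None, event_level=None)],
)

# Forward ERROR records to Sentry, but rate-limit them first: after a burst
# of 5 events, at most 1 event per 60 seconds gets through.
handler = EventHandler(level=logging.ERROR)
handler.addFilter(RateLimitingFilter(rate=1, per=60, burst=5))
logging.getLogger().addHandler(handler)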
I get the same errors, but I never had any problems with my quota. But if you really want to filter them, you can just do it in your SDK:
https://docs.sentry.io/error-reporting/configuration/filtering/?platform=python
But beware, this could hide other errors, as mentioned here:
https://github.com/pypa/warehouse/issues/679
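For example, SDK-side filtering can be done with a before_send hook; this is a minimal sketch, and the matching logic for the OSError from the question is an assumption you would adapt to your own noise:

import sentry_sdk

def before_send(event, hint):
    # Drop the noisy uWSGI write errors; adjust the match to your own case.
    if "exc_info" in hint:
        exc_type, exc_value, tb = hint["exc_info"]
        if isinstance(exc_value, OSError) and "write error" in str(exc_value):
            return None  # returning None discards the event
    return event  # everything else is sent as usual

sentry_sdk.init(dsn="<your-dsn>", before_send=before_send)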
To save yourself some quota, you have two options:
Avoid forwarding events client-side, thus preventing events from being sent to Sentry at all. Have a look at the docs for the available client-side filters. The drawback of this approach is that you need a new code deployment for any adjustment of client-side filters, and some clients may not instantly reflect your code changes.
Avoid forwarding events on Sentry's side, via inbound filters ([Project] > Project Settings > Inbound Filters). According to the Sentry documentation on quota usage, events filtered via inbound filters do not affect your quota.
Inbound filters include:
Common browser extension errors
Events coming from localhost
Known legacy browsers errors
Known web crawlers
By their error message
From specific release versions of your code
From certain IP addresses
Business plans and above also allow filtering events by error message.

AWS Glue job runs correct but returns a connection refused error

I am running a test job on AWS. I am reading CSV data from an S3 bucket, running a Glue ETL job on it, and storing the same data in Amazon Redshift. The Glue job just reads the data from S3 and stores it in Redshift without any modification. The job runs fine and I get the desired result in Redshift, but it returns an error I am unable to understand.
Here is the error log:
18/11/14 09:17:31 WARN YarnClient: The GET request failed for the URL http://169.254.76.1:8088/ws/v1/cluster/apps/application_1542186720539_0001
com.amazon.ws.emr.hadoop.fs.shaded.org.apache.http.conn.HttpHostConnectException: Connect to 169.254.76.1:8088 [/169.254.76.1] failed: Connection refused (Connection refused)
It is a WARN rather than an error, but I want to understand what is causing it. I tried to track down the IP indicated in the WARN, but I am not able to find a machine with that IP.
I noticed these errors coming up in my AWS Glue job too, and found something helpful from AWS:
"This WARN message is not so special, and does not mean job failure or any errors directly. I guess there should be some other cause. I would recommend you to enable continuous logging, and check both driver/executor logs to see if there is any suspicious behavior. If you enable job bookmark, please try disabling it and see how it goes without bookmark."
https://forums.aws.amazon.com/thread.jspa?messageID=927547
I had disabled bookmarks from the beginning. What I found was that my Glue job got a memory exception while writing data to S3, so I repartitioned the data.
# .write is a DataFrame API, so convert the DynamicFrame with toDF() first
MyDynamicFrame.toDF().coalesce(100).write.partitionBy("month").mode("overwrite").parquet("s3://" + bucket + "/" + path + "/out_data")
So if you have write operations, I recommend checking how you are writing to S3.
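Put together, the idea looks roughly like this (a sketch assuming a PySpark Glue job; the catalog database/table, bucket, and partition column are hypothetical):

import sys

from awsglue.context import GlueContext
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())

# Hypothetical catalog entries; substitute your own source.
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="my_database",
    table_name="my_table",
)

# Coalesce to fewer partitions so the job writes fewer, larger files,
# easing the memory pressure seen when writing to S3.
(dyf.toDF()
    .coalesce(100)
    .write.partitionBy("month")
    .mode("overwrite")
    .parquet("s3://my-bucket/out_data"))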

How to find out details about metrics "Internal Server Errors" and "Other Errors" for Azure EventHub

In the new Azure portal, under the "Metrics" section for an event hub, I see that there were many "Internal Server Error" events on a specific day. Is it possible to find out more about what could have caused them and to get a description of those errors?
The Event Hubs metrics documentation says the following about InternalServerErrors and OtherErrors:
InternalServerErrors: Total number of internal server error exceptions sent back to the sender or receiver while performing run-time operations. This type of error is due to either service-side or network problems.
OtherErrors: These types of errors are due to faults at the sender or receiver side, such as providing bad parameters, not enough credentials, or trying to perform an operation on a nonexistent entity.
I would recommend you log into the Azure portal, choose your event hub, click "MONITORING > Diagnostics logs", then turn on diagnostics to collect logs. For more details, you can refer to Event Hubs diagnostic logs.

Kafka mirroring Failed to find leader for Set (ArrayIndexOutOfBoundsException: 11)

I'm trying to mirror Kafka data between two AWS Kafka/Zookeeper clusters, both running version 0.8.2.1.
I can access the source cluster's Zookeepers from the target cluster's Kafka instances, list topics, etc. However, when I try to run this command:
/opt/kafka/bin/kafka-run-class.sh kafka.tools.MirrorMaker \
    --consumer.config /opt/kafka/config/mirror-consumer.properties \
    --num.streams 1 \
    --producer.config /opt/kafka/config/mirror-producer.properties \
    --whitelist=".*"
I get the following error:
WARN Fetching topic metadata with correlation id 0 for topics [...] from broker [...] failed (kafka.client.ClientUtils$)
java.lang.ArrayIndexOutOfBoundsException: 11
at kafka.api.TopicMetadata$$anonfun$readFrom$1.apply$mcVI$sp(TopicMetadata.scala:38)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:78)
at kafka.api.TopicMetadata$.readFrom(TopicMetadata.scala:36)
What would be the best way to debug this error? I've read several posts online but they indicate a whole array of causes, from network connectivity to disk space issues.
I would appreciate your help in this matter.
Apparently there is a critical bug in Kafka version 0.8.2.1 that has not been fixed since 2015:
https://issues.apache.org/jira/browse/KAFKA-2082
"failed due to Leader not local for partition"
It looks like there is no way to deal with this short of upgrading Kafka to a newer version. Based on other posts I found online, Kafka mirroring does not work between different versions of Kafka either, so that is another thing to consider.