How do I catch a TensorBoard exception inside model.fit() that breaks training?

We are running a custom training job on Vertex AI that runs over multiple days.
We recently added TensorBoard support using callbacks:
myCallbacks.append(tf.keras.callbacks.TensorBoard(log_dir=os.environ['AIP_TENSORBOARD_LOG_DIR'], histogram_freq=1))
After some epochs we get an exception inside model.fit() due to some temporary network problems:
2022-12-18 18:24:32.612516: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at summary_kernels.cc:150 : FAILED_PRECONDITION: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Could not resolve host: www.googleapis.com
when resuming upload gs://dx-ai-train/WorkingDir_3D_dinbv217/aprl18/config_tb/logs/train/events.out.tfevents.1671295725.66e23cc7db45.1.0.v2
Failed to flush 11 events to gs://dx-ai-train/WorkingDir_3D_dinbv217/aprl18/config_tb/logs/train/events.out.tfevents.1671295725.66e23cc7db45.1.0.v2
Could not flush events file.
I can't catch the exception in a useful way, because a try/except around model.fit() only takes effect after training has already been aborted.
Would it work to add my own callback that relays to the Keras TensorBoard callback inside a try/except block?
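One way to try the idea from the question is sketched below. Instead of writing a second callback class, it patches the on_* hooks of an existing callback instance so that each call runs inside a try/except. The helper name make_fault_tolerant is hypothetical and not part of Keras. The sketch also assumes that the flush failure reaches Python as an exception raised from one of the callback hooks, which the log excerpt suggests but does not prove.

```python
import functools
import logging

logger = logging.getLogger(__name__)


def _guard(hook, name, exceptions):
    """Return a version of `hook` that logs the listed exceptions instead of raising."""
    @functools.wraps(hook)
    def guarded(*args, **kwargs):
        try:
            return hook(*args, **kwargs)
        except exceptions as exc:
            logger.warning("Ignoring %s raised in %s: %s",
                           type(exc).__name__, name, exc)
            return None
    return guarded


def make_fault_tolerant(callback, exceptions=(Exception,)):
    """Patch every on_* hook of `callback` in place so that failures are non-fatal.

    Hypothetical helper, not part of Keras. It works on any object whose hooks
    follow the Keras naming scheme (on_epoch_end, on_train_batch_end, ...).
    Methods that do not start with "on_" are left untouched.
    """
    for name in dir(callback):
        if name.startswith("on_"):
            hook = getattr(callback, name)
            if callable(hook):
                setattr(callback, name, _guard(hook, name, exceptions))
    return callback
```

With the setup from the question, the TensorBoard callback would be wrapped before it is appended, for example myCallbacks.append(make_fault_tolerant(tf.keras.callbacks.TensorBoard(...))). Events that fail to flush are lost, so this trades complete logs for an uninterrupted training run. Passing a narrower tuple such as exceptions=(tf.errors.OpError,) may be preferable to swallowing every Exception.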

Related

Django's infinite streaming response logs 500 in apache logs

I have a Django+Apache server with a view that returns an infinite streaming response:
def my_view(request):
    try:
        return StreamingHttpResponse(map(
            lambda x: f"{dumps(x)}\n",
            data_stream(...)  # yields dicts forever every couple of seconds
        ))
    except Exception as e:
        print_exc()
        return HttpResponse(dumps({
            "success": False,
            "reason": ERROR_WITH_CLASSNAME.format(e.__class__.__name__)
        }), status=500, content_type="application/json")
When client closes the connection to the server, there is no cleanup to be done. data_stream will yield one more message which won't get delivered. No harm done if that message is yielded and not received as there are no side-effects. Overhead from processing that extra message is negligible on our end.
However, after that last message fails to deliver, Apache logs a 500 response code (for 100% of requests). The except block is not catching it, because print_exc never gets called (there are no entries in the error log), so I'm guessing this is Apache failing to deliver the response from Django and switching to 500 itself.
These 500 errors are triggering false positive alerts in our monitoring system and it's difficult to differentiate an error due to connection exit vs an error in the data_stream logic.
Can I override this behavior to log a different status code in the case of a client disconnect?

From what I understand, any exceptions raised inside a StreamingHttpResponse are not propagated further. This has to do with how a WSGI server works: if you start handling an exception and take over control, the server will not be able to finish the HTTP response. So the error is handled by the server and printed in the terminal. If you attach a debugger and trace how the exception is handled, you will find a line in wsgiref/handlers.py where your exception is absorbed and dealt with.
I think it is in this file: https://github.com/python/cpython/blob/main/Lib/wsgiref/handlers.py
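A related point is that the try/except in the view only covers the construction of the response object. The stream itself is iterated later, after the view has returned, so errors raised during iteration never reach that except block. A minimal sketch of handling them inside the iterator instead is shown below. The names guarded_stream and on_error are hypothetical, and the sketch uses json.dumps where the question uses dumps.

```python
import json
import logging

logger = logging.getLogger(__name__)


def guarded_stream(items, on_error=None):
    """Yield newline-delimited JSON, handling errors raised while iterating.

    Hypothetical helper. Errors in `items` surface here, during iteration,
    not in the view function that built the response.
    """
    try:
        for item in items:
            yield f"{json.dumps(item)}\n"
    except GeneratorExit:
        # The consumer called close() on this generator before it was exhausted.
        logger.info("stream closed by consumer")
        raise
    except Exception as exc:
        logger.exception("error while producing stream data")
        if on_error is not None:
            # Emit one final message describing the failure, then stop.
            yield f"{json.dumps(on_error(exc))}\n"
```

In the view this would be used as StreamingHttpResponse(guarded_stream(data_stream(...))). It does not by itself change the status code that Apache logs. It only makes it possible to tell failures in data_stream apart from a closed connection in the application log.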

Multipart file upload failing some times on docker container on AWS

I have a hapi server running in a Docker container on AWS, and it exposes a file upload API. This API runs smoothly on my local machine, but when deployed to AWS it sometimes fails with the error "Incomplete multipart payload". The error is intermittent.
The images I am uploading are small (less than 100 KB), and the failure is not caused by a slow network, as I have tested on multiple networks.
After debugging the hapi modules for payload parsing, I found that the Pez module, which parses the payload, is throwing this error. I also noticed that when the error happens, Pez's onClose event is called and none of the parse events occur, hence the "Incomplete multipart payload" error. The Pez state is "preamble" when this happens; for a successful parse, the state is "epilogue".
My hapi route config is
config: {
    payload: {
        maxBytes: 20971520,
        output: 'data',
        parse: true,
        allow: 'multipart/form-data'
    }
}
Can somebody suggest why the parsing fails at times, or why the onClose event is called before any parsing happens?

How to process the message on failure inside IEventProcessor.ProcessEvents method

The application has an implementation of IEventProcessor. When an unhandled exception is thrown from the ProcessEventsAsync method the EventProcessorHost never re-sends those messages to the running instance of IEventProcessor. (It will re-send if the hosting application is stopped and restarted or if the lease is lost and re-obtained.)
When an exception occurs in ProcessEventsAsync, the checkpoint is not set; the checkpoint is set with context.CheckpointAsync() only if processing succeeds.

Check out the ProcessorErrorAsync method. According to the docs, it is called in the event of an error. You'll have access to the context, where you can log the id and the error.

Worker role using event hubs gives 'No connection handler was found for virtual host'

I have a worker role that uses an EventProcessorHost to ingest data from an EventHub. I frequently receive error messages of the following kind:
Microsoft.ServiceBus.Messaging.MessagingCommunicationException:
No connection handler was found for virtual host 'myservicebusnamespace.servicebus.windows.net:42777'. Remote container id is 'f37c72ee313c4d658588ad9855773e51'. TrackingId:1d200122575745cc89bb714ffd533b6d_B5_B5, SystemTracker:SharedConnectionListener, Timestamp:8/29/2016 6:13:45 AM
at Microsoft.ServiceBus.Common.ExceptionDispatcher.Throw(Exception exception)
at Microsoft.ServiceBus.Common.Parallel.TaskHelpers.EndAsyncResult(IAsyncResult asyncResult)
at Microsoft.ServiceBus.Messaging.IteratorAsyncResult`1.StepCallback(IAsyncResult result)
I can't seem to find a way to catch this exception. It seems I can just ignore the error because everything works as expected (I had previously mentioned here that it was dropping messages because of this error, but I have since found out that a bug in the software that sends the messages caused this problem), however I would like to know what causes these errors, since they are clogging up my logging now and then.
Can anyone shed some light on the cause?

The Event Hub partitions are distributed across multiple servers. They sometimes move because of load balancing, upgrades, and other reasons. When this happens, the client connection is lost with this error. The connection is reestablished very quickly, so you should not see any issues with message processing. It is safe to ignore this communication error.

SSIS 2012 - Web Service Task - Output Exceptions

I've got a Web Service Task in my Control Flow.
The actual Web Service type is void. For testing purposes, the Web Service just throws a new Exception, with a message.
I've got the task outputting to a variable User::ServiceResponse
The Failure path goes to a Send Mail Task, which uses the variable User::ServiceResponse... however, the email received does not contain any text.
When the package executes, the Immediate Window does show a long error, which in part contains my exception message. "Adams Error Message" towards the very end.
SSIS package "....\WebProfile.dtsx" starting.
Error: 0xC002F304 at SVC, Web Service Task: An error occurred with the following error message:
"Microsoft.SqlServer.Dts.Tasks.WebServiceTask.WebserviceTaskException:
The Web Service threw an error during method execution.
The error is: System.Web.Services.Protocols.SoapException:
Server was unable to process request. --->
System.Exception: Adams Error Message
--- End of inner exception stack trace ---.
Without writing a lot of custom script... how can I get the exceptions from my Web Service Task into the Send Mail Task?
