ExportDicomData request of Google Cloud Healthcare API on GitHub tutorials never finishes - google-cloud-platform

I'm trying the AutoML Vision ML Codelab from the Cloud Healthcare API GitHub tutorials.
https://github.com/GoogleCloudPlatform/healthcare/blob/master/imaging/ml_codelab/breast_density_auto_ml.ipynb
I ran the Export DICOM data cell in the Convert DICOM to JPEG section, and the request, as well as all the preceding cells, succeeded.
But waiting for operation completion times out and the operation never finishes.
(The ExportDicomData request status on the Dataset page stays "Running" for over a day. I tried many times, but all the requests stayed stuck in "Running". A few times I started over from scratch, and the results were the same.)
What I have done so far:
1) Removed "output_config" from the request body, since it causes an INVALID_ARGUMENT error.
https://github.com/GoogleCloudPlatform/healthcare/issues/133
2) Enabled the Cloud Resource Manager API, since it is required.
This is the cell code.
# Path to export DICOM data.
dicom_store_url = os.path.join(HEALTHCARE_API_URL, 'projects', project_id,
                               'locations', location, 'datasets', dataset_id,
                               'dicomStores', dicom_store_id)
path = dicom_store_url + ":export"
# Headers (send request in JSON format).
headers = {'Content-Type': 'application/json'}
# Body (encoded in JSON format).
# output_config = {'output_config': {'gcs_destination': {'uri_prefix': jpeg_folder, 'mime_type': 'image/jpeg; transfer-syntax=1.2.840.10008.1.2.4.50'}}}
output_config = {'gcs_destination': {'uri_prefix': jpeg_folder, 'mime_type': 'image/jpeg; transfer-syntax=1.2.840.10008.1.2.4.50'}}
body = json.dumps(output_config)
resp, content = http.request(path, method='POST', headers=headers, body=body)
assert resp.status == 200, 'error exporting to JPEG, code: {0}, response: {1}'.format(resp.status, content)
print('Full response:\n{0}'.format(content))
# Record operation_name so we can poll for it later.
response = json.loads(content)
operation_name = response['name']
This is the result of waiting.
Waiting for operation completion...
Full response:
{
  "name": "projects/my-datalab-tutorials/locations/us-central1/datasets/sample-dataset/operations/18300485449992372225",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.healthcare.v1beta1.OperationMetadata",
    "apiMethodName": "google.cloud.healthcare.v1beta1.dicom.DicomService.ExportDicomData",
    "createTime": "2019-08-18T10:37:49.809136Z"
  }
}
AssertionErrorTraceback (most recent call last)
<ipython-input-18-1a57fd38ea96> in <module>()
21 timeout = time.time() + 10*60 # Wait up to 10 minutes.
22 path = os.path.join(HEALTHCARE_API_URL, operation_name)
---> 23 _ = wait_for_operation_completion(path, timeout)
<ipython-input-18-1a57fd38ea96> in wait_for_operation_completion(path, timeout)
15
16 print('Full response:\n{0}'.format(content))
---> 17 assert success, "operation did not complete successfully in time limit"
18 print('Success!')
19 return response
AssertionError: operation did not complete successfully in time limit
The API version is v1beta1.
I was wondering if somebody has any suggestions.
Thank you.

After several more attempts, and after leaving one request running overnight, it finally succeeded. I don't know why.

There was a recent update to the codelabs. The error message is due to the timeout in the codelab and not the actual operation. This has been addressed in the update. Please let me know if you are still running into any issues!
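For anyone who hits the timeout before picking up the update, here is a minimal sketch of polling the long-running operation yourself with a longer deadline. It reuses the http object, HEALTHCARE_API_URL and operation_name from the cells above; the 60-minute deadline and 30-second poll interval are arbitrary choices, not values from the codelab.
import json
import os
import time

# Poll the operation until it reports done, or until the (longer) deadline passes.
operation_url = os.path.join(HEALTHCARE_API_URL, operation_name)
deadline = time.time() + 60*60  # Wait up to 60 minutes instead of 10.
while time.time() < deadline:
    resp, content = http.request(operation_url, method='GET')
    assert resp.status == 200, 'error polling operation: {0}'.format(resp.status)
    operation = json.loads(content)
    if operation.get('done'):
        # An operation that finished with an error carries an "error" field.
        assert 'error' not in operation, operation['error']
        print('Export finished: {0}'.format(operation))
        break
    time.sleep(30)  # Poll every 30 seconds.
else:
    print('Operation still running after 60 minutes.')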

Related

ValueError('Cannot invoke RPC: Channel closed!') when multiple processes are launched together

I get this error when I launch, from zero, more than 4 processes at the same time:
{
  "insertId": "61a4a4920009771002b74809",
  "jsonPayload": {
    "asctime": "2021-11-29 09:59:46,620",
    "message": "Exception in callback <bound method ResumableBidiRpc._on_call_done of <google.api_core.bidi.ResumableBidiRpc object at 0x3eb1636b2cd0>>: ValueError('Cannot invoke RPC: Channel closed!')",
    "funcName": "handle_event",
    "lineno": 183,
    "filename": "_channel.py"
  }
}
This is the pub-sub schema (the pub-sub-schema diagram is not reproduced here).
The error seems to happen at step 9 or 10.
The actual code is:
future = publisher.publish(
    topic_path,
    encoded_message,
    msg_attribute=message_key
)
future.add_done_callback(
    callback=lambda f:
        logging.info(...)
)

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(
    PROJECT_ID,
    "..."
)
streaming_pull_future = subscriber.subscribe(
    subscription_path,
    callback=aggregator_callback_handler.handle_message
)
aggregator_callback_handler.callback = streaming_pull_future

wait_result(
    timeout=300,
    pratica=...,
    f_check_res_condition=lambda: aggregator_callback_handler.response is not None
)

streaming_pull_future.cancel()
subscriber.close()
The module aggregator_callback_handler handles .nack and .ack.
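(For context, a minimal hypothetical sketch of what that handler does; this is an illustration, not the actual module:)
class AggregatorCallbackHandler(object):
    """Hypothetical sketch: acks messages it can process, nacks the rest."""

    def __init__(self):
        self.response = None
        self.callback = None  # The streaming pull future is attached here.

    def handle_message(self, message):
        try:
            self.response = message.data  # Keep the payload for wait_result().
            message.ack()
        except Exception:
            message.nack()  # Let Pub/Sub redeliver on failure.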
The error is returned for a few seconds, then the VMs on which the services are hosted scale up and the error stops. The same happens if, instead of launching the processes all together, I scale them manually, launching them one by one with some sleep in between.
I've already checked the timeouts and moved the subscriber outside of the context manager, but those solutions don't work.
Any idea on how to handle this?

How to extract Google PubSub publish time in Apache Beam

My goal is to be able to access the Pub/Sub message publish time, as recorded and set by Google Pub/Sub, in Apache Beam (Dataflow).
PCollection<PubsubMessage> pubsubMsg =
    pipeline.apply("Read Messages From PubSub",
        PubsubIO.readMessagesWithAttributes()
            .fromSubscription(pocOptions.getInputSubscription()));
It does not seem to be present as a message attribute.
I have tried
.withTimestampAttribute("publish_time")
No luck either. What am I missing? Is it possible to extract the Google Pub/Sub publish time in Dataflow?
Java version:
PubsubIO will read the message from Pub/Sub and assign the message publish time to the element as the record timestamp. Therefore, you can access it using ProcessContext.timestamp(). As an example:
p
    .apply("Read Messages", PubsubIO.readStrings().fromSubscription(subscription))
    .apply("Log Publish Time", ParDo.of(new DoFn<String, Void>() {
        @ProcessElement
        public void processElement(ProcessContext c) throws Exception {
            LOG.info("Message: " + c.element());
            LOG.info("Publish time: " + c.timestamp().toString());
            Date date = new Date();
            Long time = date.getTime();
            LOG.info("Processing time: " + new Instant(time).toString());
        }
    }));
I published a message a little bit ahead of time (to have a significant difference between event time and processing time) and the output with the DirectRunner was:
Mar 27, 2019 11:03:08 AM com.dataflow.samples.LogPublishTime$1 processElement
INFO: Message: I published this message a little bit before
Mar 27, 2019 11:03:08 AM com.dataflow.samples.LogPublishTime$1 processElement
INFO: Publish time: 2019-03-27T09:57:07.005Z
Mar 27, 2019 11:03:08 AM com.dataflow.samples.LogPublishTime$1 processElement
INFO: Processing time: 2019-03-27T10:03:08.229Z
Minimal code here
Python version:
Now the timestamp can be accessed through DoFn.TimestampParam of the process method (docs):
class GetTimestampFn(beam.DoFn):
    """Prints element timestamp"""
    def process(self, element, timestamp=beam.DoFn.TimestampParam):
        timestamp_utc = datetime.datetime.utcfromtimestamp(float(timestamp))
        logging.info(">>> Element timestamp: %s", timestamp_utc.strftime("%Y-%m-%d %H:%M:%S"))
        yield element
Note: date parsing thanks to this answer.
Output:
INFO:root:>>> Element timestamp: 2019-08-12 20:16:53
Full code
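A minimal sketch of wiring that DoFn into a streaming pipeline follows; the subscription path is a placeholder, and reading from Pub/Sub requires streaming mode in the pipeline options.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions

# Reuses the GetTimestampFn defined above; the subscription path is a placeholder.
options = PipelineOptions()
options.view_as(StandardOptions).streaming = True  # Pub/Sub sources need streaming mode.

with beam.Pipeline(options=options) as p:
    (p
     | "Read from Pub/Sub" >> beam.io.ReadFromPubSub(
           subscription="projects/my-project/subscriptions/my-subscription")
     | "Log element timestamps" >> beam.ParDo(GetTimestampFn()))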

Is it possible to avoid the 60-second limit of urllib2.urlopen on GAE?

I'm requesting a file of around 14 MB from a slow server with urllib2.urlopen. It takes more than 60 seconds to get the data, and I'm getting the error:
Deadline exceeded while waiting for HTTP response from URL:
http://bigfile.zip?type=CSV
Here is my code:
class CronChargeBT(webapp2.RequestHandler):
    def get(self):
        taskqueue.add(queue_name='optimized-queue', url='/cronChargeBTB')

class CronChargeBTB(webapp2.RequestHandler):
    def post(self):
        url = "http://bigfile.zip?type=CSV"
        url_request = urllib2.Request(url)
        url_request.add_header('Accept-encoding', 'gzip')
        urlfetch.set_default_fetch_deadline(300)
        response = urllib2.urlopen(url_request, timeout=300)
        buf = StringIO(response.read())
        f = gzip.GzipFile(fileobj=buf)
        # ...work with the data inside the file...
I created a cron task that calls CronChargeBT. Here is the cron.yaml:
- description: cargar BlueTomato
  url: /cronChargeBT
  target: charge
  schedule: every wed,sun 01:00
It creates a new task and inserts it into a queue. Here is the queue configuration:
- name: optimized-queue
  rate: 40/s
  bucket_size: 60
  max_concurrent_requests: 10
  retry_parameters:
    task_retry_limit: 1
    min_backoff_seconds: 10
    max_backoff_seconds: 200
Of course the timeout=300 isn't working, because of the 60-second limit in GAE, but I thought I could avoid it by using a task... Does anyone know how I can get the data in the file while avoiding this timeout?
Thanks a lot!
Cron jobs are limited to a 10-minute deadline, not 60 seconds. If your download fails, perhaps just retry? Does the download work if you download it from your computer? There's nothing you can do on GAE if the server you are downloading from is too slow or unstable.
Edit: According to https://cloud.google.com/appengine/docs/java/outbound-requests#request_timeouts, there is a maximum deadline of 60 seconds for cron job requests. Therefore, you can't get around it.
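If the server supports HTTP Range requests, one workaround sometimes used is to download the file in chunks so that no single fetch hits the 60-second deadline. A rough sketch, assuming the server honors the Range header and skipping the gzip Accept-Encoding handling from the original code; the chunk size is an arbitrary choice:
import urllib2
from StringIO import StringIO

def fetch_in_chunks(url, chunk_size=4 * 1024 * 1024):
    """Download url piece by piece so no single fetch exceeds the deadline."""
    buf = StringIO()
    start = 0
    while True:
        request = urllib2.Request(url)
        request.add_header('Range', 'bytes=%d-%d' % (start, start + chunk_size - 1))
        try:
            response = urllib2.urlopen(request, timeout=60)
        except urllib2.HTTPError as e:
            if e.code == 416:  # Requested range not satisfiable: we are past the end.
                break
            raise
        data = response.read()
        buf.write(data)
        if len(data) < chunk_size:
            break  # Last (short) chunk reached.
        start += chunk_size
    buf.seek(0)
    return buf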

How to get Boto3 to append collected CloudTrail client responses to a log file when run from a cron task?

I am currently working with a log collection product and want to be able to pull my CloudTrail logs from AWS. I started using the boto3 client to look up the events in CloudTrail. The script works correctly when I run it directly from the command line, but as soon as I put it in cron to pull the logs automatically over time, it stopped collecting the logs!
Here's a sample of the basics of what's in the script to pull the logs:
#!/usr/bin/python
import boto3
import datetime
import json
import time
import sys
import os

def initialize_log():
    try:
        log = open('/var/log/aws-cloudtrail.log', 'ab')
    except IOError as e:
        print " [!] ERROR: Cannot open /var/log/aws-cloudtrail.log (%s)" % (e.strerror)
        sys.exit(1)
    return log

def date_handler(obj):
    return obj.isoformat() if hasattr(obj, 'isoformat') else obj

def read_logs(log):
    print "[+] START: Connecting to CloudTrail Logs"
    cloudTrail = boto3.client('cloudtrail')
    starttime = ""
    endtime = ""
    if os.path.isfile('/var/log/aws-cloudtrail.bookmark'):
        try:
            with open('/var/log/aws-cloudtrail.bookmark', 'r') as myfile:
                strdate = myfile.read().replace('\n', '')
                starttime = datetime.datetime.strptime(strdate, "%Y-%m-%dT%H:%M:%S.%f")
                print " [-] INFO: Found bookmark! Querying with a start time of " + str(starttime)
        except IOError as e:
            print " [!] ERROR: Cannot open /var/log/aws-cloudtrail.log (%s)" % (e.strerror)
    else:
        starttime = datetime.datetime.now() - datetime.timedelta(minutes=15)
        print " [-] INFO: Cannot find bookmark...Querying with start time of" + str(starttime)
    endtime = datetime.datetime.now()
    print " [-] INFO: Querying for CloudTrail Logs"
    response = cloudTrail.lookup_events(StartTime=starttime, EndTime=endtime, MaxResults=50)
    for event in response['Events']:
        log.write(json.dumps(event, default=date_handler))
        log.write("\n")
        print json.dumps(event, default=date_handler)
        print "------------------------------------------------------------"
    if 'NextToken' in response.keys():
        while 'NextToken' in response.keys():
            time.sleep(1)
            response = cloudTrail.lookup_events(StartTime=starttime, EndTime=endtime, MaxResults=50, NextToken=str(response['NextToken']))
            for event in response['Events']:
                log.write(json.dumps(event, default=date_handler))
                log.write("\n")
                print json.dumps(event, default=date_handler)
                print "------------------------------------------------------------"
    # log.write("\n TESTING 1,2,3 \n")
    log.close()
    try:
        bookmark_file = open('/var/log/aws-cloudtrail.bookmark', 'w')
        bookmark_file.write(str(endtime.isoformat()))
        bookmark_file.close()
    except IOError as e:
        print " [!] ERROR: Cannot set bookmark for last pull time in /var/log/aws-cloudtrail.bookmark (%s)" % (e.strerror)
        sys.exit(1)
    return True

log = initialize_log()
success = read_logs(log)
if success:
    print "[+] DONE: All results printed"
else:
    print "[+] ERROR: CloudTrail results were not able to be pulled"
I looked into it more and did some testing to confirm that the permissions on the destination files were right and that the script could write to them when run from root's crontab, but I still wasn't getting the logs returned from the boto3 CloudTrail client unless I ran it manually.
I also checked to make sure that the default region was being read correctly from /root/.aws/config, and it looks like it is, because if I move that file the cron email shows a stack trace instead of the success messages I have built in.
I am hoping someone has already run into this and it's a quick, simple answer!
EDIT: Access to the CloudTrail logs is granted via the instance's IAM role, and yes, the task is scheduled under root's crontab.
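(For completeness, the script above relies on the ambient AWS configuration. A minimal sketch of constructing the client with an explicit region, which removes the dependency on the cron environment; the region name below is just an example value.)
import boto3

# Sketch: pin the region explicitly so the client does not depend on
# HOME/.aws being visible in the cron environment. "us-east-1" is illustrative.
cloudTrail = boto3.client('cloudtrail', region_name='us-east-1')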
Here's the email output:
From root@system Mon Mar 28 23:00:02 2016
X-Original-To: root
From: root@system (Cron Daemon)
To: root@system
Subject: Cron <root@system> /usr/bin/python /root/scripts/get-cloudtrail.py
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Cron-Env: <SHELL=/bin/sh>
X-Cron-Env: <HOME=/root>
X-Cron-Env: <PATH=/usr/bin:/bin>
X-Cron-Env: <LOGNAME=root>
Date: Mon, 28 Mar 2016 19:00:02 -0400 (EDT)
[+] START: Connecting to CloudTrail Logs
[-] INFO: Found bookmark! Querying with a start time of 2016-03-28 22:55:01.395001
[-] INFO: Querying for CloudTrail Logs
[+] DONE: All results printed

'Cannot parse input stream' error when updating defects in Rally via pyral

I am using the Python Toolkit for Rally REST API (pyral) to update defects on our Rally server. I have confirmed that I can contact the server and authenticate by getting a list of current defects, but I am running into issues when updating them. I am using Python 2.7.3 with pyral 0.9.1 and requests 0.13.3.
Also, I am passing 'verify=False' to the Rally() call and have made the appropriate changes to the restapi module to compensate for this.
Here is my test code:
import sys
from pyral import Rally, rallySettings

server = "rallydev.server1.com"
user = "user@mycompany.com"
password = "trial"
workspace = "trialWorkspace"
project = "Testing Project"
defectID = "DE192"

rally = Rally(server, user, password, workspace=workspace,
              project=project, verify=False)

defect_data = {"FormattedID": defectID,
               "State": "Closed"}

try:
    defect = rally.update('Defect', defect_data)
except Exception, details:
    sys.stderr.write('ERROR: %s \n' % details)
    sys.exit(1)

print "Defect %s updated" % defect.FormattedID
When I run the script:
[temp]$ ./updefect.py
ERROR: Unable to update the Defect
If I change the code in the RallyRESTResponse function to print out the value of self.errors when found (line 164 of rallyresp.py), I get this output:
[temp]$ ./updefect.py
[u"Cannot parse input stream due to I/O error as JSON document: Parse error: expected '{' but saw '\uffff' [ chars read = >>>\uffff<<< ]"]
ERROR: Unable to update the Defect
I did find another question that sounds like it might possibly be related to mine here:
App SDK: Erorr parsing input stream when running query
Can you provide any assistance?
Pairing Michael's observation regarding the GZIP encoding with that of another astute Rally customer working a Support case on the issue - it appears that some versions of the requests module will default to GZIP compression if the content-type is not specifically defined.
The fix is to set content-type to application/json in the REST Headers section of pyral's config.py:
RALLY_REST_HEADERS = \
    {
        'X-RallyIntegrationName'     : 'Python toolkit for Rally REST API',
        'X-RallyIntegrationVendor'   : 'Rally Software Development',
        'X-RallyIntegrationVersion'  : '%s.%s.%s' % __version__,
        'X-RallyIntegrationLibrary'  : 'pyral-%s.%s.%s' % __version__,
        'X-RallyIntegrationPlatform' : 'Python %s' % platform.python_version(),
        'X-RallyIntegrationOS'       : platform.platform(),
        'User-Agent'                 : 'Pyral Rally WebServices Agent',
        'Content-Type'               : 'application/json',
    }
What you are seeing is probably not related to the Python 2.7.3 / requests 0.13.3 versions being used. The error message you saw has also been reported with the JavaScript-based App SDK and the .NET Toolkit for Rally (two separate reports here on SO), and by at least one other person using Python 2.6.6 and requests 0.9.2. It appears that the error verbiage is being generated on the Rally WSAPI back-end. The current assessment by fellow Rally'ers is that it is an encoding-related issue; the question is where the encoding issue originates.
I have yet to be able to reproduce this issue, having tried with several versions of Python (2.6.x and 2.7.x), several versions of requests, and on Linux, MacOS and Win7.
As you seem to be pretty comfortable with diving into the code and running in debug mode, one avenue to try is to capture the defective POST URL and POST data and attempt the update via a browser-based REST client like 'Simple REST Client' or Poster, observing whether you get the same error message in the WSAPI response.
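Equivalently, the captured request can be replayed from a small script with the Content-Type pinned. A rough sketch along those lines, where the URL, payload and credentials are placeholders to be filled in with the values captured from pyral's debug output:
import json

import requests

# Placeholders: substitute the URL and JSON body captured from pyral's debug output.
captured_url = "https://rally1.rallydev.com/slm/webservice/1.43/defect/12345.js"
captured_payload = {"Defect": {"State": "Closed"}}

response = requests.post(
    captured_url,
    data=json.dumps(captured_payload),
    headers={'Content-Type': 'application/json'},  # Avoid any implicit content encoding.
    auth=('user@mycompany.com', 'trial'),  # Placeholder credentials.
    verify=False,
)
print response.status_code
print response.content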
I'm seeing similar behavior with pyral while trying to add an attachment to a defect.
With debugging and logging on I see this request on stdout:
2012-07-20T15:11:24.855212 PUT https://rally1.rallydev.com/slm/webservice/1.30/attachmentcontent/create.js?workspace=workspace/123456789
Then the json in the logfile:
2012-07-20 15:11:24.854 PUT attachmentcontent/create.js?workspace=workspace/123456789
{"AttachmentContent": {"Content": "iVBORw0KGgoAAAANSUhEUgAABBQAAAJrCAIAAADf2VflAAAXOWlDQ...
Then this in the logfile (after a bit of fighting with restapi.py to get around the unicode error):
2012-07-20 15:11:25.260 404 Cannot parse input stream due to I/O error as JSON document: Parse error: expected '{' but saw '?' [ chars read = >>>?<<< ]
The notable thing there is the 404 error code. Also, the "Cannot parse input stream..." error message is not coming from pyral, it's coming from Rally's server. So pyral is sending Rally something Rally can't understand.
I also logged the response headers, which may be a clue:
{'rallyrequestid': 'qs-app-03ml3akfhdpjk7c430otjv50ak.qs-app-0387404259', 'content-encoding': 'gzip', 'transfer-encoding': 'chunked', 'expires': 'Fri, 20 Jul 2012 19:18:35 GMT', 'vary': 'Accept-Encoding', 'cache-control': 'no-cache,no-store,max-age=0,must-revalidate', 'date': 'Fri, 20 Jul 2012 19:18:36 GMT', 'p3p': 'CP="NON DSP COR CURa PSAa PSDa OUR NOR BUS PUR COM NAV STA"', 'content-type': 'text/javascript; charset=utf-8'}
Note there the 'content-encoding': 'gzip'. I suspect the requests module (I'm using 0.13.3 on MacOS with Python 2.6) is gzip-encoding its PUT request, but the Rally API server is not properly decoding that.