Dataflow worker failed to start

I have a dataflow job that failed to start the workers with the following error:
File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main "__main__", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code exec code in run_globals
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/start.py", line 28, in <module>
from dataflow_worker import batchworker
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 57, in <module>
from apache_beam.runners.dataflow.internal.dependency import _dependency_file_copy
ImportError: No module named dependency
However, I can run the same job successfully with the exact same code (same setup.py file as well) on another machine, so I suspect an issue in my configuration of the Dataflow project.

If the same job with the exact same code (same setup.py file as well) runs successfully in one cloud project but fails in another, then there may be an issue with the failing project. The best way to find out is to report this at the public issue tracker, per the instructions at Report Bugs and Request Features with Issue Trackers, so that Google support can look into the project with you.
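Before filing the issue, one quick thing worth comparing between the two machines (an assumption on my part, not something confirmed above) is the Apache Beam SDK version each one has installed, since the version on the launching machine determines what the Dataflow workers run, and internal module paths such as apache_beam.runners.dataflow.internal can change between releases. A two-line check to run on both machines:

# Run on both machines that submit the job and compare the output.
import apache_beam as beam
print(beam.__version__)

If the versions differ, pinning apache-beam[gcp] to the known-good version in the setup.py you already ship removes that variable.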

Related

"ImportError: No module named idlelib" when running Google Dataflow worker

I have a Python 2.7 script I run locally to launch an Apache Beam / Google Dataflow job (SDK 2.12.0). The job takes a CSV file from a Google Storage bucket, processes it, and then creates an entity in Google Datastore for each row. The script ran fine for years... but now it is failing:
INFO:root:2019-05-15T22:07:11.481Z: JOB_MESSAGE_DETAILED: Workers have started successfully.
INFO:root:2019-05-15T21:47:13.370Z: JOB_MESSAGE_ERROR: Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 773, in run
self._load_main_session(self.local_staging_directory)
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 489, in _load_main_session
pickler.load_session(session_file)
File "/usr/local/lib/python2.7/dist-packages/apache_beam/internal/pickler.py", line 280, in load_session
return dill.load_session(file_path)
File "/usr/local/lib/python2.7/dist-packages/dill/_dill.py", line 410, in load_session
module = unpickler.load()
File "/usr/lib/python2.7/pickle.py", line 864, in load
dispatch[key](self)
File "/usr/lib/python2.7/pickle.py", line 1139, in load_reduce
value = func(*args)
File "/usr/local/lib/python2.7/dist-packages/dill/_dill.py", line 827, in _import_module
return __import__(import_name)
ImportError: No module named idlelib
I believe this error is happening at the worker level (not locally). I don't reference idlelib in my script. To make sure it wasn't me, I installed updates for all google-cloud packages, apache-beam[gcp], etc. locally, just in case. I tried importing idlelib in my script and got the same error. Any suggestions?
It had been fine for years and started failing with the SDK 2.12.0 release.
What was the last release that this script succeeded on? 2.11?
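The traceback shows the failure inside _load_main_session, i.e. while unpickling the launcher's main session on the worker, so any module that dill picked up from your local interpreter (here, idlelib) has to be importable on the worker image too. One way to sidestep that, assuming your pipeline does not actually depend on the main session, is to turn save_main_session off and keep per-record imports inside the DoFn. This is only a sketch: the transform, file path, and class names are made up; only the save_main_session option is the real Beam setting.

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions

class ProcessRow(beam.DoFn):
    def process(self, row):
        # Import inside the DoFn so the worker does the import itself
        # instead of relying on the pickled main session.
        import csv  # illustrative; put your real per-row imports here
        yield row

options = PipelineOptions()
# Do not pickle the launcher's main session onto the workers.
options.view_as(SetupOptions).save_main_session = False

with beam.Pipeline(options=options) as p:
    (p
     | 'Read' >> beam.io.ReadFromText('gs://my-bucket/input.csv')  # placeholder path
     | 'Process' >> beam.ParDo(ProcessRow()))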

The directory is not empty: '.elasticbeanstalk\\app_versions' Windows 10

I am switching to a new computer with a fresh install of Windows 10 Pro and am having a very strange issue with the EB CLI. I am not able to run 'eb deploy' using Windows PowerShell; I get the following error:
ERROR: OSError - [WinError 145] The directory is not empty: '.elasticbeanstalk\\app_versions'
I have uninstalled and reinstalled Python along with the EB CLI, but with the same result.
Note: I am able to run all other EB commands like eb ssh or eb logs with no issues.
One observation I made while watching the '.elasticbeanstalk' folder: I can see the 'app_versions' folder being created, along with the application zip inside it. Once the command fails, the ZIP file remains in the 'app_versions' folder for about 10 to 15 seconds before it is removed. I checked S3 and the zip file is uploaded...
I have reviewed this other Stack Overflow issue: AWS Elastic Beanstalk deploy not working
I do not have Google/Dropbox or OneDrive syncing the directory I am working in. Just to be safe I paused OneDrive, but still nothing.
Please, any help would be amazing!
UPDATE:
Ran eb deploy --debug
There is no error until AFTER the upload is completed; I confirmed this by checking the S3 bucket and seeing the latest upload.
2019-02-04 14:50:06,522 (INFO) eb : Traceback (most recent call last):
File "C:\Users\winng\AppData\Roaming\Python\Python37\site-packages\ebcli\core\ebrun.py", line 62, in run_app
app.run()
File "C:\Users\winng\AppData\Roaming\Python\Python37\site-packages\cement\core\foundation.py", line 797, in run
return_val = self.controller._dispatch()
File "C:\Users\winng\AppData\Roaming\Python\Python37\site-packages\cement\core\controller.py", line 472, in _dispatch
return func()
File "C:\Users\winng\AppData\Roaming\Python\Python37\site-packages\cement\core\controller.py", line 478, in _dispatch
return func()
File "C:\Users\winng\AppData\Roaming\Python\Python37\site-packages\ebcli\core\abstractcontroller.py", line 94, in default
self.do_command()
File "C:\Users\winng\AppData\Roaming\Python\Python37\site-packages\ebcli\controllers\deploy.py", line 78, in do_command
staged=self.staged, timeout=self.timeout, source=self.source)
File "C:\Users\winng\AppData\Roaming\Python\Python37\site-packages\ebcli\operations\deployops.py", line 59, in deploy
build_config=build_config
File "C:\Users\winng\AppData\Roaming\Python\Python37\site-packages\ebcli\operations\commonops.py", line 538, in create_app_version
fileoperations.delete_app_versions()
File "C:\Users\winng\AppData\Roaming\Python\Python37\site-packages\ebcli\core\fileoperations.py", line 432, in delete_app_versions
delete_directory(app_version_folder)
File "C:\Users\winng\AppData\Roaming\Python\Python37\site-packages\ebcli\core\fileoperations.py", line 425, in delete_directory
shutil.rmtree(location)
File "c:\users\winng\appdata\local\programs\python\python37\lib\shutil.py", line 513, in rmtree
return _rmtree_unsafe(path, onerror)
File "c:\users\winng\appdata\local\programs\python\python37\lib\shutil.py", line 401, in _rmtree_unsafe
onerror(os.rmdir, path, sys.exc_info())
File "c:\users\winng\appdata\local\programs\python\python37\lib\shutil.py", line 399, in _rmtree_unsafe
os.rmdir(path)
OSError: [WinError 145] The directory is not empty: '.elasticbeanstalk\\app_versions'
2019-02-04 14:50:06,526 (INFO) eb : OSError - [WinError 145] The directory is not empty: '.elasticbeanstalk\\app_versions'
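The stack trace ends in shutil.rmtree on '.elasticbeanstalk\\app_versions', and WinError 145 on an os.rmdir that has just deleted the folder's contents usually means some other process (antivirus scanner, search indexer, a sync client) is still holding a handle on the directory for a moment. As a way to test that theory outside the EB CLI, here is a minimal sketch of a delete-with-retries you can run against the same folder; the retry count and delay are arbitrary:

import os
import shutil
import time

def rmtree_with_retries(path, attempts=5, delay=1.0):
    # Try to delete a directory tree, retrying when Windows briefly
    # refuses because another process still has a handle open.
    for _ in range(attempts):
        try:
            shutil.rmtree(path)
            return True
        except OSError:
            time.sleep(delay)
    return False

if __name__ == '__main__':
    ok = rmtree_with_retries(os.path.join('.elasticbeanstalk', 'app_versions'))
    print('deleted' if ok else 'still locked after retries')

If the retry version succeeds where a single rmtree fails, excluding the project folder from real-time scanning/indexing is worth trying before blaming the EB CLI itself.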

AWS DataPipeline Machine Learning AMI tensorflow issues

I'm running the AWS Machine Learning AMI on an EC2 instance. I've confirmed from the terminal that both python and jupyter can run
import tensorflow as tf
and that I can run
python pytest.py
(which contains the above tensorflow import) with no issues.
I'm now trying to automate my script using DataPipeline along with TaskRunner. The bash command in DataPipeline is again, just:
python pytest.py
However, I immediately get the following error:
Traceback (most recent call last):
File "pytest.py", line 1, in <module>
import tensorflow as tf
File "/usr/lib/python2.7/dist-packages/tensorflow/__init__.py", line 24, in <module>
from tensorflow.python import *
File "/usr/lib/python2.7/dist-packages/tensorflow/python/__init__.py", line 72, in <module>
raise ImportError(msg)
ImportError: Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/tensorflow/python/__init__.py", line 61, in <module>
from tensorflow.python import pywrap_tensorflow
File "/usr/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 28, in <module>
_pywrap_tensorflow = swig_import_helper()
File "/usr/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow', fp, pathname, description)
ImportError: libcudart.so.7.5: cannot open shared object file: No such file or directory
Failed to load the native TensorFlow runtime.
See https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/get_started/os_setup.md#import_error for some common reasons and solutions. Include the entire stack trace above this error message when asking for help.
It seems like AWS DataPipeline (or TaskRunner?) uses a different environment setup, because again, I have no issues running the script through an ssh terminal to the instance. I found a few posts which suggested adding the cuda directory to LD_LIBRARY_PATH, but the AMI instance already has it:
echo $LD_LIBRARY_PATH
/home/ec2-user/src/torch/install/lib:/home/ec2-user/src/cntk/bindings/python/cntk/libs:/usr/local/cuda/lib64:/usr/local/lib:/usr/lib:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/mpi/lib:/home/ec2-user/src/mxnet/mklml_lnx_2017.0.1.20161005/lib:
which clearly contains the cuda library path that tensorflow needs.
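DataPipeline's Task Runner launches the activity's command in its own shell, which does not necessarily source the same profile files as your ssh login, so the LD_LIBRARY_PATH you see interactively may simply not be set in that environment. A quick way to confirm is to point the pipeline's bash command at a tiny probe script instead of pytest.py and compare its output with what you get over ssh; the filename is arbitrary:

# check_env.py - print the environment the DataPipeline task actually runs in
import os
import sys

print('python executable:', sys.executable)
print('LD_LIBRARY_PATH :', os.environ.get('LD_LIBRARY_PATH', '<not set>'))

# libcudart.so.7.5 is what the traceback says tensorflow could not load;
# check whether any directory on LD_LIBRARY_PATH actually contains it.
for d in os.environ.get('LD_LIBRARY_PATH', '').split(':'):
    if d and os.path.exists(os.path.join(d, 'libcudart.so.7.5')):
        print('found libcudart.so.7.5 in', d)

If LD_LIBRARY_PATH turns out to be empty there, exporting it explicitly at the start of the DataPipeline command (or sourcing the profile that sets it) would be the obvious next step.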

AWS command line tools broken : (

I tried to install awscli after ebcli, and they both broke. Currently, if I type aws s3 ls, it just hangs with no response, and if I try to use eb, I get this error:
Traceback (most recent call last):
File "/usr/local/bin/eb", line 11, in <module>
load_entry_point('awsebcli==3.8.4', 'console_scripts', 'eb')()
File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 565, in load_entry_point
return get_distribution(dist).load_entry_point(group, name)
File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2631, in load_entry_point
return ep.load()
File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2291, in load
return self.resolve()
File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2297, in resolve
module = __import__(self.module_name, fromlist=['__name__'], level=0)
File "/usr/local/lib/python2.7/dist-packages/ebcli/core/ebcore.py", line 43, in <module>
from . import ebglobals, base, io, hooks
File "/usr/local/lib/python2.7/dist-packages/ebcli/core/base.py", line 19, in <module>
from ebcli import __version__
ImportError: cannot import name __version__
I basically need to have command line tools for s3 and elastic beanstalk, but I apparently have no luck, and will be spending my entire day googling the universe, and combing through error codes to try and fix this : (
I'm on Ubuntu 14.04 on a Thinkpad.
It is quite common for different Python libraries to install over each other, causing problems like this.
A popular fix is to use the virtualenv tool to create isolated Python environments.
The AWS documentation for awsebcli has a page showing how: Install the EB CLI in a Virtual Environment
Alternatively, keep using the AWS Command-Line Interface (CLI) since it works across all AWS services, rather than using service-specific command sets like awsebcli (which pre-date the CLI).
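For reference, the steps on that page boil down to roughly the following (the environment name is just an example):
pip install --user virtualenv
virtualenv ~/eb-ve
source ~/eb-ve/bin/activate
pip install awsebcli
eb --version
Because awsebcli then lives entirely inside ~/eb-ve, it can no longer overwrite (or be overwritten by) the system-wide awscli packages.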

Python pikascript.py fails from command prompt

I have a Python script that connects to a RabbitMQ server and consumes messages. When I run the script from the command prompt as "./pikascript.py" I get the proper output, but when I try to execute the same script as "python pikascript.py" I get the following error:
WARNING:pika.adapters.base_connection:Connection to 16.125.72.210:5671 failed: [Errno 1] _ssl.c:503: error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version
Traceback (most recent call last):
File "pikascript.py", line 39, in <module>
ssl=True, ssl_options=ssl_options))
File "build\bdist.win-amd64\egg\pika\adapters\blocking_connection.py", line 130, in __init__
File "build\bdist.win-amd64\egg\pika\adapters\base_connection.py", line 72, in __init__
File "build\bdist.win-amd64\egg\pika\connection.py", line 600, in __init__
File "build\bdist.win-amd64\egg\pika\adapters\blocking_connection.py", line 230, in connect
File "build\bdist.win-amd64\egg\pika\adapters\blocking_connection.py", line 301, in _adapter_connect
pika.exceptions.AMQPConnectionError: Connection to 16.125.72.210:5671 failed: [Errno 1] _ssl.c:503: error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version
I set the proper path in the environment variables. Are there any dependencies needed to run the pika library? Could someone please help me out.
When I run the script from the command line as "./pikascript.py", it picks up the Python interpreter at "C:\Python\python.exe", but when I run the same script as "python pikascript.py", it resolves to a different Python installation on the same machine, where setuptools and the pika library are not installed properly.
So I started executing the script as "C:\Python\python.exe pikascript.py" and the script runs without any error.
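A quick way to see which interpreter (and therefore which site-packages) each invocation actually uses is to drop a couple of diagnostic lines at the top of the script; this is just a probe, not part of the fix:

import sys

# Compare the output of "./pikascript.py" vs "python pikascript.py":
# each invocation prints the interpreter it is actually running under
# and the site-packages directories it will search for pika.
print(sys.executable)
print(sys.version)
print([p for p in sys.path if 'site-packages' in p.lower()])

Whichever invocation prints a path other than C:\Python\python.exe is the one resolving to the installation where pika and setuptools are missing.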