Install bq_helper on Datalab - google-cloud-platform

I know that the BigQuery module is already installed on Datalab, but I want to use the bq_helper module because I learned it on Kaggle.
I ran !pip install -e git+https://github.com/SohierDane/BigQuery_Helper#egg=bq_helper and the installation succeeded,
but I can't import bq_helper afterwards. The error is shown in the screenshot below.
Please help. Thanks!
I am using Python 2 on Datalab.

I am not familiar with the BigQuery Helper library you shared, but in general, in Datalab, you may need to restart the kernel in order for newly installed libraries to be properly loaded.
I reproduced the scenario you proposed: installing the library with the command !pip install -e git+https://github.com/SohierDane/BigQuery_Helper#egg=bq_helper and then trying to import it in the notebook using:
from bq_helper import BigQueryHelper
bq_assistant = BigQueryHelper("bigquery-public-data", "github_repos")
bq_assistant.project_name
At first, it did not work and I obtained the same error as you; then I clicked on the Reset Session button and the library was loaded properly.
Some other details that may be relevant if this does not work for you are:
I am also running Python 2 (although the GitHub page of the library suggests it was only tested on Python 3.6+).
The Custom metadata parameters in the Datalab GCE instance are: created-with-datalab-version: 20180503 and created-with-sdk-version: 208.0.2.
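If the import still fails after restarting, a quick sanity check you can run in a notebook cell (just a sketch, nothing Datalab-specific) is to confirm which interpreter the kernel is using and where the module actually gets loaded from:
import sys
print(sys.executable)      # interpreter the notebook kernel is running
print(sys.path)            # an editable (pip -e) install lands in a src/ directory that must appear here
import bq_helper           # should succeed once the session has been reset
print(bq_helper.__file__)  # path the module was loaded from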

Related

No module named 'nltk.lm' in Google Colaboratory

I'm trying to import the NLTK language modeling module (nltk.lm) in a Google Colaboratory notebook, without success. I've tried installing everything from NLTK, still without success.
What mistake or omission could I be making?
Thanks in advance.
Google Colab has nltk v3.2.5 installed, but nltk.lm (Language Modeling package) was added in v3.4.
In your Google Colab notebook, run:
!pip install -U nltk
In the output you will see it downloads a new version, and uninstalls the old one:
...
Downloading nltk-3.6.5-py3-none-any.whl (1.5 MB)
...
Successfully uninstalled nltk-3.2.5
...
You must restart the runtime in order to use newly installed versions.
Click the Restart runtime button shown at the end of the output.
Now it should work!
You can double check the nltk version using this code:
import nltk
print('The nltk version is {}.'.format(nltk.__version__))
You need v3.4 or later to use nltk.lm.
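As a quick smoke test after the upgrade (a minimal sketch on a toy corpus, not your real data), you can fit a small model to confirm that nltk.lm now imports:
from nltk.lm import MLE
from nltk.lm.preprocessing import padded_everygram_pipeline

# Toy corpus: two tokenized "sentences"
text = [['a', 'b', 'c'], ['a', 'c', 'd', 'c', 'e', 'f']]
train, vocab = padded_everygram_pipeline(2, text)  # bigram pipeline with padding

lm = MLE(2)            # maximum-likelihood bigram model
lm.fit(train, vocab)
print(lm.counts['a'])  # how many times 'a' occurred in the training data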

Unable to import module 'lambda_function': No module named '_awscrt'

I'm working with this article: Asynchronous Amazon Transcribe Streaming SDK for Python.
I'm trying to create a Lambda layer for the required libraries.
I used the following command:
pip3 install amazon-transcribe aiofile -t .
But I get the following error when I use the layer in my lambda function:
Unable to import module 'lambda_function': No module named '_awscrt'
The same code works fine in a virtual environment locally, so I'm not sure what the exact issue is.
I even tried installing awscrt separately but it didn't work.
Any kind of help will be greatly appreciated. Thanks!
Lambda layer .zip files need to follow a specific directory structure. Look at this section of the documentation to see how it should be structured for Python; a sketch of the expected layout is shown below. This might be your problem.
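Roughly, for a Python layer the installed packages have to sit under a top-level python/ directory inside the .zip, so the build looks something like this (a sketch only; the layer/ directory name is arbitrary, and it should be run on Amazon Linux or another x86_64 Linux so that compiled pieces like _awscrt match the Lambda runtime):
mkdir -p layer/python
pip3 install amazon-transcribe aiofile -t layer/python
cd layer && zip -r ../layer.zip python
The resulting archive should then contain python/awscrt/, python/amazon_transcribe/, python/aiofile/, and so on at its top level.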
I built the layer on Amazon Linux and it worked fine!
The troubleshooting guide in the repo helped:
The caio Linux implementation works normally for modern Linux kernel versions and file systems, so you may have problems specific to your environment. It's not a bug, and it might be resolved in one of these ways:
1. Upgrade the kernel
2. Use a compatible file system
3. Use the thread-based or pure-Python implementation.

No module named `keras` under tmux on AWS instance

I am trying to use an Amazon AWS instance to train my network. To run it under Keras, I need to run
source activate tensorflow_p36
first, and it works. Unfortunately, if I do the same from inside tmux, it says it can't find the keras module.
Why, and how can I overcome this?
You can refer to the solution suggested in TMUX Session Won't Import Python Module. If you start the tmux session first and then import tensorflow, it should work. At least, my issue was resolved when I used this sequence; otherwise I got an error saying the tensorflow module was not found.
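In other words, the order matters: start tmux first, then activate the environment inside the tmux pane, then launch Python. A rough sequence (assuming the Deep Learning AMI environment name from the question; the session name is arbitrary):
tmux new -s training
source activate tensorflow_p36
python -c "import keras; print(keras.__version__)"  # should now find keras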

How to ensure software package version consistency in AWS SageMaker serverless compute?

I am learning AWS SageMaker, which is supposed to be a serverless compute environment for machine learning. In this type of serverless compute environment, who is supposed to ensure software package consistency and update the versions?
For example, I ran the demo program that came with SageMaker, deepar_synthetic. In its second cell, it executes the following: !conda install -y s3fs
However, I got the following warning message:
Solving environment: done
==> WARNING: A newer version of conda exists. <==
current version: 4.4.10
latest version: 4.5.4
Please update conda by running
$ conda update -n base conda
Since it is serverless compute, am I still supposed to update the software packages myself?
Another example is as follows. I wrote a few simple lines to find out the package versions in a Jupyter notebook:
import platform
import tensorflow as tf
print(platform.python_version())
print(tf.__version__)
However, I got the following warning messages:
/home/ec2-user/anaconda3/envs/tensorflow_p36/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
return f(*args, **kwds)
/home/ec2-user/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
The prints still worked and I got the results shown below:
3.6.4
1.4.0
I am wondering what I have to do to get the packages consistent so that I don't get the warning messages. Thanks.
Today, SageMaker Notebook Instances are managed EC2 instances, but users still have full control over the Notebook Instance as root. You have full capabilities to install missing libraries through the Jupyter terminal.
To access a terminal, open your Notebook Instance to the home page and click the drop-down on the top right: “New” -> “Terminal”.
Note: By default, conda installs to the root environment.
You can follow the instructions at https://conda.io/docs/user-guide/tasks/manage-environments.html on how to install libraries in a particular conda environment.
In general you will need the following commands:
conda env list
which lists all of your conda environments
source activate <conda environment name>
e.g. source activate python3
conda list | grep <package>
e.g. conda list | grep numpy
which lists the current package versions
pip install numpy
Or
conda install numpy
Note: Periodically the SageMaker team releases new versions of libraries onto the Notebook Instances. To get the new libraries, you can stop and start your Notebook Instance.
If you have recommendations on libraries you would like to see by default, you can create a forum post under https://forums.aws.amazon.com/forum.jspa?forumID=285. Alternatively, you can bootstrap your Notebook Instances with Lifecycle Configurations to install custom libraries. More details here: https://docs.aws.amazon.com/sagemaker/latest/dg/API_CreateNotebookInstanceLifecycleConfig.html
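For the Lifecycle Configuration route, the on-start script is just a shell script that runs when the instance starts; a rough sketch (the environment name and the package here are only examples, not an official sample) that installs a library into one of the prebuilt conda environments could look like this:
#!/bin/bash
set -e
# Run the install as the ec2-user account that owns the notebook's conda environments
sudo -u ec2-user -i <<'EOF'
source activate tensorflow_p36
pip install --upgrade s3fs
source deactivate
EOF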

Amazon Lambda unable to import [python windows .pyd pip]

I am trying to write to my PostgreSQL database with AWS Lambda using the python2.7 runtime. I don't care much about how I do this, so if anyone has a different approach that works and that I can understand, I'd love to hear it.
The method I'm currently trying is to use psycopg2, as this is the only way I know. In order to do this, I need to upload the psycopg2 module to my environment on AWS Lambda. As per the instructions, I've created a directory with my source and psycopg2 using pip install psycopg2 -t ..\my-project, zipped my-project, and uploaded it.
My error message is this from within the AWS Lambda console: Unable to import module 'lambda_function': No module named _psycopg
The code runs on my Windows machine. I think the issue is that when I import psycopg2 on my local Windows machine, the _psycopg module is being imported from _psycopg.pyd, and .pyd files are Windows-specific. I may be wrong about this.
I'm really just looking for any way to achieve the desired result described in my first paragraph, but here's a more specific question: How do I tell Windows to pip install and compile psycopg2 without using .pyd files? Is this possible? Do I have something completely wrong?
I know the formatting of this question is a little unorthodox, I think I've given all the necessary information, let me know if there's anything else I can provide.
I solved the problem by opening an Ubuntu instance in VirtualBox, pip installing the package there, pulling the relevant folders out, and placing them in my-project before zipping and uploading to AWS Lambda.
See these instructions.
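For reference, the Linux-side steps boil down to something like this (a sketch of the approach described above; the my-project name mirrors the question and the paths are just examples):
# On the Ubuntu VM (or any x86_64 Linux box)
pip install psycopg2 -t my-project/
cd my-project
zip -r ../my-project.zip .
# Upload my-project.zip (which also contains your lambda_function.py) as the deployment package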