Altair charts in Jupyter notebook do not render - python-2.7

I can't figure out what's going on in my Jupyter notebook. Vincent and Bokeh work fine, but in trying out Altair I must be missing something: I get no errors, and the online docs don't mention my problem.
This is what I enter (from the documentation page https://altair-viz.github.io/gallery/bar_aggregate.html ):
from altair import *

Chart('http://vega.github.io/vega-lite/data/population.json',
    description='A bar chart showing the US population distribution of age groups in 2000.',
).mark_bar().encode(
    x=X('sum(people):Q',
        axis=Axis(
            title='population',
        ),
    ),
    y=Y('age:O',
        scale=Scale(
            bandSize=17.0,
        ),
    ),
).transform_data(
    filter='datum.year == 2000',
)
The code executes in my Jupyter notebook with no errors, but also no graph. I do have vega installed, so that's not the issue. It's not specific to this chart; other examples show the same behavior. I'm not sure how to even troubleshoot this!

This solved my problem: Display plots in jupyter
which points to: How to install properly altair in anaconda and enable ipyvega
Install Altair:
conda install altair --channel conda-forge
Run this line on the command line before launching Jupyter:
jupyter nbextension enable vega --py --sys-prefix
Launch the notebook:
jupyter notebook
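To confirm the rendering path works end to end, a minimal inline chart makes a quick smoke test (a hypothetical snippet, not from the linked answer; the tiny DataFrame is made up):
import pandas as pd
from altair import Chart

# If this renders a three-bar chart, the vega extension is wired up correctly;
# the chart must be the last expression in the cell to display inline.
data = pd.DataFrame({'label': ['A', 'B', 'C'], 'value': [1, 3, 2]})
Chart(data).mark_bar().encode(x='label:O', y='value:Q')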

Related

How to install Kaggle on Jupyter Notebook services in Google Cloud

I've been using Google Colab for computing in my Kaggle competition; recently I decided to check whether it would work faster using services on Google Cloud. I have an *.ipynb file from Google Colab; I downloaded it and tried to upload it to a Google Cloud instance.
I set up the whole connection on Google Colab using this link: https://towardsdatascience.com/setting-up-kaggle-in-google-colab-ebb281b61463 and it worked fine.
Using this tutorial: https://towardsdatascience.com/how-to-use-jupyter-on-a-google-cloud-vm-5ba1b473f4c2 I started a new instance for a Jupyter notebook. I uploaded the *.ipynb file and tried to install Kaggle and run my notebook, but I usually get the following errors:
kaggle: command not found error
ensure that your python binaries are on your path
How can I set everything up to work on the Google Cloud service?
When following the first tutorial, remember to change the root directory path from /content to /home/jupyter/, for example:
import zipfile
# Same code as in Colab, only the root path changes from /content to /home/jupyter
zip_ref = zipfile.ZipFile("/home/jupyter/Airbus_competition/input/test_v2.zip", 'r')
zip_ref.extractall("/home/jupyter/Airbus_competition/input/test_v2")
zip_ref.close()
As for the problems with installing Kaggle: you don't have root access from Jupyter notebooks, but you can still install and use the Kaggle API if you change the command from !kaggle to !~/.local/bin/kaggle. For example (commands from the tutorial adapted to work on GCP):
!mkdir ~/.kaggle
import json
# kaggle.json must contain your Kaggle username and API key
token = {"username": "your_username", "key": "your_TOKEN"}
with open('/home/jupyter/.kaggle/kaggle.json', 'w') as file:
    json.dump(token, file)
!cp /home/jupyter/.kaggle/kaggle.json ~/.kaggle/kaggle.json
!~/.local/bin/kaggle config set -n path -v /home/jupyter/Airbus_competition
!chmod 600 /home/jupyter/.kaggle/kaggle.json
!~/.local/bin/kaggle competitions download -c airbus-ship-detection -p /home/jupyter/Airbus_competition/input --force
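If the long ~/.local/bin path gets awkward, the same download can also go through the Kaggle package's Python API; this is only a sketch, assuming the package was installed with pip install --user kaggle (which is why its CLI ends up in ~/.local/bin):
# Sketch: download the competition files via the Kaggle Python API instead of the CLI.
# Assumes ~/.kaggle/kaggle.json already holds valid credentials (see above).
from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()  # reads ~/.kaggle/kaggle.json
api.competition_download_files(
    'airbus-ship-detection',
    path='/home/jupyter/Airbus_competition/input',
    force=True,
)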

Zeppelin pyspark interpreter not able to submit application in YARN

Environment: AWS EMR emr-5.11.1, Zeppelin 0.7.3, Spark 2.2.1
Problem: the Zeppelin pyspark interpreter is not submitting jobs as applications in YARN
As per this, I have made the following changes, with no effect:
set SPARK_HOME
added spark.executor.memory=5g, spark.cores.max,
master=yarn-client, spark.home in the pyspark interpreter tab in Zeppelin
added spark.dynamicAllocation.enabled = true in yarn-site.xml
restarted the interpreter and the Zeppelin process
Please help.
Solution 1
I had the same problem; upgrading to 0.8.0, the newest version, solved it.
Solution 2
Edit $ZEPPELIN_HOME/conf/zeppelin-env.sh and add: export SPARK_SUBMIT_OPTIONS="--num-executors 10 --driver-memory 8g --executor-memory 10g --executor-cores 4"
If you don't have zeppelin-env.sh, copy zeppelin-env.sh.template and rename it to zeppelin-env.sh.
Solution 3
Edit $SPARK_CONF_DIR/spark-defaults.conf and add or modify the settings you need.
After that, restart your server.
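Whichever fix you try, a quick way to verify it from a Zeppelin paragraph is to inspect the SparkContext (a hypothetical check, not from the original answers):
%pyspark
# If the interpreter really submits to YARN, the master should report
# 'yarn-client' (or 'yarn'), and the id should look like application_...
print(sc.master)
print(sc.applicationId)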

AWS EMR jupyter password

I'm using EMR and wanted to use Jupyter (IPython), so I added this bootstrap action to the cluster:
s3://elasticmapreduce.bootstrapactions/ipython-notebook/install-ipython-notebook
I set up the port tunnelling to access Jupyter from my local host and it works fine, but it asks for a login password. I tried empty, I tried hadoop, but no luck. Does anybody know what the Jupyter password is?
I ran into this problem as well when I used the same bootstrap action. I tried adding Args=[--password, jupyter], which I also could not get working. That came from this AWS forum:
Name='Install Jupyter notebook',Path="s3://aws-bigdata-blog/artifacts/aws-blog-emr-jupyter/install-jupyter-emr5.sh",Args=[--r,--julia,--toree,--torch,--ruby,--ds-packages,--ml-packages,--python-packages,'ggplot nilearn',--port,8880,--password,jupyter,--jupyterhub,--jupyterhub-port,8001,--cached-install,--notebook-dir,s3://<your-s3-bucket>/notebooks/,--copy-samples]
What I did instead was to follow these instructions for installing Anaconda directly on the EMR instance using the CLI. If you follow the first part, you should be able to get it up and running. To summarize:
ssh into your master EMR instance using the .pem file you saved
once there, you'll want to install Anaconda with superuser privileges: sudo wget http://repo.continuum.io/archive/Anaconda3-4.1.1-Linux-x86_64.sh, then bash Anaconda3-4.1.1-Linux-x86_64.sh
Make sure you're using the anaconda version of python: which python
If you're not, reload your shell profile: source ~/.bashrc
Now make a jupyter config file: jupyter notebook --generate-config
cd into the jupyter folder: cd ~/.jupyter/
update the config file: vi jupyter_notebook_config.py
In the config file add the following lines:
c = get_config()
c.NotebookApp.ip = '*'
c.NotebookApp.open_browser = False
c.NotebookApp.port = 6789  # pick whichever port you want
exit out of the config editor and run jupyter via: jupyter notebook
this should run a notebook with no active kernels (for now). But it will give you the token you're looking for: http://localhost:6789/?token=xxxxxx
Leave this running and open a new terminal window. Now you'll want to tunnel to the EMR instance per this AWS blog post, making the port the same as the one you specified in the config file: ssh -o ServerAliveInterval=10 -i <<credentials.pem>> -N -L 6789:<<master-public-dns-name>>:6789 hadoop@<<master-public-dns-name>>
Opening localhost:6789 in the browser should prompt you with the jupyter page to enter your password or token. Enter the token that was generated in the above step and you should be good to go.
Hope this helps! There might be a less convoluted way, but this is what ended up working for me.
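As an aside: if you would rather log in with a fixed password than paste a token each time, the classic notebook server accepts a hashed password in the same config file (a sketch under that assumption, not part of the steps above):
# Run once in a Python shell to generate the hash (classic Jupyter Notebook):
from notebook.auth import passwd
print(passwd('choose-a-password'))  # prints something like 'sha1:...'
# Then add that hash to ~/.jupyter/jupyter_notebook_config.py:
# c.NotebookApp.password = u'sha1:...'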

Run Jupyter cells in slideshow mode

You can display a Jupyter notebook as live HTML slides by running:
$ jupyter nbconvert untitled.ipynb --to slides --post serve
Is there any way to run a notebook in the same slideshow format, in order to allow for a live presentation/execution of your cells?
Check out RISE, a Jupyter plugin based on Reveal.js:
Github Repo: https://github.com/damianavila/RISE
Documentation: https://rise.readthedocs.io/en/maint-5.5/
It's awesome.

How to run pyspark on EC2 with IPython starting from the spark-ec2 launch process?

Three steps and I have a spark context in my IPython notebook:
1.) Launch Spark on EC2 using these instructions.
2.) Install anaconda and py4j on every node (set PATH accordingly).
3.) Log in to the master, cd to the spark folder, then run:
MASTER=spark://<public DNS>:7077 PYSPARK_PYTHON=~/anaconda2/bin/python PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS='notebook --ip="*"' ./bin/pyspark
This process makes the IPython notebook available on <master public DNS>:8888, which is great, but ... I am currently using a csshx-style solution to accomplish step 2.
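(For reference, once the notebook is up, the SparkContext that pyspark pre-creates can be sanity-checked from a cell; a hypothetical snippet, assuming the standard pyspark startup shown above:)
# pyspark pre-creates `sc`; these confirm the notebook is attached to the cluster.
print(sc.master)               # should show spark://<public DNS>:7077
print(sc.defaultParallelism)   # grows with the number of registered worker cores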
Question:
How can I set install requirements (on AWS or elsewhere) so that the spark-ec2 script spins up machines with the desired setup?
If that's not possible, or simply clunky, what would you suggest? (command-line-only solutions are preferred)