How to plot a parallel coordinates plot from Hyperparameter Tuning with the HParams Dashboard? - tensorboard

I am trying to replicate the parallel coordinates plot from the Hyperparameter Tuning tutorial in this TensorFlow tutorial, and I have written my own CSV file where I store my results.
My output reading the csv file is like this:
conv_layers filters dropout accuracy
0 4 16 0.5 0.447917
1 4 16 0.6 0.458333
2 4 32 0.5 0.635417
3 4 32 0.6 0.447917
4 4 64 0.5 0.604167
5 4 64 0.6 0.645833
6 8 16 0.5 0.437500
7 8 16 0.6 0.437500
8 8 32 0.5 0.437500
9 8 32 0.6 0.562500
10 8 64 0.5 0.562500
11 8 64 0.6 0.437500
How can I create the same plot as in the tutorial in Python?

So I found the answer using Plotly:
import pandas as pd
from plotly.offline import init_notebook_mode
import plotly.graph_objects as go

init_notebook_mode(connected=True)

df = pd.read_csv('path/to/csv')

fig = go.Figure(data=
    go.Parcoords(
        line = dict(color = df['accuracy'],
                    showscale = True,   # show the colour bar (the original `colorbar = []` is not a valid value)
                    colorscale = [[0, '#6C9E12'],
                                  [0.25, '#0D5F67'],
                                  [0.5, '#AA1B13'],
                                  [0.75, '#69178C'],
                                  [1, '#DE9733']]),
        dimensions = list([
            dict(range = [0, 12],
                 label = 'Conv_layers', values = df['conv_layers']),
            dict(range = [8, 64],
                 label = 'filter_number', values = df['filters']),
            dict(range = [0.2, 0.8],
                 label = 'dropout_rate', values = df['dropout']),
            # only keep this dimension if your CSV actually contains a 'dense' column
            # dict(range = [0.2, 0.8],
            #      label = 'dense_num', values = df['dense']),
            dict(range = [0.1, 1.0],
                 label = 'accuracy', values = df['accuracy'])
        ])
    )
)
fig.update_layout(
    plot_bgcolor = '#E5E5E5',
    paper_bgcolor = '#E5E5E5',
    title = "Parallel Coordinates Plot"
)
# display the plot
fig.show()
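If you don't need an interactive figure, pandas also ships a parallel_coordinates helper built on matplotlib. A minimal static sketch, assuming the same CSV columns as shown above; binning accuracy into a helper column is just one illustrative way to obtain the "class" column the function expects:

import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates

df = pd.read_csv('path/to/csv')

# parallel_coordinates draws one line per row, coloured by a class column;
# here accuracy is binned so lines are grouped by performance
df['acc_bin'] = pd.cut(df['accuracy'], bins=3).astype(str)

# note: all columns share one y-axis, so you may want to normalize them first
parallel_coordinates(df[['conv_layers', 'filters', 'dropout', 'accuracy', 'acc_bin']],
                     class_column='acc_bin', colormap='viridis')
plt.title("Parallel Coordinates Plot (static)")
plt.show()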

Related

List format error using matplotlib LineCollection

I have a list (coordpairs) that I am trying to use as the basis for plotting using LineCollection. The list is derived from a Pandas data frame. I am having trouble getting the list in the right format, despite what is admittedly a clear error code. Trimmed data frame contents, code, and error are below. Thank you for any help.
Part of the Data Frame
RUP_ID Vert_ID Longitude Latitude
1 1 -116.316961 34.750178
1 2 -116.316819 34.750006
2 1 -116.316752 34.749938
2 2 -116.31662 34.749787
10 1 -116.317165 34.754078
10 2 -116.317277 34.751492
10 3 -116.317206 34.751273
10 4 -116.317009 34.75074
10 5 -116.316799 34.750489
11 1 -116.316044 34.760377
11 2 -116.317105 34.755674
11 3 -116.317165 34.754078
Code
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection

fig = plt.figure()
ax1 = plt.subplot2grid((2, 2), (0, 0), rowspan=2, colspan=1)

for ii in range(1, len(mydf)):
    temp = mydf.loc[mydf.RUP_ID == ii]
    df_line = temp.sort_values(by='Vert_ID', ascending=True)
    del temp
    lat = df_line.Latitude
    lon = df_line.Longitude
    lat = lat.tolist()
    lon = lon.tolist()
    coordpairs = zip(lat, lon)
    lc = LineCollection(coordpairs, colors='r')  # this is line 112 in the error
    ax1.add_collection(lc)

# note I also tried:
# import numpy as np
# coordpairs2 = np.vstack([np.array(u) for u in set([tuple(p) for p in coordpairs])])
# lc = LineCollection(coordpairs2, colors='r')
# and received the same plotting error
Error/Outputs
C:\apath\python.exe C:/mypath/myscript.py
Traceback (most recent call last):
File "C:/mypath/myscript.py", line 112, in <module>
lc = LineCollection(coordpairs, colors='r') # this is line 112 in the error
File "C:\apath\lib\site-packages\matplotlib\collections.py", line 1149, in __init__
self.set_segments(segments)
File "C:\apath\lib\site-packages\matplotlib\collections.py", line 1164, in set_segments
self._paths = [mpath.Path(_seg) for _seg in _segments]
File "C:\apath\lib\site-packages\matplotlib\path.py", line 141, in __init__
raise ValueError(msg)
ValueError: 'vertices' must be a 2D list or array with shape Nx2
Process finished with exit code 1
You want to create a single LineCollection containing several lines, one per RUP_ID value from the first dataframe column. That means it is best to loop over the unique values of that column (not over every row!) and append the coordinates to a list. Use that list as the input to LineCollection.
u = """RUP_ID Vert_ID Longitude Latitude
1 1 -116.316961 34.750178
1 2 -116.316819 34.750006
2 1 -116.316752 34.749938
2 2 -116.31662 34.749787
10 1 -116.317165 34.754078
10 2 -116.317277 34.751492
10 3 -116.317206 34.751273
10 4 -116.317009 34.75074
10 5 -116.316799 34.750489
11 1 -116.316044 34.760377
11 2 -116.317105 34.755674
11 3 -116.317165 34.754078"""
import io
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection

df = pd.read_csv(io.StringIO(u), sep=r"\s+")

verts = []
for (RUP_ID, grp) in df.groupby("RUP_ID"):
    df_line = grp.sort_values(by='Vert_ID', ascending=True)
    lat = df_line.Latitude
    lon = df_line.Longitude
    verts.append(list(zip(lon, lat)))

lc = LineCollection(verts, color='r')

fig, ax = plt.subplots()
ax.add_collection(lc)
ax.autoscale()
plt.show()
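If you additionally want each RUP_ID drawn in its own colour, one option (not part of the original answer, reusing df and verts from above) is to give the collection one numeric value per line and let a colormap do the mapping:

import numpy as np

# one value per line segment; here simply the RUP_ID itself
ids = np.array(sorted(df["RUP_ID"].unique()), dtype=float)

lc = LineCollection(verts, cmap="viridis")
lc.set_array(ids)  # a LineCollection is a ScalarMappable, so this drives the colours

fig, ax = plt.subplots()
ax.add_collection(lc)
fig.colorbar(lc, ax=ax, label="RUP_ID")
ax.autoscale()
plt.show()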

Null-independent column-wise mean calculation in Python

I am trying to calculate the mean of three columns in Python. Here is the catch:
If all 3 row values of my 3 columns are non-null, then my mean is (x+y+z)/3.
If one of the row values is null (suppose z), then my mean should be (x+y)/2.
I'm storing these mean values in a separate column which is part of the pandas dataframe.
I'm looking for the best approach, as my dataset has over 2 million rows.
My data is below.
Thanks in advance.
A B C
0 1 2 3 # = (1+2+3)/3 = 2
1 4 NaN 6 # = (4+6)/2 = 5
2 NaN 8 9 # = (8+9)/2 = 8.5
Just apply the numpy.nanmean function along axis 0 (columns). This is the default axis, so you will get the same result by omitting axis=0. If you want the means row-wise, use axis=1:
import pandas as pd
import numpy as np
df = pd.DataFrame({
'a': [2.3, 4.5, 2.1, np.nan, 6.7],
'b': [2.4, 5.6, np.nan, np.nan, 7.1],
'c': [np.nan, np.nan, np.nan, np.nan, 0.9]
})
colmeans = df.apply(np.nanmean, axis = 0)
# colmeans
# a 3.900000
# b 5.033333
# c 0.900000
# dtype: float64
rowmeans = df.apply(np.nanmean, axis = 1)
# 0 2.35
# 1 5.05
# 2 2.10
# 3 NaN
# 4 4.90
# dtype: float64
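Since the question mentions over 2 million rows: the vectorized DataFrame.mean skips NaN by default (skipna=True) and avoids apply's per-row Python overhead. A short sketch reusing the df defined above:

# row-wise means; NaN entries are simply left out of each row's average
df['rowmean'] = df[['a', 'b', 'c']].mean(axis=1)

# column-wise means, equivalent to df.apply(np.nanmean, axis=0) above
colmeans = df[['a', 'b', 'c']].mean(axis=0)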

How to visualize binary data in multiple axis in Python?

I have a sample Pandas data frame as follows:
Action Comedy Crime Thriller SciFi
1 0 1 1 0
0 1 0 0 1
0 1 0 1 0
0 0 1 0 1
1 1 0 0 0
I would like to plot the data set using Python (preferably with matplotlib) in such a way that each column becomes a separate axis. Hence in this case there would be five axes (Action, Comedy, Crime, ...) and five data points (since it has 5 rows).
Is it possible to plot this kind of multi-axis data with matplotlib? If it's not possible, what would be the best way to visualize this data?
RadarChart
Having several axes can be accomplished with a radar chart. You may adapt the matplotlib Radar Chart example to your needs.
u = u"""Action Comedy Crime Thriller SciFi
1 0 1 1 0
0 1 0 0 1
0 1 0 1 0
0 0 1 0 1
1 1 0 0 0"""
import io
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.path import Path
from matplotlib.spines import Spine
from matplotlib.projections.polar import PolarAxes
from matplotlib.projections import register_projection
def radar_factory(num_vars, frame='circle'):
    theta = np.linspace(0, 2*np.pi, num_vars, endpoint=False)
    theta += np.pi/2

    def draw_poly_patch(self):
        verts = unit_poly_verts(theta)
        return plt.Polygon(verts, closed=True, edgecolor='k')

    def draw_circle_patch(self):
        return plt.Circle((0.5, 0.5), 0.5)

    patch_dict = {'polygon': draw_poly_patch, 'circle': draw_circle_patch}

    def unit_poly_verts(theta):
        x0, y0, r = [0.5] * 3
        verts = [(r*np.cos(t) + x0, r*np.sin(t) + y0) for t in theta]
        return verts

    class RadarAxes(PolarAxes):
        name = 'radar'
        RESOLUTION = 1
        draw_patch = patch_dict[frame]

        def fill(self, *args, **kwargs):
            """Override fill so that line is closed by default"""
            closed = kwargs.pop('closed', True)
            return super(RadarAxes, self).fill(closed=closed, *args, **kwargs)

        def plot(self, *args, **kwargs):
            """Override plot so that line is closed by default"""
            lines = super(RadarAxes, self).plot(*args, **kwargs)
            for line in lines:
                self._close_line(line)

        def _close_line(self, line):
            x, y = line.get_data()
            if x[0] != x[-1]:
                x = np.concatenate((x, [x[0]]))
                y = np.concatenate((y, [y[0]]))
                line.set_data(x, y)

        def set_varlabels(self, labels):
            self.set_thetagrids(np.degrees(theta), labels)

        def _gen_axes_patch(self):
            return self.draw_patch()

        def _gen_axes_spines(self):
            if frame == 'circle':
                return PolarAxes._gen_axes_spines(self)
            spine_type = 'circle'
            verts = unit_poly_verts(theta)
            # close off polygon by repeating first vertex
            verts.append(verts[0])
            path = Path(verts)
            spine = Spine(self, spine_type, path)
            spine.set_transform(self.transAxes)
            return {'polar': spine}

    register_projection(RadarAxes)
    return theta
df = pd.read_csv(io.StringIO(u), delim_whitespace=True)
N = 5
theta = radar_factory(N, frame='polygon')
fig, ax = plt.subplots(subplot_kw=dict(projection='radar'))
colors = ['b', 'r', 'g', 'm', 'y']
markers = ["s", "o","P", "*", "^"]
ax.set_rgrids([1])
for i, (col, row) in enumerate(df.iterrows()):
    ax.scatter(theta, row, c=colors[i], marker=markers[i], label=col)
    ax.fill(theta, row, facecolor=colors[i], alpha=0.25)
ax.set_varlabels(df.columns)
labels = ["Book {}".format(i+1) for i in range(len(df))]
ax.legend(labels*2, loc=(0.97, .1), labelspacing=0.1, fontsize='small')
plt.show()
heatmap
An easy and probably more readable way to visualize the data would be a heatmap.
u = u"""Action Comedy Crime Thriller SciFi
1 0 1 1 0
0 1 0 0 1
0 1 0 1 0
0 0 1 0 1
1 1 0 0 0"""
import io
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv(io.StringIO(u), delim_whitespace=True)
print(df)
plt.matshow(df, cmap="gray")
plt.xticks(range(len(df.columns)), df.columns)
plt.yticks(range(len(df)), range(1,len(df)+1))
plt.ylabel("Book number")
plt.show()
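If seaborn is available, roughly the same picture can be drawn with labelled cells. This is a hedged variant that is not part of the original answer; it reuses the string u defined above:

import io
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv(io.StringIO(u), delim_whitespace=True)

# annot=True writes the 0/1 value into each cell
ax = sns.heatmap(df, cmap="gray_r", cbar=False, linewidths=0.5, annot=True)
ax.set_yticklabels(range(1, len(df) + 1), rotation=0)
ax.set_ylabel("Book number")
plt.show()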
Here is a nice simple visualization that you can get with a bit of data manipulation and Seaborn.
import seaborn as sns
# df is a Pandas DataFrame with the following content:
# Action Comedy Crime Thriller SciFi
# 1 0 1 1 0
# 0 1 0 0 1
# 0 1 0 1 0
# 0 0 1 0 1
# 1 1 0 0 0
df = ...
# Give name to the indices for convenience
df.index.name = "Index"
df.columns.name = "Genre"
# Get a data frame containing the relevant genres and indices
df2 = df.unstack()
df2 = df2[df2 > 0].reset_index()
# Plot it
ax = sns.stripplot(x="Genre", y="Index", data=df2)
ax.set_yticks(df.index)
And you get:
For fine tuning you can check the documentation of sns.stripplot.

Comparing metrics of Keras with metrics of sklearn.classification_report

I am struggling with different metrics while evaluating neural networks.
My investigation showed that Keras (version 1.2.2) calculates different values for specific metrics (using the evaluate function) compared to sklearn's classification_report.
Specifically, the values for the metric 'precision' (i.e. 'precision' of Keras != 'precision' of sklearn) or 'recall' (i.e. 'recall' of Keras != 'recall' of sklearn) differ.
For the following working example the differences seem random, but when evaluating bigger networks, 'precision' of Keras (almost) equals 'recall' of sklearn, whereas the two 'recall' metrics differ clearly.
I appreciate your help!
from __future__ import print_function
import numpy as np
np.random.seed(1337) # for reproducibility
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.utils import np_utils # numpy utils for to_categorical()
from keras import backend as K # abstract backend API (in order to generate compatible code for Theano and Tf)
from sklearn.metrics import classification_report
batch_size = 128
nb_classes = 10
nb_epoch = 30
# input image dimensions
img_rows, img_cols = 28, 28
# number of convolutional filters to use
nb_filters = 32
# size of pooling area for max pooling
pool_size = (2, 2)
# convolution kernel size
kernel_size = (3, 3)
# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
if K.image_dim_ordering() == 'th':
    X_train = X_train.reshape(X_train.shape[0], 1, img_rows, img_cols)
    X_test = X_test.reshape(X_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, 1)
    X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255 # range [0,1]
X_test /= 255 # range [0,1]
print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')
# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, nb_classes) # necessary for use of categorical_crossentropy
Y_test = np_utils.to_categorical(y_test, nb_classes) # necessary for use of categorical_crossentropy
# create model
model = Sequential()
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1],
border_mode='valid',
input_shape=input_shape))
model.add(Activation('relu'))
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=pool_size))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))
# configure model
model.compile(loss='categorical_crossentropy',
optimizer='adadelta',
metrics=['accuracy', 'precision', 'recall'])
# train model
model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch,
verbose=1, validation_data=(X_test, Y_test))
# evaluate model with keras
score = model.evaluate(X_test, Y_test, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])
print('Test precision:', score[2])
print('Test recall:', score[3])
# evaluate model with sklearn
predictions_last_epoch = model.predict(X_test, batch_size=batch_size, verbose=1)
target_names = ['class 0', 'class 1', 'class 2', 'class 3', 'class 4',
'class 5', 'class 6', 'class 7', 'class 8', 'class 9']
predicted_classes = np.argmax(predictions_last_epoch, axis=1)
print('\n')
print(classification_report(y_test, predicted_classes,
target_names=target_names, digits = 6))
EDIT
The output of the script given above:
Test score: 0.0271549037314
Test accuracy: 0.9916
Test precision: 0.992290322304
Test recall: 0.9908
9728/10000 [============================>.] - ETA: 0s
precision recall f1-score support
class 0 0.987867 0.996939 0.992382 980
class 1 0.993860 0.998238 0.996044 1135
class 2 0.990329 0.992248 0.991288 1032
class 3 0.991115 0.994059 0.992585 1010
class 4 0.994882 0.989817 0.992343 982
class 5 0.991041 0.992152 0.991597 892
class 6 0.993678 0.984342 0.988988 958
class 7 0.992180 0.987354 0.989761 1028
class 8 0.989754 0.991786 0.990769 974
class 9 0.991054 0.988107 0.989578 1009
avg / total 0.991607 0.991600 0.991597 10000
For another model:
val/test loss: 0.231304548573
val/test categorical_accuracy: **0.978500002956**
val/test precision: *0.995103668976*
val/test recall: 0.941900001907
val/test fbeta_score: 0.967675107574
val/test mean_squared_error: 0.0064611148566
10000/10000 [==============================] - 0s
precision recall f1-score support
class 0 0.989605 0.971429 0.980433 980
class 1 0.985153 0.993833 0.989474 1135
class 2 0.988154 0.969961 0.978973 1032
class 3 0.981373 0.991089 0.986207 1010
class 4 0.968907 0.983707 0.976251 982
class 5 0.997633 0.945067 0.970639 892
class 6 0.995690 0.964509 0.979852 958
class 7 0.987230 0.977626 0.982405 1028
class 8 0.945205 0.991786 0.967936 974
class 9 0.951429 0.990089 0.970374 1009
avg / total *0.978964* **0.978500** 0.978522 10000
Definition of desired metrics (for model.compile):
metrics=['categorical_accuracy', 'precision', 'recall', 'fbeta_score', 'mean_squared_error']
model.compile(loss='categorical_crossentropy',
optimizer='sgd',
metrics=metrics)
Output of model.metrics_names:
['loss', 'categorical_accuracy', 'precision', 'recall', 'fbeta_score', 'mean_squared_error']
Yes, it is different because the sklearn classification report's "avg / total" row is a weighted average based on the support.
Experiment with:
from sklearn.metrics import classification_report
y_true = [0, 1,2,1]
y_pred = [0, 0,2,0]
target_names = ['class 0', 'class 1', 'class 2']
print(classification_report(y_true, y_pred, target_names=target_names))
Gives you:
precision recall f1-score support
class 0 0.33 1.00 0.50 1
class 1 0.00 0.00 0.00 2
class 2 1.00 1.00 1.00 1
avg / total 0.33 0.50 0.38 **4**
The unweighted average would be (1 + 0 + 0.33)/3 ≈ 0.44, but as the support column shows, sklearn instead returns the support-weighted average (1*1 + 0*2 + 0.33*1)/4 ≈ 0.33.
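Both numbers can be reproduced directly with sklearn's precision_score (a small check, assuming scikit-learn is installed):

from sklearn.metrics import precision_score

y_true = [0, 1, 2, 1]
y_pred = [0, 0, 2, 0]

# unweighted (macro) mean over classes: (1/3 + 0 + 1) / 3 ≈ 0.44
print(precision_score(y_true, y_pred, average='macro'))

# support-weighted mean, which is what the "avg / total" row of the report shows:
# (1/3 * 1 + 0 * 2 + 1 * 1) / 4 ≈ 0.33
print(precision_score(y_true, y_pred, average='weighted'))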

Python Side-by-side box plots on same figure

I am trying to generate a box plot in Python 2.7 for each categorical value in column E from the Pandas dataframe below
A B C D E
0 0.647366 0.317832 0.875353 0.993592 1
1 0.504790 0.041806 0.113889 0.445370 2
2 0.769335 0.120647 0.749565 0.935732 3
3 0.215003 0.497402 0.795033 0.246890 1
4 0.841577 0.211128 0.248779 0.250432 1
5 0.045797 0.710889 0.257784 0.207661 4
6 0.229536 0.094308 0.464018 0.402725 3
7 0.067887 0.591637 0.949509 0.858394 2
8 0.827660 0.348025 0.507488 0.343006 3
9 0.559795 0.820231 0.461300 0.921024 1
I would be willing to do this with Matplotlib or any other plotting library. So far, the code below plots all the categories combined in one plot. Here is the code to generate the above data and produce the plot:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
# Data
df = pd.DataFrame(np.random.rand(10,4),columns=list('ABCD'))
df['E'] = [1,2,3,1,1,4,3,2,3,1]
# Boxplot
bp = ax.boxplot(df.iloc[:,:-1].values, widths=0.2)
plt.show()
In this example, the categories are 1,2,3,4. I would like to plot separate boxplots side-by-side on the same figure, for only categories 1 and 2 and show the category names in the legend.
Is there a way to do this?
Additional Information:
The output should look similar to the 3rd figure from here - replace "Yes","No" by "1","2".
Starting with this:
import numpy
import pandas
from matplotlib import pyplot
import seaborn
seaborn.set(style="ticks")
# Data
df = pandas.DataFrame(numpy.random.rand(10,4), columns=list('ABCD'))
df['E'] = [1, 2, 3, 1, 1, 4, 3, 2, 3, 1]
You've got a couple of options. If separate axes are ok,
fig, axes = pyplot.subplots(ncols=4, figsize=(12, 5), sharey=True)
df.query("E in [1, 2]").boxplot(by='E', return_type='axes', ax=axes)
If you want 1 axes, I think seaborn will be easier. You just need to clean up your data.
ax = (
    df.set_index('E', append=True)  # set E as part of the index
      .stack()                      # pull A - D into rows
      .to_frame()                   # convert to a dataframe
      .reset_index()                # make the index into reg. columns
      .rename(columns={'level_2': 'quantity', 0: 'value'})  # rename columns
      .drop('level_0', axis='columns')  # drop junk columns
      .pipe((seaborn.boxplot, 'data'), x='E', y='value', hue='quantity', order=[1, 2])
)
seaborn.despine(trim=True)
The cool thing about seaborn is that tweaking the parameters slightly can achieve a lot in terms of the plot's layout. If we switch our hue and x variables, we get:
ax = (
    df.set_index('E', append=True)  # set E as part of the index
      .stack()                      # pull A - D into rows
      .to_frame()                   # convert to a dataframe
      .reset_index()                # make the index into reg. columns
      .rename(columns={'level_2': 'quantity', 0: 'value'})  # rename columns
      .drop('level_0', axis='columns')  # drop junk columns
      .pipe((seaborn.boxplot, 'data'), x='quantity', y='value', hue='E', hue_order=[1, 2])
)
seaborn.despine(trim=True)
If you're curious, the resulting dataframe looks something like this:
E quantity value
0 1 A 0.935433
1 1 B 0.862290
2 1 C 0.197243
3 1 D 0.977969
4 2 A 0.675037
5 2 B 0.494440
6 2 C 0.492762
7 2 D 0.531296
8 3 A 0.119273
9 3 B 0.303639
10 3 C 0.911700
11 3 D 0.807861
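For what it's worth, on a reasonably recent pandas the same reshaping can be done more directly with melt; a hedged shortcut, reusing df and seaborn from above:

tidy = df.melt(id_vars='E', var_name='quantity', value_name='value')
ax = seaborn.boxplot(data=tidy.query("E in [1, 2]"),
                     x='E', y='value', hue='quantity')
seaborn.despine(trim=True)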
An addition to @Paul_H's answer.
Side-by-side boxplots on the single matplotlib.axes.Axes, no seaborn:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.rand(10,4), columns=list('ABCD'))
df['E'] = [1, 2, 1, 1, 1, 2, 1, 2, 2, 1]
mask_e = df['E'] == 1
# prepare data
data_to_plot = [df[mask_e]['A'], df[~mask_e]['A'],
                df[mask_e]['B'], df[~mask_e]['B'],
                df[mask_e]['C'], df[~mask_e]['C'],
                df[mask_e]['D'], df[~mask_e]['D']]
# Positions defaults to range(1, N+1) where N is the number of boxplots to be drawn.
# We will move them a little, to visually group them.
plt.figure(figsize=(10, 6))
box = plt.boxplot(data_to_plot,
                  positions=[1, 1.6, 2.5, 3.1, 4, 4.6, 5.5, 6.1],
                  labels=['A1', 'A0', 'B1', 'B0', 'C1', 'C0', 'D1', 'D0'])
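The snippet above stops before the figure is styled and shown; a minimal hedged completion (the colours are my own choice, not necessarily the original author's) could look like this:

# colour the paired boxes so the two E categories are easy to tell apart
for i, artist in enumerate(box['boxes']):
    artist.set_color('tab:blue' if i % 2 == 0 else 'tab:orange')

plt.title("Box plots of A-D, split by E category (1 vs. not 1)")
plt.show()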