Trying to use Caffe2 to add two blobs together that contain matrices

I'm trying to add the values of two blobs together. Each blob contains a 2x2 matrix.
workspace.FeedBlob("X", np.random.randn(2, 2).astype(np.float32))
workspace.FeedBlob("Y", np.random.randn(2, 2).astype(np.float32))
net = core.Net('net')
sum_stuff = net.Add([X, Y])

What exactly did not work? The following example will create two 2x2 matrices and add them together:
from caffe2.python import workspace, core
import numpy as np
# create 2x2 matrices of random integers from 0 to 8
# and feed them to the workspace
workspace.FeedBlob("X", np.random.randint(0, 9, size=(2, 2)).astype(np.int32))
workspace.FeedBlob("Y", np.random.randint(0, 9, size=(2, 2)).astype(np.int32))
# define a network which adds the two blobs together and
# stores the result as Sum
net = core.Net('net')
sum_stuff = net.Add(["X", "Y"], "Sum")
# run the network
workspace.CreateNet(net)
workspace.RunNet(net.Proto().name)
# get the values from the workspace
X = workspace.FetchBlob("X")
Y = workspace.FetchBlob("Y")
Sum = workspace.FetchBlob("Sum")
# print the result to check if correct
print("First matrix:\n{0}".format(X))
print("Second matrix:\n{0}".format(Y))
print("Sum of the two matrices:\n{0}".format(Sum))

Related

Keras ImageDataGenerator: random transform

I'm interested in augmenting my dataset with random image transformations. I'm using Keras ImageDataGenerator, and I'm getting the following error when trying to apply random_transform to a single image:
--> x = apply_transform(x, transform_matrix, img_channel_axis, fill_mode, cval)
>>> RuntimeError: affine matrix has wrong number of rows.
I found the source code for the ImageDataGenerator here. However, I'm not sure how to debug the runtime error. Below is the code I have:
import numpy as np
from keras.preprocessing.image import img_to_array, load_img
from keras.preprocessing.image import ImageDataGenerator
from keras.applications.inception_v3 import preprocess_input
image_path = './figures/zebra.jpg'
# data augmentation
train_datagen = ImageDataGenerator(
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest')
print "\nloading image..."
image = load_img(image_path, target_size=(299, 299))
image = img_to_array(image)
image = np.expand_dims(image, axis=0)  # 1 x input_shape
image = preprocess_input(image)
train_datagen.fit(image)
image = train_datagen.random_transform(image)
The error occurs at the last line when calling random_transform.
The problem is that random_transform expects a 3D-array.
See the docstring:
def random_transform(self, x, seed=None):
    """Randomly augment a single image tensor.
    # Arguments
        x: 3D tensor, single image.
        seed: random seed.
    # Returns
        A randomly transformed version of the input (same shape).
    """
So you'll need to call it before np.expand_dims.
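Concretely, you can reorder the last few lines of the code above so that random_transform sees the 3D tensor and the batch axis is added afterwards (a sketch reusing the question's variables; train_datagen.fit is only needed for featurewise statistics, so it can be dropped here):
image = load_img(image_path, target_size=(299, 299))
image = img_to_array(image)                    # 3D: (299, 299, 3)
image = train_datagen.random_transform(image)  # augment the single image
image = np.expand_dims(image, axis=0)          # add the batch axis afterwards
image = preprocess_input(image)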

how to give the test size in stratified kfold sampling in python?

Using sklearn, I want to have 3 splits (i.e. n_splits = 3) in the sample dataset and a train/test ratio of 70:30. I'm able to split the set into 3 folds, but I'm not able to define the test size (similar to the train_test_split method). Is there a way to define the test sample size in StratifiedKFold?
from sklearn.model_selection import StratifiedKFold as SKF
skf = SKF(n_splits=3)
skf.get_n_splits(X, y)
for train_index, test_index in skf.split(X, y):
    # Loops over 3 iterations to have a stratified train/test split
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
StratifiedKFold does, by definition, a K-fold split: on each iteration, the returned iterator yields K-1 groups for training and 1 group for testing. K is controlled by n_splits, so it creates groups of n_samples/K and uses all K combinations of K-1 groups for training/testing. Refer to Wikipedia or search for K-fold cross-validation for more about it.
In short, the size of the test set will be 1/K (i.e. 1/n_splits), so you can tune that parameter to control the test size (e.g. n_splits=3 will give a test split of size 1/3 = 33% of your data). However, StratifiedKFold will iterate over all K train/test combinations, which might not be what you want.
Having said that, you might be interested in StratifiedShuffleSplit, which returns a configurable number of splits with a configurable train/test ratio. If you just want a single split, set n_splits=1 and keep test_size=0.3 (or whatever ratio you want).
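For instance, a minimal sketch (assuming X and y are array-like, as in the question) that yields exactly one stratified 70:30 split:
from sklearn.model_selection import StratifiedShuffleSplit
# one stratified split with a 70:30 train/test ratio
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.3, random_state=0)
for train_index, test_index in sss.split(X, y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]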

How to Load custom data sets in neon nervana

If anyone is familiar with Nervana's neon, could you please give me an example of how to load a custom dataset in this example of neon?
Here is an example dataset. You can also check their docs. You'll see a reference later to a DataSet used in __iter__, but that just contains some functions to generate items. The key is to make sure you create contiguous X, y pairs, set them on a backend tensor, and yield. Hope that helps.
import numpy as np
from data import DataSet
from operator import mul
from neon.data import NervanaDataIterator

class CustomLoader(NervanaDataIterator):
    def __init__(self, in_data, img_shape, n_classes):
        # Load the numpy data into some variables. We divide the image by 255
        # to normalize the values between 0 and 1.
        self.shape = img_shape  # shape of the input data (e.g. for images, (C, H, W))
        # 1. assign some required and useful attributes
        self.start = 0  # start at zero
        self.ndata = in_data.shape[0]  # number of images in X (hint: use X.shape)
        self.nfeatures = reduce(mul, img_shape, 1)  # number of features in X
        # number of minibatches per epoch
        # to calculate this, use the batch size, which is stored in self.be.bsz
        self.nbatches = self.ndata // self.be.bsz
        # 2. allocate memory on the GPU for a minibatch's worth of data
        # (use `self.be` to access the backend; see the backend documentation).
        # to get the minibatch size, use self.be.bsz
        # hint: X should have shape (# features, mini-batch size)
        # hint: use some of the attributes previously defined above
        self.dev_X = self.be.zeros((self.nfeatures, self.be.bsz))
        self.dev_Y = self.be.zeros((n_classes, self.be.bsz))
        self.data_loader = DataSet(in_data, self.be.bsz)
        self.data_loader.start()

    def reset(self):
        self.data_loader.stop()
        self.start = 0
        self.data_loader.start()

    def __iter__(self):
        # 3. loop through minibatches in the dataset
        for index in xrange(self.nbatches):
            # 3a. grab the right slice from the numpy arrays
            inputs, targets, _ = self.data_loader.batch()
            # flatten each image to a feature vector: (batch_size, num_features)
            inputs = inputs.reshape(self.be.bsz, -1)
            # The X and Y arrays are in shape (batch_size, num_features),
            # but the iterator needs to return data with shape (num_features, batch_size).
            # Here we transpose the data, and then store it as a contiguous array.
            # numpy arrays need to be contiguous before being loaded onto the GPU.
            inputs = np.ascontiguousarray(inputs.T / 255.0)
            targets = np.ascontiguousarray(targets.T)
            # here we test your implementation:
            # your slice has to have the same shape as the GPU tensors you allocated
            assert inputs.shape == self.dev_X.shape, \
                "inputs has shape {}, but dev_X is {}".format(inputs.shape, self.dev_X.shape)
            assert targets.shape == self.dev_Y.shape, \
                "targets has shape {}, but dev_Y is {}".format(targets.shape, self.dev_Y.shape)
            # 3b. transfer from numpy arrays to device:
            # use the GPU memory buffers allocated previously
            # and call the tensor's set() function.
            self.dev_X.set(inputs)
            self.dev_Y.set(targets)
            # 3c. yield a tuple of the device tensors.
            # X should be of shape (num_features, batch_size)
            # Y should be of shape (n_classes, batch_size)
            yield (self.dev_X, self.dev_Y)
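For context, here is roughly how the loader could be wired into training (a sketch only: the shapes, model, cost, opt and callbacks below are placeholders, not from the original example):
# hypothetical usage; assumes model, cost, opt and callbacks are set up as in the neon examples
train_set = CustomLoader(in_data, img_shape=(3, 32, 32), n_classes=10)
model.fit(train_set, optimizer=opt, num_epochs=10, cost=cost, callbacks=callbacks)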

Matching Numpy and NetCDF4 indexing when creating a netCDF file

I'm trying to move values from a numpy array to a NetCDF file, which I am creating. Currently I'm trying to find the best way to emulate numpy 'fancy indexing' when writing to the netCDF file, but the two indexing systems don't match when the index arrays only contain two points.
import netCDF4
import numpy as np
rootgrp = netCDF4.Dataset('Test.nc','w',format='NETCDF4')
time = rootgrp.createDimension('time',None)
dim1 = rootgrp.createDimension('dim1',100)
dim2 = rootgrp.createDimension('dim2',100)
dim3 = rootgrp.createDimension('dim3',100)
ncVar = rootgrp.createVariable('ncVar','f4',('time','dim1','dim2','dim3'))
npArr = np.arange(0,10000)
npArr = np.reshape(npArr,(100,100))
So this works just fine:
x,y=np.array(([1,75,10,99],[40,88,19,2]))
ncVar[0,x,y,0] = npArr[x,y]
While this does not:
x,y=np.array(([1,75],[40,88]))
ncVar[0,x,y,0] = npArr[x,y]
These assignments are part of a dynamic loop that determines x, y to create values for ncVar at ~1000 time steps.
EDIT: the issue seems to be that the first case recognizes x, y as defining a series of points, and so returns an array of size [4,] (despite the documentation on netCDF4 'fancy indexing'), while the second interprets them combinatorially and so returns an array of size [2,2] (as stated in the documentation). Has anyone run into this or found a workaround?
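One possible workaround (a sketch, and slower than a vectorized assignment) is to assign point by point, which sidesteps the point-versus-orthogonal indexing mismatch entirely:
# assign one (i, j) point at a time so both sides index identically
for i, j in zip(x, y):
    ncVar[0, i, j, 0] = npArr[i, j]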

add_edges_from three tuples networkx

I am trying to use networkx to create a DiGraph. I want to use add_edges_from(), and I want the edges and their data to be generated from three tuples.
I am importing the data from a CSV file. I have three columns: one for ids (first set of nodes), one for a set of names (second set of nodes), and another for capacities (no headers in the file). So, I created a dictionary for the ids and capacities.
dictionary = dict(zip(id, capacity))
then I zipped the tuples containing the edges data:
List = zip(id, name, capacity)
but when I execute the next line, it gives me an assertion error.
G.add_edges_from(List, 'weight': 1)
Can someone help me with this problem? I have been trying for a week with no luck.
P.S. I'm a newbie in programming.
EDIT:
So, I found the following solution. I am honestly not sure how it works, but it did the job!
Here is the code:
import networkx as nx
import csv
G = nx.DiGraph()
capacity_dict = dict(zip(zip(id, name), capacity))
List = zip(id, name, capacity)
G.add_edges_from(capacity_dict, weight=1)
for u, v, d in List:
    G[u][v]['capacity'] = d
Now when I run:
G.edges(data=True)
The result will be:
[(2.0, 'First', {'capacity': 1.0, 'weight': 1}), (3.0, 'Second', {'capacity': 2.0, 'weight': 1})]
I am using the network simplex. Now, I am trying to find a way to make the output of the flowDict more understandable, because it only shows the ids of the flow. (Maybe I'll try to input them into a database and return the whole row of data instead of using only the ids.)
A few improvements on your version. (1) NetworkX algorithms assume that weight is 1 unless you specifically set it differently, so there is no need to set it explicitly in your case. (2) Using a generator allows the capacity attribute to be set explicitly, and other attributes can also be set once per record. (3) Processing each record as it comes through saves you from iterating through the whole list twice. The performance improvement is probably negligible on small datasets, but it feels more elegant. Having said that, your method clearly works!
import networkx as nx
import csv
# simulate a csv file.
# This makes a multi-line string behave as a file.
from StringIO import StringIO
filehandle = StringIO('''a,b,30
b,c,40
d,a,20
''')
# process each row in the file
# and generate an edge from each
def edge_generator(fh):
    reader = csv.reader(fh)
    for row in reader:
        row[-1] = float(row[-1])  # convert capacity to float
        # add other attributes to the dict() below as needed...
        # e.g. you might add weights here as well.
        yield (row[0],
               row[1],
               dict(capacity=row[2]))
# create the graph
G = nx.DiGraph()
G.add_edges_from(edge_generator(filehandle))
print G.edges(data=True)
Returns this:
[('a', 'b', {'capacity': 30.0}),
('b', 'c', {'capacity': 40.0}),
('d', 'a', {'capacity': 20.0})]
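As the comment in edge_generator suggests, other per-record attributes can be attached at the same point. For example (a small variant of the yield above, if you do want an explicit weight on every edge):
yield (row[0],
       row[1],
       dict(capacity=row[2], weight=1))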