How to pass several previous states using scan in TensorFlow

I'm modifying the DRAW (Deep Recurrent Attentive Writer) code that someone else shared here so that it handles variable-length sequences using the tf.scan function. To do that, I need to change the for loop in the original code into a structure that suits scan. Below is the relevant part of the original code:
...
for t in range(T):
    c_prev = tf.zeros((batch_size,img_size)) if t==0 else cs[t-1]
    x_hat=x-tf.sigmoid(c_prev) # error image
    r=read(x,x_hat,h_dec_prev)
    h_enc,enc_state=encode(enc_state,tf.concat(1,[r,h_dec_prev]))
    z,mus[t],logsigmas[t],sigmas[t]=sampleQ(h_enc)
    h_dec,dec_state=decode(dec_state,z)
    cs[t]=c_prev+write(h_dec) # store results
    h_dec_prev=h_dec
    DO_SHARE=True # from now on, share variables
...
In order to use tf.scan, I need to pass several previous states (c_prev, h_dec_prev, ...). However, as far as I know, tf.scan only takes one tensor for the loop (is that right?), as in this example:
elems = np.array([1, 2, 3, 4, 5, 6])
sum = scan(lambda a, x: a + x, elems)
It seems there can be only one accumulator a, and it must be a tensor. In that case, the only way I can imagine is to flatten the different state tensors and concatenate them. But I'm worried that this will clutter the code and slow it down considerably, especially when the state sizes are all different. Is there an efficient (and fast) way to handle this kind of problem?
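For what it's worth, the scan pattern itself does not require a single tensor: the accumulator can be a tuple that the step function unpacks and repacks. As far as I know, tf.scan supports this when a tuple is passed as its initializer argument, in which case fn has to return the same structure. The sketch below is framework-free Python rather than the TensorFlow API; scan and step are hypothetical stand-ins, and the arithmetic only takes the place of the DRAW decoder and canvas updates.

```python
def scan(fn, elems, initializer):
    """Minimal stand-in for a scan: threads `state` through `fn`
    and records the state after every element."""
    state = initializer
    history = []
    for x in elems:
        state = fn(state, x)
        history.append(state)
    return history


def step(state, x):
    # Unpack several previous states from one tuple accumulator
    c_prev, h_prev = state
    h = h_prev + x   # placeholder for the decoder update
    c = c_prev + h   # placeholder for the canvas update
    # Return the same structure so the next step can unpack it again
    return (c, h)


history = scan(step, [1, 2, 3], initializer=(0, 0))
print(history)  # [(1, 1), (4, 3), (10, 6)]
```

Because each state keeps its own slot in the tuple, nothing has to be flattened or concatenated, and the states are free to have different sizes.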

Related

Is there a way to pass which "level" of structure is desired to a formula in (Arduino) C++?

I am not hugely experienced in C++ coding, but I learn pretty well as I go. However, I have not been able to search for how to do this properly; I may be using the wrong terms or not expressing what I want clearly. Here's the situation.
I have a lot of variables (3x12) that I have set up under a structure:
struct Tracking
{
    String Title;
    BoolArray n24hr;
    bool State;
    unsigned char Days, Weeks;
    uint16_t Minutes, TotalMinutes, Daily, Weekly, Monthly, n7d[7], n4w[4];
} Components[3];
I also have code that performs basically the same thing 3 times but on different "levels", e.g. daily, weekly, monthly. It keeps track of status over those time periods, filling arrays, finding totals and duty cycles, etc. It fills the minutes into days, and when that reaches a week, it puts the totals into a week format, and repeats until it reaches monthly levels. So basically, I have it doing something like:
in my main loop:
//calls status formula
StatusFormula();
in a separate file:
//status formula defined
void StatusFormula()
{
    // for each element of Components:
    //     determine current status
    //     for daily:
    //         add it to the correct spot in the array
    //         perform calculations on it
    //     when it reaches a week:
    //         add it to the correct spot in the next array
    //         perform calculations on it
    //     when it reaches a month:
    //         add it to the correct spot in the next array
    //         perform calculations on it
}
These calculations are all basically the same, the only differences are the structure member names & the constants for the calculations (i.e., MinsADay, DaysAWk, etc.).
I can get it to work this way, it just means a lot more lines and if I want to change something, I have to repeat it 3 times. What I want is something like this:
in my main loop:
//calls status formula
StatusFormula("Daily"); // tells the status formula which level (daily, weekly, monthly) it is supposed to work on
if (Components[i].Minutes == MinsADay)
{
    StatusFormula("Weekly"); // same call, now for the weekly level
    if (Components[i].Daily == DaysAWk)
    {
        StatusFormula("Monthly");
    }
}
in a separate file:
//status formula defined
void StatusFormula()
{
    // determine which level & variables to use (I would probably use case for this), then
    // add it to the correct spot in the correct array
    // perform calculations on it
}
I tried passing the level using a string, but it didn't work:
in my main loop:
StatusFormula(i, "Daily"); //sending data to formula, where i is value 0 to 2 for the Components array & defined earlier in the for loop.
in a separate file:
//formula defined as:
void StatusFormula(uint8_t counter, string level)
{Components[counter].level -= //etc... performing calculations as desired.
//so I thought this should evaluate to "Components[i].Daily -=" (& i would be a value 0 to 2) & treat it like the structure, but it doesn't work that way apparently.
I tried passing the structure & variable itself, but that didn't work either:
in my main loop:
StatusFormula(i, Components[i].Daily); //sending data to formula
in a separate file:
//formula defined as:
void StatusFormula(uint8_t counter, Tracking& level)
{level -= //etc... //(level should be Components[i].Daily -=" (& i would be a value 0 to 2)) this didn't work either.
I couldn't find any google searches to help me, and I trialed-and-errored a bunch of ways, but I couldn't figure out how to do that in C++, let alone on the Arduino platform. In Excel VBA, I would just have the variable passed as a string to the formula, which would substitute the word and then treat it like the variable that it is, but I couldn't make that happen in C++. Also to note, I am going to try and define this a separate file/tab so that my massive code file is easier to read/edit, in case that makes a difference. I would paste my code directly, but it is long and super confusing.
I guess what I am asking is how would I pass the structure and/or structure member to the formula in a way that would say the equivalent of:
case 1: //"Daily"
//use Components[i].Daily & Components[i].Minutes & MinsaDay
break;
case 2: //"Weekly"
//use Components[i].Weekly & Components[i].Days & DaysaWk
break;
//etc.
I feel like there should be a way & that I am just missing a small, vital piece. Several people in the comments suggested enums, and after researching, it might possibly be what I want, but I am having trouble visualizing it at the moment and need to do more research and examples. Any suggestions or examples on how to send the appropriate structure & members to the formula to be modified in it?

Declaring variables in Python 2.7x to avoid issues later

I am new to Python, coming from MATLAB, and long ago from C. I have written a script in MATLAB which simulates sediment transport in rivers as a Markov process. The code randomly places circles of a random diameter within a rectangular area of a specified dimension. The circles are non-uniform in size, drawn randomly from a specified range of sizes. I do not know how many times I will step through the circle placement operation, so I use a while loop to complete the process. In an attempt to be more community oriented, I am translating the MATLAB script to Python. I used the online tool OMPC to get started, and have been working through it manually from the auto-translated version (which was not that helpful, which is not surprising). To debug the code as I go, I compare the MATLAB-generated results against the results in Python.
It seems clear to me that I have declared variables in a way that introduces problems as calculations proceed in the script. Here are two examples of problems that occur consistently across different runs of the code. First, the code generated what I think are arrays within arrays, because the script is returning results which look like:
array([[ True]
[False]], dtype=bool)
This result was generated for the following code snippet at the overlap_logix operation:
CenterCoord_Array = np.asarray(CenterCoordinates)
Diameter_Array = np.asarray(Diameter)
dist_check = ((CenterCoord_Array[:,0] - x_Center) ** 2 + (CenterCoord_Array[:,1] - y_Center) ** 2) ** 0.5
radius_check = (Diameter_Array / 2) + radius
radius_check_update = np.reshape(radius_check,(len(radius_check),1))
radius_overlap = (radius_check_update >= dist_check)
# Now actually check the overlap condition.
if np.sum([radius_overlap]) == 0:
    # The new circle does not overlap so proceed.
    newCircle_Found = 1
    debug_value = 2
elif np.sum([radius_overlap]) == 1:
    # The new circle overlaps with one other circle
    overlap = np.arange(0,len(radius_overlap), dtype=int)
    overlap_update = np.reshape(overlap,(len(overlap),1))
    overlap_logix = (radius_overlap == 1)
    idx_true = overlap_update[overlap_logix]
    radius = dist_check(idx_true,1) - (Diameter(idx_true,1) / 2)
A similar result for the same run was produced for variables:
radius_check_update
radius_overlap
overlap_update
Here is the same code snippet for the working MATLAB version (as requested):
distcheck = ((Circles.CenterCoordinates(1,:)-x_Center).^2 + (Circles.CenterCoordinates(2,:)-y_Center).^2).^0.5;
radius_check = (Circles.Diameter ./ 2) + radius;
radius_overlap = (radius_check >= distcheck);
% Now actually check the overlap condition.
if sum(radius_overlap) == 0
    % The new circle does not overlap so proceed.
    newCircle_Found = 1;
    debug_value = 2;
elseif sum(radius_overlap) == 1
    % The new circle overlaps with one other circle
    temp = 1:size(radius_overlap,2);
    idx_true = temp(radius_overlap == 1);
    radius = distcheck(1,idx_true) - (Circles.Diameter(1,idx_true)/2);
In the Python version I have created arrays from lists to more easily operate on the contents (the first two lines of the code snippet). The array within array result and creating arrays to access data suggests to me that I have incorrectly declared variable types, but I am not sure. Furthermore, some variables have a size, for example, (2L,) (the numerical dimension will change as circles are placed) where there is no second dimension. This produces obvious problems when I try to use the array in an operation with another array with a size (2L,1L). Because of these problems I started reshaping arrays, and then I stopped because I decided these were hacks because I had declared one, or more than one variable incorrectly. Second, for the same run I encountered the following error:
TypeError: 'numpy.ndarray' object is not callable
for the operation:
radius = dist_check(idx_true,1) - (Diameter(idx_true,1) / 2)
which occurs at the bottom of the above code snippet. I have posted the entire script at the following link because it is probably more useful to execute the script for oneself:
https://github.com/smchartrand/MarkovProcess_Bedload
I have set-up the code to run with some initial parameter values so decisions do not need to be made; these parameter values produce the expected results in the MATLAB-based script, which look something like this when plotted:
So, I seem to specifically be having issues with operations on lines 151-165, depending on the test value np.sum([radius_overlap]) and I think it is because I incorrectly declared variable types, but I am really not sure. I can say with confidence that the Python version and the MATLAB version are consistent in output through the first step of the while loop, and code line 127 which is entering the second step of the while loop. Below this point in the code the above documented issues eventually cause the script to crash. Sometimes the script executes to 15% complete, and sometimes it does not make it to 5% - this is due to the random nature of circle placement. I am preparing the code in the Spyder (Python 2.7) IDE and will share the working code publicly as a part of my research. I would greatly appreciate any help that can be offered to identify my mistakes and misapplications of python coding practice.
I believe I have answered my own question, and maybe it will be of use for someone down the road. The main sources of instruction for me can be found at the following three web pages:
Stackoverflow Question 176011
SciPy FAQ
SciPy NumPy for Matlab users
The third web page was very helpful for me coming from MATLAB. Here is the modified and working python code snippet which relates to the original snippet provided above:
dist_check = ((CenterCoordinates[0,:] - x_Center) ** 2 + (CenterCoordinates[1,:] - y_Center) ** 2) ** 0.5
radius_check = (Diameter / 2) + radius
radius_overlap = (radius_check >= dist_check)
# Now actually check the overlap condition.
if np.sum([radius_overlap]) == 0:
    # The new circle does not overlap so proceed.
    newCircle_Found = 1
    debug_value = 2
elif np.sum([radius_overlap]) == 1:
    # The new circle overlaps with one other circle
    overlap = np.arange(0,len(radius_overlap[0]), dtype=int).reshape(1, len(radius_overlap[0]))
    overlap_logix = (radius_overlap == 1)
    idx_true = overlap[overlap_logix]
    radius = dist_check[idx_true] - (Diameter[0,idx_true] / 2)
In the end it was clear to me that, for this example, it was more straightforward to use numpy arrays rather than lists to store the results of each iteration of filling the rectangular area. For the corrected code snippet this means I initialized the variables CenterCoordinates and Diameter as numpy arrays, whereas I initialized them as lists in the posted question. This made a few mathematical operations more straightforward. I was also incorrectly indexing into variables with parentheses () as opposed to the correct method using brackets []. Here is an example of a correction I made which helped the code execute as envisioned:
Incorrect: radius = dist_check(idx_true,1) - (Diameter(idx_true,1) / 2)
Correct: radius = dist_check[idx_true] - (Diameter[0,idx_true] / 2)
This example also shows that I had issues with array dimensions which I corrected variable by variable. I am still not sure if my working code is the most pythonic or most efficient way to fill a rectangular area in a random fashion, but I have tested it about 100 times with success. The revised and working code can be downloaded here:
Working Python Script to Randomly Fill Rectangular Area with Circles
Here is an image of the final result for a successful run of the working code:
The main lessons for me were (1) numpy arrays are more efficient for repetitive numerical calculations, and (2) dimensionality of arrays which I created were not always what I expected them to be and care must be practiced when establishing arrays. Thanks to those who looked at my question and asked for clarification.
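To make the second lesson concrete, here is a small sketch of how a (2,) array and a (2,1) array interact, using made-up numbers rather than values from the actual script. It assumes only NumPy; the variable names mirror the snippet above, but the data are hypothetical.

```python
import numpy as np

# Hypothetical values for two already-placed circles
dist_check = np.array([3.0, 5.0])     # shape (2,)
radius_check = np.array([4.0, 4.0])   # shape (2,)

# Matching shapes: element-wise comparison, result has shape (2,)
radius_overlap = radius_check >= dist_check

# Reshaping one side into a column makes NumPy broadcast the
# comparison into a 2x2 grid, which is where the "array within
# array" output comes from
column = radius_check.reshape(-1, 1)  # shape (2, 1)
grid = column >= dist_check           # shape (2, 2)

# Indexing uses brackets, and the indices of the True entries
# can be read off directly from the boolean array
idx_true = np.flatnonzero(radius_overlap)
radius = dist_check[idx_true] - 1.0   # 1.0 stands in for Diameter / 2

print(radius_overlap.shape, grid.shape, idx_true)
```

Printing the .shape of each intermediate array like this is a quick way to catch an unintended broadcast before it propagates through later calculations.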

K-Means Algorithm not working properly

I was trying to write my own K-Means clustering algorithm, but it is not working. Can someone take a look and help me find what mistake I am making? I am fairly new to this.
I expect the data to be clustered into 2 groups since K=2. However, I am not getting the expected result. I think the mean assignment is not working properly. Can someone take a look?
https://github.com/DivJ/Robo_Lab/blob/master/K_Means.py
dist=[]
lab=[]
x_sum,y_sum=0,0
x_sum1,y_sum1=0,0
k=2
mean=pt[:k]
def assignment():
    global dist
    global lab
    for i in range(0,100):
        for j in range(0,k):
            dist.append(math.hypot(pt[i,0]-mean[j,0],pt[i,1]-mean[j,1]))
        lab.append(dist.index(min(dist)))
        dist=[]
def mean_shift():
    global x_sum,x_sum1,y_sum,y_sum1,lab
    for i in range(0,100):
        if(lab[i]==0):
            plt.scatter(pt[i,0],pt[i,1],c='r')
            x_sum=pt[i,0]+x_sum
            y_sum=pt[i,1]+y_sum
        elif(lab[i]==1):
            plt.scatter(pt[i,0],pt[i,1],c='b')
            x_sum1=pt[i,0]+x_sum1
            y_sum1=pt[i,1]+y_sum1
    mean[0,0]=x_sum/lab.count(0)
    mean[0,1]=y_sum/lab.count(0)
    mean[1,0]=x_sum1/lab.count(1)
    mean[1,1]=y_sum1/lab.count(1)
    lab=[]
def k_means(itr):
    for z in range(0,itr):
        assignment()
        mean_shift()
k_means(100)
Here's what's wrong with your code:
1) You initialize mean as pt[:k], but later you assign new values to mean. Since mean merely refers to the same data as pt (slicing a NumPy array gives a view, not a copy), this unintentionally overwrites the first two points. You need to create a copy of the first two points to avoid changing them:
import copy
mean=copy.copy(pt[:k])
2) You initialize x_sum, y_sum, x_sum1 and y_sum1 outside of mean_shift(), so the sums keep growing with each iteration. Set them to 0 every time you call mean_shift().
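Both fixes can be seen together in the following sketch. It is not the asker's script: it assumes pt is a float NumPy array of shape (n, 2), drops the plotting, and replaces the global running sums with a per-iteration mean, so the function name and the sample data are purely illustrative.

```python
import numpy as np


def k_means(pt, k=2, iterations=100):
    # Fix 1: copy the first k points; a plain slice would be a view,
    # and updating the means would overwrite the data itself
    mean = pt[:k].copy()
    for _ in range(iterations):
        # Assignment step: distance from every point to every mean
        dists = np.linalg.norm(pt[:, None, :] - mean[None, :, :], axis=2)
        lab = dists.argmin(axis=1)
        # Fix 2: each mean is recomputed from scratch on every
        # iteration, so nothing accumulates across iterations
        for j in range(k):
            members = pt[lab == j]
            if len(members) > 0:
                mean[j] = members.mean(axis=0)
    return mean, lab


# Illustrative data: two well-separated groups of three points
pt = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 1.0],
               [1.0, 0.0], [10.0, 11.0], [11.0, 10.0]])
original = pt.copy()
mean, lab = k_means(pt, k=2, iterations=10)
print(lab)   # [0 1 0 0 1 1]
print(mean)
```

Comparing pt against a saved copy after the run, as the sample data allows, is an easy check that the input points were not modified.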

How to feed in and retrieve state of LSTM in tensorflow C/ C++

I'd like to build and train a multi-layer LSTM model (state_is_tuple=True) in Python, and then load and use it in C++. But I'm having a hard time figuring out how to feed and fetch states in C++, mainly because I don't have string names which I can reference.
E.g. I put the initial state in a named scope such as
with tf.name_scope('rnn_input_state'):
    self.initial_state = cell.zero_state(args.batch_size, tf.float32)
and this appears in the graph as below, but how can I feed to these in C++?
Also, how can I fetch the current state in C++? I tried the graph construction code below in python but I'm not sure if it's the right thing to do, because last_state should be a tuple of tensors, not a single tensor (though I can see that the last_state node in tensorboard is 2x2x50x128, which sounds like it just concatenated the states as I have 2 layers, 128 rnn size, 50 mini batch size, and lstm cell - with 2 state vectors).
with tf.name_scope('outputs'):
    outputs, last_state = legacy_seq2seq.rnn_decoder(inputs, self.initial_state, cell, loop_function=loop if infer else None)
    output = tf.reshape(tf.concat(outputs, 1), [-1, args.rnn_size], name='output')
and this is what it looks like in tensorboard
Should I concat and split the state tensors so there is only ever one state tensor going in and out? Or is there a better way?
P.S. Ideally the solution won't involve hard-coding the number of layers (or rnn size). So I can just have four strings input_node_name, output_node_name, input_state_name, output_state_name, and the rest is derived from there.
I managed to do this by manually concatenating the state into a single tensor. I'm not sure if this is wise, since this is how TensorFlow used to handle states, but it is now deprecating that and switching to tuple states. Instead of setting state_is_tuple=False and risking my code becoming obsolete soon, I've added extra ops to manually stack and unstack the states to and from a single tensor. That said, it works fine both in Python and C++.
The key code is:
# setting up
zero_state = cell.zero_state(batch_size, tf.float32)
state_in = tf.identity(zero_state, name='state_in')
# based on https://medium.com/#erikhallstrm/using-the-tensorflow-multilayered-lstm-api-f6e7da7bbe40#.zhg4zwteg
state_per_layer_list = tf.unstack(state_in, axis=0)
state_in_tuple = tuple(
    # TODO make this not hard-coded to LSTM
    [tf.contrib.rnn.LSTMStateTuple(state_per_layer_list[idx][0], state_per_layer_list[idx][1])
     for idx in range(num_layers)]
)
outputs, state_out_tuple = legacy_seq2seq.rnn_decoder(inputs, state_in_tuple, cell, loop_function=loop if infer else None)
state_out = tf.identity(state_out_tuple, name='state_out')
# running (training or inference)
state = sess.run('state_in:0') # zero state
loop:
    feed = {'data_in:0': x, 'state_in:0': state}
    [y, state] = sess.run(['data_out:0', 'state_out:0'], feed)
Here is the full code if anyone needs it
https://github.com/memo/char-rnn-tensorflow
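The packing and unpacking described above can be illustrated without TensorFlow. The sketch below uses NumPy arrays as stand-ins for the state tensors, with the sizes mentioned in the question (2 layers, batch size 50, rnn size 128); plain (c, h) tuples take the place of LSTMStateTuple, so this only shows the shape bookkeeping, not the graph ops.

```python
import numpy as np

num_layers, batch_size, rnn_size = 2, 50, 128

# Tuple state: one (c, h) pair per layer, as a multi-layer LSTM
# with tuple states would hold it
state_tuple = tuple(
    (np.zeros((batch_size, rnn_size)), np.ones((batch_size, rnn_size)))
    for _ in range(num_layers)
)

# Pack: stack c and h per layer, then stack the layers, giving one
# array of shape (num_layers, 2, batch_size, rnn_size)
state = np.stack([np.stack(pair) for pair in state_tuple])

# Unpack: split along the layer axis, then into c and h again
restored = tuple((layer[0], layer[1]) for layer in state)

print(state.shape)  # (2, 2, 50, 128)
```

The packed shape matches the 2x2x50x128 node seen in TensorBoard, and the number of layers can be recovered from the first dimension instead of being hard-coded.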

TensorFlow apply_gradients remotely

I'm trying to split up the minimize function over two machines. On one machine, I'm calling "compute_gradients", on another I call "apply_gradients" with gradients that were sent over the network. The issue is that calling apply_gradients(...).run(feed_dict) doesn't seem to work no matter what I do. I've tried inserting placeholders in place of the tensor gradients for apply_gradients,
variables = [W_conv1, b_conv1, W_conv2, b_conv2, W_fc1, b_fc1, W_fc2, b_fc2]
loss = -tf.reduce_sum(y_ * tf.log(y_conv))
optimizer = tf.train.AdamOptimizer(1e-4)
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
compute_gradients = optimizer.compute_gradients(loss, variables)
placeholder_gradients = []
for grad_var in compute_gradients:
    placeholder_gradients.append((tf.placeholder('float', shape=grad_var[1].get_shape()) ,grad_var[1]))
apply_gradients = optimizer.apply_gradients(placeholder_gradients)
then later when I receive the gradients I call
feed_dict = {}
for i, grad_var in enumerate(compute_gradients):
    feed_dict[placeholder_gradients[i][0]] = tf.convert_to_tensor(gradients[i])
apply_gradients.run(feed_dict=feed_dict)
However, when I do this, I get
ValueError: setting an array element with a sequence.
This is only the latest thing I've tried. I've also tried the same solution without placeholders, as well as waiting to create the apply_gradients operation until I receive the gradients, which results in non-matching graph errors.
Any help on which direction I should go with this?
Assuming that each gradients[i] is a NumPy array that you've fetched using some out-of-band mechanism, the fix is simply to remove the tf.convert_to_tensor() invocation when building feed_dict:
feed_dict = {}
for i, grad_var in enumerate(compute_gradients):
    feed_dict[placeholder_gradients[i][0]] = gradients[i]
apply_gradients.run(feed_dict=feed_dict)
Each value in a feed_dict should be a NumPy array (or some other object that is trivially convertible to a NumPy array). In particular, a tf.Tensor is not a valid value for a feed_dict.
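The question does not say how the gradients travel between machines, so the following is only one possible sketch of such an out-of-band mechanism: it serializes a list of NumPy arrays to bytes and rebuilds them on the other side, ready to be used as feed_dict values. The helper names pack_gradients and unpack_gradients are hypothetical, and the array shapes are made up for illustration.

```python
import io

import numpy as np


def pack_gradients(gradients):
    """Serialize a list of NumPy arrays into bytes (sender side)."""
    buffer = io.BytesIO()
    # Positional arrays are stored under the keys arr_0, arr_1, ...
    np.savez(buffer, *gradients)
    return buffer.getvalue()


def unpack_gradients(payload):
    """Rebuild the list of NumPy arrays from bytes (receiver side)."""
    with np.load(io.BytesIO(payload)) as archive:
        return [archive["arr_%d" % i] for i in range(len(archive.files))]


# Made-up gradients for one weight tensor and one bias vector
sent = [np.ones((5, 5, 1, 32), dtype=np.float32),
        np.zeros(32, dtype=np.float32)]
received = unpack_gradients(pack_gradients(sent))

# The received values are plain ndarrays with the original shapes
print([g.shape for g in received])
```

Checking each received shape against the corresponding placeholder shape before calling run can surface a mismatch earlier than the feed itself would.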