Appending lists using multiprocessing pool - python-2.7

I am new to Python and to the multiprocessing module. I created a far simplified version of what I am trying to accomplish to distill my problem. The issue is that the lists don't seem to be updated when accessed outside of the worker processes where they are appended.
After researching, I thought it might have something to do with queues. However, I believe queues are more about sharing memory between processes, which I don't think is required in my situation, since each list can be appended independently.
from multiprocessing import Pool

def build(array):
    array.append("hello")
    return array

if __name__ == '__main__':
    x = ["yo", "sup"]
    y = ["blah", "blah"]
    z = ["apple", "banana"]
    w = ["cats", "dogs"]
    p = Pool(4)
    p.map(build, [x, y, z, w])
    p.close()
    p.join()
    print x, y, z, w
When I run the code above, it simply prints x, y, z, w as originally defined, without "hello" appended to each list, and I cannot figure out why. I know that if I put the print statement at the end of the build function, it will output the appended lists. I also realize that I could do the following:
results = p.map(build,[x,y,z,w])
print results
However, in my actual project I need to use x, y, z, and w later, and I would prefer not to index into results to get the list I am looking for. Is there any way to have the changes made to the lists stick, so to speak, outside of the worker processes?

Each process has its own memory heap, so your lists are copied into the pool workers' memory and are changed only there; the originals in the parent process are untouched.
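Since Pool.map returns the workers' return values in the same order as the inputs, one minimal fix (a sketch based on your own code, assuming build keeps returning the list) is to unpack the results back into the original names:

from multiprocessing import Pool

def build(array):
    array.append("hello")
    return array

if __name__ == '__main__':
    x = ["yo", "sup"]
    y = ["blah", "blah"]
    z = ["apple", "banana"]
    w = ["cats", "dogs"]
    p = Pool(4)
    # map returns the modified copies in input order,
    # so unpacking them restores the familiar names.
    x, y, z, w = p.map(build, [x, y, z, w])
    p.close()
    p.join()
    print x, y, z, w  # each list now ends with "hello"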


Get job from spooler - C++

With a printer that doesn't exist, I send different files to the spooler. In my software, I try to get all the files currently in the spooler's queue. For that, I tried the following call:
bool t = EnumJobs(hPrinter, 0, 1, 3, (LPBYTE) &h, sizeof(JOB_INFO_3), &pcbNeeded, &pcReturned);
I get the job ID in the 'JobId' field of the structure.
In the 'JOB_INFO_3' structure, the 'JobId' field is filled in correctly, but the 'NextJobId' field is not. Why?
I have the same problem when I execute the following call:
bool t = EnumJobs(hPrinter, 0, 3, 3, (LPBYTE) &h, sizeof(JOB_INFO_3), &pcbNeeded, &pcReturned);
Moreover, here the 'JobId' field is not filled in either. Why?
Also, I don't know how to get the info (file name, state, number of pages, etc.) of a particular job. I tried the following call, but it didn't work:
GetJobA(hPrinter, h.JobId, 1, (LPBYTE) &job_info_1, sizeof(JOB_INFO_1), &nbBytes)
And my last question is: Is it possible to get all the jobs from the spooler of the printer?
Do you have any solutions?
So, I'm not sure what the rest of your code looks like, but it looks possible that you're not using the API quite correctly. The MSDN documentation suggests that you should call the EnumJobs API twice.
To determine the required buffer size, call EnumJobs with cbBuf set to zero. EnumJobs fails, GetLastError returns ERROR_INSUFFICIENT_BUFFER, and the pcbNeeded parameter returns the size, in bytes, of the buffer required to hold the array of structures and their data.
https://msdn.microsoft.com/en-us/library/windows/desktop/dd162625(v=vs.85).aspx
The flow goes like this:
1. Call EnumJobs for the first time to see how much memory needs to be allocated for your JOB_INFO_n array.
2. Allocate the memory required for your JOB_INFO_n array.
3. Call EnumJobs again with your JOB_INFO_n array.
Looking at the call to EnumJobs where you attempt to get the first three jobs, the size of your buffer appears to be sizeof(JOB_INFO_3), whereas it should be at least three times that size in order to hold all three jobs. What is the return value of EnumJobs for that call?
The reason NextJobId is not filled in is likely a misunderstanding of the field. It is for print jobs that have been linked together, not for finding out which print job is next in the queue.
NextJobId - The print job identifier for the next print job in the linked set of print jobs.
https://msdn.microsoft.com/en-us/library/windows/desktop/dd145021(v=vs.85).aspx
As for the information about the print job, this is going to be difficult. Unfortunately, there is no way I know of to get the name/path of the file printed. There's no concept of this in the spooler APIs. Consider a print job which isn't backed by a file for example. The best you get is the print job name, which is set by the printing application.
For pages, there is a TotalPages field in the JOB_INFO_1 structure, which may be of some use to you. It looks like you're already trying to get the JOB_INFO_1 structure but are having some trouble. If the API is failing, you can use GetLastError() to identify what the issue is. Does the job ID you're passing in exist?
https://msdn.microsoft.com/en-us/library/windows/desktop/ms679360(v=vs.85).aspx
As for your last question about getting all print jobs from the queue, the MSDN documentation suggests the following:
To determine the number of print jobs in the printer queue, call the GetPrinter function with the Level parameter set to 2.
https://msdn.microsoft.com/en-us/library/windows/desktop/dd162625(v=vs.85).aspx
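Not C++, but for illustration, here is the same flow as a short pywin32 sketch written from memory (it assumes the pywin32 package is installed, and the printer name is a placeholder). pywin32 handles the two-call buffer-sizing dance internally and returns the JOB_INFO_1 entries as dictionaries:

import win32print

printer_name = "My Offline Printer"  # placeholder: use your queue's name
hPrinter = win32print.OpenPrinter(printer_name)
try:
    # GetPrinter at level 2 reports cJobs, the number of queued jobs.
    num_jobs = win32print.GetPrinter(hPrinter, 2)["cJobs"]
    # EnumJobs(hPrinter, FirstJob, NoJobs, Level)
    for job in win32print.EnumJobs(hPrinter, 0, num_jobs, 1):
        # pDocument is the job name set by the printing application,
        # not a file path; TotalPages comes from JOB_INFO_1.
        print job["JobId"], job["pDocument"], job["TotalPages"]
finally:
    win32print.ClosePrinter(hPrinter)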
Hope this helps.

Program Seems to Terminate Early

I get the feeling this is one of those really simple problems where there's something I just don't understand about the language. But I'm trying to learn Elixir, and my program isn't running all the way through. I've got a minimal example here.
defmodule Foo do
  def run(0) do
    IO.puts("0")
  end

  def run(n) do
    IO.puts(to_string n)
    run(n - 1)
  end

  def go do
    run(100)
  end
end

# Foo.go
# spawn &Foo.go/0
Now, if I uncomment the Foo.go line at the bottom and run the file with elixir minimal.exs, I get the intended output: all of the numbers from 100 down to 0. If I uncomment only the spawn &Foo.go/0 line, I consistently get no output at all.
However, if I uncomment both lines and run the program, I get the numbers from 100 down to 0 (from the first line), and then the first few numbers again (usually about 100 down to 96 or so) before the program terminates for some reason. So I really don't know what's causing the process to terminate at a random point.
It's worth pointing out that this confusion arose because I was trying to use mix to compile a larger project, and the program seemed to start, do a small part of its work, and then terminate, because mix apparently stops running after a bit. So I'm not sure what the idiomatic way to run an Elixir program is either, given that mix seems to terminate it after a short while anyway.
spawn/1 will create a new process to run the function. While they are not the same, you can sort of think of an Erlang/Elixir process as a thread in most other languages.
So, when you start your program, the "main" process gets to doing some work. In your case, it creates a new process (let's call it "Process A") to output the numbers from 100 down to 0. However, the problem is that spawn/1 does not block, meaning that the "main" process will keep executing and will not wait for "Process A" to return.
So what is happening is that your "main" process completes execution, which ends the entire program. This is normal for every language I have ever used.
If you wanted to spawn some work in a different process and make sure it finishes execution BEFORE ending your program, you have a couple different options.
You could use the Task module. Something along the lines of this should work.
task = Task.async(&Foo.go/0)
Task.await(task)
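Note that Task.await/1 defaults to a five-second timeout; if go/0 could run longer, pass an explicit timeout, e.g. Task.await(task, :infinity).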
You could explicitly send and receive messages:
defmodule Foo do
  def run(0, pid) do
    IO.puts("0")
    # This will send the message back to the "main" process upon completion.
    send pid, {:done, self()}
  end

  def run(n, pid) do
    IO.puts(to_string n)
    run(n - 1, pid)
  end

  # We now pass the pid of the "main" process into the go function.
  def go(pid) do
    run(100, pid)
  end
end

# Use spawn/3 instead so we can pass in the "main" process pid.
pid = spawn(Foo, :go, [self()])

# This will block until it receives a message matching this pattern.
receive do
  # The ^ is the pin operator. It ensures that we match against the same pid as before.
  {:done, ^pid} -> :done
end
There are other ways of achieving this. Unfortunately, without knowing more about the problem you are trying to solve, I can only make basic suggestions.
With all of that said, mix will not arbitrarily stop your running program; for whatever reason, the "main" process must have finished execution. Also, mix is a build tool, not really the way you should be running your application (though you can). Again, without knowing what you are attempting to do or seeing your code, I cannot give you anything more than this.
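(If you do run a long-lived program via mix, mix run --no-halt tells it not to halt the VM when the script itself finishes.)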

How to feed in and retrieve state of LSTM in tensorflow C/ C++

I'd like to build and train a multi-layer LSTM model (state_is_tuple=True) in Python, and then load and use it in C++. But I'm having a hard time figuring out how to feed and fetch the states in C++, mainly because I don't have string names which I can reference.
E.g. I put the initial state in a named scope such as
with tf.name_scope('rnn_input_state'):
    self.initial_state = cell.zero_state(args.batch_size, tf.float32)
and this appears in the graph as below, but how can I feed these in C++?
Also, how can I fetch the current state in C++? I tried the graph construction code below in Python, but I'm not sure if it's the right thing to do, because last_state should be a tuple of tensors, not a single tensor (though I can see that the last_state node in TensorBoard is 2x2x50x128, which sounds like it has just concatenated the states, as I have 2 layers, an rnn size of 128, a mini-batch size of 50, and an LSTM cell with 2 state vectors).
with tf.name_scope('outputs'):
    outputs, last_state = legacy_seq2seq.rnn_decoder(inputs, self.initial_state, cell,
                                                     loop_function=loop if infer else None)
    output = tf.reshape(tf.concat(outputs, 1), [-1, args.rnn_size], name='output')
and this is what it looks like in tensorboard
Should I concat and split the state tensors so there is only ever one state tensor going in and out? Or is there a better way?
P.S. Ideally the solution won't involve hard-coding the number of layers (or rnn size). So I can just have four strings input_node_name, output_node_name, input_state_name, output_state_name, and the rest is derived from there.
I managed to do this by manually concatenating the states into a single tensor. I'm not sure if this is wise, since this is how TensorFlow used to handle states, but it is now deprecating that approach and switching to tuple states. Instead of setting state_is_tuple=False and risking my code becoming obsolete soon, I've added extra ops to manually stack and unstack the states to and from a single tensor. That said, it works fine both in Python and C++.
The key code is:
# setting up
zero_state = cell.zero_state(batch_size, tf.float32)
state_in = tf.identity(zero_state, name='state_in')

# based on https://medium.com/@erikhallstrm/using-the-tensorflow-multilayered-lstm-api-f6e7da7bbe40#.zhg4zwteg
state_per_layer_list = tf.unstack(state_in, axis=0)
state_in_tuple = tuple(
    # TODO make this not hard-coded to LSTM
    [tf.contrib.rnn.LSTMStateTuple(state_per_layer_list[idx][0], state_per_layer_list[idx][1])
     for idx in range(num_layers)]
)

outputs, state_out_tuple = legacy_seq2seq.rnn_decoder(inputs, state_in_tuple, cell,
                                                      loop_function=loop if infer else None)
state_out = tf.identity(state_out_tuple, name='state_out')

# running (training or inference)
state = sess.run('state_in:0')  # fetch the zero state to start from
while True:  # training / inference loop
    feed = {'data_in:0': x, 'state_in:0': state}
    y, state = sess.run(['data_out:0', 'state_out:0'], feed)
Here is the full code if anyone needs it
https://github.com/memo/char-rnn-tensorflow

TensorFlow apply_gradients remotely

I'm trying to split up the minimize function over two machines. On one machine I call compute_gradients; on the other I call apply_gradients with gradients that were sent over the network. The issue is that calling apply_gradients(...).run(feed_dict) doesn't seem to work no matter what I do. I've tried inserting placeholders in place of the tensor gradients for apply_gradients:
variables = [W_conv1, b_conv1, W_conv2, b_conv2, W_fc1, b_fc1, W_fc2, b_fc2]
loss = -tf.reduce_sum(y_ * tf.log(y_conv))
optimizer = tf.train.AdamOptimizer(1e-4)
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
compute_gradients = optimizer.compute_gradients(loss, variables)

placeholder_gradients = []
for grad_var in compute_gradients:
    placeholder_gradients.append((tf.placeholder('float', shape=grad_var[1].get_shape()),
                                  grad_var[1]))
apply_gradients = optimizer.apply_gradients(placeholder_gradients)
then later, when I receive the gradients, I call:
feed_dict = {}
for i, grad_var in enumerate(compute_gradients):
    feed_dict[placeholder_gradients[i][0]] = tf.convert_to_tensor(gradients[i])
apply_gradients.run(feed_dict=feed_dict)
However, when I do this, I get
ValueError: setting an array element with a sequence.
This is only the latest thing I've tried; I've also tried the same solution without placeholders, as well as waiting to create the apply_gradients operation until I receive the gradients, which resulted in non-matching graph errors.
Any help on which direction I should go with this?
Assuming that each gradients[i] is a NumPy array that you've fetched using some out-of-band mechanism, the fix is simply to remove the tf.convert_to_tensor() invocation when building feed_dict:
feed_dict = {}
for i, grad_var in enumerate(compute_gradients):
    feed_dict[placeholder_gradients[i][0]] = gradients[i]
apply_gradients.run(feed_dict=feed_dict)
Each value in a feed_dict should be a NumPy array (or some other object that is trivially convertible to a NumPy array). In particular, a tf.Tensor is not a valid value for a feed_dict.
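For completeness, here is a sketch of what the surrounding out-of-band transfer might look like, assuming the gradients travel as plain NumPy arrays; the pickling scheme and the train_feed name are illustrative assumptions, not part of the question:

import pickle

# On the machine that computes gradients: fetch concrete NumPy
# arrays for each (gradient, variable) pair, then serialize them.
grad_values = sess.run([g for g, v in compute_gradients], feed_dict=train_feed)
payload = pickle.dumps(grad_values)
# ... send `payload` over the network by whatever transport you use ...

# On the machine that applies gradients: deserialize and feed the
# NumPy arrays straight into the placeholders (no convert_to_tensor).
gradients = pickle.loads(payload)
feed_dict = {placeholder_gradients[i][0]: gradients[i]
             for i in range(len(placeholder_gradients))}
apply_gradients.run(feed_dict=feed_dict)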

gnuradio: how to change the noutput_items dynamically when writing OOT block?

When I make an OOT block in GNU Radio like this:
class mod(gr.sync_block):
    """
    docstring for block mod
    """
    def __init__(self):
        gr.sync_block.__init__(self,
            name="mod",
            in_sig=[np.byte],
            out_sig=[np.complex64])

    def work(self, input_items, output_items):
        in0 = input_items[0]
        out = output_items[0]
        result = do(....)
        out[:] = result
        return len(output_items[0])
I get:
ValueError: could not broadcast input array from shape (122879) into shape (4096)
How can I solve it?
The GRC flow graph is as below:
selector: the input index and output index are controlled by a WX GUI Chooser block
FSK4 MOD: modulates the FSK4 signal and writes data to raw.bin
FSK4 DEMOD: reads data from raw.bin and demodulates

file source -> selector -> FSK4 MOD -> FSK4 DEMOD -> null sink
file source -> selector -> GMSK MOD -> GMSK DEMOD -> null sink

When the input index or output index is changed, the whole flow graph stops responding.
There are two things:
You have a bug somewhere, and the solution is not to change something else, but to fix that bug. The full Python error message will tell you exactly which line the error is in.
noutput_items is a variable that GNU Radio sets at runtime to let you know how much output you may produce in this call to work. Hence, it's not something you can set; it's something your work method must respect.
I think it's fair to assume that you're not very aware of how GNU Radio works:
GNU Radio is based on calling your block's work function when there is enough output space available and enough input items to process. The amount of output space that your block can use is passed to your work as a parameter, and will change between calls to work.
I very strongly recommend going through chapters 1-3 of the official Guided Tutorials if you haven't already. We always try to keep these tutorials up-to-date.
EDIT: Your comment shows that you have not really understood what I meant, sorry. So: GNU Radio calls your work method over and over again while the flow graph is executing.
For example, it might call work with 4000 input items and space for 4000 output items (you have a sync block, therefore the number of input items == the number of output items). Your work processes the first 1000 of those, and therefore returns 1000. So there are 3000 items left.
Now, the upstream block produces something, so there are 100 new items. Because the 3000 from before are still there, your block's work will get called with 3100 items.
Your work processes any number of items and returns that number. GNU Radio makes sure that the "remaining" items stay available and will call your work again when there is enough input and output space.
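To make that concrete, here is a sketch of a work method that respects the buffer sizes the scheduler hands it; your do(...) stands in for the actual modulation, and the sketch assumes it produces one output item per input item (a sync block cannot express anything else):

def work(self, input_items, output_items):
    in0 = input_items[0]
    out = output_items[0]
    # len(out) is the noutput_items the scheduler granted this call;
    # only consume and produce what actually fits.
    n = min(len(in0), len(out))
    out[:n] = do(in0[:n])  # assumption: do() is 1:1 in items
    return n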