Why is the LLVM IR generated by Numba for vector addition so complex?

I wanted to inspect the LLVM IR that Numba generates for a vector addition and noticed that it emits a lot of IR for a simple add. I was hoping for a simple "add" instruction, but it produces around 2000 lines of LLVM IR. Is there a way to get minimal code?
from numba import jit
import numpy as np

@jit(nopython=True, nogil=True)
def mysum(a, b):
    return a + b

a, b = 1.3 * np.ones(5), 2.2 * np.ones(5)
mysum(a, b)

# Get the LLVM IR
llvm_ir = list(mysum.inspect_llvm().values())[0]
print(llvm_ir)
with open("llvm_ir.ll", "w") as file:
    file.write(llvm_ir)

# Get the assembly code
asm = list(mysum.inspect_asm().values())[0]
print(asm)
with open("llvm_ir.asm", "w") as file:
    file.write(asm)

Related

TensorFlow scatter_nd function is not working with placeholder and complex input

I am using tf.scatter_nd to update complex values at some index.
It seems the real and imaginary parts are somehow getting added together by this function. My question is how to make it work with placeholders. Here is a minimal working example where the variables b and e should have the same values.
import tensorflow as tf
import numpy as np

tf.reset_default_graph()

update = np.asarray([1. + 2j])
idx = tf.constant([[0]])
shp = tf.constant([1])

# Works with constants
a = tf.constant(update)
b = tf.scatter_nd(idx, a, shp)
with tf.Session() as sess:
    print sess.run(b)  # correct output: 1.+2j

# Does not work with placeholders
d = tf.placeholder(tf.complex128)
e = tf.scatter_nd(idx, d, shp)
with tf.Session() as sess:
    print sess.run(e, feed_dict={d: update})  # WRONG output: 3.+0j
I am using Anaconda python 2.7 + TensorFlow 1.7 GPU version installed using the conda command.
Edit:
The issue occurs when running the code on the GPU; the CPU version works correctly.
Here is the updated code to reproduce the issue in TensorFlow-GPU 1.8 installed using Anaconda Python 2.7.
import tensorflow as tf
import numpy as np

tf.reset_default_graph()

update = np.asarray([1. + 2j])
idx = tf.constant([[0]])
shp = tf.constant([1])

a = tf.placeholder(tf.complex128)
with tf.device("/cpu:0"):
    b = tf.scatter_nd(idx, a, shp)
with tf.device("/gpu:0"):
    c = tf.scatter_nd(idx, a, shp)

with tf.Session() as sess:
    print 'Correct output on CPU', sess.run(b, feed_dict={a: update})
    print 'Wrong output on GPU', sess.run(c, feed_dict={a: update})
I saw this thread and this thread but could not figure out how to resolve the issue. Is there any alternative to tf.scatter_nd that will run on the GPU?

Loading a graph from a .meta file in TensorFlow C++ for inference

I have trained some models using TensorFlow 1.5.1 and I have the checkpoints for those models (including .ckpt and .meta files). Now I want to do inference in C++ using those files.
In Python, I would do the following to save and load the graph and the checkpoints.
For saving:
images = tf.placeholder(...)  # the input layer
# ... the graph definition ...
output = tf.nn.softmax(net)  # the output layer
tf.add_to_collection('images', images)
tf.add_to_collection('output', output)
For inference I restore the graph and the checkpoint, then retrieve the input and output layers from the collections like so:
meta_file = './models/last-100.meta'
ckpt_file = './models/last-100'

with tf.Session() as sess:
    saver = tf.train.import_meta_graph(meta_file)
    saver.restore(sess, ckpt_file)
    images = tf.get_collection('images')
    output = tf.get_collection('output')
    outputTensors = sess.run(output, feed_dict={images: np.array(an_image)})
Now, assuming that I did the saving in Python as usual, how can I restore and run inference in C++ with code as simple as the Python version?
I have found examples and tutorials, but they are for TensorFlow versions 0.7 and 0.12, and the same code doesn't work for version 1.5. I found no tutorials on the TensorFlow website for restoring models using the C++ API.
For the sake of this thread, I will rephrase my comment into an answer.
Posting a full example would require either a CMake setup or placing the file into a specific directory to run Bazel. As I favor the first way, and covering all parts would burst the limits of this post, I would like to point to a complete implementation in C99, C++, and Go without Bazel, which I tested for TF > v1.5.
Loading a graph in C++ is not much more difficult than in Python, given that you have already compiled TensorFlow from source.
Start with an MWE that creates a very dumb network graph; this is always a good idea to figure out how things work:
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[1, 2], name='input')
output = tf.identity(tf.layers.dense(x, 1), name='output')

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver = tf.train.Saver(tf.global_variables())
    saver.save(sess, './exported/my_model')
There are probably tons of answers about this part here on SO, so I will just leave it here without further explanation.
Loading in Python
Before doing it in other languages, we can try to do it properly in Python first -- in the sense that we then just need to rewrite it in C++.
Restoring is very easy in Python:
import tensorflow as tf

with tf.Session() as sess:
    # load the computation graph
    loader = tf.train.import_meta_graph('./exported/my_model.meta')
    sess.run(tf.global_variables_initializer())
    loader = loader.restore(sess, './exported/my_model')

    x = tf.get_default_graph().get_tensor_by_name('input:0')
    output = tf.get_default_graph().get_tensor_by_name('output:0')
but it is not helpful, as most of these API endpoints do not exist in the C++ API (yet?). An alternative version would be:
import tensorflow as tf

with tf.Session() as sess:
    metaGraph = tf.train.import_meta_graph('./exported/my_model.meta')
    restore_op_name = metaGraph.as_saver_def().restore_op_name
    restore_op = tf.get_default_graph().get_operation_by_name(restore_op_name)
    filename_tensor_name = metaGraph.as_saver_def().filename_tensor_name
    sess.run(restore_op, {filename_tensor_name: './exported/my_model'})

    x = tf.get_default_graph().get_tensor_by_name('input:0')
    output = tf.get_default_graph().get_tensor_by_name('output:0')
Hang on -- you can always use print(dir(object)) to discover properties like restore_op_name, etc.
Restoring a model is an operation in TensorFlow like any other operation: we just call this operation and provide the path (a string tensor) as an input. We can even write our own restore function:
def restore(sess, metaGraph, fn):
    restore_op_name = metaGraph.as_saver_def().restore_op_name  # u'save/restore_all'
    restore_op = tf.get_default_graph().get_operation_by_name(restore_op_name)
    filename_tensor_name = metaGraph.as_saver_def().filename_tensor_name  # u'save/Const'
    sess.run(restore_op, {filename_tensor_name: fn})
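As a quick sanity check, this helper can be used in place of the loader.restore(...) call from the first snippet (a sketch, reusing the export paths from the MWE above):
with tf.Session() as sess:
    metaGraph = tf.train.import_meta_graph('./exported/my_model.meta')
    restore(sess, metaGraph, './exported/my_model')
    output = tf.get_default_graph().get_tensor_by_name('output:0')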
Even though this restore helper looks strange, it now helps greatly in doing the same thing in C++.
Loading in C++
Starting with the usual stuff
#include <tensorflow/core/public/session.h>
#include <tensorflow/core/public/session_options.h>
#include <tensorflow/core/protobuf/meta_graph.pb.h>

#include <string>
#include <iostream>

typedef std::vector<std::pair<std::string, tensorflow::Tensor>> tensor_dict;

int main(int argc, char const *argv[]) {
  const std::string graph_fn = "./exported/my_model.meta";
  const std::string checkpoint_fn = "./exported/my_model";

  // prepare session
  tensorflow::Session *sess;
  tensorflow::SessionOptions options;
  TF_CHECK_OK(tensorflow::NewSession(options, &sess));

  // here we will put our loading of the graph and weights
  return 0;
}
You should be able to compile this either by putting it into the TensorFlow repo and using Bazel, or by simply following the instructions here to use CMake.
We need to load the meta graph that tf.train.import_meta_graph would create. This can be done with:
tensorflow::MetaGraphDef graph_def;
TF_CHECK_OK(ReadBinaryProto(tensorflow::Env::Default(), graph_fn, &graph_def));
In C++, reading a graph from a file is not the same as importing a graph in Python. We need to create this graph in a session with:
TF_CHECK_OK(sess->Create(graph_def.graph_def()));
Looking at the strange Python restore function above:
restore_op_name = metaGraph.as_saver_def().restore_op_name
restore_op = tf.get_default_graph().get_operation_by_name(restore_op_name)
filename_tensor_name = metaGraph.as_saver_def().filename_tensor_name
we can write the equivalent piece in C++:
const std::string restore_op_name = graph_def.saver_def().restore_op_name();
const std::string filename_tensor_name = graph_def.saver_def().filename_tensor_name();
Having this in place, we just run the restore operation:
sess->Run(feed_dict,          // inputs
          {},                 // output_tensor_names (we do not need them)
          {restore_op_name},  // target_node_names
          nullptr);           // outputs (there are no outputs this time)
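For reference, the feed_dict used here only needs to carry the checkpoint path as a scalar string tensor. A minimal sketch (untested, assuming the tensor_dict typedef and checkpoint_fn from the skeleton above):
// feed the checkpoint path into the saver's filename tensor
tensorflow::Tensor checkpointPathTensor(tensorflow::DT_STRING, tensorflow::TensorShape());
checkpointPathTensor.scalar<std::string>()() = checkpoint_fn;
tensor_dict feed_dict = {{filename_tensor_name, checkpointPathTensor}};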
Creating the feed_dict for actual inference inputs is probably a post of its own, and this answer is already long enough; it only covers the most important parts. I would again like to point to a complete implementation in C99, C++, and Go without Bazel, which I tested for TF > v1.5. It is not that hard -- it just gets very long in the case of the plain C version.

Bitwise XOR for TensorFlow

My project requires a new layer, which needs a new tensor operator that computes the bitwise XOR between the input x and a constant key k.
E.g. for x = 4 (bit form: 100) and k = 7 (111), bitwiseXOR(x, k) is expected to be 3 (011).
As far as I know, TensorFlow only has a logical XOR operator for the bool type. Luckily, TensorFlow can be extended with a new op. I read the documentation at https://www.tensorflow.org/extend/adding_an_op and got the basic idea, but that is far from an implementation, maybe because of my lack of C++ knowledge. Any suggestions for implementing the new operator would be helpful, so that I can use it to build the new layers.
If you don't want to implement your own C++ op, you can try tf.py_func, which allows you to define a Python function that operates on NumPy arrays and is then used as a TensorFlow operation in the graph.
For your problem you can use NumPy's bitwise_xor():
import tensorflow as tf
import numpy as np

t1 = tf.constant([2, 4, 6], dtype=tf.int64)
t2 = tf.constant([1, 3, 5], dtype=tf.int64)
t_xor = tf.py_func(np.bitwise_xor, [t1, t2], tf.int64, stateful=False)

with tf.Session() as sess:
    val = sess.run(t_xor)
    print(val)
which prints [3,7,3] as expected.
Please be aware of the known limitations of this function (taken from the link above):
N.B. The tf.py_func() operation has the following known limitations:
- The body of the function (i.e. func) will not be serialized in a GraphDef. Therefore, you should not use this function if you need to serialize your model and restore it in a different environment.
- The operation must run in the same address space as the Python program that calls tf.py_func(). If you are using distributed TensorFlow, you must run a tf.train.Server in the same process as the program that calls tf.py_func() and you must pin the created operation to a device in that server (e.g. using with tf.device():).

TensorFlow bidirectional_dynamic_rnn: Attempt to reuse RNNCell

The following code (taken from https://github.com/dennybritz/tf-rnn/blob/master/bidirectional_rnn.ipynb)
import tensorflow as tf
import numpy as np

tf.reset_default_graph()

# Create input data
X = np.random.randn(2, 10, 8)

# The second example is of length 6
X[1, 6:] = 0
X_lengths = [10, 6]

cell = tf.contrib.rnn.LSTMCell(num_units=64, state_is_tuple=True)

outputs, states = tf.nn.bidirectional_dynamic_rnn(
    cell_fw=cell,
    cell_bw=cell,
    dtype=tf.float64,
    sequence_length=X_lengths,
    inputs=X)

output_fw, output_bw = outputs
states_fw, states_bw = states
is giving the following error in TensorFlow 1.1, for both Python 2.7 and 3.5:
ValueError: Attempt to reuse RNNCell <tensorflow.contrib.rnn.python.ops.core_rnn_cell_impl.LSTMCell object at 0x10ce0c2b0>
with a different variable scope than its first use. First use of cell was with scope
'bidirectional_rnn/fw/lstm_cell', this attempt is with scope 'bidirectional_rnn/bw/lstm_cell'.
Please create a new instance of the cell if you would like it to use a different set of weights.
If before you were using: MultiRNNCell([LSTMCell(...)] * num_layers), change to:
MultiRNNCell([LSTMCell(...) for _ in range(num_layers)]). If before you were using the same cell
instance as both the forward and reverse cell of a bidirectional RNN, simply create two instances
(one for forward, one for reverse). In May 2017, we will start transitioning this cell's behavior to use
existing stored weights, if any, when it is called with scope=None (which can lead to silent model degradation,
so this error will remain until then.)
But it is working in TensorFlow 1.0.1 with Python 3.5 (I did not test Python 2.7).
I tried multiple code examples I found online, but tf.nn.bidirectional_dynamic_rnn gives the same error with TensorFlow 1.1.
Is there a bug in TensorFlow 1.1, or am I just missing something?
Sorry you ran into this. I can confirm that the error appears in 1.1 (docker run -it gcr.io/tensorflow/tensorflow:1.1.0 python) but not in 1.2 RC0 (docker run -it gcr.io/tensorflow/tensorflow:1.2.0-rc0 python).
So it looks like either 1.2-rc0 or 1.0.1 are your options for the moment.
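As a side note, the error message itself points to the workaround for 1.1: create two separate cell instances, one for the forward and one for the backward direction. A sketch of how the snippet above could be adapted (not tested here):
# two separate cell instances instead of reusing the same one
cell_fw = tf.contrib.rnn.LSTMCell(num_units=64, state_is_tuple=True)
cell_bw = tf.contrib.rnn.LSTMCell(num_units=64, state_is_tuple=True)
outputs, states = tf.nn.bidirectional_dynamic_rnn(
    cell_fw=cell_fw,
    cell_bw=cell_bw,
    dtype=tf.float64,
    sequence_length=X_lengths,
    inputs=X)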

Retrieving the .rc suffix from CPLEX when using Pyomo with the NL / ASL solver interface

I would like to obtain the .rc or .urc suffixes for my variables from the CPLEX solver, using Pyomo with the NL / ASL interface. This interface is generally faster than the default CPLEX interface for my models. However, I can't seem to get the NL interface to return these suffixes. If I use the cplex solver with default options, I get values for the rc suffix. However, if I use solver_io='nl' or set the solver to 'cplexamp' (which I think does the same thing), then I get no rc values. (I am able to get duals, but not reduced costs.)
Here's some example code:
from pyomo.environ import *
from pyomo.opt import SolverFactory

def show_rc(m, *args, **kwargs):
    opt = SolverFactory(*args, **kwargs)
    results = opt.solve(m, suffixes=['rc'])
    m.solutions.load_from(results)
    m.rc.pprint()

m = ConcreteModel()
m.X = Var(bounds=(0, 1))
m.obj = Objective(rule=lambda m: 3.14 * m.X, sense=maximize)
m.rc = Suffix(direction=Suffix.IMPORT, datatype=Suffix.FLOAT)

show_rc(m, "cplex")                   # has value 3.14
show_rc(m, "cplex", solver_io="nl")   # no value returned
show_rc(m, "cplexamp")                # no value returned
The documentation specifically mentions getting reduced costs via a suffix, and the .rc suffix seems to be the standard place for this in AMPL, but I'm having no luck reading this via Pyomo's NL interface. Can anyone point me in the right direction?
Unfortunately, the cplexamp executable does not return reduced costs in the solution file (I just checked). I suppose AMPL must be computing these using the dual solution that is returned. I would open up a ticket on GitHub. Perhaps we can add that functionality to our ASL interface.
In terms of speed, you should try out Pyomo's Python-based interface to Cplex (solver_io="python"). This is usually much faster as it does not require any file I/O. You will need to install the Cplex-Python bindings before you can use that interface through Pyomo. If you can "import cplex", then it should be good to go.
Edit: I forgot to mention that solver_io="python" does return reduced costs for Cplex.
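Based on that, a quick check using the show_rc helper from the question could look like this (a sketch, assuming the CPLEX Python bindings are installed so that "import cplex" works):
show_rc(m, "cplex", solver_io="python")  # expected to report rc = 3.14, like the default interface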