How to transfer and parse a SNAP graph from Python to C++

Stanford SNAP is a well-known package for graph mining, with both a Python and a C++ implementation.
I have some Python code that does graph mining with SNAP, and a C++ function that processes a SNAP graph. Now I need to write a wrapper so that this C++ function can be invoked from Python.
The problem is that I don't know how to pass the SNAP graph object from Python to C++ and dereference it there.
The Python code looks like this (more explanation follows the code examples):
import my_module
import snap
G = snap.GenRndGnm(snap.PUNGraph, 100, 1000)
print(type(G))
A = my_module.CppFunction(G)  # customized function
print(A)
The CPP wrapper my_module_in_cpp.cpp looks like:
// module name: my_module, defined in the setup file
// function to be called from python: CppFunction
#include <Python.h>
//#include "my_file.h" // can be ignored in this minimal working example.
#include "Snap.h"
#include <iostream>
static PyObject *My_moduleError;
// module_name_function, THE CORE FUNCTION OF THIS QUESTION
static PyObject *my_module_CppFunction(PyObject *self, PyObject *args) {
PUNGraph G_py;
int parseOK = PyArg_ParseTuple(args, "O", &G_py);
if (!parseOK) return NULL;
std::cout << "N: " << G_py->GetNodes() << ", E: " << G_py->GetEdges() << std::endl;
fflush(stdout);
if ((G_py->GetNodes()!=100)||(G_py->GetEdges()!=1000)) {
PyErr_SetString(My_moduleError, "Graph reference incorrect.");
return NULL;
}
PyObject *PList = PyList_New(0);
PyList_Append(PList,Py_BuildValue("i",G_py->GetNodes()));
PyList_Append(PList,Py_BuildValue("i",G_py->GetEdges()));
return PList;
}
// To register the core function to python
static PyMethodDef CppFunctionMethod[] = {
{"CppFunction", my_module_CppFunction, METH_VARARGS, "To call CppFunction in C++"},
{NULL,NULL,0,NULL}
};
extern "C" PyMODINIT_FUNC initmy_module(void) {
PyObject *m = Py_InitModule("my_module",CppFunctionMethod);
if (m==NULL) return;
My_moduleError = PyErr_NewException("my_module.error", NULL, NULL);
Py_INCREF(My_moduleError);
PyModule_AddObject(m, "error", My_moduleError);
}
I'm using Ubuntu and Python 2.7. In case someone wants to reproduce the problem, here is the setup.py file:
from distutils.core import setup, Extension
module1 = Extension('my_module',\
include_dirs = ['/usr/include/python2.7/','/users/<my_local>/Snap-3.0/','/users/<my_local>/Snap-3.0/snap-core','/users/<my_local>/Snap-3.0/glib-core'],
library_dirs = ['/users/<my_local>/Snap-3.0/snap-core/'],
extra_objects = ['/users/<my_local>/Snap-3.0/snap-core/Snap.o'],
extra_compile_args=['-fopenmp','-std=c++11'],
extra_link_args=['-lgomp'],
sources = ['my_module_in_cpp.cpp'])
setup (name = 'NoPackageName', version = '0.1',\
description = 'No description.', ext_modules = [module1])
Every time I run the Python code above, the error message "Graph reference incorrect." is displayed.
Apparently G_py->GetNodes() and G_py->GetEdges() cause the problem, which must mean that G_py does not point to the right address or is not in the right format. I also tried TUNGraph in the C++ code, but it still does not point to the correct address. Is there any way to make the C++ pointer refer to the original C++ object?
In general it is hard to dereference a PyObject from C++, but in this case I think it is doable, since Snap-Python is itself implemented in C++; we just need to unwrap its Python wrapper. The SNAP authors also provide the SWIG files.
Of course we could write the graph to a file on disk and read it back, but that incurs I/O and extra time. Also, the snap-user-group does not have as much traffic as Stack Overflow.
BTW, there are networkx and stanford-nlp tags, but no stanford-snap or similar tag for this tool. Can someone create such a tag?

Related

C++ Protobuf, error when trying to build a FileDescriptor that imports another one

Currently I have the following two proto definitions, both .proto files are in same folder:
topmessage.proto:
syntax = "proto3";
message TopMessage {
// some fields
}
crestmessage.proto:
syntax = "proto3";
import "topmessage.proto";
message CrestMessage {
// some fields
TopMessage tm = 4;
}
Then, as part of my reader.cc, I am trying to build a file descriptor for the proto definition that the user passes in. More specifically, the user passes the path of the .proto file as an argument, and the program reads the file and builds a file descriptor from it. Here is how this function is implemented; it mainly follows the blog post by Floris, https://vdna.be/site/index.php/2016/05/google-protobuf-at-run-time-deserialization-example-in-c/:
DescriptorPool pool;
const FileDescriptor* buildFileDescriptor(string file_path, string file_name) {
int def_messageFile = open(file_path.c_str(), O_RDONLY);
FileInputStream file_input(def_messageFile);
Tokenizer input(&file_input, NULL);
FileDescriptorProto file_desc_proto;
Parser parser;
if (!parser.Parse(&input, &file_desc_proto)) {
cerr << "Failed to parse .proto definition:" << endl;
return NULL;
}
file_desc_proto.set_name(file_name);
const FileDescriptor* file_desc = pool.BuildFile(file_desc_proto);
return file_desc;
}
The problem arises when I try to build CrestMessage, whose proto definition file is the one passed in as the argument. For simplicity, I first build TopMessage by manually providing its file path, since it is the dependency. That works fine, and I can find TopMessage in the pool, which is global. However, when I then try to build CrestMessage, I get an error.
const FileDescriptor* file_desc2 = buildFileDescriptor("topmessage.proto", "TopMessage");
cout << pool.FindFileByName("TopMessage") << endl;
const FileDescriptor* file_desc = buildFileDescriptor(definition_path, "CrestMessage");
cout << file_desc->name() << endl;
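I have not found anything in Google's API documentation that says how to handle imports. Does anyone have an idea of what should be used?
It's a naming problem: when building the FileDescriptor, use the name of the .proto file ("topmessage.proto", for example) instead of the message name. The import statement in crestmessage.proto refers to the file name, and that is the name the pool looks up when it resolves the dependency.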

TensorFlow 0.12 Model Files

I train a model and save it using:
saver = tf.train.Saver()
saver.save(session, './my_model_name')
Besides the checkpoint file, which simply contains pointers to the most recent checkpoints of the model, this creates the following 3 files in the current path:
my_model_name.meta
my_model_name.index
my_model_name.data-00000-of-00001
I wonder what each of these files contains.
I'd like to load this model in C++ and run inference. The label_image example loads the model from a single .pb file using ReadBinaryProto(). I wonder how I can load it from these 3 files. What is the C++ equivalent of the following?
new_saver = tf.train.import_meta_graph('./my_model_name.meta')
new_saver.restore(session, './my_model_name')
What your saver creates is called "Checkpoint V2" and was introduced in TF 0.12.
I got it working quite nicely (though the docs on the C++ part are horrible, so it took me a day to solve). Some people suggest converting all variables to constants or freezing the graph, but none of these is actually needed.
Python part (saving)
with tf.Session() as sess:
    tf.train.Saver(tf.trainable_variables()).save(sess, 'models/my-model')
If you create the Saver with tf.trainable_variables(), you can save yourself some headache and storage space. Some more complicated models may need all data to be saved; in that case remove this argument to Saver, and just make sure you create the Saver after your graph is created. It is also wise to give all variables/layers unique names, otherwise you can run into various problems.
C++ part (inference)
Note that checkpointPath isn't a path to any of the existing files, just their common prefix. If you mistakenly pass the path to the .index file, TF won't tell you that it is wrong, but it will die during inference due to uninitialized variables.
#include <tensorflow/core/public/session.h>
#include <tensorflow/core/protobuf/meta_graph.pb.h>
using namespace std;
using namespace tensorflow;
...
// set up your input paths
const string pathToGraph = "models/my-model.meta";
const string checkpointPath = "models/my-model";
...
auto session = NewSession(SessionOptions());
if (session == nullptr) {
throw runtime_error("Could not create Tensorflow session.");
}
Status status;
// Read in the protobuf graph we exported
MetaGraphDef graph_def;
status = ReadBinaryProto(Env::Default(), pathToGraph, &graph_def);
if (!status.ok()) {
throw runtime_error("Error reading graph definition from " + pathToGraph + ": " + status.ToString());
}
// Add the graph to the session
status = session->Create(graph_def.graph_def());
if (!status.ok()) {
throw runtime_error("Error creating graph: " + status.ToString());
}
// Read weights from the saved checkpoint
Tensor checkpointPathTensor(DT_STRING, TensorShape());
checkpointPathTensor.scalar<std::string>()() = checkpointPath;
status = session->Run(
{{ graph_def.saver_def().filename_tensor_name(), checkpointPathTensor },},
{},
{graph_def.saver_def().restore_op_name()},
nullptr);
if (!status.ok()) {
throw runtime_error("Error loading checkpoint from " + checkpointPath + ": " + status.ToString());
}
// and run the inference to your liking
auto feedDict = ...
auto outputOps = ...
std::vector<tensorflow::Tensor> outputTensors;
status = session->Run(feedDict, outputOps, {}, &outputTensors);
For completeness, here's the Python equivalent:
Inference in Python
with tf.Session() as sess:
    saver = tf.train.import_meta_graph('models/my-model.meta')
    saver.restore(sess, tf.train.latest_checkpoint('models/'))
    outputTensors = sess.run(outputOps, feed_dict=feedDict)
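The note above about checkpointPath being a prefix rather than a file name can be guarded against with a small helper. This is a minimal sketch in plain Python, assuming the file naming shown in the question (.meta, .index, .data-NNNNN-of-NNNNN); the function names are hypothetical and not part of TensorFlow:

```python
import re

# Suffixes of the files a V2 checkpoint consists of, as listed in the question:
#   <prefix>.meta, <prefix>.index, <prefix>.data-00000-of-00001
_CHECKPOINT_SUFFIX = re.compile(r"\.(meta|index|data-\d+-of-\d+)$")


def checkpoint_prefix(path):
    """Return the common checkpoint prefix for `path` (hypothetical helper).

    A path that already is a prefix is returned unchanged.
    """
    return _CHECKPOINT_SUFFIX.sub("", path)


def meta_graph_path(path):
    """Return the path of the .meta file belonging to the same checkpoint."""
    return checkpoint_prefix(path) + ".meta"
```

Passing every user-supplied path through checkpoint_prefix() before handing it to restore means that an accidental 'models/my-model.index' becomes 'models/my-model' instead of failing later during inference.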
I'm currently struggling with this myself, and I've found that it's not very straightforward at the moment. The two most commonly cited tutorials on the subject are:
https://medium.com/jim-fleming/loading-a-tensorflow-graph-with-the-c-api-4caaff88463f#.goxwm1e5j
and
https://medium.com/@hamedmp/exporting-trained-tensorflow-models-to-c-the-right-way-cf24b609d183#.g1gak956i
The equivalent of
new_saver = tf.train.import_meta_graph('./my_model_name.meta')
new_saver.restore(session, './my_model_name')
is just
Status load_graph_status = LoadGraph(graph_path, &session);
assuming you've "frozen the graph" (used a script which combines the graph file with the checkpoint values).
Also, see the discussion here: Tensorflow Different ways to Export and Run graph in C++

LLVM API: correct way to create/dispose

I'm attempting to implement a simple JIT compiler using the LLVM C API. So far I have had no problems generating IR code and executing it, until I start disposing of objects and recreating them.
What I would like to do is clean up the JIT'ted resources the moment they are no longer used by the engine, something like this:
while (true)
{
// Initialize module & builder
InitializeCore(GetGlobalPassRegistry());
module = ModuleCreateWithName(some_unique_name);
builder = CreateBuilder();
// Initialize target & execution engine
InitializeNativeTarget();
engine = CreateExecutionEngineForModule(...);
passmgr = CreateFunctionPassManagerForModule(module);
AddTargetData(GetExecutionEngineTargetData(engine), passmgr);
InitializeFunctionPassManager(passmgr);
// [... my fancy JIT code ...] --** Will give a serious error the second iteration
// Destroy
DisposePassManager(passmgr);
DisposeExecutionEngine(engine);
DisposeBuilder(builder);
// DisposeModule(module); //--> Commented out: Deleted by execution engine
Shutdown();
}
However, this doesn't seem to work correctly: on the second iteration of the loop I get a pretty bad error...
So to summarize: what's the correct way to dispose of and re-create LLVM API objects?
Posting this as an answer because the code is too long for a comment. If possible, and if there are no other constraints, try to use LLVM like this. I am pretty sure the Shutdown() inside the loop is the culprit. I also don't think it would hurt to keep the builder outside the loop. This reflects the way I use LLVM in my JIT.
InitializeCore(GetGlobalPassRegistry());
InitializeNativeTarget();
builder = CreateBuilder();
while (true)
{
// Initialize module (the builder is created once, outside the loop)
module = ModuleCreateWithName(some_unique_name);
// Initialize target & execution engine
engine = CreateExecutionEngineForModule(...);
passmgr = CreateFunctionPassManagerForModule(module);
AddTargetData(GetExecutionEngineTargetData(engine), passmgr);
InitializeFunctionPassManager(passmgr);
// [... my fancy JIT code ...]
// Destroy
DisposePassManager(passmgr);
DisposeExecutionEngine(engine);
}
DisposeBuilder(builder);
Shutdown();
/* program init */
LLVMInitializeNativeTarget();
LLVMInitializeNativeAsmPrinter();
LLVMInitializeNativeAsmParser();
LLVMLinkInMCJIT();
ctx->context = LLVMContextCreate();
ctx->builder = LLVMCreateBuilderInContext(ctx->context);
LLVMParseBitcodeInContext2(ctx->context, module_template_buf, &ctx->module); // create module
/* IR code creation */
{
/* func_type is assumed to be an LLVMTypeRef built earlier with LLVMFunctionType */
function = LLVMAddFunction(ctx->module, "my_func", func_type);
LLVMAppendBasicBlockInContext(ctx->context, ...
LLVMBuild...
...
}
/* optional optimization */
{
LLVMPassManagerBuilderRef pass_builder = LLVMPassManagerBuilderCreate();
LLVMPassManagerBuilderSetOptLevel(pass_builder, 3);
LLVMPassManagerBuilderSetSizeLevel(pass_builder, 0);
LLVMPassManagerBuilderUseInlinerWithThreshold(pass_builder, 1000);
LLVMPassManagerRef function_passes = LLVMCreateFunctionPassManagerForModule(ctx->module);
LLVMPassManagerRef module_passes = LLVMCreatePassManager();
LLVMPassManagerBuilderPopulateFunctionPassManager(pass_builder, function_passes);
LLVMPassManagerBuilderPopulateModulePassManager(pass_builder, module_passes);
LLVMPassManagerBuilderDispose(pass_builder);
LLVMInitializeFunctionPassManager(function_passes);
for (LLVMValueRef value = LLVMGetFirstFunction(ctx->module); value;
value = LLVMGetNextFunction(value))
{
LLVMRunFunctionPassManager(function_passes, value);
}
LLVMFinalizeFunctionPassManager(function_passes);
LLVMRunPassManager(module_passes, ctx->module);
LLVMDisposePassManager(function_passes);
LLVMDisposePassManager(module_passes);
}
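/* optional, for debugging */
{
LLVMVerifyModule(ctx->module, LLVMAbortProcessAction, &error);
LLVMDumpModule(ctx->module); /* print the IR to stderr */
}
if (LLVMCreateJITCompilerForModule(&ctx->engine, ctx->module, 0, &error) != 0)
{
/* engine creation failed; error holds the message (free it with LLVMDisposeMessage) */
}
my_func = (exec_func_t)(uintptr_t)LLVMGetFunctionAddress(ctx->engine, "my_func");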
LLVMRemoveModule(ctx->engine, ctx->module, &ctx->module, &error);
LLVMDisposeModule(ctx->module);
LLVMDisposeBuilder(ctx->builder);
/* call the compiled function as often as needed */
{
my_func(...);
}
LLVMDisposeExecutionEngine(ctx->engine);
LLVMContextDispose(ctx->context);
/* program finit */
LLVMShutdown();

Gtksourceviewmm syntax highlighting not working

I'm trying to use gtksourceviewmm, the C++ wrapper for gtksourceview. I wrote this code a long time ago and remember it working, but now everything works except the syntax highlighting, and I'm not sure why. I have read a lot about this library on the internet but can't find a solution. Here is a simple example. Thanks in advance.
#include "twindow.h"
#include <iostream>
TWindow::TWindow() {
add(m_SourceView);
m_SourceView.set_size_request(640, 480);
m_SourceView.set_show_line_numbers();
m_SourceView.set_tab_width(4);
m_SourceView.set_auto_indent();
m_SourceView.set_show_right_margin();
m_SourceView.set_right_margin_position(80);
m_SourceView.set_highlight_current_line();
m_SourceView.set_smart_home_end(gtksourceview::SOURCE_SMART_HOME_END_ALWAYS);
gtksourceview::init ();
Glib::RefPtr<gtksourceview::SourceBuffer> buffer = m_SourceView.get_source_buffer () ;
if (!buffer) {
std::cerr << "gtksourceview::SourceView::get_source_buffer () failed" << std::endl ;
}
buffer->begin_not_undoable_action();
buffer->set_text(Glib::file_get_contents("main.c"));
buffer->end_not_undoable_action();
buffer->set_highlight_syntax(true);
Glib::RefPtr<gtksourceview::SourceLanguageManager> language_manager = gtksourceview::SourceLanguageManager::create();
Glib::RefPtr<gtksourceview::SourceLanguage> language = gtksourceview::SourceLanguage::create();
language = language_manager->get_language("c");
buffer->set_language(language);
show_all_children();
}
You want to use the C++ wrapper of gtksourceview, so I assume you mean gtksourceviewmm.
Why create a LanguageManager? You can use the default one.
If you are using gtksourceviewmm 3.2, look at the docs.
You should also check out this function.
So an example would look like this:
Glib::ustring file_path = "/home/user/whatever/main.c";
Glib::RefPtr<Gsv::LanguageManager> language_manager = Gsv::LanguageManager::get_default();
Glib::RefPtr<Gsv::Language> language = language_manager->guess_language(file_path, Glib::ustring());
Another thing I want to mention: you should create a buffer yourself to hold the file's contents. In my projects I got a segfault when I used get_source_buffer(), so it seems to be null by default.
Glib::RefPtr<Gsv::Buffer> buffer = Gsv::Buffer::create(language);
buffer->set_text(Glib::file_get_contents(file_path));
this->m_SourceView.set_source_buffer(buffer);

Asynchronously redirect stdout/stdin from embedded python to c++?

I am essentially trying to write a console interface with input and output for an embedded python script. Following the instructions here, I was able to capture stdout:
Py_Initialize();
PyRun_SimpleString("\
class StdoutCatcher:\n\
    def __init__(self):\n\
        self.data = ''\n\
    def write(self, stuff):\n\
        self.data = self.data + stuff\n\
import sys\n\
sys.stdout = StdoutCatcher()");
PyRun_SimpleString("some script");
PyObject *sysmodule;
PyObject *pystdout;
PyObject *pystdoutdata;
char *stdoutstring;
sysmodule = PyImport_ImportModule("sys");
pystdout = PyObject_GetAttrString(sysmodule, "stdout");
pystdoutdata = PyObject_GetAttrString(pystdout, "data");
stdoutstring = PyString_AsString(pystdoutdata);
Py_Finalize();
The problem with this is that I only receive the stdout after the script has finished running, whereas ideally for a console the stdoutstring would update as the Python script writes output. Is there a way to do this?
Also, how would I go about capturing stdin?
If it helps, I am working with a compiler that accepts Objective-C. I also have the boost libraries available.
I've figured out the stdout part of the question. For posterity, this works:
static PyObject*
redirection_stdoutredirect(PyObject *self, PyObject *args)
{
const char *string;
if(!PyArg_ParseTuple(args, "s", &string))
return NULL;
//pass string onto somewhere
Py_INCREF(Py_None);
return Py_None;
}
static PyMethodDef RedirectionMethods[] = {
{"stdoutredirect", redirection_stdoutredirect, METH_VARARGS,
"stdout redirection helper"},
{NULL, NULL, 0, NULL}
};
//in main...
Py_Initialize();
Py_InitModule("redirection", RedirectionMethods);
PyRun_SimpleString("\
import redirection\n\
import sys\n\
class StdoutCatcher:\n\
    def write(self, stuff):\n\
        redirection.stdoutredirect(stuff)\n\
sys.stdout = StdoutCatcher()");
PyRun_SimpleString("some script");
Py_Finalize();
Still having trouble with stdin...
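The Python half of the stdout approach above can be tried without any embedding. Here is a minimal sketch, written for Python 3 (the question uses Python 2), in which a plain callback stands in for redirection.stdoutredirect; the names StdoutForwarder and run_with_forwarding are made up for illustration:

```python
import sys


class StdoutForwarder:
    """File-like object that passes every write straight to a callback."""

    def __init__(self, callback):
        self._callback = callback

    def write(self, text):
        # print() calls write() for each chunk; forward it at once, store nothing.
        self._callback(text)
        return len(text)

    def flush(self):
        # Nothing is buffered here, but callers may still call flush().
        pass


def run_with_forwarding(source, callback):
    """Execute `source` with sys.stdout temporarily replaced (sketch only)."""
    saved = sys.stdout
    sys.stdout = StdoutForwarder(callback)
    try:
        exec(source, {"__name__": "__main__"})
    finally:
        # Always restore the real stdout, even if the script raises.
        sys.stdout = saved
```

Because the callback runs inside write(), the host sees output while the script is still running rather than after it has finished, which is the difference between the two C snippets above.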
To process all available input inside Python I'd recommend the fileinput module.
If you want to handle input as line-by-line commands (such as in an interactive interpreter), you might find the Python function raw_input useful.
To redirect stdin using a helper class similar to the ones you've used above, the function to override is readline, not read. See this link for more info on that (and also raw_input).
Hope this helps,
Supertwang
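The readline suggestion can be sketched the same way as the stdout helper. This is an illustrative Python 3 class, not taken from the answer; QueueStdin, feed and close_input are hypothetical names, and the queue is an assumption about how the host would hand lines over:

```python
import queue


class QueueStdin:
    """File-like object whose readline() blocks until the host supplies a line."""

    def __init__(self):
        self._lines = queue.Queue()

    def feed(self, line):
        # Called by the host application, for example from its console widget.
        if not line.endswith("\n"):
            line += "\n"
        self._lines.put(line)

    def close_input(self):
        # An empty string is the conventional end-of-file marker for readline().
        self._lines.put("")

    def readline(self):
        # Blocks until feed() or close_input() has been called.
        return self._lines.get()
```

After sys.stdin = QueueStdin(), code that calls sys.stdin.readline() receives whatever the host feeds in. As far as I know, input() (raw_input in Python 2) also falls back to readline() when sys.stdin is not a real file, but treat that as something to verify in your setup.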
Easiest way I found so far to do this is as follows:
PyObject *sys = PyImport_ImportModule("sys");
PyObject* io_stdout = PyFile_FromFile(stdout, "stdout", "a", nullptr);
PyObject_SetAttrString(sys, "stdout", io_stdout);
PyObject* io_stderr = PyFile_FromFile(stderr, "stderr", "a", nullptr);
PyObject_SetAttrString(sys, "stderr", io_stderr);
PyObject* io_stdin = PyFile_FromFile(stdin, "stdin", "r", nullptr);
PyObject_SetAttrString(sys, "stdin", io_stdin);
You can test it with:
// for test
PyRun_SimpleString("import sys\nprint sys.stdin.readline()");
If you stick with the approach you outlined, inheriting your class from io.IOBase is probably a good idea.
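As a rough illustration of that last suggestion, here is a sketch in Python 3 that derives the catcher from io.TextIOBase (a text-oriented subclass of io.IOBase). The class name and the callback parameter are assumptions made for the example:

```python
import io


class CallbackWriter(io.TextIOBase):
    """Text stream that hands each write to a callback (hypothetical name)."""

    def __init__(self, callback):
        super().__init__()
        self._callback = callback

    def writable(self):
        # Advertise that this stream accepts writes.
        return True

    def write(self, text):
        if self.closed:
            raise ValueError("write to closed stream")
        self._callback(text)
        return len(text)
```

The base class supplies the rest of the file protocol (writelines, flush, close, closed, context-manager support), so code that expects a real file object is less likely to trip over a missing method.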