test dataset existence in HDF5/c++ and handle the error

I am reading a series of *.hdf5 files with the HDF5 library in C++. The files have the same datasets (same keys, different contents), but sometimes a single dataset is missing from a file (e.g. 100 files contain the dataset apple, but 3 files do not), and in these cases the following exception is thrown:
HDF5-DIAG: Error detected in HDF5 (1.10.7) thread 0:
#000: H5D.c line 298 in H5Dopen2(): unable to open dataset
major: Dataset
minor: Can't open object
[...]
#005: H5Gloc.c line 376 in H5G__loc_find_cb(): object 'apple' doesn't exist
major: Symbol table
minor: Object not found
terminate called after throwing an instance of 'H5::GroupIException'
I would like to handle this exception, for example by creating an empty apple dataset for that file when this error occurs.
Below is the chunk of code where I read the file -> the group -> the dataset. When handling the error, I would still like to create an empty GoldenApples vector, even when the dataset apple doesn't exist.
std::string FileName = "fruit." + std::to_string(cutID) + ".hdf5";
fruitFile = H5::H5File(FileName, H5F_ACC_RDONLY );
H5::Group group = fruitFile.openGroup("fruit");
H5::DataSet dataset = group.openDataSet("apple");
H5::DataSpace dataspace = dataset.getSpace();
hsize_t naxes[2];
dataspace.getSimpleExtentDims(naxes, NULL);
AppleType = Eigen::MatrixXd::Zero(naxes[1], naxes[0]);
dataset.read(AppleType.data(), H5::PredType::NATIVE_DOUBLE);
GoldenApples = std::vector<int>(naxes[0], 0.);
//need golden apples, which are in pos (4,i) in matrix AppleType
for (int i = 0; i < naxes[0]; i++){
GoldenApples[i] = AppleType(4,i);
}
fruitFile.close();

If you are not bound to a particular library, take a look at HDFql as it's easy to check the existence of an HDF5 dataset with it. Using HDFql in C++, your use-case could be solved as follows:
// check if dataset 'apple' exists in HDF5 file 'fruit.h5'
if (HDFql::execute("SHOW DATASET fruit.h5 apple") == HDFql::Success)
{
std::cout << "Dataset apple exists!" << std::endl;
}
else
{
std::cout << "Dataset apple does not exist!" << std::endl;
}
For additional information about HDFql, please check its reference manual as well as some examples.
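If you need to stay with the HDF5 C++ API, another option is to catch the exception that openDataSet throws and fall back to an empty result. The sketch below shows only that control flow in standard C++, so it compiles without HDF5: MissingDataset and readGoldenApples are hypothetical stand-ins (not part of HDF5) for H5::Exception and for the reading code in the question. In real code you would catch H5::Exception, which does not derive from std::exception, and you may want to call H5::Exception::dontPrint() once to silence the HDF5-DIAG output.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for H5::Exception (assumption: real code catches that type instead).
struct MissingDataset {
    std::string name;
};

// Calls `reader` and returns its result; if `reader` throws Ex, returns an empty vector.
template <typename Ex, typename Reader>
std::vector<int> readOrEmpty(Reader reader) {
    try {
        return reader();
    } catch (const Ex&) {
        return std::vector<int>();
    }
}

// Hypothetical reader standing in for the file -> group -> dataset code in the question.
std::vector<int> readGoldenApples(bool datasetExists) {
    if (!datasetExists) {
        throw MissingDataset{"apple"};
    }
    return std::vector<int>{1, 0, 1};
}
```

With this shape, the assignment in the question would become something like GoldenApples = readOrEmpty<H5::Exception>(...), where the lambda passed in wraps the existing openDataSet/read code.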

Related

C++ Protobuf, error when trying to build a FileDescriptor that imports another one

Currently I have the following two proto definitions, both .proto files are in same folder:
topmessage.proto:
syntax = "proto3";
message TopMessage {
// some fields
}
crestmessage.proto:
syntax = "proto3";
import "topmessage.proto";
message CrestMessage {
// some fields
TopMessage tm = 4;
}
Then, as part of my reader.cc, I am trying to build a file descriptor for the proto definition that the user passes in. More specifically, the user passes the path of the .proto file as an argument, and the program then reads the file and builds a file descriptor from it. Here is how this function is implemented; it mainly follows the blog post by Floris https://vdna.be/site/index.php/2016/05/google-protobuf-at-run-time-deserialization-example-in-c/:
DescriptorPool pool;
const FileDescriptor* buildFileDescriptor(string file_path, string file_name) {
int def_messageFile = open(file_path.c_str(), O_RDONLY);
FileInputStream file_input(def_messageFile);
Tokenizer input(&file_input, NULL);
FileDescriptorProto file_desc_proto;
Parser parser;
if (!parser.Parse(&input, &file_desc_proto)) {
cerr << "Failed to parse .proto definition:" << endl;
return NULL;
}
file_desc_proto.set_name(file_name);
const FileDescriptor* file_desc = pool.BuildFile(file_desc_proto);
return file_desc;
}
The problem arises when I try to build CrestMessage, whose proto definition file is the one passed in as an argument. For simplicity, I first build TopMessage by manually providing its file path, since it is the dependency. That works fine, and I can find TopMessage in the pool, which is global. However, when I try to build CrestMessage, I get an error.
const FileDescriptor* file_desc2 = buildFileDescriptor("topmessage.proto", "TopMessage");
cout << pool.FindFileByName("TopMessage") << endl;
const FileDescriptor* file_desc = buildFileDescriptor(definition_path, "CrestMessage");
cout << file_desc->name() << endl;
I have not found anything in Google's API description that says how to handle imports. Does anyone have ideas on what should be used?

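It's a naming problem: when building the FileDescriptor, pass the name of the .proto file ("topmessage.proto", for example) to set_name instead of the message name. The import in crestmessage.proto is resolved against the file names already registered in the pool, so a file registered as "TopMessage" is never found.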
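One way to avoid this kind of mismatch is to derive the registered name from the path instead of passing it separately. The helper below is a minimal sketch: protoFileName is a hypothetical name, and it assumes '/' as the path separator and that imports use bare file names, as in the question where both .proto files sit in the same folder.

```cpp
#include <string>

// Hypothetical helper: returns the last path component,
// e.g. "protos/topmessage.proto" -> "topmessage.proto".
// Assumes '/' separators and imports written as bare file names.
std::string protoFileName(const std::string& path) {
    const std::string::size_type slash = path.find_last_of('/');
    return slash == std::string::npos ? path : path.substr(slash + 1);
}
```

Inside buildFileDescriptor, the call would then read file_desc_proto.set_name(protoFileName(file_path)), which makes the separate file_name argument unnecessary.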

TensorFlow 0.12 Model Files

I train a model and save it using:
saver = tf.train.Saver()
saver.save(session, './my_model_name')
Besides the checkpoint file, which simply contains pointers to the most recent checkpoints of the model, this creates the following 3 files in the current path:
my_model_name.meta
my_model_name.index
my_model_name.data-00000-of-00001
I wonder what each of these files contains.
I'd like to load this model in C++ and run the inference. The label_image example loads the model from a single .pb file using ReadBinaryProto(). I wonder how I can load it from these 3 files. What is the C++ equivalent of the following?
new_saver = tf.train.import_meta_graph('./my_model_name.meta')
new_saver.restore(session, './my_model_name')

What your saver creates is called "Checkpoint V2" and was introduced in TF 0.12.
I got it working quite nicely (though the docs on the C++ part are horrible, so it took me a day to solve). Some people suggest converting all variables to constants or freezing the graph, but none of these is actually needed.
Python part (saving)
with tf.Session() as sess:
tf.train.Saver(tf.trainable_variables()).save(sess, 'models/my-model')
If you create the Saver with tf.trainable_variables(), you can save yourself some headache and storage space. Some more complicated models may need all data to be saved; in that case remove this argument to Saver, and just make sure you create the Saver after your graph is created. It is also wise to give all variables/layers unique names, otherwise you can run into various problems.
C++ part (inference)
Note that checkpointPath isn't a path to any of the existing files, just their common prefix. If you mistakenly pass the path to the .index file, TF won't tell you that it was wrong, but it will die during inference due to uninitialized variables.
#include <tensorflow/core/public/session.h>
#include <tensorflow/core/protobuf/meta_graph.pb.h>
using namespace std;
using namespace tensorflow;
...
// set up your input paths
const string pathToGraph = "models/my-model.meta";
const string checkpointPath = "models/my-model";
...
auto session = NewSession(SessionOptions());
if (session == nullptr) {
throw runtime_error("Could not create Tensorflow session.");
}
Status status;
// Read in the protobuf graph we exported
MetaGraphDef graph_def;
status = ReadBinaryProto(Env::Default(), pathToGraph, &graph_def);
if (!status.ok()) {
throw runtime_error("Error reading graph definition from " + pathToGraph + ": " + status.ToString());
}
// Add the graph to the session
status = session->Create(graph_def.graph_def());
if (!status.ok()) {
throw runtime_error("Error creating graph: " + status.ToString());
}
// Read weights from the saved checkpoint
Tensor checkpointPathTensor(DT_STRING, TensorShape());
checkpointPathTensor.scalar<std::string>()() = checkpointPath;
status = session->Run(
{{ graph_def.saver_def().filename_tensor_name(), checkpointPathTensor },},
{},
{graph_def.saver_def().restore_op_name()},
nullptr);
if (!status.ok()) {
throw runtime_error("Error loading checkpoint from " + checkpointPath + ": " + status.ToString());
}
// and run the inference to your liking
auto feedDict = ...
auto outputOps = ...
std::vector<tensorflow::Tensor> outputTensors;
status = session->Run(feedDict, outputOps, {}, &outputTensors);
For completeness, here's the Python equivalent:
Inference in Python
with tf.Session() as sess:
saver = tf.train.import_meta_graph('models/my-model.meta')
saver.restore(sess, tf.train.latest_checkpoint('models/'))
outputTensors = sess.run(outputOps, feed_dict=feedDict)
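Since passing one of the actual file names instead of the common prefix fails silently, it may be worth normalizing the path before building checkpointPathTensor. The following is a minimal sketch; checkpointPrefix and endsWith are hypothetical helpers (not part of TensorFlow), and the suffixes are the ones listed in the question (.meta, .index, .data-00000-of-00001).

```cpp
#include <string>

// Returns true if `s` ends with `suffix`.
bool endsWith(const std::string& s, const std::string& suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical helper: maps any of the checkpoint file names to their common prefix.
// A path that already is a prefix is returned unchanged.
std::string checkpointPrefix(const std::string& path) {
    if (endsWith(path, ".index")) return path.substr(0, path.size() - 6);
    if (endsWith(path, ".meta")) return path.substr(0, path.size() - 5);
    // Shard files look like "<prefix>.data-00000-of-00001".
    const std::string::size_type pos = path.rfind(".data-");
    if (pos != std::string::npos && path.find("-of-", pos) != std::string::npos) {
        return path.substr(0, pos);
    }
    return path;
}
```

In the C++ code above, this would be used as checkpointPathTensor.scalar<std::string>()() = checkpointPrefix(checkpointPath). A prefix that itself contains ".data-" would be cut too early, so treat this as a convenience check rather than a full parser.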

I'm currently struggling with this myself, and I've found that it's not very straightforward at the moment. The two most commonly cited tutorials on the subject are:
https://medium.com/jim-fleming/loading-a-tensorflow-graph-with-the-c-api-4caaff88463f#.goxwm1e5j
and
https://medium.com/@hamedmp/exporting-trained-tensorflow-models-to-c-the-right-way-cf24b609d183#.g1gak956i
The equivalent of
new_saver = tf.train.import_meta_graph('./my_model_name.meta')
new_saver.restore(session, './my_model_name')
is just
Status load_graph_status = LoadGraph(graph_path, &session);
assuming you've "frozen the graph" (used a script which combines the graph file with the checkpoint values).
Also, see the discussion here: Tensorflow Different ways to Export and Run graph in C++

Overwrite an image/pixel data in the dicom file using dcmtk

I use dcmtk to read a DICOM file and extract the image into a .tiff format. After doing some image processing, I have an image which I would like to save into the source DICOM file, that is, overwrite the old image/pixel data with my new data while keeping the rest of the data (UID, patient name, etc.) the same.
I use the following code to read dicom
OFCondition status = src_fileformat.loadFile(src_path);
if (status.good())
{
Sint32 instanceNumber = 0;
if (src_fileformat.getDataset()->findAndGetSint32(DCM_InstanceNumber, instanceNumber).good())
{
cout << "instance Number N: " << instanceNumber << endl;
sprintf(instanceNum, "%d", instanceNumber);
printf("%s\n", instanceNum);
}
else
cerr << "Error: cannot access Instance Number!" << endl;
}
else
cerr << "Error: cannot read DICOM file (" << status.text() << ")" << endl;
src_dcm = new DicomImage(src_path);
if (src_dcm != NULL)
{
if (src_dcm->getStatus() == EIS_Normal)
{
if (src_dcm->isMonochrome())
{
src_dcm->setMinMaxWindow();
Uint8 *pixelData = (Uint8 *)(src_dcm->getOutputData(16 /* bits */));
if (pixelData != NULL)
{
src_dcm->writeBMP("source.tiff", 24); /* do something useful with the pixel data */
}
}
}
else
cerr << "Error: cannot load DICOM image (" << DicomImage::getString(src_dcm->getStatus()) << ")" << endl;
}
After image processing I have an image that I want to write over the pixel data in this source DICOM file. I looked into image2dcm, but I couldn't work out the correct syntax/method. Can anyone help me out? :)
Edit-1
Image2Dcm i2d;
I2DOutputPlug *outPlug = new I2DOutputPlugSC();
I2DImgSource *inputPlug = new I2DJpegSource();
E_TransferSyntax writeXfer;
inputPlug->setImageFile(jpgFile);
DcmDataset *dataset = NULL;
OFCondition result = i2d.convert(inputPlug, outPlug, dataset, writeXfer);
// Saving output DICOM image
if (result.good())
{
dataset->putAndInsertString(DCM_PhotometricInterpretation,"RGB");
DcmFileFormat dcmff(dataset);
result = dcmff.saveFile(dcmFile, writeXfer);
}
I tried the code shown above, but I couldn't fully understand it.
[Image: the processed image]
[Image: the original DICOM image that I want to overwrite]
Any ideas or help?

The basic approach should be:
1. load the DICOM dataset from file
2. replace the pixel data in the dataset
3. modify various other element values (e.g. SOP Instance UID)
4. save the modified DICOM dataset to a new file
For uncompressed images, the second step can be performed in the same manner as the third step, i.e. by an appropriate call of a putAndInsertXXX() method on the dataset. Of course, the element value of the Pixel Data attribute must be in the correct DICOM format. See parts 3 and 5 of the DICOM standard for details.
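As a small illustration of what "correct DICOM format" implies for uncompressed data, the buffer handed to the dataset has to match the image attributes. The sketch below computes the expected byte length from Rows, Columns, Samples per Pixel, Bits Allocated and Number of Frames; expectedPixelDataBytes is a hypothetical helper (not a DCMTK function), and it assumes Bits Allocated is a multiple of 8.

```cpp
#include <cstddef>

// Expected byte length of uncompressed Pixel Data.
// Assumes bitsAllocated is a multiple of 8; the result is padded to an even
// length, since DICOM element values must have an even number of bytes.
std::size_t expectedPixelDataBytes(std::size_t rows, std::size_t columns,
                                   std::size_t samplesPerPixel,
                                   std::size_t bitsAllocated,
                                   std::size_t numberOfFrames) {
    const std::size_t bytes =
        rows * columns * samplesPerPixel * (bitsAllocated / 8) * numberOfFrames;
    return bytes + (bytes % 2);
}
```

Comparing this value with the size of the processed image buffer before calling, for example, putAndInsertUint8Array(DCM_PixelData, ...) catches mismatched dimensions or bit depth early.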

Not found: FeedInputs: unable to find feed output TensorFlow

I was trying this example of using a TensorFlow saved model in C++ from this website:
https://medium.com/jim-fleming/loading-a-tensorflow-graph-with-the-c-api-4caaff88463f#.ji310n4zo
It works well, but it does not save the values of the variables a and b, since it saves only the graph and not the variables. I tried to replace the following line:
tf.train.write_graph(sess.graph_def, 'models/', 'graph.pb', as_text=False)
with
saver.save(sess, 'models/graph', global_step=0)
(after creating the saver object, of course). It does not work and outputs:
Not found: FeedInputs: unable to find feed output a
I checked the nodes that are loaded, and they are only:
_SOURCE
_SINK
whereas when I use write_graph and then load the model in C++, the following nodes are loaded:
_SOURCE
_SINK
save/restore_slice_1/shape_and_slice
save/restore_slice_1/tensor_name
save/restore_slice/shape_and_slice
save/restore_slice/tensor_name
save/save/shapes_and_slices
save/save/tensor_names
save/Const
save/restore_slice_1
save/restore_slice
b
save/Assign_1
b/read
b/initial_value
b/Assign
a
save/Assign
save/restore_all
save/save
save/control_dependency
a/read
c
a/initial_value
a/Assign
init
Tensor
Even the graph file created by saver.save() is much smaller (165 B) than the one created by write_graph (1.9 KB).

I'm not sure if this is the best way of solving the problem, but at least it solves it.
Since write_graph can also store the values of constants, I added the following code to the Python script just before writing the graph with the write_graph function:
for v in tf.trainable_variables():
vc = tf.constant(v.eval())
tf.assign(v, vc, name="assign_variables")
This creates constants that store the variables' values after training, and then creates "assign_variables" ops that assign them to the variables. Now, when you call write_graph, it will store the variables' values in the file.
The only remaining part is to run these "assign_variables" ops in the C++ code to make sure that your variables are assigned the constant values stored in the file. Here is one way to do it:
Status status = NewSession(SessionOptions(), &session);
std::vector<tensorflow::Tensor> outputs;
for(int i = 0;status.ok(); i++) {
char name[100];
if (i==0)
sprintf(name, "assign_variables");
else
sprintf(name, "assign_variables_%d", i);
status = session->Run({}, {name}, {}, &outputs);
}
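The name construction in the loop above can also be written without a fixed-size char buffer. This is a minimal sketch with a hypothetical helper, assignOpName, that follows the naming pattern used in the loop: the first op is called "assign_variables" and later ones get a numeric suffix.

```cpp
#include <string>

// Builds the name of the i-th assign op:
// "assign_variables", "assign_variables_1", "assign_variables_2", ...
std::string assignOpName(int i) {
    if (i == 0) {
        return "assign_variables";
    }
    return "assign_variables_" + std::to_string(i);
}
```

The loop body then becomes status = session->Run({}, {assignOpName(i)}, {}, &outputs), with the same termination condition as before.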

There is another way of restoring the variables, by calling the save/restore_all operation, that should be present in the graph:
std::vector<tensorflow::Tensor> outputs;
Tensor checkpoint_filepath(DT_STRING, TensorShape());
checkpoint_filepath.scalar<std::string>()() = "path to the checkpoint file";
status = session->Run( {{ "save/Const", checkpoint_filepath },},
{}, {"save/restore_all"}, &outputs);

Passing Protocol buffer serialized datas from C++ to Python via LevelDB

Though I've followed the excellent Protocol Buffers documentation and tutorials for C++ and Python, I can't achieve my goal, which is:
- to serialize data from a C++ process
- to insert it into LevelDB from that same process
- to extract the serialized data from a Python process
- to deserialize it in that same Python process
- to use the deserialized data in Python
I can serialize my data using Protocol Buffers in C++ (using a std::string container), and I can insert it into LevelDB. But when I Get my serialized data from LevelDB, Python seems to recognize it as a string and shows me its raw content, yet whenever I deserialize it, the result is empty!
Here is how I serialize and insert my data in C++:
int main(int arg, char** argv)
{
GOOGLE_PROTOBUF_VERIFY_VERSION;
leveldb::DB* db;
leveldb::Options options;
leveldb::Status status;
tutorial::AddressBook address_book;
tutorial::Person* person1;
tutorial::Person* person2;
options.create_if_missing = true;
status = leveldb::DB::Open(options, "test_db", &db);
assert(status.ok());
person1 = address_book.add_person();
person1->set_id(1);
person1->set_name("ME");
person1->set_email("me@me.com");
person2 = address_book.add_person();
person2->set_id(2);
person2->set_name("SHE");
person2->set_email("she@she.com");
std::string test;
if (!address_book.SerializeToString(&test))
{
std::cerr << "Failed to write address book" << std::endl;
return -1;
}
if (status.ok()) status = db->Put(leveldb::WriteOptions(), "Test", test);
And here is how I try to deserialize it in Python:
address_book = addressbook_pb2.AddressBook()
db = leveldb.LevelDB('test_db')
ab = address_book.ParseFromString(db.Get("Test"))
The ab variable is of type NoneType.
Edit:
Before the db.Get(), ab.ByteSize() returns 0; after the ParseFromString() it returns 76, so I assume it's a type problem.
Also, ab.ListFields() returns an unusable list of the contained fields: it successfully counts two person instances, but does not let me access them.
Any clues or ideas about what I've misunderstood or what I'm doing wrong here?
Many thanks!

OK, so this was my bad.
I went back to the Protocol Buffers Python documentation, and the fact is that even though the AddressBook object I was retrieving did not show any description, it could still be iterated over and even had a __str__() method.
So, if anyone runs into this problem again, just try to explore your Protocol Buffers object using IPython like I did, and you'll find that each of your proto elements is a field of your object.
Using my example:
address_book.ParseFromString(db.Get('Test'))  # fills address_book in place; the return value is not the message
address_book.__str__()  # Shows a readable version of my object
for person in address_book.person:  # I'm even able to iterate over the values of any of its fields
    print person.id
    print person.name

Try using ' instead of ":
ab = address_book.ParseFromString(db.Get('Test'))
