Which compression algorithm is better suited to compress protocol buffers output?


Given:
a DB table with numeric statistics only
binary dump of the table - record after record
protocol buffers dump of the table - using proto schema created from the table schema
As expected, the binary dump produced by protocol buffers is smaller than the naive binary dump of the same table:
PS Z:\dev\internal\vedtool> dir net_endshapes_stat.data | % { "$($_.Name): $($_.Length)B" }
net_endshapes_stat.data: 2941331B
net_endshapes_stat.data: 4311042B
However, when I compressed both files with 7z using the Ultra level and the LZMA method, I discovered that the larger file compresses better:
PS Z:\dev\internal\vedtool> dir net_endshapes_stat.7z | % { "$($_.Name): $($_.Length)B" }
net_endshapes_stat.7z: 1206186B
net_endshapes_stat.7z: 1055901B
I understand that this is perfectly possible; still, I wonder whether there is a compression algorithm better tuned to protocol buffers output.
EDIT
Here is the proto schema:
message net_endshapes_stat {
optional fixed32 timestamp = 1;
optional sint32 shape_id = 2;
optional sint64 bps_in = 3;
optional sint64 bps_out = 4;
optional sint64 total_in = 5;
optional sint64 total_out = 6;
}
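To see where the size difference comes from, here is a minimal Python sketch of how one net_endshapes_stat record is laid out in the protobuf wire format, assuming all six optional fields are set. The fixed32 timestamp always takes 4 little-endian bytes, while each sint32/sint64 value is ZigZag-mapped and then written as a base-128 varint, so small magnitudes take one or two bytes instead of eight. The function names are hypothetical; real code would call SerializeToString on the generated class. One plausible (unverified) reason the naive dump compresses better is that its fields sit at fixed offsets in every record, whereas varints shift the field boundaries from record to record.

```python
import struct


def encode_varint(n: int) -> bytes:
    """Base-128 varint: 7 bits per byte, low bits first, MSB set on all but the last byte."""
    if n < 0:
        raise ValueError("this sketch only encodes non-negative integers")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def zigzag(n: int) -> int:
    """ZigZag mapping used by sint32/sint64: 0->0, -1->1, 1->2, -2->3, ..."""
    return 2 * n if n >= 0 else -2 * n - 1


def encode_net_endshapes_stat(timestamp: int, shape_id: int, bps_in: int,
                              bps_out: int, total_in: int, total_out: int) -> bytes:
    """Serialize one record (all fields set), emitting fields in field-number order."""
    out = bytearray()
    # Field 1, fixed32: key = (1 << 3) | wire type 5, then 4 little-endian bytes
    out += encode_varint((1 << 3) | 5) + struct.pack("<I", timestamp)
    # Fields 2..6, sint32/sint64: key = (field << 3) | wire type 0, then ZigZag varint
    values = (shape_id, bps_in, bps_out, total_in, total_out)
    for field, value in enumerate(values, start=2):
        out += encode_varint((field << 3) | 0) + encode_varint(zigzag(value))
    return bytes(out)
```

For example, encode_net_endshapes_stat(1, -1, 100, 100, 1000, 1000) produces 19 bytes.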

Related

Using C++ protobuf formatted structure in leveldb. set/get operations

I'd like to build a proof of concept (POC) that uses LevelDB to store a key-value table of different data types in protobuf format.
So far I was able to open the database file, and I also saw the Get function with the following signature:
virtual Status Get(const ReadOptions& options, const Slice& key, std::string* value)=0
I understand that the value actually refers to a binary string (like a byte vector) rather than a regular alphanumeric string, so I guess it can hold primitive types such as string, uint, or enum. But how can it support a struct/class that represents the protobuf layout in C++?
This is the proto message I'd like to store in LevelDB:
message agentStatus {
string ip = 1;
uint32 port = 2;
string url = 3;
google.protobuf.Timestamp last_seen = 4;
google.protobuf.Timestamp last_keepalive = 5;
bool status = 6;
}
And this is my current POC code. How can I use the Get method to access any of the fields of the message above?
#include <stdexcept>
#include <string>
#include <leveldb/db.h>
int main() {
    std::string db_file_path = "/tmp/data.db";
    leveldb::DB* db = nullptr;
    leveldb::Options options;
    options.create_if_missing = false;  // the database must already exist
    leveldb::Status status = leveldb::DB::Open(options, db_file_path, &db);
    if (!status.ok()) {
        throw std::logic_error("unable to open db: " + status.ToString());
    }
    delete db;  // closes the database
    return 0;
}
Thanks!

You need to serialize the protobuf message into a binary string with SerializeToString, and then use the Put method to write that binary string to LevelDB under a key.
Then use the Get method to retrieve the binary value for that key, and parse it back into a protobuf message with ParseFromString.
Finally, read the fields from the parsed message.

OpenVINO - Image classification

I tried to use the OpenVINO Inference Engine to accelerate my DL inference. It works with one image, but I want to create a batch of two images and then run inference on it.
This is my code:
InferenceEngine::Core core;
InferenceEngine::CNNNetwork network = core.ReadNetwork("path/to/model.xml");
InferenceEngine::InputInfo::Ptr input_info = network.getInputsInfo().begin()->second;
std::string input_name = network.getInputsInfo().begin()->first;
InferenceEngine::DataPtr output_info = network.getOutputsInfo().begin()->second;
std::string output_name = network.getOutputsInfo().begin()->first;
InferenceEngine::ExecutableNetwork executableNetwork = core.LoadNetwork(network, "CPU");
InferenceEngine::InferRequest inferRequest = executableNetwork.CreateInferRequest();
std::string input_image_01 = "path/to/image_01.png";
cv::Mat image_01 = cv::imread(input_image_01 );
InferenceEngine::Blob::Ptr imgBlob_01 = wrapMat2Blob(image_01);
std::string input_image_02 = "path/to/image_02.png";
cv::Mat image_02 = cv::imread(input_image_02 );
InferenceEngine::Blob::Ptr imgBlob_02 = wrapMat2Blob(image_02);
InferenceEngine::BlobMap imgBlobMap;
std::pair<std::string, InferenceEngine::Blob::Ptr> pair01(input_image_01, imgBlob_01);
imgBlobMap.insert(pair01);
std::pair<std::string, InferenceEngine::Blob::Ptr> pair02(input_image_02, imgBlob_02);
imgBlobMap.insert(pair02);
inferRequest.SetInput(imgBlobMap);
inferRequest.StartAsync();
inferRequest.Wait(InferenceEngine::IInferRequest::WaitMode::RESULT_READY);
InferenceEngine::Blob::Ptr output = inferRequest.GetBlob(output_name);
std::vector<unsigned> class_results;
ClassificationResult cls(output, {"x", "y"}, 2, 3);
class_results = cls.getResults();
Unfortunately, I received the following error message from the command
inferRequest.SetInput(imgBlobMap);
[NOT_FOUND] Failed to find input or output with name: 'path/to/image_02.png'
C:\j\workspace\private-ci\ie\build-windows-vs2019#2\b\repos\openvino\inference-engine\src\plugin_api\cpp_interfaces/impl/ie_infer_request_internal.hpp:303
C:\Program Files (x86)\Intel\openvino_2021.3.394\inference_engine\include\details/ie_exception_conversion.hpp:66
How can I create a batch of more than one image, run inference, and get the classification class and confidence? Are the confidence and class contained in the blob returned by GetBlob()? Do I need to call ClassificationResult cls(output, {"x", "y"}, 2, 3);?

I'd recommend reviewing the Using Shape Inference article in the OpenVINO online documentation to be aware of the limitations of using batches. It also refers to the Open Model Zoo smart_classroom_demo, where dynamic batching is used to process multiple previously detected faces. Basically, when batching is enabled in the model, the memory buffer of your input blob is allocated with room for the whole batch of images, and it is your responsibility to fill the input blob with the data of each image in the batch. You may take a look at the function CnnDLSDKBase::InferBatch of smart_classroom_demo, located in the file smart_classroom_demo/cpp/src/cnn.cpp, line 51. As you can see, in the loop over num_imgs, the auxiliary function matU8ToBlob fills the input blob with data for current_batch_size images; the code then sets the batch size for the infer request and runs inference.
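for (size_t batch_i = 0; batch_i < num_imgs; batch_i += batch_size) {
    const size_t current_batch_size = std::min(batch_size, num_imgs - batch_i);
    // Fill slot b of the input blob with the b-th image of this batch
    for (size_t b = 0; b < current_batch_size; b++) {
        matU8ToBlob<uint8_t>(frames[batch_i + b], input, b);
    }
    // With dynamic batching, tell the request how many slots are actually filled
    if (config_.max_batch_size != 1)
        infer_request_.SetBatch(current_batch_size);
    infer_request_.Infer();
    // ... the demo then reads the outputs for this batch ...
}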
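To make the "fill the input blob for each image in the batch" step concrete, here is a minimal Python sketch of the index arithmetic that, as far as I can tell, the samples' matU8ToBlob helper performs: the blob is one contiguous NCHW buffer, image b starts at offset b * C * H * W, and each interleaved pixel (HWC, as returned by cv::imread) is scattered into planar channel order. Plain nested lists stand in for cv::Mat and the blob; fill_batch_nchw is a hypothetical name, and the images are assumed to be already resized to the network's H x W.

```python
from typing import List, Sequence

# Hypothetical stand-in type: an image is rows (H) of pixels (W) of channel values (C).
Image = Sequence[Sequence[Sequence[int]]]


def fill_batch_nchw(images: Sequence[Image], channels: int,
                    height: int, width: int) -> List[int]:
    """Pack HWC images into one flat NCHW buffer, image b occupying batch slot b."""
    plane = height * width
    buf = [0] * (len(images) * channels * plane)
    for b, img in enumerate(images):
        batch_offset = b * channels * plane  # first element of image b
        for c in range(channels):
            for h in range(height):
                for w in range(width):
                    # interleaved pixel (h, w, c) -> planar position (b, c, h, w)
                    buf[batch_offset + c * plane + h * width + w] = img[h][w][c]
    return buf
```

Because every image owns a disjoint slice of the buffer, a single Infer() call then processes the whole batch.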

There is a similar OpenVINO sample that feeds batched inputs into the model. You can refer to the link below.
https://github.com/openvinotoolkit/openvino/blob/ae2913d3b5970ce0d3112cc880d03be1708f13eb/inference-engine/samples/hello_nv12_input_classification/main.cpp#L236

Is it possible to restore the .proto file when a message uses package, imports, and field options?

My goal is to restore the lost .proto files, written by someone else, from existing C++ protobuf messages. Using Descriptor and EnumDescriptor, I was able to do the following:
const google::protobuf::EnumDescriptor* logOptionDesc =
bgs::protocol::LogOption_descriptor();
std::string logOptionStr = logOptionDesc->DebugString();
bgs::protocol::EntityId entityId;
const google::protobuf::Descriptor* entityIdDesc = entityId.GetDescriptor();
std::string entityIdStr = entityIdDesc->DebugString();
The logOptionStr string I got looked something like this:
enum LogOption {
HIDDEN = 1;
HEX = 2;
}
and entityIdStr:
message EntityId {
required fixed64 high = 1 [(.bgs.protocol.log) = HEX];
required fixed64 low = 2 [(.bgs.protocol.log) = HEX];
}
Notice the EntityId message contains some field options. Without resolving this dependency I cannot generate a FileDescriptor that can help me restore the .proto files. I suspect the EntityId string should look something like the following:
import "LogOption.proto";
package bgs.protocol;
extend google.protobuf.FieldOptions {
optional LogOptions log = HEX;
}
message EntityId {
required fixed64 high = 1 [(.bgs.protocol.log) = HEX];
required fixed64 low = 2 [(.bgs.protocol.log) = HEX];
}
Is it possible to restore the .proto files that require additional information such as package, field options and imports? What else do I need to do to restore the .proto files?

How to write or convert float-type data to leveldb in caffe

I am building a LevelDB database to train a Caffe model, so I use convert_imageset.cpp. This tool writes only char-type data to LevelDB.
But I need to write float data to LevelDB: it is preprocessed image data, so it is of float type.
How can I write or convert this float data to LevelDB?
This float data is a set of vectors with 4096 dimensions.
Please help me.
Alternatively, how can I convert it to HDF5Data?

HDF5 stands for Hierarchical Data Format. You can manipulate such files, for example, with R (RHDF5 documentation).
Other software that can process HDF5 includes MATLAB and Mathematica.
EDIT
A new set of tools called HDFql has been recently released to simplify "managing HDF files through a high-level language like C/C++". You can check it out here

import glob
import os
import shutil

import caffe
import cv2 as cv
import lmdb
import numpy as np


def del_and_create(dname):
    # Start every run from an empty LMDB directory
    if os.path.exists(dname):
        shutil.rmtree(dname)
    os.makedirs(dname)


def get_img_datum(image_fn):
    img = cv.imread(image_fn, cv.IMREAD_COLOR)
    # OpenCV returns HWC; Caffe expects CHW
    img = img.swapaxes(0, 2).swapaxes(1, 2)
    datum = caffe.io.array_to_datum(img, 0)
    return datum


def get_jnt_datum(joint_fn):
    # Float labels go into Datum.float_data instead of the uint8 data field
    joint = np.load(joint_fn)
    datum = caffe.io.caffe_pb2.Datum()
    datum.channels = len(joint)
    datum.height = 1
    datum.width = 1
    datum.float_data.extend(joint.tolist())
    return datum


def create_dataset():
    # map_size is the maximum database size (here 1 TiB)
    img_db_fn = 'img.lmdb'
    del_and_create(img_db_fn)
    img_env = lmdb.Environment(img_db_fn, map_size=1099511627776)
    img_txn = img_env.begin(write=True, buffers=True)
    jnt_db_fn = 'joint.lmdb'
    del_and_create(jnt_db_fn)
    jnt_env = lmdb.Environment(jnt_db_fn, map_size=1099511627776)
    jnt_txn = jnt_env.begin(write=True, buffers=True)
    img_fns = glob.glob('imageData/*.jpg')
    fileCount = len(img_fns)
    print('A total of %d images.' % fileCount)
    jnt_fns = glob.glob('jointData/*.npy')
    jointCount = len(jnt_fns)
    if fileCount != jointCount:
        raise SystemExit('The file counts do not match')
    # Random key order, so iterating the database in key order is shuffled
    keys = np.arange(fileCount)
    np.random.shuffle(keys)
    for i, (img_fn, jnt_fn) in enumerate(zip(sorted(img_fns), sorted(jnt_fns))):
        img_datum = get_img_datum(img_fn)
        jnt_datum = get_jnt_datum(jnt_fn)
        # LMDB keys must be bytes (required on Python 3, harmless on Python 2)
        key = ('%010d' % keys[i]).encode('ascii')
        img_txn.put(key, img_datum.SerializeToString())
        jnt_txn.put(key, jnt_datum.SerializeToString())
        # Commit periodically to keep transactions small
        if i % 10000 == 0:
            img_txn.commit()
            jnt_txn.commit()
            jnt_txn = jnt_env.begin(write=True, buffers=True)
            img_txn = img_env.begin(write=True, buffers=True)
            print('%d %s %s' % (i, os.path.basename(img_fn), os.path.basename(jnt_fn)))
    img_txn.commit()
    jnt_txn.commit()
    img_env.close()
    jnt_env.close()


if __name__ == '__main__':
    create_dataset()
The code above expects the images in imageData/*.jpg and the label vector of each image as a .npy file in jointData/; images and labels are paired by sorted filename.
Credits: https://github.com/mitmul/deeppose/blob/caffe/scripts/dataset.py
Note: I have seen Shai's answer to a question, which claims that LMDB does not support float-type data. However, this code snippet works for me with the latest versions of Caffe and LMDB. Since that answer is quite old, it is likely that older versions did not support float-type data.
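The shared, shuffled keys in create_dataset do two jobs. LMDB returns entries sorted bytewise by key, so a random permutation plus zero-padding ('%010d') makes the read order random rather than filename order, and because the i-th image and the i-th joint vector are written under the same key in both databases, two readers advancing in key order see matching pairs. The minimal stdlib-only sketch below illustrates that idea; plain dicts stand in for the two LMDB databases, and shuffled_keys, build_dbs and read_in_key_order are hypothetical helper names.

```python
import random
from typing import Dict, Iterator, List, Optional, Tuple


def shuffled_keys(n: int, seed: Optional[int] = None) -> List[bytes]:
    """Zero-padded keys in random order, like np.random.shuffle plus '%010d'."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    # Zero-padding makes bytewise (LMDB) order equal numeric order.
    return [('%010d' % k).encode('ascii') for k in order]


def build_dbs(pairs: List[Tuple[str, str]],
              seed: Optional[int] = None) -> Tuple[Dict[bytes, str], Dict[bytes, str]]:
    """Write the i-th (image, joint) pair under the same key in both stores."""
    img_db: Dict[bytes, str] = {}
    jnt_db: Dict[bytes, str] = {}
    for key, (img, jnt) in zip(shuffled_keys(len(pairs), seed), pairs):
        img_db[key] = img
        jnt_db[key] = jnt
    return img_db, jnt_db


def read_in_key_order(img_db: Dict[bytes, str],
                      jnt_db: Dict[bytes, str]) -> Iterator[Tuple[str, str]]:
    """Emulate two independent LMDB cursors, each advancing in sorted key order."""
    for img_key, jnt_key in zip(sorted(img_db), sorted(jnt_db)):
        yield img_db[img_key], jnt_db[jnt_key]
```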

ColdFusion - Reading in lots of images

What is the most efficient way of reading in lots of images in CF / Railo and checking their width and height?
In my app, I typically need to read in 20 or more images, and at the moment this takes up to 14 seconds to complete, which is a bit too long.
theImageRead = ImageNew(theImageSrc);
if ( imageGetWidth(theImageRead) > 100 ) {
writeOutput('<img src="' & theImageSrc & '" />');
}
Images are read from a list of absolute URLs. I need to get the images that exceed a certain dimension.
If there's a quicker solution to this, I'd love to get your insight. Perhaps underlying Java methods?
I am also using jSoup if there's anything in that which could help.
Thanks,
Michael.

I don't believe there's any way to determine the pixel dimensions of an image without reading its bytes and creating an image object. The main bottleneck here will be the HTTP request overhead.
That said, there are a few ways to speed up what you're trying to do:
Use threads to request the images concurrently; when all threads have finished processing, output the images.
If you display the same image or set of images more than once, cache it. If you don't want to cache the actual image, you can cache its metadata to avoid performing an HTTP request for every image.
Decide whether you need to output all the images to the page immediately, or whether some or all of them could be deferred and loaded via an Ajax request.
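The first two suggestions can be sketched as follows, in Python purely for illustration (it is not CFML): uncached URLs are fetched concurrently on a thread pool, per-URL widths are cached as metadata, and the output list is built only after every request has finished. get_width is a hypothetical callable that would download one image and return its pixel width; wide_images is an illustrative name, and the 100-pixel threshold comes from the question.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def wide_images(urls: List[str], get_width: Callable[[str], int],
                cache: Dict[str, int], min_width: int = 100,
                max_workers: int = 8) -> List[str]:
    """Return the URLs whose image is wider than min_width, fetching uncached ones concurrently."""
    # Request each uncached URL once; cached widths skip the HTTP round trip.
    missing = [u for u in dict.fromkeys(urls) if u not in cache]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map runs get_width in worker threads and yields results in input order.
        for url, width in zip(missing, pool.map(get_width, missing)):
            cache[url] = width
    # Every request has finished by now; build the output in the original order.
    return [u for u in urls if cache[u] > min_width]
```

Passing the same cache dict on the next page view means only new URLs trigger a request.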

I wrote this utility function quite a while ago (it runs on older ColdFusion versions, too). Maybe it helps.
Note that it requires the Java Advanced Imaging Image I/O Tools (jai-imageio). Download the .jar and put it in your class path (a CF restart is necessary).
/**
* Reads basic properties of many types of images. Values are
* returned as a struct consisting of the following elements:
*
* Property names, their types and default values:
* ImgInfo.width = 0 (pixels)
* ImgInfo.height = 0 (pixels)
* ImgInfo.fileSize = 0 (bytes)
* ImgInfo.isGrayscale = false (boolean)
* ImgInfo.isFile = false (boolean)
* ImgInfo.success = false (boolean)
* ImgInfo.error = "" (string)
*
* @param FilePath Physical path to image file.
* @return A struct, as described.
*/
function GetImageProperties(FilePath) {
var ImgInfo = StructNew();
var jImageIO = CreateObject("java", "javax.imageio.ImageIO");
var jFile = CreateObject("java", "java.io.File").init(FilePath);
var jBufferedImage = 0;
var jColorSpace = 0;
ImgInfo.width = 0;
ImgInfo.height = 0;
ImgInfo.fileSize = 0;
ImgInfo.isGrayscale = false;
ImgInfo.isFile = jFile.isFile();
ImgInfo.success = false;
ImgInfo.error = "";
try {
jBufferedImage = jImageIO.read(jFile);
ImgInfo.fileSize = jFile.length();
ImgInfo.width = jBufferedImage.getWidth();
ImgInfo.height = jBufferedImage.getHeight();
jColorSpace = jBufferedImage.getColorModel().getColorSpace();
ImgInfo.isGrayscale = (jColorSpace.getType() eq jColorSpace.TYPE_GRAY);
ImgInfo.success = true;
}
catch (any ex) {
ImgInfo.error = ToString(ex);
}
jImageIO = JavaCast("null", "");
jFile = JavaCast("null", "");
jBufferedImage = JavaCast("null", "");
jColorSpace = JavaCast("null", "");
return ImgInfo;
}
Use like:
imageInfo = GetImageProperties(theImageSrc);
if (imageInfo.success and imageInfo.width > 100) {
    writeOutput('<img src="#HTMLEditFormat(theImageSrc)#" />');
}
