How do I read binary C++ protobuf data using Python protobuf?

The Python version of Google protobuf gives us only:
SerializeAsString()
Whereas the C++ version gives us both:
SerializeToArray(...)
SerializeAsString()
We're writing to our C++ file in binary format, and we'd like to keep it this way. That said, is there a way of reading the binary data into Python and parsing it as if it were a string?
Is this the correct way of doing it?
binary = get_binary_data()
binary_size = get_binary_size()
string = ''
for i in range(binary_size):
    string += binary[i]
message = MyMessage()
message.ParseFromString(string)
Update:
Here's a new example, and a problem:
message_length = 512
file = open('foobars.bin', 'rb')
eof = False
while not eof:
    data = file.read(message_length)
    eof = not data
    if not eof:
        foo_bar = FooBar()
        foo_bar.ParseFromString(data)
When we get to the foo_bar.ParseFromString(data) line, I get this error:
Exception Type: DecodeError
Exception Value: Too many bytes when decoding varint.
Update 2:
It turns out that the padding on the binary data was throwing protobuf off; too many bytes were being sent in, as the message suggests (in this case it was referring to the padding).
This padding comes from using the C++ protobuf function SerializeToArray on a fixed-length buffer. To eliminate it, I have used this temporary code:
message_length = 512
file = open('foobars.bin', 'rb')
eof = False
while not eof:
    data = file.read(message_length)
    eof = not data
    string = ''
    for i in range(0, len(data)):
        byte = data[i]
        if byte != '\xcc': # yuck!
            string += data[i]
    if not eof:
        foo_bar = FooBar()
        foo_bar.ParseFromString(string)
I think there is a design flaw here. I will re-implement my C++ code so that it writes variable-length arrays to the binary file. As advised by the protobuf documentation, I will prefix each message with its binary size so that I know how much to read when I open the file with Python.

I'm not an expert with Python, but you can pass the result of a file.read() operation into message.ParseFromString(...) without having to build a new string type or anything.

Python 2 strings can contain any character, i.e. they are capable of holding "binary" data directly; in Python 3 the equivalent type is bytes, which is what a file opened in 'rb' mode returns. There should be no need to convert anything before parsing.

Related

NEAR FunctionCall `args` field

In near_primitives::views, the args field on FunctionCall is represented as a String. In the chain data model, transaction::Action::FunctionCall, the args field is a `Vec<u8>`.
The question is: will this args field always contain a valid JSON payload? We assume the answer is probably no, since the underlying field contains pure bytes.
In which circumstances would this be a valid JSON string, and in which circumstances would it be a binary format?
Finally, if a binary format is possible (likely), how can it be decoded? Is this in the developer's hands, and could it be any binary format?
See
https://github.com/near/nearcore/blob/14711926391d3ec1d23116658a295a62e77bc701/core/primitives/src/views.rs#L768
https://github.com/near/nearcore/blob/14711926391d3ec1d23116658a295a62e77bc701/core/primitives/src/transaction.rs#L113

In most cases args will be a base64-encoded JSON string.
Here's an example of how we decode them on the NEAR Indexer for Explorer side.
ActionView::FunctionCall {
    method_name,
    args,
    gas,
    deposit,
} => {
    if let Ok(decoded_args) = base64::decode(args) {
        if let Ok(mut args_json) = serde_json::from_slice(&decoded_args) {
            escape_json(&mut args_json);
            arguments["args_json"] = args_json;
        }
    }
    // (excerpt; the rest of the match arm is not shown)
Is this in the developer's hands, and could it be any binary format?
Yes.
Rainbow Bridge-related transactions have borsh-serialized args which are not possible to decode into JSON.
ref: https://github.com/near/near-indexer-for-explorer/blob/master/src/models/serializers.rs#L94-L103

args are not limited to any format at all; they are just a binary blob. What you see in views.rs is partially serialized data in which args are expected to be base64-encoded, which is why the field is a String. It is therefore always base64 data there, whether the underlying bytes are JSON, Borsh-serialized data, or a raw binary blob such as a PNG image.
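As a rough illustration of the decoding order described in these answers (base64 first, then an attempt at JSON, otherwise treat the bytes as opaque), here is a minimal Python sketch. The function name decode_args and the tagged-tuple return value are made up for the example; they are not part of any NEAR library.

```python
import base64
import json


def decode_args(args_b64):
    """Decode a base64 `args` string.

    Returns ("json", value) when the bytes parse as UTF-8 JSON,
    otherwise ("binary", raw_bytes).
    """
    raw = base64.b64decode(args_b64)
    try:
        return "json", json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, ValueError):
        # Not JSON: could be Borsh or any other contract-defined format.
        return "binary", raw


if __name__ == "__main__":
    json_args = base64.b64encode(b'{"receiver_id": "alice.near"}').decode("ascii")
    blob_args = base64.b64encode(b"\x89PNG\r\n").decode("ascii")
    print(decode_args(json_args))
    print(decode_args(blob_args))
```

Note that a successful JSON parse is only a heuristic: a short binary payload can happen to be valid JSON, so the contract's own documentation remains the authority on the format.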

Coldfusion 2018 No SOF segment Multi-Page TIFF Error

I have a new CF18 server and I'm getting some errors reading and converting some old images that were readable on my previous CF11 server. FYI GetReadableImageFormats results in "BMP,GIF,JPEG,JPEG 2000,JPEG2000,JPG,PNG,PNM,RAW,TIF,TIFF,WBMP"
Normally I read the files as a Binary and put it into memory for manipulation
<cffile action="readBinary" file="#file_location#" variable="binImage" />
<cfimage action="read" source="#binImage#" name="objImage" isbase64="no">
This now results in an error:
"An exception occurred while trying to read the image. No SOF segment in stream"
Reading the file with action="read" and dumping the left(binImage, 999) results:
"...2015:10:07 17:46:58 Kofax standard Multi-Page TIFF Storage Filter v3.03.000,..."
Then I tried reading it into java using:
<cfset tifFileName="#file_location#">
<cfscript>
ss = createObject("Java","com.sun.media.jai.codec.FileSeekableStream").init(tifFileName);
//create JAI ImageDecoder
decoder = createObject("Java","com.sun.media.jai.codec.ImageCodec").createImageDecoder("tiff", ss, JavaCast("null",""));
</cfscript>
Which yields an error:
"Decoding of old style JPEG-in-TIFF data is not supported."
I found this...
Decoding of old style JPEG-in-TIFF data is not supported
Do you think using TwelveMonkeys ImageIO is the best path to follow for my issue?
UPDATE: Based on the suggestion that there is an invalid marker 0xFF9E I tried the following:
<cffile action="readBinary" file="#file_location#" variable="binImage" />
<cfset hexEncoding = binaryEncode(binImage, "hex")>
<cfset new_hexEncoding = replaceNoCase(hexEncoding, 'FF9E', 'FFE9', 'ALL')>
<cfset binImage = binaryDecode(new_hexEncoding, "hex")>
isImage(binImage) returns "NO" and the "No SOF segment in stream" error persists. I looped over the hexEncoding and found the FF9E string 23x. I've never edited raw image code so I'm not sure my replace is correct.
Edit: At this point I'm fairly certain my search-and-replace logic (hexEncoding, 'FF9E', 'FFE9') is flawed. There is no occurrence of 0xFF9E in the binary-encoded binImage.

This was driving me nuts. I tried everything I could find short of installing extra Java libraries or routing the file through other executables to do the conversion. In my case there is only one JPEG in each TIFF, so I wrote something that literally grabs the binary data for the JPEG out of the TIFF (it doesn't account for pages) and serves it up. Once you have the binary of the JPEG you can write it to a file, do conversions on it, or even stream it directly to the browser. Here you go, future people who need this. I didn't write it to handle pages or detect what kind of TIFF it is, since for my uses I already know all that. My files are .bin files, but they are all the same single-page JPEG-in-TIFF, and I needed a way to serve them up quickly in a format that browsers don't hate. This runs fast enough to be served on the fly. Is there a better way? Probably, but this works, is self-contained and copy-and-paste ready, and should make sense to anyone who needs to edit it.
<cfscript>
    strFileName = "test.tiff";
    blnOutputImageToBrowser = true;
    blnSaveToFile = true;
    strSaveFile = GetDirectoryFromPath(GetCurrentTemplatePath())&"test.jpg";
    imgByteArray = FileReadBinary(strFileName);
    //Convert to HEX String
    hexString = binaryEncode(imgByteArray,"hex"); //Convert binary to HEX String, so we can pattern search it
    //Set byte array length (not used below)
    hexLength = arraylen(imgByteArray);
    //Find Start of JPG Data in HEX String
    jpegStartHEX = find("FFD8FF",hexString);
    jpegStartBIN = (jpegStartHEX-1)/2; //-1 because CF indexes start on 1 and everyone else starts on 0. /2 because the HEX string positions are double the byte array positions
    objByteBuffer = CreateObject("java","java.nio.ByteBuffer"); //Init JAVA byte buffer class for us to use later (this makes it go faster than trying to convert the hex string back to binary)
    //Find Stop of JPG Data
    jpegStopHEX = 0;
    jpegStopBIN = 0;
    intSearchIDX = jpegStartHEX+6; //Might as well start after the JPEG start block
    blnStop = false;
    while (intSearchIDX < len(hexString) && jpegStopHEX == 0 && !blnStop) {
        newIDX = find("FFD9",hexString,intSearchIDX);
        if (newIDX == 0) {
            blnStop=true;
        }
        else {
            if (newIDX%2 == 0) { //bad search try again (due to indexing in CF starting on 1 instead of 0, the even numbers are in between hex code [they are pairs like 00 and FF])
                intSearchIDX = newIDX+1;
            }
            else { //Found ya
                jpegStopHEX = newIDX;
                blnStop=true;
            }
        }
    }
    jpegStopBIN = (jpegStopHEX-1)/2; //-1 because CF indexes start on 1 and everyone else starts on 0. /2 because the HEX string positions are double the byte array positions
    //Dump JPG Binary into ByteArray from the start and stop positions we discovered
    jpegLengthBIN = jpegStopBIN+2-jpegStartBIN;
    objBufferImage = objByteBuffer.Allocate(JavaCast( "int", jpegLengthBIN ));
    objBufferImage.Put(imgByteArray,JavaCast( "int", jpegStartBIN ),JavaCast( "int", jpegLengthBIN ));
    if (blnSaveToFile) { //Dump byte array into test file
        fileWrite(strSaveFile,objBufferImage.Array());
    }
    if (blnOutputImageToBrowser) {
        img = ImageNew( objBufferImage.Array() );
        ImageResize(img,"1200","","highestPerformance"); //Because we might as well show an example of resizing
        outputImage(toBinary(toBase64(img))); //You could skip loading the byte array as an image object and just plop the binary in directly if you don't need to manipulate it any
    }
</cfscript>
<cffunction name="outputImage" returntype="void">
    <cfargument name="binInput" type="binary">
    <cfcontent variable="#binInput#" type="image/png" reset="true" />
    <cfreturn>
</cffunction>
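For comparison, the same marker-scanning idea can be sketched in a few lines of Python, where searching the raw bytes directly avoids the odd/even alignment check that the hex-string approach needs. This is an illustrative sketch only: the function name extract_embedded_jpeg is invented here, and it carries the same assumption as the answer above, namely that the file contains a single complete JPEG stream between an FFD8FF start and the first following FFD9.

```python
def extract_embedded_jpeg(data):
    """Return the bytes from the first JPEG start marker through the first
    end-of-image marker that follows it, or None if either is missing."""
    start = data.find(b"\xff\xd8\xff")  # SOI plus the lead byte of the next marker
    if start == -1:
        return None
    end = data.find(b"\xff\xd9", start + 3)  # first EOI after the start block
    if end == -1:
        return None
    return data[start:end + 2]  # +2 so the EOI marker itself is included


if __name__ == "__main__":
    # Synthetic stand-in for a TIFF wrapper around a JPEG stream.
    fake_tiff = b"II*\x00header" + b"\xff\xd8\xff\xe0payload\xff\xd9" + b"trailer"
    jpeg = extract_embedded_jpeg(fake_tiff)
    print(len(jpeg), jpeg[:3], jpeg[-2:])
```

Files with an embedded thumbnail would contain an earlier FFD9 than the one that ends the main image, so this sketch, like the original, is only suitable when the layout of the files is already known.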

SerialPort Encoding

I am trying to configure a serial port in C++; everything works except its Encoding.
My code is:
serialPort->PortName = "COM3";
serialPort->BaudRate = 115200;
serialPort->NewLine = "***";
serialPort->Encoding = Encoding->GetEncoding(28591);
I get the following error:
type name is not allowed.
How do I use text encodings in C++?

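Encoding is a type name (System::Text::Encoding), so its static GetEncoding method has to be called with the scope resolution operator :: rather than ->. The last line of the snippet should be: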
serialPort->Encoding = Encoding::GetEncoding(28591);

Use Scala Iterator to break up large stream (from string) into chunks using a RegEx match, and then operate on those chunks?

I'm currently using a not-very-Scala-like approach to parse large Unix mailbox files. I'm still learning the language and would like to challenge myself to find a better way, however, I do not believe I have a solid grasp on just what can be done with an Iterator and how to effectively use it.
I'm currently using org.apache.james.mime4j, and I use the org.apache.james.mime4j.mboxiterator.MboxIterator to get a java.util.Iterator from a file, as so:
// registers an implementation of a ContentHandler that
// allows me to construct an object representing an email
// using callbacks
val handler: ContentHandler = new MyHandler();
// creates a parser that parses a SINGLE email from a given InputStream
val parser: MimeStreamParser = new MimeStreamParser(configBuilder.build());
// register my handler
parser.setContentHandler(handler);
// Get a java.util.Iterator
val iterator = MboxIterator.fromFile(fileName).build();
// For each email, process it using above Handler
iterator.forEach(p => parser.parse(p.asInputStream(Charsets.UTF_8)))
From my understanding, the Scala Iterator is much more robust, and probably a lot more capable of handling something like this, especially because I won't always be able to fit the full file in memory.
I need to construct my own version of the MboxIterator. I dug through the source for MboxIterator and found a good RegEx pattern for detecting the beginning of individual email messages, but I'm drawing a blank on what to do next.
I created the RegEx like so:
val MESSAGE_START = Pattern.compile(FromLinePatterns.DEFAULT, Pattern.MULTILINE);
What I want to do (based on what I know so far):
Build a FileInputStream from an MBOX file.
Use Iterator.continually(stream.read()) to read through the stream
Use .takeWhile() to continue to read until the end of the stream
Chunk the Stream using something like MESSAGE_START.matcher(someString).find(), or use it to find the indexes that separate the messages
Read the chunks created, or read the bits in between the indexes created
I feel like I should be able to use map(), find(), filter() and collect() to accomplish this, but I'm getting thrown off by the fact that they only give me Ints to work with.
How would I accomplish this?
EDIT:
After doing some more thinking on the subject, I thought of another way to describe what I think I need to do:
I need to keep reading from the stream until I get a string that matches my RegEx
Maybe group the previously read bytes?
Send it off to be processed somewhere
Remove it from the scope somehow so it doesn't get grouped the next time I run into a match
Continue to read the stream until I find the next match.
Profit???
EDIT 2:
I think I'm getting closer. Using a method like this gets me an iterator of iterators. However, there are two issues: 1. Is this a waste of memory? Does this mean everything gets read into memory? 2. I still need to figure out a way to split by the match, but still include it in the iterator returned.
def split[T](iter: Iterator[T])(breakOn: T => Boolean): Iterator[Iterator[T]] =
  new Iterator[Iterator[T]] {
    def hasNext = iter.hasNext
    def next = {
      val cur = iter.takeWhile(!breakOn(_))
      iter.dropWhile(breakOn)
      cur
    }
  }.withFilter(l => l.nonEmpty)

If I understand correctly, you want to lazily chunk a large file delimited by a regex recognizable pattern.
You could try to return an Iterator for each request but the correct iterator management would not be trivial.
I'd be inclined to hide all file and iterator management from the client.
class MBox(filePath :String) {
  private val file = io.Source.fromFile(filePath)
  private val itr = file.getLines().buffered
  private val header = "From .+ \\d{4}".r //adjust to taste
  def next() :Option[String] =
    if (itr.hasNext) {
      val sb = new StringBuilder()
      sb.append(itr.next() + "\n")
      while (itr.hasNext && !header.matches(itr.head))
        sb.append(itr.next() + "\n")
      Some(sb.mkString)
    } else {
      file.close()
      None
    }
}
testing:
val mbox = new MBox("so.txt")
mbox.next()
//res0: Option[String] =
//Some(From MAILER-DAEMON Fri Jul 8 12:08:34 2011
//some text AAA
//some text BBB
//)
mbox.next()
//res1: Option[String] =
//Some(From MAILER-DAEMON Mon Jun 8 12:18:34 2012
//small text
//)
mbox.next()
//res2: Option[String] =
//Some(From MAILER-DAEMON Tue Jan 8 11:18:14 2013
//some text CCC
//some text DDD
//)
mbox.next() //res3: Option[String] = None
There is only one Iterator per open file and only the safe methods are invoked on it. The file text is realized (loaded) only on request and the client gets just what's requested, if available. Instead of all lines in one long String you could return each line as part of a collection, Seq[String], if that's more applicable.
UPDATE: This can be modified for easy iteration.
class MBox(filePath :String) extends Iterator[String] {
  private val file = io.Source.fromFile(filePath)
  private val itr = file.getLines().buffered
  private val header = "From .+ \\d{4}".r //adjust to taste
  def next() :String = {
    val sb = new StringBuilder()
    sb.append(itr.next() + "\n")
    while (itr.hasNext && !header.matches(itr.head))
      sb.append(itr.next() + "\n")
    sb.mkString
  }
  def hasNext: Boolean =
    if (itr.hasNext) true else {file.close(); false}
}
Now you can .foreach(), .map(), .flatMap(), etc. But you can also do dangerous things like .toList which will load the entire file.
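The underlying technique (read lines lazily, start a new chunk whenever a header line matches) is not specific to Scala. As a language-neutral illustration, here is the same idea as a Python generator. The name iter_messages is invented for this sketch, and the header pattern is the same "adjust to taste" placeholder used above rather than mime4j's actual From-line pattern.

```python
import io
import re

# Same illustrative header pattern as the Scala version; adjust to taste.
HEADER = re.compile(r"From .+ \d{4}")


def iter_messages(lines):
    """Lazily yield one message per header line from an iterable of lines.

    Only the lines of the current message are held in memory at any time.
    """
    current = []
    for line in lines:
        # A header line closes the message collected so far.
        if current and HEADER.fullmatch(line.rstrip("\n")):
            yield "".join(current)
            current = []
        current.append(line)
    if current:
        yield "".join(current)


if __name__ == "__main__":
    sample = io.StringIO(
        "From MAILER-DAEMON Fri Jul 8 12:08:34 2011\n"
        "some text AAA\n"
        "From MAILER-DAEMON Mon Jun 8 12:18:34 2012\n"
        "small text\n"
    )
    for message in iter_messages(sample):
        print(repr(message))
```

Passing an open file object instead of the StringIO keeps the whole pipeline streaming, with the same caveat as above: collecting the generator into a list loads every message at once.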

VBS script attempting to write to .rc file returns error

I am trying to find a way to edit and write to a resource .rc file. I attempted to use the sample code listed at
"How to increment values in resourse file by using vbscript", but the last line in both samples ( fso.OpenTextFile(rcfile, 2).Write rctext ) returned the same error:
Error: Invalid procedure call or argument
Code: 800A0005
Source: Microsoft VBScript runtime error
I modified the script to write out to a .txt file and that worked fine, but I'm baffled as to what may be causing the problem writing out to a .rc file.

From the linked sample (simplified)
rctext = fso.OpenTextFile(rcfile).ReadAll
rctext = ....
fso.OpenTextFile(rcfile, 2).Write rctext
The idea is to read the whole file; since no variable holds a reference to the opened file, it gets closed. The code then changes what needs to be changed, opens the file again (this time for writing), and writes the changed content back.
Usually this works. But sometimes the file opened for reading is not closed quickly enough for it to be reopened for writing.
To ensure the file is closed and then can be opened for writing, change the reading code to
set f = fso.OpenTextFile(rcfile)
rctext = f.ReadAll
f.Close

As your line
fso.OpenTextFile(rcfile, 2).Write rctext
does three things (access fso, open file, write to it), there are many things that could go wrong. Please see this answer for ideas about problems concerning the first two actions. Another answer concerns the write.
In your case, the evidence - works with a.txt, but not with b.rc - makes it highly improbable that the file's opening is to blame (so .Close won't save you). I suspect that the .rc contains Unicode (UTF-8/UTF-16) data that the textstream can't encode.
So either use the unicode parameter to read/write open the file with UTF-16 encoding or an ADODB.Stream for UTF-8.
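One way to check the suspicion about the file's encoding before changing the script is to look at the first bytes of the .rc file for a byte-order mark. The sketch below is a hedged illustration in Python rather than VBScript; the helper name sniff_bom is made up, and a file without a BOM can still be UTF-8 or an ANSI code page, so a None result proves nothing by itself.

```python
import codecs


def sniff_bom(raw):
    """Return a Python codec name suggested by a leading byte-order mark,
    or None when the data does not start with a UTF-8/UTF-16 BOM."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if raw.startswith(codecs.BOM_UTF16_LE) or raw.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"
    return None


if __name__ == "__main__":
    # Stand-in for the first bytes of a resource file saved as "Unicode".
    sample = "FILEVERSION 1,0,0,1".encode("utf-16")
    codec = sniff_bom(sample)
    print(codec)
    if codec:
        print(sample.decode(codec))
```

For a real file, the bytes would come from open(path, "rb").read(); a "utf-16" result corresponds to opening the file with the Unicode (TristateTrue) format in FileSystemObject.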

It seems that the answer to my question required both of your answers (@MC ND and @Ekkehard.Horner). Once I changed the VBS script to open and write the .rc file as Unicode (I'm not sure why I have to), the script executed without error.
Here is the VBS script in its final form:
Const ForReading = 1, ForWriting = 2
Const TristateUseDefault = -2, TristateTrue = -1, TristateFalse = 0
Const DoNotCreate = false
rcFile = "C:\Path\To\RC\File.rc"
major = 4
minor = 3
maint = 2
build = 1
version = major & "," & minor & "," & maint & "," & build
Set fso = CreateObject("Scripting.FileSystemObject")
Set fileObj = fso.OpenTextFile(rcFile, ForReading, DoNotCreate, TristateTrue)
rcText = fileObj.ReadAll
fileObj.Close
Set regex = New RegExp
regex.Global = True
regex.Pattern = "(PRODUCTVERSION|FILEVERSION) \d+,\d+,\d+,\d+"
rcText = regex.Replace(rcText, "$1 " & version)
regex.Pattern = "(""(ProductVersion|FileVersion)"",) ""\d+, \d+, \d+, \d+"""
rcText = regex.Replace(rcText, "$1 """ & Replace(version, ",", ", ") & """")
Set fileObj = fso.GetFile(rcFile)
Set textStream = fileObj.OpenAsTextStream(ForWriting, TristateTrue)
textStream.Write rcText
textStream.Close
The only thing that does not seem to work is the regex for replacing the ProductVersion|FileVersion values, but hopefully I can hammer that out within a reasonable time.
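The thread does not say why the second pattern fails to match. One possible cause, which is purely a guess, is that the quoted values in the file use different spacing or dot separators (for example "1.0.0.1") than the pattern's exact ", " sequence. The Python sketch below shows a more tolerant version of both replacements; the function name bump_version is hypothetical and the sample text is synthetic.

```python
import re


def bump_version(rc_text, version):
    """Replace numeric and quoted version fields in .rc text.

    `version` is a sequence of four integers, e.g. (4, 3, 2, 1).
    """
    numeric = ",".join(str(n) for n in version)
    quoted = ", ".join(str(n) for n in version)
    # FILEVERSION 1,0,0,1 style lines (case-sensitive, so quoted names are untouched).
    rc_text = re.sub(
        r"\b(PRODUCTVERSION|FILEVERSION)\s+\d+\s*,\s*\d+\s*,\s*\d+\s*,\s*\d+",
        lambda m: m.group(1) + " " + numeric,
        rc_text,
    )
    # VALUE "FileVersion", "1, 0, 0, 1" style lines; also accepts "1.0.0.1".
    rc_text = re.sub(
        r'("(?:ProductVersion|FileVersion)"\s*,\s*)'
        r'"\d+\s*[.,]\s*\d+\s*[.,]\s*\d+\s*[.,]\s*\d+"',
        lambda m: m.group(1) + '"' + quoted + '"',
        rc_text,
    )
    return rc_text


if __name__ == "__main__":
    sample = (
        "FILEVERSION 1,0,0,1\n"
        'VALUE "FileVersion", "1, 0, 0, 1"\n'
        'VALUE "ProductVersion", "1.0.0.1"\n'
    )
    print(bump_version(sample, (4, 3, 2, 1)))
```

The same loosened patterns (\s* around separators and [.,] between numbers) are also valid in VBScript's RegExp, so they could be carried back into the script if the file's formatting turns out to be the issue.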
