I am trying to use ArduinoJson to parse Google's quickdraw dataset, which contains .ndjson files with multiple JSON objects inside. I figured out how to retrieve the first of the objects in the file using the following simple code:
DeserializationError deserialization_error = ArduinoJson::deserializeJson(doc, as_cstr);
if (deserialization_error) {
printf("deserializeJson() failed: %s\n", deserialization_error.c_str());
}
However, this only parses the first object in the ndjson file.
According to the website, I get the sense that something else should happen automatically:
NDJSON, JSON Lines
When parsing a JSON document from an input stream, ArduinoJson stops reading as soon as the document ends (e.g., at the closing brace).
This feature allows reading JSON documents one after the other; for example, it allows reading line-delimited formats like NDJSON or JSON Lines.
{"event":"add_to_cart"}
{"event":"purchase"}
Is there some way to get the byte length of the parsed object so I can continue using the C string to parse consecutive objects? I did print out the C string and it does contain the entirety of the .ndjson file.
I found it.
Just call deserializeJson() multiple times on the same input:
DeserializationError error = deserializeJson(doc, sceneFile);
or:
deserializeJson(docline1, sceneFile);
deserializeJson(docline2, sceneFile);
deserializeJson(docline3, sceneFile);
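For completeness, here is a minimal sketch of calling it repeatedly in a loop over an NDJSON stream, which sidesteps the need to track how many bytes each document consumed. This assumes the data is read through a stream; on desktop builds ArduinoJson also accepts std::istream when ARDUINOJSON_ENABLE_STD_STREAM is enabled. The file name and document handling are illustrative:

#include <ArduinoJson.h>
#include <cstdio>
#include <fstream>  // ArduinoJson can read directly from std::istream

int main() {
  std::ifstream ndjson("quickdraw.ndjson");  // hypothetical file name
  JsonDocument doc;  // ArduinoJson 7; on v6 use DynamicJsonDocument with a capacity

  while (true) {
    DeserializationError error = ArduinoJson::deserializeJson(doc, ndjson);
    if (error == DeserializationError::EmptyInput)
      break;  // nothing left in the stream: all documents have been read
    if (error) {
      printf("deserializeJson() failed: %s\n", error.c_str());
      break;
    }
    // use doc here; each iteration holds the next object from the .ndjson file
  }
  return 0;
}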
I'm trying to obtain the send date of an .msg email message file. After endless searching, I've concluded that the send date is not kept in its own stream within the file (but please correct me if I'm wrong). Instead, it appears that the date must be obtained from the stream containing the standard email headers (a stream named __substg1.0_007D001F).
So I've managed to obtain the email header stream and store it in a buffer. At this point, I need to find and parse the Date field from the headers. I'm finding this difficult, because I don't believe I can use a standard email-parsing C++ library. After all, I only have a header stream--not an entire, standard email file.
I'm currently trying a regex, perhaps something like this:
std::wregex regexDate(L"^Date:(.*)\r\n");
std::wsmatch match;
if (std::regex_search(strHeader, match, regexDate)) {
    //...
}
But I'm reluctant to use regex (I'm concerned that it'll be error-prone), and I'm wondering if there's a more robust, accepted approach to parsing headers. Perhaps splitting the header string on new lines and finding the one that begins with Date:? Any guidance would be greatly appreciated.
One other consideration: I'm not sure it's possible to read in the header stream line by line, because IStream doesn't have a get line method.
(Side note: I've also tried obtaining message data using C++ Outlook automation, but that seems to involve some security and compatibility issues, so it won't work out.)
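For reference, here is a minimal sketch of the split-on-newlines idea from the question, assuming the header stream has already been read into a std::wstring; header folding (continuation lines) and case-insensitive matching are left out:

#include <cstddef>
#include <string>

// Returns the raw value of the first "Date:" header line, or an empty string.
std::wstring FindDateHeader(const std::wstring& strHeader) {
  std::size_t pos = 0;
  while (pos < strHeader.size()) {
    std::size_t eol = strHeader.find(L"\r\n", pos);
    if (eol == std::wstring::npos) eol = strHeader.size();
    std::wstring line = strHeader.substr(pos, eol - pos);
    if (line.compare(0, 5, L"Date:") == 0)
      return line.substr(5);  // untrimmed value, e.g. " Tue, 1 Jan 2019 ..."
    pos = eol + 2;  // skip past the CRLF
  }
  return L"";
}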
The send date is stored in an .msg file, but as you note, it is not in its own stream. As a short, fixed-width value, it can be found in the __properties_version1.0 stream object under the root entry (or under an attachment object for embedded messages), with the property tag 0x00390040 (property ID 0x0039, type 0x0040): the PidTagClientSubmitTime property, which the MS-OXOMSG documentation describes as
Contains the current time, in UTC, when the email message is submitted.
MS-OXCMAIL Section 2.2.3.2.2: Sent time elaborates on this:
To set the value of the PidTagClientSubmitTime property ([MS-OXOMSG] section 2.2.3.11), clients MUST set the Date header value, as specified in [RFC2822].
This has the property type 0x0040, PtypTime, which, per the list of Property Data Types, is:
8 bytes; a 64-bit integer representing the number of 100-nanosecond intervals since January 1, 1601
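For illustration, here is a minimal sketch of turning that raw 8-byte value into a readable date, assuming you have already extracted it from the property entry as a 64-bit little-endian integer (the function and variable names are made up):

#include <windows.h>
#include <cstdio>

// rawValue: the 64-bit PtypTime value read for property tag 0x00390040.
void PrintSubmitTime(unsigned long long rawValue) {
  FILETIME ft;
  ft.dwLowDateTime  = static_cast<DWORD>(rawValue & 0xFFFFFFFFULL);
  ft.dwHighDateTime = static_cast<DWORD>(rawValue >> 32);

  SYSTEMTIME st;  // PidTagClientSubmitTime is UTC
  if (FileTimeToSystemTime(&ft, &st)) {
    printf("%04u-%02u-%02u %02u:%02u:%02u UTC\n",
           st.wYear, st.wMonth, st.wDay, st.wHour, st.wMinute, st.wSecond);
  }
}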
I have a PCollection of KV where the key is a filename and the value is some additional info about the file (e.g., the "Source" system that generated the file). E.g.,
KV("gs://bucket1/dir1/X1.dat", "SourceX"),
KV("gs://bucket1/dir2/Y1.dat", "SourceY")
I need to read all lines from the files and pair each line with its "Source" field, returning a KV PCollection:
KV(line1 from X1.dat, "SourceX")
KV(line2 from X1.dat, "SourceX")
...
KV(line1 from Y1.dat, "SourceY")
I was able to achieve this by calling FileIO.match() followed by a DoFn in which I sequentially read each file and append its Source (retrieved from a map passed as a side input).
To get the benefit of parallel reading, could I use TextIO.readAll() to achieve this? TextIO.read() returns a PCollection<String> without filename info. How can I join it back to the filename-to-Source map? I tried the WithKeys transform, but it's not working ...
Currently, using FileIO.match() as you are doing is the best way to accomplish this, but once https://github.com/apache/beam/pull/12645 is merged, you'll be able to use the new ContextualTextIO transforms.
Note that computing line numbers in a distributed manner is inherently expensive; you might want to see if you can use offsets (much easier to compute, and ordered the same as line numbers) instead.
If I understand correctly, you want to read the file in parallel? Unfortunately, TextIO.readAll does not have this feature. You will have to use FileIO.match, and then write your DoFn to read the file in the custom way that you want.
This is because you will not be able to do a random seek into a file and preserve the count of line numbers.
Is reading files serially a bottleneck for your pipeline?
import random
from time import strftime, gmtime

d = random.randint(1, 30)
data = [d, strftime("%Y%m%d %H%M%S", gmtime())]  # random num, system time
client.publish("gas", str(data))  # client: the MQTT client set up earlier
This is part of my Python code, which is version 2.
I'm trying to send a list using MQTT.
However, if I write bytearray instead of str on the third line,
it says "ValueError: string must be of size 1".
So I used str to make it a string type.
Can I send just a list, which is NOT a string type?
MQTT message payloads are just byte arrays; there is no inherent format to them. Strings tend to work as long as both ends of the transaction are using the same character encoding.
If you want to send structured data (such as the list) then you need to decide on a way to encode that structure so the code receiving the message will know how to reconstruct it.
The current usual solution to this problem is to encode the structure as JSON, but XML or something like Protocol Buffers are also good candidates.
The following question has some examples of converting Python lists to JSON objects
Serializing list to JSON
I have a text file (*.css, Cascading Style Sheets), which is plain text.
Then I have additional program information, just some double and int values, which has nothing to do with the text file directly.
I would like to store that state in a file, so that when I open that file I have access to the content of the *.css and the double and int values.
So I would be able to restore the application's last state, with the text file content and those double and int values.
What would be the most effective way?
I guess you'll want the result to still be usable as a CSS file. In that case, add a comment block with a marker at the beginning, where you can store your data in some ASCII format, e.g. JSON. For example:
/* --#--
{"x": 42, "y": 47.11, "whatever": "blabla"}
*/
/* here comes the original css */
Then you can easily find out if the data is already there by looking for /* --#--. You can use existing JSON parsers to retrieve your data and existing JSON writers to generate the file. You don't have to parse the whole CSS, only the comments with the marker.
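For example, here is a minimal sketch of locating and extracting that block in C++; the function name is made up, and parsing the extracted JSON is left to whichever library you pick:

#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>

// Returns the text between the /* --#-- marker and the closing */, or "".
std::string extractEmbeddedData(const std::string& path) {
  std::ifstream in(path);
  std::stringstream buffer;
  buffer << in.rdbuf();
  std::string css = buffer.str();

  const std::string marker = "/* --#--";
  std::size_t start = css.find(marker);
  if (start == std::string::npos) return "";  // no data block yet
  start += marker.size();

  std::size_t end = css.find("*/", start);
  if (end == std::string::npos) return "";    // malformed block
  return css.substr(start, end - start);      // the JSON payload
}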
I'm writing a simple calendar application which saves its data in a text file. I use the iCalendar format, so my text file ends with "END:VCALENDAR".
When the user adds a new event, the application should write the associated data at the end of the text file without overwriting "END:VCALENDAR". How can I do this? What about deleting an event which is saved in the middle of the text file? Is there a need to write the whole file again using the updated data? Many thanks.
You can't dynamically "expand" the file by writing in the middle of it.
You'll need to either:
Deserialize the whole calendar to memory, then write it back (best option), or
Read into memory everything which lies past the point where you want to insert the data, write your data, then write the stored file "tail".
There isn't any way of inserting into the middle of a file; the underlying OS doesn't support it. The usual technique is to copy the file into a temporary file, making whatever modifications you need along the way; then, and only if there are no errors on the output of the copy (do verify that the output stream has not failed after the close), delete the input file and rename/move the temporary file to the original name.
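A minimal sketch of that copy-to-temporary-file technique, assuming the new event block is already formatted as iCalendar lines (file and function names are illustrative):

#include <cstdio>
#include <fstream>
#include <string>

// Inserts eventBlock just before the END:VCALENDAR line by rewriting the file.
bool addEvent(const std::string& path, const std::string& eventBlock) {
  std::ifstream in(path);
  std::ofstream out(path + ".tmp");
  std::string line;
  while (std::getline(in, line)) {
    if (line.rfind("END:VCALENDAR", 0) == 0)  // footer reached: insert first
      out << eventBlock;                      // assumed to end with a line break
    out << line << "\n";
  }
  out.close();
  if (!out) return false;                     // verify the copy succeeded
  std::remove(path.c_str());
  return std::rename((path + ".tmp").c_str(), path.c_str()) == 0;
}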
There is no method in the standard C++ libraries that, unlike append, gives you the option to insert at an arbitrary position in a file, be it a text or a binary file.
There are two options for you then:
First is the one you are presuming, that is, read the whole file, update the data and write it back again.
Second is to seek in the file to the first character of the last line, the E of END:VCALENDAR, write your event, and then append "END:VCALENDAR" again.
And yes, you can programmatically find that first character of the last line, the E right after the last newline character.
Sorry, there isn't really any other way around, as far as I know.
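For illustration, a minimal sketch of that second option, assuming nothing follows END:VCALENDAR in the file except a final line break (names are illustrative):

#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Overwrites the final END:VCALENDAR line with the new event, then re-appends it.
bool appendEvent(const std::string& path, const std::string& eventBlock) {
  // Binary mode so string offsets match file offsets on every platform.
  std::fstream file(path, std::ios::in | std::ios::out | std::ios::binary);
  if (!file) return false;

  // Read the whole file once to locate where the footer begins.
  std::string contents((std::istreambuf_iterator<char>(file)),
                       std::istreambuf_iterator<char>());
  std::size_t pos = contents.rfind("END:VCALENDAR");
  if (pos == std::string::npos) return false;

  file.clear();  // clear the EOF flag before seeking back
  file.seekp(static_cast<std::streamoff>(pos));
  file << eventBlock << "END:VCALENDAR\r\n";  // eventBlock assumed to end with CRLF
  return static_cast<bool>(file);
}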