Data recording and extraction software for C++

I am interested in learning what libraries, tools, or frameworks exist for having a C++ program record data for later analysis and extraction. I'll describe what I envision to give an idea of what I'm looking for, but your suggestions need not fit it exactly.
I'd like to specify different record types for my program to record. For example, there might be a distinct record type for each type of message I get from a device, a record type for the results of major algorithms, and a record type for each kind of operator input. Ideally the code changes for adding a new record type would be fairly minimal: define a struct for the data to record, correlate it with a record type ID, and add the code that records instances to file.
After the main program runs, I'd like to run a data extraction tool that could give a summary of the data recorded and allow me to extract specific record types over a specified time period of the run. I could provide the executable to the tool, and it would use some of the same hooks a debugger uses to figure out the names of the fields in each struct for use in the extraction report. It would be nice if the extraction report could be produced as .txt, .xml, .csv (for opening in Excel), or .hdf (for opening in MATLAB).
This would be for Linux and the GCC compiler. Ideally suggestions would be FOSS, but proprietary solutions are welcome too. Let me know!

What you describe isn't anything special: just generic serialization and de-serialization. If you want a specific library recommendation, you should describe exactly what you want to do with the recorded data.
For serialization support, look into Boost.Serialization and s11n.
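As a rough sketch of the serialization route with Boost.Serialization (the record type, its fields, and the file name are invented for this example; link against boost_serialization):

```cpp
#include <boost/archive/text_iarchive.hpp>
#include <boost/archive/text_oarchive.hpp>
#include <cstdint>
#include <fstream>

// Hypothetical record type: one struct per kind of event to record.
struct DeviceMessageRecord {
    std::uint32_t record_type_id; // correlates the struct to a record type ID
    double        timestamp;      // seconds since start of run
    int           channel;
    double        value;

    template <class Archive>
    void serialize(Archive& ar, const unsigned int /*version*/) {
        ar & record_type_id;
        ar & timestamp;
        ar & channel;
        ar & value;
    }
};

int main() {
    // Record an instance to file during the run...
    {
        std::ofstream ofs("run.log");
        boost::archive::text_oarchive oa(ofs);
        DeviceMessageRecord r{42, 1.5, 3, 0.25};
        oa << r;
    }
    // ...and read it back in the extraction tool.
    {
        std::ifstream ifs("run.log");
        boost::archive::text_iarchive ia(ifs);
        DeviceMessageRecord r;
        ia >> r;
    }
    return 0;
}
```

Adding a new record type then amounts to defining another struct with a serialize() member, which is close to the minimal-change workflow described in the question; the extraction/reporting side would still be up to you.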

Related

How to read/restore big data file (SEGY format) with C/C++?

I am working on a project which needs to deal with large seismic data in SEGY format (from several GB to TB). The data represents a 3D underground structure.
The data structure is like:
1st trace: 2,3,5,3,5,....,6
2nd trace: 5,6,5,3,2,....,3
3rd trace: 7,4,5,3,1,....,8
...
What I want to ask is: in order to read and process the data quickly, do I have to convert it into another form, or is it better to read from the original SEGY file? And is there any existing C package to do that?
If you need to access it multiple times and
if you need to access it randomly and
if you need to access it fast
then load it to a database once.
Do not reinvent the wheel.
When dealing with data of that size, you may not want to convert it into another form unless you have to, though some software does just that. I found a list of free geophysics software on Wikipedia that looks promising; many of the packages are open source and read/write SEGY files.
Since you are new to programming, you may want to consider whether the Python library segpy suits your needs rather than a C/C++ option.
Several GB is rather medium-sized, if we are talking about poststack data.
You may use SEGY and convert on the fly, or you may invent your own format; it depends on what you need to do. Without changing the SEGY format, it's enough to create indexes to the traces. If the SEGY is saved as inlines, access through inlines is faster, although crossline access is not very bad either.
If it is 3D seismic, the best way to get equally quick access to all inlines/crosslines is to use your own format based on bricks of, e.g., 8x8 traces; loading the relevant bricks and selecting traces, access time can be very quick (2-3 seconds). Or you may use an SSD disk, or have RAM of 2.5x your SEGY size.
To quickly access timeslices you have two ways: 3D bricks, or a second file stored as timeslices (the quickest way). I did something like that 10 years ago; access time on a 12 GB SEGY was acceptable, 2-3 seconds in all 3 directions.
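To illustrate the indexing idea, a minimal sketch of computing trace offsets (it assumes fixed-length traces, the standard 3600-byte file header, 240-byte trace headers, and 4-byte samples; in real code these values should be read from the binary and trace headers):

```cpp
#include <cstdint>

// Byte offset of the i-th trace in a SEG-Y file with fixed-length traces.
// 3600 = 3200-byte textual header + 400-byte binary header;
// 240  = trace header size; ns = samples per trace; 4 = bytes per sample.
std::uint64_t trace_offset(std::uint64_t i, std::uint32_t ns) {
    return 3600ULL + i * (240ULL + static_cast<std::uint64_t>(ns) * 4ULL);
}
```

With an offset function (or a precomputed offset table for variable-length traces), random access to any inline/crossline position becomes a simple seek-and-read.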
SEGY in database? Wow ... ;)
The answer depends upon the type of data you need to extract from the SEG-Y file.
If you need to extract only the headers (Textual header, Binary header, Extended Textual File headers and Trace headers), then they can easily be extracted from the SEG-Y file by opening it as binary and reading the relevant information from the byte locations specified in the data exchange format (rev2). The extraction might depend upon the type of data (post-stack or pre-stack). Also, some headers might require conversion from one format to another (e.g. Textual headers are usually encoded in EBCDIC). The complete details about the byte locations and encoding formats can be found in the documentation above.
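As a rough illustration of that header extraction (a sketch only; the byte offsets below follow the published SEG-Y layout, but verify them against the revision of the standard your data uses):

```cpp
#include <cstdint>
#include <fstream>
#include <iostream>
#include <vector>

// Read a big-endian 16-bit unsigned integer from a byte buffer.
static std::uint16_t be16(const unsigned char* p) {
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

int main(int argc, char** argv) {
    if (argc < 2) { std::cerr << "usage: segy_header <file.sgy>\n"; return 1; }

    std::ifstream in(argv[1], std::ios::binary);
    if (!in) { std::cerr << "cannot open " << argv[1] << "\n"; return 1; }

    // Skip the 3200-byte EBCDIC textual header, then read the 400-byte
    // binary header that follows it.
    std::vector<unsigned char> bin(400);
    in.seekg(3200);
    in.read(reinterpret_cast<char*>(bin.data()), static_cast<std::streamsize>(bin.size()));
    if (!in) { std::cerr << "file too short\n"; return 1; }

    // Offsets are relative to the start of the binary header; they correspond
    // to file bytes 3217-3218, 3221-3222 and 3225-3226 in the SEG-Y spec.
    std::uint16_t sample_interval_us = be16(&bin[16]);
    std::uint16_t samples_per_trace  = be16(&bin[20]);
    std::uint16_t format_code        = be16(&bin[24]);

    std::cout << "sample interval (us): " << sample_interval_us << "\n"
              << "samples per trace:    " << samples_per_trace  << "\n"
              << "data format code:     " << format_code        << "\n";
    return 0;
}
```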
Extracting the trace data is a bit trickier and depends upon various factors such as the encoding, whether the number of trace samples is given in the trace headers, etc. A careful reading of the documentation, and getting to know the type of SEG-Y data you are working with, will make this task a lot easier.
Since you are working with the extracted data, I would recommend using existing libraries (segpy is one of the best Python libraries I have come across). There are also numerous freely available SEG-Y readers; a very nice list has already been mentioned by Daniel Waechter. You can choose whichever one suits your requirements and the file formats it supports.
I recently tried to do something similar using C++ (although it has only been tested on post-stack data). The project can be found here.

"Best" Input File Formats for C++? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
I am starting work on a new piece of software that will end up needing some robust and expandable file IO. There are a lot of formats out there. XML, JSON, INI, etc. However, there are always plusses and minuses so I thought I would ask for some community input.
Here are some rough requirements:
The format is a "standard"...I don't want to reinvent the wheel if I don't have to. It doesn't have to be a formal IEEE standard, but something you could Google and get some information on as a new user, may have some support tools (editors) beyond vi. (Though the software users will generally be computer savvy and happy to use vi.)
Easily integrates with C++. I don't want to have to pull along a 100mb library and three different compilers to get it up and running.
Supports tabular input (2d, n-dimensional)
Supports POD types
Can expand as more inputs are required, binds well to variables, etc.
Parsing speed is not terribly important
Ideally, as easy to write (reflect) as it is to read
Works well on Windows and Linux
Supports compositing (one file referencing another file to read, and so on.)
Human Readable
In a perfect world, I would use a header-only library or some clean STL implementation, but I'm fine with leveraging Boost or some small external library if it works well.
So, what are your thoughts on various formats? Drawbacks? Advantages?
Edit
Options to consider? Anything else to add?
XML
YAML
SQLite
Google Protocol Buffers
Boost Serialization
INI
JSON
There is one excellent format that meets all your criteria:
SQLite!
Please read the article about using SQLite as an application file format. Also, please watch the Google Tech Talk by D. Richard Hipp (the SQLite author) about this very topic.
Now, let's see how SQLite meets your requirements:
The format is a "standard"
SQLite has become the format of choice in most mobile environments, and for many desktop apps (Firefox, Thunderbird, Google Chrome, Adobe Reader, you name it).
Easily integrates with C++
SQLite has a standard C interface, which is only one source file and one header file. There are C++ wrappers too.
Supports tabular input (2d, n-dimensional)
An SQLite table is as tabular as you could possibly imagine. To represent, say, 3-dimensional data, create a table with columns x, y, z, value and store your data as a set of rows like this:
x1,y1,z1,value1
x2,y2,z2,value2
...
Supports POD types
I assume by POD you mean Plain Old Data. SQLite lets you store such data as-is in BLOB fields (and it has native integer, real, and text types as well).
Can expand as more inputs are required, binds well to variables
This is where it really shines.
Parsing speed is not terribly important
But SQLite's speed is superb anyway. In fact, there is essentially no parsing step at all.
Ideally, as easy to write (reflect) as it is to read
Just use INSERT to write and SELECT to read - what could be easier?
Works well on Windows and Linux
You bet, and all other platforms as well.
Supports compositing (one file referencing another file to read)
You can ATTACH one database to another.
Human Readable
The file itself is binary, but there are many excellent SQLite browsers/editors out there. I like SQLite Expert Personal on Windows and sqliteman on Linux. There is also an SQLite editor plugin for Firefox.
There are other advantages that SQLite gives you for free:
Data is indexable, which makes it very fast to search. You just cannot do this with XML, JSON or any other text-only format.
Data can be edited partially, even when the amount of data is very large. You do not have to rewrite a few gigabytes just to edit one value.
SQLite is fully transactional: it guarantees that your data is consistent at all times. Even if your application (or the whole computer) crashes, your data will automatically be restored to the last known consistent state the next time you connect to the database.
SQLite stores your data verbatim: you do not need to worry about escaping junk characters in your data (including zero bytes embedded in your strings). Simply always use prepared statements; that's all it takes to make it transparent. This can be a big and annoying problem when dealing with text data formats, XML in particular.
SQLite stores all strings in Unicode: UTF-8 (default) or UTF-16. In other words, you do not need to worry about text encodings or international support for your data format.
SQLite allows you to process data in small chunks (row by row, in fact), so it works well in low-memory conditions. This can be a problem for text-based formats, because they often need to load all of the text into memory to parse it. Granted, there are a few efficient stream-based XML parsers out there, but in general any XML parser will be quite memory-hungry compared to SQLite.
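To make the C++ integration point concrete, here is a minimal sketch using the plain SQLite C API (the table, columns, and file name are invented for the example; most error handling is omitted):

```cpp
#include <cstdio>
#include <sqlite3.h>

int main() {
    sqlite3* db = nullptr;
    if (sqlite3_open("inputs.db", &db) != SQLITE_OK) return 1;

    // One table holding 3-dimensional data as x,y,z,value rows.
    sqlite3_exec(db,
                 "CREATE TABLE IF NOT EXISTS samples("
                 "x REAL, y REAL, z REAL, value REAL);",
                 nullptr, nullptr, nullptr);

    // Write with a prepared statement (INSERT)...
    sqlite3_stmt* ins = nullptr;
    sqlite3_prepare_v2(db, "INSERT INTO samples VALUES(?,?,?,?);", -1, &ins, nullptr);
    sqlite3_bind_double(ins, 1, 1.0);
    sqlite3_bind_double(ins, 2, 2.0);
    sqlite3_bind_double(ins, 3, 3.0);
    sqlite3_bind_double(ins, 4, 0.5);
    sqlite3_step(ins);
    sqlite3_finalize(ins);

    // ...and read back with SELECT.
    sqlite3_stmt* sel = nullptr;
    sqlite3_prepare_v2(db, "SELECT x, y, z, value FROM samples;", -1, &sel, nullptr);
    while (sqlite3_step(sel) == SQLITE_ROW) {
        std::printf("%f %f %f %f\n",
                    sqlite3_column_double(sel, 0), sqlite3_column_double(sel, 1),
                    sqlite3_column_double(sel, 2), sqlite3_column_double(sel, 3));
    }
    sqlite3_finalize(sel);
    sqlite3_close(db);
    return 0;
}
```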
Having worked quite a bit with both XML and json, here's my rather subjective opinion of both as extendable serialization formats:
The format is a "standard": Yes for both
Easily integrates with C++: Yes for both. In each case you'll probably wind up with some kind of library to handle it. On Linux, libxml2 is a standard, and libxml++ is a C++ wrapper for it; you should be able to get both of those from your distro's package manager. It will take some small effort to get those working on Windows. There appears to be some support in Boost for json, but I haven't used it; I've always dealt with json using libraries. Really, the library route is not very onerous for either.
Supports tabular input (2d, n-dimensional): Yes for both
Supports POD types: Yes for both
Can expand as more inputs are required: Yes for both - that's one big advantage to both of them.
Binds well to variables: If what you mean is some way inside the file itself to say "This piece of data must be automatically deserialized into this variable in my program", then no for both.
As easy to write (reflect) as it is to read: Depends on the library you use, but in my experience yes for both. (You can actually do a tolerable job of writing json using printf().)
Works well on Windows and Linux: Yes for both, and ditto Mac OS X for that matter.
Supports one file referencing another file to read: If you mean something akin to a C #include, then XML has some ability to do this (e.g. document entities), while json doesn't.
Human readable: Both are typically written in UTF-8, and permit line breaks and indentation, and thus can be human-readable. However, I've just been working with a 479 KB XML file that's all on one line, so I had to run it through a prettyprinter to make sense of it. json can also be pretty unreadable, but in my experience is often formatted better than XML.
When starting new projects, I generally prefer json; it's more compact and more human-readable. The main reason I might select XML over json would be if I were worried about receiving badly-formed documents, since XML supports automated document format validation, while you have to write your own validation code with json.
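To illustrate the earlier remark about doing a tolerable job of writing json with printf(), a minimal sketch (the file name and fields are invented; for anything nested or containing user-supplied strings, use a real library so quoting and escaping are handled for you):

```cpp
#include <cstdio>

int main() {
    // Hand-writing json is workable for small, flat configuration documents.
    FILE* f = std::fopen("config.json", "w");
    if (!f) return 1;
    std::fprintf(f, "{\n");
    std::fprintf(f, "  \"name\": \"%s\",\n", "example");
    std::fprintf(f, "  \"threshold\": %g,\n", 0.75);
    std::fprintf(f, "  \"enabled\": %s\n", true ? "true" : "false");
    std::fprintf(f, "}\n");
    std::fclose(f);
    return 0;
}
```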
Check out Google Protocol Buffers. They handle most of your requirements.
From their documentation, the high level steps are:
Define message formats in a .proto file.
Use the protocol buffer compiler.
Use the C++ protocol buffer API to write and read messages.
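As a rough sketch of those three steps (the message name, fields, and file names are invented; the calls follow the standard protobuf C++ conventions):

```cpp
// record.proto (compiled with `protoc --cpp_out=. record.proto`):
//
//   syntax = "proto3";
//   message InputRecord {
//     string name  = 1;
//     double value = 2;
//   }

#include <fstream>
#include "record.pb.h"  // generated by protoc from the .proto above

int main() {
    InputRecord rec;
    rec.set_name("example");
    rec.set_value(0.75);

    // Write the message to a binary file...
    {
        std::ofstream out("record.bin", std::ios::binary);
        rec.SerializeToOstream(&out);
    }
    // ...and read it back.
    {
        std::ifstream in("record.bin", std::ios::binary);
        InputRecord loaded;
        loaded.ParseFromIstream(&in);
    }
    return 0;
}
```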
For my purposes, I think the way to go is XML.
The format is a standard, but allows for modification and flexibility for the schema to change as the program requirements evolve.
There are several library options. Some are larger (Xerces-C) some are smaller (ezxml), but there are many options, so we won't be locked in to a single provider or very specific solution.
It can support tabular input (2D, n-dimensional). This requires more parsing work on "our" end, and is likely the weakest point for XML.
Supports POD types: Absolutely.
Can expand as more inputs are required, binds well to variables, etc. through schema modifications and parser modifications.
Parsing speed is not terribly important, so processing a text file or files is not an issue.
XML can be programmatically written just as easily as read.
Works well on Windows and Linux or any other OS that supports C and text files.
Supports compositing (one file referencing another file to read, and so on.)
Human Readable with many text editors (Sublime, vi, etc.) supporting syntax highlighting out of the box. Many web browsers display the data well.
Thanks for all the great feedback! I think if we wanted a purely binary solution, Protocol Buffers or boost::serialization is likely the way that we would go.

How to create extended (custom) file property in Windows?

We have a proprietary file format which has embedded in it a product-code.
I am just starting down the path of "enabling the end-user to sort / filter by product-code when opening a file".
The simplest approach for us might be to simply have another drop-down in our customized Open File Dialog in which to choose a product-code to filter by.
However, I think it might be more useful to the end-user if we could present this information as a column in the details view for this file type - just as name, date-modified, type, size, etc., are also detail properties of a file-type (or perhaps generic to all files).
My vague understanding is that XP and prior Windows OSes embedded metadata like this in an alternate data stream in NTFS. However, starting in Vista, Microsoft stopped using alternate data streams due to their dependence upon NTFS, and hence their fragility (i.e. they can't be sent as a file attachment, can't be moved to a FAT-formatted thumb drive, etc.).
Things I need to know but haven't figured out yet:
Is it possible / practical, and if so how, to create a custom extended file property for our file type that exposes the product-code to the Windows shell so that it can be seen in Windows Explorer (and hence in file dialogs)?
If that is doable, how do we configure things so that the product-code column is displayed by default for folders containing our file type?
Can anyone point me to a good starting point on the above? We certainly don't have to accomplish this by publishing a custom extended file property - but that seems like a sensible approach, in absence of any way to measure the costs of going this route.
If you have sensible alternative approaches to the problem, I'd be interested in those as well!
Just found: http://www.codeproject.com/Articles/830/The-Complete-Idiot-s-Guide-to-Writing-Shell-Extens
CRAP! It seems I'm very late to the banquet, and MS has already removed this functionality from their shell: http://xpwasmyidea.blogspot.com/2009/10/evil-conspiracy-behind-customizable.html
By far the easiest approach to developing a shell extension is to use a library made for the purpose.
I can recommend EZShellExtension because I have used it in the past to add columns and thumbnails/preview for a custom file format for our company.

Creating metadata for binary file

I have a binary file I'm creating in C++. I'm tasked with creating a metadata format to describe the data so that it can be read in Java using the metadata.
One record in the data file has a time, then 64 bytes of data, then a CRC, then a newline delimiter. What should the metadata look like to describe what is in the 64 bytes? I've never created a metadata file before.
Probably you want to generate a file which describes how many entries there are in the data file, and maybe the time range. Depending on what kind of data you have, the metadata might contain either a per-record entry (RawData, ImageData, etc.) or one global entry (data stored as float.)
It totally depends on what the Java-code is supposed to do, and what use-cases you have. If you want to know whether to open the file at all depending on date, that should be part of the metadata, etc.
I think that maybe you have the design backwards.
First, think about the end.
What result do you want to see? A Java program will create some kind of .csv file?
What kind(s) of file(s)?
What information will be needed to do this?
Then design the metadata to provide the information that is needed to perform the necessary tasks (and any extra tasks you anticipate).
Try to make the metadata extensible so that adding extra metadata in the future will not break the programs that you are writing now. e.g. if the Java program finds metadata it doesn't understand, it just skips it.
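As a purely illustrative sketch of what such a metadata file might look like, written from the C++ side (every field name, type, and value below is invented; the real layout depends on what the 64 bytes actually contain):

```cpp
#include <fstream>

// Write a hypothetical metadata file describing the layout of each record in
// the binary data file: a timestamp, a fixed 64-byte payload made of named
// fields, a CRC, and a newline delimiter. A Java reader would parse this
// description instead of hard-coding offsets.
int main() {
    std::ofstream meta("data.meta");
    meta << "format_version: 1\n"
         << "field: timestamp uint64  8\n"   // name, type, size in bytes
         << "field: sensor_a  float64 8\n"   // first 8 of the 64 payload bytes
         << "field: sensor_b  float64 8\n"
         << "field: payload   bytes   48\n"  // remaining 48 of the 64 bytes
         << "field: crc       uint32  4\n"
         << "record_delimiter: \"\\n\"\n"
         << "entry_count: 1000\n"
         << "time_range: 2015-01-01T00:00:00Z 2015-01-01T01:00:00Z\n";
    return 0;
}
```

Unknown keys can simply be skipped by the Java reader, which keeps the format extensible in the way described above.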

alternative for jasperreport in c++

From a C++ program I need to print a simple label. The label contains text, an image and a barcode. (In my real project the label is more complex; this is just an example.)
My customer needs a way to customize the label layout.
In the past, in Java, I solved this problem using a report created with JasperReports: my customer customized the report with iReport, and then I filled in the data with a hashtable data source (I never connect to an SQL database).
Does anybody know a way to obtain something like this in C++?
Really sorry for my scholastic English.
Offhand, it's a bit hard to say -- most report generators assume some sort of database (SQL or at least accessible via ODBC) as the data source. I'd probably look into some that are free and include source code so you can change the data source (though I've no idea how difficult a modification that will be).
The other problem is that printing anything but plain text is somewhat non-portable; you'll need different code for Linux, Mac OS or Windows. For Windows, one possibility would be Report Generator from CodeProject.com. If you want something more portable, you could use something like Xport to create XHTML output to be viewed in/printed from a browser, or any number of other programs that understand [X]HTML (there is also a commercial version of Xport). Of course, you could generate output in any number of other formats that support graphics, such as PostScript/PDF, LaTeX, etc. This lets you use portable code to generate the report, but usually requires some non-portable code to invoke a viewer.
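As a rough illustration of the XHTML route (file names, text, and layout are invented; the barcode is assumed to be pre-rendered to an image by some other component):

```cpp
#include <fstream>
#include <string>

// Write a simple label as an XHTML page that a browser can display and print.
// In real code the text should be XML-escaped before being written.
void write_label(const std::string& path,
                 const std::string& text,
                 const std::string& logo_path,
                 const std::string& barcode_path) {
    std::ofstream out(path);
    out << "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        << "<html xmlns=\"http://www.w3.org/1999/xhtml\">\n"
        << "<head><title>Label</title></head>\n"
        << "<body>\n"
        << "  <img src=\"" << logo_path << "\" alt=\"logo\" />\n"
        << "  <p>" << text << "</p>\n"
        << "  <img src=\"" << barcode_path << "\" alt=\"barcode\" />\n"
        << "</body>\n"
        << "</html>\n";
}

int main() {
    write_label("label.xhtml", "Sample product 1234", "logo.png", "barcode.png");
    return 0;
}
```

A customer could then tweak the layout with CSS or an HTML editor, which loosely mirrors the iReport-style customization described in the question.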