DRYing a C++ structure

I have a simple C++ struct that is used extensively in a program. Now I wish to persist the structure in a SQLite database as individual fields (i.e., not as a blob).
What good ways are there to map the attributes of the struct to database columns?

Since C++ isn't a very "dynamic" language, it runs short of the kinds of ORMs you commonly find in other languages that make this task light work.
Personally speaking, I've always ended up writing very thin wrapper classes for each table manually. Basically, you need a structure that maps to each table and an accessor class to get data in and out of the table as needed.
The structures should have a field per column, and you'll need a method for each database operation you want to perform (CRUD, for example).
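For illustration, here is a minimal sketch of such a wrapper against the SQLite C API; the Person struct, table name, and column layout are assumptions for the example:

#include <sqlite3.h>
#include <stdexcept>
#include <string>

struct Person {
    int id;
    std::string name;
};

// Thin accessor class: one method per database operation.
class PersonTable {
public:
    explicit PersonTable(sqlite3* db) : db_(db) {}

    void insert(const Person& p) {
        sqlite3_stmt* stmt = nullptr;
        if (sqlite3_prepare_v2(db_,
                "INSERT INTO person (id, name) VALUES (?, ?)",
                -1, &stmt, nullptr) != SQLITE_OK)
            throw std::runtime_error(sqlite3_errmsg(db_));
        sqlite3_bind_int(stmt, 1, p.id);
        sqlite3_bind_text(stmt, 2, p.name.c_str(), -1, SQLITE_TRANSIENT);
        sqlite3_step(stmt);
        sqlite3_finalize(stmt);
    }
    // select/update/remove follow the same prepare/bind/step pattern.

private:
    sqlite3* db_;
};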

Some interpreted / scripting languages (PHP, etc.) support "reflection", where code can examine itself. That would allow a database framework to automatically serialize struct members to / from a database. Unfortunately, C and C++ do not natively support this. Therefore, unless you want to store it as a giant BLOB (which certainly has drawbacks), you will need to manually map each member of the struct to a db column.
The only tricky part (aside from being time consuming) is choosing the db column type that best corresponds to the C data type (char[] -> varchar, etc.). As jkp suggested, it's nice to have a thin wrapper class to read / write each of your persistent structures.

Hard to answer in general. The easiest approach would be one column per attribute, that may or may not be appropriate for your application.
The other extreme would be to merge it all into one column, depending on how you are going to use the data stored.
Maybe use some other persistence framework? SQLite might not be the best solution here.

I like to use a one-to-one relationship between my data structure fields and database fields, where each record in the table represents a complete structure instance. The only exception is if it would cause excessive de-normalization in the table. To get the data to/from the database, I implement a template class that takes the structure as a template parameter. I then derive from the template and implement the get/set mapping between the structure and the database. I use the OTL library for all the real database I/O. This makes the burden of a special class per structure type less intrusive.
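A rough sketch of that template idea, with the OTL I/O replaced by a hypothetical Db handle so the example stays self-contained:

#include <string>

struct Db; // stands in for the OTL stream/connection in this sketch

// Generic persister: the structure type is the template parameter.
template <typename T>
class Persister {
public:
    virtual ~Persister() = default;
    void save(Db& db, const T& row) { bind(db, row); /* execute insert */ }
    void load(Db& db, T& row) { /* execute select */ extract(db, row); }

protected:
    // Each structure type supplies its own column mapping.
    virtual void bind(Db& db, const T& row) = 0;
    virtual void extract(Db& db, T& row) = 0;
};

struct Point { float x, y; };

class PointPersister : public Persister<Point> {
protected:
    void bind(Db&, const Point&) override { /* db << row.x << row.y; */ }
    void extract(Db&, Point&) override { /* db >> row.x >> row.y; */ }
};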

I have created a system of Fields and Records, now based on the Composite Design Pattern. The Fields contain a method to return the field name and optionally the field type (for an SQL statement). I'm currently moving the SQL stuff out of the field and into a Visitor object.
The record contains a function to return the table name.
Using this scheme, I can create an SQL table without knowing the details of the fields or records. I just call polymorphic methods in the base class.
I've tried other techniques, but my code has evolved to this implementation.
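A rough sketch of that Field/Record composite (the class and method names here are my own, not the poster's):

#include <string>
#include <vector>

// Leaf of the composite: knows its column name and SQL type.
class Field {
public:
    Field(std::string name, std::string sqlType)
        : name_(std::move(name)), sqlType_(std::move(sqlType)) {}
    const std::string& name() const { return name_; }
    const std::string& sqlType() const { return sqlType_; }

private:
    std::string name_, sqlType_;
};

// Composite: a record is a named collection of fields.
class Record {
public:
    explicit Record(std::string table) : table_(std::move(table)) {}
    void add(Field f) { fields_.push_back(std::move(f)); }

    // Build a CREATE TABLE statement without knowing the field details.
    std::string createTableSql() const {
        std::string sql = "CREATE TABLE " + table_ + " (";
        for (std::size_t i = 0; i < fields_.size(); ++i) {
            if (i) sql += ", ";
            sql += fields_[i].name() + " " + fields_[i].sqlType();
        }
        return sql + ");";
    }

private:
    std::string table_;
    std::vector<Field> fields_;
};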

Contrary to some of the other answers, I say it is possible for this task to be automated. E.g. take a look at quince (http://quince-lib.com). It lets you do stuff like this:
struct point {
    float x;
    float y;
};

QUINCE_MAP_CLASS(point, (x)(y))

extern database db;
table<point> points(db, "points");
(Full disclosure: I wrote quince.)

C++, creating classes at runtime

I have a query: I have a set of flat files (say file1, file2, etc.) containing column names and native data types. (How the values are stored and read in C++ is elementary.)
E.g. flat file file1 may have data like
col1_name=id, col1_type=integer, col2_name=Name, col2_type=string and so on.
So for each flat file I need to create a C++ data structure (i.e. 1 flat file = 1 data structure) where each member variable has the same name as its column and a C++ native data type (int, float, string, etc.) matching the column type in the flat file.
From the above example, flat file file1 should give me the declaration below:
class file1 {
    int id;
    string Name;
};
Is there a way I can write code in C++ where the binary, once created, will read a flat file and create the data structure based on that file (the class name being the same as the flat file name)? All the classes created from these flat files would share common getter and setter member functions.
Do let me know if you have done something similar earlier or have any idea for this.
No, not easily (see the other answers for reasons why not).
I would suggest having a look at Python instead for this kind of problem. Python's type system combined with its ethos of using try/except lends itself more easily to the challenge of parsing data.
If you really must use C++, then you might find a solution using the dynamic properties feature of Qt's QObject class, combined with the QVariant class. Although this would do what you want, I would add a warning that this is getting kind of heavy-weight and may over-complicate your task.
No, not directly. C++ is a compiled language. The code for every class is created by the compiler.
You would need a two-step process. First, write a program that reads those files and translates them into a .cpp file. Second, pass those .cpp files to a compiler.
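A minimal sketch of such a generator, assuming the key=value format from the question (whitespace trimming and getter/setter emission are omitted, and the class name is taken verbatim from the file name):

#include <fstream>
#include <iostream>
#include <map>
#include <sstream>
#include <string>

// Maps the flat-file type names from the example to C++ types (assumed).
static std::string cppType(const std::string& t) {
    static const std::map<std::string, std::string> m = {
        {"integer", "int"}, {"float", "float"}, {"string", "std::string"}};
    auto it = m.find(t);
    return it == m.end() ? "std::string" : it->second;
}

int main(int argc, char** argv) {
    if (argc < 2) { std::cerr << "usage: gen <flatfile>\n"; return 1; }
    std::ifstream in(argv[1]);
    std::string token, name;
    std::cout << "class " << argv[1] << " {\npublic:\n";
    // Parse "colN_name=id, colN_type=integer, ..." pairs.
    while (std::getline(in, token, ',')) {
        std::istringstream pair(token);
        std::string key, value;
        std::getline(pair, key, '=');
        std::getline(pair, value);
        if (key.find("_name") != std::string::npos)
            name = value;
        else if (key.find("_type") != std::string::npos)
            std::cout << "    " << cppType(value) << " " << name << ";\n";
    }
    std::cout << "};\n";
    return 0;
}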
C++ classes are pure compile-time concepts and have no meaning at runtime, so they cannot be created at runtime. However, you could just go with
std::vector<std::string> fields;
and parse as necessary in your accessor functions.
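For instance, a sketch of that idea:

#include <cstddef>
#include <string>
#include <vector>

// One generic record type instead of a class per flat file.
class Record {
public:
    explicit Record(std::vector<std::string> fields)
        : fields_(std::move(fields)) {}

    // Parse on access instead of storing typed members.
    int getInt(std::size_t i) const { return std::stoi(fields_.at(i)); }
    const std::string& getString(std::size_t i) const { return fields_.at(i); }

private:
    std::vector<std::string> fields_;
};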
No, but from what I can tell, you need to be able to store the names of multiple columns. What you can do is have a member variable map or unordered_map that you index with a string (the name of the column) to get some data (like a column object) back. That way you can do
obj.Columns["Name"]
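A sketch of that idea (the Column type here is a stand-in):

#include <string>
#include <unordered_map>

// A generic "column" value; real code might hold a variant instead.
struct Column {
    std::string raw;
};

struct Row {
    std::unordered_map<std::string, Column> Columns;
};

int main() {
    Row obj;
    obj.Columns["Name"] = Column{"Alice"};
    return 0;
}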
I'm not sure there's a design pattern to this, but if your list of possible type names is finite, and known at compile time, can't you declare all those classes in your program before running, and then just instantiate them based on the data in the files?
What you actually want is a field whose exact nature varies at runtime.
There are several approaches, including Boost.Any, but because of the static nature of the C++ type system only two are really recommended, and both require having beforehand an idea of all the possible data types that may be required.
The first approach is typical:
Object base type
Int, String, Date whatever derived types
and the use of polymorphism.
The second requires a bit of Boost magic: boost::variant<int, std::string, date>.
Once you have the "variant" part covered, you need to implement visitation to distinguish between the different possible types. Typical visitors for the traditional object-oriented approach or simply boost::static_visitor<> and boost::apply_visitor combinations for the boost approach.
It's fairly straightforward.
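For example, a minimal sketch of the Boost route (assuming Boost is available; std::variant and std::visit are the modern standard-library equivalents):

#include <boost/variant.hpp>
#include <iostream>
#include <string>
#include <vector>

using Field = boost::variant<int, std::string>;

// Visitation distinguishes the possible types at runtime.
struct Print : boost::static_visitor<void> {
    void operator()(int v) const { std::cout << "int: " << v << "\n"; }
    void operator()(const std::string& v) const {
        std::cout << "string: " << v << "\n";
    }
};

int main() {
    std::vector<Field> row{42, std::string("Alice")};
    Print printer;
    for (const auto& f : row)
        boost::apply_visitor(printer, f);
}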

What is the Python counterpart to an Ada record / C++ struct type?

Suppose I am recording data and want to associate some number of data elements, such that each recorded set always has a fixed composition, i.e. no missing fields.
Most of my experience as a programmer is with Ada or C/C++ variants. In Ada, I would use a record type and aggregate assignment, so that when the record type was updated with new fields, anyone using the record would be notified by the compiler. In C++, chances are I would use a storage class and constructor to do something similar.
What is the appropriate way to handle a similar situation in Python? Is this a case where classes are the proper answer, or is there a lighter weight analog to the Ada record?
An additional thought: both Ada records and C++ constructors allow for default initialization values. Is there a Python solution to the above question which provides that capability as well?
A namedtuple (from the collections module) may suit your purposes. It is basically a tuple which allows reference to fields by name as well as by index position, so it's a fixed structure of ordered, named fields. It's also lightweight in that it uses slots to define the field names, eliminating the need to carry a dictionary in every instance.
A typical use case is to define a point:
from collections import namedtuple
Point = namedtuple("Point", "x y")
p1 = Point(x=11, y=22)
Its main drawback is that, being a tuple, it is immutable. But there is a method, _replace, which allows you to replace one or more fields with new values; a new instance is created in the process.
There is also a mutable version of namedtuple available at ActiveState Python Recipes 576555 called records which permits direct field changes. I've used it and can vouch that it works well.
A dictionary is the classical way to do this in Python. It can't enforce that a value must exist though, and doesn't do initial values.
config = {'maxusers': 20, 'port': 2345, 'quota': 20480000}
collections.namedtuple() is another option in versions of Python that support it.

How to query abstract-class-based objects in Django?

Let's say I have an abstract base class that looks like this:
class StellarObject(BaseModel):
    title = models.CharField(max_length=255)
    description = models.TextField()
    slug = models.SlugField(blank=True, null=True)

    class Meta:
        abstract = True
Now, let's say I have two actual database classes that inherit from StellarObject
class Planet(StellarObject):
    type = models.CharField(max_length=50)
    size = models.IntegerField(max_length=10)

class Star(StellarObject):
    mass = models.IntegerField(max_length=10)
So far, so good. If I want to get Planets or Stars, all I do is this:
Planet.objects.all() # or
Planet.objects.filter() # or count(), etc...
But what if I want to get ALL StellarObjects? If I do:
StellarObject.objects.all()
It of course returns an error, because an abstract class isn't an actual database object, and therefore cannot be queried. Everything I've read says I need to do two queries, one each on Planets and Stars, and then merge them. That seems horribly inefficient. Is that the only way?
At its root, this is part of the mismatch between objects and relational databases. The ORM does a great job in abstracting out the differences, but sometimes you just come up against them anyway.
Basically, you have to choose between abstract inheritance, in which case there is no database relationship between the two classes, or multi-table inheritance, which keeps the database relationship at a cost of efficiency (an extra database join) for each query.
You can't query abstract base classes. For multi-table inheritance you can use django-model-utils and its InheritanceManager, which extends the standard QuerySet with a select_subclasses() method that does just what you need: it left-joins all the inherited tables and returns the appropriate type instance for each row.
Don't use an abstract base class if you need to query on the base. Use a concrete base class instead.
This is an example of polymorphism in your models (polymorph - many forms of one).
Option 1 - If there's only one place you deal with this:
For the sake of a little bit of if-else code in one or two places, just deal with it manually - it'll probably be much quicker and clearer in terms of dev/maintenance (i.e. maybe worth it unless these queries are seriously hammering your database - that's your judgement call and depends on circumstance).
Option 2 - If you do this quite a bit, or really demand elegance in your query syntax:
Luckily there's a library to deal with polymorphism in django, django-polymorphic - those docs will show you how to do this precisely. This is probably the "right answer" for querying straightforwardly as you've described, especially if you want to do model inheritance in lots of places.
Option 3 - If you want a halfway house:
This kind of has the drawbacks of both of the above, but I've used it successfully in the past to automatically do all the zipping together from multiple query sets, whilst keeping the benefits of having one query set object containing both types of models.
Check out django-querysetsequence which manages the merge of multiple query sets together.
It's not as well supported or as stable as django-polymorphic, but worth a mention nevertheless.
In this case I think there's no other way.
For optimization, you could avoid inheriting from the abstract StellarObject and instead make it a separate table connected via FK to the Star and Planet objects.
That way both of them would have e.g. star.stellar_info.description.
Another way would be to add an additional model for handling the information, using StellarObject as the through model in a many-to-many relation.
I would consider moving away from either an abstract inheritance pattern or the concrete base pattern if you're looking to tie distinct sub-class behaviors to the objects based on their respective child class.
When you query via the parent class -- which it sounds like you want to do -- Django treats the resulting objects as objects of the parent class, so accessing child-class-level methods requires re-casting the objects into their 'proper' child class on the fly so they can see those methods... at which point a series of if statements hanging off a parent-class-level method would arguably be a cleaner approach.
If the sub-class behavior described above isn't an issue, you could consider a custom manager attached to an abstract base class sewing the models together via raw SQL.
If you're interested mainly in assigning a discrete set of identical data fields to a bunch of objects, I'd relate along a foreign-key, like bx2 suggests.
That seems horribly inefficient. Is that the only way?
As far as I know it is the only way with Django's ORM. As implemented currently abstract classes are a convenient mechanism for abstracting common attributes of classes out to super classes. The ORM does not provide a similar abstraction for querying.
You'd be better off using another mechanism for implementing hierarchy in the database. One way to do this would be to use a single table and "tag" rows using type. Or you can implement a generic foreign key to another model that holds properties (the latter doesn't sound right even to me).

Most efficient way to add data to an instance

I have a class, let's say Person, which is managed by another class/module, let's say PersonPool.
I have another module in my application, let's say module M, that wants to associate information with a person, in the most efficient way. I considered the following alternatives:
Add a data member to Person, which is accessed by the other part of the application. Advantage is that it is probably the fastest way. Disadvantage is that this is quite invasive. Person doesn't need to know anything about this extra data, and if I want to shield this data member from other modules, I need to make it private and make module M a friend, which I don't like.
Add a 'generic' property bag to Person, in which other modules can add additional properties. Advantage is that it's not invasive (besides having the property bag), and it's easy to add 'properties' by other modules as well. Disadvantage is that it is much slower than simply getting the value directly from Person.
Use a map/hashmap in module M, which maps the Person (pointer, id) to the value we want to store. This looks like the best solution in terms of separation of data, but again is much slower.
Give each person a unique number and make sure that no two persons ever get the same number during history (I don't even want to have these persons reuse a number, because then data of an old person may be mixed up with the data of a new person). Then the external module can simply use a vector to map the person's unique number to the specific data. Advantage is that we don't invade the Person class with data it doesn't need to know of (except its unique number), and that we have a quick way of getting the data specifically for module M from the vector. Disadvantage is that the vector may become really big if lots of persons are deleted and created (because we don't want to reuse the unique number).
In the last alternative, the problem could be solved by using a sparse vector, but I don't know if there are very efficient implementations of a sparse vector (faster than a map/hashmap).
Are there other ways of getting this done?
Or is there an efficient sparse vector that might solve the memory problem of the last alternative?
I would time the solution with a map/hashmap and go with it if it performs well enough. Otherwise you have no choice but to add those properties to the class, as this is the most efficient way.
Alternatively, you can create a subclass of Person that basically forwards all the interface methods to the original class but adds all the properties you want, and substitute your modified Person for the original during some of the calls to M.
This way module M will see the subclass and all the properties it needs, but all other modules will think of it as just an instance of the Person class and will not be able to see your custom properties.
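A sketch of that forwarding subclass (the Person interface and the extra property are hypothetical):

#include <string>

class Person {
public:
    virtual ~Person() = default;
    virtual std::string name() const { return name_; }

private:
    std::string name_;
};

// M's view of a person: the Person interface plus M's extra data.
class TrackedPerson : public Person {
public:
    explicit TrackedPerson(const Person& original) : original_(original) {}

    // Forward the interface to the wrapped instance.
    std::string name() const override { return original_.name(); }

    int score = 0; // property only module M knows about

private:
    const Person& original_;
};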
The first and third are reasonably common techniques. The second is how dynamic programming languages such as Python and JavaScript implement member data for objects, so do not dismiss it out of hand as impossibly slow. The fourth is in the same ballpark as how relational databases work. It is possible, but difficult, to make relational databases run like the clappers.
In short, you've described 4 widely used techniques. The only way to rule any of them out is with details specific to your problem (required performance, number of Persons, number of properties, number of modules in your code that will want to do this, etc), and corresponding measurements.
Another possibility is for module M to define a class which inherits from Person, and adds extra data members. The principle here is that M's idea of a person differs from Person's idea of a person, so describe M's idea as a class. Of course this only works if all other modules operating on the same Person objects are doing so via polymorphism, and furthermore if M can be made responsible for creating the objects (perhaps via dependency injection of a factory). That's quite a big "if". An even bigger one, if nothing other than M needs to do anything life-cycle-ish with the objects, then you may be able to use composition or private inheritance in preference to public inheritance. But none of it is any use if module N is going to create a collection of Persons, and then module M wants to attach extra data to them.

Parsing huge data with C++

In my job, I need to parse different kinds of data files from different data sources. Sometimes I parse them by writing C++ code directly (with the help of Qt and Boost :D), sometimes manually with a helper program.
I must note that the data types are so different from each other that it is hard to create a common interface for all of them. But I want to do this job in a more generic way. I am planning to write a library to convert them, and it should be easy to add a new parser utility in the future. I am also planning to use the other helper programs inside my program, not manually.
My question is: what kind of architecture or pattern do you suggest? The basic condition is that the library must be extendable via new classes or DLLs, and also configurable.
By the way, the data can be text, ASCII, or something like CSV (comma-separated values), and most of it is specific to certain data.
Not to blow my own trumpet, but my small open-source utility CSVfix has an extensible architecture based on deriving new C++ classes with a very simple interface. I did consider using a plugin architecture with DLLs, but it seemed like overkill for such a simple utility. If interested, you can get the binaries and sources here.
I'd suggest a three-part model, where the common data format is a string that should be able to hold every value:
Reader: in this layer the values are read from the source (e.g. a CSV file) using some sort of file-format descriptor. The values are then stored in some sort of intermediate data structure.
Connector/Converter: this layer is responsible for mapping the reader data to the writer fields.
Writer: this layer is responsible for writing a specific data structure to the target (e.g. another file format or a database).
This way you can write different Readers for different input files.
I think the hardest part would be creating the definition of the intermediate storage format/structure so that it is future-proof and flexible.
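Sketched as C++ interfaces, with string-valued rows as the intermediate format described above (the type names are mine):

#include <map>
#include <string>
#include <vector>

// Every value travels as a string; a row maps field name -> value.
using Row = std::map<std::string, std::string>;

struct Reader {                    // e.g. a CSV reader
    virtual ~Reader() = default;
    virtual std::vector<Row> read() = 0;
};

struct Converter {                 // maps reader fields to writer fields
    virtual ~Converter() = default;
    virtual Row convert(const Row& in) = 0;
};

struct Writer {                    // e.g. another file format or a database
    virtual ~Writer() = default;
    virtual void write(const std::vector<Row>& rows) = 0;
};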
One method I used for defining the data structure in my datafile read/write classes is std::map<std::string, std::vector<std::string>, string_compare>, where the key is the variable name and the vector of strings is the data. While this is expensive in memory, it does not lock me into only numeric data, and it allows for different lengths of data within the same file.
I had the base class implement this generic storage, while the derived classes implemented the reader/writer capability. I then used a factory to get to the desired handler, using another class that determined the file format.
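In outline, under the same scheme (the handler and factory names are invented for this sketch, and the custom string_compare comparator is omitted):

#include <map>
#include <memory>
#include <string>
#include <vector>

// Generic storage shared by all handlers: variable name -> values.
class DataFile {
public:
    virtual ~DataFile() = default;
    virtual void read(const std::string& path) = 0;
    virtual void write(const std::string& path) = 0;

protected:
    std::map<std::string, std::vector<std::string>> data_;
};

class CsvFile : public DataFile {
public:
    void read(const std::string&) override { /* parse file into data_ */ }
    void write(const std::string&) override { /* serialize data_ */ }
};

// Factory choosing a handler from the file format.
std::unique_ptr<DataFile> makeHandler(const std::string& path) {
    if (path.size() >= 4 && path.compare(path.size() - 4, 4, ".csv") == 0)
        return std::make_unique<CsvFile>();
    return nullptr; // unknown format
}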