C++ data structures instead of database - c++

I have written a C++ program that uses a database with 5 tables (and thousands of rows), but this approach has a performance issue. I now want to try plain C++ data types and functions instead. Which would be best for this situation? I am considering using structs in vectors. I could also use a library, if one exists. Can these solve my problem?
By the way my DB is PostgreSQL.
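The struct-in-a-vector idea from the question could look roughly like the sketch below. The `Customer` row type and the `CustomerTable` class are hypothetical names invented for illustration, since the real table layout is not given. The sketch also assumes the data fits in memory and needs no persistence or transactions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical row type standing in for one of the five tables.
struct Customer {
    int id;            // plays the role of the primary key
    std::string name;
    double balance;
};

// A tiny in-memory "table": rows live in a vector, and a hash index on the
// key avoids a linear scan on every lookup.
class CustomerTable {
public:
    // Returns false (and stores nothing) if the id is already present.
    bool insert(Customer row) {
        auto [it, inserted] = index_.emplace(row.id, rows_.size());
        if (!inserted) return false;
        rows_.push_back(std::move(row));
        return true;
    }

    // Returns a pointer to the row, or nullptr if no such id exists.
    // The pointer is invalidated by the next insert().
    const Customer* find(int id) const {
        auto it = index_.find(id);
        return it == index_.end() ? nullptr : &rows_[it->second];
    }

    std::size_t size() const { return rows_.size(); }

private:
    std::vector<Customer> rows_;
    std::unordered_map<int, std::size_t> index_;  // id -> position in rows_
};
```

Each additional lookup pattern (by name, by range, across tables) would need its own index or loop, which is work the database was doing before.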

Take a look at Berkeley DB, it is a popular embedded database that has good performance.
The latest versions offer a SQL-like API, which can simplify migrating your code from a classical SQL database.
However, be careful of the license terms of this library. It is available under both a GPL-like license and a commercial license (i.e. no LGPL-like license is available). Depending on your application (and your wallet), this may be an issue.

Related

What are the benefits of using a database abstraction layer?

I've been using some code that implements the phpBB DBAL for some time. Recently I had to implement a fuller package around it and decided to use the DBAL throughout. In the main, it's been OK. But occasionally there are circumstances where I can't see the logic in using it. It seems to make the simple much more complicated.
What benefits does a DBAL offer rather than writing SQL statements directly?
From wikipedia (http://en.wikipedia.org/wiki/Database_abstraction_layer) :
API level abstraction
Libraries like OpenDBX unify access to databases by providing a single low-level programming interface to the application developer. Their advantages are most often speed and flexibility because they are not tied to a specific query language (subset) and only have to implement a thin layer to reach their goal. The application developer can choose from all language features but has to provide configurable statements for querying or changing tables. Otherwise his application would also be tied to one database.
When cooking a dish, you do not want several chefs having access to the pot. They could all be adding spices unaware that another chef had already added a spice. Ideally, you want a single chef that would serve as a single point of access to avoid spoiling the soup.
The same with databases. A single point of access can avoid problems of multiple services accessing the data in different ways.

When should I use C++ instead of SQL?

I am a C++ programmer who occasionally uses MySQL to work with databases, but my SQL knowledge is rather limited. However I am surely willing to change that.
At the moment I am trying to do analysis(!) on the data I have in my database solely with SQL queries. But I am about to give up, and instead import the data to C++ and do the analysis with C++ code.
I have discussed this with my colleagues, and they also push me to use C++, saying that SQL is not meant for complex analysis but mainly for importing (from the existing tables) and exporting (to new tables) data, and a little bit more such as merging data to - e.g. - joined tables.
Can somebody help me draw a line, so I know when to switch to C++? Of course performance is also an issue.
What are indications that things are getting too complex in SQL? Or maybe I am just taking the wrong approach to designing the queries. If so, where can I find tutorials, books, ... to take a better approach?
I hope this is not too vague. I am really a bit lost.
SQL excels at analyzing large sets of relational data.
The place to draw the line is the scale of your analysis.
If you analyze individual records one at a time, do it in your application.
If you analyze large sets of records as a unit, SQL is definitely the best tool for that job.
Row-by-row analysis is not something SQL is designed or optimized for very well. But, if you want to know something about a million-row group of data, do it in the database.
I have discussed this with my colleagues, and they also push me to use C++, saying that SQL is not meant for complex analysis but mainly for importing (from the existent tables) and exporting (to new tables) data, and a little bit more such as merging data to - e.g. - joined tables.
This is completely arbitrary. Learn SQL. There are a lot of resources available on the web for free.
You can do very complex analysis of data in SQL, provided you know how to use the features that SQL offers.
SQL has features for doing relational operations, like joins and projections. Also for doing set operations like union, intersection, and restriction (subset). Also for doing basic arithmetic on numbers, like the four arithmetic operators, and built in functions like SQRT. Also statistical functions like COUNT, SUM, and AVG that can be combined with projections in very interesting ways. A good DBMS will let you extend the built in functions with your own functions written in C, C++ or maybe PL/SQL.
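To make the comparison concrete, here is a rough C++ sketch of what a single aggregate query does. The `Sale` row type and its columns are invented for illustration. The hand-written loop only covers the simple in-memory case; it has no indexes and no query planning.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical row type; the column names are invented for illustration.
struct Sale {
    std::string region;
    double amount;
};

struct RegionStats {
    long count = 0;
    double sum = 0.0;
    double avg() const { return count == 0 ? 0.0 : sum / count; }
};

// Hand-written equivalent of:
//   SELECT region, COUNT(*), SUM(amount), AVG(amount)
//   FROM sales GROUP BY region;
std::map<std::string, RegionStats> group_by_region(const std::vector<Sale>& sales) {
    std::map<std::string, RegionStats> stats;
    for (const Sale& s : sales) {
        RegionStats& r = stats[s.region];  // creates a zeroed entry on first use
        ++r.count;
        r.sum += s.amount;
    }
    return stats;
}
```

Every further requirement (a filter, a join against a second table, a different grouping key) means more C++ code, whereas in SQL it is usually one more clause.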
The power you get from these features depends on how well designed the database is. A well designed database conforms to the relational model, and should be relevant to your intended use of the data.
SQL code can be stored in the database in stored procedures. It can be stored in SQL script files. And, as you already know, it can be embedded in application programs. In addition to SQL, you can use OLAP tools and report generators to do standard things with the data very easily.
The people who advise you to keep all of your processing in C++ sound like they have learned just enough to use a database like a big and stupid file system. A good DBMS is much more than that.
SQL is usually very efficient handling its own database (depends on the server implementation).
You should use queries to analyze the database.
The main reason for that would be the communication overhead.
Even if the server is on the local machine (remote servers would have obvious communication overhead), you'll still have to retrieve the stored information from the SQL server into your C++ program for analysis.
If you have tens of thousands of rows in the database, the SQL server would have to read them all and send them to your program, which would probably create a local copy of the data for you to work on.
If you let the SQL server do it with queries, you gain the complex optimizations it applies according to the kind of query you're executing, and in the end you retrieve only a limited amount of data (the part you actually need) over the connection.
You made the right decision to begin data analysis with SQL. Now, when you feel that your knowledge of SQL limits you, you have 2 choices: give up and switch back to a familiar but not very efficient toolset (C++), or bring your level of SQL up.
It's possible that at some point SQL will become too complex too, but then C++ won't be the answer either - most likely some specialized tools.
In my opinion you should only perform analysis in C++ if the database server provides no equivalent for the analysis function, as database servers are very smart and it is hard, almost impossible, to beat the efficiency of a database server's analysis functions. Bringing raw data to the application for analysis also adds a lot of overhead.
If at some point plain SQL becomes overly complex, the native procedural language of the server could be a good choice.
I agree with JNK and Jochai, but disagree with Ascanio.
It's better to improve the knowledge in database systems.
SQL comes with it.
So, this is something I've been thinking about, and it seems to me that SQL, as just a platform/language for storing/manipulating data, should have no inherent advantage over a C++ or C library. It seems to me that theoretically you could build a C++ library just as efficient as SQL at doing this, if not more efficient. In doing so, you would be able to build it from the ground up, in terms of how ints, chars, strings, and other data types are stored, and make it easier to interface with your particular application (like web development). You could even make it so that the queries could be done in a language like JavaScript (allowing web developers to focus on learning just one language really well).

A non-relational embedded database with a permissive free software license?

Many thanks in advance for taking the time to look at my question.
(I am aware of this question Nonrelational Databases for C++, but my needs are a bit different and it only has one answer.)
I am developing a commercial C++ library that must, among other things, persist messages. I would like to avoid reinventing the wheel by writing my own DBMS. Unfortunately, I have the following restricting criteria:
It must be usable from C++ - I'm writing a C++ library. Bindings are potentially acceptable, if the level of effort to make them work isn't too high.
I need an embedded database. Stand-alone will not work.
I want to avoid a relational database. In addition to concerns about performance overhead, there are technical politics beyond my control as a developer that discourage a relational database.
I need a permissive free software license. It'll be hard to buy licenses, but the client doesn't want to give his source away.
I'd like a solution that's established (been around for at least a little while, beyond the experimental stage, has been used by several projects).
Sadly, the two go-to choices don't work because of the above:
-SQLite is relational
-BerkeleyDB is GPL or commercial
Again, thanks for any help.
Use SQLite in b-tree mode. Public domain. Avoids politics. Lets you work around the political issues by avoiding the SQL interface for performance-critical paths, and optionally using the SQL parser path for those queries that are not on the critical path.
Both Tokyo Cabinet and QDBM are LGPL and have C APIs.

Open source libraries for abstracting database access in C++?

I'm looking for options for abstracting database server details away from my application (in C++); I'd like to write my code to be independent of the actual database backend. I know MySQL has a nice library, but I don't want to be tied to a single database implementation. Are there good options for this?
SOCI is good. Supports multiple databases, works well, modern C++ style API, works with boost.
My opinion is to forget about a cross-database driver, and focus on finding or creating a cross-database Data Access Layer. A few reasons:
Complex queries (read: anything that's not a toy) invariably end up using one or two database-specific features. LIMIT and OFFSET for example, commonly used for paging, aren't universal.
Sooner or later you'll want bulk insertion, and you'll want it to be as fast as possible, because 3 hours is better than 6 hours. Every database has a different "optimum" way to do this, so your DAL will need to special-case this anyways.
Different databases may expose different constraint mechanisms—even custom column types—that can be worth taking advantage of where possible (PostgreSQL is wonderful for this).
If you want to do any application level caching, you'll need a DAL anyways.
So, go ahead and use libmysql by itself - just hide it behind a compiler firewall in your DAL, and be prepared to swap it out later. You can protect yourself from shifting infrastructure without having to use a lowest-common-denominator SQL wrapper.
If that doesn't jibe with you, check out SQLAPI++.
Many apps use ODBC (via unixODBC for instance); there's also OTL. On Windows you could use ADO.NET from managed C++ or the old ADO COM interfaces...
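The compiler-firewall idea from the answer above can be sketched roughly as follows. The names `UserStore`, `User`, and `page` are hypothetical. An in-memory vector stands in for the real client library, so this shows only the shape of the DAL, not any vendor's API.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// ---- What would live in the public header (hypothetical user_store.h) ----
// No vendor header is included here, so callers never see the backend.
struct User {
    int id;
    std::string name;
};

class UserStore {
public:
    UserStore();
    ~UserStore();  // defined below, where Impl is a complete type
    void add(const User& u);
    // Paging is expressed in DAL terms, not as LIMIT/OFFSET SQL.
    std::vector<User> page(std::size_t offset, std::size_t limit) const;

private:
    struct Impl;                  // the "compiler firewall"
    std::unique_ptr<Impl> impl_;
};

// ---- What would live in user_store.cpp ----
// A real version would include the vendor client header here and run
// backend-specific SQL. This stand-in just keeps rows in memory.
struct UserStore::Impl {
    std::vector<User> rows;
};

UserStore::UserStore() : impl_(std::make_unique<Impl>()) {}
UserStore::~UserStore() = default;

void UserStore::add(const User& u) { impl_->rows.push_back(u); }

std::vector<User> UserStore::page(std::size_t offset, std::size_t limit) const {
    std::vector<User> out;
    const auto& rows = impl_->rows;
    for (std::size_t i = offset; i < rows.size() && out.size() < limit; ++i)
        out.push_back(rows[i]);
    return out;
}
```

Swapping the backend then means rewriting only the part marked as `user_store.cpp`; code that includes the header does not need to change.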
Qt provides a database abstraction layer. See: http://doc.trolltech.com/4.6/qsqldatabase.html.
libodbc++ provides a pretty good API.
Also the big guys Qt (see Kyle Lutz' answer) & wxWidgets have db abstraction layers, so it may be a good idea to use them if you plan to use/you're already using any other parts of those frameworks.
OpenDBX and libzdb are two lightweight candidates. Libgda for GNOME.

Trying to choose SQL API library

I am just beginning to learn how to write software that accesses an SQL server. It seems that each server implementation (Postgres, MySQL, etc.) offers API libraries for various languages (my code is in C and C++, though solutions for Java and Python would also interest me). I'm a little wary of depending on these libraries, however, because I'd prefer a vendor-neutral solution.
As near as I can tell, Microsoft's ODBC API was meant to solve such problems for C/C++ (and JDBC for Java); unixODBC seems to be one popular implementation. Am I right even so far?
Moreover, do any such libraries provide an object-oriented interface? It would be nice to not simply embed SQL queries into another, more featureful language; I'd like to have a wrapper that mimics the style of the rest of the language, too.
So is there a preferred solution along those lines? Am I asking for something weird?
As near as I can tell, Microsoft's ODBC API was meant to solve such problems for C/C++ (and JDBC for Java); unixODBC seems to be one popular implementation. Am I right even so far?
Yes. The equivalent of ODBC or JDBC for Python is called the DB-API. Perl's equivalent is called DBI.
Moreover, do any such libraries provide an object-oriented interface? It would be nice to not simply embed SQL queries into another, more featureful language; I'd like to have a wrapper that mimics the style of the rest of the language, too.
Yeah, there are a bunch of things like this for different languages. C# has LINQ, Smalltalk has Roe and GLORP, Python has SQLAlchemy and SQLObject (and Django in Python has quite a bit of query power built into its ORM (see Simon Willison's notes)), Ruby has ActiveRecord, and so on. I don't know what you'd use in C++ but I bet it has to use a lot of ugly template hacking to approach these.
All these choices might seem overwhelming, but chances are your choice of language will be shaped by something other than the convenience of working with relational data. (If not, you should consider Prolog.) That will probably tie you more or less to some ORM you hate just like the rest of us.
Indeed, ODBC/JDBC are libraries that help make the calling interface standard between vendors, but you're right that each respective RDBMS has its own flavor of SQL. ODBC/JDBC doesn't help abstract the SQL syntax.
One solution to move literal SQL out of your application code is to implement queries in stored procedures that reside in each database back-end, and then use ODBC/JDBC to call the stored procedures. You can define stored procedures with similar names and calling interface for each flavor of RDBMS you use. But be aware that the stored procedure language is also variable from one vendor to the next.
Another solution is to use an "object-relational mapping" technology such as Hibernate for Java, or NHibernate for .NET. These technologies can make it feel more "object-oriented" to work with databases, and free you from writing literal SQL in many cases.
But most ORM tools tend to focus on very simple queries. If your query is at all complex (using a GROUP BY or a JOIN for instance), using the ORM tool is harder than using literal SQL.
See also "Good ORM for C++ solutions?"
If SQL troubles you that much, you're probably not going to be happy using an RDBMS at all. Some programmers don't see the value to the Rules of Normalization, for instance. If that's true for you, you might want to look into the emerging technologies for non-relational data stores, including:
BerkeleyDB
Project Voldemort
CouchDB
ODBC/JDBC attempt to abstract away the database interface to provide a consistent programming model. Bear in mind that, by using such a least-common-denominator interface, you cannot take advantage of specific, non-standard features that a given DB may offer.
To get an object oriented interface to your data model, look into Object Relational Mapping (ORM) solutions such as Hibernate. ORM solutions map your objects to their representation in a relational database, generally making data persistence much simpler from an application programming perspective.
Quince is a C++ library that lets you use C++ syntax and C++ types with the feature set of SQL. Currently it supports PostgreSQL and SQLite only, but new backends can always be added. See quince-lib.com. (Full disclosure: I wrote it.)
Take a look at Qt. It is not a library, but a complete framework. It has an excellent SQL module.
Qt SQL is an essential module which provides support for SQL databases. Qt SQL's APIs are divided into different layers:
Driver layer
SQL API layer
User interface layer
http://doc.qt.io/qt-5/qtsql-index.html