When should I use C++ instead of SQL? - c++

I am a C++ programmer who occasionally uses MySQL to work with databases, but my SQL knowledge is rather limited. However I am surely willing to change that.
At the moment I am trying to do analysis(!) on the data I have in my database solely with SQL queries. But I am about to give up, and instead import the data to C++ and do the analysis with C++ code.
I have discussed this with my colleagues, and they also push me to use C++, saying that SQL is not meant for complex analysis but mainly for importing (from the existing tables) and exporting (to new tables) data, and a little bit more such as merging data to - e.g. - joined tables.
Can somebody help me draw the line, so I know when to switch to C++? Of course performance is also an issue.
What are the indications that things are getting too complex in SQL? Or maybe I am just taking the wrong approach to designing the queries. In that case, where can I find tutorials, books, ... to take a better approach?
I hope this is not too vague. I am really a bit lost.

SQL excels at analyzing large sets of relational data.
The place to draw the line is the scale of your analysis.
If you analyze individual records one at a time, do it in your application.
If you analyze large sets of records as a unit, SQL is definitely the best tool for that job.
Row-by-row analysis is not something SQL is designed or optimized for very well. But, if you want to know something about a million-row group of data, do it in the database.
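As a rough illustration of that distinction, here is a minimal sketch using Python's standard `sqlite3` module in place of MySQL and C++. The `measurements` table and its values are made up for the example. Both approaches give the same per-sensor averages, but the first ships one small result set while the second pulls every row into the application.

```python
import sqlite3

# Hypothetical table and data, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (sensor TEXT, value REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [("a", 1.0), ("a", 3.0), ("b", 10.0), ("b", 20.0), ("b", 30.0)],
)

# Set-based: one query, the database does the grouping and averaging.
sql_result = dict(
    conn.execute("SELECT sensor, AVG(value) FROM measurements GROUP BY sensor")
)

# Row-by-row: every row is fetched and aggregated in application code.
totals = {}
for sensor, value in conn.execute("SELECT sensor, value FROM measurements"):
    count, total = totals.get(sensor, (0, 0.0))
    totals[sensor] = (count + 1, total + value)
app_result = {sensor: total / count for sensor, (count, total) in totals.items()}

print(sql_result)
print(app_result)
```

With five rows the difference is invisible. The point is that the second version's work and data transfer grow with the number of rows, while the first returns only one row per group.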

I have discussed this with my colleagues, and they also push me to use C++, saying that SQL is not meant for complex analysis but mainly for importing (from the existent tables) and exporting (to new tables) data, and a little bit more such as merging data to - e.g. - joined tables.
This is completely arbitrary. Learn SQL. There are a lot of resources available on the web for free.

You can do very complex analysis of data in SQL, provided you know how to use the features that SQL offers.
SQL has features for doing relational operations, like joins and projections. Also for doing set operations like union, intersection, and restriction (subset). Also for doing basic arithmetic on numbers, like the four arithmetic operators, and built in functions like SQRT. Also statistical functions like COUNT, SUM, and AVG that can be combined with projections in very interesting ways. A good DBMS will let you extend the built in functions with your own functions written in C, C++ or maybe PL/SQL.
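To make the last two points concrete, here is a small sketch with Python's standard `sqlite3` module. SQLite stands in for the DBMS, and the user-defined function is written in Python rather than C or PL/SQL. The `orders` table and the function name `py_sqrt` are assumptions made up for the example.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")

# Register a Python callable as a one-argument SQL function.
# "py_sqrt" is a hypothetical name chosen for this sketch.
conn.create_function("py_sqrt", 1, math.sqrt)

conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 9.0), ("alice", 16.0), ("bob", 25.0)],
)

# Aggregates combined with a projection, plus the user-defined function.
rows = conn.execute(
    "SELECT customer, COUNT(*), SUM(amount), AVG(amount), py_sqrt(SUM(amount)) "
    "FROM orders GROUP BY customer ORDER BY customer"
).fetchall()

for row in rows:
    print(row)
```

Other database servers have their own mechanisms for registering functions, so treat this as an illustration of the idea rather than a portable recipe.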
The power you get from these features depends on how well designed the database is. A well designed database conforms to the relational model, and should be relevant to your intended use of the data.
SQL code can be stored in the database in stored procedures. It can be stored in SQL script files. And, as you already know, it can be embedded in application programs. In addition to SQL, you can use OLAP tools and report generators to do standard things with the data very easily.
The people who advise you to keep all of your processing in C++ sound like they have learned just enough to use a database like a big and stupid file system. A good DBMS is much more than that.

A SQL server is usually very efficient at handling its own database (depending on the server implementation).
You should use queries to analyze the database.
The main reason for that is the communication overhead.
Even if the server is on the local machine (remote servers have obvious communication overhead), you still have to retrieve the stored information from the SQL server into your C++ program for analysis.
If you have tens of thousands of rows in the database, the SQL server has to read them all and send them to your program, which will probably create a local copy of the data for you to work on.
If you let the SQL server do the work with queries, you gain the complex optimizations it applies according to the kind of query you are executing, and in the end you retrieve only a limited amount of data (what you actually need) over the connection.

You made the right decision to begin data analysis with SQL. Now that you feel your knowledge of SQL limits you, you have two choices: give up and switch back to a familiar but not very efficient toolset (C++), or bring your SQL skills up to the task.
It's possible that at some point SQL will become too complex as well, but then C++ won't be the answer either; most likely some specialized tools will be.

In my opinion, you should perform analysis in C++ only if the database server provides no equivalent analysis function. Database servers are very smart, and it is hard, almost impossible, to beat the algorithmic efficiency of their analysis functions. Bringing raw data to the application for analysis also adds a lot of overhead.
If at some point plain SQL becomes overly complex, the server's native procedural language could be a good choice.

I agree with JNK and Jochai, but disagree with Ascanio.
It's better to improve the knowledge in database systems.
SQL comes with it.

So, this is something I've been thinking about and it seems to me that SQL, as just a platform/language for storing/manipulating data, should have no inherent advantage over a C++ or C library. It seems to me that theoretically you could build a C++ library just as efficient, if not more efficient, than SQL at doing this. In doing so, you would be able to build it from the ground up, in terms of how ints, chars, strings, and other data types are stored, and make it easier to interface with your particular application (like web development). You could even make it so that the queries could be done in a language like JavaScript (allowing web developers to focus on just learning one language really well).

Related

When should SQLite not be used for testing in Django if a different RDBMS(E.g. PostgreSQL) is used in production and development?

This article advises to use SQLite for tests even if you use another RDBMS (I use PostgreSQL) in development and production. I tried SQLite for one test case and it ran faster indeed (~18.8 times faster, 0.5s vs 9.4s!).
In which cases could using SQLite result in different test results than if I used PostgreSQL?
Only if I would test a piece of code that contains a raw SQL query?
Any time the query generator might produce queries that behave differently on different platforms despite the efforts of the query generator's platform abstractions. Differences in regular expressions, collations and sorting, different levels of strictness about aggregates and grouping, use of full-text search or other extension features, use of anything but the most utterly simple functions and operators, etc.
Also, as you noted, any time you run raw SQL.
It's moderately reasonable to run tests on SQLite during iterative development, but you really need to run them on the same DB you're going to deploy on before you push to production. Otherwise you'll get bitten by some query where different engines have different capabilities to prove transitive equality through joins and GROUP BY or are differently permissive of queries, so a query will work on one then fail on the other.
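One concrete case of "differently permissive" is a selected column that is neither grouped nor aggregated. The sketch below uses Python's standard `sqlite3` module and a made-up `scores` table. SQLite accepts the first query, whereas PostgreSQL rejects a query of that shape with an error about the column needing to appear in the GROUP BY clause or an aggregate. The second query is one way to express the same intent that does not rely on SQLite's leniency.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, invented for this illustration.
conn.execute("CREATE TABLE scores (player TEXT, team TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [("ann", "red", 5), ("bo", "red", 9), ("cy", "blue", 7)],
)

# "player" is neither in GROUP BY nor inside an aggregate.
# SQLite runs this; stricter engines refuse it.
permissive = conn.execute(
    "SELECT team, player, MAX(points) FROM scores GROUP BY team ORDER BY team"
).fetchall()

# Stricter formulation: find each team's maximum, then join back to the rows.
portable = conn.execute(
    "SELECT s.team, s.player, s.points "
    "FROM scores AS s "
    "JOIN (SELECT team, MAX(points) AS top FROM scores GROUP BY team) AS m "
    "  ON m.team = s.team AND m.top = s.points "
    "ORDER BY s.team"
).fetchall()

print(permissive)
print(portable)
```

A test suite that only ever runs on SQLite would pass with the first query and give no warning that it fails on the production engine.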
You should also test against PostgreSQL on a reasonable data set before pushing changes live in order to find obvious performance regressions that'll be an issue in production. It makes little sense to do this on SQLite, where often totally different queries will be fast or slow.
I'm surprised you're seeing the kind of speed difference you report. I'd want to look into why the tests run so much slower on PostgreSQL and what you can do about it, since in production it's clearly not going to have the same kind of performance difference. I wrote a bit about this in optimise PostgreSQL for fast testing.
The performance characteristics will be very different in most cases, and SQLite is often faster. It's typically good for testing because the SQLite engine does not need to handle access from multiple clients. SQLite allows only one writer at a time, which removes a lot of the overhead and complexity found in other RDBMSs.
As far as raw queries go, there are going to be a lot of features that SQLite does not support compared to Postgres or another RDBMS. Stay away from raw queries as much as possible to keep your code portable. The exception is when you need to optimize specific queries for production. In those cases you can keep a setting in settings.py that tells you whether you are in production, run the raw query only there, and run the generic filters everywhere else. There are many kinds of generic raw queries that will not need this sort of checking, though.
Also, because a SQLite DB is just a file, it is very simple to tear down and start over for testing.
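A minimal sketch of that tear-down-and-start-over pattern, using plain `unittest` and the standard `sqlite3` module rather than Django's test runner. It goes one step beyond a file and uses an in-memory database, which is an assumption of this example. The `users` table and the test names are hypothetical.

```python
import sqlite3
import unittest


class UserTableTest(unittest.TestCase):
    def setUp(self):
        # A brand-new, empty database for every test method.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE users (name TEXT UNIQUE)")

    def tearDown(self):
        # Closing the connection discards the in-memory database.
        self.conn.close()

    def count(self):
        return self.conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

    def test_insert(self):
        self.conn.execute("INSERT INTO users VALUES ('ann')")
        self.assertEqual(self.count(), 1)

    def test_starts_empty(self):
        # Unaffected by test_insert, whatever order the tests run in.
        self.assertEqual(self.count(), 0)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(UserTableTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```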

Where to store SQL code for a C++ application?

We have a C++ application that utilizes some basic APIs to send raw queries to a MS SQL Server. Scattered through the various translation units in our program, we have simple 1-2 line queries as C++ strings, and every now and then you'll see more complex queries that can be over 20 lines.
I can't help but think that the larger queries, specifically the 20+ line ones, should not be embedded in C++ code as constant strings. I want to propose pulling these out into separate text files that are loaded on-demand by the C++ application, however I'm not sure if this is the best approach.
What design choices are typical for situations like this? I definitely feel there needs to be improvement, I just don't know if moving the SQL queries out into data files (text files) is the best idea.
You could make a DAL (Data Access Layer).
It would be the API that the rest of the program talks to. Then you can mess around and try anything and everything (Stored procedures, caching, etc.) without disturbing the main program.
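A rough sketch of what such a layer can look like, written in Python with the standard `sqlite3` module for brevity (the question is about C++ and MS SQL Server, but the shape is the same). The class `CustomerStore`, its methods, and the `customers` table are all hypothetical names made up for this example.

```python
import sqlite3


class CustomerStore:
    """Hypothetical data access layer: the only code that contains SQL."""

    def __init__(self, conn):
        self._conn = conn
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS customers "
            "(id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )

    def add(self, name):
        """Insert a customer and return the generated id."""
        cur = self._conn.execute(
            "INSERT INTO customers (name) VALUES (?)", (name,)
        )
        return cur.lastrowid

    def find_name(self, customer_id):
        """Return the customer's name, or None if the id is unknown."""
        row = self._conn.execute(
            "SELECT name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        return row[0] if row else None


# The rest of the program only sees add() and find_name().
store = CustomerStore(sqlite3.connect(":memory:"))
new_id = store.add("Ada")
print(store.find_name(new_id))
```

Because callers never see the SQL, the body of `find_name` could later be switched to a stored procedure call or a cache lookup without touching them.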
Move them into their own files, or even into their own stored procedures. Queries embedded in the application cannot be changed without a recompile, and depending on your release procedures, that could severely impair your ability to respond to emergencies or deploy hot fixes. You could alter your app to cache the file contents, if you go down that road, and even periodically check the files for updates.
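The file-based variant with caching and update checks could be sketched roughly as follows. This is Python for brevity, not the C++ the question is about. The `QueryFiles` class, the one-query-per-`.sql`-file layout, and the file name are assumptions made up for the example.

```python
import os
import tempfile


class QueryFiles:
    """Hypothetical loader: one query per <name>.sql file, cached in memory."""

    def __init__(self, directory):
        self._directory = directory
        self._cache = {}  # name -> (modification time in ns, query text)

    def get(self, name):
        path = os.path.join(self._directory, name + ".sql")
        mtime = os.stat(path).st_mtime_ns
        cached = self._cache.get(name)
        # Re-read only if the file was never loaded or has changed on disk.
        if cached is None or cached[0] != mtime:
            with open(path, encoding="utf-8") as handle:
                cached = (mtime, handle.read())
            self._cache[name] = cached
        return cached[1]


# Usage with a throwaway directory standing in for the deployed query folder.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "top_customers.sql"), "w", encoding="utf-8") as f:
        f.write("SELECT name FROM customers ORDER BY total DESC")
    queries = QueryFiles(tmp)
    first = queries.get("top_customers")   # read from disk
    second = queries.get("top_customers")  # served from the cache
    print(first)
```

Checking the modification time on every call costs one `stat` per lookup. An application that cares about that could check on a timer instead, as the answer suggests.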
The best "design choice", for many different reasons, is to use MSSQL stored procedures whenever and wherever possible.
I've seen code that segregates SQL queries into a common module, but I don't think there's much benefit to a common "queries module" (or a standalone text file) over having the SQL queries spelled out as string literals in the module that's calling them.
Stored procedures, on the other hand, increase modularity, enhance security, and can vastly improve performance.
IMHO...
I would leave the SQL embedded in the C++ functions that use it: it will be easier to read and understand what the code does.
If you have SQL queries scattered around your code I'd say that there is some problem with the overall structure of the classes you are using: you should have some (or even just one) 'low level' classes that handle the interaction with the database, and the rest of the code uses these classes.
I personally don't like using stored procedures: if you have to support a different database server, porting will be a pain; I never saw much of a performance improvement; and to understand what the code does you have to jump back and forth between the stored procedures and the C++.
It really depends, here are some notes:
1) If all your SQL code resides in the application, then your application is pretty much self-contained in terms of logic. This is good, and it is what you have done in the current application. In terms of speed, this can be a little slower, as the SQL needs to be parsed when you run these queries (this also depends on whether you use prepared statements and the like, which can speed it up).
2) The second approach is to put all SQL logic in stored procedures on the server. This is very much the preferred approach, even for small SQL queries, whether one line or not. You just build a DAL. In terms of performance this is very good; however, the logic lives in two different systems, your C++ app and the SQL server. You will quite likely need to build a small utility application that can translate the stored procedures' inputs and outputs to template code (be it C++ or any other language) to make your life easier.
3) A mixed approach with the above two. I would not recommend this route.
You need to think about how these queries are likely to change over time, and compare it to how the related C++ code is likely to change. If the queries are relatively independent of the code, and have a higher likelihood of change, then I would either load them at runtime from separate files, or use stored procedures instead. That approach allows for changing the queries without recompiling the C++ code. On the other hand, if the queries are highly coupled to the C++ code, making a change in one likely to accompany a change in the other, I would keep the queries in the code. This approach makes a change more localized and less error prone.

Will the use of the built-in ORM in CF 9 increase db performance?

How will it or how will it not?
Appreciate it.
That's like asking if programming language A is faster than programming language B. The fact of the matter is that you can write poor code with either of them, and you can write good code with either of them.
As Stephen says, ORM is about improving development productivity - you don't have to pay the productivity cost of context switching between application code and SQL; and in some cases it offers application performance boosters.
However, if you're looking to "increase db performance" then ORM is not a silver bullet. I don't think that one (a silver bullet) exists.
Nothing can beat well written code (be it ORM or SQL) that has been analyzed and optimized.
Well no not really...
ORM is not about increasing the performance of your database. It's about how you manipulate that data on the application side.
It does have elements such as object caching built in which do help with performance of your application, but you still need to create a well structured and indexed database schema.

Even lighter than SQLite

I've been looking for a C++ SQL library implementation that is simple to hook in like SQLite, but faster and smaller. My projects are in games development and there's definitely a cutoff point between needing to pass the ACID test and wanting some extreme performance. I'm willing to move away from SQL string style queries, allowing it to be code driven, but I haven't found anything out there that provides SQL-like flexibility while also preferring performance over the ACID test.
I don't want to go re-inventing the wheel, and the idea of implementing an SQL library on my own is quite daunting, even if it's only going to be a simple subset of all the calls you could make.
I need the basic commands (SELECT, UPDATE, DELETE, INSERT, with JOIN and WHERE), not data operations (like sorting, min, max, count), and I don't need the database to be atomic, or even to enforce consistency (I can use a real SQL service while I'm testing and debugging).
Are you sure that you have obtained the maximum speed available from SQLite? Out of the box, SQLite is extremely safe, but quite slow. If you know what you are doing, and are willing to risk db corruption on a disk crash, then there are several optimizations you can do that provide spectacular speed improvements.
In particular:
Switch off synchronization
Group writes into transactions
Index tables
Use database in memory
If you have not explored all of these, then you are likely running many times slower than you might.
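The four items above can be sketched together with Python's standard `sqlite3` module (the same pragmas and SQL apply from the C API). The `items` table and its contents are made up for the example. With an in-memory database the `synchronous` pragma has little left to switch off, so it matters mainly when the database is a file on disk.

```python
import sqlite3

# "Use database in memory": nothing is written to disk at all.
conn = sqlite3.connect(":memory:")

# "Switch off synchronization": trades crash safety for speed on disk files.
conn.execute("PRAGMA synchronous = OFF")

# Hypothetical table, invented for this illustration.
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, kind TEXT, weight REAL)")

# "Index tables": index the column used in WHERE clauses.
conn.execute("CREATE INDEX idx_items_kind ON items (kind)")

rows = [(i, "sword" if i % 2 else "shield", i * 0.5) for i in range(10000)]

# "Group writes into transactions": one commit for all 10000 inserts
# instead of one commit per insert.
with conn:
    conn.executemany("INSERT INTO items VALUES (?, ?, ?)", rows)

swords = conn.execute(
    "SELECT COUNT(*) FROM items WHERE kind = 'sword'"
).fetchone()[0]
print(swords)
```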
I'm not sure you'll manage to find anything with better performance than SQL, especially if you want operations like JOINs. Is SQLite's speed really a problem? For simple requests it's usually faster than any full DBMS.
Are you sure you don't have an index problem?
As for size, it's not even 1 MB extra in the binary, so I'm a bit surprised it's a problem.
You can look at Berkeley DB, which is probably the fastest DB available, but it is mostly just a key->value database.
If you really need higher speed consider loading the whole database in memory (using SQLite again).
Take a look at gigabase and its twin fastdb.
You might want to consider Embedded InnoDB. It offers the basic SQL functionality (obviously, see MySQL) but doesn't offer the actual SQL syntax (as that's part of MySQL, not InnoDB). At 838KB, it's not too heavy.
If you just need those basic operations, you don't really need SQL. Take a look at NoSQL data storage, for example Tokyo Cabinet.
You can try LevelDB; it's a key/value store.
http://code.google.com/p/leveldb

Trying to choose SQL API library

I am just beginning to learn how to write software that accesses an SQL server. It seems that each server implementation (Postgres, MySQL, etc.) offers API libraries for various languages (my code is in C and C++, though solutions for Java and Python would also interest me). I'm a little wary of depending on these libraries, however, because I'd prefer a vendor-neutral solution.
As near as I can tell, Microsoft's ODBC API was meant to solve such problems for C/C++ (and JDBC for Java); unixODBC seems to be one popular implementation. Am I right even so far?
Moreover, do any such libraries provide an object-oriented interface? It would be nice to not simply embed SQL queries into another, more featureful language; I'd like to have a wrapper that mimics the style of the rest of the language, too.
So is there a preferred solution along those lines? Am I asking for something weird?
As near as I can tell, Microsoft's ODBC API was meant to solve such problems for C/C++ (and JDBC for Java); unixODBC seems to be one popular implementation. Am I right even so far?
Yes. The equivalent of ODBC or JDBC for Python is called the DB-API. Perl's equivalent is called DBI.
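For a feel of what the DB-API looks like, here is a minimal sketch. The helper `fetch_all` is a hypothetical name and uses only calls the DB-API defines (`cursor`, `execute`, `fetchall`, `close`), so in principle it accepts a connection from any compliant driver. It is exercised here with the standard `sqlite3` module and a made-up `people` table.

```python
import sqlite3


def fetch_all(conn, sql, params=()):
    """Run a query on any DB-API connection and return all rows."""
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        return cur.fetchall()
    finally:
        cur.close()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO people VALUES (?, ?)", [("Ada", 36), ("Bob", 17)])

# The "?" placeholder is sqlite3's style; each driver advertises its own
# in the module-level attribute "paramstyle".
adults = fetch_all(conn, "SELECT name FROM people WHERE age >= ?", (18,))
print(adults)
print(sqlite3.paramstyle)
```

The calling interface is shared, but the SQL text and the placeholder style are not, so a query string written for one driver may still need adjusting for another.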
Moreover, do any such libraries provide an object-oriented interface? It would be nice to not simply embed SQL queries into another, more featureful language; I'd like to have a wrapper that mimics the style of the rest of the language, too.
Yeah, there are a bunch of things like this for different languages. C# has LINQ, Smalltalk has Roe and GLORP, Python has SQLAlchemy and SQLObject (and Django in Python has quite a bit of query power built into its ORM (see Simon Willison's notes)), Ruby has ActiveRecord, and so on. I don't know what you'd use in C++ but I bet it has to use a lot of ugly template hacking to approach these.
All these choices might seem overwhelming, but chances are your choice of language will be shaped by something other than the convenience of working with relational data. (If not, you should consider Prolog.) That will probably tie you more or less to some ORM you hate just like the rest of us.
Indeed, ODBC/JDBC are libraries that help make the calling interface standard between vendors, but you're right that each respective RDBMS has its own flavor of SQL. ODBC/JDBC doesn't help abstract the SQL syntax.
One solution to move literal SQL out of your application code is to implement queries in stored procedures that reside in each database back-end, and then use ODBC/JDBC to call the stored procedures. You can define stored procedures with similar names and calling interface for each flavor of RDBMS you use. But be aware that the stored procedure language is also variable from one vendor to the next.
Another solution is to use an "object-relational mapping" technology such as Hibernate for Java, or NHibernate for .NET. These technologies can make it feel more "object-oriented" to work with databases, and free you from writing literal SQL in many cases.
But most ORM tools tend to focus on very simple queries. If your query is at all complex (using a GROUP BY or a JOIN for instance), using the ORM tool is harder than using literal SQL.
See also "Good ORM for C++ solutions?"
If SQL troubles you that much, you're probably not going to be happy using an RDBMS at all. Some programmers don't see the value to the Rules of Normalization, for instance. If that's true for you, you might want to look into the emerging technologies for non-relational data stores, including:
BerkeleyDB
Project Voldemort
CouchDB
ODBC/JDBC attempt to abstract away the database interface to provide a consistent programming model. Bear in mind that, by using such a least-common-denominator interface, you cannot take advantage of specific, non-standard features that a given DB may offer.
To get an object oriented interface to your data model, look into Object Relational Mapping (ORM) solutions such as Hibernate. ORM solutions map your objects to their representation in a relational database, generally making data persistence much simpler from an application programming perspective.
Quince is a C++ library that lets you use C++ syntax and C++ types with the feature set of SQL. Currently it supports PostgreSQL and sqlite only, but new backends can always be added. See quince-lib.com. (Full disclosure: I wrote it.)
Take a look at Qt. It is not a library, but a complete framework. It has an excellent SQL module.
Qt SQL is an essential module which provides support for SQL databases. Qt SQL's APIs are divided into different layers:
Driver layer
SQL API layer
User interface layer
http://doc.qt.io/qt-5/qtsql-index.html