Sqlite as a replacement for fopen()? - c++

On an official sqlite3 web page there is written that I should think about sqlite as a replacement of fopen() function.
What do you think about it? Is it always good solution to replece application internal data storage with sqlite? What are the pluses and the minuses of such solution?
Do you have some experience in it?
EDIT:
How about your experience? Is it easy to use? Was it painful or rather joyful? Do you like it?

It depends. There are some contra-indications:
for configuration files, use of plain text or XML is much easier to debug or to alter than using a relational database, even one as lightweight as SQLite.
tree structures are easier to describe using (for example) XML than by using relational tables
the SQLite API is quite badly documented - there are not enough examples, and the hyperlinking is poor. OTOH, the information is all there if you care to dig for it.
use of app-specific binary formats directly will be faster than storing same format as a BLOB in a database
database corruption can mean the los of all your data rather than that in a single bad file
OTOH, if your internal data fits in well with the relational model and if there is a a lot of it, I'd recommend SQLite - I use it myself for one of my projects.
Regarding experience - I use it, it works well and is easy to integrate with existing code. If the documentation were easier to navigate I'd give it 5 stars - as it is I'd give it four.

As always it depends, there are no "one size fits all" solutions
If you need to store data in a stand-alone file and you can take advantage of relational database capabilities of an SQL database than SQLite is great.
If your data is not a good fit for a relational model (hierarchical data for example) or you want your data to be humanly readable (config files) or you need to interoperate with another system than SQLite won't be very helpful and XML might be better.
If on the other hand you need to access the data from multiple programs or computers at the same time than again SQLite is not an optimal choice and you need a "real" database server (MS SQL, Oracle, MySQL, PosgreSQL ...).

The atomicity of SQLite is a plus. Knowing that if you half-way write some data(maybe crash in the middle), that it won't corrupt your data file. I normally accomplish something similar with xml config files by backing up the file on a successful load, and any future failed load(indicating corruption) automatically restores the last backup. Of course it's not as granular nor is it atomic, but it is sufficient for my desires.

I find SQLite a pleasure to work with, but I would not consider it a one-size-fits-all replacement for fopen().
As an example, I just wrote a piece of software that's downloading images from a web server and caching them locally. Storing them as individual files, I can watch them in Windows Explorer, which certainly has benefits. But I need to keep an index that maps between a URL and the image file in order to use the cache.
Storing them in a SQLite database, they all sit in one neat little file, and I can access them by URL (select imgdata from cache where url='http://foo.bar.jpg') with little effort.

Related

Design of data storage C++ application (maybe relational database)

I need to store and load some data in a C++ application. This data is basically going to end up as a set of tables as per a relational database.
I write the data to tables using something like csv format, then parse them myself and apply the database logic I need in my C++ code. But it seems stupid to reinvent the wheel with this and end up effectively writing my own database engine.
However, using something like a MySQL database seems like massive overkill for what is going to be a single user local system. I have tried setting up a MySQL daemon on my Windows system and I found it rather complex and possibly even impossible without admin privileges. It would be a serious obstacle to deployment as it would need each user's system to have MySQL set up and running.
Is there a reasonable middle ground solution? Something that can provide me with a simple database, accessible from C++, without all the complexities of setting up a full MySQL install?
NB. I have edited this question such that I hope it satisfies those who have voted to close the question. I am not asking for a recommendation for a tool, or someone's favourite tool or the best tools. That would be asking which database engine should I use. I am asking for what tools and design patterns are available to solve a specific programming problem - i.e. how can I get access to database like functionality from a C++ program, without writing my own database engine, nor setting up a full database server. This is conceptually no different to asking e.g. How do I print out the contents of a vector? - it's just a bigger problem. I have described the problem and what has been done so far to solve it. My understanding from the On Topic Page is that this is within scope.
You can try sqlite.
Here are some simple code examples: https://www.tutorialspoint.com/sqlite/sqlite_c_cpp.htm

sqlite: Encrypting in-memory database before it touches secondary storage

I am musing about the possibility of switching from a C++ runtime relational database (std::maps for indexes, std::vectors for columns, etc) to using an in-memory sqlite database. This would mainly be for code maintenance. (Going back and looking at my code months later took my some time to re-acquaint myself with the schema I designed using C++ components; using SQL statements would at least be more self-documenting.)
Here's the hurdle, though: The data cannot touch secondary storage without being encrypted. My current solution lends itself quite well to encryption before flushing. Sqlite, however, appears to be its own opaque, self-contained ecosystem that isn't very conducive to manipulating the underlying database binary data as anything but a database.
So, I'm curious if anybody knows of a way, or has developed one, to put an encrypting hook in the sqlite backup system such that: memory -> encryption -> disc, and then in reverse. Alternatively, I would also be fine just grabbing the in-memory database as a binary "blob" and do my own thing to it before I write it out myself (and reverse the process when loading it back in). I don't see any such opportunities explicitly available in the sqlite backup API.
Note: I am not interested in using the SEE stuff, only the "public domain" sqlite offering. I have sufficient cryptographic code in use already (via Qt) within my application, so I don't want competing systems.
Thanks for any suggestions.

Django/Sqlite Improve Database performance

We are developing an online school diary application using django. The prototype is ready and the project will go live next year with about 500 students.
Initially we used sqlite and hoped that for the initial implementation this would perform well enough.
The data tables are such that to obtain details of a school day (periods, classes, teachers, classrooms, many tables are used and the database access takes 67ms on a reasonably fast PC.
Most of the data is static once the year starts with perhaps minor changes to classrooms. I thought of extracting the timetable for each student for each term day so no table joins would be needed. I put this data into a text file for one student, the file is 100K in size. The time taken to read this data and process it for a days timetable is about 8ms. If I pre-load the data on login and store it in sessions it takes 7ms at login and 2ms for each query.
With 500 students what would be the impact on the web server using this approach and what other options are there (putting the student text files into a sort of memory cache rather than session for example?)
There will not be a great deal of data entry, students adding notes, teachers likewise, so it will mostly be checking the timetable status and looking to see what events exist for that day or week.
What is your expected response time, and what is your expected number of requests per minute? One twentieth of a second for the database access (which is likely to be slow part) for a request doesn't sound like a problem to me. SQLite should perform fine in a read-mostly situation like this. So I'm not convinced you even have a performance problem.
If you want faster response you could consider:
First, ensuring that you have the best response time by checking your indexes and profiling individual retrievals to look for performance bottlenecks.
Pre-computing the static parts of the system and storing the HTML. You can put the HTML right back into the database or store it as disk files.
Using the database as a backing store only (to preserve state of the system when the server is down) and reading the entire thing into in-memory structures at system start-up. This eliminates disk access for the data, although it limits you to one physical server.
This sounds like premature optimization. 67ms is scarcely longer than the ~50ms where we humans can observe that there was a delay.
SQLite's representation of your data is going to be more efficient than a text format, and unlike a text file that you have to parse, the operating system can efficiently cache just the portions of your database that you're actually using in RAM.
You can lock down ~50MB of RAM to cache a parsed representation of the data for all the students, but you'll probably get better performance using that RAM for something else, like the OS disk cache.
I agree with some of other answers which suggest to use MySQL or PostgreSQL instead of SQLite. It is not designed to be used as production db. It is great for storing data for one-user applications such as mobile apps or even a desktop application, but it falls short very quickly in server applications. With Django it is trivial to switch to any other full-pledges database backend.
If you switch to one of those, you should not really have any performance issues, especially if you will do all the necessary joins using select_related and prefetch_related.
If you will still need more performance, considering that "most of the data is static", you actually might want to convert Django site a static site (a collection of html files) and then serve those using nginx or something similar to that. The simplest way I can think of doing that is to just write a cron-job which will loop over all needed url-configs, request the page from Django and then save that as an html file. If you want to go into that direction, you also might want to take a look at Python's static site generators: Hyde and Pelican.
This approach will certainly work much faster then any caching system however you will loose any dynamic components of the site. If you need them, then caching seems like the best and fastest solution.
You should use MySQL or PostgreSQL for your production database. sqlite3 isn't a good idea.
You should also avoid pre-loading data on login. Since your records can be inserted in advance, write django management commands and run the import to your chosen database before hand and design your models such that when a user logs in, the user would already be able to access and view/edit his or her related data (which are pre-inserted before the application even goes live). Hardcoding data operations when log in does not smell right at all from an application design point-of-view.
https://docs.djangoproject.com/en/dev/howto/custom-management-commands/
The benefit of designing your django models and using custom management commands to insert the records right way before your application goes live implies that you can use django orm to make the appropriate relationships between users and their records.
I suspect - based on your description of what you need above - that you need to re-look at the approach you are creating this application.
With 500 students, we shouldn't even be talking about caching. If you want response speed, you should deal with the following issues in priority:-
Use a production quality database
Design your application use case correctly and design your application model right
Pre-load any data you need to the production database
front end optimization comes first (css/js compression etc)
use django debug toolbar to figure out if any of your sql is slow and optimize specifically those
implement caching (memcached etc) as needed
As a general guideline.

Database in a C++ Program

I've got a C++ program that needs to deal with a lot of typical database problems - looking at tables, inserting and deleting values, searching for records. All of the database information has to be stored locally. Let me emphasise that - I don't want to communicate with a server, I want the information to be stored on the user's computer.
Are there any libraries that can easily implement all this functionality, preferably in a SQL style syntax? Or what are some ways to easily and robustly implement this functionality?
You can use embedded DB.
I think SQLite is one of the more popular ones.
My personal preference would be SOCI, with a SQLite backend.
http://soci.sourceforge.net/
http://soci.sourceforge.net/doc/backends/sqlite3.html
http://www.sqlite.org/

At what point is it worth using a database?

I have a question relating to databases and at what point is worth diving into one. I am primarily an embedded engineer, but I am writing an application using Qt to interface with our controller.
We are at an odd point where we have enough data that it would be feasible to implement a database (around 700+ items and growing) to manage everything, but I am not sure it is worth the time right now to deal with. I have no problems implementing the GUI with files generated from excel and parsed in, but it gets tedious and hard to track even with VBA scripts. I have been playing around with converting our data into something more manageable for the application side with Microsoft Access and that seems to be working well. If that works out I am only a step (or several) away from using an SQL database and using the Qt library to access and modify it.
I don't have much experience managing data at this level and am curious what may be the best way to approach this. So what are some of the real benefits of using a database if any in this case? I realize much of this can be very application specific, but some general ideas and suggestions on how to straddle the embedded/application programming line would be helpful.
This is not about putting a database in an embedded project. It is also not a business type application where larger databases are commonly used. I am designing a GUI for a single user on a desktop to interface with a micro-controller for monitoring and configuration purposes.
I decided to go with SQLite. You can do some very interesting things with data that I didn't really consider an option when first starting this project.
A database is worthwhile when:
Your application evolves to some
form of data driven execution.
You're spending time designing and
developing external data storage
structures.
Sharing data between applications or
organizations (including individual
people)
The data is no longer short and
simple.
Data Duplication
Evolution to Data Driven Execution
When the data is changing but the execution is not, this is a sign of a data driven program or parts of the program are data driven. A set of configuration options is a sign of a data driven function, but the whole application may not be data driven. In any case, a database can help manage the data. (The database library or application does not have to be huge like Oracle, but can be lean and mean like SQLite).
Design & Development of External Data Structures
Posting questions to Stack Overflow about serialization or converting trees and lists to use files is a good indication your program has graduated to using a database. Also, if you are spending any amount of time designing algorithms to store data in a file or designing the data in a file is a good time to research the usage of a database.
Sharing Data
Whether your application is sharing data with another application, another organization or another person, a database can assist. By using a database, data consistency is easier to achieve. One of the big issues in problem investigation is that teams are not using the same data. The customer may use one set of data; the validation team another and development using a different set of data. A database makes versioning the data easier and allows entities to use the same data.
Complex Data
Programs start out using small tables of hard coded data. This evolves into using dynamic data with maps, trees and lists. Sometimes the data expands from simple two columns to 8 or more. Database theory and databases can ease the complexity of organizing data. Let the database worry about managing the data and free up your application and your development time. After all, how the data is managed is not as important as to the quality of the data and it's accessibility.
Data Duplication
Often times, when data grows, there is an ever growing attraction for duplicate data. Databases and database theory can minimize the duplication of data. Databases can be configured to warn against duplications.
Moving to using a database has many factors to be considered. Some include but are not limited to: data complexity, data duplication (including parts of the data), project deadlines, development costs and licensing issues. If your program can run more efficiently with a database, then do so. A database may also save development time (and money). There are other tasks that you and your application can be performing than managing data. Leave data management up to the experts.
What you are describing doesn't sound like a typical business application, and many of the answers already posted here assume that this is the kind of application you are talking about, so let me offer a different perspective.
Whether or not you use a database for 700 items is going to depend greatly on the nature of the data.
I would say that, about 90% of the time at this scale, you will benefit from a light-weight database like SQLite, provided that:
The data may potentially grow substantially larger than what you are describing,
The data may be shared by more than one user,
You may need to run queries against the data (which I don't think you're doing right now), and
The data can easily be described in table form.
The other 10% of the time, your data will be highly structured, hierarchical, object-based, and doesn't neatly fit into the table model of a database or Excel table. If this is the case, consider using XML files.
I know developers instinctively like to throw databases at problems like this, but if you are currently using Excel data to design user interfaces (or display configuration settings), rather than display a customer record, XML may be a better fit. XML is more expressive than either Excel or database tables, and can be easily manipulated with a simple text editor.
XML parsers and data binders for C++ are easy to find.
I recommend you to introduce a Database in your app, your application will gain flexibility and will be easier to maintain and to improve with new features in the future.
I would start with a lightweight file based db like Sqlite.
With a well designed db you'll have:
Reduced data redundancy
Greater data integrity
Improved data security
Last but not least, using a database will save you from the Excel import/update/export Hell!
Reasons for using a database:
Concurrent writes. It's easy to achieve concurrency in databases
Easy querying. SQL queries tend to be much concise than procedural code to search data. UPDATEs, INSERT INTOs can also do lots of stuff with very little code
Integrity. Constraints are very easy to define and are enforced without writing code. If you have a non-null constraint, you can rest assured that the value won't be null, no need to write checks anywhere. If you have a foreign key constraint in place, you won't have "dangling references".
Performance over large datasets. Indexing is very simple to add to an SQL database
Reasons for not using a database:
It tends to be an extra dependency (although there exist very lightweight databases- I like H2 for Java, for instance)
Data not well suited to a relational schema. Things that are basically key/value maps. XML (although databases often support XPath, etc.).
Sometimes files are more convenient. They can be diff'ed, merged, edited with a plain text editor, etc. Sometimes spreadsheets can be more practical (you don't have to build an editor- you can use a spreadsheet program)
Your data is already somewhere else
When you have a lot of data that you're not sure how they will be exploited in the future.
For example you might want to add an SQLite database in an embedded application that need to register statistics that you're not sure how will be used. Later you send the full database for injection in a bigger one running on a central server and those data can easily be exploited, using requests.
In fact, if your application's purpose is to "gather data" then having a database is a must have.
I see quite a few requirements that well met by databases:
1). Ad hoc queries. Find me all the {X} that meet criteria Y
2). Data with structure that can benefit from normalisation - factoring out common values into separate "tables". You can save space and reduce the possibility of inconsistency this way. Once you've done this then those ad-hoc queries start to be really useful.
3). Large data volumes. Professional database are very good at making good use of resoruces, clever query optmisations and paging strategies. Trying to write this stuff yourself is a real challenge.
You're clearly not needing that last one, but the other two, maybe do apply to you.
Don't forget that the appropriate database can be quite different depending on your requirements (and don't forget that a text file could be used as a database if you're requirements are simple enough - for example, config files are just a specific kind of database). Such parameters might be:
number of records
size of data items
does the database need to be shared with other devices? Concurrently?
how complex are the relationships between the various pieces of data
is the database read only (created at build time and not changed, for example)?
does the database need to be updated by multiple entities concurrently?
do you need to support complex queries?
For a database with 700 entries, an in-memory sorted array loaded from a text file might well be appropriate. But I could also see the need for an embedded SQL database or maybe having the controller request data from the database over a network connection depending on what the various requirements (and resource limitations) are.
There isn't a specific point at which a database is worthwhile. Instead I usually ask the following questions:
Is the amount of data the application uses/creates growing?
Is the upper limit of this data growth unknown (or unclear)?
Will the application need to aggregate or filter this data?
Could there be future uses of the data that may not be obvious right now?
Is performance of data retrieval and/or storage important?
Are there (or could there be) multiple users of the application who share data?
If I answer 'Yes' to most of these questions I almost always choose a database (as opposed to other options such as XML/ini/CSV/Excel/text files or the filesystem).
Also, if the application will have many users who could be accessing the data concurrently, I'll lean towards a full database server (MySQL, SQl Server, Oracle etc).
But often in a single user (or small concurrency) situation, a local database such as SQLite cannot be beaten for portability and ease of deployment.
To add a negative: not suitable for real-time processing, due to non-deterministic latency. However, It would be quite ample for looking up and setting operating parameters, for instance during startup. I would not put database accesses on critical time paths.
You don't need a database if you have a few thousand rows in one or two tables to handle in a single user app (for the embedded point).
If it is for multiple users (concurrent access, locking) or the need of transactions you definitly should consider a database.
Handling complex datastructures in normalized tables and maintain integrety, or a huge amount of data would be another indication you should use a database.
It sounds like your application is running on a desktop computer and simply communicating to the embedded device.
As such using a database is much more feasible. Using one on an embedded platform is a much more complex issue.
On the desktop front I use a database when there is the need to store new information continuously and the need to extract that information in a relational way. What i don't use databases for is storing static information, information i read once at load and thats it. The exception is when the application has many users and there is the need to storage this information on a per user basis.
It sounds be to me like your collecting information from your embedded device, storing it somehow, then using it later to display via a GUI.
This is a good case for using a database, especially if you can architect the system such that there is a data collection daemon that manages the continuous communication with the embedded device. This app can then just write the data into the database. When the GUI is launched it can extract the data for display.
Using the database will also ease your GUI develop if you need to display different views, such as "show me all the entries between 2 dates". With a database you just ask it for the correct values to display with a proper SQL query and the GUI displays whatever comes back allowing you to decouple much of the "business logic" code from the GUI.
We are also facing a similar situation. We have set of data coming from different test setups and it is currently being dumped into excel sheets, processed using Perl or VBA.
We found out this method had lot of problems:
i. Managing data using excel sheets is quite cumbersome. After some time you have a whole lot of excel sheets and there is no easy way to retrieve required data from it.
ii. People start sending the excel sheets to and fro for comments and review through e-mails. E-Mail becomes the primary mode of managing the comments related to the data. These comments are lost at a later point of time and there is no way of retrieving it back.
iii. Multiple copies of the files get created and changes in one copy are not reflected in the other - there is no versioning.
This is for the same reasons we have decided to move to a database based solution and are currently working on it. Let me summaries what we are trying to do:
i. The database is in a central server accessible by PC in all the test setups.
ii. All the data goes into a temporary location (local hard disk in files) as soon as it is generated. From the files, it is pushed into database by a process running in the background (so even if there is a network problem, data will be present in the local files system).
iii. We have a web based application which allows users to log in and access data in the format they want. The portal will allow them to add comment, generate different kind of reports, share it with other users after review etc. It will also have the ability to export data into excel sheet, just in case you need to take it with you.
Let know if this can be better implemented.
"At what point is it worth using a database?"
If and when you've got data to manage ?