SQLite - pre-allocating database size - C++

Is there a way to pre-allocate my SQLite database to a certain size? Currently I'm adding and deleting a number of records and would like to avoid this overhead at create time.

The fastest way to do this is with the zeroblob function:
Example:
Y:> sqlite3 large.sqlite
SQLite version 3.7.4
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> create table large (a);
sqlite> insert into large values (zeroblob(1024*1024));
sqlite> drop table large;
sqlite> .q
Y:> dir large.sqlite
Volume in drive Y is Personal
Volume Serial Number is 365D-6110
Directory of Y:\
01/27/2011 12:10 PM 1,054,720 large.sqlite
Note: As Kyle correctly points out in his comment:
There is a limit to how big each blob can be, so you may need to insert multiple blobs if you expect your database to be larger than ~1GB.
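Since the question is tagged C++, here is a minimal sketch of the same trick through the SQLite C API, inserting the zero-filled data in several chunks to stay under the per-blob limit. The file name, chunk size and chunk count are placeholders; adjust them to your target size, and note that error handling is kept minimal.

#include <sqlite3.h>
#include <cstdio>

int main() {
    sqlite3* db = nullptr;
    if (sqlite3_open("large.sqlite", &db) != SQLITE_OK) {
        std::fprintf(stderr, "open failed: %s\n", sqlite3_errmsg(db));
        return 1;
    }

    char* err = nullptr;
    sqlite3_exec(db, "BEGIN; CREATE TABLE filler(a);", nullptr, nullptr, &err);

    // Each zero-blob stays well under the ~1 GB per-blob limit; several of
    // them together grow the file to the desired size (here 4 x 256 MiB).
    for (int i = 0; i < 4; ++i) {
        if (sqlite3_exec(db, "INSERT INTO filler VALUES (zeroblob(256*1024*1024));",
                         nullptr, nullptr, &err) != SQLITE_OK) {
            std::fprintf(stderr, "insert failed: %s\n", err);
            sqlite3_free(err);
            break;
        }
    }

    // Dropping the scratch table frees the pages inside the file but, with
    // auto_vacuum off, does not shrink the file, so the space stays
    // pre-allocated for future inserts.
    sqlite3_exec(db, "DROP TABLE filler; COMMIT;", nullptr, nullptr, &err);

    sqlite3_close(db);
    return 0;
}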

There is a hack: insert a bunch of data into the database until the database size is what you want, then delete the data. This works because:
"When an object (table, index, or trigger) is dropped from the database, it leaves behind empty space. This empty space will be reused the next time new information is added to the database. But in the meantime, the database file might be larger than strictly necessary."
Naturally, this isn't the most reliable method. (Also, you will need to make sure that auto_vacuum is disabled for this to work.) You can learn more here: http://www.sqlite.org/lang_vacuum.html
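If you rely on this hack from C++, it may be worth checking the preconditions programmatically. A small sketch (the my.db file name is a placeholder) that reads the auto_vacuum and freelist_count pragmas through the C API:

#include <sqlite3.h>
#include <cstdio>

// Runs a single-value PRAGMA query and returns the integer it reports.
static int query_int_pragma(sqlite3* db, const char* sql) {
    sqlite3_stmt* stmt = nullptr;
    int value = -1;
    if (sqlite3_prepare_v2(db, sql, -1, &stmt, nullptr) == SQLITE_OK &&
        sqlite3_step(stmt) == SQLITE_ROW) {
        value = sqlite3_column_int(stmt, 0);
    }
    sqlite3_finalize(stmt);
    return value;
}

int main() {
    sqlite3* db = nullptr;
    if (sqlite3_open("my.db", &db) != SQLITE_OK) return 1;  // placeholder file name

    // 0 = NONE, 1 = FULL, 2 = INCREMENTAL; the trick only holds with 0.
    std::printf("auto_vacuum    = %d\n", query_int_pragma(db, "PRAGMA auto_vacuum;"));
    // Number of pages already freed and waiting to be reused by new inserts.
    std::printf("freelist_count = %d\n", query_int_pragma(db, "PRAGMA freelist_count;"));

    sqlite3_close(db);
    return 0;
}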

Related

How to take back storage from my postgres db

I have a table named duplicates_duplicatebackendentry_documents that has a size of 49 GB. This table has two indexes of 25 GB each, and two constraints that are also 25 GB each.
The table is used by the duplicates module in a Django app I deployed. I have now turned off the module. I am unable to run VACUUM FULL because I do not have the space necessary to run it. Deleting the table returns the storage (I tested this in a dev env), but is there a way I can delete the bloat while keeping the table, its constraints and indexes? I just want to empty the bloat along with all the contents.
"I just want to empty the bloat along with all the contents."
The canonical way to do that is
TRUNCATE duplicates_duplicatebackendentry_documents;
which will render the table and all its indexes empty.
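If you would rather issue this from application code than from psql, a minimal libpq sketch could look like the following; the connection string is a placeholder. Unlike DELETE, TRUNCATE gives the table's disk space back to the operating system right away.

#include <libpq-fe.h>
#include <cstdio>

int main() {
    // Placeholder connection string; adjust for your environment.
    PGconn* conn = PQconnectdb("dbname=mydb user=myuser");
    if (PQstatus(conn) != CONNECTION_OK) {
        std::fprintf(stderr, "connect failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    // TRUNCATE empties the table and its indexes in one statement.
    PGresult* res = PQexec(conn, "TRUNCATE duplicates_duplicatebackendentry_documents;");
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
        std::fprintf(stderr, "truncate failed: %s", PQerrorMessage(conn));
    PQclear(res);
    PQfinish(conn);
    return 0;
}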

Simultaneously `CREATE TABLE LIKE` in AWS Redshift and change a few columns' default values

Workflow
In a data import workflow, we create a staging table using a CREATE TABLE LIKE statement:
CREATE TABLE abc_staging (LIKE abc INCLUDING DEFAULTS);
Then, we run COPY to import CSV data from S3 into the staging table.
The data in the CSV is incomplete. Namely, the fields partition_0, partition_1, partition_2 are missing from the CSV file; we fill them in like this:
UPDATE
abc_staging
SET
partition_0 = 'BUZINGA',
partition_1 = '2018',
partition_2 = '07';
Problem
This query seems expensive (it often takes ≈20 minutes), and I would like to avoid it. That would be possible if I could configure DEFAULT values on these columns when creating the abc_staging table. I did not find any documented way to do that, nor any explicit indication that it is impossible. So perhaps it is still possible and I am just missing how to do it?
Alternative solutions I considered
Drop these columns and add them again
That would be easy to do, but ALTER TABLE ADD COLUMN only adds columns to the end of the column list. In the abc table they are not at the end of the column list, which means the schemas of abc and abc_staging would not match. That breaks the ALTER TABLE APPEND operation that I use to move data from the staging table to the main table.
Note: reordering the columns in the abc table to alleviate this difficulty would require recreating the huge abc table, which I'd like to avoid.
Generate the staging table creation script programmatically with proper columns and get rid of CREATE TABLE LIKE
I will have to do that if I do not find any better solution.
Fill in the partition_* fields in the original CSV file
That is possible but will break backwards compatibility (I already have perhaps hundreds of thousands of files in there). Harder, but manageable.
As you are finding, you are not creating a table exactly LIKE the original, and Redshift doesn't let you ALTER a column's default value. Your proposed path is likely the best: define the staging table explicitly.
Since I don't know your exact situation, other paths might be better, so let me explore a bit. First off, when you UPDATE the staging table you are in fact reading every row in the table, invalidating that row, and writing a new row (with the new information) at the end of the table. This leads to a lot of invalidated rows. When you then do ALTER TABLE APPEND, all these invalidated rows are carried over to your main table unless you vacuum the staging table beforehand. So you may not be getting the value you want out of ALTER TABLE APPEND.
You may be better off INSERTing the data into your main table with an ORDER BY clause. This is slower than the ALTER TABLE APPEND statement, but you won't have to do the UPDATE, so the overall process could be faster, and you could come out further ahead because of the reduced need to VACUUM. Your situation will determine whether this is better or not. Just another option for your list.
I am curious about your UPDATE speed. It just needs to read and then write every row in the staging table; unless the staging table is very large it doesn't seem like this should take 20 minutes. Other activity could be creating the slowdown. Just curious.
Another option would be to change your main table to have these 3 columns last (yes, this would be some work). This way you could add the columns to the staging table and things would line up for ALTER TABLE APPEND. Just another possibility.
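To make the "define the staging table explicitly" path concrete, here is a rough sketch driven from C++ over libpq (Redshift speaks the PostgreSQL wire protocol). The connection string and every column other than partition_0/1/2 are hypothetical stand-ins for the real abc schema; the default values are taken from the question.

#include <libpq-fe.h>
#include <cstdio>

int main() {
    // Placeholder DSN; point it at your Redshift cluster.
    PGconn* conn = PQconnectdb("host=my-cluster.example.com dbname=mydb user=myuser");
    if (PQstatus(conn) != CONNECTION_OK) {
        std::fprintf(stderr, "connect failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    // Spell the columns out instead of CREATE TABLE LIKE so DEFAULTs can be
    // attached; keep the column order identical to abc so ALTER TABLE APPEND
    // still lines up. Columns other than partition_* are hypothetical.
    const char* ddl =
        "CREATE TABLE abc_staging ("
        "  id BIGINT,"
        "  payload VARCHAR(MAX),"
        "  partition_0 VARCHAR(32) DEFAULT 'BUZINGA',"
        "  partition_1 VARCHAR(8) DEFAULT '2018',"
        "  partition_2 VARCHAR(8) DEFAULT '07'"
        ");";

    PGresult* res = PQexec(conn, ddl);
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
        std::fprintf(stderr, "create failed: %s", PQerrorMessage(conn));
    PQclear(res);
    PQfinish(conn);
    return 0;
}

With the defaults in place, the partition_* columns can simply be omitted from the COPY column list and Redshift fills them from their DEFAULT expressions, so the expensive UPDATE should no longer be needed.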
The easiest solution turned out to be adding the necessary partition_* fields to the source CSV files.
After employing that change and removing the UPDATE from the importer pipeline, the performance has greatly improved. Imports now take ≈10 minutes each in total (that encompasses COPY, DELETE duplicates and ALTER TABLE APPEND).
Disk space is no longer climbing up to 100%.
Thanks everyone for help!

C++ and SQLite DELETE query doesn't actually delete the value from the database file

I've come across this issue with SQLite and C++ and I can't find any answer to it.
Everything is working fine in SQLite and C++ (all queries, all outputs, all functions), but I have one question I can't find a solution to.
I create a database MyTest.db.
I create a table test with an id and a name as fields.
I insert two rows: id=1 name=Name1 and id=2 name=Name2.
I delete the second row.
The data inside the table now shows only id=1 with name=Name1.
When I open MyTest.db with notepad.exe, the values I deleted (id=2 name=Name2) are still inside the database file, even though they no longer appear in the query results for the table.
What I would like to ask anyone who knows about this is:
Is there some other step needed so that the value is also removed from the database file, or is it a mistake in how I use SQLite's DELETE (which I doubt)?
It's like the database file keeps collecting trash without ever removing DELETED values from its tables...
Any help or suggestion is much appreciated.
If you use "PRAGMA secure_delete=ON;" then SQLite overwrites deleted content with zeros. See https://www.sqlite.org/pragma.html#pragma_secure_delete
Even with secure_delete=OFF, the deleted space will be reused (and overwritten) to store new content the next time you INSERT. SQLite does not leak disk space, nor is it necessary to VACUUM in order to reclaim space. But normally, deleted content is not overwritten as that uses extra CPU cycles and disk I/O.
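A minimal sketch of applying that pragma from the C++ program described in the question; the table and database names are taken from the question, the rest is assumed:

#include <sqlite3.h>
#include <cstdio>

int main() {
    sqlite3* db = nullptr;
    if (sqlite3_open("MyTest.db", &db) != SQLITE_OK) return 1;

    char* err = nullptr;
    // Per-connection setting: deleted content is overwritten with zeros,
    // at the cost of some extra I/O.
    sqlite3_exec(db, "PRAGMA secure_delete=ON;", nullptr, nullptr, &err);

    // After this DELETE, the bytes for id=2/Name2 will no longer be visible
    // when the file is opened in a hex viewer or notepad.exe.
    if (sqlite3_exec(db, "DELETE FROM test WHERE id = 2;", nullptr, nullptr, &err) != SQLITE_OK) {
        std::fprintf(stderr, "delete failed: %s\n", err);
        sqlite3_free(err);
    }
    sqlite3_close(db);
    return 0;
}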
Basically, all databases only mark rows as active or inactive; they don't remove the actual data from the file immediately. That would be a huge waste of time and resources, since that part of the file can simply be reused.
Since your queries show that the deleted row no longer appears in the results, is this actually a problem? You can always run VACUUM on the database if you want to reclaim the space, but I would just let the database engine handle everything by itself. It won't "keep collecting all the trash inside it", don't worry.
If you see that the file size is growing and the space is not reused, then you can issue vacuums from time to time.
You can also test this by just inserting other rows after deleting old ones. The engine should reuse those parts of the file at some point.

Sitecore media conversion tool eating storage space

I have a question regarding the media conversion tool for Sitecore.
With this module you can convert media items between a hard drive location and a Sitecore database, and vice versa. But each time I convert some items, it keeps taking additional hard drive space.
So when I convert 3 GB to the hard drive it adds an additional 3 GB (which seems logical: 6 GB total), but when I convert them back to the blob format it adds another 3 GB (9 GB total) instead of overwriting the previous version in the database.
Is there a way to clean up the previous blobs or something? Because right now it is using too much hard drive space.
Thanks in advance.
Using "Clean Up Databases" should work, but if the size gets too large, as my client's blob table did, the clean up will fail due to either a SQL timeout or because SQL Server uses up all the available locks.
Another solution is to run a script to manually clean up the blobs table. We had this issue previously and Sitecore support was able to provide us with a script to do so:
DECLARE @UsableBlobs table(
    ID uniqueidentifier
);

INSERT INTO @UsableBlobs
SELECT convert(uniqueidentifier, [Value]) AS EmpID
FROM [Fields]
WHERE [Value] != ''
  AND (FieldId = '{40E50ED9-BA07-4702-992E-A912738D32DC}' OR FieldId = '{DBBE7D99-1388-4357-BB34-AD71EDF18ED3}');

DELETE FROM [Blobs]
WHERE [BlobId] NOT IN (SELECT ID FROM @UsableBlobs);
This basically looks for blobs that are still in use and stores their IDs in a table variable. It then compares the entries in that table variable to the Blobs table and deletes the ones that aren't in it.
In our case, even this was bombing out due to the SQL Server locks problem, so I updated the delete statement to delete top (x) from [Blobs], where x is a number you feel is appropriate. I started at 1000 and eventually went up to deleting 400,000 records at a time. (Yes, it was that large.)
So try the built-in "Clean Up Databases" option first and failing that, try to run the script to manually clean the table.

Sqlite3/C++ executes DELETE statement without changing the db size

How is this possible? I have a simple C++ app that uses SQLite3 to INSERT/DELETE records.
I use a single database with a single table inside it. When I store some data in the db, the size of my.db naturally increases.
But there is a problem with DELETE: the file does not shrink. If I do:
sqlite3 my.db
sqlite> select count(*) from mytable;
it returns 0, which is expected, but if I do ls -l on the folder containing my.db, the size is the same.
Can anybody explain?
When you execute a DELETE query, SQLite does not actually delete the records and rearrange the data; that would take too much time. Instead, it just marks the records as deleted and ignores them from then on.
If you actually want to reduce the file size, execute the VACUUM command. There is also an auto-vacuum option. See http://www.sqlite.org/lang_vacuum.html.
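For completeness, a small sketch of issuing VACUUM from the C++ application after the deletes; the database and table names follow the question, everything else is assumed:

#include <sqlite3.h>
#include <cstdio>

int main() {
    sqlite3* db = nullptr;
    if (sqlite3_open("my.db", &db) != SQLITE_OK) return 1;

    char* err = nullptr;
    sqlite3_exec(db, "DELETE FROM mytable;", nullptr, nullptr, &err);

    // VACUUM rebuilds the database into a minimal file; it cannot run inside
    // an open transaction and may need up to twice the file size in
    // temporary space while it works.
    if (sqlite3_exec(db, "VACUUM;", nullptr, nullptr, &err) != SQLITE_OK) {
        std::fprintf(stderr, "vacuum failed: %s\n", err);
        sqlite3_free(err);
    }
    sqlite3_close(db);
    return 0;
}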
The scenario is listed in the SQLite Frequently Asked Questions:
(12) I deleted a lot of data but the database file did not get any smaller. Is this a bug?
No. When you delete information from an SQLite database, the unused disk space is added to an internal "free-list" and is reused the next time you insert data. The disk space is not lost. But neither is it returned to the operating system.
If you delete a lot of data and want to shrink the database file, run the VACUUM command. VACUUM will reconstruct the database from scratch. This will leave the database with an empty free-list and a file that is minimal in size. Note, however, that the VACUUM can take some time to run (around a half second per megabyte on the Linux box where SQLite is developed) and it can use up to twice as much temporary disk space as the original file while it is running.
As of SQLite version 3.1, an alternative to using the VACUUM command is auto-vacuum mode, enabled using the auto_vacuum pragma.
The documentation is your friend; please use it.
Also from the documentation:
When information is deleted in the database, and a btree page becomes empty, it isn't removed from the database file, but is instead marked as 'free' for future use. When a new page is needed, SQLite will use one of these free pages before increasing the database size. This results in database fragmentation, where the file size increases beyond the size required to store its data, and the data itself becomes disordered in the file.
Another side effect of a dynamic database is table fragmentation. The pages containing the data of an individual table can become spread over the database file, requiring longer for it to load. This can appreciably slow database speed because of file system behavior.
Compacting fixes both of these problems.
The easiest way to remove empty pages is to use the SQLite command VACUUM. This can be done from within SQLite library calls or the sqlite utility.