Sitecore media conversion tool eating storage space

I have a question regarding the media conversion tool for Sitecore.
With this module you can convert media items between a hard drive location and a Sitecore database, in either direction. But each time I convert some items, it takes up additional hard drive space.
So when I convert 3 GB to the hard drive it adds an additional 3 GB (which seems logical -> 6 GB total), but when I convert the items back to the blob format it adds another 3 GB (9 GB total) instead of overwriting the previous version in the database.
Is there a way to clean up the previous blobs or something? Right now it is using far too much hard drive space.
Thanks in advance.

Using "Clean Up Databases" should work, but if the size gets too large, as my client's blob table did, the clean up will fail due to either a SQL timeout or because SQL Server uses up all the available locks.
Another solution is to run a script to manually clean up the blobs table. We had this issue previously and Sitecore support was able to provide us with a script to do so:
DECLARE @UsableBlobs table(
    ID uniqueidentifier
);

INSERT INTO @UsableBlobs
SELECT convert(uniqueidentifier, [Value]) as EmpID
FROM [Fields]
WHERE [Value] != ''
  AND (FieldId = '{40E50ED9-BA07-4702-992E-A912738D32DC}' OR FieldId = '{DBBE7D99-1388-4357-BB34-AD71EDF18ED3}')

DELETE FROM [Blobs]
WHERE [BlobId] NOT IN (SELECT ID FROM @UsableBlobs)
This basically looks for blobs that are still in use and stores their IDs in a table variable. It then compares that list against the Blobs table and deletes the entries that aren't in it.
In our case, even this was bombing out due to the SQL Server locks problem, so I updated the delete statement to DELETE TOP (x) FROM [Blobs], where x is a batch size you feel is appropriate. I started at 1,000 and eventually went up to deleting 400,000 records at a time. (Yes, it was that large.)
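If it helps, here is a minimal sketch of that batched approach using Python with pyodbc (the connection string and batch size are placeholders, not anything Sitecore ships). It builds the same list of referenced blob IDs as the script above, then deletes orphans in chunks so each batch holds fewer locks:

import pyodbc

# Placeholder connection string; point it at the Sitecore database that holds the Blobs table.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;DATABASE=Sitecore_Master;Trusted_Connection=yes;",
    autocommit=True,  # each batch commits on its own, keeping transactions (and locks) small
)
cur = conn.cursor()

# Collect the blob IDs that are still referenced, same logic as the support script above.
cur.execute("""
IF OBJECT_ID('tempdb..#UsableBlobs') IS NOT NULL DROP TABLE #UsableBlobs;
SELECT CONVERT(uniqueidentifier, [Value]) AS ID
INTO #UsableBlobs
FROM [Fields]
WHERE [Value] != ''
  AND (FieldId = '{40E50ED9-BA07-4702-992E-A912738D32DC}'
       OR FieldId = '{DBBE7D99-1388-4357-BB34-AD71EDF18ED3}')
""")

# Delete orphaned blobs in batches until nothing is left to remove.
batch_size = 10000
while True:
    cur.execute(
        f"DELETE TOP ({batch_size}) FROM [Blobs] "
        "WHERE [BlobId] NOT IN (SELECT ID FROM #UsableBlobs)"
    )
    if cur.rowcount == 0:
        break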
So try the built-in "Clean Up Databases" option first and failing that, try to run the script to manually clean the table.

Related

C++ and SQLite DELETE query doesn't actually delete the value from the database file

I've come across this issue with SQLite and C++ and I can't find any answer to it.
Everything is working fine in SQLite and C++ (all queries, all outputs, all functions), but there is one question I can't find a solution to.
I create a database MyTest.db
I create a table test with an id and a name as fields
I enter 2 rows: id=1 name=Name1 and id=2 name=Name2
I delete the 2nd row
The data inside the table now shows only id=1 with name=Name1
When I open MyTest.db with notepad.exe, the values I deleted, such as id=2 name=Name2, are still inside the database file, even though they no longer show up in the query results for the table.
What I'd like to ask anyone who knows about this is: is there another step needed so the value is also removed from the database file, or is it a mistake in how I'm using SQLite's DELETE (which I doubt)?
It's like the database file keeps collecting all the trash inside it without ever removing DELETED values from its tables...
Any help or suggestion is much appreciated
If you use "PRAGMA secure_delete=ON;" then SQLite overwrites deleted content with zeros. See https://www.sqlite.org/pragma.html#pragma_secure_delete
Even with secure_delete=OFF, the deleted space will be reused (and overwritten) to store new content the next time you INSERT. SQLite does not leak disk space, nor is it necessary to VACUUM in order to reclaim space. But normally, deleted content is not overwritten as that uses extra CPU cycles and disk I/O.
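If you want to check this yourself, here is a small sketch using Python's built-in sqlite3 module (the file and sentinel names are just placeholders): insert a recognizable value, delete it with secure_delete ON, and then search the raw file for its bytes.

import sqlite3

db = "secure_demo.db"
conn = sqlite3.connect(db)
conn.execute("PRAGMA secure_delete = ON")   # overwrite deleted content with zeros
conn.execute("CREATE TABLE IF NOT EXISTS test (id INTEGER, name TEXT)")
conn.execute("INSERT INTO test VALUES (2, 'Name2_SENTINEL')")
conn.commit()

conn.execute("DELETE FROM test WHERE id = 2")
conn.commit()
conn.close()

# With secure_delete ON the sentinel bytes should no longer appear in the raw file.
with open(db, "rb") as f:
    print(b"Name2_SENTINEL" in f.read())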
Basically, most databases only mark rows as active or inactive; they don't remove the actual data from the file immediately. Doing so would be a huge waste of time and resources, since that part of the file can simply be reused.
Since your queries show that the row no longer appears in the results, is this actually a problem? You can always run a VACUUM on the database if you want to reclaim the space, but I would just let the database engine handle everything by itself. It won't "keep collecting all the trash inside it", don't worry.
If you see that the file size is growing and the space is not reused, then you can issue vacuums from time to time.
You can also test this by just inserting other rows after deleting old ones. The engine should reuse those parts of the file at some point.
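Here is a rough way to run that test, as a sketch with Python's sqlite3 module (the file name is arbitrary and exact sizes depend on page size and platform):

import os
import sqlite3

db = "reuse_demo.db"
conn = sqlite3.connect(db)
conn.execute("CREATE TABLE IF NOT EXISTS test (id INTEGER, name TEXT)")

rows = [(i, "x" * 500) for i in range(10000)]
conn.executemany("INSERT INTO test VALUES (?, ?)", rows)
conn.commit()
print("after insert:", os.path.getsize(db))

conn.execute("DELETE FROM test")
conn.commit()
print("after delete:", os.path.getsize(db))    # roughly unchanged; pages sit on the free-list

conn.executemany("INSERT INTO test VALUES (?, ?)", rows)
conn.commit()
print("after re-insert:", os.path.getsize(db)) # free pages get reused, so little or no growth

conn.execute("DELETE FROM test")
conn.commit()
conn.execute("VACUUM")                         # rebuilds the file and returns the space
print("after vacuum:", os.path.getsize(db))
conn.close()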

What is the idiomatic way to perform a migration on a DynamoDB table

Suppose I have a DynamoDB table named Person, which has two fields: name (string) and age (int). Let's assume it has a TB worth of data and experiences a small amount of read throughput but a ton of write throughput. Now I want to add a new field called Phone (string). What is the best way to go about moving the data from one table to another?
Note: Dynamo doesn't let you rename tables, and fields cannot be null.
Here are the options I think I have:
Dump the table to .csv and run a script (probably overnight, since it's a TB worth of data) to add a default phone number to each row. (Not ideal: it would also lose any new data written to the old table, unless I take the service offline to perform the migration, which is not an option in this case.)
Use the Scan API call. (Scan will read all values and then consume significant write throughput on the new table to insert all of the old data into it.)
How can I perform a DynamoDB migration on a large table without significant data loss?
You don't need to do anything. This is NoSQL, not SQL (i.e. there is no idiomatic way to do this, because you normally don't need migrations in NoSQL).
Just start writing entries with the additional attribute.
Records that were written before will simply come back without this attribute. What you normally do is use a default value when it is missing.
If you want to backfill, just go through and read each item, then put it back with the additional field. You can do this in one run via a scan, or again do it lazily when accessing the data.
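A rough sketch of that backfill with boto3 (assuming the table is named Person, its partition key is name, and using a placeholder default of "unknown"; adjust these to your actual schema):

import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("Person")
default_phone = "unknown"   # placeholder default for existing items

scan_kwargs = {}
while True:
    page = table.scan(**scan_kwargs)
    for item in page["Items"]:
        if "Phone" in item:
            continue
        try:
            # Only set the default if nothing has written a Phone in the meantime.
            table.update_item(
                Key={"name": item["name"]},
                UpdateExpression="SET Phone = :p",
                ConditionExpression="attribute_not_exists(Phone)",
                ExpressionAttributeValues={":p": default_phone},
            )
        except ClientError as err:
            # Another writer already set Phone; nothing to do.
            if err.response["Error"]["Code"] != "ConditionalCheckFailedException":
                raise
    last_key = page.get("LastEvaluatedKey")
    if not last_key:
        break
    scan_kwargs["ExclusiveStartKey"] = last_key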

How can I determine how many bytes of images we have in the Sitecore Media Library?

We're looking at options to store our site's media (primarily images at this point) on some sort of cloud service. However, we'd like to get an idea of how much it might cost.
As part of this we need the transfer volume (which wasn't enabled in IIS logging, but is now) as well as the total amount of image data we'd be storing on the server.
I've written a LINQ query against the database, but I'm wondering if there's a better way to handle this.
Checking the App_Data\MediaCache directory, as the accepted answer to "Sitecore Database and App_Data Size" suggests, shows a number of configuration and other documents in it as well, so it might not be ideal.
Any suggestions? Thanks!
If all your media is stored in the db, you can get a pretty good approximation from looking at the Blobs table on your master database.
EXEC sp_spaceused N'dbo.Blobs';
If you've got a mix, you'll have to look at that plus whatever location is configured in the Media.FileFolder Sitecore setting (which defaults to /App_Data/MediaFiles).
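For the mixed case, something like this rough sketch can add the two together (Python with pyodbc and os.walk; the connection string and media folder path are placeholders you'd need to adjust):

import os
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;DATABASE=Sitecore_Master;Trusted_Connection=yes;"
)
cur = conn.cursor()

# Total bytes held in the Blobs table.
cur.execute("SELECT COALESCE(SUM(DATALENGTH([Data])), 0) FROM [Blobs]")
blob_bytes = cur.fetchone()[0]

# Total bytes on disk under the Media.FileFolder location (default /App_Data/MediaFiles).
media_folder = r"C:\inetpub\wwwroot\MySite\App_Data\MediaFiles"
disk_bytes = sum(
    os.path.getsize(os.path.join(root, name))
    for root, _, names in os.walk(media_folder)
    for name in names
)

print("blobs: %.2f GB, files: %.2f GB" % (blob_bytes / 1024**3, disk_bytes / 1024**3))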
Here is an addition to what @ddysart suggested. You can run the following query to find the media blobs over a certain size:
SELECT DATALENGTH([Data]) FROM [Sitecore_DatabaseName].[dbo].[Blobs]
WHERE DATALENGTH([Data]) > <size_in_bytes>
Sitecore stores the size of the Media file in bytes in the Size field of the Media Item definition when you create or update a media item, regardless of whether the Item is stored in the database or on disk. You can SUM these values to give you the total.
Media based on the Unversioned templates is stored in the SharedFields table:
SELECT SUM(CAST([Value] as INT))
FROM [SharedFields]
WHERE FieldId = '6954B7C7-2487-423F-8600-436CB3B6DC0E'
Versioned media is stored in the VersionedFields table. You are probably only interested in the latest version of the uploaded media, to reduce the amount stored in your CDN/cloud:
SELECT SUM(CAST(vf.[Value] as INT))
FROM [VersionedFields] AS vf
INNER JOIN (
SELECT [ItemId], MAX([Version]) AS [Version]
FROM [VersionedFields]
GROUP BY [ItemId]) AS [vf1]
ON vf.ItemId = vf1.ItemId
AND vf.[Version] = vf1.[Version]
WHERE vf.FieldId = '5BE6C122-84C9-4661-A0C9-3718909C8DAD'
Note that the returned size is in bytes.
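If you want one combined number, here is a small sketch (Python with pyodbc; the connection string is a placeholder) that runs both queries above and reports the total in gigabytes:

import pyodbc

SHARED_SIZE_FIELD = "6954B7C7-2487-423F-8600-436CB3B6DC0E"
VERSIONED_SIZE_FIELD = "5BE6C122-84C9-4661-A0C9-3718909C8DAD"

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;DATABASE=Sitecore_Master;Trusted_Connection=yes;"
)
cur = conn.cursor()

# Unversioned media: the Size field lives in SharedFields.
cur.execute(
    "SELECT COALESCE(SUM(CAST([Value] AS BIGINT)), 0) FROM [SharedFields] WHERE FieldId = ?",
    SHARED_SIZE_FIELD,
)
shared_bytes = cur.fetchone()[0]

# Versioned media: latest version only, same join as the query above.
cur.execute("""
SELECT COALESCE(SUM(CAST(vf.[Value] AS BIGINT)), 0)
FROM [VersionedFields] AS vf
INNER JOIN (
    SELECT [ItemId], MAX([Version]) AS [Version]
    FROM [VersionedFields]
    GROUP BY [ItemId]
) AS vf1 ON vf.ItemId = vf1.ItemId AND vf.[Version] = vf1.[Version]
WHERE vf.FieldId = ?
""", VERSIONED_SIZE_FIELD)
versioned_bytes = cur.fetchone()[0]

print("total media: %.2f GB" % ((shared_bytes + versioned_bytes) / 1024**3))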

Sqlite3/C++ executes DELETE statement without changing the db size

How is it possible? I have a simple C++ app that is using SQLite3 to INSERT/DELETE records.
I use a single database with a single table inside. When I store some data in the db, it works, and the size of my.db increases as expected.
The problem is with DELETE: the file size does not shrink. If I do:
sqlite3 my.db
sqlite> select count(*) from mytable;
it returns 0, which is fine, but if I do ls -l on the folder containing my.db, the size is unchanged.
Can anybody explain?
When you execute a DELETE query, SQLite does not actually delete the records and rearrange the data; that would take too much time. Instead, it just marks the deleted records and ignores them from then on.
If you actually want to reduce the file size, execute the VACUUM command. There is also an option for auto-vacuuming. See http://www.sqlite.org/lang_vacuum.html.
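For example, with Python's built-in sqlite3 module (a sketch; the file name is arbitrary) you can run VACUUM directly, or enable auto-vacuum; note that switching auto_vacuum on an existing database only takes effect after a VACUUM:

import sqlite3

conn = sqlite3.connect("my.db")

# One-off shrink: rebuild the file and drop the free pages.
conn.execute("VACUUM")

# Or have SQLite give space back automatically from now on; changing
# auto_vacuum on an existing database only takes effect after a VACUUM.
conn.execute("PRAGMA auto_vacuum = FULL")
conn.execute("VACUUM")
print(conn.execute("PRAGMA auto_vacuum").fetchone()[0])   # 1 means FULL
conn.close()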
The scenario is listed in the SQLite Frequently Asked Questions:
(12) I deleted a lot of data but the database file did not get any
smaller. Is this a bug?
No. When you delete information from an SQLite database, the unused disk space is added to an internal "free-list" and is reused
the next time you insert data. The disk space is not lost. But neither
is it returned to the operating system.
If you delete a lot of data and want to shrink the database file, run the VACUUM command. VACUUM will reconstruct the database from
scratch. This will leave the database with an empty free-list and a
file that is minimal in size. Note, however, that the VACUUM can take
some time to run (around a half second per megabyte on the Linux box
where SQLite is developed) and it can use up to twice as much
temporary disk space as the original file while it is running.
As of SQLite version 3.1, an alternative to using the VACUUM command is auto-vacuum mode, enabled using the auto_vacuum pragma.
The documentation is your friend; please use it.
Also from the documentation:
When information is deleted in the database, and a btree page becomes
empty, it isn't removed from the database file, but is instead marked
as 'free' for future use. When a new page is needed, SQLite will use
one of these free pages before increasing the database size. This
results in database fragmentation, where the file size increases
beyond the size required to store its data, and the data itself
becomes disordered in the file.
Another side effect of a dynamic database is table fragmentation. The
pages containing the data of an individual table can become spread
over the database file, requiring longer for it to load. This can
appreciably slow database speed because of file system behavior.
Compacting fixes both of these problems.
The easiest way to remove empty pages is to use the SQLite command
VACUUM. This can be done from within SQLite library calls or the
sqlite utility.

SQLite - pre allocating database size

Is there a way to pre-allocate my SQLite database to a certain size? Currently I'm adding and deleting a number of records and would like to avoid this overhead at creation time.
The fastest way to do this is with the zeroblob function:
Example:
Y:> sqlite3 large.sqlite
SQLite version 3.7.4
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> create table large (a);
sqlite> insert into large values (zeroblob(1024*1024));
sqlite> drop table large;
sqlite> .q
Y:> dir large.sqlite
Volume in drive Y is Personal
Volume Serial Number is 365D-6110
Directory of Y:\
01/27/2011 12:10 PM 1,054,720 large.sqlite
Note: As Kyle properly indicates in his comment:
There is a limit to how big each blob can be, so you may need to insert multiple blobs if you expect your database to be larger than ~1GB.
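To get past that limit you can insert several smaller zero-filled blobs and then drop the table. A sketch with Python's built-in sqlite3 module (the target and chunk sizes are arbitrary, and auto_vacuum must be left at its default of NONE for the freed pages to stay in the file):

import os
import sqlite3

target_bytes = 2 * 1024**3        # desired pre-allocated size, e.g. ~2 GB
chunk_bytes = 256 * 1024**2       # well under SQLite's ~1 GB blob limit

conn = sqlite3.connect("large.sqlite")
conn.execute("CREATE TABLE IF NOT EXISTS large (a BLOB)")
for _ in range(target_bytes // chunk_bytes):
    conn.execute("INSERT INTO large VALUES (zeroblob(?))", (chunk_bytes,))
conn.commit()

conn.execute("DROP TABLE large")
conn.commit()
conn.close()

print(os.path.getsize("large.sqlite"))   # stays roughly at target_bytes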
There is a hack: insert a bunch of data into the database until the database size is what you want, and then delete that data. This works because:
"When an object (table, index, or
trigger) is dropped from the database,
it leaves behind empty space. This
empty space will be reused the next
time new information is added to the
database. But in the meantime, the
database file might be larger than
strictly necessary."
Naturally, this isn't the most reliable method. (Also, you will need to make sure that auto_vacuum is disabled for this to work.) You can learn more here: http://www.sqlite.org/lang_vacuum.html