I have a table in Greenplum from which data is deleted daily using a simple DELETE statement. However, the size of the table is not decreasing. Is there some way to reclaim the space freed by the deleted rows so that the size of the table is reduced?
Greenplum Database at its core uses much of the same code as PostgreSQL. Therefore, the command you want is VACUUM. From the docs at http://gpdb.docs.gopivotal.com/4300/pdf/GPDB43RefGuide.pdf:
VACUUM reclaims storage occupied by deleted tuples. In normal Greenplum Database operation, tuples that are deleted or obsoleted by an update are not physically removed from their table; they remain present on disk until a VACUUM is done. Therefore it is necessary to do VACUUM periodically, especially on frequently-updated tables.
Also, if you are altering a significant number of rows, you may want to use VACUUM ANALYZE so that the table's statistics are updated for better query planning.
I have been working with AWS Redshift and am curious about which full-reload data loading method is more performant.
Approach 1 (Using Truncate):
Truncate the existing table
Load the data using Insert Into Select statement
Approach 2 (Using Drop and Create):
Drop the existing table
Load the data using Create Table As Select statement
We have been using both in our ETL, but I am interested in understanding what's happening behind the scenes on the AWS side.
In my opinion, DROP plus CREATE TABLE AS should be more performant, as it avoids the overhead of scanning/handling the table's existing data blocks that an INSERT INTO statement requires.
Moreover, truncate in AWS Redshift does not reseed identity columns - Redshift Truncate table and reset Identity?
Please share your thoughts.
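For illustration only, the two reload patterns can be sketched against SQLite (an assumption made for runnability; SQLite has no TRUNCATE, so DELETE without a WHERE clause stands in, and none of Redshift's block-level behavior is reproduced):

```python
import sqlite3

# Sketch of the two full-reload patterns, using SQLite as a stand-in.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (id INTEGER, val TEXT)")
conn.executemany("INSERT INTO source VALUES (?, ?)",
                 [(i, "v") for i in range(1000)])

# Approach 1: truncate-style reload (empty the table, then repopulate)
conn.execute("CREATE TABLE target1 AS SELECT * FROM source WHERE 0")
conn.execute("DELETE FROM target1")                       # TRUNCATE stand-in
conn.execute("INSERT INTO target1 SELECT * FROM source")  # reload

# Approach 2: drop-and-recreate reload
conn.execute("DROP TABLE IF EXISTS target2")
conn.execute("CREATE TABLE target2 AS SELECT * FROM source")

print(conn.execute("SELECT COUNT(*) FROM target1").fetchone()[0],
      conn.execute("SELECT COUNT(*) FROM target2").fetchone()[0])
```

In Redshift itself the corresponding statements would be TRUNCATE plus INSERT INTO ... SELECT, versus DROP TABLE plus CREATE TABLE ... AS SELECT.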
Redshift operates on 1MB blocks as the base unit of storage and coherency. When changes are made to a table, it is these blocks that are "published" for all to see when the changes are committed. A table is just a list (data structure) of the block ids that compose it, and there can be many versions of a table in flight at any time (if it is being changed while others are viewing it).
For the sake of this question, let's assume that the table in question is large (contains a lot of data), which I expect is true. These two statements end up performing a common action - unlinking and freeing all the blocks in the table. The blocks are where all the data lives, so you'd think the speed of the two would be the same, and on idle systems they are close. Both automatically commit their results, so the command doesn't complete until the work is done. In this idle-system comparison I've seen DROP run faster, but then you need to CREATE the table again, which takes time to recreate the table's data structure - though this can be done inside a transaction block, so should we include the COMMIT? The bottom line is that on an idle system these two approaches are quite close in runtime, and when I last measured them for a client the DROP approach was a bit faster. I would advise you to read on before making your decision.
However, in the real world Redshift clusters are rarely idle, and under load these two statements can behave quite differently. DROP requires exclusive control over the table, since it does not run inside a transaction block. All other uses of the table must be closed (committed or rolled back) before DROP can execute. So if you perform this DROP/recreate procedure on a table others are using, the DROP statement will be blocked until all those uses complete, which can take an indeterminate amount of time. For ETL processing on "hidden" or "unpublished" tables the DROP/recreate method can work, but you need to be very careful about which other sessions are accessing the table in question.
TRUNCATE does run inside a transaction but performs a commit upon completion. This means that it won't be blocked by others working with the table. It's just that one version of the table is full (for those who were looking at it before TRUNCATE ran) and one version is completely empty. The data structure of the table has a version for each session that has it open, and each session sees the blocks (or lack of blocks) that correspond to its version. I suspect that it is managing these data structures and propagating these changes through the commit queue that slows TRUNCATE down slightly - bookkeeping. The upside of this bookkeeping is that TRUNCATE will not be blocked by other sessions reading the table.
The deciding factor in choosing between these approaches is often not performance but which one has the locking and coherency behavior that works in your solution.
Situation
I'm using multiple storage databases as attachments to one central "manager" DB.
The storage tables share one pseudo-AUTOINCREMENT index across all storage databases.
I need to iterate over the shared index frequently.
The final number and names of storage tables are not known on storage DB creation.
On some signal, a then-given range of entries will be deleted.
It is vital that no insertion fails and no entry gets deleted before its signal.
Power outages are possible; data loss in that case is hardly, if ever, tolerable. Any solution that may cause data loss (in-memory databases etc.) is not viable.
Database access is currently controlled using strands. This takes care of sequential access.
Due to the high frequency of INSERT transactions, I must trigger WAL checkpoints manually. I've seen journals of up to 2GB in size otherwise.
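The manual WAL checkpointing mentioned above can be sketched as follows (a minimal, self-contained example; table and file names are invented for illustration):

```python
import os
import sqlite3
import tempfile

# Demonstrates a manual WAL checkpoint: after a burst of inserts the
# -wal file is non-empty; wal_checkpoint(TRUNCATE) copies it back into
# the main database file and resets the WAL to zero bytes.
path = os.path.join(tempfile.mkdtemp(), "wal.db")
conn = sqlite3.connect(path)
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(10000)])
conn.commit()
wal_before = os.path.getsize(path + "-wal")

# TRUNCATE mode blocks until the WAL is fully checkpointed, then resets it
busy, log_pages, ckpt_pages = conn.execute(
    "PRAGMA wal_checkpoint(TRUNCATE)").fetchone()
wal_after = os.path.getsize(path + "-wal")
print(busy, wal_before > 0, wal_after == 0)
```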
Current solution
I'm inserting datasets using parameter binding to a precreated statement.
INSERT INTO datatable VALUES (:idx, ...);
Doing that, I keep track of the start and end index. Next, I bind them to an insert statement into the registry table:
INSERT INTO regtable VALUES (:idx, datatable);
My query determines the datasets to return like this:
SELECT MIN(rowid), MAX(rowid), tablename
FROM (SELECT rowid,tablename FROM entryreg LIMIT 30000)
GROUP BY tablename;
After that, I query
SELECT * FROM datatable WHERE rowid >= :minid AND rowid <= :maxid;
where I use predefined statements for each datatable and bind both variables to the first query's results.
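The pattern above can be sketched end to end (a simplified single-database version; the question uses multiple attached storage databases, and the registry table appears as both "regtable" and "entryreg" above, so one name is assumed here):

```python
import sqlite3

# Registry pattern: each batch insert into datatable is mirrored by
# rows in the registry; the reader derives per-table index ranges.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE datatable (idx INTEGER PRIMARY KEY, payload TEXT)")
conn.execute("CREATE TABLE regtable (idx INTEGER, tablename TEXT)")

# Insert a batch, then register its indices
rows = [(i, "data") for i in range(1, 1001)]
conn.executemany("INSERT INTO datatable VALUES (?, ?)", rows)
conn.executemany("INSERT INTO regtable VALUES (?, ?)",
                 [(i, "datatable") for i, _ in rows])

# Determine the per-table index range from the registry...
minid, maxid, table = conn.execute(
    "SELECT MIN(idx), MAX(idx), tablename "
    "FROM (SELECT idx, tablename FROM regtable LIMIT 30000) "
    "GROUP BY tablename").fetchone()

# ...then fetch the datasets in that range
n = len(conn.execute(
    "SELECT * FROM datatable WHERE idx >= ? AND idx <= ?",
    (minid, maxid)).fetchall())
print(minid, maxid, n)
```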
This is too slow. As soon as I create the registry table, my insertions slow down so much I can't meet benchmark speed.
Possible Solutions
There are several other ways I can imagine it can be done:
Create a view of all indices as a UNION or OUTER JOIN of all table indices. This can't be done persistently on attached databases.
Create triggers for INSERT/REMOVE on table creation that fill a registry table. This can't be done persistently on attached databases.
Create a trigger for CREATE TABLE on database creation that will create the triggers described above. Requires user functions.
Questions
Now, before I go and add user functions (something I've never done before), I'd like some advice if this has any chances of solving my performance issues.
Assuming I create the databases using a separate connection before attaching them. Can I create views and/or triggers on the database (as main schema) that will work later when I connect to the database via ATTACH?
From what it looks like, an AFTER INSERT trigger will fire after every single inserted row. If it inserts into another table, does that mean I'm increasing my number of transactions from 2 to 1+N? Or is there a mechanism that speeds up triggered interaction? The first case would slow things down horribly.
Is there any chance that a FULL OUTER JOIN (I know I need to emulate it from other JOIN commands) is faster than filling a registry with insertion transactions every time? We're talking roughly ten transactions per second with an average of 1000 elements (insert) vs. one query of 30000 every two seconds (query).
Open the sqlite3 databases in multi-threading mode and handle the insert/update/query/delete operations in separate threads. I prefer to transfer query results into an STL container for processing.
If I run a CREATE EXTERNAL TABLE cetasTable AS SELECT command and then run:
EXPLAIN
select * from cetasTable
I see in the distributed query plan:
<operation_cost cost="4231.099968" accumulative_cost="4231.099968" average_rowsize="2056" output_rows="428735" />
It seems to know the correct row count. However, no statistics appear to have been created on that table, as this query returns zero rows:
select * from sys.stats where object_id = object_id('cetasTable')
If I already have files in blob storage and I run a CREATE EXTERNAL TABLE cetTable command then run:
EXPLAIN
select * from cetTable
The distributed query plan shows SQL DW thinks there are only 1000 rows in the external table:
<operation_cost cost="4.512" accumulative_cost="4.512" average_rowsize="940" output_rows="1000" />
Of course I can create statistics to ensure SQL DW knows the right row count when it creates the distributed query plan. But can someone explain how it knows the correct row count some of the time and where that correct row count is stored?
What you are seeing is the difference between a table created using CxTAS (CTAS, CETAS or CRTAS) and CREATE TABLE.
When you run CREATE TABLE, the row count and page count values are fixed, as the table is empty. If memory serves, the fixed values are 1000 rows and 100 pages. When you create a table with CTAS they are not fixed: the actual values are known to the CTAS command, as it has just created and populated the table in a single operation. Consequently, the metadata correctly reflects the table size when a CxTAS is used. This is good. The APS / SQLDW cost-based optimizer can immediately make better estimations for MPP plan generation based on table size when a table has been created via CxTAS as opposed to CREATE TABLE.
Having an accurate understanding of table size is important.
Imagine you have a table created using CREATE TABLE, and then 1 billion rows are inserted into it. The shell database still thinks the table has 1000 rows and 100 pages. However, this is clearly not the case. The reason is that the table size attributes are not automatically updated at this time.
Now imagine that a query is fired that requires data movement on this table. Things may begin to go awry. You are now more likely to see the engine make poor MPP plan choices (typically using BROADCAST rather than SHUFFLE) as it does not understand the table size amongst other things.
What can you do to improve this?
You should create at least one column-level statistics object per table. Generally speaking, you will create statistics objects on all columns used in JOINs, GROUP BYs, WHEREs and ORDER BYs in your queries. I will explain the underlying process for statistics generation in a moment. I just want to emphasise that the call to action here is to ensure that you create and maintain your statistics objects.
When CREATE STATISTICS is executed for a column three events actually occur.
1) Table level information is updated on the CONTROL node
2) Column level statistics object is created on every distribution on the COMPUTE nodes
3) Column level statistics object is created and updated on the CONTROL node
1) Table level information is updated on the CONTROL node
The first step is to update the table level information. To do this APS / SQLDW executes DBCC SHOW_STATISTICS (table_name) WITH STAT_STREAM against every physical distribution; merging the results and storing them in the catalog metadata of the shell database. Row count is held on sys.partitions and page count is held on sys.allocation_units. Sys.partitions is visible to you in both SQLDW and APS. However, sys.allocation_units is not visible to the end user at this time. I referenced the location for those familiar with the internals of SQL Server for information and context.
At the end of this stage the metadata held in the shell database on the CONTROL node has been updated for both row count and page count. There is now no difference between a table created by CREATE TABLE and a CTAS - both know the size.
2) Column level statistics object is created on every distribution on the COMPUTE nodes
The statistics object must be created in every distribution on every COMPUTE node. Creating a statistics object produces important, detailed statistical data for the column (notably the histogram and the density vector).
This information is used by APS and SQLDW for generating distribution level SMP plans. SMP plans are used by APS / SQLDW in the PHYSICAL layer only. Therefore, at this point the statistical data is not in a location that can be used for generating MPP plans. The information is distributed and not accessible in a timely fashion for cost based optimisation. Therefore a third step is necessary...
3) Column level statistics object is created and updated on the CONTROL node
Once the data is created PHYSICALLY on the distributions in the COMPUTE layer it must be brought together and held LOGICALLY to facilitate MPP plan cost based optimisation. The shell database on the CONTROL node also creates a statistics object. This is a LOGICAL representation of the statistics object.
However, the shell database stat does not yet reflect the column level statistical information held PHYSICALLY in the distributions on the COMPUTE nodes. Consequently, the statistics object in the shell database on the CONTROL node needs to be UPDATED immediately after it has been created.
DBCC SHOW_STATISTICS (table_name, stat_name) WITH STAT_STREAM is used to do this.
Notice that the command has a second parameter. This changes the result set; providing APS / SQLDW with all the information required to build a LOGICAL view of the statistics object for that column.
I hope this goes some way to explaining what you were seeing but also how statistics are created and why they are important for Azure SQL DW and for APS.
Sometimes a schema migration takes a long time, e.g. when several fields are added/removed/edited. What happens if you try to insert into a table while a schema migration is changing the structure of that table?
I'm aware the changes are not persistent until the entire migration is done.
That behavior depends on the underlying database and on what the actual migration is doing. For example, PostgreSQL DDL operations are transactional; an insert to the table will block until the DDL transaction completes. To see this, in one psql window, do something like this:
create table kvpair (id serial, key character varying (50), value character varying(100));
begin;
alter table kvpair add column rank integer;
At this point, do not commit the transaction. In another psql window, try:
insert into kvpair (key, value) values ('fruit', 'oranges');
You'll see it will block until the transaction in the other window is committed.
Admittedly, that's a contrived example - the granularity of what's locked will depend on the operation (DDL changes, indexing, DML updates). In addition, any statements that get submitted for execution may have assumed different constraints. For example, change the alter table statement above to include not null. On commit, the insert fails.
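The blocking behavior can also be sketched outside psql. Here is a SQLite analogue (an assumption for runnability: SQLite's file-level locking is much coarser than PostgreSQL's, but the insert similarly cannot proceed while an uncommitted DDL transaction holds its write lock):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")

# Connection 1 opens a transaction and alters the table without committing.
conn1 = sqlite3.connect(path, isolation_level=None)  # manual transactions
conn1.execute("CREATE TABLE kvpair (id INTEGER PRIMARY KEY, key TEXT, value TEXT)")
conn1.execute("BEGIN")
conn1.execute("ALTER TABLE kvpair ADD COLUMN rank INTEGER")  # uncommitted DDL

# Connection 2's insert cannot acquire the write lock and times out.
conn2 = sqlite3.connect(path, timeout=0.1)
try:
    conn2.execute("INSERT INTO kvpair (key, value) VALUES ('fruit', 'oranges')")
    blocked = False
except sqlite3.OperationalError:  # "database is locked"
    blocked = True

conn1.execute("COMMIT")  # release the lock; the insert could now succeed
print(blocked)
```

In PostgreSQL the second session would wait indefinitely for the lock rather than time out; the short timeout here just makes the blocking observable in one script.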
In my experience, it's always a good thing to consider the "compatibility" of schema changes, and minimize changes that will dramatically restructure large tables. Careful attention can help you minimize downtime, by performing schema changes that can occur on a running system.
How is this possible? I have a simple C++ app that uses SQLite3 to INSERT/DELETE records.
I use a single database with a single table inside. When I choose to store some data in the db, it gets stored and the size of my.db increases, naturally.
But there is a problem with DELETE - the size does not decrease. If I do:
sqlite3 my.db
sqlite> select count(*) from mytable;
0 is returned, which is okay, but if I do ls -l on the folder containing my.db, the size is unchanged.
Can anybody explain?
When you execute a DELETE query, SQLite does not actually delete the records and rearrange the data - that would take too much time. Instead, it just marks the records as deleted and ignores them from then on.
If you actually want to reduce the file size, execute the VACUUM command. There is also an option for auto-vacuuming. See http://www.sqlite.org/lang_vacuum.html.
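This is easy to verify from Python's sqlite3 module (table and file names mirror the question; the row count is an arbitrary choice for the demo):

```python
import os
import sqlite3
import tempfile

# DELETE leaves the file size unchanged (pages go to the free-list);
# VACUUM rebuilds the file and returns the space to the OS.
path = os.path.join(tempfile.mkdtemp(), "my.db")
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)",
                 [(i, "x" * 1000) for i in range(1000)])
conn.commit()
size_full = os.path.getsize(path)

conn.execute("DELETE FROM mytable")
conn.commit()
size_after_delete = os.path.getsize(path)  # unchanged: pages on free-list

conn.execute("VACUUM")
size_after_vacuum = os.path.getsize(path)  # file rebuilt, space reclaimed

print(size_after_delete == size_full, size_after_vacuum < size_after_delete)
```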
The scenario is listed in the SQLite Frequently Asked Questions:
(12) I deleted a lot of data but the database file did not get any smaller. Is this a bug?
No. When you delete information from an SQLite database, the unused disk space is added to an internal "free-list" and is reused the next time you insert data. The disk space is not lost. But neither is it returned to the operating system.
If you delete a lot of data and want to shrink the database file, run the VACUUM command. VACUUM will reconstruct the database from scratch. This will leave the database with an empty free-list and a file that is minimal in size. Note, however, that the VACUUM can take some time to run (around a half second per megabyte on the Linux box where SQLite is developed) and it can use up to twice as much temporary disk space as the original file while it is running.
As of SQLite version 3.1, an alternative to using the VACUUM command is auto-vacuum mode, enabled using the auto_vacuum pragma.
The documentation is your friend; please use it.
Also from the documentation:
When information is deleted in the database, and a btree page becomes empty, it isn't removed from the database file, but is instead marked as 'free' for future use. When a new page is needed, SQLite will use one of these free pages before increasing the database size. This results in database fragmentation, where the file size increases beyond the size required to store its data, and the data itself becomes disordered in the file.
Another side effect of a dynamic database is table fragmentation. The pages containing the data of an individual table can become spread over the database file, requiring longer for it to load. This can appreciably slow database speed because of file system behavior.
Compacting fixes both of these problems.
The easiest way to remove empty pages is to use the SQLite command VACUUM. This can be done from within SQLite library calls or the sqlite utility.