I know Redshift splits the vacuum process into two stages: sort and merge.
During the sort stage, disk usage does not change, but the merge stage seems to occupy a lot of free space.
My cluster is 3 dw2.xlarge nodes with 480 GB of SSD in total. Before vacuuming, total disk usage is around 50%.
I'm running a vacuum on an 81 GB table, but it failed during the merge stage with a disk full error.
How much space should I reserve for vacuuming a large unsorted table?
I asked the Redshift support team this question but haven't received a reply yet. Does anyone have experience with this?
Yes. For a long-running vacuum, use a deep copy instead.
This should avoid the disk usage problems.
When you run INSERT INTO ... (SELECT * FROM ...), order the SELECT by the sort key so that the data is inserted in sorted order.
Do the INSERT INTO ... (SELECT * FROM ...) incrementally, in ranges of your sort key.
For example, if you are storing 30 days of data, do the deep copy day by day.
This should avoid space issues.
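As a rough sketch of the day-by-day deep copy described above, the following Python generates one INSERT ... SELECT statement per day, each ordered by the sort key. The table names (events_old, events_new), the event_date column and the start date are hypothetical placeholders, not taken from the question, and the statements are only built as strings here, not executed.

```python
from datetime import date, timedelta


def daily_deep_copy_statements(source, target, date_column, sort_key, start, days):
    """Yield one INSERT ... SELECT per day, each ordered by the sort key."""
    for offset in range(days):
        day = start + timedelta(days=offset)
        next_day = day + timedelta(days=1)
        # Half-open range [day, next_day) so that no row is copied twice.
        yield (
            f"INSERT INTO {target} "
            f"(SELECT * FROM {source} "
            f"WHERE {date_column} >= '{day.isoformat()}' "
            f"AND {date_column} < '{next_day.isoformat()}' "
            f"ORDER BY {sort_key});"
        )


statements = list(
    daily_deep_copy_statements(
        source="events_old",       # hypothetical existing table
        target="events_new",       # hypothetical empty copy with the same sort key
        date_column="event_date",  # hypothetical date column
        sort_key="event_date",
        start=date(2017, 1, 1),
        days=30,
    )
)
print(statements[0])
```

Running the statements one at a time keeps each step's temporary space proportional to one day of data rather than the whole table.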
I have a Linux server with 24 GB of RAM, CentOS 7, MariaDB 10.3 and a 1 TB SSD. I am running a C++ script that uses threading and the mariacpp connector to insert data into tables. I have around 50,000 tables and want to insert data into only 5,000 of them, but I get only about 50 insertions per second, which is very low. I want to improve InnoDB insert performance. Please suggest any configuration changes or other approaches.
My current InnoDB configuration:
innodb_read_io_threads=64;
innodb_write_io_threads=64 ;
innodb_buffer_pool_size=16G;
innodb_buffer_pool_load_at_startup=ON;
innodb_log_file_size = 1G ;
innodb_log_files_in_group=10;
innodb_file_per_table=1;
innodb_log_buffer_size=1G;
innodb_flush_method=O_DIRECT;
innodb_flush_log_at_trx_commit=2;
skip-innodb_doublewrite ;
innodb_io_capacity = 2000;
innodb_io_capacity_max = 3000;
innodb_flush_sync=1;
I have tried changing many related variables, but performance has not improved. I also converted all of my tables from MyISAM to InnoDB using ALTER TABLE queries.
innodb_log_buffer_size=1G
That may be dangerously high. Lower it to 1% of RAM.
Having 50K tables is usually a bad schema design pattern; please explain what you are doing.
There are multiple ways to do fast INSERTs:
LOAD DATA INFILE -- but only if you already have the data in a CSV file.
Batched INSERT -- INSERT INTO t (...) VALUES (1,2), (3,4), ... -- Above 1000 rows at a time you get into "diminishing returns" and other inefficiencies.
Putting several INSERTs into a single transaction. This also has issues with diminishing returns and inefficiencies.
Which of those have you employed? (Then we can critique things further.)
Please provide SHOW CREATE TABLE for one of the tables. There may be clues on what would help. (One example: lots of unnecessary indexes.)
Describe your disk subsystem -- HDD vs SSD; RAID; etc.
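To illustrate the batched INSERT option listed above, here is a minimal Python sketch (the original poster uses C++, so treat it as language-neutral pseudologic). It splits rows into chunks of at most 1000 and builds one multi-row INSERT per chunk. The `%s` placeholder style is an assumption borrowed from common Python MySQL/MariaDB drivers; the table and column names are hypothetical.

```python
from itertools import islice


def chunked(rows, size):
    """Yield successive lists of at most `size` rows."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def build_batched_insert(table, columns, row_count):
    """Build one multi-row INSERT with %s placeholders for `row_count` rows."""
    row = "(" + ", ".join(["%s"] * len(columns)) + ")"
    values = ", ".join([row] * row_count)
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES {values}"


# Hypothetical data: 2500 rows split into batches of at most 1000.
rows = [(i, f"name-{i}") for i in range(2500)]
batches = list(chunked(rows, 1000))
sql = build_batched_insert("t", ["id", "name"], 2)
print(len(batches), sql)
```

Each batch would be sent as a single statement, with the row tuples flattened into one parameter list. Wrapping several such statements in one transaction combines the second and third options from the list.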
My table is 500 GB with 8+ billion rows, INTERLEAVED SORTED by 4 keys.
One of the keys has a large skew (680+). Running VACUUM REINDEX is taking very long, about 5 hours for every billion rows.
When I track the vacuum progress, it shows the following:
SELECT * FROM svv_vacuum_progress;
 table_name    | status                                                                 | time_remaining_estimate
---------------+------------------------------------------------------------------------+-------------------------
 my_table_name | Vacuum my_table_name sort (partition: 1761 remaining rows: 7330776383) | 0m 0s
I am wondering how long it will be before it finishes, as it is not giving any time estimate either. It's currently processing partition 1761. Is it possible to know how many partitions there are in a given table? Note that these seem to be lower-level storage partitions within Redshift.
These days, it is recommended that you should not use Interleaved Sorting.
The sort algorithm places a tremendous load on the VACUUM operation, and the benefits of Interleaved Sorts only apply to a very narrow set of use cases.
I would recommend you change to a compound sort on the fields most commonly used in WHERE clauses.
The most efficient sorts are those involving date fields that are always incrementing. For example, imagine a situation where rows are added to the table with a transaction date. All new rows have a date greater than the previous rows. In this situation, a VACUUM is not actually required because the data is already sorted according to the Date field.
Also, please realise that 500 GB is actually a LOT of data. Doing anything that rearranges that amount of data will take time.
If your vacuum is running slowly, you probably don't have enough space on the cluster. I suggest you double the number of nodes temporarily while you do the vacuum.
You might also want to think about changing how your schema is set up. It's worth going through this list of Redshift tips to see if you can change anything:
https://www.dativa.com/optimizing-amazon-redshift-predictive-data-analytics/
We recovered to the previous state by dropping the table and restoring it from a backup snapshot taken before the vacuum reindex.
I am having issues with Amazon Athena. I have a small bucket (36430 objects, 9.7 MB) with 4 levels of partition (my-bucket/p1=ab/p2=cd/p3=ef/p4=gh/file.csv), but when I run the command
MSCK REPAIR TABLE db.table
it takes over 25 minutes. I plan to put terabytes of data on Athena, and I won't do it if this issue remains.
Does anybody know why it is taking so long?
Thanks in advance
MSCK REPAIR TABLE can be a costly operation, because it needs to scan the table's sub-tree in the file system (the S3 bucket). Multiple levels of partitioning can make it more costly, as it needs to traverse additional sub-directories. Assuming all potential combinations of partition values occur in the data set, this can turn into a combinatorial explosion.
If you are adding new partitions to an existing table, then you may find that it's more efficient to run ALTER TABLE ADD PARTITION commands for the individual new partitions. This avoids the need to scan the table's entire sub-tree in the file system. It is less convenient than simply running MSCK REPAIR TABLE, but sometimes the optimization is worth it. A viable strategy is often to use MSCK REPAIR TABLE for an initial import, and then use ALTER TABLE ADD PARTITION for ongoing maintenance as new data gets added into the table.
If it's really not feasible to use ALTER TABLE ADD PARTITION to manage the partitions directly, then the execution time might be unavoidable. Reducing the number of partitions might reduce execution time, because it won't need to traverse as many directories in the file system. Of course, then the partitioning is different, which might impact query execution time, so it's a trade-off.
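As a sketch of the ALTER TABLE ADD PARTITION approach, the helper below builds the statement for one new partition from its key/value pairs, using the bucket layout from the question. The function name and the choice to include IF NOT EXISTS and an explicit LOCATION are illustrative assumptions; the statement is only built as a string, and submitting it to Athena is left out.

```python
def add_partition_statement(table, location_root, partition):
    """Build an ALTER TABLE ADD PARTITION statement for one partition.

    `partition` is an ordered list of (key, value) pairs, e.g. [("p1", "ab")].
    """
    # Partition spec as it appears in the DDL: p1='ab', p2='cd', ...
    spec = ", ".join(f"{key}='{value}'" for key, value in partition)
    # Hive-style directory path: p1=ab/p2=cd/...
    path = "/".join(f"{key}={value}" for key, value in partition)
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION ({spec}) "
        f"LOCATION '{location_root.rstrip('/')}/{path}/'"
    )


statement = add_partition_statement(
    "db.table",
    "s3://my-bucket",
    [("p1", "ab"), ("p2", "cd"), ("p3", "ef"), ("p4", "gh")],
)
print(statement)
```

Whatever process writes new files to S3 already knows which partition it wrote to, so it can issue this one statement instead of triggering a full scan of the table's sub-tree.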
While the marked answer is technically correct, it doesn't address your real issue, which is that you have too many files.
I have a small bucket ( 36430 objects , 9.7 mb ) with 4 levels of
partition ( my-bucket/p1=ab/p2=cd/p3=ef/p4=gh/file.csv )
For such a small table, 36430 files creates a huge amount of overhead on S3, and the partitioning with 4 levels is super-overkill. The partitioning has hindered query performance rather than optimizing it. MSCK is slow because it is waiting for S3 listing among other things.
Athena would read the entire 9.7MB table if it were in one file faster than it would be able to list that huge directory structure.
I recommend removing the partitions completely, or if you really must have them then remove p2, p3 and p4 levels. Also consider processing it into another table to compact the files into larger ones.
Some suggest optimal file sizes are between 64MB and 4GB, which relates to the native block sizes on S3. It's also helpful to have a number of files that is some multiple of the workers in the cluster, although that is unknown with Athena. Your data is smaller than that range, so 1 or perhaps 8 files at most would be appropriate.
Some references:
https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/#OptimizeFileSizes
https://www.upsolver.com/blog/small-file-problem-hdfs-s3
Use Athena partition projection to manage partitions automatically.
I wonder why unloading from a big table (>100 billion rows), when selecting by a column that is NOT a sort key or part of a sort key, is immensely faster for newly added data. How does Redshift know when to stop the sequential scan in the second scenario?
Time the query spent executing: 39m 37.02s
UNLOAD ('SELECT * FROM production.some_table WHERE daytime BETWEEN
\'2017-01-15\' AND \'2017-01-16\'') TO ...
vs.
Time the query spent executing: 23.01s
UNLOAD ('SELECT * FROM production.some_table WHERE daytime BETWEEN
\'2017-06-24\' AND \'2017-06-25\'') TO ...
Thanks!
Amazon Redshift uses zone maps to identify the minimum and maximum value stored in each 1 MB block on disk. Each block only stores data for a single column (e.g. daytime).
If the SORTKEY is not set to daytime, then the data is unsorted and any particular date could appear in many different blocks. If SORTKEY is used, then a particular date will only appear in a minimum number of blocks.
Your second query possibly executes faster, even without a SORTKEY, because you are querying data that was probably added recently and is therefore all stored together in just a few blocks. The historical data might be spread in many blocks because a VACUUM probably reordered the data based upon the correct SORTKEY. In fact, if you did a VACUUM now, you might find that your second query becomes slower.
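The effect of zone maps can be illustrated with a toy model. This is only a sketch of the idea, not Redshift's implementation: each block records a min and max value, and a range query needs to read only the blocks whose range overlaps the predicate. The block date ranges below are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Zone-map entry: the min and max value stored in one block."""
    min_value: str
    max_value: str


def blocks_to_scan(blocks, low, high):
    """Return indexes of blocks whose [min, max] range overlaps [low, high]."""
    return [
        i for i, b in enumerate(blocks)
        if b.max_value >= low and b.min_value <= high
    ]


# Older data, re-sorted on a different key: each block spans a wide date range.
old_blocks = [Block("2017-01-01", "2017-03-31") for _ in range(6)]
# Recently appended data: blocks hold narrow, consecutive date ranges.
new_blocks = [
    Block("2017-06-22", "2017-06-23"),
    Block("2017-06-24", "2017-06-25"),
    Block("2017-06-26", "2017-06-27"),
]
blocks = old_blocks + new_blocks

# ISO date strings compare lexicographically in chronological order.
old_hits = blocks_to_scan(blocks, "2017-01-15", "2017-01-16")
new_hits = blocks_to_scan(blocks, "2017-06-24", "2017-06-25")
print(len(old_hits), len(new_hits))
```

In this toy data, the January query has to read every one of the wide-ranged old blocks, while the June query touches a single recently appended block. That matches the pattern of the two UNLOAD timings in the question.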
I have a stuck 'vacuum reindex' operation and am wondering what may be the cause for it taking such a long time.
I recently changed the schema of one of my Redshift tables, by creating a new table with the revised schema and deep copying the data using 'select into' (see Performing a Deep Copy). My basic understanding was that after deep copying the table, the data should be sorted according to the table's sort-keys. The table has an interleaved 4-column sort-key. Just to make sure, after deep copying I ran the 'interleaved skew' query (see Deciding When to Reindex), and the results were 1.0 for all columns, meaning no skew.
I then ran 'vacuum reindex' on the table, which should be really quick since the data is already sorted. However the vacuum is still running after 30 hours. During the vacuum I examined svv_vacuum_progress periodically to check the vacuum operation status. The 'sort' phase finished after ~6 hours but now the 'merge' phase is stuck in 'increment 23' for >12 hours.
What could be the cause for the long vacuum operation, given that the data is supposed to be already sorted by the deep copy operation? Am I to expect these times for future vacuum operations too? The table contains ~3.5 billion rows and its total size is ~200 GB.