Sitecore - delete bucket subitems

I have an item importer which deletes all of an item's subitems and creates new ones. Recently I switched it to buckets, and now I have a problem with deleting.
I delete items using:
myItem.DeleteChildren();
Without buckets it took about 20 minutes; now it takes about an hour for 5k items. Do I need to revert the bucket item before deleting and then synchronize again?
What is the quickest way to delete bucketable items?

My guess is that your deletion takes longer now because the bucket is updating its indexes while deleting. While you could speed it up by disabling events around your .DeleteChildren call, you would still need to get those indexes updated for your bucket to function properly.
So to answer your question, there isn't a way to speed it up while still retaining full functionality.
If you want to test this in action, try the following:
using (new EventDisabler()) myItem.DeleteChildren();
It should bring the deletion speed back up to where it was before, but at the price of a bucket that will not work properly until its indexes have been rebuilt.
I would recommend you adopt an integration approach where complete deletion and rebuilding of your item store is not required.

You could take it a step further. I was able to import 30k items in minutes by disabling all three: security checks, events, and proxies.
using (new Sitecore.SecurityModel.SecurityDisabler())      // skip security checks
{
    using (new Sitecore.Data.Events.EventDisabler())       // suppress item events, including index updates
    {
        using (new Sitecore.Data.Proxies.ProxyDisabler())  // skip proxy resolution
        {
            // delete code here
        }
    }
}

If you have a lot of items in the bucket, Sitecore will execute events for each item being deleted. Put your delete code inside an event disabler:
using (new EventDisabler())
{
myItem.DeleteChildren();
}
That will stop all the events from firing and should be considerably quicker. As a caveat, the indexes will not be updated instantly when doing this, so you might want to run an index rebuild on the master database after your importer has run.
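For reference, a post-import rebuild could look something like this. A minimal sketch, assuming a Sitecore 7+ ContentSearch setup (buckets require Sitecore 7) and the default master index name - adjust both to your configuration:
using Sitecore.ContentSearch;

// One-off rebuild after the importer has finished.
// "sitecore_master_index" is the default index name and may differ in your setup.
ISearchIndex index = ContentSearchManager.GetIndex("sitecore_master_index");
index.Rebuild();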
Another option would be to update the existing items during the import rather than deleting them all first.
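For illustration, an upsert-style import might look roughly like this. This is only a sketch: importRow, templateId, and the Title field are hypothetical, and it assumes your importer stores each item's Sitecore ID between runs (with buckets, items are nested under bucket folders, so resolving an existing item by ID - or via the ContentSearch API - is more reliable than by path):
using System;
using Sitecore.Data;
using Sitecore.Data.Items;

Database master = Sitecore.Configuration.Factory.GetDatabase("master");

// Hypothetical: importRow.ItemId is the Sitecore ID saved during a previous import run.
Item existing = importRow.ItemId != Guid.Empty ? master.GetItem(new ID(importRow.ItemId)) : null;
Item target = existing ?? myItem.Add(importRow.Name, new TemplateID(templateId));

target.Editing.BeginEdit();
try
{
    target["Title"] = importRow.Title;   // update fields in place instead of delete + recreate
    target.Editing.EndEdit();
}
catch
{
    target.Editing.CancelEdit();
    throw;
}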

Maybe (programmatically) unbucket before calling DeleteChildren()? I don't know how long that would take in total, but it might be quicker than it is now.

Related

AWS console how to list s3 bucket content order by last modified date

I am writing files to an S3 bucket. How can I see the newly added files? The files are not ordered by the Last modified field, and I can't find a way to sort on that field or any other field.
You cannot sort on that; it is just how the UI works.
The main reason is that for buckets with 1000+ objects, the UI only "knows" about the 1000 elements displayed on the current page. Sorting them would be misleading, because it would appear to show you the newest or oldest 1000 objects of the bucket when in fact it would just order the 1000 objects currently displayed. That would really confuse people, so it is better not to let the user sort at all than to sort incorrectly.
Showing the actual 1000 newest or oldest objects requires listing everything in the bucket, which takes time (minutes or hours for larger buckets), requires backend requests, and incurs extra cost, since LIST requests are billed. If you want to retrieve the 1000 newest or oldest objects, you need to write code that does a full listing of the bucket or prefix, orders all the objects, and then displays part of the result.
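As a sketch of what that code could look like - a minimal example using the AWS SDK for .NET (the bucket name is hypothetical, and the same pagination pattern applies in any SDK or in the CLI):
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Amazon.S3;
using Amazon.S3.Model;

class NewestObjects
{
    static async Task Main()
    {
        var client = new AmazonS3Client();
        var request = new ListObjectsV2Request { BucketName = "my-bucket" }; // hypothetical bucket
        var objects = new List<S3Object>();

        // Page through the whole bucket: each call returns at most 1000 keys and is billed.
        ListObjectsV2Response response;
        do
        {
            response = await client.ListObjectsV2Async(request);
            objects.AddRange(response.S3Objects);
            request.ContinuationToken = response.NextContinuationToken;
        } while (!string.IsNullOrEmpty(request.ContinuationToken));

        // Only after the full listing can you order by LastModified.
        foreach (var o in objects.OrderByDescending(o => o.LastModified).Take(20))
            Console.WriteLine($"{o.LastModified:u}  {o.Key}");
    }
}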
If you can sufficiently decrease the number of displayed objects with the "Find objects by prefix" field, the sort options become available and meaningful.

Clear DynamoDB table without specifying any key

I want to truncate a DynamoDB table which can have 3 to 4 million records. What is the best way?
Right now I am using a scan, which does not give good performance (I have tried it while deleting only a few records: 3):
DynamoDB dynamoDB = new DynamoDB(amazonDynamoDBClient);
Table table = dynamoDB.getTable("table-test");

// Scan the whole table, then delete the items one at a time:
// one DeleteItem request per item, issued sequentially.
ItemCollection<ScanOutcome> resultItems = table.scan();
Iterator<Item> itemsItr = resultItems.iterator();
while (itemsItr.hasNext()) {
    Item item = itemsItr.next();
    String itemPk = (String) item.get("PK");
    String itemSk = (String) item.get("SK");
    DeleteItemSpec deleteItemSpec = new DeleteItemSpec().withPrimaryKey("PK", itemPk, "SK", itemSk);
    table.deleteItem(deleteItemSpec);
}
The best way is to delete your table and create a new one with the same name. This is how clearing all data from DynamoDB is usually done.
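To sketch that approach with the AWS SDK for .NET (the question uses the Java SDK, but the calls map one-to-one; note that secondary indexes, TTL settings and so on would also need to be recreated):
using System;
using System.Threading.Tasks;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.Model;

class RecreateTable
{
    static async Task Main()
    {
        var client = new AmazonDynamoDBClient();

        // Capture the existing key schema so the table can be recreated identically.
        TableDescription desc = (await client.DescribeTableAsync("table-test")).Table;

        await client.DeleteTableAsync("table-test");

        // Wait for the deletion to finish; DescribeTable throws once the table is gone.
        while (true)
        {
            try { await client.DescribeTableAsync("table-test"); await Task.Delay(2000); }
            catch (ResourceNotFoundException) { break; }
        }

        await client.CreateTableAsync(new CreateTableRequest
        {
            TableName = desc.TableName,
            KeySchema = desc.KeySchema,
            AttributeDefinitions = desc.AttributeDefinitions,
            BillingMode = BillingMode.PAY_PER_REQUEST // adjust if your table uses provisioned capacity
        });
    }
}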
As Marcin already answered, the best way is to delete your table and create a new one. It is certainly the cheapest way, because any other way would require scanning the entire table and paying for the read capacity units required to do it.
In some cases, however, you might want to delete old items while the table is still actively used. In that case you can use a Scan like you wanted, but you can do it much more efficiently than you did. First, don't run individual DeleteItem requests sequentially, waiting for one delete to complete before issuing the next one: you can send batches of 25 deletes in one BatchWriteItem request, and you can send multiple BatchWriteItem requests in parallel. For even faster deletion, you can also parallelize your Scan across multiple threads or even machines - see the parallel scan section of the DynamoDB documentation. Just don't forget that if you delete items while the table is still being written to, you need a way to tell the old items you want to delete from the new items you don't, as the scan may start returning those new items as well.
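A rough sketch of that batching approach, again using the AWS SDK for .NET for consistency with the sketch above (the PK/SK key names are taken from the question; the same ScanRequest/BatchWriteItem shapes exist in the Java SDK):
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.Model;

class TruncateTable
{
    static async Task Main()
    {
        var client = new AmazonDynamoDBClient();
        Dictionary<string, AttributeValue> startKey = null;

        do
        {
            // Project only the key attributes - keys are all a delete needs.
            ScanResponse page = await client.ScanAsync(new ScanRequest
            {
                TableName = "table-test",
                ProjectionExpression = "PK, SK",
                ExclusiveStartKey = startKey
            });

            // BatchWriteItem accepts at most 25 write requests per call (.NET 6+ Chunk).
            foreach (var chunk in page.Items.Chunk(25))
            {
                var writes = chunk.Select(item => new WriteRequest
                {
                    DeleteRequest = new DeleteRequest
                    {
                        Key = new Dictionary<string, AttributeValue>
                        {
                            ["PK"] = item["PK"],
                            ["SK"] = item["SK"]
                        }
                    }
                }).ToList();

                var result = await client.BatchWriteItemAsync(
                    new Dictionary<string, List<WriteRequest>> { ["table-test"] = writes });

                // Throttled deletes come back as UnprocessedItems and must be retried
                // (real code should back off exponentially here).
                while (result.UnprocessedItems.Count > 0)
                    result = await client.BatchWriteItemAsync(result.UnprocessedItems);
            }

            startKey = page.LastEvaluatedKey;
        } while (startKey != null && startKey.Count > 0);
    }
}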
Finally, if you find yourself clearing old data from a table often, you should consider whether you can use DynamoDB's TTL feature, where DynamoDB automatically looks for expired items (based on an expiration-time attribute on each item) and deletes them at no cost to you.
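Enabling TTL is a one-off configuration call. A sketch, where the expiresAt attribute name is hypothetical (it must hold the expiry time as epoch seconds on each item you want expired):
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.Model;

// One-time setup: DynamoDB will delete items whose "expiresAt" time has passed.
var client = new AmazonDynamoDBClient();
await client.UpdateTimeToLiveAsync(new UpdateTimeToLiveRequest
{
    TableName = "table-test",
    TimeToLiveSpecification = new TimeToLiveSpecification
    {
        AttributeName = "expiresAt", // hypothetical attribute name
        Enabled = true
    }
});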

Undoing cascade deletions in a dense database

I have a fairly large production database system, based on a large hierarchy of nodes, each with 10+ associated models. If someone deletes a node fairly high in the tree, thousands of models can be deleted, and if that deletion was a mistake, restoring them can be very difficult. I'm looking for a way to give me an easy 'undo' option.
I've tried using Django-reversion, but it seems that in order to get the functionality I want (easily reverting a large cascade delete) it needs to store a lot of information with each revision. When I created the initial revisions, the process was less than 10% done and was already using 8 GB in my database, which is not going to work for me.
So, is there a standard solution for this problem? Or a way to customize Django-reversions to fit my use case?
What you're looking for is called a soft delete. Add a column named deleted with a default value of false to the table. Now, when you want to "delete" a row, instead set its deleted column to true. Update all the code not to show rows marked as deleted (or rename the database table and replace it with a view that filters them out). Change all the unique constraints to filtered ones (WHERE deleted = false) so that users can still add rows similar to ones they can't see in the system.
As for the cascades, you have two options: either write an ON UPDATE trigger that updates the child rows, or add the deleted column to the foreign key and define it as ON UPDATE CASCADE.
You'll get the whole undo functionality at the cost of one extra column (and of not being able to delete data to save space, unless you do it manually).

Sitecore Lucene index queue lagging behind on the prod server

In our Sitecore (6.6) implementation we use Lucene indexing. On our PROD server, the index building process is very slow; at the moment there are 5000+ entries waiting in the index queue.
Queries I used (in the master database):
select * from Properties (check the index last run time)
select * from History where created > 'last index updated time'
As a result of this delay, newly created data does not show up on the website. This queue also keeps increasing. When the site is taken offline, index building catches up after a while.
It's a heavily read-intensive website.
We encountered high-CPU issues, but those have now been sorted. We thought index building was lagging because of the high CPU, but the CPU is now running at around 30-40% and the Lucene index queue is still growing quickly.
How can I solve this issue? Please help.
You need to set up a database maintenance task so that you regularly flush your History table. If you have sites that are index-heavy, this table can grow excessively large. I think the default job cleans out everything in this table that is older than 30 days; you could set this much lower, like one day or a couple of days.
This article on SDN covers most of the standard maintenance tasks: http://sdn.sitecore.net/Articles/Administration/Database%20Maintenance.aspx
More general information about searching, indexing and performance here: http://sdn.sitecore.net/upload/sitecore6/65/sitecore_search_and_indexing_sc60-65-a4.pdf#search=%22clean%22
I think you need to take a step back and ask why such a large number of entries is being added to the history table in the first place, before looking at what configuration changes can be made to Sitecore.
You should trace through your code in your development environment based on each of the use cases for your implementation, to find all calls to the Sitecore API where an item is:
Added into the Sitecore Tree
Edited - the changing of any fields of an item, including security, presentation, workflow, publishing restrictions, etc.
Duplicated
Deleted from the Sitecore Tree
Moved to a new location.
Has a new version added
Has a version removed
As you are going through, make sure that all edit actions on an item are performed within a single Sitecore.Data.Items.Item.Editing.BeginEdit() and Sitecore.Data.Items.Item.Editing.EndEdit() call whenever possible, so that the changes are performed as a single edit action instead of multiple. Every time Sitecore.Data.Items.Item.Editing.EndEdit() is called, a new record is inserted into the history table, so unnecessary edits will only cause the history table to grow.
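To make the single-edit point concrete, a minimal sketch (the item path and field names are hypothetical):
using Sitecore.Data;
using Sitecore.Data.Items;

Database master = Sitecore.Configuration.Factory.GetDatabase("master");
Item item = master.GetItem("/sitecore/content/home/sample"); // hypothetical path

item.Editing.BeginEdit();
try
{
    // Both field changes are saved by the single EndEdit() below,
    // producing one history-table record instead of two.
    item["Title"] = "New title";
    item["Text"] = "New body text";
    item.Editing.EndEdit();
}
catch
{
    item.Editing.CancelEdit();
    throw;
}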
If you are duplicating an item using the Sitecore.Data.Items.Item.CopyTo() method, remember that all versions of the item will be duplicated, as well as the item's descendants. This means that the history table will get a record for every version of every item that was copied. If you only require the latest version and therefore remove the older versions from the new item after it is created, be aware that removing a version from an item will also insert a record into the history table for each version deleted.
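For illustration, this is roughly what that duplicate-then-trim pattern looks like, and why it is so history-heavy: each copied version and each removal writes its own history record (the paths are hypothetical):
using Sitecore.Data;
using Sitecore.Data.Items;

Database master = Sitecore.Configuration.Factory.GetDatabase("master");
Item source = master.GetItem("/sitecore/content/home/sample");       // hypothetical path
Item destination = master.GetItem("/sitecore/content/home/archive"); // hypothetical path

// CopyTo duplicates every version of the item (and its descendants):
// one history record per copied version.
Item copy = source.CopyTo(destination, "sample-copy");

// Trimming back to the latest version adds one more history record per removal.
Item latest = copy.Versions.GetLatestVersion();
foreach (Item version in copy.Versions.GetVersions())
{
    if (version.Version.Number != latest.Version.Number)
        version.Versions.RemoveVersion();
}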
If you have reduced all of the above actions to the bare minimum required to make the system functional, you should find that the Lucene indexing keeps up to date pretty well without having to change Sitecore's default index configuration.

Added effect issue with Spark list item renderers

I have a list that is mediated by a view mediator, so the data provider is managed by said mediator (meaning it just calls viewComponent.list.dataProvider.addItemAt([object], 0) when new items are added to the list).
The list has a custom item renderer which has an addedEffect property (a basic fade in effect), which of course is supposed to play every time a new item is added to the list.
The issue is that the effect plays the first time I add an item, but not for any subsequent items. Does anyone know the cause of this issue, or better yet, a fix?
Thank you in advance.
I'm not 100% sure, but my guess is that when useVirtualLayout is true, only a single item renderer is ever created. Multiple rows are produced by changing the item renderer's data, validating the component, and then taking a bitmap snapshot of it. Thus the item renderer is only ever added to the display list once, and the added event (which triggers addedEffect) in turn only ever fires once. Setting the list's useVirtualLayout to false forces it to create a new renderer instance for each row, so separate added events are dispatched for each item.