I have a table ActivityLog to which new data is added every second.
I am querying this table every 5 seconds through an API, in the following way:
logs = ActivityLog.objects.prefetch_related('activity').filter(login=login_obj, read_status=0)
Now let's say that when I queried this table at 13:20:05 I got 5 objects in logs, and after my query 5 more rows were added to the table at 13:20:06. When I then run logs.update(read_status=1) to mark only the queried logs as read, it also updates the newly added rows: instead of updating 5 objects it updates 10. How can I update only the 5 objects that I queried, without looping through them?
Take a look at select_for_update. Just be aware that the rows will be locked in the meantime.
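The reason 10 rows get updated is that QuerySets are lazy: logs.update(...) runs a fresh UPDATE ... WHERE query at call time, so rows inserted after your first read still match the filter. Here is a minimal sketch that pins the update to the rows you already fetched, combining select_for_update with a pk__in filter (it assumes the ActivityLog model and login_obj from the question):

from django.db import transaction

with transaction.atomic():
    # Evaluate the queryset now and lock the matched rows until commit.
    logs = list(
        ActivityLog.objects
        .select_for_update()
        .filter(login=login_obj, read_status=0)
    )
    # Update only those primary keys; rows inserted afterwards are untouched.
    ActivityLog.objects.filter(pk__in=[log.pk for log in logs]).update(read_status=1)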
Yesterday we added a simple form to our site, and we implemented an API view in Django that is connected to a PostgreSQL database.
Today I queried the database to see how many rows had been submitted, and I noticed something strange in the results. We created and migrated our model with the Django ORM, so the primary key is defined as an auto-incrementing integer field. The problem is that the row IDs are not continuous and are widely spread out: as I write this question, the maximum ID is 252, but we have only 72 records in the table.
I've seen this before in other tables, but those tables were subject to delete and update queries; we only ever insert into this new table. My question is: has our data been deleted, or is this normal behavior in PostgreSQL?
I've searched Google and it seems the only way to tell is to check the WAL logs, but we have not enabled those for our database yet. Is there another way to check whether the data is consistent?
Thanks.
Expect holes in a sequence
If you have multiple connections to a database that are adding rows, then you should expect to see holes in the sequence number results.
If Alice is adding a row, she may bump the sequence from 10 to 11 without yet committing. Meanwhile, Bob adds a record, bumping the sequence to 12; his row gets the ID 12, and he commits. So the database has stored rows with ID values of 10 and 12, but not 11.
If Alice commits, then 11 will appear in a query.
If Alice does a ROLLBACK, then 11 will never appear in a query.
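You can reproduce the gap with two connections; here is a rough sketch using psycopg2, where the demo table and connection string are placeholders:

import psycopg2

# Assumes a table created with: CREATE TABLE demo (id serial PRIMARY KEY, note text);
DSN = "dbname=test"  # placeholder connection string

alice = psycopg2.connect(DSN)
bob = psycopg2.connect(DSN)

with alice.cursor() as cur:
    cur.execute("INSERT INTO demo (note) VALUES ('alice') RETURNING id")
    print("Alice got id", cur.fetchone()[0])   # e.g. 11, not committed yet

with bob.cursor() as cur:
    cur.execute("INSERT INTO demo (note) VALUES ('bob') RETURNING id")
    print("Bob got id", cur.fetchone()[0])     # e.g. 12
bob.commit()

alice.rollback()  # id 11 is consumed by the sequence but never appears in the table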
I have a DynamoDB table with a created date/time attribute that indicates when the record/item was inserted into the table. I have about 20 years' worth of data in this table (records were migrated from a previous database), and I would now like to truncate anything older than 6 months, going forward.
The obvious thing to do here would be to set a TTL on the table for 6 months; however, my understanding is that AWS TTLs only go back a certain number of years (please correct me if you know otherwise!). So my understanding is that if I set a 6-month TTL on 20 years of data, I might delete records starting at 6 months old going back maybe 3-5 years, but then there'd be a whole lot of really old data left over, unaffected by the TTL (again, please correct me if you know otherwise!). So I guess I'm looking for:
The ability to do a manual, one-time deletion of data older than 6 months old; and
The ability to set a 6 month TTL moving forward
For the first one, I need to execute something like DELETE FROM mytable WHERE created < '2018-06-25', but I can't figure out how to do this from the AWS/DynamoDB management console. Any ideas?
For the second part, when I go to Manage TTL in the DynamoDB console, I'm not actually seeing where I would set the 6-month expiry. Is it the date/time fields at the very bottom of that dialog?! That seems strange to me: if that were the case then the TTL wouldn't be a rolling 6-month window, it would just be a hardcoded point in time which I'd need to keep updating manually so that data is never more than 6 months old...
You are correct about how far back in time TTL goes; it's actually 5 years. The way it works is by comparing your TTL attribute value with the current timestamp. If your item's TTL attribute holds a timestamp older than the current time, the item is scheduled for deletion within roughly the next 48 hours (it's not immediate). So if you use the item's creation timestamp as the TTL attribute, everything will be scheduled for deletion as soon as you insert it, and that's not what you want.
The way you manage the 6-month expiry policy is in your application. When you create an item, set a TTL attribute to a timestamp 6 months ahead of the creation time and just leave it there; DynamoDB will take care of deleting the item in 6 months. For your "legacy" data, I can't see a way around querying and looping through each item and setting the TTL for each of them manually.
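A rough sketch of that application-side policy with boto3; the table name and the 'expires_at' attribute are assumptions, and DynamoDB expects the TTL attribute to be a Unix epoch timestamp in seconds:

import time
import boto3

SIX_MONTHS = 180 * 24 * 60 * 60  # roughly six months, in seconds

table = boto3.resource("dynamodb").Table("mytable")  # assumed table name

def put_with_ttl(item):
    # 'expires_at' is the (assumed) attribute you would enable as the table's
    # TTL attribute when you configure Manage TTL in the console.
    item["expires_at"] = int(time.time()) + SIX_MONTHS
    table.put_item(Item=item)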
Deleting old records directly and updating their TTL so that DynamoDB deletes them later both require the same write capacity. You'll need to scan or query and delete the records one by one.
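A rough sketch of that one-time cleanup with boto3, scanning for items older than the cutoff and deleting them in batches; the table name ('mytable'), the simple partition key ('recordId') and the string-typed 'created' attribute are assumptions:

import boto3
from boto3.dynamodb.conditions import Attr

table = boto3.resource("dynamodb").Table("mytable")  # assumed table name

cutoff = "2018-06-25"
scan_kwargs = {
    "FilterExpression": Attr("created").lt(cutoff),
    "ProjectionExpression": "recordId",   # fetch only the (assumed) key attribute
}

with table.batch_writer() as batch:
    while True:
        page = table.scan(**scan_kwargs)
        for item in page["Items"]:
            batch.delete_item(Key={"recordId": item["recordId"]})
        if "LastEvaluatedKey" not in page:
            break
        scan_kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]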
If you have, let's say, 90% old data, the most cost- and time-efficient way of deleting it is to move the remaining 10% to a new table and delete the old one.
Another non-standard approach is to choose an existing timestamp field you can sacrifice (for instance, an audit field such as the creation date), remove it from new records, and use it as the TTL attribute to delete the old ones. This lets you do what you need more cheaply and without switching to another table, which may require multi-step changes in your application, but it requires the field to (a) not be in use, (b) be in the past, and (c) be a Unix timestamp. If you don't want to lose it permanently, you can copy it to another attribute and copy it back once all the old records have been deleted and TTL on that field has been turned off (or switched to another attribute). It will not work for records whose timestamp is more than 5 years in the past.
I am new to nosql / DynamoDB.
I have a list of ~10 000 container-items records, which is updated every 6 hours:
[
  { containerId: '1a3z5', items: ['B2a3', 'Z324', 'D339', 'M413'] },
  { containerId: '42as1', items: ['YY23', 'K132'] },
...
]
(primary key = containerId)
Is it viable to just delete the table, and recreate with new values?
Or should I loop through every item of the new list and conditionally update/write/delete the current DynamoDB records (using BatchWriteItem)?
For this scenario, a batch update is the better approach. There are two cases:
If you need to update only certain records, a batch update is more efficient: you can scan the whole table, iterate through the records, and update only the ones that changed.
If you need to update all the records every 6 hours, a batch update will also be more efficient, because dropping and recreating the table means you also have to recreate the indexes, and that is not a very fast process. After recreating the table you still have to do the inserts, and in the meantime you have to keep all the records in another database or in memory.
One scenario where deleting the whole table is a good approach is when you need to delete all the data from a table with thousands or more records; it is much faster to recreate the table than to delete all the records through the API.
One more suggestion: have you considered alternatives? Your problem does not look like a great use case for DynamoDB. For example, MongoDB and Cassandra support update-by-query out of the box.
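For the 6-hourly refresh itself, a rough sketch of the batch write with boto3 (the table name is an assumption; PutItem replaces an existing item with the same containerId, so this behaves as an upsert):

import boto3

table = boto3.resource("dynamodb").Table("containers")  # assumed table name

def refresh(containers):
    # batch_writer groups the writes into 25-item BatchWriteItem calls
    # and retries unprocessed items automatically.
    with table.batch_writer() as batch:
        for container in containers:
            batch.put_item(Item=container)

# Example with the shape from the question:
refresh([
    {"containerId": "1a3z5", "items": ["B2a3", "Z324", "D339", "M413"]},
    {"containerId": "42as1", "items": ["YY23", "K132"]},
])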
If the update touches some but not all existing items, and if a partial update of 'items' is possible, then you have no choice but to do a per-record operation. And this would be true even with a more capable database.
You can perhaps speed it up by first retrieving only the existing containerIds, so that from that set you know which records to update and which to insert. Alternatively, you can do a batch retrieve by ID using the IDs from the set of updates: whichever ones do not return a result are the ones you have to insert, and the ones that do are the ones to update.
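A rough sketch of that first suggestion with boto3 (the table name and the 'new_snapshot' variable holding the fresh list are assumptions): scan only the containerId key to learn which IDs already exist, then split the incoming snapshot.

import boto3

table = boto3.resource("dynamodb").Table("containers")  # assumed table name

def existing_container_ids():
    # Scan only the key attribute to learn which containerIds are already stored.
    ids, kwargs = set(), {"ProjectionExpression": "containerId"}
    while True:
        page = table.scan(**kwargs)
        ids.update(item["containerId"] for item in page["Items"])
        if "LastEvaluatedKey" not in page:
            return ids
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]

new_snapshot = []  # hypothetical: the freshly fetched list of ~10,000 container records
existing = existing_container_ids()
to_update = [c for c in new_snapshot if c["containerId"] in existing]
to_insert = [c for c in new_snapshot if c["containerId"] not in existing]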
I'm using Django 1.6 and PostgreSQL. Would a bulk_create on a specific table lock the entire table? (In my case I'm bulk-creating 10,000 rows and it takes ~10 seconds.) I tested this by creating objects every half second while the bulk create was happening, and none of those individual creates hung, but I'd just like to make sure. Thanks!
bulk_create inserts the provided list of objects into the database in an efficient manner (generally only one query, no matter how many objects there are), and the insert runs as a single atomic statement. In PostgreSQL an INSERT takes only row-level locks on the new rows, so it does not block concurrent inserts into the same table, which is consistent with what you observed.
usage: bulk_create(obj_list, batch_size=None)
The batch_size parameter controls how many objects are created in a single query. The default is to create all objects in one batch, except for SQLite, where the default is such that at most 999 variables are used per query.
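For illustration, a sketch of batched inserts against a hypothetical Event model (the model and its 'name' field are assumptions):

# Hypothetical model import for the sketch.
from myapp.models import Event

objs = [Event(name="event %d" % i) for i in range(10000)]

# Splits the work into INSERT statements of at most 1,000 rows each,
# instead of one 10,000-row statement.
Event.objects.bulk_create(objs, batch_size=1000)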
Benchmark write-ups can also give you an idea of how fast bulk_create is relative to other methods.
I have some 200+ tables in my DynamoDB account. Since all my tables have localSecondaryIndexes defined, I have to ensure that no table is in the CREATING status at the time of my CreateTable() call.
While adding a new table, I list all tables and iterate through their names, firing describeTable() calls one by one. On the returned data, I check the TableStatus key. Each describeTable() call takes a second, which implies an average waiting time of over 3 minutes before creating each table. So if I have to create 50 new tables, it takes me around 4 hours.
How do I go about optimizing this? I think a BatchGetItem() call works on the data inside a table, not on table metadata. Can I do a bulk describeTable() call?
It is enough to wait until the last table you created becomes ACTIVE. Run DescribeTable on that last-created table at an interval of a few seconds.
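If you are on boto3, here is a minimal sketch of that polling using the built-in table_exists waiter, which calls DescribeTable about every 20 seconds until the table's status is ACTIVE (the function and its arguments are placeholders):

import boto3

client = boto3.client("dynamodb")

def create_next_table(last_created, **create_table_kwargs):
    # Block until the previously created table reports TableStatus ACTIVE,
    # then issue the next CreateTable call.
    client.get_waiter("table_exists").wait(TableName=last_created)
    client.create_table(**create_table_kwargs)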