I know that this can be done with a full table scan, inspecting every record for the attributes it contains. Is there a less painful way?
No, there isn't. This is one of the trade-offs of DynamoDB.
If there were a way to do this, then storing a new item with a new attribute would have to update something else, somewhere else, that remembered all of the attributes present in the table.
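If you do go the full-scan route, a minimal boto3 sketch of collecting every attribute name might look like the following; the table name is a placeholder, and this is an illustration of the scan approach rather than anything DynamoDB provides out of the box:

```python
import boto3

dynamodb = boto3.client("dynamodb")
attribute_names = set()

# Paginate through the whole table; every item has to be inspected.
paginator = dynamodb.get_paginator("scan")
for page in paginator.paginate(TableName="my-table"):  # hypothetical table name
    for item in page["Items"]:
        attribute_names.update(item.keys())

print(sorted(attribute_names))
```

This consumes read capacity proportional to the size of the table, which is exactly the pain the question is asking to avoid.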
I am a little confused about which is better for soft delete.
There are two ways to do a soft delete.
Create a separate table for deleted records. (In this way we make a copy of the record in the deleted-records table, then delete it from its original table.)
Create an extra column called deleted. (In this way we only change this field to true, then filter on it when displaying records.)
Also, I want to store the changes to the records after every update, so I think creating an extra table is more suitable. What is your opinion?
I agree with #web-engineer, adding a nullable column holding the datetime when the row was soft-deleted is the best approach. I used this resource to do it.
And to answer the second part of your question, yes, an extra table will be needed. There is a third-party app named django-simple-history which handles it for you.
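To illustrate both suggestions, here is a minimal sketch; the Article model and its title field are hypothetical, only the deleted_at column and the django-simple-history usage come from the advice above:

```python
from django.db import models
from simple_history.models import HistoricalRecords  # third-party: django-simple-history


class Article(models.Model):  # hypothetical model name
    title = models.CharField(max_length=200)
    # Soft-delete marker: NULL means "not deleted"; otherwise it records when the row was deleted.
    deleted_at = models.DateTimeField(null=True, blank=True, default=None)
    # django-simple-history stores every change to the record in a separate historical table.
    history = HistoricalRecords()
```

Display queries would then filter with something like Article.objects.filter(deleted_at__isnull=True).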
The best option is the second one; in your first example it isn't a soft delete if you're deleting the row from the table - soft should mean modifying the data in a minimal way. Leaving the row in place is the purpose of a soft delete: it has minimal effect on the data and retains all attributes such as the primary key index value and any internals you can't see that the database might use.
Your first option is far less succinct, as it means duplicating data structures. A common approach is to add a "deleted_at" column (defaulting to NULL), which positively identifies the record's state.
I have two small DynamoDB tables with about 10 attributes, and I want to add "CreatedDate" and "ModifiedDate" attributes to them. I am trying to decide on the best practice for doing this with the lowest cost, the highest performance, and good reusability.
First, I was thinking of creating a trigger that adds these attributes whenever there is an update or create operation on the table. I like this approach because it is centralized. However, I am not sure it is the cheapest option, because after a new item is written to the table, the trigger performs another write operation to insert the dates.
Second, just sending these values in the "PUT" request as new attributes. That way only one write operation is needed. The downside is that I would have to update every function that writes an item to these tables.
Which way should I go in this case? Are there better ways to do it, or anything I am missing?
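For reference, the second approach amounts to stamping the dates in application code right before the write. A minimal boto3 sketch, where the table name and the helper function are assumptions of mine:

```python
from datetime import datetime, timezone

import boto3

table = boto3.resource("dynamodb").Table("my-table")  # hypothetical table name


def put_item_with_dates(item: dict) -> None:
    """Single write: CreatedDate/ModifiedDate are set before the PUT, so no trigger is needed."""
    now = datetime.now(timezone.utc).isoformat()
    item.setdefault("CreatedDate", now)  # keep an existing CreatedDate if the caller supplies one
    item["ModifiedDate"] = now
    table.put_item(Item=item)
```

A helper like this would have to be called from every function that writes to these tables, which is exactly the duplication the question weighs against a trigger.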
I have the following table structure:
type Entity struct { // struct name added for completeness; the original snippet listed only the fields
	ID              string    `dynamodbav:"id,omitempty"`
	Type            string    `dynamodbav:"type,omitempty"`
	Value           string    `dynamodbav:"value,omitempty"`
	Token           string    `dynamodbav:"token,omitempty"`
	Status          int       `dynamodbav:"status,omitempty"`
	ActionID        string    `dynamodbav:"action_id,omitempty"`
	CreatedAt       time.Time `dynamodbav:"created_at,omitempty"`
	UpdatedAt       time.Time `dynamodbav:"updated_at,omitempty"`
	ValidationToken string    `dynamodbav:"validation_token,omitempty"`
}
and I have two Global Secondary Indexes, one on the Value field (ValueIndex) and one on the Token field (TokenIndex). Later in the internal logic I perform an update of this entity and then an immediate read of it via one of these indexes (ValueIndex or TokenIndex), and I see the expected problem that the data is not ready (I mean not yet updated). I can't use ConsistentRead in these cases, because a Global Secondary Index doesn't support that option. As a result I can't run my load tests over this logic, because the data is not ready when the tests run with 10-20-30 threads. So my question is: is it possible to solve this problem somehow, or should I reorganize my table, split it into 2-3 different tables, and move fields like Value and Token into a HASH or SORT key?
GSIs are updated asynchronously from the table they are indexing. The updates to a GSI typically occur in well under a second, but if you're after an immediate read from a GSI after an insert/update/delete, there is the potential to get stale data. This is how GSIs work - nothing you can do about that. However, you need to be really mindful of three things:
Make sure you keep your GSI lean - that is, only project the absolute minimum attributes that you need. Less data to write makes the sync quicker (see the sketch after this list).
Ensure that your GSIs have the correct provisioned throughput. If they don't, they may not be able to keep up with activity in the table, and you'll see long delays before the GSI is brought back in sync.
If an update causes the keys in the GSI to be updated, you'll need 2 units of throughput provisioned per update. In essence, DynamoDB will delete the item then insert a new item with the keys updated. So, even though your table has 100 provisioned writes, if every single write causes an update to your GSI key, you'll need to provision 200 write units.
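To make the first two points concrete, here is a hedged boto3 sketch of declaring a lean GSI with its own provisioned throughput at table-creation time; the table name, key names, and capacity numbers are assumptions:

```python
import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.create_table(
    TableName="entities",  # hypothetical table name
    AttributeDefinitions=[
        {"AttributeName": "id", "AttributeType": "S"},
        {"AttributeName": "value", "AttributeType": "S"},
    ],
    KeySchema=[{"AttributeName": "id", "KeyType": "HASH"}],
    ProvisionedThroughput={"ReadCapacityUnits": 10, "WriteCapacityUnits": 10},
    GlobalSecondaryIndexes=[
        {
            "IndexName": "ValueIndex",
            "KeySchema": [{"AttributeName": "value", "KeyType": "HASH"}],
            # Keep the GSI lean: project only the keys rather than the whole item.
            "Projection": {"ProjectionType": "KEYS_ONLY"},
            # The GSI has its own write capacity; under-provisioning it delays the sync.
            "ProvisionedThroughput": {"ReadCapacityUnits": 10, "WriteCapacityUnits": 10},
        }
    ],
)
```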
If you've tuned your DynamoDB setup and you still absolutely cannot handle the brief delay in GSIs, you'll probably need to use a different technology. For example, even if you decided to split your table into multiple tables, the impact would be the same (if not worse): you'd update one table, then try to read the data from another table before the values had been written to it.
I suspect that once you tune DynamoDB for your situation, you'll get pretty damn close to what you want.
User has an email address and a display name.
Both of these must be unique.
Both of these must be updatable, as long as the new value is not already in use.
A User table will exist with additional non-key attributes and a guid ID.
How should I model this to support an efficient query that checks whether an email address or display name is already in use?
Should I create a table with the guid as the key, no range key, and two separate GSIs, one for email and one for display name (each being that index's key)? Both would also have a second field with the guid id of the user. Or should these be completely separate tables, or something else entirely?
Thoughts - is there a better way?
Thanks.
There are three ways you can design this that I can think of:
As you have mentioned, a table with the guid as key and two separate GSIs, one for email and the other for name.
You have stated that both fields have to be unique, so potentially you can make one of them the hash key and create a GSI for the other. (This will run into a problem because, as you mention, you need to update Email & Name as well; for that you have to delete the old record and add a new record with the same attributes and the updated hash key.)
The advantage of this is that you pay less, as there is only one GSI compared to #1.
Another option is to use CloudSearch. Your DynamoDB table can be integrated with CloudSearch; in this option you can simply create a table with the guid, with no need to add any GSI, and whenever you want to search you query CloudSearch to get the output.
One more advantage of CloudSearch is that you can query on any attribute of the table and apply different filters to them.
One thing you need to check is the price difference between #2 and #3; you can go with whichever is better suited in terms of price and functionality.
If you implement this in another way, feel free to share it.
Hope that helps
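As a rough sketch of option #1 with boto3, the uniqueness check becomes a Query against the relevant GSI, where an empty result means the value is free; the table, index, and attribute names below are assumptions:

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("users")  # hypothetical table name


def email_in_use(email: str) -> bool:
    """Query the email GSI; any returned item means the address is already taken."""
    resp = table.query(
        IndexName="EmailIndex",  # hypothetical GSI keyed on the email attribute
        KeyConditionExpression=Key("email").eq(email),
        Limit=1,
    )
    return resp["Count"] > 0
```

An analogous function would query the display-name GSI before allowing an update to that field.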
Is there an approach for updating all items in an attribute (column)?
I'm updating the values one by one in a for loop, but it takes a while. I can easily update a whole row in my table using the DynamoDB mapper, but I cannot find similar functionality for an attribute.
No, the only way is to do a scan over the hash space and update each item.
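A minimal sketch of that scan-and-update loop in boto3, where the table name, key name, attribute, and new value are all placeholders:

```python
import boto3

table = boto3.resource("dynamodb").Table("my-table")  # hypothetical table name

scan_kwargs = {}
while True:
    page = table.scan(**scan_kwargs)
    for item in page["Items"]:
        # One UpdateItem per item: there is no bulk "update this column everywhere" API.
        table.update_item(
            Key={"id": item["id"]},  # hypothetical hash key name
            UpdateExpression="SET #attr = :val",
            ExpressionAttributeNames={"#attr": "status"},  # hypothetical attribute
            ExpressionAttributeValues={":val": "archived"},  # hypothetical new value
        )
    if "LastEvaluatedKey" not in page:
        break
    scan_kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]
```

Parallelizing the scan segments or batching the updates can shorten the wall-clock time, but the total write cost stays one update per item.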