I have an order I want to store in DynamoDB, with the following fields:
Order date: 2019-03-27 02:09pm
First Name: John
Last Name: Doe
Email: john@example.com
Phone: 555-11434
Address: 13 Lorong K Changi, Sunny Shores
City: Singapore
Zip: 654321
Country: Singapore
Status: new, confirmed, delivered
(There is no unique order identifier specified.)
At first I combined the first and last name ("John Doe") as the partition key and used the order date as the sort key. That worked quite well until:
I found I can't query the partition key (the customer's name) the way I need to. I want to be able to look up customer orders, by customer!
Secondly, URLs addressing the order would look like https://example.com/2019-03-27/John%20Doe..., i.e. the space causes some confusion. Is there a cleaner way to encode the name?
I am most keen on the email address, but from researching that, it seems like email is a bad field to use as a key.
The access patterns are pretty simple. Need a way to:
Look up an order
Search by customer (could be name, could be email)
Query by order status
I tried making a composite key with order status and order date, but that has not gone well either (see: Replace an old item with a new item in DynamoDB).
Most people in this scenario generate a UUID for the user, and make that the partition key.
If you use an email address as the partition key, it means your user cannot ever change their email address, at least not without some creative coding on your part.
It might be valid to use an email address in your case, for example if a user can never change their email address. In that case you should just be able to URL-encode the email address on your client. However, if you want to avoid that altogether, you could accept the parameter in a Base64-encoded format and decode it before use with DynamoDB.
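For example, in Python (a minimal sketch; the trailing '=' padding is generally fine in a URL path segment, but you can strip it if it bothers you):

import base64

email = "john@example.com"

# urlsafe_b64encode substitutes '-' and '_' for the '+' and '/' that
# standard Base64 would emit, keeping the token safe in a URL path.
token = base64.urlsafe_b64encode(email.encode()).decode()  # 'am9obkBleGFtcGxlLmNvbQ=='
assert base64.urlsafe_b64decode(token).decode() == email   # decode before hitting DynamoDB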
If you decide to generate UUIDs and make these your partition keys, you would probably then create GSIs with partition keys of email address and order state. You can use these GSIs to access your data quickly with your specified access patterns.
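A minimal sketch of that layout with boto3; the table, index, and attribute names here are illustrative, not prescribed:

import boto3

client = boto3.client("dynamodb")

client.create_table(
    TableName="orders",  # hypothetical table name
    BillingMode="PAY_PER_REQUEST",
    AttributeDefinitions=[
        {"AttributeName": "order_id", "AttributeType": "S"},      # generated UUID
        {"AttributeName": "email", "AttributeType": "S"},
        {"AttributeName": "order_status", "AttributeType": "S"},
        {"AttributeName": "order_date", "AttributeType": "S"},
    ],
    KeySchema=[{"AttributeName": "order_id", "KeyType": "HASH"}],
    GlobalSecondaryIndexes=[
        {   # look up a customer's orders by email, sorted by date
            "IndexName": "email-index",
            "KeySchema": [
                {"AttributeName": "email", "KeyType": "HASH"},
                {"AttributeName": "order_date", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
        },
        {   # query all orders in a given status, sorted by date
            "IndexName": "status-index",
            "KeySchema": [
                {"AttributeName": "order_status", "KeyType": "HASH"},
                {"AttributeName": "order_date", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
        },
    ],
)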
I'm new to DynamoDB and trying to figure out how to structure my data/table/index. My schema includes an itemid (unique) and an orderid (multiple items per order), along with some other arbitrary attributes. I want to be able to retrieve a single item by its itemid, but also retrieve a set of items by their OrderId.
My initial instinct was to set the itemid as the primary key and the orderid as the sort key, but that didn't allow me to query by orderid only. However the same problem occurs if I reverse those.
Example data:
ItemId    | OrderId
abc-123   | 1234
def-345   | 1234
ghi-678   | 5678
jkl-901   | 5678
I think I may need a Global Secondary Index, but I'm not quite understanding where those fit in.
If your question is really whether you "are able" to do this, then with ItemId as the partition key, you can still retrieve by OrderId, with the Scan operation, which will let you filter by any attribute.
However, Scan performs a full table scan, so the real question is probably whether you can retrieve by OrderId efficiently. In that case, you would indeed need a Global Secondary Index with OrderId and ItemId as its composite key.
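Sketched with boto3, assuming a GSI named OrderId-index already exists with OrderId as its partition key:

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("items")  # hypothetical table name

# Query the GSI instead of the base table; this reads only order 1234's
# items rather than scanning everything.
response = table.query(
    IndexName="OrderId-index",
    KeyConditionExpression=Key("OrderId").eq("1234"),
)
items = response["Items"]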
This is typically achieved using what's called "single table design". This means you store all your data in one table, denormalized, i.e. you duplicate your data so that it fits your access patterns.
Generally speaking, if you do not know your access patterns beforehand, DynamoDB might not be a good fit. For many systems, a good solution is to serve the "main" access patterns from DynamoDB and offload the less performance-critical ad-hoc queries by replicating the data to something like Elasticsearch.
If you have a table with the hash key PK (String) and the sort key SK (String), you can store your data like this. Use transactions to keep the multiple items up to date and consistent etc.
PK           | SK           | shippingStatus | totalPrice | cartQuantity
order_1234   | order_status | PENDING        | 123123     |
order_1234   | item_abc-123 |                |            | 3
order_1234   | item_def-345 |                |            | 1
order_5678   | order_status | SHIPPED        | 54321      |
order_5678   | item_jkl-901 |                |            | 5
item_abc-123 | order_1234   |                |            |
item_abc-123 | order_9876   |                |            |
item_abc-123 | order_5656   |                |            |
This table illustrates the schemaless nature of a DynamoDB table (apart from the PK/SK). With this setup, you can store "metadata" about the order in the order_1234/order_status item. Then, you can query for items with PK order_1234 and SK begins_with "item_" to get all the items for that order. You can do the same to get all the orders for an item: query for PK item_abc-123 and SK begins_with "order_".
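In boto3 terms those two lookups are plain Query calls (a sketch; the table name is made up):

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("orders")  # hypothetical table name

# All items in order 1234:
order_items = table.query(
    KeyConditionExpression=Key("PK").eq("order_1234") & Key("SK").begins_with("item_")
)["Items"]

# All orders containing item abc-123:
item_orders = table.query(
    KeyConditionExpression=Key("PK").eq("item_abc-123") & Key("SK").begins_with("order_")
)["Items"]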
I highly recommend this talk by Rick Houlihan to get into single-table design and data modelling in DynamoDB :)
https://www.youtube.com/watch?v=HaEPXoXVf2k
I have the following fields in RDS:
id auto increment
creationDate datetime
status string
reason string
description string
customerID string
There are multiple records for the same customerID + creationDate, so I am not able to use creationDate as the sort key. A status + customerID combination won't work either, as a customer can have duplicate records with the same status. And if I use the id field, I can't auto-increment it, since DynamoDB doesn't support that. What are my options here? What should my DynamoDB table look like?
The key to DynamoDB is knowing your access patterns. You haven't stated how you plan to query the data, so I can't advise on the overall design, but here is what you can do in order to have a unique primary key.
Do you really need auto-incrementing IDs? If not, consider using a UUID for all new data. Then you could use the ID field as the partition key; you could also use customerId as the partition key and id as the sort key.
If you must have auto-incrementing IDs, then you should store your creationDate in DynamoDB as an ISO 8601 string. Then, you can append a random UUID to the end of creationDate to avoid key collisions. This will allow you to use customerId and creationDate as the composite primary key, and you are still able to query by date (but instead of checking for equality, you should use the begins_with function).
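Roughly like this in Python (a sketch; the attribute names follow your RDS schema):

from datetime import datetime, timezone
from uuid import uuid4

# Sort key: ISO 8601 timestamp plus a random suffix. It still sorts
# chronologically, but two records created at the same instant for the
# same customer no longer collide.
creation_date = f"{datetime.now(timezone.utc).isoformat()}#{uuid4()}"
# e.g. '2019-03-27T14:09:00.123456+00:00#0f8fad5b-d9cb-469f-a165-70867728950e'

# To fetch a customer's records for a given day, query with:
#   Key('customerID').eq(customer_id) & Key('creationDate').begins_with('2019-03-27')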
Finally, you can introduce a new field specifically to ensure uniqueness. You could call it simply rangeKey, and it would be a randomly generated UUID that you could use with any other field as the partition key. You can still have your sequential ID field (and create a GSI for querying it, if you want).
I've presented 3 solutions, but they are really all the same: find a way to add a UUID to your primary key.
I am using a Django backend with postgresql.
Let's say I have a database with a table called Employees with about 20,000 records.
I need to allow multiple users to edit and verify the Area Code field for every record in Employees.
I'd prefer to allow a user to view the records, say, 30 at a time (to reduce burnout).
How can I select 30 records at a time from Employees to send to the front end UI for editing, without letting multiple users edit the same records, or re-selecting a record that has already been verified?
I don't need comments on the content of the database (these are example table and field names).
One way to do this would be to add two more fields to your table, for example assigned_to and verified. You can update assigned_to, which can be a foreign key to the verifying user, when you allow that user to view the Employee. This creates a record preventing the Employee from being chosen twice. assigned_to can also double as a record of who verified the Employee, for future reference.
verified could simply be a Boolean field which keeps track of whether the Employee has already been verified; it can be updated when the user confirms the verification.
The actual selects can be done like this:
employees = Employee.objects.filter(assigned_to=None, verified=False)[:30]
Then
for emp in employees:
    emp.assigned_to = user
    emp.save()
Note: This can still cause a race condition if two users make this request at exactly the same time. One way to avoid it would be to partition the employee table into non-overlapping groups, one per user, which would ensure that no two users ever see the same employees.
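Since you're on PostgreSQL, you could instead close the race with row locking. A minimal sketch, assuming Django 2.0+ (for skip_locked) and the two new fields described above:

from django.db import transaction

def claim_employees(user, batch_size=30):
    with transaction.atomic():
        # FOR UPDATE SKIP LOCKED: concurrent requests skip rows another
        # transaction has already locked, so no two users get the same batch.
        employees = list(
            Employee.objects
            .select_for_update(skip_locked=True)
            .filter(assigned_to=None, verified=False)[:batch_size]
        )
        for emp in employees:
            emp.assigned_to = user
            emp.save(update_fields=["assigned_to"])
    return employees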
I need a way to make both an email and a username unique across all partitions of a table (or across tables if needed). I can't seem to find a way other than making one of them unique (the primary key) and the other unique within its partition only.
I want to have an email address check so that each user has both a UNIQUE email AND a UNIQUE username.
So the database CANNOT have:
email    username
a@a.com  aa
b@b.com  aa
OR:
email    username
a@a.com  a
a@a.com  b
I need both to be independently unique across the entire system/database.
How is this done? I am using Lambda and DynamoDB.
And I also NEED to know independently which one is NOT UNIQUE too.
My understanding is that you want to ensure that user_names are unique AND email_addresses are unique, and that a user_name maps to one and only one email_address and an email_address maps to one and only one user_name.
One way to do this would be to use two DynamoDB tables. The first (table A) would use the user_name as the HASH and the record associated with it would contain all information about that user. The second table (table B) would use email_address as the HASH and would contain a single additional attribute, the user_name.
When creating a new user, you could do a conditional put on table A with a condition of attribute_not_exists(user_name). If this fails, the user_name already exists and the new record would not be created. If it succeeds, the user_name was unique. You could then do a conditional put to table B with a condition of attribute_not_exists(email_address). If this fails, the email_address is already in use, and you would either have to delete the record from table A or otherwise resolve the email address conflict with the user. If the conditional put succeeds, then you know the email_address is unique and you have successfully created a new, unique user record.
This is a bit more complicated but it does allow you to rely on DynamoDB to guarantee uniqueness and consistency rather than try to achieve that at the application level.
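A sketch of the two conditional puts with boto3 (table and attribute names are illustrative). Note this also answers your second requirement: each failure names which field was not unique.

import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.resource("dynamodb")
users = dynamodb.Table("users")           # table A: HASH = user_name
emails = dynamodb.Table("user_emails")    # table B: HASH = email_address

def create_user(user_name, email_address):
    try:
        users.put_item(
            Item={"user_name": user_name, "email_address": email_address},
            ConditionExpression="attribute_not_exists(user_name)",
        )
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            raise ValueError("user_name is not unique")
        raise
    try:
        emails.put_item(
            Item={"email_address": email_address, "user_name": user_name},
            ConditionExpression="attribute_not_exists(email_address)",
        )
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            # Roll back table A so the half-created user doesn't linger.
            users.delete_item(Key={"user_name": user_name})
            raise ValueError("email_address is not unique")
        raise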
DynamoDB uniqueness is enforced on the hash key (or the composite hash+range key) only.
I think the best option in this case is to ensure uniqueness at the application level (add a GSI on username and query it for the new username before inserting). For email it is easy to check uniqueness, since it is the table's hash key.
I have an application that allows "contacts" to be completely customized. My method of doing that is letting the administrator set up all of the fields allowed for a contact. My database is as follows:
Contacts
id
active
lastactive
created_on
Fields
id
label
FieldValues
id
fieldid
contactid
response
So the contacts table only tells whether they are active plus their identifier; the fields table only holds the label of the field and its identifier; and the fieldvalues table is what actually holds the data for contacts (name, address, etc.).
This setup has worked just fine for me up until now. The client would like to be able to pull a cumulative report: for each state, the number of contacts in each city. Effectively the data would have to look like the following:
California (from fields table)
    Costa Mesa (from fields table) - 5 (counted in fieldvalues table)
    Newport - 2
Connecticut
    Wallingford - 2
    Clinton - 2
    Berlin - 5
The state field might be id 6 and the city field might be id 4. I don't know if I have just been looking at this code way too long to figure it out or what.
The SQL to create those three tables can be found at https://s3.amazonaws.com/davejlong/Contact.sql
You've got an Entity-Attribute-Value (EAV) model. Use the field and fieldvalue tables for searching only - the WHERE clause. Then make life easier by keeping the full entity's data in a CLOB off the main table (e.g. Contacts.data) in a serialized format (WDDX is good for this). Read the data column out, deserialize, and work with it on the server side. This is much easier than the myriad of joins you'd need otherwise to reproduce the fully hydrated entity from an EAV setup.
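For the report itself, the EAV self-join is short. A sketch using the field ids from the question (6 = state, 4 = city); sqlite3 stands in as the engine here, but the SQL is plain enough for MySQL or Postgres:

import sqlite3

STATE_FIELD_ID = 6  # per the question's example ids
CITY_FIELD_ID = 4

REPORT_SQL = """
SELECT s.response AS state,
       c.response AS city,
       COUNT(*)   AS contacts
FROM FieldValues AS s
JOIN FieldValues AS c ON c.contactid = s.contactid
WHERE s.fieldid = ? AND c.fieldid = ?
GROUP BY s.response, c.response
ORDER BY state, city;
"""

conn = sqlite3.connect("contacts.db")  # hypothetical database file
for state, city, count in conn.execute(REPORT_SQL, (STATE_FIELD_ID, CITY_FIELD_ID)):
    print(state, city, count)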