I need a way to make both an email and a username unique across all partitions of a table (or across tables if needed). I can't seem to find a way other than making only one of them unique (the primary key) and the other unique within its partition only.
I want to have an email address check so that each user has both a UNIQUE email AND a UNIQUE username.
So the database CANNOT have:
email     username
a@a.com   aa
b@b.com   aa
OR:
email     username
a@a.com   a
a@a.com   b
I need both to be independently unique across the entire system/database.
How is this done? I am using Lambda and DynamoDB.
And I also NEED to know which of the two (the email or the username) is NOT UNIQUE.
My understanding is that you want to ensure that user_names are unique AND email_addresses are unique, and that a user_name maps to one and only one email_address and an email_address maps to one and only one user_name.
One way to do this would be to use two DynamoDB tables. The first (table A) would use the user_name as the HASH and the record associated with it would contain all information about that user. The second table (table B) would use email_address as the HASH and would contain a single additional attribute, the user_name.
When creating a new user, you could do a conditional put on table A with a condition of attribute_not_exists(user_name). If this fails, the user_name already exists and the new record is not created. If it succeeds, the user_name was unique. You could then do a conditional put to table B with a condition of attribute_not_exists(email_address). If this fails, the email_address is already in use, and you would either have to delete the record from table A or otherwise resolve the email address conflict with the user. If the conditional put succeeds, you know the email_address is unique and you have successfully created a new, unique user record.
This is a bit more complicated, but it allows you to rely on DynamoDB to guarantee uniqueness and consistency rather than trying to achieve that at the application level.
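A rough sketch of that flow in Python with boto3 (the table names users and user_emails are assumptions for illustration, not part of the answer above):

import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.resource("dynamodb")
users = dynamodb.Table("users")          # table A: HASH = user_name
emails = dynamodb.Table("user_emails")   # table B: HASH = email_address

def create_user(user_name, email_address):
    # Step 1: claim the user_name with a conditional put on table A.
    try:
        users.put_item(
            Item={"user_name": user_name, "email_address": email_address},
            ConditionExpression="attribute_not_exists(user_name)",
        )
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            raise ValueError("user_name is not unique")
        raise

    # Step 2: claim the email_address with a conditional put on table B.
    try:
        emails.put_item(
            Item={"email_address": email_address, "user_name": user_name},
            ConditionExpression="attribute_not_exists(email_address)",
        )
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            # Roll back the table A record, as described above.
            users.delete_item(Key={"user_name": user_name})
            raise ValueError("email_address is not unique")
        raise

Note that the two distinct errors also answer the follow-up question: the caller learns which of the two values was not unique. On current DynamoDB you could also combine both puts into a single transact_write_items call so that no rollback step is needed.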
DynamoDB enforces uniqueness on the hash key (or the composite key: hash + range) only.
I think the best option in this case is to ensure uniqueness at the application level: add a GSI on username and query it for the new username before writing. Checking uniqueness on email is easy, since it is the table's hash key.
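A minimal sketch of that application-level check with boto3, assuming the table uses email as its hash key and has a GSI named username-index on a username attribute:

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("users")  # hypothetical table name

def username_taken(username):
    # Any item returned from the GSI means the username is already in use.
    # GSI queries are eventually consistent, so this is a best-effort check,
    # not a hard guarantee like a conditional write.
    resp = table.query(
        IndexName="username-index",
        KeyConditionExpression=Key("username").eq(username),
        Limit=1,
    )
    return resp["Count"] > 0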
Given the following models:
from django.db import models

class Customer(models.Model):
    pass

class User(models.Model):
    email = models.EmailField(blank=True, default="")
    customer = models.ForeignKey(Customer, ...)
I want to enforce the following:

IF user has email:
    IF user has no customer:
        email must be globally unique (i.e. unique in the entire unfiltered users table)
    IF user has customer:
        email must be unique within the user's customer
I attempted to implement this with two UniqueConstraints:
UniqueConstraint(
    name="customer_scoped_unique_email",
    fields=["customer", "email"],
    condition=(
        Q(customer__isnull=False)
        & ~Q(email=None)
    ),
),
UniqueConstraint(
    name="unscoped_unique_email",
    fields=["email"],
    condition=(
        Q(customer=None)
        & ~Q(email=None)
    ),
),
Testing has revealed that this still allows a user without a customer to be created with an email identical to an existing user (with a customer). My understanding is that this is because UniqueConstraint.condition determines both when the unique constraint should be triggered and what other records are included in the uniqueness check.
Is there any way to achieve my desired logic in the database, ideally in a Django ORM-supported way, and ideally with a UniqueConstraint or CheckConstraint? This must occur in the database. It's obviously possible in Python, but I want the extra reliability of a database constraint.
Is there any way to achieve my desired logic in the database ...
Yes, you can use triggers (see SQL in the Triggers section below).
... ideally in a Django ORM-supported way ...
Not within Django ORM, but for PostgreSQL, you could use django-pgtrigger to define it on models.
... and ideally with a UniqueConstraint or CheckConstraint?
This is not supported at the database level, since a partial index only contains the rows selected by its WHERE clause, so each constraint checks only against rows matching its own condition.
Partial indexes
UniqueConstraint.condition has the same database restrictions as Index.condition.
PostgreSQL: https://www.postgresql.org/docs/8.0/indexes-partial.html
A partial index is an index built over a subset of a table; the subset is defined by a conditional expression (called the predicate of the partial index). The index contains entries for only those table rows that satisfy the predicate.
SQLite: https://www.sqlite.org/partialindex.html
A partial index is an index over a subset of the rows of a table.
... In partial indexes, only some subset of the rows in the table have corresponding index entries.
Only rows of the table for which the WHERE clause evaluates to true are included in the index.
Triggers
Before insert or update on user table, check unscoped unique email.
PostgreSQL trigger:
CREATE OR REPLACE FUNCTION unscoped_unique_email() RETURNS TRIGGER AS $unscoped_unique_email$
DECLARE
    is_used_email bool;
BEGIN
    IF NEW.email IS NOT NULL AND NEW.customer_id IS NULL THEN
        SELECT TRUE INTO is_used_email
        FROM "user"
        WHERE email = NEW.email AND (id != NEW.id OR NEW.id IS NULL);
        IF is_used_email IS NOT NULL THEN
            RAISE EXCEPTION 'duplicate key value violates unique constraint "unscoped_unique_email"'
                USING DETAIL = format('Key (email)=(%s) already exists.', NEW.email);
        END IF;
    END IF;
    RETURN NEW;
END;
$unscoped_unique_email$ LANGUAGE plpgsql;

CREATE TRIGGER unscoped_unique_email BEFORE INSERT OR UPDATE ON "user"
    FOR EACH ROW EXECUTE PROCEDURE unscoped_unique_email();
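For the django-pgtrigger route mentioned above, a rough sketch of the same trigger attached to the model (the table name myapp_user is an assumption for your app, and the exact Trigger arguments may vary by pgtrigger version):

import pgtrigger
from django.db import models

class User(models.Model):
    email = models.EmailField(blank=True, default="")
    customer = models.ForeignKey("Customer", null=True, on_delete=models.CASCADE)

    class Meta:
        triggers = [
            pgtrigger.Trigger(
                name="unscoped_unique_email",
                when=pgtrigger.Before,
                operation=pgtrigger.Insert | pgtrigger.Update,
                # Raw PL/pgSQL body; pgtrigger wraps it in a trigger function.
                func="""
                    IF NEW.email IS NOT NULL AND NEW.customer_id IS NULL THEN
                        IF EXISTS (
                            SELECT 1 FROM myapp_user
                            WHERE email = NEW.email
                              AND (id != NEW.id OR NEW.id IS NULL)
                        ) THEN
                            RAISE EXCEPTION 'duplicate key value violates unique constraint "unscoped_unique_email"'
                                USING DETAIL = format('Key (email)=(%s) already exists.', NEW.email);
                        END IF;
                    END IF;
                    RETURN NEW;
                """,
            ),
        ]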
I have a user table:

user_id -> unique, partition key
user_city -> primary sort key

Would the query perform a full scan, or would it benefit from the sort key? Also, what would the results be if I used a GSI on user_city?

pseudocode: fetch all user_id that have user_city = "abc"
If your partition key is unique, you don't need a sort key, nor does having one provide any benefit. In fact, having one is a bad idea, because then your user_id doesn't have to be unique. You'd also have to use Query() to return a user's information with just the user_id; GetItem() would need both user_id and user_city.
Simply define the table with user_id as the primary key.
Then create a GSI with a partition key of user_city.
You don't even need a sort key on the GSI unless you want the data returned in a particular order. Perhaps user_id or perhaps user_name.
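For illustration, the table and GSI might be declared and queried with boto3 like this (table and index names are assumptions):

import boto3

client = boto3.client("dynamodb")

# Table keyed on user_id alone, plus a GSI partitioned on user_city.
client.create_table(
    TableName="user",
    AttributeDefinitions=[
        {"AttributeName": "user_id", "AttributeType": "S"},
        {"AttributeName": "user_city", "AttributeType": "S"},
    ],
    KeySchema=[{"AttributeName": "user_id", "KeyType": "HASH"}],
    GlobalSecondaryIndexes=[
        {
            "IndexName": "user_city-index",
            "KeySchema": [{"AttributeName": "user_city", "KeyType": "HASH"}],
            "Projection": {"ProjectionType": "KEYS_ONLY"},
        }
    ],
    BillingMode="PAY_PER_REQUEST",
)

# fetch all user_id that have user_city = "abc"
resp = client.query(
    TableName="user",
    IndexName="user_city-index",
    KeyConditionExpression="user_city = :c",
    ExpressionAttributeValues={":c": {"S": "abc"}},
)
user_ids = [item["user_id"]["S"] for item in resp["Items"]]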
I have the following fields in RDS:
id auto increment
creationDate datetime
status string
reason string
description string
customerID string
There are multiple records for the same customerID + creationDate, so I can't use creationDate as the sort key. A combination of customerID and status won't work either, since a customer can have duplicate records with the same status. And if I use the id field, I can't auto-increment it, since DynamoDB doesn't support that. What are my options here? What should my DynamoDB table look like?
The key to DynamoDB is knowing your access patterns. You haven't stated how you plan to query the data, so I can't advise on the overall design, but here is what you can do in order to have a unique primary key.
Do you really need auto-incrementing IDs? If not, consider using a UUID for all new data. Then you could use the ID field as the partition key; you could also use customerId as the partition key and id as the sort key.
If you must have auto-incrementing IDs, then you should store your creationDate in DynamoDB as an ISO 8601 string. Then, you can append a random UUID to the end of creationDate to avoid key collisions. This will allow you to use customerId and creationDate as the primary key, and you can still query using the date (but instead of checking for equality, you should use the begins_with function), as sketched below.
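A sketch of that second option in Python with boto3 (the table name and the exact attribute set are placeholders):

import uuid
import boto3
from datetime import datetime, timezone
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("customer_records")  # hypothetical name

def put_record(customer_id, status, reason, description):
    # ISO 8601 timestamp + "#" + random UUID keeps the sort key unique
    # even when two records share the same customerID and timestamp.
    creation_date = datetime.now(timezone.utc).isoformat() + "#" + str(uuid.uuid4())
    table.put_item(Item={
        "customerID": customer_id,
        "creationDate": creation_date,
        "status": status,
        "reason": reason,
        "description": description,
    })

# Query a customer's records for a given day by date prefix instead of equality.
resp = table.query(
    KeyConditionExpression=Key("customerID").eq("cust-123")
    & Key("creationDate").begins_with("2023-04-01"),
)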
Finally, you can introduce a new field specifically to ensure uniqueness. You could call it simply rangeKey, and it would be a randomly generated UUID that you could use with any other field as the partition key. You can still have your sequential ID field (and create a GSI for querying it, if you want).
I've presented 3 solutions, but they are really all the same: find a way to add a UUID to your primary key.
Say if I had a DynamoDB table:
UserId (hash): S
BookName (range): S
BorrowedTime: S
ReturnedTime: S
UserId is the primary key (hash), and I needed to set BookName as the sort key (range) because new items with the same UserId were overwriting previous ones.
How would I go about creating a query using DynamoDBMapper, but the fields being queried are the time fields (which are non-keys)? For instance, say if I wanted to return the UserId and BookName of any book borrowed over 2 weeks ago that hasn't been returned yet?
Do I need to setup a GSI on both BorrowedTime and ReturnedTime fields?
Yes, you can make a GSI using BorrowedTime and ReturnedTime, or you can use a scan instead of a query. With a scan you don't need a GSI, but scan operations read the whole table, so they are not recommended for large tables or frequent use.
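The question uses DynamoDBMapper (Java), but as a rough illustration in Python with boto3 instead, the scan variant might look like this (the table name, storing times as ISO 8601 strings, and unreturned books simply lacking the ReturnedTime attribute are all assumptions):

import boto3
from datetime import datetime, timedelta, timezone
from boto3.dynamodb.conditions import Attr

table = boto3.resource("dynamodb").Table("borrowed_books")  # hypothetical name

cutoff = (datetime.now(timezone.utc) - timedelta(weeks=2)).isoformat()

# A scan reads every item and applies the filter afterwards, which is why
# a GSI is preferable on large or frequently queried tables.
resp = table.scan(
    FilterExpression=Attr("BorrowedTime").lt(cutoff)
    & Attr("ReturnedTime").not_exists(),
    ProjectionExpression="UserId, BookName",
)
overdue = resp["Items"]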
Our schema has a USER table...
USER(
userId,
firstname,
lastname,
email)
and we want to ensure all user's have unique email addresses. Is it possible to create a unique index in VoltDB to enforce this constraint?
VoltDB supports primary key indexes (which are always unique) as well as secondary indexes that can be defined as unique.
For your particular table you have two choices to enforce uniqueness on the email column:
Define the USER table as replicated.
Partition the USER table on the email column.
If you create a unique index on email and partition the table on userId then the uniqueness enforcement of the email column will be within individual partitions.
VoltDB provides implicit indexes for primary keys. For example, if you assign userID as a primary key, then userID will be unique (because of VoltDB's implicit index on the primary key), but to make the email column unique you have to explicitly assign the UNIQUE constraint on the email column.
Similarly, if you partition the table on the userID column, then to declare email as unique you should explicitly assign the ASSUMEUNIQUE constraint on the email column, since a plain UNIQUE constraint is only enforced within each partition.