I'm referring to :db/unique :db.unique/identity
(as opposed to :db.unique/value)
Upsert, to me in my naivety, sounds a bit scary: if I try to insert a new record which has the same value (for a field declared unique) as an existing record, my feeling is that I want that to fail.
What I'd do if a field is declared unique is, when creating or updating another record, check whether that value is taken, and if it is, give the user feedback that it's taken.
What am I missing about upsert, and when/why is it useful?
A search revealed the following (not in relation to datomic)
"upsert helps avoid the creation of duplicate records and can save you time (if inserting a group) because you don't have to determine which records exist first".
What I don't understand is that the Datomic docs sometimes suggest using this, but I don't see why it's better in any way. Saving time at the cost of allowing collisions?
e.g., if I have a user registration system, I definitely do not want to "upsert" on the email the user submits. I can't think of a case when it would be useful - other than the one quoted above, if you had a large collection of things and you "didn't have time" to check whether they existed first.
The Datomic property :db/unique can be used with :db.unique/identity or :db.unique/value.
If it's used with :db.unique/identity, it will upsert: a transaction asserting an already-existing value resolves to the existing entity, so the new datoms update that entity instead of creating a duplicate.
If it's used with :db.unique/value, it will conflict: the transaction fails.
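To make the difference concrete, here is a deliberately simplified, Datomic-free sketch in Python; a plain dict stands in for the database and all names are made up:

```python
# Simplified model of the two uniqueness behaviors, keyed on the unique
# attribute (here: email). Real Datomic resolves entities by the unique
# value; this dict-based sketch only illustrates the contrast.

class ConflictError(Exception):
    pass

store = {}  # email -> record

def upsert(record):
    """:db.unique/identity semantics: new facts merge onto the existing record."""
    existing = store.get(record["email"], {})
    store[record["email"]] = {**existing, **record}

def insert_or_fail(record):
    """:db.unique/value semantics: a second assertion of the value is rejected."""
    if record["email"] in store:
        raise ConflictError(f"email {record['email']!r} is already taken")
    store[record["email"]] = record

upsert({"email": "a@example.com", "name": "Alice"})
upsert({"email": "a@example.com", "age": 30})   # merges; no duplicate created
insert_or_fail({"email": "a@example.com"})      # raises ConflictError
```

So upsert tends to be useful exactly when the unique attribute is an external identifier (an email, an ISBN, an import key) and you want repeated loads of the same data to converge on one entity rather than fail, echoing the quote above about inserting a group without checking for existing records first.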
The DynamoDB best practice documentation has this line:
You should maintain as few tables as possible in a DynamoDB application. Most well designed applications require only one table.
It's the last line that confuses me the most.
Take an example photo storage application. Does this mean that I should store user accounts (account ID, password, email) and photos (owner ID, photo location, metadata) in the same table?
If so I assume the primary key should be the account/owner ID, and the sort key would be the type of object it is (e.g. account or photo).
Should I be using one table like this instead of two tables (one for accounts, one for photos)?
It is generally recommended to use as few tables as possible, and very often a single table unless you have a really good reason to use more than one. Chances are you won't have a good reason to use more than one - except for old habits.
It seems counter-intuitive if you are coming from a traditional database background (like me), but it is in fact best practice.
The primary key could become a combination of the 'row'/object type and another value, stored in a single field, e.g. 'account#12345' for an account object with unique id 12345 and 'photo#67890' for a photo object with id 67890.
If you are looking up an account by its id number, you would query with the 'account#' prefix, and if you were looking for a photo, you would use the 'photo#' prefix. This is a very simple example - your design may vary.
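For illustration, a minimal sketch of that pattern with boto3 (the table, key, and attribute names here are made up, not from the question):

```python
import boto3
from boto3.dynamodb.conditions import Key

# One table holds both object types; the sort key encodes the type.
table = boto3.resource("dynamodb").Table("photo-app")

table.put_item(Item={"pk": "account#12345", "sk": "profile",
                     "email": "user@example.com"})
table.put_item(Item={"pk": "account#12345", "sk": "photo#67890",
                     "location": "s3://bucket/67890.jpg"})

# Fetch the account itself...
account = table.get_item(Key={"pk": "account#12345", "sk": "profile"})["Item"]

# ...or all photos belonging to that account, via the key prefix.
photos = table.query(
    KeyConditionExpression=Key("pk").eq("account#12345")
                           & Key("sk").begins_with("photo#")
)["Items"]
```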
The video recommended in the first comment on your question is excellent - watch it at 0.75 speed or slower, and watch it a few times.
The short answer is yes. But the way it would be designed would be highly specific to how your application interacts with the database.
I highly recommend that anyone still confused about how to design DynamoDB/NoSQL tables watch this video from re:Invent.
In my DynamoDB table named users, I need a unique identifier that is easy for users to remember.
In an RDBMS I can use an auto-increment id to meet the requirement.
As there is no way to have auto increment id in DynamoDB, is there a way to meet this requirement?
If I keep the last used id in another table (lastIdTable), retrieve it before adding a new document, increment that number, and save the updated number in both tables (lastIdTable and users), that will be very inefficient.
UPDATE
Please note that there's no way of using an existing attribute or getting user input for this purpose.
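(For what it's worth, the read-increment-write sequence described in the question can at least be collapsed into a single atomic call, since UpdateItem's ADD action increments in place. A sketch with made-up table and attribute names; note it still funnels every insert through one hot item, so it is not generally recommended at scale:)

```python
import boto3

table = boto3.resource("dynamodb").Table("counters")

def next_user_id():
    # ADD atomically increments (and creates the attribute if absent).
    resp = table.update_item(
        Key={"pk": "userId"},
        UpdateExpression="ADD #v :one",
        ExpressionAttributeNames={"#v": "value"},  # 'value' is a reserved word
        ExpressionAttributeValues={":one": 1},
        ReturnValues="UPDATED_NEW",
    )
    return int(resp["Attributes"]["value"])
```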
Since it seems you must create a memorable userId without any information about the user, I’d recommend that you create a random phrase of 2-4 simple words from a standard dictionary.
For example, you might generate the phrase correct horse battery staple. (I know this is a userId and not a password, but the memorability consideration still applies.)
Whether you use a random number (which has similar memorability to a sequential number) or a random phrase (which I think is much more memorable), you will need to do a conditional write with the condition that the ID does not already exist. If it does exist, you should generate a new ID and try again.
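A minimal sketch of that conditional write with boto3 (table and attribute names are made up): the ConditionExpression makes the put fail when an item with that userId already exists, so a collision can be detected and retried.

```python
import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("users")

def create_user(item, generate_id, max_attempts=5):
    for _ in range(max_attempts):
        candidate = generate_id()  # e.g. "correct-horse-battery-staple"
        try:
            table.put_item(
                Item={**item, "userId": candidate},
                ConditionExpression="attribute_not_exists(userId)",
            )
            return candidate
        except ClientError as e:
            if e.response["Error"]["Code"] != "ConditionalCheckFailedException":
                raise  # a different failure; don't swallow it
    raise RuntimeError("could not find a free userId")
```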
Email address seems the best choice...
Either as a partition key, or use a GUID as the partition key and have a Global Secondary Index over email address.
Or as Matthew suggested in a comment, let the users pick a user name.
Docker's container naming strategy might give you some ideas: https://github.com/moby/moby/blob/master/pkg/namesgenerator/names-generator.go
It results in names that are unique (though drawn from a limited pool) yet human-friendly.
Examples
awesome_einstein
nasty_weinstein
perv_epstein
A similar one: https://github.com/jjmontesl/codenamize
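A rough sketch of the same idea in Python (the word lists here are tiny, purely for illustration; the generators linked above ship much larger ones):

```python
import random

ADJECTIVES = ["awesome", "brave", "clever", "eager", "gentle"]
SURNAMES = ["einstein", "curie", "turing", "lovelace", "hopper"]

def random_name():
    return f"{random.choice(ADJECTIVES)}_{random.choice(SURNAMES)}"

print(random_name())  # e.g. "clever_turing"
```

Since the pool of names is finite, you would still pair this with a conditional write like the one shown earlier to catch collisions.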
I'm using an Oracle 12c database and want to test out one problem.
When carrying out a web service request, it returns an underlying ORA-02292 error naming the constraint (YYY.FK_L_TILSYNSOBJEKT_BEGRENSNING).
Here is SQL of the table with the constraint:
CONSTRAINT "FK_L_TILSYNSOBJEKT_BEGRENSNING" FOREIGN KEY ("BEGRENSNING")
REFERENCES "XXX"."BEGRENSNING" ("IDSTRING") DEFERRABLE INITIALLY DEFERRED ENABLE NOVALIDATE
The problem is that when I manually delete a row (with a valid IDSTRING present in both tables) from the parent table, it succeeds.
What causes it to behave this way? Is there any other info I should give?
Not sure if this helps anyone, since it was a fairly simple mistake, but I'll try to make it useful since people want an answer.
The keywords DEFERRABLE INITIALLY DEFERRED mean that the constraint is enforced at commit time, not when each statement runs. The alternative, INITIALLY IMMEDIATE, does the check right after you issue each statement, which makes bulk updates a bit slower (every statement in the transaction is checked individually), whereas with initial deferral the whole batch is validated once at commit; if a violation turns up, the whole bulk is rolled back without per-statement checking overhead and something can be done about it. Hence IMMEDIATE is used less often for bulk work than initial deferral.
Error ORA-02292, however, is raised only for DELETE statements; knowing that, it's a bit easier to debug which statement is responsible.
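For illustration, here is how the deferral shows up from client code, sketched in Python with the python-oracledb driver (connection details are placeholders; the table names follow the question): the DELETE itself succeeds, and the violation only surfaces when the transaction commits.

```python
import oracledb

# Placeholder connection details.
conn = oracledb.connect(user="xxx", password="secret", dsn="dbhost/service")
cur = conn.cursor()

# Succeeds even while child rows still reference this parent row,
# because the FK check is deferred.
cur.execute("DELETE FROM begrensning WHERE idstring = :1", ["some-id"])

try:
    conn.commit()  # the deferred constraint is checked here
except oracledb.DatabaseError as e:
    # Expect ORA-02091 (transaction rolled back) citing the ORA-02292.
    print("commit failed:", e)
```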
Background
In distributed systems, messages can arrive out of order. For example, if message A is sent at time T1 and message B is sent at time T2, there is a chance that B is received before A. This matters if, for example, A is a message such as "CustomerRegistered" and B is "CustomerUnregistered".
In other databases I'd typically write a tombstone if CustomerUnregistered is received for a customer that is not present in the database. I can then check if this tombstone exists when the CustomerRegistered message is received (and perhaps simply ignore this message depending on use case). I could of course do something similar with Datomic as well but I hope that maybe Datomic can help me so that I don't need to do this.
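For concreteness, the tombstone pattern I mean is roughly this (a database-agnostic sketch in Python; the stores and handler names are made up):

```python
customers = {}        # customer_id -> data
tombstones = set()    # unregistrations seen before the registration

def on_customer_unregistered(customer_id):
    if customer_id in customers:
        del customers[customer_id]
    else:
        # Arrived out of order: remember that the unregistration happened.
        tombstones.add(customer_id)

def on_customer_registered(customer_id, data):
    if customer_id in tombstones:
        # The registration arrived late; the customer already unregistered.
        tombstones.discard(customer_id)
        return
    customers[customer_id] = data
```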
One potential solution I'm thinking of is this:
Could you perhaps retract a non-existing customer entity (on CustomerUnregistered), and later, when CustomerRegistered is received, write the customer entity at a point in history before the retraction? It would be neat (I think) if :db/txInstant could be set to a timestamp carried in the message.
Question
How would one deal with this scenario in Datomic in an idiomatic way?
As a general principle, do not let your application code manipulate :db/txInstant. :db/txInstant represents the time at which you learned a fact, not the time at which it happened.
Maybe you should consider un-registration as adding a datom about a customer (e.g. via an instant-typed :customer/unregistered attribute) instead of retracting the datoms of that customer (which means: "forget that this customer existed").
However, if retracting the datoms of the customer is really the way you want to do things, I'd use a record which prevents the customer registration transaction from taking place (which I'd enforce via a transaction function).
I posted a similar question over on the Adobe Community forums, but it was suggested to ask over here as well.
I'm trying to cache distinct queries associated with a particular database, and need to be able to flush all of the queries for that database while leaving other cached queries intact. So I figured I'd take advantage of ColdFusion's ehcache capabilities. I created a specific cache region to use for queries from this particular database, so I can use cacheRemoveAll(myRegionName) to flush those stored queries.
Since I need each distinct query to be cached and retrievable easily, I figured I'd hash the query parameters into a unique string that I would use for the cache key for each query. Here's the approach I've tried so far:
Create a Struct containing key value pairs of the parameters (parameter name, parameter value).
Convert the Struct to a String using SerializeJSON().
Hash the String using Hash().
Does this approach make sense? I'm wondering how others have approached cache key generation. Also, is the "MD5" algorithm adequate for this purpose, and will it guarantee unique key generation, or do I need to use "SHA"?
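For illustration, the same recipe sketched in Python (names are made up; only the shape of the approach matters). One caveat in any language: the serialization step must be deterministic, so sort the keys first; otherwise two orderings of the same parameters would hash to two different cache keys. And no hash can strictly guarantee uniqueness, though an accidental MD5 collision between short parameter strings is vanishingly unlikely.

```python
import hashlib
import json

def cache_key(params: dict) -> str:
    # Sorting keys makes the serialization canonical, so equal parameter
    # sets always produce the same cache key.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()

print(cache_key({"userId": 42, "status": "active"}))
print(cache_key({"status": "active", "userId": 42}))  # same key
```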
UPDATE: use cacheRegion attribute introduced in CF10!
http://help.adobe.com/en_US/ColdFusion/10.0/CFMLRef/WSc3ff6d0ea77859461172e0811cbec22c24-7fae.html
Then all you need to do is to specify cachedAfter or cachedWithin, and forget about how to generate unique keys. CF will do it for you by 'hashing':
query "Name"
SQL statement
Datasource
Username and
password
DBTYPE
Reference: http://www.coldfusionmuse.com/index.cfm/2010/9/19/safe.caching
I think this would be the easiest, unless you really need to fetch a specific query by key; then you can feed in your own hash using cacheID, another new attribute introduced in CF10.