I posted a similar question over on the Adobe Community forums, but it was suggested to ask over here as well.
I'm trying to cache distinct queries associated with a particular database, and need to be able to flush all of the queries for that database while leaving other cached queries intact. So I figured I'd take advantage of ColdFusion's ehcache capabilities. I created a specific cache region to use for queries from this particular database, so I can use cacheRemoveAll(myRegionName) to flush those stored queries.
Since I need each distinct query to be cached and retrievable easily, I figured I'd hash the query parameters into a unique string to use as the cache key for each query. Here's the approach I've tried so far (a sketch follows the steps):
Create a Struct containing key value pairs of the parameters (parameter name, parameter value).
Convert the Struct to a String using SerializeJSON().
Hash the String using Hash().
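For illustration, here's a minimal sketch of that serialize-then-hash idea, shown in Python (the CFML version would pair SerializeJSON() with Hash() as described). One detail worth noting: the serialization needs a deterministic key order, or identical parameter sets could produce different keys.

    import hashlib
    import json

    def make_cache_key(params: dict) -> str:
        # Sort the keys so the same parameter set always serializes to the
        # same string; an unstable key order would silently defeat the cache.
        canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
        return hashlib.md5(canonical.encode("utf-8")).hexdigest()

    key = make_cache_key({"customerId": 42, "status": "active"})
    # -> a 32-character hex digest to use as the cache id within the region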
Does this approach make sense? I'm wondering how others have approached cache key generation. Also, is the "MD5" algorithm adequate for this purpose, and will it guarantee unique key generation, or do I need to use "SHA"?
UPDATE: use the cacheRegion attribute introduced in CF10!
http://help.adobe.com/en_US/ColdFusion/10.0/CFMLRef/WSc3ff6d0ea77859461172e0811cbec22c24-7fae.html
Then all you need to do is specify cachedAfter or cachedWithin, and forget about how to generate unique keys. CF will do it for you by 'hashing':
query "Name"
SQL statement
Datasource
Username and
password
DBTYPE
reference: http://www.coldfusionmuse.com/index.cfm/2010/9/19/safe.caching
I think this would be the easiest, unless you really need to fetch a specific query by key; in that case you can feed your own hash using cacheID, another new attribute introduced in CF10.
Our team has started to use AWS and one of our projects will require storing approval statuses of various recommendations in a table.
There are various things that identify a single recommendation, let's say they're : State, ApplicationDate, LocationID, and Phase. And then a bunch of attributes corresponding to the recommendation (title, volume, etc. etc.)
The use case will often require grabbing all entries for a given State and ApplicationDate (and then we will look at all the LocationId and Phase items that correspond to it) for review from a UI. Items are added to the table one at a time for a given State, ApplicationDate, LocationId, and Phase, and updated frequently.
A dev with a little more AWS experience mentioned we should probably use State+ApplicationDate as the partition key and LocationId+Phase as the sort key; these two pieces combined would make the primary key. I generally understand this, but how does that work if we start getting multiple recommendations for the same primary key? I figure we're either OK with just overwriting what was previously there, or we have to add some other attribute so we can write a recommendation for the same State+ApplicationDate/LocationId+Phase multiple times and retrieve all previous values if we need to. But that would require adding something to the primary key, right? Would that be something like adding a unique value to the sort key? Or, for example, if we need to track status and want to record different values at different statuses, would we just add status to the sort key?
Does this sound like a reasonable approach, or should I be exploring a different AWS offering for storing this data?
Use a time-based id property, such as a ULID or KSUID. This will provide randomness to avoid overwriting data, while also providing time-based sorting of your data when used as part of a sort key.
Because the id value is random, you will want to add it to your sort key for the table or index where you perform your list operations, and reserve the pk for known values that can be specified exactly.
It sounds like the 'State' is a value that can change. You can't update an item's key attributes on the table, so it is more common to use these attributes in a key for a GSI if they are needed to list data.
Given the above, an alternative design is to use the LocationId as the pk, the random id value as the sk, and a GSI with 'State' as the pk and the random id as the sk. Or, if you want to list the items by State -> Phase -> date, the GSI sk could be a concatenation of the Phase and the id property. This pattern also gives you another list mechanism using the LocationId + the timestamp of the recommendation's create time.
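To make this concrete, here's a minimal boto3 sketch of that layout (the table name, key names, and attribute values are all illustrative assumptions):

    import secrets
    import time

    import boto3

    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table("Recommendations")  # hypothetical table name

    def time_sortable_id() -> str:
        # Millisecond timestamp prefix sorts chronologically as a string;
        # the random suffix avoids collisions (same idea as a ULID/KSUID,
        # which a library would give you ready-made).
        return f"{int(time.time() * 1000):013d}-{secrets.token_hex(8)}"

    rec_id = time_sortable_id()
    table.put_item(
        Item={
            "pk": "LOC#12345",              # LocationId as the table's pk
            "sk": rec_id,                   # random, time-sortable sk
            "gsi_pk": "STATE#OH",           # GSI pk: the (mutable) State
            "gsi_sk": f"PHASE#2#{rec_id}",  # GSI sk: Phase + id, for State -> Phase -> date listing
            "title": "Example recommendation",
            "volume": 10,
        }
    )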
In my DynamoDB table named users, I need a unique identifier, which is easy for users to remember.
In an RDBMS I can use an auto increment id to meet the requirement.
As there is no way to have auto increment id in DynamoDB, is there a way to meet this requirement?
If I keep the last used id in another table (lastIdTable), retrieve it before adding a new document, increment it, and save the updated number in both tables (lastIdTable and users), that will be very inefficient.
UPDATE
Please note that there's no way of using an existing attribute or getting users input for this purpose.
Since it seems you must create a memorable userId without any information about the user, I’d recommend that you create a random phrase of 2-4 simple words from a standard dictionary.
For example, you might generate the phrase correct horse battery staple. (I know this is a userId and not a password, but the memorability consideration still applies.)
Whether you use a random number (which has similar memorability to a sequential number) or a random phrase (which I think is much more memorable), you will need to do a conditional write with the condition that the ID does not already exist, and if it does exist, you should generate a new ID and try again.
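A minimal boto3 sketch of that generate-and-retry loop (the table name, word list, and attribute layout are assumptions for illustration):

    import random

    import boto3
    from botocore.exceptions import ClientError

    dynamodb = boto3.resource("dynamodb")
    users = dynamodb.Table("users")

    # Placeholder dictionary; a real one would be a few thousand simple words.
    WORDS = ["correct", "horse", "battery", "staple", "purple", "window"]

    def create_user(attributes: dict, max_attempts: int = 5) -> str:
        for _ in range(max_attempts):
            user_id = "-".join(random.sample(WORDS, 3))  # e.g. "horse-purple-staple"
            try:
                users.put_item(
                    Item={"userId": user_id, **attributes},
                    # Reject the write if another user already holds this id.
                    ConditionExpression="attribute_not_exists(userId)",
                )
                return user_id
            except ClientError as e:
                if e.response["Error"]["Code"] != "ConditionalCheckFailedException":
                    raise
                # Collision: fall through and try a freshly generated phrase.
        raise RuntimeError("could not find a free userId")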
An email address seems the best choice...
Either as a partition key, or use a GUID as the partition key and have a Global Secondary Index over email address.
Or as Matthew suggested in a comment, let the users pick a user name.
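For illustration, a sketch of the GUID-plus-GSI layout as a boto3 table definition (the table and index names are assumptions):

    import boto3

    client = boto3.client("dynamodb")

    client.create_table(
        TableName="users",
        AttributeDefinitions=[
            {"AttributeName": "userId", "AttributeType": "S"},  # the GUID
            {"AttributeName": "email", "AttributeType": "S"},
        ],
        KeySchema=[{"AttributeName": "userId", "KeyType": "HASH"}],
        GlobalSecondaryIndexes=[
            {
                "IndexName": "email-index",  # lookup/uniqueness check by email
                "KeySchema": [{"AttributeName": "email", "KeyType": "HASH"}],
                "Projection": {"ProjectionType": "KEYS_ONLY"},
            }
        ],
        BillingMode="PAY_PER_REQUEST",  # no provisioned throughput needed
    )

With this layout you get stable GUID primary keys while still being able to look a user up by email through the index.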
Docker container naming strategy might give you some idea. https://github.com/moby/moby/blob/master/pkg/namesgenerator/names-generator.go
It will result in unique (though limited in number) yet human-friendly names.
Examples
awesome_einstein
nasty_weinstein
perv_epstein
A similar one: https://github.com/jjmontesl/codenamize
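Under the hood the pattern is just two fixed word lists combined at random; a minimal sketch (the abbreviated lists here are placeholders, Docker's real lists are much longer):

    import random

    # Placeholder lists; Docker's generator draws from longer lists of
    # adjectives and surnames of notable scientists and hackers.
    ADJECTIVES = ["awesome", "brave", "clever", "eager"]
    SURNAMES = ["einstein", "curie", "turing", "hopper"]

    def generate_name() -> str:
        # e.g. "clever_turing"
        return f"{random.choice(ADJECTIVES)}_{random.choice(SURNAMES)}"

As with any random scheme, you'd still pair it with a conditional write (as in the previous answer) to handle collisions.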
I'm currently facing some questions regarding my database design. I'm developing an API which lets users do the following:
Create an Account (1 User owns 1 Account)
Create a Profile (1 Account owns 1-n Profiles)
Let a Profile upload 2 types of items (1 Profile owns 0-n Items; the items differ in type and purpose)
Calling the API methods triggers AWS Lambda to perform the requested operations in the DynamoDB tables.
My current plan looks like this:
It should be possible to query items by specifying a time frame and the Profile ID. But I think my design completely defeats the purpose of DynamoDB. The AWS documentation says that a well-designed application should generally require only one table.
What would be a good way to realise this architecture in one table?
Are there any drawbacks to using the current design?
What would you specify as Primary/Partition/sort key/secondary indexes in both the current design and a one-table-approach?
I’m going to give this answer assuming that you need to be able to do the following queries.
Given an Account, find all profiles
Given a Profile, find all Items
Given a Profile and a specific ItemType, find all Items
Given an Item, find the owning Profile
Given a Profile, find the owning account
One of the beauties of DynamoDB (and also a bane, perhaps) is that it is mostly schema-less. You need to have the mandatory Primary Key attributes for every item in the table, but all of the other attributes can be anything you like. In order to have a DynamoDB design with only one table, you usually need to get used to the idea of having mixed types of objects in the same table.
That being said, here’s a possible schema for your use case. My suggestion assumes that you are using something like UUIDs for your identifiers.
The partition key is a field that is simply called pkey (or whatever you want). We’ll also call the sort key skey (but again, it doesn’t really matter). Now, for an Account, the value of pkey is Account-{{uuid}} and the value of skey would be the same. For a Profile, the pkey value is also Account-{{uuid}}, but the skey value is Profile-{{uuid}}. Finally, for an Item, the pkey is Profile-{{uuid}} and the skey is Item-{{type}}-{{uuid}}. For all of the attributes of an item, don’t worry about it, just use whatever attributes you want to use.
Since the “parent” object is always the partition key, you can get any of the “child” objects simply by querying for the ID of the parent. For example, your key condition expression to get all the ‘ItemType2’s for a Profile would be
pkey = "Profile-{{uuid}}" AND begins_with(skey, "Item-Type2")
In this schema, your GSI has the same keys as the table, but reversed. You can query the GSI for ‘Item-{{type}}-{{uuid}}’ to get the owning Profile, and similarly you can query the GSI for a Profile key to get the owning Account.
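For illustration, here's roughly what those two queries could look like in boto3 (the table name, GSI name, and uuid values are placeholder assumptions):

    import boto3
    from boto3.dynamodb.conditions import Key

    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table("AppTable")  # hypothetical single-table name

    profile_key = "Profile-1f2e3d4c-...-example"  # hypothetical Profile uuid

    # All ItemType2 children of that Profile (query on the base table).
    children = table.query(
        KeyConditionExpression=Key("pkey").eq(profile_key)
        & Key("skey").begins_with("Item-Type2")
    )["Items"]

    # Reverse lookup: find the owning Profile of a known Item via the GSI
    # whose keys are the table's keys reversed (index name is hypothetical).
    item_key = "Item-Type2-9a8b7c6d-...-example"  # hypothetical Item key
    owner = table.query(
        IndexName="skey-pkey-index",
        KeyConditionExpression=Key("skey").eq(item_key),
    )["Items"]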
What I have illustrated here is the adjacency list pattern. The DynamoDB documentation also has an article describing how to use composite sort keys for hierarchical data, which would also work for your data; depending on your expected queries, it might be a better fit than the adjacency list.
You don’t have to put everything in a single table. Yes, DynamoDB recommends it, but it is far more important to make sure that your application is correct and maintainable. If having multiple tables means it’s easier to write a defect free application, then use multiple tables.
User has an email address and a display name.
Both of these must be unique.
Both of these must be updatable, as long as the new value is not already in use.
A User table will exist with additional non-key attributes and a guid ID.
How should I model this to support an efficient query that checks whether an email address or display name is already in use?
Should I create a table with the guid as the hash key, no range key, and two separate GSIs, one for email and one for display name (each being that index's key)? Each index would also carry the guid id of the user. Or should these be completely separate tables, or something else?
Thoughts, is there a better way?
Thanks.
There are three ways I can think of to design this:
As you mentioned: a table with the guid as the key and two separate GSIs, one for email and the other for name (a sketch of the uniqueness check for this layout follows the three options).
You have stated that both fields must be unique, so you could potentially make one of them the hash key and create a GSI for the other. (This will run into problems because, as you mention, you need to update Email & Name as well; to do that you would have to delete the old record and add a new record with the same attributes and the updated hash key.)
The advantage of this is that you pay less, as there is only one GSI compared to the two in #1.
Another option is to use CloudSearch. Your DynamoDB table can be integrated with CloudSearch; in this option you can simply create a table with the guid, with no need for any GSI. Whenever you want to search, you query CloudSearch to get the result.
One more advantage of CloudSearch is that you can query on any attribute of the table and apply different filters.
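For illustration, here's roughly what the existence check in #1 could look like with boto3 (the table and index names are assumptions):

    import boto3
    from boto3.dynamodb.conditions import Key

    dynamodb = boto3.resource("dynamodb")
    users = dynamodb.Table("User")  # hypothetical table name

    def display_name_in_use(name: str) -> bool:
        resp = users.query(
            IndexName="displayName-index",  # GSI keyed on display name
            KeyConditionExpression=Key("displayName").eq(name),
            Limit=1,  # existence is all we need
        )
        return resp["Count"] > 0

The email check would be the same query against the email GSI.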
One thing you need to check is the price difference between #2 and #3; go with whichever is better suited in terms of price and functionality.
If you implement this in another way, feel free to share it.
Hope that helps
I've been going through the AWS DynamoDB docs and cannot figure out the difference between batchGetItem() and Query().
My use case: I have a table with Id as the primary hash key, and Name and Marks as attributes.
I would like to perform a batch query that returns a list of names and marks, given a list of Ids (the primary keys).
Should I use batchGetItem() or Query()?
BatchGetItem: Allows you to parallelize "GetItem" requests for languages that don't support parallelism (e.g., JavaScript). This includes retrieving items from different tables (though it doesn't support indexes).
Query: Allows you to page through tables with a Hash-Range schema (where you'll have multiple results associated with a hash key) and allows you to retrieve items from the indexes on your table. Note that you can also add a condition on the range key in your KeyConditions and conditions on any non-primary-key attribute in your QueryFilter.
It seems like your use case calls for a BatchGetItem request, as you are trying to retrieve items from your base table by way of their hash key.
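For illustration, a minimal boto3 sketch of that BatchGetItem call (the table name "Students" is an assumption; Id, Name, and Marks come from the question):

    import boto3

    dynamodb = boto3.resource("dynamodb")

    resp = dynamodb.batch_get_item(
        RequestItems={
            "Students": {  # hypothetical table; Id is its hash key
                "Keys": [{"Id": "1"}, {"Id": "2"}, {"Id": "3"}],
                "ProjectionExpression": "Id, #n, Marks",
                # "Name" is a DynamoDB reserved word, so it needs an alias.
                "ExpressionAttributeNames": {"#n": "Name"},
            }
        }
    )
    names_and_marks = resp["Responses"]["Students"]

Note that BatchGetItem caps out at 100 keys (or 16 MB) per request, so a larger Id list needs to be chunked.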
Hope that helps!