I'm working on a Django app for keeping track of collections (coins, cards, gems, stamps, cars, whatever). You can have multiple collections, each collection can have sets (Pirates cards, Cardinals cards, etc.) and then of course the individual items in each collection/set. Each item can contain multiple pictures, a name, and description, but here's where I'm unsure how to proceed. Each collection will need it's own set of values, or fields, that the user will need to determine (condition, dimensions in the appropriate units, coin thickness, model number, etc). How can I make custom fields such that the user can name the field and choose the input type (text, numbers, dropdown w/choices) and those fields will show up to be entered on each item within that collection?
This would be called an Entity-Attribute-Value (EAV) model and it is quite tricky to implement in the way you want. You have to anticipate all sorts of issues with user input, how to validate field types, what happens when the user wants to change fields, etc. I would start by reading the issues raised in that question and think about ways that you could modify your schema to avoid letting users define their own metadata at runtime. Are there some fields that could be common to all collections (like condition, dimensions, model number)? How tolerant do you want to be of data type issues, and will users be allowed to change field types after creation?
The more thought you put into implementation, the more issues you can avoid down the road.
Related
I have several times come across a want to have a Django model field that comprises multiple database columns, and am wondering what the most Django way to do it would be.
Three use cases come specifically to mind.
I want to provide a field that wraps another field, keeping record of whether the wrapped field has been set or not. A use case for this particular field would be for dynamic configuration. A new configuration value is introduced, and a view marks itself as dependent upon a configuration value, redirecting if the value isn't set. Storing whether it's been set yet or not allows for easy indefinite caching of the state. This also lets the configuration value itself be not-nullable, and the application can ignore any value it might have when unset.
I want to provide a money field that combines a decimal (or integer) value, and a currency.
I want to provide a file field with a link to some manner of access rule to determine whether the request should include it/a request for it should succeed.
For each of the use cases, there exists a workaround, that in each case seems less elegant.
Define the configuration fields as nullable. This is undesirable for a few reasons: it removes the validity of NULL as a value for the configuration itself, so tristates and other use valid cases for NULL have to become a pair of fields or a different data type, or an edge case; null=True on the fields allows them to be set back to None in modelforms and the admin without writing a custom FormField for them every time; and every nullable column in a database is arguably bad design.
Define the field as a subclass of DecimalField with an argument accepting a string, and use that to contribute another field to the model. (This is what django-money does). Again, this is undesirable: fields are appearing "as if by magic" on the model; and configuring the currency field becomes not obvious.
Define the combined file+rule field instead as an entire model, and one-to-one to it from the model where you want to have the field. This is a solution to all use cases, but again comes with downsides: there's an extra JOIN required for every instance of the field - one can imagine a User with profile_picture, cv, passport, private_key etc.; there's an implicit requirement to .select_related(*fields) on every query that would ever want to access the fields; and the layout of the related model is going to have cold data interleaved with hot data all over the place given that it's reused everywhere.
In addition to solution 3., there's also the option to define a mixin factory that produces the multiple fields with matching names and whatever desired properties and methods. Again this isn't perfect because the user ends up with fields being defined in the model body, but also above that in the inheritance list.
I think the main reason this keeps sending me in circles is because custom Django model fields are always defined in terms of a single base field, because it's done by inheritance.
What is the accepted way to achieve this end?
I'm trying to implement personalization and having problems with Items schema.
Imagine I'm Amazon, I've products their brands and their categories. In what kind of Items schema should I include this information?
Should I include brand name as string as categorical field? Should I rather include brand ID as string or numeric? or should I include both?
What about categories? I've the same questions.
Metadata Fields Metadata includes string or non-string fields that
aren't required or don't use a reserved keyword. Metadata schemas have
the following restrictions:
Users and Items schemas require at least one metadata field,
Users and Interactions datasets can contain up to five metadata
fields. An Items dataset can contain up to 50 metadata fields.
If you add your own metadata field of type string, it must include the
categorical attribute. Otherwise, Amazon Personalize won't use the
field when training a model.
https://docs.aws.amazon.com/personalize/latest/dg/how-it-works-dataset-schema.html
There are simply 2 ways to include your metadata in Items/Users datasets:
If it can be represented as a number value, then provide the actual value if it makes sense.
If it can be represented as string, then provide the string value and make sure, that categorical is set to true.
But let's take a look into "Why does they need me, to categorize my strings metadata?". The answer is pretty simple.
Let's start with an example.
If you would have Items as Amazon.com products and you would like to provide rates metadata field, then:
You could take all of the rates including the full review text sent by clients and simply put it as metadata field.
You can take just stars rating, calculate the average and put it as metadata field.
Probably the second one is making more sense in general. Having random, long reviews of product as metadata, pretty much changes nothing. Personalize doesn't understands if the review itself is good or bad, or if the author also recommends another product, so pretty much it doesn't really add anything to the recommendations.
However if you simply "cut" your dataset and calculate the average rating, like in the 2. point, then it makes a lot more sense. Maybe some of our customers like crappy products? Maybe they want to buy them, because they are famous YouTubers and they create videos about that? Based on their previous interactions and much more, Personalize will be able to perform just slightly better, because now it knows, that this product has rating of 5/5 or 3/5.
I wanted to show you, that for some cases, providing Items metadata as string makes no sense. That's why your string metadata must be categorical. It means, that it should be finite set of values, so it adds some knowledge for Personalize about given Item and why some of people might want to interact with it.
Going back to your question:
Should I include brand name as string as categorical field? Should I rather include brand ID as string or numeric? or should I include both?
I would simply go with brand ID as string. You could also go with brand name, but probably single brand can be renamed, when it's still the same brand, so picking up the ID would be more constant. Also two different brands could have the same names, because they are present on different markets, so picking up the ID solves that.
The "categorical": true switch in your schema just tells Personalize:
Hey, do you see that string field? It's categorised, finite set of values. If you train a model for me, please include this one during the training, it's important!
And as it's said in documentation, if you will provide string metadata field, which is not marked as categorical, then Personalize will "think" that:
Hmm.. this field is a string, it has pretty random values and it's not marked as categorical. It's probably just a leftover from Items export job. Let's ignore that.
I want to give users the ability to create new data models,and specify the relations between these models.
My use case is a world simulator. Let's say it has Places, Characters, Incidents, etc models. The user should be able to create additional models, and also specify relations between different models. E.g. We could create subgroups of Characters called Heroes and Villains. We may decide that a Character cannot be both a Hero and a Villain. There may also be many-to-many relationships, e.g. a Character can be in many Incidents, and an Incident can involve many Characters.
How do I do this in django? Is it even possible or feasible for users to be changing the actual data models? Or is there some other way to do it implement it
How do people deal with index data (the data usually shown on index pages, like a customer list) -vs- the model detail data?
When somebody goes to the customer/index route -- they only need access to a small subset of the full customer resource. Since I am dealing with legacy data, my customer model has > 10 relationships. It seems wasteful to have the api return a complete and full customer representation for every customer just to render a list/select/index view.
I know those relationships are somewhat lazy-loaded, but it still takes effort on the backend to pull all those relationships in. For some relationships (such as customer->invoices) this could be a large list of ids.
I feel answers to this can be very opinionated. But my two cents:
The API you are drawing on for your data should have an end-point to fetch the subset of data you're interested in, e.g. /api/mini-customer vs /api/customer.
You can then either define two separate models (one to represent the model in the list and one to represent the detailed view), or simply populate the original model with the subset of data and merge the extra data in at a later point.
That said, I've also seen plenty of cases such as the one you describe, where you load all data initially and just display the subset to begin with. If it's reasonable that the data will eventually be used and your page-load constraints can handle it, then this can be an acceptable approach.
I have a field in a model that I want users to feel like they can write an arbitrary amount of text in. Django provides a CharField and a TextField in the models. I assume that the difference is that one of them is a char(max_length) and the other is a varchar internally.
I am tempted to use the TextField, but since it doesn't respect max_length, I am somewhat wary of someone dumping loads of data into it and DOSing my server. How should I deal with this?
Thanks!
Fields in model only represent the way data is stored in database.
You can very easily enforce maximum length in form which will validate users' input. Like this.
class InputForm(forms.Form):
text = forms.CharField(max_length=16384, widget=forms.TextArea)
...
This will make sure the maximum length user can successfully enter is 16k.