Designing sets of data and supporting class extension with an OO approach, in C++

I'm currently working on a project and a design question came up.
I have a class named Key which is composed of several Fields. Field is a base class, and derived classes such as Age, Name, etc. implement it. Inside the Key class there is a member that is an array of Fields, so it can hold different kinds of Fields.
class Key {
private:
    Field* fieldList;  // array holding this Key's Fields
};
I'm working on a team, and a design choice came up that I couldn't defend because I didn't know how to answer the following problem... or maybe there isn't one? I trust that you'll be able to open my mind on this.
The purpose of this Key class is to hold several fields. The class exists because I'm going to handle data of this kind:
(Name, Age....)
This is how I thought it would look once implemented:
Key myKey = Key();
Age newAge = Age(50);
myKey.add(newAge);
This is what the prototype of the add method of the Key class would look like:
void Key::add(Field);
As you may have assumed, since the Key class has an array of Fields, this method receives a Field, and since an Age is also a Field through inheritance, this works like a charm. The same can be said of the Name class and of other classes that could come up in the future.
This is the same idea as in a database, where rows hold the data and each column corresponds to one attribute, so every value in a column has the same type.
We would also like to compare 2 Keys by only one of their Fields, for example:
Let's say I have 2 Keys with this data:
(John, 50) <- myKey1
(Paul, 60) <- myKey2
My call to do this would look like this:
myKey1.compareTo(myKey2, 2)
This would tell me whether the 2nd attribute of myKey1 is greater than, equal to, or less than the 2nd attribute of myKey2.
There's a problem with this. When I used the add method, I added Fields of different types in arbitrary order, say Age first, then Name second, etc. My Key object appended them to its internal array in order of appearance.
So when I use the compareTo method, nothing guarantees that the 2nd element of both arrays holds the same kind of Field, say Name. If it doesn't, the method could end up comparing a Name with an Age, because internally Key only holds an array of Fields, which all have the same type as far as the Key class knows.
This was my approach, but what I couldn't answer is how to fix this problem.
Another member of my team proposed that we implement one method in the Key class for each of the existing fields, that is:
myKey.addAge(newAge);
myKey.addName(newName);
Inside, it would still have the Field array, but this time the class can guarantee that Age goes in the 1st position of the array and Name in the 2nd, because each method would make sure of it.
The obvious problem with this is that I would have to add a method for each type of Field that exists. That means that if in the future I wish to add, say, a "birth date" and so create a new Date class, I'll have to add a method addDate, and so on and so on...
Another reason my team member gave when pointing out why my approach was bad is that "we can't trust an external user to add the Fields in the order they're supposed to be in".
So to conclude:
With the first approach, the Key class depends on the programmer who adds the Fields to make sure they are in the right order, but the benefit is that there is no need to add a method for each type of Field.
With the second approach, the Key class itself makes sure the order is right by implementing a method for each type of Field that exists, but then the class grows bigger and bigger with each new type of Field.
Any ideas on this? Is there a workaround?
Thanks in advance, and I apologize if I wasn't clear; I'll add more details if needed.

Expanding on @tp1's excellent idea of an ID field in the Field class and an enum, you can actually make it very flexible. If you are comfortable limiting the number of field types to 32, you could even take a set of flags as the ID in CompareTo. Then you could compare multiple fields at the same time. Does that approach make sense?
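To make the flag idea concrete, here is a minimal sketch in Python; in C++ the same layout would be an enum with power-of-two values, and the 32-type limit presumably comes from packing the flags into a 32-bit integer. The names FieldId, compare_to and the dictionary storage are assumptions made for illustration, not part of the original design. Each Field is stored under its ID, so the order of the add calls stops mattering, and the comparison walks the requested flags from the lowest bit upward.

```python
from enum import IntFlag


class FieldId(IntFlag):
    """Hypothetical field identifiers: one bit per field type."""
    NAME = 1
    AGE = 2
    BIRTH_DATE = 4


class Key:
    """Stores values by field ID, so insertion order no longer matters."""

    def __init__(self):
        self._fields = {}

    def add(self, field_id, value):
        self._fields[field_id] = value

    def compare_to(self, other, ids):
        """Three-way compare on every field whose bit is set in `ids`.

        Fields are checked in definition order (lowest bit first);
        the first difference decides. Returns -1, 0 or 1.
        """
        for field_id in FieldId:
            if ids & field_id:
                mine = self._fields[field_id]
                theirs = other._fields[field_id]
                if mine != theirs:
                    return -1 if mine < theirs else 1
        return 0


# Fields are added in a different order on purpose.
key1 = Key()
key1.add(FieldId.AGE, 50)
key1.add(FieldId.NAME, "John")

key2 = Key()
key2.add(FieldId.NAME, "Paul")
key2.add(FieldId.AGE, 60)

by_age = key1.compare_to(key2, FieldId.AGE)
by_name_then_age = key1.compare_to(key2, FieldId.NAME | FieldId.AGE)
```

Adding a new field type then means adding one enum member rather than one method on Key.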

Related

What is the most Django-appropriate way to combine multiple database columns into one model field?

Several times I have wanted a Django model field that comprises multiple database columns, and I am wondering what the most Django-like way to do it would be.
Three use cases come specifically to mind.
1. I want to provide a field that wraps another field, keeping record of whether the wrapped field has been set or not. A use case for this particular field would be for dynamic configuration. A new configuration value is introduced, and a view marks itself as dependent upon a configuration value, redirecting if the value isn't set. Storing whether it's been set yet or not allows for easy indefinite caching of the state. This also lets the configuration value itself be not-nullable, and the application can ignore any value it might have when unset.
2. I want to provide a money field that combines a decimal (or integer) value, and a currency.
3. I want to provide a file field with a link to some manner of access rule to determine whether the request should include it/a request for it should succeed.
For each of the use cases, there exists a workaround, that in each case seems less elegant.
1. Define the configuration fields as nullable. This is undesirable for a few reasons: it removes the validity of NULL as a value for the configuration itself, so tristates and other valid use cases for NULL have to become a pair of fields, a different data type, or an edge case; null=True on the fields allows them to be set back to None in modelforms and the admin without writing a custom FormField for them every time; and every nullable column in a database is arguably bad design.
2. Define the field as a subclass of DecimalField with an argument accepting a string, and use that to contribute another field to the model. (This is what django-money does.) Again, this is undesirable: fields appear "as if by magic" on the model, and configuring the currency field becomes non-obvious.
3. Define the combined file+rule field instead as an entire model, and one-to-one to it from the model where you want to have the field. This is a solution to all use cases, but again comes with downsides: there's an extra JOIN required for every instance of the field (one can imagine a User with profile_picture, cv, passport, private_key etc.); there's an implicit requirement to .select_related(*fields) on every query that would ever want to access the fields; and the layout of the related model is going to have cold data interleaved with hot data all over the place, given that it's reused everywhere.
In addition to solution 3., there's also the option to define a mixin factory that produces the multiple fields with matching names and whatever desired properties and methods. Again this isn't perfect because the user ends up with fields being defined in the model body, but also above that in the inheritance list.
I think the main reason this keeps sending me in circles is because custom Django model fields are always defined in terms of a single base field, because it's done by inheritance.
What is the accepted way to achieve this end?

Detect duplicate inserts when adding many-to-many relation

Let's assume there are two models, A and B:
from django.db import models
class A(models.Model):
    name = models.CharField(max_length=100)
class B(models.Model):
    children = models.ManyToManyField(A)
I'm using the b.children.add() method to add an instance of A to b:
a = A.objects.get(pk=SOMETHING)
b.children.add(a)
As far as I know, Django by default doesn't allow duplicate many-to-many relationships, so I cannot add the same instance of A more than once.
But here is the problem: I fetch instances of A with another query, then loop over them and add them one by one. How can I detect a duplicate relation? Does the add() method return something useful?
A look at the source code reveals that Django first checks to see if there are any entries that already exist in the database, and then only adds the new ones. It doesn't return any information to the caller, though.
It's not clear if you actually need to detect duplicates, or if you just want to make sure that they're not being added to the database? If it's the latter then everything's fine. If it's the former, there's no way around hitting the database. If you're really concerned about performance you could always perform the check and update the through table yourself (i.e. re-implement add()).
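If you do need to know which relations were duplicates, one way to "perform the check yourself" is to fetch the existing primary keys once and split the candidates in plain Python. The sketch below is only an illustration: split_new_and_duplicates is a made-up helper name, and the Django calls appear only in comments, assuming the A and B models from the question.

```python
def split_new_and_duplicates(existing_pks, candidate_pks):
    """Return (new, duplicates), preserving the order of candidate_pks."""
    seen = set(existing_pks)
    new, duplicates = [], []
    for pk in candidate_pks:
        if pk in seen:
            duplicates.append(pk)
        else:
            new.append(pk)
            # A pk repeated later in the same batch counts as a duplicate too.
            seen.add(pk)
    return new, duplicates


# With Django, the inputs could be gathered with a single query, e.g.:
#   existing_pks = b.children.values_list("pk", flat=True)
#   new, duplicates = split_new_and_duplicates(
#       existing_pks, [a.pk for a in candidates])
#   b.children.add(*new)   # add() accepts primary keys as well as instances

new, duplicates = split_new_and_duplicates([1, 2], [2, 3, 3, 4])
```

This still costs one query for the existing relations, which matches the point above that there is no way around hitting the database.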

In data mining, what is a class label? Please give an example

I don't understand what it means.
In a database, does a tuple mean a field value, and does an attribute mean a table field?
Am I correct?
And what is a class label in data mining?
Very short answer: class label is the discrete attribute whose value you want to predict based on the values of other attributes. (Do read the rest of the answer.)
The term class label is usually used in the context of supervised machine learning, and in classification in particular, where one is given a set of examples of the form (attribute values, classLabel) and the goal is to learn a rule that computes the label from the attribute values. The class label always takes on a finite (as opposed to infinite) number of different values.
For a concrete example, we might be given a set of adults and we'd like to predict whether they're homeless or not. Suppose the attributes were origin and highest educational level achieved (examples are of the form (origin, educationalLevel; isHomeless)):
(Manhattan, PhD; no)
(Brooklyn, Primary school; yes)
...
In this particular case, isHomeless is the class label. The goal is to learn a function that computes whether a person with given attribute values is homeless or not. (More specifically, to learn a function that makes as few mistakes as possible under a certain quantification of the number of mistakes.)
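As a toy illustration of "learning a rule that computes the label", the sketch below maps each value of one attribute to the class label most often seen with it. The rows beyond the two shown above are invented for the example, and learn_rule is a hypothetical helper, not a standard algorithm name; real classifiers are far more elaborate.

```python
from collections import Counter, defaultdict

# Rows of the form (origin, educational_level, is_homeless).
# The last element of each row is the class label. Data is made up.
examples = [
    ("Manhattan", "PhD", "no"),
    ("Brooklyn", "Primary school", "yes"),
    ("Queens", "PhD", "no"),
    ("Bronx", "Primary school", "yes"),
    ("Brooklyn", "PhD", "no"),
]


def learn_rule(rows, attribute_index):
    """Map each value of one attribute to its most frequent class label."""
    counts = defaultdict(Counter)
    for row in rows:
        label = row[-1]
        counts[row[attribute_index]][label] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}


# Learn to predict is_homeless from educational_level (index 1).
rule = learn_rule(examples, attribute_index=1)
prediction = rule["PhD"]
```

Here the attributes are the inputs to the rule and the class label is its output.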
The Wikipedia article Supervised learning gives a good description.
Regarding the other question: no, a tuple means the whole set of values of the attributes in a given row. For example, if you had a table person(id, name, surname), then a tuple representing the first row could be (0, 'Akhil', 'Mohan').
Basically a class label (in classification) can be compared to a response variable (in regression): a value we want to predict in terms of other (independent) variables.
The difference is that a class label is usually a discrete/categorical variable (e.g. Yes-No, 0-1, etc.), whereas a response variable is normally a continuous/real-number variable.
You can find more about regression and classification as they relate to response variables and class labels at https://math.stackexchange.com/questions/141381/regression-vs-classification.
Take the example of an email spam filter: it classifies whether an email is spam or not, for which we define 2 classes, spam (class 1) and not spam (class 2). Both of these are class labels; in other words, if an email has certain attributes, then it belongs either to the spam class or to the not-spam class.

Django model inheritance vs composition, and querying multiple models/tables together

I have a Django app that has a number of different models, all with a bunch of common data. Ultimately, I think this question comes down to a decision between inheritance and composition. My current implementation is something like this:
from django.db import models
class Thing(models.Model):
    foo = models.FooField()
    bar = models.BarField()
    type = models.CharField()
class A(Thing):
    aFoo = models.FooField()
class B(Thing):
    bFoo = models.FooField()
With this model, I'm able to query for a Thing using the Thing model's manager. I can then get the child object data by looking at the type field on Thing, which contains either 'a' or 'b', and asking for (e.g.) thing.a.aFoo. This is a feature I like because it's a fairly common operation in my app to get a list of all Thing objects.
I see a couple of issues here. First, the type field seems unnecessary. Is there a way to get at the child data without having to first look up the type? It seems like I could work around this with an instance method that returned the correct object given its type value, or if I really wanted to get rid of the type field, I could iterate over each of the reverse relation fields on Thing, looking for one that doesn't raise a DoesNotExist exception. This feels quite brittle to me though. If I add a new 'subthing' C, I have to update Thing to look for the new type. I could fix this by making Thing an abstract model. That way, A and B get all the fields of Thing and I avoid having to use the type field. The problem, though, is that I lose the ability to perform queries for all Thing objects.
Another model I'm thinking about sort of flips this one on its head by turning the data in Thing into a field on A and B.
class Thing(models.Model):
    foo = models.FooField()
    bar = models.BarField()
class A(models.Model):
    aFoo = models.FooField()
    thing = models.OneToOneField(Thing)
class B(models.Model):
    bFoo = models.FooField()
    thing = models.OneToOneField(Thing)
This version has a few benefits. It gets rid of the type field on Thing and, at least to me, looks and feels cleaner and less brittle. The problem here, though, is the same as the problem with making Thing abstract in the first version: I lose the ability to query all my 'subthings' together. I can do a query for A objects or a query for B objects, but not both. Can I use this version of the model without having to sacrifice the ability to query for all 'subthings'? One possibility is to write a manager that queries both models and returns a QuerySet of all the objects. Does that work?
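One caveat on that last idea: a QuerySet is tied to a single model, so a helper that combines A and B would most likely have to return a plain list (or iterator) rather than a real QuerySet, which means no further .filter() chaining on the result. A minimal sketch of such a helper follows; all_subthings and the stand-in objects are assumptions for illustration, and with Django you would pass in A.objects.all() and B.objects.all() instead of the lists.

```python
from itertools import chain
from operator import attrgetter
from types import SimpleNamespace


def all_subthings(*querysets, sort_key=None):
    """Combine several iterables of model instances into one list.

    The result is a plain list, not a QuerySet, so it cannot be
    filtered or ordered further at the database level.
    """
    combined = chain(*querysets)
    if sort_key is not None:
        return sorted(combined, key=sort_key)
    return list(combined)


# Stand-ins for A.objects.all() and B.objects.all().
a_objects = [SimpleNamespace(kind="a", pk=3), SimpleNamespace(kind="a", pk=1)]
b_objects = [SimpleNamespace(kind="b", pk=2)]

merged = all_subthings(a_objects, b_objects, sort_key=attrgetter("pk"))
kinds_in_order = [obj.kind for obj in merged]
```

Each queryset passed in is evaluated separately, so this costs one query per model.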

Django: Query with F() into an object not behaving as expected

I am trying to navigate into the Price model to compare prices, but met with an unexpected result.
My model:
class ProfitableBooks(models.Model):
    price = models.ForeignKey('Price', primary_key=True)
In my view:
foo = ProfitableBooks.objects.filter(price__buy__gte=F('price__sell'))
Producing this error:
'ProfitableBooks' object has no attribute 'sell'
Is this your actual model or a simplification? I think the problem may lie in having a model whose only field is a foreign key that also serves as its primary key. If I try to parse that out, it seems to imply that the model is essentially acting as a proxy for a queryset: you could never have more profitable books than prices, because of the nature of primary keys. It would also seem to mean that your elided books field must have no overlap in prices, due to the implied uniqueness constraints.
If I understand correctly, you're trying to compare two values in another model: price.buy vs. price.sell, and you want to know if this unpictured Book model is profitable or not. While I'm not sure exactly how the F() object breaks down here, my intuition is that F() is intended to facilitate a kind of efficient querying and updating where you're comparing or adjusting a model value based on another value in the database. It may not be equipped to deal with a 'shell' model like this which has no fields except a joint primary/foreign key and a comparison of two values both external to the model from which the query is conducted (and also distinct from the Book model which has the identifying info about books, I presume).
The documentation says you can use a join in an F() object as long as you are filtering and not updating, and I assume your Price model has buy and sell fields, so it seems to qualify. So I'm not 100% sure where this breaks down behind the scenes. But from a practical perspective, if you want to accomplish exactly the result implied here, you could just do a simple query on your Price model, because again, there's no distinct data in the ProfitableBooks model (it only returns prices), and you're also implying that each price.buy and price.sell have exactly one corresponding book. So Price.objects.filter(buy__gte=F('sell')) gives the result you've requested in your snippet.
If you want to get results which are book objects, you should do a query like the one you've got here, but start from your Book model instead. You could put that query in a queryset manager called "profitable_books" or something, if you wanted to substantiate it in some way.