On demand feature view using multiple entities - feast

Is it possible to combine values from multiple entities in a feature? For example, each row of input data contains multiple people, each of whom is an entity in the store. I want a feature that is the MAX of the supplied people's values.
Is this possible?
I have read the docs on on-demand feature views, but it is not clear whether I can combine data from different entities.
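For concreteness, here is a minimal sketch of the kind of transformation described above, written against recent Feast releases (the decorator's parameter names have changed across versions). It assumes the per-person values have already been looked up and are handed to the on-demand feature view as request data; the source name, column names, and feature name are all hypothetical.

```python
import pandas as pd
from feast import Field, RequestSource
from feast.on_demand_feature_view import on_demand_feature_view
from feast.types import Float64

# Hypothetical request source: the caller supplies one column per person,
# with each person's stored feature value already looked up upstream.
person_values_request = RequestSource(
    name="person_values",
    schema=[
        Field(name="person_a_value", dtype=Float64),
        Field(name="person_b_value", dtype=Float64),
        Field(name="person_c_value", dtype=Float64),
    ],
)

@on_demand_feature_view(
    sources=[person_values_request],
    schema=[Field(name="max_person_value", dtype=Float64)],
)
def max_person_value(inputs: pd.DataFrame) -> pd.DataFrame:
    # Row-wise MAX across the per-person columns supplied in the request.
    out = pd.DataFrame()
    out["max_person_value"] = inputs[
        ["person_a_value", "person_b_value", "person_c_value"]
    ].max(axis=1)
    return out
```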

Related

Negative impact of a Django model with multiple fields (75+ fields) [duplicate]

I'm in the process of building a web app that takes user input and stores it for retrieval and data manipulation. There are essentially 100-200 static fields that the user needs to input to create the Company model.
I see how I could break the Company model into multiple 1-to-1 Django models that map back to a Company, such as:
Company General
Company Notes
Company Financials
Company Scores
But why would I not create a single Company model with 200 fields?
Are there noticeable performance tradeoffs when trying to load a Query Set?
In my opinion, it would be wise for your codebase to have multiple models related to each other. This will give you better scalability opportunities and easier navigation to your model fields. Also, when you want to make a custom serializer, or custom views that will deal with some of your fields, but not all, it would be ideal to not have to retrieve 100+ fields every time.
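To make the split concrete, here is a minimal sketch of what those 1-to-1 models might look like; the model and field names are illustrative, not taken from the question:

```python
from django.db import models

class Company(models.Model):
    # "hot" fields needed on almost every request stay on the base model
    name = models.CharField(max_length=255)

class CompanyFinancials(models.Model):
    company = models.OneToOneField(
        Company, on_delete=models.CASCADE, related_name="financials"
    )
    # illustrative field; the real model would hold the financial inputs
    annual_revenue = models.DecimalField(max_digits=16, decimal_places=2, null=True)

class CompanyNotes(models.Model):
    company = models.OneToOneField(
        Company, on_delete=models.CASCADE, related_name="notes"
    )
    body = models.TextField(blank=True)

# Views that need the extra columns opt in explicitly:
# Company.objects.select_related("financials").get(pk=1)
```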
Turns out I wasn't asking the right question. This is the question I was asking; it's more a database question than a Django question, I believe: Why use a 1-to-1 relationship in database design?
From the logical standpoint, a 1:1 relationship should always be merged into a single table.
On the other hand, there may be physical considerations for such "vertical partitioning" or "row splitting", especially if you know you'll access some columns more frequently or in a different pattern than the others, for example:
You might want to cluster or partition the two "endpoint" tables of a 1:1 relationship differently.
If your DBMS allows it, you might want to put them on different physical disks (e.g. the more performance-critical one on an SSD and the other on a cheap HDD).
You have measured the effect on caching and you want to make sure the "hot" columns are kept in cache, without "cold" columns "polluting" it.
You need a concurrency behavior (such as locking) that is "narrower" than the whole row. This is highly DBMS-specific.
You need different security on different columns, but your DBMS does not support column-level permissions.
Triggers are typically table-specific. While you can theoretically have just one table and have the trigger ignore the "wrong half" of the row, some databases may impose additional limits on what a trigger can and cannot do. For example, Oracle doesn't let you modify the so-called "mutating" table from a row-level trigger; by having separate tables, only one of them may be mutating, so you can still modify the other from your trigger (but there are other ways to work around that).
Databases are very good at manipulating the data, so I wouldn't split the table just for update performance, unless you have performed actual benchmarks on representative amounts of data and concluded that the performance difference is actually there and significant enough (e.g. to offset the increased need for JOINing).
On the other hand, if you are talking about "1:0 or 1" (and not a true 1:1), this is a different question entirely, deserving a different answer...

Alternatives to dynamically creating model fields

I'm trying to build a web application where users can upload a file (specifically the MDF file format) and view the data in forms of various charts. The files can contain any number of time based signals (various numeric data types) and users may name the signals wildly.
My thought on saving the data involves 2 steps:
Maintain a master table as an index, to save such meta information as file names, who uploaded it, when, etc. Records (rows) are added each time a new file is uploaded.
Create a new table (I'll refer to these as data tables) for each file uploaded; within the table, each column will be one signal (the first column being timestamps).
This brings the problem that I can't pre-define the Model for the data tables because the number, name, and datatype of the fields will differ among virtually all uploaded files.
I'm aware of some libs that help to build runtime dynamic models but they're all dated and questions about them on SO basically get zero answers. So despite the effort to make it work, I'm not even sure my approach is the optimal way to do what I want to do.
I also came across this Postgres-specific model field which can take nested arrays (which I believe fits the 2-D time-based signal lists). In theory I could parse the raw uploaded file, construct such an array, and basically save all the data in one field. Not knowing the limit on the size of the data, this could also be a nightmare for queries later on, since creating the charts usually takes only a few columns of signals at a time, out of a total of up to hundreds of signals.
So my question is:
Is there a better way to organize the storage of data? And how?
Any insight is greatly appreciated!
If the name, number, and datatypes of the fields will differ for each user, then you do not need an ORM. What you need is a query builder or SQL string composition, like Psycopg's. You will be programmatically creating a table for each combination of user and uploaded file (if they are different) and programmatically inserting the records.
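As an illustration of that approach, here is a rough sketch using psycopg2's sql composition module to create one table per uploaded file; the column list and table name are hypothetical, and the type strings should come from a fixed whitelist rather than user input:

```python
from psycopg2 import sql

# Hypothetical column spec parsed from an uploaded MDF file:
# (signal_name, postgres_type) pairs; types come from a fixed whitelist.
columns = [("engine_rpm", "double precision"), ("vehicle_speed", "real")]

def create_data_table(conn, table_name, columns):
    """Create one data table per uploaded file: a timestamp column plus
    one column per signal, composed with safe identifier quoting."""
    col_defs = [sql.SQL("ts timestamp NOT NULL")]
    col_defs += [
        sql.SQL("{} {}").format(sql.Identifier(name), sql.SQL(pg_type))
        for name, pg_type in columns
    ]
    query = sql.SQL("CREATE TABLE {} ({})").format(
        sql.Identifier(table_name),
        sql.SQL(", ").join(col_defs),
    )
    with conn.cursor() as cur:
        cur.execute(query)
    conn.commit()
```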
Using PostgreSQL might be a good choice; you might also create a GIN index on the arrays to speed up queries.
However, if you are primarily working with time-series data, then using a time-series database like InfluxDB or Prometheus makes more sense.

Boolean attribute or new table (Django + PostgreSQL)

Situation: I have a set of Books. A Book can be one of three types: "Test", "Premium", and "Common". Data proportions: 2%, 15%, 83%. Query volume per time unit (in percent): 40%, 20%, 40%.
I see several ways to model this in the database:
Boolean fields: is_test, is_premium. If we need only "Test" books: Book.objects.filter(is_test=True). This could be a proxy model, for example. Analogously for premium books;
Separate Tables: books_test, books_premium, books_common.
Choice field: string in ['Test', 'Premium', 'Common'];
Combine 1 and 2: books_test table and books table with 'is_premium' attribute.
And we need to query this data optimally! All three Book variants are needed on one page. The queryset combinations that occur are: only tests, only common, common + premium, only premium.
If we use variant 1 or 3: one endpoint with a specific filter;
If we use variant 2: one of three endpoints without filters (the frontend would have to know which endpoint to use), or one endpoint with some conditions checked by the backend. Either way, extra logic is needed;
Which way is more correct and why?
If you need to mix different types on one page, separate models/tables would complicate things for no good reason. The same goes for mapping more than two exclusive states to a combination of boolean fields.
This leaves you with a choice field or a separate BookType model containing the choices.
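As a rough sketch of the choice-field option (Django 3.0+ TextChoices; the model and field names are illustrative):

```python
from django.db import models

class Book(models.Model):
    class BookType(models.TextChoices):
        TEST = "test", "Test"
        PREMIUM = "premium", "Premium"
        COMMON = "common", "Common"

    title = models.CharField(max_length=255)
    book_type = models.CharField(
        max_length=16,
        choices=BookType.choices,
        default=BookType.COMMON,
        db_index=True,  # every combination below filters on this column
    )

# The queryset combinations from the question, all served by one endpoint:
# Book.objects.filter(book_type=Book.BookType.TEST)                    # only tests
# Book.objects.filter(book_type=Book.BookType.COMMON)                  # only common
# Book.objects.filter(book_type__in=[Book.BookType.COMMON,
#                                    Book.BookType.PREMIUM])           # common + premium
# Book.objects.filter(book_type=Book.BookType.PREMIUM)                 # only premium
```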

Ember index data -vs- show data

How do people deal with index data (the data usually shown on index pages, like a customer list) -vs- the model detail data?
When somebody goes to the customer/index route, they only need access to a small subset of the full customer resource. Since I am dealing with legacy data, my customer model has > 10 relationships. It seems wasteful to have the API return a complete and full customer representation for every customer just to render a list/select/index view.
I know those relationships are somewhat lazy-loaded, but it still takes effort on the backend to pull all those relationships in. For some relationships (such as customer->invoices) this could be a large list of ids.
I feel answers to this can be very opinionated. But my two cents:
The API you are drawing on for your data should have an end-point to fetch the subset of data you're interested in, e.g. /api/mini-customer vs /api/customer.
You can then either define two separate models (one to represent the model in the list and one to represent the detailed view), or simply populate the original model with the subset of data and merge the extra data in at a later point.
That said, I've also seen plenty of cases such as the one you describe, where you load all data initially and just display the subset to begin with. If it's reasonable that the data will eventually be used and your page-load constraints can handle it, then this can be an acceptable approach.
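The question is about Ember, but the first suggestion is really about the API. As a server-side illustration only (a sketch using Django REST Framework with a hypothetical Customer model), the list route can serve a slim serializer while the detail route serves the full one:

```python
from rest_framework import serializers, viewsets

from myapp.models import Customer  # hypothetical legacy model

class CustomerListSerializer(serializers.ModelSerializer):
    class Meta:
        model = Customer
        fields = ["id", "name"]   # only what the index/list view renders

class CustomerDetailSerializer(serializers.ModelSerializer):
    class Meta:
        model = Customer
        fields = "__all__"        # full representation, relationships included

class CustomerViewSet(viewsets.ModelViewSet):
    queryset = Customer.objects.all()

    def get_serializer_class(self):
        # Slim payload for the list route, full payload for detail routes.
        if self.action == "list":
            return CustomerListSerializer
        return CustomerDetailSerializer
```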

Django: how to rearrange object ids

I've started a model with the default, automatically handled IDs Django provides.
Now I've started interfacing with an external system which has its own IDs for the same objects, and it would be very convenient to align my own IDs with the external system's.
However, the numerical ranges overlap, so a naive solution wouldn't work.
Is there some elegant way to alter the IDs in a safe manner? (The objects have multiple foreign keys/M2Ms, etc.)
Thanks
I don't know if this is the best way, but I would set all foreign/m2m keys to cascade updates and add a fixed number to all IDs in the Django DB to remove the overlap between the ranges (10k or 100k or more, depending on your data).
With that done, you can copy the IDs from the other system. For safety, I would duplicate the ID columns while doing that, in order not to lose the original ones... until you're sure everything works.
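A rough sketch of the offset step, assuming PostgreSQL and that the foreign keys referencing the table were created with ON UPDATE CASCADE at the database level (Django does not do this by default); the table name is hypothetical:

```python
from django.db import connection, transaction

OFFSET = 100_000  # larger than any existing or external ID

def shift_ids():
    """Shift every primary key by a fixed offset so the old range no
    longer overlaps the external system's range. Referencing rows follow
    automatically only if the FKs cascade updates at the database level."""
    with transaction.atomic(), connection.cursor() as cur:
        cur.execute("UPDATE myapp_person SET id = id + %s", [OFFSET])
        # Keep the PostgreSQL sequence ahead of the new maximum id.
        cur.execute(
            "SELECT setval(pg_get_serial_sequence('myapp_person', 'id'), "
            "(SELECT MAX(id) FROM myapp_person))"
        )
```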