Considering two approaches in designing Django models:
1 - Very few models, but each model has a long list of fields.
2 - Many models, each with a significantly shorter list of fields, but employing a multi-level hierarchical one-to-many structure.
Suppose they both use Postgres as the database. In general, how would the two approaches compare in database size and in latency when retrieving, processing, and updating data?
In short: define models based on the business logic.
People often aim to optimize too early in the development process. As Donald Knuth said:
Premature optimization is the root of all evil.
You should see tables as storage for entities. For example, if you are making an e-commerce website, it makes sense to have a model for Product, a model for Order, and a model in between (the junction table of a many-to-many relation between Product and Order) that records how many times the product appears in the order.
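As a minimal sketch, that layout could look like the following Django models (model and field names here are illustrative, not prescriptive):

```python
# models.py - an illustrative sketch, not a prescribed schema
from django.db import models


class Product(models.Model):
    name = models.CharField(max_length=200)
    price = models.DecimalField(max_digits=10, decimal_places=2)


class Order(models.Model):
    created_at = models.DateTimeField(auto_now_add=True)
    # The junction table is made explicit with `through`.
    products = models.ManyToManyField(Product, through="OrderItem")


class OrderItem(models.Model):
    """Junction table: how many times a product appears in an order."""
    product = models.ForeignKey(Product, on_delete=models.PROTECT)
    order = models.ForeignKey(Order, on_delete=models.CASCADE)
    quantity = models.PositiveIntegerField(default=1)
```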
By modeling this based on the data, and not on a specific use case, it is usually simpler to add new features to your application. It also makes querying simpler, which often has a positive effect on overall performance.
Furthermore, it is important to get used to the Django ORM tooling, and especially, as #markwalker_ says, to the select_related(…) and prefetch_related(…) method calls that load related data in bulk. The number of queries to the database is often a stronger indicator of how efficiently a program will run than the exact queries themselves: if your application makes a lot of queries, even simple ones, the number of round trips to the database will slow the application down significantly. If there is a bottleneck somewhere, you can run a profiler and try to find the parts of the code that need to be optimized.
There are, for example, packages such as nplusone [GitHub] and Scout that can detect N+1 query problems, which can then be resolved with select_related(…) and prefetch_related(…).
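For illustration, here is what an N+1 pattern and its fix might look like, assuming the hypothetical Product/Order/OrderItem models sketched earlier:

```python
# N+1: one query for the items, plus one extra query per item
# to fetch its product.
for item in OrderItem.objects.all():
    print(item.product.name)

# Fixed: select_related() pulls the related Product rows into the
# same query with a JOIN (for foreign keys and one-to-one relations).
for item in OrderItem.objects.select_related("product"):
    print(item.product.name)

# For reverse and many-to-many relations, prefetch_related() batches
# the related rows into one extra query instead of N extra ones.
for order in Order.objects.prefetch_related("orderitem_set"):
    total = sum(item.quantity for item in order.orderitem_set.all())
```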
"Generally, we recommend minimizing the use of bi-directional relationships. They can negatively impact on model query performance, and possibly deliver confusing experiences for your report users."
Link: https://learn.microsoft.com/en-us/power-bi/guidance/relationships-bidirectional-filtering
The documentation clearly says that bi-directional filtering hurts model performance.
Does many to many relationship also hurt model performance? The documentation (https://learn.microsoft.com/en-us/power-bi/guidance/relationships-many-to-many) doesn't mention this.
The reason for asking this question is that my understanding was that model performance is based on table expansion, and since a many-to-many relationship doesn't support table expansion, does this imply that it will have bad performance?
A bidirectional relationship, on the other hand, doesn't affect table expansion (in an intra-group 1:n relationship), yet it is said that bidirectional relationships have bad performance.
So is table expansion not a factor that affects model performance?
There is an extensive article and associated video on many-to-many performance here: https://www.sqlbi.com/articles/different-options-to-model-many-to-many-relationships-in-power-bi-and-tabular/
I am using Meteor and have an opinion question.
I have a series of templates that I designed for making interactive tables: sortable column headers, pagination, a reactive counter of the table's elements, etc. Up until now I have been storing several pieces of information (current page, items per page, and sort order) as session variables so that it was easy to access them from every template regardless of their relationship (parent, sibling...) to each other.
This has worked alright until now, but now I want multiple tables on the same page. Since I have statically-named session variables, information gets overwritten by other tables on the page.
I am working on a bunch of solutions and welcome other suggestions. What do you guys think?
1) Name every table and store all the information for every table on the site in a giant session variable, which would be an object keyed by tables' names. The downside here is that every table would need a unique name and I'd have to keep track of that. The upside is that implementing the table in new parts of the system could be easier than ever. Also, table sort/filter/page information would be stored even when leaving pages (but could be overridden if that were desired).
2) On the template that contains all the table parts, define reactive variables, then explicitly pass those down to lower levels with helpers. This would help with our goal of cleansing the system of session variables floating around (not that all session variables are bad), but would be a trickier refactor and harder to implement new tables with. Information would not be remembered when navigating between pages.
3) Each table template could reference the parent's reactive variables (messy, but possible) and look for specifically named ones (such as "table_current_page"). This would make setup of new tables easier than #2, but would only allow one table per template (but multiple per page would still be possible).
None of these are quite ideal, but I am leaning towards #1 or something like it. Suggestions? Thanks!
As the other user commented on your question, opinion-based questions are off-topic on SO. But anyway, here is my opinion:
Option 1: I would not use this! This way, if one of the reactive parameters for one table changes, every other table's helpers will re-run and those tables will re-render too. As your application grows, and probably by the time you have more than 4-5 tables at once, you might feel the performance degrade.
Option 2: I would definitely use this; as you mentioned, it is a much cleaner way. There is no performance impact (like the one mentioned for option 1), even if you have multiple tables on the same page.
Option 3: If you do this you will have a very strong dependency between those templates, so the child templates cannot be used independently elsewhere.
So if you have enough time, go for option 2. If not, go for option 1 with a slight change: instead of one large session variable, use multiple smaller session variables that have a unique table name as a prefix or suffix. This does pollute your Session variables, but it will not have a performance impact.
This is my opinion.
I'm creating a REST API that will be used by a web page.
There are several types of data I should provide, and I'm wondering what would be the best practice:
Create one method that will return a complex object with all the needed data.
Pros: one call will be needed from the UI side to get all the data.
Cons: not a generic solution at all.
Create multiple autonomous methods.
Pros: generic enough to be used in the future by other components.
Cons: will require the UI to make several calls to the server.
Which one adheres more to best practices?
It ultimately depends on your environment, the data size, and the number of methods. But there are several reasons to go with the second option and only one to go with the first.
First option: One complex method
Reason to go with the first: The HTTP overhead of multiple requests.
Does the overhead exist? Of course, but is it really that high? HTTP is one of the lightest application-layer protocols. It is designed to have little overhead. Its simplicity and light headers are some of the main reasons for its success.
Second option: Multiple autonomous methods
Now there are several reasons to go with the second option. Even when the data is large, believe me, it still is a better option. Let's discuss some aspects:
If the data size is large
Breaking data transfer into smaller pieces is better.
HTTP is a best-effort protocol and data failures are very common, especially in the internet environment - so common they should be expected. The larger the data block, the greater the risk of having to re-request everything.
Quantity of methods: Maintainability, Reuse, Componentization, Learnability, Layering...
You said it yourself: a generic solution is easier to use from other components. The simpler and more concise the methods' responsibilities are, the easier it is to understand and reuse them in other methods.
They are also easier to maintain and to learn: the more independent they are, the less one has to know to change one (or get rid of a bug!).
Taking REST into consideration here is important, but the reasons to break the components down into smaller pieces really come from understanding the HTTP protocol and good programming/software engineering.
So, here's the thing: REST is great. But not every pattern in its purest form works in every situation. If efficiency is an issue, go the one-call route. Or maybe supply both, if others will be consuming it and might not need to pull down the full complex object every time.
I'd say REST does not care about data normalization. Having two ways to get at the same data is not going to hurt.
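As a minimal sketch of the "supply both" approach, here is what it might look like; the use of Flask and all endpoint paths and data are assumptions for illustration only:

```python
# Granular endpoints for reuse, plus one composite endpoint for the
# page that needs everything in a single round trip.
from flask import Flask, jsonify

app = Flask(__name__)


def load_profile(user_id):
    return {"id": user_id, "name": "example"}  # placeholder data


def load_orders(user_id):
    return [{"order_id": 1}, {"order_id": 2}]  # placeholder data


@app.route("/users/<int:user_id>/profile")
def profile(user_id):
    return jsonify(load_profile(user_id))


@app.route("/users/<int:user_id>/orders")
def orders(user_id):
    return jsonify(load_orders(user_id))


@app.route("/users/<int:user_id>/dashboard")
def dashboard(user_id):
    # Composite view: one call for clients that want everything at once.
    return jsonify({
        "profile": load_profile(user_id),
        "orders": load_orders(user_id),
    })
```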
I'm not an experienced programmer, so please bear with me. As a consequence I will need to be specific about my problem, which is about building an architecture to represent a power plant hierarchy.
Indeed, I'm trying to construct a flexible architecture to represent contracts and pricing/analysis for multiple types of power plants. I am reading the Alexandrescu book about generic design patterns and policy classes, as it seems to me a good way to handle the need for flexibility and extensibility in what I want to do. Let's detail a bit:
A power plant can run on different types of fuel (be of different types): coal, gas, or fuel oil. Within each of those fuels, you can choose among different sub-types (of different quality or financial index). Among those sub-types, the contract formula describing the delivery can again be of different kinds (time series averaged with FX within or via a division, etc.). Furthermore, you may be in Europe, be subject to emissions reduction schemes, and have to provide CO2 credits (which enter into the formula for your margin), or not, depending on regulatory issues. As well, you can choose to value this power plant using different methodologies, etc.
Thus my point is that you can represent an asset in very different ways, depending on regulation, the choices you make, the type of contract you agree with a counterparty, and the valuation you want to perform, and CLEARLY you don't want to write the same code 100 times with just a little bit of change. As I said in the beginning, I am trying to find the best programming techniques to structure my program, but I'm new to building software architecture. It appears to me that policy classes would be great for such an architecture, as they can express the kind of choices we have to make.
However, putting it into practice gives me a headache. I thought of a generic object factory where PowerPlant* is my abstract type, with functions like void price() or riskanalysis() being pure virtual. Then I would need to build a hierarchy based on this and derive the elements.
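The question is about C++, but the shape of the idea (an abstract PowerPlant with pure virtual price() and riskanalysis(), plus a generic object factory keyed by plant type) can be sketched compactly. The sketch below uses Python for brevity; every class name and value in it is an illustrative assumption:

```python
from abc import ABC, abstractmethod


class PowerPlant(ABC):
    """Abstract base; the C++ version would declare these pure virtual."""

    @abstractmethod
    def price(self) -> float: ...

    @abstractmethod
    def riskanalysis(self) -> dict: ...


class CoalPlant(PowerPlant):
    def price(self) -> float:
        return 42.0  # placeholder valuation

    def riskanalysis(self) -> dict:
        return {"co2_credits_required": True}  # placeholder


class GasPlant(PowerPlant):
    def price(self) -> float:
        return 37.0  # placeholder valuation

    def riskanalysis(self) -> dict:
        return {"co2_credits_required": False}  # placeholder


# Generic object factory: new plant types are registered here,
# so the pricing/analysis code never has to change.
_REGISTRY = {"coal": CoalPlant, "gas": GasPlant}


def make_plant(fuel: str) -> PowerPlant:
    return _REGISTRY[fuel]()
```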
I couldn't really get what you want, but I think you should learn more programming fundamentals before tackling anything like this.
Learning is hard and takes a lot of time, but it's worth it. It is also more useful than asking and getting the answer without an explanation. ;)
I recently switched a project from MySQL InnoDB to PostgreSQL, and I feel bigger lags when inserting and updating data with Ajax. This may be subjective. I know the Django devs recommend Postgres and I know psycopg2 is supposed to be faster than MySQLdb. Personally I like the way Postgres enforces database integrity, but I am mostly worried about performance for this project. I want to hear other people's opinions on this.
Why don't you measure? That's the only way to be sure about the performance. Hand-waving about how slow or fast something is, without hard data, is like trying to catch water with your hands.
Measure transactions per second or, even better, requests per second with a web server stress tool like The Grinder (which can be scripted in Jython), with both the MySQL backend and PostgreSQL, and afterwards see if that makes a difference. If it does, ask around here or, more specifically, on the pgsql-general or pgsql-performance mailing lists. There are many expert people there who know loads about it, even the main devs. There are tons of knobs in the PostgreSQL configuration related to performance.
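If a full stress tool feels like overkill as a first step, even a crude timing loop already gives hard numbers to compare the two backends. A rough sketch, where the DSN and table are placeholders:

```python
import time

import psycopg2  # for the MySQL side, use MySQLdb with its own DSN

conn = psycopg2.connect("dbname=myapp user=myuser")  # placeholder DSN
cur = conn.cursor()

N = 1000
start = time.time()
for i in range(N):
    # test_table is a placeholder; use one of your real write paths.
    cur.execute("INSERT INTO test_table (value) VALUES (%s)", (i,))
conn.commit()

elapsed = time.time() - start
print(f"{N / elapsed:.0f} inserts/second")
```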
It may be incorrect usage of indexes. Simply making sure you have the right indexes, and that the tables are analyzed and vacuumed periodically, should give rather decent results.
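To check whether a query is actually using an index, EXPLAIN ANALYZE is the first thing to reach for. A minimal sketch, with a placeholder DSN and table:

```python
import psycopg2

conn = psycopg2.connect("dbname=myapp user=myuser")  # placeholder DSN
cur = conn.cursor()

# Look for "Seq Scan" (no index used) versus "Index Scan" in the plan.
cur.execute("EXPLAIN ANALYZE SELECT * FROM test_table WHERE value = %s", (42,))
for (line,) in cur.fetchall():
    print(line)
```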
Even if Postgres turns out to be slightly slower in some cases, my personal opinion is that the features it provides greatly outweigh the minor performance losses.
Postgres really is a beautiful database, and any time I'm using anything else, I wish I was using Postgres.
I used SQLite for the first time in the development phase of my last project. It is easy to set up, convenient to carry around from one dev system to another, etc. I have to add that when I finally moved the project to production on MySQL, a number of subtle issues manifested themselves with MySQL that were not present at all with SQLite. Nothing big, but from now on, if I have to deploy a project on MySQL, I would prefer to use MySQL in the development phase too.
SQLite: no backend server. Excellent for dev.