What is the best way to implement functions while writing an app in django? For example, I'd like to have a function that would read some data from other tables, merge then into the result and update user score based on it.
I'm using postgresql database, so I can implement it as database function and use django to directly call this function.
I could also get all those values in python, implement is as django function.
Since the model is defined in django, I feel like I shouldn't define functions in the database construction but rather implement them in python. Also, if I wanted to recreate the database on another computer, I'd need to hardcode those functions and load them into database in order to do that.
On the other hand, if the database is on another computer such function would need to call database multiple times.
Which is preferred option when implementing an app in django?
Also, how should I handle constraints, that I'd like the fields to have? Overloading the save() function or adding constraints to database fields by hand?
This is a classic problem: do it in the code or do it in the DBMS? For me, the answer comes from asking myself this question: is this logic/functionality intrinsic to the data itself, or is it intrinsic to the application?
If it is intrinsic to the data, then you want to avoid doing it in the application. This is particularly true where more than one app is going to be accessing / modifying the data. In which case you may be implementing the same logic in multiple languages / environments. This is a situation that is ripe with ways to screw up—now or in the future.
On the other hand, if this is just one app's way of thinking about the data, but other apps have different views (pun intended), then do it in the app.
BTW, beware of premature optimization. It is good to be aware of DB accesses and their costs, but unless you are talking big data, or a very time sensitive UI, then machine-time, and to a lesser degree user-time, is less important than your time. Getting v1.0 out the door is often more important. As the inimitable Fred Brooks said, "Plan to throw one away; you will anyhow."
Related
What is the best way to call a SQL function / stored procedure when converting code to use the repository pattern? Specifically, I am interested in read/query capabilities.
Options
Add an ExecuteSqlQuery to IRepository
Add a new repository interface specific to the context (i.e. ILocationRepository) and add resource specific methods
Add a special "repository" for all the random stored procedures until they are all converted
Don't. Just convert the stored procedures to code and place the logic in the service layer
Option #4 does seem to be the best long term solution, but it's also going to take a lot more time and I was hoping to push this until a future phase.
Which option (above or otherwise) would be "best"?
NOTE: my architecture is based on ardalis/CleanArchitecture using ardalis/Specification, though I'm open to all suggestions.
https://github.com/ardalis/CleanArchitecture/issues/291
If necessary, or create logically grouped Query services/classes for
that purpose. It depends a bit on the functionality of the SPROC how I
would do it. Repositories should be just simple CRUD, at most with a
specification to help shape the result. More complex operations that
span many entities and/or aggregates should not be added to
repositories but modeled as separate Query objects or services. Makes
it easier to follow SOLID that way, especially SRP and OCP (and ISP)
since you're not constantly adding to your repo
interfaces/implementations.
Don't treat STORED PROCEDURES as 2nd order citizens. In general, avoid using them because they very often take away your domain code and hide it inside database, but sometimes due to performance reasons, they are your only choice. In this case, you should use option 2 and treat them same as some simple database fetch.
Option 1 is really bad because you will soon have tons of SQL in places you don't want (Application Service) and it will prevent portability to another storage media.
Option 3 is unnecessary, stored procedures are no worse than simple Entity Framework Core database access requests.
Option 4 is the reason why you cannot always avoid stored procedures. Sometimes trying to query stuff in application service/repositories will create very big performance issues. That's when, and only when, you should step in with stored procedures.
I've been using some code that implements the phpBB DBAL for some time. Recently I had to implement a more full package around it and decided to use the DBAL throughout. In the main, it's been OK. But occassionally there are circumstances where I can't see the logic in using it. It seems to make the simple much more complicated.
What benefits does a DBAL offer rather then writing sql statements directly?
From wikipedia (http://en.wikipedia.org/wiki/Database_abstraction_layer) :
API level abstraction
Libraries like OpenDBX unify access to databases by providing a single low-level programming interface to the application developer. Their advantages are most often speed and flexibility because they are not tied to a specific query language (subset) and only have to implement a thin layer to reach their goal. The application developer can choose from all language features but has to provide configurable statements for querying or changing tables. Otherwise his application would also be tied to one database.
When cooking a dish, you do not want several chefs having access to the pot. They could all be adding spices unaware that another chef had already added a spice. Ideally, you want a single chef that would serve as a single point of access to avoid spoiling the soup.
The same with databases. A single point of access can avoid problems of multiple services accessing the data in different ways.
Specifically thinking of web apps,
(1) why are relationships(ie:foreign keys) in RDBMS even useful?
The web apps I write have logic built-in that validates user input against required fields. I see no real use for foreign keys and thus no real use for relational databases.
Besides, if I were to put all the required field validation logic in the RDBMS(ie:MySQL) it would simply return a vague error. At least with PHP-based validation I know which field is missing and I can notify the user(though with Javascript-based validation this would almost NEVER happen anyway).
(2) Was there a point in the past where RDBMS were useful for some reason or is there a reason they are useful now that I'm not aware of?
I really need some insight on this topic. I'm simply can't come up with a good answer.
I will come at this from a different angle.
I work at a place where we had a database that had no foreign key constraints, default values, or other data checks whatsoever in their initial records database. The lead engineer's excuse for this was something similar to what you have described above. "The application will ensure the referential integrity".
The problem is, we did not have a standard data layer (like an object relational mapping) over the top of the database. We had multiple programmatic sources that fed into the same tables. It was funny because after a while, you could tell which parts of the code created which rows in the table. Sometimes the links lined up, sometimes they didn't. Sometimes the links were NULL (when they shouldn't be), and sometimes they were 0. We even had a few cyclic records which was fun.
My point is, you never know when you are going to need to write a quick script to batch import records, or write a new subsystem that references the same tables. It behooves us as programmers to program as defensively as possible. We can't assume that those who come after us will know as much (if anything) about how our schema should be used.
I'm not much of an SQL lover, but even I must say that the relational structure has its advantages.
It doesn't only allow validation. By providing the database with metadata describing the relations between the actual pieces information stored, a great number of optimizations are possible.
This makes it possible to quickly retrieve large, complex datasets. It also reduces the number of queries needed to make modifications and keep the data coherent, since most of the "book-keeping" is carried out automatically on the DB side of the connection.
One incredibly useful feature of foreign keys in most relational databases are cascades.
Suppose you have a families table and a persons table. Each family can have multiple people, but a person can only belong in one family (one-to-many relationship). If you have foreign keys and you delete a family row, the database can automatically update all the related people, either by deleting them or setting their foreign keys to null.
If you do not have this constraint, you must handle this situation yourself, in your own code.
RDBMSs are still very useful. Not sure why you wouldn't think so. Foreign key constraints can be used to maintain referential integrity (in other words, to provide a simple way to express 1:1, 1:many and many:many relationships. RDBMSs are also useful because there was a rich theory accompanying practical developments, unlike previous DBMSs. In particular, relational calculus/algebra are nice since they allow for good query optimization, normalization, etc.
Not sure if that really answers your question. Wikipedia might list some advantages of RDBMSs.
(1) why are relationships(ie:foreign keys) in RDBMS even useful?
First off, I think you are talking about foreign key CONSTRAINTS. Foreign keys are just a logical design feature that says that this entity matches up with that one.
The reason foreign key constraints are useful are:
They help you adhere to the DRY (Don't repeat yourself) principle. Sure your app validates the relationship, but does it do it in several places? Are there multiple apps that access the same DB? Do you have to repeat the logic in each app? Hey, you could pull that logic out and use a common DLL for access to that data that enforces that logic.Better yet, what if that was built into the RDMBS so I didn't have to write custom code to do something so routine? Bam. Foreign key constraints.
If your app enforces the foreign key validations, how do you force users who are working directly in the DB to honor your rules? I know, I know. You shouldn't let users into the back-end directly, but you just try telling that to the data analysts when they have a project for corporate and you are the bottleneck.
As to the vague error. Wouldn't your argument be better stated as RDBMS X has vague errors when data fails foreign key constraint checks? The way you have generalized it, you could also argue that we should use paper ledgers instead of computers because the constraint had a vague error.
(2) Was there a point in the past where RDBMS were useful for some reason or is there a reason they are useful now that I'm not aware of?
Yeah, that would be now, yesterday and probably long into the future.
I could go on forever about the reasons, but here is the big one...
It provides a common structured file format that is easy to extend, leverage by other applications. You may be too young to remember when every dang system had it's own proprietary structured file format, but it sucked. Plus, it forced you re-invent the wheel constantly in terms of things like indexing, a query language, locking, etc.
"I see no real use for foreign keys and thus no real use for
relational databases"
Judging by this remark, you seem to be underestimating what a relational database is for. Foreign key constraints aren't a defining feature of relational databases and certainly aren't the only reason for using such databases. The relational database model is a powerful and effective way to represent data and it remains so even if you decide you don't want to implement a foreign key constraint. I will therefore assume the question you really meant to ask is: Why are foreign keys useful in relational databases?
A foreign key constraint is just one kind of data integrity constraint. You can of course implement integrity rules outside the database but the DBMS is designed and optimised to do the job for you and is generally the most efficient place to do it because it is closest to the data structures. If you did it outside the database then you would have at least an extra round trip to retrieve the necessary data. You would also have to replicate the DBMS's locking/concurrency model in your application code.
The database optimiser can take advantage of constraints in the database to improve the performance of queries. It can't do that if the rules only exist in your application code.
If you have many applications sharing the same database then implementing data integrity rules in every application is impractical and expensive to maintain. Centralising the constraint logic makes more sense.
Various CASE tools and DBA tools will take advantage of database constraints, can reverse engineer them and use them to assist development and maintenance tasks.
In practice the meaning and function of a database constraint versus some procedural code that validates data only on entry is very different. If X is implemented in a database constraint then I know it is valid for every piece of data in the database. If X is implemented in the application when data is entered then I only know it applies to future data - I can't be sure it applies to everything already in the database (maybe X was only implemented today and didn't apply to the data entered yesterday).
Because they maintain the integrity of the database. If you have all your business logic in the application then in theory they are not needed, but are still useful as a safeguard against bad data.
I'm writing an app in Django where I'd like to make use of implicit inheritence when using ForeignKeys. As far as I'm concerned the only way to handle this nicely is to use django_polymorphic library (no single table inheritence in Django, WHY OH WHY??).
I'd like to know about the performance implications of this solution. What kind of joins are performed when doing polymorphic queries? Does it have to hit the database multiple times as compared to regular queries (the infamous N+1 queries problem)? The docs warn that "the type of queries that are performed aren't handled efficiently by the modern RDBMs"? However it doesn't really tell what those queries are. Any statistics, experiences would be really helpful.
EDIT:
Is there any way of retrieving a list of objects, each being an instance of its actual class with a constant number of queries ?? I thought this is what the aforementioned library does, however now I got confused and I'm not that certain anymore.
Django-Typed-Models is an alternative to Django-Polymorphic which takes a simple & clean approach to solving the single table inheritance issue. It works off a 'type' attribute which is added to your model. When you save it, the class is persisted into the 'type' attribute. At query time, the attribute is used to set the class of the resulting object.
It does what you expect query-wise (every object returned from a queryset is the downcasted class) without needing special syntax or the scary volume of code associated with Django-Polymorphic. And no extra database queries.
In Django inherited models are internally represented through an OneToOneField. If you are using select_related() in a query Django will follow a one to one relation forwards and backwards to include the referenced table with a join; so you wouldn't need to hit the database twice if you are using select_related.
Ok, I've digged a little bit further and found this nice passage:
https://github.com/bconstantin/django_polymorphic/blob/master/DOCS.rst#performance-considerations
So happily this library does something reasonably sane. That's good to know.
In our current codebase we are using MFC db classes to connect to DB2. This is all old code that has been passed onto us by another development team, so we know some of the history but not all.
Most of the Code abstracts away the creation of SQL queries through functions such as Update() and Insert() that prepend something like "INSERT INTO SCHEMA.TABLE" onto a string that you supply. This is done through the recordset classes that sit on top of the database class
The other way to do the SQL queries is to execute them directly on the database class using dbclass.ExecuteSQL(String).
We are wondering what the pro's and cons of each approach is. From our perspective its much easier to do the ExecuteSQL() call, as we dont have to write another class etc, but there must be good reasons to do the other way. we are just not sure what they are.
Any help would be great!
Thanks Mark
Update----
I think I may have misunderstood Dynamic and Static SQL. I think our code always uses Dynamic, so my question really becomes, should I construct the SQL strings myself and do an ExecuteSQL() or should this be abstracted away in a class for each table in the database, as the recordset classes from mfc seem to do?
The ATL OLE DB consumer database classes are absolutely the way to go. Beyond the risks of injection (mentioned by Skurmedel), piles of string-concatenated queries will become impossible to maintain very quickly.
While the ATL classes can be initially tedious, they provide the benefit of strong-typed and named columns, result navigation, connection and session management, etc.
I would try to abstract it away if it's many SQL statements. Managing dozens of different SQL queries quickly become tedious. Also it's easier to validate input that way.