Can I return DTOs in the Repository design pattern?

A repository is like a collection of domain objects, so it should not return DTOs or anything else that is not a domain object.
But suppose your domain model has 20 fields holding a large amount of data and you only need 2 of them. You have to fetch the whole row first and then map it, which is very inefficient.

It depends. If you are modeling with DDD and CQRS, you should return Aggregates for commands and ViewModels for queries. You can split repositories into reads and writes. The read side serves views, or REST APIs, in which case you would return DTOs rather than ViewModels, so you only return the data (fields) that the query needs.
In the write stack you should have a single method that returns data, and its return type should be the Aggregate of that specific repository (use lazy loading if you don't want to load all related child collections):
TAggregate GetById(object id)
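As a rough illustration of that read/write split, here is a minimal in-memory sketch in Python. All names (Customer, CustomerSummaryDto, the two repository classes, the dict used as storage) are made up for the example; a real read repository would select only the needed columns in its query instead of filtering a dict.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Customer:
    """Aggregate root: the write side always loads and saves it whole."""
    id: int
    name: str
    email: str
    notes: List[str] = field(default_factory=list)  # stands in for the "large" data


@dataclass(frozen=True)
class CustomerSummaryDto:
    """Read-side projection carrying only the fields a list view needs."""
    id: int
    name: str


class CustomerWriteRepository:
    """Write stack: the only method that returns data returns the aggregate."""

    def __init__(self, rows: Dict[int, dict]) -> None:
        self._rows = rows  # hypothetical storage: one dict per row

    def get_by_id(self, customer_id: int) -> Optional[Customer]:
        row = self._rows.get(customer_id)
        if row is None:
            return None
        return Customer(row["id"], row["name"], row["email"], list(row["notes"]))

    def save(self, customer: Customer) -> None:
        self._rows[customer.id] = {
            "id": customer.id,
            "name": customer.name,
            "email": customer.email,
            "notes": list(customer.notes),
        }


class CustomerReadRepository:
    """Read stack: returns DTOs and never builds the full aggregate."""

    def __init__(self, rows: Dict[int, dict]) -> None:
        self._rows = rows

    def list_summaries(self) -> List[CustomerSummaryDto]:
        return [CustomerSummaryDto(r["id"], r["name"]) for r in self._rows.values()]


storage: Dict[int, dict] = {}
writes = CustomerWriteRepository(storage)
reads = CustomerReadRepository(storage)
writes.save(Customer(1, "Ada", "ada@example.com", ["long note"]))
summaries = reads.list_summaries()
```

The command side keeps working with whole aggregates, while the query side is free to return whatever shape the view needs.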


Synchronized Model instances in Django

I'm building a model for a Django project (my first Django project) and noticed
that instances of a Django model are not synchronized.
a_note = Notes.objects.create(message="Hello") # pk=1
same_note = Notes.objects.get(pk=1)
same_note.message = "Good day"
a_note.message # Still is "Hello"
a_note is same_note # False
Is there a built-in way to make model instances with the same primary key to be
the same object? If yes, (how) does this maintain a globally consistent state of all
model objects, even in the case of bulk updates or changing foreign keys
and thus making items enter/exit related sets?
I can imagine some sort of registry in the model class, which could at least handle simple cases (i.e. it would fail in cases of bulk updates or a change in foreign keys). However, the static registry makes testing more difficult.
I intend to build the (domain) model with high-level functions to do complex
operations which go beyond the simple CRUD
actions of Django's Model class. (Some classes of my model have an instance
of a Django Model subclass, as opposed to being an instance of a subclass. This
is by design, to prevent direct access to the database, which might break consistency, and to separate the business logic from the purely data-access-related Django Model.) A complex operation might touch and modify several components. As a developer
using the model API, it's impossible to know which components are out of date after
calling a complex operation. Automatically synchronized instances would mitigate this issue. Are there other ways to overcome this?
TL;DR "Is there a built-in way to make model instances with the same primary key to be the same object?" No.
A Python object in memory isn't the same thing as a row in your database. So when you create a_note and then fetch same_note from the db, those are two different objects in memory, even though both represent the same underlying row in your database. When you fetch same_note, you in fact instantiate a new Notes object and initialise it with the values fetched from the database.
Then you change and save same_note, but the a_note object in memory isn't changed. If you called a_note.refresh_from_db(), you would see that a_note.message had changed.
Now, a_note is same_note will always be False, because these two objects will always live at different locations in memory. Two variables are the same (the is operator returns True) only if they point to the same object in memory.
But a_note == same_note will return True at any time, since Django defines two instances of the same model to be equal if their pk is the same (and not None).
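The difference between identity and equality can be reproduced without Django. The sketch below uses a plain Python class (Row is a made-up stand-in, not Django's Model) whose __eq__ compares primary keys in roughly the way described above:

```python
class Row:
    """Plain-Python stand-in for a model instance; not Django's Model class."""

    def __init__(self, pk, message):
        self.pk = pk
        self.message = message

    def __eq__(self, other):
        if not isinstance(other, Row):
            return NotImplemented
        if self.pk is None:
            # Unsaved instances are only equal to themselves.
            return self is other
        return self.pk == other.pk

    def __hash__(self):
        # Simplified for the sketch: hash on the primary key.
        return hash(self.pk)


a_note = Row(pk=1, message="Hello")
same_note = Row(pk=1, message="Hello")  # a second object for the same "row"
same_note.message = "Good day"

print(a_note is same_note)  # False: two distinct objects in memory
print(a_note == same_note)  # True: same primary key
print(a_note.message)       # Hello: the first object was never touched
```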
Note that if the complexity you're worried about is that, with multiple requests, one request might change underlying values that another request is using, then use F() expressions to avoid race conditions.
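An F() expression makes the database do the arithmetic on the column instead of Python doing it on a possibly stale in-memory value. The sketch below shows the difference with the standard sqlite3 module rather than Django itself; the notes table and its views column are invented for the example. In Django, something like Notes.objects.filter(pk=1).update(views=F("views") + 1) would issue the second kind of UPDATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, views INTEGER NOT NULL)")
conn.execute("INSERT INTO notes (id, views) VALUES (1, 0)")


def read_views():
    return conn.execute("SELECT views FROM notes WHERE id = 1").fetchone()[0]


# Read-modify-write: two "requests" both read 0, then each writes back 1,
# so one of the two increments is lost.
stale_a = read_views()
stale_b = read_views()
conn.execute("UPDATE notes SET views = ? WHERE id = 1", (stale_a + 1,))
conn.execute("UPDATE notes SET views = ? WHERE id = 1", (stale_b + 1,))
lost_update_result = read_views()

# In-database increment: each UPDATE works on the current column value,
# so both increments are applied.
conn.execute("UPDATE notes SET views = 0 WHERE id = 1")
conn.execute("UPDATE notes SET views = views + 1 WHERE id = 1")
conn.execute("UPDATE notes SET views = views + 1 WHERE id = 1")
atomic_result = read_views()

print(lost_update_result, atomic_result)  # 1 2
```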
Within one request, since everything is sequential and single-threaded, there's no risk of variables going out of sync: you know the order in which things are done and can therefore always call refresh_from_db() when you know a previous method call might have changed the database value.
Note also: having two variables holding the same row means you'll have performed two queries to your db, which is the one thing you want to avoid at all costs. So you should think about why you have this situation in the first place.

What are the trade-offs in Cloud Datastore for list property vs multiple properties vs ancestor key?

My application has models such as the following:
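import attr

@attr.s
class Employee:
    # attr.ib's first positional argument is the default, so pass the type by keyword
    name = attr.ib(type=str)
    department = attr.ib(type=int)
    organization_unit = attr.ib(type=int)
    pay_class = attr.ib(type=int)
    cost_center = attr.ib(type=int)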
It works okay, but I'd like to refactor my application to more of a microkernel (plugin) pattern, where there is a core Employee model that might just have the name, and plugins can add other properties. I imagine one possible solution might be:
@attr.s
class Employee:
    name = attr.ib(type=str)
    labels = attr.ib(type=list)
An employee might look like this:
name='John Doe'
Perhaps another solution would be to just create an entity for each "label" with the core employee as the ancestor key. One concern with this solution is that writes to an entity group are currently limited to 1 per second, although that limitation will go away (hopefully soon) once Google upgrades existing Datastores to the new "Cloud Firestore in Datastore mode".
I suppose an application-level trade-off between the list property and ancestor key approaches is that the list approach couples plugins more tightly with the core, whereas the ancestor key approach has a somewhat more decoupled data schema (though not entirely).
Are there any other trade-offs I should be concerned with, performance or otherwise?
Personally I would go with multiple properties for many reasons, but it's possible to mix all of these solutions for varying degrees of flexibility, as required by the app. The main trade-offs are:
a) You can't do joins in Datastore, so storing related data in multiple entities will prevent querying with complex where clauses (ancestor key approach)
b) You can't do range queries if you store numeric and date fields as labels (list property approach)
c) The index could be large and expensive if you index your labels field and only a small set of the labels actually needs to be indexed
So, one way to mix all three is:
a) For your static data and application logic, use multiple properties.
b) For dynamic data that is not going to be used for querying, you can use a list of labels.
c) For pluggable data that a plugin needs to query on but doesn't need to join with the static data, you can create another entity that again uses a) and b), so the plugin stores all related data together.
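To make the mixed layout a bit more concrete, here is a small sketch using plain Python dataclasses and in-memory lists. It does not use the Datastore client API; PayrollRecord, the label values and the sample data are all invented for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Employee:
    # (a) static data the core application queries on: real properties
    name: str
    department: int
    # (b) dynamic data that is only displayed, never filtered or sorted on
    labels: List[str] = field(default_factory=list)


@dataclass
class PayrollRecord:
    # (c) hypothetical plugin-owned entity; it refers back to the employee
    # and keeps everything the plugin needs to query on in one place
    employee_name: str
    pay_class: int
    cost_center: int


employees = [
    Employee("John Doe", department=7, labels=["remote", "mentor"]),
    Employee("Jane Roe", department=3),
]
payroll = [
    PayrollRecord("John Doe", pay_class=4, cost_center=120),
    PayrollRecord("Jane Roe", pay_class=2, cost_center=300),
]

# Core query: equality filter on a real property.
in_department_7 = [e.name for e in employees if e.department == 7]

# Plugin query: range filter on the plugin's own numeric property,
# which would not work if pay_class were stored as a string label.
high_pay_class = [p.employee_name for p in payroll if p.pay_class >= 3]
```

The plugin can evolve PayrollRecord on its own, as long as it never needs to filter on core properties and plugin properties in the same query.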

Boolean attribute or new table (Django + PostgreSQL)

Situation: I have a set of Books. A Book can be one of three types: "Test", "Premium", or "Common". Proportion of data: 2%, 15%, 83%. Share of queries per unit of time: 40%, 20%, 40%.
I see several ways to model this in the database:
1. Booleans: is_test, is_premium. If we need only "Test" books: Book.objects.filter(is_test=True). This could be a proxy model, for example. The same applies to premium books;
2. Separate tables: books_test, books_premium, books_common;
3. Choice field: a string in ['Test', 'Premium', 'Common'];
4. A combination of 1 and 2: a books_test table and a books table with an is_premium attribute.
And we need to query this data optimally! All three Book variants are needed on one page. The queryset combinations that occur are: only tests, only common, common + premium, only premium.
If we use variant 1 or 3: one endpoint with a specific filter;
If we use variant 2: one of three endpoints without filters (the frontend has to know which endpoint to use). Or we can create one endpoint with some conditions checked by the backend. Either way, the logic needs to be extended;
Which way is more correct, and why?
If you need to mix different types on one page, separate models/tables would complicate things for no good reason. The same goes for mapping more than two exclusive states to a combination of boolean fields.
This leaves you with a choice field or a separate BookType model containing the choices.
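Here is a rough sketch of the single-table, choice-column layout, written against the standard sqlite3 module so it runs without a Django project. The table, the book_type column and the sample titles are assumptions made for the example. In Django the column would be a CharField with choices, and the combined query would look something like Book.objects.filter(book_type__in=["common", "premium"]).

```python
import sqlite3

BOOK_TYPES = ("test", "premium", "common")

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE books ("
    " id INTEGER PRIMARY KEY,"
    " title TEXT NOT NULL,"
    " book_type TEXT NOT NULL"
    " CHECK (book_type IN ('test', 'premium', 'common'))"
    ")"
)
# One index on the type column serves every combination of types.
conn.execute("CREATE INDEX books_type_idx ON books (book_type)")
conn.executemany(
    "INSERT INTO books (title, book_type) VALUES (?, ?)",
    [
        ("Alpha", "test"),
        ("Bravo", "premium"),
        ("Charlie", "common"),
        ("Delta", "common"),
    ],
)


def titles_for(*types):
    """Return titles for any combination of book types, using one query."""
    unknown = set(types) - set(BOOK_TYPES)
    if unknown:
        raise ValueError(f"unknown book types: {sorted(unknown)}")
    if not types:
        return []
    placeholders = ", ".join("?" for _ in types)
    rows = conn.execute(
        f"SELECT title FROM books WHERE book_type IN ({placeholders}) ORDER BY title",
        types,
    )
    return [title for (title,) in rows]


print(titles_for("test"))               # ['Alpha']
print(titles_for("common", "premium"))  # ['Bravo', 'Charlie', 'Delta']
```

All four combinations from the question (only tests, only common, common + premium, only premium) go through the same endpoint logic; only the filter values change.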

Clojure: Difference between defrecord and defschema

I am new to Clojure. I want to fetch x records with fields from a database and insert records into a database. Which one should I use between defrecord and defschema in this scenario?
Are those the same?
defschema and defrecord do not refer to database schema ("shape of database") nor to records (i.e., rows in relational DBs).
Schema is a library for describing the shape of your data and validating whether some data conforms to this shape. It is similar to the more recent clojure.spec. Clojure Records are custom datatypes, which look a bit like Java classes.
It is easy to be tempted to write "Object Oriented" DB communication with Records for each entity. However, all a database contains is data, which is just lists, maps, sets, and some basic data types. I suggest you keep your data in built-in Clojure data structures, ready at hand, and don't hide it in unnecessary abstractions. (Side note: your DB component, as opposed to a DB entity, may very well be a Clojure Record. For example, lifecycle management with Component uses Records.)
A good place to start would be Honey SQL, which allows you to build SQL queries as Clojure data structures. You get back data and can operate on that data with the full might of Clojure.
Then, when you are comfortable with "laying all your data open (without encapsulation)", go and describe the shape of your data, what is valid and what is not. clojure.spec is a powerful tool for that.

Ember index data -vs- show data

How do people deal with index data (the data usually shown on index pages, like a customer list) -vs- the model detail data?
When somebody goes to the customer/index route -- they only need access to a small subset of the full customer resource. Since I am dealing with legacy data, my customer model has > 10 relationships. It seems wasteful to have the api return a complete and full customer representation for every customer just to render a list/select/index view.
I know those relationships are somewhat lazy-loaded, but it still takes effort on the backend to pull all those relationships in. For some relationships (such as customer->invoices) this could be a large list of ids.
I feel answers to this can be very opinionated. But my two cents:
The API you are drawing on for your data should have an end-point to fetch the subset of data you're interested in, e.g. /api/mini-customer vs /api/customer.
You can then either define two separate models (one to represent the model in the list and one to represent the detailed view), or simply populate the original model with the subset of data and merge the extra data in at a later point.
That said, I've also seen plenty of cases such as the one you describe, where you load all data initially and just display the subset to begin with. If it's reasonable that the data will eventually be used and your page-load constraints can handle it, then this can be an acceptable approach.