django - why is the request.POST object immutable?


As the title asks, why did the Django guys decide to implement the request.POST object as a QueryDict (which, of course, in turn makes the whole thing immutable)?
I know you can mutate it by making a copy of the POST data:
post = request.POST.copy()
But why do this? Surely it would be simpler just to make the thing mutable in the first place? Or is it immutable for some other reason that might cause issues?

It's a bit of a mystery, isn't it? Several superficially plausible theories turn out to be wrong on investigation:
So that the POST object doesn't have to implement mutation methods? No: the POST object belongs to the django.http.QueryDict class, which implements a full set of mutation methods including __setitem__, __delitem__, pop and clear. It implements immutability by checking a flag when you call one of the mutation methods. And when you call the copy method you get another QueryDict instance with the mutable flag turned on.
For performance improvement? No: the QueryDict class gains no performance benefit when the mutable flag is turned off.
So that the POST object can be used as a dictionary key? No: QueryDict objects are not hashable.
So that the POST data can be built lazily (without committing to read the whole request body), as claimed here? I see no evidence of this in the code: as far as I can tell, the whole request body is always read, either directly or via MultiPartParser for multipart requests.
To protect you against programming errors? I've seen this claimed, but I've never seen a good explanation of what these errors are, and how immutability protects you against them.
In any case, POST is not always immutable: when the request is multipart, POST is mutable. This seems to put the kibosh on most theories you might think of (unless this behaviour is an oversight).
In summary, I can see no clear rationale in Django for the POST object to be immutable for non-multipart requests.
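The flag-based design described in this answer can be sketched in plain Python. `FlagProtectedDict` below is a hypothetical illustration, not Django's `QueryDict`: the real class is a multi-value dictionary and guards more methods. The sketch only shows the idea that the mutation methods exist but check a flag, and that `copy()` returns a new instance with the flag turned on.

```python
class FlagProtectedDict(dict):
    """Hypothetical sketch of flag-checked immutability (not Django's QueryDict).

    Only the mutation methods named in the answer are guarded here; a complete
    version would also guard update(), setdefault() and popitem().
    """

    def __init__(self, *args, mutable=False, **kwargs):
        self._mutable = True  # allow the initial population
        super().__init__(*args, **kwargs)
        self._mutable = mutable  # then apply the requested state

    def _assert_mutable(self):
        if not self._mutable:
            raise AttributeError("This dictionary instance is immutable")

    def __setitem__(self, key, value):
        self._assert_mutable()
        super().__setitem__(key, value)

    def __delitem__(self, key):
        self._assert_mutable()
        super().__delitem__(key)

    def pop(self, *args):
        self._assert_mutable()
        return super().pop(*args)

    def clear(self):
        self._assert_mutable()
        super().clear()

    def copy(self):
        # A copy always comes back with the mutable flag turned on.
        return FlagProtectedDict(self, mutable=True)
```

Nothing here is faster or hashable because of the flag; it is purely a runtime check, which matches the answer's point that immutability buys no performance or hashability.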

If the request was the result of a Django form submission, then it is reasonable for POST to be immutable to ensure the integrity of the data between the form submission and the form validation. However, if the request was not sent via a Django form submission, then POST is mutable as there is no form validation.
You can always do something like this (as per @leo-the-manic's comment):
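# ... inside a view ...
# Note: _mutable is a private QueryDict attribute, not a public API.
mutable = request.POST._mutable
request.POST._mutable = True
try:
    request.POST['some_data'] = 'test data'
finally:
    request.POST._mutable = mutable  # restore the original flag even if an error occurs
# ...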
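If you need this flag toggle in several places, it can be wrapped in a context manager so the original state is always restored. `temporarily_mutable` is a hypothetical helper name; it relies on the same private `_mutable` attribute as the snippet above, which Django does not document and could change.

```python
from contextlib import contextmanager


@contextmanager
def temporarily_mutable(querydict):
    """Hypothetical helper: flip the private _mutable flag, then restore it.

    _mutable is an internal QueryDict attribute, not a documented API.
    """
    previous = querydict._mutable
    querydict._mutable = True
    try:
        yield querydict
    finally:
        querydict._mutable = previous  # restored even if the body raises
```

Usage inside a view would look like `with temporarily_mutable(request.POST) as post: post['some_data'] = 'test data'`. When you only need a modified dictionary for your own code, `request.POST.copy()` avoids touching private state altogether.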

Update:
Gareth Rees was right that points 1 and 3 are not valid in this case, though I think points 2 and 4 still are, so I will leave them here.
(I noticed that the request.POST object of both Pyramid (Pylons) and Django is some form of MultiDict, so perhaps that is a more common practice than making request.POST immutable.)
I can't speak for the Django guys, though it seems to me that it could be because of some of these reasons:
1. Performance. Immutable objects can be "faster" than mutable ones in that they allow substantial optimizations. If an object is immutable, we can allocate space for it at creation time, knowing that the space requirements will not change. Immutability also allows efficient copying and comparison.
Edit: this is not the case for QueryDict, as Gareth Rees pointed out.
2. In the case of request.POST, it seems that nothing on the server side should need to alter the request's data, so immutable objects are a better fit, not to mention their substantial performance advantage.
3. Immutable objects can be used as dict keys, which I suppose could be very useful somewhere in Django.
Edit: my mistake, immutable does not directly imply hashable; hashable objects, however, are typically immutable as well.
4. When you pass request.POST around (especially to third-party plugins and other outside code), you can expect that this request data from the user will remain unchanged.
In some ways these reasons are also generic answers to the "immutable vs. mutable?" question. I am certain there are many more design considerations than those above in the Django case.

I like it being immutable by default.
As pointed out you can make it mutable if you need to but you must be explicit about it.
It is like 'I know that I can make my form debugging a nightmare but I know what I am doing now.'

I found this in a comment on this Stack Overflow answer: https://stackoverflow.com/a/2339963
And it must be immutable so that it can be built lazily. The copy forces getting all the POST data. Until the copy, it may not all be fetched. Further, for a multi-threaded WSGI server to work reasonably well, it's helpful if this is immutable.

Please note: multipart requests have been immutable since Django 1.11:
https://github.com/django/django/blob/stable/1.11.x/django/http/multipartparser.py#L292
They were mutable in previous versions.

Related

Design of a small object pool in C++ which helps reduce "repetitive operations on objects"

I have a system where I have to update certain data members of an object again and again in every "execution path".
The thing is, depending on the type of object, at least 40-60 % of the "data members" that I update are hard coded values.
What I want is to do these hardcodings only once and then use the "hardcodings-already-done" object to update the data members that actually need on-the-fly updating.
This will make my code significantly faster, as I am doing a number of string assignments (50-100 depending on the type of object) as part of "hardcodings".
Clearly I cannot use references to a "hardcodings-already-done" object in an object cache, because once I use that reference to build my final object, there will be a lot of "dirty fields" from updating the "on-the-fly" fields. The same reference cannot be used next time (unless I write an "erase dirty fields" routine).
Any ideas on the design? It feels like this kind of problem is routine. There is probably a well-accepted pattern for this that I don't know of.
Sorry, I have no code, this is basically a design question till now.

I think you're talking about the Prototype design pattern
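A minimal sketch of the Prototype idea, written in Python to match the rest of this page (in C++ the clone step would typically be a copy constructor or a virtual `clone()` method). `Message`, `build_prototype` and `make_message` are hypothetical names: the hard-coded fields are set once on a prototype, and every new object starts as a fresh copy, so the shared template never accumulates "dirty" fields.

```python
import copy


class Message:
    """Hypothetical object with many hard-coded fields and a few dynamic ones."""

    def __init__(self):
        self.fields = {}


def build_prototype():
    # The "hardcodings" are done once, on the prototype only.
    proto = Message()
    proto.fields.update({"version": "1.0", "source": "system-a", "encoding": "utf-8"})
    return proto


PROTOTYPE = build_prototype()


def make_message(payload):
    # Clone the prototype so the shared template is never modified.
    msg = copy.deepcopy(PROTOTYPE)
    msg.fields["payload"] = payload  # only the on-the-fly fields are set here
    return msg
```

Whether copying is actually cheaper than redoing the assignments depends on the object and the language; in C++ it is worth measuring a plain copy against the original string assignments.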

What exactly is a safe method in REST web services?

I am absolutely new to REST and I have the following doubt about which methods in REST are safe and which are idempotent.
I know (but I could be wrong) that the GET, HEAD, OPTIONS and TRACE methods are defined as safe because they are only intended for retrieving data.
But now I am reading this article: http://restcookbook.com/HTTP%20Methods/idempotency/ and it says that:
Safe methods are HTTP methods that do not modify resources. For
instance, using GET or HEAD on a resource URL, should NEVER change the
resource.
Up to here it is OK; it is nothing different from what I already know, but then it asserts that:
However, this is not completely true. It means: it won't change the
resource representation. It is still possible, that safe methods do
change things on a server or resource, but this should not reflect
in a different representation.
What exactly does this assertion mean? What exactly is a representation? And what does it mean that a safe method can change a resource, but that this change is not reflected in a different representation?
Then it also gives this example:
GET /blog/1234/delete HTTP/1.1
and says that it would be incorrect if this actually deleted the blog post, asserting that:
Safe methods are methods that can be cached, prefetched without any
repercussions to the resource.

What exactly is a representation?
A "representation" is the data that is returned from the server that represents the state of the object. So if you GET at http://server/puppy/1 it should return a "representation" of the puppy (because, it can't return the actual puppy of course.)
However, this is not completely true. It means: it won't change the
resource representation. It is still possible, that safe methods do
change things on a server or resource, but this should not reflect in
a different representation.
What exactly does this assertion mean?
They mean that if you GET /server/puppy/1 two times in a row, it should give you the same response. However, imagine you have a field that contains the number of times each puppy was viewed. That field is used to provide a page listing the top 10 most viewed puppies. That information is provided via GET /server/puppystats. It is okay for GET /server/puppy/1 to update that information. But it should not update information about the puppy itself. Or, if it DOES update the information about the puppy itself, that information is not part of the representation of the puppy returned by GET /server/puppy/1. It is only part of some other representation that is available via another URL.
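The puppy example can be modelled in a few lines of Python. This is a hypothetical in-memory sketch, not a web framework: `get_puppy` stands in for `GET /server/puppy/1` and records a view, but the view count is only exposed through a different resource (`get_puppy_stats`, standing in for `GET /server/puppystats`), so repeated GETs of the puppy return the same representation.

```python
# Hypothetical in-memory server state; names are illustrative only.
puppies = {1: {"name": "Rex", "breed": "beagle"}}
view_counts = {}


def get_puppy(puppy_id):
    """Safe GET sketch: records a view, but the count is not part of the puppy representation."""
    view_counts[puppy_id] = view_counts.get(puppy_id, 0) + 1
    return dict(puppies[puppy_id])  # representation: only the puppy's own fields


def get_puppy_stats(limit=10):
    """The view counts are exposed by a different resource."""
    ranked = sorted(view_counts.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]
```

By contrast, a handler for `GET /blog/1234/delete` that removed an entry from a dict like `puppies` would change the representation returned by later GETs, which is exactly what the article calls incorrect.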
If it helps, this is a similar concept to the "mutable" keyword in C++ when applied to a const object. "mutable" allows you to modify the object, but it should not modify it in a way that is visible outside of the class.

Django: Class based view instantiated for each request, is it efficient?

From Django documentation:
While your class is instantiated for each request dispatched to it,
class attributes set through the as_view() entry point are configured
only once at the time your URLs are imported.
Won't it be inefficient to instantiate a view per request under heavy concurrent traffic?

Besides the comment from jpmc26, I would guess it's not a big problem. If you follow the workflow Django goes through from when a request comes in until the response is rendered, there are many more steps involved that instantiate objects. The class-based view is probably the least of your problems, assuming you didn't implement it in a blocking way.
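The pattern in the documentation quote can be illustrated with a simplified, hypothetical `SimpleView` class; this is not Django's implementation, whose `as_view()` also validates the keyword arguments and performs extra per-request setup. The point it shows: `as_view()` runs once when the URLconf is loaded, while each call of the returned function creates a small, short-lived instance.

```python
class SimpleView:
    """Simplified, hypothetical stand-in for a class-based view (not Django's code)."""

    greeting = "hello"  # class attribute; can be overridden once via as_view()

    def __init__(self, **initkwargs):
        for key, value in initkwargs.items():
            setattr(self, key, value)

    @classmethod
    def as_view(cls, **initkwargs):
        # Runs once, when the URLs are imported.
        def view(request):
            # Runs per request: a fresh instance each time.
            self = cls(**initkwargs)
            return self.dispatch(request)

        return view

    def dispatch(self, request):
        # Return the instance too, so callers can see that it differs per request.
        return f"{self.greeting}, {request}", self
```

Creating such an instance is just an allocation plus a few attribute assignments, which is small next to the rest of the request/response cycle.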

I'm not sure why you think memory would be an issue (if you had been talking about time taken, you might have had an argument, but see jpmc26's comment).
In CPython, memory deallocation is done primarily by reference counting, not timed garbage collection (a separate cycle collector handles reference cycles). As soon as the last reference to an object goes away, assuming it is not part of a reference cycle, it is destroyed. So if the server has enough memory to serve the request and allocate the object in the first place, there's no danger of it hanging around past its useful lifetime.
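The reference-counting behaviour can be observed with `weakref`, which tracks an object without keeping it alive. The sketch assumes CPython (other implementations such as PyPy may free objects later), and `ViewInstance` is a hypothetical stand-in for a per-request object. It also shows the exception mentioned above: objects in a reference cycle wait for the cycle collector.

```python
import gc
import weakref


class ViewInstance:
    """Hypothetical stand-in for a per-request object."""


def handle_request():
    obj = ViewInstance()
    probe = weakref.ref(obj)  # does not keep obj alive
    return probe  # obj's only strong reference disappears when the function returns


# On CPython the refcount drops to zero on return, so probe() is already None here.
probe = handle_request()


def make_cycle():
    a = ViewInstance()
    b = ViewInstance()
    a.other, b.other = b, a  # reference cycle: refcounting alone cannot free these
    return weakref.ref(a)


cycle_probe = make_cycle()
gc.collect()  # the cycle collector reclaims the pair
```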

Why does django ORM's `save` method not return the saved object?

Any insight into the reasoning behind this design decision? It seems to me that having obj.save() return something, has only benefits (like method chaining) and no drawbacks.

It's generally considered good practice in Python to have functions that primarily affect existing objects not return themselves. For instance, sorted(yourlist) returns a sorted list but yourlist.sort() sorts the list in-place and does not return anything.
Performing multiple operations with side effects (as opposed to side-effect-free functions, where the focus is on the return value) on a single line is not really good practice. The code will be more compact in terms of number of lines, but it will be harder to read because important side effects may be buried in the middle of a chain. If you want to use method chaining, use functions with no side effects at the beginning of the chain and then have a single function with a side effect, like .save(), at the end.
To put it another way, in a method chain, the beginning of the chain is the input, the middle of the chain transforms the input (navigating down a tree, sorting the input, changing case of a string etc) and the end of the chain is the functional part that does work with side-effects. If you bury methods with side-effects in the middle of the chain then it will be unclear what your method chain actually does.
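The input, transform, side-effect ordering can be shown with plain Python. `save` and `saved_rows` are hypothetical stand-ins for a database write: the side-effect-free steps (`strip`, `lower`, `sorted`) build new values, and the single procedure at the end returns `None`, just like `list.sort()` and Django's `save()`.

```python
saved_rows = []  # hypothetical sink standing in for a database table


def save(rows):
    """Procedure: has a side effect and, like list.sort(), returns None."""
    saved_rows.extend(rows)


raw_names = ["  Bob", "alice ", "Carol"]

# Input -> side-effect-free transformations -> one side effect at the end.
cleaned = sorted(name.strip().lower() for name in raw_names)
result = save(cleaned)
```

Because `save` returns `None`, it cannot accidentally be placed in the middle of a chain, which is one practical benefit of the convention described above.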

This reminds me of the general principle that Greg Ward espoused at PyCon 2015 recently: don't confuse functions with procedures. Every function should either return a value or have a side effect, but not both.
Basically the same question is asked of dict.update().

Since this is the first result I get when searching for "django return saved object", to complement Andrew's answer: if you still want to get the saved object back, instead of using:
ExampleModel(title=title).save()
which returns None, you'd use:
saved_instance = ExampleModel.objects.create(title=title)
This works because ExampleModel.objects is a model manager rather than an instance of the model class, so create() is not returning itself; it returns the new instance it built and saved.
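Django documents `create()` as a convenience for creating and saving an object in one step. The plain-Python mimic below is hypothetical (`FakeManager` and this `ExampleModel` are not Django classes) and only illustrates why the manager can return the instance while the instance's own `save()` stays a procedure that returns `None`.

```python
class FakeManager:
    """Hypothetical, simplified analogue of a model manager's create() (not Django's code)."""

    def __init__(self, model):
        self.model = model

    def create(self, **kwargs):
        obj = self.model(**kwargs)  # build the instance
        obj.save()  # save() itself returns None
        return obj  # create() hands back the saved instance


class ExampleModel:
    _table = []  # stand-in for the database table

    def __init__(self, title):
        self.title = title
        self.pk = None

    def save(self):
        ExampleModel._table.append(self)
        self.pk = len(ExampleModel._table)


ExampleModel.objects = FakeManager(ExampleModel)
```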

C++ class design from database schema

I am writing a Perl script to parse a MySQL database schema and create C++ classes when necessary. My question is a pretty easy one, but it is something I haven't really done before, and I don't know the common practice. Any object of any of the generated classes will need "get" methods to populate this information. So my questions are twofold:
Does it make sense to call all of the get methods in the constructor so that the object has data right away? Some classes will have a lot of them, so fetching as needed might make sense too. I have two constructors now: one that populates the data and one that does not.
Should I also have another "get" method that retrieves the object's copy of the data rather than the DB copy?
I could go both ways on #1 and am leaning towards yes on #2. Any advice, pointers would be much appreciated.

Usually, the most costly part of an application is round trips to the database, so it would be much more efficient to populate all your data members from a single query than to fetch them one at a time, either on an as-needed basis or from your constructor. Once you've paid for the round trip, you may as well get your money's worth.
Also, in general, your get* methods should be declared as const, meaning they don't change the underlying object, so having them go out to the database to populate the object would break that (which you could allow by making the member variables mutable, but that would basically defeat the purpose of const).
To break things down into concrete steps, I would recommend:
Have your constructor call a separate init() method that queries the database and populates your object's data members.
Declare your get* methods as const, and just have them return the data members.
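The two steps above can be sketched in Python with an in-memory SQLite database (Python has no `const`, so the getters simply only read data members). The `ArtistRow` class and the `artist` table are hypothetical: the constructor delegates to `init()`, which fetches every column in one round trip, and the getters never touch the database.

```python
import sqlite3


class ArtistRow:
    """Hypothetical class mirroring one table row; getters never query the database."""

    def __init__(self, conn, artist_id):
        self.init(conn, artist_id)

    def init(self, conn, artist_id):
        # One round trip fetches every column at once.
        row = conn.execute(
            "SELECT name, genre, country FROM artist WHERE id = ?", (artist_id,)
        ).fetchone()
        if row is None:
            raise LookupError(f"no artist with id {artist_id}")
        self._name, self._genre, self._country = row

    # Getters only read data members (the analogue of const getters in C++).
    def get_name(self):
        return self._name

    def get_genre(self):
        return self._genre

    def get_country(self):
        return self._country


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT, genre TEXT, country TEXT)")
conn.execute("INSERT INTO artist VALUES (1, 'Michael Jackson', 'pop', 'US')")
```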

First realize that you're re-inventing the wheel here. There are a number of decent object-relational mapping libraries for database access in just about every language. For C/C++ you might look at:
http://trac.butterfat.net/public/StactiveRecord
http://debea.net/trac
Ok, with that out of the way, you probably want to create a static method in your class called find or search which is a factory for constructing objects and selecting them from the database:
Artist* mj = Artist::Find("Michael Jackson");
mj->set("relevant", "no");
mj->save();
Note the save method, which takes the modified object and stores it back into the database. If you actually want to create a new record, then you'd use new to instantiate an empty object:
Artist* stackOverflow = new Artist();
stackOverflow->set("relevant", "yes");
stackOverflow->save();
Note that the set and get methods here just set and get the values on the object, not in the database. To actually read from or write to the database you'd use the static Find method or the object's save method.
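The same Active Record shape (a static `find` factory, in-memory `get`/`set`, and a `save` that writes back) can be sketched in Python with SQLite. Everything here is hypothetical and illustrative: the `Artist` class, the `artist` table and its columns are not from any of the libraries linked above.

```python
import sqlite3

# Hypothetical in-memory database for the sketch.
CONN = sqlite3.connect(":memory:")
CONN.execute("CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT, relevant TEXT)")


class Artist:
    """Hypothetical Active Record sketch: find() and save() use the DB; get/set do not."""

    def __init__(self, row_id=None, **values):
        self.id = row_id
        self._values = dict(values)

    @classmethod
    def find(cls, name):
        # Static factory: select a row and construct an object from it.
        row = CONN.execute(
            "SELECT id, name, relevant FROM artist WHERE name = ?", (name,)
        ).fetchone()
        return None if row is None else cls(row[0], name=row[1], relevant=row[2])

    def get(self, key):
        return self._values.get(key)

    def set(self, key, value):
        self._values[key] = value  # in-memory only

    def save(self):
        # Write the in-memory state back: INSERT for new objects, UPDATE otherwise.
        name, relevant = self.get("name"), self.get("relevant")
        if self.id is None:
            cur = CONN.execute(
                "INSERT INTO artist (name, relevant) VALUES (?, ?)", (name, relevant)
            )
            self.id = cur.lastrowid
        else:
            CONN.execute(
                "UPDATE artist SET name = ?, relevant = ? WHERE id = ?",
                (name, relevant, self.id),
            )
```

Until `save()` is called, a `set()` only changes the object, so a second `find()` still sees the stored value.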

There are existing tools that reverse-engineer database schemas into Java classes (and probably other languages). Consider using one of them and converting the result to C++.

I would not recommend having your get methods go to the database at all, unless absolutely necessary for your particular problem. It makes for a lot more places something could go wrong, and probably a lot of unnecessary reads on your DB, and could inadvertently tie your objects to db-specific features, losing a lot of the benefits of a tiered architecture. As far as your domain model is concerned, the database does not exist.
Edit: this is for #2 (obviously). For #1 I would say no, for many of the same reasons.

Another alternative would be to not automate creating the classes, and instead create separate classes that only contain the data members that individual executables are interested in, so that those classes only pull the necessary data.
Don't know how many tables we're talking about, though, so that may explode the scope of your project.
