Usage of .to_representation() and .to_internal_value in django-rest-framework? - django

What do .to_representation() and .to_internal_value() do in serializers?
If I pass data to a serializer, is the data passed through .to_representation() first?
What is the usage of these two methods?

If you want to create a custom field, you'll need to subclass Field
and then override either one or both of the .to_representation() and
.to_internal_value() methods. These two methods are used to convert
between the initial datatype, and a primitive, serializable datatype.
Primitive datatypes will typically be any of a number, string,
boolean, date/time/datetime or None. They may also be any list or
dictionary like object that only contains other primitive objects.
Other types might be supported, depending on the renderer that you are
using.
The .to_representation() method is called to convert the initial
datatype into a primitive, serializable datatype.
The .to_internal_value() method is called to restore a primitive
datatype into its internal Python representation. This method should
raise a serializers.ValidationError if the data is invalid.
Note that the WritableField class that was present in version 2.x no
longer exists. You should subclass Field and override
to_internal_value() if the field supports data input.
Ref:
http://www.django-rest-framework.org/api-guide/fields/#custom-fields
https://github.com/tomchristie/django-rest-framework/blob/master/rest_framework/serializers.py#L417
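For illustration, here is a sketch of such a conversion pair written as a plain Python class so it runs without DRF installed; a real custom field would subclass rest_framework.serializers.Field and raise serializers.ValidationError instead of ValueError (the ColorField example itself is hypothetical):

```python
class ColorField:
    """Sketch of a DRF-style custom field: stores a color internally as an
    (r, g, b) tuple, represents it externally as the string 'rgb(r, g, b)'.
    A real implementation would subclass rest_framework.serializers.Field."""

    def to_representation(self, value):
        # internal datatype -> primitive, serializable datatype (a string)
        r, g, b = value
        return "rgb(%d, %d, %d)" % (r, g, b)

    def to_internal_value(self, data):
        # primitive datatype -> internal Python representation;
        # invalid input raises (DRF would raise serializers.ValidationError)
        if not (data.startswith("rgb(") and data.endswith(")")):
            raise ValueError("Invalid color format")
        return tuple(int(p) for p in data[4:-1].split(","))

field = ColorField()
print(field.to_representation((255, 0, 0)))       # rgb(255, 0, 0)
print(field.to_internal_value("rgb(255, 0, 0)"))  # (255, 0, 0)
```

So when serializing (reading), .to_representation() runs; when deserializing (writing), incoming data goes through .to_internal_value() and validation.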

Related

How model objects are sorted in Django

What is performed behind the scenes for the following code: sorted(MyModel.objects.all())?
Is it __lt__? How is it defined?
Thank you.
It just converts the queryset into a list and tries to sort the objects. Django does not define comparison methods in the model base class, so there is no meaningful default ordering: in Python 2 plain object comparison effectively fell back to something arbitrary like the memory address, and in Python 3 it raises a TypeError.
If you want the database to sort it for you, use order_by on the queryset; alternatively, you can provide a key to sorted itself. Note that key must be a callable, not a field name:
sorted(MyModel.objects.all(), key=lambda obj: obj.pk)
You can use the ordering option in the model's Meta class to define a default sort field.
ordering = ['-order_date'] # order_date field, reverse sorted
See Model Meta ordering for more information.
Actually, in Python 3 at least, this code will give you an error:
'<' not supported between instances of 'MyModel' and 'MyModel'
This is because, as noted in other answers, models.Model does not define any comparison methods; in this case Python does not guess, as recommended by the Zen of Python:
In the face of ambiguity, refuse the temptation to guess.
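To see the same behaviour without Django, here is a plain-Python illustration of why sorted() fails without __lt__ and the two ways around it (the classes are hypothetical stand-ins for model instances):

```python
import functools

class Plain:
    """No comparison methods defined, like models.Model."""
    def __init__(self, pk):
        self.pk = pk

@functools.total_ordering
class Ordered:
    """Defines __eq__ and __lt__; total_ordering fills in the rest."""
    def __init__(self, pk):
        self.pk = pk
    def __eq__(self, other):
        return self.pk == other.pk
    def __lt__(self, other):
        return self.pk < other.pk

# Without __lt__, sorted() raises:
try:
    sorted([Plain(2), Plain(1)])
except TypeError as e:
    print(e)  # '<' not supported between instances of 'Plain' and 'Plain'

# A key function works without any comparison methods:
print([o.pk for o in sorted([Plain(2), Plain(1)], key=lambda o: o.pk)])  # [1, 2]

# Defining __lt__ makes sorted() work directly:
print([o.pk for o in sorted([Ordered(3), Ordered(1), Ordered(2)])])  # [1, 2, 3]
```

The same applies to model instances: either pass a key callable, or define __lt__ (or ordering in Meta) yourself.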

Ensuring that all fields of a class are serialized

I am writing serialize / de-serialize methods on an existing C++ class that contains several fields where most of these fields are complex, nested data structures themselves. What are some techniques I could use to ensure that I had serialized all the fields? I am using the Boost serialization library for this project.
One method I can think of is to write an 'equals' function for all the data types and write unit tests asserting that the original object equals the de-serialized object. However, this method seems error-prone since, amongst other things, the equals function would need updating whenever a field was added to a data type.

Finding protocol buffer message type from serialized data

I have some binary data, which was obtained by serializing a google protocol buffer class.
How do I find out, at runtime, the class for which the data was serialized?
For example, suppose I have a class abc. I serialize this class abc into binary data.
Is there any way of validating that this binary data was obtained by serializing class abc, and not some other class?
Further, if I parse this binary data of class abc with the parse method of a class xyz, how would I know whether the parse was successful?
protobuf does not include any type information on the wire (unless you add that yourself, external to protobuf). As such, you cannot strictly validate that - which is actually a good thing, because it means that types are interchangeable and compatible. As long as class abc has a contract compatible with the other type, it will work. By "compatible" here, I mean: for any field numbers that are common to both, they have compatible wire types. If abc declares field 4 to be a string, and the other class declares field 4 to be a double-precision number, then it will fail at deserialization.
One other "signal" you could use is the omission of required fields: if abc always includes field 3, but you get data that omits field 3, then it probably isn't an abc. Note that protobuf is designed to be version tolerant, though: you can't assume that extra fields mean it isn't an abc, as it could be that the data is using a later version of the contract, or is using extension fields. Likewise, missing optional fields could be missing because either they simply chose not to provide a value, or that field is not declared on the version of the contract they are using.
Re testing for a successful parse: that will be implementation-specific. I would imagine that the C++ implementation has either a return value to check or a flag field. I don't use that API myself so I cannot say. On some other platforms (Java, .NET, etc.) I would expect an exception to be thrown if there was a critical issue.
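Since the wire carries only field numbers and wire types, the most you can recover from the bytes alone is a structural check. Here is a pure-Python sketch of such a scan (it does not use the protobuf library; the function name and error handling are my own, but the varint/tag layout follows the documented protobuf wire format):

```python
def scan_fields(buf):
    """Walk a serialized protobuf message and return the set of
    (field_number, wire_type) pairs it contains, skipping payloads.
    Raises ValueError if the buffer is not well-formed."""
    def read_varint(i):
        result, shift = 0, 0
        while True:
            if i >= len(buf):
                raise ValueError("truncated varint")
            b = buf[i]
            result |= (b & 0x7F) << shift
            i += 1
            if not b & 0x80:
                return result, i
            shift += 7

    fields, i = set(), 0
    while i < len(buf):
        tag, i = read_varint(i)
        field_number, wire_type = tag >> 3, tag & 0x7
        if wire_type == 0:        # varint
            _, i = read_varint(i)
        elif wire_type == 1:      # 64-bit
            i += 8
        elif wire_type == 2:      # length-delimited (string, bytes, message)
            length, i = read_varint(i)
            i += length
        elif wire_type == 5:      # 32-bit
            i += 4
        else:
            raise ValueError("unknown wire type %d" % wire_type)
        if i > len(buf):
            raise ValueError("truncated field payload")
        fields.add((field_number, wire_type))
    return fields

# Field 1 encoded as varint 150: tag byte 0x08, payload 0x96 0x01
print(scan_fields(b"\x08\x96\x01"))  # {(1, 0)}
```

A scan like this can tell you the data is *structurally* valid and which field numbers it uses - which is exactly the compatibility check described above - but it can never tell you the message's type name, because that is simply not on the wire.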

Scala empty a list

I have a member variable in a class:
val options = mutable.LinkedList[SelectOption]()
I later populate this list from the database.
At some point I want to refresh the list. How do I empty it?
In java:
options.clear();
Is there an equivalent in Scala?
Do not use LinkedList. That is a low-level collection which provides a data structure that can be manipulated at the user's will... and responsibility.
Instead, use one of the Buffer classes, which have a clear method. This method, by the way, is inherited from the Clearable trait, so you can just look at the classes that extend Clearable.

Pre-serialisation message objects - implementation?

I have a TCP client-server setup where I need to be able to pass messages of different formats at different times, using the same transmit/receive infrastructure.
Two different types of messages sent from client to server might be:
TIME_SYNC_REQUEST: Requesting server's game time. Contains no information other than the message type.
UPDATE: Describes all changes to game state that happened since the last update that was posted (if this is not the very first one after connecting), so that the server may update its data model where it sees fit.
(The message type to be included in the header, and any data to be included in the body of the message.)
In dynamic languages, I'd create an AbstractMessage type, and derive two different message types from it, with TimeSyncRequestMessage accommodating no extra data members, and UpdateMessage containing all necessary members (player position etc.), and use reflection to see what I need to actually serialise for socket send(). Since the class name describes the type, I would not even need an additional member for that.
In C++: I do not wish to use dynamic_cast to mirror the approach described above, for performance reasons. Should I use a compositional approach, with dummy members filling in for any possible data, and a char messageType? I guess another possibility is to keep different message types in differently-typed lists. Is this the only choice? Otherwise, what else could I do to store the message info until it's time to serialise it?
Maybe you can let the message class do the serialization itself - define a Serialize interface that each message implements. Then, when you want to serialize and send, you call AbstractMessage::Serialize() to get the serialized data.
Unless you have very strict performance requirements, I would use a self-describing message format. This typically uses a common format (say key=value) but no fixed structure; instead, known attributes describe the type of the message, and any other attributes can then be extracted from the message using logic specific to that message type.
I find this type of messaging retains better backward compatibility - if you want to add new attributes, you can add away, and older clients will simply not see them. Messaging that uses fixed structures tends to fare less well.
EDIT: More information on self-describing message formats. Basically, the idea is that you define a dictionary of fields - the universe of fields that your generic message can contain. A message must contain some mandatory fields by default, and then it's up to you what other fields are added to it. The serialization/deserialization is pretty straightforward: you construct a blob with all the fields you want to add, and at the other end you construct a container holding all the attributes (imagine a map). The mandatory fields can describe the type: for example, you can have a field in your dictionary which is the message type, set for all messages. You interrogate this field to determine how to handle the message. Once you are in the handling logic, you simply extract the other attributes the logic needs from the container (map) and process them.
This approach affords the best flexibility, and allows you to do things like transmitting only the fields that have really changed. How you keep this state on either side is up to you - but given that you have a one-to-one mapping between message and handling logic, you need neither inheritance nor composition. The smartness in this type of system stems from how you serialize the fields (and deserialize them, so that you know which dictionary attribute each field is). For an example of such a format, look at the FIX protocol - I wouldn't advocate it for gaming, but the idea should demonstrate what a self-describing message is.
EDIT2: I cannot provide a full implementation, but here is a sketch.
Firstly let me define a value type - this is the typical type of values which can exist for a field:
typedef boost::variant<std::int32_t, std::int64_t, double, std::string> value_type;
Now I describe a field
struct field
{
    int field_key;
    value_type field_value;
};
Now here is my message container
struct Message
{
    field type;
    field size;
    container<field> fields; // I use a generic "container", you can use whatever you want (map/vector etc. depending on how you want to handle repeating fields etc.)
};
Now let's say that I want to construct a message which is the TIME_SYNC update, use a factory to generate me an appropriate skeleton
std::unique_ptr<Message> getTimeSyncMessage()
{
    std::unique_ptr<Message> msg(new Message);
    msg->type = { dict::field_type, TIME_SYNC }; // set the type
    // set other default attributes for this message type
    return msg;
}
Now, I want to set more attributes, and this is where I need a dictionary of supported fields for example...
namespace dict
{
    static const int field_type = 1; // message type field id
    // fields that you want
    static const int field_time = 2;
    :
}
So now I can say,
std::unique_ptr<Message> msg = getTimeSyncMessage();
msg->setField(field_time, some_value);
msg->setField(field_other, some_other_value);
: // etc.
Now the serialization of this message when you are ready to send is simply stepping through the container and adding to the blob. You can use ASCII encoding or binary encoding (I would start with former first and then move to latter - depending on requirements). So an ASCII encoded version of the above could be something like:
1=1|2=10:00:00.000|3=foo
Here, for argument's sake, I use a | to separate the fields; you can use something else that you can guarantee doesn't occur in your values. With a binary format this is not relevant, as the size of each field can be embedded in the data.
The deserialization would step through the blob, extract each field appropriately (by separating on | in this example), use the factory methods to generate a skeleton (once you've got the type - field 1), and then fill in all the attributes in the container. Later, when you want to get a specific attribute, you can do something like:
msg->getField(field_time); // this will return the variant - and you can use boost::get for the specific type.
I know this is only a sketch, but hopefully it conveys the idea behind a self-describing format. Once you've got the basic idea, there are lots of optimizations that can be done - but that's a whole other thing...
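For a concrete sense of the tag=value encoding above, here is a minimal runnable sketch (in Python for brevity; the field ids, separator, and helper names are illustrative, not part of any real protocol):

```python
# Illustrative field dictionary and message type constant
FIELD_TYPE, FIELD_TIME, FIELD_NAME = 1, 2, 3
TIME_SYNC = 1

def serialize(fields, sep="|"):
    # fields: dict of field id -> value; values must not contain the separator
    return sep.join("%d=%s" % (k, v) for k, v in sorted(fields.items()))

def deserialize(blob, sep="|"):
    # rebuild the attribute container (a map of field id -> raw string value)
    out = {}
    for part in blob.split(sep):
        k, _, v = part.partition("=")
        out[int(k)] = v
    return out

msg = {FIELD_TYPE: TIME_SYNC, FIELD_TIME: "10:00:00.000", FIELD_NAME: "foo"}
blob = serialize(msg)
print(blob)                 # 1=1|2=10:00:00.000|3=foo
decoded = deserialize(blob)
print(decoded[FIELD_TYPE])  # '1' - interrogate this field to pick the handler
```

The receiving side inspects the type field first, then pulls whatever other attributes its handler needs out of the map, exactly as the answer describes.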
A common approach is to simply have a header on all of your messages. For example, you might have a header structure that looks like this:
struct header
{
    int msgid;
    int len;
};
Then the stream would contain both the header and the message data. You could use the information in the header to read the correct amount of data from the stream and to determine which type it is.
How the rest of the data is encoded, and how the class structure is setup, greatly depends on your architecture. If you are using a private network where each host is the same and runs identical code, you can use a binary dump of a structure. Otherwise, the more likely case, you'll have a variable length data structure for each type, serialized perhaps using Google Protobuf, or Boost serialization.
In pseudo-code, the receiving end of a message looks like:
read_header( header );
switch( header.msgid )
{
case TIME_SYNC:
read_time_sync( ts );
process_time_sync( ts );
break;
case UPDATE:
read_update( up );
process_update( up );
break;
default:
emit error
skip header.len;
break;
}
What the "read" functions look like depends on your serialization. Google protobuf is pretty decent if you have basic data structures and need to work in a variety of languages. Boost serialization is good if you use only C++ and all code can share the same data structure headers.
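As a runnable sketch of the header framing above (in Python for brevity, with an in-memory stream standing in for the socket; the "<ii" layout and helper names are illustrative):

```python
import io
import struct

HEADER = struct.Struct("<ii")  # msgid, len - width/endianness is illustrative
TIME_SYNC, UPDATE = 1, 2

def write_message(stream, msgid, payload):
    # header first, then the serialized body
    stream.write(HEADER.pack(msgid, len(payload)))
    stream.write(payload)

def read_message(stream):
    # read exactly one header, then exactly len bytes of body
    msgid, length = HEADER.unpack(stream.read(HEADER.size))
    return msgid, stream.read(length)

# Round-trip over an in-memory stream standing in for the socket:
buf = io.BytesIO()
write_message(buf, UPDATE, b"player_pos=3,4")
buf.seek(0)
msgid, payload = read_message(buf)
print(msgid, payload)  # 2 b'player_pos=3,4'
```

Dispatch on msgid (the switch in the pseudocode) then hands the payload to the right deserializer; an unknown msgid can be skipped safely because the header tells you how many bytes to discard.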
A normal approach is to send the message type and then send the serialized data.
On the receiving side, you receive the message type and based on that type, you instantiate the class via a factory method (using a map or a switch-case), and then let the object deserialize the data.
Are your performance requirements really strong enough to rule out dynamic_cast? I do not see how testing a field on a general structure can possibly be faster than that, which leaves only the different lists for different messages: you have to know the type of your object in every case by some other means. But then you can have pointers to an abstract class and do a static cast over those pointers.
I recommend that you re-assess the use of dynamic_cast; I do not think it will be prohibitively slow for a network application.
On the sending end of the connection, in order to construct our message, we keep the message ID and header separate from the message data:
Message is a type that holds only the messageCategory and messageID.
Each such Message is pushed onto a unified messageQueue.
Separate hashes are kept for data pertaining to each of the messageCategory values. In these, there is a record of data for each message of that type, keyed by messageID. The value type depends on the message category, so for a TIME_SYNC message we'd have a struct TimeSyncMessageData, for instance.
Serialisation:
Pop the message from the messageQueue, reference the appropriate hash for that message type, by messageID, to retrieve the data we want to serialise & send.
Serialise & send the data.
Advantages:
No potentially unused data members in a single, generic Message object.
An intuitive setup for data retrieval when the time comes for serialisation.
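A minimal sketch of the queue-plus-hash layout described above (in Python for brevity; all structures and the stand-in serialization are illustrative):

```python
from collections import deque, namedtuple

# Lightweight header: only category and id live on the unified queue
Message = namedtuple("Message", ["category", "message_id"])
# Category-specific payload, like the struct TimeSyncMessageData above
TimeSyncMessageData = namedtuple("TimeSyncMessageData", ["client_time"])

message_queue = deque()               # unified messageQueue
data_by_category = {"TIME_SYNC": {}}  # one hash per category, keyed by messageID

def post_time_sync(message_id, client_time):
    data_by_category["TIME_SYNC"][message_id] = TimeSyncMessageData(client_time)
    message_queue.append(Message("TIME_SYNC", message_id))

def pop_and_serialize():
    # pop the header, look up its data in the category hash, serialize both
    msg = message_queue.popleft()
    data = data_by_category[msg.category].pop(msg.message_id)
    return "%s:%s" % (msg.category, data.client_time)  # stand-in serialization

post_time_sync(7, "10:00:00.000")
print(pop_and_serialize())  # TIME_SYNC:10:00:00.000
```

Since only headers sit on the queue and payloads live in per-category maps, a TIME_SYNC message carries no unused UPDATE members, which is exactly the advantage claimed above.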