RFC: Django models repr

On february 27th, there was an internal bug that made all celery tasks for creating trades fail. These tasks could have been re-executed by taking them from djcelery.task_state table, where you can see failed tasks with their arguments (Incident 20200227_A). However these arguments are stored as strings of their representation, and where the argument is the instance of a Django model, its representation can be harder to decode depending on the model.

Problem Description

Main point is Python’s discussion about __repr__ vs __str__ (or __unicode__)

  • __repr__ goal is to be unambiguous

  • __str__ goal is to be readable

__repr__ should look like a valid Python expression that could be used to recreate an object with the same value

Taking BOS as reference, we have all our models classes implementing __unicode__ method but none implementing its __repr__. Django has however __repr__ method in its Model class that makes the following:

def __repr__(self):
    try:
        u = six.text_type(self)
    except (UnicodeEncodeError, UnicodeDecodeError):
        u = '[Bad Unicode data]'
    return force_str('<%s: %s>' % (self.__class__.__name__, u))

So, it’s a composition of class name plus unicode method, and class where our unicode is ambiguous, we will be losing this capacity to recreate the object from its repr, for example, in the arguments stored in database associated with celery tasks.

This won’t actually solve the initially described issue of re-executing celery tasks, it will be just an option to help handling manual interventions, and general good practice for a Python system.

Unexpected impact example:

We have unicode method in SpotDeal class defined like this:

def __unicode__(self):
    return self.reference

@property
def reference(self):
    if self.deal:
        return self.deal.reference
    return "ORPHAN"

This makes that executing "SpotDeal.objects.all()" in a shell throws 22 queries instead of 1 in database because each unicode representation requires an additional query on Deal table. This is inefficient.

Proposed Change

Define a new repr for all classes and make all models inherit from it.

class ModelRepresentationMixin(object):
    def __repr__(self):
        return force_str('<{}: {}>'.format(self.__class__._meta.model, getattr(self, 'pk')))

class SpotDeal(ModelRepresentationMixin, models.Model):
    ...

Similar implementation of __repr__ method would be needed also in non Model classes like BankAccountInfo, that inherits from object, but is an abstraction for selecting one of BankAccount or BankAccountMask which are Models.

Alternatives

I think we should have this in any case. Not having it already with our current definition of __unicode__ methods seems like a bad decision.

Any other implementation of method is possible as long as it is unambiguous.

We can also implement __repr__ method only in classes where we hav ambiguous __unicode__, or just avoid ambiguous __unicode__ methods (which we know we won’t do in a near future)

Security Impact

AFAIK it shouldn't have any impact. I have to check if rest_framework is able to receive an instance and render it using repr.

Other End User Impact

Support will need to be aware of this, because the existing implementation gives them useful info when they’re debugging a production incident, so we’ll need to let them know to cast the models to str to see what they used to see before.

My main concern with this is the uncertainty of its impact. Is there anything I am missing?

Change will affect as soon as it is deployed in the way elements are shown. If any person is really stuck to what he is habituated to see, it could look weird. This shouldn’t be a problem as far as it is well notified.

Performance Impact

Performance could be even better where repr is rendered with a single method instead of calculations done in some unicode.

Implementation

Assignee(s)

Change only takes a couple of hours to update model inheritance, I can handle it.

References