Chapter 6
Class Factories

As described in Chapter 5, “Metaclasses,” Python classes are also objects. Because classes are first-class objects in Python, other powerful patterns become possible. A class factory is one of these patterns. Essentially, this is a function that creates a class, and does so at runtime. This concept allows for the creation of a class whose attributes are determined, for example, as a result of user input.

This chapter covers class factories, first by reviewing generating classes on the fly, and showing how to do so within functions. Then, it covers a couple of common cases where class factories are valuable.

A Review of type

Recall from the discussion in Chapter 5 that, like other objects in Python, classes are instantiated by a class. For example, say that you create a class, Animal, as shown here:

class Animal(object):
    """A class representing an arbitrary animal."""

    def __init__(self, name):
        self.name = name

    def eat(self):
        pass

    def go_to_vet(self):
        pass

The Animal class is responsible for creating Animal objects when its constructor is called. But, in the same way that Animal creates its objects, so, too, is Animal an object itself. Its class is type, a built-in class in Python that creates all other classes.

type is the primary metaclass, and custom metaclasses (as you learned in Chapter 5) subclass type.

It is also possible to invoke type directly to create a class, in lieu of using the class keyword. type takes three positional arguments: name, bases, and attrs, which correspond to the name of the class, the superclass or superclasses for the class (specified as a tuple), and, finally, any attributes for the class, as a dictionary.
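As a quick illustration of both points, the following sketch first confirms that the class of an ordinary class is type, and then calls type with its three positional arguments to build a small class. The Greeter class and its greet method are made-up names used only for this illustration.

```python
class Animal(object):
    """A class representing an arbitrary animal."""


# The class of a class is type.
print(type(Animal) is type)      # True
print(isinstance(Animal, type))  # True


def greet(self):
    return 'Hello from ' + type(self).__name__


# type(name, bases, attrs): the class name, a tuple of superclasses,
# and a dictionary of attributes.
Greeter = type('Greeter', (object,), {'greet': greet})

greeter = Greeter()
print(greeter.greet())  # Hello from Greeter
```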

Understanding a Class Factory Function

A class factory function is exactly what the name implies—a function that creates and returns a class.

Consider the previous Animal class. You can use code to create an equivalent class using type rather than using the class keyword, as shown here:

def init(self, name):
    self.name = name

def eat(self):
    pass

def go_to_vet(self):
    pass


Animal = type('Animal', (object,), {
    '__doc__': 'A class representing an arbitrary animal.',
    '__init__': init,
    'eat': eat,
    'go_to_vet': go_to_vet,
})

This is not ideal, for several reasons. One of these reasons is that it leaves functions in the namespace alongside Animal. It is usually not desirable to use type directly instead of the class keyword unless you really need to do so.

However, sometimes you do, in fact, need to do so. In this kind of case, you can minimize the clutter by wrapping this code in a function, which can then be passed around and used. This is a class factory. Consider the following function for the example Animal class:

def create_animal_class():
    """Return an Animal class, built by invoking the type
    constructor.
    """
    def init(self, name):
        self.name = name

    def eat(self):
        pass

    def go_to_vet(self):
        pass

    return type('Animal', (object,), {
        '__doc__': 'A class representing an arbitrary animal.',
        '__init__': init,
        'eat': eat,
        'go_to_vet': go_to_vet,
    })

What has changed here? The init, eat, and go_to_vet functions that were previously cluttering the namespace (as well as the creation of the Animal class itself) have been moved inside a create_animal_class function.

Now, you can get a custom-built Animal class by calling said function, as shown here:

Animal = create_animal_class()

It is important to note here that multiple calls to create_animal_class will return distinct classes. That is, while the classes returned would all have the same name and the same attributes, they will not actually be the same class. The similarity between those classes is based on the fact that each run of the function assigns the same dictionary keys and similar functions.

In other words, the similarity between the classes that would be returned is contingent. There is no reason why the function could not take one or more parameters and return wildly different classes based on those parameters. In fact, this is the entire purpose of class factory functions.
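A minimal sketch of that idea follows. The create_speaking_animal_class function, its animal_sound parameter, and the speak method are hypothetical names that do not appear elsewhere in this chapter; they exist only to show one parameter producing classes that behave differently.

```python
def create_speaking_animal_class(animal_sound):
    """Return an Animal class whose sound attribute is set from the
    argument passed to the factory.
    """
    class Animal(object):
        # The class attribute is taken from the factory's parameter.
        sound = animal_sound

        def __init__(self, name):
            self.name = name

        def speak(self):
            return '%s says %s' % (self.name, self.sound)

    return Animal


Dog = create_speaking_animal_class('woof')
Cat = create_speaking_animal_class('meow')

print(Dog('Rex').speak())  # Rex says woof
print(Cat('Tom').speak())  # Tom says meow
print(Dog is Cat)          # False
```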

Consider the following distinct classes returned from distinct calls to create_animal_class:

>>> Animal1 = create_animal_class()
>>> Animal2 = create_animal_class()
>>> Animal1
<class '__main__.Animal'>
>>> Animal2
<class '__main__.Animal'>
>>> Animal1 == Animal2
False

Similarly, consider the following instances:

>>> animal1 = Animal1('louisoix')
>>> animal2 = Animal2('louisoix')
>>> isinstance(animal1, Animal1)
True
>>> isinstance(animal1, Animal2)
False

While these classes are both called Animal internally, they are not the same class. They are distinct results from two distinct function runs.

This example creates the Animal class by invoking type, but this is actually not necessary. It is far more straightforward to create the class using the class keyword. This works, even within the function, and then you can return the class at the end of the function:

def create_animal_class():
    """Return an Animal class, built using the class keyword
    and returned afterwards.
    """
    class Animal(object):
        """A class representing an arbitrary animal."""
        def __init__(self, name):
            self.name = name

        def eat(self):
            pass

        def go_to_vet(self):
            pass

    return Animal

It is almost always preferable to create a class using the class keyword rather than by invoking type directly. However, it is not always feasible to do so.

Determining When You Should Write Class Factories

The primary reason to write a class factory function is that a class must be created based on execution-time knowledge, such as user input. The class keyword assumes that you know the attributes you wish to assign to the class (albeit not necessarily the instances) at coding time.

If you do not know the attributes to be assigned to the class at coding time, a class factory function can be a convenient alternative.

Runtime Attributes

Consider the following function that creates a class, but this time, the attributes of that class can vary based on parameters sent to the function:

def get_credential_class(use_proxy=False, tfa=False):
    """Return a class representing a credential for the given service,
    with an attribute representing the expected keys.
    """
    # If a proxy, such as Facebook Connect, is being used, we just
    # need the service name and the e-mail address.
    if use_proxy:
        keys = ['service_name', 'email_address']
    else:
        # For the purposes of this example, all other services use
        # username and password.
        keys = ['username', 'password']

        # If two-factor auth is in play, then we need an authenticator
        # token also.
        if tfa:
            keys.append('tfa_token')

    # Return a class with a proper __init__ method which expects
    # all expected keys.
    class Credential(object):
        expected_keys = set(keys)

        def __init__(self, **kwargs):
            # Sanity check: Do our keys match?

            if self.expected_keys != set(kwargs.keys()):
                raise ValueError('Keys do not match.')

            # Write the keys to the credential object.
            for k, v in kwargs.items():
                setattr(self, k, v)

    return Credential

This get_credential_class function is asking for information about the type of login that is occurring: either a traditional login (with username and password), or a login through a proxy service such as Facebook Connect or OpenID. If it is a traditional login, it also may use two-factor authentication, which adds the need for an authentication token.

The function returns a class (not an instance) that represents the appropriate type of credential. For example, if the use_proxy variable is set to True, then the class will be returned with the expected_keys attribute set to the set {'service_name', 'email_address'}, representing the keys necessary to authenticate through the proxy. Alternate inputs to the function will return a class with a different expected_keys attribute.

Then, the __init__ method on the class itself checks the keyword arguments that it gets against the keys identified in the expected_keys attribute. If they do not match, the constructor raises an error. If they do, it writes the values to the instance.

You were able to create this class within the function using the class keyword, rather than invoking type. Because the class block was within the def block, the class was created locally to the function.
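To make the behavior concrete, here is a condensed, self-contained copy of the factory followed by a short usage sketch. The TFACredential name and the sample values ('alice' and so on) are made up for illustration.

```python
def get_credential_class(use_proxy=False, tfa=False):
    """Condensed version of the class factory shown above."""
    if use_proxy:
        keys = ['service_name', 'email_address']
    else:
        keys = ['username', 'password']
        if tfa:
            keys.append('tfa_token')

    class Credential(object):
        expected_keys = set(keys)

        def __init__(self, **kwargs):
            if self.expected_keys != set(kwargs):
                raise ValueError('Keys do not match.')
            for k, v in kwargs.items():
                setattr(self, k, v)

    return Credential


# Ask the factory for a class, then instantiate that class.
TFACredential = get_credential_class(tfa=True)
print(sorted(TFACredential.expected_keys))
# ['password', 'tfa_token', 'username']

cred = TFACredential(username='alice', password='s3cret',
                     tfa_token='123456')
print(cred.username)  # alice

# Leaving out an expected key is rejected by __init__.
try:
    TFACredential(username='alice', password='s3cret')
except ValueError as exc:
    print('Rejected:', exc)  # Rejected: Keys do not match.
```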

Understanding Why You Should Do This

You may be asking why a class factory is even valuable in this case. After all, there are only three possibilities. These classes could just be hard-coded, rather than dynamically created on the fly. That said, it is easy to extrapolate a case from this example where a hard-coded class is no longer tenable.

After all, there are lots of websites with a non-trivial number of authentication paradigms. For example, some use custom usernames, while others use an e-mail address. For development services, you are likely to have an API key and potentially one or more secret tokens.

There is really no way to programmatically determine what credentials a website requires (at least not reliably), but consider a service that did try to represent credentials from lots of different, supported third-party sites. That service would likely store the required keys and types of values in a database.

Now you have a class with attributes generated based on a database lookup. This is important because database lookups happen at runtime, not at coding time. There is a functionally infinite number of possibilities for how the expected_keys attribute of the classes might need to be written, and it is no longer feasible to code them all up front.

Storing that kind of data in the database also means that, as the data changes, the code need not do so. A website may alter or augment what kind of credentials it supports, and this would require adding or removing rows from the database, but the Credential class would still be up to the task.

Attribute Dictionaries

Just because some attributes are only known at execution time does not always mean that a class factory is the correct approach. Often, attributes can be written to the class on the fly, or a class can simply store a dictionary with an arbitrary set of attributes.

If this is a sufficient solution, it is likely an easier and more straightforward one.

class MyClass(object):
    attrs = {}
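As a rough sketch of both alternatives, the following example stores arbitrary values in a per-instance dictionary and also writes one attribute to the instance on the fly. The Settings class, its get method, and the sample keys are hypothetical names chosen for this illustration.

```python
class Settings(object):
    """Store an arbitrary set of attributes in a plain dictionary."""

    def __init__(self, **attrs):
        # A per-instance dictionary, so that instances do not share
        # one mutable dictionary.
        self.attrs = dict(attrs)

    def get(self, key, default=None):
        return self.attrs.get(key, default)


settings = Settings(username='alice')

# Option 1: add to the attribute dictionary at runtime.
settings.attrs['api_key'] = 'abc123'
print(settings.get('api_key'))           # abc123
print(settings.get('missing', 'n/a'))    # n/a

# Option 2: write an attribute to the instance on the fly.
setattr(settings, 'region', 'us-east')
print(settings.region)                   # us-east
```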

The case where attribute dictionaries are most likely to fall short is a situation where you are subclassing an existing class over which you do not have direct control, and you require the class's existing functionality to work against the modified attributes. You will see a subclassing example shortly.

Fleshing Out the Credential Class

Consider a credentials database with a single table, and that table has two columns: a service name (such as Apple or Amazon), and a credential key (such as username).

This mock database is obviously still far too simple to cover all use cases. In this example, support for alternative modes of login (such as OpenID) has been dropped. Also, the example does not have any concept for presenting credentials in a specific order (username before password, for example). All of this is fine; it is sufficient for a proof of concept.

Now, consider a class factory that reads from this database (which will simply be stored as a CSV flat file) and returns an appropriate class.

import csv


def get_credential_class(service):
    """Return a class representing a credential for the given service,
    with an attribute representing the expected keys.
    """
    # Open our "database".
    keys = []
    with open('creds.csv', 'r') as csvfile:
        for row in csv.reader(csvfile):
            # If this row does not correspond to the service we
            # are actually asking for (e.g., if it is a row for
            # Apple and we are asking for an Amazon credential class),
            # skip it.
            if row[0].lower() != service.lower():
                continue

            # Add the key to the list of expected keys.
            keys.append(row[1])

    # Return a class with a proper __init__ method which expects
    # all expected keys.
    class Credential(object):
        expected_keys = keys

        def __init__(self, **kwargs):
            # Sanity check: Do our keys match?
            if set(self.expected_keys) != set(kwargs):
                raise ValueError('Keys do not match.')

            # Write the keys to the credential object.
            for k, v in kwargs.items():
                setattr(self, k, v)

    return Credential

The inputs for the get_credential_class function have now been entirely replaced. Instead of describing the type of credential, you simply specify whom the credential is for.

For example, a sample CSV “database” might look like this:

Amazon,username
Amazon,password
Apple,email_address
Apple,password
GitHub,username
GitHub,password
GitHub,auth_token

The value that get_credential_class takes is a string, and it corresponds to the first column in the CSV file. Therefore, calling get_credential_class('GitHub') will return a class with expected keys of username, password, and auth_token. The lines in the CSV file corresponding to Apple and Amazon will be skipped.
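The following sketch exercises that call end to end. It assumes a slightly modified factory that accepts the location of the CSV file as a second argument (the path parameter is an addition made for this example, not part of the function above), so that the sample data can be written to a temporary file.

```python
import csv
import os
import tempfile


def get_credential_class(service, path='creds.csv'):
    """Variant of the factory above; the path parameter is an
    assumption added so the example can use a temporary file.
    """
    keys = []
    with open(path, 'r', newline='') as csvfile:
        for row in csv.reader(csvfile):
            if row and row[0].lower() == service.lower():
                keys.append(row[1])

    class Credential(object):
        expected_keys = keys

        def __init__(self, **kwargs):
            if set(self.expected_keys) != set(kwargs):
                raise ValueError('Keys do not match.')
            for k, v in kwargs.items():
                setattr(self, k, v)

    return Credential


rows = [
    ('Amazon', 'username'), ('Amazon', 'password'),
    ('Apple', 'email_address'), ('Apple', 'password'),
    ('GitHub', 'username'), ('GitHub', 'password'),
    ('GitHub', 'auth_token'),
]

# Write the sample "database" to a temporary file and read it back.
with tempfile.TemporaryDirectory() as tmpdir:
    csv_path = os.path.join(tmpdir, 'creds.csv')
    with open(csv_path, 'w', newline='') as csvfile:
        csv.writer(csvfile).writerows(rows)
    GitHubCredential = get_credential_class('GitHub', csv_path)
    AppleCredential = get_credential_class('apple', csv_path)

print(GitHubCredential.expected_keys)
# ['username', 'password', 'auth_token']
print(AppleCredential.expected_keys)
# ['email_address', 'password']
```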

The Form Example

One place where you can see this concept at work is in the forms API of a popular web framework, Django. This framework includes an abstract class, django.forms.Form, which is used to create HTML forms.

Django forms have a custom metaclass that takes the attributes declared on the form and draws a distinction between form fields and form data. Creating a credential form in this API is very easy if you know what your fields are.

from django import forms


class CredentialForm(forms.Form):
    username = forms.CharField()
    password = forms.CharField(widget=forms.PasswordInput)

On the other hand, if you do not know what your fields are (as in the case of the previous example), this is a more complicated task. A class factory becomes the perfect approach.

import csv

from django import forms


def get_credential_form_class(service):
    """Return a form class representing a credential for the given
    service, with attributes representing the expected keys.
    """
    # Open our "database".
    keys = []
    with open('creds.csv', 'r') as csvfile:
        for row in csv.reader(csvfile):
            # If this row does not correspond to the service we
            # are actually asking for (e.g. if it is a row for
            # Apple and we are asking for an Amazon credential class),
            # skip it.
            if row[0].lower() != service.lower():
                continue

            # Add the key to the list of expected keys.
            keys.append(row[1])

    # Put together the appropriate credential fields.
    attrs = {}
    for key in keys:
        field_kw = {}
        if 'password' in key:
            field_kw['widget'] = forms.PasswordInput
        attrs[key] = forms.CharField(**field_kw)

    # Return a form class with the appropriate credential fields.
    metaclass = type(forms.Form)
    return metaclass('CredentialForm', (forms.Form,), attrs)

In this case, you have replaced your custom Credential class with a Django form subclass. It is no longer the case that you are just setting an expected_keys attribute. Rather, you are setting one attribute for each expected key. The previous code puts these together in a dictionary (doing a blatant hand-wave for passwords and PasswordInput), and then creates a new form subclass and returns it.

It is worth calling out explicitly that Django's Form class uses a custom metaclass, which subclasses type. Therefore, it is important that you call its constructor, rather than type directly. You do this on the last two lines by asking forms.Form for its metaclass, and then using that constructor directly.

It is also worth noting that this is a case where it really is necessary to use the metaclass constructor, rather than creating the class using the class keyword. You are not able to create the class using the class keyword here because, even within a function, you would have to create the class and then write the attributes to the class, and the metaclass behavior will not be applied to the attributes assigned to the class after it is built. (Chapter 5 covers this in more detail.)
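The effect can be sketched without Django installed. The Field, FormMeta, and Form classes below are simplified, hypothetical stand-ins (this is not Django's actual implementation); the metaclass merely records which attributes are fields at the moment the class is created.

```python
class Field(object):
    """Hypothetical stand-in for a form field."""


class FormMeta(type):
    """Hypothetical metaclass that collects Field attributes at
    class creation time.
    """
    def __new__(mcs, name, bases, attrs):
        cls = super().__new__(mcs, name, bases, attrs)
        cls.fields = {
            key: value for key, value in attrs.items()
            if isinstance(value, Field)
        }
        return cls


class Form(metaclass=FormMeta):
    pass


# Calling the metaclass directly: the attributes are present when
# the class is built, so the metaclass sees them.
metaclass = type(Form)
Built = metaclass('Built', (Form,), {'username': Field()})
print(sorted(Built.fields))  # ['username']


# Using the class keyword and assigning afterwards: the metaclass
# has already run, so the new attribute is not collected.
class Late(Form):
    pass


Late.username = Field()
print(sorted(Late.fields))   # []
```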

Dodging Class Attribute Consistency

Another reason to write class factory functions deals with how attributes differ between classes and instances.

Class Attributes Versus Instance Attributes

The following two code blocks do not produce equivalent classes or instances:

##########################
###  CLASS ATTRIBUTE   ###
##########################

class C(object):
    foo = 'bar'


##########################
### INSTANCE ATTRIBUTE ###
##########################

class I(object):
    def __init__(self):
        self.foo = 'bar'

The first and most obvious thing that is different about these classes is where the foo attribute can be accessed. It is not particularly surprising that C.foo is a string, and I.foo raises AttributeError.

>>> C.foo
'bar'
>>> I.foo
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: type object 'I' has no attribute 'foo'

After all, foo was defined as an attribute on C, but not on I. Since I is being accessed directly, rather than by way of an instance, the __init__ function has not even run yet. Even if an instance of I had been created, the instance would have the foo attribute while the class would not.

>>> i = I()
>>> i.foo
'bar'
>>> I.foo
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: type object 'I' has no attribute 'foo'

There is, however, a lesser-noticed difference between C and I, which involves what happens if the foo attribute is modified against one of their instances.

Consider the following two instantiated C instances:

>>> c1 = C()
>>> c2 = C()

Now, say you modify the foo attribute on one of them, as shown here:

>>> c1.foo = 'baz'

You see that the c2 instance still uses the attribute of the class, while c1 has its own.

>>> c1.foo
'baz'
>>> c2.foo
'bar'

The lookup happening here is not quite the same. c1 has written an instance attribute, called foo, with the value of 'baz'. c2 has no such instance attribute, but because the class, C, does have a foo attribute, the lookup falls back to the class attribute.

Consider what happens if you modify the class attribute, as shown here:

>>> C.foo = 'bacon'
>>> c1.foo
'baz'
>>> c2.foo
'bacon'

Here, c1.foo was unaffected, because c1 has an instance attribute called foo. However, the value of c2.foo has changed, because it has no such attribute on the instance. Therefore, when the attribute of the class changes, you observe the change on the instance.

You can view this within Python's internal data model by examining the __dict__ attribute of both instances.

>>> c1.__dict__
{'foo': 'baz'}
>>> c2.__dict__
{}

Under normal circumstances, the special __dict__ attribute is what stores all the attributes (and their values) for an object. There are exceptions to this rule. A class may define a custom __getattr__ or __getattribute__ method (as discussed in Chapter 4, “Magic Methods”), or may define a special attribute __slots__, which also introduces alternative attribute behavior. (This is rarely needed except in particular situations where memory use is paramount, and is not discussed in this book.) Notice that c1 has a foo key in its __dict__, and c2 does not.
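One way to see the fallback in action is to delete the instance attribute again. In this small sketch, vars(obj) is used as a shorthand for obj.__dict__; once foo is removed from the instance's __dict__, the lookup goes back to the class attribute.

```python
class C(object):
    foo = 'bar'


c1 = C()
c1.foo = 'baz'

print(vars(c1))          # {'foo': 'baz'}
print('foo' in vars(C))  # True

# Remove the instance attribute; the class attribute is untouched.
del c1.foo

print(vars(c1))          # {}
print(c1.foo)            # bar
```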

The Class Method Limitation

This situation gets really interesting when classes define class methods. Remember that class methods are methods that do not expect or require an instance of the class to execute, but do require the class itself. They are usually declared by decorating a method with the @classmethod decorator, and their first argument is traditionally called cls rather than self.

Consider the following C class with a class method that accesses and returns foo from the class:

class C(object):
    foo = 'bar'

    @classmethod
    def classfoo(cls):
        return cls.foo

In the context of the classfoo method, the foo attribute is being accessed explicitly on the class, rather than on the instance. Re-run the earlier example using the new class definition (create c1 and c2, set c1.foo to 'baz', and then set C.foo to 'bacon'), and then consider the following:

>>> c1.foo
'baz'
>>> c1.classfoo()
'bacon'
>>> c2.classfoo()
'bacon'

There is, in fact, no actual way to access the instance attribute from the class method. That is the entire point of class methods, after all. They do not require an instance.
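For convenience, the whole sequence can be run as a single script. This simply combines the class definition with the earlier assignments to c1.foo and C.foo.

```python
class C(object):
    foo = 'bar'

    @classmethod
    def classfoo(cls):
        return cls.foo


c1 = C()
c2 = C()

c1.foo = 'baz'     # instance attribute on c1 only
C.foo = 'bacon'    # class attribute, shared by all instances

print(c1.foo)         # baz
print(c1.classfoo())  # bacon
print(c2.classfoo())  # bacon
```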

Tying This in with Class Factories

One of the biggest reasons to need class factories is when you are subclassing existing classes that rely on class attributes that must be adjusted.

Essentially, in code that you do not control, if an existing class sets a class attribute that must be customized, class factories are an attractive approach to generating appropriate subclasses with the overridden attributes.

Consider a situation where a class has an attribute that must be overridden at runtime (or where there are too many options for subclassing in static code to be reasonable). In this case, a class factory can be a very useful approach. Following is a continuation of the use of C as an instructive example:

def create_C_subclass(new_foo):
    class SubC(C):
        foo = new_foo
    return SubC

What matters here is that it is not necessary to know what the value of foo should be until the class is created, which is when the function runs. Like most other uses of class factories, then, this is about knowing the attribute value at runtime.

Running your classfoo class method on C subclasses created this way gives you what you expect.

>>> S = create_C_subclass('spam')
>>> S.classfoo()
'spam'
>>> E = create_C_subclass('eggs')
>>> E.classfoo()
'eggs'

It is worth noting that, in many cases, it is much easier to simply create a subclass that accepts this value as part of its __init__ method. However, there are some cases where this is an insufficient solution. If the parent class relies on class methods, for example, then writing a new value to an instance will not cause the class methods to receive the new value, and this model of subclass creation becomes a valuable solution.
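The difference between the two approaches can be sketched side by side. InitSubC is a hypothetical subclass written for this comparison; it accepts the value in __init__, which changes what the instance sees but not what the class method sees.

```python
class C(object):
    foo = 'bar'

    @classmethod
    def classfoo(cls):
        return cls.foo


class InitSubC(C):
    """Hypothetical subclass that takes the value in __init__."""
    def __init__(self, new_foo):
        self.foo = new_foo


def create_C_subclass(new_foo):
    class SubC(C):
        foo = new_foo
    return SubC


# The __init__ approach: the instance attribute changes, but the
# class method still reads the class attribute.
i = InitSubC('spam')
print(i.foo)         # spam
print(i.classfoo())  # bar

# The class factory approach: the class attribute itself changes.
s = create_C_subclass('spam')()
print(s.foo)         # spam
print(s.classfoo())  # spam
```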

Answering the Singleton Question

One thing that can make class factory functions somewhat awkward to use is that, as their name suggests, their responsibility is to return classes, rather than instances of those classes.

This means that if you want an instance, you must call the result of the class factory function to get one. The correct code to instantiate a subclass generated with create_C_subclass, for example, would be create_C_subclass('eggs')().

There is nothing inherently wrong with this, but it is not always what you really want. Sometimes classes created through class factories are functionally singletons. A singleton is a class pattern where only one instance is permitted.

In the case of classes generated in functions, it is possible that the purpose of the function is simply to act like a class constructor. This is problematic if the end developer must constantly think about instantiating the class that comes back.

This is not a requirement, though. If there is not a need to deal with reusing the class elsewhere, or if the class factory is able to handle the reuse itself, it is completely reasonable and useful to simply have the class factory return an instance of the class it creates, rather than the class itself.

To continue the simple example of C, consider this factory:

def CPrime(new_foo='bar'):
    # If `foo` is set to 'bar', then we do not need a
    # custom subclass at all.
    if new_foo == 'bar':
        return C()

    # Create a custom subclass and return an instance.
    class SubC(C):
        foo = new_foo
    return SubC()

Now, calling CPrime will return an instance of the appropriate C subclass with the foo attribute modified as needed.
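A short, self-contained usage sketch follows. It repeats C (with its classfoo class method) and the factory, then inspects what comes back; the default and custom variable names are just for illustration.

```python
class C(object):
    foo = 'bar'

    @classmethod
    def classfoo(cls):
        return cls.foo


def CPrime(new_foo='bar'):
    # No custom subclass is needed for the default value.
    if new_foo == 'bar':
        return C()

    # Create a custom subclass and return an instance.
    class SubC(C):
        foo = new_foo
    return SubC()


default = CPrime()
custom = CPrime('spam')

print(type(default) is C)     # True
print(type(custom).__name__)  # SubC
print(isinstance(custom, C))  # True
print(custom.classfoo())      # spam
```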

One issue with this is that many (probably most) classes do expect arguments to be sent to their __init__ methods, which this function is not able to handle. The pattern for this is simple enough, though. Consider an example of a credential form, with the method retooled to return an instance.

import csv

from django import forms


def get_credential_form(service, *args, **kwargs):
    """Return a form instance representing a credential for the
    given service.
    """
    # Open our "database".
    keys = []
    with open('creds.csv', 'r') as csvfile:
        for row in csv.reader(csvfile):
            # If this row does not correspond to the service we
            # are actually asking for (e.g. if it is a row for
            # Apple and we are asking for an Amazon credential class),
            # skip it.
            if row[0].lower() != service.lower():
                continue

            # Add the key to the list of expected keys.
            keys.append(row[1])

    # Put together the appropriate credential fields.
    attrs = {}
    for key in keys:
        field_kw = {}
        if 'password' in key:
            field_kw['widget'] = forms.PasswordInput
        attrs[key] = forms.CharField(**field_kw)

    # Return a form instance with the appropriate credential fields.
    metaclass = type(forms.Form)
    cls = metaclass('CredentialForm', (forms.Form,), attrs)
    return cls(*args, **kwargs)

This does not actually entail very many changes from the previous class factory. There are really only two changes:

First, *args and **kwargs have been added to the function signature.
Second, the final line now returns an instance of the class that was created, with the *args and **kwargs passed to the instance.
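The same forwarding pattern can be sketched without Django. The Greeting class and the CustomGreeting factory below are hypothetical names used only to show *args and **kwargs being handed to the constructor of the freshly created subclass.

```python
class Greeting(object):
    template = 'Hello, %s'

    def __init__(self, name, punctuation='.'):
        self.name = name
        self.punctuation = punctuation

    def render(self):
        return (self.template % self.name) + self.punctuation


def CustomGreeting(new_template, *args, **kwargs):
    """Create a Greeting subclass with a custom template and
    return an instance of it.
    """
    class SubGreeting(Greeting):
        template = new_template

    # Everything after the first argument goes to __init__.
    return SubGreeting(*args, **kwargs)


greeting = CustomGreeting('Howdy, %s', 'Alice', punctuation='!')
print(greeting.render())               # Howdy, Alice!
print(isinstance(greeting, Greeting))  # True
```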

Now you have an entirely functional class factory, which returns an instance of the form class that it creates. This raises a final point. Now the function is likely indistinguishable from a class to the end developer, unless said end developer inspects the inner workings. Therefore, perhaps it should be presented as one in the naming convention.

def CredentialForm(service, *args, **kwargs):
    [...]

In Python, functions are normally named with all lowercased letters, and with underscores for word separation. However, this is a function that is being used like a class constructor by developers who actually use it, so by changing the naming convention, you present it as a class name.

Conveniently, the name also matches the name of the class used for the instances, because the first argument to the metaclass's constructor, 'CredentialForm', is the internal name of the class.

And, this is Python. If it looks like a duck and quacks like a duck…

Summary

The power of class factories shows itself when it is necessary to have class attributes be determined at runtime, rather than at coding time. The Python language is able to handle this situation precisely because classes are first-class objects, and can be created similarly to how any other object is created.

On the other hand, classes containing unknown attributes add some uncertainty. Their methods must be written to allow for an attribute to be present or absent, where, in other cases, the presence of the attribute may be able to be assumed.
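One common way to write such a method, shown here as a small sketch, is to use getattr with a default value. The describe method and the tfa_token attribute name are assumptions made for this example.

```python
class Credential(object):
    def __init__(self, **kwargs):
        for k, v in kwargs.items():
            setattr(self, k, v)

    def describe(self):
        # The attribute may or may not be present, so supply a
        # default instead of assuming that it exists.
        token = getattr(self, 'tfa_token', None)
        if token is None:
            return 'password only'
        return 'password plus token'


print(Credential(username='alice', password='s3cret').describe())
# password only
print(Credential(username='alice', password='s3cret',
                 tfa_token='123456').describe())
# password plus token
```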

The ability to declare classes at runtime is extremely powerful, but brings with it a tradeoff in simplicity. This is fine. When you encounter a situation where class factories are the right answer, it is often salient, and there is often no other direct way to solve the issue. Put directly, you can be reasonably sure that a class factory is a good approach if it is the simplest approach.

That rule holds true for programming generally, but it is a particularly useful one here.

Chapter 7, “Abstract Base Classes,” discusses abstract base classes, and how to use them to declare and test for the capabilities that a class provides.