As described in Chapter 5, “Metaclasses,” Python classes are also objects. Because classes are first-class objects in Python, other powerful patterns become possible. A class factory is one of these patterns. Essentially, this is a function that creates a class, and does so at runtime. This concept allows for the creation of a class whose attributes are determined, for example, by user input.
This chapter covers class factories, first by reviewing generating classes on the fly, and showing how to do so within functions. Then, it covers a couple of common cases where class factories are valuable.
Recall from the discussion in Chapter 5 that, like other objects in Python, classes are instantiated by a class. For example, say that you create a class, Animal, as shown here:
class Animal(object):
    """A class representing an arbitrary animal."""
    def __init__(self, name):
        self.name = name
    def eat(self):
        pass
    def go_to_vet(self):
        pass
The Animal class is responsible for creating Animal objects when its constructor is called. But just as Animal creates its objects, Animal is itself an object. Its class is type, a built-in class in Python that creates all other classes.
type is the primary metaclass, and custom metaclasses (as you learned in Chapter 5) subclass type.
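A quick interpreter check makes this relationship concrete. The following is only a minimal sketch; the instance name rex is illustrative and does not appear elsewhere in this chapter.

```python
class Animal(object):
    """A class representing an arbitrary animal."""

    def __init__(self, name):
        self.name = name


rex = Animal('rex')

# The class of an instance is Animal...
instance_class = type(rex)
# ...and the class of Animal itself is type.
class_of_class = type(Animal)

print(instance_class is Animal)   # True
print(class_of_class is type)     # True
print(isinstance(Animal, type))   # True
```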
It is also possible to invoke type directly to create a class, in lieu of using the class keyword. type takes three positional arguments: name, bases, and attrs, which correspond to the name of the class, the superclass or superclasses for the class (specified as a tuple), and, finally, any attributes for the class, as a dictionary.
A class factory function is exactly what the name implies—a function that creates and returns a class.
Consider the previous Animal class. You can create an equivalent class using type rather than the class keyword, as shown here:
def init(self, name):
    self.name = name
def eat(self):
    pass
def go_to_vet(self):
    pass
Animal = type('Animal', (object,), {
    '__doc__': 'A class representing an arbitrary animal.',
    '__init__': init,
    'eat': eat,
    'go_to_vet': go_to_vet,
})
This is not ideal, for several reasons. One of these reasons is that it leaves functions in the namespace alongside Animal. It is usually not desirable to use type directly instead of the class keyword unless you really need to do so.
However, sometimes you do, in fact, need to do so. In this kind of case, you can minimize the clutter by wrapping this code in a function, which can then be passed around and used. This is a class factory. Consider the following function for the example Animal class:
def create_animal_class():
    """Return an Animal class, built by invoking the type constructor."""
    def init(self, name):
        self.name = name
    def eat(self):
        pass
    def go_to_vet(self):
        pass
    return type('Animal', (object,), {
        '__doc__': 'A class representing an arbitrary animal.',
        '__init__': init,
        'eat': eat,
        'go_to_vet': go_to_vet,
    })
What has changed here? The init, eat, and go_to_vet functions that were previously cluttering the namespace (as well as the creation of the Animal class itself) have been moved inside a create_animal_class function.
Now, you can get a custom-built Animal class by calling said function, as shown here:
Animal = create_animal_class()
It is important to note here that multiple calls to create_animal_class will return distinct classes. That is, while the classes returned would all have the same name and the same attributes, they will not actually be the same class. The similarity between those classes is based on the fact that each run of the function assigns the same dictionary keys and similar functions.
In other words, the similarity between the classes that would be returned is contingent. There is no reason why the function could not take one or more parameters and return wildly different classes based on those parameters. In fact, this is the entire purpose of class factory functions.
Consider the following distinct classes returned from distinct calls to create_animal_class:
>>> Animal1 = create_animal_class()
>>> Animal2 = create_animal_class()
>>> Animal1
<class '__main__.Animal'>
>>> Animal2
<class '__main__.Animal'>
>>> Animal1 == Animal2
False
Similarly, consider the following instances:
>>> animal1 = Animal1('louisoix')
>>> animal2 = Animal2('louisoix')
>>> isinstance(animal1, Animal1)
True
>>> isinstance(animal1, Animal2)
False
While these classes are both called Animal internally, they are not the same class. They are distinct results from two distinct function runs.
This example creates the Animal class by invoking type, but this is actually not necessary. It is far more straightforward to create the class using the class keyword. This works, even within the function, and then you can return the class at the end of the function:
def create_animal_class():
    """Return an Animal class, built using the class keyword
    and returned afterwards.
    """
    class Animal(object):
        """A class representing an arbitrary animal."""
        def __init__(self, name):
            self.name = name
        def eat(self):
            pass
        def go_to_vet(self):
            pass
    return Animal
It is almost always preferable to create a class using the class keyword rather than by invoking type directly. However, it is not always feasible to do so.
The primary reason to write a class factory function is when it is necessary to create a class based on execution-time knowledge, such as user input. The class keyword assumes that you know the attributes you wish to assign to the class (albeit not necessarily the instances) at coding time.
If you do not know the attributes to be assigned to the class at coding time, a class factory function can be a convenient alternative.
Consider the following function that creates a class, but this time, the attributes of that class can vary based on parameters sent to the function:
def get_credential_class(use_proxy=False, tfa=False):
    """Return a class representing a credential for the given service,
    with an attribute representing the expected keys.
    """
    # If a proxy, such as Facebook Connect, is being used, we just
    # need the service name and the e-mail address.
    if use_proxy:
        keys = ['service_name', 'email_address']
    else:
        # For the purposes of this example, all other services use
        # username and password.
        keys = ['username', 'password']
        # If two-factor auth is in play, then we need an authenticator
        # token also.
        if tfa:
            keys.append('tfa_token')
    # Return a class with a proper __init__ method which expects
    # all expected keys.
    class Credential(object):
        expected_keys = set(keys)
        def __init__(self, **kwargs):
            # Sanity check: Do our keys match?
            if self.expected_keys != set(kwargs.keys()):
                raise ValueError('Keys do not match.')
            # Write the keys to the credential object.
            for k, v in kwargs.items():
                setattr(self, k, v)
    return Credential
This get_credential_class function is asking for information about the type of login that is occurring: either a traditional login (with username and password), or using an OpenID service. If it is a traditional login, it also may use two-factor authentication, which adds the need for an authentication token.
The function returns a class (not an instance) that represents the appropriate type of credential. For example, if the use_proxy variable is set to True, then the class will be returned with the expected_keys attribute set to the set {'service_name', 'email_address'}, representing the keys necessary to authenticate through the proxy. Alternate inputs to the function will return a class with a different expected_keys attribute.
Then, the __init__ method on the class itself checks the keyword arguments that it gets against the keys identified in the expected_keys attribute. If they do not match, the constructor raises an error. If they do, it writes the values to the instance.
You were able to create this class within the function using the class keyword, rather than invoking type. Because the class block was within the def block, the class was created locally to the function.
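The behavior just described can be exercised with a short script. The following is only a sketch: it restates the factory compactly so that the block runs on its own, it assumes the two-factor check applies to traditional logins only (as the prose describes), and the sample values such as 'alice' are made up.

```python
def get_credential_class(use_proxy=False, tfa=False):
    """Compact restatement of the factory, so this block runs alone."""
    if use_proxy:
        keys = ['service_name', 'email_address']
    else:
        keys = ['username', 'password']
        if tfa:
            keys.append('tfa_token')

    class Credential(object):
        expected_keys = set(keys)

        def __init__(self, **kwargs):
            if self.expected_keys != set(kwargs):
                raise ValueError('Keys do not match.')
            for k, v in kwargs.items():
                setattr(self, k, v)

    return Credential


# Ask for a traditional login with two-factor authentication.
TfaCredential = get_credential_class(tfa=True)
cred = TfaCredential(username='alice', password='s3cret', tfa_token='123456')
print(cred.username)  # alice

# Leaving out an expected key is rejected by the constructor.
try:
    TfaCredential(username='alice', password='s3cret')
except ValueError as exc:
    error_message = str(exc)
    print(error_message)  # Keys do not match.
```

Note that the factory is called once to get the class, and the class is then called separately to get an instance.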
You may be asking why a class factory is even valuable in this case. After all, there are only three possibilities. These classes could just be hard-coded, rather than dynamically created on the fly. That said, it is easy to extrapolate a case from this example where a hard-coded class is no longer tenable.
After all, there are lots of websites with a non-trivial number of authentication paradigms. For example, some use custom usernames, while others use an e-mail address. For development services, you are likely to have an API key and potentially one or more secret tokens.
There is really no way to programmatically determine what credentials a website requires (at least not reliably), but consider a service that did try to represent credentials from lots of different, supported third-party sites. That service would likely store the required keys and types of values in a database.
Now, suddenly, you have a class with attributes generated based on a database lookup. This is important because database lookups happen at runtime, not at coding time. You now have a functionally infinite number of possibilities for how the expected_keys attribute of the classes might need to be written, and it is no longer feasible to code them all up front.
Storing that kind of data in the database also means that, as the data changes, the code need not do so. A website may alter or augment what kind of credentials it supports, and this would require adding or removing rows from the database, but the Credential class would still be up to the task.
Just because some attributes are only known at execution time does not always mean that a class factory is the correct approach. Often, attributes can be written to the class on the fly, or a class can simply store a dictionary with an arbitrary set of attributes.
If this is a sufficient solution, it is likely an easier and more straightforward one.
class MyClass(object):
    attrs = {}
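As a rough illustration of the dictionary approach, consider the following sketch. The DictCredential name and the sample keys are hypothetical. The sketch keeps the dictionary on each instance, because a mutable dictionary defined in the class body is shared by every instance of the class.

```python
class DictCredential(object):
    """Hypothetical class that keeps arbitrary attributes in a dictionary."""

    def __init__(self, **attrs):
        # A per-instance dictionary; a dictionary defined in the class
        # body would be shared by every instance.
        self.attrs = dict(attrs)

    def __getattr__(self, name):
        # Only called when normal lookup fails, so fall back to the dict.
        try:
            return self.__dict__['attrs'][name]
        except KeyError:
            raise AttributeError(name)


cred = DictCredential(username='alice', api_key='abc123')
print(cred.attrs['username'])  # alice
print(cred.api_key)            # abc123
```

No class is generated at runtime here; a single, ordinary class handles any set of keys.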
The most common case where attribute dictionaries fall short is a situation where you are subclassing an existing class over which you do not have direct control, and you require the class's existing functionality to work against the modified attributes. You will see a subclassing example shortly.
Consider a credentials database with a single table, and that table has two columns: a service name (such as Apple or Amazon), and a credential key (such as username).
This mock database is obviously still far too simple to cover all use cases. In this example, support for alternative modes of login (such as OpenID) has been dropped. Also, the example does not have any concept for presenting credentials in a specific order (username before password, for example). All of this is fine; it is sufficient for a proof of concept.
Now, consider a class factory that reads from this database (which will simply be stored as a CSV flat file) and returns an appropriate class.
import csv
def get_credential_class(service):
    """Return a class representing a credential for the given service,
    with an attribute representing the expected keys.
    """
    # Open our "database".
    keys = []
    with open('creds.csv', 'r') as csvfile:
        for row in csv.reader(csvfile):
            # If this row does not correspond to the service we
            # are actually asking for (e.g., if it is a row for
            # Apple and we are asking for an Amazon credential class),
            # skip it.
            if row[0].lower() != service.lower():
                continue
            # Add the key to the list of expected keys.
            keys.append(row[1])
    # Return a class with a proper __init__ method which expects
    # all expected keys.
    class Credential(object):
        expected_keys = keys
        def __init__(self, **kwargs):
            # Sanity check: Do our keys match?
            if set(self.expected_keys) != set(kwargs):
                raise ValueError('Keys do not match.')
            # Write the keys to the credential object.
            for k, v in kwargs.items():
                setattr(self, k, v)
    return Credential
The inputs for the get_credential_class function have now been entirely replaced. Instead of describing the type of credential, you simply specify whom the credential is for.
For example, a sample CSV “database” might look like this:
Amazon,username
Amazon,password
Apple,email_address
Apple,password
GitHub,username
GitHub,password
GitHub,auth_token
The value that get_credential_class takes is a string, and it corresponds to the first column in the CSV file. Therefore, calling get_credential_class('GitHub') will return a class with expected keys of username, password, and auth_token. The lines in the CSV file corresponding to Apple and Amazon will be skipped.
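To try this out without creating creds.csv by hand, the following sketch writes the sample rows to a temporary file first. It assumes Python 3, and the extra path parameter is an assumption added here so that the factory can be pointed at that temporary file.

```python
import csv
import os
import tempfile


def get_credential_class(service, path):
    """Variant of the factory; the ``path`` parameter is an assumption
    added so the example does not depend on creds.csv.
    """
    keys = []
    with open(path, 'r', newline='') as csvfile:
        for row in csv.reader(csvfile):
            if row[0].lower() == service.lower():
                keys.append(row[1])

    class Credential(object):
        expected_keys = keys

        def __init__(self, **kwargs):
            if set(self.expected_keys) != set(kwargs):
                raise ValueError('Keys do not match.')
            for k, v in kwargs.items():
                setattr(self, k, v)

    return Credential


# Write a throwaway copy of the sample "database".
rows = [
    ('Amazon', 'username'), ('Amazon', 'password'),
    ('Apple', 'email_address'), ('Apple', 'password'),
    ('GitHub', 'username'), ('GitHub', 'password'), ('GitHub', 'auth_token'),
]
with tempfile.NamedTemporaryFile('w', suffix='.csv', newline='',
                                 delete=False) as tmp:
    csv.writer(tmp).writerows(rows)

try:
    GitHubCredential = get_credential_class('GitHub', tmp.name)
    AppleCredential = get_credential_class('apple', tmp.name)
finally:
    os.remove(tmp.name)

print(GitHubCredential.expected_keys)  # ['username', 'password', 'auth_token']
print(AppleCredential.expected_keys)   # ['email_address', 'password']
```

Because the comparison lowercases both sides, 'apple' matches the Apple rows.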
One place where you can see this concept at work is in the forms API of a popular web framework, Django. This framework includes an abstract class, django.forms.Form, which is used to create HTML forms.
Django forms have a custom metaclass that takes the attributes declared on the form and draws a distinction between form fields and form data. Creating a credential form in this API is very easy if you know what your fields are.
from django import forms
class CredentialForm(forms.Form):
    username = forms.CharField()
    password = forms.CharField(widget=forms.PasswordInput)
On the other hand, if you do not know what your fields are (as in the case of the previous example), this is a more complicated task. A class factory becomes the perfect approach.
import csv
from django import forms
def get_credential_form_class(service):
    """Return a class representing a credential for the given service,
    with attributes representing the expected keys.
    """
    # Open our "database".
    keys = []
    with open('creds.csv', 'r') as csvfile:
        for row in csv.reader(csvfile):
            # If this row does not correspond to the service we
            # are actually asking for (e.g. if it is a row for
            # Apple and we are asking for an Amazon credential class),
            # skip it.
            if row[0].lower() != service.lower():
                continue
            # Add the key to the list of expected keys.
            keys.append(row[1])
    # Put together the appropriate credential fields.
    attrs = {}
    for key in keys:
        field_kw = {}
        if 'password' in key:
            field_kw['widget'] = forms.PasswordInput
        attrs[key] = forms.CharField(**field_kw)
    # Return a form class with the appropriate credential fields.
    metaclass = type(forms.Form)
    return metaclass('CredentialForm', (forms.Form,), attrs)
In this case, you have replaced your custom Credential class with a Django form subclass. It is no longer the case that you are just setting an expected_keys attribute. Rather, you are setting one attribute for each expected key. The previous code puts these together in a dictionary (doing a blatant hand-wave for passwords and PasswordInput), and then creates a new form subclass and returns it.
It is worth calling out explicitly that Django's Form class uses a custom metaclass, which subclasses type. Therefore, it is important that you call its constructor, rather than type directly. You do this on the last two lines by asking forms.Form for its metaclass, and then using that constructor directly.
It is also worth noting that this is a case where it really is necessary to use the metaclass constructor, rather than creating the class using the class keyword. You are not able to create the class using the class keyword here because, even within a function, you would have to create the class and then write the attributes to the class, and the metaclass behavior will not be applied to the attributes assigned to the class after it is built. (Chapter 5 covers this in more detail.)
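The difference can be seen without Django by using a small stand-in. Everything in the following sketch (Field, FieldCollector, and Form) is hypothetical and only imitates the general idea of a metaclass that inspects attributes when the class is created; it is not Django's implementation, and it assumes Python 3 syntax.

```python
class Field(object):
    """Hypothetical stand-in for a form field."""


class FieldCollector(type):
    """Hypothetical metaclass that records Field attributes at creation."""

    def __new__(mcs, name, bases, attrs):
        cls = super().__new__(mcs, name, bases, attrs)
        cls.fields = sorted(k for k, v in attrs.items()
                            if isinstance(v, Field))
        return cls


class Form(metaclass=FieldCollector):
    """Hypothetical base class, playing the role of a form base class."""


# Built through the metaclass: the fields are seen at creation time.
metaclass = type(Form)
Built = metaclass('Built', (Form,), {'username': Field(), 'password': Field()})
print(Built.fields)  # ['password', 'username']


# Attributes attached afterwards are never shown to the metaclass.
class Patched(Form):
    pass

Patched.username = Field()
print(Patched.fields)  # []
```

The attribute assigned to Patched exists on the class, but the metaclass never processed it, which is why the attrs dictionary has to be complete before the metaclass constructor is called.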
Another reason to write class factory functions deals with how attributes differ between classes and instances.
The following two code blocks do not produce equivalent classes or instances:
##########################
### CLASS ATTRIBUTE ###
##########################
class C(object):
    foo = 'bar'
##########################
### INSTANCE ATTRIBUTE ###
##########################
class I(object):
    def __init__(self):
        self.foo = 'bar'
The first and most obvious thing that is different about these classes is where the foo attribute can be accessed. It is not particularly surprising that C.foo is a string, and I.foo raises AttributeError.
>>> C.foo
'bar'
>>> I.foo
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: type object 'I' has no attribute 'foo'
After all, foo was defined as an attribute on C, but not on I. Since I is being accessed directly, rather than by way of an instance, the __init__ function has not even run yet. Even if an instance of I had been created, the instance would have the foo attribute while the class would not.
>>> i = I()
>>> i.foo
'bar'
>>> I.foo
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: type object 'I' has no attribute 'foo'
There is, however, a lesser-noticed difference between C and I, which involves what happens if the foo attribute is modified against one of their instances.
Consider the following two instantiated C instances:
>>> c1 = C()
>>> c2 = C()
Now, say you modify the foo attribute on one of them, as shown here:
>>> c1.foo = 'baz'
You see that the c2 instance still uses the attribute of the class, while c1 has its own.
>>> c1.foo
'baz'
>>> c2.foo
'bar'
The lookup happening here is not quite the same. c1 has written an instance attribute, called foo, with the value of 'baz'. c2, on the other hand, has no such instance attribute. Because the class, C, does have a foo attribute, the lookup on c2 uses the class attribute.
Consider what happens if you modify the class attribute, as shown here:
>>> C.foo = 'bacon'
>>> c1.foo
'baz'
>>> c2.foo
'bacon'
Here, c1.foo was unaffected, because c1 has an instance attribute called foo. However, the value of c2.foo has changed, because it has no such attribute on the instance. Therefore, when the attribute of the class changes, you observe the change on the instance.
You can view this within Python's internal data model by examining the __dict__ attribute of both instances.
>>> c1.__dict__
{'foo': 'baz'}
>>> c2.__dict__
{}
Under normal circumstances, the special __dict__ attribute is what stores all the attributes (and their values) for an object. There are exceptions to this rule. A class may define a custom __getattr__ or __getattribute__ method (as discussed in Chapter 4, “Magic Methods”), or may define a special attribute __slots__, which also introduces alternative attribute behavior. (This is rarely needed except in particular situations where memory use is paramount, and is not discussed in this book.) Notice that c1 has a foo key in its __dict__, and c2 does not.
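One way to confirm that the instance attribute merely shadows the class attribute is to delete it and look again. This is a small sketch using the built-in vars function, which returns an object's __dict__.

```python
class C(object):
    foo = 'bar'


c1 = C()
c1.foo = 'baz'            # creates an instance attribute
print(vars(c1))           # {'foo': 'baz'}

del c1.foo                # removes only the instance attribute
print(vars(c1))           # {}
print(c1.foo)             # bar (the lookup falls back to the class)
print('foo' in vars(C))   # True
```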
This situation gets really interesting when classes define class methods. Remember that class methods are methods that do not expect or require an instance of the class to execute, but do require the class itself. They are usually declared by decorating a method with the @classmethod decorator, and their first argument is traditionally called cls rather than self.
Consider the following C class with a class method that accesses and returns foo from the class:
class C(object):
    foo = 'bar'
    @classmethod
    def classfoo(cls):
        return cls.foo
In the context of the classfoo method, the foo attribute is being accessed explicitly on the class, rather than on the instance. Re-run the example using the new class definition, and then consider the following:
>>> c1.foo
'baz'
>>> c1.classfoo()
'bacon'
>>> c2.classfoo()
'bacon'
There is, in fact, no actual way to access the instance attribute from the class method. That is the entire point of class methods, after all. They do not require an instance.
One of the biggest reasons to need class factories is when you are subclassing existing classes that rely on class attributes that must be adjusted.
Essentially, in code that you do not control, if an existing class sets a class attribute that must be customized, class factories are an attractive approach to generating appropriate subclasses with the overridden attributes.
Consider a situation where a class has an attribute that must be overridden at runtime (or where there are too many options for subclassing in static code to be reasonable). In this case, a class factory can be a very useful approach. Following is a continuation of the use of C as an instructive example:
def create_C_subclass(new_foo):
    class SubC(C):
        foo = new_foo
    return SubC
What matters here is that it is not necessary to know what the value of foo should be until the class is created, which is when the function runs. Like most other uses of class factories, then, this is about knowing the attribute value at runtime.
Running your classfoo class method on C subclasses created this way gives you what you expect.
>>> S = create_C_subclass('spam')
>>> S.classfoo()
'spam'
>>> E = create_C_subclass('eggs')
>>> E.classfoo()
'eggs'
It is worth noting that, in many cases, it is much easier to simply create a subclass that accepts this value as part of its __init__ method. However, there are some cases where this is an insufficient solution. If the parent class relies on class methods, for example, then writing a new value to an instance will not cause the class methods to receive the new value, and this model of subclass creation becomes a valuable solution.
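The two approaches can be compared side by side. In the following sketch, InitC is a hypothetical subclass that takes the value in its __init__ method; C and create_C_subclass are repeated so that the block runs on its own.

```python
class C(object):
    foo = 'bar'

    @classmethod
    def classfoo(cls):
        return cls.foo


class InitC(C):
    """Hypothetical subclass that takes the value in __init__."""

    def __init__(self, new_foo):
        self.foo = new_foo


def create_C_subclass(new_foo):
    class SubC(C):
        foo = new_foo
    return SubC


via_init = InitC('spam')
print(via_init.foo)            # spam
print(via_init.classfoo())     # bar (the class method never sees the instance)

via_factory = create_C_subclass('spam')()
print(via_factory.classfoo())  # spam
```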
One thing that can make class factory functions somewhat awkward to use is that, as their name suggests, their responsibility is to return classes, rather than instances of those classes.
This means that if you want an instance, you must call the result of the class factory function to get one. The correct code to instantiate a subclass generated with create_C_subclass, for example, would be create_C_subclass('eggs')().
There is nothing inherently wrong with this, but it is not always what you really want. Sometimes classes created through class factories are functionally singletons. A singleton is a class pattern where only one instance is permitted.
In the case of classes generated in functions, it is possible that the purpose of the function is simply to act like a class constructor. This is problematic if the end developer must constantly think about instantiating the class that comes back.
This is not a requirement, though. If there is not a need to deal with reusing the class elsewhere, or if the class factory is able to handle the reuse itself, it is completely reasonable and useful to simply have the class factory return an instance of the class it creates, rather than the class itself.
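One way a factory might handle reuse itself is to cache the classes it builds, so that equal arguments yield the same class object. The following sketch uses functools.lru_cache from the standard library; applying it to create_C_subclass is an illustration chosen here, not something the examples in this chapter require, and it only works when the arguments are hashable.

```python
import functools


class C(object):
    foo = 'bar'


@functools.lru_cache(maxsize=None)
def create_C_subclass(new_foo):
    """Return a C subclass; repeated calls with the same argument
    return the same cached class (arguments must be hashable).
    """
    class SubC(C):
        foo = new_foo
    return SubC


first = create_C_subclass('spam')
second = create_C_subclass('spam')
other = create_C_subclass('eggs')

print(first is second)  # True
print(first is other)   # False
```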
To continue the simple example of C, consider this factory:
def CPrime(new_foo='bar'):
    # If `foo` is set to 'bar', then we do not need a
    # custom subclass at all.
    if new_foo == 'bar':
        return C()
    # Create a custom subclass and return an instance.
    class SubC(C):
        foo = new_foo
    return SubC()
Now, calling CPrime will return an instance of the appropriate C subclass with the foo attribute modified as needed.
One issue with this is that many (probably most) classes do expect arguments to be sent to their __init__ methods, which this function is not able to handle. The pattern for this is simple enough, though. Consider an example of a credential form, with the function retooled to return an instance.
import csv
from django import forms
def get_credential_form(service, *args, **kwargs):
    """Return a form instance representing a credential for the
    given service.
    """
    # Open our "database".
    keys = []
    with open('creds.csv', 'r') as csvfile:
        for row in csv.reader(csvfile):
            # If this row does not correspond to the service we
            # are actually asking for (e.g. if it is a row for
            # Apple and we are asking for an Amazon credential class),
            # skip it.
            if row[0].lower() != service.lower():
                continue
            # Add the key to the list of expected keys.
            keys.append(row[1])
    # Put together the appropriate credential fields.
    attrs = {}
    for key in keys:
        field_kw = {}
        if 'password' in key:
            field_kw['widget'] = forms.PasswordInput
        attrs[key] = forms.CharField(**field_kw)
    # Build a form class with the appropriate credential fields,
    # and return an instance of it.
    metaclass = type(forms.Form)
    cls = metaclass('CredentialForm', (forms.Form,), attrs)
    return cls(*args, **kwargs)
This does not actually entail very many changes from the previous class factory. There are really only two changes:
First, *args and **kwargs have been added to the function signature.
Second, the final line now returns an instance of the class that was created, with the *args and **kwargs passed to its constructor.
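The same forwarding pattern can be tried without Django. In this sketch, Greeter and make_greeter are hypothetical names invented for illustration; the point is only that the factory consumes its own argument and hands everything else to the constructor of the class it builds.

```python
class Greeter(object):
    """Hypothetical base class whose __init__ expects arguments."""
    greeting = 'Hello'

    def __init__(self, name, punctuation='.'):
        self.name = name
        self.punctuation = punctuation

    def greet(self):
        return '%s, %s%s' % (self.greeting, self.name, self.punctuation)


def make_greeter(greeting, *args, **kwargs):
    """Build a Greeter subclass and return an instance of it."""
    cls = type('CustomGreeter', (Greeter,), {'greeting': greeting})
    # Everything after the factory's own argument goes to __init__.
    return cls(*args, **kwargs)


g = make_greeter('Howdy', 'Ada', punctuation='!')
print(g.greet())         # Howdy, Ada!
print(type(g).__name__)  # CustomGreeter
```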
Now you have an entirely functional class factory, which returns an instance of the form class that it creates. This raises a final point. Now the function is likely indistinguishable from a class to the end developer, unless said end developer inspects the inner workings. Therefore, perhaps it should be presented as one in the naming convention.
def CredentialForm(service, *args, **kwargs):
    [...]
In Python, functions are normally named with all lowercased letters, and with underscores for word separation. However, this is a function that is being used like a class constructor by developers who actually use it, so by changing the naming convention, you present it as a class name.
Conveniently, the name also matches the name of the class used for the instances, because the first argument to the metaclass's constructor, 'CredentialForm', is the internal name of the class.
And, this is Python. If it looks like a duck and quacks like a duck…
The power of class factories shows itself when it is necessary to have class attributes be determined at runtime, rather than at coding time. The Python language is able to handle this situation precisely because classes are first-class objects, and can be created similarly to how any other object is created.
On the other hand, classes containing unknown attributes add some uncertainty. Their methods must be written to allow for an attribute to be present or absent, where, in other cases, the presence of the attribute may be able to be assumed.
The ability to declare classes at runtime is extremely powerful, but brings with it a tradeoff in simplicity. This is fine. When you encounter a situation where class factories are the right answer, it is often salient, and there is often no other direct way to solve the issue. Put directly, you can be reasonably sure that a class factory is a good approach if it is the simplest approach.
That rule holds true for programming generally, but it is a particularly useful one here.
Chapter 7, “Abstract Base Classes,” discusses Python strings and bytestrings, and how to manage string data with minimal pain.