When you think about testing the code that you write, the first thing that probably comes to mind is simply running your program directly. If your program executes, you at least know that you do not have any syntax errors (provided every module was imported).
Similarly, if you provide appropriate inputs, and do not get a traceback, you know that your program completes successfully with those inputs. And, if the result matches the result you expect, that is additional inductive evidence that your program works.
This has a couple of key limitations, though. The first is that for a non-trivial program, it is not possible to test every scenario. It is impossible to avoid this limitation, although it is important to be as complete as possible when thinking through potential scenarios to test.
The second limitation (and the one that the bulk of this chapter covers) is time. For most applications, it is not practical to manually test every scenario you imagine for every change that you ever make to your program, because iterating over these scenarios is time-consuming.
It is possible, however, to ameliorate this limitation somewhat by automating your tests. An automated test suite can run while you are absent or working on something else, providing a significant time savings and making it much easier to test your work early and often.
This chapter explores some of the world of testing. Specifically, it focuses on unit testing using the built-in tools provided by the Python standard library (such as unittest and mock), and on some common packages available for testing.
So, what exactly is a unit test? Furthermore, how does it differ from a functional test, an integration test, or some other kind of test? To answer this, this chapter discusses two different testing scenarios.
First, consider a very complete testing environment. If you are writing an application that primarily runs on servers, this might entail a “staging” server that has a copy of relevant data, and where potentially breaking actions can be performed safely. For a script or desktop application, the principle is the same. It runs in an area with a copy of anything it must touch or alter.
In this scenario, everything your program must do mimics what it does in its actual live environment. If you connect to a particular type of database, that database is still present in your test environment (just at a different location). If you get data from a web service, you still make that same request.
Essentially, in the copied ecosystem, any external dependencies your program relies on must still be present and set up in an identical way.
This type of testing scenario is designed not only to test specific code being worked on, but also to test that the entire ecosystem structure that is put in place is viable. Any data that is passed back and forth between different components of your application is actually passed in exactly the same way.
Automated tests that are run against a copied ecosystem such as this are generally called system tests. This term signifies the complete duplicated ecosystem under which these tests run. This kind of test is designed not only to test your specific code, but also to detect breaking changes in the external environment.
Another very distinct type of test is one that is intended to test a very specific block of code, and to do so in an isolated environment.
In a copied ecosystem, any external requirements and dependencies (such as a database, external service, or the like) are all duplicated. On the other hand, tests intended to be run in an isolated environment do so generally by hand-waving the interactions between the tested code and the external dependencies, focusing only on what the actual code does.
This sort of hand wave is done by stipulating that an external service or dependency received a given input and returned a given output. The purpose of this kind of test is explicitly not to test the interaction between your application and the other service. Rather, it is to test what your application does with the data it receives from that service.
For example, consider a function that determines a person's age at the time of his or her wedding. It first gets information about the person (birthday and anniversary) from an external database, and then computes the delta between the two dates to determine the person's age at the time.
Such a function might look like this:
def calculate_age_at_wedding(person_id):
"""Calculate the age of a person at his or her wedding, given the
ID of the person in the database.
"""
# Get the person from the database, and pull out the birthday
# and anniversary datetime.date objects.
person = get_person_from_db(person_id)
anniversary = person['anniversary']
birthday = person['birthday']
# Calculate the age of the person on his or her wedding day.
    age = anniversary.year - birthday.year
# If the birthday occurs later in the year than the anniversary, then
# subtract one from the age.
if birthday.replace(year=anniversary.year) > anniversary:
age -= 1
# Done; return the age.
return age
Of course, if you try to actually run this function, it will fail. This function depends on another function, get_person_from_db, which is not defined in this example. You intuitively understand from reading the comments and code around it that it gets a specific type of record from a database and returns a dictionary-like object.
When testing a function like this, a copied ecosystem would simply reproduce the database, pull a person record with a particular ID, and test that the function returns the expected age. In contrast, a test in an isolated environment wants to avoid dealing with the database at all. An isolated environment test would declare that you got a particular record, and test the remainder of the function against that record.
This kind of test, which seeks to isolate the code being tested from the rest of the world (and sometimes even the rest of the application itself), is called a unit test.
Both of these fundamental types of tests have advantages and disadvantages, and most applications must have some of both types of tests as part of a robust testing framework.
One of the most important advantages to unit tests that run in an isolated environment is speed. Tests that run against a copied ecosystem often have long setup and teardown processes. Furthermore, the I/O required to pass data between the various components is often one of the slowest aspects of your application.
By contrast, tests that run in an isolated environment are usually extremely fast. In the previous example, the time it takes to do the arithmetic to determine this person's age is far less (by several orders of magnitude) than the time it takes to ask the database for the row corresponding to the person's ID and to pass the data over the pipe.
Having a set of isolated tests that run very fast is valuable, because you are able to run them extremely often and get feedback from running those tests very quickly.
The primary reason why isolated tests are so fast is precisely because they are isolated. Isolated tests stipulate the interactions between various services involved in powering your application.
However, these interactions require testing, too. This is why you also need tests in a copied ecosystem. This enables you to ensure that these services continue to interact the way that you expect.
The focus of this chapter is specifically on unit testing. So, how can you write a test that runs the calculate_age_at_wedding function from the previous example? Your goal is to avoid actually talking to a database to get a person record, so your test must provide that information to the function itself.
In many cases, the best and by far the most straightforward way to handle testing such a function is simply to organize your code in a way that makes it easily testable.
In the example of the calculate_age_at_wedding function, you may not need to retrieve a record from the database at all. Depending on your application, it might be fine (and even preferable) to have the function simply accept the full record, rather than the person_id variable. In other words, the baton handoff to this function would not happen until the database call already occurred, and the only thing this function would do would be to perform the arithmetic.
Reorganizing in this way would also make the function less opinionated about what kind of data it gets. Any dictionary-like object with the appropriate keys would do.
The following trimmed-down function only does the calculation of the age, and is expected to receive a full person record (where it gets it from is not relevant).
def calculate_age_at_wedding(person):
"""Calculate the age of a person at his or her wedding, given the
record of the person as a dictionary-like object.
"""
# Pull out the birthday and anniversary datetime.date objects.
anniversary = person['anniversary']
birthday = person['birthday']
# Calculate the age of the person on his or her wedding day.
age = anniversary.year - birthday.year
# If the birthday occurs later in the year than the anniversary, then
# subtract one from the age.
if birthday.replace(year=anniversary.year) > anniversary:
age -= 1
# Done; return the age.
return age
In most ways, this function is almost exactly the same as the previous version. The only thing that has changed is that the call to get_person_from_db has been removed (and the comments and docstring updated to match).
When it comes to testing this function, the problem is now very simple. Just pass a dictionary and make sure you get the correct result.
>>> from datetime import date
>>>
>>> person = {'anniversary': date(2012, 4, 21),
... 'birthday': date(1986, 6, 15)}
>>> age = calculate_age_at_wedding(person)
>>> age
25
Of course, a couple of limitations exist here. First, this is still something that was run manually in the interactive terminal. The value of a unit testing suite is that you run it in an automated fashion.
A second (and even more important) limitation to recognize is that this tests only one input against only one output. Suppose you gutted the function the next day and replaced it with the following:
def calculate_age_at_wedding(*args, **kwargs):
return 25
The test would still pass, even though the function would be extremely broken.
Indeed, the test does not even cover some sections of this function. After all, there is an if block in the function based on whether the birthday falls before or after the anniversary in a calendar year. At a minimum, you would want to ensure that your test takes both pathways.
The following test function handles this:
from datetime import date
def test_calculate_age_at_wedding():
"""Establish that the 'calculate_age_at_wedding' function seems to
calculate a person's age at his wedding correctly, given a
dictionary-like object representing a person.
"""
# Assert that if the anniversary falls before the birthday in a
# calendar year, that the calculation is done properly.
person = {'anniversary': date(2012, 4, 21),
'birthday': date(1986, 6, 15)}
age = calculate_age_at_wedding(person)
assert age == 25, 'Expected age 25, got %d.' % age
# Assert that if the anniversary falls after the birthday in a calendar
# year, that the calculation is done properly.
person = {'anniversary': date(1969, 8, 11),
'birthday': date(1945, 2, 15)}
age = calculate_age_at_wedding(person)
assert age == 24, 'Expected age 24, got %d.' % age
Now you have a function that can be run by an automated process. Python includes a test runner, which is explored shortly. Also, this test covers a couple of different permutations of the function. It certainly does not cover every possible input (it would be impossible to do that), but it provides a slightly more complete sanity check.
However, always remember that the tests are not an exhaustive check. They only test the inputs and outputs that you provide. For example, this test function says nothing about what would happen if the calculate_age_at_wedding function were sent something other than a dictionary, or if it were sent a dictionary with the wrong keys, or if datetime objects were used instead of date objects, or if you were to send an anniversary date that is earlier than the birth date, or any number of other permutations. This is fine. It is simply important to understand what the limits of your tests are.
The assert Statement
What about the assert statement that the test function is using? Consider what a unit test fundamentally is. A unit test is an assertion or a set of assertions. In this case, you assert that if you send a properly formatted dictionary with specific dates, you get a specific integer result.
In Python, assert is a keyword, and assert statements are used almost exclusively for testing (although they need not appear exclusively in test code). The assert statement expects the expression sent to it to evaluate to True. If it does, the assert statement does nothing; if it does not, AssertionError is raised. You can optionally provide a custom error message to be raised with the AssertionError, as the previous example does.
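You can see both behaviors in the interactive interpreter. A passing assertion is silent, and a failing one raises AssertionError with the optional message:
>>> assert 2 + 2 == 4
>>> assert 2 + 2 == 5, 'Arithmetic appears to be broken.'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError: Arithmetic appears to be broken.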
When writing tests, you want to use AssertionError as the exception to be raised when a test fails, either by raising it directly, or (usually) by using the assert statement to assert the test's pass conditions, because all of the unit testing frameworks will catch the error and handle it appropriately when compiling test failures.
Now that you have your test as a function, the next step is to set up a process to run that test (as well as any others you may write to test the remainder of the application).
Several unit testing frameworks, such as py.test and nose, are available as third-party packages. However, the Python standard library also ships with a quite robust unit testing framework, available as the unittest module.
Consider the testing function from the previous example, but structured to be run by the unittest module.
import unittest
from datetime import date
class Tests(unittest.TestCase):
def test_calculate_age_at_wedding(self):
"""Establish that the 'calculate_age_at_wedding' function seems
to calculate a person's age at his wedding correctly, given
a dictionary-like object representing a person.
"""
# Assert that if the anniversary falls before the birthday
# in a calendar year, that the calculation is done properly.
person = {'anniversary': date(2012, 4, 21),
'birthday': date(1986, 6, 15)}
age = calculate_age_at_wedding(person)
self.assertEqual(age, 25)
# Assert that if the anniversary falls after the birthday
# in a calendar year, that the calculation is done properly.
person = {'anniversary': date(1969, 8, 11),
'birthday': date(1945, 2, 15)}
age = calculate_age_at_wedding(person)
self.assertEqual(age, 24)
In most ways, this looks the same as what you saw before. However, it has a couple of key differences. The first difference is that you now have a class, which subclasses unittest.TestCase. The unittest module expects to find tests grouped using unittest.TestCase subclasses. Each test must be a method whose name begins with test. As a corollary, because the test itself is now a method of the class rather than a standalone function, it now takes self as an argument.
The other change is that the raw assert statements have been replaced with calls to self.assertEqual. The unittest.TestCase class provides a number of wrappers around assert that standardize error messages and provide some other boilerplate.
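As a quick illustration, here is a hypothetical test case using a few of these wrappers (a sketch, not an exhaustive list; assertEqual, assertTrue, assertIn, and assertRaises are all standard unittest.TestCase methods, and the test assumes calculate_age_at_wedding is defined in the same module):
import unittest
from datetime import date

class WrapperExamples(unittest.TestCase):
    def test_wrappers(self):
        person = {'anniversary': date(2012, 4, 21),
                  'birthday': date(1986, 6, 15)}
        # Each wrapper raises AssertionError with a standardized
        # message if its condition does not hold.
        self.assertEqual(calculate_age_at_wedding(person), 25)
        self.assertTrue(person['anniversary'] > person['birthday'])
        self.assertIn('birthday', person)
        # assertRaises can be used as a context manager to assert
        # that a block of code raises a particular exception.
        with self.assertRaises(KeyError):
            calculate_age_at_wedding({})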
Now it is time to actually run this test within the unittest framework. To do this, save both the function and the test class in a single module, such as wedding.py.
The Python interpreter provides a flag, -m, which takes a module in the standard library or on sys.path and runs it as a script. The unittest module supports being run in this way, and accepts the Python module to be tested. (If you named your module wedding.py, this would be wedding.)
$ python -m unittest wedding
.
----------------------------------------------------------------------
Ran 1 test in 0.000s
OK
What is happening here? The wedding module was loaded, and the unittest module found a unittest.TestCase subclass. It instantiated the class and then ran every method beginning with the word test, which the test_calculate_age_at_wedding method does.
The unittest output prints a period character (.) for a successful test, or a letter for failures (F), errors (E), and a few other cases, such as tests that are intentionally skipped (s). Because there was only one test, and it was successful, you see a single . character followed by the concluding output.
You can observe what happens when a test fails by simply changing the test's condition so that it will intentionally fail.
To illustrate this, add the following method to your Tests class:
def test_failure_case(self):
"""Assert a wrong age, and fail."""
person = {'anniversary': date(2012, 4, 21),
'birthday': date(1986, 6, 15)}
age = calculate_age_at_wedding(person)
self.assertEqual(age, 99)
This is a similar test, except that it asserts that the age is 99, which is wrong. Observe what happens if you run tests on the module now:
$ python -m unittest wedding
.F
======================================================================
FAIL: test_failure_case (wedding.Tests)
Assert a wrong age, and fail.
----------------------------------------------------------------------
Traceback (most recent call last):
File "wedding.py", line 50, in test_failure_case
self.assertEqual(age, 99)
AssertionError: 25 != 99
----------------------------------------------------------------------
Ran 2 tests in 0.000s
FAILED (failures=1)
Now you have two tests. You have the main test from before, which still passes, and a second test with a bogus age, which fails.
If you ran the test function directly, you would just get a standard traceback when AssertionError is raised. However, the unittest module actually catches this error and tracks the failure, and prints the output nicely at the end of the test run.
This may seem like an unimportant distinction at this point, but if you have hundreds of tests, the difference matters. A Python program terminates when it encounters its first uncaught exception, so a manual test run would stop at the first failure. When you're using unittest, the tests continue to run, and you get all the failures at once at the end.
The unittest output also includes the test function and the beginning of the docstring, so it is easy to go find the failing test and investigate, as well as the full traceback, so you still have the same insight into the offending code.
Only a small difference distinguishes an error from a failure. A test that raises AssertionError is considered to have failed, whereas a test that raises any exception other than AssertionError is considered to be in error.
Consider what would happen if the person variable being tested is an empty dictionary. Add the following method to your Tests class in the wedding module:
def test_error_case(self):
"""Attempt to send an empty dict to the function."""
person = {}
age = calculate_age_at_wedding(person)
self.assertEqual(age, 25)
Now what happens when you run tests?
$ python -m unittest wedding
.EF
======================================================================
ERROR: test_error_case (wedding.Tests)
Attempt to send an empty dict to the function.
----------------------------------------------------------------------
Traceback (most recent call last):
File "wedding.py", line 55, in test_error_case
age = calculate_age_at_wedding(person)
File "wedding.py", line 10, in calculate_age_at_wedding
anniversary = person['anniversary']
KeyError: 'anniversary'
======================================================================
FAIL: test_failure_case (wedding.Tests)
Assert a wrong age, and fail.
----------------------------------------------------------------------
Traceback (most recent call last):
File "wedding.py", line 50, in test_failure_case
self.assertEqual(age, 99)
AssertionError: 25 != 99
----------------------------------------------------------------------
Ran 3 tests in 0.000s
FAILED (failures=1, errors=1)
Now you have three tests. You have the passing and failing tests from earlier, and a test that is in error. Instead of raising AssertionError, the error case raised KeyError, because the calculate_age_at_wedding function expected an anniversary key in the dictionary (and the key was not there).
For most practical purposes, you probably will not actually put much stock in the difference between a failure and an error. They are simply failing tests that fail in slightly different ways.
It is also possible to mark that a test should be skipped under certain situations. For example, say that an application is designed to run under Python 2 or Python 3, but a particular test only makes sense in one of the two environments. Rather than have the test fail when it should not, it is possible to declare that a test should run only under certain conditions.
The unittest module provides skipIf and skipUnless decorators that take an expression. The skipIf decorator causes the test to be skipped if the expression it receives evaluates to True, and the skipUnless decorator causes the test to be skipped if the expression it receives evaluates to False. In addition, both decorators take a second, required argument, which is a string that describes why the test was skipped.
To see skipped tests in action, add the following method to your Tests class. (To keep the output shown here down to a reasonable size, the failure and error tests have been removed.)
@unittest.skipIf(True, 'This test was skipped.')
def test_skipped_case(self):
"""Skip this test."""
pass
This method is decorated with unittest.skipIf. True is a valid expression in Python, and obviously evaluates to True. Now see what happens when you run the tests:
$ python -m unittest wedding
.s
----------------------------------------------------------------------
Ran 2 tests in 0.000s
OK (skipped=1)
The output for a skipped test is an s, rather than the traditional period character that denotes a test that passed. The use of a lowercase letter rather than an uppercase one (as in F and E) signifies that this is not an error condition, and indeed, the complete test run is considered to be a success.
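A more realistic condition than the always-True example above is keying off the interpreter version. The following sketch (with a hypothetical VersionTests class) skips a test unless it is running under Python 3:
import sys
import unittest

class VersionTests(unittest.TestCase):
    @unittest.skipUnless(sys.version_info >= (3, 0),
                         'This test requires Python 3.')
    def test_true_division(self):
        # Under Python 3, the / operator always performs true
        # division; under Python 2, this test is skipped entirely.
        self.assertEqual(1 / 2, 0.5)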
So far, you have run tests out of a single module, and the tests have lived in the same module as the code they test. This is fine for a trivial example, but entirely infeasible for a large application.
The unittest module understands this, and provides an extensible mechanism for programmatically loading tests from a complete project tree. The default class, which is suitable for most needs, is unittest.TestLoader.
If you are just using the default test loading class, which is what you want most of the time, you can trigger it by using the word discover instead of the module name to be tested.
$ python -m unittest discover
----------------------------------------------------------------------
Ran 0 tests in 0.000s
OK
Where did your tests go? Test discovery follows certain rules to determine where it actually looks for tests. By default, it expects all files containing tests to be named according to the pattern test*.py.
This is what you really want to do anyway. The value of test discovery is that you can separate your tests from the rest of your code. So, if you move the passing test itself from the wedding.py file to a new file matching that pattern (for example, test_wedding.py), the test discovery system will find it. (Note that you must import the calculate_age_at_wedding function explicitly, because it is not in the same module anymore!)
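For instance, a minimal test_wedding.py might look like the following sketch, which assumes the trimmed-down function still lives in wedding.py:
import unittest
from datetime import date

# The function under test is no longer in the same module, so it
# must be imported explicitly.
from wedding import calculate_age_at_wedding

class Tests(unittest.TestCase):
    def test_calculate_age_at_wedding(self):
        person = {'anniversary': date(2012, 4, 21),
                  'birthday': date(1986, 6, 15)}
        self.assertEqual(calculate_age_at_wedding(person), 25)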
Sure enough, now the test discovery finds the tests:
$ python -m unittest discover
.
----------------------------------------------------------------------
Ran 1 test in 0.000s
OK
Recall that to make the calculate_age_at_wedding function easy to unit test, you had to remove part of it. The idea was to organize your code so that the function became easily testable, by doing the database call elsewhere.
Often, organizing your code in a way that makes it easily testable is the ideal approach to this problem, but sometimes it is not possible or wise. Instead of implicitly hand-waving certain functionality by organizing your code around atomic testing, how do you explicitly hand-wave a segment of tested code?
The answer is mocking. Mocking is the process of declaring within a test that a certain function call should be stipulated to give a particular output, and the function call itself should be suppressed. Additionally, you can assert that the mocked call that you expect was made in a particular way.
Beginning in Python 3.3, the unittest module ships with unittest.mock, which contains tools for mocking. If you are using Python 3.2 or earlier, you can use the mock package, which you can download from www.pypi.python.org.
The API between these is identical, but how you import it obviously changes. If you are using Python 3.3 or later, you want from unittest import mock; if you are using the installed package, you want import mock.
Consider again the original function for calculate_age_at_wedding, which included a call to retrieve a record from an unspecified database. (If you are following along, you should create a new file.)
def calculate_age_at_wedding(person_id):
"""Calculate the age of a person at his or her wedding, given the
ID of the person in the database.
"""
# Get the person from the database, and pull out the birthday
# and anniversary datetime.date objects.
person = get_person_from_db(person_id)
anniversary = person['anniversary']
birthday = person['birthday']
# Calculate the age of the person on his or her wedding day.
    age = anniversary.year - birthday.year
# If the birthday occurs later in the year than the anniversary, then
# subtract one from the age.
if birthday.replace(year=anniversary.year) > anniversary:
age -= 1
# Done; return the age.
return age
Before, you tested most of this function by actually changing the function itself. You reorganized the code around ease of testability. However, you also want to be able to test code where this is either impossible or undesirable.
First things first. You still do not actually have a get_person_from_db function, so you want to suppress that function call. Therefore, add a function that raises an exception.
def get_person_from_db(person_id):
    raise RuntimeError("The real 'get_person_from_db' function "
                       "was called.")
At this point, if you actually try to run the calculate_age_at_wedding function, you will get a RuntimeError. This is convenient for this example, because it will make it very obvious if your mocking does not work. Your test will loudly fail.
Next comes the test. If you just try to run the same test from before, it will fail (with RuntimeError). You need a way of getting around the get_person_from_db call. This is where mock comes in.
The mock module is essentially a monkey-patching library. It temporarily replaces a variable in a given namespace with a special object called a MagicMock, and then returns the variable to its previous value after the scope of the mock is concluded. The MagicMock object itself is extremely permissive. It accepts (and tracks) basically any call made to it, and returns whatever you tell it.
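This permissiveness is easy to see in the interactive interpreter (the id values shown will differ on your machine):
>>> from unittest.mock import MagicMock
>>> m = MagicMock()
>>> m(42, spam='eggs')
<MagicMock name='mock()' id='4315583152'>
>>> m.any_attribute.any_method()
<MagicMock name='mock.any_attribute.any_method()' id='4315583153'>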
In this case, you want the get_person_from_db function to be replaced with a MagicMock object for the duration of your test.
import unittest
import sys
from datetime import date
# Import mock regardless of whether it is from the standard library
# or from the PyPI package.
try:
from unittest import mock
except ImportError:
import mock
class Tests(unittest.TestCase):
def test_calculate_age_at_wedding(self):
"""Establish that the 'calculate_age_at_wedding' function seems
to calculate a person's age at his wedding correctly, given
a person ID.
"""
# Since we are mocking a name in the current module, rather than
# an imported module (the common case), we need a reference to
        # this module to send to 'mock.patch.object'.
module = sys.modules[__name__]
with mock.patch.object(module, 'get_person_from_db') as m:
# Ensure that the get_person_from_db function returns
# a valid dictionary.
m.return_value = {'anniversary': date(2012, 4, 21),
'birthday': date(1986, 6, 15)}
            # Assert that the calculation is done properly.
age = calculate_age_at_wedding(person_id=42)
self.assertEqual(age, 25)
The big new thing going on here is the call to mock.patch.object. This is a function that can be used either as a context manager or a decorator, and it takes two required arguments: the module (or other object) that contains the callable being mocked, and the name of the callable as a string. In this case, because the function and the test are all contained in a single file (which is not what you would normally do), you must get a reference to the current module, which is always sys.modules[__name__].
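Because mock.patch.object also works as a decorator, the same test could be written as the following sketch (when used as a decorator, the MagicMock is passed into the test method as an additional argument):
class Tests(unittest.TestCase):
    @mock.patch.object(sys.modules[__name__], 'get_person_from_db')
    def test_calculate_age_at_wedding(self, m):
        # 'm' is the MagicMock that temporarily replaces
        # get_person_from_db for the duration of this test.
        m.return_value = {'anniversary': date(2012, 4, 21),
                          'birthday': date(1986, 6, 15)}
        age = calculate_age_at_wedding(person_id=42)
        self.assertEqual(age, 25)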
The context manager returns a MagicMock object, which is m in the previous example. Before you can call the function being tested, however, you must specify what you expect the MagicMock to do. In this case, you want it to return a dictionary that approximates a valid record of a person. The return_value property of the MagicMock object is what handles this. Setting it means that every time the MagicMock is called, it will return that value. If you do not set return_value, another MagicMock object is returned.
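The behavior of return_value is easy to observe directly (again, the id values will vary):
>>> from unittest.mock import MagicMock
>>> m = MagicMock()
>>> m()
<MagicMock name='mock()' id='4315615752'>
>>> m.return_value = {'anniversary': None}
>>> m()
{'anniversary': None}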
If you run tests on this module, you will see that the test passes. (Here, the new module is named mock_wedding.py.)
$ python -m unittest mock_wedding
.
----------------------------------------------------------------------
Ran 1 test in 0.000s
OK
This test passes, but it is still fundamentally incomplete in one important way. It mocks the function call to get_person_from_db, and tests that the function does the right thing with the output.
What the test does not do is verify that the baton handoff to the get_person_from_db function actually occurred. In some ways, this is redundant. You know the call happened, because otherwise you would not have received the return value from the mock object. However, sometimes you will mock function calls that do not have a return value.
Fortunately, MagicMock objects track calls made to them. Rather than just spitting out the return value and being done, the object stores information about how many times it was called, and the signature of each call. Finally, MagicMock provides methods to assert that calls occurred in a particular fashion.
Probably the most common method you will use for this purpose is MagicMock.assert_called_once_with. This asserts two things: that the MagicMock was called once and exactly once, and that the specified argument signature was used. Consider an augmented test function that ensures that the get_person_from_db function was called with the expected person ID:
class Tests(unittest.TestCase):
def test_calculate_age_at_wedding(self):
"""Establish that the 'calculate_age_at_wedding' function seems
to calculate a person's age at his wedding correctly, given
a person ID.
"""
# Since we are mocking a name in the current module, rather than
# an imported module (the common case), we need a reference to
        # this module to send to 'mock.patch.object'.
module = sys.modules[__name__]
with mock.patch.object(module, 'get_person_from_db') as m:
# Ensure that the get_person_from_db function returns
# a valid dictionary.
m.return_value = {'anniversary': date(2012, 4, 21),
'birthday': date(1986, 6, 15)}
            # Assert that the calculation is done properly.
age = calculate_age_at_wedding(person_id=42)
self.assertEqual(age, 25)
# Assert that the 'get_person_from_db' method was called
# the way we expect.
m.assert_called_once_with(42)
The thing that has changed here is that the MagicMock object is now being checked at the end to ensure that you got the call to it that you expected. The call signature is simply a single positional argument: 42. This is the person ID used in the test (just a few lines earlier). It is sent as a positional argument because that is the way the argument is provided in the original function:
person = get_person_from_db(person_id)
Notice that person_id is provided as a single positional argument, so that is what the MagicMock will record.
If you run the test, you will see that it still passes:
$ python -m unittest mock_wedding
.
----------------------------------------------------------------------
Ran 1 test in 0.000s
OK
What happens if the MagicMock's assertions are incorrect? The test fails with a useful failure message, as you can see by changing the assert_called_once_with argument signature:
$ python -m unittest mock_wedding
F
======================================================================
FAIL: test_calculate_age_at_wedding (wedding.Tests)
Establish that the 'calculate_age_at_wedding' function seems
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/luke/Desktop/wiley/wedding.py", line 58, in
test_calculate_age_at_wedding
m.assert_called_once_with(84)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/unittest
/mock.py", line 771, in assert_called_once_with
return self.assert_called_with(*args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/unittest
/mock.py", line 760, in assert_called_with
raise AssertionError(_error_message()) from cause
AssertionError: Expected call: get_person_from_db(84)
Actual call: get_person_from_db(42)
----------------------------------------------------------------------
Ran 1 test in 0.001s
Here you are told which call the MagicMock expected to get, as well as the call it actually received. You would get similar errors if there were no call, or more than one call.
The assert_called_once_with method has a close cousin, assert_called_with. This is identical except for the fact that it does not fail if the MagicMock has been called more than once, and it checks the call signature against only the most recent call.
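A quick interactive sketch shows the difference (the exact wording of the failure message may vary between versions of the library):
>>> from unittest.mock import MagicMock
>>> m = MagicMock()
>>> m(42)
<MagicMock name='mock()' id='4370551920'>
>>> m(84)
<MagicMock name='mock()' id='4370551920'>
>>> m.assert_called_with(84)
>>> m.assert_called_once_with(84)
Traceback (most recent call last):
  ...
AssertionError: Expected 'mock' to be called once. Called 2 times.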
You can inspect MagicMock objects in several other ways to determine what occurred. You may just want to know that it was called, or how many times it was called. You also may want to assert a sequence of calls, or only look at part of the call's signature.
A couple of the easiest and most straightforward questions are whether a MagicMock has been called, and how many times it has been called.
If you just want to know whether a MagicMock has been called at all, you can check the called property, which is set to True the first time that the MagicMock is called.
>>> from unittest import mock
>>> m = mock.MagicMock()
>>> m.called
False
>>> m(foo='bar')
<MagicMock name='mock()' id='4315583152'>
>>> m.called
True
On the other hand, you may also want to know exactly how many times the MagicMock has been called. This is available, too, as call_count.
>>> from unittest import mock
>>> m = mock.MagicMock()
>>> m.call_count
0
>>> m(foo='bar')
<MagicMock name='mock()' id='4315615752'>
>>> m.call_count
1
>>> m(spam='eggs')
<MagicMock name='mock()' id='4315615752'>
>>> m.call_count
2
The MagicMock class does not have built-in methods for asserting the presence of a call or a given call count, but the assertEqual and assertTrue methods that are part of unittest.TestCase are more than sufficient for that task.
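For example, inside a test case you might write something like the following sketch, where m is assumed to be the MagicMock created by mock.patch.object:
# Inside a unittest.TestCase method; 'm' is the MagicMock.
self.assertTrue(m.called)            # called at least once
self.assertEqual(m.call_count, 2)    # called exactly twice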
You may also want to assert the composition of multiple calls to a MagicMock in one fell swoop. MagicMock objects provide the assert_has_calls method for this purpose.
To use assert_has_calls, you must understand call objects, which are provided as part of the mock library. Whenever you make a call to a MagicMock object, it internally creates a call object that stores the call signature (and appends it to the mock_calls list on the object). These call objects are considered to be equivalent if the signatures match.
>>> from unittest.mock import call
>>> a = call(42)
>>> b = call(42)
>>> c = call('foo')
>>> a is b
False
>>> a == b
True
>>> a == c
False
This is actually how assert_called_once_with and similar methods work under the hood. They make a new call object, and then ensure that it is equivalent to the one in the mock_calls list.
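You can watch this happen by inspecting the mock_calls list directly:
>>> from unittest.mock import MagicMock, call
>>> m = MagicMock()
>>> m(42)
<MagicMock name='mock()' id='4370551920'>
>>> m('foo', spam='eggs')
<MagicMock name='mock()' id='4370551920'>
>>> m.mock_calls
[call(42), call('foo', spam='eggs')]
>>> m.mock_calls[0] == call(42)
True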
The assert_has_calls method takes a list (or other similar object, such as a tuple) of call objects. It also accepts an optional keyword argument, any_order, which defaults to False. If this remains False, the calls are expected to have occurred in the same sequence as they appear in the list. If it is set to True, only the presence of each call to the MagicMock is relevant, not the order of the calls.
Here is what assert_has_calls looks like in action:
>>> from unittest.mock import MagicMock, call
>>>
>>> m = MagicMock()
>>> m.call('a')
<MagicMock name='mock.call()' id='4370551920'>
>>> m.call('b')
<MagicMock name='mock.call()' id='4370551920'>
>>> m.call('c')
<MagicMock name='mock.call()' id='4370551920'>
>>> m.call('d')
<MagicMock name='mock.call()' id='4370551920'>
>>> m.assert_has_calls([call.call('b'), call.call('c')])
It is worth noting that although assert_has_calls does expect the calls to occur in order, it does not require that you send it the entire list of calls. Having other calls on either end of the list is fine.
Sometimes, you may not want to test the entirety of a call signature. Perhaps it is only important that a certain argument be included. This is a little bit more difficult to do. There is no ready-made method for a call to declare that it matches anything other than a complete call signature.
However, it is possible to inspect the call object itself and look at the arguments sent to it. The way this works is that the call class is actually a subclass of tuple, and call objects are tuples with three elements, the second and third of which are the call signature.
>>> from unittest.mock import call
>>> c = call('foo', 'bar', spam='eggs')
>>> c[1]
('foo', 'bar')
>>> c[2]
{'spam': 'eggs'}
By inspecting the call object directly, you can get a tuple of the positional arguments and a dictionary of the keyword arguments.
This gives you the capability to test only part of a call signature. For example, what if you want to ensure that the string bar was one of the arguments given to the call, but you do not care about the rest of the arguments?
>>> assert 'bar' in c[1]
>>> assert 'baz' in c[1]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AssertionError
>>> assert c[2]['spam'] == 'eggs'
Once you have access to the positional arguments as a tuple and the keyword arguments as a dictionary, testing for the presence or absence of a single argument is no different than testing for the presence of an element in a list or dictionary.
Several other testing tools are available that you may want to consider using as you build out a unit test suite in your applications.
How do you actually know what code is being tested? Ideally, you want to test as much of your code as possible in each test run, while still maintaining a test suite that runs quickly.
If you want to know just how much of your code your test suite is exercising, you will want to use the coverage application, which is available from www.pypi.python.org. Originally written by Ned Batchelder, coverage is a tool that keeps track of all of the lines of code in each module that run as your tests are running, and provides a report detailing what code did not run. Of course, coverage runs on both Python 2 and Python 3.
The application works by installing a coverage script, and you use coverage run as a substitute for python when invoking a Python script of any kind, including your unit test script. The output will look fundamentally similar.
$ coverage run -m unittest mock_wedding
.
----------------------------------------------------------------------
Ran 1 test in 0.000s
OK
However, if you look at the directory, you will see that a .coverage file was created in the process. This file contains information about what code in the file actually ran.
You can view this information with coverage report.
$ coverage report
Name Stmts Miss Cover
----------------------------------
mock_wedding 22 1 95%
This report shows how many statements ran and how many statements in the file did not run. So, you know that one statement was omitted, but not which one. Adding -m to the command adds output showing which lines were skipped:
$ coverage report -m
Name Stmts Miss Cover Missing
--------------------------------------------
mock_wedding 22 1 95% 24
Now you know that line 24 was the statement that did not run. (In the example mock_wedding.py file, line 24 corresponds to the RuntimeError that is raised if the “real” get_person_from_db function was called.)
The coverage application can also write attractive HTML output using the coverage html command. This highlights in red the lines that did not run. Additionally, if you have a statement with multiple branches (such as an if statement), it highlights those in yellow if only one path was taken.
Many Python applications need to run on multiple versions of Python, including both Python 2 and Python 3. If you are writing an application that runs in multiple environments (even just multiple minor revisions), you want to run your tests against all of those environments.
Attempting to run tests manually across every environment you support is likely to be cumbersome. If you need to do this, consider tox. Written by Holger Krekel, tox is a tool that automatically creates virtual environments (using virtualenv) with the appropriate versions of Python (provided you have them installed) and runs the tests within those environments.
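A minimal tox.ini for such a project might look something like the following sketch (the environments listed are assumptions; they presume you have Python 2.7 and 3.4 interpreters installed):
[tox]
envlist = py27, py34

[testenv]
commands = python -m unittest discover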
This chapter has focused primarily on the test runner provided by Python itself, but other alternatives are available. Some, such as nose and py.test, are quite popular, and add numerous features and hooks for extensibility.
These libraries are easy to adopt even if you already have a robust unit test suite, because both support using unittest tests out of the box. However, both libraries support other ways of adding tests to the pool.
Both of these libraries are available on www.pypi.python.org, and run on Python 2.6 and up.
Unit testing is a powerful way to ensure that your code remains consistent over time, and a useful way to discover when your code's behavior changes so that you can make adjustments accordingly.
This is an important facet of any application. Having a robust testing suite makes it easier to detect some bugs and makes you aware when a function's behavior changes, thus simplifying application maintenance.
Chapter 12 examines the optparse and argparse tools for using Python on the command-line interface (CLI).