Comparison of Common Lisp Testing Frameworks (28 Aug 2023 Edition)
1. Changelog
28 Aug 2023 - Picked up Shinmera's name change.
14 Aug 2023 - A number of updates for Try
18 Feb 2023 - Added a build-up and tear down example for Parachute fixtures.
30 Jan 2023 - Updated for some new functionality in Parachute with respect to parent fixtures now being inherited by children and added a useful editor customization.
27 Jan 2023 - Adding input from Pierre Neidhardt: adding vocabulary criteria, correcting that Lisp-Unit2 does have progress reports. Adding some additional examples for Parachute and Lisp-Unit2.
16 Jan 2023 - Updated adding Confidence and Try and deleting Kaputt. Thank you Michaël Le Barbier for actually doing the work of submitting the proposed pull request for Confidence.
2 October 2021 - Updated for changes at the Common Lisp Cookbook on testing. Updated number of issues and pull requests for FiveAM.
13 June 2021 - Updated clunit2 on its huge performance improvement update and new edge case abilities to compare multiple values in values expressions and handle variables declared in closures. Clunit2 is now substantially differentiated from clunit.
2. Introduction
What testing framework should I use? The response should not be X or Y because it is "battle-tested", or "extensible", or "has color coding". Those are just content-free advertising buzzwords. Some webpages which mention different Common Lisp unit test frameworks merely parrot comments from the framework authors without ensuring their veracity.
The real response should be to respond, "How do you code?" But even armed with that information, how do you match a testing framework to the user's needs? Common Lisp has a ridiculous number of testing framework libraries.
Some past reviews of testing frameworks were done in 2007 and 2010 (NST review). There was also some work in 2012 by the author of clunit, which seems to have bitrotten and is now only accessible via the Wayback Machine - part 1 and part 2. I thought it was time for an update. I am open to pull requests on this document for corrections, additions, whatever. See https://github.com/sabracrolleton/sabracrolleton.github.io
The best testing framework is context-dependent, and that context includes how you work. As an example, there was an exchange on r/common_lisp between dzecniv and Shinmera. dzecniv likes Prove more than Parachute because they could run tests just by hitting (C-c) the source. Shinmera pointed out a way you could easily add that to Parachute, but he views compilation and execution as two different things. I'm in Shinmera's camp - I do not want to run the test until I want to run it. At the same time, I want to be clear on terminology. If I understand the context, dzecniv was talking about was what I would call an assertion. In the interest of clear terminology, consider the following pseudocode -
(defsuite s0 ; (1) (deftest t1 ; (2) (assert-true (= 1 1)))) ; (3)
- (1)
- I call this a suite - it contains one or more tests (or other suites) and hopefully composes the results
- (2)
- I will call this a test - it contains one or more assertions and hopefully composes the results
- (3)
- I will call this an assertion and the
assert-true
or equivalent as a testing function
I do not want to run assertions at compile-time, but I do not compile assertions outside of tests anyway. So, just for Dzecniv and those like him, there is a functionality table on running assertions on compilation.
Some people are just looking for regression testing - they focus on running all the tests at once. For example, the README for Prove does not even mention deftest
or run-test
. Others are looking for TDD or other continuous testing for their development. I use individual tests a lot during development, but obviously need regression testing as well.
Some testing frameworks insist on tracking progress by printing little dots or checkmarks for every assertion or test that passed or failed. Consider a library like uax-9, which has 1,815,582 assertions (yes, the tests are autogenerated). I don't want to waste screen space on that many dots; I don't want to waste time on printing those dots to screen, nor, if a test fails, in trying to find the failed test amidst the haystack of successful tests. Therefore, I prefer a testing framework that allows me to only collect failures, and to turn off progress reporting. Of course, some frameworks do not display progress reports at all - or do so at the test level rather than the assertion level - but you might really want or need those progress reports.
Some frameworks cannot find lexical variables declared in a closure containing the test. Most people will not care, but if you use closures, you might.
Some people find that whether you use deftest v. define-test or similar vocabulary terms makes a difference. As Pierre Neidhardt said: `define-test` and `assert-false` are well chosen words because:
- They are full, legible words, which is more Lispy (as opposed to "deftest").
- Emacs colors them properly by default.
Other contexts are best served by other libraries. Complex fixture requirements might drive you towards something like NST, the need to define your own bespoke assertions might be served by should-test or confidence. Numerical comparisons might lead you to use lisp-unit, lisp-unit2 or confidence (or just writing your own). For macro-expansion testing, clunit, clunit2, lisp-unit, lisp-unit2 and rove have specific assertion functions. Your situation should govern which testing framework you use, not your project skeleton or market share.
I occasionally hear "extensibility" used as a buzzword applied to a framework. I'm not sure what it means to people. Consider the following example in sxql
, which is not a testing framework but which included this macro in its own test file to simplify its use of prove
-
(defmacro is-mv (test result &optional desc) `(is (multiple-value-list (yield ,test)) ,result ,desc)) (is-mv (select ((:+ 1 1))) '("SELECT (? + ?)" (1 1)) "field")
This is not hard to do, but may not be as composable.
In any event, yes, I will state my opinions, but what you think is important will drive your preferences in testing frameworks.
3. Testing Libraries Considered
3.1. Testing Frameworks
Library | Homepage | Author | License | Last Update |
---|---|---|---|---|
1am | homepage | James Lawrence | MIT | 2014 |
2am 1 | homepage | Daniel Kochmański | MIT | 2016 |
cacau | homepage | Noloop | GPL3 | 2020 |
cardiogram | homepage | Abraham Aguilar | MIT | 2020 |
clunit 2 | homepage | Tapiwa Gutu | BSD | 2017 |
clunit2 | homepage | Cage (fork of clunit) | BSD | 2022 |
com.gigamonkeys.test-framework | homepage | Peter Seibel | BSD | 2010 |
confidence | homepage | Michaël Le Barbier | MIT | 2023 |
fiasco 3 | homepage | João Távora | BSD 2 Clause | 2020 |
fiveam | homepage | Edward Marco Baringer | BSD | 2020 |
lift | homepage | Gary Warren King | MIT | 2019 4 |
lisp-unit | homepage | Thomas M. Hermann | MIT | 2017 |
lisp-unit2 | homepage | Russ Tyndall | MIT | 2018 |
nst | homepage | John Maraist | LLGPL3 latest | 2021 |
parachute | homepage | Yukari Hafner | zlib | 2021 |
prove (archived) | homepage | Eitaro Fukamachi | MIT | 2020 |
ptester | homepage | Kevin Layer | LLGPL | 2016 |
rove | homepage | Eitaro Fukamachi | BSD 3 Clause | 2022 |
rt | none | Kevin M. Rosenberg | MIT | 2010 |
should-test | homepage | Vsevolod Dyomkin | MIT | 2019 |
simplet | homepage | Noloop | GPLv3 | 2019 |
stefil 5 | homepage | Attila Lendvai, Tamas Borbely, Levente Meszaros | BSD/Public Domain | 2018 |
tap-unit-test 6 | homepage | Christopher K. Riesbeck, John Hanley | MIT | 2017 |
try | homepage | Gábor Melis | MIT | 2022 |
unit-test | homepage | Manuel Odendahl, Alain Picard | MIT | 2012 |
xlunit | homepage | Kevin RosenBerg | BSD | 2015 |
xptest | none | Craig Brozensky | Public Domain | 2015 |
3.2. Speciality Libaries
Library | Homepage | Author | License | Last Update |
---|---|---|---|---|
checkl | homepage | Ryan Pavlik | LLGPL, BSD | 2018 |
Library | Homepage | Author | License | Last Update | Selenium |
---|---|---|---|---|---|
cl-selenium-webdriver | homepage | TatriX | MIT | 2018 | 2.0 |
selenium | homepage | Matthew Kennedy | LLGPL | 2016 | 1.0? |
The selenium interfaces are here for reference purposes and are not further discussed.
3.3. Helper Libraries
Library | Homepage | Author | License | Last Update |
---|---|---|---|---|
assert-p | homepage | Noloop | GPL3 | 2020 |
assertion-error | homepage | Noloop | GPL3 | 2019 |
check-it | homepage | Kyle Littler | LLGPL | 2015 |
cl-fuzz | homepage | Neil T. Dantam | BSD 2 Clause | 2018 |
cl-quickcheck | homepage | Andrew Pennebaker | MIT | 2020 |
cover | homepage | Richard Waters | MIT | 1991 |
hamcrest | homepage | Alexander Artemenko | BSD 3 Clause | 2022 |
mockingbird | homepage | Christopher Eames | MIT | 2017 |
portch (not in quicklisp) | homepage | Nick Allen | BSD 3 Clause | 2009 |
protest | homepage | Michał Herda | LLGPL | 2020 |
rtch (not in quicklisp) | download | David Thompson | LLGPL | 2008 |
slite | homepage | Arnold Noronha | Apache 2.0 | 2022 |
testbild | homepage | Alexander Kahl | GPLv3 | 2010 |
test-utils | homepage | Leo Zovic | MIT | 2020 |
assert-p
, assertion-error
, check-it
, cl-fuzz
, cl-quickcheck
, cover
, hamcrest
, protest
, slite
, testbild
and test-utils
are not, per se, testing frameworks. They are designed to be used in conjunction with other testing frameworks.
check-it
andcl-quickcheck
are randomized property-based testing libraries (Quickcheck style). See https://en.wikipedia.org/wiki/QuickCheckcl-fuzz
is another variant of testing with random data.assert-p
andAssertion-error
are collections of assertions or assertion error macros that can be used in testing frameworks or by a test runner.cover
is a test coverage library, much like SBCL's sb-cover, CCL's code-cover, or LispWorks Code Coveragehamcrest
uses pattern matching for building tests.mockingbird
provides stubbing and mocking macros for unit testing. These are used when specified functions in a test should not be computed but should instead return a provided constant value.portch
helps organize tests written with Franz's portable ptester libraryprotest
is a wrapper around other testing libraries, currently1am
andparachute
. It wraps around test assertions and, in case of failure, informs the user of details of the failed test step.rtch
helps organize RT tests based on their position in a directory hierarchyslite
is a Slite stands for SLIme TEst runner (also works with SLY). Slite interactively runs your Common Lisp tests (currently only FiveAM and Parachute are supported). It allows you to see the summary of test failures, jump to test definitions, rerun tests with debugger all from inside Emacs.testbild
provides a common interface for unit testing output, supporting TAP (versions 12 and 13) and xunit styles.test-utils
provides convenience functions and macros forprove
andcl-quickcheck
.
3.4. Dependencies
Libraries not in the table below do not show any dependencies in their asd files.
Library | Dependencies |
---|---|
cacau | eventbus, assertion-error |
checkl | marshal |
fiasco | alexandria, trivial-gray-streams |
fiveam | alexandria, net.didierverna.asdf-flv, trivial-backtrace |
lisp-unit2 | alexandria, cl-interpol, iterate, symbol-munger |
nst | (#+(or allegro sbcl clozure openmcl clisp) closer-mop, org-sampler) |
parachute | documentation-utils, form-fiddle |
prove | cl-ppcre, cl-ansi-text, cl-colors, alexandria, uiop |
rove | trivial-gray-streams, uiop |
should-test | rutils, local-time, osicat, cl-ppcre |
4. Quick Summary
4.1. Opinionated Awards
For those who want the opinionated quick summary. The awards are -
Best General Purpose: Parachute followed by Lisp-Unit2.
Parachute hits almost everything on my wish list - optional progress reports and debugging, good suite setup and reporting, good default error-reporting and the ability to provide diagnostic strings with variables, the ability to skip failing test dependencies, to set time limits on tests, to report the time for each test and decent fixture capability. It does not have the built-in ability to re-run just the last failing tests, but that is a relatively easy add-on. While it is not the fastest, it is in the pack as opposed to the also-rans.
Atlas Engineering makes a strong case for Lisp-Unit2 here.
My use cases give Parachute a slight edge because I can get more detail on assertions within a test than with Lisp-Unit2. Atlas Engineering's use cases give Lisp-Unit2 a slight edge because its tests are functions. Your choice will depend on your use case.
My next pick would be Fiasco, but I like Parachute and Lisp-Unit2's fixture capability and suite setup better.
(Update 13 June 2021 - based on the latest update of Clunit2, it needs to be included for consideration as well) (Update 16 Jan 2023 - maybe consider the two newest entries Confidence and Try)
- If Only Award: Lift If only it reported all failing assertions and did not stop at the first one. Why? Why can't I change this?
- If you only care about speed: Lift and 2am Go to Benchmarking
- Best General Purpose Fixtures (Suite/Tag and test level): Lisp-Unit2 and Lift
- Ability to reuse tests in multiple suites: Lisp-Unit2 (because of composable tags), Try
- If you need tests to take parameters: Fiasco, Confidence and Try
- If you need progress reporting to be optional: Parachute, Lisp-Unit2, Fiasco, Clunit2, or Try
Favorite Hierarchy Setup (nestable suites): Parachute and Lisp-Unit2 (which has a different setup using tags)
Everything is a test and its
:parents
all the way up; can easily specify parents at the child level.Honorable mentions - 2am and Lift
- Assertions that take diagnostic comments with variables: Parachute, Fiasco, 2am, Fiveam, Lift, Clunit2, Confidence, Try This is something that I like for debugging purposes along with whatever reporting comes built in with the framework. See error-reporting
- Values expression testing: Lisp-Unit2, Lisp-Unit, Parachute, (Update Clunit2 and Try as well)
- I want to track if my functions changed results: Checkl
- Tests that specify suite or tags (does not rely on location in file): Parachute, Lisp-Unit (tags), Lisp-Unit2(tags), Lift, Clunit2
- Heavy duty complex fixtures: NST (but there are trade-offs in the shape of the learning curve and performance)
- Ability to define new assertions: Confidence, NST, Try (but they have their issues in other areas)
- Ability to rerun failures only: Fiasco, Lisp-Unit2, Try (you can extend Parachute and Fiveam to get this, but it is not there now)
- Favorite Random Data Generator: Check-it
- Can redirect output to a different stream (a): Clunit2, Confidence, Fiasco, Lift, Lisp-Unit, Lisp-Unit2, RT and Try
- Randomized Property Tests: Check-it with any framework
- Choice of Interactive Debugging or Reporting: Most frameworks at this point
- Rosetta Stone Award for reading different test formats: Parachute (can read Fiveam, Prove and Lisp-Unit tests)
- Code Coverage Reports: Use your compiler
- I use it because it was included in my project skeleton generator: Prove
(a) Most frameworks just write to *standard-output*
so you have to redirect that to a file.
4.2. Features Considered
- Ease of use and documentation: Most of the frameworks are straightforward. Some have no documentation, others have partial documentation (often documenting only one use case). The documentation may be out of sync with the code. Some get so excited about writing up the implementation details that it becomes difficult to see the forest for the trees. NST has a high learning curve. Prove and Rove will require digging into the source code if you want to do more than simple regression testing. Lift has a lot of undocumented functionality that might be just what you need but you have no way of knowing.
- Tests
- Tests should take multiple assertions and report ALL the assertion failures in the test (Looking at you Lift, and Xlunit - I put multiple assertions into a test for a reason, please do not lose some of the evidence.)
- Are tests functions or otherwise funcallable? (Faré and others requested this in an exchange with Tapiwa, the author of Clunit, back in 2013. At the same time others want or do not want test names in the function namespace. You choose your preference. Those who want funcallable tests typically cite either the ability to program running the test or the ability to go to definion from test name.)
- Immediate access to source code (Integration with debugger or funcallable tests?)
- Does a failure or error throw you immediately into the debugger, never into the debugger, and is that optional?
- Easy to test structures/classes (does the framework provide assistance in determining that all parts of a structure or class meet a test)
- Tests can call other tests (This is not the same as funcallable tests. To be useful this does require a minimum level of test labeling in the reporting.)
- Assertions (aka Assertion Functions)
- There are frameworks with only a few assertion test functions. There are frameworks with so many assertions that you wonder if you have to learn them all. The advantage of specialized assertions is less typing, possibly faster (or slower) performance and possibly relevant built-in error messages. You will have to check for yourself whether performance is positively or negatively impacted. You have to decide for yourself how much weight to put on extra assertions like having
assert-symbolp
instead of(is (symbolp x))
. - Assertions that either automatically explain why the the test failed or allow a diagnostic string that describes the assertion and what failed. (Have you ever seen a test fail but the report of what it should have been and what the result was look exactly the same? Maybe the test required EQL and you thought it was EQUALP? These might or might not help)
- Can assertions can access variables in a closure containing the test? (Most frameworks can, but Clunit, Clunit2, Lisp-Unit, Lisp-Unit2 and NST cannot).
- Do the assertions have macroexpand assertion functions? (Clunit, Clunit2, Lisp-Unit, Lisp-Unit2, Prove, Rove and Tap-Unit-Test have this)
- Do the assertions have floating point and rational comparisons or do you have to write your own? (Confidence, Lift, Lisp-Unit, Lisp-Unit2, have these functions for you.)
- Signal and condition testing or at least be able to validate that the right condition was signalled.
- Definable assertions/criteria (can you easily define additional assertions?)
- Do assertions or tests run on compilation (C-c C-c in the source file)?
- Do the assertions handle values expressions? Most frameworks accept a values expression but compare just the first value. Fiveam complains about getting a values expression and throws an error. Parachute and NST will compare a single values expression against multiple individual values. Prove will compare a values expression against a list. Lisp-Unit and Lisp-Unit2 (Update Clunit2) will actually compare two values expressions value by value.
- There are frameworks with only a few assertion test functions. There are frameworks with so many assertions that you wonder if you have to learn them all. The advantage of specialized assertions is less typing, possibly faster (or slower) performance and possibly relevant built-in error messages. You will have to check for yourself whether performance is positively or negatively impacted. You have to decide for yourself how much weight to put on extra assertions like having
- Easy to set up understandable suites and hierarchies or tags. Many frameworks automatically add tests to the last test suite that was defined. That it makes things easy if you work very linearly or just in files for regression testing. If you are working in the REPL and switching between multiple test sub-suites that can create unexpected behavior. I like to able to specify the suite (or tags) when defining the test, but that creates more unecessary typing if you work differently.
- Choice of Interactive (drop directly into the debugger) or Reporting (run one or more tests and show which ones fail and which ones pass).
- Data generators are nice to have, but the helper libraries Check-it and Cl-Quickcheck can also be used and probably have more extensive facilities.
- Easy to setup and clean up Fixtures
- Composable fixtures (fixtures for multiple test suites can be composed into a single fixture)
- Freezing existing data while a test temporarily changes it
- Systems that can wrap around tests, allowing you to pass local variables around.
- Compilation: Some people want the ability to compile before running tests for two reasons. First, deferred compilation can seriously slow down extensive tests. Second, getting compile errors and warnings at the test run stage can be hard to track down in the middle of a lot of test output. Other people want deferred compilation (running the test compiles it, so no pre-compilation step required) and tested functions which have changed will get picked up when running the test.
- Reports
- Easy to read reports with descriptive comments (this requires that each test have description or documentation support)
- Does the framework have progress reporting, at what level and can it be turned off?
- Report just failing tests with descriptive info
- Composable Reports (in the sense of a single report aggregating multiple tests or test suites)
- Reports to File. I know most developers do not care, but I have seen situations where the ability to prove that the software at date A is documented to have passed xyz tests would have been nice. See Dribble and Output Streams
- Test Timing. See Timing
- TAP Output (some people like to pass this test results in this format on to other tools).
- Reports of Function (and parameter) test coverage (Rove was the only framework that has something in this area and it depends on using SBCL. I would suggest looking to your compiler and did not test this.)
- Error tracking (Do test runs create a test history so that you can run only against failing tests?) As far as I can tell, no framework creates a database to allow historical analysis.
- Test Sequencing Shuffling
- Can choose test sequencing or shuffle
- Can choose consistent or random or fuzzing data
- Can choose just the tests that failed last time (Chris Riesbeck exchange with Tapiwa in 2013)
- Ability to skip tests Skipping
- Skip tests
- Skip assertions
- Skip based on implementations
- also skip tests that exceed a certain time period
- Benchmarks In general, functionality should matter to you more than benchmarks. The timing benchmark provided here is a regression test on UAX-15 with 16 tests containing 343332 assertions run 10 times. While useful for regression tests, this is not necessarily indicative of development testing.
- Asynchronous and parallel testing (not tested in this report)
- Case safety (Max Mikhanosha asked for this an an exchange with Tapiwa in 2013. Not tested in this report)
- Memory, time and resource usage reports (no one documented this and I did not dive into the source code looking for it.)
I am not covering support for asdf package-inferred systems, roswell script support and integration with travis ci, github actions, Coveralls, etc. If someone wants to do that and submit a pull request, I am open to that.
I am not including a pie chart describing which library has market share because (a) I do not like pie charts and (b) I do not believe market share is a measure of quality. That being said, because someone asked nicely, I pulled the following info out of quicklisp just based on who-depends-on
. The actual count in the wild is completely unknown.
Name | Count |
---|---|
1am | 22 |
2am | 0 |
fiveam | 323 |
clunit | 11 |
clunit2 | 4 |
confidence | 2 |
fiasco | 24 |
lift | 54 |
lisp-unit | 42 |
lisp-unit2 | 21 |
nst | 10 |
parachute | 49 |
prove | 163 |
ptester | 5 |
rove | 31 |
rt | 29 |
should-test | 3 |
try | 7 |
xlunit | 4 |
xptest | 0 |
5. Functionality Comparison
5.1. Hierarchy Overview
Name | Hierarchies/suites/tags/lists | Composable | Reports |
---|---|---|---|
1am | ❌️ (2)(5) | ❌️ | ❌️ |
2am | ✅️ | ✅️ (5) | (4) |
cacau | (6) | (4) | |
clunit | ✅️ | ✅️ | (4) |
clunit2 | ✅️ | ✅️ | (4) |
confidence | ❌️ (9) | ✅️ | ✅️ |
fiasco | ✅️ | ✅️ | |
fiveam | ✅️ | ✅️ | |
gigamonkeys | ❌️ | ||
lift | ✅️ | ✅️ | |
lisp-unit | (tags) (3) | (1,4) | |
lisp-unit2 | (tags) (3)(5) | ✅️ (5) | (1,4) |
nst | ✅️ | ✅️ | |
parachute | ✅️ | ✅️ | (1) |
prove | ✅️ | ✅️ | (4) |
ptester | ❌️ | ||
rove | (7) | (7) | |
rt | package | (8) | |
should-test | package | ||
simplet | ❌️ | ||
tap-unit-test | ❌️ | (4) | |
try | ✅️ | ✅️ (5) | ✅️ |
unit-test | ✅️ | ✅️ | |
xlunit | ✅️ | ✅️ | |
xptest | ✅️ | ❌️ |
- report objects are provided which are expected to be extended by the user
- uses a flat list of tests. You can pass any list of test-names to run. See, e.g. macro provided by Phoe in the 1am discussion.
- lisp-unit and lisp-unit2 organize by packages and by tags. You can run all the tests in a package, or all the tests for a list of tags, but they do not have the strict sense of hierarchy that other libraries have.
- TAP Formatted Reports are available
- Because tests are functions, tests can call other functions so you can create ad-hoc suites or hierarchies.
- Has suites but no real capacity to run them independently - all or nothing
- Rove's
run-suite
function will run all the tests in a particular package but does not accept a style parameter and simply prints out the results of each individual test, without summarizing. Rove'srun
function does accept a style parameter but seems to handle only package-inferred systems. I confirm Rove's issue #42 that it will not run with non-package inferred systems. - RT does not have suites per se. You can run all the tests that have been defined using the DO-TESTS function. By default it prints to
*standard-output*
but accepts an optional stream parameter which would allow you to redirect the results to a file or other stream of your choice. do-tests will print the results for each individual test and then summarize with something like the following: - Tests are hierarchical and compose results but there is no tagging system. Each suite is a function that can directly be executed.
5.2. Run on compile, funcallable tests
There are multiple benefits to tests being funcallable. One of them being that you can "go to the test definition" easily.
Library | Run on compile | Are Tests Funcallable? |
---|---|---|
1am | A | Y |
2am (not in quicklisp) | A | Y |
cacau | N | N |
clunit | A | N |
clunit2 | A | N |
confidence | N | Y |
fiasco | A | Y |
fiveam | Optional | N |
gigamonkeys | N | N |
lift | A, T(1) | N |
lisp-unit | N | N |
lisp-unit2 | N | Y |
nst | N | N |
parachute | N | N |
prove | A | N |
ptester | N | N |
rove | A | N |
rt | N | N |
should-test | N | N |
tap-unit-test | N | N |
try | T(4) | Y |
unit-test | N | N |
xlunit | T(2) | N |
xptest | N | N |
- A means assertions run on compile, T means tests run on compile
- (1) if compiled at REPL
- (2) Optional by test, specified at definition:
(def-test-method t1 ((test tf-xlunit) :run nil) body)
- (3)
*run-test-when-defined*
controls this option - (4) =*run-deftest-when* controls this option
5.3. Fixtures
Library | Fixtures | Suite Fixtures | Test Fixtures | Multiple Fixtures |
---|---|---|---|---|
1am | ❌️ | |||
2am (not in quicklisp) | ❌️ | |||
cacau | ✅️ | ✅️ | ✅️ | |
clunit | ✅️ | ✅️ | ✅️ | ✅️ |
clunit2 | ✅️ | ✅️ (c) | ✅️ | ✅️ |
confidence | ❌️ | |||
fiasco | ❌️ | |||
fiveam (a) | K | ✅️ | ✅️ | |
gigamonkeys | ❌️ | |||
lift | ✅️ | ✅️ | inherited from higher level suites | |
lisp-unit | ❌️ | |||
lisp-unit2 | ✅️ | ✅️ | ||
nst | ✅️ | ✅️ | ✅️ | ✅️ |
parachute | ✅️ | ✅️ | ✅️ | ✅️ |
prove | ❌️ | |||
ptester | ❌️ | |||
rove | ✅️ | ✅️ | ✅️ | ✅️ |
rt | ❌️ | |||
should-test | ❌️ | |||
tap-unit-test | ❌️ | |||
try | ❌️ | |||
unit-test (b) | ✅️ | (b) | (b) | (b) |
xlunit | ✅️ | ✅️ | ✅️ | ✅️ |
xptest | ✅️ | ✅️ |
(a) Not really recommended, but does exist.
(b) Users are expected to create a subclass of the unit-test class using the define-test-class
macro.
(c) Only one fixture per suite
5.4. Control over debugging, and user-provided diagnostic messages
Does a failure (not error) trigger the debugger, is it optional, and do assertions allow user-provided diagnostic messages. If yes, can you further provide variables for a failure message?
Library | Failure triggers debugger | Diagnostic Messags in Assertions |
---|---|---|
1am | (always) | N |
2am | (optional) | with vars |
cacau | (optional) | N |
clunit | (optional) | with vars |
clunit2 | (optional) | with vars |
confidence | (never) | with vars |
gigamonkeys | (optional) | N |
fiasco | (optional) | with vars |
fiveam | (optional) | with vars |
lift | (optional) | with vars |
lisp-unit | (optional) | Y |
lisp-unit2 | (optional) | Y |
nst | (optional) | N |
parachute | (optional) | with vars |
prove | (optional) | Y |
ptester | (optional) | N |
rove | (optional) | Y |
rt | (never) | N |
should-test | (never) | Y |
simplet | (never) | N |
tap-unit-test | (optional) | Y |
try | (optional) | Y |
unit-test | (never) | Y |
xlunit | (never) | Y |
xptest | (never) | N |
Also see error-reporting
5.5. Output of Run Functions (other than what is printed to the stream)
Library | Function | Returns |
---|---|---|
1am | run | nil |
2am (not in quicklisp) | run | nil |
cacau | run | nil |
clunit | run-test, run-suite | nil |
clunit2 | run-test, run-suite | nil |
confidence | name-of-test | nil |
fiasco | run-tests | test-run object |
fiveam | run | list of test-passed, test-skipped, test-failure objects |
run! | nil | |
gigamonkeys | test | nil |
lift | run-test, run-tests | results object |
lisp-unit | run-tests | test-results-db object |
lisp-unit2 | run-tests | test-results-db object |
nst | :run | nil |
parachute | test | a result object |
prove | run | Returns 3 multiple-values, a flag if the tests passed as T or NIL, passed test files as a list and failed test files also as a list. |
run-test-system | passed-files, failed-files | |
run-test | nil | |
ptester | with-tests | nil |
rove | run-test, run-suite | t or nil |
rt | do-test | nil |
should-test | test | hash-table (1) |
tap-unit-test | run-tests | nil |
try | try | trial object |
unit-test | run-test | test-equal-result object |
xlunit | textui-test-run | test-results-object |
xptest | run-test | list of test-result objects |
(1) Should-test: at the lowest level should returns T or NIL and signals information about the failed assertion. This information is aggregated by deftest which will return aggregate information about all the failed assertions in the hash-table at the highest level test will once again aggregate information over all tests.
5.6. Progress Reports
Does the framework provide a progress report, is it optional, and does it run just at the test level or also at the asserts level?
Library | Progress Reports |
---|---|
1am | Every assert |
2am | Every assert |
cacau | optional |
clunit | optional |
clunit2 | optional |
confidence | never |
gigamonkeys | never |
fiasco | optional |
fiveam | optional (1) |
lift | never |
lisp-unit | never |
lisp-unit2 | optional |
nst | Every test |
parachute | optional |
prove | Every assert |
ptester | Every assert |
rove | Optional |
rt | Every test |
should-test | Every assert |
simplet | Every test |
tap-unit-test | never |
try | optional (2) |
unit-test | Every test |
xlunit | never |
xptest | never |
(1) The following will allow fiveam to run without output
(2) What to print is parameterized by event type (e.g. everything, nothing, UNEXPECTED-FAILURE
, etc).
(let ((fiveam:*test-dribble*
(make-broadcast-stream)))
(fiveam:run! …))
5.7. Skipping, Shuffling and Re-running
Name | Skip failing dependencies | Shuffle | Re-run only failed tests |
---|---|---|---|
1am | Y (auto) | ||
2am | Y (auto) | ||
cacau | S, T | ||
clunit | D | Y (auto) | |
clunit2 | D | Y (auto) | Y |
confidence | Y | ||
fiasco | P(1), A | Y | |
fiveam | P(2) | (3) | |
gigamonkeys | |||
lift | T | ||
lisp-unit | |||
lisp-unit2 | Y | ||
nst | |||
parachute | D, C, P | Y | |
prove | (4) | ||
ptester | |||
rove | A | ||
rt | |||
should-test | N | Y | |
simplet | P | ||
tap-unit-test | |||
try | S,T.A | Y | Y (5) |
unit-test | |||
xlunit | |||
xptest |
D - failing dependencies, C - children, P - pending, S - suites, T - tests, A - assertions
- skip based on conditions when and skip-unless
- skip when specified
- run! returns a list of failed-test-results that you could save and use for this purpose
- Prove can skip a specified number of tests using the skip function. Unfortunately it marks them as passed rather than skipped.
- What to re-rerun is parameterized by event type.
5.8. Timing Reporting and Time Limits
Library | Time Reporting | Time Limits |
---|---|---|
1am | N | N |
2am (not in quicklisp) | N | N |
cacau | N | Y(T or S) |
clunit | N | N |
clunit2 | N | N |
confidence | N | N |
fiasco | N | N |
fiveam (a) | ? | N |
gigamonkeys | N | N |
lift | Y | Y |
lisp-unit | Y | N |
lisp-unit2 | Y | N |
nst | Y | Y |
parachute | Y | Y |
prove | N | Y |
ptester | N | N |
rove | N | N |
rt | N | N |
should-test | N | N |
tap-unit-test | Y | N |
try | Y | Y |
unit-test | N | N |
xlunit | N | N |
xptest | N | N |
(a) Fiveam has some undocumented profiling capabilities that I did not look at
5.9. Dribble and Output Streams
Library | Dribble | output streams |
---|---|---|
1am | N | S |
2am (not in quicklisp) | N | S |
cacau | N | S |
clunit | N | S |
clunit2 | N | *test-output-stream* |
confidence | N | optional parameter |
fiasco | N | optional parameter |
fiveam | Y *test-dribble* |
S |
gigamonkeys | N | S |
lift | Y *lift-dribble-pathname* |
optional parameter |
lisp-unit | N | optional parameter |
lisp-unit2 | N | *test-stream* |
nst | N | optional parameter |
parachute | N | (setf output) |
prove | N | *test-result-output* |
ptester | N | S |
rove | N | *report-stream* |
rt | N | optional parameter |
should-test | N | *test-output* |
tap-unit-test | N | S |
try | N | optional parameter |
unit-test | N | S |
xlunit | N | S |
xptest | N | S |
Where S is *standard-output*
5.10. Edge Cases: Float Testing, Value Expressions and Closure Variables
top This table is looking at whether the framework provides float equality tests, looks at all the values coming from a values expression, and can access variables declared in a closure surrounding the test.
Name | float tests | Handles value expressions | Variables in Closures |
---|---|---|---|
1am | First value only | Y | |
2am | First value only | Y | |
cacau | First value only | Y | |
clunit | First value only | N | |
clunit2 (a) | Y | N | |
confidence | Y | First value only | Y |
fiasco | First value only | Y | |
fiveam | N | N | |
gigamonkeys | First value only | Y | |
lift | First value only | N | |
lisp-unit | Y | Y | N |
lisp-unit2 | Y | Y | N |
nst | Y | N | |
parachute | Y | Y | |
prove | Y | Y | |
ptester | First value only | Y | |
rove | First value only | Y | |
rt | N | N | |
should-test | First value only | N | |
tap-unit-test | Y | N | |
try | Y | Y | Y |
unit-test | First value only | Y | |
xlunit | First value only | Y | |
xptest | relies on CL predicates | Y |
(a) Updated 13 June 2021
5.11. Vocabulary - Define and Assert
top This table is looking at whether the framework uses define- and assert- which provide makes it easier to read as Emacs will color them. This is important to at least one group that commented on earlier versions of this report.
Name | Define- | Assert- |
---|---|---|
1am | ||
2am | ||
cacau | ||
clunit | assert-true, assert-condition | |
clunit2 | assert-true, assert-condition | |
confidence | define-testcase | assert-true, assert-p, assert-t |
fiasco | ||
fiveam | ||
gigamonkeys | ||
lift | ||
lisp-unit | define-test | asset-true, assert-error, assert-result, assert-test, check-type |
lisp-unit2 | define-test | assert=, assert/=, asssert-char=, assert-char-equal,assert-char/=, assert-char-not-equal, assert-eq, assert-eql, assert-equal, assert-equality, assert-equalp, assert-error, assert-expands, assert-fail, assert-false, assert-float-equal, assert-no-error, assert-no-signal, assert-no-warning, assert-norm-equal, assert-number-equal, assert-numerical-equal, assert-passes?, assert-prints, assert-rational-equal, assert-sigfig-equal, assert-signal, assert-string=, assert-string-equal, assert-string/=, assert-string-not-equal, assert-true, assert-typep, assert-warning, assertion-fail, assertion-pass, check-type, logically-equal |
nst | ||
parachute | define-test | |
prove | ||
ptester | ||
rove | ||
rt | ||
should-test | ||
tap-unit-test | define-test | assert-true, assert-error |
try | ||
unit-test | ||
xlunit | assert-true, assert-condition | |
xptest | defmethod |
5.12. Compatibility and Customizable Assertions
Name | compatibility layers | Customizeable Assertion Functions |
---|---|---|
cacau | Y | |
confidence | Y | |
parachute | fiveam lisp-unit prove | |
nst | Y |
(a) Running suites without tests or tests without test functions will result in tests marked PENDING rather than success or fail
5.13. Claims Not Tested
Name | Async | Thread Ready | Package Inferred |
---|---|---|---|
1am | X | ||
2am | X | ||
Cacau | X | ||
Rove | X (1) | X |
(1) Tycho Garen reported in February 2021 that "Rove doesn't seem to work when multi-threaded results effectively. It's listed in the readme, but I was able to write really trivial tests that crashed the test harness."
6. Assertion Failure Comments
There are two reasons you test. First, to pat yourself on the back when all test pass. Second, to find any bugs. Assertions in the test frameworks have different amounts of automatically generated information that they will provide on failures. The following are the automatically generated failure messages on an assertion that (= x y) where x is 1 and y is 2. We also note whether the framework also accepts diagnostic strings and variables for those strings.
6.1. 1am
What, you wanted a report? Let me introduce you to the debugger.
6.2. 2am
Assertions also accept diagnostic strings with variables
T1-FAIL-34: FAIL: (= X Y)
6.3. cacau
Error message: BIT EQUAL (INTEGER 0 4611686018427387903) Actual: 1 Expected: 2
6.4. clunit and clunit2
Assertions also accept diagnostic strings with variables
T1-FAIL-34: Expression: (= X Y) Expected: T Returned: NIL
6.5. confidence
Into the debugger you never go.
Test assertion failed: (ASSERT-T (= X Y)) In this call, the composed forms in argument position evaluate as: (= X Y) => NIL The assertion (ASSERT-T EXPR) is true, iff EXPR is a true generalised boolean.
6.6. fiasco
Assertions also accept diagnostic strings with variables
Failure 1: FAILED-ASSERTION when running T1-FAIL Binary predicate (= X Y) failed. x: X => 1 y: Y => 2
6.7. fiveam
Assertions also accept diagnostic strings with variables. I deleted several blank lines. Why do you waste so much screen space Fiveam?
T1-FAIL-34 []: Y evaluated to 2 which is not = to 1
6.8. gigamonkeys
FAIL ... (T1-FAIL): (= X Y) X => 1 Y => 2 (= X Y) => NIL
6.9. lift
Assertions also accept diagnostic strings with variables
Failure: s0 : t1-fail-34
Documentation: NIL
Source : NIL
Condition : Ensure failed: (= X Y) ()
During : (END-TEST)
Code : (
((LET ((X 1) (Y 2))
(ENSURE (= X Y)))))
6.10. lisp-unit
Assertions also accept diagnostic strings but no variables
Failed Form: (= X Y) | Expected T but saw NIL | X => 1 | Y => 2
6.11. lisp-unit2
Assertions also accept diagnostic strings but no variables
| FAILED (1)
| Failed Form: (ASSERT-TRUE (= X Y))
| Expected T
| but saw NIL
6.12. parachute
Assertions also accept diagnostic strings with variables
test 't1-fail-34) ? TF-PARACHUTE::T1-FAIL-34 0.000 ✘ (is = x y) 0.010 ✘ TF-PARACHUTE::T1-FAIL-34 ;; Failures: 1/ 1 tests failed in TF-PARACHUTE::T1-FAIL-34 The test form y evaluated to 2 when 1 was expected to be equal under =.
6.13. ptester
Test failed: Y wanted: 1 got: 2
6.14. prove
Assertions also accept diagnostic strings but no variables
× NIL is expected to be T (prove)
6.15. rove
Assertions also accept diagnostic strings but no variables
(EQUAL X Y) (rove) X = 1 Y = 2
6.16. rt
Form: (LET ((X 1) (Y 2))
(= X Y))
Expected value: T
Actual value: NIL.
6.17. should-test
Assertions also accept diagnostic strings but no variables
Test T1-FAIL-34: Y FAIL expect: 1 actual: 2 FAILED
6.18. tap-unit-test
Assertions also accept diagnostic strings but no variables
T1-FAIL-34: (= X Y) failed: Expected T but saw NIL
6.19. try
(deftest t1-fail () (let ((x 1) (y 2)) (is (equal 1 2)) (is (= x y) :msg "Intentional failure x does not equal y" :ctx ("*PACKAGE* is ~S and *PRINT-CASE* is ~S~%" *package* *print-case*)))) (try 't1-fail :print 'unexpected) T1-FAIL ⊠ (IS (EQUAL 1 2)) ⊠ Intentional failure x does not equal y where X = 1 Y = 2 *PACKAGE* is #<PACKAGE "UAX-15-TRY-TESTS"> and *PRINT-CASE* is :UPCASE ⊠ T1-FAIL ⊠2 #<TRIAL (T1-FAIL) UNEXPECTED-FAILURE 0.000s ⊠2>
7. Benchmarking
First some points about what I have discovered about benchmarking these frameworks:
- In general, functionality and comfort will drive your framework decision not benchmarks.
That said, bad benchmarks can say one of four things:
a. I did not fully understand the best way to use the framework. As an example, my first naive version of the test for confidence had a runtime of 742 seconds. The author showed me how to rewrite the tests and it dropped to 24 seconds. This could very well be the explanation for the bad results of nst. (If someone experienced with nst would like to re-write that benchmark test, please let me know.)
b. There is a problem in the framework code. The author of clunit2 cut the run time from the original runtime of 600 seconds to 14 seconds.
c. If your framework reports using what emacs thinks is long lines, run the tests in a terminal, do not run it in an emacs repl. As an example, fiveam's runtime was 10 seconds, but real-time (total time to get a result in the emacs repl was 37 minutes).
d. There might be something of interest for the SBCL and CCL developers. See, e.g. the wildly different result between the results for try and prove.
The benchmarks are based on a regression test, not development or functional testing. All the benchmark times below were done in a terminal window with SBCL version 2.3.0 and CCL version 1.12.1 on a linux server. I tried to rewrite the tests for UAX-15 for each framework. The uax-15 tests have 16 separate tests with a total of 343332 assertions (all of which pass) and the assertions are all straight-forward. The tests were stripped to the minimum. No diagnostic strings were used. For the frameworks which allowed it, the test was set to no progress reporting and overall summary only. Trivial-benchmark was used with 10 repetitions for each test. (Since Cacau does not run tests again unless they are recompiled, I have multiplied a single run by 10 to get some kind of comparable.)
Since all of the assertions pass, any real world test with failing assertions generating failure reports will be different.
Unsurprisingly, the simplest frameworks were the fastest. Your context will be important as to whether these benchmarks are at all meaningful to you.
7.1. Stack Ranking
Considering that the benchmark is based on 10 test runs of 16 tests with 343332 passing assertions (3433320 total assertions, 160 tests), test speed on regression tests are not going to drive your decision. Development and functional testing will obviously have a different result.
Library | SBCL RunTime | CCL Runtime |
---|---|---|
xptest | 5.8903 | 11.7284 |
xlunit | 5.9102 | 11.7532 |
cacau | 6.0173 | 11.6543 |
lift | 6.0686 | 11.9706 |
1am | 6.1541 | 13.6843 |
ptester | 6.2000 | 12.5130 |
rt | 6.2079 | 11.9097 |
2am | 6.2408 | 14.0614 |
unit-test | 6.3616 | 17.6696 |
tap-unit-test | 6.9988 | 13.2284 |
lisp-unit | 7.1250 | 13.4544 |
should-test | 7.1710 | 25.1831 |
gigamonkeys | 7.8872 | 30.7511 |
fiasco | 8.8940 | 38.4574 |
lisp-unit2 | 9.2966 | 30.2440 |
parachute | 9.9155 | 40.3792 |
fiveam | 10.1231 | 19.1292 |
rove | 11.8615 | 35.8269 |
cardiogram | 13.3693 | 29.2526 |
clunit2 | 14.3416 | 35.9992 |
try | 14.6188 | 177.8841 |
confidence | 24.7286 | 56.8546 |
prove | 30.5618 | 132.1456 |
nst | 517.8853 | 500.4885 |
Library | Bytes Consed |
---|---|
2am (not in quicklisp) | 3361006176 |
1am | 3361283232 |
xptest | 3363654192 |
xlunit | 3367157968 |
cacau | 3383879200 |
lift | 3480602800 |
rt | 3586779376 |
unit-test | 3668546192 |
ptester | 3690505472 |
should-test | 3859488496 |
cardiogram | 3965584752 |
fiasco | 4077786880 |
tap-unit-test | 4217541312 |
lisp-unit | 4222413456 |
confidence | 4338353600 |
fiveam | 4512244000 |
com.gigamonkeys.test-framework | 4788180016 |
lisp-unit2 | 4939787968 |
clunit | 5262303120 |
parachute | 5212345824 |
rove | 7427051216 |
try | 9019833552 |
clunit2 (a) | 15407395920 |
prove | 14018185696 |
clunit2 (b) | 15377667616 |
nst | 306684151680 |
Library | Eval Calls |
---|---|
1am | 0 |
cacau | 0 |
com.gigamonkeys.test-framework | 0 |
fiveam | 0 |
confidence | 0 |
lift | 0 |
lisp-unit2 | 0 |
parachute | 0 |
prove | 0 |
ptester | 0 |
rove | 0 |
should-test | 0 |
unit-test | 0 |
xlunit | 0 |
xptest | 0 |
clunit2 (a) | 0 |
2am (not in quicklisp) | 0 |
fiasco | 10 |
lisp-unit | 160 |
tap-unit-test | 160 |
clunit | 320 |
rt | 480 |
nst | 6860220 |
Now the detailed report.
7.2. 1am
1am seems to have no way to turn off the progress reports. The benchmark below was done running in a terminal window. The same test running in a emacs REPL took roughly six times longer due to how emacs mishandles long lines. YMMV with other editors.
(benchmark:with-timing (10) (uax-15-1am-tests::run))
Success: 16 tests, 343332 checks.
- SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION
REAL-TIME 10 9.609976 0.88333 1.05333 0.943332 0.960998 0.050641
RUN-TIME 10 6.154181 0.607383 0.64799 0.610502 0.615418 0.011578
USER-RUN-TIME 10 6.117711 0.601297 0.631522 0.609088 0.611771 0.008027
SYSTEM-RUN-TIME 10 0.036479 0 0.016462 0.003285 0.003648 0.004761
PAGE-FAULTS 10 0 0 0 0 0 0.0
GC-RUN-TIME 10 99.246 6.092 39.833 6.396 9.9246 9.983065
BYTES-CONSED 10 3361283232 336076080 336267312 336108464 336128320.0 50528.4
EVAL-CALLS 10 0 0 0 0 0 0.0
The CCL version:
SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION REAL-TIME 10 25.275585 2.384657 2.649059 2.491271 2.527559 0.080996 RUN-TIME 10 13.684325 1.337679 1.436498 1.360898 1.368432 0.028661
7.3. 2am
2am seems to have no way to turn off the progress reports. As with the 1am benchmark, the benchmark below was done running in a terminal window. The same test running in a emacs REPL took roughly six times longer due to how emacs mishandles long lines.
(benchmark:with-timing (10) (uax-15-2am-tests::run))
- SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION
REAL-TIME 10 9.886643 0.899998 1.073331 0.993331 0.988664 0.043823
RUN-TIME 10 6.240839 0.618159 0.648082 0.620287 0.624084 0.008487
USER-RUN-TIME 10 6.214425 0.614933 0.644856 0.618156 0.621442 0.008304
SYSTEM-RUN-TIME 10 0.026427 0 0.00694 0.003034 0.002643 0.002041
PAGE-FAULTS 10 0 0 0 0 0 0.0
GC-RUN-TIME 10 147.903 11.379 36.216 11.669 14.7903 7.227465
BYTES-CONSED 10 3361006176 336067904 336158016 336087456 336100600.0 27877.61
EVAL-CALLS 10 0 0 0 0 0 0.0
The CCL version
- SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION REAL-TIME 10 29.176336 2.674119 3.128804 2.907567 2.917634 0.11941 RUN-TIME 10 14.061409 1.364417 1.471937 1.403766 1.406141 0.03288
7.4. cacua
Since Cacau does not run tests unless they are recompiled, you need to multiply numbers below by 10 to get some kind of comparable here. Running at the minimum reporting.
(benchmark:with-timing (10) (uax-15-cacau-tests::run :reporter :min)) - SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION REAL-TIME 1 0.603331 0.603331 0.603331 0.603331 0.603331 0.0 RUN-TIME 1 0.601739 0.601739 0.601739 0.601739 0.601739 0.0 USER-RUN-TIME 1 0.581692 0.581692 0.581692 0.581692 0.581692 0.0 SYSTEM-RUN-TIME 1 0.020046 0.020046 0.020046 0.020046 0.020046 0.0 PAGE-FAULTS 1 0 0 0 0 0 0.0 GC-RUN-TIME 1 24.25 24.25 24.25 24.25 24.25 0.0 BYTES-CONSED 1 338387920 338387920 338387920 338387920 338387920 0.0 EVAL-CALLS 1 0 0 0 0 0 0.0
The CCL version (total multiplied by 10 get try to get a comparable)
- SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION REAL-TIME 1 1.168082 1.168082 1.168082 1.168082 1.168082 0 RUN-TIME 1 1.165435 1.165435 1.165435 1.165435 1.165435 0
7.5. cardiogram
- sbcl
- SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION REAL-TIME 10 15.639978 1.529998 1.61333 1.553331 1.563998 0.025854 RUN-TIME 10 13.36939 1.289864 1.409203 1.324704 1.336939 0.03158 USER-RUN-TIME 10 9.034622 0.860167 0.933511 0.897475 0.903462 0.020972 SYSTEM-RUN-TIME 10 4.334774 0.392391 0.479334 0.426708 0.433477 0.025352 PAGE-FAULTS 10 0 0 0 0 0 0.0 GC-RUN-TIME 10 883.805 70.762 112.345 85.737 88.3805 13.442876 BYTES-CONSED 10 3965584752 396493296 396752736 396527616 396558460.0 73038.71 EVAL-CALLS 10 0 0 0 0 0 0.0
- ccl
- SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION REAL-TIME 10 31.406332 3.098291 3.187677 3.141698 3.140633 0.02607 RUN-TIME 10 29.252617 2.900375 2.954562 2.923258 2.925262 0.017778
7.6. clunit
Clunit has always had a concern about performance. Running this benchmark was painful. Unlike fiveam, which should not be run in a REPL in emacs on tests with lots of assertions because of emacs' issues with long lines, clunit has no one to blame but itself. But look at the CCL results compared to the SBCL results. Clunit was the only framework faster under CCL than SBCL. Still unacceptably slow, but … Wwith SBCL in a terminal.
- SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION REAL-TIME 10 601.4108 57.19953 64.00593 59.65935 60.141087 2.303678 RUN-TIME 10 601.0652 57.161556 63.96824 59.62751 60.106518 2.301759 USER-RUN-TIME 10 600.65216 57.108273 63.941593 59.587543 60.06522 2.305383 SYSTEM-RUN-TIME 10 0.413016 0.019989 0.059948 0.043303 0.041302 0.011839 PAGE-FAULTS 10 0 0 0 0 0 0.0 GC-RUN-TIME 10 1158.426 87.246 145.034 115.57 115.8426 17.674866 BYTES-CONSED 10 5262303120 526034656 527650448 526069408 526230312 473977.47 EVAL-CALLS 10 320 32 32 32 32 0.0 NIL
The CCL result
- SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION REAL-TIME 10 272.9831 27.003325 27.478271 27.37946 27.298307 0.179919 RUN-TIME 10 272.8588 26.99254 27.466413 27.36916 27.28588 0.179731
7.7. clunit2
Update 13 June 2021: Clunit2 has had a huge performance increase, most of it apparently involving moving from using lists to using arrays. Clunit2 should now be considered a member of the pack from a performance standpoint.
I ran the new improved clunit2 two ways and there is a performance difference to be considered here.
First I let CL equal do the comparision and then clunit2 just checked whether the assertion was true (assert-true) which was how all the other frameworks were also tested.
- SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION REAL-TIME 10 14.846632 1.366663 1.749995 1.389997 1.484663 0.138653 RUN-TIME 10 14.341602 1.36469 1.746867 1.375746 1.43416 0.113668 USER-RUN-TIME 10 13.959459 1.327791 1.650369 1.35222 1.395946 0.095244 SYSTEM-RUN-TIME 10 0.382167 0.020135 0.096501 0.029916 0.038217 0.021336 PAGE-FAULTS 10 0 0 0 0 0 0.0 GC-RUN-TIME 10 1396.363 79.062 426.267 94.67 139.6363 102.216064 BYTES-CONSED 10 15407395920 1540473248 1542494352 1540569936 1540739592 586165.7 EVAL-CALLS 10 0 0 0 0 0 0.0
The CCL version:
- SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION REAL-TIME 10 36.061737 3.571143 3.670895 3.591581 3.606174 0.031347 RUN-TIME 10 35.999214 3.565532 3.666779 3.587588 3.599922 0.031904
7.8. confidence
Confidence has no built in capability for running all the tests in a suite or package, so this is based on creating a function that just runs all the tests for uax-15-confidence-tests.
There is no way to turn off the progress report.
- sbcl
- SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION REAL-TIME 10 24.766607 2.373328 2.619993 2.449995 2.476661 0.074266 RUN-TIME 10 24.728697 2.370314 2.620167 2.44499 2.47287 0.074612 USER-RUN-TIME 10 24.206568 2.333738 2.529879 2.408485 2.420657 0.061988 SYSTEM-RUN-TIME 10 0.52216 0.033206 0.119802 0.043091 0.052216 0.025093 PAGE-FAULTS 10 0 0 0 0 0 0.0 GC-RUN-TIME 10 2399.87 157.192 422.991 199.997 239.987 77.71274 BYTES-CONSED 10 4338353600 430977312 457204272 431014240 433835360 7812895.0 EVAL-CALLS 10 0 0 0 0 0 0.0
- ccl
- SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION REAL-TIME 10 56.97802 4.964914 6.698658 5.584261 5.697802 0.544525 RUN-TIME 10 56.854607 4.940436 6.691704 5.57673 5.685461 0.55132
7.9. fiasco
With progress reporting turned off
(setf *print-test-run-progress* nil) (in-package :uax-15-fiasco-suite) (benchmark:with-timing (10) (run-package-tests)) - SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION REAL-TIME 10 8.906644 0.836663 1.036664 0.873331 0.890664 0.055253 RUN-TIME 10 8.894009 0.833614 1.035559 0.87176 0.889401 0.055545 USER-RUN-TIME 10 8.567723 0.821425 0.982309 0.835402 0.856772 0.046346 SYSTEM-RUN-TIME 10 0.326294 0.009906 0.05325 0.036599 0.032629 0.012558 PAGE-FAULTS 10 0 0 0 0 0 0.0 GC-RUN-TIME 10 1226.872 82.416 269.812 99.856 122.6872 54.588013 BYTES-CONSED 10 4077786880 407500144 409297696 407538304 407778688 543070.25 EVAL-CALLS 10 10 1 1 1 1 0.0
The CCL version:
- SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION REAL-TIME 10 38.544464 3.658651 4.815928 3.764052 3.854446 0.323914 RUN-TIME 10 38.457424 3.651049 4.807069 3.759031 3.845742 0.3241
7.10. fiveam
With progress reporting turned off:
(benchmark:with-timing (10) (let ((fiveam:*test-dribble* (make-broadcast-stream))) (run 'uax-15-fiveam))) - SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION REAL-TIME 10 10.133307 0.976664 1.06333 1.006664 1.013331 0.025777 RUN-TIME 10 10.123116 0.977754 1.062285 1.005493 1.012312 0.025377 USER-RUN-TIME 10 9.826661 0.958934 1.025672 0.964349 0.982666 0.023269 SYSTEM-RUN-TIME 10 0.29648 0.013407 0.043277 0.029962 0.029648 0.008991 PAGE-FAULTS 10 0 0 0 0 0 0.0 GC-RUN-TIME 10 887.548 56.765 130.141 83.046 88.7548 21.552885 BYTES-CONSED 10 4512244000 451134320 451410384 451200752 451224400 74965.13 EVAL-CALLS 10 0 0 0 0 0 0.0
If you do not have progress reporting turned off, besides wasting a huge amount of screen space and time, it creates interesting issues based on what you are running fiveam on. Emacs has known problems with long lines and fiveam's progress reporting in a benchmark like this creates lots of long line. It gets even worse if you set the run
keyword parameter :print-names
to nil.
Rule of thumb for big test systems and fiveam. Run it from a terminal, not an emacs REPL.
The CCL version
- SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION REAL-TIME 10 19.166885 1.904093 1.946594 1.914582 1.916688 0.011433 RUN-TIME 10 19.129244 1.90118 1.94372 1.911262 1.912924 0.011477
7.11. gigamonkeys
Gigamonkeys does not do progress reporting
(benchmark:with-timing (10) (test-package))
- SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION
REAL-TIME 10 7.893314 0.783331 0.813332 0.786665 0.789331 0.008406
RUN-TIME 10 7.887205 0.780954 0.816817 0.785493 0.78872 0.009683
USER-RUN-TIME 10 7.850746 0.777678 0.796806 0.785181 0.785075 0.005411
SYSTEM-RUN-TIME 10 0.036482 0 0.02001 0.000027 0.003648 0.005858
PAGE-FAULTS 10 0 0 0 0 0 0.0
GC-RUN-TIME 10 190.862 14.962 40.392 16.909 19.0862 7.12733
BYTES-CONSED 10 4788180016 478635504 479903744 478708672 478818000.0 363684.38
EVAL-CALLS 10 0 0 0 0 0 0.0
The CCL version
- SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION REAL-TIME 10 19.166885 1.904093 1.946594 1.914582 1.916688 0.011433 RUN-TIME 10 19.129244 1.90118 1.94372 1.911262 1.912924 0.011477
7.12. lift
Lift says that there were 16 successful tests, but does not specify the number of successful assertions, so no progress reports..
(benchmark:with-timing (10) (run-tests :suite 'uax-lift-15)) - SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION REAL-TIME 10 6.076652 0.596666 0.666665 0.599999 0.607665 0.019779 RUN-TIME 10 6.068622 0.596521 0.664457 0.601027 0.606862 0.019258 USER-RUN-TIME 10 6.035427 0.593182 0.644546 0.599972 0.603543 0.013911 SYSTEM-RUN-TIME 10 0.03322 0 0.019911 0.000003 0.003322 0.005932 PAGE-FAULTS 10 0 0 0 0 0 0.0 GC-RUN-TIME 10 189.801 14.242 51.038 14.762 18.9801 10.741669 BYTES-CONSED 10 3480602800 347045776 356690112 347112656 348060280 2876791.8 EVAL-CALLS 10 0 0 0 0 0 0.0
The CCL version of the benchmark resulted in this:
(benchmark:with-timing (10) (run-tests :suite 'uax-lift-15)) - SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION REAL-TIME 10 11.987319 1.186334 1.216747 1.196841 1.198732 0.008089 RUN-TIME 10 11.970636 1.185057 1.215185 1.195852 1.197064 0.008253
7.13. lisp-unit
No progress reports
(benchmark:with-timing (10) (run-tests))
- SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION
REAL-TIME 10 7.129982 0.699998 0.786665 0.703331 0.712998 0.025186
RUN-TIME 10 7.125096 0.699222 0.788776 0.7028 0.71251 0.026041
USER-RUN-TIME 10 7.068603 0.692563 0.765489 0.699554 0.70686 0.019895
SYSTEM-RUN-TIME 10 0.056505 0.000003 0.023286 0.003277 0.00565 0.00699
PAGE-FAULTS 10 0 0 0 0 0 0.0
GC-RUN-TIME 10 260.177 19.339 60.275 21.956 26.0177 11.598406
BYTES-CONSED 10 4222413456 421806272 425655232 421847136 422241340.0 1138669.0
EVAL-CALLS 10 160 16 16 16 16 0.0
Now the CCL version:
- SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION REAL-TIME 10 13.487027 1.339966 1.362137 1.349544 1.348703 0.007266 RUN-TIME 10 13.454422 1.337902 1.359824 1.343596 1.345442 0.007124
7.14. list-unit2
No progress reports
(benchmark:with-timing (10) (run-tests))
- SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION
REAL-TIME 10 9.306638 0.899997 0.956663 0.923331 0.930664 0.019709
RUN-TIME 10 9.29662 0.90105 0.953252 0.924323 0.929662 0.018766
USER-RUN-TIME 10 9.126992 0.895342 0.93455 0.908129 0.912699 0.01386
SYSTEM-RUN-TIME 10 0.169641 0.00335 0.033231 0.016619 0.016964 0.010221
PAGE-FAULTS 10 0 0 0 0 0 0.0
GC-RUN-TIME 10 512.84 32.691 73.531 42.772 51.284 15.129883
BYTES-CONSED 10 4939787968 493387600 498633408 493461152 493978780.0 1552274.1
EVAL-CALLS 10 0 0 0 0 0 0.0
Now the CCL version:
- SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION REAL-TIME 10 30.287363 2.994757 3.274799 3.001426 3.028736 0.082134 RUN-TIME 10 30.244043 2.990828 3.270884 2.996805 3.024404 0.082271
7.15. nst
NST's results were surprisingly bad. I ran tests with and without :cache being set on each fixture and it did not seem to make much of a difference. At this point I do not know if the issue is with NST or an error between chair and keyboard.
(benchmark:with-timing (10) (nst-cmd :run :uax-15-nst)) - SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION REAL-TIME 10 516.88855 51.526524 51.85985 51.67986 51.688854 0.105207 RUN-TIME 10 517.8853 51.65252 51.97778 51.795273 51.788532 0.090498 USER-RUN-TIME 10 515.4022 51.376358 51.718292 51.56273 51.540226 0.092005 SYSTEM-RUN-TIME 10 2.483108 0.206179 0.282712 0.258424 0.248311 0.027323 PAGE-FAULTS 10 0 0 0 0 0 0.0 GC-RUN-TIME 10 11704.455 1081.936 1224.102 1173.542 1170.4456 33.417885 BYTES-CONSED 10 306684151680 30666874416 30677103952 30667547952 30668415168 2911205.3 EVAL-CALLS 10 6860220 686022 686022 686022 686022 0.0
The CCL version:
- SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION REAL-TIME 10 501.6569 49.96337 50.555893 50.08843 50.16569 0.18232 RUN-TIME 10 500.4885 49.85959 50.36908 49.97486 50.048847 0.165674
7.16. parachute
Progress reporting turned off by using the quiet report.
(benchmark:with-timing (10) (test 'suite :report 'quiet)) - SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION REAL-TIME 10 9.929974 0.913331 1.089997 0.969997 0.992997 0.063411 RUN-TIME 10 9.91559 0.912637 1.089303 0.970189 0.991559 0.062786 USER-RUN-TIME 10 9.403031 0.879351 1.042706 0.916937 0.940303 0.056901 SYSTEM-RUN-TIME 10 0.512589 0.02995 0.073216 0.046596 0.051259 0.014293 PAGE-FAULTS 10 0 0 0 0 0 0.0 GC-RUN-TIME 10 1677.654 106.9 267.475 143.608 167.7654 51.604637 BYTES-CONSED 10 5212345824 512596848 598222384 512680656 521234600.0 25662644.0 EVAL-CALLS 10 0 0 0 0 0 0.0
The same benchmark with CCL:
- SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION REAL-TIME 10 40.437767 4.010897 4.063527 4.042738 4.043777 0.013439 RUN-TIME 10 40.37925 4.007424 4.053993 4.037376 4.037925 0.011995 NIL
7.17. prove
The prove tests were done with the *default-reporter*
set to :dot
because there is no way to turn off the progress reporting. The times were surprisingly slow (not clunit slow, but roughly five times longer than the other frameworks), with no real difference between running in a terminal window or in an emacs REPL.
(benchmark:with-timing (10) (run-all-uax-15))
- SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION
REAL-TIME 10 66.02316 6.323317 6.976648 6.516648 6.602316 0.192082
RUN-TIME 10 30.561893 2.896409 3.223873 3.053162 3.056189 0.099394
USER-RUN-TIME 10 29.63921 2.823159 3.153962 2.939861 2.963921 0.092085
SYSTEM-RUN-TIME 10 0.922708 0.069913 0.133064 0.079994 0.092271 0.020397
PAGE-FAULTS 10 0 0 0 0 0 0.0
GC-RUN-TIME 10 2216.068 137.512 437.243 197.665 221.6068 81.27371
BYTES-CONSED 10 14018185696 1394824144 1428527632 1395103136 1401818600.0 10494809.0
EVAL-CALLS 10 0 0 0 0 0 0.0
The CCL version:
- SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION REAL-TIME 10 229.77249 12.113011 57.055473 12.571996 22.97725 16.502855 RUN-TIME 10 132.14566 12.067512 16.609463 12.487875 13.214567 1.402868
7.18. ptester
The benchmarking was done with a single function (ptester-tests) that called all the tests. Progess reporting cannot be turned off.
- SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION REAL-TIME 10 6.209992 0.613333 0.663334 0.616665 0.620999 0.014303 RUN-TIME 10 6.20006 0.611543 0.660918 0.615629 0.620006 0.013763 USER-RUN-TIME 10 6.150195 0.604901 0.630972 0.614696 0.61502 0.006247 SYSTEM-RUN-TIME 10 0.049889 0 0.029945 0.003299 0.004989 0.008579 PAGE-FAULTS 10 0 0 0 0 0 0.0 GC-RUN-TIME 10 166.527 12.391 45.026 13.742 16.6527 9.467175 BYTES-CONSED 10 3690505472 369001712 369094144 369043328 369050560.0 26112.266 EVAL-CALLS 10 0 0 0 0 0 0.0
The CCL version:
- SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION REAL-TIME 10 12.558413 1.240772 1.280743 1.251957 1.255841 0.010159 RUN-TIME 10 12.513009 1.239555 1.27798 1.24801 1.251301 0.009971
7.19. rove
(benchmark:with-timing (10) (run :uax-15-rove-tests :style :none)) - SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION REAL-TIME 10 11.879972 1.139999 1.223331 1.189997 1.187997 0.025957 RUN-TIME 10 11.86156 1.138402 1.223247 1.187435 1.186156 0.026327 USER-RUN-TIME 10 11.185844 1.068141 1.144235 1.126029 1.118584 0.021292 SYSTEM-RUN-TIME 10 0.675742 0.03663 0.09343 0.06979 0.067574 0.017528 PAGE-FAULTS 10 0 0 0 0 0 0.0 GC-RUN-TIME 10 2986.486 259.002 332.193 304.989 298.6486 24.707445 BYTES-CONSED 10 7427051216 740502400 762046592 740546112 742705150.0 6447249.0 EVAL-CALLS 10 9 0 9 0 0.9 2.7
Now the CCL versioin
(benchmark:with-timing (10) (run :uax-15-rove-tests :style :none)) - SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION REAL-TIME 10 35.897484 3.53852 3.666501 3.562608 3.589748 0.044875 RUN-TIME 10 35.826942 3.532155 3.660664 3.558934 3.582694 0.045938
7.20. rt
Rt reports only the tests, not the assertions.
(benchmark:with-timing (10) (do-tests)) - SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION REAL-TIME 10 6.216662 0.613332 0.643333 0.616665 0.621666 0.009689 RUN-TIME 10 6.207924 0.610565 0.640804 0.618108 0.620792 0.009539 USER-RUN-TIME 10 6.118069 0.598097 0.630868 0.610568 0.611807 0.00845 SYSTEM-RUN-TIME 10 0.089871 0 0.020013 0.003355 0.008987 0.007601 PAGE-FAULTS 10 0 0 0 0 0 0.0 GC-RUN-TIME 10 219.521 16.74 44.619 18.213 21.9521 7.871968 BYTES-CONSED 10 3586779376 358638608 358705088 358675856 358677950.0 17602.818 EVAL-CALLS 10 480 48 48 48 48 0.0
Now CCL version:
- SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION REAL-TIME 10 11.99317 1.183099 1.219643 1.194585 1.199317 0.012819 RUN-TIME 10 11.909768 1.180592 1.202258 1.190041 1.190977 0.006007
7.21. should-test
Should-test prints out the name of each test with OK.
(benchmark:with-timing (10) (test))
- SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION
REAL-TIME 10 7.179996 0.709999 0.766666 0.713332 0.718 0.016343
RUN-TIME 10 7.171032 0.709636 0.768002 0.711339 0.717103 0.017043
USER-RUN-TIME 10 7.12796 0.703418 0.751316 0.7081 0.712796 0.013232
SYSTEM-RUN-TIME 10 0.043095 0 0.016688 0.003242 0.00431 0.004734
PAGE-FAULTS 10 0 0 0 0 0 0.0
GC-RUN-TIME 10 167.82 12.366 53.552 12.436 16.782 12.267456
BYTES-CONSED 10 3859488496 385585888 388806016 385627600 385948860.0 952933.5
EVAL-CALLS 10 0 0 0 0 0 0.0
The CCL version
- SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION REAL-TIME 10 25.263529 2.514491 2.538153 2.52439 2.526353 0.007306 RUN-TIME 10 25.18316 2.508538 2.532314 2.518607 2.518316 0.006587
7.22. tap-unit-test
No progress reporting.
(benchmark:with-timing (10) (run-tests))
- SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION
REAL-TIME 10 7.009999 0.686666 0.736665 0.693334 0.701 0.014761
RUN-TIME 10 6.998872 0.688417 0.734644 0.694193 0.699887 0.013857
USER-RUN-TIME 10 6.949086 0.687007 0.718025 0.689829 0.694909 0.010584
SYSTEM-RUN-TIME 10 0.049802 0.000008 0.016621 0.003327 0.00498 0.004523
PAGE-FAULTS 10 0 0 0 0 0 0.0
GC-RUN-TIME 10 152.373 7.746 40.42 11.712 15.2373 9.486478
BYTES-CONSED 10 4217541312 421678080 421820960 421733744 421754140.0 41600.227
EVAL-CALLS 10 160 16 16 16 16 0.0
The CCL version:
- SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION REAL-TIME 10 13.258068 1.311242 1.367641 1.322649 1.325807 0.01496 RUN-TIME 10 13.228476 1.309399 1.364858 1.319752 1.322848 0.014998
7.23. try
First the SBCL version
- SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION REAL-TIME 10 14.643297 1.453329 1.529996 1.456663 1.46433 0.022113 RUN-TIME 10 14.618829 1.449815 1.530918 1.454007 1.461883 0.023133 USER-RUN-TIME 10 14.588861 1.447368 1.514259 1.452136 1.458886 0.018746 SYSTEM-RUN-TIME 10 0.030001 0 0.016658 0.000004 0.003 0.005039 PAGE-FAULTS 10 0 0 0 0 0 0.0 GC-RUN-TIME 10 208.982 16.999 54.556 17.101 20.8982 11.22053 BYTES-CONSED 10 9019833552 901957296 902012768 901979472 901983360.0 15399.991 EVAL-CALLS 10 0 0 0 0 0 0.0
Now the CCL Version
- SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION REAL-TIME 10 178.21579 17.72332 17.91731 17.834017 17.82158 0.06565 RUN-TIME 10 177.88414 17.69016 17.892391 17.789274 17.788414 0.066341
7.24. unit-test
Progress reporting on tests, not assertions
(benchmark:with-timing (10) (run-all-tests))
- SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION
REAL-TIME 10 6.373335 0.613332 0.710001 0.629999 0.637334 0.026658
RUN-TIME 10 6.361601 0.613024 0.708814 0.628271 0.63616 0.02653
USER-RUN-TIME 10 6.28175 0.610608 0.702157 0.618595 0.628175 0.026044
SYSTEM-RUN-TIME 10 0.079867 0.000004 0.016669 0.00665 0.007987 0.005205
PAGE-FAULTS 10 0 0 0 0 0 0.0
GC-RUN-TIME 10 327.129 18.211 57.563 28.737 32.7129 12.666767
BYTES-CONSED 10 3668546192 363377264 397349504 363448128 366854620.0 10165068.0
EVAL-CALLS 10 0 0 0 0 0 0.0
The CCL version:
- SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION REAL-TIME 10 17.734257 1.730896 1.833714 1.737197 1.773426 0.045684 RUN-TIME 10 17.669695 1.724412 1.830537 1.729753 1.766969 0.045973
7.25. xlunit
Xlunit does progress reports only on the tests, not the assertions.
(benchmark:with-timing (10) (xlunit:textui-test-run (xlunit:get-suite uax-15)))
- SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION
REAL-TIME 10 5.916651 0.586665 0.609998 0.586666 0.591665 0.008465
RUN-TIME 10 5.910279 0.585645 0.610623 0.58654 0.591028 0.008659
USER-RUN-TIME 10 5.880327 0.582374 0.598663 0.585712 0.588033 0.005418
SYSTEM-RUN-TIME 10 0.029973 0 0.013315 0.000023 0.002997 0.004059
PAGE-FAULTS 10 0 0 0 0 0 0.0
GC-RUN-TIME 10 102.22 8.052 17.68 8.219 10.222 3.462927
BYTES-CONSED 10 3367157968 336039856 340887936 336103936 336715800.0 1445466.0
EVAL-CALLS 10 0 0 0 0 0 0.0
The CCL version:
- SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION REAL-TIME 10 11.781408 1.159445 1.203044 1.176365 1.178141 0.010789 RUN-TIME 10 11.753234 1.158247 1.200881 1.171458 1.175323 0.01094
7.26. xptest
(benchmark:with-timing (10) (report-result (run-test *uax-15-suite*)))
- SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION
REAL-TIME 10 5.899985 0.583332 0.629999 0.586665 0.589998 0.013499
RUN-TIME 10 5.89038 0.580846 0.626912 0.584578 0.589038 0.012844
USER-RUN-TIME 10 5.860286 0.577223 0.616916 0.582145 0.586029 0.010817
SYSTEM-RUN-TIME 10 0.030112 0.000034 0.00999 0.003309 0.003011 0.003122
PAGE-FAULTS 10 0 0 0 0 0 0.0
GC-RUN-TIME 10 111.974 7.402 41.731 7.481 11.1974 10.193354
BYTES-CONSED 10 3363654192 336079376 338668864 336105504 336365400.0 768173.25
EVAL-CALLS 10 0 0 0 0 0 0.0
The CCL version:
- SAMPLES TOTAL MINIMUM MAXIMUM MEDIAN AVERAGE DEVIATION REAL-TIME 10 11.744589 1.165611 1.195676 1.172062 1.174459 0.008469 RUN-TIME 10 11.728494 1.164599 1.192888 1.170583 1.172849 0.007959
8. Mapping Functions Against Each Other
8.1. Assertion Functions
I expect all libraries to have the equivalent of is, signals and maybe finishes. This table just validates that assumption AND whether assertions accept an optional diagnostic string.
Library | Optional string | Is (a) | signals | finishes (b) |
---|---|---|---|---|
1am | N | is | signals | |
2am | Y (P) | is | signals | finishes |
assert-p (1) | N | t-p | condition-error-p | |
clunit | Y | assert-true | assert-condition | |
clunit2 | Y | assert-true | assert-condition | |
confidence | N | assert-true | ||
gigamonkeys | N | check | expect | |
fiasco | Y (P) | is | signals | finishes |
fiveam | Y (P) | is | signals | finishes |
lift | Y | ensure | ensure-condition | |
lisp-unit | Y | assert-true | assert-error | |
lisp-unit2 | Y | assert-true | assert-error | |
nst | N | :true | :err | |
parachute | Y (P) | is | fail | finish |
prove | Y | is, ok | is-error | |
ptester (2) | test | test-error | ||
rove | Y | ok | signals | |
rt (2) | ||||
should-test | Y | be | signals | |
simplet (2) | ||||
tap-unit-test | Y | assert-true | assert-error | |
try | Y | is | signals, signals-not | verdict |
unit-test | Y | test-assert | test-condition | |
xlunit | Y | assert-true | assert-condition | |
xptest (2) |
(a) "is" asserts that the form evaluates to not nil (b) "finishes" asserts that the body of the test does not signal aany condition (P) The diagnostic string accepts variables (1) includes cacau for this purpose (2) None - normal CL predicates resolving to T or nil
One potential advantage of other assertion functions is whether they provide built-in additional error messages. The second advantage is that you do not have to write your own if they are more complicated than normal CL predicates. The next few tables will show additional assertion functions and what frameworks have them.
Library | False | Zero | Not Zero | Nil | Not-Nil | Null* | Not Null* |
---|---|---|---|---|---|---|---|
assert-p | not-t-p | zero-p | not-zero-p | nil-p | not-nil-p | null-p | not-null-p |
cacau | not-t-p | zero-p | not-zero-p | nil-p | not-nil-p | null-p | not-null-p |
clunit | assert-false | ||||||
clunit2 | assert-false | ||||||
confidence | assert-nil | ||||||
fiveam | is-false | ||||||
lift | ensure-null | ||||||
lisp-unit | assert-false | assert-nil | |||||
lisp-unit2 | assert-false | ||||||
nst | assert-zero | assert-non-nil | assert-null | ||||
parachute | false | false | |||||
prove | isnt | ||||||
rove | ng | ||||||
tap-unit-test | assert-false | assert-null | assert-not-null | ||||
xlunit | assert-false |
Note to self: Per http://clhs.lisp.se/Body/f_null.htm, null is an empty list or nil, not null in an sql sense.
Library | Eq | Eql | Equal | Equalp | Equality |
---|---|---|---|---|---|
assert-p | eq-p | eql-p | equal-p | equalp-p | |
cacau | eq-p | eql-p | equal-p | equalp-p | |
clunit | assert-eq | assert-eql | assert-equal | assert-equalp | assert-equality |
clunit2 | assert-eq | assert-eql | assert-equal | assert-equalp | assert-equality |
confidence | assert-eq | assert-eql | assert-equal | ||
lisp-unit | assert-eq | assert-eql | assert-equal | assert-equalp | assert-equality |
lisp-unit2 | assert-eq | assert-eql | assert-equal | assert-equalp | assert-equality |
nst | assert-eq | assert-eql | assert-equal | assert-equalp | assert-equality |
parachute | is | ||||
tap-unit-test | assert-eq | assert-eql | assert-equal | assert-equalp | assert-equality |
unit-test | test-equal | ||||
xlunit | assert-eql | assert-equal |
Library | Eq | Eql | Equal | Equalp |
---|---|---|---|---|
assert-p | not-eq-p | not-eql-p | not-equal-p | not-equalp-p |
cacau | not-eq-p | not-eql-p | not-equal-p | not-equalp-p |
nst | assert-not-eq | assert-not-eql | assert-not-equal | assert-not-equalp |
parachute | isnt | |||
xlunit | assert-not-eql |
Library | Available assertions |
---|---|
clunit | assert-equality* |
clunit2 | assert-equality* |
confidence | assert-float-is-approximately-equal, assert-float-is-definitely-greater-than, assert-float-is-definitely-less-than, assert-float-is-essentially-equal |
lisp-unit | assert-norm-equal, assert-float-equal, assert-number-equal, assert-numerical-equal, assert-rational-equal, assert-sigfig-equal, |
lisp-unit2 | assert-norm-equal, assert-float-equal, assert-number-equal, assert-numerical-equal, assert-rational-equal, assert-sigfig-equal, |
try | float-~=, float-~<, float-~> |
Library | Available assertions |
---|---|
confidence | assert-set-equal, assert-vector-equal |
lisp-unit | logically-equal, set-equal |
lisp-unit2 | logically-equal, assert=, assert/=, asssert-char=, assert-char-equal,assert-char/=, assert-char-not-equal, assert-string=, assert-string-equal, assert-string/=, assert-string-not-equal |
tap-unit-test | logically-equal, set-equal, unordered-equal |
Library | Available assertions |
---|---|
confidence | assert-set-equal, assert-vector-equal |
lisp-unit | logically-equal, set-equal |
lisp-unit2 | logically-equal |
tap-unit-test | logically-equal, set-equal, unordered-equal |
Library | Type | Not Type | Values | Not-Values |
---|---|---|---|---|
assert-p | typep-p | not-typep-p | values-p | not-values-p |
cacau | typep-p | not-typep-p | values-p | not-values-p |
confidence | assert-type | |||
lift | ||||
lisp-unit | ||||
lisp-unit2 | assert-typep | |||
nst | ||||
parachute | of-type | is-values | isnt-values | |
protest | ||||
prove | is-type | is-values |
Library | Symbol | List | Tuple | Char | String |
---|---|---|---|---|---|
lift | ensure-symbol | ensure-list | ensure-string |
Library | Functions |
---|---|
confidence | assert-string-equal,assert-string<, assert-string<=, assert-string=, assert-string>, assert-string>= |
Library | Every | Different | Member | Contains |
---|---|---|---|---|
cl-quickcheck | a-member | |||
fiveam | is-every | |||
confidence | assert-subsetp | |||
lift | ensure-every | ensure-different | ensure-member | |
lisp-unit | set-equal | |||
lisp-unit2 | set-equal | |||
tap-unit-test | set-equal | |||
try | match-values | mismatch% | different-elements | |
Library | Prints | Expands (1) | Custom |
---|---|---|---|
cacau | custom-p | ||
clunit | assert-expands | ||
clunit2 | assert-expands | ||
confidence | Yes | ||
lisp-unit | assert-prints | assert-expands | |
lisp-unit2 | assert-prints | assert-expands | |
nst | Yes | ||
prove | is-expand | ||
rove | is-print | expands | |
should-test | print-to | ||
tap-unit-test | assert-prints | assert-expands |
- Tests macro expansion, passes if (EQUALP EXPANSION (MACROEXPAND-1 EXPRESSION)) is true
Library | Error/Conditions (1) | Not (2) |
---|---|---|
1am | signals | |
2am | signals | |
assert-p | condition-error-p | not-error-p, not-condition-p |
cacau | condition-error-p | |
clunit | assert-condition | |
clunit2 | assert-condition | |
gigamonkeys | expect | |
fiasco | signals | not-signals |
fiveam | signals | |
lift | ensure-condition, ensure-error | |
lisp-unit | assert-error | |
lisp-unit2 | assert-error | |
nst | :err | |
parachute | fail | |
prove | ||
ptester | test-error | |
rove | signals | |
rt | ||
should-test | signal | |
simplet | ||
tap-unit-test | assert-error | |
try | signals | signals-not |
unit-test | test-condition | |
xlunit | assert-condition | |
xptest |
- Signals asserts that the body signals a condition of a specified type
- Signals that the body does not signal a condition of a specified type. Might signal some other condition
Name | Assertions |
---|---|
cl-quickcheck | is=, isnt= |
confidence | assert=, assert-p:, assert-t |
lift | ensure-cases, ensure-cases-failure, ensure-expected-no-warning-condition, ensure-failed-error, ensure-no-warning, ensure-not-same, ensure-null-failed-error ensure-random-cases, ensure-random-cases+, ensure-random-cases-failure, ensure-same, ensure-some, ensure-warning,ensure-expected-condition, ensure-directories-exist, ensure-directory, ensure-error, ensure-failed, ensure-function, ensure-generic-function, ensure-no-warning, ensure-null-failed-error |
lisp-unit | assert-result, assert-test, check-type |
lisp-unit2 | assert-no-error, assert-no-signal, assert-no-warning, assert-warning, assert-fail, assert-no-warning, assert-passes?, assert-signal, check-type |
nst | assert-criterion |
prove | ok, is-values, is-type, like, is-print, is-error, |
try | invokes-debugger, invokes-debugger-not, in-time |
unit-test | test-assert |
(a) Every test succeeds iff the form produces the same number of results as the values and each result is equal to the corresponding value
8.2. Defining or Adding Tests
Name | Add Tests |
---|---|
1am | (test test-name body) |
2am | (test test-name body) |
cacau | (deftest "test-name" (any-parameters go here) body) |
cardiogram | (deftest name (<options>*) <docstring>* <form>*) |
clunit | (deftest test-name (suite-name-if-any) docstring body) |
clunit2 | (deftest test-name (suite-name-if-any) docstring body) |
gigamonkeys | (deftest test-name (any-parameters) body) |
fiasco | (deftest test-name (any-parameters) docstring body) |
fiveam | (test test-name docstring body) |
confidence | (define-testcase test-name (any-parameters) docstring body) |
lift | (addtest (test-suite-name) test-name body) |
lisp-unit | (define-test test-name body) |
lisp-unit2 | (define-test test-name (tags) body) |
nst | (def-test (t1 :group name :fixtures (fixture-names)) body) |
parachute | (define-test test-name [:parent parent-name] [(:fixture if any)] body) |
prove | (deftest name body) |
ptester | (test value form) (test-error) (test-no-error) (test-warning) (test-no-warning) |
rove | (deftest test-name body) |
rt | (deftest test-name function value) |
should-test | (deftest name body) |
simplet | (test string body) |
tap-unit-test | (define-test test-name docstring body) |
try | (deftest test-name parameters body) |
unit-test | (deftest :unit unit-name :name test-name body) |
xlunit | (def-test-method method-name ((class-name) run-on-compilation) body) |
xptest | (defmethod method-name ((suite-name fixture-name)) body) |
8.3. Running Tests
Name | Running Tests |
---|---|
1am (a) | (test-name) (run) ; (run) runs all tests |
2am | (run) (run '(list of tests) (name-of-tests) |
clunit | (run-test 'test-name) (run-suite 'suite-name) |
clunit2 | (run-test 'test-name) (run-suite 'suite-name) |
confidence | (test-name) |
gigamonkeys | (test test-name) |
fiasco | (run-tests 'test-name) (run-package-tests :package package-name) |
fiveam | (run 'test-name) (run! 'test-name) (run! 'suite-name) |
lift | (run-tests :name 'test-name) (run-tests :suite 'suite-name) |
lisp-unit(b) | (run-tests :all) (run-tests '(name1 name2 ..) (continue-testing)(c) |
lisp-unit2 | (run-tests :tests 'test-name) (run-tests :tags '(tag-names)) (run-tests) (run-tests :package 'package-name) |
nst | (nst-cmd :run test-name) |
parachute (d) | (test test-name &optional :report report-type) |
prove | (run-test 'test-name) |
ptester | at compilation of (with-tests (:name "test-name") ) |
rove | (run-test 'test-name) (run-suite) |
rt (b) | (do-test test-name) (do-tests); (do-tests) runs all tests |
should-test (b) | (test) (test :test test-name) |
simplet | (run) |
tap-unit-test | (run-tests test-name1 test-name2) (run-tests) |
try | (test-name), (try 'test-name), (funcall 'test-name) |
unit-test | (run-test test-name)(run-all-tests) |
xlunit | (xlunit:textui-test-run (xlunit:get-suite suite-name)) |
xptest | (run-test test-name)(run-test suite-name) |
(a) Shuffles tests (b) runs tests in the order they were defined (c) continue-testing runs tests that have been defined, but not yet run (d) can be a quoted list of test names
Universal interactive "run test at point" for Emacs environment.
If the framework allows to programmatically run an individual test, then it's possible to run the test at point by adding a little Emacs snippet. The example below if specific for Lisp-Unit2 (snippet source).
(defun ambrevar/sly-run-lisp-unit-test-at-point (&optional raw-prefix-arg) "See `sly-compile-defun' for RAW-PREFIX-ARG." (interactive "P") (call-interactively 'sly-compile-defun) (let ((name `(quote ,(intern (sly-qualify-cl-symbol-name (sly-parse-toplevel-form)))))) (sly-eval-async `(cl:string-trim " ?" (cl:with-output-to-string (s) (cl:let ((lisp-unit2:*test-stream* s)) (lisp-unit2:run-tests :tests ,name :run-contexts 'lisp-unit2:with-summary-context)))) (lambda (results) (switch-to-buffer-other-window (get-buffer-create "*Test Results*")) (erase-buffer) (insert results))))) (define-key lisp-mode-map (kbd "C-c C-v") 'ambrevar/sly-run-lisp-unit-test-at-point)
8.4. Fixture Functions
Name | Fixture Functions |
---|---|
cacau | (defbefore-all), (defafter-all), (defbefore-each), (defafter-each) |
clunit | (defclass) and (deffixture) |
clunit2 | (defclass) and (deffixture) |
fiveam | (def-fixture name-of-fixture ()) |
lift | set at suite definition level, with :setup, :takedown, :run-setup |
lisp-unit2 | :contexts are specified in test definitions |
nst | (def-fixtures name () body) |
parachute | (def-fixture name () body) |
rove | (setup)(teardown) are suite fixture functions. (defhook) is a test fixture function |
unit-test | subclass a test-class with define-test-class |
xlunit | (defmethod setup () body) |
xptest | (deftest-fixture fixture-name ()), (defmethod setup ()), (defmethod teardown ()) |
8.5. Removing Tests etc
Name | Removing tests etc |
---|---|
clunit | (undeftest) (undeffixture) (undefsuite) |
clunit2 | (undeftest) (undeffixture) (undefsuite) |
gigamonkeys | (remove-test-function) (clear-package-tests) |
fiasco | (fiasco::delete-test) |
fiveam | (rem-test) (rem-fixture) |
lift | (remove-test :suite x)(remove-test :test-case x) |
lisp-unit | (remove-tests) (remove-tags) |
lisp-unit2 | (uninstall-test) (undefine-test) |
parachute | (remove-test, remove-all-tests-in-package) |
prove | (remove-test) (remove-test-all) |
rt | (rem-test) (rem-all-tests) |
tap-unit-test | (remove-tests) |
try | FMAKUNBOUND, UNINTERN |
xlunit | (remove-test) |
xptest | (remove-test) |
8.6. Suites
Name | Suites |
---|---|
2am | (suite name (optional sub-suite) |
clunit | (defsuite name (parent) (undefsuite name) |
clunit2 | (defsuite name (parent) (undefsuite name) |
fiasco | (define-test-package package-name) (defsuite suite-name) |
fiveam | (def-suite :name-of-suite) |
lift | (deftestsuite name-of-suite (super-test-suite)(slots) |
lisp-unit | packages and tags (tags are specified in the test definition) |
lisp-unit2 | :tags are specified in test definitions |
nst | (def-test-group) |
parachute | (define-test suite) |
prove | (subtest …) |
ptester | just the use of =(with-tests …)+ |
rt | the package is the only unit above tests |
should-test | the package is the only unit above tests |
simplet | (suite string body) |
tap-unit-test | the package is the only unit above tests |
try | just call other tests (they are normal functions) |
unit-test | [effectively tags in the deftest macro before the test-name] |
xlunit | (defclass test-case-name (test-case)(body)) |
xptest | (make-test-suite suite-name docstring body) |
8.7. Generators
- From Frameworks
Table 42: Random Data Generators from Frameworks Name Suites fiveam buffer, character, float, integer, list, one-element, string, tree lift random-number, random-element lisp-unit complex-random, make-random-2d-array, make-random-2d-list, make-random-list, make-random-state lisp-unit2 complex-random, make-random-2d-array, make-random-2d-list, make-random-list, make-random-state nst tap-unit-test make-random-state - From Helper Libraries
Table 43: Random Data Generators from Check-it and Cl-Quickcheck Check-it Cl-quickcheck Comments character a-char list a-list a-member Produces a value from another generator string a-string a-symbol tuple a-tuple boolean a-boolean real a-real an-index integer an-integer or produces a value from another generator guard ensures generator result within some spec struct generates a struct with given type and slot values map applies a transformation to output of a sub generator chain chaining generators, e.g. to produce matrices user-define define create custom generators k-generator by default an index m-generator by default an integer n-generator by default an integer
9. Generic Usage Example to be Followed for Each Framework Library
The following is pseudo code just trying to show the basic usage that will be demonstrated with each library.
9.1. Basics
Start with the real basics just to see how the framework looks, do tests accept parameters, suite designations, documentation strings, etc.
The first passing test should have a basic "is" test and a signals test. If the libraries has macro expand tests or floating point and rational tests, those get added to flag that they exist. Then a basic failing test. Then run each test and show what the reports look like.
(deftest t1 "describe t1" ; obviously only if the library allows a documentation string. (is (= 1 1)) (signals division-by-zero (error 'division-by-zero))) (deftest t1-fail ; the most basic failing test "describe t1-fail" (is (= 1 2)))
Check and see if you have to manually recompile a test when a function being tested is modified. This is not a problem with most frameworks.
(defun t1-test-function () 1) ;; What happens you are testing a function and you change that function? ;; Do you need to recompile the test? (deftest t1-function ; (is (= (t1-test-function) 1))) ;; Now redefine t1-test-function (defun t1-test-function () 2) ;; re-run test t1-function. What happens?
9.2. Multiple Values, Variables, Loops and Closures
- Make sure the library can have tests with multiple assertions (RT cannot).
- Does it handle values expressions? Most do but only look at the first value, fiveam does not, the lisp-units actually compare each variable in the values expressions.
- What happens with multiple assertions where more than one fail? Lift will only report the first failing assertion if there are multiple failing assertions.
- Ensure that tests can handle loops
- Can tests handle being inside a closure?
- Can tests call other tests? Most frameworks allow this, but you tend to get multiple reports rather than consolidated reports. Some frameworks do not allow this.
(deftest t2 "describe t2" (is (= 1 1)) (is (= 2 2)) (is (= (values 1 2) (values 1 3)))) (let ((l1 '(#\a #\B #\z)) (l2 '(97 66 122))) (deftest t2-loop (loop for x in l1 for y in l2 do (is (= (char-code x) y))))) (deftest t3 ; a test that tries to call another test in its body "describe t3" (is (= 'a 'a)) (test t2))
9.3. Errors, Conditions and signal handling
Check and see if there are any surprises with respect to condition signalling tests. Some frameworks will treat an unexpected condition in a signalling test as a failure, others will treat it as an error.
(deftest t7-bad-error () (signals division-by-zero (error 'floating-point-overflow) "testing condition assertions. This should fail"))
9.4. Suites, tags and other multiple test abilities
- Can you run a list of tests?
We checked with test t3 to see if tests can call other tests. Can you just call a list of test names? Some frameworks allow, others do not.
(run '(t1 t2))
- Suites/tags
This section for each framework will check if you can create inherited test suites or tags
(defsuite s0 ()); Ultimate parent suite if the library provides inheritance (deftest t4 (s0) ; a test that is a member of a suite "describe t4" (assert-eq 1 1)) ;;a multiple assertion test that is a member of a suite with ;; a passing test, an error signaled and a failing test (deftest t4-error (s0) "describe t4-error" (assert-eq 'a 'a) (assert-condition error (error "t4-errored out")) (assert-true (= 1 2))) (deftest t4-fail (s0) ; "describe t4-fail" (assert-false (= 1 2))) (defsuite s1 (s0)); a sub-suite of suite s0 to check on inheritance (deftest t5 (s1) (assert-true (= 1 1)))
- Fixtures and Freezing Data
Fixtures are used to create a known (or randomly generated set of data that tests will use. At the end of the test, the fixtures are removed so that the next test can start in a clean environment.
Freezing data may be considered a subset of fixtures. Freezing data is used where a test will use other existing data such as special variables, but may change it for testing purposes. You obviously want to return that special variable to its pre-existing state at the end of the test.
First checking whether we can freeze data, change it in the test, then change it back
(defparameter *keep-this-data* 1) (deftest t-freeze-1 :fix (*keep-this-data*) (setf *keep-this-data* "new") (true (stringp *keep-this-data*))) (deftest t-freeze-2 (is (= *keep-this-data* 1))) (run '(t-freeze-1 t-freeze-2))
Now the classic fixture - create a data set for the test and clean it up afterwards
;; Create a class for data fixture purposes (defclass fixture-data () ((a :initarg :a :initform 0 :accessor a) (b :initarg :b :initform 0 :accessor b))) (deffixture s1 (@body) ;;IMPORTANT Some frameworks will require a name for the fixture. Others, like CLUNIT apply a fixture to a suite (as ;; in this pseudo code) and require a suite name (let ((x (make-instance 'fixture-data :a 100 :b -100))) @body)) ;; create a sub suite and check fixture inheritance (defsuite s2 (s1)) (deftest t6-s1 (s1) (assert-equal (a x) 100) (assert-equal (b x) -100)) (deftest t6-s2 (s2) (assert-equal (a x) 100) (assert-equal (b x) -100)))
- Removing tests
How do you actually remove a test from the system
- Skip Capability
10. 1am
10.1. Summary
homepage | James Lawrence | MIT | 2014 |
1am will thrown you into the debugger on failures or errors. There is no way to disable this, there is no reporting - in you go. There is no provision for diagnostic strings in assertions, but since it throws you into the debugger, that is probably not relevant. Tests are shuffled on each run.
On the plus side for some people, tests are functions.
On the minus side for people like me, you cannot turn off progress reports. You can create a list of tests, but there is no concept of suites or tags.
10.2. Assertion Functions
1am's assertion functions are limited to is
and signals
.
10.3. Usage
run
will run all the tests in*tests*
(run '(foo))
will run the named tests in the provided parameter list.(name-of-test)
will run the named test because tests are functions in their own right..
(run) FOO; Evaluation aborted on #<SIMPLE-ERROR "~@<The assertion ~S failed~:[.~:; ~ with ~:*~{~S = ~S~^, ~}.~]~:@>" {100BE70F03}>. (run '(foo)) FOO; Evaluation aborted on #<SIMPLE-ERROR "~@<The assertion ~S failed~:[.~:; ~ with ~:*~{~S = ~S~^, ~}.~]~:@>" {100B996103}>.
- Basics
Starting with a basic test where we know everything will pass. These are all the assertion functions that 1am has. There is no provision for documentation strings or test descriptions.
(test t1 (is (equal 1 1)) (signals division-by-zero (/ 1 0))) (run '(t1)) ; or just (t1) T1.. Success: 1 test, 2 checks. ; No value
Now with a deliberately failing test. Notice how it just immediately kicks into the debugger:
(test t1-fail (); the most basic failing test (let ((x 1) (y 2)) (is (= x y)) (signals division-by-zero (error 'floating-point-overflow)))) (t1-fail) The assertion (= X Y) failed with X = 1, Y = 2. [Condition of type SIMPLE-ERROR] Restarts: 0: [CONTINUE] Retry assertion. 1: [RETRY] Retry SLIME REPL evaluation request. 2: [*ABORT] Return to SLIME's top level. 3: [ABORT] abort thread (#<THREAD "new-repl-thread" RUNNING {10035CE213}>)
As you would hope, you do not have to manually recompile a test after a tested function has been modified.
- Conditions
1am works as expected if you signal the expected error. If you signal an unexpected error, it throws you into the debugger just like every other time a test fails.
(test t7-bad-error (signals division-by-zero (error 'floating-point-overflow))) (run '(t7-wrong-error)) T7-BAD-ERROR; Evaluation aborted on #<SIMPLE-ERROR "Expected to signal ~s, but got ~s:~%~a" {102EA4F423}>.
- Edge Cases: Values expressions, loops, closures and calling other tests
1am has no special functionality for dealing with values expressions. It accepts them but merely looks at the intial value provided by each expression. So, for example, the following passes.
(test t2-values-expressions () (is (equal (values 1 2) (values 1 3))))
- Now looping and closures.
1am will handle looping through assertions using variables declared in a closure surrounding the test.
(let ((l1 '(#\a #\B #\z)) (l2 '(97 66 122))) (test t2-loop (loop for x in l1 for y in l2 do (is (= (char-code x) y)))))
- Calling a test inside another test
It works but do not expect composable reports.
(test t3 (is (= 1 1)) (t1)) (t3) T3. T1. Success: 1 test, 1 check. Success: 1 test, 1 check.
- Now looping and closures.
- Suites, tags and other multiple test abilities
- Lists of tests
Tests are defined using (test test-name) and pushed to a list of tests named
*tests*
. If you want to run a list of tests, you could save that out so that you continue to keep a list of all tests, then set*test*
to whatever list of tests you want. That sounds a bit cumbersome. - Suites
1am has no suite capability
- Lists of tests
- Fixtures and Freezing Data
None
- Removing tests
None
- Sequencing, Random and Failure Only
N/A
- Skip Capability
None
- Random Data Generators
None
10.4. Discussion
The comments that I have seen around 1am seem to revolve around having only a single global variable to collect all compiled tests. Various people have suggested different solutions to essentially build test suites capability:
jorams suggested:
(defmacro define-test-framework (tests-variable test-macro run-function) "Define a variable to hold a list of tests, a macro to define tests and a function to run the tests." `(progn (defvar ,tests-variable ()) (defmacro ,test-macro (name &body body) `(let ((1am:*tests* ())) (1am:test ,name ,@body) (dolist (test 1am:*tests*) (pushnew test ,',tests-variable)))) (defun ,run-function () (1am:run ,tests-variable))))
luismbo suggested: "a simpler way might be to have 1am:test associate tests with the current *package*
(e.g., by turning 1am:*tests* into an hash-table mapping package names to lists of tests) and add the ability for 1am:run to filter by package and perhaps default to the current *package*
."
phoe suggested the following very simple 1AM wrapper to achieve multiple test suites.
(defvar *my-tests* '()) (defun run () (1am:run *my-tests*)) (defmacro define-test (name &body body) `(let ((1am:*tests* '())) (1am:test ,name ,@body) (pushnew ',name *my-tests*)))
10.5. Who Uses 1am?
("adopt/test" "authenticated-encryption-test" "beast-test" "binary-io/test" "bobbin/test" "chancery.test" "cl-digraph.test" "cl-netpbm/test" "cl-pcg.test" "cl-rdkafka/test" "cl-scsu-test" "cl-skkserv/tests" "jp-numeral-test" "list-named-class/test" "openid-key-test" "petri" "petri/test" "polisher.test" "protest/1am" "protest/test" "with-c-syntax-test" "xml-emitter/tests")
11. 2am
11.1. Summary
homepage | Daniel Kochmański | MIT | 2016 |
2am is based on 1am with some features wanted for CI and hierarchical tests. As with 1am, 2am runs tests randomly - the order is shuffled on each run. There is no optionality. There is also no provision for only running the tests that failed last time. There is also no way to turn off the progress report.
11.2. Assertion Functions
is | signals | finishes |
11.3. Usage
Unlike 1am which always throws you into the debugger, 2am will only throw you into the debugger if the test crashes, not if it fails.
Note that 2am will distinguish between tests that fail and tests that crash.
(run)
will run the tests in the default suite.(run 'some-suite-name)
will run the tests in the named suite(run '(foo))
will run the named tests in the provided parameter list.
Since tests are functions in 2am, there is no need for a (run 'test-name) function.
- Report Format
First a basic failing test to show the reporting. Notice in the third asssertion we are passing it a string after the two tested items which can help diagnose failures, followed by the two variables being compared, and then running it to show the default failure report.
(test t1-fail (); the most basic failing test (let ((x 1) (y 2)) (is (= 1 2)) (is (equal 1 2)) (is (= x y) "This test was meant to fail ~a is not = ~a" x y ) (signals floating-point-overflow (error 'division-by-zero))))
Now to run it:
(t1-fail) Running test T1-FAIL ffff Test T1-FAIL: 4 checks. Pass: 0 ( 0%) Fail: 4 (100%) Failure details: -------------------------------- T1-FAIL: FAIL: (= 1 2) FAIL: (EQUAL 1 2) FAIL: This test was meant to fail 1 is not = 2 FAIL: Expected to signal FLOATING-POINT-OVERFLOW, but got DIVISION-BY-ZERO: arithmetic error DIVISION-BY-ZERO signalled --------------------------------
The macro
(test name &body body)
defines a test function and adds it to*tests*
. The following just shows what the report looks like when everything passes. These are all the assertion functions that 2am has.(test t1 ; the most basic test. (is (= 1 1)) (signals division-by-zero (/ 1 0)) (finishes (= 1 1))) (t1) Running test T1 ... Running test T1 ... Test T1: 3 checks. Pass: 3 (100%) Fail: 0 (0%)
As you would hope, you do not have to manually recompile a test after a tested function has been modified.
- Edge Cases: Value expressions, loops. closures and calling other tests
- Value expressions
2am has no special functionality for dealing with values expressions. It accepts them but merely looks at the intial value provided by each expression.
- Looping and closures.
Will a test accept looping through assertions using variables from a closure? Yes.
(let ((l1 '(#\a #\B #\z)) (l2 '(97 66 122))) (test t2-loop () (loop for x in l1 for y in l2 do (is (= (char-code x) y))))) (t2-loop) Running test T2-LOOP ... Test T2-LOOP: 3 checks. Pass: 3 (100%) Fail: 0 ( 0%) (test t2-with-multiple-values () (is (= 1 1 2))) ; This should fail and it does
- Calling another test from a test
This succeeds as expected but also as expected there is no composition on the report.
(test t3 (); a test that tries to call another test in its body (is (eql 'a 'a)) (t2)) (t3) Running test T3 . Running test T2 ... Test T2: 3 checks. Pass: 3 (100%) Fail: 0 ( 0%) Test T3: 1 check. Pass: 1 (100%) Fail: 0 ( 0%)
- Value expressions
- Suites, tags and other multiple test abilities
- Lists of tests
2am can run lists of tests
(run '(t1 t2)) Running test T2 ... Running test T1 . Did 2 tests (0 crashed), 4 checks. Pass: 4 (100%) Fail: 0 ( 0%)
- Suites
2am has a hash table named
*suites*
. Any tests not associated with a specific suite are assigned to the default suite. If we called the function (run) with no parameters, it would run all the tests in the default suite which in our case would mean all the tests decribed above:(run)
Suites are defined using (suite 'some-suite-name-here &optional list-of-sub-suites). If you pass and tests are identified with suites by prepending the suite name to the test name.
(suite 's0) ; This suite has no sub-suites (test s0.t4 ; a test that is a member of a suite (is (= 1 1))) (test s0.t4-error (is (eql 'a 'a)) (signals error (error "t4-errored out")) (is (= 1 2))) (test s0.t4-fail (is (not (= 1 2)))) (suite 's1 '(s0)); This suite includes suite s0 as a sub-suite. (test s1.t4-s1 (is (= 1 1)))
Calling run on 's0 will run tests s0.t4, s0.t4-error and s0.t4-fail.
(run 's0) --- Running test suite S0 Running test S0.T4 . Running test S0.T4-FAIL . Running test S0.T4-ERROR ..f Did 3 tests (0 crashed), 5 checks. Pass: 4 (80%) Fail: 1 (20%) Failure details: -------------------------------- S0.T4-ERROR: FAIL: (= 1 2) --------------------------------
Calling run on 's1 will run test s1.t4-s1 and all the tests in suite s0.
- Lists of tests
- Fixtures and Freezing Data
No built-in capability.
- Removing tests
Nothing explicit
- Sequencing, Random and Failure Only
2am runs tests randomly - the order is shuffled on each run. There is no optionality. There is no provision for only running the tests that failed last time.
- Skip Capability
None
- Random Data Generators
None
11.4. Discussion
The documentation indicates that assertions may be run inside threads. I did not validate this.
12. cacau
12.1. Summary
homepage | Noloop | GPL3 | 2020 |
Cacau is interesting in that it uses an external library for assertions and is just a "test runner". The examples shown with Cacau will all assume that the assert-p library by the same author is also loaded.
On the plus side, it has extensive hooks which can perform actions before and after a suite is run or before and after each test is run. It also has explicit async capabilities (but were not tested in this report) which do not exist in other frameworks.
At the same time, it tends to be all or nothing in what runs. The (run)
function either runs the last defined test (if you have not defined suites) or, if you have defined suites, it runs all tests in all the suites. Or if you then define a new function, just that new function. Maybe it is just me but I would be getting lost in what run
is supposed to be checking.
Most frameworks count the individual assertions in a test, Cacau treats the test as a whole - if one assertion fails, the entire test fails and if multiple assertions failed, it will only report the first failure, not all the failures in the test, leaving you with incomplete information.
Cacau is the only framework where, if you change a function that is being tested, you need to manually recompile the tests again.
Not recommended.
12.2. Assertion Functions
Cacau uses the assertion functions from an assertion library, currently you need to use assert-p. Those are:
t-p | not-t-p | zero-p | not-zero-p |
nil-p | not-nil-p | null-p | not-null-p |
eq-p | not-eq-p | eql-p | not-eql-p |
equal-p | not-equal-p | equalp-p | not-equalp-p |
typep-p | not-typep-p | values-p | not-values-p |
error-p | not-error-p | ||
condition-error-p | not-condition-error-p | custom-p |
I have to say for an assertion library, I expected to see some numerical tests as well.
12.3. Usage
You will notice that the test names must be strings whereas the test names in the other frameworks are either quoted symbol or unquoted symbols.
Cacau only has a run
function which accepts fixture, reporter and debugger parameters, but no provisions for telling it what tests you want to run. If you manually compile a single test, it usually runs just that test. Otherwise it runs all the tests in the package whether you want them or not. For purposes of walking through the basic capability, we will look only at the result of the specific test under consideration and not any other test which might be picked up in the report.
I do not see a way to rerun a test except by manually recompiling the test.
- Report Format
Interactive mode can be enabled by passing the keyword parameter :cl-debugger to
(run :cl-debugger t)
Cacau has a few different reporting formats. The function (run) without a reporter specification will provide the default :min level of info. There is also :list and :full which provide different levels of information. You will notice that I am actually creating multiple copies of the failing test to simulate recompiling the test.Cacau treats tests with multiple assertions as a unit. Either everything passes or the test fails and it may be difficult to figure out which assertion was the one that failed. This is clearly shown below where both the first and second asssertions should fail, but only the first failure (the eql) gets reported.
Cacau does not allow us to pass messages in the assertion which might have allowed us to flag potential issues that would aid in debugging failures.
- Min
(deftest "t1-fail-1" () (let ((x 1) (y 2)) (assert-p:eql-p x y) (assert-p:equal-p 1 2))) (run :reporter :min) <=> Cacau <=> From 1 running tests: 0 passed 1 failed NIL
- List
Now the list reporting level
(deftest "t1-fail-2" () (let ((x 1) (y 2)) (assert-p:equal-p x y) (assert-p:equal-p 1 2))) (run :reporter :list) <=> Cacau <=> <- t1-fail-2: Error message: BIT EQL (INTEGER 0 4611686018427387903) Actual: 1 Expected: 2 ------------------------- From 1 running tests: 0 passed 1 failed NIL
- Full
And finally the full reporting level
(deftest "t1-fail-3" () (let ((x 1) (y 2)) (assert-p:eql-p x y) (assert-p:equal-p 1 2))) (run :reporter :full) <=> Cacau <=> <- t1-fail-3: Error message: BIT EQL (INTEGER 0 4611686018427387903) Actual: 1 Expected: 2 Epilogue ------------------------- 0 running suites 1 running tests 0 only suites 0 only tests 0 skip suites 0 skip tests 0 total suites 1 total tests 0 passed 1 failed 1 errors 740673543 run start 740673543 run end 1/1000000 run duration 0 completed suites 1 completed tests Errors ------------------------- Suite: :SUITE-ROOT Test: t1-fail-3 Message: BIT EQL (INTEGER 0 4611686018427387903) Actual: 1 Expected: 2
- Min
- Basics
The empty form after the test name is for particular parameters such as :skip and :async.
(deftest "t1" () (assert-p:eql-p 1 1) (assert-p:condition-error-p (error 'division-by-zero) division-by-zero)) (run) <=> Cacau <=> From 1 running tests: 1 passed 0 failed
You can already anticipate what happens you are testing a function and you change that function. Yes, you need to manually recompile the test and the earlier versions might still be found as well as the new version when you call (run).
- Edge Cases: Value expressions, loops, closures and calling other tests
- Value expressions
Cacau (or really assert-p) has no special functionality for dealing with values expressions. It accepts them but merely looks at the intial value provided by each expression. The following passes.
(deftest "t2-values-3" () (assert-p:equalp-p (values 1 2) (values 1 3))) (run) <=> Cacau <=> From 1 running tests: 1 passed 0 failed NIL
Basically it accepted the values expressions but only looked at the first values.
- Looping and closures.
Cacau will test correctly if it is looking at variables from a closure surrounding the test.
(let ((l1 '(#\a #\B #\z)) (l2 '(97 66 122))) (deftest "t2-loop" () (loop for x in l1 for y in l2 do (assert-p:eql-p (char-code x) y)))) (run :reporter :list) <=> Cacau <=> -> t2-loop ------------------------- From 1 running tests: 1 passed 0 failed
- Calling another test from a test
I have not figured out a way for a test to call another test in cacau.
- Value expressions
- Redefinition Ambiguities
Now consider the following where I mis-define a test with multiple values, then attempt to correct it. and am left not knowing where I stand.
(deftest "t2-with-multiple-values" () (assert-p:eql-p 1 1 2)) ; in: DEFTEST "t2-with-multiple-values" ; (NOLOOP.ASSERT-P:EQL-P 1 1 2) ; ; caught STYLE-WARNING: ; The function EQL-P is called with three arguments, but wants exactly two. ; ; compilation unit finished ; caught 1 STYLE-WARNING condition #<NOLOOP.CACAU::TEST-CLASS {1005098793}> (deftest "t2-with-multiple-values" () (assert-p:t-p (= 1 1 2))) #<NOLOOP.CACAU::TEST-CLASS {1005292E43}> (run) ; the minimum level of info report, probably a mistake <=> Cacau <=> From 2 running tests: 0 passed 2 failed NIL (run :reporter :list) ; I try running again with a higher level of information <=> Cacau <=> ------------------------- From 0 running tests: 0 passed 0 failed
Even though the test failed as expected, it is showing 2 running tests. What are the two tests? I would have expected only one test since we are using the same string name.
- Conditions (Failing)
The following fails as expected.
(deftest "t7-bad-error" () (assert-p:condition-error-p (error 'division-by-zero) floating-point-overflow))
- Suites, tags and other multiple test abilities
- Lists of tests
No such capability outside of suites.
- Suites
You can have multiple suites of tests, but the (run) function will run everything that has not been run before:
(defsuite :s0 () (deftest "s0-t1" () (assert-p:eql-p 1 2))) (defsuite :s1 () (let ((x 0)) (deftest "s1-t1" () (assert-p:eql-p x 0)) (defsuite :s2 () (deftest "s2-t1" () (assert-p:eql-p 1 3))))) (run :reporter :list) <=> Cacau <=> :S0 <- s0-t1: Error message: BIT EQL (INTEGER 0 4611686018427387903) Actual: 1 Expected: 2 :S1 -> s1-t1 :S2 <- s2-t1: Error message: BIT EQL (INTEGER 0 4611686018427387903) Actual: 1 Expected: 3 ------------------------- From 3 running tests: 1 passed 2 failed
What I would like to see is an ability to run the suites separately. You cannot do that. What is the value of having suites and sub-suites if you cannot run them separately? top
- Lists of tests
- Fixtures and Freezing Data
This is where cacau brings something to the table. You can use the hooks
defbefore-all
,defafter-all
,defbefore-each
anddefafter-each
to set up or take down data environment or contexts.(defsuite :suite-1 () (defbefore-all "Before-all" () (print ":Before-all")) (defafter-each "After-each" () (print ":After-each")) (defafter-all "After-all" () (print ":After-all")) (defbefore-each "Before-each Suite-1" () (print "run Before-each Suite-1")) (deftest "Test-1" () (print "run Test-1") (t-p t)) (deftest "Test-2" () (print "run Test-2") (t-p t)) (defsuite :suite-2 () (defbefore-each "Before-each Suite-2" () (print "run Before-each Suite-2")) (deftest "Test-3" () (print "run Test-3") (t-p t)) (deftest "Test-4" () (print "run Test-4") (t-p t)))) (run) ":Before-all" "run Before-each Suite-1" "run Test-1" ":After-each" "run Before-each Suite-1" "run Test-2" ":After-each" "run Before-each Suite-1" "run Before-each Suite-2" "run Test-3" ":After-each" "run Before-each Suite-1" "run Before-each Suite-2" "run Test-4" ":After-each" ":After-all" <=> Cacau <=> From 4 running tests: 4 passed 0 failed
- Removing tests
I did not see anything here, but maybe I missed it.
- Sequencing, Random and Failure Only
Sequential only.
- Skip Capability
You can specify to skip a test or a suite (and no, I do not consider this to be a good substitute for being able to specify which test or suite you want to run).
(defsuite :suite-1 () (deftest "Test-1" (:skip) (t-p t)) (deftest "Test-2" () (t-p t))) ;; run! (defsuite :suite-2 (:skip) (let ((x 0)) (deftest "Test-1" () (eql-p x 0)) (deftest "Test-2" () (t-p t)) (defsuite :suite-3 () (deftest "Test-1" () (t-p t)) (deftest "Test-2" () (t-p t)))))
- Async Abilities
I am going to have to cheat here and refer you to the author's page for the description of the async capabilities https://github.com/noloop/cacau#async-test.
- Time Limits for tests
Cacau does have the ability to specify time limits for tests. The time limits can be set by suite (all tests in the suite have the same time limit), by hook or by test. See the author's discussion at https://github.com/noloop/cacau#timeout
- Random Data Generators
I did not see anything here with respect to data generators.
12.4. Discussion
As I said in the summary, the fact that it does not show all the assertions that failed and does not give me the ability to specify suites or tests to run make this unsuitable for me.
12.5. Who Uses
cl-minify-css-test
13. cardiogram
13.1. Summary
homepage | Abraham Aguilar | MIT | 2020 |
Cardiogram starts off with immediate problems. The documentation does not match up with the code, even on simple things like "true" in the documentation is "is-true" in the code. I got the benchmarking to work but I am not going to try to reverse engineer all the other points discussed with other frameworks.
14. checkl
14.1. Summary
homepage | Ryan Pavlik | LLGPL, BSD | 2018 |
Checkl is different. As a result, this section is different from the other frameworks. Checkl assumes that you do informal checks at the REPL as you are coding and saves those results. Assuming you change your program and check the modified function or whatever with exactly the same parameters, it will let you know if the result is now different. As a result it is a bit more difficult to compare based on the wish list. It can, however, be integrated with Fiveam.
14.2. Usage
- Basic Usage
Assume you create two functions
foo-up
andfoo-down
and compile them.(defun foo-up (x) (+ x 2)) (defun foo-down (x) (- x 2))
Now you create checks against the functions and compile them.
(check () (foo-up 2)) (check () (foo-down 2))
If you now revise
foo-down
and compile it, checkl will cause the system to throw an error immediately upon compilefoo-down
because the results were different (different as inequalp
different.)(defun foo-down (x) (- x 3)) Result 0 has changed: -1 Previous result: 0 [Condition of type CHECKL::RESULT-ERROR] Restarts: 0: [USE-NEW-VALUE] The new value is correct, use it from now on. 1: [SKIP-TEST] Skip this, leaving the old value, but continue testing 2: [ABORT] Abort compilation. 3: [*ABORT] Return to SLIME's top level. 4: [ABORT] abort thread (#<THREAD "worker" RUNNING {1010B73BD3}>)
Modifying and recompiling
foo-up
similarly also triggers an error.Suppose you want to have multiple checks on function based on different parameters. You can name the
check
tests.(check (:name :foo-up-integer) (foo-up 4)) (check (:name :foo-down-integer) (foo-down 4)) (check (:name :foo-up-real) (foo-up 4.5)) (check (:name :foo-down-real) (foo-down 4.5))
If you pass those check names to
run
, you get the following (remember that post our modifications,foo-up
andfoo-down
are now adding and subtracting 3.:(run :foo-up-integer :foo-down-integer :foo-up-real :foo-down-real) (7) (1) (7.5) (1.5)
Checks can also be defined checking the results of multiple functions using the results function.
(check (:name :foo-up-and-down) (results (foo-up 7) (foo-down 3.2))) (run :foo-up-and-down) (10 0.20000005)
By the way, results will copy structures, sequences and marshalls standard-objects.
- Suites, tags and other multiple test abilities
The
run-all
function will return the results for all the check tests defined in the current package.- Lists of tests
As seen in the basic usage, checkl can run multiple checks.
(run :foo-up-integer :foo-down-integer :foo-up-real :foo-down-real)
- Suites/tags/categories
When you are naming checks, you can also pass a category name to the keyword parameter category. You can then pass the category name to
run-all
and get just the values related to the checks with that category flagged.(check (:name :foo :category :some-category) ...) (run-all :some-category ...)
- Lists of tests
- Storage
You can store these named checks by running check-store. E.g.:
(checkl-store "/home/sabrac/checkl-test") ;;; some time later (checkl-load "/home/sabrac/checkl-test")
- Integration with Fiveam
Assuming you have already loaded fiveam, you can also send the checkl tests to fiveam by using check-formal.
(checkl:check-formal (:name :one-concat) (tf-concat-strings-1 "John" "Paul")) "JohnPaul" (fiveam:run! :default) Running test suite DEFAULT Running test ONE-CONCAT . Did 1 check. Pass: 1 (100%) Skip: 0 ( 0%) Fail: 0 ( 0%) T NIL
Now, if we go back and add a space to "John", and run check-formal, not only will check-formal fail, but subsequently running fiveam will fail.
(checkl:check-formal (:name :one-concat) (tf-concat-strings-1 "John " "Paul")) ; Evaluation aborted on #<CHECKL::RESULT-ERROR {1003FE1433}>. TF-TEST1> (fiveam:run! :default) Running test suite DEFAULT Running test ONE-CONCAT f Did 1 check. Pass: 0 ( 0%) Skip: 0 ( 0%) Fail: 1 (100%) Failure Details: -------------------------------- ONE-CONCAT []: CHECKL::RESULT evaluated to ("John Paul") which is not CHECKL:RESULT-EQUALP to ("JohnPaul") .. -------------------------------- NIL (#<IT.BESE.FIVEAM::TEST-FAILURE {1004189053}>)
15. clunit
15.1. Summary
homepage | Tapiwa Gutu | BSD | 2017 |
Updated 13 June 2021 Based on unresolved issues showing at github, as well as my inability to reach the author, This does not appear to be maintained and subject to bitrot. You should look at clunit2 instead. The difference between clunit2 and clunit is
- clunit2's ability to redirect reporting output,
- clunit2's huge performance increase (clunit is painfully slow on any sized testing target)
- clunit2' ability to test multiple value expressions
- clunit2's suite signaling capability and
- the fact that clunit2 has a maintainer.
15.2. Assertion Functions
Clunit's assertion functions are:
assert-condition | assert-eq | assert-eql |
assert-equal | assert-equality | assert-equality* |
assert-equalp | assert-expands | assert-fail |
assert-false | assert-true | assertion-condition |
assertion-conditions | assertion-error | assertion-expander |
assertion-fail-forced | assertion-failed | assertion-passed |
The predicate used by assert-equality is determined by the setting of *clunit-equality-test*
.
top
15.3. Usage
- Report Format
Report format is controlled by the variable
*clunit-report-format*
. It can be set to :default, :tap or NIL. In all the examples showing reports, we will be using the default format.The progress report can be switched off by passing a keyword parameter to the functions run-test or run-suite.
(run-test 'some-test-name :report-progress nil) (run-suite 'some-suite-name :report-progress nil)
Clunit2, unlike clunit has a
*test-output-stream*
variable which can be used to redirect the reports to file or other stream locations.To go interactive - dropping immediately into the debugger, you would set the key word parameter :use-debugger to t.
(run-test test-name :use-debugger t)
To give you a sense of what the failure report looks like, we take a basic failing test with multiple assertions . We will put some diagnostic strings into a few of the assertions. The first assertion not only has a diagnostic string, but it also has two variables and the second assertion which does not. Unlike some other framework diagnostic strings, the string that gets passed does not accept format-like parameters.
(deftest t1-fail () "describe t1-fail" (let ((x 1) (y 2)) (assert-equal x y "This assert-equal test was meant to fail" x y) (assert-true (= 1 2) "This assert-true test was meant to fail") (assert-false (= 1 1)) (assert-eq 'a 'b) (assert-expands (PROGN (SETQ V1 4) (SETQ V2 3)) (setq2 v1 v2 3)) (assert-condition division-by-zero (error 'floating-point-overflow) "testing condition assertions") (assert-equalp (values 1 2) (values 1 3 4)))) #<CLUNIT::CLUNIT-TEST-CASE {100FB04C83}> (run-test 't1-fail) PROGRESS: ========= T1-FAIL: FFFFFE. FAILURE DETAILS: ================ T1-FAIL: Expression: (EQUAL X Y) Expected: X Returned: 2 This assert-equal test was meant to fail X => 1 Y => 2 T1-FAIL: Expression: (= 1 2) Expected: T Returned: NIL This assert-true test was meant to fail T1-FAIL: Expression: (= 1 1) Expected: NIL Returned: T T1-FAIL: Expression: (EQ 'A 'B) Expected: 'A Returned: B T1-FAIL: Expression: (MACROEXPAND-1 '(SETQ2 V1 V2 3)) Expected: (PROGN (SETQ V1 4) (SETQ V2 3)) Returned: (PROGN (SETQ V1 3) (SETQ V2 3)) T1-FAIL: arithmetic error FLOATING-POINT-OVERFLOW signalled SUMMARY: ======== Test functions: Executed: 1 Skipped: 0 Tested 7 assertions. Passed: 1/7 ( 14.3%) Failed: 5/7 ( 71.4%) Errors: 1/7 ( 14.3%)
- Basics
Looking a little closer at basic test where we know everything will pass. The empty form after the test name is for the name of the suite (if any). Just for fun and since clunit has it, we will define a macro and show the assert-expands assertion function as well. We also include a diagnostic string in the assertion-condition assertion with the division-by-zero error, but the same could be done for any assertion.
(defmacro setq2 (v1 v2 e) (list 'progn (list 'setq v1 e) (list 'setq v2 e))) (deftest t1 () "describe t1" (assert-true (= 1 1)) (assert-false (= 1 2)) (assert-eq 'a 'a) (assert-expands (PROGN (SETQ V1 3) (SETQ V2 3)) (setq2 v1 v2 3)) (assert-condition division-by-zero (error 'division-by-zero) "testing condition assertions") (assert-condition simple-warning (signal 'simple-warning)))
Running this to show the default report on a passing test. There is a progress report with dots indicating passed assertions, F indicating failed assertions and E if there is an error.
(run-test 't1) PROGRESS: ========= T1: ...... SUMMARY: ======== Test functions: Executed: 1 Skipped: 0 Tested 6 assertion. Passed: 6/6 (100.0%)
You do not have to manually recompile a test after a tested function has been modified.
- Edge Cases: Values expressions, loops. closures and calling other tests
Clunit has no special functionality for dealing with values expressions. It accepts them but merely looks at the intial value provided by each expression. The following passes.
(deftest t2-values-expressions () (assert-equal (values 1 2) (values 1 3)))
- Looping and closures.
Will a test accept looping through assertions with lexical variables from a closure? NO. Clunit complains that the variables l1 and l2 are never defined.
(let ((l1 '(#\a #\B #\z)) (l2 '(97 66 122))) (deftest t2-loop () (loop for x in l1 for y in l2 do (assert-equal (char-code x) y))))
Clunit is quite happy to loop if the variables are defined within the test or, for that matter, if the closure encompassed tested functions rather than the test itself.:
(deftest t2-loop () (let ((l1 '(#\a #\B #\z)) (l2 '(97 66 122))) (loop for x in l1 for y in l2 do (assert-equal (char-code x) y))))
CLunit has no assertions that will handle checking more than two values in a single assertion, so you will have to use
assert-true
with the usual CL functions. - Calling another test from a test
This uses the second version of test t2 which has two failing asssertions and one passing assertion.
(deftest t3 (); a test that tries to call another test in its body "describe t3" (assert-equal 'a 'a) (run-test 't2)) (run-test 't3) PROGRESS: ========= T3: . PROGRESS: ========= T2: FF. FAILURE DETAILS: ================ T2: Expression: (EQUAL 1 2) Expected: 1 Returned: 2 T2: Expression: (EQUAL 2 3) Expected: 2 Returned: 3 SUMMARY: ======== Test functions: Executed: 1 Skipped: 0 Tested 3 assertions. Passed: 1/3 some tests not passed Failed: 2/3 some tests failed SUMMARY: ======== Test functions: Executed: 1 Skipped: 0 Tested 1 assertion. Passed: 1/1 all tests passed SUMMARY: ======== Test functions: Executed: 1 Skipped: 0 Tested 1 assertion. Passed: 1/1 all tests passed
It reports each test separately, but correctly, but obviously no composition..
- Looping and closures.
- Suites, tags and other multiple test abilities
- Lists of tests
clunit will not run lists of tests. You can run tests which run other tests. But otherwise you will need to set up suites.
- Suites
Tests can be associated with multiple suites. The execution of tests within a suite are unordered by default, but you can specify that a test depends on other tests passing. If those tests are not passed, then this test will be skipped. This test is put in a queue unti its dependencies are satisfied. Both suite specifications and test dependencies are set in the first parameter form that we left nil in the above tests.
Assume you wanted testC to be part of suites suiteA and suiteB and dependent on tests testA and testB passing.
(deftest testC ((suiteA suiteB)(testA testB)) ...)
Suites are tested using the run-suite function. It has the following additional parameters (besides the obvious name for the suite to be tested):
- If REPORT-PROGRESS is non-NIL, the test progress is reported.
- If USE-DEBUGGER is non-NIL, the debugger is invoked whenever an assertion fails.
- If STOP-ON-FAIL is non-NIL, the rest of the unit test is cancelled when any assertion fails or an error occurs.
- If SIGNAL-CONDITION-ON-FAIL is non-NIL, run-suite will signal a TEST-SUITE-FAILURE condition if at least either a test fails or signal an error condition.
- if PRINT-RESULTS-SUMMARY is non nil the summary results of tests is printed on the standard output.
(defsuite s0 ()); Ultimate parent suite if the library provides inheritance (deftest t4 (s0) ; a test that is a member of a suite "describe t4" (assert-eq 1 1)) ;;a multiple assertion test that is a member of a suite with ;; a passing test, an error signaled and a failing test (deftest t4-error (s0) "describe t4-error" (assert-eq 'a 'a) (assert-condition error (error "t4-errored out")) (assert-true (= 1 2))) (deftest t4-fail (s0) ; "describe t4-fail" (assert-false (= 1 2))) (defsuite s1 (s0)); a sub-suite of suite s0 to check on inheritance (deftest t4-s1 (s1) (assert-true (= 1 1))) (run-suite 's0) PROGRESS: ========= S0: (Test Suite) T4-FAIL: . T4-ERROR: ..F T4: . S1: (Test Suite) T5: . FAILURE DETAILS: ================ S0: (Test Suite) T4-ERROR: Expression: (= 1 2) Expected: T Returned: NIL SUMMARY: ======== Test functions: Executed: 4 Skipped: 0 Tested 6 assertions. Passed: 5/6 some tests not passed Failed: 1/6 some tests failed FAILURE DETAILS: ================ S0: (Test Suite) T4-ERROR: Expression: (= 1 2) Expected: T Returned: NIL SUMMARY: ======== Test functions: Executed: 4 Skipped: 0 Tested 6 assertions. Passed: 5/6 some tests not passed Failed: 1/6 some tests failed
- Early termination
You stop a suite test if a test fails without dropping into the debugger.
(run-suite 'suite-name :stop-on-fail t)
- Fixtures and Freezing Data
(defclass fixture-data () ((a :initarg :a :initform 0 :accessor a) (b :initarg :b :initform 0 :accessor b))) (deffixture s1 (@body) ;;IMPORTANT Note that the fixture gets the name of the suite to which it will apply (let ((x (make-instance 'fixture-data :a 100 :b -100))) @body)) ;; create a sub suite and checking fixture inheritance (defsuite s2 (s1)) (deftest t6-s1 (s1) (assert-equal (a x) 100) (assert-equal (b x) -100)) (deftest t6-s2 (s2) (assert-equal (a x) 100) (assert-equal (b x) -100)) (run-suite 's1) PROGRESS: ========= S1: (Test Suite) T6-S1: .. T5: . S2: (Test Suite) T6-S2: .. SUMMARY: ======== Test functions: Executed: 3 Skipped: 0 Tested 5 assertions. Passed: 5/5 all tests passed SUMMARY: ======== Test functions: Executed: 3 Skipped: 0 Tested 5 assertions. Passed: 5/5 all tests passed
To go interactive - dropping immediately into the debugger, you would set the key word parameter :use-debugger to t.
(run-suite suite-name :use-debugger t)
- Lists of tests
- Removing tests
(clunit:undeftest t1) (clunit:undeffixture fixture-name) (clunit:undefsuite suite-name)
- Sequencing, Random and Failure Only
The execution of tests within a suite are unordered by default, but you can specify that a test depends on other tests passing. If those tests are not passed, then this test will be skipped. Clunit has a function
rerun-failed-tests
to rerun failed tests. - Skip Capability
Other than the dependency abilities previously mentioned, clunit has no additional skipping capability.
- Random Data Generators
Clunit has no built in data generators.
15.4. Discussion
The difference between clunit2 and clunit is clunit2's ability to redirect reporting output, suite signaling capability and the fact that it has a maintainer. Both of them are very slow. top
15.5. Who Uses
bt-semaphore, data-frame, cl-kanren, "cl-random-tests" "cl-slice-tests" "listoflist" "lla-tests" "oe-encode-test" "trivial-tco-test")
16. clunit2
16.1. Summary
homepage | Cage (fork of clunit) | BSD | 2020 |
Update 13 June 2021 Clunit2 is a fork of Clunit. For quicklisp system loading purposes, it is clunit2, for package naming purposes it is clunit, not clunit2. The difference between clunit2 and clunit is:
- clunit2's ability to redirect reporting output,
- clunit2's huge performance increase (clunit is painfully slow on any sized testing target)
- clunit2' ability to test multiple value expressions
- clunit2's suite signaling capability and
- the fact that clunit2 has a maintainer.
Clunit2 does report all failing assertions within a test, has the option to turn off progress reporting and accepts user provided diagnostic strings with variables in the assertions. Going interactive with debugging is optional, fixtures are available as are suites and the ability to rerun failed tests. You can specify that a test that depends on other tests passing will be skipped if those prior tests fail.
With respect to the edge cases, as of the 13 June 2021 update, Clunit2 will accept variables declared in closures surrounding the test and does have the ability completely test all the values returned from a values expression.
16.2. Assertion Functions
Clunit2's assertion functions are:
assert-condition | assert-eq | assert-eql |
assert-equal | assert-equality | assert-equality* |
assert-equalp | assert-expands | assert-fail |
assert-false | assert-true | assertion-condition |
assertion-conditions | assertion-error | assertion-expander |
assertion-fail-forced | assertion-failed | assertion-passed |
16.3. Usage
- Report Format
Report format is controlled by the variable
*clunit-report-format*
. It can be set to :default, :tap or NIL. In all the examples showing reports, we will be using the default format.The progress report can be switched off by passing a keyword parameter to the functions run-test or run-suite.
(run-test 'some-test-name :report-progress nil) (run-suite 'some-suite-name :report-progress nil)
Clunit2, unlike clunit has a
*test-output-stream*
variable which can be used to redirect the reports to file or other stream locations.To go interactive - dropping immediately into the debugger, you would set the key word parameter :use-debugger to t.
(run-test test-name :use-debugger t)
- Basics
The most basic test. The empty form after the test name is for the name of the suite (if any). Just for fun and since clunit has it, we will define a macro and show the assert-expands assertion function as well. We also include a diagnostic string in the assertion-condition assertion with the division-by-zero error, but the same could be done for any assertion.
(defmacro setq2 (v1 v2 e) (list 'progn (list 'setq v1 e) (list 'setq v2 e))) (deftest t1 () "describe t1" (assert-true (= 1 1)) (assert-false (= 1 2)) (assert-eq 'a 'a) (assert-expands (PROGN (SETQ V1 3) (SETQ V2 3)) (setq2 v1 v2 3)) (assert-condition division-by-zero (error 'division-by-zero) "testing condition assertions") (assert-condition simple-warning (signal 'simple-warning)))
Running this to show the default report on a passing test. There is a progress report with dots indicating passed assertions, F indicating failed assertions and E if there is an error.:
(run-test 't1) PROGRESS: ========= T1: ..... SUMMARY: ======== Test functions: Executed: 1 Skipped: 0 Tested 6 assertion. Passed: 6/6 (100.0%)
You can switch off the progress report for running tests and suites by setting the keyword parameter :report-progress to nil:
(run-test 't1 :report-progress nil)
Now a basic failing test with multiple assertions (and also to see if the library can deal with values expressions). We will put some diagnostic strings into a few of the assertions. The first assertion not only has a diagnostic string, but it also has foll variables and the second assertion which does not. Unlike some other framework diagnostic strings, the string that gets passed does not accept format like parameters.
(deftest t1-fail () "describe t1-fail" (let ((x 1) (y 2)) (assert-equal x y "This assert-equal test was meant to fail" x y) (assert-true (= 1 2) "This assert-true test was meant to fail") (assert-false (= 1 1)) (assert-eq 'a 'b) (assert-expands (PROGN (SETQ V1 4) (SETQ V2 3)) (setq2 v1 v2 3)) (assert-condition division-by-zero (error 'floating-point-overflow) "testing condition assertions") (assert-equalp (values 1 2) (values 1 3 4)))) #<CLUNIT::CLUNIT-TEST-CASE {100FB04C83}> TF-CLUNIT> (run-test 't1-fail) PROGRESS: ========= T1-FAIL: FFFFFE. FAILURE DETAILS: ================ T1-FAIL: Expression: (EQUAL X Y) Expected: X Returned: 2 This assert-equal test was meant to fail X => 1 Y => 2 T1-FAIL: Expression: (= 1 2) Expected: T Returned: NIL This assert-true test was meant to fail T1-FAIL: Expression: (= 1 1) Expected: NIL Returned: T T1-FAIL: Expression: (EQ 'A 'B) Expected: 'A Returned: B T1-FAIL: Expression: (MACROEXPAND-1 '(SETQ2 V1 V2 3)) Expected: (PROGN (SETQ V1 4) (SETQ V2 3)) Returned: (PROGN (SETQ V1 3) (SETQ V2 3)) T1-FAIL: arithmetic error FLOATING-POINT-OVERFLOW signalled SUMMARY: ======== Test functions: Executed: 1 Skipped: 0 Tested 7 assertions. Passed: 1/7 ( 14.3%) Failed: 5/7 ( 71.4%) Errors: 1/7 ( 14.3%)
With respect to the values expression, we can see that it passed, proving clunit only looks at the first value in the values expression.
You do not have to manually recompile a test after a tested function has been modified.
- Edge Cases: Values expressions, loops, closures and calling other tests
Update 13 June 2021 Clunit2 as of the date of this update report will compare all the values from two values expressions. The following now properly fails.
(deftest t2-values-expressions () (assert-equal (values 1 2) (values 1 3)))
- Looping and closures.
Update 13 June 2021 Clunit2 will accept variables declared in a closure surrounding the test. The following passes.
(let ((l1 '(#\a #\B #\z)) (l2 '(97 66 122))) (deftest t2-loop () (loop for x in l1 for y in l2 do (assert-equal (char-code x) y))))
Clunits is quite happy to loop if the variables are defined within the test:
(deftest t2-loop () (let ((l1 '(#\a #\B #\z)) (l2 '(97 66 122))) (loop for x in l1 for y in l2 do (assert-equal (char-code x) y))))
- Calling another test from a test
We will call the second version of test t2 which has failures.
(deftest t3 (); a test that tries to call another test in its body "describe t3" (assert-equal 'a 'a) (run-test 't2)) (run-test 't3) PROGRESS: ========= T3: . PROGRESS: ========= T2: FF. FAILURE DETAILS: ================ T2: Expression: (EQUAL 1 2) Expected: 1 Returned: 2 T2: Expression: (EQUAL 2 3) Expected: 2 Returned: 3 SUMMARY: ======== Test functions: Executed: 1 Skipped: 0 Tested 3 assertions. Passed: 1/3 some tests not passed Failed: 2/3 some tests failed SUMMARY: ======== Test functions: Executed: 1 Skipped: 0 Tested 1 assertion. Passed: 1/1 all tests passed SUMMARY: ======== Test functions: Executed: 1 Skipped: 0 Tested 1 assertion. Passed: 1/1 all tests passed
It reports each test separately (no compostion), but correctly.
- Looping and closures.
- Conditions
The following fails as expected.
(deftest t7-bad-error () (assert-condition floating-point-overflow (error 'division-by-zero) "testing condition assertions. This should fail"))
- Suites, tags and other multiple test abilities
- Lists of tests
clunit2 will not run lists of tests. You can run tests which run other tests. But otherwise you will need to set up suites.
- Suites
Tests can be associated with multiple suites. The execution of tests within a suite are unordered by default, but you can specify that a test depends on other tests passing. If those tests are not passed, then this test will be skipped. This test is put in a queue unti its dependencies are satisfied. Both suite specifications and test dependencies are set in the first parameter form that we left nil in the above tests.
Assume you wanted testC to be part of suites suiteA and suiteB and dependent on tests testA and testB passing.
(deftest testC ((suiteA suiteB)(testA testB)) ...)
Suites are tested using the run-suite function. It has the following additional parameters (besides the obvious name for the suite to be tested):
- If REPORT-PROGRESS is non-NIL, the test progress is reported.
- If USE-DEBUGGER is non-NIL, the debugger is invoked whenever an assertion fails.
- If STOP-ON-FAIL is non-NIL, the rest of the unit test is cancelled when any assertion fails or an error occurs.
- If SIGNAL-CONDITION-ON-FAIL is non-NIL, run-suite will signal a TEST-SUITE-FAILURE condition if at least either a test fails or signal an error condition.
- if PRINT-RESULTS-SUMMARY is non nil the summary results of tests is printed on the standard output.
(defsuite s0 ()); Ultimate parent suite if the library provides inheritance (deftest t4 (s0) ; a test that is a member of a suite "describe t4" (assert-eq 1 1)) ;;a multiple assertion test that is a member of a suite with ;; a passing test, an error signaled and a failing test (deftest t4-error (s0) "describe t4-error" (assert-eq 'a 'a) (assert-condition error (error "t4-errored out")) (assert-true (= 1 2))) (deftest t4-fail (s0) ; "describe t4-fail" (assert-false (= 1 2))) (defsuite s1 (s0)); a sub-suite of suite s0 to check on inheritance (deftest t4-s1 (s1) (assert-true (= 1 1))) (run-suite 's0) PROGRESS: ========= S0: (Test Suite) T4-FAIL: . T4-ERROR: ..F T4: . S1: (Test Suite) T5: . FAILURE DETAILS: ================ S0: (Test Suite) T4-ERROR: Expression: (= 1 2) Expected: T Returned: NIL SUMMARY: ======== Test functions: Executed: 4 Skipped: 0 Tested 6 assertions. Passed: 5/6 some tests not passed Failed: 1/6 some tests failed FAILURE DETAILS: ================ S0: (Test Suite) T4-ERROR: Expression: (= 1 2) Expected: T Returned: NIL SUMMARY: ======== Test functions: Executed: 4 Skipped: 0 Tested 6 assertions. Passed: 5/6 some tests not passed Failed: 1/6 some tests failed
- Early termination
You stop a suite test if a test fails without dropping into the debugger.
(run-suite 'suite-name :stop-on-fail t)
- Fixtures and Freezing Data
(defclass fixture-data () ((a :initarg :a :initform 0 :accessor a) (b :initarg :b :initform 0 :accessor b))) (deffixture s1 (@body) ;;IMPORTANT Note that the fixture gets the name of the suite to which it will apply (let ((x (make-instance 'fixture-data :a 100 :b -100))) @body)) ;; create a sub suite and checking fixture inheritance (defsuite s2 (s1)) (deftest t6-s1 (s1) (assert-equal (a x) 100) (assert-equal (b x) -100)) (deftest t6-s2 (s2) (assert-equal (a x) 100) (assert-equal (b x) -100)) (run-suite 's1) PROGRESS: ========= S1: (Test Suite) T6-S1: .. T5: . S2: (Test Suite) T6-S2: .. SUMMARY: ======== Test functions: Executed: 3 Skipped: 0 Tested 5 assertions. Passed: 5/5 all tests passed SUMMARY: ======== Test functions: Executed: 3 Skipped: 0 Tested 5 assertions. Passed: 5/5 all tests passed
- Lists of tests
- Removing tests
(undeftest t1) (undeffixture fixture-name) (undefsuite suite-name)
- Sequencing, Random and Failure Only
The execution of tests within a suite are unordered by default, but you can specify that a test depends on other tests passing. If those tests are not passed, then this test will be skipped. Clunit2 has a function
rerun-failed-tests
to rerun failed tests. - Skip Capability
Other than the dependency abilities previously mentioned, clunit2 has no additional skipping capability.
- Random Data Generators
Clunit2 has no built in data generators.
16.4. Discussion
Clunit2 is a substantial step forward from clunit and should be considered the successor.
16.5. Who Uses
17. com.gigamonkeys.test-framework
17.1. Summary
homepage | Peter Seibel | BSD | 2010 |
This is a basic testing framework without the bells and whistles found in several of the others. For example, it lacks fixtures or suites. Nothing wrong with it but you can find a lot more functionality elsewhere.
17.2. Assertion Functions
17.3. Usage
One thing to note on setup. If you are using quicklisp, the quickload system name is:
(ql:quickload :com.gigamonkeys.test-framework)
however the package name, at least in SBCL, is com.gigamonkeys.test
- Report Format
Gigamonkey tests can be set to go into the debugger on errors, failures or never. this is controlled by the settings of
*debug*
(for error conditions) and*debug-on-faill*
(for test failures). - Basics
The most basic passing test. Unlike most other frameworks, the empty form after the test name is for parameters which can be passed to the test. Also unlike many other frameworks calling the test using the test function uses an unquoted name of the test.
(deftest t1 (x) (check (= 1 x)) (expect division-by-zero (error 'division-by-zero)))
(test t1 1) Okay: 2 passes; 0 failures; 0 aborts. T 2 0 0
Now a basic failing test.
To go interactive - dropping immediately into the debugger for unexpected conditions, set
*debug*
to t.To drop immediately into the debugger when a test fails, set
*debug-on-fail*
to t. This is the default, but we will set it to nil for these examples(deftest t1-fail (); the most basic failing test (let ((x 1) (y 2)) (check (= x y) ))) NIL TEST> (test t1-fail) FAIL ... (T1-FAIL): (= X Y) X => 1 Y => 2 (= X Y) => NIL NOT okay: 0 passes; 1 failures; 0 aborts. NIL 0 1 0
You do not have to manually recompile a test after a tested function has been modified.
- Edge Cases: Values expressions, loops, closures and calling other tests
Gigamonkeys has no special functionality for dealing with values expressions. It accepts them but merely looks at the intial value provided by each expression. The following passes.
(deftest t2-values-expressions () (check (equal (values 1 2) (values 1 3))))
- Closures.
Gigamonkeys has no problem with variables declared in a closure encompassing the test.
- Calling another test from a test
(deftest t3 (); a test that tries to call another test in its body (check (eql 'a 'a)) (test t2)) (test t3) Okay: 3 passes; 0 failures; 0 aborts. Okay: 4 passes; 0 failures; 0 aborts. T 4 0 0
So far so good, but no composition.
- Closures.
- Conditions
The following immediately throws us into the debugger. If we hit PROCEED, we will get the feedback shown below.
(deftest t7-bad-error () (expect division-by-zero (error 'floating-point-overflow))) (test t7-bad-error) ABORT ... (T7-BAD-ERROR): arithmetic error FLOATING-POINT-OVERFLOW signalled NOT okay: 0 passes; 0 failures; 1 aborts. NIL 0 0 1
- Suites, tags and other multiple test abilities
- Fixtures and Freezing Data
None
- Removing tests
Gigamonkeys has functions
remove-test-function
andclear-package-tests
- Sequencing, Random and Failure Only
None
- Skip Capability
None
- Random Data Generators
None
17.4. Discussion
17.5. Who Uses com.gigamonkeys.test-framework
18. Confidence
18.1. Summary
homepage | Michaël Le Barbier | MIT | 2023 |
Confidence is a new entry by someone who found the existing frameworks (based on his experience with Stefil and Fiveam) either too complicated, they did not provide enough debugging information to know exactly where the problem is and were not extensible in the sense of adding additional assertions. I am sure that Confidence meets his needs, but would not meet the needs of other users.
It does report all failing assertions in a test. It has suites but no fixtures and does not allow you to provide user created diagnostic strings for the assertions. On the plus side it has some really nice floating point assertions that are not found elsewhere.
18.2. Assertion Functions
assert-char-equal | assert-string-equal |
assert-char< | assert-string-match |
assert-char<= | assert-string< |
assert-char= | assert-string<= |
assert-char> | assert-string= |
assert-char>= | assert-string> |
assert-condition | assert-string>= |
assert-eq | assert-subsetp |
assert-eql | assert-t |
assert-equal | assert-t* |
assert-equalp | assert-type |
assert-float-is-approximately-equal | assert-vector-equal |
assert-float-is-definitely-greater-than | assert< |
assert-float-is-definitely-less-than | assert<= |
assert-float-is-essentially-equal | assert= |
assert-list-equal | assert> |
assert-nil | assert>= |
assert-set-equal |
Confidence also provides a macro for defining more assertions.
18.3. Usage
- Report Format
Test failures in Confidence create a result object. Result object are combined and produce a report.
- Basics
Tests in Confidence are functions. Unless we are calling multiple tests in the following examples, we will just call the test-case function itself. The empty form after the test name is not really described in the documentation, but can be used in parameterized test cases.
The basic passing test below shows the floating point comparisons built into Confidence.
(define-testcase t1 () "describe t1" (assert-t (= 1 1)) (assert-string< "abc" "def") (assert-float-is-approximately-equal 5.100000 5.1000001) (assert-float-is-essentially-equal 5.100000 5.1000001) (assert-float-is-definitely-greater-than 5.100001 5.100000) (assert-float-is-definitely-less-than 5.100000 5.100001) (assert-equal (values 1 2) (values 1 2))) (t1) Name: T1 Total: 7 Success: 7/7 (100%) Failure: 0/7 (0%) Condition: 0/7 (0%) Outcome: Success NIL
A parameterized test case:
(define-testcase t1-p (y); the most basic parameterized (let ((x 1)) (assert-equal x y))) (t1-p 1)
Now a basic failing test. This time we are using a more specific assertion (assert-equal..). Unlike some other frameworks, we cannot pass a descriptive string to the assertion.
(define-testcase t1-fail (); the most basic failing test (let ((x 1) (y 2)) (assert-equal x y) (assert-equal y 3)) (t1-fail) Name: T1-FAIL Total: 2 Success: 0/2 (0%) Failure: 2/2 (100%) Condition: 0/2 (0%) Outcome: Failure ================================================================================ #<ASSERTION-FAILURE {70066055D3}> is an assertion result of type ASSERTION-FAILURE. Type: :FUNCTION Name: ASSERT-EQUAL Path: T1-FAIL Arguments: Argument #1: 1 Argument #2: 2 Form: (ASSERT-EQUAL X Y) Outcome: Failure Description: Assert that A and B satisfy the EQUAL predicate. In this call, forms in argument position evaluate as: A: 1 B: 2 ================================================================================ #<ASSERTION-FAILURE {7006605863}> is an assertion result of type ASSERTION-FAILURE. Type: :FUNCTION Name: ASSERT-EQUAL Path: T1-FAIL Arguments: Argument #1: 2 Argument #2: 3 Form: (ASSERT-EQUAL Y 3) Outcome: Failure Description: Assert that A and B satisfy the EQUAL predicate. In this call, forms in argument position evaluate as: A: 2 B: 3 NIL
You do not have to manually recompile a test after a tested function has been modified.
- Edge Cases: Multiple assertions, loops. closures and calling other tests
(t2) Name: T2 Total: 3 Success: 1/3 (33%) Failure: 2/3 (67%) Condition: 0/3 (0%) Outcome: Failure ================================================================================ #<ASSERTION-FAILURE {700692EF83}> is an assertion result of type ASSERTION-FAILURE. Type: :FUNCTION Name: ASSERT-EQUAL Path: T2 Arguments: Argument #1: 1 Argument #2: 2 Form: (ASSERT-EQUAL 1 2) Outcome: Failure Description: Assert that A and B satisfy the EQUAL predicate. In this call, forms in argument position evaluate as: A: 1 B: 2 ================================================================================ #<ASSERTION-FAILURE {700692F1F3}> is an assertion result of type ASSERTION-FAILURE. Type: :FUNCTION Name: ASSERT-EQUAL Path: T2 Arguments: Argument #1: 2 Argument #2: 3 Form: (ASSERT-EQUAL 2 3) Outcome: Failure Description: Assert that A and B satisfy the EQUAL predicate. In this call, forms in argument position evaluate as: A: 2 B: 3 NIL
Confidence had no problem with the values expression as such, but like almost all the frameworks, it only looked at the first value.
- Closures
Confidence has no problem accessing variables defined in a closure encompassing the test.
(let ((l1 '(#\a #\B #\z)) (l2 '(97 66 122))) (define-testcase t2-loop () (loop for x in l1 for y in l2 do (assert-equal (char-code x) y)))) (t2-loop) Name: T2-LOOP Total: 3 Success: 3/3 (100%) Failure: 0/3 (0%) Condition: 0/3 (0%) Outcome: Success NIL
Checking assert= with multiple values.
(define-testcase t2-with-multiple-values () (assert= 1 1 2)) T2-WITH-MULTIPLE-VALUES (t2-with-multiple-values) (t2-with-multiple-values) Name: T2-WITH-MULTIPLE-VALUES Total: 1 Success: 0/1 (0%) Failure: 0/1 (0%) Condition: 1/1 (100%) Outcome: Failure ================================================================================ #<ASSERTION-CONDITION {70070524F3}> is an assertion result of type ASSERTION-CONDITION. Type: :FUNCTION Name: ASSERT= Path: T2-WITH-MULTIPLE-VALUES Arguments: Argument #1: 1 Argument #2: 1 Argument #3: 2 Form: (ASSERT= 1 1 2) Outcome: Condition Condition: #<SB-INT:SIMPLE-PROGRAM-ERROR "invalid number of arguments: ~S" {7006F4B683}> #<SB-INT:SIMPLE-PROGRAM-ERROR "invalid number of arguments: ~S" {7006F.. [condition] Slots with :INSTANCE allocation: FORMAT-CONTROL = "invalid number of arguments: ~S" FORMAT-ARGUMENTS = (3) In this call, forms in argument position evaluate as: A: 1 B: 1 NIL
It failed. All the assertions in Confidence compare two values only.
- Calling another test from a test
If you call a test within a test in Most other frameworks you will effectively get two reports. Confidence actually composes the results.
(define-testcase t3 () "describe t3 which is a test that tries to call another test in its body" (assert-equal 'a 'a) (t1)) (t3) Name: T3 Total: 8 Success: 8/8 (100%) Failure: 0/8 (0%) Condition: 0/8 (0%) Outcome: Success NIL
If test t3 called test t2, we would have seen the following in the debugger which implies that the assertion failure was in t2, not t3:
Name: T3 Total: 4 Success: 2/4 (50%) Failure: 2/4 (50%) Condition: 0/4 (0%) Outcome: Failure ================================================================================ Name: T2 Total: 3 Success: 1/3 (33%) Failure: 2/3 (67%) Condition: 0/3 (0%) Outcome: Failure ================================================================================ #<ASSERTION-FAILURE {7007FEADA3}> is an assertion result of type ASSERTION-FAILURE. Type: :FUNCTION Name: ASSERT-EQUAL Path: T3 T2 Arguments: Argument #1: 1 Argument #2: 2 Form: (ASSERT-EQUAL 1 2) Outcome: Failure Description: Assert that A and B satisfy the EQUAL predicate. In this call, forms in argument position evaluate as: A: 1 B: 2 ================================================================================ #<ASSERTION-FAILURE {7007FEB013}> is an assertion result of type ASSERTION-FAILURE. Type: :FUNCTION Name: ASSERT-EQUAL Path: T3 T2 Arguments: Argument #1: 2 Argument #2: 3 Form: (ASSERT-EQUAL 2 3) Outcome: Failure Description: Assert that A and B satisfy the EQUAL predicate. In this call, forms in argument position evaluate as: A: 2 B: 3
- Closures
- Conditions
Confidence has a macro assert-condition that verifies that a form signals a condition of a certain class. It can also examine the slots of the condition with assertions:
(define-testcase t4 () (assert-condition (error 'testing-framework :a "a" :b "b" :c "c") testing-framework (a b) (assert-string= "a" a) (assert-string= "b" b))) (t4) Name: T4 Total: 1 Success: 1/1 (100%) Failure: 0/1 (0%) Condition: 0/1 (0%) Outcome: Success NIL
- Suites, tags and other multiple test abilities
- Fixtures and Freezing Data
There is no additional capability in Confidence for fixtures or freezing data.
- Removing tests
None
- Sequencing, Random and Failure Only
Tests will be called in sequence and there is no random shuffling or skipping ability.
- Skip Capability
None other than provided in the debugger.
- Random Data Generators
None
18.4. Discussion
Confidence does provide the ability to define more assertions.
In summary, Confidence is interesting, but it will not make me change from another framework. I do think some other frameworks might want to follow its lead in having some float comparision assertions.
18.5. Who Uses
19. fiasco
19.1. Summary
homepage | João Távora | BSD 2 Clause | 2020 |
In spite of the fact that Fiasco does not have its own fixture capability (unless I am missing something), it managed to hit most of the other concerns that I have. It does report all failing assertions within a test, has the option to turn off progress reporting and accepts user provided diagnostic strings with variables in the assertions. Going interactive with debugging is optional, suites are available as is the ability to rerun failed tests. It has skipping functions and, with respect to the edge cases, it handles variables declared in a closure surrounding the test. When a test calls another test it actually manages to compose the results rather than reporting two separate sets.
I found using the suite capability to be confusing (and likely would end up defining packages instead of suites, but you lose some compostion that way). It also does not have the ability some other frameworks have to deal with values expressions.
19.2. Assertion Functions
is | finishes | not-signals | signals |
19.3. Usage
- Report Format
Fiasco defaults to a reporting format. To go interactive run the test with :interactive t.
There are two slightly different versions of reporting format, the default and running the test with :verbose t. The verbose version simply adds the docstring for the test, so it does not really add much.
In the following example, look at the difference in reporting between the four assertions. The first assertion has an
=
predicate comparing numbers. The second has anequal
predicate comparing numbers and the third has variables and a diagnostic string that accepts parameters followed by the variables being passed to the test variables.(deftest t1-fail () "Docstring for test t1-fail" (let ((x 1) (y 2)) (is (= 1 2)) (is (equal 1 2)) (is (= x y) "This test was meant to fail because we know ~a is not = to ~a" x y ) (signals division-by-zero (error 'floating-point-overflow) "testing condition assertions. This should fail"))) (run-tests 't1-fail) T1-FAIL...................................................................[FAIL] Test run had 4 failures: Failure 1: UNEXPECTED-ERROR when running T1-FAIL arithmetic error FLOATING-POINT-OVERFLOW signalled Failure 2: FAILED-ASSERTION when running T1-FAIL This test was meant to fail because we know 1 is not = to 2 Failure 3: FAILED-ASSERTION when running T1-FAIL Binary predicate (EQUAL X Y) failed. x: 1 => 1 y: 2 => 2 Failure 4: FAILED-ASSERTION when running T1-FAIL Binary predicate (= X Y) failed. x: 1 => 1 y: 2 => 2 NIL (#<test-run of T1-FAIL: 1 test, 4 assertions, 4 failures in NIL sec (3 failed assertions, 1 error, none expected)>)
The interactive version would look like this:
(run-tests 't1-fail :interactive t) Test assertion failed when running T1-FAIL: Binary predicate (= X Y) failed. x: 1 => 1 y: 2 => 2 [Condition of type FIASCO::FAILED-ASSERTION] Restarts: 0: [CONTINUE] Roger, go on testing... 1: [CONTINUE] Skip the rest of the test T1-FAIL and continue by returning (values) 2: [RETEST] Rerun the test T1-FAIL 3: [CONTINUE-WITHOUT-DEBUGGING] Turn off debugging for this test session and invoke the first CONTINUE restart 4: [CONTINUE-WITHOUT-DEBUGGING-ERRORS] Do not stop at unexpected errors for the rest of this test session and continue by invoking the first CONTINUE restart 5: [CONTINUE-WITHOUT-DEBUGGING-ASSERTIONS] Do not stop at failed assertions for the rest of this test session and continue by invoking the first CONTINUE restart
- Basics
The empty form after the test name is for parameters to pass to the test. Calling the test using
run-tests
returns a list ofcontext
objects which is an internal class. Each test run is pushed to a history of test runs kept in the appropriately named*test-result-history*
.(deftest t1 () "docstring for t1" (is (= 1 1) "first assertion") (is (eq 'a 'a) "second assertion") (signals division-by-zero (error 'division-by-zero)) (finishes (+ 1 1))) T1 (run-tests 't1) ;; or (run-tests '(t1)) T1........................................................................[ OK ] T (#<test-run of T1: 1 test, 4 assertions, 0 failures in 1.4e-5 sec>)
If you add the keyword parameter :verbose, you get slightly more information in that it prints the test docstring (but not the assertion docstrings) and the number of assertions, failures etc.
(run-tests 't1 :verbose t) T1........................................................................[ OK ] (docstring for t1) (4 assertions, 0 failed, 0 errors, 0 expected)
Fiasco tests are funcallable. You will note that when calling the test in this fashion returns a single test-run object rather than a list of test run objects.
(t1) . T #<test-run of T1: 1 test, 1 assertion, 0 failures in 2.8e-5 sec> (funcall 't1) . T #<test-run of T1: 1 test, 4 assertion, 0 failures in 2.6e-5 sec>
Fiasco tests also take parameters as in this example:
(deftest t1-param (x) (is (= 1 x))) (t1-param 1) #<test-run of T1-PARAM: 1 test, 1 assertion, 0 failures in 2.2e-5 sec> (t1-param 2) X; Evaluation aborted on #<FIASCO::FAILED-ASSERTION "Binary assertion function ~A failed.~%~ x: ~S => ~S~%~ y: ~S => ~S" {100297EE03}>.
You do not have to manually recompile a test after a tested function has been modified. We will skip the proof.
- Edge Cases: Values expressions, loops, closures and calling other tests
Fiasco has no special functionality for dealing with values expressions. It accepts them but merely looks at the intial value provided by each expression. The following passes.
(deftest t2-values-expressions () (is (equalp (values 1 2) (values 1 3))))
- Looping and closures.
Fiasco has no problems with using variables declared in a closure surrounding the test.
- Calling another test from a test
(deftest t3 (); a test that tries to call another test in its body (is (eq 'a 'a)) (t2)) (t3) .XX. T #<test-run of T3: 2 tests, 4 assertions, 2 failures in 0.063884 sec (2 failed assertions, 0 errors, none expected)>
As hoped, the failures in t2 kicked us into the debugger where we could select continue and correctly end up with 2 tests, 4 assertions and 2 failures. This is better than most frameworks which would present us with two reports rather than a composed report.
- Looping and closures.
- Suites, tags and other multiple test abilities
- Lists of tests
Fiasco has no problem running lists of tests
(run-tests '(t1 t2)) T1........................................................................[ OK ] T2........................................................................[ OK ] T (#<test-run of T1: 1 test, 1 assertion, 0 failures in 1.4e-5 sec> #<test-run of T2: 1 test, 3 assertions, 0 failures in 5.0e-6 sec>)
- Suites
Fiasco can test for suites or all tests associated with a fiasco defined package.
Let's start with packages.
- Packages
You will need to use fiasco's
define-test-package
macro rather thandefine-package
in order to use therun-package-tests
function. Inside the new package, the functionrun-package-tests
is the preferred way to execute the suite. To run the tests from outside, userun-tests
.The function
run-package-tests
will print a report and returns two values. It accepts a :stream keyword parameter, making it easy to redirect the output to a file if so desired.The first value returned will be t if all tests passed, nil otherwise. The second value will be a list of context objects which contain various information about the test run. See the following example, modified slightly from https://github.com/joaotavora/fiasco/blob/master/test/suite-tests.lisp.
(fiasco:define-test-package #:tf-fiasco-examples) (in-package :tf-fiasco-examples) (defun seconds (hours-and-minutes) (+ (* 3600 (first hours-and-minutes)) (* 60 (second hours-and-minutes)))) (defun hours-and-minutes (seconds) (list (truncate seconds 3600) (truncate seconds 60))) (deftest test-conversion-to-hours-and-minutes () (is (equal (hours-and-minutes 180) '(0 3))) (is (equal (hours-and-minutes 4500) '(1 15)))) (deftest test-conversion-to-seconds () (is (= 60 (seconds '(0 1)))) (is (= 4500 (seconds '(1 15))))) (deftest double-conversion () (is (= 3600 (seconds (hours-and-minutes 3600)))) (is (= 1234 (seconds (hours-and-minutes 1234))))) (deftest test-skip-test () (skip) ;; These should not affect the test statistics below. (is (= 1 1)) (is (= 1 2))) (run-package-tests :package :tf-fiasco-examples) TF-FIASCO-EXAMPLES (Suite) TEST-CONVERSION-TO-HOURS-AND-MINUTES....................................[FAIL] TEST-CONVERSION-TO-SECONDS..............................................[ OK ] DOUBLE-CONVERSION.......................................................[FAIL] TEST-SKIP-TEST..........................................................[SKIP] Test run had 3 failures: Failure 1: FAILED-ASSERTION when running DOUBLE-CONVERSION Binary assertion function (= X Y) failed. x: 1234 => 1234 y: (SECONDS (HOURS-AND-MINUTES 1234)) => 1200 Failure 2: FAILED-ASSERTION when running DOUBLE-CONVERSION Binary assertion function (= X Y) failed. x: 3600 => 3600 y: (SECONDS (HOURS-AND-MINUTES 3600)) => 7200 Failure 3: FAILED-ASSERTION when running TEST-CONVERSION-TO-HOURS-AND-MINUTES Binary assertion function (EQUAL X Y) failed. x: (HOURS-AND-MINUTES 4500) => (1 75) y: '(1 15) => (1 15) NIL (#<test-run of TF-FIASCO-EXAMPLES: 5 tests, 6 assertions, 3 failures in 5.5e-4 sec (3 failed assertions, 0 errors, none expected)>)
You can drop the explanations of the failures by passing nil to
:describe-failures
(run-package-tests :package :tf-fiasco-examples :describe-failures nil)
There is an undocumented function
run-failed-tests
which looks at the last test run. My issue with this function is that it seems to need*debug-on-assertion-failure*
and*debug-on-unexpected-error*
set to T in order to work, which means that it forces me into the debugger whether I want it to or not. - Suites
Suites are created by the (defsuite) macro, but they are really just tests that call other tests. Phil Gold's original concern about suites in Stefil was "My only problem with the setup is that I don't see a way to explicitly assign tests to suites, aside from dynamically binding stefil::*suite*. Normally, the current suite is set by in-suite, which requires careful attention if you're jumping between different suites often. (A somewhat mitigating factor is that tests remember which suite they were created in, so the current suite only matters for newly-defined tests.)" I think that concern is just as valid in fiasco.
I find using suites in fiasco very confusing. Everything I looked at in quicklisp that used fiasco used run-package-tests rather than run-suite-tests. YMMV.
- Packages
- Lists of tests
- Fixtures and Freezing Data
None that I am aware of.
- Removing tests
Fiasco has the ability to delete tests, but it is not an exported function:
(fiasco::delete-test 't1)
- Sequencing, Random and Failure Only
Fiasco has a function
run-failed-tests
to run the tests that failed last time. - Skip Capability
- Random Data Generators
None
19.4. Additional Discussion
While Fiasco has a function to re-run failed tests, if I wanted to collect the names of the failing tests so that I could save them for some other purpose, I might do something like:
(defun collect-test-failure-names (package-name) "Runs a package test on the package and returns the names of the failing tests" (multiple-value-bind (x results) (run-package-tests :package package-name :describe-failures nil) (declare (ignore x)) (let ((result (first results))) (when (typep result 'fiasco::context) (loop for test-result in (fiasco::children-contexts-of result) when (fiasco::failures-of test-result) collect (fiasco::name-of (fiasco::test-of test-result)))))))
19.5. Who Uses Fiasco
At last count 24 libraries on quicklisp use Fiasco. If you have quicklisp, you can get a list of those with:
(ql:who-depends-on :fiasco)
20. fiveam
20.1. Summary
homepage | Edward Marco Baringer | BSD | 2020 |
Fiveam has a lot of market share. At the time of writing, it has 10 issues and 9 pull requests with no responses. The README at the github page is lacking but good documentation exists at common-lisp.net or the turtleware tutorial. There are also examples at Common Lisp Cookbook
Obviously with its market share there is a lot to like. It does report all the assertion failures in a test, allows user defined diagnostic messages with variables, interactive debugging is optional, it has suites and it runs lists of tests.
From a speed standpoint, it is either middle of the pack or, on a big test package and running in an emacs repl, vying with clunit for painfully slow. In such a case, if you are deciding between more tests with fewer assertions or fewer tests with more assertions, go with more tests and fewer assertions (but this is an emacs problem more than a fiveam problem). I do not know what using other editors would be like.
My wishlist for Fiveam is better fixture capability, the edge case ability to handle value expressions and variables declared in closures surrounding the test and get rid of all those blank lines in the failure reports.
20.2. Assertion Functions
is | is-false | finishes | signals | fail | pass | skip |
20.3. Usage
Generally speaking tests are called using the run
and run!
functions. If you set *run-test-when-defined*
to T, tests will be run as soon as they are defined (which includes hitting C-c C-c in the source code (assuming you are doing this in an editor with slime or sly or some such.
- Report Format
Fiveam will default to a reporting format. The format you get will depend on whether you call
run
orrun!
.run
provides a progress report using the typical dot/f/e format and which returns a list of the assertion result objects.run!
provides the progress report, more details on failure and does not return a list of the passing assertion result objects.
The following will allow you to turn off the progress report:
(let ((fiveam:*test-dribble* (make-broadcast-stream))) (fiveam:run! …))
To demonstrate the difference in the reports assume the following test that has a couple of passes and a couple of failures. We will insert a diagnostic string in the second assertion with a couple of variables to use in the string.
(test t1-fail "describe t1-fail" (let ((x 1) (y 2)) (is (eql 1 2)) (is (equal x y) "We deliberately ensured that the first parameters ~a is not equal to the second parameter ~a" x y) (is-false (eq 'b 'b)) (pass "I do not want to run this test of ~a but will say it passes anyway" '(= 1 1)) (skip "Skip the next test because reasons") (finishes (+ 1 2)) (signals division-by-zero (error 'floating-point-overflow))))
Now using the simple
run
, we get:(run 't1-fail) Running test T1-FAIL fff.ss.X (#<IT.BESE.FIVEAM::UNEXPECTED-TEST-FAILURE {10023556C3}> #<IT.BESE.FIVEAM::TEST-PASSED {10023550B3}> #<IT.BESE.FIVEAM::TEST-SKIPPED {1002354F43}> #<IT.BESE.FIVEAM::TEST-PASSED {1002354CF3}> #<IT.BESE.FIVEAM::TEST-FAILURE {1002354043}> #<IT.BESE.FIVEAM::TEST-FAILURE {1002353463}> #<IT.BESE.FIVEAM::TEST-FAILURE {1002352D03}>)
We can immediately see that
(run)
gives us a progress report showing f for failure, . for pass, s for skip and X for an error and a list of test-result objects. We can get more details, including the diagnostic messages using(run!)
.(run! 't1-fail) Running test T1-FAIL fff.ss.X Did 7 checks. Pass: 2 (28%) Skip: 1 (14%) Fail: 4 (57%) Failure Details: -------------------------------- T1-FAIL in S0 [describe t1-fail]: 2 evaluated to 2 which is not EQL to 1 -------------------------------- -------------------------------- T1-FAIL in S0 []: We deliberately ensured that the first parameters 1 is not equal to the second parameter 2 -------------------------------- -------------------------------- T1-FAIL in S0 []: (EQ 'B 'B) returned the value T, which is true -------------------------------- -------------------------------- T1-FAIL in S0 []: Unexpected Error: #<FLOATING-POINT-OVERFLOW {100236BE63}> arithmetic error FLOATING-POINT-OVERFLOW signalled. -------------------------------- Skip Details: T1-FAIL []: Skip the next test because reasons NIL (#<IT.BESE.FIVEAM::UNEXPECTED-TEST-FAILURE {100236C3C3}> #<IT.BESE.FIVEAM::TEST-FAILURE {100236AD43}> #<IT.BESE.FIVEAM::TEST-FAILURE {100236A163}> #<IT.BESE.FIVEAM::TEST-FAILURE {1002369D33}>) (#<IT.BESE.FIVEAM::TEST-SKIPPED {100236BC43}>)
Did you notice anything about the test results returned using
run
compared torun!
?run!
did not return any test-passed resultsPersonally I hate the immense amount of wasted space fiveam generates using
run!
.By the way, if we set
*verbose-failures*
to T, it will add the failing expression to the failure details.Fiveam does have optionality to drop into the debugger on errors or failures. You can set those individually:
(setf *on-error* :debug)
if we should drop into the debugger on error, :backtrace for backtrace or nil otherwise.(setf *on-failure* :debug)
if we should drop into the debugger on error, :backtrace for backtrace or nil otherwise.
- Basics
We already saw a test using Fiveam's testing functions above with some passes and fails above, so we will skip repeating ourselves.
As expected, you do not have to manually recompile a test after a tested function has been modified.
- Edge Cases: Values expressions, loops, closures and calling other tests
Now let's try a test with values expressions:
(test t2 ; the most basic named test with multiple assertions and values expressions "describe t2" (let ((x 1) (y 2)) (is (equal 1 2)) (is (equal x y)) (is (equal (values 1 2) (values 1 2))))) ; in: ALEXANDRIA:NAMED-LAMBDA %TEST-T2 ; (IT.BESE.FIVEAM:IS (EQUAL (VALUES 1 2) (VALUES 1 2))) ; ; caught ERROR: ; during macroexpansion of (IS (EQUAL # #)). Use *BREAK-ON-SIGNALS* to intercept. ; ; Both the expected and actual part is a values expression. ; ; compilation unit finished ; caught 1 ERROR condition
Fiveam threw an error on the values expression on compilation, but continued with the compiliation.
Now to run the failing test.
(run! 't2) Running test T2 ffX Did 3 checks. Pass: 0 ( 0%) Skip: 0 ( 0%) Fail: 3 (100%) Failure Details: -------------------------------- T2 [describe t2]: 2 evaluated to 2 which is not EQUAL to 1 -------------------------------- -------------------------------- T2 [describe t2]: Y evaluated to 2 which is not EQUAL to 1 -------------------------------- -------------------------------- T2 [describe t2]: Unexpected Error: #<SB-INT:COMPILED-PROGRAM-ERROR {102D38B283}> Execution of a form compiled with errors. Form: (IS (EQUAL (VALUES 1 2) (VALUES 1 2))) Compile-time error: during macroexpansion of (IS (EQUAL # #)). Use *BREAK-ON-SIGNALS* to intercept. Both the expected and actual part is a values expression.. -------------------------------- NIL (#<IT.BESE.FIVEAM::UNEXPECTED-TEST-FAILURE {102D38BB73}> #<IT.BESE.FIVEAM::TEST-FAILURE {102D38B1B3}> #<IT.BESE.FIVEAM::TEST-FAILURE {102D38AD93}>) NIL
Notice the ffX in the report. The f indicate failing assertions. The X indicates that the assertion threw an error instead of failing. No, fiveam does not like value expressions.
What happens if we try to call a test inside a test?
(test t3 "a test that tries to call another test in its body" (is (equal 'a 'a)) (run! t2)) (run! 't3) Running test T3 . Running test T2 ..X Did 3 checks. Pass: 2 (66%) Skip: 0 ( 0%) Fail: 1 (33%) Failure Details: -------------------------------- T2 in S1 [describe t2]: Unexpected Error: #<SB-INT:COMPILED-PROGRAM-ERROR {1008EE35A3}> Execution of a form compiled with errors. Form: (IS (EQUAL (VALUES 1 2) (VALUES 1 2))) Compile-time error: during macroexpansion of (IS (EQUAL # #)). Use *BREAK-ON-SIGNALS* to intercept. Both the expected and actual part is a values expression.. -------------------------------- Did 1 check. Pass: 1 (100%) Skip: 0 ( 0%) Fail: 0 ( 0%) T
So we can run tests within tests.
- Suites, tags and other multiple test abilities
- Lists of tests
Fiveam can run lists of tests
(run! '(t1 t2))
- Suites
Suites are relatively straight forward in fiveam so long as you remember that you need to define yourself as being in a suite and any test defined after that will be placed in that suite. I surprised myself once after compiling a test file and then defining some tests in the REPL. As far as fiveam was concerned I was still in the suite defined in the test file, so the the tests defined in the REPL had been added to the suite.
(def-suite :s0 ; Ultimate parent suite :description "describe suite 0") (in-suite :s0) ;; Any test defined after this will be in suite s0 until a new suite is specified (test t4 ; a test that is a member of a suite "describe t4" (is (equal 1 1))) (run! :s0)
Suites can be nested. Here we have suite :s1 that is nested in suite :s0
(def-suite :s1 :in :s0)
- Lists of tests
- Fixtures and Freezing Data
As far as I can tell, fixtures and freezing data are basically the same for Fiveam. The fiveam maintainer admits that maybe it's fixture capability is not "the best designed feature".
;; Create a class for data fixture purposes (defclass class-A () ((a :initarg :a :initform 0 :accessor a) (b :initarg :b :initform 0 :accessor b))) (defparameter *some-existing-data-parameter* (make-instance 'class-A :a 17.3 :b -12)) (def-fixture f1 () (let ((old-parameter *some-existing-data-parameter*)) (setf *some-existing-data-parameter* (make-instance 'class-A :a 100 :b -100)) (&body) (setf *some-existing-data-parameter* old-parameter))) (def-test t6-f1 (:fixture f1) (is (equal (a *some-existing-data-parameter*) 100)) (is (equal (b *some-existing-data-parameter*) -100))) ;; now you can check (a *some-existing-data-parameter*) to ensure defining the test has not changed *some-existing-data-parameter* (run! 't6-f1) Running test T6-F1 .. Did 2 checks. Pass: 2 (100%) Skip: 0 ( 0%) Fail: 0 ( 0%)
- Removing tests
Fiveam has the functions
rem-test
andrem-fixture
- Sequencing, Random and Failure Only
The tests are randomly shuffled. The
run!
function will return a list of failed-test objects (therun
function does not). - Skip Capability
Fiveam does some skip capability
- Random Testing and Data Generators
Fiveam Generates lambda functions for buffers, character, float, integer, list, one-element, string and tree character, string and lists. Some examples:
(funcall (gen-float)) 1.3259344e38 (funcall (gen-buffer)) #(115 238 129 72 84 40 230) (funcall (gen-character :code-limit 256)) #\Etx (funcall (gen-integer :max 27 :min -16)) -4 (funcall (gen-list )) (-1 4) (funcall (gen-string)) "𣻨𪽉鞀蠿𬚽𬁬㍨𠫼㽈鉄" (funcall (gen-string :elements (gen-character :code-limit 122 :alphanumericp t))) "exAarlUllrgsQZQAnUYeKIbZQuPYAKNLvTyMcIYlLoYS" (funcall (gen-tree :size 10)) ((((-2 ((-3 6) (2 ((3 (6 (10 10))) ((10 4) -9))))) (-8 -8)) (((-7 8) -3) (-10 ((((1 -5) (6 ((-9 -6) 4))) ((5 -9) (0 (-4 -8)))) -2)))) (((((9 (5 ((3 -1) ((0 -10) -5)))) (((4 (7 -8)) (-5 (6 7))) -4)) -3) (6 (2 ((-5 6) (2 (((9 -1) -5) -5)))))) 6))
20.4. Discussion
If you recall, run
returns a list of all test-result objects, but run!
returns just the failing test-result objects. If you wanted to use run
, but just wanted a list of the failing test names, you can do something like the following:
(defun collect-failing-test-case-names (suite) "Takes a suite, calls the run function and returns a list of the test names that failed." (loop for x in (run suite) when (typep x 'fiveam::test-failure) collect (fiveam::name (fiveam::test-case x)))) (collect-failing-test-case-names :s0) Running test suite S0 Running test T4 . Running test T4-ERROR ..f Running test T4-FAIL f Running test T6-F1 .. Running test T5 f Running test T4-FAIL-2 f (T4-ERROR T4-FAIL T5 T4-FAIL-2)
Fiveam does not have a time limit threshold that you can set like Parachute or Prove, but you can set a *max-trials*
variable to prevent infinite loops. It also has undocumented profiling capability that I did not look at.
21. lift
21.1. Summary
homepage | Gary Warren King | MIT | 2019 (c) |
Documentation for Lift can be found here, but a lot of sections are "To be written".
The original Phil Gold review noted his concerns about speed and memory footprint: "The larger problem, though, was its speed and memory footprint. Defining tests is very slow; when using LIFT, the time necessary to compile and load all of my Project Euler code jumped from the other frameworks' average of about 1.5 minutes to over nine minutes. Redefining tests felt even slower than defining them initially, but I don't have solid numbers on that. After loading everything, memory usage was more than twice that of other frameworks. Running all of the tests took more than a minute longer than other frameworks, though that seems mostly to be a result of swapping induced by LIFT's greater memory requirements."
I did not benchmark compiling tests, just running the tests and as you can see from the benchmarks, lift is one of the fastest frameworks. His concern on runtime was not borne out in my benchmark, but uax-15 is also very different from Project Euler. YMMV.
There is a lot I like about Lift and there are undocumented features that you could spent a few days exploring.
There are two annoyances.
- Multiple assertion problem:. If you have multiple assertions in a test, the tests stop at the first assertion failure. I can understand that if the intent is to get thrown into the debugger and fix the failure immediately, but not when you are running reports. There are reasons why I would put multiple assertions into a test. The obvious work around is only one assertion per test. You then have to create possibly hundreds of suites and then use
addtest
to add the test to your suite. - Clumsy failure reporting
Lift has both hierarchical suites and tags, what it calls categories.
21.2. Assertion Functions
ensure | ensure-cases |
ensure-cases-failure | ensure-condition |
ensure-different | ensure-directories-exist |
ensure-directory | ensure-error |
ensure-every | ensure-expected-condition |
ensure-expected-no-warning-condition | ensure-failed |
ensure-failed-error | ensure-function |
ensure-generic-function | ensure-list |
ensure-member | ensure-no-warning |
ensure-not-same | ensure-null |
ensure-null-failed-error | ensure-random-cases |
ensure-random-cases+ | ensure-random-cases-failure |
ensure-same | ensure-some |
ensure-string | ensure-symbol |
ensure-warning |
21.3. Usage
Unlike most frameworks, lift provides variables for *test-maximum-error-count*
, *test-maximum-failure-count*
and *test-maximum-time*
which can be set so that large sets of failing tests can be shutdown down early without wasting time.
Lift has a lot of undocumented functionality. For example, you can generate log-entries which seems to have something to do with sample counts and profiling, but those have no documentation either.
- Report Format
Like some other frameworks, Lift will run tests as you compile them. If you use
run-test
(singular) orrun-tests
(plural), you get a very limited amount of information that will look something like this:(run-test :name 't4-s1-1) #<S1.T4-S1-1 failed> (run-tests) Start: S0 #<Results for S0 1 Tests, 1 Failure>
If you
(setf *test-describe-if-not-successful?* t)
you get a lot more information, but you are really just running(describe)
on the test result object. We run only the single test version this time:(run-test :name 't4-s1-1) #<S1.T4-S1-1 failed Failure: s1 : t4-s1-1 Documentation: NIL Source : /tmp/slimeBSclUz Condition : Ensure failed: (= 2 3) () During : (END-TEST) Code : ( ((ENSURE (= 2 3)) (ENSURE (= 4 4)) (ENSURE (EQL 'B 'C)))) >
Lift can print also test result details. The first parameter is the stream to which to direct the output, the second parameter is getting the test result. The third parameter is show-expected-p and the fourth is show-code-p. All parameters must be provided.
(print-test-result-details *standard-output* (run-test :name 't1-fail) t t) Failure: tf-lift : t1-fail Documentation: NIL Source : NIL Condition : Ensure failed: (= X Y) (This test was meant to fail because 1 is not = 2) During : (END-TEST) Code : ( ((LET ((X 1) (Y 2)) (ENSURE (= X Y) :REPORT "This test was meant to fail because ~a is not = ~a" :ARGUMENTS (X Y)))))
The function
run-tests
takes a keyword parameter:report-pathname
which will direct a substantial amount of information to the designated file. The following example runs all the tests associated with suite s0 (either directly or indirectly). You can also set the variable*lift-report-pathname*
to a pathname. Any subsequent failure reports will be printed there. Lift does not have progress reports if that is important to you.(run-tests :suite 's0 :report-pathname #P "/tmp/lift-1.txt")
Opening that file may show results looking something like:
((:RESULTS-FOR . S0) (:ARGUMENTS . (:SUITE ("S0" . "TF-LIFT") :REPORT-PATHNAME #P"/tmp/lift-1.txt")) (:FEATURES . (:HUNCHENTOOT-SBCL-DEBUG-PRINT-VARIABLE-ALIST :5AM :OSICAT-FD-STREAMS :ITER :NAMED-READTABLES :UTF-32 :TOOT :SBCL-DEBUG-PRINT-VARIABLE-ALIST :SPLIT-SEQUENCE CFFI-FEATURES:FLAT-NAMESPACE CFFI-FEATURES:X86-64 CFFI-FEATURES:UNIX :CFFI CFFI-SYS::FLAT-NAMESPACE :FLEXI-STREAMS :CL-FAD :CHUNGA :LISP-UNIT :CLOSER-MOP :CL-PPCRE :BORDEAUX-THREADS ALEXANDRIA::SEQUENCE-EMPTYP :THREAD-SUPPORT :SWANK :QUICKLISP :ASDF3.3 :ASDF3.2 :ASDF3.1 :ASDF3 :ASDF2 :ASDF :OS-UNIX :NON-BASE-CHARS-EXIST-P :ASDF-UNICODE :X86-64 :GENCGC :64-BIT :ANSI-CL :COMMON-LISP :ELF :IEEE-FLOATING-POINT :LINUX :LITTLE-ENDIAN :PACKAGE-LOCAL-NICKNAMES :SB-LDB :SB-PACKAGE-LOCKS :SB-THREAD :SB-UNICODE :SBCL :UNIX)) (:DATETIME . 3831119803) ) ( (:SUITE . ("S0" . "TF-LIFT")) (:NAME . ("T4-S0-1" . "TF-LIFT")) (:START-TIME . 3831119803000) (:END-TIME . 3831119803000) (:SECONDS . 0.0d0) (:CONSES . 0) (:RESULT . T) ) ( (:SUITE . ("S0" . "TF-LIFT")) (:NAME . ("T4-S0-2" . "TF-LIFT")) (:START-TIME . 3831119803000) (:END-TIME . 3831119803000) (:PROBLEM-KIND . "failure") (:PROBLEM-STEP . :END-TEST) (:PROBLEM-CONDITION . "#<ENSURE-FAILED-ERROR {100DA7E143}>") (:PROBLEM-CONDITION-DESCRIPTION . "Ensure failed: (= 1 2) ()") ) ( (:TEST-CASE-COUNT . 2) (:TEST-SUITE-COUNT . 1) (:FAILURE-COUNT . 1) (:ERROR-COUNT . 0) (:EXPECTED-FAILURE-COUNT . 0) (:EXPECTED-ERROR-COUNT . 0) (:SKIPPED-TESTSUITES-COUNT . 0) (:SKIPPED-TEST-CASES-COUNT . 0) (:START-TIME-UNIVERSAL . 3831119803) (:END-TIME-UNIVERSAL . 3831119803) (:FAILURES . ((("S0" . "TF-LIFT") ("T4-S0-2" . "TF-LIFT")))))
Adding a test and compiling it will, as noted, cause it to be run immediately, but all you get is pass or fail. Let's try to get a little more information by using
describe *
.(addtest (s1) t4-s1-4 (ensure (= 3 4))) #<Test failed> (describe *) Test Report for S1: 1 test run, 1 Failure. Failure: s1 : t4-s1-4 Documentation: NIL Source : NIL Condition : Ensure failed: (= 3 4) () During : (END-TEST) Code : ( ((ENSURE (= 3 4)))) Test Report for S1: 1 test run, 1 Failure.
I would note that in tests with multiple assertions, Lift only shows the first failure, not all failures. That is a real problem for me because I want the results of multiple assertions if I am tracking down one of my many bugs.
To go interactive - dropping immediately into the debugger, you would set one or more key word parameter based on what condition should throw you into the debugger. The following will throw you into the debugger on failures but not on errors.
(run-test :name 't1-function :break-on-errors? nil :break-on-failures? t)
- Basics
Lift really wants a suite to be defined first before any tests are defined. The first form after the suite name would contain the name of a parent suite (if any). The second form would be used for suite slot specifications which are used with fixtures and will be discussed below.
(deftestsuite tf-lift () ())
Starting with the most basic named test. This adds a test to the most recently defined suite or you can insert a form before the test name which specifies which suite for the test
(addtest t1 (ensure (equal 1 1)) (ensure-condition division-by-zero (error 'division-by-zero)) (ensure-same 1 1) (ensure-different '(1 2 3) '(1 3 4)))
Besides running the test on compilation, we can also run the test using the (run-test function), specifying the test name with the keyword parameter :name.
(run-test :name 't1) #<TF-LIFT.T1 passed>
Now adding a basic failing test. Let's make sure we get a bit more information on failing tests by inserting a report keyword with a descriptive string with format like parameters and an arguments keyword with parameters to pass to the report keyword. Then we use
describe
against the results of running the test.(addtest t1-fail (let ((x 1) (y 2)) (ensure (= x y) :report "This test was meant to fail because ~a is not = ~a" :arguments (x y)))) (describe (run-test :name 't1-fail))) Test Report for TF-LIFT: 1 test run, 1 Failure. Failure: tf-lift : t1-fail Documentation: NIL Source : NIL Condition : Ensure failed: (= X Y) (This test was meant to fail because 1 is not = 2) During : (END-TEST) Code : ( ((LET ((X 1) (Y 2)) (ENSURE (= X Y) :REPORT "This test was meant to fail because ~a is not = ~a" :ARGUMENTS (X Y))))) Test Report for TF-LIFT: 1 test run, 1 Failure.
As one would hope, you do not need to manually recompile a test just because a tested function is modified.
- Edge Cases: Multiple failing assertions, Values expressions, loops, closures and calling other tests
- Multiple assertions and Value expressions
First checking a test with multiple assertions. The answer is yes and no, which surprised me.
(addtest t2 (ensure (= 1 2)) (ensure (= 2 3))) (print-test-result-details *standard-output* (run-test :name 't2) t t) Failure: tf-lift : t2 Documentation: NIL Source : NIL Condition : Ensure failed: (= 1 2) () During : (END-TEST) Code : ( ((ENSURE (= 1 2)) (ENSURE (= 2 3))))
Obviously we expected the test to fail. But I expected two assertions to be shown as failing, not only the first one. I can understand that if the intent is to just throw me into the debugger on the first failure, but not in a reporting situation.
Lift has no special functionality for dealing with values expressions. It accepts them but merely looks at the intial value provided by each expression.
- Now looping and closures.
Checking whether Lift can run tests using variables declared in a closure encompassing the test. Yes.
(let ((l1 '(#\a #\B #\z)) (l2 '(97 66 122))) (addtest t2-loop (loop for x in l1 for y in l2 do (ensure (= (char-code x) y))))) #<Test passed>
- Calling another test from a test
We know tests are not functions in Lift, but can a test call another test in its body? We know test t2 should fail.
(addtest t3 (ensure (eql 'a 'a)) (run-test :name 't2)) (run-test :name 't3) #<TF-LIFT.T3 passed>
It does not look like t3 actually called t2.
- Multiple assertions and Value expressions
- Suites, tags and other multiple test abilities
- Lists of tests
Lift cannot run lists of tests outside the suite functionality.
- Suites
We already know that we had to set up a suite for tests, in our case a suite named tf-lift (print-tests)
Unlike most test frameworks, lift actually provides a function which will print out the names of the tests included in the suite.
(print-tests :start-at 'tf-lift) TF-LIFT (10) T1 T1-FAIL T1-FUNCTION T2 T2-VALUES T2-LOOP T2-WITH-MULTIPLE-VALUES T3 T7-ERROR T7-BAD-ERROR
Let's start with the same simple inheritance structure we have been using with other frameworks.
(deftestsuite s0 () ()) ;; a test that is a member of a suite because it is defined after a defsuite (addtest t4-s0-1 (ensure-same 1 1)) ;; Add another test suite (deftestsuite s1 () ()) ;; add another test, but preface the name with s0 in the form, making this test part of suite s0 (addtest (s0) t4-s0-2 (ensure (= 1 2))) ;; add a test, specifying that this one is part of suite s1 (addtest (s1) t4-s1-1 (ensure (= 2 3))) ;; Now run tests for suite s0 and s1 respectively and we see that s0 does indeed have two tests and suite s1 has one test. (run-tests :suite 's0) Start: S0 #<Results for S0 2 Tests, 1 Failure> (run-tests :suite 's1) Start: S1 #<Results for S1 1 Test, 1 Failure>
We now define suite s2 which is a child suite of s0 and add a test
(deftestsuite s2 (s0) ()) (addtest (s2) t4-s2-1 (ensure (= 1 1)) (ensure (eq 'a 'a)))
If we now apply RUN-TESTS to suite s0, we see that it runs both s2 and s0
(run-tests :suite 's0) Start: S0 Start: S2 #<Results for S0 3 Tests, 1 Failure>
If we run it with a :report-pathname keyword parameter set, we can get a lot more information sent to a file:
(run-tests :suite 's0 :report-pathname #P "/tmp/lift-1.txt")
- Lists of tests
- Fixtures and Freezing Data
Variables can be created and set at the suite level, making those variables available down the suite inheritance chain.
(deftestsuite s3 () ((a 1) (b 2) (c 3))) (addtest t4-s3-1 (ensure-same 1 a)) (deftestsuite s4 (s3) ((d 4))) (addtest t4-s4-1 (ensure-same 2 b) (ensure-same 4 d)) (addtest (s3) t4-s3-2 (ensure-same 2 b))
These all pass because the tests can see the variables creates in their suite and the parent suites. If we created a test in suite s3 that tried to reference the variable d creates in suite s4 (a lower level suite), we would get an undefined variable error.
Suites also have :setup and :takedown keyword parameters and an additional :run-setup parameter that controls when the setup provisions are performed. The default is :once-per-test-case (setup again for each and every test in the suite). The other alternatives are :once-per-suite and :never.
(deftestsuite s5 ()() (:setup (setf db (open-data "bar" :if-exists :supersede))) (:teardown (setf db nil)) (:run-setup :once-per-test-case))
- Removing tests
Tests and suites can be removed using
remove-tests
(remove-test :suite 's2) (remove-test :test-case 't1)
- Sequencing, Random and Failure Only
Do the tests in a suite run in sequential order, randomly or is it optional? Failure only testing (just running all the tests that failed last time) is nice to have, but of course you still need to be able to run everything at the end to ensure that fixing one bug did not create another.
- Skip Capability
- Random Data Generators
Lift has various random data generators:
;; (random-number suite min max) (random-number 's4 1 100) ;; (random-element suite sequence) (random-element 's4 '(a b c 23))
If anyone can give a good example of the use of the
DEFRANDOM-INSTANCE
macro besides what is in the random-testing file, feel free to submit a pull request.
21.4. Discussion
I really want to like lift, but I really have a hard time getting over the fact that it stops at the first assertion failure.
21.5. Who Uses Lift
Many libraries on quicklisp use Lift. If you have quicklisp, you can get a list of those with:
(ql:who-depends-on :lift)
22. lisp-unit
22.1. Summary
homepage | Thomas M. Hermann | MIT | 2017 |
Phil Gold's original concerns about lisp-unit back in 2007 was its failure to scale. Specifically he pointed to non-composable test reports, the fact that you could not get just failure reports, so the failures were lost in the reporting of successful reports, and there is no count of failed tests. I think those have been addressed with the tags capabilities. Lisp-unit still focuses on counting assertions rather than tests, but you can now collect information on just the failing tests.
I generally like lisp-unit. It does not have progress reports which might bother some people. My bigger concern is its lack of fixtures and you can turn on debugging only for errors, not for failures. If you need floating point tests, those are built-in. Documentation can be found at the wiki. top
22.2. Assertion Functions
assert-eq | assert-eql | assert-equal |
assert-equality | assert-equalp | assert-error |
assert-expands | assert-false | assert-float-equal |
assert-nil | assert-norm-equal | assert-number-equal |
assert-numerical-equal | assert-prints | assert-rational-equal |
assert-result | assert-sigfig-equal | assert-test |
assert-true | check-type | logically-equal |
set-equal |
22.3. Usage
- Report Format
Lisp-unit defaults to a reporting format shown below. You can do
(setf *use-debugger* :ask)
or(setf *use-debugger* t)
, but that will only throw you into the debugger if there is an actual error generated, not a failure (or failure to see the correct error). So, not complete debugger optionality.Lisp-unit will normally just count assertions passed, failed, and execution errors and report those. You will see in the first failing test examples how to get more information. You can hav also have it kick you into the debugger on errors by calling
use-debugger
. This only applies to errors and not failures.Calling
run-tests
will return an instance of a test-results-db object. You can get a list of failed test objects with thefailed-tests
function which also accepts an optional stream, allowing easy printing to a file:(failed-tests (run-tests :all) optional-stream)
You can print the detailed failure information using the
print-failures
function:(print-failures (run-tests :all))
Lisp-unit also has
print
andprint-errors
which also take an optional stream.If you like the TAP format, Lisp-unit also has
(write-tap-to-file test-results path)
and(write-tap test-results [stream])
. - Basics
First, the basic test where we know everything is going to pass. Since Lisp-unit has macro expand and floating point assertion functions, we will show those in this example (so we need a macro just for the macroexpand test). See https://github.com/OdonataResearchLLC/lisp-unit/blob/master/extensions/floating-point.lisp and https://github.com/OdonataResearchLLC/lisp-unit/blob/master/extensions/rational.lisp for more information on the floating point and rational tests.
(defmacro my-macro (arg1 arg2) (let ((g1 (gensym)) (g2 (gensym))) `(let ((,g1 ,arg1) (,g2 ,arg2)) "Start" (+ ,g1 ,g2 3)))) (define-test t1 "describe t1" (assert-true (= 1 1)) (assert-equal 1 1) (assert-float-equal 17 17.0000d0) (assert-rational-equal 3/2 3/2) (assert-true (set-equal '(a b c) '(b a c))) ;every element in both sets needs to be in the other (assert-true (logically-equal t t)) ; both true or both false (assert-true (logically-equal nil nil)) ; both true or both false (assert-expands (let ((#:G1 A) (#:G2 B)) "Start" (+ #:G1 #:G2 3)) (my-macro a b)) (assert-prints "12" (format t "~a" 12)) (assert-error 'division-by-zero (error 'division-by-zero) "testing condition assertions"))
Now run this test:
(run-tests '(t1)) Unit Test Summary | 10 assertions total | 10 passed | 0 failed | 0 execution errors | 0 missing tests #<TEST-RESULTS-DB Total(6) Passed(6) Failed(0) Errors(0)>
Now a basic failing test.
(define-test t1-fail "describe t1-fail" (let ((x 1) (y 2)) (assert-true (= x y)) (assert-equal x y) (assert-expands (let ((#:G1 D) (#:G2 B)) "Start" (+ #:G1 #:G2 3)) (my-macro a b)) (assert-prints "12" (format nil "~a" 12)) (assert-error 'division-by-zero (error 'floating-point-overflow) "testing condition assertions"))) (run-tests '(t1-fail)) Unit Test Summary | 5 assertions total | 0 passed | 5 failed | 0 execution errors | 0 missing tests
That told us assertions failed, but did not give a lot of information. Let's change the setup slightly, setting
*PRINT-FAILURES*
to t. (You can also print info just on errors by setting*PRINT-FAILURES*
to nil and*PRINT-ERRORS*
to t.)(setf *print-failures* t) (run-tests '(t1-fail)) | Failed Form: (ERROR 'FLOATING-POINT-OVERFLOW) | Should have signalled DIVISION-BY-ZERO but saw #<FLOATING-POINT-OVERFLOW {100E6D4D13}> | "testing condition assertions" => "testing condition assertions" | | Failed Form: (FORMAT NIL "~a" 12) | Should have printed "12" but saw "" | | Failed Form: (MY-MACRO A B) | Should have expanded to (LET ((#:G1 D) (#:G2 B)) "Start" (+ #:G1 #:G2 3)) but saw (LET ((#:G1 A) (#:G2 B)) "Start" (+ #:G1 #:G2 3)); T | | Failed Form: Y | Expected 1 but saw 2 | | Failed Form: (= X Y) | Expected T but saw NIL | X => 1 | Y => 2 | T1-FAIL: 0 assertions passed, 5 failed. Unit Test Summary | 5 assertions total | 0 passed | 5 failed | 0 execution errors | 0 missing tests #<TEST-RESULTS-DB Total(5) Passed(0) Failed(5) Errors(0)>
That gives more information, but notice the slight difference between the information provided for assert-equal - Failed Form: Y and the information provided for assert-true - Failed Form: (= X Y).
We can get still more if we pass more info to the assertion clause. While the assertions compares the following two items, we can pass more information that it will print on failures. Unlike some other frameworks, we cannot pass a diagnostic string which accepts interpolated variables, but we can pass a string and variables. This time we will reduce the test to just the assert-equal clause.
(define-test t1-fail-short (let ((x 1) (y 2)) (assert-equal x y "Diagnostic Message: X ~a should equal Y ~a" x y))) (run-tests '(t1-fail-short)) | Failed Form: Y | Expected 1 but saw 2 | "Diagnostic Message: X should equal Y" => "Diagnostic Message: X should equal Y" | X => 1 | Y => 2 | T1-FAIL: 0 assertions passed, 1 failed. Unit Test Summary | 1 assertions total | 0 passed | 1 failed | 0 execution errors | 0 missing tests
Of course, the usefulness of the diagnostic message will depend on the context.
You do not have to manually recompile a test after a tested function has been modified.
- Edge Cases: Loops. closures and calling other tests
- Value expressions
Lisp-Unit has a pleasant surprise with respect to values expressions. Unlike almost all the other frameworks Lisp-unit and Lisp-unit2 actually look at all the values in the values expressions:
(define-test t2-values-expressions (assert-equal (values 1 2) (values 1 3)) (assert-equal (values 1 2 3) (values 1 3 2))) (print-failures (run-tests '(t2-values-expressions))) Unit Test Summary | 2 assertions total | 0 passed | 2 failed | 0 execution errors | 0 missing tests | Failed Form: (VALUES 1 3 2) | Expected 1; 2; 3 but saw 1; 3; 2 | | Failed Form: (VALUES 1 3) | Expected 1; 2 but saw 1; 3 | T2-VALUES-EXPRESSIONS: 0 assertions passed, 2 failed.
- Closure Variables
Lisp-Unit will not see the variables declared in a closure surrounding the test function, so the following would fail.
(let ((l1 '(#\a #\B #\z)) (l2 '(97 66 122))) (define-test t2-loop (loop for x in l1 for y in l2 do (assert-equal (char-code x) y))))
- Calling another test from a test
While tests are not functions in lisp-unit, they can call other tests.
(define-test t3 (); a test that tries to call another test in its body "describe t3" (assert-equal 'a 'a) (run-tests '(t2))) T3 LISP-UNIT-EXAMPLES> (run-tests '(t3)) | Failed Form: 3 | Expected 2 but saw 3 | | Failed Form: 2 | Expected 1 but saw 2 | T2: 1 assertions passed, 2 failed. Unit Test Summary | 3 assertions total | 1 passed | 2 failed | 0 execution errors | 0 missing tests T3: 0 assertions passed, 0 failed. Unit Test Summary | 0 assertions total | 0 passed | 0 failed | 0 execution errors | 0 missing tests
A bit of a surprise here. Test t3 does call test t2 but does not track its own assertion. If you reverse the order so that t3's assertions come after the call to run-tests on t2, then it does work properly. In neither case are the situations composed.
- Value expressions
- Suites, tags and other multiple test abilities
Lisp-unit uses both packages and tags rather than suites. That provides a bit more flexibility in terms of reusing tests in different situations, but does not create the automatic inheritance that some people like.
- Lists of tests
Lisp-unit can run lists of tests
(run-tests '(t1 t2))
Lisp-unit makes it easy to get a list of the names of the failing tests which you can then save and
run-tests
against.Run-tests
returns atest-results-db
object. Just callfailed-tests
on that to get a list of the names of the tests that failed in that run. Thenrun-tests
against that smaller list. - Packages
Assuming you have set up your tests in a separate package (that package can cover many application packages) and that test package is the current package, you can run all the tests in the current package with:
(run-tests :all)
The reporting can get confusing as in this sample run with
*print-failures*
set to nil:(run-tests :all) Diagnostic Message: X 1 should equal Y 2 Unit Test Summary | 3 assertions total | 1 passed | 2 failed | 0 execution errors | 0 missing tests Unit Test Summary | 13 assertions total | 6 passed | 7 failed | 0 execution errors | 0 missing tests #<TEST-RESULTS-DB Total(13) Passed(6) Failed(7) Errors(0)>
Why do we have two Unit Test Summaries having different numbers of tests and assertions? If you recall, test t3 calls test t2 and that first summary is a secondary summary from that call.
To run all the tests in a non-current package, add the name of the package after the keyword parameter :all
(lisp-unit:run-tests :all :date-tests)
You can list the names of all the tests in a package
(list-tests [package])
- Tags
As noted, lisp-unit provides the ability to define tests with multiple tags:
(define-test foo "This is the documentation." (:tag :tag1 :tag2 symtag) exp1 exp2 ...)
So assume three tests that we want tagged differently:
(define-test t6-1 "Test t6-1 tagged simple and complex" (:tag :simple :complex) (assert-true (= 1 1 1))) (define-test t6-2 "Test t6-2 tagged simple only" (:tag :simple) (assert-equal 1 1)) (define-test t6-3 "Test t6-2 tagged simple only" (:tag :complex) (assert-equal 'a 'a))
Then using
run-tages
does what we expect. We will set*print-summary*
t for simplicity.(setf *print-summary* t) (run-tags '(:simple)) T6-2: 1 assertions passed, 0 failed. T6-1: 1 assertions passed, 0 failed. Unit Test Summary | 2 assertions total | 2 passed | 0 failed | 0 execution errors | 0 missing tests #<TEST-RESULTS-DB Total(2) Passed(2) Failed(0) Errors(0)> LISP-UNIT> (run-tags '(:complex)) T6-3: 1 assertions passed, 0 failed. T6-1: 1 assertions passed, 0 failed. Unit Test Summary | 2 assertions total | 2 passed | 0 failed | 0 execution errors | 0 missing tests
Tags can be listed with
(LIST-TAGS [PACKAGE])
.TAGGED-TESTS
returns the tests associated with the listed tags. All tagged tests are returned with no arguments or if the keyword :all is provided instead of a list of tags. Use*package*
if package is not specified.(tagged-tests '(tag1 tag2 ...) [package]) (tagged-tests :all [package]) (tagged-tests)
- Lists of tests
- Fixtures and Freezing Data
None
- Removing tests
Lisp-unit has both
remove-tests
andremove-tags
functions. - Sequencing, Random and Failure Only
- Skip Capability
None
- Random Data Generators
Lisp-unit2 has various functions for generating random data. See examples below:
(complex-random #C(5 3)) #C(4 1) (make-random-2d-array 2 3) #2A((0.03395796 0.55509293 0.34209597) (0.5823394 0.8771157 0.29430425)) (make-random-2d-list 2 3) ((0.18096626 0.916595 0.88126934) (0.45945048 0.8838378 0.57314146)) (make-random-list 3) (0.5449568 0.32319236 0.7780224) (make-random-state) #S(RANDOM-STATE :STATE #.(MAKE-ARRAY 627 :ELEMENT-TYPE '(UNSIGNED-BYTE 32) :INITIAL-CONTENTS '(0 2567483615 454 2531281407 4203062579 3352536227 284404050 622556438 ...)))
22.4. Who Uses Lisp-Unit
Many libraries on quicklisp use Lisp-Unit. If you have quicklisp, you can get a list of those with:
(ql:who-depends-on :lisp-unit)
23. lisp-unit2
23.1. Summary
homepage | Russ Tyndall | MIT | 2018 |
I really like Lisp-Unit2. Phil Gold's original concerns about Lisp-Unit back in 2007 was its failure to scale. Specifically he pointed to non-composable test reports, the fact that you could not get just failure reports, so the failures were lost in the reporting of successful reports, and there is no count of failed tests. I think those have been addressed with the tags capabilities.
Unlike the situation with Clunit and Clunit2 which are almost identical, Lisp-Unit and Lisp-Unit2 have definitely diverged over the years. Lisp-unit2 has fixtures and can run just the previously failing tests and you can turn on debugging for failures as well as errors. It does not have progress reports which might bother some people. If you need floating point tests, those are built-in.
It will report all the assertion failures in a test, gives the opportunity to provide user generated diagnostic messages in assertions and has a tags system that allows different ways to re-use tests not found in the typical hierarchical setup. It can re-run failed tests and interactive debugging is optional. I did find the fixture structure to be confusing.
23.2. Assertion Functions
Lisp-unit2 has more assertions than just about anyone. Using the correct assertions will increase speed and efficiency. Whether that matters will, of course, depend on your test case.
assert= | assert/= | |
asssert-char= | assert-char-equal | assert-char/= |
assert-char-not-equal | ||
assert-eq | assert-eql | assert-equal |
assert-equality | assert-equalp | assert-error |
assert-expands | assert-fail | assert-false |
assert-float-equal | assert-no-error | assert-no-signal |
assert-no-warning | assert-norm-equal | assert-number-equal |
assert-numerical-equal | assert-passes? | assert-prints |
assert-rational-equal | assert-sigfig-equal | assert-signal |
assert-string= | assert-string-equal | assert-string/= |
assert-string-not-equal | ||
assert-true | assert-typep | assert-warning |
assertion-fail | assertion-pass | check-type |
logically-equal |
23.3. Usage
If you store the results of a test run, you can call rerun-failures
on those results to just rerun the failing tests rather than go through all the tests again.
- Report Format
You can choose to run one test or more than one test. When running more than one test, you can choose to have the summary provided at the end or have progress summary reports generated during the test run.
As an example, the following will generate summary reports during a test run:
(run-tests :package :uax-15-lisp-unit2-tests :run-contexts #'with-summary-context) ------- STARTING Testing: UAX-15-LISP-UNIT2-TESTS Starting: UAX-15-LISP-UNIT2-TESTS::PART0-NFKC UAX-15-LISP-UNIT2-TESTS::PART0-NFKC - PASSED (0.01s) : 100 assertions passed Starting: UAX-15-LISP-UNIT2-TESTS::PART1-NFKC UAX-15-LISP-UNIT2-TESTS::PART1-NFKC - PASSED (0.29s) : 68116 assertions passed ..... Starting: UAX-15-LISP-UNIT2-TESTS::PART2-NFD UAX-15-LISP-UNIT2-TESTS::PART2-NFD - PASSED (0.04s) : 9220 assertions passed Starting: UAX-15-LISP-UNIT2-TESTS::PART3-NFD UAX-15-LISP-UNIT2-TESTS::PART3-NFD - PASSED (0.01s) : 880 assertions passed Test Summary for :UAX-15-LISP-UNIT2-TESTS (16 tests 1.08 sec) | 343332 assertions total | 343332 passed | 0 failed | 0 execution errors | 0 warnings | 0 empty | 0 missing tests ------- ENDING Testing: UAX-15-LISP-UNIT2-TESTS
Wrap it with print summary and Remove the internal call to with-summary-context will put the summary at the end:
(print-summary (run-tests :package :uax-15-lisp-unit2-tests)) UAX-15-LISP-UNIT2-TESTS::PART0-NFKC - PASSED (0.01s) : 100 assertions passed UAX-15-LISP-UNIT2-TESTS::PART1-NFKC - PASSED (0.18s) : 68116 assertions passed UAX-15-LISP-UNIT2-TESTS::PART2-NFKC - PASSED (0.04s) : 7376 assertions passed UAX-15-LISP-UNIT2-TESTS::PART3-NFKC - PASSED (0.01s) : 704 assertions passed UAX-15-LISP-UNIT2-TESTS::PART0-NFKD - PASSED (0.00s) : 100 assertions passed UAX-15-LISP-UNIT2-TESTS::PART1-NFKD - PASSED (0.16s) : 68116 assertions passed UAX-15-LISP-UNIT2-TESTS::PART2-NFKD - PASSED (0.04s) : 7376 assertions passed UAX-15-LISP-UNIT2-TESTS::PART3-NFKD - PASSED (0.00s) : 704 assertions passed UAX-15-LISP-UNIT2-TESTS::PART0-NFC - PASSED (0.01s) : 125 assertions passed UAX-15-LISP-UNIT2-TESTS::PART1-NFC - PASSED (0.22s) : 85145 assertions passed UAX-15-LISP-UNIT2-TESTS::PART2-NFC - PASSED (0.05s) : 9220 assertions passed UAX-15-LISP-UNIT2-TESTS::PART3-NFC - PASSED (0.01s) : 880 assertions passed UAX-15-LISP-UNIT2-TESTS::PART0-NFD - PASSED (0.01s) : 125 assertions passed UAX-15-LISP-UNIT2-TESTS::PART1-NFD - PASSED (0.18s) : 85145 assertions passed UAX-15-LISP-UNIT2-TESTS::PART2-NFD - PASSED (0.05s) : 9220 assertions passed UAX-15-LISP-UNIT2-TESTS::PART3-NFD - PASSED (0.01s) : 880 assertions passed Test Summary for :UAX-15-LISP-UNIT2-TESTS (16 tests 0.96 sec) | 343332 assertions total | 343332 passed | 0 failed | 0 execution errors | 0 warnings | 0 empty | 0 missing tests #<TEST-RESULTS-DB Tests:(16) Passed:(343332) Failed:(0) Errors:(0) Warnings:(0) {10067D9AF3}>
The variable
*debugger-hook*
is set by default to#<FUNCTION SWANK:SWANK-DEBUGGER-HOOK>
. If you want to stay out of interactive debugging, set*debugger-hook*
to nil.You could allow the system to put you in the debugger using the
with-failure-debugging
wrapper or provide that wrapper to the keyword parameter:run-contexts
when callrun-tests
as in the following two examples(with-failure-debugging () (run-tests :tests '(t7-bad-error))) (run-tests :tests 'tf-lisp-unit2::tf-find-str-in-list-t :run-contexts #'with-failure-debugging-context)
- Basics
Tests in lisp-unit2 are functions. They are also compiled at the time of definition (so that any compile warnings or errors are immediately noticeable) and also before every run of the test (so that macro expansions are never out of date).
The define-test macro takes a name parameter and a form specifying :tags, :contexts or :package before you get to the assertions.
First, the basic test where we know everything is going to pass. Since Lisp-unit2 has macro expand and floating point assertion functions, we will show those in this example (so we need a macro just for the macroexpand test). See https://github.com/AccelerationNet/lisp-unit2/blob/master/floating-point.lisp and https://github.com/AccelerationNet/lisp-unit2/blob/master/rational.lisp for more information on the floating point and rational tests including setting the epsilon values etc.
(defmacro my-macro (arg1 arg2) (let ((g1 (gensym)) (g2 (gensym))) `(let ((,g1 ,arg1) (,g2 ,arg2)) "Start" (+ ,g1 ,g2 3)))) (define-test t1 (:tags '(tf-basic)) (assert-true (= 1 1)) (assert-eq 'a 'a) (assert-rational-equal 3/2 3/2) (assert-float-equal 17 17.0000d0) (assert-true (logically-equal t t)) ; both true or both false (assert-true (logically-equal nil nil)) ; both true or both false (assert-expands (let ((#:G1 A) (#:G2 B)) "Start" (+ #:G1 #:G2 3)) (my-macro a b)) (assert-error 'division-by-zero (error 'division-by-zero) "testing condition assertions"))
Now run this test. The keyword parameter :tests will accept a single test symbol or a list of tests. E.g.
(run-tests :tests 't1) (run-tests :tests '(t1)) #<TEST-RESULTS-DB Tests:(1) Passed:(8) Failed:(0) Errors:(0) Warnings:(0) {1006A22CE3}>
Short and to the point. For a slightly different format you can use any of the following:
(with-summary () (run-tests :tests '(t1))) (print-summary (run-tests :tests '(t1))) (run-tests :run-contexts #'with-summary-context :tests '(t1))
Both will provide something like the following:
TF-LISP-UNIT2::T1 - PASSED (0.01s) : 6 assertions passed Test Summary for :TF-LISP-UNIT2 (1 tests 0.01 sec) | 8 assertions total | 8 passed | 0 failed | 0 execution errors | 0 warnings | 0 empty | 0 missing tests #<TEST-RESULTS-DB Tests:(1) Passed:(8) Failed:(0) Errors:(0) Warnings:(0) {1006E6EE53}>
Now with a basic failing test. This time we will give the test a description string and first assertion gets a diagnostic string and the variables in question.
(define-test t1-fail (:tags '(tf-basic)) "describe t1-fail" (let ((x 1)) (assert-true (= x 2) ) (assert-equal x 3) (assert-error 'division-by-zero (error 'floating-point-overflow) "testing condition assertions") (assert-true (logically-equal t nil)) ; both true or both false (assert-true (logically-equal nil t))))
Now if we simply run the basic test, we get thrown into the debugger on the error assertion. If we hit continue, we are handed a
test-results-db
object. Why did we get thrown into the debugger rather than just fail? Because the variable*debugger-hook*
is set by default to #<FUNCTION SWANK:SWANK-DEBUGGER-HOOK>. If you want to stay out of interactive debugging when errors get thrown, set*debugger-hook*
to nil.(run-tests :tests '(t1-fail)) #<TEST-RESULTS-DB Tests:(1) Passed:(0) Failed:(2) Errors:(1) Warnings:(0) {10063359D3}>
If we use the print-summary wrapper, we still get through into the debugger on the error assertion, but assuming we hit continue, we get the report below and a
test-results-db
object.(print-summary (run-tests :tests '(t1-fail))) TF-LISP-UNIT2::T1-FAIL - ERRORS (5.33s) : 0 assertions passed | ERRORS (1) | ERROR: arithmetic error FLOATING-POINT-OVERFLOW signalled | #<FLOATING-POINT-OVERFLOW {100619DEE3}> | | FAILED (2) | Failed Form: (ASSERT-TRUE (= X 2) | "deliberate failure here because we know ~a is not equal to ~a" | X 2) | Expected T | but saw NIL | "deliberate failure here because we know ~a is not equal to ~a" | X => 1 | 2 | Failed Form: (ASSERT-EQUAL X 3) | Expected 1 | but saw 3 | | Test Summary for :TF-LISP-UNIT2 (1 tests 5.33 sec) | 2 assertions total | 0 passed | 2 failed | 1 execution errors | 0 warnings | 0 empty | 0 missing tests #<TEST-RESULTS-DB Tests:(1) Passed:(0) Failed:(2) Errors:(1) Warnings:(0) {1005F7D0E3}>
You do not have to manually recompile a test after a tested function has been modified.
- Edge Cases: Value Expressions, loops. closures and calling other tests
- Value expressions
Unlike almost all the other frameworks Lisp-unit and Lisp-unit2 actually look at all the values in the values expressions:
(define-test t2-values-expressions (:tags '(tf-multiple)) (assert-equal (values 1 2) (values 1 2 3)) (assert-equal (values 1 2) (values 1 3)) (assert-equal (values 1 2 3) (values 1 3 2))) #<FUNCTION T2-VALUES-EXPRESSIONS> LISP-UNIT2> (print-summary (run-tests :tests '(t2-values-expressions))) LISP-UNIT2::T2-VALUES-EXPRESSIONS - FAILED (0.00s) : 1 assertions passed | FAILED (2) | Failed Form: (ASSERT-EQUAL (VALUES 1 2) (VALUES 1 3)) | Expected 1; 2 | but saw 1; 3 | Failed Form: (ASSERT-EQUAL (VALUES 1 2 3) (VALUES 1 3 2)) | Expected 1; 2; 3 | but saw 1; 3; 2 | | Test Summary for :LISP-UNIT2 (1 tests 0.00 sec) | 3 assertions total | 1 passed | 2 failed | 0 execution errors | 0 warnings | 0 empty | 0 missing tests #<TEST-RESULTS-DB Tests:(1) Passed:(1) Failed:(2) Errors:(0) Warnings:(0) {10263B53C3}>
- Closures.
Lisp-Unit2 will not see variables declared in a closure encompassing the test.The following will throw an error and drop you into the debugger.
(let ((l1 '(#\a #\B #\z)) (l2 '(97 66 122))) (define-test t2-loop-closure (:tags '(tf-multiple tf-basic)) (loop for x in l1 for y in l2 do (assert-equal (char-code x) y))))
- Calling another test from a test
Since we know that tests are functions in lisp-unit2, we can just have test t3 call test t2 directly rather than indirectly running through the RUN-TESTS function.
(define-test t3 ; a test that tries to call another test in its body (:tags '(tf-calling-other-tests)) (assert-equal 'a 'a) (t2)) (print-summary (run-tests :tests '(t3))) LISP-UNIT2-EXAMPLES::T3 - FAILED (0.00s) : 2 assertions passed | FAILED (2) | Failed Form: (ASSERT-EQUAL 1 2) | Expected 1 | but saw 2 | Failed Form: (ASSERT-EQUAL 2 3) | Expected 2 | but saw 3 | Test Summary for :LISP-UNIT2-EXAMPLES (1 tests 0.00 sec) | 4 assertions total | 2 passed | 2 failed | 0 execution errors | 0 warnings | 0 empty | 0 missing tests #<TEST-RESULTS-DB Tests:(1) Passed:(2) Failed:(2) Errors:(0) Warnings:(0) {1009F73183}>
Unlike lisp-unit, everything works as expected and we actually got composed results.
- Value expressions
- Suites, Tags, Packages and other multiple test abilities
If
run-tests
is called without any keyword parameters, it will run all the tests in the current package. It accepts keyword parameters for :tests, :tags and :package.(lisp-unit2:run-tests &key tests tags package reintern-package)
- Lists of tests
As previously stated, Lisp-unit2 can run lists of tests.
(run-tests :tests '(t1 t2))
- Tags
As you would expect, you can run all the tests having a specific tag. In the following example we wrap
run-tests
in a call toprint-summary
in order to get useful results:(print-summary (run-tests :tags '(tf-basic)))
(with-summary () (t1-fail)) ;; here we just call t1 as a function. We need WITH-SUMMARY or PRINT-SUMMARY to get results printed Starting: LISP-UNIT2::T1-FAIL LISP-UNIT2::T1-FAIL - FAILED (0.00s) : 0 assertions passed | FAILED (1) | Failed Form: (ASSERT-EQL 1 2) | Expected 1 | but saw 2 |
Tags can be listed using LIST-TAGS.
- Lists of tests
- Fixtures and Contexts
What we have been referring to as fixtures is called contexts in Lisp-Unit2.
- Writing Summary to file
The (write-tap-to-file) macro takes input that will generate a report and writes it to a file in TAP format.
We show the results in the TAP format for the successful t1 test and the deliberately failing t1-fail test.
(write-tap-to-file (run-tests :tests 't1) #P "/tmp/lisp-unit2.tap") cat /tmp/lisp-unit2.tap TAP version 13 1..1 ok 1 LISP-UNIT2::T1 (0.00 s) (write-tap-to-file (run-tests :tests 't1-fail) #P "/tmp/lisp-unit2.tap") cat /tmp/lisp-unit2.tap TAP version 13 1..1 not ok 1 LISP-UNIT2::T1-FAIL (0.00 s) --- # FAILED (1) # Failed Form: (ASSERT-EQL 1 2) # Expected 1 # but saw 2 # # ...
Or we can wrap a (with-open-file macro targetting lisp-unit2::*test-stream* to write any of the other formats to file.
(with-open-file (*test-stream* #P "/tmp/lisp-unit2.summary" :direction :output :if-exists :supersede :external-format :utf-8 :element-type :default) (print-summary (run-tests :tests 't1-fail))) cat /tmp/lisp-unit2.summary LISP-UNIT2::T1-FAIL - FAILED (0.00s) : 0 assertions passed | FAILED (1) | Failed Form: (ASSERT-EQL 1 2) | Expected 1 | but saw 2 | | Test Summary for :LISP-UNIT2 (1 tests 0.00 sec) | 1 assertions total | 0 passed | 1 failed | 0 execution errors | 0 warnings | 0 empty | 0 missing tests
- Fixtures and Freezing Data
Lisp-unit2 refers to fixtures as "context". The following is an example from the source files of how to build up a context that can be used in a test.
(defun meta-test-context (body-fn) (let ((lisp-unit2::*test-db* *example-db*) *debugger-hook* (lisp-unit2::*test-stream* (make-broadcast-stream))) (handler-bind ((warning #'muffle-warning)) (funcall body-fn)))) (defmacro with-meta-test-context (() &body body) `(meta-test-context (lambda () ,@body))) (define-test test-with-test-results (:tags '(meta-tests) :contexts #'meta-test-context) (let ( results ) (lisp-unit2:with-test-signals-muffled () (lisp-unit2:with-test-results (:collection-place results) (lisp-unit2:run-tests :tags 'warnings) (lisp-unit2:run-tests :tags 'examples))) ;; subtract-integer-test calls run-tests (assert-eql 3 (len results)) (assert-typep 'lisp-unit2::test-results-db (first results))))
- Fixtures and Freezing Data
- Removing tests
Lisp-unit2 has a
uninstall-test
function and aundefine-test
macro. - Sequencing, Random and Failure Only
Tests are run in sequential order. There does not appear to be provision for only running the tests that failed last time.
- Skip Capability
None noted
- Random Data Generators
Lisp-unit2 has various functions for generating random data. See examples below:
(complex-random #C(5 3)) #C(4 1) (make-random-2d-array 2 3) #2A((0.03395796 0.55509293 0.34209597) (0.5823394 0.8771157 0.29430425)) (make-random-2d-list 2 3) ((0.18096626 0.916595 0.88126934) (0.45945048 0.8838378 0.57314146)) (make-random-list 3) (0.5449568 0.32319236 0.7780224) (make-random-state) #S(RANDOM-STATE :STATE #.(MAKE-ARRAY 627 :ELEMENT-TYPE '(UNSIGNED-BYTE 32) :INITIAL-CONTENTS '(0 2567483615 454 2531281407 4203062579 3352536227 284404050 622556438 ...)))
23.4. Discussion
Lisp-Unit2 is very good. Atlas Engineering put it at the top of their list top
23.5. Who Uses Lift-Unit2
Many libraries on quicklisp use Lisp-Unit2. If you have quicklisp, you can get a list of those with:
(ql:who-depends-on :lisp-unit2)
24. nst
24.1. Summary
homepage | John Maraist | LLGPL3 latest | 2021 |
You get a sense of what NST is focused on when the README starts with fixtures and states that the criterion testing has its own DSL.
This focus on fixtures is further reinforced by the definition of a test (what I have been calling an assertion in the context of the other frameworks).:
(def-test NAME ( [ :group GROUP-NAME ] [ :setup FORM ] [ :cleanup FORM ] [ :startup FORM ] [ :finish FORM ] [ :fixtures (FIXTURE FIXTURE ... FIXTURE) ] [ :aspirational FLAG ] [ :documentation STRING ] ) criterion FORM ... FORM)
Obviously this is a framework intended for complexity. That brings two problems. The first is the learning curve for the DSL. The second is the overhead that the infrastructure brings with it. Look at the stacked-ranking-benchmarks. It is almost as bad as clunit in runtime and multiple orders of magnitude worse than anything is bytes consed and eval-calls.
If you do not require the ability to handle serious complexity in your tests, look elsewhere.
24.2. Assertion Functions
assert-criterion | assert-eq | assert-eql |
assert-equal | assert-equalp | assert-non-nil |
assert-not-eq | assert-not-eql | assert-not-equal |
assert-not-equalp | assert-null | assert-zero |
24.3. Usage
First some terminology. What I have been referring to as assertions, NST refers to as a test. What I have been referring to as a test, NST refers to as a test-group.
Second, NST has its own DSL that you need to understand in order to use it. It can obviously handle complex systems, but that comes at a learning curve cost and some things that I find easy in CL I do not find as easy in NST's criterion language (probably speaks to my limitations).
- Report Format
The level of detail from reports is dependent on the verbosity setting. Valid commands are:
(nst-cmd :set :verbose :silent) (nst-cmd :set :verbose :quiet) (nst-cmd :set :verbose :verbose) (nst-cmd :set :verbose :vverbose) (nst-cmd :set :verbose :trace)
You can get a little more detail using the :detail parameter.
(nst-cmd :detail [blank, package-name, group-name, or group-name *and* test-tname])
To switch to interactive debug behavior the following commands are necessary:
(nst-cmd :debug-on-error t) (nst-cmd :debug-on-fail t) (nst-cmd :debug) ;; will set both to t
The README contains the following warning: "This behavior is less useful than it may seem; by the time the results of the test are examined for failure, the stack from the actual form evaluation will usually have been released. Still, this switch is useful for inspecting the environment in which a failing test was run."
We will show the different reporting levels with the first two groups of examples.
- Basics
NST requires that you have groups defined and each test must belong to a group. If you are defining the tests within the definition of a group, you do not need to specify the group. The empty form after the group name in the following example is used for fixtures
- All Passing Basic Test
(def-test-group tf-basic-pass () (def-test t1-1 :true (= 1 1)) (def-test t1-2 :true (not (= 1 2))) (def-test t1-3 (:eq 'a) 'a) (def-test t1-4 :forms-eq (cadr '(a b c)) (caddr '(a c b))) (def-test t1-5 (:err :type division-by-zero) (error 'division-by-zero)))
Now you need to run a mini command line to run tests and get a report. We will go from least verbose to most verbose.
- Silent
(nst-cmd :set :verbose :silent) (nst-cmd :run tf-basic-pass) Running group TF-BASIC-PASS Group TF-BASIC-PASS: 5 of 5 passed TOTAL: 5 of 5 passed (0 failed, 0 errors, 0 warnings) (nst-cmd :run t1) Check T1 (group TF-BASIC) passed TOTAL: 1 of 1 passed (0 failed, 0 errors, 0 warnings)
- Quiet
(nst-cmd :set :verbose :quiet) (nst-cmd :run tf-basic-pass) Running group TF-BASIC-PASS - Executing test T1-1 Check T1-1 (group TF-BASIC-PASS) passed. - Executing test T1-2 Check T1-2 (group TF-BASIC-PASS) passed. - Executing test T1-3 Check T1-3 (group TF-BASIC-PASS) passed. - Executing test T1-4 Check T1-4 (group TF-BASIC-PASS) passed. - Executing test T1-5 Check T1-5 (group TF-BASIC-PASS) passed. Group TF-BASIC-PASS: 5 of 5 passed TOTAL: 5 of 5 passed (0 failed, 0 errors, 0 warnings)
- VVerbose (verbose is generally the same as quiet)
(nst-cmd :set :verbose :vverbose) (nst-cmd :run tf-basic-pass) Running group TF-BASIC-PASS Starting run loop for #S(NST::GROUP-RECORD :NAME TF-BASIC-PASS :ANON-FIXTURE-FORMS NIL :ASPIRATIONAL NIL :GIVEN-FIXTURES NIL :DOCUMENTATION NIL :TESTS #<HASH-TABLE :TEST EQ :COUNT 5 {1008848E43}> :FIXTURES-SETUP-THUNK NIL :FIXTURES-CLEANUP-THUNK NIL :WITHFIXTURES-SETUP-THUNK NIL :WITHFIXTURES-CLEANUP-THUNK NIL :EACHTEST-SETUP-THUNK NIL :EACHTEST-CLEANUP-THUNK NIL :INCLUDE-GROUPS NIL) - Executing test T1-1 Applying criterion :TRUE to (MULTIPLE-VALUE-LIST (= 1 1)) Result at :TRUE is Check T1-1 (group TF-BASIC-PASS) passed. Check T1-1 (group TF-BASIC-PASS) passed. - Executing test T1-2 Applying criterion :TRUE to (MULTIPLE-VALUE-LIST (NOT (= 1 2))) Result at :TRUE is Check T1-2 (group TF-BASIC-PASS) passed. Check T1-2 (group TF-BASIC-PASS) passed. - Executing test T1-3 Applying criterion :EQ 'A to (MULTIPLE-VALUE-LIST 'A) Result at :EQ is Check T1-3 (group TF-BASIC-PASS) passed. Check T1-3 (group TF-BASIC-PASS) passed. - Executing test T1-4 Applying criterion :FORMS-EQ to (LIST (CADR '(A B C)) (CADDR '(A C B))) Applying criterion :PREDICATE EQ to (LIST (CADR '(A B C)) (CADDR '(A C B))) Result at :PREDICATE is Check T1-4 (group TF-BASIC-PASS) passed. Result at :FORMS-EQ is Check T1-4 (group TF-BASIC-PASS) passed. Check T1-4 (group TF-BASIC-PASS) passed. - Executing test T1-5 Applying criterion :ERR :TYPE DIVISION-BY-ZERO to (MULTIPLE-VALUE-LIST (ERROR 'DIVISION-BY-ZERO)) Result at :ERR is Check T1-5 (group TF-BASIC-PASS) passed. Check T1-5 (group TF-BASIC-PASS) passed. Group TF-BASIC-PASS: 5 of 5 passed TOTAL: 5 of 5 passed (0 failed, 0 errors, 0 warnings)
- Silent
- All Failing Basic Test
Now a test where everything fails or creates an error. This time we define the test-group separately, then each test separately. As a result we will need to define the group in the test definition. You can insert a
:documentation string
into a test but it only gets printed at the vverbose level of verbosity.(def-test-group tf-basic-fail ()) (def-test (t1-1 :group tf-basic-fail) :true (= 1 2)) (def-test (t1-2 :group tf-basic-fail) :true (not (= 2 2))) (def-test (t1-3 :group tf-basic-fail) (:eq 'a) 'b) (def-test (t1-4 :group tf-basic-fail) :forms-eq (cadr '(a d c)) (caddr '(a c b))) (def-test (t1-5 :group tf-basic-fail) (:err :type division-by-zero) (error 'floating-point-overflow))
Again, going from least verbose to most verbose.
- Silent
(nst-cmd :set :verbose :silent) (nst-cmd :run tf-basic-fail) Running group TF-BASIC-FAIL Group TF-BASIC-FAIL: 0 of 5 passed - Check T1-1 failed - Expected non-null, got: NIL - Check T1-2 failed - Expected non-null, got: NIL - Check T1-3 failed - Value B not eq to value of A - Check T1-4 failed - Predicate EQ fails for (D B) - Check T1-5 raised an error TOTAL: 0 of 5 passed (4 failed, 1 error, 0 warnings)
- Quiet
(nst-cmd :set :verbose :quiet) (nst-cmd :run tf-basic-fail) Running group TF-BASIC-FAIL - Executing test T1-1 Check T1-1 (group TF-BASIC-FAIL) failed - Expected non-null, got: NIL - Executing test T1-2 Check T1-2 (group TF-BASIC-FAIL) failed - Expected non-null, got: NIL - Executing test T1-3 Check T1-3 (group TF-BASIC-FAIL) failed - Value B not eq to value of A - Executing test T1-4 Check T1-4 (group TF-BASIC-FAIL) failed - Predicate EQ fails for (D B) - Executing test T1-5 Check T1-5 (group TF-BASIC-FAIL) raised an error Group TF-BASIC-FAIL: 0 of 5 passed - Check T1-1 failed - Expected non-null, got: NIL - Check T1-2 failed - Expected non-null, got: NIL - Check T1-3 failed - Value B not eq to value of A - Check T1-4 failed - Predicate EQ fails for (D B) - Check T1-5 raised an error TOTAL: 0 of 5 passed (4 failed, 1 error, 0 warnings)
For the sake of brevity, we will skip verbose and vverbose and trace. You get the picture.
- Silent
- All Passing Basic Test
- Edge Cases: Value expressions, loops, closures and calling other tests
- Value expressions
NST has a values criterion that looks at the results coming from a values expression individually. Otherwise it will only look at the first value. The following two versions pass.
(def-test-group tf-basic-values () (def-test (t1-1 :group tf-basic-values) (:equalp (values 1 2)) 1) (def-test (t1-2 :group tf-basic-values) (:values (:eql 1) (:eql 2)) (values 1 2)))
- Looping and closures.
NST does not have a loop construct. What it does have is
:each
, which takes an optional item and a comparision test and applies it to a list. In the following examples, we(def-test-group tf-basic-each ()) (def-test (each1 :group tf-basic-each) (:each (:symbol a)) '(a a a a a)) (def-test (each2 :group tf-basic-each) (:each (:apply write-to-string (:equal "A"))) '(a a a a a))
Like the clunits and lisp-units, nst does not look for variables in a closure surrounding the test defintion. The following will not work.
(let ((lst '(a a a a a))) (def-test (each3 :group tf-basic-each) (:each (:apply write-to-string (:equal "A"))) lst))
When I tried calling other tests from inside an NST test, I triggered stack exhaustion errors. So probably do not do that.
- Value expressions
- Suites, tags and other multiple test abilities
You can define an
nst-package
which can containnst-groups
which can containnst-tests
. That seems to be the limit of nestability. So annst-package
is what I have been calling a suite when talking about other frameworks. - Fixtures and Freezing Data
NST has fixtures. And fixtures. and… The following is just a simple example, please look at the documentation for more details. First we define three groups of fixtures. We intend to use the first one at the test-group level (all tests in that group have access) and the next two at individual test level (just the tests specifying that will have access). We will pretend the first two groups have an expensive calculation that we want to cache to avoid doing the calculation every time the fixture is called. The empty form after the fixtures group name takes a lot of different options and you will have to read the documentation for that..
(def-fixtures suite-fix-1 () (sf1-a '(1 2 3 4)) ((:cache t) sf1-b (* 23 47))) (def-fixtures test-fix-1 () (tf1-a '("a" "b" "c" "d")) ((:cache t) tf1-b (- 2 1))) (def-fixtures test-fix-2 () (tf2-a "some boring string here"))
Now we define a test-group that uses those fixtures. In test t3, we do not need to call out the suite level fixture in the list of fixtures to be accessed by that test.
(nst:def-test-group tf-nst-fix (suite-fix-1) (def-test (t1 :group tf-nst-fix :fixtures (test-fix-1)) (:eql b) 1) (def-test (t2 :group tf-nst-fix :fixtures (test-fix-1)) (:each (:apply length (:equal 1))) a) (def-test (t3 :group tf-nst-fix :fixtures (test-fix-1 test-fix-2)) (:equal "some boring string here-a1-1081") (format nil "~a-~a~a-~a" tf2-a (first tf1-a) tf1-b sf1-a)))
- Removing tests
I do not see a function for removing tests.
- Sequencing, Random and Failure Only
I did not see any shuffle functionality and the tests seem to run only in sequential order.
There is a
make-failure-report
function, but I did not see something that looked like the ability to rerun just failing tests. - Skip Capability
NST seems to offer skips only in the context of running interactively and letting a condition handler in the debugger ask you if you want to skip the test-group or remaining tests.
- Random Data Generators
NST has extensive random data generators. Please see the documentation for details.
24.4. Discussion
We have already seen in the benchmarking session that either I am doing something wrong with NST or it's infrastructure overhead creates speed issues. It can obviously handle very complex systems. That comes, however, at the cost of having to learn a new DSL and I found the criterion learning curve much steeper than I expected.
Take something as simple as a list of characters and integers and validating that the integer is the char-code for the character. First a plain CL version. There are a lot of different ways to do this in CL using every
or loop
or mapcar
. Below is just one of those ways.
(defparameter *test-lst* '((#\a 97) (#\b 98) (#\c 99) (#\d 100))) (every #'(lambda (x) (eq (char-code (first x)) (second x))) *test-lst*)
Now an NST version. There may be better ways to write this but as far as I can tell :apply
does not accept a lambda function.
(defparameter *test-lst* '((#\a 97) (#\b 98) (#\c 99) (#\d 100))) (defun test-char-code-sublist (lst) (eq (char-code (first lst)) (second lst))) (def-fixtures fixture1 () (lst *test-lst*)) (nst:def-test-group tf-nst () (def-test (t-char-codes :group tf-nst :fixtures (lst)) (:each (:apply test-char-code-sublist (:eq t))) lst))
I find myself writing a lot more than I feel necessary in what seems like simple situations. A large part may be simply because I am not going to get far enough up the learning curve for NST given my needs. An article introducing NST claims:
[For simple examples] "the overhead of a separate criteria language seems hardly justifiable. In fact, the criteria bring two primary advantages over Lisp forms. First, criteria can report more detailed information than just pass or fail. In a larger application where the tested values are more complicated objects and structures, the reason for a test's failure may be more subtle. More informative reports can significantly assist the programmer, especially when validating changes to less familiar older or others' code. Moreover, NST's criteria can report multiple reasons for failure. Such more complicated analyses can reduce the boilerplate involved in writing tests; one test against a conjunctive criterion can provide as informative a result as a series of sep-arate tests on the same form. As a project grows larger and more complex, and as a team of programmers and testers becomes less intimately familiar with all of the components of a system, criteria can both reduce tests’ overall verbosity, while at the same time raising the usefulness of their results."
Unfortunately for me, many times in trying to learn the DSL, NST reported that it raised an error but refused to tell me what the error was, regardless of the level of verbosity I set.
Maybe it is just me, but every time I tried to redefine a test, I triggered a hash table error and needed to define a test with a new name.
24.5. Who Uses NST
25. parachute
25.1. Summary
homepage | Yukari Hafner | zlib | 2022 |
It hits almost everything on my wish list - optionality on progress reports and debugging, good suite setup and reporting, good default error-reporting and the ability to provide diagnostic strings with variables, skip failing dependencies, set time limits on long running tests and has decent fixture capability. It does not have the built-in ability to re-run just the last failing tests, but that is a relatively easy add-on (see Discussion). While it is not the fastest, it is in the pack as opposed to the also-rans.
The name of a test is coerced into a string internally, so test names are not functions that can be called on their own.
There are four types of reports: quiet, plain (the default), summary and interactive (throwing you into the debugger).
Parachute does allow you to set time limits for tests and will report the times for tests.
My wish list would be for there to be a built-in ability for tests to keep a list of the last tests that failed so that you could just run those over again after you think you have fixed all your bugs.
25.2. Assertion Functions
true | false | fail | is | isnt |
is-values | isnt-values | of-type | finish |
As with other frameworks, finish simply indicates that the test does not produce an error.
25.3. Usage
- Report Format
Parachute provides four types of reports at the moment. Each returns a test-result object as well as printing a report to stream.
- Quiet for when you just want the summary
- Interactive for when you want to go into the debugger on failures, and
- Plain (the default) for the nice progress report with checks and timing and
- Summary (plain but without the progress report)
So a basic failing test to show the differences in reporting. We are passing the third assertion a string after the two tested items which can help diagnose failures, followed by the two variables being compared, and then running it to show the default failure report.
(define-test t1-fail (let ((x 1) (y 2)) (is = 1 2) (is equal 1 2) (is = x y "Intentional failure ~a does not equal ~a" x y) (fail (error 'floating-point-overflow) 'division-by-zero)))
Now the quiet report version:
- Quiet
(test 't1-fail :report 'quiet) #<QUIET 5, FAILED results>
- Default "Plain" Report
(test 't1-fail) ? TF-PARACHUTE::T1-FAIL 0.000 ✘ (is = 1 2) 0.000 ✘ (is equal 1 2) 0.000 ✘ (is = x y) 0.000 ✘ (fail (error 'floating-point-overflow) 'division-by-zero) 0.010 ✘ TF-PARACHUTE::T1-FAIL ;; Summary: Passed: 0 Failed: 4 Skipped: 0 ;; Failures: 4/ 4 tests failed in TF-PARACHUTE::T1-FAIL The test form 2 evaluated to 2 when 1 was expected to be equal under =. The test form 2 evaluated to 2 when 1 was expected to be equal under EQUAL. The test form y evaluated to 2 when 1 was expected to be equal under =. Intentional failure 1 does not equal 2 The test form (capture-error (error 'floating-point-overflow)) evaluated to [floating-point-overflow] arithmetic error floating-point-overflow signalled when division-by-zero was expected to be equal under TYPEP. #<PLAIN 5, FAILED results>
If you start looking at summary reports for nested tests and notice that the number of test results is not greater than the number of assertions, just remember that a nested test is itself considered an assertion and will pass if all its assertions pass or fail if any of its assertions fail.
- Summary Report
The difference between the summary report and the plain report is that the summary report suppresses the progress reports. See also
parachute:largescale
if a percentage report with the first five test failures (if any) is desired.(test 't1-fail :report 'summary) ;; Summary: Passed: 0 Failed: 4 Skipped: 0 ;; Failures: 4/ 4 tests failed in UAX-15-PARACHUTE-TESTS::T1-FAIL The test form 2 evaluated to 2 when 1 was expected to be equal under =. The test form 2 evaluated to 2 when 1 was expected to be equal under EQUAL. The test form y evaluated to 2 when 1 was expected to be equal under =. Intentional failure 1 does not equal 2 The test form (capture-error (error 'floating-point-overflow)) evaluated to [floating-point-overflow] arithmetic error floating-point-overflow signalled when division-by-zero was expected to be equal under TYPEP. #<SUMMARY 5, FAILED results> (test 't1-fail :report 'largescale) Total: 5 Passed: 0 ( 0%) Failed: 5 (100%) Skipped: 0 ( 0%) ;; Failures: (limited to 5) The test form 2 evaluated to 2 when 1 was expected to be equal under =. The test form 2 evaluated to 2 when 1 was expected to be equal under EQUAL. The test form y evaluated to 2 when 1 was expected to be equal under =. Intentional failure 1 does not equal 2 The test form (capture-error (error 'floating-point-overflow)) evaluated to [floating-point-overflow] arithmetic error floating-point-overflow signalled when division-by-zero was expected to be equal under TYPEP.
- Interactive
The interactive report which throws you into the debugger:
(test 't1-fail :report 'interactive) Test (is = 1 2) failed: The test form 2 evaluated to 2 when 1 was expected to be equal under =. [Condition of type SIMPLE-ERROR] Restarts: 0: [RETRY] Retry testing (is = 1 2) 1: [ABORT] Continue, failing (is = 1 2) 2: [CONTINUE] Continue, skipping (is = 1 2) 3: [PASS] Continue, passing (is = 1 2) 4: [RETRY] Retry testing TF-PARACHUTE::T1-FAIL 5: [ABORT] Continue, failing TF-PARACHUTE::T1-FAIL
- Basics
So lets look at a test where we know everything will pass, using the default report. This will give us a view of the syntax for various types of assertions.
(define-test t1 (true (= 1 1)) (true "happy") (false (numberp "no")) (of-type integer 5) (of-type character #\space) (is = 1 1) (is equal "abc" "abc") (isnt equal "abc" "d") (is-values (values 1 2) (= (values 1 2))) (is-values (values 1 "a") (= 1) (string= "a")) (fail (error 'division-by-zero) 'division-by-zero)) (test 't1) ? TF-PARACHUTE::T1 0.000 ✔ (true (= 1 1)) 0.000 ✔ (true "happy") 0.000 ✔ (false (numberp "no")) 0.000 ✔ (of-type integer 5) 0.000 ✔ (of-type character #\ ) 0.000 ✔ (is = 1 1) 0.000 ✔ (is equal "abc" "abc") 0.000 ✔ (isnt equal "abc" "d") 0.000 ✔ (is-values (values 1 2) (= (values 1 2))) 0.000 ✔ (is-values (values 1 "a") (= 1) (string= "a")) 0.000 ✔ (fail (error 'division-by-zero) 'division-by-zero) 0.030 ✔ TF-PARACHUTE::T1 ;; Summary: Passed: 11 Failed: 0 Skipped: 0 #<PLAIN 12, PASSED results>
As you would hope, changing tested functions does not require manually recompiling parachute tests. We will skip the proof.
- Edge Cases: Values expressions, loops. closures and calling other tests
- Values expressions
Parachute has special functionality for dealing with values expressions with its
is-values
testing function as we saw just above. If you used another testing function and passed values expressions to it, they would be accepted but, as expected, Parachute would only look at the first value in the values expression. - Now looping and closures
Parachute has no problem with loops or finding variables that have been set in a closure containing the test.
(let ((l1 '(#\a #\B #\z)) (l2 '(97 66 122))) (define-test t2-loop (loop for x in l1 for y in l2 do (true (= (char-code x) y))))) (test 't2-loop :report 'quiet) #<QUIET 4, PASSED results>
- Calling a test inside another test
(define-test t3 ; a test that calls another test in its body (true (eql 'a 'a)) (test 't2)) (test 't3) ? TF-PARACHUTE::T3 0.000 ✔ (true (eql 'a 'a)) ? TF-PARACHUTE::T2 0.000 ✔ (true (= 1 1)) 0.000 ✘ (true (= 2 1)) 0.000 ✔ (is-values (values 1 2) (= (values 1 2))) 0.000 ✔ (is-values (values 1 "a") (= 1) (string= "a")) 0.000 ✘ TF-PARACHUTE::T2 ;; Summary: Passed: 3 Failed: 1 Skipped: 0 ;; Failures: 1/ 4 tests failed in TF-PARACHUTE::T2 t2 description here The test form (= 2 1) evaluated to () when t was expected to be equal under GEQ. 0.003 ✘ TF-PARACHUTE::T3 ;; Summary: Passed: 1 Failed: 1 Skipped: 0 ;; Failures: 2/ 3 tests failed in TF-PARACHUTE::T3 Test for T2 failed. #<PLAIN 3, FAILED results>
- Values expressions
- Suites, tags and other multiple test abilities
- Lists of tests
Parachute can handle lists of tests.
(test '(t1 t2))
- Suites
Everything is a test in parachute and tests can have parent tests just by adding a :parent <insert-name-here>. This makes suite inheritance easy as demonstrated below..
(define-test s0) (define-test t4 :parent s0 (true (= 1 1)) (false (= 1 2))) (define-test t4-1 :parent t4 (true (= 1 1)) (false (= 1 2)))
Now we can test 's0 and we will get the results for 't4 and t4-1
(test 's0) ? TF-PARACHUTE::S0 ? TF-PARACHUTE::T4 0.000 ✔ (true (= 1 1)) 0.000 ✔ (false (= 1 2)) ? TF-PARACHUTE::T4-1 0.000 ✔ (true (= 1 1)) 0.000 ✔ (false (= 1 2)) 0.003 ✔ TF-PARACHUTE::T4-1 0.010 ✔ TF-PARACHUTE::T4 0.010 ✔ TF-PARACHUTE::S0 ;; Summary: Passed: 4 Failed: 0 Skipped: 0 #<PLAIN 7, PASSED results>
- Lists of tests
- Fixtures and Freezing Data
First checking whether we can freeze data, change it in the test, then change it back
(defparameter *keep-this-data* 1) (define-test t-freeze-1 :fix (*keep-this-data*) (setf *keep-this-data* "new") (true (stringp *keep-this-data*))) (define-test t-freeze-2 (is = *keep-this-data* 1)) (test '(t-freeze-1 t-freeze-2))
Now the classic fixture - create a data set for some series of tests and clean it up afterwards
;; Create a class for data fixture purposes (defclass class-A () ((a :initarg :a :initform 0 :accessor a) (b :initarg :b :initform 0 :accessor b))) (defparameter *some-existing-data-parameter* (make-instance 'class-A :a 17.3 :b -12)) (define-fixture f1 () (let ((old-parameter *some-existing-data-parameter*)) (setf *some-existing-data-parameter* (make-instance 'class-A :a 100 :b -100)) (&body) (setf *some-existing-data-parameter* old-parameter))) (def-test t6-f1 (:fixture f1) (is (equal (a *some-existing-data-parameter*) 100)) (is (equal (b *some-existing-data-parameter*) -100))) ;; now you can check (a *some-existing-data-parameter*) to ensure defining the test has not changed *some-existing-data-parameter* (run! 't6-f1) Running test T6-F1 .. Did 2 checks. Pass: 2 (100%) Skip: 0 ( 0%) Fail: 0 ( 0%)
See also
define-fixture-capture
anddefine-fixture-restore
on symbols that name databases, and then annotating your tests with:fix (db-name)
. The only constraint right now is that fixtures are not inherited, meaning if you run the child test directly, it will not trigger the fixture.(defparameter *my-param* 1) (define-test parent-test-1 :fix '*my-param* (setf *my-param* 2) (is = 2 *my-param*)) (define-test child-test-1 :parent parent-test-1 (is = 2 *my-param*) (is eq 'a 'a)) (test 'parent-test-1) ? TF-PARACHUTE::PARENT-TEST-1 0.000 ✔ (is = 2 *my-param*) ? TF-PARACHUTE::CHILD-TEST-1 0.000 ✔ (is = 2 *my-param*) 0.000 ✔ (is eq 'a 'a) 0.030 ✔ TF-PARACHUTE::CHILD-TEST-1 0.050 ✔ TF-PARACHUTE::PARENT-TEST-1 ;; Summary: Passed: 3 Failed: 0 Skipped: 0 #<TEST:PLAIN 5, PASSED results>
- Build Up and Tear Down Example
Possibly more interesting is the following example which demonstrates building up and tearing down a test environment extending beyond the lisp environment. Suppose I want to create 3 databases that will be used in a suite of tests, then drop the databases at the end of the tests.
First we need a predicate that takes a symbol and decides if that symbol meets some criteria.
(defun names-global-database-p (symbol) (member symbol '(db1 db2 db3)))
Now creating the functions that create and drop the database.
(defun create-test-db (symbol) (with-test-connection (pomo:create-database symbol))) (defun drop-test-db (symbol) (with-test-connection (pomo:drop-database symbol)))
Now we set up define-fixture-capture and define-fixture-restore. They will take a symbol, apply the predicate and, if t, create (or drop) a database where the name has some relation to the symbol. In this particular case, we do not need the value parameter for define-fixture-restore.
(define-fixture-capture database (symbol) (when (names-global-database-p symbol) (values (create-test-db symbol) T))) (define-fixture-restore database (symbol db) (declare (ignore db)) (when (names-global-database-p symbol) (drop-test-db symbol)))
Now we can define the parent test of the suite:
(define-test db-suite :fix (db1 db2 db3) (with-test-connection (true (pomo:database-exists-p 'db1)) (true (pomo:database-exists-p 'db2)) (true (pomo:database-exists-p 'db3))))
And now a child of the parent:
(define-test db-suite-child1 :parent db-suite (with-test-connection (true (pomo:database-exists-p 'db1)) (true (pomo:database-exists-p 'db2)) (true (pomo:database-exists-p 'db3))))
Now if we run (test 'db-suite), it will create the three databases, run the tests in the parent, then run the tests in the child, and then drop the databases:
? POSTMODERN-KITSCH-TESTS::DB-SUITE 0.000 ✔ (true (database-exists-p 'db1)) 0.000 ✔ (true (database-exists-p 'db2)) 0.000 ✔ (true (database-exists-p 'db3)) ? POSTMODERN-KITSCH-TESTS::DB-SUITE-CHILD1 0.000 ✔ (true (database-exists-p 'db1)) 0.000 ✔ (true (database-exists-p 'db2)) 0.000 ✔ (true (database-exists-p 'db3)) 0.030 ✔ POSTMODERN-KITSCH-TESTS::DB-SUITE-CHILD1 0.480 ✔ POSTMODERN-KITSCH-TESTS::DB-SUITE ;; Summary: Passed: 6 Failed: 0 Skipped: 0 #<PLAIN 8, PASSED results>
We can now validate outside the db-suite of tests that all the databases have been dropped:
(with-test-connection (pomo:database-exists-p 'db1)) NIL 1
And they have.
- Wrapping around Multiple Tests
You can wrap a fixture around multiple tests. Consider the following:
(defclass class-A () ((a :initarg :a :initform 0 :accessor a) (b :initarg :b :initform 0 :accessor b))) (defparameter *some-existing-data-parameter* (make-instance 'class-A :a 17.3 :b -12)) (with-fixtures '(*some-existing-data-parameter*) (define-test t-fix1 (is = -12 (b *some-existing-data-parameter*)) (is = 17.3 (a *some-existing-data-parameter*))) (define-test t-fix2 (is = 17.3 (a *some-existing-data-parameter*)) (is = -12 (b *some-existing-data-parameter*)))) (test 't-fix1) ? UAX-15-PARACHUTE-TESTS::T-FIX1 0.000 ✔ (is = -12 (b *some-existing-data-parameter*)) 0.000 ✔ (is = 17.3 (a *some-existing-data-parameter*)) 0.007 ✔ UAX-15-PARACHUTE-TESTS::T-FIX1 ;; Summary: Passed: 2 Failed: 0 Skipped: 0 #<PLAIN 3, PASSED results> (test 't-fix2) ? UAX-15-PARACHUTE-TESTS::T-FIX2 0.000 ✔ (is = 17.3 (a *some-existing-data-parameter*)) 0.000 ✔ (is = -12 (b *some-existing-data-parameter*)) 0.007 ✔ UAX-15-PARACHUTE-TESTS::T-FIX2 ;; Summary: Passed: 2 Failed: 0 Skipped: 0 #<PLAIN 3, PASSED results> (with-fixtures '(*some-existing-data-parameter*) (define-test t-parent-fix3 (is = -12 (b *some-existing-data-parameter*)) (is = 17.3 (a *some-existing-data-parameter*))) (define-test t-child-fix3 :parent t-parent-fix3 (is = 12 (abs (b *some-existing-data-parameter*))))) (test 't-parent-fix3) ? UAX-15-PARACHUTE-TESTS::T-PARENT-FIX3 0.000 ✔ (is = -12 (b *some-existing-data-parameter*)) 0.000 ✔ (is = 17.3 (a *some-existing-data-parameter*)) ? UAX-15-PARACHUTE-TESTS::T-CHILD-FIX3 0.000 ✔ (is = 12 (abs (b *some-existing-data-parameter*))) 0.003 ✔ UAX-15-PARACHUTE-TESTS::T-CHILD-FIX3 0.010 ✔ UAX-15-PARACHUTE-TESTS::T-PARENT-FIX3 ;; Summary: Passed: 3 Failed: 0 Skipped: 0 #<PLAIN 5, PASSED results> (test 't-child-fix3) ? UAX-15-PARACHUTE-TESTS::T-CHILD-FIX3 0.000 ✔ (is = 12 (abs (b *some-existing-data-parameter*))) 0.000 ✔ UAX-15-PARACHUTE-TESTS::T-CHILD-FIX3 ;; Summary: Passed: 1 Failed: 0 Skipped: 0 #<PLAIN 2, PASSED results>
You can also nest with-fixtures:
(defparameter *some-int* 3) (with-fixtures '(*some-int*) (with-fixtures '(*some-existing-data-parameter*) (define-test t-parent-fix5 (is = -12 (b *some-existing-data-parameter*)) (is = 3 *some-int*)) (define-test t-child-fix5 :parent t-parent-fix5 (is = 12 (abs (b *some-existing-data-parameter*))) (is = 3 *some-int*)))) (test 't-child-fix5) ? UAX-15-PARACHUTE-TESTS::T-CHILD-FIX5 0.000 ✔ (is = 12 (abs (b *some-existing-data-parameter*))) 0.000 ✔ (is = 3 *some-int*) 0.007 ✔ UAX-15-PARACHUTE-TESTS::T-CHILD-FIX5 ;; Summary: Passed: 2 Failed: 0 Skipped: 0 #<PLAIN 3, PASSED results>
- Build Up and Tear Down Example
- Removing tests
Parachute can remove specific tests with
remove-test
or all the tests in a package withremove-all-tests-in-package
.(remove-test 't1) (remove-all-tests-in-package optional-package-name)
- Sequencing, Random, Dependencies and Failure Only
While tests normally follow a sequential order, parachute allows you to specify to either shuffle the assertions in a test or shuffle the tests within a suite by setting :serial to NIL.
(define-test shuffle :serial NIL ...) (define-test shuffle-suite :serial NIL)
You can also shuffle the tests within a test using with-shuffling.
(define-test random-2 (let ((a 0)) (with-shuffling (is = 1 (incf a)) (is = 2 (incf a)) (is = 3 (incf a)))))
You can specify that running a test depending on the success of another test. In the following example, test3 will only run if test1 is successful and test2 is not successful.
(define-test test3 :depends-on (:and test1 (:not test2)) (of-type number (+ 2 3)))
- Skip Capability
Parachute has multiple skip abilities including skipping based on assertions, tests or implementations.
- Random Data Generators
25.4. Discussion
It is certainly possible to extend parachute to retest just the tests that failed last time. Since parachute can run against a list of tests, all you need is a function to save a list of the names of the tests that fail. The following might be one way to do that.
(defun collect-test-failure-names (test-results) "This function takes the report output of a parachute test and returns a list of the names of the tests that failed." (when (typep test-results 'parachute:report) (loop for test-result across (results test-results) when (and (typep test-result 'parachute::test-result) (eq (status test-result) :failed)) collect (name (expression test-result)))))
You might also consider whether the helper library Protest would be a good add-on as it has a Parachute module.
Since I questioned what is meant by extensibility at the very beginning of this report, allow me to quote the Parachute documentation:
"Extending Parachute Test and Result Evaluation
"Parachute follows its own evaluation semantics in order to run tests. Primarily this means that most everything goes through one central function called eval-in-context. This functions allows you to customise evaluation based on both what the context is, and what the object being "evaluated" is.
Usually the context is a report object, but other situations might also be conceived. Either way, it is your responsibility to add methods to this function when you add a new result type, some kind of test subclass, or a new report type that you want to customise according to your desired behaviour.
The evaluation of results is decoupled from the context and reports in the sense that their behaviour does not, by default, depend on it. At the most basic, the result class defines a single :around method that takes care of recording the duration of the test evaluation, setting a default status after finishing without errors, and skipping evaluation if the status is already set to something other than :unknown.
Next we have a result object that is interesting for anything that actually produces direct test results– value-result. Upon evaluation, if the value slot is not yet bound, it calls its body function and stores the return value thereof in the value slot.
However, the result type that is actually used for all standard test forms is the comparison-result. This also takes a comparator function and an expected result to compare against upon completion of the test. If the results match, then the test status is set to :passed, otherwise to :failed.
Since Parachute allows for a hierarchy in your tests, there have to be aggregate results as well, and indeed there are. Two of them, actually. First is the base case, namely parent-result which does two things on evaluation: one, it binds *parent*
to itself to allow other results to register themselves upon construction, and two it sets its status to :failed if any of the children have failed.
Finally we have the test-result which takes care of properly evaluating an actual test object. What this means is to evaluate all dependencies before anything else happens, and to check the time limit after everything else has happened. If the time limit has exceeded, set the description accordingly and mark the result as :failed. For its main eval-in-context method however it checks whether any of the dependencies have failed, and if so, mark itself as :skipped. Otherwise it calls eval-in-context on the actual test object.
The default evaluation procedure for a test itself is to simply call all the functions in the tests list in a with-fixtures environment.
And that describes the semantics of default test procedures. Actual test forms like is are created through macros that emit an (eval-in-context context (make-instance 'comparison-result #|…|#)) form. The *context*
object is automatically bound to the context object on call of eval-in-context and thus always refers to the current context object. This allows results to be evaluated even from within opaque parts like user-defined functions.
Report Generation
"It should be possible to get any kind of reporting behaviour you want by adding methods that specialise on your report object to eval-in-context. For the simple case where you want something that prints to the REPL but has a different style than the preset plain report, you can simply subclass that and specialise on the report-on and summarize functions that then produce the output you want.
Since you can control pretty much every aspect of evaluation rather closely, very different behaviours and recovery mechanisms are also possible to achieve. One final aspect to note is result-for-testable, which should return an appropriate result object for the given testable. This should only return fresh result objects if no result is already known for the testable in the given context. The standard tests provide for this, however they only ever return a standard test-result instance. If you need to customise the behaviour of the evaluation for that part, it would be a wise idea to subclass test-result and make sure to return instances thereof from result-for-testable for your report.
Finally it should be noted that if you happen to create new result types that you might want to run using the default reports, you should add methods to format-result that specialise on the keywords :oneline and :extensive for the type. These should return a string containing an appropriate description of the test in one line or extensively, respectively. This will allow you to customise how things look to some degree without having to create a new report object entirely." top
25.5. Who uses parachute
The following list is just pulling the results (ql:who-depends-on :parachute)
and adding urls to a few of them.
("3b-hdr" 3d-matrices "3d-vectors" array-utils
atomics "binpack"
"canonicalized-initargs" cesdi "cl-elastic"
"cl-markless" "class-options" "classowary" colored
com-on "compatible-metaclasses" "definitions-systems"
"enhanced-boolean" "enhanced-defclass" "enhanced-find-class"
"enhanced-typep" "evaled-when" "fakenil"
"first-time-value" float-features "inheriting-readers"
"its" "method-hooks" mmap "nyaml/test" "object-class"
"origin.test"
pathname-utils "protest/parachute" "radiance"
"shared-preferences" "shasht/test" "simple-guess"
"slot-extra-options" "trivial-custom-debugger/test"
"trivial-jumptables" uax-14 uax-9
"with-output-to-stream" "with-shadowed-bindings")
26. prove
26.1. Summary
homepage | Eitaro Fukamachi | MIT | 2020 |
As most readers will know, the author has archived Prove in favor of Rove. Compared to Prove, Rove does have better failure reporting, is faster (but still not even in the middle of the pack) and has added fixtures. It is still missing some of the capabilities that that Prove has such as time limits on tests as well as test functions such as is-type, likes and is-values expression capabilities.
Prove does report all assertion failures in a test and allows user generated diagnostic messages, albeit without the ability to provide variables. Interactive debugging is optional and it does have a *default-slow-threshold*
parameter which defaults to 150 milleconds to handle slow tests.
On the downside, it does not have fixtures and I find the situation with suites and tags to be confusing. I really want to be able to turn off progress reports. Finally, it is somewhat slower than most of the frameworks, but not the orders of magnitude slower that you are faced with using clunit, clunit2 or nst.
26.2. Assertion Functions
ok | is | isnt | is-values | is-type | like |
is-print | is-error | is-expand | pass | fail | skip |
26.3. Usage
- Report Format
Set prove:*debug-on-error* T for invoking CL debugger whenever getting an error during running tests.
Prove has four different reporters (:list, :dot, :tap or :fiveam) with :list being the default) for different formatting. Set
prove:*default-reporter*
to the desired reporter to change the format. Lets take a very basic failing test just to see what the syntax and output looks like.We have inserted diagnostic strings into the first two assertions. The second string has a format directive just to show that prove does not use them in these tests.
(deftest t1-fail (let ((x 1) (y 2)) (ok (= 1 2) "We know 1 is not = 2") (is 3 4 "We know 3 is not ~a" 4) (ok (equal 1 2)) (ok (equal x y))))
If we now use
run-test
:- List Reporter
(run-test 't1-fail) T1-FAIL × We know 1 is not = 2 NIL is expected to be T × We know 3 is not ~a 3 is expected to be 4 × NIL is expected to be T × NIL is expected to be T
- TAP Reporter
(setf *default-reporter* :tap) (run-test 't1-fail) # T1-FAIL not ok 1 - We know 1 is not = 2 # got: NIL # expected: T not ok 2 - We know 3 is not ~a # got: 3 # expected: 4 not ok 3 # got: NIL # expected: T not ok 4 # got: NIL # expected: T not ok 4 - T1-FAIL NIL
- DOT Reporter
(setf *default-reporter* :dot) (run-test 't1-fail) . NIL
- Fiveam Reporter
(setf *default-reporter* :fiveam) (run-test 't1-fail) f #\f
We will stick with the default reporter for the rest of this section.
- List Reporter
- Basics
Prove has a limited number of test functions so we can check them all out in a single test that we know will pass. We will skip the diagnostic strings except for the
like
function since we know everything will pass.(deftest t1 (let ((x 1) (y 2)) (ok (= x 1)) (is #(1 2 3) #(1 2 3) :test #'equalp) (isnt y 3) (is-values (values 1 2) '(1 2)) (is-type #(1 2 3) 'simple-vector) (like "su9" "\\d" "Do we have a digit in the tested string?") (is-print (princ "jabberwok") "jabberwok") (is-error (error 'division-by-zero) 'division-by-zero)))
The
like
test function uses regex for cl-ppcre. The default list reporter will turn this around as:(run-test 't1) T1 ✓ T is expected to be T ✓ #(1 2 3) is expected to be #(1 2 3) ✓ 2 is not expected to be 3 ✓ (1 2) is expected to be (1 2) ✓ #(1 2 3) is expected to be a type of SIMPLE-VECTOR ✓ Do we have a digit in the tested string? ✓ (PRINC "jabberwok") is expected to output "jabberwok" (got "jabberwok") ✓ (ERROR 'DIVISION-BY-ZERO) is expected to raise a condition DIVISION-BY-ZERO (got #<DIVISION-BY-ZERO {10027D1673}>) NIL
As you would hope, changing tested functions does not require manually recompiling prove tests. We will skip the proof.
- Edge Cases: Values expressions, loops. closures and calling other tests
- Value expressions
Similar to NST and Parachute, Prove does have special functionality with respect to values expressions and can look at the individual values coming from a values expression.
(deftest t2-values (is-values (values 1 2) '(1 2)) ; passes (is-values (values 1 2) '(1 3)) ; fails (ok (equalp (values 1 2) (values 1 3)))) ; passes
- Looping and closures.
Prove has no problems with looping and taking variables declared in a closure surrounding the test. The following passes.
(let ((l1 '(#\a #\B #\z)) (l2 '(97 66 122))) (deftest t2-loop (loop for x in l1 for y in l2 do (ok (= (char-code x) y)))))
- Calling other tests
Prove has a subtest macro which is intended to allow tests to be nested. So, for example, I compile the following and I get a nice indented report.
(subtest "sub-1" (is 1 1) (is 1 2) (subtest "sub-2" (is 'a 'a) (is 'a 'b))) sub-1 ✓ 1 is expected to be 1 × 1 is expected to be 2 sub-2 ✓ A is expected to be A × A is expected to be B NIL
- Value expressions
- Suites, tags and other multiple test abilities
- Lists of tests
Prove does not provide a way to run against lists of tests.
- Suites
In spite of the fact that all my examples above are really done in the REPL, prove is at heart based on files of tests. So even without looking for "suite" functions or classes, each file is effectively a suite. Each package is also considered a suite and the macro
subtest
also creates a suite.Prove provides multiple functions to run different sets of tests.
run
runs a test which can be a file pathname, a directory pathname or an asdf system name.run-test
runs a single test as we have been using above.run-test-all
runs all the tests in the current packagerun-test-package
runs all the tests in a specific packagerun-test-system
runs a testing ASDF system.
- Lists of tests
- Fixtures and Freezing Data
None
- Removing tests
Prove has
remove-test
andremove-test-all
- Sequencing, Random and Failure Only
Prove tests run sequentially and I do not see any shuffle or random order functionality. I also do not see a way to collect just the failing tests to be able to rerun just those.
- Skip Capability
Prove can skip a specified number of tests using the skip function. Unfortunately it marks them as passed rather than skipped. You can provide a string as to why they were skipped, but why mark them as passed? In fact, why do you need to be counting tests? You should be able to mark particular tests as skip.
(skip 3 "No need to test these on Mac OS X") ;-> ✓ No need to test these on Mac OS X (Skipped) ; ✓ No need to test these on Mac OS X (Skipped) ; ✓ No need to test these on Mac OS X (Skipped)
- Random Data Generators
26.4. Discussion
Prove has a lot of "market share", but I am not sure how much of that is due to cl-project and some of the other libraries by Eitaro Fukamachi like caveman2 and clack that hard code prove into what you are building. Whether you like prove or not, at least it was an attempt to get people to actually test their code.
In spite of the fact that the author has archived prove and stated that rove is now the successor, his libraries have not moved over to rove and prove still has functionality lacking in rove (and vice versa).
If I were to use prove, I would write another test reporter that did not have progress reports and would return a list of just failing tests. I would still have to write my own fixture macros. Or I could just use a framework that does that.
26.5. Who Uses Prove?
Many libraries on quicklisp use prove. If you have quicklisp, you can get a list of those with:
(ql:who-depends-on :prove)
27. ptester
27.1. Summary
homepage | Kevin Layer | LLGPL | 2016 |
Ptester was released by Allegro. Phil Gold's commentary on ptester in his 2007 blog is still relevant today. "Ptester is barely a test framework. It has no test suites and no test functions. All it provides is a set of macros for checking function results (test (analogous to lisp-unit:assert-equal), test-error, test-no-error, test-warning, and test-no-warning) and a wrapper macro designed to enclose the test clauses which merely provides a count of success and failures at the end. ptester expects that all testing is done in predefined functions and lacks the dynamic approach present in other frameworks."
Yes, you can do testing with it, but you can do much better with other frameworks.
27.2. Assertion Functions
None - normal CL predicates resolving to T or nil
27.3. Usage
- Report Format
Reporting or interactivity is optional. Set
*break-on-test-failures*
if you want to go into interactive debugging when a test failure occurs. - Basics
The
test
macro by default applieseql
to the subsequent arguments. This can be changed by specifying the actual test to use. The following including assertions about errors and warnings. The one item that might need a little explanation is the values test where we can explicitly flag to the test that it needs to be looking at multiple values.(with-tests (:name "t1") (test 1 1) (test 'a 'a) (test "ptester" "ptester" :test 'equal) (test '(a b c) (values 'a 'b 'c) :multiple-values t) (test-error (error 'division-by-zero) :condition-type 'division-by-zero) (test-warning (warn "foo"))) Begin t1 test ********************************** End t1 test Errors detected in this test: 0 Successes this test:6
Now with a deliberately failing test. No, you cannot compare two values expressions with each other.
(with-tests (:name "t2") (let ((x 2) (y 'd)) (test x 1) (test y 'a) (test "ptester" "ptester" :test 'equal) (test '(values 'a 'b 'c) (values 'a 'b 'c) :multiple-values t) (test-error (error 'division-by-zero) :condition-type 'floating-point-overflow) (test-warning (warn "foo")))) Begin t2 test * * * UNEXPECTED TEST FAILURE * * * Test failed: 1 wanted: 2 got: 1 * * * UNEXPECTED TEST FAILURE * * * Test failed: 'A wanted: D got: A * * * UNEXPECTED TEST FAILURE * * * Test failed: (VALUES 'A 'B 'C) wanted values: VALUES, 'A, 'B, 'C got values: A, B, C * * * UNEXPECTED TEST FAILURE * * * Test failed: (ERROR 'DIVISION-BY-ZERO) Reason: detected an incorrect condition type. wanted: FLOATING-POINT-OVERFLOW got: #<SB-PCL::CONDITION-CLASS COMMON-LISP:DIVISION-BY-ZERO> ********************************** End t2 test Errors detected in this test: 4 UNEXPECTED: 4 Successes this test:2
- Edge Cases: Closures and calling other tests
Ptester has no problem dealing with variables declared in a closure encompassing the test or with loops..
Since ptester does not have a callable test "instance", a ptester test cannot call another test.
- Suites, tags and other multiple test abilities
- Fixtures and Freezing Data
None
- Removing tests
None
- Sequencing, Random and Failure Only
None
- Skip Capability
None
- Random Data Generators
None
27.4. Discussion
You can do better with other frameworks.
27.5. Who depends on ptester?
("cl-base64-tests" "getopt-tests" "puri-tests")
28. rove
28.1. Summary
homepage | Eitaro Fukamachi | BSD 3 Clause | 2020 |
If you use package-inferred systems, there may be more capabilities than if you do not. Without a package-inferred system, you get no consolidated summary of all the tests and you may have to write your tests differently than if you have a package-inferred system. It does have fixtures that can be used once per package or once per test, but there is no ability to use different fixtures with respect to different tests and no composable fixtures. In addition,signal testing seems incomplete compared to other frameworks.
As noted in the functionality tables, there have been reports that rove crashes with multithreaded results. See https://tychoish.com/post/programming-in-the-common-lisp-ecosystem/: "rove doesn't seem to work when multi-threaded results effectively. It's listed in the readme, but I was able to write really trivial tests that crashed the test harness."
Rove is sensitive to how you write the tests, at least if compiled with BSCL. The first way I wrote the tests triggered memory errors I could not recover from (but only in SBCL, not with CCL). The second way I wrote the tests worked perfectly fine. Your mileage may vary.
Compared to Prove, Rove does have better failure reporting, is faster (but still not even in the middle of the pack) and has added fixtures. It is still missing some of the capabilities that that Prove has such as time limits on tests as well as test functions such as is-type, likes and is-values expression capabilities.
Given the multithreaded concerns, the issue I had with benchmarking and the missing functionality both with respect to non-package-inferred systems and in comparison to Prove, I cannot recommend Rove.
28.2. Assertion Functions
As mentioned above, Rove does not have as many assertion functions as Prove, the library it is supposed to be replacing. The assertion functions are limited to:
ok | ng (not-good?) | signals | outputs | expands | pass | fail |
28.3. Usage
- Report Format
It has three different styles of reporting. The default style is the detailed :spec style. A simpler style that just shows dot progression is :dot and a style that just reports the result is :none. Turning off progress reporting is using the :none style. We show them all in the first basic passing test.
To go interactive rather than reporting (setf rove:*debug-on-error* t).
- Basics
Starting off with a basic multiple passing assertion test. We have added a macro and an assertion that uses the
expands
capability provided by Rove. I admit to not being entirely clear whydeftest
andtesting
are separate macros. Adding the testing macro allows a description string, but I am not seeing other additional functionality. Can anyone hit me with a clue stick?(defmacro defun-addn (n) (let ((m (gensym "m"))) `(defun ,(intern (format nil "ADD~A" n)) (,m) (+ ,m ,n)))) (deftest t1 (testing "Basic passing test" (ok (equal 1 1)) (ok (signals (error 'division-by-zero) 'division-by-zero)) (ng (equal 1 2)) (ok (expands '(defun-addn 10) `(defun add10 (#:m) (+ #:m 10)))))) (rove:run-test 't1) t1 Basic passing test ✓ Expect (EQUAL 1 1) to be true. ✓ Expect (ERROR 'DIVISION-BY-ZERO) to signal DIVISION-BY-ZERO. ✓ Expect (EQUAL 1 2) to be false. ✓ Expect '(DEFUN-ADDN 10) to be expanded to `(DEFUN ADD10 (#:M) (+ #:M 10)). ✓ 1 test completed T
You can add a :compile-at keyword parameter to deftest. The available options are :definition-time (the default) or :run-time.
You can add a :style keyword parameter to
run-test
to get different formats. The above was the default :spec style. Below we show the :dot and :none styles.(rove:run-test 't1 :style :dot) .... ✓ 1 test completed T (rove:run-test 't1 :style :none) T
On to a failing test. In this case we pass a diagnostic string on to the first two assertions. Rove does not allow variables to be passed to the diagnostic string. In the :spec style, Rove will show the parameters were that were provided to the second assertion that failed.
(deftest t2 (testing "Basic failing test" (let ((x 1) (y 2)) (ok (equal 1 2) "we know 1 is not equal to 2") (ok (equal x y) "we know ~a is not equal to ~a") (ok (equal (values 1 2) (values 1 2)))))) T2 ROVE> (run-test 't2) t2 Basic failing test × 0) we know 1 is not equal to 2 × 1) we know ~a is not equal to ~a ✓ Expect (EQUAL (VALUES 1 2) (VALUES 1 2)) to be true. × 1 of 1 test failed 0) t2 › Basic failing test we know 1 is not equal to 2 (EQUAL 1 2) 1) t2 › Basic failing test we know ~a is not equal to ~a (EQUAL X Y) X = 1 Y = 2
Rove does not require you to manually recompile a test after a tested function has been modified.
- Edge Cases: Values Expressions, loops. closures and calling other tests
- Values Expressions
Unlike Prove (or Lift or Parachute) Rove has no special functionality for dealing with values expressions. It accepts values expressions but only compares the first value in each. Thus the following passes:
(deftest t2-values-expressions (testing "values expressions" (ok (equalp (values 1 2) (values 1 3))) (ok (equalp (values 1 2 3) (values 1 3 2 7)))))
- Looping and closures
Rove has no problem looping through assertions pulling the variables from a closure.
(let ((l1 '(#\a #\B #\z)) (l2 '(97 66 122))) (deftest t2-loop (loop for x in l1 for y in l2 do (ok (= (char-code x) y)))))
- Tests calling tests
Rove tests can call other Rove tests. As with most frameworks, this results in two test results rather than a combined test result.
- Values Expressions
- Conditions
We saw in the first basic passing test Rove checking an error condition. I want to show what happens when an error condition fails because the result is different depending on whether the assertion function is
ok
orng
. It does not throw you into the debugger because we have*debug-on-error*
set to nil, but shows the typical debugger output.(deftest t7-wrong-error (ok (signals (error 'floating-point-overflow) 'division-by-zero))) (rove:run-test 't7-wrong-error) t7-wrong-error × 0) Expect (ERROR 'FLOATING-POINT-OVERFLOW) to signal DIVISION-BY-ZERO. (3333ms) × 1 of 1 test failed 0) t7-wrong-error Expect (ERROR 'FLOATING-POINT-OVERFLOW) to signal DIVISION-BY-ZERO. FLOATING-POINT-OVERFLOW: arithmetic error FLOATING-POINT-OVERFLOW signalled (SIGNALS (ERROR 'FLOATING-POINT-OVERFLOW) 'DIVISION-BY-ZERO) 1: ((FLET "H0" :IN #:DROP-THRU-TAG-2) arithmetic error FLOATING-POINT-OVERFLOW signalled) 2: (SB-KERNEL::%SIGNAL arithmetic error FLOATING-POINT-OVERFLOW signalled) 3: (ERROR FLOATING-POINT-OVERFLOW) 4: ((LABELS ROVE/CORE/ASSERTION::MAIN :IN #:DROP-THRU-TAG-2)) 5: ((FLET "MAIN0" :IN #:DROP-THRU-TAG-2)) 6: ((LAMBDA NIL)) 7: ((LAMBDA NIL :IN RUN-TEST)) 8: ((:METHOD ROVE/REPORTER:INVOKE-REPORTER (T T)) #<SPEC-REPORTER PASSED=0, FAILED=1> #<FUNCTION (LAMBDA NIL :IN RUN-TEST) {102D9E34FB}>) 9: (SB-INT:SIMPLE-EVAL-IN-LEXENV (RUN-TEST (QUOTE T7-WRONG-ERROR)) #<NULL-LEXENV>) 10: (EVAL (RUN-TEST (QUOTE T7-WRONG-ERROR))) 11: (SWANK::EVAL-REGION (rove:run-test 't7-wrong-error) ) 12: ((LAMBDA NIL :IN SWANK-REPL::REPL-EVAL)) 13: (SWANK-REPL::TRACK-PACKAGE #<FUNCTION (LAMBDA NIL :IN SWANK-REPL::REPL-EVAL) {102D9E317B}>) 14: (SWANK::CALL-WITH-RETRY-RESTART Retry SLIME REPL evaluation request. #<FUNCTION (LAMBDA NIL :IN SWANK-REPL::REPL-EVAL) {102D9E311B}>) 15: (SWANK::CALL-WITH-BUFFER-SYNTAX NIL #<FUNCTION (LAMBDA NIL :IN SWANK-REPL::REPL-EVAL) {102D9E30FB}>)
I am surprised, however, at the results if we change the assertion from
ok
tong
. We know it is going to be the wrong error, so I would have expected theng
assertion function to return a pass. But it does not.(deftest t7-wrong-error-NG (ng (signals (error 'floating-point-overflow) 'division-by-zero))) × 0) Expect (ERROR 'FLOATING-POINT-OVERFLOW) not to signal DIVISION-BY-ZERO. × 1 of 1 test failed 0) t7-wrong-error-ng Expect (ERROR 'FLOATING-POINT-OVERFLOW) not to signal DIVISION-BY-ZERO. FLOATING-POINT-OVERFLOW: arithmetic error FLOATING-POINT-OVERFLOW signalled (SIGNALS (ERROR 'FLOATING-POINT-OVERFLOW) 'DIVISION-BY-ZERO) ...
signals
returns true or an error.ng
accepts T or nil and getting back an error triggers an error rather than a failure. - Suites, tags and other multiple test abilities
- Lists of tests
Rove does not run lists of tests.
- Suites
Rove's RUN-SUITE function will run all the tests in a particular package but does not accept a style parameter and simply prints out the results of each individual test, without summarizing.
Rove's RUN function does accept a style parameter but seems to handle only package-inferred systems. I confirm issue #42 that it will not run with non-package inferred systems.
Since the author really likes the lots of packages style of structuring CL programs, I would not be surprised if he recommends having lots of test packages as the equivalent of how other testing frameworks treat suites of tests.
(run-suite :tf-rove)
- Lists of tests
- Fixtures and Freezing Data
Rove provides SETUP for fixtures that are done once only in a package and TEARDOWN for cleanup. For a fixture that should be run before and after every test, Rove provides DEFHOOK.
(defparameter *my-var-suite* 0) (defparameter *my-var-hook* 0) (setup (incf *my-var-suite*)) (teardown (format t "Myvar ~a~%" *my-var-suite*)) (defhook :before (incf *my-var-hook*) :after (format t "My-var-hook ~a~%" *my-var-hook*))
- Removing tests
None apparently
- Sequencing, Random and Failure Only
Everything is just done in sequential order. There is no obvious way to collect and run just failed tests.
- Skip Capability
- Random Data Generators
None
28.4. Additional Discussion Points
The author claims rove is the successor to prove and cites the following differences. Rove supports package-inferred systems, has fewer dependencies, reports details of failure tests, has thread support and has fixtures.
Rove is clearly targetted at package-inferred systems. In fact some of the functionality doesn't work unless your system is package-inferred. Personally I do not like package-inferred systems. Other people have the completely opposite view. In any event I did not test any of the frameworks with a package inferred system so I cannot comment on whether they work or do not work in that circumstance.
To show that Rove actually is improved over Prove with respect to reporting details on failure, the following shows first prove, then rove on a simple
(let ((x 1) (y 2))
(deftest t35
(ok (= x y))))
Running with Prove
(run-test 't35) T34-PROVE × NIL is expected to be T
Now Rove:
(run-test 't35) t35 × 0) Expect (= X Y) to be true. × 1 of 1 test failed 0) t34-rove Expect (= X Y) to be true. (= X Y) X = 1 Y = 2 NIL
Both prove and rove would have accepted diagnostic message strings in the assertion.
On the whole, my concerns expressed in the summary still stand. There are better frameworks out there.
28.5. Who Uses
Many libraries on quicklisp use rove. If you have quicklisp, you can get a list of those with:
(ql:who-depends-on :rove)
29. rt
29.1. Summary
Kevin M. Rosenberg | MIT | 2010 |
RT reminds me of Ptester (I wonder why) and is a part of CL history. See, e.g. Supporting the Regression Testing of Lisp Programs in 1991. Tests are limited to a single assertion and everything seems to be an A-B comparison using EQUAL. While you might think it is just of historical significance, there are still a surprising number of packages in quicklisp (29 at last count) that use it including major packages like ironclad, cffi, usocket, clsql and anaphora.
29.2. Assertion Functions
RT's tests do not accept multiple assertions. The test itself acts as an assertion that the included form is true or nil.
29.3. Usage
- Report Format and Basics
We start with a basic passing test just to show the reporting.
(deftest t1 (= 1 1) t) (do-test 't1) T1 ; processing (DEFTEST T4 ...)
Now a deliberately failing test:
(deftest t1-fail (= 1 2) t) (do-test 't1-fail) Test T1-FAIL failed Form: (= 1 2) Expected value: T Actual value: NIL.
- Multiple assertions, loops. closures and calling other tests
RT tests do not handle multiple assertions, loops, closures or calling other tests
- Suites, tags and other multiple test abilities
- Lists of tests
RT cannot directly handle lists of tests (although you could loop through list, the results would not be composable)
- Suites
RT does not have suites per se. You can run all the tests that have been defined using the
DO-TESTS
function. By default it prints to*standard-output*
but accepts an optional stream parameter which would allow you to redirect the results to a file or other stream of your choice.do-tests
will print the results for each individual test and then summarize with something like the following:5 out of 8 total tests failed: T4, T1-FAIL, T1-FUNCTION, T2-LOOP, T2-LOOP-CLOSURE.
- Lists of tests
- Fixtures and Freezing Data
None, although rt's package which tests rt has a setup macro that could have been placed in the rt package to use for fixtures. You could use it for reference in writing your own.
- Removing tests
RT has functions for
rem-test
andrem-all-tests
. - Sequencing, Random and Failure Only
RT runs tests in their order of original definition.
- Skip Capability
None
- Random Data Generators
None
29.4. Discussion
While it is still used in major projects, I think Parachute or Fiasco would be better if you are starting a new project.
29.5. Who depends on rt?
("anaphora" "cffi" "cl-azure" "cl-cont" "cl-irc" "cl-performance-tuning-helper" "cl-photo" "cl-sentiment" "cl-store" "clsql" "cxml-stp/" "hyperobject" "infix-dollar-reader" "ironclad" "kmrcl" "lapack" "lml" "lml2" "narrowed-types" "nibbles/s" "osicat" "petit.string-utils" "qt" "quadpack" "trivial-features" "trivial-garbage" "umlisp" "usocket" "xhtmlgen")
30. should-test
30.1. Summary
homepage | Vsevolod Dyomkin | MIT | 2019 |
Should test is pretty basic. It will report all the failing assertions in a test and does offer the opportunity to provide diagnostic strings to assertions, albeit without variables. It does offer the opportunity to just run the tests that failed last time, so you do not have to run through all the tests in the package every time. Unfortunately you cannot turn off progress reporting, you cannot go interactive into the debugger, it has no fixture capacity and cannot run lists of tests. Its suite capability are limited to you creating separate packages.
30.2. Assertion Functions
30.3. Usage
- Report Format
The summary report will contain full failure reports if
*verbose*
is set to T (the default) or just test names otherwise.There is no optionality with respect to reporting or interactive. It is all reporting.
- Basics
The basic all assertions passing test showing use of both
be
andsignal
. Callingtest
with the keyword parameter:test
enables us to specify the test to be run. One item that is not clear is function of the empty form following the test name.(deftest t1 () (should be = 1 1) (should signal division-by-zero (error 'division-by-zero))) (test :test 't1) Test T1: OK T
It just reported that the entire test passed.
Now a basic failing test. This should have two failing assertions and one passing assertion. We put a diagnostic string in the first test which shows in the result but it does not allow us to insert variables into the string.
(deftest t1-fail () "describe t1-fail" (let ((x 1)(y 2)) (should be = x y "intentional failure x ~a y ~a" x y) (should be = (+ x 2) (+ x 3)) (should be equal (values 1 2) (values 1 2)) (should signal division-by-zero (error 'floating-point-overflow)))) (test :test 't1-fail) Test T1-FAIL: Y FAIL expect: 1 2 "intentional failure x ~a y ~a" 1 actual: 2 (+ X 3) FAIL expect: 3 actual: 4 (ERROR 'FLOATING-POINT-OVERFLOW) FAIL expect: DIVISION-BY-ZERO actual: #<FLOATING-POINT-OVERFLOW {1009B89F63}> FAILED NIL (#<FLOATING-POINT-OVERFLOW {1009B89F63}> (4) (2)) NIL
Should-test has no special functionality for dealing with values expressions. It does accept them but as you would expect only looks at the first value in each values epxression. The following will pass.
(deftest t1-unequal-values () (should be equal (values 1 2) (values 1 3)))
We get the expected and actual values without the extra blank lines that annoy me in fiveam. The list at the end shows the specific actual assertion values that failed.
If we had set
*verbose*
to nil we would have just gotten the last three lines of the report.Test T1-FAIL: FAILED NIL ((4) (2)) NIL
Should-test handles redefinitions of tested functions without forcing you to manually recompile the test. We will skip the proof.
- Edge Cases: Closures and calling other tests
- Looping and closures.
Should-test cannot access the variables declared in a closure encompassing the test. This does not work:
(let ((l1 '(#\a #\B #\z)) (l2 '(97 66 122))) (deftest t2-loop-closure () (loop for x in l1 for y in l2 do (should be = (char-code x) y))))
- Calling other tests
Suppose you defined a test which also calls another test.
(deftest t3 () (should be = 1 1) (test :test 't1-fail))
We know that t1-fail will fail. Will embedding it in test t3 cause t3 to fail as well? Yes.
Test T3: Test T1-FAIL: Y FAIL expect: 1 2 "intentional failure x ~a y ~a" 1 actual: 2 (+ X 3) FAIL expect: 3 actual: 4 (ERROR 'FLOATING-POINT-OVERFLOW) FAIL expect: DIVISION-BY-ZERO actual: #<FLOATING-POINT-OVERFLOW {100A218BC3}> FAILED FAILED NIL (#<FLOATING-POINT-OVERFLOW {100A218BC3}> (4) (2)) NIL
- Looping and closures.
- Suites, tags and other multiple test abilities
- Lists of tests
Should-test does not handle lists of tests.
- Suites
The
test
function for Should-test runs all the tests in the current package by default. As you have seen above, giving it a :test keyword parameter will trigger just the named test. Giving it a :package keyword parameter will cause it to run all the tests in the specified package. The :failed key totest
will re-test only the tests which failed at their last run. All in all, there are better frameworks.(test :failed t)
- Lists of tests
- Fixtures and Freezing Data
None
- Removing tests
None
- Sequencing or Random
Should-test will run tests in the same order each time (no shuffle capability). As noted in the suite discussion, it is one of the few frameworks to have failure only functionality built in.
(test :failed t)
- Skip Capability
None
- Random Data Generators
None
30.4. Who Depends on Should-test
("cl-redis-test" "mexpr-tests" "rutils-test)
31. simplet
31.1. Summary
homepage | Noloop | GPLv3 | 2019 |
In simplet, tests can have only one assertion that they get from some function and suites can take multiple tests. From the standpoint of other frameworks, simplet "tests" are the assertion clauses and simplet "suites" are the way to package multiple assertions. If a suite has no tests, or a test has no function returning T or NIL, they are marked "PENDING".
Simplet's run
function takes only an optional parameter to return a string rather than printing to the REPL.
I am just going to show one example of usage and leave it at that. Given all the functionality in other frameworks, I cannot recommend.
(suite "suite 2" (test "one" #'(lambda () (let ((x 1)) (= x 1)))) (test "two" #'(lambda ()(eq 'a 'a))) (test "three" #'(lambda ()(= 2 1))) (test "four" #'(lambda ()(= 1 1)))) (#<FUNCTION (LAMBDA () :IN NOLOOP.SIMPLET::CREATE-SUITE) {100A391A6B}>) (run) #...Simplet...# one: T two: T three: NIL four: T ----------------------------------- suite 2: NIL Runner result: NIL NIL
The author uses simplet in testing assert-p, eventbus and skeleton-creator
32. tap-unit-test
32.1. Summary
homepage | Christopher K. Riesbeck, John Hanley | MIT | 2017 |
Tap-unit-test is a version of a slightly older version of lisp-unit with TAP reporting. There have not been any real updates since 2011 and I cannot find anyone using it, so I would simply look to either lisp-unit or lisp-unit2 if you like their approach to things.
32.2. Assertion Functions
assert-eq | assert-eql | assert-equal | assert-equality |
assert-equalp | assert-error | assert-expands | assert-false |
assert-prints | assert-true | fail | logically-equal |
set-equal | unordered-equal |
32.3. Usage
- Report Format and basic syntax
TAP-unit-test defaults to a reporting format shown below. You can do
(setf *use-debugger* :ask)
or(setf *use-debugger* t)
, but that will only throw you into the debugger if there is an actual error generated, not a failure (or failure to see the correct error).We can start with a basic failing test to show the reporting format. We will provide a diagnostic string in the first assertion. Tap-unit-test has an unordered-equal assertion helper that might be useful for some which is shown in this example:
(define-test t1-fail "describe t1-fail" (let ((x 1)) (assert-true (= x 2) "Deliberate failure. We know 2 is not ~a" x) (assert-equal x 3) (assert-true (unordered-equal '(3 2 1 1) '(1 2 3 2))) ; Return true if l1 is a permuation of l2. (assert-true (set-equal '(a b c d) '(b a c c))) ;every element in both sets needs to be in the other (assert-error 'division-by-zero (error 'floating-point-overflow) "testing condition assertions") (assert-true (unordered-equal '(3 2 1) '(1 3 4))) (assert-true (logically-equal t nil)) ; both true or both false (assert-true (logically-equal nil t)))) ; both true or both false
Unlike lisp-unit, when you call
run-tests
in tap-unit-test, you call unquoted test names, even when you are running it on several tests. Also note that it does not return any type of object as a test result. If we now run it we get the following report:(run-tests t1-fail) T1-FAIL: (= X 2) failed: Expected T but saw NIL "Deliberate failure. We know 2 is not ~a" => "Deliberate failure. We know 2 is not ~a" X => 1 T1-FAIL: 3 failed: Expected 1 but saw 3 T1-FAIL: (UNORDERED-EQUAL '(3 2 1 1) '(1 2 3 2)) failed: Expected T but saw NIL T1-FAIL: (SET-EQUAL '(A B C D) '(B A C C)) failed: Expected T but saw NIL T1-FAIL: (ERROR 'FLOATING-POINT-OVERFLOW) failed: Should have signalled DIVISION-BY-ZERO but saw #<FLOATING-POINT-OVERFLOW {100D41D203}> "testing condition assertions" => "testing condition assertions" T1-FAIL: (UNORDERED-EQUAL '(3 2 1) '(1 3 4)) failed: Expected T but saw NIL T1-FAIL: (LOGICALLY-EQUAL T NIL) failed: Expected T but saw NIL T1-FAIL: (LOGICALLY-EQUAL NIL T) failed: Expected T but saw NIL T1-FAIL: 0 assertions passed, 8 failed. NIL
Tap-unit test does not need to manually recompile tests when a tested function is modified. We will skip the proof.
- Edge Cases: Value expressions, closures and calling other tests
- Values expressions
Tap-unit-test has no special functionality for dealing with values expressions. It does accept them as input, but as expected, only compares the first value in a values expression.
(define-test t2 "describe t2" (assert-equal 1 2) (assert-equal 2 3) (assert-equalp (values 1 2) (values 1 2))) (run-tests t2)
We get what we expected, two failing assertions and one passing assertion. Does tap-unit-test follow the lisp-unit ability to actually look at all members of the values expression or just the first one? Yes. So far only the two lisp-units and tap-unit-test actually compared each item in two values expressions.
(define-test t2-values-expressions (assert-equal (values 1 2) (values 1 3)) (assert-equal (values 1 2 3) (values 1 3 2))) (run-tests t2-values-expressions) T2-VALUES-EXPRESSIONS: (VALUES 1 3) failed: Expected 1; 2 but saw 1; 3 T2-VALUES-EXPRESSIONS: (VALUES 1 3 2) failed: Expected 1; 2; 3 but saw 1; 3; 2 T2-VALUES-EXPRESSIONS: 0 assertions passed, 2 failed.
- Closures
Unfortunately no luck with closure variables. It does, however, handle looping through assertions if the variables are dynamic or defined within the test. We will skip the proof.
- Values expressions
- Calling another test
While tests are not functions in tap-unit-test, they can call other tests.
(define-test t3 () "describe t3" (assert-equal 'a 'a) (run-tests t2)) (run-tests t3) T2: 2 failed: Expected 1 but saw 2 T2: 3 failed: Expected 2 but saw 3 T2: 1 assertions passed, 2 failed. T3: 1 assertions passed, 0 failed.
- Edge Cases: Value expressions, closures and calling other tests
- Suites, tags and other multiple test abilities
- Lists of tests
As mentioned earlier, tap-unit-test uses unquoted test names and does not return any kind of test-results object. Running multiple specific tests would look like the following.
(run-tests t7-bad-error t1-fail) T7-BAD-ERROR: (ERROR 'FLOATING-POINT-OVERFLOW) failed: Should have signalled DIVISION-BY-ZERO but saw #<FLOATING-POINT-OVERFLOW {100BEDBFA3}> "testing condition assertions. This should fail" => "testing condition assertions. This should fail" T7-BAD-ERROR: 0 assertions passed, 1 failed. T1-FAIL: Y failed: Expected 1 but saw 2 12 T1-FAIL: 1 assertions passed, 1 failed. TOTAL: 1 assertions passed, 2 failed, 0 execution errors.
- Packages
If you want to run all the tests in a package, just call
run-tests
with no parameters - Suites and Tags
Tap-unit-test has no suites or tags capability
- Lists of tests
- Fixtures and Freezing Data
None
- Removing tests
Tap-unit-test has a
remove-tests
function which actually does take a quoted list of test names unlike some of the other functions which use unquoted names.(remove-tests '(t1 t2))
- Sequencing, Random and Failure Only
None
- Skip Capability
- Generators
Tap-unit-test has a
make-random-state
function for generating random data. See example below:(make-random-state) #S(RANDOM-STATE :STATE #.(MAKE-ARRAY 627 :ELEMENT-TYPE '(UNSIGNED-BYTE 32) :INITIAL-CONTENTS '(0 2567483615 454 2531281407 4203062579 3352536227 284404050 622556438 ...)))
32.4. Discussion
Basically lisp-unit and lisp-unit2 have moved on and tap-unit-test exists for historical reasons. There are enough syntactic differences that if someone is using it for an existing code base, pulling it out of quicklisp could be a breakage. No one is using it As far as I can tell.
33. try
33.1. Summary
homepage | Gábor Melis | MIT | 2022 |
Try's self description is:
"Try is what we get if we make tests functions and build a test framework on top of the condition system as Stefil did but also address the issue of rerunning and replaying, make the IS check more capable, use the types of the condition hierarchy to parameterize what to debug, print, rerun, and finally document the whole thing."
"Try is a library for unit testing with equal support for interactive and non-interactive workflows. Tests are functions, and almost everything else is a condition, whose types feature prominently in parameterization."
33.2. Assertion Functions
is | ||||
expected-result-success | unexpected-result-success | expected-result-failure | unexpected-result-failure |
33.3. Usage
- Basics and Report Format
- Interactive
Each defined test is a lisp function that records its execution in "trial" objects. There are different defaults intended for interactive and non-interactive modes. When a test is called as a Lisp function, the interactive defaults are used and unexpected failures invoke the debugger. When a test is called with
TRY
, the non-interactive defaults are used and the debugger is not invoked. In more detail, when the debugger is to invoked is determined by*DEBUG*
and*TRY-DEBUG*
, which are both event types.- Basic Passing Test
Here we define a test that takes one parameter and then call the test with a parameter we know will pass and see the results.
(deftest test1-pass (x) (is (= x 1)))
We can either funcall the test or just call it directly (test1-pass 1)
(test1-pass 1) TEST1-PASS ⋅ (IS (= X 1)) ⋅ TEST1-PASS ⋅1 #<TRIAL (TEST1-PASS 1) EXPECTED-SUCCESS 0.000s ⋅1>
The first line shows the beginning of a defined test, the second the successful assertion, the third the cumulative test result and the fourth indicates what was recorded in the trial object. If you look at the last three lines, the dot before the 1 is a "marker" that indicates an a "category" of expected success and the 1 represents the number of expected successes actually found.
The total list of categories and their markers is:
((abort* :marker "⊟") (unexpected-failure :marker "⊠") (unexpected-success :marker "⊡") (skip :marker "-") (expected-failure :marker "×") (expected-success :marker "⋅"))
- Basic Failing Test
The following has two assertions, both of which fail. In this case we will funcall the test and it will throw us into the debugger with restarts. We would get the same result if we just called it as a normal function (t1-fail):
(deftest t1-fail () (let ((x 1) (y 2)) (is (equal 1 2)) (is (= x y) :msg "Intentional failure x does not equal y" :ctx ("*PACKAGE* is ~S and *PRINT-CASE* is ~S~%" *package* *print-case*)))) (funcall 't1-fail) UNEXPECTED-FAILURE in check: (IS (EQUAL 1 2)) [Condition of type UNEXPECTED-RESULT-FAILURE] Restarts: 0: [RECORD-EVENT] Record the event and continue. 1: [FORCE-EXPECTED-SUCCESS] Change outcome to TRY:EXPECTED-RESULT-SUCCESS. 2: [FORCE-UNEXPECTED-SUCCESS] Change outcome to TRY:UNEXPECTED-RESULT-SUCCESS. 3: [FORCE-EXPECTED-FAILURE] Change outcome to TRY:EXPECTED-RESULT-FAILURE. 4: [ABORT-CHECK] Change outcome to TRY:RESULT-ABORT*. 5: [SKIP-CHECK] Change outcome to TRY:RESULT-SKIP. 6: [RETRY-CHECK] Retry check. 7: [ABORT-TRIAL] Record the event and abort trial UAX-15-TRY-TESTS::T1-FAIL. 8: [SKIP-TRIAL] Record the event and skip trial UAX-15-TRY-TESTS::T1-FAIL. 9: [RETRY-TRIAL] Record the event and retry trial UAX-15-TRY-TESTS::T1-FAIL. 10: [SET-TRY-DEBUG] Supply a new value for :DEBUG of TRY:TRY. 11: [RETRY] Retry SLIME REPL evaluation request. 12: [*ABORT] Return to SLIME's top level. 13: [ABORT] abort thread (#<THREAD "new-repl-thread" RUNNING {1002487C03}>) Backtrace: 0: (TRY::SIGNAL-OUTCOME T FAILURE (:CHECK (IS (EQUAL 1 2)) :ELAPSED-SECONDS 0 :CAPTURES NIL ...)) 1: ((FLET "DEFTEST" :IN T1-FAIL) #<unused argument>) 2: (TRY::CALL-TEST) 3: (T1-FAIL)
If you call restart 13 to abort, the REPL prints out the following:
T1-FAIL ⊟ non-local exit ⊟ T1-FAIL ⊟1
You can see the markers summarizing the conditions being triggered in the line items and the net result.
Now lets tell it that we expect failure
(deftest t1-expected-fail () (with-failure-expected (t) (let ((x 1) (y 2)) (is (equal 1 2)) (is (= x y) :msg "Intentional failure x does not equal y" :ctx ("*PACKAGE* is ~S and *PRINT-CASE* is ~S~%" *package* *print-case*))))) (funcall 't1-expected-fail) T1-EXPECTED-FAIL × (IS (EQUAL 1 2)) × Intentional failure x does not equal y where X = 1 Y = 2 *PACKAGE* is #<PACKAGE "UAX-15-TRY-TESTS"> and *PRINT-CASE* is :UPCASE ⋅ T1-EXPECTED-FAIL ×2 #<TRIAL (T1-EXPECTED-FAIL) EXPECTED-SUCCESS 0.000s ×2>
Again you see the markers showing the conditions triggers, that there were two failures and both were expected.
Now a version where we have two assertions, the first one fails unexpectedly and the second passes:
(deftest t1b-unexpected-fail () (let ((x 1) (y 2)) (is (= x y)) (is (equal 1 x)))) (t1b-unexpected-fail) UNEXPECTED-FAILURE in check: (IS (= X Y)) where X = 1 Y = 2 [Condition of type UNEXPECTED-RESULT-FAILURE] Restarts: 0: [RECORD-EVENT] Record the event and continue. 1: [FORCE-EXPECTED-SUCCESS] Change outcome to TRY:EXPECTED-RESULT-SUCCESS. 2: [FORCE-UNEXPECTED-SUCCESS] Change outcome to TRY:UNEXPECTED-RESULT-SUCCESS. 3: [FORCE-EXPECTED-FAILURE] Change outcome to TRY:EXPECTED-RESULT-FAILURE. 4: [ABORT-CHECK] Change outcome to TRY:RESULT-ABORT*. 5: [SKIP-CHECK] Change outcome to TRY:RESULT-SKIP. 6: [RETRY-CHECK] Retry check. 7: [ABORT-TRIAL] Record the event and abort trial UAX-15-TRY-TESTS::T1B-UNEXPECTED-FAIL. 8: [SKIP-TRIAL] Record the event and skip trial UAX-15-TRY-TESTS::T1B-UNEXPECTED-FAIL. 9: [RETRY-TRIAL] Record the event and retry trial UAX-15-TRY-TESTS::T1B-UNEXPECTED-FAIL. 10: [SET-TRY-DEBUG] Supply a new value for :DEBUG of TRY:TRY. 11: [RETRY] Retry SLIME REPL evaluation request. 12: [*ABORT] Return to SLIME's top level. 13: [ABORT] abort thread (#<THREAD "new-repl-thread" RUNNING {1009C947E3}>)
If we choose RECORD-EVENT, we get the following:
T1B-UNEXPECTED-FAIL ⊠ (IS (= X Y)) where X = 1 Y = 2 ⋅ (IS (EQUAL 1 X)) ⊠ T1B-UNEXPECTED-FAIL ⊠1 ⋅1 #<TRIAL (T1B-UNEXPECTED-FAIL) UNEXPECTED-FAILURE 161.676s ⊠1 ⋅1>
The last line shows two categories of results, one unexpected failure (indicated by the ⊠) and one expected success.
If we choose SKIP-CHECK, we get:
T1B-UNEXPECTED-FAIL - (IS (= X Y)) ⋅ (IS (EQUAL 1 X)) ⋅ T1B-UNEXPECTED-FAIL -1 ⋅1 #<TRIAL (T1B-UNEXPECTED-FAIL) EXPECTED-SUCCESS 10.917s -1 ⋅1>
Now the markers show one failure that was skipped and one expected success.
Now a version where we have two assertions and we expect one specifically to fail:
(deftest t1a-expected-fail () (let ((x 1) (y 2)) (is (equal 1 x)) (with-failure-expected (t) (is (= x y) :msg "Intentional failure x does not equal y" :ctx ("*PACKAGE* is ~S and *PRINT-CASE* is ~S~%" *package* *print-case*))))) T1A-EXPECTED-FAIL (t1a-expected-fail) T1A-EXPECTED-FAIL ⋅ (IS (EQUAL 1 X)) × Intentional failure x does not equal y where X = 1 Y = 2 *PACKAGE* is #<PACKAGE "UAX-15-TRY-TESTS"> and *PRINT-CASE* is :UPCASE ⋅ T1A-EXPECTED-FAIL ×1 ⋅1 #<TRIAL (T1A-EXPECTED-FAIL) EXPECTED-SUCCESS 0.000s ×1 ⋅1>
- Basic Passing Test
- Quiet Reports
The Try function allows you to report only the trial object summary (by passing :print nil):
(try 't1-fail :print nil) #<TRIAL (T1-FAIL) UNEXPECTED-FAILURE 0.000s ⊠2>
or only the unexpected events (by passing
:print 'unexpected
).(try 'test9-pass-and-fail :print 'unexpected) TEST9-PASS-AND-FAIL T1B-UNEXPECTED-FAIL ⊠ (IS (= X Y)) where X = 1 Y = 2 ⊠ T1B-UNEXPECTED-FAIL ⊠1 ⋅1 T1A-UNEXPECTED-FAIL ⊠ Intentional failure x does not equal y where X = 1 Y = 2 *PACKAGE* is #<PACKAGE "UAX-15-TRY-TESTS"> and *PRINT-CASE* is :UPCASE ⊠ T1A-UNEXPECTED-FAIL ⊠1 ⋅1 ⊠ TEST9-PASS-AND-FAIL ⊠2 ⋅6 #<TRIAL (TEST9-PASS-AND-FAIL) UNEXPECTED-FAILURE 0.000s ⊠2 ⋅6>
In general, what to print is parameterized as event types, of which
nil
andunexpected
are two instances.The
try
function does not allow you to pass parameters to the tests, so you would need to wrap those tests inside another test or alambda
, or call the test directly. For example:(deftest test6-pass (x y) (is (= x 1)) (is (= y 2)) (is (not (= x y)))) (deftest test7 () (test6-pass 1 2) (test1-pass 1)) (try 'test7 :print 'unexpected) ==> #<TRIAL (TEST7) EXPECTED-SUCCESS 0.000s ⋅4> (try (lambda () (test6-pass 1 2))) ==> #<TRIAL (TRY #<FUNCTION (LAMBDA ()) {100689750B}>) EXPECTED-SUCCESS 0.000s ⋅3> (test6-pass 1 2) ==> #<TRIAL (TEST6-PASS 1 2) EXPECTED-SUCCESS 0.000s ⋅3>
As you can see from the result, it indicated that there were 4 expected successes in the total combination of tests.
If we had not used
:print 'unexpected
, the result would have looked like:(try 'test7) TEST7 TEST6-PASS ⋅ (IS (= X 1)) ⋅ (IS (= Y 2)) ⋅ (IS (NOT (= X Y))) ⋅ TEST6-PASS ⋅3 TEST5-PASS ⋅ (IS (= X 1)) ⋅ TEST5-PASS ⋅1 ⋅ TEST7 ⋅4 #<TRIAL (TEST7) EXPECTED-SUCCESS 0.000s ⋅4>
Running try on its version of the uax-15 test suite would (a) have printed 343332 lines of successful assertions and the trial object would show the first version running in the emacs slime REPL and the second in a terminal (both under SBCL 2.3.0)
..... 343332 lines later ... #<TRIAL (UAX-SUITE) EXPECTED-SUCCESS 31.773s ⋅343332> ..... 343332 lines later ... #<TRIAL (UAX-SUITE) EXPECTED-SUCCESS 10.887s ⋅343332>
You can see the time discrepancy: Almost 32 seconds in the REPL to run the test and print every assertion and almost 11 seconds in the terminal. Now with
:print 'unexpected
, we get the following (first in the emacs REPL and then in a terminal:(try 'uax-suite :print 'unexpected) #<TRIAL (UAX-SUITE) EXPECTED-SUCCESS 1.527s ⋅343332> #<TRIAL (UAX-SUITE) EXPECTED-SUCCESS 1.460s ⋅343332>
And the test run is down from 32 secs to 1.5 seconds.
- Print Compactly
If we set the
*print-compactly*
variable to t, and run try without the:print 'unexpected
, try would have printed out a single dot for each assertion and the result for the running the uax-15 suite would have been (again, first in the emacs REPL and then running in a terminal):..... name of test ... lots of dots representing passing assertions, concluding in #<TRIAL (UAX-SUITE) EXPECTED-SUCCESS 1.990s ⋅343332> #<TRIAL (UAX-SUITE) EXPECTED-SUCCESS 1.900s ⋅343332>
So printing the dots only saved us 30 seconds of wasted time in the REPL and 8 seconds in the terminal.
- Print Compactly
- Failure reports only
Lets create a test which calls tests we know will fail and tests we know will not fail:
(deftest t1a-unexpected-fail () (let ((x 1) (y 2)) (is (= x y) :msg "Intentional failure x does not equal y" :ctx ("*PACKAGE* is ~S and *PRINT-CASE* is ~S~%" *package* *print-case*)) (is (equal 1 x)))) (deftest t1b-unexpected-fail () (let ((x 1) (y 2)) (is (= x y)) (is (equal 1 x)))) (deftest test8-pass-and-fail () (t1b-unexpected-fail) (test7) (t1a-unexpected-fail))
Now we call try with
test8-pass-and-fail
with the:print 'unexpected
parameters(try 'test8-pass-and-fail :print 'unexpected) TEST8-PASS-AND-FAIL T1B-UNEXPECTED-FAIL ⊠ (IS (= X Y)) where X = 1 Y = 2 ⊠ T1B-UNEXPECTED-FAIL ⊠1 ⋅1 T1A-UNEXPECTED-FAIL ⊠ Intentional failure x does not equal y where X = 1 Y = 2 *PACKAGE* is #<PACKAGE "UAX-15-TRY-TESTS"> and *PRINT-CASE* is :UPCASE ⊠ T1A-UNEXPECTED-FAIL ⊠1 ⋅1 ⊠ TEST8-PASS-AND-FAIL ⊠2 ⋅6 #<TRIAL (TEST8-PASS-AND-FAIL) UNEXPECTED-FAILURE 0.003s ⊠2 ⋅6>
As expected the net result (last line) is 2 unexpected failures and 6 expected successes. We do not get thrown into the debugger, and we only get details of the unexpected failures.
- Streams and Printer
The try function accepts key parameters
:stream
and:printer
to redirect the report printing. - Other Reports
The following shows the condition classes of events which can be signaled and printed:
(let ((*debug* nil) (*print* '(not trial-start)) (*describe* nil)) (with-test (verdict-abort*) (with-test (expected-verdict-success)) (with-expected-outcome ('failure) (with-test (unexpected-verdict-success))) (handler-bind (((and verdict success) #'force-expected-failure)) (with-test (expected-verdict-failure))) (handler-bind (((and verdict success) #'force-unexpected-failure)) (with-test (unexpected-verdict-failure))) (with-test (verdict-skip) (skip-trial)) (is t :msg "EXPECTED-RESULT-SUCCESS") (with-failure-expected ('failure) (is t :msg "UNEXPECTED-RESULT-SUCCESS") (is nil :msg "EXPECTED-RESULT-FAILURE")) (is nil :msg "UNEXPECTED-RESULT-FAILURE") (with-skip () (is nil :msg "RESULT-SKIP")) (handler-bind (((and result success) #'abort-check)) (is t :msg "RESULT-ABORT*")) (catch 'foo (with-test (nlx-test) (throw 'foo nil))) (error "UNHANDLED-ERROR"))) .. VERDICT-ABORT* ; TRIAL-START .. ⋅ EXPECTED-VERDICT-SUCCESS .. ⊡ UNEXPECTED-VERDICT-SUCCESS .. × EXPECTED-VERDICT-FAILURE .. ⊠ UNEXPECTED-VERDICT-FAILURE .. - VERDICT-SKIP .. ⋅ EXPECTED-RESULT-SUCCESS .. ⊡ UNEXPECTED-RESULT-SUCCESS .. × EXPECTED-RESULT-FAILURE .. ⊠ UNEXPECTED-RESULT-FAILURE .. - RESULT-SKIP .. ⊟ RESULT-ABORT* .. NLX-TEST ; TRIAL-START .. ⊟ non-local exit ; NLX .. ⊟ NLX-TEST ⊟1 ; VERDICT-ABORT* .. ⊟ "UNHANDLED-ERROR" (SIMPLE-ERROR) .. ⊟ VERDICT-ABORT* ⊟3 ⊠1 ⊡1 -1 ×1 ⋅1 .. ==> #<TRIAL (WITH-TEST (VERDICT-ABORT*)) ABORT* 0.004s ⊟3 ⊠1 ⊡1 -1 ×1 ⋅1>
- Duration
The try global variable
*print-duration*
defaults to nil. If set to true, the number of second spent during execution on each assertion is also printed. E.g.(let ((*print-duration* t) (*debug* nil) (*describe* nil)) (with-test (timed) (is (progn (sleep 0.3) t)) (is (progn (sleep 0.2) t)) (error "xxx"))) .. TIMED .. 0.300 ⋅ (IS (PROGN (SLEEP 0.3) T)) .. 0.200 ⋅ (IS (PROGN (SLEEP 0.2) T)) .. ⊟ ""xxx (SIMPLE-ERROR) .. 0.504 ⊟ TIMED ⊟1 ⋅2 .. ==> #<TRIAL (WITH-TEST (TIMED)) ABORT* 0.504s ⊟1 ⋅2>
- Interactive
- Edge Cases: Values expressions, loops. closures and calling other tests
- Values expressions
Try handles values expressions with no problem.
(is (match-values (values (1+ 5) "sdf") (= * 0) (string= * "sdf")))
- Now looping and closures
Try has no problem with loops or finding variables that have been set in a closure containing the test.
- Calling a test inside another test
Try has no problem with calling tests inside other tests.
(let ((x 1)) (deftest test3-pass () (is (= x 1)) (is (funcall 'test2 1))) (funcall 'test3-pass)) TEST3-PASS ⋅ (IS (= X 1)) ⋅ (IS (= X Y)) ⋅ (IS (FUNCALL 'TEST2 1)) ⋅ TEST3-PASS ⋅3 #<TRIAL (TEST3-PASS) EXPECTED-SUCCESS 0.000s ⋅3>
- Values expressions
- Float Tests
Try has an extensive library of float comparisons. I will just quote from the documentation:
"Float comparisons following https://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/
[function] FLOAT-~= X Y &KEY (MAX-DIFF-IN-VALUE \*MAX-DIFF-IN-VALUE*) (MAX-DIFF-IN-ULP \*MAX-DIFF-IN-ULP*)
Return whether two numbers, X and Y, are approximately equal either according to MAX-DIFF-IN-VALUE or MAX-DIFF-IN-ULP.
If the absolute value of the difference of two floats is not greater than MAX-DIFF-IN-VALUE, then they are considered equal.
If two floats are of the same sign and the number of representable floats (ULP, unit in the last place) between them is less than MAX-DIFF-IN-ULP, then they are considered equal.
If neither X nor Y are floats, then the comparison is done with =. If one of them is a DOUBLE-FLOAT, then the other is converted to a double float, and the comparison takes place in double float space. Else, both are converted to SINGLE-FLOAT and the comparison takes place in single float space.
[variable] \*MAX-DIFF-IN-VALUE* 1.0e-16
The default value of the MAX-DIFF-IN-VALUE argument of FLOAT-~=.
[variable] \*MAX-DIFF-IN-ULP* 2
The default value of the MAX-DIFF-IN-ULP argument of FLOAT-~=.
[function] FLOAT-~< X Y &KEY (MAX-DIFF-IN-VALUE \*MAX-DIFF-IN-VALUE*) (MAX-DIFF-IN-ULP \*MAX-DIFF-IN-ULP*)
Return whether X is approximately less than Y. Equivalent to <, but it also allows for approximate equality according to FLOAT-~=.
[function] FLOAT-~> X Y &KEY (MAX-DIFF-IN-VALUE \*MAX-DIFF-IN-VALUE*) (MAX-DIFF-IN-ULP \*MAX-DIFF-IN-ULP*)
Return whether X is approximately greater than Y. Equivalent to >, but it also allows for approximate equality according to FLOAT-~=."
- Suites, tags and other multiple test abilities
- Lists of tests
Try can handle lists of tests and the function
list-package-tests
will return a list of the try tests in a package.(test '(t1 t2)) (list-package-tests)
- Suites
Test suites in try are just tests that callother tests
(deftest test6-pass (x y) (is (= x 1)) (is (= y 2)) (is (not (= x y)))) (deftest test7 () (test6-pass 1 2) (test1-pass 1))
Now we can call test7 and get the results for test6 and test1
(test7) TEST7 TEST6-PASS ⋅ (IS (= X 1)) ⋅ (IS (= Y 2)) ⋅ (IS (NOT (= X Y))) ⋅ TEST6-PASS ⋅3 TEST1-PASS ⋅ (IS (= X 1)) ⋅ TEST1-PASS ⋅1 ⋅ TEST7 ⋅4 #<TRIAL (TEST7) EXPECTED-SUCCESS 0.003s ⋅4>
- Lists of tests
- Replay Events and Defer Descriptions
Try has a
replay-events
function which reprocesses the events which where collected in a trial object. It does not re-run the tests, it just signals the events collected in the trial object for further processing. For example, you ran a large test without :print 'unexpected and need to winnow through the output to find just the failing tests. You can replay and just get the unexpected results:(replay-events ! :print 'unexpected)
As you might expect, the variable
!
passed in is the most recent trial.!!
or!!!
would be second or third most recent respectively.The function
(recent-trial &optional (n 0))
will return the Nth most recent trial or NIL if there are not enough trials recorded. Every TRIAL returned by TRY gets pushed onto a list of trials, but only*n-recent-trials*
are kept. - Rerunning Tests
Rerun only the tests with unexpected results from the previous run:
(try !)
Alternatively, one could
(funcall !)
(or any other trial object). What is rerun is controlled by*rerun*
and*try-rerun*
(both event types). - Fixtures and Freezing Data
Per the documentation, Try has no direct support for fixtures. One suggestion is writing macros like the following:
(defvar *server* nil) (defmacro with-xxx (&body body) `(flet ((,with-xxx-body () ,@body)) (if *server* (with-xxx-body) (with-server (make-expensive-server) (with-xxx-body)))))
With Try, fixtures are needed less often because one can rerun only the failing tests from the entire test suite. If the higher level tests establish the dynamic environment and call subtests, then things will just work.
- Removing tests
Often, it suffices to remove the call to test function (if it is invoked explicitly by another test). If it is invoked via a package (
list-package-tests
list all tests in a given package), then it needs to be deleted byfmakunbound
,unintern
, or by redefining the function withdefun
. - Sequencing, Random and Failure Only
While tests and the forms inside tests normally follow a sequential order, try allows you to shuffle the list of forms inside the body of a test.
(loop repeat 3 do (with-shuffling () (prin1 1) (prin1 2))) .. 122112 => NIL
- Skip Capability
As we already saw when looking at tests that had failures, you can look at the conditions triggered and decide to skip an assertion result.
The following shows a test that skips the test (server-available-p).
(deftest my-suite () (with-skip ((not (server-available-p))) (test-server)))
- Random Data Generators
None, but the helper libraries can fullfill this well.
33.4. Discussion
I found try to be easy to use. I was a bit surprised at the lack of additional assertion comparisons, but in try, everything is an extension to is
. You can easily use the normal CL comparison functions and customize reporting for them.
33.5. Who uses try
The following list is just pulling the results (ql:who-depends-on :try)
and adding their homepage urls.
34. unit-test
34.1. Summary
homepage | Manuel Odendahl, Alain Picard | MIT | 2012 |
Again, another framework that does the basics. It will report all assertions that failed in a test. It will do a progress report on the tests, not the assertions which cannot be turned off. It does allow you to provide diagnostic strings to assertions to help in debugging, but does not allow you to pass in any variables. It has no interactivity option, so you cannot just hop into the debugger on a test failure. It has no fixture capacity, but it does have suites.
34.2. Assertion Functions
test-assert | test-condition | test-equal |
34.3. Usage
- Report Format
Everything a return of a list of test-result objects. There is no provision for dropping into the debugger.
run-test
has an optional parameter for output that sends output by defult to*debug-io*
- Basics
Unit-test has a limited vocabulary for test functions. The
deftest
macro will create an instance of a unit-test class with the first parameter being the unit name (used to group related tests) and the second parameter being the name of the test itself.(deftest :test "t1" (let ((x 1)) (test-assert (= x 1)) (test-equal "a" "a" ) (test-condition (/ 1 (- x 1)) 'division-by-zero)))
In this case we used a string "t1" as the name of the test. We could have used a symbol 't1 or a keyword :t1. Unfortunately the
run-test
method is only defined for unit tests, which makes calling a single test a little clumsy. We have to callget-test-by-name
and pass that torun-test
. In your test I would assume that you would add another method to handle however you write your test names. We will continue to useget-test-by-name
as a reminder.(run-test (get-test-by-name "t1")) (#<TEST-EQUAL-RESULT FORM: (/ 1 (- X 1)) STATUS: PASS REASON: NIL> #<TEST-EQUAL-RESULT FORM: a STATUS: PASS REASON: NIL> #<TEST-EQUAL-RESULT FORM: (= X 1) STATUS: PASS REASON: NIL>)
Not the most exciting report in the world. But lets take a look at a failing test. We can put a diagnostic string into
test-assert, but not into =test-equal
. The equality test fortest-equal
is equal, but you can change that using a keyword parameter as shown below(deftest :test "t1-fail" (let ((x 1)) (test-assert (= x 2) "we know that X (1) does not equal 2") (test-equal "a" 'a :test #'eq ) (test-condition (/ 1 (- x 1)) 'floating-point-overflow))) (run-test (get-test-by-name "t1-fail")) (#<TEST-EQUAL-RESULT FORM: (/ 1 (- X 1)) STATUS: CRASH REASON: NIL> #<TEST-EQUAL-RESULT FORM: 'A STATUS: FAIL REASON: NIL> #<TEST-EQUAL-RESULT FORM: (= X 2) STATUS: FAIL REASON: we know that X (1) does not equal 2> #<TEST-EQUAL-RESULT FORM: (VALUES 1 3 4 5) STATUS: PASS REASON: NIL> #<TEST-EQUAL-RESULT FORM: (/ 1 (- X 1)) STATUS: PASS REASON: NIL> #<TEST-EQUAL-RESULT FORM: a STATUS: PASS REASON: NIL> #<TEST-EQUAL-RESULT FORM: (= X 1) STATUS: PASS REASON: NIL>)
The first thing I notice is that the list of results also includes the list of results from when we ran test t1. Everything just gets pushed to a non-exported list
*unit-test-results*
. So if you want to just see the results for the next test you are going to run, you need to run some cleanup.T1-fail generated three results, so again a little clumnsy on ensuring you see all the results from the test. Let's set
*unit-test-results*
to nil after every test so we can this clean.There is an exported variable =*unit-test-debug*, but looking at the source code, it does not appear to be actually used for anything, leaving it open for your to write your own code using it as a flag.
If a test calls a function that is later modified, the test does not need to be recompiled to check the tested function correctly. We will skip the proof.
- Edge Cases: Value expressions, closures and calling other tests
- Value expressions
Like most of the frameworks, unit-test will test a values expression by only checking the first value. We will skip the proof.
- Looping and closures
Unit-test provided a little bit of a surprise here. If you run a test where the assertion is inside a loop, the test-result object will be pushed to
unit-test::*unit-test-results*
, but that list will not be printed to the REPL. You just get NIL in the REPL. We will skip the proof.Unit-test had no problem testing functions that use variables provided in a closure. We will skip the proof.
Tests can call other tests, but there is no composition, just another test result.
(deftest :test "t3" (test-assert (= 1 1)) (run-test (get-test-by-name "t1")))
- Value expressions
- Suites, tags and other multiple test abilities
- Lists of tests
Unit-test has no provision to handle lists of tests although you could write a method on
run-test
that would do so. - Suites
Looking at the examples above, we gave all the tests the unit name of ":test". This is essentially the suite named :test. If we call
run-all-tests
, all the tests would be run. If we had given tests different unit names, we could run all the tests with those names by passing the keyword parameter :unit torun-all-tests
(run-all-tests :unit :some-unit-name)
- Lists of tests
- Fixtures and Freezing Data
From the source code:
;;;; For more complex tests requiring fancy setting up and tearing down ;;;; (as well as reclamation of resources in case a test fails), users are expected ;;;; to create a subclass of the unit-test class using the DEFINE-TEST-CLASS macro. ;;;; The syntax is meant to be reminiscent of CLOS, e.g ;;;; ;;;; (define-test-class my-test-class ;;;; ((my-slot-1 :initarg :foo ...) ;;;; (my-slot-2 (any valid CLOS slot options) ;;;; ....)) ;;;; After this, the methods ;;;; (defgeneric run-test :before ;;;; ((test my-test-class) &key (output *debug-io*)) and ;;;; (defgeneric run-test :after ;;;; ((test my-test-class) &key (output *debug-io*)) may be ;;;; specialized to perform the required actions, possibly accessing the ;;;; my-slot-1's, etc. ;;;; ;;;; The test form is protected by a handler case. Care should be taken ;;;; than any run-test specialization also be protected not to crash.
- Removing tests
None
- Sequencing, Random and Failure Only
I do not see any capability to shuffle test order or to run only the tests that have previously failed.
- Skip Capability
- Random Data Generators
None
34.4. Discussion
Compared to other frameworks it feels a little clumsy and basic. I would top
34.5. Who Depends on Unit-Test?
It is used by cl-fad and several of the bknr programs.
35. xlunit
35.1. Summary
homepage | Kevin RosenBerg | BSD | 2015 |
Xlunit stops at the first failure in a test, so you only get partial failure reporting (joining lift in this regard). That, in and of itself would cause me to look elsewhere. Phil Gold's original concern was that while you can create hierarchies of test suites, they are not composable.
35.2. Assertion Functions
assert-condition | assert-eql | assert-equal |
assert-false | assert-not-eql | assert-true |
35.3. Usage
I find the terminology of xlunit to be confusing after getting used to other frameworks.
Xlunit requires that you create a class for a test-case or suite. Every "test" is then a named test-method on that class. def-test-method
adds a test to the suite. The class can, of course, have slots for variables that any test in the suite can use.
Test-methods can have multiple assertions and can be applied to either a test-case
or a test-suite
. The macro get-suite
applies to either test-case
or a test-suite
classes and creates an instance of that class.
I notice that cambl-test (one of the libraries that uses xlunit) wraps a define-test
macro around def-test-method
to make this feel more natural. That version is here:
(defclass amount-test-case (test-case) () (:documentation "test-case for CAMBL amounts")) (defmacro define-test (name &rest body-forms) `(def-test-method ,name ((test amount-test-case) :run nil) ,@body-forms))
- Report Format
Xlunit reports a single dot for a test with at least a passing assertion, an F for failure and E for errors. In testing suites, xlunit will provide one dot per test, the time it took to run the suite and, if everything is successful, OK with a count of the tests and a count of the assertions.
- Basics
We will create a test-case named tf-xlunit that we can use to attach tests, each of whichcan have multiple assertions. The form immediately after the test method name takes both the class to which it applies and whether to run the method immediately upon compilation. The libraries using xlunit seem to define all methods with :run nil.
(defclass tf-xlunit (xlunit:test-case) ()) (def-test-method t1 ((test tf-xlunit) :run nil) (assert-equal "a" "a") (assert-condition 'division-by-zero (error 'division-by-zero)) (assert-false (= 1 2)) (assert-eql 'a 'a) (assert-not-eql 'a 'b) (assert-true (= 1 1)))
Unfortunately you need to run all the tests methods applicable to a test-case or suite at once. For clarity purposes, we create a separate test-case class for each method so that we do not get burdened with results from other methods. Effectively a test-case can be viewed the same as other frameworks think of suites.
The reporting is a bit underwhelming. Even more so as we get to failures.
(xlunit:textui-test-run (xlunit:get-suite tf-xlunit)) . Time: 0.0 OK (1 tests) #<TEST-RESULTS {10016CD973}>
All the assertions passed in order for the entire test to pass.
Now a test that should have six assertion failures. This time we are going to put a diagnostic message intothe first assertion. Xlunit does not provide the ability to insert variables into the diagnostic message or provide trailing variables.
I am going to create a new suite so that we just see the results for this test and will continue to do that until we get to the suites discussion.
(defclass tf-xlunit-t1-fail (xlunit:test-case) ()) (def-test-method t1-fail ((test tf-xlunit-t1-fail) :run nil) (assert-equal "a" "b" "Deliberate failure on our part") (assert-condition 'division-by-zero (error 'floating-point-overflow)) (assert-false (= 1 1)) (assert-eql 'a 'b) (assert-not-eql 'a 'a) (assert-true (= 1 2)))
And now, the failure report:
(xlunit:textui-test-run (xlunit:get-suite tf-xlunit-t1-fail)) .F Time: 0.0 There was 1 failure: 1) T1-FAIL: Assert equal: "a" "b" Deliberate failure on our part FAILURES!!! Run: 1 Failures: 1 Errors: 0 #<TEST-RESULTS {1003FF1353}>
Yes, the test failed. Unfortunately it only reported the first assertion failure, not all of them. No, I do not know why a dot appeared before the failure indicator. I was really hoping for it to tell me all the different assertion failures.
- Edge Cases: Value Expressions, closures and calling other tests
- Value expressions
XLunit has no special functionality for dealing with values expressions. Like most of the frameworks, xlunit will check values expressions but only look at the first value.
- Closures.
Xlunit has no problem dealing with variables from closures. We will skip the proof.
- Calling tests from inside tests
As with several frameworks, xlunit allows a test to call another test, but there is no composition - you get two separate reports.
(defclass tf-xlunit-t3 (xlunit:test-case) ()) (def-test-method t3 ((test tf-xlunit-t3) :run nil); (assert-equal 1 1) (xlunit:textui-test-run (xlunit:get-suite tf-xlunit-t1-fail))) (xlunit:textui-test-run (xlunit:get-suite tf-xlunit-t3)) ..F Time: 0.0 There was 1 failure: 1) T1-FAIL: Assert equal: "a" "b" Deliberate failure on our part FAILURES!!! Run: 1 Failures: 1 Errors: 0 Time: 0.003333 OK (1 tests) #<TEST-RESULTS {100B2823A3}>
- Value expressions
- Suites, fixtures and other multiple test abilities
- Lists of tests
I did not see a way to run only a subset of the methods applicable to a test-case.
- Suites and fixtures
I am going to cheat here combine the discussion of fixtures and suites together and use an example from the source code. Here we create a test-case named math-test-case with two additional slots for numbera and numberb. Before any tests are run, there is a
set-up
method which initialises those slots. We then add three test methods having a single assertion earch (one slightly modified from the source code).(defclass math-test-case (test-case) ((numbera :accessor numbera) (numberb :accessor numberb)) (:documentation "Test test-case for math testing")) (defmethod set-up ((tcase math-test-case)) (setf (numbera tcase) 2) (setf (numberb tcase) 3)) (def-test-method test-addition ((test math-test-case) :run nil) (let ((result1 (+ (numbera test) (numberb test))) (result2 (+ 1 (numbera test) (numberb test)))) (assert-true (= result1 5)) (assert-true (= result2 6)))) (def-test-method test-subtraction ((test math-test-case) :run nil) (let ((result (- (numberb test) (numbera test)))) (assert-equal result 1))) ;;; This method is meant to signal a failure (def-test-method test-subtraction-2 ((test math-test-case) :run nil) (let ((result (- (numbera test) (numberb test)))) (assert-equal result 1 "This is meant to failure")))
Now we run all the methods applicable to math-test-case classes.
(xlunit:textui-test-run (xlunit:get-suite math-test-case)) .F.. Time: 0.0 There was 1 failure: 1) TEST-SUBTRACTION-2: Assert equal: -1 1 This is meant to failure FAILURES!!! Run: 3 Failures: 1 Errors: 0 #<TEST-RESULTS {10051EF373}>
As we can see, while there are four assertions in total, the report shows 3, meaning the number of methods run. If we looked at the internal details of the test-results instance returned at the end, it would show a count of 3 as well.
- Lists of tests
- Removing tests
Xlunit has a
remove-test
function - Sequencing, Random and Failure Only
Everything is sequential. There are no provisions for collecting and re-running only failed tests.
- Skip Capability
None
- Random Data Generators
None
35.4. Discussion
Phil Gold's 2007 review essentially concluded that xlunit feels clunky and lacks composition. I see no reason to differ from his conclusion.
35.5. Who Depends on XLUnit?
cambl, cl-heap (no longer maintained) and cl-marshal
36. xptest
36.1. Summary
No homepage | Craig Brozensky | Public Domain | 2015 |
XPtest is very old and it does the basics. It will report all the failed assertions and provides the ability to generate failure reports with diagnostic strings. It does not provide an interactive session (no debugger, just the report). It also does not seem to provide any signal testing, so you would have to write your own condition handlers. Overall it just feels clumsy. It was not tested in the original Phil Gold blogging note.
36.2. Assertion Functions
None - It just relies on CL predicates.
36.3. Usage
Xptest is very simple. You create a test-suite and a fixture. Tests are methods of the fixture and you then add them to the test-suite. You use regular CL predicates in your test and trigger a failure
function if they are not true.
- Report Format and Basic Operation
;; A test fixture and a suite get defined up front (def-test-fixture tf-xptest-fixture () ()) (defparameter *tf-xptest-suite* (make-test-suite "tf-xptest-suite" "test framework demonstration")) (defmethod t1 ((test tf-xptest-fixture)) (let ((x 1) (y 'a)) (unless (equal 1 x) (failure "t1.1 failed")) (unless (eq 'a y) (failure "t1.2 failed")))) (add-test (make-test-case "t1" 'tf-xptest-fixture :test-thunk 't1) *tf-xptest-suite*) (defmethod t1-fail ((test tf-xptest-fixture)) (let ((x 1) (y 'a)) (unless (equal 2 x) (failure "t1-fail.1 failed")) (unless (eq 'b y) (failure "t1-fail.2 failed")))) (add-test (make-test-case "t1-fail" 'tf-xptest-fixture :test-thunk 't1-fail) *tf-xptest-suite*)
You can use
run-test
on the test-suite. That will return a list of test-result objects, but that is not terribly useful. Digging into those objects will give you start and stop times, the test-fixture, a test-failure condition (if it failed) or a error condition if something else bad happened. Slightly more useful is runningreport-result
on the list returned fromrun-test
, but it only reports which tests passed and which tests failed.(run-test *tf-xptest-suite*) (#<TEST-RESULT {1002F88803}> #<TEST-RESULT {1002F88C93}>) (report-result (run-test *tf-xptest-suite*)) Test t1 Passed Test t1-fail Failed
There is a keyword parameter option of :verbose, but if you try to use it, it generates a format control error in the xptest source code that I am not going to try to debug.
Xptest properly picks up changes in tested functions without having to manually recompile tests.
- Multiple assertions, loops. closures and calling other tests
- Multiple assertions and value expressions
Xptest relies on CL for predicates and assertions, so you have to build your own multiple assertion test and decide how you would handle value expressions.
- Closures.
Xptest has no problem with the loop inside a closure test.
(let ((l1 '(#\a #\B #\z)) (l2 '(97 66 122))) (defmethod t2-loop ((test tf-xptest-fixture)) (loop for x in l1 for y in l2 do (unless (equal (char-code x) y) (failure "t2-loop"))))) (add-test (make-test-case "t2-loop" 'tf-xptest-fixture :test-thunk 't2-loop) *tf-xptest-suite*) (report-result (run-test *tf-xptest-suite*))
- Multiple assertions and value expressions
- Conditions
You would have to write your own condition handlers.
- Suites and fixtures
I am going to cheat here again and show the example from the source code.
(defparameter *math-test-suite* nil) (def-test-fixture math-fixture () ((numbera :accessor numbera) (numberb :accessor numberb)) (:documentation "Test fixture for math testing")) (defmethod setup ((fix math-fixture)) (setf (numbera fix) 2) (setf (numberb fix) 3)) (defmethod teardown ((fix math-fixture)) t) (defmethod addition-test ((test math-fixture)) (let ((result (+ (numbera test) (numberb test)))) (unless (= result 5) (failure "Result was not 5 when adding ~A and ~A" (numbera test) (numberb test))))) (defmethod subtraction-test ((test math-fixture)) (let ((result (- (numberb test) (numbera test)))) (unless (= result 1) (failure "Result was not 1 when subtracting ~A ~A" (numberb test) (numbera test))))) ;;; This method is meant to signal a failure (defmethod subtraction-test2 ((test math-fixture)) (let ((result (- (numbera test) (numberb test)))) (unless (= result 1) (failure "Result was not 1 when subtracting ~A ~A" (numbera test) (numberb test))))) (setf *math-test-suite* (make-test-suite "Math Test Suite" "Simple test suite for arithmetic operators." ("Addition Test" 'math-fixture :test-thunk 'addition-test :description "A simple test of the + operator") ("Subtraction Test" 'math-fixture :test-thunk 'subtraction-test :description "A simple test of the - operator"))) (add-test (make-test-case "Substraction Test 2" 'math-fixture :test-thunk 'subtraction-test2 :description "A broken substraction test, should fail.") *math-test-suite*) (report-result (run-test *math-test-suite*))
- Removing tests
Xptest has a
remove-test
function - Sequencing, Random and Failure Only
Sequential only
- Skip Capability
None
- Random Data Generators
None
36.4. Discussion
I do not see anything here that would really make me consider it.
37. Helper Libraries
37.1. assert-p
- Summary
homepage Noloop GPL3 2020 This is a library to help build your own assertions and is built on assertion-error by the same author (see below). The only library currently using it is Cacau.
I was really hoping for more here. Consider the following code from the library:
(defun not-equalp-p (actual expected) "Check actual not equalp expected" (assertion (not (equalp actual expected)) actual expected 'not-equalp))
Seven of the test frameworks described above provide assertions that accept diagnostic messages and pass variables to those diagnostic messages. Another eight provide assertions that accept diagnostic messages but without variables. Compared to those, this seems really elementary. I will leave it to writers of testing frameworks as to whether it is worthwhile, but from my perspective, it does not add anything useful to the forest of CL testing.
37.2. assertion-error
- Summary
homepage Noloop GPL3 2019 This is a library to build your own assertion-error conditions. It does depend on dissect. The only library currently using it is cacau.
The entire source code is:
(define-condition assertion-error (error) ((assertion-error-message :initarg :assertion-error-message :reader assertion-error-message) (assertion-error-result :initarg :assertion-error-result :reader assertion-error-result) (assertion-error-actual :initarg :assertion-error-actual :reader assertion-error-actual) (assertion-error-expected :initarg :assertion-error-expected :reader assertion-error-expected) (assertion-error-stack :initarg :assertion-error-stack :reader assertion-error-stack))) (defun get-stack-trace () (stack))
I will leave it to writers of testing frameworks as to whether it is worthwhile.
37.3. check-it
- Summary
homepage Kyle Littler LLGPL 2015 Check-it is the opposite of a mock and stub library which provides known values. Check-it provides randomized input values based on properties of the input. Some testing frameworks provide random value generators, but this is more complete, so use this with your favorite test framework. See helper-generators for a functional comparison of the generators between check-it and cl-quicklisp.
- Usage
- General Usage
The general usage is to call
generate
on agenerator
given a specific type with optional specifications. The following examples use optional lower and upper bounds.(check-it:generate (check-it:generator (integer -3 10))) 6 (check-it:generate (check-it:generator (character #\a #\k))) #\f (let ((gen-i (check-it:generator (list (integer -10 10) :min-length 3 :max-length 10)))) (check-it:generate gen-i)) (5 0 8 -2 9) (check-it:generate (check-it:generator (string :min-length 3 :max-length 10))) "Uw76ZV"
- Values must meet a predicate
You can ensure that values meet a specific predicate. The generator will keep trying until that predicate is met. In the following example we want a character between #\a and #\f but not #\c.
(check-it:generate (check-it:generator (check-it:guard (lambda (x) (not (eql x #\c))) (character #\a #\f)))) #\e
- Or Generator
The OR generator takes subgenerators and randomly chooses one. For example:
(let ((gen-num (check-it:generator (or (integer) (real))))) (loop for x from 1 to 5 collect (check-it:generate gen-num))) (7 6.685932 6 -9 9)
- Struct Generator
If you have a struct that has default constructor functions, you can use a struct generator to build out the slots.
(check-it:generate (check-it:generator (check-it:struct b-struct :slot-1 (integer) :slot-2 (string) :slot-3 (real)))) #S(B-STRUCT :SLOT-1 2 :SLOT-2 "iE4qZ5U00oOs" :SLOT-3 5.9885387)
For more fun and games you can do with this library, see https://github.com/DalekBaldwin/check-it
- General Usage
37.4. cl-fuzz
- Summary
homepage Neil T. Dantam BSD 2 Clause 2018 Cl-fuzz is another random data generating library. To use it you need to define a function to generate random data, then generate a function to perform some tests, then pass both to
fuzz:run-tests
(notperform-tests
as the readme states). To be honest, I do not think there is much utility here compared to the frameworks we have looked at plus check-it and cl-quickcheck.
37.5. cl-quickcheck
- Summary
homepage Andrew Pennebaker MIT 2020 Cl-quickcheck focuses on "property based tests". In other words, write tests use random inputs matching some specification, apply a operation to the data and assert something about that result. Cl-quickcheck is effectively an assertion library with the ability to generate different types of inputs. If you look at packages in quicklisp which use it, Burgled-Batteries uses it in conjunction with Lift; Test-utils uses it in conjunction with Prove and only json-streams uses it on its own. As such I decided to put it in the Helpers section rather than in the frameworks section.
Cl-quickcheck has somewhat more functionality than check-it in that it does have assertions. I still think the generators are the real raison d'être for both these libraries. See helper-generators for a functional comparison of the generators between check-it and cl-quicklisp.
- Assertions
is is= isnt isnt= should-signal - Report Format
Cl-quickcheck follows the typical pattern of . for a passing test. Instead of an f, it prints X for failures.
To jump immediately into the debugger rather than a report format, set
*break-on-failure*
to t.To eliminate progress reports, set
*loud*
to nil. - Usage
The number of iterations of a test using a generator is set by
*num-trials*
which starts with a default value of 100.To take a silly example, the following is a test that asserts that any integer multiplied by two will equal the integer plus itself and we will set
*num-trials*
20. Thusn
will be set to a random integer generated byan-integer
and the assertion will be run 20 times with a newn
generated each time.(setf *num-trials* 20) (for-all ((n an-integer)) (is= (* 2 n) (+ n n))) ....................
If we modify that to be always wrong, we only get a single resulting X
(for-all ((n an-integer)) (is= (* 2 n) (+ n n 1))) X
The following are all passing assertions.
(is= 1 1) (is = 1 1 1) (should-signal 'division-by-zero (error 'floating-point-overflow)) (for-all ((n an-integer)) (is= (* 2 n) (+ n n)))
- Miscellaneous Comments
The first thing I had to learn looking at cl-quickcheck was that a-boolean, a-real, an-index, an-integer, k-generator, m-generator and n-generator are functions in a variable (funcallable) but a-char, a-list, a-member, a-string, a-symbol and a-tuple were functions of their own. The difference in how they can called is confusing, at least for me.
37.6. hamcrest
- Summary
homepage Alexander Artemenko New BSD 2020 Hamcrest's idea is to use pattern matching to make unit tests more readable.
- Usage
37.7. mockingbird
- Summary
homepage Christopher Eames MIT 2017 Stubs and Mocks are used to ensure constant values are returned instead of computed values for use in testing.
- Usage
Assume two functions for this usage demonstration:
(defun foo (x) x) (defun bar (x y) (+ x (foo x)))
The WITH-STUBS macro provides lexical scoping for calling functions with guaranteed results.
(with-stubs ((foo 10)) (foo 1)) 10 (with-stubs ((foo 10)) (bar 3)) 6
As an example of how this would look used in a testing framework, the following uses parachute and mb is the nickname for mockingbird.
(define-test mockingbird-1 (mb:with-stubs ((foo 10)) (is = (bar 3) 6))) MOCKINGBIRD-1 (test 'mockingbird-1) ? TF-PARACHUTE::MOCKINGBIRD-1 0.003 ✔ (is = (bar 3) 6) 0.007 ✔ TF-PARACHUTE::MOCKINGBIRD-1 ;; Summary: Passed: 1 Failed: 0 Skipped: 0 #<PLAIN 2, PASSED results>
The WITH-DYNAMIC-STUBS macro provides dynamic scoping for calling functions with guaranteed results.
(with-dynamic-stubs ((foo 10)) (bar 3)) 13
The WITH-MOCKS macro provides lexical scoping for calling functions but ensuring they return nil.
(with-mocks (foo) (foo 5)) NIL (with-mocks (foo) (bar 5)) 10
37.8. portch
37.9. protest
- Summary
homepage Michał Herda LLGPL 2020 Protest is a wrapper around other testing libraries, currently 1am and parachute. It wraps around test assertions and, in case of failure, informs the user of details of the failed test step. Other useful reading would be The concept of a protocol by Robert Strandh.
- Usage
- Discussion
37.10. rtch
37.11. testbild
37.12. test-utils
- Summary
homepage Leo Zovic MIT 2020 Test-utils provides convenience functions and macros for prove and cl-quickcheck. It adds new generators to cl-quickcheck such as a-ratio, a-number, a-keyword, an-atom, a-pair, a-vector and a-hash.
It also has QUIET-CHECK which runs a cl-quickcheck suite but only sends to
*standard-output*
on failure.It provides additional generators for:
- a-ratio
- a-number
- a-keyword
- an-atom
- a-pair
- a-vector
- a-hash
- a-value
- a-alist
- a-plist
- an-improper-list
- an-array
38. Test Coverage Tools
38.1. sb-cover
The following is a sample sequence running sb-cover on the package you want to test
(require :sb-cover) ;; now you need to tell SBCL to instrument what it is about to load (declaim (optimize sb-cover:store-coverage-data)) (asdf:oos 'asdf:load-op :your-package-name-here :force t) ;; Now run your tests. (run-all-tests 'blah-blah-blah-package) (sb-cover:report "path-to-directory-for-the-coverage-htmlpages" :form-mode :car) ;; now restore SBCL to its normal state (declaim (optimize (sb-cover:store-coverage-data 0))) ;; to restore
The last line turns off the instrumentation after the report has been generated. The sb-cover:report line should have generated one or more html pages, starting with a page named cover-index.html in the specified directory directory which shows:
- expression
- branch
on a file by file basis in the your package. Now the html pages will also print out the source file, color coded showing not executed expressions and, where the expression might have conditionals or branches, whether each of those conditional points or branches were actually triggered in the test. E.g.
(defun foo (x) (if (evenp x) 1 2))
If the test only ran with (foo some-even-number) and not (foo some-odd-number), that fact would be highlighted.)
sb-cover can be enabled globally. (eval '(declaim (optimize sb-cover:store-coverage-data)))
Per pfdietz: "The problem I have with sb-cover is that is can screw up when the readtable is changed. It needs to somehow record readtable information to properly annotate source files."
38.2. CCL code coverage
I have not used this tool.
(setf ccl:*compile-code-coverage* t)
Comment: when ccl:*compile-code-coverage*
was set to t, compiling ironclad triggered an error:
[package ironclad]……. > Error: The value (&LAP . 0) is not of the expected type VAR. > While executing: DECOMP-VAR, in process listener(1).
38.3. cover
38.4. Source Code
39. Appendix
39.1. Problem Space
Testing covers a lot of ground, there are unit tests, regression tests, test driven development, etc. Testing often runs on an automated basis, but CL being CL, it can be part of an interactive development process. Some people write their unit tests first, then develop to pass the tests (test driven development). Testing is also not error checking.
Ideally a testing framework should make it as easy as possible to write tests, cover different inputs and produce a report showing what passed and failed. If you are writing a library rather than an application, it can be useful to recognize that your test suites are a client to your library's API (and if you find it hard to write the tests, think about how a client user will feel).
Assuming the source is available, the tests should be part of your user documentation in showing how to use the library and an ideal testing framework should make it easy for users to see the tests as examples.
I have seen reasoned arguments that unit tests should only cover exported functions, generally on the grounds that this implicitly tests the internal functions and any additional testing is just adding technical debt. My response is typically, fine, so long as the tests on the exported function can show how it failed. If it depends on 100 internal functions, can you trace back to the real point of failure? If testing is a defense against change, then testing code that has no reason to change does not add value to your test suite - until, of course, you refactor and suddenly it does. By the way, for those who are concerned about static typing, unit tests do not replace static typing unless you actually test by really throwing different inputs at the function being tested.
39.2. Terminology
Different frameworks address different problem sets but before I discuss the problem space, I want to get some terminology out of the way first.
- Testing Types
- Integration testing deals with how units of software interact.
- Mutation Testing - Pfdietz brought the concept of mutation testing to my attention. This concept targets testing your test suite by inserting errors into programs and measuring the ability of the test suite to detect them.
- Property based testing (PBT) makes statements about the output of your code based on the input. The statements are tested by feeding random data to tests that are focused on the stated properties. E.g. in testing an addition function, adding zero to thenumber should result in the same number and changing the order of the inputs should result in the same number. Similarly, a function that reverses a list should always result in (a) a list and (b) the first element of the result was the last element of the input, etc. This obviously requires more thinking about each test. In some respects, what this gets you thinking about is what constitutes valid input and edge cases, and then you need to write generators to randomly generate input that meets (or fails to meet) those criteria. PBT is not a replacement for what I will call result testing, it is an additional testing strategy. cl-quickcheck and nst provide property based testing.
- Regression tests verify that software that has already been written is still correct after it is changed or combined with other software. In other words, it worked yesterday, does it still work after a bug fix, after refactoring or after another system has been connected. Interactive development from the repl does not address this problem.
- Unit testing deals with a separate software system or subsystem. (I am not interested in arguing how small the unit needs to be. I leave that up to the TDD missionaries and the TDD haters.) Unit testing can be a part of regression testing - regression tests are often built on suites of unit tests. You might have multiple tests for each function and a suite of tests for every function in a file. As I use the term "unit test", I am talking about how much code is covered, not whether the unit tests are "property based tests" or result testing.
- Other Terms
- Assertions - the types of equality tests available. "Assert-eq", "Assert-equal" and "Assert-true" are typical. Some packages provide assertions that have descriptive messages to help debug failures. Some packages (e.g. Confidence) provide built-in assertions test float comparisions. Some packages allow you to define your own assertions.
- Code coverage apparently means different things to different people. I have seen test suites that cover every function, but only with a single simple expected input and 100% code coverage victory has been declared. That is barely a hand wave. As one person has said, that checks that your code is right, but does not check that your code is not wrong. Of course, there are trivial bits of code where it is pointless to try to think about possible different inputs to test.
- Fixtures (sometimes referred to as contexts)- Fixtures create a temporary environment with a known data set used for the tests. They may be static variables, constructed database tables, etc. Typically there is a setup and teardown process to ensure that the testing environment is in a known state.
- Mocks - Mocking is a variation on Fixtures. While fixtures are intended to create a known data collection to test against, mocking is intended to eliminate external dependencies in code and create known faux code which can be used as an input (sometimes called stubs) or compared after a test is run to see if there are expected or unexpected side effects.
- Parametrization means running the same test body with different input each time. You can do this either by running a test against a collection of test data or you can do it within a single test by running the test body against a list of forms or test data. How you do it will depend on which way will make it easier to determine what test and what parameters triggered the failure.
- Refactoring typically requires rewriting unit tests for everything that was touched, then re-running test suites to ensure that everything still works together.
- Reporting - a failing test should generate a usable bug report. Do we know the input, output, function involved, expected result and what we thought we were testing for? Note that what we thought we are testing for is not the same as the expected result.
- Shuffle Testing - Randomly changing the sequence in which tests are applied.
- TAP - TAP (the Test Anything Protocol) is a text based interface between testing modules, decoupling reporting of errors from the presentation of reports. In other words, you can write a TAP consumer which takes TAP spec output from the test harness and the TAP consumer would be responsible for generating user friendly reports (or do other things like compare the tests against your own list of functions to generate your own code coverage report). Development on the SPEC seems to have ceased in 2017. There was a hackernews discussion in JUne 2020 which can be found here.
- Verification means that your code contains every bug in the specification.
- Validation means that it is doing the right thing. This may or may not be possible to automate. I do not envy front end developers or designers dealing with clients.
- TAP Output - Test Anything Protocol is a formally specified output, considered by some to be a superior alternative to xUnit type testing. Depending on the output mechanisms, TAP can be easy to read but difficult to parse.
- Discussion
In general, each test has three parts - the setup, the action and the validation. Does the testing framework make it easy to see each of those segments when reading the test or reading any report coming out of the test?
Similarly, when the tests are run and a test fails, is it obvious from the report what happened or do you need to start a debugging session with limited information? It is one thing for a test to report failure, another thing to report what was expected compared to what was generated, and still a much better result to indicate that the correct value was in (aref array-name 2) instead of the expected (aref array-name 0) - the context of the failure.
Does the test framework allow long enough names for tests and hierarchies (or accept comments) to give meaningful reports?
How easy is it to run parameterized tests - the test logic is the same, but you run different parameters through the same tests and expect different results?
Footnotes:
Not available from QuickLisp
Looking for new maintainer. Has been forked to clunit2 and you should only consider clunit2.
Fork of stefil
Port to Clasp, otherwise 2015. Author has stated that it is no longer maintained and he is no longer involved in CL.
The authors have specified it as obsolete, so it will not be further considered.