Comparison of Common Lisp Testing Frameworks (28 Aug 2023 Edition)

1. Changelog

28 Aug 2023 - Picked up Shinmera's name change.

14 Aug 2023 - A number of updates for Try

18 Feb 2023 - Added a build-up and tear down example for Parachute fixtures.

30 Jan 2023 - Updated for some new functionality in Parachute with respect to parent fixtures now being inherited by children and added a useful editor customization.

27 Jan 2023 - Added input from Pierre Neidhardt: added vocabulary criteria and corrected the claim that Lisp-Unit2 lacks progress reports (it does have them). Added some additional examples for Parachute and Lisp-Unit2.

16 Jan 2023 - Updated adding Confidence and Try and deleting Kaputt. Thank you Michaël Le Barbier for actually doing the work of submitting the proposed pull request for Confidence.

2 October 2021 - Updated for changes at the Common Lisp Cookbook on testing. Updated number of issues and pull requests for FiveAM.

13 June 2021 - Updated clunit2 on its huge performance improvement update and new edge case abilities to compare multiple values in values expressions and handle variables declared in closures. Clunit2 is now substantially differentiated from clunit.

2. Introduction

What testing framework should I use? The response should not be X or Y because it is "battle-tested", or "extensible", or "has color coding". Those are just content-free advertising buzzwords. Some webpages which mention different Common Lisp unit test frameworks merely parrot comments from the framework authors without ensuring their veracity.

The real response should be to ask, "How do you code?" But even armed with that information, how do you match a testing framework to the user's needs? Common Lisp has a ridiculous number of testing framework libraries.

Some past reviews of testing frameworks were done in 2007 and 2010 (NST review). There was also some work in 2012 by the author of clunit, which seems to have bitrotten and is now only accessible via the Wayback Machine - part 1 and part 2. I thought it was time for an update. I am open to pull requests on this document for corrections, additions, whatever. See https://github.com/sabracrolleton/sabracrolleton.github.io

The best testing framework is context-dependent, and that context includes how you work. As an example, there was an exchange on r/common_lisp between dzecniv and Shinmera. dzecniv likes Prove more than Parachute because they could run tests just by compiling (C-c C-c) the source. Shinmera pointed out a way you could easily add that to Parachute, but she views compilation and execution as two different things. I'm in Shinmera's camp - I do not want to run the test until I want to run it. At the same time, I want to be clear on terminology. If I understand the context, what dzecniv was talking about is what I would call an assertion. In the interest of clear terminology, consider the following pseudocode -

(defsuite s0  ; (1)
  (deftest t1 ; (2)
    (assert-true (= 1 1)))) ; (3)
(1) I call this a suite - it contains one or more tests (or other suites) and hopefully composes the results.
(2) I call this a test - it contains one or more assertions and hopefully composes the results.
(3) I call this an assertion, and the assert-true (or equivalent) a testing function.

I do not want to run assertions at compile-time, but I do not compile assertions outside of tests anyway. So, just for dzecniv and those like him, there is a functionality table on running assertions on compilation.
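
For those who do want assertion-on-compile behavior, here is a sketch of the kind of add-on Shinmera described, built on Parachute (the macro name is mine):

(defmacro define-test-and-run (name &body body)
  ;; Define the Parachute test, then run it immediately, so that
  ;; compiling the form (C-c C-c) also executes the test.
  `(progn
     (parachute:define-test ,name ,@body)
     (parachute:test ',name)))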

Some people are just looking for regression testing - they focus on running all the tests at once. For example, the README for Prove does not even mention deftest or run-test. Others are looking for TDD or other continuous testing for their development. I use individual tests a lot during development, but obviously need regression testing as well.

Some testing frameworks insist on tracking progress by printing little dots or checkmarks for every assertion or test that passed or failed. Consider a library like uax-9, which has 1,815,582 assertions (yes, the tests are autogenerated). I don't want to waste screen space on that many dots; I don't want to waste time on printing those dots to screen, nor, if a test fails, in trying to find the failed test amidst the haystack of successful tests. Therefore, I prefer a testing framework that allows me to only collect failures, and to turn off progress reporting. Of course, some frameworks do not display progress reports at all - or do so at the test level rather than the assertion level - but you might really want or need those progress reports.

Some frameworks cannot find lexical variables declared in a closure containing the test. Most people will not care, but if you use closures, you might.
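
A minimal sketch of what I mean, using Parachute, which can see such variables (the test name is mine):

(let ((expected 42)) ; lexical variable closed over by the test
  (parachute:define-test closure-demo
    (parachute:is = expected (* 6 7))))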

Some people find that whether you use deftest v. define-test or similar vocabulary terms makes a difference. As Pierre Neidhardt said: `define-test` and `assert-false` are well chosen words because:

  • They are full, legible words, which is more Lispy (as opposed to "deftest").
  • Emacs colors them properly by default.

Other contexts are best served by other libraries. Complex fixture requirements might drive you towards something like NST, the need to define your own bespoke assertions might be served by should-test or confidence. Numerical comparisons might lead you to use lisp-unit, lisp-unit2 or confidence (or just writing your own). For macro-expansion testing, clunit, clunit2, lisp-unit, lisp-unit2 and rove have specific assertion functions. Your situation should govern which testing framework you use, not your project skeleton or market share.

I occasionally hear "extensibility" used as a buzzword applied to a framework. I'm not sure what it means to people. Consider the following example in sxql, which is not a testing framework but which included this macro in its own test file to simplify its use of prove -

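;; Collect all the values returned by YIELD into a list so that
;; prove's IS can compare them against an expected list.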
(defmacro is-mv (test result &optional desc)
  `(is (multiple-value-list (yield ,test))
       ,result
       ,desc))

(is-mv (select ((:+ 1 1)))
       '("SELECT (? + ?)" (1 1))
       "field")

This is not hard to do, but a one-off macro like this may not compose as well as a framework designed for extension.

In any event, yes, I will state my opinions, but what you think is important will drive your preferences in testing frameworks.

3. Testing Libraries Considered

3.1. Testing Frameworks

Table 1: Libraries Considered (updated 16 Jan 2023)
Library Homepage Author License Last Update
1am homepage James Lawrence MIT 2014
2am 1 homepage Daniel Kochmański MIT 2016
cacau homepage Noloop GPL3 2020
cardiogram homepage Abraham Aguilar MIT 2020
clunit 2 homepage Tapiwa Gutu BSD 2017
clunit2 homepage Cage (fork of clunit) BSD 2022
com.gigamonkeys.test-framework homepage Peter Seibel BSD 2010
confidence homepage Michaël Le Barbier MIT 2023
fiasco 3 homepage João Távora BSD 2 Clause 2020
fiveam homepage Edward Marco Baringer BSD 2020
lift homepage Gary Warren King MIT 2019 4
lisp-unit homepage Thomas M. Hermann MIT 2017
lisp-unit2 homepage Russ Tyndall MIT 2018
nst homepage John Maraist LLGPL3 2021
parachute homepage Yukari Hafner zlib 2021
prove (archived) homepage Eitaro Fukamachi MIT 2020
ptester homepage Kevin Layer LLGPL 2016
rove homepage Eitaro Fukamachi BSD 3 Clause 2022
rt none Kevin M. Rosenberg MIT 2010
should-test homepage Vsevolod Dyomkin MIT 2019
simplet homepage Noloop GPLv3 2019
stefil 5 homepage Attila Lendvai, Tamas Borbely, Levente Meszaros BSD/Public Domain 2018
tap-unit-test 6 homepage Christopher K. Riesbeck, John Hanley MIT 2017
try homepage Gábor Melis MIT 2022
unit-test homepage Manuel Odendahl, Alain Picard MIT 2012
xlunit homepage Kevin Rosenberg BSD 2015
xptest none Craig Brozensky Public Domain 2015

3.2. Speciality Libraries

Table 2: Speciality Libraries
Library Homepage Author License Last Update
checkl homepage Ryan Pavlik LLGPL, BSD 2018
Table 3: Selenium Interface Libraries
Library Homepage Author License Last Update Selenium
cl-selenium-webdriver homepage TatriX MIT 2018 2.0
selenium homepage Matthew Kennedy LLGPL 2016 1.0?

The selenium interfaces are here for reference purposes and are not further discussed.

3.3. Helper Libraries

Table 4: Libraries Considered
Library Homepage Author License Last Update
assert-p homepage Noloop GPL3 2020
assertion-error homepage Noloop GPL3 2019
check-it homepage Kyle Littler LLGPL 2015
cl-fuzz homepage Neil T. Dantam BSD 2 Clause 2018
cl-quickcheck homepage Andrew Pennebaker MIT 2020
cover homepage Richard Waters MIT 1991
hamcrest homepage Alexander Artemenko BSD 3 Clause 2022
mockingbird homepage Christopher Eames MIT 2017
portch (not in quicklisp) homepage Nick Allen BSD 3 Clause 2009
protest homepage Michał Herda LLGPL 2020
rtch (not in quicklisp) download David Thompson LLGPL 2008
slite homepage Arnold Noronha Apache 2.0 2022
testbild homepage Alexander Kahl GPLv3 2010
test-utils homepage Leo Zovic MIT 2020

assert-p, assertion-error, check-it, cl-fuzz, cl-quickcheck, cover, hamcrest, protest, slite, testbild and test-utils are not, per se, testing frameworks. They are designed to be used in conjunction with other testing frameworks.

  • check-it and cl-quickcheck are randomized property-based testing libraries (QuickCheck style). See https://en.wikipedia.org/wiki/QuickCheck
  • cl-fuzz is another variant of testing with random data.
  • assert-p and assertion-error are collections of assertions or assertion error macros that can be used in testing frameworks or by a test runner.
  • cover is a test coverage library, much like SBCL's sb-cover, CCL's code-cover, or LispWorks' Code Coverage.
  • hamcrest uses pattern matching for building tests.
  • mockingbird provides stubbing and mocking macros for unit testing. These are used when specified functions in a test should not be computed but should instead return a provided constant value.
  • portch helps organize tests written with Franz's portable ptester library.
  • protest is a wrapper around other testing libraries, currently 1am and parachute. It wraps around test assertions and, in case of failure, informs the user of details of the failed test step.
  • rtch helps organize RT tests based on their position in a directory hierarchy.
  • slite stands for SLIme TEst runner (it also works with SLY). Slite interactively runs your Common Lisp tests (currently only FiveAM and Parachute are supported). It allows you to see the summary of test failures, jump to test definitions, and re-run tests with the debugger, all from inside Emacs.
  • testbild provides a common interface for unit testing output, supporting TAP (versions 12 and 13) and xunit styles.
  • test-utils provides convenience functions and macros for prove and cl-quickcheck.


3.4. Dependencies

Libraries not in the table below do not show any dependencies in their asd files.

Table 5: Library Dependencies
Library Dependencies
cacau eventbus, assertion-error
checkl marshal
fiasco alexandria, trivial-gray-streams
fiveam alexandria, net.didierverna.asdf-flv, trivial-backtrace
lisp-unit2 alexandria, cl-interpol, iterate, symbol-munger
nst (#+(or allegro sbcl clozure openmcl clisp) closer-mop, org-sampler)
parachute documentation-utils, form-fiddle
prove cl-ppcre, cl-ansi-text, cl-colors, alexandria, uiop
rove trivial-gray-streams, uiop
should-test rutils, local-time, osicat, cl-ppcre

4. Quick Summary

4.1. Opinionated Awards

For those who want the opinionated quick summary, the awards are -

  • Best General Purpose: Parachute followed by Lisp-Unit2.

    Parachute hits almost everything on my wish list - optional progress reports and debugging, good suite setup and reporting, good default error-reporting and the ability to provide diagnostic strings with variables, the ability to skip failing test dependencies, to set time limits on tests, to report the time for each test and decent fixture capability. It does not have the built-in ability to re-run just the last failing tests, but that is a relatively easy add-on. While it is not the fastest, it is in the pack as opposed to the also-rans.

    Atlas Engineering makes a strong case for Lisp-Unit2 here.

    My use cases give Parachute a slight edge because I can get more detail on assertions within a test than with Lisp-Unit2. Atlas Engineering's use cases give Lisp-Unit2 a slight edge because its tests are functions. Your choice will depend on your use case.

    My next pick would be Fiasco, but I like Parachute and Lisp-Unit2's fixture capability and suite setup better.

    (Update 13 June 2021 - based on the latest update of Clunit2, it needs to be included for consideration as well) (Update 16 Jan 2023 - maybe consider the two newest entries Confidence and Try)

  • If Only Award: Lift. If only it reported all failing assertions and did not stop at the first one. Why? Why can't I change this?
  • If you only care about speed: Lift and 2am. See Benchmarking.
  • Best General Purpose Fixtures (Suite/Tag and test level): Lisp-Unit2 and Lift
  • Ability to reuse tests in multiple suites: Lisp-Unit2 (because of composable tags), Try
  • If you need tests to take parameters: Fiasco, Confidence and Try
  • If you need progress reporting to be optional: Parachute, Lisp-Unit2, Fiasco, Clunit2, or Try
  • Favorite Hierarchy Setup (nestable suites): Parachute and Lisp-Unit2 (which has a different setup using tags)

    Everything is a test and its :parents all the way up; can easily specify parents at the child level.

    Honorable mentions - 2am and Lift

  • Assertions that take diagnostic comments with variables: Parachute, Fiasco, 2am, Fiveam, Lift, Clunit2, Confidence, Try. This is something that I like for debugging purposes along with whatever reporting comes built in with the framework. See error-reporting
  • Values expression testing: Lisp-Unit2, Lisp-Unit, Parachute, (Update Clunit2 and Try as well)
  • I want to track if my functions changed results: Checkl
  • Tests that specify suite or tags (does not rely on location in file): Parachute, Lisp-Unit (tags), Lisp-Unit2 (tags), Lift, Clunit2
  • Heavy duty complex fixtures: NST (but there are trade-offs in the shape of the learning curve and performance)
  • Ability to define new assertions: Confidence, NST, Try (but they have their issues in other areas)
  • Ability to rerun failures only: Fiasco, Lisp-Unit2, Try (you can extend Parachute and Fiveam to get this, but it is not there now)
  • Favorite Random Data Generator: Check-it
  • Can redirect output to a different stream (a): Clunit2, Confidence, Fiasco, Lift, Lisp-Unit, Lisp-Unit2, RT and Try
  • Randomized Property Tests: Check-it with any framework
  • Choice of Interactive Debugging or Reporting: Most frameworks at this point
  • Rosetta Stone Award for reading different test formats: Parachute (can read Fiveam, Prove and Lisp-Unit tests)
  • Code Coverage Reports: Use your compiler
  • I use it because it was included in my project skeleton generator: Prove

(a) Most frameworks just write to *standard-output* so you have to redirect that to a file.


4.2. Features Considered

  • Ease of use and documentation: Most of the frameworks are straightforward. Some have no documentation, others have partial documentation (often documenting only one use case). The documentation may be out of sync with the code. Some get so excited about writing up the implementation details that it becomes difficult to see the forest for the trees. NST has a high learning curve. Prove and Rove will require digging into the source code if you want to do more than simple regression testing. Lift has a lot of undocumented functionality that might be just what you need but you have no way of knowing.
  • Tests
    • Tests should take multiple assertions and report ALL the assertion failures in the test. (Looking at you, Lift and Xlunit - I put multiple assertions into a test for a reason; please do not lose some of the evidence.)
    • Are tests functions or otherwise funcallable? (Faré and others requested this in an exchange with Tapiwa, the author of Clunit, back in 2013. At the same time, others differ on whether they want test names in the function namespace. You choose your preference. Those who want funcallable tests typically cite either the ability to programmatically run the test or the ability to go to the definition from the test name.)
    • Immediate access to source code (Integration with debugger or funcallable tests?)
    • Does a failure or error throw you immediately into the debugger, never into the debugger, and is that optional?
    • Easy to test structures/classes (does the framework provide assistance in determining that all parts of a structure or class meet a test)
    • Tests can call other tests (This is not the same as funcallable tests. To be useful this does require a minimum level of test labeling in the reporting.)
  • Assertions (aka Assertion Functions)
    • There are frameworks with only a few assertion test functions. There are frameworks with so many assertions that you wonder if you have to learn them all. The advantage of specialized assertions is less typing, possibly faster (or slower) performance and possibly relevant built-in error messages. You will have to check for yourself whether performance is positively or negatively impacted. You have to decide for yourself how much weight to put on extra assertions like having assert-symbolp instead of (is (symbolp x)).
    • Assertions that either automatically explain why the test failed or allow a diagnostic string that describes the assertion and what failed. (Have you ever seen a test fail but the report of what it should have been and what the result was look exactly the same? Maybe the test required EQL and you thought it was EQUALP? These might or might not help.)
    • Can assertions access variables in a closure containing the test? (Most frameworks can, but Clunit, Clunit2, Lisp-Unit, Lisp-Unit2 and NST cannot.)
    • Do the assertions have macroexpand assertion functions? (Clunit, Clunit2, Lisp-Unit, Lisp-Unit2, Prove, Rove and Tap-Unit-Test have this)
    • Do the assertions have floating point and rational comparisons or do you have to write your own? (Confidence, Lift, Lisp-Unit, Lisp-Unit2, have these functions for you.)
    • Signal and condition testing or at least be able to validate that the right condition was signalled.
    • Definable assertions/criteria (can you easily define additional assertions?)
    • Do assertions or tests run on compilation (C-c C-c in the source file)?
    • Do the assertions handle values expressions? Most frameworks accept a values expression but compare just the first value. Fiveam complains about getting a values expression and throws an error. Parachute and NST will compare a single values expression against multiple individual values. Prove will compare a values expression against a list. Lisp-Unit and Lisp-Unit2 (Update: Clunit2 as well) will actually compare two values expressions value by value. (See the Parachute sketch after this list.)
  • Easy to set up understandable suites and hierarchies or tags. Many frameworks automatically add tests to the last test suite that was defined. That makes things easy if you work very linearly or just in files for regression testing. If you are working in the REPL and switching between multiple test sub-suites, that can create unexpected behavior. I like to be able to specify the suite (or tags) when defining the test, but that creates more unnecessary typing if you work differently.
  • Choice of Interactive (drop directly into the debugger) or Reporting (run one or more tests and show which ones fail and which ones pass).
  • Data generators are nice to have, but the helper libraries Check-it and Cl-Quickcheck can also be used and probably have more extensive facilities.
  • Easy to setup and clean up Fixtures
    • Composable fixtures (fixtures for multiple test suites can be composed into a single fixture)
    • Freezing existing data while a test temporarily changes it
    • Systems that can wrap around tests, allowing you to pass local variables around.
  • Compilation: Some people want the ability to compile before running tests for two reasons. First, deferred compilation can seriously slow down extensive tests. Second, getting compile errors and warnings at the test run stage can be hard to track down in the middle of a lot of test output. Other people want deferred compilation (running the test compiles it, so no pre-compilation step required) and tested functions which have changed will get picked up when running the test.
  • Reports
    • Easy to read reports with descriptive comments (this requires that each test have description or documentation support)
    • Does the framework have progress reporting, at what level and can it be turned off?
    • Report just failing tests with descriptive info
    • Composable Reports (in the sense of a single report aggregating multiple tests or test suites)
    • Reports to File. I know most developers do not care, but I have seen situations where the ability to prove that the software at date A is documented to have passed xyz tests would have been nice. See Dribble and Output Streams
    • Test Timing. See Timing
    • TAP Output (some people like to pass test results in this format on to other tools).
    • Reports of Function (and parameter) test coverage (Rove was the only framework with something in this area, and it depends on using SBCL. I would suggest looking to your compiler; I did not test this.)
  • Error tracking (Do test runs create a test history so that you can run only against failing tests?) As far as I can tell, no framework creates a database to allow historical analysis.
  • Test Sequencing and Shuffling
    • Can choose test sequencing or shuffle
    • Can choose consistent or random or fuzzing data
    • Can choose just the tests that failed last time (Chris Riesbeck exchange with Tapiwa in 2013)
  • Ability to skip tests (see Skipping)
    • Skip tests
    • Skip assertions
    • Skip based on implementations
    • Skip tests that exceed a certain time period
  • Benchmarks: In general, functionality should matter to you more than benchmarks. The timing benchmark provided here is a regression test on UAX-15 with 16 tests containing 343332 assertions, run 10 times. While useful for regression tests, this is not necessarily indicative of development testing.
  • Asynchronous and parallel testing (not tested in this report)
  • Case safety (Max Mikhanosha asked for this in an exchange with Tapiwa in 2013. Not tested in this report)
  • Memory, time and resource usage reports (no one documented this and I did not dive into the source code looking for it.)
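
To illustrate the values-expression point above, here is a minimal sketch using Parachute's is-values (the test name is mine):

(parachute:define-test values-demo
  ;; TRUNCATE returns two values (3 and 1 here); each value gets its
  ;; own comparator and expected value.
  (parachute:is-values (truncate 7 2)
    (= 3)
    (= 1)))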

I am not covering support for asdf package-inferred systems, roswell script support and integration with travis ci, github actions, Coveralls, etc. If someone wants to do that and submit a pull request, I am open to that.

I am not including a pie chart describing which library has market share because (a) I do not like pie charts and (b) I do not believe market share is a measure of quality. That being said, because someone asked nicely, I pulled the following info out of quicklisp just based on who-depends-on. The actual count in the wild is completely unknown.

Table 6: User Count on Quicklisp
Name Count
1am 22
2am 0
fiveam 323
clunit 11
clunit2 4
confidence 2
fiasco 24
lift 54
lisp-unit 42
lisp-unit2 21
nst 10
parachute 49
prove 163
ptester 5
rove 31
rt 29
should-test 3
try 7
xlunit 4
xptest 0

5. Functionality Comparison

5.1. Hierarchy Overview

Table 7: Overview-1
Name Hierarchies/suites/tags/lists Composable Reports
1am ❌️ (2)(5) ❌️ ❌️
2am ✅️ ✅️ (5) (4)
cacau (6)   (4)
clunit ✅️ ✅️ (4)
clunit2 ✅️ ✅️ (4)
confidence ❌️ (9) ✅️ ✅️
fiasco ✅️ ✅️  
fiveam ✅️ ✅️  
gigamonkeys ❌️    
lift ✅️ ✅️  
lisp-unit (tags) (3)   (1,4)
lisp-unit2 (tags) (3)(5) ✅️ (5) (1,4)
nst ✅️ ✅️  
parachute ✅️ ✅️ (1)
prove ✅️ ✅️ (4)
ptester ❌️    
rove (7) (7)  
rt package (8)  
should-test package    
simplet ❌️    
tap-unit-test ❌️   (4)
try ✅️ ✅️ (5) ✅️
unit-test ✅️ ✅️  
xlunit ✅️ ✅️  
xptest ✅️ ❌️  
  1. report objects are provided which are expected to be extended by the user
  2. uses a flat list of tests. You can pass any list of test-names to run; see, e.g., the macro provided by Phoe in the 1am discussion, and the sketch after these notes.
  3. lisp-unit and lisp-unit2 organize by packages and by tags. You can run all the tests in a package, or all the tests for a list of tags, but they do not have the strict sense of hierarchy that other libraries have.
  4. TAP Formatted Reports are available
  5. Because tests are functions, tests can call other functions so you can create ad-hoc suites or hierarchies.
  6. Has suites but no real capacity to run them independently - all or nothing
  7. Rove's run-suite function will run all the tests in a particular package but does not accept a style parameter and simply prints out the results of each individual test, without summarizing. Rove's run function does accept a style parameter but seems to handle only package-inferred systems. I confirm Rove's issue #42 that it will not run with non-package inferred systems.
  8. RT does not have suites per se. You can run all the tests that have been defined using the DO-TESTS function. By default it prints to *standard-output*, but it accepts an optional stream parameter which allows you to redirect the results to a file or other stream of your choice. do-tests will print the results for each individual test and then print a short summary.
  9. Tests are hierarchical and compose results but there is no tagging system. Each suite is a function that can directly be executed.
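
Since 1am's run takes any list of test names (note 2 above), an ad-hoc suite is just a list; a sketch with hypothetical test names:

(1am:run '(test-a test-b))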

5.2. Run on compile, funcallable tests

There are multiple benefits to tests being funcallable. One of them is that you can "go to the test definition" easily.
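
For example, a 1am test is a plain function (a sketch; the test name is mine):

(1am:test addition-works
  (1am:is (= 2 (+ 1 1))))

;; Because the test is an ordinary function, this runs it:
(addition-works)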

Table 8: Run on Compile and Funcallable Tests
Library Run on compile Are Tests Funcallable?
1am A Y
2am (not in quicklisp) A Y
cacau N N
clunit A N
clunit2 A N
confidence N Y
fiasco A Y
fiveam Optional N
gigamonkeys N N
lift A, T(1) N
lisp-unit N N
lisp-unit2 N Y
nst N N
parachute N N
prove A N
ptester N N
rove A N
rt N N
should-test N N
tap-unit-test N N
try T(4) Y
unit-test N N
xlunit T(2) N
xptest N N
  • A means assertions run on compile, T means tests run on compile
  • (1) if compiled at REPL
  • (2) Optional by test, specified at definition: (def-test-method t1 ((test tf-xlunit) :run nil) body)
  • (3) *run-test-when-defined* controls this option
  • (4) *run-deftest-when* controls this option

5.3. Fixtures

Table 9: Fixtures
Library Fixtures Suite Fixtures Test Fixtures Multiple Fixtures
1am ❌️      
2am (not in quicklisp) ❌️      
cacau ✅️ ✅️ ✅️  
clunit ✅️ ✅️ ✅️ ✅️
clunit2 ✅️ ✅️ (c) ✅️ ✅️
confidence ❌️      
fiasco ❌️      
fiveam (a) K ✅️ ✅️  
gigamonkeys ❌️      
lift ✅️ ✅️   inherited from higher level suites
lisp-unit ❌️      
lisp-unit2 ✅️   ✅️  
nst ✅️ ✅️ ✅️ ✅️
parachute ✅️ ✅️ ✅️ ✅️
prove ❌️      
ptester ❌️      
rove ✅️ ✅️ ✅️ ✅️
rt ❌️      
should-test ❌️      
tap-unit-test ❌️      
try ❌️      
unit-test (b) ✅️ (b) (b) (b)
xlunit ✅️ ✅️ ✅️ ✅️
xptest ✅️   ✅️  

(a) Not really recommended, but does exist. (b) Users are expected to create a subclass of the unit-test class using the define-test-class macro. (c) Only one fixture per suite


5.4. Control over debugging, and user-provided diagnostic messages

Does a failure (not an error) trigger the debugger? Is that optional? Do assertions allow user-provided diagnostic messages, and if yes, can you further provide variables for a failure message?
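
As an example, Parachute's assertions take an optional format string plus arguments that show up in the failure report (a sketch; the test name is mine):

(parachute:define-test diagnostic-demo
  (let ((x 1) (y 2))
    ;; The diagnostic string and the values of X and Y are printed
    ;; if the assertion fails.
    (parachute:is = x y "expected x (~a) to equal y (~a)" x y)))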

Table 10: Overview Reporting v. Debugger Optionality / Diagnostic Messages
Library Failure triggers debugger Diagnostic Messages in Assertions
1am (always) N
2am (optional) with vars
cacau (optional) N
clunit (optional) with vars
clunit2 (optional) with vars
confidence (never) with vars
gigamonkeys (optional) N
fiasco (optional) with vars
fiveam (optional) with vars
lift (optional) with vars
lisp-unit (optional) Y
lisp-unit2 (optional) Y
nst (optional) N
parachute (optional) with vars
prove (optional) Y
ptester (optional) N
rove (optional) Y
rt (never) N
should-test (never) Y
simplet (never) N
tap-unit-test (optional) Y
try (optional) Y
unit-test (never) Y
xlunit (never) Y
xptest (never) N

Also see error-reporting

5.5. Output of Run Functions (other than what is printed to the stream)

Table 11: Output of Run Functions (other than what is printed to the stream)
Library Function Returns
1am run nil
2am (not in quicklisp) run nil
cacau run nil
clunit run-test, run-suite nil
clunit2 run-test, run-suite nil
confidence name-of-test nil
fiasco run-tests test-run object
fiveam run list of test-passed, test-skipped, test-failure objects
  run! nil
gigamonkeys test nil
lift run-test, run-tests results object
lisp-unit run-tests test-results-db object
lisp-unit2 run-tests test-results-db object
nst :run nil
parachute test a result object
prove run Returns three values: a T or NIL flag for whether the tests passed, a list of passed test files, and a list of failed test files.
  run-test-system passed-files, failed-files
  run-test nil
ptester with-tests nil
rove run-test, run-suite t or nil
rt do-test nil
should-test test hash-table (1)
tap-unit-test run-tests nil
try try trial object
unit-test run-test test-equal-result object
xlunit textui-test-run test-results-object
xptest run-test list of test-result objects

(1) Should-test: at the lowest level, should returns T or NIL and signals information about the failed assertion. This information is aggregated by deftest, which returns aggregate information about all the failed assertions in the hash-table; at the highest level, test once again aggregates information over all tests.

5.6. Progress Reports

Does the framework provide a progress report, is it optional, and does it report at the test level or also at the assertion level?

Table 12: Overview - Progress Reports
Library Progress Reports
1am Every assert
2am Every assert
cacau optional
clunit optional
clunit2 optional
confidence never
gigamonkeys never
fiasco optional
fiveam optional (1)
lift never
lisp-unit never
lisp-unit2 optional
nst Every test
parachute optional
prove Every assert
ptester Every assert
rove Optional
rt Every test
should-test Every assert
simplet Every test
tap-unit-test never
try optional (2)
unit-test Every test
xlunit never
xptest never

(1) The following will allow fiveam to run without output:

(let ((fiveam:*test-dribble*
        (make-broadcast-stream)))
  (fiveam:run! …))

(2) What to print is parameterized by event type (e.g. everything, nothing, UNEXPECTED-FAILURE, etc.).


5.7. Skipping, Shuffling and Re-running

Table 13: Overview-2 Skipping, Shuffling and Rerunning Abilities
Name Skip failing dependencies Shuffle Re-run only failed tests
1am   Y (auto)  
2am   Y (auto)  
cacau S, T    
clunit D Y (auto)  
clunit2 D Y (auto) Y
confidence     Y
fiasco P(1), A   Y
fiveam P(2)   (3)
gigamonkeys      
lift T    
lisp-unit      
lisp-unit2     Y
nst      
parachute D, C, P Y  
prove (4)    
ptester      
rove A    
rt      
should-test   N Y
simplet P    
tap-unit-test      
try S, T, A Y Y (5)
unit-test      
xlunit      
xptest      

D - failing dependencies, C - children, P - pending, S - suites, T - tests, A - assertions

  1. skip based on conditions when and skip-unless
  2. skip when specified
  3. run! returns a list of failed-test-results that you could save and use for this purpose
  4. Prove can skip a specified number of tests using the skip function. Unfortunately it marks them as passed rather than skipped.
  5. What to re-run is parameterized by event type.

5.8. Timing Reporting and Time Limits

Table 14: Timing Reporting and Time Limits
Library Time Reporting Time Limits
1am N N
2am (not in quicklisp) N N
cacau N Y(T or S)
clunit N N
clunit2 N N
confidence N N
fiasco N N
fiveam (a) ? N
gigamonkeys N N
lift Y Y
lisp-unit Y N
lisp-unit2 Y N
nst Y Y
parachute Y Y
prove N Y
ptester N N
rove N N
rt N N
should-test N N
tap-unit-test Y N
try Y Y
unit-test N N
xlunit N N
xptest N N

(a) Fiveam has some undocumented profiling capabilities that I did not look at

5.9. Dribble and Output Streams

Table 15: Dribble and Output Streams
Library Dribble output streams
1am N S
2am (not in quicklisp) N S
cacau N S
clunit N S
clunit2 N *test-output-stream*
confidence N optional parameter
fiasco N optional parameter
fiveam Y *test-dribble* S
gigamonkeys N S
lift Y *lift-dribble-pathname* optional parameter
lisp-unit N optional parameter
lisp-unit2 N *test-stream*
nst N optional parameter
parachute N (setf output)
prove N *test-result-output*
ptester N S
rove N *report-stream*
rt N optional parameter
should-test N *test-output*
tap-unit-test N S
try N optional parameter
unit-test N S
xlunit N S
xptest N S

Where S is *standard-output*
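
For those frameworks, rebinding *standard-output* is enough to capture a report in a file; a minimal sketch using 1am:

(with-open-file (s "test-report.txt" :direction :output
                                     :if-exists :supersede)
  (let ((*standard-output* s))
    (1am:run)))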

5.10. Edge Cases: Float Testing, Value Expressions and Closure Variables

This table looks at whether the framework provides float equality tests, whether it looks at all the values coming from a values expression, and whether it can access variables declared in a closure surrounding the test.

Table 16: Edge Cases
Name float tests Handles value expressions Variables in Closures
1am   First value only Y
2am   First value only Y
cacau   First value only Y
clunit   First value only N
clunit2 (a)   Y N
confidence Y First value only Y
fiasco   First value only Y
fiveam   N N
gigamonkeys   First value only Y
lift   First value only N
lisp-unit Y Y N
lisp-unit2 Y Y N
nst   Y N
parachute   Y Y
prove   Y Y
ptester   First value only Y
rove   First value only Y
rt   N N
should-test   First value only N
tap-unit-test   Y N
try Y Y Y
unit-test   First value only Y
xlunit   First value only Y
xptest   relies on CL predicates Y

(a) Updated 13 June 2021

5.11. Vocabulary - Define and Assert

This table looks at whether the framework uses define- and assert- vocabulary, which makes tests easier to read since Emacs will color these words by default. This is important to at least one group that commented on earlier versions of this report.

Table 17: Vocabulary
Name Define- Assert-
1am    
2am    
cacau    
clunit   assert-true, assert-condition
clunit2   assert-true, assert-condition
confidence define-testcase assert-true, assert-p, assert-t
fiasco    
fiveam    
gigamonkeys    
lift    
lisp-unit define-test assert-true, assert-error, assert-result, assert-test, check-type
lisp-unit2 define-test assert=, assert/=, assert-char=, assert-char-equal, assert-char/=, assert-char-not-equal, assert-eq, assert-eql, assert-equal, assert-equality, assert-equalp, assert-error, assert-expands, assert-fail, assert-false, assert-float-equal, assert-no-error, assert-no-signal, assert-no-warning, assert-norm-equal, assert-number-equal, assert-numerical-equal, assert-passes?, assert-prints, assert-rational-equal, assert-sigfig-equal, assert-signal, assert-string=, assert-string-equal, assert-string/=, assert-string-not-equal, assert-true, assert-typep, assert-warning, assertion-fail, assertion-pass, check-type, logically-equal
nst    
parachute define-test  
prove    
ptester    
rove    
rt    
should-test    
tap-unit-test define-test assert-true, assert-error
try    
unit-test    
xlunit   assert-true, assert-condition
xptest defmethod  


5.12. Compatibility and Customizable Assertions

Table 18: Overview-4 Misc
Name compatibility layers Customizable Assertion Functions
cacau   Y
confidence   Y
parachute fiveam lisp-unit prove  
nst   Y

(a) Running suites without tests or tests without test functions will result in tests marked PENDING rather than success or fail

5.13. Claims Not Tested

Table 19: Overview-5 Claims Not Tested
Name Async Thread Ready Package Inferred
1am   X  
2am   X  
Cacau X    
Rove   X (1) X

(1) Tycho Garen reported in February 2021 that "Rove doesn't seem to work when multi-threaded results effectively. It's listed in the readme, but I was able to write really trivial tests that crashed the test harness."


6. Assertion Failure Comments

There are two reasons you test. First, to pat yourself on the back when all tests pass. Second, to find any bugs. The frameworks' assertions differ in how much automatically generated information they provide on failure. The following are the automatically generated failure messages for an assertion that (= x y) where x is 1 and y is 2. We also note whether the framework accepts diagnostic strings and variables for those strings.
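
For reference, each framework ran the equivalent of the following (in the pseudocode vocabulary from the introduction):

(deftest t1-fail-34 ()
  (let ((x 1) (y 2))
    (assert-true (= x y)))) ; fails, since 1 does not equal 2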

6.1. 1am

What, you wanted a report? Let me introduce you to the debugger.

6.2. 2am

Assertions also accept diagnostic strings with variables

T1-FAIL-34:
FAIL: (= X Y)

6.3. cacau

Error message:
BIT EQUAL (INTEGER 0 4611686018427387903)
Actual:
1
Expected:
2

6.4. clunit and clunit2

Assertions also accept diagnostic strings with variables

T1-FAIL-34: Expression: (= X Y)
Expected: T
Returned: NIL

6.5. confidence

Into the debugger you never go.

 Test assertion failed:
  (ASSERT-T (= X Y))
In this call, the composed forms in argument position evaluate as:
  (= X Y) => NIL
The assertion (ASSERT-T EXPR) is true, iff EXPR is a true generalised boolean.

6.6. fiasco

Assertions also accept diagnostic strings with variables

Failure 1: FAILED-ASSERTION when running T1-FAIL
Binary predicate (= X Y) failed.
x: X => 1
y: Y => 2

6.7. fiveam

Assertions also accept diagnostic strings with variables. I deleted several blank lines. Why do you waste so much screen space, Fiveam?

T1-FAIL-34 []:
Y
evaluated to
2
which is not
=
to
1

6.8. gigamonkeys

FAIL ... (T1-FAIL): (= X Y)
X                 => 1
Y                 => 2
(= X Y)           => NIL

6.9. lift

Assertions also accept diagnostic strings with variables

Failure: s0 : t1-fail-34
Documentation: NIL
Source       : NIL
Condition    : Ensure failed: (= X Y) ()
During       : (END-TEST)
Code         : (
                ((LET ((X 1) (Y 2))
                   (ENSURE (= X Y)))))

6.10. lisp-unit

Assertions also accept diagnostic strings but no variables

Failed Form: (= X Y)
 | Expected T but saw NIL
 | X => 1
 | Y => 2

6.11. lisp-unit2

Assertions also accept diagnostic strings but no variables

| FAILED (1)
 | Failed Form: (ASSERT-TRUE (= X Y))
 | Expected T
 | but saw NIL

6.12. parachute

Assertions also accept diagnostic strings with variables

(test 't1-fail-34)
        ? TF-PARACHUTE::T1-FAIL-34
  0.000 ✘   (is = x y)
  0.010 ✘ TF-PARACHUTE::T1-FAIL-34

;; Failures:
   1/   1 tests failed in TF-PARACHUTE::T1-FAIL-34
The test form   y
evaluated to    2
when            1
was expected to be equal under =.

6.13. ptester

Test failed: Y
  wanted: 1
     got: 2

6.14. prove

Assertions also accept diagnostic strings but no variables

× NIL is expected to be T (prove)

6.15. rove

Assertions also accept diagnostic strings but no variables

(EQUAL X Y) (rove)
X = 1
Y = 2

6.16. rt

Form: (LET ((X 1) (Y 2))
        (= X Y))
Expected value: T
Actual value: NIL.

6.17. should-test

Assertions also accept diagnostic strings but no variables

Test T1-FAIL-34:
Y FAIL
expect: 1
actual: 2
FAILED

6.18. tap-unit-test

Assertions also accept diagnostic strings but no variables

T1-FAIL-34: (= X Y) failed:
Expected T but saw NIL

6.19. try

(deftest t1-fail ()
  (let ((x 1) (y 2))
   (is (equal 1 2))
    (is (= x y)
        :msg "Intentional failure x does not equal y"
        :ctx ("*PACKAGE* is ~S and *PRINT-CASE* is ~S~%"
             *package* *print-case*))))

  (try 't1-fail :print 'unexpected)
T1-FAIL
  ⊠ (IS (EQUAL 1 2))
  ⊠ Intentional failure x does not equal y
    where
      X = 1
      Y = 2
    *PACKAGE* is #<PACKAGE "UAX-15-TRY-TESTS"> and *PRINT-CASE* is :UPCASE

⊠ T1-FAIL ⊠2
#<TRIAL (T1-FAIL) UNEXPECTED-FAILURE 0.000s ⊠2>

6.20. unit-test

Assertions also accept diagnostic strings but no variables

(#<TEST-EQUAL-RESULT FORM: (= X Y) STATUS: FAIL REASON: NIL>)


7. Benchmarking

First some points about what I have discovered about benchmarking these frameworks:

  1. In general, functionality and comfort will drive your framework decision, not benchmarks.
  2. That said, bad benchmarks can say one of four things:

    a. I did not fully understand the best way to use the framework. As an example, my first naive version of the test for confidence had a runtime of 742 seconds. The author showed me how to rewrite the tests and it dropped to 24 seconds. This could very well be the explanation for the bad results of nst. (If someone experienced with nst would like to re-write that benchmark test, please let me know.)

    b. There is a problem in the framework code. The author of clunit2 cut the run time from the original 600 seconds to 14 seconds.

    c. If your framework reports using what Emacs thinks are long lines, run the tests in a terminal; do not run them in an Emacs REPL. As an example, fiveam's runtime was 10 seconds, but the real time (total time to get a result) in the Emacs REPL was 37 minutes.

    d. There might be something of interest for the SBCL and CCL developers. See, e.g., the wildly different results for try and prove.

The benchmarks are based on a regression test, not development or functional testing. All the benchmark times below were done in a terminal window with SBCL version 2.3.0 and CCL version 1.12.1 on a linux server. I tried to rewrite the tests for UAX-15 for each framework. The uax-15 tests have 16 separate tests with a total of 343332 assertions (all of which pass) and the assertions are all straightforward. The tests were stripped to the minimum. No diagnostic strings were used. For the frameworks which allowed it, the test was set to no progress reporting and overall summary only. Trivial-benchmark was used with 10 repetitions for each test. (Since Cacau does not run tests again unless they are recompiled, I have multiplied a single run by 10 to get some kind of comparable.)

Since all of the assertions pass, any real world test with failing assertions generating failure reports will be different.

Unsurprisingly, the simplest frameworks were the fastest. Your context will be important as to whether these benchmarks are at all meaningful to you.

7.1. Stack Ranking

Considering that the benchmark is based on 10 test runs of 16 tests with 343332 passing assertions (3433320 total assertions, 160 tests), test speed on regression tests is not going to drive your decision. Development and functional testing will obviously have a different result.

Table 20: Summary of Regression Test Benchmark (lower is better) (updated 16 Jan 2023)
Library SBCL RunTime CCL Runtime
xptest 5.8903 11.7284
xlunit 5.9102 11.7532
cacau 6.0173 11.6543
lift 6.0686 11.9706
1am 6.1541 13.6843
ptester 6.2000 12.5130
rt 6.2079 11.9097
2am 6.2408 14.0614
unit-test 6.3616 17.6696
tap-unit-test 6.9988 13.2284
lisp-unit 7.1250 13.4544
should-test 7.1710 25.1831
gigamonkeys 7.8872 30.7511
fiasco 8.8940 38.4574
lisp-unit2 9.2966 30.2440
parachute 9.9155 40.3792
fiveam 10.1231 19.1292
rove 11.8615 35.8269
cardiogram 13.3693 29.2526
clunit2 14.3416 35.9992
try 14.6188 177.8841
confidence 24.7286 56.8546
prove 30.5618 132.1456
nst 517.8853 500.4885
Table 21: Order by Benchmark Bytes Consed (lower is better)
Library Bytes Consed
2am (not in quicklisp) 3361006176
1am 3361283232
xptest 3363654192
xlunit 3367157968
cacau 3383879200
lift 3480602800
rt 3586779376
unit-test 3668546192
ptester 3690505472
should-test 3859488496
cardiogram 3965584752
fiasco 4077786880
tap-unit-test 4217541312
lisp-unit 4222413456
confidence 4338353600
fiveam 4512244000
com.gigamonkeys.test-framework 4788180016
lisp-unit2 4939787968
parachute 5212345824
clunit 5262303120
rove 7427051216
try 9019833552
prove 14018185696
clunit2 (b) 15377667616
clunit2 (a) 15407395920
nst 306684151680
Table 22: Order by Benchmark Eval calls (lower is better)
Library Eval Calls
1am 0
cacau 0
com.gigamonkeys.test-framework 0
fiveam 0
confidence 0
lift 0
lisp-unit2 0
parachute 0
prove 0
ptester 0
rove 0
should-test 0
unit-test 0
xlunit 0
xptest 0
clunit2 (a) 0
2am (not in quicklisp) 0
fiasco 10
lisp-unit 160
tap-unit-test 160
clunit 320
rt 480
nst 6860220

Now the detailed report.

7.2. 1am

1am seems to have no way to turn off the progress reports. The benchmark below was done running in a terminal window. The same test running in an Emacs REPL took roughly six times longer due to how Emacs mishandles long lines. YMMV with other editors.

(benchmark:with-timing (10) (uax-15-1am-tests::run))

Success: 16 tests, 343332 checks.

-                SAMPLES  TOTAL       MINIMUM    MAXIMUM    MEDIAN     AVERAGE      DEVIATION
REAL-TIME        10       9.609976    0.88333    1.05333    0.943332   0.960998     0.050641
RUN-TIME         10       6.154181    0.607383   0.64799    0.610502   0.615418     0.011578
USER-RUN-TIME    10       6.117711    0.601297   0.631522   0.609088   0.611771     0.008027
SYSTEM-RUN-TIME  10       0.036479    0          0.016462   0.003285   0.003648     0.004761
PAGE-FAULTS      10       0           0          0          0          0            0.0
GC-RUN-TIME      10       99.246      6.092      39.833     6.396      9.9246       9.983065
BYTES-CONSED     10       3361283232  336076080  336267312  336108464  336128320.0  50528.4
EVAL-CALLS       10       0           0          0          0          0            0.0

The CCL version:

       SAMPLES  TOTAL      MINIMUM   MAXIMUM   MEDIAN    AVERAGE   DEVIATION
REAL-TIME  10       25.275585  2.384657  2.649059  2.491271  2.527559  0.080996
RUN-TIME   10       13.684325  1.337679  1.436498  1.360898  1.368432  0.028661

7.3. 2am

2am seems to have no way to turn off the progress reports. As with the 1am benchmark, the benchmark below was done running in a terminal window. The same test running in an Emacs REPL took roughly six times longer due to how Emacs mishandles long lines.

(benchmark:with-timing (10) (uax-15-2am-tests::run))

-                SAMPLES  TOTAL       MINIMUM    MAXIMUM    MEDIAN     AVERAGE      DEVIATION
REAL-TIME        10       9.886643    0.899998   1.073331   0.993331   0.988664     0.043823
RUN-TIME         10       6.240839    0.618159   0.648082   0.620287   0.624084     0.008487
USER-RUN-TIME    10       6.214425    0.614933   0.644856   0.618156   0.621442     0.008304
SYSTEM-RUN-TIME  10       0.026427    0          0.00694    0.003034   0.002643     0.002041
PAGE-FAULTS      10       0           0          0          0          0            0.0
GC-RUN-TIME      10       147.903     11.379     36.216     11.669     14.7903      7.227465
BYTES-CONSED     10       3361006176  336067904  336158016  336087456  336100600.0  27877.61
EVAL-CALLS       10       0           0          0          0          0            0.0

The CCL version

-      SAMPLES  TOTAL      MINIMUM   MAXIMUM   MEDIAN    AVERAGE   DEVIATION
REAL-TIME  10       29.176336  2.674119  3.128804  2.907567  2.917634  0.11941
RUN-TIME   10       14.061409  1.364417  1.471937  1.403766  1.406141  0.03288

7.4. cacau

Since Cacau does not run tests unless they are recompiled, you need to multiply the numbers below by 10 to get some kind of comparable here. This was run at the minimum reporting level.

(benchmark:with-timing (10) (uax-15-cacau-tests::run :reporter :min))
-                SAMPLES  TOTAL      MINIMUM    MAXIMUM    MEDIAN     AVERAGE    DEVIATION
REAL-TIME        1        0.603331   0.603331   0.603331   0.603331   0.603331   0.0
RUN-TIME         1        0.601739   0.601739   0.601739   0.601739   0.601739   0.0
USER-RUN-TIME    1        0.581692   0.581692   0.581692   0.581692   0.581692   0.0
SYSTEM-RUN-TIME  1        0.020046   0.020046   0.020046   0.020046   0.020046   0.0
PAGE-FAULTS      1        0          0          0          0          0          0.0
GC-RUN-TIME      1        24.25      24.25      24.25      24.25      24.25      0.0
BYTES-CONSED     1        338387920  338387920  338387920  338387920  338387920  0.0
EVAL-CALLS       1        0          0          0          0          0          0.0

The CCL version (total multiplied by 10 to get a comparable):

-          SAMPLES  TOTAL     MINIMUM   MAXIMUM   MEDIAN    AVERAGE   DEVIATION
REAL-TIME  1        1.168082  1.168082  1.168082  1.168082  1.168082  0
RUN-TIME   1        1.165435  1.165435  1.165435  1.165435  1.165435  0

7.5. cardiogram

  1. sbcl
      -                SAMPLES  TOTAL       MINIMUM    MAXIMUM    MEDIAN     AVERAGE      DEVIATION
    REAL-TIME        10       15.639978   1.529998   1.61333    1.553331   1.563998     0.025854
    RUN-TIME         10       13.36939    1.289864   1.409203   1.324704   1.336939     0.03158
    USER-RUN-TIME    10       9.034622    0.860167   0.933511   0.897475   0.903462     0.020972
    SYSTEM-RUN-TIME  10       4.334774    0.392391   0.479334   0.426708   0.433477     0.025352
    PAGE-FAULTS      10       0           0          0          0          0            0.0
    GC-RUN-TIME      10       883.805     70.762     112.345    85.737     88.3805      13.442876
    BYTES-CONSED     10       3965584752  396493296  396752736  396527616  396558460.0  73038.71
    EVAL-CALLS       10       0           0          0          0          0            0.0
    
  2. ccl
      -          SAMPLES  TOTAL      MINIMUM   MAXIMUM   MEDIAN    AVERAGE   DEVIATION
    REAL-TIME  10       31.406332  3.098291  3.187677  3.141698  3.140633  0.02607
    RUN-TIME   10       29.252617  2.900375  2.954562  2.923258  2.925262  0.017778
    

7.6. clunit

Clunit has always had a concern about performance. Running this benchmark was painful. Unlike fiveam, which should not be run in an Emacs REPL on tests with lots of assertions because of Emacs' issues with long lines, clunit has no one to blame but itself. But look at the CCL results compared to the SBCL results: clunit was the only framework faster under CCL than under SBCL. Still unacceptably slow, but… With SBCL in a terminal:

-                SAMPLES  TOTAL       MINIMUM    MAXIMUM    MEDIAN     AVERAGE    DEVIATION
REAL-TIME        10       601.4108    57.19953   64.00593   59.65935   60.141087  2.303678
RUN-TIME         10       601.0652    57.161556  63.96824   59.62751   60.106518  2.301759
USER-RUN-TIME    10       600.65216   57.108273  63.941593  59.587543  60.06522   2.305383
SYSTEM-RUN-TIME  10       0.413016    0.019989   0.059948   0.043303   0.041302   0.011839
PAGE-FAULTS      10       0           0          0          0          0          0.0
GC-RUN-TIME      10       1158.426    87.246     145.034    115.57     115.8426   17.674866
BYTES-CONSED     10       5262303120  526034656  527650448  526069408  526230312  473977.47
EVAL-CALLS       10       320         32         32         32         32         0.0
NIL

The CCL result

-          SAMPLES  TOTAL     MINIMUM    MAXIMUM    MEDIAN    AVERAGE    DEVIATION
REAL-TIME  10       272.9831  27.003325  27.478271  27.37946  27.298307  0.179919
RUN-TIME   10       272.8588  26.99254   27.466413  27.36916  27.28588   0.179731

7.7. clunit2

Update 13 June 2021: Clunit2 has had a huge performance increase, most of it apparently involving moving from using lists to using arrays. Clunit2 should now be considered a member of the pack from a performance standpoint.

I ran the new improved clunit2 two ways and there is a performance difference to be considered here.

First I let CL's equal do the comparison and then clunit2 just checked whether the assertion was true (assert-true), which was how all the other frameworks were also tested.

-                SAMPLES  TOTAL        MINIMUM     MAXIMUM     MEDIAN      AVERAGE     DEVIATION
REAL-TIME        10       14.846632    1.366663    1.749995    1.389997    1.484663    0.138653
RUN-TIME         10       14.341602    1.36469     1.746867    1.375746    1.43416     0.113668
USER-RUN-TIME    10       13.959459    1.327791    1.650369    1.35222     1.395946    0.095244
SYSTEM-RUN-TIME  10       0.382167     0.020135    0.096501    0.029916    0.038217    0.021336
PAGE-FAULTS      10       0            0           0           0           0           0.0
GC-RUN-TIME      10       1396.363     79.062      426.267     94.67       139.6363    102.216064
BYTES-CONSED     10       15407395920  1540473248  1542494352  1540569936  1540739592  586165.7
EVAL-CALLS       10       0            0           0           0           0           0.0

The CCL version:

-          SAMPLES  TOTAL      MINIMUM   MAXIMUM   MEDIAN    AVERAGE   DEVIATION
REAL-TIME  10       36.061737  3.571143  3.670895  3.591581  3.606174  0.031347
RUN-TIME   10       35.999214  3.565532  3.666779  3.587588  3.599922  0.031904

7.8. confidence

Confidence has no built-in capability for running all the tests in a suite or package, so this benchmark is based on creating a function that just runs all the tests for uax-15-confidence-tests.

There is no way to turn off the progress report.
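
Since Confidence testcases are themselves functions, the runner is just another testcase that calls the others; a sketch, assuming the current package uses Confidence and with hypothetical test names:

(define-testcase run-all-uax-15-tests ()
  ;; Each form calls one of the real uax-15-confidence-tests testcases.
  (testsuite-nfc)
  (testsuite-nfd)
  (testsuite-nfkc)
  (testsuite-nfkd))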

  1. sbcl
      -                SAMPLES  TOTAL       MINIMUM    MAXIMUM    MEDIAN     AVERAGE    DEVIATION
    REAL-TIME        10       24.766607   2.373328   2.619993   2.449995   2.476661   0.074266
    RUN-TIME         10       24.728697   2.370314   2.620167   2.44499    2.47287    0.074612
    USER-RUN-TIME    10       24.206568   2.333738   2.529879   2.408485   2.420657   0.061988
    SYSTEM-RUN-TIME  10       0.52216     0.033206   0.119802   0.043091   0.052216   0.025093
    PAGE-FAULTS      10       0           0          0          0          0          0.0
    GC-RUN-TIME      10       2399.87     157.192    422.991    199.997    239.987    77.71274
    BYTES-CONSED     10       4338353600  430977312  457204272  431014240  433835360  7812895.0
    EVAL-CALLS       10       0           0          0          0          0          0.0
    
    
  2. ccl
      -          SAMPLES  TOTAL      MINIMUM   MAXIMUM   MEDIAN    AVERAGE   DEVIATION
    REAL-TIME  10       56.97802   4.964914  6.698658  5.584261  5.697802  0.544525
    RUN-TIME   10       56.854607  4.940436  6.691704  5.57673   5.685461  0.55132
    

7.9. fiasco

With progress reporting turned off

(setf *print-test-run-progress* nil)
(in-package :uax-15-fiasco-suite)
(benchmark:with-timing (10) (run-package-tests))
-                SAMPLES  TOTAL       MINIMUM    MAXIMUM    MEDIAN     AVERAGE    DEVIATION
REAL-TIME        10       8.906644    0.836663   1.036664   0.873331   0.890664   0.055253
RUN-TIME         10       8.894009    0.833614   1.035559   0.87176    0.889401   0.055545
USER-RUN-TIME    10       8.567723    0.821425   0.982309   0.835402   0.856772   0.046346
SYSTEM-RUN-TIME  10       0.326294    0.009906   0.05325    0.036599   0.032629   0.012558
PAGE-FAULTS      10       0           0          0          0          0          0.0
GC-RUN-TIME      10       1226.872    82.416     269.812    99.856     122.6872   54.588013
BYTES-CONSED     10       4077786880  407500144  409297696  407538304  407778688  543070.25
EVAL-CALLS       10       10          1          1          1          1          0.0

The CCL version:

-          SAMPLES  TOTAL      MINIMUM   MAXIMUM   MEDIAN    AVERAGE   DEVIATION
REAL-TIME  10       38.544464  3.658651  4.815928  3.764052  3.854446  0.323914
RUN-TIME   10       38.457424  3.651049  4.807069  3.759031  3.845742  0.3241

7.10. fiveam

With progress reporting turned off:

(benchmark:with-timing (10)
                       (let ((fiveam:*test-dribble* (make-broadcast-stream)))
                         (run 'uax-15-fiveam)))
-                SAMPLES  TOTAL       MINIMUM    MAXIMUM    MEDIAN     AVERAGE    DEVIATION
REAL-TIME        10       10.133307   0.976664   1.06333    1.006664   1.013331   0.025777
RUN-TIME         10       10.123116   0.977754   1.062285   1.005493   1.012312   0.025377
USER-RUN-TIME    10       9.826661    0.958934   1.025672   0.964349   0.982666   0.023269
SYSTEM-RUN-TIME  10       0.29648     0.013407   0.043277   0.029962   0.029648   0.008991
PAGE-FAULTS      10       0           0          0          0          0          0.0
GC-RUN-TIME      10       887.548     56.765     130.141    83.046     88.7548    21.552885
BYTES-CONSED     10       4512244000  451134320  451410384  451200752  451224400  74965.13
EVAL-CALLS       10       0           0          0          0          0          0.0

If you do not have progress reporting turned off, besides wasting a huge amount of screen space and time, it creates interesting issues depending on where you are running fiveam. Emacs has known problems with long lines, and fiveam's progress reporting in a benchmark like this creates lots of long lines. It gets even worse if you set the run keyword parameter :print-names to nil.

Rule of thumb for big test systems and fiveam: run it from a terminal, not an Emacs REPL.

The CCL version

-          SAMPLES  TOTAL      MINIMUM   MAXIMUM   MEDIAN    AVERAGE   DEVIATION
REAL-TIME  10       19.166885  1.904093  1.946594  1.914582  1.916688  0.011433
RUN-TIME   10       19.129244  1.90118   1.94372   1.911262  1.912924  0.011477

7.11. gigamonkeys

Gigamonkeys does not do progress reporting

(benchmark:with-timing (10) (test-package))
-                SAMPLES  TOTAL       MINIMUM    MAXIMUM    MEDIAN     AVERAGE      DEVIATION
REAL-TIME        10       7.893314    0.783331   0.813332   0.786665   0.789331     0.008406
RUN-TIME         10       7.887205    0.780954   0.816817   0.785493   0.78872      0.009683
USER-RUN-TIME    10       7.850746    0.777678   0.796806   0.785181   0.785075     0.005411
SYSTEM-RUN-TIME  10       0.036482    0          0.02001    0.000027   0.003648     0.005858
PAGE-FAULTS      10       0           0          0          0          0            0.0
GC-RUN-TIME      10       190.862     14.962     40.392     16.909     19.0862      7.12733
BYTES-CONSED     10       4788180016  478635504  479903744  478708672  478818000.0  363684.38
EVAL-CALLS       10       0           0          0          0          0            0.0

The CCL version

-          SAMPLES  TOTAL      MINIMUM   MAXIMUM   MEDIAN    AVERAGE   DEVIATION
REAL-TIME  10       19.166885  1.904093  1.946594  1.914582  1.916688  0.011433
RUN-TIME   10       19.129244  1.90118   1.94372   1.911262  1.912924  0.011477

7.12. lift

Lift says that there were 16 successful tests, but does not specify the number of successful assertions, so no progress reports.

(benchmark:with-timing (10) (run-tests :suite 'uax-lift-15))
-                SAMPLES  TOTAL       MINIMUM    MAXIMUM    MEDIAN     AVERAGE    DEVIATION
REAL-TIME        10       6.076652    0.596666   0.666665   0.599999   0.607665   0.019779
RUN-TIME         10       6.068622    0.596521   0.664457   0.601027   0.606862   0.019258
USER-RUN-TIME    10       6.035427    0.593182   0.644546   0.599972   0.603543   0.013911
SYSTEM-RUN-TIME  10       0.03322     0          0.019911   0.000003   0.003322   0.005932
PAGE-FAULTS      10       0           0          0          0          0          0.0
GC-RUN-TIME      10       189.801     14.242     51.038     14.762     18.9801    10.741669
BYTES-CONSED     10       3480602800  347045776  356690112  347112656  348060280  2876791.8
EVAL-CALLS       10       0           0          0          0          0          0.0

The CCL version of the benchmark resulted in this:

(benchmark:with-timing (10) (run-tests :suite 'uax-lift-15))
-          SAMPLES  TOTAL      MINIMUM   MAXIMUM   MEDIAN    AVERAGE   DEVIATION
REAL-TIME  10       11.987319  1.186334  1.216747  1.196841  1.198732  0.008089
RUN-TIME   10       11.970636  1.185057  1.215185  1.195852  1.197064  0.008253

7.13. lisp-unit

No progress reports

(benchmark:with-timing (10) (run-tests))
-                SAMPLES  TOTAL       MINIMUM    MAXIMUM    MEDIAN     AVERAGE      DEVIATION
REAL-TIME        10       7.129982    0.699998   0.786665   0.703331   0.712998     0.025186
RUN-TIME         10       7.125096    0.699222   0.788776   0.7028     0.71251      0.026041
USER-RUN-TIME    10       7.068603    0.692563   0.765489   0.699554   0.70686      0.019895
SYSTEM-RUN-TIME  10       0.056505    0.000003   0.023286   0.003277   0.00565      0.00699
PAGE-FAULTS      10       0           0          0          0          0            0.0
GC-RUN-TIME      10       260.177     19.339     60.275     21.956     26.0177      11.598406
BYTES-CONSED     10       4222413456  421806272  425655232  421847136  422241340.0  1138669.0
EVAL-CALLS       10       160         16         16         16         16           0.0

Now the CCL version:

-          SAMPLES  TOTAL      MINIMUM   MAXIMUM   MEDIAN    AVERAGE   DEVIATION
REAL-TIME  10       13.487027  1.339966  1.362137  1.349544  1.348703  0.007266
RUN-TIME   10       13.454422  1.337902  1.359824  1.343596  1.345442  0.007124

7.14. lisp-unit2

No progress reports

(benchmark:with-timing (10) (run-tests))
-                SAMPLES  TOTAL       MINIMUM    MAXIMUM    MEDIAN     AVERAGE      DEVIATION
REAL-TIME        10       9.306638    0.899997   0.956663   0.923331   0.930664     0.019709
RUN-TIME         10       9.29662     0.90105    0.953252   0.924323   0.929662     0.018766
USER-RUN-TIME    10       9.126992    0.895342   0.93455    0.908129   0.912699     0.01386
SYSTEM-RUN-TIME  10       0.169641    0.00335    0.033231   0.016619   0.016964     0.010221
PAGE-FAULTS      10       0           0          0          0          0            0.0
GC-RUN-TIME      10       512.84      32.691     73.531     42.772     51.284       15.129883
BYTES-CONSED     10       4939787968  493387600  498633408  493461152  493978780.0  1552274.1
EVAL-CALLS       10       0           0          0          0          0            0.0

Now the CCL version:

-          SAMPLES  TOTAL      MINIMUM   MAXIMUM   MEDIAN    AVERAGE   DEVIATION
REAL-TIME  10       30.287363  2.994757  3.274799  3.001426  3.028736  0.082134
RUN-TIME   10       30.244043  2.990828  3.270884  2.996805  3.024404  0.082271

7.15. nst

NST's results were surprisingly bad. I ran tests with and without :cache being set on each fixture and it did not seem to make much of a difference. At this point I do not know if the issue is with NST or an error between chair and keyboard.

(benchmark:with-timing (10) (nst-cmd :run :uax-15-nst))
-                SAMPLES  TOTAL         MINIMUM      MAXIMUM      MEDIAN       AVERAGE      DEVIATION
REAL-TIME        10       516.88855     51.526524    51.85985     51.67986     51.688854    0.105207
RUN-TIME         10       517.8853      51.65252     51.97778     51.795273    51.788532    0.090498
USER-RUN-TIME    10       515.4022      51.376358    51.718292    51.56273     51.540226    0.092005
SYSTEM-RUN-TIME  10       2.483108      0.206179     0.282712     0.258424     0.248311     0.027323
PAGE-FAULTS      10       0             0            0            0            0            0.0
GC-RUN-TIME      10       11704.455     1081.936     1224.102     1173.542     1170.4456    33.417885
BYTES-CONSED     10       306684151680  30666874416  30677103952  30667547952  30668415168  2911205.3
EVAL-CALLS       10       6860220       686022       686022       686022       686022       0.0

The CCL version:

-          SAMPLES  TOTAL     MINIMUM   MAXIMUM    MEDIAN    AVERAGE    DEVIATION
REAL-TIME  10       501.6569  49.96337  50.555893  50.08843  50.16569   0.18232
RUN-TIME   10       500.4885  49.85959  50.36908   49.97486  50.048847  0.165674

7.16. parachute

Progress reporting turned off by using the quiet report.

(benchmark:with-timing (10) (test 'suite :report 'quiet))
-                SAMPLES  TOTAL       MINIMUM    MAXIMUM    MEDIAN     AVERAGE      DEVIATION
REAL-TIME        10       9.929974    0.913331   1.089997   0.969997   0.992997     0.063411
RUN-TIME         10       9.91559     0.912637   1.089303   0.970189   0.991559     0.062786
USER-RUN-TIME    10       9.403031    0.879351   1.042706   0.916937   0.940303     0.056901
SYSTEM-RUN-TIME  10       0.512589    0.02995    0.073216   0.046596   0.051259     0.014293
PAGE-FAULTS      10       0           0          0          0          0            0.0
GC-RUN-TIME      10       1677.654    106.9      267.475    143.608    167.7654     51.604637
BYTES-CONSED     10       5212345824  512596848  598222384  512680656  521234600.0  25662644.0
EVAL-CALLS       10       0           0          0          0          0            0.0

The same benchmark with CCL:

-          SAMPLES  TOTAL      MINIMUM   MAXIMUM   MEDIAN    AVERAGE   DEVIATION
REAL-TIME  10       40.437767  4.010897  4.063527  4.042738  4.043777  0.013439
RUN-TIME   10       40.37925   4.007424  4.053993  4.037376  4.037925  0.011995

7.17. prove

The prove tests were done with the *default-reporter* set to :dot because there is no way to turn off the progress reporting. The times were surprisingly slow (not clunit slow, but roughly five times longer than the other frameworks), with no real difference between running in a terminal window or in an emacs REPL.

(benchmark:with-timing (10) (run-all-uax-15))
-                SAMPLES  TOTAL        MINIMUM     MAXIMUM     MEDIAN      AVERAGE       DEVIATION
REAL-TIME        10       66.02316     6.323317    6.976648    6.516648    6.602316      0.192082
RUN-TIME         10       30.561893    2.896409    3.223873    3.053162    3.056189      0.099394
USER-RUN-TIME    10       29.63921     2.823159    3.153962    2.939861    2.963921      0.092085
SYSTEM-RUN-TIME  10       0.922708     0.069913    0.133064    0.079994    0.092271      0.020397
PAGE-FAULTS      10       0            0           0           0           0             0.0
GC-RUN-TIME      10       2216.068     137.512     437.243     197.665     221.6068      81.27371
BYTES-CONSED     10       14018185696  1394824144  1428527632  1395103136  1401818600.0  10494809.0
EVAL-CALLS       10       0            0           0           0           0             0.0

The CCL version:

-          SAMPLES  TOTAL      MINIMUM    MAXIMUM    MEDIAN     AVERAGE    DEVIATION
REAL-TIME  10       229.77249  12.113011  57.055473  12.571996  22.97725   16.502855
RUN-TIME   10       132.14566  12.067512  16.609463  12.487875  13.214567  1.402868

7.18. ptester

The benchmarking was done with a single function (ptester-tests) that called all the tests. Progress reporting cannot be turned off.

-                SAMPLES  TOTAL       MINIMUM    MAXIMUM    MEDIAN     AVERAGE      DEVIATION
REAL-TIME        10       6.209992    0.613333   0.663334   0.616665   0.620999     0.014303
RUN-TIME         10       6.20006     0.611543   0.660918   0.615629   0.620006     0.013763
USER-RUN-TIME    10       6.150195    0.604901   0.630972   0.614696   0.61502      0.006247
SYSTEM-RUN-TIME  10       0.049889    0          0.029945   0.003299   0.004989     0.008579
PAGE-FAULTS      10       0           0          0          0          0            0.0
GC-RUN-TIME      10       166.527     12.391     45.026     13.742     16.6527      9.467175
BYTES-CONSED     10       3690505472  369001712  369094144  369043328  369050560.0  26112.266
EVAL-CALLS       10       0           0          0          0          0            0.0

The CCL version:

-          SAMPLES  TOTAL      MINIMUM   MAXIMUM   MEDIAN    AVERAGE   DEVIATION
REAL-TIME  10       12.558413  1.240772  1.280743  1.251957  1.255841  0.010159
RUN-TIME   10       12.513009  1.239555  1.27798   1.24801   1.251301  0.009971

7.19. rove

(benchmark:with-timing (10) (run :uax-15-rove-tests :style :none))
-                SAMPLES  TOTAL       MINIMUM    MAXIMUM    MEDIAN     AVERAGE      DEVIATION
REAL-TIME        10       11.879972   1.139999   1.223331   1.189997   1.187997     0.025957
RUN-TIME         10       11.86156    1.138402   1.223247   1.187435   1.186156     0.026327
USER-RUN-TIME    10       11.185844   1.068141   1.144235   1.126029   1.118584     0.021292
SYSTEM-RUN-TIME  10       0.675742    0.03663    0.09343    0.06979    0.067574     0.017528
PAGE-FAULTS      10       0           0          0          0          0            0.0
GC-RUN-TIME      10       2986.486    259.002    332.193    304.989    298.6486     24.707445
BYTES-CONSED     10       7427051216  740502400  762046592  740546112  742705150.0  6447249.0
EVAL-CALLS       10       9           0          9          0          0.9          2.7

Now the CCL version:

(benchmark:with-timing (10) (run :uax-15-rove-tests :style :none))
-          SAMPLES  TOTAL      MINIMUM   MAXIMUM   MEDIAN    AVERAGE   DEVIATION
REAL-TIME  10       35.897484  3.53852   3.666501  3.562608  3.589748  0.044875
RUN-TIME   10       35.826942  3.532155  3.660664  3.558934  3.582694  0.045938

7.20. rt

Rt reports only the tests, not the assertions.

(benchmark:with-timing (10) (do-tests))
-                SAMPLES  TOTAL       MINIMUM    MAXIMUM    MEDIAN     AVERAGE      DEVIATION
REAL-TIME        10       6.216662    0.613332   0.643333   0.616665   0.621666     0.009689
RUN-TIME         10       6.207924    0.610565   0.640804   0.618108   0.620792     0.009539
USER-RUN-TIME    10       6.118069    0.598097   0.630868   0.610568   0.611807     0.00845
SYSTEM-RUN-TIME  10       0.089871    0          0.020013   0.003355   0.008987     0.007601
PAGE-FAULTS      10       0           0          0          0          0            0.0
GC-RUN-TIME      10       219.521     16.74      44.619     18.213     21.9521      7.871968
BYTES-CONSED     10       3586779376  358638608  358705088  358675856  358677950.0  17602.818
EVAL-CALLS       10       480         48         48         48         48           0.0

Now the CCL version:

-          SAMPLES  TOTAL      MINIMUM   MAXIMUM   MEDIAN    AVERAGE   DEVIATION
REAL-TIME  10       11.99317   1.183099  1.219643  1.194585  1.199317  0.012819
RUN-TIME   10       11.909768  1.180592  1.202258  1.190041  1.190977  0.006007

7.21. should-test

Should-test prints out the name of each test with OK.

(benchmark:with-timing (10) (test))
-                SAMPLES  TOTAL       MINIMUM    MAXIMUM    MEDIAN     AVERAGE      DEVIATION
REAL-TIME        10       7.179996    0.709999   0.766666   0.713332   0.718        0.016343
RUN-TIME         10       7.171032    0.709636   0.768002   0.711339   0.717103     0.017043
USER-RUN-TIME    10       7.12796     0.703418   0.751316   0.7081     0.712796     0.013232
SYSTEM-RUN-TIME  10       0.043095    0          0.016688   0.003242   0.00431      0.004734
PAGE-FAULTS      10       0           0          0          0          0            0.0
GC-RUN-TIME      10       167.82      12.366     53.552     12.436     16.782       12.267456
BYTES-CONSED     10       3859488496  385585888  388806016  385627600  385948860.0  952933.5
EVAL-CALLS       10       0           0          0          0          0            0.0

The CCL version

-          SAMPLES  TOTAL      MINIMUM   MAXIMUM   MEDIAN    AVERAGE   DEVIATION
REAL-TIME  10       25.263529  2.514491  2.538153  2.52439   2.526353  0.007306
RUN-TIME   10       25.18316   2.508538  2.532314  2.518607  2.518316  0.006587

7.22. tap-unit-test

No progress reporting.

(benchmark:with-timing (10) (run-tests))
-                SAMPLES  TOTAL       MINIMUM    MAXIMUM    MEDIAN     AVERAGE      DEVIATION
REAL-TIME        10       7.009999    0.686666   0.736665   0.693334   0.701        0.014761
RUN-TIME         10       6.998872    0.688417   0.734644   0.694193   0.699887     0.013857
USER-RUN-TIME    10       6.949086    0.687007   0.718025   0.689829   0.694909     0.010584
SYSTEM-RUN-TIME  10       0.049802    0.000008   0.016621   0.003327   0.00498      0.004523
PAGE-FAULTS      10       0           0          0          0          0            0.0
GC-RUN-TIME      10       152.373     7.746      40.42      11.712     15.2373      9.486478
BYTES-CONSED     10       4217541312  421678080  421820960  421733744  421754140.0  41600.227
EVAL-CALLS       10       160         16         16         16         16           0.0

The CCL version:

-          SAMPLES  TOTAL      MINIMUM   MAXIMUM   MEDIAN    AVERAGE   DEVIATION
REAL-TIME  10       13.258068  1.311242  1.367641  1.322649  1.325807  0.01496
RUN-TIME   10       13.228476  1.309399  1.364858  1.319752  1.322848  0.014998

7.23. try

First the SBCL version

-                SAMPLES  TOTAL       MINIMUM    MAXIMUM    MEDIAN     AVERAGE      DEVIATION
REAL-TIME        10       14.643297   1.453329   1.529996   1.456663   1.46433      0.022113
RUN-TIME         10       14.618829   1.449815   1.530918   1.454007   1.461883     0.023133
USER-RUN-TIME    10       14.588861   1.447368   1.514259   1.452136   1.458886     0.018746
SYSTEM-RUN-TIME  10       0.030001    0          0.016658   0.000004   0.003        0.005039
PAGE-FAULTS      10       0           0          0          0          0            0.0
GC-RUN-TIME      10       208.982     16.999     54.556     17.101     20.8982      11.22053
BYTES-CONSED     10       9019833552  901957296  902012768  901979472  901983360.0  15399.991
EVAL-CALLS       10       0           0          0          0          0            0.0

Now the CCL Version

-          SAMPLES  TOTAL      MINIMUM   MAXIMUM    MEDIAN     AVERAGE    DEVIATION
REAL-TIME  10       178.21579  17.72332  17.91731   17.834017  17.82158   0.06565
RUN-TIME   10       177.88414  17.69016  17.892391  17.789274  17.788414  0.066341

7.24. unit-test

Progress reporting on tests, not assertions

(benchmark:with-timing (10) (run-all-tests))
-                SAMPLES  TOTAL       MINIMUM    MAXIMUM    MEDIAN     AVERAGE      DEVIATION
REAL-TIME        10       6.373335    0.613332   0.710001   0.629999   0.637334     0.026658
RUN-TIME         10       6.361601    0.613024   0.708814   0.628271   0.63616      0.02653
USER-RUN-TIME    10       6.28175     0.610608   0.702157   0.618595   0.628175     0.026044
SYSTEM-RUN-TIME  10       0.079867    0.000004   0.016669   0.00665    0.007987     0.005205
PAGE-FAULTS      10       0           0          0          0          0            0.0
GC-RUN-TIME      10       327.129     18.211     57.563     28.737     32.7129      12.666767
BYTES-CONSED     10       3668546192  363377264  397349504  363448128  366854620.0  10165068.0
EVAL-CALLS       10       0           0          0          0          0            0.0

The CCL version:

-          SAMPLES  TOTAL      MINIMUM   MAXIMUM   MEDIAN    AVERAGE   DEVIATION
REAL-TIME  10       17.734257  1.730896  1.833714  1.737197  1.773426  0.045684
RUN-TIME   10       17.669695  1.724412  1.830537  1.729753  1.766969  0.045973

7.25. xlunit

Xlunit does progress reports only on the tests, not the assertions.

(benchmark:with-timing (10) (xlunit:textui-test-run (xlunit:get-suite uax-15)))
-                SAMPLES  TOTAL       MINIMUM    MAXIMUM    MEDIAN     AVERAGE      DEVIATION
REAL-TIME        10       5.916651    0.586665   0.609998   0.586666   0.591665     0.008465
RUN-TIME         10       5.910279    0.585645   0.610623   0.58654    0.591028     0.008659
USER-RUN-TIME    10       5.880327    0.582374   0.598663   0.585712   0.588033     0.005418
SYSTEM-RUN-TIME  10       0.029973    0          0.013315   0.000023   0.002997     0.004059
PAGE-FAULTS      10       0           0          0          0          0            0.0
GC-RUN-TIME      10       102.22      8.052      17.68      8.219      10.222       3.462927
BYTES-CONSED     10       3367157968  336039856  340887936  336103936  336715800.0  1445466.0
EVAL-CALLS       10       0           0          0          0          0            0.0

The CCL version:

-          SAMPLES  TOTAL      MINIMUM   MAXIMUM   MEDIAN    AVERAGE   DEVIATION
REAL-TIME  10       11.781408  1.159445  1.203044  1.176365  1.178141  0.010789
RUN-TIME   10       11.753234  1.158247  1.200881  1.171458  1.175323  0.01094

7.26. xptest

(benchmark:with-timing (10) (report-result (run-test *uax-15-suite*)))
-                SAMPLES  TOTAL       MINIMUM    MAXIMUM    MEDIAN     AVERAGE      DEVIATION
REAL-TIME        10       5.899985    0.583332   0.629999   0.586665   0.589998     0.013499
RUN-TIME         10       5.89038     0.580846   0.626912   0.584578   0.589038     0.012844
USER-RUN-TIME    10       5.860286    0.577223   0.616916   0.582145   0.586029     0.010817
SYSTEM-RUN-TIME  10       0.030112    0.000034   0.00999    0.003309   0.003011     0.003122
PAGE-FAULTS      10       0           0          0          0          0            0.0
GC-RUN-TIME      10       111.974     7.402      41.731     7.481      11.1974      10.193354
BYTES-CONSED     10       3363654192  336079376  338668864  336105504  336365400.0  768173.25
EVAL-CALLS       10       0           0          0          0          0            0.0

The CCL version:

-          SAMPLES  TOTAL      MINIMUM   MAXIMUM   MEDIAN    AVERAGE   DEVIATION
REAL-TIME  10       11.744589  1.165611  1.195676  1.172062  1.174459  0.008469
RUN-TIME   10       11.728494  1.164599  1.192888  1.170583  1.172849  0.007959

top

8. Mapping Functions Against Each Other

8.1. Assertion Functions

I expect all libraries to have the equivalent of is, signals and maybe finishes. This table just validates that assumption AND whether assertions accept an optional diagnostic string.
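
For reference, here is what those three look like in fiveam, one of the frameworks tabulated below (a minimal sketch):

(fiveam:test the-basics
  (fiveam:is (= 4 (+ 2 2)))                  ; is: the form evaluates to non-nil
  (fiveam:signals division-by-zero (/ 1 0))  ; signals: the body signals the named condition
  (fiveam:finishes (+ 1 1)))                 ; finishes: the body runs to completion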

Table 23: Assertion Functions-1a
Library Optional string Is (a) signals finishes (b)
1am N is signals  
2am Y (P) is signals finishes
assert-p (1) N t-p condition-error-p  
clunit Y assert-true assert-condition  
clunit2 Y assert-true assert-condition  
confidence N assert-true    
gigamonkeys N check expect  
fiasco Y (P) is signals finishes
fiveam Y (P) is signals finishes
lift Y ensure ensure-condition  
lisp-unit Y assert-true assert-error  
lisp-unit2 Y assert-true assert-error  
nst N :true :err  
parachute Y (P) is fail finish
prove Y is, ok is-error  
ptester (2)   test test-error  
rove Y ok signals  
rt (2)        
should-test Y be signals  
simplet (2)        
tap-unit-test Y assert-true assert-error  
try Y is signals, signals-not verdict
unit-test Y test-assert test-condition  
xlunit Y assert-true assert-condition  
xptest (2)        

(a) "is" asserts that the form evaluates to not nil (b) "finishes" asserts that the body of the test does not signal aany condition (P) The diagnostic string accepts variables (1) includes cacau for this purpose (2) None - normal CL predicates resolving to T or nil

One potential advantage of additional assertion functions is that they provide built-in, more informative error messages. The second advantage is that you do not have to write your own when the checks are more complicated than normal CL predicates. The next few tables show the additional assertion functions and which frameworks have them.
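
As a small illustration, sketched with lisp-unit (which appears in the tables below), the dedicated assertion states the intent directly instead of wrapping a predicate in a negation:

(lisp-unit:define-test falsity
  (lisp-unit:assert-true (not (evenp 3))) ; works, but buries the intent
  (lisp-unit:assert-false (evenp 3)))     ; same check, clearer on failure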

Table 24: Additional Assertion Functions-1b
Library False Zero Not Zero Nil Not-Nil Null* Not Null*
assert-p not-t-p zero-p not-zero-p nil-p not-nil-p null-p not-null-p
cacau not-t-p zero-p not-zero-p nil-p not-nil-p null-p not-null-p
clunit assert-false            
clunit2 assert-false            
confidence       assert-nil      
fiveam is-false            
lift ensure-null            
lisp-unit assert-false     assert-nil      
lisp-unit2 assert-false            
nst   assert-zero     assert-non-nil assert-null  
parachute false     false      
prove isnt            
rove ng            
tap-unit-test assert-false         assert-null assert-not-null
xlunit assert-false            

Note to self: Per http://clhs.lisp.se/Body/f_null.htm, null is an empty list or nil, not null in an SQL sense.

Table 25: Assertion Functions-2a Equality
Library Eq Eql Equal Equalp Equality
           
assert-p eq-p eql-p equal-p equalp-p  
cacau eq-p eql-p equal-p equalp-p  
clunit assert-eq assert-eql assert-equal assert-equalp assert-equality
clunit2 assert-eq assert-eql assert-equal assert-equalp assert-equality
confidence assert-eq assert-eql assert-equal    
lisp-unit assert-eq assert-eql assert-equal assert-equalp assert-equality
lisp-unit2 assert-eq assert-eql assert-equal assert-equalp assert-equality
nst assert-eq assert-eql assert-equal assert-equalp assert-equality
parachute     is    
tap-unit-test assert-eq assert-eql assert-equal assert-equalp assert-equality
unit-test     test-equal    
xlunit   assert-eql assert-equal    
Table 26: Assertion Functions-2b Not-Equality
Library Eq Eql Equal Equalp
         
assert-p not-eq-p not-eql-p not-equal-p not-equalp-p
cacau not-eq-p not-eql-p not-equal-p not-equalp-p
nst assert-not-eq assert-not-eql assert-not-equal assert-not-equalp
parachute     isnt  
xlunit   assert-not-eql    

Table 27: Assertion Functions-2c Bounded Equality
Library Available assertions
clunit assert-equality*
clunit2 assert-equality*
confidence assert-float-is-approximately-equal, assert-float-is-definitely-greater-than, assert-float-is-definitely-less-than, assert-float-is-essentially-equal
lisp-unit assert-norm-equal, assert-float-equal, assert-number-equal, assert-numerical-equal, assert-rational-equal, assert-sigfig-equal,
lisp-unit2 assert-norm-equal, assert-float-equal, assert-number-equal, assert-numerical-equal, assert-rational-equal, assert-sigfig-equal,
try float-~=, float-~<, float-~>
Table 28: Assertion Functions-2d other Equality
Library Available assertions
   
confidence assert-set-equal, assert-vector-equal
lisp-unit logically-equal, set-equal
lisp-unit2 logically-equal, assert=, assert/=, assert-char=, assert-char-equal, assert-char/=, assert-char-not-equal, assert-string=, assert-string-equal, assert-string/=, assert-string-not-equal
tap-unit-test logically-equal, set-equal, unordered-equal
   
Table 30: Assertion Functions-3a Types
Library Type Not Type Values Not-Values
assert-p typep-p not-typep-p values-p not-values-p
cacau typep-p not-typep-p values-p not-values-p
confidence assert-type      
lift        
lisp-unit        
lisp-unit2 assert-typep      
nst        
parachute of-type   is-values isnt-values
protest        
prove is-type   is-values  
Table 31: Assertion Functions-3b Specific Value Types Cont
Library Symbol List Tuple Char String
lift ensure-symbol ensure-list     ensure-string
Table 32: Assertion Functions-3c Strings
Library Functions
confidence assert-string-equal, assert-string<, assert-string<=, assert-string=, assert-string>, assert-string>=
Table 33: Assertion Functions-4 Membership
Library Every Different Member Contains
cl-quickcheck     a-member  
fiveam is-every      
confidence       assert-subsetp
lift ensure-every ensure-different ensure-member  
lisp-unit set-equal      
lisp-unit2 set-equal      
tap-unit-test set-equal      
try match-values mismatch% different-elements  
         

Table 34: Assertion Functions-4 (Prints, Macro Expansion and Custom)
Library Prints Expands (1) Custom
cacau     custom-p
clunit   assert-expands  
clunit2   assert-expands  
confidence     Yes
lisp-unit assert-prints assert-expands  
lisp-unit2 assert-prints assert-expands  
nst     Yes
prove   is-expand  
rove is-print expands  
should-test print-to    
tap-unit-test assert-prints assert-expands  
  1. Tests macro expansion, passes if (EQUALP EXPANSION (MACROEXPAND-1 EXPRESSION)) is true
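
For example, a minimal sketch using lisp-unit's assert-expands (my-inc is a made-up macro):

(defmacro my-inc (place)
  `(setf ,place (1+ ,place)))

(lisp-unit:define-test my-inc-expansion
  ;; passes because (macroexpand-1 '(my-inc x)) => (SETF X (1+ X))
  (lisp-unit:assert-expands (setf x (1+ x)) (my-inc x)))
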
Table 35: Assertion Functions-5 Specific Errors, Signals and Conditions
Library Error/Conditions (1) Not (2)
1am signals  
2am signals  
assert-p condition-error-p not-error-p, not-condition-p
cacau condition-error-p  
clunit assert-condition  
clunit2 assert-condition  
gigamonkeys expect  
fiasco signals not-signals
fiveam signals  
lift ensure-condition, ensure-error  
lisp-unit assert-error  
lisp-unit2 assert-error  
nst :err  
parachute fail  
prove    
ptester test-error  
rove signals  
rt    
should-test signal  
simplet    
tap-unit-test assert-error  
try signals signals-not
unit-test test-condition  
xlunit assert-condition  
xptest    
  1. Asserts that the body signals a condition of the specified type
  2. Asserts that the body does not signal a condition of the specified type; it might still signal some other condition

top

Table 36: Misc. Assertions
Name Assertions
cl-quickcheck is=, isnt=
confidence assert=, assert-p:, assert-t
lift ensure-cases, ensure-cases-failure, ensure-expected-condition, ensure-expected-no-warning-condition, ensure-failed, ensure-failed-error, ensure-no-warning, ensure-not-same, ensure-null-failed-error, ensure-random-cases, ensure-random-cases+, ensure-random-cases-failure, ensure-same, ensure-some, ensure-warning, ensure-directories-exist, ensure-directory, ensure-error, ensure-function, ensure-generic-function
lisp-unit assert-result, assert-test, check-type
lisp-unit2 assert-fail, assert-no-error, assert-no-signal, assert-no-warning, assert-passes?, assert-signal, assert-warning, check-type
nst assert-criterion
prove ok, is-values, is-type, like, is-print, is-error,
try invokes-debugger, invokes-debugger-not, in-time
unit-test test-assert

(a) Every test succeeds iff the form produces the same number of results as the values and each result is equal to the corresponding value

top

8.2. Defining or Adding Tests

Table 37: Defining or Adding Test Functions
Name Add Tests
1am (test test-name body)
2am (test test-name body)
cacau (deftest "test-name" (any-parameters go here) body)
cardiogram (deftest name (<options>*) <docstring>* <form>*)
clunit (deftest test-name (suite-name-if-any) docstring body)
clunit2 (deftest test-name (suite-name-if-any) docstring body)
gigamonkeys (deftest test-name (any-parameters) body)
fiasco (deftest test-name (any-parameters) docstring body)
fiveam (test test-name docstring body)
confidence (define-testcase test-name (any-parameters) docstring body)
lift (addtest (test-suite-name) test-name body)
lisp-unit (define-test test-name body)
lisp-unit2 (define-test test-name (tags) body)
nst (def-test (t1 :group name :fixtures (fixture-names)) body)
parachute (define-test test-name [:parent parent-name] [(:fixture if any)] body)
prove (deftest name body)
ptester (test value form) (test-error) (test-no-error) (test-warning) (test-no-warning)
rove (deftest test-name body)
rt (deftest test-name function value)
should-test (deftest name body)
simplet (test string body)
tap-unit-test (define-test test-name docstring body)
try (deftest test-name parameters body)
unit-test (deftest :unit unit-name :name test-name body)
xlunit (def-test-method method-name ((class-name) run-on-compilation) body)
xptest (defmethod method-name ((suite-name fixture-name)) body)

top

8.3. Running Tests

Table 38: Running Test Functions
Name Running Tests
1am (a) (test-name) (run) ; (run) runs all tests
2am (run) (run '(list of tests)) (name-of-test)
clunit (run-test 'test-name) (run-suite 'suite-name)
clunit2 (run-test 'test-name) (run-suite 'suite-name)
confidence (test-name)
gigamonkeys (test test-name)
fiasco (run-tests 'test-name) (run-package-tests :package package-name)
fiveam (run 'test-name) (run! 'test-name) (run! 'suite-name)
lift (run-tests :name 'test-name) (run-tests :suite 'suite-name)
lisp-unit(b) (run-tests :all) (run-tests '(name1 name2 ..)) (continue-testing)(c)
lisp-unit2 (run-tests :tests 'test-name) (run-tests :tags '(tag-names)) (run-tests) (run-tests :package 'package-name)
nst (nst-cmd :run test-name)
parachute (d) (test test-name &optional :report report-type)
prove (run-test 'test-name)
ptester at compilation of (with-tests (:name "test-name") )
rove (run-test 'test-name) (run-suite)
rt (b) (do-test test-name) (do-tests); (do-tests) runs all tests
should-test (b) (test) (test :test test-name)
simplet (run)
tap-unit-test (run-tests test-name1 test-name2) (run-tests)
try (test-name), (try 'test-name), (funcall 'test-name)
unit-test (run-test test-name)(run-all-tests)
xlunit (xlunit:textui-test-run (xlunit:get-suite suite-name))
xptest (run-test test-name)(run-test suite-name)

(a) Shuffles tests (b) runs tests in the order they were defined (c) continue-testing runs tests that have been defined, but not yet run (d) can be a quoted list of test names

A universal interactive "run test at point" for the Emacs environment.

If the framework allows you to programmatically run an individual test, then it is possible to run the test at point by adding a little Emacs snippet. The example below is specific to Lisp-Unit2 (snippet source).

(defun ambrevar/sly-run-lisp-unit-test-at-point (&optional raw-prefix-arg)
  "See `sly-compile-defun' for RAW-PREFIX-ARG."
  (interactive "P")
  (call-interactively 'sly-compile-defun)
  (let ((name `(quote ,(intern (sly-qualify-cl-symbol-name (sly-parse-toplevel-form))))))
    (sly-eval-async
        `(cl:string-trim "

?"
                         (cl:with-output-to-string (s)
                                                   (cl:let ((lisp-unit2:*test-stream* s))
                                                           (lisp-unit2:run-tests :tests ,name :run-contexts 'lisp-unit2:with-summary-context))))
      (lambda (results)
        (switch-to-buffer-other-window  (get-buffer-create "*Test Results*"))
        (erase-buffer)
        (insert results)))))

(define-key lisp-mode-map (kbd "C-c C-v") 'ambrevar/sly-run-lisp-unit-test-at-point)

8.4. Fixture Functions

Table 39: Fixtures
Name Fixture Functions
cacau (defbefore-all), (defafter-all), (defbefore-each), (defafter-each)
clunit (defclass) and (deffixture)
clunit2 (defclass) and (deffixture)
fiveam (def-fixture name-of-fixture ())
lift set at suite definition level, with :setup, :teardown, :run-setup
lisp-unit2 :contexts are specified in test definitions
nst (def-fixtures name () body)
parachute (def-fixture name () body)
rove (setup)(teardown) are suite fixture functions. (defhook) is a test fixture function
unit-test subclass a test-class with define-test-class
xlunit (defmethod setup () body)
xptest (deftest-fixture fixture-name ()), (defmethod setup ()), (defmethod teardown ())
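
As a concrete example, a fiveam fixture with build-up and tear-down might look like this minimal sketch, where *resources* is a stand-in for real state:

(defvar *resources* '())

(fiveam:def-fixture with-resource ()
  (push :db *resources*)    ; build up
  (unwind-protect
      (&body)               ; the test body is spliced in here
    (pop *resources*)))     ; tear down, even if the test fails

(fiveam:test resource-test
  (fiveam:with-fixture with-resource ()
    (fiveam:is (member :db *resources*))))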

top

8.5. Removing Tests etc

Table 40: Removing Tests
Name Removing tests etc
clunit (undeftest) (undeffixture) (undefsuite)
clunit2 (undeftest) (undeffixture) (undefsuite)
gigamonkeys (remove-test-function) (clear-package-tests)
fiasco (fiasco::delete-test)
fiveam (rem-test) (rem-fixture)
lift (remove-test :suite x)(remove-test :test-case x)
lisp-unit (remove-tests) (remove-tags)
lisp-unit2 (uninstall-test) (undefine-test)
parachute (remove-test, remove-all-tests-in-package)
prove (remove-test) (remove-test-all)
rt (rem-test) (rem-all-tests)
tap-unit-test (remove-tests)
try FMAKUNBOUND, UNINTERN
xlunit (remove-test)
xptest (remove-test)
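
Since Try tests are ordinary functions, removing one is just standard Common Lisp (a sketch; my-test is hypothetical):

(fmakunbound 'my-test) ; remove the test's function definition
(unintern 'my-test)    ; optionally drop the symbol as well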

top

8.6. Suites

Table 41: Suite Functions
Name Suites
2am (suite name (optional sub-suites))
clunit (defsuite name (parent)) (undefsuite name)
clunit2 (defsuite name (parent)) (undefsuite name)
fiasco (define-test-package package-name) (defsuite suite-name)
fiveam (def-suite :name-of-suite)
lift (deftestsuite name-of-suite (super-test-suite) (slots))
lisp-unit packages and tags (tags are specified in the test definition)
lisp-unit2 :tags are specified in test definitions
nst (def-test-group)
parachute (define-test suite)
prove (subtest …)
ptester just the use of (with-tests …)
rt the package is the only unit above tests
should-test the package is the only unit above tests
simplet (suite string body)
tap-unit-test the package is the only unit above tests
try just call other tests (they are normal functions)
unit-test [effectively tags in the deftest macro before the test-name]
xlunit (defclass test-case-name (test-case)(body))
xptest (make-test-suite suite-name docstring body)

top

8.7. Generators

  1. From Frameworks
    Table 42: Random Data Generators from Frameworks
    Name Suites
    fiveam buffer, character, float, integer, list, one-element, string, tree
    lift random-number, random-element
    lisp-unit complex-random, make-random-2d-array, make-random-2d-list, make-random-list, make-random-state
    lisp-unit2 complex-random, make-random-2d-array, make-random-2d-list, make-random-list, make-random-state
    nst  
    tap-unit-test make-random-state
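
    As a concrete example, fiveam's generators (listed above) plug into its for-all macro (a minimal sketch):

    (fiveam:test generated-integers
      ;; for-all runs the body repeatedly with freshly generated values
      (fiveam:for-all ((n (fiveam:gen-integer :min 0 :max 100)))
        (fiveam:is (<= 0 n 100))))
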
  2. From Helper Libraries

    Table 43: Random Data Generators from Check-it and Cl-Quickcheck
    Check-it Cl-quickcheck Comments
    character a-char  
    list a-list  
      a-member Produces a value from another generator
    string a-string  
      a-symbol  
    tuple a-tuple  
    boolean a-boolean  
    real a-real  
      an-index  
    integer an-integer  
    or   produces a value from another generator
    guard   ensures generator result within some spec
    struct   generates a struct with given type and slot values
    map   applies a transformation to output of a sub generator
    chain   chaining generators, e.g. to produce matrices
    user-define define create custom generators
      k-generator by default an index
      m-generator by default an integer
      n-generator by default an integer

    top

9. Generic Usage Example to be Followed for Each Framework Library

The following is pseudocode just trying to show the basic usage that will be demonstrated with each library.

9.1. Basics

Start with the real basics just to see how the framework looks, do tests accept parameters, suite designations, documentation strings, etc.

The first passing test should have a basic "is" test and a signals test. If the library has macro expansion tests or floating point and rational tests, those get added to flag that they exist. Then a basic failing test. Then run each test and show what the reports look like.

(deftest t1
  "describe t1" ; obviously only if the library allows a documentation string.
  (is (=  1 1))
  (signals division-by-zero (error 'division-by-zero)))

(deftest t1-fail ; the most basic failing test
  "describe t1-fail"
  (is (=  1 2)))

Check and see if you have to manually recompile a test when a function being tested is modified. This is not a problem with most frameworks.

(defun t1-test-function ()
  1)
;; What happens when you are testing a function and you change that function?
;; Do you need to recompile the test?

(deftest t1-function ;
    (is (= (t1-test-function) 1)))

;; Now redefine t1-test-function
(defun t1-test-function ()
  2)

;; re-run test t1-function. What happens?

9.2. Multiple Values, Variables, Loops and Closures

  • Make sure the library can have tests with multiple assertions (RT cannot).
  • Does it handle values expressions? Most do but only look at the first value; fiveam does not; the lisp-units actually compare each value in the values expressions.
  • What happens with multiple assertions where more than one fail? Lift will only report the first failing assertion if there are multiple failing assertions.
  • Ensure that tests can handle loops
  • Can tests handle being inside a closure?
  • Can tests call other tests? Most frameworks allow this, but you tend to get multiple reports rather than consolidated reports. Some frameworks do not allow this.
(deftest t2
  "describe t2"
  (is (= 1 1))
  (is (= 2 2))
  (is (= (values 1 2) (values 1 3))))

(let ((l1 '(#\a #\B #\z))
      (l2 '(97 66 122)))
  (deftest t2-loop
      (loop for x in l1 for y in l2 do
        (is (= (char-code x) y)))))

(deftest t3 ; a test that tries to call another test in its body
  "describe t3"
  (is (eql 'a 'a))
  (test t2))

9.3. Errors, Conditions and signal handling

Check and see if there are any surprises with respect to condition signalling tests. Some frameworks will treat an unexpected condition in a signalling test as a failure, others will treat it as an error.

(deftest t7-bad-error
  (signals division-by-zero
    (error 'floating-point-overflow)
    "testing condition assertions. This should fail"))

9.4. Suites, tags and other multiple test abilities

  1. Can you run a list of tests?

    We checked with test t3 to see if tests can call other tests. Can you just call a list of test names? Some frameworks allow, others do not.

    (run '(t1 t2))
    
  2. Suites/tags

    This section for each framework will check if you can create inherited test suites or tags

    (defsuite s0 ()); Ultimate parent suite if the library provides inheritance
    
      (deftest t4 (s0) ; a test that is a member of a suite
        "describe t4"
        (assert-eq 1 1))
    
      ;;a multiple assertion test that is a member of a suite with
      ;; a passing test, an error signaled and a failing test
      (deftest t4-error (s0)
        "describe t4-error"
        (assert-eq  'a 'a)
        (assert-condition error (error "t4-errored out"))
        (assert-true (= 1 2)))
    
      (deftest t4-fail (s0) ;
        "describe t4-fail"
        (assert-false (= 1 2)))
    
    (defsuite s1 (s0)); a sub-suite of suite s0 to check on inheritance
      (deftest t5 (s1)
       (assert-true (= 1 1)))
    
    
  3. Fixtures and Freezing Data

    Fixtures are used to create a known (or randomly generated) set of data that tests will use. At the end of the test, the fixtures are removed so that the next test can start in a clean environment.

    Freezing data may be considered a subset of fixtures. Freezing data is used where a test will use other existing data such as special variables, but may change it for testing purposes. You obviously want to return that special variable to its pre-existing state at the end of the test.

    First checking whether we can freeze data, change it in the test, then change it back

      (defparameter *keep-this-data* 1)
    
      (deftest t-freeze-1
          :fix (*keep-this-data*)
          (setf *keep-this-data* "new")
          (true (stringp *keep-this-data*)))
    
      (deftest t-freeze-2
        (is (= *keep-this-data* 1)))
    
    (run '(t-freeze-1 t-freeze-2))
    

    Now the classic fixture - create a data set for the test and clean it up afterwards

      ;; Create a class for data fixture purposes
    (defclass fixture-data ()
      ((a :initarg :a :initform 0 :accessor a)
       (b :initarg :b :initform 0 :accessor b)))
    
    ;; IMPORTANT: Some frameworks will require a name for the fixture. Others,
    ;; like clunit, apply a fixture to a suite (as in this pseudocode) and
    ;; require a suite name.
    (deffixture s1 (@body)
      (let ((x (make-instance 'fixture-data :a 100 :b -100)))
        @body))
    
    ;; create a sub suite and check fixture inheritance
    (defsuite s2 (s1))
    
    (deftest t6-s1 (s1)
      (assert-equal (a x) 100)
      (assert-equal (b x) -100))
    
    (deftest t6-s2 (s2)
      (assert-equal (a x) 100)
      (assert-equal (b x) -100))
    
  4. Removing tests

    How do you actually remove a test from the system

  5. Skip Capability
    1. Assertions

      Can you skip an assertion?

    2. Tests

      Can you skip an entire test?

    3. Implementation

      Can you skip something if the CL implementation is XYZ?

9.5. Random Data Generators

Can you generate different types of random data to feed to the testing framework? What does the framework have to help?

10. 1am

top

10.1. Summary

homepage James Lawrence MIT 2014

1am will throw you into the debugger on failures or errors. There is no way to disable this and there is no reporting - in you go. There is no provision for diagnostic strings in assertions, but since it throws you into the debugger, that is probably not relevant. Tests are shuffled on each run.

On the plus side for some people, tests are functions.

On the minus side for people like me, you cannot turn off progress reports. You can create a list of tests, but there is no concept of suites or tags.

10.2. Assertion Functions

1am's assertion functions are limited to is and signals.

10.3. Usage

  • (run) will run all the tests in *tests*
  • (run '(foo)) will run the named tests in the provided parameter list.
  • (name-of-test) will run the named test because tests are functions in their own right.
(run)
FOO; Evaluation aborted on #<SIMPLE-ERROR "~@<The assertion ~S failed~:[.~:; ~
                                         with ~:*~{~S = ~S~^, ~}.~]~:@>" {100BE70F03}>.
(run '(foo))

FOO; Evaluation aborted on #<SIMPLE-ERROR "~@<The assertion ~S failed~:[.~:; ~
                                         with ~:*~{~S = ~S~^, ~}.~]~:@>" {100B996103}>.
  1. Basics

    Starting with a basic test where we know everything will pass. These are all the assertion functions that 1am has. There is no provision for documentation strings or test descriptions.

      (test t1
        (is (equal 1 1))
        (signals division-by-zero
          (/ 1 0)))
    
      (run '(t1)) ; or just (t1)
    T1..
    Success: 1 test, 2 checks.
    ; No value
    
    

    Now with a deliberately failing test. Notice how it just immediately kicks into the debugger:

    (test t1-fail ; the most basic failing test
      (let ((x 1) (y 2))
        (is (= x y))
        (signals division-by-zero (error 'floating-point-overflow))))
    
      (t1-fail)
    The assertion (= X Y) failed with X = 1, Y = 2.
       [Condition of type SIMPLE-ERROR]
    
    Restarts:
     0: [CONTINUE] Retry assertion.
     1: [RETRY] Retry SLIME REPL evaluation request.
     2: [*ABORT] Return to SLIME's top level.
     3: [ABORT] abort thread (#<THREAD "new-repl-thread" RUNNING {10035CE213}>)
    

    As you would hope, you do not have to manually recompile a test after a tested function has been modified.

  2. Conditions

    1am works as expected if you signal the expected error. If you signal an unexpected error, it throws you into the debugger just like every other time a test fails.

    (test t7-bad-error
      (signals division-by-zero (error 'floating-point-overflow)))
    
    (run '(t7-bad-error))
    T7-BAD-ERROR; Evaluation aborted on #<SIMPLE-ERROR "Expected to signal ~s, but got ~s:~%~a" {102EA4F423}>.
    
  3. Edge Cases: Values expressions, loops, closures and calling other tests

    1am has no special functionality for dealing with values expressions. It accepts them but merely looks at the initial value provided by each expression. So, for example, the following passes.

    (test t2-values-expressions
      (is (equal (values 1 2)
                 (values 1 3))))
    
    1. Now looping and closures.

      1am will handle looping through assertions using variables declared in a closure surrounding the test.

      (let ((l1 '(#\a #\B #\z))
            (l2 '(97 66 122)))
        (test t2-loop
          (loop for x in l1 for y in l2 do
            (is (= (char-code x) y)))))
      
    2. Calling a test inside another test

      It works but do not expect composable reports.

        (test t3
          (is (= 1 1))
          (t1))
      
       (t3)
      T3.
      T1.
      Success: 1 test, 1 check.
      Success: 1 test, 1 check.
      
  4. Suites, tags and other multiple test abilities
    1. Lists of tests

      Tests are defined using (test test-name) and pushed onto a list of tests named *tests*. If you want to run only a subset, you could save the full list elsewhere and then set *tests* to whatever list of tests you want, as in the sketch below. That sounds a bit cumbersome.
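
      A sketch of that workaround (tests t1 and t2 assumed already defined):

        (defparameter *all-tests* 1am:*tests*) ; keep the master list around
        (let ((1am:*tests* '(t1 t2)))          ; temporarily narrow the set
          (1am:run))                           ; runs only t1 and t2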

    2. Suites

      1am has no suite capability

  5. Fixtures and Freezing Data

    None

  6. Removing tests

    None

  7. Sequencing, Random and Failure Only

    N/A

  8. Skip Capability

    None

  9. Random Data Generators

    None

10.4. Discussion

The comments that I have seen around 1am seem to revolve around having only a single global variable to collect all compiled tests. Various people have suggested different solutions to essentially build test suites capability:

jorams suggested:

(defmacro define-test-framework (tests-variable
                                 test-macro
                                 run-function)
  "Define a variable to hold a list of tests, a macro to define tests and a
function to run the tests."
  `(progn
     (defvar ,tests-variable ())
     (defmacro ,test-macro (name &body body)
       `(let ((1am:*tests* ()))
          (1am:test ,name ,@body)
          (dolist (test 1am:*tests*)
            (pushnew test ,',tests-variable))))
     (defun ,run-function ()
       (1am:run ,tests-variable))))

luismbo suggested: "a simpler way might be to have 1am:test associate tests with the current *package* (e.g., by turning 1am:*tests* into an hash-table mapping package names to lists of tests) and add the ability for 1am:run to filter by package and perhaps default to the current *package*."

phoe suggested the following very simple 1AM wrapper to achieve multiple test suites.

(defvar *my-tests* '())

(defun run ()
  (1am:run *my-tests*))

(defmacro define-test (name &body body)
  `(let ((1am:*tests* '()))
     (1am:test ,name ,@body)
     (pushnew ',name *my-tests*)))
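
Usage of phoe's wrapper would then look like this (a sketch):

(define-test my-test
  (1am:is (= 1 1)))

(run) ; runs every test registered in *my-tests*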

10.5. Who Uses 1am?

("adopt/test" "authenticated-encryption-test" "beast-test" "binary-io/test" "bobbin/test" "chancery.test" "cl-digraph.test" "cl-netpbm/test" "cl-pcg.test" "cl-rdkafka/test" "cl-scsu-test" "cl-skkserv/tests" "jp-numeral-test" "list-named-class/test" "openid-key-test" "petri" "petri/test" "polisher.test" "protest/1am" "protest/test" "with-c-syntax-test" "xml-emitter/tests")

top

11. 2am

11.1. Summary

homepage Daniel Kochmański MIT 2016

2am is based on 1am with some features wanted for CI and hierarchical tests. As with 1am, 2am shuffles the order of the tests on each run, and this is not configurable. There is also no provision for only running the tests that failed last time, and no way to turn off the progress report.

11.2. Assertion Functions

is signals finishes

top

11.3. Usage

Unlike 1am, which always throws you into the debugger, 2am will only throw you into the debugger if the test crashes, not if it fails.

Note that 2am will distinguish between tests that fail and tests that crash.

  • (run) will run the tests in the default suite.
  • (run 'some-suite-name) will run the tests in the named suite
  • (run '(foo)) will run the named tests in the provided parameter list.

Since tests are functions in 2am, there is no need for a (run 'test-name) function.

  1. Report Format

    First, a basic failing test to show the reporting. Notice that in the third assertion we pass a diagnostic string after the two tested items, followed by the two variables being compared; this can help diagnose failures. Then we run it to show the default failure report.

    (test t1-fail ; the most basic failing test
      (let ((x 1) (y 2))
        (is (= 1 2))
        (is (equal 1 2))
        (is (= x y) "This test was meant to fail ~a is not = ~a" x y)
        (signals floating-point-overflow
          (error 'division-by-zero))))
    

    Now to run it:

      (t1-fail)
    Running test T1-FAIL ffff
    Test T1-FAIL: 4 checks.
       Pass: 0 ( 0%)
       Fail: 4 (100%)
    
    Failure details:
    --------------------------------
     T1-FAIL:
       FAIL: (= 1 2)
       FAIL: (EQUAL 1 2)
       FAIL: This test was meant to fail 1 is not =  2
       FAIL: Expected to signal FLOATING-POINT-OVERFLOW, but got DIVISION-BY-ZERO:
    arithmetic error DIVISION-BY-ZERO signalled
    --------------------------------
    

    The macro (test name &body body) defines a test function and adds it to *tests*. The following just shows what the report looks like when everything passes. These are all the assertion functions that 2am has.

    (test t1 ; the most basic test.
      (is (=  1 1))
      (signals division-by-zero
        (/ 1 0))
      (finishes (= 1 1)))
    
    (t1)
    Running test T1 ...
    Test T1: 3 checks.
    Pass: 3 (100%)
    Fail: 0 (0%)
    

    As you would hope, you do not have to manually recompile a test after a tested function has been modified.

  2. Edge Cases: Values expressions, loops, closures and calling other tests
    1. Value expressions

      2am has no special functionality for dealing with values expressions. It accepts them but merely looks at the initial value provided by each expression.

    2. Looping and closures.

      Will a test accept looping through assertions using variables from a closure? Yes.

        (let ((l1 '(#\a #\B #\z))
              (l2 '(97 66 122)))
          (test t2-loop
            (loop for x in l1 for y in l2 do
              (is (= (char-code x) y)))))
      
      (t2-loop)
      Running test T2-LOOP ...
      Test T2-LOOP: 3 checks.
         Pass: 3 (100%)
         Fail: 0 ( 0%)
      
      (test t2-with-multiple-values
        (is (= 1 1 2)))  ; This should fail and it does
      
    3. Calling another test from a test

      This succeeds as expected but, also as expected, there is no composition in the report.

      (test t3 ; a test that tries to call another test in its body
        (is (eql 'a 'a))
        (t2))
      
      (t3)
      Running test T3 .
      Running test T2 ...
      Test T2: 3 checks.
         Pass: 3 (100%)
         Fail: 0 ( 0%)
      Test T3: 1 check.
         Pass: 1 (100%)
         Fail: 0 ( 0%)
      
  3. Suites, tags and other multiple test abilities
    1. Lists of tests

      2am can run lists of tests

      (run '(t1 t2))
      Running test T2 ...
      Running test T1 .
      Did 2 tests (0 crashed), 4 checks.
         Pass: 4 (100%)
         Fail: 0 ( 0%)
      

      top

    2. Suites

      2am has a hash table named *suites*. Any tests not associated with a specific suite are assigned to the default suite. If we called the function (run) with no parameters, it would run all the tests in the default suite, which in our case would mean all the tests described above:

      (run)
      

      Suites are defined using (suite 'some-suite-name-here &optional list-of-sub-suites). Tests are identified with suites by prepending the suite name to the test name.

        (suite 's0) ; This suite has no sub-suites
      
        (test s0.t4  ; a test that is a member of a suite
          (is (= 1 1)))
      
         (test s0.t4-error
          (is (eql 'a 'a))
          (signals error (error "t4-errored out"))
          (is (= 1 2)))
      
        (test s0.t4-fail
          (is (not (= 1 2))))
      
      (suite 's1 '(s0)); This suite includes suite s0 as a sub-suite.
      
      (test s1.t4-s1
         (is (= 1 1)))
      

      Calling run on 's0 will run tests s0.t4, s0.t4-error and s0.t4-fail.

      (run 's0)
      --- Running test suite S0
      Running test S0.T4 .
      Running test S0.T4-FAIL .
      Running test S0.T4-ERROR ..f
      Did 3 tests (0 crashed), 5 checks.
         Pass: 4 (80%)
         Fail: 1 (20%)
      
      Failure details:
      --------------------------------
       S0.T4-ERROR:
         FAIL: (= 1 2)
      --------------------------------
      

      Calling run on 's1 will run test s1.t4-s1 and all the tests in suite s0.

  4. Fixtures and Freezing Data

    No built-in capability.

  5. Removing tests

    Nothing explicit

  6. Sequencing, Random and Failure Only

    2am runs tests randomly - the order is shuffled on each run and this is not configurable. There is no provision for only running the tests that failed last time.

  7. Skip Capability

    None

  8. Random Data Generators

    None

11.4. Discussion

The documentation indicates that assertions may be run inside threads. I did not validate this.

top

12. cacau

12.1. Summary

homepage Noloop GPL3 2020

Cacau is interesting in that it uses an external library for assertions and is just a "test runner". The examples shown with Cacau will all assume that the assert-p library by the same author is also loaded.

On the plus side, it has extensive hooks which can perform actions before and after a suite is run or before and after each test is run. It also has explicit async capabilities (not tested for this report) which do not exist in other frameworks.

At the same time, it tends to be all or nothing in what runs. The (run) function either runs the last defined test (if you have not defined suites) or, if you have defined suites, it runs all tests in all the suites. Or, if you then define a new test, just that new test. Maybe it is just me, but I would get lost in what (run) is supposed to be checking.

Most frameworks count the individual assertions in a test; Cacau treats the test as a whole - if one assertion fails, the entire test fails, and if multiple assertions fail, it will only report the first failure, not all the failures in the test, leaving you with incomplete information.

Cacau is the only framework where, if you change a function that is being tested, you need to manually recompile the tests again.

Not recommended.

12.2. Assertion Functions

Cacau uses the assertion functions from an external assertion library; currently you need to use assert-p. Those are:

t-p not-t-p zero-p not-zero-p
nil-p not-nil-p null-p not-null-p
eq-p not-eq-p eql-p not-eql-p
equal-p not-equal-p equalp-p not-equalp-p
typep-p not-typep-p values-p not-values-p
error-p not-error-p    
condition-error-p not-condition-error-p custom-p  

I have to say, for an assertion library, I expected to see some numerical tests as well.

12.3. Usage

You will notice that the test names must be strings, whereas the test names in the other frameworks are either quoted or unquoted symbols.

Cacau only has a run function which accepts fixture, reporter and debugger parameters, but has no provision for telling it which tests you want to run. If you manually compile a single test, it usually runs just that test. Otherwise it runs all the tests in the package whether you want them or not. For purposes of walking through the basic capability, we will look only at the result of the specific test under consideration and not any other test which might be picked up in the report.

I do not see a way to rerun a test except by manually recompiling the test.

  1. Report Format

    Interactive mode can be enabled by passing the keyword parameter :cl-debugger to run: (run :cl-debugger t). Cacau has a few different reporting formats. The function (run) without a reporter specification provides the default :min level of information. There are also :list and :full, which provide different levels of information. You will notice that I am actually creating multiple copies of the failing test to simulate recompiling the test.

    Cacau treats tests with multiple assertions as a unit. Either everything passes or the test fails, and it may be difficult to figure out which assertion was the one that failed. This is clearly shown below, where both the first and second assertions should fail, but only the first failure (the eql) gets reported.

    Cacau does not allow us to pass messages in the assertion which might have allowed us to flag potential issues that would aid in debugging failures.

    1. Min
        (deftest "t1-fail-1" ()
          (let ((x 1) (y 2))
            (assert-p:eql-p x y)
            (assert-p:equal-p 1 2)))
        (run :reporter :min)
      <=> Cacau <=>
      
      From 1 running tests:
      
      0 passed
      1 failed
      NIL
      
    2. List

      Now the list reporting level

      (deftest "t1-fail-2" ()
        (let ((x 1) (y 2))
          (assert-p:equal-p x y)
          (assert-p:equal-p 1 2)))
      
      (run :reporter :list)
      <=> Cacau <=>
      
      <- t1-fail-2:
      Error message:
      BIT EQL (INTEGER 0 4611686018427387903)
      
      Actual:
      1
      Expected:
      2
      -------------------------
      From 1 running tests:
      
      0 passed
      1 failed
      NIL
      
    3. Full

      And finally the full reporting level

      (deftest "t1-fail-3" ()
        (let ((x 1) (y 2))
          (assert-p:eql-p x y)
          (assert-p:equal-p 1 2)))
      
      (run :reporter :full)
      <=> Cacau <=>
      
      <- t1-fail-3:
      Error message:
      BIT EQL (INTEGER 0 4611686018427387903)
      
      Actual:
      1
      Expected:
      2
      Epilogue
      -------------------------
      0 running suites
      1 running tests
      0 only suites
      0 only tests
      0 skip suites
      0 skip tests
      0 total suites
      1 total tests
      0 passed
      1 failed
      1 errors
      740673543 run start
      740673543 run end
      1/1000000 run duration
      0 completed suites
      1 completed tests
      
      Errors
      -------------------------
      Suite: :SUITE-ROOT
      Test: t1-fail-3
      Message:
      BIT EQL (INTEGER 0 4611686018427387903)
      
      Actual:
      1
      Expected:
      2
      
  2. Basics

    The empty form after the test name is for particular parameters such as :skip and :async.

    (deftest "t1" ()
      (assert-p:eql-p 1 1)
      (assert-p:condition-error-p
       (error 'division-by-zero)
       division-by-zero))
    
    (run)
    <=> Cacau <=>
    
    From 1 running tests:
    
    1 passed
    0 failed
    

    You can already anticipate what happens when you are testing a function and you change that function. Yes, you need to manually recompile the test, and the earlier versions might still be found, as well as the new version, when you call (run).

  3. Edge Cases: Value expressions, loops, closures and calling other tests
    1. Value expressions

      Cacau (or really assert-p) has no special functionality for dealing with values expressions. It accepts them but merely looks at the first value returned by each expression. The following passes.

      (deftest "t2-values-3" ()
        (assert-p:equalp-p (values 1 2) (values 1 3)))
      (run)
      <=> Cacau <=>
      
      From 1 running tests:
      
      1 passed
      0 failed
      NIL
      

      Basically it accepted the values expressions but only looked at the first value of each.

    2. Looping and closures.

      Cacau will test correctly if it is looking at variables from a closure surrounding the test.

      (let ((l1 '(#\a #\B #\z))
            (l2 '(97 66 122)))
        (deftest "t2-loop" ()
          (loop for x in l1 for y in l2 do
            (assert-p:eql-p (char-code x) y))))
      (run :reporter :list)
      <=> Cacau <=>
      
       -> t2-loop
      
      -------------------------
      From 1 running tests:
      
      1 passed
      0 failed
      
    3. Calling another test from a test

      I have not figured out a way for a test to call another test in cacau.

  4. Redefinition Ambiguities

    Now consider the following, where I mis-define a test with multiple values, then attempt to correct it, and am left not knowing where I stand.

       (deftest "t2-with-multiple-values" () (assert-p:eql-p  1 1 2))
       ; in: DEFTEST "t2-with-multiple-values"
       ;     (NOLOOP.ASSERT-P:EQL-P 1 1 2)
       ;
       ; caught STYLE-WARNING:
       ;   The function EQL-P is called with three arguments, but wants exactly two.
       ;
       ; compilation unit finished
       ;   caught 1 STYLE-WARNING condition
       #<NOLOOP.CACAU::TEST-CLASS {1005098793}>
    
       (deftest "t2-with-multiple-values" () (assert-p:t-p  (= 1 1 2)))
       #<NOLOOP.CACAU::TEST-CLASS {1005292E43}>
    
     (run) ; the minimum level of info report, probably a mistake
       <=> Cacau <=>
       From 2 running tests:
    
       0 passed
       2 failed
       NIL
    
    (run :reporter :list) ; I try running again with a higher level of information
       <=> Cacau <=>
       -------------------------
       From 0 running tests:
    
       0 passed
       0 failed
    

    Even though the test failed as expected, it is showing 2 running tests. What are the two tests? I would have expected only one test since we are using the same string name.

  5. Conditions (Failing)

    The following fails as expected.

    (deftest "t7-bad-error" ()
      (assert-p:condition-error-p
         (error 'division-by-zero)
         floating-point-overflow))
    
  6. Suites, tags and other multiple test abilities
    1. Lists of tests

      No such capability outside of suites.

    2. Suites

      You can have multiple suites of tests, but the (run) function will run everything that has not been run before:

        (defsuite :s0 ()
          (deftest "s0-t1" () (assert-p:eql-p 1 2)))
      
        (defsuite :s1 ()
          (let ((x 0))
            (deftest "s1-t1" () (assert-p:eql-p x 0))
             (defsuite :s2 ()
              (deftest "s2-t1" () (assert-p:eql-p 1 3)))))
      
      (run :reporter :list)
      <=> Cacau <=>
      
      :S0
       <- s0-t1:
      Error message:
      BIT EQL (INTEGER 0 4611686018427387903)
      
      Actual:
      1
      Expected:
      2
      :S1
       -> s1-t1
       :S2
        <- s2-t1:
      Error message:
      BIT EQL (INTEGER 0 4611686018427387903)
      
      Actual:
      1
      Expected:
      3
      -------------------------
      From 3 running tests:
      
      1 passed
      2 failed
      

      What I would like to see is an ability to run the suites separately. You cannot do that. What is the value of having suites and sub-suites if you cannot run them separately?

  7. Fixtures and Freezing Data

    This is where cacau brings something to the table. You can use the hooks defbefore-all, defafter-all, defbefore-each and defafter-each to set up or tear down data environments or contexts.

          (defsuite :suite-1 ()
            (defbefore-all "Before-all" () (print ":Before-all"))
            (defafter-each "After-each" () (print ":After-each"))
            (defafter-all "After-all" () (print ":After-all"))
            (defbefore-each "Before-each Suite-1" ()
              (print "run Before-each Suite-1"))
            (deftest "Test-1" () (print "run Test-1") (t-p t))
            (deftest "Test-2" () (print "run Test-2") (t-p t))
            (defsuite :suite-2 ()
              (defbefore-each "Before-each Suite-2" ()
                (print "run Before-each Suite-2"))
              (deftest "Test-3" () (print "run Test-3") (t-p t))
              (deftest "Test-4" () (print "run Test-4") (t-p t))))
    
          (run)
    
    ":Before-all"
    "run Before-each Suite-1"
    "run Test-1"
    ":After-each"
    "run Before-each Suite-1"
    "run Test-2"
    ":After-each"
    "run Before-each Suite-1"
    "run Before-each Suite-2"
    "run Test-3"
    ":After-each"
    "run Before-each Suite-1"
    "run Before-each Suite-2"
    "run Test-4"
    ":After-each"
    ":After-all" <=> Cacau <=>
    
    From 4 running tests:
    
    4 passed
    0 failed
    
  8. Removing tests

    I did not see anything here, but maybe I missed it.

  9. Sequencing, Random and Failure Only

    Sequential only.

  10. Skip Capability

    You can specify to skip a test or a suite (and no, I do not consider this to be a good substitute for being able to specify which test or suite you want to run).

    (defsuite :suite-1 ()
      (deftest "Test-1" (:skip) (t-p t))
      (deftest "Test-2" () (t-p t))) ;; run!
    
    (defsuite :suite-2 (:skip)
      (let ((x 0))
        (deftest "Test-1" () (eql-p x 0))
        (deftest "Test-2" () (t-p t))
        (defsuite :suite-3 ()
          (deftest "Test-1" () (t-p t))
          (deftest "Test-2" () (t-p t)))))
    
  11. Async Abilities

    I am going to have to cheat here and refer you to the author's page for the description of the async capabilities https://github.com/noloop/cacau#async-test.

  12. Time Limits for tests

    Cacau does have the ability to specify time limits for tests. The time limits can be set by suite (all tests in the suite have the same time limit), by hook or by test. See the author's discussion at https://github.com/noloop/cacau#timeout
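
    Based on the options form shown for :skip above, a per-test limit is presumably given in the same form. A hypothetical sketch - the :timeout option name and its units are assumptions, so consult the README linked above for the actual syntax:

    (deftest "t-slow" (:timeout 1) ; :timeout is an assumed option name and unit
      (sleep 2)                    ; intended to exceed the limit
      (assert-p:t-p t))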

  13. Random Data Generators

    I did not see anything here with respect to data generators.

12.4. Discussion

As I said in the summary, the fact that it does not show all the assertions that failed and does not give me the ability to specify which suites or tests to run makes this unsuitable for me.

12.5. Who Uses

cl-minify-css-test

13. cardiogram

13.1. Summary

homepage Abraham Aguilar MIT 2020

Cardiogram starts off with immediate problems. The documentation does not match up with the code, even on simple things: "true" in the documentation is "is-true" in the code. I got the benchmarking to work, but I am not going to try to reverse engineer all the other points discussed with other frameworks.

14. checkl

14.1. Summary

homepage Ryan Pavlik LLGPL, BSD 2018

Checkl is different. As a result, this section is different from the other frameworks. Checkl assumes that you do informal checks at the REPL as you are coding, and it saves those results. Assuming you change your program and check the modified function or whatever with exactly the same parameters, it will let you know if the result is now different. As a result it is a bit more difficult to compare based on the wish list. It can, however, be integrated with FiveAM.

14.2. Usage

  1. Basic Usage

    Assume you create two functions foo-up and foo-down and compile them.

    (defun foo-up (x)
      (+ x 2))
    
    (defun foo-down (x)
      (- x 2))
    

    Now you create checks against the functions and compile them.

    (check () (foo-up 2))
    (check () (foo-down 2))
    

    If you now revise foo-down and compile it, checkl will cause the system to throw an error immediately upon compiling foo-down, because the results were different (different in the equalp sense).

    (defun foo-down (x)
      (- x 3))
    
    Result 0 has changed: -1
    Previous result: 0
       [Condition of type CHECKL::RESULT-ERROR]
    
    Restarts:
     0: [USE-NEW-VALUE] The new value is correct, use it from now on.
     1: [SKIP-TEST] Skip this, leaving the old value, but continue testing
     2: [ABORT] Abort compilation.
     3: [*ABORT] Return to SLIME's top level.
     4: [ABORT] abort thread (#<THREAD "worker" RUNNING {1010B73BD3}>)
    

    Modifying and recompiling foo-up similarly also triggers an error.

    Suppose you want to have multiple checks on a function based on different parameters. You can name the check tests.

    (check (:name :foo-up-integer) (foo-up 4))
    (check (:name :foo-down-integer) (foo-down 4))
    (check (:name :foo-up-real) (foo-up 4.5))
    (check (:name :foo-down-real) (foo-down 4.5))
    

    If you pass those check names to run, you get the following (remember that after our modifications, foo-up and foo-down now add and subtract 3):

    (run :foo-up-integer :foo-down-integer :foo-up-real :foo-down-real)
    (7)
    (1)
    (7.5)
    (1.5)
    

    Checks can also be defined over the results of multiple functions using the results function.

    (check (:name :foo-up-and-down)
      (results (foo-up 7) (foo-down 3.2)))
    
    (run :foo-up-and-down)
    
    (10 0.20000005)
    

    By the way, results copies structures and sequences and marshals standard-objects.

  2. Suites, tags and other multiple test abilities

    The run-all function will return the results for all the check tests defined in the current package.

    1. Lists of tests

      As seen in the basic usage, checkl can run multiple checks.

      (run :foo-up-integer :foo-down-integer :foo-up-real :foo-down-real)
      
    2. Suites/tags/categories

      When you are naming checks, you can also pass a category name to the keyword parameter category. You can then pass the category name to run-all and get just the values related to the checks with that category flagged.

      (check (:name :foo :category :some-category) ...)
      
      (run-all :some-category ...)
      
  3. Storage

    You can store these named checks by running checkl-store and reload them later with checkl-load. E.g.:

    (checkl-store "/home/sabrac/checkl-test")
      ;;; some time later
    (checkl-load "/home/sabrac/checkl-test")
    
  4. Integration with Fiveam

    Assuming you have already loaded fiveam, you can also send the checkl tests to fiveam by using check-formal.

    (checkl:check-formal (:name :one-concat) (tf-concat-strings-1 "John" "Paul"))
    "JohnPaul"
    
    (fiveam:run! :default)
    
    Running test suite DEFAULT
     Running test ONE-CONCAT .
     Did 1 check.
        Pass: 1 (100%)
        Skip: 0 ( 0%)
        Fail: 0 ( 0%)
    
    T
    NIL
    

    Now, if we go back and add a space to "John", and run check-formal, not only will check-formal fail, but subsequently running fiveam will fail.

    (checkl:check-formal (:name :one-concat) (tf-concat-strings-1 "John " "Paul"))
    ; Evaluation aborted on #<CHECKL::RESULT-ERROR {1003FE1433}>.
    TF-TEST1> (fiveam:run! :default)
    
    Running test suite DEFAULT
     Running test ONE-CONCAT f
     Did 1 check.
        Pass: 0 ( 0%)
        Skip: 0 ( 0%)
        Fail: 1 (100%)
    
     Failure Details:
     --------------------------------
     ONE-CONCAT []:
    
    CHECKL::RESULT
     evaluated to
    ("John Paul")
     which is not
    CHECKL:RESULT-EQUALP
     to
    ("JohnPaul")
    ..
     --------------------------------
    NIL
    (#<IT.BESE.FIVEAM::TEST-FAILURE {1004189053}>)
    

14.3. Who Uses Checkl?

15. clunit

15.1. Summary

homepage Tapiwa Gutu BSD 2017

Updated 13 June 2021 Based on unresolved issues showing at github, as well as my inability to reach the author, this does not appear to be maintained and is subject to bitrot. You should look at clunit2 instead. The differences between clunit2 and clunit are:

  • clunit2's ability to redirect reporting output,
  • clunit2's huge performance increase (clunit is painfully slow on any sized testing target)
  • clunit2's ability to test multiple value expressions
  • clunit2's suite signaling capability and
  • the fact that clunit2 has a maintainer.

15.2. Assertion Functions

Clunit's assertion functions are:

assert-condition assert-eq assert-eql
assert-equal assert-equality assert-equality*
assert-equalp assert-expands assert-fail
assert-false assert-true assertion-condition
assertion-conditions assertion-error assertion-expander
assertion-fail-forced assertion-failed assertion-passed

The predicate used by assert-equality is determined by the setting of *clunit-equality-test*.
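
For example, to compare with a different predicate - a minimal sketch, assuming *clunit-equality-test* and the starred variant assert-equality* (both listed above) are exported, and that the starred variant is the one that consults the variable:

(setf clunit:*clunit-equality-test* #'string-equal)

(deftest t-equality ()
  (assert-equality* "ABC" "abc")) ; passes under string-equal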

15.3. Usage

  1. Report Format

    Report format is controlled by the variable *clunit-report-format*. It can be set to :default, :tap or NIL. In all the examples showing reports, we will be using the default format.
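
    For example, to switch to TAP output - assuming the variable is exported from the clunit package:

    (setf clunit:*clunit-report-format* :tap)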

    The progress report can be switched off by passing a keyword parameter to the functions run-test or run-suite.

    (run-test 'some-test-name :report-progress nil)
    
    (run-suite 'some-suite-name :report-progress nil)
    

    Clunit2, unlike clunit, has a *test-output-stream* variable which can be used to redirect the reports to a file or other stream.

    To go interactive - dropping immediately into the debugger - you would set the keyword parameter :use-debugger to t.

    (run-test test-name :use-debugger t)
    

    To give you a sense of what the failure report looks like, we take a basic failing test with multiple assertions. We will put some diagnostic strings into a few of the assertions. The first assertion has not only a diagnostic string but also two variables; the second has just the string. Unlike the diagnostic strings in some other frameworks, the string that gets passed does not accept format-style parameters.

    (deftest t1-fail ()
      "describe t1-fail"
      (let ((x 1) (y 2))
        (assert-equal x y  "This assert-equal test was meant to fail" x y)
        (assert-true (= 1 2) "This assert-true test was meant to fail")
        (assert-false (=  1 1))
        (assert-eq 'a 'b)
        (assert-expands (PROGN (SETQ V1 4) (SETQ V2 3)) (setq2 v1 v2 3))
        (assert-condition division-by-zero
            (error 'floating-point-overflow)
          "testing condition assertions")
        (assert-equalp (values 1 2) (values 1 3 4))))
    #<CLUNIT::CLUNIT-TEST-CASE {100FB04C83}>
    
    (run-test 't1-fail)
    
    PROGRESS:
    =========
        T1-FAIL: FFFFFE.
    
    FAILURE DETAILS:
    ================
        T1-FAIL: Expression: (EQUAL X Y)
                 Expected: X
                 Returned: 2
                 This assert-equal test was meant to fail
                 X => 1
                 Y => 2
    
        T1-FAIL: Expression: (= 1 2)
                 Expected: T
                 Returned: NIL
                 This assert-true test was meant to fail
    
        T1-FAIL: Expression: (= 1 1)
                 Expected: NIL
                 Returned: T
    
        T1-FAIL: Expression: (EQ 'A 'B)
                 Expected: 'A
                 Returned: B
    
        T1-FAIL: Expression: (MACROEXPAND-1 '(SETQ2 V1 V2 3))
                 Expected: (PROGN (SETQ V1 4) (SETQ V2 3))
                 Returned: (PROGN (SETQ V1 3) (SETQ V2 3))
    
        T1-FAIL: arithmetic error FLOATING-POINT-OVERFLOW signalled
    
    SUMMARY:
    ========
        Test functions:
            Executed: 1
            Skipped:  0
    
        Tested 7 assertions.
            Passed: 1/7 ( 14.3%)
            Failed: 5/7 ( 71.4%)
            Errors: 1/7 ( 14.3%)
    
  2. Basics

    Looking a little closer at a basic test where we know everything will pass. The empty form after the test name is for the name of the suite (if any). Just for fun, and since clunit has it, we will define a macro and show the assert-expands assertion function as well. We also include a diagnostic string in the assert-condition assertion with the division-by-zero error; the same could be done for any assertion.

    (defmacro setq2 (v1 v2 e)
      (list 'progn (list 'setq v1 e) (list 'setq v2 e)))
    
    (deftest t1 ()
      "describe t1"
      (assert-true (=  1 1))
      (assert-false (=  1 2))
      (assert-eq 'a 'a)
      (assert-expands (PROGN (SETQ V1 3) (SETQ V2 3)) (setq2 v1 v2 3))
      (assert-condition division-by-zero
          (error 'division-by-zero)
        "testing condition assertions")
      (assert-condition simple-warning
          (signal 'simple-warning)))
    

    Running this shows the default report on a passing test. There is a progress report with dots indicating passed assertions, F indicating failed assertions and E if there is an error.

    (run-test 't1)
    
    PROGRESS:
    =========
        T1: ......
    
    SUMMARY:
    ========
        Test functions:
            Executed: 1
            Skipped:  0
    
        Tested 6 assertion.
            Passed: 6/6 (100.0%)
    

    You do not have to manually recompile a test after a tested function has been modified.

  3. Edge Cases: Values expressions, loops, closures and calling other tests

    Clunit has no special functionality for dealing with values expressions. It accepts them but merely looks at the first value returned by each expression. The following passes.

    (deftest t2-values-expressions ()
      (assert-equal (values 1 2)
                    (values 1 3)))
    
    1. Looping and closures.

      Will a test accept looping through assertions with lexical variables from a closure? NO. Clunit complains that the variables l1 and l2 are never defined.

      (let ((l1 '(#\a #\B #\z))
            (l2 '(97 66 122)))
        (deftest t2-loop ()
          (loop for x in l1 for y in l2 do
            (assert-equal (char-code x) y))))
      

      Clunit is quite happy to loop if the variables are defined within the test or, for that matter, if the closure encompasses tested functions rather than the test itself:

      (deftest t2-loop ()
        (let ((l1 '(#\a #\B #\z))
              (l2 '(97 66 122)))
          (loop for x in l1 for y in l2 do
            (assert-equal (char-code x) y))))
      

      Clunit has no assertions that will handle checking more than two values in a single assertion, so you will have to use assert-true with the usual CL functions.

    2. Calling another test from a test

      This uses the second version of test t2, which has two failing assertions and one passing assertion.
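
      The second version of t2 is not shown in this document; a plausible reconstruction, consistent with the report below (two failures followed by a pass), is:

      (deftest t2 ()
        (assert-equal 1 2)    ; fails
        (assert-equal 2 3)    ; fails
        (assert-equal 'a 'a)) ; passes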

      (deftest t3 () ; a test that tries to call another test in its body
        "describe t3"
        (assert-equal 'a 'a)
        (run-test 't2))
      
      (run-test 't3)
      
      PROGRESS:
      =========
          T3: .
      PROGRESS:
      =========
          T2: FF.
      
      FAILURE DETAILS:
      ================
          T2: Expression: (EQUAL 1 2)
              Expected: 1
              Returned: 2
      
          T2: Expression: (EQUAL 2 3)
              Expected: 2
              Returned: 3
      SUMMARY:
      ========
          Test functions:
              Executed: 1
              Skipped:  0
      
          Tested 3 assertions.
              Passed: 1/3 some tests not passed
              Failed: 2/3 some tests failed
      SUMMARY:
      ========
          Test functions:
              Executed: 1
              Skipped:  0
      
          Tested 1 assertion.
              Passed: 1/1 all tests passed
      SUMMARY:
      ========
          Test functions:
              Executed: 1
              Skipped:  0
      
          Tested 1 assertion.
              Passed: 1/1 all tests passed
      

      It reports each test separately, but correctly; there is obviously no composition.

  4. Suites, tags and other multiple test abilities
    1. Lists of tests

      clunit will not run lists of tests. You can run tests which run other tests. But otherwise you will need to set up suites.

    2. Suites

      Tests can be associated with multiple suites. The execution of tests within a suite is unordered by default, but you can specify that a test depends on other tests passing. If those tests do not pass, then this test will be skipped; it is put in a queue until its dependencies are satisfied. Both suite specifications and test dependencies are set in the first parameter form that we left empty in the above tests.

      Assume you wanted testC to be part of suites suiteA and suiteB and dependent on tests testA and testB passing.

      (deftest testC ((suiteA suiteB)(testA testB))
        ...)
      

      Suites are tested using the run-suite function. It has the following additional parameters (besides the obvious name for the suite to be tested):

      • If REPORT-PROGRESS is non-NIL, the test progress is reported.
      • If USE-DEBUGGER is non-NIL, the debugger is invoked whenever an assertion fails.
      • If STOP-ON-FAIL is non-NIL, the rest of the unit test is cancelled when any assertion fails or an error occurs.
      • If SIGNAL-CONDITION-ON-FAIL is non-NIL, run-suite will signal a TEST-SUITE-FAILURE condition if at least one test fails or signals an error condition.
      • If PRINT-RESULTS-SUMMARY is non-NIL, the summary of test results is printed to standard output.

      (defsuite s0 ()); Ultimate parent suite if the library provides inheritance
      
        (deftest t4 (s0) ; a test that is a member of a suite
          "describe t4"
          (assert-eq 1 1))
      
        ;;a multiple assertion test that is a member of a suite with
        ;; a passing test, an error signaled and a failing test
        (deftest t4-error (s0)
          "describe t4-error"
          (assert-eq  'a 'a)
          (assert-condition error (error "t4-errored out"))
          (assert-true (= 1 2)))
      
        (deftest t4-fail (s0) ;
          "describe t4-fail"
          (assert-false (= 1 2)))
      
      (defsuite s1 (s0)); a sub-suite of suite s0 to check on inheritance
        (deftest t4-s1 (s1)
         (assert-true (= 1 1)))
      
      (run-suite 's0)
      
      PROGRESS:
      =========
      
          S0: (Test Suite)
              T4-FAIL: .
              T4-ERROR: ..F
              T4: .
      
              S1: (Test Suite)
                  T5: .
      
      FAILURE DETAILS:
      ================
      
          S0: (Test Suite)
              T4-ERROR: Expression: (= 1 2)
                        Expected: T
                        Returned: NIL
      
      
      SUMMARY:
      ========
          Test functions:
              Executed: 4
              Skipped:  0
      
          Tested 6 assertions.
              Passed: 5/6 some tests not passed
              Failed: 1/6 some tests failed
      
      FAILURE DETAILS:
      ================
      
          S0: (Test Suite)
              T4-ERROR: Expression: (= 1 2)
                        Expected: T
                        Returned: NIL
      
      
      SUMMARY:
      ========
          Test functions:
              Executed: 4
              Skipped:  0
      
          Tested 6 assertions.
              Passed: 5/6 some tests not passed
              Failed: 1/6 some tests failed
      
    3. Early termination

      You can stop a suite run when a test fails, without dropping into the debugger:

      (run-suite 'suite-name :stop-on-fail t)
      

    4. Fixtures and Freezing Data
      (defclass fixture-data ()
        ((a :initarg :a :initform 0 :accessor a)
         (b :initarg :b :initform 0 :accessor b)))
      
      (deffixture s1 (@body) ;;IMPORTANT Note that the fixture gets the name of the suite to which it will apply
        (let ((x (make-instance 'fixture-data :a 100 :b -100)))
          @body))
      
      ;; create a sub suite and checking fixture inheritance
      (defsuite s2 (s1))
      
      (deftest t6-s1 (s1)
        (assert-equal  (a x) 100)
        (assert-equal   (b x) -100))
      
      (deftest t6-s2 (s2)
        (assert-equal  (a x) 100)
        (assert-equal   (b x) -100))
      
      (run-suite 's1)
      
      PROGRESS:
      =========
          S1: (Test Suite)
              T6-S1: ..
              T5: .
      
              S2: (Test Suite)
                  T6-S2: ..
      SUMMARY:
      ========
          Test functions:
              Executed: 3
              Skipped:  0
      
          Tested 5 assertions.
              Passed: 5/5 all tests passed
      SUMMARY:
      ========
          Test functions:
              Executed: 3
              Skipped:  0
      
          Tested 5 assertions.
              Passed: 5/5 all tests passed
      

      To go interactive - dropping immediately into the debugger - you would set the keyword parameter :use-debugger to t.

      (run-suite suite-name :use-debugger t)
      
  5. Removing tests
    (clunit:undeftest t1)
    (clunit:undeffixture fixture-name)
    (clunit:undefsuite suite-name)
    
  6. Sequencing, Random and Failure Only

    The execution of tests within a suite is unordered by default, but you can specify that a test depends on other tests passing. If those tests do not pass, this test will be skipped. Clunit has a function rerun-failed-tests to rerun failed tests.
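
    A typical cycle might look like the following sketch; I am assuming rerun-failed-tests takes no required arguments, so check the source for its exact signature:

    (run-suite 's0)      ; the initial run records which tests failed
    (rerun-failed-tests) ; re-executes only the previously failed tests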

  7. Skip Capability

    Other than the dependency abilities previously mentioned, clunit has no additional skipping capability.

  8. Random Data Generators

    Clunit has no built in data generators.

15.4. Discussion

The differences between clunit2 and clunit are clunit2's ability to redirect reporting output, its suite signaling capability and the fact that it has a maintainer. Both of them are very slow.

15.5. Who Uses

bt-semaphore, data-frame, cl-kanren, cl-random-tests, cl-slice-tests, listoflist, lla-tests, oe-encode-test, trivial-tco-test

16. clunit2

16.1. Summary

homepage Cage (fork of clunit) BSD 2020

Update 13 June 2021 Clunit2 is a fork of Clunit. For quicklisp system loading purposes it is clunit2; for package naming purposes it is clunit, not clunit2. The differences between clunit2 and clunit are:

  • clunit2's ability to redirect reporting output,
  • clunit2's huge performance increase (clunit is painfully slow on any sized testing target)
  • clunit2's ability to test multiple value expressions
  • clunit2's suite signaling capability and
  • the fact that clunit2 has a maintainer.

Clunit2 does report all failing assertions within a test, has the option to turn off progress reporting and accepts user provided diagnostic strings with variables in the assertions. Going interactive with debugging is optional, fixtures are available as are suites and the ability to rerun failed tests. You can specify that a test that depends on other tests passing will be skipped if those prior tests fail.

With respect to the edge cases, as of the 13 June 2021 update, Clunit2 will accept variables declared in closures surrounding the test and does have the ability to completely test all the values returned from a values expression.

16.2. Assertion Functions

Clunit2's assertion functions are:

assert-condition assert-eq assert-eql
assert-equal assert-equality assert-equality*
assert-equalp assert-expands assert-fail
assert-false assert-true assertion-condition
assertion-conditions assertion-error assertion-expander
assertion-fail-forced assertion-failed assertion-passed

16.3. Usage

  1. Report Format

    Report format is controlled by the variable *clunit-report-format*. It can be set to :default, :tap or NIL. In all the examples showing reports, we will be using the default format.

    The progress report can be switched off by passing a keyword parameter to the functions run-test or run-suite.

    (run-test 'some-test-name :report-progress nil)
    
    (run-suite 'some-suite-name :report-progress nil)
    

    Clunit2, unlike clunit, has a *test-output-stream* variable which can be used to redirect the reports to a file or other stream.
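
    For example, to send a suite's report to a file - a minimal sketch, assuming *test-output-stream* is exported from the clunit package and can be rebound dynamically:

    (with-open-file (s "clunit2-report.txt" :direction :output
                                            :if-exists :supersede)
      (let ((clunit:*test-output-stream* s))
        (run-suite 's0)))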

    To go interactive - dropping immediately into the debugger - you would set the keyword parameter :use-debugger to t.

    (run-test test-name :use-debugger t)
    
  2. Basics

    The most basic test. The empty form after the test name is for the name of the suite (if any). Just for fun, and since clunit has it, we will define a macro and show the assert-expands assertion function as well. We also include a diagnostic string in the assert-condition assertion with the division-by-zero error; the same could be done for any assertion.

    (defmacro setq2 (v1 v2 e)
      (list 'progn (list 'setq v1 e) (list 'setq v2 e)))
    
    (deftest t1 ()
      "describe t1"
      (assert-true (=  1 1))
      (assert-false (=  1 2))
      (assert-eq 'a 'a)
      (assert-expands (PROGN (SETQ V1 3) (SETQ V2 3)) (setq2 v1 v2 3))
      (assert-condition division-by-zero
          (error 'division-by-zero)
        "testing condition assertions")
      (assert-condition simple-warning
          (signal 'simple-warning)))
    

    Running this shows the default report on a passing test. There is a progress report with dots indicating passed assertions, F indicating failed assertions and E if there is an error:

    (run-test 't1)
    
    PROGRESS:
    =========
        T1: .....
    
    SUMMARY:
    ========
        Test functions:
            Executed: 1
            Skipped:  0
    
        Tested 6 assertion.
            Passed: 6/6 (100.0%)
    

    You can switch off the progress report for running tests and suites by setting the keyword parameter :report-progress to nil:

    (run-test 't1 :report-progress nil)
    

    Now a basic failing test with multiple assertions (and also to see if the library can deal with values expressions). We will put some diagnostic strings into a few of the assertions. The first assertion has not only a diagnostic string but also two variables; the second has just the string. Unlike the diagnostic strings in some other frameworks, the string that gets passed does not accept format-style parameters.

    (deftest t1-fail ()
      "describe t1-fail"
      (let ((x 1) (y 2))
        (assert-equal x y  "This assert-equal test was meant to fail" x y)
        (assert-true (= 1 2) "This assert-true test was meant to fail")
        (assert-false (=  1 1))
        (assert-eq 'a 'b)
        (assert-expands (PROGN (SETQ V1 4) (SETQ V2 3)) (setq2 v1 v2 3))
        (assert-condition division-by-zero
            (error 'floating-point-overflow)
          "testing condition assertions")
        (assert-equalp (values 1 2) (values 1 3 4))))
    #<CLUNIT::CLUNIT-TEST-CASE {100FB04C83}>
    TF-CLUNIT> (run-test 't1-fail)
    
    PROGRESS:
    =========
        T1-FAIL: FFFFFE.
    
    FAILURE DETAILS:
    ================
        T1-FAIL: Expression: (EQUAL X Y)
                 Expected: X
                 Returned: 2
                 This assert-equal test was meant to fail
                 X => 1
                 Y => 2
    
        T1-FAIL: Expression: (= 1 2)
                 Expected: T
                 Returned: NIL
                 This assert-true test was meant to fail
    
        T1-FAIL: Expression: (= 1 1)
                 Expected: NIL
                 Returned: T
    
        T1-FAIL: Expression: (EQ 'A 'B)
                 Expected: 'A
                 Returned: B
    
        T1-FAIL: Expression: (MACROEXPAND-1 '(SETQ2 V1 V2 3))
                 Expected: (PROGN (SETQ V1 4) (SETQ V2 3))
                 Returned: (PROGN (SETQ V1 3) (SETQ V2 3))
    
        T1-FAIL: arithmetic error FLOATING-POINT-OVERFLOW signalled
    
    
    SUMMARY:
    ========
        Test functions:
            Executed: 1
            Skipped:  0
    
        Tested 7 assertions.
            Passed: 1/7 ( 14.3%)
            Failed: 5/7 ( 71.4%)
            Errors: 1/7 ( 14.3%)
    

    With respect to the values expression, we can see that it passed in this run, which looked only at the first value in the values expression (but see the update note in the next section).

    You do not have to manually recompile a test after a tested function has been modified.

  3. Edge Cases: Values expressions, loops, closures and calling other tests

    Update 13 June 2021 Clunit2 as of the date of this update report will compare all the values from two values expressions. The following now properly fails.

    (deftest t2-values-expressions ()
      (assert-equal (values 1 2)
                    (values 1 3)))
    
    1. Looping and closures.

      Update 13 June 2021 Clunit2 will accept variables declared in a closure surrounding the test. The following passes.

      (let ((l1 '(#\a #\B #\z))
            (l2 '(97 66 122)))
        (deftest t2-loop ()
          (loop for x in l1 for y in l2 do
            (assert-equal (char-code x) y))))
      

      Clunit2 is quite happy to loop if the variables are defined within the test:

      (deftest t2-loop ()
        (let ((l1 '(#\a #\B #\z))
              (l2 '(97 66 122)))
          (loop for x in l1 for y in l2 do
            (assert-equal (char-code x) y))))
      
    2. Calling another test from a test

      We will call the second version of test t2, which has failures.

      (deftest t3 () ; a test that tries to call another test in its body
        "describe t3"
        (assert-equal 'a 'a)
        (run-test 't2))
      
      (run-test 't3)
      
      PROGRESS:
      =========
          T3: .
      PROGRESS:
      =========
          T2: FF.
      
      FAILURE DETAILS:
      ================
          T2: Expression: (EQUAL 1 2)
              Expected: 1
              Returned: 2
      
          T2: Expression: (EQUAL 2 3)
              Expected: 2
              Returned: 3
      
      SUMMARY:
      ========
          Test functions:
              Executed: 1
              Skipped:  0
      
          Tested 3 assertions.
              Passed: 1/3 some tests not passed
              Failed: 2/3 some tests failed
      
      SUMMARY:
      ========
          Test functions:
              Executed: 1
              Skipped:  0
      
          Tested 1 assertion.
              Passed: 1/1 all tests passed
      
      SUMMARY:
      ========
          Test functions:
              Executed: 1
              Skipped:  0
      
          Tested 1 assertion.
              Passed: 1/1 all tests passed
      

      It reports each test separately (no composition), but correctly.

  4. Conditions

    The following fails as expected.

    (deftest t7-bad-error ()
      (assert-condition floating-point-overflow
         (error 'division-by-zero)
         "testing condition assertions. This should fail"))
    
  5. Suites, tags and other multiple test abilities
    1. Lists of tests

      clunit2 will not run lists of tests. You can run tests which run other tests. But otherwise you will need to set up suites.

    2. Suites

      Tests can be associated with multiple suites. The execution of tests within a suite is unordered by default, but you can specify that a test depends on other tests passing. If those tests do not pass, then this test will be skipped; it is put in a queue until its dependencies are satisfied. Both suite specifications and test dependencies are set in the first parameter form that we left empty in the above tests.

      Assume you wanted testC to be part of suites suiteA and suiteB and dependent on tests testA and testB passing.

      (deftest testC ((suiteA suiteB)(testA testB))
        ...)
      

      Suites are tested using the run-suite function. It has the following additional parameters (besides the obvious name for the suite to be tested):

      • If REPORT-PROGRESS is non-NIL, the test progress is reported.
      • If USE-DEBUGGER is non-NIL, the debugger is invoked whenever an assertion fails.
      • If STOP-ON-FAIL is non-NIL, the rest of the unit test is cancelled when any assertion fails or an error occurs.
      • If SIGNAL-CONDITION-ON-FAIL is non-NIL, run-suite will signal a TEST-SUITE-FAILURE condition if at least one test fails or signals an error condition.
      • If PRINT-RESULTS-SUMMARY is non-NIL, the summary of test results is printed to standard output.

      (defsuite s0 ()); Ultimate parent suite if the library provides inheritance
      
        (deftest t4 (s0) ; a test that is a member of a suite
          "describe t4"
          (assert-eq 1 1))
      
        ;;a multiple assertion test that is a member of a suite with
        ;; a passing test, an error signaled and a failing test
        (deftest t4-error (s0)
          "describe t4-error"
          (assert-eq  'a 'a)
          (assert-condition error (error "t4-errored out"))
          (assert-true (= 1 2)))
      
        (deftest t4-fail (s0) ;
          "describe t4-fail"
          (assert-false (= 1 2)))
      
      (defsuite s1 (s0)); a sub-suite of suite s0 to check on inheritance
        (deftest t4-s1 (s1)
         (assert-true (= 1 1)))
      
      (run-suite 's0)
      
      PROGRESS:
      =========
          S0: (Test Suite)
              T4-FAIL: .
              T4-ERROR: ..F
              T4: .
      
              S1: (Test Suite)
                  T5: .
      FAILURE DETAILS:
      ================
      
          S0: (Test Suite)
              T4-ERROR: Expression: (= 1 2)
                        Expected: T
                        Returned: NIL
      SUMMARY:
      ========
          Test functions:
              Executed: 4
              Skipped:  0
      
          Tested 6 assertions.
              Passed: 5/6 some tests not passed
              Failed: 1/6 some tests failed
      
      FAILURE DETAILS:
      ================
      
          S0: (Test Suite)
              T4-ERROR: Expression: (= 1 2)
                        Expected: T
                        Returned: NIL
      SUMMARY:
      ========
          Test functions:
              Executed: 4
              Skipped:  0
      
          Tested 6 assertions.
              Passed: 5/6 some tests not passed
              Failed: 1/6 some tests failed
      
    3. Early termination

      You can stop a suite run when a test fails, without dropping into the debugger:

      (run-suite 'suite-name :stop-on-fail t)
      

    4. Fixtures and Freezing Data
      (defclass fixture-data ()
        ((a :initarg :a :initform 0 :accessor a)
         (b :initarg :b :initform 0 :accessor b)))
      
      (deffixture s1 (@body) ;;IMPORTANT Note that the fixture gets the name of the suite to which it will apply
        (let ((x (make-instance 'fixture-data :a 100 :b -100)))
          @body))
      
      ;; create a sub suite and checking fixture inheritance
      (defsuite s2 (s1))
      
      (deftest t6-s1 (s1)
        (assert-equal  (a x) 100)
        (assert-equal   (b x) -100))
      
      (deftest t6-s2 (s2)
        (assert-equal  (a x) 100)
        (assert-equal   (b x) -100))
      
      (run-suite 's1)
      
      PROGRESS:
      =========
          S1: (Test Suite)
              T6-S1: ..
              T5: .
      
              S2: (Test Suite)
                  T6-S2: ..
      SUMMARY:
      ========
          Test functions:
              Executed: 3
              Skipped:  0
      
          Tested 5 assertions.
              Passed: 5/5 all tests passed
      SUMMARY:
      ========
          Test functions:
              Executed: 3
              Skipped:  0
      
          Tested 5 assertions.
              Passed: 5/5 all tests passed
      
  6. Removing tests
    (undeftest t1)
    (undeffixture fixture-name)
    (undefsuite suite-name)
    
  7. Sequencing, Random and Failure Only

    The execution of tests within a suite is unordered by default, but you can specify that a test depends on other tests passing. If those tests do not pass, this test will be skipped. Clunit2 has a function rerun-failed-tests to rerun failed tests.

  8. Skip Capability

    Other than the dependency abilities previously mentioned, clunit2 has no additional skipping capability.

  9. Random Data Generators

    Clunit2 has no built in data generators.

16.4. Discussion

Clunit2 is a substantial step forward from clunit and should be considered the successor.

17. com.gigamonkeys.test-framework

17.1. Summary

homepage Peter Seibel BSD 2010

This is a basic testing framework without the bells and whistles found in several of the others. For example, it lacks fixtures or suites. Nothing wrong with it but you can find a lot more functionality elsewhere.

17.2. Assertion Functions

check expect

Gigamonkeys has a limited range of assertion functions. expect covers conditions and errors; check is the equivalent of is in, e.g., FiveAM.

17.3. Usage

One thing to note on setup. If you are using quicklisp, the quickload system name is:

(ql:quickload :com.gigamonkeys.test-framework)

However, the package name, at least in SBCL, is com.gigamonkeys.test.

  1. Report Format

    Gigamonkeys tests can be set to go into the debugger on errors, on failures, or never. This is controlled by the settings of *debug* (for error conditions) and *debug-on-fail* (for test failures).

  2. Basics

    The most basic passing test. Unlike most other frameworks, the empty form after the test name is for parameters which can be passed to the test. Also unlike many other frameworks, the test function is called with the unquoted name of the test.

    (deftest t1 (x)
      (check (= 1 x))
      (expect division-by-zero (error 'division-by-zero)))
    
    (test t1 1)
    Okay: 2 passes; 0 failures; 0 aborts.
    T
    2
    0
    0
    

    Now a basic failing test.

    To go interactive - dropping immediately into the debugger for unexpected conditions, set *debug* to t.

    To drop immediately into the debugger when a test fails, set *debug-on-fail* to t. This is the default, but we will set it to nil for these examples.
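
    Assuming the special variable is accessible from the current package, that is simply:

    (setf *debug-on-fail* nil)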

    (deftest t1-fail (); the most basic failing test
      (let ((x 1) (y 2))
        (check (= x y) )))
    
    NIL
    TEST> (test t1-fail)
    FAIL ... (T1-FAIL): (= X Y)
      X                 => 1
      Y                 => 2
      (= X Y)           => NIL
    NOT okay: 0 passes; 1 failures; 0 aborts.
    NIL
    0
    1
    0
    
    

    You do not have to manually recompile a test after a tested function has been modified.

  3. Edge Cases: Values expressions, loops, closures and calling other tests

    Gigamonkeys has no special functionality for dealing with values expressions. It accepts them but merely looks at the first value returned by each expression. The following passes.

    (deftest t2-values-expressions ()
      (check (equal (values 1 2)
                    (values 1 3))))
    
    1. Closures.

      Gigamonkeys has no problem with variables declared in a closure encompassing the test.

    2. Calling another test from a test
      (deftest t3 (); a test that tries to call another test in its body
        (check (eql 'a 'a))
        (test t2))
      
      (test t3)
      Okay: 3 passes; 0 failures; 0 aborts.
      Okay: 4 passes; 0 failures; 0 aborts.
      T
      4
      0
      0
      

      So far so good, but no composition.

  4. Conditions

    The following immediately throws us into the debugger. If we hit PROCEED, we will get the feedback shown below.

      (deftest t7-bad-error ()
        (expect division-by-zero
           (error 'floating-point-overflow)))
    
      (test t7-bad-error)
    
    ABORT ... (T7-BAD-ERROR): arithmetic error FLOATING-POINT-OVERFLOW signalled
    NOT okay: 0 passes; 0 failures; 1 aborts.
    NIL
    0
    0
    1
    
  5. Suites, tags and other multiple test abilities
    1. Lists of tests

      Gigamonkeys' test function will not accept a list of tests.

    2. Suites

      Gigamonkeys can run all the tests associated with a package, but if you want to define "suites", you should write your own function that runs specific tests.

  6. Fixtures and Freezing Data

    None

  7. Removing tests

    Gigamonkeys has functions remove-test-function and clear-package-tests

  8. Sequencing, Random and Failure Only

    None

  9. Skip Capability

    None

  10. Random Data Generators

    None

17.4. Discussion

17.5. Who Uses com.gigamonkeys.test-framework

("monkeylib-text-output")

18. Confidence

18.1. Summary

homepage Michaël Le Barbier MIT 2023

Confidence is a new entry by someone who found the existing frameworks (based on his experience with Stefil and FiveAM) too complicated: they did not provide enough debugging information to know exactly where the problem is, and they were not extensible in the sense of adding additional assertions. I am sure that Confidence meets his needs, but it would not meet the needs of other users.

It does report all failing assertions in a test. It has suites but no fixtures, and it does not allow you to provide user-created diagnostic strings for the assertions. On the plus side, it has some really nice floating point assertions that are not found elsewhere.

18.2. Assertion Functions

assert-char-equal assert-string-equal
assert-char< assert-string-match
assert-char<= assert-string<
assert-char= assert-string<=
assert-char> assert-string=
assert-char>= assert-string>
assert-condition assert-string>=
assert-eq assert-subsetp
assert-eql assert-t
assert-equal assert-t*
assert-equalp assert-type
assert-float-is-approximately-equal assert-vector-equal
assert-float-is-definitely-greater-than assert<
assert-float-is-definitely-less-than assert<=
assert-float-is-essentially-equal assert=
assert-list-equal assert>
assert-nil assert>=
assert-set-equal

Confidence also provides a macro for defining more assertions.
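
As a sketch of what a user-defined assertion might look like - the defun-style shape (name, lambda list, docstring, body returning a generalized boolean) is an assumption, so consult the Confidence documentation for the macro's exact contract:

(define-assertion assert-evenp (n)
  "Assert that N is an even integer."
  (and (integerp n) (evenp n)))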

18.3. Usage

  1. Report Format

    Test failures in Confidence create a result object. Result objects are combined to produce a report.

  2. Basics

    Tests in Confidence are functions. Unless we are calling multiple tests in the following examples, we will just call the test-case function itself. The empty form after the test name is not really described in the documentation, but it can be used for parameterized test cases.

    The basic passing test below shows the floating point comparisons built into Confidence.

    (define-testcase t1 ()
      "describe t1"
      (assert-t (=  1 1))
      (assert-string< "abc" "def")
      (assert-float-is-approximately-equal 5.100000 5.1000001)
      (assert-float-is-essentially-equal 5.100000 5.1000001)
      (assert-float-is-definitely-greater-than 5.100001 5.100000)
      (assert-float-is-definitely-less-than 5.100000 5.100001)
      (assert-equal (values 1 2) (values 1 2)))
    
    (t1)
    Name: T1
    Total: 7
    Success: 7/7 (100%)
    Failure: 0/7 (0%)
    Condition: 0/7 (0%)
    Outcome: Success
    NIL
    

    A parameterized test case:

    (define-testcase t1-p (y); the most basic parameterized
      (let ((x 1))
        (assert-equal x y)))
    
    (t1-p 1)
    

    Now a basic failing test. This time we are using a more specific assertion (assert-equal). Unlike some other frameworks, we cannot pass a descriptive string to the assertion.

    (define-testcase t1-fail () ; the most basic failing test
      (let ((x 1) (y 2))
        (assert-equal x y)
        (assert-equal y 3)))
    
    (t1-fail)
    Name: T1-FAIL
    Total: 2
    Success: 0/2 (0%)
    Failure: 2/2 (100%)
    Condition: 0/2 (0%)
    Outcome: Failure
    ================================================================================
    #<ASSERTION-FAILURE {70066055D3}> is an assertion result of type ASSERTION-FAILURE.
    Type: :FUNCTION
    Name: ASSERT-EQUAL
    Path:
      T1-FAIL
    Arguments:
     Argument #1: 1
     Argument #2: 2
    Form: (ASSERT-EQUAL X Y)
    Outcome: Failure
    Description: Assert that A and B satisfy the EQUAL predicate.
      In this call, forms in argument position evaluate as:
    
      A: 1
    
      B: 2
    
    ================================================================================
    #<ASSERTION-FAILURE {7006605863}> is an assertion result of type ASSERTION-FAILURE.
    Type: :FUNCTION
    Name: ASSERT-EQUAL
    Path:
      T1-FAIL
    Arguments:
     Argument #1: 2
     Argument #2: 3
    Form: (ASSERT-EQUAL Y 3)
    Outcome: Failure
    Description: Assert that A and B satisfy the EQUAL predicate.
      In this call, forms in argument position evaluate as:
    
      A: 2
    
      B: 3
    
    NIL
    

    You do not have to manually recompile a test after a tested function has been modified.

  3. Edge Cases: Multiple assertions, loops, closures and calling other tests
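
    The definition of t2 is not shown in this document; a plausible reconstruction, consistent with the report below (three assertions, where the values form passes because only the first value of each expression is compared), is:

      (define-testcase t2 ()
        (assert-equal 1 2)                        ; fails
        (assert-equal 2 3)                        ; fails
        (assert-equal (values 1 2) (values 1 3))) ; passes: first values only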
      (t2)
      Name: T2
      Total: 3
      Success: 1/3 (33%)
      Failure: 2/3 (67%)
      Condition: 0/3 (0%)
      Outcome: Failure
      ================================================================================
      #<ASSERTION-FAILURE {700692EF83}> is an assertion result of type ASSERTION-FAILURE.
      Type: :FUNCTION
      Name: ASSERT-EQUAL
      Path:
        T2
      Arguments:
       Argument #1: 1
       Argument #2: 2
      Form: (ASSERT-EQUAL 1 2)
      Outcome: Failure
      Description: Assert that A and B satisfy the EQUAL predicate.
        In this call, forms in argument position evaluate as:
    
        A: 1
    
        B: 2
    
      ================================================================================
      #<ASSERTION-FAILURE {700692F1F3}> is an assertion result of type ASSERTION-FAILURE.
      Type: :FUNCTION
      Name: ASSERT-EQUAL
      Path:
        T2
      Arguments:
       Argument #1: 2
       Argument #2: 3
      Form: (ASSERT-EQUAL 2 3)
      Outcome: Failure
      Description: Assert that A and B satisfy the EQUAL predicate.
        In this call, forms in argument position evaluate as:
    
        A: 2
    
        B: 3
    
    NIL
    

    Confidence had no problem with the values expression as such, but like almost all the frameworks, it only looked at the first value.

    1. Closures

      Confidence has no problem accessing variables defined in a closure encompassing the test.

      (let ((l1 '(#\a #\B #\z))
            (l2 '(97 66 122)))
        (define-testcase t2-loop ()
          (loop for x in l1 for y in l2 do
            (assert-equal (char-code x) y))))
      
      (t2-loop)
      Name: T2-LOOP
      Total: 3
      Success: 3/3 (100%)
      Failure: 0/3 (0%)
      Condition: 0/3 (0%)
      Outcome: Success
      NIL
      

      Checking assert= with multiple values.

      (define-testcase t2-with-multiple-values ()
        (assert= 1 1 2))
      T2-WITH-MULTIPLE-VALUES
      
      (t2-with-multiple-values)
      Name: T2-WITH-MULTIPLE-VALUES
      Total: 1
      Success: 0/1 (0%)
      Failure: 0/1 (0%)
      Condition: 1/1 (100%)
      Outcome: Failure
      ================================================================================
      #<ASSERTION-CONDITION {70070524F3}> is an assertion result of type ASSERTION-CONDITION.
      Type: :FUNCTION
      Name: ASSERT=
      Path:
        T2-WITH-MULTIPLE-VALUES
      Arguments:
       Argument #1: 1
       Argument #2: 1
       Argument #3: 2
      Form: (ASSERT= 1 1 2)
      Outcome: Condition
      Condition: #<SB-INT:SIMPLE-PROGRAM-ERROR "invalid number of arguments: ~S" {7006F4B683}>
      #<SB-INT:SIMPLE-PROGRAM-ERROR "invalid number of arguments: ~S" {7006F..
        [condition]
      
      Slots with :INSTANCE allocation:
        FORMAT-CONTROL                 = "invalid number of arguments: ~S"
        FORMAT-ARGUMENTS               = (3)
        In this call, forms in argument position evaluate as:
      
        A: 1
      
        B: 1
      
      NIL
      

      It failed. All the assertions in Confidence compare two values only.

    2. Calling another test from a test

      In most other frameworks, if you call a test within a test, you effectively get two reports. Confidence actually composes the results.

      (define-testcase t3 ()
        "describe t3 which is a test that tries to call another test in its body"
        (assert-equal 'a 'a)
        (t1))
      (t3)
      Name: T3
      Total: 8
      Success: 8/8 (100%)
      Failure: 0/8 (0%)
      Condition: 0/8 (0%)
      Outcome: Success
      NIL
      

      If test t3 called test t2, we would have seen the following in the debugger, which shows (via the Path) that the assertion failures were in t2, not t3:

      Name: T3
      Total: 4
      Success: 2/4 (50%)
      Failure: 2/4 (50%)
      Condition: 0/4 (0%)
      Outcome: Failure
      ================================================================================
      Name: T2
      Total: 3
      Success: 1/3 (33%)
      Failure: 2/3 (67%)
      Condition: 0/3 (0%)
      Outcome: Failure
      ================================================================================
      #<ASSERTION-FAILURE {7007FEADA3}> is an assertion result of type ASSERTION-FAILURE.
      Type: :FUNCTION
      Name: ASSERT-EQUAL
      Path:
        T3
          T2
      Arguments:
       Argument #1: 1
       Argument #2: 2
      Form: (ASSERT-EQUAL 1 2)
      Outcome: Failure
      Description: Assert that A and B satisfy the EQUAL predicate.
        In this call, forms in argument position evaluate as:
      
        A: 1
      
        B: 2
      
      ================================================================================
      #<ASSERTION-FAILURE {7007FEB013}> is an assertion result of type ASSERTION-FAILURE.
      Type: :FUNCTION
      Name: ASSERT-EQUAL
      Path:
        T3
          T2
      Arguments:
       Argument #1: 2
       Argument #2: 3
      Form: (ASSERT-EQUAL 2 3)
      Outcome: Failure
      Description: Assert that A and B satisfy the EQUAL predicate.
        In this call, forms in argument position evaluate as:
      
        A: 2
      
        B: 3
      
  4. Conditions

    Confidence has a macro assert-condition that verifies that a form signals a condition of a certain class. It can also examine the slots of the condition with assertions:
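
    The example assumes a condition class along these lines has been defined; the name testing-framework and its slot readers come from the example, but this particular definition is just a plausible sketch:

    (define-condition testing-framework (error)
      ((a :initarg :a :reader a) ; slots read by the assertions below
       (b :initarg :b :reader b)
       (c :initarg :c :reader c)))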

    (define-testcase t4 ()
      (assert-condition
          (error 'testing-framework :a "a" :b "b" :c "c")
          testing-framework
          (a b)
        (assert-string= "a" a)
        (assert-string= "b" b)))
    
    (t4)
    Name: T4
    Total: 1
    Success: 1/1 (100%)
    Failure: 0/1 (0%)
    Condition: 0/1 (0%)
    Outcome: Success
    NIL
    
  5. Suites, tags and other multiple test abilities
    1. Lists of tests

      There is no "run-test" like function in Confidence. If you want to run a list of tests, you need to define a test that funcalls those tests.
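
      A minimal sketch, reusing testcases defined earlier: since define-testcase defines an ordinary function, a "list of tests" is just another testcase that calls them, and the summary composes.

      (define-testcase run-my-tests ()
        (t1)       ; testcases defined earlier in this section
        (t2-loop))
      
      (run-my-tests)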

    2. Suites

      Tests can be nested and will generate a composed summary. This might be considered a "suite" capability.

  6. Fixtures and Freezing Data

    There is no additional capability in Confidence for fixtures or freezing data.

  7. Removing tests

    None

  8. Sequencing, Random and Failure Only

    Tests will be called in sequence and there is no random shuffling or skipping ability.

  9. Skip Capability

    None other than provided in the debugger.

  10. Random Data Generators

    None

18.4. Discussion

Confidence does provide the ability to define more assertions.

In summary, Confidence is interesting, but it will not make me change from another framework. I do think some other frameworks might want to follow its lead in having some float comparison assertions.

18.5. Who Uses Confidence

19. fiasco

19.1. Summary

homepage João Távora BSD 2 Clause 2020

In spite of the fact that Fiasco does not have its own fixture capability (unless I am missing something), it managed to hit most of the other concerns that I have. It does report all failing assertions within a test, has the option to turn off progress reporting and accepts user provided diagnostic strings with variables in the assertions. Going interactive with debugging is optional, suites are available as is the ability to rerun failed tests. It has skipping functions and, with respect to the edge cases, it handles variables declared in a closure surrounding the test. When a test calls another test it actually manages to compose the results rather than reporting two separate sets.

I found using the suite capability to be confusing (and would likely end up defining packages instead of suites, but you lose some composition that way). It also does not have the ability some other frameworks have to deal with values expressions.

19.2. Assertion Functions

is finishes not-signals signals

19.3. Usage

  1. Report Format

    Fiasco defaults to a reporting format. To go interactive, run the test with :interactive t.

    There are two slightly different versions of reporting format, the default and running the test with :verbose t. The verbose version simply adds the docstring for the test, so it does not really add much.

    In the following example, look at the difference in reporting between the four assertions. The first assertion has an = predicate comparing literal numbers, the second has an equal predicate comparing literal numbers, the third compares variables and supplies a diagnostic string whose format parameters are filled by the variables passed after it, and the fourth expects a condition that the body never signals.

    (deftest t1-fail ()
      "Docstring for test t1-fail"
      (let ((x 1) (y 2))
        (is (= 1 2))
        (is (equal 1 2))
        (is (= x y)
            "This test was meant to fail because we know ~a is not = to ~a"
            x y )
        (signals division-by-zero
                 (error 'floating-point-overflow)
                 "testing condition assertions. This should fail")))
    
    (run-tests 't1-fail)
    T1-FAIL...................................................................[FAIL]
    
    Test run had 4 failures:
    
    Failure 1: UNEXPECTED-ERROR when running T1-FAIL
    arithmetic error FLOATING-POINT-OVERFLOW signalled
    
    Failure 2: FAILED-ASSERTION when running T1-FAIL
    This test was meant to fail because we know 1 is not = to 2
    
    Failure 3: FAILED-ASSERTION when running T1-FAIL
    Binary predicate (EQUAL X Y) failed.
    x: 1 => 1
    y: 2 => 2
    
    Failure 4: FAILED-ASSERTION when running T1-FAIL
    Binary predicate (= X Y) failed.
    x: 1 => 1
    y: 2 => 2
    NIL
    (#<test-run of T1-FAIL: 1 test, 4 assertions, 4 failures in NIL sec (3 failed assertions, 1 error, none expected)>)
    

    The interactive version would look like this:

    (run-tests 't1-fail :interactive t)
      Test assertion failed when running T1-FAIL:
    
      Binary predicate (= X Y) failed.
      x: 1 => 1
      y: 2 => 2
         [Condition of type FIASCO::FAILED-ASSERTION]
    
      Restarts:
       0: [CONTINUE] Roger, go on testing...
       1: [CONTINUE] Skip the rest of the test T1-FAIL and continue by returning (values)
       2: [RETEST] Rerun the test T1-FAIL
       3: [CONTINUE-WITHOUT-DEBUGGING] Turn off debugging for this test session and invoke the first CONTINUE restart
       4: [CONTINUE-WITHOUT-DEBUGGING-ERRORS] Do not stop at unexpected errors for the rest of this test session and continue by invoking the first CONTINUE restart
       5: [CONTINUE-WITHOUT-DEBUGGING-ASSERTIONS] Do not stop at failed assertions for the rest of this test session and continue by invoking the first CONTINUE restart
    
  2. Basics

    The empty form after the test name is for parameters to pass to the test. Calling the test using run-tests returns a list of context objects (instances of an internal class). Each test run is pushed onto a history of test runs kept in the appropriately named *test-result-history*.

    (deftest t1 ()
      "docstring for t1"
      (is (=  1 1) "first assertion")
      (is (eq 'a 'a) "second assertion")
      (signals division-by-zero (error 'division-by-zero))
      (finishes (+ 1 1)))
    T1
    (run-tests 't1) ;; or (run-tests '(t1))
    T1........................................................................[ OK ]
    
    T
    (#<test-run of T1: 1 test, 4 assertions, 0 failures in 1.4e-5 sec>)
    

    If you add the keyword parameter :verbose, you get slightly more information in that it prints the test docstring (but not the assertion docstrings) and the number of assertions, failures etc.

    (run-tests 't1 :verbose t)
    T1........................................................................[ OK ]
    (docstring for t1)
    (4 assertions, 0 failed, 0 errors, 0 expected)
    

    Fiasco tests are funcallable. Note that calling the test in this fashion returns a single test-run object rather than a list of test-run objects.

    (t1)
    .
    T
    #<test-run of T1: 1 test, 1 assertion, 0 failures in 2.8e-5 sec>
    
    (funcall 't1)
    .
    T
    #<test-run of T1: 1 test, 4 assertions, 0 failures in 2.6e-5 sec>
    

    Fiasco tests also take parameters as in this example:

    (deftest t1-param (x) (is (= 1 x)))
    
    (t1-param 1)
    #<test-run of T1-PARAM: 1 test, 1 assertion, 0 failures in 2.2e-5 sec>
    
    (t1-param 2)
    X; Evaluation aborted on #<FIASCO::FAILED-ASSERTION "Binary assertion function ~A failed.~%~
                                   x: ~S => ~S~%~
                                   y: ~S => ~S" {100297EE03}>.
    

    You do not have to manually recompile a test after a tested function has been modified. We will skip the proof.

  3. Edge Cases: Values expressions, loops, closures and calling other tests

    Fiasco has no special functionality for dealing with values expressions. It accepts them but merely looks at the initial value provided by each expression. The following passes.

    (deftest t2-values-expressions ()
      (is (equalp (values 1 2)
                  (values 1 3))))
    
    1. Looping and closures.

      Fiasco has no problems with using variables declared in a closure surrounding the test.

    2. Calling another test from a test
      (deftest t3 () ; a test that tries to call another test in its body
        (is (eq 'a 'a))
        (t2))
      
      (t3)
      .XX.
      T
      #<test-run of T3: 2 tests, 4 assertions, 2 failures in 0.063884 sec (2 failed assertions, 0 errors, none expected)>
      

      As hoped, the failures in t2 kicked us into the debugger where we could select continue and correctly end up with 2 tests, 4 assertions and 2 failures. This is better than most frameworks which would present us with two reports rather than a composed report.

  4. Suites, tags and other multiple test abilities
    1. Lists of tests

      Fiasco has no problem running lists of tests

      (run-tests '(t1 t2))
      T1........................................................................[ OK ]
      
      T2........................................................................[ OK ]
      
      T
      (#<test-run of T1: 1 test, 1 assertion, 0 failures in 1.4e-5 sec>
       #<test-run of T2: 1 test, 3 assertions, 0 failures in 5.0e-6 sec>)
      
    2. Suites

      Fiasco can run suites or all the tests associated with a fiasco-defined package.

      Let's start with packages.

      1. Packages

        You will need to use fiasco's define-test-package macro rather than define-package in order to use the run-package-tests function. Inside the new package, the function run-package-tests is the preferred way to execute the suite. To run the tests from outside, use run-tests.

        The function run-package-tests prints a report and returns two values. It accepts a :stream keyword parameter, making it easy to redirect the output to a file if so desired.

        The first value returned will be t if all tests passed, nil otherwise. The second value will be a list of context objects which contain various information about the test run. See the following example, modified slightly from https://github.com/joaotavora/fiasco/blob/master/test/suite-tests.lisp.

        (fiasco:define-test-package #:tf-fiasco-examples)
        
        (in-package :tf-fiasco-examples)
        
        (defun seconds (hours-and-minutes)
          (+ (* 3600 (first hours-and-minutes))
             (* 60 (second hours-and-minutes))))
        
        (defun hours-and-minutes (seconds)
          (list (truncate seconds 3600)
                (truncate seconds 60)))
        
        (deftest test-conversion-to-hours-and-minutes ()
          (is (equal (hours-and-minutes 180) '(0 3)))
          (is (equal (hours-and-minutes 4500) '(1 15))))
        
        (deftest test-conversion-to-seconds ()
          (is (= 60 (seconds '(0 1))))
          (is (= 4500 (seconds '(1 15)))))
        
        (deftest double-conversion ()
          (is (= 3600 (seconds (hours-and-minutes 3600))))
          (is (= 1234 (seconds (hours-and-minutes 1234)))))
        
        (deftest test-skip-test ()
          (skip)
          ;; These should not affect the test statistics below.
          (is (= 1 1))
          (is (= 1 2)))
        
        (run-package-tests :package :tf-fiasco-examples)
        TF-FIASCO-EXAMPLES (Suite)
          TEST-CONVERSION-TO-HOURS-AND-MINUTES....................................[FAIL]
          TEST-CONVERSION-TO-SECONDS..............................................[ OK ]
          DOUBLE-CONVERSION.......................................................[FAIL]
          TEST-SKIP-TEST..........................................................[SKIP]
        
        Test run had 3 failures:
        
          Failure 1: FAILED-ASSERTION when running DOUBLE-CONVERSION
            Binary assertion function (= X Y) failed.
            x: 1234 => 1234
            y: (SECONDS (HOURS-AND-MINUTES 1234)) => 1200
        
          Failure 2: FAILED-ASSERTION when running DOUBLE-CONVERSION
            Binary assertion function (= X Y) failed.
            x: 3600 => 3600
            y: (SECONDS (HOURS-AND-MINUTES 3600)) => 7200
        
          Failure 3: FAILED-ASSERTION when running TEST-CONVERSION-TO-HOURS-AND-MINUTES
            Binary assertion function (EQUAL X Y) failed.
            x: (HOURS-AND-MINUTES 4500) => (1 75)
            y: '(1 15) => (1 15)
        NIL
        (#<test-run of TF-FIASCO-EXAMPLES: 5 tests, 6 assertions, 3 failures in 5.5e-4 sec (3 failed assertions, 0 errors, none expected)>)
        

        You can drop the explanations of the failures by passing nil to :describe-failures.

        (run-package-tests :package :tf-fiasco-examples :describe-failures nil)
        

        There is an undocumented function run-failed-tests which looks at the last test run. My issue with this function is that it seems to need *debug-on-assertion-failure* and *debug-on-unexpected-error* set to T in order to work, which means that it forces me into the debugger whether I want it to or not.
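
        A sketch of what using it implies, based on the variables named above (double colons in case these are not exported):

        (setf fiasco::*debug-on-assertion-failure* t
              fiasco::*debug-on-unexpected-error* t)
        (fiasco::run-failed-tests)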

      2. Suites

        Suites are created by the (defsuite) macro, but they are really just tests that call other tests. Phil Gold's original concern about suites in Stefil was "My only problem with the setup is that I don't see a way to explicitly assign tests to suites, aside from dynamically binding stefil::*suite*. Normally, the current suite is set by in-suite, which requires careful attention if you're jumping between different suites often. (A somewhat mitigating factor is that tests remember which suite they were created in, so the current suite only matters for newly-defined tests.)" I think that concern is just as valid in fiasco.

        I find using suites in fiasco very confusing. Everything I looked at in quicklisp that used fiasco used run-package-tests rather than run-suite-tests. YMMV.

  5. Fixtures and Freezing Data

    None that I am aware of.

  6. Removing tests

    Fiasco has the ability to delete tests, but it is not an exported function:

    (fiasco::delete-test 't1)
    
  7. Sequencing, Random and Failure Only

    Fiasco has a function run-failed-tests to run the tests that failed last time.

  8. Skip Capability
    1. Assertions

      Fiasco has skip functions skip and skip-unless. The following will cause the test to skip the second assertion.

      (deftest test-skip-test ()
         (is (= 1 1))
         (skip)
         (is (= 1 2)))
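      
      The skip-unless variant skips the rest of the test unless a condition holds. A minimal sketch, assuming skip-unless takes the condition as its argument (the feature check is just for illustration):

      (deftest test-threads-only ()
        (skip-unless (member :thread-support *features*))
        (is (= 1 1)))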
      
  9. Random Data Generators

    None

19.4. Additional Discussion

While Fiasco has a function to re-run failed tests, if I wanted to collect the names of the failing tests so that I could save them for some other purpose, I might do something like:

(defun collect-test-failure-names (package-name)
  "Runs a package test on the package and returns the names of the failing tests"
  (multiple-value-bind (x results)
      (run-package-tests :package package-name :describe-failures nil)
    (declare (ignore x))
    (let ((result (first results)))
      (when (typep result 'fiasco::context)
        (loop for test-result in (fiasco::children-contexts-of result)
              when (fiasco::failures-of test-result)
                collect (fiasco::name-of (fiasco::test-of test-result)))))))

19.5. Who Uses Fiasco

At last count 24 libraries on quicklisp use Fiasco. If you have quicklisp, you can get a list of those with:

(ql:who-depends-on :fiasco)

20. fiveam

20.1. Summary

homepage Edward Marco Baringer BSD 2020

Fiveam has a lot of market share. At the time of writing, it has 10 issues and 9 pull requests with no responses. The README at the github page is lacking, but good documentation exists at common-lisp.net or the turtleware tutorial. There are also examples at the Common Lisp Cookbook.

Obviously with its market share there is a lot to like. It does report all the assertion failures in a test, allows user defined diagnostic messages with variables, interactive debugging is optional, it has suites and it runs lists of tests.

From a speed standpoint, it is either middle of the pack or, on a big test package and running in an emacs repl, vying with clunit for painfully slow. In such a case, if you are deciding between more tests with fewer assertions or fewer tests with more assertions, go with more tests and fewer assertions (but this is an emacs problem more than a fiveam problem). I do not know what using other editors would be like.

My wishlist for Fiveam: better fixture capability, the edge-case ability to handle values expressions and variables declared in closures surrounding the test, and getting rid of all those blank lines in the failure reports.

20.2. Assertion Functions

is is-false finishes signals fail pass skip

20.3. Usage

Generally speaking, tests are called using the run and run! functions. If you set *run-test-when-defined* to T, tests will be run as soon as they are defined (which includes hitting C-c C-c on the source code, assuming you are doing this in an editor with slime or sly or some such).
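
For example, to turn that behavior on:

(setf *run-test-when-defined* t) ; assumes a package that uses fiveam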

  1. Report Format

    Fiveam will default to a reporting format. The format you get will depend on whether you call run or run!.

    • run provides a progress report using the typical dot/f/e format and returns a list of the assertion result objects.
    • run! provides the progress report plus more details on failures, but does not return the passing assertion result objects.

    The following will allow you to turn off the progress report:

    (let ((fiveam:*test-dribble* (make-broadcast-stream)))
      (fiveam:run! …))
    

    To demonstrate the difference in the reports, assume the following test that has a couple of passes and a couple of failures. We will insert a diagnostic string in the second assertion with a couple of variables to use in the string.

    (test t1-fail
      "describe t1-fail"
      (let ((x 1) (y 2))
        (is (eql 1 2))
        (is (equal x y)
            "We deliberately ensured that the first parameters ~a is not equal to the second parameter ~a" x y)
        (is-false (eq 'b 'b))
        (pass "I do not want to run this test of ~a but will say it passes anyway" '(= 1 1))
        (skip "Skip the next test because reasons")
        (finishes (+ 1 2))
        (signals division-by-zero (error 'floating-point-overflow))))
    

    Now using the simple run, we get:

    (run 't1-fail)
    
    Running test T1-FAIL fff.ss.X
    (#<IT.BESE.FIVEAM::UNEXPECTED-TEST-FAILURE {10023556C3}>
     #<IT.BESE.FIVEAM::TEST-PASSED {10023550B3}>
     #<IT.BESE.FIVEAM::TEST-SKIPPED {1002354F43}>
     #<IT.BESE.FIVEAM::TEST-PASSED {1002354CF3}>
     #<IT.BESE.FIVEAM::TEST-FAILURE {1002354043}>
     #<IT.BESE.FIVEAM::TEST-FAILURE {1002353463}>
     #<IT.BESE.FIVEAM::TEST-FAILURE {1002352D03}>)
    

    We can immediately see that (run) gives us a progress report showing f for failure, . for pass, s for skip and X for an error, plus a list of test-result objects. We can get more details, including the diagnostic messages, using (run!).

      (run! 't1-fail)
    
    
    Running test T1-FAIL fff.ss.X
     Did 7 checks.
        Pass: 2 (28%)
        Skip: 1 (14%)
        Fail: 4 (57%)
    
     Failure Details:
     --------------------------------
     T1-FAIL in S0 [describe t1-fail]:
    
    2
    
     evaluated to
    
    2
    
     which is not
    
    EQL
    
     to
    
    1
    
    
     --------------------------------
     --------------------------------
     T1-FAIL in S0 []:
          We deliberately ensured that the first parameter 1 is not equal to the second parameter 2
     --------------------------------
     --------------------------------
     T1-FAIL in S0 []:
          (EQ 'B 'B) returned the value T, which is true
     --------------------------------
     --------------------------------
     T1-FAIL in S0 []:
          Unexpected Error: #<FLOATING-POINT-OVERFLOW {100236BE63}>
    arithmetic error FLOATING-POINT-OVERFLOW signalled.
     --------------------------------
    
     Skip Details:
     T1-FAIL []:
         Skip the next test because reasons
    
    NIL
    (#<IT.BESE.FIVEAM::UNEXPECTED-TEST-FAILURE {100236C3C3}>
     #<IT.BESE.FIVEAM::TEST-FAILURE {100236AD43}>
     #<IT.BESE.FIVEAM::TEST-FAILURE {100236A163}>
     #<IT.BESE.FIVEAM::TEST-FAILURE {1002369D33}>)
    (#<IT.BESE.FIVEAM::TEST-SKIPPED {100236BC43}>)
    

    Did you notice anything about the test results returned using run compared to run!? run! did not return any test-passed results.

    Personally I hate the immense amount of wasted space fiveam generates using run!.

    By the way, if we set *verbose-failures* to T, it will add the failing expression to the failure details.

    Fiveam does have optionality to drop into the debugger on errors or failures. You can set those individually:

    • (setf *on-error* :debug) to drop into the debugger on an error, :backtrace for a backtrace, or nil otherwise.
    • (setf *on-failure* :debug) to drop into the debugger on a failed assertion, :backtrace for a backtrace, or nil otherwise.
  2. Basics

    We already saw a test using Fiveam's testing functions with some passes and fails above, so we will skip repeating ourselves.

    As expected, you do not have to manually recompile a test after a tested function has been modified.

  3. Edge Cases: Values expressions, loops, closures and calling other tests

    Now let's try a test with values expressions:

    (test t2 ; the most basic named test with multiple assertions and values expressions
      "describe t2"
      (let ((x 1) (y 2))
        (is (equal 1 2))
        (is (equal x y))
        (is (equal (values 1 2) (values 1 2)))))
    ; in: ALEXANDRIA:NAMED-LAMBDA %TEST-T2
    ;     (IT.BESE.FIVEAM:IS (EQUAL (VALUES 1 2) (VALUES 1 2)))
    ;
    ; caught ERROR:
    ;   during macroexpansion of (IS (EQUAL # #)). Use *BREAK-ON-SIGNALS* to intercept.
    ;
    ;    Both the expected and actual part is a values expression.
    ;
    ; compilation unit finished
    ;   caught 1 ERROR condition
    

    Fiveam threw an error on the values expression at compilation, but continued with the compilation.

    Now to run the failing test.

    (run! 't2)
    
    Running test T2 ffX
     Did 3 checks.
        Pass: 0 ( 0%)
        Skip: 0 ( 0%)
        Fail: 3 (100%)
     Failure Details:
     --------------------------------
     T2 [describe t2]:
    2
     evaluated to
    2
     which is not
    EQUAL
     to
    1
     --------------------------------
     --------------------------------
     T2 [describe t2]:
    Y
     evaluated to
    2
     which is not
    EQUAL
     to
    1
     --------------------------------
     --------------------------------
     T2 [describe t2]:
          Unexpected Error: #<SB-INT:COMPILED-PROGRAM-ERROR {102D38B283}>
    Execution of a form compiled with errors.
    Form:
      (IS (EQUAL (VALUES 1 2) (VALUES 1 2)))
    Compile-time error:
      during macroexpansion of (IS (EQUAL # #)). Use *BREAK-ON-SIGNALS* to intercept.
    
     Both the expected and actual part is a values expression..
     --------------------------------
    NIL
    (#<IT.BESE.FIVEAM::UNEXPECTED-TEST-FAILURE {102D38BB73}>
     #<IT.BESE.FIVEAM::TEST-FAILURE {102D38B1B3}>
     #<IT.BESE.FIVEAM::TEST-FAILURE {102D38AD93}>)
    NIL
    

    Notice the ffX in the report. The fs indicate failing assertions. The X indicates that the assertion threw an error instead of failing. No, fiveam does not like values expressions.

    What happens if we try to call a test inside a test?

    (test t3
      "a test that tries to call another test in its body"
      (is (equal 'a 'a))
      (run! 't2))
    
    (run! 't3)
    
    Running test T3 .
    Running test T2 ..X
     Did 3 checks.
        Pass: 2 (66%)
        Skip: 0 ( 0%)
        Fail: 1 (33%)
    
     Failure Details:
     --------------------------------
     T2 in S1 [describe t2]:
          Unexpected Error: #<SB-INT:COMPILED-PROGRAM-ERROR {1008EE35A3}>
    Execution of a form compiled with errors.
    Form:
      (IS (EQUAL (VALUES 1 2) (VALUES 1 2)))
    Compile-time error:
      during macroexpansion of (IS (EQUAL # #)). Use *BREAK-ON-SIGNALS* to intercept.
    
     Both the expected and actual part is a values expression..
     --------------------------------
    
     Did 1 check.
        Pass: 1 (100%)
        Skip: 0 ( 0%)
        Fail: 0 ( 0%)
    
    T
    

    So we can run tests within tests.

    1. Closure Variables

      Fiveam cannot find variables declared in a closure surrounding the test. For example, the following fails.

      (let ((l1 '(#\a #\B #\z))
            (l2 '(97 66 122)))
        (test t2-loop
          (loop for x in l1 for y in l2 do
            (is (= (char-code x) y)))))
      
  4. Suites, tags and other multiple test abilities
    1. Lists of tests

      Fiveam can run lists of tests

      (run! '(t1 t2))
      
    2. Suites

      Suites are relatively straightforward in fiveam so long as you remember that you need to define yourself as being in a suite; any test defined after that will be placed in that suite. I surprised myself once after compiling a test file and then defining some tests in the REPL. As far as fiveam was concerned I was still in the suite defined in the test file, so the tests defined in the REPL had been added to the suite.

      (def-suite :s0 ; Ultimate parent suite
        :description "describe suite 0")
      
      (in-suite :s0)
      ;; Any test defined after this will be in suite s0 until a new suite is specified
      
      (test t4 ; a test that is a member of a suite
        "describe t4"
        (is (equal 1 1)))
      
      (run! :s0)
      

      Suites can be nested. Here we have suite :s1 nested in suite :s0:

      (def-suite :s1
        :in :s0)
      
  5. Fixtures and Freezing Data

    As far as I can tell, fixtures and freezing data are basically the same for Fiveam. The fiveam maintainer admits that maybe its fixture capability is not "the best designed feature".

    ;; Create a class for data fixture purposes
    (defclass class-A ()
      ((a :initarg :a :initform 0 :accessor a)
       (b :initarg :b :initform 0 :accessor b)))
    
    (defparameter *some-existing-data-parameter*
      (make-instance 'class-A :a 17.3 :b -12))
    
    (def-fixture f1 ()
      (let ((old-parameter *some-existing-data-parameter*))
        (setf *some-existing-data-parameter*
            (make-instance 'class-A :a 100 :b -100))
        (&body)
        (setf *some-existing-data-parameter* old-parameter)))
    
    (def-test t6-f1 (:fixture f1)
      (is (equal (a *some-existing-data-parameter*) 100))
      (is (equal (b *some-existing-data-parameter*) -100)))
    
    ;; now you can check (a *some-existing-data-parameter*) to ensure defining the test has not changed *some-existing-data-parameter*
    
    (run! 't6-f1)
    
    Running test T6-F1 ..
     Did 2 checks.
        Pass: 2 (100%)
        Skip: 0 ( 0%)
        Fail: 0 ( 0%)
    

  6. Removing tests

    Fiveam has the functions rem-test and rem-fixture.
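
    For example, removing the t1 test defined earlier (a sketch; rem-test takes the test's name):

    (rem-test 't1)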

  7. Sequencing, Random and Failure Only

    The tests in a suite are run in randomly shuffled order. For failure-only purposes, run! returns a list of just the failed test-result objects, while run returns all of the result objects (see the Discussion below for collecting failing test names).

  8. Skip Capability

    Fiveam does have some skip capability: the skip testing function records a skipped check with an explanation, as seen in the t1-fail example above.

  9. Random Testing and Data Generators

    Fiveam generates lambda functions for buffers, characters, floats, integers, lists, one-element selections, strings and trees. Some examples:

    (funcall (gen-float))
    1.3259344e38
    
    (funcall (gen-buffer))
    #(115 238 129 72 84 40 230)
    
    (funcall (gen-character :code-limit 256))
    #\Etx
    
    (funcall (gen-integer :max 27 :min -16))
    -4
    
    (funcall (gen-list ))
    (-1 4)
    
    (funcall (gen-string))
    "򅦜􇨲򫎂𣻨򋷂񋖧􌽆󗍨𪽉𴾻󮨠󙢝鞀󻕨򐓺蠿𬚽𬁬񭷱򐖴㍨󀜤󘛋򉚇򓉛𠫼򞼫񸔝𺍬񴫰㽈󽜔󇠰񅉳鉄󠪔"
    
    (funcall (gen-string :elements (gen-character :code-limit 122 :alphanumericp t)))
    "exAarlUllrgsQZQAnUYeKIbZQuPYAKNLvTyMcIYlLoYS"
    
    (funcall (gen-tree :size 10))
    ((((-2 ((-3 6) (2 ((3 (6 (10 10))) ((10 4) -9))))) (-8 -8))
      (((-7 8) -3) (-10 ((((1 -5) (6 ((-9 -6) 4))) ((5 -9) (0 (-4 -8)))) -2))))
     (((((9 (5 ((3 -1) ((0 -10) -5)))) (((4 (7 -8)) (-5 (6 7))) -4)) -3)
       (6 (2 ((-5 6) (2 (((9 -1) -5) -5))))))
      6))
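    
    These generators plug into fiveam's random-testing macro for-all, which binds freshly generated values and runs the body's checks repeatedly (*num-trials* trials by default). A minimal sketch checking that addition commutes:

    (test t7-random
      (for-all ((x (gen-integer :min -10 :max 10))
                (y (gen-integer :min -10 :max 10)))
        (is (= (+ x y) (+ y x)))))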
    

20.4. Discussion

If you recall, run returns a list of all test-result objects, but run! returns just the failing test-result objects. If you wanted to use run, but just wanted a list of the failing test names, you can do something like the following:

(defun collect-failing-test-case-names (suite)
  "Takes a suite, calls the run function and returns a list of the test names that failed."
  (loop for x in (run suite)
        when (typep x 'fiveam::test-failure)
          collect (fiveam::name (fiveam::test-case x))))

(collect-failing-test-case-names :s0)

Running test suite S0
Running test T4 .
Running test T4-ERROR ..f
Running test T4-FAIL f
Running test T6-F1 ..
Running test T5 f
Running test T4-FAIL-2 f
(T4-ERROR T4-FAIL T5 T4-FAIL-2)

Fiveam does not have a time limit threshold that you can set like Parachute or Prove, but you can set a *max-trials* variable to prevent infinite loops. It also has undocumented profiling capability that I did not look at.

20.5. Who Uses Fiveam

Many libraries on quicklisp use fiveam. If you have quicklisp, you can get a list of those with:

(ql:who-depends-on :fiveam)

21. lift

21.1. Summary

homepage Gary Warren King MIT 2019 (c)

Documentation for Lift can be found here, but a lot of sections are "To be written".

The original Phil Gold review noted his concerns about speed and memory footprint: "The larger problem, though, was its speed and memory footprint. Defining tests is very slow; when using LIFT, the time necessary to compile and load all of my Project Euler code jumped from the other frameworks' average of about 1.5 minutes to over nine minutes. Redefining tests felt even slower than defining them initially, but I don't have solid numbers on that. After loading everything, memory usage was more than twice that of other frameworks. Running all of the tests took more than a minute longer than other frameworks, though that seems mostly to be a result of swapping induced by LIFT's greater memory requirements."

I did not benchmark compiling tests, just running the tests, and as you can see from the benchmarks, lift is one of the fastest frameworks. His concern on runtime was not borne out in my benchmark, but uax-15 is also very different from Project Euler. YMMV.

There is a lot I like about Lift, and there are undocumented features that you could spend a few days exploring.

There are two annoyances.

  • Multiple assertion problem: If you have multiple assertions in a test, the test stops at the first assertion failure. I can understand that if the intent is to get thrown into the debugger and fix the failure immediately, but not when you are running reports. There are reasons why I would put multiple assertions into a test. The obvious workaround is only one assertion per test. You then have to create possibly hundreds of tests and then use addtest to add each one to your suite.
  • Clumsy failure reporting

Lift has both hierarchical suites and tags, what it calls categories.

21.2. Assertion Functions

ensure ensure-cases
ensure-cases-failure ensure-condition
ensure-different ensure-directories-exist
ensure-directory ensure-error
ensure-every ensure-expected-condition
ensure-expected-no-warning-condition ensure-failed
ensure-failed-error ensure-function
ensure-generic-function ensure-list
ensure-member ensure-no-warning
ensure-not-same ensure-null
ensure-null-failed-error ensure-random-cases
ensure-random-cases+ ensure-random-cases-failure
ensure-same ensure-some
ensure-string ensure-symbol
ensure-warning  

21.3. Usage

Unlike most frameworks, lift provides the variables *test-maximum-error-count*, *test-maximum-failure-count* and *test-maximum-time*, which can be set so that large sets of failing tests can be shut down early without wasting time.
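
For example, to cut off a pathological run early (the particular limits here are arbitrary):

(setf *test-maximum-error-count* 5
      *test-maximum-failure-count* 10
      *test-maximum-time* 2)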

Lift has a lot of undocumented functionality. For example, you can generate log-entries, which seem to have something to do with sample counts and profiling, but those have no documentation either.

  1. Report Format

    Like some other frameworks, Lift will run tests as you compile them. If you use run-test (singular) or run-tests (plural), you get a very limited amount of information that will look something like this:

    (run-test :name 't4-s1-1)
    #<S1.T4-S1-1 failed>
    
    (run-tests)
    Start: S0
    #<Results for S0 1 Tests, 1 Failure>
    

    If you (setf *test-describe-if-not-successful?* t) you get a lot more information, but you are really just running (describe) on the test result object. We run only the single test version this time:

    (run-test :name 't4-s1-1)
    #<S1.T4-S1-1 failed
    Failure: s1 : t4-s1-1
      Documentation: NIL
      Source       : /tmp/slimeBSclUz
      Condition    : Ensure failed: (= 2 3) ()
      During       : (END-TEST)
      Code         : (
      ((ENSURE (= 2 3)) (ENSURE (= 4 4)) (ENSURE (EQL 'B 'C))))
      >
    

    Lift can also print test result details. The first parameter is the stream to which to direct the output, the second is the test result (here obtained by running the test), the third is show-expected-p and the fourth is show-code-p. All parameters must be provided.

    (print-test-result-details *standard-output* (run-test :name 't1-fail) t t)
    Failure: tf-lift : t1-fail
      Documentation: NIL
      Source       : NIL
      Condition    : Ensure failed: (= X
                                       Y) (This test was meant to fail because 1 is not = 2)
      During       : (END-TEST)
      Code         : (
      ((LET ((X 1) (Y 2))
         (ENSURE (= X Y) :REPORT "This test was meant to fail because ~a is not = ~a"
                 :ARGUMENTS (X Y)))))
    

    The function run-tests takes a keyword parameter :report-pathname which will direct a substantial amount of information to the designated file. The following example runs all the tests associated with suite s0 (either directly or indirectly). You can also set the variable *lift-report-pathname* to a pathname. Any subsequent failure reports will be printed there. Lift does not have progress reports if that is important to you.

    (run-tests :suite 's0 :report-pathname #P"/tmp/lift-1.txt")
    

    Opening that file may show results looking something like:

    ((:RESULTS-FOR . S0)
    (:ARGUMENTS . (:SUITE ("S0" . "TF-LIFT") :REPORT-PATHNAME #P"/tmp/lift-1.txt"))
    (:FEATURES . (:HUNCHENTOOT-SBCL-DEBUG-PRINT-VARIABLE-ALIST :5AM
                  :OSICAT-FD-STREAMS :ITER :NAMED-READTABLES :UTF-32 :TOOT
                  :SBCL-DEBUG-PRINT-VARIABLE-ALIST :SPLIT-SEQUENCE
                  CFFI-FEATURES:FLAT-NAMESPACE CFFI-FEATURES:X86-64
                  CFFI-FEATURES:UNIX :CFFI CFFI-SYS::FLAT-NAMESPACE :FLEXI-STREAMS
                  :CL-FAD :CHUNGA :LISP-UNIT :CLOSER-MOP :CL-PPCRE
                  :BORDEAUX-THREADS ALEXANDRIA::SEQUENCE-EMPTYP :THREAD-SUPPORT
                  :SWANK :QUICKLISP :ASDF3.3 :ASDF3.2 :ASDF3.1 :ASDF3 :ASDF2 :ASDF
                  :OS-UNIX :NON-BASE-CHARS-EXIST-P :ASDF-UNICODE :X86-64 :GENCGC
                  :64-BIT :ANSI-CL :COMMON-LISP :ELF :IEEE-FLOATING-POINT :LINUX
                  :LITTLE-ENDIAN :PACKAGE-LOCAL-NICKNAMES :SB-LDB :SB-PACKAGE-LOCKS
                  :SB-THREAD :SB-UNICODE :SBCL :UNIX))
    (:DATETIME . 3831119803)
    )
    (
    (:SUITE . ("S0" . "TF-LIFT"))
    (:NAME . ("T4-S0-1" . "TF-LIFT"))
    (:START-TIME . 3831119803000)
    (:END-TIME . 3831119803000)
    (:SECONDS . 0.0d0)
    (:CONSES . 0)
    (:RESULT . T)
    )
    (
    (:SUITE . ("S0" . "TF-LIFT"))
    (:NAME . ("T4-S0-2" . "TF-LIFT"))
    (:START-TIME . 3831119803000)
    (:END-TIME . 3831119803000)
    (:PROBLEM-KIND . "failure")
    (:PROBLEM-STEP . :END-TEST)
    (:PROBLEM-CONDITION . "#<ENSURE-FAILED-ERROR {100DA7E143}>")
    (:PROBLEM-CONDITION-DESCRIPTION . "Ensure failed: (= 1 2) ()")
    )
    (
    (:TEST-CASE-COUNT . 2)
    (:TEST-SUITE-COUNT . 1)
    (:FAILURE-COUNT . 1)
    (:ERROR-COUNT . 0)
    (:EXPECTED-FAILURE-COUNT . 0)
    (:EXPECTED-ERROR-COUNT . 0)
    (:SKIPPED-TESTSUITES-COUNT . 0)
    (:SKIPPED-TEST-CASES-COUNT . 0)
    (:START-TIME-UNIVERSAL . 3831119803)
    (:END-TIME-UNIVERSAL . 3831119803)
    (:FAILURES . ((("S0" . "TF-LIFT") ("T4-S0-2" . "TF-LIFT")))))
    

    Adding a test and compiling it will, as noted, cause it to be run immediately, but all you get is pass or fail. Let's try to get a little more information by using describe *.

      (addtest (s1) t4-s1-4
          (ensure (= 3 4)))
      #<Test failed>
    
    (describe *)
    Test Report for S1: 1 test run, 1 Failure.
    
    Failure: s1 : t4-s1-4
      Documentation: NIL
      Source       : NIL
      Condition    : Ensure failed: (= 3 4) ()
      During       : (END-TEST)
      Code         : (
      ((ENSURE (= 3 4))))
    
    Test Report for S1: 1 test run, 1 Failure.
    

    I would note that in tests with multiple assertions, Lift only shows the first failure, not all failures. That is a real problem for me because I want the results of multiple assertions if I am tracking down one of my many bugs.

    To go interactive - dropping immediately into the debugger - you would set one or more keyword parameters based on what condition should throw you into the debugger. The following will throw you into the debugger on failures but not on errors.

    (run-test :name 't1-function :break-on-errors? nil :break-on-failures? t)
    
  2. Basics

    Lift really wants a suite to be defined first before any tests are defined. The first form after the suite name would contain the name of a parent suite (if any). The second form would be used for suite slot specifications which are used with fixtures and will be discussed below.

    (deftestsuite tf-lift () ())
    

    Starting with the most basic named test. This adds a test to the most recently defined suite, or you can insert a form before the test name which specifies the suite for the test.

    (addtest t1
        (ensure (equal 1 1))
        (ensure-condition division-by-zero
          (error 'division-by-zero))
        (ensure-same 1 1)
        (ensure-different '(1 2 3) '(1 3 4)))
    

    Besides running the test on compilation, we can also run the test using the run-test function, specifying the test name with the keyword parameter :name.

      (run-test :name 't1)
    
    #<TF-LIFT.T1 passed>
    

    Now adding a basic failing test. Let's make sure we get a bit more information on failing tests by inserting a :report keyword with a descriptive string containing format-style directives and an :arguments keyword listing the values to fill them with. Then we use describe against the results of running the test.

    (addtest t1-fail
             (let ((x 1) (y 2))
               (ensure (= x y)
                       :report "This test was meant to fail because ~a is not = ~a"
                       :arguments (x y))))
    
    (describe (run-test :name 't1-fail))
    Test Report for TF-LIFT: 1 test run, 1 Failure.
    
    Failure: tf-lift : t1-fail
    Documentation: NIL
    Source       : NIL
    Condition    : Ensure failed: (= X
                                     Y) (This test was meant to fail because 1 is not = 2)
    During       : (END-TEST)
    Code         : (
                    ((LET ((X 1) (Y 2))
                          (ENSURE (= X Y) :REPORT
                                  "This test was meant to fail because ~a is not = ~a"
                                  :ARGUMENTS (X Y)))))
    
    Test Report for TF-LIFT: 1 test run, 1 Failure.
    

    As one would hope, you do not need to manually recompile a test just because a tested function is modified.

  3. Edge Cases: Multiple failing assertions, Values expressions, loops, closures and calling other tests
    1. Multiple assertions and Value expressions

      First checking a test with multiple failing assertions. The answer is yes and no, which surprised me: the test accepts multiple assertions, but only the first failure gets reported.

        (addtest t2
          (ensure (= 1 2))
          (ensure (= 2 3)))
      
      (print-test-result-details *standard-output* (run-test :name 't2) t t)
      Failure: tf-lift : t2
        Documentation: NIL
        Source       : NIL
        Condition    : Ensure failed: (= 1 2) ()
        During       : (END-TEST)
        Code         : (
        ((ENSURE (= 1 2)) (ENSURE (= 2 3))))
      

      Obviously we expected the test to fail. But I expected two assertions to be shown as failing, not only the first one. I can understand that if the intent is to just throw me into the debugger on the first failure, but not in a reporting situation.

      Lift has no special functionality for dealing with values expressions. It accepts them but merely looks at the initial value provided by each expression.

    2. Now looping and closures.

      Checking whether Lift can run tests using variables declared in a closure encompassing the test. Yes.

        (let ((l1 '(#\a #\B #\z))
                (l2 '(97 66 122)))
          (addtest t2-loop
            (loop for x in l1 for y in l2 do
              (ensure (= (char-code x) y)))))
      #<Test passed>
      
    3. Calling another test from a test

      We know tests are not functions in Lift, but can a test call another test in its body? We know test t2 should fail.

        (addtest t3
          (ensure (eql 'a 'a))
          (run-test :name 't2))
      
      (run-test :name 't3)
      #<TF-LIFT.T3 passed>
      

      It does not look like t3 actually ran t2: t2's failure is nowhere reflected in t3's passing result.

  4. Suites, tags and other multiple test abilities
    1. Lists of tests

      Lift cannot run lists of tests outside the suite functionality.

    2. Suites

      We already know that we had to set up a suite for tests, in our case a suite named tf-lift.

      Unlike most test frameworks, lift actually provides a function which will print out the names of the tests included in the suite.

      (print-tests :start-at 'tf-lift)
      TF-LIFT (10)
        T1
        T1-FAIL
        T1-FUNCTION
        T2
        T2-VALUES
        T2-LOOP
        T2-WITH-MULTIPLE-VALUES
        T3
        T7-ERROR
        T7-BAD-ERROR
      

      Let's start with the same simple inheritance structure we have been using with other frameworks.

      (deftestsuite s0 () ())
      ;; a test that is a member of a suite because it is defined after a defsuite
      (addtest t4-s0-1
        (ensure-same 1 1))
      
      ;; Add another test suite
      (deftestsuite s1 () ())
      
      ;; add another test, but preface the name with s0 in the form, making this test part of suite s0
      (addtest (s0) t4-s0-2
        (ensure (= 1 2)))
      
      ;; add a test, specifying that this one is part of suite s1
      (addtest (s1) t4-s1-1
        (ensure (= 2 3)))
      
      ;; Now run tests for suite s0 and s1 respectively and we see that s0 does indeed have two tests and suite s1 has one test.
      (run-tests :suite 's0)
      Start: S0
      #<Results for S0 2 Tests, 1 Failure>
      
      (run-tests :suite 's1)
      Start: S1
      #<Results for S1 1 Test, 1 Failure>
      

      We now define suite s2 which is a child suite of s0 and add a test

      (deftestsuite s2 (s0) ())
      
      (addtest (s2) t4-s2-1
          (ensure (= 1 1))
        (ensure (eq 'a 'a)))
      

      If we now apply RUN-TESTS to suite s0, we see that it runs both s2 and s0

      (run-tests :suite 's0)
      Start: S0
      Start: S2
      #<Results for S0 3 Tests, 1 Failure>
      

      If we run it with a :report-pathname keyword parameter set, we can get a lot more information sent to a file:

      (run-tests :suite 's0 :report-pathname #P"/tmp/lift-1.txt")
      

  5. Fixtures and Freezing Data

    Variables can be created and set at the suite level, making those variables available down the suite inheritance chain.

    (deftestsuite s3 ()
      ((a 1) (b 2) (c 3)))
    
    (addtest t4-s3-1
             (ensure-same 1 a))
    
    (deftestsuite s4 (s3)
      ((d 4)))
    
    (addtest t4-s4-1
             (ensure-same 2 b)
             (ensure-same 4 d))
    
    (addtest (s3) t4-s3-2
             (ensure-same 2 b))
    

    These all pass because the tests can see the variables created in their suite and the parent suites. If we created a test in suite s3 that tried to reference the variable d created in suite s4 (a lower level suite), we would get an undefined variable error.

    Suites also have :setup and :teardown keyword parameters and an additional :run-setup parameter that controls when the setup provisions are performed. The default is :once-per-test-case (setup again for each and every test in the suite). The other alternatives are :once-per-suite and :never.

    (deftestsuite s5 () (db) ; db is a suite slot used by the setup and teardown
      (:setup
        (setf db (open-data "bar" :if-exists :supersede)))
      (:teardown
        (setf db nil))
      (:run-setup :once-per-test-case))
    
  6. Removing tests

    Tests and suites can be removed using remove-test:

    (remove-test :suite 's2)
    
    (remove-test :test-case 't1)
    
  7. Sequencing, Random and Failure Only

    Do the tests in a suite run in sequential order, randomly or is it optional? Failure only testing (just running all the tests that failed last time) is nice to have, but of course you still need to be able to run everything at the end to ensure that fixing one bug did not create another.

  8. Skip Capability
    1. Tests

      The run-tests function takes a :skip-tests keyword parameter which accepts a list of test names to skip.
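
      For example, using the suite and test defined above:

      (run-tests :suite 's0 :skip-tests '(t4-s0-2))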

      The variable *test-maximum-time* controls the number of seconds that a test can take before lift gives up. It defaults to 2 seconds.

  9. Random Data Generators

    Lift has various random data generators:

    ;; (random-number suite min max)
    (random-number 's4 1 100)
    
    ;; (random-element suite sequence)
    (random-element 's4 '(a b c 23))
    

    If anyone can give a good example of the use of the DEFRANDOM-INSTANCE macro besides what is in the random-testing file, feel free to submit a pull request.

21.4. Discussion

I really want to like Lift, but I have a hard time getting over the fact that it stops at the first assertion failure.

21.5. Who Uses Lift

Many libraries on quicklisp use Lift. If you have quicklisp, you can get a list of those with:

(ql:who-depends-on :lift)

22. lisp-unit

22.1. Summary

homepage Thomas M. Hermann MIT 2017

Phil Gold's original concern about lisp-unit back in 2007 was its failure to scale. Specifically, he pointed to non-composable test reports and the fact that you could not get just failure reports, so the failures were lost in the reporting of successes and there was no count of failed tests. I think those have been addressed with the tags capabilities. Lisp-unit still focuses on counting assertions rather than tests, but you can now collect information on just the failing tests.

I generally like lisp-unit. It does not have progress reports, which might bother some people. My bigger concerns are its lack of fixtures and the fact that you can turn on debugging only for errors, not for failures. If you need floating point tests, those are built-in. Documentation can be found at the wiki.

22.2. Assertion Functions

assert-eq assert-eql assert-equal
assert-equality assert-equalp assert-error
assert-expands assert-false assert-float-equal
assert-nil assert-norm-equal assert-number-equal
assert-numerical-equal assert-prints assert-rational-equal
assert-result assert-sigfig-equal assert-test
assert-true check-type logically-equal
set-equal    

22.3. Usage

  1. Report Format

    Lisp-unit defaults to a reporting format shown below. You can do (setf *use-debugger* :ask) or (setf *use-debugger* t), but that will only throw you into the debugger if there is an actual error generated, not a failure (or failure to see the correct error). So, not complete debugger optionality.

    Lisp-unit will normally just count assertions passed, failed, and execution errors and report those. You will see in the first failing test examples how to get more information. You can also have it kick you into the debugger on errors by calling use-debugger. This only applies to errors and not failures.

    Calling run-tests will return an instance of a test-results-db object. You can get a list of failed test objects with the failed-tests function which also accepts an optional stream, allowing easy printing to a file:

    (failed-tests (run-tests :all) optional-stream)
    

    You can print the detailed failure information using the print-failures function:

    (print-failures (run-tests :all))
    

    Lisp-unit also has print and print-errors which also take an optional stream.

    If you like the TAP format, Lisp-unit also has (write-tap-to-file test-results path) and (write-tap test-results [stream]).

  2. Basics

    First, the basic test where we know everything is going to pass. Since Lisp-unit has macro expand and floating point assertion functions, we will show those in this example (so we need a macro just for the macroexpand test). See https://github.com/OdonataResearchLLC/lisp-unit/blob/master/extensions/floating-point.lisp and https://github.com/OdonataResearchLLC/lisp-unit/blob/master/extensions/rational.lisp for more information on the floating point and rational tests.

      (defmacro my-macro (arg1 arg2)
        (let ((g1 (gensym))
              (g2 (gensym)))
          `(let ((,g1 ,arg1)
                 (,g2 ,arg2))
             "Start"
             (+ ,g1 ,g2 3))))
    
    (define-test t1
      "describe t1"
      (assert-true (= 1 1))
      (assert-equal 1 1)
      (assert-float-equal 17 17.0000d0)
      (assert-rational-equal 3/2 3/2)
      (assert-true (set-equal '(a b c) '(b a c))) ;every element in both sets needs to be in the other
      (assert-true (logically-equal t t)) ; both true or both false
      (assert-true (logically-equal nil nil)) ; both true or both false
      (assert-expands
          (let ((#:G1 A) (#:G2 B)) "Start" (+ #:G1 #:G2 3))
          (my-macro a b))
      (assert-prints "12" (format t "~a" 12))
      (assert-error 'division-by-zero
                    (error 'division-by-zero)
                    "testing condition assertions"))
    

    Now run this test:

    (run-tests '(t1))
    Unit Test Summary
     | 10 assertions total
     | 10 passed
     | 0 failed
     | 0 execution errors
     | 0 missing tests
    
    #<TEST-RESULTS-DB Total(6) Passed(6) Failed(0) Errors(0)>
    

    Now a basic failing test.

    (define-test t1-fail
      "describe t1-fail"
      (let ((x 1) (y 2))
        (assert-true (= x y))
        (assert-equal x y)
        (assert-expands
            (let ((#:G1 D) (#:G2 B)) "Start" (+ #:G1 #:G2 3))
            (my-macro a b))
        (assert-prints "12" (format nil "~a" 12))
        (assert-error 'division-by-zero
                      (error 'floating-point-overflow)
                      "testing condition assertions")))
    
    (run-tests '(t1-fail))
    Unit Test Summary
     | 5 assertions total
     | 0 passed
     | 5 failed
     | 0 execution errors
     | 0 missing tests
    

    That told us assertions failed, but did not give a lot of information. Let's change the setup slightly, setting *PRINT-FAILURES* to t. (You can also print info just on errors by setting *PRINT-FAILURES* to nil and *PRINT-ERRORS* to t.)

    (setf *print-failures* t)
    
    (run-tests '(t1-fail))
     | Failed Form: (ERROR 'FLOATING-POINT-OVERFLOW)
     | Should have signalled DIVISION-BY-ZERO but saw #<FLOATING-POINT-OVERFLOW {100E6D4D13}>
     | "testing condition assertions" => "testing condition assertions"
     |
     | Failed Form: (FORMAT NIL "~a" 12)
     | Should have printed "12" but saw ""
     |
     | Failed Form: (MY-MACRO A B)
     | Should have expanded to (LET ((#:G1 D) (#:G2 B))
                                 "Start"
                                 (+ #:G1 #:G2 3))
    but saw (LET ((#:G1 A) (#:G2 B))
              "Start"
              (+ #:G1 #:G2 3)); T
     |
     | Failed Form: Y
     | Expected 1 but saw 2
     |
     | Failed Form: (= X Y)
     | Expected T but saw NIL
     | X => 1
     | Y => 2
     |
    T1-FAIL: 0 assertions passed, 5 failed.
    
    Unit Test Summary
     | 5 assertions total
     | 0 passed
     | 5 failed
     | 0 execution errors
     | 0 missing tests
    
    #<TEST-RESULTS-DB Total(5) Passed(0) Failed(5) Errors(0)>
    

    That gives more information, but notice the slight difference between the information provided for assert-equal - Failed Form: Y and the information provided for assert-true - Failed Form: (= X Y).

    We can get still more if we pass more info to the assertion clause. While the assertion compares the first two items, we can pass additional information that it will print on failures. Unlike some other frameworks, we cannot pass a diagnostic string which accepts interpolated variables, but we can pass a string and variables. This time we will reduce the test to just the assert-equal clause.

      (define-test t1-fail-short
          (let ((x 1) (y 2))
            (assert-equal x y "Diagnostic Message: X ~a should equal Y ~a" x y)))
    
    (run-tests '(t1-fail-short))
     | Failed Form: Y
     | Expected 1 but saw 2
     | "Diagnostic Message: X should equal Y" => "Diagnostic Message: X should equal Y"
     | X => 1
     | Y => 2
     |
    T1-FAIL-SHORT: 0 assertions passed, 1 failed.
    
    Unit Test Summary
     | 1 assertions total
     | 0 passed
     | 1 failed
     | 0 execution errors
     | 0 missing tests
    

    Of course, the usefulness of the diagnostic message will depend on the context.

    You do not have to manually recompile a test after a tested function has been modified.

  3. Edge Cases: Values expressions, loops, closures and calling other tests
    1. Value expressions

      Lisp-Unit has a pleasant surprise with respect to values expressions. Unlike almost all the other frameworks, Lisp-unit and Lisp-unit2 actually look at all the values in the values expressions:

        (define-test t2-values-expressions
          (assert-equal (values 1 2) (values 1 3))
          (assert-equal (values 1 2 3) (values 1 3 2)))
      
      (print-failures (run-tests '(t2-values-expressions)))
      Unit Test Summary
       | 2 assertions total
       | 0 passed
       | 2 failed
       | 0 execution errors
       | 0 missing tests
      
       | Failed Form: (VALUES 1 3 2)
       | Expected 1; 2; 3 but saw 1; 3; 2
       |
       | Failed Form: (VALUES 1 3)
       | Expected 1; 2 but saw 1; 3
       |
      T2-VALUES-EXPRESSIONS: 0 assertions passed, 2 failed.
      
    2. Closure Variables

      Lisp-Unit will not see the variables declared in a closure surrounding the test function, so the following would fail.

      (let ((l1 '(#\a #\B #\z))
            (l2 '(97 66 122)))
      (define-test t2-loop
        (loop for x in l1 for y in l2 do
          (assert-equal (char-code x) y))))
      
    3. Calling another test from a test

      While tests are not functions in lisp-unit, they can call other tests.

      (define-test t3 ; a test that tries to call another test in its body
        "describe t3"
        (assert-equal 'a 'a)
        (run-tests '(t2)))
      T3
      (run-tests '(t3))
       | Failed Form: 3
       | Expected 2 but saw 3
       |
       | Failed Form: 2
       | Expected 1 but saw 2
       |
      T2: 1 assertions passed, 2 failed.
      
      Unit Test Summary
       | 3 assertions total
       | 1 passed
       | 2 failed
       | 0 execution errors
       | 0 missing tests
      
      T3: 0 assertions passed, 0 failed.
      
      Unit Test Summary
       | 0 assertions total
       | 0 passed
       | 0 failed
       | 0 execution errors
       | 0 missing tests
      

      A bit of a surprise here. Test t3 does call test t2 but does not track its own assertion. If you reverse the order so that t3's assertions come after the call to run-tests on t2, then it does work properly. In neither case are the results composed.

  4. Suites, tags and other multiple test abilities

    Lisp-unit uses both packages and tags rather than suites. That provides a bit more flexibility in terms of reusing tests in different situations, but does not create the automatic inheritance that some people like.

    1. Lists of tests

      Lisp-unit can run lists of tests

      (run-tests '(t1 t2))
      

      Lisp-unit makes it easy to get a list of the names of the failing tests which you can then save and run-tests against. Run-tests returns a test-results-db object. Just call failed-tests on that to get a list of the names of the tests that failed in that run. Then run-tests against that smaller list.
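
      As a sketch of that workflow (assuming tests t1 and t2 from above are defined):

      (let* ((results (run-tests '(t1 t2)))     ; returns a test-results-db
             (failing (failed-tests results)))  ; names of the failing tests
        ;; ... fix the offending code ...
        (run-tests failing))
      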

    2. Packages

      Assuming you have set up your tests in a separate package (that package can cover many application packages) and that test package is the current package, you can run all the tests in the current package with:

      (run-tests :all)
      

      The reporting can get confusing as in this sample run with *print-failures* set to nil:

      (run-tests :all)
      Diagnostic Message: X 1 should equal Y 2
      Unit Test Summary
       | 3 assertions total
       | 1 passed
       | 2 failed
       | 0 execution errors
       | 0 missing tests
      
      Unit Test Summary
       | 13 assertions total
       | 6 passed
       | 7 failed
       | 0 execution errors
       | 0 missing tests
      
      #<TEST-RESULTS-DB Total(13) Passed(6) Failed(7) Errors(0)>
      

      Why do we have two Unit Test Summaries with different numbers of tests and assertions? If you recall, test t3 calls test t2, and that first summary is a secondary summary from that nested call.

      To run all the tests in a non-current package, add the name of the package after the keyword parameter :all

      (lisp-unit:run-tests :all :date-tests)
      

      You can list the names of all the tests in a package

      (list-tests [package])
      
    3. Tags

      As noted, lisp-unit provides the ability to define tests with multiple tags:

      (define-test foo
        "This is the documentation."
        (:tag :tag1 :tag2 symtag)
        exp1 exp2 ...)
      

      So assume three tests that we want tagged differently:

      (define-test t6-1
        "Test t6-1 tagged simple and complex"
        (:tag :simple :complex)
        (assert-true (= 1 1 1)))
      
      (define-test t6-2
        "Test t6-2 tagged simple only"
        (:tag :simple)
        (assert-equal 1 1))
      
      (define-test t6-3
        "Test t6-3 tagged complex only"
        (:tag :complex)
        (assert-equal 'a 'a))
      
      

      Then using run-tags does what we expect. We will set *print-summary* to t for simplicity.

      (setf *print-summary* t)
      
      (run-tags '(:simple))
      T6-2: 1 assertions passed, 0 failed.
      
      T6-1: 1 assertions passed, 0 failed.
      
      Unit Test Summary
       | 2 assertions total
       | 2 passed
       | 0 failed
       | 0 execution errors
       | 0 missing tests
      
      #<TEST-RESULTS-DB Total(2) Passed(2) Failed(0) Errors(0)>
      
      LISP-UNIT> (run-tags '(:complex))
      T6-3: 1 assertions passed, 0 failed.
      
      T6-1: 1 assertions passed, 0 failed.
      
      Unit Test Summary
       | 2 assertions total
       | 2 passed
       | 0 failed
       | 0 execution errors
       | 0 missing tests
      

      Tags can be listed with (LIST-TAGS [PACKAGE]). TAGGED-TESTS returns the tests associated with the listed tags. All tagged tests are returned if no arguments are given or if the keyword :all is provided instead of a list of tags. *package* is used if no package is specified.

      (tagged-tests '(tag1 tag2 ...) [package])
      (tagged-tests :all [package])
      (tagged-tests)
      


  5. Fixtures and Freezing Data

    None

  6. Removing tests

    Lisp-unit has both remove-tests and remove-tags functions.
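
    For example, a sketch (assuming both functions take a list, mirroring run-tests and run-tags):

    (remove-tests '(t1 t2))   ; remove the named tests from the current package
    (remove-tags '(:simple))  ; remove the named tags
    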

  7. Sequencing, Random and Failure Only

    As noted above, you can collect the failed-tests from a run's results and then call run-tests on just that list.
  8. Skip Capability

    None

  9. Random Data Generators

    Lisp-unit has various functions for generating random data. See examples below:

    (complex-random #C(5 3))
    #C(4 1)
    
    (make-random-2d-array 2 3)
    #2A((0.03395796 0.55509293 0.34209597) (0.5823394 0.8771157 0.29430425))
    
    (make-random-2d-list 2 3)
    ((0.18096626 0.916595 0.88126934) (0.45945048 0.8838378 0.57314146))
    
    (make-random-list 3)
    (0.5449568 0.32319236 0.7780224)
    
    (make-random-state)
    #S(RANDOM-STATE :STATE #.(MAKE-ARRAY 627 :ELEMENT-TYPE '(UNSIGNED-BYTE 32)
                                         :INITIAL-CONTENTS
                                         '(0 2567483615 454 2531281407 4203062579
                                           3352536227 284404050 622556438
                                           ...)))
    

22.4. Who Uses Lisp-Unit

Many libraries on quicklisp use Lisp-Unit. If you have quicklisp, you can get a list of those with:

(ql:who-depends-on :lisp-unit)


23. lisp-unit2


23.1. Summary

homepage Russ Tyndall MIT 2018

I really like Lisp-Unit2. Phil Gold's original concern about Lisp-Unit back in 2007 was its failure to scale. Specifically, he pointed to non-composable test reports, the fact that you could not get just failure reports (so the failures were lost in the reporting of successful tests), and the lack of a count of failed tests. I think those have been addressed with the tags capabilities.

Unlike the situation with Clunit and Clunit2, which are almost identical, Lisp-Unit and Lisp-Unit2 have definitely diverged over the years. Lisp-unit2 has fixtures, can rerun just the previously failing tests, and lets you turn on debugging for failures as well as errors. Progress reports are available (see the Report Format section below). If you need floating point tests, those are built-in.

It will report all the assertion failures in a test, gives the opportunity to provide user generated diagnostic messages in assertions and has a tags system that allows different ways to re-use tests not found in the typical hierarchical setup. It can re-run failed tests and interactive debugging is optional. I did find the fixture structure to be confusing.

23.2. Assertion Functions

Lisp-unit2 has more assertions than just about anyone. Using the correct assertions will increase speed and efficiency. Whether that matters will, of course, depend on your test case.

assert= assert/=  
assert-char= assert-char-equal assert-char/=
assert-char-not-equal    
assert-eq assert-eql assert-equal
assert-equality assert-equalp assert-error
assert-expands assert-fail assert-false
assert-float-equal assert-no-error assert-no-signal
assert-no-warning assert-norm-equal assert-number-equal
assert-numerical-equal assert-passes? assert-prints
assert-rational-equal assert-sigfig-equal assert-signal
assert-string= assert-string-equal assert-string/=
assert-string-not-equal    
assert-true assert-typep assert-warning
assertion-fail assertion-pass check-type
logically-equal    


23.3. Usage

If you store the results of a test run, you can call rerun-failures on those results to just rerun the failing tests rather than go through all the tests again.
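
As a sketch of that workflow (the package name :my-tests is hypothetical):

(let ((results (run-tests :package :my-tests)))
  ;; ... fix the offending code ...
  (rerun-failures results))
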

  1. Report Format

    You can choose to run one test or more than one test. When running more than one test, you can choose to have the summary provided at the end or have progress summary reports generated during the test run.

    As an example, the following will generate summary reports during a test run:

      (run-tests :package :uax-15-lisp-unit2-tests :run-contexts #'with-summary-context)
    
    
    ------- STARTING Testing: UAX-15-LISP-UNIT2-TESTS
    
    Starting: UAX-15-LISP-UNIT2-TESTS::PART0-NFKC
    UAX-15-LISP-UNIT2-TESTS::PART0-NFKC - PASSED (0.01s) : 100 assertions passed
    
    Starting: UAX-15-LISP-UNIT2-TESTS::PART1-NFKC
    UAX-15-LISP-UNIT2-TESTS::PART1-NFKC - PASSED (0.29s) : 68116 assertions passed
    
    .....
    
    Starting: UAX-15-LISP-UNIT2-TESTS::PART2-NFD
    UAX-15-LISP-UNIT2-TESTS::PART2-NFD - PASSED (0.04s) : 9220 assertions passed
    
    Starting: UAX-15-LISP-UNIT2-TESTS::PART3-NFD
    UAX-15-LISP-UNIT2-TESTS::PART3-NFD - PASSED (0.01s) : 880 assertions passed
    
    Test Summary for :UAX-15-LISP-UNIT2-TESTS (16 tests 1.08 sec)
      | 343332 assertions total
      | 343332 passed
      | 0 failed
      | 0 execution errors
      | 0 warnings
      | 0 empty
      | 0 missing tests
    
    -------   ENDING Testing: UAX-15-LISP-UNIT2-TESTS
    

    Wrapping it with print-summary and removing the internal call to with-summary-context will put the summary at the end:

      (print-summary (run-tests :package :uax-15-lisp-unit2-tests))
    UAX-15-LISP-UNIT2-TESTS::PART0-NFKC - PASSED (0.01s) : 100 assertions passed
    UAX-15-LISP-UNIT2-TESTS::PART1-NFKC - PASSED (0.18s) : 68116 assertions passed
    UAX-15-LISP-UNIT2-TESTS::PART2-NFKC - PASSED (0.04s) : 7376 assertions passed
    UAX-15-LISP-UNIT2-TESTS::PART3-NFKC - PASSED (0.01s) : 704 assertions passed
    UAX-15-LISP-UNIT2-TESTS::PART0-NFKD - PASSED (0.00s) : 100 assertions passed
    UAX-15-LISP-UNIT2-TESTS::PART1-NFKD - PASSED (0.16s) : 68116 assertions passed
    UAX-15-LISP-UNIT2-TESTS::PART2-NFKD - PASSED (0.04s) : 7376 assertions passed
    UAX-15-LISP-UNIT2-TESTS::PART3-NFKD - PASSED (0.00s) : 704 assertions passed
    UAX-15-LISP-UNIT2-TESTS::PART0-NFC - PASSED (0.01s) : 125 assertions passed
    UAX-15-LISP-UNIT2-TESTS::PART1-NFC - PASSED (0.22s) : 85145 assertions passed
    UAX-15-LISP-UNIT2-TESTS::PART2-NFC - PASSED (0.05s) : 9220 assertions passed
    UAX-15-LISP-UNIT2-TESTS::PART3-NFC - PASSED (0.01s) : 880 assertions passed
    UAX-15-LISP-UNIT2-TESTS::PART0-NFD - PASSED (0.01s) : 125 assertions passed
    UAX-15-LISP-UNIT2-TESTS::PART1-NFD - PASSED (0.18s) : 85145 assertions passed
    UAX-15-LISP-UNIT2-TESTS::PART2-NFD - PASSED (0.05s) : 9220 assertions passed
    UAX-15-LISP-UNIT2-TESTS::PART3-NFD - PASSED (0.01s) : 880 assertions passed
    
    Test Summary for :UAX-15-LISP-UNIT2-TESTS (16 tests 0.96 sec)
      | 343332 assertions total
      | 343332 passed
      | 0 failed
      | 0 execution errors
      | 0 warnings
      | 0 empty
      | 0 missing tests
    #<TEST-RESULTS-DB Tests:(16) Passed:(343332) Failed:(0) Errors:(0) Warnings:(0) {10067D9AF3}>
    

    The variable *debugger-hook* is set by default to #<FUNCTION SWANK:SWANK-DEBUGGER-HOOK>. If you want to stay out of interactive debugging, set *debugger-hook* to nil.
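
    That is a one-line setting:

    (setf *debugger-hook* nil) ; stay out of the interactive debugger
    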

    You could allow the system to put you in the debugger using the with-failure-debugging wrapper, or provide that wrapper to the keyword parameter :run-contexts when calling run-tests, as in the following two examples:

    (with-failure-debugging ()
      (run-tests :tests '(t7-bad-error)))
    
    (run-tests :tests 'tf-lisp-unit2::tf-find-str-in-list-t
               :run-contexts #'with-failure-debugging-context)
    
  2. Basics

    Tests in lisp-unit2 are functions. They are also compiled at the time of definition (so that any compile warnings or errors are immediately noticeable) and also before every run of the test (so that macro expansions are never out of date).

    The define-test macro takes a name parameter and a form specifying :tags, :contexts or :package before you get to the assertions.
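
    As a sketch of that shape (the tag and package names here are hypothetical, and :package is assumed to name the package the test is filed under):

    (define-test my-test
        (:tags '(my-tag)             ; tag it for group runs
         :package :my-test-package)  ; hypothetical package
      (assert-true t))
    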

    First, the basic test where we know everything is going to pass. Since Lisp-unit2 has macro expand and floating point assertion functions, we will show those in this example (so we need a macro just for the macroexpand test). See https://github.com/AccelerationNet/lisp-unit2/blob/master/floating-point.lisp and https://github.com/AccelerationNet/lisp-unit2/blob/master/rational.lisp for more information on the floating point and rational tests including setting the epsilon values etc.

    (defmacro my-macro (arg1 arg2)
      (let ((g1 (gensym))
            (g2 (gensym)))
        `(let ((,g1 ,arg1)
               (,g2 ,arg2))
           "Start"
           (+ ,g1 ,g2 3))))
    
    (define-test t1
        (:tags '(tf-basic))
      (assert-true (=  1 1))
      (assert-eq 'a 'a)
      (assert-rational-equal 3/2 3/2)
      (assert-float-equal 17 17.0000d0)
      (assert-true (logically-equal t t)) ; both true or both false
      (assert-true (logically-equal nil nil)) ; both true or both false
      (assert-expands
          (let ((#:G1 A) (#:G2 B)) "Start" (+ #:G1 #:G2 3))
          (my-macro a b))
      (assert-error 'division-by-zero
                    (error 'division-by-zero)
                    "testing condition assertions"))
    

    Now run this test. The keyword parameter :tests will accept a single test symbol or a list of tests. E.g.

      (run-tests :tests 't1)
    
      (run-tests :tests '(t1))
    
    #<TEST-RESULTS-DB Tests:(1) Passed:(8) Failed:(0) Errors:(0) Warnings:(0) {1006A22CE3}>
    

    Short and to the point. For a slightly different format you can use any of the following:

    (with-summary ()
      (run-tests :tests '(t1)))
    
    (print-summary
     (run-tests :tests '(t1)))
    
    (run-tests :run-contexts #'with-summary-context :tests '(t1))
    

    Any of these will provide something like the following:

    TF-LISP-UNIT2::T1 - PASSED (0.01s) : 8 assertions passed
    
    Test Summary for :TF-LISP-UNIT2 (1 tests 0.01 sec)
      | 8 assertions total
      | 8 passed
      | 0 failed
      | 0 execution errors
      | 0 warnings
      | 0 empty
      | 0 missing tests
    #<TEST-RESULTS-DB Tests:(1) Passed:(8) Failed:(0) Errors:(0) Warnings:(0) {1006E6EE53}>
    

    Now with a basic failing test. This time we will give the test a description string, and the first assertion gets a diagnostic string plus the variables in question.

    (define-test t1-fail
        (:tags '(tf-basic))
      "describe t1-fail"
      (let ((x 1))
        (assert-true (= x 2)
                     "deliberate failure here because we know ~a is not equal to ~a"
                     x 2)
        (assert-equal x 3)
        (assert-error 'division-by-zero
                      (error 'floating-point-overflow)
                      "testing condition assertions")
        (assert-true (logically-equal t nil)) ; both true or both false
        (assert-true (logically-equal nil t))))
    
    

    Now if we simply run the basic test, we get thrown into the debugger on the error assertion. If we hit continue, we are handed a test-results-db object. Why did we get thrown into the debugger rather than just fail? Because the variable *debugger-hook* is set by default to #<FUNCTION SWANK:SWANK-DEBUGGER-HOOK>. If you want to stay out of interactive debugging when errors get thrown, set *debugger-hook* to nil.

    (run-tests :tests '(t1-fail))
    #<TEST-RESULTS-DB Tests:(1) Passed:(0) Failed:(2) Errors:(1) Warnings:(0) {10063359D3}>
    

    If we use the print-summary wrapper, we still get thrown into the debugger on the error assertion, but assuming we hit continue, we get the report below and a test-results-db object.

    (print-summary (run-tests :tests '(t1-fail)))
    TF-LISP-UNIT2::T1-FAIL - ERRORS (5.33s) : 0 assertions passed
      | ERRORS (1)
      | ERROR: arithmetic error FLOATING-POINT-OVERFLOW signalled
      | #<FLOATING-POINT-OVERFLOW {100619DEE3}>
      |
      | FAILED (2)
      | Failed Form: (ASSERT-TRUE (= X 2)
      |                           "deliberate failure here because we know ~a is not equal to ~a"
      |                           X 2)
      | Expected T
      | but saw NIL
      | "deliberate failure here because we know ~a is not equal to ~a"
      | X => 1
      | 2
      | Failed Form: (ASSERT-EQUAL X 3)
      | Expected 1
      | but saw 3
      |
      |
    Test Summary for :TF-LISP-UNIT2 (1 tests 5.33 sec)
      | 2 assertions total
      | 0 passed
      | 2 failed
      | 1 execution errors
      | 0 warnings
      | 0 empty
      | 0 missing tests
    #<TEST-RESULTS-DB Tests:(1) Passed:(0) Failed:(2) Errors:(1) Warnings:(0) {1005F7D0E3}>
    

    You do not have to manually recompile a test after a tested function has been modified.

  3. Edge Cases: Value Expressions, loops, closures and calling other tests
    1. Value expressions

      Unlike almost all the other frameworks, Lisp-unit and Lisp-unit2 actually look at all the values in the values expressions:

      (define-test t2-values-expressions
        (:tags '(tf-multiple))
        (assert-equal (values 1 2) (values 1 2 3))
        (assert-equal (values 1 2) (values 1 3))
        (assert-equal (values 1 2 3) (values 1 3 2)))
      #<FUNCTION T2-VALUES-EXPRESSIONS>
      LISP-UNIT2> (print-summary (run-tests :tests '(t2-values-expressions)))
      LISP-UNIT2::T2-VALUES-EXPRESSIONS - FAILED (0.00s) : 1 assertions passed
        | FAILED (2)
        | Failed Form: (ASSERT-EQUAL (VALUES 1 2) (VALUES 1 3))
        | Expected 1; 2
        | but saw 1; 3
        | Failed Form: (ASSERT-EQUAL (VALUES 1 2 3) (VALUES 1 3 2))
        | Expected 1; 2; 3
        | but saw 1; 3; 2
        |
        |
      Test Summary for :LISP-UNIT2 (1 tests 0.00 sec)
        | 3 assertions total
        | 1 passed
        | 2 failed
        | 0 execution errors
        | 0 warnings
        | 0 empty
        | 0 missing tests
      #<TEST-RESULTS-DB Tests:(1) Passed:(1) Failed:(2) Errors:(0) Warnings:(0) {10263B53C3}>
      
    2. Closures.

      Lisp-Unit2 will not see variables declared in a closure encompassing the test. The following will throw an error and drop you into the debugger.

      (let ((l1 '(#\a #\B #\z))
            (l2 '(97 66 122)))
        (define-test t2-loop-closure
            (:tags '(tf-multiple tf-basic))
          (loop for x in l1 for y in l2 do
            (assert-equal (char-code x) y))))
      
    3. Calling another test from a test

      Since we know that tests are functions in lisp-unit2, we can just have test t3 call test t2 directly rather than indirectly running through the RUN-TESTS function.

      (define-test t3 ; a test that tries to call another test in its body
          (:tags '(tf-calling-other-tests))
        (assert-equal 'a 'a)
        (t2))
      
      (print-summary (run-tests :tests '(t3)))
      LISP-UNIT2-EXAMPLES::T3 - FAILED (0.00s) : 2 assertions passed
        | FAILED (2)
        | Failed Form: (ASSERT-EQUAL 1 2)
        | Expected 1
        | but saw 2
        | Failed Form: (ASSERT-EQUAL 2 3)
        | Expected 2
        | but saw 3
        |
        Test Summary for :LISP-UNIT2-EXAMPLES (1 tests 0.00 sec)
        | 4 assertions total
        | 2 passed
        | 2 failed
        | 0 execution errors
        | 0 warnings
        | 0 empty
        | 0 missing tests
      #<TEST-RESULTS-DB Tests:(1) Passed:(2) Failed:(2) Errors:(0) Warnings:(0) {1009F73183}>
      

      Unlike lisp-unit, everything works as expected and we actually got composed results.

  4. Suites, Tags, Packages and other multiple test abilities

    If run-tests is called without any keyword parameters, it will run all the tests in the current package. It accepts keyword parameters for :tests, :tags and :package.

    (lisp-unit2:run-tests &key tests tags package reintern-package)
    
    1. Lists of tests

      As previously stated, Lisp-unit2 can run lists of tests.

      (run-tests :tests  '(t1 t2))
      
    2. Tags

      As you would expect, you can run all the tests having a specific tag. In the following example we wrap run-tests in a call to print-summary in order to get useful results:

      (print-summary (run-tests :tags '(tf-basic)))
      
      (with-summary ()
        (t1-fail)) ;; here we just call t1-fail as a function. We need WITH-SUMMARY or PRINT-SUMMARY to get results printed
      
      Starting: LISP-UNIT2::T1-FAIL
      LISP-UNIT2::T1-FAIL - FAILED (0.00s) : 0 assertions passed
        | FAILED (1)
        | Failed Form: (ASSERT-EQL 1 2)
        | Expected 1
        | but saw 2
        |
      

      Tags can be listed using LIST-TAGS.

  5. Fixtures and Contexts

    What we have been referring to as fixtures is called contexts in Lisp-Unit2.
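
    A minimal sketch of a context: a function that receives the test body as a thunk and wraps any build-up and tear-down around it. The variable and function names here are illustrative; a fuller example from the source files appears below.

      (defvar *scratch* nil)
      
      (defun with-scratch-context (body-fn)
        (let ((*scratch* (list 1 2 3))) ; build up state for the test
          (funcall body-fn)))           ; run the test body inside the binding
      
      (define-test t-uses-context
          (:contexts #'with-scratch-context)
        (assert-eql 3 (length *scratch*)))
    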

  6. Writing Summary to file

    The write-tap-to-file macro takes a form that generates a report and writes the result to a file in TAP format.

    We show the results in the TAP format for the successful t1 test and the deliberately failing t1-fail test.

    (write-tap-to-file (run-tests :tests 't1) #P"/tmp/lisp-unit2.tap")
    
    cat /tmp/lisp-unit2.tap
    TAP version 13
    1..1
    ok 1 LISP-UNIT2::T1 (0.00 s)
    
    (write-tap-to-file (run-tests :tests 't1-fail) #P"/tmp/lisp-unit2.tap")
    
    cat /tmp/lisp-unit2.tap
    TAP version 13
    1..1
    not ok 1 LISP-UNIT2::T1-FAIL (0.00 s)
        ---
         # FAILED (1)
         # Failed Form: (ASSERT-EQL 1 2)
         # Expected 1
         # but saw 2
         #
         #
        ...
    

    Or we can wrap the run in a with-open-file form binding lisp-unit2::*test-stream* to write any of the other formats to a file.

    (with-open-file (lisp-unit2::*test-stream* #P"/tmp/lisp-unit2.summary"
                                               :direction :output
                                               :if-exists :supersede
                                               :external-format :utf-8
                                               :element-type :default)
      (print-summary (run-tests :tests 't1-fail)))
    
    cat /tmp/lisp-unit2.summary
    LISP-UNIT2::T1-FAIL - FAILED (0.00s) : 0 assertions passed
      | FAILED (1)
      | Failed Form: (ASSERT-EQL 1 2)
      | Expected 1
      | but saw 2
      |
      |
    Test Summary for :LISP-UNIT2 (1 tests 0.00 sec)
      | 1 assertions total
      | 0 passed
      | 1 failed
      | 0 execution errors
      | 0 warnings
      | 0 empty
      | 0 missing tests
    


    1. Fixtures and Freezing Data

      Lisp-unit2 refers to fixtures as "context". The following is an example from the source files of how to build up a context that can be used in a test.

      (defun meta-test-context (body-fn)
        (let ((lisp-unit2::*test-db* *example-db*)
              *debugger-hook*
              (lisp-unit2::*test-stream* (make-broadcast-stream)))
          (handler-bind
              ((warning #'muffle-warning))
            (funcall body-fn))))
      
      (defmacro with-meta-test-context (() &body body)
        `(meta-test-context
          (lambda () ,@body)))
      
      (define-test test-with-test-results (:tags '(meta-tests)
                                           :contexts #'meta-test-context)
        (let ( results )
          (lisp-unit2:with-test-signals-muffled ()
            (lisp-unit2:with-test-results (:collection-place results)
              (lisp-unit2:run-tests :tags 'warnings)
              (lisp-unit2:run-tests :tags 'examples)))
          ;; subtract-integer-test calls run-tests
          (assert-eql 3 (length results))
          (assert-typep 'lisp-unit2::test-results-db (first results))))
      
  7. Removing tests

    Lisp-unit2 has an uninstall-test function and an undefine-test macro.

  8. Sequencing, Random and Failure Only

    Tests are run in sequential order. As noted above, you can store a run's test-results-db and call rerun-failures on it to run only the tests that failed last time.

  9. Skip Capability

    None noted

  10. Random Data Generators

    Lisp-unit2 has various functions for generating random data. See examples below:

    (complex-random #C(5 3))
    #C(4 1)
    
    (make-random-2d-array 2 3)
    #2A((0.03395796 0.55509293 0.34209597) (0.5823394 0.8771157 0.29430425))
    
    (make-random-2d-list 2 3)
    ((0.18096626 0.916595 0.88126934) (0.45945048 0.8838378 0.57314146))
    
    (make-random-list 3)
    (0.5449568 0.32319236 0.7780224)
    
    (make-random-state)
    #S(RANDOM-STATE :STATE #.(MAKE-ARRAY 627 :ELEMENT-TYPE '(UNSIGNED-BYTE 32)
                                         :INITIAL-CONTENTS
                                         '(0 2567483615 454 2531281407 4203062579
                                           3352536227 284404050 622556438
                                           ...)))
    

23.4. Discussion

Lisp-Unit2 is very good. Atlas Engineering put it at the top of their list.

23.5. Who Uses Lisp-Unit2

Many libraries on quicklisp use Lisp-Unit2. If you have quicklisp, you can get a list of those with:

(ql:who-depends-on :lisp-unit2)


24. nst

24.1. Summary

homepage John Maraist LLGPL3 latest 2021

You get a sense of what NST is focused on when the README starts with fixtures and states that the criterion testing has its own DSL.

This focus on fixtures is further reinforced by the definition of a test (what I have been calling an assertion in the context of the other frameworks):

(def-test NAME ( [ :group GROUP-NAME ]
                  [ :setup FORM ]
                  [ :cleanup FORM ]
                  [ :startup FORM ]
                  [ :finish FORM ]
                  [ :fixtures (FIXTURE FIXTURE ... FIXTURE) ]
                  [ :aspirational FLAG ]
                  [ :documentation STRING ] )
    criterion
 FORM ... FORM)

Obviously this is a framework intended for complexity. That brings two problems. The first is the learning curve for the DSL. The second is the overhead that the infrastructure brings with it. Look at the stacked-ranking-benchmarks. It is almost as bad as clunit in runtime and multiple orders of magnitude worse than anything else in bytes consed and eval-calls.

If you do not require the ability to handle serious complexity in your tests, look elsewhere.

24.2. Assertion Functions

assert-criterion assert-eq assert-eql
assert-equal assert-equalp assert-non-nil
assert-not-eq assert-not-eql assert-not-equal
assert-not-equalp assert-null assert-zero


24.3. Usage

First some terminology. What I have been referring to as assertions, NST refers to as a test. What I have been referring to as a test, NST refers to as a test-group.

Second, NST has its own DSL that you need to understand in order to use it. It can obviously handle complex systems, but that comes at a learning curve cost and some things that I find easy in CL I do not find as easy in NST's criterion language (probably speaks to my limitations).

  1. Report Format

    The level of detail from reports is dependent on the verbosity setting. Valid commands are:

    (nst-cmd :set :verbose :silent)
    (nst-cmd :set :verbose :quiet)
    (nst-cmd :set :verbose :verbose)
    (nst-cmd :set :verbose :vverbose)
    (nst-cmd :set :verbose :trace)
    

    You can get a little more detail using the :detail parameter.

    (nst-cmd :detail [blank, package-name, group-name, or group-name *and* test-name])
    

    To switch to interactive debug behavior the following commands are necessary:

    (nst-cmd :debug-on-error t)
    (nst-cmd :debug-on-fail t)
    
    (nst-cmd :debug) ;; will set both to t
    

    The README contains the following warning: "This behavior is less useful than it may seem; by the time the results of the test are examined for failure, the stack from the actual form evaluation will usually have been released. Still, this switch is useful for inspecting the environment in which a failing test was run."

    We will show the different reporting levels with the first two groups of examples.

  2. Basics

    NST requires that you have groups defined and each test must belong to a group. If you are defining the tests within the definition of a group, you do not need to specify the group. The empty form after the group name in the following example is used for fixtures

    1. All Passing Basic Test
      (def-test-group tf-basic-pass ()
        (def-test t1-1
          :true (= 1 1))
        (def-test t1-2
          :true (not (= 1 2)))
        (def-test t1-3
            (:eq 'a) 'a)
        (def-test t1-4
          :forms-eq (cadr '(a b c))
          (caddr '(a c b)))
        (def-test t1-5
            (:err :type division-by-zero)
          (error 'division-by-zero)))
      

      Now you need to run a mini command line to run tests and get a report. We will go from least verbose to most verbose.

      1. Silent
        (nst-cmd :set :verbose :silent)
        
        (nst-cmd :run tf-basic-pass)
        Running group TF-BASIC-PASS
        Group TF-BASIC-PASS: 5 of 5 passed
        TOTAL: 5 of 5 passed (0 failed, 0 errors, 0 warnings)
        (nst-cmd :run t1)
        Check T1 (group TF-BASIC) passed
        TOTAL: 1 of 1 passed (0 failed, 0 errors, 0 warnings)
        
      2. Quiet
        (nst-cmd :set :verbose :quiet)
        
        (nst-cmd :run tf-basic-pass)
        Running group TF-BASIC-PASS
        - Executing test T1-1
        Check T1-1 (group TF-BASIC-PASS) passed.
        - Executing test T1-2
        Check T1-2 (group TF-BASIC-PASS) passed.
        - Executing test T1-3
        Check T1-3 (group TF-BASIC-PASS) passed.
        - Executing test T1-4
        Check T1-4 (group TF-BASIC-PASS) passed.
        - Executing test T1-5
        Check T1-5 (group TF-BASIC-PASS) passed.
        Group TF-BASIC-PASS: 5 of 5 passed
        TOTAL: 5 of 5 passed (0 failed, 0 errors, 0 warnings)
        
      3. VVerbose (verbose is generally the same as quiet)
        (nst-cmd :set :verbose :vverbose)
        (nst-cmd :run tf-basic-pass)
        Running group TF-BASIC-PASS
        Starting run loop for #S(NST::GROUP-RECORD
                                 :NAME TF-BASIC-PASS
                                 :ANON-FIXTURE-FORMS NIL
                                 :ASPIRATIONAL NIL
                                 :GIVEN-FIXTURES NIL
                                 :DOCUMENTATION NIL
                                 :TESTS #<HASH-TABLE :TEST EQ :COUNT 5 {1008848E43}>
                                 :FIXTURES-SETUP-THUNK NIL
                                 :FIXTURES-CLEANUP-THUNK NIL
                                 :WITHFIXTURES-SETUP-THUNK NIL
                                 :WITHFIXTURES-CLEANUP-THUNK NIL
                                 :EACHTEST-SETUP-THUNK NIL
                                 :EACHTEST-CLEANUP-THUNK NIL
                                 :INCLUDE-GROUPS NIL)
        - Executing test T1-1
        Applying criterion :TRUE
        to (MULTIPLE-VALUE-LIST (= 1 1))
        Result at :TRUE is Check T1-1 (group TF-BASIC-PASS) passed.
        Check T1-1 (group TF-BASIC-PASS) passed.
        - Executing test T1-2
        Applying criterion :TRUE
        to (MULTIPLE-VALUE-LIST (NOT (= 1 2)))
        Result at :TRUE is Check T1-2 (group TF-BASIC-PASS) passed.
        Check T1-2 (group TF-BASIC-PASS) passed.
        - Executing test T1-3
        Applying criterion :EQ 'A
        to (MULTIPLE-VALUE-LIST 'A)
        Result at :EQ is Check T1-3 (group TF-BASIC-PASS) passed.
        Check T1-3 (group TF-BASIC-PASS) passed.
        - Executing test T1-4
        Applying criterion :FORMS-EQ
        to (LIST (CADR '(A B C)) (CADDR '(A C B)))
        Applying criterion :PREDICATE EQ
        to (LIST (CADR '(A B C)) (CADDR '(A C B)))
        Result at :PREDICATE is Check T1-4 (group TF-BASIC-PASS) passed.
        Result at :FORMS-EQ is Check T1-4 (group TF-BASIC-PASS) passed.
        Check T1-4 (group TF-BASIC-PASS) passed.
        - Executing test T1-5
        Applying criterion :ERR :TYPE DIVISION-BY-ZERO
        to (MULTIPLE-VALUE-LIST (ERROR 'DIVISION-BY-ZERO))
        Result at :ERR is Check T1-5 (group TF-BASIC-PASS) passed.
        Check T1-5 (group TF-BASIC-PASS) passed.
        Group TF-BASIC-PASS: 5 of 5 passed
        TOTAL: 5 of 5 passed (0 failed, 0 errors, 0 warnings)
        
    2. All Failing Basic Test

      Now a test where everything fails or creates an error. This time we define the test-group separately, then each test separately. As a result we will need to name the group in each test definition. You can insert a :documentation string into a test, but it only gets printed at the vverbose level of verbosity.

      (def-test-group tf-basic-fail ())
      
      (def-test (t1-1 :group tf-basic-fail)
        :true (= 1 2))
      (def-test (t1-2 :group tf-basic-fail)
        :true (not (= 2 2)))
      (def-test (t1-3 :group tf-basic-fail)
          (:eq 'a) 'b)
      (def-test (t1-4 :group tf-basic-fail)
        :forms-eq (cadr '(a d c))
        (caddr '(a c b)))
      (def-test (t1-5 :group tf-basic-fail)
          (:err :type division-by-zero)
        (error 'floating-point-overflow))
      

      Again, going from least verbose to most verbose.

      1. Silent
        (nst-cmd :set :verbose :silent)
        
        (nst-cmd :run tf-basic-fail)
        Running group TF-BASIC-FAIL
        Group TF-BASIC-FAIL: 0 of 5 passed
        - Check T1-1 failed
        - Expected non-null, got: NIL
        - Check T1-2 failed
        - Expected non-null, got: NIL
        - Check T1-3 failed
        - Value B not eq to value of A
        - Check T1-4 failed
        - Predicate EQ fails for (D B)
        - Check T1-5 raised an error
        TOTAL: 0 of 5 passed (4 failed, 1 error, 0 warnings)
        
      2. Quiet
        (nst-cmd :set :verbose :quiet)
        
        (nst-cmd :run tf-basic-fail)
        Running group TF-BASIC-FAIL
        - Executing test T1-1
        Check T1-1 (group TF-BASIC-FAIL) failed
        - Expected non-null, got: NIL
        - Executing test T1-2
        Check T1-2 (group TF-BASIC-FAIL) failed
        - Expected non-null, got: NIL
        - Executing test T1-3
        Check T1-3 (group TF-BASIC-FAIL) failed
        - Value B not eq to value of A
        - Executing test T1-4
        Check T1-4 (group TF-BASIC-FAIL) failed
        - Predicate EQ fails for (D B)
        - Executing test T1-5
        Check T1-5 (group TF-BASIC-FAIL) raised an error
        Group TF-BASIC-FAIL: 0 of 5 passed
        - Check T1-1 failed
        - Expected non-null, got: NIL
        - Check T1-2 failed
        - Expected non-null, got: NIL
        - Check T1-3 failed
        - Value B not eq to value of A
        - Check T1-4 failed
        - Predicate EQ fails for (D B)
        - Check T1-5 raised an error
        TOTAL: 0 of 5 passed (4 failed, 1 error, 0 warnings)
        

        For the sake of brevity, we will skip verbose and vverbose and trace. You get the picture.

  3. Edge Cases: Value expressions, loops, closures and calling other tests
    1. Value expressions

      NST has a values criterion that looks at the results coming from a values expression individually. Otherwise it will only look at the first value. The following two versions pass.

      (def-test-group tf-basic-values ()
        (def-test (t1-1 :group tf-basic-values)
            (:equalp (values 1 2)) 1)
      
        (def-test (t1-2 :group tf-basic-values)
            (:values (:eql 1) (:eql 2)) (values 1 2)))
      
    2. Looping and closures.

      NST does not have a loop construct. What it does have is :each, which takes a criterion (optionally with an :apply transformation) and applies it to every element of a list. The following examples check each element of a list of symbols:

      (def-test-group tf-basic-each ())
      
      (def-test (each1 :group tf-basic-each)
          (:each (:symbol a))
        '(a a a a a))
      
      (def-test (each2 :group tf-basic-each)
          (:each (:apply write-to-string (:equal "A")))
        '(a a a a a))
      

      Like the clunits and lisp-units, nst does not look for variables in a closure surrounding the test definition. The following will not work.

      (let ((lst '(a a a a a)))
        (def-test (each3 :group tf-basic-each)
            (:each (:apply write-to-string (:equal "A")))
          lst))
      

      When I tried calling other tests from inside an NST test, I triggered stack exhaustion errors. So probably do not do that.

  4. Suites, tags and other multiple test abilities

    You can define an nst-package which can contain nst-groups which can contain nst-tests. That seems to be the limit of nestability. So an nst-package is what I have been calling a suite when talking about other frameworks.


  5. Fixtures and Freezing Data

    NST has fixtures. And fixtures. And… The following is just a simple example; please look at the documentation for more details. First we define three groups of fixtures. We intend to use the first at the test-group level (all tests in that group have access) and the other two at the individual test level (just the tests specifying them will have access). We will pretend the first two groups have an expensive calculation that we want to cache to avoid redoing the calculation every time the fixture is called. The empty form after the fixtures group name takes a lot of different options, and you will have to read the documentation for those.

    (def-fixtures suite-fix-1 ()
      (sf1-a '(1 2 3 4))
      ((:cache t) sf1-b (* 23 47)))
    
    (def-fixtures test-fix-1 ()
      (tf1-a '("a" "b" "c" "d"))
      ((:cache t) tf1-b (- 2 1)))
    
    (def-fixtures test-fix-2 ()
      (tf2-a "some boring string here"))
    

    Now we define a test-group that uses those fixtures. In test t3, we do not need to call out the suite level fixture in the list of fixtures to be accessed by that test.

    (nst:def-test-group tf-nst-fix (suite-fix-1)
      (def-test (t1 :group tf-nst-fix :fixtures (test-fix-1))
          (:eql tf1-b) 1)
      (def-test (t2 :group tf-nst-fix :fixtures (test-fix-1))
          (:each (:apply length (:equal 1)))
        tf1-a)
      (def-test (t3 :group tf-nst-fix :fixtures (test-fix-1 test-fix-2))
          (:equal "some boring string here-a1-1081")
        (format nil "~a-~a~a-~a" tf2-a (first tf1-a) tf1-b sf1-b)))
    
  6. Removing tests

    I do not see a function for removing tests.

  7. Sequencing, Random and Failure Only

    I did not see any shuffle functionality and the tests seem to run only in sequential order.

    There is a make-failure-report function, but I did not see something that looked like the ability to rerun just failing tests.

  8. Skip Capability

    NST seems to offer skips only in the context of running interactively and letting a condition handler in the debugger ask you if you want to skip the test-group or remaining tests.

  9. Random Data Generators

    NST has extensive random data generators. Please see the documentation for details.

24.4. Discussion

We have already seen in the benchmarking section that either I am doing something wrong with NST or its infrastructure overhead creates speed issues. It can obviously handle very complex systems. That comes, however, at the cost of having to learn a new DSL, and I found the criterion learning curve much steeper than I expected.

Take something as simple as a list of characters and integers and validating that the integer is the char-code for the character. First a plain CL version. There are a lot of different ways to do this in CL using every or loop or mapcar. Below is just one of those ways.

(defparameter *test-lst* '((#\a 97) (#\b 98) (#\c 99) (#\d 100)))

(every #'(lambda (x)
           (eql (char-code (first x)) (second x)))
       *test-lst*)

Now an NST version. There may be better ways to write this but as far as I can tell :apply does not accept a lambda function.

(defparameter *test-lst* '((#\a 97) (#\b 98) (#\c 99) (#\d 100)))

(defun test-char-code-sublist (lst)
  (eql (char-code (first lst))
       (second lst)))

(def-fixtures fixture1 ()
  (lst *test-lst*))

(nst:def-test-group tf-nst ()
  (def-test (t-char-codes :group tf-nst :fixtures (fixture1))
    (:each (:apply test-char-code-sublist (:eq t)))
    lst))

I find myself writing a lot more than I feel necessary in what seems like simple situations. A large part may be simply because I am not going to get far enough up the learning curve for NST given my needs. An article introducing NST claims:

[For simple examples] "the overhead of a separate criteria language seems hardly justifiable. In fact, the criteria bring two primary advantages over Lisp forms. First, criteria can report more detailed information than just pass or fail. In a larger application where the tested values are more complicated objects and structures, the reason for a test's failure may be more subtle. More informative reports can significantly assist the programmer, especially when validating changes to less familiar older or others' code. Moreover, NST's criteria can report multiple reasons for failure. Such more complicated analyses can reduce the boilerplate involved in writing tests; one test against a conjunctive criterion can provide as informative a result as a series of separate tests on the same form. As a project grows larger and more complex, and as a team of programmers and testers becomes less intimately familiar with all of the components of a system, criteria can both reduce tests' overall verbosity, while at the same time raising the usefulness of their results."

Unfortunately for me, many times in trying to learn the DSL, NST reported that it raised an error but refused to tell me what the error was, regardless of the level of verbosity I set.

Maybe it is just me, but every time I tried to redefine a test, I triggered a hash table error and needed to define a test with a new name.

24.5. Who Uses NST

25. parachute


25.1. Summary

homepage Yukari Hafner zlib 2022

It hits almost everything on my wish list - optionality on progress reports and debugging, good suite setup and reporting, good default error-reporting, the ability to provide diagnostic strings with variables, the ability to skip failing dependencies and to set time limits on long-running tests, and decent fixture capability. It does not have the built-in ability to re-run just the last failing tests, but that is a relatively easy add-on (see Discussion). While it is not the fastest, it is in the pack as opposed to the also-rans.

The name of a test is coerced into a string internally, so test names are not functions that can be called on their own.

There are four types of reports: quiet, plain (the default), summary and interactive (throwing you into the debugger).

Parachute does allow you to set time limits for tests and will report the times for tests.
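
A short sketch, assuming the :time-limit option to define-test takes a number of seconds:

(define-test t-quick-enough
  :time-limit 0.5                 ; fail if the body takes longer than half a second
  (true (progn (sleep 0.1) t)))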

My wish list would be for there to be a built-in ability for tests to keep a list of the last tests that failed so that you could just run those over again after you think you have fixed all your bugs.

25.2. Assertion Functions

true false fail is isnt
is-values isnt-values of-type finish  

As with other frameworks, finish simply indicates that the test does not produce an error.
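
For instance, a minimal sketch:

(define-test t-finishes
  (finish (parse-integer "42"))) ; passes as long as the form returns without signalling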


25.3. Usage

  1. Report Format

    Parachute provides four types of reports at the moment. Each returns a test-result object as well as printing a report to stream.

    • Quiet, for when you just want the summary
    • Plain (the default), for the nice progress report with checks and timing
    • Summary, plain but without the progress report, and
    • Interactive, for when you want to go into the debugger on failures

    So here is a basic failing test to show the differences in reporting. We pass the third assertion a string after the two tested items, followed by the two variables being compared; the string can help diagnose failures. Then we run the test under each report type, starting with the default failure report.

    (define-test t1-fail
      (let ((x 1) (y 2))
        (is = 1 2)
        (is equal 1 2)
        (is = x y "Intentional failure ~a does not equal ~a" x y)
        (fail (error 'floating-point-overflow)
              'division-by-zero)))
    

    Now the quiet report version:

    1. Quiet
      (test 't1-fail :report 'quiet)
      #<QUIET 5, FAILED results>
      
    2. Default "Plain" Report
      (test 't1-fail)
              ? TF-PARACHUTE::T1-FAIL
        0.000 ✘   (is = 1 2)
        0.000 ✘   (is equal 1 2)
        0.000 ✘   (is = x y)
        0.000 ✘   (fail (error 'floating-point-overflow) 'division-by-zero)
        0.010 ✘ TF-PARACHUTE::T1-FAIL
      
      ;; Summary:
      Passed:     0
      Failed:     4
      Skipped:    0
      
      ;; Failures:
         4/   4 tests failed in TF-PARACHUTE::T1-FAIL
      The test form   2
      evaluated to    2
      when            1
      was expected to be equal under =.
      
      The test form   2
      evaluated to    2
      when            1
      was expected to be equal under EQUAL.
      
      The test form   y
      evaluated to    2
      when            1
      was expected to be equal under =.
      Intentional failure 1 does not equal 2
      
      The test form   (capture-error (error 'floating-point-overflow))
      evaluated to    [floating-point-overflow] arithmetic error floating-point-overflow signalled
      when            division-by-zero
      was expected to be equal under TYPEP.
      
      #<PLAIN 5, FAILED results>
      

      If you start looking at summary reports for nested tests and notice that the number of test results is greater than the number of assertions, just remember that a nested test is itself counted as a result: it passes if all its assertions pass and fails if any of its assertions fail.

    3. Summary Report

      The difference between the summary report and the plain report is that the summary report suppresses the progress reports. See also parachute:largescale if a percentage report with the first five test failures (if any) is desired.

        (test 't1-fail :report 'summary)
      
      ;; Summary:
      Passed:     0
      Failed:     4
      Skipped:    0
      
      ;; Failures:
         4/   4 tests failed in UAX-15-PARACHUTE-TESTS::T1-FAIL
      The test form   2
      evaluated to    2
      when            1
      was expected to be equal under =.
      
      The test form   2
      evaluated to    2
      when            1
      was expected to be equal under EQUAL.
      
      The test form   y
      evaluated to    2
      when            1
      was expected to be equal under =.
      Intentional failure 1 does not equal 2
      
      The test form   (capture-error (error 'floating-point-overflow))
      evaluated to    [floating-point-overflow] arithmetic error floating-point-overflow signalled
      when            division-by-zero
      was expected to be equal under TYPEP.
      
      #<SUMMARY 5, FAILED results>
      
        (test 't1-fail :report 'largescale)
      
      Total:           5
      Passed:          0 (  0%)
      Failed:          5 (100%)
      Skipped:         0 (  0%)
      
      ;; Failures: (limited to 5)
      The test form   2
      evaluated to    2
      when            1
      was expected to be equal under =.
      
      The test form   2
      evaluated to    2
      when            1
      was expected to be equal under EQUAL.
      
      The test form   y
      evaluated to    2
      when            1
      was expected to be equal under =.
      Intentional failure 1 does not equal 2
      
      The test form   (capture-error (error 'floating-point-overflow))
      evaluated to    [floating-point-overflow] arithmetic error floating-point-overflow signalled
      when            division-by-zero
      was expected to be equal under TYPEP.
      
    4. Interactive

      The interactive report which throws you into the debugger:

      (test 't1-fail :report 'interactive)
        Test (is = 1 2) failed:
        The test form   2
        evaluated to    2
        when            1
        was expected to be equal under =.
           [Condition of type SIMPLE-ERROR]
      
        Restarts:
         0: [RETRY] Retry testing (is = 1 2)
         1: [ABORT] Continue, failing (is = 1 2)
         2: [CONTINUE] Continue, skipping (is = 1 2)
         3: [PASS] Continue, passing (is = 1 2)
         4: [RETRY] Retry testing TF-PARACHUTE::T1-FAIL
         5: [ABORT] Continue, failing TF-PARACHUTE::T1-FAIL
      
  2. Basics

    So let's look at a test where we know everything will pass, using the default report. This will give us a view of the syntax for various types of assertions.

    (define-test t1
      (true (= 1 1))
      (true "happy")
      (false (numberp "no"))
      (of-type integer 5)
      (of-type character #\space)
      (is = 1 1)
      (is equal "abc" "abc")
      (isnt equal "abc" "d")
      (is-values (values 1 2)
        (= (values 1 2)))
      (is-values (values 1 "a")
        (= 1)
        (string= "a"))
      (fail (error 'division-by-zero)
          'division-by-zero))
    
    (test 't1)
    
            ? TF-PARACHUTE::T1
      0.000 ✔   (true (= 1 1))
      0.000 ✔   (true "happy")
      0.000 ✔   (false (numberp "no"))
      0.000 ✔   (of-type integer 5)
      0.000 ✔   (of-type character #\ )
      0.000 ✔   (is = 1 1)
      0.000 ✔   (is equal "abc" "abc")
      0.000 ✔   (isnt equal "abc" "d")
      0.000 ✔   (is-values (values 1 2) (= (values 1 2)))
      0.000 ✔   (is-values (values 1 "a") (= 1) (string= "a"))
      0.000 ✔   (fail (error 'division-by-zero) 'division-by-zero)
      0.030 ✔ TF-PARACHUTE::T1
    
    ;; Summary:
    Passed:    11
    Failed:     0
    Skipped:    0
    #<PLAIN 12, PASSED results>
    

    As you would hope, changing tested functions does not require manually recompiling parachute tests. We will skip the proof.

  3. Edge Cases: Values expressions, loops, closures and calling other tests
    1. Values expressions

      Parachute has special functionality for dealing with values expressions with its is-values testing function as we saw just above. If you used another testing function and passed values expressions to it, they would be accepted but, as expected, Parachute would only look at the first value in the values expression.

    2. Now looping and closures

      Parachute has no problem with loops or finding variables that have been set in a closure containing the test.

      (let ((l1 '(#\a #\B #\z))
            (l2 '(97 66 122)))
        (define-test t2-loop
          (loop for x in l1 for y in l2 do
            (true (= (char-code x) y)))))
      
      (test 't2-loop :report 'quiet)
      #<QUIET 4, PASSED results>
      
    3. Calling a test inside another test
      (define-test t3 ; a test that calls another test in its body
        (true (eql 'a 'a))
        (test 't2))
      
      (test 't3)
              ? TF-PARACHUTE::T3
        0.000 ✔   (true (eql 'a 'a))
              ?   TF-PARACHUTE::T2
        0.000 ✔     (true (= 1 1))
        0.000 ✘     (true (= 2 1))
        0.000 ✔     (is-values (values 1 2) (= (values 1 2)))
        0.000 ✔     (is-values (values 1 "a") (= 1) (string= "a"))
        0.000 ✘   TF-PARACHUTE::T2
      
      ;; Summary:
      Passed:     3
      Failed:     1
      Skipped:    0
      
      ;; Failures:
         1/   4 tests failed in TF-PARACHUTE::T2
      t2 description here
      The test form   (= 2 1)
      evaluated to    ()
      when            t
      was expected to be equal under GEQ.
      
        0.003 ✘ TF-PARACHUTE::T3
      
      ;; Summary:
      Passed:     1
      Failed:     1
      Skipped:    0
      
      ;; Failures:
         2/   3 tests failed in TF-PARACHUTE::T3
      Test for T2 failed.
      #<PLAIN 3, FAILED results>
      
  4. Suites, tags and other multiple test abilities
    1. Lists of tests

      Parachute can handle lists of tests.

      (test '(t1 t2))
      
    2. Suites

      Everything is a test in parachute, and tests can have parent tests just by adding :parent <insert-name-here>. This makes suite inheritance easy, as demonstrated below.

      (define-test s0)
      
      (define-test t4
        :parent s0
        (true (= 1 1))
        (false (= 1 2)))
      
      (define-test t4-1
        :parent t4
        (true (= 1 1))
        (false (= 1 2)))
      

      Now we can run (test 's0) and we will get the results for t4 and t4-1:

      (test 's0)
              ? TF-PARACHUTE::S0
              ?   TF-PARACHUTE::T4
        0.000 ✔     (true (= 1 1))
        0.000 ✔     (false (= 1 2))
              ?     TF-PARACHUTE::T4-1
        0.000 ✔       (true (= 1 1))
        0.000 ✔       (false (= 1 2))
        0.003 ✔     TF-PARACHUTE::T4-1
        0.010 ✔   TF-PARACHUTE::T4
        0.010 ✔ TF-PARACHUTE::S0
      
      ;; Summary:
      Passed:     4
      Failed:     0
      Skipped:    0
      #<PLAIN 7, PASSED results>
      


  5. Fixtures and Freezing Data

    First checking whether we can freeze data, change it in the test, then change it back

    (defparameter *keep-this-data* 1)
    (define-test t-freeze-1
      :fix (*keep-this-data*)
      (setf *keep-this-data* "new")
      (true (stringp *keep-this-data*)))
    
    (define-test t-freeze-2
      (is = *keep-this-data* 1))
    
    (test '(t-freeze-1 t-freeze-2))
    

    Now the classic fixture - create a data set for some series of tests and clean it up afterwards

    ;; Create a class for data fixture purposes
    (defclass class-A ()
      ((a :initarg :a :initform 0 :accessor a)
       (b :initarg :b :initform 0 :accessor b)))
    
    (defparameter *some-existing-data-parameter*
      (make-instance 'class-A :a 17.3 :b -12))
    
    (define-fixture f1 ()
      (let ((old-parameter *some-existing-data-parameter*))
        (setf *some-existing-data-parameter*
            (make-instance 'class-A :a 100 :b -100))
        (&body)
        (setf *some-existing-data-parameter* old-parameter)))
    
    (def-test t6-f1 (:fixture f1)
      (is (equal (a *some-existing-data-parameter*) 100))
      (is (equal (b *some-existing-data-parameter*) -100)))
    
    ;; now you can check (a *some-existing-data-parameter*) to ensure defining the test has not changed *some-existing-data-parameter*
    
    (run! 't6-f1)
    
    Running test T6-F1 ..
     Did 2 checks.
        Pass: 2 (100%)
        Skip: 0 ( 0%)
        Fail: 0 ( 0%)
    

    See also define-fixture-capture and define-fixture-restore on symbols that name databases, and then annotating your tests with :fix (db-name). The only constraint right now is that fixtures are not inherited, meaning if you run the child test directly, it will not trigger the fixture.

      (defparameter *my-param* 1)
      
      (define-test parent-test-1
        :fix (*my-param*)
        (setf *my-param* 2)
        (is = 2 *my-param*))
      
      (define-test child-test-1 :parent parent-test-1
        (is = 2 *my-param*)
        (is eq 'a 'a))
    
    (test 'parent-test-1)
      ? TF-PARACHUTE::PARENT-TEST-1
    0.000 ✔   (is = 2 *my-param*)
    ?   TF-PARACHUTE::CHILD-TEST-1
    0.000 ✔     (is = 2 *my-param*)
    0.000 ✔     (is eq 'a 'a)
    0.030 ✔   TF-PARACHUTE::CHILD-TEST-1
    0.050 ✔ TF-PARACHUTE::PARENT-TEST-1
    
    ;; Summary:
    Passed:     3
    Failed:     0
    Skipped:    0
    #<TEST:PLAIN 5, PASSED results>
    
    1. Build Up and Tear Down Example

      Possibly more interesting is the following example which demonstrates building up and tearing down a test environment extending beyond the lisp environment. Suppose I want to create 3 databases that will be used in a suite of tests, then drop the databases at the end of the tests.

      First we need a predicate that takes a symbol and decides if that symbol meets some criteria.

      (defun names-global-database-p (symbol)
        (member symbol '(db1 db2 db3)))
      

      Now creating the functions that create and drop the database.

      (defun create-test-db (symbol)
        (with-test-connection
          (pomo:create-database symbol)))
      
      (defun drop-test-db (symbol)
        (with-test-connection
          (pomo:drop-database symbol)))
      

      Now we set up define-fixture-capture and define-fixture-restore. They will take a symbol, apply the predicate and, if t, create (or drop) a database where the name has some relation to the symbol. In this particular case, we do not need the value parameter for define-fixture-restore.

      (define-fixture-capture database (symbol)
        (when (names-global-database-p symbol)
          (values (create-test-db symbol) T)))
      
      (define-fixture-restore database (symbol db)
        (declare (ignore db))
        (when (names-global-database-p symbol)
          (drop-test-db symbol)))
      

      Now we can define the parent test of the suite:

      (define-test db-suite
        :fix (db1 db2 db3)
        (with-test-connection
          (true (pomo:database-exists-p 'db1))
          (true (pomo:database-exists-p 'db2))
          (true (pomo:database-exists-p 'db3))))
      

      And now a child of the parent:

      (define-test db-suite-child1
        :parent db-suite
        (with-test-connection
          (true (pomo:database-exists-p 'db1))
          (true (pomo:database-exists-p 'db2))
          (true (pomo:database-exists-p 'db3))))
      

      Now if we run (test 'db-suite), it will create the three databases, run the tests in the parent, then run the tests in the child, and then drop the databases:

                ? POSTMODERN-KITSCH-TESTS::DB-SUITE
        0.000 ✔   (true (database-exists-p 'db1))
        0.000 ✔   (true (database-exists-p 'db2))
        0.000 ✔   (true (database-exists-p 'db3))
              ?   POSTMODERN-KITSCH-TESTS::DB-SUITE-CHILD1
        0.000 ✔     (true (database-exists-p 'db1))
        0.000 ✔     (true (database-exists-p 'db2))
        0.000 ✔     (true (database-exists-p 'db3))
        0.030 ✔   POSTMODERN-KITSCH-TESTS::DB-SUITE-CHILD1
        0.480 ✔ POSTMODERN-KITSCH-TESTS::DB-SUITE
      
      ;; Summary:
      Passed:     6
      Failed:     0
      Skipped:    0
      #<PLAIN 8, PASSED results>
      

      We can now validate, outside the db-suite tests, that all the databases have been dropped:

        (with-test-connection (pomo:database-exists-p 'db1))
      NIL
      1
      

      And they have.

    2. Wrapping around Multiple Tests

      You can wrap a fixture around multiple tests. Consider the following:

        (defclass class-A ()
          ((a :initarg :a :initform 0 :accessor a)
           (b :initarg :b :initform 0 :accessor b)))

        (defparameter *some-existing-data-parameter*
          (make-instance 'class-A :a 17.3 :b -12))

        (with-fixtures '(*some-existing-data-parameter*)
          (define-test t-fix1
            (is = -12 (b *some-existing-data-parameter*))
            (is = 17.3 (a *some-existing-data-parameter*)))
          (define-test t-fix2
            (is = 17.3 (a *some-existing-data-parameter*))
            (is = -12 (b *some-existing-data-parameter*))))
      
        (test 't-fix1)
              ? UAX-15-PARACHUTE-TESTS::T-FIX1
        0.000 ✔   (is = -12 (b *some-existing-data-parameter*))
        0.000 ✔   (is = 17.3 (a *some-existing-data-parameter*))
        0.007 ✔ UAX-15-PARACHUTE-TESTS::T-FIX1
      
      ;; Summary:
      Passed:     2
      Failed:     0
      Skipped:    0
      #<PLAIN 3, PASSED results>
      
        (test 't-fix2)
              ? UAX-15-PARACHUTE-TESTS::T-FIX2
        0.000 ✔   (is = 17.3 (a *some-existing-data-parameter*))
        0.000 ✔   (is = -12 (b *some-existing-data-parameter*))
        0.007 ✔ UAX-15-PARACHUTE-TESTS::T-FIX2
      
      ;; Summary:
      Passed:     2
      Failed:     0
      Skipped:    0
      #<PLAIN 3, PASSED results>
      
        (with-fixtures '(*some-existing-data-parameter*)
          (define-test t-parent-fix3
            (is = -12 (b *some-existing-data-parameter*))
            (is = 17.3 (a *some-existing-data-parameter*)))
          (define-test t-child-fix3 :parent t-parent-fix3
            (is = 12 (abs (b *some-existing-data-parameter*)))))
      
        (test 't-parent-fix3)
              ? UAX-15-PARACHUTE-TESTS::T-PARENT-FIX3
        0.000 ✔   (is = -12 (b *some-existing-data-parameter*))
        0.000 ✔   (is = 17.3 (a *some-existing-data-parameter*))
              ?   UAX-15-PARACHUTE-TESTS::T-CHILD-FIX3
        0.000 ✔     (is = 12 (abs (b *some-existing-data-parameter*)))
        0.003 ✔   UAX-15-PARACHUTE-TESTS::T-CHILD-FIX3
        0.010 ✔ UAX-15-PARACHUTE-TESTS::T-PARENT-FIX3
      
      ;; Summary:
      Passed:     3
      Failed:     0
      Skipped:    0
      #<PLAIN 5, PASSED results>
      
      (test 't-child-fix3)
              ? UAX-15-PARACHUTE-TESTS::T-CHILD-FIX3
        0.000 ✔   (is = 12 (abs (b *some-existing-data-parameter*)))
        0.000 ✔ UAX-15-PARACHUTE-TESTS::T-CHILD-FIX3
      
      ;; Summary:
      Passed:     1
      Failed:     0
      Skipped:    0
      #<PLAIN 2, PASSED results>
      

      You can also nest with-fixtures:

        (defparameter *some-int* 3)
      
        (with-fixtures '(*some-int*)
          (with-fixtures '(*some-existing-data-parameter*)
            (define-test t-parent-fix5
              (is = -12 (b *some-existing-data-parameter*))
              (is = 3 *some-int*))
            (define-test t-child-fix5 :parent t-parent-fix5
              (is = 12 (abs (b *some-existing-data-parameter*)))
              (is = 3 *some-int*))))
      
      (test 't-child-fix5)
              ? UAX-15-PARACHUTE-TESTS::T-CHILD-FIX5
        0.000 ✔   (is = 12 (abs (b *some-existing-data-parameter*)))
        0.000 ✔   (is = 3 *some-int*)
        0.007 ✔ UAX-15-PARACHUTE-TESTS::T-CHILD-FIX5
      
      ;; Summary:
      Passed:     2
      Failed:     0
      Skipped:    0
      #<PLAIN 3, PASSED results>
      
  6. Removing tests

    Parachute can remove specific tests with remove-test or all the tests in a package with remove-all-tests-in-package.

    (remove-test 't1)
    
    (remove-all-tests-in-package optional-package-name)
    
  7. Sequencing, Random, Dependencies and Failure Only

    While tests normally run in a sequential order, parachute allows you to shuffle either the assertions within a test or the tests within a suite by setting :serial to NIL.

      (define-test shuffle
        :serial NIL
        ...)
    
    (define-test shuffle-suite :serial NIL)
    

    You can also shuffle the assertion forms within a test using with-shuffling.

    (define-test random-2
      (let ((a 0))
        (with-shuffling
          (is = 1 (incf a))
          (is = 2 (incf a))
          (is = 3 (incf a)))))
    

    You can make running a test depend on the success or failure of other tests. In the following example, test3 will only run if test1 is successful and test2 is not successful.

    (define-test test3
      :depends-on (:and test1 (:not test2))
      (of-type number (+ 2 3)))
    
  8. Skip Capability

    Parachute has multiple skip abilities including skipping based on assertions, tests or implementations.

    1. Assertions
      (define-test stuff
        (true :pass)
        (skip "Not ready yet"
          (is = 5 (some-unimplemented-function 10))))
      
    2. Tests
      (define-test suite
        :skip (test-a))
      
    3. Implementation
      (define-test stuff
        (skip-on (clisp) "Not supported on clisp."
          (is equal #p"a/b/" (merge-pathnames "b/" "a/"))))
      
  9. Random Data Generators

    None, but the helper libraries can fulfill this need well.

25.4. Discussion

It is certainly possible to extend parachute to retest just the tests that failed last time. Since parachute can run against a list of tests, all you need is a function to save a list of the names of the tests that fail. The following might be one way to do that.

(defun collect-test-failure-names (test-results)
  "This function takes the report output of a parachute test and returns a list of the
   names of the tests that failed."
  (when (typep test-results 'parachute:report)
    (loop for test-result across (results test-results)
          when (and (typep test-result 'parachute::test-result)
                    (eq (status test-result) :failed))
            collect (name (expression test-result)))))
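
A hypothetical way to wire that together (the package name :my-package is made up, and this sketch is untested):

;; Run a whole package's tests, then rerun only the failures,
;; relying on TEST returning the report object and accepting a
;; list of test designators, as described above.
(let* ((report (parachute:test :my-package))
       (failed (collect-test-failure-names report)))
  (when failed
    (parachute:test failed)))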

You might also consider whether the helper library Protest would be a good add-on as it has a Parachute module.

Since I questioned what is meant by extensibility at the very beginning of this report, allow me to quote the Parachute documentation:

"Extending Parachute Test and Result Evaluation

"Parachute follows its own evaluation semantics in order to run tests. Primarily this means that most everything goes through one central function called eval-in-context. This functions allows you to customise evaluation based on both what the context is, and what the object being "evaluated" is.

Usually the context is a report object, but other situations might also be conceived. Either way, it is your responsibility to add methods to this function when you add a new result type, some kind of test subclass, or a new report type that you want to customise according to your desired behaviour.

The evaluation of results is decoupled from the context and reports in the sense that their behaviour does not, by default, depend on it. At the most basic, the result class defines a single :around method that takes care of recording the duration of the test evaluation, setting a default status after finishing without errors, and skipping evaluation if the status is already set to something other than :unknown.

Next we have a result object that is interesting for anything that actually produces direct test results– value-result. Upon evaluation, if the value slot is not yet bound, it calls its body function and stores the return value thereof in the value slot.

However, the result type that is actually used for all standard test forms is the comparison-result. This also takes a comparator function and an expected result to compare against upon completion of the test. If the results match, then the test status is set to :passed, otherwise to :failed.

Since Parachute allows for a hierarchy in your tests, there have to be aggregate results as well, and indeed there are. Two of them, actually. First is the base case, namely parent-result which does two things on evaluation: one, it binds *parent* to itself to allow other results to register themselves upon construction, and two it sets its status to :failed if any of the children have failed.

Finally we have the test-result which takes care of properly evaluating an actual test object. What this means is to evaluate all dependencies before anything else happens, and to check the time limit after everything else has happened. If the time limit has exceeded, set the description accordingly and mark the result as :failed. For its main eval-in-context method however it checks whether any of the dependencies have failed, and if so, mark itself as :skipped. Otherwise it calls eval-in-context on the actual test object.

The default evaluation procedure for a test itself is to simply call all the functions in the tests list in a with-fixtures environment.

And that describes the semantics of default test procedures. Actual test forms like is are created through macros that emit an (eval-in-context context (make-instance 'comparison-result #|…|#)) form. The *context* object is automatically bound to the context object on call of eval-in-context and thus always refers to the current context object. This allows results to be evaluated even from within opaque parts like user-defined functions.

Report Generation

"It should be possible to get any kind of reporting behaviour you want by adding methods that specialise on your report object to eval-in-context. For the simple case where you want something that prints to the REPL but has a different style than the preset plain report, you can simply subclass that and specialise on the report-on and summarize functions that then produce the output you want.

Since you can control pretty much every aspect of evaluation rather closely, very different behaviours and recovery mechanisms are also possible to achieve. One final aspect to note is result-for-testable, which should return an appropriate result object for the given testable. This should only return fresh result objects if no result is already known for the testable in the given context. The standard tests provide for this, however they only ever return a standard test-result instance. If you need to customise the behaviour of the evaluation for that part, it would be a wise idea to subclass test-result and make sure to return instances thereof from result-for-testable for your report.

Finally it should be noted that if you happen to create new result types that you might want to run using the default reports, you should add methods to format-result that specialise on the keywords :oneline and :extensive for the type. These should return a string containing an appropriate description of the test in one line or extensively, respectively. This will allow you to customise how things look to some degree without having to create a new report object entirely."
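
To make that last paragraph concrete, here is a minimal, untested sketch of a custom report, assuming the exported PLAIN class, the SUMMARIZE function and the :report argument to TEST that the quoted documentation refers to:

;; Subclass the preset PLAIN report and override only the summary.
(defclass terse (parachute:plain) ())

(defmethod parachute:summarize ((report terse))
  (format t "~&~d result(s) recorded.~%"
          (length (parachute:results report))))

;; Hypothetical usage; the package name is made up.
(parachute:test :my-package :report 'terse)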

25.5. Who uses parachute

The following list just pulls the results of (ql:who-depends-on :parachute), adding URLs to a few of them. ("3b-hdr" 3d-matrices "3d-vectors" array-utils atomics "binpack" "canonicalized-initargs" cesdi "cl-elastic" "cl-markless" "class-options" "classowary" colored com-on "compatible-metaclasses" "definitions-systems" "enhanced-boolean" "enhanced-defclass" "enhanced-find-class" "enhanced-typep" "evaled-when" "fakenil" "first-time-value" float-features "inheriting-readers" "its" "method-hooks" mmap "nyaml/test" "object-class" "origin.test" pathname-utils "protest/parachute" "radiance" "shared-preferences" "shasht/test" "simple-guess" "slot-extra-options" "trivial-custom-debugger/test" "trivial-jumptables" uax-14 uax-9 "with-output-to-stream" "with-shadowed-bindings")

26. prove


26.1. Summary

homepage Eitaro Fukamachi MIT 2020

As most readers will know, the author has archived Prove in favor of Rove. Compared to Prove, Rove does have better failure reporting, is faster (but still not even in the middle of the pack) and has added fixtures. It is still missing some of the capabilities that Prove has, such as time limits on tests as well as test functions such as is-type, like and is-values expression capabilities.

Prove does report all assertion failures in a test and allows user-generated diagnostic messages, albeit without the ability to include variables. Interactive debugging is optional and it does have a *default-slow-threshold* parameter, which defaults to 150 milliseconds, to handle slow tests.
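
For instance, to flag only assertions slower than half a second (the value is in milliseconds):

(setf prove:*default-slow-threshold* 500)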

On the downside, it does not have fixtures and I find the situation with suites and tags confusing. I really want to be able to turn off progress reports. Finally, it is somewhat slower than most of the frameworks, but not the orders of magnitude slower that you face using clunit, clunit2 or nst.

26.2. Assertion Functions

ok is isnt is-values is-type like
is-print is-error is-expand pass fail skip


26.3. Usage

  1. Report Format

    Set prove:*debug-on-error* to T to invoke the CL debugger whenever an error occurs while running tests.

    Prove has four different reporters (:list, :dot, :tap or :fiveam, with :list being the default) for different formatting. Set prove:*default-reporter* to the desired reporter to change the format. Let's take a very basic failing test just to see what the syntax and output look like.

    We have inserted diagnostic strings into the first two assertions. The second string has a format directive just to show that prove does not process format directives in these messages.

    (deftest t1-fail
      (let ((x 1) (y 2))
       (ok (= 1 2) "We know 1 is not = 2")
       (is 3 4 "We know 3 is not ~a" 4)
       (ok (equal 1 2))
       (ok (equal x y))))
    

    If we now use run-test:

    1. List Reporter
      (run-test 't1-fail)
       T1-FAIL
          × We know 1 is not  = 2
            NIL is expected to be T
      
          × We know 3 is not ~a
            3 is expected to be 4
      
          × NIL is expected to be T
      
          × NIL is expected to be T
      
    2. TAP Reporter
      (setf *default-reporter* :tap)
       (run-test 't1-fail)
      # T1-FAIL
          not ok 1 - We know 1 is not  = 2
          #    got: NIL
          #    expected: T
          not ok 2 - We know 3 is not ~a
          #    got: 3
          #    expected: 4
          not ok 3
          #    got: NIL
          #    expected: T
          not ok 4
          #    got: NIL
          #    expected: T
      not ok 4 - T1-FAIL
      NIL
      
    3. DOT Reporter
      (setf *default-reporter* :dot)
      (run-test 't1-fail)
      .
      NIL
      
    4. Fiveam Reporter
      (setf *default-reporter* :fiveam)
      (run-test 't1-fail)
      f
      #\f
      

      We will stick with the default reporter for the rest of this section.

  2. Basics

    Prove has a limited number of test functions so we can check them all out in a single test that we know will pass. We will skip the diagnostic strings except for the like function since we know everything will pass.

    (deftest t1
      (let ((x 1) (y 2))
        (ok (= x 1))
        (is #(1 2 3) #(1 2 3) :test #'equalp)
        (isnt y 3)
        (is-values (values 1 2) '(1 2))
        (is-type #(1 2 3) 'simple-vector)
        (like "su9" "\\d" "Do we have a digit in the tested string?")
        (is-print (princ "jabberwok") "jabberwok")
        (is-error (error 'division-by-zero) 'division-by-zero)))
    

    The like test function uses cl-ppcre regular expressions. The default list reporter will render this as:

    (run-test 't1)
     T1
        ✓ T is expected to be T
    
        ✓ #(1 2 3) is expected to be #(1 2 3)
    
        ✓ 2 is not expected to be 3
    
        ✓ (1 2) is expected to be (1 2)
    
        ✓ #(1 2 3) is expected to be a type of SIMPLE-VECTOR
    
        ✓ Do we have a digit in the tested string?
    
        ✓ (PRINC "jabberwok") is expected to output "jabberwok" (got "jabberwok")
    
        ✓ (ERROR 'DIVISION-BY-ZERO) is expected to raise a condition DIVISION-BY-ZERO (got #<DIVISION-BY-ZERO {10027D1673}>)
    NIL
    

    As you would hope, changing tested functions does not require manually recompiling prove tests. We will skip the proof.

  3. Edge Cases: Values expressions, loops, closures and calling other tests
    1. Value expressions

      Similar to NST and Parachute, Prove does have special functionality with respect to values expressions and can look at the individual values coming from a values expression.

      (deftest t2-values
        (is-values (values 1 2) '(1 2)) ; passes
        (is-values (values 1 2) '(1 3)) ; fails
        (ok (equalp (values 1 2) (values 1 3)))) ; passes
      
    2. Looping and closures.

      Prove has no problems with looping and taking variables declared in a closure surrounding the test. The following passes.

      (let ((l1 '(#\a #\B #\z))
            (l2 '(97 66 122)))
        (deftest t2-loop
          (loop for x in l1 for y in l2 do
            (ok (= (char-code x) y)))))
      
    3. Calling other tests

      Prove has a subtest macro which is intended to allow tests to be nested. So, for example, I compile the following and I get a nice indented report.

      (subtest "sub-1"
        (is 1 1)
        (is 1 2)
        (subtest "sub-2"
          (is 'a 'a)
          (is 'a 'b)))
       sub-1
          ✓ 1 is expected to be 1
      
          × 1 is expected to be 2
      
         sub-2
            ✓ A is expected to be A
      
            × A is expected to be B
      
      NIL
      
      
  4. Suites, tags and other multiple test abilities
    1. Lists of tests

      Prove does not provide a way to run against lists of tests.

    2. Suites

      In spite of the fact that all my examples above are really done in the REPL, prove is at heart based on files of tests. So even without looking for "suite" functions or classes, each file is effectively a suite. Each package is also considered a suite and the macro subtest also creates a suite.

      Prove provides multiple functions to run different sets of tests; hypothetical invocations follow the list.

      • run runs a test target, which can be a file pathname, a directory pathname or an ASDF system name.
      • run-test runs a single test, as we have been doing above.
      • run-test-all runs all the tests in the current package.
      • run-test-package runs all the tests in a specific package.
      • run-test-system runs a testing ASDF system.
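
      For example (every pathname, package and system name below is made up):

      (prove:run #P"/home/me/project/tests/my-tests.lisp") ; run a file of tests
      (prove:run-test-package :my-test-package)            ; run every test in a package
      (prove:run-test-system :my-project-test)             ; run a testing ASDF system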


  5. Fixtures and Freezing Data

    None

  6. Removing tests

    Prove has remove-test and remove-test-all.
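
    Presumably along these lines (a sketch; the test name and exact argument conventions are assumed):

    (prove:remove-test 't1)  ; drop a single test by name
    (prove:remove-test-all)  ; drop every registered test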

  7. Sequencing, Random and Failure Only

    Prove tests run sequentially and I do not see any shuffle or random order functionality. I also do not see a way to collect just the failing tests to be able to rerun just those.

  8. Skip Capability

    Prove can skip a specified number of tests using the skip function. Unfortunately it marks them as passed rather than skipped. You can provide a string as to why they were skipped, but why mark them as passed? In fact, why do you need to count tests at all? You should be able to mark particular tests as skipped.

    (skip 3 "No need to test these on Mac OS X")
    ;->  ✓ No need to test these on Mac OS X (Skipped)
    ;    ✓ No need to test these on Mac OS X (Skipped)
    ;    ✓ No need to test these on Mac OS X (Skipped)
    
  9. Random Data Generators

    None

26.4. Discussion

Prove has a lot of "market share", but I am not sure how much of that is due to cl-project and some of the other libraries by Eitaro Fukamachi like caveman2 and clack that hard code prove into what you are building. Whether you like prove or not, at least it was an attempt to get people to actually test their code.

In spite of the fact that the author has archived prove and stated that rove is now the successor, his libraries have not moved over to rove and prove still has functionality lacking in rove (and vice versa).

If I were to use prove, I would write another test reporter that did not have progress reports and would return a list of just failing tests. I would still have to write my own fixture macros. Or I could just use a framework that does that.


26.5. Who Uses Prove?

Many libraries on quicklisp use prove. If you have quicklisp, you can get a list of those with:

(ql:who-depends-on :prove)


27. ptester


27.1. Summary

homepage Kevin Layer LLGPL 2016

Ptester was released by Allegro. Phil Gold's commentary on ptester in his 2007 blog is still relevant today. "Ptester is barely a test framework. It has no test suites and no test functions. All it provides is a set of macros for checking function results (test (analogous to lisp-unit:assert-equal), test-error, test-no-error, test-warning, and test-no-warning) and a wrapper macro designed to enclose the test clauses which merely provides a count of success and failures at the end. ptester expects that all testing is done in predefined functions and lacks the dynamic approach present in other frameworks."

Yes, you can do testing with it, but you can do much better with other frameworks.

27.2. Assertion Functions

None - normal CL predicates resolving to T or nil


27.3. Usage

  1. Report Format

    Reporting or interactivity is optional. Set *break-on-test-failures* if you want to go into interactive debugging when a test failure occurs.

  2. Basics

    The test macro by default applies eql to the subsequent arguments. This can be changed by specifying the actual test to use. The following includes assertions about errors and warnings. The one item that might need a little explanation is the values test, where we explicitly flag that the test needs to look at multiple values.

      (with-tests (:name "t1")
        (test 1 1)
        (test 'a 'a)
        (test "ptester" "ptester" :test 'equal)
        (test  '(a b c) (values 'a 'b 'c) :multiple-values t)
        (test-error (error 'division-by-zero) :condition-type 'division-by-zero)
        (test-warning (warn "foo")))
    
    Begin t1 test
    **********************************
    End t1 test
    Errors detected in this test: 0
    Successes this test:6
    

    Now with a deliberately failing test. No, you cannot compare two values expressions with each other.

      (with-tests (:name "t2")
        (let ((x 2) (y 'd))
          (test x 1)
          (test y 'a)
          (test "ptester" "ptester" :test 'equal)
          (test  '(values 'a 'b 'c) (values 'a 'b 'c) :multiple-values t)
          (test-error (error 'division-by-zero) :condition-type 'floating-point-overflow)
          (test-warning (warn "foo"))))
    Begin t2 test
     * * * UNEXPECTED TEST FAILURE * * *
    Test failed: 1
      wanted: 2
         got: 1
     * * * UNEXPECTED TEST FAILURE * * *
    Test failed: 'A
      wanted: D
         got: A
     * * * UNEXPECTED TEST FAILURE * * *
    Test failed: (VALUES 'A 'B 'C)
      wanted values: VALUES, 'A, 'B, 'C
         got values: A, B, C
     * * * UNEXPECTED TEST FAILURE * * *
    Test failed: (ERROR 'DIVISION-BY-ZERO)
    Reason: detected an incorrect condition type.
      wanted: FLOATING-POINT-OVERFLOW
         got: #<SB-PCL::CONDITION-CLASS COMMON-LISP:DIVISION-BY-ZERO>
    **********************************
    End t2 test
    Errors detected in this test: 4 UNEXPECTED: 4
    Successes this test:2
    
  3. Edge Cases: Closures and calling other tests

    Ptester has no problem dealing with variables declared in a closure encompassing the test or with loops.

    Since ptester does not have a callable test "instance", a ptester test cannot call another test.

  4. Suites, tags and other multiple test abilities
    1. Lists of tests

      None except using with-tests

    2. Suites

      None except using with-tests

  5. Fixtures and Freezing Data

    None

  6. Removing tests

    None

  7. Sequencing, Random and Failure Only

    None

  8. Skip Capability

    None

  9. Random Data Generators

    None

27.4. Discussion

You can do better with other frameworks.


27.5. Who depends on ptester?

("cl-base64-tests" "getopt-tests" "puri-tests")


28. rove

28.1. Summary

homepage Eitaro Fukamachi BSD 3 Clause 2020

If you use package-inferred systems, there may be more capabilities than if you do not. Without a package-inferred system, you get no consolidated summary of all the tests and you may have to write your tests differently than with a package-inferred system. It does have fixtures that can be used once per package or once per test, but there is no ability to use different fixtures for different tests and no composable fixtures. In addition, signal testing seems incomplete compared to other frameworks.

As noted in the functionality tables, there have been reports that rove crashes with multithreaded results. See https://tychoish.com/post/programming-in-the-common-lisp-ecosystem/: "rove doesn't seem to work when multi-threaded results effectively. It's listed in the readme, but I was able to write really trivial tests that crashed the test harness."

Rove is sensitive to how you write the tests, at least if compiled with SBCL. The first way I wrote the tests triggered memory errors I could not recover from (but only in SBCL, not with CCL). The second way I wrote the tests worked perfectly fine. Your mileage may vary.

Compared to Prove, Rove does have better failure reporting, is faster (but still not even in the middle of the pack) and has added fixtures. It is still missing some of the capabilities that Prove has, such as time limits on tests as well as test functions such as is-type, like and is-values expression capabilities.

Given the multithreaded concerns, the issue I had with benchmarking and the missing functionality both with respect to non-package-inferred systems and in comparison to Prove, I cannot recommend Rove.

28.2. Assertion Functions

As mentioned above, Rove does not have as many assertion functions as Prove, the library it is supposed to be replacing. The assertion functions are limited to:

ok ng (not-good?) signals outputs expands pass fail

28.3. Usage

  1. Report Format

    It has three different styles of reporting. The default style is the detailed :spec style. A simpler style that just shows dot progression is :dot, and a style that just reports the result is :none. Turning off progress reporting means using the :none style. We show them all in the first basic passing test.

    To go interactive rather than just reporting, (setf rove:*debug-on-error* t).

  2. Basics

    Starting off with a basic test with multiple passing assertions. We have added a macro and an assertion that uses the expands capability provided by Rove. I admit to not being entirely clear why deftest and testing are separate macros. Adding the testing macro allows a description string, but I am not seeing other additional functionality. Can anyone hit me with a clue stick?

    (defmacro defun-addn (n)
      (let ((m (gensym "m")))
        `(defun ,(intern (format nil "ADD~A" n)) (,m)
           (+ ,m ,n))))
    
    (deftest t1
      (testing "Basic passing test"
        (ok (equal 1 1))
        (ok (signals (error 'division-by-zero) 'division-by-zero))
        (ng (equal 1 2))
        (ok (expands '(defun-addn 10)
                 `(defun add10 (#:m)
                    (+ #:m 10))))))
    
    (rove:run-test 't1)
    t1
      Basic passing test
        ✓ Expect (EQUAL 1 1) to be true.
        ✓ Expect (ERROR 'DIVISION-BY-ZERO) to signal DIVISION-BY-ZERO.
        ✓ Expect (EQUAL 1 2) to be false.
        ✓ Expect '(DEFUN-ADDN 10) to be expanded to `(DEFUN ADD10 (#:M) (+ #:M 10)).
    
    ✓ 1 test completed
    T
    

    You can add a :compile-at keyword parameter to deftest. The available options are :definition-time (the default) or :run-time.

    You can add a :style keyword parameter to run-test to get different formats. The above was the default :spec style. Below we show the :dot and :none styles.

    (rove:run-test 't1 :style :dot)
    ....
    
    ✓ 1 test completed
    T
    (rove:run-test 't1 :style :none)
    T
    

    On to a failing test. In this case we pass a diagnostic string to the first two assertions. Rove does not allow variables to be passed to the diagnostic string. In the :spec style, Rove will show the parameters that were provided to the second assertion that failed.

    (deftest t2
      (testing "Basic failing test"
        (let ((x 1) (y 2))
          (ok (equal 1 2) "we know 1 is not equal to 2")
          (ok (equal x y) "we know ~a is not equal to ~a")
          (ok (equal (values 1 2) (values 1 2))))))
    T2
    ROVE> (run-test 't2)
    t2
      Basic failing test
        × 0) we know 1 is not equal to 2
        × 1) we know ~a is not equal to ~a
        ✓ Expect (EQUAL (VALUES 1 2) (VALUES 1 2)) to be true.
    
    × 1 of 1 test failed
    
    0) t2
         › Basic failing test
       we know 1 is not equal to 2
         (EQUAL 1 2)
    
    1) t2
         › Basic failing test
       we know ~a is not equal to ~a
         (EQUAL X Y)
             X = 1
             Y = 2
    

    Rove does not require you to manually recompile a test after a tested function has been modified.

  3. Edge Cases: Values Expressions, loops, closures and calling other tests
    1. Values Expressions

      Unlike Prove (or Lift or Parachute), Rove has no special functionality for dealing with values expressions. It accepts values expressions but only compares the first value in each. Thus the following passes:

      (deftest t2-values-expressions
          (testing "values expressions"
            (ok (equalp (values 1 2) (values 1 3)))
            (ok  (equalp (values 1 2 3) (values 1 3 2 7)))))
      
    2. Looping and closures

      Rove has no problem looping through assertions pulling the variables from a closure.

      (let ((l1 '(#\a #\B #\z))
            (l2 '(97 66 122)))
        (deftest t2-loop
          (loop for x in l1 for y in l2 do
            (ok (= (char-code x) y)))))
      
    3. Tests calling tests

      Rove tests can call other Rove tests. As with most frameworks, this results in two test results rather than a combined test result.

  4. Conditions

    We saw Rove checking an error condition in the first basic passing test. I want to show what happens when a condition assertion fails, because the result differs depending on whether the assertion function is ok or ng. It does not throw you into the debugger (we have *debug-on-error* set to nil), but it shows the typical debugger output.

    (deftest t7-wrong-error
      (ok (signals (error 'floating-point-overflow)
              'division-by-zero)))
    
    (rove:run-test 't7-wrong-error)
    t7-wrong-error
      × 0) Expect (ERROR 'FLOATING-POINT-OVERFLOW) to signal DIVISION-BY-ZERO. (3333ms)
    
    × 1 of 1 test failed
    
    0) t7-wrong-error
       Expect (ERROR 'FLOATING-POINT-OVERFLOW) to signal DIVISION-BY-ZERO.
       FLOATING-POINT-OVERFLOW: arithmetic error FLOATING-POINT-OVERFLOW signalled
         (SIGNALS (ERROR 'FLOATING-POINT-OVERFLOW) 'DIVISION-BY-ZERO)
    
         1: ((FLET "H0" :IN #:DROP-THRU-TAG-2) arithmetic error FLOATING-POINT-OVERFLOW signalled)
         2: (SB-KERNEL::%SIGNAL arithmetic error FLOATING-POINT-OVERFLOW signalled)
         3: (ERROR FLOATING-POINT-OVERFLOW)
         4: ((LABELS ROVE/CORE/ASSERTION::MAIN :IN #:DROP-THRU-TAG-2))
         5: ((FLET "MAIN0" :IN #:DROP-THRU-TAG-2))
         6: ((LAMBDA NIL))
         7: ((LAMBDA NIL :IN RUN-TEST))
         8: ((:METHOD ROVE/REPORTER:INVOKE-REPORTER (T T)) #<SPEC-REPORTER PASSED=0, FAILED=1> #<FUNCTION (LAMBDA NIL :IN RUN-TEST) {102D9E34FB}>)
         9: (SB-INT:SIMPLE-EVAL-IN-LEXENV (RUN-TEST (QUOTE T7-WRONG-ERROR)) #<NULL-LEXENV>)
         10: (EVAL (RUN-TEST (QUOTE T7-WRONG-ERROR)))
         11: (SWANK::EVAL-REGION (rove:run-test 't7-wrong-error)
         )
         12: ((LAMBDA NIL :IN SWANK-REPL::REPL-EVAL))
         13: (SWANK-REPL::TRACK-PACKAGE #<FUNCTION (LAMBDA NIL :IN SWANK-REPL::REPL-EVAL) {102D9E317B}>)
         14: (SWANK::CALL-WITH-RETRY-RESTART Retry SLIME REPL evaluation request. #<FUNCTION (LAMBDA NIL :IN SWANK-REPL::REPL-EVAL) {102D9E311B}>)
         15: (SWANK::CALL-WITH-BUFFER-SYNTAX NIL #<FUNCTION (LAMBDA NIL :IN SWANK-REPL::REPL-EVAL) {102D9E30FB}>)
    

    I am surprised, however, at the results if we change the assertion from ok to ng. We know it is going to be the wrong error, so I would have expected the ng assertion function to return a pass. But it does not.

    (deftest t7-wrong-error-NG
      (ng (signals (error 'floating-point-overflow)
              'division-by-zero)))
    
      × 0) Expect (ERROR 'FLOATING-POINT-OVERFLOW) not to signal DIVISION-BY-ZERO.
    
    × 1 of 1 test failed
    
    0) t7-wrong-error-ng
       Expect (ERROR 'FLOATING-POINT-OVERFLOW) not to signal DIVISION-BY-ZERO.
       FLOATING-POINT-OVERFLOW: arithmetic error FLOATING-POINT-OVERFLOW signalled
         (SIGNALS (ERROR 'FLOATING-POINT-OVERFLOW) 'DIVISION-BY-ZERO)
    ...
    

    signals returns either true or an error. ng expects T or NIL, so getting back an error triggers an error rather than a failure.

  5. Suites, tags and other multiple test abilities
    1. Lists of tests

      Rove does not run lists of tests.

    2. Suites

      Rove's RUN-SUITE function will run all the tests in a particular package but does not accept a style parameter and simply prints out the results of each individual test, without summarizing.

      Rove's RUN function does accept a style parameter but seems to handle only package-inferred systems. I confirm issue #42 that it will not run with non-package-inferred systems.

      Since the author really likes the lots of packages style of structuring CL programs, I would not be surprised if he recommends having lots of test packages as the equivalent of how other testing frameworks treat suites of tests.

      (run-suite :tf-rove)
      

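      If your test system is package-inferred, the RUN call described above might look like this (a sketch; the system name is made up):

      (rove:run :my-app/tests :style :dot)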

  6. Fixtures and Freezing Data

    Rove provides SETUP for fixtures that are done once only in a package and TEARDOWN for cleanup. For a fixture that should be run before and after every test, Rove provides DEFHOOK.

    (defparameter *my-var-suite* 0)
    (defparameter *my-var-hook* 0)
    (setup
      (incf *my-var-suite*))
    
    (teardown
      (format t "Myvar ~a~%" *my-var-suite*))
    
    (defhook
        :before (incf *my-var-hook*)
        :after (format t "My-var-hook ~a~%" *my-var-hook*))
    
  7. Removing tests

    None, apparently.

  8. Sequencing, Random and Failure Only

    Everything is just done in sequential order. There is no obvious way to collect and run just failed tests.

  9. Skip Capability
    1. Assertions

      Yes

    2. Tests

      No

    3. Implementation

      No

  10. Random Data Generators

    None

28.4. Additional Discussion Points

The author claims rove is the successor to prove and cites the following differences: Rove supports package-inferred systems, has fewer dependencies, reports details of failing tests, has thread support and has fixtures.

Rove is clearly targeted at package-inferred systems. In fact some of the functionality does not work unless your system is package-inferred. Personally I do not like package-inferred systems. Other people have the completely opposite view. In any event, I did not test any of the frameworks with a package-inferred system, so I cannot comment on whether they work or do not work in that circumstance.

To show that Rove actually is improved over Prove with respect to reporting details on failure, the following shows first prove, then rove, on a simple failing test:

(let ((x 1) (y 2))
  (deftest t35
      (ok (= x y))))

Running with Prove

(run-test 't35)
T34-PROVE
× NIL is expected to be T

Now Rove:

  (run-test 't35)
  t35
  × 0) Expect (= X Y) to be true.

× 1 of 1 test failed

0) t34-rove
   Expect (= X Y) to be true.
     (= X Y)
         X = 1
         Y = 2
NIL

Both prove and rove would have accepted diagnostic message strings in the assertion.

On the whole, my concerns expressed in the summary still stand. There are better frameworks out there.

28.5. Who Uses Rove?

Many libraries on quicklisp use rove. If you have quicklisp, you can get a list of those with:

(ql:who-depends-on :rove)


29. rt

29.1. Summary

  Kevin M. Rosenberg MIT 2010

RT reminds me of Ptester (I wonder why) and is a part of CL history. See, e.g. Supporting the Regression Testing of Lisp Programs in 1991. Tests are limited to a single assertion and everything seems to be an A-B comparison using EQUAL. While you might think it is just of historical significance, there are still a surprising number of packages in quicklisp (29 at last count) that use it including major packages like ironclad, cffi, usocket, clsql and anaphora.

29.2. Assertion Functions

RT's tests do not accept multiple assertions. The test itself acts as a single assertion: the form's result is compared with EQUAL against the expected value.


29.3. Usage

  1. Report Format and Basics

    We start with a basic passing test just to show the reporting.

    (deftest t1
      (= 1 1)
      t)
    
    (do-test 't1)
    T1
    ; processing (DEFTEST T4 ...)
    

    Now a deliberately failing test:

    (deftest t1-fail
      (= 1 2)
      t)
    
    (do-test 't1-fail)
    
    Test T1-FAIL failed
    Form: (= 1 2)
    Expected value: T
    Actual value: NIL.
    
  2. Multiple assertions, loops, closures and calling other tests

    RT tests do not handle multiple assertions, loops, closures or calling other tests.

  3. Suites, tags and other multiple test abilities
    1. Lists of tests

      RT cannot directly handle lists of tests (although you could loop through a list, the results would not be composable).

    2. Suites

      RT does not have suites per se. You can run all the tests that have been defined using the DO-TESTS function. By default it prints to *standard-output* but accepts an optional stream parameter which would allow you to redirect the results to a file or other stream of your choice. do-tests will print the results for each individual test and then summarize with something like the following:

      5 out of 8 total tests failed: T4, T1-FAIL, T1-FUNCTION, T2-LOOP,
         T2-LOOP-CLOSURE.
      
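      For example, the optional stream parameter lets you capture the full report in a file (a sketch; the pathname is made up):

      (with-open-file (s #P"rt-results.txt" :direction :output
                         :if-exists :supersede)
        (do-tests s))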
  4. Fixtures and Freezing Data

    None, although the package that tests rt itself has a setup macro that could have been placed in the rt package for fixtures. You could use it as a reference for writing your own.

  5. Removing tests

    RT has rem-test and rem-all-tests functions.
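
    A sketch, assuming a test named t1 was defined earlier:

    (rem-test 't1)    ; remove a single test by name
    (rem-all-tests)   ; clear every defined test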

  6. Sequencing, Random and Failure Only

    RT runs tests in their order of original definition.

  7. Skip Capability

    None

  8. Random Data Generators

    None

29.4. Discussion

While it is still used in major projects, I think Parachute or Fiasco would be better if you are starting a new project.


29.5. Who depends on rt?

("anaphora" "cffi" "cl-azure" "cl-cont" "cl-irc" "cl-performance-tuning-helper" "cl-photo" "cl-sentiment" "cl-store" "clsql" "cxml-stp/" "hyperobject" "infix-dollar-reader" "ironclad" "kmrcl" "lapack" "lml" "lml2" "narrowed-types" "nibbles/s" "osicat" "petit.string-utils" "qt" "quadpack" "trivial-features" "trivial-garbage" "umlisp" "usocket" "xhtmlgen")


30. should-test

30.1. Summary

homepage Vsevolod Dyomkin MIT 2019

Should-test is pretty basic. It will report all the failing assertions in a test and does offer the opportunity to provide diagnostic strings for assertions, albeit without variables. It also offers the opportunity to run just the tests that failed last time, so you do not have to run through all the tests in the package every time. Unfortunately you cannot turn off progress reporting, you cannot go interactive into the debugger, it has no fixture capability and it cannot run lists of tests. Its suite capability is limited to creating separate packages.

30.2. Assertion Functions

Assertion types are minimal:

be signal print-to


30.3. Usage

  1. Report Format

    The summary report will contain full failure reports if *verbose* is set to T (the default) or just test names otherwise.

    There is no optionality with respect to reporting or interactivity; it is all reporting.
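
    So the only knob is *verbose* (a sketch, assuming should-test's st package nickname):

    (setf st:*verbose* nil) ; summary shows test names only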

  2. Basics

    The basic all-assertions-passing test, showing use of both be and signal. Calling test with the keyword parameter :test lets us specify the test to be run. One item that is not clear is the function of the empty list following the test name.

    (deftest t1 ()
      (should be = 1 1)
      (should signal division-by-zero (error 'division-by-zero)))
    
    (test :test 't1)
    Test T1:   OK
    T
    

    It just reported that the entire test passed.

    Now a basic failing test. This should have three failing assertions and one passing assertion. We put a diagnostic string in the first assertion; it shows up in the result, but should-test does not allow us to insert variables into the string.

    (deftest t1-fail ()
      "describe t1-fail"
      (let ((x 1)(y 2))
        (should be = x y "intentional failure x ~a y ~a" x y)
        (should be = (+ x 2) (+ x 3))
        (should be equal (values 1 2) (values 1 2))
        (should signal division-by-zero (error 'floating-point-overflow))))
    
    (test :test 't1-fail)
    Test T1-FAIL:
    Y FAIL
    expect: 1 2 "intentional failure x ~a y ~a" 1
    actual: 2
    (+ X 3) FAIL
    expect: 3
    actual: 4
    (ERROR 'FLOATING-POINT-OVERFLOW) FAIL
    expect: DIVISION-BY-ZERO
    actual: #<FLOATING-POINT-OVERFLOW {1009B89F63}>
      FAILED
    NIL
    (#<FLOATING-POINT-OVERFLOW {1009B89F63}> (4) (2))
    NIL
    

    Should-test has no special functionality for dealing with values expressions. It does accept them but, as you would expect, only looks at the first value in each values expression. The following will pass.

    (deftest t1-unequal-values ()
      (should be equal (values 1 2) (values 1 3)))
    

    We get the expected and actual values without the extra blank lines that annoy me in fiveam. The list at the end shows the specific actual assertion values that failed.

    If we had set *verbose* to nil, we would have just gotten the tail of the report:

    Test T1-FAIL:   FAILED
    NIL
    ((4) (2))
    NIL
    

    Should-test handles redefinitions of tested functions without forcing you to manually recompile the test. We will skip the proof.

  3. Edge Cases: Closures and calling other tests
    1. Looping and closures.

      Should-test cannot access the variables declared in a closure encompassing the test. This does not work:

      (let ((l1 '(#\a #\B #\z))
            (l2 '(97 66 122)))
        (deftest t2-loop-closure ()
          (loop for x in l1 for y in l2 do
            (should be = (char-code x) y))))
      
    2. Calling other tests

      Suppose you defined a test which also calls another test.

      (deftest t3 ()
        (should be = 1 1)
        (test :test 't1-fail))
      

      We know that t1-fail will fail. Will embedding it in test t3 cause t3 to fail as well? Yes.

      Test T3: Test T1-FAIL:
      Y FAIL
      expect: 1 2 "intentional failure x ~a y ~a" 1
      actual: 2
      (+ X 3) FAIL
      expect: 3
      actual: 4
      (ERROR 'FLOATING-POINT-OVERFLOW) FAIL
      expect: DIVISION-BY-ZERO
      actual: #<FLOATING-POINT-OVERFLOW {100A218BC3}>
        FAILED
        FAILED
      NIL
      (#<FLOATING-POINT-OVERFLOW {100A218BC3}> (4) (2))
      NIL
      
  4. Suites, tags and other multiple test abilities
    1. Lists of tests

      Should-test does not handle lists of tests.

    2. Suites

      The test function for Should-test runs all the tests in the current package by default. As you have seen above, giving it a :test keyword parameter will trigger just the named test. Giving it a :package keyword parameter will cause it to run all the tests in the specified package. The :failed key to test will re-test only the tests which failed at their last run. All in all, there are better frameworks.

      (test :failed t)
      


  5. Fixtures and Freezing Data

    None

  6. Removing tests

    None

  7. Sequencing or Random

    Should-test will run tests in the same order each time (no shuffle capability). As noted in the suite discussion, it is one of the few frameworks to have failure only functionality built in.

    (test :failed t)
    
  8. Skip Capability

    None

  9. Random Data Generators

    None


30.4. Who Depends on Should-test

("cl-redis-test" "mexpr-tests" "rutils-test")


31. simplet

31.1. Summary

homepage Noloop GPLv3 2019

In simplet, a test contains a single assertion that it gets from some function, and suites can take multiple tests. From the standpoint of other frameworks, simplet "tests" are the assertion clauses and simplet "suites" are the way to package multiple assertions. If a suite has no tests, or a test has no function returning T or NIL, they are marked "PENDING".

Simplet's run function takes only an optional parameter to return a string rather than printing to the REPL.

I am just going to show one example of usage and leave it at that. Given all the functionality in other frameworks, I cannot recommend it.

(suite "suite 2"
       (test "one"
         #'(lambda ()
             (let ((x 1))
               (= x 1))))
       (test "two"
         #'(lambda ()(eq 'a 'a)))
       (test "three" #'(lambda ()(= 2 1)))
       (test "four" #'(lambda ()(= 1 1))))
(#<FUNCTION (LAMBDA () :IN NOLOOP.SIMPLET::CREATE-SUITE) {100A391A6B}>)

(run)
#...Simplet...#

one: T
two: T
three: NIL
four: T
-----------------------------------
suite 2: NIL

Runner result: NIL

NIL

The author uses simplet in testing assert-p, eventbus and skeleton-creator.

32. tap-unit-test

32.1. Summary

homepage Christopher K. Riesbeck, John Hanley MIT 2017

Tap-unit-test is a fork of a slightly older version of lisp-unit with TAP reporting added. There have not been any real updates since 2011 and I cannot find anyone using it, so I would simply look to either lisp-unit or lisp-unit2 if you like their approach to things.

32.2. Assertion Functions

assert-eq assert-eql assert-equal assert-equality
assert-equalp assert-error assert-expands assert-false
assert-prints assert-true fail logically-equal
set-equal unordered-equal    


32.3. Usage

  1. Report Format and basic syntax

    TAP-unit-test defaults to a reporting format shown below. You can do (setf *use-debugger* :ask) or (setf *use-debugger* t), but that will only throw you into the debugger if there is an actual error generated, not a failure (or failure to see the correct error).

    We can start with a basic failing test to show the reporting format. We will provide a diagnostic string in the first assertion. Tap-unit-test has an unordered-equal assertion helper that might be useful for some, as shown in this example:

    (define-test t1-fail
      "describe t1-fail"
      (let ((x 1))
        (assert-true (= x 2) "Deliberate failure. We know 2 is not ~a" x)
        (assert-equal x 3)
        (assert-true (unordered-equal '(3 2 1 1) '(1 2 3 2))) ; Return true if l1 is a permutation of l2.
        (assert-true (set-equal '(a b c d) '(b a c c))) ;every element in both sets needs to be in the other
        (assert-error 'division-by-zero
                      (error 'floating-point-overflow)
                      "testing condition assertions")
        (assert-true (unordered-equal '(3 2 1) '(1 3 4)))
        (assert-true (logically-equal t nil)) ; both true or both false
        (assert-true (logically-equal nil t)))) ; both true or both false
    

    Unlike lisp-unit, when you call run-tests in tap-unit-test, you pass unquoted test names, even when you are running it on several tests. Also note that it does not return any type of object as a test result. If we now run it we get the following report:

    (run-tests t1-fail)
    
    T1-FAIL: (= X 2) failed:
    Expected T but saw NIL
       "Deliberate failure. We know 2 is not ~a" => "Deliberate failure. We know 2 is not ~a"
       X => 1
    T1-FAIL: 3 failed:
    Expected 1 but saw 3
    T1-FAIL: (UNORDERED-EQUAL '(3 2 1 1) '(1 2 3 2)) failed:
    Expected T but saw NIL
    T1-FAIL: (SET-EQUAL '(A B C D) '(B A C C)) failed:
    Expected T but saw NIL
    T1-FAIL: (ERROR 'FLOATING-POINT-OVERFLOW) failed:
    Should have signalled DIVISION-BY-ZERO but saw #<FLOATING-POINT-OVERFLOW {100D41D203}>
       "testing condition assertions" => "testing condition assertions"
    T1-FAIL: (UNORDERED-EQUAL '(3 2 1) '(1 3 4)) failed:
    Expected T but saw NIL
    T1-FAIL: (LOGICALLY-EQUAL T NIL) failed:
    Expected T but saw NIL
    T1-FAIL: (LOGICALLY-EQUAL NIL T) failed:
    Expected T but saw NIL
    T1-FAIL: 0 assertions passed, 8 failed.
    NIL
    

    With tap-unit-test you do not need to manually recompile tests when a tested function is modified. We will skip the proof.

    1. Edge Cases: Value expressions, closures and calling other tests
      1. Values expressions

        Tap-unit-test accepts values expressions as input. Whether it compares every value or just the first is answered below.

        (define-test t2
          "describe t2"
          (assert-equal 1 2)
          (assert-equal 2 3)
          (assert-equalp (values 1 2) (values 1 2)))
        
        (run-tests t2)
        

        We get what we expected, two failing assertions and one passing assertion. Does tap-unit-test follow the lisp-unit ability to actually look at all members of the values expression or just the first one? Yes. So far only the two lisp-units and tap-unit-test actually compared each item in two values expressions.

        (define-test t2-values-expressions
            (assert-equal (values 1 2) (values 1 3))
            (assert-equal (values 1 2 3) (values 1 3 2)))
        
        (run-tests t2-values-expressions)
        T2-VALUES-EXPRESSIONS: (VALUES 1 3) failed:
        Expected 1; 2 but saw 1; 3
        T2-VALUES-EXPRESSIONS: (VALUES 1 3 2) failed:
        Expected 1; 2; 3 but saw 1; 3; 2
        T2-VALUES-EXPRESSIONS: 0 assertions passed, 2 failed.
        
      2. Closures

        Unfortunately no luck with closure variables. It does, however, handle looping through assertions if the variables are dynamic or defined within the test. We will skip the proof.

      3. Calling another test

      While tests are not functions in tap-unit-test, they can call other tests.

      (define-test t3
        "describe t3"
        (assert-equal 'a 'a)
        (run-tests t2))

      (run-tests t3)
      T2: 2 failed:
      Expected 1 but saw 2
      T2: 3 failed:
      Expected 2 but saw 3
      T2: 1 assertions passed, 2 failed.
      T3: 1 assertions passed, 0 failed.
      
  2. Suites, tags and other multiple test abilities
    1. Lists of tests

      As mentioned earlier, tap-unit-test uses unquoted test names and does not return any kind of test-results object. Running multiple specific tests would look like the following.

      (run-tests t7-bad-error t1-fail)
      T7-BAD-ERROR: (ERROR 'FLOATING-POINT-OVERFLOW) failed:
      Should have signalled DIVISION-BY-ZERO but saw #<FLOATING-POINT-OVERFLOW {100BEDBFA3}>
         "testing condition assertions. This should fail" => "testing condition assertions. This should fail"
      T7-BAD-ERROR: 0 assertions passed, 1 failed.
      T1-FAIL: Y failed:
      Expected 1 but saw 2
      12
      T1-FAIL: 1 assertions passed, 1 failed.
      TOTAL: 1 assertions passed, 2 failed, 0 execution errors.
      
    2. Packages

      If you want to run all the tests in a package, just call run-tests with no parameters:
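
      (run-tests) ; runs every test defined in the current package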

    3. Suites and Tags

      Tap-unit-test has no suites or tags capability

  3. Fixtures and Freezing Data

    None

  4. Removing tests

    Tap-unit-test has a remove-tests function which, unlike some of the other functions that use unquoted names, actually does take a quoted list of test names.

    (remove-tests '(t1 t2))
    
  5. Sequencing, Random and Failure Only

    None

  6. Skip Capability

    None

  7. Generators

    Tap-unit-test has a make-random-state function for generating random data. See example below:

    (make-random-state)
    #S(RANDOM-STATE :STATE #.(MAKE-ARRAY 627 :ELEMENT-TYPE '(UNSIGNED-BYTE 32)
                                         :INITIAL-CONTENTS
                                         '(0 2567483615 454 2531281407 4203062579
                                           3352536227 284404050 622556438
                                           ...)))
    

32.4. Discussion

Basically lisp-unit and lisp-unit2 have moved on and tap-unit-test exists for historical reasons. There are enough syntactic differences that if someone is using it for an existing code base, pulling it out of quicklisp could cause breakage. No one is using it as far as I can tell.


33. try


33.1. Summary

homepage Gábor Melis MIT 2022

Try's self description is:

"Try is what we get if we make tests functions and build a test framework on top of the condition system as Stefil did but also address the issue of rerunning and replaying, make the IS check more capable, use the types of the condition hierarchy to parameterize what to debug, print, rerun, and finally document the whole thing."

"Try is a library for unit testing with equal support for interactive and non-interactive workflows. Tests are functions, and almost everything else is a condition, whose types feature prominently in parameterization."

33.2. Assertion Functions

is        
expected-result-success unexpected-result-success expected-result-failure unexpected-result-failure  


33.3. Usage

  1. Basics and Report Format
    1. Interactive

      Each defined test is a lisp function that records its execution in "trial" objects. There are different defaults intended for interactive and non-interactive modes. When a test is called as a Lisp function, the interactive defaults are used and unexpected failures invoke the debugger. When a test is called with TRY, the non-interactive defaults are used and the debugger is not invoked. In more detail, when the debugger is to be invoked is determined by *DEBUG* and *TRY-DEBUG*, which are both event types.
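
      As a sketch of that parameterization (assuming Try's exported UNEXPECTED event type; untested):

        ;; MY-TEST is a stand-in for any deftest'd test taking no arguments.
        (try:try 'my-test :debug 'try:unexpected) ; debug even non-interactive runs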

      1. Basic Passing Test

        Here we define a test that takes one parameter and then call the test with a parameter we know will pass and see the results.

        (deftest test1-pass (x)
             (is (= x 1)))
        

        We can either funcall the test or just call it directly: (test1-pass 1).

          (test1-pass 1)
        
        TEST1-PASS
          ⋅ (IS (= X 1))
        ⋅ TEST1-PASS ⋅1
        #<TRIAL (TEST1-PASS 1) EXPECTED-SUCCESS 0.000s ⋅1>
        

        The first line shows the beginning of a defined test, the second the successful assertion, the third the cumulative test result and the fourth indicates what was recorded in the trial object. If you look at the last three lines, the dot before the 1 is a "marker" that indicates a "category" of expected success, and the 1 represents the number of expected successes actually found.

        The total list of categories and their markers is:

        ((abort*             :marker "⊟")
         (unexpected-failure :marker "⊠")
         (unexpected-success :marker "⊡")
         (skip               :marker "-")
         (expected-failure   :marker "×")
         (expected-success   :marker "⋅"))
        
      2. Basic Failing Test

        The following has two assertions, both of which fail. In this case we will funcall the test and it will throw us into the debugger with restarts. We would get the same result if we just called it as a normal function (t1-fail):

          (deftest t1-fail ()
          (let ((x 1) (y 2))
           (is (equal 1 2))
            (is (= x y)
                :msg "Intentional failure x does not equal y"
                :ctx ("*PACKAGE* is ~S and *PRINT-CASE* is ~S~%"
                     *package* *print-case*))))
        
          (funcall 't1-fail)
        
          UNEXPECTED-FAILURE in check:
          (IS (EQUAL 1 2))
           [Condition of type UNEXPECTED-RESULT-FAILURE]
        
        Restarts:
         0: [RECORD-EVENT] Record the event and continue.
         1: [FORCE-EXPECTED-SUCCESS] Change outcome to TRY:EXPECTED-RESULT-SUCCESS.
         2: [FORCE-UNEXPECTED-SUCCESS] Change outcome to TRY:UNEXPECTED-RESULT-SUCCESS.
         3: [FORCE-EXPECTED-FAILURE] Change outcome to TRY:EXPECTED-RESULT-FAILURE.
         4: [ABORT-CHECK] Change outcome to TRY:RESULT-ABORT*.
         5: [SKIP-CHECK] Change outcome to TRY:RESULT-SKIP.
         6: [RETRY-CHECK] Retry check.
         7: [ABORT-TRIAL] Record the event and abort trial UAX-15-TRY-TESTS::T1-FAIL.
         8: [SKIP-TRIAL] Record the event and skip trial UAX-15-TRY-TESTS::T1-FAIL.
         9: [RETRY-TRIAL] Record the event and retry trial UAX-15-TRY-TESTS::T1-FAIL.
         10: [SET-TRY-DEBUG] Supply a new value for :DEBUG of TRY:TRY.
         11: [RETRY] Retry SLIME REPL evaluation request.
         12: [*ABORT] Return to SLIME's top level.
         13: [ABORT] abort thread (#<THREAD "new-repl-thread" RUNNING {1002487C03}>)
        
        Backtrace:
          0: (TRY::SIGNAL-OUTCOME T FAILURE (:CHECK (IS (EQUAL 1 2)) :ELAPSED-SECONDS 0 :CAPTURES NIL ...))
          1: ((FLET "DEFTEST" :IN T1-FAIL) #<unused argument>)
          2: (TRY::CALL-TEST)
          3: (T1-FAIL)
        

        If you call restart 13 to abort, the REPL prints out the following:

          T1-FAIL
          ⊟ non-local exit
        ⊟ T1-FAIL ⊟1
        

        You can see the markers summarizing the conditions being triggered in the line items and the net result.

        Now let's tell it that we expect failure:

          (deftest t1-expected-fail ()
            (with-failure-expected (t)
              (let ((x 1) (y 2))
                (is (equal 1 2))
                (is (= x y)
                    :msg "Intentional failure x does not equal y"
                    :ctx ("*PACKAGE* is ~S and *PRINT-CASE* is ~S~%"
                          *package* *print-case*)))))
        
          (funcall 't1-expected-fail)
        
        T1-EXPECTED-FAIL
          × (IS (EQUAL 1 2))
          × Intentional failure x does not equal y
            where
              X = 1
              Y = 2
            *PACKAGE* is #<PACKAGE "UAX-15-TRY-TESTS"> and *PRINT-CASE* is :UPCASE
        
        ⋅ T1-EXPECTED-FAIL ×2
        #<TRIAL (T1-EXPECTED-FAIL) EXPECTED-SUCCESS 0.000s ×2>
        

        Again you see the markers showing the conditions triggered, that there were two failures, and that both were expected.

        Now a version where we have two assertions, the first one fails unexpectedly and the second passes:

          (deftest t1b-unexpected-fail ()
            (let ((x 1) (y 2))
              (is (= x y))
              (is (equal 1 x))))
        
          (t1b-unexpected-fail)
          UNEXPECTED-FAILURE in check:
          (IS (= X Y))
        where
          X = 1
          Y = 2
           [Condition of type UNEXPECTED-RESULT-FAILURE]
        
        Restarts:
         0: [RECORD-EVENT] Record the event and continue.
         1: [FORCE-EXPECTED-SUCCESS] Change outcome to TRY:EXPECTED-RESULT-SUCCESS.
         2: [FORCE-UNEXPECTED-SUCCESS] Change outcome to TRY:UNEXPECTED-RESULT-SUCCESS.
         3: [FORCE-EXPECTED-FAILURE] Change outcome to TRY:EXPECTED-RESULT-FAILURE.
         4: [ABORT-CHECK] Change outcome to TRY:RESULT-ABORT*.
         5: [SKIP-CHECK] Change outcome to TRY:RESULT-SKIP.
         6: [RETRY-CHECK] Retry check.
         7: [ABORT-TRIAL] Record the event and abort trial UAX-15-TRY-TESTS::T1B-UNEXPECTED-FAIL.
         8: [SKIP-TRIAL] Record the event and skip trial UAX-15-TRY-TESTS::T1B-UNEXPECTED-FAIL.
         9: [RETRY-TRIAL] Record the event and retry trial UAX-15-TRY-TESTS::T1B-UNEXPECTED-FAIL.
         10: [SET-TRY-DEBUG] Supply a new value for :DEBUG of TRY:TRY.
         11: [RETRY] Retry SLIME REPL evaluation request.
         12: [*ABORT] Return to SLIME's top level.
         13: [ABORT] abort thread (#<THREAD "new-repl-thread" RUNNING {1009C947E3}>)
        

        If we choose RECORD-EVENT, we get the following:

          T1B-UNEXPECTED-FAIL
          ⊠ (IS (= X Y))
            where
              X = 1
              Y = 2
          ⋅ (IS (EQUAL 1 X))
        ⊠ T1B-UNEXPECTED-FAIL ⊠1 ⋅1
        #<TRIAL (T1B-UNEXPECTED-FAIL) UNEXPECTED-FAILURE 161.676s ⊠1 ⋅1>
        

        The last line shows two categories of results, one unexpected failure (indicated by the ⊠) and one expected success.

        If we choose SKIP-CHECK, we get:

          T1B-UNEXPECTED-FAIL
          - (IS (= X Y))
          ⋅ (IS (EQUAL 1 X))
        ⋅ T1B-UNEXPECTED-FAIL -1 ⋅1
        #<TRIAL (T1B-UNEXPECTED-FAIL) EXPECTED-SUCCESS 10.917s -1 ⋅1>
        

        Now the markers show one skipped check and one expected success.

        Now a version where we have two assertions and we expect one specifically to fail:

          (deftest t1a-expected-fail ()
            (let ((x 1) (y 2))
              (is (equal 1 x))
              (with-failure-expected (t)
                (is (= x y)
                    :msg "Intentional failure x does not equal y"
                    :ctx ("*PACKAGE* is ~S and *PRINT-CASE* is ~S~%"
                          *package* *print-case*)))))
        T1A-EXPECTED-FAIL
        
        (t1a-expected-fail)
        T1A-EXPECTED-FAIL
          ⋅ (IS (EQUAL 1 X))
          × Intentional failure x does not equal y
            where
              X = 1
              Y = 2
            *PACKAGE* is #<PACKAGE "UAX-15-TRY-TESTS"> and *PRINT-CASE* is :UPCASE
        
        ⋅ T1A-EXPECTED-FAIL ×1 ⋅1
        #<TRIAL (T1A-EXPECTED-FAIL) EXPECTED-SUCCESS 0.000s ×1 ⋅1>
        
    2. Quiet Reports

      The Try function allows you to report only the trial object summary (by passing :print nil):

        (try 't1-fail :print nil)
      #<TRIAL (T1-FAIL) UNEXPECTED-FAILURE 0.000s ⊠2>
      

      or only the unexpected events (by passing :print 'unexpected).

        (try 'test9-pass-and-fail :print 'unexpected)
      
      TEST9-PASS-AND-FAIL
        T1B-UNEXPECTED-FAIL
          ⊠ (IS (= X Y))
            where
              X = 1
              Y = 2
        ⊠ T1B-UNEXPECTED-FAIL ⊠1 ⋅1
        T1A-UNEXPECTED-FAIL
          ⊠ Intentional failure x does not equal y
            where
              X = 1
              Y = 2
            *PACKAGE* is #<PACKAGE "UAX-15-TRY-TESTS"> and *PRINT-CASE* is :UPCASE
      
        ⊠ T1A-UNEXPECTED-FAIL ⊠1 ⋅1
      ⊠ TEST9-PASS-AND-FAIL ⊠2 ⋅6
      #<TRIAL (TEST9-PASS-AND-FAIL) UNEXPECTED-FAILURE 0.000s ⊠2 ⋅6>
      

      In general, what to print is parameterized as event types, of which nil and unexpected are two instances.
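
      Event types can be combined as well. For example, borrowing the (not trial-start) type combination that appears in the Other Reports section below, the following sketch would print everything except the trial-start lines:

        (try 'test7 :print '(not trial-start))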

      The try function does not allow you to pass parameters to the tests, so you would need to wrap those tests inside another test or a lambda, or call the test directly. For example:

      (deftest test6-pass (x y)
        (is (= x 1))
        (is (= y 2))
        (is (not (= x y))))
      
      (deftest test7 ()
        (test6-pass 1 2)
        (test1-pass 1))
      
      (try 'test7 :print 'unexpected)
      ==> #<TRIAL (TEST7) EXPECTED-SUCCESS 0.000s ⋅4>
      
      (try (lambda () (test6-pass 1 2)))
      ==> #<TRIAL (TRY #<FUNCTION (LAMBDA ()) {100689750B}>) EXPECTED-SUCCESS 0.000s ⋅3>
      
      (test6-pass 1 2)
      ==> #<TRIAL (TEST6-PASS 1 2) EXPECTED-SUCCESS 0.000s ⋅3>
      

      As you can see from the first result, there were 4 expected successes across the combination of tests.

      If we had not used :print 'unexpected, the result would have looked like:

        (try 'test7)
      
      TEST7
        TEST6-PASS
          ⋅ (IS (= X 1))
          ⋅ (IS (= Y 2))
          ⋅ (IS (NOT (= X Y)))
        ⋅ TEST6-PASS ⋅3
        TEST1-PASS
          ⋅ (IS (= X 1))
        ⋅ TEST1-PASS ⋅1
      ⋅ TEST7 ⋅4
      #<TRIAL (TEST7) EXPECTED-SUCCESS 0.000s ⋅4>
      

      Running try on its version of the uax-15 test suite prints 343332 lines of successful assertions before the trial object appears. The first result below is from running in the emacs slime REPL and the second from a terminal (both under SBCL 2.3.0):

      ..... 343332 lines later ...
      #<TRIAL (UAX-SUITE) EXPECTED-SUCCESS 31.773s ⋅343332>
      
      ..... 343332 lines later ...
      #<TRIAL (UAX-SUITE) EXPECTED-SUCCESS 10.887s ⋅343332>
      

      You can see the time discrepancy: almost 32 seconds in the REPL to run the test and print every assertion versus almost 11 seconds in the terminal. Now with :print 'unexpected, we get the following (first in the emacs REPL and then in a terminal):

        (try 'uax-suite :print 'unexpected)
      
      #<TRIAL (UAX-SUITE) EXPECTED-SUCCESS 1.527s ⋅343332>
      
      #<TRIAL (UAX-SUITE) EXPECTED-SUCCESS 1.460s ⋅343332>
      

      And the test run is down from 32 secs to 1.5 seconds.

      1. Print Compactly

        If we set the *print-compactly* variable to t and run try without :print 'unexpected, try prints a single dot for each assertion, and the result of running the uax-15 suite would have been (again, first in the emacs REPL and then in a terminal):

          ..... name of test ... lots of dots representing passing assertions, concluding in
        
        #<TRIAL (UAX-SUITE) EXPECTED-SUCCESS 1.990s ⋅343332>
        
        #<TRIAL (UAX-SUITE) EXPECTED-SUCCESS 1.900s ⋅343332>
        

        So printing only dots saved us about 30 seconds in the REPL and about 9 seconds in the terminal compared to printing every assertion.

    3. Failure reports only

      Let's create a test which calls tests we know will fail and tests we know will pass:

      (deftest t1a-unexpected-fail ()
        (let ((x 1) (y 2))
          (is (= x y)
              :msg "Intentional failure x does not equal y"
              :ctx ("*PACKAGE* is ~S and *PRINT-CASE* is ~S~%"
                    *package* *print-case*))
          (is (equal 1 x))))
      
      (deftest t1b-unexpected-fail ()
        (let ((x 1) (y 2))
          (is (= x y))
          (is (equal 1 x))))
      
      (deftest test8-pass-and-fail ()
        (t1b-unexpected-fail)
        (test7)
        (t1a-unexpected-fail))
      

      Now we call try on test8-pass-and-fail with the :print 'unexpected parameter:

        (try 'test8-pass-and-fail :print 'unexpected)
      TEST8-PASS-AND-FAIL
        T1B-UNEXPECTED-FAIL
          ⊠ (IS (= X Y))
            where
              X = 1
              Y = 2
        ⊠ T1B-UNEXPECTED-FAIL ⊠1 ⋅1
        T1A-UNEXPECTED-FAIL
          ⊠ Intentional failure x does not equal y
            where
              X = 1
              Y = 2
            *PACKAGE* is #<PACKAGE "UAX-15-TRY-TESTS"> and *PRINT-CASE* is :UPCASE
      
        ⊠ T1A-UNEXPECTED-FAIL ⊠1 ⋅1
      ⊠ TEST8-PASS-AND-FAIL ⊠2 ⋅6
      #<TRIAL (TEST8-PASS-AND-FAIL) UNEXPECTED-FAILURE 0.003s ⊠2 ⋅6>
      

      As expected the net result (last line) is 2 unexpected failures and 6 expected successes. We do not get thrown into the debugger, and we only get details of the unexpected failures.

    4. Streams and Printer

      The try function accepts key parameters :stream and :printer to redirect the report printing.

    5. Other Reports

      The following shows the condition classes of events which can be signaled and printed:

      (let ((*debug* nil)
            (*print* '(not trial-start))
            (*describe* nil))
        (with-test (verdict-abort*)
          (with-test (expected-verdict-success))
          (with-expected-outcome ('failure)
            (with-test (unexpected-verdict-success)))
          (handler-bind (((and verdict success) #'force-expected-failure))
            (with-test (expected-verdict-failure)))
          (handler-bind (((and verdict success) #'force-unexpected-failure))
            (with-test (unexpected-verdict-failure)))
          (with-test (verdict-skip)
            (skip-trial))
          (is t :msg "EXPECTED-RESULT-SUCCESS")
          (with-failure-expected ('failure)
            (is t :msg "UNEXPECTED-RESULT-SUCCESS")
            (is nil :msg "EXPECTED-RESULT-FAILURE"))
          (is nil :msg "UNEXPECTED-RESULT-FAILURE")
          (with-skip ()
            (is nil :msg "RESULT-SKIP"))
          (handler-bind (((and result success) #'abort-check))
            (is t :msg "RESULT-ABORT*"))
          (catch 'foo
            (with-test (nlx-test)
              (throw 'foo nil)))
          (error "UNHANDLED-ERROR")))
      .. VERDICT-ABORT*                       ; TRIAL-START
      ..   ⋅ EXPECTED-VERDICT-SUCCESS
      ..   ⊡ UNEXPECTED-VERDICT-SUCCESS
      ..   × EXPECTED-VERDICT-FAILURE
      ..   ⊠ UNEXPECTED-VERDICT-FAILURE
      ..   - VERDICT-SKIP
      ..   ⋅ EXPECTED-RESULT-SUCCESS
      ..   ⊡ UNEXPECTED-RESULT-SUCCESS
      ..   × EXPECTED-RESULT-FAILURE
      ..   ⊠ UNEXPECTED-RESULT-FAILURE
      ..   - RESULT-SKIP
      ..   ⊟ RESULT-ABORT*
      ..   NLX-TEST                           ; TRIAL-START
      ..     ⊟ non-local exit                 ; NLX
      ..   ⊟ NLX-TEST ⊟1                      ; VERDICT-ABORT*
      ..   ⊟ "UNHANDLED-ERROR" (SIMPLE-ERROR)
      .. ⊟ VERDICT-ABORT* ⊟3 ⊠1 ⊡1 -1 ×1 ⋅1
      ..
      ==> #<TRIAL (WITH-TEST (VERDICT-ABORT*)) ABORT* 0.004s ⊟3 ⊠1 ⊡1 -1 ×1 ⋅1>
      
    6. Duration

      The try global variable *print-duration* defaults to nil. If set to true, the number of seconds spent executing each assertion is also printed. E.g.

        (let ((*print-duration* t)
              (*debug* nil)
              (*describe* nil))
          (with-test (timed)
            (is (progn (sleep 0.3) t))
            (is (progn (sleep 0.2) t))
            (error "xxx")))
      ..        TIMED
      ..  0.300   ⋅ (IS (PROGN (SLEEP 0.3) T))
      ..  0.200   ⋅ (IS (PROGN (SLEEP 0.2) T))
      ..          ⊟ ""xxx (SIMPLE-ERROR)
      ..  0.504 ⊟ TIMED ⊟1 ⋅2
      ..
      ==> #<TRIAL (WITH-TEST (TIMED)) ABORT* 0.504s ⊟1 ⋅2>
      
  2. Edge Cases: Values expressions, loops, closures and calling other tests
    1. Values expressions

      Try handles values expressions with no problem.

      (is (match-values (values (1+ 5) "sdf")
            (= * 6)
            (string= * "sdf")))
      
    2. Now looping and closures

      Try has no problem with loops or finding variables that have been set in a closure containing the test.
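
      A minimal sketch of what that looks like (mirroring the loop-in-a-closure example used for other frameworks in this document; the test name is made up):

      (let ((chars '(#\a #\B #\z))
            (codes '(97 66 122)))
        (deftest test-loop-closure ()
          ;; the assertion runs once per element, using data captured
          ;; in the closure surrounding the test definition
          (loop for c in chars
                for n in codes
                do (is (= (char-code c) n)))))
      
      (test-loop-closure)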

      
      
    3. Calling a test inside another test

      Try has no problem with calling tests inside other tests.

      (let ((x 1))
        (deftest test3-pass ()
          (is (= x 1))
          (is (funcall 'test2 1)))
        (funcall 'test3-pass))
      
      TEST3-PASS
        ⋅ (IS (= X 1))
        ⋅ (IS (= X Y))
        ⋅ (IS (FUNCALL 'TEST2 1))
      ⋅ TEST3-PASS ⋅3
      #<TRIAL (TEST3-PASS) EXPECTED-SUCCESS 0.000s ⋅3>
      
  3. Float Tests

    Try has an extensive library of float comparisons. I will just quote from the documentation:

    "Float comparisons following https://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/

    [function] FLOAT-~= X Y &KEY (MAX-DIFF-IN-VALUE *MAX-DIFF-IN-VALUE*) (MAX-DIFF-IN-ULP *MAX-DIFF-IN-ULP*)

    Return whether two numbers, X and Y, are approximately equal either according to MAX-DIFF-IN-VALUE or MAX-DIFF-IN-ULP.

    If the absolute value of the difference of two floats is not greater than MAX-DIFF-IN-VALUE, then they are considered equal.

    If two floats are of the same sign and the number of representable floats (ULP, unit in the last place) between them is less than MAX-DIFF-IN-ULP, then they are considered equal.

    If neither X nor Y are floats, then the comparison is done with =. If one of them is a DOUBLE-FLOAT, then the other is converted to a double float, and the comparison takes place in double float space. Else, both are converted to SINGLE-FLOAT and the comparison takes place in single float space.

    [variable] *MAX-DIFF-IN-VALUE* 1.0e-16

    The default value of the MAX-DIFF-IN-VALUE argument of FLOAT-~=.

    [variable] *MAX-DIFF-IN-ULP* 2

    The default value of the MAX-DIFF-IN-ULP argument of FLOAT-~=.

    [function] FLOAT-~< X Y &KEY (MAX-DIFF-IN-VALUE *MAX-DIFF-IN-VALUE*) (MAX-DIFF-IN-ULP *MAX-DIFF-IN-ULP*)

    Return whether X is approximately less than Y. Equivalent to <, but it also allows for approximate equality according to FLOAT-~=.

    [function] FLOAT-~> X Y &KEY (MAX-DIFF-IN-VALUE *MAX-DIFF-IN-VALUE*) (MAX-DIFF-IN-ULP *MAX-DIFF-IN-ULP*)

    Return whether X is approximately greater than Y. Equivalent to >, but it also allows for approximate equality according to FLOAT-~=."

  4. Suites, tags and other multiple test abilities
    1. Lists of tests

      Try can handle lists of tests and the function list-package-tests will return a list of the try tests in a package.

      (try '(t1 t2))
      
      (list-package-tests)
      
    2. Suites

      Test suites in try are just tests that call other tests:

      (deftest test6-pass (x y)
        (is (= x 1))
        (is (= y 2))
        (is (not (= x y))))
      
      (deftest test7 ()
        (test6-pass 1 2)
        (test1-pass 1))
      

      Now we can call test7 and get the results for test6-pass and test1-pass:

        (test7)
      TEST7
        TEST6-PASS
          ⋅ (IS (= X 1))
          ⋅ (IS (= Y 2))
          ⋅ (IS (NOT (= X Y)))
        ⋅ TEST6-PASS ⋅3
        TEST1-PASS
          ⋅ (IS (= X 1))
        ⋅ TEST1-PASS ⋅1
      ⋅ TEST7 ⋅4
      #<TRIAL (TEST7) EXPECTED-SUCCESS 0.003s ⋅4>
      
  5. Replay Events and Defer Descriptions

    Try has a replay-events function which reprocesses the events that were collected in a trial object. It does not re-run the tests; it just signals the events collected in the trial object for further processing. For example, suppose you ran a large test without :print 'unexpected and need to winnow through the output to find just the failing tests. You can replay and get only the unexpected results:

    (replay-events ! :print 'unexpected)
    

    As you might expect, the variable ! passed in is the most recent trial; !! and !!! are the second and third most recent, respectively.

    The function (recent-trial &optional (n 0)) will return the Nth most recent trial or NIL if there are not enough trials recorded. Every TRIAL returned by TRY gets pushed onto a list of trials, but only *n-recent-trials* are kept.

    top

  6. Rerunning Tests

    Rerun only the tests with unexpected results from the previous run:

    (try !)
    

    Alternatively, one could (funcall !) (or any other trial object). What is rerun is controlled by *rerun* and *try-rerun* (both event types).

    top

  7. Fixtures and Freezing Data

    Per the documentation, Try has no direct support for fixtures. One suggestion is writing macros like the following:

    (defvar *server* nil)
    
    (defmacro with-xxx (&body body)
      `(flet ((with-xxx-body ()
                ,@body))
         (if *server*
             (with-xxx-body)
             (with-server (make-expensive-server)
               (with-xxx-body)))))
    

    With Try, fixtures are needed less often because one can rerun only the failing tests from the entire test suite. If the higher level tests establish the dynamic environment and call subtests, then things will just work.
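
    For instance, a higher-level test can establish the dynamic environment once and call the subtests. A sketch, assuming the with-server macro used above binds *server* around its body; test-server stands for any subtest:

    (deftest all-server-tests ()
      ;; the dynamic environment is in place for every subtest called here
      (with-server (make-expensive-server)
        (test-server)))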

  8. Removing tests

    Often, it suffices to remove the call to the test function (if it is invoked explicitly by another test). If it is invoked via a package (list-package-tests lists all the tests in a given package), then it needs to be deleted with fmakunbound or unintern, or by redefining the function with defun.
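
    For example, to remove the test1-pass test defined earlier (a sketch; pick whichever route fits your situation):

    (fmakunbound 'test1-pass)       ; remove the function definition
    ;; or
    (unintern 'test1-pass)          ; remove the symbol from its package
    ;; or
    (defun test1-pass (x) (= x 1))  ; redefine it as a plain function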

  9. Sequencing, Random and Failure Only

    While tests and the forms inside tests normally follow a sequential order, try allows you to shuffle the list of forms inside the body of a test.

    (loop repeat 3 do
      (with-shuffling ()
        (prin1 1)
        (prin1 2)))
    .. 122112
    => NIL
    
  10. Skip Capability

    As we already saw when looking at tests that had failures, you can look at the conditions triggered and decide to skip an assertion result.

    The following shows a test that skips test-server when (server-available-p) returns false.

    (deftest my-suite ()
      (with-skip ((not (server-available-p)))
        (test-server)))
    
  11. Random Data Generators

    None, but the helper libraries described later can fulfill this need well.

33.4. Discussion

I found try to be easy to use. I was a bit surprised at the lack of additional assertion comparisons, but in try, everything is an extension to is. You can easily use the normal CL comparison functions and customize reporting for them.

top

33.5. Who uses try

The following list just pulls the results of (ql:who-depends-on :try) and adds their homepage urls.

34. unit-test

34.1. Summary

homepage Manuel Odendahl, Alain Picard MIT 2012

Again, another framework that does the basics. It will report all assertions that failed in a test. It will do a progress report on the tests (not the assertions), and that progress report cannot be turned off. It does allow you to provide diagnostic strings to assertions to help in debugging, but does not allow you to pass in any variables. It has no interactivity option, so you cannot just hop into the debugger on a test failure. It has no built-in fixture capability (though see the source comments under Fixtures below), but it does have suites.

34.2. Assertion Functions

test-assert test-condition test-equal

34.3. Usage

  1. Report Format

    Everything is returned as a list of test-result objects. There is no provision for dropping into the debugger. run-test has an optional parameter for output that sends output by default to *debug-io*.

  2. Basics

    Unit-test has a limited vocabulary for test functions. The deftest macro will create an instance of a unit-test class with the first parameter being the unit name (used to group related tests) and the second parameter being the name of the test itself.

    (deftest :test "t1"
      (let ((x 1))
        (test-assert (=  x 1))
        (test-equal "a" "a" )
        (test-condition
         (/ 1 (- x 1))
         'division-by-zero)))
    

    In this case we used a string "t1" as the name of the test. We could have used a symbol 't1 or a keyword :t1. Unfortunately the run-test method is only defined for unit tests, which makes calling a single test a little clumsy. We have to call get-test-by-name and pass that to run-test. In your own tests, I assume you would add another method to handle however you write your test names (a sketch follows). We will continue to use get-test-by-name as a reminder.
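
    Such a convenience method might look like this (a sketch, not part of unit-test; it simply defers to get-test-by-name):

    (defmethod run-test ((name string) &key (output *debug-io*))
      ;; look the unit-test instance up by its name, then run it
      (run-test (get-test-by-name name) :output output))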

    (run-test (get-test-by-name "t1"))
    (#<TEST-EQUAL-RESULT FORM: (/ 1 (- X 1)) STATUS: PASS REASON: NIL>
     #<TEST-EQUAL-RESULT FORM: a STATUS: PASS REASON: NIL>
     #<TEST-EQUAL-RESULT FORM: (= X 1) STATUS: PASS REASON: NIL>)
    

    Not the most exciting report in the world. But let's take a look at a failing test. We can put a diagnostic string into test-assert, but not into test-equal. The equality test for test-equal is equal, but you can change that using a keyword parameter as shown below:

    (deftest :test "t1-fail"
      (let ((x 1))
        (test-assert (= x 2) "we know that X (1) does not equal 2")
        (test-equal "a" 'a :test #'eq )
        (test-condition
         (/ 1 (- x 1))
         'floating-point-overflow)))
    
    (run-test (get-test-by-name "t1-fail"))
    (#<TEST-EQUAL-RESULT FORM: (/ 1 (- X 1)) STATUS: CRASH REASON: NIL>
                         #<TEST-EQUAL-RESULT FORM: 'A STATUS: FAIL REASON: NIL>
                         #<TEST-EQUAL-RESULT FORM: (= X
                                                      2) STATUS: FAIL REASON: we know that X (1) does not equal 2>
                         #<TEST-EQUAL-RESULT FORM: (VALUES 1 3 4 5) STATUS: PASS REASON: NIL>
                         #<TEST-EQUAL-RESULT FORM: (/ 1 (- X 1)) STATUS: PASS REASON: NIL>
                         #<TEST-EQUAL-RESULT FORM: a STATUS: PASS REASON: NIL>
                         #<TEST-EQUAL-RESULT FORM: (= X 1) STATUS: PASS REASON: NIL>)
    

    The first thing I notice is that the list of results also includes the list of results from when we ran test t1. Everything just gets pushed to a non-exported list *unit-test-results*. So if you want to just see the results for the next test you are going to run, you need to run some cleanup.

    T1-fail generated three results, so again it is a little clumsy to ensure you see all the results from the test. Let's set *unit-test-results* to nil after every test so we can keep this clean.
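
    A small helper for that cleanup (a sketch; note the double colon, assuming the package is named unit-test, since the variable is not exported):

    (defun clear-unit-test-results ()
      ;; wipe the accumulated test-result objects
      (setf unit-test::*unit-test-results* nil))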

    There is an exported variable *unit-test-debug*, but looking at the source code, it does not appear to be actually used for anything, leaving it open for you to write your own code using it as a flag.

    If a test calls a function that is later modified, the test does not need to be recompiled to check the tested function correctly. We will skip the proof.

  3. Edge Cases: Value expressions, closures and calling other tests
    1. Value expressions

      Like most of the frameworks, unit-test will test a values expression by only checking the first value. We will skip the proof.

    2. Looping and closures

      Unit-test provided a little bit of a surprise here. If you run a test where the assertion is inside a loop, the test-result object will be pushed to unit-test::*unit-test-results*, but that list will not be printed to the REPL. You just get NIL in the REPL. We will skip the proof.

      Unit-test had no problem testing functions that use variables provided in a closure. We will skip the proof.

      Tests can call other tests, but there is no composition, just another test result.

      (deftest :test "t3"
        (test-assert (= 1 1))
        (run-test (get-test-by-name "t1")))
      
  4. Suites, tags and other multiple test abilities
    1. Lists of tests

      Unit-test has no provision to handle lists of tests although you could write a method on run-test that would do so.
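
      Such a method might look like the following sketch (not part of unit-test; it assumes the list holds names that get-test-by-name understands and concatenates the per-test result lists):

      (defmethod run-test ((tests list) &key (output *debug-io*))
        (loop for name in tests
              append (run-test (get-test-by-name name) :output output)))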

    2. Suites

      Looking at the examples above, we gave all the tests the unit name :test. This is essentially a suite named :test. If we call run-all-tests, all the tests will be run. If we had given tests different unit names, we could run all the tests sharing a name by passing the keyword parameter :unit to run-all-tests:

      (run-all-tests :unit :some-unit-name)
      

      top

  5. Fixtures and Freezing Data

    From the source code:

    ;;;;   For more complex tests requiring fancy setting up and tearing down
    ;;;;   (as well as reclamation of resources in case a test fails), users are expected
    ;;;;   to create a subclass of the unit-test class using the DEFINE-TEST-CLASS macro.
    ;;;;   The syntax is meant to be reminiscent of CLOS, e.g
    ;;;;
    ;;;;   (define-test-class my-test-class
    ;;;;     ((my-slot-1 :initarg :foo ...)
    ;;;;      (my-slot-2 (any valid CLOS slot options)
    ;;;;      ....))
    ;;;;   After this, the methods
    ;;;;   (defgeneric run-test :before
    ;;;;               ((test my-test-class) &key (output *debug-io*))   and
    ;;;;   (defgeneric run-test :after
    ;;;;               ((test my-test-class) &key (output *debug-io*))   may be
    ;;;;   specialized to perform the required actions, possibly accessing the
    ;;;;   my-slot-1's, etc.
    ;;;;
    ;;;;   The test form is protected by a handler case.  Care should be taken
    ;;;;   than any run-test specialization also be protected not to crash.
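
    A minimal sketch following those comments; db-test-class, its slot and the two connection helpers are hypothetical names:

    (define-test-class db-test-class
      ((connection :initform nil :accessor connection)))
    
    (defmethod run-test :before ((test db-test-class) &key (output *debug-io*))
      (declare (ignore output))
      ;; build up: acquire the resource before the test body runs
      (setf (connection test) (open-test-connection)))
    
    (defmethod run-test :after ((test db-test-class) &key (output *debug-io*))
      (declare (ignore output))
      ;; tear down: release the resource afterwards
      (close-test-connection (connection test)))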
    
  6. Removing tests

    None

  7. Sequencing, Random and Failure Only

    I do not see any capability to shuffle test order or to run only the tests that have previously failed.

  8. Skip Capability
    1. Assertions
    2. Tests
    3. Implementation
  9. Random Data Generators

    None

34.4. Discussion

Compared to other frameworks it feels a little clumsy and basic. I would look elsewhere.

top

34.5. Who Depends on Unit-Test?

It is used by cl-fad and several of the bknr programs.

35. xlunit

35.1. Summary

homepage Kevin Rosenberg BSD 2015

Xlunit stops at the first failure in a test, so you only get partial failure reporting (joining lift in this regard). That, in and of itself, would cause me to look elsewhere. Phil Gold's original concern was that while you can create hierarchies of test suites, they are not composable.

35.2. Assertion Functions

assert-condition assert-eql assert-equal
assert-false assert-not-eql assert-true

top

35.3. Usage

I find the terminology of xlunit to be confusing after getting used to other frameworks.

Xlunit requires that you create a class for a test-case or suite. Every "test" is then a named test-method on that class. def-test-method adds a test to the suite. The class can, of course, have slots for variables that any test in the suite can use.

Test-methods can have multiple assertions and can be applied to either a test-case or a test-suite. The macro get-suite applies to either test-case or a test-suite classes and creates an instance of that class.

I notice that cambl-test (one of the libraries that uses xlunit) wraps a define-test macro around def-test-method to make this feel more natural. That version is here:

(defclass amount-test-case (test-case)
  ()
  (:documentation "test-case for CAMBL amounts"))

(defmacro define-test (name &rest body-forms)
  `(def-test-method ,name ((test amount-test-case) :run nil)
     ,@body-forms))

  1. Report Format

    Xlunit reports a single dot for a test with at least one passing assertion, an F for a failure and an E for an error. In testing suites, xlunit will provide one dot per test, the time it took to run the suite and, if everything is successful, OK with a count of the tests and a count of the assertions.

  2. Basics

    We will create a test-case named tf-xlunit to which we can attach tests, each of which can have multiple assertions. The form immediately after the test method name takes both the class to which it applies and whether to run the method immediately upon compilation. The libraries using xlunit seem to define all methods with :run nil.

    (defclass tf-xlunit (xlunit:test-case) ())
    
    (def-test-method t1 ((test tf-xlunit) :run nil)
      (assert-equal "a" "a")
      (assert-condition 'division-by-zero (error 'division-by-zero))
      (assert-false (= 1 2))
      (assert-eql 'a 'a)
      (assert-not-eql 'a 'b)
      (assert-true (= 1 1)))
    

    Unfortunately you need to run all the test methods applicable to a test-case or suite at once. For clarity, we create a separate test-case class for each method so that we do not get burdened with results from other methods. Effectively, a test-case can be viewed the way other frameworks view suites.

    The reporting is a bit underwhelming. Even more so as we get to failures.

    (xlunit:textui-test-run (xlunit:get-suite tf-xlunit))
    .
    Time: 0.0
    
    OK (1 tests)
    #<TEST-RESULTS {10016CD973}>
    

    All the assertions passed in order for the entire test to pass.

    Now a test that should have six assertion failures. This time we are going to put a diagnostic message into the first assertion. Xlunit does not provide the ability to insert variables into the diagnostic message or provide trailing variables.

    I am going to create a new suite so that we just see the results for this test and will continue to do that until we get to the suites discussion.

    (defclass tf-xlunit-t1-fail (xlunit:test-case) ())
    
    (def-test-method t1-fail ((test tf-xlunit-t1-fail) :run nil)
      (assert-equal "a" "b" "Deliberate failure on our part")
      (assert-condition 'division-by-zero (error 'floating-point-overflow))
      (assert-false (= 1 1))
      (assert-eql 'a 'b)
      (assert-not-eql 'a 'a)
      (assert-true (= 1 2)))
    

    And now, the failure report:

    (xlunit:textui-test-run (xlunit:get-suite tf-xlunit-t1-fail))
    .F
    Time: 0.0
    
    There was 1 failure:
    1) T1-FAIL: Assert equal: "a" "b"
     Deliberate failure on our part
    
    FAILURES!!!
    Run: 1   Failures: 1   Errors: 0
    #<TEST-RESULTS {1003FF1353}>
    

    Yes, the test failed. Unfortunately it only reported the first assertion failure, not all of them. No, I do not know why a dot appeared before the failure indicator. I was really hoping for it to tell me all the different assertion failures.

  3. Edge Cases: Value Expressions, closures and calling other tests
    1. Value expressions

      XLunit has no special functionality for dealing with values expressions. Like most of the frameworks, xlunit will check values expressions but only look at the first value.
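
      A quick illustration, reusing the tf-xlunit class from above. This assertion passes because, by ordinary multiple-value semantics, only the primary value of each values form reaches the comparison:

      (def-test-method t-values ((test tf-xlunit) :run nil)
        ;; the 2 and the 99 are silently dropped
        (assert-equal (values 1 2) (values 1 99)))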

    2. Closures.

      Xlunit has no problem dealing with variables from closures. We will skip the proof.

    3. Calling tests from inside tests

      As with several frameworks, xlunit allows a test to call another test, but there is no composition - you get two separate reports.

        (defclass tf-xlunit-t3 (xlunit:test-case) ())
      
        (def-test-method t3 ((test tf-xlunit-t3) :run nil)
          (assert-equal 1 1)
          (xlunit:textui-test-run (xlunit:get-suite tf-xlunit-t1-fail)))
      
      (xlunit:textui-test-run (xlunit:get-suite tf-xlunit-t3))
      ..F
      Time: 0.0
      
      There was 1 failure:
      1) T1-FAIL: Assert equal: "a" "b"
         Deliberate failure on our part
      
      FAILURES!!!
      Run: 1   Failures: 1   Errors: 0
      Time: 0.003333
      
      OK (1 tests)
      #<TEST-RESULTS {100B2823A3}>
      
  4. Suites, fixtures and other multiple test abilities
    1. Lists of tests

      I did not see a way to run only a subset of the methods applicable to a test-case.

    2. Suites and fixtures

      I am going to cheat here and combine the discussion of fixtures and suites, using an example from the source code. Here we create a test-case named math-test-case with two additional slots, numbera and numberb. Before any tests are run, there is a set-up method which initialises those slots. We then add three test methods (one slightly modified from the source code).

      (defclass math-test-case (test-case)
        ((numbera :accessor numbera)
         (numberb :accessor numberb))
        (:documentation "Test test-case for math testing"))
      
      (defmethod set-up ((tcase math-test-case))
        (setf (numbera tcase) 2)
        (setf (numberb tcase) 3))
      
      (def-test-method test-addition ((test math-test-case) :run nil)
        (let ((result1 (+ (numbera test) (numberb test)))
              (result2 (+ 1 (numbera test) (numberb test))))
          (assert-true (= result1 5))
          (assert-true (= result2 6))))
      
      (def-test-method test-subtraction ((test math-test-case) :run nil)
        (let ((result (- (numberb test) (numbera test))))
          (assert-equal result 1)))
      
         ;;; This method is meant to signal a failure
      (def-test-method test-subtraction-2 ((test math-test-case) :run nil)
        (let ((result (- (numbera test) (numberb test))))
          (assert-equal result 1 "This is meant to failure")))
      

      Now we run all the methods applicable to math-test-case classes.

        (xlunit:textui-test-run (xlunit:get-suite math-test-case))
      .F..
      Time: 0.0
      
      There was 1 failure:
      1) TEST-SUBTRACTION-2: Assert equal: -1 1
         This is meant to failure
      
      FAILURES!!!
      Run: 3   Failures: 1   Errors: 0
      #<TEST-RESULTS {10051EF373}>
      

      As we can see, while there are four assertions in total, the report shows 3, meaning the number of methods run. If we looked at the internal details of the test-results instance returned at the end, it would show a count of 3 as well.

      top

  5. Removing tests

    Xlunit has a remove-test function

  6. Sequencing, Random and Failure Only

    Everything is sequential. There are no provisions for collecting and re-running only failed tests.

  7. Skip Capability

    None

  8. Random Data Generators

    None

35.4. Discussion

Phil Gold's 2007 review essentially concluded that xlunit feels clunky and lacks composition. I see no reason to differ from his conclusion.

35.5. Who Depends on XLUnit?

cambl, cl-heap (no longer maintained) and cl-marshal

top

36. xptest

36.1. Summary

No homepage Craig Brozensky Public Domain 2015

XPtest is very old and it does the basics. It will report all the failed assertions and provides the ability to generate failure reports with diagnostic strings. It does not provide an interactive session (no debugger, just the report). It also does not seem to provide any signal testing, so you would have to write your own condition handlers. Overall it just feels clumsy. It was not tested in Phil Gold's original review.

36.2. Assertion Functions

None - It just relies on CL predicates.

36.3. Usage

Xptest is very simple. You create a test-suite and a fixture. Tests are methods of the fixture and you then add them to the test-suite. You use regular CL predicates in your test and trigger a failure function if they are not true.

  1. Report Format and Basic Operation
    ;; A test fixture and a suite get defined up front
    (def-test-fixture tf-xptest-fixture () ())
    
    (defparameter *tf-xptest-suite* (make-test-suite "tf-xptest-suite" "test framework demonstration"))
    
    (defmethod t1 ((test tf-xptest-fixture))
      (let ((x 1) (y 'a))
        (unless (equal 1 x)
          (failure "t1.1 failed"))
        (unless (eq 'a y)
          (failure "t1.2 failed"))))
    
    (add-test (make-test-case "t1" 'tf-xptest-fixture :test-thunk 't1) *tf-xptest-suite*)
    
    (defmethod t1-fail ((test tf-xptest-fixture))
      (let ((x 1) (y 'a))
        (unless (equal 2 x)
          (failure "t1-fail.1 failed"))
        (unless (eq 'b y)
          (failure "t1-fail.2 failed"))))
    
    (add-test (make-test-case "t1-fail" 'tf-xptest-fixture :test-thunk 't1-fail) *tf-xptest-suite*)
    

    You can use run-test on the test-suite. That will return a list of test-result objects, but that is not terribly useful. Digging into those objects will give you start and stop times, the test-fixture, a test-failure condition (if it failed) or an error condition if something else bad happened. Slightly more useful is running report-result on the list returned from run-test, but it only reports which tests passed and which tests failed.

    (run-test *tf-xptest-suite*)
    (#<TEST-RESULT {1002F88803}> #<TEST-RESULT {1002F88C93}>)
    
    (report-result (run-test *tf-xptest-suite*))
    Test t1 Passed
    Test t1-fail Failed
    

    There is a keyword parameter option of :verbose, but if you try to use it, it generates a format control error in the xptest source code that I am not going to try to debug.

    Xptest properly picks up changes in tested functions without having to manually recompile tests.

  2. Multiple assertions, loops, closures and calling other tests
    1. Multiple assertions and value expressions

      Xptest relies on CL for predicates and assertions, so you have to build your own multiple assertion test and decide how you would handle value expressions.
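
      A sketch of checking a values expression yourself, reusing the fixture and suite defined above:

      (defmethod t3-values ((test tf-xptest-fixture))
        ;; bind both values explicitly, then assert on them together
        (multiple-value-bind (q r) (floor 7 2)
          (unless (and (= q 3) (= r 1))
            (failure "floor returned ~A and ~A, expected 3 and 1" q r))))
      
      (add-test (make-test-case "t3-values" 'tf-xptest-fixture :test-thunk 't3-values)
                *tf-xptest-suite*)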

    2. Closures.

      Xptest has no problem with the loop inside a closure test.

      (let ((l1 '(#\a #\B #\z))
            (l2 '(97 66 122)))
        (defmethod t2-loop ((test tf-xptest-fixture))
          (loop for x in l1 for y in l2 do
            (unless (equal (char-code x) y)
              (failure "t2-loop")))))
      
      (add-test (make-test-case "t2-loop" 'tf-xptest-fixture :test-thunk 't2-loop) *tf-xptest-suite*)
      
      (report-result (run-test *tf-xptest-suite*))
      
  3. Conditions

    You would have to write your own condition handlers.

  4. Suites and fixtures

    I am going to cheat here again and show the example from the source code.

    (defparameter *math-test-suite* nil)
    
      (def-test-fixture math-fixture ()
        ((numbera
          :accessor numbera)
         (numberb
          :accessor numberb))
        (:documentation "Test fixture for math testing"))
    
      (defmethod setup ((fix math-fixture))
        (setf (numbera fix) 2)
        (setf (numberb fix) 3))
    
      (defmethod teardown ((fix math-fixture))
        t)
    
      (defmethod addition-test ((test math-fixture))
        (let ((result (+ (numbera test) (numberb test))))
          (unless (= result 5)
            (failure "Result was not 5 when adding ~A and ~A"
                     (numbera test) (numberb test)))))
    
      (defmethod subtraction-test ((test math-fixture))
        (let ((result (- (numberb test) (numbera test))))
          (unless (= result 1)
            (failure "Result was not 1 when subtracting ~A ~A"
                     (numberb test) (numbera test)))))
    
          ;;; This method is meant to signal a failure
      (defmethod subtraction-test2 ((test math-fixture))
        (let ((result (- (numbera test) (numberb test))))
          (unless (= result 1)
            (failure "Result was not 1 when subtracting ~A ~A"
                     (numbera test) (numberb test)))))
    
      (setf *math-test-suite* (make-test-suite
                             "Math Test Suite"
                             "Simple test suite for arithmetic operators."
                             ("Addition Test" 'math-fixture
                                              :test-thunk 'addition-test
                                              :description "A simple test of the + operator")
                             ("Subtraction Test" 'math-fixture
                                                 :test-thunk 'subtraction-test
                                                 :description "A simple test of the - operator")))
    
      (add-test (make-test-case "Substraction Test 2" 'math-fixture
                                :test-thunk 'subtraction-test2
                                :description "A broken substraction test, should fail.")
                *math-test-suite*)
    
      (report-result (run-test *math-test-suite*))
    

    top

  5. Removing tests

    Xptest has a remove-test function

  6. Sequencing, Random and Failure Only

    Sequential only

  7. Skip Capability

    None

  8. Random Data Generators

    None

    top

36.4. Discussion

I do not see anything here that would really make me consider it.

36.5. Who Depends on xptest?

Nothing in quicklisp. No idea about the wider world.

37. Helper Libraries

37.1. assert-p

  1. Summary
    homepage Noloop GPL3 2020

    This is a library to help build your own assertions and is built on assertion-error by the same author (see below). The only library currently using it is Cacau.

    I was really hoping for more here. Consider the following code from the library:

    (defun not-equalp-p (actual expected)
      "Check actual not equalp expected"
      (assertion (not (equalp actual expected)) actual expected 'not-equalp))
    

    Seven of the test frameworks described above provide assertions that accept diagnostic messages and pass variables to those diagnostic messages. Another eight provide assertions that accept diagnostic messages but without variables. Compared to those, this seems really elementary. I will leave it to writers of testing frameworks as to whether it is worthwhile, but from my perspective, it does not add anything useful to the forest of CL testing.

    top

37.2. assertion-error

  1. Summary
    homepage Noloop GPL3 2019

    This is a library to build your own assertion-error conditions. It does depend on dissect. The only library currently using it is cacau.

    The entire source code is:

    (define-condition assertion-error (error)
      ((assertion-error-message :initarg :assertion-error-message :reader assertion-error-message)
       (assertion-error-result :initarg :assertion-error-result :reader assertion-error-result)
       (assertion-error-actual :initarg :assertion-error-actual :reader assertion-error-actual)
       (assertion-error-expected :initarg :assertion-error-expected :reader assertion-error-expected)
       (assertion-error-stack :initarg :assertion-error-stack :reader assertion-error-stack)))
    
    (defun get-stack-trace ()
      (stack))
    

    I will leave it to writers of testing frameworks as to whether it is worthwhile.

    top

37.3. check-it

top

  1. Summary
    homepage Kyle Littler LLGPL 2015

    Check-it is the opposite of a mock and stub library, which provides known values: check-it provides randomized input values based on properties of the input. Some testing frameworks provide random value generators, but this is more complete, so use this with your favorite test framework. See helper-generators for a functional comparison of the generators between check-it and cl-quickcheck.

  2. Usage
    1. General Usage

      The general usage is to call generate on a generator given a specific type with optional specifications. The following examples use optional lower and upper bounds.

      (check-it:generate
       (check-it:generator (integer -3 10)))
      6
      
      (check-it:generate
       (check-it:generator (character #\a #\k)))
      #\f
      
      (let ((gen-i (check-it:generator (list (integer -10 10)
                                             :min-length 3
                                             :max-length 10))))
        (check-it:generate gen-i))
      (5 0 8 -2 9)
      
      (check-it:generate (check-it:generator (string :min-length 3 :max-length 10)))
      "Uw76ZV"
      
    2. Values must meet a predicate

      You can ensure that values meet a specific predicate. The generator will keep trying until that predicate is met. In the following example we want a character between #\a and #\f but not #\c.

      (check-it:generate
       (check-it:generator
        (check-it:guard (lambda (x) (not (eql x #\c))) (character #\a #\f))))
      #\e
      
    3. Or Generator

      The OR generator takes subgenerators and randomly chooses one. For example:

      (let ((gen-num (check-it:generator (or (integer) (real)))))
        (loop for x from 1 to 5 collect
                                (check-it:generate gen-num)))
      (7 6.685932 6 -9 9)
      
    4. Struct Generator

      If you have a struct that has default constructor functions, you can use a struct generator to build out the slots.

      (check-it:generate
       (check-it:generator
        (check-it:struct b-struct :slot-1 (integer) :slot-2 (string) :slot-3 (real))))
      #S(B-STRUCT :SLOT-1 2 :SLOT-2 "iE4qZ5U00oOs" :SLOT-3 5.9885387)
      

      For more fun and games with this library, see https://github.com/DalekBaldwin/check-it

      top

37.4. cl-fuzz

  1. Summary
    homepage Neil T. Dantam BSD 2 Clause 2018

    Cl-fuzz is another random data generating library. To use it you need to define a function that generates random data, then define a function that performs some tests, then pass both to fuzz:run-tests (not perform-tests as the readme states). To be honest, I do not think there is much utility here compared to the frameworks we have looked at plus check-it and cl-quickcheck.

    top

37.5. cl-quickcheck

  1. Summary
    homepage Andrew Pennebaker MIT 2020

    Cl-quickcheck focuses on "property based tests". In other words, tests use random inputs matching some specification, apply an operation to the data and assert something about the result. Cl-quickcheck is effectively an assertion library with the ability to generate different types of inputs. If you look at packages in quicklisp which use it, Burgled-Batteries uses it in conjunction with Lift; Test-utils uses it in conjunction with Prove and only json-streams uses it on its own. As such I decided to put it in the Helpers section rather than in the frameworks section.

    Cl-quickcheck has somewhat more functionality than check-it in that it does have assertions. I still think the generators are the real raison d'être for both these libraries. See helper-generators for a functional comparison of the generators between check-it and cl-quickcheck.

  2. Assertions
    is is= isnt isnt= should-signal
  3. Report Format

    Cl-quickcheck follows the typical pattern of . for a passing test. Instead of an f, it prints X for failures.

    To jump immediately into the debugger rather than a report format, set *break-on-failure* to t.

    To eliminate progress reports, set *loud* to nil.
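
    Both are plain special variables, so, for example:

    (setf *break-on-failure* t)  ; failures land in the debugger
    (setf *loud* nil)            ; suppress the progress dots and Xs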

  4. Usage

    The number of iterations of a test using a generator is set by *num-trials* which starts with a default value of 100.

    To take a silly example, the following is a test that asserts that any integer multiplied by two will equal the integer plus itself, and we will set *num-trials* to 20. Thus n will be set to a random integer generated by an-integer and the assertion will be run 20 times with a new n generated each time.

    (setf *num-trials* 20)
    (for-all ((n an-integer))
             (is= (* 2 n) (+ n n)))
    ....................
    

    If we modify that to be always wrong, we only get a single resulting X:

    (for-all ((n an-integer))
             (is= (* 2 n) (+ n n 1)))
    X
    

    The following are all passing assertions.

    (is= 1 1)
    (is = 1 1 1)
    (should-signal 'division-by-zero (error 'division-by-zero))
    (for-all ((n an-integer))
      (is= (* 2 n) (+ n n)))
    
  5. Miscellaneous Comments

    The first thing I had to learn looking at cl-quickcheck was that a-boolean, a-real, an-index, an-integer, k-generator, m-generator and n-generator are funcallable generators stored in variables, but a-char, a-list, a-member, a-string, a-symbol and a-tuple are functions of their own. The difference in how they are called is confusing, at least for me.

    top

37.6. hamcrest

top

  1. Summary
    homepage Alexander Artemenko New BSD 2020

    Hamcrest's idea is to use pattern matching to make unit tests more readable.

  2. Usage

    top

37.7. mockingbird

top

  1. Summary
    homepage Christopher Eames MIT 2017

    Stubs and Mocks are used to ensure constant values are returned instead of computed values for use in testing.

  2. Usage

    Assume two functions for this usage demonstration:

    (defun foo (x) x)
    (defun bar (x) (+ x (foo x)))
    

    The WITH-STUBS macro provides lexical scoping for calling functions with guaranteed results. Note in the second example below that the stub is lexical: it does not reach the call to foo inside the previously defined bar, so (bar 3) returns 6 rather than 13.

    (with-stubs ((foo 10))
      (foo 1))
    10
    
    (with-stubs ((foo 10))
      (bar 3))
    6
    

    As an example of how this would look used in a testing framework, the following uses parachute and mb is the nickname for mockingbird.

    (define-test mockingbird-1
      (mb:with-stubs ((foo 10))
        (is = (bar 3) 6)))
    MOCKINGBIRD-1
    (test 'mockingbird-1)
            ? TF-PARACHUTE::MOCKINGBIRD-1
      0.003 ✔   (is = (bar 3) 6)
      0.007 ✔ TF-PARACHUTE::MOCKINGBIRD-1
    
    ;; Summary:
    Passed:     1
    Failed:     0
    Skipped:    0
    #<PLAIN 2, PASSED results>
    

    The WITH-DYNAMIC-STUBS macro provides dynamic scoping for calling functions with guaranteed results, so the stub is seen even by the call to foo inside bar:

    (with-dynamic-stubs ((foo 10))
      (bar 3))
    13
    

    The WITH-MOCKS macro provides lexical scoping for calling functions, ensuring they return nil. As with WITH-STUBS, the scoping is lexical, so in the second example below bar still calls the real foo and returns 10.

    (with-mocks (foo)
      (foo 5))
    NIL
    (with-mocks (foo)
      (bar 5))
    10
    

    top

37.8. portch

  1. Summary
    homepage Nick Allen BSD 3 Clause 2009

    Portch helps organize tests written with Franz's portable ptester library. I will leave discussion of this library to users of ptester.

    top

37.9. protest

top

  1. Summary
    homepage Michał Herda LLGPL 2020

    Protest is a wrapper around other testing libraries, currently 1am and parachute. It wraps around test assertions and, in case of failure, informs the user of details of the failed test step. Other useful reading would be The concept of a protocol by Robert Strandh.

  2. Usage
  3. Discussion

    top

37.10. rtch

  1. Summary
    download David Thompson LLGPL 2008

    Rtch helps organize RT tests based on their position in a directory hierarchy. I will leave it to users of rt as to whether it is helpful. Note that the link is to a sourceforge download tar file rather than a homepage.

    top

37.11. testbild

  1. Summary
    homepage Alexander Kahl GPLv3 2010

    Testbild is an older library focused on a set of CLOS classes which can be used as a common interface for the output of test results. I will leave it to the writers of test frameworks as to whether incorporating these classes is useful for them.

    top

37.12. test-utils

  1. Summary
    homepage Leo Zovic MIT 2020

    Test-utils provides convenience functions and macros for prove and cl-quickcheck.

    It also has QUIET-CHECK which runs a cl-quickcheck suite but only sends to *standard-output* on failure.

    It provides additional generators for:

    • a-ratio
    • a-number
    • a-keyword
    • an-atom
    • a-pair
    • a-vector
    • a-hash
    • a-value
    • a-alist
    • a-plist
    • an-improper-list
    • an-array

    top

38. Test Coverage Tools

top

38.1. sb-cover

The following is a sample sequence running sb-cover on the package you want to test:

(require :sb-cover)
;; now you need to tell SBCL to instrument what it is about to load
(declaim (optimize sb-cover:store-coverage-data))
(asdf:oos 'asdf:load-op :your-package-name-here :force t)

;; Now run your tests. (run-all-tests 'blah-blah-blah-package)

(sb-cover:report "path-to-directory-for-the-coverage-htmlpages" :form-mode :car)

;; now restore SBCL to its normal state
(declaim (optimize (sb-cover:store-coverage-data 0)))
;; to restore

The last line turns off the instrumentation after the report has been generated. The sb-cover:report line should have generated one or more html pages, starting with a page named cover-index.html in the specified directory, which shows:

  • expression coverage
  • branch coverage

on a file by file basis for your package. The html pages will also print out the source file, color coded to show expressions that were not executed and, where an expression has conditionals or branches, whether each of those conditional points or branches was actually triggered in the test. E.g.

(defun foo (x)
  (if (evenp x) 1 2))

If the test only ran with (foo some-even-number) and not (foo some-odd-number), that fact would be highlighted.

sb-cover can be enabled globally. (eval '(declaim (optimize sb-cover:store-coverage-data)))

Per pfdietz: "The problem I have with sb-cover is that it can screw up when the readtable is changed. It needs to somehow record readtable information to properly annotate source files."

top

38.2. CCL code coverage

I have not used this tool.

(setf ccl:*compile-code-coverage* t)

Comment: when ccl:*compile-code-coverage* was set to t, compiling ironclad triggered an error:

[package ironclad]... > Error: The value (&LAP . 0) is not of the expected type VAR. > While executing: DECOMP-VAR, in process listener(1).

top

38.3. cover

I have not tried cover.

top

39. Appendix

39.1. Problem Space

Testing covers a lot of ground: there are unit tests, regression tests, test driven development, etc. Testing often runs on an automated basis, but CL being CL, it can be part of an interactive development process. Some people write their unit tests first, then develop to pass the tests (test driven development). Testing is also not error checking.

Ideally a testing framework should make it as easy as possible to write tests, cover different inputs and produce a report showing what passed and failed. If you are writing a library rather than an application, it can be useful to recognize that your test suites are a client to your library's API (and if you find it hard to write the tests, think about how a client user will feel).

Assuming the source is available, the tests should be part of your user documentation in showing how to use the library and an ideal testing framework should make it easy for users to see the tests as examples.

I have seen reasoned arguments that unit tests should only cover exported functions, generally on the grounds that this implicitly tests the internal functions and any additional testing is just adding technical debt. My response is typically, fine, so long as the tests on the exported function can show how it failed. If it depends on 100 internal functions, can you trace back to the real point of failure? If testing is a defense against change, then testing code that has no reason to change does not add value to your test suite - until, of course, you refactor and suddenly it does. By the way, for those who are concerned about static typing, unit tests do not replace static typing unless you actually test by really throwing different inputs at the function being tested.

39.2. Terminology

Different frameworks address different problem sets, but before going further I want to get some terminology out of the way.

  1. Testing Types
    • Integration testing deals with how units of software interact.
    • Mutation Testing - Pfdietz brought the concept of mutation testing to my attention. This concept targets testing your test suite by inserting errors into programs and measuring the ability of the test suite to detect them.
    • Property based testing (PBT) makes statements about the output of your code based on the input. The statements are tested by feeding random data to tests that are focused on the stated properties. E.g. in testing an addition function, adding zero to the number should result in the same number, and changing the order of the inputs should not change the result. Similarly, a function that reverses a list should always result in (a) a list and (b) a result whose first element is the last element of the input, etc. This obviously requires more thinking about each test. In some respects, what this gets you thinking about is what constitutes valid input and edge cases, and then you need to write generators to randomly generate input that meets (or fails to meet) those criteria. PBT is not a replacement for what I will call result testing; it is an additional testing strategy. cl-quickcheck and nst provide property based testing (see the sketch after this list).
    • Regression tests verify that software that has already been written is still correct after it is changed or combined with other software. In other words: it worked yesterday; does it still work after a bug fix, after refactoring, or after another system has been connected? Interactive development from the REPL does not address this problem.
    • Unit testing deals with a separable unit of a software system or subsystem. (I am not interested in arguing how small the unit needs to be. I leave that up to the TDD missionaries and the TDD haters.) Unit testing can be part of regression testing - regression tests are often built on suites of unit tests. You might have multiple tests for each function and a suite of tests for every function in a file. As I use the term "unit test", I am talking about how much code is covered, not whether the unit tests are "property based tests" or result testing.
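
    To make the property-based idea concrete, here is a minimal hand-rolled property check in plain CL. No particular PBT library is assumed - check-property and both lambdas are hypothetical illustrations, not the API of cl-quickcheck or nst:

    ;; feed random inputs to a property and report the first counterexample
    (defun check-property (property generator &key (trials 100))
      (loop repeat trials
            for input = (funcall generator)
            unless (funcall property input)
              do (return (values nil input))
            finally (return (values t nil))))

    ;; property: reversing a list twice yields the original list
    (check-property (lambda (list) (equal list (reverse (reverse list))))
                    (lambda () (loop repeat (random 10) collect (random 100))))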
  2. Other Terms
    • Assertions - the types of equality tests available. assert-eq, assert-equal and assert-true are typical. Some packages provide assertions that produce descriptive messages to help debug failures. Some packages (e.g. Confidence) provide built-in assertions that test float comparisons. Some packages allow you to define your own assertions.
    • Code coverage apparently means different things to different people. I have seen test suites that cover every function, but only with a single simple expected input, and 100% code coverage victory has been declared. That is barely a hand wave. As one person has said, that checks that your code can be right, but does not check that your code is not wrong. Of course, there are trivial bits of code where it is pointless to try to think about possible different inputs to test.
    • Fixtures (sometimes referred to as contexts) - fixtures create a temporary environment with a known data set used for the tests. They may be static variables, constructed database tables, etc. Typically there is a setup and teardown process to ensure that the testing environment is in a known state (a minimal setup/teardown sketch appears in the Discussion below).
    • Mocks - mocking is a variation on fixtures. While fixtures are intended to create a known data collection to test against, mocking is intended to eliminate external dependencies in the code, creating known faux code which can be used as an input (sometimes called stubs) or inspected after a test runs to check for expected or unexpected side effects.
    • Parametrization means running the same test body with different input each time. You can do this either by running a test against a collection of test data, or within a single test by running the test body against a list of forms or test data. Either way, prefer the approach that makes it easier to determine which test and which parameters triggered a failure (see the sketch at the end of the Discussion below).
    • Refactoring typically requires rewriting unit tests for everything that was touched, then re-running test suites to ensure that everything still works together.
    • Reporting - a failing test should generate a usable bug report. Do we know the input, the output, the function involved, the expected result and what we thought we were testing for? Note that what we thought we were testing for is not the same as the expected result.
    • Shuffle Testing - Randomly changing the sequence in which tests are applied.
    • TAP - TAP (the Test Anything Protocol) is a text-based interface between testing modules, decoupling the reporting of errors from the presentation of reports. In other words, you can write a TAP consumer which takes TAP-spec output from the test harness, and the TAP consumer is responsible for generating user-friendly reports (or doing other things, like comparing the tests against your own list of functions to generate your own code coverage report). Development on the spec seems to have ceased in 2017. There was a Hacker News discussion in June 2020.
    • Verification means that your code contains every bug in the specification.
    • Validation means that it is doing the right thing. This may or may not be possible to automate. I do not envy front end developers or designers dealing with clients.
    • TAP Output - the Test Anything Protocol defines a formally specified output format, considered by some to be a superior alternative to xUnit-style reporting. Depending on the output mechanisms, TAP can be easy to read but difficult to parse (see the sample after this list).
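
    For reference, TAP output is just lines of text: a plan line followed by one ok/not ok line per test. A minimal made-up sample:

    TAP version 13
    1..3
    ok 1 - addition handles zero
    not ok 2 - addition is commutative
    ok 3 - reverse preserves length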


  3. Discussion

    In general, each test has three parts - the setup, the action and the validation. Does the testing framework make it easy to see each of those segments when reading the test or reading any report coming out of the test?
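
    As an illustration, here is a minimal sketch in plain CL that keeps those three parts visible, with teardown guaranteed by unwind-protect. No framework is assumed and all the names are hypothetical:

    (defvar *test-data* nil)

    ;; hypothetical fixture macro: setup binds a known data set and
    ;; unwind-protect guarantees teardown runs even if an assertion signals
    (defmacro with-fresh-data (&body body)
      `(let ((*test-data* (list 1 2 3)))          ; setup
         (unwind-protect (progn ,@body)
           (format t "~&teardown ran~%"))))       ; teardown

    (with-fresh-data
      (let ((result (reverse *test-data*)))       ; action
        (assert (equal result '(3 2 1)))))        ; validation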

    Similarly, when the tests are run and a test fails, is it obvious from the report what happened, or do you need to start a debugging session with limited information? It is one thing for a test to report failure, another to report what was expected compared to what was generated, and a much better result still to indicate that the correct value was in (aref array-name 2) instead of the expected (aref array-name 0) - the context of the failure.

    Does the test framework allow long enough names for tests and hierarchies (or accept comments) to give meaningful reports?

    How easy is it to run parameterized tests - the test logic is the same, but you run different parameters through the same tests and expect different results?
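
    A bare-bones illustration of parameterization in plain CL: run one test body over a table of (input expected) pairs and collect every failing case, so the report names the offending parameters. square and the table are made up, with one deliberately wrong expectation:

    (defun square (x) (* x x))

    (loop for (input expected) in '((0 0) (3 9) (-2 4) (5 24))
          unless (= (square input) expected)
            collect (list :input input :got (square input) :expected expected))
    ;; => ((:INPUT 5 :GOT 25 :EXPECTED 24))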


Footnotes:

1

Not available from QuickLisp

2

Looking for a new maintainer. It has been forked to clunit2, and you should only consider clunit2.

3

Fork of stefil

4

Ported to Clasp; otherwise last updated in 2015. The author has stated that it is no longer maintained and that he is no longer involved in CL.

5

The authors have specified it as obsolete, so it will not be further considered.

6

Tap-Unit-Test is a version of lisp-unit with TAP-formatted reporting.