Comparison of Common Lisp Testing Frameworks (13 June 2021 Edition)

1 Changelog

13 June 2021 - Updated the clunit2 entries to reflect its huge performance improvement and its new edge-case abilities: comparing multiple values in values expressions and handling variables declared in closures. Clunit2 is now substantially differentiated from clunit.

2 Introduction

What testing framework should I use? The answer should not be X or Y because it is "battle tested" or "extensible" or "has color coding". Those are just content-free advertising buzzwords. Some of the webpages out there mentioning different Common Lisp unit testing frameworks merely parrot comments from library authors without validating whether they are true. Others, like the lispcookbook, do silly things like advising you to start with FiveAM but then giving examples only for Prove. The real response should be to ask back: "how do you code?" But even armed with that information, how do you match a testing framework to your needs? Common Lisp has a ridiculous number of testing framework libraries.

Previous reviews of testing frameworks were done in 2007, in 2010 (the NST review), and in 2012 by the author of clunit; those links seem to have bitrotted and are now only accessible via the Wayback Machine: part 1 and part 2. I thought it was time for an update. I am open to pull requests on this document for corrections, additions, whatever. See https://github.com/sabracrolleton/sabracrolleton.github.io

The best testing framework is context dependent and that context includes how you work. As an example, there was an exchange on reddit.com between dzecniv and shinmera. Dzecniv likes Prove more than Parachute because s/he could run tests just by compiling the source (C-c). Shinmera pointed out a way you could easily add that to Parachute, but he views compilation and execution as two different things. I am in Shinmera's camp: I do not want to run the test until I want to run it. At the same time, I want to be clear on terminology. If I understand the context, what Dzecniv was talking about is what I would call an assertion, so in the interests of clear terminology, consider the following pseudo code:

(defsuite s0  ; (1)
  (deftest t1 ; (2)
    (assert-true (= 1 1)))) ; (3)

  • (1) I will call this a suite - it contains one or more tests (or other suites) and hopefully composes the results.
  • (2) I will call this a test - it contains one or more assertions and hopefully composes the results.
  • (3) I will call this an assertion, and I will call "assert-true" or its equivalent a testing function.

I still do not want to run assertions at compilation either, but then I do not compile assertions outside of tests anyway. So, just for Dzecniv and those like him, there is a functionality table on running assertions on compilation.

Some people are just looking for regression testing, so they focus on running all the tests at once. For example, the README for Prove does not even mention deftest or run-test. Others are looking for TDD or other continuous testing alongside their development. I use individual tests a lot during development, but obviously need regression testing as well.

Some testing frameworks insist on tracking progress by printing little dots or checkmarks with every assertion or test passed or failed. Consider testing libraries like uax-9, which has 1,815,582 assertions (yes, the tests are autogenerated). I do not want to waste the screen space for all those dots, the time spent printing them to the screen or, if there is a test failure, the effort of finding the failed test in the haystack of successful tests. So I prefer a testing framework that allows me to collect only failures and turn off the progress report. Of course some frameworks do not do progress reports (or do them at the test level, not the assertion level), but you might really want or need those progress reports.

Some frameworks cannot find lexical variables declared in a closure containing the test. Most people will not care, but if you use closures you might.
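
For illustration, the pattern looks like this (a minimal sketch; deftest and assert-true stand in for whatever your framework provides):

(let ((x 42))                 ; lexical variable closed over by the test
  (deftest closure-test
    (assert-true (= x 42))))  ; frameworks that evaluate the body outside this
                              ; lexical environment cannot see X and will error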

Other contexts are best served by other libraries. Complex fixture requirements might drive you towards something like NST; the need to define your own speciality assertions might be served by should-test or kaputt. Numerical comparisons might suggest using lisp-unit, lisp-unit2 or kaputt (or just write your own). For macro expansion testing, clunit, clunit2, lisp-unit, lisp-unit2 and rove have specific assertion functions. Your situation should govern which testing framework you use, not your project skeleton or market share.

I occasionally hear extensibility as a buzzword applied to a framework. I am not sure what that means to people. Consider the following example from sxql, which is not a testing framework but which includes this macro in its own test file to simplify use of prove:

(defmacro is-mv (test result &optional desc)
  `(is (multiple-value-list (yield ,test))
       ,result
       ,desc))

(is-mv (select ((:+ 1 1)))
       '("SELECT (? + ?)" (1 1))
       "field")

How hard is that to do in any framework? This is CL.
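
As a point of comparison, here is a hedged sketch of the same convenience macro written against Parachute's is (which takes a comparator, an expected value and a form); yield is assumed to be sxql's, as above:

(defmacro is-mv (test result &optional desc)
  `(parachute:is equal ,result
                 (multiple-value-list (yield ,test))
                 ,@(when desc (list desc))))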

In any event, yes, I will state my opinions, but what you think is important will drive your preferences in testing frameworks.

3 Testing Libraries Considered

3.1 Testing Frameworks

Table 1: Libraries Considered
Library Homepage Author License Last Update
1am homepage James Lawrence MIT 2014
2am (not in quicklisp) homepage Daniel Kochmański MIT 2016
cacau homepage Noloop GPL3 2020
cardiogram (e) homepage Abraham Aguilar MIT 2020
clunit (a) homepage Tapiwa Gutu BSD 2017
clunit2 homepage Cage (fork of clunit) BSD 2021
com.gigamonkeys.test-framework homepage Peter Seibel BSD 2010
fiasco (b) homepage João Távora BSD 2 Clause 2020
fiveam homepage Edward Marco Baringer BSD 2020
kaputt homepage Michaël Le Barbier MIT 2020
lift homepage Gary Warren King MIT 2019 (c)
lisp-unit homepage Thomas M. Hermann MIT 2017
lisp-unit2 homepage Russ Tyndall MIT 2018
nst homepage John Maraist LLGPL3 latest 2021
parachute homepage Nicolas Hafner zlib 2021
prove homepage Eitaro Fukamachi MIT 2020
ptester homepage Kevin Layer LLGPL 2016
rove homepage Eitaro Fukamachi BSD 3 Clause 2020
rt none Kevin M. Rosenberg MIT 2010
should-test homepage Vsevolod Dyomkin MIT 2019
simplet homepage Noloop GPLv3 2019
stefil (f) homepage Attila Lendvai, Tamas Borbely, Levente Meszaros BSD/Public Domain 2018
tap-unit-test (d) homepage Christopher K. Riesbeck, John Hanley MIT 2017
unit-test homepage Manuel Odendahl, Alain Picard MIT 2012
xlunit homepage Kevin Rosenberg BSD 2015
xptest none Craig Brozensky Public Domain 2015
  • (a) Looking for new maintainer. Has been forked to clunit2 and you should only consider clunit2.
  • (b) Fork of stefil
  • (c) Port to Clasp, otherwise 2015
  • (d) Tap-Unit-Test is a version of lisp-unit with TAP formatted reporting.
  • (e) Cannot get it to work.
  • (f) The authors have specified it as obsolete, so it will not be further considered.

3.2 Speciality Libraries

Table 2: Speciality Libraries
Library Homepage Author License Last Update
checkl homepage Ryan Pavlik LLGPL, BSD 2018
Table 3: Selenium Interface Libraries
Library Homepage Author License Last Update Selenium
cl-selenium-webdriver homepage TatriX MIT 2018 2.0
selenium homepage Matthew Kennedy LLGPL 2016 1.0?

The selenium interfaces are here for reference purposes and are not further discussed.

3.3 Helper Libraries

Table 4: Helper Libraries Considered
Library Homepage Author License Last Update
assert-p homepage Noloop GPL3 2020
assertion-error homepage Noloop GPL3 2019
check-it homepage Kyle Littler LLGPL 2015
cl-fuzz homepage Neil T. Dantam BSD 2 Clause 2018
cl-quickcheck homepage Andrew Pennebaker MIT 2020
cover homepage Richard Waters MIT  
hamcrest homepage Alexander Artemenko BSD 3 Clause 2020
mockingbird homepage Christopher Eames MIT 2017
portch (not in quicklisp) homepage Nick Allen BSD 3 Clause 2009
protest homepage Michał Herda LLGPL 2020
rtch (not in quicklisp) download David Thompson LLGPL 2008
testbild homepage Alexander Kahl GPLv3 2010
test-utils homepage Leo Zovic MIT 2020

Assert-p, assertion-error, check-it, cl-fuzz, cl-quickcheck, cover, hamcrest, protest, testbild and test-utils are not, per se, testing frameworks. They are designed to be used in connection with other testing frameworks.

  • Check-it and cl-quickcheck are randomized property-based testing libraries (Quickcheck style). See https://en.wikipedia.org/wiki/QuickCheck
  • Cl-fuzz is another variant of testing with random data.
  • Assert-p and Assertion-error are collections of assertions or assertion error macros that can be used in testing frameworks or by a test runner.
  • Cover is a test coverage library, much like sbcl's sb-cover, ccl's code-cover, or LispWorks Code Coverage (see the compiler-based sketch after this list).
  • Hamcrest uses pattern matching for building tests.
  • Mockingbird provides stubbing and mocking macros for unit testing. These are used when specified functions in a test should not be computed but should instead return a provided constant value.
  • Portch helps organize tests written with Franz's portable ptester library
  • Protest is a wrapper around other testing libraries, currently 1am and parachute. It wraps around test assertions and, in case of failure, informs the user of details of the failed test step.
  • Rtch helps organize RT tests based on their position in a directory hierarchy
  • Testbild provides a common interface for unit testing output, supporting TAP (versions 12 and 13) and xunit styles.
  • Test-utils provides convenience functions and macros for prove and cl-quickcheck.
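
As noted in the Cover entry above, coverage is usually better handled by the compiler. A minimal sketch with SBCL's sb-cover; :my-system and the report path are placeholders:

(require :sb-cover)
(declaim (optimize sb-cover:store-coverage-data))      ; instrument subsequent compiles
(asdf:load-system :my-system :force t)                  ; recompile the code under test
(asdf:test-system :my-system)                           ; run the tests as usual
(sb-cover:report "/tmp/coverage-report/")               ; write the HTML report
(declaim (optimize (sb-cover:store-coverage-data 0)))   ; turn instrumentation back off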


3.4 Dependencies

Libraries not in the table below do not show any dependencies in their asd files.

Table 5: Library Dependencies
Library Dependencies
cacau eventbus, assertion-error
checkl marshal
fiasco alexandria, trivial-gray-streams
fiveam alexandria, net.didierverna.asdf-flv, trivial-backtrace
lisp-unit2 alexandria, cl-interpol, iterate, symbol-munger
nst (#+(or allegro sbcl clozure openmcl clisp) closer-mop, org-sampler)
parachute documentation-utils, form-fiddle
prove cl-ppcre, cl-ansi-text, cl-colors, alexandria, uiop
rove trivial-gray-streams, uiop
should-test rutils, local-time, osicat, cl-ppcre

4 Quick Summary

4.1 Opinionated Awards

For those who want the opinionated quick summary, the awards are:

  • Best General Purpose: Parachute (It hits almost everything on my wish list - optionality on progress reports and debugging, good suite setup and reporting, good default error reporting with the ability to provide diagnostic strings with variables, the ability to skip failing test dependencies and set time limits on tests (reporting the time for each test), and decent fixture capability. It does not have the built-in ability to re-run just the last failing tests, but that is a relatively easy add-on. The bigger limitation is that while fixtures are easy to set up, fixtures at a parent test level (suite level) do not apply to nested child tests. While it is not the fastest, it is in the pack as opposed to the also-rans.) My second pick would be Fiasco, but I like Parachute's fixture capability and suite setup better. My third choice would be Lisp-Unit2. (Update 13 June 2021 - based on the latest update of Clunit2, it needs to be included for consideration as well.)
  • If Only Award: Lift. If only it reported all failing assertions and did not stop at the first one. Why? Why can't I change this?
  • If you only care about speed: Lift and 2am Go to Benchmarking
  • Best General Purpose Fixtures (Suite/Tag and test level): Lisp-Unit2 and Lift
  • Ability to reuse tests in multiple suites: Lisp-Unit2 (because of composable tags)
  • If you need tests to take parameters: Fiasco
  • If you need progress reporting optionality: Parachute or Fiasco or Clunit2
  • Favorite Hierarchy Setup (nestable suites): Parachute (Everything is a test and its :parents all the way up, can easily specify parents at the child level). Also 2am and Lift
  • Assertions that take diagnostic comments with variables: Parachute, Fiasco, 2am, Fiveam, Lift, Clunit2 This is something that I like for debugging purposes along with whatever reporting comes built in with the framework. See error-reporting
  • Values expression testing: Lisp-Unit2, Lisp-Unit, Parachute, (Update Clunit2 as well)
  • I want to track if my functions changed results: Checkl
  • Tests that specify suite or tags (does not rely on location in file): Parachute, Lisp-Unit (tags), Lisp-Unit2(tags), Lift, Clunit2
  • Heavy duty complex fixtures: NST (but there are trade-offs in the shape of the learning curve and performance)
  • Ability to define new assertions: NST, Kaputt (but they have their issues in other areas)
  • Ability to rerun failures only: Fiasco, Lisp-Unit2 (you can extend Parachute and Fiveam to get this, but it is not there now)
  • Favorite Random Data Generator: Check-it
  • Can redirect output to a different stream (a): Clunit2, Fiasco, Kaputt, Lift, Lisp-Unit, Lisp-Unit2 and RT
  • Randomized Property Tests: Check-it with any framework
  • Choice of Interactive Debugging or Reporting: Most frameworks at this point
  • Rosetta Stone Award for reading different test formats: Parachute (can read Fiveam, Prove and Lisp-Unit tests)
  • Code Coverage Reports: Use your compiler
  • I use it because it was included in my project skeleton generator: Prove

(a) Most frameworks just write to *standard-output* so you have to redirect that to a file.
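
A minimal sketch of that redirection; run-all-tests stands in for your framework's entry point and the file name is a placeholder:

(with-open-file (out "test-report.txt" :direction :output :if-exists :supersede)
  (let ((*standard-output* out))
    (run-all-tests)))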


4.2 Features Considered

  • Ease of use and documentation: Most of the frameworks are straightforward. Some have no documentation, others have partial documentation (often documenting only one use case). The documentation may be out of sync with the code. Some get so excited about writing up the implementation details that it becomes difficult to see the forest for the trees. NST has a high learning curve. Prove and Rove will require digging into the source code if you want to do more than simple regression testing. Lift has a lot of undocumented functionality that might be just what you need but you have no way of knowing.
  • Tests
    • Tests should take multiple assertions and report ALL the assertion failures in the test (Looking at you Lift, Kaputt and Xlunit - I put multiple assertions into a test for a reason, please do not lose some of the evidence.)
    • Are tests functions or otherwise funcallable? (Faré and others requested this in an exchange with Tapiwa, the author of Clunit, back in 2013. At the same time, some people want test names in the function namespace and others do not. You choose your preference. Those who want funcallable tests typically cite either the ability to programmatically run the test or the ability to go to definition from the test name.)
    • Immediate access to source code (Integration with debugger or funcallable tests?)
    • Does a failure or error throw you immediately into the debugger, never into the debugger, and is that optional?
    • Easy to test structures/classes (does the framework provide assistance in determining that all parts of a structure or class meet a test)
    • Tests can call other tests (This is not the same as funcallable tests. To be useful this does require a minimum level of test labeling in the reporting.)
  • Assertions (aka Assertion Functions)
    • There are frameworks with only a few assertion test functions. There are frameworks with so many assertions that you wonder if you have to learn them all. The advantage of specialized assertions is less typing, possibly faster (or slower) performance and possibly relevant built-in error messages. You will have to check for yourself whether performance is positively or negatively impacted. You have to decide for yourself how much weight to put on extra assertions like having assert-symbolp instead of (is (symbolp x)).
    • Assertions that either automatically explain why the test failed or allow a diagnostic string that describes the assertion and what failed. (Have you ever seen a test fail but the report of what it should have been and what the result was look exactly the same? Maybe the test required EQL and you thought it was EQUALP? These might or might not help.)
    • Can assertions access variables in a closure containing the test? (Most frameworks can, but Clunit, Clunit2, Lisp-Unit, Lisp-Unit2 and NST cannot.)
    • Do the assertions have macroexpand assertion functions? (Clunit, Clunit2, Lisp-Unit, Lisp-Unit2, Prove, Rove and Tap-Unit-Test have this)
    • Do the assertions have floating point and rational comparisons or do you have to write your own? (Lift, Lisp-Unit, Lisp-Unit2, Kaputt have these functions for you.)
    • Signal and condition testing or at least be able to validate that the right condition was signalled. (Kaputt, did you forget something?)
    • Definable assertions/criteria (can you easily define additional assertions?)
    • Do assertions or tests run on compilation (C-c C-c in the source file)?
    • Do the assertions handle values expressions? Most frameworks accept a values expression but compare just the first value. Fiveam complains about getting a values expression and throws an error. Parachute and NST will compare a single values expression against multiple individual values. Prove will compare a values expression against a list. Lisp-Unit and Lisp-Unit2 (Update: Clunit2 as well) will actually compare two values expressions value by value. (See the sketch after this list.)
  • Easy to set up understandable suites and hierarchies or tags. Many frameworks automatically add tests to the last test suite that was defined. That makes things easy if you work very linearly or just in files for regression testing. If you are working in the REPL and switching between multiple test sub-suites, it can create unexpected behavior. I like to be able to specify the suite (or tags) when defining the test, but that creates more unnecessary typing if you work differently.
  • Choice of Interactive (drop directly into the debugger) or Reporting (run one or more tests and show which ones fail and which ones pass).
  • Data generators are nice to have, but the helper libraries Check-it and Cl-Quickcheck can also be used and probably have more extensive facilities.
  • Easy to setup and clean up Fixtures
    • Composable fixtures (fixtures for multiple test suites can be composed into a single fixture)
    • Freezing existing data while a test temporarily changes it
  • Compilation: Some people want the ability to compile before running tests for two reasons. First, deferred compilation can seriously slow down extensive tests. Second, getting compile errors and warnings at the test run stage can be hard to track down in the middle of a lot of test output. Other people want deferred compilation (running the test compiles it, so no pre-compilation step required) and tested functions which have changed will get picked up when running the test.
  • Reports
    • Easy to read reports with descriptive comments (this requires that each test have description or documentation support)
    • Does the framework have progress reporting, at what level and can it be turned off?
    • Report just failing tests with descriptive info
    • Composable Reports (in the sense of a single report aggregating multiple tests or test suites)
    • Reports to File. I know most developers do not care, but I have seen situations where the ability to prove that the software at date A is documented to have passed xyz tests would have been nice. See Dribble and Output Streams
    • Test Timing. See Timing
    • TAP Output (some people like to pass test results in this format on to other tools).
    • Reports of Function (and parameter) test coverage (Rove was the only framework that had something in this area and it depends on using sbcl. I would suggest looking to your compiler; I did not test this.)
  • Error tracking (Do test runs create a test history so that you can run only against failing tests?) As far as I can tell, no framework creates a database to allow historical analysis.
  • Test Sequencing / Shuffling
    • Can choose test sequencing or shuffle
    • Can choose consistent or random or fuzzing data
    • Can choose just the tests that failed last time (Chris Riesbeck exchange with Tapiwa in 2013)
  • Ability to skip tests. See Skipping
    • Skip tests
    • Skip assertions
    • Skip based on implementations
    • also skip tests that exceed a certain time period
  • Benchmarks. There were a few surprises here. I tested each framework on uax-15 which has 16 tests and 338760 assertions (all passing) and ran trivial-benchmark with 10 iterations on both the latest sbcl and ccl. Obviously, the smaller the code base, the less speed matters. If speed is important to you, stay away from clunit and nst. (Note: 13 June 2021 Update - removing clunit2 from this caveat.)
  • Asynchronous and parallel testing (not tested in this report)
  • Case safety (Max Mikhanosha asked for this in an exchange with Tapiwa in 2013. Not tested in this report)
  • Memory, time and resource usage reports (no one documented this and I did not dive into the source code looking for it.)
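
As promised in the values-expression item above, here is a framework-agnostic sketch of the issue; is stands in for your framework's assertion macro and div-rem is a made-up example function:

(defun div-rem (a b)
  (floor a b))   ; returns two values: quotient and remainder

;; Many frameworks compare only the primary value, so a wrong remainder slips through:
;;   (is (= 3 (div-rem 7 2)))
;; Normalizing with multiple-value-list is explicit and works in any framework:
;;   (is (equal '(3 1) (multiple-value-list (div-rem 7 2))))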

I am not covering support for asdf package-inferred systems, roswell script support and integration with travis ci, github actions, Coveralls, etc. If someone wants to do that and submit a pull request, I am open to that.

I am not including a pie chart describing which library has market share because (a) I do not like pie charts and (b) I do not believe market share is a measure of quality. That being said, because someone asked nicely, I pulled the following info out of quicklisp just based on who-depends-on. The actual count in the wild is completely unknown.

Table 6: User Count on Quicklisp
Name Count
1am 22
2am 0
fiveam 323
clunit 11
clunit2 4
fiasco 24
kaputt 2
lift 54
lisp-unit 42
lisp-unit2 21
nst 10
parachute 49
prove 163
ptester 5
rove 31
rt 29
should-test 3
xlunit 4
xptest 0

5 Functionality Comparison

5.1 Hierarchy Overview

Table 7: Overview-1
Name Hierarchies/suites/tags/lists Composable Reports
1am N (2)(5) N N
2am Y Y (5) (4)
cacau (6)   (4)
clunit Y Y (4)
clunit2 Y Y (4)
fiasco Y Y  
fiveam Y Y  
gigamonkeys N    
kaputt N (9)    
lift Y Y  
lisp-unit (tags) (3)   (1,4)
lisp-unit2 (tags) (3)(5) Y (5) (1,4)
nst Y Y  
parachute Y Y (1)
prove Y Y (4)
ptester N    
rove (7) (7)  
rt package (8)  
should-test package    
simplet N    
tap-unit-test N   (4)
unit-test Y Y  
xlunit Y Y  
xptest Y N  
  1. report objects are provided which are expected to be extended by the user
  2. uses a flat list of tests. You can pass any list of test-names to run. See, e.g. macro provided by Phoe in the 1am discussion.
  3. lisp-unit and lisp-unit2 organize by packages and by tags. You can run all the tests in a package, or all the tests for a list of tags, but they do not have the strict sense of hierarchy that other libraries have.
  4. TAP Formatted Reports are available
  5. Because tests are functions, tests can call other functions so you can create ad-hoc suites or hierarchies but they are not likely to be composable.
  6. Has suites but no real capacity to run them independently - all or nothing
  7. Rove's run-suite function will run all the tests in a particular package but does not accept a style parameter and simply prints out the results of each individual test, without summarizing. Rove's run function does accept a style parameter but seems to handle only package-inferred systems. I confirm Rove's issue #42 that it will not run with non-package inferred systems.
  8. RT does not have suites per se. You can run all the tests that have been defined using the DO-TESTS function. By default it prints to *standard-output*, but it accepts an optional stream parameter which allows you to redirect the results to a file or other stream of your choice. do-tests prints the results for each individual test and then prints a summary.
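
For example, a minimal sketch of using that optional stream parameter (the file name is a placeholder):

(with-open-file (out "rt-results.txt" :direction :output :if-exists :supersede)
  (rt:do-tests out))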

5.2 Run on Compile and Funcallable Tests

Table 8: Run on Compile and Funcallable Tests
Library Run on compile Are Tests Funcallable?
1am A Y
2am (not in quicklisp) A Y
cacau N N
clunit A N
clunit2 A N
fiasco A Y
fiveam Optional N
gigamonkeys N N
kaputt N Y
lift A, T(1) N
lisp-unit N N
lisp-unit2 N Y
nst N N
parachute N N
prove A N
ptester N N
rove A N
rt N N
should-test N N
tap-unit-test N N
unit-test N N
xlunit T(2) N
xptest N N
  • A means assertions run on compile, T means tests run on compile
  • (1) if compiled at REPL
  • (2) Optional by test, specified at definition: (def-test-method t1 ((test tf-xlunit) :run nil) body)
  • (3) *run-test-when-defined* controls this option

5.3 Fixtures

Table 9: Fixtures
Library Fixtures Suite Fixtures Test Fixtures Multiple Fixtures
1am N      
2am (not in quicklisp) N      
cacau Y Y Y  
clunit Y Y Y Y
clunit2 Y Y Y Y
fiasco N      
fiveam (a) K Y Y  
gigamonkeys N      
kaputt N      
lift Y Y   inherited from higher level suites
lisp-unit N      
lisp-unit2 Y   Y  
nst Y Y Y Y
parachute Y   Y  
prove N      
ptester N      
rove Y Y Y Y
rt N      
should-test N      
tap-unit-test N      
unit-test (b) Y (b) (b) (b)
xlunit Y Y Y Y
xptest Y   Y  

(a) Not really recommended, but does exist. (b) Users are expected to create a subclass of the unit-test class using the define-test-class macro.


5.4 Debugging Optionality and User Provided Diagnostic Messages

Does a failure (not an error) trigger the debugger, is it optional, and do assertions allow user-provided diagnostic messages? If yes, can you further provide variables for a failure message?
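
To show what a diagnostic message with variables looks like, here is a hedged sketch using fiveam's is, which accepts a format string and arguments after the test form; the other frameworks in the "with vars" column offer something similar:

(fiveam:is (= x y) "expected X (~a) to be = to Y (~a)" x y)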

Table 10: Overview Reporting v. Debugger Optionality / Diagnostic Messages
Library Failure triggers debugger Diagnostic Messages in Assertions
1am (always) N
2am (optional) with vars
cacau (optional) N
clunit (optional) with vars
clunit2 (optional) with vars
gigamonkeys (optional) N
fiasco (optional) with vars
fiveam (optional) with vars
kaputt (always) N
lift (optional) with vars
lisp-unit (optional) Y
lisp-unit2 (optional) Y
nst (optional) N
parachute (optional) with vars
prove (optional) Y
ptester (optional) N
rove (optional) Y
rt (never) N
should-test (never) Y
simplet (never) N
tap-unit-test (optional) Y
unit-test (never) Y
xlunit (never) Y
xptest (never) N

Also see error-reporting

5.5 Output of Run Functions (other than what is printed to the stream)

Table 11: Output of Run Functions (other than what is printed to the stream)
Library Function Returns
1am run nil
2am (not in quicklisp) run nil
cacau run nil
clunit run-test, run-suite nil
clunit2 run-test, run-suite nil
fiasco run-tests test-run object
fiveam run list of test-passed, test-skipped, test-failure objects
  run! nil
gigamonkeys test nil
kaputt name-of-test nil
lift run-test, run-tests results object
lisp-unit run-tests test-results-db object
lisp-unit2 run-tests test-results-db object
nst :run nil
parachute test a result object
prove run Returns 3 values: a flag (T or NIL) for whether the tests passed, a list of passed test files and a list of failed test files.
  run-test-system passed-files, failed-files
  run-test nil
ptester with-tests nil
rove run-test, run-suite t or nil
rt do-test nil
should-test test hash-table (1)
tap-unit-test run-tests nil
unit-test run-test test-equal-result object
xlunit textui-test-run test-results-object
xptest run-test list of test-result objects

(1) Should-test: at the lowest level, should returns T or NIL and signals information about the failed assertion. This information is aggregated by deftest, which returns aggregate information about all the failed assertions in a hash-table; at the highest level, test will once again aggregate information over all tests.

5.6 Progress Reports

Does the framework provide a progress report, is it optional, and does it run just at the test level or also at the assertion level?

Table 12: Overview - Progress Reports
Library Progress Reports
1am Every assert
2am Every assert
cacau optional
clunit optional
clunit2 optional
gigamonkeys never
fiasco optional
fiveam optional (1)
kaputt Every assert
lift never
lisp-unit never
lisp-unit2 never
nst Every test
parachute optional
prove Every assert
ptester Every assert
rove Optional
rt Every test
should-test Every assert
simplet Every test
tap-unit-test never
unit-test Every test
xlunit never
xptest never

(1) The following will allow fiveam to run without output

(let ((fiveam:*test-dribble*
        (make-broadcast-stream)))
  (fiveam:run! …))


5.7 Skipping, Shuffling and Re-running

Table 13: Overview-2 Skipping, Shuffling and Rerunning Abilities
Name Skip failing dependencies Shuffle Re-run only failed tests
1am   Y (auto)  
2am   Y (auto)  
cacau S, T    
clunit D Y (auto)  
clunit2 D Y (auto) Y
fiasco P(1), A   Y
fiveam P(2)   (3)
gigamonkeys      
kaputt      
lift T    
lisp-unit      
lisp-unit2     Y
nst      
parachute D, C, P Y  
prove (4)    
ptester      
rove A    
rt      
should-test   N Y
simplet P    
tap-unit-test      
unit-test      
xlunit      
xptest      

D - failing dependencies, C - children, P - pending, S - suites, T - tests, A - assertions

  1. skip based on conditions (when and skip-unless)
  2. skip when specified
  3. run! returns a list of failed-test-results that you could save and use for this purpose
  4. Prove can skip a specified number of tests using the skip function. Unfortunately it marks them as passed rather than skipped.

5.8 Timing Reporting and Time Limits

Table 14: Timing Reporting and Time Limits
Library Time Reporting Time Limits
1am N N
2am (not in quicklisp) N N
cacau N Y(T or S)
clunit N N
clunit2 N N
fiasco N N
fiveam (a) ? N
gigamonkeys N N
kaputt N N
lift Y Y
lisp-unit Y N
lisp-unit2 Y N
nst Y Y
parachute Y Y
prove N Y
ptester N N
rove N N
rt N N
should-test N N
tap-unit-test Y N
unit-test N N
xlunit N N
xptest N N

(a) Fiveam has some undocumented profiling capabilities that I did not look at

5.9 Dribble and Output Streams

Table 15: Dribble and Output Streams
Library Dribble output streams
1am N S
2am (not in quicklisp) N S
cacau N S
clunit N S
clunit2 N *test-output-stream*
fiasco N optional parameter
fiveam Y *test-dribble* S
gigamonkeys N S
kaputt N optional parameter
lift Y *lift-dribble-pathname* optional parameter
lisp-unit N optional parameter
lisp-unit2 N *test-stream*
nst N optional parameter
parachute N (setf output)
prove N *test-result-output*
ptester N S
rove N *report-stream*
rt N optional parameter
should-test N *test-output*
tap-unit-test N S
unit-test N S
xlunit N S
xptest N S

Where S is *standard-output*

5.10 Edge Cases: Float Testing, Value Expressions and Closure Variables

This table looks at whether the framework provides float equality tests, whether it looks at all the values coming from a values expression, and whether it can access variables declared in a closure surrounding the test.

Table 16: Edge Cases
Name float tests Handles value expressions Variables in Closures
1am   First value only Y
2am   First value only Y
cacau   First value only Y
clunit   First value only N
clunit2 (a)   Y N
fiasco   First value only Y
fiveam   N N
gigamonkeys   First value only Y
kaputt Y First value only Y
lift   First value only N
lisp-unit Y Y N
lisp-unit2 Y Y N
nst   Y N
parachute   Y Y
prove   Y Y
ptester   First value only Y
rove   First value only Y
rt   N N
should-test   First value only N
tap-unit-test   Y N
unit-test   First value only Y
xlunit   First value only Y
xptest   relies on CL predicates Y

(a) Updated 13 June 2021


5.11 Compatibility and Customizable Assertions

Table 17: Overview-4 Misc
Name compatibility layers Customizable Assertion Functions
cacau   Y
kaputt   Y
parachute fiveam lisp-unit prove  
nst   Y

(a) Running suites without tests or tests without test functions will result in tests marked PENDING rather than success or fail

5.12 Claims Not Tested

Table 18: Overview-5 Claims Not Tested
Name Async Thread Ready Package Inferred
1am   X  
2am   X  
Cacau X    
Rove   X (1) X

(1) Tycho Garen reported in February 2021 that "Rove doesn't seem to work when multi-threaded results effectively. It's listed in the readme, but I was able to write really trivial tests that crashed the test harness."


6 Assertion Failure Comments

There are two reasons you test. First, to pat yourself on the back that your code passed the tests. Second, to find any bugs. Assertions in the test frameworks differ in how much automatically generated information they provide on failures. The following are the automatically generated failure messages on an assertion that (= x y) where x is 1 and y is 2. We also note whether the framework accepts diagnostic strings and variables for those strings.

6.1 1am

What, you wanted a report? Let me introduce you to the debugger.

6.2 2am

Assertions also accept diagnostic strings with variables

T1-FAIL-34:
FAIL: (= X Y)

6.3 cacau

Error message:
BIT EQUAL (INTEGER 0 4611686018427387903)
Actual:
1
Expected:
2

6.4 clunit and clunit2

Assertions also accept diagnostic strings with variables

T1-FAIL-34: Expression: (= X Y)
Expected: T
Returned: NIL

6.5 fiasco

Assertions also accept diagnostic strings with variables

Failure 1: FAILED-ASSERTION when running T1-FAIL
Binary predicate (= X Y) failed.
x: X => 1
y: Y => 2

6.6 fiveam

Assertions also accept diagnostic strings with variables. I deleted several blank lines. Why do you waste so much screen space Fiveam?

T1-FAIL-34 []:
Y
evaluated to
2
which is not
=
to
1

6.7 gigamonkeys

FAIL ... (T1-FAIL): (= X Y)
X                 => 1
Y                 => 2
(= X Y)           => NIL

6.8 kaputt

Into the debugger you go.

 Test assertion failed:
  (ASSERT-T (= X Y))
In this call, the composed forms in argument position evaluate as:
  (= X Y) => NIL
The assertion (ASSERT-T EXPR) is true, iff EXPR is a true generalised boolean.

6.9 lift

Assertions also accept diagnostic strings with variables

Failure: s0 : t1-fail-34
Documentation: NIL
Source       : NIL
Condition    : Ensure failed: (= X Y) ()
During       : (END-TEST)
Code         : (
                ((LET ((X 1) (Y 2))
                   (ENSURE (= X Y)))))

6.10 lisp-unit

Assertions also accept diagnostic strings but no variables

Failed Form: (= X Y)
 | Expected T but saw NIL
 | X => 1
 | Y => 2

6.11 lisp-unit2

Assertions also accept diagnostic strings but no variables

| FAILED (1)
 | Failed Form: (ASSERT-TRUE (= X Y))
 | Expected T
 | but saw NIL

6.12 parachute

Assertions also accept diagnostic strings with variables

test 't1-fail-34)
        ? TF-PARACHUTE::T1-FAIL-34
  0.000 ✘   (is = x y)
  0.010 ✘ TF-PARACHUTE::T1-FAIL-34

;; Failures:
   1/   1 tests failed in TF-PARACHUTE::T1-FAIL-34
The test form   y
evaluated to    2
when            1
was expected to be equal under =.

6.13 ptester

Test failed: Y
  wanted: 1
     got: 2

6.14 prove

Assertions also accept diagnostic strings but no variables

× NIL is expected to be T (prove)

6.15 rove

Assertions also accept diagnostic strings but no variables

(EQUAL X Y) (rove)
X = 1
Y = 2

6.16 rt

Form: (LET ((X 1) (Y 2))
        (= X Y))
Expected value: T
Actual value: NIL.

6.17 should-test

Assertions also accept diagnostic strings but no variables

Test T1-FAIL-34:
Y FAIL
expect: 1
actual: 2
FAILED

6.18 tap-unit-test

Assertions also accept diagnostic strings but no variables

T1-FAIL-34: (= X Y) failed:
Expected T but saw NIL

6.19 unit-test

Assertions also accept diagnostic strings but no variables

(#<TEST-EQUAL-RESULT FORM: (= X Y) STATUS: FAIL REASON: NIL>)


7 Benchmarking

This is really simple benchmarking using sbcl version 2.1.4 on a linux server and ccl version 1.12 LinuxX8664. I applied each framework to uax-15, which has 16 tests and 338760 assertions (and they all pass). The tests were stripped to the minimum. No diagnostic strings were used and, for the frameworks which allowed it, the tests were set to no progress reporting and an overall summary only. All of the assertions pass, so any real-world test run with failing assertions generating failure reports will be different.

Based on this one application, from a speed perspective, there was a host of frameworks in a pack, with Rove and Prove at the back, then NST and Clunit not even on the same continent. (13 June 2021 Update: Clunit2 has resolved clunit's performance deficit.) (Strangely enough, while the other frameworks were slower under ccl, Clunit and NST improved.) Fiveam is in the pack so long as it runs in a terminal or some non-emacs editor. If run in emacs, it runs into an emacs issue with long lines. You will see the difference in the fiveam report.

Your context will be important as to whether these benchmarks are at all meaningful to you.

7.1 Stack Ranking

What immediately jumps out is that the vast majority are grouped together, and then there are a few outliers that are just way worse than the pack. For consistency, every benchmark was done on sbcl 2.1.4 in a terminal (to avoid emacs issues), except as noted to show the emacs effect.

Table 19: Order by Benchmark Runtime (lower is better)
Library SBCL RunTime CCL Runtime
xptest 6.5151 11.397
xlunit 6.5840 11.618
lift 6.6040 11.605
2am (not in quicklisp) 6.8468 12.905
1am 6.8821 12.870
cacau 7.0334 11.609
rt 7.0450 11.663
unit-test 7.5560 17.880
lisp-unit 7.8594 13.345
kaputt 7.9049 15.731
tap-unit-test 7.9746 13.095
should-test 8.1620 24.667
com.gigamonkeys.test-framework 8.5220 30.627
ptester 9.0088 22.307
fiasco 9.6013 28.807
clunit2 (b1) 10.3766 26.347
lisp-unit2 (a) 10.7883  
fiveam 11.1790 18.647
parachute 11.7699 35.020
clunit2 (b2) 13.3833 32.582
rove 18.2223 41.800
prove 31.2611 116.864
nst 522.5623 490.843
clunit 601.0652 272.858

(a) CCL decided it was not friends with Lisp-Unit2 and did not compile it. (b1) 13 June 2021 Update - testing using assert-true (b2) 13 June 2021 Update - testing using assert-equal

Table 20: Order by Benchmark Bytes Consed (lower is better)
Library Bytes Consed
2am (not in quicklisp) 3541592704
1am 3541621392
xptest 3545190064
xlunit 3547926336
cacau 3559494400
lift 3662570128
rt 3781963056
unit-test 3842573536
should-test 4033912496
kaputt 4034319280
fiasco 4249190656
tap-unit-test 4406207680
lisp-unit 4411139200
fiveam 4680219840
com.gigamonkeys.test-framework 4950815168
lisp-unit2 5127796976
ptester 5195902592
clunit 5262303120
parachute 5357258016
rove 8960510816
clunit2 (a) 13401972672
prove 14124826480
clunit2 (b) 15377667616
nst 319321472704

(a) 13 June 2021 Update using assert-true (b) 13 June 2021 Update using assert-equal

Table 21: Order by Benchmark Eval calls (lower is better)
Library Eval Calls
1am 0
cacau 0
com.gigamonkeys.test-framework 0
fiveam 0
kaputt 0
lift 0
lisp-unit2 0
parachute 0
prove 0
ptester 0
rove 0
should-test 0
unit-test 0
xlunit 0
xptest 0
clunit2 (a) 0
2am (not in quicklisp) 10
fiasco 10
lisp-unit 10
tap-unit-test 160
clunit 320
rt 480
nst 6768780

(a) 13 June 2021 Update

Now the detailed report.

7.2 1am

1am seems to have no way to turn off the progress reports. The benchmark below was done running in a terminal window. The same test running in an emacs REPL took roughly six times longer due to how emacs mishandles long lines. YMMV with other editors.

(benchmark:with-timing (10) (uax-15-1am-tests::run))

Success: 16 tests, 338760 checks.
-                SAMPLES  TOTAL       MINIMUM    MAXIMUM    MEDIAN     AVERAGE      DEVIATION
REAL-TIME        10       10.889917   1.013326   1.153324   1.076658   1.088992     0.042165
RUN-TIME         10       9.661734    0.942873   0.983157   0.966447   0.966173     0.014798
USER-RUN-TIME    10       9.262606    0.909559   0.950976   0.922516   0.926261     0.012376
SYSTEM-RUN-TIME  10       0.399142    0.026377   0.050109   0.039822   0.039914     0.007806
PAGE-FAULTS      10       0           0          0          0          0            0.0
GC-RUN-TIME      10       974.403     82.634     111.238    96.93      97.4403      9.298019
BYTES-CONSED     10       4870658256  487053200  487078096  487066720  487065820.0  7454.81
EVAL-CALLS       10       0           0          0          0          0            0.0

The ccl version:

-          SAMPLES  TOTAL      MINIMUM   MAXIMUM   MEDIAN    AVERAGE   DEVIATION
REAL-TIME  10       12.877645  1.277734  1.307823  1.285777  1.287764  0.00831
RUN-TIME   10       12.870041  1.276389  1.306837  1.285161  1.287004  0.008351

7.3 2am

2am seems to have no way to turn off the progress reports. As with the 1am benchmark, the benchmark below was done running in a terminal window. The same test running in an emacs REPL took roughly six times longer due to how emacs mishandles long lines.

(benchmark:with-timing (10) (uax-15-2am-tests::run))
Did 16 tests (0 crashed), 338760 checks.
Pass: 338760 (100%)
Fail: 0 ( 0%)
-                SAMPLES  TOTAL       MINIMUM    MAXIMUM    MEDIAN     AVERAGE      DEVIATION
REAL-TIME        10       8.389924    0.729993   0.906659   0.839992   0.838992     0.049081
RUN-TIME         10       6.846887    0.674877   0.730084   0.680101   0.684689     0.015464
USER-RUN-TIME    10       6.79029     0.664879   0.713442   0.675069   0.679029     0.012428
SYSTEM-RUN-TIME  10       0.056611    0          0.016639   0.006661   0.005661     0.005376
PAGE-FAULTS      10       0           0          0          0          0            0.0
GC-RUN-TIME      10       272.862     20.824     65.941     24.247     27.2862      13.023412
BYTES-CONSED     10       3541592704  354158208  354162800  354158736  354159260.0  1253.5916
EVAL-CALLS       10       0           0          0          0          0            0.0

The ccl version

-          SAMPLES  TOTAL      MINIMUM   MAXIMUM   MEDIAN    AVERAGE   DEVIATION
REAL-TIME  10       12.906069  1.278402  1.332574  1.28587   1.290607  0.014833
RUN-TIME   10       12.905263  1.278475  1.334017  1.281963  1.290526  0.015472

7.4 cacau

Since Cacau does not run tests unless they are recompiled, we have simply multiplied a single run by 10 to get some kind of comparable figure here, running with the minimum reporting.

    (benchmark:with-timing (10) (uax-15-cacau-tests::run :reporter :min))
-                SAMPLES  TOTAL(10x)      MINIMUM   MAXIMUM    MEDIAN    AVERAGE   DEVIATION
REAL-TIME        10       7.03327    0         0.703327   0         0.070333  0.210998
RUN-TIME         10       7.03341    0.000018  0.703049   0.000021  0.070334  0.210905
USER-RUN-TIME    10       6.86598    0.000017  0.686321   0.000021  0.06866   0.205887
SYSTEM-RUN-TIME  10       0.16734    0.000001  0.016721   0.000001  0.001673  0.005016
PAGE-FAULTS      10       0          0         0          0         0         0.0
GC-RUN-TIME      10       572.44     0         57.244     0         5.7244    17.1732
BYTES-CONSED     10       3576325600 0         357534400  0         35763256  107257040.0
EVAL-CALLS       10       0          0         0          0         0         0.0

The ccl version (the total multiplied by 10 to try to get a comparable figure):

-          SAMPLES  TOTAL     MINIMUM   MAXIMUM   MEDIAN    AVERAGE   DEVIATION
REAL-TIME  1        11.63139  1.163139  1.163139  1.163139  1.163139  0
RUN-TIME   1        11.6093   1.16093   1.16093   1.16093   1.16093   0

7.5 clunit

Clunit has always had performance concerns, and running this benchmark was painful. Unlike fiveam, which should not be run in an emacs REPL on tests with lots of assertions because of emacs' issues with long lines, clunit has no one to blame but itself. But look at the ccl results compared to the sbcl results: Clunit was the only framework faster under ccl than under sbcl. Still unacceptably slow, but … With sbcl in a terminal:

Passed: 338760/338760 all tests passed
-                SAMPLES  TOTAL       MINIMUM    MAXIMUM    MEDIAN     AVERAGE    DEVIATION
REAL-TIME        10       601.4108    57.19953   64.00593   59.65935   60.141087  2.303678
RUN-TIME         10       601.0652    57.161556  63.96824   59.62751   60.106518  2.301759
USER-RUN-TIME    10       600.65216   57.108273  63.941593  59.587543  60.06522   2.305383
SYSTEM-RUN-TIME  10       0.413016    0.019989   0.059948   0.043303   0.041302   0.011839
PAGE-FAULTS      10       0           0          0          0          0          0.0
GC-RUN-TIME      10       1158.426    87.246     145.034    115.57     115.8426   17.674866
BYTES-CONSED     10       5262303120  526034656  527650448  526069408  526230312  473977.47
EVAL-CALLS       10       320         32         32         32         32         0.0
NIL

The ccl result

-          SAMPLES  TOTAL     MINIMUM    MAXIMUM    MEDIAN    AVERAGE    DEVIATION
REAL-TIME  10       272.9831  27.003325  27.478271  27.37946  27.298307  0.179919
RUN-TIME   10       272.8588  26.99254   27.466413  27.36916  27.28588   0.179731

7.6 clunit2

Update 13 June 2021: Clunit2 has had a huge performance increase, most of it apparently involving moving from using lists to using arrays. Clunit2 should now be considered a member of the pack from a performance standpoint.

I ran the new improved clunit2 two ways and there is a performance difference to be considered here.
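
In sketch form, the two variants looked roughly like this, where x and y stand in for the uax-15 strings under test:

(assert-true (equal x y))   ; variant 1: CL's equal compares, clunit2 only checks the result
(assert-equal x y)          ; variant 2: clunit2's own assertion does the comparison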

First I let CL's equal do the comparison and clunit2 just checked whether the assertion was true (assert-true), which is how all the other frameworks were also tested.

-                SAMPLES  TOTAL        MINIMUM     MAXIMUM     MEDIAN      AVERAGE       DEVIATION
REAL-TIME        10       10.376591    1.023326    1.066659    1.029992    1.037659      0.015279
RUN-TIME         10       10.367396    1.022475    1.065339    1.028659    1.03674       0.015162
USER-RUN-TIME    10       10.057592    0.995536    1.019138    1.002031    1.005759      0.008744
SYSTEM-RUN-TIME  10       0.309821     0.016781    0.046708    0.029969    0.030982      0.008953
PAGE-FAULTS      10       0            0           0           0           0             0.0
GC-RUN-TIME      10       999.48       91.007      122.793     92.142      99.948        12.148087
BYTES-CONSED     10       13401972672  1340051456  1340265440  1340209264  1340197200.0  65795.266
EVAL-CALLS       10       0            0           0           0           0             0.0

The ccl version:

-          SAMPLES  TOTAL      MINIMUM   MAXIMUM   MEDIAN    AVERAGE   DEVIATION
REAL-TIME  10       26.267895  2.592513  2.656461  2.627074  2.626789  0.015478
RUN-TIME   10       26.34661   2.602268  2.665974  2.631869  2.634661  0.015835

I then speculated whether using assert-equal would increase performance on the grounds that you are only testing once (assert-equal replaces both the CL equal function and the clunit2 assert-true function). Interestingly, that actually slowed things down slightly.

-                SAMPLES  TOTAL        MINIMUM     MAXIMUM     MEDIAN      AVERAGE       DEVIATION
REAL-TIME        10       13.389882    1.309988    1.483321    1.313322    1.338988      0.051165
RUN-TIME         10       13.383314    1.31106     1.481517    1.315183    1.338331      0.050361
USER-RUN-TIME    10       13.023996    1.27605     1.415059    1.288964    1.3024        0.038748
SYSTEM-RUN-TIME  10       0.359333     0.013431    0.066453    0.023397    0.035933      0.017299
PAGE-FAULTS      10       0            0           0           0           0             0.0
GC-RUN-TIME      10       1186.3       93.469      256.883     97.418      118.63        47.99955
BYTES-CONSED     10       15377667616  1537414496  1540026832  1537518864  1537766800.0  755227.9
EVAL-CALLS       10       0            0           0           0           0             0.0

The ccl version:

-          SAMPLES  TOTAL      MINIMUM   MAXIMUM   MEDIAN    AVERAGE   DEVIATION
REAL-TIME  10       32.504135  3.207461  3.299608  3.244852  3.250414  0.0254
RUN-TIME   10       32.58276   3.221215  3.292748  3.254175  3.258276  0.021464

7.7 fiasco

With progress reporting turned off

(setf *print-test-run-progress* nil)
(in-package :uax-15-fiasco-suite)
(benchmark:with-timing (10) (run-package-tests))
-                SAMPLES  TOTAL       MINIMUM    MAXIMUM    MEDIAN     AVERAGE      DEVIATION
REAL-TIME        10       9.606572    0.913324   1.026656   0.939991   0.960657     0.041225
RUN-TIME         10       9.601316    0.911867   1.025575   0.939456   0.960132     0.040784
USER-RUN-TIME    10       9.234909    0.88525    0.992281   0.902122   0.923491     0.036895
SYSTEM-RUN-TIME  10       0.366417    0.023392   0.053293   0.033323   0.036642     0.009047
PAGE-FAULTS      10       0           0          0          0          0            0.0
GC-RUN-TIME      10       1397.089    97.043     213.176    120.03     139.7089     38.06595
BYTES-CONSED     10       4249190656  424657984  426455728  424685488  424919070.0  545014.2
EVAL-CALLS       10       10          1          1          1          1            0.0
NIL

The ccl version:

-          SAMPLES  TOTAL      MINIMUM   MAXIMUM   MEDIAN    AVERAGE   DEVIATION
REAL-TIME  10       28.8267    2.806652  2.952852  2.882208  2.88267   0.043738
RUN-TIME   10       28.806755  2.809196  2.948668  2.878305  2.880676  0.041971

7.8 fiveam

With progress reporting turned off:

(benchmark:with-timing (10)
 (let ((fiveam:*test-dribble* (make-broadcast-stream)))
      (run 'uax-15-fiveam)))
-                SAMPLES  TOTAL       MINIMUM    MAXIMUM    MEDIAN     AVERAGE    DEVIATION
REAL-TIME        10       11.189935   1.069995   1.169993   1.109993   1.118994   0.030038
RUN-TIME         10       11.179056   1.069764   1.170402   1.107926   1.117906   0.029854
USER-RUN-TIME    10       10.824186   1.029703   1.114896   1.081201   1.082419   0.025603
SYSTEM-RUN-TIME  10       0.354886    0.023378   0.060391   0.027252   0.035489   0.011385
PAGE-FAULTS      10       0           0          0          0          0          0.0
GC-RUN-TIME      10       1477.345    99.423     203.254    134.299    147.7345   30.82203
BYTES-CONSED     10       4680219840  467997648  468051088  468022496  468021984  14806.843
EVAL-CALLS       10       0           0          0          0          0          0.0

If you do not have progress reporting turned off, besides wasting a huge amount of screen space and time, it creates interesting issues depending on where you are running fiveam. Emacs has known problems with long lines, and fiveam's progress reporting in a benchmark like this creates lots of long lines. It gets even worse if you set the run keyword parameter :print-names to nil.

Rule of thumb for big test systems and fiveam: run it from a terminal, not an emacs REPL.

First, this is sbcl running in a terminal with progress reporting on:

(benchmark:with-timing (10) (run 'uax-15-fiveam))
-                SAMPLES  TOTAL       MINIMUM    MAXIMUM    MEDIAN     AVERAGE      DEVIATION
REAL-TIME        10       11.683272   1.106661   1.26666    1.13666    1.168327     0.050315
RUN-TIME         10       11.433937   1.096922   1.217642   1.132014   1.143394     0.035098
USER-RUN-TIME    10       11.09766    1.073696   1.167503   1.098376   1.109766     0.025922
SYSTEM-RUN-TIME  10       0.336295    0.00979    0.050139   0.036583   0.03363      0.011926
PAGE-FAULTS      10       0           0          0          0          0            0.0
GC-RUN-TIME      10       1364.435    103.005    181.628    128.838    136.4435     26.518356
BYTES-CONSED     10       4680267008  468008608  468062224  468021680  468026700.0  14802.947
EVAL-CALLS       10       0           0          0          0          0            0.0

NIL

The ccl version

-          SAMPLES  TOTAL      MINIMUM   MAXIMUM   MEDIAN    AVERAGE   DEVIATION
REAL-TIME  10       19.297316  1.917268  1.957064  1.926189  1.929732  0.01028
RUN-TIME   10       19.291763  1.917065  1.958334  1.92539   1.929176  0.010841

Now look at the run using sbcl in emacs base case:

(benchmark:with-timing (10) (run 'uax-15-fiveam))

-                SAMPLES  TOTAL       MINIMUM    MAXIMUM    MEDIAN     AVERAGE    DEVIATION
REAL-TIME        10       46.182957   1.386656   12.649892  1.473321   4.618296   4.85889
RUN-TIME         10       14.597167   1.404305   1.512524   1.453036   1.459717   0.032735
USER-RUN-TIME    10       14.132159   1.355067   1.462705   1.408331   1.413216   0.0275
SYSTEM-RUN-TIME  10       0.465011    0.009331   0.083569   0.046587   0.046501   0.020562
PAGE-FAULTS      10       0           0          0          0          0          0.0
GC-RUN-TIME      10       1565.843    98.467     248.256    145.087    156.5843   48.0745
BYTES-CONSED     10       6191647200  619003152  619415568  619130784  619164720  108128.414
EVAL-CALLS       10       0           0          0          0          0          0.0

While setting the run keyword parameter :print-names to nil is supposed to help performance, it seems to actually make the emacs long-line problem worse. See, e.g., this result:

(benchmark:with-timing (10) (run 'uax-15-fiveam :print-names nil))

-                SAMPLES  TOTAL       MINIMUM    MAXIMUM    MEDIAN     AVERAGE      DEVIATION
REAL-TIME        10       2252.024    1.399989   1121.9193  1.47332    225.2024     377.6827
RUN-TIME         10       14.941372   1.409806   1.746217   1.479273   1.494137     0.090286
USER-RUN-TIME    10       14.448811   1.389841   1.64304    1.422158   1.444881     0.069076
SYSTEM-RUN-TIME  10       0.492585    0.009959   0.103179   0.046558   0.049258     0.024733
PAGE-FAULTS      10       0           0          0          0          0            0.0
GC-RUN-TIME      10       1904.724    115.29     417.308    179.123    190.4724     80.48869
BYTES-CONSED     10       6191383968  619016352  619290080  619130000  619138370.0  80104.17

7.9 gigamonkeys

Gigamonkeys does not do progress reporting

  (benchmark:with-timing (10) (test-package))

-                SAMPLES  TOTAL       MINIMUM    MAXIMUM    MEDIAN     AVERAGE      DEVIATION
REAL-TIME        10       8.509921    0.846659   0.856659   0.849992   0.850992     0.003
RUN-TIME         10       8.522091    0.848863   0.860674   0.850513   0.852209     0.003309
USER-RUN-TIME    10       8.502139    0.846733   0.854092   0.849976   0.850214     0.002381
SYSTEM-RUN-TIME  10       0.019971    0          0.006631   0.000002   0.001997     0.002203
PAGE-FAULTS      10       0           0          0          0          0            0.0
GC-RUN-TIME      10       330.405     31.881     35.916     32.204     33.0405      1.41741
BYTES-CONSED     10       4950815168  495054096  495089584  495087440  495081500.0  13706.401
EVAL-CALLS       10       0           0          0          0          0            0.0

The ccl version

-          SAMPLES  TOTAL      MINIMUM   MAXIMUM   MEDIAN    AVERAGE   DEVIATION
REAL-TIME  10       30.610546  3.024176  3.284612  3.033481  3.061055  0.075138
RUN-TIME   10       30.627195  3.026633  3.285767  3.035477  3.06272   0.074814

7.10 kaputt

Kaputt has no built-in capability for running all the tests in a suite or package, so this benchmark is based on creating a function that just runs all the tests in uax-15-kaputt-tests.

There is no way to turn off the progress report.

(benchmark:with-timing (10) (uax-15-kaputt-tests:run-all-tests))
-                SAMPLES  TOTAL       MINIMUM    MAXIMUM    MEDIAN     AVERAGE    DEVIATION
REAL-TIME        10       7.906594    0.776659   0.856658   0.783326   0.790659   0.022548
RUN-TIME         10       7.904968    0.777959   0.857393   0.782869   0.790497   0.022549
USER-RUN-TIME    10       7.795143    0.761309   0.82076    0.774808   0.779514   0.015074
SYSTEM-RUN-TIME  10       0.10984     0          0.036632   0.006663   0.010984   0.010646
PAGE-FAULTS      10       0           0          0          0          0          0.0
GC-RUN-TIME      10       411.1       31.975     98.302     36.429     41.11      19.192804
BYTES-CONSED     10       4034319280  403009904  407150416  403012320  403431928  1239563.6
EVAL-CALLS       10       0           0          0          0          0          0.0

The ccl version:

-          SAMPLES  TOTAL      MINIMUM   MAXIMUM   MEDIAN    AVERAGE   DEVIATION
REAL-TIME  10       15.73      1.551425  1.603039  1.561077  1.573     0.019737
RUN-TIME   10       15.730825  1.551544  1.602959  1.561094  1.573082  0.019757

7.11 lift

Lift says that there were 16 successful tests, but does not specify the number of successful assertions, so no progress reports.

(benchmark:with-timing (10) (run-tests :suite 'uax-lift-15))
-                SAMPLES  TOTAL       MINIMUM    MAXIMUM    MEDIAN     AVERAGE      DEVIATION
REAL-TIME        10       6.60661     0.653326   0.693328   0.656661   0.660661     0.011136
RUN-TIME         10       6.604042    0.652423   0.693389   0.657587   0.660404     0.011164
USER-RUN-TIME    10       6.590675    0.652425   0.683388   0.657588   0.659068     0.008398
SYSTEM-RUN-TIME  10       0.01338     0          0.01       0          0.001338     0.003049
PAGE-FAULTS      10       0           0          0          0          0            0.0
GC-RUN-TIME      10       278.852     23.49      40.71      27.184     27.8852      4.50024
BYTES-CONSED     10       3662570128  365002000  377114400  365003264  366257020.0  3620682.0
EVAL-CALLS       10       0           0          0          0          0            0.0

The ccl version of the benchmark resulted in this:

  (benchmark:with-timing (10) (run-tests :suite 'uax-lift-15))

-          SAMPLES  TOTAL      MINIMUM   MAXIMUM   MEDIAN    AVERAGE   DEVIATION
REAL-TIME  10       11.619499  1.15895   1.172787  1.159915  1.16195   0.003887
RUN-TIME   10       11.604948  1.156783  1.170432  1.158835  1.160495  0.003818

7.12 lisp-unit

No progress reports

(benchmark:with-timing (10) (run-tests))
-                SAMPLES  TOTAL       MINIMUM    MAXIMUM    MEDIAN     AVERAGE    DEVIATION
REAL-TIME        10       7.866596    0.769993   0.816659   0.776658   0.78666    0.016124
RUN-TIME         10       7.859482    0.771481   0.813944   0.77546    0.785948   0.015153
USER-RUN-TIME    10       7.762971    0.765489   0.797356   0.772805   0.776297   0.008726
SYSTEM-RUN-TIME  10       0.096526    0          0.026643   0.009915   0.009653   0.009454
PAGE-FAULTS      10       0           0          0          0          0          0.0
GC-RUN-TIME      10       444.581     35.22      63.385     37.716     44.4581    10.282556
BYTES-CONSED     10       4411139200  440744928  444310816  440759072  441113920  1065665.8
EVAL-CALLS       10       160         16         16         16         16         0.0

Now the ccl version:

-          SAMPLES  TOTAL      MINIMUM   MAXIMUM   MEDIAN    AVERAGE   DEVIATION
REAL-TIME  10       13.336789  1.326142  1.349726  1.330619  1.333679  0.007343
RUN-TIME   10       13.344638  1.323655  1.347442  1.330136  1.334464  0.008201
NIL

7.13 lisp-unit2

No progress reports

(benchmark:with-timing (10) (run-tests))
-                SAMPLES  TOTAL       MINIMUM    MAXIMUM    MEDIAN     AVERAGE      DEVIATION
REAL-TIME        10       10.796587   1.036659   1.369989   1.043326   1.079659     0.097198
RUN-TIME         10       10.788351   1.033685   1.36925    1.04422    1.078835     0.097286
USER-RUN-TIME    10       10.555075   1.017118   1.282673   1.026415   1.055508     0.076476
SYSTEM-RUN-TIME  10       0.23329     0.010035   0.086577   0.013448   0.023329     0.021721
PAGE-FAULTS      10       0           0          0          0          0            0.0
GC-RUN-TIME      10       1076.758    66.774     399.308    75.839     107.6758     97.46117
BYTES-CONSED     10       5127796976  512756560  512800544  512778608  512779700.0  13773.206
EVAL-CALLS       10       0           0          0          0          0            0.0

No ccl version because ccl would not compile lisp-unit2.

7.14 nst

NST's results were surprisingly bad. I ran tests with and without :cache being set on each fixture and it did not seem to make much of a difference.

(benchmark:with-timing (10) (nst-cmd :run :uax-15-nst))
-                SAMPLES  TOTAL         MINIMUM      MAXIMUM      MEDIAN       AVERAGE        DEVIATION
REAL-TIME        10       522.56226     52.202904    52.3729      52.24624     52.25623       0.048742
RUN-TIME         10       522.1588      52.14377     52.32057     52.210835    52.215885      0.047927
USER-RUN-TIME    10       520.0237      51.92075     52.0908      51.99614     52.00237       0.046089
SYSTEM-RUN-TIME  10       2.135158      0.186492     0.236569     0.209909     0.213516       0.01544
PAGE-FAULTS      10       0             0            0            0            0              0.0
GC-RUN-TIME      10       16532.35      1629.216     1679.521     1656.667     1653.2349      17.045515
BYTES-CONSED     10       319321472704  31931030448  31942032624  31931048176  31932148000.0  3295139.8
EVAL-CALLS       10       6768780       676878       676878       676878       676878         0.0

The ccl version:

-          SAMPLES  TOTAL      MINIMUM   MAXIMUM    MEDIAN     AVERAGE    DEVIATION
REAL-TIME  10       490.6759   48.98529  49.147766  49.05929   49.067593  0.040534
RUN-TIME   10       490.84375  49.00008  49.194897  49.082294  49.084373  0.048305

7.15 parachute

Progress reporting turned off by using the quiet report.

(benchmark:with-timing (10) (test 'suite :report 'quiet))

-                SAMPLES  TOTAL       MINIMUM    MAXIMUM    MEDIAN     AVERAGE      DEVIATION
REAL-TIME        10       11.783255   1.133325   1.303324   1.149992   1.178326     0.056906
RUN-TIME         10       11.76993    1.133041   1.300795   1.147235   1.176993     0.05646
USER-RUN-TIME    10       11.237235   1.079734   1.240937   1.098715   1.123724     0.047878
SYSTEM-RUN-TIME  10       0.532717    0.029905   0.086578   0.049991   0.053272     0.016382
PAGE-FAULTS      10       0           0          0          0          0            0.0
GC-RUN-TIME      10       2197.263    187.204    323.411    198.264    219.7263     38.07161
BYTES-CONSED     10       5357258016  528976624  596222960  529004640  535725800.0  20165728.0
EVAL-CALLS       10       0           0          0          0          0            0.0

The same benchmark with ccl:

-          SAMPLES  TOTAL      MINIMUM   MAXIMUM   MEDIAN    AVERAGE   DEVIATION
REAL-TIME  10       35.034126  3.446474  3.589289  3.496444  3.503413  0.038255
RUN-TIME   10       35.019848  3.44557   3.593415  3.491784  3.501985  0.040734
NIL

7.16 prove

The prove tests were done with the *default-reporter* set to :dot because there is no way to turn off the progress reporting. The times were surprisingly slow (not clunit slow, but roughly five times longer than the other frameworks), with no real difference between running in a terminal window or in an emacs REPL.
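
A minimal setup of that sort (this assumes the *default-reporter* special variable mentioned above is exported from the prove package) looks roughly like:

(setf prove:*default-reporter* :dot) ; the least noisy reporter; progress cannot be disabled entirely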

(benchmark:with-timing (10) (run-all-uax-15))
-                SAMPLES  TOTAL        MINIMUM     MAXIMUM     MEDIAN      AVERAGE     DEVIATION
REAL-TIME        10       53.35967     4.923303    5.596632    5.359966    5.335967    0.191425
RUN-TIME         10       31.261152    3.050384    3.377765    3.100676    3.126115    0.091643
USER-RUN-TIME    10       30.215063    2.950801    3.197486    2.983159    3.021506    0.070946
SYSTEM-RUN-TIME  10       1.046104     0.075742    0.180281    0.093283    0.10461     0.029117
PAGE-FAULTS      10       0            0           0           0           0           0.0
GC-RUN-TIME      10       2573.17      169.012     522.004     217.099     257.317     96.44223
BYTES-CONSED     10       14124826480  1405710432  1439239664  1405725840  1412482648  10476879.0
EVAL-CALLS       10       0            0           0           0           0           0.0

The ccl version:

-          SAMPLES  TOTAL      MINIMUM    MAXIMUM    MEDIAN     AVERAGE    DEVIATION
REAL-TIME  10       116.92284  11.292191  12.127493  11.640758  11.692284  0.26381
RUN-TIME   10       116.86357  11.283448  12.121286  11.633533  11.686357  0.264158

7.17 ptester

The benchmarking was done with a single function (ptester-tests) that called all the tests. Progress reporting cannot be turned off.
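
In ptester, assertions run inside a (with-tests …) form, so a wrapper of this kind (the forms inside are invented for illustration) is simply a function:

(defun ptester-tests ()
  (with-tests (:name "uax-15")
    ;; (test expected-value form) is ptester's basic assertion
    (test 5 (+ 2 3))
    (test-error (error "boom"))))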

-                SAMPLES  TOTAL       MINIMUM    MAXIMUM    MEDIAN     AVERAGE      DEVIATION
REAL-TIME        10       9.003254    0.879991   0.919993   0.899992   0.900325     0.015738
RUN-TIME         10       9.0088      0.879493   0.92187    0.898636   0.90088      0.015573
USER-RUN-TIME    10       8.855853    0.876179   0.899495   0.881767   0.885585     0.007137
SYSTEM-RUN-TIME  10       0.152978    0.000027   0.043389   0.003385   0.015298     0.014619
PAGE-FAULTS      10       0           0          0          0          0            0.0
GC-RUN-TIME      10       339.523     19.892     53.051     29.351     33.9523      10.742064
BYTES-CONSED     10       5195902592  519572448  519605984  519591104  519590270.0  12074.577
EVAL-CALLS       10       0           0          0          0          0            0.0

The ccl version:

-          SAMPLES  TOTAL      MINIMUM   MAXIMUM   MEDIAN    AVERAGE   DEVIATION
REAL-TIME  10       22.3047    2.07826   2.369612  2.307167  2.23047   0.118142
RUN-TIME   10       22.307024  2.072661  2.370712  2.302883  2.230702  0.118646

7.18 rove

I ran into an as yet unidentified issue with rove and sbcl. Several attempts to run the benchmark on sbcl triggered heap exhaustion during garbage collection (even on a clean sbcl instance). Below are the results for the one clean run of 10 iterations that I did get.

(benchmark:with-timing (10) (run :uax-15-rove-tests :style :none))

-                SAMPLES  TOTAL       MINIMUM    MAXIMUM    MEDIAN     AVERAGE      DEVIATION
REAL-TIME        10       18.219845   1.536653   2.41998    1.69332    1.821984     0.266849
RUN-TIME         10       18.222372   1.536997   2.420086   1.693447   1.822237     0.266392
USER-RUN-TIME    10       16.82601    1.466989   2.123462   1.60346    1.682601     0.193403
SYSTEM-RUN-TIME  10       1.396386    0.049999   0.296627   0.10344    0.139639     0.074402
PAGE-FAULTS      10       0           0          0          0          0            0.0
GC-RUN-TIME      10       6424.622    381.013    1233.202   516.577    642.4622     253.83493
BYTES-CONSED     10       8960510816  895972464  896586032  895994096  896051100.0  178432.86
EVAL-CALLS       10       130         13         13         13         13           0.0

I did not have the same problem running the same benchmark test with ccl, but obviously apples and oranges. Here is the ccl benchmark.

    (benchmark:with-timing (10) (run :uax-15-rove-tests :style :none))
-          SAMPLES  TOTAL      MINIMUM  MAXIMUM   MEDIAN    AVERAGE   DEVIATION
REAL-TIME  10       41.76701   3.8768   4.341102  4.16477   4.176701  0.132818
RUN-TIME   10       41.799763  3.87051  4.346589  4.170243  4.179976  0.135506

7.19 rt

Rt reports only the tests, not the assertions.

(benchmark:with-timing (10) (do-tests))
  -                SAMPLES  TOTAL       MINIMUM    MAXIMUM    MEDIAN     AVERAGE      DEVIATION
  REAL-TIME        10       7.053279    0.696661   0.739993   0.699995   0.705328     0.01284
  RUN-TIME         10       7.045017    0.695458   0.737783   0.698051   0.704502     0.012506
  USER-RUN-TIME    10       6.97191     0.684028   0.73121    0.695708   0.697191     0.012348
  SYSTEM-RUN-TIME  10       0.073117    0          0.026694   0.003303   0.007312     0.0084
  PAGE-FAULTS      10       0           0          0          0          0            0.0
  GC-RUN-TIME      10       287.186     23.124     55.806     24.106     28.7186      9.336387
  BYTES-CONSED     10       3781963056  378147840  378340736  378182688  378196300.0  50474.535
  EVAL-CALLS       10       480         48         48         48         48           0.0

Now ccl version:

-          SAMPLES  TOTAL      MINIMUM   MAXIMUM   MEDIAN    AVERAGE   DEVIATION
REAL-TIME  10       11.663611  1.155944  1.180289  1.166971  1.166361  0.006513
RUN-TIME   10       11.68608   1.157109  1.18317   1.169409  1.168608  0.006879

7.20 should-test

Should-test prints out the name of each test with OK.

  (benchmark:with-timing (10) (test))
-                SAMPLES  TOTAL       MINIMUM    MAXIMUM    MEDIAN     AVERAGE      DEVIATION
REAL-TIME        10       8.156594    0.803326   0.896659   0.806659   0.815659     0.027082
RUN-TIME         10       8.16205     0.802968   0.897908   0.808052   0.816205     0.02732
USER-RUN-TIME    10       8.122097    0.798071   0.877938   0.80472    0.81221      0.022233
SYSTEM-RUN-TIME  10       0.039968    0          0.019971   0.000005   0.003997     0.006101
PAGE-FAULTS      10       0           0          0          0          0            0.0
GC-RUN-TIME      10       440.966     33.525     112.947    38.192     44.0966      23.066162
BYTES-CONSED     10       4033912496  403038928  406439808  403043248  403391230.0  1016250.75
EVAL-CALLS       10       0           0          0          0          0            0.0

The ccl version

-          SAMPLES  TOTAL      MINIMUM   MAXIMUM   MEDIAN    AVERAGE   DEVIATION
REAL-TIME  10       24.659025  2.453905  2.483674  2.463236  2.465902  0.008501
RUN-TIME   10       24.66664   2.456116  2.481225  2.466268  2.466664  0.007466

7.21 tap-unit-test

No progress reporting.

  (benchmark:with-timing (10) (run-tests))
-                SAMPLES  TOTAL       MINIMUM    MAXIMUM    MEDIAN     AVERAGE    DEVIATION
REAL-TIME        10       7.973262    0.776659   0.929992   0.779994   0.797326   0.044442
RUN-TIME         10       7.974663    0.77693    0.932383   0.781873   0.797466   0.045228
USER-RUN-TIME    10       7.891431    0.761086   0.909074   0.775958   0.789143   0.040629
SYSTEM-RUN-TIME  10       0.083247    0          0.033295   0.003331   0.008325   0.010865
PAGE-FAULTS      10       0           0          0          0          0          0.0
GC-RUN-TIME      10       484.082     33.188     154.321    35.693     48.4082    35.405758
BYTES-CONSED     10       4406207680  440607680  440635344  440620080  440620768  8083.716
EVAL-CALLS       10       160         16         16         16         16         0.0

The ccl version:

-          SAMPLES  TOTAL      MINIMUM   MAXIMUM   MEDIAN    AVERAGE   DEVIATION
REAL-TIME  10       13.113791  1.305049  1.321753  1.310286  1.311379  0.005856
RUN-TIME   10       13.095441  1.304772  1.315464  1.30753   1.309544  0.003755

7.22 unit-test

Progress reporting on tests, not assertions

(benchmark:with-timing (10) (run-all-tests))
-                SAMPLES  TOTAL       MINIMUM    MAXIMUM    MEDIAN     AVERAGE      DEVIATION
REAL-TIME        10       7.549933    0.71666    0.856659   0.729994   0.754993     0.047004
RUN-TIME         10       7.556039    0.717151   0.857157   0.73066    0.755604     0.047087
USER-RUN-TIME    10       7.37959     0.706391   0.821589   0.717151   0.737959     0.039426
SYSTEM-RUN-TIME  10       0.176461    0.000002   0.049935   0.013322   0.017646     0.013324
PAGE-FAULTS      10       0           0          0          0          0            0.0
GC-RUN-TIME      10       850.624     52.211     190.571    63.007     85.0624      41.03387
BYTES-CONSED     10       3842573536  381252624  411219072  381259776  384257340.0  8987242.0
EVAL-CALLS       10       0           0          0          0          0            0.0

The ccl version:

-          SAMPLES  TOTAL      MINIMUM   MAXIMUM   MEDIAN    AVERAGE   DEVIATION
REAL-TIME  10       17.873028  1.676715  2.000257  1.68315   1.787303  0.129952
RUN-TIME   10       17.880577  1.6739    2.00028   1.686198  1.788058  0.129957

7.23 xlunit

Xlunit does progress reports only on the tests, not the assertions.

(benchmark:with-timing (10) (xlunit:textui-test-run (xlunit:get-suite uax-15)))
-                SAMPLES  TOTAL       MINIMUM    MAXIMUM    MEDIAN     AVERAGE      DEVIATION
REAL-TIME        10       6.579943    0.649994   0.693328   0.653327   0.657994     0.012311
RUN-TIME         10       6.584024    0.651812   0.693726   0.653186   0.658402     0.012139
USER-RUN-TIME    10       6.547395    0.642397   0.670408   0.653116   0.65474      0.007082
SYSTEM-RUN-TIME  10       0.03665     0          0.023319   0.000003   0.003665     0.007213
PAGE-FAULTS      10       0           0          0          0          0            0.0
GC-RUN-TIME      10       246.993     19.932     50.002     23.138     24.6993      8.570283
BYTES-CONSED     10       3547926336  354158256  359179296  354160208  354792640.0  1513466.4
EVAL-CALLS       10       0           0          0          0          0            0.0

The ccl version:

-          SAMPLES  TOTAL      MINIMUM   MAXIMUM   MEDIAN    AVERAGE   DEVIATION
REAL-TIME  10       11.600723  1.148456  1.19076   1.156022  1.160072  0.011422
RUN-TIME   10       11.617545  1.148918  1.190917  1.159512  1.161754  0.011414

7.24 xptest

(benchmark:with-timing (10) (report-result (run-test *uax-15-suite*)))

 -                SAMPLES  TOTAL       MINIMUM    MAXIMUM    MEDIAN     AVERAGE      DEVIATION
REAL-TIME        10       6.513276    0.643327   0.689994   0.646661   0.651328     0.013098
RUN-TIME         10       6.515137    0.643435   0.689399   0.647958   0.651514     0.012776
USER-RUN-TIME    10       6.49848     0.640105   0.679408   0.647958   0.649848     0.010321
SYSTEM-RUN-TIME  10       0.01668     0          0.009991   0.000003   0.001668     0.00307
PAGE-FAULTS      10       0           0          0          0          0            0.0
GC-RUN-TIME      10       253.887     20.3       51.813     23.71      25.3887      8.99575
BYTES-CONSED     10       3545190064  354157152  357744240  354158784  354519000.0  1075096.0
EVAL-CALLS       10       0           0          0          0          0            0.0

The ccl version:

-          SAMPLES  TOTAL      MINIMUM   MAXIMUM   MEDIAN    AVERAGE   DEVIATION
REAL-TIME  10       11.375671  1.133088  1.148804  1.136425  1.137567  0.004255
RUN-TIME   10       11.397292  1.133247  1.151431  1.139366  1.139729  0.004538

top

8 Mapping Functions Against Each Other

8.1 Assertion Functions

I expect all libraries to have the equivalent of is, signals and maybe finishes. This table just validates that assumption AND whether assertions accept an optional diagnostic string.

Table 22: Assertion Functions-1a
Library Optional string Is (a) signals finishes (b)
1am N is signals  
2am Y (P) is signals finishes
assert-p (1) N t-p condition-error-p  
clunit Y assert-true assert-condition  
clunit2 Y assert-true assert-condition  
gigamonkeys N check expect  
fiasco Y (P) is signals finishes
fiveam Y (P) is signals finishes
kaputt N assert-true    
lift Y ensure ensure-condition  
lisp-unit Y assert-true assert-error  
lisp-unit2 Y assert-true assert-error  
nst N :true :err  
parachute Y (P) is fail finish
prove Y is, ok is-error  
ptester (2)   test test-error  
rove Y ok signals  
rt (2)        
should-test Y be signals  
simplet (2)        
tap-unit-test Y assert-true assert-error  
unit-test Y test-assert test-condition  
xlunit Y assert-true assert-condition  
xptest (2)        

(a) "is" asserts that the form evaluates to not nil (b) "finishes" asserts that the body of the test does not signal aany condition (P) The diagnostic string accepts variables (1) includes cacau for this purpose (2) None - normal CL predicates resolving to T or nil

One potential advantage of other assertion functions is whether they provide built-in additional error messages. The second advantage is that you do not have to write your own if they are more complicated than normal CL predicates. The next few tables will show additional assertion functions and what frameworks have them.

Table 23: Additional Assertion Functions-1b
Library False Zero Not Zero Nil Not-Nil Null* Not Null*
assert-p not-t-p zero-p not-zero-p nil-p not-nil-p null-p not-null-p
cacau not-t-p zero-p not-zero-p nil-p not-nil-p null-p not-null-p
clunit assert-false            
clunit2 assert-false            
fiveam is-false            
kaputt       assert-nil      
lift ensure-null            
lisp-unit assert-false     assert-nil      
lisp-unit2 assert-false            
nst   assert-zero     assert-non-nil assert-null  
parachute false     false      
prove isnt            
rove ng            
tap-unit-test assert-false         assert-null assert-not-null
xlunit assert-false            

Note to self: Per http://clhs.lisp.se/Body/f_null.htm, null is an empty list or nil, not null in an sql sense.
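
For example, in standard CL:

(null '())  ; => T   - the empty list is nil
(null nil)  ; => T
(null 0)    ; => NIL - 0 is not "null" the way an SQL NULL is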

Table 24: Assertion Functions-2a Equality
Library Eq Eql Equal Equalp Equality
           
assert-p eq-p eql-p equal-p equalp-p  
cacau eq-p eql-p equal-p equalp-p  
clunit assert-eq assert-eql assert-equal assert-equalp assert-equality
clunit2 assert-eq assert-eql assert-equal assert-equalp assert-equality
kaputt assert-eq assert-eql assert-equal    
lisp-unit assert-eq assert-eql assert-equal assert-equalp assert-equality
lisp-unit2 assert-eq assert-eql assert-equal assert-equalp assert-equality
nst assert-eq assert-eql assert-equal assert-equalp assert-equality
parachute     is    
tap-unit-test assert-eq assert-eql assert-equal assert-equalp assert-equality
unit-test     test-equal    
xlunit   assert-eql assert-equal    
Table 25: Assertion Functions-2b Not-Equality
Library Eq Eql Equal Equalp
         
assert-p not-eq-p not-eql-p not-equal-p not-equalp-p
cacau not-eq-p not-eql-p not-equal-p not-equalp-p
nst assert-not-eq assert-not-eql assert-not-equal assert-not-equalp
parachute     isnt  
xlunit   assert-not-eql    

Table 26: Assertion Functions-2c Bounded Equality
Library Available assertions
clunit assert-equality*
clunit2 assert-equality*
kaputt assert-float-is-approximately-equal, assert-float-is-definitely-greater-than, assert-float-is-definitely-less-than, assert-float-is-essentially-equal
lisp-unit assert-norm-equal, assert-float-equal, assert-number-equal, assert-numerical-equal, assert-rational-equal, assert-sigfig-equal
lisp-unit2 assert-norm-equal, assert-float-equal, assert-number-equal, assert-numerical-equal, assert-rational-equal, assert-sigfig-equal
Table 27: Assertion Functions-2d other Equality
Library Available assertions
   
kaputt assert-set-equal, assert-vector-equal
lisp-unit logically-equal, set-equal
lisp-unit2 logically-equal
tap-unit-test logically-equal, set-equal, unordered-equal
   
Table 29: Assertion Functions-3a Types
Library Type Not Type Values Not-Values
assert-p typep-p not-typep-p values-p not-values-p
cacau typep-p not-typep-p values-p not-values-p
kaputt assert-type      
lift        
lisp-unit        
lisp-unit2 assert-typep      
nst        
parachute of-type   is-values isnt-values
protest        
prove is-type   is-values  
Table 30: Assertion Functions-3b Specific Value Types Cont
Library Symbol List Tuple Char String
lift ensure-symbol ensure-list     ensure-string
Table 31: Assertion Functions-3c Strings
Library Functions
kaputt assert-string-equal,assert-string<, assert-string<=, assert-string=, assert-string>, assert-string>=
Table 32: Assertion Functions-4 Membership
Library Every Different Member Contains
cl-quickcheck     a-member  
fiveam is-every      
kaputt       assert-subsetp
lift ensure-every ensure-different ensure-member  
lisp-unit set-equal      
lisp-unit2 set-equal      
tap-unit-test set-equal      

Table 33: Assertion Functions-4 (Prints, Macro Expansion and Custom)
Library Prints Expands (1) Custom
cacau     custom-p
clunit   assert-expands  
clunit2   assert-expands  
kaputt     Yes
lisp-unit assert-prints assert-expands  
lisp-unit2 assert-prints assert-expands  
nst     Yes
prove   is-expand  
rove is-print expands  
should-test print-to    
tap-unit-test assert-prints assert-expands  
  1. Tests macro expansion, passes if (EQUALP EXPANSION (MACROEXPAND-1 EXPRESSION)) is true
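
A quick sketch of what such an assertion looks like in the lisp-unit style (the square macro is invented for illustration, and I am assuming the (assert-expands expansion form) argument order):

(defmacro square (x) `(* ,x ,x))

(define-test square-expansion
  ;; passes if (equalp '(* 4 4) (macroexpand-1 '(square 4)))
  (assert-expands (* 4 4) (square 4)))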
Table 34: Assertion Functions-5 Specific Errors, Signals and Conditions
Library Error/Conditions (1) Not (2)
1am signals  
2am signals  
assert-p condition-error-p not-error-p, not-condition-p
cacau condition-error-p  
clunit assert-condition  
clunit2 assert-condition  
gigamonkeys expect  
fiasco signals not-signals
fiveam signals  
lift ensure-condition, ensure-error  
lisp-unit assert-error  
lisp-unit2 assert-error  
nst :err  
parachute fail  
prove    
ptester test-error  
rove signals  
rt    
should-test signal  
simplet    
tap-unit-test assert-error  
unit-test test-condition  
xlunit assert-condition  
xptest    
  1. Signals asserts that the body signals a condition of a specified type
  2. Signals that the body does not signal a condition of a specified type. Might signal some other condition

top

Table 35: Misc. Assertions
Name Assertions
cl-quickcheck is=, isnt=
kaputt assert=, assert-p:, assert-t
lift ensure-cases, ensure-cases-failure, ensure-expected-no-warning-condition, ensure-failed-error, ensure-no-warning, ensure-not-same, ensure-null-failed-error, ensure-random-cases, ensure-random-cases+, ensure-random-cases-failure, ensure-same, ensure-some, ensure-warning, ensure-expected-condition, ensure-directories-exist, ensure-directory, ensure-error, ensure-failed, ensure-function, ensure-generic-function
lisp-unit assert-result, assert-test, check-type
lisp-unit2 assert-no-error, assert-no-signal, assert-no-warning, assert-warning, assert-fail, assert-no-warning, assert-passes?, assert-signal, check-type
nst assert-criterion
prove ok, is-values, is-type, like, is-print, is-error,
unit-test test-assert

(a) Every test succeeds iff the form produces the same number of results as the values and each result is equal to the corresponding value

top

8.2 Defining or Adding Tests

Table 36: Defining or Adding Test Functions
Name Add Tests
1am (test test-name body)
2am (test test-name body)
cacau (deftest "test-name" (any-parameters go here) body)
cardiogram (deftest name (<options>*) <docstring>* <form>*)
clunit (deftest test-name (suite-name-if-any) docstring body)
clunit2 (deftest test-name (suite-name-if-any) docstring body)
gigamonkeys (deftest test-name (any-parameters) body)
fiasco (deftest test-name (any-parameters) docstring body)
fiveam (test test-name docstring body)
kaputt (define-testcase test-name (any-parameters) docstring body)
lift (addtest (test-suite-name) test-name body)
lisp-unit (define-test test-name body)
lisp-unit2 (define-test test-name (tags) body)
nst (def-test (t1 :group name :fixtures (fixture-names)) body)
parachute (define-test test-name [:parent parent-name] [(:fixture if any)] body)
prove (deftest name body)
ptester (test value form) (test-error) (test-no-error) (test-warning) (test-no-warning)
rove (deftest test-name body)
rt (deftest test-name function value)
should-test (deftest name body)
simplet (test string body)
tap-unit-test (define-test test-name docstring body)
unit-test (deftest :unit unit-name :name test-name body)
xlunit (def-test-method method-name ((class-name) run-on-compilation) body)
xptest (defmethod method-name ((suite-name fixture-name)) body)

top

8.3 Running Tests

Table 37: Running Test Functions
Name Running Tests
1am (a) (test-name) (run) ; (run) runs all tests
2am (run) (run '(list of tests)) (name-of-tests)
clunit (run-test 'test-name) (run-suite 'suite-name)
clunit2 (run-test 'test-name) (run-suite 'suite-name)
gigamonkeys (test test-name)
fiasco (run-tests 'test-name) (run-package-tests :package package-name)
fiveam (run 'test-name) (run! 'test-name) (run! 'suite-name)
kaputt (test-name)
lift (run-tests :name 'test-name) (run-tests :suite 'suite-name)
lisp-unit (b) (run-tests :all) (run-tests '(name1 name2 ..)) (continue-testing) (c)
lisp-unit2 (run-tests :tests 'test-name) (run-tests :tags '(tag-names)) (run-tests) (run-tests :package 'package-name)
nst (nst-cmd :run test-name)
parachute (d) (test test-name &optional :report report-type)
prove (run-test 'test-name)
ptester at compilation of (with-tests (:name "test-name") )
rove (run-test 'test-name) (run-suite)
rt (b) (do-test test-name) (do-tests); (do-tests) runs all tests
should-test (b) (test) (test :test test-name)
simplet (run)
tap-unit-test (run-tests test-name1 test-name2) (run-tests)
unit-test (run-test test-name)(run-all-tests)
xlunit (xlunit:textui-test-run (xlunit:get-suite suite-name))
xptest (run-test test-name)(run-test suite-name)

(a) Shuffles tests (b) runs tests in the order they were defined (c) continue-testing runs tests that have been defined, but not yet run (d) can be a quoted list of test names

8.4 Fixture Functions

Table 38: Fixtures
Name Fixture Functions
cacau (defbefore-all), (defafter-all), (defbefore-each), (defafter-each)
clunit (defclass) and (deffixture)
clunit2 (defclass) and (deffixture)
fiveam (def-fixture name-of-fixture ())
lift set at suite definition level, with :setup, :takedown, :run-setup
lisp-unit2 :contexts are specified in test definitions
nst (def-fixtures name () body)
parachute (def-fixture name () body)
rove (setup)(teardown) are suite fixture functions. (defhook) is a test fixture function
unit-test subclass a test-class with define-test-class
xlunit (defmethod setup () body)
xptest (deftest-fixture fixture-name ()), (defmethod setup ()), (defmethod teardown ())

top

8.5 Removing Tests etc

Table 39: Removing Tests
Name Removing tests etc
clunit (undeftest) (undeffixture) (undefsuite)
clunit2 (undeftest) (undeffixture) (undefsuite)
gigamonkeys (remove-test-function) (clear-package-tests)
fiasco (fiasco::delete-test)
fiveam (rem-test) (rem-fixture)
lift (remove-test :suite x)(remove-test :test-case x)
lisp-unit (remove-tests) (remove-tags)
lisp-unit2 (uninstall-test) (undefine-test)
parachute (remove-test, remove-all-tests-in-package)
prove (remove-test) (remove-test-all)
rt (rem-test) (rem-all-tests)
tap-unit-test (remove-tests)
xlunit (remove-test)
xptest (remove-test)

top

8.6 Suites

Table 40: Suite Functions
Name Suites
2am (suite name (optional sub-suite))
clunit (defsuite name (parent)) (undefsuite name)
clunit2 (defsuite name (parent)) (undefsuite name)
fiasco (define-test-package package-name) (defsuite suite-name)
fiveam (def-suite :name-of-suite)
lift (deftestsuite name-of-suite (super-test-suite) (slots))
lisp-unit packages and tags (tags are specified in the test definition)
lisp-unit2 :tags are specified in test definitions
nst (def-test-group)
parachute (define-test suite)
prove (subtest …)
ptester just the use of (with-tests …)
rt the package is the only unit above tests
should-test the package is the only unit above tests
simplet (suite string body)
tap-unit-test the package is the only unit above tests
unit-test [effectively tags in the deftest macro before the test-name]
xlunit (defclass test-case-name (test-case)(body))
xptest (make-test-suite suite-name docstring body)

top

8.7 Generators

  1. From Frameworks
    Table 41: Random Data Generators from Frameworks
    Name Suites
    fiveam buffer, character, float, integer, list, one-element, string, tree
    lift random-number, random-element
    lisp-unit complex-random, make-random-2d-array, make-random-2d-list, make-random-list, make-random-state
    lisp-unit2 complex-random, make-random-2d-array, make-random-2d-list, make-random-list, make-random-state
    nst  
    tap-unit-test make-random-state
  2. From Helper Libraries

    Table 42: Random Data Generators from Check-it and Cl-Quickcheck
    Check-it Cl-quickcheck Comments
    character a-char  
    list a-list  
      a-member Produces a value from another generator
    string a-string  
      a-symbol  
    tuple a-tuple  
    boolean a-boolean  
    real a-real  
      an-index  
    integer an-integer  
    or   produces a value from another generator
    guard   ensures generator result within some spec
    struct   generates a struct with given type and slot values
    map   applies a transformation to output of a sub generator
    chain   chaining generators, e.g. to produce matrices
    user-define define create custom generators
      k-generator by default an index
      m-generator by default an integer
      n-generator by default an integer

    top

9 Generic Usage Example to be Followed for Each Framework Library

The following is pseudo code just trying to show the basic usage that will be demonstrated with each library.

9.1 Basics

Start with the real basics just to see how the framework looks, do tests accept parameters, suite designations, documentation strings, etc.

The first passing test should have a basic "is" test and a signals test. If the library has macro-expansion tests or floating point and rational tests, those get added to flag that they exist. Then a basic failing test. Then run each test and show what the reports look like.

(deftest t1
  "describe t1" ; obviously only if the library allows a documentation string.
  (is (=  1 1))
  (signals division-by-zero (error 'division-by-zero)))

(deftest t1-fail ; the most basic failing test
  "describe t1-fail"
  (is (=  1 2)))

Check and see if you have to manually recompile a test when a function being tested is modified. This is not a problem with most frameworks.

(defun t1-test-function ()
  1)
;; What happens when you are testing a function and you change that function?
;; Do you need to recompile the test?

(deftest t1-function ;
    (is (= (t1-test-function) 1)))

;; Now redefine t1-test-function
(defun t1-test-function ()
  2)

;; re-run test t1-function. What happens?

9.2 Multiple Values, Variables, Loops and Closures

  • Make sure the library can have tests with multiple assertions (RT cannot).
  • Does it handle values expressions? Most do, but only look at the first value; fiveam does not handle them at all; the lisp-unit family actually compares each value in the values expressions.
  • What happens with multiple assertions where more than one fail? Lift and Kaputt will only report the first failing assertion if there are multiple failing assertions.
  • Ensure that tests can handle loops
  • Can tests handle being inside a closure?
  • Can tests call other tests? Most frameworks allow this, but you tend to get multiple reports rather than consolidated reports. Some frameworks do not allow this.
(deftest t2
  "describe t2"
  (is (= 1 1))
  (is (= 2 2))
  (is (= (values 1 2) (values 1 3))))

(let ((l1 '(#\a #\B #\z))
      (l2 '(97 66 122)))
  (deftest t2-loop
      (loop for x in l1 for y in l2 do
        (is (= (char-code x) y)))))

(deftest t3 ; a test that tries to call another test in its body
  "describe t3"
  (is (eq 'a 'a))
  (test t2))

9.3 Errors, Conditions and signal handling

Check and see if there are any surprises with respect to condition signalling tests. Some frameworks will treat an unexpected condition in a signalling test as a failure, others will treat it as an error.

(deftest t7-bad-error ()
(signals division-by-zero
   (error 'floating-point-overflow)
   "testing condition assertions. This should fail"))

9.4 Suites, tags and other multiple test abilities

  1. Can you run a list of tests?

    We checked with test t3 to see if tests can call other tests. Can you just call a list of test names? Some frameworks allow, others do not.

    (run '(t1 t2))
    
  2. Suites/tags

    This section for each framework will check if you can create inherited test suites or tags

    (defsuite s0 ()); Ultimate parent suite if the library provides inheritance
    
      (deftest t4 (s0) ; a test that is a member of a suite
        "describe t4"
        (assert-eq 1 1))
    
      ;;a multiple assertion test that is a member of a suite with
      ;; a passing test, an error signaled and a failing test
      (deftest t4-error (s0)
        "describe t4-error"
        (assert-eq  'a 'a)
        (assert-condition error (error "t4-errored out"))
        (assert-true (= 1 2)))
    
      (deftest t4-fail (s0) ;
        "describe t4-fail"
        (assert-false (= 1 2)))
    
    (defsuite s1 (s0)); a sub-suite of suite s0 to check on inheritance
      (deftest t5 (s1)
       (assert-true (= 1 1)))
    
    
  3. Fixtures and Freezing Data

    Fixtures are used to create a known (or randomly generated) set of data that tests will use. At the end of the test, the fixtures are removed so that the next test can start in a clean environment.

    Freezing data may be considered a subset of fixtures. Freezing data is used where a test will use other existing data such as special variables, but may change it for testing purposes. You obviously want to return that special variable to its pre-existing state at the end of the test.

    First checking whether we can freeze data, change it in the test, then change it back

      (defparameter *keep-this-data* 1)
    
      (deftest t-freeze-1
          :fix (*keep-this-data*)
          (setf *keep-this-data* "new")
          (true (stringp *keep-this-data*)))
    
      (deftest t-freeze-2
        (is (= *keep-this-data* 1)))
    
    (run '(t-freeze-1 t-freeze-2))
    

    Now the classic fixture - create a data set for the test and clean it up afterwards

      ;; Create a class for data fixture purposes
    (defclass fixture-data ()
      ((a :initarg :a :initform 0 :accessor a)
       (b :initarg :b :initform 0 :accessor b)))
    
    ;; IMPORTANT: some frameworks require a name for the fixture. Others, like
    ;; clunit, apply a fixture to a suite (as in this pseudo code) and require
    ;; a suite name.
    (deffixture s1 (@body)
      (let ((x (make-instance 'fixture-data :a 100 :b -100)))
        @body))
    
    ;; create a sub suite and check fixture inheritance
    (defsuite s2 (s1))
    
    (deftest t6-s1 (s1)
      (assert-equal  (a x) 100)
      (assert-equal   (b x) -100))
    
    (deftest t6-s2 (s2)
      (assert-equal  (a x) 100)
      (assert-equal (b x) -100))
    
  4. Removing tests

    How do you actually remove a test from the system?
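
    In pseudo code (Table 39 collects each framework's actual spelling):

    (undeftest t4)    ; clunit / clunit2 style
    (rem-test t4)     ; fiveam / rt style
    (remove-test t4)  ; parachute / xlunit / xptest style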

  5. Skip Capability
    1. Assertions

      Can you skip an assertion?

    2. Tests

      Can you skip an entire test?

    3. Implementation

      Can you skip something if the CL implementation is XYZ?

9.5 Random Data Generators

Can you generate different types of random data to feed to the testing framework? What does the framework have to help?

10 1am

top

10.1 Summary

homepage James Lawrence MIT 2014

1am will throw you into the debugger on failures or errors. There is no optionality and there is no reporting - in you go. There is no provision for diagnostic strings in assertions, but since it throws you into the debugger, that is probably not relevant. Tests are shuffled on each run.

On the plus side for some people, tests are functions.

On the minus side for people like me, you cannot turn off progress reports. You can create a list of tests, but there is no concept of suites or tags.

10.2 Assertion Functions

1am's assertion functions are limited to is and signals.

10.3 Usage

  • (run) will run all the tests in *tests*
  • (run '(foo)) will run the named tests in the provided parameter list.
  • (name-of-test) will run the named test because tests are functions in their own right.
(run)
FOO; Evaluation aborted on #<SIMPLE-ERROR "~@<The assertion ~S failed~:[.~:; ~
                                         with ~:*~{~S = ~S~^, ~}.~]~:@>" {100BE70F03}>.
(run '(foo))

FOO; Evaluation aborted on #<SIMPLE-ERROR "~@<The assertion ~S failed~:[.~:; ~
                                         with ~:*~{~S = ~S~^, ~}.~]~:@>" {100B996103}>.
  1. Basics

    Starting with a basic test where we know everything will pass. These are all the assertion functions that 1am has. There is no provision for documentation strings or test descriptions.

      (test t1
            (is (equal 1 1))
            (signals division-by-zero
                     (/ 1 0)))
    
      (run '(t1)) ; or just (t1)
    T1..
    Success: 1 test, 2 checks.
    ; No value
    
    

    Now with a deliberately failing test. Notice how it just immediately kicks into the debugger:

    (test t1-fail (); the most basic failing test
      (let ((x 1) (y 2))
        (is (= x y))
        (signals division-by-zero (error 'floating-point-overflow))))
    
      (t1-fail)
    The assertion (= X Y) failed with X = 1, Y = 2.
       [Condition of type SIMPLE-ERROR]
    
    Restarts:
     0: [CONTINUE] Retry assertion.
     1: [RETRY] Retry SLIME REPL evaluation request.
     2: [*ABORT] Return to SLIME's top level.
     3: [ABORT] abort thread (#<THREAD "new-repl-thread" RUNNING {10035CE213}>)
    

    As you would hope, you do not have to manually recompile a test after a tested function has been modified.

  2. Conditions

    1am works as expected if you signal the expected error. If you signal an unexpected error, it throws you into the debugger just like every other time a test fails.

     (test t7-bad-error
           (signals division-by-zero (error 'floating-point-overflow)))
    
    (run '(t7-bad-error))
    T7-BAD-ERROR; Evaluation aborted on #<SIMPLE-ERROR "Expected to signal ~s, but got ~s:~%~a" {102EA4F423}>.
    
  3. Edge Cases: Values expressions, loops, closures and calling other tests

    1am has no special functionality for dealing with values expressions. It accepts them but merely looks at the initial value provided by each expression. So, for example, the following passes.

    (test t2-values-expressions ()
          (is (equal (values 1 2)
                     (values 1 3))))
    
    1. Now looping and closures.

      1am will handle looping through assertions using variables declared in a closure surrounding the test.

      (let ((l1 '(#\a #\B #\z))
              (l2 '(97 66 122)))
        (test t2-loop
          (loop for x in l1 for y in l2 do
            (is (= (char-code x) y)))))
      
    2. Calling a test inside another test

      It works but do not expect composable reports.

        (test t3
            (is (= 1 1))
          (t1))
      
       (t3)
      T3.
      T1.
      Success: 1 test, 1 check.
      Success: 1 test, 1 check.
      
  4. Suites, tags and other multiple test abilities
    1. Lists of tests

      Tests are defined using (test test-name) and pushed to a list of tests named *tests*. If you want to run only a subset, you could save *tests* elsewhere so that you still have a record of all the tests, then rebind *tests* to whatever list of tests you want to run (see the sketch below). That sounds a bit cumbersome.
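
      A minimal sketch of that approach, using the exported 1am:*tests* variable:

      (defvar *all-my-tests* nil)            ; our own master list
      (setf *all-my-tests* 1am:*tests*)      ; save everything defined so far
      (let ((1am:*tests* '(t1 t2)))          ; temporarily narrow *tests*
        (1am:run))                           ; runs only t1 and t2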

    2. Suites

      1am has no suite capability

  5. Fixtures and Freezing Data

    None

  6. Removing tests

    None

  7. Sequencing, Random and Failure Only

    N/A

  8. Skip Capability

    None

  9. Random Data Generators

    None

10.4 Discussion

The comments that I have seen around 1am seem to revolve around having only a single global variable to collect all compiled tests. Various people have suggested different solutions to essentially build test suites capability:

jorams suggested:

(defmacro define-test-framework (tests-variable
                                 test-macro
                                 run-function)
  "Define a variable to hold a list of tests, a macro to define tests and a
function to run the tests."
  `(progn
     (defvar ,tests-variable ())
     (defmacro ,test-macro (name &body body)
       `(let ((1am:*tests* ()))
          (1am:test ,name ,@body)
          (dolist (test 1am:*tests*)
            (pushnew test ,',tests-variable))))
     (defun ,run-function ()
       (1am:run ,tests-variable))))

luismbo suggested: "a simpler way might be to have 1am:test associate tests with the current *package* (e.g., by turning 1am:*tests* into an hash-table mapping package names to lists of tests) and add the ability for 1am:run to filter by package and perhaps default to the current *package*."
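
A minimal sketch of that idea (my illustration, not part of 1am or luismbo's code) might look like:

(defvar *package-tests* (make-hash-table :test #'equal)
  "Maps package names to lists of 1am test names.")

(defmacro define-package-test (name &body body)
  ;; same trick as the wrappers above: define the test with a fresh
  ;; 1am:*tests*, then record the name under the current package
  `(let ((1am:*tests* '()))
     (1am:test ,name ,@body)
     (pushnew ',name (gethash (package-name *package*) *package-tests*))))

(defun run-package-tests (&optional (package *package*))
  (1am:run (gethash (package-name package) *package-tests*)))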

phoe suggested the following very simple 1AM wrapper to achieve multiple test suites.

(defvar *my-tests* '())

(defun run ()
  (1am:run *my-tests*))

(defmacro define-test (name &body body)
  `(let ((1am:*tests* '()))
     (1am:test ,name ,@body)
     (pushnew ',name *my-tests*)))

10.5 Who Uses 1am?

("adopt/test" "authenticated-encryption-test" "beast-test" "binary-io/test" "bobbin/test" "chancery.test" "cl-digraph.test" "cl-netpbm/test" "cl-pcg.test" "cl-rdkafka/test" "cl-scsu-test" "cl-skkserv/tests" "jp-numeral-test" "list-named-class/test" "openid-key-test" "petri" "petri/test" "polisher.test" "protest/1am" "protest/test" "with-c-syntax-test" "xml-emitter/tests")

top

11 2am

11.1 Summary

homepage Daniel Kochmański MIT 2016

2am is based on 1am with some features wanted for CI and hierarchical tests. As with 1am, 2am runs tests randomly - the order is shuffled on each run. There is no optionality. There is also no provision for only running the tests that failed last time. There is also no way to turn off the progress report.

11.2 Assertion Functions

is signals finishes

top

11.3 Usage

Unlike 1am which always throws you into the debugger, 2am will only throw you into the debugger if the test crashes, not if it fails.

Note that 2am will distinguish between tests that fail and tests that crash.

  • (run) will run the tests in the default suite.
  • (run 'some-suite-name) will run the tests in the named suite
  • (run '(foo)) will run the named tests in the provided parameter list.

Since tests are functions in 2am, there is no need for a (run 'test-name) function.

  1. Report Format

    First a basic failing test to show the reporting. Notice that in the third assertion we pass a diagnostic string (which can help diagnose failures) after the form being tested, followed by the variables being compared; then we run it to show the default failure report.

    (test t1-fail () ; the most basic failing test
      (let ((x 1) (y 2))
        (is (= 1 2))
        (is (equal 1 2))
        (is (= x y) "This test was meant to fail ~a is not =  ~a" x y)
        (signals floating-point-overflow
          (error 'division-by-zero))))
    

    Now to run it:

      (t1-fail)
    Running test T1-FAIL ffff
    Test T1-FAIL: 4 checks.
       Pass: 0 ( 0%)
       Fail: 4 (100%)
    
    Failure details:
    --------------------------------
     T1-FAIL:
       FAIL: (= 1 2)
       FAIL: (EQUAL 1 2)
       FAIL: This test was meant to fail 1 is not =  2
       FAIL: Expected to signal FLOATING-POINT-OVERFLOW, but got DIVISION-BY-ZERO:
    arithmetic error DIVISION-BY-ZERO signalled
    --------------------------------
    

    The macro (test name &body body) defines a test function and adds it to *tests*. The following just shows what the report looks like when everything passes. These are all the assertion functions that 2am has.

    (test t1 ; the most basic test.
      (is (=  1 1))
      (signals division-by-zero
        (/ 1 0))
      (finishes (= 1 1)))
    
    (t1)
    Running test T1 ...
    Test T1: 3 checks.
    Pass: 3 (100%)
    Fail: 0 (0%)
    

    As you would hope, you do not have to manually recompile a test after a tested function has been modified.

  2. Edge Cases: Value expressions, loops, closures and calling other tests
    1. Value expressions

      2am has no special functionality for dealing with values expressions. It accepts them but merely looks at the initial value provided by each expression.
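
      For example, the analogue of the 1am example would pass here as well:

        (test t2-values-expressions ()
          (is (equal (values 1 2)
                     (values 1 3)))) ; only the first value of each is compared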

    2. Looping and closures.

      Will a test accept looping through assertions using variables from a closure? Yes.

        (let ((l1 '(#\a #\B #\z))
                (l2 '(97 66 122)))
          (test t2-loop ()
            (loop for x in l1 for y in l2 do
              (is (= (char-code x) y)))))
      
      (t2-loop)
      Running test T2-LOOP ...
      Test T2-LOOP: 3 checks.
         Pass: 3 (100%)
         Fail: 0 ( 0%)
      
      (test t2-with-multiple-values ()
        (is (= 1 1 2)))  ; This should fail and it does
      
    3. Calling another test from a test

      This succeeds as expected but also as expected there is no composition on the report.

      (test t3 (); a test that tries to call another test in its body
                (is (eql 'a 'a))
                (t2))
      
      (t3)
      Running test T3 .
      Running test T2 ...
      Test T2: 3 checks.
         Pass: 3 (100%)
         Fail: 0 ( 0%)
      Test T3: 1 check.
         Pass: 1 (100%)
         Fail: 0 ( 0%)
      
  3. Suites, tags and other multiple test abilities
    1. Lists of tests

      2am can run lists of tests

      (run '(t1 t2))
      Running test T2 ...
      Running test T1 .
      Did 2 tests (0 crashed), 4 checks.
         Pass: 4 (100%)
         Fail: 0 ( 0%)
      

      top

    2. Suites

      2am has a hash table named *suites*. Any tests not associated with a specific suite are assigned to the default suite. If we called the function (run) with no parameters, it would run all the tests in the default suite, which in our case would mean all the tests described above:

      (run)
      

      Suites are defined using (suite 'some-suite-name-here &optional list-of-sub-suites). Tests are identified with suites by prepending the suite name to the test name.

        (suite 's0) ; This suite has no sub-suites
      
        (test s0.t4  ; a test that is a member of a suite
          (is (= 1 1)))
      
         (test s0.t4-error
          (is (eql 'a 'a))
          (signals error (error "t4-errored out"))
          (is (= 1 2)))
      
        (test s0.t4-fail
          (is (not (= 1 2))))
      
      (suite 's1 '(s0)); This suite includes suite s0 as a sub-suite.
      
      (test s1.t4-s1
         (is (= 1 1)))
      

      Calling run on 's0 will run tests s0.t4, s0.t4-error and s0.t4-fail.

      (run 's0)
      --- Running test suite S0
      Running test S0.T4 .
      Running test S0.T4-FAIL .
      Running test S0.T4-ERROR ..f
      Did 3 tests (0 crashed), 5 checks.
         Pass: 4 (80%)
         Fail: 1 (20%)
      
      Failure details:
      --------------------------------
       S0.T4-ERROR:
         FAIL: (= 1 2)
      --------------------------------
      

      Calling run on 's1 will run test s1.t4-s1 and all the tests in suite s0.

  4. Fixtures and Freezing Data

    No built-in capability.

  5. Removing tests

    Nothing explicit

  6. Sequencing, Random and Failure Only

    2am runs tests randomly - the order is shuffled on each run. There is no optionality. There is no provision for only running the tests that failed last time.

  7. Skip Capability

    None

  8. Random Data Generators

    None

11.4 Discussion

The documentation indicates that assertions may be run inside threads. I did not validate this.

top

12 cacau

12.1 Summary

homepage Noloop GPL3 2020

Cacau is interesting in that it uses an external library for assertions and is just a "test runner". The examples shown with Cacau will all assume that the assert-p library by the same author is also loaded.

On the plus side, it has extensive hooks which can perform actions before and after a suite is run or before and after each test is run. It also has explicit async capabilities (not tested in this report) which do not exist in the other frameworks.

At the same time, it tends to be all or nothing in what runs. The (run) function either runs the last defined test (if you have not defined suites) or, if you have defined suites, it runs all tests in all the suites. If you then define a new test, it runs just that new test. Maybe it is just me, but I would get lost in what run is supposed to be checking.

Most frameworks count the individual assertions in a test; Cacau treats the test as a whole - if one assertion fails, the entire test fails, and if multiple assertions fail, it reports only the first failure, not all the failures in the test, leaving you with incomplete information.

Cacau is the only framework where, if you change a function that is being tested, you need to manually recompile the tests again.

Not recommended.

12.2 Assertion Functions

Cacau uses the assertion functions from an external assertion library; currently you need to use assert-p. Those are:

t-p not-t-p zero-p not-zero-p
nil-p not-nil-p null-p not-null-p
eq-p not-eq-p eql-p not-eql-p
equal-p not-equal-p equalp-p not-equalp-p
typep-p not-typep-p values-p not-values-p
error-p not-error-p    
condition-error-p not-condition-error-p custom-p  

I have to say for an assertion library, I expected to see some numerical tests as well.

12.3 Usage

You will notice that the test names must be strings, whereas the test names in the other frameworks are either quoted or unquoted symbols.

Cacau only has a run function which accepts fixture, reporter and debugger parameters, but no provisions for telling it what tests you want to run. If you manually compile a single test, it usually runs just that test. Otherwise it runs all the tests in the package whether you want them or not. For purposes of walking through the basic capability, we will look only at the result of the specific test under consideration and not any other test which might be picked up in the report.

I do not see a way to rerun a test except by manually recompiling the test.

  1. Report Format

    Interactive mode can be enabled by passing the keyword parameter :cl-debugger, as in (run :cl-debugger t). Cacau has a few different reporting formats. The function (run) without a reporter specification will provide the default :min level of info. There are also :list and :full, which provide different levels of information. You will notice that I am actually creating multiple copies of the failing test to simulate recompiling the test.

    Cacau treats tests with multiple assertions as a unit. Either everything passes or the test fails, and it may be difficult to figure out which assertion failed. This is clearly shown below where both the first and second assertions should fail, but only the first failure (the eql) gets reported.

    Cacau does not allow us to pass messages in the assertion which might have allowed us to flag potential issues that would aid in debugging failures.

    1. Min
        (deftest "t1-fail-1" ()
          (let ((x 1) (y 2))
            (assert-p:eql-p x y)
            (assert-p:equal-p 1 2)))
        (run :reporter :min)
      <=> Cacau <=>
      
      From 1 running tests:
      
      0 passed
      1 failed
      NIL
      
    2. List

      Now the list reporting level

      (deftest "t1-fail-2" ()
        (let ((x 1) (y 2))
          (assert-p:equal-p x y)
          (assert-p:equal-p 1 2)))
      
      (run :reporter :list)
      <=> Cacau <=>
      
      <- t1-fail-2:
      Error message:
      BIT EQL (INTEGER 0 4611686018427387903)
      
      Actual:
      1
      Expected:
      2
      -------------------------
      From 1 running tests:
      
      0 passed
      1 failed
      NIL
      
    3. Full

      And finally the full reporting level

      (deftest "t1-fail-3" ()
        (let ((x 1) (y 2))
          (assert-p:eql-p x y)
          (assert-p:equal-p 1 2)))
      
      (run :reporter :full)
      <=> Cacau <=>
      
      <- t1-fail-3:
      Error message:
      BIT EQL (INTEGER 0 4611686018427387903)
      
      Actual:
      1
      Expected:
      2
      Epilogue
      -------------------------
      0 running suites
      1 running tests
      0 only suites
      0 only tests
      0 skip suites
      0 skip tests
      0 total suites
      1 total tests
      0 passed
      1 failed
      1 errors
      740673543 run start
      740673543 run end
      1/1000000 run duration
      0 completed suites
      1 completed tests
      
      Errors
      -------------------------
      Suite: :SUITE-ROOT
      Test: t1-fail-3
      Message:
      BIT EQL (INTEGER 0 4611686018427387903)
      
      Actual:
      1
      Expected:
      2
      
  2. Basics

    The empty form after the test name is for particular parameters such as :skip and :async.

    (deftest "t1" ()
      (assert-p:eql-p 1 1)
      (assert-p:condition-error-p
       (error 'division-by-zero)
       division-by-zero))
    
    (run)
    <=> Cacau <=>
    
    From 1 running tests:
    
    1 passed
    0 failed
    

    You can already anticipate what happens when you are testing a function and you change that function. Yes, you need to manually recompile the test, and the earlier versions might still be found as well as the new version when you call (run).

  3. Edge Cases: Value expressions, loops, closures and calling other tests
    1. Value expressions

      Cacau (or really assert-p) has no special functionality for dealing with values expressions. It accepts them but merely looks at the initial value provided by each expression. The following passes.

      (deftest "t2-values-3" ()
        (assert-p:equalp-p (values 1 2) (values 1 3)))
      (run)
      <=> Cacau <=>
      
      From 1 running tests:
      
      1 passed
      0 failed
      NIL
      

      Basically it accepted the values expressions but only looked at the first value of each.

    2. Looping and closures.

      Cacau will test correctly if it is looking at variables from a closure surrounding the test.

          (let ((l1 '(#\a #\B #\z))
                  (l2 '(97 66 122)))
            (deftest "t2-loop" ()
              (loop for x in l1 for y in l2 do
                (assert-p:eql-p (char-code x) y))))
      
        (run :reporter :list)
      <=> Cacau <=>
      
       -> t2-loop
      
      -------------------------
      From 1 running tests:
      
      1 passed
      0 failed
      
    3. Calling another test from a test

      I have not figured out a way for a test to call another test in cacau.

  4. Redefinition Ambiguities

    Now consider the following, where I mis-define a test, then attempt to correct it, and am left not knowing where I stand.

       (deftest "t2-with-multiple-values" () (assert-p:eql-p  1 1 2))
       ; in: DEFTEST "t2-with-multiple-values"
       ;     (NOLOOP.ASSERT-P:EQL-P 1 1 2)
       ;
       ; caught STYLE-WARNING:
       ;   The function EQL-P is called with three arguments, but wants exactly two.
       ;
       ; compilation unit finished
       ;   caught 1 STYLE-WARNING condition
       #<NOLOOP.CACAU::TEST-CLASS {1005098793}>
    
       (deftest "t2-with-multiple-values" () (assert-p:t-p  (= 1 1 2)))
       #<NOLOOP.CACAU::TEST-CLASS {1005292E43}>
    
     (run) ; the minimum level of info report, probably a mistake
       <=> Cacau <=>
       From 2 running tests:
    
       0 passed
       2 failed
       NIL
    
    (run :reporter :list) ; I try running again with a higher level of information
       <=> Cacau <=>
       -------------------------
       From 0 running tests:
    
       0 passed
       0 failed
    

    Even though the test failed as expected, it is showing 2 running tests. What are the two tests? I would have expected only one test since we are using the same string name.

  5. Conditions (Failing)

    The following fails as expected.

    (deftest "t7-bad-error" ()
      (assert-p:condition-error-p
         (error 'division-by-zero)
         floating-point-overflow))
    
  6. Suites, tags and other multiple test abilities
    1. Lists of tests

      No such capability outside of suites.

    2. Suites

      You can have multiple suites of tests, but the (run) function will run everything that has not been run before:

        (defsuite :s0 ()
          (deftest "s0-t1" () (assert-p:eql-p 1 2)))
      
        (defsuite :s1 ()
          (let ((x 0))
            (deftest "s1-t1" () (assert-p:eql-p x 0))
             (defsuite :s2 ()
              (deftest "s2-t1" () (assert-p:eql-p 1 3)))))
      
      (run :reporter :list)
      <=> Cacau <=>
      
      :S0
       <- s0-t1:
      Error message:
      BIT EQL (INTEGER 0 4611686018427387903)
      
      Actual:
      1
      Expected:
      2
      :S1
       -> s1-t1
       :S2
        <- s2-t1:
      Error message:
      BIT EQL (INTEGER 0 4611686018427387903)
      
      Actual:
      1
      Expected:
      3
      -------------------------
      From 3 running tests:
      
      1 passed
      2 failed
      

      What I would like to see is an ability to run the suites separately. You cannot do that. What is the value of having suites and sub-suites if you cannot run them separately? top

  7. Fixtures and Freezing Data

    This is where cacau brings something to the table. You can use the hooks defbefore-all, defafter-all, defbefore-each and defafter-each to set up or tear down data environments or contexts.

          (defsuite :suite-1 ()
            (defbefore-all "Before-all" () (print ":Before-all"))
            (defafter-each "After-each" () (print ":After-each"))
            (defafter-all "After-all" () (print ":After-all"))
            (defbefore-each "Before-each Suite-1" ()
              (print "run Before-each Suite-1"))
            (deftest "Test-1" () (print "run Test-1") (t-p t))
            (deftest "Test-2" () (print "run Test-2") (t-p t))
            (defsuite :suite-2 ()
              (defbefore-each "Before-each Suite-2" ()
                (print "run Before-each Suite-2"))
              (deftest "Test-3" () (print "run Test-3") (t-p t))
              (deftest "Test-4" () (print "run Test-4") (t-p t))))
    
          (run)
    
    ":Before-all"
    "run Before-each Suite-1"
    "run Test-1"
    ":After-each"
    "run Before-each Suite-1"
    "run Test-2"
    ":After-each"
    "run Before-each Suite-1"
    "run Before-each Suite-2"
    "run Test-3"
    ":After-each"
    "run Before-each Suite-1"
    "run Before-each Suite-2"
    "run Test-4"
    ":After-each"
    ":After-all" <=> Cacau <=>
    
    From 4 running tests:
    
    4 passed
    0 failed
    
  8. Removing tests

    I did not see anything here, but maybe I missed it.

  9. Sequencing, Random and Failure Only

    Sequential only.

  10. Skip Capability

    You can specify to skip a test or a suite (and no, I do not consider this to be a good substitute for being able to specify which test or suite you want to run).

    (defsuite :suite-1 ()
      (deftest "Test-1" (:skip) (t-p t))
      (deftest "Test-2" () (t-p t))) ;; run!
    
    (defsuite :suite-2 (:skip)
      (let ((x 0))
        (deftest "Test-1" () (eql-p x 0))
        (deftest "Test-2" () (t-p t))
        (defsuite :suite-3 ()
          (deftest "Test-1" () (t-p t))
          (deftest "Test-2" () (t-p t)))))
    
  11. Async Abilities

    I am going to have to cheat here and refer you to the author's page for the description of the async capabilities https://github.com/noloop/cacau#async-test.

  12. Time Limits for tests

    Cacau does have the ability to specify time limits for tests. The time limits can be set by suite (all tests in the suite have the same time limit), by hook or by test. See the author's discussion at https://github.com/noloop/cacau#timeout

  13. Random Data Generators

    I did not see anything here with respect to data generators.

12.4 Discussion

As I said in the summary, the fact that it does not show all the assertions that failed and does not give me the ability to specify suites or tests to run make this unsuitable for me.

12.5 Who Uses

cl-minify-css-test

top

13 cardiogram

top

13.1 Summary

homepage Abraham Aguilar MIT 2020

Cardiogram starts off with immediate problems. The documentation does not match up with the code and its test-system does not comply (Xach flagged this and the author has not yet responded). I cannot get it to work and cannot recommend it.

14 checkl

top

14.1 Summary

homepage Ryan Pavlik LLGPL, BSD 2018

Checkl is different. As a result, this section is different from the other frameworks. Checkl assumes that you do informal checks at the REPL as you are coding and saves those results. Assuming you change your program and check the modified function or whatever with exactly the same parameters, it will let you know if the result is now different. As a result it is a bit more difficult to compare based on the wish list. It can, however, be integrated with Fiveam.

14.2 Usage

  1. Basic Usage

    Assume you create two functions foo-up and foo-down and compile them.

    (defun foo-up (x)
      (+ x 2))
    
    (defun foo-down (x)
      (- x 2))
    

    Now you create checks against the functions and compile them.

    (check () (foo-up 2))
    (check () (foo-down 2))
    

    If you now revise foo-down and compile it, checkl will signal an error immediately upon compiling foo-down because the result is now different (different as in equalp different).

    (defun foo-down (x)
      (- x 3))
    
    Result 0 has changed: -1
    Previous result: 0
       [Condition of type CHECKL::RESULT-ERROR]
    
    Restarts:
     0: [USE-NEW-VALUE] The new value is correct, use it from now on.
     1: [SKIP-TEST] Skip this, leaving the old value, but continue testing
     2: [ABORT] Abort compilation.
     3: [*ABORT] Return to SLIME's top level.
     4: [ABORT] abort thread (#<THREAD "worker" RUNNING {1010B73BD3}>)
    

    Modifying and recompiling foo-up similarly also triggers an error.

    Suppose you want to have multiple checks on a function based on different parameters. You can name the check tests.

    (check (:name :foo-up-integer) (foo-up 4))
    (check (:name :foo-down-integer) (foo-down 4))
    (check (:name :foo-up-real) (foo-up 4.5))
    (check (:name :foo-down-real) (foo-down 4.5))
    

    If you pass those check names to run, you get the following (remember that after our modifications, foo-up and foo-down now add and subtract 3):

    (run :foo-up-integer :foo-down-integer :foo-up-real :foo-down-real)
    (7)
    (1)
    (7.5)
    (1.5)
    

    Checks can also be defined to check the results of multiple functions by using the results function.

    (check (:name :foo-up-and-down)
      (results (foo-up 7) (foo-down 3.2)))
    
    (run :foo-up-and-down)
    
    (10 0.20000005)
    

    By the way, results will copy structures and sequences and marshal standard-objects.

  2. Suites, tags and other multiple test abilities

    The run-all function will return the results for all the check tests defined in the current package.

    1. Lists of tests

      As seen in the basic usage, checkl can run multiple checks.

      (run :foo-up-integer :foo-down-integer :foo-up-real :foo-down-real)
      
    2. Suites/tags/categories

      When you are naming checks, you can also pass a category name to the keyword parameter category. You can then pass the category name to run-all and get just the values related to the checks with that category flagged.

      (check (:name :foo :category :some-category) ...)
      
      (run-all :some-category ...)
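
      For instance, a concrete sketch built on the foo-up and foo-down functions from earlier (the :foo category name is made up):

      (check (:name :foo-up-cat :category :foo) (foo-up 2))
      (check (:name :foo-down-cat :category :foo) (foo-down 2))

      (run-all :foo)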
      
  3. Storage

    You can store these named checks by running checkl-store and reload them later with checkl-load. E.g.:

    (checkl-store "/home/sabrac/checkl-test")
      ;;; some time later
    (checkl-load "/home/sabrac/checkl-test")
    
  4. Integration with Fiveam

    Assuming you have already loaded fiveam, you can also send the checkl tests to fiveam by using check-formal.

    (checkl:check-formal (:name :one-concat) (tf-concat-strings-1 "John" "Paul"))
    "JohnPaul"
    
    (fiveam:run! :default)
    
    Running test suite DEFAULT
     Running test ONE-CONCAT .
     Did 1 check.
        Pass: 1 (100%)
        Skip: 0 ( 0%)
        Fail: 0 ( 0%)
    
    T
    NIL
    

    Now, if we go back and add a space to "John", and run check-formal, not only will check-formal fail, but subsequently running fiveam will fail.

    (checkl:check-formal (:name :one-concat) (tf-concat-strings-1 "John " "Paul"))
    ; Evaluation aborted on #<CHECKL::RESULT-ERROR {1003FE1433}>.
    TF-TEST1> (fiveam:run! :default)
    
    Running test suite DEFAULT
     Running test ONE-CONCAT f
     Did 1 check.
        Pass: 0 ( 0%)
        Skip: 0 ( 0%)
        Fail: 1 (100%)
    
     Failure Details:
     --------------------------------
     ONE-CONCAT []:
    
    CHECKL::RESULT
     evaluated to
    ("John Paul")
     which is not
    CHECKL:RESULT-EQUALP
     to
    ("JohnPaul")
    ..
     --------------------------------
    NIL
    (#<IT.BESE.FIVEAM::TEST-FAILURE {1004189053}>)
    

    top

14.3 Who Uses Checkl?

15 clunit

top

15.1 Summary

homepage Tapiwa Gutu BSD 2017

Updated 13 June 2021 Based on unresolved issues showing at github, as well as my inability to reach the author, clunit does not appear to be maintained and is subject to bitrot. You should look at clunit2 instead. The differences between clunit2 and clunit are:

  • clunit2's ability to redirect reporting output,
  • clunit2's huge performance increase (clunit is painfully slow on any sized testing target)
  • clunit2's ability to test multiple value expressions
  • clunit2's suite signaling capability and
  • the fact that clunit2 has a maintainer.

15.2 Assertion Functions

Clunit's assertion functions are:

assert-condition assert-eq assert-eql
assert-equal assert-equality assert-equality*
assert-equalp assert-expands assert-fail
assert-false assert-true assertion-condition
assertion-conditions assertion-error assertion-expander
assertion-fail-forced assertion-failed assertion-passed

The predicate used by assert-equality is determined by the setting of *clunit-equality-test*. top

15.3 Usage

  1. Report Format

    Report format is controlled by the variable *clunit-report-format*. It can be set to :default, :tap or NIL. In all the examples showing reports, we will be using the default format.

    The progress report can be switched off by passing a keyword parameter to the functions run-test or run-suite.

    (run-test 'some-test-name :report-progress nil)
    
    (run-suite 'some-suite-name :report-progress nil)
    

    Clunit2, unlike clunit, has a *test-output-stream* variable which can be used to redirect the reports to a file or other stream.

    To go interactive - dropping immediately into the debugger - you would set the keyword parameter :use-debugger to t.

    (run-test test-name :use-debugger t)
    

    To give you a sense of what the failure report looks like, we take a basic failing test with multiple assertions. We will put diagnostic strings into a few of the assertions. The first assertion has a diagnostic string followed by two variables; the second has a diagnostic string but no variables. Unlike the diagnostic strings of some other frameworks, the string that gets passed does not accept format-like parameters.

    (deftest t1-fail ()
      "describe t1-fail"
      (let ((x 1) (y 2))
        (assert-equal x y  "This assert-equal test was meant to fail" x y)
        (assert-true (= 1 2) "This assert-true test was meant to fail")
        (assert-false (=  1 1))
        (assert-eq 'a 'b)
        (assert-expands (PROGN (SETQ V1 4) (SETQ V2 3)) (setq2 v1 v2 3))
        (assert-condition division-by-zero
            (error 'floating-point-overflow)
          "testing condition assertions")
        (assert-equalp (values 1 2) (values 1 3 4))))
    #<CLUNIT::CLUNIT-TEST-CASE {100FB04C83}>
    
    (run-test 't1-fail)
    
    PROGRESS:
    =========
        T1-FAIL: FFFFFE.
    
    FAILURE DETAILS:
    ================
        T1-FAIL: Expression: (EQUAL X Y)
                 Expected: X
                 Returned: 2
                 This assert-equal test was meant to fail
                 X => 1
                 Y => 2
    
        T1-FAIL: Expression: (= 1 2)
                 Expected: T
                 Returned: NIL
                 This assert-true test was meant to fail
    
        T1-FAIL: Expression: (= 1 1)
                 Expected: NIL
                 Returned: T
    
        T1-FAIL: Expression: (EQ 'A 'B)
                 Expected: 'A
                 Returned: B
    
        T1-FAIL: Expression: (MACROEXPAND-1 '(SETQ2 V1 V2 3))
                 Expected: (PROGN (SETQ V1 4) (SETQ V2 3))
                 Returned: (PROGN (SETQ V1 3) (SETQ V2 3))
    
        T1-FAIL: arithmetic error FLOATING-POINT-OVERFLOW signalled
    
    SUMMARY:
    ========
        Test functions:
            Executed: 1
            Skipped:  0
    
        Tested 7 assertions.
            Passed: 1/7 ( 14.3%)
            Failed: 5/7 ( 71.4%)
            Errors: 1/7 ( 14.3%)
    
  2. Basics

    Looking a little closer at a basic test where we know everything will pass. The empty form after the test name is for the name of the suite (if any). Just for fun, and since clunit has it, we will define a macro and show the assert-expands assertion function as well. We also include a diagnostic string in the assert-condition assertion with the division-by-zero error, but the same could be done for any assertion.

    (defmacro setq2 (v1 v2 e)
      (list 'progn (list 'setq v1 e) (list 'setq v2 e)))
    
    (deftest t1 ()
      "describe t1"
      (assert-true (=  1 1))
      (assert-false (=  1 2))
      (assert-eq 'a 'a)
      (assert-expands (PROGN (SETQ V1 3) (SETQ V2 3)) (setq2 v1 v2 3))
      (assert-condition division-by-zero
          (error 'division-by-zero)
        "testing condition assertions")
      (assert-condition simple-warning
          (signal 'simple-warning)))
    

    Running this to show the default report on a passing test. There is a progress report with dots indicating passed assertions, F indicating failed assertions and E if there is an error.

    (run-test 't1)
    
    PROGRESS:
    =========
        T1: ......
    
    SUMMARY:
    ========
        Test functions:
            Executed: 1
            Skipped:  0
    
        Tested 6 assertion.
            Passed: 6/6 (100.0%)
    

    You do not have to manually recompile a test after a tested function has been modified.

  3. Edge Cases: Values expressions, loops, closures and calling other tests

    Clunit has no special functionality for dealing with values expressions. It accepts them but merely looks at the initial value provided by each expression. The following passes.

    (deftest t2-values-expressions ()
      (assert-equal (values 1 2)
                    (values 1 3)))
    
    1. Looping and closures.

      Will a test accept looping through assertions with lexical variables from a closure? NO. Clunit complains that the variables l1 and l2 are never defined.

      (let ((l1 '(#\a #\B #\z))
            (l2 '(97 66 122)))
      (deftest t2-loop ()
        (loop for x in l1 for y in l2 do
          (assert-equal (char-code x) y))))
      

      Clunit is quite happy to loop if the variables are defined within the test or, for that matter, if the closure encompassed the tested functions rather than the test itself:

      (deftest t2-loop ()
        (let ((l1 '(#\a #\B #\z))
              (l2 '(97 66 122)))
          (loop for x in l1 for y in l2 do
            (assert-equal (char-code x) y))))
      

      Clunit has no assertions that will handle checking more than two values in a single assertion, so you will have to use assert-true with the usual CL functions, as sketched below.
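
      A minimal sketch, wrapping the comparison in plain CL so that all of the returned values get compared:

      (deftest t2-all-values ()
        (assert-true (equal (multiple-value-list (values 1 2))
                            (list 1 2))))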

    2. Calling another test from a test

      This uses the second version of test t2 which has two failing assertions and one passing assertion.

      (deftest t3 (); a test that tries to call another test in its body
          "describe t3"
                (assert-equal 'a 'a)
                (run-test 't2))
      
      (run-test 't3)
      
      PROGRESS:
      =========
          T3: .
      PROGRESS:
      =========
          T2: FF.
      
      FAILURE DETAILS:
      ================
          T2: Expression: (EQUAL 1 2)
              Expected: 1
              Returned: 2
      
          T2: Expression: (EQUAL 2 3)
              Expected: 2
              Returned: 3
      SUMMARY:
      ========
          Test functions:
              Executed: 1
              Skipped:  0
      
          Tested 3 assertions.
              Passed: 1/3 some tests not passed
              Failed: 2/3 some tests failed
      SUMMARY:
      ========
          Test functions:
              Executed: 1
              Skipped:  0
      
          Tested 1 assertion.
              Passed: 1/1 all tests passed
      SUMMARY:
      ========
          Test functions:
              Executed: 1
              Skipped:  0
      
          Tested 1 assertion.
              Passed: 1/1 all tests passed
      

      It reports each test separately but correctly; obviously there is no composition.

  4. Suites, tags and other multiple test abilities
    1. Lists of tests

      clunit will not run lists of tests. You can run tests which run other tests. But otherwise you will need to set up suites.

    2. Suites

      Tests can be associated with multiple suites. The execution of tests within a suite is unordered by default, but you can specify that a test depends on other tests passing. If those tests do not pass, this test will be skipped. The test is put in a queue until its dependencies are satisfied. Both suite specifications and test dependencies are set in the first parameter form that we left empty in the above tests.

      Assume you wanted testC to be part of suites suiteA and suiteB and dependent on tests testA and testB passing.

      (deftest testC ((suiteA suiteB)(testA testB))
        ...)
      

      Suites are tested using the run-suite function. It has the following additional parameters (besides the obvious name for the suite to be tested):

      • If REPORT-PROGRESS is non-NIL, the test progress is reported.
      • If USE-DEBUGGER is non-NIL, the debugger is invoked whenever an assertion fails.
      • If STOP-ON-FAIL is non-NIL, the rest of the unit test is cancelled when any assertion fails or an error occurs.
      • If SIGNAL-CONDITION-ON-FAIL is non-NIL, run-suite will signal a TEST-SUITE-FAILURE condition if at least one test fails or signals an error condition.
      • If PRINT-RESULTS-SUMMARY is non-NIL, a summary of the test results is printed to standard output.
      (defsuite s0 ()); Ultimate parent suite if the library provides inheritance
      
        (deftest t4 (s0) ; a test that is a member of a suite
          "describe t4"
          (assert-eq 1 1))
      
        ;;a multiple assertion test that is a member of a suite with
        ;; a passing test, an error signaled and a failing test
        (deftest t4-error (s0)
          "describe t4-error"
          (assert-eq  'a 'a)
          (assert-condition error (error "t4-errored out"))
          (assert-true (= 1 2)))
      
        (deftest t4-fail (s0) ;
          "describe t4-fail"
          (assert-false (= 1 2)))
      
      (defsuite s1 (s0)); a sub-suite of suite s0 to check on inheritance
        (deftest t4-s1 (s1)
         (assert-true (= 1 1)))
      
      (run-suite 's0)
      
      PROGRESS:
      =========
      
          S0: (Test Suite)
              T4-FAIL: .
              T4-ERROR: ..F
              T4: .
      
              S1: (Test Suite)
                  T5: .
      
      FAILURE DETAILS:
      ================
      
          S0: (Test Suite)
              T4-ERROR: Expression: (= 1 2)
                        Expected: T
                        Returned: NIL
      
      
      SUMMARY:
      ========
          Test functions:
              Executed: 4
              Skipped:  0
      
          Tested 6 assertions.
              Passed: 5/6 some tests not passed
              Failed: 1/6 some tests failed
      
      FAILURE DETAILS:
      ================
      
          S0: (Test Suite)
              T4-ERROR: Expression: (= 1 2)
                        Expected: T
                        Returned: NIL
      
      
      SUMMARY:
      ========
          Test functions:
              Executed: 4
              Skipped:  0
      
          Tested 6 assertions.
              Passed: 5/6 some tests not passed
              Failed: 1/6 some tests failed
      
    3. Early termination

      You can stop a suite run when a test fails, without dropping into the debugger.

      (run-suite 'suite-name :stop-on-fail t)
      

      top

    4. Fixtures and Freezing Data
      (defclass fixture-data ()
        ((a :initarg :a :initform 0 :accessor a)
         (b :initarg :b :initform 0 :accessor b)))
      
      (deffixture s1 (@body) ;;IMPORTANT Note that the fixture gets the name of the suite to which it will apply
        (let ((x (make-instance 'fixture-data :a 100 :b -100)))
          @body))
      
      ;; create a sub suite and checking fixture inheritance
      (defsuite s2 (s1))
      
      (deftest t6-s1 (s1)
        (assert-equal  (a x) 100)
        (assert-equal   (b x) -100))
      
      (deftest t6-s2 (s2)
        (assert-equal  (a x) 100)
        (assert-equal   (b x) -100))
      
      (run-suite 's1)
      
      PROGRESS:
      =========
          S1: (Test Suite)
              T6-S1: ..
              T5: .
      
              S2: (Test Suite)
                  T6-S2: ..
      SUMMARY:
      ========
          Test functions:
              Executed: 3
              Skipped:  0
      
          Tested 5 assertions.
              Passed: 5/5 all tests passed
      SUMMARY:
      ========
          Test functions:
              Executed: 3
              Skipped:  0
      
          Tested 5 assertions.
              Passed: 5/5 all tests passed
      

      To go interactive - dropping immediately into the debugger - you would set the keyword parameter :use-debugger to t.

      (run-suite suite-name :use-debugger t)
      
  5. Removing tests
    (clunit:undeftest t1)
    (clunit:undeffixture fixture-name)
    (clunit:undefsuite suite-name)
    
  6. Sequencing, Random and Failure Only

    The execution of tests within a suite is unordered by default, but you can specify that a test depends on other tests passing. If those tests do not pass, this test will be skipped. Clunit has a function rerun-failed-tests to rerun failed tests.

  7. Skip Capability

    Other than the dependency abilities previously mentioned, clunit has no additional skipping capability.

  8. Random Data Generators

    Clunit has no built in data generators.

15.4 Discussion

The differences between clunit2 and clunit are listed in the summary above: output redirection, a large performance improvement, multiple-value testing, suite signaling and the fact that clunit2 has a maintainer. Clunit itself is painfully slow on any sizeable test target. top

15.5 Who Uses

bt-semaphore, data-frame, cl-kanren, cl-random-tests, cl-slice-tests, listoflist, lla-tests, oe-encode-test, trivial-tco-test

top

16 clunit2

top

16.1 Summary

homepage Cage (fork of clunit) BSD 2020

Update 13 June 2021 Clunit2 is a fork of Clunit. For quicklisp system loading purposes it is clunit2; for package naming purposes it is clunit, not clunit2 (see the short sketch after the list below). The differences between clunit2 and clunit are:

  • clunit2's ability to redirect reporting output,
  • clunit2's huge performance increase (clunit is painfully slow on any sized testing target)
  • clunit2's ability to test multiple value expressions
  • clunit2's suite signaling capability and
  • the fact that clunit2 has a maintainer.
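
To make the system and package naming concrete, a throwaway sketch (the test itself is meaningless):

(ql:quickload :clunit2)         ; the quicklisp/ASDF system is named clunit2
(clunit:deftest naming-check () ; but the symbols live in the CLUNIT package
  (clunit:assert-true t))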

Clunit2 does report all failing assertions within a test, has the option to turn off progress reporting and accepts user provided diagnostic strings with variables in the assertions. Going interactive with debugging is optional, fixtures are available as are suites and the ability to rerun failed tests. You can specify that a test that depends on other tests passing will be skipped if those prior tests fail.

With respect to the edge cases, as of the 13 June 2021 update, Clunit2 will accept variables declared in closures surrounding the test and does have the ability to completely test all the values returned from a values expression.

16.2 Assertion Functions

Clunit2's assertion functions are:

assert-condition assert-eq assert-eql
assert-equal assert-equality assert-equality*
assert-equalp assert-expands assert-fail
assert-false assert-true assertion-condition
assertion-conditions assertion-error assertion-expander
assertion-fail-forced assertion-failed assertion-passed

top

16.3 Usage

  1. Report Format

    Report format is controlled by the variable *clunit-report-format*. It can be set to :default, :tap or NIL. In all the examples showing reports, we will be using the default format.

    The progress report can be switched off by passing a keyword parameter to the functions run-test or run-suite.

    (run-test 'some-test-name :report-progress nil)
    
    (run-suite 'some-suite-name :report-progress nil)
    

    Clunit2, unlike clunit, has a *test-output-stream* variable which can be used to redirect the reports to a file or other stream.
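
    A sketch of what redirecting the report to a file might look like, assuming *test-output-stream* is the special variable mentioned above (the path and test name are made up):

    (with-open-file (s "clunit2-report.txt" :direction :output
                       :if-exists :supersede :if-does-not-exist :create)
      (let ((*test-output-stream* s))
        (run-test 'some-test-name :report-progress nil)))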

    To go interactive - dropping immediately into the debugger - you would set the keyword parameter :use-debugger to t.

    (run-test test-name :use-debugger t)
    
  2. Basics

    The most basic test. The empty form after the test name is for the name of the suite (if any). Just for fun, and since clunit2 has it, we will define a macro and show the assert-expands assertion function as well. We also include a diagnostic string in the assert-condition assertion with the division-by-zero error, but the same could be done for any assertion.

    (defmacro setq2 (v1 v2 e)
      (list 'progn (list 'setq v1 e) (list 'setq v2 e)))
    
    (deftest t1 ()
      "describe t1"
      (assert-true (=  1 1))
      (assert-false (=  1 2))
      (assert-eq 'a 'a)
      (assert-expands (PROGN (SETQ V1 3) (SETQ V2 3)) (setq2 v1 v2 3))
      (assert-condition division-by-zero
          (error 'division-by-zero)
        "testing condition assertions")
      (assert-condition simple-warning
          (signal 'simple-warning)))
    

    Running this to show the default report on a passing test. There is a progress report with dots indicating passed assertions, F indicating failed assertions and E if there is an error:

    (run-test 't1)
    
    PROGRESS:
    =========
        T1: .....
    
    SUMMARY:
    ========
        Test functions:
            Executed: 1
            Skipped:  0
    
        Tested 6 assertion.
            Passed: 6/6 (100.0%)
    

    You can switch off the progress report for running tests and suites by setting the keyword parameter :report-progress to nil:

    (run-test 't1 :report-progress nil)
    

    Now a basic failing test with multiple assertions (and also to see if the library can deal with values expressions). We will put diagnostic strings into a few of the assertions. The first assertion has a diagnostic string followed by two variables; the second has a diagnostic string but no variables. Unlike the diagnostic strings of some other frameworks, the string that gets passed does not accept format-like parameters.

    (deftest t1-fail ()
      "describe t1-fail"
      (let ((x 1) (y 2))
        (assert-equal x y  "This assert-equal test was meant to fail" x y)
        (assert-true (= 1 2) "This assert-true test was meant to fail")
        (assert-false (=  1 1))
        (assert-eq 'a 'b)
        (assert-expands (PROGN (SETQ V1 4) (SETQ V2 3)) (setq2 v1 v2 3))
        (assert-condition division-by-zero
            (error 'floating-point-overflow)
          "testing condition assertions")
        (assert-equalp (values 1 2) (values 1 3 4))))
    #<CLUNIT::CLUNIT-TEST-CASE {100FB04C83}>
    TF-CLUNIT> (run-test 't1-fail)
    
    PROGRESS:
    =========
        T1-FAIL: FFFFFE.
    
    FAILURE DETAILS:
    ================
        T1-FAIL: Expression: (EQUAL X Y)
                 Expected: X
                 Returned: 2
                 This assert-equal test was meant to fail
                 X => 1
                 Y => 2
    
        T1-FAIL: Expression: (= 1 2)
                 Expected: T
                 Returned: NIL
                 This assert-true test was meant to fail
    
        T1-FAIL: Expression: (= 1 1)
                 Expected: NIL
                 Returned: T
    
        T1-FAIL: Expression: (EQ 'A 'B)
                 Expected: 'A
                 Returned: B
    
        T1-FAIL: Expression: (MACROEXPAND-1 '(SETQ2 V1 V2 3))
                 Expected: (PROGN (SETQ V1 4) (SETQ V2 3))
                 Returned: (PROGN (SETQ V1 3) (SETQ V2 3))
    
        T1-FAIL: arithmetic error FLOATING-POINT-OVERFLOW signalled
    
    
    SUMMARY:
    ========
        Test functions:
            Executed: 1
            Skipped:  0
    
        Tested 7 assertions.
            Passed: 1/7 ( 14.3%)
            Failed: 5/7 ( 71.4%)
            Errors: 1/7 ( 14.3%)
    

    With respect to the values expression, we can see that it passed here, with only the first value being compared (but see the edge cases below on the 13 June 2021 update to values handling).

    You do not have to manually recompile a test after a tested function has been modified.

  3. Edge Cases: Values expressions, loops, closures and calling other tests

    Update 13 June 2021 As of the date of this update, Clunit2 will compare all the values from two values expressions. The following now properly fails.

    (deftest t2-values-expressions ()
      (assert-equal (values 1 2)
                    (values 1 3)))
    
    1. Looping and closures.

      Update 13 June 2021 Clunit2 will accept variables declared in a closure surrounding the test. The following passes.

      (let ((l1 '(#\a #\B #\z))
            (l2 '(97 66 122)))
      (deftest t2-loop ()
        (loop for x in l1 for y in l2 do
          (assert-equal (char-code x) y))))
      

      Clunit2 is quite happy to loop if the variables are defined within the test:

      (deftest t2-loop ()
        (let ((l1 '(#\a #\B #\z))
              (l2 '(97 66 122)))
          (loop for x in l1 for y in l2 do
            (assert-equal (char-code x) y))))
      
    2. Calling another test from a test

      We will call the second version of test t2 which has failures.

      (deftest t3 (); a test that tries to call another test in its body
          "describe t3"
                (assert-equal 'a 'a)
                (run-test 't2))
      
      (run-test 't3)
      
      PROGRESS:
      =========
          T3: .
      PROGRESS:
      =========
          T2: FF.
      
      FAILURE DETAILS:
      ================
          T2: Expression: (EQUAL 1 2)
              Expected: 1
              Returned: 2
      
          T2: Expression: (EQUAL 2 3)
              Expected: 2
              Returned: 3
      
      SUMMARY:
      ========
          Test functions:
              Executed: 1
              Skipped:  0
      
          Tested 3 assertions.
              Passed: 1/3 some tests not passed
              Failed: 2/3 some tests failed
      
      SUMMARY:
      ========
          Test functions:
              Executed: 1
              Skipped:  0
      
          Tested 1 assertion.
              Passed: 1/1 all tests passed
      
      SUMMARY:
      ========
          Test functions:
              Executed: 1
              Skipped:  0
      
          Tested 1 assertion.
              Passed: 1/1 all tests passed
      

      It reports each test separately (no composition), but correctly.

  4. Conditions

    The following fails as expected.

    (deftest t7-bad-error ()
      (assert-condition floating-point-overflow
         (error 'division-by-zero)
         "testing condition assertions. This should fail"))
    
  5. Suites, tags and other multiple test abilities
    1. Lists of tests

      clunit2 will not run lists of tests. You can run tests which run other tests. But otherwise you will need to set up suites.

    2. Suites

      Tests can be associated with multiple suites. The execution of tests within a suite is unordered by default, but you can specify that a test depends on other tests passing. If those tests do not pass, this test will be skipped. The test is put in a queue until its dependencies are satisfied. Both suite specifications and test dependencies are set in the first parameter form that we left empty in the above tests.

      Assume you wanted testC to be part of suites suiteA and suiteB and dependent on tests testA and testB passing.

      (deftest testC ((suiteA suiteB)(testA testB))
        ...)
      

      Suites are tested using the run-suite function. It has the following additional parameters (besides the obvious name for the suite to be tested):

      • If REPORT-PROGRESS is non-NIL, the test progress is reported.
      • If USE-DEBUGGER is non-NIL, the debugger is invoked whenever an assertion fails.
      • If STOP-ON-FAIL is non-NIL, the rest of the unit test is cancelled when any assertion fails or an error occurs.
      • If SIGNAL-CONDITION-ON-FAIL is non-NIL, run-suite will signal a TEST-SUITE-FAILURE condition if at least one test fails or signals an error condition.
      • If PRINT-RESULTS-SUMMARY is non-NIL, a summary of the test results is printed to standard output.
      (defsuite s0 ()); Ultimate parent suite if the library provides inheritance
      
        (deftest t4 (s0) ; a test that is a member of a suite
          "describe t4"
          (assert-eq 1 1))
      
        ;;a multiple assertion test that is a member of a suite with
        ;; a passing test, an error signaled and a failing test
        (deftest t4-error (s0)
          "describe t4-error"
          (assert-eq  'a 'a)
          (assert-condition error (error "t4-errored out"))
          (assert-true (= 1 2)))
      
        (deftest t4-fail (s0) ;
          "describe t4-fail"
          (assert-false (= 1 2)))
      
      (defsuite s1 (s0)); a sub-suite of suite s0 to check on inheritance
        (deftest t4-s1 (s1)
         (assert-true (= 1 1)))
      
      (run-suite 's0)
      
      PROGRESS:
      =========
          S0: (Test Suite)
              T4-FAIL: .
              T4-ERROR: ..F
              T4: .
      
              S1: (Test Suite)
                  T5: .
      FAILURE DETAILS:
      ================
      
          S0: (Test Suite)
              T4-ERROR: Expression: (= 1 2)
                        Expected: T
                        Returned: NIL
      SUMMARY:
      ========
          Test functions:
              Executed: 4
              Skipped:  0
      
          Tested 6 assertions.
              Passed: 5/6 some tests not passed
              Failed: 1/6 some tests failed
      
      FAILURE DETAILS:
      ================
      
          S0: (Test Suite)
              T4-ERROR: Expression: (= 1 2)
                        Expected: T
                        Returned: NIL
      SUMMARY:
      ========
          Test functions:
              Executed: 4
              Skipped:  0
      
          Tested 6 assertions.
              Passed: 5/6 some tests not passed
              Failed: 1/6 some tests failed
      
    3. Early termination

      You can stop a suite run when a test fails, without dropping into the debugger.

      (run-suite 'suite-name :stop-on-fail t)
      

      top

    4. Fixtures and Freezing Data
      (defclass fixture-data ()
        ((a :initarg :a :initform 0 :accessor a)
         (b :initarg :b :initform 0 :accessor b)))
      
      (deffixture s1 (@body) ;;IMPORTANT Note that the fixture gets the name of the suite to which it will apply
        (let ((x (make-instance 'fixture-data :a 100 :b -100)))
          @body))
      
      ;; create a sub suite and checking fixture inheritance
      (defsuite s2 (s1))
      
      (deftest t6-s1 (s1)
        (assert-equal  (a x) 100)
        (assert-equal   (b x) -100))
      
      (deftest t6-s2 (s2)
        (assert-equal  (a x) 100)
        (assert-equal   (b x) -100))
      
      (run-suite 's1)
      
      PROGRESS:
      =========
          S1: (Test Suite)
              T6-S1: ..
              T5: .
      
              S2: (Test Suite)
                  T6-S2: ..
      SUMMARY:
      ========
          Test functions:
              Executed: 3
              Skipped:  0
      
          Tested 5 assertions.
              Passed: 5/5 all tests passed
      SUMMARY:
      ========
          Test functions:
              Executed: 3
              Skipped:  0
      
          Tested 5 assertions.
              Passed: 5/5 all tests passed
      
  6. Removing tests
    (undeftest t1)
    (undeffixture fixture-name)
    (undefsuite suite-name)
    
  7. Sequencing, Random and Failure Only

    The execution of tests within a suite is unordered by default, but you can specify that a test depends on other tests passing. If those tests do not pass, this test will be skipped. Clunit2 has a function rerun-failed-tests to rerun failed tests.
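
    For example, to rerun only what failed in the last run (a sketch assuming the default arguments):

    (rerun-failed-tests)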

  8. Skip Capability

    Other than the dependency abilities previously mentioned, clunit2 has no additional skipping capability.

  9. Random Data Generators

    Clunit2 has no built in data generators.

16.4 Discussion

Clunit2 is a substantial step forward from clunit and should be considered the successor.

17 com.gigamonkeys.test-framework

17.1 Summary

homepage Peter Seibel BSD 2010

This is a basic testing framework without the bells and whistles found in several of the others. For example, it lacks fixtures or suites. Nothing wrong with it but you can find a lot more functionality elsewhere.

17.2 Assertion Functions

check expect

Gigamonkeys has a limited range of assertion functions. expect covers conditions and errors. check is the equivalent of is in e.g. Fiveam.

top

17.3 Usage

One thing to note on setup. If you are using quicklisp, the quickload system name is:

(ql:quickload :com.gigamonkeys.test-framework)

however the package name, at least in sbcl, is com.gigamonkeys.test

  1. Report Format

    Gigamonkeys tests can be set to go into the debugger on errors, on failures or never. This is controlled by the settings of *debug* (for error conditions) and *debug-on-fail* (for test failures).
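
    A minimal sketch, taking the two variable names from the text above and the package name from the setup note (:: is used in case the variables are not exported):

    (setf com.gigamonkeys.test::*debug* nil)         ; report unexpected conditions rather than breaking
    (setf com.gigamonkeys.test::*debug-on-fail* nil) ; report failed checks rather than breaking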

  2. Basics

    The most basic passing test. Unlike most other frameworks, the empty form after the test name is for parameters which can be passed to the test. Also, unlike many other frameworks, the test function is called with the unquoted name of the test.

    (deftest t1 (x)
      (check (= 1 x))
      (expect division-by-zero (error 'division-by-zero)))
    
    (test t1 1)
    Okay: 2 passes; 0 failures; 0 aborts.
    T
    2
    0
    0
    

    Now a basic failing test.

    To go interactive - dropping immediately into the debugger for unexpected conditions, set *debug* to t.

    To drop immediately into the debugger when a test fails, set *debug-on-fail* to t. This is the default, but we will set it to nil for these examples.

    (deftest t1-fail (); the most basic failing test
      (let ((x 1) (y 2))
        (check (= x y) )))
    
    NIL
    TEST> (test t1-fail)
    FAIL ... (T1-FAIL): (= X Y)
      X                 => 1
      Y                 => 2
      (= X Y)           => NIL
    NOT okay: 0 passes; 1 failures; 0 aborts.
    NIL
    0
    1
    0
    
    

    You do not have to manually recompile a test after a tested function has been modified.

  3. Edge Cases: Values expressions, loops, closures and calling other tests

    Gigamonkeys has no special functionality for dealing with values expressions. It accepts them but merely looks at the initial value provided by each expression. The following passes.

    (deftest t2-values-expressions ()
      (check (equal (values 1 2)
                    (values 1 3))))
    
    1. Closures.

      Gigamonkeys has no problem with variables declared in a closure encompassing the test.

    2. Calling another test from a test
      (deftest t3 (); a test that tries to call another test in its body
        (check (eql 'a 'a))
        (test t2))
      
      (test t3)
      Okay: 3 passes; 0 failures; 0 aborts.
      Okay: 4 passes; 0 failures; 0 aborts.
      T
      4
      0
      0
      

      So far so good, but no composition.

  4. Conditions

    The following immediately throws us into the debugger. If we hit PROCEED, we will get the feedback shown below.

      (deftest t7-bad-error ()
        (expect division-by-zero
           (error 'floating-point-overflow)))
    
      (test t7-bad-error)
    
    ABORT ... (T7-BAD-ERROR): arithmetic error FLOATING-POINT-OVERFLOW signalled
    NOT okay: 0 passes; 0 failures; 1 aborts.
    NIL
    0
    0
    1
    
  5. Suites, tags and other multiple test abilities
    1. Lists of tests

      Gigamonkeys' test function will not accept a list of tests.

    2. Suites

      Gigamonkeys can run all the tests associated with a package, but if you want to define "suites", you should write your own function that runs specific tests.
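
      A hand-rolled "suite" is then just another test (or a plain function) that calls the tests you care about; a sketch reusing t1 and t3 from above (no composition of the results, as noted earlier):

      (deftest my-suite ()
        (test t1 1)
        (test t3))

      (test my-suite)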

  6. Fixtures and Freezing Data

    None

  7. Removing tests

    Gigamonkeys has functions remove-test-function and clear-package-tests

  8. Sequencing, Random and Failure Only

    None

  9. Skip Capability

    None

  10. Random Data Generators

    None

17.4 Discussion

top

17.5 Who Uses com.gigamonkeys.test-framework

("monkeylib-text-output")

top

18 fiasco

18.1 Summary

homepage João Távora BSD 2 Clause 2020

In spite of the fact that Fiasco does not have its own fixture capability (unless I am missing something), it managed to hit most of the other concerns that I have. It does report all failing assertions within a test, has the option to turn off progress reporting and accepts user provided diagnostic strings with variables in the assertions. Going interactive with debugging is optional, suites are available as is the ability to rerun failed tests. It has skipping functions and, with respect to the edge cases, it handles variables declared in a closure surrounding the test. When a test calls another test it actually manages to compose the results rather than reporting two separate sets.

I found using the suite capability to be confusing (I would likely end up defining packages instead of suites, but you lose some composition that way). It also does not have the ability some other frameworks have to deal with values expressions.

18.2 Assertion Functions

is finishes not-signals signals

top

18.3 Usage

  1. Report Format

    Fiasco defaults to a non-interactive report format. To go interactive, run the test with :interactive t.

    There are two slightly different versions of reporting format, the default and running the test with :verbose t. The verbose version simply adds the docstring for the test, so it does not really add much.

    In the following example, look at the difference in reporting between the four assertions. The first assertion has an = predicate comparing numbers. The second has an equal predicate comparing numbers. The third compares variables with = and has a diagnostic string that accepts format parameters, followed by the variables to be substituted into it. The fourth expects a condition other than the one actually signalled.

    (deftest t1-fail ()
      "Docstring for test t1-fail"
      (let ((x 1) (y 2))
        (is (= 1 2))
        (is (equal 1 2))
        (is (= x y)
            "This test was meant to fail because we know ~a is not = to ~a"
            x y )
        (signals division-by-zero
                 (error 'floating-point-overflow)
                 "testing condition assertions. This should fail")))
    
    (run-tests 't1-fail)
    T1-FAIL...................................................................[FAIL]
    
    Test run had 4 failures:
    
    Failure 1: UNEXPECTED-ERROR when running T1-FAIL
    arithmetic error FLOATING-POINT-OVERFLOW signalled
    
    Failure 2: FAILED-ASSERTION when running T1-FAIL
    This test was meant to fail because we know 1 is not = to 2
    
    Failure 3: FAILED-ASSERTION when running T1-FAIL
    Binary predicate (EQUAL X Y) failed.
    x: 1 => 1
    y: 2 => 2
    
    Failure 4: FAILED-ASSERTION when running T1-FAIL
    Binary predicate (= X Y) failed.
    x: 1 => 1
    y: 2 => 2
    NIL
    (#<test-run of T1-FAIL: 1 test, 4 assertions, 4 failures in NIL sec (3 failed assertions, 1 error, none expected)>)
    

    The interactive version would look like this:

    (run-tests 't1-fail :interactive t)
      Test assertion failed when running T1-FAIL:
    
      Binary predicate (= X Y) failed.
      x: 1 => 1
      y: 2 => 2
         [Condition of type FIASCO::FAILED-ASSERTION]
    
      Restarts:
       0: [CONTINUE] Roger, go on testing...
       1: [CONTINUE] Skip the rest of the test T1-FAIL and continue by returning (values)
       2: [RETEST] Rerun the test T1-FAIL
       3: [CONTINUE-WITHOUT-DEBUGGING] Turn off debugging for this test session and invoke the first CONTINUE restart
       4: [CONTINUE-WITHOUT-DEBUGGING-ERRORS] Do not stop at unexpected errors for the rest of this test session and continue by invoking the first CONTINUE restart
       5: [CONTINUE-WITHOUT-DEBUGGING-ASSERTIONS] Do not stop at failed assertions for the rest of this test session and continue by invoking the first CONTINUE restart
    
  2. Basics

    The empty form after the test name is for parameters to pass to the test. Calling the test using run-tests returns a list of context objects (instances of an internal class). Each test run is pushed to a history of test runs kept in the appropriately named *test-result-history*.

    (deftest t1 ()
      "docstring for t1"
      (is (=  1 1) "first assertion")
      (is (eq 'a 'a) "second assertion")
      (signals division-by-zero (error 'division-by-zero))
      (finishes (+ 1 1)))
    T1
    (run-tests 't1) ;; or (run-tests '(t1))
    T1........................................................................[ OK ]
    
    T
    (#<test-run of T1: 1 test, 4 assertions, 0 failures in 1.4e-5 sec>)
    

    If you add the keyword parameter :verbose, you get slightly more information in that it prints the test docstring (but not the assertion docstrings) and the number of assertions, failures etc.

    (run-tests 't1 :verbose t)
    T1........................................................................[ OK ]
    (docstring for t1)
    (4 assertions, 0 failed, 0 errors, 0 expected)
    

    Fiasco tests are funcallable. You will note that calling the test in this fashion returns a single test-run object rather than a list of test-run objects.

    (t1)
    .
    T
    #<test-run of T1: 1 test, 1 assertion, 0 failures in 2.8e-5 sec>
    
    (funcall 't1)
    .
    T
    #<test-run of T1: 1 test, 4 assertion, 0 failures in 2.6e-5 sec>
    

    Fiasco tests also take parameters as in this example:

    (deftest t1-param (x) (is (= 1 x)))
    
    (t1-param 1)
    #<test-run of T1-PARAM: 1 test, 1 assertion, 0 failures in 2.2e-5 sec>
    
    (t1-param 2)
    X; Evaluation aborted on #<FIASCO::FAILED-ASSERTION "Binary assertion function ~A failed.~%~
                                   x: ~S => ~S~%~
                                   y: ~S => ~S" {100297EE03}>.
    

    You do not have to manually recompile a test after a tested function has been modified. We will skip the proof.

  3. Edge Cases: Values expressions, loops, closures and calling other tests

    Fiasco has no special functionality for dealing with values expressions. It accepts them but merely looks at the initial value provided by each expression. The following passes.

    (deftest t2-values-expressions ()
      (is (equalp (values 1 2)
                    (values 1 3))))
    
    1. Looping and closures.

      Fiasco has no problems with using variables declared in a closure surrounding the test.

    2. Calling another test from a test
      (deftest t3 (); a test that tries to call another test in its body
          (is (eq 'a 'a))
          (t2))
      
      (t3)
      .XX.
      T
      #<test-run of T3: 2 tests, 4 assertions, 2 failures in 0.063884 sec (2 failed assertions, 0 errors, none expected)>
      

      As hoped, the failures in t2 kicked us into the debugger where we could select continue and correctly end up with 2 tests, 4 assertions and 2 failures. This is better than most frameworks which would present us with two reports rather than a composed report.

  4. Suites, tags and other multiple test abilities
    1. Lists of tests

      Fiasco has no problem running lists of tests

      (run-tests '(t1 t2))
      T1........................................................................[ OK ]
      
      T2........................................................................[ OK ]
      
      T
      (#<test-run of T1: 1 test, 1 assertion, 0 failures in 1.4e-5 sec>
       #<test-run of T2: 1 test, 3 assertions, 0 failures in 5.0e-6 sec>)
      
    2. Suites

      Fiasco can test for suites or all tests associated with a fiasco defined package.

      Let's start with packages.

      1. Packages

        You will need to use fiasco's define-test-package macro rather than define-package in order to use the run-package-tests function. Inside the new package, the function run-package-tests is the preferred way to execute the suite. To run the tests from outside, use run-tests.

        The function run-package-tests will print a report and returns two values. It accepts a :stream keyword parameter, making it easy to redirect the output to a file if so desired.

        The first value returned will be t if all tests passed, nil otherwise. The second value will be a list of context objects which contain various information about the test run. See the following example, modified slightly from https://github.com/joaotavora/fiasco/blob/master/test/suite-tests.lisp.

        (fiasco:define-test-package #:tf-fiasco-examples)
        
        (in-package :tf-fiasco-examples)
        
        (defun seconds (hours-and-minutes)
          (+ (* 3600 (first hours-and-minutes))
             (* 60 (second hours-and-minutes))))
        
        (defun hours-and-minutes (seconds)
          (list (truncate seconds 3600)
                (truncate seconds 60)))
        
        (deftest test-conversion-to-hours-and-minutes ()
          (is (equal (hours-and-minutes 180) '(0 3)))
          (is (equal (hours-and-minutes 4500) '(1 15))))
        
        (deftest test-conversion-to-seconds ()
          (is (= 60 (seconds '(0 1))))
          (is (= 4500 (seconds '(1 15)))))
        
        (deftest double-conversion ()
          (is (= 3600 (seconds (hours-and-minutes 3600))))
          (is (= 1234 (seconds (hours-and-minutes 1234)))))
        
        (deftest test-skip-test ()
          (skip)
          ;; These should not affect the test statistics below.
          (is (= 1 1))
          (is (= 1 2)))
        
        (run-package-tests :package :tf-fiasco-examples)
        TF-FIASCO-EXAMPLES (Suite)
          TEST-CONVERSION-TO-HOURS-AND-MINUTES....................................[FAIL]
          TEST-CONVERSION-TO-SECONDS..............................................[ OK ]
          DOUBLE-CONVERSION.......................................................[FAIL]
          TEST-SKIP-TEST..........................................................[SKIP]
        
        Test run had 3 failures:
        
          Failure 1: FAILED-ASSERTION when running DOUBLE-CONVERSION
            Binary assertion function (= X Y) failed.
            x: 1234 => 1234
            y: (SECONDS (HOURS-AND-MINUTES 1234)) => 1200
        
          Failure 2: FAILED-ASSERTION when running DOUBLE-CONVERSION
            Binary assertion function (= X Y) failed.
            x: 3600 => 3600
            y: (SECONDS (HOURS-AND-MINUTES 3600)) => 7200
        
          Failure 3: FAILED-ASSERTION when running TEST-CONVERSION-TO-HOURS-AND-MINUTES
            Binary assertion function (EQUAL X Y) failed.
            x: (HOURS-AND-MINUTES 4500) => (1 75)
            y: '(1 15) => (1 15)
        NIL
        (#<test-run of TF-FIASCO-EXAMPLES: 5 tests, 6 assertions, 3 failures in 5.5e-4 sec (3 failed assertions, 0 errors, none expected)>)
        

        You can drop the explanations of the failures by passing nil to :describe-failures

        (run-package-tests :package :tf-fiasco-examples :describe-failures nil)
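
        And a sketch of redirecting the report to a file, using the :stream keyword parameter mentioned earlier (the path is made up):

        (with-open-file (out "fiasco-report.txt" :direction :output
                             :if-exists :supersede :if-does-not-exist :create)
          (run-package-tests :package :tf-fiasco-examples :stream out))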
        

        There is an undocumented function run-failed-tests which looks at the last test run. My issue with this function is that it seems to need *debug-on-assertion-failure* and *debug-on-unexpected-error* set to T in order to work, which means that it forces me into the debugger whether I want it to or not.

      2. Suites

        Suites are created by the (defsuite) macro, but they are really just tests that call other tests. Phil Gold's original concern about suites in Stefil was "My only problem with the setup is that I don't see a way to explicitly assign tests to suites, aside from dynamically binding stefil::*suite*. Normally, the current suite is set by in-suite, which requires careful attention if you're jumping between different suites often. (A somewhat mitigating factor is that tests remember which suite they were created in, so the current suite only matters for newly-defined tests.)" I think that concern is just as valid in fiasco.

        I find using suites in fiasco very confusing. Everything I looked at in quicklisp that used fiasco used run-package-tests rather than run-suite-tests. YMMV.

        top

  5. Fixtures and Freezing Data

    None that I am aware of.

  6. Removing tests

    Fiasco has the ability to delete tests, but it is not an exported function:

    (fiasco::delete-test 't1)
    
  7. Sequencing, Random and Failure Only

    Fiasco has a function run-failed-tests to run the tests that failed last time.

  8. Skip Capability
    1. Assertions

      Fiasco has skip functions skip and skip-unless. The following will cause the test to skip the second assertion.

      (deftest test-skip-test ()
         (is (= 1 1))
         (skip)
         (is (= 1 2)))
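
      skip-unless works the same way but takes a condition; a sketch, assuming it skips the rest of the test when the condition is false (the feature check here is arbitrary):

      (deftest test-threaded-only ()
        (skip-unless (member :sb-thread *features*))
        (is (= 1 1)))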
      
  9. Random Data Generators

    None

18.4 Additional Discussion

While Fiasco has a function to re-run failed tests, if I wanted to collect the names of the failing tests so that I could save them for some other purpose, I might do something like:

(defun collect-test-failure-names (package-name)
  "Runs a package test on the package and returns the names of the failing tests"
  (multiple-value-bind (x results)
      (run-package-tests :package package-name :describe-failures nil)
    (declare (ignore x))
    (let ((result (first results)))
      (when (typep result 'fiasco::context)
        (loop for test-result in (fiasco::children-contexts-of result)
              when (fiasco::failures-of test-result)
                collect (fiasco::name-of (fiasco::test-of test-result)))))))
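
Run against the example package from the suites section above, the call and an illustrative return value would look like:

(collect-test-failure-names :tf-fiasco-examples)
;; => e.g. (DOUBLE-CONVERSION TEST-CONVERSION-TO-HOURS-AND-MINUTES)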

top

18.5 Who Uses Fiasco

At last count 24 libraries on quicklisp use Fiasco. If you have quicklisp, you can get a list of those with:

(ql:who-depends-on :fiasco)

top

19 fiveam

19.1 Summary

homepage Edward Marco Baringer BSD 2020

Fiveam has a lot of market share. At the time of writing, it has 9 issues and 7 pull requests with no responses. The README at the github page is lacking but good documentation exists at common-lisp.net or the turtleware tutorial.

Obviously with its market share there is a lot to like. It does report all the assertion failures in a test, allows user defined diagnostic messages with variables, interactive debugging is optional, it has suites and it runs lists of tests.

From a speed standpoint, it is either middle of the pack or, on a big test package and running in an emacs repl, vying with clunit for painfully slow. In such a case, if you are deciding between more tests with fewer assertions or fewer tests with more assertions, go with more tests and fewer assertions (but this is an emacs problem more than a fiveam problem). I do not know what using other editors would be like.

My wishlist for Fiveam is better fixture capability, the edge case abilities to handle values expressions and variables declared in closures surrounding the test, and getting rid of all those blank lines in the failure reports.

19.2 Assertion Functions

is is-false finishes signals fail pass skip

19.3 Usage

Generally speaking, tests are called using the run and run! functions. If you set *run-test-when-defined* to T, tests will be run as soon as they are defined (which includes hitting C-c C-c in the source code, assuming you are doing this in an editor with slime, sly or some such).

  1. Report Format

    Fiveam will default to a reporting format. The format you get will depend on whether you call run or run!.

    • run provides a progress report using the typical dot/f/e format and returns a list of all the assertion result objects.
    • run! provides the progress report plus more details on the failures, but does not return the passing assertion result objects.

    The following will allow you to turn off the progress report:

    (let ((fiveam:*test-dribble* (make-broadcast-stream)))
      (fiveam:run! …))
    

    To demonstrate the difference in the reports assume the following test that has a couple of passes and a couple of failures. We will insert a diagnostic string in the second assertion with a couple of variables to use in the string.

    (test t1-fail
      "describe t1-fail"
      (let ((x 1) (y 2))
        (is (eql 1 2))
        (is (equal x y)
            "We deliberately ensured that the first parameters ~a is not equal to the second parameter ~a" x y)
        (is-false (eq 'b 'b))
        (pass "I do not want to run this test of ~a but will say it passes anyway" '(= 1 1))
        (skip "Skip the next test because reasons")
        (finishes (+ 1 2))
        (signals division-by-zero (error 'floating-point-overflow))))
    

    Now using the simple run, we get:

    (run 't1-fail)
    
    Running test T1-FAIL fff.ss.X
    (#<IT.BESE.FIVEAM::UNEXPECTED-TEST-FAILURE {10023556C3}>
     #<IT.BESE.FIVEAM::TEST-PASSED {10023550B3}>
     #<IT.BESE.FIVEAM::TEST-SKIPPED {1002354F43}>
     #<IT.BESE.FIVEAM::TEST-PASSED {1002354CF3}>
     #<IT.BESE.FIVEAM::TEST-FAILURE {1002354043}>
     #<IT.BESE.FIVEAM::TEST-FAILURE {1002353463}>
     #<IT.BESE.FIVEAM::TEST-FAILURE {1002352D03}>)
    

    We can immediately see that (run) gives us a progress report showing f for failure, . for pass, s for skip and X for an error and a list of test-result objects. We can get more details, including the diagnostic messages using (run!).

      (run! 't1-fail)
    
    
    Running test T1-FAIL fff.ss.X
     Did 7 checks.
        Pass: 2 (28%)
        Skip: 1 (14%)
        Fail: 4 (57%)
    
     Failure Details:
     --------------------------------
     T1-FAIL in S0 [describe t1-fail]:
    
    2
    
     evaluated to
    
    2
    
     which is not
    
    EQL
    
     to
    
    1
    
    
     --------------------------------
     --------------------------------
     T1-FAIL in S0 []:
          We deliberately ensured that the first parameters 1 is not equal to the second parameter 2
     --------------------------------
     --------------------------------
     T1-FAIL in S0 []:
          (EQ 'B 'B) returned the value T, which is true
     --------------------------------
     --------------------------------
     T1-FAIL in S0 []:
          Unexpected Error: #<FLOATING-POINT-OVERFLOW {100236BE63}>
    arithmetic error FLOATING-POINT-OVERFLOW signalled.
     --------------------------------
    
     Skip Details:
     T1-FAIL []:
         Skip the next test because reasons
    
    NIL
    (#<IT.BESE.FIVEAM::UNEXPECTED-TEST-FAILURE {100236C3C3}>
     #<IT.BESE.FIVEAM::TEST-FAILURE {100236AD43}>
     #<IT.BESE.FIVEAM::TEST-FAILURE {100236A163}>
     #<IT.BESE.FIVEAM::TEST-FAILURE {1002369D33}>)
    (#<IT.BESE.FIVEAM::TEST-SKIPPED {100236BC43}>)
    

    Did you notice anything about the test results returned using run compared to run!? run! did not return any test-passed results.

    Personally I hate the immense amount of wasted space fiveam generates using run!.

    By the way, if we set *verbose-failures* to T, it will add the failing expression to the failure details.
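
    For example (assuming the *verbose-failures* symbol is accessible from your test package):

    (setf *verbose-failures* t) ; subsequent run!/run reports now include the failing forms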

    Fiveam does have optionality to drop into the debugger on errors or failures. You can set those individually:

    • (setf *on-error* :debug) if we should drop into the debugger on error, :backtrace for backtrace or nil otherwise.
    • (setf *on-failure* :debug) if we should drop into the debugger on failure, :backtrace for backtrace or nil otherwise.
  2. Basics

    We already saw a test using Fiveam's testing functions with some passes and fails above, so we will skip repeating ourselves.

    As expected, you do not have to manually recompile a test after a tested function has been modified.

  3. Edge Cases: Values expressions, loops, closures and calling other tests

    Now let's try a test with values expressions:

    (test t2 ; the most basic named test with multiple assertions and values expressions
      "describe t2"
      (let ((x 1) (y 2))
        (is (equal 1 2))
        (is (equal x y))
        (is (equal (values 1 2) (values 1 2)))))
    ; in: ALEXANDRIA:NAMED-LAMBDA %TEST-T2
    ;     (IT.BESE.FIVEAM:IS (EQUAL (VALUES 1 2) (VALUES 1 2)))
    ;
    ; caught ERROR:
    ;   during macroexpansion of (IS (EQUAL # #)). Use *BREAK-ON-SIGNALS* to intercept.
    ;
    ;    Both the expected and actual part is a values expression.
    ;
    ; compilation unit finished
    ;   caught 1 ERROR condition
    

    Fiveam threw an error on the values expression at compile time, but continued with the compilation.

    Now to run the failing test.

    (run! 't2)
    
    Running test T2 ffX
     Did 3 checks.
        Pass: 0 ( 0%)
        Skip: 0 ( 0%)
        Fail: 3 (100%)
     Failure Details:
     --------------------------------
     T2 [describe t2]:
    2
     evaluated to
    2
     which is not
    EQUAL
     to
    1
     --------------------------------
     --------------------------------
     T2 [describe t2]:
    Y
     evaluated to
    2
     which is not
    EQUAL
     to
    1
     --------------------------------
     --------------------------------
     T2 [describe t2]:
          Unexpected Error: #<SB-INT:COMPILED-PROGRAM-ERROR {102D38B283}>
    Execution of a form compiled with errors.
    Form:
      (IS (EQUAL (VALUES 1 2) (VALUES 1 2)))
    Compile-time error:
      during macroexpansion of (IS (EQUAL # #)). Use *BREAK-ON-SIGNALS* to intercept.
    
     Both the expected and actual part is a values expression..
     --------------------------------
    NIL
    (#<IT.BESE.FIVEAM::UNEXPECTED-TEST-FAILURE {102D38BB73}>
     #<IT.BESE.FIVEAM::TEST-FAILURE {102D38B1B3}>
     #<IT.BESE.FIVEAM::TEST-FAILURE {102D38AD93}>)
    NIL
    

    Notice the ffX in the report. The f's indicate failing assertions. The X indicates that the assertion threw an error instead of failing. No, fiveam does not like values expressions.

    What happens if we try to call a test inside a test?

    (test t3
        "a test that tries to call another test in its body"
        (is (equal 'a 'a))
        (run! 't2))
    
    (run! 't3)
    
    Running test T3 .
    Running test T2 ..X
     Did 3 checks.
        Pass: 2 (66%)
        Skip: 0 ( 0%)
        Fail: 1 (33%)
    
     Failure Details:
     --------------------------------
     T2 in S1 [describe t2]:
          Unexpected Error: #<SB-INT:COMPILED-PROGRAM-ERROR {1008EE35A3}>
    Execution of a form compiled with errors.
    Form:
      (IS (EQUAL (VALUES 1 2) (VALUES 1 2)))
    Compile-time error:
      during macroexpansion of (IS (EQUAL # #)). Use *BREAK-ON-SIGNALS* to intercept.
    
     Both the expected and actual part is a values expression..
     --------------------------------
    
     Did 1 check.
        Pass: 1 (100%)
        Skip: 0 ( 0%)
        Fail: 0 ( 0%)
    
    T
    

    So we can run tests within tests.

    1. Closures Variables

      Fiveam cannot find variables declared in a closure surrounding the test. For example, the following fails.

      (let ((l1 '(#\a #\B #\z))
            (l2 '(97 66 122)))
        (test t2-loop
          (loop for x in l1 for y in l2 do
            (is (= (char-code x) y)))))
      
  4. Suites, tags and other multiple test abilities
    1. Lists of tests

      Fiveam can run lists of tests

      (run! '(t1 t2))
      
    2. Suites

      Suites are relatively straightforward in fiveam so long as you remember that you need to define yourself as being in a suite and any test defined after that will be placed in that suite. I surprised myself once after compiling a test file and then defining some tests in the REPL. As far as fiveam was concerned I was still in the suite defined in the test file, so the tests defined in the REPL had been added to the suite.

        (def-suite :s0 ; Ultimate parent suite
          :description "describe suite 0")
      
       (in-suite :s0)
      ;; Any test defined after this will be in suite s0 until a new suite is specified
      
        (test t4  ; a test that is a member of a suite
          "describe t4"
          (is (equal 1 1)))
      
      (run! :s0)
      

      Suites can be nested. Here we have suite :s1 that is nested in suite :s0

      (def-suite :s1
        :in :s0)
      
  5. Fixtures and Freezing Data

    As far as I can tell, fixtures and freezing data are basically the same for Fiveam. The fiveam maintainer admits that maybe its fixture capability is not "the best designed feature".

    ;; Create a class for data fixture purposes
    (defclass class-A ()
      ((a :initarg :a :initform 0 :accessor a)
       (b :initarg :b :initform 0 :accessor b)))
    
    (defparameter *some-existing-data-parameter*
      (make-instance 'class-A :a 17.3 :b -12))
    
    (def-fixture f1 ()
      (let ((old-parameter *some-existing-data-parameter*))
        (setf *some-existing-data-parameter*
            (make-instance 'class-A :a 100 :b -100))
        (&body)
        (setf *some-existing-data-parameter* old-parameter)))
    
    (def-test t6-f1 (:fixture f1)
      (is (equal (a *some-existing-data-parameter*) 100))
      (is (equal (b *some-existing-data-parameter*) -100)))
    
    ;; now you can check (a *some-existing-data-parameter*) to ensure defining the test has not changed *some-existing-data-parameter*
    
    (run! 't6-f1)
    
    Running test T6-F1 ..
     Did 2 checks.
        Pass: 2 (100%)
        Skip: 0 ( 0%)
        Fail: 0 ( 0%)
    

    top

  6. Removing tests

    Fiveam has the functions rem-test and rem-fixture

  7. Sequencing, Random and Failure Only

    The tests are randomly shuffled. The run! function will return a list of failed-test objects (the run function does not).

  8. Skip Capability

    Fiveam has some skip capability.

  9. Random Testing and Data Generators

    Fiveam provides generator functions (each returning a lambda) for buffers, characters, floats, integers, lists, one-element picks from a sequence, strings and trees. Some examples:

    (funcall (gen-float))
    1.3259344e38
    
    (funcall (gen-buffer))
    #(115 238 129 72 84 40 230)
    
    (funcall (gen-character :code-limit 256))
    #\Etx
    
    (funcall (gen-integer :max 27 :min -16))
    -4
    
    (funcall (gen-list ))
    (-1 4)
    
    (funcall (gen-string))
    "򅦜􇨲򫎂𣻨򋷂񋖧􌽆󗍨𪽉𴾻󮨠󙢝鞀󻕨򐓺蠿𬚽𬁬񭷱򐖴㍨󀜤󘛋򉚇򓉛𠫼򞼫񸔝𺍬񴫰㽈󽜔󇠰񅉳鉄󠪔"
    
    (funcall (gen-string :elements (gen-character :code-limit 122 :alphanumericp t)))
    "exAarlUllrgsQZQAnUYeKIbZQuPYAKNLvTyMcIYlLoYS"
    
    (funcall (gen-tree :size 10))
    ((((-2 ((-3 6) (2 ((3 (6 (10 10))) ((10 4) -9))))) (-8 -8))
      (((-7 8) -3) (-10 ((((1 -5) (6 ((-9 -6) 4))) ((5 -9) (0 (-4 -8)))) -2))))
     (((((9 (5 ((3 -1) ((0 -10) -5)))) (((4 (7 -8)) (-5 (6 7))) -4)) -3)
       (6 (2 ((-5 6) (2 (((9 -1) -5) -5))))))
      6))
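
    These generators are normally handed to fiveam's for-all macro for random (property-based) testing, which the snippets above do not show. A minimal sketch using the usual for-all syntax:

    (test addition-commutes
      "Random check that integer addition commutes."
      (for-all ((a (gen-integer :min -100 :max 100))
                (b (gen-integer :min -100 :max 100)))
        (is (= (+ a b) (+ b a)))))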
    

19.4 Discussion

If you recall, run returns a list of all test-result objects, but run! returns just the failing test-result objects. If you wanted to use run, but just wanted a list of the failing test names, you can do something like the following:

(defun collect-failing-test-case-names (suite)
  "Takes a suite, calls the run function and returns a list of the test names that failed."
  (loop for x in (run suite)
        when (typep x 'fiveam::test-failure)
          collect (fiveam::name (fiveam::test-case x))))

(collect-failing-test-case-names :s0)

Running test suite S0
Running test T4 .
Running test T4-ERROR ..f
Running test T4-FAIL f
Running test T6-F1 ..
Running test T5 f
Running test T4-FAIL-2 f
(T4-ERROR T4-FAIL T5 T4-FAIL-2)

Fiveam does not have a time limit threshold that you can set like Parachute or Prove, but you can set a *max-trials* variable to prevent infinite loops. It also has undocumented profiling capability that I did not look at.

19.5 Who Uses Fiveam

Many libraries on quicklisp use fiveam. If you have quicklisp, you can get a list of those with:

(ql:who-depends-on :fiveam)

top

20 kaputt

20.1 Summary

homepage Michaël Le Barbier MIT 2020

Kaputt is a new entry by someone who found the existing frameworks (based on his experience with Stefil and Fiveam) either too complicated, lacking enough debugging information to know exactly where the problem is, or not extensible in the sense of adding additional assertions. I am sure that Kaputt meets his needs, but would not meet the needs of other users.

It does report all failing assertions in a test, but throws you into the debugger whether you want to go or not (at least you can hit 'continue'). It has suites but no fixtures and does not allow you to provide user created diagnostic strings for the assertions. On the plus side it has some really nice floating point assertions that are not found elsewhere.

20.2 Assertion Functions

assert=  
assert-eq assert-eql
assert-equal assert-float-is-approximately-equal
assert-float-is-definitely-greater-than assert-float-is-definitely-less-than
assert-float-is-essentially-equal assert-nil
assert-p assert-set-equal
assert-string-equal assert-string<
assert-string<= assert-string=
assert-string> assert-string>=
assert-subsetp assert-t
assert-true assert-type
assert-vector-equal  

I am surprised that while Kaputt has various assertions for floats, it does not have an assertion for equalp or condition types.

Kaputt also provides a macro for defining more assertions.

top

20.3 Usage

  1. Report Format

    Test failures in Kaputt will throw you immediately into the debugger.

  2. Basics

    Tests in Kaputt are functions. Unless we are calling multiple tests in the following examples, we will just call the test-case function itself. The empty form after the test name is not really described in the documentation, but can be used in parameterized test cases.

    The basic passing test below shows the floating point comparisons built into kaputt.

    (define-testcase t1 ()
      "describe t1"
      (assert-t (=  1 1))
      (assert-string< "abc" "def")
      (assert-float-is-approximately-equal 5.100000 5.1000001)
      (assert-float-is-essentially-equal 5.100000 5.1000001)
      (assert-float-is-definitely-greater-than 5.100001 5.100000)
      (assert-float-is-definitely-less-than 5.100000 5.100001)
      (assert-equal (values 1 2) (values 1 2)))
    
    (t1)
      .......
    
      Test suite ran 7 assertions split across 1 test cases.
       Success: 7/7 (100%)
       Failure: 0/7 (0%)
    
    

    A parameterized test case:

    (define-testcase t1-p (y) ; the most basic parameterized test
      (let ((x 1))
        (assert-equal x y)))
    
    (t1-p 1)
    

    Now a basic failing test. This time we are using a more specific assertion, assert-equal. Unlike some other frameworks, we cannot pass a descriptive string to the assertion. Notice we immediately get thrown into the debugger. This is not optional - in you go.

    (define-testcase t1-fail () ; the most basic failing test
      (let ((x 1) (y 2))
        (assert-equal x y)
        (assert-equal y 3)))
    
    (t1-fail)
    
    Test assertion failed:
    
      (ASSERT-EQUAL X Y)
    
    The assertion (ASSERT-EQUAL A B) is true, iff A and B satisfy the EQUAL assertion function.
       [Condition of type ASSERTION-FAILED]
    
    Restarts:
     0: [CONTINUE] Record a failure for ASSERT-EQUAL and continue testing.
     1: [IGNORE] Record a success for ASSERT-EQUAL and continue testing.
     2: [RETRY] Retry ASSERT-EQUAL.
     3: [SKIP] Skip the rest of test case T1-FAIL and continue testing.
     4: [RETRY] Retry SLIME REPL evaluation request.
     5: [*ABORT] Return to SLIME's top level.
    

    Take a look at the first restart. We can continue to the next assertion and eventually get a report, in this case reflecting two failures:

    (t1-fail)
    EE
    
    Test suite ran 2 assertions split across 1 test cases.
     Success: 0/2 (0%)
     Failure: 2/2 (100%)
    
    List of failed assertions:
     Testcase T1-FAIL:
        (ASSERT-EQUAL Y 3)
        (ASSERT-EQUAL X Y)
    

    You do not have to manually recompile a test after a tested function has been modified.

  3. Edge Cases: Multiple assertions, loops, closures and calling other tests
    (define-testcase t2 ()
      "describe t2"
      (assert-equal 1 2)
      (assert-equal 2 3)
      (assert-equal (values 1 2) (values 1 2)))
    
    (t2)
    

    Calling the function (t2) will throw you into the debugger, but if you keep hitting continue you get these results:

    EE.
    
    Test suite ran 3 assertions split across 1 test cases.
    Success: 1/3 (33%)
    Failure: 2/3 (67%)
    
    
    List of failed assertions:
    Testcase T2:
    (ASSERT-EQUAL 2 3)
    (ASSERT-EQUAL 1 2)
    

    Kaputt had no problem with the values expression as such, but like almost all the frameworks, it only looked at the first value.

    1. Closures

      Kaputt has no problem accessing variables defined in a closure encompassing the test.

       (let ((l1 '(#\a #\B #\z))
              (l2 '(97 66 122)))
        (define-testcase t2-loop ()
          (loop for x in l1 for y in l2 do
            (assert-equal (char-code x) y))))
      
      (t2-loop)
      ...
      
      Test suite ran 3 assertions split across 1 test cases.
       Success: 3/3 (100%)
       Failure: 0/3 (0%)
      
      T
      

      Checking assert= with more than two arguments.

      (define-testcase t2-with-multiple-values ()
            (assert= 1 1 2))
        T2-WITH-MULTIPLE-VALUES
        (t2-with-multiple-values)
        ; Evaluation aborted on #<SB-INT:SIMPLE-PROGRAM-ERROR "invalid number of arguments: ~S" {10019D9B93}>.
      

      It failed. All the assertions in Kaputt compare two values only.

    2. Calling another test from a test

      If you call a test within a test in most other frameworks, you will effectively get two reports. Kaputt actually composes the results.

      (define-testcase t3 ()
        "describe t3 which is a test that tries to call another test in its body"
        (assert-equal 'a 'a)
        (t1))
      (t3)
      ........
      
      Test suite ran 8 assertions split across 2 test cases.
      Success: 8/8 (100%)
      Failure: 0/8 (0%)
      T
      

      If test t3 called test t2, we would have seen the following in the debugger which implies that the assertion failure was in t2, not t3:

      Test assertion failed:
      
      (ASSERT-EQUAL 1 2)
      
      The assertion (ASSERT-EQUAL A B) is true, iff A and B satisfy the EQUAL assertion function.
      [Condition of type ASSERTION-FAILED]
      
      Restarts:
      0: [CONTINUE] Record a failure for ASSERT-EQUAL and continue testing.
      1: [IGNORE] Record a success for ASSERT-EQUAL and continue testing.
      2: [RETRY] Retry ASSERT-EQUAL.
      3: [SKIP] Skip the rest of test case T2 and continue testing.
      4: [SKIP] Skip the rest of test case T3 and continue testing.
      5: [RETRY] Retry SLIME REPL evaluation request.
      ...
      
  4. Conditions

    Surprisingly, Kaputt does not have any assertions for different types of conditions.

  5. Suites, tags and other multiple test abilities
    1. Lists of tests

      There is no "run-test" like function in Kaputt. If you want to run a list of tests, you need to define a test that funcalls those tests.

    2. Suites

      Tests can be nested and will generate a composed summary. This might be considered a "suite" capability.

  6. Fixtures and Freezing Data

    There is no additional capability in Kaputt for fixtures or freezing data.

  7. Removing tests

    None

  8. Sequencing, Random and Failure Only

    Tests will be called in sequence and there is no random shuffling or skipping ability.

  9. Skip Capability

    None other than provided in the debugger.

  10. Random Data Generators

    None

20.4 Discussion

Kaputt does provide the ability to define more assertions and something that it refers to as protocols but does not expand on in the documentation.

Looking at the source code, there is a variable named *testcase-protocol-class* which defaults to protocol-dotta. The other options are: protocol-verbose, protocol-trace, protocol-count and protocol-record.

  • protocol-dotta A dotta protocol reports assertion progress with dots and capital letter E, for success and errors respectively. At the end of a testsuite, it prints basic counts describing the current testsuite and a detailed failure report.
  • protocol-verbose A verbose protocol owns a STREAM-OUTPUT.
  • protocol-trace A trace protocol reports each event sent to it.
  • protocol-count A count protocol counts TESTCASE, ASSERTION, SUCCESS and FAILURE.
  • protocol-record A protocol record keeps track of all failures encountered in a test suite and prints a detailed list of the failures when the test suite finishes.

If we change the testcase-protocol-class to 'protocol-record and run the t2 test (knowing it will have two failing assertions) and keep hitting continue in the debugger, we will get the following report:

(setf *testcase-protocol-class* 'protocol-record)

(t2)

List of failed assertions:
 Testcase T2:
    (ASSERT-EQUAL 2 3)
    (ASSERT-EQUAL 1 2)

If we used the 'protocol-dotta, we would have seen the following additional information:

EE.

Test suite ran 3 assertions split across 1 test cases.
 Success: 1/3 (33%)
 Failure: 2/3 (67%)

In summary, Kaputt is interesting, but it will not make me change from another framework. I do think some other frameworks might want to follow its lead in having some float comparison assertions.

top

20.5 Who Uses

top

21 lift

top

21.1 Summary

homepage Gary Warren King MIT 2019 (c)

Documentation for Lift can be found here, but a lot of sections are "To be written".

The original Phil Gold review noted his concerns about speed and memory footprint: "The larger problem, though, was its speed and memory footprint. Defining tests is very slow; when using LIFT, the time necessary to compile and load all of my Project Euler code jumped from the other frameworks' average of about 1.5 minutes to over nine minutes. Redefining tests felt even slower than defining them initially, but I don't have solid numbers on that. After loading everything, memory usage was more than twice that of other frameworks. Running all of the tests took more than a minute longer than other frameworks, though that seems mostly to be a result of swapping induced by LIFT's greater memory requirements."

I did not benchmark compiling tests, just running the tests and as you can see from the benchmarks, lift is one of the fastest frameworks. His concern on runtime was not borne out in my benchmark, but uax-15 is also very different from Project Euler. YMMV.

There is a lot I like about Lift and there are undocumented features that you could spend a few days exploring.

There are two annoyances.

  • Multiple assertion problem: If you have multiple assertions in a test, the test stops at the first assertion failure. I can understand that if the intent is to get thrown into the debugger and fix the failure immediately, but not when you are running reports. There are reasons why I would put multiple assertions into a test. The obvious workaround is only one assertion per test. You then have to create possibly hundreds of tests and then use addtest to add each test to your suite.
  • Clumsy failure reporting

Lift has both hierarchical suites and tags, what it calls categories.

21.2 Assertion Functions

ensure ensure-cases
ensure-cases-failure ensure-condition
ensure-different ensure-directories-exist
ensure-directory ensure-error
ensure-every ensure-expected-condition
ensure-expected-no-warning-condition ensure-failed
ensure-failed-error ensure-function
ensure-generic-function ensure-list
ensure-member ensure-no-warning
ensure-not-same ensure-null
ensure-null-failed-error ensure-random-cases
ensure-random-cases+ ensure-random-cases-failure
ensure-same ensure-some
ensure-string ensure-symbol
ensure-warning  

top

21.3 Usage

Unlike most frameworks, lift provides variables for *test-maximum-error-count*, *test-maximum-failure-count* and *test-maximum-time* which can be set so that large sets of failing tests can be shut down early without wasting time.
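
A minimal sketch of capping a run (the values are arbitrary; the variables are the ones named above):

(setf *test-maximum-failure-count* 10 ; stop the run after 10 failures
      *test-maximum-error-count* 5    ; or after 5 errors
      *test-maximum-time* 10)         ; or when a single test exceeds 10 seconds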

Lift has a lot of undocumented functionality. For example, you can generate log-entries which seems to have something to do with sample counts and profiling, but those have no documentation either.

  1. Report Format

    Like some other frameworks, Lift will run tests as you compile them. If you use run-test (singular) or run-tests (plural), you get a very limited amount of information that will look something like this:

     (run-test :name 't4-s1-1)
    #<S1.T4-S1-1 failed>
    
      (run-tests)
      Start: S0
      #<Results for S0 1 Tests, 1 Failure>
    

    If you (setf *test-describe-if-not-successful?* t) you get a lot more information, but you are really just running (describe) on the test result object. We run only the single test version this time:

    (run-test :name 't4-s1-1)
    #<S1.T4-S1-1 failed
    Failure: s1 : t4-s1-1
      Documentation: NIL
      Source       : /tmp/slimeBSclUz
      Condition    : Ensure failed: (= 2 3) ()
      During       : (END-TEST)
      Code         : (
      ((ENSURE (= 2 3)) (ENSURE (= 4 4)) (ENSURE (EQL 'B 'C))))
      >
    

    Lift can also print test result details. The first parameter is the stream to which to direct the output, the second parameter is the test result. The third parameter is show-expected-p and the fourth is show-code-p. All parameters must be provided.

    (print-test-result-details *standard-output* (run-test :name 't1-fail) t t)
    Failure: tf-lift : t1-fail
      Documentation: NIL
      Source       : NIL
      Condition    : Ensure failed: (= X
                                       Y) (This test was meant to fail because 1 is not = 2)
      During       : (END-TEST)
      Code         : (
      ((LET ((X 1) (Y 2))
         (ENSURE (= X Y) :REPORT "This test was meant to fail because ~a is not = ~a"
                 :ARGUMENTS (X Y)))))
    

    The function run-tests takes a keyword parameter :report-pathname which will direct a substantial amount of information to the designated file. The following example runs all the tests associated with suite s0 (either directly or indirectly). You can also set the variable *lift-report-pathname* to a pathname. Any subsequent failure reports will be printed there. Lift does not have progress reports if that is important to you.

    (run-tests :suite 's0 :report-pathname #P "/tmp/lift-1.txt")
    

    Opening that file may show results looking something like:

    ((:RESULTS-FOR . S0)
    (:ARGUMENTS . (:SUITE ("S0" . "TF-LIFT") :REPORT-PATHNAME #P"/tmp/lift-1.txt"))
    (:FEATURES . (:HUNCHENTOOT-SBCL-DEBUG-PRINT-VARIABLE-ALIST :5AM
                  :OSICAT-FD-STREAMS :ITER :NAMED-READTABLES :UTF-32 :TOOT
                  :SBCL-DEBUG-PRINT-VARIABLE-ALIST :SPLIT-SEQUENCE
                  CFFI-FEATURES:FLAT-NAMESPACE CFFI-FEATURES:X86-64
                  CFFI-FEATURES:UNIX :CFFI CFFI-SYS::FLAT-NAMESPACE :FLEXI-STREAMS
                  :CL-FAD :CHUNGA :LISP-UNIT :CLOSER-MOP :CL-PPCRE
                  :BORDEAUX-THREADS ALEXANDRIA::SEQUENCE-EMPTYP :THREAD-SUPPORT
                  :SWANK :QUICKLISP :ASDF3.3 :ASDF3.2 :ASDF3.1 :ASDF3 :ASDF2 :ASDF
                  :OS-UNIX :NON-BASE-CHARS-EXIST-P :ASDF-UNICODE :X86-64 :GENCGC
                  :64-BIT :ANSI-CL :COMMON-LISP :ELF :IEEE-FLOATING-POINT :LINUX
                  :LITTLE-ENDIAN :PACKAGE-LOCAL-NICKNAMES :SB-LDB :SB-PACKAGE-LOCKS
                  :SB-THREAD :SB-UNICODE :SBCL :UNIX))
    (:DATETIME . 3831119803)
    )
    (
    (:SUITE . ("S0" . "TF-LIFT"))
    (:NAME . ("T4-S0-1" . "TF-LIFT"))
    (:START-TIME . 3831119803000)
    (:END-TIME . 3831119803000)
    (:SECONDS . 0.0d0)
    (:CONSES . 0)
    (:RESULT . T)
    )
    (
    (:SUITE . ("S0" . "TF-LIFT"))
    (:NAME . ("T4-S0-2" . "TF-LIFT"))
    (:START-TIME . 3831119803000)
    (:END-TIME . 3831119803000)
    (:PROBLEM-KIND . "failure")
    (:PROBLEM-STEP . :END-TEST)
    (:PROBLEM-CONDITION . "#<ENSURE-FAILED-ERROR {100DA7E143}>")
    (:PROBLEM-CONDITION-DESCRIPTION . "Ensure failed: (= 1 2) ()")
    )
    (
    (:TEST-CASE-COUNT . 2)
    (:TEST-SUITE-COUNT . 1)
    (:FAILURE-COUNT . 1)
    (:ERROR-COUNT . 0)
    (:EXPECTED-FAILURE-COUNT . 0)
    (:EXPECTED-ERROR-COUNT . 0)
    (:SKIPPED-TESTSUITES-COUNT . 0)
    (:SKIPPED-TEST-CASES-COUNT . 0)
    (:START-TIME-UNIVERSAL . 3831119803)
    (:END-TIME-UNIVERSAL . 3831119803)
    (:FAILURES . ((("S0" . "TF-LIFT") ("T4-S0-2" . "TF-LIFT")))))
    

    Adding a test and compiling it will, as noted, cause it to be run immediately, but all you get is pass or fail. Let's try to get a little more information by using describe *.

      (addtest (s1) t4-s1-4
          (ensure (= 3 4)))
      #<Test failed>
    
    (describe *)
    Test Report for S1: 1 test run, 1 Failure.
    
    Failure: s1 : t4-s1-4
      Documentation: NIL
      Source       : NIL
      Condition    : Ensure failed: (= 3 4) ()
      During       : (END-TEST)
      Code         : (
      ((ENSURE (= 3 4))))
    
    Test Report for S1: 1 test run, 1 Failure.
    

    I would note that in tests with multiple assertions, Lift only shows the first failure, not all failures. That is a real problem for me because I want the results of multiple assertions if I am tracking down one of my many bugs.

    To go interactive - dropping immediately into the debugger - you would set one or more keyword parameters based on which condition should throw you into the debugger. The following will throw you into the debugger on failures but not on errors.

    (run-test :name 't1-function :break-on-errors? nil :break-on-failures? t)
    
  2. Basics

    Lift really wants a suite to be defined first before any tests are defined. The first form after the suite name would contain the name of a parent suite (if any). The second form would be used for suite slot specifications which are used with fixtures and will be discussed below.

    (deftestsuite tf-lift () ())
    

    Starting with the most basic named test. This adds a test to the most recently defined suite, or you can insert a form before the test name which specifies the suite for the test.

    (addtest t1
        (ensure (equal 1 1))
        (ensure-condition division-by-zero
          (error 'division-by-zero))
        (ensure-same 1 1)
        (ensure-different '(1 2 3) '(1 3 4)))
    

    Besides running the test on compilation, we can also run the test using the run-test function, specifying the test name with the keyword parameter :name.

      (run-test :name 't1)
    
    #<TF-LIFT.T1 passed>
    

    Now adding a basic failing test. Let's make sure we get a bit more information on failing tests by inserting a :report keyword with a descriptive string containing format directives and an :arguments keyword with the parameters to pass to that string. Then we use describe against the results of running the test.

    (addtest t1-fail
             (let ((x 1) (y 2))
               (ensure (= x y)
                       :report "This test was meant to fail because ~a is not = ~a"
                       :arguments (x y))))
    
    (describe (run-test :name 't1-fail))
    Test Report for TF-LIFT: 1 test run, 1 Failure.
    
    Failure: tf-lift : t1-fail
    Documentation: NIL
    Source       : NIL
    Condition    : Ensure failed: (= X
                                     Y) (This test was meant to fail because 1 is not = 2)
    During       : (END-TEST)
    Code         : (
                    ((LET ((X 1) (Y 2))
                          (ENSURE (= X Y) :REPORT
                                  "This test was meant to fail because ~a is not = ~a"
                                  :ARGUMENTS (X Y)))))
    
    Test Report for TF-LIFT: 1 test run, 1 Failure.
    

    As one would hope, you do not need to manually recompile a test just because a tested function is modified.

  3. Edge Cases: Multiple failing assertions, Values expressions, loops, closures and calling other tests
    1. Multiple assertions and Value expressions

      First, checking whether a test with multiple failing assertions reports all of them. The answer is yes and no, which surprised me.

        (addtest t2
          (ensure (= 1 2))
          (ensure (= 2 3)))
      
      (print-test-result-details *standard-output* (run-test :name 't2) t t)
      Failure: tf-lift : t2
        Documentation: NIL
        Source       : NIL
        Condition    : Ensure failed: (= 1 2) ()
        During       : (END-TEST)
        Code         : (
        ((ENSURE (= 1 2)) (ENSURE (= 2 3))))
      

      Obviously we expected the test to fail. But I expected two assertions to be shown as failing, not only the first one. I can understand that if the intent is to just throw me into the debugger on the first failure, but not in a reporting situation.

      Lift has no special functionality for dealing with values expressions. It accepts them but merely looks at the initial value provided by each expression.

    2. Now looping and closures.

      Checking whether Lift can run tests using variables declared in a closure encompassing the test. Yes.

        (let ((l1 '(#\a #\B #\z))
                (l2 '(97 66 122)))
          (addtest t2-loop
            (loop for x in l1 for y in l2 do
              (ensure (= (char-code x) y)))))
      #<Test passed>
      
    3. Calling another test from a test

      We know tests are not functions in Lift, but can a test call another test in its body? We know test t2 should fail.

        (addtest t3
          (ensure (eql 'a 'a))
          (run-test :name 't2))
      
      (run-test :name 't3)
      #<TF-LIFT.T3 passed>
      

      It does not look like t3 actually called t2.

  4. Suites, tags and other multiple test abilities
    1. Lists of tests

      Lift cannot run lists of tests outside the suite functionality.

    2. Suites

      We already know that we had to set up a suite for tests, in our case a suite named tf-lift.

      Unlike most test frameworks, lift actually provides a function which will print out the names of the tests included in the suite.

      (print-tests :start-at 'tf-lift)
      TF-LIFT (10)
        T1
        T1-FAIL
        T1-FUNCTION
        T2
        T2-VALUES
        T2-LOOP
        T2-WITH-MULTIPLE-VALUES
        T3
        T7-ERROR
        T7-BAD-ERROR
      

      Let's start with the same simple inheritance structure we have been using with other frameworks.

      (deftestsuite s0 () ())
      ;; a test that is a member of a suite because it is defined after a defsuite
      (addtest t4-s0-1
        (ensure-same 1 1))
      
      ;; Add another test suite
      (deftestsuite s1 () ())
      
      ;; add another test, but preface the name with s0 in the form, making this test part of suite s0
      (addtest (s0) t4-s0-2
        (ensure (= 1 2)))
      
      ;; add a test, specifying that this one is part of suite s1
      (addtest (s1) t4-s1-1
        (ensure (= 2 3)))
      
      ;; Now run tests for suite s0 and s1 respectively and we see that s0 does indeed have two tests and suite s1 has one test.
      (run-tests :suite 's0)
      Start: S0
      #<Results for S0 2 Tests, 1 Failure>
      
      (run-tests :suite 's1)
      Start: S1
      #<Results for S1 1 Test, 1 Failure>
      

      We now define suite s2 which is a child suite of s0 and add a test

      (deftestsuite s2 (s0) ())
      
      (addtest (s2) t4-s2-1
          (ensure (= 1 1))
        (ensure (eq 'a 'a)))
      

      If we now apply RUN-TESTS to suite s0, we see that it runs both s2 and s0

      (run-tests :suite 's0)
      Start: S0
      Start: S2
      #<Results for S0 3 Tests, 1 Failure>
      

      If we run it with a :report-pathname keyword parameter set, we can get a lot more information sent to a file:

      (run-tests :suite 's0 :report-pathname #P "/tmp/lift-1.txt")
      

      top

  5. Fixtures and Freezing Data

    Variables can be created and set at the suite level, making those variables available down the suite inheritance chain.

    (deftestsuite s3 ()
      ((a 1) (b 2) (c 3)))
    
    (addtest t4-s3-1
             (ensure-same 1 a))
    
    (deftestsuite s4 (s3)
      ((d 4)))
    
    (addtest t4-s4-1
             (ensure-same 2 b)
             (ensure-same 4 d))
    
    (addtest (s3) t4-s3-2
             (ensure-same 2 b))
    

    These all pass because the tests can see the variables created in their suite and the parent suites. If we created a test in suite s3 that tried to reference the variable d created in suite s4 (a lower level suite), we would get an undefined variable error.

    Suites also have :setup and :teardown keyword parameters and an additional :run-setup parameter that controls when the setup provisions are performed. The default is :once-per-test-case (setup again for each and every test in the suite). The other alternatives are :once-per-suite and :never.

    (deftestsuite s5 ()
      (db) ; declare the slot that the setup and teardown forms set
      (:setup
        (setf db (open-data "bar" :if-exists :supersede)))
      (:teardown
        (setf db nil))
      (:run-setup :once-per-test-case))
    
  6. Removing tests

    Tests and suites can be removed using remove-test

    (remove-test :suite 's2)
    
    (remove-test :test-case 't1)
    
  7. Sequencing, Random and Failure Only

    Do the tests in a suite run in sequential order, randomly or is it optional? Failure only testing (just running all the tests that failed last time) is nice to have, but of course you still need to be able to run everything at the end to ensure that fixing one bug did not create another.

  8. Skip Capability
    1. Tests

      The run-tests function takes a :skip-tests keyword parameter which accepts a list of test names to skip.
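
      For example, a sketch using the suite and test names from earlier in this section:

      (run-tests :suite 's0 :skip-tests '(t4-s0-2))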

      The variable *test-maximum-time* controls the number of seconds that a test can take before lift gives up. It defaults to 2 seconds.

  9. Random Data Generators

    Lift has various random data generators:

    ;; (random-number suite min max)
    (random-number 's4 1 100)
    
    ;; (random-element suite sequence)
    (random-element 's4 '(a b c 23))
    

    If anyone can give a good example of the use of the DEFRANDOM-INSTANCE macro besides what is in the random-testing file, feel free to submit a pull request.

21.4 Discussion

I really want to like lift, but I really have a hard time getting over the fact that it stops at the first assertion failure.

21.5 Who Uses Lift

Many libraries on quicklisp use Lift. If you have quicklisp, you can get a list of those with:

(ql:who-depends-on :lift)

top

22 lisp-unit

top

22.1 Summary

homepage Thomas M. Hermann MIT 2017

Phil Gold's original concern about lisp-unit back in 2007 was its failure to scale. Specifically, he pointed to non-composable test reports, the fact that you could not get just failure reports (so the failures were lost in the reporting of successful tests), and that there was no count of failed tests. I think those have been addressed with the tags capabilities. Lisp-unit still focuses on counting assertions rather than tests, but you can now collect information on just the failing tests.

I generally like lisp-unit. It does not have progress reports, which might bother some people. My bigger concerns are its lack of fixtures and the fact that you can turn on debugging only for errors, not for failures. If you need floating point tests, those are built-in. Documentation can be found at the wiki.

top

22.2 Assertion Functions

assert-eq assert-eql assert-equal
assert-equality assert-equalp assert-error
assert-expands assert-false assert-float-equal
assert-nil assert-norm-equal assert-number-equal
assert-numerical-equal assert-prints assert-rational-equal
assert-result assert-sigfig-equal assert-test
assert-true check-type logically-equal
set-equal    

top

22.3 Usage

  1. Report Format

    Lisp-unit defaults to a reporting format shown below. You can do (setf *use-debugger* :ask) or (setf *use-debugger* t), but that will only throw you into the debugger if there is an actual error generated, not a failure (or failure to see the correct error). So, not complete debugger optionality.

    Lisp-unit will normally just count assertions passed, failed, and execution errors and report those. You will see in the first failing test examples how to get more information. You can also have it kick you into the debugger on errors by calling use-debugger. This only applies to errors and not failures.

    Calling run-tests will return an instance of a test-results-db object. You can get a list of failed test objects with the failed-tests function which also accepts an optional stream, allowing easy printing to a file:

    (failed-tests (run-tests :all) optional-stream)
    

    You can print the detailed failure information using the print-failures function:

    (print-failures (run-tests :all))
    

    Lisp-unit also has print and print-errors which also take an optional stream.

    If you like the TAP format, Lisp-unit also has (write-tap-to-file test-results path) and (write-tap test-results [stream]).
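
    For example, a sketch using the signatures above to run everything in the current test package and write the TAP output to a file:

    (write-tap-to-file (run-tests :all) #P"/tmp/lisp-unit-results.tap")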

  2. Basics

    First, the basic test where we know everything is going to pass. Since Lisp-unit has macro expand and floating point assertion functions, we will show those in this example (so we need a macro just for the macroexpand test). See https://github.com/OdonataResearchLLC/lisp-unit/blob/master/extensions/floating-point.lisp and https://github.com/OdonataResearchLLC/lisp-unit/blob/master/extensions/rational.lisp for more information on the floating point and rational tests.

      (defmacro my-macro (arg1 arg2)
        (let ((g1 (gensym))
              (g2 (gensym)))
          `(let ((,g1 ,arg1)
                 (,g2 ,arg2))
             "Start"
             (+ ,g1 ,g2 3))))
    
    (define-test t1
      "describe t1"
      (assert-true (= 1 1))
      (assert-equal 1 1)
      (assert-float-equal 17 17.0000d0)
      (assert-rational-equal 3/2 3/2)
      (assert-true (set-equal '(a b c) '(b a c))) ;every element in both sets needs to be in the other
      (assert-true (logically-equal t t)) ; both true or both false
      (assert-true (logically-equal nil nil)) ; both true or both false
      (assert-expands
          (let ((#:G1 A) (#:G2 B)) "Start" (+ #:G1 #:G2 3))
          (my-macro a b))
      (assert-prints "12" (format t "~a" 12))
      (assert-error 'division-by-zero
                    (error 'division-by-zero)
                    "testing condition assertions"))
    

    Now run this test:

    (run-tests '(t1))
    Unit Test Summary
     | 10 assertions total
     | 10 passed
     | 0 failed
     | 0 execution errors
     | 0 missing tests
    
    #<TEST-RESULTS-DB Total(6) Passed(6) Failed(0) Errors(0)>
    

    Now a basic failing test.

    (define-test t1-fail
      "describe t1-fail"
      (let ((x 1) (y 2))
        (assert-true (= x y))
        (assert-equal x y)
        (assert-expands
            (let ((#:G1 D) (#:G2 B)) "Start" (+ #:G1 #:G2 3))
            (my-macro a b))
        (assert-prints "12" (format nil "~a" 12))
        (assert-error 'division-by-zero
                      (error 'floating-point-overflow)
                      "testing condition assertions")))
    
    (run-tests '(t1-fail))
    Unit Test Summary
     | 5 assertions total
     | 0 passed
     | 5 failed
     | 0 execution errors
     | 0 missing tests
    

    That told us assertions failed, but did not give a lot of information. Let's change the setup slightly, setting *PRINT-FAILURES* to t. (You can also print info just on errors by setting *PRINT-FAILURES* to nil and *PRINT-ERRORS* to t.)

    (setf *print-failures* t)
    
    (run-tests '(t1-fail))
     | Failed Form: (ERROR 'FLOATING-POINT-OVERFLOW)
     | Should have signalled DIVISION-BY-ZERO but saw #<FLOATING-POINT-OVERFLOW {100E6D4D13}>
     | "testing condition assertions" => "testing condition assertions"
     |
     | Failed Form: (FORMAT NIL "~a" 12)
     | Should have printed "12" but saw ""
     |
     | Failed Form: (MY-MACRO A B)
     | Should have expanded to (LET ((#:G1 D) (#:G2 B))
                                 "Start"
                                 (+ #:G1 #:G2 3))
    but saw (LET ((#:G1 A) (#:G2 B))
              "Start"
              (+ #:G1 #:G2 3)); T
     |
     | Failed Form: Y
     | Expected 1 but saw 2
     |
     | Failed Form: (= X Y)
     | Expected T but saw NIL
     | X => 1
     | Y => 2
     |
    T1-FAIL: 0 assertions passed, 5 failed.
    
    Unit Test Summary
     | 5 assertions total
     | 0 passed
     | 5 failed
     | 0 execution errors
     | 0 missing tests
    
    #<TEST-RESULTS-DB Total(5) Passed(0) Failed(5) Errors(0)>
    

    That gives more information, but notice the slight difference between the information provided for assert-equal - Failed Form: Y and the information provided for assert-true - Failed Form: (= X Y).

    We can get still more if we pass more info to the assertion clause. While the assertion compares the first two items, we can pass additional information that it will print on failures. Unlike some other frameworks, we cannot pass a diagnostic string which accepts interpolated variables, but we can pass a string and variables. This time we will reduce the test to just the assert-equal clause.

      (define-test t1-fail-short
          (let ((x 1) (y 2))
            (assert-equal x y "Diagnostic Message: X ~a should equal Y ~a" x y)))
    
    (run-tests '(t1-fail-short))
     | Failed Form: Y
     | Expected 1 but saw 2
     | "Diagnostic Message: X should equal Y" => "Diagnostic Message: X should equal Y"
     | X => 1
     | Y => 2
     |
    T1-FAIL: 0 assertions passed, 1 failed.
    
    Unit Test Summary
     | 1 assertions total
     | 0 passed
     | 1 failed
     | 0 execution errors
     | 0 missing tests
    

    Of course, the usefulness of the diagnostic message will depend on the context.

    You do not have to manually recompile a test after a tested function has been modified.

  3. Edge Cases: Values expressions, loops, closures and calling other tests
    1. Value expressions

      Lisp-Unit has a pleasant surprise with respect to values expressions. Unlike almost all the other frameworks Lisp-unit and Lisp-unit2 actually look at all the values in the values expressions:

        (define-test t2-values-expressions
          (assert-equal (values 1 2) (values 1 3))
          (assert-equal (values 1 2 3) (values 1 3 2)))
      
      (print-failures (run-tests '(t2-values-expressions)))
      Unit Test Summary
       | 2 assertions total
       | 0 passed
       | 2 failed
       | 0 execution errors
       | 0 missing tests
      
       | Failed Form: (VALUES 1 3 2)
       | Expected 1; 2; 3 but saw 1; 3; 2
       |
       | Failed Form: (VALUES 1 3)
       | Expected 1; 2 but saw 1; 3
       |
      T2-VALUES-EXPRESSIONS: 0 assertions passed, 2 failed.
      
    2. Closure Variables

      Lisp-Unit will not see the variables declared in a closure surrounding the test function, so the following would fail.

      (let ((l1 '(#\a #\B #\z))
            (l2 '(97 66 122)))
      (define-test t2-loop
        (loop for x in l1 for y in l2 do
          (assert-equal (char-code x) y))))
      
    3. Calling another test from a test

      While tests are not functions in lisp-unit, they can call other tests.

      (define-test t3 ; a test that tries to call another test in its body
        "describe t3"
        (assert-equal 'a 'a)
        (run-tests '(t2)))
      T3
      LISP-UNIT-EXAMPLES> (run-tests '(t3))
       | Failed Form: 3
       | Expected 2 but saw 3
       |
       | Failed Form: 2
       | Expected 1 but saw 2
       |
      T2: 1 assertions passed, 2 failed.
      
      Unit Test Summary
       | 3 assertions total
       | 1 passed
       | 2 failed
       | 0 execution errors
       | 0 missing tests
      
      T3: 0 assertions passed, 0 failed.
      
      Unit Test Summary
       | 0 assertions total
       | 0 passed
       | 0 failed
       | 0 execution errors
       | 0 missing tests
      

      A bit of a surprise here. Test t3 does call test t2 but does not track its own assertion. If you reverse the order so that t3's assertion comes after the call to run-tests on t2, then it does work properly. In neither case are the results composed.

  4. Suites, tags and other multiple test abilities

    Lisp-unit uses both packages and tags rather than suites. That provides a bit more flexibility in terms of reusing tests in different situations, but does not create the automatic inheritance that some people like.

    1. Lists of tests

      Lisp-unit can run lists of tests

      (run-tests '(t1 t2))
      

      Lisp-unit makes it easy to get a list of the names of the failing tests which you can then save and run-tests against. Run-tests returns a test-results-db object. Just call failed-tests on that to get a list of the names of the tests that failed in that run. Then run-tests against that smaller list.
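
      A minimal sketch of that workflow:

      (let* ((results (run-tests :all))        ; full run, returns a test-results-db
             (failed (failed-tests results)))  ; names of the tests that failed
        (when failed
          (run-tests failed)))                 ; re-run only the failures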

    2. Packages

      Assuming you have set up your tests in a separate package (that package can cover many application packages) and that test package is the current package, you can run all the tests in the current package with:

      (run-tests :all)
      

      The reporting can get confusing as in this sample run with *print-failures* set to nil:

      (run-tests :all)
      Diagnostic Message: X 1 should equal Y 2
      Unit Test Summary
       | 3 assertions total
       | 1 passed
       | 2 failed
       | 0 execution errors
       | 0 missing tests
      
      Unit Test Summary
       | 13 assertions total
       | 6 passed
       | 7 failed
       | 0 execution errors
       | 0 missing tests
      
      #<TEST-RESULTS-DB Total(13) Passed(6) Failed(7) Errors(0)>
      

      Why do we have two Unit Test Summaries having different numbers of tests and assertions? If you recall, test t3 calls test t2 and that first summary is a secondary summary from that call.

      To run all the tests in a non-current package, add the name of the package after the keyword parameter :all

      (lisp-unit:run-tests :all :date-tests)
      

      You can list the names of all the tests in a package

      (list-tests [package])
      
    3. Tags

      As noted, lisp-unit provides the ability to define tests with multiple tags:

      (define-test foo
        "This is the documentation."
        (:tag :tag1 :tag2 symtag)
        exp1 exp2 ...)
      

      So assume three tests that we want tagged differently:

      (define-test t6-1
        "Test t6-1 tagged simple and complex"
        (:tag :simple :complex)
        (assert-true (= 1 1 1)))
      
      (define-test t6-2
        "Test t6-2 tagged simple only"
        (:tag :simple)
        (assert-equal 1 1))
      
      (define-test t6-3
        "Test t6-3 tagged complex only"
        (:tag :complex)
        (assert-equal 'a 'a))
      

      Then using run-tags does what we expect. We will set *print-summary* to t for simplicity.

      (setf *print-summary* t)
      
      (run-tags '(:simple))
      T6-2: 1 assertions passed, 0 failed.
      
      T6-1: 1 assertions passed, 0 failed.
      
      Unit Test Summary
       | 2 assertions total
       | 2 passed
       | 0 failed
       | 0 execution errors
       | 0 missing tests
      
      #<TEST-RESULTS-DB Total(2) Passed(2) Failed(0) Errors(0)>
      
      LISP-UNIT> (run-tags '(:complex))
      T6-3: 1 assertions passed, 0 failed.
      
      T6-1: 1 assertions passed, 0 failed.
      
      Unit Test Summary
       | 2 assertions total
       | 2 passed
       | 0 failed
       | 0 execution errors
       | 0 missing tests
      

      Tags can be listed with (LIST-TAGS [PACKAGE]). TAGGED-TESTS returns the tests associated with the listed tags. All tagged tests are returned if no arguments are given or if the keyword :all is provided instead of a list of tags. *package* is used if no package is specified.

      (tagged-tests '(tag1 tag2 ...) [package])
      (tagged-tests :all [package])
      (tagged-tests)
      

      top

  5. Fixtures and Freezing Data

    None

  6. Removing tests

    Lisp-unit has both remove-tests and remove-tags functions.

  7. Sequencing, Random and Failure Only
  8. Skip Capability

    None

  9. Random Data Generators

    Lisp-unit has various functions for generating random data. See examples below:

    (complex-random #C(5 3))
    #C(4 1)
    
    (make-random-2d-array 2 3)
    #2A((0.03395796 0.55509293 0.34209597) (0.5823394 0.8771157 0.29430425))
    
    (make-random-2d-list 2 3)
    ((0.18096626 0.916595 0.88126934) (0.45945048 0.8838378 0.57314146))
    
    (make-random-list 3)
    (0.5449568 0.32319236 0.7780224)
    
    (make-random-state)
    #S(RANDOM-STATE :STATE #.(MAKE-ARRAY 627 :ELEMENT-TYPE '(UNSIGNED-BYTE 32)
                                         :INITIAL-CONTENTS
                                         '(0 2567483615 454 2531281407 4203062579
                                           3352536227 284404050 622556438
                                           ...)))
    

22.4 Who Uses Lisp-Unit

Many libraries on quicklisp use Lisp-Unit. If you have quicklisp, you can get a list of those with:

(ql:who-depends-on :lisp-unit)

top

23 lisp-unit2

top

23.1 Summary

homepage Russ Tyndall MIT 2018

I generally like Lisp-Unit2. Phil Gold's original concern about Lisp-Unit back in 2007 was its failure to scale. Specifically, he pointed to non-composable test reports, the fact that you could not get just failure reports (so the failures were lost in the reporting of successful tests), and that there was no count of failed tests. I think those have been addressed with the tags capabilities.

Unlike the situation with Clunit and Clunit2 which are almost identical, Lisp-Unit and Lisp-Unit2 have definitely diverged over the years. Lisp-unit2 has fixtures and can run just the previously failing tests and you can turn on debugging for failures as well as errors. It does not have progress reports which might bother some people. If you need floating point tests, those are built-in.

It will report all the assertion failures in a test, gives you the opportunity to provide user-generated diagnostic messages in assertions, and has a tags system that allows different ways to re-use tests not found in the typical hierarchical setup. It can re-run failed tests, and interactive debugging is optional. I did find the fixture structure to be confusing and, as noted above, it does not have progress reporting. I did have an issue compiling it with ccl but have not tracked it down sufficiently to see if it is a bug to be reported.

23.2 Assertion Functions

assert-eq assert-eql assert-equal
assert-equality assert-equalp assert-error
assert-expands assert-fail assert-false
assert-float-equal assert-no-error assert-no-signal
assert-no-warning assert-norm-equal assert-number-equal
assert-numerical-equal assert-passes? assert-prints
assert-rational-equal assert-sigfig-equal assert-signal
assert-true assert-typep assert-warning
assertion-fail assertion-pass check-type
logically-equal    

top

23.3 Usage

If you store the results of a test run, you can call rerun-failures on those results to just rerun the failing tests rather than go through all the tests again.
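
As a minimal sketch, assuming rerun-failures takes the stored results object as described above:

(let ((results (run-tests :tests '(t1 t1-fail))))
  ;; run only the tests that failed in the stored results
  (rerun-failures results))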

  1. Report Format

    The variable *debugger-hook* is set by default to #<FUNCTION SWANK:SWANK-DEBUGGER-HOOK>. If you want to stay out of interactive debugging, set *debugger-hook* to nil.

    You can allow the system to put you in the debugger by using the with-failure-debugging wrapper, or by providing that wrapper to the keyword parameter :run-contexts when calling run-tests, as in the following two examples:

    (with-failure-debugging ()
      (run-tests :tests '(t7-bad-error)))
    
    (run-tests :tests 'tf-lisp-unit2::tf-find-str-in-list-t
               :run-contexts #'with-failure-debugging-context)
    
  2. Basics

    Tests in lisp-unit2 are functions. They are also compiled at the time of definition (so that any compile warnings or errors are immediately noticeable) and also before every run of the test (so that macro expansions are never out of date).

    The define-test macro takes a name parameter and a form specifying :tags, :contexts or :package before you get to the assertions.

    First, the basic test where we know everything is going to pass. Since Lisp-unit2 has macro expand and floating point assertion functions, we will show those in this example (so we need a macro just for the macroexpand test). See https://github.com/AccelerationNet/lisp-unit2/blob/master/floating-point.lisp and https://github.com/AccelerationNet/lisp-unit2/blob/master/rational.lisp for more information on the floating point and rational tests including setting the epsilon values etc.

    (defmacro my-macro (arg1 arg2)
      (let ((g1 (gensym))
            (g2 (gensym)))
        `(let ((,g1 ,arg1)
               (,g2 ,arg2))
           "Start"
           (+ ,g1 ,g2 3))))
    
    (define-test t1
        (:tags '(tf-basic))
      (assert-true (=  1 1))
      (assert-eq 'a 'a)
      (assert-rational-equal 3/2 3/2)
      (assert-float-equal 17 17.0000d0)
      (assert-true (logically-equal t t)) ; both true or both false
      (assert-true (logically-equal nil nil)) ; both true or both false
      (assert-expands
          (let ((#:G1 A) (#:G2 B)) "Start" (+ #:G1 #:G2 3))
          (my-macro a b))
      (assert-error 'division-by-zero
                    (error 'division-by-zero)
                    "testing condition assertions"))
    

    Now run this test. The keyword parameter :tests will accept a single test symbol or a list of tests. E.g.

      (run-tests :tests 't1)
    
      (run-tests :tests '(t1))
    
    #<TEST-RESULTS-DB Tests:(1) Passed:(8) Failed:(0) Errors:(0) Warnings:(0) {1006A22CE3}>
    

    Short and to the point. For a slightly different format you can use any of the following:

    (with-summary ()
      (run-tests :tests '(t1)))
    
    (print-summary
     (run-tests :tests '(t1)))
    
    (run-tests :run-contexts #'with-summary-context :tests '(t1))
    

    Any of these will provide something like the following:

    TF-LISP-UNIT2::T1 - PASSED (0.01s) : 6 assertions passed
    
    Test Summary for :TF-LISP-UNIT2 (1 tests 0.01 sec)
      | 8 assertions total
      | 8 passed
      | 0 failed
      | 0 execution errors
      | 0 warnings
      | 0 empty
      | 0 missing tests
    #<TEST-RESULTS-DB Tests:(1) Passed:(8) Failed:(0) Errors:(0) Warnings:(0) {1006E6EE53}>
    

    Now for a basic failing test. This time we will give the test a description string, and the first assertion gets a diagnostic string and the variables in question.

    (define-test t1-fail
        (:tags '(tf-basic))
      "describe t1-fail"
      (let ((x 1))
        (assert-true (= x 2)
                     "deliberate failure here because we know ~a is not equal to ~a"
                     x 2)
        (assert-equal x 3)
        (assert-error 'division-by-zero
                      (error 'floating-point-overflow)
                      "testing condition assertions")
        (assert-true (logically-equal t nil))    ; not logically equal, so this would fail
        (assert-true (logically-equal nil t)those)))) ; not logically equal, so this would fail
    

    Now if we simply run the basic test, we get thrown into the debugger on the error assertion. If we hit continue, we are handed a test-results-db object. Why did we get thrown into the debugger rather than just fail? Because the variable *debugger-hook* is set by default to #<FUNCTION SWANK:SWANK-DEBUGGER-HOOK>. If you want to stay out of interactive debugging when errors get thrown, set *debugger-hook* to nil.

    (run-tests :tests '(t1-fail))
    #<TEST-RESULTS-DB Tests:(1) Passed:(0) Failed:(2) Errors:(1) Warnings:(0) {10063359D3}>
    

    If we use the print-summary wrapper, we still get thrown into the debugger on the error assertion, but assuming we hit continue, we get the report below and a test-results-db object.

    (print-summary (run-tests :tests '(t1-fail)))
    TF-LISP-UNIT2::T1-FAIL - ERRORS (5.33s) : 0 assertions passed
      | ERRORS (1)
      | ERROR: arithmetic error FLOATING-POINT-OVERFLOW signalled
      | #<FLOATING-POINT-OVERFLOW {100619DEE3}>
      |
      | FAILED (2)
      | Failed Form: (ASSERT-TRUE (= X 2)
      |                           "deliberate failure here because we know ~a is not equal to ~a"
      |                           X 2)
      | Expected T
      | but saw NIL
      | "deliberate failure here because we know ~a is not equal to ~a"
      | X => 1
      | 2
      | Failed Form: (ASSERT-EQUAL X 3)
      | Expected 1
      | but saw 3
      |
      |
    Test Summary for :TF-LISP-UNIT2 (1 tests 5.33 sec)
      | 2 assertions total
      | 0 passed
      | 2 failed
      | 1 execution errors
      | 0 warnings
      | 0 empty
      | 0 missing tests
    #<TEST-RESULTS-DB Tests:(1) Passed:(0) Failed:(2) Errors:(1) Warnings:(0) {1005F7D0E3}>
    

    You do not have to manually recompile a test after a tested function has been modified.

  3. Edge Cases: Value Expressions, loops, closures and calling other tests
    1. Value expressions

      Unlike almost all the other frameworks, Lisp-unit and Lisp-unit2 actually look at all the values in the values expressions:

      (define-test t2-values-expressions
        (:tags '(tf-multiple))
        (assert-equal (values 1 2) (values 1 2 3))
        (assert-equal (values 1 2) (values 1 3))
        (assert-equal (values 1 2 3) (values 1 3 2)))
      #<FUNCTION T2-VALUES-EXPRESSIONS>
      LISP-UNIT2> (print-summary (run-tests :tests '(t2-values-expressions)))
      LISP-UNIT2::T2-VALUES-EXPRESSIONS - FAILED (0.00s) : 1 assertions passed
        | FAILED (2)
        | Failed Form: (ASSERT-EQUAL (VALUES 1 2) (VALUES 1 3))
        | Expected 1; 2
        | but saw 1; 3
        | Failed Form: (ASSERT-EQUAL (VALUES 1 2 3) (VALUES 1 3 2))
        | Expected 1; 2; 3
        | but saw 1; 3; 2
        |
        |
      Test Summary for :LISP-UNIT2 (1 tests 0.00 sec)
        | 3 assertions total
        | 1 passed
        | 2 failed
        | 0 execution errors
        | 0 warnings
        | 0 empty
        | 0 missing tests
      #<TEST-RESULTS-DB Tests:(1) Passed:(1) Failed:(2) Errors:(0) Warnings:(0) {10263B53C3}>
      
    2. Closures.

      Lisp-Unit2 will not see variables declared in a closure encompassing the test. The following will throw an error and drop you into the debugger.

      (let ((l1 '(#\a #\B #\z))
            (l2 '(97 66 122)))
        (define-test t2-loop-closure
          (:tags '(tf-multiple tf-basic))
        (loop for x in l1 for y in l2 do
          (assert-equal (char-code x) y))))
      
    3. Calling another test from a test

      Since we know that tests are functions in lisp-unit2, we can just have test t3 call test t2 directly rather than indirectly running through the RUN-TESTS function.
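
      The t2 test called below was not defined earlier in this section. A definition along the following lines (hypothetical, but consistent with the composed report shown below) would produce these results:

      (define-test t2
          (:tags '(tf-calling-other-tests))
        (assert-equal 1 1)
        (assert-equal 1 2)
        (assert-equal 2 3))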

      (define-test t3 ; a test that tries to call another test in its body
          (:tags '(tf-calling-other-tests))
                (assert-equal 'a 'a)
                (t2))
      
      (print-summary (run-tests :tests '(t3)))
      LISP-UNIT2-EXAMPLES::T3 - FAILED (0.00s) : 2 assertions passed
        | FAILED (2)
        | Failed Form: (ASSERT-EQUAL 1 2)
        | Expected 1
        | but saw 2
        | Failed Form: (ASSERT-EQUAL 2 3)
        | Expected 2
        | but saw 3
        |
        Test Summary for :LISP-UNIT2-EXAMPLES (1 tests 0.00 sec)
        | 4 assertions total
        | 2 passed
        | 2 failed
        | 0 execution errors
        | 0 warnings
        | 0 empty
        | 0 missing tests
      #<TEST-RESULTS-DB Tests:(1) Passed:(2) Failed:(2) Errors:(0) Warnings:(0) {1009F73183}>
      

      Unlike lisp-unit, everything works as expected and we actually got composed results.

  4. Suites, Tags, Packages and other multiple test abilities

    If run-tests is called without any keyword parameters, it will run all the tests in the current package. It accepts keyword parameters for :tests, :tags and :package.

    (lisp-unit2:run-tests &key tests tags package reintern-package)
    
    1. Lists of tests

      As previously stated, Lisp-unit2 can run lists of tests.

      (run-tests :tests  '(t1 t2))
      
    2. Tags

      As you would expect, you can run all the tests having a specific tag. In the following example we wrap run-tests in a call to print-summary in order to get useful results:

      (print-summary (run-tests :tags '(tf-basic)))
      
      (with-summary ()
        (t1-fail)) ;; here we just call t1-fail as a function. We need WITH-SUMMARY or PRINT-SUMMARY to get results printed
      
      Starting: LISP-UNIT2::T1-FAIL
      LISP-UNIT2::T1-FAIL - FAILED (0.00s) : 0 assertions passed
        | FAILED (1)
        | Failed Form: (ASSERT-EQL 1 2)
        | Expected 1
        | but saw 2
        |
      

      Tags can be listed using LIST-TAGS.

  5. Fixtures and Contexts

    What we have been referring to as fixtures is called contexts in Lisp-Unit2.
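
    As a hedged sketch (the names here are hypothetical), a context is just a function that takes the test body as a function and wraps it, and you attach it to a test with the :contexts option:

    (defvar *fixture-data* nil)

    (defun with-fixture-data (body-fn)
      ;; bind the fixture data for the duration of the test body
      (let ((*fixture-data* (list 1 2 3)))
        (funcall body-fn)))

    (define-test t-uses-context
        (:tags '(tf-basic)
         :contexts #'with-fixture-data)
      (assert-equal 3 (length *fixture-data*)))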

  6. Writing Summary to file

    The write-tap-to-file macro takes a form that generates a report and writes the result to a file in TAP format.

    We show the results in the TAP format for the successful t1 test and the deliberately failing t1-fail test.

    (write-tap-to-file (run-tests :tests 't1) #P "/tmp/lisp-unit2.tap")
    
    cat /tmp/lisp-unit2.tap
    TAP version 13
    1..1
    ok 1 LISP-UNIT2::T1 (0.00 s)
    
    (write-tap-to-file (run-tests :tests 't1-fail) #P "/tmp/lisp-unit2.tap")
    
    cat /tmp/lisp-unit2.tap
    TAP version 13
    1..1
    not ok 1 LISP-UNIT2::T1-FAIL (0.00 s)
        ---
         # FAILED (1)
         # Failed Form: (ASSERT-EQL 1 2)
         # Expected 1
         # but saw 2
         #
         #
        ...
    

    Or we can wrap the call in with-open-file, targeting lisp-unit2::*test-stream*, to write any of the other formats to a file.

    (with-open-file (*test-stream* #P "/tmp/lisp-unit2.summary"  :direction :output
                                                   :if-exists :supersede
                                                   :external-format :utf-8
                                                   :element-type :default)
      (print-summary (run-tests :tests 't1-fail)))
    
    cat /tmp/lisp-unit2.summary
    LISP-UNIT2::T1-FAIL - FAILED (0.00s) : 0 assertions passed
      | FAILED (1)
      | Failed Form: (ASSERT-EQL 1 2)
      | Expected 1
      | but saw 2
      |
      |
    Test Summary for :LISP-UNIT2 (1 tests 0.00 sec)
      | 1 assertions total
      | 0 passed
      | 1 failed
      | 0 execution errors
      | 0 warnings
      | 0 empty
      | 0 missing tests
    


    1. Fixtures and Freezing Data

      Lisp-unit2 refers to fixtures as "context". The following is an example from the source files of how to build up a context that can be used in a test.

      (defun meta-test-context (body-fn)
        (let ((lisp-unit2::*test-db* *example-db*)
              *debugger-hook*
              (lisp-unit2::*test-stream* (make-broadcast-stream)))
          (handler-bind
              ((warning #'muffle-warning))
            (funcall body-fn))))
      
      (defmacro with-meta-test-context (() &body body)
        `(meta-test-context
          (lambda () ,@body)))
      
      (define-test test-with-test-results (:tags '(meta-tests)
                                           :contexts #'meta-test-context)
        (let ( results )
          (lisp-unit2:with-test-signals-muffled ()
            (lisp-unit2:with-test-results (:collection-place results)
              (lisp-unit2:run-tests :tags 'warnings)
              (lisp-unit2:run-tests :tags 'examples)))
          ;; subtract-integer-test calls run-tests
          (assert-eql 3 (length results))
          (assert-typep 'lisp-unit2::test-results-db (first results))))
      
  7. Removing tests

    Lisp-unit2 has an uninstall-test function and an undefine-test macro.
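
    A sketch of how these might be used; the exact argument conventions are an assumption, so check the lisp-unit2 documentation:

    (uninstall-test 't1)    ; assumption: takes the name of the test to remove

    ;; assumption: undefine-test mirrors the define-test form it removes
    (undefine-test t1-fail
        (:tags '(tf-basic)))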

  8. Sequencing, Random and Failure Only

    Tests are run in sequential order and I did not see any shuffle functionality. As noted above, if you store the results of a run you can call rerun-failures on those results to run only the previously failing tests.

  9. Skip Capability

    None noted

  10. Random Data Generators

    Lisp-unit2 has various functions for generating random data. See examples below:

    (complex-random #C(5 3))
    #C(4 1)
    
    (make-random-2d-array 2 3)
    #2A((0.03395796 0.55509293 0.34209597) (0.5823394 0.8771157 0.29430425))
    
    (make-random-2d-list 2 3)
    ((0.18096626 0.916595 0.88126934) (0.45945048 0.8838378 0.57314146))
    
    (make-random-list 3)
    (0.5449568 0.32319236 0.7780224)
    
    (make-random-state)
    #S(RANDOM-STATE :STATE #.(MAKE-ARRAY 627 :ELEMENT-TYPE '(UNSIGNED-BYTE 32)
                                         :INITIAL-CONTENTS
                                         '(0 2567483615 454 2531281407 4203062579
                                           3352536227 284404050 622556438
                                           ...)))
    

23.4 Discussion

Hmm. My ccl version 1.12 LinuxX8664 decided not to compile Lisp-Unit2 with the following error:

Read error between positions 2071 and 2506 in /home/sabra/quicklisp/dists/quicklisp/software/lisp-unit2-20180131-git/interop.lisp.
> Error: Reader error: No external symbol named "*SYSTEMS-BEING-OPERATED*" in package #<Package "ASDF/OPERATE"> .
> While executing: CCL::%PARSE-TOKEN, in process listener(1).

top

23.5 Who Uses Lisp-Unit2

Many libraries on quicklisp use Lisp-Unit2. If you have quicklisp, you can get a list of those with:

(ql:who-depends-on :lisp-unit2)

top

24 nst

24.1 Summary

homepage John Maraist LLGPL3 latest 2021

You get a sense of what NST is focused on when the README starts with fixtures and states that the criterion testing has its own DSL.

This focus on fixtures is further reinforced by the definition of a test (what I have been calling an assertion in the context of the other frameworks):

(def-test NAME ( [ :group GROUP-NAME ]
                  [ :setup FORM ]
                  [ :cleanup FORM ]
                  [ :startup FORM ]
                  [ :finish FORM ]
                  [ :fixtures (FIXTURE FIXTURE ... FIXTURE) ]
                  [ :aspirational FLAG ]
                  [ :documentation STRING ] )
    criterion
 FORM ... FORM)

Obviously this is a framework intended for complexity. That brings two problems. The first is the learning curve for the DSL. The second is the overhead that the infrastructure brings with it. Look at the stacked-ranking-benchmarks. It is almost as bad as clunit in runtime and multiple orders of magnitude worse than anything else in bytes consed and eval-calls.

If you do not require the ability to handle serious complexity in your tests, look elsewhere.

24.2 Assertion Functions

assert-criterion assert-eq assert-eql
assert-equal assert-equalp assert-non-nil
assert-not-eq assert-not-eql assert-not-equal
assert-not-equalp assert-null assert-zero

top

24.3 Usage

First some terminology. What I have been referring to as assertions, NST refers to as a test. What I have been referring to as a test, NST refers to as a test-group.

Second, NST has its own DSL that you need to understand in order to use it. It can obviously handle complex systems, but that comes at a learning curve cost and some things that I find easy in CL I do not find as easy in NST's criterion language (probably speaks to my limitations).

  1. Report Format

    The level of detail from reports is dependent on the verbosity setting. Valid commands are:

    (nst-cmd :set :verbose :silent)
    (nst-cmd :set :verbose :quiet)
    (nst-cmd :set :verbose :verbose)
    (nst-cmd :set :verbose :vverbose)
    (nst-cmd :set :verbose :trace)
    

    You can get a little more detail using the :detail parameter.

    (nst-cmd :detail [blank, package-name, group-name, or group-name *and* test-name])
    

    To switch to interactive debug behavior the following commands are necessary:

    (nst-cmd :debug-on-error t)
    (nst-cmd :debug-on-fail t)
    
    (nst-cmd :debug) ;; will set both to t
    

    The README contains the following warning: "This behavior is less useful than it may seem; by the time the results of the test are examined for failure, the stack from the actual form evaluation will usually have been released. Still, this switch is useful for inspecting the environment in which a failing test was run."

    We will show the different reporting levels with the first two groups of examples.

  2. Basics

    NST requires that you have groups defined and each test must belong to a group. If you are defining the tests within the definition of a group, you do not need to specify the group. The empty form after the group name in the following example is where fixtures would go.

    1. All Passing Basic Test
      (def-test-group tf-basic-pass ()
        (def-test t1-1
          :true (= 1 1))
        (def-test t1-2
          :true (not (= 1 2)))
        (def-test t1-3
            (:eq 'a) 'a)
        (def-test t1-4
          :forms-eq (cadr '(a b c))
          (caddr '(a c b)))
        (def-test t1-5
            (:err :type division-by-zero)
          (error 'division-by-zero)))
      

      Now you need to run a mini command line to run tests and get a report. We will go from least verbose to most verbose.

      1. Silent
        (nst-cmd :set :verbose :silent)
        
        (nst-cmd :run tf-basic-pass)
        Running group TF-BASIC-PASS
        Group TF-BASIC-PASS: 5 of 5 passed
        TOTAL: 5 of 5 passed (0 failed, 0 errors, 0 warnings)
        (nst-cmd :run t1)
        Check T1 (group TF-BASIC) passed
        TOTAL: 1 of 1 passed (0 failed, 0 errors, 0 warnings)
        
      2. Quiet
        (nst-cmd :set :verbose :quiet)
        
        (nst-cmd :run tf-basic-pass)
        Running group TF-BASIC-PASS
        - Executing test T1-1
        Check T1-1 (group TF-BASIC-PASS) passed.
        - Executing test T1-2
        Check T1-2 (group TF-BASIC-PASS) passed.
        - Executing test T1-3
        Check T1-3 (group TF-BASIC-PASS) passed.
        - Executing test T1-4
        Check T1-4 (group TF-BASIC-PASS) passed.
        - Executing test T1-5
        Check T1-5 (group TF-BASIC-PASS) passed.
        Group TF-BASIC-PASS: 5 of 5 passed
        TOTAL: 5 of 5 passed (0 failed, 0 errors, 0 warnings)
        
      3. VVerbose (verbose is generally the same as quiet)
        (nst-cmd :set :verbose :vverbose)
        (nst-cmd :run tf-basic-pass)
        Running group TF-BASIC-PASS
        Starting run loop for #S(NST::GROUP-RECORD
                                 :NAME TF-BASIC-PASS
                                 :ANON-FIXTURE-FORMS NIL
                                 :ASPIRATIONAL NIL
                                 :GIVEN-FIXTURES NIL
                                 :DOCUMENTATION NIL
                                 :TESTS #<HASH-TABLE :TEST EQ :COUNT 5 {1008848E43}>
                                 :FIXTURES-SETUP-THUNK NIL
                                 :FIXTURES-CLEANUP-THUNK NIL
                                 :WITHFIXTURES-SETUP-THUNK NIL
                                 :WITHFIXTURES-CLEANUP-THUNK NIL
                                 :EACHTEST-SETUP-THUNK NIL
                                 :EACHTEST-CLEANUP-THUNK NIL
                                 :INCLUDE-GROUPS NIL)
        - Executing test T1-1
        Applying criterion :TRUE
        to (MULTIPLE-VALUE-LIST (= 1 1))
        Result at :TRUE is Check T1-1 (group TF-BASIC-PASS) passed.
        Check T1-1 (group TF-BASIC-PASS) passed.
        - Executing test T1-2
        Applying criterion :TRUE
        to (MULTIPLE-VALUE-LIST (NOT (= 1 2)))
        Result at :TRUE is Check T1-2 (group TF-BASIC-PASS) passed.
        Check T1-2 (group TF-BASIC-PASS) passed.
        - Executing test T1-3
        Applying criterion :EQ 'A
        to (MULTIPLE-VALUE-LIST 'A)
        Result at :EQ is Check T1-3 (group TF-BASIC-PASS) passed.
        Check T1-3 (group TF-BASIC-PASS) passed.
        - Executing test T1-4
        Applying criterion :FORMS-EQ
        to (LIST (CADR '(A B C)) (CADDR '(A C B)))
        Applying criterion :PREDICATE EQ
        to (LIST (CADR '(A B C)) (CADDR '(A C B)))
        Result at :PREDICATE is Check T1-4 (group TF-BASIC-PASS) passed.
        Result at :FORMS-EQ is Check T1-4 (group TF-BASIC-PASS) passed.
        Check T1-4 (group TF-BASIC-PASS) passed.
        - Executing test T1-5
        Applying criterion :ERR :TYPE DIVISION-BY-ZERO
        to (MULTIPLE-VALUE-LIST (ERROR 'DIVISION-BY-ZERO))
        Result at :ERR is Check T1-5 (group TF-BASIC-PASS) passed.
        Check T1-5 (group TF-BASIC-PASS) passed.
        Group TF-BASIC-PASS: 5 of 5 passed
        TOTAL: 5 of 5 passed (0 failed, 0 errors, 0 warnings)
        
    2. All Failing Basic Test

      Now a test where everything fails or creates an error. This time we define the test-group separately, then each test separately. As a result we need to specify the group in each test definition. You can insert a :documentation string into a test, but it only gets printed at the vverbose level of verbosity.

      (def-test-group tf-basic-fail ())
      
      (def-test (t1-1 :group tf-basic-fail)
        :true (= 1 2))
      (def-test (t1-2 :group tf-basic-fail)
        :true (not (= 2 2)))
      (def-test (t1-3 :group tf-basic-fail)
          (:eq 'a) 'b)
      (def-test (t1-4 :group tf-basic-fail)
        :forms-eq (cadr '(a d c))
        (caddr '(a c b)))
      (def-test (t1-5 :group tf-basic-fail)
          (:err :type division-by-zero)
        (error 'floating-point-overflow))
      

      Again, going from least verbose to most verbose.

      1. Silent
        (nst-cmd :set :verbose :silent)
        
        (nst-cmd :run tf-basic-fail)
        Running group TF-BASIC-FAIL
        Group TF-BASIC-FAIL: 0 of 5 passed
        - Check T1-1 failed
        - Expected non-null, got: NIL
        - Check T1-2 failed
        - Expected non-null, got: NIL
        - Check T1-3 failed
        - Value B not eq to value of A
        - Check T1-4 failed
        - Predicate EQ fails for (D B)
        - Check T1-5 raised an error
        TOTAL: 0 of 5 passed (4 failed, 1 error, 0 warnings)
        
      2. Quiet
        (nst-cmd :set :verbose :quiet)
        
        (nst-cmd :run tf-basic-fail)
        Running group TF-BASIC-FAIL
        - Executing test T1-1
        Check T1-1 (group TF-BASIC-FAIL) failed
        - Expected non-null, got: NIL
        - Executing test T1-2
        Check T1-2 (group TF-BASIC-FAIL) failed
        - Expected non-null, got: NIL
        - Executing test T1-3
        Check T1-3 (group TF-BASIC-FAIL) failed
        - Value B not eq to value of A
        - Executing test T1-4
        Check T1-4 (group TF-BASIC-FAIL) failed
        - Predicate EQ fails for (D B)
        - Executing test T1-5
        Check T1-5 (group TF-BASIC-FAIL) raised an error
        Group TF-BASIC-FAIL: 0 of 5 passed
        - Check T1-1 failed
        - Expected non-null, got: NIL
        - Check T1-2 failed
        - Expected non-null, got: NIL
        - Check T1-3 failed
        - Value B not eq to value of A
        - Check T1-4 failed
        - Predicate EQ fails for (D B)
        - Check T1-5 raised an error
        TOTAL: 0 of 5 passed (4 failed, 1 error, 0 warnings)
        

        For the sake of brevity, we will skip verbose and vverbose and trace. You get the picture.

  3. Edge Cases: Value expressions, loops, closures and calling other tests
    1. Value expressions

      NST has a values criterion that looks at the results coming from a values expression individually. Otherwise it will only look at the first value. The following two versions pass.

      (def-test-group tf-basic-values ()
        (def-test (t1-1 :group tf-basic-values)
            (:equalp (values 1 2)) 1)
      
        (def-test (t1-2 :group tf-basic-values)
            (:values (:eql 1) (:eql 2)) (values 1 2)))
      
    2. Looping and closures.

      NST does not have a loop construct. What it does have is :each, which takes a criterion and applies it to every item of a list. In the following examples, we first check that every element of the list is the symbol a, and then apply write-to-string to each element and compare the result to "A".

      (def-test-group tf-basic-each ())
      
      (def-test (each1 :group tf-basic-each)
          (:each (:symbol a))
        '(a a a a a))
      
      (def-test (each2 :group tf-basic-each)
          (:each (:apply write-to-string (:equal "A")))
        '(a a a a a))
      

      Like the clunits and lisp-units, nst does not look for variables in a closure surrounding the test definition. The following will not work.

      (let ((lst '(a a a a a)))
        (def-test (each3 :group tf-basic-each)
            (:each (:apply write-to-string (:equal "A")))
          lst))
      

      When I tried calling other tests from inside an NST test, I triggered stack exhaustion errors. So probably do not do that.

  4. Suites, tags and other multiple test abilities

    You can define an nst-package which can contain nst-groups which can contain nst-tests. That seems to be the limit of nestability. So an nst-package is what I have been calling a suite when talking about other frameworks.

    top

  5. Fixtures and Freezing Data

    NST has fixtures. And fixtures. And… The following is just a simple example; please look at the documentation for more details. First we define three groups of fixtures. We intend to use the first one at the test-group level (all tests in that group have access) and the next two at the individual test level (only the tests specifying them have access). We will pretend the first two groups have an expensive calculation that we want to cache to avoid repeating it every time the fixture is called. The empty form after the fixture group name takes a lot of different options; you will have to read the documentation for those.

    (def-fixtures suite-fix-1 ()
      (sf1-a '(1 2 3 4))
      ((:cache t) sf1-b (* 23 47)))
    
    (def-fixtures test-fix-1 ()
      (tf1-a '("a" "b" "c" "d"))
      ((:cache t) tf1-b (- 2 1)))
    
    (def-fixtures test-fix-2 ()
      (tf2-a "some boring string here"))
    

    Now we define a test-group that uses those fixtures. In test t3, we do not need to call out the suite level fixture in the list of fixtures to be accessed by that test.

    (nst:def-test-group tf-nst-fix (suite-fix-1)
      (def-test (t1 :group tf-nst-fix :fixtures (test-fix-1))
          (:eql tf1-b) 1)
      (def-test (t2 :group tf-nst-fix :fixtures (test-fix-1))
        (:each (:apply length (:equal 1)))
        tf1-a)
      (def-test (t3 :group tf-nst-fix :fixtures (test-fix-1 test-fix-2))
         (:equal "some boring string here-a1-1081")
         (format nil "~a-~a~a-~a" tf2-a (first tf1-a) tf1-b sf1-b)))
    
  6. Removing tests

    I do not see a function for removing tests.

  7. Sequencing, Random and Failure Only

    I did not see any shuffle functionality and the tests seem to run only in sequential order.

    There is a make-failure-report function, but I did not see something that looked like the ability to rerun just failing tests.

  8. Skip Capability

    NST seems to offer skips only in the context of running interactively and letting a condition handler in the debugger ask you if you want to skip the test-group or remaining tests.

  9. Random Data Generators

    NST has extensive random data generators. Please see the documentation for details.

24.4 Discussion

We have already seen in the benchmarking section that either I am doing something wrong with NST or its infrastructure overhead creates speed issues. It can obviously handle very complex systems. That comes, however, at the cost of having to learn a new DSL, and I found the criterion learning curve much steeper than I expected.

Take something as simple as a list of characters and integers and validating that the integer is the char-code for the character. First a plain CL version. There are a lot of different ways to do this in CL using every or loop or mapcar. Below is just one of those ways.

(defparameter *test-lst* '((#\a 97) (#\b 98) (#\c 99) (#\d 100)))

(every #'(lambda (x)
           (eq (char-code (first x)) (second x)))
       *test-lst*)

Now an NST version. There may be better ways to write this but as far as I can tell :apply does not accept a lambda function.

(defparameter *test-lst* '((#\a 97) (#\b 98) (#\c 99) (#\d 100)))

(defun test-char-code-sublist (lst)
  (eq (char-code (first lst))
      (second lst)))

(def-fixtures fixture1 ()
  (lst *test-lst*))

(nst:def-test-group tf-nst ()
  (def-test (t-char-codes :group tf-nst :fixtures (fixture1))
    (:each (:apply test-char-code-sublist (:eq t)))
    lst))

I find myself writing a lot more than I feel necessary in what seems like simple situations. A large part may be simply because I am not going to get far enough up the learning curve for NST given my needs. An article introducing NST claims:

[For simple examples] "the overhead of a separate criteria language seems hardly justifiable. In fact, the criteria bring two primary advantages over Lisp forms. First, criteria can report more detailed information than just pass or fail. In a larger application where the tested values are more complicated objects and structures, the reason for a test's failure may be more subtle. More informative reports can significantly assist the programmer, especially when validating changes to less familiar older or others' code. Moreover, NST's criteria can report multiple reasons for failure. Such more complicated analyses can reduce the boilerplate involved in writing tests; one test against a conjunctive criterion can provide as informative a result as a series of separate tests on the same form. As a project grows larger and more complex, and as a team of programmers and testers becomes less intimately familiar with all of the components of a system, criteria can both reduce tests' overall verbosity, while at the same time raising the usefulness of their results."

Unfortunately for me, many times in trying to learn the DSL, NST reported that it raised an error but refused to tell me what the error was, regardless of the level of verbosity I set.

Maybe it is just me, but every time I tried to redefine a test, I triggered a hash table error and needed to define a test with a new name.

24.5 Who Uses NST

25 parachute

top

25.1 Summary

homepage Nicolas Hafner zlib 2021

It hits almost everything on my wish list - optional progress reports and debugging, good suite setup and reporting, good default error reporting with the ability to provide diagnostic strings with variables, the ability to skip failing dependencies, time limits on long-running tests, and decent fixture capability. It does not have the built-in ability to re-run just the last failing tests, but that is a relatively easy add-on (see Discussion). The bigger limitation is that fixtures at a parent test (suite) level do not apply to nested child tests. While it is not the fastest, it is in the pack as opposed to the also-rans.

The name of a test is coerced into a string internally, so test names are not functions that can be called on their own.

There are three types of reports: quiet, plain (the default) and interactive (throwing you into the debugger).

Parachute does allow you to set time limits for tests and will report the times for tests.
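
As a short sketch, assuming the :time-limit option on define-test (the value is in seconds):

(define-test t-too-slow
  :time-limit 0.1
  ;; sleeps longer than the limit, so the test should be reported as failed
  (sleep 1)
  (true t))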

My wish list would be for (1) the ability of fixtures to apply down the list of nested child tests and (2) for there to be a built-in ability for tests to keep a list of the last tests that failed so that you could just run those over again after you think you have fixed all your bugs.

25.2 Assertion Functions

true false fail is isnt
is-values isnt-values of-type finish  

As with other frameworks, finish simply indicates that the test does not produce an error.
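
A minimal sketch: the assertion passes as long as the wrapped form returns normally.

(define-test t-finish
  ;; passes because parse-integer returns without signalling an error
  (finish (parse-integer "42")))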

top

25.3 Usage

  1. Report Format

    Parachute provides three types of reports at the moment. Each returns a test-result object as well as printing a report to stream.

    • Quiet for when you just want the summary
    • Interactive for when you want to go into the debugger on failures, and
    • Plain (the default) for the nice progress report with checks and timing and …

    So a basic failing test to show the differences in reporting. We are passing the third assertion a string after the two tested items which can help diagnose failures, followed by the two variables being compared, and then running it to show the default failure report.

    (define-test t1-fail
      (let ((x 1) (y 2))
        (is = 1 2)
        (is equal 1 2)
        (is =  x y "Intentional failure ~a does not equal ~a" x y)
         (fail (error 'floating-point-overflow)
           'division-by-zero)))
    

    Now the quiet report version:

    1. Quiet
      (test 't1-fail :report 'quiet)
      #<QUIET 5, FAILED results>
      
    2. Default "Plain" Report
      (test 't1-fail)
              ? TF-PARACHUTE::T1-FAIL
        0.000 ✘   (is = 1 2)
        0.000 ✘   (is equal 1 2)
        0.000 ✘   (is = x y)
        0.000 ✘   (fail (error 'floating-point-overflow) 'division-by-zero)
        0.010 ✘ TF-PARACHUTE::T1-FAIL
      
      ;; Summary:
      Passed:     0
      Failed:     4
      Skipped:    0
      
      ;; Failures:
         4/   4 tests failed in TF-PARACHUTE::T1-FAIL
      The test form   2
      evaluated to    2
      when            1
      was expected to be equal under =.
      
      The test form   2
      evaluated to    2
      when            1
      was expected to be equal under EQUAL.
      
      The test form   y
      evaluated to    2
      when            1
      was expected to be equal under =.
      Intentional failure 1 does not equal 2
      
      The test form   (capture-error (error 'floating-point-overflow))
      evaluated to    [floating-point-overflow] arithmetic error floating-point-overflow signalled
      when            division-by-zero
      was expected to be equal under TYPEP.
      
      #<PLAIN 5, FAILED results>
      

      If you start looking at summary reports for nested tests and notice that the number of test results is not greater than the number of assertions, just remember that a nested test is itself considered an assertion and will pass if all its assertions pass or fail if any of its assertions fail.

    3. Interactive

      The interactive report which throws you into the debugger:

      (test 't1-fail :report 'interactive)
        Test (is = 1 2) failed:
        The test form   2
        evaluated to    2
        when            1
        was expected to be equal under =.
           [Condition of type SIMPLE-ERROR]
      
        Restarts:
         0: [RETRY] Retry testing (is = 1 2)
         1: [ABORT] Continue, failing (is = 1 2)
         2: [CONTINUE] Continue, skipping (is = 1 2)
         3: [PASS] Continue, passing (is = 1 2)
         4: [RETRY] Retry testing TF-PARACHUTE::T1-FAIL
         5: [ABORT] Continue, failing TF-PARACHUTE::T1-FAIL
      
  2. Basics

    So let's look at a test where we know everything will pass, using the default report. This will give us a view of the syntax for various types of assertions.

    (define-test t1
      (true (= 1 1))
      (true "happy")
      (false (numberp "no"))
      (of-type integer 5)
      (of-type character #\space)
      (is = 1 1)
      (is equal "abc" "abc")
      (isnt equal "abc" "d")
      (is-values (values 1 2)
        (= (values 1 2)))
      (is-values (values 1 "a")
        (= 1)
        (string= "a"))
      (fail (error 'division-by-zero)
          'division-by-zero))
    
    (test 't1)
    
            ? TF-PARACHUTE::T1
      0.000 ✔   (true (= 1 1))
      0.000 ✔   (true "happy")
      0.000 ✔   (false (numberp "no"))
      0.000 ✔   (of-type integer 5)
      0.000 ✔   (of-type character #\ )
      0.000 ✔   (is = 1 1)
      0.000 ✔   (is equal "abc" "abc")
      0.000 ✔   (isnt equal "abc" "d")
      0.000 ✔   (is-values (values 1 2) (= (values 1 2)))
      0.000 ✔   (is-values (values 1 "a") (= 1) (string= "a"))
      0.000 ✔   (fail (error 'division-by-zero) 'division-by-zero)
      0.030 ✔ TF-PARACHUTE::T1
    
    ;; Summary:
    Passed:    11
    Failed:     0
    Skipped:    0
    #<PLAIN 12, PASSED results>
    

    As you would hope, changing tested functions does not require manually recompiling parachute tests. We will skip the proof.

  3. Edge Cases: Values expressions, loops, closures and calling other tests
    1. Values expressions

      Parachute has special functionality for dealing with values expressions with its is-values testing function as we saw just above. If you used another testing function and passed values expressions to it, they would be accepted but, as expected, Parachute would only look at the first value in the values expression.

    2. Now looping and closures

      Parachute has no problem with loops or finding variables that have been set in a closure containing the test.

        (let ((l1 '(#\a #\B #\z))
              (l2 '(97 66 122)))
        (define-test t2-loop
          (loop for x in l1 for y in l2 do
            (true (= (char-code x) y)))))
      
      (test 't2-loop :report 'quiet)
      #<QUIET 4, PASSED results>
      
    3. Calling a test inside another test
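      The t2 called inside t3 below is not defined earlier in this section. A definition along the following lines (hypothetical, but consistent with the nested report shown below, apart from the "t2 description here" string) would produce these results:

      (define-test t2
        (true (= 1 1))
        (true (= 2 1))
        (is-values (values 1 2)
          (= (values 1 2)))
        (is-values (values 1 "a")
          (= 1)
          (string= "a")))
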
      (define-test t3 ; a test that tries to call another test in its body
          (true (eql 'a 'a))
          (test 't2))
      
      (test 't3)
              ? TF-PARACHUTE::T3
        0.000 ✔   (true (eql 'a 'a))
              ?   TF-PARACHUTE::T2
        0.000 ✔     (true (= 1 1))
        0.000 ✘     (true (= 2 1))
        0.000 ✔     (is-values (values 1 2) (= (values 1 2)))
        0.000 ✔     (is-values (values 1 "a") (= 1) (string= "a"))
        0.000 ✘   TF-PARACHUTE::T2
      
      ;; Summary:
      Passed:     3
      Failed:     1
      Skipped:    0
      
      ;; Failures:
         1/   4 tests failed in TF-PARACHUTE::T2
      t2 description here
      The test form   (= 2 1)
      evaluated to    ()
      when            t
      was expected to be equal under GEQ.
      
        0.003 ✘ TF-PARACHUTE::T3
      
      ;; Summary:
      Passed:     1
      Failed:     1
      Skipped:    0
      
      ;; Failures:
         2/   3 tests failed in TF-PARACHUTE::T3
      Test for T2 failed.
      #<PLAIN 3, FAILED results>
      
  4. Suites, tags and other multiple test abilities
    1. Lists of tests

      Parachute can handle lists of tests.

      (test '(t1 t2))
      
    2. Suites

      Everything is a test in parachute and tests can have parent tests just by adding a :parent <insert-name-here>. This makes suite inheritance easy, as demonstrated below.

        (define-test s0)
      
        (define-test t4
          :parent s0
          (true (= 1 1))
          (false (= 1 2)))
      
      (define-test t4-1
        :parent t4
        (true (= 1 1))
        (false (= 1 2)))
      

      Now we can test 's0 and we will get the results for 't4 and 't4-1.

      (test 's0)
              ? TF-PARACHUTE::S0
              ?   TF-PARACHUTE::T4
        0.000 ✔     (true (= 1 1))
        0.000 ✔     (false (= 1 2))
              ?     TF-PARACHUTE::T4-1
        0.000 ✔       (true (= 1 1))
        0.000 ✔       (false (= 1 2))
        0.003 ✔     TF-PARACHUTE::T4-1
        0.010 ✔   TF-PARACHUTE::T4
        0.010 ✔ TF-PARACHUTE::S0
      
      ;; Summary:
      Passed:     4
      Failed:     0
      Skipped:    0
      #<PLAIN 7, PASSED results>
      

      top

  5. Fixtures and Freezing Data

    First checking whether we can freeze data, change it in the test, then change it back

    (defparameter *keep-this-data* 1)
    (define-test t-freeze-1
      :fix (*keep-this-data*)
      (setf *keep-this-data* "new")
      (true (stringp *keep-this-data*)))
    
    (define-test t-freeze-2
      (is = *keep-this-data* 1))
    
    (test '(t-freeze-1 t-freeze-2))
    

    Now the classic fixture - create a data set for a series of tests and restore the original data afterwards. Parachute's fixtures work by capturing and restoring the values of the places listed in :fix, so the pattern becomes: list the special variable in :fix, do the setup at the top of the test body, and let Parachute restore the old value when the test finishes.

    ;; Create a class for data fixture purposes
    (defclass class-A ()
      ((a :initarg :a :initform 0 :accessor a)
       (b :initarg :b :initform 0 :accessor b)))

    (defparameter *some-existing-data-parameter*
      (make-instance 'class-A :a 17.3 :b -12))

    (define-test t6-f1
      :fix (*some-existing-data-parameter*)
      (setf *some-existing-data-parameter*
            (make-instance 'class-A :a 100 :b -100))
      (is equal 100 (a *some-existing-data-parameter*))
      (is equal -100 (b *some-existing-data-parameter*)))

    ;; after running the test you can check (a *some-existing-data-parameter*)
    ;; to confirm that the original instance has been restored
    (test 't6-f1)
    

    Unfortunately, fixtures are not visible at the child test level. This is shown in the following example.

      (defparameter *my-param* 1)
      (define-test parent-test-1
          (with-fixtures '(*my-param*)
            (setf *my-param* 2)
            (is = 2 *my-param*)))
      (define-test child-test-1 :parent parent-test-1
        (is = 2 *my-param*)
        (is eq 'a 'a))
    
    (test 'parent-test-1)
            ? TF-PARACHUTE::PARENT-TEST-1
      0.000 ✔   (is = 2 *my-param*)
            ?   TF-PARACHUTE::CHILD-TEST-1
      0.000 ✘     (is = 2 *my-param*)
      0.000 ✔     (is eq 'a 'a)
      0.000 ✘   TF-PARACHUTE::CHILD-TEST-1
      0.003 ✘ TF-PARACHUTE::PARENT-TEST-1
    
    ;; Summary:
    Passed:     2
    Failed:     1
    Skipped:    0
    
    ;; Failures:
       1/   2 tests failed in TF-PARACHUTE::PARENT-TEST-1
       1/   2 tests failed in TF-PARACHUTE::CHILD-TEST-1
    The test form   *my-param*
    evaluated to    1
    when            2
    was expected to be equal under =.
    

    So effectively you cannot set suite level fixtures with Parachute. In case you were trying to reconcile the failure counts, there was only 1 assertion that failed. However two tests failed at the child-test-1 level - the assertion on *my-param* and, therefore, child-test-1 itself. Two tests failed at the parent-test-1 level - child-test-1 and, therefore, parent-test-1 itself.

  6. Removing tests

    Parachute can remove specific tests with remove-test or all the tests in a package with remove-all-tests-in-package.

    (remove-test 't1)
    
    (remove-all-tests-in-package optional-package-name)
    
  7. Sequencing, Random and Failure Only

    While tests normally run in sequential order, parachute allows you to shuffle either the assertions within a test or the tests within a suite by setting :serial NIL.

      (define-test shuffle
        :serial NIL
        ...)
    
    (define-test shuffle-suite :serial NIL)
    
  8. Skip Capability

    Parachute has multiple skip abilities including skipping based on assertions, tests or implementations.

    1. Assertions
      (define-test stuff
        (true :pass)
        (skip "Not ready yet"
          (is = 5 (some-unimplemented-function 10))))
      
    2. Tests
      (define-test suite
        :skip (test-a))
      
    3. Implementation
      (define-test stuff
        (skip-on (clisp) "Not supported on clisp."
          (is equal #p"a/b/" (merge-pathnames "b/" "a/"))))
      
  9. Random Data Generators

    None, but the helper libraries can fulfill this well.

25.4 Discussion

It is certainly possible to extend parachute to retest just the tests that failed last time. Since parachute can run against a list of tests, all you need is a function to save a list of the names of the tests that fail. The following might be one way to do that.

(defun collect-test-failure-names (test-results)
  "This function takes the report output of a parachute test and returns a list of the
   names of the tests that failed."
  (when (typep test-results 'parachute:report)
    (loop for test-result across (results test-results)
          when (and (typep test-result 'parachute::test-result)
                    (eq (status test-result) :failed))
            collect (name (expression test-result)))))
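
A sketch of how that helper might be used, assuming test returns the report object it prints:

(let ((failed (collect-test-failure-names (test '(t1 t1-fail) :report 'quiet))))
  ;; rerun only the tests whose names were collected as failures
  (when failed
    (test failed)))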

You might also consider whether the helper library Protest would be a good add-on as it has a Parachute module.

Since I questioned what is meant by extensibility at the very beginning of this report, allow me to quote the Parachute documentation:

"Extending Parachute Test and Result Evaluation

"Parachute follows its own evaluation semantics in order to run tests. Primarily this means that most everything goes through one central function called eval-in-context. This functions allows you to customise evaluation based on both what the context is, and what the object being "evaluated" is.

Usually the context is a report object, but other situations might also be conceived. Either way, it is your responsibility to add methods to this function when you add a new result type, some kind of test subclass, or a new report type that you want to customise according to your desired behaviour.

The evaluation of results is decoupled from the context and reports in the sense that their behaviour does not, by default, depend on it. At the most basic, the result class defines a single :around method that takes care of recording the duration of the test evaluation, setting a default status after finishing without errors, and skipping evaluation if the status is already set to something other than :unknown.

Next we have a result object that is interesting for anything that actually produces direct test results– value-result. Upon evaluation, if the value slot is not yet bound, it calls its body function and stores the return value thereof in the value slot.

However, the result type that is actually used for all standard test forms is the comparison-result. This also takes a comparator function and an expected result to compare against upon completion of the test. If the results match, then the test status is set to :passed, otherwise to :failed.

Since Parachute allows for a hierarchy in your tests, there have to be aggregate results as well, and indeed there are. Two of them, actually. First is the base case, namely parent-result which does two things on evaluation: one, it binds *parent* to itself to allow other results to register themselves upon construction, and two it sets its status to :failed if any of the children have failed.

Finally we have the test-result which takes care of properly evaluating an actual test object. What this means is to evaluate all dependencies before anything else happens, and to check the time limit after everything else has happened. If the time limit has exceeded, set the description accordingly and mark the result as :failed. For its main eval-in-context method however it checks whether any of the dependencies have failed, and if so, mark itself as :skipped. Otherwise it calls eval-in-context on the actual test object.

The default evaluation procedure for a test itself is to simply call all the functions in the tests list in a with-fixtures environment.

And that describes the semantics of default test procedures. Actual test forms like is are created through macros that emit an (eval-in-context context (make-instance 'comparison-result #|…|#)) form. The *context* object is automatically bound to the context object on call of eval-in-context and thus always refers to the current context object. This allows results to be evaluated even from within opaque parts like user-defined functions.

Report Generation

"It should be possible to get any kind of reporting behaviour you want by adding methods that specialise on your report object to eval-in-context. For the simple case where you want something that prints to the REPL but has a different style than the preset plain report, you can simply subclass that and specialise on the report-on and summarize functions that then produce the output you want.

Since you can control pretty much every aspect of evaluation rather closely, very different behaviours and recovery mechanisms are also possible to achieve. One final aspect to note is result-for-testable, which should return an appropriate result object for the given testable. This should only return fresh result objects if no result is already known for the testable in the given context. The standard tests provide for this, however they only ever return a standard test-result instance. If you need to customise the behaviour of the evaluation for that part, it would be a wise idea to subclass test-result and make sure to return instances thereof from result-for-testable for your report.

Finally it should be noted that if you happen to create new result types that you might want to run using the default reports, you should add methods to format-result that specialise on the keywords :oneline and :extensive for the type. These should return a string containing an appropriate description of the test in one line or extensively, respectively. This will allow you to customise how things look to some degree without having to create a new report object entirely." top

25.5 Who uses parachute

The following list is just pulling the results (ql:who-depends-on :parachute) and adding urls to a few of them. ("3b-hdr" 3d-matrices "3d-vectors" array-utils atomics "binpack" "canonicalized-initargs" cesdi "cl-elastic" "cl-markless" "class-options" "classowary" colored com-on "compatible-metaclasses" "definitions-systems" "enhanced-boolean" "enhanced-defclass" "enhanced-find-class" "enhanced-typep" "evaled-when" "fakenil" "first-time-value" float-features "inheriting-readers" "its" "method-hooks" mmap "nyaml/test" "object-class" "origin.test" pathname-utils "protest/parachute" "radiance" "shared-preferences" "shasht/test" "simple-guess" "slot-extra-options" "trivial-custom-debugger/test" "trivial-jumptables" uax-14 uax-9 "with-output-to-stream" "with-shadowed-bindings")

26 prove

top

26.1 Summary

homepage Eitaro Fukamachi MIT 2020

As most readers will know, the author has archived Prove in favor of Rove. Compared to Prove, Rove does have better failure reporting, is faster (but still not even in the middle of the pack) and has added fixtures. It is still missing some of the capabilities that Prove has, such as time limits on tests and test functions such as is-type, like and is-values.

Prove does report all assertion failures in a test and allows user-generated diagnostic messages, albeit without the ability to include variables. Interactive debugging is optional and it does have a *default-slow-threshold* parameter, which defaults to 150 milliseconds, to handle slow tests.

On the downside, it does not have fixtures and I find the situation with suites and tags to be confusing. I really want to be able to turn off progress reports. Finally, it is somewhat slower than most of the frameworks, but not the orders of magnitude slower that you are faced with using clunit, clunit2 or nst.

26.2 Assertion Functions

ok is isnt is-values is-type like
is-print is-error is-expand pass fail skip

top

26.3 Usage

  1. Report Format

    Set prove:*debug-on-error* to T to invoke the CL debugger whenever an error occurs while running tests.
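
    For example:

    (setf prove:*debug-on-error* t)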

    Prove has four different reporters (:list, :dot, :tap or :fiveam, with :list being the default) for different formatting. Set prove:*default-reporter* to the desired reporter to change the format. Let's take a very basic failing test just to see what the syntax and output look like.

    We have inserted diagnostic strings into the first two assertions. The second string has a format directive just to show that prove does not use them in these tests.

    (deftest t1-fail
      (let ((x 1) (y 2))
       (ok (= 1 2) "We know 1 is not = 2")
       (is 3 4 "We know 3 is not ~a" 4)
       (ok (equal 1 2))
       (ok (equal x y))))
    

    If we now use run-test:

    1. List Reporter
      (run-test 't1-fail)
       T1-FAIL
          × We know 1 is not  = 2
            NIL is expected to be T
      
          × We know 3 is not ~a
            3 is expected to be 4
      
          × NIL is expected to be T
      
          × NIL is expected to be T
      
    2. TAP Reporter
      (setf *default-reporter* :tap)
       (run-test 't1-fail)
      # T1-FAIL
          not ok 1 - We know 1 is not  = 2
          #    got: NIL
          #    expected: T
          not ok 2 - We know 3 is not ~a
          #    got: 3
          #    expected: 4
          not ok 3
          #    got: NIL
          #    expected: T
          not ok 4
          #    got: NIL
          #    expected: T
      not ok 4 - T1-FAIL
      NIL
      
    3. DOT Reporter
      (setf *default-reporter* :dot)
      (run-test 't1-fail)
      .
      NIL
      
    4. Fiveam Reporter
      (setf *default-reporter* :fiveam)
      (run-test 't1-fail)
      f
      #\f
      

      We will stick with the default reporter for the rest of this section.

  2. Basics

    Prove has a limited number of test functions so we can check them all out in a single test that we know will pass. Since everything passes, we will skip the diagnostic strings except for the one on the like assertion.

    (deftest t1
      (let ((x 1) (y 2))
        (ok (= x 1))
        (is #(1 2 3) #(1 2 3) :test #'equalp)
        (isnt y 3)
        (is-values (values 1 2) '(1 2))
        (is-type #(1 2 3) 'simple-vector)
        (like "su9" "\\d" "Do we have a digit in the tested string?")
        (is-print (princ "jabberwok") "jabberwok")
        (is-error (error 'division-by-zero) 'division-by-zero)))
    

    The like test function uses cl-ppcre regular expressions. The default list reporter will report this as:

    (run-test 't1)
     T1
        ✓ T is expected to be T
    
        ✓ #(1 2 3) is expected to be #(1 2 3)
    
        ✓ 2 is not expected to be 3
    
        ✓ (1 2) is expected to be (1 2)
    
        ✓ #(1 2 3) is expected to be a type of SIMPLE-VECTOR
    
        ✓ Do we have a digit in the tested string?
    
        ✓ (PRINC "jabberwok") is expected to output "jabberwok" (got "jabberwok")
    
        ✓ (ERROR 'DIVISION-BY-ZERO) is expected to raise a condition DIVISION-BY-ZERO (got #<DIVISION-BY-ZERO {10027D1673}>)
    NIL
    

    As you would hope, changing tested functions does not require manually recompiling prove tests. We will skip the proof.

  3. Edge Cases: Values expressions, loops, closures and calling other tests
    1. Value expressions

      Similar to NST and Parachute, Prove does have special functionality with respect to values expressions and can look at the individual values coming from a values expression.

      (deftest t2-values
        (is-values (values 1 2) '(1 2)) ; passes
        (is-values (values 1 2) '(1 3)) ; fails
        (ok (equalp (values 1 2) (values 1 3)))) ; passes
      
    2. Looping and closures.

      Prove has no problems with looping and taking variables declared in a closure surrounding the test. The following passes.

      (let ((l1 '(#\a #\B #\z))
            (l2 '(97 66 122)))
        (deftest t2-loop
          (loop for x in l1 for y in l2 do
            (ok (= (char-code x) y)))))
      
    3. Calling other tests

      Prove has a subtest macro which is intended to allow tests to be nested. So, for example, I compile the following and I get a nice indented report.

      (subtest "sub-1"
        (is 1 1)
        (is 1 2)
        (subtest "sub-2"
        (is 'a 'a)
        (is 'a 'b)))
       sub-1
          ✓ 1 is expected to be 1
      
          × 1 is expected to be 2
      
         sub-2
            ✓ A is expected to be A
      
            × A is expected to be B
      
      NIL
      
      
  4. Suites, tags and other multiple test abilities
    1. Lists of tests

      Prove does not provide a way to run against lists of tests.

    2. Suites

      In spite of the fact that all my examples above are really done in the REPL, prove is at heart based on files of tests. So even without looking for "suite" functions or classes, each file is effectively a suite. Each package is also considered a suite and the macro subtest also creates a suite.

      Prove provides multiple functions to run different sets of tests.

      • run runs a test which can be a file pathname, a directory pathname or an asdf system name.
      • run-test runs a single test as we have been using above.
      • run-test-all runs all the tests in the current package
      • run-test-package runs all the tests in a specific package
      • run-test-system runs a testing ASDF system.
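
      A hedged sketch of calling the runners listed above; the file, directory and system names are hypothetical:

      (run #P"t/my-tests.lisp")        ; a single test file
      (run #P"t/")                     ; every test file in a directory
      (run :my-app-test)               ; an ASDF system designator
      (run-test-all)                   ; every test in the current package
      (run-test-package :my-app-test)  ; every test in a package
      (run-test-system :my-app-test)   ; a testing ASDF system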

      top

  5. Fixtures and Freezing Data

    None

  6. Removing tests

    Prove has remove-test and remove-test-all
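
    A minimal sketch, assuming a test named t1 defined as above:

    (remove-test 't1)    ; remove a single test by name
    (remove-test-all)    ; remove all defined tests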

  7. Sequencing, Random and Failure Only

    Prove tests run sequentially and I do not see any shuffle or random order functionality. I also do not see a way to collect just the failing tests to be able to rerun just those.

  8. Skip Capability

    Prove can skip a specified number of tests using the skip function. Unfortunately it marks them as passed rather than skipped. You can provide a string as to why they were skipped, but why mark them as passed? In fact, why do you need to be counting tests? You should be able to mark particular tests as skipped.

    (skip 3 "No need to test these on Mac OS X")
    ;->  ✓ No need to test these on Mac OS X (Skipped)
    ;    ✓ No need to test these on Mac OS X (Skipped)
    ;    ✓ No need to test these on Mac OS X (Skipped)
    
  9. Random Data Generators

    None

26.4 Discussion

Prove has a lot of "market share", but I am not sure how much of that is due to cl-project and some of the other libraries by Eitaro Fukamachi like caveman2 and clack that hard code prove into what you are building. Whether you like prove or not, at least it was an attempt to get people to actually test their code.

In spite of the fact that the author has archived prove and stated that rove is now the successor, his libraries have not moved over to rove and prove still has functionality lacking in rove (and vice versa).

If I were to use prove, I would write another test reporter that did not have progress reports and would return a list of just failing tests. I would still have to write my own fixture macros. Or I could just use a framework that does that.

top

26.5 Who Uses Prove?

Many libraries on quicklisp use prove. If you have quicklisp, you can get a list of those with:

(ql:who-depends-on :prove)

top

27 ptester

top

27.1 Summary

homepage Kevin Layer LLGPL 2016

Ptester was released by Allegro. Phil Gold's commentary on ptester in his 2007 blog is still relevant today. "Ptester is barely a test framework. It has no test suites and no test functions. All it provides is a set of macros for checking function results (test (analogous to lisp-unit:assert-equal), test-error, test-no-error, test-warning, and test-no-warning) and a wrapper macro designed to enclose the test clauses which merely provides a count of success and failures at the end. ptester expects that all testing is done in predefined functions and lacks the dynamic approach present in other frameworks."

Yes, you can do testing with it, but you can do much better with other frameworks.

27.2 Assertion Functions

None - normal CL predicates resolving to T or nil

top

27.3 Usage

  1. Report Format

    Reporting or interactivity is optional. Set *break-on-test-failures* if you want to go into interactive debugging when a test failure occurs.
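
    For example:

    (setf *break-on-test-failures* t)   ; drop into the debugger on the first failure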

  2. Basics

    The test macro by default applies eql to the subsequent arguments. This can be changed by specifying the actual test to use. The following includes assertions about errors and warnings. The one item that might need a little explanation is the values test, where we explicitly flag to the test that it needs to look at multiple values.

      (with-tests (:name "t1")
        (test 1 1)
        (test 'a 'a)
        (test "ptester" "ptester" :test 'equal)
        (test  '(a b c) (values 'a 'b 'c) :multiple-values t)
        (test-error (error 'division-by-zero) :condition-type 'division-by-zero)
        (test-warning (warn "foo")))
    
    Begin t1 test
    **********************************
    End t1 test
    Errors detected in this test: 0
    Successes this test:6
    

    Now with a deliberately failing test. No, you cannot compare two values expressions with each other.

      (with-tests (:name "t2")
        (let ((x 2) (y 'd))
          (test x 1)
          (test y 'a)
          (test "ptester" "ptester" :test 'equal)
          (test  '(values 'a 'b 'c) (values 'a 'b 'c) :multiple-values t)
          (test-error (error 'division-by-zero) :condition-type 'floating-point-overflow)
          (test-warning (warn "foo"))))
    Begin t2 test
     * * * UNEXPECTED TEST FAILURE * * *
    Test failed: 1
      wanted: 2
         got: 1
     * * * UNEXPECTED TEST FAILURE * * *
    Test failed: 'A
      wanted: D
         got: A
     * * * UNEXPECTED TEST FAILURE * * *
    Test failed: (VALUES 'A 'B 'C)
      wanted values: VALUES, 'A, 'B, 'C
         got values: A, B, C
     * * * UNEXPECTED TEST FAILURE * * *
    Test failed: (ERROR 'DIVISION-BY-ZERO)
    Reason: detected an incorrect condition type.
      wanted: FLOATING-POINT-OVERFLOW
         got: #<SB-PCL::CONDITION-CLASS COMMON-LISP:DIVISION-BY-ZERO>
    **********************************
    End t2 test
    Errors detected in this test: 4 UNEXPECTED: 4
    Successes this test:2
    
  3. Edge Cases: Closures and calling other tests

    Ptester has no problem dealing with variables declared in a closure encompassing the test or with loops.

    Since ptester does not have a callable test "instance", a ptester test cannot call another test.

  4. Suites, tags and other multiple test abilities
    1. Lists of tests

      None except using with-tests

    2. Suites

      None except using with-tests

  5. Fixtures and Freezing Data

    None

  6. Removing tests

    None

  7. Sequencing, Random and Failure Only

    None

  8. Skip Capability

    None

  9. Random Data Generators

    None

27.4 Discussion

You can do better with other frameworks.

top

27.5 Who depends on ptester?

("cl-base64-tests" "getopt-tests" "puri-tests")

top

28 rove

28.1 Summary

homepage Eitaro Fukamachi BSD 3 Clause 2020

If you use package-inferred systems, there may be more capabilities than if you do not. Without a package-inferred system, you get no consolidated summary of all the tests. It does have fixtures that can be used once per package or once per test, but there is no ability to use different fixtures with respect to different tests and no composable fixtures. In addition, signal testing seems incomplete compared to other frameworks.

As noted in the functionality tables, there have been reports that rove crashes with multithreaded results. See https://tychoish.com/post/programming-in-the-common-lisp-ecosystem/: "rove doesn't seem to work when multi-threaded results effectively. It's listed in the readme, but I was able to write really trivial tests that crashed the test harness."

As mentioned in the Benchmarking section, I ran into an as yet unidentified issue with rove and sbcl. Several attempts to run the benchmark (10 iterations) on sbcl triggered heap exhaustion during garbage collection (even on a clean sbcl instance).

Compared to Prove, Rove does have better failure reporting, is faster (but still not even in the middle of the pack) and has added fixtures. It is still missing some of the capabilities that Prove has, such as time limits on tests and test functions such as is-type, like and is-values.

Given the multithreaded concerns, the issue I had with benchmarking and the missing functionality both with respect to non-package-inferred systems and in comparison to Prove, I cannot recommend Rove.

28.2 Assertion Functions

As mentioned above, Rove does not have as many assertion functions as Prove, the library it is supposed to be replacing. The assertion functions are limited to:

ok ng (not-good?) signals outputs expands pass fail

28.3 Usage

  1. Report Format

    Rove has three different styles of reporting. The default is the detailed :spec style; a simpler style that just shows dot progression is :dot; and a style that just reports the result is :none. Turning off progress reporting means using the :none style. We show them all in the first basic passing test.

    To go interactive rather than just getting a report, (setf rove:*debug-on-error* t).

  2. Basics

    Starting off with a basic multiple passing assertion test. We have added a macro and an assertion that uses the expands capability provided by Rove. I admit to not being entirely clear why deftest and testing are separate macros. Adding the testing macro allows a description string, but I am not seeing other additional functionality. Can anyone hit me with a clue stick?

    (defmacro defun-addn (n)
      (let ((m (gensym "m")))
        `(defun ,(intern (format nil "ADD~A" n)) (,m)
           (+ ,m ,n))))
    
    (deftest t1
      (testing "Basic passing test"
        (ok (equal 1 1))
        (ok (signals (error 'division-by-zero) 'division-by-zero))
        (ng (equal 1 2))
        (ok (expands '(defun-addn 10)
                 `(defun add10 (#:m)
                    (+ #:m 10))))))
    
    (rove:run-test 't1)
    t1
      Basic passing test
        ✓ Expect (EQUAL 1 1) to be true.
        ✓ Expect (ERROR 'DIVISION-BY-ZERO) to signal DIVISION-BY-ZERO.
        ✓ Expect (EQUAL 1 2) to be false.
        ✓ Expect '(DEFUN-ADDN 10) to be expanded to `(DEFUN ADD10 (#:M) (+ #:M 10)).
    
    ✓ 1 test completed
    T
    

    You can add a :compile-at keyword parameter to deftest. The available options are :definition-time (the default) or :run-time.

    You can add a :style keyword parameter to run-test to get different formats. The above was the default :spec style. Below we show the :dot and :none styles.

    (rove:run-test 't1 :style :dot)
    ....
    
    ✓ 1 test completed
    T
    (rove:run-test 't1 :style :none)
    T
    

    On to a failing test. In this case we pass a diagnostic string to the first two assertions. Rove does not allow variables to be passed to the diagnostic string. In the :spec style, Rove will show the parameters that were provided to the second assertion that failed.

    (deftest t2
      (testing "Basic failing test"
        (let ((x 1) (y 2))
          (ok (equal 1 2) "we know 1 is not equal to 2")
          (ok (equal x y) "we know ~a is not equal to ~a")
          (ok (equal (values 1 2) (values 1 2))))))
    T2
    ROVE> (run-test 't2)
    t2
      Basic failing test
        × 0) we know 1 is not equal to 2
        × 1) we know ~a is not equal to ~a
        ✓ Expect (EQUAL (VALUES 1 2) (VALUES 1 2)) to be true.
    
    × 1 of 1 test failed
    
    0) t2
         › Basic failing test
       we know 1 is not equal to 2
         (EQUAL 1 2)
    
    1) t2
         › Basic failing test
       we know ~a is not equal to ~a
         (EQUAL X Y)
             X = 1
             Y = 2
    

    Rove does not require you to manually recompile a test after a tested function has been modified.

  3. Edge Cases: Values Expressions, loops, closures and calling other tests
    1. Values Expressions

      Unlike Prove (or Lift or Parachute), Rove has no special functionality for dealing with values expressions. It accepts values expressions but only compares the first value in each. Thus the following passes:

      (deftest t2-values-expressions
          (testing "values expressions"
            (ok (equalp (values 1 2) (values 1 3)))
            (ok  (equalp (values 1 2 3) (values 1 3 2 7)))))
      
    2. Looping and closures

      Rove has no problem looping through assertions pulling the variables from a closure.

      (let ((l1 '(#\a #\B #\z))
            (l2 '(97 66 122)))
        (deftest t2-loop
          (loop for x in l1 for y in l2 do
            (ok (= (char-code x) y)))))
      
    3. Tests calling tests

      Rove tests can call other Rove tests. As with most frameworks, this results in two test results rather than a combined test result.

  4. Conditions

    We saw Rove checking an error condition in the first basic passing test. I want to show what happens when a condition assertion fails, because the result differs depending on whether the assertion function is ok or ng. It does not throw you into the debugger because we have *debug-on-error* set to nil, but it shows the typical debugger output.

    (deftest t7-wrong-error
      (ok (signals (error 'floating-point-overflow)
              'division-by-zero)))
    
    (rove:run-test 't7-wrong-error)
    t7-wrong-error
      × 0) Expect (ERROR 'FLOATING-POINT-OVERFLOW) to signal DIVISION-BY-ZERO. (3333ms)
    
    × 1 of 1 test failed
    
    0) t7-wrong-error
       Expect (ERROR 'FLOATING-POINT-OVERFLOW) to signal DIVISION-BY-ZERO.
       FLOATING-POINT-OVERFLOW: arithmetic error FLOATING-POINT-OVERFLOW signalled
         (SIGNALS (ERROR 'FLOATING-POINT-OVERFLOW) 'DIVISION-BY-ZERO)
    
         1: ((FLET "H0" :IN #:DROP-THRU-TAG-2) arithmetic error FLOATING-POINT-OVERFLOW signalled)
         2: (SB-KERNEL::%SIGNAL arithmetic error FLOATING-POINT-OVERFLOW signalled)
         3: (ERROR FLOATING-POINT-OVERFLOW)
         4: ((LABELS ROVE/CORE/ASSERTION::MAIN :IN #:DROP-THRU-TAG-2))
         5: ((FLET "MAIN0" :IN #:DROP-THRU-TAG-2))
         6: ((LAMBDA NIL))
         7: ((LAMBDA NIL :IN RUN-TEST))
         8: ((:METHOD ROVE/REPORTER:INVOKE-REPORTER (T T)) #<SPEC-REPORTER PASSED=0, FAILED=1> #<FUNCTION (LAMBDA NIL :IN RUN-TEST) {102D9E34FB}>)
         9: (SB-INT:SIMPLE-EVAL-IN-LEXENV (RUN-TEST (QUOTE T7-WRONG-ERROR)) #<NULL-LEXENV>)
         10: (EVAL (RUN-TEST (QUOTE T7-WRONG-ERROR)))
         11: (SWANK::EVAL-REGION (rove:run-test 't7-wrong-error)
         )
         12: ((LAMBDA NIL :IN SWANK-REPL::REPL-EVAL))
         13: (SWANK-REPL::TRACK-PACKAGE #<FUNCTION (LAMBDA NIL :IN SWANK-REPL::REPL-EVAL) {102D9E317B}>)
         14: (SWANK::CALL-WITH-RETRY-RESTART Retry SLIME REPL evaluation request. #<FUNCTION (LAMBDA NIL :IN SWANK-REPL::REPL-EVAL) {102D9E311B}>)
         15: (SWANK::CALL-WITH-BUFFER-SYNTAX NIL #<FUNCTION (LAMBDA NIL :IN SWANK-REPL::REPL-EVAL) {102D9E30FB}>)
    

    I am surprised, however, at the results if we change the assertion from ok to ng. We know it is going to be the wrong error, so I would have expected the ng assertion function to return a pass. But it does not.

    (deftest t7-wrong-error-NG
      (ng (signals (error 'floating-point-overflow)
              'division-by-zero)))
    
      × 0) Expect (ERROR 'FLOATING-POINT-OVERFLOW) not to signal DIVISION-BY-ZERO.
    
    × 1 of 1 test failed
    
    0) t7-wrong-error-ng
       Expect (ERROR 'FLOATING-POINT-OVERFLOW) not to signal DIVISION-BY-ZERO.
       FLOATING-POINT-OVERFLOW: arithmetic error FLOATING-POINT-OVERFLOW signalled
         (SIGNALS (ERROR 'FLOATING-POINT-OVERFLOW) 'DIVISION-BY-ZERO)
    ...
    

    signals returns either true or an error. ng expects T or NIL, and getting back an error triggers an error rather than a failure.
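
    If you really need a "does not signal this particular condition" check, one hedged workaround is to handle the condition yourself and assert on the result rather than relying on ng with signals:

    (deftest t7-not-that-error
      ;; passes when the form signals something other than DIVISION-BY-ZERO
      (ok (handler-case (progn (error 'floating-point-overflow) t)
            (division-by-zero () nil)
            (error () t))))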

  5. Suites, tags and other multiple test abilities
    1. Lists of tests

      Rove does not run lists of tests.

    2. Suites

      Rove's RUN-SUITE function will run all the tests in a particular package but does not accept a style parameter and simply prints out the results of each individual test, without summarizing.

      Rove's RUN function does accept a style parameter but seems to handle only package-inferred systems. I confirm issue #42 that it will not run with non-package inferred systems.

      Since the author really likes the lots of packages style of structuring CL programs, I would not be surprised if he recommends having lots of test packages as the equivalent of how other testing frameworks treat suites of tests.

      (run-suite :tf-rove)
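
      For a package-inferred test system, RUN would look something like the following sketch (the system name is hypothetical):

      (rove:run :my-app/tests :style :dot)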
      

      top

  6. Fixtures and Freezing Data

    Rove provides SETUP for fixtures that are done once only in a package and TEARDOWN for cleanup. For a fixture that should be run before and after every test, Rove provides DEFHOOK.

    (defparameter *my-var-suite* 0)
    (defparameter *my-var-hook* 0)
    (setup
      (incf *my-var-suite*))
    
    (teardown
      (format t "Myvar ~a~%" *my-var-suite*))
    
    (defhook
        :before (incf *my-var-hook*)
        :after (format t "My-var-hook ~a~%" *my-var-hook*))
    
  7. Removing tests

    None apparently

  8. Sequencing, Random and Failure Only

    Everything is just done in sequential order. There is no obvious way to collect and run just failed tests.

  9. Skip Capability
    1. Assertions

      Yes

    2. Tests

      No

    3. Implementation

      No

  10. Random Data Generators

    None

28.4 Additional Discussion Points

The author claims rove is the successor to prove and cites the following differences. Rove supports package-inferred systems, has fewer dependencies, reports details of failure tests, has thread support and has fixtures.

Rove is clearly targeted at package-inferred systems. In fact, some of the functionality does not work unless your system is package-inferred. Personally I do not like package-inferred systems; other people have the completely opposite view. In any event, I did not test any of the frameworks with a package-inferred system, so I cannot comment on whether they work in that circumstance.

To show that Rove actually is improved over Prove with respect to reporting details on failure, the following shows first prove, then rove, on a simple failing test:

(let ((x 1) (y 2))
  (deftest t35
      (ok (= x y))))

Running with Prove

(run-test 't35)
T35
× NIL is expected to be T

Now Rove:

  (run-test 't35)
  t35
  × 0) Expect (= X Y) to be true.

× 1 of 1 test failed

0) t35
   Expect (= X Y) to be true.
     (= X Y)
         X = 1
         Y = 2
NIL

Both prove and rove would have accepted diagnostic message strings in the assertion.

On the whole, my concerns expressed in the summary still stand. There are better frameworks out there.

28.5 Who Uses Rove?

Many libraries on quicklisp use rove. If you have quicklisp, you can get a list of those with:

(ql:who-depends-on :rove)

top

29 rt

29.1 Summary

  Kevin M. Rosenberg MIT 2010

RT reminds me of Ptester (I wonder why) and is a part of CL history. See, e.g. Supporting the Regression Testing of Lisp Programs in 1991. Tests are limited to a single assertion and everything seems to be an A-B comparison using EQUAL. While you might think it is just of historical significance, there are still a surprising number of packages in quicklisp (29 at last count) that use it including major packages like ironclad, cffi, usocket, clsql and anaphora.

29.2 Assertion Functions

RT's tests do not accept multiple assertions. The test itself acts as a single assertion, comparing the value of the included form against the expected value.

top

29.3 Usage

  1. Report Format and Basics

    We start with a basic passing test just to show the reporting.

    (deftest t1
      (= 1 1)
      t)
    
    (do-test 't1)
    T1
    ; processing (DEFTEST T4 ...)
    

    Now a deliberately failing test:

    (deftest t1-fail
      (= 1 2)
      t)
    
    (do-test 't1-fail)
    
    Test T1-FAIL failed
    Form: (= 1 2)
    Expected value: T
    Actual value: NIL.
    
  2. Multiple assertions, loops, closures and calling other tests

    RT tests do not handle multiple assertions, loops, closures or calling other tests

  3. Suites, tags and other multiple test abilities
    1. Lists of tests

      RT cannot directly handle lists of tests (although you could loop through a list, the results would not be composable).

    2. Suites

      RT does not have suites per se. You can run all the tests that have been defined using the DO-TESTS function. By default it prints to *standard-output* but accepts an optional stream parameter which would allow you to redirect the results to a file or other stream of your choice. do-tests will print the results for each individual test and then summarize with something like the following:

      5 out of 8 total tests failed: T4, T1-FAIL, T1-FUNCTION, T2-LOOP,
         T2-LOOP-CLOSURE.
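
      Redirecting that report to a file might look like the following sketch (the file name is hypothetical):

      (with-open-file (s "rt-results.txt" :direction :output :if-exists :supersede)
        (do-tests s))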
      
  4. Fixtures and Freezing Data

    None, although the package that tests rt itself has a setup macro that could have been placed in the rt package to use for fixtures. You could use it as a reference for writing your own.
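
    As a minimal sketch of the kind of helper you could write yourself (nothing below is part of rt), a wrapper macro can bind test data around the single form a deftest allows:

    (defmacro with-test-data ((var value) &body body)
      "Hand-rolled fixture: bind VAR to VALUE around BODY."
      `(let ((,var ,value))
         ,@body))

    (deftest t-with-data
        (with-test-data (x 41)
          (= (1+ x) 42))
      t)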

  5. Removing tests

    RT has functions for rem-test and rem-all-tests.
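
    For example:

    (rem-test 't1)     ; remove a single test by name
    (rem-all-tests)    ; remove every defined test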

  6. Sequencing, Random and Failure Only

    RT runs tests in their order of original definition.

  7. Skip Capability

    None

  8. Random Data Generators

    None

29.4 Discussion

While it is still used in major projects, I think Parachute or Fiasco would be better if you are starting a new project.

top

29.5 Who depends on rt?

("anaphora" "cffi" "cl-azure" "cl-cont" "cl-irc" "cl-performance-tuning-helper" "cl-photo" "cl-sentiment" "cl-store" "clsql" "cxml-stp/" "hyperobject" "infix-dollar-reader" "ironclad" "kmrcl" "lapack" "lml" "lml2" "narrowed-types" "nibbles/s" "osicat" "petit.string-utils" "qt" "quadpack" "trivial-features" "trivial-garbage" "umlisp" "usocket" "xhtmlgen")

top

30 should-test

30.1 Summary

homepage Vsevolod Dyomkin MIT 2019

Should-test is pretty basic. It will report all the failing assertions in a test and does offer the opportunity to provide diagnostic strings to assertions, albeit without variables. It does offer the opportunity to run just the tests that failed last time, so you do not have to run through all the tests in the package every time. Unfortunately you cannot turn off progress reporting, you cannot go interactive into the debugger, it has no fixture capacity and it cannot run lists of tests. Its suite capabilities are limited to creating separate packages.

30.2 Assertion Functions

The assertion types are minimal:

be signal print-to

top

30.3 Usage

  1. Report Format

    The summary report will contain full failure reports if *verbose* is set to T (the default) or just test names otherwise.

    There is no optionality with respect to reporting or interactive. It is all reporting.

  2. Basics

    A basic test with all assertions passing, showing the use of both be and signal. Calling test with the keyword parameter :test enables us to specify the test to be run. One item that is not clear is the function of the empty form following the test name.

    (deftest t1 ()
      (should be = 1 1)
      (should signal division-by-zero (error 'division-by-zero)))
    
    (test :test 't1)
    Test T1:   OK
    T
    

    It just reported that the entire test passed.

    Now a basic failing test. This should have three failing assertions and one passing assertion. We put a diagnostic string in the first assertion, which shows up in the result, but it does not allow us to insert variables into the string.

    (deftest t1-fail ()
      "describe t1-fail"
      (let ((x 1)(y 2))
        (should be = x y "intentional failure x ~a y ~a" x y)
        (should be = (+ x 2) (+ x 3))
        (should be equal (values 1 2) (values 1 2))
        (should signal division-by-zero (error 'floating-point-overflow))))
    
    (test :test 't1-fail)
    Test T1-FAIL:
    Y FAIL
    expect: 1 2 "intentional failure x ~a y ~a" 1
    actual: 2
    (+ X 3) FAIL
    expect: 3
    actual: 4
    (ERROR 'FLOATING-POINT-OVERFLOW) FAIL
    expect: DIVISION-BY-ZERO
    actual: #<FLOATING-POINT-OVERFLOW {1009B89F63}>
      FAILED
    NIL
    (#<FLOATING-POINT-OVERFLOW {1009B89F63}> (4) (2))
    NIL
    

    Should-test has no special functionality for dealing with values expressions. It does accept them but, as you would expect, only looks at the first value in each values expression. The following will pass:

    (deftest t1-unequal-values ()
      (should be equal (values 1 2) (values 1 3)))
    

    We get the expected and actual values without the extra blank lines that annoy me in fiveam. The list at the end shows the specific actual assertion values that failed.

    If we had set *verbose* to nil we would have just gotten the last three lines of the report.

    Test T1-FAIL:   FAILED
    NIL
    ((4) (2))
    NIL
    

    Should-test handles redefinitions of tested functions without forcing you to manually recompile the test. We will skip the proof.

  3. Edge Cases: Closures and calling other tests
    1. Looping and closures.

      Should-test cannot access the variables declared in a closure encompassing the test. This does not work:

      (let ((l1 '(#\a #\B #\z))
            (l2 '(97 66 122)))
        (deftest t2-loop-closure ()
          (loop for x in l1 for y in l2 do
            (should be = (char-code x) y))))
      
    2. Calling other tests

      Suppose you defined a test which also calls another test.

      (deftest t3 ()
        (should be = 1 1)
        (test :test 't1-fail))
      

      We know that t1-fail will fail. Will embedding it in test t3 cause t3 to fail as well? Yes.

      Test T3: Test T1-FAIL:
      Y FAIL
      expect: 1 2 "intentional failure x ~a y ~a" 1
      actual: 2
      (+ X 3) FAIL
      expect: 3
      actual: 4
      (ERROR 'FLOATING-POINT-OVERFLOW) FAIL
      expect: DIVISION-BY-ZERO
      actual: #<FLOATING-POINT-OVERFLOW {100A218BC3}>
        FAILED
        FAILED
      NIL
      (#<FLOATING-POINT-OVERFLOW {100A218BC3}> (4) (2))
      NIL
      
  4. Suites, tags and other multiple test abilities
    1. Lists of tests

      Should-test does not handle lists of tests.

    2. Suites

      The test function for Should-test runs all the tests in the current package by default. As you have seen above, giving it a :test keyword parameter will trigger just the named test. Giving it a :package keyword parameter will cause it to run all the tests in the specified package. The :failed key to test will re-test only the tests which failed at their last run. All in all, there are better frameworks.

      (test :failed t)
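
      The other invocations described above would look something like this (the package name is hypothetical):

      (test)                     ; every test in the current package
      (test :package :my-tests)  ; every test in another package
      (test :test 't1-fail)      ; a single test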
      

      top

  5. Fixtures and Freezing Data

    None

  6. Removing tests

    None

  7. Sequencing, Random and Failure Only

    Should-test will run tests in the same order each time (no shuffle capability). As noted in the suite discussion, it is one of the few frameworks to have failure only functionality built in.

    (test :failed t)
    
  8. Skip Capability

    None

  9. Random Data Generators

    None

    top

30.4 Who Depends on Should-test

("cl-redis-test" "mexpr-tests" "rutils-test")

top

31 simplet

31.1 Summary

homepage Noloop GPLv3 2019

In simplet, a test can have only one assertion, which it gets from a function, and a suite can contain multiple tests. From the standpoint of other frameworks, simplet "tests" are the assertion clauses and simplet "suites" are the way to package multiple assertions. If a suite has no tests, or a test has no function returning T or NIL, it is marked "PENDING".

Simplet's run function takes only an optional parameter to return a string rather than printing to the REPL.

I am just going to show one example of usage and leave it at that. Given all the functionality in other frameworks, I cannot recommend it.

(suite "suite 2"
       (test "one"
         #'(lambda ()
             (let ((x 1))
               (= x 1))))
       (test "two"
         #'(lambda ()(eq 'a 'a)))
       (test "three" #'(lambda ()(= 2 1)))
       (test "four" #'(lambda ()(= 1 1))))
(#<FUNCTION (LAMBDA () :IN NOLOOP.SIMPLET::CREATE-SUITE) {100A391A6B}>)

(run)
#...Simplet...#

one: T
two: T
three: NIL
four: T
-----------------------------------
suite 2: NIL

Runner result: NIL

NIL

The author uses simplet in testing assert-p, eventbus and skeleton-creator

32 tap-unit-test

32.1 Summary

homepage Christopher K. Riesbeck, John Hanley MIT 2017

Tap-unit-test is a fork of a slightly older version of lisp-unit with TAP reporting added. There have not been any real updates since 2011 and I cannot find anyone using it, so I would simply look to either lisp-unit or lisp-unit2 if you like their approach to things.

32.2 Assertion Functions

assert-eq assert-eql assert-equal assert-equality
assert-equalp assert-error assert-expands assert-false
assert-prints assert-true fail logically-equal
set-equal unordered-equal    

top

32.3 Usage

  1. Report Format and basic syntax

    TAP-unit-test defaults to a reporting format shown below. You can do (setf *use-debugger* :ask) or (setf *use-debugger* t), but that will only throw you into the debugger if there is an actual error generated, not a failure (or failure to see the correct error).

    We can start with a basic failing test to show the reporting format. We will provide a diagnostic string in the first assertion. Tap-unit-test has an unordered-equal assertion helper that might be useful for some which is shown in this example:

    (define-test t1-fail
      "describe t1-fail"
      (let ((x 1))
        (assert-true (= x 2) "Deliberate failure. We know 2 is not ~a" x)
        (assert-equal x 3)
        (assert-true (unordered-equal '(3 2 1 1) '(1 2 3 2))) ; Return true if l1 is a permutation of l2.
        (assert-true (set-equal '(a b c d) '(b a c c))) ;every element in both sets needs to be in the other
        (assert-error 'division-by-zero
                      (error 'floating-point-overflow)
                      "testing condition assertions")
        (assert-true (unordered-equal '(3 2 1) '(1 3 4)))
        (assert-true (logically-equal t nil)) ; both true or both false
        (assert-true (logically-equal nil t)))) ; both true or both false
    

    Unlike lisp-unit, when you call run-tests in tap-unit-test, you pass unquoted test names, even when you are running it on several tests. Also note that it does not return any kind of object as a test result. If we now run it we get the following report:

    (run-tests t1-fail)
    
    T1-FAIL: (= X 2) failed:
    Expected T but saw NIL
       "Deliberate failure. We know 2 is not ~a" => "Deliberate failure. We know 2 is not ~a"
       X => 1
    T1-FAIL: 3 failed:
    Expected 1 but saw 3
    T1-FAIL: (UNORDERED-EQUAL '(3 2 1 1) '(1 2 3 2)) failed:
    Expected T but saw NIL
    T1-FAIL: (SET-EQUAL '(A B C D) '(B A C C)) failed:
    Expected T but saw NIL
    T1-FAIL: (ERROR 'FLOATING-POINT-OVERFLOW) failed:
    Should have signalled DIVISION-BY-ZERO but saw #<FLOATING-POINT-OVERFLOW {100D41D203}>
       "testing condition assertions" => "testing condition assertions"
    T1-FAIL: (UNORDERED-EQUAL '(3 2 1) '(1 3 4)) failed:
    Expected T but saw NIL
    T1-FAIL: (LOGICALLY-EQUAL T NIL) failed:
    Expected T but saw NIL
    T1-FAIL: (LOGICALLY-EQUAL NIL T) failed:
    Expected T but saw NIL
    T1-FAIL: 0 assertions passed, 8 failed.
    NIL
    

    Tap-unit test does not need to manually recompile tests when a tested function is modified. We will skip the proof.

    1. Edge Cases: Value expressions, closures and calling other tests
      1. Values expressions

        Tap-unit-test has no special functionality for dealing with values expressions. It does accept them as input, but as expected, only compares the first value in a values expression.

        (define-test t2
          "describe t2"
          (assert-equal 1 2)
          (assert-equal 2 3)
          (assert-equalp (values 1 2) (values 1 2)))
        
        (run-tests t2)
        

        We get what we expected, two failing assertions and one passing assertion. Does tap-unit-test follow lisp-unit's ability to actually look at all members of the values expression rather than just the first one? Yes. So far only the two lisp-units and tap-unit-test actually compare each item in two values expressions.

        (define-test t2-values-expressions
            (assert-equal (values 1 2) (values 1 3))
            (assert-equal (values 1 2 3) (values 1 3 2)))
        
        (run-tests t2-values-expressions)
        T2-VALUES-EXPRESSIONS: (VALUES 1 3) failed:
        Expected 1; 2 but saw 1; 3
        T2-VALUES-EXPRESSIONS: (VALUES 1 3 2) failed:
        Expected 1; 2; 3 but saw 1; 3; 2
        T2-VALUES-EXPRESSIONS: 0 assertions passed, 2 failed.
        
      2. Closures

        Unfortunately no luck with closure variables. It does, however, handle looping through assertions if the variables are dynamic or defined within the test. We will skip the proof.

    2. Calling another test

      While tests are not functions in tap-unit-test, they can call other tests.

      (define-test t3 ()
        "describe t3"
        (assert-equal 'a 'a)
        (run-tests t2))
      (run-tests t3)
      T2: 2 failed:
      Expected 1 but saw 2
      T2: 3 failed:
      Expected 2 but saw 3
      T2: 1 assertions passed, 2 failed.
      T3: 1 assertions passed, 0 failed.
      
  2. Suites, tags and other multiple test abilities
    1. Lists of tests

      As mentioned earlier, tap-unit-test uses unquoted test names and does not return any kind of test-results object. Running multiple specific tests would look like the following.

      (run-tests t7-bad-error t1-fail)
      T7-BAD-ERROR: (ERROR 'FLOATING-POINT-OVERFLOW) failed:
      Should have signalled DIVISION-BY-ZERO but saw #<FLOATING-POINT-OVERFLOW {100BEDBFA3}>
         "testing condition assertions. This should fail" => "testing condition assertions. This should fail"
      T7-BAD-ERROR: 0 assertions passed, 1 failed.
      T1-FAIL: Y failed:
      Expected 1 but saw 2
      12
      T1-FAIL: 1 assertions passed, 1 failed.
      TOTAL: 1 assertions passed, 2 failed, 0 execution errors.
      
    2. Packages

      If you want to run all the tests in a package, just call run-tests with no parameters

    3. Suites and Tags

      Tap-unit-test has no suites or tags capability

  3. Fixtures and Freezing Data

    None

  4. Removing tests

    Tap-unit-test has a remove-tests function which actually does take a quoted list of test names unlike some of the other functions which use unquoted names.

    (remove-tests '(t1 t2))
    
  5. Sequencing, Random and Failure Only

    None

  6. Skip Capability

    None

  7. Generators

    Tap-unit-test has a make-random-state function for generating random data. See example below:

    (make-random-state)
    #S(RANDOM-STATE :STATE #.(MAKE-ARRAY 627 :ELEMENT-TYPE '(UNSIGNED-BYTE 32)
                                         :INITIAL-CONTENTS
                                         '(0 2567483615 454 2531281407 4203062579
                                           3352536227 284404050 622556438
                                           ...)))
    

32.4 Discussion

Basically, lisp-unit and lisp-unit2 have moved on and tap-unit-test exists for historical reasons. There are enough syntactic differences that if someone is using it for an existing code base, pulling it out of quicklisp could cause breakage. No one is using it, as far as I can tell.

top

33 unit-test

33.1 Summary

homepage Manuel Odendahl, Alain Picard MIT 2012

Again, another framework that does the basics. It will report all assertions that failed in a test. It will do a progress report on the tests (not the assertions), which cannot be turned off. It does allow you to provide diagnostic strings to assertions to help in debugging, but does not allow you to pass in any variables. It has no interactivity option, so you cannot just hop into the debugger on a test failure. It has no fixture capacity, but it does have suites.

33.2 Assertion Functions

test-assert test-condition test-equal

33.3 Usage

  1. Report Format

    Everything returns a list of test-result objects. There is no provision for dropping into the debugger. run-test has an optional parameter for output, which by default goes to *debug-io*.

  2. Basics

    Unit-test has a limited vocabulary for test functions. The deftest macro will create an instance of a unit-test class with the first parameter being the unit name (used to group related tests) and the second parameter being the name of the test itself.

    (deftest :test "t1"
      (let ((x 1))
        (test-assert (=  x 1))
        (test-equal "a" "a" )
        (test-condition
         (/ 1 (- x 1))
         'division-by-zero)))
    

    In this case we used a string "t1" as the name of the test. We could have used a symbol 't1 or a keyword :t1. Unfortunately the run-test method is only defined on unit-test instances, which makes calling a single test a little clumsy. We have to call get-test-by-name and pass the result to run-test. In your own tests, I assume you would add another method to handle however you write your test names. We will continue to use get-test-by-name as a reminder.
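
    Such a convenience method might look something like the following sketch, assuming run-test is a generic function that takes an :output keyword, as the source comments quoted later in the fixtures section suggest:

    (defmethod run-test ((name string) &key (output *debug-io*))
      ;; hypothetical helper: look the test up by its string name first
      (run-test (get-test-by-name name) :output output))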

    (run-test (get-test-by-name "t1"))
    (#<TEST-EQUAL-RESULT FORM: (/ 1 (- X 1)) STATUS: PASS REASON: NIL>
     #<TEST-EQUAL-RESULT FORM: a STATUS: PASS REASON: NIL>
     #<TEST-EQUAL-RESULT FORM: (= X 1) STATUS: PASS REASON: NIL>)
    

    Not the most exciting report in the world. But let's take a look at a failing test. We can put a diagnostic string into test-assert, but not into test-equal. The equality test for test-equal is equal, but you can change that using a keyword parameter as shown below.

    (deftest :test "t1-fail"
      (let ((x 1))
        (test-assert (= x 2) "we know that X (1) does not equal 2")
        (test-equal "a" 'a :test #'eq )
        (test-condition
         (/ 1 (- x 1))
         'floating-point-overflow)))
    
    (run-test (get-test-by-name "t1-fail"))
    (#<TEST-EQUAL-RESULT FORM: (/ 1 (- X 1)) STATUS: CRASH REASON: NIL>
                         #<TEST-EQUAL-RESULT FORM: 'A STATUS: FAIL REASON: NIL>
                         #<TEST-EQUAL-RESULT FORM: (= X
                                                      2) STATUS: FAIL REASON: we know that X (1) does not equal 2>
                         #<TEST-EQUAL-RESULT FORM: (VALUES 1 3 4 5) STATUS: PASS REASON: NIL>
                         #<TEST-EQUAL-RESULT FORM: (/ 1 (- X 1)) STATUS: PASS REASON: NIL>
                         #<TEST-EQUAL-RESULT FORM: a STATUS: PASS REASON: NIL>
                         #<TEST-EQUAL-RESULT FORM: (= X 1) STATUS: PASS REASON: NIL>)
    

    The first thing I notice is that the list of results also includes the list of results from when we ran test t1. Everything just gets pushed to a non-exported list *unit-test-results*. So if you want to just see the results for the next test you are going to run, you need to run some cleanup.

    T1-fail generated three results, so again it is a little clumsy to ensure you see all the results from the test. Let's set *unit-test-results* to nil after every test so we can keep this clean.
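
    Something like the following one-liner works (assuming the package is named unit-test, since the variable is not exported):

    (setf unit-test::*unit-test-results* nil)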

    There is an exported variable *unit-test-debug*, but looking at the source code, it does not appear to be actually used for anything, leaving it open for you to write your own code using it as a flag.

    If a test calls a function that is later modified, the test does not need to be recompiled to check the tested function correctly. We will skip the proof.

  3. Edge Cases: Value expressions, closures and calling other tests
    1. Value expressions

      Like most of the frameworks, unit-test will test a values expression by only checking the first value. We will skip the proof.

    2. Looping and closures

      Unit-test provided a little bit of a surprise here. If you run a test where the assertion is inside a loop, the test-result object will be pushed to unit-test::*unit-test-results*, but that list will not be printed to the REPL. You just get NIL in the REPL. We will skip the proof.

      Unit-test had no problem testing functions that use variables provided in a closure. We will skip the proof.

      Tests can call other tests, but there is no composition, just another test result.

      (deftest :test "t3"
        (test-assert (= 1 1))
        (run-test (get-test-by-name "t1")))
      
  4. Suites, tags and other multiple test abilities
    1. Lists of tests

      Unit-test has no provision to handle lists of tests although you could write a method on run-test that would do so.

    2. Suites

      Looking at the examples above, we gave all the tests the unit name :test. This is essentially the suite named :test. If we call run-all-tests, all the tests would be run. If we had given tests different unit names, we could run all the tests with those names by passing the keyword parameter :unit to run-all-tests:

      (run-all-tests :unit :some-unit-name)
      

      top

  5. Fixtures and Freezing Data

    From the source code:

    ;;;;   For more complex tests requiring fancy setting up and tearing down
    ;;;;   (as well as reclamation of resources in case a test fails), users are expected
    ;;;;   to create a subclass of the unit-test class using the DEFINE-TEST-CLASS macro.
    ;;;;   The syntax is meant to be reminiscent of CLOS, e.g
    ;;;;
    ;;;;   (define-test-class my-test-class
    ;;;;     ((my-slot-1 :initarg :foo ...)
    ;;;;      (my-slot-2 (any valid CLOS slot options)
    ;;;;      ....))
    ;;;;   After this, the methods
    ;;;;   (defgeneric run-test :before
    ;;;;               ((test my-test-class) &key (output *debug-io*))   and
    ;;;;   (defgeneric run-test :after
    ;;;;               ((test my-test-class) &key (output *debug-io*))   may be
    ;;;;   specialized to perform the required actions, possibly accessing the
    ;;;;   my-slot-1's, etc.
    ;;;;
    ;;;;   The test form is protected by a handler case.  Care should be taken
    ;;;;   than any run-test specialization also be protected not to crash.
    
  6. Removing tests

    None

  7. Sequencing, Random and Failure Only

    I do not see any capability to shuffle test order or to run only the tests that have previously failed.

  8. Skip Capability
    1. Assertions
    2. Tests
    3. Implementation
  9. Random Data Generators

    None

33.4 Discussion

Compared to other frameworks it feels a little clumsy and basic. I would look elsewhere.

top

33.5 Who Depends on Unit-Test?

It is used by cl-fad and several of the bknr programs.

34 xlunit

34.1 Summary

homepage Kevin Rosenberg BSD 2015

Xlunit stops at the first failure in a test, so you only get partial failure reporting (joining lift and kaputt in this regard). That, in and of itself would cause me to look elsewhere. Phil Gold's original concern was that while you can create hierarchies of test suites, they are not composable.

34.2 Assertion Functions

assert-condition assert-eql assert-equal
assert-false assert-not-eql assert-true

top

34.3 Usage

I find the terminology of xlunit to be confusing after getting used to other frameworks.

Xlunit requires that you create a class for a test-case or suite. Every "test" is then a named test-method on that class. def-test-method adds a test to the suite. The class can, of course, have slots for variables that any test in the suite can use.

Test-methods can have multiple assertions and can be applied to either a test-case or a test-suite. The macro get-suite applies to either test-case or a test-suite classes and creates an instance of that class.

I notice that cambl-test (one of the libraries that uses xlunit) wraps a define-test macro around def-test-method to make this feel more natural. That version is here:

(defclass amount-test-case (test-case)
  ()
  (:documentation "test-case for CAMBL amounts"))

(defmacro define-test (name &rest body-forms)
  `(def-test-method ,name ((test amount-test-case) :run nil)
     ,@body-forms))
  1. Report Format

    Xlunit reports a single dot for a test with at least a passing assertion, an F for failure and E for errors. In testing suites, xlunit will provide one dot per test, the time it took to run the suite and, if everything is successful, OK with a count of the tests and a count of the assertions.

  2. Basics

    We will create a test-case named tf-xlunit that we can use to attach tests, each of which can have multiple assertions. The form immediately after the test method name takes both the class to which it applies and whether to run the method immediately upon compilation. The libraries using xlunit seem to define all methods with :run nil.

    (defclass tf-xlunit (xlunit:test-case) ())
    
    (def-test-method t1 ((test tf-xlunit) :run nil)
      (assert-equal "a" "a")
      (assert-condition 'division-by-zero (error 'division-by-zero))
      (assert-false (= 1 2))
      (assert-eql 'a 'a)
      (assert-not-eql 'a 'b)
      (assert-true (= 1 1)))
    

    Unfortunately you need to run all the test methods applicable to a test-case or suite at once. For clarity, we create a separate test-case class for each method so that we do not get burdened with results from other methods. Effectively, a test-case can be viewed the way other frameworks think of suites.

    The reporting is a bit underwhelming. Even more so as we get to failures.

    (xlunit:textui-test-run (xlunit:get-suite tf-xlunit))
    .
    Time: 0.0
    
    OK (1 tests)
    #<TEST-RESULTS {10016CD973}>
    

    All the assertions passed in order for the entire test to pass.

    Now a test that should have six assertion failures. This time we are going to put a diagnostic message into the first assertion. Xlunit does not provide the ability to insert variables into the diagnostic message or provide trailing variables.

    I am going to create a new suite so that we just see the results for this test and will continue to do that until we get to the suites discussion.

    (defclass tf-xlunit-t1-fail (xlunit:test-case) ())
    
    (def-test-method t1-fail ((test tf-xlunit-t1-fail) :run nil)
      (assert-equal "a" "b" "Deliberate failure on our part")
      (assert-condition 'division-by-zero (error 'floating-point-overflow))
      (assert-false (= 1 1))
      (assert-eql 'a 'b)
      (assert-not-eql 'a 'a)
      (assert-true (= 1 2)))
    

    And now, the failure report:

    (xlunit:textui-test-run (xlunit:get-suite tf-xlunit-t1-fail))
    .F
    Time: 0.0
    
    There was 1 failure:
    1) T1-FAIL: Assert equal: "a" "b"
     Deliberate failure on our part
    
    FAILURES!!!
    Run: 1   Failures: 1   Errors: 0
    #<TEST-RESULTS {1003FF1353}>
    

    Yes, the test failed. Unfortunately it only reported the first assertion failure, not all of them. No, I do not know why a dot appeared before the failure indicator. I was really hoping for it to tell me all the different assertion failures.

  3. Edge Cases: Value Expressions, closures and calling other tests
    1. Value expressions

      XLunit has no special functionality for dealing with values expressions. Like most of the frameworks, xlunit will check values expressions but only look at the first value.

    2. Closures.

      Xlunit has no problem dealing with variables from closures. We will skip the proof.

    3. Calling tests from inside tests

      As with several frameworks, xlunit allows a test to call another test, but there is no composition - you get two separate reports.

        (defclass tf-xlunit-t3 (xlunit:test-case) ())
      
        (def-test-method t3 ((test tf-xlunit-t3) :run nil)
          (assert-equal 1 1)
          (xlunit:textui-test-run (xlunit:get-suite tf-xlunit-t1-fail)))
      
      (xlunit:textui-test-run (xlunit:get-suite tf-xlunit-t3))
      ..F
      Time: 0.0
      
      There was 1 failure:
      1) T1-FAIL: Assert equal: "a" "b"
         Deliberate failure on our part
      
      FAILURES!!!
      Run: 1   Failures: 1   Errors: 0
      Time: 0.003333
      
      OK (1 tests)
      #<TEST-RESULTS {100B2823A3}>
      
  4. Suites, fixtures and other multiple test abilities
    1. Lists of tests

      I did not see a way to run only a subset of the methods applicable to a test-case.

    2. Suites and fixtures

      I am going to cheat here and combine the discussion of fixtures and suites together, using an example from the source code. Here we create a test-case named math-test-case with two additional slots for numbera and numberb. Before any tests are run, there is a set-up method which initialises those slots. We then add three test methods (one slightly modified from the source code so that it has two assertions).

      (defclass math-test-case (test-case)
        ((numbera :accessor numbera)
         (numberb :accessor numberb))
        (:documentation "Test test-case for math testing"))
      
      (defmethod set-up ((tcase math-test-case))
        (setf (numbera tcase) 2)
        (setf (numberb tcase) 3))
      
      (def-test-method test-addition ((test math-test-case) :run nil)
        (let ((result1 (+ (numbera test) (numberb test)))
              (result2 (+ 1 (numbera test) (numberb test))))
          (assert-true (= result1 5))
          (assert-true (= result2 6))))
      
      (def-test-method test-subtraction ((test math-test-case) :run nil)
        (let ((result (- (numberb test) (numbera test))))
          (assert-equal result 1)))
      
         ;;; This method is meant to signal a failure
      (def-test-method test-subtraction-2 ((test math-test-case) :run nil)
        (let ((result (- (numbera test) (numberb test))))
          (assert-equal result 1 "This is meant to failure")))
      

      Now we run all the methods applicable to math-test-case classes.

        (xlunit:textui-test-run (xlunit:get-suite math-test-case))
      .F..
      Time: 0.0
      
      There was 1 failure:
      1) TEST-SUBTRACTION-2: Assert equal: -1 1
         This is meant to failure
      
      FAILURES!!!
      Run: 3   Failures: 1   Errors: 0
      #<TEST-RESULTS {10051EF373}>
      

      As we can see, while there are four assertions in total, the report shows Run: 3, which is the number of methods run. If we looked at the internal details of the test-results instance returned at the end, it would show a count of 3 as well.

      top

  5. Removing tests

    Xlunit has a remove-test function

  6. Sequencing, Random and Failure Only

    Everything is sequential. There are no provisions for collecting and re-running only failed tests.

  7. Skip Capability

    None

  8. Random Data Generators

    None

34.4 Discussion

Phil Gold's 2007 review essentially concluded that xlunit feels clunky and lacks composition. I see no reason to differ from his conclusion.

34.5 Who Depends on XLUnit?

cambl, cl-heap (no longer maintained) and cl-marshal

top

35 xptest

35.1 Summary

No homepage Craig Brozensky Public Domain 2015

XPtest is very old and it does the basics. It will report all the failed assertions and provides the ability to generate failure reports with diagnostic strings. It does not provide an interactive session (no debugger, just the report). It also does not seem to provide any signal testing, so you would have to write your own condition handlers. Overall it just feels clumsy. It was not covered in Phil Gold's original blog post.

35.2 Assertion Functions

None - It just relies on CL predicates.

35.3 Usage

Xptest is very simple. You create a test-suite and a fixture. Tests are methods of the fixture and you then add them to the test-suite. You use regular CL predicates in your test and trigger a failure function if they are not true.

  1. Report Format and Basic Operation
    ;; A test fixture and a suite get defined up front
    (def-test-fixture tf-xptest-fixture () ())
    
    (defparameter *tf-xptest-suite* (make-test-suite "tf-xptest-suite" "test framework demonstration"))
    
    (defmethod t1 ((test tf-xptest-fixture))
      (let ((x 1) (y 'a))
        (unless (equal 1 x)
          (failure "t1.1 failed"))
        (unless (eq 'a y)
          (failure "t1.2 failed"))))
    
    (add-test (make-test-case "t1" 'tf-xptest-fixture :test-thunk 't1) *tf-xptest-suite*)
    
    (defmethod t1-fail ((test tf-xptest-fixture))
      (let ((x 1) (y 'a))
        (unless (equal 2 x)
          (failure "t1-fail.1 failed"))
        (unless (eq 'b y)
          (failure "t1-fail.2 failed"))))
    
    (add-test (make-test-case "t1-fail" 'tf-xptest-fixture :test-thunk 't1-fail) *tf-xptest-suite*)
    

    You can use run-test on the test-suite. That will return a list of test-result objects, but that is not terribly useful. Digging into those objects will give you start and stop times, the test-fixture, a test-failure condition (if it failed) or an error condition if something else bad happened. Slightly more useful is running report-result on the list returned from run-test, but it only reports which tests passed and which tests failed.

    (run-test *tf-xptest-suite*)
    (#<TEST-RESULT {1002F88803}> #<TEST-RESULT {1002F88C93}>)
    
    (report-result (run-test *tf-xptest-suite*))
    Test t1 Passed
    Test t1-fail Failed
    

    There is a keyword parameter option of :verbose, but if you try to use it, it generates a format control error in the xptest source code that I am not going to try to debug.

    Xptest properly picks up changes in tested functions without having to manually recompile tests.

  2. Multiple assertions, loops, closures and calling other tests
    1. Multiple assertions and value expressions

      Xptest relies on CL for predicates and assertions, so you have to build your own multiple assertion test and decide how you would handle value expressions.
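
      For values expressions, one hand-rolled option (a sketch; register it with add-test as above) is to collapse the values into a list yourself:

      (defmethod t2-values ((test tf-xptest-fixture))
        ;; compare every value, not just the first one
        (unless (equal (multiple-value-list (values 1 2)) '(1 2))
          (failure "t2-values failed")))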

    2. Closures.

      Xptest has no problem with the loop inside a closure test.

      (let ((l1 '(#\a #\B #\z))
            (l2 '(97 66 122)))
        (defmethod t2-loop ((test tf-xptest-fixture))
          (loop for x in l1 for y in l2 do
            (unless (equal (char-code x) y)
              (failure "t2-loop")))))
      
      (add-test (make-test-case "t2-loop" 'tf-xptest-fixture :test-thunk 't2-loop) *tf-xptest-suite*)
      
      (report-result (run-test *tf-xptest-suite*))
      
  3. Conditions

    You would have to write your own condition handlers.
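
    A hand-rolled condition check might look like the following sketch:

    (defmethod t-signals ((test tf-xptest-fixture))
      ;; fail unless the form signals DIVISION-BY-ZERO
      (unless (handler-case (progn (error 'division-by-zero) nil)
                (division-by-zero () t)
                (error () nil))
        (failure "expected DIVISION-BY-ZERO was not signalled")))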

  4. Suites and fixtures

    I am going to cheat here again and show the example from the source code.

    (defparameter *math-test-suite* nil)
    
      (def-test-fixture math-fixture ()
        ((numbera
          :accessor numbera)
         (numberb
          :accessor numberb))
        (:documentation "Test fixture for math testing"))
    
      (defmethod setup ((fix math-fixture))
        (setf (numbera fix) 2)
        (setf (numberb fix) 3))
    
      (defmethod teardown ((fix math-fixture))
        t)
    
      (defmethod addition-test ((test math-fixture))
        (let ((result (+ (numbera test) (numberb test))))
          (unless (= result 5)
            (failure "Result was not 5 when adding ~A and ~A"
                     (numbera test) (numberb test)))))
    
      (defmethod subtraction-test ((test math-fixture))
        (let ((result (- (numberb test) (numbera test))))
          (unless (= result 1)
            (failure "Result was not 1 when subtracting ~A ~A"
                     (numberb test) (numbera test)))))
    
          ;;; This method is meant to signal a failure
      (defmethod subtraction-test2 ((test math-fixture))
        (let ((result (- (numbera test) (numberb test))))
          (unless (= result 1)
            (failure "Result was not 1 when subtracting ~A ~A"
                     (numbera test) (numberb test)))))
    
      (setf *math-test-suite* (make-test-suite
                             "Math Test Suite"
                             "Simple test suite for arithmetic operators."
                             ("Addition Test" 'math-fixture
                                              :test-thunk 'addition-test
                                              :description "A simple test of the + operator")
                             ("Subtraction Test" 'math-fixture
                                                 :test-thunk 'subtraction-test
                                                 :description "A simple test of the - operator")))
    
      (add-test (make-test-case "Substraction Test 2" 'math-fixture
                                :test-thunk 'subtraction-test2
                                :description "A broken substraction test, should fail.")
                *math-test-suite*)
    
      (report-result (run-test *math-test-suite*))
    

    top

  5. Removing tests

    Xptest has a remove-test function

  6. Sequencing, Random and Failure Only

    Sequential only

  7. Skip Capability

    None

  8. Random Data Generators

    None

    top

35.4 Discussion

I do not see anything here that would really make me consider it.

35.5 Who Depends on xptest?

Nothing in quicklisp. No idea about the wider world.

36 Helper Libraries

36.1 assert-p

  1. Summary
    homepage Noloop GPL3 2020

    This is a library to help build your own assertions and is built on assertion-error by the same author (see below). The only library currently using it is Cacau.

    I was really hoping for more here. Consider the following code from the library:

    (defun not-equalp-p (actual expected)
      "Check actual not equalp expected"
      (assertion (not (equalp actual expected)) actual expected 'not-equalp))
    

    Seven of the test frameworks described above provide assertions that accept diagnostic messages and pass variables to those diagnostic messages. Another eight provide assertions that accept diagnostic messages but without variables. Compared to those, this seems really elementary. I will leave it to writers of testing frameworks as to whether it is worthwhile, but from my perspective, it does not add anything useful to the forest of CL testing.

    top

36.2 assertion-error

  1. Summary
    homepage Noloop GPL3 2019

    This is a library to build your own assertion-error conditions. It does depend on dissect. The only library currently using it is cacau.

    The entire source code is:

    (define-condition assertion-error (error)
      ((assertion-error-message :initarg :assertion-error-message :reader assertion-error-message)
       (assertion-error-result :initarg :assertion-error-result :reader assertion-error-result)
       (assertion-error-actual :initarg :assertion-error-actual :reader assertion-error-actual)
       (assertion-error-expected :initarg :assertion-error-expected :reader assertion-error-expected)
       (assertion-error-stack :initarg :assertion-error-stack :reader assertion-error-stack)))
    
    (defun get-stack-trace ()
      (stack))
    

    I will leave it to writers of testing frameworks as to whether it is worthwhile.

    top

36.3 check-it

top

  1. Summary
    homepage Kyle Littler LLGPL 2015

    Check-it is the opposite of a mock and stub library: instead of supplying known values, it provides randomized input values based on properties of the input. Some testing frameworks provide random value generators, but this is more complete, so use it with your favorite test framework (a sketch of doing that follows the usage examples below). See helper-generators for a functional comparison of the generators between check-it and cl-quickcheck.

  2. Usage
    1. General Usage

      The general usage is to call generate on a generator given a specific type with optional specifications. The following examples use optional lower and upper bounds.

      (check-it:generate
       (check-it:generator (integer -3 10)))
      6
      
      (check-it:generate
       (check-it:generator (character #\a #\k)))
      #\f
      
      (let ((gen-i (check-it:generator (list (integer -10 10)
                                             :min-length 3
                                             :max-length 10))))
        (check-it:generate gen-i))
      (5 0 8 -2 9)
      
      (check-it:generate (check-it:generator (string :min-length 3 :max-length 10)))
      "Uw76ZV"
      
    2. Values must meet a predicate

      You can ensure that values meet a specific predicate. The generator will keep trying until that predicate is met. In the following example we want a character between #\a and #\f but not #\c.

      (check-it:generate
       (check-it:generator
        (check-it:guard (lambda (x) (not (eql x #\c))) (character #\a #\f))))
      #\e
      
    3. Or Generator

      The OR generator takes subgenerators and randomly chooses one. For example:

      (let ((gen-num (check-it:generator (or (integer) (real)))))
        (loop for x from 1 to 5 collect
                                (check-it:generate gen-num)))
      (7 6.685932 6 -9 9)
      
    4. Struct Generator

      If you have a struct that has default constructor functions, you can use a struct generator to build out the slots.

      (check-it:generate
       (check-it:generator
        (check-it:struct b-struct :slot-1 (integer) :slot-2 (string) :slot-3 (real))))
      #S(B-STRUCT :SLOT-1 2 :SLOT-2 "iE4qZ5U00oOs" :SLOT-3 5.9885387)
      

      For more fun and games you can do with this library, see https://github.com/DalekBaldwin/check-it
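
      As a minimal sketch of pairing check-it's generators with an ordinary test framework (the test name is made up), the following uses parachute's define-test and is, the same combination used later in the mockingbird example, to feed random integers into the expression under test:

      (define-test check-it-doubling
        ;; generate 20 random integers and assert the doubling property for each
        (let ((gen (check-it:generator (integer -1000 1000))))
          (loop repeat 20
                for n = (check-it:generate gen)
                do (is = (* 2 n) (+ n n)))))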

      top

36.4 cl-fuzz

  1. Summary
    homepage Neil T. Dantam BSD 2 Clause 2018

    Cl-fuzz is another random data generating library. To use it you define a function that generates random data and a function that performs some tests, then pass both to fuzz:run-tests (not perform-tests as the readme states). To be honest, I do not think there is much utility here compared to the frameworks we have looked at plus check-it and cl-quickcheck.

    top

36.5 cl-quickcheck

  1. Summary
    homepage Andrew Pennebaker MIT 2020

    Cl-quickcheck focuses on "property based tests". In other words, tests use random inputs matching some specification, apply an operation to the data and assert something about the result. Cl-quickcheck is effectively an assertion library with the ability to generate different types of inputs. If you look at packages in quicklisp which use it, Burgled-Batteries uses it in conjunction with Lift; Test-utils uses it in conjunction with Prove; and only json-streams uses it on its own. As such I decided to put it in the Helpers section rather than in the frameworks section.

    Cl-quickcheck has somewhat more functionality than check-it in that it does have assertions. I still think the generators are the real raison d'être for both these libraries. See helper-generators for a functional comparison of the generators between check-it and cl-quickcheck.

  2. Assertions
    is is= isnt isnt= should-signal
  3. Report Format

    Cl-quickcheck follows the typical pattern of . for a passing test. Instead of an f, it prints X for failures.

    To jump immediately into the debugger rather than a report format, set *break-on-failure* to t.

    To eliminate progress reports, set *loud* to nil.

  4. Usage

    The number of iterations of a test using a generator is set by *num-trials* which starts with a default value of 100.

    To take a silly example, the following test asserts that any integer multiplied by two will equal the integer plus itself, and we will set *num-trials* to 20. Thus n will be set to a random integer generated by an-integer and the assertion will be run 20 times with a new n generated each time.

    (setf *num-trials* 20)
    (for-all ((n an-integer))
             (is= (* 2 n) (+ n n)))
    ....................
    

    If we modify the assertion so that it is always wrong, we get only a single X:

    (for-all ((n an-integer))
             (is= (* 2 n) (+ n n 1)))
    X
    

    The following are all passing assertions.

    (is= 1 1)
    (is = 1 1 1)
    (should-signal 'division-by-zero (error 'floating-point-overflow))
    (for-all ((n an-integer))
             (is= (* 2 n) (+ n n)))
    
  5. Miscellaneous Comments

    The first thing I had to learn looking at cl-quickcheck was that a-boolean, a-real, an-index, an-integer, k-generator, m-generator and n-generator are generators stored in variables (funcallable), while a-char, a-list, a-member, a-string, a-symbol and a-tuple are functions you call to build a generator. The difference in how they are called is confusing, at least for me.

    top

36.6 hamcrest

top

  1. Summary
    homepage Alexander Artemenko New BSD 2020

    Hamcrest's idea is to use pattern matching to make unit tests more readable.

  2. Usage
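
    The following is only a rough sketch, not taken from the library's documentation: the package names (hamcrest/prove and hamcrest/matchers) and the assert-that, has-plist-entries and contains symbols are my assumptions about the API, so check the homepage before relying on them.

    ;; assumed API: ASSERT-THAT plus matchers such as HAS-PLIST-ENTRIES and CONTAINS
    (defvar *log-item* (list :message "hello" :level :info))

    (hamcrest/prove:assert-that
     *log-item*
     (hamcrest/matchers:has-plist-entries :message "hello"
                                          :level :info))

    (hamcrest/prove:assert-that
     (list 1 2 3)
     (hamcrest/matchers:contains 1 2 3))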

    top

36.7 mockingbird

top

  1. Summary
    homepage Christopher Eames MIT 2017

    Stubs and mocks are used in testing to return known, constant values in place of computed values.

  2. Usage

    Assume two functions for this usage demonstration:

    (defun foo (x) x)
    (defun bar (x) (+ x (foo x)))
    

    The WITH-STUBS macro stubs a function lexically: only direct calls in the body return the stubbed value, so (foo 1) returns 10 while (bar 3) still calls the real foo internally and returns 6.

    (with-stubs ((foo 10))
      (foo 1))
    10
    
    (with-stubs ((foo 10))
      (bar 3))
    6
    

    As an example of how this looks in a testing framework, the following uses parachute, with mb as the nickname for mockingbird.

    (define-test mockingbird-1
      (mb:with-stubs ((foo 10))
        (is = (bar 3) 6)))
    MOCKINGBIRD-1
    (test 'mockingbird-1)
            ? TF-PARACHUTE::MOCKINGBIRD-1
      0.003 ✔   (is = (bar 3) 6)
      0.007 ✔ TF-PARACHUTE::MOCKINGBIRD-1
    
    ;; Summary:
    Passed:     1
    Failed:     0
    Skipped:    0
    #<PLAIN 2, PASSED results>
    

    The WITH-DYNAMIC-STUBS macro stubs the function for the dynamic extent of the body, so the call to foo inside bar also returns the stubbed value and (bar 3) evaluates to 13.

    (with-dynamic-stubs ((foo 10))
      (bar 3))
    13
    

    The WITH-MOCKS macro lexically replaces the listed functions with mocks that return nil; as with WITH-STUBS, only direct calls in the body are affected.

    (with-mocks (foo)
      (foo 5))
    NIL
    (with-mocks (foo)
      (bar 5))
    10
    

    top

36.8 portch

  1. Summary
    homepage Nick Allen BSD 3 Clause 2009

    Portch helps organize tests written with Franz's portable ptester library. I will leave discussion of this library to users of ptester.

    top

36.9 protest

top

  1. Summary
    homepage Michał Herda LLGPL 2020

    Protest is a wrapper around other testing libraries, currently 1am and parachute. It wraps around test assertions and, in case of failure, informs the user of details of the failed test step. Other useful reading would be The concept of a protocol by Robert Strandh.

  2. Usage
  3. Discussion

    top

36.10 rtch

  1. Summary
    download David Thompson LLGPL 2008

    Rtch helps organize RT tests based on their position in a directory hierarchy. I will leave it to users of rt as to whether it is helpful. Note that the link is to a sourceforge download tar file rather than a homepage.

    top

36.11 testbild

  1. Summary
    homepage Alexander Kahl GPLv3 2010

    Testbild is an older library focused on a set of CLOS classes which can be used as a common interface for the output of test results. I will leave it to the writers of test frameworks as to whether incorporating these classes is useful for them.

    top

36.12 test-utils

  1. Summary
    homepage Leo Zovic MIT 2020

    Test-utils provides convenience functions and macros for prove and cl-quickcheck.

    It also has QUIET-CHECK which runs a cl-quickcheck suite but only sends to *standard-output* on failure.

    It adds the following generators to cl-quickcheck:

    • a-ratio
    • a-number
    • a-keyword
    • an-atom
    • a-pair
    • a-vector
    • a-hash
    • a-value
    • a-alist
    • a-plist
    • an-improper-list
    • an-array

    top

37 Test Coverage Tools

top

37.1 sb-cover

The following is a sample sequence running sb-cover on the package you want to test:

(require :sb-cover)

;; tell sbcl to instrument what it is about to load
(declaim (optimize sb-cover:store-coverage-data))
(asdf:oos 'asdf:load-op :your-package-name-here :force t)

;; now run your tests, e.g. (run-all-tests 'blah-blah-blah-package)

;; generate the html report
(sb-cover:report "path-to-directory-for-the-coverage-htmlpages" :form-mode :car)

;; restore sbcl to its normal state by turning instrumentation off
(declaim (optimize (sb-cover:store-coverage-data 0)))

The last line turns off the instrumentation after the report has been generated. The sb-cover:report line should have generated one or more html pages, starting with a page named cover-index.html in the specified directory, which shows:

  • expression coverage
  • branch coverage

on a file by file basis for your package. The html pages also print out the source file, color coded to show expressions that were not executed and, where an expression has conditionals or branches, whether each of those branch points was actually triggered by the tests. E.g.

(defun foo (x)
  (if (evenp x) 1 2))

If the tests only ran (foo some-even-number) and never (foo some-odd-number), that fact would be highlighted.

sb-cover can be enabled globally. (eval '(declaim (optimize sb-cover:store-coverage-data)))

Per pfdietz: "The problem I have with sb-cover is that is can screw up when the readtable is changed. It needs to somehow record readtable information to properly annotate source files."

top

37.2 ccl code coverage

I have not used this tool.

(setf ccl:*compile-code-coverage* t)

Comment: when ccl:*compile-code-coverage* was set to t, compiling ironclad triggered an error:

[package ironclad]……. > Error: The value (&LAP . 0) is not of the expected type VAR. > While executing: DECOMP-VAR, in process listener(1).

top

37.3 cover

I have not tried cover.

top

38 Appendix

38.1 Problem Space

Testing covers a lot of ground: unit tests, regression tests, test driven development, etc. Testing often runs on an automated basis, but CL being CL, it can be part of an interactive development process. Some people write their unit tests first, then develop to pass the tests (test driven development). Testing is also not error checking.

Ideally a testing framework should make it as easy as possible to write tests, cover different inputs and produce a report showing what passed and failed. If you are writing a library rather than an application, it can be useful to recognize that your test suites are a client to your library's API (and if you find it hard to write the tests, think about how a client user will feel).

Assuming the source is available, the tests should be part of your user documentation in showing how to use the library and an ideal testing framework should make it easy for users to see the tests as examples.

I have seen reasoned arguments that unit tests should only cover exported functions, generally on the grounds that this implicitly tests the internal functions and any additional testing is just adding technical debt. My response is typically: fine, so long as the tests on the exported function can show how it failed. If it depends on 100 internal functions, can you trace back to the real point of failure? If testing is a defense against change, then testing code that has no reason to change does not add value to your test suite - until, of course, you refactor and suddenly it does. By the way, for those who are concerned about static typing, unit tests do not replace static typing unless you actually throw many different inputs at the function being tested.

38.2 Terminology

Different frameworks address different problem sets but before I discuss the problem space, I want to get some terminology out of the way first.

  1. Testing Types
    • Integration testing deals with how units of software interact.
    • Mutation Testing - Pfdietz brought the concept of mutation testing to my attention. This concept targets testing your test suite by inserting errors into programs and measuring the ability of the test suite to detect them.
    • Property based testing (PBT) makes statements about the output of your code based on the input. The statements are tested by feeding random data to tests that are focused on the stated properties. E.g. in testing an addition function, adding zero to the number should result in the same number and changing the order of the inputs should result in the same number. Similarly, a function that reverses a list should always result in (a) a list and (b) the first element of the result being the last element of the input, etc. This obviously requires more thinking about each test. In some respects, what this gets you thinking about is what constitutes valid input and edge cases, and then you need to write generators to randomly generate input that meets (or fails to meet) those criteria. PBT is not a replacement for what I will call result testing, it is an additional testing strategy (see the sketch after this list). cl-quickcheck and nst provide property based testing.
    • Regression tests verify that software that has already been written is still correct after it is changed or combined with other software. In other words, it worked yesterday, does it still work after a bug fix, after refactoring or after another system has been connected. Interactive development from the repl does not address this problem.
    • Unit testing deals with a separate software system or subsystem. (I am not interested in arguing how small the unit needs to be. I leave that up to the TDD missionaries and the TDD haters.) Unit testing can be a part of regression testing - regression tests are often built on suites of unit tests. You might have multiple tests for each function and a suite of tests for every function in a file. As I use the term "unit test", I am talking about how much code is covered, not whether the unit tests are "property based tests" or result testing.
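
    To make the property-based-testing bullet above concrete, here is a minimal sketch using cl-quickcheck (covered in the helpers section), assuming its symbols are imported: two properties of CL:REVERSE checked against randomly generated lists of integers.

    ;; property: reversing a list always yields a list
    (for-all ((l (a-list an-integer)))
      (is listp (reverse l)))
    
    ;; property: reversing twice gives back the original list
    (for-all ((l (a-list an-integer)))
      (is equal l (reverse (reverse l))))
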
  2. Other Terms
    • Assertions - the types of equality tests available. "Assert-eq", "Assert-equal" and "Assert-true" are typical. Some packages provide assertions that have descriptive messages to help debug failures. Some packages (e.g. Kaputt) provide built-in assertions to test float comparisons. Some packages allow you to define your own assertions.
    • Code coverage apparently means different things to different people. I have seen test suites that cover every function, but only with a single simple expected input and 100% code coverage victory has been declared. That is barely a hand wave. As one person has said, that checks that your code is right, but does not check that your code is not wrong. Of course, there are trivial bits of code where it is pointless to try to think about possible different inputs to test.
    • Fixtures (sometimes referred to as contexts)- Fixtures create a temporary environment with a known data set used for the tests. They may be static variables, constructed database tables, etc. Typically there is a setup and teardown process to ensure that the testing environment is in a known state.
    • Mocks - Mocking is a variation on Fixtures. While fixtures are intended to create a known data collection to test against, mocking is intended to eliminate external dependencies in code and create known faux code which can be used as an input (sometimes called stubs) or compared after a test is run to see if there are expected or unexpected side effects.
    • Parametrization means running the same test body with different input each time. You can do this either by running a test against a collection of test data or within a single test by running the test body against a list of forms or test data (see the sketch after this list). How you do it will depend on which way makes it easier to determine what test and what parameters triggered the failure.
    • Refactoring typically requires rewriting unit tests for everything that was touched, then re-running test suites to ensure that everything still works together.
    • Reporting - a failing test should generate a usable bug report. Do we know the input, output, function involved, expected result and what we thought we were testing for? Note that what we thought we were testing for is not the same as the expected result.
    • Shuffle Testing - Randomly changing the sequence in which tests are applied.
    • TAP - TAP (the Test Anything Protocol) is a text based interface between testing modules, decoupling reporting of errors from the presentation of reports. In other words, you can write a TAP consumer which takes the TAP output from the test harness and is responsible for generating user friendly reports (or does other things, like comparing the tests against your own list of functions to generate your own code coverage report). Development on the spec seems to have ceased in 2017. There was a hackernews discussion in June 2020 which can be found here.
    • Verification means that your code contains every bug in the specification.
    • Validation means that it is doing the right thing. This may or may not be possible to automate. I do not envy front end developers or designers dealing with clients.
    • TAP Output - Test Anything Protocol is a formally specified output, considered by some to be a superior alternative to xUnit type testing. Depending on the output mechanisms, TAP can be easy to read but difficult to parse.
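
    To illustrate the parametrization bullet above, a minimal framework-agnostic sketch (the names are made up): the same test body is run over a list of (input expected) pairs and any failing pair is reported with its parameters.

    (defparameter *square-cases* '((0 0) (2 4) (-3 9)))
    
    (defun check-square-cases ()
      ;; run the same check over every (input expected) pair,
      ;; collecting a message for each failure
      (loop for (input expected) in *square-cases*
            for actual = (* input input)
            unless (= actual expected)
              collect (format nil "square of ~a: expected ~a, got ~a"
                              input expected actual)))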

    top

  3. Discussion

    In general, each test has three parts - the setup, the action and the validation. Does the testing framework make it easy to see each of those segments when reading the test or reading any report coming out of the test?
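
    As a minimal sketch of those three parts, using parachute syntax (the same framework used in the mockingbird example above; the test name is made up):

    (define-test push-adds-to-front
      (let ((stack (list 1 2 3)))      ; setup: a known starting state
        (push 0 stack)                 ; action: the operation under test
        (is equal '(0 1 2 3) stack)))  ; validation: compare against the expected result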

    Similarly, when the tests are run and a test fails, is it obvious from the report what happened or do you need to start a debugging session with limited information? It is one thing for a test to report failure, another thing to report what was expected compared to what was generated, and still a much better result to indicate that the correct value was in (aref array-name 2) instead of the expected (aref array-name 0) - the context of the failure.

    Does the test framework allow long enough names for tests and hierarchies (or accept comments) to give meaningful reports?

    How easy is it to run parameterized tests - the test logic is the same, but you run different parameters through the same tests and expect different results?

    top