Last weekend fate flew me to Sydney for AusHac2011. Ever since starting on my Haskell Tour presentation, I’ve been a little Haskell obsessed. AusHac is just the most recent symptom. In addition to seeking out others with the same condition, I learned how to use QuickCheck for verifying that types implement type class laws. In particular, I checked that Bool and Int both properly implement the Semiring class.
On the way to Sydney, I started reading A Play on Regular Expressions:
Cody, Hazel, and Theo, two experienced Haskell programmers and an expert in automata theory, develop an elegant Haskell program for matching regular expressions: (i) the program is purely functional; (ii) it is overloaded over arbitrary semirings, which not only allows to solve the ordinary matching problem but also supports other applications like computing leftmost longest matchings or the number of matchings, all with a single algorithm; (iii) it is more powerful than other matchers, as it can be used for parsing every context-free language by taking advantage of laziness.
In the first scene, our players make a full-string yes/no regular expression matcher. Then Theo recommends generalizing to arbitrary semirings. The Bool semiring gives us yes/no matching. The Int semiring counts the number of matches. Other instances do other useful things.
What is a semiring? A semiring is a type equipped with addition and multiplication, zero and one:
class Semiring s where
zero, one :: s
(.+.), (.*.) :: s -> s -> s
A semiring is more than a bag of methods, the methods should obey some laws:
Zero is the additive identity:
zero .+. x == x
x .+. zero == x
One is the multiplicative identity:
one .*. x == x
x .*. one == x
Zero annihilates multiplication:
zero .*. x == zero
x .*. zero == zero
Addition is associative:
(x .+. y) .+. z == x .+. (y .+. z)
Multiplication is associative:
(x .*. y) .*. z == x .*. (y .*. z)
Addition is commutative:
x .+. y == y .+. x
Multiplication distributes over addition:
x .*. (y .+. z) == (x .*. y) .+. (x .*. z)
(x .+. y) .*. z == (x .*. z) .+. (y .*. z)
Some familiar types are Semirings:
instance Semiring Bool where
zero = False
one = True
(.+.) = (||)
(.*.) = (&&)
instance Semiring Int where
zero = 0
one = 1
(.+.) = (+)
(.*.) = (*)
This bad semiring instance breaks the law:
data Criminal = Bad | Worse
deriving (Eq, Show)
instance Semiring Criminal where
zero = Bad
one = Bad
(.+.) = (.*.)
Bad .*. v = v
v .*. Bad = v
_ .*. _ = Worse
Tests should pass for Bool and Int. They should fail for Criminal
QuickCheck is a favorite, fun, and clever testing library among Haskell rascals. QuickCheck doesn’t provide the test grouping and running you expect from a full test framework. Instead QuickCheck’s virtue is that it lets you test properties using automatically generated data. QuickCheck is a perfect fit for our semiring laws:
prop_additiveIdentity_Int :: Int -> Bool
prop_additiveIdentity_Int x =
zero .+. x == x &&
x .+. zero == x
Give it a try:
ghci> quickCheck prop_additiveIdentity_Int
+++ OK, passed 100 tests.
Perfect. How does QuickCheck do it? As always with Haskell, if you want a broad idea of what a function does, you should check its type:
ghci> :t quickCheck
quickCheck :: Testable prop => prop -> IO ()
Here we see that quickCheck acts on anything Testable. What’s Testable? (Some output omitted for clarity.)
ghci> :info Testable
class Testable prop where property :: prop -> Property
instance Testable Bool
instance (Arbitrary a, Show a, Testable prop) =>
Testable (a -> prop)
A boolean is testable as is any boolean valued function such that the argument can be displayed (Show) and the argument can be randomly generated (Arbitrary). We need to instantiate Arbitrary for our Criminal class:
instance Arbitrary Criminal where
arbitrary = elements [Bad, Worse]
The elements combinator makes implementing Arbitrary easy. Now we can test Criminal too:
prop_additiveIdentity_Criminal :: Criminal -> Bool
prop_additiveIdentity_Criminal x =
zero .+. x == x &&
x .+. zero == x
Notice that the code for prop_additiveIdentity_Int and prop_additiveIdentity_Criminal is the same. Only the types differ. Can we generalize?
Generalizes readily:
prop_additiveIdentity :: (Semiring r, Eq r) => r -> Bool
prop_additiveIdentity x =
zero .+. x == x &&
x .+. zero == x
What happens when we quickCheck? (Formatted for clarity.)
ghci> quickCheck prop_additiveIdentity
Ambiguous type variable `r0' in the constraints:
(Arbitrary r0) arising from a use of `quickCheck'
(Semiring r0) arising from a use of `prop_additiveIdentity'
(Show r0) arising from a use of `quickCheck'
(Eq r0) arising from a use of `prop_additiveIdentity'
Probable fix: add a type signature that fixes these type variable(s)
Haskell can’t tell which instance we want to check. We need to tell it:
ghci> quickCheck (prop_additiveIdentity :: Int -> Bool)
+++ OK, passed 100 tests.
Okay.
Consider another one of our laws:
prop_annihilation :: (Semiring r, Eq r) => r -> Bool
prop_annihilation x =
zero .*. x == zero &&
x .*. zero == zero
This one fails for Criminal:
ghci> quickCheck (prop_annihilation :: Criminal -> Bool)
*** Failed! Falsifiable (after 1 test):
Worse
QuickCheck has a conjunction operator for combining tests:
(.&&.) :: (Testable prop1, Testable prop2) => prop1 -> prop2 -> Property
Concrete tests conjoin without trouble:
ghci> quickCheck (prop_additiveIdentity_Int .&&. prop_additiveIdentity_Criminal)
+++ OK, passed 100 tests.
But type class variables lead to ambiguity! (Clarity.)
ghci> quickCheck (prop_additiveIdentity .&&. prop_annihilation)
Ambiguous type variable `r0'
In the first argument of `(.&&.)', namely `prop_additiveIdentity'
Ambiguous type variable `r1'
In the second argument of `(.&&.)', namely `prop_annihilation'
Type variables always need to be bound. How do we supply types for these type variables? We can be explicit of course:
prop_conjoin_Int =
(prop_additiveIdentity :: Int -> Bool) .&&.
(prop_annihilation :: Int -> Bool)
Not very satisfying though. The whole point of this exercise is to pass in semiring types to a giant semiring law checker. Since types aren’t first class values, we will pass in a representative value from each semiring instance. How can we use a value to remove type ambiguity? Haskell has very clever prelude function:
asTypeOf :: a -> a -> a
v `asTypeOf` _ = v
What does asTypeOf do? It throws away its second argument. How is that useful? Notice the type. The signature forces unification of first and second arguments:
ghci> zero
Ambiguous type variable `s0' in the constraint:
(Semiring s0) arising from a use of `zero'
ghci> zero `asTypeOf` True
False
ghci> zero `asTypeOf` (1 :: Int)
0
ghci> zero `asTypeOf` Bad
Bad
Perfect. Now we can pass in the semiring type:
prop_conjoin :: (Semiring r, Arbitrary r, Show r, Eq r) =>
(r -> Bool) -> Property
prop_conjoin t =
prop_additiveIdentity `asTypeOf` t .&&.
prop_annihilation `asTypeOf` t
However, using it is a little awkward:
ghci> quickCheck (prop_conjoin ((\x -> True) :: Int -> Bool))
+++ OK, passed 100 tests.
How can we just provide a semiring instead of a function mapping the semiring to a Bool? Remember that asTypeOf ignores its typing argument. The parameter is only used for unification. The first argument can have any type so long as the return type matches:
asFunctionOf :: (a -> b) -> a -> (a -> b)
v `asFunctionOf` _ = v
prop_conjoin' :: (Semiring r, Arbitrary r, Show r, Eq r) =>
r -> Property
prop_conjoin' t =
prop_additiveIdentity `asFunctionOf` t .&&.
prop_annihilation `asFunctionOf` t
Our new and improved conjunction is easy to use:
ghci> quickCheck (prop_conjoin' True)
+++ OK, passed 100 tests.
ghci> quickCheck (prop_conjoin' (1 :: Int))
+++ OK, passed 100 tests.
ghci> quickCheck (prop_conjoin' Bad)
*** Failed! Falsifiable (after 1 test):
Worse
Only problem is that quickCheck doesn’t tell us which of the two conjuncts failed.
Again QuickCheck has a helpful combinator:
printTestCase :: Testable prop => String -> prop -> Property
Just apply it to each conjunct:
prop_withMessage :: (Semiring r, Arbitrary r, Show r, Eq r) =>
r -> Property
prop_withMessage t =
printTestCase "Zero is the additive identity."
(prop_additiveIdentity `asFunctionOf` t) .&&.
printTestCase "Zero annihilates multiplication."
(prop_annihilation `asFunctionOf` t)
Then quickCheck prints the message on failure:
ghci> quickCheck (prop_withMessage Bad)
*** Failed! Falsifiable (after 1 test):
Zero annihilates multiplication.
Worse
We’re ready for all the other laws.
When we combine all of our laws, the resulting expression has more noise than one would like:
isValidSemiring_noisy :: (Semiring a, Eq a, Arbitrary a, Show a) =>
a -> Property
isValidSemiring_noisy t =
printTestCase "Zero is the additive identity." ((\x ->
zero .+. x == x &&
x .+. zero == x) `asFunctionOf` t) .&&.
printTestCase "Zero annihilates multiplication." ((\x ->
one .*. x == x &&
x .*. one == x) `asFunctionOf` t) .&&.
printTestCase "Zero annihilates multiplication." ((\x ->
zero .*. x == zero &&
x .*. zero == zero) `asFunctionOf` t) .&&.
printTestCase "Addition is associative." ((\x y z ->
(x .+. y) .+. z == x .+. (y .+. z)) `asFunctionOf` t) .&&.
printTestCase "Multiplication is associative." ((\x y z ->
(x .*. y) .*. z == x .*. (y .*. z)) `asFunctionOf` t) .&&.
printTestCase "Addition is commutative." ((\x y ->
x .+. y == y .+. x) `asFunctionOf` t) .&&.
printTestCase "Multiplication distributes over addition." ((\x y z ->
x .*. (y .+. z) == (x .*. y) .+. (x .*. z) &&
(x .+. y) .*. z == (x .*. z) .+. (y .*. z)) `asFunctionOf` t)
How to refactor?
Start with a smaller expression:
prop_withMessage :: (Semiring r, Arbitrary r, Show r, Eq r) =>
r -> Property
prop_withMessage t =
printTestCase "Zero is the additive identity."
(prop_additiveIdentity `asFunctionOf` t) .&&.
printTestCase "Zero annihilates multiplication."
(prop_annihilation `asFunctionOf` t)
The two conjuncts have an obviously similar structure. Factor it out:
prop_withMessage' :: (Semiring r, Arbitrary r, Show r, Eq r) =>
r -> Property
prop_withMessage' t =
let
law s l = printTestCase s (l `asFunctionOf` t)
in
law "Zero is the additive identity."
prop_additiveIdentity .&&.
law "Zero annihilates multiplication."
prop_annihilation
Factor it all the way out:
law :: (Arbitrary a, Show a, Testable b) =>
String -> (a -> b) -> a -> Property
law s l t = printTestCase s (l `asFunctionOf` t)
prop_withMessage'' :: (Semiring r, Arbitrary r, Show r, Eq r) =>
r -> Property
prop_withMessage'' t =
law "Zero is the additive identity."
prop_additiveIdentity t .&&.
law "Zero annihilates multiplication."
prop_annihilation t
Last but not least, factor out the type threading:
laws :: [a -> Property] -> a -> Property
laws [] t = error "`laws' requires at least one `law'."
laws [l] t = l t
laws (l:ls) t = l t .&&. laws ls t
prop_withMessage''' :: (Semiring r, Arbitrary r, Show r, Eq r) =>
r -> Property
prop_withMessage''' = laws [
law "Zero is the additive identity."
prop_additiveIdentity,
law "Zero annihilates multiplication."
prop_annihilation]
That’s as good as it’s going to get.
At last we have concise language for expressing type class laws:
isValidSemiring :: (Semiring a, Eq a, Arbitrary a, Show a) =>
a -> Property
isValidSemiring = laws [
law "Zero is the additive identity." $ \x ->
zero .+. x == x &&
x .+. zero == x,
law "One is the multiplicative identity." $ \x ->
one .*. x == x &&
x .*. one == x,
law "Zero annihilates multiplication." $ \x ->
zero .*. x == zero &&
x .*. zero == zero,
law "Addition is associative." $ \x y z ->
(x .+. y) .+. z == x .+. (y .+. z),
law "Multiplication is associative." $ \x y z ->
(x .*. y) .*. z == x .*. (y .*. z),
law "Addition is commutative." $ \x y ->
x .+. y == y .+. x,
law "Multiplication distributes over addition." $ \x y z ->
x .*. (y .+. z) == (x .*. y) .+. (x .*. z) &&
(x .+. y) .*. z == (x .*. z) .+. (y .*. z)]
Define properties:
prop_semiringBool = isValidSemiring True
prop_semiringInt = isValidSemiring (1 :: Int)
prop_semiringBad = expectFailure $ isValidSemiring Bad
Verify to your heart’s content.
Commentary