Anonymous sums from scratch
Quick links
Intro
Haskell has strong support for sum types, but has no built-in notion of anonymous sum types. We must explicitly name the sum type and uniquely label each of its branches:
data Result e a = Error e | Success a
base
provides a handful of common sums too:
data Bool = False | True
data Maybe a = Nothing | Just a
data Either a b = Left a | Right b
Anonymous sums lift the requirement on type naming and branch labeling. Instead, we just express the different possible types the sum type wraps. Some hypothetical syntax:
eatSnack :: Fruit + Candy -> IO ()
eatMeal :: [Meat + Grain + Vegetable] -> IO ()
The +
operator here delimits the possible types. A value of type Fruit + Candy
will either be a Fruit
or a Candy
. Likewise, a value of type [Meat + Grain + Vegetable]
is a list where each element is either a Meat
, a Grain
,
or a Vegetable
.
If we were to build these types as regular sums, they might look something like this:
data SnackItem = SnackItemFruit Fruit | SnackItemCandy Candy
data MealItem
= MealItemMeat Meat
| MealItemGrain Grain
| MealItemVegetable Vegetable
There are plenty of existing libraries already offering anonymous sums, but the tricks the libraries use - often either in the name of flexibility or performance - can obscure understanding the core concepts. This post aims to demystify anonymous sums by building them from scratch using as simple of an encoding as possible. The goal for the code is to be educational: ease of understanding is valued above everything else here. Performance is not a goal. In the main portion of the post, we’ll build out anonymous sum support in just under 25 lines of Haskell.
Finding an encoding
Before we take a crack at implementing anonymous sums, we should spend some time
thinking about how we might encode them at a high level. We need a means for
supporting arbitrarily many branches and a means for performing some
approximation of case analysis on each individual branch. We’re after simplicity
here, so it’d be great if we can reuse whatever code we can from base
.
To that end, let’s turn our attention back to Either
:
data Either a b = Left a | Right b
This is certainly a useful type when we have two branches. We could even make
the argument (ignoring ergonomics) that types like Maybe a
and Bool
are
redundant in light of Either
:
type Bool = Either () ()
type Maybe a = Either () a
We are free to plug in whatever types we want for the Either
type
constructor’s parameters, which means we can also nest more Either
s:
type MySum = Either Int (Either String Char)
While not the friendliest type in town, MySum
expresses a new sum
type with three branches:
branchInt :: MySum
= Left 42
branchInt
branchString :: MySum
= Right $ Left "abc"
branchString
branchChar :: MySum
= Right $ Right 'a'
branchChar
foo :: MySum -> String
= \case
foo Left x -> show x
Right (Left s) -> s
Right (Right c) -> [c]
We can keep nesting Either
s in this way to have sum types with arbitrarily
many branches. The glaring pain point though is that the ergonomics are very
rough - both in construction and in case analysis.
We could take a cue from this old haskell-cafe message and improve the ergonomics a bit with some operators:
type (:|:) a b = Either a b
(???) :: (a -> c) -> (b -> c) -> Either a b -> c
???) = either
(
-- Note that this input type has the nested Either in the left branch rather
-- than the right, i.e. Either (Either Int String) Char.
bar :: Int :|: String :|: Char -> String
=
bar -> show x) ???
(\x -> s) ???
(\s -> [c]) (\c
This gets us a very rough approximation of anonymous sums: we can support
arbitrarily many branches and can perform case analysis on those branches. The
unfortunate bit is that the nested Either
representation is directly exposed
to the user.
This representation will serve as great inspiration for our own implementation though, and hopefully it already should feel like anonymous sums in Haskell are well within reach. What remains should be a relatively light amount of type-level elbow grease.
Elbow grease
If we consider a type like Either Int (Either String Char)
, we can draw a
parallel between nesting that Either String Char
and recursion. We can think
about the problem as if we’re starting from a list of the possible types in our
sum - [Int, String, Char]
. The first possible type is Int
, so we slot that
type in Either
’s first type parameter, producing Either Int hole
. We only
have one other type parameter to work with (hole
), but we have two additional
possible types in our sum: String
and Char
. Rather than slotting in both
those types in one fell swoop via Either String Char
, we can recurse one step
at a time. With that in mind, the next thing we slot in would be a Either String <hole>
, giving us Either Int (Either String hole)
. We just have one
more possible type in our sum - Char
- so we use Char
to plug the last hole,
giving us Either Int (Either String Char)
.
Given that Haskell has support for recursive type functions, we can fairly
directly capture the algorithm that nests Either
’s for us automatically:
type family SSumRep (as :: [Type]) :: Type where
SSumRep (a ': as) = Either a (SSumRep as)
SSumRep
computes the representation of our simple sum encoding by walking its
input list of types. The type at the head of the list goes in the left branch,
and the nesting for the types in the tail of the list is computed in the right
branch.
This computation is incomplete though: we need to handle the terminating case where the input list of types is empty:
type family SSumRep (as :: [Type]) :: Type where
SSumRep '[] = ... what to put here? ...
SSumRep (a ': as) = Either a (SSumRep as)
What type should we use for the deepest-nested Either
’s right branch? We could
use ()
or really any type we want, but as we’ll see soon, the guardrails we’ll
define will prevent the user at compile-time from ever creating or matching on
this branch in the first place. If that branch will never exist as a value at
runtime, the perfect type for that slot is Void
:
type family SSumRep (as :: [Type]) :: Type where
SSumRep '[] = Void
SSumRep (a ': as) = Either a (SSumRep as)
We can test our type function in GHCi:
λ> :kind! SSumRep '[]
SSumRep '[] :: *
= Void
λ> :kind! SSumRep '[Int]
SSumRep '[Int] :: *
= Either Int Void
λ> :kind! SSumRep '[Int, String]
SSumRep '[Int, String] :: *
= Either Int (Either [Char] Void)
λ> :kind! SSumRep '[Int, String, Char]
SSumRep '[Int, String, Char] :: *
= Either Int (Either [Char] (Either Char Void))
Now we can hide away the representation in a newtype:
type SSum :: [Type] -> Type
newtype SSum as = SSum { unSSum :: SSumRep as }
A benefit of using the Either
representation under the hood is that we can
newtype-derive the instances we want:
deriving newtype instance (Eq (SSumRep as)) => Eq (SSum as)
deriving newtype instance (Show (SSumRep as)) => Show (SSum as)
Note that newtype-deriving the Show
instance exposes the Either
representation, but with simplicity in mind for this post, we’ll ignore this.
Deriving a custom Show
instance is left as an exercise for the reader.
Building
We have our representation, and we have it hidden away in a newtype. Next we
need a way to build SSum
values. We want something with a type signature like
this:
build :: a -> SSum as
But we need a means of expressing that type a
is in the list of types as
, so
we’re missing a constraint. We’ll create that constraint by walking the list of
types via typeclass instances, but first we’ll need to introduce the typeclass:
type BuildSSum :: Type -> [Type] -> Constraint
class BuildSSum a as where
build' :: Proxy as -> a -> SSumRep as
build'
aligns fairly closely with build
above. The key differences are 1)
there’s a Proxy as
argument and 2) it operates directly over the sum representation
rather than the newtype wrapper. The Proxy as
argument is a bit of noise, but
it lets us sidestep turning on -XAllowAmbigousTypes
. It can otherwise be
ignored.
For our instances, we’ll start with the case where the input list of types is empty:
instance (TypeError ('Text "BuildSSum: Impossible!")) => BuildSSum a '[] where
= error "Unreachable" build'
Remember that SSumRep '[]
computes the type Void
and there’s no way to
produce a value of type Void
(ignoring undefined
/error
), so there’s
nothing meaningful we can (or want to) do in this branch at the value level. We
use TypeError
from GHC.TypeLits
as an instance constraint such that the user
gets a compile-time error if they try to build an SSum '[]
value.
The next instance we need covers the case when type a
matches the type at the
head of list as
:
instance BuildSSum a (a ': as) where
= Left x build' _proxy x
If a
and the head of input list of types match, then we know that we’re at the
right spot “positionally” in our nested Either
representation to inject the
value of type a
. This idea becomes clearer when the last instance is in place.
This one handles the recursion - both at the type-level by walking the input
list of types and the value level by walking the nested representation:
instance {-# OVERLAPPABLE #-} (BuildSSum a as) => BuildSSum a (a' ': as) where
= Right $ build' (Proxy @as) x build' _proxy x
The key point of this instance is that the type of the value we want to inject
(a
) does not match the head (a'
) of the input list, so we’re not at the
right spot in the representation yet. We continue the recursion, but wrapped up
in a Right
. Another important point is the instance constraint, which is what
allows the recursion in the first place.
The last piece is to implement our build
function. This would be the function
we’d expose to users:
build :: forall a as. (BuildSSum a as) => a -> SSum as
= SSum $ build' (Proxy @as) x build x
Let’s gain some confidence it’s doing the right thing (remember that the Show
instance exposes the representation, but that’s a positive here in GHCi for
testing’s sake):
λ> build @Int 42 :: SSum '[Int, String, Char]
Left 42
λ> build "abc" :: SSum '[Int, String, Char]
Right (Left "abc")
λ> build 'a' :: SSum '[Int, String, Char]
Right (Right (Left 'a'))
λ> build 'a' :: SSum '[]
<interactive>:12:1: error:
• BuildSSum: Impossible!
• In the expression: build @String "abc" :: SSum '[]
In an equation for ‘it’: it = build @String "abc" :: SSum '[]
Case analysis
Now that we can construct anonymous sum values, next up is destructing them via (an approximation of) case analysis. We’ll shoot for a function with a type along the lines of this:
match :: SSum as -> Maybe a
This function says that if the input sum is the branch containing the value of
type a
, then it’ll give us that value. Otherwise, it’ll give us Nothing
.
Like before, we lack a constraint to relate as
and a
. The new typeclass and
instances we’ll need should look fairly familiar to BuildSSum
(again, the
proxies are largely noise to sidestep turning on -XAllowAmbigousTypes
):
type MatchSSum :: Type -> [Type] -> Constraint
class MatchSSum a as where
match' :: Proxy as -> Proxy a -> SSumRep as -> Maybe a
instance (TypeError ('Text "MatchSSum: Impossible!")) => MatchSSum a '[] where
= error "Unreachable"
match' _proxy1 _proxy2 _rep
instance MatchSSum a (a ': as) where
= either Just (const Nothing)
match' _proxy1 _proxy2
instance {-# OVERLAPPABLE #-} (MatchSSum a as) => MatchSSum a (a' ': as) where
= either (const Nothing) (match' (Proxy @as) proxy2) match' _proxy1 proxy2
The MatchSSum a '[]
instance is for the case where the input list of types is
empty. We once again outlaw this case at compile-time.
The MatchSSum a (a ': as)
instance captures when we have a type match between
our input type a
and the head of the input list of types as
. So long as
users do not subvert the use of build
when creating values by constructing the
representation manually, there’s no way for the types to match and the
representation to not be Left x :: Either a (SSumRep as)
This is why it’s fine
to ignore the possibility of a right branch via const Nothing
.
Similiarly, in the MatchSSum a (a' ': as)
instance, we have a type mismatch
between the input type a
and the head of the input list of types. We know
our representation can’t be a left branch here, so we ignore it via const Nothing
and continue the recursion down the right branch.
We now have the appropriate relation for match
’s type variables:
match :: forall a as. (MatchSSum a as) => SSum as -> Maybe a
SSum rep) = match' (Proxy @as) (Proxy @a) rep match (
Let’s try it out in GHCi:
λ> let ssum = build @Int 42 :: SSum '[Int, String, Char]
ssum :: SSum '[Int, String, Char]
λ> match @Int ssum
Just 42
it :: Maybe Int
λ> match @String ssum
Nothing
it :: Maybe String
λ> match @Char ssum
Nothing
it :: Maybe Char
What about casing over every branch? While the machinery we’ve defined does not
support exhaustive matching, we can still get a rough approximation of full case
analysis using functions straight from base
:
casingDemo :: SSum '[Int, String, Char] -> IO ()
=
casingDemo ssum $ asum
fold @Int ssum >>= \x ->
[ match Just $ print $ 2 * x
@String ssum >>= \s ->
, match Just $ print s
@Char ssum >>= \c ->
, match Just $ print $ replicate 5 c
]
asum
here has type [Maybe (IO ())] -> Maybe (IO ())
and selects the first
successful match in the list. fold
here is Maybe (IO ()) -> IO ()
. The
resulting IO ()
is a is a no-op if there was no match, otherwise, the branch’s
action is performed. The compiler does not enforce that all branches are
handled, but in the pursuit of keeping things simple, having some syntax
stitched together from existing functions is a reasonable compromise.
We now have the pieces we need to declare victory: we have a representation for anonymous sums hidden away from the user, we can build sum values, and match on those values. The implementation clocks in at 24 lines of Haskell, not counting the import preamble, the newtype-deriving lines, and the 3 kind signatures included for documentation’s sake. Counting everything, including whitespace, it comes in at just under 50 lines.
The complete code is available in the repo at commit 0568906f. The rest of this post will explore the implementation a bit further.
Limitations
A limitation of our approach can be seen when the user legitimately wants duplicate types in their sum:
λ> let ssum = build "abc" :: SSum '[String, String]
ssum :: SSum '[String, String]
A type like SSum '[String, String]
is similar to Either String String
. The
user may wish to assign convention-based significance to the branches in their
sum, but with our approach, a value of type SSum '[String, String]
will have a
String
in the first slot and never the second:
λ> match @String ssum
Just "abc"
it :: Maybe String
λ> ssum
Left "abc"
it :: SSum '[String, String]
This may at first seem like a legitimate limitation, but we can solve for this
by tagging the contituent types. Const a b
is poly-kinded, so we can stick a
type of any kind in the b
slot, including type-level strings:
λ> :{
let ssum = build (Const @String @"Right" "abc")
:: SSum '[Const String "Left", Const String "Right"]
:}
ssum :: SSum '[Const String "Left", Const String "Right"]
λ> match @(Const String "Right") ssum
Just (Const "abc")
it :: Maybe (Const String "Right")
Another apparent limitation is that there’s no way for the user to express that
a branch is just the branch and no value. We can achieve something effectively
equivalent by using a similar means of tagging via Proxy
:
λ> let ssum = build (Proxy @"True") :: SSum '[Proxy "False", Proxy "True"]
ssum :: SSum '[Proxy "False", Proxy "True"]
λ> match @(Proxy "True") ssum
Just Proxy
it :: Maybe (Proxy "True")
These tagging approaches aren’t the most pleasant things to work with, but they are in line with keeping our implementation as simple and concise as possible.
Exhaustive matching
The matching support in our implementation lets the user get at a possible value of a particular type “one type at a time”:
match :: forall a as. (MatchSSum a as) => SSum as -> Maybe a
There’s no compiler enforcement that the user checks every branch in the sum. There are likely a few different ways we could implement exhaustive matching. Here’s one approach that should look familiar to the work we’ve done up to this point:
ematch :: forall r as. (EMatchSSum r as) => SSum as -> Matchers r as -> r
SSum rep) matchers = ematch' (Proxy @as) (Proxy @r) matchers rep
ematch (
type EMatchSSum :: Type -> [Type] -> Constraint
class EMatchSSum r as where
ematch' :: Proxy as -> Proxy r -> Matchers r as -> SSumRep as -> r
instance EMatchSSum r '[] where
= f rep
ematch' _proxy1 _proxy2 f rep
instance (EMatchSSum r as) => EMatchSSum r (a' ': as) where
= either f (ematch' (Proxy @as) proxy2 rest)
ematch' _proxy1 proxy2 (f, rest)
type family Matchers (r :: Type) (as :: [Type]) :: Type where
Matchers r '[] = Void -> r
Matchers r (a ': as) = (a -> r, Matchers r as)
The truly new bit here is the Matchers
type function. It’s easiest to
understand this function by testing it (output formatted for readability):
λ> :kind! Matchers String '[Int, String, Char]
Matchers String '[Int, String, Char] :: *
= ( Int -> String
, ( String -> String
, ( Char -> String
, Void -> String
)
)
)
We already drew a direct parallel between Either
and our anonymous sum type,
and in Matchers
we now draw a parallel betwen (,)
and an anonymous product.
Matchers
is given a final return type and an input list of types, and computes
an anonymous product via nesting (,)
where the first element of every pair is
a function type from the specific type in our anonymous sum to the return type.
If the user hands us a value of type Matchers r as
and a value of type Sum as
, we have everything we need to produce a value of type r
.
Usage looks like this:
exhaustiveDemo :: SSum '[Int, String, Char] -> String
=
exhaustiveDemo ssum
ematch ssum-> show x
( \x -> s
, ( \s -> [c]
, ( \c
, absurd
)
) )
This version of the code is available at commit e412c964. This syntax will be improved upon in the next section.
Syntax improvements
Early in the post, a hypothetical syntax for expressing the type of an anonymous
sum was shown using (+)
:
eatSnack :: Fruit + Candy -> IO ()
eatMeal :: [Meat + Grain + Vegetable] -> IO ()
We can add support for this syntax using another type function:
type family (+) (a :: Type) (as :: Type) :: Type where
+) a (SSum as) = SSum (a ': as)
(+) a b = SSum '[a, b]
(infixr +
Now we can use whatever style we prefer: Int + String + Char
and SSum '[Int, String, Char]
are equivalent types.
Separately, we can clean up the syntax for exhaustive matching seen in the previous section:
exhaustiveDemo :: SSum '[Int, String, Char] -> String
=
exhaustiveDemo ssum
ematch ssum-> show x
( \x -> s
, ( \s -> [c]
, ( \c
, absurd
)
) )
The tuple nesting here is fairly brutal. We won’t worry about hiding away this
anonymous-product-as-tuples representation as this post has gotten long enough
already, but we will make a more usable syntax. We’ll start by adding a synonym
for (,)
that we can apply infix (borrowing the name that PureScript uses):
/\) :: a -> b -> (a, b)
(/\) = (,)
(infixr /\
Next we’ll slightly tweak ematch
’s signature. It currently is this:
ematch :: forall r as. (EMatchSSum r as) => SSum as -> Matchers r as -> r
We’ll change it so that the anonymous sum value is handed to us as the first element of a tuple, and the anonymous product of matchers is handed to us in the second element:
ematch :: forall r as. (EMatchSSum r as) => (SSum as, Matchers r as) -> r
With these two pieces, we now have an (arguably) much more aesthetically-pleasing means to do exhaustive matching:
exhaustiveDemo2 :: SSum '[Int, String, Char] -> String
=
exhaustiveDemo2 ssum $ ssum
ematch /\ (\x -> show x)
/\ (\s -> s)
/\ (\c -> [c])
/\ absurd
This version of the code is available at commit 3abcaab1.
The computations done in each match are very simple here, but they’ve been kept
as lambdas for readability. If you’re like me and the parentheses get under the
skin a bit, we can (ab)use do
notation to get rid of them:
exhaustiveDemo3 :: SSum '[Int, String, Char] -> String
=
exhaustiveDemo3 ssum $ ssum
ematch /\ do
-> show x
\x /\ do
-> s
\s /\ do
-> [c]
\c /\ absurd
If the lambdas also come off as noisy, we can always (ab)use the MonadReader r ((->) r)
instance:
exhaustiveDemo4 :: SSum '[Int, String, Char] -> String
=
exhaustiveDemo4 ssum $ ssum
ematch /\ do
show
asks /\ do
ask/\ do
pure
asks /\ absurd
It’s probably worth considering a means of getting rid of the need for the user
to spell out absurd
to “close off” their anonymous product of matcher
functions, but that’s left as an exercise for the reader.
End
Thanks for reading! I hope this post helps demystify anonymous sums. As hinted at earlier in the post, I don’t recommend using the exact representation demonstrated here in production: what’s shown in this post is meant purely for education. There are many libraries on Hackage adding support for anonymous sums that would be better candidates. Each library makes tradeoffs in the design space, so I recommend doing your own evaluation if you wind up needing to pull in an anonymous sum in prod.