# Anonymous sums from scratch

## Quick links

## Intro

Haskell has strong support for sum types, but has no built-in notion of
*anonymous* sum types. We must explicitly name the sum type and uniquely label
each of its branches:

`data Result e a = Error e | Success a`

`base`

provides a handful of common sums too:

```
data Bool = False | True
data Maybe a = Nothing | Just a
data Either a b = Left a | Right b
```

Anonymous sums lift the requirement on type naming and branch labeling. Instead, we just express the different possible types the sum type wraps. Some hypothetical syntax:

```
eatSnack :: Fruit + Candy -> IO ()
eatMeal :: [Meat + Grain + Vegetable] -> IO ()
```

The `+`

operator here delimits the possible types. A value of type `Fruit + Candy`

will either be a `Fruit`

or a `Candy`

. Likewise, a value of type `[Meat + Grain + Vegetable]`

is a list where each element is either a `Meat`

, a `Grain`

,
or a `Vegetable`

.

If we were to build these types as regular sums, they might look something like this:

```
data SnackItem = SnackItemFruit Fruit | SnackItemCandy Candy
data MealItem
= MealItemMeat Meat
| MealItemGrain Grain
| MealItemVegetable Vegetable
```

There are plenty of existing libraries already offering anonymous sums, but the tricks the libraries use - often either in the name of flexibility or performance - can obscure understanding the core concepts. This post aims to demystify anonymous sums by building them from scratch using as simple of an encoding as possible. The goal for the code is to be educational: ease of understanding is valued above everything else here. Performance is not a goal. In the main portion of the post, we’ll build out anonymous sum support in just under 25 lines of Haskell.

## Finding an encoding

Before we take a crack at implementing anonymous sums, we should spend some time
thinking about how we might encode them at a high level. We need a means for
supporting arbitrarily many branches and a means for performing some
approximation of case analysis on each individual branch. We’re after simplicity
here, so it’d be great if we can reuse whatever code we can from `base`

.

To that end, let’s turn our attention back to `Either`

:

`data Either a b = Left a | Right b`

This is certainly a useful type when we have two branches. We could even make
the argument (ignoring ergonomics) that types like `Maybe a`

and `Bool`

are
redundant in light of `Either`

:

```
type Bool = Either () ()
type Maybe a = Either () a
```

We are free to plug in whatever types we want for the `Either`

type
constructor’s parameters, which means we can also nest more `Either`

s:

`type MySum = Either Int (Either String Char)`

While not the friendliest type in town, `MySum`

expresses a new sum
type with three branches:

```
branchInt :: MySum
= Left 42
branchInt
branchString :: MySum
= Right $ Left "abc"
branchString
branchChar :: MySum
= Right $ Right 'a'
branchChar
foo :: MySum -> String
= \case
foo Left x -> show x
Right (Left s) -> s
Right (Right c) -> [c]
```

We can keep nesting `Either`

s in this way to have sum types with arbitrarily
many branches. The glaring pain point though is that the ergonomics are very
rough - both in construction and in case analysis.

We could take a cue from this old haskell-cafe message and improve the ergonomics a bit with some operators:

```
type (:|:) a b = Either a b
(???) :: (a -> c) -> (b -> c) -> Either a b -> c
???) = either
(
-- Note that this input type has the nested Either in the left branch rather
-- than the right, i.e. Either (Either Int String) Char.
bar :: Int :|: String :|: Char -> String
=
bar -> show x) ???
(\x -> s) ???
(\s -> [c]) (\c
```

This gets us a very rough approximation of anonymous sums: we can support
arbitrarily many branches and can perform case analysis on those branches. The
unfortunate bit is that the nested `Either`

representation is directly exposed
to the user.

This representation will serve as great inspiration for our own implementation though, and hopefully it already should feel like anonymous sums in Haskell are well within reach. What remains should be a relatively light amount of type-level elbow grease.

## Elbow grease

If we consider a type like `Either Int (Either String Char)`

, we can draw a
parallel between nesting that `Either String Char`

and recursion. We can think
about the problem as if we’re starting from a list of the possible types in our
sum - `[Int, String, Char]`

. The first possible type is `Int`

, so we slot that
type in `Either`

’s first type parameter, producing `Either Int hole`

. We only
have one other type parameter to work with (`hole`

), but we have two additional
possible types in our sum: `String`

and `Char`

. Rather than slotting in both
those types in one fell swoop via `Either String Char`

, we can recurse one step
at a time. With that in mind, the next thing we slot in would be a `Either String <hole>`

, giving us `Either Int (Either String hole)`

. We just have one
more possible type in our sum - `Char`

- so we use `Char`

to plug the last hole,
giving us `Either Int (Either String Char)`

.

Given that Haskell has support for recursive type functions, we can fairly
directly capture the algorithm that nests `Either`

’s for us automatically:

```
type family SSumRep (as :: [Type]) :: Type where
SSumRep (a ': as) = Either a (SSumRep as)
```

`SSumRep`

computes the representation of our simple sum encoding by walking its
input list of types. The type at the head of the list goes in the left branch,
and the nesting for the types in the tail of the list is computed in the right
branch.

This computation is incomplete though: we need to handle the terminating case where the input list of types is empty:

```
type family SSumRep (as :: [Type]) :: Type where
SSumRep '[] = ... what to put here? ...
SSumRep (a ': as) = Either a (SSumRep as)
```

What type should we use for the deepest-nested `Either`

’s right branch? We could
use `()`

or really any type we want, but as we’ll see soon, the guardrails we’ll
define will prevent the user at compile-time from ever creating or matching on
this branch in the first place. If that branch will never exist as a value at
runtime, the perfect type for that slot is `Void`

:

```
type family SSumRep (as :: [Type]) :: Type where
SSumRep '[] = Void
SSumRep (a ': as) = Either a (SSumRep as)
```

We can test our type function in GHCi:

```
λ> :kind! SSumRep '[]
SSumRep '[] :: *
= Void
λ> :kind! SSumRep '[Int]
SSumRep '[Int] :: *
= Either Int Void
λ> :kind! SSumRep '[Int, String]
SSumRep '[Int, String] :: *
= Either Int (Either [Char] Void)
λ> :kind! SSumRep '[Int, String, Char]
SSumRep '[Int, String, Char] :: *
= Either Int (Either [Char] (Either Char Void))
```

Now we can hide away the representation in a newtype:

```
type SSum :: [Type] -> Type
newtype SSum as = SSum { unSSum :: SSumRep as }
```

A benefit of using the `Either`

representation under the hood is that we can
newtype-derive the instances we want:

```
deriving newtype instance (Eq (SSumRep as)) => Eq (SSum as)
deriving newtype instance (Show (SSumRep as)) => Show (SSum as)
```

Note that newtype-deriving the `Show`

instance exposes the `Either`

representation, but with simplicity in mind for this post, we’ll ignore this.
Deriving a custom `Show`

instance is left as an exercise for the reader.

### Building

We have our representation, and we have it hidden away in a newtype. Next we
need a way to build `SSum`

values. We want something with a type signature like
this:

`build :: a -> SSum as`

But we need a means of expressing that type `a`

is in the list of types `as`

, so
we’re missing a constraint. We’ll create that constraint by walking the list of
types via typeclass instances, but first we’ll need to introduce the typeclass:

```
type BuildSSum :: Type -> [Type] -> Constraint
class BuildSSum a as where
build' :: Proxy as -> a -> SSumRep as
```

`build'`

aligns fairly closely with `build`

above. The key differences are 1)
there’s a `Proxy as`

argument and 2) it operates directly over the sum representation
rather than the newtype wrapper. The `Proxy as`

argument is a bit of noise, but
it lets us sidestep turning on `-XAllowAmbigousTypes`

. It can otherwise be
ignored.

For our instances, we’ll start with the case where the input list of types is empty:

```
instance (TypeError ('Text "BuildSSum: Impossible!")) => BuildSSum a '[] where
= error "Unreachable" build'
```

Remember that `SSumRep '[]`

computes the type `Void`

and there’s no way to
produce a value of type `Void`

(ignoring `undefined`

/`error`

), so there’s
nothing meaningful we can (or want to) do in this branch at the value level. We
use `TypeError`

from `GHC.TypeLits`

as an instance constraint such that the user
gets a compile-time error if they try to build an `SSum '[]`

value.

The next instance we need covers the case when type `a`

matches the type at the
head of list `as`

:

```
instance BuildSSum a (a ': as) where
= Left x build' _proxy x
```

If `a`

and the head of input list of types match, then we know that we’re at the
right spot “positionally” in our nested `Either`

representation to inject the
value of type `a`

. This idea becomes clearer when the last instance is in place.
This one handles the recursion - both at the type-level by walking the input
list of types and the value level by walking the nested representation:

```
instance {-# OVERLAPPABLE #-} (BuildSSum a as) => BuildSSum a (a' ': as) where
= Right $ build' (Proxy @as) x build' _proxy x
```

The key point of this instance is that the type of the value we want to inject
(`a`

) does not match the head (`a'`

) of the input list, so we’re not at the
right spot in the representation yet. We continue the recursion, but wrapped up
in a `Right`

. Another important point is the instance constraint, which is what
allows the recursion in the first place.

The last piece is to implement our `build`

function. This would be the function
we’d expose to users:

```
build :: forall a as. (BuildSSum a as) => a -> SSum as
= SSum $ build' (Proxy @as) x build x
```

Let’s gain some confidence it’s doing the right thing (remember that the `Show`

instance exposes the representation, but that’s a positive here in GHCi for
testing’s sake):

```
λ> build @Int 42 :: SSum '[Int, String, Char]
Left 42
λ> build "abc" :: SSum '[Int, String, Char]
Right (Left "abc")
λ> build 'a' :: SSum '[Int, String, Char]
Right (Right (Left 'a'))
λ> build 'a' :: SSum '[]
<interactive>:12:1: error:
• BuildSSum: Impossible!
• In the expression: build @String "abc" :: SSum '[]
In an equation for ‘it’: it = build @String "abc" :: SSum '[]
```

### Case analysis

Now that we can construct anonymous sum values, next up is destructing them via (an approximation of) case analysis. We’ll shoot for a function with a type along the lines of this:

`match :: SSum as -> Maybe a`

This function says that if the input sum is the branch containing the value of
type `a`

, then it’ll give us that value. Otherwise, it’ll give us `Nothing`

.
Like before, we lack a constraint to relate `as`

and `a`

. The new typeclass and
instances we’ll need should look fairly familiar to `BuildSSum`

(again, the
proxies are largely noise to sidestep turning on `-XAllowAmbigousTypes`

):

```
type MatchSSum :: Type -> [Type] -> Constraint
class MatchSSum a as where
match' :: Proxy as -> Proxy a -> SSumRep as -> Maybe a
instance (TypeError ('Text "MatchSSum: Impossible!")) => MatchSSum a '[] where
= error "Unreachable"
match' _proxy1 _proxy2 _rep
instance MatchSSum a (a ': as) where
= either Just (const Nothing)
match' _proxy1 _proxy2
instance {-# OVERLAPPABLE #-} (MatchSSum a as) => MatchSSum a (a' ': as) where
= either (const Nothing) (match' (Proxy @as) proxy2) match' _proxy1 proxy2
```

The `MatchSSum a '[]`

instance is for the case where the input list of types is
empty. We once again outlaw this case at compile-time.

The `MatchSSum a (a ': as)`

instance captures when we have a type match between
our input type `a`

and the head of the input list of types `as`

. So long as
users do not subvert the use of `build`

when creating values by constructing the
representation manually, there’s no way for the types to match and the
representation to not be `Left x :: Either a (SSumRep as)`

This is why it’s fine
to ignore the possibility of a right branch via `const Nothing`

.

Similiarly, in the `MatchSSum a (a' ': as)`

instance, we have a type mismatch
between the input type `a`

and the head of the input list of types. We know
our representation can’t be a left branch here, so we ignore it via `const Nothing`

and continue the recursion down the right branch.

We now have the appropriate relation for `match`

’s type variables:

```
match :: forall a as. (MatchSSum a as) => SSum as -> Maybe a
SSum rep) = match' (Proxy @as) (Proxy @a) rep match (
```

Let’s try it out in GHCi:

```
λ> let ssum = build @Int 42 :: SSum '[Int, String, Char]
ssum :: SSum '[Int, String, Char]
λ> match @Int ssum
Just 42
it :: Maybe Int
λ> match @String ssum
Nothing
it :: Maybe String
λ> match @Char ssum
Nothing
it :: Maybe Char
```

What about casing over every branch? While the machinery we’ve defined does not
support exhaustive matching, we can still get a rough approximation of full case
analysis using functions straight from `base`

:

```
casingDemo :: SSum '[Int, String, Char] -> IO ()
=
casingDemo ssum $ asum
fold @Int ssum >>= \x ->
[ match Just $ print $ 2 * x
@String ssum >>= \s ->
, match Just $ print s
@Char ssum >>= \c ->
, match Just $ print $ replicate 5 c
]
```

`asum`

here has type `[Maybe (IO ())] -> Maybe (IO ())`

and selects the first
successful match in the list. `fold`

here is `Maybe (IO ()) -> IO ()`

. The
resulting `IO ()`

is a is a no-op if there was no match, otherwise, the branch’s
action is performed. The compiler does not enforce that all branches are
handled, but in the pursuit of keeping things simple, having some syntax
stitched together from existing functions is a reasonable compromise.

We now have the pieces we need to declare victory: we have a representation for anonymous sums hidden away from the user, we can build sum values, and match on those values. The implementation clocks in at 24 lines of Haskell, not counting the import preamble, the newtype-deriving lines, and the 3 kind signatures included for documentation’s sake. Counting everything, including whitespace, it comes in at just under 50 lines.

The complete code is available in the repo at commit 0568906f. The rest of this post will explore the implementation a bit further.

## Limitations

A limitation of our approach can be seen when the user legitimately wants duplicate types in their sum:

```
λ> let ssum = build "abc" :: SSum '[String, String]
ssum :: SSum '[String, String]
```

A type like `SSum '[String, String]`

is similar to `Either String String`

. The
user may wish to assign convention-based significance to the branches in their
sum, but with our approach, a value of type `SSum '[String, String]`

will have a
`String`

in the first slot and never the second:

```
λ> match @String ssum
Just "abc"
it :: Maybe String
λ> ssum
Left "abc"
it :: SSum '[String, String]
```

This may at first seem like a legitimate limitation, but we can solve for this
by tagging the contituent types. `Const a b`

is poly-kinded, so we can stick a
type of any kind in the `b`

slot, including type-level strings:

```
λ> :{
let ssum = build (Const @String @"Right" "abc")
:: SSum '[Const String "Left", Const String "Right"]
:}
ssum :: SSum '[Const String "Left", Const String "Right"]
λ> match @(Const String "Right") ssum
Just (Const "abc")
it :: Maybe (Const String "Right")
```

Another apparent limitation is that there’s no way for the user to express that
a branch is just the branch and no value. We can achieve something effectively
equivalent by using a similar means of tagging via `Proxy`

:

```
λ> let ssum = build (Proxy @"True") :: SSum '[Proxy "False", Proxy "True"]
ssum :: SSum '[Proxy "False", Proxy "True"]
λ> match @(Proxy "True") ssum
Just Proxy
it :: Maybe (Proxy "True")
```

These tagging approaches aren’t the most pleasant things to work with, but they are in line with keeping our implementation as simple and concise as possible.

## Exhaustive matching

The matching support in our implementation lets the user get at a possible value of a particular type “one type at a time”:

`match :: forall a as. (MatchSSum a as) => SSum as -> Maybe a`

There’s no compiler enforcement that the user checks every branch in the sum. There are likely a few different ways we could implement exhaustive matching. Here’s one approach that should look familiar to the work we’ve done up to this point:

```
ematch :: forall r as. (EMatchSSum r as) => SSum as -> Matchers r as -> r
SSum rep) matchers = ematch' (Proxy @as) (Proxy @r) matchers rep
ematch (
type EMatchSSum :: Type -> [Type] -> Constraint
class EMatchSSum r as where
ematch' :: Proxy as -> Proxy r -> Matchers r as -> SSumRep as -> r
instance EMatchSSum r '[] where
= f rep
ematch' _proxy1 _proxy2 f rep
instance (EMatchSSum r as) => EMatchSSum r (a' ': as) where
= either f (ematch' (Proxy @as) proxy2 rest)
ematch' _proxy1 proxy2 (f, rest)
type family Matchers (r :: Type) (as :: [Type]) :: Type where
Matchers r '[] = Void -> r
Matchers r (a ': as) = (a -> r, Matchers r as)
```

The truly new bit here is the `Matchers`

type function. It’s easiest to
understand this function by testing it (output formatted for readability):

```
λ> :kind! Matchers String '[Int, String, Char]
Matchers String '[Int, String, Char] :: *
= ( Int -> String
, ( String -> String
, ( Char -> String
, Void -> String
)
)
)
```

We already drew a direct parallel between `Either`

and our anonymous sum type,
and in `Matchers`

we now draw a parallel betwen `(,)`

and an anonymous product.
`Matchers`

is given a final return type and an input list of types, and computes
an anonymous product via nesting `(,)`

where the first element of every pair is
a function type from the specific type in our anonymous sum to the return type.
If the user hands us a value of type `Matchers r as`

and a value of type `Sum as`

, we have everything we need to produce a value of type `r`

.

Usage looks like this:

```
exhaustiveDemo :: SSum '[Int, String, Char] -> String
=
exhaustiveDemo ssum
ematch ssum-> show x
( \x -> s
, ( \s -> [c]
, ( \c
, absurd
)
) )
```

This version of the code is available at commit e412c964. This syntax will be improved upon in the next section.

## Syntax improvements

Early in the post, a hypothetical syntax for expressing the type of an anonymous
sum was shown using `(+)`

:

```
eatSnack :: Fruit + Candy -> IO ()
eatMeal :: [Meat + Grain + Vegetable] -> IO ()
```

We can add support for this syntax using another type function:

```
type family (+) (a :: Type) (as :: Type) :: Type where
+) a (SSum as) = SSum (a ': as)
(+) a b = SSum '[a, b]
(infixr +
```

Now we can use whatever style we prefer: `Int + String + Char`

and `SSum '[Int, String, Char]`

are equivalent types.

Separately, we can clean up the syntax for exhaustive matching seen in the previous section:

```
exhaustiveDemo :: SSum '[Int, String, Char] -> String
=
exhaustiveDemo ssum
ematch ssum-> show x
( \x -> s
, ( \s -> [c]
, ( \c
, absurd
)
) )
```

The tuple nesting here is fairly brutal. We won’t worry about hiding away this
anonymous-product-as-tuples representation as this post has gotten long enough
already, but we will make a more usable syntax. We’ll start by adding a synonym
for `(,)`

that we can apply infix (borrowing the name that PureScript uses):

```
/\) :: a -> b -> (a, b)
(/\) = (,)
(infixr /\
```

Next we’ll slightly tweak `ematch`

’s signature. It currently is this:

`ematch :: forall r as. (EMatchSSum r as) => SSum as -> Matchers r as -> r`

We’ll change it so that the anonymous sum value is handed to us as the first element of a tuple, and the anonymous product of matchers is handed to us in the second element:

`ematch :: forall r as. (EMatchSSum r as) => (SSum as, Matchers r as) -> r`

With these two pieces, we now have an (arguably) much more aesthetically-pleasing means to do exhaustive matching:

```
exhaustiveDemo2 :: SSum '[Int, String, Char] -> String
=
exhaustiveDemo2 ssum $ ssum
ematch /\ (\x -> show x)
/\ (\s -> s)
/\ (\c -> [c])
/\ absurd
```

This version of the code is available at commit 3abcaab1.

The computations done in each match are very simple here, but they’ve been kept
as lambdas for readability. If you’re like me and the parentheses get under the
skin a bit, we can (ab)use `do`

notation to get rid of them:

```
exhaustiveDemo3 :: SSum '[Int, String, Char] -> String
=
exhaustiveDemo3 ssum $ ssum
ematch /\ do
-> show x
\x /\ do
-> s
\s /\ do
-> [c]
\c /\ absurd
```

If the lambdas also come off as noisy, we can always (ab)use the `MonadReader r ((->) r)`

instance:

```
exhaustiveDemo4 :: SSum '[Int, String, Char] -> String
=
exhaustiveDemo4 ssum $ ssum
ematch /\ do
show
asks /\ do
ask/\ do
pure
asks /\ absurd
```

It’s probably worth considering a means of getting rid of the need for the user
to spell out `absurd`

to “close off” their anonymous product of matcher
functions, but that’s left as an exercise for the reader.

## End

Thanks for reading! I hope this post helps demystify anonymous sums. As hinted at earlier in the post, I don’t recommend using the exact representation demonstrated here in production: what’s shown in this post is meant purely for education. There are many libraries on Hackage adding support for anonymous sums that would be better candidates. Each library makes tradeoffs in the design space, so I recommend doing your own evaluation if you wind up needing to pull in an anonymous sum in prod.