Anonymous sums from scratch

Intro

Haskell has strong support for sum types, but has no built-in notion of anonymous sum types. We must explicitly name the sum type and uniquely label each of its branches:

data Result e a = Error e | Success a

base provides a handful of common sums too:

data Bool = False | True
data Maybe a = Nothing | Just a
data Either a b = Left a | Right b

Anonymous sums lift the requirement on type naming and branch labeling. Instead, we just express the different possible types the sum type wraps. Some hypothetical syntax:

eatSnack :: Fruit + Candy -> IO ()
eatMeal :: [Meat + Grain + Vegetable] -> IO ()

The + operator here delimits the possible types. A value of type Fruit + Candy will either be a Fruit or a Candy. Likewise, a value of type [Meat + Grain + Vegetable] is a list where each element is either a Meat, a Grain, or a Vegetable.

If we were to build these types as regular sums, they might look something like this:

data SnackItem = SnackItemFruit Fruit | SnackItemCandy Candy
data MealItem
  = MealItemMeat Meat
  | MealItemGrain Grain
  | MealItemVegetable Vegetable

There are plenty of existing libraries already offering anonymous sums, but the tricks the libraries use - often either in the name of flexibility or performance - can obscure understanding the core concepts. This post aims to demystify anonymous sums by building them from scratch using as simple of an encoding as possible. The goal for the code is to be educational: ease of understanding is valued above everything else here. Performance is not a goal. In the main portion of the post, we’ll build out anonymous sum support in just under 25 lines of Haskell.

Finding an encoding

Before we take a crack at implementing anonymous sums, we should spend some time thinking about how we might encode them at a high level. We need a means for supporting arbitrarily many branches and a means for performing some approximation of case analysis on each individual branch. We’re after simplicity here, so it’d be great if we can reuse whatever code we can from base.

To that end, let’s turn our attention back to Either:

data Either a b = Left a | Right b

This is certainly a useful type when we have two branches. We could even make the argument (ignoring ergonomics) that types like Maybe a and Bool are redundant in light of Either:

type Bool = Either () ()
type Maybe a = Either () a

We are free to plug in whatever types we want for the Either type constructor’s parameters, which means we can also nest more Eithers:

type MySum = Either Int (Either String Char)

While not the friendliest type in town, MySum expresses a new sum type with three branches:

branchInt :: MySum
branchInt = Left 42

branchString :: MySum
branchString = Right $ Left "abc"

branchChar :: MySum
branchChar = Right $ Right 'a'

foo :: MySum -> String
foo = \case
  Left x -> show x
  Right (Left s) -> s
  Right (Right c) -> [c]

We can keep nesting Eithers in this way to have sum types with arbitrarily many branches. The glaring pain point though is that the ergonomics are very rough - both in construction and in case analysis.

We could take a cue from this old haskell-cafe message and improve the ergonomics a bit with some operators:

type (:|:) a b = Either a b

(???) :: (a -> c) -> (b -> c) -> Either a b -> c
(???) = either

-- Note that this input type has the nested Either in the left branch rather
-- than the right, i.e. Either (Either Int String) Char.
bar :: Int :|: String :|: Char -> String
bar =
  (\x -> show x) ???
  (\s -> s)      ???
  (\c -> [c])

This gets us a very rough approximation of anonymous sums: we can support arbitrarily many branches and can perform case analysis on those branches. The unfortunate bit is that the nested Either representation is directly exposed to the user.

This representation will serve as great inspiration for our own implementation though, and hopefully it already should feel like anonymous sums in Haskell are well within reach. What remains should be a relatively light amount of type-level elbow grease.

Elbow grease

If we consider a type like Either Int (Either String Char), we can draw a parallel between nesting that Either String Char and recursion. We can think about the problem as if we’re starting from a list of the possible types in our sum - [Int, String, Char]. The first possible type is Int, so we slot that type in Either’s first type parameter, producing Either Int hole. We only have one other type parameter to work with (hole), but we have two additional possible types in our sum: String and Char. Rather than slotting in both those types in one fell swoop via Either String Char, we can recurse one step at a time. With that in mind, the next thing we slot in would be a Either String <hole>, giving us Either Int (Either String hole). We just have one more possible type in our sum - Char - so we use Char to plug the last hole, giving us Either Int (Either String Char).

Given that Haskell has support for recursive type functions, we can fairly directly capture the algorithm that nests Either’s for us automatically:

type family SSumRep (as :: [Type]) :: Type where
  SSumRep (a ': as) = Either a (SSumRep as)

SSumRep computes the representation of our simple sum encoding by walking its input list of types. The type at the head of the list goes in the left branch, and the nesting for the types in the tail of the list is computed in the right branch.

This computation is incomplete though: we need to handle the terminating case where the input list of types is empty:

type family SSumRep (as :: [Type]) :: Type where
  SSumRep '[] = ... what to put here? ...
  SSumRep (a ': as) = Either a (SSumRep as)

What type should we use for the deepest-nested Either’s right branch? We could use () or really any type we want, but as we’ll see soon, the guardrails we’ll define will prevent the user at compile-time from ever creating or matching on this branch in the first place. If that branch will never exist as a value at runtime, the perfect type for that slot is Void:

type family SSumRep (as :: [Type]) :: Type where
  SSumRep '[] = Void
  SSumRep (a ': as) = Either a (SSumRep as)

We can test our type function in GHCi:

λ> :kind! SSumRep '[]
SSumRep '[] :: *
= Void

λ> :kind! SSumRep '[Int]
SSumRep '[Int] :: *
= Either Int Void

λ> :kind! SSumRep '[Int, String]
SSumRep '[Int, String] :: *
= Either Int (Either [Char] Void)

λ> :kind! SSumRep '[Int, String, Char]
SSumRep '[Int, String, Char] :: *
= Either Int (Either [Char] (Either Char Void))

Now we can hide away the representation in a newtype:

type SSum :: [Type] -> Type
newtype SSum as = SSum { unSSum :: SSumRep as }

A benefit of using the Either representation under the hood is that we can newtype-derive the instances we want:

deriving newtype instance (Eq (SSumRep as)) => Eq (SSum as)
deriving newtype instance (Show (SSumRep as)) => Show (SSum as)

Note that newtype-deriving the Show instance exposes the Either representation, but with simplicity in mind for this post, we’ll ignore this. Deriving a custom Show instance is left as an exercise for the reader.

Building

We have our representation, and we have it hidden away in a newtype. Next we need a way to build SSum values. We want something with a type signature like this:

build :: a -> SSum as

But we need a means of expressing that type a is in the list of types as, so we’re missing a constraint. We’ll create that constraint by walking the list of types via typeclass instances, but first we’ll need to introduce the typeclass:

type BuildSSum :: Type -> [Type] -> Constraint
class BuildSSum a as where
  build' :: Proxy as -> a -> SSumRep as

build' aligns fairly closely with build above. The key differences are 1) there’s a Proxy as argument and 2) it operates directly over the sum representation rather than the newtype wrapper. The Proxy as argument is a bit of noise, but it lets us sidestep turning on -XAllowAmbigousTypes. It can otherwise be ignored.

For our instances, we’ll start with the case where the input list of types is empty:

instance (TypeError ('Text "BuildSSum: Impossible!")) => BuildSSum a '[] where
  build' = error "Unreachable"

Remember that SSumRep '[] computes the type Void and there’s no way to produce a value of type Void (ignoring undefined/error), so there’s nothing meaningful we can (or want to) do in this branch at the value level. We use TypeError from GHC.TypeLits as an instance constraint such that the user gets a compile-time error if they try to build an SSum '[] value.

The next instance we need covers the case when type a matches the type at the head of list as:

instance BuildSSum a (a ': as) where
  build' _proxy x = Left x

If a and the head of input list of types match, then we know that we’re at the right spot “positionally” in our nested Either representation to inject the value of type a. This idea becomes clearer when the last instance is in place. This one handles the recursion - both at the type-level by walking the input list of types and the value level by walking the nested representation:

instance {-# OVERLAPPABLE #-} (BuildSSum a as) => BuildSSum a (a' ': as) where
  build' _proxy x = Right $ build' (Proxy @as) x

The key point of this instance is that the type of the value we want to inject (a) does not match the head (a') of the input list, so we’re not at the right spot in the representation yet. We continue the recursion, but wrapped up in a Right. Another important point is the instance constraint, which is what allows the recursion in the first place.

The last piece is to implement our build function. This would be the function we’d expose to users:

build :: forall a as. (BuildSSum a as) => a -> SSum as
build x = SSum $ build' (Proxy @as) x

Let’s gain some confidence it’s doing the right thing (remember that the Show instance exposes the representation, but that’s a positive here in GHCi for testing’s sake):

λ> build @Int 42 :: SSum '[Int, String, Char]
Left 42

λ> build "abc" :: SSum '[Int, String, Char]
Right (Left "abc")

λ> build 'a' :: SSum '[Int, String, Char]
Right (Right (Left 'a'))

λ> build 'a' :: SSum '[]
<interactive>:12:1: error:
    • BuildSSum: Impossible!
    • In the expression: build @String "abc" :: SSum '[]
      In an equation for ‘it’: it = build @String "abc" :: SSum '[]

Case analysis

Now that we can construct anonymous sum values, next up is destructing them via (an approximation of) case analysis. We’ll shoot for a function with a type along the lines of this:

match :: SSum as -> Maybe a

This function says that if the input sum is the branch containing the value of type a, then it’ll give us that value. Otherwise, it’ll give us Nothing. Like before, we lack a constraint to relate as and a. The new typeclass and instances we’ll need should look fairly familiar to BuildSSum (again, the proxies are largely noise to sidestep turning on -XAllowAmbigousTypes):

type MatchSSum :: Type -> [Type] -> Constraint
class MatchSSum a as where
  match' :: Proxy as -> Proxy a -> SSumRep as -> Maybe a

instance (TypeError ('Text "MatchSSum: Impossible!")) => MatchSSum a '[] where
  match' _proxy1 _proxy2 _rep = error "Unreachable"

instance MatchSSum a (a ': as) where
  match' _proxy1 _proxy2 = either Just (const Nothing)

instance {-# OVERLAPPABLE #-} (MatchSSum a as) => MatchSSum a (a' ': as) where
  match' _proxy1 proxy2 = either (const Nothing) (match' (Proxy @as) proxy2)

The MatchSSum a '[] instance is for the case where the input list of types is empty. We once again outlaw this case at compile-time.

The MatchSSum a (a ': as) instance captures when we have a type match between our input type a and the head of the input list of types as. So long as users do not subvert the use of build when creating values by constructing the representation manually, there’s no way for the types to match and the representation to not be Left x :: Either a (SSumRep as) This is why it’s fine to ignore the possibility of a right branch via const Nothing.

Similiarly, in the MatchSSum a (a' ': as) instance, we have a type mismatch between the input type a and the head of the input list of types. We know our representation can’t be a left branch here, so we ignore it via const Nothing and continue the recursion down the right branch.

We now have the appropriate relation for match’s type variables:

match :: forall a as. (MatchSSum a as) => SSum as -> Maybe a
match (SSum rep) = match' (Proxy @as) (Proxy @a) rep

Let’s try it out in GHCi:

λ> let ssum = build @Int 42 :: SSum '[Int, String, Char]
ssum :: SSum '[Int, String, Char]

λ> match @Int ssum
Just 42
it :: Maybe Int

λ> match @String ssum
Nothing
it :: Maybe String

λ> match @Char ssum
Nothing
it :: Maybe Char

What about casing over every branch? While the machinery we’ve defined does not support exhaustive matching, we can still get a rough approximation of full case analysis using functions straight from base:

casingDemo :: SSum '[Int, String, Char] -> IO ()
casingDemo ssum =
  fold $ asum
    [ match @Int ssum >>= \x ->
        Just $ print $ 2 * x
    , match @String ssum >>= \s ->
        Just $ print s
    , match @Char ssum >>= \c ->
        Just $ print $ replicate 5 c
    ]

asum here has type [Maybe (IO ())] -> Maybe (IO ()) and selects the first successful match in the list. fold here is Maybe (IO ()) -> IO (). The resulting IO () is a is a no-op if there was no match, otherwise, the branch’s action is performed. The compiler does not enforce that all branches are handled, but in the pursuit of keeping things simple, having some syntax stitched together from existing functions is a reasonable compromise.

We now have the pieces we need to declare victory: we have a representation for anonymous sums hidden away from the user, we can build sum values, and match on those values. The implementation clocks in at 24 lines of Haskell, not counting the import preamble, the newtype-deriving lines, and the 3 kind signatures included for documentation’s sake. Counting everything, including whitespace, it comes in at just under 50 lines.

The complete code is available in the repo at commit 0568906f. The rest of this post will explore the implementation a bit further.

Limitations

A limitation of our approach can be seen when the user legitimately wants duplicate types in their sum:

λ> let ssum = build "abc" :: SSum '[String, String]
ssum :: SSum '[String, String]

A type like SSum '[String, String] is similar to Either String String. The user may wish to assign convention-based significance to the branches in their sum, but with our approach, a value of type SSum '[String, String] will have a String in the first slot and never the second:

λ> match @String ssum
Just "abc"
it :: Maybe String

λ> ssum
Left "abc"
it :: SSum '[String, String]

This may at first seem like a legitimate limitation, but we can solve for this by tagging the contituent types. Const a b is poly-kinded, so we can stick a type of any kind in the b slot, including type-level strings:

λ> :{
let ssum = build (Const @String @"Right" "abc")
             :: SSum '[Const String "Left", Const String "Right"]
:}
ssum :: SSum '[Const String "Left", Const String "Right"]

λ> match @(Const String "Right") ssum
Just (Const "abc")
it :: Maybe (Const String "Right")

Another apparent limitation is that there’s no way for the user to express that a branch is just the branch and no value. We can achieve something effectively equivalent by using a similar means of tagging via Proxy:

λ> let ssum = build (Proxy @"True") :: SSum '[Proxy "False", Proxy "True"]
ssum :: SSum '[Proxy "False", Proxy "True"]

λ> match @(Proxy "True") ssum
Just Proxy
it :: Maybe (Proxy "True")

These tagging approaches aren’t the most pleasant things to work with, but they are in line with keeping our implementation as simple and concise as possible.

Exhaustive matching

The matching support in our implementation lets the user get at a possible value of a particular type “one type at a time”:

match :: forall a as. (MatchSSum a as) => SSum as -> Maybe a

There’s no compiler enforcement that the user checks every branch in the sum. There are likely a few different ways we could implement exhaustive matching. Here’s one approach that should look familiar to the work we’ve done up to this point:

ematch :: forall r as. (EMatchSSum r as) => SSum as -> Matchers r as -> r
ematch (SSum rep) matchers = ematch' (Proxy @as) (Proxy @r) matchers rep

type EMatchSSum :: Type -> [Type] -> Constraint
class EMatchSSum r as where
  ematch' :: Proxy as -> Proxy r -> Matchers r as -> SSumRep as -> r

instance EMatchSSum r '[] where
  ematch' _proxy1 _proxy2 f rep = f rep

instance (EMatchSSum r as) => EMatchSSum r (a' ': as) where
  ematch' _proxy1 proxy2 (f, rest) = either f (ematch' (Proxy @as) proxy2 rest)

type family Matchers (r :: Type) (as :: [Type]) :: Type where
  Matchers r '[] = Void -> r
  Matchers r (a ': as) = (a -> r, Matchers r as)

The truly new bit here is the Matchers type function. It’s easiest to understand this function by testing it (output formatted for readability):

λ> :kind! Matchers String '[Int, String, Char]
Matchers String '[Int, String, Char] :: *
= ( Int -> String
  , ( String -> String
    , ( Char -> String
      , Void -> String
      )
    )
  )

We already drew a direct parallel between Either and our anonymous sum type, and in Matchers we now draw a parallel betwen (,) and an anonymous product. Matchers is given a final return type and an input list of types, and computes an anonymous product via nesting (,) where the first element of every pair is a function type from the specific type in our anonymous sum to the return type. If the user hands us a value of type Matchers r as and a value of type Sum as, we have everything we need to produce a value of type r.

Usage looks like this:

exhaustiveDemo :: SSum '[Int, String, Char] -> String
exhaustiveDemo ssum =
  ematch ssum
    ( \x -> show x
    , ( \s -> s
      , ( \c -> [c]
        , absurd
        )
      )
    )

This version of the code is available at commit e412c964. This syntax will be improved upon in the next section.

Syntax improvements

Early in the post, a hypothetical syntax for expressing the type of an anonymous sum was shown using (+):

eatSnack :: Fruit + Candy -> IO ()
eatMeal :: [Meat + Grain + Vegetable] -> IO ()

We can add support for this syntax using another type function:

type family (+) (a :: Type) (as :: Type) :: Type where
  (+) a (SSum as) = SSum (a ': as)
  (+) a b = SSum '[a, b]
infixr +

Now we can use whatever style we prefer: Int + String + Char and SSum '[Int, String, Char] are equivalent types.

Separately, we can clean up the syntax for exhaustive matching seen in the previous section:

exhaustiveDemo :: SSum '[Int, String, Char] -> String
exhaustiveDemo ssum =
  ematch ssum
    ( \x -> show x
    , ( \s -> s
      , ( \c -> [c]
        , absurd
        )
      )
    )

The tuple nesting here is fairly brutal. We won’t worry about hiding away this anonymous-product-as-tuples representation as this post has gotten long enough already, but we will make a more usable syntax. We’ll start by adding a synonym for (,) that we can apply infix (borrowing the name that PureScript uses):

(/\) :: a -> b -> (a, b)
(/\) = (,)
infixr /\

Next we’ll slightly tweak ematch’s signature. It currently is this:

ematch :: forall r as. (EMatchSSum r as) => SSum as -> Matchers r as -> r

We’ll change it so that the anonymous sum value is handed to us as the first element of a tuple, and the anonymous product of matchers is handed to us in the second element:

ematch :: forall r as. (EMatchSSum r as) => (SSum as, Matchers r as) -> r

With these two pieces, we now have an (arguably) much more aesthetically-pleasing means to do exhaustive matching:

exhaustiveDemo2 :: SSum '[Int, String, Char] -> String
exhaustiveDemo2 ssum =
  ematch $ ssum
    /\ (\x -> show x)
    /\ (\s -> s)
    /\ (\c -> [c])
    /\ absurd

This version of the code is available at commit 3abcaab1.

The computations done in each match are very simple here, but they’ve been kept as lambdas for readability. If you’re like me and the parentheses get under the skin a bit, we can (ab)use do notation to get rid of them:

exhaustiveDemo3 :: SSum '[Int, String, Char] -> String
exhaustiveDemo3 ssum =
  ematch $ ssum
    /\ do
         \x -> show x
    /\ do
         \s -> s
    /\ do
         \c -> [c]
    /\ absurd

If the lambdas also come off as noisy, we can always (ab)use the MonadReader r ((->) r) instance:

exhaustiveDemo4 :: SSum '[Int, String, Char] -> String
exhaustiveDemo4 ssum =
  ematch $ ssum
    /\ do
         asks show
    /\ do
         ask
    /\ do
         asks pure
    /\ absurd

It’s probably worth considering a means of getting rid of the need for the user to spell out absurd to “close off” their anonymous product of matcher functions, but that’s left as an exercise for the reader.

End

Thanks for reading! I hope this post helps demystify anonymous sums. As hinted at earlier in the post, I don’t recommend using the exact representation demonstrated here in production: what’s shown in this post is meant purely for education. There are many libraries on Hackage adding support for anonymous sums that would be better candidates. Each library makes tradeoffs in the design space, so I recommend doing your own evaluation if you wind up needing to pull in an anonymous sum in prod.