Typing is Hard

Type Checking and Type Inference

Type checking is the process of taking a given program in some programming language and working out whether all variables and expressions have the correct types, i.e. strings are assigned to strings, arithmetic expressions involve only numbers, etc. Some languages also offer type inference, tasking the compiler with the task of figuring out correct types on its own. Depending on a language’s features the type checking and type inference problems range from trivial to undecidable.

Common terms


A type checker is complete if it can check every correctly typed program.


A type checker is sound if it only accepts correctly typed programs.


A decision problem is decidable if for any input we can compute whether the input satifies the problem in finite time. Examples of decidable problems include primality testing and boolean satisfiability. The halting problem for example is undecidable: We cannot check whether a program runs infinitely long in finite time. We are interested in the type checking and type inference problems for programming languages: Given some input program, does it type check? And given some program, which types should we assign to the expression such that it typechecks?

Hindley-Milner Type System

The Hindley-Milner (HM) type system is a type system for the simple typed lambda calculus with parametric polymorphism, which is used as a base for many functional languages. Type-checking for HM is decidable and efficient, its complexity is linear in the size of the program. Type inference is more difficult than type checking, since the compiler has to solve the type constraints incurred for expressions in the program. It is decidable for the HM type system, however the problem itself is PSPACE-hard and EXPTIME-complete, meaning that in the worst case it needs at least a polynomial amount of extra space and exponential time relative to the input size. Fortunately, the type inference algorithm is linear when the nesting depth of polymorphic variables is bounded, as is the case for most applications. There exist many type inference algorithms, the best known one is the so-called Algorithm W. Many functional programming languages implement variants of the HM type system.

Dependent types

In simple terms dependent types allow types to depend not only on other types, but on values. This is best understood by an example: Normally we can only encode very coarse information, such as “x is of integer type”. Dependent types allow us to define more detailed types. For example we could then create a type “even integer”, whose only inhabitants are even integers. This is strictly more powerful than the previous setting: Dependent types in general make type inference undecidable as can be shown by reduction to the Post Correspondence Problem.

How Hard is Type-Checking Your Favorite Language?

Below I’ve compiled a list of languages and how hard type checking/type inference is in these languages. If you find a mistake or are missing a language please file an issue with your language, its type checking complexity and optimally a resource that supports your claim. I do not claim completeness nor correctness of the properties shown here, it is mainly an amalgamation of blog posts I’ve been collecting.


undecidable, C++ Templates are Turing Complete, Todd L. Veldhuizen (2003), even its grammar is undecidable


unsound, undecidable, there is an excellent SO answer by Eric Lippert on this topic. Other fun things include a SAT solver using the C# type-checker


decidable, uses Hindley-Milner, but currently unsound, due to an interesting compiler bug: (String.length " ") ^ (-1) : Int


undecidable, GitHub user cannorin implemented the untyped lambda calculus in F#


decidable, since type inference is only used for variable initialization


1998 standard, no extensions: decidable, variant of HM

2010 standard, no extensions: decidable, restriction of System F

with sufficient1 extensions: undecidable, as type checking in System F is undecidable. There exist Turing Machines in Haskell types. A simpler variant is to implement the SKI calculus.


decidable, surprisingly. Idris has dependent types, but at compile time it will only evaluate expressions which it knows to be total (terminating and covering all inputs).


undecidable, because Java Generics are Turing complete. Java 5 or later is unsound, as shown by Amin and Tate (2015)


undecidable, since we can cause the type checker to loop infinitely


decidable, uses Hindley-Milner


undecidable, both type inference (since Rust has rank-3-types) and type checking, as shown by this Smallfuck interpreter implemented using traits.


undecidable, since it admits a type-level SKI calculus, unsound, as shown by Amin and Tate (2015). Scala 2.13.3 (newest as of writing this) also exhibits the same problem.


undecidable, as proven here by reduction to the word problem for finitely generated groups2


undecidable, unsound. The TypeScript documentation mentions its unsoundness and the motivations behind them.

Undecidability: TypeScript’s type system was proven Turing complete until they disallowed self-referential types. However Robbie Ostrow wrote a program checking the Collatz conjecture, and as the generalized form of the Collatz conjecture is undecidable3, the TypeScript type system is undecidable as well.


undecidable, since evaluation of recursive functions at compile time is possible, thus requiring the compiler to solve the halting problem.


Where are Python, Bash, etc.?

Type checking and type inference work primarily for statically typed languages. While there exist extensions to some dynamic languages imbuing them with static type checking these are not part of the language, and the complexity depends not on the language but the extension.

What about unsafe casts?

Some languages offer explicit unchecked casts, which are accepted by the type checker and may potentially fail at runtime. Examples are casting from Object to some subclass in C# and Java, casting values of type interface{} in Go or using unsafeCoerce in Haskell. I’ve chosen not to account for such casts/coercions since unchecked downcasting is an inherently unsafe operation not covered by type checker guarantees.

I’ve spotted a mistake/imprecision, what should I do?

Great work! Please report it on the official issue tracker detailing what is wrong and I will try to fix it as soon as possible.

  1. Using only RankNTypes suffices to stump inference. Interestingly enough this is only undecidable for N >= 3.

  2. I think a reduction to the PCP-problem is also possible.

  3. Thanks to reddit user /u/nckl for pointing that out.