A good tutorial to start with: http://adam.chlipala.net/cpdt/html/Universes.html

The way type theorists avoid the Russell’s paradox is to introduce the notion of “levels” of type universe like $\tau: Set : \mathcal{U}_0 : \mathcal{U}_1 : ...$ With ADT, it is natural for us to think about its constructors:

data Nat : Set where
	Z : Nat
	S : Nat -> Nat

For types inductively defined as the $Nat$ case, the type of $Nat$ should inhabit a “larger” universe than its constructors do, which would otherwise make no sense because it results in a universe containing itself, and it is bad (Agda, for some historical reason, chose Set for all type universes, so there would be Set_0, Set_1, ..., but this is just some naming flaw).

Let us see a more sophisticated case:

Set Printing Universes.

Inductive Expr : Type -> Type :=
	| Const : forall (T: Type), T -> Expr T
	| Pair  : forall (T1 T2: Type), Expr T1 -> Expr T2 -> Expr (T1 * T2) 
	| Eq    : forall T, Expr T -> Expr T -> Expr bool
.

Print Expr.
(* Inductive
	Expr
	: Type@{Expr.u0} ->
	Type@{max(Set,Expr.u0+1,Expr.u1+1,Expr.u2+1,Expr.u3+1)} :=
	  Const : forall T : Type@{Expr.u1}, T -> Expr T
	| Pair : forall (T1 : Type@{Expr.u2}) (T2 : Type@{Expr.u3}),
	Expr T1 -> Expr T2 -> Expr (T1 * T2)
	| Eq : forall T : Type@{Expr.u0}, Expr T -> Expr T -> Expr bool
*)

Coq automatically infers the level of each “type” for its constructors as well as the index type of the Expr type. The resulting type of Expr is the maximum among all its constructors. Obvisouly this constraint is satisfiable. Strict positivity is a property of recursive types that there are no self-references in its constructors' parameters' contravariant positions. Types like $t: (T \to T) \to T$ is not typable. It is simple if we assign level $L$ to $T$ and the level of $T$ should be at least $T + 1$ to avoid impredicativity but this makes the index type live at higher level.

Strict positivity enforces termination in a neat way given a language really checks this feature (Haskell does not enforce this so its types are technically not ADT).

In a more theoretic way…

In universal algebra a signature is a list of operations with their arities

$$ \sigma = \vec{(op_i, n_i)} $$

where $n_i \in \mathbb N$ is just the index of each operator. For example, the natural number can be defined as $(Z, 0), (S, 1)$. We define the set $T_\sigma$ of terms inductively as:

$op_i(t_1, \cdots, t_{n_i}) \in T_\sigma, i \le k$ and $t_1, \cdots, t_{n_i} \in T_\sigma$ for some constant $k$.

and we have an isomorphism $T_\sigma \simeq \sum_{j}^{i}T^{n_j}\sigma$. Let us bring in some category theory. First, we replace arities as numbers with arities as sets. We shall say that we have a map $t: N_i \to T\sigma$ where $N_i$ has $n_i$ elements as terms $t_1, \cdots, t_{n_i}$. Next, let us allow operations with parameters. Each symbol $op$ shall have a parameter set $S_i$ and an arity $N_i$:

$$ \sigma = \vec{(op_i, S_i, N_i)} $$

Screenshot 2024-04-05 at 12.28.20 PM.png

As a datatype, $T_\sigma$ is described as follows.

data T : Set where
  op_1 : S_1 × (N_1 → T) → T
  ...
  op_k : S_k × (N_k → T) → T

Actually, we could describe the final coalgebra with the exact same data, by writing something like

codata T : Set where
  op_1 : S_1 × (N_1 → T) → T
  ...
  op_k : S_k × (N_k → T) → T