Outline
- Observability in digital systems
- Observability in principal-agent problems
In this post, we’re going to talk about how certain common failure modes of principal-agent contexts arise only to a very limited degree when the agent is a digital one.
We’ll start by describing what makes digital systems remarkable, especially in the context of intelligent systems. Then we’ll describe the principal-agent problem and see how its failure mechanisms line up directly with the defining characteristic of digital systems.
Digital Abstraction and Environmental Isolation
What makes a system digital? Typical answers to this question will say something like “Digital systems perform operations on discrete domains instead of continuous operations on continuous domains.” This is true, but it misses the bigger picture of what properties a digital system is designed to satisfy, which lead to the discrete design.
The bigger picture that we will attempt to illustrate in this section is that a digital system is an attempt to realize a pure, abstract system–the kind that mathematicians reason about in their heads–in the real physical world. The major difficulty here is to construct a system whose dynamics are properly isolated from the vagaries of the environment in which the system must necessarily be embedded. Another way of putting this is that the system should be closed to influence or informational leakage from the environment. This section introduces a modeling framework and presents some stylized arguments for why environmental isolation tends to imply discretization. Readers less interested in mathematics can skip to the next section where we apply this perspective to look at what makes digital intelligence so novel relative to the forms of intelligence that we’ve known heretofore.
Our discussion will likely be applicable to various abstract systems, but we’ll be particularly interested in computational systems, which we model as deterministic state machines. Such a system consists of a state space, $L$, and a state transition function, $\tau_L : L \to L$, which takes a current state $\ell_t$ and maps it to the next state $\ell_{t+1}$. (A digital system will want $L$ to be a finite/discrete domain, but we’ll start continuous and see what happens.)
What does it mean for our abstract system to be implemented in the world? Let $U$ be a set representing the possible states of the world, and let $\tau : U \to U$ be the world’s state transition function.
We’ll say that the world realizes an abstract state machine $\tau_L$ if there exists a mapping $m : U \to L$ such that the following commutation relationship holds for all $u \in U$:
$$m(\tau(u)) = \tau_L(m(u))$$
This relationship implies a kind of causal closure; any information about the world which is thrown away by $m$ cannot later influence $\ell$. To get a better picture of what this means, let’s assume that $U$ has a product structure, $U = S \times E$, where $S$ represents the state of the system and $E$ represents its environment, and that $m$ only depends on the system state, i.e., $m = m_S \circ \pi_S$, where $\pi_S$ is a projector from $U$ onto $S$. Our digital abstraction relationship says that the environment cannot causally affect the behavior of the system, since $m(\tau(s,e)) = \tau_L(m_S(s))$ for all $e \in E$.
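To make this concrete, here is a minimal toy sketch in Python (the 4-state counter, the voltage-like system state, and the 0.01 nudge are all invented for illustration): the commutation condition holds for every environment value because the readout map $m$ discards the environmental nudge.

```python
def tau_L(l: int) -> int:
    """Abstract transition: increment mod 4."""
    return (l + 1) % 4

def tau(s: float, e: float) -> tuple[float, float]:
    """World transition: advance the system state by ~1 unit;
    the environment nudges it by 0.01 * e (small for the e values below)."""
    return (s + 1.0 + 0.01 * e, e)

def m(s: float, e: float) -> int:
    """Realization map: reads only the system state, rounds to a level."""
    return round(s) % 4

# Commutation check: m(tau(u)) == tau_L(m(u)) for every environment state.
for e in (-1.0, 0.0, 1.0):
    s = 2.0
    assert m(*tau(s, e)) == tau_L(m(s, e))
print("commutation holds")
```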
We can make a stylized argument to show why it isn’t possible to have this kind of causal closure when the abstract system is continuous.
Let $\tau_S = \pi_S \circ \tau$ be the projection of $\tau$’s output onto $S$. We’ll suppose $\tau_S$ decomposes into two sequential steps:
$$\tau_S = \tau_2 \circ \tau_1$$

where $\tau_1 : S \times E \to S$ captures any coupling from the environment, and $\tau_2 : S \to S$ captures the internal system update. We’ll also model $S$ and $L$ as metric spaces and assume that there exist $\epsilon_2 > \epsilon_1 > 0$ such that for any $s \in S$, the set $\{\tau_1(s,e) : e \in E\}$ contains a ball of radius $\epsilon_1$ centered at $s$ and is contained in a ball of radius $\epsilon_2$ centered at $s$. This essentially means that the environmental effect on the system state is isotropic; environmental effects can move the system state in any direction by any amount up to $\epsilon_1$, but never by more than $\epsilon_2$.
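As a concrete instance of this assumption (my own toy example, not a specific physical model), identify the environment’s influence with a bounded additive kick $k(e)$:

$$\tau_1(s, e) = s + k(e), \qquad \{k(e) : e \in E\} = \{v \in \mathbb{R}^n : \|v\| \le \epsilon_2\},$$

so that $\{\tau_1(s,e) : e \in E\}$ is exactly the closed ball of radius $\epsilon_2$ centered at $s$, which in particular contains the ball of radius $\epsilon_1$ for any $\epsilon_1 < \epsilon_2$.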
If $m_S \circ \tau_2$ is a continuous, non-constant function, then our abstraction breaks. We just need to find an $s$ and $s'$ with $\|s - s'\| < \epsilon_1$ where $m_S \circ \tau_2(s) \neq m_S \circ \tau_2(s')$ (these will exist since the composition is continuous and non-constant, assuming $S$ is connected). By the isotropy assumption, we can then choose $e$ and $e'$ such that $\tau_1(s,e) = s$ and $\tau_1(s,e') = s'$, which gives $m(\tau(s,e)) = m_S \circ \tau_2(s) \neq m_S \circ \tau_2(s') = m(\tau(s,e'))$. The abstract successor state depends on the environment, violating the commutation condition.
On the other hand, we can preserve the abstraction if we are willing to make $\tau_S$ discrete. We can define a mesh $G \subset S$ with spacing $\delta > 2\epsilon_2$, and let $r : S \to G$ be the nearest-point rounding function, which maps each point to its closest mesh point. We can then define a discretized system update:
$$\tau'_2 = r \circ \tau_2 \circ r$$

The rounding on the right erases any environmental perturbation stemming from $\tau_1$, assuming that $s$ started on a mesh point. The rounding on the left ensures that the system state ends on a mesh point after the internal update, so that the commutation condition is preserved as an inductive invariant. Having made $\tau_S$ discrete, our abstraction is effectively discrete as well, since $m_S$ only sees values on the mesh $G$.
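Here is a small numerical sketch of this mechanism (the constants and dynamics are invented): without rounding, a sub-$\epsilon_2$ environmental kick changes the abstract readout, while the discretized update $\tau'_2 = r \circ \tau_2 \circ r$ yields the same readout for every environment.

```python
DELTA = 1.0  # mesh spacing; chosen so that DELTA > 2 * EPS2
EPS2 = 0.2   # bound on the environmental perturbation

def r(s: float) -> float:
    """Nearest-point rounding onto the mesh G = DELTA * Z."""
    return DELTA * round(s / DELTA)

def tau_1(s: float, e: float) -> float:
    """Environmental coupling: a kick bounded by EPS2."""
    return s + max(-EPS2, min(EPS2, e))

def tau_2(s: float) -> float:
    """Internal update: some continuous dynamics (invented)."""
    return 2.0 * s + DELTA / 3.0

def m_S(s: float) -> float:
    """Abstraction readout: the nearest mesh point."""
    return r(s)

s0 = r(3.0)  # start on a mesh point
envs = (-0.2, 0.0, 0.2)

# Analog update: the abstract readout depends on the environment.
analog = {m_S(tau_2(tau_1(s0, e))) for e in envs}
# Discretized update tau'_2 = r . tau_2 . r: the kick is erased.
digital = {r(tau_2(r(tau_1(s0, e)))) for e in envs}

print(analog)   # {6.0, 7.0} -- more than one value: the abstraction leaks
print(digital)  # {6.0}      -- a single value: environmentally isolated
```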
In practice, this version of $\tau_S$ is unphysical. We can’t snap the physical state to a specific point on a mesh. But we can create potential wells which form basins of attraction to combat environmentally induced drift. Each full basin is then mapped by $m_S$ to a single point in the discrete abstraction space. (This is essentially what bistable circuit elements like flip-flops do: two stable voltage levels act as basins separated by a noise margin.) The physical system can then implement the abstracted logic as long as it can manage to move the state into the basin corresponding to the target abstract state.
To recap: The goal of this development is to shift the way that we think about digital systems from “systems with discrete domains” to “systems with a shocking level of environmental isolation, which use discretization to achieve this.”
Informational and Computational Closure
Our formulation is a bit too restrictive to describe real systems, which may in fact have some form of interaction with the environment.
However, we can replace the concept of environmental isolation with a related and equally powerful idea: informational closure. Informational closure says that the abstract model can only depend on a narrow information channel from the environment; specifically, one which transmits information at a finite and bounded rate.
We can model this as follows: Let $i \in I$ be an input symbol that the abstract model is allowed to receive from the environment, selected by a channel $p : E \to I$. The abstract model now has the form $\tau_L : L \times I \to L$, and the generalized commutation condition is:
$$m(\tau(s,e)) = \tau_L(m_S(s),\, p(e))$$
The important sense in which I want to say that this is a closure is the following:
- For an informationally closed / digital system, it is possible to know or certify that the system’s behavior was only influenced by the information contained in $I$.
- In contrast, for an informationally open system, it is generally impossible to fully identify the information that influenced the behavior of the system.
We can identify two somewhat distinct reasons why most systems which we might try to abstract are not informationally closed:
- The size of “sensory” information channels from the environment can’t be easily bounded.
- Information leaks into the model via model underapproximation.
The first point reflects the fact that many abstract systems describe systems that purposefully couple to their environment via sensory channels. Because these are often analog or continuous channels, we would usually estimate the “true” information by looking at the signal-to-noise ratio and calculating the finite bit rate of the effective information channel. But for the purposes of our modeling, this is only valid if we know that the abstract system in question is not actually influenced by the noise, and it’s not at all obvious that this is true.
This points to a deeper issue, which is captured by the second point. Most analog systems will be affected by random noise–not just noise contained in explicit sensory information channels, but also noise in the actual operation of the system.
We can usually think of this noise as a kind of underapproximation error. Usually, our models of systems are coarse-grainings of smaller-scale dynamics (e.g., cellular biology coarse-grains molecular and atomic physics, which in turn coarse-grains various field theories, and so on). In such cases, dynamics from the lower scale tend to feed up into the higher scale in the form of noise. Provided that the system is not insulated against such noise by some kind of digital error correction, it is difficult to quantify the amount of information that leaks into the system via such noise.
This example can help us motivate the dual side of informational closure, which is computational closure. For a given abstraction, informational closure implies computational closure: There cannot be computations which both matter for the behavior of the abstract model and which are not part of the abstract model, except to the extent that the outputs of these computations are contained in the bounded environmental information channel.
This takes us back to our original claim that digital computation is a way of realizing a pure, abstract system in the real world. A common adage is that all models are wrong but some are useful. Many models are obviously nothing more than models (though it’s not uncommon for people to forget this in certain contexts). But informationally closed abstractions blur the line greatly. It’s not uncommon, or terribly invalid, to treat digital systems as an exception to the rule. Digital abstractions can be correct models of the thing they are trying to model.
Informational Closure and Verifiability
Above, we discussed the fact that informational closure allows us to certify a statement such as the following:
- I put a digital system into state $\ell_0$ and ran it for $T$ steps (perhaps with information inputs $(i_t)_{t=0}^{T-1}$) and it reached state $\ell_T$ without using any information not contained in these inputs.
I can certify this in the sense that if I know that the system satisfies my abstract model, then I can know that it is true. But what if I want to allow someone else to trustlessly verify my claim?
It would be nice if I could just ask them to instantiate their own copy of the digital system, run it with the specified inputs, and check that it reaches the claimed outcome. Our formalism isn’t rich enough to say that this should be possible–we haven’t said that $i$ should be something that we can control or what that would even mean. But, as we all know, this is something that we can do with the digital systems that we all use.
A basic reason why this is possible with digital systems is that the informational content of the model, the model state, and the model inputs are all closed/finite. This makes possible procedures for reliably and exactly transferring this finite quantity of information and transforming it into a working implementation. On the other hand, if we tried to perform verification in this manner for an informationally open system (like a human), we would obviously run into horrible difficulty.
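As a sketch of what this replay check looks like (the mod-7 accumulator below is an invented toy, and real systems would add cryptographic commitments on top of bare re-execution):

```python
from typing import Callable, Sequence, TypeVar

L = TypeVar("L")
I = TypeVar("I")

def replay_verify(tau_L: Callable[[L, I], L], l0: L,
                  inputs: Sequence[I], claimed_lT: L) -> bool:
    """Re-run the abstract machine on the claimed inputs and compare.

    This works only because l0, the inputs, and tau_L are finite,
    exactly copyable objects; informational closure is what makes the
    run reproducible by a third party.
    """
    l = l0
    for i in inputs:
        l = tau_L(l, i)
    return l == claimed_lT

# Toy usage: a mod-7 accumulator.
step = lambda l, i: (l + i) % 7
print(replay_verify(step, 0, [3, 5, 6], 0))  # (3 + 5 + 6) % 7 == 0 -> True
```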
Informational Closure and Intelligence
Up until now, there has never been a system which is both informationally closed and highly intelligent.
Humans are intelligent and not digital. Computers are digital but heretofore, not very intelligent. Now, we seem to have something that is both: Intelligence implemented on a digital computer. What do we make of this?
It would be surprising to me if the fact of informational closure weren’t transformative for the way that we reason about an intelligence in certain contexts. On the other hand, I haven’t seen much discussion of the observation that the AI we have wrought happens to be digital.
I think that the reason for this is that, while the digital nature of AI certainly helps with much of the reasoning that we want to do about AI, there are many questions that still feel very difficult or intractable; basic questions of alignment are among them.
On the other hand, if we restrict our focus to human-level (as opposed to vastly superhuman level) AI, and take a ceteris paribus stance on alignment questions–noting that alignment isn’t a fully solved problem for humans either–we might find that informational closure in itself is a highly useful and perhaps revolutionary property for an intelligence to have in certain contexts. The remainder of this post addresses this.
Informational Closure in Game Theory
Many game-theoretic dilemmas are affected by the problem of information asymmetry, where the information available to individual participants is unequal. Private information easily translates into hidden actions, where a given participant is the only one with complete information about their own actions.
Often the existence of information asymmetry prevents the game from reaching a desirable equilibrium. Two important examples of such breakdown are the problem of Moral Hazard in principal-agent contexts and the problem of Adverse Selection in market contexts. We’ll focus on moral hazard for now, with a separate article devoted to the market problem.
The principal-agent problem concerns the situation of a principal who wishes to have an agent act on their behalf and in their best interests; the “problem” is that, in the presence of conflicts of interest and the absence of additional incentives or structure, the agent will tend to pursue their own interests instead of those of the principal. To address the principal-agent problem is then to find the additional incentives or structure that will align the agent’s interests with the principal’s, or otherwise induce them to act in the principal’s interests.
There are a few known ways to achieve the alignment required here:
Equity Sharing. When the principal’s interests consist of the cultivation of some material good or profit, a degree of alignment can be achieved by giving the agent a share in this outcome. The alignment here isn’t perfect, due in part to differences of magnitude. But equity sharing is still one of the best tools.
Enforceable Contracts. The bigger problem with equity sharing is that there is often no such material good that can be identified with the principal’s interest. In these cases, the agent can enter into a contract to adhere to certain standards in how it services the principal.
Very often the material content of such an agreement involves forbidden actions and sources of information:
- A public judge is prohibited from accepting bribes, or making decisions based on protected features.
- A government acquisitions officer is prohibited from accepting kickbacks from contractors.
- A consultant may be forbidden from sharing or acting on private information from one client while working with another.
From Monitoring to Verifiable Certification
In all of these situations, the contract is enforceable only to the extent that these actions can be detected. In practice, this is a very difficult monitoring problem and immense societal resources are expended in institutions that serve to detect such failures and police against them.
When we talk about detection here, what we’re talking about is the process of taking a natural information asymmetry between a human principal and a human agent and trying to patch it up. This is difficult because information asymmetries among humans are naturally quite large. It’s worth noting that part of the reason these asymmetries are so large is that organic intelligence more or less evolved to enable deep entanglement with its local informational environment.
Now, the punch line: what if, for the first time in history, the agent is not an intelligent human but an intelligent digital system? If we make this substitution, the information gap suddenly plummets to nothing.
As we’ve seen above, this enables a qualitatively different form of contract enforcement, which is not based on monitoring but on certification. Unlike a human agent, a digital agent can produce a verifiable certificate consisting of the following items:
- The set of informational inputs which were used in making a set of decisions.
- The procedure or algorithm which was used in making the decision (for example, this may be a deterministic implementation of LLM inference).
- The outcome of the decision.
In this case, ruling out a violation such as bribery is as simple as:
- Checking that no information about bribery is contained in the informational inputs.
- Checking that the decision procedure properly maps the given inputs to the given outputs. (Note: we can often reduce the verification burden for observers using formal verification techniques and notions of proof based on cryptographic assumptions, such as ZK proofs.)
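A stylized sketch of such a certificate and its check (the structure, the field names, and the keyword filter are all invented here; a real deployment would lean on the formal-verification and ZK machinery noted above rather than string matching):

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class Certificate:
    inputs: Sequence[str]                   # all informational inputs used
    decide: Callable[[Sequence[str]], str]  # the deterministic procedure
    outcome: str                            # the claimed decision

def verify(cert: Certificate, forbidden: Sequence[str]) -> bool:
    # 1. No forbidden information appears among the inputs.
    if any(word in text for text in cert.inputs for word in forbidden):
        return False
    # 2. The procedure really maps these inputs to the claimed outcome.
    return cert.decide(cert.inputs) == cert.outcome

# Toy usage: an acquisitions decision over two bids.
cert = Certificate(
    inputs=["bid A: $10M, meets spec", "bid B: $12M, meets spec"],
    decide=lambda bids: min(bids),  # invented rule: lexicographically first
    outcome="bid A: $10M, meets spec",
)
print(verify(cert, forbidden=["bribe", "kickback"]))  # True
```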
One might quip here that the first problem–checking that no information about bribery has been slipped into a set of informational inputs–could be a difficult one. But it’s not difficult to see how the same ideas that we’re playing with here are part of the solution to that general problem. The nature of digital intelligence means that the very process of constructing the intelligent agent is also informationally closed, and so its informational inputs (e.g. the training procedure, training data, etc.) can be verified by a separate entity.