Added details for state abstraction
adityam committed Jul 21, 2024
1 parent c278796 commit 5d9171f
199 changes: 191 additions & 8 deletions approx-mdps/model-approximation.qmd
@@ -133,7 +133,7 @@ This gives the second bound.


Similar to the above, we can also bound the difference between the optimal
value functions of the true and approximate models.

:::{#prp-value-error}
#### Value error
@@ -149,7 +149,7 @@ M$ and $\widehat {\ALPHABET M}$, respectively. Then,
:::{.callout-note collapse="true"}
#### Proof {-}
The proof argument is almost the same as that of
@prp-policy-error. The first bound is obtained as follows:
\begin{align}
\| V^{*} - \hat V^{*} \|_∞
&=
@@ -225,7 +225,7 @@ $\MISMATCH^{\hat π^*} \hat V^*$ and $\MISMATCH^*
$$
α \le \frac{2}{(1-γ)} \MISMATCH^{\max} \hat V^*.
$$
:::

In some applications, it is useful to have a bound on model approximation error that depends on $V^*$ rather than $\hat V^*$. We provide such a bound below.

@@ -536,24 +536,207 @@ This bound precisely quantifies the engineering intuition that certainty equival

In the analysis above, we assumed that the model $\ALPHABET M$ and the approximate model $\widehat {\ALPHABET M}$ were defined on the same state space. Suppose that is not the case and the approximate model $\widehat {\ALPHABET M}$ is defined on a different state space $\hat {\ALPHABET S}$, i.e., $\widehat {\ALPHABET M} = (\hat {\ALPHABET S}, \ALPHABET A, \hat P, \hat c, γ)$.

In addition, suppose we are given a surjective function $φ \colon \ALPHABET S \to \hat {\ALPHABET S}$. We can use $φ$ to **lift** any function $\hat f$ defined on $\hat {\ALPHABET S}$ to a function defined on $\ALPHABET S$ given by $f = \hat f \circ φ$, i.e.,
$$f(s) = \hat f(φ(s)), \quad \forall s \in \ALPHABET S.$$
We can also use $φ$ to **project** any function $f$ defined on $\ALPHABET S$ to a function defined on $\hat {\ALPHABET S}$. For this, we assume that we are given a measure $ν$ on $\ALPHABET S$ with the following property:
$$
ν(φ^{-1}(\hat s)) = 1, \quad \forall \hat s \in \hat {\ALPHABET S}.
$$
For instance, when $\ALPHABET S$ is finite, $ν$ may be taken to be the uniform distribution on each set $φ^{-1}(\hat s)$. In the sequel, we assume that the measure $ν$ is fixed and suppress it from the notation. Given $ν$ and a function $f$ defined on $\ALPHABET S$, define
$$ \def\SQ{\mathbin{\square}}
(f \SQ φ)(\hat s) = \int_{s \in φ^{-1}(\hat s)} f(s) ν(ds),
\quad \forall \hat s \in \hat {\ALPHABET S}.
$$
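
To make the lifting and projection operations concrete, here is a minimal Python sketch for a finite state space. The aggregation map `phi`, the uniform choice of $ν$ on each block, and all other names are illustrative assumptions, not part of the development above.

```python
import numpy as np

# Hypothetical finite example: |S| = 6 ground states aggregated into |S_hat| = 3 abstract states.
phi = np.array([0, 0, 1, 1, 2, 2])      # phi : S -> S_hat (surjective)
nS, nShat = phi.size, phi.max() + 1

# A weighting measure nu on S with nu(phi^{-1}(s_hat)) = 1 for every s_hat;
# here nu is taken uniform on each block phi^{-1}(s_hat).
nu = np.ones(nS)
for k in range(nShat):
    nu[phi == k] /= (phi == k).sum()

def lift(f_hat):
    """Lift f_hat : S_hat -> R to f = f_hat ∘ phi on S."""
    return f_hat[phi]

def project(f):
    """Project f : S -> R to (f □ phi)(s_hat) = sum_{s in phi^{-1}(s_hat)} f(s) nu(s)."""
    return np.array([np.dot(nu[phi == k], f[phi == k]) for k in range(nShat)])

f_hat = np.array([1.0, 2.0, 3.0])
assert np.allclose(project(lift(f_hat)), f_hat)   # projecting a lifted function recovers it
```

Note that projecting a lifted function recovers it, i.e., $(\hat f \circ φ) \SQ φ = \hat f$, because $ν$ assigns total mass one to each set $φ^{-1}(\hat s)$.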

As before, let $V^*$ and $\hat V^*$ denote the optimal value functions of the true model $\ALPHABET M$ and the approximate model $\hat {\ALPHABET M}$, respectively. Moreover, let $π^*$ and $\hat π^*$ be optimal policies for the true model and the approximate model $\widehat {\ALPHABET M}$, respectively.

We are interested in the following questions:

2. **Value error bounds**: What is the error if $\hat V^* \circ φ$ is used as an approximation of $V^*$?

3. **Model approximation error:** What is the error if the policy $\hat π^* \circ φ$ is used instead of the optimal policy $π^*$?

In order to answer these questions, the following question on policy error bounds is useful:

1. **Policy error bounds:** Given a policy $\hat π$ for model $\widehat {\ALPHABET M}$, what is the error if $\hat V^{\hat π} \circ φ$ is used as an approximation for $V^{\hat π \circ φ}$?


### Policy and value error bounds

As before, we let $\BELLMAN^{π}$ and $\BELLMAN^*$ denote the policy Bellman operator and the optimality Bellman operator for model $\ALPHABET M$, and let $\hat {\BELLMAN}^{\hat π}$ and $\hat {\BELLMAN}^*$ denote the corresponding quantities for model $\hat {\ALPHABET M}$. Note that in this case the operators $\BELLMAN$ and $\hat {\BELLMAN}$ are defined over different spaces.

We now define two classes of _Bellman mismatch functionals_ (a finite-state numerical sketch follows the definitions):

* Functionals $\MISMATCH^{π}_{φ}, \MISMATCH^*_{φ} \colon [\ALPHABET S \to \reals] \to \reals$, defined as follows:
\begin{align*}
\MISMATCH^{π}_{φ}v &= \NORM{ (\BELLMAN^{π} v) \SQ φ - \hat {\BELLMAN}^{π \SQ φ}(v \SQ φ)}_{∞}
\\
\MISMATCH^*_{φ} v &= \NORM{ (\BELLMAN^* v) \SQ φ - \hat {\BELLMAN}^* (v \SQ φ) }_{∞}
\end{align*}

* Functionals $\hat \MISMATCH^{\hat π}_{φ}, \hat \MISMATCH^*_{φ} \colon [\hat {\ALPHABET S} \to \reals] \to \reals$, defined as follows:
\begin{align*}
\hat \MISMATCH^{\hat π}_{φ}\hat v &= \NORM{ \BELLMAN^{\hat π \circ φ}(\hat v \circ φ) - (\hat {\BELLMAN}^{\hat π} \hat v) \circ φ }_{∞}
\\
\hat \MISMATCH^*_{φ} \hat v &= \NORM{ \BELLMAN^*(\hat v \circ φ) - (\hat {\BELLMAN}^* \hat v) \circ φ }_{∞}
\end{align*}
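
For a finite state space, these operators reduce to matrix and vector operations, and the mismatch functionals can be evaluated directly. The sketch below implements $\MISMATCH^*_{φ}$, $\hat \MISMATCH^*_{φ}$, and $\hat \MISMATCH^{\hat π}_{φ}$ (the ones used in the bounds that follow); the particular abstract model, obtained here by $ν$-weighted aggregation of the ground model, and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
nS, nA, gamma = 6, 2, 0.9
phi = np.array([0, 0, 1, 1, 2, 2]); nShat = phi.max() + 1    # phi : S -> S_hat

P = rng.random((nA, nS, nS)); P /= P.sum(axis=2, keepdims=True)   # ground model M
c = rng.random((nA, nS))                                          # per-stage costs

nu = np.ones(nS)                                             # nu(phi^{-1}(s_hat)) = 1 on each block
for k in range(nShat):
    nu[phi == k] /= (phi == k).sum()
Proj = np.zeros((nShat, nS)); Proj[phi, np.arange(nS)] = nu  # projection matrix: (f □ phi) = Proj @ f
Lift = np.zeros((nS, nShat)); Lift[np.arange(nS), phi] = 1.0 # lifting matrix: (f_hat ∘ phi) = Lift @ f_hat

# Illustrative abstract model M_hat obtained by nu-weighted aggregation (an assumption).
P_hat = np.stack([Proj @ P[a] @ Lift for a in range(nA)])
c_hat = np.stack([Proj @ c[a] for a in range(nA)])

def B_star(v, P, c):                # optimality Bellman operator (costs, so a minimum)
    return np.min(c + gamma * P @ v, axis=0)

def B_pi(v, P, c, pi):              # Bellman operator of a deterministic policy pi
    q = c + gamma * P @ v
    return q[pi, np.arange(v.size)]

def mismatch_star(v):               # M^*_phi v, for functions v on S
    return np.max(np.abs(Proj @ B_star(v, P, c) - B_star(Proj @ v, P_hat, c_hat)))

def mismatch_star_hat(v_hat):       # hat M^*_phi v_hat, for functions v_hat on S_hat
    return np.max(np.abs(B_star(Lift @ v_hat, P, c) - Lift @ B_star(v_hat, P_hat, c_hat)))

def mismatch_pi_hat(v_hat, pi_hat): # hat M^{pi_hat}_phi v_hat
    return np.max(np.abs(B_pi(Lift @ v_hat, P, c, pi_hat[phi])
                         - Lift @ B_pi(v_hat, P_hat, c_hat, pi_hat)))

print(mismatch_star(rng.random(nS)),
      mismatch_star_hat(rng.random(nShat)),
      mismatch_pi_hat(rng.random(nShat), rng.integers(nA, size=nShat)))
```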

The Bellman mismatch functionals can be used to bound the performance difference of a policy between the true and the approximate model.

:::{#prp-policy-error-abstract}
#### Policy error

For any (possibly randomized) policy $π$ in $\ALPHABET M$ and $\hat π$ in $\hat {\ALPHABET M}$, we have
\begin{align*}
\NORM{V^π \SQ φ - \hat V^{π \SQ φ}}_{∞} &\le \frac{1}{1-γ} \MISMATCH^{π}_{φ} V^{π}, \\
\NORM{V^{\hat π \circ φ} - \hat V^{\hat π} \circ φ}_{∞} &\le \frac{1}{1-γ} \hat \MISMATCH^{\hat π}_{φ} \hat V^{\hat π}.
\end{align*}
:::

:::{.callout-note collapse="true"}
#### Proof

The proof is similar to the proof of @prp-policy-error. The first bound is obtained as follows:
\begin{align}
\| V^{π} \SQ φ - \hat V^{π \SQ φ} \|_∞
&=
\| (\BELLMAN^π V^π) \SQ φ - \hat {\BELLMAN}^{π \SQ φ} \hat V^{π \SQ φ} \|_∞
\notag \\
&\le
\| (\BELLMAN^π V^π) \SQ φ - \hat {\BELLMAN}^{π \SQ φ} (V^{π} \SQ φ) \|_∞
\notag \\
& \quad +
\| \hat {\BELLMAN}^{π \SQ φ} (V^π \SQ φ) - \hat {\BELLMAN}^{π \SQ φ} \hat V^{π \SQ φ} \|_∞
\notag \\
&\le
\MISMATCH^π_{φ} V^π + γ \| V^π \SQ φ - \hat V^{π \SQ φ} \|_∞
\label{eq:ineq-3-abstract}
\end{align}
where the first inequality follows from the triangle inequality, and the
second inequality follows from the definition of the Bellman mismatch functional
and the contraction property of Bellman operators. Rearranging terms
in \\eqref{eq:ineq-3-abstract} gives us
\begin{equation}
\| V^{π} \SQ φ - \hat V^{π \SQ φ} \|_∞ \le \frac{ \MISMATCH^π_{φ} V^{π}}{1 - γ}.
\label{eq:ineq-4-abstract}\end{equation}
This gives the first bound.

The second bound is obtained as follows:
\begin{align}
\| V^{\hat π \circ φ} - \hat V^{\hat π} \circ φ \|_∞
&=
\| \BELLMAN^{\hat π \circ φ} V^{\hat π \circ φ} - (\hat {\BELLMAN}^{\hat π} \hat V^{\hat π}) \circ φ \|_∞
\notag \\
&\le
\| \BELLMAN^{\hat π \circ φ} V^{\hat π \circ φ}
- \BELLMAN^{\hat π \circ φ}(\hat V^{\hat π} \circ φ) \|_{∞}
\notag \\
& \quad +
\| \BELLMAN^{\hat π \circ φ}(\hat V^{\hat π} \circ φ)
- (\hat {\BELLMAN}^{\hat π} \hat V^{\hat π}) \circ φ \|_{∞}
\notag \\
&\le
γ \| V^{\hat π \circ φ} - \hat V^{\hat π} \circ φ \|_∞
+
\hat \MISMATCH^{\hat π}_{φ} \hat V^{\hat π}
\label{eq:ineq-13-abstract}
\end{align}
Rearranging terms in \\eqref{eq:ineq-13-abstract} gives
us
\begin{equation}
\| V^{\hat π \circ φ} - \hat V^{\hat π} \circ φ \|_∞ \le \frac{ \hat \MISMATCH^{\hat π}_{φ} \hat V^{\hat π}}{1 - γ}.
\label{eq:ineq-14-abstract}\end{equation}
This gives the second bound.
:::
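
As a sanity check, the second bound of @prp-policy-error-abstract can be verified numerically on a small randomly generated example. The aggregated $\hat P$ and $\hat c$ below, and all names, are illustrative assumptions rather than part of the result.

```python
import numpy as np

rng = np.random.default_rng(1)
nS, nA, gamma = 6, 2, 0.9
phi = np.array([0, 0, 1, 1, 2, 2]); nShat = phi.max() + 1   # phi : S -> S_hat

P = rng.random((nA, nS, nS)); P /= P.sum(axis=2, keepdims=True)   # ground model M
c = rng.random((nA, nS))

nu = np.ones(nS)                                             # nu(phi^{-1}(s_hat)) = 1 on each block
for k in range(nShat):
    nu[phi == k] /= (phi == k).sum()
Proj = np.zeros((nShat, nS)); Proj[phi, np.arange(nS)] = nu  # (f □ phi) = Proj @ f
Lift = np.zeros((nS, nShat)); Lift[np.arange(nS), phi] = 1.0 # (f_hat ∘ phi) = Lift @ f_hat

P_hat = np.stack([Proj @ P[a] @ Lift for a in range(nA)])    # illustrative abstract model M_hat
c_hat = np.stack([Proj @ c[a] for a in range(nA)])

def V_of(pi, P, c):                       # exact policy evaluation: V = (I - gamma P_pi)^{-1} c_pi
    n = pi.size
    return np.linalg.solve(np.eye(n) - gamma * P[pi, np.arange(n), :], c[pi, np.arange(n)])

def B_pi(v, P, c, pi):                    # policy Bellman operator applied to v
    q = c + gamma * P @ v
    return q[pi, np.arange(v.size)]

pi_hat = rng.integers(nA, size=nShat)     # an arbitrary deterministic policy for M_hat
V_hat = V_of(pi_hat, P_hat, c_hat)        # hat V^{hat pi}
V_ground = V_of(pi_hat[phi], P, c)        # V^{hat pi ∘ phi}

lhs = np.max(np.abs(V_ground - Lift @ V_hat))
mismatch = np.max(np.abs(B_pi(Lift @ V_hat, P, c, pi_hat[phi])
                         - Lift @ B_pi(V_hat, P_hat, c_hat, pi_hat)))
print(f"policy error {lhs:.4f} <= bound {mismatch / (1 - gamma):.4f}")
```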


Similar to the above, we can also bound the difference between the optimal value functions of the true and approximate models.

:::{#prp-value-error-abstract}
#### Value error

Let $V^*$ and $\hat V^*$ denote the optimal value functions for $\ALPHABET M$ and $\hat {\ALPHABET M}$, respectively. Then,
\begin{align*}
\NORM{V^* \SQ φ - \hat V^*}_{∞} &\le \frac{1}{1-γ} \MISMATCH^*_{φ} V^*, \\
\NORM{V^* - \hat V^* \circ φ}_{∞} &\le \frac{1}{1-γ} \hat \MISMATCH^*_{φ} \hat V^*.
\end{align*}
:::

:::{.callout-note collapse="true"}
#### Proof

The proof argument is similar to the proof of @prp-value-error.
The first bound is obtained as follows:
\begin{align}
\| V^{*} \SQ φ - \hat V^{*} \|_∞
&=
\| (\BELLMAN^* V^*) \SQ φ - \hat {\BELLMAN}^* \hat V^* \|_∞
\notag \\
&\le
\| (\BELLMAN^* V^*) \SQ φ - \hat {\BELLMAN}^*(V^* \SQ φ) \|_∞
+
\| \hat {\BELLMAN}^*(V^* \SQ φ) - \hat {\BELLMAN}^* \hat V^* \|_∞
\notag \\
&\le
\MISMATCH^*_{φ} V^* + γ \| V^* \SQ φ - \hat V^* \|_∞
\label{eq:ineq-1-abstract}
\end{align}
where the first inequality follows from the triangle inequality, and the
second inequality follows from the definition of the Bellman mismatch functional
and the contraction property of Bellman operators. Rearranging terms
in \\eqref{eq:ineq-1-abstract} gives us
\begin{equation}
\| V^* \SQ φ - \hat V^* \|_∞ \le \frac{ \MISMATCH^*_{φ} V^*}{1 - γ}.
\label{eq:ineq-2-abstract}\end{equation}
This gives the first bound.

The second bound is obtained as follows:
\begin{align}
\| V^{*} - \hat V^{*} \circ φ \|_∞
&=
\| \BELLMAN^* V^* - (\hat {\BELLMAN}^* \hat V^*) \circ φ \|_∞
\notag \\
&\le
\| \BELLMAN^* V^* - \BELLMAN^* (\hat V^* \circ φ) \|_∞
+
\| \BELLMAN^* (\hat V^* \circ φ) - (\hat {\BELLMAN}^* \hat V^*) \circ φ \|_∞
\notag \\
&\le
γ \| V^* - \hat V^* \circ φ \|_∞
+
\hat \MISMATCH^*_{φ} \hat V^*
\label{eq:ineq-11-abstract}
\end{align}
Rearranging terms in \\eqref{eq:ineq-11-abstract} gives us
\begin{equation}
\| V^{*} - \hat V^{*} \circ φ \|_∞ \le \frac{ \hat \MISMATCH^*_{φ} \hat V^{*}}{1 - γ}.
\label{eq:ineq-12-abstract}\end{equation}
This gives the second bound.
:::
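
A similar numerical check can be run for the second bound of @prp-value-error-abstract, again with an illustrative aggregated abstract model (an assumption made only for the example).

```python
import numpy as np

rng = np.random.default_rng(3)
nS, nA, gamma = 6, 2, 0.9
phi = np.array([0, 0, 1, 1, 2, 2]); nShat = phi.max() + 1

P = rng.random((nA, nS, nS)); P /= P.sum(axis=2, keepdims=True)   # ground model M
c = rng.random((nA, nS))

nu = np.ones(nS)
for k in range(nShat):
    nu[phi == k] /= (phi == k).sum()
Proj = np.zeros((nShat, nS)); Proj[phi, np.arange(nS)] = nu
Lift = np.zeros((nS, nShat)); Lift[np.arange(nS), phi] = 1.0

P_hat = np.stack([Proj @ P[a] @ Lift for a in range(nA)])          # illustrative abstract model
c_hat = np.stack([Proj @ c[a] for a in range(nA)])

def B_star(v, P, c):                       # optimality Bellman operator (cost minimization)
    return np.min(c + gamma * P @ v, axis=0)

def V_star_of(P, c, iters=3000):           # value iteration
    v = np.zeros(P.shape[1])
    for _ in range(iters):
        v = B_star(v, P, c)
    return v

V_star = V_star_of(P, c)                    # V^*
V_hat_star = V_star_of(P_hat, c_hat)        # hat V^*

lhs = np.max(np.abs(V_star - Lift @ V_hat_star))                   # || V^* - hat V^* ∘ phi ||_inf
mismatch = np.max(np.abs(B_star(Lift @ V_hat_star, P, c) - Lift @ B_star(V_hat_star, P_hat, c_hat)))
print(f"value error {lhs:.4f} <= bound {mismatch / (1 - gamma):.4f}")
```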

### Model approximation error

Recall that we can split the model error using the triangle inequality as in \eqref{eq:triangle-1}, which in the current setting takes the form
\begin{equation*}
\| V^* - V^{\hat π^* \circ φ} \|_∞ \le
\| V^* - \hat V^{\hat π^*} \circ φ \|_∞
+
\| V^{\hat π^* \circ φ} - \hat V^{\hat π^*} \circ φ \|_∞.
\end{equation*}

@prp-policy-error-abstract and @prp-value-error-abstract provide bounds for both terms, which immediately gives us the following.

:::{#thm-model-error-hat-V-star-abstract}
#### Model approximation error

The policy $\hat π^* \circ φ$ is an $α$-optimal policy of $\ALPHABET M$, where
$$
α := \| V^* - V^{\hat π^* \circ φ} \|_∞ \le
\frac{1}{1-γ} \bigl[ \hat \MISMATCH^*_{φ} \hat V^* + \hat \MISMATCH^{\hat
π^*}_{φ} \hat V^* \bigr].
$$
:::
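
Finally, a rough numerical check of @thm-model-error-hat-V-star-abstract: solve both models, lift the abstract optimal policy, and compare its suboptimality $α$ with the bound. As before, the aggregated abstract model and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
nS, nA, gamma = 6, 2, 0.9
phi = np.array([0, 0, 1, 1, 2, 2]); nShat = phi.max() + 1

P = rng.random((nA, nS, nS)); P /= P.sum(axis=2, keepdims=True)    # ground model M
c = rng.random((nA, nS))

nu = np.ones(nS)
for k in range(nShat):
    nu[phi == k] /= (phi == k).sum()
Proj = np.zeros((nShat, nS)); Proj[phi, np.arange(nS)] = nu
Lift = np.zeros((nS, nShat)); Lift[np.arange(nS), phi] = 1.0

P_hat = np.stack([Proj @ P[a] @ Lift for a in range(nA)])          # illustrative abstract model M_hat
c_hat = np.stack([Proj @ c[a] for a in range(nA)])

def B_star(v, P, c):                     # optimality Bellman operator (cost minimization)
    return np.min(c + gamma * P @ v, axis=0)

def B_pi(v, P, c, pi):                   # policy Bellman operator
    q = c + gamma * P @ v
    return q[pi, np.arange(v.size)]

def solve(P, c, iters=5000):             # value iteration for V^*, plus a greedy (optimal) policy
    v = np.zeros(P.shape[1])
    for _ in range(iters):
        v = B_star(v, P, c)
    return v, np.argmin(c + gamma * P @ v, axis=0)

def V_of(pi, P, c):                      # exact policy evaluation
    n = pi.size
    return np.linalg.solve(np.eye(n) - gamma * P[pi, np.arange(n), :], c[pi, np.arange(n)])

V_star, _ = solve(P, c)                              # V^* of the ground model
V_hat_star, pi_hat_star = solve(P_hat, c_hat)        # hat V^*, hat pi^* of the abstract model

alpha = np.max(np.abs(V_star - V_of(pi_hat_star[phi], P, c)))      # suboptimality of hat pi^* ∘ phi

m_star = np.max(np.abs(B_star(Lift @ V_hat_star, P, c) - Lift @ B_star(V_hat_star, P_hat, c_hat)))
m_pi = np.max(np.abs(B_pi(Lift @ V_hat_star, P, c, pi_hat_star[phi])
                     - Lift @ B_pi(V_hat_star, P_hat, c_hat, pi_hat_star)))
print(f"alpha {alpha:.4f} <= bound {(m_star + m_pi) / (1 - gamma):.4f}")
```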

## Notes {-}
