Added details for state abstraction
adityam committed Jul 21, 2024
1 parent c278796 commit 5d9171f
199 changes: 191 additions & 8 deletions approx-mdps/model-approximation.qmd
@@ -133,7 +133,7 @@ This gives the second bound.


Similar to the above, we can also bound the difference between the optimal
value functions of the true and approximate models.

:::{#prp-value-error}
#### Value error
@@ -149,7 +149,7 @@ M$ and $\widehat {\ALPHABET M}$, respectively. Then,
:::{.callout-note collapse="true"}
#### Proof {-}
The proof argument is almost the same as that of
@prp-policy-error. The first bound is obtained as follows:
\begin{align}
\| V^{*} - \hat V^{*} \|_∞
&=
@@ -225,7 +225,7 @@ $\MISMATCH^{\hat π^*} \hat V^*$ and $\MISMATCH^*
$$
α \le \frac{2}{(1-γ)} \MISMATCH^{\max} \hat V^*.
$$
:::

In some applications, it is useful to have a bound on model approximation error that depends on $V^*$ rather than $\hat V^*$. We provide such a bound below.

@@ -536,24 +536,207 @@ This bound precisely quantifies the engineering intuition that certainty equival

In the analysis above, we assumed that the model $\ALPHABET M$ and the approximate model $\widehat {\ALPHABET M}$ were defined on the same state space. Suppose that is not the case and the approximate model $\widehat {\ALPHABET M}$ is defined on a different state space $\hat {\ALPHABET S}$, i.e., $\widehat {\ALPHABET M} = (\hat {\ALPHABET S}, \ALPHABET A, \hat P, \hat c, γ)$.

In addition, suppose we are given a surjective function $φ \colon \ALPHABET S \to \hat {\ALPHABET S}$. We can use $φ$ to **lift** any function $\hat f$ defined on $\hat {\ALPHABET S}$ to a function defined on $\ALPHABET S$ given by $f = \hat f \circ φ$, i.e.,
$$f(s) = \hat f(φ(s)), \quad \forall s \in \ALPHABET S.$$
We can also use $φ$ to **project** any function $f$ defined on $\ALPHABET S$ to a function defined on $\hat {\ALPHABET S}$. For this, we assume that we are given a measure $ν$ on $\ALPHABET S$ with the following property:
$$
ν(φ^{-1}(\hat s)) = 1, \quad \forall \hat s \in \hat {\ALPHABET S}.
$$
For instance, when $\ALPHABET S$ is finite, $ν$ may be taken to be the uniform distribution on each set $φ^{-1}(\hat s)$. In the sequel, we assume that the measure $ν$ is fixed and suppress it from the notation. Given $ν$ and a function $f$ defined on $\ALPHABET S$, define
$$ \def\SQ{\mathbin{\square}}
(f \SQ φ)(\hat s) = \int_{s \in φ^{-1}(\hat s)} f(s) ν(ds),
\quad \forall \hat s \in \hat {\ALPHABET S}.
$$
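
To make the lifting and projection operations concrete, here is a minimal Python sketch for a finite state space. The aggregation map `phi`, the uniform choice of $ν$ on each block, and all other names are illustrative assumptions, not part of the development above.

```python
import numpy as np

# Hypothetical finite example: |S| = 6 ground states aggregated into |S_hat| = 3 abstract states.
phi = np.array([0, 0, 1, 1, 2, 2])      # phi : S -> S_hat (surjective)
nS, nShat = phi.size, phi.max() + 1

# A weighting measure nu on S with nu(phi^{-1}(s_hat)) = 1 for every s_hat;
# here nu is taken uniform on each block phi^{-1}(s_hat).
nu = np.ones(nS)
for k in range(nShat):
    nu[phi == k] /= (phi == k).sum()

def lift(f_hat):
    """Lift f_hat : S_hat -> R to f = f_hat ∘ phi on S."""
    return f_hat[phi]

def project(f):
    """Project f : S -> R to (f □ phi)(s_hat) = sum_{s in phi^{-1}(s_hat)} f(s) nu(s)."""
    return np.array([np.dot(nu[phi == k], f[phi == k]) for k in range(nShat)])

f_hat = np.array([1.0, 2.0, 3.0])
assert np.allclose(project(lift(f_hat)), f_hat)   # projecting a lifted function recovers it
```

Note that projecting a lifted function recovers it, i.e., $(\hat f \circ φ) \SQ φ = \hat f$, because $ν$ assigns total mass one to each set $φ^{-1}(\hat s)$.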

As before, let $V^*$ and $\hat V^*$ denote the optimal value functions of the true model $\ALPHABET M$ and the approximate model $\hat {\ALPHABET M}$, respectively. Moreover, let $π^*$ and $\hat π^*$ be optimal policies for the true model and the approximate model $\widehat {\ALPHABET M}$, respectively.

We are interested in the following questions:

2. **Value error bounds**: What is the error if $\hat V^* \circ φ$ is used as an approximation of $V^*$?

3. **Model approximation error:** What is the error if the policy $\hat π^* \circ φ$ is used instead of the optimal policy $π^*$?

In order to answer these questions, the following question on policy error bounds is useful:

1. **Policy error bounds:** Given a policy $\hat π$ for model $\widehat {\ALPHABET M}$, what is the error if $\hat V^{\hat π} \circ φ$ is used as an approximation for $V^{\hat π \circ φ}$?


### Policy and value error bounds

As before, we let $\BELLMAN^{π}$ and $\BELLMAN^*$ denote the policy Bellman operator and the optimality Bellman operator for model $\ALPHABET M$, and let $\hat {\BELLMAN}^{\hat π}$ and $\hat {\BELLMAN}^*$ denote the corresponding quantities for model $\hat {\ALPHABET M}$. Note that in this case the operators $\BELLMAN$ and $\hat {\BELLMAN}$ are defined over different spaces.

We now define two classes of _Bellman mismatch functionals_ (a finite-state numerical sketch follows the definitions):

* Functionals $\MISMATCH^{π}_{φ}, \MISMATCH^*_{φ} \colon [\ALPHABET S \to \reals] \to \reals$, defined as follows:
\begin{align*}
\MISMATCH^{π}_{φ}v &= \NORM{ (\BELLMAN^{π} v) \SQ φ - \hat {\BELLMAN}^{π \SQ φ}(v \SQ φ)}_{∞}
\\
\MISMATCH^*_{φ} v &= \NORM{ (\BELLMAN^* v) \SQ φ - \hat {\BELLMAN}^* (v \SQ φ) }_{∞}
\end{align*}

* Functionals $\hat \MISMATCH^{\hat π}_{φ}, \hat \MISMATCH^*_{φ} \colon [\hat {\ALPHABET S} \to \reals] \to \reals$, defined as follows:
\begin{align*}
\hat \MISMATCH^{\hat π}_{φ}\hat v &= \NORM{ \BELLMAN^{\hat π \circ φ}(\hat v \circ φ) - (\hat {\BELLMAN}^{\hat π} \hat v) \circ φ }_{∞}
\\
\hat \MISMATCH^*_{φ} \hat v &= \NORM{ \BELLMAN^*(\hat v \circ φ) - (\hat {\BELLMAN}^* \hat v) \circ φ }_{∞}
\end{align*}
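
For a finite state space, these operators reduce to matrix and vector operations, and the mismatch functionals can be evaluated directly. The sketch below implements $\MISMATCH^*_{φ}$, $\hat \MISMATCH^*_{φ}$, and $\hat \MISMATCH^{\hat π}_{φ}$ (the ones used in the bounds that follow); the particular abstract model, obtained here by $ν$-weighted aggregation of the ground model, and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
nS, nA, gamma = 6, 2, 0.9
phi = np.array([0, 0, 1, 1, 2, 2]); nShat = phi.max() + 1    # phi : S -> S_hat

P = rng.random((nA, nS, nS)); P /= P.sum(axis=2, keepdims=True)   # ground model M
c = rng.random((nA, nS))                                          # per-stage costs

nu = np.ones(nS)                                             # nu(phi^{-1}(s_hat)) = 1 on each block
for k in range(nShat):
    nu[phi == k] /= (phi == k).sum()
Proj = np.zeros((nShat, nS)); Proj[phi, np.arange(nS)] = nu  # projection matrix: (f □ phi) = Proj @ f
Lift = np.zeros((nS, nShat)); Lift[np.arange(nS), phi] = 1.0 # lifting matrix: (f_hat ∘ phi) = Lift @ f_hat

# Illustrative abstract model M_hat obtained by nu-weighted aggregation (an assumption).
P_hat = np.stack([Proj @ P[a] @ Lift for a in range(nA)])
c_hat = np.stack([Proj @ c[a] for a in range(nA)])

def B_star(v, P, c):                # optimality Bellman operator (costs, so a minimum)
    return np.min(c + gamma * P @ v, axis=0)

def B_pi(v, P, c, pi):              # Bellman operator of a deterministic policy pi
    q = c + gamma * P @ v
    return q[pi, np.arange(v.size)]

def mismatch_star(v):               # M^*_phi v, for functions v on S
    return np.max(np.abs(Proj @ B_star(v, P, c) - B_star(Proj @ v, P_hat, c_hat)))

def mismatch_star_hat(v_hat):       # hat M^*_phi v_hat, for functions v_hat on S_hat
    return np.max(np.abs(B_star(Lift @ v_hat, P, c) - Lift @ B_star(v_hat, P_hat, c_hat)))

def mismatch_pi_hat(v_hat, pi_hat): # hat M^{pi_hat}_phi v_hat
    return np.max(np.abs(B_pi(Lift @ v_hat, P, c, pi_hat[phi])
                         - Lift @ B_pi(v_hat, P_hat, c_hat, pi_hat)))

print(mismatch_star(rng.random(nS)),
      mismatch_star_hat(rng.random(nShat)),
      mismatch_pi_hat(rng.random(nShat), rng.integers(nA, size=nShat)))
```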

The Bellman mismatch functionals can be used to bound the performance difference of a policy between the true and the approximate model.

:::{#prp-policy-error-abstract}
#### Policy error

For any (possibly randomized) policy $π$ in $\ALPHABET M$ and $\hat π$ in $\hat {\ALPHABET M}$, we have
\begin{align*}
\NORM{V^π \SQ φ - \hat V^{π \SQ φ}}_{∞} &\le \frac{1}{1-γ} \MISMATCH^{π}_{φ} V^{π}, \\
\NORM{V^{\hat π \circ φ} - \hat V^{\hat π} \circ φ}_{∞} &\le \frac{1}{1-γ} \hat \MISMATCH^{\hat π}_{φ} \hat V^{\hat π}.
\end{align*}
:::

:::{.callout-note collapse="true"}
#### Proof

The proof is similar to the proof of @prp-policy-error. The first bound is obtained as follows:
\begin{align}
\| V^{π} \SQ φ - \hat V^{π \SQ φ} \|_∞
&=
\| (\BELLMAN^π V^π) \SQ φ - \hat {\BELLMAN}^{π \SQ φ} \hat V^{π \SQ φ} \|_∞
\notag \\
&\le
\| (\BELLMAN^π V^π) \SQ φ - \hat {\BELLMAN}^{π \SQ φ} (V^{π} \SQ φ) \|_∞
\notag \\
& \quad +
\| \hat {\BELLMAN}^{π \SQ φ} (V^π \SQ φ) - \hat {\BELLMAN}^{π \SQ φ} \hat V^{π \SQ φ} \|_∞
\notag \\
&\le
\MISMATCH^π_{φ} V^π + γ \| V^π \SQ φ - \hat V^{π \SQ φ} \|_∞
\label{eq:ineq-3-abstract}
\end{align}
where the first inequality follows from the triangle inequality, and the
second inequality follows from the definition of the Bellman mismatch functional
and the contraction property of Bellman operators. Rearranging terms
in \\eqref{eq:ineq-3-abstract} gives us
\begin{equation}
\| V^{π} \SQ φ - \hat V^{π \SQ φ} \|_∞ \le \frac{ \MISMATCH^π_{φ} V^{π}}{1 - γ}.
\label{eq:ineq-4-abstract}\end{equation}
This gives the first bound.

The second bound is obtained as follows:
\begin{align}
\| V^{\hat π \circ φ} - \hat V^{\hat π} \circ φ \|_∞
&=
\| \BELLMAN^{\hat π \circ φ} V^{\hat π \circ φ} - (\hat {\BELLMAN}^{\hat π} \hat V^{\hat π}) \circ φ \|_∞
\notag \\
&\le
\| \BELLMAN^{\hat π \circ φ} V^{\hat π \circ φ}
- \BELLMAN^{\hat π \circ φ}(\hat V^{\hat π} \circ φ) \|_{∞}
\notag \\
& \quad +
\| \BELLMAN^{\hat π \circ φ}(\hat V^{\hat π} \circ φ)
- (\hat {\BELLMAN}^{\hat π} \hat V^{\hat π}) \circ φ \|_{∞}
\notag \\
&\le
γ \| V^{\hat π \circ φ} - \hat V^{\hat π} \circ φ \|_∞
+
\hat \MISMATCH^{\hat π}_{φ} \hat V^{\hat π}
\label{eq:ineq-13-abstract}
\end{align}
Rearranging terms in \\eqref{eq:ineq-13-abstract} gives
us
\begin{equation}
\| V^{\hat π \circ φ} - \hat V^{\hat π} \circ φ \|_∞ \le \frac{ \hat \MISMATCH^{\hat π}_{φ} \hat V^{\hat π}}{1 - γ}.
\label{eq:ineq-14-abstract}\end{equation}
This gives the second bound.
:::
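
As a sanity check, the second bound of @prp-policy-error-abstract can be verified numerically on a small randomly generated example. The aggregated $\hat P$ and $\hat c$ below, and all names, are illustrative assumptions rather than part of the result.

```python
import numpy as np

rng = np.random.default_rng(1)
nS, nA, gamma = 6, 2, 0.9
phi = np.array([0, 0, 1, 1, 2, 2]); nShat = phi.max() + 1   # phi : S -> S_hat

P = rng.random((nA, nS, nS)); P /= P.sum(axis=2, keepdims=True)   # ground model M
c = rng.random((nA, nS))

nu = np.ones(nS)                                             # nu(phi^{-1}(s_hat)) = 1 on each block
for k in range(nShat):
    nu[phi == k] /= (phi == k).sum()
Proj = np.zeros((nShat, nS)); Proj[phi, np.arange(nS)] = nu  # (f □ phi) = Proj @ f
Lift = np.zeros((nS, nShat)); Lift[np.arange(nS), phi] = 1.0 # (f_hat ∘ phi) = Lift @ f_hat

P_hat = np.stack([Proj @ P[a] @ Lift for a in range(nA)])    # illustrative abstract model M_hat
c_hat = np.stack([Proj @ c[a] for a in range(nA)])

def V_of(pi, P, c):                       # exact policy evaluation: V = (I - gamma P_pi)^{-1} c_pi
    n = pi.size
    return np.linalg.solve(np.eye(n) - gamma * P[pi, np.arange(n), :], c[pi, np.arange(n)])

def B_pi(v, P, c, pi):                    # policy Bellman operator applied to v
    q = c + gamma * P @ v
    return q[pi, np.arange(v.size)]

pi_hat = rng.integers(nA, size=nShat)     # an arbitrary deterministic policy for M_hat
V_hat = V_of(pi_hat, P_hat, c_hat)        # hat V^{hat pi}
V_ground = V_of(pi_hat[phi], P, c)        # V^{hat pi ∘ phi}

lhs = np.max(np.abs(V_ground - Lift @ V_hat))
mismatch = np.max(np.abs(B_pi(Lift @ V_hat, P, c, pi_hat[phi])
                         - Lift @ B_pi(V_hat, P_hat, c_hat, pi_hat)))
print(f"policy error {lhs:.4f} <= bound {mismatch / (1 - gamma):.4f}")
```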


Similar to the above, we can also bound the difference between the optimal value functions of the true and approximate models.

:::{#prp-value-error-abstract}
#### Value error

Let $V^*$ and $\hat V^*$ denote the optimal value functions for $\ALPHABET M$ and $\hat {\ALPHABET M}$, respectively. Then,
\begin{align*}
\NORM{V^* \SQ φ - \hat V^*}_{∞} &\le \frac{1}{1-γ} \MISMATCH^*_{φ} V^*, \\
\NORM{V^* - \hat V^* \circ φ}_{∞} &\le \frac{1}{1-γ} \hat \MISMATCH^*_{φ} \hat V^*.
\end{align*}
:::

:::{.callout-note collapse="true"}
#### Proof

The proof argument is similar to the proof of @prp-value-error.
The first bound is obtained as follows:
\begin{align}
\| V^{*} \SQ φ - \hat V^{*} \|_∞
&=
\| (\BELLMAN^* V^*) \SQ φ - \hat {\BELLMAN}^* \hat V^* \|_∞
\notag \\
&\le
\| (\BELLMAN^* V^*) \SQ φ - \hat {\BELLMAN}^*(V^* \SQ φ) \|_∞
+
\| \hat {\BELLMAN}^*(V^* \SQ φ) - \hat {\BELLMAN}^* \hat V^* \|_∞
\notag \\
&\le
\MISMATCH^*_{φ} V^* + γ \| V^* \SQ φ - \hat V^* \|_∞
\label{eq:ineq-1-abstract}
\end{align}
where the first inequality follows from the triangle inequality, and the
second inequality follows from the definition of the Bellman mismatch functional
and the contraction property of Bellman operators. Rearranging terms
in \\eqref{eq:ineq-1-abstract} gives us
\begin{equation}
\| V^* \SQ φ - \hat V^* \|_∞ \le \frac{ \MISMATCH^*_{φ} V^*}{1 - γ}.
\label{eq:ineq-2-abstract}\end{equation}
This gives the first bound.

The second bound is obtained as follows:
\begin{align}
\| V^{*} - \hat V^{*} \circ φ \|_∞
&=
\| \BELLMAN^* V^* - (\hat {\BELLMAN}^* \hat V^*) \circ φ \|_∞
\notag \\
&\le
\| \BELLMAN^* V^* - \BELLMAN^* (\hat V^* \circ φ) \|_∞
+
\| \BELLMAN^* (\hat V^* \circ φ) - (\hat {\BELLMAN}^* \hat V^*) \circ φ \|_∞
\notag \\
&\le
γ \| V^* - \hat V^* \circ φ \|_∞
+
\hat \MISMATCH^*_{φ} \hat V^*
\label{eq:ineq-11-abstract}
\end{align}
Rearranging terms in \\eqref{eq:ineq-11-abstract} gives us
\begin{equation}
\| V^{*} - \hat V^{*} \circ φ \|_∞ \le \frac{ \hat \MISMATCH^*_{φ} \hat V^{*}}{1 - γ}.
\label{eq:ineq-12-abstract}\end{equation}
This gives the second bound.
:::
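
A similar numerical check can be run for the second bound of @prp-value-error-abstract, again with an illustrative aggregated abstract model (an assumption made only for the example).

```python
import numpy as np

rng = np.random.default_rng(3)
nS, nA, gamma = 6, 2, 0.9
phi = np.array([0, 0, 1, 1, 2, 2]); nShat = phi.max() + 1

P = rng.random((nA, nS, nS)); P /= P.sum(axis=2, keepdims=True)   # ground model M
c = rng.random((nA, nS))

nu = np.ones(nS)
for k in range(nShat):
    nu[phi == k] /= (phi == k).sum()
Proj = np.zeros((nShat, nS)); Proj[phi, np.arange(nS)] = nu
Lift = np.zeros((nS, nShat)); Lift[np.arange(nS), phi] = 1.0

P_hat = np.stack([Proj @ P[a] @ Lift for a in range(nA)])          # illustrative abstract model
c_hat = np.stack([Proj @ c[a] for a in range(nA)])

def B_star(v, P, c):                       # optimality Bellman operator (cost minimization)
    return np.min(c + gamma * P @ v, axis=0)

def V_star_of(P, c, iters=3000):           # value iteration
    v = np.zeros(P.shape[1])
    for _ in range(iters):
        v = B_star(v, P, c)
    return v

V_star = V_star_of(P, c)                    # V^*
V_hat_star = V_star_of(P_hat, c_hat)        # hat V^*

lhs = np.max(np.abs(V_star - Lift @ V_hat_star))                   # || V^* - hat V^* ∘ phi ||_inf
mismatch = np.max(np.abs(B_star(Lift @ V_hat_star, P, c) - Lift @ B_star(V_hat_star, P_hat, c_hat)))
print(f"value error {lhs:.4f} <= bound {mismatch / (1 - gamma):.4f}")
```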

### Model approximation error

Recall that we can split the model error using the triangle inequality as in \eqref{eq:triangle-1}, which in the current setting takes the form
\begin{equation*}
\| V^* - V^{\hat π^* \circ φ} \|_∞ \le
\| V^* - \hat V^{\hat π^*} \circ φ \|_∞
+
\| V^{\hat π^* \circ φ} - \hat V^{\hat π^*} \circ φ \|_∞.
\end{equation*}

@prp-policy-error-abstract and @prp-value-error-abstract provide bounds for both terms, which immediately gives us the following.

:::{#thm-model-error-hat-V-star-abstract}
#### Model approximation error

The policy $\hat π^* \circ φ$ is an $α$-optimal policy of $\ALPHABET M$, where
$$
α := \| V^* - V^{\hat π^* \circ φ} \|_∞ \le
\frac{1}{1-γ} \bigl[ \hat \MISMATCH^*_{φ} \hat V^* + \hat \MISMATCH^{\hat
π^*}_{φ} \hat V^* \bigr].
$$
:::
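
Finally, a rough numerical check of @thm-model-error-hat-V-star-abstract: solve both models, lift the abstract optimal policy, and compare its suboptimality $α$ with the bound. As before, the aggregated abstract model and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
nS, nA, gamma = 6, 2, 0.9
phi = np.array([0, 0, 1, 1, 2, 2]); nShat = phi.max() + 1

P = rng.random((nA, nS, nS)); P /= P.sum(axis=2, keepdims=True)    # ground model M
c = rng.random((nA, nS))

nu = np.ones(nS)
for k in range(nShat):
    nu[phi == k] /= (phi == k).sum()
Proj = np.zeros((nShat, nS)); Proj[phi, np.arange(nS)] = nu
Lift = np.zeros((nS, nShat)); Lift[np.arange(nS), phi] = 1.0

P_hat = np.stack([Proj @ P[a] @ Lift for a in range(nA)])          # illustrative abstract model M_hat
c_hat = np.stack([Proj @ c[a] for a in range(nA)])

def B_star(v, P, c):                     # optimality Bellman operator (cost minimization)
    return np.min(c + gamma * P @ v, axis=0)

def B_pi(v, P, c, pi):                   # policy Bellman operator
    q = c + gamma * P @ v
    return q[pi, np.arange(v.size)]

def solve(P, c, iters=5000):             # value iteration for V^*, plus a greedy (optimal) policy
    v = np.zeros(P.shape[1])
    for _ in range(iters):
        v = B_star(v, P, c)
    return v, np.argmin(c + gamma * P @ v, axis=0)

def V_of(pi, P, c):                      # exact policy evaluation
    n = pi.size
    return np.linalg.solve(np.eye(n) - gamma * P[pi, np.arange(n), :], c[pi, np.arange(n)])

V_star, _ = solve(P, c)                              # V^* of the ground model
V_hat_star, pi_hat_star = solve(P_hat, c_hat)        # hat V^*, hat pi^* of the abstract model

alpha = np.max(np.abs(V_star - V_of(pi_hat_star[phi], P, c)))      # suboptimality of hat pi^* ∘ phi

m_star = np.max(np.abs(B_star(Lift @ V_hat_star, P, c) - Lift @ B_star(V_hat_star, P_hat, c_hat)))
m_pi = np.max(np.abs(B_pi(Lift @ V_hat_star, P, c, pi_hat_star[phi])
                     - Lift @ B_pi(V_hat_star, P_hat, c_hat, pi_hat_star)))
print(f"alpha {alpha:.4f} <= bound {(m_star + m_pi) / (1 - gamma):.4f}")
```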

## Notes {-}
