Add link to model approximation

adityam · Feb 6, 2024 · cb04045
1 parent 2794469
commit cb04045
Showing 2 changed files with 10 additions and 5 deletions.
diff --git a/approx-mdps/model-approximation.qmd b/approx-mdps/model-approximation.qmd
@@ -516,12 +516,12 @@ The per-step cost is the same as before.
 Let $\ALPHABET M$ denote the stochastic model and $\hat {\ALPHABET M}$ denote the deterministic model. Then, the certainty equivalent design is to use the control policy $\hat \pi^*$ in the original stochastic model $\ALPHABET M$. We use the Wasserstein-distance-based bounds in @cor-model-error-instance-independent to bound $\NORM{V^{\hat \pi^*} - V^*}_{∞}$. We assume that there is some norm $\| \cdot \|$ on $\reals^n$ and that the Wasserstein distance and Lipschitz constant are computed with respect to this norm.
 
 Since the costs are the same for both models, $ε = 0$. We now characterize $\delta$. For ease of notation, given random variables $X$ and $Y$ with probability laws $\nu_X$ and $\nu_Y$, we will use $\ALPHABET K(X,Y)$ to denote $\ALPHABET K(\nu_X, \nu_Y)$. 
-Recall the Kantorovich-Rubinstein inequality @Villani2008, which states that 
+Recall that the Wasserstein distance is defined as [@Villani2008] 
 \begin{equation}\label{eq:Kantorovich}
     \ALPHABET K(\nu_X, \nu_Y) = \inf_{ \substack{ \tilde X \sim \nu_X \\ \tilde Y \sim \nu_Y} }
-    \EXP[ \| \tilde X - \tilde Y \| ]
+    \EXP[ \| \tilde X - \tilde Y \| ].
 \end{equation}
-Now, for a fixed $(s,a)$, define $X = f(s,a) + N$, where $N \sim \nu_N$, and $Y = f(s,a)$. Then, the Wasserstein distance between $P(\cdot | s,a)$ and $\hat P(\cdot | s,a)$ is equal to $\ALPHABET K(X,Y)$, which by Kantorovich-Rubinstein inequality \eqref{eq:Kantorovich}, equals $\EXP[\| N \|]$, which does not depend on $(s,a)$. Thus, 
+Now, for a fixed $(s,a)$, define $X = f(s,a) + N$, where $N \sim \nu_N$, and $Y = f(s,a)$. Then, the Wasserstein distance between $P(\cdot | s,a)$ and $\hat P(\cdot | s,a)$ is equal to $\ALPHABET K(X,Y)$, which by \eqref{eq:Kantorovich} equals $\EXP[\| N \|]$, which does not depend on $(s,a)$. Thus, 
 $$
   δ = \EXP[\NORM{N}].
 $$
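
The step from the definition to $\EXP[\| N \|]$ relies on $Y$ being deterministic: every coupling of $(X, Y)$ then has the same joint law, so the infimum reduces to $\EXP[\| X - Y \|] = \EXP[\| N \|]$. The following is a minimal numerical sketch of that identity, not part of the original notes. It assumes a scalar state, the absolute value as the norm, and zero-mean Gaussian noise; the function names `estimate_delta` and `exact_delta` and all parameter values are hypothetical.

```python
import math
import random


def estimate_delta(f_sa, sigma, num_samples=200_000, seed=0):
    """Monte Carlo estimate of E[|X - Y|] with X = f_sa + N and Y = f_sa.

    Assumption (not from the notes): N ~ Normal(0, sigma^2) and the norm
    is the absolute value on the real line.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        noise = rng.gauss(0.0, sigma)  # sample of the noise N
        x = f_sa + noise               # next state under the stochastic model
        y = f_sa                       # next state under the deterministic model
        total += abs(x - y)            # Y is a point mass, so this is the only coupling
    return total / num_samples


def exact_delta(sigma):
    """Closed form E|N| = sigma * sqrt(2 / pi) for zero-mean Gaussian noise."""
    return sigma * math.sqrt(2.0 / math.pi)


if __name__ == "__main__":
    # The estimate should be close to the closed form and should not
    # change with the value of f(s, a).
    for f_sa in (0.0, 3.0, -7.5):
        print(f_sa, estimate_delta(f_sa, sigma=1.0), exact_delta(1.0))
```

Under these assumptions the estimate is essentially the same for every value of `f_sa`, which illustrates why $δ$ does not depend on $(s,a)$.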

diff --git a/mdps/lipschitz-mdps.qmd b/mdps/lipschitz-mdps.qmd
@@ -440,14 +440,17 @@ As discussed in @exm-lipschitz-inventory, the inventory management example is $(
 $$
   L_V = \frac{p + \max\{ c_h, c_s \}}{1 - γ}.
 $$
+
+Later, in the notes on [model approximation], we show that this bound on the Lipschitz constant helps quantify the approximation error incurred when we use a policy designed for a model with a slightly different demand distribution.
+
+[model approximation]: ../approx-mdps/model-approximation.qmd#example-inventory
+
 To understand the tightness of this bound, we consider a specific instance of the inventory management problem where the demand is $\text{Exp}(1)$, $c_h = 2$, $c_s = 5$, and $p = 1$. The theoretical maximum value of the Lipschitz constant (for $γ = 0.9$) is
 $L_V = 60$. In @fig-lipschitz-animation, we show the animation of this upper bound, in the style of the Wikipedia animation shown at the beginning of this lecture. 
 
 {{< embed ../julia-examples/inventory-management/inventory-management.ipynb#fig-lipschitz-animation >}}
 
 Note that since the demand is $\text{Exp}(1)$, most of the mass of the demand is in the range $[0,10]$. So, the region of the value function of interest is perhaps $[-20,20]$ or so. We plot a larger region to highlight the fact that the bound on the Lipschitz constant has to capture the Lipschitz constant of the value function over the entire real line.
-
-
 :::
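
The value $L_V = 60$ quoted for this instance can be checked with a short calculation. The sketch below is illustrative only and assumes the maximum in the bound is taken over the pair $c_h$ and $c_s$, which is the reading that reproduces the stated value; the function name `lipschitz_bound` is hypothetical.

```python
def lipschitz_bound(p, c_h, c_s, gamma):
    """Upper bound L_V = (p + max(c_h, c_s)) / (1 - gamma).

    Assumption: the maximum is over the two costs c_h and c_s.
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("discount factor gamma must lie in [0, 1)")
    return (p + max(c_h, c_s)) / (1.0 - gamma)


# Instance from the text: p = 1, c_h = 2, c_s = 5, gamma = 0.9.
L_V = lipschitz_bound(p=1.0, c_h=2.0, c_s=5.0, gamma=0.9)
print(round(L_V, 6))  # 60.0
```

With these parameters the numerator is $1 + 5 = 6$ and the denominator is $0.1$, which gives the bound of $60$.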
 
 
@@ -593,4 +596,6 @@ Let $(\ALPHABET S, d_S)$ be a metric space and $s, s' \in \ALPHABET S$.
 The material in this section is taken from @Rachelson2010 and @Hinderer2005.
 
 The proof of Lipschitz continuity for the inventory management problem in @exm-lipschitz-inventory is adapted from @Muller1997b.
+Later, in the notes on [model approximation], we show that this bound on the Lipschitz constant helps quantify the approximation error incurred when we use a policy designed for a model with a slightly different demand distribution.
+