Commit

Add link to model approximation
adityam committed Feb 6, 2024
1 parent 2794469 commit cb04045
Showing 2 changed files with 10 additions and 5 deletions.
6 changes: 3 additions & 3 deletions approx-mdps/model-approximation.qmd
@@ -516,12 +516,12 @@ The per-step cost is the same as before.
Let $\ALPHABET M$ denote the stochastic model and $\hat {\ALPHABET M}$ denote the deterministic model. Then, the certainty equivalent design is to use the control policy $\hat \pi^*$ in the original stochastic model $\ALPHABET M$. We use the Wasserstein distance based bounds in @cor-model-error-instance-independent to bound $\NORM{V^{\hat \pi^*} - V^*}_{∞}$. We assume that there is some norm $\| \cdot \|$ on $\reals^n$ and the Wasserstein distance and Lipschitz constant are computed with respect to this norm.

Since the costs are the same for both models, $ε = 0$. We now characterize $\delta$. For ease of notation, given random variables $X$ and $Y$ with probability laws $\nu_X$ and $\nu_Y$, we will use $\ALPHABET K(X,Y)$ to denote $\ALPHABET K(\nu_X, \nu_Y)$.
-Recall the Kantorovich-Rubinstein inequality @Villani2008, which states that
+Recall that Wasserstein distance is defined as [@Villani2008]
\begin{equation}\label{eq:Kantorovich}
\ALPHABET K(\nu_X, \nu_Y) = \inf_{ \substack{ \tilde X \sim \nu_X \\ \tilde Y \sim \nu_Y} }
-\EXP[ \| \tilde X - \tilde Y \| ]
+\EXP[ \| \tilde X - \tilde Y \| ].
\end{equation}
-Now, for a fixed $(s,a)$, define $X = f(s,a) + N$, where $N \sim \nu_N$, and $Y = f(s,a)$. Then, the Wasserstein distance between $P(\cdot | s,a)$ and $\hat P(\cdot | s,a)$ is equal to $\ALPHABET K(X,Y)$, which by Kantorovich-Rubinstein inequality \eqref{eq:Kantorovich}, equals $\EXP[\| N \|]$, which does not depend on $(s,a)$. Thus,
+Now, for a fixed $(s,a)$, define $X = f(s,a) + N$, where $N \sim \nu_N$, and $Y = f(s,a)$. Then, the Wasserstein distance between $P(\cdot | s,a)$ and $\hat P(\cdot | s,a)$ is equal to $\ALPHABET K(X,Y)$, which by \eqref{eq:Kantorovich} equals $\EXP[\| N \|]$, which does not depend on $(s,a)$. Thus,
$$
δ = \EXP[\NORM{N}].
$$
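
As a numerical sanity check of $δ = \EXP[\NORM{N}]$, the following is a minimal sketch under assumptions that are not part of the notes: the state is scalar, the norm is the absolute value, and $N \sim \mathcal{N}(0, σ^2)$, for which $\EXP[|N|] = σ\sqrt{2/π}$. The function names and the values of $f(s,a)$ are hypothetical. Since $Y$ is deterministic, the only coupling of $(X, Y)$ pairs $X$ with the constant, so the infimum reduces to $\EXP[|X - f(s,a)|]$.

```python
import math
import random


def wasserstein_to_point_mass(samples, point):
    """Empirical W1 distance between the law of `samples` and a point mass.

    With a deterministic Y, the only coupling of (X, Y) pairs X with the
    constant, so the infimum over couplings reduces to E|X - point|.
    """
    return sum(abs(x - point) for x in samples) / len(samples)


def delta_estimate(f_value, sigma, num_samples=200_000, seed=0):
    """Monte Carlo estimate of K(X, Y) for X = f(s,a) + N and Y = f(s,a)."""
    rng = random.Random(seed)
    samples = [f_value + rng.gauss(0.0, sigma) for _ in range(num_samples)]
    return wasserstein_to_point_mass(samples, f_value)


sigma = 0.5  # assumed noise scale, for illustration only
closed_form = sigma * math.sqrt(2.0 / math.pi)  # E|N| for N ~ Normal(0, sigma^2)

# Hypothetical values of f(s,a); the estimate should not depend on them.
estimates = [delta_estimate(f_value, sigma) for f_value in (-3.0, 0.0, 7.5)]

for f_value, est in zip((-3.0, 0.0, 7.5), estimates):
    print(f"f(s,a) = {f_value:5.1f}: estimate = {est:.4f}, closed form = {closed_form:.4f}")
```

The estimates agree with the closed form up to Monte Carlo error for every value of $f(s,a)$, which illustrates why $δ$ does not depend on $(s,a)$.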
9 changes: 7 additions & 2 deletions mdps/lipschitz-mdps.qmd
@@ -440,14 +440,17 @@ As discussed in @exm-lipschitz-inventory, the inventory management example is $(
$$
L_V = \frac{p + \max\{ c_h, c_s \}}{1 - γ}.
$$

Later, in the notes on [model approximation], we show that the bound on the Lipschitz constant is useful to understand the approximation error if we use a policy designed for a model with a slightly different demand distribution.

[model approximation]: ../approx-mdps/model-approximation.qmd#example-inventory

To understand the tightness of this bound, we consider a specific instance of the inventory management problem where the demand is $\text{Exp}(1)$, $c_h = 2$, $c_s = 5$, and $p = 1$. The theoretical maximum value of the Lipschitz constant (for $γ = 0.9$) is
$L_V = 60$. In @fig-lipschitz-animation, we show the animation of this upper bound, in the style of the Wikipedia animation shown at the beginning of this lecture.
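
The arithmetic behind $L_V = 60$ can be checked with a short sketch. It assumes that the maximum in the bound is taken over the two costs $c_h$ and $c_s$, which is the reading consistent with the stated value; the function name is hypothetical.

```python
def lipschitz_bound(p, c_h, c_s, gamma):
    """Upper bound on the Lipschitz constant of the value function.

    Assumes the bound is (p + max(c_h, c_s)) / (1 - gamma).
    """
    return (p + max(c_h, c_s)) / (1.0 - gamma)


# Parameters of the specific instance discussed in the text.
L_V = lipschitz_bound(p=1.0, c_h=2.0, c_s=5.0, gamma=0.9)
print(f"L_V = {L_V:.2f}")
```

With these parameters the numerator is $1 + 5 = 6$ and the denominator is $0.1$, which gives the value of $60$ up to floating-point rounding.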

{{< embed ../julia-examples/inventory-management/inventory-management.ipynb#fig-lipschitz-animation >}}

Note that since the demand is $\text{Exp}(1)$, most of the mass of the demand is in the range $[0,10]$. So, the region of the value function of interest is perhaps $[-20,20]$ or so. We plot a larger region to highlight the fact that the bound on the Lipschitz constant has to capture the Lipschitz constant of the value function over the entire real line.
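
The claim about the mass of the demand can be quantified with the CDF of the exponential distribution, $1 - e^{-x}$ for rate $1$. The helper name below is hypothetical.

```python
import math


def exp_cdf(x, rate=1.0):
    """CDF of an exponential distribution with the given rate."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0


# Probability that an Exp(1) demand falls in [0, 10].
mass_below_10 = exp_cdf(10.0)
print(f"P(demand <= 10) = {mass_below_10:.6f}")
```

The tail beyond $10$ has probability $e^{-10}$, which is below $10^{-4}$.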


:::


@@ -593,4 +596,6 @@ Let $(\ALPHABET S, d_S)$ be a metric space and $s, s' \in \ALPHABET S$.
The material in this section is taken from @Rachelson2010 and @Hinderer2005.

The proof of Lipschitz continuity for the inventory management problem in @exm-lipschitz-inventory is adapted from @Muller1997b.
Later, in the notes on [model approximation], we show that the bound on the Lipschitz constant is useful to understand the approximation error if we use a policy designed for a model with a slightly different demand distribution.

