Let \(x\in\mathbb{R}^{n_x}\), \(p\in\mathbb{R}^{n_p}\), a measure of merit \(f(x):\mathbb{R}^{n_x}\to\mathbb{R}\) and the relationship \(g(x,p)=0\) for a function \(g:\mathbb{R}^{n_x}\times\mathbb{R}^{n_p}\to\mathbb{R}^{n_x}\) whose Jacobian \(g_x\) is nonsingular everywhere. First, \begin{align} d_pf &= d_pf(x(p)) = \partial_xf\,d_px \label{df} \end{align}
Second, \begin{align} g(x,p) = 0 \Rightarrow d_pg = 0 \nonumber \end{align} which holds because \(g=0\) identically, not just at a single point. Now, expanding the total derivative, \begin{align} g_x\,d_px + g_p &= 0 \Rightarrow d_px = -g_x^{-1}g_p \nonumber \end{align} Substituting \(d_px\) in \(\eqref{df}\),
\begin{align} d_pf &= \underset{=\lambda^{\mathsf{T}}}{\underbrace{-\partial_xfg_x^{-1}}}\,g_p \nonumber \end{align}
The term \(-\partial_xfg_x^{-1}\) is a row vector times an \(n_x\times n_x\) matrix, and it should be understood as the solution of a linear system rather than computed through an explicit inverse. Writing
\begin{align} -\partial_x f g_x^{-1} &= \lambda^{\mathsf{T}} \nonumber \end{align} and transposing,
\begin{align} g_x^{\mathsf{T}} \lambda &= -(\partial_x f)^{\mathsf{T}} \label{adjointEq} \end{align}
which is a single linear system in \(n_x\) unknowns, instead of the \(n_p\) systems needed to form \(d_px=-g_x^{-1}g_p\).
In terms of \(\lambda\), Eq. \(\eqref{df}\) becomes \(d_p f = \lambda^{\mathsf{T}}g_p\).
Another derivation, following the tutorial by Bradley [@bradley2010pde], is useful. Define the Lagrangian
\begin{align} \mathcal{L}(x,p,\lambda) &= f(x) + \lambda^{\mathsf{T}}g(x,p) \nonumber \end{align}
As $g(x,p) = 0$ everywhere, $d_p\mathcal{L} = d_pf$ for any $\lambda$, and we may choose $\lambda$ freely. Expanding,
\begin{align} d_p\mathcal{L} &= \partial_xf\,d_px + \lambda^{\mathsf{T}}(g_x\,d_px + g_p) = (\partial_xf + \lambda^{\mathsf{T}}g_x)\,d_px + \lambda^{\mathsf{T}}g_p \nonumber \end{align}
If we choose $\lambda$ so that $\partial_xf + \lambda^{\mathsf{T}}g_x = 0$, which is exactly the adjoint equation $\eqref{adjointEq}$, then we can avoid the need to compute $d_px$, and once again $d_pf = \lambda^{\mathsf{T}}g_p$.
{% include image.html url="https://media.giphy.com/media/Q1aRmd8e90WIw/giphy.gif" description="It will become clear why we are doing these calculations." width="70" height="50" %}
In the second approach, the problem solver must (see the sketch below):

- evaluate $f(x)$,
- solve $g(x,p)=0$ for $x$,
- solve the adjoint equation $\eqref{adjointEq}$ for $\lambda$,
- compute the gradient $d_p f = \lambda^{\mathsf{T}}g_p$.
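To make the recipe concrete, here is a minimal numerical sketch in Python/NumPy. The particular problem $g(x,p)=(A_0+pI)x-b=0$ with $f(x)=c^{\mathsf{T}}x$, and all variable names, are illustrative choices of mine, not anything from the references; the finite-difference comparison at the end is only a sanity check.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A0 = rng.standard_normal((n, n)) + n * np.eye(n)  # keep g_x well conditioned
b = rng.standard_normal(n)
c = rng.standard_normal(n)

def solve_x(p):
    """Solve g(x, p) = (A0 + p*I) x - b = 0 for x."""
    return np.linalg.solve(A0 + p * np.eye(n), b)

def grad_f_adjoint(p):
    """d_p f via the adjoint equation g_x^T lam = -(d_x f)^T."""
    x = solve_x(p)
    g_x = A0 + p * np.eye(n)          # Jacobian of g w.r.t. x
    lam = np.linalg.solve(g_x.T, -c)  # one linear solve, eq. (adjointEq)
    g_p = x                           # d/dp [(A0 + p*I) x - b] at fixed x
    return lam @ g_p                  # d_p f = lam^T g_p

# Sanity check against a central finite difference.
p, eps = 0.3, 1e-6
fd = (c @ solve_x(p + eps) - c @ solve_x(p - eps)) / (2 * eps)
print(grad_f_adjoint(p), fd)  # the two numbers should match to ~1e-8
```

Note that the adjoint route needs one extra linear solve regardless of $n_p$, whereas forming $d_px=-g_x^{-1}g_p$ would need $n_p$ solves.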
Consider the problem
$$ \begin{align} \underset{p}{\hbox{min}}\; F(x,p) &= \int^T_0 f(x,p,t)\,dt \label{mainOptProblem}\\ \hbox{subject to,}\quad &\begin{cases} h(x,\dot{x},p,t) = 0 \\ g(x(0),p) = 0 \end{cases}\nonumber \end{align} $$
where,
$p$ is a vector of unknown parameters, $x$ is a vector-valued function of time, $h(x,\dot{x},p,t)=0$ is an ODE in implicit form, and $g(x(0),p)=0$ gives the initial conditions.
A gradient-based optimization algorithm requires the total derivative
$$ \begin{align} d_p F &= \int_0^T \partial_x f\, d_p x + \partial_p f\; dt \nonumber \end{align} $$
but calculating $d_p x$ directly is expensive: it amounts to solving $n_p$ sensitivity equations alongside the ODE.
Let
$$ \begin{align} \mathcal{L} &= \int_0^T f(x,p,t) + \lambda^{\mathsf{T}}h(x,\dot{x},p,t)\; dt + \mu^{\mathsf{T}}g(x(0),p) \nonumber \end{align} $$
be the Lagrangian corresponding to the optimization problem $\eqref{mainOptProblem}$,
where the vector of Lagrange multipliers $\lambda(t)$ is a function of time and $\mu$ is a vector of multipliers associated with the initial conditions.
So, one can compute
$$ \begin{align} d_p\mathcal{L} &= \int_0^T \partial_x f\, d_px + \partial_p f + \lambda^{\mathsf{T}}(\partial_x h\, d_px + \partial_{\dot{x}}h\, d_p\dot{x} + \partial_p h)\; dt + \mu^{\mathsf{T}}(g_{x(0)}\, d_px(0) + g_p) \nonumber \end{align} $$
and $d_p\mathcal{L} = d_pF$, which comes from the fact that $h(x,\dot{x},p,t)=0$ and $g(x(0),p)=0$ everywhere, so the multiplier terms add zero.
Integrating by parts the term involving $d_p\dot{x}$,
$$ \begin{align} \int_0^T \lambda^{\mathsf{T}}\partial_{\dot{x}}h\, d_p\dot{x}\; dt &= \lambda^{\mathsf{T}}\partial_{\dot{x}}h\, d_px\bigg|_{0}^{T} - \int_0^T(\dot{\lambda}^{\mathsf{T}}\partial_{\dot{x}}h + \lambda^{\mathsf{T}}d_t\partial_{\dot{x}}h)\,d_px\; dt \nonumber \end{align} $$
and substituting this back into $d_p\mathcal{L}$,
$$ \begin{align} d_p\mathcal{L} = & \int_0^T [\partial_x f + \lambda^{\mathsf{T}}(\partial_x h - d_t\partial_{\dot{x}}h) - \dot{\lambda}^{\mathsf{T}}\partial_{\dot{x}}h]\,d_px + \partial_p f + \lambda^{\mathsf{T}}\partial_p h\; dt \nonumber\\ & + \lambda^{\mathsf{T}}\partial_{\dot{x}}h\,d_px\bigg|_{T} + (-\lambda^{\mathsf{T}}\partial_{\dot{x}}h + \mu^{\mathsf{T}}g_{x(0)})\bigg|_{0}\,d_px(0) + \mu^{\mathsf{T}}g_p \nonumber \end{align} $$
Setting $\lambda(T)=0$, choosing
$$ \begin{align} \mu^{\mathsf{T}} &= \lambda^{\mathsf{T}}\partial_{\dot{x}}h\bigg|_{0}\,g^{-1}_{x(0)} \nonumber \end{align} $$
and requiring $\lambda$ to satisfy
$$ \begin{align} \dot{\lambda}^{\mathsf{T}}\partial_{\dot{x}}h &= \partial_x f + \lambda^{\mathsf{T}}(\partial_x h - d_t\partial_{\dot{x}}h) \label{avoidpx} \end{align} $$
we can avoid calculating $d_px$: these three choices kill, respectively, the boundary term at $t=T$, the term in $d_px(0)$, and the term in $d_px$ under the integral. The procedure is then (a numerical sketch follows the list):
- Integrate $h(x,\dot{x},p,t)=0$ from $t=0$ to $T$, with initial conditions $g(x(0),p)=0$.
- Solve $\eqref{avoidpx}$ for $\lambda$ from $t=T$ to $0$, with $\lambda(T)=0$.
- Set
$$ \begin{align} d_p F :=& \int_0^T \partial_p f + \lambda^{\mathsf{T}}\partial_p h\; dt + \lambda^{\mathsf{T}}\partial_{\dot{x}}h\bigg|_{0}\,g^{-1}_{x(0)}\,\partial_p g \nonumber \end{align} $$
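As a concrete check, here is a small Python/SciPy sketch of this recipe. The ODE $\dot{x}=-px$ (so $h=\dot{x}+px$), the merit $f=x^2$, and the fixed initial condition $x(0)=x_0$ (so $\partial_p g = 0$ and the boundary term drops) are toy choices of mine, not from any reference. For this $h$ we have $\partial_{\dot{x}}h = 1$ and $d_t\partial_{\dot{x}}h = 0$, so $\eqref{avoidpx}$ reduces to $\dot{\lambda} = 2x + p\lambda$.

```python
# Adjoint gradient for F(p) = int_0^T x^2 dt with xdot = -p*x, x(0) = x0.
import numpy as np
from scipy.integrate import solve_ivp

x0, p, T = 1.0, 0.7, 2.0

def F(q):
    """Objective: carry int x^2 dt as an extra state during the forward solve."""
    sol = solve_ivp(lambda t, y: [-q * y[0], y[0] ** 2],
                    (0.0, T), [x0, 0.0], rtol=1e-10, atol=1e-12)
    return sol.y[1, -1]

# 1) Forward pass: integrate h = 0 and keep a dense interpolant for x(t).
fwd = solve_ivp(lambda t, y: [-p * y[0]], (0.0, T), [x0],
                dense_output=True, rtol=1e-10, atol=1e-12)
x = lambda t: fwd.sol(t)[0]

# 2) Backward pass: lam' = 2*x + p*lam with lam(T) = 0, integrated from T to 0,
#    accumulating the gradient integrand lam * d_p h = lam * x as a second state.
bwd = solve_ivp(lambda t, y: [2 * x(t) + p * y[0], y[0] * x(t)],
                (T, 0.0), [0.0, 0.0], rtol=1e-10, atol=1e-12)

# 3) d_p F = int_0^T lam * x dt; integrating from T down to 0 flips the sign,
#    and the initial-condition term vanishes because d_p g = 0 here.
dF_adjoint = -bwd.y[1, -1]

eps = 1e-6
dF_fd = (F(p + eps) - F(p - eps)) / (2 * eps)  # central finite difference
print(dF_adjoint, dF_fd)  # the two values should agree closely
```

The backward solve needs $x(t)$ from the forward pass, which is why the dense interpolant is stored; for larger problems this storage is usually handled with checkpointing.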
[@bradley2010pde]: https://cs.stanford.edu/~ambrad/adjoint_tutorial.pdf "Bradley, Andrew M., 'PDE-constrained optimization and the adjoint method' (2010)."