\documentclass[12pt]{article}
\usepackage{fullpage,marginnote,caption}
\usepackage{amsfonts}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{amsthm}
\usepackage{graphicx}
\usepackage{framed}
\usepackage{hyperref}
\setlength{\parskip}{\baselineskip}%
\setlength{\parindent}{0pt}%
\title{EE 127/227A Study Guide / Tips and Tricks}
\author{Joshua Achiam}
\begin{document}
\input{exercises_def.tex}
\newtheorem{example}{Example}
\theoremstyle{remark}
\newtheorem*{myremark}{Remark}
\newcommand{\Span}[1]{\text{span}(\{#1\})}
\maketitle
\section{Vector Concepts}
\subsection{Linear Independence}
A set of vectors $\{u_1, ..., u_n\}$, with $u_i \in \Real{m}$ for all $i$, is \textbf{linearly independent} if and only if no $u_i$ can be expressed as a linear combination of the other vectors in the set.
One way to check for linear independence is as follows: consider the sum
%
\begin{equation*}
y = \sum_{i=1}^n \alpha_i u_i, \;\; \forall i, \alpha_i \in \Real{}.
\end{equation*}
%
First, note that $y$ is a vector in $\Real{m}$. If we can pick coefficients $\alpha_i$, not all equal to zero, such that $y = 0$, then the set is \textbf{not linearly independent}. When vectors are not linearly independent, we call them \textbf{linearly dependent}. To state that symbolically:
%
\begin{equation*}
\exists \{\alpha_1, ..., \alpha_n\} \text{ not all equal to 0 } : \sum_{i=1}^n \alpha_i u_i = 0 \;\;\; \Longleftrightarrow \;\;\; \text{ the vectors } u_1, ..., u_n \text{ are linearly dependent.}
\end{equation*}
\begin{example}Consider the vectors
%
\begin{equation*}
u_1 = \left[ \begin{matrix}
1 \\ 0 \\ 0
\end{matrix}\right], \;\;\; u_2 = \left[ \begin{matrix}
0 \\ 1 \\ 0
\end{matrix}\right], \;\;\; u_3 = \left[ \begin{matrix}
1 \\ 1 \\ 0
\end{matrix}\right].
\end{equation*}
%
The set of vectors $\{u_1, u_2, u_3\}$ is \textbf{linearly dependent} because we can write
%
\begin{equation*}
u_3 = u_1 + u_2;
\end{equation*}
%
that is, we can write $0 = u_1 + u_2 - u_3$.
The set of vectors $\{u_1, u_2\}$ is \textbf{linearly independent} because the only way to get $0 = \alpha_1 u_1 + \alpha_2 u_2$ is if we pick $\alpha_1 = \alpha_2 = 0$.
\end{example}
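If you want to check linear independence numerically, one convenient test is the matrix rank: stack the vectors as columns of a matrix, and the set is linearly independent exactly when the rank equals the number of vectors. A minimal sketch of this check for the example above, assuming \texttt{numpy} is available:
\begin{verbatim}
import numpy as np

# Columns are the vectors u1, u2, u3 from the example above.
U = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [0., 0., 0.]])

# A set of column vectors is linearly independent exactly when the rank
# of the matrix they form equals the number of vectors.
print(np.linalg.matrix_rank(U))         # 2, so {u1, u2, u3} is dependent
print(np.linalg.matrix_rank(U[:, :2]))  # 2, so {u1, u2} is independent
\end{verbatim}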
\subsection{Orthogonality and Orthonormality}
Two vectors $x,y$ are said to be \textbf{orthogonal} if their inner product is zero: $x^T y = 0$.
A set of vectors $\{u_1,..., u_n\}$ is said to be orthogonal if $\forall i \neq j, u_i^T u_j = 0$.
A set of vectors $\{u_1, ..., u_n\}$ is said to be \textbf{orthonormal} if it is orthogonal, and the vectors all have unit magnitude---that is, if
%
\begin{equation*}
u_i^T u_j = \left\{ \begin{array}{ll}
1 & i = j, \\
0 & i \neq j.
\end{array}\right.
\end{equation*}
\subsection{Span}
The \textbf{span} of a set of vectors $\{u_1, ..., u_n\}$ with $u_i \in \Real{m}$ for all $i$ is the set of all the linear combinations of those vectors. Specifically,
%
\begin{equation*}
\text{span}(\{u_1, ..., u_n\}) = \left\{ \sum_{i=1}^n \alpha_i u_i \; : \; \forall i, \alpha_i \in \Real{}\right\}.
\end{equation*}
\pagebreak
\section{Matrix Concepts}
\subsection{Dyads}
A \textbf{dyad} is a matrix equal to the outer product of two vectors. The matrix
%
\begin{equation*}
A = uv^T \;\;\; u \in\Real{m}, v \in \Real{n}
\end{equation*}
%
is a dyad. It has dimensions $A \in \Real{m \times n}$, and components $A_{ij} = u_i v_j$.
When we matrix multiply by a vector $x \in \Real{n}$, we get the following:
%
\begin{equation*}
Ax = uv^T x = (v^T x) u.
\end{equation*}
To prove that statement, consider the following derivation for the $i$th component of the vector $Ax$:
%
\begin{align*}
(Ax)_i &= \sum_{j=1}^n A_{ij} x_j \\
&= \sum_{j=1}^n u_i v_j x_j \\
&= u_i \sum_{j=1}^n v_j x_j \\
\therefore (Ax)_i &= u_i (v^T x).
\end{align*}
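A quick numerical sanity check of the identity $Ax = (v^T x)u$, as a minimal sketch with randomly generated $u$, $v$, and $x$ (assuming \texttt{numpy} is available):
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
u = rng.standard_normal(4)   # u in R^m with m = 4
v = rng.standard_normal(3)   # v in R^n with n = 3
x = rng.standard_normal(3)

A = np.outer(u, v)           # the dyad A = u v^T, shape (4, 3)

# A x should equal (v^T x) u.
print(np.allclose(A @ x, (v @ x) * u))   # True
\end{verbatim}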
\subsection{Orthogonal Matrix}
A matrix is said to be an \textbf{orthogonal matrix} if it is square, real-valued, and \textbf{its columns are orthonormal}. If $U$ is an orthogonal matrix, then $U^T U = U U^T = I$, where $I$ is the identity matrix.
\subsection{Trace}
The trace of a square matrix $A \in \Real{n \times n}$ is the sum of its diagonal elements,
%
\begin{equation*}
\tr(A) = \sum_{i=1}^n A_{ii}.
\end{equation*}
The trace of a square matrix is also equal to the sum of its eigenvalues,
%
\begin{equation*}
\tr(A) = \sum_{i=1}^n \lambda_i,
\end{equation*}
%
where $\lambda_1, ..., \lambda_n$ are the eigenvalues of $A$.
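This is easy to verify numerically for a random matrix (a minimal sketch, assuming \texttt{numpy} is available; for a general real matrix the eigenvalues may be complex, but their imaginary parts cancel in the sum):
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))

# The trace equals the sum of the eigenvalues.
print(np.isclose(np.trace(A), np.linalg.eigvals(A).sum().real))   # True
\end{verbatim}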
\subsection{Invertibility}
A square matrix is invertible if and only if any of the following equivalent conditions are met:
%
\begin{itemize}
\item it is full rank,
\item all of its eigenvalues are nonzero,
\item its determinant is nonzero,
\item the columns of the matrix are linearly independent.
\end{itemize}
To be clear: if \textit{any} of these is true, \textit{all} of them are true.
\pagebreak
\section{Range and Nullspace}
\subsection{Range}
The \textbf{range} of a matrix $A \in \Real{m\times n}$ is the set $\range(A) = \{Ax \; | \; x \in \Real{n}\}$. The \textbf{rank} of a matrix is equal to the dimension of its range: $\rank(A) = \dim \range(A)$.
The range can equivalently be defined as the \textbf{span of the columns} of $A$, also called the column space of $A$. To see this, let $a_1, ..., a_n$ be the column vectors of $A$. That is, write
%
\begin{equation*}
A = \left[ \begin{matrix}
\vline & & \vline \\
a_1 & \cdots & a_n \\
\vline & & \vline
\end{matrix}\right].
\end{equation*}
%
Then, for all $x \in \Real{n}$, you have
%
\begin{equation*}
Ax = \left[ \begin{matrix}
\vline & & \vline \\
a_1 & \cdots & a_n \\
\vline & & \vline
\end{matrix}\right] \left[ \begin{matrix}
x_1 \\
\vdots \\
x_n
\end{matrix}\right] = \sum_{i=1}^n x_i a_i.
\end{equation*}
%
Thus, $\range(A) = \{Ax | x \in \Real{n} \} = \{\sum_{i=1}^n x_i a_i \; | \; \forall i, x_i \in \Real{}\} = \text{span}(\{a_1, ..., a_n\})$.
\subsection{Nullspace}
The \textbf{null space} of a matrix $A \in \Real{m\times n}$ is the set $\nulsp(A) = \{x \; | \; x \in \Real{n}, \; Ax = 0\}$. The \textbf{nullity} of a matrix is equal to the dimension of its nullspace: $\text{nullity}(A) = \dim \nulsp(A)$.
\subsection{Rank-Nullity Theorem}
\begin{theorem}[Rank-Nullity Theorem] For any matrix $A \in \Real{m \times n}$, the sum of the rank and nullity is equal to $n$. That is,
%
\begin{equation*}
\rank(A) + \text{\textnormal{nullity}}(A) = \dim \range(A) + \dim \nulsp(A) = n.
\end{equation*}
\end{theorem}
\subsection{Examples}
\begin{example}[Simple Ranges and Null Spaces]
Consider the following simple matrices:
%
\begin{align*}
(a) \;\; & A = \left[ \begin{matrix}
\lambda_1 & 0 \\
0 & \lambda_2
\end{matrix} \right], &&& (b) \;\; & A = \left[ \begin{matrix}
\lambda_1 & 0 \\
0 & 0
\end{matrix} \right], &&& (c) \;\; & A = uv^T, \;\; u \in \Real{m}, v \in \Real{n},
\end{align*}
%
where $\lambda_1, \lambda_2 \neq 0$, and $u,v \neq 0$. What are the ranges and nullspaces of these matrices?
First, consider case (a). Here, every vector in $\Real{2}$ is in the range space! Direct proof: let $y$ be any vector in $\Real{2}$. Then for $x = [ \lambda_1^{-1} y_1, \lambda_2^{-1} y_2]^T$, we have $y = Ax$, so $y$ is in the range space. Thus, $\range(A) = \Real{2}$.
As for the nullspace: for any $x$, we have $y = Ax = [\lambda_1 x_1, \lambda_2 x_2]^T$. Because $\lambda_1, \lambda_2 \neq 0$, we can only get $y=0$ if both $x_1$ and $x_2$ are equal to $0$. Thus, $\nulsp(A) = \{0\}$.
Consider case (b). We can immediately see that any vector proportional to $e_2 = [0, 1]^T$ is in the null space. Proof: let $v = \alpha e_2$, where $\alpha \in \Real{}$ is any scalar coefficient. Then $Av = 0$. Thus, $\nulsp(A) = \text{span}(\{e_2\})$. As for the range of $A$: let $x = [x_1, x_2]^T$. Observe that $Ax = [\lambda_1 x_1, 0]^T$. So for any $y = [y_1, 0]^T$, we can pick $x = [\lambda_1^{-1} y_1, 0]^T$, and then $y = Ax$, so $y \in \range(A)$. From this, we find that $\range(A) = \text{span}(\{e_1\})$, where $e_1 = [1, 0]^T$.
For case (c), we can compute the range space very easily:
%
\begin{align*}
\range(A) &= \{ Ax \; | \; x \in \Real{n}\} \\
&= \{ uv^T x \; | \; x \in \Real{n}\} \\
&= \{ (v^T x) u \; | \; x \in \Real{n}\} \\
&\overset{\dagger}{=} \text{span}(\{u\}).
\end{align*}
%
$^\dagger$ To see that last step, let $y = \alpha u$ be a vector in the span of $\{u\}$. For $x = \alpha v / \|v\|_2^2$, we have $Ax = uv^T (\alpha v / \|v\|_2^2) = \alpha u$. Thus, $\text{span}(\{u\}) \subseteq \range(A)$. From the third step, we see that every element in $\range(A)$ is in the span of $\{u\}$, because it has the form $(v^T x) u$ for some $x$; thus $\range(A) \subseteq \text{span}(\{u\})$. This proves the equality.
The nullspace for (c) is the set of all vectors orthogonal to $v$: $\nulsp(A) = \{x \in \Real{n} \; | \; v^T x = 0\}$. You can see this by noting that if $v^T x = 0$, then $Ax = uv^T x = (v^T x) u = 0$.
\end{example}
\pagebreak
\section{Fundamental Theorem of Linear Algebra}
\begin{theorem}[Fundamental Theorem of Linear Algebra]For any matrix $A \in \Real{m\times n}$,
%
\begin{equation*}
\nulsp(A) \overset{\perp}{\oplus} \range(A^T) = \Real{n}.
\end{equation*}
\end{theorem}
What does this theorem tell us? What does it mean? To explain this, we first have to define the notation. The $\oplus$ symbol means \textbf{direct sum}. Let $X,Y \subseteq \Real{n}$ be two vector subspaces. The space $Z$ is said to be the direct sum of $X$ and $Y$ if and only if every vector $z \in Z$ can be written as a sum $z = x + y$ with $x \in X$, $y \in Y$, where the choice of $x$ and $y$ is \textit{unique}; also, every vector of the form $x+y$ must be in $Z$. So, when the uniqueness condition is satisfied,
%
\begin{equation*}
X \oplus Y = \{x+y | x \in X, y \in Y\}.
\end{equation*}
The notation $\overset{\perp}{\oplus}$ means that the two subspaces we are direct-summing are \textit{orthogonal}: $\nulsp(A) \perp \range(A^T)$. This means that $\forall x \in \nulsp(A), \forall y \in \range(A^T), x \perp y$ (or equivalently, $x^T y = 0$).
\subsection{How to Apply Fundamental Theorem of Linear Algebra}
You will use this theorem primarily for one of two reasons:
%
\begin{itemize}
\item to get the range of $A$ when you only know the null space of $A$, or to get the null space when you only know the range,
\item or as a proof step, when you have to recognize that $x \in \nulsp(A) \implies x \perp \range(A^T)$, or some equivalent statement.
\end{itemize}
\subsection{Getting the range or nullspace when you only know the other}
\textbf{Correct way}: Suppose that $V = \{v_1, ..., v_r\}$ is an orthonormal basis for $\range(A^T)$---that is, $\text{span}(V) = \range(A^T)$. Because $\range(A^T) \overset{\perp}{\oplus} \nulsp(A) = \Real{n}$, any linearly independent vectors $V' = \{v_{r+1}, ..., v_n\}$ that are all perpendicular to the vectors in $V$ constitute a basis for $\nulsp(A)$, so $\nulsp(A) = \text{span}(V')$.
\textbf{Incorrect way}: $\nulsp(A) = \Real{n} \setminus \range(A^T)$. \textbf{This is totally wrong. Do not write this.}
\vspace{10pt}
\begin{example}[Why the incorrect way is incorrect] Consider the matrix
%
\begin{equation*}
A = \left[ \begin{matrix}
1 & 0 \\
0 & 0
\end{matrix}\right].
\end{equation*}
%
We can see that $\range(A^T) = \Span{e_1}$, where $e_1 = [1, 0]^T$. The nullspace is $\nulsp(A) = \Span{e_2}$, where $e_2 = [0, 1]^T$. But what is the set $\Real{2} \setminus \range(A^T)$?
%
\begin{align*}
\Real{2} \setminus \range(A^T) &= \{x \in \Real{2} | x \notin \range(A^T)\} \\
&= \{x \in \Real{2} | x \not\propto e_1\}
\end{align*}
%
That is, $\Real{2} \setminus \range(A^T)$ is the entire plane, minus the line along the $e_1$-axis. This is \textit{not} equivalent to the nullspace, which is \textit{only} the line along the $e_2$-axis.
\end{example}
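To make the ``correct way'' concrete: one practical source of the bases $V$ and $V'$ is the full SVD of $A$, whose right singular vectors form an orthonormal basis of $\Real{n}$ split between $\range(A^T)$ and $\nulsp(A)$. A minimal numerical sketch for the matrix from the example above, assuming \texttt{numpy} is available:
\begin{verbatim}
import numpy as np

A = np.array([[1., 0.],
              [0., 0.]])

# Full SVD: the rows of Vt are an orthonormal basis of R^n; the first r
# span range(A^T), and the remaining n - r span null(A).
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-12))
null_basis = Vt[r:].T            # columns span null(A)

print(null_basis)                        # the e2 direction, up to sign
print(np.allclose(A @ null_basis, 0.0))  # True
\end{verbatim}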
\pagebreak
\section{Eigenvalue Decomposition and Singular Value Decomposition}
\subsection{Eigendecomposition}
For real, symmetric matrices $A \in \Sym{n}$, $A$ has an eigendecomposition
%
\begin{equation*}
A = V\Lambda V^T = \sum_{i=1}^n \lambda_i v_i v_i^T,
\end{equation*}
%
where $V$ is the matrix whose columns are the orthonormal eigenvectors $v_i$. To recap on matrix sizes:
%
\begin{itemize}
\item $V = [v_1, ..., v_n] \in \Real{n \times n}$
\item $\Lambda = \diag{\lambda_1, ..., \lambda_n} \in \Real{n \times n}$.
\end{itemize}
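As a numerical sanity check of the eigendecomposition (a minimal sketch, assuming \texttt{numpy} is available; the symmetric matrix is made up by symmetrizing a random one):
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((3, 3))
A = (B + B.T) / 2                  # a real symmetric matrix

# eigh is specialized for symmetric matrices: eigenvalues in lam,
# orthonormal eigenvectors in the columns of V.
lam, V = np.linalg.eigh(A)

print(np.allclose(A, V @ np.diag(lam) @ V.T))   # A = V Lambda V^T
print(np.allclose(V.T @ V, np.eye(3)))          # columns are orthonormal
\end{verbatim}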
\subsection{SVD}
\subsubsection{Compact Form}
When we have a nonsquare matrix, we do not have an eigendecomposition, but we \textit{do} have something similar: the singular value decomposition (SVD). Consider a matrix $A \in \Real{m \times n}$ with rank $r$. It has an SVD given by
%
\begin{equation*}
A = U \Sigma V^T = \sum_{i=1}^r \sigma_i u_i v_i^T,
\end{equation*}
%
where $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0$, $\{u_1, ..., u_r\}$ are orthonormal vectors in $\Real{m}$, and $\{v_1, ..., v_r\}$ are orthonormal vectors in $\Real{n}$. On matrix sizes:
%
\begin{itemize}
\item $U = [u_1, ..., u_r] \in \Real{m \times r}$,
\item $\Sigma = \diag{\sigma_1, ..., \sigma_r} \in \Real{r \times r}$,
\item $V = [v_1, ..., v_r] \in \Real{n \times r}$.
\end{itemize}
%
Note that $U^T U = V^T V = I_r$. However, also note that when $m > r$ and $n > r$, $UU^T \neq I_{m}$, and $VV^T \neq I_n$. (To see this, recognize that $U$ and $V$ are rank $r$, so $UU^T$ and $VV^T$ can be at most rank $r$, but the identity matrix $I_m$ has rank $m > r$, and $I_n$ has rank $n > r$.)
The vectors $u_1, ..., u_r$ are called the \textbf{left singular vectors} of $A$, and the vectors $v_1, ..., v_r$ are called the \textbf{right singular vectors} of $A$.
Important note: \textbf{singular values must be strictly positive}, unlike eigenvalues, which can be positive, zero, or negative.
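You can check these facts numerically with a low-rank test matrix (a minimal sketch, assuming \texttt{numpy} is available; the rank-2 matrix is made up for illustration):
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
# A rank-2 matrix in R^{5 x 4}, built as a product of thin factors.
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = int(np.sum(s > 1e-10))
U, s, Vt = U[:, :r], s[:r], Vt[:r]    # compact SVD: keep r = 2 terms

print(np.allclose(A, U @ np.diag(s) @ Vt))   # A = U Sigma V^T
print(np.allclose(U.T @ U, np.eye(r)))       # U^T U = I_r: True
print(np.allclose(U @ U.T, np.eye(5)))       # U U^T != I_m: False
\end{verbatim}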
\subsubsection{Expanded Form}
You can also express the SVD as
%
\begin{equation}
A = \tilde{U} \tilde{\Sigma} \tilde{V}^T,
\end{equation}
%
where
%
\begin{itemize}
\item $\tilde{U} = [\begin{matrix} U & U_+ \end{matrix}] \in \Real{m \times m}$, where $U_+ \in \Real{m \times (m - r)}$ is a matrix chosen so that $\tilde{U}$ is orthogonal ($U_+$ has columns $u_{r+1}, ..., u_m$ which complete a basis with $u_1, ..., u_r$, so that $\tilde{U} \tilde{U}^T = \tilde{U}^T \tilde{U} = I_m$).
\item $\tilde{\Sigma} = \left[\begin{matrix}\Sigma & 0_{r \times (n-r)} \\ 0_{(m-r)\times r} & 0_{(m-r)\times(n-r)} \end{matrix}\right] \in \Real{m \times n}$,
\item $\tilde{V} = [\begin{matrix} V & V_+ \end{matrix}] \in \Real{n \times n}$, where $V_+ \in \Real{n \times (n - r)}$ is a matrix chosen so that $\tilde{V}$ is orthogonal ($V_+$ has columns $v_{r+1}, ..., v_n$ which complete a basis with $v_1, ..., v_r$, so that $\tilde{V}\tilde{V}^T = \tilde{V}^T \tilde{V} = I_n$).
\end{itemize}
\subsection{Recovering $U$ or $V$ given $A$, $\Sigma$, and the other}
Starting from the $A = U \Sigma V^T$ definition above, we can derive some other relationships which would allow us to obtain $U$ or $V$, if we had the other matrices.
%
\begin{align*}
AV\Sigma^{-1} &= U \Sigma V^T V \Sigma^{-1} \\
&\overset{(a)}{=} U \Sigma \Sigma^{-1} \\
\therefore AV\Sigma^{-1} &= U
\end{align*}
%
where $(a)$ obtains from the fact that $V^T V = I_r$. Check the dimensions of these matrix multiplications---why did we have to right-multiply by $V$?
Also,
%
\begin{align*}
\Sigma^{-1}U^T A &= \Sigma^{-1} U^T U \Sigma V^T \\
&= \Sigma^{-1}\Sigma V^T \\
\therefore \Sigma^{-1} U^T A &= V^T.
\end{align*}
%
Transposing both sides (and noting that $\Sigma^{-1}$ is diagonal, hence symmetric) gives $A^T U \Sigma^{-1} = V$.
\subsection{Recovering $u_i$ or $v_i$ given $A$, $\sigma_i$, and the other}
We can derive vector relationships that are directly analogous to the matrix relationships in the previous subsection.
\begin{align*}
\frac{1}{\sigma_i} Av_i &= \frac{1}{\sigma_i} \left( \sum_{j=1}^r \sigma_j u_j v_j^T \right) v_i \\
&= \frac{1}{\sigma_i} \sum_{j=1}^r \sigma_j u_j \left( v_j^T v_i \right) \\
&\overset{(a)}{=} \frac{1}{\sigma_i} \sum_{j=1}^r \sigma_j u_j \delta_{ij} \\
&= \frac{1}{\sigma_i} \sigma_i u_i \\
\therefore \frac{1}{\sigma_i} Av_i &= u_i.
\end{align*}
%
The step at $(a)$ comes from recalling that the $v_i$ are orthonormal vectors: $v_i^T v_j = 0$ if $i \neq j$, and $v_i^T v_j = 1$ if $i = j$. We summarize this with the symbol $\delta_{ij}$\footnote{This is called the Kronecker delta.}, which is equal to $1$ when $i=j$ and $0$ otherwise.
By the same strategy, we can derive the relationship
%
\begin{equation*}
\frac{1}{\sigma_i} A^T u_i = v_i.
\end{equation*}
You should completely understand the proof above, and be able to synthesize the proof for the latter statement by yourself. On a test, if you have to recover $v_i$ or $u_i$ given $A$, $\sigma_i$, and the other, you should prove that your answer is correct. The proof strategy here is a valid way to do that.
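A quick numerical check of both relationships (a minimal sketch, assuming \texttt{numpy} is available; the random matrix is generically full rank, so every $\sigma_i > 0$):
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Check u_i = (1/sigma_i) A v_i and v_i = (1/sigma_i) A^T u_i
# for each singular triple (sigma_i, u_i, v_i).
for i in range(len(s)):
    u_i, v_i, sigma_i = U[:, i], Vt[i], s[i]
    print(np.allclose(A @ v_i / sigma_i, u_i),
          np.allclose(A.T @ u_i / sigma_i, v_i))   # True True
\end{verbatim}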
\subsection{Connecting SVD to Eigendecomposition}
Consider an arbitrary nonsquare matrix $A \in \Real{m \times n}$ with SVD $A = \sum_{i=1}^r \sigma_i u_i v_i^T$. We will show that the singular values of $A$ and the eigenvalues of $A^T A$ (and $AA^T$) are connected.
\begin{fact}
The eigenvalues of the matrix $A^T A$ (and $AA^T$) are equal to the squares of the singular values of $A$. That is, if $\sigma_i$ is a singular value of $A$, then $\lambda_i = \sigma_i^2$ is an eigenvalue of $A^T A$ (and $AA^T$).
\end{fact}
Note that
%
\begin{align*}
A &= \tilde{U} \tilde{\Sigma} \tilde{V}^T \\
\implies A^T A &= \tilde{V} \tilde{\Sigma}^T \tilde{U}^T \tilde{U} \tilde{\Sigma} \tilde{V}^T\\
&= \tilde{V} \tilde{\Sigma}^T \tilde{\Sigma} \tilde{V}^T \\
&= \tilde{V} \left[\begin{matrix}\Sigma & 0_{r \times (m-r)} \\ 0_{(n-r)\times r} & 0_{(n-r)\times(m-r)} \end{matrix}\right] \left[\begin{matrix}\Sigma & 0_{r \times (n-r)} \\ 0_{(m-r)\times r} & 0_{(m-r)\times(n-r)} \end{matrix}\right] \tilde{V}^T \\
&= \tilde{V} \left[\begin{matrix}\Sigma^2 & 0_{r \times (n-r)} \\ 0_{(n-r)\times r} & 0_{(n-r)\times(n-r)} \end{matrix}\right] \tilde{V}^T.
\end{align*}
%
Observe that this specifies an eigendecomposition for $A^TA$, with eigenvalues $\sigma_i^2$! If you don't like the matrix form, let's do it another way. We derive the following:
%
\begin{align*}
A^T A &= \left(\sum_{i=1}^r \sigma_i u_i v_i^T\right)^T \left(\sum_{j=1}^r \sigma_j u_j v_j^T\right) \\
&= \sum_{i=1}^r \sum_{j=1}^r \sigma_i \sigma_j v_i u_i^T u_j v_j^T \\
&= \sum_{i=1}^r \sum_{j=1}^r \sigma_i \sigma_j v_i (u_i^T u_j) v_j^T \\
&\overset{(a)}{=} \sum_{i=1}^r \sum_{j=1}^r \sigma_i \sigma_j v_i \delta_{ij} v_j^T \\
&= \sum_{i=1}^r \sigma_i^2 v_i v_i^T
\end{align*}
%
This also proves the result. For (a), recall that the $u_i$ are orthonormal. The symbol $\delta_{ij}$, called the Kronecker delta, is defined so that $\delta_{ij} = 1$ if $i=j$ and $\delta_{ij} = 0$ otherwise.
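Numerically, the connection looks like this (a minimal sketch, assuming \texttt{numpy} is available; the matrix is made up):
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

sigma = np.linalg.svd(A, compute_uv=False)   # singular values, descending
lam = np.linalg.eigvalsh(A.T @ A)[::-1]      # eigenvalues of A^T A, descending

print(np.allclose(sigma**2, lam))            # True
\end{verbatim}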
There is another connection worth pointing out.
\begin{fact}
Consider a square, symmetric matrix $A \in \Sym{n}$. The absolute values of the nonzero eigenvalues of $A$ are its singular values.
\end{fact}
To see this, note the following:
%
\begin{align*}
A &= \sum_{i=1}^n \lambda_i v_i v_i^T \\
&= \sum_{i=1}^n |\lambda_i| \left( \sign(\lambda_i) v_i \right) v_i^T.
\end{align*}
%
Define $u_i \doteq \sign(\lambda_i) v_i$, and choose the ordering so that $|\lambda_1| \geq |\lambda_2| \geq \cdots$. Choose $\sigma_i = |\lambda_i|$ for $i=1, ..., r$, where $r$ is the rank of $A$ (equivalently, the index of the last nonzero eigenvalue in this ordering). This is an SVD for $A$. (Recall that singular values have to be strictly positive.)
\pagebreak
\section{The Most Important Equation in Linear Algebra: $Ax = b$}
Here, we consider one of the most important questions in linear algebra that you need to know the answer to: given a matrix $A \in \Real{m \times n}$ and a vector $b \in \Real{m}$, what choices of $x$ solve $Ax = b$?
\subsection{Case 1: $A$ is invertible (One Solution)}
Suppose $m=n$ and $A$ is full rank, so $A$ is invertible. Then there is a single, unique solution, and it is
%
\begin{equation*}
x = A^{-1} b.
\end{equation*}
\subsection{Case 2: $b \notin \range(A)$ (Zero Solutions)}
The statement $b \in \range(A)$ means that there is some $x$ such that $b = Ax$. If $b \notin \range(A)$, \textbf{then there is no $x$ such that $Ax=b$}.
\subsection{Case 3: $A$ is not invertible, but $b \in \range(A)$ (Infinitely Many Solutions)}
In this case, there is at least one $x_0 \in \Real{n}$ such that $Ax_0 = b$ (because $b \in \range(A)$), but $A$ is not invertible, so the solution is not \textit{unique}.
Recall that (for square $A$) not being invertible means that $A$ has at least one zero eigenvalue, so the nullspace of $A$ is nontrivial: it contains nonzero vectors. Thus, for all $v \in \nulsp(A)$, $x_0 + v$ is a solution. To see this, note that
%
\begin{align*}
\forall v \in \nulsp(A), A(x_0 + v) &= Ax_0 + Av \\
&= b + 0 \\
&= b.
\end{align*}
How do we find an $x_0$ that satisfies the equation $Ax=b$ in the first place, though? We can use the SVD to accomplish this. If $A = U\Sigma V^T$ is an SVD of $A$, then a pseudo-inverse of $A$ can be obtained as $A^\dagger = V\Sigma^{-1} U^T$. Picking $x_0 = A^{\dagger} b$ gives us a solution.
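A minimal numerical sketch of this case, assuming \texttt{numpy} is available (the rank-1 matrix and right-hand side are made up so that $b \in \range(A)$):
\begin{verbatim}
import numpy as np

A = np.array([[1., 1.],
              [2., 2.]])            # rank 1, not invertible
b = np.array([3., 6.])              # b = A @ [1, 2], so b is in range(A)

x0 = np.linalg.pinv(A) @ b          # one particular solution, x0 = A^dagger b
v = np.array([1., -1.])             # a nullspace direction: A @ v = 0

print(np.allclose(A @ x0, b))             # True
print(np.allclose(A @ (x0 + 5 * v), b))   # True: x0 plus anything in null(A)
\end{verbatim}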
\pagebreak
\section{Projections}
In the \textbf{projection problem}, you are given some set of vectors $\calX$, and some vector $v$, and asked to find the projection of $v$ onto $\calX$. Concretely, you are being asked to solve the problem
%
\begin{equation*}
x^* = \arg \min_{x \in \calX} \|v - x\|.
\end{equation*}
%
That is, you want to find the element $x^*$ in $\calX$ which is as close as possible to $v$ in some sense.
\subsection{Projecting onto a line}
A line is a set $\calL(x_0, u) = \{x \; : \; x = x_0 + t u, \; t \in \Real{}\}$. Here, $x_0$ is just some point that lies on the line, and $u$ is a direction (which we will take to have unit norm, without loss of generality, though this is not necessary).
In the problem of \textbf{projecting onto a line}, we have some point $v \in \Real{n}$, and we would like to find the closest point to $v$ on the line, $x^* \in \calL(x_0, u)$, in the sense of Euclidean distance ($L_2$-norm). We first set up the optimization problem:
%
\begin{equation*}
x^* = \arg \min_{x \in \calL(x_0, u)} \|v - x\|_2.
\end{equation*}
Now, we write out the constraint explicitly---this involves adding a variable $t$:
%
\begin{align*}
x^* = \arg \min_{x,t} & \|v - x\|_2 \\
\text{s.t. } & x = x_0 + t u.
\end{align*}
Observe that if we substitute $x = x_0 + tu$ in the objective, we can remove the variable $x$ from the optimization problem, and solve a new problem that is entirely in $t$. The optimal $x$ relates to the optimal $t$ according to
%
\begin{align*}
x^* = x_0 + t^* u,
\end{align*}
%
and the optimal $t$ is the solution to
%
\begin{align*}
t^* = \arg \min_t \|v - (x_0 + tu)\|_2.
\end{align*}
This last optimization problem is easy to solve. First, square the objective (which in this case does not change the solution---can you see why?), and then expand it:
%
\begin{align*}
t^* &= \arg \min_t \|v - (x_0 + tu)\|_2^2 \\
&= \arg \min_t (v - x_0 - tu)^T (v - x_0 - tu) \\
&= \arg \min_t \|v - x_0\|_2^2 + t^2 \|u\|_2^2 - 2t u^T (v - x_0).
\end{align*}
%
The optimization problem is now a quadratic, convex program with a single scalar variable. Setting the gradient equal to zero and solving for $t^*$ obtains the solution:
%
\begin{equation*}
t^* = \frac{u^T (v-x_0)}{\|u\|_2^2}.
\end{equation*}
%
When $u$ has unit norm, this simplifies to $t^* = u^T (v - x_0)$.
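A minimal numerical sketch of the formula (assuming \texttt{numpy} is available; the line and the point $v$ are made up, and $u$ is deliberately not unit norm so that the general formula is exercised):
\begin{verbatim}
import numpy as np

x0 = np.array([1., 0., 0.])          # a point on the line
u = np.array([0., 1., 1.])           # line direction (not unit norm)
v = np.array([2., 3., 1.])           # the point to project

t_star = u @ (v - x0) / (u @ u)      # t* = u^T (v - x0) / ||u||_2^2
x_star = x0 + t_star * u             # projection of v onto the line

# The residual v - x* should be orthogonal to the line direction.
print(np.isclose(u @ (v - x_star), 0.0))   # True
\end{verbatim}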
\subsection{Projecting onto a plane}
TODO
\subsection{Projection onto a subspace}
TODO
\subsection{Vectors and Tensors: Invariance Under Rotations}
TODO
\pagebreak
\section{Optimization Concepts}
\subsection{You Can Optimize In Any Order}
Consider the following optimization problem:
%
\begin{equation*}
\min_{x\in\calX, y\in\calY} f(x,y).
\end{equation*}
Define $h(x) \doteq \min_{y \in\calY} f(x,y)$ and $g(y) \doteq \min_{x \in \calX} f(x,y)$. Then:
%
\begin{equation*}
\min_{x\in\calX, y\in\calY} f(x,y) = \min_{x \in \calX} h(x) = \min_{y \in \calY} g(y);
\end{equation*}
%
that is, you can optimize over $x$ and $y$ in either order and obtain the same result.
Observe that solving $\min_{y \in\calY} f(x,y)$ gives you an optimal point $y^*$ that depends on the choice of $x$. We denote this as $y^*(x)$. Then $h(x) = f(x, y^*(x))$.
Similarly, solving $\min_{x \in\calX} f(x,y)$ gives you an optimal point $x^*$ that depends on the choice of $y$. We denote this as $x^*(y)$. Then $g(y) = f(x^*(y),y)$.
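A crude way to convince yourself of this identity is to minimize a function over a finite grid, where ``optimizing in any order'' reduces to taking minima of the same array along different axes first. A minimal sketch with a made-up quadratic, assuming \texttt{numpy} is available:
\begin{verbatim}
import numpy as np

# f(x, y) = (x - 1)^2 + (y + 2)^2 + x*y, evaluated on a grid of (x, y).
xs = np.linspace(-5, 5, 201)
ys = np.linspace(-5, 5, 201)
X, Y = np.meshgrid(xs, ys, indexing="ij")
F = (X - 1)**2 + (Y + 2)**2 + X * Y

joint = F.min()                  # min over (x, y) jointly
x_then_y = F.min(axis=1).min()   # h(x) = min_y f(x, y), then min over x
y_then_x = F.min(axis=0).min()   # g(y) = min_x f(x, y), then min over y

print(np.isclose(joint, x_then_y), np.isclose(joint, y_then_x))   # True True
\end{verbatim}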
\subsubsection{What about constraints?}
Consider the constrained case, where $x$ and $y$ are related by some constraint---for example, $q(x,y) = 0$. The way to handle this is to fold the constraint into the inner optimization problem, and assign the inner optimization problem a value of $+ \infty$ when it is infeasible (if it is a minimization---otherwise $- \infty$). That is, we consider the problem
%
\begin{equation*}
\min_{x \in \calX, y \in \calY} f(x,y) \; : \; q(x,y) = 0.
\end{equation*}
We define the following:
%
\begin{align*}
h(x) &\doteq \left\{ \begin{array}{ll}
\min_{y \in \calY} f(x,y) \; : \; q(x,y) = 0 \;\;\;& \exists y \in \calY : q(x,y) = 0, \\
+\infty & \text{otherwise.}
\end{array}\right. \\
g(y) &\doteq \left\{ \begin{array}{ll}
\min_{x \in \calX} f(x,y) \; : \; q(x,y) = 0 \;\;\;& \exists x \in \calX : q(x,y) = 0, \\
+\infty & \text{otherwise.}
\end{array}\right.
\end{align*}
Then as before,
%
\begin{equation*}
\min_{x\in\calX, y\in\calY} f(x,y) = \min_{x \in \calX} h(x) = \min_{y \in \calY} g(y).
\end{equation*}
\subsubsection{What if I have a $\min$ and a $\max$? Does this still work?}
Consider the optimization problem
%
\begin{equation*}
\min_{x\in\calX} \max_{y \in \calY} f(x,y).
\end{equation*}
A reasonable question to ask: if you switch the order of optimization here, does it make a difference? The answer is: \textbf{unless special conditions are met, you cannot switch the orders of $\min$ and $\max$}. Later in the course, we will expand on this---it is a topic that deserves special attention.
\subsection{Solving Simple Unconstrained Optimization Problems}
When you are given a problem of the form
%
\begin{equation*}
\min_{x \in \Real{n}} f(x),
\end{equation*}
%
where there are no constraints on $x$, $f$ is differentiable everywhere, and $f$ has an important property called \textbf{convexity}, a sufficient condition for the point $x^*$ to be an optimal point is
%
\begin{equation*}
\nabla_x f(x^*) = 0.
\end{equation*}
When you have obtained an optimal point $x^*$, the optimal value is computed to be $f(x^*)$.
Later in the course, we will go into more depth on convexity, but for now, think of it like this: \textbf{a convex function has only one local minimum, which is its global minimum, and no other minima.} An upward-facing bowl is convex. A downward-facing bowl is not (it is concave).
\subsection{Example: Single-Variable Quadratic Optimization}
Consider the problem
%
\begin{equation*}
\min_{x \in \Real{}} f(x) = ax^2 + bx + c,
\end{equation*}
%
with $a > 0$, so that the parabola is upward-facing, and there is a unique global minimizer.
We can find the minimum by setting the derivative of $f$ to zero and solving:
%
\begin{equation*}
\left.\diff{f}{x}\right|_{x^*} = 2ax^* + b = 0 \;\;\; \implies \;\;\; x^* = \frac{-b}{2a}.
\end{equation*}
\subsection{Example: Multivariable Least-Squares}
In the multivariable least-squares problem, we are trying to solve the optimization
%
\begin{equation*}
x^* = \arg\min_{x \in \Real{n}} \|Ax - b\|_2^2,
\end{equation*}
%
where $A \in \Real{m \times n}$ and $b \in \Real{m}$. For now, take it for granted that \textbf{the objective function is convex}.
To solve this, we first rewrite the objective function:
%
\begin{align*}
\|Ax - b \|_2^2 &= (Ax-b)^T (Ax -b) \\
&= x^T A^T A x - 2 b^T A x + b^T b.
\end{align*}
Next, we examine the sufficient condition $\nabla_x f(x^*) = 0$. We compute the gradient to be:
%
\begin{equation*}
\nabla_x f(x) = 2 A^T A x - 2 A^T b.
\end{equation*}
Any solution to the equation $A^TA x = A^T b$ is an optimal point.
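A minimal numerical sketch, assuming \texttt{numpy} is available: solve the normal equations directly and compare against \texttt{numpy}'s least-squares routine (the random $A$ here has full column rank, so $A^TA$ is invertible).
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 3))
b = rng.standard_normal(10)

# Solve the normal equations A^T A x = A^T b.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Compare with numpy's built-in least-squares solver.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_normal, x_lstsq))              # True
print(np.allclose(A.T @ (A @ x_normal - b), 0.0))  # optimality condition holds
\end{verbatim}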
\subsection{Example: $L_2$-Regularized Least-Squares}
In the $L_2$-regularized version of the multivariable least-squares problem, we are trying to solve the optimization
%
\begin{equation*}
x^* = \arg\min_{x \in \Real{n}} \|Ax - b\|_2^2 + \lambda \|x\|_2^2,
\end{equation*}
%
where $A \in \Real{m \times n}$, $b \in \Real{m}$, and $\lambda > 0$.
To solve this, we first rewrite the objective function:
%
\begin{align*}
\|Ax - b \|_2^2 + \lambda \|x\|_2^2 &= (Ax-b)^T (Ax -b) + \lambda x^T x\\
&= x^T A^T A x - 2 b^T A x + b^T b + x^T (\lambda I) x \\
&= x^T \left( A^T A + \lambda I \right) x - 2b^T A x + b^T b.
\end{align*}
We proceed in the same way as in the unregularized case, and determine that any $x$ which satisfies
%
\begin{equation*}
\nabla_x f(x) = 2(A^T A + \lambda I ) x - 2A^T b = 0
\end{equation*}
%
is an optimal point.
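A minimal numerical sketch of the regularized solution, assuming \texttt{numpy} is available (note that $A^TA + \lambda I$ is always invertible for $\lambda > 0$, so the solution here is unique even if $A$ does not have full column rank):
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 3))
b = rng.standard_normal(10)
lam = 0.5

# Optimality condition: (A^T A + lam * I) x = A^T b.
x_ridge = np.linalg.solve(A.T @ A + lam * np.eye(3), A.T @ b)

grad = 2 * (A.T @ A + lam * np.eye(3)) @ x_ridge - 2 * A.T @ b
print(np.allclose(grad, 0.0))   # True: the gradient vanishes at x_ridge
\end{verbatim}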
\section{Convex Sets}
\textbf{Required reading: Boyd Ch. 2, p 22-38}
A set of points, $\calX \subset \Real{n}$, is \textbf{convex} if for any two points $x_1, x_2 \in \calX$, every point on the line segment between $x_1$ and $x_2$ is also in $\calX$. In symbols, this condition is:
%
\begin{equation*}
\forall x_1, x_2 \in \calX, \forall \alpha \in [0,1], \;\;\; \alpha x_1 + (1-\alpha) x_2 \in \calX.
\end{equation*}
\subsection{Examples of Convex Sets}
\subsubsection{Affine Sets}
A set $\calX$ is \textbf{affine} if for any two points $x_1, x_2 \in \calX$, every point \textit{on the entire line} containing $x_1$ and $x_2$ is also in $\calX$. Mathematically:
%
\begin{equation*}
\forall x_1, x_2 \in \calX, \forall \alpha \in \Real{}, \;\;\; \alpha x_1 + (1-\alpha) x_2 \in \calX.
\end{equation*}
Clearly, \textbf{every affine set is convex.} However, \textbf{not every convex set is affine.}
\begin{example}
Consider the set of solutions to a system of linear equations: $\calX \doteq \{x \; : \; Ax = b\}$. Is this set affine? Suppose that $x_1, x_2 \in \calX$. Thus, we have $Ax_1 = b$ and $Ax_2 = b$. Consider any point $x(\alpha) \doteq \alpha x_1 + (1-\alpha)x_2$. We compute
%
\begin{align*}
Ax(\alpha) &= A(\alpha x_1 + (1-\alpha)x_2) \\
&= \alpha (Ax_1) + (1-\alpha) (Ax_2) \\
&= \alpha b + (1-\alpha) b \\
&= b.
\end{align*}
%
Thus, $x(\alpha) \in \calX$, and $\calX$ is affine.
\end{example}
\subsubsection{Hyperplanes and Halfspaces}
A \textbf{hyperplane} is a set of the form
%
\begin{equation*}
\{x \; : \; a^T x = b\},
\end{equation*}
%
for some fixed vector $a$ and fixed scalar $b$. \textbf{Hyperplanes are convex sets.} To prove this, observe that a hyperplane is the solution set of a linear equation, hence affine (by the example above), and every affine set is convex.
A \textbf{halfspace} is a set of the form
%
\begin{equation*}
\{x \; : \; a^T x \leq b\}.
\end{equation*}
%
Geometrically speaking, a halfspace is a hyperplane and all of the points to one side of it. A halfspace is also a convex set, and the proof is similar. Let $x_1, x_2$ be points in a halfspace; for any $\alpha \in [0,1]$, we have
%
\begin{align*}
a^T (\alpha x_1 + (1-\alpha) x_2) &= \alpha a^T x_1 + (1-\alpha) a^T x_2 \\
&\leq \alpha b + (1-\alpha) b \\
& = b.
\end{align*}
%
Thus every point on the line from $x_1$ to $x_2$ is in the halfspace, and the halfspace is convex.
\subsubsection{Euclidean Balls}
A Euclidean ball centered on $x_c$ with radius $r$, $B(x_c, r)$, is defined to be the set of points
%
\begin{equation*}
B(x_c, r) \doteq \{x_c + ur \; : \; \|u\|_2 \leq 1\}.
\end{equation*}
%
This set is convex. The easy proof is geometric---just look at the ball, it is clearly convex---but we can also do it the hard way. Consider any points $x_1 = x_c + u_1 r$, $x_2 = x_c + u_2 r$, and any $\alpha \in [0,1]$. Consider the point $x(\alpha) \doteq \alpha x_1 + (1-\alpha) x_2$. We have:
%
\begin{align*}
x(\alpha) &= \alpha x_1 + (1-\alpha) x_2\\
&= \alpha (x_c + u_1 r) + (1-\alpha) (x_c + u_2 r) \\
&= x_c + \left( \alpha u_1 + (1-\alpha) u_2 \right) r.
\end{align*}
%
Thus $x(\alpha)$ is in the ball if $u(\alpha) \doteq \alpha u_1 + (1-\alpha) u_2$ satisfies $\|u(\alpha)\|_2 \leq 1$. To check this, we proceed:
%
\begin{align*}
\|u(\alpha)\|_2^2 &= \alpha^2 \|u_1\|_2^2 + (1-\alpha)^2 \|u_2\|_2^2 + 2\alpha(1-\alpha) u_1^T u_2 \\
&\overset{(a)}{\leq} \alpha^2 + (1-\alpha)^2 + 2\alpha(1-\alpha) \\
&= \alpha^2 + 1 + \alpha^2 - 2\alpha + 2\alpha - 2\alpha^2 \\
&= 1.
\end{align*}
%
The step (a) makes use of two facts: 1) by definition, $\|u_1\|_2^2 \leq 1$ and $\|u_2\|_2^2 \leq 1$; and 2) the Cauchy-Schwarz inequality gives $u_1^T u_2 \leq \|u_1\|_2 \|u_2\|_2 \leq 1$. The last step is just algebra. We conclude that $x(\alpha) \in B(x_c, r)$, and thus that $B(x_c, r)$ is a convex set.
\begin{remark}
We can also equivalently define the ball by the expression
%
\begin{equation*}
B(x_c, r) \doteq \{x \; : \; \|x - x_c \|_2 \leq r\}.
\end{equation*}
\end{remark}
\subsection{Operations Preserving Convexity}
\begin{proposition}Let $\calX_1, \calX_2$ be two convex sets. The intersection, $\calX \doteq \calX_1 \cap \calX_2$, is also convex. \end{proposition}
\begin{proof}
Consider points $x_1, x_2 \in \calX = \calX_1 \cap \calX_2$. Because $x_1$ and $x_2$ are in the intersection of $\calX_1, \calX_2$, both points must be in both sets. Because the sets are convex, we have, for all $\alpha \in [0,1]$,
%
\begin{align*}
& \alpha x_1 + (1-\alpha) x_2 \in \calX_1, \\
& \alpha x_1 + (1-\alpha) x_2 \in \calX_2.
\end{align*}
%
It follows immediately that $\alpha x_1 + (1-\alpha) x_2$ is in $\calX$. Therefore $\calX$ satisfies the convexity definition and is convex.
\end{proof}
\begin{example}[Polyhedron]
A \textbf{polyhedron} is an intersection of halfspaces. Consider $k$ halfspaces $\calH_i \doteq \{x \; : \; a_i^T x \leq b_i\}$, for $i=1, ..., k$. The polyhedron
%
\begin{equation*}
\calP \doteq \bigcap_{i=1}^k \calH_i
\end{equation*}
%
is a convex set.
\end{example}
\begin{proposition} Let $\calX$ be a convex set, and let $f$ be an affine function (it has the form $f(x) = Ax + b$ for some $A,b$). The image of $\calX$ under $f$,
%
\begin{equation*}
f(\calX) \doteq \{f(x) \; : \; x \in \calX\},
\end{equation*}
%
is convex.
\end{proposition}
Proof is left as an exercise to the reader. (It's really easy though!)
\begin{example}[Ellipsoid]
One way to define an ellipsoid is by the set
%
\begin{equation*}
\calE(x_c, A) \doteq \{x_c + Au \; | \; \|u\|_2 \leq 1\}.
\end{equation*}
This is the image of the unit ball $B(x_c,1)$ under the affine function $f(x) = Ax + (I - A)x_c$. To see this, note that $f(x_c + u) = A(x_c + u) + (I-A) x_c = x_c + Au$. Because $B(x_c,1)$ is a convex set, and $\calE(x_c, A)$ is the image of $B(x_c,1)$ under an affine function, $\calE(x_c, A)$ is also convex.
\end{example}
\pagebreak
\section{Convex Functions}
\textbf{Required reading: Boyd Ch. 3, p 67-87.}
A function $f : \Real{n} \to \Real{}$ is \textbf{a convex function} if $\dom f$ is a convex set, and for all points $x, y \in \dom f$, and all $\alpha \in [0,1]$, we have
%
\begin{equation*}
f( \alpha x + (1-\alpha) y ) \leq \alpha f(x) + (1-\alpha) f(y).
\end{equation*}
A function is \textbf{concave} if $-f$ is convex.
\subsection{First-Order Conditions for Convexity}
Suppose $f$ is differentiable. Then $f$ is convex if and only if $\dom f$ is convex, and for every $x,y \in \dom f$,
%
\begin{equation*}
f(y) \geq f(x) + \nabla f(x)^T (y - x).
\end{equation*}
This is saying that the linearization of $f$ at $x$ is a \textit{global underestimator} for $f$.
\subsection{Second-Order Conditions for Convexity}
Suppose $f$ is twice-differentiable. Then $f$ is convex if and only if $\dom f$ is convex and the Hessian of $f$, $\nabla^2 f$, is positive semi-definite everywhere: $\forall x \in \dom f$,
%
\begin{equation*}
\nabla^2 f(x) \succeq 0.
\end{equation*}
\subsection{Examples of Convex Functions}
The following are some examples of convex functions of scalar variables.
\begin{itemize}
\item \textit{Exponential.} $f(x) = e^{ax}$ is convex on $\Real{}$, for any $a \in \Real{}$.
\item \textit{Powers.} $f(x) = x^a$ is convex on $\Realpp{}$, for any $a \geq 1$ or $a \leq 0$. Note: this is not convex on the entire real line, just on $\Realpp{}$. Think about $x^3$---is this convex on the set $\{x \leq 0\}$? (It is not. Check first-order conditions!)
\item \textit{Logarithm.} $f(x) = -\log x$ is convex on $\mathbb{R}_{++}$.
\end{itemize}
Some examples of convex functions of vector arguments:
\begin{itemize}
\item \textit{Norms.} All norms on $\mathbb{R}^n$ are convex (this follows from the triangle inequality and absolute homogeneity, which can be used to directly prove $f(\alpha x + (1-\alpha) y) \leq \alpha f(x)+ (1-\alpha) f(y)$).
\item \textit{Max function.} $f(x) = \max\{x_1,\ldots,x_n\}$ is convex on $\mathbb{R}^n$.
\item \textit{Quadratic-over-linear function.} $f(x,y) = x^2/y$ is convex on $\mathbb{R} \times \mathbb{R}_{++}$ (show that the Hessian is PSD).
\item \textit{Affine function.} $f(x) = a\tran x + b$ is convex on $\mathbb{R}^n$.
\end{itemize}
\subsection{Operations that preserve convexity}
In optimization, we usually deal with functions that are built by composing other functions, rather than nice functions whose Hessians we can compute directly or for which we can verify the definition of convexity. In those cases, we rely on rules that preserve convexity. Below are some key operations that preserve convexity; all of them are worth memorizing.
\begin{itemize}
\item Nonnegative weighted sum of convex functions: $\sum\limits_{i=1}^n a_i f_i \;\;\; a_i \geq 0$ and $f_i$ convex $\forall i$ (extends to infinite sums, integrals)
\item Composition with affine function: $f$ convex $\Rightarrow f(Ax+b)$ is convex
\item Pointwise maximum of convex functions: if $f_i$ is convex for each $i = 1,\ldots,m$, then $f(x) = \max\{f_1(x),\ldots,f_m(x)\}$ is convex
\item Pointwise supremum: if $f(x,y)$ is convex in $x$ for each $y \in C$, then $g(x) = \sup\limits_{y\in C} f(x,y)$ is convex (note $C$ does not need to be convex!). \textbf{You will use this rule repeatedly}
\item Pointwise infimum: if $f(x,y)$ is jointly convex in $(x,y)$ (not the same as being convex in $x$ and in $y$ separately) and $C$ is a convex set, then $g(x) = \inf\limits_{y\in C} f(x,y)$ is convex
\item Composition: $g$ is convex, $h$ convex and nondecreasing, then $f(x) = h(g(x))$ is convex
\item Composition: $g$ is concave, $h$ convex and nonincreasing, then $f(x) = h(g(x))$ is convex
\item Composition: $g$ is concave, $h$ is concave and nondecreasing, then $f(x) = h(g(x))$ is concave
\item Composition: $g$ is convex, $h$ is concave and nonincreasing, then $f(x) = h(g(x))$ is concave.
\end{itemize}
Note that for the compositions, we need to be careful about the range of $g(x)$ and the domain of $h(x)$: if the range of $g(x)$ contains values that are not in the domain of $h(x)$, then $h(g(x))$ is undefined.
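These rules are also easy to sanity-check numerically by sampling the defining inequality. As a minimal sketch (assuming \texttt{numpy} is available; the data are made up), take $f(x) = \exp(\|Ax - b\|_2)$, which is convex by the rules above: a norm composed with an affine function, then composed with the convex, nondecreasing exponential.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
b = rng.standard_normal(4)

# Convex by the composition rules: exp (convex, nondecreasing)
# applied to the convex function x -> ||Ax - b||_2.
f = lambda x: np.exp(np.linalg.norm(A @ x - b))

# Sample the definition: f(a x + (1-a) y) <= a f(x) + (1-a) f(y).
ok = True
for _ in range(1000):
    x, y = rng.standard_normal(3), rng.standard_normal(3)
    a = rng.uniform()
    ok &= f(a * x + (1 - a) * y) <= a * f(x) + (1 - a) * f(y) + 1e-9
print(ok)   # True
\end{verbatim}
Of course, a sampled check like this can only refute convexity (by finding a violating triple), never prove it; the composition rules are what do the proving.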
\subsection{Examples of proving convexity using operations that preserve convexity}
\begin{itemize}
\item \textbf{Least squares objective}: Consider $f(x) = \|Ax - b\|_2^2$. This function is \textbf{convex}. Recall that the $L_2$ norm is a convex function, and composition with an affine function preserves convexity, so $\|Ax - b\|_2$ is convex. Squaring preserves convexity here, too: $t \mapsto t^2$ is convex and nondecreasing on $t \geq 0$, and $\|Ax-b\|_2$ is convex and nonnegative, so the composition $f$ is convex.
\item \textbf{A certain eigenvalue function}: Consider $f(x) = \lambda_{min}\left(A(x)\right)$, where $A(x) = \sum_{i=1}^n x_i A_i$, and each $A_i \in \Sym{n}$. This is a fairly strange-looking function, to say the least, and it is not immediately clear how to analyze it from first or second-order conditions. Instead, we first exploit the Rayleigh quotient to write it in terms of an optimization problem:
%
\begin{align*}
f(x) = \lambda_{min}(A(x)) &= \min_{\|z\|_2=1} z^T A(x) z \\
&= \min_{\|z\|_2=1} \sum_{i=1}^n z^T (x_i A_i) z \\
&= \min_{\|z\|_2=1} \sum_{i=1}^n x_i \left( z^T A_i z \right) \\
&= \min_{\|z\|_2=1} x^T g(z) && g(z) \doteq [z^T A_1 z, \cdots, z^T A_n z]^T\\
\therefore - f(x) &= \max_{\|z\|_2=1} - x^T g(z).
\end{align*}
Observe that we have now defined the function $-f(x)$ at each point $x$ as a maximum over a family of functions that are affine in $x$. This means that $-f$ is a convex function, thus \textbf{$f$ is concave}.
\item \textbf{Negative log of a concave function}: Recall that $\log x$ is \textbf{concave} on $\Realpp{}$. Therefore, $f(x) = -\log x$ is \textbf{convex} on $\Realpp{}$. Also, observe that $f$ is nonincreasing. Suppose $g(y)$ is a concave function. By the composition rules, the function $f \circ g (y) = - \log (g(y))$ is convex on $\{y | g(y) > 0\}$.
\item \textbf{Composing many things at once}: Consider the multiply-composed function
\begin{equation*}
f(x) = -\log (-\log (\sum\limits_{i=1}^m e^{a_i\tran x + b_i}))
\end{equation*}
%
on $\dom f = \{ x | \sum\limits_{i=1}^m e^{a_i\tran x + b_i} < 1\}$. $f$ is \textbf{convex}.
First, take as a given that $g(y) \doteq \log (\sum_{i=1}^m e^{y_i})$ is convex in $y$. (This is too complex to show directly with the tools we have---but may become an exercise in the future.)
The composition of $g$ with an affine function, $g(Ax+b)$, is convex in $x$ (because composition of a convex function with an affine function always preserves convexity). Letting the rows of $A$ be $a_i^T$, we have $g(Ax + b) = \log \sum_{i=1}^m e^{a_i^T x + b_i}$. Then $-g(Ax+b)$ is concave in $x$.
As discussed in the previous bullet, the negative log of a concave function---which we have here---is a convex function on the domain where that concave function is positive.
The domain is then:
%
\begin{align*}
\dom f &= \left\{x \; \left| \; -\log \sum_{i=1}^m e^{a_i^T x + b_i} > 0\right.\right\} \\
&= \left\{x \; \left| \; \sum_{i=1}^m e^{a_i^T x + b_i} < 1\right.\right\}
\end{align*}
We are not finished---we need the domain of $f$ to be a convex set. This is not hard to show, however. First note that $\sum_{i=1}^m e^{a_i\tran x + b_i}$ is a convex function of $x$ by the composition rules (exponential is convex, composition with affine preserves convexity). Hence the domain of $f$ can be characterized as a sub-level set of $\sum_{i=1}^m e^{a_i\tran x + b_i}$, which for convex functions we know will be convex sets (see discussion 5).
\end{itemize}
\pagebreak
\section{Convex Optimization: Definitions}
\textbf{Required reading: Boyd. Ch 4, p 127 - 156.}
A convex optimization problem is one where \textbf{the objective is a convex function, and the feasible set is a convex set}.
The standard form for convex optimization problems is
%
\begin{equation}
\begin{aligned}
p^* = \min_{x} \;\;& f_0(x) \\
\text{s.t. } & f_i (x) \leq 0, \;\;\; i = 1,...,p \\
& Ax = b,
\end{aligned}\label{convexopt}
\end{equation}
%
where $f_0, ..., f_p$ are convex functions.
Recall that the \textbf{sublevel set of a convex function is convex}, so the set
%
\begin{equation*}
\calX_i \doteq \{x \in \Real{n} \; : \; f_i (x) \leq 0\}
\end{equation*}
%
is convex when $f_i$ is convex. Also, the set $\{x \; : \; Ax = b\}$ is an affine subspace of $\Real{n}$, which is convex. The intersection of convex sets is convex, so the total feasible set for (\ref{convexopt}), $\calX$, which is defined by
%
\begin{equation*}
\calX = \calX_1 \cap \calX_2 \cap \cdots \cap \calX_p \cap \{x \; : \; Ax = b\},
\end{equation*}
%
is convex.
\begin{remark} Notice that the equality constraint is required to be affine, rather than just a convex function. This is because, for a convex function $f$, the set $\{x \; : \; f(x) = 0\}$ is \textbf{not necessarily convex}. For example, consider the function $f(x) = \|x\|_2 - r$ with $r > 0$. The set $\{x : \|x\|_2 - r = 0\}$ is the surface of a ball of radius $r$, but it does not contain any of the ball's interior points. Thus, this set is not convex.
\end{remark}
\subsection{Linear Programs (LP)}
A \textbf{linear program (LP)} is an optimization problem with \textbf{linear objective and constraint functions}. Any linear program can be expressed in the form
%
\begin{equation}
\begin{aligned}
p^* = \min_{x} \;\; & c^T x \\
\text{s.t. } & Gx \preceq h, \\
& Ax = b,
\end{aligned}\label{LP}
\end{equation}
%
for appropriate choices of $c, A, b, G, h$.
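As a minimal sketch of actually solving an LP in this form, here is a tiny made-up instance using \texttt{scipy.optimize.linprog} (assuming SciPy is available). Note that \texttt{linprog} constrains the variables to be nonnegative by default, so we pass explicit bounds to match the free-variable form above.
\begin{verbatim}
import numpy as np
from scipy.optimize import linprog

# min c^T x  s.t.  G x <= h,  A x = b   (small made-up instance)
c = np.array([1.0, 2.0])
G = np.array([[-1.0, 0.0],
              [0.0, -1.0]])
h = np.array([0.0, 0.0])            # i.e. the constraint x >= 0
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

res = linprog(c, A_ub=G, b_ub=h, A_eq=A, b_eq=b,
              bounds=[(None, None)] * 2)   # variables otherwise free
print(res.x)   # approximately [1, 0]: all weight on the cheaper coordinate
\end{verbatim}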