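- Notation
$$X\in \mathbb{R}^{n\times (k+1)}:\ \text{training data, with rows } x_i\in\mathbb{R}^{1\times (k+1)}$$
$$y\in\{0,1\}^n:\ \text{labels},\quad \beta\in\mathbb{R}^{k+1}:\ \text{coefficients}$$
$$f(z)=\frac{\exp(z)}{1+\exp(z)}:\ \text{sigmoid},\quad \hat y_i=f(x_i\beta)$$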
- Cross Entropy
$$L(\beta;y)=-\sum_{i=1}^{n} [y_i\log f(x_i\beta)+(1-y_i)\log (1-f(x_i\beta))]$$
$$=-\sum_{i=1}^{n} [y_i(\log \exp(x_i\beta)-\log (1+\exp(x_i\beta)))-(1-y_i)\log (1+\exp(x_i\beta))]$$
$$=-\sum_{i=1}^{n} [y_ix_i\beta-\log(1+\exp(x_i\beta))]$$
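As a quick numerical sanity check of the derivation, the NumPy sketch below evaluates the first and last lines of $L(\beta;y)$ on random placeholder data (not the data behind the results further down) and confirms they agree. It assumes $f$ is the sigmoid; the function names are hypothetical. The last form is the convenient one to implement, because `np.logaddexp(0, z)` computes $\log(1+\exp(z))$ without overflow.

```python
import numpy as np

def sigmoid(z):
    # f(z) = exp(z) / (1 + exp(z)), written as 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def loss_direct(beta, X, y):
    # First line: -sum[y_i log f(x_i beta) + (1 - y_i) log(1 - f(x_i beta))]
    p = sigmoid(X @ beta)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def loss_simplified(beta, X, y):
    # Last line: -sum[y_i x_i beta - log(1 + exp(x_i beta))]
    z = X @ beta
    # np.logaddexp(0, z) = log(exp(0) + exp(z)) = log(1 + exp(z)), computed stably
    return -np.sum(y * z - np.logaddexp(0.0, z))

# Random placeholder data: n samples, k features plus a column of ones
rng = np.random.default_rng(0)
n, k = 200, 5
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, k))])
beta = rng.normal(size=k + 1)
y = rng.integers(0, 2, size=n).astype(float)

print(loss_direct(beta, X, y), loss_simplified(beta, X, y))
```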
- Minimize Loss
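$$\underset{\beta}{\min}\,L(\beta;y)\Rightarrow\frac{\partial L(\beta;y)}{\partial \beta}=0$$
$$\frac{\partial L(\beta;y)}{\partial \beta_j}=-\sum_{i=1}^{n} \left[y_ix_{ij}-\frac{\exp(x_i\beta)}{1+\exp(x_i\beta)}x_{ij}\right]$$
$$=\sum_{i=1}^{n} (\hat y_i-y_i)x_{ij}=\left[X^T(\hat y-y)\right]_j$$
$$\frac{\partial L(\beta;y)}{\partial \beta}=X^T(\hat y-y)$$
Because $\hat y$ depends nonlinearly on $\beta$, this equation has no closed-form solution, so $\beta$ is found iteratively (e.g., gradient descent or Newton's method).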
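A common way to double-check a hand-derived gradient is to compare it with central finite differences of the loss. The sketch below does that for $X^T(\hat y-y)$ on random placeholder data; the function names, step size `eps`, and seed are illustrative assumptions.

```python
import numpy as np

def loss(beta, X, y):
    # L(beta; y) = -sum[y_i x_i beta - log(1 + exp(x_i beta))]
    z = X @ beta
    return -np.sum(y * z - np.logaddexp(0.0, z))

def grad(beta, X, y):
    # Analytic gradient: X^T (y_hat - y), with y_hat = sigmoid(X beta)
    y_hat = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return X.T @ (y_hat - y)

def numerical_grad(beta, X, y, eps=1e-6):
    # Central differences, perturbing one coordinate beta_j at a time
    g = np.zeros_like(beta)
    for j in range(beta.size):
        e = np.zeros_like(beta)
        e[j] = eps
        g[j] = (loss(beta + e, X, y) - loss(beta - e, X, y)) / (2 * eps)
    return g

rng = np.random.default_rng(1)
n, k = 100, 5
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, k))])
y = rng.integers(0, 2, size=n).astype(float)
beta = rng.normal(size=k + 1)

# Largest absolute disagreement between the two gradients
print(np.max(np.abs(grad(beta, X, y) - numerical_grad(beta, X, y))))
```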
Learned beta: [ 0.4286 -0.2562 0.3251 0.485 0.6253 -0.7556]
True beta: [ 0.4 -0.2 0.3 0.5 0.6 -0.7]
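The notes do not say how the learned $\beta$ was obtained. As a hedged sketch of one way to run this kind of experiment, the code below simulates labels from the true $\beta$ listed above and fits $\beta$ by plain gradient descent using $X^T(\hat y-y)$. The sample size, seed, standard-normal features, learning rate, iteration count, and the helper name `fit_logistic_gd` are all assumptions, and the sketch will not reproduce the exact numbers above.

```python
import numpy as np

def fit_logistic_gd(X, y, lr=0.5, n_iter=2000):
    # Gradient descent on the mean loss L/n, whose gradient is X^T (y_hat - y) / n
    n = X.shape[0]
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        y_hat = 1.0 / (1.0 + np.exp(-(X @ beta)))
        beta -= lr * X.T @ (y_hat - y) / n
    return beta

# Simulate data from a known beta: a column of ones plus k standard-normal features
rng = np.random.default_rng(42)
n, k = 10_000, 5
true_beta = np.array([0.4, -0.2, 0.3, 0.5, 0.6, -0.7])
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, k))])
p = 1.0 / (1.0 + np.exp(-(X @ true_beta)))
y = rng.binomial(1, p).astype(float)  # y_i ~ Bernoulli(f(x_i beta))

beta_hat = fit_logistic_gd(X, y)
print("Learned beta:", np.round(beta_hat, 4))
print("True beta:   ", true_beta)
```

Dividing the gradient by $n$ minimizes the mean rather than the sum of the per-sample losses. This leaves the minimizer unchanged but keeps a fixed learning rate reasonable as $n$ grows.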