Summary
**Softmax Function**
* Transforms a vector $(t_1, \ldots, t_k)$ into a probability vector.
* The inputs $t_i$ are often called logits.
* Output entries are always non-negative and sum to 1.
* Formula (see the sketch below): $\mathrm{softmax}(t_1, \ldots, t_k) = \left[\frac{\exp(t_1)}{\sum_{j=1}^{k} \exp(t_j)}, \ldots, \frac{\exp(t_k)}{\sum_{j=1}^{k} \exp(t_j)}\right]$.
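As a concrete illustration of the formula, here is a minimal NumPy sketch. The max-subtraction is an added numerical-stability trick, not part of the definition; it cancels in the ratio and leaves the output unchanged.

```python
import numpy as np

def softmax(t):
    """Map logits (t_1, ..., t_k) to a probability vector."""
    t = np.asarray(t, dtype=float)
    z = np.exp(t - t.max())  # shift by max(t) for numerical stability
    return z / z.sum()       # entries are non-negative and sum to 1

probs = softmax([2.0, 1.0, 0.1])
print(probs, probs.sum())    # -> [0.659... 0.242... 0.098...] 1.0
```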
**Multi-class Probabilistic Model**
* Softmax is applied to $(\theta_1^\top x, \ldots, \theta_k^\top x)$ to obtain the probabilities $P(y = i \mid x; \theta)$.
* $P(y = i \mid x; \theta) = \frac{\exp(\theta_i^\top x)}{\sum_{j=1}^{k} \exp(\theta_j^\top x)}$.
* This probability can be denoted $\phi_i$.
* The negative log-likelihood for a single example $(x, y)$ is then $-\log \phi_y$ (see the sketch below).
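A minimal sketch of these two computations, assuming the parameters are stacked as a matrix `Theta` with one row $\theta_i$ per class (the variable names and dimensions are illustrative):

```python
import numpy as np

def class_probs(Theta, x):
    """phi_i = P(y = i | x; Theta), with one parameter row theta_i per class."""
    logits = Theta @ x                 # (theta_1^T x, ..., theta_k^T x)
    z = np.exp(logits - logits.max())  # stabilized softmax
    return z / z.sum()

def nll(Theta, x, y):
    """Negative log-likelihood -log(phi_y) for a single example (x, y)."""
    return -np.log(class_probs(Theta, x)[y])

rng = np.random.default_rng(0)
Theta = rng.normal(size=(3, 4))        # illustrative: k = 3 classes, d = 4 features
x = rng.normal(size=4)
print(class_probs(Theta, x), nll(Theta, x, y=2))
```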
**Classification Context**
* Multi-class classification is a topic covered in the CS229 Lecture Notes under "Classification and logistic regression."
**Gaussian Discriminant Analysis (GDA) Model**
* Used for classification problems where the input features $x$ are continuous-valued random variables.
* Models p(x|y) using a multivariate normal distribution.
* Model components (see the sketch below): $y \sim \mathrm{Bernoulli}(\phi)$, $x \mid y = 0 \sim \mathcal{N}(\mu_0, \Sigma)$, $x \mid y = 1 \sim \mathcal{N}(\mu_1, \Sigma)$.
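A minimal sketch of fitting these components by maximum likelihood, using the standard closed-form estimates for this model; the function name `fit_gda` and the synthetic data are illustrative assumptions.

```python
import numpy as np

def fit_gda(X, y):
    """Closed-form maximum-likelihood estimates of (phi, mu0, mu1, Sigma).

    X: (n, d) array of continuous features; y: (n,) array of 0/1 labels.
    """
    phi = y.mean()                                  # estimate of P(y = 1)
    mu0 = X[y == 0].mean(axis=0)                    # class-0 mean
    mu1 = X[y == 1].mean(axis=0)                    # class-1 mean
    centered = X - np.where(y[:, None] == 1, mu1, mu0)
    Sigma = centered.T @ centered / len(y)          # shared covariance matrix
    return phi, mu0, mu1, Sigma

# Tiny synthetic check: two Gaussian clusters sharing a covariance.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(3.0, 1.0, (50, 2))])
y = np.repeat([0, 1], 50)
print(fit_gda(X, y))
```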
The softmax function transforms a vector of inputs, known as logits, into a probability vector whose entries are non-negative and sum to one. In multi-class classification it defines the class probabilities $P(y = i \mid x; \theta) = \exp(\theta_i^\top x) / \sum_{j=1}^{k} \exp(\theta_j^\top x)$. The CS229 Lecture Notes also introduce the Gaussian Discriminant Analysis (GDA) model, an alternative for classification problems with continuous-valued input features, which models $p(x \mid y)$ as a multivariate normal distribution. [S1, S2, S4]