
Summary

**Softmax Function**
*   Transforms a vector $(t_1, \dots, t_k)$ into a probability vector.
*   Inputs (t) are often called logits.
*   Output entries are always non-negative and sum to 1.
*   Formula: $\mathrm{softmax}(t_1, \dots, t_k) = \left[ \frac{\exp(t_1)}{\sum_{j=1}^{k} \exp(t_j)}, \dots, \frac{\exp(t_k)}{\sum_{j=1}^{k} \exp(t_j)} \right]$ (see the sketch below).
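
A minimal NumPy sketch of this definition; the max-subtraction is a standard numerical-stability trick, not part of the formula itself:

```python
import numpy as np

def softmax(t):
    """Map logits t = (t_1, ..., t_k) to a probability vector."""
    z = np.exp(t - np.max(t))  # subtracting max(t) avoids overflow without changing the result
    return z / z.sum()

logits = np.array([2.0, 1.0, 0.1])
p = softmax(logits)
print(p)        # every entry is non-negative
print(p.sum())  # entries sum to 1.0
```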

**Multi-class Probabilistic Model**
*   Softmax is applied to $(\theta_1^\top x, \dots, \theta_k^\top x)$ to obtain probabilities $P(y=i \mid x; \theta)$.
*   $P(y=i \mid x; \theta) = \exp(\theta_i^\top x) \,/\, \sum_{j=1}^{k} \exp(\theta_j^\top x)$.
*   This probability can be denoted $\phi_i$.
*   The negative log-likelihood of a single example $(x, y)$ under this model is $-\log \phi_y$, as sketched below.
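
A minimal NumPy sketch of these two computations, assuming a hypothetical parameter matrix `Theta` of shape (k, d) whose i-th row is $\theta_i$:

```python
import numpy as np

def class_probs(Theta, x):
    """phi_i = exp(theta_i^T x) / sum_j exp(theta_j^T x) for each class i."""
    logits = Theta @ x                   # (theta_1^T x, ..., theta_k^T x)
    z = np.exp(logits - np.max(logits))  # numerically stable softmax
    return z / z.sum()

def nll(Theta, x, y):
    """Negative log-likelihood -log(phi_y) of a single example (x, y)."""
    return -np.log(class_probs(Theta, x)[y])

rng = np.random.default_rng(0)
Theta = rng.normal(size=(3, 4))  # illustrative parameters: k=3 classes, d=4 features
x = rng.normal(size=4)
print(nll(Theta, x, y=1))
```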

**Classification Context**
*   Multi-class classification is a topic covered in the CS229 Lecture Notes under "Classification and logistic regression."

**Gaussian Discriminant Analysis (GDA) Model**
*   Used for classification problems where input features 'x' are continuous-valued random variables.
*   Models p(x|y) using a multivariate normal distribution.
*   Model components: $y \sim \mathrm{Bernoulli}(\phi)$, $x \mid y=0 \sim \mathcal{N}(\mu_0, \Sigma)$, $x \mid y=1 \sim \mathcal{N}(\mu_1, \Sigma)$ (see the sketch below).
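
A sketch of fitting these parameters by maximum likelihood. The closed-form estimates below (class frequency for $\phi$, per-class means, and a single shared covariance) are the standard ones for this model; `fit_gda`, `X`, and `y` are illustrative names:

```python
import numpy as np

def fit_gda(X, y):
    """Maximum-likelihood estimates for GDA with a shared covariance.

    X: (m, d) matrix of continuous features; y: (m,) array of 0/1 labels.
    """
    m = len(y)
    phi = y.mean()                # estimate of P(y = 1)
    mu0 = X[y == 0].mean(axis=0)  # mean of class-0 examples
    mu1 = X[y == 1].mean(axis=0)  # mean of class-1 examples
    centered = X - np.where(y[:, None] == 0, mu0, mu1)
    Sigma = centered.T @ centered / m  # covariance shared by both classes
    return phi, mu0, mu1, Sigma

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (50, 2)), rng.normal(1.0, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
phi, mu0, mu1, Sigma = fit_gda(X, y)
```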

The softmax function transforms a vector of inputs, known as logits, into a probability vector whose entries are non-negative and sum to one. In multi-class classification it defines the class probabilities $P(y=i \mid x; \theta) = \exp(\theta_i^\top x) \,/\, \sum_{j=1}^{k} \exp(\theta_j^\top x)$. The CS229 Lecture Notes also introduce the Gaussian Discriminant Analysis (GDA) model, an alternative for classification problems with continuous-valued input features, which models $p(x \mid y)$ with a multivariate normal distribution. [S1, S2, S4]
