Sparse and Redundant Modeling of Image Content Using an Image-Signature-Dictionary
Authors: Michal Aharon and Michael Elad
Introduction to Sparse Representation
Background
- In sparse and redundant modeling, a signal $y \in R^n$ is represented as a linear combination of atoms taken from a dictionary $D \in R^{n \times m}$ which contains $m$ atoms $d_i \in R^n$. A representation of the signal $y$ is then any vector $x \in R^m$ satisfying $y = Dx$.
- When $m > n$, the representation is redundant.
- The equation $y = Dx$ then has infinitely many solutions, and we prefer the sparsest one, i.e., the one with the smallest ${\|x\|}_0$ (see the toy example after this list).
- The task of computing a representation of a signal can be formally described as $\min_x \|x\|_0$ s.t. $y = Dx$.
- However, the constraint above is often relaxed and replaced by ${\|y-Dx\|}_2 \le \epsilon$. This allows for additive noise and model deviations.
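A tiny numpy illustration of this setup (toy numbers of my own, not from the paper): with $m > n$ the system $y = Dx$ admits many exact representations, and the sparsest one is preferred.

```python
import numpy as np

# Toy redundant dictionary: n = 2, m = 3 atoms (columns), so y = Dx is underdetermined.
D = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
y = np.array([1.0, 1.0])

# Two exact representations of the same y:
x_dense  = np.array([1.0, 1.0, 0.0])   # ||x||_0 = 2
x_sparse = np.array([0.0, 0.0, 1.0])   # ||x||_0 = 1  <- the preferred, sparser one
assert np.allclose(D @ x_dense, y) and np.allclose(D @ x_sparse, y)
```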
Problem
- Solving this problem exactly has been proven to be NP-hard.
- Two approximation techniques:
  - Matching-pursuit (MP) methods, which build the solution one entry at a time in a greedy way.
  - The basis-pursuit (BP) algorithm, which replaces the $L_0$ norm with the $L_1$ norm.
- Two types of dictionaries:
  - Prespecified dictionaries.
  - Dictionaries obtained by a learning procedure.
This paper
- The image-signature-dictionary (ISD) is a 2D image in which every patch can serve as a representing atom.
- A near shift-invariance property is obtained, due to the overlap between atoms extracted from the ISD at nearby locations.
- By taking patches of varying sizes, near scale-invariance is potentially obtained and exploited.
The Structure of the ISD
- $D_s \in R^{\sqrt{m} \times \sqrt{m}}$ : the signature dictionary (ISD).
- $y' \in R^{\sqrt{n} \times \sqrt{n}}$ : an image patch; $y \in R^n$ is its single-column form.
- $d_s \in R^m$ : the vector obtained by a column lexicographic ordering of the ISD.
- $C_{[k,l]}$ : the linear operator that extracts a patch of size $\sqrt{n} \times \sqrt{n}$ from $D_s$ at location $[k,l]$ (sketched below).
- $C_{[k,l]} d_s$ : the extracted patch as a column vector.
- $x$ : the concatenation of the $m$ coefficients in the array $x_{[k,l]}$, so that a patch is represented as $y = \sum_{[k,l]} x_{[k,l]}\, C_{[k,l]} d_s$.
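As a concrete sketch of this notation, here is a minimal numpy version of the $C_{[k,l]}$ extraction operator. The function name `extract_patch` is mine, and the cyclic (wrap-around) boundary handling is an assumption, made so that every location $[k,l]$ of the ISD yields a full atom.

```python
import numpy as np

def extract_patch(D_s: np.ndarray, k: int, l: int, patch_side: int) -> np.ndarray:
    """Sketch of C_{[k,l]}: extract the patch_side x patch_side patch of the ISD
    at location [k, l] and return it as a column vector of length n.
    Assumes cyclic (wrap-around) indexing at the ISD boundary."""
    rows = np.arange(k, k + patch_side) % D_s.shape[0]
    cols = np.arange(l, l + patch_side) % D_s.shape[1]
    patch = D_s[np.ix_(rows, cols)]
    return patch.flatten(order="F")  # column lexicographic ordering
```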
ISD Training
Energy Function
We assume that each patch is represented by a fixed number of atoms, $L$. The energy function that the ISD training is expected to minimize:
$$\min_{d_s,\, \{x_i\}_{i=1}^N} \sum_{i=1}^N \Big\| y_i - \sum_{[k,l]} x_i[k,l]\, C_{[k,l]} d_s \Big\|_2^2 \quad \text{s.t.} \quad \|x_i\|_0 = L \ \ \forall i.$$
Two stages in training
- The sparse coding stage: find $\hat{x_i}$
  Assuming that $d_s$ is known and fixed, we can solve the inner optimization problem and find the sparsest representation $\hat{x_i}$ for each example $y_i$.
  This problem can be solved by any pursuit algorithm, such as the greedy orthogonal matching pursuit (OMP).
  At each stage, OMP selects the atom from the dictionary that best resembles the current residual. After each such selection, the signal is back-projected onto the set of chosen atoms, i.e., orthogonally projected onto their span, which re-fits all chosen coefficients jointly, and the new residual signal is calculated; see the sketch below.
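A minimal numpy sketch of generic OMP, not the paper's exact implementation; the name `omp` is mine, and `D` stands for an explicit matrix whose unit-norm columns are the atoms $C_{[k,l]} d_s$. The least-squares re-fit is the back-projection step.

```python
import numpy as np

def omp(D: np.ndarray, y: np.ndarray, L: int) -> np.ndarray:
    """Greedy OMP with L >= 1 atoms: pick the atom most correlated with the
    residual, re-fit y on all chosen atoms (back-projection), recompute the
    residual, repeat. Assumes the columns of D are unit-norm."""
    residual = y.astype(float).copy()
    support = []
    coef = np.zeros(0)
    for _ in range(L):
        # Atom that best resembles the current residual: max |<d_j, r>|.
        idx = int(np.argmax(np.abs(D.T @ residual)))
        support.append(idx)
        # Back-projection: orthogonal projection of y onto the chosen atoms.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x
```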
- Dictionary update stage: find $\hat{d_s}$
  Assuming that the sparse representation vectors $\hat{x_i}$ have been computed, we seek the best ISD $d_s$ to minimize
  $$f(d_s) = \sum_{i=1}^N \Big\| y_i - \sum_{[k,l]} \hat{x}_i[k,l]\, C_{[k,l]} d_s \Big\|_2^2 = \sum_{i=1}^N \| y_i - A_i d_s \|_2^2 .$$
  The gradient of the error expression:
  $$\frac{\partial f}{\partial d_s} = -2 \sum_{i=1}^N A_i^T \left( y_i - A_i d_s \right),$$
  where $A_i = \sum_{[k,l]} \hat{x}_i[k,l]\, C_{[k,l]}$.
  Setting this gradient to zero, the optimal ISD is obtained simply by
  $$\hat{d_s} = \Big( \sum_{i=1}^N A_i^T A_i \Big)^{-1} \sum_{i=1}^N A_i^T y_i .$$
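Since $f$ is quadratic in $d_s$, this closed-form update is ordinary least squares. A minimal sketch, assuming the operators $A_i$ are available as explicit matrices (the name `update_isd` is mine):

```python
import numpy as np

def update_isd(A_list: list, Y: list) -> np.ndarray:
    """Closed-form ISD update: with A_i = sum_{[k,l]} x_i[k,l] C_{[k,l]} fixed,
    minimizing sum_i ||y_i - A_i d_s||_2^2 over d_s is linear least squares."""
    m = A_list[0].shape[1]
    lhs = np.zeros((m, m))   # accumulates sum_i A_i^T A_i
    rhs = np.zeros(m)        # accumulates sum_i A_i^T y_i
    for A_i, y_i in zip(A_list, Y):
        lhs += A_i.T @ A_i
        rhs += A_i.T @ y_i
    # Solve the normal equations (assumes lhs is invertible).
    return np.linalg.solve(lhs, rhs)
```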
- Stochastic gradient approach (SG)
  - The algorithm above updates the $x_i$ for all examples $\{y_i\}_{i=1}^N$ and only then updates the ISD.
  - Another way is to update the ISD after the computation of each $x_i$ (SG), taking a gradient step on the single-example error (sketched below):
    $$d_s \leftarrow d_s + 2\mu\, A_i^T \left( y_i - A_i d_s \right).$$
    The step-size parameter $\mu$ is typically set as $\mu = \frac{\mu_0}{\text{Iteration}}$.
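A one-step sketch of the SG update under the same explicit-$A_i$ convention as above (the name `sg_step` is hypothetical):

```python
import numpy as np

def sg_step(d_s: np.ndarray, A_i: np.ndarray, y_i: np.ndarray,
            mu0: float, iteration: int) -> np.ndarray:
    """One stochastic-gradient step on ||y_i - A_i d_s||_2^2,
    with the decaying step size mu = mu0 / iteration."""
    mu = mu0 / iteration
    grad = -2.0 * A_i.T @ (y_i - A_i @ d_s)  # gradient of the single-example error
    return d_s - mu * grad
```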
- Treatment of the mean
  - Remove the mean from the training and testing patches.
  - Redefine $C_{[k,l]}$ to also remove the mean of the extracted patch (see the sketch below).
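A sketch of the mean-removing variant of the extraction operator, under the same assumptions (cyclic boundary, hypothetical name) as the earlier `extract_patch` snippet:

```python
import numpy as np

def extract_patch_zero_mean(D_s: np.ndarray, k: int, l: int,
                            patch_side: int) -> np.ndarray:
    """C_{[k,l]} redefined to subtract the mean of the extracted patch,
    so atoms (like the mean-removed patches they represent) are zero-mean."""
    rows = np.arange(k, k + patch_side) % D_s.shape[0]
    cols = np.arange(l, l + patch_side) % D_s.shape[1]
    p = D_s[np.ix_(rows, cols)].flatten(order="F")  # column lexicographic ordering
    return p - p.mean()
```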