Partial Least Squares (PLS) regression is popular in chemometrics, but less well known in the data stream community. It is a linear regression technique: the input data is projected into a lower dimensional space, and a regression model is fitted in that space.

PLS regression has properties that make it attractive for streaming data analysis: the model can be updated recursively as new observations arrive, and it can adapt to changes in the data distribution over time.

Batch model. There is no closed-form solution for fitting the model; instead, an iterative optimisation procedure (NIPALS) is used.

Let $\mathbf{X}$ be an $n \times r$ matrix of input data, where each row is an observation. Let $\mathbf{y}$ be an $n \times 1$ vector of the target values, corresponding to the observations in $\mathbf{X}$. Assume that the data is standardised prior to the analysis to zero mean and unit variance. Let $k$ be a parameter indicating the dimensionality of the projected data, such that $0 < k < r$.
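As a minimal sketch of the standardisation step, assuming the raw data sits in NumPy arrays (the synthetic arrays X_raw and y_raw below are purely illustrative):

import numpy as np

# illustrative synthetic data standing in for the raw inputs
rng = np.random.default_rng(0)
X_raw = rng.normal(size=(100, 5))
y_raw = X_raw @ rng.normal(size=5) + rng.normal(scale=0.1, size=100)

# standardise each column to zero mean and unit variance, as assumed above
X = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)
y = (y_raw - y_raw.mean()) / y_raw.std()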

Initialize: $\mathbf{E}_0 = \mathbf{X}$ and $\mathbf{u}_0 = \mathbf{y}$.

Loop: repeat the following steps for $i=1$ to $k$:

  1. \[\mathbf{w}_i = \mathbf{E}^T_{i-1}\mathbf{u}_{i-1}/(\mathbf{u}_{i-1}^T\mathbf{u}_{i-1})\]
  2. \[\mathbf{w}_i = \mathbf{w}_i/ \sqrt{\mathbf{w}_i^T\mathbf{w}_i}\]
  3. \[\mathbf{t} = \mathbf{E}_{i-1}\mathbf{w}_i\]
  4. \[q_i = \mathbf{u}^T_{i-1}\mathbf{t}/(\mathbf{t}^T\mathbf{t})\]
  5. \[\mathbf{p}_i = \mathbf{E}^T_{i-1}\mathbf{t}/(\mathbf{t}^T\mathbf{t})\]
  6. \[\mathbf{E}_i = \mathbf{E}_{i-1} - \mathbf{t}\mathbf{p}_i^T\]
  7. \[\mathbf{u}_i = \mathbf{u}_{i-1} - \mathbf{t}q_i\]

Collect the results into matrices:

  • \(\mathbf{W} = (\mathbf{w}_1,\ldots,\mathbf{w}_k)\),
  • \(\mathbf{P} = (\mathbf{p}_1,\ldots,\mathbf{p}_k)\), and
  • \(\mathbf{q} = (q_1,\ldots,q_k)^T\).

In Python the iterative optimisation procedure can be implemented as follows, assuming the data and the labels are NumPy arrays.

import numpy as np

def nipals(data, labels, k):
	r = np.shape(data)[1]
	W = np.zeros((r, k))
	P = np.zeros((r, k))
	Q = np.zeros(k)
	u = labels                                  # u_0 = y
	for i in range(k):
		w = np.dot(data.T, u) / np.dot(u.T, u)  # weight vector
		w = w / np.sqrt(np.dot(w.T, w))         # normalise to unit length
		t = np.dot(data, w)                     # score vector
		tt = np.dot(t.T, t)
		q = np.dot(u.T, t) / tt                 # regression coefficient for this component
		p = np.dot(data.T, t) / tt              # loading vector
		data = data - np.outer(t, p)            # deflate the inputs
		u = u - t * q                           # deflate the target
		W[:, i] = w
		P[:, i] = p
		Q[i] = q
	return P, W, Q

Prediction. A prediction for an unseen observation $\mathbf{x}$ (standardised in the same way as the training data) can be made as $\hat{y} = \mathbf{x}^T\beta$, where $\beta = \mathbf{W}(\mathbf{P}^T\mathbf{W})^{-1}\mathbf{q}$.

In Python the regression coefficients and a prediction for an observation x can be computed as follows.

beta = np.dot(np.dot(W, np.linalg.pinv(np.dot(P.T, W))), Q)   # regression coefficients
prediction = np.dot(x, beta)                                   # prediction for an observation x
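For example, the whole batch procedure can be exercised on synthetic data as in the sketch below; the data, the choice of $k = 3$, and the variable names are assumptions made only for illustration.

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
y = X @ rng.normal(size=6) + rng.normal(scale=0.1, size=200)

# standardise, fit k = 3 latent components, then predict the first observation
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = (y - y.mean()) / y.std()
P, W, Q = nipals(X, y, 3)
beta = np.dot(np.dot(W, np.linalg.pinv(np.dot(P.T, W))), Q)
print(np.dot(X[0], beta), y[0])   # predicted vs. actual target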

Online update. To update the regression model with new observations we need to store $\mathbf{P}$ and $\mathbf{q}$. When a new labelled observation $(\mathbf{x}, y)$ arrives, a training dataset is constructed as $\mathbf{E}_0 = (\mathbf{P},\mathbf{x})^T$ and $\mathbf{u}_0 = (\mathbf{q},y)^T$, and the batch learning procedure specified above is applied to this constructed dataset.

In Python:

def update_PLS(P, Q, data, labels):
	k = np.shape(P)[1]
	# stack the stored loadings with the new observation(s)
	data_new = np.vstack((P.T, data))
	label_new = np.hstack((Q, labels))
	# re-run the batch procedure on the compressed dataset
	P, W, Q = nipals(data_new, label_new, k)
	return P, W, Q
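As a usage sketch in a test-then-train loop (the synthetic stream, the initial batch size of 50, and $k = 3$ below are illustrative assumptions; the stream is drawn from a standard normal, so it is approximately standardised already):

rng = np.random.default_rng(2)
stream_X = rng.normal(size=(500, 6))
stream_y = stream_X @ rng.normal(size=6)

P, W, Q = nipals(stream_X[:50], stream_y[:50], 3)    # warm start on an initial batch
for x, y in zip(stream_X[50:], stream_y[50:]):
	beta = np.dot(np.dot(W, np.linalg.pinv(np.dot(P.T, W))), Q)
	y_hat = np.dot(x, beta)                          # predict first (test-then-train)
	P, W, Q = update_PLS(P, Q, x, y)                 # then update with the labelled observation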

Online adaptation. When the data distribution evolves over time, it may be useful to include a forgetting factor. In that case the update procedure is similar, but the training datasets are constructed as $\mathbf{E}_0 = (\alpha\mathbf{P},\mathbf{x})^T$ and $\mathbf{u}_0 = (\alpha\mathbf{q},y)^T$, where $\alpha \in (0,1)$ is the forgetting factor. $\alpha = 1$ would correspond to no forgetting at all. The lower $\alpha$, the faster the forgetting.

In Python:

def update_PLS(P, Q, data, labels, alfa):
	k = np.shape(P)[1]
	# down-weight the stored model by the forgetting factor before stacking
	data_new = np.vstack((alfa * P.T, data))
	label_new = np.hstack((alfa * Q, labels))
	P, W, Q = nipals(data_new, label_new, k)
	return P, W, Q
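With a forgetting factor, the update call in the streaming loop above would become, for example:

P, W, Q = update_PLS(P, Q, x, y, 0.98)   # alfa = 0.98 is an illustrative choice, not a recommendation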

More details can be found, for instance, in S. J. Qin, "Recursive PLS algorithms for adaptive data modeling", Computers & Chemical Engineering, 1998.