Gaussian Process Kernels

As I point out in http://www.jianping-lai.com/2017/03/10/guassian-process/, a sample from a Gaussian process can be written as $$\vec{x} = \vec{m} +\vec{\epsilon}$$, where $$\vec{\epsilon}=\bf{L}*\vec{u}$$ and $$\bf{L}$$ comes from decomposing the kernel (covariance) matrix.

For the linear kernel $$k(x_i,x_j) = x_i*x_j$$:

import numpy as np

# linear kernel
def k(a, b):
    return a * b

x = np.arange(0, 1.0, 0.2)
n = x.shape[0]

# build the covariance matrix c[i, j] = k(x_i, x_j)
c = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        c[i, j] = k(x[i], x[j])

# square root of c via SVD: z = A * sqrt(S), so z * z^T == c
A, S, B = np.linalg.svd(c, full_matrices=False)
z = np.matmul(A, np.sqrt(np.diag(S)))

# square root of c via Cholesky (the small diagonal jitter keeps c positive definite)
L = np.linalg.cholesky(c + 1e-15 * np.eye(n))

def Print(m):
    for row in m:
        for e in row:
            print('{:+.2f}'.format(e), end=' ')
        print()

Print(z)
print("########")
Print(L)

[simterm]
%%%%%%%%%%%% SVD %%%%%%%%%%%%%
+0.00 +0.00 +0.00 +0.00 +0.00
-0.20 +0.00 -0.00 -0.00 +0.00
-0.40 +0.00 +0.00 -0.00 +0.00
-0.60 -0.00 -0.00 -0.00 +0.00
-0.80 +0.00 +0.00 +0.00 +0.00
%%%%%%%%% Cholesky %%%%%%%%%%%
+0.00 +0.00 +0.00 +0.00 +0.00
+0.00 +0.20 +0.00 +0.00 +0.00
+0.00 +0.40 +0.00 +0.00 +0.00
+0.00 +0.60 +0.00 +0.00 +0.00
+0.00 +0.80 +0.00 +0.00 +0.00
[/simterm]

Both the SVD and the Cholesky decomposition lead to
$$x_0 = 0.0 u$$
$$x_1 = 0.2 u$$
$$x_2 = 0.4 u$$
$$x_3 = 0.6 u$$
$$x_4 = 0.8 u$$

since $$u_1 \sim u_2 \sim u \sim -u$$: the two factorizations differ only in the sign and position of the non-zero column, and a standard normal $$u$$ has the same distribution as $$-u$$. Every sample is therefore a straight line through the origin with a random slope $$u$$.
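
To make the straight-line claim concrete, here is a quick check (my own addition, reusing x, n, and L from the snippet above):

# draw a few samples eps = L * u; each one lies on a line through the origin,
# so eps[i] / x[i] is the same random slope for every i > 0 (x[0] = 0 is skipped)
for _ in range(3):
    u = np.random.randn(n)
    eps = np.matmul(L, u)
    print(eps[1:] / x[1:])   # all entries of a printed row agree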

import numpy as np

# Brownian-motion kernel
def k(a, b):
    return min(a, b)

x = np.arange(0, 1.0, 0.1)
n = x.shape[0]

# build the covariance matrix c[i, j] = k(x_i, x_j)
c = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        c[i, j] = k(x[i], x[j])

# square roots of c via SVD and via Cholesky, as before
A, S, B = np.linalg.svd(c, full_matrices=False)
z = np.matmul(A, np.sqrt(np.diag(S)))
L = np.linalg.cholesky(c + 1e-15 * np.eye(n))

# squared distance between consecutive rows of the factor matrix:
# this equals the variance of eps_{i+1} - eps_i
def diff(m):
    for idx in range(m.shape[0] - 1):
        print(np.sum(np.square(m[idx + 1] - m[idx])))

diff(z)
print("########")
diff(L)

[simterm]
%%%%%%%%%%%% SVD %%%%%%%%%%%%%
0.1
0.1
0.1
0.1
0.1
0.1
0.1
0.1
0.1
%%%%%%%%% Cholesky %%%%%%%%%%%
0.1
0.1
0.1
0.1
0.1
0.1
0.1
0.1
0.1
[/simterm]
This result is essentially saying:

the difference between the values at two neighbouring data points, $$\epsilon_{i+1} - \epsilon_{i}$$ (the subscript indexes the data points in order), has a constant variance of 0.1, i.e. a constant standard deviation of $$\sqrt{0.1}$$. This gives a random but continuous path; the $$\min(a,b)$$ kernel is in fact the covariance of Brownian motion.
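
As a quick empirical check of this claim (my own addition, reusing n and L from the snippet above):

# draw 100000 sample paths eps = L * u at once, one path per column
samples = np.matmul(L, np.random.randn(n, 100000))
increments = samples[1:] - samples[:-1]   # eps_{i+1} - eps_i along each path
print(np.var(increments, axis=1))         # every entry should be close to 0.1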

Gaussian Process

For a Gaussian process, the SVD and the Cholesky decomposition are equivalent: both yield a square root $$\bf{M}$$ of the covariance matrix, with $$\bf{M}*\bf{M}^T = c$$. The two square roots are not equal in general, even when the covariance matrix is positive definite, but they differ only by an orthogonal transform, which leaves the distribution of $$\bf{M}*\vec{u}$$ unchanged.

import numpy as np

# c is the covariance (kernel) matrix built above
u, s, v = np.linalg.svd(c)
# c == np.matmul(np.matmul(u, np.diag(s)), v)
# c is symmetric, so v == u.T and np.matmul(u, np.diag(s)) == np.matmul(np.diag(s), v).T
L = np.linalg.cholesky(c + 1e-15 * np.eye(c.shape[0]))
# the small diagonal jitter is usually needed for numerical stability
# c == np.matmul(L, L.T)
# np.matmul(u, np.sqrt(np.diag(s))) is also a square root of c;
# it matches L up to an orthogonal transform
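
A numerical sanity check of that last comment (my own addition): both factors reproduce the same covariance matrix.

M = np.matmul(u, np.sqrt(np.diag(s)))     # square root of c from the SVD
print(np.allclose(np.matmul(M, M.T), c))  # True
print(np.allclose(np.matmul(L, L.T), c))  # True (up to the 1e-15 jitter)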

As a random variable,

$$\vec{x} =\vec{m}+\vec{\epsilon} $$
$$\vec{\epsilon} = \bf{L}*\vec{u}$$
where $$\bf{L}$$ is a square root of the covariance matrix ($$\bf{L}*\bf{L}^T$$ equals the covariance) and $$\vec{u}\sim N(\vec{0},\bf{I})$$.
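
Putting everything together, a minimal end-to-end sketch (my own addition; the zero mean vector and the grid are arbitrary choices) that draws one sample path from the $$\min(a,b)$$ kernel:

import numpy as np

x = np.arange(0, 1.0, 0.1)
n = x.shape[0]
c = np.minimum.outer(x, x)                     # vectorized c[i, j] = min(x_i, x_j)
L = np.linalg.cholesky(c + 1e-15 * np.eye(n))  # square root of c
m = np.zeros(n)                                # zero mean vector
u = np.random.randn(n)                         # u ~ N(0, I)
sample = m + np.matmul(L, u)                   # x = m + L * u: a random but continuous path
print(sample)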