CSE 569 – HW4 – Fall 2018
Aditya Vikram Sharma (1215126588)
November 26, 2018
1 K-Means
The K-Means algorithm clusters the points of a dataset into multiple clusters based on the value of 'K'. In this assignment, two datasets are given, containing 1600 and 1500 points respectively, and the simulations have been performed on both. First, to find the optimal value of 'K', the elbow rule is used: for a fixed random initialization, the Sum of Squared Errors divided by the total number of points is plotted against 'K', and the optimal 'K' is the point where the curve bends like an elbow. Next, fixing this 'K', the algorithm is run for several values of 'r', the random initialization, and 'r' versus (Sum of Squared Errors)/(Total number of points) is plotted to identify the best random initialization. The clustering obtained with that initialization is then depicted.
The algorithm implemented for K-Means in this assignment is shown here:

Algorithm 1: K-Means
    k <- GetInput
    for i <- 0 to k do
        Centroid_i <- RandomInitialization
    end
    newCentroid <- Centroid
    for j <- 0 to MaxIter do
        for x <- 0 to TotPoints do
            for y <- 0 to k do
                dist_y <- EucDist(Centroid_y, Points_x)
            end
            ind <- Index(min(dist))
            Cluster_ind <- Cluster_ind + Points_x
        end
        newCentroid <- avg(Cluster)
        if (newCentroid != Centroid) then
            Centroid <- newCentroid
            continue
        else
            break
        end
    end

For both datasets, the simulation is performed and the corresponding plots are shown in the following subsections.

1.1 Dataset 1
This dataset contains 1600 points in total. A depiction of the dataset is shown in the figure below.

Figure 1: Dataset 1

The optimal 'K' value for this dataset was found using the elbow rule, by plotting 'K' against 'Sum of Squared Errors / Total Points' for a fixed random initialization. The optimal value of 'K' turned out to be 2, as can be seen in the figure below.

Figure 2: Elbow Rule
Now that K is fixed to 2, the algorithm is run 'r' times with different random initializations, where 'r' is set to 5, to find the random initialization with the lowest Sum of Squared Errors / Total Points value; the corresponding graph is shown in Figure 3.

Figure 3: Random Initialization vs SSE/Total Points

From the above plot, the SSE / Total Points values for the five random initializations are:
1.762324974077466, 1.762324974077466, 1.7622818440138992, 1.762324974077466, 1.7622793729899464,
so Random Initialization 5 has the lowest SSE / Total Points value.
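Selecting the best initialization is then simply a matter of taking the minimum; a small illustrative snippet using the values above:

# Pick the r value whose SSE/Total Points is smallest (values from the plot above).
sse_values = {1: 1.762324974077466, 2: 1.762324974077466,
              3: 1.7622818440138992, 4: 1.762324974077466,
              5: 1.7622793729899464}
best_r = min(sse_values, key=sse_values.get)
print(best_r)  # -> 5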
Using this random initialization, the points are clustered with K = 2; the result is shown below.

Figure 4: Clustering of Dataset 1 for K = 2

With this random initialization and K value, the algorithm converged at iteration 4.

1.2 Dataset 2
This dataset contains 1500 points in total. A depiction of the dataset is shown in the figure below.

Figure 5: Dataset 2

The optimal 'K' value for this dataset was found using the elbow rule, by plotting 'K' against 'Sum of Squared Errors / Total Points' for a fixed random initialization. The optimal value of 'K' turned out to be 3, as can be seen in the figure below.

Figure 6: Elbow Rule
Now that K is fixed to 3, the algorithm is run 'r' times with different random initializations, where 'r' is set to 5, to find the random initialization with the lowest Sum of Squared Errors / Total Points value; the corresponding graph is shown in Figure 7.

Figure 7: Random Initialization vs SSE/Total Points

From the above plot, the SSE / Total Points values for the five random initializations are:
1.0261745127196509, 1.026653218697321, 1.1787796700494195, 1.178782241593976, 1.0261849230725586,
so Random Initialization 1 has the lowest SSE / Total Points value.
Using this random initialization, the points are clustered with K = 3; the result is shown below.

Figure 8: Clustering of Dataset 2 for K = 3

With this random initialization and K value, the algorithm converged at iteration 8.
The Python code for the K-Means implementation is shown on the next page.

# -*- coding: utf-8 -*-
"""
Created on Wed Nov 21 11:09:15 2018
@author: 1796a
"""
import matplotlib.pyplot as plt
import matplotlib.pyplot as pltt
import numpy as np
from collections import defaultdict
from copy import deepcopy

file = open(r"C:\Users\1796a\Documents\Sem 1\FSL\Assignment Codes\Assignment 4\Dataset_2.txt")
X = np.loadtxt(file)

def euc_dist(X, Y):  # Function to calculate the Euclidean distance
    return np.linalg.norm(X - Y)

def sse(X, Y):  # Function to calculate the squared error of one coordinate
    return (X - Y) ** 2

def kmeansClust(k=3, i=1):
    """
    C is a dictionary which stores the centroids.
    M is the input randomly permuted, from which the initial centroids are taken.
    d is a list of Euclidean distances between each centroid and a point in the dataset.
    Cluster is a dictionary which appends each point to the index of its closest centroid.
    PrevCent stores the previous centroids, to check whether they have changed
    after each iteration.
    """
    C = defaultdict(list)
    size = len(X)
    np.random.seed(i)
    M = np.take(X, np.random.permutation(X.shape[0]), axis=0)
    plt.scatter(X[:, 0], X[:, 1], s=25, cmap='viridis')
    plt.savefig(r"C:\Users\1796a\Documents\Sem 1\FSL\Assignment Codes\Assignment 4\initialPoints2.png")
    for i in range(0, k):
        C[i] = M[i]
    max_iter = 0
    PrevCent = deepcopy(C)
    list(C)
    for _ in range(50):  # Run the loop a maximum of 50 times
        max_iter = max_iter + 1
        Cluster = defaultdict(list)
        for i in range(0, len(X)):  # Assign each point to its nearest centroid
            d = list()
            for j in range(0, k):
                d.append(euc_dist(C[j], X[i]))
            ind = d.index(min(d))
            Cluster[ind].append(X[i])
        for cl in range(0, k):  # Recompute each centroid as the mean of its cluster
            C[cl][0] = 0
            C[cl][1] = 0
            for l in range(0, len(Cluster[cl])):
                C[cl][0] = C[cl][0] + Cluster[cl][l][0] / len(Cluster[cl])
                C[cl][1] = C[cl][1] + Cluster[cl][l][1] / len(Cluster[cl])
        for p, c in zip(PrevCent, C):  # Check whether the centroids have changed
            if list(PrevCent[p]) != list(C[c]):
                PrevCent = deepcopy(C)
                flag = 0
                break
            else:
                print("Converged at Iteration ", max_iter)
                flag = 1
                break
        if flag == 1:
            break
    Colors = ['b', 'y', 'g', 'c', 'm', 'k', 'w', 'r']
    d = 0
    for val in range(0, k):
        for p in range(0, len(Cluster[val])):
            pltt.scatter(Cluster[val][p][0], Cluster[val][p][1], s=25, c=Colors[val])
            # Calculate SSE here
            d = d + sse(Cluster[val][p][0], C[val][0]) + sse(Cluster[val][p][1], C[val][1])
        pltt.scatter(C[val][0], C[val][1], marker='*', c='black', s=200, alpha=0.5)
    pltt.savefig(r"C:\Users\1796a\Documents\Sem 1\FSL\Assignment Codes\Assignment 4\Cluster_3_r1_Graph.png")
    pltt.show()
    return d / size

def main():
    Val = {}
    k = int(input("Enter number of Clusters: "))
    r = int(input("Enter max r value: "))
    for i in range(1, r + 1):  # Run K-Means for r different random initializations
        print("R = ", i)
        Val[i] = kmeansClust(k=k, i=i)
    print(Val)
    lists = sorted(Val.items())
    x, y = zip(*lists)
    plt.plot(x, y)
    plt.xlabel("Random Initialization")
    plt.ylabel("Sum of Squared Errors")
    plt.gcf().subplots_adjust(left=0.15)
    plt.savefig(r"C:\Users\1796a\Documents\Sem 1\FSL\Assignment Codes\Assignment 4\SSE_K_Graph2.png")
    plt.show()

if __name__ == "__main__":
    main()

2 Gaussian Mixture Model
A Gaussian Mixture Model represents a distribution as a weighted sum of Gaussian densities. In this task, the EM algorithm for a GMM is run on the two datasets, which contain 1600 and 1500 points respectively. The EM algorithm alternates two steps: in the E-step (expectation step), the responsibilities of the components are evaluated using the current parameters, and in the M-step (maximization step), the log-likelihood is maximized and the parameters are re-estimated. This process is repeated until the algorithm converges, i.e., until the estimated parameters stop changing between iterations.
The algorithm implemented is shown below.

Algorithm 2: Gaussian Mixture Model - EM Algorithm
    mu <- Mean
    Sigma <- Covariance
    if (initialization == kmeans) then
        mu <- Centroid(KMeans)
    iniLog <- LogLikelihood
    while (Algo != Converged) do
        for i <- 0 to len(df) do
            C_i <- PMF(Data)
            C_i <- C_i / Sum(C_i)
            Val.append(C_i)
        end
        for j <- 0 to len(Val) do
            C_j <- Maximize(C)
            NewVal.append(C_j)
            mu_j <- NewVal / (Val + NewVal)
        end
        mu_n <- Sum_j(mu_j)
        diff <- euclidean(mu, mu_n)
        mu <- mu_n
    end

The task has two stages: first, the mean vectors and covariances are initialized randomly; second, the K-Means centroids from the previous task are used as the initial mean vectors and covariances. This is done for both datasets, and the plots are shown in the respective subsections.
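For reference, the standard EM update equations for a K-component Gaussian mixture are sketched below in LaTeX (this is the textbook form; the code listed later in this section approximates the E-step with a hard assignment and uses diagonal covariances):

\gamma_{ik} = \frac{\pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)} \qquad \text{(E-step: responsibilities)}

N_k = \sum_{i=1}^{N} \gamma_{ik}, \qquad \pi_k = \frac{N_k}{N}, \qquad \mu_k = \frac{1}{N_k}\sum_{i=1}^{N} \gamma_{ik}\, x_i, \qquad \Sigma_k = \frac{1}{N_k}\sum_{i=1}^{N} \gamma_{ik}\,(x_i-\mu_k)(x_i-\mu_k)^{\top} \qquad \text{(M-step)}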
2.1 Dataset 1
This dataset contains 1600 points in total. A depiction of the dataset is shown in the figure below.

Figure 9: Dataset 1

2.1.1 Random Initialization
Here, the mean vectors and covariances are initialized randomly for K = 2 and used to run the EM algorithm. The resulting clustering and the log-likelihood curve can be seen below.

Figure 10: GMM Clustering with Random Initialization

Figure 11: Log Likelihood Graph for Random Initialization

As the figures show, the algorithm converges well, and the log-likelihood increases steadily over the iterations (Figure 11). The plot and log-likelihood graph for K-Means initialization are shown on the next page.

2.1.2 K-Means Initialization
In this phase, the mean vectors and covariances are initialized using the K-Means clustering with K = 2 performed in the previous task; the centroids of the K clusters are used for the initialization.

Figure 12: GMM Clustering with K-Means Initialization

Figure 13: Log Likelihood Graph for K-Means Initialization

As is visible, both initializations work well for the EM algorithm. However, K-Means initialization is theoretically preferable, since the centroids are already available and the EM algorithm can converge more quickly.
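A small sketch of what this K-Means initialization amounts to is shown below; the helper name init_from_kmeans and the inputs centroids and clusters are illustrative (they stand for the output of the K-Means run above), while the parameter keys mirror those used in the GMM code at the end of this section.

# Hedged sketch: seed the GMM parameters from a K-Means result (K = 2).
import numpy as np

def init_from_kmeans(centroids, clusters, n_points):
    params = {}
    params['myu1'] = list(centroids[0])        # cluster-1 centroid as the first mean
    params['myu2'] = list(centroids[1])        # cluster-2 centroid as the second mean
    c1 = np.array(clusters[0])
    c2 = np.array(clusters[1])
    params['s1'] = [[c1[:, 0].std(), 0], [0, c1[:, 1].std()]]   # diagonal covariance
    params['s2'] = [[c2[:, 0].std(), 0], [0, c2[:, 1].std()]]
    params['lambda'] = [len(c1) / n_points, len(c2) / n_points]  # mixing weights
    return params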

2.2 Dataset 2
This dataset contains 1500 points in total. A depiction of the dataset is shown in the figure below.

Figure 14: Dataset 2
2.2.1 Random Initialization
Here, the mean vectors and covariances are initialized randomly for K = 3 and used to run the EM algorithm. The resulting clustering and the log-likelihood curve can be seen below.

Figure 15: GMM Clustering with Random Initialization

As the figures show, the algorithm converges well, and the log-likelihood increases steadily over the iterations (Figure 16). The plot and log-likelihood graph for K-Means initialization are shown on the next page.

Figure 16: Log Likelihood Graph for Random Initialization

2.2.2 K-Means Initialization
In this phase, the mean vectors and covariances are initialized using the K-Means clustering with K = 3 performed in the previous task; the centroids of the K clusters are used for the initialization.

Figure 17: GMM Clustering with K-Means Initialization

Figure 18: Log Likelihood Graph for K-Means Initialization

As with Dataset 1, both initializations work well for the EM algorithm; however, K-Means initialization is theoretically preferable, since the centroids are already available and the EM algorithm can converge more quickly. The code, implemented in Python 3.6, is shown on the next page.

# -*- coding: utf-8 -*-
"""
Created on Mon Nov 26 23:40:23 2018
@author: 1796a
"""
import numpy as np
import pandas as pd
import random as rand
import matplotlib.pyplot as plt
from scipy.stats import norm
from sys import maxsize

np.random.seed(42)
file = open(r"C:\Users\1796a\Documents\Sem 1\FSL\Assignment Codes\Assignment 4\Dataset_1.txt")
X = np.loadtxt(file)
xs, ys = X[:, 0], X[:, 1]               # the two coordinate columns
labels = ([1] * 800) + ([2] * 800)      # 1600 samples
data = {'x': xs, 'y': ys, 'label': labels}
df = pd.DataFrame(data=data)
fig = plt.figure()
plt.scatter(data['x'], data['y'], 24, c=data['label'])

def prob(val, myu, s, lam):
    # Weighted probability of a point under a diagonal Gaussian (lam is the mixing weight)
    p = lam
    for i in range(len(val)):
        p *= norm.pdf(val[i], myu[i], s[i][i])
    return p

def expectation(dF, param):
    # E-step: assign each point to the component with the higher weighted density
    for i in range(dF.shape[0]):
        x = dF['x'][i]
        y = dF['y'][i]
        p_cluster1 = prob([x, y], list(param['myu1']), list(param['s1']), param['lambda'][0])
        p_cluster2 = prob([x, y], list(param['myu2']), list(param['s2']), param['lambda'][1])
        if p_cluster1 > p_cluster2:
            dF.loc[i, 'label'] = 1
        else:
            dF.loc[i, 'label'] = 2
    return dF

def maximization(dF, param):
    # M-step: re-estimate mixing weights, means and (diagonal) covariances
    pts_cluster1 = dF[dF['label'] == 1]
    pts_cluster2 = dF[dF['label'] == 2]
    per_cluster1 = len(pts_cluster1) / float(len(dF))
    per_cluster2 = 1 - per_cluster1
    param['lambda'] = [per_cluster1, per_cluster2]
    param['myu1'] = [pts_cluster1['x'].mean(), pts_cluster1['y'].mean()]
    param['myu2'] = [pts_cluster2['x'].mean(), pts_cluster2['y'].mean()]
    param['s1'] = [[pts_cluster1['x'].std(), 0], [0, pts_cluster1['y'].std()]]
    param['s2'] = [[pts_cluster2['x'].std(), 0], [0, pts_cluster2['y'].std()]]
    return param

def dist(old_params, new_params):
    # Euclidean distance between successive mean estimates, used as the convergence check
    dist = 0
    for param in ['myu1', 'myu2']:
        for i in range(len(old_params)):
            dist += (old_params[param][i] - new_params[param][i]) ** 2
    return dist ** 0.5

value = maxsize
epsilon = 0.01
iters = 0
df_copy = df.copy()
df_copy['label'] = list(map(lambda x: x + 1, np.random.choice(2, len(df))))
params = pd.DataFrame(guess)  # 'guess' is the dictionary of initial parameter
                              # estimates (random, or taken from the K-Means centroids)
while value > epsilon:
    iters += 1
    newVal = expectation(df_copy.copy(), params)
    newParams = maximization(newVal, params.copy())
    value = dist(params, newParams)
    print("iteration {}, value {}".format(iters, value))
    df_copy = newVal
    params = newParams
fig = plt.figure()
plt.scatter(df_copy['x'], df_copy['y'], 24, c=df_copy['label'])

3 Binomial Mixture Model (Extra Credit)
A Binomial Mixture Model over two unknown coins can be fit using the Expectation-Maximization algorithm. The dataset consists of tosses of two coins, 'C1' and 'C2', arranged in 1000 sets of 100 tosses each; each set corresponds to a single coin, but the coin label is not given. We can therefore use Expectation-Maximization to estimate the missing parameters, i.e., the probabilities of heads for coins C1 and C2.
The notation used is:
\theta_1 = probability of coin C1 landing heads
\theta_2 = probability of coin C2 landing heads
For the simulation, we initially set \theta_1 = 0.6 and \theta_2 = 0.5, as in [1]. Instead of deciding which coin generated each set of tosses, it is better to calculate the probability of each possible completion of the missing values (the coin label in this case).
Let us consider the first set (first row) in the dataset. There are 61 heads (1's) and 39 tails (0's).

P(61\,H \mid C_1) = \binom{100}{61}\,\theta_1^{61}\,(1-\theta_1)^{39} = 0.0798 \qquad (1)

P(61\,H \mid C_2) = \binom{100}{61}\,\theta_2^{61}\,(1-\theta_2)^{39} = 0.00711 \qquad (2)

Normalizing the two likelihoods from equations (1) and (2) gives the weights

W_1 = \frac{(1)}{(1) + (2)} = 0.9181 \qquad (3)

W_2 = \frac{(2)}{(1) + (2)} = 0.0819 \qquad (4)
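These numbers can be reproduced with a few lines of Python; scipy.stats.binom.pmf already includes the binomial coefficient:

# Quick numeric check of equations (1)-(4) for the first row: 61 heads out of 100.
from scipy.stats import binom

theta1, theta2 = 0.6, 0.5
p1 = binom.pmf(61, 100, theta1)   # ~0.0798, equation (1)
p2 = binom.pmf(61, 100, theta2)   # ~0.00711, equation (2)
w1 = p1 / (p1 + p2)               # ~0.918, equation (3)
w2 = p2 / (p1 + p2)               # ~0.082, equation (4)
print(p1, p2, w1, w2)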
Since each set of data (row) is generated by a single coin, we have (X_1, X_2, X_3, \ldots, X_{1000}) as the observed data, where each X_i is a vector of 100 coin tosses, and we denote the hidden coin labels by (Z_1, Z_2, Z_3, \ldots, Z_{1000}).

We know that

L(\theta \mid X) = P(X \mid \theta) = \sum_{Z} P(X, Z \mid \theta) \qquad (5)
If h denotes the number of heads in the i-th set, then

P(Z_i \mid X_i, \theta) = \frac{P(X_i, Z_i \mid \theta)}{P(X_i \mid \theta)} \qquad (6)

where

P(X_i \mid \theta) = \binom{100}{h}\,\theta_1^{h}\,(1-\theta_1)^{100-h} + \binom{100}{h}\,\theta_2^{h}\,(1-\theta_2)^{100-h} \qquad (7)

and

Z_i \in \{C_1, C_2\} \qquad (8)
Since E(H) = n \cdot p (with n = 100 here), the expected number of heads attributable to coin C_1 is

E(\text{no. of heads from } C_1 \mid X_i, \theta_1) = 100 \cdot P(Z_i = C_1 \mid X_i, \theta_1) \qquad (9)

Therefore, at the Expectation step (E-step) of the EM algorithm we evaluate the binomial probability mass function for both coins C1 and C2 and compute the normalized responsibility of each coin for every set of 100 tosses (every row).
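A hedged sketch of this E-step is shown below; the function and variable names are illustrative (they are not the exact ones used in the simulation), and rows is assumed to be a 1000 x 100 array of 0/1 tosses.

# E-step sketch: responsibility of each coin for every row of 100 tosses.
import numpy as np
from scipy.stats import binom

def e_step(rows, theta1, theta2):
    heads = rows.sum(axis=1)                 # number of heads per row (n = 100)
    p1 = binom.pmf(heads, 100, theta1)       # likelihood of each row under coin C1
    p2 = binom.pmf(heads, 100, theta2)       # likelihood of each row under coin C2
    w1 = p1 / (p1 + p2)                      # responsibility of C1 for each row
    w2 = p2 / (p1 + p2)                      # responsibility of C2 for each row
    return w1, w2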

After the expectation step, we perform the Maximization step, in which the log-likelihood is maximized. At each iteration, the new \theta_1 value is obtained by normalizing the expected number of heads attributed to coin C1, summed over all 1000 sets, by the total expected number of heads and tails attributed to C1; the \theta_2 value for coin C2 is computed in the same way. The \theta values are updated at each iteration, and the process halts when the difference between the current and previous \theta values drops below a threshold, set to 1e-6 in this algorithm.
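A matching sketch of the M-step and the stopping rule described above, again with illustrative names and reusing the e_step function from the previous sketch:

# M-step sketch plus the convergence loop described in the text.
import numpy as np

def m_step(rows, w1, w2):
    heads = rows.sum(axis=1)
    tails = 100 - heads
    # expected heads/tails attributed to each coin, summed over all 1000 rows
    theta1 = np.sum(w1 * heads) / np.sum(w1 * heads + w1 * tails)
    theta2 = np.sum(w2 * heads) / np.sum(w2 * heads + w2 * tails)
    return theta1, theta2

def run_em(rows, theta1=0.6, theta2=0.5, lim=1e-6, max_iter=10000):
    for _ in range(max_iter):
        w1, w2 = e_step(rows, theta1, theta2)
        new1, new2 = m_step(rows, w1, w2)
        diff = np.hypot(new1 - theta1, new2 - theta2)  # Euclidean distance between updates
        theta1, theta2 = new1, new2
        if diff < lim:                                 # threshold of 1e-6 from the text
            break
    return theta1, theta2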
The algorithm implemented for this problem is given below.

Algorithm 3: Binomial Mixture Model - EM Algorithm
    df <- LoadDataset
    theta_1 <- 0.6
    theta_2 <- 0.5
    theta <- (theta_1, theta_2)
    lim <- 1e-6
    maxIter <- 10000
    while (diff > lim and iter < maxIter) do
        for i <- 0 to len(df) do
            C1_h <- BinomialPMF(C1)
            C2_h <- BinomialPMF(C2)
            C1_h <- C1_h / (C1_h + C2_h)
            C2_h <- C2_h / (C1_h + C2_h)
            Val.append(C1_h, C2_h)
        end
        for j <- 0 to len(Val) do
            C1h <- expected no. of heads for C1
            C1t <- expected no. of tails for C1
            C2h <- expected no. of heads for C2
            C2t <- expected no. of tails for C2
            NewVal.append(C1h, C2h)
        end
        theta_n1 <- Sum(C1h) / (Sum(C1h) + Sum(C1t))
        theta_n2 <- Sum(C2h) / (Sum(C2h) + Sum(C2t))
        theta_n <- (theta_n1, theta_n2)
        diff <- euclidean(theta, theta_n)
        theta <- theta_n
    end

The simulation of this algorithm was performed in Python 3.6. After running the simulations, the results obtained were
\theta_1 = 0.5488
\theta_2 = 0.4527
and it was observed that, even when the initial \theta values are changed, the final values converge to the values given above. Screenshots of the simulation are shown in the figures on the next page.

The update of the parameters over the iterations is shown in Figure 19.

Figure 19: Parameter updates over the iterations

The final values for both coins, i.e., the probability of heads for coins C1 and C2, are shown in Figure 20.

Figure 20: Final converged parameter values

A simulation over the dataset using the EM algorithm in Python 3.6, with initial values 0.6 and 0.5 for the two coins, produced final values of 0.5488 and 0.4527 respectively.
References
[1] https://www.nature.com/articles/nbt1406