Internals of kMeansTrain

This document describes the structure and execution of k-means training, implemented as the kMeansTrain function.

Figure: k-means training structure (images/kMeansTrain.png)

kMeansTrain fits new centers with naive (Lloyd's) k-means, using the existing samples and the initial centers provided by the user. To accelerate training, DV elements of a sample are fed in at the same time and used both to compute the distances to KU centers and to update the centers. Static configurations are set by template parameters and dynamic configurations by the arguments of the API; the dynamic values must not be greater than the static ones.
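As a rough illustration of the DV/KU blocking described above, the following C++ is a plain software model of one naive k-means iteration. It is not the library's HLS code: the function name kmeansIterate, the row-major layout, and the rule that an empty cluster keeps its old center are assumptions made for this sketch. Each step takes a DV-wide slice of a sample and accumulates partial squared distances against a block of KU centers; then the sample is assigned to its nearest center and the centers are recomputed as means.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical software model (not the library API) of one naive k-means
// iteration. The loop nest mirrors the blocking in the text: each step takes a
// DV-wide slice of the sample and compares it with a block of KU centers.
// In hardware these inner loops would be unrolled; here they run serially.
template <typename T, int KU, int DV>
std::vector<T> kmeansIterate(const std::vector<T>& samples,  // n x dim, row-major
                             const std::vector<T>& centers,  // k x dim, row-major
                             std::size_t n, std::size_t dim, std::size_t k) {
    assert(samples.size() == n * dim && centers.size() == k * dim && k > 0);
    std::vector<T> sums(k * dim, T(0));
    std::vector<std::size_t> counts(k, 0);
    std::vector<T> dist(k);

    for (std::size_t s = 0; s < n; ++s) {
        const T* x = &samples[s * dim];
        std::fill(dist.begin(), dist.end(), T(0));
        for (std::size_t d0 = 0; d0 < dim; d0 += DV) {      // DV elements per step
            for (std::size_t c0 = 0; c0 < k; c0 += KU) {    // KU centers per step
                for (int u = 0; u < KU && c0 + u < k; ++u) {
                    const std::size_t c = c0 + u;
                    for (int v = 0; v < DV && d0 + v < dim; ++v) {
                        const std::size_t d = d0 + v;
                        const T diff = x[d] - centers[c * dim + d];
                        dist[c] += diff * diff;  // partial squared distance
                    }
                }
            }
        }
        // Assign the sample to its nearest center and accumulate it.
        const std::size_t best = static_cast<std::size_t>(
            std::min_element(dist.begin(), dist.end()) - dist.begin());
        ++counts[best];
        for (std::size_t d = 0; d < dim; ++d) sums[best * dim + d] += x[d];
    }

    // Update step: each new center is the mean of its assigned samples;
    // a center with no samples keeps its old value (an assumption here).
    std::vector<T> next(centers);
    for (std::size_t c = 0; c < k; ++c)
        if (counts[c] > 0)
            for (std::size_t d = 0; d < dim; ++d)
                next[c * dim + d] = sums[c * dim + d] / static_cast<T>(counts[c]);
    return next;
}
```

For example, kmeansIterate<double, 16, 8>(samples, centers, n, dim, k) uses the unroll factors of the benchmark below. In this model KU and DV only change how the work is partitioned, not the resulting centers; tail blocks are handled by the bounds checks when dim or k is not a multiple of DV or KU.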

kMeansTrain is applicable under the following conditions:

1. Dim*Kcluster must not exceed a fixed value. For example, on U250 with centers stored in URAM, Dim*Kcluster <= 1024*1024 for float centers and <= 1024*512 for double centers.

2. KU and DV must be configured according to the URAM limits. For example, KU*DV=128 when centers are stored in URAM on U250.

3. The dynamic configurations should be close to the static ones to avoid unnecessary internal computation.
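The conditions above can be encoded as compile-time and run-time checks. The helpers below (maxDimTimesK, centersFit, unrollFitsUram, dynamicWithinStatic, usefulFraction) are hypothetical and not part of the library; they only restate the U250/URAM figures quoted in the text. Note that both Dim*Kcluster limits correspond to the same 4 MiB of center storage (1024*1024 floats of 4 bytes, 1024*512 doubles of 8 bytes). The usefulFraction model for condition 3 assumes that the kernel's work is sized by the static configuration.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical helpers (not part of the library) that encode the applicability
// conditions quoted above for centers stored in URAM on U250.

// Condition 1: documented upper bound on Dim*Kcluster for the center type.
template <typename T>
constexpr std::size_t maxDimTimesK() {
    static_assert(std::is_same<T, float>::value || std::is_same<T, double>::value,
                  "only float and double limits are documented");
    return std::is_same<T, float>::value ? std::size_t(1024) * 1024
                                         : std::size_t(1024) * 512;
}

template <typename T>
constexpr bool centersFit(std::size_t dim, std::size_t k) {
    return dim * k <= maxDimTimesK<T>();
}

// Both limits describe the same 4 MiB of center storage.
static_assert(maxDimTimesK<float>() * sizeof(float) ==
                  maxDimTimesK<double>() * sizeof(double),
              "float and double limits should cover the same bytes");

// Condition 2: the U250 example requires KU * DV == 128.
template <int KU, int DV>
constexpr bool unrollFitsUram() { return KU * DV == 128; }

// Static sizes come from template parameters, dynamic sizes from runtime
// arguments; the dynamic ones must not exceed the static ones.
constexpr bool dynamicWithinStatic(std::size_t dynDim, std::size_t dynK,
                                   std::size_t staticDim, std::size_t staticK) {
    return dynDim <= staticDim && dynK <= staticK;
}

// Condition 3 (illustrative model): if the work is sized by the static
// configuration, only this fraction of it is useful for the dynamic sizes.
inline double usefulFraction(std::size_t dynDim, std::size_t dynK,
                             std::size_t staticDim, std::size_t staticK) {
    assert(staticDim > 0 && staticK > 0);
    return static_cast<double>(dynDim * dynK) /
           static_cast<double>(staticDim * staticK);
}
```

For instance, the benchmark factors KU=16 and DV=8 satisfy unrollFitsUram<16, 8>(), and the double-precision benchmark shapes (5811 x 80, 561 x 144, 68 x 2000) all satisfy centersFit<double>.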

Caution

These applicability conditions must be met.

Benchmark

The results below are based on:
  1. datasets from UCI:
    1. http://archive.ics.uci.edu/ml/datasets/NIPS+Conference+Papers+1987-2015
    2. http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones
    3. http://archive.ics.uci.edu/ml/datasets/US+Census+Data+%281990%29
  2. all data processed as double;
  3. unroll factors DV=8 and KU=16;
  4. results compared with Spark 2.4.4, using the initial centers from Spark so that both receive the same input;
  5. Spark 2.4.4 deployed on a server with 56 processors (Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz).

Training Resources (Device: U250)

D is the sample dimension and K the number of clusters.

D     K     LUT     LUTAsMem  REG     BRAM  URAM  DSP
5811  80    295110  50378     371542  339   248   420
561   144   260716  26016     371344  323   152   420
68    2000  255119  24295     372487  309   168   425

Training Performance (Device: U250)

Values in parentheses are speedups of the FPGA over each Spark run (Spark time divided by FPGA execution time); the FPGA run is the 1X baseline.

D     K     Samples  Spark 1 thread (s)  Spark 8 threads (s)  Spark 16 threads (s)  Spark 32 threads (s)  Spark 48 threads (s)  FPGA execute (s)  FPGA freq (MHz)
5811  80    11463    93.489 (3.17X)      49.857 (1.69X)       49.860 (1.63X)        48.001 (1.89X)        50.875 (1.72X)        29.410 (1X)       202
561   144   7352     10.781 (5.04X)      6.557 (3.06X)        6.546 (3.06X)         6.216 (2.91X)         6.190 (2.89X)         2.136 (1X)        269
68    2000  857765   547.001 (3.44X)     173.116 (1.08X)      170.217 (1.07X)       161.169 (1.01X)       166.214 (1.04X)       158.903 (1X)      239
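The parenthesized factors in the performance table relate each Spark time to the FPGA execution time of the same row. The sketch below shows that arithmetic with two hypothetical helpers, speedup and truncate2; the assumption that the factors are truncated rather than rounded to two decimals is inferred from the table values (for example 6.190 / 2.136 = 2.897... is listed as 2.89X), not stated in the source.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper for reading the performance table: the factor in
// parentheses is the Spark time divided by the FPGA execution time.
inline double speedup(double sparkSeconds, double fpgaSeconds) {
    assert(fpgaSeconds > 0.0);
    return sparkSeconds / fpgaSeconds;
}

// Assumption: the reported factors appear to be truncated (not rounded) to
// two decimals, e.g. 6.190 / 2.136 = 2.897... is listed as 2.89X.
inline double truncate2(double x) { return std::floor(x * 100.0) / 100.0; }
```

For the D=561, K=144 row, truncate2(speedup(10.781, 2.136)) gives 5.04, matching the 1-thread entry.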