namespace clustering

// namespaces

namespace xf::data_analytics::clustering::internal
    namespace xf::data_analytics::clustering::internal::kmeansScan

kMeansTrain

#include "xf_DataAnalytics/clustering/kmeansTrain.hpp"

template <
    typename DT,
    int Dim,
    int Kcluster,
    int KU,
    int DV = 128 / KU
    >
void kMeansTrain (
    ap_uint <512> data [1<< 20],
    ap_uint <512> kcenters [(Kcluster+KU+1)*((Dim *sizeof(DT)*8+511)/512)]
    )

k-means is a clustering algorithm that partitions n samples into k clusters such that each sample belongs to the cluster with the nearest mean. The implementation is based on “naive k-means” (also referred to as Lloyd’s algorithm). It aims to reduce the computational complexity from O(Nsample * Kcluster * Dim * maxIter) to O(Nsample * (Kcluster/KU) * (Dim/DV) * maxIter) by accelerating the distance calculation. Although in theory the speedup grows with KU*DV, KU and DV should be configured carefully because both affect how the centers are stored on chip.

The input data contains:
1) the dynamic configuration in data[0], including the number of samples, the number of dimensions, the number of clusters, the maximum number of iterations, and the distance threshold used to determine whether the iteration has converged;
2) the initial centers, which are provided by the host and packed into multiple 512-bit packages;
3) the samples used for training, which are packed in the same way.

kcenters is used only to output the best centers.
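The sizing expressions in the signature and the complexity claim above can be checked with a few compile-time helpers. This is a minimal sketch, not part of the library: the helper names `wordsPerCenter`, `kcentersWords`, `ceilDiv` and `passesPerSample` are hypothetical, and rounding partial blocks up is an assumption.

```cpp
#include <cstddef>

// Number of 512-bit words needed for one center of Dim elements of type DT.
// Same expression as the second factor in the kcenters array bound.
template <typename DT, int Dim>
constexpr std::size_t wordsPerCenter() {
    return (Dim * sizeof(DT) * 8 + 511) / 512;
}

// Total number of 512-bit words in the kcenters buffer, as declared in the signature.
template <typename DT, int Dim, int Kcluster, int KU>
constexpr std::size_t kcentersWords() {
    return (Kcluster + KU + 1) * wordsPerCenter<DT, Dim>();
}

// Ceiling division (assumption: partial blocks are rounded up).
constexpr std::size_t ceilDiv(std::size_t a, std::size_t b) {
    return (a + b - 1) / b;
}

// Blocked distance passes per sample and per iteration:
// ceil(Kcluster/KU) * ceil(Dim/DV), versus Kcluster * Dim scalar steps.
template <int Dim, int Kcluster, int KU, int DV = 128 / KU>
constexpr std::size_t passesPerSample() {
    return ceilDiv(Kcluster, KU) * ceilDiv(Dim, DV);
}

// Example configuration (chosen for illustration only):
// DT = float, Dim = 80, Kcluster = 128, KU = 16, so DV defaults to 8.
static_assert(wordsPerCenter<float, 80>() == 5, "80 floats = 2560 bits = 5 words");
static_assert(kcentersWords<float, 80, 128, 16>() == 725, "(128+16+1) * 5 words");
static_assert(passesPerSample<80, 128, 16>() == 80, "8 center blocks * 10 dim blocks");
```

For this configuration the per-sample work drops from 128 * 80 = 10240 scalar steps to 80 blocked passes, which is the KU*DV = 128 reduction described above.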

Parameters:

DT	data type, supporting float and double
Dim	the maximum number of dimensions; the dynamic number of dimensions must not exceed this maximum.
Kcluster	the maximum number of clusters; the dynamic number of clusters must not exceed this maximum.
KU	unroll factor of Kcluster; KU centers take part in the distance calculation concurrently with one sample. After at most Kcluster/KU+1 passes, the minimum distance between a sample and the Kcluster centers is output.
DV	unroll factor of Dim; DV elements of a center take part in the distance calculation concurrently with one sample.
data	input data from host
kcenters	the output best centers
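
The roles of KU and DV can be illustrated with a plain software model of the nearest-center search for one sample. This is only a sketch under stated assumptions, not the library's hardware implementation: the function name `nearestCenterBlocked`, the row-major center layout, the use of squared Euclidean distance, and the optional `passes` counter are all hypothetical. The two innermost loops stand in for the logic that the unroll factors would execute concurrently.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Returns the index of the center nearest to `sample`.
// centers is row-major (assumed layout): centers[c * dim + d].
// If `passes` is non-null it receives the number of KU x DV blocks visited.
template <typename DT>
std::size_t nearestCenterBlocked(const std::vector<DT>& sample,
                                 const std::vector<DT>& centers,
                                 std::size_t k, std::size_t ku, std::size_t dv,
                                 std::size_t* passes = nullptr) {
    const std::size_t dim = sample.size();
    assert(ku > 0 && dv > 0 && centers.size() == k * dim);

    std::size_t best = 0;
    std::size_t count = 0;
    DT bestDist = std::numeric_limits<DT>::max();

    for (std::size_t c0 = 0; c0 < k; c0 += ku) {        // one block of KU centers
        std::vector<DT> acc(ku, DT(0));                 // partial distances for the block
        for (std::size_t d0 = 0; d0 < dim; d0 += dv) {  // one block of DV dimensions
            ++count;
            // These two loops model the work done concurrently in one pass.
            for (std::size_t c = c0; c < c0 + ku && c < k; ++c) {
                for (std::size_t d = d0; d < d0 + dv && d < dim; ++d) {
                    const DT diff = sample[d] - centers[c * dim + d];
                    acc[c - c0] += diff * diff;         // squared Euclidean (assumption)
                }
            }
        }
        // Reduce the block: keep the smallest distance seen so far.
        for (std::size_t c = c0; c < c0 + ku && c < k; ++c) {
            if (acc[c - c0] < bestDist) {
                bestDist = acc[c - c0];
                best = c;
            }
        }
    }
    if (passes != nullptr) *passes = count;
    return best;
}
```

With 3 centers of 3 dimensions and ku = dv = 2, the model visits 2 * 2 = 4 blocks instead of 9 scalar steps, and partial blocks at the edges are simply shorter.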