Main Guide
Overview
The Cosine Similarity Alveo Product allows you to use a Xilinx Alveo accelerator card to find the best matches of a given target vector of integers within a set of population vectors of integers. The target vector is paired with each population vector in turn to compute the `cosine similarity <https://en.wikipedia.org/wiki/Cosine_similarity>`__ score of the pair. The scores are sorted, and the highest scores are returned, each with an identifier (called the row index) for the corresponding population vector.
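To make the computation concrete, here is a minimal CPU-only sketch of the same idea: score the target against every population vector, then keep the highest scores. It is not the accelerated implementation. The names `RefMatch`, `cosineSimilarity`, and `referenceMatch` are hypothetical, and returning 0 for an all-zero vector is a convention chosen for this sketch only.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result record for this sketch (not the library's Result type).
struct RefMatch {
    double similarity;  // cosine similarity score
    std::size_t row;    // position of the population vector in the input
};

// Cosine similarity of two equal-length integer vectors: dot(a, b) / (|a| * |b|).
// Returns 0 when either vector is all zeros (a convention of this sketch).
inline double cosineSimilarity(const std::vector<std::int32_t> &a,
                               const std::vector<std::int32_t> &b) {
    assert(a.size() == b.size());
    double dot = 0.0, normA = 0.0, normB = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double x = a[i], y = b[i];
        dot += x * y;
        normA += x * x;
        normB += y * y;
    }
    if (normA == 0.0 || normB == 0.0)
        return 0.0;
    return dot / (std::sqrt(normA) * std::sqrt(normB));
}

// Scores the target against every population vector and returns the topK
// highest scores in descending order.
inline std::vector<RefMatch> referenceMatch(
        const std::vector<std::vector<std::int32_t>> &population,
        const std::vector<std::int32_t> &target, std::size_t topK) {
    std::vector<RefMatch> scores;
    scores.reserve(population.size());
    for (std::size_t row = 0; row < population.size(); ++row)
        scores.push_back({cosineSimilarity(target, population[row]), row});
    const std::size_t keep = std::min(topK, scores.size());
    std::partial_sort(scores.begin(), scores.begin() + keep, scores.end(),
                      [](const RefMatch &l, const RefMatch &r) {
                          return l.similarity > r.similarity;
                      });
    scores.resize(keep);
    return scores;
}
```

A reference routine like this can be handy for spot-checking results on small data sets. The order of population vectors with equal scores is unspecified in this sketch.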
Using the API
Follow the steps below to use the API.
Instantiate a CosineSim object
Load the population vectors into the Alveo accelerator card
Run one or more matches by supplying for each run a target vector and the number of matches to return
Instantiate a CosineSim object
To instantiate a CosineSim object, first select the options for the object by instantiating an Options object and setting its members, as shown in the example below. Note that all API identifiers are contained within the namespace xilinx_apps::cosinesim.
#include "cosinesim.hpp"

xilinx_apps::cosinesim::Options options;
options.vecLength = 200;  // every target vector and population vector will have 200 elements
options.numDevices = 2;   // use 2 Alveo accelerator cards
Options::vecLength determines the vector length, or number of elements, of every population and target vector. Options::numDevices determines how many Alveo accelerator cards to use for storing population vectors and executing the cosine similarity match. Options::numDevices should be at least 1 and no more than the number of installed Alveo accelerator cards. If you specify fewer than the number of installed cards, the choice of which cards are used is undefined.
NOTE: Setting the xclbinPath and xcbinPathCStr data members of the Options object currently has no effect, as the XCLBIN (FPGA program) file is always picked up from the default installation location under /opt/xilinx.
Next, instantiate the CosineSim object. The template parameter specifies the integral type of each target and population vector element. Currently, only 32-bit signed integer types, such as std::int32_t, are supported.
xilinx_apps::cosinesim::CosineSim<std::int32_t> cosineSim(options);
Load the population vectors
Loading the population vectors into the Alveo accelerator card is accomplished with the procedure shown in the code example below.
cosineSim.startLoadPopulation(myPopVectors.size());
for (unsigned myIndex = 0; myIndex < myPopVectors.size(); ++myIndex) {
    // Get a population vector buffer and its row index
    xilinx_apps::cosinesim::RowIndex rowIndex = 0;
    auto *buf = cosineSim.getPopulationVectorBuffer(rowIndex);

    // Fill the buffer and save the row index.
    // sizeof yields the full vector size only because myPopVectors[myIndex] is a true array.
    memcpy(buf, myPopVectors[myIndex], sizeof(myPopVectors[myIndex]));
    popMap[rowIndex] = myIndex;

    // Mark the buffer as finished
    cosineSim.finishCurrentPopulationVector(buf);
}
cosineSim.finishLoadPopulation();
The entire loading process is started and ended with calls to CosineSim::startLoadPopulation() and CosineSim::finishLoadPopulation(), respectively. For CosineSim::startLoadPopulation() you must supply the total number of population vectors to load. In between the start and finish calls, for every population vector to add, you must fetch a buffer from the CosineSim object and fill it with your population vector values.
The buffer to fill is fetched with CosineSim::getPopulationVectorBuffer(). In addition to returning a pointer to the buffer, the function sets its argument to the row index of the population vector within the Alveo card. Because the row index may be different from your data index, you should save the row index in a map from row index to your population vector (or associated object), so that when you perform the match, you can identify your population vector based on the row index returned in the match results, as demonstrated in the next section.
As the returned buffer is treated as a plain C integer array, you can use any C or C++ technique to copy vector elements into the buffer. In the example above, myPopVectors[myIndex] is assumed to be a Value[] array, where Value is the template parameter type of CosineSim, so memcpy is used.
After filling the buffer, you must call CosineSim::finishCurrentPopulationVector() to process the population vector. After CosineSim::finishCurrentPopulationVector() is called, there is no way to modify the vector, and once CosineSim::finishLoadPopulation() has been called, there is no way to add more population vectors. To change the population vector set, you will need to resubmit all population vectors, starting with another call to CosineSim::startLoadPopulation().
Multi-threading note: While the API is designed to accommodate writing to population vectors from multiple threads simultaneously, the API functions themselves are not thread safe. To use the API in a multi-threaded application, you must wrap each call to CosineSim::getPopulationVectorBuffer() and CosineSim::finishCurrentPopulationVector() in a critical section (with mutex locking). Once a thread has acquired a buffer, the thread can write to the buffer in parallel with other threads writing to their separate buffers. The get and finish calls do not need to be in the same critical section. That is, you can unlock the mutex between the two function calls.
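The locking pattern described in the note might look like the sketch below. Because the real API needs an Alveo card, the sketch uses a hypothetical stand-in class, `MockLoader`, that mimics only the get and finish calls. Its row index type (`std::size_t`) and the helper `loadInParallel` are assumptions made for illustration, not part of the library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the two non-thread-safe calls; NOT the real API.
class MockLoader {
public:
    MockLoader(std::size_t numVectors, std::size_t vecLength)
        : vecLength_(vecLength), storage_(numVectors * vecLength) {}

    // Hands out the next buffer and reports its row index through the argument.
    std::int32_t *getPopulationVectorBuffer(std::size_t &rowIndex) {
        rowIndex = nextRow_++;
        return storage_.data() + rowIndex * vecLength_;
    }
    void finishCurrentPopulationVector(std::int32_t *) { ++finished_; }

    std::size_t finished() const { return finished_; }
    const std::vector<std::int32_t> &storage() const { return storage_; }

private:
    std::size_t vecLength_;
    std::vector<std::int32_t> storage_;
    std::size_t nextRow_ = 0;
    std::size_t finished_ = 0;
};

// Loads all vectors using numThreads workers and returns a table that maps
// each row index to the index of the vector in the caller's data.
template <typename Loader>
std::vector<std::size_t> loadInParallel(
        Loader &loader, const std::vector<std::vector<std::int32_t>> &vectors,
        unsigned numThreads) {
    assert(numThreads > 0);
    std::mutex apiMutex;  // guards every call into the loader
    std::vector<std::size_t> rowToData(vectors.size());

    auto worker = [&](unsigned tid) {
        for (std::size_t i = tid; i < vectors.size(); i += numThreads) {
            std::size_t rowIndex = 0;
            std::int32_t *buf = nullptr;
            {
                std::lock_guard<std::mutex> lock(apiMutex);
                buf = loader.getPopulationVectorBuffer(rowIndex);
                rowToData[rowIndex] = i;
            }
            // Unlocked: each thread fills its own buffer in parallel.
            std::copy(vectors[i].begin(), vectors[i].end(), buf);
            {
                std::lock_guard<std::mutex> lock(apiMutex);
                loader.finishCurrentPopulationVector(buf);
            }
        }
    };

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < numThreads; ++t)
        threads.emplace_back(worker, t);
    for (std::thread &th : threads)
        th.join();
    return rowToData;
}
```

The two lock scopes are deliberately separate, so the copy into the buffer runs without holding the mutex. On some toolchains you may need to add -pthread to the compile and link lines.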
Run a match
After the population vectors have been loaded into the Alveo accelerator card, you can call CosineSim::matchTargetVector() with a target vector to find the population vectors that have the highest cosine similarity with the target vector. The example below shows how to use the function.
std::vector<xilinx_apps::cosinesim::Result> results;
results = cosineSim.matchTargetVector(10, testVector);
for (xilinx_apps::cosinesim::Result &result : results)
    std::cout << result.similarity << " " << popMap[result.index] << std::endl;
In the example, testVector, the target vector, is assumed to be a C array of Value integers. Along with that target vector, the function takes the number of results that you would like it to return. In this case, we’re requesting the top 10 matches. The maximum number of results currently supported is 100. The function returns a std::vector of Result objects, where each object contains the cosine similarity score (Result::similarity) and row index (Result::index) of a population vector. Use the map you built during population vector loading to convert the returned row index back to your population vector index or object.
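As an illustration of converting row indices back to application objects, the sketch below translates a list of results into names. `DemoResult` is a stand-in for the library's Result type: the field names come from the text above, but the field types, the helper `namesForResults`, and the `"<unknown row>"` placeholder are assumptions.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Stand-in for the library's Result type. Field names follow the guide;
// the field types are assumptions made for this sketch.
struct DemoResult {
    double similarity;
    std::int64_t index;
};

// Translates the row indices in match results back to application-level names.
// Rows missing from the map yield a placeholder instead of inserting an entry.
inline std::vector<std::string> namesForResults(
        const std::vector<DemoResult> &results,
        const std::unordered_map<std::int64_t, std::string> &rowToName) {
    std::vector<std::string> names;
    names.reserve(results.size());
    for (const DemoResult &r : results) {
        auto it = rowToName.find(r.index);
        names.push_back(it != rowToName.end() ? it->second
                                              : std::string("<unknown row>"));
    }
    return names;
}
```

Using `find` rather than `operator[]` keeps the map unchanged when a row index is missing, which makes a mapping mistake visible instead of silently creating an empty entry.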
You can call CosineSim::matchTargetVector() repeatedly with different target vectors. The target vectors do not need to be present in the set of population vectors, as each call to CosineSim::matchTargetVector() transfers the target vector to the Alveo accelerator card before running the cosine similarity search.
Alveo accelerator card storage capacity
The number of population vectors that an Alveo accelerator card can hold depends on both the vector length of a population vector and the memory capacity of the card. The Alveo U50 accelerator card, for example, can hold approximately 1.6 billion / len population vectors, where len is the vector length rounded up to a multiple of 4.
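The estimate above can be computed with a small helper. This is only a sketch of the stated approximation for the U50; the function names `paddedLength` and `estimateU50Capacity` are hypothetical and not part of the API.

```cpp
#include <cassert>
#include <cstdint>

// Rounds the vector length up to the next multiple of 4.
constexpr std::uint64_t paddedLength(std::uint64_t vecLength) {
    return (vecLength + 3) / 4 * 4;
}

// Approximate number of population vectors an Alveo U50 can hold,
// using the guide's rule of thumb: 1.6 billion / len.
inline std::uint64_t estimateU50Capacity(std::uint64_t vecLength) {
    assert(vecLength > 0);
    return 1600000000ULL / paddedLength(vecLength);
}
```

For example, with a vector length of 200 the estimate is 8 million population vectors. A length of 201 is padded to 204, which lowers the estimate slightly.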
Error handling
Every CosineSim member function can potentially throw an exception of type Exception if a run-time error, such as a hardware communication error, is encountered. You can wrap your load and match operations in a try / catch block to handle the error. The exception object provides the Exception::what() member function for retrieving a text message for the error. As there are currently no programmatically recoverable errors, the Exception object does not supply an error code. As with any hardware device, recovering from an Alveo accelerator card error may require the intervention of a system operator, so you should consider sending the error message to a destination suitable for the system administrator to access it.
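One way to structure such a try / catch block is a small wrapper that runs an operation and routes the error message to a log stream. The sketch below uses a stand-in exception class, `DemoException`, so that it compiles without the library. Only the what() member function is taken from the description above; the helper `runGuarded` and the message prefix are hypothetical.

```cpp
#include <iostream>
#include <string>
#include <utility>

// Stand-in for xilinx_apps::cosinesim::Exception so the sketch is self-contained.
// Only the what() member function is taken from the guide.
class DemoException {
public:
    explicit DemoException(std::string msg) : msg_(std::move(msg)) {}
    const char *what() const { return msg_.c_str(); }

private:
    std::string msg_;
};

// Runs op. On failure, writes the error message to log and returns false.
// Pass the real exception type as ExceptionT when using the actual API.
template <typename ExceptionT = DemoException, typename Op>
bool runGuarded(Op &&op, std::ostream &log) {
    try {
        op();
        return true;
    } catch (const ExceptionT &ex) {
        log << "Cosine similarity operation failed: " << ex.what() << '\n';
        return false;
    }
}
```

With the real API, a call might look like `runGuarded<xilinx_apps::cosinesim::Exception>([&] { cosineSim.finishLoadPopulation(); }, operatorLog)`, where `operatorLog` is whatever stream your system administrator monitors.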
API usage errors are handled within CosineSim simply by emitting an error message to stdout or stderr and aborting. API usage errors include passing out-of-range or unsupported values as arguments to CosineSim member functions.
Linking your application
You have a few choices for how to link the API code into your application:
Linking directly with the Cosine Similarity shared library (.so)
Linking with the Cosine Similarity dynamic loader archive (.a)
Including the Cosine Similarity dynamic loader source file (.cpp)
Linking directly (.so)
The simplest method of linking the API into your application is to link directly with the shared library (.so), placing a run-time dependency of your application on the shared library. Simply add the following arguments to your link line:
-L/opt/xilinx/apps/graphanalytics/cosinesim/1.1/lib -lXilinxCosineSim
Linking with the dynamic loader archive (.a)
To avoid having a run-time dependency on the shared library, but instead load the shared library on demand (internally using dlopen()), you can link with the loader archive by adding the following arguments to your link line:
-L/opt/xilinx/apps/graphanalytics/cosinesim/1.1/lib -lXilinxCosineSim_loader -ldl
Including the dynamic loader source file (.cpp)
Another way to avoid a run-time dependency on the shared library is by including the loader source file in a header or source file of your program:
#define XILINX_COSINESIM_INLINE_IMPL
#include "cosinesim.hpp"

// Code that uses the API goes here

#include "cosinesim_loader.cpp"
Note that you will still have to include -ldl on your link line to pull in the standard dynamic loading library.
The loader source file is located in /opt/xilinx/apps/graphanalytics/cosinesim/1.1/src. Note the macro definition that comes before the inclusion of cosinesim.hpp.
TIP: When using either dynamic loading technique, if the order of symbol loading causes unexplained behavior in your application, you can try adding libXilinxCosineSim.so to the list of pre-loaded shared libraries, as explained in `What is the LD_PRELOAD trick? <https://stackoverflow.com/questions/426230/what-is-the-ld-preload-trick>`__.
Type-erased base class
The CosineSim class is a template class that ensures type safety for vector elements. However, if you need a type-free non-template class, you can use CosineSim's base class, CosineSimBase, directly. Its member functions are the same as for the template class, except that data pointers are of type void *, and you must also pass the length in bytes of a vector element.
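The relationship between a typed template and a type-erased base can be pictured with the generic sketch below. It illustrates the pattern only and is not the library's code: `ErasedStoreBase`, `TypedStore`, and their member functions are hypothetical and do not mirror the CosineSimBase interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical type-erased base: it sees only raw bytes plus the element size.
class ErasedStoreBase {
public:
    ErasedStoreBase(std::size_t vecLength, std::size_t valueSize)
        : vecLength_(vecLength), valueSize_(valueSize) {}

    // Appends one vector, given as untyped data of vecLength * valueSize bytes.
    void addVector(const void *data) {
        assert(data != nullptr);
        const auto *bytes = static_cast<const unsigned char *>(data);
        bytes_.insert(bytes_.end(), bytes, bytes + vecLength_ * valueSize_);
    }
    const void *vectorAt(std::size_t row) const {
        return bytes_.data() + row * vecLength_ * valueSize_;
    }
    std::size_t valueSize() const { return valueSize_; }

private:
    std::size_t vecLength_;
    std::size_t valueSize_;
    std::vector<unsigned char> bytes_;
};

// Thin typed wrapper: supplies sizeof(Value) and restores the pointer types.
template <typename Value>
class TypedStore : public ErasedStoreBase {
public:
    explicit TypedStore(std::size_t vecLength)
        : ErasedStoreBase(vecLength, sizeof(Value)) {}

    void addVector(const Value *data) { ErasedStoreBase::addVector(data); }
    const Value *vectorAt(std::size_t row) const {
        return static_cast<const Value *>(ErasedStoreBase::vectorAt(row));
    }
};
```

In this arrangement the template adds no state of its own, so code that cannot use templates (for example, a plug-in interface) can work with the base class and pass the element size explicitly.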