Synthea Demo Query Reference
This section describes the queries supplied with the Synthea demo in greater detail.
Introduction
A set of GSQL operations and queries operates on the patient graph. They create the graph,
generate the patient vectors, and invoke the
cosine similarity operations of the Recommendation Engine. The query source files are in the query
subdirectory under the demo directory.
The queries are organized by file into levels of operation. Starting at the lowest level, the files are:
schema_xgraph.gsql
: Defines the graph schema (the types of vertices and edges)
load_xgraph.gsql
: Loads the Synthea patient data from CSV files
base.gsql
: Infrastructure queries for generating patient vectors and invoking cosine similarity operations
client.gsql
: Queries built on top of the base.gsql queries that an application can invoke to control the graph and cosine similarity operations
query.gsql
: Queries built on top of base.gsql to be invoked from a terminal window as a simple demo
The following subsections describe the last three files in greater detail.
Base.gsql Queries
The queries of base.gsql
generally return raw data as GSQL data objects using GSQL RETURN
statements.
The queries are intended to be called from higher-level queries that determine the appropriate style of output
for their intended uses. The main queries in base.gsql
include:
patient_vector
: Given a patient vertex, this query returns a patient vector. The query pulls data from the patient vertex itself and walks the edges of the patient vertex to pull data from the patient’s medical history nodes. To form feature maps (vectors of features) for group attributes, the query calls udf_get_similarity_vec, a UDF defined as part of this Synthea demo.
cosinesim_embed_vectors
: This query traverses all patient vertices, calls patient_vector to form the patient vector for each patient, and embeds the vector into the patient vertex. Cosine similarity operations use these embeddings, so this query must be called first. In a typical database application, embeddings would be created in bulk at data loading time, and individually as the database is updated.
cosinesim_match_sw
: This query performs a cosine similarity search using only the CPU, for comparison with the hardware-accelerated version. To avoid redundant calculations, it reuses a partial computation (the cosine similarity “normal” for each vector) that is already embedded into the patient vertex.
cosinesim_set_num_devices
: This query sets the number of Alveo U50 cards to use. If the number of cards is less than the total number of cards installed in the server, cards are chosen in an unspecified manner.
load_graph_cosinesim_ss_fpga_core
: This query transfers all patient vectors to the Alveo card(s).
cosinesim_ss_fpga_core
: Given a patient vertex and a number of results to return, this query invokes the cosine similarity match operation on the Alveo card(s), returning the best-matching patients and the cosine similarity score for each of them.
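The CPU-side match described for cosinesim_match_sw can be illustrated outside GSQL. The following Python sketch is not the demo's implementation; it only shows the general idea of storing each vector's norm once and reusing it for every comparison. The function names, the plain-list vector format, the assumption that the norm is computed when the vector is embedded, and the choice to skip the target patient in the results are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple


def vector_norm(vec: List[float]) -> float:
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in vec))


def embed_vectors(vectors: Dict[str, List[float]]) -> Dict[str, Tuple[List[float], float]]:
    """Store each vector together with its norm, computed once (assumed to happen at embedding time)."""
    return {pid: (vec, vector_norm(vec)) for pid, vec in vectors.items()}


def top_matches(
    target_id: str,
    embedded: Dict[str, Tuple[List[float], float]],
    top_k: int,
) -> List[Tuple[str, float]]:
    """Return up to top_k (patient ID, cosine similarity) pairs, best match first."""
    target_vec, target_norm = embedded[target_id]
    scores = []
    for pid, (vec, norm) in embedded.items():
        if pid == target_id:
            continue  # assumption: the target itself is not reported as a match
        denom = target_norm * norm  # reuses the stored norms
        if denom == 0.0:
            score = 0.0  # a zero vector has no direction; treat as no similarity
        else:
            dot = sum(a * b for a, b in zip(target_vec, vec))
            score = dot / denom
        scores.append((pid, score))
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_k]
```

With the norms stored up front, each comparison costs one dot product and one division, which is the saving the description above refers to.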
Client.gsql Queries
The queries in client.gsql
differ from their base.gsql
counterparts in that the client.gsql
queries
return their data in JSON format (using the GSQL PRINT
statement), to facilitate processing of the data
from an application calling the query through TigerGraph’s REST interface.
The main queries in client.gsql
include:
client_cosinesim_embed_vectors
: Creates the embeddings by calling cosinesim_embed_vectors in base.gsql. The returned data includes the execution time in milliseconds.
client_cosinesim_match_sw
: Performs a cosine similarity match using only the CPU by calling cosinesim_match_sw in base.gsql. The output is a JSON array in which each element is a dictionary holding a matching patient ID and its cosine similarity score.
client_cosinesim_set_num_devices
: Sets the number of Alveo U50 cards to use by calling cosinesim_set_num_devices in base.gsql. The query returns the number of devices set.
client_cosinesim_get_alveo_status
: Returns status information about the Recommendation Engine, such as whether the patient vectors have been loaded into the Alveo card and the number of cards to use.
client_cosinesim_load_alveo
: Transfers all patient vectors to the Alveo card(s) using load_graph_cosinesim_ss_fpga_core in base.gsql.
client_cosinesim_match_alveo
: Invokes the cosine similarity match operation on the Alveo card(s), given the target patient to match and the desired number of matches to return. The output is a JSON array in which each element is a dictionary holding a matching patient ID and its cosine similarity score. This query calls cosinesim_ss_fpga_core in base.gsql to perform the work.
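An application would typically reach these queries through TigerGraph's REST interface and then read the matches out of the JSON reply. The Python sketch below shows one way this might look, under several assumptions that are not confirmed by this document: the RESTPP server listens on port 9000, installed queries are exposed at /query/<graph>/<query>, and printed values arrive in a top-level "results" array. The graph name xgraph, the parameter names newPatient and topK, and the JSON keys Matches, Id, and score are hypothetical placeholders; check the query source in client.gsql for the real names.

```python
import json
from typing import Dict, List
from urllib.parse import urlencode


def build_query_url(host: str, graph: str, query: str, params: Dict[str, object]) -> str:
    """Build the URL for an installed query (port 9000 is an assumption)."""
    base = f"http://{host}:9000/query/{graph}/{query}"
    return f"{base}?{urlencode(params)}" if params else base


def parse_matches(response_text: str, key: str = "Matches") -> List[Dict[str, object]]:
    """Extract the list of match dictionaries from a query reply.

    The key name is hypothetical; it stands for whatever name the
    query's PRINT statement gives to its result array.
    """
    body = json.loads(response_text)
    if body.get("error"):
        raise RuntimeError(body.get("message", "query failed"))
    for printed in body.get("results", []):
        if key in printed:
            return printed[key]
    return []


# Hypothetical usage: build the request URL for an Alveo match of two results.
url = build_query_url(
    "localhost",
    "xgraph",  # hypothetical graph name
    "client_cosinesim_match_alveo",
    {"newPatient": "p1", "topK": 2},  # hypothetical parameter names
)
```

The sketch deliberately stops short of sending the request, so it can be adapted to whichever HTTP client and authentication scheme the application already uses.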
Query.gsql Queries
The queries in query.gsql
are simple demonstrations of post-processing the raw data of the base.gsql
queries. These queries are not intended to be called from an application, as they produce human-readable
output in files. The match.sh
demo script uses these queries.
cosinesim_ss_tg
: A CPU-based implementation of cosine similarity based on cosinesim_match_sw in base.gsql.
load_graph_cosinesim_ss_fpga
: Transfers all patient vectors to the Alveo card(s) using load_graph_cosinesim_ss_fpga_core in base.gsql.
cosinesim_ss_fpga
: Runs a cosine similarity match operation on the Alveo card(s) using cosinesim_ss_fpga_core in base.gsql.