Healthcare Concept Similarity with UMLS

In this use case, we find drugs and other healthcare terms that are similar to one another in the Unified Medical Language System (UMLS) dataset. UMLS is a collection of health and biomedical vocabularies drawn from a wide variety of healthcare data sources. One of the UMLS knowledge sources is the Metathesaurus, a collection of medical concept names, called atoms, that are linked to one another through relationships.

For this demo, we use a small subset of the atoms file, MRCONSO.RRF, together with the relationships between those atoms from the relationships file, MRREL.RRF. The atoms and their relationships are modeled as a graph in a TigerGraph database, and cosine similarity is used as the similarity measure for matching.
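A minimal sketch of how MRREL.RRF relationships could be turned into an atom graph before loading it into a graph database. It assumes the pipe-delimited MRREL.RRF column order described in the UMLS Reference Manual (AUI1 in the second column, AUI2 in the sixth), treats every relationship as an undirected edge, and skips rows where either AUI field is empty (for example, relationships recorded between concepts rather than atoms). The function name `build_atom_graph` is hypothetical; this is not the demo's TigerGraph loading job.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set

# Positions in a pipe-delimited MRREL.RRF row, assuming the documented order:
# CUI1|AUI1|STYPE1|REL|CUI2|AUI2|STYPE2|RELA|RUI|SRUI|SAB|SL|RG|DIR|SUPPRESS|CVF|
AUI1_COL = 1
AUI2_COL = 5


def build_atom_graph(rows: Iterable[str]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map between atoms (AUIs) from MRREL.RRF rows."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for row in rows:
        fields = row.rstrip("\n").split("|")
        if len(fields) <= AUI2_COL:
            continue  # malformed or truncated row
        a, b = fields[AUI1_COL], fields[AUI2_COL]
        # Skip rows without an atom identifier on both ends, and self-loops.
        if not a or not b or a == b:
            continue
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)
```

Because the function accepts any iterable of lines, an open file handle such as `open("MRREL.RRF", encoding="utf-8")` can be passed to it directly, so the file is streamed rather than read into memory at once.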

Each atom is converted into an embedding, a fixed-length numeric vector that captures its relationships to other atoms. We use the Node2Vec algorithm, which learns these embeddings from biased random walks over the graph, to compute an embedding for each atom vertex.
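The walk-sampling step of Node2Vec can be sketched in plain Python as follows. This is a hypothetical, unweighted version written for illustration, not the implementation used in the demo. It assumes the graph is an adjacency map from vertex ID to a set of neighbor IDs; the return parameter `p` and the in-out parameter `q` bias each step toward stepping back to the previous atom, staying close to it, or moving outward.

```python
import random
from typing import Dict, List, Optional, Set


def node2vec_walk(graph: Dict[str, Set[str]], start: str, length: int,
                  p: float = 1.0, q: float = 1.0,
                  rng: Optional[random.Random] = None) -> List[str]:
    """Sample one second-order (p, q)-biased random walk starting at `start`."""
    if rng is None:
        rng = random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(graph.get(cur, ()))  # sorted for reproducibility
        if not neighbors:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            # The first step has no previous vertex, so choose uniformly.
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        prev_neighbors = graph.get(prev, set())
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)  # step back to the previous vertex
            elif nxt in prev_neighbors:
                weights.append(1.0)      # stay at distance 1 from the previous vertex
            else:
                weights.append(1.0 / q)  # move outward, away from the previous vertex
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


def generate_walks(graph: Dict[str, Set[str]], num_walks: int, length: int,
                   p: float = 1.0, q: float = 1.0, seed: int = 0) -> List[List[str]]:
    """Start `num_walks` walks from every vertex, visiting vertices in shuffled order."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(node2vec_walk(graph, node, length, p, q, rng))
    return walks
```

In Node2Vec, the sampled walks are then treated as sentences for a skip-gram word2vec model (for example, gensim's `Word2Vec` with `sg=1`), which learns one embedding vector per atom. A small `p` keeps walks local, while a small `q` pushes them outward to explore more distant relationships.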

This example runs a query against the UMLS graph that selects a random vertex and returns the top-matching drugs, drug ingredients, or related healthcare concepts.
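A minimal sketch of that query logic, assuming the embeddings are already available as an in-memory dict from vertex ID to vector: pick a random vertex, score every other vertex by cosine similarity, cos(a, b) = a·b / (‖a‖ ‖b‖), and keep the top k. The function names are hypothetical; in the demo this work is done inside the TigerGraph query rather than in Python.

```python
import heapq
import math
import random
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(a, b) = (a . b) / (|a| * |b|); returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(embeddings: Dict[str, Sequence[float]], query_id: str,
                  k: int) -> List[Tuple[str, float]]:
    """Brute-force scan: score every other vertex against the query, keep the k best."""
    query = embeddings[query_id]
    scores = ((other, cosine_similarity(query, vec))
              for other, vec in embeddings.items() if other != query_id)
    return heapq.nlargest(k, scores, key=lambda item: item[1])


def query_random_vertex(embeddings: Dict[str, Sequence[float]], k: int,
                        rng: random.Random) -> Tuple[str, List[Tuple[str, float]]]:
    """Pick a random vertex and return it with its top-k most similar vertices."""
    query_id = rng.choice(sorted(embeddings))
    return query_id, top_k_similar(embeddings, query_id, k)
```

This brute-force scan compares the query against every vertex, so each query costs time proportional to the number of vertices times the embedding length, which is why the computation becomes expensive on large datasets.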

In general, computing cosine similarity over large datasets takes a long time on a CPU. With Xilinx Cosine Similarity Acceleration, speedups of roughly 80x or more over the CPU can be achieved.

Run Jupyter Notebook

  • Install the Drug Similarity Demo Plugin

(fpga)$ cd drug_similarity
(fpga)$ su - tigergraph
Password:
(fpga)$ bin/install_udf.sh

  • Run the commands below to start the Jupyter Notebook

(fpga)$ cd jupyter-demo
(fpga)$ jupyter notebook drug_similarity_TG_demo.ipynb

  • Follow the step-by-step instructions in the notebook once it is loaded in your browser.

This notebook can be downloaded from the Healthcare Concept Similarity with UMLS notebook on GitHub.