Unified Medical Language System (UMLS)


The Unified Medical Language System (UMLS) is a large collection of controlled vocabularies from the biomedical and health-related sciences. These vocabularies come from all over the globe and include both national and international terms and concepts. Because the vocabularies contain so many terms and concepts, a mapping system was developed to link equivalent terms together. Development of the UMLS began at the National Library of Medicine (NLM) in 1986 under Donald A. B. Lindberg, M.D., who was the director of the NLM at the time. The NLM also maintains the UMLS and updates it every quarter. The information found in the UMLS is available to the public free of charge.

The goal of the UMLS is to provide interoperability and easy access to biomedical information. It makes it easier for health professionals and researchers to find relevant information in a wide variety of automated sources, such as electronic health records (EHRs), journals, and databases. The UMLS also aims to bridge the communication gap between different cultures. To meet these goals, the NLM uses the UMLS to facilitate the development of systems that are programmed to understand the meaning of the many terms found in health and biomedicine. To do this, the NLM produces the UMLS Knowledge Sources, which are simply databases, and gives system developers software tools for working with them. Developers can use these tools to build or enhance an electronic information system that understands the terms and concepts and can properly access, retrieve, create, or modify biomedical and health data.

The UMLS consists of three Knowledge Sources. These three sources are the Metathesaurus, the Semantic Network, and the SPECIALIST Lexicon.

  • Metathesaurus - The main database of the UMLS, which functions much like a regular thesaurus. The Metathesaurus consists of a wide variety of terms and concepts from many controlled vocabularies and also includes the relationships between these terms.
  • Semantic Network - Consists of very broad subject categories called Semantic Types, which provide a consistent categorization of all terms and concepts represented in the Metathesaurus. It also consists of a set of useful and important relationships, called Semantic Relations, that exist between the Semantic Types.
  • SPECIALIST Lexicon - A database that was developed to provide the lexical information needed for Natural Language Processing (NLP). This lexicon is intended to be a general English lexicon that includes many biomedical terms. The lexicon is said to include over 200,000 words and terms.
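To illustrate how the first two Knowledge Sources fit together, here is a minimal sketch of a toy Metathesaurus and Semantic Network in Python. It is not the UMLS data format or any NLM API. The class `Concept`, the identifiers `DEMO-1` and `DEMO-2`, the vocabulary names `VOCAB_A` and `VOCAB_B`, and all function names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Concept:
    """One toy concept: synonymous terms drawn from several vocabularies."""
    concept_id: str                 # hypothetical identifier, not a real UMLS ID
    semantic_type: str              # broad category from the toy Semantic Network
    terms: Dict[str, str] = field(default_factory=dict)  # term -> source vocabulary

# Toy "Metathesaurus": terms with the same meaning share one concept.
CONCEPTS = [
    Concept("DEMO-1", "Disease or Syndrome",
            {"myocardial infarction": "VOCAB_A", "heart attack": "VOCAB_B"}),
    Concept("DEMO-2", "Pharmacologic Substance",
            {"aspirin": "VOCAB_A", "acetylsalicylic acid": "VOCAB_B"}),
]

# Toy "Semantic Network": relations that may hold between semantic types.
SEMANTIC_RELATIONS = {
    ("Pharmacologic Substance", "treats", "Disease or Syndrome"),
}

def find_concept(term: str) -> Optional[Concept]:
    """Return the concept that lists this term, ignoring case."""
    wanted = term.lower()
    for concept in CONCEPTS:
        if wanted in concept.terms:
            return concept
    return None

def synonyms(term: str) -> List[str]:
    """Return the other terms that share a concept with the given term."""
    concept = find_concept(term)
    if concept is None:
        return []
    return sorted(t for t in concept.terms if t != term.lower())

def allowed_relations(term_a: str, term_b: str) -> List[str]:
    """List relations the toy network permits from term_a's type to term_b's type."""
    a, b = find_concept(term_a), find_concept(term_b)
    if a is None or b is None:
        return []
    return sorted(rel for (src, rel, dst) in SEMANTIC_RELATIONS
                  if src == a.semantic_type and dst == b.semantic_type)
```

Calling `synonyms("heart attack")` looks up the shared concept and returns the terms contributed by the other vocabulary, which is the thesaurus-like behaviour described above. `allowed_relations("aspirin", "heart attack")` then checks the two concepts' types against the toy relation table.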

These knowledge sources are distributed with lexical tools, a set of Java programs that allow the user to manage lexical variation in biomedical text. The lexical tools use the SPECIALIST Lexicon to generate lexical variants of terms that are appropriate for use in indexing. Also included is a program called MetamorphoSys, which is the installation wizard and customization program. Another tool is lvg, a program that uses the SPECIALIST Lexicon to generate lexical variants of a given term and to support the parsing of natural language text. There is also MetaMap, an online tool that, when given an arbitrary piece of text, finds and returns the relevant Metathesaurus concepts.
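The idea behind these tools (reduce lexical variation, then look up concepts in free text) can be sketched in a few lines of Python. This is a deliberately simplified stand-in, not the algorithm used by lvg or MetaMap. The functions `normalize` and `map_text`, the `INDEX` table, and the `DEMO-*` identifiers are all hypothetical.

```python
import re
from typing import List

def normalize(term: str) -> str:
    """Crude normalization: lowercase, drop punctuation, strip a trailing
    plural 's', and sort the words so word order does not matter."""
    words = re.findall(r"[a-z0-9]+", term.lower())
    stems = [
        w[:-1] if len(w) > 3 and w.endswith("s") and not w.endswith("ss") else w
        for w in words
    ]
    return " ".join(sorted(stems))

# Toy index from normalized term to a hypothetical concept identifier.
INDEX = {
    normalize(term): concept_id
    for term, concept_id in [
        ("heart attack", "DEMO-1"),
        ("myocardial infarction", "DEMO-1"),
        ("aspirin", "DEMO-2"),
    ]
}

def map_text(text: str, max_words: int = 3) -> List[str]:
    """Slide windows of 1..max_words words over the text and collect the
    concept identifiers whose normalized terms match, longest windows first."""
    words = re.findall(r"[A-Za-z0-9]+", text)
    found: List[str] = []
    for size in range(max_words, 0, -1):
        for start in range(len(words) - size + 1):
            key = normalize(" ".join(words[start:start + size]))
            concept_id = INDEX.get(key)
            if concept_id is not None and concept_id not in found:
                found.append(concept_id)
    return found

if __name__ == "__main__":
    print(map_text("Patient had two heart attacks and takes aspirin daily."))
```

Because both the indexed terms and the text windows pass through the same `normalize` function, variants such as "Heart Attacks" and "heart attack" resolve to the same key. Sorting the words is a blunt choice that can produce false matches in real text; it is used here only to keep the sketch short.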


The UMLS is used in a variety of ways by many institutions. It can be used for information retrieval and for creating patient data and research data. It can also be used to mediate communication between different systems and to develop systems that parse biomedical terms. Applications used by the NLM that were created using the UMLS include PubMed, the Indexing Initiative, Enterprise Vocabulary Services, and ClinicalTrials.gov. PubMed's primary component is MEDLINE, a database that contains 16 million articles from biomedical journals. MEDLINE's records are indexed using MeSH, or Medical Subject Headings. MEDLINE can also be accessed from other applications, such as the NLM Gateway. Organizations other than the NLM have developed applications as well. For example, the Agency for Healthcare Research and Quality developed the National Guideline Clearinghouse and the National Quality Measures Clearinghouse. Because the UMLS is free of charge, many customized applications have been developed using it. All users have to do is agree to the UMLS license agreement.

Here is a list of some other applications created using the UMLS:

Graphics (screenshots)

Screenshot of ClinicalTrials.gov main page.

Screenshot of PubMed's main page.

Screenshot of TOXNET's main page.

Web Resources

Related Terminology

Controlled Vocabulary - Specific words and phrases (descriptors) used when creating subject headings for a book, article, etc. for a specific index or catalog.

Electronic Health Record - A medical record, or any other information relating to the past, present, or future physical and mental health or condition of a patient, that resides in computer systems which capture, transmit, receive, store, retrieve, link, and manipulate multimedia data for the primary purpose of providing health care and health-related services.

Database - A comprehensive collection of related data organized for convenient access, generally in a computer.

Natural Language Processing - A range of computational techniques for analyzing and representing naturally occurring text (free text) at one or more levels of linguistic analysis (e.g., morphological, syntactic, semantic, pragmatic) for the purpose of achieving human-like language processing for knowledge-intensive applications.

Lexicon - The vocabulary of a particular language, field, social class, person, etc.

Interoperability - The ability of software and hardware on different machines from different vendors to share data.

Biomedicine - The application of the principles of the natural sciences, especially biology and physiology, to clinical medicine.