Journal Indexing Databases: PubMed, Scopus, Web of Science, and DOAJ

Getting a paper published is one milestone. Getting it found is another problem entirely — and that's where indexing databases do the heavy lifting. This page examines four of the most consequential indexing systems in academic publishing: PubMed, Scopus, Web of Science, and the Directory of Open Access Journals (DOAJ). It covers how each operates, what drives inclusion decisions, and why the distinctions between them matter for researchers, institutions, and readers navigating the scientific literature.


Definition and scope

A journal indexing database is a structured, searchable repository that aggregates bibliographic records (titles, abstracts, author information, citations, and sometimes full text) across a defined set of peer-reviewed publications. Indexing is not archiving. An archive stores content; an index vets and organizes it, making work discoverable and citable within the formal scientific record.

The four databases covered here represent distinct institutional models. PubMed is operated by the National Library of Medicine (NLM), a federal agency within the U.S. National Institutes of Health — making it government-funded, free to search, and focused primarily on biomedical and life sciences. Scopus and Web of Science are commercial products: Scopus is owned by Elsevier and Web of Science (formerly part of Thomson Reuters) is owned by Clarivate. DOAJ is a nonprofit, community-maintained directory with a specific mandate to index only legitimate open-access journals.

Combined, these four databases index well over 100 million scholarly records. Web of Science covers approximately 21,000 journals (Clarivate), Scopus indexes around 27,000 (Elsevier Scopus), PubMed's MEDLINE component covers roughly 5,200 journals selected by the NLM Literature Selection Technical Review Committee (NLM MEDLINE Journal Selection), and DOAJ lists over 20,000 open-access journals as of its public-facing count (DOAJ).

Understanding these databases is foundational to making sense of journal metrics like impact factor and citation scores, since nearly all such metrics are calculated from citation data drawn specifically from these indexed pools.


Core mechanics or structure

Each database ingests journal content through a distinct pipeline. PubMed/MEDLINE receives structured metadata in a format called MEDLINE XML or PubMed XML from publishers who have passed the journal selection process. NLM assigns Medical Subject Headings (MeSH) — a controlled vocabulary of approximately 30,000 terms — to individual articles, enabling precision retrieval that keyword search alone cannot match.
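
As a rough illustration of what that structured metadata looks like to a consumer, the sketch below reads MeSH descriptors out of a PubMed-style XML record. The element names (PubmedArticle, MedlineCitation, MeshHeadingList, DescriptorName) follow the PubMed XML format; the record itself, including its PMID, title, and UI values, is a made-up placeholder, and mesh_descriptors is a hypothetical helper name rather than part of any NLM tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed record for illustration only.
# Real records carry many more elements; all values here are placeholders.
SAMPLE_RECORD = """
<PubmedArticle>
  <MedlineCitation Status="MEDLINE">
    <PMID>00000000</PMID>
    <Article>
      <ArticleTitle>Placeholder title</ArticleTitle>
    </Article>
    <MeshHeadingList>
      <MeshHeading>
        <DescriptorName UI="PLACEHOLDER-1" MajorTopicYN="Y">Placeholder Descriptor A</DescriptorName>
      </MeshHeading>
      <MeshHeading>
        <DescriptorName UI="PLACEHOLDER-2" MajorTopicYN="N">Placeholder Descriptor B</DescriptorName>
      </MeshHeading>
    </MeshHeadingList>
  </MedlineCitation>
</PubmedArticle>
"""


def mesh_descriptors(record_xml):
    """Return (descriptor name, is_major_topic) pairs from one record."""
    root = ET.fromstring(record_xml.strip())
    pairs = []
    for descriptor in root.iter("DescriptorName"):
        is_major = descriptor.get("MajorTopicYN") == "Y"
        pairs.append((descriptor.text, is_major))
    return pairs


for name, is_major in mesh_descriptors(SAMPLE_RECORD):
    print(name, "(major topic)" if is_major else "")
```

Because every article is tagged from the same controlled vocabulary, a search on a descriptor retrieves records regardless of the wording the authors happened to use in the title or abstract.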

Web of Science and Scopus operate on a citation-graph model. They don't just record that a paper exists; they map who cites whom. Every indexed reference list becomes a data point in a network. This is how metrics like the h-index and Eigenfactor score are computed — they emerge from citation linkage, not from raw publication counts. Web of Science subdivides its index into named collections: the Science Citation Index Expanded (SCIE), Social Sciences Citation Index (SSCI), and Arts & Humanities Citation Index (AHCI) are the three principal ones, each with separate journal selection processes.
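
The idea that metrics emerge from citation linkage can be sketched in a few lines. The example below assumes a tiny, invented edge list of (citing paper, cited paper) pairs and computes an h-index from it; the paper identifiers and the function names are hypothetical, and real databases work from far larger and messier graphs.

```python
from collections import Counter

# Hypothetical citation edges: (citing paper, cited paper).
CITATIONS = [
    ("p5", "p1"), ("p6", "p1"), ("p7", "p1"),
    ("p5", "p2"), ("p6", "p2"),
    ("p7", "p3"),
]

# Hypothetical list of one author's papers; p4 has no citations.
AUTHOR_PAPERS = ["p1", "p2", "p3", "p4"]


def citation_counts(edges):
    """Count inbound citations per cited paper."""
    return Counter(cited for _citing, cited in edges)


def h_index(paper_ids, counts):
    """Largest h such that h papers each have at least h citations."""
    ranked = sorted((counts.get(p, 0) for p in paper_ids), reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


counts = citation_counts(CITATIONS)
print(h_index(AUTHOR_PAPERS, counts))
```

Note that the result depends entirely on which edges are in the graph: the same author can have a different h-index in Scopus and in Web of Science because each database indexes a different set of citing sources.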

DOAJ operates differently. It doesn't primarily function as a citation tracker. Its core product is a whitelist: journals that meet its criteria receive a DOAJ Seal or basic listing, and that status signals legitimacy to librarians, funders, and researchers wary of predatory journals. DOAJ also feeds metadata into services like EBSCO and Crossref, amplifying discoverability for listed journals without managing citation linkage itself.


Causal relationships or drivers

What actually gets a journal indexed — or removed — comes down to editorial quality signals, not subject matter alone.

For MEDLINE, the NLM's Literature Selection Technical Review Committee evaluates journals on scope and coverage, quality of content, editorial quality, production quality, and audience, as documented in NLM's MEDLINE journal selection criteria. Journals must demonstrate a track record of peer review and must publish primarily original research.

Clarivate's Web of Science selection criteria are published in its editorial selection process documentation and weight factors including publication timeliness, editorial conventions, peer review rigor, and citation analysis. A journal that attracts inbound citations from already-indexed sources gains a structural advantage in the selection review.

Scopus uses a Content Selection and Advisory Board (CSAB) that evaluates title-level submissions against published criteria covering peer review type, editorial policy, the regularity of publishing, and ethical standards. Scopus accepts a broader disciplinary range than MEDLINE and indexes content across sciences, social sciences, arts, and humanities.

DOAJ's inclusion is driven by open-access compliance — journals must provide unrestricted access to full text — and by a vetting process that examines ISSN validity, licensing transparency, and absence of deceptive practices. The organization publicly delisted over 3,300 journals in 2016 following a major quality review, a fact that DOAJ itself documented as part of efforts to strengthen the reliability of its whitelist.
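
One small, mechanical piece of ISSN validity is the check digit. The sketch below implements the standard ISSN check-digit rule (weights 8 down to 2 over the first seven digits, modulus 11, with X standing for 10). It only confirms that a string is well formed; it is not a description of DOAJ's actual vetting workflow, and it cannot tell whether an ISSN is registered to the journal that claims it. The function name is hypothetical.

```python
import re

# Eight characters, optional hyphen in the middle, last one may be X.
ISSN_PATTERN = re.compile(r"^([0-9]{4})-?([0-9]{3})([0-9Xx])$")


def is_valid_issn(issn):
    """Check the format and check digit of an ISSN string."""
    match = ISSN_PATTERN.match(issn.strip())
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    check_char = match.group(3).upper()
    # Weights run 8, 7, 6, 5, 4, 3, 2 across the first seven digits.
    total = sum(int(d) * w for d, w in zip(digits, range(8, 1, -1)))
    expected = (11 - total % 11) % 11
    expected_char = "X" if expected == 10 else str(expected)
    return check_char == expected_char


print(is_valid_issn("0378-5955"))  # well-formed ISSN
print(is_valid_issn("0378-5954"))  # same digits, wrong check digit
```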


Classification boundaries

The four databases do not compete on identical terrain. Their scopes diverge in discipline focus, access model, and primary function, as the reference table at the end of this page lays out.

The open-access publishing landscape cuts across all four: a fully open-access journal might appear simultaneously in DOAJ, Scopus, Web of Science, and PubMed, with each database adding distinct discovery and credentialing value.
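
A simple way to picture this overlap is a lookup from a journal identifier to the set of databases that list it. The sketch below assumes each database's title list has already been exported to a set of ISSNs; the identifiers are placeholders rather than real ISSNs, and the data structure and function name are hypothetical.

```python
# Hypothetical title lists keyed by database name.
# The identifiers are placeholders, not real ISSNs.
INDEXED = {
    "PubMed/MEDLINE": {"1111-1111", "2222-2222"},
    "Scopus": {"1111-1111", "2222-2222", "3333-3333"},
    "Web of Science": {"1111-1111", "3333-3333"},
    "DOAJ": {"1111-1111", "4444-4444"},
}


def coverage(issn, indexed=INDEXED):
    """Return the names of the databases whose title list contains issn."""
    return [name for name, issns in indexed.items() if issn in issns]


print(coverage("1111-1111"))  # listed in all four
print(coverage("4444-4444"))  # listed in DOAJ only
```

The second lookup is the case discussed later on this page: a journal that is verifiably open access but absent from the citation-tracking databases.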


Tradeoffs and tensions

The commercial databases — Scopus and Web of Science — offer citation analytics and impact metrics that PubMed and DOAJ do not provide natively. That analytical power comes with a cost: institutional subscriptions to Web of Science or Scopus can run into hundreds of thousands of dollars annually for large research universities, creating an access asymmetry between well-funded institutions and smaller or lower-income-country institutions.

Coverage breadth introduces its own tension. Scopus indexes more journals than Web of Science, but broader coverage means quality thresholds must accommodate more variance. Web of Science's narrower selection is sometimes cited as a stricter quality signal — a journal indexed in SCIE faces more selective criteria than one indexed only in Scopus — though "more selective" does not automatically mean "more useful" depending on the research question.

DOAJ's whitelist function has genuine value in identifying legitimate open-access venues and filtering out journals with deceptive practices. But DOAJ listing alone does not confer the citation-tracking capabilities that tenure committees and funding agencies use when evaluating research output. A journal can be listed in DOAJ and simultaneously lack any presence in Scopus or Web of Science, leaving authors' work discoverable as open access but analytically invisible in metric-driven evaluation systems. This tension is particularly sharp for article processing charge decisions, where authors may pay to publish open access in a DOAJ-listed journal expecting full visibility, only to find citation metrics unavailable.

The federal open-access mandate in the United States, updated by the Office of Science and Technology Policy in 2022 to require immediate public access to federally funded research, increases the practical importance of DOAJ's verification function even as it doesn't resolve the metric gap.


Common misconceptions

"PubMed and MEDLINE are the same thing." PubMed is the search interface. MEDLINE is the curated database within it. PubMed also contains records from PubMed Central (PMC) and in-process citations that are not yet full MEDLINE records. The distinction matters because MEDLINE assignment carries MeSH indexing; other PubMed records may not.

"If a journal is in DOAJ, it's peer-reviewed." DOAJ requires journals to claim peer review, but it does not independently verify the quality or rigor of that process. A journal listing DOAJ inclusion as a quality credential is citing a necessary-but-not-sufficient condition. The peer review process itself varies enormously even among indexed journals.
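
"Web of Science Impact Factor applies to all Web of Science-indexed journals." The Journal Impact Factor (JIF), calculated by Clarivate and published in Journal Citation Reports, applies only to journals in the Web of Science Core Collection. For many years that meant SCIE and SSCI titles only; beginning with the 2023 Journal Citation Reports release, Clarivate extended the JIF to AHCI and Emerging Sources Citation Index (ESCI) journals as well. Journals that appear only in other databases hosted on the broader Web of Science platform still do not receive a JIF.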

"Scopus and Web of Science cover the same journals." The two databases share a large overlap but are not identical. Studies comparing the two have found that each indexes titles the other does not, and disciplinary coverage ratios differ — Scopus has historically indexed more social science and humanities journals relative to Web of Science's science-heavy weighting.

"Indexing guarantees quality." Indexing is an evidence-based quality signal, not a guarantee. Retractions occur in indexed journals; the retraction and correction process operates separately from indexing decisions in most cases.

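The Journal Impact Factor mentioned above is a ratio, and writing it out shows why it depends on the indexed pool. The standard two-year definition divides the citations received in a given year by items a journal published in the previous two years by the number of citable items it published in those two years. The sketch below uses invented numbers, and the function name is hypothetical.

```python
def journal_impact_factor(citations_to_prior_two_years, citable_items_prior_two_years):
    """Two-year JIF for year Y.

    citations_to_prior_two_years: citations received in year Y to items
        the journal published in years Y-1 and Y-2.
    citable_items_prior_two_years: citable items the journal published
        in years Y-1 and Y-2.
    """
    if citable_items_prior_two_years <= 0:
        raise ValueError("a JIF needs at least one citable item")
    return citations_to_prior_two_years / citable_items_prior_two_years


# Invented example: 600 citations to 200 citable items.
print(journal_impact_factor(600, 200))
```

Only citations from sources the database indexes are counted in the numerator, which is why the same journal can look stronger or weaker depending on whose citation pool is used.
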

Checklist or steps

Factors verified when evaluating whether a journal appears in a given database:


Reference table or matrix

Feature | PubMed/MEDLINE | Scopus | Web of Science | DOAJ
Operator | U.S. National Library of Medicine (federal) | Elsevier (commercial) | Clarivate (commercial) | DOAJ Community (nonprofit)
Access to search | Free | Subscription | Subscription | Free
Primary discipline focus | Biomedical/life sciences | Multidisciplinary | Multidisciplinary | All (OA only)
Journals indexed (approx.) | ~5,200 (MEDLINE) | ~27,000 | ~21,000 | ~20,000
Citation tracking | No (links to PubMed Central) | Yes | Yes | No
Article-level MeSH indexing | Yes | No | No | No (article metadata only)
Impact Factor calculated | No | No (uses CiteScore) | Yes (for Core Collection) | No
Open access only | No | No | No | Yes
Primary quality function | Biomedical literature curation | Broad citation analytics | Citation analytics + prestige metrics | OA legitimacy whitelist

The complete picture of how any given journal or article fits into the scholarly record — and who can find it — sits at the intersection of all four of these systems. A biomedical open-access journal might be indexed in all four simultaneously: MEDLINE for MeSH-tagged discoverability, Scopus and Web of Science for citation tracking, and DOAJ as verification of its open-access status. Most journals occupy a smaller portion of that map, and understanding which portions matter depends heavily on discipline, funder requirements, and the intended audience for the research. The broader landscape of journal types and publishing structures is covered across the Scientific Journal Authority index.

