Journal Indexing Databases: PubMed, Scopus, Web of Science, and DOAJ
Getting a paper published is one milestone. Getting it found is another problem entirely — and that's where indexing databases do the heavy lifting. This page examines four of the most consequential indexing systems in academic publishing: PubMed, Scopus, Web of Science, and the Directory of Open Access Journals (DOAJ). It covers how each operates, what drives inclusion decisions, and why the distinctions between them matter for researchers, institutions, and readers navigating the scientific literature.
- Definition and scope
- Core mechanics or structure
- Causal relationships or drivers
- Classification boundaries
- Tradeoffs and tensions
- Common misconceptions
- Checklist or steps (non-advisory)
- Reference table or matrix
Definition and scope
A journal indexing database is a structured, searchable repository that aggregates bibliographic records (titles, abstracts, author information, citations, and sometimes full text) across a defined set of peer-reviewed publications. Indexing is not archiving. An archive stores content; an index certifies and organizes it, making work discoverable and citable within the formal scientific record.
The four databases covered here represent distinct institutional models. PubMed is operated by the National Library of Medicine (NLM), a federal agency within the U.S. National Institutes of Health — making it government-funded, free to search, and focused primarily on biomedical and life sciences. Scopus and Web of Science are commercial products: Scopus is owned by Elsevier and Web of Science (formerly part of Thomson Reuters) is owned by Clarivate. DOAJ is a nonprofit, community-maintained directory with a specific mandate to index only legitimate open-access journals.
Combined, these four databases index well over 100 million scholarly records. Web of Science covers approximately 21,000 journals (Clarivate), Scopus indexes around 27,000 (Elsevier Scopus), PubMed's MEDLINE component covers roughly 5,200 journals selected by the NLM Literature Selection Technical Review Committee (NLM MEDLINE Journal Selection), and DOAJ lists over 20,000 open-access journals according to its public count (DOAJ).
Understanding these databases is foundational to making sense of journal metrics like impact factor and citation scores, since nearly all such metrics are calculated from citation data drawn specifically from these indexed pools.
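As one illustration of how such a metric is derived from an indexed pool, the sketch below computes a two-year, impact-factor-style ratio: citations received in a given year to items published in the previous two years, divided by the citable items published in those two years. The function name and the figures are hypothetical, and the official Journal Impact Factor depends on Clarivate's own rules for what counts as a citable item.

```python
def two_year_impact_factor(citations_to_prior_two_years: int,
                           citable_items_prior_two_years: int) -> float:
    """Citations in year Y to items from Y-1 and Y-2, per citable item."""
    if citable_items_prior_two_years <= 0:
        raise ValueError("need at least one citable item")
    return citations_to_prior_two_years / citable_items_prior_two_years


# Hypothetical journal: 480 citations in 2024 to its 2022-2023 papers,
# and 200 citable items published across 2022-2023.
jif_like = two_year_impact_factor(480, 200)
print(f"{jif_like:.1f}")  # 2.4
```

Because both the numerator and the denominator come from a single database's records, the same journal can score differently depending on which indexed pool is used.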
Core mechanics or structure
Each database ingests journal content through a distinct pipeline. PubMed/MEDLINE receives structured metadata in a format called MEDLINE XML or PubMed XML from publishers who have passed the journal selection process. NLM assigns Medical Subject Headings (MeSH) — a controlled vocabulary of approximately 30,000 terms — to individual articles, enabling precision retrieval that keyword search alone cannot match.
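To make the difference between MeSH retrieval and plain keyword search concrete, here is a minimal sketch that builds a PubMed search URL restricted to a MeSH heading through NCBI's public E-utilities `esearch` endpoint. The helper name `build_mesh_search_url` and the example heading are illustrative choices; the code only constructs the URL and does not send a request.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_mesh_search_url(mesh_heading: str, max_results: int = 20) -> str:
    """Build an esearch URL that matches records indexed under a MeSH heading."""
    params = {
        "db": "pubmed",
        # The [MeSH Terms] field tag limits matching to assigned headings,
        # rather than any occurrence of the words in the title or abstract.
        "term": f'"{mesh_heading}"[MeSH Terms]',
        "retmax": max_results,
        "retmode": "json",
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


url = build_mesh_search_url("Diabetes Mellitus, Type 2")
print(url)
```

A record whose abstract never uses the exact phrase can still match this query if an indexer assigned the heading, which is the precision gain that controlled vocabulary provides.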
Web of Science and Scopus operate on a citation-graph model. They don't just record that a paper exists; they map who cites whom. Every indexed reference list becomes a data point in a network. This is how metrics like the h-index and Eigenfactor score are computed — they emerge from citation linkage, not from raw publication counts. Web of Science subdivides its index into named collections: the Science Citation Index Expanded (SCIE), Social Sciences Citation Index (SSCI), and Arts & Humanities Citation Index (AHCI) are the three principal ones, each with separate journal selection processes.
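The citation-graph idea can be sketched in a few lines: invert every indexed reference list to obtain inbound citation counts, then derive a metric such as the h-index from those counts. The paper identifiers and reference lists below are made up for illustration, and real databases apply matching and deduplication rules that this sketch ignores.

```python
from collections import Counter


def citation_counts(reference_lists: dict[str, list[str]]) -> Counter:
    """Count inbound citations by inverting each paper's reference list."""
    counts = Counter()
    for cited_papers in reference_lists.values():
        for cited in set(cited_papers):  # ignore duplicate references
            counts[cited] += 1
    return counts


def h_index(counts: list[int]) -> int:
    """Largest h such that h papers have at least h citations each."""
    h = 0
    for rank, c in enumerate(sorted(counts, reverse=True), start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical index: each key is a paper, each value is its reference list.
reference_lists = {
    "P1": [],
    "P2": ["P1"],
    "P3": ["P1", "P2"],
    "P4": ["P1", "P2", "P3"],
    "P5": ["P1", "P2", "P3"],
}
inbound = citation_counts(reference_lists)

# Suppose one author wrote P1, P2, and P3 (cited 4, 3, and 2 times).
author_papers = ["P1", "P2", "P3"]
print(h_index([inbound[p] for p in author_papers]))  # 2
```

The result depends entirely on which reference lists are in the index: removing P4 and P5 from coverage would lower every count, which is why the same author can have different h-index values in different databases.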
DOAJ operates differently. It doesn't primarily function as a citation tracker. Its core product is a whitelist: journals that meet its criteria receive a DOAJ Seal or basic listing, and that status signals legitimacy to librarians, funders, and researchers wary of predatory journals. DOAJ also feeds metadata into services like EBSCO and Crossref, amplifying discoverability for listed journals without managing citation linkage itself.
Causal relationships or drivers
What actually gets a journal indexed — or removed — comes down to editorial quality signals, not subject matter alone.
For MEDLINE, the NLM's Literature Selection Technical Review Committee evaluates journals on scope and coverage, quality of content, editorial quality, production quality, and audience, as documented in NLM's MEDLINE journal selection criteria. Journals must demonstrate a track record of peer review and must publish primarily original research.
Clarivate's Web of Science selection criteria are published in its editorial selection process documentation and weight factors including publication timeliness, editorial conventions, peer review rigor, and citation analysis. A journal that attracts inbound citations from already-indexed sources gains a structural advantage in the selection review.
Scopus uses a Content Selection and Advisory Board (CSAB) that evaluates title-level submissions against published criteria covering peer review type, editorial policy, the regularity of publishing, and ethical standards. Scopus accepts a broader disciplinary range than MEDLINE and indexes content across sciences, social sciences, arts, and humanities.
DOAJ's inclusion is driven by open-access compliance — journals must provide unrestricted access to full text — and by a vetting process that examines ISSN validity, licensing transparency, and absence of deceptive practices. The organization publicly delisted over 3,300 journals in 2016 following a major quality review, a fact that DOAJ itself documented as part of efforts to strengthen the reliability of its whitelist.
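One small, mechanical part of an ISSN validity check is the check digit. The sketch below implements the standard ISSN modulus-11 calculation, in which the first seven digits are weighted 8 down to 2 and a check value of 10 is written as "X". This is only a syntactic test under that assumption: it does not confirm that the ISSN is registered or belongs to the journal claiming it, and it is not a description of DOAJ's own vetting tooling.

```python
import re

# Eight characters, optionally hyphenated after the fourth; last may be X.
ISSN_PATTERN = re.compile(r"^(\d{4})-?(\d{3})([\dXx])$")


def is_valid_issn(issn: str) -> bool:
    """Return True if the string is well formed and its check digit matches."""
    match = ISSN_PATTERN.match(issn.strip())
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    check_char = match.group(3).upper()
    # Weights 8, 7, 6, 5, 4, 3, 2 for the first seven digits.
    weighted_sum = sum(int(d) * w for d, w in zip(digits, range(8, 1, -1)))
    check_value = (11 - weighted_sum % 11) % 11
    expected = "X" if check_value == 10 else str(check_value)
    return check_char == expected


print(is_valid_issn("0378-5955"))  # True
print(is_valid_issn("0378-5954"))  # False: wrong check digit
```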
Classification boundaries
The four databases do not compete on identical terrain. Their scopes diverge in meaningful ways:
- Disciplinary focus: MEDLINE is biomedical. Web of Science spans natural sciences, social sciences, and arts/humanities across separate indexes. Scopus covers a similar multidisciplinary range. DOAJ is discipline-agnostic but access-model specific — open access only.
- Access model neutrality: PubMed, Scopus, and Web of Science index both subscription and open-access journals. DOAJ indexes only open-access titles, functioning as a complement rather than a competitor to the others.
- Article-level vs. journal-level: PubMed indexes at the article level and assigns MeSH terms to individual records. DOAJ is primarily a journal-level directory; it does index article metadata for a subset of listed journals, but that is not its primary function.
- Funding model: PubMed is taxpayer-funded and free to search. Scopus and Web of Science require institutional or individual subscriptions to access full functionality. DOAJ is free and open, sustained by membership fees from libraries and publishers.
The open-access publishing landscape cuts across all four: a fully open-access journal might appear simultaneously in DOAJ, Scopus, Web of Science, and PubMed, with each database adding distinct discovery and credentialing value.
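Because each database publishes its own title list, a journal's position across the four can be treated as simple set membership. The sketch below assumes the title lists have already been exported as sets of ISSNs; the ISSNs shown are placeholders rather than real journals, and `coverage_profile` is a hypothetical helper name.

```python
def coverage_profile(issn: str, title_lists: dict[str, set[str]]) -> list[str]:
    """Return the names of the databases whose title list contains the ISSN."""
    return sorted(name for name, issns in title_lists.items() if issn in issns)


# Placeholder ISSNs (not real journals), standing in for exported title lists.
title_lists = {
    "DOAJ": {"0000-0001", "0000-0002"},
    "MEDLINE": {"0000-0001"},
    "Scopus": {"0000-0001", "0000-0002", "0000-0003"},
    "Web of Science": {"0000-0001", "0000-0003"},
}

print(coverage_profile("0000-0001", title_lists))  # listed in all four
print(coverage_profile("0000-0002", title_lists))  # open access, not in Web of Science

# Titles that Scopus lists but Web of Science does not.
scopus_only = title_lists["Scopus"] - title_lists["Web of Science"]
print(sorted(scopus_only))
```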
Tradeoffs and tensions
The commercial databases — Scopus and Web of Science — offer citation analytics and impact metrics that PubMed and DOAJ do not provide natively. That analytical power comes with a cost: institutional subscriptions to Web of Science or Scopus can run into hundreds of thousands of dollars annually for large research universities, creating an access asymmetry between well-funded institutions and smaller or lower-income-country institutions.
Coverage breadth introduces its own tension. Scopus indexes more journals than Web of Science, but broader coverage means quality thresholds must accommodate more variance. Web of Science's narrower selection is sometimes cited as a stricter quality signal, on the view that a journal indexed in SCIE faces more selective criteria than one indexed only in Scopus. "More selective" does not automatically mean "more useful," however; that depends on the research question.
DOAJ's whitelist function has genuine value in identifying legitimate open-access venues and filtering out journals with deceptive practices. But DOAJ listing alone does not confer the citation-tracking capabilities that tenure committees and funding agencies use when evaluating research output. A journal can be listed in DOAJ and simultaneously lack any presence in Scopus or Web of Science, leaving authors' work discoverable as open access but analytically invisible in metric-driven evaluation systems. This tension is particularly sharp for article processing charge decisions, where authors may pay to publish open access in a DOAJ-listed journal expecting full visibility, only to find citation metrics unavailable.
The federal open-access mandate in the United States, updated by the Office of Science and Technology Policy in 2022 to require immediate public access to federally funded research, increases the practical importance of DOAJ's verification function even as it doesn't resolve the metric gap.
Common misconceptions
"PubMed and MEDLINE are the same thing." PubMed is the search interface. MEDLINE is the curated database within it. PubMed also contains records from PubMed Central (PMC) and in-process citations that are not yet full MEDLINE records. The distinction matters because MEDLINE assignment carries MeSH indexing; other PubMed records may not.
"If a journal is in DOAJ, it's peer-reviewed." DOAJ requires journals to claim peer review, but it does not independently verify the quality or rigor of that process. A journal listing DOAJ inclusion as a quality credential is citing a necessary-but-not-sufficient condition. The peer review process itself varies enormously even among indexed journals.
"Web of Science Impact Factor applies to all Web of Science-indexed journals." The Journal Impact Factor (JIF), calculated by Clarivate and published in Journal Citation Reports, is tied to the Web of Science Core Collection rather than to the platform as a whole. For many years only SCIE and SSCI journals received a JIF; Clarivate extended it to AHCI and Emerging Sources Citation Index (ESCI) journals beginning with the 2023 Journal Citation Reports release. Titles that appear only in other databases hosted on the Web of Science platform still do not receive one.
"Scopus and Web of Science cover the same journals." The two databases share a large overlap but are not identical. Studies comparing the two have found that each indexes titles the other does not, and disciplinary coverage ratios differ — Scopus has historically indexed more social science and humanities journals relative to Web of Science's science-heavy weighting.
"Indexing guarantees quality." Indexing is an evidence-based quality signal, not a guarantee. Retractions occur in indexed journals; the retraction and correction process operates separately from indexing decisions in most cases.
Checklist or steps
Points to verify when checking whether, and how fully, a journal is indexed in a given database:
- [ ] Confirm the journal has a valid ISSN registered with the ISSN International Centre
- [ ] Search the database's own title list (e.g., NLM's MEDLINE journal list, Scopus Sources, or Master Journal List for Web of Science)
- [ ] Check which Web of Science citation index lists the journal (SCIE, SSCI, AHCI, or ESCI), since ESCI applies different selection criteria from the three flagship indexes
- [ ] Verify DOAJ listing status at doaj.org using the journal title or ISSN directly
- [ ] Distinguish between the journal being indexed and individual articles from that journal being indexed (new journals may be indexed with a coverage start date that excludes older issues)
- [ ] Confirm that PubMed/MEDLINE records include MeSH terms, indicating full MEDLINE indexing rather than in-process or publisher-submitted status
- [ ] Note whether Scopus or Web of Science records for the journal include citation data or only bibliographic metadata (some regional journals have partial coverage)
- [ ] Cross-reference with SCImago Journal Rank (SJR), which derives from Scopus data, as a secondary signal of indexed standing
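The MeSH check in the list above can be sketched against PubMed's XML export. The example assumes the usual PubMed XML layout, in which each `MedlineCitation` element carries a `Status` attribute and fully indexed records include `DescriptorName` elements under `MeshHeadingList`. The two records are invented placeholders, not real citations.

```python
import xml.etree.ElementTree as ET

# Placeholder records shaped like PubMed XML; PMIDs and headings are made up.
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation Status="MEDLINE" Owner="NLM">
      <PMID Version="1">00000001</PMID>
      <MeshHeadingList>
        <MeshHeading>
          <DescriptorName UI="D000000">Placeholder Heading</DescriptorName>
        </MeshHeading>
      </MeshHeadingList>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation Status="PubMed-not-MEDLINE" Owner="NLM">
      <PMID Version="1">00000002</PMID>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def mesh_report(xml_text: str) -> dict[str, dict]:
    """Map each PMID to its citation status and any assigned MeSH headings."""
    root = ET.fromstring(xml_text)
    report = {}
    for citation in root.iter("MedlineCitation"):
        pmid = citation.findtext("PMID")
        headings = [d.text for d in citation.iter("DescriptorName")]
        report[pmid] = {"status": citation.get("Status"), "mesh_terms": headings}
    return report


for pmid, info in mesh_report(SAMPLE_XML).items():
    fully_indexed = info["status"] == "MEDLINE" and bool(info["mesh_terms"])
    print(pmid, info["status"], "MeSH indexed" if fully_indexed else "no MeSH terms")
```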
Reference table or matrix
| Feature | PubMed/MEDLINE | Scopus | Web of Science | DOAJ |
|---|---|---|---|---|
| Operator | U.S. National Library of Medicine (federal) | Elsevier (commercial) | Clarivate (commercial) | DOAJ Community (nonprofit) |
| Access to search | Free | Subscription | Subscription | Free |
| Primary discipline focus | Biomedical/life sciences | Multidisciplinary | Multidisciplinary | All (OA only) |
| Journals indexed (approx.) | ~5,200 (MEDLINE) | ~27,000 | ~21,000 | ~20,000 |
| Citation tracking | No (links to PubMed Central) | Yes | Yes | No |
| Article-level MeSH indexing | Yes | No | No | No (article metadata for some journals, without MeSH) |
| Impact Factor calculated | No | No (uses CiteScore) | Yes (for Core Collection) | No |
| Open access only | No | No | No | Yes |
| Primary quality function | Biomedical literature curation | Broad citation analytics | Citation analytics + prestige metrics | OA legitimacy whitelist |
The complete picture of how any given journal or article fits into the scholarly record — and who can find it — sits at the intersection of all four of these systems. A biomedical open-access journal might be indexed in all four simultaneously: MEDLINE for MeSH-tagged discoverability, Scopus and Web of Science for citation tracking, and DOAJ as verification of its open-access status. Most journals occupy a smaller portion of that map, and understanding which portions matter depends heavily on discipline, funder requirements, and the intended audience for the research. The broader landscape of journal types and publishing structures is covered across the Scientific Journal Authority index.
References
- National Library of Medicine — MEDLINE Overview
- NLM — MEDLINE Journal Selection Criteria
- Clarivate — Web of Science Editorial Selection Process
- Elsevier — Scopus Content Coverage
- DOAJ — About the Directory of Open Access Journals
- DOAJ — DOAJ Seal
- U.S. Office of Science and Technology Policy — 2022 Public Access Memo
- ISSN International Centre