DOI and Persistent Identifiers: How Scientific Articles Are Linked
A Digital Object Identifier — the string that looks like 10.1038/nature12373 appended to nearly every journal article published after 2000 — is one of those pieces of infrastructure so quietly essential that most researchers never think about it until it breaks. DOIs and their cousins in the persistent identifier ecosystem are the mechanism that keeps scientific literature findable even as publishers restructure, journals change hands, and URLs rot. This page explains what these identifiers are, how the resolution system works, when different identifier types apply, and how researchers and institutions decide which system fits a given need.
Definition and scope
A Digital Object Identifier is a permanent, unique alphanumeric string assigned to a digital object — most commonly a journal article, dataset, preprint, or book chapter — and registered through the International DOI Foundation (IDF), the governance body that oversees the DOI system (International DOI Foundation). The DOI itself is not a URL. It is a name — specifically a handle — that resolves to a URL through an intermediary system, which means the underlying web address can change without breaking the identifier.
The DOI system runs on the Handle System, developed at the Corporation for National Research Initiatives (CNRI). As of 2023, CrossRef — the largest DOI registration agency for scholarly publishing — manages more than 148 million DOI records (CrossRef), covering journal articles, conference proceedings, books, and reports from thousands of publishers worldwide.
DOIs live under the broader umbrella of persistent identifiers (PIDs), a category that also includes:
- ARK (Archival Resource Key): widely used by libraries, archives, and cultural institutions to identify digital objects without publisher intermediaries
- ORCID iD: a 16-character identifier (15 digits plus a final check character that may be a digit or X) for researchers rather than publications, maintained by ORCID
- ROR (Research Organization Registry): assigns unique identifiers to research institutions, maintained at ror.org
- ISSN: the 8-digit International Standard Serial Number (the final check digit may be X) identifying a journal title as a whole, rather than individual articles, administered by the ISSN International Centre
Each serves a distinct layer of the scholarly record. A single published article might carry a DOI (for the article), an ORCID for each author, a ROR for the funding institution, and an ISSN for the journal it appears in.
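The layering described above can be sketched as a small record type. This is only an illustration: the class name `ArticleIdentifiers`, its field names, and the `shape_problems` helper are hypothetical, and the checks confirm only the layout of each identifier (hyphenated groups, optional trailing X), not that it is registered anywhere.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Shape checks only: they test the written layout of an identifier,
# not whether it exists in any registry.
ORCID_SHAPE = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")
ISSN_SHAPE = re.compile(r"^\d{4}-\d{3}[\dX]$")


@dataclass
class ArticleIdentifiers:
    """Hypothetical record grouping the identifier layers of one article."""
    doi: str                                   # the article itself
    issn: str                                  # the journal title as a whole
    author_orcids: List[str] = field(default_factory=list)  # one per author
    institution_ror: Optional[str] = None      # a research organization

    def shape_problems(self) -> List[str]:
        """Return human-readable notes for identifiers with an odd layout."""
        problems = []
        if not (self.doi.startswith("10.") and "/" in self.doi):
            problems.append(f"DOI is not in prefix/suffix form: {self.doi}")
        if not ISSN_SHAPE.match(self.issn):
            problems.append(f"ISSN is not in NNNN-NNNC form: {self.issn}")
        for orcid in self.author_orcids:
            if not ORCID_SHAPE.match(orcid):
                problems.append(f"ORCID iD has an unexpected layout: {orcid}")
        return problems


# Sample values for illustration only; the ORCID iD is a placeholder and is
# not claimed to belong to an author of this article.
sample = ArticleIdentifiers(
    doi="10.1038/nature12373",
    issn="0028-0836",
    author_orcids=["0000-0002-1825-0097"],
)
print(sample.shape_problems())
```

Keeping the identifiers in separate fields mirrors the point of the paragraph: each one names a different thing, so none can stand in for another.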
How it works
When a publisher joins CrossRef or another DOI registration agency, it gains the ability to mint DOIs under a licensed prefix — the portion before the slash, such as 10.1038 for Nature Publishing Group. The suffix after the slash is assigned by the publisher and must be unique within that prefix. The complete DOI is then registered with CrossRef's metadata deposit system, which stores bibliographic metadata alongside the target URL.
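As a minimal sketch of the prefix and suffix structure just described, the function below splits a DOI at its first slash. The names `DoiParts` and `split_doi` are hypothetical, and the check is deliberately loose: it confirms only that the string has the `10.`-prefixed shape, not that the DOI is registered.

```python
from typing import NamedTuple


class DoiParts(NamedTuple):
    prefix: str  # licensed to the publisher, e.g. "10.1038"
    suffix: str  # chosen by the publisher, unique within the prefix


def split_doi(doi: str) -> DoiParts:
    """Split a bare DOI into prefix and suffix at the first slash."""
    prefix, slash, suffix = doi.strip().partition("/")
    if not slash or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI in prefix/suffix form: {doi!r}")
    return DoiParts(prefix, suffix)


parts = split_doi("10.1038/nature12373")
print(parts.prefix, parts.suffix)
```

Splitting at the first slash only is intentional: a suffix may itself contain slashes, so everything after the first one is treated as a single publisher-assigned string.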
Resolution follows a straightforward path:
- A reader clicks a DOI link (typically formatted as https://doi.org/10.xxxx/xxxxx)
- The doi.org resolver queries the Handle System
- The Handle System returns the current registered URL for that identifier
- The reader is redirected to the publisher's landing page
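The first step depends on the DOI being written as a resolver link. A rough sketch of that conversion follows; the function name `doi_to_url` and the list of accepted leading forms are assumptions made for illustration, and the function builds the link without contacting the resolver.

```python
from urllib.parse import quote

RESOLVER = "https://doi.org/"

# Leading forms commonly seen in reference lists (an illustrative list,
# not an exhaustive one).
LEADING_FORMS = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(raw: str) -> str:
    """Turn a DOI written in any of the forms above into a resolver link."""
    doi = raw.strip()
    for lead in LEADING_FORMS:
        if doi.lower().startswith(lead):
            doi = doi[len(lead):]
            break
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {raw!r}")
    # Percent-encode characters that are unsafe in a URL path, keeping
    # the slash between prefix and suffix.
    return RESOLVER + quote(doi, safe="/")


print(doi_to_url("doi:10.1038/nature12373"))
```

The remaining steps (the Handle System lookup and the redirect) happen on the resolver side when the link is followed.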
The critical protection this provides: if a publisher migrates to a new platform or is acquired or merged, as happened when Springer and Nature Publishing Group were combined into Springer Nature in 2015, the DOI continues to resolve correctly, provided the publisher updates the registered URL in CrossRef's records. Nothing in the citing literature has to change. The Handle System itself is described in detail in CNRI's Handle System documentation.
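The indirection behind that protection can be illustrated with a toy in-memory table. This is not the Handle System or any real API: the class `ToyHandleRegistry`, the DOI, and both URLs are made up for the example.

```python
class ToyHandleRegistry:
    """In-memory stand-in for the identifier-to-URL table (illustration only)."""

    def __init__(self) -> None:
        self._targets = {}

    def register(self, doi: str, url: str) -> None:
        # DOI names are case-insensitive, so store them in one case.
        self._targets[doi.lower()] = url

    def resolve(self, doi: str) -> str:
        return self._targets[doi.lower()]


registry = ToyHandleRegistry()
citation = "10.1234/example.5678"  # the string printed in reference lists

registry.register(citation, "https://old-platform.example/articles/5678")
before = registry.resolve(citation)

# The publisher moves platforms and updates only the registered target.
registry.register(citation, "https://new-platform.example/content/5678")
after = registry.resolve(citation)

print(before)
print(after)
```

The cited string never changes; only the entry it points to does, which is why reference lists printed years earlier keep working.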
For researchers tracking citations, broken links in a paper's reference list are more than an annoyance — they represent genuine loss of verifiable provenance. This is precisely the failure mode DOIs were designed to prevent, and it connects directly to the broader concerns around data availability and reproducibility in the scientific record.
Common scenarios
Journal articles: The most common DOI use case. Publishers register DOIs with CrossRef at article level, which lets citation metrics systems (including those discussed in impact factor and journal metrics) accurately attribute citations across publishers.
Preprints: Preprint servers assign their own DOIs, but not all through the same agency. arXiv registers its DOIs through DataCite, a second major registration agency distinct from CrossRef, while bioRxiv and medRxiv register theirs through CrossRef. A preprint DOI and the subsequent published-article DOI coexist as separate identifiers; CrossRef's metadata can link them via a "relation" field. The distinction between preprint and refereed-article identifiers matters more than it might appear; see preprint servers vs refereed journals for the full picture.
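To make the linking concrete, here is a sketch of how such a relation might be read from a metadata record. The record layout (a `relation` object keyed by relation type such as `is-preprint-of`) follows the shape of CrossRef's JSON metadata as far as I am aware, but treat it as an assumption; both DOIs and the helper `linked_dois` are invented for the example.

```python
from typing import Any, Dict, List

# Made-up preprint record; the DOIs are placeholders, not real works.
preprint_record: Dict[str, Any] = {
    "DOI": "10.1234/preprint.0001",
    "type": "posted-content",
    "relation": {
        "is-preprint-of": [
            {"id-type": "doi", "id": "10.1234/journal.0042"}
        ]
    },
}


def linked_dois(record: Dict[str, Any], relation_type: str) -> List[str]:
    """Return the DOIs a record points to under one relation type."""
    links = record.get("relation", {}).get(relation_type, [])
    return [link["id"] for link in links if link.get("id-type") == "doi"]


print(linked_dois(preprint_record, "is-preprint-of"))
```

Because the two DOIs stay separate, a citation to the preprint remains a citation to the preprint; the relation lets a reader or an indexing service follow it to the refereed version.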
Research data: DataCite (datacite.org) specializes in dataset DOIs, and funders including the National Institutes of Health increasingly expect deposited datasets to carry a persistent identifier, such as a DataCite DOI, for discoverability. The NIH Data Management and Sharing Policy, effective January 2023 (NIH Office of Extramural Research), encourages funded researchers to use repositories that assign such identifiers.
Retractions: When an article is retracted, the DOI remains active — it resolves to a retraction notice rather than disappearing. This preserves the historical record while flagging the problem, a principle endorsed by the Committee on Publication Ethics (COPE) retraction guidelines (COPE).
Decision boundaries
Choosing the appropriate persistent identifier depends on what is being identified and who needs to find it:
| Object type | Primary identifier | Registration agency |
|---|---|---|
| Journal article | DOI | CrossRef |
| Dataset | DOI | DataCite |
| Preprint | DOI | CrossRef or DataCite (varies by server) |
| Researcher | ORCID iD | ORCID |
| Institution | ROR ID | ROR |
| Journal title | ISSN | ISSN International Centre |
The boundary between CrossRef and DataCite is occasionally blurry: both issue DOIs, but CrossRef focuses on scholarly literature with rich citation-linking metadata, while DataCite is optimized for research outputs that are not traditional publications. A dataset deposited in a general-purpose repository such as Zenodo or Dryad, for instance, would typically carry a DataCite DOI.
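The decision table can also be expressed as a simple lookup, sketched below. The names `IDENTIFIER_GUIDE` and `recommend` are hypothetical, and the preprint entry names both agencies because the choice depends on the preprint server.

```python
from typing import Dict, Tuple

# object type -> (primary identifier, registration agency)
IDENTIFIER_GUIDE: Dict[str, Tuple[str, str]] = {
    "journal article": ("DOI", "CrossRef"),
    "dataset": ("DOI", "DataCite"),
    "preprint": ("DOI", "CrossRef or DataCite, depending on the server"),
    "researcher": ("ORCID iD", "ORCID"),
    "institution": ("ROR ID", "ROR"),
    "journal title": ("ISSN", "ISSN International Centre"),
}


def recommend(object_type: str) -> Tuple[str, str]:
    """Look up the usual identifier and agency for an object type."""
    key = object_type.strip().lower()
    if key not in IDENTIFIER_GUIDE:
        raise ValueError(f"no guidance for object type: {object_type!r}")
    return IDENTIFIER_GUIDE[key]


print(recommend("Dataset"))
```

A lookup like this covers only the common cases; anything outside the table raises an error rather than guessing.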
ARKs present a different decision entirely. Libraries and archives prefer ARKs because they are free to assign without registration fees and are not tied to a commercial or consortium intermediary. The California Digital Library maintains ARK infrastructure and publishes the ARK specification (N2T.net / CDL). For journal publishing, however, DOI through CrossRef remains the dominant standard, primarily because CrossRef's citation-linking network — integrated into every major journal indexing database — makes DOIs functionally indispensable for citation tracking.
The broader context for these standards fits within what the scholarly publishing community is trying to accomplish: a navigable, interconnected record of science. A good starting point for that larger picture is the scientific journal authority index, which maps the full terrain of how journals, metrics, access models, and identifiers fit together.
References
- International DOI Foundation
- CrossRef
- DataCite
- ORCID
- Research Organization Registry (ROR)
- ISSN International Centre
- CNRI Handle System
- ARK Identifier Scheme — arks.org
- NIH Data Management and Sharing Policy — NOT-OD-21-013
- Committee on Publication Ethics (COPE) — Retraction Guidelines