Science Bulletin, Volume 70, Issue 4, 26 February 2025, Pages 452-453, https://doi.org/10.1016/j.scib.2024.10.021
Involved members of MultiTroph: Michael C. Orr, Georg Albert, Arong Luo, Huijie Qiao, Ming-Qiang Wang, Douglas Chesters, Chao-Dong Zhu
Summary: The article discusses the challenge of ‘dark data’, i.e. scientific data that is technically available but practically inaccessible because of missing metadata, poor standardization, or disappearing repositories. The authors argue that although open science policies are spreading, inconsistent data-sharing practices hinder large-scale biological research and limit the long-term usability of data. They propose standardized, future-proof data formats, mandatory metadata documentation, centralized indexing, and repository improvements (including DOIs, file-level access, and cross-linking of scripts) to ensure that datasets are Complete, Legible, Error-free, Accessible, and Non-redundant (CLEAN).
Conclusion: To prevent data loss and improve reproducibility, structural changes in repositories and stronger journal policies are urgently needed, alongside retroactive efforts to recover historical dark datasets. Without immediate action, the rapid generation of new biological data will continue to be offset by equally rapid losses, perpetuating knowledge gaps.