## Abstract
Research data management (RDM) has emerged as a critical determinant of research integrity and reproducibility, particularly within doctoral research contexts where methodological rigour underpins scholarly outcomes. This dissertation synthesises current evidence examining how specific data management practices—including documentation, version control, standardised file organisation, and early-stage planning—influence error rates, researcher stress, and time-to-submission among PhD candidates. Drawing upon a comprehensive literature review of peer-reviewed studies spanning multiple disciplines, this work identifies strong consensus that robust RDM practices substantially reduce human error in data handling, lower cognitive load and associated anxiety, and accelerate project completion timelines. The analysis reveals that foundational practices such as Data Management Plans (DMPs), electronic laboratory notebooks (ELNs), and systematic backup protocols create measurable improvements in research efficiency. However, significant challenges persist, including inconsistent adoption across laboratories and departments, perceptions of DMPs as administrative burdens, and resource constraints limiting training provision. The findings underscore the necessity for institutional investment in RDM infrastructure and training, whilst highlighting research gaps regarding non-STEM disciplines and long-term institutional impacts. This synthesis offers evidence-based recommendations for doctoral researchers, supervisors, and institutions seeking to enhance research quality and researcher wellbeing.
## Introduction
The contemporary research landscape faces an unprecedented reproducibility crisis, with estimates suggesting that a substantial proportion of published findings cannot be independently verified or replicated (Ioannidis, 2005; Baker, 2016). This crisis has profound implications for scientific progress, public trust in research, and the efficient allocation of research resources. Within this context, doctoral researchers occupy a particularly vulnerable position: they must navigate complex methodological requirements, manage substantial datasets, and produce original contributions to knowledge whilst operating under significant time pressures and resource constraints.
Research data management encompasses the organisation, documentation, storage, and preservation of research data throughout the project lifecycle (Briney, Coates and Goben, 2020). Effective RDM practices have been increasingly recognised as fundamental to ensuring research integrity, facilitating reproducibility, and enabling the verification of scientific claims (Cunha-Oliveira, Ioannidis and Oliveira, 2024). For doctoral researchers specifically, robust data management represents not merely a technical requirement but a foundational competency that shapes the quality and credibility of their scholarly contributions.
The significance of this topic extends beyond individual research projects. Poor data management practices contribute to wasted resources, duplicated efforts, and the publication of unreliable findings that may misinform subsequent research and policy decisions (Kovács, Hoekstra and Aczél, 2020). Furthermore, the psychological burden of managing complex datasets without adequate systems or training can exacerbate the well-documented mental health challenges faced by doctoral candidates (Levecque et al., 2017). Understanding which specific practices most effectively reduce errors, minimise stress, and expedite project completion therefore has substantial implications for research quality, researcher wellbeing, and institutional efficiency.
This dissertation addresses a critical gap in the literature by synthesising evidence on the relationships between specific RDM practices and measurable outcomes in doctoral research contexts. Whilst numerous studies have examined individual aspects of data management or focused on particular disciplinary contexts, there remains a need for comprehensive synthesis that identifies the practices with the strongest evidence base and clarifies their mechanisms of action. Such synthesis is essential for informing the development of training programmes, institutional policies, and individual researcher practices that can meaningfully enhance research integrity and reproducibility.
## Aim and objectives
The overarching aim of this dissertation is to critically evaluate the evidence regarding which research data management practices most effectively reduce errors, lower stress, and shorten time-to-submission in PhD projects, thereby contributing to enhanced research integrity and reproducibility.
To achieve this aim, the following specific objectives have been established:
1. To identify and categorise the foundational data management practices most frequently associated with improved research outcomes in doctoral contexts, including documentation protocols, version control systems, file organisation conventions, and planning tools.
2. To evaluate the strength of evidence linking specific RDM practices to error reduction in data handling, analysis, and reporting within PhD research projects.
3. To examine the mechanisms through which robust data management practices influence researcher stress and cognitive load during the doctoral journey.
4. To assess the impact of effective RDM on project timelines, specifically examining how systematic data practices affect time-to-submission for doctoral candidates.
5. To identify barriers to the adoption of effective RDM practices and evaluate strategies for overcoming these challenges at individual, laboratory, and institutional levels.
6. To highlight gaps in current knowledge and propose directions for future research that could further enhance understanding of RDM effectiveness in doctoral contexts.
## Methodology
This dissertation employs a systematic literature synthesis methodology to examine the relationships between research data management practices and outcomes in doctoral research contexts. Literature synthesis represents an appropriate methodological approach for this investigation, enabling the integration of findings from diverse studies to establish patterns, identify consensus, and reveal knowledge gaps (Snyder, 2019).
### Search strategy and source identification
The primary evidence base for this synthesis derives from a comprehensive search conducted across multiple academic databases, including Semantic Scholar, PubMed, and associated repositories. This search identified 1,142 potentially relevant papers, which underwent systematic screening for relevance to PhD-level research data management practices with specific focus on error reduction, stress mitigation, and project completion efficiency. Following initial screening, 794 papers were assessed for eligibility, of which 614 met predefined inclusion criteria. The final synthesis incorporates the 50 most relevant papers, selected based on methodological quality, relevance to doctoral contexts, and contribution to the research questions.
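The screening funnel described above can be restated as simple arithmetic. The following is a minimal sketch that uses only the four counts reported in this section; the stage labels and the `retention` helper are illustrative names introduced here, not part of the original search protocol.

```python
# Screening counts as reported in the search strategy above.
stages = [
    ("identified", 1142),
    ("assessed for eligibility", 794),
    ("met inclusion criteria", 614),
    ("included in synthesis", 50),
]


def retention(stages):
    """Return (label, count, share of previous stage, share of initial pool)."""
    rows = []
    first = stages[0][1]
    previous = first
    for label, count in stages:
        rows.append((label, count, count / previous, count / first))
        previous = count
    return rows


for label, count, of_prev, of_first in retention(stages):
    print(f"{label:<26} {count:>5}  {of_prev:6.1%} of previous  {of_first:6.1%} of identified")
```

Read this way, the final synthesis draws on roughly 4% of the papers initially identified, which makes explicit how selective the last step (from 614 eligible papers to 50) is relative to the earlier screening stages.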
### Inclusion and exclusion criteria
Papers were included if they addressed research data management practices in academic or research settings, examined outcomes related to error reduction, researcher wellbeing, or project efficiency, and were published in peer-reviewed venues. Studies were excluded if they focused exclusively on data management in non-research contexts, lacked empirical or systematic evidence, or were published in sources that could not be verified as peer-reviewed.
### Search organisation
Eight unique search groups encompassing 22 targeted searches were employed to ensure comprehensive coverage across foundational RDM concepts, PhD-specific contexts, interdisciplinary approaches, challenges and limitations, and outcome decomposition. This structured approach enabled systematic identification of evidence across the multiple dimensions relevant to the research objectives.
### Analysis approach
The synthesis employed thematic analysis to organise findings according to the primary outcome categories: error reduction, stress reduction, and time-to-submission effects. Evidence strength was evaluated based on study design, sample characteristics, consistency of findings across studies, and methodological rigour. Claims were categorised according to evidence strength on a scale permitting comparison across different practice domains.
### Supplementary sources
To enrich the synthesis and provide additional context, supplementary searches were conducted to identify relevant institutional guidance, policy documents, and methodological literature from university sources and research organisations. These sources were evaluated for quality and relevance before incorporation.
### Limitations of the methodology
This synthesis approach carries inherent limitations, including potential publication bias favouring positive findings, heterogeneity in study designs and outcome measures that complicates direct comparison, and the possibility that relevant studies in specialised disciplinary venues may not have been captured. These limitations are acknowledged in the interpretation of findings.
## Literature review
### Foundational concepts in research data management
Research data management encompasses the full spectrum of activities involved in handling research data throughout the project lifecycle, from initial planning through collection, organisation, analysis, preservation, and sharing (Briney, Coates and Goben, 2020). Foundational RDM practices include comprehensive documentation of procedures and data provenance, version control systems, standardised file naming and organisation conventions, regular backup protocols, and the use of specialised tools such as electronic laboratory notebooks (Kanza and Knight, 2022). These practices collectively aim to ensure that data remain findable, accessible, interoperable, and reusable—principles encapsulated in the FAIR data framework that has become increasingly influential in research policy (Wilkinson et al., 2016).
The importance of RDM has grown substantially in recent decades, driven by several converging factors. The exponential growth in data volumes across research disciplines has created unprecedented challenges for organisation and retrieval. Simultaneously, heightened scrutiny of research reproducibility has focused attention on the role of data practices in enabling verification of findings. Funder mandates requiring data management plans and data sharing have formalised expectations, whilst technological advances have created both new possibilities and new complexities for data handling (Michener, 2015).
### Data management plans and early-stage planning
Data Management Plans represent structured documents outlining how research data will be handled throughout a project’s lifecycle. These plans typically address data collection methods, file formats, naming conventions, storage arrangements, access controls, preservation strategies, and sharing intentions (Michener, 2015). The development of DMPs at project outset is increasingly mandated by funding bodies and institutions, reflecting recognition that early planning substantially influences downstream data quality and accessibility.
Evidence supports the value of DMPs in anticipating challenges before they become problematic. Miksa et al. (2019) articulated ten principles for machine-actionable data management plans, emphasising that effective plans should be living documents integrated into research workflows rather than static compliance exercises. Similarly, Jäckel and Lehmann (2023) examined DMPs in collaborative projects, identifying both benefits in coordinating multi-partner data practices and challenges in maintaining plan relevance as projects evolve.
However, the literature also reveals tensions surrounding DMP implementation. Birkbeck, Nagle and Sammon (2022) conducted a comprehensive literature analysis identifying persistent challenges in research data management practices, including the perception among some researchers that DMPs represent administrative burdens rather than practical tools. This perception appears most prevalent when DMPs are disconnected from daily research activities, functioning as box-ticking exercises rather than living guidance documents. The effectiveness of DMPs therefore depends substantially on their integration into routine workflows and their adaptation to project-specific needs.
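One way to keep a plan close to daily work is to hold it as structured data that can be checked and updated alongside the project files. The sketch below is illustrative only: the `DataManagementPlan` class and its field names simply mirror the topics a DMP typically addresses as listed in this section, and they do not follow any official machine-actionable DMP schema. The project name and field values are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date


@dataclass
class DataManagementPlan:
    # Field names are illustrative; they mirror the topics listed in this
    # section, not any official DMP schema.
    project: str
    collection_methods: list[str]
    file_formats: list[str]
    naming_convention: str
    storage: str
    access_controls: str
    preservation: str
    sharing: str
    last_reviewed: str = field(default_factory=lambda: date.today().isoformat())

    def missing_sections(self) -> list[str]:
        """Names of sections left empty, so gaps surface at review time."""
        return [name for name, value in asdict(self).items() if not value]


plan = DataManagementPlan(
    project="example-phd-project",  # hypothetical project
    collection_methods=["semi-structured interviews"],
    file_formats=["csv", "txt"],
    naming_convention="YYYY-MM-DD_project_description_vNN.ext",
    storage="institutional network drive",
    access_controls="candidate and supervisors only",
    preservation="institutional repository after submission",
    sharing="",  # deliberately left blank: not yet decided
)

print(json.dumps(asdict(plan), indent=2))
print("Sections still to complete:", plan.missing_sections())
```

Because the plan is plain data, reviewing it at a supervision meeting can be as simple as re-running the check and updating `last_reviewed`, which is closer to a living document than a form completed once for compliance.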
### Documentation and data provenance
Clear documentation of research procedures and data provenance emerges consistently as one of the most critical RDM practices. Documentation serves multiple functions: it enables researchers to understand and reproduce their own procedures, facilitates handovers between team members, supports verification and replication by others, and provides the audit trail necessary for research integrity (Briney, Coates and Goben, 2020). Comprehensive documentation encompasses methodological protocols, data collection procedures, processing steps, analysis decisions, and any modifications made during the research process.
Cunha-Oliveira, Ioannidis and Oliveira (2024) provided detailed guidance on best practices for data management and sharing in experimental biomedical research, emphasising that documentation should be sufficiently detailed to enable replication without recourse to the original researchers. This level of detail, whilst challenging to achieve, represents the gold standard for research transparency and reproducibility.
The relationship between documentation practices and error reduction operates through multiple mechanisms. Well-documented procedures reduce reliance on memory, which is subject to decay and distortion. They enable detection of inconsistencies or deviations from planned protocols. They also facilitate review by supervisors, collaborators, or peer reviewers who may identify problems not apparent to the primary researcher. The cognitive benefits of externalising procedural knowledge through documentation extend beyond error prevention to include reduced cognitive load and enhanced capacity for higher-order analytical thinking.
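Externalising procedural knowledge can be partly automated. The following is a minimal sketch, under the assumption that each processing step reads one file and writes another: it appends a timestamped record, with checksums of both files, to a provenance log. The function names, the JSON Lines log format, and the example file names are choices made here for illustration and are not drawn from the cited literature.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Checksum of a file, read in chunks so large datasets fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_step(source: Path, output: Path, description: str, log: Path) -> dict:
    """Append one processing step to a JSON Lines provenance log."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "description": description,
        "source": {"file": source.name, "sha256": sha256_of(source)},
        "output": {"file": output.name, "sha256": sha256_of(output)},
    }
    with log.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
    return entry


# Usage with throwaway files (hypothetical data).
with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "raw.csv"
    clean = Path(tmp) / "clean.csv"
    raw.write_text("id,score\n1,10\n2,\n", encoding="utf-8")
    clean.write_text("id,score\n1,10\n", encoding="utf-8")
    entry = record_step(raw, clean, "dropped rows with missing score",
                        Path(tmp) / "provenance.jsonl")
    print(entry["description"], entry["output"]["sha256"][:12])
```

A log of this kind does not replace a written protocol, but it reduces reliance on memory for the question of which file was derived from which, and the checksums allow a later reader to confirm that a file has not changed since the step was recorded.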
### Version control and file organisation
Version control systems track changes to files over time, enabling researchers to review history, recover previous versions, and understand how documents or datasets have evolved (Kanza and Knight, 2022). Originally developed for software engineering, version control tools such as Git have become increasingly adopted in research contexts, particularly for code and analytical scripts but also for data files and documents.
The importance of version control is underscored by evidence regarding common errors in data management. Kovács, Hoekstra and Aczél (2020) surveyed mistakes in data management among psychological researchers, identifying ambiguous naming conventions and lack of version control among the most frequent sources of errors leading to lost time or compromised results. Without systematic version control, researchers risk overwriting important files, losing track of which version produced which results, or being unable to reproduce analyses due to undocumented modifications.
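The risk of losing track of which version produced which results can be reduced by recording the code version alongside every output. The sketch below assumes the analysis scripts live in a Git repository and that the `git` command is installed; `current_commit` and `stamp_results` are hypothetical helper names, and the results dictionary is invented for illustration.

```python
import subprocess
from typing import Optional


def current_commit(repo_dir: str = ".") -> Optional[str]:
    """Return the checked-out Git commit hash, or None if unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # Git is not installed, the directory does not exist,
        # or the directory is not a repository with commits.
        return None
    return result.stdout.strip()


def stamp_results(results: dict, repo_dir: str = ".") -> dict:
    """Copy of results with the producing code version attached."""
    return {**results, "code_version": current_commit(repo_dir) or "unversioned"}


stamped = stamp_results({"mean_score": 1.5})  # hypothetical analysis output
print(stamped)
```

This records only the last commit; it does not detect uncommitted edits to the scripts, so a fuller version would also check the working tree (for example with `git status --porcelain`) before trusting the stamp.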
Standardised file naming and organisation conventions complement version control by ensuring that files can be located reliably and their contents understood from their names and locations. Effective naming conventions typically incorporate date information, project identifiers, version numbers, and descriptive content indicators in consistent formats. Folder hierarchies should reflect logical relationships between project components and facilitate navigation by both the primary researcher and others who may need to access the data (Briney, Coates and Goben, 2020).
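A naming convention is easier to apply consistently when it is written down in a form that can be checked. The sketch below assumes one possible pattern, `YYYY-MM-DD_project_description_vNN.ext`, combining the elements described above (date, project identifier, descriptive content, version number). The pattern itself and the helper names are illustrative choices, not a convention prescribed by the cited sources.

```python
import re
from datetime import date

# Assumed convention: 2024-03-15_thesis_interview-transcripts_v02.csv
NAME_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})_"
    r"(?P<project>[a-z0-9]+)_"
    r"(?P<description>[a-z0-9]+(?:-[a-z0-9]+)*)_"
    r"v(?P<version>\d{2})"
    r"\.(?P<ext>[a-z0-9]+)$"
)


def build_name(day: date, project: str, description: str, version: int, ext: str) -> str:
    """Assemble a file name and refuse anything that breaks the convention."""
    name = f"{day.isoformat()}_{project}_{description}_v{version:02d}.{ext}"
    if not NAME_PATTERN.match(name):
        raise ValueError(f"name does not follow the convention: {name!r}")
    return name


def parse_name(name: str) -> dict:
    """Split a conforming file name into its parts, or raise ValueError."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"name does not follow the convention: {name!r}")
    parts = match.groupdict()
    date.fromisoformat(parts["date"])  # rejects impossible dates
    parts["version"] = int(parts["version"])
    return parts


print(build_name(date(2024, 3, 15), "thesis", "interview-transcripts", 2, "csv"))
print(parse_name("2024-03-15_thesis_interview-transcripts_v02.csv"))
```

Putting the date first in ISO format means that an alphabetical listing is also a chronological one, and zero-padded version numbers keep `v02` ahead of `v10` in the same listing.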
### Electronic laboratory notebooks and workflow tools
Electronic laboratory notebooks represent a significant technological advance in research documentation, offering advantages over traditional paper notebooks including searchability, backup capabilities, timestamping, integration with analytical tools, and facilitation of collaboration (Kanza and Knight, 2022). ELNs can automatically capture metadata, link to data files and analysis outputs, and provide audit trails of research activities.
The adoption of ELNs and other workflow tools has been associated with improvements in both error reduction and research efficiency. By structuring the documentation process and providing prompts for required information, these tools reduce the likelihood of omissions. Integration with other systems can automate routine tasks and reduce opportunities for manual errors. The searchability of electronic records can substantially accelerate data retrieval compared to paper-based systems.
However, the benefits of ELNs depend on appropriate tool selection and effective implementation. The landscape of available tools is complex and rapidly evolving, and researchers may struggle to identify solutions suited to their specific disciplinary needs and institutional contexts. Training requirements and learning curves can represent barriers to adoption, particularly when researchers perceive limited immediate return on time investment (Kanza and Knight, 2022).
### Training and skill development
The literature consistently emphasises the importance of training in enabling effective RDM implementation. Kapoor (2023) examined approaches to teaching best practices in research data management to PhD students, identifying that structured training programmes can significantly improve both competence and confidence in data handling. Training appears particularly valuable when delivered early in the doctoral journey, before poor habits become entrenched, and when reinforced through ongoing support and practice opportunities.
Eckel and Fina (2024) provided important evidence regarding the role of data management training in research culture change, noting that whilst progress may be “slow but steady,” sustained training efforts, particularly reaching humanities disciplines traditionally less engaged with formal RDM, can shift attitudes and practices over time. Their work suggests that training benefits extend beyond individual skill development to influence broader research culture within institutions.
The content and format of effective training remain areas requiring further investigation. Evidence suggests that practical, hands-on training integrated with researchers’ actual projects is more effective than abstract instruction (Kapoor, 2023). Peer learning and mentorship approaches may complement formal training programmes by providing ongoing support and modelling of good practice. However, resource constraints often limit the scope and availability of training provision, particularly at institutions without dedicated RDM support staff.
Pálsdóttir (2021) examined data literacy and its relationship to research data management, finding that fundamental literacies around data concepts, tools, and practices represent prerequisites for effective data sharing and management. This suggests that training programmes must address foundational competencies before advancing to more sophisticated practices.
### Research data management in doctoral contexts
PhD projects present distinctive challenges for data management that warrant specific consideration. Doctoral researchers typically operate with limited experience, developing methodological expertise whilst simultaneously producing original research. They may receive inconsistent guidance regarding data practices, face pressure to prioritise immediate outputs over systematic organisation, and work within laboratories or departments with varying standards and support (Fina et al., 2025).
The extended duration of PhD projects—typically three to four years or longer—creates particular requirements for data organisation that can withstand the passage of time. Data collected early in a project must remain accessible and comprehensible years later during write-up. Personnel changes, including supervisor departures or additions of collaborators, require that data systems be intelligible to others without extensive personal explanation.
Fina et al. (2025) examined support for the research data management journey of postgraduate students, highlighting the importance of sustained engagement throughout the doctoral process rather than one-time interventions. Their work suggests that effective support involves multiple touchpoints addressing evolving needs as projects progress from planning through data collection, analysis, and dissemination.
### Institutional and disciplinary variation
Significant variation exists in RDM practices and support across institutions and disciplines. Borghi and Van Gulick (2020) surveyed data management and sharing practices and perceptions among psychology researchers, revealing substantial inconsistency in adoption even within a single discipline. They found that whilst many researchers acknowledged the importance of good data management, actual practices often fell short of ideals, particularly regarding documentation and sharing.
Reichmann et al. (2021) examined data management practices in institutional contexts, identifying tensions between administrative requirements and research practices. Their analysis revealed that researchers often experience institutional RDM policies as disconnected from their daily work, potentially undermining engagement. Effective institutional approaches appear to require balancing standardisation with flexibility to accommodate disciplinary differences.
Disciplinary variation represents a particularly important consideration. STEM disciplines have generally led in RDM practice development, driven partly by larger data volumes, stronger funder mandates, and more established cultures of data sharing. Humanities and social sciences have engaged more slowly, though recent years have seen increased attention to RDM in these fields (Eckel and Fina, 2024; Sendra, Late and Kumpulainen, 2025). The appropriateness of practices developed in STEM contexts for other disciplines requires careful evaluation, as data types, research workflows, and disciplinary cultures differ substantially.
### Evidence regarding error reduction
The evidence linking robust RDM practices to error reduction is consistently strong across the literature. Clear documentation of procedures and data provenance reduces errors arising from memory failures or misunderstandings about analytical decisions (Briney, Coates and Goben, 2020). Version control systems prevent errors associated with file confusion or inadvertent overwriting (Kanza and Knight, 2022). Standardised file organisation reduces time spent searching for data and minimises risks of using incorrect files in analyses (Kovács, Hoekstra and Aczél, 2020).
The survey by Kovács, Hoekstra and Aczél (2020) provides particularly valuable evidence regarding the types and frequencies of data management errors. Their findings indicate that mistakes in data management are common even among experienced researchers and that many such errors are preventable through systematic practices. Importantly, they found that researchers often underestimate their vulnerability to such errors, suggesting that training and awareness-raising have important roles alongside tool provision.
Cunha-Oliveira, Ioannidis and Oliveira (2024) emphasised that robust data management practices are essential for experimental reproducibility, noting that even well-designed studies may yield unreliable results if data handling introduces errors. Their comprehensive guidance on best practices in biomedical research provides a model for standards applicable across empirical disciplines.
### Evidence regarding stress reduction
The relationship between RDM practices and researcher stress has received less direct empirical attention than error reduction but is supported by converging evidence. Implementing structured RDM workflows reduces cognitive load by making data easier to locate and understand throughout a project’s lifecycle (Briney, Coates and Goben, 2020). This reduction in cognitive burden frees mental resources for analytical thinking and creative problem-solving whilst reducing anxiety associated with disorganisation.
Early planning through DMPs helps anticipate challenges before they become stressful bottlenecks (Michener, 2015). By thinking through data handling requirements at project outset, researchers can identify and address potential problems proactively rather than reactively. This anticipatory approach aligns with stress management principles emphasising the value of perceived control over circumstances.
Training programmes targeting RDM skills have been shown to lower anxiety related to data loss or mismanagement among doctoral students (Kapoor, 2023; Eckel and Fina, 2024). The development of competence and the availability of support resources both contribute to reduced stress through enhanced self-efficacy and reduced uncertainty.
The broader literature on doctoral student wellbeing provides important context for understanding stress impacts. High rates of mental health difficulties among PhD candidates have been documented across multiple countries and disciplines (Levecque et al., 2017). Whilst data management represents only one contributor to doctoral stress, its effects may be particularly amenable to intervention through training and support provision.
### Evidence regarding time-to-submission
Efficient RDM streamlines analysis and reporting phases by ensuring that datasets are well-organised and documented from the outset (Briney, Coates and Goben, 2020; Cunha-Oliveira, Ioannidis and Oliveira, 2024). Researchers can proceed directly to analytical work without preliminary reorganisation or reconstruction of undocumented processing steps. This efficiency compounds over time as project complexity increases and analytical requirements intensify during write-up phases.
Projects with robust DMPs are associated with smoother transitions between team members, which is particularly important given high turnover in PhD settings (Fina et al., 2025; Reichmann et al., 2021). When supervisors change, collaborators join, or researchers return to data after extended periods, well-documented and organised data can be understood without extensive personal explanation. This minimises downtime during handovers or corrections and maintains project momentum.
The prevention of errors through good RDM also contributes to reduced time-to-submission by avoiding the delays associated with error detection, diagnosis, and correction. Errors discovered late in research projects can require substantial rework, potentially including repeated data collection or analysis. By reducing error rates, robust RDM practices reduce the probability of such costly delays.
## Discussion
### Synthesis of evidence on effective practices
The evidence reviewed in this dissertation supports the proposition that robust research data management practices reduce errors, lower stress, and shorten time-to-submission in PhD projects. This conclusion rests on converging findings across multiple studies, disciplines, and methodological approaches, although support is most direct for error reduction and more inferential for stress. The practices with the strongest evidence base include comprehensive documentation of procedures and data provenance, systematic version control, standardised file naming and organisation conventions, regular backup protocols, early-stage planning through Data Management Plans, and the use of electronic laboratory notebooks or equivalent workflow tools.
The mechanisms through which these practices produce beneficial outcomes appear well-characterised in the literature. Documentation reduces reliance on fallible memory and enables review by others who may detect problems. Version control prevents errors arising from file confusion and enables recovery from mistakes. Standardised organisation reduces search time and minimises risks of using incorrect files. Planning anticipates challenges before they become crises. Training builds competence and confidence. Together, these practices create research environments in which data remain findable, procedures remain reproducible, and cognitive resources remain available for substantive intellectual work rather than being consumed by organisational challenges.
### Addressing the research objectives
Regarding the first objective—identifying foundational practices—the literature consistently emphasises documentation, version control, file organisation, backups, and planning as core components of effective RDM. These practices appear foundational across disciplines, though their specific implementation may vary according to data types and research workflows.
The second objective—evaluating evidence linking practices to error reduction—is addressed by strong and consistent findings. Multiple studies document that ambiguous documentation, lack of version control, and poor file organisation contribute directly to errors in data handling and analysis. Conversely, systematic implementation of these practices is associated with reduced error rates and enhanced reproducibility.
The third objective—examining mechanisms of stress reduction—is supported by evidence that structured RDM workflows reduce cognitive load and that training enhances self-efficacy. The anticipatory benefits of planning also contribute by reducing uncertainty and enhancing perceived control.
The fourth objective—assessing time-to-submission impacts—is addressed by evidence that efficient data organisation accelerates analysis and reporting phases, whilst reduced errors prevent costly delays. Smooth handovers enabled by good documentation also contribute by maintaining project momentum.
The fifth objective—identifying barriers and strategies—reveals that inconsistent adoption, perceptions of administrative burden, and resource constraints represent significant challenges. Strategies for overcoming these include integrating DMPs into daily workflows, providing practical hands-on training, and building institutional cultures that value RDM.
The sixth objective—highlighting gaps and future directions—identifies needs for research in non-STEM disciplines, comparative studies of digital tools, long-term institutional impact assessment, and interventions addressing resource disparities.
### Critical analysis of limitations
Several limitations of the evidence base warrant acknowledgment. First, much of the literature derives from STEM disciplines, potentially limiting generalisability to humanities and social sciences where data practices and challenges may differ substantially. Whilst recent studies have begun addressing this gap, further research is needed to establish the transferability of findings across disciplinary boundaries.
Second, the literature relies heavily on self-report measures regarding practices and outcomes. Whilst valuable, such measures are subject to social desirability bias and may overestimate the quality of actual practices. Studies incorporating objective measures of data quality or research outcomes would strengthen the evidence base.
Third, the literature on stress impacts, whilst conceptually compelling, includes relatively few direct empirical investigations. The connections between RDM practices and researcher wellbeing are largely inferred from related evidence rather than established through targeted studies.
Fourth, the diversity of tools, contexts, and outcome measures across studies complicates direct comparison and synthesis. Future research would benefit from greater standardisation in how RDM practices and their outcomes are conceptualised and measured.
### Implications for doctoral researchers
For individual doctoral researchers, the evidence supports investing time in establishing robust data management practices early in research projects. Whilst such investment may feel burdensome when facing pressure to produce results, the evidence suggests that this upfront investment yields substantial returns through reduced errors, lower stress, and faster completion. Specific recommendations include developing a Data Management Plan at project outset and treating it as a living document; establishing and consistently applying file naming and organisation conventions; implementing version control for code, documents, and where feasible, data files; documenting procedures contemporaneously rather than retrospectively; maintaining regular backup protocols; and seeking out training opportunities in RDM skills.
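Of the recommendations above, regular backup is the one most easily assumed to be working when it is not. The following is a minimal sketch of a backup check, assuming the backup is a plain mirrored copy of the project folder: it reports files that are missing from the backup or whose contents differ. The function names and folder layout are hypothetical, and files are read whole into memory, which suits modest file sizes only.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file's contents (whole file is read into memory)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def verify_backup(source: Path, backup: Path) -> dict:
    """Compare every file under source with its counterpart under backup."""
    report = {"missing": [], "changed": []}
    for original in sorted(p for p in source.rglob("*") if p.is_file()):
        relative = original.relative_to(source)
        copy = backup / relative
        if not copy.is_file():
            report["missing"].append(relative.as_posix())
        elif file_digest(copy) != file_digest(original):
            report["changed"].append(relative.as_posix())
    return report


# Usage with throwaway folders (hypothetical data).
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "project"
    mirror = Path(tmp) / "mirror"
    project.mkdir()
    mirror.mkdir()
    (project / "data.csv").write_text("id,score\n1,10\n", encoding="utf-8")
    (project / "notes.txt").write_text("pilot run", encoding="utf-8")
    (mirror / "data.csv").write_text("id,score\n1,10\n", encoding="utf-8")
    print(verify_backup(project, mirror))
```

A check like this only confirms that the copy matches the current project folder; it says nothing about whether the backup is stored separately from the original, which remains a matter for the storage arrangements set out in the Data Management Plan.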
### Implications for supervisors and institutions
The evidence has important implications for doctoral supervisors, who play critical roles in modelling and reinforcing data management practices within their research groups. Supervisors should consider explicitly discussing RDM expectations with new students, providing or directing students to training resources, establishing group-level standards for file organisation and documentation, reviewing data management as part of regular supervision, and creating cultures that value careful data handling alongside other research skills.
Institutions bear responsibility for providing infrastructure and support that enables effective RDM. This includes maintaining secure storage and backup systems, providing access to appropriate tools and software, employing specialist staff to support RDM training and consultation, developing policies that set expectations whilst accommodating disciplinary variation, and recognising and rewarding good RDM practice.
### Bridging the gap between ideal and practice
A recurring theme in the literature concerns the gap between acknowledged ideals and actual practices. Many researchers recognise the importance of good data management but struggle to implement it consistently. This gap appears to reflect multiple factors, including time pressure, inadequate training, unclear expectations, and a lack of perceived immediate benefit.
Closing this gap likely requires multifaceted approaches addressing both individual capabilities and contextual factors. At the individual level, training that is practical, hands-on, and integrated with real research tasks appears more effective than abstract instruction. At the contextual level, policies and incentive structures that reward good RDM and provide time and resources for its implementation may be necessary to enable sustained practice.
The perception of DMPs as administrative burdens rather than practical tools represents a particular challenge. The evidence suggests that this perception is most prevalent when DMPs are disconnected from daily research workflows—when they function as compliance documents rather than living guidance. Overcoming this perception may require demonstrating the practical value of plans through examples and case studies, supporting the development of plans that are genuinely useful, and integrating plan review and updating into routine research practice.
Conclusions
This dissertation has synthesised evidence regarding which research data management practices most effectively reduce errors, lower stress, and shorten time-to-submission in PhD projects. The findings provide strong support for the value of foundational RDM practices including comprehensive documentation, version control, standardised file organisation, regular backups, early-stage planning through Data Management Plans, and use of electronic laboratory notebooks or equivalent tools.
The first objective—identifying foundational practices—has been achieved through systematic review of the literature, which consistently identifies documentation, version control, organisation, backup, and planning as core components of effective RDM across disciplines. The second objective—evaluating evidence on error reduction—is satisfied by strong findings linking poor documentation, lack of version control, and disorganised files to common errors, whilst systematic practices are associated with reduced error rates. The third objective—examining stress reduction mechanisms—is addressed through evidence that structured workflows reduce cognitive load, training enhances self-efficacy, and anticipatory planning reduces uncertainty. The fourth objective—assessing time-to-submission impacts—is supported by findings that organised data accelerates analysis and reporting, whilst reduced errors and smooth handovers prevent delays. The fifth objective—identifying barriers and strategies—reveals inconsistent adoption, burden perceptions, and resource constraints as key challenges, addressable through workflow integration, practical training, and institutional support. The sixth objective—highlighting research gaps—identifies needs for non-STEM research, tool comparisons, institutional impact studies, and resource disparity interventions.
The significance of these findings extends beyond individual research projects. In an era of reproducibility concerns and increasing scrutiny of research integrity, robust data management represents a concrete and achievable contribution to research quality. For doctoral researchers specifically, effective RDM practices can reduce the burden of a challenging undertaking whilst enhancing the credibility and impact of their work.
Future research should address the identified gaps, particularly through studies in non-STEM disciplines where evidence remains limited, comparative evaluations of specific digital tools and platforms, longitudinal assessments of institutional RDM interventions, and targeted investigations of stress and wellbeing outcomes. Such research would further strengthen the evidence base and inform continued improvement in research data management practices.
The evidence supports a clear conclusion: investment in robust research data management practices yields substantial returns for doctoral researchers, their institutions, and the broader research enterprise. Reducing errors enhances research integrity. Lowering stress supports researcher wellbeing. Shortening time-to-submission improves efficiency. Together, these benefits make the case for prioritising data management as a fundamental research skill deserving of systematic attention, training, and support.
References
Baker, M. (2016) ‘1,500 scientists lift the lid on reproducibility’, *Nature*, 533(7604), pp. 452–454. Available at: https://doi.org/10.1038/533452a
Birkbeck, G., Nagle, T. and Sammon, D. (2022) ‘Challenges in research data management practices: a literature analysis’, *Journal of Decision Systems*, 31(sup1), pp. 153–167. Available at: https://doi.org/10.1080/12460125.2022.2074653
Bjornen, K., Dyke, K. and Iakovakis, C. (2020) *Better Data Management*. Available at: https://doi.org/10.17605/osf.io/uznbf
Borghi, J. and Gulick, A. (2021) ‘Promoting Open Science Through Research Data Management’, *Harvard Data Science Review*, 3(2). Available at: https://doi.org/10.1162/99608f92.9497f68e
Borghi, J. and Van Gulick, A. (2020) ‘Data management and sharing: Practices and perceptions of psychology researchers’, *PLoS ONE*, 16(5), e0252047. Available at: https://doi.org/10.1371/journal.pone.0252047
Briney, K. (2022) ‘Data Management Planning for an Eight-Institution, Multi-Year Research Project’, *International Journal of Digital Curation*, 17(1), p. 9. Available at: https://doi.org/10.2218/ijdc.v17i1.799
Briney, K., Coates, H. and Goben, A. (2020) ‘Foundational Practices of Research Data Management’, *Research Ideas and Outcomes*, 6, e56508. Available at: https://doi.org/10.3897/rio.6.e56508
Cunha-Oliveira, T., Ioannidis, J. and Oliveira, P. (2024) ‘Best Practices for Data Management and Sharing in Experimental Biomedical Research’, *Physiological Reviews*, 104(4), pp. 1407–1456. Available at: https://doi.org/10.1152/physrev.00043.2023
Eckel, H. and Fina, F. (2024) ‘Slow but steady: data management training for students a key to research culture change, especially for humanities’, *Edinburgh Open Research*. Available at: https://doi.org/10.2218/eor.2024.9637
Fina, F., Eckel, H., Spanou, P. and Proven, J. (2025) ‘Supporting the Research Data Management Journey of a Postgraduate Student at the University of St Andrews’, *International Journal of Digital Curation*, 19(1), p. 5. Available at: https://doi.org/10.2218/ijdc.v19i1.980
Gerlach, R., Rex, J., Lang, K., Neute, N. and Schwartze, V. (2020) *Best Practice: Central Research Data Management in large Projects*. Available at: https://doi.org/10.5281/zenodo.3741324
Ioannidis, J.P.A. (2005) ‘Why most published research findings are false’, *PLoS Medicine*, 2(8), e124. Available at: https://doi.org/10.1371/journal.pmed.0020124
Jäckel, D. and Lehmann, A. (2023) ‘Benefits and Challenges: Data Management Plans in Two Collaborative Projects’, *Data Science Journal*, 22, p. 25. Available at: https://doi.org/10.5334/dsj-2023-025
Kanza, S. and Knight, N. (2022) ‘Behind every great research project is great data management’, *BMC Research Notes*, 15, p. 20. Available at: https://doi.org/10.1186/s13104-022-05908-5
Kapoor, I. (2023) ‘Teaching the Best Research Data Management Practices to PhD Students’, *Edinburgh Open Research*. Available at: https://doi.org/10.2218/eor.2023.8109
Karnik, K., Lang, M. and Slade, E. (2025) ‘77 Best practices for data management and metadata creation for collaborative biostatistics teams’, *Journal of Clinical and Translational Science*, 9, p. 24. Available at: https://doi.org/10.1017/cts.2024.755
Kovács, M., Hoekstra, R. and Aczél, B. (2020) ‘The Role of Human Fallibility in Psychological Research: A Survey of Mistakes in Data Management’, *Advances in Methods and Practices in Psychological Science*, 4(1), pp. 1–13. Available at: https://doi.org/10.1177/25152459211045930
Levecque, K., Anseel, F., De Beuckelaer, A., Van der Heyden, L. and Gisle, L. (2017) ‘Work organization and mental health problems in PhD students’, *Research Policy*, 46(4), pp. 868–879. Available at: https://doi.org/10.1016/j.respol.2017.02.008
Marsden, L., Godøy, Ø., Gabrielsen, T., Ellingsen, P., Reigstad, M., Marquardt, M., Morvik, A., Sagen, H., Tronstad, S. and Ferrighi, L. (2025) ‘Best practices for data management in marine science: lessons from the Nansen Legacy project’, *Earth System Science Data*, 17, pp. 5983–6002. Available at: https://doi.org/10.5194/essd-17-5983-2025
Michener, W. (2015) ‘Ten Simple Rules for Creating a Good Data Management Plan’, *PLoS Computational Biology*, 11(10), e1004525. Available at: https://doi.org/10.1371/journal.pcbi.1004525
Miksa, T., Simms, S., Mietchen, D. and Jones, S. (2019) ‘Ten principles for machine-actionable data management plans’, *PLoS Computational Biology*, 15(3), e1006750. Available at: https://doi.org/10.1371/journal.pcbi.1006750
Pálsdóttir, Á. (2021) ‘Data literacy and management of research data – a prerequisite for the sharing of research data’, *Aslib Journal of Information Management*, 73(3), pp. 322–341. Available at: https://doi.org/10.1108/ajim-04-2020-0110
Perrier, L., Blondal, E., Ayala, P., Dearborn, D., Kenny, T., Lightfoot, D., Reka, R., Thuna, M., Trimble, L. and MacDonald, H. (2017) ‘Research data management in academic institutions: A scoping review’, *PLoS ONE*, 12(5), e0178261. Available at: https://doi.org/10.1371/journal.pone.0178261
Praetzellis, M., Buys, M., Chen, X., Chodacki, J., Davies, N., Garza, K., Nancarrow, C., Riley, B. and Robinson, E. (2023) ‘A Programmatic and Scalable Approach to Making Data Management Machine-Actionable’, *Data Science Journal*, 22, p. 26. Available at: https://doi.org/10.5334/dsj-2023-026
Raszewski, R., Goben, A., Bergren, M., Jones, K., Ryan, C., Steffen, A. and Vonderheid, S. (2020) ‘A survey of current practices in data management education in nursing doctoral programs’, *Journal of Professional Nursing*, 37(1), pp. 155–162. Available at: https://doi.org/10.1016/j.profnurs.2020.06.003
Reichmann, S., Klebel, T., Hasani-Mavriqi, I. and Ross-Hellauer, T. (2021) ‘Between administration and research: Understanding data management practices in an institutional context’, *Journal of the Association for Information Science and Technology*, 72(11), pp. 1415–1431. Available at: https://doi.org/10.1002/asi.24492
Samuel, S., Sevryugina, Y., MacEachern, M., Saylor, K. and Woodbrook, R. (2025) ‘Stepping up to the moment: collaborating on a data management and sharing workshop series’, *Journal of the Medical Library Association*, 113(1), pp. 252–258. Available at: https://doi.org/10.5195/jmla.2025.2070
Sendra, A., Late, E. and Kumpulainen, S. (2025) ‘From data lifecycle to research activity model: research data management in data-intensive social sciences and humanities research’, *Aslib Journal of Information Management*. Available at: https://doi.org/10.1108/ajim-12-2024-0959
Snyder, H. (2019) ‘Literature review as a research methodology: An overview and guidelines’, *Journal of Business Research*, 104, pp. 333–339. Available at: https://doi.org/10.1016/j.jbusres.2019.07.039
Specht, A., O’Brien, M., Edmunds, R., Corrêa, P., David, R., Mabile, L., Machicao, M., Murayama, Y. and Stall, S. (2023) ‘The Value of a Data and Digital Object Management Plan (D(DO)MP) in Fostering Sharing Practices in a Multidisciplinary Multinational Project’, *Data Science Journal*, 22, p. 38. Available at: https://doi.org/10.5334/dsj-2023-038
Wilkinson, M.D., Dumontier, M., Aalbersberg, I.J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.-W., da Silva Santos, L.B., Bourne, P.E., Bouwman, J., Brookes, A.J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C.T., Finkers, R., Gonzalez-Beltran, A., Gray, A.J.G., Groth, P., Goble, C., Grethe, J.S., Heringa, J., ’t Hoen, P.A.C., Hooft, R., Kuhn, T., Kok, R., Kok, J., Lusher, S.J., Martone, M.E., Mons, A., Packer, A.L., Persson, B., Rocca-Serra, P., Roos, M., van Schaik, R., Sansone, S.-A., Schultes, E., Sengstag, T., Slater, T., Strawn, G., Swertz, M.A., Thompson, M., van der Lei, J., van Mulligen, E., Velterop, J., Waagmeester, A., Wittenburg, P., Wolstencroft, K., Zhao, J. and Mons, B. (2016) ‘The FAIR Guiding Principles for scientific data management and stewardship’, *Scientific Data*, 3, 160018. Available at: https://doi.org/10.1038/sdata.2016.18
