Research Data Systems · Experience / Internal

Research Data Systems at IITA: From Field Data Collection to FAIR Data Workflows

An experience-backed case study on research data workflows — from field data capture and validation to metadata, repositories, PostgreSQL, CKAN, FAIR principles, and AI-ready data foundations.

Graduate intern working alongside research data systems and workflow digitization · 2024 to present

  • KoboToolbox
  • ODK
  • PostgreSQL
  • CKAN
  • Python
  • FAIR workflows
Role: Graduate intern working alongside research data systems and workflow digitization
Status: Experience / Internal
Timeline: 2024 to present
Type: Research Data Systems
Access: Private · sanitized case study

Context & why it matters

Research data is only useful when it can be trusted, understood, accessed, and reused. Agricultural research involves field data, lab records, survey forms, metadata, and reporting workflows. The challenge is not only collecting that data but preserving its quality, structure, context, and traceability across the whole lifecycle. My exposure here has been as a graduate intern working alongside these systems rather than owning institutional platforms: contributing to data-collection and digitization efforts, exploring repository and metadata tooling, and developing an understanding of how research data should flow.

Agricultural research decisions ripple outward, so the data behind them has to hold up to scrutiny.

Problem

Research forms are often long and complex, and both field and lab data carry real quality risks. Collection, storage, repository, and reporting workflows easily become disconnected from one another. Metadata and FAIR-aligned management are needed to keep data understandable, and research data increasingly has to be reusable for analytics, reporting, and future AI workflows.

Work areas

Digital data collection

  • KoboToolbox and ODK-style workflows
  • Long, multi-section research forms
  • Validation and constraints at entry
  • Skip logic for conditional questions
  • Structured capture instead of free text
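
The entry-time checks listed above can be sketched in plain Python. This is an illustration only, not KoboToolbox or ODK code: the field names (`plot_id`, `crop`, `uses_fertilizer`, `fertilizer_kg`), the choice list, the numeric range, and the `validate_submission` helper are all hypothetical. In an XLSForm, rules like these would normally be expressed in the `constraint` and `relevant` columns rather than in Python.

```python
from typing import Any

ALLOWED_CROPS = {"cassava", "maize", "yam"}  # hypothetical choice list


def validate_submission(record: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    errors: list[str] = []

    # Required field: every record needs a non-blank plot identifier.
    if not str(record.get("plot_id", "")).strip():
        errors.append("plot_id is required")

    # Structured capture: crop must come from a fixed choice list, not free text.
    if record.get("crop") not in ALLOWED_CROPS:
        errors.append("crop must be one of: " + ", ".join(sorted(ALLOWED_CROPS)))

    # Skip logic: fertilizer_kg is only relevant when uses_fertilizer is "yes".
    if record.get("uses_fertilizer") == "yes":
        amount = record.get("fertilizer_kg")
        # Constraint: a real number greater than 0 and at most 500 (assumed range).
        is_number = isinstance(amount, (int, float)) and not isinstance(amount, bool)
        if not is_number or not 0 < amount <= 500:
            errors.append("fertilizer_kg must be greater than 0 and at most 500")
    elif record.get("fertilizer_kg") is not None:
        errors.append("fertilizer_kg should be empty when uses_fertilizer is not 'yes'")

    return errors
```

Running the same rules again after export, as a second check on data that has already left the form, is one way to catch records that were edited or imported outside the form tool.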

PostgreSQL & data storage

  • Relational thinking for research entities
  • Structured, queryable storage
  • Data integrity through constraints
  • Reporting-ready tables
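
A small sketch of integrity through constraints, using dummy tables. To keep it runnable without a database server, it uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL. The same kinds of constraint (`NOT NULL`, `CHECK`, foreign keys) exist in PostgreSQL, though column types and some syntax would differ. The table names, column names, and height range are hypothetical.

```python
import sqlite3

# Hypothetical two-table schema: observations must belong to a known plot.
SCHEMA = """
CREATE TABLE plot (
    plot_id   TEXT PRIMARY KEY,
    site_name TEXT NOT NULL
);
CREATE TABLE observation (
    observation_id  INTEGER PRIMARY KEY,
    plot_id         TEXT NOT NULL REFERENCES plot(plot_id),
    observed_on     TEXT NOT NULL,
    plant_height_cm REAL CHECK (plant_height_cm > 0 AND plant_height_cm < 1000)
);
"""


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with the demo schema."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def try_insert(conn: sqlite3.Connection, plot_id: str, observed_on: str, height_cm: float) -> bool:
    """Insert one observation; return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO observation (plot_id, observed_on, plant_height_cm) "
                "VALUES (?, ?, ?)",
                (plot_id, observed_on, height_cm),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this schema, an observation for an unknown plot or with a negative height is rejected by the database itself, so the rule holds no matter which script or tool writes the row.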

CKAN & research repositories

  • Exploring CKAN for dataset repositories
  • Repository modernization thinking
  • Metadata and dataset discoverability
  • FAIR-aligned workflows
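
To make the link between metadata and discoverability concrete, here is a hedged sketch of preparing a dataset record for CKAN's Action API (`package_create`). It only builds the request and does not send it. The base URL, token, organization, license identifier, and the `slugify` and `build_package_create_request` helpers are assumptions for illustration, and a real CKAN instance may require additional or different fields depending on its configured schema.

```python
import json
import re
import urllib.request


def slugify(title: str) -> str:
    """Lowercase a title and reduce it to letters, digits, and hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def build_package_create_request(
    base_url: str,
    api_token: str,
    title: str,
    description: str,
    keywords: list[str],
    license_id: str,
    owner_org: str,
) -> urllib.request.Request:
    """Build (but do not send) a package_create request for a CKAN site."""
    payload = {
        "name": slugify(title),      # URL-safe dataset name
        "title": title,              # human-readable title
        "notes": description,        # free-text description
        "license_id": license_id,    # must match a license known to the site
        "owner_org": owner_org,      # hypothetical organization name or id
        "tags": [{"name": keyword} for keyword in keywords],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/3/action/package_create",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": api_token},
        method="POST",
    )
```

Keeping the payload construction separate from the network call makes it easy to review the metadata, or test it with dummy values, before anything is published.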

Laboratory workflow digitization

  • Mapping how lab data moves
  • Sample and data traceability
  • Workflow mapping
  • Archiving and reporting considerations
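
Sample traceability can be pictured as an append-only log of who handled which sample at which step. The sketch below is a conceptual illustration with hypothetical names (`CustodyEvent`, `SampleLog`, and step labels such as `received` and `archived`), not a description of any existing lab system.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class CustodyEvent:
    """One immutable record of a sample passing through a workflow step."""
    sample_id: str
    step: str
    actor: str
    at: datetime


@dataclass
class SampleLog:
    """Append-only list of custody events for all samples."""
    events: list[CustodyEvent] = field(default_factory=list)

    def record(self, sample_id: str, step: str, actor: str,
               at: Optional[datetime] = None) -> CustodyEvent:
        # Default to the current UTC time when no timestamp is supplied.
        event = CustodyEvent(sample_id, step, actor, at or datetime.now(timezone.utc))
        self.events.append(event)
        return event

    def history(self, sample_id: str) -> list[str]:
        """Steps a sample has passed through, in time order."""
        matching = [e for e in self.events if e.sample_id == sample_id]
        return [e.step for e in sorted(matching, key=lambda e: e.at)]

    def missing_steps(self, sample_id: str, expected: list[str]) -> list[str]:
        """Expected steps that have no recorded event for this sample."""
        seen = set(self.history(sample_id))
        return [step for step in expected if step not in seen]
```

A check like `missing_steps` is the kind of question workflow mapping makes answerable: for any sample, which stages are documented and which are not.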

AI-ready data foundations

  • Well-described datasets
  • Consistent schemas
  • Data validation
  • Reusable pipelines
  • Analytics and ML readiness
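
One way to read "consistent schemas" is that every record can be checked against a shared data dictionary before it reaches analysis or modelling. The sketch below assumes a hypothetical dictionary (`DATA_DICTIONARY`) mapping column names to expected Python types and a hypothetical `schema_report` helper. It counts problems rather than fixing them.

```python
from typing import Any

# Hypothetical data dictionary: column name -> expected Python type.
DATA_DICTIONARY: dict[str, type] = {
    "plot_id": str,
    "observed_on": str,
    "plant_height_cm": float,
}


def schema_report(rows: list[dict[str, Any]], schema: dict[str, type]) -> dict[str, int]:
    """Count schema problems across rows without modifying the data."""
    report = {
        "rows": len(rows),
        "missing_columns": 0,
        "unexpected_columns": 0,
        "type_mismatches": 0,
    }
    for row in rows:
        # Columns the dictionary requires but the row lacks.
        report["missing_columns"] += len(schema.keys() - row.keys())
        # Columns present in the row but not described in the dictionary.
        report["unexpected_columns"] += len(row.keys() - schema.keys())
        # Values whose type differs from what the dictionary declares.
        for column, expected_type in schema.items():
            if column in row and not isinstance(row[column], expected_type):
                report["type_mismatches"] += 1
    return report
```

A report with all counts at zero does not prove the data is correct, but a non-zero count is a cheap early signal that a dataset is not yet ready for reuse.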

Architecture

Research data moves through a lifecycle rather than living in one place: field/lab capture → validation → structured storage → metadata & documentation → repository & discovery → reporting & analytics → AI-ready reuse. Designing around that flow is what keeps data trustworthy and reusable at every step.

  • Capture — structured field and lab collection (KoboToolbox / ODK-style forms)
  • Validation — constraints and checks applied at entry
  • Structured storage — relational, queryable storage (PostgreSQL)
  • Metadata & documentation — describing datasets so they stay understandable
  • Repository & discovery — making datasets findable (CKAN-style repositories)
  • Reporting & analytics — turning validated data into reporting
  • AI-ready reuse — consistent, well-described data ready for modelling

Technical decisions

Data quality starts at capture
The cheapest place to prevent bad data is the form itself — constraints, skip logic, and validation at entry.
Metadata is part of the system, not an afterthought
Data you can't describe is data you can't trust or reuse, so metadata belongs in the design from the start.
PostgreSQL is a strong foundation for structured research data
Relational integrity and queryability make it a dependable base for data that has to hold up over time.
FAIR workflows need both technical and human process design
Findable, accessible, interoperable, reusable data depends as much on agreed process as on tooling.
Research software must respect institutional workflows
Tools that ignore how researchers actually work get worked around; fitting the workflow is the point.
AI-ready data needs traceability, structure, and context first
Before modelling, data has to be structured, validated, and well-described — otherwise the models inherit the mess.
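
The point that FAIR depends on both tooling and agreed process can be illustrated with a simple metadata checklist. The code handles only the mechanical part, and deciding what counts as an acceptable license or provenance note remains a human agreement. This is a simplified sketch, not an official FAIR assessment: the metadata field names, the list of open formats, and the `fair_checklist` and `unmet` helpers are assumptions.

```python
from typing import Any, Callable

Metadata = dict[str, Any]

# Hypothetical, simplified checks loosely grouped by FAIR principle.
FAIR_CHECKS: dict[str, Callable[[Metadata], bool]] = {
    "Findable: has a persistent identifier":
        lambda m: bool(m.get("identifier")),
    "Findable: has title and keywords":
        lambda m: bool(m.get("title")) and bool(m.get("keywords")),
    "Accessible: states how to access the data":
        lambda m: bool(m.get("access_url") or m.get("access_procedure")),
    "Interoperable: uses an open file format":
        lambda m: str(m.get("format") or "").lower() in {"csv", "json", "geojson"},
    "Reusable: has a license":
        lambda m: bool(m.get("license")),
    "Reusable: documents provenance":
        lambda m: bool(m.get("provenance")),
}


def fair_checklist(metadata: Metadata) -> dict[str, bool]:
    """Evaluate every check against one metadata record."""
    return {label: check(metadata) for label, check in FAIR_CHECKS.items()}


def unmet(metadata: Metadata) -> list[str]:
    """Labels of the checks this record does not yet satisfy."""
    return [label for label, ok in fair_checklist(metadata).items() if not ok]
```

A list of unmet items gives a dataset owner something specific to act on, which is often more useful than a single pass or fail score.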

What it demonstrates

Shows I can operate inside real research workflows where data quality is non-negotiable.

  • Understanding research data lifecycles end to end
  • Reasoning about field-to-repository workflows
  • Working with tools like KoboToolbox, PostgreSQL, and CKAN
  • Connecting research data quality to analytics and AI readiness
  • Translating institutional workflows into technical structure

Proof assets

Some proof assets use dummy data or are shared as private walkthroughs to protect sensitive systems and records.

  • Research data workflow diagram (Diagram · Planned, to be added)

    The field-to-reuse data lifecycle.

  • Dummy Kobo/XLSForm sample (Documentation · Planned, to be added)

    An example form with validation and skip logic.

  • CKAN architecture notes (Documentation · Shared as a sanitized case study)

    Sanitized notes on repository structure and metadata.

  • Lab digitization concept diagram (Diagram · Planned, to be added)

    How lab data moves and stays traceable.

  • PostgreSQL schema example with dummy data (Documentation · Planned, to be added)

    A sample schema populated with non-real data.

Next steps

  • Add sanitized workflow diagrams
  • Add a dummy Kobo/XLSForm example
  • Add a small PostgreSQL-backed research data demo
  • Add a FAIR metadata checklist
  • Add a lab data lifecycle diagram
