• DataJoint Enables Seamless Migration of CWL Pipelines to Its Governed, Reproducible Scientific Data Infrastructure

    DataJoint today announced native support for converting Common Workflow Language (CWL) pipelines into DataJoint pipelines, enabling research organizations to immediately modernize existing scientific workflows — without sacrificing prior investment or starting from scratch.


    CWL: Widely Adopted, But Increasingly Constrained

    Common Workflow Language has become a de facto standard across pharmaceutical R&D, genomics, and academic research for defining portable, reproducible computational workflows. Major cloud and bioinformatics platforms support CWL natively, and it is broadly adopted across federally funded genomics programs and industry R&D consortia — making it one of the most widely deployed workflow standards in life sciences.

    Yet CWL has recognized limitations in production environments: limited error handling and debugging, no native provenance tracking, poor support for partial re-runs when a step fails mid-pipeline, and no mechanism to query workflow state. As AI-driven research demands tighter auditability and reproducibility, these gaps create real scientific and operational risk.

    What DataJoint Provides

    DataJoint’s CWL conversion layer reads existing CWL workflow definitions and executes them as native DataJoint pipelines. Research teams can extend these pipelines — mixing CWL definitions with DataJoint’s Python-based schema framework — and run them in interpreted mode today, with compiled execution on the roadmap. Key capabilities include:

    Automatic provenance. Every CWL step is backed by DataJoint’s schema-driven provenance layer, creating a complete, queryable record of inputs, outputs, and computational history.

    Granular retry and resilience. Failed steps can be individually retried or corrected without re-running the entire pipeline — a critical capability for long-running, high-cost workflows.

    Queryable state. Workflow state is accessible via DataJoint’s standard query syntax, enabling real-time monitoring and downstream analysis.

    Natural parallelization. Pipelines are decomposed into discrete, independently executable steps that support cluster-level parallelism and graceful pause/resume without lost progress.

    Structured entity database. Critically, DataJoint does not simply execute CWL workflows — it builds a structured database around the scientific entities those workflows produce. The conversion process involves explicitly defining the entities created at each stage (such as processed samples, imaging results, or analysis outputs) and the dependencies between them. This transforms a pipeline from a sequence of compute steps into a living, queryable scientific record — one that captures not just what ran, but what was produced, how it relates to other data, and how it can be reused.
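
    The retry, queryable-state, and dependency ideas above can be illustrated with a minimal Python sketch. It is not DataJoint's API or its CWL conversion layer: the step names, the step_result table, and the helper functions are all hypothetical, and SQLite from the standard library stands in for the pipeline database. Each step records one row per sample, a step runs only once its upstream steps have succeeded, and a second pass re-runs only what failed or was blocked.

```python
import sqlite3
from graphlib import TopologicalSorter

# Hypothetical three-step workflow: step name -> upstream steps it depends on.
WORKFLOW = {"align": [], "call_variants": ["align"], "annotate": ["call_variants"]}


def open_store():
    """Create an in-memory table holding one row per (sample, step) result."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE step_result ("
        " sample TEXT, step TEXT, status TEXT, output TEXT,"
        " PRIMARY KEY (sample, step))"
    )
    return conn


def is_done(conn, sample, step):
    """Return True if this step has already succeeded for this sample."""
    row = conn.execute(
        "SELECT 1 FROM step_result WHERE sample = ? AND step = ? AND status = 'done'",
        (sample, step),
    ).fetchone()
    return row is not None


def run(conn, sample, actions):
    """Run steps in dependency order; skip finished steps and blocked steps."""
    for step in TopologicalSorter(WORKFLOW).static_order():
        if is_done(conn, sample, step):
            continue  # completed earlier, nothing to redo
        if not all(is_done(conn, sample, dep) for dep in WORKFLOW[step]):
            continue  # an upstream step has not succeeded yet
        try:
            status, output = "done", actions[step](sample)
        except Exception as exc:
            status, output = "error", str(exc)
        conn.execute(
            "INSERT OR REPLACE INTO step_result VALUES (?, ?, ?, ?)",
            (sample, step, status, output),
        )
    conn.commit()


def state(conn, sample):
    """Query the current status of every recorded step for a sample."""
    rows = conn.execute(
        "SELECT step, status FROM step_result WHERE sample = ?", (sample,)
    )
    return dict(rows.fetchall())


calls = []  # records which steps actually executed


def make_action(name, suffix=None, error=None):
    """Build a stand-in step that either fails or returns a fake output name."""
    def action(sample):
        calls.append(name)
        if error:
            raise RuntimeError(error)
        return f"{sample}.{suffix}"
    return action


conn = open_store()
actions = {
    "align": make_action("align", "bam"),
    "call_variants": make_action("call_variants", error="out of memory"),
    "annotate": make_action("annotate", "tsv"),
}
run(conn, "s1", actions)
first = state(conn, "s1")  # align succeeded, call_variants failed, annotate never ran

actions["call_variants"] = make_action("call_variants", "vcf")  # corrected step
run(conn, "s1", actions)
second = state(conn, "s1")  # only the failed and blocked steps ran again
```

    In this sketch the second call to run never repeats the alignment step, because its result row already exists. That is the property the capabilities above describe: progress lives in the database rather than in the memory of a running process.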

    “Scientific AI will only be as trustworthy as the data foundation beneath it. CWL gave the research community a powerful way to define workflows — DataJoint gives those workflows the provenance, traceability, and governance they need to support defensible science and AI-ready research at scale.” — Jim Olson, CEO, DataJoint

  • Dave Schuette Joins DataJoint Board of Directors

    DataJoint, the scientific data infrastructure company enabling defensible and reproducible AI in regulated R&D, today announced that Dave Schuette has joined its Board of Directors as an independent member. The appointment strengthens DataJoint’s leadership as the company expands into pharmaceutical and life sciences markets following the recent launch of its Agentic AI platform.

    Schuette is the founder and managing partner of Slide3, a boutique consulting firm serving pharmaceutical, financial services, and technology clients. With more than 25 years of experience as a business management executive, he brings a track record of transforming organizations and creating disruptive operational strategies at scale.

    Prior to founding Slide3 in 2018, Schuette served as EVP and President of the Enterprise Business Unit at Synchronoss Technologies, leading growth across healthcare and life sciences. He was a founding partner of Knowledgent, a data and analytics firm acquired by Accenture, and held senior roles at BusinessEdge Solutions, acquired by EMC. A Top 25 Consultant of the Year honoree from Consulting Magazine, he also brings direct pharma industry experience, including work with Bristol-Myers Squibb.

    “Dave’s expertise in pharma and technology, and his ability to help companies scale with clarity and purpose, are exactly what DataJoint needs at this inflection point,” said Jim Olson, CEO of DataJoint. “His experience bridging scientific rigor with operational agility makes him an invaluable addition to our board.”

    Schuette joins weeks after DataJoint launched DataJoint Agentic AI, a governed execution layer that enables semi-autonomous AI operation across scientific workflows — allowing pharma and biotech organizations to automate complex pipelines while maintaining full reproducibility and auditability.

    “DataJoint has built something genuinely differentiated — a platform that makes AI-ready data a reality, not just an aspiration,” said Dave Schuette. “I’m proud to join the board and help accelerate its mission at a time when trustworthy scientific AI has never mattered more.”

  • DataJoint Launches Agentic AI Control Layer for Scientific Workflows

    DataJoint today announced the launch of DataJoint Agentic AI, a governed execution layer for scientific workflows that enables semi-autonomous AI operation on rigorously structured, provenance-rich data.

    As pharmaceutical and academic institutions accelerate investment in generative and agentic AI to further innovation, many are confronting a critical constraint: AI systems trained on fragmented, under-described scientific data cannot reliably reproduce, audit, or defend their outputs. In regulated research environments, this lack of context creates material scientific and operational risk.


    DataJoint addresses this challenge at its source

    The platform captures multi-modal scientific data in precisely defined, interconnected frameworks — embedding rich metadata and full computational provenance at the point of every experimental result. By grounding AI agents in this context-rich foundation, DataJoint enables automated workflow execution while preserving reproducibility, traceability, and decision accountability.

    “Scientific AI will only be as trustworthy as the data foundation beneath it,” said Jim Olson, CEO of DataJoint. “We built DataJoint to ensure that every AI-driven insight is grounded in structured provenance and computational context — so that scientific decisions are not just faster, but defensible and reliable.”

    DataJoint’s agentic AI enables semi-autonomous execution of complex, multi-step scientific pipelines across imaging, electrophysiology, genomics, behavioral data, and more, all within a governed, reproducible framework built for regulated and research environments. For pharma and biotech, this means faster hypothesis validation and AI-ready datasets that support regulatory confidence. For academic and medical centers, it means scaling sophisticated research without sacrificing rigor. In both settings, the goal is to accelerate discovery and innovation.

    For example, an AI agent operating within DataJoint can validate experimental inputs, trigger downstream processing, detect inconsistencies in data and structure, and ensure computational reproducibility, all while maintaining a complete, queryable record of decisions and transformations.
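
    As a rough illustration of that pattern, the Python sketch below validates an input record, decides whether to trigger processing or hold it, and appends every decision to an audit log that can be queried afterwards. It is a sketch under stated assumptions, not DataJoint Agentic AI itself: the field names, the review function, and the in-memory log are hypothetical.

```python
# Hypothetical metadata fields an incoming recording is expected to carry.
REQUIRED_FIELDS = ("subject_id", "sampling_rate_hz", "n_channels")


def review(record, audit_log):
    """Validate one input record, choose an action, and log the decision."""
    missing = [name for name in REQUIRED_FIELDS if name not in record]
    if missing:
        decision, reason = "hold", "missing fields: " + ", ".join(missing)
    elif record["sampling_rate_hz"] <= 0:
        decision, reason = "hold", "sampling rate must be positive"
    else:
        decision, reason = "process", "inputs valid"
    # Every decision is recorded, whether or not processing is triggered.
    audit_log.append(
        {
            "subject": record.get("subject_id", "unknown"),
            "decision": decision,
            "reason": reason,
        }
    )
    return decision


audit_log = []
review({"subject_id": "m01", "sampling_rate_hz": 30000, "n_channels": 384}, audit_log)
review({"subject_id": "m02", "sampling_rate_hz": 0, "n_channels": 384}, audit_log)
review({"subject_id": "m03", "n_channels": 384}, audit_log)

# Query the decision record: which inputs were held back?
held = [entry["subject"] for entry in audit_log if entry["decision"] == "hold"]
```

    A real deployment would presumably keep this log in the same database as the data it describes, so decisions and results could be queried together.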

    DataJoint’s structured scientific data infrastructure is already deployed in leading academic medical centers and industry research environments, supporting reproducible multi-modal pipelines at scale.


    Industry Showcases

    DataJoint will demonstrate its Agentic AI capabilities at:

    PMWC 2026 (Precision Medicine World Conference)
    March 4–6, 2026 | San Jose, CA

    Lab of the Future USA Congress
    March 2–3, 2026 | Boston, MA

    These events convene leaders in precision medicine, biopharma R&D, and digital laboratory transformation.

  • Welcoming John Apathy to the DataJoint Team as Strategic Advisor

    We’re thrilled to share that John Apathy has joined DataJoint as a Strategic Advisor, bringing deep expertise in data-driven innovation and AI strategy in life sciences R&D.


    John has spent over three decades leading digital transformation across organizations like Bristol Myers Squibb, Celgene, and GlaxoSmithKline—helping research and development teams turn complex data into scientific breakthroughs. Today, as Chief Solutions Officer at XponentL Data, a Genpact company, he continues to guide organizations in making data and AI a true competitive advantage.


    At DataJoint, we’re on a mission to make scientific research more reproducible, integrated, and AI-ready through our SciOps platform. John’s experience will be key in helping us accelerate that mission—empowering scientists to connect instruments, data, and computation into automated workflows that drive discovery.


    As our CEO Jim Olson put it:

    “John’s deep experience in digital and data transformation makes him an outstanding addition to our advisory team.”

    We couldn’t agree more. Welcome, John—excited to build the future of data-driven science together!

  • Brian Napack Joins DataJoint as Strategic Advisor to Advance Scientific Research

    DataJoint is thrilled to welcome Brian Napack as a strategic advisor and investor. A visionary leader with decades of experience in education, research, publishing, and technology, Brian brings a proven track record of scaling innovation to create lasting impact. As the former CEO of John Wiley & Sons and current Executive Chairman of 2U, Brian has consistently championed initiatives that enhance the productivity and ROI of science and education.

    Brian will support DataJoint’s mission to revolutionize data management and AI in scientific research, helping labs worldwide overcome challenges in fragmented data, collaboration, and reproducibility. With over 100 labs already leveraging DataJoint’s platform and a recent $4.9M Seed funding round, the company is poised to transform research workflows in academia, life sciences, and beyond.

    We’re excited to partner with Brian as we continue to drive scientific discovery forward!

  • DataJoint Raises $4.9M to Transform Life Sciences Data Management with AI

    DataJoint has closed a $4.9M Seed funding round, co-led by Nina Capital, Inoca Capital Partners, and Capital Factory. The funding will fuel the growth of its team, expand its AI-powered SaaS platform, and extend its reach into life sciences and pharma industries across the U.S. and Europe.

    Already used by over 100 labs, including Johns Hopkins and Harvard, DataJoint’s platform harmonizes multimodal data and streamlines workflows. The company’s participation in the PharmStars accelerator further highlights its role as a key innovator in digital health and pharma collaboration.


    “This investment enables us to scale and bring transformative solutions to researchers and organizations,” said CEO Jim Olson.

    With its cutting-edge AI integration, DataJoint is set to revolutionize data management in life sciences.

  • DataJoint to NIH: Trustworthy AI Requires Trustworthy Data Operations

    On July 15, 2025, DataJoint submitted formal comments to NIH on its Artificial Intelligence Strategy and One Year Action Plan.

    Our central message: Scientific AI can only succeed if it is built on rigor, transparency, and operational excellence. In our response, we advocate for:

    • A multi-tiered AI framework that distinguishes between foundational, creative, and strategic uses of AI in research;
    • A SciOps Capability Maturity Model to guide labs from ad hoc workflows to AI-ready data operations; and
    • Immediate, concrete steps NIH can take to advance Gold Standard Science, including pilot programs and shared benchmarks.

    We believe NIH has a critical role to play in ensuring that AI in biomedical research delivers trustworthy, reproducible, and cost-effective results.

    📄 Read our full response (PDF): Download the submission

    We welcome feedback, discussion, and collaboration as NIH shapes this important national strategy.