• DataJoint at CoSyNe 2026: Building AI-Ready Data Workflows for Neuroscience

    This March, DataJoint’s Chief Science Officer Dimitri Yatsenko, PhD, and SciOps Engineer Milagros Marín, PhD, presented a tutorial at CoSyNe 2026 (Computational and Systems Neuroscience) titled “Building AI-Ready Data Workflows for Neuroscience Experiments.”

    The tutorial materials are now publicly available; you can find them at the end of this post.


    The session brought together computational and systems neuroscientists for a hands-on look at what it takes to make scientific data infrastructure ready for AI — not someday, but now.

    Here are the key ideas we shared:

    1. Operational rigor is the foundation for AI in science. Dimitri opened with a provocation: How must research teams transform their work to harness AI? The answer isn’t better models — it’s better data discipline. Without structured schemas, enforced provenance, and reproducible computations, AI agents have nothing reliable to work with. We built on the SciOps Capability Maturity Model — a five-level roadmap from ad hoc scripts to closed-loop AI-assisted discovery — giving labs a concrete path to assess and grow their operational readiness.

    Dr. Dimitri Yatsenko presenting the DataJoint RNA-Seq pipeline in collaboration with the Cadwell Lab at UCSF

    2. The schema is not a record of the science; it is the science. We showed how DataJoint’s relational workflow model unifies database, code, and computation into a single formal schema. Tables represent workflow steps, rows represent artifacts, and foreign keys prescribe execution order. The pipeline diagram is the database, not documentation that drifts from reality.
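    To make the idea that foreign keys prescribe execution order more concrete, here is a minimal, library-free Python sketch. It does not use DataJoint’s API: the table names and the FOREIGN_KEYS mapping are hypothetical, and it assumes the dependency graph is acyclic. Each table lists the upstream tables it references, and a topological sort from the standard library’s graphlib yields an order in which every table comes after all of its parents.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each table maps to the upstream tables its foreign keys
# reference. Names are illustrative only, not taken from any real DataJoint schema.
FOREIGN_KEYS = {
    "Subject": [],
    "Session": ["Subject"],
    "Recording": ["Session"],
    "SpikeSorting": ["Recording"],
    "PoseEstimation": ["Session"],
    "BehaviorSyllables": ["PoseEstimation"],
}


def execution_order(foreign_keys):
    """Return the tables ordered so that each one follows every table it references.

    Raises graphlib.CycleError if the dependencies are cyclic, because a cyclic
    workflow has no valid execution order.
    """
    return list(TopologicalSorter(foreign_keys).static_order())


if __name__ == "__main__":
    for step, table in enumerate(execution_order(FOREIGN_KEYS), start=1):
        print(f"{step}. {table}")
```

    The point of the sketch is that no separate script encodes the order: it falls out of the declared dependencies, so adding a new table with its foreign keys automatically places it in the right position.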

    DataJoint platform architecture. The open-source Python library provides the relational workflow model: schema definition, query algebra, and distributed computation. This core integrates with a relational database (system of record), object storage (for scalable data), and code repositories (for version-controlled pipeline definitions). The managed platform adds infrastructure, observability, and orchestration for production deployments. Adapted by Milagros Marín from Yatsenko & Nguyen, arXiv:2602.16585 (2026).

    3. Three production pipelines, three scales, one platform. Milagros walked through three real-world projects running on DataJoint: the ORION pipeline (brain organoid generation integrating four electrophysiology modalities and tracking complete provenance from iPSC to spike waveform, in collaboration with the Shcheglovitov Lab at the University of Utah); Project AEON (24/7 continuous behavior at the Sainsbury Wellcome Centre, processing 7 million data points per day alongside weeks of Neuropixels recordings); and the DataJoint MoSeq pipeline (unsupervised behavioral syllable discovery, in collaboration with the Datta Lab at Harvard Medical School).

    Dr. Milagros Marín demonstrates how DataJoint orchestrates the AEON foraging platform at UCL’s Sainsbury Wellcome Centre — unifying Bonsai-acquired data streams, SLEAP pose estimation, and continuous electrophysiology into a structured, automated pipeline for weeks-long freely moving behavior experiments.

    4. Reproducibility validated, not just claimed. Each project included rigorous validation — positive and negative controls for ORION, dynamic schema generation tested at scale for AEON, and benchmark-matched syllable durations for MoSeq. The pipeline reproduces the science, not just the workflow.

    5. AI agents can query and reason over structured pipelines. We demonstrated an AI assistant that connects directly to a DataJoint pipeline, queries behavioral data, interprets distributions, and generates scientific summaries, all made possible by a self-documenting, queryable schema.

    As Milagros put it: “Scientists direct, AI agents execute, and the data infrastructure doesn’t just store science — it understands it.”

    6. Open-source, community-ready, publication-grade. All three project codebases are open-source. The ORION pipeline has a paper in preparation (Marín et al., 2026) and a poster at FENS Forum 2026, and AEON’s preprint (Campagner et al., 2025) is already out. We’re building toward an ecosystem where any lab can adopt these workflows and plug in its own protocols.

    DataJoint Tutorial at CoSyNe 2026

    It was energizing to connect with the CoSyNe community — researchers who think deeply about computation and are ready to bring that rigor to their data infrastructure. The conversation reinforced something we believe strongly: the lab of the future doesn’t just manage files — it manages knowledge.


    Missed the tutorial?

    The full tutorial is available at datajoint.com/cosyne-2026, including live demos of all three pipelines and DataJoint’s framework for AI-ready research operations.

    Want to explore what AI-ready workflows look like for your lab? Visit https://docs.datajoint.com or email milagros@datajoint.com to start building AI-ready infrastructure for your lab or institution.

    Trusted Data. Trusted AI. Trusted Science.

  • DataJoint at SfN 2025

    The DataJoint team is excited to connect with the neuroscience community at the Society for Neuroscience (SfN) 2025 meeting in San Diego this November 15–19! We’re bringing insights from groundbreaking large-scale projects to practical tools that are shaping how labs manage and share their data today.

    From Circuit Mapping to Career Pathways: Our SfN Journey

    Our presence at SfN this year tells a story that begins with one of the most ambitious neuroscience projects of the decade and extends to the everyday challenges facing labs worldwide.


    Saturday, Nov 15: The MICrONS Legacy

    The Machine Intelligence from Cortical Networks (MICrONS) project represents a watershed moment in systems neuroscience by creating the largest functional connectome of mammalian cortex to date. This massive collaborative effort, recently published in Nature, has generated unprecedented multimodal datasets combining electron microscopy, calcium imaging, and electrophysiology across millions of synapses.

    DataJoint’s Chief Science Officer, Dimitri Yatsenko, will chair a nanosymposium diving deep into these insights:

    Nanosymposium NANO003: Insights From the MICrONS Project
    Saturday, November 15, 1:00–2:45 PM
    📍 San Diego Convention Center, Room 30


    Sunday, Nov 16: Building Your Career in Neuroinformatics

    The increasing complexity of multimodal datasets in neuroscience presents not only technical challenges but also new career opportunities. This Sunday, Dimitri, alongside Mathew Abrams, Uma Karmarkar, and Stephanie Albin, will lead a professional development workshop. The session will focus on the expanding field of neuroinformatics and explore career paths beyond traditional academia. Panelists, who have successfully navigated diverse careers leveraging neuroscience skills and big data analysis, will share their journeys and offer advice to attendees on how to pursue similar unconventional roles.

    Life After the PhD: Career Opportunities in Brain Data Science
    Sunday, November 16, 3:00–5:00 PM
    📍 San Diego Convention Center, Room 2

    Whether you’re a graduate student considering your options or a PI thinking about how to support your team’s development, this session will illuminate the expanding opportunities at the intersection of neuroscience, data science, and software engineering.


    Monday, Nov 17: Practical Solutions for Multimodal Data

    The lessons from large-scale collaborations like MICrONS are directly informing how we approach everyday lab data challenges. Visit our poster to see how principled data management frameworks are making multiphoton imaging data more accessible and reusable:

    Poster 12424: A Principled Framework for Compression and Standardization of Multiphoton Data
    Session PSTR198: Techniques and Software for Imaging and Neural Analyses
    Monday, November 17, 8:00 AM–12:00 PM

    This work reflects DataJoint’s current focus: helping labs capture, standardize, and share the increasingly complex multimodal datasets that modern neuroscience demands.


    Let’s Connect: Visit Booth 3326

    Throughout the conference, our team will be at Booth 3326, ready to discuss:

    • How collaborative data frameworks like those used in MICrONS can scale down to individual labs
    • Strategies for managing multimodal datasets (ephys, imaging, behavior, and more)
    • Your specific data management challenges and how DataJoint might help

    From the largest circuit mapping projects to your next experiment, the thread connecting our SfN activities is clear: neuroscience is increasingly about managing, integrating, and sharing complex data.

    We can’t wait to see you in San Diego. Stop by to tell us what you are working on, and let’s figure out together how to move it forward.

  • DataJoint at the PharmStars PharmaTech Innovation Summit

    On November 18th, DataJoint will join innovators, pharma R&D leaders, and investors at the PharmStars PharmaTech Innovation Summit in Boston. We’re excited to present how the DataJoint Scientific Data Platform is helping research organizations modernize their data foundations to accelerate discovery, improve reproducibility, and support the next generation of AI-driven science.


    Modern R&D workflows generate massive, multimodal datasets: imaging, sequencing, electrophysiology, clinical measures, behavioral data, and more. Yet the context needed to make that data useful often remains locked across instruments, lab systems, and custom scripts. This fragmentation slows research, limits collaboration, and undermines confidence in AI/ML insights.

    DataJoint solves this by unifying three pillars of scientific data operations in one platform:

    • Multimodal Data Integration
      • Structured and linked data spanning experiments, samples, instruments, pipelines, and analyses.
    • Scientific Context & Provenance
      • Every dataset and model result is traceable back to its experimental conditions and computational lineage.
    • Reproducible Computational Workflows
      • Standardized workflows that run reliably across teams, data centers, and cloud environments.

    The result: trusted, AI-ready data with transparency and governance built in.

    At PharmStars, DataJoint’s VP of Growth & Partnerships, Dana Wojtasinski, together with DataJoint’s CEO, Jim Olson, will be sharing real examples of how leading neuroscience, translational biology, and research teams are using DataJoint to:

    • Accelerate time to insight
    • Improve repeatability across studies, labs, and therapeutic programs
    • Scale computational pipelines for ML and data science
    • Reduce data engineering overhead and silos

    The PharmaTech Innovation Summit brings together the people building the next wave of R&D infrastructure—and we’re looking forward to collaborating with organizations shaping that future.

    If you work in biopharma or invest in digital R&D innovation and would like to attend, feel free to reach out for an invitation at info@datajoint.com.

    Trusted Data. Trusted AI. Trusted Science.

  • DataJoint at Minneapolis: Building AI-Ready DataOps for Modern Neuroscience

    DataJoint team members Dimitri Yatsenko, Jim Olson, and Kushal Bakshi recently visited the University of Minnesota in Minneapolis to deliver an important message to the neuroscience community: operational maturity is the fastest path to trustworthy AI in science.

    During the session, Dimitri presented ‘Building AI-Ready Data Operations for Modern Neuroscience’ to about 30 participants. The audience included graduate students, postdocs, and principal investigators from a range of departments, including Neuroscience, Biomedical Engineering, Neurology, and Clinical Movement Science. Many attendees lead multi-modal labs that combine electrophysiology, behavior, imaging, and clinical assessments, and they work hands-on with Python, MATLAB, and on-premises HPC systems.

    Our audience brought insightful questions to the table. They wanted to know how to transition from one-off scripts to reproducible, versioned pipelines without pausing their ongoing research. There was strong interest in easy-to-adopt metadata and naming standards and in applying FAIR principles to share data both across labs and within their departments. Other topics included managing and scheduling HPC resources, handling clinical and patient health information responsibly without slowing down workflows, and onboarding new trainees in a way that allows effective practices to endure.

    One key takeaway resonated with everyone: meet labs where they are. As one participant said, “This is the right way to do it, but we can’t stop our work for two months to change all our routines.” We agree. The best way forward is a phased approach. Start with quick wins, such as improved naming conventions, simple metadata, and automated data ingestion scripts. Next, build toward robust schemas, automated quality checks, and reproducible data pipelines. This approach ensures that AI becomes a reliable asset, not just an experimental add-on.

    Why does this matter? Labs that invest in clean data operations are already delivering results more quickly and with greater transparency. These labs also share their success across departments, helping modern neuroscience expand from individual discoveries to campus-wide impact.

    For further reading about trustworthy data operations in neuroscience, see our latest work: “SciOps: Achieving Productivity and Reliability in Data-Intensive Research”.

    Looking ahead, we are partnering with several groups to determine the most actionable first steps that can be taken right away. If you would like our slide deck or a quick data maturity assessment for your lab, reach out and we are ready to help!

  • DataJoint at SENC 2025

    DataJoint’s SciOps Engineer Milagros Marín spent Sept 3–5 at the 20th Meeting of the Spanish Society for Neuroscience (SENC) in Las Palmas de Gran Canaria, where researchers working on neurodevelopment, neurodegeneration, and next-generation model systems filled the Auditorio Alfredo Kraus.

    Sessions were hosted at the Auditorio Alfredo Kraus with versatile spaces for parallel sessions and workshops

    The congress opened with a talk by Isabel Fariñas (University of Valencia; BioTecMed/CIBERNED), 2024 Santiago Ramón y Cajal laureate and EMBO member, on how the physical niche regulates neural stem-cell quiescence (including work with neurospheres).

    SENC also launched the WiNS Award for Inclusive Leadership, with a committee that included Ana Bribián Arruego (Sanofi) and Eva Ortega Paíno (Secretary-General of Research from the Ministry of Science, Innovation and Universities in Spain), alongside other distinguished scientists.

    Round table with some of the members of the Women for Neuroscience Committee (WiNS)

    Organoids took center stage, with a dedicated symposium led by Aixa V. Morales (Instituto Cajal, CSIC) and Antonella Consiglio (IDIBELL/University of Barcelona). Talks included Antonella Consiglio on patient-derived models of Tyrosine Hydroxylase Deficiency and Silvia Cappello (Max Planck Institute of Psychiatry) on cellular crosstalk in brain development, alongside other prominent researchers. On the industry side, Milagros met with 3Brain and HBio, who presented their organoid research solutions.

    Antonella Consiglio (University of Barcelona & IDIBELL), a leading expert in organoid research with 20+ years of groundbreaking contributions, presenting at SENC 2025.

    Why this hits home for us

    Organoid and other human-relevant models are scaling fast. DataJoint helps labs keep pace with open-source, reproducible, and scalable workflows—from ingest and QC to analysis, versioning, and collaboration—built for transparent, multi-site research.

    • We’ll be sharing our open-source DataJoint organoid pipeline in an upcoming post!
    • Want a closer look? Contact us for a demo of DataJoint SciOps for organoid research.


    With appreciation: to SENC President Manuel Sánchez-Malmierca, board member Lydia Jiménez, the SENC Board, and all organizers, speakers, participants, and attendees for fostering a welcoming, rigorous, and collegial environment.

  • DataJoint at MIT: ODIN 2025

    At MIT’s McGovern Institute for Brain Research, the ODIN (Open Data in Neurophysiology) Workshop featured a hands-on session with the DataJoint team: Dimitri Yatsenko, Monty Kosma, and Thinh Nguyen. The team led a live demo and hands-on tutorials on building reproducible, scalable data workflows with DataJoint, including how to publish pipelines and datasets to DANDI (and related repositories such as EMBER). They also walked through real examples from data-intensive research projects such as the MICrONS Project.

    Thinh Nguyen, SciOps Lead, ready to start at MIT’s McGovern Institute

    Why this matters

    Open, shareable workflows help labs collaborate, reproduce results, and move faster. That’s the goal of ODIN: building community and infrastructure for open neurophysiology.

    Watch the demo

    Talk: Dr. Dimitri Yatsenko, “Building Data Workflows for Neuroscience and AI”

    Tutorials: Dr. Thinh Nguyen, step-by-step examples you can adapt to your lab

    Contact us for a DataJoint SciOps demo!


    Explore more events and resources at the ODIN initiative hub.