
User setup instructions

The following document describes how to set up a development environment and connect to a database so that you can use the DataJoint Elements to build and run a workflow on your local machine.

Any of the DataJoint Elements can be combined to create a workflow that matches your experimental setup. We have a number of example workflows to get you started. Each focuses on a specific modality, but they can be adapted for your custom workflow.

  1. Getting up and running requires a few items for a good development environment. If any of these items are already familiar to you and installed on your machine, you can skip the corresponding section.

    1. Python

    2. Conda

    3. Integrated Development Environment

    4. Version Control (git)

    5. Visualization packages

  2. Next, you'll need to download one of the example workflows and corresponding example data.

  3. Finally, there are a few different approaches to connecting to a database. Here, we highlight three:

    1. First Time: Beginner. Temporary storage to learn the ropes.

    2. Local Database: Intermediate. Deployed on local hardware, managed by you.

    3. Central Database: Advanced. Deployed on dedicated hardware.

Development Environment

This diagram describes the general components for a local DataJoint environment.

flowchart LR
  py_interp  -->|DataJoint| db_server[("Database Server\n(e.g., MySQL)")]
  subgraph conda["Conda environment"]
    direction TB
    py_interp[Python Interpreter]
  end
  subgraph empty1[" "] %% Empty subgraphs prevent overlapping titles
    direction TB
    style empty1 fill:none, stroke-dasharray: 0 1
    conda
  end
  subgraph term["Terminal or Jupyter Notebook"]
    direction TB
    empty1
  end
  subgraph empty2[" "] %% Empty subgraphs prevent overlapping titles
    direction TB
    style empty2 fill:none, stroke-dasharray: 0 1
    term
  end
  class py_interp,conda,term,ide,db_server,DataJoint boxes;
  classDef boxes fill:#ddd, stroke:#333;

Python

DataJoint Elements are written in Python. The DataJoint Python API supports Python versions 3.7 and up. We recommend downloading the latest stable release of Python 3.9 here and following the install instructions.
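
If you are unsure which Python version is active on your machine, here is a quick check from within Python (a minimal sketch):

    import sys

    # DataJoint requires Python 3.7 or later
    print(sys.version)
    assert sys.version_info >= (3, 7), "Please upgrade to Python 3.7+"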

Conda

Python projects each rely on different dependencies, which may conflict across projects. We recommend working in a separate Conda environment for each project to isolate the dependencies. For more information on why we recommend Conda, and on setting up the version of Conda that best suits your needs, see this article.

To get going quickly, we recommend you ...

  1. Download Miniconda and go through the setup, including adding Miniconda to your PATH (full instructions here).

  2. Declare and initialize a new conda environment with the following commands. Edit <name> to reflect your project.

    conda create --name datajoint-workflow-<name> python=3.9 
    conda activate datajoint-workflow-<name> 
    
Apple M1 users: Click to expand

Running analyses with Element DeepLabCut or Element Calcium Imaging may require TensorFlow, which can cause issues on M1 machines. Save the yaml file below, then create the environment with conda env create -f my-file.yaml. If you encounter errors related to clang, try launching Xcode and retrying.

name: dj-workflow-<name>
channels:
    - apple 
    - conda-forge
    - defaults
dependencies:
    - tensorflow-deps
    - opencv
    - python=3.9
    - pip>=19.0 
    - pip:
        - tensorflow-macos
        - tensorflow-metal
        - datajoint

Integrated Development Environment (IDE)

Development and use can be done with a plain text editor in the terminal. However, an integrated development environment (IDE) can improve your experience. Several IDEs are available. We recommend Microsoft's Visual Studio Code, also called VS Code. To set up VS Code with Python for the first time, follow this tutorial.

Version Control (git)

Table definitions and analysis code can change over time, especially with multiple collaborators working on the same project. Git is an open-source, distributed version control system that helps keep track of which changes were made, when, and by whom. GitHub is a platform that hosts projects managed with git. The example DataJoint Workflows are hosted on GitHub; we will use git to clone (i.e., download) these repositories.

  1. Check if you already have git by typing git --version in a terminal window.
  2. If git is not installed on your system, please install git.
  3. You can read more about git basics here.

Visualization packages (Jupyter Notebooks, DataJoint Diagrams)

To run the demo notebooks and generate visualizations associated with an example workflow, you'll need a couple extra packages.

Jupyter Notebooks help structure code (see here for full instructions on Jupyter within VS Code).

  1. Install Jupyter packages

    conda install jupyter ipykernel nb_conda_kernels
    

  2. Ensure your VS Code Python interpreter is set to your Conda environment path.

    Click to expand more details.
    • View > Command Palette
    • Type "Python: Select Interpreter", hit enter.
    • If asked, select the workspace where you plan to download the workflow.
    • If present, select your Conda environment. If it is not listed, enter the path manually.

DataJoint Diagrams rely on additional packages. To install these packages, enter the following command...

conda install graphviz python-graphviz pydotplus
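
Once a workflow's pipeline has been imported (see Interacting with the Workflow below), diagrams can be rendered in a notebook. A minimal sketch, assuming the module names used in the example Array Ephys workflow:

    import datajoint as dj
    # Module names follow the example workflows; adjust for your own pipeline
    from workflow_array_ephys.pipeline import subject, session

    # Combine per-table diagrams to visualize tables and their dependencies
    dj.Diagram(subject.Subject) + dj.Diagram(session.Session)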

Example Config, Workflows and Data

Of the options below, pick the workflow that best matches your needs.

  1. Change the directory to where you want to download the workflow.

    cd ~/Projects
    
  2. Clone the relevant repository and change into the new directory.

    git clone https://github.com/datajoint/<repository>
    cd <repository>
    

  3. Install this directory as editable with the -e flag.

    pip install -e .
    
    Why editable? Click for details. This lets you modify the code after installation and experiment with different designs or add additional tables. You may wish to edit pipeline.py or paths.py to better suit your needs. If no modification is required, using pip install . is sufficient.

  4. Install element-interface, which has utilities used across different Elements and Workflows.

    pip install "element-interface @ git+https://github.com/datajoint/element-interface"
    
  5. Set up a local DataJoint config file by saving the following block as dj_local_conf.json in your workflow directory. Not sure what to put for the <> values below? We'll cover this when we connect to the database.

    {
        "database.host": "<hostname>",
        "database.user": "<username>",
        "database.password": "<password>",
        "loglevel": "INFO",
        "safemode": true,
        "display.limit": 7,
        "display.width": 14,
        "display.show_tuple_count": true,
        "custom": {
            "database.prefix": "<username_>"
        }
    }
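
    DataJoint reads dj_local_conf.json automatically when Python is launched from the workflow directory; you can also load it explicitly. A minimal sketch:

    import datajoint as dj

    # Load the local config file explicitly (it is also picked up automatically
    # when Python starts in this directory)
    dj.config.load('dj_local_conf.json')
    print(dj.config['database.host'])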
    

Example Workflows

  • Workflow Array Electrophysiology

    An example workflow for Neuropixels probes.

    Clone from GitHub

  • Workflow Calcium Imaging

    An example workflow for calcium imaging microscopy.

    Clone from GitHub

  • Workflow DeepLabCut

    An example workflow for pose estimation with DeepLabCut.

    Clone from GitHub

Example Data

The first notebook in each workflow will guide you through downloading example data from DataJoint's AWS storage archive. You can also process your own data. To use the example data, you would ...

  1. Install djarchive-client

    pip install git+https://github.com/datajoint/djarchive-client.git
    
  2. Use a Python terminal to import the djarchive client and view the available datasets and revisions.

    import djarchive_client
    client = djarchive_client.client()
    list(client.datasets())  # List available datasets, select one
    list(client.revisions()) # List available revisions, select one
    
  3. Prepare a directory to store the downloaded data, for example in /tmp, then download the data with the djarchive client. This may take some time with larger datasets.

    import os
    os.makedirs('/tmp/example_data/', exist_ok=True)
    client.download(
        '<workflow-dataset>',
        target_directory='/tmp/example_data',
        revision='<revision>'
    )
    

Example Data Organization

Array Ephys: Click to expand details
  • Dataset: workflow-array-ephys-benchmark
  • Revision: 0.1.0a4
  • Size: 293 GB

The example subject6/session1 data was recorded with SpikeGLX and processed with Kilosort2.

/tmp/example_data/
- subject6
    - session1
        - towersTask_g0_imec0
        - towersTask_g0_t0_nidq.meta
        - towersTask_g0_t0.nidq.bin

Element and Workflow Array Ephys also support data recorded with OpenEphys.

Calcium Imaging: Click to expand details
  • Dataset: workflow-array-calcium-imaging-test-set
  • Revision: 0_1_0a2
  • Size: 142 GB

The example subject3 data was recorded with Scanbox. The example subject7 data was recorded with ScanImage. Both datasets were processed with Suite2p.

/tmp/example_data/
- subject3/
    - 210107_run00_orientation_8dir/
        - run00_orientation_8dir_000_000.sbx
        - run00_orientation_8dir_000_000.mat
        - suite2p/
            - combined
            - plane0
            - plane1
            - plane2
            - plane3
- subject7/
    - session1
        - suite2p
            - plane0

Element and Workflow Calcium Imaging also support data collected with Nikon and Prairie View systems, as well as processing with CaImAn.

DeepLabCut: Click to expand details
  • Dataset: workflow-dlc-data
  • Revision: v1
  • Size: 0.3 GB

The example data includes both training data and pretrained models.

/tmp/test_data/from_top_tracking/
- config.yml
- dlc-models/iteration-0/from_top_trackingFeb23-trainset95shuffle1/
    - test/pose_cfg.yaml
    - train/
        - checkpoint
        - checkpoint_orig
        - learning_stats.csv
        - log.txt
        - pose_cfg.yaml
        - snapshot-10300.data-00000-of-00001
        - snapshot-10300.index
        - snapshot-10300.meta   # same for 103000
- labeled-data/
    - train1/
        - CollectedData_DJ.csv
        - CollectedData_DJ.h5
        - img00674.png          # and others
    - train2/                   # similar to above
- videos/
    - test.mp4
    - train1.mp4

FaceMap: Click to expand details

Associated workflow still under development

  • Dataset: workflow-facemap
  • Revision: 0.0.0
  • Size: 0.3 GB

Using Your Own Data

The workflows make some assumptions about how your file directories are organized and how certain files are named.

Array Ephys: Click to expand details
  • In your DataJoint config, add another item under custom, ephys_root_data_dir, for your local root data directory. This can include multiple roots. (These values can also be set from Python; see the snippet after this list.)

    "custom": {
        "database.prefix": "<username_>",
        "ephys_root_data_dir": ["/local/root/dir1", "/local/root/dir2"]
    }
    
  • The subject directory names must match the subject IDs in your subjects table. The ingest.py script (demo ingestion notebook) can help load these values from ./user_data/subjects.csv.
  • The session directories can have any naming convention, but must be specified in the session table (see also demo ingestion notebook).
  • Each session can have multiple probes.
  • The probe directory names must end in a one-digit number corresponding to the probe number.
  • Each probe directory should contain:
    - One Neuropixels meta file named *[0-9].ap.meta
    - Optionally, one Kilosort output folder
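
The custom values above can also be set from Python and written back to your local config file; a minimal sketch (paths are placeholders):

    import datajoint as dj

    # Placeholder paths; point these at your own data directories
    dj.config['custom'] = {
        'database.prefix': 'username_',
        'ephys_root_data_dir': ['/local/root/dir1', '/local/root/dir2'],
    }
    dj.config.save_local()  # writes the settings to dj_local_conf.json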

Folder structure:

<ephys_root_data_dir>/
└───<subject1>/                       # Subject name in `subjects.csv`
│   └───<session0>/                   # Session directory in `sessions.csv`
│   │   └───imec0/
│   │   │   │   *imec0.ap.meta
│   │   │   └───ksdir/
│   │   │       │   spike_times.npy
│   │   │       │   templates.npy
│   │   │       │   ...
│   │   └───imec1/
│   │       │   *imec1.ap.meta
│   │       └───ksdir/
│   │           │   spike_times.npy
│   │           │   templates.npy
│   │           │   ...
│   └───<session1>/
│   │   │   ...
└───<subject2>/
│   │   ...

Calcium Imaging: Click to expand details

Note: While Element Calcium Imaging can accommodate multiple scans per session, Workflow Calcium Imaging assumes there is only one scan per session.

  • In your DataJoint config, add another item under custom, imaging_root_data_dir, for your local root data directory.

    "custom": {
        "database.prefix": "<username_>",
        "imaging_root_data_dir": "/local/root/dir1"
    }
    
  • The subject directory names must match the subject IDs in your subjects table. The ingest.py script (tutorial notebook) can help load these values from ./user_data/subjects.csv.
  • Each session directory should contain:
    - All .tif or .sbx files for the scan, with any naming convention.
    - One suite2p subfolder, containing the analysis outputs in the default naming convention.
    - One caiman subfolder, containing the analysis output .hdf5 file, with any naming convention.

Folder structure:

imaging_root_data_dir/
└───<subject1>/                     # Subject name in `subjects.csv`
│   └───<session0>/                 # Session directory in `sessions.csv`
│   │   │   scan_0001.tif
│   │   │   scan_0002.tif
│   │   │   scan_0003.tif
│   │   │   ...
│   │   └───suite2p/
│   │       │   ops1.npy
│   │       └───plane0/
│   │       │   │   ops.npy
│   │       │   │   spks.npy
│   │       │   │   stat.npy
│   │       │   │   ...
│   │       └───plane1/
│   │           │   ops.npy
│   │           │   spks.npy
│   │           │   stat.npy
│   │           │   ...
│   │   └───caiman/
│   │       │   analysis_results.hdf5
│   └───<session1>/                 # Session directory in `sessions.csv`
│   │   │   scan_0001.tif
│   │   │   scan_0002.tif
│   │   │   ...
└───<subject2>/                     # Subject name in `subjects.csv`
│   │   ...

DeepLabCut: Click to expand details

Note: Element DeepLabCut assumes you've already used the DeepLabCut GUI to set up your project and label your data.

  • In your DataJoint config, add another item under custom, dlc_root_data_dir, for your local root data directory. This can include multiple roots.
    "custom": {
        "database.prefix": "<username_>",
        "dlc_root_data_dir": ["/local/root/dir1", "/local/root/dir2"]
    }
    
  • Preserve the default DeepLabCut project directory structure, shown below.
  • Ensure the paths in your various yaml files reflect the current folder structure.

Folder structure:

/dlc_root_data_dir/your_project/
- config.yaml                   # Including correct path information
- dlc-models/iteration-*/your_project_date-trainset*shuffle*/
    - test/pose_cfg.yaml        # Including correct path information
    - train/pose_cfg.yaml       # Including correct path information
- labeled-data/any_names/*{csv,h5,png}
- training-datasets/iteration-*/UnaugmentedDataSet_your_project_date/
    - your_project_*shuffle*.pickle
    - your_project_scorer*shuffle*.mat
- videos/any_names.mp4

Miniscope: Click to expand details
  • In your DataJoint config, add another item under custom, miniscope_root_data_dir, for your local root data directory.

    "custom": {
        "database.prefix": "<username_>",
        "miniscope_root_data_dir": "/local/root/dir"
    }
    

Relational databases

DataJoint helps you connect to a database server from your programming environment (i.e., Python or MATLAB), granting a number of benefits over traditional file hierarchies (see YouTube Explainer). We offer two options:

  1. The First Time beginner approach loads example data into an existing, temporary database, saving you setup time. However, because this data is purged intermittently, it should not be used for a real experiment.
  2. The Local Database intermediate approach will walk you through setting up your own database on your own hardware. While this is easier to manage, it may be difficult to expose to outside collaborators.
  3. The Central Database advanced approach has the benefits of running on dedicated hardware, but may require significant IT expertise and infrastructure depending on your needs.

First time

Temporary storage. Not for production use.

  1. Make an account at accounts.datajoint.io.
  2. In a workflow directory, make a config json file called dj_local_conf.json using your DataJoint account information and tutorial-db.datajoint.io as the host.
    {
        "database.host": "tutorial-db.datajoint.io",
        "database.user": "<datajoint-username>",
        "database.password": "<datajoint-password>",
        "loglevel": "INFO",
        "safemode": true,
        "display.limit": 7,
        "display.width": 14,
        "display.show_tuple_count": true,
        "custom": {
        "database.prefix": "<datajoint-username_>"
        }
    }
    
    Note: Your database prefix must begin with your username in order to have permission to declare new tables.
  3. Launch a Python terminal and start interacting with the workflow.
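
    For example, a minimal connection check (run from the workflow directory so that dj_local_conf.json is found):

    import datajoint as dj

    dj.conn()           # connect using the credentials in dj_local_conf.json
    dj.list_schemas()   # list the schemas your account can access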

Local Database

  1. Install Docker.

    Why Docker? Click for details. Docker makes it easy to package a program, including the file system and related code libraries, in a container. This container can be distributed to any machine, both automating and standardizing the setup process.

  2. Test that docker has been installed by running the following command:

    docker run --rm hello-world
    

  3. Launch the DataJoint MySQL server with the following command:
     docker run -p 3306:3306 -e MYSQL_ROOT_PASSWORD=tutorial datajoint/mysql
    
    What's this doing? Click for details.
    • Download a container image called datajoint/mysql, a MySQL database pre-installed and configured with appropriate settings for use with DataJoint.
    • Open port 3306 (the MySQL default) on your computer so that your database server can accept connections.
    • Set the password for the root database user to tutorial; these credentials are then used in the config file.
  4. In a workflow directory, make a config json file called dj_local_conf.json using the following details. The prefix can be set to any value.
    {
        "database.host": "localhost",
        "database.password": "tutorial",
        "database.user": "root",
        "database.port": 3306,
        "loglevel": "INFO",
        "safemode": true,
        "display.limit": 7,
        "display.width": 14,
        "display.show_tuple_count": true,
        "custom": {
            "database.prefix": "neuro_"
        }
    }
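
    To verify that the container is accepting connections, you can also pass the connection details to DataJoint directly rather than through the config file; a minimal sketch:

    import datajoint as dj

    # Credentials match the docker run command above
    dj.conn(host='localhost', user='root', password='tutorial', reset=True)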
    
Already familiar with Docker? Click here for details.

This document is written to apply to all example workflows. Many have a docker folder used by developers to set up both a database and a local environment for integration tests. Simply docker compose up the relevant file and docker exec into the relevant container.

Central Database

A database on dedicated hardware may require IT expertise to set up and maintain. DataJoint's MySQL Docker image project provides all the information required to set up a dedicated database.

Interacting with the Workflow

In Python

  1. Connect to the database and import tables

    from <relevant-workflow>.pipeline import *
    
  2. View the declared tables. For a more in-depth explanation of how to run the workflow and explore the data, refer to the Jupyter notebooks in the workflow directory.

    Array Ephys: Click to expand details
    subject.Subject()
    session.Session()
    ephys.ProbeInsertion()
    ephys.EphysRecording()
    ephys.Clustering()
    ephys.Clustering.Unit()
    
    Calcium Imaging: Click to expand details
    subject.Subject()
    session.Session()
    scan.Scan()
    scan.ScanInfo()
    imaging.ProcessingParamSet()
    imaging.ProcessingTask()
    
    DeepLabCut: Click to expand details
    subject.Subject()
    session.Session()
    train.TrainingTask()
    model.VideoRecording.File()
    model.Model()
    model.PoseEstimation.BodyPartPosition()
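
    As a minimal example of exploring the data once the tables are imported (the attribute and value below come from the example ephys dataset; adjust for your pipeline):

    # Restrict a table by a key and fetch the matching rows as dictionaries
    subject.Subject & {'subject': 'subject6'}
    (session.Session & {'subject': 'subject6'}).fetch(as_dict=True)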
    

DataJoint LabBook

DataJoint LabBook is a graphical user interface to facilitate data entry for existing DataJoint tables.

  • LabBook Website - If a database is public (e.g., tutorial-db) and you have access, you can view the contents here.