User setup instructions
The following document describes how to set up a development environment and connect to a database so that you can use the DataJoint Elements to build and run a workflow on your local machine.
Any of the DataJoint Elements can be combined to create a workflow that matches your experimental setup. We have a number of example workflows to get you started. Each focuses on a specific modality, but they can be adapted for your custom workflow.
- Getting up and running will require a couple of items for a good development environment. If any of these items are already familiar to you and installed on your machine, you can skip the corresponding section.
    1. Python
    2. Conda
- Next, you'll need to download one of the example workflows and the corresponding example data.
- Finally, there are a couple of different approaches to connecting to a database. Here, we highlight three approaches:
    1. First Time: Beginner. Temporary storage to learn the ropes.
    2. Local Database: Intermediate. Deployed on local hardware, managed by you.
    3. Central Database: Advanced. Deployed on dedicated hardware.
Development Environment
This diagram describes the general components for a local DataJoint environment.
flowchart LR
py_interp -->|DataJoint| db_server[("Database Server\n(e.g., MySQL)")]
subgraph conda["Conda environment"]
direction TB
py_interp[Python Interpreter]
end
subgraph empty1[" "] %% Empty subgraphs prevent overlapping titles
direction TB
style empty1 fill:none, stroke-dasharray: 0 1
conda
end
subgraph term["Terminal or Jupyter Notebook"]
direction TB
empty1
end
subgraph empty2[" "] %% Empty subgraphs prevent overlapping titles
direction TB
style empty2 fill:none, stroke-dasharray: 0 1
term
end
class py_interp,conda,term,ide,db_server,DataJoint boxes;
classDef boxes fill:#ddd, stroke:#333;
Python
DataJoint Elements are written in Python. The DataJoint Python API supports Python versions 3.7 and up. We recommend downloading the latest stable release of 3.9 here, and following the install instructions.
Conda
Python projects each rely on different dependencies, which may conflict across projects. We recommend working in a Conda environment for each project to isolate the dependencies. For more information on why Conda, and setting up the version of Conda that best suits your needs, see this article.
To get going quickly, we recommend you ...

- Download Miniconda and go through the setup, including adding Miniconda to your `PATH` (full instructions here).

- Declare and initialize a new conda environment with the following commands. Edit `<name>` to reflect your project.

    conda create --name datajoint-workflow-<name> python=3.9
    conda activate datajoint-workflow-<name>
Apple M1 users: Click to expand
Running analyses with Element DeepLabCut or Element Calcium Imaging may require TensorFlow, which can cause issues on M1 machines. Save the yaml file below, then create this environment with `conda env create -f my-file.yaml`. If you encounter errors related to `clang`, try launching Xcode and retrying.
name: dj-workflow-<name>
channels:
- apple
- conda-forge
- defaults
dependencies:
- tensorflow-deps
- opencv
- python=3.9
- pip>=19.0
- pip:
- tensorflow-macos
- tensorflow-metal
- datajoint
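As an optional sanity check (our suggestion, not part of the Elements themselves), the short sketch below confirms that TensorFlow imports inside this environment and that the `tensorflow-metal` plugin exposes the Apple GPU.

```python
# Optional sanity check for the Apple M1 environment above.
# Run inside the activated conda environment.
import tensorflow as tf

print("TensorFlow version:", tf.__version__)
# With tensorflow-metal installed, the Apple GPU should appear in this list.
print("GPUs visible to TensorFlow:", tf.config.list_physical_devices("GPU"))
```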
Integrated Development Environment (IDE)
Development and use can be done with a plain text editor in the terminal. However, an integrated development environment (IDE) can improve your experience. Several IDEs are available. We recommend Microsoft's Visual Studio Code, also called VS Code. To set up VS Code with Python for the first time, follow this tutorial.
Version Control (git)
Table definitions and analysis code can change over time, especially with multiple collaborators working on the same project. Git is an open-source, distributed version control system that helps keep track of what changes were made, when, and by whom. GitHub is a platform that hosts projects managed with git. The example DataJoint Workflows are hosted on GitHub, and we will use git to clone (i.e., download) these repositories.
- Check if you already have git by typing `git --version` in a terminal window.
- If git is not installed on your system, please install git.
- You can read more about git basics here.
Visualization packages (Jupyter Notebooks, DataJoint Diagrams)
To run the demo notebooks and generate visualizations associated with an example workflow, you'll need a couple extra packages.
Jupyter Notebooks help structure code (see here for full instructions on Jupyter within VS Code).
- Install the Jupyter packages.

    conda install jupyter ipykernel nb_conda_kernels

- Ensure your VS Code Python interpreter is set to your Conda environment path.

    Click to expand more details.
    - View > Command Palette
    - Type "Python: Select Interpreter", hit enter.
    - If asked, select the workspace where you plan to download the workflow.
    - If present, select your Conda environment. If not present, enter the path.
DataJoint Diagrams rely on additional packages. To install these packages, enter the following command...
conda install graphviz python-graphviz pydotplus
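As a quick optional check (our suggestion, not part of the Elements), you can confirm that the diagram dependencies import cleanly in your environment:

```python
# Optional check that the DataJoint Diagram dependencies are importable.
import graphviz
import pydotplus

print("graphviz version:", graphviz.__version__)
print("pydotplus imported successfully")
```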
Example Config, Workflows and Data
Of the options below, pick the workflow that best matches your needs.
- Change the directory to where you want to download the workflow.

    cd ~/Projects

- Clone the relevant repository, and change directories to this new directory.

    git clone https://github.com/datajoint/<repository>
    cd <repository>

- Install this directory as editable with the `-e` flag.

    pip install -e .

    Why editable? Click for details
    This lets you modify the code after installation and experiment with different designs or adding additional tables. You may wish to edit `pipeline.py` or `paths.py` to better suit your needs. If no modification is required, using `pip install .` is sufficient.

- Install `element-interface`, which has utilities used across different Elements and Workflows.

    pip install "element-interface @ git+https://github.com/datajoint/element-interface"

- Set up a local DataJoint config file by saving the following block as a json in your workflow directory as `dj_local_conf.json`. Not sure what to put for the `< >` values below? We'll cover this when we connect to the database.

    {
        "database.host": "<hostname>",
        "database.user": "<username>",
        "database.password": "<password>",
        "loglevel": "INFO",
        "safemode": true,
        "display.limit": 7,
        "display.width": 14,
        "display.show_tuple_count": true,
        "custom": {
            "database.prefix": "<username_>"
        }
    }
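Once the `< >` values are filled in (covered in the database sections below), a minimal sketch for confirming that DataJoint picks up this config file looks like the following; `dj.config` and `dj.conn()` are standard parts of the DataJoint Python API.

```python
# Run from the workflow directory so that dj_local_conf.json is found.
import datajoint as dj

print(dj.config["database.host"])              # values loaded from dj_local_conf.json
print(dj.config["custom"]["database.prefix"])
dj.conn()                                      # attempt a connection with these credentials
```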
Example Workflows
- Workflow Session: An example workflow for session management.
- Workflow Array Electrophysiology: An example workflow for Neuropixels probes.
- Workflow Calcium Imaging: An example workflow for calcium imaging microscopy.
- Workflow Miniscope: An example workflow for miniscope calcium imaging.
- Workflow DeepLabCut: An example workflow for pose estimation with DeepLabCut.
Example Data
The first notebook in each workflow will guide you through downloading example data from DataJoint's AWS storage archive. You can also process your own data. To use the example data, you would ...
- Install `djarchive-client`.

    pip install git+https://github.com/datajoint/djarchive-client.git

- Use a Python terminal to import the `djarchive` client and view available datasets and revisions.

    import djarchive_client
    client = djarchive_client.client()
    list(client.datasets())   # List available datasets, select one
    list(client.revisions())  # List available revisions, select one

- Prepare a directory to store the downloaded data, for example in `/tmp`, then download the data with the `djarchive` client. This may take some time with larger datasets.

    import os
    os.makedirs('/tmp/example_data/', exist_ok=True)
    client.download(
        '<workflow-dataset>',
        target_directory='/tmp/example_data',
        revision='<revision>'
    )
Example Data Organization
Array Ephys: Click to expand details
- Dataset: workflow-array-ephys-benchmark
- Revision: 0.1.0a4
- Size: 293 GB
The example `subject6/session1` data was recorded with SpikeGLX and processed with Kilosort2.
/tmp/example_data/
- subject6
- session1
- towersTask_g0_imec0
- towersTask_g0_t0_nidq.meta
- towersTask_g0_t0.nidq.bin
Calcium Imaging: Click to expand details
- Dataset: workflow-array-calcium-imaging-test-set
- Revision: 0_1_0a2
- Size: 142 GB
The example `subject3` data was recorded with Scanbox. The example `subject7` data was recorded with ScanImage.
Both datasets were processed with Suite2p.
/tmp/example_data/
- subject3/
- 210107_run00_orientation_8dir/
- run00_orientation_8dir_000_000.sbx
- run00_orientation_8dir_000_000.mat
- suite2p/
- combined
- plane0
- plane1
- plane2
- plane3
- subject7/
- session1
- suite2p
- plane0
DeepLabCut: Click to expand details
- Dataset: workflow-dlc-data
- Revision: v1
- Size: 0.3 GB
The example data includes both training data and pretrained models.
/tmp/test_data/from_top_tracking/
- config.yml
- dlc-models/iteration-0/from_top_trackingFeb23-trainset95shuffle1/
- test/pose_cfg.yaml
- train/
- checkpoint
- checkpoint_orig
- learning_stats.csv
- log.txt
- pose_cfg.yaml
- snapshot-10300.data-00000-of-00001
- snapshot-10300.index
- snapshot-10300.meta # same for 103000
- labeled-data/
- train1/
- CollectedData_DJ.csv
- CollectedData_DJ.h5
- img00674.png # and others
- train2/ # similar to above
- videos/
- test.mp4
- train1.mp4
FaceMap: Click to expand details
Associated workflow still under development
- Dataset: workflow-facemap
- Revision: 0.0.0
- Size: 0.3 GB
Using Your Own Data
Some of the workflows make assumptions about how your file directories are organized and how some files are named.
Array Ephys: Click to expand details
- In your DataJoint config, add another item under `custom`, `ephys_root_data_dir`, for your local root data directory. This can include multiple roots. (A quick way to check these paths follows the folder structure below.)

    "custom": {
        "database.prefix": "<username_>",
        "ephys_root_data_dir": ["/local/root/dir1", "/local/root/dir2"]
    }

- The `subject` directory names must match the subject IDs in your subjects table. The `ingest.py` script (demo ingestion notebook) can help load these values from `./user_data/subjects.csv`.
- The `session` directories can have any naming convention, but must be specified in the session table (see also the demo ingestion notebook).
- Each session can have multiple probes.
- The `probe` directory names must end in a one-digit number corresponding to the probe number.
- Each `probe` directory should contain:
    - One Neuropixels meta file named `*[0-9].ap.meta`
    - Optionally, one Kilosort output folder
Folder structure:
<ephys_root_data_dir>/
└───<subject1>/ # Subject name in `subjects.csv`
│ └───<session0>/ # Session directory in `sessions.csv`
│ │ └───imec0/
│ │ │ │ *imec0.ap.meta
│ │ │ └───ksdir/
│ │ │ │ spike_times.npy
│ │ │ │ templates.npy
│ │ │ │ ...
│ │ └───imec1/
│ │ │ *imec1.ap.meta
│ │ └───ksdir/
│ │ │ spike_times.npy
│ │ │ templates.npy
│ │ │ ...
│ └───<session1>/
│ │ │ ...
└───<subject2>/
│ │ ...
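Before running ingestion, a small optional sketch like the one below (not part of the workflow code) can confirm that each configured root exists and contains the subject directories you expect:

```python
# Optional check of the ephys_root_data_dir roots configured above.
from pathlib import Path

import datajoint as dj

roots = dj.config["custom"]["ephys_root_data_dir"]  # a list of root directories
for root in map(Path, roots):
    print(root, "exists:", root.is_dir())
    if root.is_dir():
        # Top-level entries should be your subject directories, matching subjects.csv.
        print("   contents:", [p.name for p in root.iterdir()])
```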
Calcium Imaging: Click to expand details
Note: While Element Calcium Imaging can accommodate multiple scans per session, Workflow Calcium Imaging assumes there is only one scan per session.
- In your DataJoint config, add another item under `custom`, `imaging_root_data_dir`, for your local root data directory.

    "custom": {
        "database.prefix": "<username_>",
        "imaging_root_data_dir": "/local/root/dir1"
    }

- The `subject` directory names must match the subject IDs in your subjects table. The `ingest.py` script (tutorial notebook) can help load these values from `./user_data/subjects.csv`.
- The `session` directories can have any naming convention, but must be specified in the session table (see also the [tutorial notebook](https://github.com/datajoint/element-calcium-imaging/blob/main/notebooks/tutorial.ipynb)).
- Each `session` directory should contain:
    - All `.tif` or `.sbx` files for the scan, with any naming convention.
    - One `suite2p` subfolder, containing the analysis outputs in the default naming convention.
    - One `caiman` subfolder, containing the analysis output `.hdf5` file, with any naming convention.
Folder structure:
imaging_root_data_dir/
└───<subject1>/ # Subject name in `subjects.csv`
│ └───<session0>/ # Session directory in `sessions.csv`
│ │ │ scan_0001.tif
│ │ │ scan_0002.tif
│ │ │ scan_0003.tif
│ │ │ ...
│ │ └───suite2p/
│ │ │ ops1.npy
│ │ └───plane0/
│ │ │ │ ops.npy
│ │ │ │ spks.npy
│ │ │ │ stat.npy
│ │ │ │ ...
│ │ └───plane1/
│ │ │ ops.npy
│ │ │ spks.npy
│ │ │ stat.npy
│ │ │ ...
│ │ └───caiman/
│ │ │ analysis_results.hdf5
│ └───<session1>/ # Session directory in `sessions.csv`
│ │ │ scan_0001.tif
│ │ │ scan_0002.tif
│ │ │ ...
└───<subject2>/ # Subject name in `subjects.csv`
│ │ ...
DeepLabCut: Click to expand details
Note: Element DeepLabCut assumes you've already used the DeepLabCut GUI to set up your project and label your data.

- In your DataJoint config, add another item under `custom`, `dlc_root_data_dir`, for your local root data directory. This can include multiple roots.

    "custom": {
        "database.prefix": "<username_>",
        "dlc_root_data_dir": ["/local/root/dir1", "/local/root/dir2"]
    }

- You have preserved the default DeepLabCut project directory, shown below.
- The paths in your various `yaml` files reflect the current folder structure (a quick check follows the folder structure below).
- You have generated the `pickle` and `mat` training files. If not, follow the DeepLabCut guide to create a training dataset.
Folder structure:
/dlc_root_data_dir/your_project/
- config.yaml # Including correct path information
- dlc-models/iteration-*/your_project_date-trainset*shuffle*/
- test/pose_cfg.yaml # Including correct path information
- train/pose_cfg.yaml # Including correct path information
- labeled-data/any_names/*{csv,h5,png}
- training-datasets/iteration-*/UnaugmentedDataSet_your_project_date/
- your_project_*shuffle*.pickle
- your_project_scorer*shuffle*.mat
- videos/any_names.mp4
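As noted above, the paths inside your DeepLabCut `config.yaml` must reflect the current folder structure. A small sketch for checking this, assuming PyYAML is available and using a placeholder project path, could look like:

```python
# Optional check that the DeepLabCut config.yaml points at the project's current location.
from pathlib import Path

import yaml  # PyYAML

project_dir = Path("/dlc_root_data_dir/your_project")  # adjust to your project
with open(project_dir / "config.yaml") as f:
    cfg = yaml.safe_load(f)

# DeepLabCut stores the project location under `project_path`.
print("config project_path:", cfg["project_path"])
print("matches on-disk location:", Path(cfg["project_path"]).resolve() == project_dir.resolve())
```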
Miniscope: Click to expand details
- In your DataJoint config, add another item under `custom`, `miniscope_root_data_dir`, for your local root data directory.

    "custom": {
        "database.prefix": "<username_>",
        "miniscope_root_data_dir": "/local/root/dir"
    }
Relational databases
DataJoint helps you connect to a database server from your programming environment (i.e., Python or MATLAB), granting a number of benefits over traditional file hierarchies (see YouTube Explainer). We offer two options:
- The First Time beginner approach loads example data to a temporary existing database, saving you setup time. But, because this data will be purged intermittently, it should not be used in a true experiment.
- The Local Database intermediate approach will walk you through setting up your own database on your own hardware. While easier to manage, it may be difficult to expose this to outside collaborators.
- The Central Database advanced approach has the benefits of running on dedicated hardware, but may require significant IT expertise and infrastructure depending on your needs.
First time
Temporary storage. Not for production use.
- Make an account at accounts.datajoint.io.
- In a workflow directory, make a config `json` file called `dj_local_conf.json` using your DataJoint account information and `tutorial-db.datajoint.io` as the host.

    Note: Your database prefix must begin with your username in order to have permission to declare new tables.

    {
        "database.host": "tutorial-db.datajoint.io",
        "database.user": "<datajoint-username>",
        "database.password": "<datajoint-password>",
        "loglevel": "INFO",
        "safemode": true,
        "display.limit": 7,
        "display.width": 14,
        "display.show_tuple_count": true,
        "custom": {
            "database.prefix": "<datajoint-username_>"
        }
    }
- Launch a Python terminal and start interacting with the workflow.
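For example, a minimal first session (a sketch, assuming `dj_local_conf.json` sits in your current working directory) might look like:

```python
# Run from the workflow directory containing dj_local_conf.json.
import datajoint as dj

dj.conn()          # connect to tutorial-db.datajoint.io with the credentials above
dj.list_schemas()  # list the schemas your account can access on this server
```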
Local Database
- Install Docker.

    Why Docker? Click for details.
    Docker makes it easy to package a program, including the file system and related code libraries, in a container. This container can be distributed to any machine, both automating and standardizing the setup process.

- Test that Docker has been installed by running the following command:

    docker run --rm hello-world
- Launch the DataJoint MySQL server with the following command:
docker run -p 3306:3306 -e MYSQL_ROOT_PASSWORD=tutorial datajoint/mysql
What's this doing? Click for details.
- Download a container image called datajoint/mysql, a MySQL database pre-installed and configured with appropriate settings for use with DataJoint.
- Open up port 3306 (the MySQL default) on your computer so that your database server can accept connections.
- Set the password for the root database user to tutorial, which is then used in the config file.
- In a workflow directory, make a config `json` file called `dj_local_conf.json` using the following details. The prefix can be set to any value.

    {
        "database.host": "localhost",
        "database.password": "tutorial",
        "database.user": "root",
        "database.port": 3306,
        "loglevel": "INFO",
        "safemode": true,
        "display.limit": 7,
        "display.width": 14,
        "display.show_tuple_count": true,
        "custom": {
            "database.prefix": "neuro_"
        }
    }
Already familiar with Docker? Click here for details.
This document is written to apply to all example workflows. Many have a `docker` folder used by developers to set up both a database and a local environment for integration tests. Simply `docker compose up` the relevant file and `docker exec` into the relevant container.
Central Database
A database on dedicated hardware may require expertise to set up and maintain. DataJoint's MySQL Docker image project provides all the information required to set up a dedicated database.
Interacting with the Workflow
In Python
- Connect to the database and import tables.

    from <relevant-workflow>.pipeline import *
- View the declared tables. For a more in-depth explanation of how to run the workflow and explore the data, refer to the Jupyter notebooks in the workflow directory.
Array Ephys: Click to expand details
subject.Subject()
session.Session()
ephys.ProbeInsertion()
ephys.EphysRecording()
ephys.Clustering()
ephys.Clustering.Unit()
Calcium Imaging: Click to expand details
subject.Subject()
session.Session()
scan.Scan()
scan.ScanInfo()
imaging.ProcessingParamSet()
imaging.ProcessingTask()
DeepLabCut: Click to expand details
subject.Subject()
session.Session()
train.TrainingTask()
model.VideoRecording.File()
model.Model()
model.PoseEstimation.BodyPartPosition()
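Beyond previewing table contents, two handy calls for exploring any of these workflows are `describe()`, which prints a table's definition, and `dj.Diagram()`, which visualizes schema relationships using the diagram packages installed earlier. A short sketch, assuming the workflow's `pipeline` module has been imported as above:

```python
# Assumes `from <relevant-workflow>.pipeline import *` has been run, so schema
# modules such as `subject` and `session` are in the namespace.
import datajoint as dj

print(subject.Subject.describe())          # show the table definition
dj.Diagram(subject) + dj.Diagram(session)  # combined diagram; renders in a Jupyter notebook
```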
DataJoint LabBook
DataJoint LabBook is a graphical user interface to facilitate data entry for existing DataJoint tables.
- LabBook Website - If a database is public (e.g., `tutorial-db`) and you have access, you can view the contents here.
- DataJoint LabBook Documentation, including prerequisites, installation, and running the application