DataJoint Basics in Ten Minutes
As I mentioned in my previous post, DataJoint is one of those rare technologies that comes around and can truly revolutionize your workflow making your life much easier.
The following are different functions that DataJoint has. For the purposes of this blog post, we will imagine ourselves as neuroscientists about to perform a neurophysiology experiment on our subjects, a set of mice.
We start off by importing DataJoint; our convention is to use dj
as its alias.
>>> import datajoint as dj
From here we create a schema, which is where our tables will live. We give our schema some meaningful name, in this case, I will call mine ‘tutorial’, and we end the name with an _, giving us tutorial_.
>>> schema = dj.schema('tutorial_')
Creating Our First Table
Now that we have our schema, it is time to populate it with tables! Our first table is called Mouse and it is a Manual table. This is a category of tables where the information is inserted manually.
>>> @schema
class Mouse(dj.Manual):
definition = """
# Experimental animals
mouse_id : int # Unique animal ID
---
dob=null : date # date of birth
sex="unknown" : enum('M','F','unknown') # sex
"""
On the left side, under #Experimental animals, we see our column names. These are the features that we will document about our Mice. Each table has a primary key which is separated from the secondary keys by — — —
. In this case, the primary key is the mouse_id, and the secondary keys are dob (date of birth) and sex.
Inserting Data
>>> Mouse()
Mouse_id dob sex
Our table is currently empty and being a manual table, we must insert data into it. DataJoint has several ways of doing this. The first is using the .insert()
function.
>>> Mouse.insert1((0, '2017-03-01', 'M'))
>>> Mouse()
Mouse_id dob sex
0 2017-03-01 M
Here we see that we can insert a tuple with the necessary information.
You can insert data via a dictionary.
>>> data = { 'mouse_id': 100, 'dob': '2017-05-12', 'sex': 'F'}
>>> Mouse.insert1(data)
Mouse_id dob sex
0 2017-03-01 M
100 2017-05-12 F
You can also use a list of tuples.
>>> data = [(1, '2016-11-19', 'M'),(2, '2016-11-20', 'unknown'),(5, '2016-12-25', 'F')]
>>> Mouse.insert(data)
Out[7]: Mouse_id dob sex
0 2017-03-01 M
1 2016-11-19 M
2 2016-11-20 unknown
5 2016-12-25 F
100 2017-15-12 F
Data Integrity
One of the many benefits of DataJoint is its ability to uphold data integrity. Trying to insert a new mouse with mouse_id :0
results in a Duplicate error.
>>> Mouse.insert1({'mouse_id': 0,'dob': '2018-01-01','sex': 'M',})
---------------------------------------------------------------------------DuplicateError
Traceback (most recent call last)<ipython-input-21-304369cd8a83> in <module> 2 {'mouse_id': 0, 3 'dob': '2018-01-01',----> 4 'sex': 'M', 5 })~/.local/lib/python3.7/site-packages/datajoint/table.py in insert1(self, row, **kwargs) 264 For kwargs, see insert() 265 """--> 266 self.insert((row,), **kwargs) 267 268 def insert(self, rows, replace=False, skip_duplicates=False, ignore_extra_fields=False, allow_direct_insert=None):~/.local/lib/python3.7/site-packages/datajoint/table.py in insert(self, rows, replace, skip_duplicates, ignore_extra_fields, allow_direct_insert) 336 except DuplicateError as err: 337 raise err.suggest(--> 338 'To ignore duplicate entries in insert, set skip_duplicates=True') 339 340 def delete_quick(self, get_count=False):DuplicateError: ("Duplicate entry '0' for key 'PRIMARY'", 'To ignore duplicate entries in insert, set skip_duplicates=True')
Creating A Table With Dependencies
>>> @schema
class Session(dj.Manual):
definition = """
# Experiment session
-> Mouse
session_date : date # date
---
experiment_setup : int # experiment setup ID
experimenter : varchar(100) # experimenter name
data_path='' : varchar(255) # """
We can see that when creating the Session table, the first primary key is -> Mouse.
This means that Session inherits the Mouse’s primary key as well as includes session_date
.
Querying
One of DataJoint’s strengths lies in its ability to query. There may be times when you only want a subset of the data according to certain criteria.
Four Types Of Queries:
- restriction (
&
) and negative restriction (-
): filter data - join (
*
): bring fields from different tables together - projection (
.proj()
): focus on a subset of attributes - aggregation (
.aggr()
): simple computation of one table against another table
Restriction (&)
The & can be used to limit
>>> Mouse & 'mouse_id=0'
Mouse_id dob sex
0 2017-03-01 M
We can also use the restriction with a couple of other keywords.
In[4]: Mouse & 'mouse_id = 0'Out[4]: Mouse_id dob sex 0 2017-03-01 M
>>> Mouse & 'mouse_id = 0'
Mouse_id dob sex
0 2017-03-01 M
Related posts
Naming things
Updates Delivered *Straight to Your Inbox*
Join the mailing list for industry insights, company news, and product updates delivered monthly.
