Diagrams

Diagrams are a great way to visualize the pipeline and understand the flow of data. DataJoint diagrams are based on the entity relationship diagram (ERD). Objects of type dj.Diagram visualize portions of the data pipeline in graphical form: tables are depicted as nodes and dependencies as directed edges between them. The draw method plots the graph.

Diagram notation

Consider the following diagram:

[Figure: mp-diagram]

DataJoint uses the following conventions:

  • Tables are indicated as nodes in the graph. The corresponding class name is indicated by each node.
  • Data tiers are indicated as colors and symbols:
    • Lookup=gray rectangle
    • Manual=green rectangle
    • Imported=blue oval
    • Computed=red circle
    • Part=black text. The names of part tables are indicated in a smaller font.
  • Dependencies are indicated as edges in the graph and always directed downward, forming a directed acyclic graph.
  • Foreign keys contained within the primary key are indicated as solid lines. This means that the referenced table becomes part of the primary key of the dependent table.
  • Foreign keys that are outside the primary key are indicated by dashed lines.
  • If the primary key of the dependent table has no other attributes besides the foreign key, the foreign key is a thick solid line, indicating a 1:{0,1} relationship.
  • Foreign keys that keep the original attribute names are drawn in black, whereas foreign keys that rename the attributes are drawn in red.
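
The edge conventions above can be summarized as a small lookup. The following is only an illustrative sketch in plain Python, not part of DataJoint: the function edge_style and its parameter names are hypothetical, and it simply encodes the rules from the list.

```python
def edge_style(in_primary_key: bool, sole_primary_key: bool, renamed: bool) -> dict:
    """Return the line style that the conventions above assign to a foreign key.

    Hypothetical helper for illustration; DataJoint does not expose this function.
    """
    if not in_primary_key:
        line = "dashed"        # foreign key outside the primary key
    elif sole_primary_key:
        line = "thick solid"   # primary key has no attributes besides the foreign key
    else:
        line = "solid"         # foreign key is part of a larger primary key
    color = "red" if renamed else "black"  # red marks renamed attributes
    return {"line": line, "color": color}


# A foreign key inside the primary key, alongside other primary attributes, not renamed
print(edge_style(in_primary_key=True, sole_primary_key=False, renamed=False))
```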

Diagramming an entire schema

To plot the Diagram for an entire schema, a Diagram object can be initialized with the schema object (which is normally used to decorate table objects):

import datajoint as dj
schema = dj.Schema('my_database')
dj.Diagram(schema).draw()

or alternatively an object that has the schema object as an attribute, such as the module defining a schema:

import datajoint as dj
import seq    # import the sequence module defining the seq database
dj.Diagram(seq).draw()   # draw the Diagram

Note that calling the .draw() method is not necessary when working in a Jupyter notebook. You can simply let the object display itself, for example by entering dj.Diagram(seq) in a notebook cell. The Diagram will automatically render in the notebook by calling its _repr_html_ method. A Diagram displayed without .draw() will be rendered as an SVG, and hovering the mouse over a table will reveal a compact version of the output of the .describe() method.

Initializing with a single table

A dj.Diagram object can be initialized with a single table.

dj.Diagram(seq.Genome).draw()

A single node makes a rather boring graph, but diagrams can be added together or subtracted from each other using graph algebra.

Adding diagrams together

Two graphs can be added, resulting in a new graph containing the union of the sets of nodes from the two original graphs. The corresponding foreign keys will be automatically included.

# plot the Diagram with tables Genome and Species from module seq.
(dj.Diagram(seq.Genome) + dj.Diagram(seq.Species)).draw()
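
To make the set semantics of addition concrete, here is a minimal sketch in plain Python that does not use DataJoint. The toy dependency graph, the Chromosome table, the assumption that Species is upstream of Genome, and the helper names add_diagrams and visible_edges are all hypothetical and chosen only for illustration.

```python
# Toy dependency graph: each key is a table, each value lists the tables that depend on it.
# The structure is assumed for illustration and is not taken from the seq module.
DEPENDENTS = {
    "Species": ["Genome"],
    "Genome": ["Chromosome"],
    "Chromosome": [],
}


def add_diagrams(nodes_a, nodes_b):
    """Return the union of two node sets, mirroring the addition of two diagrams."""
    return set(nodes_a) | set(nodes_b)


def visible_edges(nodes):
    """Keep only the dependencies whose two endpoints are both in the node set."""
    return {
        (parent, child)
        for parent in nodes
        for child in DEPENDENTS[parent]
        if child in nodes
    }


combined = add_diagrams({"Genome"}, {"Species"})
print(sorted(combined))                 # ['Genome', 'Species']
print(sorted(visible_edges(combined)))  # [('Species', 'Genome')]
```

In this sketch the edge from Species to Genome appears without being requested explicitly, because both of its endpoints are in the combined node set.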

Expanding diagrams upstream and downstream

Adding a number to a Diagram object adds nodes downstream in the pipeline, while subtracting a number from a Diagram object adds nodes upstream in the pipeline.

Examples:

# Plot all the tables directly downstream from `seq.Genome`
(dj.Diagram(seq.Genome)+1).draw()
# Plot all the tables directly upstream from `seq.Genome`
(dj.Diagram(seq.Genome)-1).draw()
# Plot the local neighborhood of `seq.Genome`
(dj.Diagram(seq.Genome)+1-1+1-1).draw()
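
The effect of adding and subtracting numbers can be sketched on a toy graph in plain Python. This is an illustration of the described behavior under assumptions, not DataJoint's implementation: the graph and the helpers expand_downstream and expand_upstream are hypothetical, as are all table names other than Genome and Species.

```python
# Toy dependency graph: each key is a table, each value lists the tables that depend on it.
# The structure is assumed for illustration only.
DEPENDENTS = {
    "Species": ["Genome"],
    "Genome": ["Chromosome", "Annotation"],
    "Chromosome": ["Gene"],
    "Annotation": [],
    "Gene": [],
}


def expand_downstream(nodes, levels=1):
    """Add the direct dependents of the current nodes, once per level (like `+ levels`)."""
    nodes = set(nodes)
    for _ in range(levels):
        nodes |= {child for parent in nodes for child in DEPENDENTS[parent]}
    return nodes


def expand_upstream(nodes, levels=1):
    """Add the direct parents of the current nodes, once per level (like `- levels`)."""
    nodes = set(nodes)
    for _ in range(levels):
        nodes |= {
            parent
            for parent, children in DEPENDENTS.items()
            if any(child in nodes for child in children)
        }
    return nodes


print(sorted(expand_downstream({"Genome"})))  # ['Annotation', 'Chromosome', 'Genome']
print(sorted(expand_upstream({"Genome"})))    # ['Genome', 'Species']
# One step down followed by one step up, the first half of a "+1-1+1-1" chain
print(sorted(expand_upstream(expand_downstream({"Genome"}))))
```

Each step expands from the whole current node set, not only from the starting table, which is why alternating +1 and -1 gradually fills in the neighborhood around the starting table.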