Written by

Dimitri Yatsenko, PhD

Founder • Chief Science & Technology Officer

Get Started with DataJoint

August 12, 2017

Strong Entity Integrity: Part 5 — Aggregation

Dimitri Yatsenko, PhD

Founder • Chief Science & Technology Officer

DataJoint’s principle of Strong Entity Integrity (SEI) requires that queries must not introduce new classes of entities: projection and restriction operators preserve the entity class of its operand whereas the join operator yields a new entity class but one comprising the combination of the entity classes of its operands. A DataJoint query make obvious the entity class of its result.

One operator that most egregiously violates this principle in relational algebra and SQL is the aggregation operator also known as GROUP BY. In its traditional formulation, aggregation allows performing summary operations on groups of tuples identified by a combination of grouping attributes. The effective primary key of the result are the grouping attributes. Therefore the query creates a new type of entity in the middle of a query that is not traced to any explicitly defined entity class. SQL allows a further deviation from entity integrity with its ROLLUP functionality producing results comprising mixtures of entity classes with different primary keys.

DataJoint respects SEI and defines aggregation as a binary operator allowing summary operations on subsets of entity set B grouped by entities from class A. It has the following notation: A.aggr(B, ...') where ... is a list of aggregation computations such as n='sum(attr)`. In this way aggregation is a form of projection, preserving the entity class of A.

A.aggr(B, ...) is equivalent to SELECT ... FROM B GROUP BY . As always, the primary key cannot be omitted.

DataJoint simply makes explicit what programmers imply implicitly when they perform a GROUP BY operation. When grouping, we conceive of some sort of entity represented by the grouping attributes. In most cases, that entity class is already defined, often in the form of a table. Yet SQL forces users to spell out the list of attributes rather than explicitly specify the aggregating entity set.

For example, consider the following table definition for college course grades:

@schema

class Grade(dj.Manual):

definition = """

-> Student

-> Course

---

grade: decimal(3, 2) # grade e.g. 3.67

"""

Then studentss GPA can be computed as

Student().aggr(Grade(), gpa='AVG(grade)')

The entity class of the result is the same as that of the aggregating entity: the result is still a set entities of class Student.

Insights & Ideas

September 26, 2025

Neuropixels, Plainly Explained

What you’ll learn: which probe fits your study, the open tools you need, and how to keep results trustworthy.

Milagros Marín, PhD

SciOps Engineer

Insights & Ideas

August 26, 2025

AI and the Evolution of Relational Schemas

Can AI thrive without structure? Why relational schemas are essential for AI to produce reliable, trustworthy results - and need not be rigid.

Dimitri Yatsenko, PhD

Founder • Chief Science & Technology Officer

Insights & Ideas

August 21, 2025

Insight Entrepreneurship – A New Vision for Science

What if scientists were also stewards, strategists, and storytellers of knowledge? Introducing a new model for revitalizing science through ownership, integrity, and impact.

Dimitri Yatsenko, PhD

Founder • Chief Science & Technology Officer

Updates Delivered Straight to Your Inbox

Join the mailing list for industry insights, company news, and product updates delivered monthly.

Thank you! Your submission has been received!

Oops! Something went wrong while submitting the form.

Strong Entity Integrity: Part 5 — Aggregation

Neuropixels, Plainly Explained

AI and the Evolution of Relational Schemas

Insight Entrepreneurship – A New Vision for Science

Updates Delivered *Straight to Your Inbox*

Updates Delivered Straight to Your Inbox