Introduction

This section of the website introduces the concept of relational algebra. I have found that substantial official-looking sources about relational algebra on the web are just plain wrong. For example, a site called w3resource (which looks a lot like w3c) has a relational algebra operator for sorting. This is ridiculous. The tuples (rows) of a relation are sets- they can't be sorted. Another website, GeeksForGeeks argued there are performance issues with the relational algebra. This is nonsense, because the relational algebra is math- it is a concept, not an implementation. How something is implemented affects performance and the math concepts behind relational algebra can be implemented in multiple ways, including in ways that would optimize similarly to modern SQL.

Accordingly, I have put together this portion of my website to provide a simple introduction to relational algebra. I will try to introduce what are mathematical concepts in a hopefully not so math way.

This portion of the website assumes familiarity with SQL. I find explaining relational algebra without SQL is a bit of a nightmare. However, once people have played with SQL, it is much easier to understand relational algebra and to realise how kludgy SQL is as compared to what a proper relational algebra-based query language would look like. Also, there is an assumption the reader is familiar with the hospital database used in the SQL examples on this website.

Note I am trying to keep things simple. So, for example, I don't distinguish between natural join, theta join, etc. I just call them all inner join. Similarly, I don't discuss divide by or intersection because divide by isn't implemented in any RDBMS and intersection isn't something I have ever seen used. Likewise, I don't bother formally defining most operations.

The current contents are:

Overview

Queries on One Relation

Joins

Statistical Queries

Aggregation (γ)

Changing the Database

Not Supported

Here are good alternate website to learn about the concept:

Any questions or comments should be directed to: The creator's email

Relations

In a database context, a relation is not-quite-a-table. You can represent it so it looks like a table, but there are notable differences.

Formally (in math terms), a relation is a pair comprising (1) an ordered set of domains, and (2) a set of tuples, where each tuple is an ordered set where each element of the tuple maps to each domain. Basically, you can think of this as (1) a list of attributes or column names with datatypes and (2) a set of rows where the values in each row corresponds to what is in the column names. However, there are key differences between what the simple explanation says and what the math says. First, going from left to right (i.e., across the columns), everything is ordered. This means columns can repeat. However, going from top to bottom (i.e., across the rows), everything is a set. This means rows cannot repeat.

Basically, to understand relations, think about tables. However:

The column names of the table aren't just column names. They are also data types that constrain what is allowed in that column.
The rows can't be duplicated.
In other words, certain things that can be done in SQL can't be done in relational algebra and vice-versa.

It helps to think of relations like this:

Relational Algebra Concept	English Concept	Except
Relation	Table	Tables can repeat rows. Tables allow anything while values in tuples are constrained by a domain.
Ordered set	List
Domain	Name with a data type
Tuple	Row	Rows can repeat in a table. Tuples cannot repeat. Rows can be sorted. Tuples cannot be sorted, because in the relational algebra, a relation is defined as a set of tuples, not an ordered set or bag of tuples.

Here's an example of how tables and relations are different. Imagine the following table:

Person ID	First Name	Last Name
1	John	Smith
2	Mary	Contrary
3	John	Smith

If you remove the first column (Person ID), the result in relational algebra would be:

First Name	Last Name
John	Smith
Mary	Contrary

Notice how the third row disappeared. This is because it now duplicates the first row.

Why Relational Algebra

Relational algebra is a powerful tool for design. Implementation is hard and therefore costly. Every time you implement something wrong, you spend money correcting your mistake. If you don't correct your mistake, bad things can happen. Ideally, you fix the mistake before you start implementing. With relational algebra, you can simulate how your database works before you actually implement your database.

As a mathematical system, relational algebra has certain powerful properties. Notably, every operation you perform in relational algebra produces a relation. This means you can guarantee certain behaviors in relational algebra in the same way arithmetic guarantees certain behaviors in regular algebra.

In arithmetic, except for divide by 0, every operation results in a number. The result of 4+2, 4-2, 4×2, and 4÷2 are all numbers. This allows us to be able to substitute symbols for numbers. We can say let a=4, let b=2 and whether we do c=a+b, c=a-b, c=a×b or c=a÷b, with the exception of when b=0 and the operation is ÷, we know c will be a number.

The same holds true of the relational algebra. This allows us to treat relational algebra just like algebra. We can substitute symbols for things to determine what will happen to our database. We can therefore prove mathematical properties of our database. We can ask questions like:

If I combine three relations (A,B,C), what is the relationship between the attributes of relation A B, and C? Are they 1:1, or 1:many?
Is there a way to join relation A to relation B? Does there exist a relation C or set of relations C such that I can join A to B?

In other words, if we are very careful and take the time, we can reach a reasonable level of surety our database will work before implementing.

Project (π)

The project operator takes a relation and removes columns from it.

Assume a table called people that looks like this:

PersonID	FirstName	LastName
1	John	Smith
2	Mary	Contrary
3	John	Smith

π_{FirstName,LastName}People

would be:

FirstName	LastName
John	Smith
Mary	Contrary

It is going to get complex from hereon, so what I will do is assume you know about the hospital database, and I will show the equivalent command in SQL. Where SQL produces a different result, I will explain it.

The project operator (π) is equivalent to the things you put after the Select statement in SQL. So:

English Description	Relational Algebra	SQL Equivalent
Find the name and doctorno of all patients.	π_{name,doctorno} patient	Select name, doctorno from patient;

Restrict (σ)

The restrict operator (π) is equivalent to the where clause in SQL. So:

English Description	Relational Algebra	SQL Equivalent
Give all patient information about the patient named Bell.	σ_name='Bell' patient	Select * from patient where name='Bell';
Give the name and doctor ID of the patient named Bell.	π_{name,doctorno} σ_{name ='Bell'} patient	Select name, doctorno from patient where name='Bell';

And (∧), Or (∨) and Not (¬)

Relational algebra is based on sets and thus uses the and, or and not operators from boolean logic. So:

English Description	Relational Algebra	SQL Equivalent
Find ward information about wards with wardnos greater than or equal to w2 and less than or equal to w7.	σ_{wardno ≥'w2' ∧ wardno ≤'w7'} ward	Select * from ward where wardno>='w2' and wardno<='w7';
Find patient information for the patients named Bell and Robinson.	σ_{name ='Bell' ∨ name='Robinson'} patient	Select * from patient where name='Bell' or name='Robinson';
Find patient information about patients whose name is not Bell.	σ_{¬(name ='Bell')} patient	Select * from patient where not (name='Bell');

Between

Use the less than and equal or greater than and equal operators to do between:

English Description	Relational Algebra	SQL Equivalent
Find ward information for wards with wardnos between w2 and w7.	σ_{'w2' ≤ wardno ≤'w7'} ward	Select * from ward where wardno between 'w2' and 'w7';

Simple In (∈) and Not In (∉)

Use the symbols ∈ and ∉ for in and not in. These are set operators and take sets:

English Description	Relational Algebra	SQL Equivalent
Find patient information about patients with the names Bell or Robinson.	σ_{name ∈ {'Bell','Robinson'}} patient	Select * from patient where name in ('Bell','Robinson');
Find ward information about wards with the ward numbers w2, w3, or w7.	σ_{wardno ∉ {'w2','w3','w7'}} ward	Select * from ward where wardno not in ('w2','w3','w7');

Do not use ∈ and ∉ to do the equivalent of type I and type II nested queries. Instead, use join (⨝) and antijoin (▷).

Regular Expressions (Like)

The SQL operator Like is a travesty. There is no equivalent concept in math. Instead, use the more powerful concept of regular expressions. Regular expressions are a very powerful way of matching strings and their permutations are beyond the scope of what I can cover here. Here are some examples:

English Description	Relational Algebra	SQL Equivalent
Find patients who have an e in their name.	σ_{RegExp(name,'(.)e(.)')} patient	Select * from patient where name like '%e%';
Find patients who have an e as the second letter in their name.	σ_{RegExp(name,'.e(.*)')} patient	Select * from patient where name like '_e%';

Cartesian Product (×)

The cartesian product combines all the tuples of two relations together. For example, given the two below tables:

Table: People

PersonID	PersonName
1	Albert
2	Bertha
3	Charlie

Table: Fruit

FruitID	FruitName
A	Apple
B	Banana
C	Cherry

People × Fruit

would be:

PersonID	PersonName	FruitID	FruitName
1	Albert	A	Apple
1	Albert	B	Banana
1	Albert	C	Cherry
2	Bertha	A	Apple
2	Bertha	B	Banana
2	Bertha	C	Cherry
3	Charlie	A	Apple
3	Charlie	B	Banana
3	Charlie	C	Cherry

The relational algebra and SQL equivalents look like:

English Description	Relational Algebra	SQL Equivalent
Combine every row in patient and doctor.	patient×doctor	Select * from patient,doctor;

Inner Join (⨝)

Inner join works in a way similar to that of SQL's inner join. It connects two relations (tables) together based on matching attributes.

English Description	Relational Algebra	SQL Equivalent
Combine the staff and doctor tables.	staff⨝_{staff.staffno=doctor.staffno} doctor	Select * from staff inner join doctor on staff.staffno=doctor.staffno;
List the staff ID, name, rank and specialization of every doctor.	π_{staff.staffno,staffname,docposition,specialty} (staff⨝_{staff.staffno=doctor.staffno} doctor)	Select staff.staffno,staffname,docposition,specialty from staff inner join doctor on staff.staffno=doctor.staffno;

As with SQL, an inner join is the same as a cartesian product followed by a restrict:

staff⨝_{staff.staffno=doctor.staffno}
doctor

=

Select *
from staff
inner join doctor
on staff.staffno=doctor.staffno

=

σ_{staff.staffno=doctor.staffno}
(staff × doctor)

=

Select *
from staff, doctor
where staff.staffno=doctor.staffno;

Outer Joins (⟕, ⟖, ⟗)

The symbols are: left join (⟕), right join (⟖) and full outer join (⟗).

In the hospital database, a ward can have zero or more nurses.

English Description	Relational Algebra	SQL Equivalent
Find the staff IDs, names, ward numbers, wardnames and max bed capacity for all wards. Include wards that have no staff.	π_{staff.staffno,staffname,ward.wardno,wardname,maxbeds} ward⟕_{ward.wardno=nurse.wardno} (nurse⨝_{nurse.staffno=staff.staffno} staff)	Select staff.staffno,staffname,ward.wardno,wardname,maxbeds from ward left join (nurse inner join staff on nurse.staffno=staff.staffno) on ward.wardno=nurse.wardno;

Null (⊥)

Use an upside down T to represent null (⊥).

Union (⋃)

The union operator requires both of the relations to have the same domains. In other words, like SQL, the columns for both relations (tables) in a union need to be the same.

English Description	Relational Algebra	SQL Equivalent
List the staff ID, and names. For doctors, show their rank and specialty. For nurses, show their ward number.	π_{staff.staffno,staffname,docposition,specialty,⊥} staff⨝_{staff.staffno=doctor.staffno} doctor ⋃ π_{staff.staffno,staffname,⊥,⊥,wardno} staff⨝_{staff.staffno=nurse.staffno} nurse	Select staff.staffno,staffname,docposition,specialty,null from staff inner join doctor on staff.staffno=doctor.staffno union Select staff.staffno,staffname,null,null,wardno from staff inner join nurse on staff.staffno=nurse.staffno;

Rename (ρ)

Allows you to refer to a relation or the domain of the relation by another name. It is the same concept as aliasing in SQL. If it is outside the parenthesis, it refers to the relation name. If it is inside the parenthesis, it is the domain name. Remember domains in relations are an ordered list, so you need to name the domains in order.

English Description	Relational Algebra	SQL Equivalent
Rename the staff relation and its domains staffno and staffname as employees, employeeid and employeename.	ρ_{employees(employeeid,employeename)} π_{staffno,staffname} staff	Select staffno as employeeid, staffname as employeename from staff employees;

Self Join

There are two ways to do a self join in the relational algebra, the implicit and explicit way.

Implicit

Implicitly, in math (and therefore the relational algebra), you can represent a copy of something using the prime (') symbol. Thus, the programming concept:

x = x + 1

Can be represented mathematically as:

x' ← x + 1

So, the following is valid relational algebra syntax:

English Description	Relational Algebra	SQL Equivalent
Find all patients with the same name. Display their names and patient IDs.	π_{patient.name, patient.patientid, patient'.patientid} (patient ⨝_{patient.name=patient'.name ∧ patient.patientid<patient'.patientid} patient')	Select p1.name, p1.patientid, p2.patientid from patient p1 inner join patient p2 on p1.name=p2.name and p1.patientid<p2.patientid;

Explicit

You can use the rename operator to explicitly cast a relation:

English Description	Relational Algebra	SQL Equivalent
Find all patients with the same name. Display their names and patient IDs.	π_{p1.name,p1.patientid,p2.patientid} [(ρ_p1patient)⨝_{p1.name=p2.name ∧ p1.patientid<p2.patientid} (ρ_p2patient)]	Select p1.name, p1.patientid, p2.patientid from patient p1 inner join patient p2 on p1.name=p2.name and p1.patientid<p2.patientid;

Minus (-)

Minus is used to remove records from a relation. It works like minus in SQL. Minus must be 'Union Compatible,' i.e., just like the union operator, the domains in the relations must be exactly the same.

Also, note that because the relational algebra is based on sets, doing a minus can cause records to disappear when there are duplicates. Thus, minus in relational algebra is equivalent to an SQL minus followed by a distinct. So:

English Description	Relational Algebra	SQL Equivalent
Find all doctors who have no patients.	π_staffnodoctor - π_doctornopatient	Select staffno from doctor minus select doctorno from patient;

Antijoin (▷)

The antijoin is the mechanism by which we do the equivalent of type I and type II nested queries.

English Description	Relational Algebra	SQL Equivalent
Give the IDs and names of all doctors who have no patients.	π_{staffno, staffname} (staff ▷_{staffno=doctorno} patient)	Select staffno, staffname from staff where staffno not in ( (select doctorno from patient);
Find all unpaid items from the bills.	billitem ▷ _{billitem.billno=rcptdetail.billno ∧ billitem.lineno=rcptdetail.lineno} rcptdetail	Select * from billitem where not exists ( (select null from rcptdetail where billitem.billno=rcptdetail.billno and billitem.lineno=rcptdetail.lineno);

Aggregation (γ)

The aggregation operator works just like the SQL group by. It takes three inputs, two in the subscript. The subscript input pair are an ordered list of domains to group by and an ordered list of domains to calculate- similar to the attributes in the group by and in the select clause in a regular group by. The third input is the relation to process.

Some sources separate the two subscripted parts into a left and right subscript. This ends up being a very awkward way of writing things so I have opted to put them in parenthesis.

English Description	Relational Algebra	SQL Equivalent
How many patients does each doctor have?	γ_{[(staffno, staffname),(staffno, staffname, count(*))]} (staff ⨝_{staffno=doctorno} patient)	Select staffno, staffname, count(*) from staff inner join patient on staffno=doctorno group by staffno, staffname;

Having Does Not Exist - Use Restrict

There is no need for a having clause in relational algebra. Indeed, there really isn't a need for a having clause in SQL either- it is just bad design. Use a restrict operator (i.e., the equivalent of the SQL where clause).

In the below example, we calculate the number of patients for each doctor. We get the max of that. The relation that is the max number of patients is renamed as maxcount. We then find the staffno, staffname and number of patients for each doctor. We rename this relation as doctorwpatient. We then join doctorwpatient with maxcount by the count of patients.

English Description	Relational Algebra	SQL Equivalent
Which doctor(s) have the most patients?	ρ_{doctorwpatient} [(γ_{[(staffno, staffname),(staffno, staffname, count())]} (staff ⨝_{staffno=doctorno} patient)] ⨝_{doctorwpatient.count()=maxcount.max(count())} ρ_maxcount{ γ_{[(),(max(count()))]} γ_{[(doctorno),(count(*))]} patient}	Select staffno, staffname, count() from staff inner join patient on staffno=doctorno group by staffno, staffname having count()>=All (Select count(*) from patient group by doctorno);

Sorting

You can't sort a relation. Period. Tuples in a relation are grouped together as sets. Mathematical sets have no ordering. Some websites introduce a sort operator τ. Those websites don't know what they are talking about.

Given relational algebra doesn't have sorting, doesn't that mean relational algebra sucks? No! Relational algebra is meant to allow us to study the behavior of the relations, not the way the relations are displayed. This is like the difference between content and markup. Relational algebra is a tool for manipulating the content of a database design, not the markup of the database design.

In fact, sorting in SQL is the bad decision. Since we can sort in SQL, why not add features like bolding, italics or highlighting? Sorting is a visual concept, not a data manipulation concept. Introducing sorting into SQL has meant SQL has a number of silly kludges. For example, the order by clause must be the last clause in an SQL command. In the relational algebra, the operations can occur in any order!

Another analogy is the difference between a relational database management system and a report writer like PowerBI or Tableau. That relational database management systems typically don't have good reporting tools doesn't mean they suck. You are supposed to combine tools to get your business needs fulfilled.

Assignment (←)

The assignment operator changes the value of a variable.

English Description	Relational Algebra	SQL Equivalent
Make x be 1.	x←1	Select 1 as x from dual;

Insert

Insertion is an assignment with a union. Recall a relation is a pair comprising an ordered set of domains and a set of tuples where each tuple contains an ordered set of values corresponding to those domains. Thus, the below is a relation:

( (staffno,staffname), {(111,'New Person')} )

Given this, I can add the above to the staff relation by:

English Description	Relational Algebra	SQL Equivalent
Add the record to the staff relation	staff'← staff ⋃ ( (staffno,staffname), {(111,'New Person')} )	insert into staff(staffno, staffname) values (111,'New Person');

Delete

Similar to insertion, deletion is a minus followed by an assignment.

English Description	Relational Algebra	SQL Equivalent
Remove the record from the staff relation	staff'← staff - ( (staffno,staffname), {(111,'New Person')} )	delete from staff where staffno=111 and staffname='New Person';

Update

An update is a deletion followed by an insertion.

English Description	Relational Algebra	SQL Equivalent
Change the record from the staff relation so 'New Person' becomes 'A New Person'	staff'← staff - ( (staffno,staffname), {(111,'New Person')} ) ⋃ ( (staffno,staffname), {(111,'A New Person')} )	update staff set staffno=111, staffname='A New Person' where staffno=111 and staffname='New Person';

Writing in Relational Algebra

Relational algebra is math. So, we use symbols from other math systems like discrete mathematics (set theory) and boolean logic. Most new operations used in relational algebra are lowercase greek letters.

Many relational algebra operations take multiple arguments. Often, the second or third argument is written as a subscript. For example, the project operation (π) takes two arguments, (1) a relation, and (2) a set of domains to project. Project is written like this:

π_{staffno,staffname}staff

Notice how staffno and staffname are subscripted.

Likewise, the relational algebra inner join operation (strictly theta-join) takes three arguments, the two relations to join and the join condition. The join condition is subscripted as follows:

staff⨝_{staff.staffno=doctor.staffno}doctor

The subscript is important!