Introduction
This section of the website introduces the concept of relational algebra. I have found that substantial
official-looking sources about relational algebra on the web are just plain wrong. For example, a site
called w3resource (which looks a lot like w3c) has a relational algebra operator for sorting. This is
ridiculous. The tuples (rows) of a relation are sets- they can't be sorted. Another website, GeeksForGeeks
argued there are performance issues with the relational algebra. This is nonsense, because the relational
algebra is math- it is a concept, not an implementation. How something is implemented affects performance
and the math concepts behind relational algebra can be implemented in multiple ways, including in ways
that would optimize similarly to modern SQL.
Accordingly, I have put together this portion of my website to provide a simple introduction to
relational algebra. I will try to introduce what are mathematical concepts in a hopefully not so math way.
This portion of the website assumes familiarity with SQL. I find explaining relational algebra
without SQL is a bit of a nightmare. However, once people have played with SQL, it is much
easier to understand relational algebra and to realise how kludgy SQL is as compared to what
a proper relational algebra-based query language would look like. Also, there is an assumption
the reader is familiar with the hospital database used in the SQL examples on this website.
Note I am trying to keep things simple. So, for example, I don't distinguish between natural join,
theta join, etc. I just call them all inner join. Similarly, I don't discuss divide by or
intersection because divide by isn't implemented in any RDBMS and intersection isn't
something I have ever seen used. Likewise, I don't bother formally defining most operations.
The current contents are:
- Overview
- Queries on One Relation
- Joins
- Statistical Queries
- Changing the Database
- Not Supported
Here are good alternate website to learn about the concept:
Any questions or comments should be directed to: The creator's email
Relations
In a database context, a relation is not-quite-a-table. You can represent it so it looks like a table,
but there are notable differences.
Formally (in math terms), a relation is a pair comprising (1) an ordered set of domains, and (2) a
set of tuples, where each tuple is an ordered set where each element of the tuple maps to each domain.
Basically, you can think of this
as (1) a list of attributes or column names with datatypes and (2) a set of rows where the values in
each row corresponds to what is in the column names. However, there are key differences between what
the simple explanation says and what the math says. First, going from left to right (i.e., across the
columns), everything is ordered. This means columns can repeat. However, going from top to bottom
(i.e., across the rows), everything is a set. This means rows cannot repeat.
Basically, to understand relations, think about tables. However:
- The column names of the table aren't just column names. They are also data types that constrain what is
allowed in that column.
- The rows can't be duplicated.
- In other words, certain things that can be done in SQL can't be done in relational algebra and vice-versa.
It helps to think of relations like this:
Relational Algebra Concept |
English Concept |
Except |
Relation |
Table |
Tables can repeat rows. Tables allow anything while values in tuples are constrained by a domain. |
Ordered set |
List |
|
Domain |
Name with a data type |
|
Tuple |
Row |
Rows can repeat in a table. Tuples cannot repeat. Rows can be sorted. Tuples cannot be sorted, because in
the relational algebra, a relation is defined as a set of tuples, not an ordered set or bag of tuples.
|
Here's an example of how tables and relations are different. Imagine the following table:
Person ID |
First Name |
Last Name |
1 |
John |
Smith |
2 |
Mary |
Contrary |
3 |
John |
Smith |
If you remove the first column (Person ID), the result in relational algebra would be:
First Name |
Last Name |
John |
Smith |
Mary |
Contrary |
Notice how the third row disappeared. This is because it now duplicates the first row.
Why Relational Algebra
Relational algebra is a powerful tool for design. Implementation is hard and therefore costly.
Every time you implement something wrong, you spend money correcting your mistake. If you don't correct
your mistake, bad things can happen. Ideally, you fix the mistake before you start implementing.
With relational algebra, you can simulate how your database works before you actually implement your database.
As a mathematical system, relational algebra has certain powerful properties. Notably, every operation
you perform in relational algebra produces a relation. This means you can guarantee certain behaviors
in relational algebra in the same way arithmetic guarantees certain behaviors in regular algebra.
In arithmetic, except for divide by 0, every operation results in a number. The result of 4+2, 4-2, 4×2, and 4÷2
are all numbers. This allows us to be able to substitute symbols for numbers. We can say let a=4, let b=2 and whether
we do c=a+b, c=a-b, c=a×b or c=a÷b, with the exception of when b=0 and the operation is ÷, we know c will be
a number.
The same holds true of the relational algebra. This allows us to treat relational algebra just like algebra.
We can substitute symbols for things to determine what will happen to our database. We can therefore prove
mathematical properties of our database. We can ask questions like:
- If I combine three relations (A,B,C), what is the relationship between the attributes of relation A
B, and C? Are they 1:1, or 1:many?
- Is there a way to join relation A to relation B? Does there exist a relation C or set of relations C such that
I can join A to B?
In other words, if we are very careful and take the time, we can reach a reasonable level of surety our database will work before implementing.
Project (π)
The project operator takes a relation and removes columns from it.
Assume a table called people that looks like this:
PersonID |
FirstName |
LastName |
1 |
John |
Smith |
2 |
Mary |
Contrary |
3 |
John |
Smith |
πFirstName,LastNamePeople
would be:
FirstName |
LastName |
John |
Smith |
Mary |
Contrary |
It is going to get complex from hereon, so what I will do is assume you know about the hospital database, and I will
show the equivalent command in SQL. Where SQL produces a different result, I will explain it.
The project operator (π) is equivalent to the things you put after the Select statement in SQL.
So:
English Description |
Relational Algebra |
SQL Equivalent |
Find the name and doctorno of all patients.
|
πname,doctorno patient
|
Select name, doctorno
from patient;
|
Restrict (σ)
The restrict operator (π) is equivalent to the where clause in SQL. So:
English Description |
Relational Algebra |
SQL Equivalent |
Give all patient information about the patient named Bell.
|
σname='Bell' patient
|
Select *
from patient
where name='Bell';
|
Give the name and doctor ID of the patient named Bell.
|
πname,doctorno
σname ='Bell'
patient
|
Select name, doctorno
from patient
where name='Bell';
|
And (∧), Or (∨) and Not (¬)
Relational algebra is based on sets and thus uses the and, or and not operators from boolean logic. So:
English Description |
Relational Algebra |
SQL Equivalent |
Find ward information about wards with wardnos greater than or equal to w2 and less than or equal to w7.
|
σwardno ≥'w2' ∧ wardno ≤'w7' ward
|
Select *
from ward
where wardno>='w2' and
wardno<='w7';
|
Find patient information for the patients named Bell and Robinson.
|
σname ='Bell' ∨ name='Robinson' patient
|
Select *
from patient
where name='Bell' or
name='Robinson';
|
Find patient information about patients whose name is not Bell.
|
σ¬(name ='Bell') patient
|
Select *
from patient
where not (name='Bell');
|
Between
Use the less than and equal or greater than and equal operators to do between:
English Description |
Relational Algebra |
SQL Equivalent |
Find ward information for wards with wardnos between w2 and w7.
|
σ'w2' ≤ wardno ≤'w7' ward
|
Select *
from ward
where wardno between 'w2' and 'w7';
|
Simple In (∈) and Not In (∉)
Use the symbols ∈ and ∉ for in and not in. These are set operators and take sets:
English Description |
Relational Algebra |
SQL Equivalent |
Find patient information about patients with the names Bell or Robinson.
|
σname ∈ {'Bell','Robinson'} patient
|
Select *
from patient
where name in ('Bell','Robinson');
|
Find ward information about wards with the ward numbers w2, w3, or w7.
|
σwardno ∉ {'w2','w3','w7'} ward
|
Select *
from ward
where wardno not in ('w2','w3','w7');
|
Do not use ∈ and ∉ to do the equivalent of type I and type II nested queries.
Instead, use join (⨝) and antijoin (▷).
Regular Expressions (Like)
The SQL operator Like is a travesty. There is no equivalent concept in math. Instead,
use the more powerful concept of regular expressions.
Regular expressions are a very powerful way of matching strings and their permutations are beyond the scope
of what I can cover here. Here are some examples:
English Description |
Relational Algebra |
SQL Equivalent |
Find patients who have an e in their name.
|
σRegExp(name,'(.*)e(.*)') patient
|
Select *
from patient
where name like '%e%';
|
Find patients who have an e as the second letter in their name.
|
σRegExp(name,'.e(.*)') patient
|
Select *
from patient
where name like '_e%';
|
Cartesian Product (×)
The cartesian product combines all the tuples of two relations together. For example, given the two below tables:
Table: People
PersonID |
PersonName |
1 |
Albert |
2 |
Bertha |
3 |
Charlie |
Table: Fruit
FruitID |
FruitName |
A |
Apple |
B |
Banana |
C |
Cherry |
People × Fruit
would be:
PersonID |
PersonName |
FruitID |
FruitName |
1 |
Albert |
A |
Apple |
1 |
Albert |
B |
Banana |
1 |
Albert |
C |
Cherry |
2 |
Bertha |
A |
Apple |
2 |
Bertha |
B |
Banana |
2 |
Bertha |
C |
Cherry |
3 |
Charlie |
A |
Apple |
3 |
Charlie |
B |
Banana |
3 |
Charlie |
C |
Cherry |
The relational algebra and SQL equivalents look like:
English Description |
Relational Algebra |
SQL Equivalent |
Combine every row in patient and doctor.
|
patient×doctor
|
Select *
from patient,doctor;
|
Inner Join (⨝)
Inner join works in a way similar to that of SQL's inner join. It connects
two relations (tables) together based on matching attributes.
English Description |
Relational Algebra |
SQL Equivalent |
Combine the staff and doctor tables.
|
staff⨝staff.staffno=doctor.staffno doctor
|
Select *
from staff
inner join doctor
on staff.staffno=doctor.staffno;
|
List the staff ID, name, rank and specialization of every doctor.
|
πstaff.staffno,staffname,docposition,specialty
(staff⨝staff.staffno=doctor.staffno
doctor)
|
Select staff.staffno,staffname,docposition,specialty
from staff
inner join doctor
on staff.staffno=doctor.staffno;
|
As with SQL, an inner join is the same as a cartesian product followed by a restrict:
staff⨝staff.staffno=doctor.staffno
doctor
=
Select *
from staff
inner join doctor
on staff.staffno=doctor.staffno
=
σstaff.staffno=doctor.staffno
(staff ×
doctor)
=
Select *
from staff, doctor
where staff.staffno=doctor.staffno;
Outer Joins (⟕, ⟖, ⟗)
The symbols are: left join (⟕), right join (⟖) and full outer join (⟗).
In the hospital database, a ward can have zero or more nurses.
English Description |
Relational Algebra |
SQL Equivalent |
Find the staff IDs, names, ward numbers, wardnames and max bed capacity for all wards.
Include wards that have no staff.
|
πstaff.staffno,staffname,ward.wardno,wardname,maxbeds
ward⟕ward.wardno=nurse.wardno
(nurse⨝nurse.staffno=staff.staffno
staff)
|
Select staff.staffno,staffname,ward.wardno,wardname,maxbeds
from ward
left join
(nurse inner join
staff
on nurse.staffno=staff.staffno)
on ward.wardno=nurse.wardno;
|
Null (⊥)
Use an upside down T to represent null (⊥).
Union (⋃)
The union operator requires both of the relations to have the same domains.
In other words, like SQL, the columns for both relations (tables) in a union need to
be the same.
English Description |
Relational Algebra |
SQL Equivalent |
List the staff ID, and names. For doctors, show their rank and specialty.
For nurses, show their ward number.
|
πstaff.staffno,staffname,docposition,specialty,⊥
staff⨝staff.staffno=doctor.staffno
doctor
⋃
πstaff.staffno,staffname,⊥,⊥,wardno
staff⨝staff.staffno=nurse.staffno
nurse
|
Select staff.staffno,staffname,docposition,specialty,null
from staff
inner join doctor
on staff.staffno=doctor.staffno
union
Select staff.staffno,staffname,null,null,wardno
from staff
inner join nurse
on staff.staffno=nurse.staffno;
|
Rename (ρ)
Allows you to refer to a relation or the domain of the relation by another name.
It is the same concept as aliasing in SQL. If it is outside the parenthesis,
it refers to the relation name. If it is inside the parenthesis, it is the domain name.
Remember domains in relations are an ordered list, so you need to name
the domains in order.
English Description |
Relational Algebra |
SQL Equivalent |
Rename the staff relation and its domains staffno and staffname as employees, employeeid and employeename.
|
ρemployees(employeeid,employeename)
πstaffno,staffname
staff
|
Select staffno as employeeid, staffname as employeename
from staff employees;
|
Self Join
There are two ways to do a self join in the relational algebra, the implicit and explicit way.
Implicit
Implicitly, in math (and therefore the relational algebra), you can represent a copy of something
using the prime (') symbol. Thus, the programming concept:
x = x + 1
Can be represented mathematically as:
x' ← x + 1
So, the following is valid relational algebra syntax:
English Description |
Relational Algebra |
SQL Equivalent |
Find all patients with the same name. Display their names and patient IDs.
|
πpatient.name, patient.patientid, patient'.patientid
(patient ⨝patient.name=patient'.name ∧ patient.patientid<patient'.patientid
patient')
|
Select p1.name, p1.patientid, p2.patientid
from patient p1
inner join patient p2
on p1.name=p2.name and p1.patientid<p2.patientid;
|
Explicit
You can use the rename operator to explicitly cast a relation:
English Description |
Relational Algebra |
SQL Equivalent |
Find all patients with the same name. Display their names and patient IDs.
|
πp1.name,p1.patientid,p2.patientid
[(ρp1patient)⨝p1.name=p2.name ∧ p1.patientid<p2.patientid
(ρp2patient)]
|
Select p1.name, p1.patientid, p2.patientid
from patient p1
inner join patient p2
on p1.name=p2.name and p1.patientid<p2.patientid;
|
Minus (-)
Minus is used to remove records from a relation. It works like minus
in SQL. Minus must be 'Union Compatible,' i.e., just like the union operator,
the domains in the relations must be exactly the same.
Also, note that because the relational algebra is based on sets,
doing a minus can cause records to disappear when there are duplicates.
Thus, minus in relational algebra is equivalent to an SQL minus followed by a distinct.
So:
English Description |
Relational Algebra |
SQL Equivalent |
Find all doctors who have no patients.
|
πstaffnodoctor
-
πdoctornopatient
|
Select staffno
from doctor
minus
select doctorno
from patient;
|
Antijoin (▷)
The antijoin is the mechanism by which we do the equivalent of type I and type II nested queries.
English Description |
Relational Algebra |
SQL Equivalent |
Give the IDs and names of all doctors who have no patients.
|
πstaffno, staffname
(staff ▷staffno=doctorno
patient)
|
Select staffno, staffname
from staff
where staffno not in (
(select doctorno
from patient);
|
Find all unpaid items from the bills.
|
billitem ▷ billitem.billno=rcptdetail.billno ∧ billitem.lineno=rcptdetail.lineno
rcptdetail
|
Select *
from billitem
where not exists (
(select null
from rcptdetail
where billitem.billno=rcptdetail.billno
and billitem.lineno=rcptdetail.lineno);
|
Aggregation (γ)
The aggregation operator works just like the SQL group by. It takes three inputs, two in the
subscript. The subscript input pair are an ordered list of domains to group by and an ordered list of domains to calculate-
similar to the attributes in the group by and in the select clause in a regular group by. The third
input is the relation to process.
Some sources separate the two subscripted parts into a left and right subscript. This ends
up being a very awkward way of writing things so I have opted to put them in parenthesis.
English Description |
Relational Algebra |
SQL Equivalent |
How many patients does each doctor have?
|
γ[(staffno, staffname),(staffno, staffname, count(*))]
(staff ⨝staffno=doctorno
patient)
|
Select staffno, staffname, count(*)
from staff
inner join patient
on staffno=doctorno
group by staffno, staffname;
|
Having Does Not Exist - Use Restrict
There is no need for a having clause in relational algebra. Indeed, there really isn't a need
for a having clause in SQL either- it is just bad design. Use a restrict operator (i.e., the
equivalent of the SQL where clause).
In the below example, we calculate the number of patients for each doctor. We get the max of that.
The relation that is the max number of patients is renamed as maxcount.
We then find the staffno, staffname and number of patients for each doctor. We rename this
relation as doctorwpatient. We then join doctorwpatient with maxcount by the count of patients.
English Description |
Relational Algebra |
SQL Equivalent |
Which doctor(s) have the most patients?
|
ρdoctorwpatient
[(γ[(staffno, staffname),(staffno, staffname, count(*))]
(staff ⨝staffno=doctorno
patient)]
⨝doctorwpatient.count(*)=maxcount.max(count(*))
ρmaxcount{
γ[(),(max(count(*)))]
γ[(doctorno),(count(*))]
patient}
|
Select staffno, staffname, count(*)
from staff
inner join patient
on staffno=doctorno
group by staffno, staffname
having count(*)>=All
(Select count(*)
from patient
group by doctorno);
|
Sorting
You can't sort a relation. Period. Tuples in a relation are grouped together as sets.
Mathematical sets have no ordering. Some websites introduce a sort operator τ.
Those websites don't know what they are talking about.
Given relational algebra doesn't have sorting, doesn't that mean relational algebra sucks?
No! Relational algebra is meant to allow us to study the behavior of the relations, not
the way the relations are displayed. This is like the difference between content and
markup. Relational algebra is a tool for manipulating the content of a database design,
not the markup of the database design.
In fact, sorting in SQL is the bad decision. Since we can sort in SQL, why not
add features like bolding, italics or highlighting? Sorting is a visual concept,
not a data manipulation concept. Introducing sorting into SQL has meant
SQL has a number of silly kludges. For example, the order by clause must be
the last clause in an SQL command. In the relational algebra, the operations can
occur in any order!
Another analogy is the difference between a relational database management system
and a report writer like PowerBI or Tableau. That relational database management
systems typically don't have good reporting tools doesn't mean they suck. You
are supposed to combine tools to get your business needs fulfilled.
Assignment (←)
The assignment operator changes the value of a variable.
English Description |
Relational Algebra |
SQL Equivalent |
Make x be 1.
|
x←1
|
Select 1 as x
from dual;
|
Insert
Insertion is an assignment with a union. Recall a relation is a pair comprising an ordered set of domains and a set of tuples
where each tuple contains an ordered set of values corresponding to those domains. Thus, the below is a relation:
(
(staffno,staffname),
{(111,'New Person')}
)
Given this, I can add the above to the staff relation by:
English Description |
Relational Algebra |
SQL Equivalent |
Add the record to the staff relation
|
staff'←
staff
⋃
(
(staffno,staffname),
{(111,'New Person')}
)
|
insert into staff(staffno, staffname)
values
(111,'New Person');
|
Delete
Similar to insertion, deletion is a minus followed by an assignment.
English Description |
Relational Algebra |
SQL Equivalent |
Remove the record from the staff relation
|
staff'←
staff
-
(
(staffno,staffname),
{(111,'New Person')}
)
|
delete from staff
where
staffno=111 and
staffname='New Person';
|
Update
An update is a deletion followed by an insertion.
English Description |
Relational Algebra |
SQL Equivalent |
Change the record from the staff relation so 'New Person' becomes
'A New Person'
|
staff'←
staff
-
(
(staffno,staffname),
{(111,'New Person')}
)
⋃
(
(staffno,staffname),
{(111,'A New Person')}
)
|
update staff
set staffno=111,
staffname='A New Person'
where staffno=111 and
staffname='New Person';
|
Writing in Relational Algebra
Relational algebra is math. So, we use symbols from other math systems like discrete mathematics (set theory)
and boolean logic. Most new operations used in relational algebra are lowercase greek letters.
Many relational algebra operations take multiple arguments. Often, the second or third argument is written
as a subscript. For example, the project operation (π) takes two arguments, (1) a relation, and (2)
a set of domains to project. Project is written like this:
πstaffno,staffnamestaff
Notice how staffno and staffname are subscripted.
Likewise, the relational algebra inner join operation (strictly theta-join) takes three arguments, the two
relations to join and the join condition. The join condition is subscripted as follows:
staff⨝staff.staffno=doctor.staffnodoctor
The subscript is important!