Basics of Relational Algebra

Gamze Yılan
8 min readNov 8, 2020

There are many methods for modeling data and defining queries on it. You heard about SQL, yeah, but how about Relational Algebra? Try this article for a beginner-friendly introduction -and maybe bookmark it for quick reviews ‘till you get used to the logic!

What is a Relation?

Relation, under the concept of Computer Science, is any connection in between two or more data tables. You can create a relation by take a singular data or an entire data column from a table and add it into another one, adding multiple tables together to create one big table, or compare data columns in multiple tables by creating a connection between them. In short; you define a relation in between multiple data tables, combine data from those tables and end up with a new data.

What is a Relational Database?

A relational database is a database with pre-determined relations between data items. Each row on the data table on a relational database will represent an object or entity, while each column will represent an attribute of said object or entity.

While working with Relational Databases, we will have a terminology to follow. The columns on a data table, for example, will now be called an attribute. The rows are now tuples instead. If two tuples are exactly the same, we will call them duplicates.

A key is an attribute that identifies that table. Keys can be used to mark rows, make rows easier to manage, filter rows, access data within the table easily or even make rows from different tables related to each other. There are many types of keys;

  • Superkey: A Superkey is a key that will identify a unique tuple of each relation. For example, if your table consists of customer information, a Super Key would be a combination of customer names, customer addresses and customer phone numbers. Since not all these three can be the same in a customer, the combination will give us unique results.
  • Candidate key: A Candidate Key is a Super Key that does not contain a Super Key within. You can’t take the combination of customer name and customer phone number as a candidate key for example, for that while the combination will be unique so will the customer phone numbers, and therefore this super key will be consisting of another super key. If you take customer number alone as a key instead, since it doesn’t consist of any other super keys, this can be considered a candidate key.
  • Primary key: A Primary Key is a candidate key of your choice, that is chosen in order to manage data smoothly is considered a primary key.

What is Relational Algebra?

Every database management system must define a query language, a language that is used to access the data stored in the database and search within. There are multiple query language options such as SQL, CQL, and you have guessed right: Relational Algebra.

In relational algebra, the input will be the table from which we get the data and the output will be a temporary table holding the data the user asked for. Both the input and the output are relations. Working with operations in relational algebra show similarities and common points with sets in mathematics.

Relational algebra works through the entire data table, and therefore unlike other query languages such as SQL, you won’t have to use loops in order to iterate within the database one item at a time.

Operations

There are plenty of operations we can perform on the data using Relational Algebra or any other query language. For now, let’s focus on the basic operations we can perform using Relational Algebra.

  • Selection Operation (σ): This operator is used to select rows from within the table that fit under the pre-determined rules. Let’s work through the example below;
σsubject = "health" ^ price = "450"(Books)

The command above will search from within the Books table and bring you the tuples where the subject attribute matches health and price attribute matches 450. Remember from the logical operations that ^ sign stands for “and” operation.

  • Projection Operation (∏): This operator will bring you one or multiple attributes of your choice. But notice that when there are duplicates, this operator will print only one of them on the screen.
∏subject, author (Books)

The example above will bring you the attributes subject and author within the table Books. But if you have the multiple novels from Charles Dickens, it will print Novel, Charles Dickens only once.

  • Union Operation (∪): This operator is used to combine two attributes with the same name from different tables.
∏ author (Books) ∪ ∏ author (Articles)

In the example above, the projection operator will project the author attribute from the Books table and the author attribute from the Articles table. The union operator in between will list the authors from the articles table right under the authors from the books table. However, if Jack London has both books and articles, his name will be listed once only since the union operator will eliminate duplicates just like the projection operator. Keep in mind that you can union multiple attributes from as many tables as you like.

  • Set Difference (−): This operator will substract the second parameter’s (tables) tuples from the first parameter (table) and give what remains from the first parameter (table) as the output.
∏ author (Books) − ∏ author (Articles)

Let’s say that the author attribute from Books table consists of Charles Dickens, Jack London and Emily Dickinson. And the author attribute from the Articles table consists of Jack London and Shereen Bhan. The outcome will be a table consisting of Charles Dickens and Emily Dickenson.

  • Cartesian Product (Χ): This operator will combine the information from two different tables into one. Each tuple from the first parameter (table) will match each tuple from the second parameter (table), and therefore, each possible combination of tuples will be listed.

Let’s say that we have an id and an author attribute on both books and articles table. Books table consisting of 1, 2, 3 for Charles Dickens, Jack London and Emily Dickinson while Articles table consists of A, B for Jack London and Shereen Bhan.

Books Χ Articles

The line above will give us the following table;

1 Charles Dickens A Jack London

1 Charles Dickens B Shereen Bhan

2 Jack London A Jack London

2 Jack London B Shereen Bhan

3 Emily Dickinson A Jack London

3 Emily Dickinson B Shereen Bhan

Notice: Just like the sum and multiply operators we use in Math, these operators can also be used together.

  • Rename Operation (ρ): When you perform an operation with Relational Algebra, the outcome is often nameless. This operator, however, allows you to give a name to the output.
ρ strawberry (∏ author (Books) ∪ ∏ author (Articles))

The example above will create a new data table with the names of all authors from Books table and Articles table combined, and name this new table “strawberry”.

  • Set Intersection (∩): This operator will bring the common tuples between two datasets.
∏ author (Books) ∩ ∏ author (Articles)

The example above will list out the common authors in both lists, in this case, Jack London.

  • Inner join (⋈): This operator will compare the two tables, find a common attribute, and combine the data based on that common attribute.
Gym members ⋈ Zumba class members

Let’s say that the Gym members table consists of name, membership term while the Zuma class members table consists of name and sex. If the Gym members table consists of Adriana Lima, 3 months and Miranda Kerr, 4 months while Zumba class members consist of Adriana Lima, Female and Kelly Gale, Female. The combination table will take the common attribute “name” and give us the data table that consists of Adriana Lima, 3 months, Female.

  • Left Outer Join (⟕): This operator will combine two data tables; but while taking all tuples from the table it takes as the left parameter, it will take only the tuples with the same attributes on the data table it takes as the right parameter. For the tuples that don’t have a match on the right parameter, the attribute that is inherited from the right parameter will get the value Null.
Gym members ⟕ Zumba class members

The example above will give us the list below;

Adriana Lima, 3 months, Female

Miranda Kerr, 4 months, Null

  • Right Outer Join (): This operator will combine two data tables; but while taking all tuples from the table it takes as the right parameter, it will take only the tuples with the same attributes on the data table it takes as the left parameter. For the tuples that don’t have a match on the left parameter, the attribute that is inherited from the left parameter will get the value Null.
Gym members ⟖ Zumba class members

The example above will give us the data table below;

Adriana Lima, Female, 3 months

Kelly Gale, Female, Null

  • Full outer join (⟗): This operator will work as a union of right and left outer joins. It will combine both data tables and any tuple or attribute it can’t find a match for on either side, it will simply put in the value Null.
Gym members ⟗ Zumba class members

The example above will give the following table as the outcome;

Adriana Lima, Female, 3 Months

Miranda Kerr, Null, 4 Months

Kelly Gale, Female, Null

  • Division (÷): This operator will take two parameters, and bring out all the tuples from the first attribute that consists of the same tuples with the second parameter.
Gym members ÷ ∏ name (Zumba class members)

The example above will first create a table with only the names of the Zumba class members: Adriana Lima and Kelly Gale. And then, it will go through the gym members list to find the tuples with the exact same data in the name attribute: Adriana Lima, 3 months. Finally, it will exclude the name attribute which is already common and the output table will only consist of the data “3 months”.

  • Aggregate functions: Standing for average value there is avg, minimum value there is min, maximum value there is max, sum of values there is sum and finally to count the values there is count; there are five functions for making computations on attributes.
g sum(membership term)(Gym members)

The example above will give out the sum of all membership terms in the Gym members table, returning a data table named sum(membership term) with only one tuple, one which holds the value “7” months (3+4).

Changing the Database with Relational Algebra

There are multiple operations we can perform on the original database with the help of Relational Algebra.

  • Deletion
Gym Members <- Gym Members - σname = "Adriana Lima" (Gym Members)

The example above will delete the tuple where the name is “Adriana Lima” from the Gym Members database, and push the output database on the original Gym Members database.

  • Insertion
Gym Members <- Gym Members ∪ {("Stella Maxwell", 4)}

The example above will create a new data table with the values “Stella Maxwell” as name and “4” as membership term. Then it will combine the new data table with the Gym Members table and finally, push the outcome over the previous Gym Members table.

  • Update
Gym Members <- ∏ name, membership term *2 (Gym Members)

The example above will double the membership terms on the Gym Members database while keeping the name attributes the exact same and then push the new table over the old Gym Members table.

--

--