Back to IB HL Curriculum

A3. Databases

THEME A: SYSTEM FUNDAMENTALS

Important Note

These notes cover the main points for revision. They are great for reviewing key concepts, but for in-depth understanding, always keep your textbook nearby for reference.

💡 Exam Tip: Try not to use * in SQL queries. In exams, they usually ask for specific columns, so always mention them. For example, instead ofSELECT * FROM Employees;, useSELECT EmpName, Salary FROM Employees;

Databases are structured collections of data that allow efficient storage, retrieval, and manipulation of information. This chapter covers database structure, data types, primary keys, SQL queries, and validation techniques.

A3.1 Database Fundamentals

What is a Database?

A database is a persistent, organised store of data. If you just write data to a text file, it's hard to search and sort. Databases solve this.

  • Persistent: Data is saved on secondary storage (HDD/SSD).
  • Organised: Structured in tables to avoid duplication (redundancy).

Structure: The Relational Model

Table (Entity)

A collection of data about a specific thing (e.g., Employees, Departments).

Record (Row)

A single item in the table (e.g., One specific Hero).

Field (Column)

A specific attribute (e.g., HeroName, PowerLevel).

Keys

Primary Key (PK)

A unique identifier for every record in a table.

Example: HeroID. You can't have two heroes with ID 1.

Foreign Key (FK)

A field that links to a Primary Key in another table to create a relationship.

Example: DeptID in the Employees table links to the Departments table.

Benefits of Using a Relational Database

Data Integrity

Primary and Foreign keys enforce relationships, preventing orphan records and ensuring data remains accurate and reliable.

Data Consistency

Ensures data does not contradict itself across the database. A change in one place (e.g., customer address) reflects everywhere.

Reduced Redundancy

Normalization minimizes duplicate data storage. Data is stored once and referenced, saving space and reducing update errors.

Complex Querying

Uses declarative, industry-standard languages (SQL) to allow powerful and complex data retrieval and manipulation.

Data Concurrency

Allows multiple users to access and modify data simultaneously without conflicts, essential for modern multi-user applications.

Scalability Issues in a Relational Database

As applications grow, databases face challenges in handling increased load. Common scalability issues include:

📈 Data Volume Growth

As the amount of data grows, query performance can degrade, leading to slower response times and storage management issues.

👥 Increased Concurrency

A significant increase in simultaneous users can lead to concurrency issues, such as locking conflicts and reduced throughput.

🧩 Complexity of Queries

More sophisticated analysis and complex joins consume more CPU and memory resources, potentially bottlenecking the system.

💻 Hardware Limitations

Vertical scaling (upgrading a single server) has limits. Eventually, a single server becomes overloaded and cannot be upgraded further.

⚖️ Load Balancing

Distributing requests across multiple servers (horizontal scaling) is complex in relational databases due to consistency requirements.

🔍 Inefficient Indexing

Poorly designed or missing indexes can severely affect search times, making data retrieval slow as volume increases.

📝 Exam Corner: Database Administrator (DBA)

(e) Identify tasks that are carried out by the database administrator (DBA). [2]

Mark Scheme / Answers:

  • Data configuring / applying patches or upgrades
  • Setting permissions / passwords / access rights / ensuring security
  • Back up / recovery / archiving
  • Data cleansing / consistency checks on data / remove data errors

A3.2 SQL (Structured Query Language)

🏢 Scenario: Tech Solutions Inc.

Welcome, database admin! You have been recruited to manage the database for Tech Solutions Inc.We have employees, departments, projects, and devices. Your job is to organize this data efficiently.

1. Creating the Tables (DDL)

First, we need to build the structure to hold our data. We will create two tables: Departments and Employees.

Table 1: Departments

INT: Integer numbers.
VARCHAR(50): Text up to 50 characters.

Table 2: Employees

FOREIGN KEY: Links Employees to Departments.

2. Manipulating Data (DML)

INSERT: Adding Data

The INSERT command adds new rows to a table. Watch how our empty Employees table fills up!

Before Command (Empty)
EmpIDNameSalary
No data...
After Command
EmpIDNameSalary
101Alice Smith60000
102Bob Jones45000

UPDATE: Changing Data

Alice finished a big project! Let's update `Salary` from 60000 to 65000.

Before Update
EmpIDNameSalary
101Alice Smith60000
After Update
EmpIDNameSalary
101Alice Smith65000

DELETE: Removing Data

Charlie (EmpID 103) is leaving the company. We need to remove their record.

Before Delete
EmpIDName
101Alice Smith
103Charlie
104Dana
After Delete
EmpIDName
101Alice Smith
104Dana

3. Querying Data (DQL)

Now that we have data, we need to retrieve it. To make these examples interesting, let's assume we've added a few more employees!

Current "Employees" Table Data

EmpIDEmpNameSalaryDeptID
101Alice Smith650001
102Bob Jones450001
104Dana80000NULL
105Eve920002
106Frank550002

3.1 Selecting Specific Columns

Usually, we only need specific information, like just names and salaries. Listing columns is more efficient than selecting everything.

SQL Command
Result Output
EmpNameSalary
Alice Smith65000
Bob Jones45000
Dana80000
Eve92000
Frank55000

3.2 Projection: Creating Calculated Columns

You can create new columns by performing calculations. Let's calculate a 10% bonus for each employee.

SQL Command

AS Bonus creates a new column name for the calculated value.

Result Output
EmpNameSalaryBonus
Alice Smith650006500
Bob Jones450004500
Dana800008000
Eve920009200
Frank550005500

3.3 Filtering with WHERE

We use the WHERE clause to filter records. Let's find employees with High Salaries (Salary > 50000).

SQL Command
Result Output
EmpNameSalary
Alice Smith65000
Dana80000
Eve92000
Frank55000

3.4 Pattern Matching (LIKE)

The LIKE operator searches for patterns in text. Use % for "any characters" and _ for "exactly one character".

Quick Reference:
% = Any number of characters (0 or more)
_ = Exactly one character

1. Starts with "A"

Matches: Alice Smith

2. Ends with "e"

Matches: Alice Smith

3. Contains "an"

Matches: Dana, Frank

4. Second char is "o"

Matches: Bob Jones

5. Second-last char is "t"

Matches: Alice Smith

6. Exactly 3 characters

Matches: Eve, Bob (first names)

7. Starts "D", ends "a"

Matches: Dana

8. Third char is "b"

Matches: Bob Jones

3.5 Removing Duplicates (DISTINCT)

Use DISTINCT to get unique values and remove duplicates from results.
Let's find all unique department IDs (without repeating the same department).

Without DISTINCT
DeptID
1
1
NULL
2
2

5 rows (with duplicates)

With DISTINCT
DeptID
1
2
NULL

3 rows (unique values only)

3.6 Sorting (ORDER BY)

Let's rank our employees by Salary, from Highest to Lowest (DESC).

SQL Command
Result Output
EmpNameSalary
Eve92000
Dana80000
Alice Smith65000
Frank55000
Bob Jones45000

3.7 Aggregate Functions

Aggregate functions perform calculations on multiple rows and return a single result.

COUNTCount Rows

Result:

5

Total employees

SUMTotal Sum

Result:

337000

Total payroll

AVGAverage

Result:

67400

Average salary

MINMinimum

Result:

45000

Lowest salary (Bob)

MAXMaximum

Result:

92000

Highest salary (Eve)

📊 All Together

Returns all stats in one query!

4. Deep Dive: NULLs and Counting

What is NULL?

A NULL value means no data, unknown, or not applicable. It is NOT the same as 0 (zero) or an empty string ("").

Example: Dana has DeptID = NULL because she is a new hire and hasn't been assigned a department yet.

Finding NULLs

You cannot use = NULL. You must use IS NULL.

❌ Incorrect

Returns: No rows (empty result)

✅ Correct

Result Output
EmpName
Dana

The COUNT Mystery

Be careful! COUNT(*) behaves differently than COUNT(Column) when NULLs are involved.

1. COUNT(*)

Counts total rows, including NULLs.

2. COUNT(Column)

Counts Non-NULL values only.

A3.3 Database Normalization

What is Normalization?

Normalization is a process in database design that helps organize data into tables to reduce data redundancy and improve data integrity. The goal is to minimize data anomalies while maintaining data consistency.

The most commonly used normal forms are First Normal Form (1NF), Second Normal Form (2NF), and Third Normal Form (3NF). You progress through these forms systematically: 1NF → 2NF → 3NF.

1First Normal Form (1NF)

In 1NF, a table is organized into rows and columns, with each cell containing a single, atomic value. There should be no repeating groups, arrays, or nested structures within a cell.

  • Ensure that each column has a single data type
  • Ensure that each entry in a column contains a single, non-repeating value
  • Atomic - All data must be in its most basic form

2Second Normal Form (2NF)

In 2NF, the table must already be in 1NF, and:

  • Ensure that there are no partial dependencies
  • Each non-key attribute (column) should depend on the entire primary key
  • If a column is only partially dependent on the primary key, split the table
  • Non-prime - Non-prime attributes depend on the whole key

3Third Normal Form (3NF)

In 3NF, the table must already be in 2NF, and:

  • Eliminate transitive dependencies
  • Non-key attributes should not depend on other non-key attributes
  • If a non-key attribute depends on another non-key attribute, move it to a new table with a foreign key relationship
  • Prime Transitive - No transitive dependencies between non-prime attributes

🧠 Memory Aid: ANP

Acronym Breakdown:

  • A
    Atomic (1NF)

    All data stored in atomic values

  • N
    Non-prime (2NF)

    Non-prime attributes depend on the whole key

  • P
    Prime Transitive (3NF)

    No transitive dependencies

📖 Story: "Alex's New Pizza"

Alex decided to open a new pizza place, aiming to keep the business organized from the start. They named it "ANP's Pizza".

  1. 1. Atomic Ingredients (1NF): Alex ensures every ingredient is listed separately (tomato, cheese, basil).
  2. 2. Non-prime Recipes (2NF): Each pizza's unique recipe determines the specific ingredients needed.
  3. 3. Prime Transitive Orders (3NF): Customer orders are directly linked to customers, not through intermediaries.

📚 Complete Example: Student Course Enrollment

Original Table (Not in 1NF):

StudentIDStudentNameCourses
1Zara AliMathematics, Physics
2Leo ChenChemistry, Biology
3Kai SatoComputer Science, Mathematics

❌ Problem: The "Courses" column contains multiple values (not atomic)

1First Normal Form (1NF):

Split the Courses column into separate rows for each student:

StudentIDStudentNameCourse
1Zara AliMathematics
1Zara AliPhysics
2Leo ChenChemistry
2Leo ChenBiology
3Kai SatoComputer Science
3Kai SatoMathematics

✅ Each cell now contains a single, atomic value

2Second Normal Form (2NF):

Create separate tables to eliminate partial dependencies:

Table: Students
StudentIDStudentName
1Zara Ali
2Leo Chen
3Kai Sato
Table: Courses
StudentIDCourse
1Mathematics
1Physics
2Chemistry
2Biology
3Computer Science
3Mathematics

✅ No partial dependencies - StudentName depends only on StudentID

3Third Normal Form (3NF):

Create a separate table for course details to remove transitive dependencies:

Table: Students
StudentIDName
1Zara Ali
2Leo Chen
3Kai Sato
Table: CourseDetails
CourseIDCourseName
1Mathematics
2Physics
3Chemistry
4Biology
5Computer Science
Table: StudentCourses
StudentIDCourseID
11
12
23
24
35
31

✅ No transitive dependencies - Course information is in its own table

⚠️ Database Anomalies

Anomalies are problems that can occur when you have poorly designed databases. Normalization helps prevent these issues.

Insertion Anomalies

Anomalies that occur when you encounter difficulties while adding new data to the database.

  • Incomplete Insertion: Can't add a new record without providing data for attributes that might not be relevant yet (e.g., can't add an order without a ship date)
  • Partial Dependency: Attributes depend on only part of the primary key
  • Redundant Data: Same information stored multiple times

Deletion Anomalies

  • Loss of Non-Key Information: Deleting a record may unintentionally remove other important information
  • Unintended Data Loss: Removing one piece of data causes loss of related data
  • Inconsistent Data: Partial deletions can leave the database in an inconsistent state

Update Anomalies

  • Data Inconsistency: Updating one instance of information may result in inconsistency if the same information is stored in multiple places
  • Data Dependency: Same information appears in multiple records; updating it in one place while forgetting to update it elsewhere leads to dependency issues

📝 Exam Corner: Redundant Data

(d) Outline two issues caused by storing redundant data. [4]

Mark Scheme / Answers:

  • May lead to update/deletion anomalies (data inconsistency).
  • Address changes may result in duplicate addresses (e.g., inconsistent records).
  • Consequence: Letters or fines may go to the wrong address.
  • Storing data multiple times wastes storage space.
  • Redundancy may slow down data retrieval and data entry.

📝 Exam Corner: Transaction Durability

(c) Transaction durability. [3]

Definition / Answer:

  • Durability ensures that transactions are saved permanently.
  • The user does not need to worry about the transaction being lost.
  • This applies even in the event of power loss, crashes, or errors.

📝 Exam Corner: Derived Fields

(e) Outline why the inclusion of a non persistent derived field will not affect the normalization of a database. [3]

Mark Scheme / Answers:

  • A derived field is created / temporary (does not exist in the table).
  • So normalization rules do not apply (not affected by redundancy).
  • No new dependencies are created.

🎯 Practice Question

Given the following sets, identify the functional dependencies and suggest how to normalize them:

  • SET A: {StudentID, StudentName, StudentAddress}
  • SET B: {CourseID, CourseName, CourseDescription}
  • SET C: {EnrollmentID, StudentID, CourseID, EnrollDate}

Key Functional Dependencies:

  • EnrollmentID → StudentID, EnrollDate
  • EnrollmentID → CourseID
  • StudentID → StudentName, StudentAddress
  • CourseID → CourseName, CourseDescription

📜 Final Exam Checklist

  • Always spell table and field names exactly as given in the question.
  • Write SQL commands in UPPERCASE (e.g., SELECT, FROM) to make your answer clear.
  • Check your WHERE clauses carefully. dates usually need quotes like "2023-12-25".
  • If asked to list names, check if they want FirstName, LastName, or both!
  • For normalization questions, always progress through 1NF → 2NF → 3NF systematically.
  • Remember ANP (Atomic, Non-prime, Prime Transitive) for the normal forms!
Get in Touch
CodeHaven - Master Computer Science