Back to IB SL Curriculum

A3. Databases

THEME A: SYSTEM FUNDAMENTALS

Important Note

These notes cover the main points for revision. They are great for reviewing key concepts, but for in-depth understanding, always keep your textbook nearby for reference.

πŸ’‘ Exam Tip: Try not to use * in SQL queries. In exams, they usually ask for specific columns, so always mention them. For example, instead ofSELECT * FROM Employees;, useSELECT EmpName, Salary FROM Employees;

Databases are structured collections of data that allow efficient storage, retrieval, and manipulation of information. This chapter covers database structure, data types, primary keys, SQL queries, and validation techniques.

A3.1 Database Fundamentals

What is a Database?

A database is a persistent, organised store of data. If you just write data to a text file, it's hard to search and sort. Databases solve this.

  • Persistent: Data is saved on secondary storage (HDD/SSD).
  • Organised: Structured in tables to avoid duplication (redundancy).

Structure: The Relational Model

Table (Entity)

A collection of data about a specific thing (e.g., Employees, Departments).

Record (Row)

A single item in the table (e.g., One specific Hero).

Field (Column)

A specific attribute (e.g., HeroName, PowerLevel).

Keys

Primary Key (PK)

A unique identifier for every record in a table.

Example: HeroID. You can't have two heroes with ID 1.

Foreign Key (FK)

A field that links to a Primary Key in another table to create a relationship.

Example: DeptID in the Employees table links to the Departments table.

Benefits of Using a Relational Database

Data Integrity

Primary and Foreign keys enforce relationships, preventing orphan records and ensuring data remains accurate and reliable.

Data Consistency

Ensures data does not contradict itself across the database. A change in one place (e.g., customer address) reflects everywhere.

Reduced Redundancy

Normalization minimizes duplicate data storage. Data is stored once and referenced, saving space and reducing update errors.

Complex Querying

Uses declarative, industry-standard languages (SQL) to allow powerful and complex data retrieval and manipulation.

Data Concurrency

Allows multiple users to access and modify data simultaneously without conflicts, essential for modern multi-user applications.

A3.4 ACID Properties

Understanding ACID

In database systems, ACID refers to a set of properties that guarantee database transactions are processed reliably. A transaction is a single logical unit of work (e.g., transferring money between accounts).

Atomicity

"All or Nothing"

A transaction must be completed in its entirety. If any part of the transaction fails, the entire transaction is rolled back to its previous state.

Consistency

"Preserving Rules"

A transaction must transform the database from one valid state to another. All predefined rules (constraints, cascades, triggers) must be followed.

Isolation

"Independence"

Transactions occur independently of one another. Even if multiple transactions happen simultaneously, the end result for each must be the same as if they had occurred sequentially.

Durability

"Permanent Storage"

Once a transaction is committed, it remains committed even in the event of a system failure (e.g., power outage or crash). Data is permanently written to non-volatile storage.

Real-World Scenario: Bank Transfer

The Transaction:

Transferring $100 from Account A to Account B.

1

Deduct $100 from Account A

2

Add $100 to Account B

How ACID protects this:

  • Atomicity:If power fails after step 1, the $100 is returned to Account A.
  • Consistency:The total amount of money in the bank ($A + $B) remains constant.

Scalability Issues in a Relational Database

As applications grow, databases face challenges in handling increased load. Common scalability issues include:

πŸ“ˆ Data Volume Growth

As the amount of data grows, query performance can degrade, leading to slower response times and storage management issues.

πŸ‘₯ Increased Concurrency

A significant increase in simultaneous users can lead to concurrency issues, such as locking conflicts and reduced throughput.

🧩 Complexity of Queries

More sophisticated analysis and complex joins consume more CPU and memory resources, potentially bottlenecking the system.

πŸ’» Hardware Limitations

Vertical scaling (upgrading a single server) has limits. Eventually, a single server becomes overloaded and cannot be upgraded further.

βš–οΈ Load Balancing

Distributing requests across multiple servers (horizontal scaling) is complex in relational databases due to consistency requirements.

πŸ” Inefficient Indexing

Poorly designed or missing indexes can severely affect search times, making data retrieval slow as volume increases.

πŸ“ Exam Corner: Database Administrator (DBA)

(e) Identify tasks that are carried out by the database administrator (DBA). [2]

Mark Scheme / Answers:

  • Data configuring / applying patches or upgrades
  • Setting permissions / passwords / access rights / ensuring security
  • Back up / recovery / archiving
  • Data cleansing / consistency checks on data / remove data errors

A3.2 SQL (Structured Query Language)

🏒 Scenario: Tech Solutions Inc.

Welcome, database admin! You have been recruited to manage the database for Tech Solutions Inc.We have employees, departments, projects, and devices. Your job is to organize this data efficiently.

1. Creating the Tables (DDL)

First, we need to build the structure to hold our data. We will create two tables: Departments and Employees.

Table 1: Departments

INT: Integer numbers.
VARCHAR(50): Text up to 50 characters.

Table 2: Employees

FOREIGN KEY: Links Employees to Departments.

2. Modifying Structure (DDL)

ALTER TABLE: Changing Table Structure

What if we created the table but later realized we needed an extra column? Or maybe a column data type needs changing? We use ALTER TABLE to modify the structure of an existing table.

ADD Column

Add a new column for Employee Phone Numbers.

MODIFY Column

Increase the size of the EmpName column to 100.

DROP Column

Remove the Phone column if it's no longer needed.

3. Manipulating Data (DML)

INSERT: Adding Data

The INSERT command adds new rows to a table. Watch how our empty Employees table fills up!

Before Command (Empty)
EmpIDNameSalary
No data...
After Command
EmpIDNameSalary
101Alice Smith60000
102Bob Jones45000

UPDATE: Changing Data

Alice finished a big project! Let's update `Salary` from 60000 to 65000.

Before Update
EmpIDNameSalary
101Alice Smith60000
After Update
EmpIDNameSalary
101Alice Smith65000

DELETE: Removing Data

Charlie (EmpID 103) is leaving the company. We need to remove their record.

Before Delete
EmpIDName
101Alice Smith
103Charlie
104Dana
After Delete
EmpIDName
101Alice Smith
104Dana

4. Querying Data (DQL)

Now that we have data, we need to retrieve it. To make these examples interesting, let's assume we've added a few more employees!

Current "Employees" Table Data

EmpIDEmpNameSalaryDeptID
101Alice Smith650001
102Bob Jones450001
104Dana80000NULL
105Eve920002
106Frank550002

3.1 Selecting Specific Columns

Usually, we only need specific information, like just names and salaries. Listing columns is more efficient than selecting everything.

SQL Command
Result Output
EmpNameSalary
Alice Smith65000
Bob Jones45000
Dana80000
Eve92000
Frank55000

3.2 Projection: Creating Calculated Columns

You can create new columns by performing calculations. Let's calculate a 10% bonus for each employee.

SQL Command

AS Bonus creates a new column name for the calculated value.

Result Output
EmpNameSalaryBonus
Alice Smith650006500
Bob Jones450004500
Dana800008000
Eve920009200
Frank550005500

3.3 Filtering with WHERE

We use the WHERE clause to filter records. Let's find employees with High Salaries (Salary > 50000).

SQL Command
Result Output
EmpNameSalary
Alice Smith65000
Dana80000
Eve92000
Frank55000

3.4 Pattern Matching (LIKE)

The LIKE operator searches for patterns in text. Use % for "any characters" and _ for "exactly one character".

Quick Reference:
% = Any number of characters (0 or more)
_ = Exactly one character

1. Starts with "A"

Matches: Alice Smith

2. Ends with "e"

Matches: Alice Smith

3. Contains "an"

Matches: Dana, Frank

4. Second char is "o"

Matches: Bob Jones

5. Second-last char is "t"

Matches: Alice Smith

6. Exactly 3 characters

Matches: Eve, Bob (first names)

7. Starts "D", ends "a"

Matches: Dana

8. Third char is "b"

Matches: Bob Jones

3.5 Removing Duplicates (DISTINCT)

Use DISTINCT to get unique values and remove duplicates from results.
Let's find all unique department IDs (without repeating the same department).

Without DISTINCT
DeptID
1
1
NULL
2
2

5 rows (with duplicates)

With DISTINCT
DeptID
1
2
NULL

3 rows (unique values only)

3.6 Sorting (ORDER BY)

Let's rank our employees by Salary, from Highest to Lowest (DESC).

SQL Command
Result Output
EmpNameSalary
Eve92000
Dana80000
Alice Smith65000
Frank55000
Bob Jones45000

3.7 Aggregate Functions

Aggregate functions perform calculations on multiple rows and return a single result.

COUNTCount Rows

Result:

5

Total employees

SUMTotal Sum

Result:

337000

Total payroll

AVGAverage

Result:

67400

Average salary

MINMinimum

Result:

45000

Lowest salary (Bob)

MAXMaximum

Result:

92000

Highest salary (Eve)

πŸ“Š All Together

Returns all stats in one query!

5. Nested Subqueries

A subquery (or nested query) is a query within another query. We use the result of the inner query to filter or feed data into the outer query.

Example: Finding Top Earners

Let's say we want to find the names of all employees who earn more than the average salary. We don't know the exact mathematical average beforehand! We have to calculate it first using a subquery.

SQL Command

The inner query (SELECT AVG(Salary) FROM Employees) runs first and calculates the average (67400).
The outer query then finds everyone earning more than 67400.

Result Output
EmpNameSalary
Dana80000
Eve92000

6. Deep Dive: NULLs and Counting

What is NULL?

A NULL value means no data, unknown, or not applicable. It is NOT the same as 0 (zero) or an empty string ("").

Example: Dana has DeptID = NULL because she is a new hire and hasn't been assigned a department yet.

Finding NULLs

You cannot use = NULL. You must use IS NULL.

❌ Incorrect

Returns: No rows (empty result)

βœ… Correct

Result Output
EmpName
Dana

The COUNT Mystery

Be careful! COUNT(*) behaves differently than COUNT(Column) when NULLs are involved.

1. COUNT(*)

Counts total rows, including NULLs.

2. COUNT(Column)

Counts Non-NULL values only.

A3.3 Database Normalization

What is Normalization?

Normalization is a process in database design that helps organize data into tables to reduce data redundancy and improve data integrity. The goal is to minimize data anomalies while maintaining data consistency.

The most commonly used normal forms are First Normal Form (1NF), Second Normal Form (2NF), and Third Normal Form (3NF). You progress through these forms systematically: 1NF β†’ 2NF β†’ 3NF.

1First Normal Form (1NF)

In 1NF, a table is organized into rows and columns, with each cell containing a single, atomic value. There should be no repeating groups, arrays, or nested structures within a cell.

  • Ensure that each column has a single data type
  • Ensure that each entry in a column contains a single, non-repeating value
  • Atomic - All data must be in its most basic form

2Second Normal Form (2NF)

In 2NF, the table must already be in 1NF, and:

  • Ensure that there are no partial dependencies
  • Each non-key attribute (column) should depend on the entire primary key
  • If a column is only partially dependent on the primary key, split the table
  • Non-prime - Non-prime attributes depend on the whole key

3Third Normal Form (3NF)

In 3NF, the table must already be in 2NF, and:

  • Eliminate transitive dependencies
  • Non-key attributes should not depend on other non-key attributes
  • If a non-key attribute depends on another non-key attribute, move it to a new table with a foreign key relationship
  • Prime Transitive - No transitive dependencies between non-prime attributes

🧠 Memory Aid: ANP

Acronym Breakdown:

  • A
    Atomic (1NF)

    All data stored in atomic values

  • N
    Non-prime (2NF)

    Non-prime attributes depend on the whole key

  • P
    Prime Transitive (3NF)

    No transitive dependencies

πŸ“– Story: "Alex's New Pizza"

Alex decided to open a new pizza place, aiming to keep the business organized from the start. They named it "ANP's Pizza".

  1. 1. Atomic Ingredients (1NF): Alex ensures every ingredient is listed separately (tomato, cheese, basil).
  2. 2. Non-prime Recipes (2NF): Each pizza's unique recipe determines the specific ingredients needed.
  3. 3. Prime Transitive Orders (3NF): Customer orders are directly linked to customers, not through intermediaries.

πŸ“š Complete Example: Student Course Enrollment

Original Table (Not in 1NF):

StudentIDStudentNameCourses
1Zara AliMathematics, Physics
2Leo ChenChemistry, Biology
3Kai SatoComputer Science, Mathematics

❌ Problem: The "Courses" column contains multiple values (not atomic)

1First Normal Form (1NF):

Split the Courses column into separate rows for each student:

StudentIDStudentNameCourse
1Zara AliMathematics
1Zara AliPhysics
2Leo ChenChemistry
2Leo ChenBiology
3Kai SatoComputer Science
3Kai SatoMathematics

βœ… Each cell now contains a single, atomic value

2Second Normal Form (2NF):

Create separate tables to eliminate partial dependencies:

Table: Students
StudentIDStudentName
1Zara Ali
2Leo Chen
3Kai Sato
Table: Courses
StudentIDCourse
1Mathematics
1Physics
2Chemistry
2Biology
3Computer Science
3Mathematics

βœ… No partial dependencies - StudentName depends only on StudentID

3Third Normal Form (3NF):

Create a separate table for course details to remove transitive dependencies:

Table: Students
StudentIDName
1Zara Ali
2Leo Chen
3Kai Sato
Table: CourseDetails
CourseIDCourseName
1Mathematics
2Physics
3Chemistry
4Biology
5Computer Science
Table: StudentCourses
StudentIDCourseID
11
12
23
24
35
31

βœ… No transitive dependencies - Course information is in its own table

⚠️ Database Anomalies

Anomalies are problems that can occur when you have poorly designed databases. Normalization helps prevent these issues.

Insertion Anomalies

Anomalies that occur when you encounter difficulties while adding new data to the database.

  • Incomplete Insertion: Can't add a new record without providing data for attributes that might not be relevant yet (e.g., can't add an order without a ship date)
  • Partial Dependency: Attributes depend on only part of the primary key
  • Redundant Data: Same information stored multiple times

Deletion Anomalies

  • Loss of Non-Key Information: Deleting a record may unintentionally remove other important information
  • Unintended Data Loss: Removing one piece of data causes loss of related data
  • Inconsistent Data: Partial deletions can leave the database in an inconsistent state

Update Anomalies

  • Data Inconsistency: Updating one instance of information may result in inconsistency if the same information is stored in multiple places
  • Data Dependency: Same information appears in multiple records; updating it in one place while forgetting to update it elsewhere leads to dependency issues

πŸ“ Exam Corner: Redundant Data

(d) Outline two issues caused by storing redundant data. [4]

Mark Scheme / Answers:

  • May lead to update/deletion anomalies (data inconsistency).
  • Address changes may result in duplicate addresses (e.g., inconsistent records).
  • Consequence: Letters or fines may go to the wrong address.
  • Storing data multiple times wastes storage space.
  • Redundancy may slow down data retrieval and data entry.

πŸ“ Exam Corner: Transaction Durability

(c) Transaction durability. [3]

Definition / Answer:

  • Durability ensures that transactions are saved permanently.
  • The user does not need to worry about the transaction being lost.
  • This applies even in the event of power loss, crashes, or errors.

πŸ“ Exam Corner: Derived Fields

(e) Outline why the inclusion of a non persistent derived field will not affect the normalization of a database. [3]

Mark Scheme / Answers:

  • A derived field is created / temporary (does not exist in the table).
  • So normalization rules do not apply (not affected by redundancy).
  • No new dependencies are created.

🎯 Practice Question

Given the following sets, identify the functional dependencies and suggest how to normalize them:

  • SET A: {StudentID, StudentName, StudentAddress}
  • SET B: {CourseID, CourseName, CourseDescription}
  • SET C: {EnrollmentID, StudentID, CourseID, EnrollDate}

Key Functional Dependencies:

  • EnrollmentID β†’ StudentID, EnrollDate
  • EnrollmentID β†’ CourseID
  • StudentID β†’ StudentName, StudentAddress
  • CourseID β†’ CourseName, CourseDescription

A3.4 ACID Properties

In a database management system, a transaction is a single logical unit of work. To ensure the reliability and integrity of these transactions, databases must follow the ACID rules.

AAtomicity

The "All or Nothing" rule. A transaction consists of multiple steps, and atomicity ensures that either all steps are completed successfully, or none of them are. If any part fails, the entire transaction is rolled back to its original state.

Example: Transferring money. If the money is deducted from Account A but the system crashes before adding it to Account B, the deduction is undone.

CConsistency

Ensures the database moves from one valid state to another, maintaining all predefined rules (constraints, cascades, triggers). It guarantees that no transaction can leave the database in an illegal state.

Example: In a banking system, the total amount of money across all accounts must remain the same before and after a transfer.

IIsolation

Transactions occur independently and concurrently without interference. One transaction should not see the intermediate results of another unfinished transaction. The final result must be the same as if transactions were executed one after another.

Example: Two people booking the same seat on a plane simultaneously. Isolation ensures only one successful booking is recorded.

DDurability

Once a transaction has been committed, it remains permanent, even in the case of a system failure, power loss, or crash. Changes are written to non-volatile storage (like a hard drive).

Example: After you see "Transaction Successful" on an ATM, your balance is saved forever, even if the bank's power goes out.

The ACID Bank Scenario

Imagine Rahul is transferring β‚Ή5,000 to Priya. Here is how ACID works:

  1. A

    If the deduction from Rahul succeeds but the addition to Priya fails, the system undoes Rahul's deduction. Status: Atomic.

  2. C

    The total sum of Rahul's and Priya's accounts remains the same before and after. Status: Consistent.

  3. I

    If Rahul tries to send money to two people at once, the system handles them separately. Status: Isolated.

  4. D

    Once the screen says "Transfer Done", if the bank's power goes out, the money stays transferred. Status: Durable.

A3.5 Database Schemas (Three-Schema Architecture)

The Three-Schema Architecture

Also known as the ANSI-SPARC Architecture, this model separates the user applications from the physical database. The goal is to provide Data Independenceβ€”allowing changes at one level without affecting others.

1. External Level

Viewer's Perspective

The User View. Describes only the part of the database that is relevant to a specific user (e.g., a student sees grades, but not teacher salaries).

2. Conceptual Level

Logical Perspective

The Community View. Defines what data is stored and the relationships between them. This is where the DBA works.

3. Physical Level

Storage Perspective

The System View. Describes how the data is actually stored on the disks (data structures, file organizations, indexing).

Data Independence

Logical Data Independence

Capacity to change the conceptual schema without having to change external schemas.

Physical Data Independence

Capacity to change the internal schema without having to change the conceptual schema.

A3.6 Validation & Verification

Data Validation

Automated checks performed by the system to ensure data is reasonable and sensible before it is accepted.

Presence Check

Ensures data is not left blank (Required fields)

Range Check

Ensures data falls within specific limits (e.g., Age: 1-120)

Length Check

Ensures correct number of characters (e.g., Password min 8 chars)

Format Check

Ensures data matches a pattern (e.g., Postcode: AA11 1AA)

Type Check

Ensures correct data type (e.g., Number vs String)

Data Verification

Checks performed to ensure data entered exactly matches the original source document.

Double Entry

The same data is entered twice (usually by two different people) and the system compares them. If they don't match, an error is flagged.

Visual Check

The user manually compares the data on the screen with the original paper document to check for any transcription errors.

⚠️ Critical Difference: Data can pass Validation (ranges, formats) but still be wrong. Only Verification ensures it matches the source truth.

The "Golden Rule" Comparison

ScenarioValidationVerification
Entering age as "150"Fails (Range Check)Passes (it matches input)
Mis-typing "John" as "Jhon"Passes (Format OK)Fails (Mismatch)
Leaving field emptyFails (Presence)N/A

A3.7 Entity-Relationship Diagrams (ERD)

An ERD is a visual representation of different entities within a system and how they relate to each other. The primary components are Entities, Attributes, and Relationships.

Entity

Entity

"The Thing"

Attribute

Attribute

"The Property"

Relate

Relationship

"The Connection"

Crow's Foot Notation (Cardinality)

One-to-One (1:1)

E1
E2

One Person has exactly one Passport.

One-to-Many (1:M)

E1
Many

One Mother can have many Children.

Many-to-Many (M:N)

Many Students take many Courses.

πŸ’‘ Junction Table Rule: Databases cannot handle M:N directly. We must resolve this by creating a third table (e.g., "Enrollments") to turn it into two 1:M relationships.

Final Exam Strategy

  • 1Spell table/field names exactly as per question.
  • 2Use UPPERCASE for SLQ keywords (SELECT, FROM).
  • 3Quote strings and dates in WHERE clauses.
  • 4Verify capitalization of entity names (Singular).
  • 5Check for junction tables in M:N relationships.
  • 6Always process 1NF β†’ 2NF β†’ 3NF step-by-step.
Get in Touch
CodeHaven - Master Computer Science