A3. Databases
THEME A: SYSTEM FUNDAMENTALS
Important Note
These notes cover the main points for revision. They are great for reviewing key concepts, but for in-depth understanding, always keep your textbook nearby for reference.
π‘ Exam Tip: Try not to use * in SQL queries. In exams, they usually ask for specific columns, so always mention them. For example, instead ofSELECT * FROM Employees;, useSELECT EmpName, Salary FROM Employees;
Databases are structured collections of data that allow efficient storage, retrieval, and manipulation of information. This chapter covers database structure, data types, primary keys, SQL queries, and validation techniques.
Table of Contents
A3.1 Database Fundamentals
A3.2 SQL (Structured Query Language)
- β’ Setting the Scene: Tech Solutions
- β’ CREATE TABLE
- β’ ALTER TABLE
- β’ INSERT, UPDATE, DELETE
- β’ SELECT & Filtering
- β’ Nested Subqueries
- β’ Aggregation & NULLs
A3.3 Database Normalization
A3.4 ACID Properties
A3.5 Database Schemas
A3.6 Validation & Verification
A3.7 ERD & Crow's Foot
A3.1 Database Fundamentals
What is a Database?
A database is a persistent, organised store of data. If you just write data to a text file, it's hard to search and sort. Databases solve this.
- Persistent: Data is saved on secondary storage (HDD/SSD).
- Organised: Structured in tables to avoid duplication (redundancy).
Structure: The Relational Model
Table (Entity)
A collection of data about a specific thing (e.g., Employees, Departments).
Record (Row)
A single item in the table (e.g., One specific Hero).
Field (Column)
A specific attribute (e.g., HeroName, PowerLevel).
Keys
Primary Key (PK)
A unique identifier for every record in a table.
Example: HeroID. You can't have two heroes with ID 1.
Foreign Key (FK)
A field that links to a Primary Key in another table to create a relationship.
Example: DeptID in the Employees table links to the Departments table.
Benefits of Using a Relational Database
Data Integrity
Primary and Foreign keys enforce relationships, preventing orphan records and ensuring data remains accurate and reliable.
Data Consistency
Ensures data does not contradict itself across the database. A change in one place (e.g., customer address) reflects everywhere.
Reduced Redundancy
Normalization minimizes duplicate data storage. Data is stored once and referenced, saving space and reducing update errors.
Complex Querying
Uses declarative, industry-standard languages (SQL) to allow powerful and complex data retrieval and manipulation.
Data Concurrency
Allows multiple users to access and modify data simultaneously without conflicts, essential for modern multi-user applications.
A3.4 ACID Properties
Understanding ACID
In database systems, ACID refers to a set of properties that guarantee database transactions are processed reliably. A transaction is a single logical unit of work (e.g., transferring money between accounts).
Atomicity
"All or Nothing"
A transaction must be completed in its entirety. If any part of the transaction fails, the entire transaction is rolled back to its previous state.
Consistency
"Preserving Rules"
A transaction must transform the database from one valid state to another. All predefined rules (constraints, cascades, triggers) must be followed.
Isolation
"Independence"
Transactions occur independently of one another. Even if multiple transactions happen simultaneously, the end result for each must be the same as if they had occurred sequentially.
Durability
"Permanent Storage"
Once a transaction is committed, it remains committed even in the event of a system failure (e.g., power outage or crash). Data is permanently written to non-volatile storage.
Real-World Scenario: Bank Transfer
The Transaction:
Transferring $100 from Account A to Account B.
Deduct $100 from Account A
Add $100 to Account B
How ACID protects this:
- Atomicity:If power fails after step 1, the $100 is returned to Account A.
- Consistency:The total amount of money in the bank ($A + $B) remains constant.
Scalability Issues in a Relational Database
As applications grow, databases face challenges in handling increased load. Common scalability issues include:
π Data Volume Growth
As the amount of data grows, query performance can degrade, leading to slower response times and storage management issues.
π₯ Increased Concurrency
A significant increase in simultaneous users can lead to concurrency issues, such as locking conflicts and reduced throughput.
π§© Complexity of Queries
More sophisticated analysis and complex joins consume more CPU and memory resources, potentially bottlenecking the system.
π» Hardware Limitations
Vertical scaling (upgrading a single server) has limits. Eventually, a single server becomes overloaded and cannot be upgraded further.
βοΈ Load Balancing
Distributing requests across multiple servers (horizontal scaling) is complex in relational databases due to consistency requirements.
π Inefficient Indexing
Poorly designed or missing indexes can severely affect search times, making data retrieval slow as volume increases.
π Exam Corner: Database Administrator (DBA)
(e) Identify tasks that are carried out by the database administrator (DBA). [2]
Mark Scheme / Answers:
- Data configuring / applying patches or upgrades
- Setting permissions / passwords / access rights / ensuring security
- Back up / recovery / archiving
- Data cleansing / consistency checks on data / remove data errors
A3.2 SQL (Structured Query Language)
π’ Scenario: Tech Solutions Inc.
Welcome, database admin! You have been recruited to manage the database for Tech Solutions Inc.We have employees, departments, projects, and devices. Your job is to organize this data efficiently.
1. Creating the Tables (DDL)
First, we need to build the structure to hold our data. We will create two tables: Departments and Employees.
Table 1: Departments
INT: Integer numbers.
VARCHAR(50): Text up to 50 characters.
Table 2: Employees
FOREIGN KEY: Links Employees to Departments.
2. Modifying Structure (DDL)
ALTER TABLE: Changing Table Structure
What if we created the table but later realized we needed an extra column? Or maybe a column data type needs changing? We use ALTER TABLE to modify the structure of an existing table.
ADD Column
Add a new column for Employee Phone Numbers.
MODIFY Column
Increase the size of the EmpName column to 100.
DROP Column
Remove the Phone column if it's no longer needed.
3. Manipulating Data (DML)
INSERT: Adding Data
The INSERT command adds new rows to a table. Watch how our empty Employees table fills up!
Before Command (Empty)
| EmpID | Name | Salary |
|---|---|---|
| No data... | ||
After Command
| EmpID | Name | Salary |
|---|---|---|
| 101 | Alice Smith | 60000 |
| 102 | Bob Jones | 45000 |
UPDATE: Changing Data
Alice finished a big project! Let's update `Salary` from 60000 to 65000.
Before Update
| EmpID | Name | Salary |
|---|---|---|
| 101 | Alice Smith | 60000 |
After Update
| EmpID | Name | Salary |
|---|---|---|
| 101 | Alice Smith | 65000 |
DELETE: Removing Data
Charlie (EmpID 103) is leaving the company. We need to remove their record.
Before Delete
| EmpID | Name |
|---|---|
| 101 | Alice Smith |
| 103 | Charlie |
| 104 | Dana |
After Delete
| EmpID | Name |
|---|---|
| 101 | Alice Smith |
| 104 | Dana |
4. Querying Data (DQL)
Now that we have data, we need to retrieve it. To make these examples interesting, let's assume we've added a few more employees!
Current "Employees" Table Data
| EmpID | EmpName | Salary | DeptID |
|---|---|---|---|
| 101 | Alice Smith | 65000 | 1 |
| 102 | Bob Jones | 45000 | 1 |
| 104 | Dana | 80000 | NULL |
| 105 | Eve | 92000 | 2 |
| 106 | Frank | 55000 | 2 |
3.1 Selecting Specific Columns
Usually, we only need specific information, like just names and salaries. Listing columns is more efficient than selecting everything.
SQL Command
Result Output
| EmpName | Salary |
|---|---|
| Alice Smith | 65000 |
| Bob Jones | 45000 |
| Dana | 80000 |
| Eve | 92000 |
| Frank | 55000 |
3.2 Projection: Creating Calculated Columns
You can create new columns by performing calculations. Let's calculate a 10% bonus for each employee.
SQL Command
AS Bonus creates a new column name for the calculated value.
Result Output
| EmpName | Salary | Bonus |
|---|---|---|
| Alice Smith | 65000 | 6500 |
| Bob Jones | 45000 | 4500 |
| Dana | 80000 | 8000 |
| Eve | 92000 | 9200 |
| Frank | 55000 | 5500 |
3.3 Filtering with WHERE
We use the WHERE clause to filter records. Let's find employees with High Salaries (Salary > 50000).
SQL Command
Result Output
| EmpName | Salary |
|---|---|
| Alice Smith | 65000 |
| Dana | 80000 |
| Eve | 92000 |
| Frank | 55000 |
3.4 Pattern Matching (LIKE)
The LIKE operator searches for patterns in text. Use % for "any characters" and _ for "exactly one character".
Quick Reference:
% = Any number of characters (0 or more)_ = Exactly one character1. Starts with "A"
Matches: Alice Smith
2. Ends with "e"
Matches: Alice Smith
3. Contains "an"
Matches: Dana, Frank
4. Second char is "o"
Matches: Bob Jones
5. Second-last char is "t"
Matches: Alice Smith
6. Exactly 3 characters
Matches: Eve, Bob (first names)
7. Starts "D", ends "a"
Matches: Dana
8. Third char is "b"
Matches: Bob Jones
3.5 Removing Duplicates (DISTINCT)
Use DISTINCT to get unique values and remove duplicates from results.
Let's find all unique department IDs (without repeating the same department).
Without DISTINCT
| DeptID |
|---|
| 1 |
| 1 |
| NULL |
| 2 |
| 2 |
5 rows (with duplicates)
With DISTINCT
| DeptID |
|---|
| 1 |
| 2 |
| NULL |
3 rows (unique values only)
3.6 Sorting (ORDER BY)
Let's rank our employees by Salary, from Highest to Lowest (DESC).
SQL Command
Result Output
| EmpName | Salary |
|---|---|
| Eve | 92000 |
| Dana | 80000 |
| Alice Smith | 65000 |
| Frank | 55000 |
| Bob Jones | 45000 |
3.7 Aggregate Functions
Aggregate functions perform calculations on multiple rows and return a single result.
COUNTCount Rows
Result:
5
Total employees
SUMTotal Sum
Result:
337000
Total payroll
AVGAverage
Result:
67400
Average salary
MINMinimum
Result:
45000
Lowest salary (Bob)
MAXMaximum
Result:
92000
Highest salary (Eve)
π All Together
Returns all stats in one query!
5. Nested Subqueries
A subquery (or nested query) is a query within another query. We use the result of the inner query to filter or feed data into the outer query.
Example: Finding Top Earners
Let's say we want to find the names of all employees who earn more than the average salary. We don't know the exact mathematical average beforehand! We have to calculate it first using a subquery.
SQL Command
The inner query (SELECT AVG(Salary) FROM Employees) runs first and calculates the average (67400).
The outer query then finds everyone earning more than 67400.
Result Output
| EmpName | Salary |
|---|---|
| Dana | 80000 |
| Eve | 92000 |
6. Deep Dive: NULLs and Counting
What is NULL?
A NULL value means no data, unknown, or not applicable. It is NOT the same as 0 (zero) or an empty string ("").
Example: Dana has DeptID = NULL because she is a new hire and hasn't been assigned a department yet.
Finding NULLs
You cannot use = NULL. You must use IS NULL.
β Incorrect
Returns: No rows (empty result)
β Correct
Result Output
| EmpName |
|---|
| Dana |
The COUNT Mystery
Be careful! COUNT(*) behaves differently than COUNT(Column) when NULLs are involved.
1. COUNT(*)
Counts total rows, including NULLs.
2. COUNT(Column)
Counts Non-NULL values only.
A3.3 Database Normalization
What is Normalization?
Normalization is a process in database design that helps organize data into tables to reduce data redundancy and improve data integrity. The goal is to minimize data anomalies while maintaining data consistency.
The most commonly used normal forms are First Normal Form (1NF), Second Normal Form (2NF), and Third Normal Form (3NF). You progress through these forms systematically: 1NF β 2NF β 3NF.
1First Normal Form (1NF)
In 1NF, a table is organized into rows and columns, with each cell containing a single, atomic value. There should be no repeating groups, arrays, or nested structures within a cell.
- Ensure that each column has a single data type
- Ensure that each entry in a column contains a single, non-repeating value
- Atomic - All data must be in its most basic form
2Second Normal Form (2NF)
In 2NF, the table must already be in 1NF, and:
- Ensure that there are no partial dependencies
- Each non-key attribute (column) should depend on the entire primary key
- If a column is only partially dependent on the primary key, split the table
- Non-prime - Non-prime attributes depend on the whole key
3Third Normal Form (3NF)
In 3NF, the table must already be in 2NF, and:
- Eliminate transitive dependencies
- Non-key attributes should not depend on other non-key attributes
- If a non-key attribute depends on another non-key attribute, move it to a new table with a foreign key relationship
- Prime Transitive - No transitive dependencies between non-prime attributes
π§ Memory Aid: ANP
Acronym Breakdown:
- AAtomic (1NF)
All data stored in atomic values
- NNon-prime (2NF)
Non-prime attributes depend on the whole key
- PPrime Transitive (3NF)
No transitive dependencies
π Story: "Alex's New Pizza"
Alex decided to open a new pizza place, aiming to keep the business organized from the start. They named it "ANP's Pizza".
- 1. Atomic Ingredients (1NF): Alex ensures every ingredient is listed separately (tomato, cheese, basil).
- 2. Non-prime Recipes (2NF): Each pizza's unique recipe determines the specific ingredients needed.
- 3. Prime Transitive Orders (3NF): Customer orders are directly linked to customers, not through intermediaries.
π Complete Example: Student Course Enrollment
Original Table (Not in 1NF):
| StudentID | StudentName | Courses |
|---|---|---|
| 1 | Zara Ali | Mathematics, Physics |
| 2 | Leo Chen | Chemistry, Biology |
| 3 | Kai Sato | Computer Science, Mathematics |
β Problem: The "Courses" column contains multiple values (not atomic)
1First Normal Form (1NF):
Split the Courses column into separate rows for each student:
| StudentID | StudentName | Course |
|---|---|---|
| 1 | Zara Ali | Mathematics |
| 1 | Zara Ali | Physics |
| 2 | Leo Chen | Chemistry |
| 2 | Leo Chen | Biology |
| 3 | Kai Sato | Computer Science |
| 3 | Kai Sato | Mathematics |
β Each cell now contains a single, atomic value
2Second Normal Form (2NF):
Create separate tables to eliminate partial dependencies:
Table: Students
| StudentID | StudentName |
|---|---|
| 1 | Zara Ali |
| 2 | Leo Chen |
| 3 | Kai Sato |
Table: Courses
| StudentID | Course |
|---|---|
| 1 | Mathematics |
| 1 | Physics |
| 2 | Chemistry |
| 2 | Biology |
| 3 | Computer Science |
| 3 | Mathematics |
β No partial dependencies - StudentName depends only on StudentID
3Third Normal Form (3NF):
Create a separate table for course details to remove transitive dependencies:
Table: Students
| StudentID | Name |
|---|---|
| 1 | Zara Ali |
| 2 | Leo Chen |
| 3 | Kai Sato |
Table: CourseDetails
| CourseID | CourseName |
|---|---|
| 1 | Mathematics |
| 2 | Physics |
| 3 | Chemistry |
| 4 | Biology |
| 5 | Computer Science |
Table: StudentCourses
| StudentID | CourseID |
|---|---|
| 1 | 1 |
| 1 | 2 |
| 2 | 3 |
| 2 | 4 |
| 3 | 5 |
| 3 | 1 |
β No transitive dependencies - Course information is in its own table
β οΈ Database Anomalies
Anomalies are problems that can occur when you have poorly designed databases. Normalization helps prevent these issues.
Insertion Anomalies
Anomalies that occur when you encounter difficulties while adding new data to the database.
- Incomplete Insertion: Can't add a new record without providing data for attributes that might not be relevant yet (e.g., can't add an order without a ship date)
- Partial Dependency: Attributes depend on only part of the primary key
- Redundant Data: Same information stored multiple times
Deletion Anomalies
- Loss of Non-Key Information: Deleting a record may unintentionally remove other important information
- Unintended Data Loss: Removing one piece of data causes loss of related data
- Inconsistent Data: Partial deletions can leave the database in an inconsistent state
Update Anomalies
- Data Inconsistency: Updating one instance of information may result in inconsistency if the same information is stored in multiple places
- Data Dependency: Same information appears in multiple records; updating it in one place while forgetting to update it elsewhere leads to dependency issues
π Exam Corner: Redundant Data
(d) Outline two issues caused by storing redundant data. [4]
Mark Scheme / Answers:
- May lead to update/deletion anomalies (data inconsistency).
- Address changes may result in duplicate addresses (e.g., inconsistent records).
- Consequence: Letters or fines may go to the wrong address.
- Storing data multiple times wastes storage space.
- Redundancy may slow down data retrieval and data entry.
π Exam Corner: Transaction Durability
(c) Transaction durability. [3]
Definition / Answer:
- Durability ensures that transactions are saved permanently.
- The user does not need to worry about the transaction being lost.
- This applies even in the event of power loss, crashes, or errors.
π Exam Corner: Derived Fields
(e) Outline why the inclusion of a non persistent derived field will not affect the normalization of a database. [3]
Mark Scheme / Answers:
- A derived field is created / temporary (does not exist in the table).
- So normalization rules do not apply (not affected by redundancy).
- No new dependencies are created.
π― Practice Question
Given the following sets, identify the functional dependencies and suggest how to normalize them:
- SET A: {StudentID, StudentName, StudentAddress}
- SET B: {CourseID, CourseName, CourseDescription}
- SET C: {EnrollmentID, StudentID, CourseID, EnrollDate}
Key Functional Dependencies:
- EnrollmentID β StudentID, EnrollDate
- EnrollmentID β CourseID
- StudentID β StudentName, StudentAddress
- CourseID β CourseName, CourseDescription
A3.4 ACID Properties
In a database management system, a transaction is a single logical unit of work. To ensure the reliability and integrity of these transactions, databases must follow the ACID rules.
AAtomicity
The "All or Nothing" rule. A transaction consists of multiple steps, and atomicity ensures that either all steps are completed successfully, or none of them are. If any part fails, the entire transaction is rolled back to its original state.
Example: Transferring money. If the money is deducted from Account A but the system crashes before adding it to Account B, the deduction is undone.
CConsistency
Ensures the database moves from one valid state to another, maintaining all predefined rules (constraints, cascades, triggers). It guarantees that no transaction can leave the database in an illegal state.
Example: In a banking system, the total amount of money across all accounts must remain the same before and after a transfer.
IIsolation
Transactions occur independently and concurrently without interference. One transaction should not see the intermediate results of another unfinished transaction. The final result must be the same as if transactions were executed one after another.
Example: Two people booking the same seat on a plane simultaneously. Isolation ensures only one successful booking is recorded.
DDurability
Once a transaction has been committed, it remains permanent, even in the case of a system failure, power loss, or crash. Changes are written to non-volatile storage (like a hard drive).
Example: After you see "Transaction Successful" on an ATM, your balance is saved forever, even if the bank's power goes out.
The ACID Bank Scenario
Imagine Rahul is transferring βΉ5,000 to Priya. Here is how ACID works:
- A
If the deduction from Rahul succeeds but the addition to Priya fails, the system undoes Rahul's deduction. Status: Atomic.
- C
The total sum of Rahul's and Priya's accounts remains the same before and after. Status: Consistent.
- I
If Rahul tries to send money to two people at once, the system handles them separately. Status: Isolated.
- D
Once the screen says "Transfer Done", if the bank's power goes out, the money stays transferred. Status: Durable.
A3.5 Database Schemas (Three-Schema Architecture)
The Three-Schema Architecture
Also known as the ANSI-SPARC Architecture, this model separates the user applications from the physical database. The goal is to provide Data Independenceβallowing changes at one level without affecting others.
1. External Level
Viewer's Perspective
The User View. Describes only the part of the database that is relevant to a specific user (e.g., a student sees grades, but not teacher salaries).
2. Conceptual Level
Logical Perspective
The Community View. Defines what data is stored and the relationships between them. This is where the DBA works.
3. Physical Level
Storage Perspective
The System View. Describes how the data is actually stored on the disks (data structures, file organizations, indexing).
Data Independence
Logical Data Independence
Capacity to change the conceptual schema without having to change external schemas.
Physical Data Independence
Capacity to change the internal schema without having to change the conceptual schema.
A3.6 Validation & Verification
Data Validation
Automated checks performed by the system to ensure data is reasonable and sensible before it is accepted.
Presence Check
Ensures data is not left blank (Required fields)
Range Check
Ensures data falls within specific limits (e.g., Age: 1-120)
Length Check
Ensures correct number of characters (e.g., Password min 8 chars)
Format Check
Ensures data matches a pattern (e.g., Postcode: AA11 1AA)
Type Check
Ensures correct data type (e.g., Number vs String)
Data Verification
Checks performed to ensure data entered exactly matches the original source document.
Double Entry
The same data is entered twice (usually by two different people) and the system compares them. If they don't match, an error is flagged.
Visual Check
The user manually compares the data on the screen with the original paper document to check for any transcription errors.
β οΈ Critical Difference: Data can pass Validation (ranges, formats) but still be wrong. Only Verification ensures it matches the source truth.
The "Golden Rule" Comparison
| Scenario | Validation | Verification |
|---|---|---|
| Entering age as "150" | Fails (Range Check) | Passes (it matches input) |
| Mis-typing "John" as "Jhon" | Passes (Format OK) | Fails (Mismatch) |
| Leaving field empty | Fails (Presence) | N/A |
A3.7 Entity-Relationship Diagrams (ERD)
An ERD is a visual representation of different entities within a system and how they relate to each other. The primary components are Entities, Attributes, and Relationships.
Entity
"The Thing"
Attribute
"The Property"
Relationship
"The Connection"
Crow's Foot Notation (Cardinality)
One-to-One (1:1)
One Person has exactly one Passport.
One-to-Many (1:M)
One Mother can have many Children.
Many-to-Many (M:N)
Many Students take many Courses.
π‘ Junction Table Rule: Databases cannot handle M:N directly. We must resolve this by creating a third table (e.g., "Enrollments") to turn it into two 1:M relationships.
Final Exam Strategy
- 1Spell table/field names exactly as per question.
- 2Use UPPERCASE for SLQ keywords (SELECT, FROM).
- 3Quote strings and dates in WHERE clauses.
- 4Verify capitalization of entity names (Singular).
- 5Check for junction tables in M:N relationships.
- 6Always process 1NF β 2NF β 3NF step-by-step.