CodeHaven - Master Computer Science

Important Note

These notes cover the main points for revision. They are great for reviewing key concepts, but for in-depth understanding, always keep your textbook nearby for reference.

💡 Exam Tip: Try not to use * in SQL queries. In exams, they usually ask for specific columns, so always mention them. For example, instead ofSELECT * FROM Employees;, useSELECT EmpName, Salary FROM Employees;

Databases are structured collections of data that allow efficient storage, retrieval, and manipulation of information. This chapter covers database structure, data types, primary keys, SQL queries, and validation techniques.

A3.1 Database Fundamentals

What is a Database?

A database is a persistent, organised store of data. If you just write data to a text file, it's hard to search and sort. Databases solve this.

Persistent: Data is saved on secondary storage (HDD/SSD).
Organised: Structured in tables to avoid duplication (redundancy).

Structure: The Relational Model

Table (Entity)

A collection of data about a specific thing (e.g., Employees, Departments).

Record (Row)

A single item in the table (e.g., One specific Hero).

Field (Column)

A specific attribute (e.g., HeroName, PowerLevel).

Keys

Primary Key (PK)

A unique identifier for every record in a table.

Example: HeroID. You can't have two heroes with ID 1.

Foreign Key (FK)

A field that links to a Primary Key in another table to create a relationship.

Example: DeptID in the Employees table links to the Departments table.

Benefits of Using a Relational Database

Data Integrity

Primary and Foreign keys enforce relationships, preventing orphan records and ensuring data remains accurate and reliable.

Data Consistency

Ensures data does not contradict itself across the database. A change in one place (e.g., customer address) reflects everywhere.

Reduced Redundancy

Normalization minimizes duplicate data storage. Data is stored once and referenced, saving space and reducing update errors.

Complex Querying

Uses declarative, industry-standard languages (SQL) to allow powerful and complex data retrieval and manipulation.

Data Concurrency

Allows multiple users to access and modify data simultaneously without conflicts, essential for modern multi-user applications.

Scalability Issues in a Relational Database

As applications grow, databases face challenges in handling increased load. Common scalability issues include:

📈 Data Volume Growth

As the amount of data grows, query performance can degrade, leading to slower response times and storage management issues.

👥 Increased Concurrency

A significant increase in simultaneous users can lead to concurrency issues, such as locking conflicts and reduced throughput.

🧩 Complexity of Queries

More sophisticated analysis and complex joins consume more CPU and memory resources, potentially bottlenecking the system.

💻 Hardware Limitations

Vertical scaling (upgrading a single server) has limits. Eventually, a single server becomes overloaded and cannot be upgraded further.

⚖️ Load Balancing

Distributing requests across multiple servers (horizontal scaling) is complex in relational databases due to consistency requirements.

🔍 Inefficient Indexing

Poorly designed or missing indexes can severely affect search times, making data retrieval slow as volume increases.

📝 Exam Corner: Database Administrator (DBA)

(e) Identify tasks that are carried out by the database administrator (DBA). [2]

Mark Scheme / Answers:

Data configuring / applying patches or upgrades
Setting permissions / passwords / access rights / ensuring security
Back up / recovery / archiving
Data cleansing / consistency checks on data / remove data errors

A3.2 SQL (Structured Query Language)

🏢 Scenario: Tech Solutions Inc.

Welcome, database admin! You have been recruited to manage the database for Tech Solutions Inc.We have employees, departments, projects, and devices. Your job is to organize this data efficiently.

1. Creating the Tables (DDL)

First, we need to build the structure to hold our data. We will create two tables: Departments and Employees.

Table 1: Departments

INT: Integer numbers.
VARCHAR(50): Text up to 50 characters.

Table 2: Employees

FOREIGN KEY: Links Employees to Departments.

2. Manipulating Data (DML)

INSERT: Adding Data

The INSERT command adds new rows to a table. Watch how our empty Employees table fills up!

Before Command (Empty)

EmpID	Name	Salary
No data...

After Command

EmpID	Name	Salary
101	Alice Smith	60000
102	Bob Jones	45000

UPDATE: Changing Data

Alice finished a big project! Let's update `Salary` from 60000 to 65000.

Before Update

EmpID	Name	Salary
101	Alice Smith	60000

After Update

EmpID	Name	Salary
101	Alice Smith	65000

DELETE: Removing Data

Charlie (EmpID 103) is leaving the company. We need to remove their record.

Before Delete

EmpID	Name
101	Alice Smith
103	Charlie
104	Dana

After Delete

EmpID	Name
101	Alice Smith
104	Dana

3. Querying Data (DQL)

Now that we have data, we need to retrieve it. To make these examples interesting, let's assume we've added a few more employees!

Current "Employees" Table Data

EmpID	EmpName	Salary	DeptID
101	Alice Smith	65000	1
102	Bob Jones	45000	1
104	Dana	80000	NULL
105	Eve	92000	2
106	Frank	55000	2

3.1 Selecting Specific Columns

Usually, we only need specific information, like just names and salaries. Listing columns is more efficient than selecting everything.

SQL Command

Result Output

EmpName	Salary
Alice Smith	65000
Bob Jones	45000
Dana	80000
Eve	92000
Frank	55000

3.2 Projection: Creating Calculated Columns

You can create new columns by performing calculations. Let's calculate a 10% bonus for each employee.

SQL Command

AS Bonus creates a new column name for the calculated value.

Result Output

EmpName	Salary	Bonus
Alice Smith	65000	6500
Bob Jones	45000	4500
Dana	80000	8000
Eve	92000	9200
Frank	55000	5500

3.3 Filtering with WHERE

We use the WHERE clause to filter records. Let's find employees with High Salaries (Salary > 50000).

SQL Command

Result Output

EmpName	Salary
Alice Smith	65000
Dana	80000
Eve	92000
Frank	55000

3.4 Pattern Matching (LIKE)

The LIKE operator searches for patterns in text. Use % for "any characters" and _ for "exactly one character".

Quick Reference:

% = Any number of characters (0 or more)

_ = Exactly one character

1. Starts with "A"

Matches: Alice Smith

2. Ends with "e"

Matches: Alice Smith

3. Contains "an"

Matches: Dana, Frank

4. Second char is "o"

Matches: Bob Jones

5. Second-last char is "t"

Matches: Alice Smith

6. Exactly 3 characters

Matches: Eve, Bob (first names)

7. Starts "D", ends "a"

Matches: Dana

8. Third char is "b"

Matches: Bob Jones

3.5 Removing Duplicates (DISTINCT)

Use DISTINCT to get unique values and remove duplicates from results.
Let's find all unique department IDs (without repeating the same department).

Without DISTINCT

DeptID
1
1
NULL
2
2

5 rows (with duplicates)

With DISTINCT

DeptID
1
2
NULL

3 rows (unique values only)

3.6 Sorting (ORDER BY)

Let's rank our employees by Salary, from Highest to Lowest (DESC).

SQL Command

Result Output

EmpName	Salary
Eve	92000
Dana	80000
Alice Smith	65000
Frank	55000
Bob Jones	45000

3.7 Aggregate Functions

Aggregate functions perform calculations on multiple rows and return a single result.

COUNTCount Rows

Result:

Total employees

SUMTotal Sum

Result:

337000

Total payroll

AVGAverage

Result:

67400

Average salary

MINMinimum

Result:

45000

Lowest salary (Bob)

MAXMaximum

Result:

92000

Highest salary (Eve)

📊 All Together

Returns all stats in one query!

4. Deep Dive: NULLs and Counting

What is NULL?

A NULL value means no data, unknown, or not applicable. It is NOT the same as 0 (zero) or an empty string ("").

Example: Dana has DeptID = NULL because she is a new hire and hasn't been assigned a department yet.

Finding NULLs

You cannot use = NULL. You must use IS NULL.

❌ Incorrect

Returns: No rows (empty result)

✅ Correct

Result Output

EmpName
Dana

The COUNT Mystery

Be careful! COUNT(*) behaves differently than COUNT(Column) when NULLs are involved.

1. COUNT(*)

Counts total rows, including NULLs.

2. COUNT(Column)

Counts Non-NULL values only.

A3.3 Database Normalization

What is Normalization?

Normalization is a process in database design that helps organize data into tables to reduce data redundancy and improve data integrity. The goal is to minimize data anomalies while maintaining data consistency.

The most commonly used normal forms are First Normal Form (1NF), Second Normal Form (2NF), and Third Normal Form (3NF). You progress through these forms systematically: 1NF → 2NF → 3NF.

1First Normal Form (1NF)

In 1NF, a table is organized into rows and columns, with each cell containing a single, atomic value. There should be no repeating groups, arrays, or nested structures within a cell.

Ensure that each column has a single data type
Ensure that each entry in a column contains a single, non-repeating value
Atomic - All data must be in its most basic form

2Second Normal Form (2NF)

In 2NF, the table must already be in 1NF, and:

Ensure that there are no partial dependencies
Each non-key attribute (column) should depend on the entire primary key
If a column is only partially dependent on the primary key, split the table
Non-prime - Non-prime attributes depend on the whole key

3Third Normal Form (3NF)

In 3NF, the table must already be in 2NF, and:

Eliminate transitive dependencies
Non-key attributes should not depend on other non-key attributes
If a non-key attribute depends on another non-key attribute, move it to a new table with a foreign key relationship
Prime Transitive - No transitive dependencies between non-prime attributes

🧠 Memory Aid: ANP

Acronym Breakdown:

A
Atomic (1NF)
All data stored in atomic values
N
Non-prime (2NF)
Non-prime attributes depend on the whole key
P
Prime Transitive (3NF)
No transitive dependencies

📖 Story: "Alex's New Pizza"

Alex decided to open a new pizza place, aiming to keep the business organized from the start. They named it "ANP's Pizza".

1. Atomic Ingredients (1NF): Alex ensures every ingredient is listed separately (tomato, cheese, basil).
2. Non-prime Recipes (2NF): Each pizza's unique recipe determines the specific ingredients needed.
3. Prime Transitive Orders (3NF): Customer orders are directly linked to customers, not through intermediaries.

📚 Complete Example: Student Course Enrollment

Original Table (Not in 1NF):

StudentID	StudentName	Courses
1	Zara Ali	Mathematics, Physics
2	Leo Chen	Chemistry, Biology
3	Kai Sato	Computer Science, Mathematics

❌ Problem: The "Courses" column contains multiple values (not atomic)

1First Normal Form (1NF):

Split the Courses column into separate rows for each student:

StudentID	StudentName	Course
1	Zara Ali	Mathematics
1	Zara Ali	Physics
2	Leo Chen	Chemistry
2	Leo Chen	Biology
3	Kai Sato	Computer Science
3	Kai Sato	Mathematics

✅ Each cell now contains a single, atomic value

2Second Normal Form (2NF):

Create separate tables to eliminate partial dependencies:

Table: Students

StudentID	StudentName
1	Zara Ali
2	Leo Chen
3	Kai Sato

Table: Courses

StudentID	Course
1	Mathematics
1	Physics
2	Chemistry
2	Biology
3	Computer Science
3	Mathematics

✅ No partial dependencies - StudentName depends only on StudentID

3Third Normal Form (3NF):

Create a separate table for course details to remove transitive dependencies:

Table: Students

StudentID	Name
1	Zara Ali
2	Leo Chen
3	Kai Sato

Table: CourseDetails

CourseID	CourseName
1	Mathematics
2	Physics
3	Chemistry
4	Biology
5	Computer Science

Table: StudentCourses

StudentID	CourseID
1	1
1	2
2	3
2	4
3	5
3	1

✅ No transitive dependencies - Course information is in its own table

⚠️ Database Anomalies

Anomalies are problems that can occur when you have poorly designed databases. Normalization helps prevent these issues.

Insertion Anomalies

Anomalies that occur when you encounter difficulties while adding new data to the database.

Incomplete Insertion: Can't add a new record without providing data for attributes that might not be relevant yet (e.g., can't add an order without a ship date)
Partial Dependency: Attributes depend on only part of the primary key
Redundant Data: Same information stored multiple times

Deletion Anomalies

Loss of Non-Key Information: Deleting a record may unintentionally remove other important information
Unintended Data Loss: Removing one piece of data causes loss of related data
Inconsistent Data: Partial deletions can leave the database in an inconsistent state

Update Anomalies

Data Inconsistency: Updating one instance of information may result in inconsistency if the same information is stored in multiple places
Data Dependency: Same information appears in multiple records; updating it in one place while forgetting to update it elsewhere leads to dependency issues

📝 Exam Corner: Redundant Data

(d) Outline two issues caused by storing redundant data. [4]

Mark Scheme / Answers:

May lead to update/deletion anomalies (data inconsistency).
Address changes may result in duplicate addresses (e.g., inconsistent records).
Consequence: Letters or fines may go to the wrong address.
Storing data multiple times wastes storage space.
Redundancy may slow down data retrieval and data entry.

📝 Exam Corner: Transaction Durability

Definition / Answer:

Durability ensures that transactions are saved permanently.
The user does not need to worry about the transaction being lost.
This applies even in the event of power loss, crashes, or errors.

📝 Exam Corner: Derived Fields

(e) Outline why the inclusion of a non persistent derived field will not affect the normalization of a database. [3]

Mark Scheme / Answers:

A derived field is created / temporary (does not exist in the table).
So normalization rules do not apply (not affected by redundancy).
No new dependencies are created.

🎯 Practice Question

Given the following sets, identify the functional dependencies and suggest how to normalize them:

SET A: {StudentID, StudentName, StudentAddress}
SET B: {CourseID, CourseName, CourseDescription}
SET C: {EnrollmentID, StudentID, CourseID, EnrollDate}

Key Functional Dependencies:

EnrollmentID → StudentID, EnrollDate
EnrollmentID → CourseID
StudentID → StudentName, StudentAddress
CourseID → CourseName, CourseDescription

📜 Final Exam Checklist

Always spell table and field names exactly as given in the question.
Write SQL commands in UPPERCASE (e.g., SELECT, FROM) to make your answer clear.
Check your WHERE clauses carefully. dates usually need quotes like "2023-12-25".
If asked to list names, check if they want FirstName, LastName, or both!
For normalization questions, always progress through 1NF → 2NF → 3NF systematically.
Remember ANP (Atomic, Non-prime, Prime Transitive) for the normal forms!

A3. Databases

Important Note

Table of Contents

A3.1 Database Fundamentals

A3.2 SQL (Structured Query Language)

A3.3 Database Normalization

A3.1 Database Fundamentals

What is a Database?

Structure: The Relational Model

Table (Entity)

Record (Row)

Field (Column)

Keys

Primary Key (PK)

Foreign Key (FK)

Benefits of Using a Relational Database

Data Integrity

Data Consistency

Reduced Redundancy

Complex Querying

Data Concurrency

Scalability Issues in a Relational Database

📈 Data Volume Growth

👥 Increased Concurrency

🧩 Complexity of Queries

💻 Hardware Limitations

⚖️ Load Balancing

🔍 Inefficient Indexing

📝 Exam Corner: Database Administrator (DBA)

A3.2 SQL (Structured Query Language)

🏢 Scenario: Tech Solutions Inc.

1. Creating the Tables (DDL)

Table 1: Departments

Table 2: Employees

2. Manipulating Data (DML)

INSERT: Adding Data

Before Command (Empty)

After Command

UPDATE: Changing Data

Before Update

After Update

DELETE: Removing Data

Before Delete

After Delete

3. Querying Data (DQL)

Current "Employees" Table Data

3.1 Selecting Specific Columns

SQL Command

Result Output

3.2 Projection: Creating Calculated Columns

SQL Command

Result Output

3.3 Filtering with WHERE

SQL Command

Result Output

3.4 Pattern Matching (LIKE)

Quick Reference:

3.5 Removing Duplicates (DISTINCT)

Without DISTINCT

With DISTINCT

3.6 Sorting (ORDER BY)

SQL Command

Result Output

3.7 Aggregate Functions

COUNTCount Rows

SUMTotal Sum

AVGAverage

MINMinimum

MAXMaximum

📊 All Together

4. Deep Dive: NULLs and Counting

What is NULL?

Finding NULLs

Result Output

The COUNT Mystery

A3.3 Database Normalization

What is Normalization?

1First Normal Form (1NF)

2Second Normal Form (2NF)

3Third Normal Form (3NF)