CodeHaven - Master Computer Science

Important Note

These notes cover the main points for revision. They are great for reviewing key concepts, but for in-depth understanding, always keep your textbook nearby for reference.

💡 Exam Tip: Try not to use * in SQL queries. In exams, they usually ask for specific columns, so always mention them. For example, instead ofSELECT * FROM Employees;, useSELECT EmpName, Salary FROM Employees;

Databases are structured collections of data that allow efficient storage, retrieval, and manipulation of information. This chapter covers database structure, data types, primary keys, SQL queries, and validation techniques.

A3.1 Database Fundamentals

What is a Database?

A database is a persistent, organised store of data. If you just write data to a text file, it's hard to search and sort. Databases solve this.

Persistent: Data is saved on secondary storage (HDD/SSD).
Organised: Structured in tables to avoid duplication (redundancy).

Structure: The Relational Model

Table (Entity)

A collection of data about a specific thing (e.g., Employees, Departments).

Record (Row)

A single item in the table (e.g., One specific Hero).

Field (Column)

A specific attribute (e.g., HeroName, PowerLevel).

Keys

Primary Key (PK)

A unique identifier for every record in a table.

Example: HeroID. You can't have two heroes with ID 1.

Foreign Key (FK)

A field that links to a Primary Key in another table to create a relationship.

Example: DeptID in the Employees table links to the Departments table.

Benefits of Using a Relational Database

Data Integrity

Primary and Foreign keys enforce relationships, preventing orphan records and ensuring data remains accurate and reliable.

Data Consistency

Ensures data does not contradict itself across the database. A change in one place (e.g., customer address) reflects everywhere.

Reduced Redundancy

Normalization minimizes duplicate data storage. Data is stored once and referenced, saving space and reducing update errors.

Complex Querying

Uses declarative, industry-standard languages (SQL) to allow powerful and complex data retrieval and manipulation.

Data Concurrency

Allows multiple users to access and modify data simultaneously without conflicts, essential for modern multi-user applications.

A3.4 ACID Properties

Understanding ACID

In database systems, ACID refers to a set of properties that guarantee database transactions are processed reliably. A transaction is a single logical unit of work (e.g., transferring money between accounts).

Atomicity

"All or Nothing"

A transaction must be completed in its entirety. If any part of the transaction fails, the entire transaction is rolled back to its previous state.

Consistency

"Preserving Rules"

A transaction must transform the database from one valid state to another. All predefined rules (constraints, cascades, triggers) must be followed.

Isolation

"Independence"

Transactions occur independently of one another. Even if multiple transactions happen simultaneously, the end result for each must be the same as if they had occurred sequentially.

Durability

"Permanent Storage"

Once a transaction is committed, it remains committed even in the event of a system failure (e.g., power outage or crash). Data is permanently written to non-volatile storage.

Real-World Scenario: Bank Transfer

The Transaction:

Transferring $100 from Account A to Account B.

Deduct $100 from Account A

Add $100 to Account B

How ACID protects this:

Atomicity:If power fails after step 1, the $100 is returned to Account A.
Consistency:The total amount of money in the bank ($A + $B) remains constant.

Scalability Issues in a Relational Database

As applications grow, databases face challenges in handling increased load. Common scalability issues include:

📈 Data Volume Growth

As the amount of data grows, query performance can degrade, leading to slower response times and storage management issues.

👥 Increased Concurrency

A significant increase in simultaneous users can lead to concurrency issues, such as locking conflicts and reduced throughput.

🧩 Complexity of Queries

More sophisticated analysis and complex joins consume more CPU and memory resources, potentially bottlenecking the system.

💻 Hardware Limitations

Vertical scaling (upgrading a single server) has limits. Eventually, a single server becomes overloaded and cannot be upgraded further.

⚖️ Load Balancing

Distributing requests across multiple servers (horizontal scaling) is complex in relational databases due to consistency requirements.

🔍 Inefficient Indexing

Poorly designed or missing indexes can severely affect search times, making data retrieval slow as volume increases.

📝 Exam Corner: Database Administrator (DBA)

(e) Identify tasks that are carried out by the database administrator (DBA). [2]

Mark Scheme / Answers:

Data configuring / applying patches or upgrades
Setting permissions / passwords / access rights / ensuring security
Back up / recovery / archiving
Data cleansing / consistency checks on data / remove data errors

A3.2 SQL (Structured Query Language)

🏢 Scenario: Tech Solutions Inc.

Welcome, database admin! You have been recruited to manage the database for Tech Solutions Inc.We have employees, departments, projects, and devices. Your job is to organize this data efficiently.

1. Creating the Tables (DDL)

First, we need to build the structure to hold our data. We will create two tables: Departments and Employees.

Table 1: Departments

INT: Integer numbers.
VARCHAR(50): Text up to 50 characters.

Table 2: Employees

FOREIGN KEY: Links Employees to Departments.

2. Modifying Structure (DDL)

ALTER TABLE: Changing Table Structure

What if we created the table but later realized we needed an extra column? Or maybe a column data type needs changing? We use ALTER TABLE to modify the structure of an existing table.

ADD Column

Add a new column for Employee Phone Numbers.

MODIFY Column

Increase the size of the EmpName column to 100.

DROP Column

Remove the Phone column if it's no longer needed.

3. Manipulating Data (DML)

INSERT: Adding Data

The INSERT command adds new rows to a table. Watch how our empty Employees table fills up!

Before Command (Empty)

EmpID	Name	Salary
No data...

After Command

EmpID	Name	Salary
101	Alice Smith	60000
102	Bob Jones	45000

UPDATE: Changing Data

Alice finished a big project! Let's update `Salary` from 60000 to 65000.

Before Update

EmpID	Name	Salary
101	Alice Smith	60000

After Update

EmpID	Name	Salary
101	Alice Smith	65000

DELETE: Removing Data

Charlie (EmpID 103) is leaving the company. We need to remove their record.

Before Delete

EmpID	Name
101	Alice Smith
103	Charlie
104	Dana

After Delete

EmpID	Name
101	Alice Smith
104	Dana

4. Querying Data (DQL)

Now that we have data, we need to retrieve it. To make these examples interesting, let's assume we've added a few more employees!

Current "Employees" Table Data

EmpID	EmpName	Salary	DeptID
101	Alice Smith	65000	1
102	Bob Jones	45000	1
104	Dana	80000	NULL
105	Eve	92000	2
106	Frank	55000	2

3.1 Selecting Specific Columns

Usually, we only need specific information, like just names and salaries. Listing columns is more efficient than selecting everything.

SQL Command

Result Output

EmpName	Salary
Alice Smith	65000
Bob Jones	45000
Dana	80000
Eve	92000
Frank	55000

3.2 Projection: Creating Calculated Columns

You can create new columns by performing calculations. Let's calculate a 10% bonus for each employee.

SQL Command

AS Bonus creates a new column name for the calculated value.

Result Output

EmpName	Salary	Bonus
Alice Smith	65000	6500
Bob Jones	45000	4500
Dana	80000	8000
Eve	92000	9200
Frank	55000	5500

3.3 Filtering with WHERE

We use the WHERE clause to filter records. Let's find employees with High Salaries (Salary > 50000).

SQL Command

Result Output

EmpName	Salary
Alice Smith	65000
Dana	80000
Eve	92000
Frank	55000

3.4 Pattern Matching (LIKE)

The LIKE operator searches for patterns in text. Use % for "any characters" and _ for "exactly one character".

Quick Reference:

% = Any number of characters (0 or more)

_ = Exactly one character

1. Starts with "A"

Matches: Alice Smith

2. Ends with "e"

Matches: Alice Smith

3. Contains "an"

Matches: Dana, Frank

4. Second char is "o"

Matches: Bob Jones

5. Second-last char is "t"

Matches: Alice Smith

6. Exactly 3 characters

Matches: Eve, Bob (first names)

7. Starts "D", ends "a"

Matches: Dana

8. Third char is "b"

Matches: Bob Jones

3.5 Removing Duplicates (DISTINCT)

Use DISTINCT to get unique values and remove duplicates from results.
Let's find all unique department IDs (without repeating the same department).

Without DISTINCT

DeptID
1
1
NULL
2
2

5 rows (with duplicates)

With DISTINCT

DeptID
1
2
NULL

3 rows (unique values only)

3.6 Sorting (ORDER BY)

Let's rank our employees by Salary, from Highest to Lowest (DESC).

SQL Command

Result Output

EmpName	Salary
Eve	92000
Dana	80000
Alice Smith	65000
Frank	55000
Bob Jones	45000

3.7 Aggregate Functions

Aggregate functions perform calculations on multiple rows and return a single result.

COUNTCount Rows

Result:

Total employees

SUMTotal Sum

Result:

337000

Total payroll

AVGAverage

Result:

67400

Average salary

MINMinimum

Result:

45000

Lowest salary (Bob)

MAXMaximum

Result:

92000

Highest salary (Eve)

📊 All Together

Returns all stats in one query!

5. Nested Subqueries

A subquery (or nested query) is a query within another query. We use the result of the inner query to filter or feed data into the outer query.

Example: Finding Top Earners

Let's say we want to find the names of all employees who earn more than the average salary. We don't know the exact mathematical average beforehand! We have to calculate it first using a subquery.

SQL Command

The inner query (SELECT AVG(Salary) FROM Employees) runs first and calculates the average (67400).
The outer query then finds everyone earning more than 67400.

Result Output

EmpName	Salary
Dana	80000
Eve	92000

6. Deep Dive: NULLs and Counting

What is NULL?

A NULL value means no data, unknown, or not applicable. It is NOT the same as 0 (zero) or an empty string ("").

Example: Dana has DeptID = NULL because she is a new hire and hasn't been assigned a department yet.

Finding NULLs

You cannot use = NULL. You must use IS NULL.

❌ Incorrect

Returns: No rows (empty result)

✅ Correct

Result Output

EmpName
Dana

The COUNT Mystery

Be careful! COUNT(*) behaves differently than COUNT(Column) when NULLs are involved.

1. COUNT(*)

Counts total rows, including NULLs.

2. COUNT(Column)

Counts Non-NULL values only.

A3.3 Database Normalization

What is Normalization?

Normalization is a process in database design that helps organize data into tables to reduce data redundancy and improve data integrity. The goal is to minimize data anomalies while maintaining data consistency.

The most commonly used normal forms are First Normal Form (1NF), Second Normal Form (2NF), and Third Normal Form (3NF). You progress through these forms systematically: 1NF → 2NF → 3NF.

1First Normal Form (1NF)

In 1NF, a table is organized into rows and columns, with each cell containing a single, atomic value. There should be no repeating groups, arrays, or nested structures within a cell.

Ensure that each column has a single data type
Ensure that each entry in a column contains a single, non-repeating value
Atomic - All data must be in its most basic form

2Second Normal Form (2NF)

In 2NF, the table must already be in 1NF, and:

Ensure that there are no partial dependencies
Each non-key attribute (column) should depend on the entire primary key
If a column is only partially dependent on the primary key, split the table
Non-prime - Non-prime attributes depend on the whole key

3Third Normal Form (3NF)

In 3NF, the table must already be in 2NF, and:

Eliminate transitive dependencies
Non-key attributes should not depend on other non-key attributes
If a non-key attribute depends on another non-key attribute, move it to a new table with a foreign key relationship
Prime Transitive - No transitive dependencies between non-prime attributes

🧠 Memory Aid: ANP

Acronym Breakdown:

A
Atomic (1NF)
All data stored in atomic values
N
Non-prime (2NF)
Non-prime attributes depend on the whole key
P
Prime Transitive (3NF)
No transitive dependencies

📖 Story: "Alex's New Pizza"

Alex decided to open a new pizza place, aiming to keep the business organized from the start. They named it "ANP's Pizza".

1. Atomic Ingredients (1NF): Alex ensures every ingredient is listed separately (tomato, cheese, basil).
2. Non-prime Recipes (2NF): Each pizza's unique recipe determines the specific ingredients needed.
3. Prime Transitive Orders (3NF): Customer orders are directly linked to customers, not through intermediaries.

📚 Complete Example: Student Course Enrollment

Original Table (Not in 1NF):

StudentID	StudentName	Courses
1	Zara Ali	Mathematics, Physics
2	Leo Chen	Chemistry, Biology
3	Kai Sato	Computer Science, Mathematics

❌ Problem: The "Courses" column contains multiple values (not atomic)

1First Normal Form (1NF):

Split the Courses column into separate rows for each student:

StudentID	StudentName	Course
1	Zara Ali	Mathematics
1	Zara Ali	Physics
2	Leo Chen	Chemistry
2	Leo Chen	Biology
3	Kai Sato	Computer Science
3	Kai Sato	Mathematics

✅ Each cell now contains a single, atomic value

2Second Normal Form (2NF):

Create separate tables to eliminate partial dependencies:

Table: Students

StudentID	StudentName
1	Zara Ali
2	Leo Chen
3	Kai Sato

Table: Courses

StudentID	Course
1	Mathematics
1	Physics
2	Chemistry
2	Biology
3	Computer Science
3	Mathematics

✅ No partial dependencies - StudentName depends only on StudentID

3Third Normal Form (3NF):

Create a separate table for course details to remove transitive dependencies:

Table: Students

StudentID	Name
1	Zara Ali
2	Leo Chen
3	Kai Sato

Table: CourseDetails

CourseID	CourseName
1	Mathematics
2	Physics
3	Chemistry
4	Biology
5	Computer Science

Table: StudentCourses

StudentID	CourseID
1	1
1	2
2	3
2	4
3	5
3	1

✅ No transitive dependencies - Course information is in its own table

⚠️ Database Anomalies

Anomalies are problems that can occur when you have poorly designed databases. Normalization helps prevent these issues.

Insertion Anomalies

Anomalies that occur when you encounter difficulties while adding new data to the database.

Incomplete Insertion: Can't add a new record without providing data for attributes that might not be relevant yet (e.g., can't add an order without a ship date)
Partial Dependency: Attributes depend on only part of the primary key
Redundant Data: Same information stored multiple times

Deletion Anomalies

Loss of Non-Key Information: Deleting a record may unintentionally remove other important information
Unintended Data Loss: Removing one piece of data causes loss of related data
Inconsistent Data: Partial deletions can leave the database in an inconsistent state

Update Anomalies

Data Inconsistency: Updating one instance of information may result in inconsistency if the same information is stored in multiple places
Data Dependency: Same information appears in multiple records; updating it in one place while forgetting to update it elsewhere leads to dependency issues

📝 Exam Corner: Redundant Data

(d) Outline two issues caused by storing redundant data. [4]

Mark Scheme / Answers:

May lead to update/deletion anomalies (data inconsistency).
Address changes may result in duplicate addresses (e.g., inconsistent records).
Consequence: Letters or fines may go to the wrong address.
Storing data multiple times wastes storage space.
Redundancy may slow down data retrieval and data entry.

📝 Exam Corner: Transaction Durability

Definition / Answer:

Durability ensures that transactions are saved permanently.
The user does not need to worry about the transaction being lost.
This applies even in the event of power loss, crashes, or errors.

📝 Exam Corner: Derived Fields

(e) Outline why the inclusion of a non persistent derived field will not affect the normalization of a database. [3]

Mark Scheme / Answers:

A derived field is created / temporary (does not exist in the table).
So normalization rules do not apply (not affected by redundancy).
No new dependencies are created.

🎯 Practice Question

Given the following sets, identify the functional dependencies and suggest how to normalize them:

SET A: {StudentID, StudentName, StudentAddress}
SET B: {CourseID, CourseName, CourseDescription}
SET C: {EnrollmentID, StudentID, CourseID, EnrollDate}

Key Functional Dependencies:

EnrollmentID → StudentID, EnrollDate
EnrollmentID → CourseID
StudentID → StudentName, StudentAddress
CourseID → CourseName, CourseDescription

A3.4 ACID Properties

In a database management system, a transaction is a single logical unit of work. To ensure the reliability and integrity of these transactions, databases must follow the ACID rules.

AAtomicity

The "All or Nothing" rule. A transaction consists of multiple steps, and atomicity ensures that either all steps are completed successfully, or none of them are. If any part fails, the entire transaction is rolled back to its original state.

Example: Transferring money. If the money is deducted from Account A but the system crashes before adding it to Account B, the deduction is undone.

CConsistency

Ensures the database moves from one valid state to another, maintaining all predefined rules (constraints, cascades, triggers). It guarantees that no transaction can leave the database in an illegal state.

Example: In a banking system, the total amount of money across all accounts must remain the same before and after a transfer.

IIsolation

Transactions occur independently and concurrently without interference. One transaction should not see the intermediate results of another unfinished transaction. The final result must be the same as if transactions were executed one after another.

Example: Two people booking the same seat on a plane simultaneously. Isolation ensures only one successful booking is recorded.

DDurability

Once a transaction has been committed, it remains permanent, even in the case of a system failure, power loss, or crash. Changes are written to non-volatile storage (like a hard drive).

Example: After you see "Transaction Successful" on an ATM, your balance is saved forever, even if the bank's power goes out.

The ACID Bank Scenario

Imagine Rahul is transferring ₹5,000 to Priya. Here is how ACID works:

A
If the deduction from Rahul succeeds but the addition to Priya fails, the system undoes Rahul's deduction. Status: Atomic.
C
The total sum of Rahul's and Priya's accounts remains the same before and after. Status: Consistent.
I
If Rahul tries to send money to two people at once, the system handles them separately. Status: Isolated.
D
Once the screen says "Transfer Done", if the bank's power goes out, the money stays transferred. Status: Durable.

A3.5 Database Schemas (Three-Schema Architecture)

The Three-Schema Architecture

Also known as the ANSI-SPARC Architecture, this model separates the user applications from the physical database. The goal is to provide Data Independence—allowing changes at one level without affecting others.

1. External Level

Viewer's Perspective

The User View. Describes only the part of the database that is relevant to a specific user (e.g., a student sees grades, but not teacher salaries).

2. Conceptual Level

Logical Perspective

The Community View. Defines what data is stored and the relationships between them. This is where the DBA works.

3. Physical Level

Storage Perspective

The System View. Describes how the data is actually stored on the disks (data structures, file organizations, indexing).

Data Independence

Logical Data Independence

Capacity to change the conceptual schema without having to change external schemas.

Physical Data Independence

Capacity to change the internal schema without having to change the conceptual schema.

A3.6 Validation & Verification

Data Validation

Automated checks performed by the system to ensure data is reasonable and sensible before it is accepted.

Presence Check

Ensures data is not left blank (Required fields)

Range Check

Ensures data falls within specific limits (e.g., Age: 1-120)

Length Check

Ensures correct number of characters (e.g., Password min 8 chars)

Format Check

Ensures data matches a pattern (e.g., Postcode: AA11 1AA)

Type Check

Ensures correct data type (e.g., Number vs String)

Data Verification

Checks performed to ensure data entered exactly matches the original source document.

Double Entry

The same data is entered twice (usually by two different people) and the system compares them. If they don't match, an error is flagged.

Visual Check

The user manually compares the data on the screen with the original paper document to check for any transcription errors.

⚠️ Critical Difference: Data can pass Validation (ranges, formats) but still be wrong. Only Verification ensures it matches the source truth.

The "Golden Rule" Comparison

Scenario	Validation	Verification
Entering age as "150"	Fails (Range Check)	Passes (it matches input)
Mis-typing "John" as "Jhon"	Passes (Format OK)	Fails (Mismatch)
Leaving field empty	Fails (Presence)	N/A

A3.7 Entity-Relationship Diagrams (ERD)

An ERD is a visual representation of different entities within a system and how they relate to each other. The primary components are Entities, Attributes, and Relationships.

Entity

"The Thing"

Attribute

"The Property"

Relate

Relationship

"The Connection"

Crow's Foot Notation (Cardinality)

One-to-One (1:1)

One Person has exactly one Passport.

One-to-Many (1:M)

Many

One Mother can have many Children.

Many-to-Many (M:N)

Many Students take many Courses.

💡 Junction Table Rule: Databases cannot handle M:N directly. We must resolve this by creating a third table (e.g., "Enrollments") to turn it into two 1:M relationships.

Final Exam Strategy

1Spell table/field names exactly as per question.
2Use UPPERCASE for SLQ keywords (SELECT, FROM).
3Quote strings and dates in WHERE clauses.

4Verify capitalization of entity names (Singular).
5Check for junction tables in M:N relationships.
6Always process 1NF → 2NF → 3NF step-by-step.

A3. Databases

Important Note

Table of Contents

A3.1 Database Fundamentals

A3.2 SQL (Structured Query Language)

A3.3 Database Normalization

A3.4 ACID Properties

A3.5 Database Schemas

A3.6 Validation & Verification

A3.7 ERD & Crow's Foot

A3.1 Database Fundamentals

What is a Database?

Structure: The Relational Model

Table (Entity)

Record (Row)

Field (Column)

Keys

Primary Key (PK)

Foreign Key (FK)

Benefits of Using a Relational Database

Data Integrity

Data Consistency

Reduced Redundancy

Complex Querying

Data Concurrency

A3.4 ACID Properties

Understanding ACID

Atomicity

Consistency

Isolation

Durability

Real-World Scenario: Bank Transfer

The Transaction:

How ACID protects this:

Scalability Issues in a Relational Database

📈 Data Volume Growth

👥 Increased Concurrency

🧩 Complexity of Queries

💻 Hardware Limitations

⚖️ Load Balancing

🔍 Inefficient Indexing

📝 Exam Corner: Database Administrator (DBA)

A3.2 SQL (Structured Query Language)

🏢 Scenario: Tech Solutions Inc.

1. Creating the Tables (DDL)

Table 1: Departments

Table 2: Employees

2. Modifying Structure (DDL)

ALTER TABLE: Changing Table Structure

ADD Column

MODIFY Column

DROP Column

3. Manipulating Data (DML)

INSERT: Adding Data

Before Command (Empty)

After Command

UPDATE: Changing Data

Before Update

After Update

DELETE: Removing Data

Before Delete

After Delete

4. Querying Data (DQL)

Current "Employees" Table Data

3.1 Selecting Specific Columns

SQL Command

Result Output

3.2 Projection: Creating Calculated Columns

SQL Command

Result Output

3.3 Filtering with WHERE

SQL Command

Result Output

3.4 Pattern Matching (LIKE)

Quick Reference:

3.5 Removing Duplicates (DISTINCT)

Without DISTINCT

With DISTINCT

3.6 Sorting (ORDER BY)

SQL Command