Table of Contents

1 Rational Algebra

1.1 Basic

  • select \(\sigma_{P}(r)\)
  • project \(\Pi_{S}(r)\)
  • rename \(\rho_{x(A_1,A_2,...,A_n)}(r)\)
  • union \(r\cup s\)
  • difference \(r-s\)
  • cartesian-product \(r\times s\)

1.2 Addition

  • intersection \(r\cap s = r-(r-s)\)
  • natual join \(r\Join s = \Pi_{R\cup S}(\sigma_{r.A_1=s.A_1 \land ...}(r\times s))\)
    • theta join \(r\Join_{\theta}s = \sigma_{\theta}(r\times s)\)
  • devision

    \[temp1 \leftarrow \Pi_{R-S}(r)\] \[temp2 \leftarrow \Pi_{R-S}((temp1\times s) - \Pi_{R-S,S}(r))\] \[result = temp1 - temp2\]

    Table 1: R
    Student Task
    Peter Database1
    Peter Database2
    Peter Compiler1
    Sara Database1
    Sara Database2
    Mary Database1
    Mary Comipler1
    Table 2: S
    Task
    Database1
    Database2
    Table 3: R÷ S
    Student
    Peter
    Sara
  • aggregation \(group_column\zeta_{aggre\_func(column)}(r)\)

1.3 Modification

  • delete \(r\leftarrow r - E\)
  • insert \(r\leftarrow r\cup E\)
  • update \(r\leftarrow \Pi_{F_1,F_2,...,F_n}(r)\)

2 SQL Language

2.1 Command

2.1.1 CREATE & DROP & ALTER

CREATE DATABASE dbname;
USE DATABASE dbname;
CREATE TABLE table_name (column_name TYPE Constrains, ... );
DESC table_name;
DROP TABLE;
ALTER TABLE table_name

ADD COLUMN columnName ...
ADD PRIMARY KEY (columnName)
RENAME TO tableNewName
CHANGE COLUMN columnOldName columnNewName TYPE ...
MODIFY COLUMN columnName TYPE...
DROP COLUMN columnName

2.1.2 INSERT & UPDATE & DELETE

INSERT INTO tableName [(columnName1, columnName2, ...)] VALUES ('value1', 'value2', ...);
UPDATE tableName SET columnName1 = 'value1', columnName2 = 'value2' WHERE expr;
DELETE FROM tableName WHERE expr;

2.1.3 Other Keywords

  • REGEXP pattern
  • IN ('value1', 'value2', …)
  • columnName BETWEEN value1 and value2

    Equivalent to "columnName > value1 and columnName < value2"

  • NOT

    When 'NOT' use with 'BETWEEN' and 'LIKE', 'NOT' must follow with 'WHERE' or 'AND/OR'.
    'NOT IN' is an exception. "IS NOT NULL" also.

  • SHOW

    SHOW CREATE TABLE tableName;
    SHOW COLUMNS FROM tableName;
    SHOW INDEX FROM tableName;
    SHOW WARNINGS;
    
  • FIRST, LAST, BEFORE, AFTER, SECOND…
  • CASE

    UPDATE tableName SET columnName =
    CASE
      WHEN column_1 = somevalue1
        THEN newValue;
    
  • ORDER BY

    ORDER BY columnName [ASC/DESC]

  • SUM, AVG, MAX, MIN, COUNT

    • match with GROUPBY

    E.g:SUM(columnName) … GROUP BY columnName

  • GROUP BY

    remove the duplicates

    • SELECT columnName1, columnName2 FROM tableName GROUP BY columnName2
  • Having

    The HAVING clause was added to SQL because the WHERE keyword could not be used with aggregate functions. E.g: HAVING count(columnName) > 5

  • EXISTS, NOT EXISTS are always using in corelated subquery.
  • UNION
    • Suppress the duplicates by default. UNION ALL can keep the duplicates.
  • WITH

    Define temporary view

    WITH temp_view_name(columnName...) as
         select statement
    SELECT ...
    FROM temp_view_name
    WHERE ...
    
  • RECURSIVE

    • CREAT RECURSIVE VIEW
    • WITH RECURSIVE
    WITH RECURSIVE empl(employee_name, manager_name) as (
         SELECT employee_name, manager_name
         FROM manager
        UNION
         SELECT manager.employee_name,empl.manager_name
         FROM manager, empl
         WHERE manager.manager_name = empl.employee_name
    )
    SELECT *
    FROM empl
    
  • GRANT & REVOKE
    • GRANT statement ON table TO who
    • REVOKE statement ON table FROM who

2.2 Datatype

CHAR, VARCHAR, BLOB, INT, DEC, DATE, DATETIME

2.3 Join

2.3.1 Overview

sqljoins.png

2.3.2 Inner Join

An inner join is just a cartesian join with
some result rows removed by a condition in the query.

select * from table_a a
inner join * from table_b b

2.4 Subquery

2.4.1 Noncorrelated Subquery

A subquery that stands alone and doesn't reference
anything from the outer query.

RDBMS will excute inner first, then excute outer.

2.4.2 Correlated Subquery

A subquery that relies on values returned from
the outer query.(Slow)

3 Design

3.1 Steps

  1. Find the one thing need to be described.
  2. List necessary information about this thing.(Depends on how to use this table)
  3. Break down the information into pieces .

3.2 Schema Pattern

3.2.1 One to One

  1. When to use
    1. Allow you to write fast queries.
    2. If you have a column containing values you don't yet know.Iso NULL value.
    3. Make data less accessible.
    4. To store a large piece of data, like BLOB.

3.2.2 One to Many

3.2.3 Many to Many

use junction table.

3.3 Functional Dependency

3.3.1 Definition

\[\forall t,u \in R\ (t[\bar A]= u[\bar A]) \to (t[\bar B] = u[\bar B])\]

3.3.2 Functional Dependency

If we have a functional dependency \(\bar A \to \bar B\),

  • Trivial

    \(\bar B \subseteq \bar A\) and \(\bar A \to \bar A \cup \bar B\) elso.

  • Nontrivial

    \(\bar B \nsubseteq \bar A\)

  • Completely nontrivial

    \(\bar A \cap \bar B = \emptyset\)

  1. Partial Functional Dependency

    When a non-key column is dependent on some, but not all, of the composite PK.

  2. Transitive Functional Dependency

    When any non-key column is dependent on any of the other non-key columns.

3.3.3 Rules For FD

  • Splitting rule

    If \(\bar A \to B_1, B_2\) , then \(\bar A \to B_1\ \bar A \to B_2\)

  • Combining rule

    If \(\bar A \to B_1\ \bar A \to B_2\), then \(\bar A \to B_1, B_2\)

  • Transitive rule

    If \(\bar A \to \bar B\) and \(\bar B \to \bar C\), then \(\bar A \to \bar C\)

3.3.4 Closure of Attributes

Given relation, FDs, set of attributes \(\bar A\)
Find all B such that \(\bar A \to B\) .

3.4 Multivalued Dependency

3.4.1 Definition

\(\forall t, u\in R:\ t[\bar A] = u[\bar A]\) then
\(\exists v \in R:\)
\(v[\bar A] = t[\bar A]\) and
\(v[\bar B] = t[\bar B]\) and
\(v[rest] = u[rest]\)

MVD says, If two tuples have same value for \(\bar A\),
then we have every combination for \(\bar B\) value and the rest.

tuple \(\bar A\) \(\bar B\) rest
t \(\bar a\) \(\bar b_1\) \(\bar r_1\)
u \(\bar a\) \(\bar b_2\) \(\bar r_2\)
v \(\bar a\) \(\bar b_1\) \(\bar r_2\)

Note that, there aslo must exist w:

w \(\bar a\) \(\bar b_2\) \(\bar r_1\)
  • Trivial MVDs

    \(\bar B\subseteq \bar A\) or \(\bar A\cup \bar B = all\ attributes\)
    always satisfied MVD.

    E.g. for first case, Consider \(\bar{AB} \twoheadrightarrow \bar B\).

  • Nontrivial

    otherwise.

MVD is a tuple-generating dependency.

3.4.2 Rules For MVD

  • FD is a MVD
  • Intersection rule

    If \((\bar A\twoheadrightarrow \bar B) \land (\bar A\twoheadrightarrow \bar C)\) , then \(\bar A\twoheadrightarrow \bar B\cap \bar C\) .

  • Transitive rule

    If \((\bar A\twoheadrightarrow \bar B) \land (\bar B\twoheadrightarrow \bar C)\) , then \(\bar A\twoheadrightarrow \bar C - \bar B\) .

4 Normalization

4.1 1NF

Data in your column is atomic if it's been broken
down into the smallest pieces that you need.

  • Rule 1:

    A column with atomic data can't have several values of
    the same type of data in that column.
    One example obeys the rule 1:

    foodname ingredients
    bread flour, milk, egg, yeast, oil
    salad lettuce, tomato, cucumber
  • Rule 2:

    A table with atomic data can't have multiple columns
    with the same type of data.

    teacher student1 student2
    Ms.Mary Joe Ron

4.2 2NF

  • Rule 1: Be in 1NF
  • Rule 2: Have no partial functional dependencies.

4.3 3NF

  • Rule 1: Be in 2NF
  • Rule 2: Have no transitive dependencies.

4.4 Boyce-Codd Normal Form(BCNF, 3.5NF)

FD leads to the BCNF.

  • Definition

    Relation R with FDs is in BCNF if:
    For each nontrivial \(\bar A\to B\), \(\bar A\) is a key.

4.4.1 Validation Example

R(A, B, C, D)
FDs: \(AC\to D,\ D\to A,\ D\to C,\ D\to B\)
For every \(\bar{left}\) can determine all the attributes.

4.5 4NF

  • Definition

    Relation R with MVDs is in 4NF if:
    For each nontrivial \(\bar A\twoheadrightarrow \bar B\), A is a key.

4NF is in BCNF.

5 Subclasses

5.1 Complete vs. Incomplete(Partial)

Complete: Every object is in at least one subclass.

5.2 Overlapping vs. Disjoint(Exclusive)

Overlapping: One object is in two+ subclasses.

5.3 How to design?

3 choices:

  1. Subclass relations contain superclass key + specialized attrs
  2. Subclass relations contain all attributes
  3. One relation containing all superclass + subclass attrs

Best translation may depend on properties:
Heavily overlapping -> design 3
Disjoint, complete ->design 2

Examples:

  1. S(K_, A), S1(K_, B), S2(K_, C)
  2. S(K_, A), S1(K_, A, B), S2(K_, A, C)
  3. S(K_, A, B, C)

6 Constraints

6.1 Motivation

Constrain allowable database states.(static)

6.2 Syntax

Major keywords: PK, FK, UNIQUE, CHECK

  • Examples:
Create ...
{
    columnName type CHECK (columnName IN ('value1', 'value2'));
}

ADD CONSTRAINT CHECK columnName > 1;

CHECK 'A' = SUBSTRING(columnName, 1, 1);

6.3 Foreign Key

6.3.1 Facts

  1. A FK can have different name than the parent key.
  2. FK values can be NULL.
  3. We can make sure a FK contains a meaningful value by using a constraint .
  4. The FK doesn't have to be the primary key of the parent table, but it must be unique.

6.3.2 Creation

CREATE TABLE tableName
(
    ...
    columnName TYPE NOT NULL,
    [CONSTRAINT constraint_name,]
    FOREIGN KEY (foreign_key_name)
    REFERENCES parent_tableName (parent_columnName)
)

You can name constraintname and foreignkeyname whatever you like.

7 Triggers

7.1 Motivation

  • To enforce constraints(Dynamic)
  • Move logic from apps into DBMS

7.2 Usage

  • Event-Condition-Action Rules

    When event occurs, check condition; if true, do action.

  • Syntax

    Create Trigger name
    Before|After|Instead of events
    [referencing-variables]
    [For Each Row]
    [when (condition)]
    action

    • events

      insert on T
      delete on T
      update [of C1,…,Cn] on T

    • [For Each Row]

      Determines whether the trigger is row-level or statement-level

    • referencing-variables

      Depends on [For Each Row]
      old row as var
      new row as var
      old table as var
      new table as var

    • condition

      In when or where clause depends on the SQL Implementation.

8 Indexes

8.1 Usage

Different between full table scans and immediate location of tuples.

8.2 Underlying Data Structures

  • Balanced trees (B tree, B+ tree)

    When uses ">, <, >=, <=" in query.

  • Hashtable

    When uses "=" in query.

8.3 SQL Syntax

CREATE INDEX IndexName ON T(A1,A2...)
CREATE UNIQUE INDEX ...
DROP INDEX IndexName

8.4 Downsides

  • Extra space
  • Index creation
  • Index maintenance(Important)

    When updates database, indexes will also be updated.

8.5 Upsides

Benefits depends on:

  • Data distributions
  • Query vs. update load
  • Size of table(and possibly layout)

8.6 Physical Design Advisors

  • Input (database statistics and workload)
  • Output (recommended indexes)

QueryOptimizer.png

9 Transaction

9.1 Motivation

  • Concurrent database access
  • Resilience to system failures

9.2 Properties

  • A(Atomicity)

    Each transaction is "all-or-nothing", never left half done.

  • C(Consistency)

    Can assume all constrants hold when transaction begins.
    Must guarantee all constraints hold when transaction ends.
    Serializability -> constraints always hold

  • I(Isolation)

    Serializability: Execution must be equivalent to some
    sequential(serial) order of all transactions.(e.g. T9, T1, T2, T3, …)

  • D(Durability)

    If system crashes after transaction commits,
    all effects of transaction remain in database.

9.3 Isolation levels

  • dirty data: written by an uncommitted transaction
  • nonrepeatable reads: an item read multiple times cannot change values

    T1:    Update Student Set GPA=(1.1)*GPA
    T2.S1: Select AVG(GPA) From Student
    T2.S2: Select MAX(GPA) From Student
    

    T2.S1 may excute before T1, T2.S2 may excute after T1.
    The GPAs in S1 and S2 are different, leads to a nonrepatable reads violation.

  • phantoms

    T1:    Insert Into Student [100 new tuples]
    T2.S1: Select AVG(GPA) From Student
    T2.S2: Select MAX(GPA) From Student
    

    T2.S1 may excute before T1, T2.S2 may excute after T1.
    [100 new tuples] are known as the phantoms tuples.

levels dirty reads nonrepeatable reads phantoms
Read Uncommitted Y Y Y
Read Committed N Y Y
Repeatable Read N N Y
Serializable N N N
  • Standard default: Serializable
  • Some systems have default Repeatable Read