Skip to content Skip to navigation

Connexions

You are here: Home » Content » Relational Design Theory

Navigation

Lenses

What is a lens?

Definition of a lens

Lenses

A lens is a custom view of the content in the repository. You can think of it as a fancy kind of list that will let you see content through the eyes of organizations and people you trust.

What is in a lens?

Lens makers point to materials (modules and collections), creating a guide that includes their own comments and descriptive tags about the content.

Who can create a lens?

Any individual member, a community, or a respected organization.

What are tags? tag icon

Tags are descriptors added by lens makers to help label content, attaching a vocabulary that is meaningful in the context of the lens.

This content is ...

Affiliated with (What does "Affiliated with" mean?)

This content is either by members of the organizations listed or about topics related to the organizations listed. Click each link to see a list of all content affiliated with the organization.
  • VOCW

    This module is included inLens: Vietnam OpenCourseWare's Lens
    By: Vietnam OpenCourseWare

    Click the "VOCW" link to see all content affiliated with them.

Recently Viewed

This feature requires Javascript to be enabled.
 

Relational Design Theory

Module by: Nguyen Kim Anh. E-mail the author

Summary: One important theory developed for the relational model involves the notion of functional dependency (fd ). Like constraints functional dependencies are drawn from the semantics of the application domain. Essentially, fd 's describe how individual attributes are related. Functional dependencies are a kind of constraint among attributes within a relation and that have implications for "good" relational schema design In this lecture, we will study • basic theory and definition of functional dependencies • methodology for improving schema designs (normalisation)

Relational Design Theory

One important theory developed for the relational model involves the notion of functional dependency (fd ). Like constraints functional dependencies are drawn from the semantics of the application domain. Essentially, fd 's describe how individual attributes are related.

Functional dependencies are a kind of constraint among attributes within a relation and that have implications for "good" relational schema design

In this lecture, we will study

  • basic theory and definition of functional dependencies
  • methodology for improving schema designs (normalisation)

The aim of studying this:

  • improve understanding of relationships among data
  • gain enough formalism to assist practical database design

Relational Design and Redundancy

Generally, a good relational database design must capture all of the necessary attributes/associations and should do this with a minimal amount of stored information (it means there is no redundant data)

In database design, redundancy is generally a "bad thing" because it causes problems maintaining consistency after updates

However, it can sometimes lead to performance improvements

  • e.g. may be able to avoid a join to collect bits of data together

Consider the following relation defining bank accounts/branches:

Table 1
accountNo balance customer branch address assets
A-101 500 1313131 Downtown Brooklyn 9000000
A-102 400 1313131 Perryridge Horseneck 1700000
A-113 600 9876543 Round Hill Horseneck 8000000
A-201 900 9876543 Brighton Brooklyn 7100000
A-215 700 1111111 Mianus Horseneck 400000
A-222 700 1111111 Redwood Palo Alto 2100000
A-305 350 1234567 Round Hill Horseneck 8000000

Insertion anomaly:

  • when we insert a new record, we need to check that branch data is consistent with existing tuples

Update anomaly:

  • if a branch changes address, we need to update all tuples referring to that branch

Deletion anomaly:

  • if we remove information about the last account at a branch, all of the branch information disappears

Insertion anomaly example (insert account A-306 at Round Hill):

Table 2
accountNo balance customer branch address assets
A-101 500 1313131 Downtown Brooklyn 9000000
A-102 400 1313131 Perryridge Horseneck 1700000
A-113 600 9876543 Round Hill Horseneck 8000000
A-201 900 9876543 Brighton Brooklyn 7100000
A-215 700 1111111 Mianus Horseneck 400000
A-222 700 1111111 Redwood Palo Alto 2100000
A-305 350 1234567 Round Hill Horseneck 8000000
A-306 800 1111111 Round Hill Horseneck 8000800

Update anomaly example (update Round Hill branch address):

Table 3
accountNo balance customer branch address assets
A-101 500 1313131 Downtown Brooklyn 9000000
A-102 400 1313131 Perryridge Horseneck 1700000
A-113 600 9876543 Round Hill Palo Alto 8000000
A-201 900 9876543 Brighton Brooklyn 7100000
A-215 700 1111111 Mianus Horseneck 400000
A-222 700 1111111 Redwood Palo Alto 2100000
A-305 350 1234567 Round Hill Horseneck 8000000

Deletion anomaly example (remove account A-101):

Table 4
accountNo balance customer branch address assets
A-101 500 1313131 Downtown Brooklyn 9000000
A-102 400 1313131 Perryridge Horseneck 1700000
A-113 600 9876543 Round Hill Horseneck 8000000
A-201 900 9876543 Brighton Brooklyn 7100000
A-215 700 1111111 Mianus Horseneck 400000
A-222 700 1111111 Redwood Palo Alto 2100000
A-305 350 1234567 Round Hill Horseneck 8000000

Problem is after deleting the tuple, we don’t know where is the Downtown branch located or what are its assets?

To avoid these kinds of update problems we need to decompose the relation U into several smaller relations Ri where each Ri has minimal overlap with other Rj. Typically, each Ri contains information about one entity (e.g. branch, customer, ...)

This leads to a (bottom-up) database design procedure:

  • start from an unstructured collection of attributes
  • use normalisation (via fds) to impose structure
  • final schema is a collection of tables
  • final schema has minimal redundancy (normalised)

This contrasts with our earlier (top-down) design procedure:

  • structure data at conceptual level (ER design)
  • then map to "physical" level (relational design)
  • final schema is a collection of tables

It appears that ER-design-then-relational-mapping

  • leads to a collection of well-structured tables
  • which is similar to a normalised schema

However, we still need a dependency theory and normalisation procedure to deal with redundancy because of some reasons

  1. ER design does not guarantee minimal redundancy
    • dependency theory allows us to check designs for residual problems
  2. Normalisation can be viewed as (semi)automated design
    • determine all of the attributes in the problem domain
    • collect them all together in a "super-relation" (with update anomalies)
    • provide information about how attributes are related
    • apply normalisation to decompose into non-redundant relations

Notation/Terminology

Most texts adopt the following terminology:

Table 5
Relation schemas upper-case letters, denoting set of all attributes (e.g. R, S, P, Q )
Relation instances lower-case letter corresponding to schema (e.g. r(R), s(S), p(P), q(Q) )
Tuples Lower-case letters (e.g. t, t', t1, u, v )
Attributes Upper-case letters from start of alphabet (e.g. A, B, C, D )
Sets ofattributes simple concatenation of attribute names (e.g. X=ABCD, Y=EFG )
Attributes in tuples tuple[attrSet] (e.g t[ABCD], t[X])

Functional Dependency

A relation instance r(R) satisfies a dependency X →Y if

  • for any t, u size 12{ in } {} r, t[X] = u[X] size 12{ drarrow } {} t[Y] = u[Y]

In other words, if two tuples in r agree in their values for the set of attributes X, then they must also agree in their values for the set of attributes Y.

We say that "Y is functionally dependent on X".

Attribute sets X and Y may overlap; trivially true that X →X.

Consider the following instance r(R) of the relation schema R(ABCDE):

Table 6
A B C D E
a1 b1 c1 d1 e1
a2 b1 C2 d2 e1
a3 b2 C1 d1 e1
a4 b2 C2 d2 e1
a5 b3 C3 d1 e1

What kind of dependencies can we observe among the attributes in r(R)?

Since the values of A are unique, it follows from the fd definition that:

A →B, A →C, A →D, A →E

It also follows that A →BC (or any other subset of ABCDE).

This can be summarised as A →BCDE

From our understanding of primary keys, A is a Primary Key.

Since the values of E are always the same, it follows that:

A →E, B →E, C →E, D →E

Note: cannot generally summarise above by ABCD →E

In general, A →Y, B →Y AB →Y

Other observations:

  • combinations of BC are unique, therefore BC →ADE
  • combinations of BD are unique, therefore BD →ACE
  • if C values match, so do D values, therefore C →D
  • however, D values don't determine C values, so D graphics1.pngC

We could derive many other dependencies, e.g. AE →BC, ...

In practice, choose a minimal set of fds (basis)

  • from which all other fds can be derived
  • which typically captures useful problem-domain information

Inference Rule

Armstrong's rules are complete, general rules of inference on fds.

F1. Reflexivity e.g. If YXYX size 12{Y subseteq X} {} then X→Y

  • a formal statement of trivial dependencies; useful for derivations

F2. Augmentation e.g. X →Y size 12{ drarrow } {} XZ →YZ

  • if a dependency holds, then we can freely expand its left hand side

F3. Transitivity e.g. X →Y, Y →Z size 12{ drarrow } {} X →Z

  • the "most powerful" inference rule; useful in multi-step derivations

While Armstrong's rules are complete, other useful rules exist:

F4. Additivity e.g. X →Y, X →Z size 12{ drarrow } {} X →YZ

  • useful for constructing new right hand sides of fds (also called union)

F5. Projectivity e.g. X →YZ size 12{ drarrow } {} X →Y, X →Z

  • useful for reducing right hand sides of fds (also called decomposition)

F6. Pseudotransitivity e.g. X →Y, YZ →W size 12{ drarrow } {} XZ →W

  • shorthand for a common transitivity derivation

Using rules and a set F of given fds, we can determine what other fds hold.

Example (derivation of AB →GH):

R = ABCDEFGHIJ

F = { AB →E, AG →J, BE →I, E →G, GI →H }

Table 7
1.   AB →E   (given)
2.   AB →AB   (using F1)
3.   AB →B   (using F5 on 2)
4.   AB →BE   (using F4 on 1,3)
5.   BE →I   (given)
6.   AB →I   (using F3 on 4,5)
7.   E →G   (given)
8.   AB →G   (using F3 on 1,7)
9.   AB →GI   (using F4 on 6,8)
10.   GI →H   (given)
11.   AB →H   (using F3 on 6,8)
12.   GI →GI   (using F1)
13.   GI →I   (using F5 on 12)
14.   AB →GH   (using F4 on 8,11)

Closures

  1. Closure of a set of dependencies

The largest collection of dependencies that can be derived from F is called the closure of F and is denoted F+.

Closures allow us to answer two interesting questions:

  • is a particular dependency X →Y derivable from F?
  • are two sets of dependencies F and G equivalent?

For the question "is X →Y derivable from F?" ...

  • compute the closure F+; check whether (XY)F+(XY)F+ size 12{ \( X rightarrow Y \) in F rSup { size 8{+{}} } } {}

For the question "are F and G equivalent?" ...

  • compute closures F+ and G+; check whether they're equal

Unfortunately, closures on even small sets of functional dependencies can be very large.

Algorithms based on F+ rapidly become infeasible.

Example (of fd closure):

R = ABC, F = { AB →C, C →B } F+ = { A →A, AB →A, AC →A, AB →B, BC →B, ABC →B, C →C, AC →C, BC →C, ABC →C, AB →AB, . . . . . . , AB →ABC, AB →ABC, C →B, C →BC, AC →B, AC →AB }

  1. Closure of a set of attributes

Given a set X of attributes and a set F of fds, the largest set of attributes that can be derived from X using F, is called the closure of X (denoted X+).

We can prove (using additivity) that (XY)F+(XY)F+ size 12{ \( X rightarrow Y \) in F rSup { size 8{+{}} } } {} iff YX+YX+ size 12{Y subseteq X rSup { size 8{+{}} } } {}.

For computation, | X+ | is bounded by the number of attributes.

For the question "is X →Y derivable from F?" ...

  • compute the closure X+, check whether YX+YX+ size 12{Y subseteq X rSup { size 8{+{}} } } {}

For the question "are F and G equivalent?" ...

  • for each dependency in G, check whether derivable from F
  • for each dependency in F, check whether derivable from G
  • if true for all, then F size 12{ drarrow } {}G and G size 12{ drarrow } {}F which implies F+ = G+

Algorithm to compute the closure of set of attributes

Inputs: set F of fds

set X of attributes

Output: closure of X (i.e. X+)

X+ = X

stillChanging = true;

while (stillChanging) {

stillChanging = false;

for each W -> Z in F {

if (W in or equal X+) and not (Z in or equal X+) {

X+ = X+ union Z

stillChanging = true;

}

}

}

Example 1. R = ABCDEF, F = { AB →C, BC →AD, D →E, CF →B }

Does AB →D follow from F? Solve by checking DAB+DAB+ size 12{D in ital "AB" rSup { size 8{+{}} } } {}.

Computing AB+:

Table 8
1.   AB+ = AB   (initially)
2.   AB+ = ABC   (using AB →C)
3.   AB+ = ABCD   (using BC →AD)
4.   AB+ = ABCDE   (using D →E)

Since D is in AB+, then AB →D does follow from F.

Keys

Definition: A superkey of a relation schema R is set of attributes KRKR size 12{K subseteq R} {} if given a relation r(R) and any two tuples t1 and t2 in r then t1[K] ≠ t2[K].

Definition: A key K of a relation schema R is a superkey and there is no proper subset of K that can be a key.

Here, we have a similar definition of key of a relation schema given a set of functional dependencies.

Given a relation schema R with a list of attributes U =  A1, A2, …, An , F is a set of functional dependencies, K is a set of attributes and KUKU size 12{K subseteq U} {}. We can say K is a key of E if the followings hold.

  1. K U F + K U F + size 12{"K " rightarrow U in F rSup { size 8{+{}} } } {}
  2. for all proper subset K’ of K (K'K)then K UF+K (K'K)then K UF+ size 12{"K " \( "K'" subset K \) "then K " rightarrow U notin F rSup { size 8{+{}} } } {}

The difference between key and superkey is that a key has to be minimal. That means if K = { A1, A2, .., Ak} then K \ { Ai } is not a key for R for any Ai

If a relation schema has more than one key, each is called a candidate key

A candidate key will be chosen to be primary key.

Algorithm for finding a candidate key of a relation schema

Input: Relation schema R = { A1, A2, .., An}, set F of fds

Output: A candidate key K for R.

K = R

for i = 1 to n do {

Kt = K - {Ai}

if Kt+ = R then K = Kt

}

Example: R = ABCDEF, F = { BC →AD, D →E, CF →B }. Find a candidate key of this relation schema.

Table 9
1.   K = ABCDEF   (initially)
2.   K = BCDEF   (BCDEF) + = R (using BC →AD) so A is not in the key
3.   K = CDEF   (CDEF)+ = R (using CF →B; BC →AD) so B is not in the key
4.   K = CDEF   (DEF)+ = DEF ≠ R so DEF is not a key, C must be an attribute in key
5.   K = CEF   (CEF)+ = R (using CF →B; BC →AD) so D is not in the key
6.   K = CF   (CF)+ = R (using CF →B; BC →AD,D →E) so E is not in the key
7.   K = CF   (C)+ = C ≠ R so F must be in the key

One candidate key for the given relation schema is CF

Minimal Covers

For a given application, we can define many different sets of fds with the same closure (e.g. F and G where F+ = G+)

Minimal cover Fc for a set F of fds:

  • Fc is equivalent to F
  • all fds have the form X →A (where A is a single attribute)
  • it is not possible to make Fc smaller
    • either by deleting an fd
    • or by deleting an attribute from an fd

An fd d is redundant if (F-{d})+ = F+

An attribute A is redundant if (F{d}{d'})+=F+(F{d}{d'})+=F+ size 12{ \( F - lbrace d rbrace union lbrace d' rbrace \) rSup { size 8{+{}} } =F rSup { size 8{+{}} } } {} (where d' is the same as d but with attribute A removed)

Algorithm for computing minimal cover:

Inputs: set F of fds

Output: minimal cover Fc of F

Fc = F

Step 1: put fds into canonical form

for each f in Fc like X -> {A1,..,An}

Fc = Fc - {f}

for each a in {A1,..,An}

Fc = Fc union {X -> a}

end

end

Step 2: eliminate redundant attributes

for each f in Fc like X → A

for each b in X

f' = (X-{b}) -> A;

G = Fc - {f} Union {f'}

if (G+ == Fc+) Fc = G

end

end

Step 3: eliminate redundant functional dependencies

for each f in Fc

G = Fc - {f}

if (G+ == Fc+) Fc = G

end

Example : R = ABC, F = { A →BC, B →C, A →B, AB →C }

Compute the minimal cover:

  • canonical fds: A →B, A →C, B →C, AB →C
  • redundant attrs: A →B, A →C, B →C, AB →C
  • redundant fds: A →B, A →C, B →C

This gives the minimal cover FC={AB,BC}FC={AB,BC} size 12{F rSub { size 8{C} } = lbrace A rightarrow B,B rightarrow C rbrace } {}.

Example: Let consider the relation schema R(ABCDEG) and a set of functional dependencies F = {AB → C, C → A, BC → D, ACD → B, D → EG, BE → C, CG → BD, CE → AG }. Find the minimal cover of F

Canonical fds: F1 = {AB → C, C → A, BC → D, ACD → B, D → E, D → G, BE → C, CG → B, CG → D, CE → A, CE → G }

Find redundant attributes:

Consider fd: ACD → B, we have (CD)+ = ABCDEG it means CD → B . A is a redundant attribute in that functional dependency.

No other redundant attributes found. So we have

F2 = {AB → C, C → A, BC → D, CD → B, D → E, D → G, BE → C, CG → B, CG → D, CE → A, CE → G }

Find redundant functional dependency.

  • Initially F0 = F2
  • Consider first fd in F : AB → C . With respect to the set F0 \ {AB →C}, we have (AB)+ = AB. So the fd AB → C cannot be inferred from others fds in the set F0 or F0 is not equivalent to F0 \ {AB → C}. Therefore, we can have F1 = F0or the functional dependency AB → C is not redundant.
  • C → A is not redundant or F2 = F1 because (C)+ = C with respect to the set F1 \ {C → A}
  • BC → D is not redundant or F3 = F2 because (BC)+ = ABC with respect to the set F2 \ {BC → D}.
  • CD → B is a redundant functional dependency because (CD)+ = ABCDEG with respect to the set F3 \ {CD → B}. The closure of CD contains attribute B so that CD → B can be inferred from remaining dependencies in the set . Therefore, F4 = F3 \ {CD → B}.
  • Similarly, the following functional dependencies are redundant: CG → D, CE → A

Finally, the result of this step is F3 = {AB → C, C → A, BC → D, D → E, D → G, BE → C, CG → B, CE → G }

This is the minimal cover of F.

Relation Decomposition

As mention in the beginning of the lecture, in the bottom-up approach of designing relational database, we start from a single universal relation schema R = {A1, A2, …,An} that includes all the attributes of the database. Given a set of functional dependencies F that should hold on the attributes of R, we can use the dependencies to decompose the relation R into a set of relation schemas D = {R1, R2, … , Rk} with R1R2...RK=RR1R2...RK=R size 12{R rSub { size 8{1} } union R rSub { size 8{2} } "." "." "." union R rSub { size 8{K} } =R} {}. This means each new relation scheme contains a subset of attributes of R and every attributes of R appears as an attribute in one of the new relations. D is called a relation decomposition of R.

For a relation decomposition, there are two desirable properties:

  • Dependency preservation property ensures that each functional dependency can be represented in some individual relation resulting after decomposition
  • Lossless join property ensures No information is loss after decomposition.

Lossless Join Property of Decomposition

Definition: A decomposition D = {R1, R2, … , Rk} of a relation schema R has lossless join property with respect to a set of functional dependencies F on R if for any relation r(R) that satisfies F, we have r=ΠR1(r)ΠR2(r)...ΠRk(r)r=ΠR1(r)ΠR2(r)...ΠRk(r) size 12{r=Π rSub { size 8{R1} } \( r \) * Π rSub { size 8{R2} } \( r \) * "." "." "." * Π rSub { size 8{ ital "Rk"} } \( r \) } {}

Algorithm for testing for lossless join decomposition

Input: Relation schema R = { A1, A2, .., An}, set F of fds, a decomposition D = {R1, R2, … , Rk}

Output:Check whether D is lossless join decomposition or not

  • Create a matrix T with k rows and n columns. Row i in the matrix corresponds to relation Ri in D. Column j in the matrix corresponds to attribute Aj in R.
  • Set the values for an entry in the matrix T:

T(i,j)= a(i,j) if relation Ri includes attribute Aj. Othewise, T(i,j)= b(i,j)

  • Repeat the following loop until a complete loop execution results in no changes to T

for each funtional dependency X -> Y in F{

for all rows in T that have the same values in the columns corresponds to attributes in X {

Make the symbols in each column that correspond to an attribute in Y be the same in all these rows : if any of the row has an ‘a’ symbol for the column, set other rows to that same ‘a’ symbol in the column. otherwise, choose one of the ‘b’ symbol to set to other rows in the column.

}

}

  • If a row is made up entirely of ‘a’ symbol then the decomposition is lossless join. Otherwise, the decomposition is lossy join.

Example: Let relation schema R = ABCDE,; decomposition D is {R1 = AD, R2 = AB, R3 = BE, R4 = CDE và R5 = AE} ; a set of funtional dependencies F = {A → C, B → C,

C → D, DE → C, CE → A }. D is lossless join decomposition or not ?

According to algorithm, the initial matrix T is:

Table 10
  A B C D E
row 1 a1 b12 b13 a4 b15
row 2 a1 a2 b23 b24 b25
row 3 b31 a2 b33 b34 a5
row 4 b41 b42 a3 a4 a5
row 5 a1 b52 b53 b54 a5

Using the functional dependency A → C we have , rows 1, 2, 5 have the same symbol in column A, so make the symbol in column C to be the same as b13.

Table 11
A B C D E
a1 b12 b13 a4 b15
a1 a2 b13 b24 b25
b31 a2 b33 b34 a5
b41 b42 a3 a4 a5
a1 b52 b13 b54 a5

Using the functional dependency B → C, rows 2,3 have the same symbol in column B so make the symbol in column C to be the same as b13

Table 12
A B C D E
a1 b12 b13 a4 b15
a1 a2 b13 b24 b25
b31 a2 b13 b34 a5
b41 b42 a3 a4 a5
a1 b52 b13 b54 a5

Using the functional dependency C → D, row 1, 2, 3, 5 have the same symbol in column C so make the values of column D to be the the same: b24, b34, b54 become a4

Table 13
A B C D E
a1 b12 b13 a4 b15
a1 a2 b13 a4 b25
b31 a2 b13 a4 a5
b41 b42 a3 a4 a5
a1 b52 b13 a4 a5

Using DE→ C, rows 3, 4, 5 have the same symbols in columns D,E so make the symbol of column C to be the same: b13 becomes a3.

Table 14
A B C D E
a1 b12 b13 a4 b15
a1 a2 b13 a4 b25
b31 a2 a3 a4 a5
b41 b42 a3 a4 a5
a1 b52 a3 a4 a5

Using CE→ A, row 3, 4, 5 have the same symbols in columns C, E so make the symbol in column A be the same: b31, b41 become a1

Table 15
A B C D E
a1 b12 b13 a4 b15
a1 a2 b13 a4 b25
a1 a2 a3 a4 a5
a1 b42 a3 a4 a5
a1 b52 a3 a4 a5

There is a row 3 contains all ‘a’ symbol so this decomposition is lossless.

Dependency Preservation of Decomposition

In addition of lossless join, dependency preservation is a desirable property of decomposition. The functional dependency is a constraint in the database, we don’t want any constraint disappear because of the decomposition. Given a relation schema R, set of functional dependencies F on R and a decomposition D

It is not necessary that the exact dependencies specified on set F appear directly in one of the relation of D. However, the set of all the dependencies that hold on relations in D must be equivalent to F.

Definition: Given a relation schema R and a set of functional dependencies F of R ; D= (R1, R2,. , Rk) is a decomposition of R on F. We say Fi is a projection of F on Ri, denoted by ΠRi(F)ΠRi(F) size 12{Π rSub { size 8{ ital "Ri"} } \( F \) } {}, is the set of dependencies XYF+XYF+ size 12{X rightarrow Y in F rSup { size 8{+{}} } } {} such that XYRiXYRi size 12{ ital "XY" subseteq R rSub { size 8{i} } } {} .

We say that a decomposition D = {R1, R2, … , Rk} is dependency-preserving with respect to F if (ΠR1(F)ΠR2(F)...ΠR3(F))+=F+(ΠR1(F)ΠR2(F)...ΠR3(F))+=F+ size 12{ \( Π rSub { size 8{R1} } \( F \) union Π rSub { size 8{R2} } \( F \) union "." "." "." union Π rSub { size 8{R3} } \( F \) \) rSup { size 8{+{}} } =F rSup { size 8{+{}} } } {}

Example: Let consider the relation

R (CITY, STREET, ZIP)

Simply, we can use the name C for CITY, S for STREET, Z for ZIP . The set of funtional dependencies in R consists of CS→ Z and Z → C. Suppose we have a decomposition D = {R1 = SZ and R2 = CZ }. The projection of F on R1 contains only trivial dependencies that can be inferred using reflexivity rule. The projection of F on R­2 consists of trivial dependencies and Z → C. We can see that the set F’ of Z → C and trivial dependencies cannot derive the dependency CS→ Z. Thus, this decomposition of R does not preserve dependency

Normalization

Normalization is the branch of relational theory providing design insights.

The goals of normalization are:

  • be able to characterise the level of redundancy in a relational schema
  • provide mechanisms for transforming schemas to remove redundancy

Normalization draws heavily on the theory of functional dependencies.

Normalization theory defines six normal forms (NFs). Each normal form involves a set of dependency properties that a schema must satisfy and each gives guarantees about presence/absence of update anomalies . That means higher normal forms have less redundancy so that less update problems.

Normal Forms

A brief history of normal forms:

  • First,Second,Third Normal Forms (1NF,2NF,3NF) (Codd 1972)
  • Boyce-Codd Normal Form (BCNF) (1974)
  • Fourth Normal Form (4NF) (Zaniolo 1976, Fagin 1977)
  • Fifth Normal Form (5NF) (Fagin 1979)

NF hierarachy: 5NF size 12{ drarrow } {}4NF size 12{ drarrow } {}BCNF size 12{ drarrow } {}3NF size 12{ drarrow } {}2NF size 12{ drarrow } {}1NF

1NF allows most redundancy; 5NF allows least redundancy.

Table 16
1NF   all attributes have atomic values we assume this as part of relational model
2NF   all non-key attributes fully depend on key (i.e. no partial dependencies) avoids much redundancy
3NF, BCNF   no attributes dependent on non-key attrs(i.e. no transitive dependencies) avoids remaining redundancy
4NF   Removes problems due to multivalued dependencies
5NF   Removes problems due to join dependencies

In practice, BCNF and 3NF are the most important.

Boyce-Codd Normal Form (BCNF):

  • eliminates all redundancy due to functional dependencies
  • but may not preserve original functional dependencies

Third Normal Form (3NF):

  • eliminates most (but not all) redundancy due to fds
  • guaranteed to preserve all functional dependencies

In the following sections, we will give a more detail definition of the normal forms. Before move into details of the normal forms, we state some related definition.

Definition: Let R be a relation schema with the set of attributes U = {A1, A2, …, An }, F is the set of functional dependencies hold on R ; A is an attributes in U. A is called prime attribute if A appears in some candidate keys of R. If A is not a member of any candidate key of R then A is nonprime attribute.

Example : Given R (A,B,C,D) with a set of functional dependencies F = {AB → C, BC → A, B → D}. There are two candidate keys of R which are AB and BC. Therefore, the prime attributes of R are A, B and C. D is a nonprime attribute.

Definition: Consider a relation schema R with set of attributes U= { A1, A2, …, An }, F is a set of functional dependencies on R and X, YUYU size 12{Y supseteq U} {}. We say that Y is fully functionally dependent on X if:

  • X Y F + X Y F + size 12{X rightarrow Y in F rSup { size 8{+{}} } } {}
  • With all proper subset X’ of X , X'YF+X'YF+ size 12{X rSup { size 8{'} } rightarrow Y notin F rSup { size 8{+{}} } } {}

Otherwise, we say that Y is partially functionally dependent on X.

Definition: Consider a relation schema R with set of attributes U = { A1, A2, …, An }, F is a set of functional dependencies on R and X,YUX,YU size 12{X,Y subseteq U} {}. A functional dependency X→ Y is a transitive dependency if there is a set of attributes Z that is neither a candidate key nor a subset of any key of R and X→ Z, Z → Y hold.

First Normal Form (1NF)

Definition: A relation schema R is in First Normal Form if the domains of all attributes of R are atomic.

A relation is in 1NF if the corresponding is in 1NF.

A domain is atomic if elements of the domain are considered to be indivisible units.

First Normal Form is defined to disallow multivalued attributes, composite attributes and their combination. In a relation instance of a 1NF relation schema, there is no set of values, tuple of values or a combination of both as an attribute value for a single tuple.

Example of relation instance of relation schema which is not in 1NF

Table 17
EMP-PROJ EID Name Address Birthdate Project
PName StartDate
  1 John Smith 12/34 Meeks Str, GreenGarden 26/7/1976 GreenLife 1/1/2006
          ProductX 15/7/1007
  2 Mary Streep 123 King Str, Marryland 12/2/1978 ProductX 1/2/2007
          Networld 15/8/1007

The 1NF version of the same relation with redundancy

Table 18
EMP -PROJ EID Name Address Birthdate PName StartDate
  1 John Smith 12/34 Meeks Str, GreenGarden 26/7/1976 GreenLife 1/1/2006
  1 John Smith 12/34 Meeks Str, GreenGarden 26/7/1976 ProductX 15/7/1007
  2 Mary Streep 123 King Str, Marryland 12/2/1978 ProductX 1/2/2007
  2 Mary Streep 123 King Str, Marryland 12/2/1978 Networld 15/8/1007

Second Normal Form (2NF)

This normal form is based on concept of full functional dependency (mentioned above).

Definition: A relation schema R is in 2NF if it is in 1NF and every nonprime attribute A in R is fully funtionally dependent on the primary key of R.

Example: Consider a relation schema EMP-PROJ above. The set of functional dependencies in this relation schema consist of two fds: EID → Name, Address, Birthdate and EID, PName → StartDate.

The only candidate key for this relation schema is {EID, PName}. The nonprime attributes are Name, Address, Birthdate, StartDate. Nonprime attributes Name, Address, Birthdate violate 2NF because they are funtionally dependent only on EID which is a part of the key.

If a relation schema is not in 2NF, we can normalized into several 2NF relations in which nonprime attributes are associated only with the part of key on which they are fully functionally dependent.

Therefore, we normalized the EMP-PROJ into two 2NF relations as follows

Table 19
EMPLOYEE EID Name Address Birthdate
  1 John Smith 12/34 Meeks Str, GreenGarden 26/7/1976
  2 Mary Streep 123 King Str, Marryland 12/2/1978
Table 20
EMP-PROJ EID PName StartDate
  1 GreenLife 1/1/2006
  1 ProductX 15/7/1007
  2 ProductX 1/2/2007
  2 Networld 15/8/1007

Third Normal Form (3NF)

This normal form is based on concept of transivtive functional dependency.

Definition: A relation schema R is in 3NF if it is in 2NF and no nonprime attribute in R is transitively dependent on the primary key of R.

Example: Consider the relation schema EMP-DEPT (EID, Name, Address, Birthdate, DeptId, DeptName, Dept-Manager). In this relation schema, suppose that the followings funtional dependencies hold {EID → Name, Address, Birthdate, DeptId, DeptId → DeptName, Dept-Manager}

The primary of this relation schema is {EID}. This relation schema is in 2NF. However, the nonprime attributes DeptName, Dept-Manager is transitively dependent on primary key (since we got EID→DeptId and DeptId → DeptName, Dept-Manager) so it is not in 3NF.

Algorithm : Decompose relation schema into 3NF schemas with Dependency Preserving

Input: Relation schema R, set of functional dependencies F

Output: A decomposition D - consists of relations which are all in 3NF with dependency preserving

  1. Find a minimal cover G of F
  2. If there are attributes in R that do not appear in any functional dependency in F, then place them into a relation schema and remove them from R.
  3. For each functional dependency X→A in F, create a relation schema in D with attributes {X size 12{ union } {}A}. If there are several funtional dependencies in F with the same left-hand-side X (such as X→A1, …, X→Ak) then create a relation shema which includes all attributes { X size 12{ union } {}A1 size 12{ union } {} A2 size 12{ union } {}.. size 12{ union } {}Ak }.

Example: Let consider the relation schema R(ABCDEG) and a set of functional dependencies F = {AB → C, C → A, BC → D, ACD → B, D → EG, BE → C, CG → BD, CE → AG }. Find the decomposition in 3NF with dependency preserving.

  1. The minimal cover of F is Fc = {AB → C, C → A, BC → D, D → E, D → G, BE → C, CG → B, CE→ G} (from the example in the section of Minimal Cover)
  2. All attributes in R are members of at least one functional dependency in F, so we can use the functional dependency to decompose relation schema R. The result is D = (ABC, CA, BCD, DEG, BEC, CGB, CEG). However, because CA  ABC so we can remove relation schema CA from D and the final result is D = (ABC, BCD, DEG, BEC, CGB, CEG)

Algorithm: Decompose relation schema into 3NF schemas with Dependency Preserving and Lossless Join

Input: Relation schema R, set of funtional dependencies F

Output: A decomposition D - consists of relations which are all in 3NF with dependency preserving

  1. Find a minimal cover Fc of F
  2. Find a key of relation schema R with respect to G (algorithm in section Keys). Result is K
  3. Find a decomposition into 3NF schemas with Dependency Preserving. Result is D (Previous algorithm)
  4. If none of the relation schemas in D contains key K of R then create one more relation schema in D that contains attributes that form K. Otherwise, D is the result of this algorithm.

Example: Let consider the relation schema R(ABCDEG) and a set of functional dependencies F = {AB → C, C → A, BC → D, ACD → B, D → EG, BE → C, CG → BD, CE → AG }. Find the decomposition in 3NF with dependency preserving and lossless join.

  1. Minimal cover of F: Fc = {AB → C, C → A, BC → D, D → E, D → G, BE → C, CG → B, CE→ G}
  2. Find a key K follows the algorithm in section Keys

K0 = ABCDEG

K1 = K0 \ A = BCDEG because BCDEG → ABCDEGF+ABCDEGF+ size 12{"ABCDEG" in F rSup { size 8{+{}} } } {}

K2 = K1 \ B = CDEG because CDEG → ABCDEGF+ABCDEGF+ size 12{"ABCDEG" in F rSup { size 8{+{}} } } {}

K3 = K2 = CDEG because DEG → ABCDEGF+ABCDEGF+ size 12{"ABCDEG" notin F rSup { size 8{+{}} } } {}

K4 = K3 \ D = CEG because CEG → ABCDEGF+ABCDEGF+ size 12{"ABCDEG" in F rSup { size 8{+{}} } } {}

K5 = K4 \ E = CG because CG → ABCDEGF+ABCDEGF+ size 12{"ABCDEG" in F rSup { size 8{+{}} } } {}

K6 = K5 = CG because C → ABCDEGF+ABCDEGF+ size 12{"ABCDEG" notin F rSup { size 8{+{}} } } {}

The resulting key is CG.

  1. Using the result of previous example , we have a decomposition D with dependency preserving is D = (ABC, BCD, DEG, BEC, CGB, CEG)
  2. The relation schema CGB in D contains the key CG, so D is a dependency preserving and lossless join decomposition of R into 3NF relation schemas.

Boyce-Codd Normal Form

Definition: A relation schema R is in BCNF with respect to a set F of functional dependencies if for all fds X →Y in F+

  • either X →Y is trivial (i.e. Y size 12{ subseteq } {} X)
  • or X is a superkey

A relation is in BCNF is its schema is in BCNF

A database schema is in BCNF if all relation schemas are in BCNF.

Example: Given the relation schema R(A B C D E F) ; Set of functional dependencies F = {A →BC, E →AF}.

The key of R is DE. This relation schema is not in BCNF because the functional dependencies violate the conditions of BCNF ( all functional dependencies in F are not trivial and all left hand side attributes are not superkey)

If we transform a schema into BCNF, we are guaranteed:

  • no update anomalies due to fd-based redundancy
  • lossless join decomposition

However, we are not guaranteed:

  • all fds from the original schema exist in the new schema

This may be a problem if the fds contain significant semantic information about the problem domain.

If we need to preserve dependencies, use 3NF.

Algorithm converts an arbitrary schema to BCNF

Inputs: schema R, set F of fds

Output: set D of BCNF schemas

D = {R};

while (any schema (S belong to D) is not in BCNF) {

choose an fd X → Y on S that violates BCNF

D=(DS)(SY)XYD=(DS)(SY)XY size 12{D= \( D - S \) union \( S - Y \) union ital "XY"} {}

}

Example: With the previous example R(A B C D E F) ; Set of functional dependencies F = {A →BC, E →AF}. R is not in BCNF, both functional dependencies violate the conditions of BCNF.

  • Using A →BC to decompose we have D1 = {ABC, ADEF}. The relation schema R1(ABC) is in BCNF with respect to projection F1 = { A →BC}. However, relation schema R2(ADEF) is not in BCNF with respect to projection

F2 = { E →AF} because the key of R2 is DE and the left-hand-side of the functional dependency is not a superkey. Therefore, we have to decompose R2

  • R2 is decomposed into R21(AEF) and R22(DE) using E →AF
  • The final result is D = {ABC, AEF, DE}

Content actions

Download module as:

PDF | EPUB (?)

What is an EPUB file?

EPUB is an electronic book format that can be read on a variety of mobile devices.

Downloading to a reading device

For detailed instructions on how to download this content's EPUB to your specific device, click the "(?)" link.

| More downloads ...

Add module to:

My Favorites (?)

'My Favorites' is a special kind of lens which you can use to bookmark modules and collections. 'My Favorites' can only be seen by you, and collections saved in 'My Favorites' can remember the last module you were on. You need an account to use 'My Favorites'.

| A lens I own (?)

Definition of a lens

Lenses

A lens is a custom view of the content in the repository. You can think of it as a fancy kind of list that will let you see content through the eyes of organizations and people you trust.

What is in a lens?

Lens makers point to materials (modules and collections), creating a guide that includes their own comments and descriptive tags about the content.

Who can create a lens?

Any individual member, a community, or a respected organization.

What are tags? tag icon

Tags are descriptors added by lens makers to help label content, attaching a vocabulary that is meaningful in the context of the lens.

| External bookmarks