Skip to content Skip to navigation

Connexions

You are here: Home » Content » Query Processing and Optimization

Navigation

Lenses

What is a lens?

Definition of a lens

Lenses

A lens is a custom view of the content in the repository. You can think of it as a fancy kind of list that will let you see content through the eyes of organizations and people you trust.

What is in a lens?

Lens makers point to materials (modules and collections), creating a guide that includes their own comments and descriptive tags about the content.

Who can create a lens?

Any individual member, a community, or a respected organization.

What are tags? tag icon

Tags are descriptors added by lens makers to help label content, attaching a vocabulary that is meaningful in the context of the lens.

This content is ...

Affiliated with (What does "Affiliated with" mean?)

This content is either by members of the organizations listed or about topics related to the organizations listed. Click each link to see a list of all content affiliated with the organization.
  • VOCW

    This module is included inLens: Vietnam OpenCourseWare's Lens
    By: Vietnam OpenCourseWare

    Click the "VOCW" link to see all content affiliated with them.

Recently Viewed

This feature requires Javascript to be enabled.
 

Query Processing and Optimization

Module by: Nguyen Kim Anh. E-mail the author

Query processing is a set of activities involving in getting the result of a query expressed in a high-level language. These activities includes parsing the queries and translate them into expressions that can be implemented at the physical level of the file system,

optimizing the query of internal form to get a suitable execution strategies for processing and then doing the actual execution of queries to get the results.

The cost of processing of query is dominated by the disk access. For a given query, there are several possible strategies for processing exist, especially when query is complex. The difference between a good strategies and a bad one may be several order of magnitude. Therefore, it is worhwhile for the system to spend some time on selecting a good strategies for processing query.

In this lecture, we give a brief introduction about the difference phases in query processing process. Then we will discuss some basic query optimization strategies.

1. Overview of Query Processing

The main steps in processing a high-level query are illustrated in figure 1

Figure 1: Steps in query processing process
Figure 1 (graphics1.png)

The functions of Query Parser is parsing and translating a given high-leval language query into its immediate form such as relational algebra expressions. The parser need to check for the syntax of the query and also check for the semantic of the query ( it means verifying the relation names, the attribute names in the query are the names of relations and attributes in the database). A parse-tree of the query is constructed and then translated into relational algebra expression.

A relational algebra expression of a query specifies only partially how to evaluate a query, there are several ways to evaluate an relational algebra expression. For example, consider the query :

SELECT Salary FROM EMPLOYEE WHERE Salary >= 50,000 ;

The possible relational algebra expressions for this query are:

  • Π Salary ( σ Salary >= 50000 ( EMPLOYEE ) ) Π Salary ( σ Salary >= 50000 ( EMPLOYEE ) ) size 12{Π rSub { size 8{ ital "Salary"} } \( σ rSub { size 8{ ital "Salary"">=""50000"} } \( ital "EMPLOYEE" \) \) } {}
  • σ Salary >= 50000 ( Π Salary EMPLOYEE ) σ Salary >= 50000 ( Π Salary EMPLOYEE ) size 12{σ rSub { size 8{ ital "Salary"">=""50000"} } \( Π rSub { size 8{ ital "Salary"} } ital "EMPLOYEE" \) } {}

Further, each relational algebra operation can be executed using various algorithms. For example, to implement the preceding selection, we can do a linear search in the EMPLOYEE file to retrieve the tuples with Salary >= 50000. However, if an index available on the Salary attribute, we can use the index to locate the tuples. Different algorithms might have different cost.

Thus, in order to specify fully how to evaluate a query, the system is responsible for constructing a query execution plan which made up of the relational algebra expression and the detailed algorithms to evaluate each operation in that expression. Moreover, the selected plan should minimize the cost of query evaluation. The process of choosing a suitable query execution plan is known as query optimization This process is perform by Query Optimizer.

One aspect of optimization occurs at relational algebra level. The system attempt to find an expression that is equivalent to the given expression but that is more efficient to execute. The other aspect involves the selection of a detail strategy for processing the query, this relates to choosing the processing algorithm, choosing the indices to use and so on.

Once the query plan is chosen, the Query Execution Engine lastly take the plan, executes that plan and return the answer of the query.

In the following sections, we give an overview of query optimization techniques. The first one based on heuristic rules for ordering operations in the query execution plan. The second technique involves estimating the cost of different execution plan based on the systematic information. Most of the commercial DBMS’s query optimizers use the combination of these two methods.

2. Using Heuristic in Query Optimization

Heuristic optimization applies the rules to the initial query expression and produces the heuristically transformed query expressions.

A heuristic is a rule that works well in most cases but not always guaranteed. For example, a rule for transforming relational-algebra expression is “Perform selection operations as early as possible”. This rule is generate based on intuitive idea that selection is the operation that results a subset of the input relation so apply selection early might reduce the immediate result size. However, there are cases perform selection before join is not a good idea. For example for the expression

σ F ( r C s ) σ F ( r C s ) size 12{σ rSub { size 8{F} } \( r ⊳ ⊲ rSub { size 8{C} } s \) } {}

If relations r and s both large and the selection produces a result of only one tuple so do the selection before join might be good. But, assume that r is small relation, s is very large, s has an index on the join attribute and no index on the attributes of s in selection condition then compute the join using index then do select might be better then scan the whole s to do selection first.

In this section, we firstly introduce how the heuristic rules can be used to convert a initial query expression to an equivalent one then we discuss the generation of query execution plan.

2.1 Transformation of Relational Expression

As we mentioned, one aspect of optimization occurs at relational algebra level. This involves transforming an initial expression (tree) into an equivalent expression (tree) which is more efficient to execute. Two relational algebra expressions are said to be equivalent if the two expressions generate two relation of the same set of attributes and contain the same set of tuples although their attributes may be ordered differently.

The query tree is a data structure that represents the relational algebra expression in the query optimization process. The leaf nodes in the query tree corresponds to the input relations of the query. The internal nodes represent the operators in the query. When executing the query, the system will execute an internal node operation whenever its operands avaiable, then the internal node is replaced by the relation which is obtained from the preceding execution.

2.1.1 Equivalence Rules for transforming relational expressions

There are many rules which can be used to transform relational algebra operations to equivalent ones. We will state here some useful rules for query optimization.

In this section, we use the following notation:

  • E1, E2, E3,… : denote relational algebra expressions
  • X, Y, Z : denote set of attributes
  • F, F1, F2, F3 ,… : denote predicates (selection or join conditions)
  • Commutativity of Join, Cartesian Product operations

E 1 F E 2 E 2 F E 1 E 1 × E 2 E 2 × E 1 E 1 F E 2 E 2 F E 1 E 1 × E 2 E 2 × E 1 alignl { stack { size 12{E rSub { size 8{1} } ⊳ ⊲ rSub { size 8{F} } E rSub { size 8{2} } equiv E rSub { size 8{2} } ⊳ ⊲ rSub { size 8{F} } E rSub { size 8{1} } } {} # E rSub { size 8{1} } times E rSub { size 8{2} } equiv E rSub { size 8{2} } times E rSub { size 8{1} } {} } } {}

Note that: Natural Join operator is a special case of Join, so Natuaral Joins are also commnutative.

  • Associativity of Join , Cartesian Product operations

( E 1 E 2 ) E 3 E 1 ( E 2 E 3 ) ( E 1 × E 2 ) × E 3 E 1 × ( E 2 × E 3 ) ( E 1 F1 E 2 ) F2 E 3 E 1 F1 ( E 2 F2 E 3 ) ( E 1 E 2 ) E 3 E 1 ( E 2 E 3 ) ( E 1 × E 2 ) × E 3 E 1 × ( E 2 × E 3 ) ( E 1 F1 E 2 ) F2 E 3 E 1 F1 ( E 2 F2 E 3 ) alignl { stack { size 12{ \( E rSub { size 8{1} } *E rSub { size 8{2} } \) *E rSub { size 8{3} } equiv E rSub { size 8{1} } * \( E rSub { size 8{2} } *E rSub { size 8{3} } \) } {} # \( E rSub { size 8{1} } times E rSub { size 8{2} } \) ` times E rSub { size 8{3} } equiv E rSub { size 8{1} } times \( E rSub { size 8{2} } times E rSub { size 8{3} } \) {} # \( E rSub { size 8{1} } ⊳ ⊲ rSub { size 8{F1} } E rSub { size 8{2} } \) ` ⊳ ⊲ rSub { size 8{F2} } E rSub { size 8{3} } equiv E rSub { size 8{1} } ⊳ ⊲ rSub { size 8{F1} } \( E rSub { size 8{2} } ⊳ ⊲ rSub { size 8{F2} } E rSub { size 8{3} } \) {} } } {}

Join operation associative in the following manner: F1 involves attributes from only E1 and E2 and F2 involves only attributes from E2 and E3

  • Cascade of Projection

π X1 ( π X2 ( . . . ( π Xn ( E ) ) . . . ) ) π X1 ( E ) π X1 ( π X2 ( . . . ( π Xn ( E ) ) . . . ) ) π X1 ( E ) size 12{π rSub { size 8{X1} } \( π rSub { size 8{X2} } \( "." "." "." \( π rSub { size 8{ ital "Xn"} } \( E \) \) "." "." "." \) \) equiv π rSub { size 8{X1} } \( E \) } {}

  • Cascade of Selection

σ F1 F2 . . . Fn ( E ) σ F1 ( σ F2 ( . . . ( σ Fn ( E ) ) . . . ) ) σ F1 F2 . . . Fn ( E ) σ F1 ( σ F2 ( . . . ( σ Fn ( E ) ) . . . ) ) size 12{σ rSub { size 8{F1 and F2 and "." "." "." and ital "Fn"} } \( E \) equiv σ rSub { size 8{F1} } \( σ rSub { size 8{F2} } \( "." "." "." \( σ rSub { size 8{ ital "Fn"} } \( E \) \) "." "." "." \) \) } {}

  • Commutativity of Selection

σ F1 ( σ F2 ( E ) ) σ F2 ( σ F1 ( E ) ) σ F1 ( σ F2 ( E ) ) σ F2 ( σ F1 ( E ) ) size 12{σ rSub { size 8{F1} } \( σ rSub { size 8{F2} } \( E \) \) equiv σ rSub { size 8{F2} } \( σ rSub { size 8{F1} } \( E \) \) } {}

  • Commuting Selection with Projection

π X ( σ F ( E ) ) σ F ( π X ( E ) ) π X ( σ F ( E ) ) σ F ( π X ( E ) ) size 12{π rSub { size 8{X} } \( σ rSub { size 8{F} } \( E \) \) equiv σ rSub { size 8{F} } \( π rSub { size 8{X} } \( E \) \) } {}

This rule holds if the selection condition F involves only the attributes in set X.

  • Selection with Cartesian Product and Join
  • If all the attributes in the selection condition F involve only the attributes of one of the expression say E1, then the selection and Join can be combined as follows:

σ F ( E 1 C E 2 ) ( σ F ( E 1 ) ) C E 2 σ F ( E 1 C E 2 ) ( σ F ( E 1 ) ) C E 2 size 12{σ rSub { size 8{F} } \( E rSub { size 8{1} } ⊳ ⊲ rSub { size 8{C} } E rSub { size 8{2} } \) equiv \( σ rSub { size 8{F} } \( E rSub { size 8{1} } \) \) ⊳ ⊲ rSub { size 8{C} } E rSub { size 8{2} } } {}

  • If the selection condition F = F1 AND F2 where F1 involves only attributes of expression E1 and F2 involves only attribute of expression E2 then we have:

σ F1 F2 ( E 1 C E 2 ) ( σ F1 ( E 1 ) ) C ( σ F2 ( E 2 ) ) σ F1 F2 ( E 1 C E 2 ) ( σ F1 ( E 1 ) ) C ( σ F2 ( E 2 ) ) size 12{σ rSub { size 8{F1 and F2} } \( E rSub { size 8{1} } ⊳ ⊲ rSub { size 8{C} } E rSub { size 8{2} } \) equiv \( σ rSub { size 8{F1} } \( E rSub { size 8{1} } \) \) ⊳ ⊲ rSub { size 8{C} } \( σ rSub { size 8{F2} } \( E rSub { size 8{2} } \) \) } {}

  • If the selection condition F = F1 AND F2 where F1 involves only attributes of expression E1 and F2 involves attributes from both E1 and E2 then we have:

σ F1 F2 ( E 1 C E 2 ) σ F2 ( ( σ F1 ( E 1 ) ) C E 2 ) σ F1 F2 ( E 1 C E 2 ) σ F2 ( ( σ F1 ( E 1 ) ) C E 2 ) size 12{σ rSub { size 8{F1 and F2} } \( E rSub { size 8{1} } ⊳ ⊲ rSub { size 8{C} } E rSub { size 8{2} } \) equiv σ rSub { size 8{F2} } \( ` \( σ rSub { size 8{F1} } \( E rSub { size 8{1} } \) \) ⊳ ⊲ rSub { size 8{C} } E rSub { size 8{2} } \) } {}

The same rule apply if the Join operation replaced by a Cartersian Product operation.

  • Commuting Projection with Join and Cartesian Product
  • Let X, Y be the set of attributes of E1 and E2 respectively. If the join condition involves only attributes in XY (union of two sets) then :

π XY ( E 1 F E 2 ) π X ( E 1 ) F π Y ( E 2 ) π XY ( E 1 F E 2 ) π X ( E 1 ) F π Y ( E 2 ) size 12{π rSub { size 8{ ital "XY"} } \( E rSub { size 8{1} } ⊳ ⊲ rSub { size 8{F} } E rSub { size 8{2} } \) equiv π rSub { size 8{X} } \( E rSub { size 8{1} } \) ⊳ ⊲ rSub { size 8{F} } π rSub { size 8{Y} } \( E rSub { size 8{2} } \) } {}

The same rule apply when replace the Join by Cartersian Product

  • If the join condition involves additional attributes say Z of E1 and W of E2 and Z,W are not in XY then :

π XY ( E 1 F E 2 ) π XY ( π XZ ( E 1 ) F π YW ( E 2 ) ) π XY ( E 1 F E 2 ) π XY ( π XZ ( E 1 ) F π YW ( E 2 ) ) size 12{π rSub { size 8{ ital "XY"} } \( E rSub { size 8{1} } ⊳ ⊲ rSub { size 8{F} } E rSub { size 8{2} } \) equiv π rSub { size 8{ ital "XY"} } \( π rSub { size 8{ ital "XZ"} } \( E rSub { size 8{1} } \) ⊳ ⊲ rSub { size 8{F} } π rSub { size 8{ ital "YW"} } \( E rSub { size 8{2} } \) \) } {}

  • Commuting Selection with set operations

The Selection commutes with all three set operations (Union, Intersect, Set Difference) .

σ F ( E 1 E 2 ) ( σ F ( E 1 ) ) ( σ F ( E 2 ) ) σ F ( E 1 E 2 ) ( σ F ( E 1 ) ) ( σ F ( E 2 ) ) size 12{σ rSub { size 8{F} } \( E rSub { size 8{1} } union E rSub { size 8{2} } \) equiv \( σ rSub { size 8{F} } \( E rSub { size 8{1} } \) \) union \( σ rSub { size 8{F} } \( E rSub { size 8{2} } \) \) } {}

The same rule apply when replace Union by Intersection or Set Difference

  • Commuting Projection with Union

π X ( E 1 E 2 ) ( π X ( E 1 ) ) ( π X ( E 2 ) ) π X ( E 1 E 2 ) ( π X ( E 1 ) ) ( π X ( E 2 ) ) size 12{π rSub { size 8{X} } \( E rSub { size 8{1} } union E rSub { size 8{2} } \) equiv \( π rSub { size 8{X} } \( E rSub { size 8{1} } \) \) union \( π rSub { size 8{X} } \( E rSub { size 8{2} } \) \) } {}

  • Commutativity of set operations: The Union and Intersection are commutative but Set Difference is not.

E 1 E 2 E 2 E 1 E 1 E 2 E 2 E 1 E 1 E 2 E 2 E 1 E 1 E 2 E 2 E 1 alignl { stack { size 12{E rSub { size 8{1} } union E rSub { size 8{2} } equiv E rSub { size 8{2} } union E rSub { size 8{1} } } {} # E rSub { size 8{1} } intersection E rSub { size 8{2} } equiv E rSub { size 8{2} } intersection E rSub { size 8{1} } {} } } {}

  • Associativity of set operations: Union and Intersection are associative but Set Difference is not

( E 1 E 2 ) E 3 E 1 ( E 2 E 3 ) ( E 1 E 2 ) E 3 E 1 ( E 2 E 3 ) ( E 1 E 2 ) E 3 E 1 ( E 2 E 3 ) ( E 1 E 2 ) E 3 E 1 ( E 2 E 3 ) alignl { stack { size 12{ \( E rSub { size 8{1} } union E rSub { size 8{2} } \) union E rSub { size 8{3} } equiv E rSub { size 8{1} } union \( E rSub { size 8{2} } union E rSub { size 8{3} } \) } {} # \( E rSub { size 8{1} } intersection E rSub { size 8{2} } \) intersection E rSub { size 8{3} } equiv E rSub { size 8{1} } intersection \( E rSub { size 8{2} } intersection E rSub { size 8{3} } \) {} } } {}

  • Converting a Catersian Product followed by a Selection into Join.

If the selection condition corresponds to a join condition we can do the convert as follows:

σ F ( E 1 × E 2 ) E 1 F E 2 σ F ( E 1 × E 2 ) E 1 F E 2 size 12{σ rSub { size 8{F} } \( E rSub { size 8{1} } times E rSub { size 8{2} } \) equiv E rSub { size 8{1} } ⊳ ⊲ rSub { size 8{F} } E rSub { size 8{2} } } {}

2.1.2 Example of Transformation

Consider the following query on COMPANY database: “Find the name of employee born after 1967 who work on a project named ‘Greenlife’ “.

The SQL query is:

SELECT Name

FROM EMPLOYEE E, JOIN J, PROJECT P

WHERE E.EID = J.EID and PCode = Code and Bdate > ’31-12-1967’ and P.Name = ‘Greenlife’;

The initial query tree for this SQL query is

Figure 2: Initial query tree for query in example
Figure 2 (graphics3.png)

We can transform the query in the following steps:

- Using transformation rule number 7 apply on Catersian Product and Selection operations to moves some Selection down the tree. Selection operations can help reduce the size of the temprary relations which involve in Catersian Product.

Figure 3: Move Selection down the tree
Figure 3 (graphics4.png)

- Using rule number 13 to convert the sequence (Selection, Cartesian Product) in to a Join. In this situation, since the Join condition is equality comparison in same attributes so we can convert it to Natural Join.

Figure 4: Combine Catersian Product with subsequent Selection into Join
Figure 4 (graphics5.png)

- Using rule number 8 to moves Projection operations down the tree.

Figure 5: Move projections down the query tree
Figure 5 (graphics6.png)

2.2 Heuristic Algebraic Optimization algorithm

Here are the steps of an algorithm that utilizes equivalence rules to transfrom the query tree.

  • Break up any Selection operation with conjunctive conditions into a cascade of Selection operations. This step is based on equivalence rule number 4.
  • Move selection operations as far down the query tree as possible. This step use the commutativity and associativity of selection as mentioned in equivalence rule number 5,6,7 and 9.
  • Rearrange the leaf nodes of the tree so most restrictive selections are done first. Most restrictive selection is the one that produces the fewest number of tuples. In addition, make sure that the ordering of leaf nodes does not cause the Catersian Product operation. This step relies on the rules of associativity of binary operations such as rule 2 and 12
  • Combine a Cartesian Product with a subsequent Selection operation into a Join operation if the selection condition represents a join condition (rule 13)
  • Break down and move lists of projection down the tree as far as possible. Creating new Projection operations as needed (rule 3, 6, 8, 10)
  • Identify sub-tree that present groups of operations that can be pipelined and executing them using pipelining

The previous example illustrates the transforming of a query tree using this algorithm.

2.3 Converting query tree to query evaluation plan

Query optimizers use the above equivalence rules to generate a enumeration of logically equivalent expressions to the given query expression. However, expression generating is just one part of the optimization process. As mentioned above, the evaluation plan include the detail algorithm for each operation in the expression and how the execution of the operations is coordinated. The figure 6 shows an evaluation plan.

Figure 6: An evaluation plan
Figure 6 (graphics7.png)

As we know, the output of Parsing and Translating step in the query processing is a relational algebra expression. For a complex query, this expression consists of several operations and interact with various relations. Thus the evaluation of the expression is very costly in terms of both time and memory space. Now we consider how to evaluate an expression containing multiple operations. The obvious way to evaluate the expression is simply evaluate one operations at a time in an appropriate order. The result of an individual evaluation will be stored in a temporary relation, which must be written to disk and might be use as the input for the following evaluation. Another approach is evaluate several operations simultaneously in a pipeline, in which result of one operation passed on to the next one, no temporary relation is created.

These two approaches for evaluating expression are materialization and pipelining.

Materialization

We will illustrate how to evaluate an expression using materialization approach by looking at an example expression.

Consider the expression

π Name , Address ( σ Dname = ' Human Re sourse ' ( DEPARTMENT ) EMPLOYEE ) π Name , Address ( σ Dname = ' Human Re sourse ' ( DEPARTMENT ) EMPLOYEE ) size 12{π rSub { size 8{ ital "Name", ital "Address"} } \( σ rSub { size 8{ ital "Dname"=' ital "Human""Re" ital "sourse"'} } \( ital "DEPARTMENT" \) * ital "EMPLOYEE" \) } {}

The corresponding query tree is

Figure 7: Sample query tree for a relational algebra expression
Figure 7 (graphics2.png)

When we apply the materialization approach, we start from the lowest-level operations in the expression. In our example, there is only one such operation: the SELECTION on DEPARTMENT. We execute this opeartion using the algorithm for it for example Retriving multiple records using a secondary index on DName. The result is stored in the temporary relations. We can use these temporary relation to execute the operation at the next level up in the tree . So in our example the inputs of the join operation are EMPLOYEE relation and temporary relation which is just created. Now, evaluate the JOIN operation, generating another temporary relation. The execution terminates when the root node is executed and produces the result relation for the query. The root node in this example is the PROJECTION applied on the temporary relation produced by executing the join.

Using materialized evaluation, every immediate operation produce a temporary relation which then used for evaluation of higher level operations. Those temporary relations vary in size and might have to stored in the disk . Thus, the cost for this evaluation is the sum of the costs of all operation plus the cost of writing/reading the result of intermediate results to disk (if it is applicable) .

Pipelining

We can improve query evaluation efficiency by reducing the number of temporary relations that are produced. To archieve this reduction, it is common to combining several operations into a pipeline of operations. For illustrate this idea, consider our example, rather being implemented seperately, the JOIN can be combined with the SELECTION on DEPARTMENT and the EMPLOYEE relation and the final PROJECTION operation. When the select operation generates a tuple of its result, that tuple is passed immediately along with a tuple from EMPLOYEE relation to the join. The join receive two tuples as input, process them , if a result tuple is generated by the join, that tuple again is passed immediately to the projection operation process to get a tuple in the final result relation.

We can implement the pipeline by create a query execution code

Using pipelining in this situation can reduce the number of temporary files, thus reduce the cost of query evaluation. And we can see that in general, if pipelining is applicable, the cost of the two approaches can differ substantially. But there are cases where only materialization is feasible.

3. Cost Estimates in Query Optimization

Typically, a query optimizer is not only depended on heuristic rules but also on estimating and compare the cost of executing different plans then choose the query execution plans with lowest cost.

3.1 Measure of Query Cost

The cost of a query execution plan includes the following components:

  • Access cost to secondary storage: This is the cost of searching for, reading, writing data blocks of secondary storage such as disk.
  • Computation cost: This is the cost of performing in-memory operation on the data buffer during execution. This can be considered as CPU time to execute a query
  • Storage cost: This is the cost of storing immediate files that are generated during execution
  • Communication cost: This is the cost of transfering the query and its result from site to site ( in a distributed or parallel database system)
  • Memory usage cost: Number of buffers needed during execution.

In a large database, access cost is usually the most important cose since disk accesses are slow compared to in-memory operations.

In a small database, when almost data reside in the memory, the emphasis is on computation cost. In the distributed system, communication cost should be minimize.

It is difficult to include all the cost components in a cost function. Therefore, some cost functions consider only disk access cost as the reasonable measure of the cost of a query-evaluation plan.

3.2 Catalog Information for Cost Estimation

Query optimizers use the statistic information stored in DBMS catalog to estimate the cost of a plan. The relevant catalog information about the relation includes:

  • Number of tuples in a relation r; denote by nr
  • Number of blocks containing tuple of relation r: br
  • Size of the tuple in a relation r ( assume records in a file are all of same types): s­r
  • Blocking factor of relation r which is the number of tuples that fit into one block: fr
  • V(A,r) is the number of distinct value of an attribute A in a relation r. This value is the same as size of πA(r)πA(r) size 12{π rSub { size 8{A} } \( r \) } {}. If A is a key attribute then V(A,r) = nr
  • SC(A,r) is the selection cardinality of attribute A of relation r. This is the average number of records that satisfy an equality condition on attribute A.

In addition to relation information, some information about indices is also used:

  • Number of levels in index i.
  • Number of lowest –level index blocks in index i ( number of blocks in leaf level of the index)

The statistical information listed here is simplified. The optimizer on real database management system might have further information to improve the accuracy of their cost estimates.

With the statistical information maintained in DBMS catalog and the measures of query cost based on number of disk accesses, we can estimate the cost for different relational algebra operations. In here, we will give an simple example of using cost model to estimate the cost for selection operation. However, we are not intend to go into detail of this issue in this course, please refer to the textbook and reference books if you want to go deeper in this issue.

Example of cost functions for SELECTION

Consider a selection operation on a relation whose tuples are all stored in one file. The simplest algorithms to implement seletion are linear search and binary search.

  • Linear search: Scan all file blocks, all records in a block are checked to see whether they satisfy the search condition. In general, cost for this method is C=br. For a selection on a key attribute, half of the blocks are scanned on average, so C = (br/2)
  • Binary search: If the file is ordered on an attribute A and selection condition is a equality comparison on A, we can use binary search. The estimate number of blocks to be scan is C=log2(br)+SC(A,r)fr1C=log2(br)+SC(A,r)fr1 size 12{C= lceil "log" rSub { size 8{2} } \( b rSub { size 8{r} } \) rceil + lceil { { ital "SC" \( A,r \) } over { ital "fr"} } rceil - 1} {}

The first term is the cost to locate the first satisfied tuple by a binary search, the second term is the number of blocks contains records that satisfy the select condition of which one has already been retrieved that why we have the third term

Now, consider a selection in EMPLOYEE file

σ DeptId = 1 ( EMPLOYEE ) σ DeptId = 1 ( EMPLOYEE ) size 12{σ rSub { size 8{ ital "DeptId"=1} } \( ital "EMPLOYEE" \) } {}

The file EMPLOYEE has the following statistical information:

  • f = 20 (there are 20 tuples can fit in one block)
  • V(DeptID, EMPLOYEE) = 10 (there are 10 different departments)
  • n = 1000 ( there are 1000 tuples in the file)

Cost for doing linear search is b = 1000/20 = 50 block accesses

Cost for doing binary search on ordering attribute DeptID:

  • Average number of records that satisfy the condition is : 1000/10 = 100 records
  • Number of blocks contains these tuples is: 100/20 = 5
  • A binary search for the first tuple would take log250 = 6
  • Thus the total cost is : 5 + 6 – 1 = 10 block accesses

Content actions

Download module as:

PDF | EPUB (?)

What is an EPUB file?

EPUB is an electronic book format that can be read on a variety of mobile devices.

Downloading to a reading device

For detailed instructions on how to download this content's EPUB to your specific device, click the "(?)" link.

| More downloads ...

Add module to:

My Favorites (?)

'My Favorites' is a special kind of lens which you can use to bookmark modules and collections. 'My Favorites' can only be seen by you, and collections saved in 'My Favorites' can remember the last module you were on. You need an account to use 'My Favorites'.

| A lens I own (?)

Definition of a lens

Lenses

A lens is a custom view of the content in the repository. You can think of it as a fancy kind of list that will let you see content through the eyes of organizations and people you trust.

What is in a lens?

Lens makers point to materials (modules and collections), creating a guide that includes their own comments and descriptive tags about the content.

Who can create a lens?

Any individual member, a community, or a respected organization.

What are tags? tag icon

Tags are descriptors added by lens makers to help label content, attaching a vocabulary that is meaningful in the context of the lens.

| External bookmarks