<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE document PUBLIC "-//CNX//DTD CNXML 0.5 plus MathML//EN" "http://cnx.rice.edu/cnxml/0.5/DTD/cnxml_mathml.dtd">
<document xmlns="http://cnx.rice.edu/cnxml" xmlns:md="http://cnx.rice.edu/mdml/0.4" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:bib="http://bibtexml.sf.net/" id="id9702365">
  <name>Introduction to SQL</name>
  <metadata>
  <md:version>1.1</md:version>
  <md:created>2008/07/01 16:20:51.610 GMT-5</md:created>
  <md:revised>2008/07/01 18:31:24.984 GMT-5</md:revised>
  <md:authorlist>
      <md:author id="okaram">
      <md:firstname>Orlando</md:firstname>
      
      <md:surname>Karam</md:surname>
      <md:email>okaram@spsu.edu</md:email>
    </md:author>
  </md:authorlist>

  <md:maintainerlist>
    <md:maintainer id="okaram">
      <md:firstname>Orlando</md:firstname>
      
      <md:surname>Karam</md:surname>
      <md:email>okaram@spsu.edu</md:email>
    </md:maintainer>
  </md:maintainerlist>
  
  <md:keywordlist>
    <md:keyword>SQL, Relational Databases</md:keyword>
  </md:keywordlist>

  <md:abstract>This is an introduction to SQL (Structured Query Language), the language used by most relational database management systems</md:abstract>
</metadata>
  <content>
    <section id="id-0157427043498">
      <name>Introduction to SQL</name>
      <para id="id9482574">SQL stands for Structured Query Language.. It is the standard language for relational databases. Basically every relational DBMS (RDBMS) implements SQL, although with slight variations.</para>
      <para id="id10329340">SQL can be divided into three subsets:</para>
      <list type="bulleted" id="id10789569">
        <item>The Data Definition Language (DDL), which includes commands for creating schema objects (tables, views etc) and manipulating them. It deals only with the metadata.</item>
        <item>The Data Manipulation Language (DML), which includes the commands for changing or retrieving the actual data.</item>
        <item>The Data Control Language (DCL), which includes the commands for database administration; that is, creating users, assigning permissions etc</item>
      </list>
      <para id="id10270823">SQL commands are English based. SQL is NOT case sensitive, except for data inside quotes; however there is a strong stylistic convention that requires writing keywords in all uppercase. Character constants are enclosed in SINGLE QUOTES, while double quotes are reserved for spelling names which contain special characters. </para>
      <para id="id11086093">Although SQL statements do NOT need to end in a semicolon, most SQL interpreters use this character (;) to represent the end of a statement. That way you can use line breaks to make your statement more readable. SQL ignores extra white spaces and line breaks.</para>
      <section id="id-286418236152">
        <name>Data Definition Language</name>
        <section id="id-553353161824">
          <name>CREATE TABLE statement</name>
          <para id="id10542880">The command to create a new table is CREATE TABLE; after the command we include the table name, and then a list of fields and constraints, separated by commas. For each field we include the field name, the data type, and any column constraints, that is constraints that affect only one column or field.</para>
          <para id="id9934052">For example, a simple table with three fields would look like this:</para>
          <example id="element-178"><name>Simple CREATE TABLE statement</name><code type="block">CREATE TABLE Book (
	ISBN  CHAR(9)     PRIMARY KEY,
	Title VARCHAR(20) UNIQUE NOT NULL,
	Pages Integer
);
</code>
</example>
          <para id="element-876">In this example, ISBN is a character field, with exactly 9 characters, and is the primary key. Title is a character field with up to 20 characters, which is unique (no two rows have the same value) and cannot be null, and pages is an integer. Notice ISBN has a column constraint, specifying that it is the primary key, and title has two constraints, UNIQUE and NOT NULL.</para><section id="id-0395664531452">
            <name>Common Datatypes</name>
            <para id="id11473899">Although the full set of datatypes available varies with each DBMS, the following datatypes are among the most useful and commonly available:</para>
            <list type="bulleted" id="id10827884">
              <item>CHAR(n) This is a fixed-length character string. If you insert a value with less than the full number of characters, it gets 'padded' with spaces.</item>
              <item>VARCHAR(n) This is a variable-length character string, with a maximum size specified.</item>
              <item>NUMERIC(prec,scale) This is a fixed-point number. The first number (the precision) specifies the TOTAL number of digits available, while the second (the scale) specifies how many of those are after the decimal period.</item>
              <item>DATE, TIMESTAMP This are data types for representing moments in time. Depending on the DBMS these may be synonyms. Normally date constants are denoted by a string (enclosed by quotes), but the particular format of the string depends on the locale and the DBMS. Oracle (and other DBMSs) support a function called to_date that takes two parameters, the string for the date and the format, such that to_date('01/23/2008','mm/dd/yyyy') would return the date for January 23, 2008.</item>
            </list>
          </section>
          <section id="id-390231143682">
            <name>Constraints</name>
            <para id="id6559631">In SQL we can attach constraints to each column or field, and we can also attach constraints to the whole table. The table constraints are added after all fields in the CREATE TABLE statement. For example, we can write the book table from Example 1, as follows:</para>
            <example id="element-235"><name>CREATE TABLE statement with table constraints</name><code type="block">
CREATE TABLE Book (
	ISBN  CHAR(9)     ,
	Title VARCHAR(20) NOT NULL,
	Pages Integer,
	CONSTRAINT Book_PK PRIMARY KEY(Isbn),
	CONSTRAINT Book_title UNIQUE (Title)
);
</code></example><para id="id10981355">Notice that this notation allows us to give a name to each constraint (Book_PK and Book_title in our case), and also that NOT NULL has to be expressed as a column constraint.</para>
            <para id="id9052361">SQL constraints include the following:</para>
            <list type="bulleted" id="id10190887">
              <item>NOT NULL</item>
              <item>PRIMARY KEY</item>
              <item>UNIQUE</item>
              <item>FOREIGN KEY ... REFERENCES this constraint allows us to express a foreign key.</item>
              <item>CHECK This constraint allows us to add an arbitrary predicate to be tested </item>
            </list>
            <para id="id10766169">We can also assign a default value to a column, by adding the keyword DEFAULT and then the value after the column, as if it was a constraint.</para>
            <para id="id10036406">A more complete example for a SQL create table follows:</para>
            <example id="element-361"><name>Another example of constraints in a CREATE TABLE</name><code type="block">CREATE TABLE Person (
	Id     INTEGER     PRIMARY KEY,
	SSN    CHAR(9)     UNIQUE,
	Name   VARCHAR(20) NOT NULL,
	Age    INTEGER     DEFAULT 18 
                         CHECK (Age BETWEEN 10 AND 100),
	Major  CHAR(3) REFERENCES Major(Id)
);</code>
</example><para id="id11009859">If we want to do composite primary keys, and composite foreign key references, we need to express them as table constraints, since they constrain more than one column, the syntax is as in the following example:</para><example id="element-777"><name>Composite primary key and foreign key references</name><code type="block">
CREATE TABLE Standing (
  deg_code      char(2) REFERENCES Degree(deg_code),
  min_cr        INTEGER DEFAULT 0 NOT NULL,
  max_cr        INTEGER NOT NULL,
  num           INTEGER NOT NULL,
  designation   VARCHAR(20) NOT NULL,
  CONSTRAINT Standing_PK
        PRIMARY KEY (deg_code, num),
  CONSTRAINT Standing_Unique_Designation
        UNIQUE (deg_code, designation),
  CONSTRAINT Standing_min_max
        CHECK (min_cr &lt;= max_cr)
);
</code></example>
          </section>
        </section>
        
        <section id="id-476927651692">
          <name>Other DDL statements</name>
          <para id="id11145331">Other DDL statements include:</para>
          <list type="bulleted" id="id10016722">
            <item>DROP TABLE that allows us to completely eliminate a table (not just the data, but also the schema)</item>
            <item>ALTER TABLE that allows us to modify the schema of a table </item>
          </list>
          <para id="id10689394">We also can create other kinds of schema objects, including views and indexes. In general, to create a schema object of a given kind, we will use CREATE and then the kind of object (TABLE,VIEW, INDEX etc), the name, and then syntax specific for that kind of object. To eliminate that object we use DROP, then the type (TABLE, VIEW, INDEX ...) and the name of the object.</para>
        </section>
      </section>
      <section id="id-645509330007">
        <name>Data Manipulation Language (DML)</name>
        <para id="id6892940">SQL's DML includes statements to do alter the rows in a table, and to get data from one or more tables. Notice that this statements do NOT alter the schema at all, only the data in the table. The main statements are:</para>
        <list type="bulleted" id="id11446608">
          <item>INSERT INTO This statement allows you to insert one or more rows into a table</item>
          <item>DELETE FROM This statement allows you to delete one or more rows from a table</item>
          <item>UPDATE This statement allows you to change one or more fields from one or more rows</item>
          <item>SELECT This statement allows you to get data from one or more tables. It does NOT change the data in any way. </item>
        </list>
        <para id="id11466276">Since the SELECT statement is more complex, we devote a separate section to it, after covering the other DML statements.</para>
        <section id="id-0463837222688">
          <name>INSERT INTO</name>
          <para id="id11466556">This statement allows you to add a row to a table. Its basic syntax is as follows:</para>
          <example id="element-919"><name>INSERT INTO syntax</name><code type="block">INSERT INTO tableName (field1,field2,...) VALUES (value1, v2,...)</code></example><para id="id10028002">It is not necessary to list all fields, but it is a good idea. If you don't list fields, then it is assumed that the values correspond to all fields in the table in the order they were defined (which means that if the schema changes, your statement will probably result in a syntax error).</para>
          <para id="id10737291">As an example, if we were to insert one row into the Person table of Example 3, the statement would look like this:</para><example id="element-633"><name> INSERT INTO example. Notice the use of single quotes for string constants</name><code type="block">
INSERT INTO Person (Id,SSN,Name,Age,Major) VALUES (1, '11111','Orlando',30,'CS')
</code></example>
        </section>
        <section id="id-41273056187">
          
          <name> DELETE FROM</name><para id="id10073293">The DELETE FROM statement allows you to delete one or more rows from a table. If it includes a WHERE clause, it deletes only those rows that satisfy the predicate, if there is no WHERE clause then all rows will be deleted. Notice this statement does NOT alter the schema of the table, even if it deletes all the rows.</para>
          <para id="id10872540">Following is an example for deleting all people who are younger than 20 from the person table.</para><example id="element-472"><name>DELETE all rows from the person table that have a value smaller than 20 for the age field </name><code type="block">DELETE FROM Person WHERE Age &lt; 20</code></example>
        </section>
        <section id="id-479037983556">
          <name>UPDATE</name>
          <para id="id10224873">The UPDATE statement allows you to change one or more fields from one or more rows of a table. It modifies all rows or just some of them depending on the WHERE clause.</para>
          <example id="element-674"><name>Increase by 1 the ages of all people</name><code type="block">UPDATE Person
SET Age=Age+1;
</code>

</example><example id="element-184"><name>Change some fields of just the row with id 1</name><code type="block">UPDATE Person
SET Name='Orlando', Age=35
WHERE Id=1;</code></example>
        </section>
        <section id="id-939300603407">
          <name>Transactions</name>
          <para id="id8446655">Relational DBMSs support the concept of a transaction, to group a set of statements that should occur together. A transaction can be committed (with its effects recorded permanently) with the COMMIT statement, or rolled back (with the effects completely undone), with the ROLL BACK statement.</para>
        </section>
      </section>
      <section id="id-443565626653">
        <name>The SELECT Statement</name>
        <para id="id10982166">The SELECT statement allows you to get data from one or more tables. It does NOT change the data stored in any way, it just allows you to read it. The SELECT Statement is probably the most complex SQL statement, since it needs to allow you for complex conditions and to get data from several tables. </para>
        <para id="id8472781">In its basic form, the SELECT statement contains three clauses, although the WHERE clause is actually optional. After the SELECT keyword, we specify the fields (or expressions) we want to retrieve; then we add the FROM keyword and the table from which we are getting the data, and finally a WHERE clause that specifies which rows are to be returned.</para>
        <para id="id10911923"><table id="id6060347"><name>Table 1: Sample data for Person table for simple select examples</name><tgroup cols="5"><colspec colnum="1" colname="c1"/><colspec colnum="2" colname="c2"/><colspec colnum="3" colname="c3"/><colspec colnum="4" colname="c4"/><colspec colnum="5" colname="c5"/><tbody><row><entry>Id</entry><entry>Name</entry><entry>Age</entry><entry>Gender</entry><entry>Country</entry></row><row><entry>1</entry><entry>Orlando</entry><entry>35</entry><entry>M</entry><entry>us</entry></row><row><entry>2</entry><entry>Lina</entry><entry>25</entry><entry>F</entry><entry>mx</entry></row><row><entry>3</entry><entry>Jose</entry><entry>45</entry><entry>M</entry><entry>us</entry></row><row><entry>4</entry><entry>Krishnapriya</entry><entry>27</entry><entry>F</entry><entry>in</entry></row></tbody></tgroup></table>Table 1: Sample data for Person table for simple select examplesIn order to illustrate the SELECT statement, we will use a table called Person containing five fields, and the sample data as in the following table (Table 1). We record the person's id, name, age, gender and the country they were born in (using two-letter ISO country codes for the countries, so us is the United States of America, mx is Mexico, and in is India)</para>
        <para id="id11273732">The simplest select statement uses * instead of a list of attributes (retrieving ALL columns from the table) and does not use a WHERE clause (retrieving ALL rows), so the statement to list all data in the person table would be:</para>
        
        <example id="element-356"><name>Retrieving all data from Person table</name><code type="block">SELECT *
FROM Person</code></example><para id="id8735374">We can also specify a list of fields, or even expressions. When using expressions, it is often convenient to give the expression a nice name, so SQL allows us to alias a field or expression, by writing the keyword AS and then alias after the field name or expression (actually, the AS keyword is optional in most DBMSs, although PostgreSQL requires it).</para>
        <para id="id8735378">So the following would be a valid example:</para>
        
        <example id="element-614"><name> Retrieving all data from Person table</name><code type="block">
SELECT Id AS Identifier, Name, Age+5 AS AgeIn5 FROM Person
</code></example><para id="element-746">and it would return data as follows:</para><para id="id11439889"><table id="id8597603"><name>Table 2: Sample result from Example above, notice change on column names</name><tgroup cols="3"><colspec colnum="1" colname="c1"/><colspec colnum="2" colname="c2"/><colspec colnum="3" colname="c3"/><tbody><row><entry>Identifier</entry><entry>Name</entry><entry>AgeIn5</entry></row><row><entry>1</entry><entry>Orlando</entry><entry>40</entry></row><row><entry>2</entry><entry>Lina</entry><entry>30</entry></row><row><entry>3</entry><entry>Jose</entry><entry>50</entry></row><row><entry>4</entry><entry>Krishnapriya</entry><entry>32</entry></row></tbody></tgroup></table>
</para>
        <para id="element-527">Notice that the last field retrieved uses an expression, Age+5. Most DBMSs support the standard math operators, many of the basic math functions (floor, sin, cos etc), and many string functions (upper, lower, substr etc); consult your DBMS's manual.</para><para id="id10820703">SQL statements become slightly more interesting when we add conditions, which allow us to retrieve only certain rows, rather than the whole table. We can add conditions with the WHERE clause, and can use the traditional comparison operators (&lt;, &lt;= etc) and combine them with the traditional logical operators (AND, OR, NOT). Differently from most programming languages, SQL uses the actual words AND OR and NOT for those operators. The precedence of the operators is the usual (NOT, then AND, finally OR), and you can use parenthesis to alter that precedence.</para>
        <para id="id10820706">So, a query that would return only females who are older than 20 would be:</para>
        
        <example id="element-987"><name>Retrieving Females who are older than 20</name><code type="block">SELECT *FROM Person
WHERE Gender='F' AND Age&gt;20</code></example><section id="id-268582211967">
          <name>Aggregate functions</name>
          <para id="id9493873">SQL also allows you to use functions that compute a result based on several rows; we call those aggregate functions. SQL 92 supports the functions COUNT, SUM, AVG, MAX and MIN, with the meaning suggested by their name.</para>
          <para id="id10145339">For example, we can get the number of rows in the Person table by writing:</para>
          
          <example id="element-768"><name>Retrieving number of rows from person table</name><code type="block">SELECT COUNT(*)
FROM Person;</code></example><para id="id6617029">We can, of course, use a WHERE clause to control wich rows gets counted, or included in the aggregate. For example, if we wanted to know how many females there are we would write:</para>
          
          <example id="element-224"><name>Retrieving number of females from person table</name><code type="block">SELECT COUNT(*)
FROM Person
WHERE Gender='F';</code></example><para id="id10706565">Now, what if we need to get the number of people per gender ? Given that there are only two genders, maybe we could just write two separate queries, one for females and one for males, and put together the results in a piece of paper; but what if we need the number of people per country of origin ? We will probably have many countries, plus our set of queries will only be valid for a particular moment in time, since the set of countries may change.</para>
          <para id="id11260986">SQL allows computing aggregates over groups of rows, where the rows in a table are divided into groups based on some field or expressions over those fields; we do that by using the aggregates, and adding a GROUP BY clause to our SELECT statement. For example, the following query:</para>
          
          <example id="element-183"><name>Retrieving number of people per country</name><code type="block">SELECT Country, COUNT(*) as NumPeople, AVG(Age)
FROM Person 
GROUP BY Country;</code></example><para id="id10768122">Would retrieve a table containing all countries in the person table, with the number of people per country and their average age. All rows in the person table are grouped according to their country, and the aggregate (count) is calculated for each group. Conceptually, we can think of the table being sorted on the fields mentioned in the GROUP BY clause, and then the totals calculated per group.</para>
          <para id="id9727559">Conceptually, we can view the sample data organized as in Table 3 (each group of rows is marked in a different color):</para>
          <para id="id11421729"><table id="id11285959"><name>Table 3: Sample data for Person table grouped according to country (same as table 1, resorted)</name><tgroup cols="5"><colspec colnum="1" colname="c1"/><colspec colnum="2" colname="c2"/><colspec colnum="3" colname="c3"/><colspec colnum="4" colname="c4"/><colspec colnum="5" colname="c5"/><tbody><row><entry>Id</entry><entry>Name</entry><entry>Age</entry><entry>Gender</entry><entry>Country</entry></row><row><entry>1</entry><entry>Orlando</entry><entry>35</entry><entry>M</entry><entry>us</entry></row><row><entry>3</entry><entry>Jose</entry><entry>45</entry><entry>M</entry><entry>us</entry></row><row><entry>2</entry><entry>Lina</entry><entry>25</entry><entry>F</entry><entry>mx</entry></row><row><entry>4</entry><entry>Krishnapriya</entry><entry>27</entry><entry>F</entry><entry>in</entry></row></tbody></tgroup></table></para>
          <para id="id11012218">And then the totals calculated per group, yielding the result in table 4:</para>
          <para id="id11084624"><table id="id10043818"><name>Table 4: Result for executing example 15 on the sample data of table 3</name><tgroup cols="3"><colspec colnum="1" colname="c1"/><colspec colnum="2" colname="c2"/><colspec colnum="3" colname="c3"/><tbody><row><entry>Country</entry><entry>NumPepole</entry><entry>AvgAge</entry></row><row><entry>us</entry><entry>2</entry><entry>40</entry></row><row><entry>mx</entry><entry>1</entry><entry>25</entry></row><row><entry>in</entry><entry>1</entry><entry>27</entry></row></tbody></tgroup></table></para>
          <para id="id10726517">Notice that every field retrieved by the SELECT needs to either be an aggregate, or be mentioned in the GROUP BY expression. This is because if we tried to retrieve a normal field without grouping by it, the different rows in the group may have a different value for that field !</para>
          <para id="id9679114">Also, sometimes we want to restrict the results of the query based on some condition calculated after the grouping, say get only countries with more than one person; we can use the HAVING clause to do that; for example, to modify the above query to only return countries with more than one person we would write:</para>

<example id="element-913">
  <name>Retrieving number of people per country, only countries with more than one person</name>
<code type="block">
SELECT Country, COUNT(*) as NumPeople, AVG(Age)
FROM Person
GROUP BY Country
HAVING COUNT(*) &gt;= 1
</code>
</example>
          
        </section>
      </section>
      <section id="id-315692215975">
        <name>JOINS</name>
        <para id="id10123511">Up to now, we have only seen how to get results from a single table; but we oftentimes need to combine information from several tables; in fact, good relational design requires us to divide our information into many tables, and to get useful information we need to combine those tables.</para>
        <para id="id9682485">The main operation to combine information from several tables is the join operation. We can think of a join as a combination of a cartesian product of two tables (which gets us all pairs of rows, with one coming from each table) followed by a selection process (which allows us to get only those pairs that 'match' according to some predicate).</para>
        <para id="id10827244">SQL supports several ways to do joins. The simplest way, which I call an implicit join is to put two or more tables in the FROM clause, separated by commas. The join condition (that is, the predicate that decides which rows match) is added to the WHERE clause.</para>
        <para id="id10552136">For example, assume we have another table, called country, that gives us the name of the country given their iso code (we have an extra country that we'll use later):</para>
        <para id="id7826074"><table id="id11288756"><name>Table 5: Sample data for Country table</name><tgroup cols="2"><colspec colnum="1" colname="c1"/><colspec colnum="2" colname="c2"/><tbody><row><entry>Code</entry><entry>Name</entry></row><row><entry>us</entry><entry>United States of America</entry></row><row><entry>mx</entry><entry>Mexico</entry></row><row><entry>in</entry><entry>India</entry></row><row><entry>cn</entry><entry>People's republic of China</entry></row></tbody></tgroup></table>Table 5: Sample data for Country table</para>
        <para id="id10227411">So, if we want to get the names of all people with the name of the country they were born in, we can use the following query:</para>

<example id="ok17">
<name> Retrieve names of all people with their country's name</name>
<code type="block">
SELECT Person.Name, Country.Name
FROM Person, Country
WHERE Person.Country=Country.Code
</code>
</example>
<para id="okp1">
Notice how we need to use Person.Name and Country.Name in the first line, to disambiguate to which field we are referring to. For clarity, we also do it on the third line, but it is not necessary there (since there are no two fields called Country or Code). To save on typing but keep the clarity, I usually alias the tables to their initial letter, so the query looks like this:
</para>
<example id="ok1">
<name>Retrieve names of all people with their country's name, aliasing tables</name>
<code type="block">
SELECT P.Name, C.Name
FROM Person P, Country C
WHERE P.Country=C.Code
</code>
</example>
        <para id="id11251255">For a simple query like this, an implicit join is appropriate; however, when you are joining several tables, and have many conditions, some arising from the join and others from other requirements, the implicit join syntax can easily lead to mistakes. </para>
        <para id="id10609246">SQL provides another syntax for join statements, which can make the intent easier. Since this syntax includes the JOIN keyword, I call this an explicit join statement. The syntax involves writing the table names, with the keyword JOIN between them, followed by the keyword ON and a condition between parenthesis; for example,</para>
        <para id="id8780538">SELECT Person.Name, Country.NameFROM Person JOIN Country ON (Person.Country=Country.Code)Example 19: Retrieve names of all people with their country's name</para>
        <section id="id-00677962615549">
          <name>NATURAL joins and USING</name>
          <para id="id9457074">Most join conditions involve the equality predicate. SQL supports the concept of a NATURAL JOIN, which is a join in which the join condition is created implicitly, by requiring equality of all fields with the same name in both tables. </para>
          <para id="id8028077">In order to illustrate the syntax, if we want to get all people with the same name as their country (hey, it's my example :) we would type the following query:</para>
          
          <example id="element-603"><name>Retrieve info of all people with the same name as their country</name><code type="block">
SELECT *
FROM Person NATURAL JOIN Country 
</code></example><para id="element-354">I consider natural joins too brittle to use in real life applications; since adding a field to the table may silently change the results of the query. SQL also supports a JOIN ... USING syntax that is much better.  Rather than writing ON and the join condition, we write USING and then a list of fields which must match; so the above query with USING would look like:</para><example id="element-493"><name>Retrieve info of all people with the same name as their country</name><code type="block">SELECT *
FROM Person JOIN Country USING (Name)</code>
</example>
        </section>
        <section id="id-202756709938">
          <name>OUTER JOIN</name>
          <para id="id9055545">Many times we want to make sure ALL rows from a certain table appear in a join, even when there is no corresponding join on the other table. We can achieve this by using an OUTER JOIN. We need to specify whether we want a LEFT (that is, all the rows from the table on the left appear), RIGHT or FULL (rows from both tables re guaranteed to appear) OUTER JOIN.</para>
          <para id="id8975544">For example, if we want to get the name of each person with the name of their country, and we want to make sure ALL people appear, including those with no country in the database (so the country would be null) we need to do an OUTER join; as follows:</para>
          <example id="element-855"><name>Retrieve names of all people with their country's name, making sure ALL people appear</name><code type="block">
SELECT P.Name, C.NameFROM Person P LEFT OUTER JOIN Country C ON (P.Country=C.Code)
</code></example><para id="id10341596">Notice in this case we are using a LEFT outer join, since the Person table is to the left of the JOIN keyword and we want all rows in that table to appear. If we wanted to make sure all countries appear, we would use a RIGHT outer join in this case; and if we want to make sure both all people and all countries appear we would use a FULL outer join.</para>
        </section>
      </section>
      <section id="id-0214602026349">
        <name>Subqueries</name>
        <para id="id9541634">In most places where we use a table name or an expression that returns a list, SQL allows us to substitute another SQL query (a subquery). We can use subqueries in FROM clauses, or instead of a list, for operators that accept a list (IN, EXISTS, =ANY etc)</para>
        <para id="id10937304">For example, imagine that we add a new table, HasLived, that reflects the fact that a person has lived in a country. The table with sample data could look like Table x</para>
        <para id="id9178434"><table id="id7337596"><name>Table 6: Sample data for HasLived table</name><tgroup cols="2"><colspec colnum="1" colname="c1"/><colspec colnum="2" colname="c2"/><tbody><row><entry>Person</entry><entry>Country</entry></row><row><entry>1</entry><entry>mx</entry></row><row><entry>1</entry><entry>us</entry></row><row><entry>2</entry><entry>mx</entry></row><row><entry>3</entry><entry>us</entry></row><row><entry>4</entry><entry>in</entry></row><row><entry>4</entry><entry>us</entry></row></tbody></tgroup></table></para>
        <para id="element-281">From this and the previous tables, we can see that Orlando (with id 1) has lived in Mexico and the USA, that Lina has lived in Mexico etc</para><para id="id11473489">Now if we are asked to find the names of people who have lived in the country with id 'us' (namely the USA); the query would look like:</para>
        <example id="element-423"><name>Names of people who've lived in the country with id 'us'</name><code type="block">SELECT P.Name
FROM Person P JOIN HasLived Hl ON (P.Id=Hl.Person)
WHERE Hl.country = 'us'</code>
</example><para id="id10608079">But imagine that we are asked for the names of people who have NOT lived in the 'us'; we could be tempted to modify the condition in the WHERE clause above, as follows:</para>
        <example id="element-335"><name>(wrong) Names of people who've NOT lived in the country with id 'us'.</name><code type="block">SELECT P.Name
FROM Person P JOIN HasLived Hl ON (P.Id=Hl.Person)
WHERE Hl.country != 'us'</code>
</example><para id="id8735214">This query actually returns the names of people who've lived somewhere other than in 'us'The problem is that this query actually asks for the people who have lived in a country other than 'us', which, given that there may be more that one row per person (i.e. a person may have lived in more than one place), it is NOT an equivalent query. </para>
        <para id="id10563812">In our example, the only person who has not lived in 'us' is Lina (person with id 2); but the above query would also return the people with id 1 and 4, since they've lived in other countries. </para>
        <para id="id9320012">The right solution would look like:</para>
        <example id="element-107"><name>Names of people who've NOT lived in the country with id 'us'</name><code type="block">SELECT P.Name
FROM Person P WHERE Id NOT IN (
    SELECT Person
    FROM HasLived
    WHERE Country='us'
)
</code>
</example><para id="id10522065">Here the inner query selects the id's of people who have lived in 'us' and the main query selects all people except those.</para>
        <para id="id4598728">As another example, say we're asked to find people who have lived in both the country with id 'mx' and the one with id 'us'; again a naïve (and wrong) attempt would be:</para>
        <example id="element-134"><name>(wrong) Names of people who've lived in both 'us' and 'mx</name><code type="block">SELECT P.Name
FROM Person P JOIN HasLived Hl ON (P.Id=Hl.Person)
WHERE Hl.country = 'us' AND Hl.country='mx'
</code>
</example><para id="id10730268">This query would actually return no rows at all. The condition in the where clause, is tested for the same row, so we are actually requiring that the same country field be 'us' and 'mx', which is obviously impossible, since a field can only have one value for a given row.</para>
        <para id="id9896638">The correct query would use IN, as follows:</para>
<example id="element-759">
<name>Names of people who've lived in both 'us' and 'mx</name>
<code type="block">
SELECT P.NameFROM Person P WHERE Id IN (
    SELECT PersonFROM HasLivedWHERE Country='us') 
AND Id IN (
    SELECT PersonFROM HasLivedWHERE Country='mx');
</code>
</example>
        
        
      </section>
      
    </section>
  </content>
</document>
