Friday, October 12, 2012

SQL set operators


SQL set operators


1 Introduction

SQL set operators allows combine results from two or more SELECT statements. At first sight this looks similar to SQL joins although there is big difference. SQL joins tends to combine columns i.e. with each additionally joined table it is possible to select more and more columns. SQL set operators on the other hand combine rows from different queries with strong preconditions - all involved SELECTS must:
  • retrieve the same number of columns and
  • the data types of corresponding columns in each involved SELECT must be compatible (either the same or with possibility implicitly convert to the data types of the first SELECT statement).
Visually the difference can be explained as follows - joins tend to extend breadthways, but set operations in depth.
SQL join example
SQL set operator example
NB 1! All examples are created for Oracle database and written according to Oracle syntax. However it doesn't matter what database management system is used, many of them with (probably) very little modifications or even exactly the same can be used for every other DBMS supporting set operators. Exactly why they work or why not are described for OracleSQL Server and MySQL. If you need to use them for other DBMSes then you should check these examples yourself although I would be very pleased if you'd send me information what examples are not working on what DBMSes. I will include this info here along with your name.

Contents

1 Introduction
2 Set operator types and syntax
   2.1 Common facts to remember
   2.2 Used tables for examples
   2.3 UNION [DISTINCT] and UNION ALL
   2.4 EXCEPT [DISTINCT] and EXCEPT ALL
   2.5 INTERSECT [DISTINCT] and INTERSECT ALL
   2.6 Raising it to higher levels - are two table data equal?
3 Example usage for various DBMSes
   3.1 Oracle
   3.2 Microsoft SQL Server
   3.3 MySQL
   3.4 IBM DB2
4 References and more information

2 Set operator types and syntax

According to SQL Standard there are following Set operator types:
  • UNION [DISTINCT];
  • UNION ALL;
  • EXCEPT [DISTINCT];
  • EXCEPT ALL;
  • INTERSECT [DISTINCT];
  • INTERSECT ALL.
As we can see there are 3 basic types Union, Except and Intersect and all have 2 modifications either Distinct or All. SQL Standard does not enforce keyword Distinct and some DBMSes for example Oracle and SQL Server even do not allow it, therefore if you see just Union, Except or Intersect - these actually mean Union Distinct, Except Distinct and Intersect Distinct.
It is already clear from the very syntax that Distinct modification removes duplicates from the result set, but All modification retains them.
Query syntax is common for all of them:
<query1>      
<SET OPERATOR>
<query1>      
Each query1 and query2 is full-fledged SELECT statements with possible joins, subqueries and other constructions. There is also possibility to combine more than 2 SELECT statements with set operators among them.
Lets look in general overview what the result of each one of them is. Following chart defines 2 queries returning data and various set operator combinations among them.
QueryAQueryBQueryA UNION [DISTINCT] QueryBExample 1QueryA UNION ALL QueryBExample 3QueryA EXCEPT (MINUS) [DISTINCT] QueryB Example 10QueryB EXCEPT (MINUS) [DISTINCT] QueryA Example 11QueryA EXCEPT (MINUS) ALL QueryBExample 14QueryB EXCEPT (MINUS) ALL QueryAExample 15QueryA INTERSECT [DISTINCT] QueryBExample 17QueryA INTERSECT ALL QueryBExample 19
RigaRigaRigaRigaTallinnRigaVilniusRigaRiga
RigaRigaTallinnRigaStockholmTallinnVilniusVilniusRiga
RigaVilniusVilniusRigaTallinnVilniusHelsinkiVilnius
TallinnVilniusHelsinkiRigaTallinnHelsinki
TallinnVilniusStockholmRigaHelsinki
TallinnVilniusTallinnStockholm
VilniusHelsinkiTallinn
HelsinkiTallinn
HelsinkiVilnius
StockholmVilnius
Vilnius
Vilnius
Vilnius
Helsinki
Helsinki
Helsinki
Stockholm
Next chart contains the same QueryA but QueryC returns 0 rows, just to feel some possible quirks.
QueryAQueryCQueryA UNION [DISTINCT] QueryCExample 4QueryA UNION ALL QueryCExample 5QueryA EXCEPT (MINUS) [DISTINCT] QueryCExample 12QueryA EXCEPT (MINUS) ALL QueryCExample 16QueryC EXCEPT (MINUS) [DISTINCT] QueryAQueryC EXCEPT (MINUS) ALL QueryAQueryA INTERSECT [DISTINCT] QueryC Example 18QueryA INTERSECT ALL QueryC
RigaRigaRigaRigaRiga
RigaTallinnRigaTallinnRiga
RigaVilniusRigaVilniusRiga
TallinnHelsinkiTallinnHelsinkiTallinn
TallinnStockholmTallinnStockholmTallinn
TallinnTallinnTallinn
VilniusVilniusVilnius
HelsinkiHelsinkiHelsinki
HelsinkiHelsinkiHelsinki
StockholmStockholmStockholm

2.1 Common facts to remember

There are some facts which probably aren't obvious and should be mentioned. Let's expand requirements for queries to be combined using one of the set operators:
  • column count must be the same;
  • data types of retrieved columns should match or at least should be implicitly convertible by database;
  • one can use many set operators for example Query1 UNION ALL Query2 UNION ALL Query3 MINUS Query4 INTERSECT Query5. In such case one should look into used DB documentation what is the order of operators, because for example Oracle executes operators starting from left to right, but DB2 firstly executes Intersect;
  • Usually returned column names are taken from the first query;
  • Order by clauses for each individual query except the last one cannot be at all (Oracle) or are ignored (MySQL).
Some other facts:
  • UNION and INTERSECT operators are commutative, i.e. the order of queries is not important; it doesn't change the final result. See Example 1 andExample 2.
  • EXCEPT operator is NOT commutative, it IS important which query is first, which second using EXCEPT operator. See Example 10 and Example 11.
  • UNION, EXCEPT and INTERSECT used without anything or with DISTINCT returns only unique values. This is especially interesting when one query returning many nonunique rows is UNIONED to another query returning zero rows (Example 4). The final result contains fewer rows than first query.
  • If you know that result sets returned by each query are unique then use UNION ALL, because database doesn't know that and uses more (wasted) resources to filter out duplicates in case of UNION.
  • If you need determined ordering then use Order by clause in the last query. Don't assume that rows from first query will always be returned first.
  • If you need to distinguish which query produced rows then you can add some tag or flag column indicating which query produced them.
  • NULL values using set operators are considered to be equal to each other (Example 9).

2.2 Used tables for examples

Throughout this entire article we will use following tables and table data (the same data as used in tables above):
CREATE TABLE table1 (                       
  id INTEGER NOT NULL PRIMARY KEY,          
  city VARCHAR(10) NOT NULL);               
CREATE TABLE table2 (                       
  id INTEGER NOT NULL PRIMARY KEY,          
  city VARCHAR(10) NOT NULL);               
CREATE TABLE table3 (                       
  city VARCHAR(10) NOT NULL);               
INSERT INTO table1 VALUES (1, 'RIGA');      
INSERT INTO table1 VALUES (2, 'RIGA');      
INSERT INTO table1 VALUES (3, 'RIGA');      
INSERT INTO table1 VALUES (4, 'TALLINN');   
INSERT INTO table1 VALUES (5, 'TALLINN');   
INSERT INTO table1 VALUES (6, 'TALLINN');   
INSERT INTO table1 VALUES (7, 'VILNIUS');   
INSERT INTO table1 VALUES (8, 'HELSINKI');  
INSERT INTO table1 VALUES (9, 'HELSINKI');  
INSERT INTO table1 VALUES (10, 'STOCKHOLM');
INSERT INTO table2 VALUES (1, 'RIGA');      
INSERT INTO table2 VALUES (2, 'RIGA');      
INSERT INTO table2 VALUES (3, 'VILNIUS');   
INSERT INTO table2 VALUES (4, 'VILNIUS');   
INSERT INTO table2 VALUES (5, 'VILNIUS');   
INSERT INTO table2 VALUES (6, 'VILNIUS');   
INSERT INTO table2 VALUES (7, 'HELSINKI');  
COMMIT;                                     

2.3 UNION [DISTINCT] and UNION ALL

These usually are most widely used set operators. Quite many times one cannot get all the result from one Select statement. Then one of the UNIONS can help.
Graphically UNION can be visualised using Venn diagrams. Assume we have two row sets.
Two queries
Then Query1 UNION Query2 would be as follows. Grey area shows resultant set.
SQL Union
Of course the previous picture is very general visualisation and fully real just for sets which contains each element no more than once.
Query1 UNION ALL Query2 would be as follows:
SQL Union all

2.3.1 Examples

As we can see only unique rows are retuned in next example.

Example 1 Unions cities from table1 and table2.

SELECT city FROM table1 
UNION                   
SELECT city FROM table2;
                        
CITY                    
----------              
HELSINKI                
RIGA                    
STOCKHOLM               
TALLINN                 
VILNIUS                 

Example 2 Unions cities from table2 and table1. The query ordering is not important, result is the same, compare with Example 1.

SELECT city FROM table2 
UNION                   
SELECT city FROM table1;
                        
CITY                    
----------              
HELSINKI                
RIGA                    
STOCKHOLM               
TALLINN                 
VILNIUS                 
DO NOT ASSUME that Union always return ordered row set. It is NOT TRUE. It is just because of implementation model, i.e. sort is being done to filter out duplicates. At least from version 10 Oracle has possibility to do HASH UNIQUE operation, which doesn't sort rows and you won't get them back sorted. So ALWAYS use Order by clause if you need guaranteed order of rows.
Next example just combines the rows without filtering out duplicates.

Example 3 Unions ALL cities from table1 and table2.

SELECT city FROM table1 
UNION ALL               
SELECT city FROM table2;
                        
CITY                    
----------              
RIGA                    
RIGA                    
RIGA                    
TALLINN                 
TALLINN                 
TALLINN                 
VILNIUS                 
HELSINKI                
HELSINKI                
STOCKHOLM               
RIGA                    
RIGA                    
VILNIUS                 
VILNIUS                 
VILNIUS                 
VILNIUS                 
HELSINKI                
                        
17 rows selected.       

Example 4 UNION [DISTINCT] even with empty set may reduce number of rows. Compare result from first two queries with third query.

SELECT city FROM table1;
                        
CITY                    
----------              
RIGA                    
RIGA                    
RIGA                    
TALLINN                 
TALLINN                 
TALLINN                 
VILNIUS                 
HELSINKI                
HELSINKI                
STOCKHOLM               
                        
10 rows selected.       
                        
SELECT city FROM table3;
                        
no rows selected        
                        
SELECT city FROM table1 
UNION                   
SELECT city FROM table3;
                        
CITY                    
----------              
HELSINKI                
RIGA                    
STOCKHOLM               
TALLINN                 
VILNIUS                 

Example 5 UNION ALL with empty set gives the same result as without it.

SELECT city FROM table1 
UNION ALL               
SELECT city FROM table3;
                        
CITY                    
----------              
RIGA                    
RIGA                    
RIGA                    
TALLINN                 
TALLINN                 
TALLINN                 
VILNIUS                 
HELSINKI                
HELSINKI                
STOCKHOLM               
                        
10 rows selected.       

Example 6 Each query in Union must return the same number of columns.

SELECT * FROM table1                                         
UNION                                                        
SELECT city FROM table2;                                     
SELECT * FROM table1                                         
*                                                            
ERROR at line 1:                                             
ORA-01789: query block has incorrect number of result columns

Example 7 Of course query can be Unioned to itself. This time all rows are returned because combination of both columns is reviewed.

SELECT * FROM table1 
UNION                
SELECT * FROM table1;
                     
        ID CITY      
---------- ----------
         1 RIGA      
         2 RIGA      
         3 RIGA      
         4 TALLINN   
         5 TALLINN   
         6 TALLINN   
         7 VILNIUS   
         8 HELSINKI  
         9 HELSINKI  
        10 STOCKHOLM 
                     
10 rows selected.    
Along with subquery factoring clause (or common table expression clause, "with" clause) UNION ALL can be used to generate some sample data without having actual tables. It has become very popular in Oracle forums.

Example 8 Using "with" clause to generate sample test data to test inner join functionality.

WITH cities AS (                                                     
  SELECT 1 as cty_id, 'RIGA' as city FROM dual                       
  UNION ALL                                                          
  SELECT 2, 'TALLINN' FROM dual                                      
),                                                                   
streets AS (                                                         
  SELECT 1 as str_id, 1 as str_cty_id, 'BRIVIBAS' as street FROM dual
  UNION ALL                                                          
  SELECT 2, 2, 'NARVA MNT'FROM dual                                  
)                                                                    
SELECT city, street FROM cities                                      
INNER JOIN streets ON (str_cty_id = cty_id);                         
                                                                     
CITY    STREET                                                       
------- ---------                                                    
RIGA    BRIVIBAS                                                     
TALLINN NARVA MNT                                                    
NULL values are considered equal when using with set operators. This is different than usually, for example, testing for eaquality.

Example 9 Using "with" clause to generate two NULL values and unioning them.

WITH null1 AS (              
  SELECT NULL value FROM dual
),                           
null2 AS (                   
  SELECT NULL value FROM dual
)                            
SELECT value FROM null1      
UNION                        
SELECT value FROM null2;     
                             
V                            
-                            
                             
                             
1 row selected.              

2.4 EXCEPT [DISTINCT] and EXCEPT ALL

EXCEPT returns unique rows that are returned by the first query but are NOT returned by the second query. EXCEPT ALL does the same but retains cardinality, for example, if the first query returns two values of X and second only one, then EXCEPT won't return X but EXCEPT ALL would return one instance of X.
Oracle uses MINUS operator instead of EXCEPT, but the functionality is the same. None of the Oracle, SQL Server and MySQL has implemented EXCEPT ALL. It can be simulated using analytic functions as shown in Example 14 till Example 16.
Usually EXCEPT is used to compare date in different data sources (tables) to find differences, for example, differences in the same tables across test and production and/or actual copy and backup.
Visually Query1 EXCEPT Query2 can be expressed as follows:
SQL except (minus) A-B
Obviously diagram is not symmetric therefore for Query2 EXCEPT Query1 we get different picture:
SQL except (minus) B-A

2.4.1 Examples


Example 10 Cities in table1 except (minus) [distinct] cities in table2.

SELECT city FROM table1 
MINUS                   
SELECT city FROM table2;
                        
CITY                    
----------              
STOCKHOLM               
TALLINN                 

Example 11 Cities in table2 except (minus) [distinct] cities in table1. Of course the result is different than in Example 10.

SELECT city FROM table2 
MINUS                   
SELECT city FROM table1;
                        
no rows selected        
As MINUS filters out duplicates then even subtracting empty set may reduce the initial set.

Example 12 Cities in table1 except (minus) [distinct] empty set (cities in table3).

SELECT city FROM table1 
MINUS                   
SELECT city FROM table3;
                        
CITY                    
----------              
HELSINKI                
RIGA                    
STOCKHOLM               
TALLINN                 
VILNIUS                 
It is not possible in Oracle and SQL Server to use EXCEPT (MINUS) ALL directly.

Example 13 Minus all doesn't exist in Oracle.

SELECT city FROM table1          
MINUS ALL                        
SELECT city FROM table2;         
                                 
MINUS ALL                        
      *                          
ERROR at line 2:                 
ORA-00928: missing SELECT keyword
However using analytic functions and simple minus it is possible. The main idea for MINUS (EXCEPT) ALL is to retain cardinality, i.e. how many instances of each row exists in source sets. Here analytic function row_number() can help. It just increments counter for each row which is the same as previous and restarts if the row values change. Then we can use simple MINUS [DISTINCT], and show only business columns.

Example 14 Faked minus all using row_number() analytic function. Cities in table1 except (minus) all cities in table2.

SELECT city FROM (                                        
  SELECT city,                                            
    row_number() OVER (PARTITION BY city ORDER BY city) rn
  FROM table1                                             
  MINUS                                                   
  SELECT city,                                            
    row_number() OVER (PARTITION BY city ORDER BY city) rn
  FROM table2                                             
) q;                                                      
                                                          
CITY                                                      
----------                                                
HELSINKI                                                  
RIGA                                                      
STOCKHOLM                                                 
TALLINN                                                   
TALLINN                                                   
TALLINN                                                   

Example 15 Faked minus all using row_number() analytic function. Cities in table2 except (minus) all cities in table1.

SELECT city FROM (                                        
  SELECT city,                                            
    row_number() OVER (PARTITION BY city ORDER BY city) rn
  FROM table2                                             
  MINUS                                                   
  SELECT city,                                            
    row_number() OVER (PARTITION BY city ORDER BY city) rn
  FROM table1                                             
) q;                                                      
                                                          
CITY                                                      
----------                                                
VILNIUS                                                   
VILNIUS                                                   
VILNIUS                                                   
Any set Minus all empty set doesn't change. Just like with Union all.

Example 16 Faked minus all using row_number() analytic function. Cities in table1 except (minus) all empty set (cities in table3).

SELECT city FROM (                                        
  SELECT city,                                            
    row_number() OVER (PARTITION BY city ORDER BY city) rn
  FROM table1                                             
  MINUS                                                   
  SELECT city,                                            
    row_number() OVER (PARTITION BY city ORDER BY city) rn
  FROM table3                                             
) q;                                                      
                                                          
CITY                                                      
----------                                                
HELSINKI                                                  
HELSINKI                                                  
RIGA                                                      
RIGA                                                      
RIGA                                                      
STOCKHOLM                                                 
TALLINN                                                   
TALLINN                                                   
TALLINN                                                   
VILNIUS                                                   
It is obvious that subtracting anything from empty set will always be empty set therefore I won't show you these examples. It is true for both modifications of except (minus) - distinct and all.

2.5 INTERSECT [DISTINCT] and INTERSECT ALL

Intersect returns only these rows, which are in both tables. Intersect [distinct] returns just unique rows, but intersect all retains cardinality. Intersect is commutative, just like union - it is not important which query is the first, which second one.
Picture for Query1 INTERSECT Query2 is as follows:
SQL intersect

2.5.1 Examples


Example 17 Cities in table1 intersect [distinct] cities in table2.

SELECT city FROM table1 
INTERSECT               
SELECT city FROM table2;
                        
CITY                    
----------              
HELSINKI                
RIGA                    
VILNIUS                 

Example 18 Cities in table2 intersect [distinct] empty set (cities in table3). Every intersection with empty set is empty set.

SELECT city FROM table1 
INTERSECT               
SELECT city FROM table3;
                        
no rows selected        
Intersect all is not possible in Oracle or SQL Server just like with Minus (Except) all. But we can use already known workaround.

Example 19 Faked intersect all using row_number() analytic function. Cities in table1 intersect all cities in table2.

SELECT city FROM (                                        
  SELECT city,                                            
    row_number() OVER (PARTITION BY city ORDER BY city) rn
  FROM table1                                             
  INTERSECT                                               
  SELECT city,                                            
    row_number() OVER (PARTITION BY city ORDER BY city) rn
  FROM table2                                             
) q;                                                      
                                                          
CITY                                                      
----------                                                
HELSINKI                                                  
RIGA                                                      
RIGA                                                      
VILNIUS                                                   

2.6 Raising it to higher levels - are two table data equal?

There are times when we need to find whether two table data are equal. And here I mean "really equal" i.e. both the rows are equal and in case of duplicate rows cardinality of them also are the same. So what we need is to test whether the "opposite" of Intersect i.e. rows that are returned only by the first query or the second query is empty set. Visually it would be as in following picture grey area would be empty.
SQL symmetric difference
In set theory "the opposite" of intersect can be referred as Symmetric difference, which is similar to XOR (exclusive OR) in Boolean logic.
Unfortunately there isn't such Symmetric difference operator in SQL. So we need to be more creative. Looking at previous pictures throughout this article it is quite obvious what we need:
(Query 1 MINUS Query2)
UNION                 
(Query 2 MINUS Query1)
In case of absolutely unique rows it would be sufficient - as soon as it returns at least one row, tables' data are not equal. But. We have to remember that there might be duplicates and amount of them might be different in both result sets. So then we'd need:
(Query 1 MINUS ALL Query2)
UNION ALL                 
(Query 2 MINUS ALL Query1)
Let's look at real examples.

Example 20 Distinct Symmetric difference of table1 and table2.

(SELECT city FROM table1 
MINUS                    
SELECT city FROM table2) 
UNION                    
(SELECT city FROM table1 
MINUS                    
SELECT city FROM table2);
                         
CITY                     
----------               
STOCKHOLM                
TALLINN                  
It is obvious that somehow these tables are different. But exactly how? Then we'd need smarter query.

Example 21 Symmetric difference retaining cardinality of table1 and table2.

SELECT city FROM (                                        
  SELECT city,                                            
    row_number() OVER (PARTITION BY city ORDER BY city) rn
  FROM table1                                             
  MINUS                                                   
  SELECT city,                                            
    row_number() OVER (PARTITION BY city ORDER BY city) rn
  FROM table2                                             
) q                                                       
UNION ALL                                                 
SELECT city FROM (                                        
  SELECT city,                                            
    row_number() OVER (PARTITION BY city ORDER BY city) rn
  FROM table2                                             
  MINUS                                                   
  SELECT city,                                            
    row_number() OVER (PARTITION BY city ORDER BY city) rn
  FROM table1                                             
) q;                                                      
                                                          
CITY                                                      
----------                                                
HELSINKI                                                  
RIGA                                                      
STOCKHOLM                                                 
TALLINN                                                   
TALLINN                                                   
TALLINN                                                   
VILNIUS                                                   
VILNIUS                                                   
VILNIUS                                                   
So these are rows that are left outside in one or another table. What if we'd like to know in what table exactly? Just add a flag column.

Example 22 Symmetric difference retaining cardinality and showing what is missed of table1 and table2.

SELECT city, 2 flag FROM (                                
  SELECT city,                                            
    row_number() OVER (PARTITION BY city ORDER BY city) rn
  FROM table1                                             
  MINUS                                                   
  SELECT city,                                            
    row_number() OVER (PARTITION BY city ORDER BY city) rn
  FROM table2                                             
) q                                                       
UNION ALL                                                 
SELECT city, 1 flag FROM (                                
  SELECT city,                                            
    row_number() OVER (PARTITION BY city ORDER BY city) rn
  FROM table2                                             
  MINUS                                                   
  SELECT city,                                            
    row_number() OVER (PARTITION BY city ORDER BY city) rn
  FROM table1                                             
) q;                                                      
                                                          
CITY             FLAG                                     
---------- ----------                                     
HELSINKI            2                                     
RIGA                2                                     
STOCKHOLM           2                                     
TALLINN             2                                     
TALLINN             2                                     
TALLINN             2                                     
VILNIUS             1                                     
VILNIUS             1                                     
VILNIUS             1                                     
So we can see that table2 misses one Helsinki, one Stockholm and 3 Tallin rows and table1 misses 3 Vilnius rows. If we'd add these rows, then they'd contain exactly the same cities with exactly the same cardinality.

No comments:

Post a Comment

Friday, October 12, 2012

SQL set operators


SQL set operators


1 Introduction

SQL set operators allows combine results from two or more SELECT statements. At first sight this looks similar to SQL joins although there is big difference. SQL joins tends to combine columns i.e. with each additionally joined table it is possible to select more and more columns. SQL set operators on the other hand combine rows from different queries with strong preconditions - all involved SELECTS must:
  • retrieve the same number of columns and
  • the data types of corresponding columns in each involved SELECT must be compatible (either the same or with possibility implicitly convert to the data types of the first SELECT statement).
Visually the difference can be explained as follows - joins tend to extend breadthways, but set operations in depth.
SQL join example
SQL set operator example
NB 1! All examples are created for Oracle database and written according to Oracle syntax. However it doesn't matter what database management system is used, many of them with (probably) very little modifications or even exactly the same can be used for every other DBMS supporting set operators. Exactly why they work or why not are described for OracleSQL Server and MySQL. If you need to use them for other DBMSes then you should check these examples yourself although I would be very pleased if you'd send me information what examples are not working on what DBMSes. I will include this info here along with your name.

Contents

1 Introduction
2 Set operator types and syntax
   2.1 Common facts to remember
   2.2 Used tables for examples
   2.3 UNION [DISTINCT] and UNION ALL
   2.4 EXCEPT [DISTINCT] and EXCEPT ALL
   2.5 INTERSECT [DISTINCT] and INTERSECT ALL
   2.6 Raising it to higher levels - are two table data equal?
3 Example usage for various DBMSes
   3.1 Oracle
   3.2 Microsoft SQL Server
   3.3 MySQL
   3.4 IBM DB2
4 References and more information

2 Set operator types and syntax

According to SQL Standard there are following Set operator types:
  • UNION [DISTINCT];
  • UNION ALL;
  • EXCEPT [DISTINCT];
  • EXCEPT ALL;
  • INTERSECT [DISTINCT];
  • INTERSECT ALL.
As we can see there are 3 basic types Union, Except and Intersect and all have 2 modifications either Distinct or All. SQL Standard does not enforce keyword Distinct and some DBMSes for example Oracle and SQL Server even do not allow it, therefore if you see just Union, Except or Intersect - these actually mean Union Distinct, Except Distinct and Intersect Distinct.
It is already clear from the very syntax that Distinct modification removes duplicates from the result set, but All modification retains them.
Query syntax is common for all of them:
<query1>      
<SET OPERATOR>
<query1>      
Each query1 and query2 is full-fledged SELECT statements with possible joins, subqueries and other constructions. There is also possibility to combine more than 2 SELECT statements with set operators among them.
Lets look in general overview what the result of each one of them is. Following chart defines 2 queries returning data and various set operator combinations among them.
QueryAQueryBQueryA UNION [DISTINCT] QueryBExample 1QueryA UNION ALL QueryBExample 3QueryA EXCEPT (MINUS) [DISTINCT] QueryB Example 10QueryB EXCEPT (MINUS) [DISTINCT] QueryA Example 11QueryA EXCEPT (MINUS) ALL QueryBExample 14QueryB EXCEPT (MINUS) ALL QueryAExample 15QueryA INTERSECT [DISTINCT] QueryBExample 17QueryA INTERSECT ALL QueryBExample 19
RigaRigaRigaRigaTallinnRigaVilniusRigaRiga
RigaRigaTallinnRigaStockholmTallinnVilniusVilniusRiga
RigaVilniusVilniusRigaTallinnVilniusHelsinkiVilnius
TallinnVilniusHelsinkiRigaTallinnHelsinki
TallinnVilniusStockholmRigaHelsinki
TallinnVilniusTallinnStockholm
VilniusHelsinkiTallinn
HelsinkiTallinn
HelsinkiVilnius
StockholmVilnius
Vilnius
Vilnius
Vilnius
Helsinki
Helsinki
Helsinki
Stockholm
Next chart contains the same QueryA but QueryC returns 0 rows, just to feel some possible quirks.
QueryAQueryCQueryA UNION [DISTINCT] QueryCExample 4QueryA UNION ALL QueryCExample 5QueryA EXCEPT (MINUS) [DISTINCT] QueryCExample 12QueryA EXCEPT (MINUS) ALL QueryCExample 16QueryC EXCEPT (MINUS) [DISTINCT] QueryAQueryC EXCEPT (MINUS) ALL QueryAQueryA INTERSECT [DISTINCT] QueryC Example 18QueryA INTERSECT ALL QueryC
RigaRigaRigaRigaRiga
RigaTallinnRigaTallinnRiga
RigaVilniusRigaVilniusRiga
TallinnHelsinkiTallinnHelsinkiTallinn
TallinnStockholmTallinnStockholmTallinn
TallinnTallinnTallinn
VilniusVilniusVilnius
HelsinkiHelsinkiHelsinki
HelsinkiHelsinkiHelsinki
StockholmStockholmStockholm

2.1 Common facts to remember

There are some facts which probably aren't obvious and should be mentioned. Let's expand requirements for queries to be combined using one of the set operators:
  • column count must be the same;
  • data types of retrieved columns should match or at least should be implicitly convertible by database;
  • one can use many set operators for example Query1 UNION ALL Query2 UNION ALL Query3 MINUS Query4 INTERSECT Query5. In such case one should look into used DB documentation what is the order of operators, because for example Oracle executes operators starting from left to right, but DB2 firstly executes Intersect;
  • Usually returned column names are taken from the first query;
  • Order by clauses for each individual query except the last one cannot be at all (Oracle) or are ignored (MySQL).
Some other facts:
  • UNION and INTERSECT operators are commutative, i.e. the order of queries is not important; it doesn't change the final result. See Example 1 andExample 2.
  • EXCEPT operator is NOT commutative, it IS important which query is first, which second using EXCEPT operator. See Example 10 and Example 11.
  • UNION, EXCEPT and INTERSECT used without anything or with DISTINCT returns only unique values. This is especially interesting when one query returning many nonunique rows is UNIONED to another query returning zero rows (Example 4). The final result contains fewer rows than first query.
  • If you know that result sets returned by each query are unique then use UNION ALL, because database doesn't know that and uses more (wasted) resources to filter out duplicates in case of UNION.
  • If you need determined ordering then use Order by clause in the last query. Don't assume that rows from first query will always be returned first.
  • If you need to distinguish which query produced rows then you can add some tag or flag column indicating which query produced them.
  • NULL values using set operators are considered to be equal to each other (Example 9).

2.2 Used tables for examples

Throughout this entire article we will use following tables and table data (the same data as used in tables above):
CREATE TABLE table1 (                       
  id INTEGER NOT NULL PRIMARY KEY,          
  city VARCHAR(10) NOT NULL);               
CREATE TABLE table2 (                       
  id INTEGER NOT NULL PRIMARY KEY,          
  city VARCHAR(10) NOT NULL);               
CREATE TABLE table3 (                       
  city VARCHAR(10) NOT NULL);               
INSERT INTO table1 VALUES (1, 'RIGA');      
INSERT INTO table1 VALUES (2, 'RIGA');      
INSERT INTO table1 VALUES (3, 'RIGA');      
INSERT INTO table1 VALUES (4, 'TALLINN');   
INSERT INTO table1 VALUES (5, 'TALLINN');   
INSERT INTO table1 VALUES (6, 'TALLINN');   
INSERT INTO table1 VALUES (7, 'VILNIUS');   
INSERT INTO table1 VALUES (8, 'HELSINKI');  
INSERT INTO table1 VALUES (9, 'HELSINKI');  
INSERT INTO table1 VALUES (10, 'STOCKHOLM');
INSERT INTO table2 VALUES (1, 'RIGA');      
INSERT INTO table2 VALUES (2, 'RIGA');      
INSERT INTO table2 VALUES (3, 'VILNIUS');   
INSERT INTO table2 VALUES (4, 'VILNIUS');   
INSERT INTO table2 VALUES (5, 'VILNIUS');   
INSERT INTO table2 VALUES (6, 'VILNIUS');   
INSERT INTO table2 VALUES (7, 'HELSINKI');  
COMMIT;                                     

2.3 UNION [DISTINCT] and UNION ALL

These usually are most widely used set operators. Quite many times one cannot get all the result from one Select statement. Then one of the UNIONS can help.
Graphically UNION can be visualised using Venn diagrams. Assume we have two row sets.
Two queries
Then Query1 UNION Query2 would be as follows. Grey area shows resultant set.
SQL Union
Of course the previous picture is very general visualisation and fully real just for sets which contains each element no more than once.
Query1 UNION ALL Query2 would be as follows:
SQL Union all

2.3.1 Examples

As we can see only unique rows are retuned in next example.

Example 1 Unions cities from table1 and table2.

SELECT city FROM table1 
UNION                   
SELECT city FROM table2;
                        
CITY                    
----------              
HELSINKI                
RIGA                    
STOCKHOLM               
TALLINN                 
VILNIUS                 

Example 2 Unions cities from table2 and table1. The query ordering is not important, result is the same, compare with Example 1.

SELECT city FROM table2 
UNION                   
SELECT city FROM table1;
                        
CITY                    
----------              
HELSINKI                
RIGA                    
STOCKHOLM               
TALLINN                 
VILNIUS                 
DO NOT ASSUME that Union always return ordered row set. It is NOT TRUE. It is just because of implementation model, i.e. sort is being done to filter out duplicates. At least from version 10 Oracle has possibility to do HASH UNIQUE operation, which doesn't sort rows and you won't get them back sorted. So ALWAYS use Order by clause if you need guaranteed order of rows.
Next example just combines the rows without filtering out duplicates.

Example 3 Unions ALL cities from table1 and table2.

SELECT city FROM table1 
UNION ALL               
SELECT city FROM table2;
                        
CITY                    
----------              
RIGA                    
RIGA                    
RIGA                    
TALLINN                 
TALLINN                 
TALLINN                 
VILNIUS                 
HELSINKI                
HELSINKI                
STOCKHOLM               
RIGA                    
RIGA                    
VILNIUS                 
VILNIUS                 
VILNIUS                 
VILNIUS                 
HELSINKI                
                        
17 rows selected.       

Example 4 UNION [DISTINCT] even with empty set may reduce number of rows. Compare result from first two queries with third query.

SELECT city FROM table1;
                        
CITY                    
----------              
RIGA                    
RIGA                    
RIGA                    
TALLINN                 
TALLINN                 
TALLINN                 
VILNIUS                 
HELSINKI                
HELSINKI                
STOCKHOLM               
                        
10 rows selected.       
                        
SELECT city FROM table3;
                        
no rows selected        
                        
SELECT city FROM table1 
UNION                   
SELECT city FROM table3;
                        
CITY                    
----------              
HELSINKI                
RIGA                    
STOCKHOLM               
TALLINN                 
VILNIUS                 

Example 5 UNION ALL with empty set gives the same result as without it.

SELECT city FROM table1 
UNION ALL               
SELECT city FROM table3;
                        
CITY                    
----------              
RIGA                    
RIGA                    
RIGA                    
TALLINN                 
TALLINN                 
TALLINN                 
VILNIUS                 
HELSINKI                
HELSINKI                
STOCKHOLM               
                        
10 rows selected.       

Example 6 Each query in Union must return the same number of columns.

SELECT * FROM table1                                         
UNION                                                        
SELECT city FROM table2;                                     
SELECT * FROM table1                                         
*                                                            
ERROR at line 1:                                             
ORA-01789: query block has incorrect number of result columns

Example 7 Of course query can be Unioned to itself. This time all rows are returned because combination of both columns is reviewed.

SELECT * FROM table1 
UNION                
SELECT * FROM table1;
                     
        ID CITY      
---------- ----------
         1 RIGA      
         2 RIGA      
         3 RIGA      
         4 TALLINN   
         5 TALLINN   
         6 TALLINN   
         7 VILNIUS   
         8 HELSINKI  
         9 HELSINKI  
        10 STOCKHOLM 
                     
10 rows selected.    
Along with subquery factoring clause (or common table expression clause, "with" clause) UNION ALL can be used to generate some sample data without having actual tables. It has become very popular in Oracle forums.

Example 8 Using "with" clause to generate sample test data to test inner join functionality.

WITH cities AS (                                                     
  SELECT 1 as cty_id, 'RIGA' as city FROM dual                       
  UNION ALL                                                          
  SELECT 2, 'TALLINN' FROM dual                                      
),                                                                   
streets AS (                                                         
  SELECT 1 as str_id, 1 as str_cty_id, 'BRIVIBAS' as street FROM dual
  UNION ALL                                                          
  SELECT 2, 2, 'NARVA MNT'FROM dual                                  
)                                                                    
SELECT city, street FROM cities                                      
INNER JOIN streets ON (str_cty_id = cty_id);                         
                                                                     
CITY    STREET                                                       
------- ---------                                                    
RIGA    BRIVIBAS                                                     
TALLINN NARVA MNT                                                    
NULL values are considered equal when using with set operators. This is different than usually, for example, testing for eaquality.

Example 9 Using "with" clause to generate two NULL values and unioning them.

WITH null1 AS (              
  SELECT NULL value FROM dual
),                           
null2 AS (                   
  SELECT NULL value FROM dual
)                            
SELECT value FROM null1      
UNION                        
SELECT value FROM null2;     
                             
V                            
-                            
                             
                             
1 row selected.              

2.4 EXCEPT [DISTINCT] and EXCEPT ALL

EXCEPT returns unique rows that are returned by the first query but are NOT returned by the second query. EXCEPT ALL does the same but retains cardinality, for example, if the first query returns two values of X and second only one, then EXCEPT won't return X but EXCEPT ALL would return one instance of X.
Oracle uses MINUS operator instead of EXCEPT, but the functionality is the same. None of the Oracle, SQL Server and MySQL has implemented EXCEPT ALL. It can be simulated using analytic functions as shown in Example 14 till Example 16.
Usually EXCEPT is used to compare date in different data sources (tables) to find differences, for example, differences in the same tables across test and production and/or actual copy and backup.
Visually Query1 EXCEPT Query2 can be expressed as follows:
SQL except (minus) A-B
Obviously diagram is not symmetric therefore for Query2 EXCEPT Query1 we get different picture:
SQL except (minus) B-A

2.4.1 Examples


Example 10 Cities in table1 except (minus) [distinct] cities in table2.

SELECT city FROM table1 
MINUS                   
SELECT city FROM table2;
                        
CITY                    
----------              
STOCKHOLM               
TALLINN                 

Example 11 Cities in table2 except (minus) [distinct] cities in table1. Of course the result is different than in Example 10.

SELECT city FROM table2 
MINUS                   
SELECT city FROM table1;
                        
no rows selected        
As MINUS filters out duplicates then even subtracting empty set may reduce the initial set.

Example 12 Cities in table1 except (minus) [distinct] empty set (cities in table3).

SELECT city FROM table1 
MINUS                   
SELECT city FROM table3;
                        
CITY                    
----------              
HELSINKI                
RIGA                    
STOCKHOLM               
TALLINN                 
VILNIUS                 
It is not possible in Oracle and SQL Server to use EXCEPT (MINUS) ALL directly.

Example 13 Minus all doesn't exist in Oracle.

SELECT city FROM table1          
MINUS ALL                        
SELECT city FROM table2;         
                                 
MINUS ALL                        
      *                          
ERROR at line 2:                 
ORA-00928: missing SELECT keyword
However using analytic functions and simple minus it is possible. The main idea for MINUS (EXCEPT) ALL is to retain cardinality, i.e. how many instances of each row exists in source sets. Here analytic function row_number() can help. It just increments counter for each row which is the same as previous and restarts if the row values change. Then we can use simple MINUS [DISTINCT], and show only business columns.

Example 14 Faked minus all using row_number() analytic function. Cities in table1 except (minus) all cities in table2.

SELECT city FROM (                                        
  SELECT city,                                            
    row_number() OVER (PARTITION BY city ORDER BY city) rn
  FROM table1                                             
  MINUS                                                   
  SELECT city,                                            
    row_number() OVER (PARTITION BY city ORDER BY city) rn
  FROM table2                                             
) q;                                                      
                                                          
CITY                                                      
----------                                                
HELSINKI                                                  
RIGA                                                      
STOCKHOLM                                                 
TALLINN                                                   
TALLINN                                                   
TALLINN                                                   

Example 15 Faked minus all using row_number() analytic function. Cities in table2 except (minus) all cities in table1.

SELECT city FROM (                                        
  SELECT city,                                            
    row_number() OVER (PARTITION BY city ORDER BY city) rn
  FROM table2                                             
  MINUS                                                   
  SELECT city,                                            
    row_number() OVER (PARTITION BY city ORDER BY city) rn
  FROM table1                                             
) q;                                                      
                                                          
CITY                                                      
----------                                                
VILNIUS                                                   
VILNIUS                                                   
VILNIUS                                                   
Any set Minus all empty set doesn't change. Just like with Union all.

Example 16 Faked minus all using row_number() analytic function. Cities in table1 except (minus) all empty set (cities in table3).

SELECT city FROM (                                        
  SELECT city,                                            
    row_number() OVER (PARTITION BY city ORDER BY city) rn
  FROM table1                                             
  MINUS                                                   
  SELECT city,                                            
    row_number() OVER (PARTITION BY city ORDER BY city) rn
  FROM table3                                             
) q;                                                      
                                                          
CITY                                                      
----------                                                
HELSINKI                                                  
HELSINKI                                                  
RIGA                                                      
RIGA                                                      
RIGA                                                      
STOCKHOLM                                                 
TALLINN                                                   
TALLINN                                                   
TALLINN                                                   
VILNIUS                                                   
It is obvious that subtracting anything from empty set will always be empty set therefore I won't show you these examples. It is true for both modifications of except (minus) - distinct and all.

2.5 INTERSECT [DISTINCT] and INTERSECT ALL

Intersect returns only these rows, which are in both tables. Intersect [distinct] returns just unique rows, but intersect all retains cardinality. Intersect is commutative, just like union - it is not important which query is the first, which second one.
Picture for Query1 INTERSECT Query2 is as follows:
SQL intersect

2.5.1 Examples


Example 17 Cities in table1 intersect [distinct] cities in table2.

SELECT city FROM table1 
INTERSECT               
SELECT city FROM table2;
                        
CITY                    
----------              
HELSINKI                
RIGA                    
VILNIUS                 

Example 18 Cities in table2 intersect [distinct] empty set (cities in table3). Every intersection with empty set is empty set.

SELECT city FROM table1 
INTERSECT               
SELECT city FROM table3;
                        
no rows selected        
Intersect all is not possible in Oracle or SQL Server just like with Minus (Except) all. But we can use already known workaround.

Example 19 Faked intersect all using row_number() analytic function. Cities in table1 intersect all cities in table2.

SELECT city FROM (                                        
  SELECT city,                                            
    row_number() OVER (PARTITION BY city ORDER BY city) rn
  FROM table1                                             
  INTERSECT                                               
  SELECT city,                                            
    row_number() OVER (PARTITION BY city ORDER BY city) rn
  FROM table2                                             
) q;                                                      
                                                          
CITY                                                      
----------                                                
HELSINKI                                                  
RIGA                                                      
RIGA                                                      
VILNIUS                                                   

2.6 Raising it to higher levels - are two table data equal?

There are times when we need to find whether two table data are equal. And here I mean "really equal" i.e. both the rows are equal and in case of duplicate rows cardinality of them also are the same. So what we need is to test whether the "opposite" of Intersect i.e. rows that are returned only by the first query or the second query is empty set. Visually it would be as in following picture grey area would be empty.
SQL symmetric difference
In set theory "the opposite" of intersect can be referred as Symmetric difference, which is similar to XOR (exclusive OR) in Boolean logic.
Unfortunately there isn't such Symmetric difference operator in SQL. So we need to be more creative. Looking at previous pictures throughout this article it is quite obvious what we need:
(Query 1 MINUS Query2)
UNION                 
(Query 2 MINUS Query1)
In case of absolutely unique rows it would be sufficient - as soon as it returns at least one row, tables' data are not equal. But. We have to remember that there might be duplicates and amount of them might be different in both result sets. So then we'd need:
(Query 1 MINUS ALL Query2)
UNION ALL                 
(Query 2 MINUS ALL Query1)
Let's look at real examples.

Example 20 Distinct Symmetric difference of table1 and table2.

(SELECT city FROM table1 
MINUS                    
SELECT city FROM table2) 
UNION                    
(SELECT city FROM table1 
MINUS                    
SELECT city FROM table2);
                         
CITY                     
----------               
STOCKHOLM                
TALLINN                  
It is obvious that somehow these tables are different. But exactly how? Then we'd need smarter query.

Example 21 Symmetric difference retaining cardinality of table1 and table2.

SELECT city FROM (                                        
  SELECT city,                                            
    row_number() OVER (PARTITION BY city ORDER BY city) rn
  FROM table1                                             
  MINUS                                                   
  SELECT city,                                            
    row_number() OVER (PARTITION BY city ORDER BY city) rn
  FROM table2                                             
) q                                                       
UNION ALL                                                 
SELECT city FROM (                                        
  SELECT city,                                            
    row_number() OVER (PARTITION BY city ORDER BY city) rn
  FROM table2                                             
  MINUS                                                   
  SELECT city,                                            
    row_number() OVER (PARTITION BY city ORDER BY city) rn
  FROM table1                                             
) q;                                                      
                                                          
CITY                                                      
----------                                                
HELSINKI                                                  
RIGA                                                      
STOCKHOLM                                                 
TALLINN                                                   
TALLINN                                                   
TALLINN                                                   
VILNIUS                                                   
VILNIUS                                                   
VILNIUS                                                   
So these are rows that are left outside in one or another table. What if we'd like to know in what table exactly? Just add a flag column.

Example 22 Symmetric difference retaining cardinality and showing what is missed of table1 and table2.

SELECT city, 2 flag FROM (                                
  SELECT city,                                            
    row_number() OVER (PARTITION BY city ORDER BY city) rn
  FROM table1                                             
  MINUS                                                   
  SELECT city,                                            
    row_number() OVER (PARTITION BY city ORDER BY city) rn
  FROM table2                                             
) q                                                       
UNION ALL                                                 
SELECT city, 1 flag FROM (                                
  SELECT city,                                            
    row_number() OVER (PARTITION BY city ORDER BY city) rn
  FROM table2                                             
  MINUS                                                   
  SELECT city,                                            
    row_number() OVER (PARTITION BY city ORDER BY city) rn
  FROM table1                                             
) q;                                                      
                                                          
CITY             FLAG                                     
---------- ----------                                     
HELSINKI            2                                     
RIGA                2                                     
STOCKHOLM           2                                     
TALLINN             2                                     
TALLINN             2                                     
TALLINN             2                                     
VILNIUS             1                                     
VILNIUS             1                                     
VILNIUS             1                                     
So we can see that table2 misses one Helsinki, one Stockholm and 3 Tallin rows and table1 misses 3 Vilnius rows. If we'd add these rows, then they'd contain exactly the same cities with exactly the same cardinality.

No comments:

Post a Comment

My Blog List