SQL set operators
1 Introduction
SQL set operators allows combine results from two or more SELECT statements. At first sight this looks similar to SQL joins although there is big difference. SQL joins tends to combine columns i.e. with each additionally joined table it is possible to select more and more columns. SQL set operators on the other hand combine rows from different queries with strong preconditions - all involved SELECTS must:- retrieve the same number of columns and
- the data types of corresponding columns in each involved SELECT must be compatible (either the same or with possibility implicitly convert to the data types of the first SELECT statement).
NB 1! All examples are created for Oracle database and written according to Oracle syntax. However it doesn't matter what database management system is used, many of them with (probably) very little modifications or even exactly the same can be used for every other DBMS supporting set operators. Exactly why they work or why not are described for Oracle, SQL Server and MySQL. If you need to use them for other DBMSes then you should check these examples yourself although I would be very pleased if you'd send me information what examples are not working on what DBMSes. I will include this info here along with your name.
Contents
1 Introduction2 Set operator types and syntax
2.1 Common facts to remember
2.2 Used tables for examples
2.3 UNION [DISTINCT] and UNION ALL
2.4 EXCEPT [DISTINCT] and EXCEPT ALL
2.5 INTERSECT [DISTINCT] and INTERSECT ALL
2.6 Raising it to higher levels - are two table data equal?
3 Example usage for various DBMSes
3.1 Oracle
3.2 Microsoft SQL Server
3.3 MySQL
3.4 IBM DB2
4 References and more information
2 Set operator types and syntax
According to SQL Standard there are following Set operator types:- UNION [DISTINCT];
- UNION ALL;
- EXCEPT [DISTINCT];
- EXCEPT ALL;
- INTERSECT [DISTINCT];
- INTERSECT ALL.
It is already clear from the very syntax that Distinct modification removes duplicates from the result set, but All modification retains them.
Query syntax is common for all of them:
<query1>
<SET OPERATOR>
<query1>
Each query1 and query2 is full-fledged SELECT statements with possible joins, subqueries and other constructions. There is also possibility to combine more than 2 SELECT statements with set operators among them.Lets look in general overview what the result of each one of them is. Following chart defines 2 queries returning data and various set operator combinations among them.
QueryA | QueryB | QueryA UNION [DISTINCT] QueryBExample 1 | QueryA UNION ALL QueryBExample 3 | QueryA EXCEPT (MINUS) [DISTINCT] QueryB Example 10 | QueryB EXCEPT (MINUS) [DISTINCT] QueryA Example 11 | QueryA EXCEPT (MINUS) ALL QueryBExample 14 | QueryB EXCEPT (MINUS) ALL QueryAExample 15 | QueryA INTERSECT [DISTINCT] QueryBExample 17 | QueryA INTERSECT ALL QueryBExample 19 |
---|---|---|---|---|---|---|---|---|---|
Riga | Riga | Riga | Riga | Tallinn | Riga | Vilnius | Riga | Riga | |
Riga | Riga | Tallinn | Riga | Stockholm | Tallinn | Vilnius | Vilnius | Riga | |
Riga | Vilnius | Vilnius | Riga | Tallinn | Vilnius | Helsinki | Vilnius | ||
Tallinn | Vilnius | Helsinki | Riga | Tallinn | Helsinki | ||||
Tallinn | Vilnius | Stockholm | Riga | Helsinki | |||||
Tallinn | Vilnius | Tallinn | Stockholm | ||||||
Vilnius | Helsinki | Tallinn | |||||||
Helsinki | Tallinn | ||||||||
Helsinki | Vilnius | ||||||||
Stockholm | Vilnius | ||||||||
Vilnius | |||||||||
Vilnius | |||||||||
Vilnius | |||||||||
Helsinki | |||||||||
Helsinki | |||||||||
Helsinki | |||||||||
Stockholm |
QueryA | QueryC | QueryA UNION [DISTINCT] QueryCExample 4 | QueryA UNION ALL QueryCExample 5 | QueryA EXCEPT (MINUS) [DISTINCT] QueryCExample 12 | QueryA EXCEPT (MINUS) ALL QueryCExample 16 | QueryC EXCEPT (MINUS) [DISTINCT] QueryA | QueryC EXCEPT (MINUS) ALL QueryA | QueryA INTERSECT [DISTINCT] QueryC Example 18 | QueryA INTERSECT ALL QueryC |
---|---|---|---|---|---|---|---|---|---|
Riga | Riga | Riga | Riga | Riga | |||||
Riga | Tallinn | Riga | Tallinn | Riga | |||||
Riga | Vilnius | Riga | Vilnius | Riga | |||||
Tallinn | Helsinki | Tallinn | Helsinki | Tallinn | |||||
Tallinn | Stockholm | Tallinn | Stockholm | Tallinn | |||||
Tallinn | Tallinn | Tallinn | |||||||
Vilnius | Vilnius | Vilnius | |||||||
Helsinki | Helsinki | Helsinki | |||||||
Helsinki | Helsinki | Helsinki | |||||||
Stockholm | Stockholm | Stockholm |
2.1 Common facts to remember
There are some facts which probably aren't obvious and should be mentioned. Let's expand requirements for queries to be combined using one of the set operators:- column count must be the same;
- data types of retrieved columns should match or at least should be implicitly convertible by database;
- one can use many set operators for example Query1 UNION ALL Query2 UNION ALL Query3 MINUS Query4 INTERSECT Query5. In such case one should look into used DB documentation what is the order of operators, because for example Oracle executes operators starting from left to right, but DB2 firstly executes Intersect;
- Usually returned column names are taken from the first query;
- Order by clauses for each individual query except the last one cannot be at all (Oracle) or are ignored (MySQL).
- UNION and INTERSECT operators are commutative, i.e. the order of queries is not important; it doesn't change the final result. See Example 1 andExample 2.
- EXCEPT operator is NOT commutative, it IS important which query is first, which second using EXCEPT operator. See Example 10 and Example 11.
- UNION, EXCEPT and INTERSECT used without anything or with DISTINCT returns only unique values. This is especially interesting when one query returning many nonunique rows is UNIONED to another query returning zero rows (Example 4). The final result contains fewer rows than first query.
- If you know that result sets returned by each query are unique then use UNION ALL, because database doesn't know that and uses more (wasted) resources to filter out duplicates in case of UNION.
- If you need determined ordering then use Order by clause in the last query. Don't assume that rows from first query will always be returned first.
- If you need to distinguish which query produced rows then you can add some tag or flag column indicating which query produced them.
- NULL values using set operators are considered to be equal to each other (Example 9).
2.2 Used tables for examples
Throughout this entire article we will use following tables and table data (the same data as used in tables above):CREATE TABLE table1 (
id INTEGER NOT NULL PRIMARY KEY,
city VARCHAR(10) NOT NULL);
CREATE TABLE table2 (
id INTEGER NOT NULL PRIMARY KEY,
city VARCHAR(10) NOT NULL);
CREATE TABLE table3 (
city VARCHAR(10) NOT NULL);
INSERT INTO table1 VALUES (1, 'RIGA');
INSERT INTO table1 VALUES (2, 'RIGA');
INSERT INTO table1 VALUES (3, 'RIGA');
INSERT INTO table1 VALUES (4, 'TALLINN');
INSERT INTO table1 VALUES (5, 'TALLINN');
INSERT INTO table1 VALUES (6, 'TALLINN');
INSERT INTO table1 VALUES (7, 'VILNIUS');
INSERT INTO table1 VALUES (8, 'HELSINKI');
INSERT INTO table1 VALUES (9, 'HELSINKI');
INSERT INTO table1 VALUES (10, 'STOCKHOLM');
INSERT INTO table2 VALUES (1, 'RIGA');
INSERT INTO table2 VALUES (2, 'RIGA');
INSERT INTO table2 VALUES (3, 'VILNIUS');
INSERT INTO table2 VALUES (4, 'VILNIUS');
INSERT INTO table2 VALUES (5, 'VILNIUS');
INSERT INTO table2 VALUES (6, 'VILNIUS');
INSERT INTO table2 VALUES (7, 'HELSINKI');
COMMIT;
2.3 UNION [DISTINCT] and UNION ALL
These usually are most widely used set operators. Quite many times one cannot get all the result from one Select statement. Then one of the UNIONS can help.Graphically UNION can be visualised using Venn diagrams. Assume we have two row sets.
Then Query1 UNION Query2 would be as follows. Grey area shows resultant set.
Of course the previous picture is very general visualisation and fully real just for sets which contains each element no more than once.
Query1 UNION ALL Query2 would be as follows:
2.3.1 Examples
As we can see only unique rows are retuned in next example.Example 1 Unions cities from table1 and table2.
SELECT city FROM table1
UNION
SELECT city FROM table2;
CITY
----------
HELSINKI
RIGA
STOCKHOLM
TALLINN
VILNIUS
Example 2 Unions cities from table2 and table1. The query ordering is not important, result is the same, compare with Example 1.
SELECT city FROM table2
UNION
SELECT city FROM table1;
CITY
----------
HELSINKI
RIGA
STOCKHOLM
TALLINN
VILNIUS
DO NOT ASSUME that Union always return ordered row set. It is NOT TRUE. It is just because of implementation model, i.e. sort is being done to filter out duplicates. At least from version 10 Oracle has possibility to do HASH UNIQUE operation, which doesn't sort rows and you won't get them back sorted. So ALWAYS use Order by clause if you need guaranteed order of rows.Next example just combines the rows without filtering out duplicates.
Example 3 Unions ALL cities from table1 and table2.
SELECT city FROM table1
UNION ALL
SELECT city FROM table2;
CITY
----------
RIGA
RIGA
RIGA
TALLINN
TALLINN
TALLINN
VILNIUS
HELSINKI
HELSINKI
STOCKHOLM
RIGA
RIGA
VILNIUS
VILNIUS
VILNIUS
VILNIUS
HELSINKI
17 rows selected.
Example 4 UNION [DISTINCT] even with empty set may reduce number of rows. Compare result from first two queries with third query.
SELECT city FROM table1;
CITY
----------
RIGA
RIGA
RIGA
TALLINN
TALLINN
TALLINN
VILNIUS
HELSINKI
HELSINKI
STOCKHOLM
10 rows selected.
SELECT city FROM table3;
no rows selected
SELECT city FROM table1
UNION
SELECT city FROM table3;
CITY
----------
HELSINKI
RIGA
STOCKHOLM
TALLINN
VILNIUS
Example 5 UNION ALL with empty set gives the same result as without it.
SELECT city FROM table1
UNION ALL
SELECT city FROM table3;
CITY
----------
RIGA
RIGA
RIGA
TALLINN
TALLINN
TALLINN
VILNIUS
HELSINKI
HELSINKI
STOCKHOLM
10 rows selected.
Example 6 Each query in Union must return the same number of columns.
SELECT * FROM table1
UNION
SELECT city FROM table2;
SELECT * FROM table1
*
ERROR at line 1:
ORA-01789: query block has incorrect number of result columns
Example 7 Of course query can be Unioned to itself. This time all rows are returned because combination of both columns is reviewed.
SELECT * FROM table1
UNION
SELECT * FROM table1;
ID CITY
---------- ----------
1 RIGA
2 RIGA
3 RIGA
4 TALLINN
5 TALLINN
6 TALLINN
7 VILNIUS
8 HELSINKI
9 HELSINKI
10 STOCKHOLM
10 rows selected.
Along with subquery factoring clause (or common table expression clause, "with" clause) UNION ALL can be used to generate some sample data without having actual tables. It has become very popular in Oracle forums.Example 8 Using "with" clause to generate sample test data to test inner join functionality.
WITH cities AS (
SELECT 1 as cty_id, 'RIGA' as city FROM dual
UNION ALL
SELECT 2, 'TALLINN' FROM dual
),
streets AS (
SELECT 1 as str_id, 1 as str_cty_id, 'BRIVIBAS' as street FROM dual
UNION ALL
SELECT 2, 2, 'NARVA MNT'FROM dual
)
SELECT city, street FROM cities
INNER JOIN streets ON (str_cty_id = cty_id);
CITY STREET
------- ---------
RIGA BRIVIBAS
TALLINN NARVA MNT
NULL values are considered equal when using with set operators. This is different than usually, for example, testing for eaquality.Example 9 Using "with" clause to generate two NULL values and unioning them.
WITH null1 AS (
SELECT NULL value FROM dual
),
null2 AS (
SELECT NULL value FROM dual
)
SELECT value FROM null1
UNION
SELECT value FROM null2;
V
-
1 row selected.
2.4 EXCEPT [DISTINCT] and EXCEPT ALL
EXCEPT returns unique rows that are returned by the first query but are NOT returned by the second query. EXCEPT ALL does the same but retains cardinality, for example, if the first query returns two values of X and second only one, then EXCEPT won't return X but EXCEPT ALL would return one instance of X.Oracle uses MINUS operator instead of EXCEPT, but the functionality is the same. None of the Oracle, SQL Server and MySQL has implemented EXCEPT ALL. It can be simulated using analytic functions as shown in Example 14 till Example 16.
Usually EXCEPT is used to compare date in different data sources (tables) to find differences, for example, differences in the same tables across test and production and/or actual copy and backup.
Visually Query1 EXCEPT Query2 can be expressed as follows:
Obviously diagram is not symmetric therefore for Query2 EXCEPT Query1 we get different picture:
2.4.1 Examples
Example 10 Cities in table1 except (minus) [distinct] cities in table2.
SELECT city FROM table1
MINUS
SELECT city FROM table2;
CITY
----------
STOCKHOLM
TALLINN
Example 11 Cities in table2 except (minus) [distinct] cities in table1. Of course the result is different than in Example 10.
SELECT city FROM table2
MINUS
SELECT city FROM table1;
no rows selected
As MINUS filters out duplicates then even subtracting empty set may reduce the initial set.Example 12 Cities in table1 except (minus) [distinct] empty set (cities in table3).
SELECT city FROM table1
MINUS
SELECT city FROM table3;
CITY
----------
HELSINKI
RIGA
STOCKHOLM
TALLINN
VILNIUS
It is not possible in Oracle and SQL Server to use EXCEPT (MINUS) ALL directly.Example 13 Minus all doesn't exist in Oracle.
SELECT city FROM table1
MINUS ALL
SELECT city FROM table2;
MINUS ALL
*
ERROR at line 2:
ORA-00928: missing SELECT keyword
However using analytic functions and simple minus it is possible. The main idea for MINUS (EXCEPT) ALL is to retain cardinality, i.e. how many instances of each row exists in source sets. Here analytic function row_number() can help. It just increments counter for each row which is the same as previous and restarts if the row values change. Then we can use simple MINUS [DISTINCT], and show only business columns.Example 14 Faked minus all using row_number() analytic function. Cities in table1 except (minus) all cities in table2.
SELECT city FROM (
SELECT city,
row_number() OVER (PARTITION BY city ORDER BY city) rn
FROM table1
MINUS
SELECT city,
row_number() OVER (PARTITION BY city ORDER BY city) rn
FROM table2
) q;
CITY
----------
HELSINKI
RIGA
STOCKHOLM
TALLINN
TALLINN
TALLINN
Example 15 Faked minus all using row_number() analytic function. Cities in table2 except (minus) all cities in table1.
SELECT city FROM (
SELECT city,
row_number() OVER (PARTITION BY city ORDER BY city) rn
FROM table2
MINUS
SELECT city,
row_number() OVER (PARTITION BY city ORDER BY city) rn
FROM table1
) q;
CITY
----------
VILNIUS
VILNIUS
VILNIUS
Any set Minus all empty set doesn't change. Just like with Union all.Example 16 Faked minus all using row_number() analytic function. Cities in table1 except (minus) all empty set (cities in table3).
SELECT city FROM (
SELECT city,
row_number() OVER (PARTITION BY city ORDER BY city) rn
FROM table1
MINUS
SELECT city,
row_number() OVER (PARTITION BY city ORDER BY city) rn
FROM table3
) q;
CITY
----------
HELSINKI
HELSINKI
RIGA
RIGA
RIGA
STOCKHOLM
TALLINN
TALLINN
TALLINN
VILNIUS
It is obvious that subtracting anything from empty set will always be empty set therefore I won't show you these examples. It is true for both modifications of except (minus) - distinct and all.2.5 INTERSECT [DISTINCT] and INTERSECT ALL
Intersect returns only these rows, which are in both tables. Intersect [distinct] returns just unique rows, but intersect all retains cardinality. Intersect is commutative, just like union - it is not important which query is the first, which second one.Picture for Query1 INTERSECT Query2 is as follows:
2.5.1 Examples
Example 17 Cities in table1 intersect [distinct] cities in table2.
SELECT city FROM table1
INTERSECT
SELECT city FROM table2;
CITY
----------
HELSINKI
RIGA
VILNIUS
Example 18 Cities in table2 intersect [distinct] empty set (cities in table3). Every intersection with empty set is empty set.
SELECT city FROM table1
INTERSECT
SELECT city FROM table3;
no rows selected
Intersect all is not possible in Oracle or SQL Server just like with Minus (Except) all. But we can use already known workaround.Example 19 Faked intersect all using row_number() analytic function. Cities in table1 intersect all cities in table2.
SELECT city FROM (
SELECT city,
row_number() OVER (PARTITION BY city ORDER BY city) rn
FROM table1
INTERSECT
SELECT city,
row_number() OVER (PARTITION BY city ORDER BY city) rn
FROM table2
) q;
CITY
----------
HELSINKI
RIGA
RIGA
VILNIUS
2.6 Raising it to higher levels - are two table data equal?
There are times when we need to find whether two table data are equal. And here I mean "really equal" i.e. both the rows are equal and in case of duplicate rows cardinality of them also are the same. So what we need is to test whether the "opposite" of Intersect i.e. rows that are returned only by the first query or the second query is empty set. Visually it would be as in following picture grey area would be empty.In set theory "the opposite" of intersect can be referred as Symmetric difference, which is similar to XOR (exclusive OR) in Boolean logic.
Unfortunately there isn't such Symmetric difference operator in SQL. So we need to be more creative. Looking at previous pictures throughout this article it is quite obvious what we need:
(Query 1 MINUS Query2)
UNION
(Query 2 MINUS Query1)
In case of absolutely unique rows it would be sufficient - as soon as it returns at least one row, tables' data are not equal. But. We have to remember that there might be duplicates and amount of them might be different in both result sets. So then we'd need:(Query 1 MINUS ALL Query2)
UNION ALL
(Query 2 MINUS ALL Query1)
Let's look at real examples.Example 20 Distinct Symmetric difference of table1 and table2.
(SELECT city FROM table1
MINUS
SELECT city FROM table2)
UNION
(SELECT city FROM table1
MINUS
SELECT city FROM table2);
CITY
----------
STOCKHOLM
TALLINN
It is obvious that somehow these tables are different. But exactly how? Then we'd need smarter query.Example 21 Symmetric difference retaining cardinality of table1 and table2.
SELECT city FROM (
SELECT city,
row_number() OVER (PARTITION BY city ORDER BY city) rn
FROM table1
MINUS
SELECT city,
row_number() OVER (PARTITION BY city ORDER BY city) rn
FROM table2
) q
UNION ALL
SELECT city FROM (
SELECT city,
row_number() OVER (PARTITION BY city ORDER BY city) rn
FROM table2
MINUS
SELECT city,
row_number() OVER (PARTITION BY city ORDER BY city) rn
FROM table1
) q;
CITY
----------
HELSINKI
RIGA
STOCKHOLM
TALLINN
TALLINN
TALLINN
VILNIUS
VILNIUS
VILNIUS
So these are rows that are left outside in one or another table. What if we'd like to know in what table exactly? Just add a flag column.Example 22 Symmetric difference retaining cardinality and showing what is missed of table1 and table2.
SELECT city, 2 flag FROM (
SELECT city,
row_number() OVER (PARTITION BY city ORDER BY city) rn
FROM table1
MINUS
SELECT city,
row_number() OVER (PARTITION BY city ORDER BY city) rn
FROM table2
) q
UNION ALL
SELECT city, 1 flag FROM (
SELECT city,
row_number() OVER (PARTITION BY city ORDER BY city) rn
FROM table2
MINUS
SELECT city,
row_number() OVER (PARTITION BY city ORDER BY city) rn
FROM table1
) q;
CITY FLAG
---------- ----------
HELSINKI 2
RIGA 2
STOCKHOLM 2
TALLINN 2
TALLINN 2
TALLINN 2
VILNIUS 1
VILNIUS 1
VILNIUS 1
So we can see that table2 misses one Helsinki, one Stockholm and 3 Tallin rows and table1 misses 3 Vilnius rows. If we'd add these rows, then they'd contain exactly the same cities with exactly the same cardinality.
No comments:
Post a Comment