A Detailed Explanation of EXISTS in SQL Statements

隐者黑鹰88 2016-11-07


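People often say that EXISTS is faster than IN, and that NOT EXISTS is faster than NOT IN. But is that really true? (EXISTS only tests whether its subquery returns at least one row, so it implicitly yields TRUE or FALSE.)

Let us discuss IN and EXISTS first.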

select * from t1 where x in ( select y from t2 )

can in effect be understood as:

select *

from t1, ( select distinct y from t2 ) t2

where t1.x = t2.y;

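If you have some experience with SQL tuning, this form naturally suggests that t2 must not be a large table, because the whole of t2 has to go through a unique sort, and if t2 is large the cost of that sort is unacceptable. t1, on the other hand, can be large. Why? The most common explanation is that t1.x = t2.y can use an index, but that is not a very good explanation. Consider the case where both t1.x and t2.y are indexed: an index is an ordered structure, so the best plan for joining t1 and t2 is a merge join. In addition, an index on t2.y makes the sort of t2 much cheaper.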
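The rewrite of IN as a join against a de-duplicated subquery can be checked on a toy data set. The following is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for Oracle (an assumption made only so the example is self-contained; the execution plans will differ). The table contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (x INTEGER);
    CREATE TABLE t2 (y INTEGER);
    INSERT INTO t1 VALUES (1), (2), (3), (4);
    -- t2 holds a duplicate on purpose: 2 appears twice
    INSERT INTO t2 VALUES (2), (2), (3), (5);
""")

# The original IN form.
in_rows = conn.execute(
    "SELECT x FROM t1 WHERE x IN (SELECT y FROM t2) ORDER BY x"
).fetchall()

# The join against the de-duplicated subquery.
join_distinct_rows = conn.execute(
    "SELECT t1.x FROM t1, (SELECT DISTINCT y FROM t2) d "
    "WHERE t1.x = d.y ORDER BY t1.x"
).fetchall()

# Without DISTINCT the join repeats a t1 row once per matching t2 row.
join_plain_rows = conn.execute(
    "SELECT t1.x FROM t1, t2 WHERE t1.x = t2.y ORDER BY t1.x"
).fetchall()

print(in_rows, join_distinct_rows, join_plain_rows)
```

The plain join returns the row for x = 2 twice, which is why the DISTINCT step (the unique sort discussed above) is needed for the join form to match IN.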

select * from t1 where exists ( select null from t2 where y = x )

can be understood as:

for x in ( select * from t1 )

loop

if ( exists ( select null from t2 where y = x.x ) )

then

OUTPUT THE RECORD!

end if

end loop

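This one is easier to understand: t1 is always read with a full table scan. t1 therefore must not be a large table, while t2 can be very large, because y = x.x can use the index on t2.y.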
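The loop reading of EXISTS can be written out literally. This is a sketch under the same assumption as before (SQLite via Python's `sqlite3` standing in for Oracle); `exists_by_loop` and the index name `t2_y_idx` are hypothetical names introduced for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (x INTEGER);
    CREATE TABLE t2 (y INTEGER);
    CREATE INDEX t2_y_idx ON t2 (y);
    INSERT INTO t1 VALUES (1), (2), (3), (4);
    INSERT INTO t2 VALUES (2), (2), (3), (5);
""")

def exists_by_loop(conn):
    """Scan t1 once; probe t2 for each row and stop at the first match."""
    kept = []
    for (x,) in conn.execute("SELECT x FROM t1 ORDER BY x").fetchall():
        probe = conn.execute(
            "SELECT 1 FROM t2 WHERE y = ? LIMIT 1", (x,)
        ).fetchone()
        if probe is not None:      # at least one t2 row matched
            kept.append(x)
    return kept

loop_result = exists_by_loop(conn)

# The same question asked with EXISTS directly.
sql_result = [row[0] for row in conn.execute(
    "SELECT x FROM t1 WHERE EXISTS (SELECT NULL FROM t2 WHERE y = x) ORDER BY x"
)]

print(loop_result, sql_result)
```

The number of probes equals the number of rows in t1, which is the reason the size of t1 dominates the cost of this form.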

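Putting the IN and EXISTS discussion together gives a rule of thumb: IN suits a large outer table with a small inner table; EXISTS suits a small outer table with a large inner table.

If you doubt this, look at the following test: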
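One way to see where this rule of thumb comes from is to count index probes under each strategy. The sketch below is a deliberately crude model, not a description of what Oracle does internally: it ignores sort cost, block I/O and caching, and the helper names (`build_index`, `in_style`, `exists_style`) and data sizes are invented for the illustration.

```python
from collections import defaultdict

def build_index(keys):
    """Map each key to the positions of the rows that carry it."""
    index = defaultdict(list)
    for pos, key in enumerate(keys):
        index[key].append(pos)
    return index

def in_style(outer, inner):
    """Drive from the distinct inner keys and probe an index on the outer table."""
    outer_index = build_index(outer)
    probes = 0
    matched = []
    for key in sorted(set(inner)):          # the unique sort of the inner side
        probes += 1
        matched.extend(outer_index.get(key, []))
    return sorted(matched), probes

def exists_style(outer, inner):
    """Scan every outer row and probe an index on the inner table."""
    inner_index = build_index(inner)
    probes = 0
    matched = []
    for pos, key in enumerate(outer):
        probes += 1
        if key in inner_index:
            matched.append(pos)
    return matched, probes

big = [i % 1000 for i in range(8000)]       # 8000 rows, 1000 distinct keys
small = list(range(100))                    # 100 rows

# Large table outside, small table inside.
rows_a, probes_a = in_style(big, small)
rows_b, probes_b = exists_style(big, small)

# Small table outside, large table inside.
rows_c, probes_c = in_style(small, big)
rows_d, probes_d = exists_style(small, big)

print(probes_a, probes_b, probes_c, probes_d)
```

Both strategies return the same rows in each case; only the number of probes changes, and it is smallest when the driving side is the small table.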

********************************************************************************

SQL> create table big as select * from all_objects;

Table created.

SQL> insert /*+ append */ into big select * from big;

26872 rows created.

SQL> commit;

Commit complete.

SQL> insert /*+ append */ into big select * from big;

53744 rows created.

SQL> commit;

Commit complete.

SQL> insert /*+ append */ into big select * from big;

107488 rows created.

SQL> commit;

Commit complete.

SQL> create index big_idx on big(object_id);

Index created.

SQL> create table small as select * from all_objects where rownum < 100;

Table created.

SQL> create index small_idx on small(object_id);

Index created.

********************************************************************************

Run the SQL with event 10046 set, then format the trace file with TKPROF. The results follow.

Test with the large table outside and the small table inside:

********************************************************************************

select count(subobject_name)

from big

where object_id in ( select object_id from small )

call count cpu elapsed disk query current rows

------- ------ -------- ---------- ---------- ---------- ---------- ----------

Parse 1 0.00 0.01 0 0 0 0

Execute 1 0.00 0.00 0 0 0 0

Fetch 2 0.00 0.14 29 900 0 1

------- ------ -------- ---------- ---------- ---------- ---------- ----------

total 4 0.00 0.15 29 900 0 1

Rows Execution Plan

------- ---------------------------------------------------

0 SELECT STATEMENT GOAL: CHOOSE

1 SORT (AGGREGATE)

792 TABLE ACCESS (BY INDEX ROWID) OF 'BIG'

892 NESTED LOOPS

99 VIEW OF 'VW_NSO_1'

99 SORT (UNIQUE)

99 TABLE ACCESS (FULL) OF 'SMALL'

792 INDEX (RANGE SCAN) OF 'BIG_IDX' (NON-UNIQUE)

select count(subobject_name)

from big

where exists ( select null from small where small.object_id = big.object_id )

call count cpu elapsed disk query current rows

------- ------ -------- ---------- ---------- ---------- ---------- ----------

Parse 1 0.00 0.00 0 0 0 0

Execute 1 0.00 0.00 0 0 0 0

Fetch 2 1.90 2.72 2917 216125 0 1

------- ------ -------- ---------- ---------- ---------- ---------- ----------

total 4 1.90 2.72 2917 216125 0 1

Rows Execution Plan

------- ---------------------------------------------------

0 SELECT STATEMENT GOAL: CHOOSE

1 SORT (AGGREGATE)

792 FILTER

214976 TABLE ACCESS (FULL) OF 'BIG'

225 INDEX (RANGE SCAN) OF 'SMALL_IDX' (NON-UNIQUE)

********************************************************************************

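Performance figures with IN:

cpu=0.00 elapsed=0.15 query=900 current=0 disk=29

Performance figures with EXISTS:

cpu=1.90 elapsed=2.72 query=216125 current=0 disk=2917

The difference in CPU time, logical I/O (LIO) and physical I/O (PIO) is striking: IN is far more efficient here.

Test with the large table inside and the small table outside: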

********************************************************************************

select count(subobject_name)

from small

where object_id in ( select object_id from big )

call count cpu elapsed disk query current rows

------- ------ -------- ---------- ---------- ---------- ---------- ----------

Parse 1 0.00 0.00 0 0 0 0

Execute 1 0.00 0.00 0 0 0 0

Fetch 2 0.41 1.71 2917 2982 0 1

------- ------ -------- ---------- ---------- ---------- ---------- ----------

total 4 0.41 1.72 2917 2982 0 1

Rows Execution Plan

------- ---------------------------------------------------

0 SELECT STATEMENT GOAL: CHOOSE

1 SORT (AGGREGATE)

99 TABLE ACCESS (BY INDEX ROWID) OF 'SMALL'

26972 NESTED LOOPS

26872 VIEW OF 'VW_NSO_1'

26872 SORT (UNIQUE)

214976 TABLE ACCESS (FULL) OF 'BIG'

99 INDEX (RANGE SCAN) OF 'SMALL_IDX' (NON-UNIQUE)

select count(subobject_name)

from small

where exists ( select null from big where small.object_id = big.object_id )

call count cpu elapsed disk query current rows

------- ------ -------- ---------- ---------- ---------- ---------- ----------

Parse 1 0.00 0.00 0 0 0 0

Execute 1 0.00 0.00 0 0 0 0

Fetch 2 0.00 0.00 0 202 0 1

------- ------ -------- ---------- ---------- ---------- ---------- ----------

total 4 0.00 0.00 0 202 0 1

Rows Execution Plan

------- ---------------------------------------------------

0 SELECT STATEMENT GOAL: CHOOSE

1 SORT (AGGREGATE)

99 FILTER

99 TABLE ACCESS (FULL) OF 'SMALL'

99 INDEX (RANGE SCAN) OF 'BIG_IDX' (NON-UNIQUE)

********************************************************************************

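Performance figures with IN:

cpu=0.41 elapsed=1.72 query=2982 current=0 disk=2917

Performance figures with EXISTS:

cpu=0.00 elapsed=0.00 query=202 current=0 disk=0

The difference in CPU time, PIO and LIO is striking: EXISTS is far more efficient here.

Unfortunately I ran this test under the RBO. The RBO is a rigid optimizer that picks the execution plan purely by rule priority and does not assess what the plan actually costs the system. In the RBO, NESTED LOOPS ranks far above MERGE JOIN, so whenever a nested loop is possible the RBO will never choose a merge join. With the CBO, and with statistics gathered on the tables and indexes, the IN test above would be expected to choose a MERGE JOIN. Let us use a hint under the RBO to force a merge join and compare the same SQL under MJ and NL: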

********************************************************************************

select count(subobject_name)

from small

where object_id in ( select/*+ use_merge(small big) */ object_id from big )

call count cpu elapsed disk query current rows

------- ------ -------- ---------- ---------- ---------- ---------- ----------

Parse 1 0.01 0.17 0 0 0 0

Execute 1 0.00 0.00 0 0 0 0

Fetch 2 0.09 0.27 187 473 0 1

------- ------ -------- ---------- ---------- ---------- ---------- ----------

total 4 0.10 0.44 187 473 0 1

Rows Execution Plan

------- ---------------------------------------------------

0 SELECT STATEMENT GOAL: CHOOSE

1 SORT (AGGREGATE)

99 MERGE JOIN

26872 SORT (UNIQUE)

214976 INDEX (FULL SCAN) OF 'BIG_IDX' (NON-UNIQUE)

99 SORT (JOIN)

99 TABLE ACCESS (FULL) OF 'SMALL'

********************************************************************************

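As you can see:

NESTED LOOP: cpu=0.41 elapsed=1.72 query=2982 current=0 disk=2917

MERGE JOIN: cpu=0.10 elapsed=0.44 query=473 current=0 disk=187

This confirms what I said above. Many people are afraid to let their SQL use a merge join, yet for two tables that already have a sorted structure, a merge join is the best choice.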
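To make the point about sorted inputs concrete, here is a minimal sketch of a merge step that answers an IN-style (semi-join) question over two key lists that are already sorted, as an index full scan would deliver them. `merge_semi_join` is a hypothetical helper written for this illustration, not Oracle's implementation.

```python
def merge_semi_join(outer_sorted, inner_sorted):
    """Return the outer keys that have at least one match in inner.

    Both inputs must be sorted ascending. Each list is walked once,
    so the work is proportional to len(outer) + len(inner).
    """
    result = []
    i = j = 0
    while i < len(outer_sorted) and j < len(inner_sorted):
        if outer_sorted[i] < inner_sorted[j]:
            i += 1                      # outer key too small: no match for it
        elif outer_sorted[i] > inner_sorted[j]:
            j += 1                      # inner key too small: skip it
        else:
            result.append(outer_sorted[i])
            i += 1                      # keep j: the next outer row may share the key
    return result

matched = merge_semi_join([1, 2, 2, 3, 7], [2, 2, 3, 5, 5, 7, 9])
print(matched)
```

Neither side is probed repeatedly; the two cursors only move forward, which is what makes the method attractive when both inputs arrive in order.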

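Next we turn to NOT IN and NOT EXISTS. I discuss them together only because I have to: many people like to compare them. NOT IN and NOT EXISTS are in fact unlike IN and EXISTS. IN and EXISTS are fully interchangeable constructs, but NOT IN and NOT EXISTS are not. They can replace each other only when the subquery cannot return NULL.

Why? Look: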

********************************************************************************

SQL> conn scott/tiger@tdb;

Connected.

SQL> select count(*)

2 from emp

3 where mgr is null;

COUNT(*)

----------------

SQL> select count(*)

2 from emp

3 where empno not in ( select mgr from emp );

COUNT(*)

----------------

SQL> select count(*)

2 from emp t1

3 where not exists ( select null

4 from emp t2

5 where t2.mgr = t1.empno );

COUNT(*)

----------------

********************************************************************************

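If the result set returned by the subquery contains a NULL, the NOT IN query always returns 0 rows. NULL means "unknown": comparing any value with NULL yields UNKNOWN rather than TRUE, so the NOT IN predicate can never be TRUE for any row.

Now, on the assumption that the subquery returns no NULLs, let us compare NOT IN and NOT EXISTS.

Under the RBO, unless hints are used to change the NOT IN execution plan, NOT IN is much slower than NOT EXISTS in almost every case. Under the CBO with accurate statistics, NOT IN performs as well as NOT EXISTS, and can even be much faster.

The basic principle for tuning NOT IN is: for NOT IN to run fast, it must use a suitable join method.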
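The effect of a NULL in the subquery can be reproduced on a small table. The sketch below does not use the SCOTT.EMP data; it builds a hypothetical four-row `emp` table in SQLite (through Python's `sqlite3`, an assumption made for self-containment) with the same shape: one top manager whose `mgr` is NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (empno INTEGER PRIMARY KEY, mgr INTEGER);
    -- 1 is the top manager (mgr IS NULL); 2 and 3 report to 1; 4 reports to 2
    INSERT INTO emp VALUES (1, NULL), (2, 1), (3, 1), (4, 2);
""")

def count(sql):
    """Run a COUNT(*) query and return the single number."""
    return conn.execute(sql).fetchone()[0]

null_mgrs = count("SELECT COUNT(*) FROM emp WHERE mgr IS NULL")

# The subquery returns a NULL, so the predicate is never TRUE.
not_in = count(
    "SELECT COUNT(*) FROM emp WHERE empno NOT IN (SELECT mgr FROM emp)"
)

# NOT EXISTS is unaffected: it counts employees who manage nobody.
not_exists = count("""
    SELECT COUNT(*) FROM emp t1
    WHERE NOT EXISTS (SELECT NULL FROM emp t2 WHERE t2.mgr = t1.empno)
""")

# Filtering the NULL out of the subquery restores agreement.
not_in_filtered = count("""
    SELECT COUNT(*) FROM emp
    WHERE empno NOT IN (SELECT mgr FROM emp WHERE mgr IS NOT NULL)
""")

print(null_mgrs, not_in, not_exists, not_in_filtered)
```

On this toy table, employees 3 and 4 manage nobody, so NOT EXISTS and the NULL-filtered NOT IN both count 2, while the unfiltered NOT IN counts 0.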

select * from t1 where x not in ( select y from t2 )

Take this statement as an example (y contains no NULLs).

It can be replaced by either of these equivalents:

a) select * from t1 where not exists ( select null from t2 where t2.y=t1.x)

or

b) select t1.* from t1, t2 where t1.x=t2.y(+) and t2.y is null
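The equivalence of the three forms under the no-NULL premise can be checked on a small example. This sketch again uses SQLite through Python's `sqlite3` as a stand-in for Oracle, so the Oracle `(+)` outer-join syntax is written as an ANSI `LEFT JOIN`; the data is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (x INTEGER NOT NULL);
    -- NOT NULL enforces the "no NULLs in y" premise
    CREATE TABLE t2 (y INTEGER NOT NULL);
    INSERT INTO t1 VALUES (1), (2), (3), (4), (5);
    INSERT INTO t2 VALUES (2), (2), (4);
""")

queries = {
    "not_in": "SELECT x FROM t1 WHERE x NOT IN (SELECT y FROM t2) ORDER BY x",
    "not_exists": "SELECT x FROM t1 WHERE NOT EXISTS "
                  "(SELECT NULL FROM t2 WHERE t2.y = t1.x) ORDER BY x",
    "outer_join": "SELECT t1.x FROM t1 LEFT JOIN t2 ON t1.x = t2.y "
                  "WHERE t2.y IS NULL ORDER BY t1.x",
}

results = {
    name: [row[0] for row in conn.execute(sql)]
    for name, sql in queries.items()
}
print(results)
```

The sketch also declares `t1.x` as NOT NULL. A NULL in x would separate the forms again: NOT IN would drop that row when t2 is non-empty, while NOT EXISTS and the outer join would keep it.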

The test follows:

********************************************************************************

SQL> create table t1 as select * from all_objects where rownum <= 5000;

Table created.

SQL> create table t2 as select * from all_objects where rownum <= 4950;

Table created.

SQL> create index t2_idx on t2(object_id);

Index created.

********************************************************************************

Test under the RBO:

********************************************************************************

select count(*)

from t1 rbo

where object_id not in ( select object_id from t2 )

call count cpu elapsed disk query current rows

------- ------ -------- ---------- ---------- ---------- ---------- ----------

Parse 1 0.00 0.00 0 0 0 0

Execute 1 0.00 0.00 0 0 0 0

Fetch 2 6.13 19.12 127487 197502 0 1

------- ------ -------- ---------- ---------- ---------- ---------- ----------

total 4 6.13 19.13 127487 197502 0 1

Rows Execution Plan

------- ---------------------------------------------------

0 SELECT STATEMENT GOAL: CHOOSE

1 SORT (AGGREGATE)

50 FILTER

5000 TABLE ACCESS (FULL) OF 'T1'

4950 TABLE ACCESS (FULL) OF 'T2'

select count(*)

from t1 rbo

where not exists ( select null from t2 where t2.object_id = rbo.object_id)

call count cpu elapsed disk query current rows

------- ------ -------- ---------- ---------- ---------- ---------- ----------

Parse 1 0.01 0.00 0 0 0 0

Execute 1 0.00 0.00 0 0 0 0

Fetch 2 0.01 0.12 83 10075 0 1

------- ------ -------- ---------- ---------- ---------- ---------- ----------

total 4 0.02 0.12 83 10075 0 1

Rows Execution Plan

------- ---------------------------------------------------

0 SELECT STATEMENT GOAL: CHOOSE

1 SORT (AGGREGATE)

50 FILTER

5000 TABLE ACCESS (FULL) OF 'T1'

4950 INDEX (RANGE SCAN) OF 'T2_IDX' (NON-UNIQUE)

select count(*)

from t1, t2 rbo

where t1.object_id = rbo.object_id(+) and rbo.object_id is null

call count cpu elapsed disk query current rows

------- ------ -------- ---------- ---------- ---------- ---------- ----------

Parse 1 0.00 0.00 0 0 0 0

Execute 1 0.00 0.00 0 0 0 0

Fetch 2 0.00 0.05 72 5087 0 1

------- ------ -------- ---------- ---------- ---------- ---------- ----------

total 4 0.00 0.06 72 5087 0 1

Rows Execution Plan

------- ---------------------------------------------------

0 SELECT STATEMENT GOAL: CHOOSE

1 SORT (AGGREGATE)

50 FILTER

5000 NESTED LOOPS (OUTER)

5000 TABLE ACCESS (FULL) OF 'T1'

4950 INDEX (RANGE SCAN) OF 'T2_IDX' (NON-UNIQUE)

********************************************************************************

Plans chosen by the RBO itself, performance figures:

NOT IN: cpu=6.13 elapsed=19.13 query=197502 current=0 disk=127487

NOT EXISTS: cpu=0.02 elapsed=0.12 query=10075 current=0 disk=83

OUTER JOIN: cpu=0.00 elapsed=0.06 query=5087 current=0 disk=72

NOT EXISTS is far more efficient than NOT IN, but slightly less efficient than the OUTER JOIN.

RBO, with a hint changing the NOT IN execution plan:

********************************************************************************

select count(*)

from t1 rbo

where object_id not in ( select/*+ hash_aj(rbo t2) */ object_id from t2 )

call count cpu elapsed disk query current rows

------- ------ -------- ---------- ---------- ---------- ---------- ----------

Parse 1 0.04 0.45 0 3 0 0

Execute 1 0.00 0.00 0 0 0 0

Fetch 2 0.01 0.09 48 191 0 1

------- ------ -------- ---------- ---------- ---------- ---------- ----------

total 4 0.05 0.55 48 194 0 1

Rows Execution Plan

------- ---------------------------------------------------

0 SELECT STATEMENT GOAL: CHOOSE

0 SORT (AGGREGATE)

0 HASH JOIN (ANTI)

0 TABLE ACCESS (FULL) OF 'T1'

0 INDEX (FAST FULL SCAN) OF 'T2_IDX' (NON-UNIQUE)

********************************************************************************

With an index only on t2.object_id, the hash anti-join figures are:

HJ-ANTI: cpu=0.05 elapsed=0.55 query=194 current=0 disk=48

Much better!

Create an index on t1.object_id and use a merge anti-join:
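The idea behind a hash anti-join can be sketched in a few lines. This is an illustration of the general technique, not Oracle's implementation, and it assumes neither side contains NULLs; `hash_anti_join` is a hypothetical helper, and the key ranges merely mimic the 5000-row and 4950-row sizes of the test tables.

```python
def hash_anti_join(outer_keys, inner_keys):
    """Return the outer keys that have no match in inner (NOT IN, no NULLs)."""
    build = set(inner_keys)                              # build phase: hash the inner side once
    return [key for key in outer_keys if key not in build]   # probe phase: one lookup per outer row

t1_keys = list(range(1, 5001))      # 5000 made-up keys
t2_keys = list(range(1, 4951))      # 4950 made-up keys

unmatched = hash_anti_join(t1_keys, t2_keys)
print(len(unmatched))
```

Each table is read once, instead of t2 being rescanned for every row of t1 as in the FILTER plan with two full table scans.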

********************************************************************************

select count(*)

from t1 rbo

where object_id not in ( select /*+ merge_aj(rbo t2) */ object_id from t2 )

call count cpu elapsed disk query current rows

------- ------ -------- ---------- ---------- ---------- ---------- ----------

Parse 1 0.00 0.00 0 0 0 0

Execute 1 0.00 0.00 0 0 0 0

Fetch 2 0.00 0.00 0 28 0 1

------- ------ -------- ---------- ---------- ---------- ---------- ----------

total 4 0.00 0.00 0 28 0 1

Rows Execution Plan

------- ---------------------------------------------------

0 SELECT STATEMENT GOAL: CHOOSE

1 SORT (AGGREGATE)

50 MERGE JOIN (ANTI)

5000 INDEX (FULL SCAN) OF 'T1_IDX' (NON-UNIQUE)

4950 SORT (UNIQUE)

4950 INDEX (FAST FULL SCAN) OF 'T2_IDX' (NON-UNIQUE)

********************************************************************************

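With an index on t1.object_id, the merge anti-join figures are:

MJ-ANTI: cpu=0.00 elapsed=0.00 query=28 current=0 disk=0

With indexes on both t1.object_id and t2.object_id, the merge anti-join for this NOT IN statement is more efficient than any of the SQL above.

In summary, as long as NOT IN uses a suitable join method, it is very efficient, and can even beat NOT EXISTS and OUTER JOIN.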
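A merge anti-join over two already-sorted key streams can be sketched as follows. As with the earlier sketches, this illustrates the general technique under the no-NULL assumption and is not Oracle's code; `merge_anti_join` is a hypothetical helper.

```python
def merge_anti_join(outer_sorted, inner_sorted):
    """Return the outer keys with no match in inner.

    Both inputs must be sorted ascending; the inner cursor only moves forward.
    """
    result = []
    j = 0
    for key in outer_sorted:
        # Skip inner keys that are smaller than the current outer key.
        while j < len(inner_sorted) and inner_sorted[j] < key:
            j += 1
        # Keep the outer key if inner is exhausted or its next key differs.
        if j == len(inner_sorted) or inner_sorted[j] != key:
            result.append(key)
    return result

small_case = merge_anti_join([1, 2, 2, 3, 6], [2, 4, 6, 6])
sized_case = merge_anti_join(list(range(1, 5001)), list(range(1, 4951)))
print(small_case, len(sized_case))
```

When both inputs come from index scans they are already in key order, so no separate sort is needed before the merge.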
