hash join(2)
今天和同事讨论hash join,发现我之前的想法有些问题:
我的最初的理解是:扫描probe table找到对应bucket时,扫描bucket link并匹配其中的每个值,最后将自己挂在link的最后,这样bucket link的长度就会越来越长,扫描的成本就会越来越大。但是这样明显有几个矛盾:第一,这样做会导致hash table的大小不断增大,这样join到最后可能hash area根本无法容纳一个hash table,而Oracle在估算成本时,明显只根据build table来决定分区数,也说明hash table不会增大。第二,在hash bucket中只需要精确匹配build table中的值就可以了(因为hash碰撞的原因),完全没必要把probe table的值也挂在hash bucket中。第三,最大的矛盾是Oracle在join后完全可以将这个结果集直接返回给用户,没有必要把这些内容也保存在hash table中。
正确的推测应该是:hash partition number,hash cluster size,hash bucket number,hash bucket length等等在hash join开始时,就由build table决定了,在整个build过程中,不再发生变化。当join时发现了匹配的行,就直接返回结果给用户。所以在作hash join时,build table中join的字段要尽可能的唯一(或者重复的很少),否则性能可能会很差。比如5000条记录只有5个不同的值,如果作为build table,则最多使用10个bucket,每个bucket是一个1000个值的link,这样hash join的性能将很差。
下面的trace文件可以很明显的看出,数据分布在partition 0,3,4,6中,一共有8187个bucket,但是只有5个bucket有数据,每个bucket里面有1000行。
Stargate release Partition:0 rows:1000 clusters:1 slots:1 kept=1 Partition:1 rows:0 clusters:0 slots:0 kept=1 Partition:2 rows:0 clusters:0 slots:0 kept=1 Partition:3 rows:2000 clusters:1 slots:1 kept=1 Partition:4 rows:1000 clusters:1 slots:1 kept=1 Partition:5 rows:0 clusters:0 slots:0 kept=1 Partition:6 rows:1000 clusters:1 slots:1 kept=1 Partition:7 rows:0 clusters:0 slots:0 kept=1
Number of buckets with 0 rows: 8187
Number of buckets with 1 rows: 0
Number of buckets with 2 rows: 0
Number of buckets with 3 rows: 0
Number of buckets with 4 rows: 0
Number of buckets with 5 rows: 0
Number of buckets with 6 rows: 0
Number of buckets with 7 rows: 0
Number of buckets with 8 rows: 0
Number of buckets with 9 rows: 0
Number of buckets with between 10 and 19 rows: 0
Number of buckets with between 20 and 29 rows: 0
Number of buckets with between 30 and 39 rows: 0
Number of buckets with between 40 and 49 rows: 0
Number of buckets with between 50 and 59 rows: 0
Number of buckets with between 60 and 69 rows: 0
Number of buckets with between 70 and 79 rows: 0
Number of buckets with between 80 and 89 rows: 0
Number of buckets with between 90 and 99 rows: 0
Number of buckets with 100 or more rows: 5
### Hash table overall statistics ###
Total buckets: 8192 Empty buckets: 8187 Non-empty buckets: 5 Sherman’s Way ipod
Total number of rows: 5000
Maximum number of rows in a bucket: 1000
Average number of rows in non-empty buckets: 1000.000000
–EOF–