Keeping PostgreSQL from sometimes choosing a bad query plan

I have a strange problem with PostgreSQL performance for a query, using PostgreSQL 8.4.9. The query selects a set of points within a 3D volume, using a LEFT OUTER JOIN to add a related ID column where that related ID exists. Small changes in the x range can cause PostgreSQL to choose a different query plan, which takes the execution time from 0.01 seconds to 50 seconds. This is the query in question:

    SELECT treenode.id AS id,
           treenode.parent_id AS parentid,
           (treenode.location).x AS x,
           (treenode.location).y AS y,
           (treenode.location).z AS z,
           treenode.confidence AS confidence,
           treenode.user_id AS user_id,
           treenode.radius AS radius,
           ((treenode.location).z - 50) AS z_diff,
           treenode_class_instance.class_instance_id AS skeleton_id
      FROM treenode
      LEFT OUTER JOIN (treenode_class_instance
                       INNER JOIN class_instance
                          ON treenode_class_instance.class_instance_id = class_instance.id
                         AND class_instance.class_id = 7828307)
        ON (treenode_class_instance.treenode_id = treenode.id
            AND treenode_class_instance.relation_id = 7828321)
     WHERE treenode.project_id = 4
       AND (treenode.location).x >= 8000
       AND (treenode.location).x <= (8000 + 4736)
       AND (treenode.location).y >= 22244
       AND (treenode.location).y <= (22244 + 3248)
       AND (treenode.location).z >= 0
       AND (treenode.location).z <= 100
     ORDER BY parentid DESC, id, z_diff
     LIMIT 400;

That query takes nearly a minute and, if I add EXPLAIN to the front of it, seems to be using the following query plan:

     Limit  (cost=56185.16..56185.17 rows=1 width=89)
       ->  Sort  (cost=56185.16..56185.17 rows=1 width=89)
             Sort Key: treenode.parent_id, treenode.id, (((treenode.location).z - 50::double precision))
             ->  Nested Loop Left Join  (cost=6715.16..56185.15 rows=1 width=89)
                   Join Filter: (treenode_class_instance.treenode_id = treenode.id)
                   ->  Bitmap Heap Scan on treenode  (cost=148.55..184.16 rows=1 width=81)
                         Recheck Cond: (((location).x >= 8000::double precision) AND ((location).x <= 12736::double precision) AND ((location).z >= 0::double precision) AND ((location).z <= 100::double precision))
                         Filter: (((location).y >= 22244::double precision) AND ((location).y <= 25492::double precision) AND (project_id = 4))
                         ->  BitmapAnd  (cost=148.55..148.55 rows=9 width=0)
                               ->  Bitmap Index Scan on location_x_index  (cost=0.00..67.38 rows=2700 width=0)
                                     Index Cond: (((location).x >= 8000::double precision) AND ((location).x <= 12736::double precision))
                               ->  Bitmap Index Scan on location_z_index  (cost=0.00..80.91 rows=3253 width=0)
                                     Index Cond: (((location).z >= 0::double precision) AND ((location).z <= 100::double precision))
                   ->  Hash Join  (cost=6566.61..53361.69 rows=211144 width=16)
                         Hash Cond: (treenode_class_instance.class_instance_id = class_instance.id)
                         ->  Seq Scan on treenode_class_instance  (cost=0.00..25323.79 rows=969285 width=16)
                               Filter: (relation_id = 7828321)
                         ->  Hash  (cost=5723.54..5723.54 rows=51366 width=8)
                               ->  Seq Scan on class_instance  (cost=0.00..5723.54 rows=51366 width=8)
                                     Filter: (class_id = 7828307)
    (20 rows)

However, if I replace the 8000 in the x range condition with 10644, the query executes in a fraction of a second and uses this query plan:

     Limit  (cost=58378.94..58378.95 rows=2 width=89)
       ->  Sort  (cost=58378.94..58378.95 rows=2 width=89)
             Sort Key: treenode.parent_id, treenode.id, (((treenode.location).z - 50::double precision))
             ->  Hash Left Join  (cost=57263.11..58378.93 rows=2 width=89)
                   Hash Cond: (treenode.id = treenode_class_instance.treenode_id)
                   ->  Bitmap Heap Scan on treenode  (cost=231.12..313.44 rows=2 width=81)
                         Recheck Cond: (((location).z >= 0::double precision) AND ((location).z <= 100::double precision) AND ((location).x >= 10644::double precision) AND ((location).x <= 15380::double precision))
                         Filter: (((location).y >= 22244::double precision) AND ((location).y <= 25492::double precision) AND (project_id = 4))
                         ->  BitmapAnd  (cost=231.12..231.12 rows=21 width=0)
                               ->  Bitmap Index Scan on location_z_index  (cost=0.00..80.91 rows=3253 width=0)
                                     Index Cond: (((location).z >= 0::double precision) AND ((location).z <= 100::double precision))
                               ->  Bitmap Index Scan on location_x_index  (cost=0.00..149.95 rows=6157 width=0)
                                     Index Cond: (((location).x >= 10644::double precision) AND ((location).x <= 15380::double precision))
                   ->  Hash  (cost=53361.69..53361.69 rows=211144 width=16)
                         ->  Hash Join  (cost=6566.61..53361.69 rows=211144 width=16)
                               Hash Cond: (treenode_class_instance.class_instance_id = class_instance.id)
                               ->  Seq Scan on treenode_class_instance  (cost=0.00..25323.79 rows=969285 width=16)
                                     Filter: (relation_id = 7828321)
                               ->  Hash  (cost=5723.54..5723.54 rows=51366 width=8)
                                     ->  Seq Scan on class_instance  (cost=0.00..5723.54 rows=51366 width=8)
                                           Filter: (class_id = 7828307)
    (21 rows)

I am far from an expert at parsing these query plans, but the obvious difference seems to be that with one x range it uses a Hash Left Join (which is very fast) while with the other it uses a Nested Loop Left Join (which seems to be very slow). In both cases the query returns about 90 rows. If I issue SET ENABLE_NESTLOOP TO FALSE before the slow version of the query, it becomes very fast, but I understand that using that setting in general is a bad idea.

Could I, for example, create a particular index to make the query planner more likely to choose the clearly more efficient strategy? Can anyone suggest why PostgreSQL's query planner should be choosing such a poor strategy for one of these queries? Below I have included details of the schema that may be helpful.


The treenode table, which has 900,000 rows, is defined as follows:

  Table "public.treenode" Column | Type | Modifiers ---------------+--------------------------+------------------------------------------------------ id | bigint | not null default nextval('concept_id_seq'::regclass) user_id | bigint | not null creation_time | timestamp with time zone | not null default now() edition_time | timestamp with time zone | not null default now() project_id | bigint | not null location | double3d | not null parent_id | bigint | radius | double precision | not null default 0 confidence | integer | not null default 5 Indexes: "treenode_pkey" PRIMARY KEY, btree (id) "treenode_id_key" UNIQUE, btree (id) "location_x_index" btree (((location).x)) "location_y_index" btree (((location).y)) "location_z_index" btree (((location).z)) Foreign-key constraints: "treenode_parent_id_fkey" FOREIGN KEY (parent_id) REFERENCES treenode(id) Referenced by: TABLE "treenode_class_instance" CONSTRAINT "treenode_class_instance_treenode_id_fkey" FOREIGN KEY (treenode_id) REFERENCES treenode(id) ON DELETE CASCADE TABLE "treenode" CONSTRAINT "treenode_parent_id_fkey" FOREIGN KEY (parent_id) REFERENCES treenode(id) Triggers: on_edit_treenode BEFORE UPDATE ON treenode FOR EACH ROW EXECUTE PROCEDURE on_edit() Inherits: location 

The double3d composite type is defined as follows:

 Composite type "public.double3d" Column | Type --------+------------------ x | double precision y | double precision z | double precision 

The other two tables involved in the join are treenode_class_instance:

  Table "public.treenode_class_instance" Column | Type | Modifiers -------------------+--------------------------+------------------------------------------------------ id | bigint | not null default nextval('concept_id_seq'::regclass) user_id | bigint | not null creation_time | timestamp with time zone | not null default now() edition_time | timestamp with time zone | not null default now() project_id | bigint | not null relation_id | bigint | not null treenode_id | bigint | not null class_instance_id | bigint | not null Indexes: "treenode_class_instance_pkey" PRIMARY KEY, btree (id) "treenode_class_instance_id_key" UNIQUE, btree (id) "idx_class_instance_id" btree (class_instance_id) Foreign-key constraints: "treenode_class_instance_class_instance_id_fkey" FOREIGN KEY (class_instance_id) REFERENCES class_instance(id) ON DELETE CASCADE "treenode_class_instance_relation_id_fkey" FOREIGN KEY (relation_id) REFERENCES relation(id) "treenode_class_instance_treenode_id_fkey" FOREIGN KEY (treenode_id) REFERENCES treenode(id) ON DELETE CASCADE "treenode_class_instance_user_id_fkey" FOREIGN KEY (user_id) REFERENCES "user"(id) Triggers: on_edit_treenode_class_instance BEFORE UPDATE ON treenode_class_instance FOR EACH ROW EXECUTE PROCEDURE on_edit() Inherits: relation_instance 

… and class_instance:

  Table "public.class_instance" Column | Type | Modifiers ---------------+--------------------------+------------------------------------------------------ id | bigint | not null default nextval('concept_id_seq'::regclass) user_id | bigint | not null creation_time | timestamp with time zone | not null default now() edition_time | timestamp with time zone | not null default now() project_id | bigint | not null class_id | bigint | not null name | character varying(255) | not null Indexes: "class_instance_pkey" PRIMARY KEY, btree (id) "class_instance_id_key" UNIQUE, btree (id) Foreign-key constraints: "class_instance_class_id_fkey" FOREIGN KEY (class_id) REFERENCES class(id) "class_instance_user_id_fkey" FOREIGN KEY (user_id) REFERENCES "user"(id) Referenced by: TABLE "class_instance_class_instance" CONSTRAINT "class_instance_class_instance_class_instance_a_fkey" FOREIGN KEY (class_instance_a) REFERENCES class_instance(id) ON DELETE CASCADE TABLE "class_instance_class_instance" CONSTRAINT "class_instance_class_instance_class_instance_b_fkey" FOREIGN KEY (class_instance_b) REFERENCES class_instance(id) ON DELETE CASCADE TABLE "connector_class_instance" CONSTRAINT "connector_class_instance_class_instance_id_fkey" FOREIGN KEY (class_instance_id) REFERENCES class_instance(id) TABLE "treenode_class_instance" CONSTRAINT "treenode_class_instance_class_instance_id_fkey" FOREIGN KEY (class_instance_id) REFERENCES class_instance(id) ON DELETE CASCADE Triggers: on_edit_class_instance BEFORE UPDATE ON class_instance FOR EACH ROW EXECUTE PROCEDURE on_edit() Inherits: concept 

If the query planner makes bad decisions, it is mostly one of two things:

1. The statistics are inaccurate.

Do you run ANALYZE often enough? Also popular in its combined form VACUUM ANALYZE. If autovacuum is on (which is the default in modern Postgres), ANALYZE is run automatically. But consider the question linked below (a quick way to check when your tables were last analyzed is sketched right after it):

  • Are regular VACUUM ANALYZE still recommended under 9.1?

(The top two answers there still apply for Postgres 9.6.)
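
As a minimal sketch (assuming the three table names from the question), pg_stat_user_tables shows when statistics were last gathered, and ANALYZE refreshes them manually if they look stale:

    -- When did (auto)vacuum and (auto)analyze last touch the tables in question?
    SELECT relname, last_analyze, last_autoanalyze, last_autovacuum
    FROM   pg_stat_user_tables
    WHERE  relname IN ('treenode', 'treenode_class_instance', 'class_instance');

    -- If the timestamps look stale, refresh the statistics manually:
    ANALYZE treenode;
    ANALYZE treenode_class_instance;
    ANALYZE class_instance;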

If your table is big and the data distribution is irregular, raising the default_statistics_target may help. Or rather, just set the statistics target for relevant columns (basically those appearing in the WHERE and JOIN clauses of your queries):

 ALTER TABLE ... ALTER COLUMN ... SET STATISTICS 1234; -- calibrate number 

The target can be set in the range 0 to 10000.

Run ANALYZE again on the relevant tables after that.
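
For example, a hedged sketch applied to the filter and join columns from the query above (the target of 1000 is just a number to calibrate, not a recommendation):

    -- Raise the per-column statistics targets for the columns used in the
    -- WHERE and JOIN clauses, then gather fresh statistics.
    ALTER TABLE treenode                ALTER COLUMN project_id        SET STATISTICS 1000;
    ALTER TABLE treenode_class_instance ALTER COLUMN relation_id       SET STATISTICS 1000;
    ALTER TABLE treenode_class_instance ALTER COLUMN class_instance_id SET STATISTICS 1000;
    ALTER TABLE class_instance          ALTER COLUMN class_id          SET STATISTICS 1000;

    ANALYZE treenode;
    ANALYZE treenode_class_instance;
    ANALYZE class_instance;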

2. The cost settings for planner estimates are off.

Read the chapter "Planner Cost Constants" in the manual.

Look at the chapters on default_statistics_target and random_page_cost on this generally helpful PostgreSQL Wiki page.
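
As a hedged illustration only (sensible values depend entirely on your hardware, so treat these as an experiment rather than a recommendation), you can try different cost constants for a single session and re-run EXPLAIN to see whether the plan flips:

    -- If the working set is mostly cached, random I/O is cheaper than the
    -- default random_page_cost of 4.0 assumes. The values here are illustrative.
    SET random_page_cost = 2.0;
    SET effective_cache_size = '4GB';

    EXPLAIN SELECT ...;   -- re-run the slow query from the question here

    -- Undo the session-level changes once you have compared the plans:
    RESET random_page_cost;
    RESET effective_cache_size;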

Of course, there are many other possible reasons, but these are by far the most common ones.

I am skeptical that this has anything to do with bad statistics, unless you consider the combination of database statistics and your custom data type.

My guess is that PostgreSQL is picking a nested loop join because it looks at the predicates (treenode.location).x >= 8000 AND (treenode.location).x <= (8000 + 4736) and does something odd with the arithmetic in your comparison. A nested loop is typically used when there is a small amount of data on the inner side of the join.

However, once you change that constant to 10736, you get a different plan. It is always possible that the plan is complex enough that genetic query optimization (GEQO) is kicking in, and you are seeing the side effects of non-deterministic plan building. There is enough variance in the evaluation order of the query to make me think that is what is going on.

One option would be to examine using a parameterized/prepared statement for this instead of ad hoc code. Since you are working in a 3-dimensional space, you might also want to consider using PostGIS. While it might be overkill, it may also be able to give you the performance you need to get these queries running properly.
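
A rough sketch of the prepared-statement idea (the statement name, the trimmed column list and the parameter order here are illustrative, not taken from the original query):

    -- On 8.4 the plan for a prepared statement is built without looking at the
    -- actual parameter values, so it cannot flip between different ranges.
    PREPARE treenodes_in_box (double precision, double precision,
                              double precision, double precision,
                              double precision, double precision, bigint) AS
    SELECT t.id, t.parent_id, (t.location).x, (t.location).y, (t.location).z
    FROM   treenode t
    WHERE  t.project_id = $7
      AND  (t.location).x BETWEEN $1 AND $2
      AND  (t.location).y BETWEEN $3 AND $4
      AND  (t.location).z BETWEEN $5 AND $6;

    EXECUTE treenodes_in_box(8000, 12736, 22244, 25492, 0, 100, 4);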

While forcing the planner's behavior is not the best choice, sometimes we do end up making better decisions than the software.
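
If you do end up overriding the planner, a minimal sketch that keeps the override scoped to a single transaction (instead of the whole session) might look like this:

    BEGIN;
    SET LOCAL enable_nestloop = off;  -- reverts automatically at COMMIT/ROLLBACK
    -- ... run the slow query from the question here ...
    COMMIT;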

What Erwin said about the statistics. Also:

 ORDER BY parentid DESC, id, z_diff 

Sorting on

 parentid DESC, id, z 

instead might give the optimizer some more room to shuffle. (Since z_diff is just z shifted by a constant, the ordering is identical. I don't think it matters much here, because it is the last sort term and the sort is not expensive, but it is worth a try.)

I am not positive that this is the source of your problem, but it looks like some changes were made to the postgres query planner between versions 8.4.8 and 8.4.9. You could try using an older version and see if it makes a difference.

http://postgresql.1045698.n5.nabble.com/BUG-6275-Horrible-performance-regression-td4944891.html

Don't forget to re-analyze your tables if you change the version.