If you are reading this hoping that PostgreSQL finally got UPDATE ... LIMIT like MySQL, I have to disappoint you. The LIMIT clause is not yet supported for DML statements in PostgreSQL. If you want to UPDATE only a limited number of rows, you have to use workarounds. This article will describe how to do this and how to avoid the pitfalls and race conditions you may encounter. Note that most of the following also applies to DELETE ... LIMIT!
A test table for UPDATE ... LIMIT
The following table contains 1000 categories of 100 entries each. Since we don't discuss persistence, an UNLOGGED table will do:

CREATE UNLOGGED TABLE upd (
   id bigint GENERATED ALWAYS AS IDENTITY,
   category integer NOT NULL,
   flag boolean DEFAULT FALSE NOT NULL
);

INSERT INTO upd (category)
SELECT (i - 1) / 100
FROM generate_series(1, 100000) AS i;

/* set hint bits, gather optimizer statistics */
VACUUM (ANALYZE) upd;

/* created after INSERT for performance */
ALTER TABLE upd ADD PRIMARY KEY (id);

CREATE INDEX ON upd (category);
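If you want to double-check the test data, a query like this will do; every category should come back with a count of exactly 100:

SELECT category, count(*) AS cnt
FROM upd
GROUP BY category
ORDER BY category;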
Now we would like to set the flag on a single row with category 42:

UPDATE upd
SET flag = TRUE
WHERE category = 42
LIMIT 1;
But alas, we cannot do that. We'll have to look for other ways to achieve our goal.
UPDATE ... LIMIT with a simple subquery

Since there is a LIMIT clause for SELECT, we can use a subquery as follows:
UPDATE upd AS u
SET flag = TRUE
WHERE u.category = 42
  AND u.id IN (SELECT id FROM upd AS u2
               WHERE u2.category = u.category
               LIMIT 1);
This query works as intended if nobody else is using the table. But it can fail to work as expected in the face of concurrent data modifications:
BEGIN;

DELETE FROM upd WHERE id = 4201;
Now run our proposed UPDATE statement in a concurrent session, and it will hang. If you then COMMIT the DELETE, our UPDATE statement will update nothing:
UPDATE 0
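While the UPDATE is still hanging (before you COMMIT the DELETE), you can watch the lock wait from a third session. This diagnostic query is not part of the workaround, just a way to see what is going on:

SELECT pid, wait_event_type, wait_event, state, query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock';

The blocked backend should typically show a wait_event of transactionid, because it is waiting for the deleting transaction to finish.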
How did this happen? It is true that the entire query sees the same state of the database (the same snapshot). Owing to the benefits of multiversioning, the subquery has no trouble reading the row that is being deleted. However, the UPDATE has to take a FOR NO KEY UPDATE row lock and has to wait. When the deleting transaction commits, the UPDATE finds that the row is gone and has nothing to do. The documentation describes that as follows:
If the first updater commits, the second updater will ignore the row if the first updater deleted it, otherwise it will attempt to apply its operation to the updated version of the row.
UPDATE ... LIMIT implementation by taking a lock in the subquery

The problem with our first attempt was that we didn't prevent the row from getting deleted after the subquery saw it. To prevent that from happening, we should lock the row in the subquery. There is one additional consideration: we don't care which row exactly we modify, as long as the category is 42. Consequently, we can improve concurrency by using SKIP LOCKED, so that we ignore locked rows and never have to wait:
UPDATE upd AS u
SET flag = TRUE
WHERE u.category = 42
  AND u.id IN (SELECT id FROM upd AS u2
               WHERE u2.category = u.category
               FOR UPDATE SKIP LOCKED
               LIMIT 1);
Looks good, right? Well, here is the result when we execute the statement:
UPDATE 99

What? It updated all the remaining 99 rows in category 42! Before we file a bug against PostgreSQL for ignoring the LIMIT 1 clause, let's look at the execution plan:
                                QUERY PLAN
---------------------------------------------------------------------------
 Update on upd u
   ->  Nested Loop Semi Join
         ->  Index Scan using upd_category_idx on upd u
               Index Cond: (category = 42)
         ->  Subquery Scan on "ANY_subquery"
               Filter: (u.id = "ANY_subquery".id)
               ->  Limit
                     ->  LockRows
                           ->  Index Scan using upd_category_idx on upd u2
                                 Index Cond: (category = u.category)
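In case you want to reproduce that: a plain EXPLAIN shows the plan without modifying any rows (COSTS OFF merely trims the output to the shape shown above):

EXPLAIN (COSTS OFF)
UPDATE upd AS u
SET flag = TRUE
WHERE u.category = 42
  AND u.id IN (SELECT id FROM upd AS u2
               WHERE u2.category = u.category
               FOR UPDATE SKIP LOCKED
               LIMIT 1);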
PostgreSQL chose to execute the subquery using a nested loop join, and the subquery (which has the LockRows node) ended up in the inner loop. That is, PostgreSQL scans the 99 rows in category 42 and executes the subquery for each of them. Now, because each update writes a new row version to the end of the table, each execution of the subquery encounters a different row first, and this turns out to be the current row found by the outer index scan.
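If you want to see that row movement with your own eyes, you can look at the system column ctid, which holds the physical location (block, item) of each row version. A quick check; the exact numbers will differ on your system:

SELECT ctid, id, flag
FROM upd
WHERE category = 42
ORDER BY ctid;

Run before and after the statement above, it should show the updated rows reappearing on much higher block numbers, because their new versions were appended at the end of the table.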
This was a particularly surprising outcome. Other kinds of bad behavior could occur if concurrent transactions lock and unlock rows between individual executions of the subquery. The root cause of these problems is the same: PostgreSQL executes the subquery more than once. In my example, I caused PostgreSQL to choose such a plan by making the subquery a correlated subquery (it references u.category from the outer query).
A correct UPDATE ... LIMIT implementation

To fix the nasty problem we encountered in the previous section, we have to make sure that PostgreSQL executes the subquery only once. In our case, it would be enough to change the WHERE condition in the subquery to
WHERE u2.category = 42
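Written out in full, that decorrelated variant would look like this (the same statement as before, only the subquery no longer references the outer query):

UPDATE upd AS u
SET flag = TRUE
WHERE u.category = 42
  AND u.id IN (SELECT id FROM upd AS u2
               WHERE u2.category = 42
               FOR UPDATE SKIP LOCKED
               LIMIT 1);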
But the best way to make sure that the subquery is calculated only once is to put it into a materialized CTE:
WITH u2 AS MATERIALIZED (
   SELECT id FROM upd
   WHERE upd.category = 42
   FOR UPDATE SKIP LOCKED
   LIMIT 1
)
UPDATE upd AS u
SET flag = TRUE
FROM u2
WHERE u.category = 42
  AND u.id = u2.id;
This results in the following execution plan:
                          QUERY PLAN
--------------------------------------------------------------
 Update on upd u
   CTE u2
     ->  Limit
           ->  LockRows
                 ->  Index Scan using upd_category_idx on upd
                       Index Cond: (category = 42)
   ->  Nested Loop
         ->  CTE Scan on u2
         ->  Index Scan using upd_pkey on upd u
               Index Cond: (id = u2.id)
               Filter: (category = 42)
This version should be free from race conditions and other oddities!
If you have read my diatribe against SELECT ... FOR UPDATE, you will guess what comes next. FOR UPDATE is not the correct lock for our UPDATE! Rather, we should use FOR NO KEY UPDATE. The advantage of using this lower row lock level is that it won't interfere with concurrent INSERT statements on a table with a foreign key referencing our table. So this is the final version of our UPDATE ... LIMIT statement:
WITH u2 AS MATERIALIZED (
   SELECT id FROM upd
   WHERE upd.category = 42
   FOR NO KEY UPDATE SKIP LOCKED
   LIMIT 1
)
UPDATE upd AS u
SET flag = TRUE
FROM u2
WHERE u.category = 42
  AND u.id = u2.id;
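If you use this pattern to grab work items from a queue-like table, it can be handy to raise the LIMIT and add a RETURNING clause, so that each call claims a small batch and reports which rows it got. A sketch along those lines (the batch size of 10 is arbitrary):

WITH u2 AS MATERIALIZED (
   SELECT id FROM upd
   WHERE upd.category = 42
   FOR NO KEY UPDATE SKIP LOCKED
   LIMIT 10
)
UPDATE upd AS u
SET flag = TRUE
FROM u2
WHERE u.category = 42
  AND u.id = u2.id
RETURNING u.id;

Thanks to SKIP LOCKED, several sessions can run this concurrently without blocking each other; a session that finds no unlocked rows simply gets an empty result. For a real queue you would also filter on NOT flag in the subquery, so that rows processed earlier are not picked again.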
We tried to come up with a good workaround for the lack of UPDATE ... LIMIT in PostgreSQL. Along the way, we encountered and solved several potential problems: race conditions, unexpected execution plans, and excessive locks. You can use the same technique if you want to DELETE only a few rows (and then FOR UPDATE would be the correct row lock).
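For completeness, here is what the analogous DELETE ... LIMIT could look like, following the same pattern, this time with FOR UPDATE as the appropriate row lock for a DELETE:

WITH d AS MATERIALIZED (
   SELECT id FROM upd
   WHERE upd.category = 42
   FOR UPDATE SKIP LOCKED
   LIMIT 1
)
DELETE FROM upd AS u
USING d
WHERE u.category = 42
  AND u.id = d.id;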