Rebuilding large GIN indexes

takes time. A lot of time. With rebuilding, I mean when you for example have to UPDATE all your tsvector fields due to a change in tsearch2 configuration. (Yes, I had to do that because I had a slightly incorrect tsearch2 configuration for the archives search database). So don't do it. Instead use the fact that PostgreSQL has nice transactional DDL and do something like this:

BEGIN TRANSACTION;

CREATE TABLE messages_new AS SELECT id,txt,to_tsvector(txt) AS fti

  FROM messages;

CREATE INDEX messages_new_fti ON messages_new USING gin(fti);

ANALYZE messags_new;

ALTER TABLE messages RENAME TO messages_old;

ALTER TABLE messages_new RENAME TO messages;

COMMIT;

DROP TABLE messages_old;

ALTER INDEX messages_new_fti RENAME TO messages_fti;

(apologies for any typos, I didn't bother to actually type these commands into the database again, and I lost my cut-and-paste of what I ran)

This way, the messages table can still serve up searches without any disruption to the searches at all. And creating the new index is a lot faster than updating the existing one if you have to touch all rows.


Add comment

New comments can no longer be posted on this entry.

Conferences

I speak at and organize conferences around Open Source in general and PostgreSQL in particular.

Upcoming

SCaLE 2023
Mar 9-12, 2023
Pasadena, CA, USA
Nordic PGDay 2023
Mar 21, 2023
Stockholm, Sweden
pgday.Paris 2023
Mar 23, 2023
Paris, France
PGCon 2023
May 30-Jun 2, 2023
Ottawa, Canada
PGConf.EU 2023
Dec 12-15, 2023
Prague, Czechia

Past

FOSDEM 2023
Feb 2-5, 2023
Brussels, Belgium
PGConf.EU 2022
Oct 25-28, 2022
Berlin, Germany
PGConf.NYC 2022
Sep 22-23, 2022
New York, USA
Swiss PGDay 2022
Jul 1, 2022
Rapperswil, Switzerland
pgCon 2022
May 24-27, 2022
Online, Online
More past conferences