PostgreSQL：文件：16：69.3. 可擴充性

文件 → PostgreSQL 16

支援版本：目前 (16) / 15 / 14 / 13 / 12

開發版本：devel

不支援版本：11 / 10 / 9.6 / 9.5 / 9.4 / 9.3 / 9.2

69.3. 可擴充性
上一頁	向上	第 69 章 SP-GiST 索引	首頁	下一頁

69.3. 可擴充性 #

SP-GiST 提供一個高度抽象的介面，要求存取方法開發人員僅實作特定於給定資料類型的函式。SP-GiST 核心負責有效率的磁碟對應和搜尋樹狀結構。它也處理並行性和記錄考量。

SP-GiST 樹狀結構的葉元組通常包含與索引欄位相同資料類型的值，儘管它們也可能包含索引欄位的失真表示。儲存在根層級的葉元組將直接表示原始索引資料值，但較低層級的葉元組可能僅包含部分值，例如字尾。在這種情況下，運算子類別支援函式必須能夠使用從傳遞到葉層級的內部元組中累積的資訊重建原始值。

當 SP-GiST 索引使用 INCLUDE 欄位建立時，這些欄位的數值也會儲存在葉元組中。 INCLUDE 欄位與 SP-GiST 算子類別無關，因此在此不再進一步討論。

內部元組較為複雜，因為它們是搜尋樹中的分支點。每個內部元組都包含一組一個或多個節點，代表相似的葉值群組。節點包含一個下行連結，指向另一個較低層級的內部元組，或指向位於同一個索引頁面上的葉元組清單。每個節點通常都有描述其的標籤；例如，在基數樹中，節點標籤可以是字串值的下一字元。（或者，如果算子類別對所有內部元組使用一組固定的節點，則可以省略節點標籤；請參閱第 69.4.2 節。）內部元組也可以選擇性地具有描述其所有成員的 前置詞值。在基數樹中，這可以是所表示字串的共同前置詞。前置詞值不一定真的是前置詞，但可以是算子類別需要的任何資料；例如，在四叉樹中，它可以儲存四個象限相對於其測量的中心點。四叉樹內部元組也會包含對應於此中心點周圍象限的四個節點。

有些樹演算法需要知道目前元組的層級（或深度），因此 SP-GiST 核心提供算子類別在遞減樹時管理層級計數的可能性。當需要時，它也支援遞增重建所表示的值，以及在樹遞減時傳遞其他資料（稱為 橫越值）。

注意

核心程式碼 SP-GiST 會處理空值輸入。雖然 SP-GiST 索引會儲存索引欄位中的空值輸入，但這會隱藏在索引運算子類別程式碼中：不會將任何空值索引輸入或搜尋條件傳遞給運算子類別方法。（假設 SP-GiST 運算子是嚴格的，因此無法對空值成功。）因此，這裡不會進一步討論空值。

針對 SP-GiST 的索引運算子類別必須提供五種使用者定義方法，其中兩種是選用的。所有五種強制性方法都遵循慣例，接受兩個 internal 參數，第一個參數是指向包含支援方法輸入值的 C 結構的指標，而第二個參數是指向必須放置輸出值的 C 結構的指標。四種強制性方法只會傳回 void，因為其所有結果都會顯示在輸出結構中；但 leaf_consistent 會傳回 boolean 結果。這些方法不得修改其輸入結構的任何欄位。在所有情況下，在呼叫使用者定義的方法之前，輸出結構會初始化為零。選用的第六個方法 compress 接受要編入索引的 datum 作為唯一參數，並傳回適合在葉元組中實體儲存的值。選用的第七個方法 options 接受指向 C 結構的 internal 指標，其中應放置特定於運算子類別的參數，並傳回 void。

五種強制性使用者定義方法為

config

傳回有關索引實作的靜態資訊，包括前綴和節點標籤資料類型的資料類型 OID。

SQL 函數宣告必須如下所示

CREATE FUNCTION my_config(internal, internal) RETURNS void ...

第一個參數是指向包含函數輸入資料的 spgConfigIn C 結構的指標。第二個參數是指向 spgConfigOut C 結構的指標，函數必須使用結果資料填入此結構。

typedef struct spgConfigIn
{
    Oid         attType;        /* Data type to be indexed */
} spgConfigIn;

typedef struct spgConfigOut
{
    Oid         prefixType;     /* Data type of inner-tuple prefixes */
    Oid         labelType;      /* Data type of inner-tuple node labels */
    Oid         leafType;       /* Data type of leaf-tuple values */
    bool        canReturnData;  /* Opclass can reconstruct original data */
    bool        longValuesOK;   /* Opclass can cope with values > 1 page */
} spgConfigOut;

attType 會傳遞以支援多型索引運算子類別；對於一般的固定資料類型運算子類別，它總是會有相同的值，因此可以忽略。

對於不使用前綴的運算子類別，prefixType 可以設定為 VOIDOID。同樣地，對於不使用節點標籤的運算子類別，labelType 可以設定為 VOIDOID。如果運算子類別有能力重建最初提供的索引值，則應將 canReturnData 設定為 true。僅當 attType 為可變長度，且運算子類別有能力透過重複附加字尾來區隔長值時，才應將 longValuesOK 設定為 true（請參閱第 69.4.1 節）。

leafType 應與操作類別的 opckeytype 目錄項目所定義的索引儲存類型相符。（請注意，opckeytype 可能為零，表示儲存類型與操作類別的輸入類型相同，這是最常見的情況。）基於向後相容性的考量，config 方法可以將 leafType 設定為其他值，且會使用該值；但由於索引內容會在目錄中錯誤識別，因此此做法已不建議使用。此外，也可以將 leafType 保留為未初始化狀態（零）；這表示索引儲存類型是從 opckeytype 衍生的。

當 attType 和 leafType 不同時，必須提供選用方法 compress。方法 compress 負責將要編入索引的資料從 attType 轉換為 leafType。

choose

選擇一種方法，將新值插入內部元組。

SQL 函數宣告必須如下所示

CREATE FUNCTION my_choose(internal, internal) RETURNS void ...

第一個引數是指向 C 結構 spgChooseIn 的指標，其中包含函式的輸入資料。第二個引數是指向 C 結構 spgChooseOut 的指標，函式必須以結果資料填入該結構。

typedef struct spgChooseIn
{
    Datum       datum;          /* original datum to be indexed */
    Datum       leafDatum;      /* current datum to be stored at leaf */
    int         level;          /* current level (counting from zero) */

    /* Data from current inner tuple */
    bool        allTheSame;     /* tuple is marked all-the-same? */
    bool        hasPrefix;      /* tuple has a prefix? */
    Datum       prefixDatum;    /* if so, the prefix value */
    int         nNodes;         /* number of nodes in the inner tuple */
    Datum      *nodeLabels;     /* node label values (NULL if none) */
} spgChooseIn;

typedef enum spgChooseResultType
{
    spgMatchNode = 1,           /* descend into existing node */
    spgAddNode,                 /* add a node to the inner tuple */
    spgSplitTuple               /* split inner tuple (change its prefix) */
} spgChooseResultType;

typedef struct spgChooseOut
{
    spgChooseResultType resultType;     /* action code, see above */
    union
    {
        struct                  /* results for spgMatchNode */
        {
            int         nodeN;      /* descend to this node (index from 0) */
            int         levelAdd;   /* increment level by this much */
            Datum       restDatum;  /* new leaf datum */
        }           matchNode;
        struct                  /* results for spgAddNode */
        {
            Datum       nodeLabel;  /* new node's label */
            int         nodeN;      /* where to insert it (index from 0) */
        }           addNode;
        struct                  /* results for spgSplitTuple */
        {
            /* Info to form new upper-level inner tuple with one child tuple */
            bool        prefixHasPrefix;    /* tuple should have a prefix? */
            Datum       prefixPrefixDatum;  /* if so, its value */
            int         prefixNNodes;       /* number of nodes */
            Datum      *prefixNodeLabels;   /* their labels (or NULL for
                                             * no labels) */
            int         childNodeN;         /* which node gets child tuple */

            /* Info to form new lower-level inner tuple with all old nodes */
            bool        postfixHasPrefix;   /* tuple should have a prefix? */
            Datum       postfixPrefixDatum; /* if so, its value */
        }           splitTuple;
    }           result;
} spgChooseOut;

datum 是 spgConfigIn 的原始資料，attType 類型將插入索引中。 leafDatum 是 spgConfigOut 的值，leafType 類型，最初是方法 compress 套用在 datum 時的結果，如果提供方法 compress，否則與 datum 的值相同。如果 choose 或 picksplit 方法變更 leafDatum，則 leafDatum 可以在樹狀結構的較低層級變更。當插入搜尋到達葉子頁面時，leafDatum 的目前值會儲存在新建立的葉子元組中。 level 是目前內部元組的層級，從根層級的零開始。如果目前內部元組標記為包含多個等效節點，則 allTheSame 為 true（請參閱第 69.4.3 節）。如果目前內部元組包含前置詞，則 hasPrefix 為 true；如果是，prefixDatum 是其值。 nNodes 是內部元組中包含的子節點數目，nodeLabels 是其標籤值的陣列，如果沒有標籤，則為 NULL。

choose 函式可以判斷新值是否符合其中一個現有子節點，或必須新增新的子節點，或新值與元組前置詞不一致，因此必須分割內部元組以建立較不具限制性的前置詞。

如果新值符合其中一個現有子節點，請將 resultType 設定為 spgMatchNode。將 nodeN 設定為節點陣列中該節點的索引（從零開始）。將 levelAdd 設定為透過該節點遞減所造成的 level 增量，或如果操作類別未使用層級，則保留為零。將 restDatum 設定為等於 leafDatum，如果操作類別不會修改從一個層級到下一個層級的資料，否則將其設定為修改後的值，作為下一個層級的 leafDatum。

如果必須新增一個子節點，請將 resultType 設為 spgAddNode。將 nodeLabel 設為要使用的新節點標籤，並將 nodeN 設為要在節點陣列中插入節點的索引（從零開始）。新增節點後，choose 函式會再次呼叫，並提供修改過的內部元組；該呼叫應會產生 spgMatchNode 結果。

如果新值與元組前綴不一致，請將 resultType 設為 spgSplitTuple。此動作會將所有現有節點移至新的較低層級內部元組，並將現有內部元組替換為具有單一下載連結指向新的較低層級內部元組的元組。設定 prefixHasPrefix 以指出新的上層元組是否應具有前綴，如果有的話，請將 prefixPrefixDatum 設為前綴值。此新的前綴值必須比原始值寬鬆許多，才能接受要編入索引的新值。將 prefixNNodes 設為新元組中所需的節點數，並將 prefixNodeLabels 設為包含其標籤的 palloc 陣列，或在不需要節點標籤時設為 NULL。請注意，新的上層元組的總大小不得大於它所取代的元組的總大小；這會限制新的前綴和新標籤的長度。將 childNodeN 設為將下載連結至新的較低層級內部元組的節點索引（從零開始）。設定 postfixHasPrefix 以指出新的較低層級內部元組是否應具有前綴，如果有的話，請將 postfixPrefixDatum 設為前綴值。這兩個前綴和下載連結節點標籤（如果有）的組合必須與原始前綴具有相同的意義，因為沒有機會變更移至新的較低層級元組的節點標籤，也無法變更任何子索引項目。分割節點後，choose 函式會再次呼叫，並提供替換的內部元組。如果 spgSplitTuple 動作未建立合適的節點，該呼叫可能會傳回 spgAddNode 結果。最後，choose 必須傳回 spgMatchNode，以允許插入降至下一層級。

picksplit

決定如何針對一組葉元組建立新的內部元組。

SQL 函數宣告必須如下所示

CREATE FUNCTION my_picksplit(internal, internal) RETURNS void ...

第一個引數是指向 spgPickSplitIn C 結構的指標，其中包含函式的輸入資料。第二個引數是指向 spgPickSplitOut C 結構的指標，函式必須使用結果資料填入該結構。

typedef struct spgPickSplitIn
{
    int         nTuples;        /* number of leaf tuples */
    Datum      *datums;         /* their datums (array of length nTuples) */
    int         level;          /* current level (counting from zero) */
} spgPickSplitIn;

typedef struct spgPickSplitOut
{
    bool        hasPrefix;      /* new inner tuple should have a prefix? */
    Datum       prefixDatum;    /* if so, its value */

    int         nNodes;         /* number of nodes for new inner tuple */
    Datum      *nodeLabels;     /* their labels (or NULL for no labels) */

    int        *mapTuplesToNodes;   /* node index for each leaf tuple */
    Datum      *leafTupleDatums;    /* datum to store in each new leaf tuple */
} spgPickSplitOut;

nTuples 是提供的葉元組數。 datums 是其資料值陣列，類型為 spgConfigOut.leafType。 level 是所有葉元組共有的目前層級，這將成為新的內部元組的層級。

設定 hasPrefix 以指示新的內部元組是否應該有前置詞，如果有的話，設定 prefixDatum 為前置詞值。設定 nNodes 以指示新的內部元組將包含的節點數，並將 nodeLabels 設定為其標籤值的陣列，或如果不需要節點標籤，則設定為 NULL。將 mapTuplesToNodes 設定為提供每個葉元組應指派到的節點索引（從零開始）的陣列。將 leafTupleDatums 設定為要儲存在新的葉元組中的值的陣列（如果運算子類別不會修改從一個層級到下一個層級的資料，這些值將與輸入 datums 相同）。請注意，picksplit 函式負責 palloc nodeLabels、mapTuplesToNodes 和 leafTupleDatums 陣列。

如果提供多個葉元組，預期 picksplit 函式會將它們分類到多個節點中；否則無法將葉元組拆分到多個頁面中，這是此操作的最終目的。因此，如果 picksplit 函式最後將所有葉元組都放在同一個節點中，核心 SP-GiST 程式碼將覆寫該決定，並產生一個內部元組，其中葉元組會隨機指派到幾個標籤相同的節點中。此類元組標記為 allTheSame，以表示已發生這種情況。choose 和 inner_consistent 函式必須妥善處理此類內部元組。請參閱第 69.4.3 節以取得更多資訊。

picksplit 只能套用在單一葉元組上，情況是 config 函式將 longValuesOK 設定為 true，且已提供大於一頁的輸入值。在這種情況下，操作的重點是剝離前置詞，並產生新的、較短的葉資料值。呼叫將重複進行，直到產生一個足夠短以放入一頁的葉資料為止。請參閱第 69.4.1 節以取得更多資訊。

inner_consistent

傳回在樹狀搜尋期間要追蹤的節點（分支）集合。

SQL 函數宣告必須如下所示

CREATE FUNCTION my_inner_consistent(internal, internal) RETURNS void ...

第一個參數是指向 spgInnerConsistentIn C 結構的指標，其中包含函式的輸入資料。第二個參數是指向 spgInnerConsistentOut C 結構的指標，函式必須使用結果資料填滿此結構。

typedef struct spgInnerConsistentIn
{
    ScanKey     scankeys;       /* array of operators and comparison values */
    ScanKey     orderbys;       /* array of ordering operators and comparison
                                 * values */
    int         nkeys;          /* length of scankeys array */
    int         norderbys;      /* length of orderbys array */

    Datum       reconstructedValue;     /* value reconstructed at parent */
    void       *traversalValue; /* opclass-specific traverse value */
    MemoryContext traversalMemoryContext;   /* put new traverse values here */
    int         level;          /* current level (counting from zero) */
    bool        returnData;     /* original data must be returned? */

    /* Data from current inner tuple */
    bool        allTheSame;     /* tuple is marked all-the-same? */
    bool        hasPrefix;      /* tuple has a prefix? */
    Datum       prefixDatum;    /* if so, the prefix value */
    int         nNodes;         /* number of nodes in the inner tuple */
    Datum      *nodeLabels;     /* node label values (NULL if none) */
} spgInnerConsistentIn;

typedef struct spgInnerConsistentOut
{
    int         nNodes;         /* number of child nodes to be visited */
    int        *nodeNumbers;    /* their indexes in the node array */
    int        *levelAdds;      /* increment level by this much for each */
    Datum      *reconstructedValues;    /* associated reconstructed values */
    void      **traversalValues;        /* opclass-specific traverse values */
    double    **distances;              /* associated distances */
} spgInnerConsistentOut;

The array scankeys, of length nkeys, describes the index search condition(s). These conditions are combined with AND — only index entries that satisfy all of them are interesting. (Note that nkeys = 0 implies that all index entries satisfy the query.) Usually the consistent function only cares about the sk_strategy and sk_argument fields of each array entry, which respectively give the indexable operator and comparison value. In particular it is not necessary to check sk_flags to see if the comparison value is NULL, because the SP-GiST core code will filter out such conditions. The array orderbys, of length norderbys, describes ordering operators (if any) in the same manner. reconstructedValue is the value reconstructed for the parent tuple; it is (Datum) 0 at the root level or if the inner_consistent function did not provide a value at the parent level. traversalValue is a pointer to any traverse data passed down from the previous call of inner_consistent on the parent index tuple, or NULL at the root level. traversalMemoryContext is the memory context in which to store output traverse values (see below). level is the current inner tuple's level, starting at zero for the root level. returnData is true if reconstructed data is required for this query; this will only be so if the config function asserted canReturnData. allTheSame is true if the current inner tuple is marked “all-the-same”; in this case all the nodes have the same label (if any) and so either all or none of them match the query (see Section 69.4.3). hasPrefix is true if the current inner tuple contains a prefix; if so, prefixDatum is its value. nNodes is the number of child nodes contained in the inner tuple, and nodeLabels is an array of their label values, or NULL if the nodes do not have labels.

nNodes 必須設定為搜尋需要拜訪的子節點數量，而 nodeNumbers 必須設定為其索引的陣列。如果運算子類別追蹤層級，請將 levelAdds 設定為下降到每個要拜訪節點時所需的層級遞增陣列。（這些遞增通常對所有節點都相同，但並非一定如此，因此使用陣列。）如果需要重建值，請將 reconstructedValues 設定為要拜訪的每個子節點重建值的陣列；否則，請將 reconstructedValues 保留為 NULL。假設重建值為 spgConfigOut.leafType 類型。（不過，由於核心系統除了可能複製之外不會對其執行任何操作，因此它們只要具有與 leafType 相同的 typlen 和 typbyval 屬性就已足夠。）如果執行已排序搜尋，請將 distances 設定為根據 orderbys 陣列的距離值陣列（距離最短的節點將優先處理）。否則，請保留為 NULL。如果希望將其他帶外資訊（“traverse 值”）傳遞到樹狀搜尋的較低層級，請將 traversalValues 設定為適當 traverse 值的陣列，每個要拜訪的子節點一個；否則，請將 traversalValues 保留為 NULL。請注意，inner_consistent 函式負責在目前的記憶體內容中 palloc nodeNumbers、levelAdds、distances、reconstructedValues 和 traversalValues 陣列。不過，traversalValues 陣列所指出的任何輸出 traverse 值都應該在 traversalMemoryContext 中配置。每個 traverse 值都必須是單一 palloc'd 塊。

leaf_consistent

如果葉狀體元組符合查詢，則傳回 true。

SQL 函數宣告必須如下所示

CREATE FUNCTION my_leaf_consistent(internal, internal) RETURNS bool ...

第一個引數是指向 spgLeafConsistentIn C 結構的指標，其中包含函式的輸入資料。第二個引數是指向 spgLeafConsistentOut C 結構的指標，函式必須使用結果資料填入該結構。

typedef struct spgLeafConsistentIn
{
    ScanKey     scankeys;       /* array of operators and comparison values */
    ScanKey     orderbys;       /* array of ordering operators and comparison
                                 * values */
    int         nkeys;          /* length of scankeys array */
    int         norderbys;      /* length of orderbys array */

    Datum       reconstructedValue;     /* value reconstructed at parent */
    void       *traversalValue; /* opclass-specific traverse value */
    int         level;          /* current level (counting from zero) */
    bool        returnData;     /* original data must be returned? */

    Datum       leafDatum;      /* datum in leaf tuple */
} spgLeafConsistentIn;

typedef struct spgLeafConsistentOut
{
    Datum       leafValue;        /* reconstructed original data, if any */
    bool        recheck;          /* set true if operator must be rechecked */
    bool        recheckDistances; /* set true if distances must be rechecked */
    double     *distances;        /* associated distances */
} spgLeafConsistentOut;

長度為 nkeys 的陣列 scankeys 描述索引搜尋條件。這些條件與 AND 結合，只有滿足所有條件的索引項目才能滿足查詢。（請注意，nkeys = 0 表示所有索引項目都滿足查詢。）通常，一致性函數只關心每個陣列項目的 sk_strategy 和 sk_argument 欄位，分別提供可索引運算子與比較值。特別地，不需要檢查 sk_flags 來查看比較值是否為 NULL，因為 SP-GiST 核心程式碼會過濾掉此類條件。長度為 norderbys 的陣列 orderbys 以相同的方式描述排序運算子。reconstructedValue 是為父層次組重建的值；在根層次或 inner_consistent 函數未在父層次提供值時，其為 (Datum) 0。traversalValue 是指向從 inner_consistent 在父索引組上執行前一次呼叫傳遞下來的任何橫向資料的指標，或在根層次為 NULL。level 是目前葉層次組的層次，從根層次的 0 開始。returnData 為 true，如果此查詢需要重建資料；只有當 config 函數斷言 canReturnData 時，才會如此。leafDatum 是 spgConfigOut.leafType 儲存在目前葉層次組中的金鑰值。

如果葉層次組符合查詢，函數必須傳回 true，否則傳回 false。在 true 的情況下，如果 returnData 為 true，則必須將 leafValue 設定為值（類型為 spgConfigIn.attType），最初提供給此葉層次組進行索引。此外，如果配對不確定，則可以將 recheck 設定為 true，因此必須將運算子重新套用至實際的堆組以驗證配對。如果執行排序搜尋，請根據 orderbys 陣列將 distances 設定為距離值陣列。否則，讓它保持 NULL。如果傳回的距離中至少有一個不是精確的，請將 recheckDistances 設定為 true。在這種情況下，執行器會在從堆組擷取組後計算精確距離，並在需要時重新排列組。

使用者定義的選用方法為

Datum compress(Datum in)

將資料項目轉換成適合於索引葉元組中實體儲存的格式。它接受 spgConfigIn 類型的值。attType 並傳回 spgConfigOut 類型的值。leafType。輸出值不得包含非線性 TOAST 指標。

注意：compress 方法僅套用於要儲存的值。一致的方法接收查詢 scankeys 不變，未透過 compress 轉換。

選項

定義一組控制運算子類別行為的使用者可見參數。

SQL 函數宣告必須如下所示

CREATE OR REPLACE FUNCTION my_options(internal)
RETURNS void
AS 'MODULE_PATHNAME'
LANGUAGE C STRICT;

這個函式傳遞指標給 local_relopts 結構，需要填入一組運算子類別特定的選項。可以使用 PG_HAS_OPCLASS_OPTIONS() 和 PG_GET_OPCLASS_OPTIONS() 巨集從其他支援函式存取這些選項。

由於 SP-GiST 中鍵的表示方式很靈活，因此它可能取決於使用者指定的參數。

所有 SP-GiST 支援方法通常在短暫的記憶體內容中被呼叫；也就是說，CurrentMemoryContext 會在處理每個元組後重設。因此，不必太擔心 pfree 掉您 palloc 的所有內容。（config 方法為例外：它應該避免記憶體外洩。但通常 config 方法不需要做任何事，只要將常數指定到傳遞的參數結構中即可。）

如果索引欄位為可整理資料類型，索引整理會使用標準 PG_GET_COLLATION() 機制傳遞給所有支援方法。

上一頁	向上	下一頁
69.2. 內建運算子類別	首頁	69.4. 實作

提交修正

如果您在文件檔中看到任何不正確、與您對特定功能的體驗不符，或需要進一步說明的內容，請使用此表單回報文件檔問題。