Index
Introduction
If there are lots of row on database table, it is time-wasting to get the value from the row, and scan the value of row one by one
Index is set on primary key by default
When user needs to search the result frequency by entering some parameters or fields, we need to create the index that we will searching on
Primary(Clustered) Index
Introduction
A
clustered index
is collocated with the data in the same table space or same disk file. It will be created when primary key is created on the columnData blocks are all the time moved around here & there by the OS whenever it’s necessary. A database system does not have any absolute control over how physical data space is managed, but inside a data block, records can be stored or managed in the logical order of the index key.
Don’t need to care about actually organizing the physical record in a certain order, rather a small index section is maintained in that order & fetching or maintaining records becomes very easy.
Structure
An index is usually maintained as a B+ Tree on disk & in-memory, and any index is stored in blocks on disk.
The leaf index block of the index contains a row locator (A pointer to the block). For the primary index, the row locator refers to virtual address of the corresponding physical location of the data blocks on disk where rows reside being sorted as per the index key.
Secondary Index
Introduction
Any index other than a clustered index is called a secondary index. Secondary indices does not impact physical storage locations unlike primary indices.
Secondary indices are needed on these columns if the frequency of such queries is very high.
Structure
Secondary index is also maintained in the B+ Tree and it’s sorted as per the key on which the index was created. The leaf nodes contain a copy of the key of the corresponding data in the primary index.
Retrieving data through the secondary index means you have to traverse two B+ trees — one is the secondary index B+ tree itself, and the other is the primary index B+ tree. The secondary index points to primary index
if you delete a primary index, all secondary indices have to be updated to contain a copy of the new primary index key
Unique Key Index
CREATE UNIQUE INDEX unique_idx_1 ON index_demo (pan_no);
Like primary keys, unique keys can also identify records uniquely with one difference — the unique key column can contain
null
values.
Composite Index
CREATE INDEX composite_index_1 ON index_demo (phone_no, name, age);
CREATE INDEX composite_index_2 ON index_demo (pan_no, name, age);
Let’s say we have an index defined on 4 columns —
col1
,col2
,col3
,col4
. With a composite index, we have search capability oncol1
,(col1, col2)
,(col1, col2, col3)
,(col1, col2, col3, col4)
we can’t omit a column from the middle & use that like —
(col1, col3)
or(col1, col2, col4)
orcol3
orcol4
etcWhile deciding the columns for a composite index, you can analyze different use cases of your system & try to come up with the order of columns that will benefit most of your use cases.
Partial Index
CREATE INDEX secondary_index_1 ON index_demo (name(4));
The column
name
can contain large values of any length. Also in the index, the row locators’ or row pointers’ metadata have their own size.it’s possible to create an index on the first few bytes of data as well. Example: the following command creates an index on the first 4 bytes of name. Though this method reduces memory overhead by a certain amount
Cardinality
It refers that the uniqueness of the result get by searching by the index. Fewer results get by searching the index and the bigger difference between each result means higher cardinality.
Setting low cardinality index, such as gender will decrease the performance of the query, as from the concept of b-tree geometry, index containing numbers of value will damage the performance, and the time complexity will be near to O(n)
At the opposite side, setting high cardinality index, the time complexity will be near to O(1), as the index is nearly unique.
The query optimizer will decide whether using index or not on searching based on cardinality
Only one index per table per query will be used by optimizer based on cardinality
Advantages
Help us to increase the speed of searching data
With index Scan
Index Scan:
Index Block → Specific Data Block → Row
[Only reads relevant blocks]
Without index scan
Sequential Scan:
Block 1 → Block 2 → Block 3 → Block 4 → Block 5
[Check every row in each block]
Drawbacks
With
DML
operations, indices are updated, so write operations are quite costly with indexes. The more indices you have, the greater the cost. Indexes are used to make read operations faster.indices consume extra memory
Index algo
B-tree
CREATE TABLE users (
id SERIAL PRIMARY KEY,
email TEXT
);
CREATE INDEX idx_email ON users(email);
INSERT INTO users (email) VALUES
('alice@mail.com'), -- id=1
('bob@mail.com'), -- id=2
('charlie@mail.com'), -- id=3
('david@mail.com'); -- id=4
Select * from "users" where "email" = 'bob@mail.com';
Initial structure (after first two inserts):
'bob@mail.com'
/
'alice@mail.com'
After third insert:
'bob@mail.com'
/ \
'alice@mail.com' 'charlie@mail.com'
After fourth insert (rebalancing):
'charlie@mail.com' -- Becomes root for better balance
/ \
'bob@mail.com' 'david@mail.com'
/
'alice@mail.com'
The B-tree traversal:
Start at root ('charlie@mail.com')
'bob' < 'charlie' → go left
Find 'bob@mail.com' → get row data
B-tree is better for finding exact values or ranges (The equality)
Gin
CREATE TABLE products (
id SERIAL PRIMARY KEY,
tags TEXT[]
);
CREATE INDEX idx_tags ON products USING GIN (tags);
INSERT INTO products (tags) VALUES
(ARRAY['electronics', 'laptop']),
(ARRAY['electronics', 'phone']),
(ARRAY['clothing', 'shirt']);
-- Find products with 'electronics' tag
SELECT * FROM products WHERE tags @> ARRAY['electronics']::TEXT[];
Entry Tree:
'clothing' → [Row3]
'electronics'→ [Row1, Row2]
'laptop' → [Row1]
'phone' → [Row2]
'shirt' → [Row3]
Each entry points to multiple rows:
'electronics' points to:
- Row(id=1, tags=['electronics', 'laptop'])
- Row(id=2, tags=['electronics', 'phone'])
The GIN traversal:
Look up 'electronics' in entry tree
Get list of matching rows [Row1, Row2]
Return those rows
Good for array column and find for containment
References
Last updated
Was this helpful?