Partition Table
/pɑːrˈtɪʃən ˈteɪbəl/
noun — "map of disk partitions for storage management."
Partition Table is a data structure on a storage device that defines the organization and layout of disk partitions, specifying where each partition begins and ends, its type, and other attributes. It serves as the roadmap for the operating system and firmware to locate and access volumes, enabling multiple filesystems or operating systems to coexist on a single physical disk.
Technically, partition tables exist in different formats depending on the disk partitioning scheme. In legacy systems, the MBR partition table uses 64 bytes to define up to four primary partitions; each 16-byte entry records the partition's starting sector, size in sectors, type code, and bootable flag. Modern systems often employ the GUID Partition Table (GPT), which supports much larger disks, a default of 128 partitions, globally unique identifiers (GUIDs), and CRC32 checksums for improved reliability.
The structure of a partition table typically includes:
- Partition Entries: Define the start and end sectors, type, and attributes for each partition.
- Boot Flags: Indicate which partition is active or bootable.
- Checksums (GPT only): Ensure the integrity of partition metadata and headers.
- Backup Table (GPT only): Located at the end of the disk to enable recovery in case of corruption.
In operational workflow, the system firmware or operating system reads the partition table during startup or disk mounting. The firmware uses it to locate bootable partitions and transfer control to the volume boot record. The operating system uses the table to enumerate available partitions, mount filesystems, and allocate storage for files and applications. Without an accurate partition table, the disk appears uninitialized or inaccessible.
A practical example for reading MBR partition table entries, sketched here in runnable Python, might be:
import struct

# Read the four 16-byte partition entries that start at offset 0x1BE
with open("disk.img", "rb") as disk:
    disk.seek(0x1BE)
    table = disk.read(64)

for i in range(4):
    entry = table[i * 16:(i + 1) * 16]
    start_sector = struct.unpack_from("<I", entry, 8)[0]    # starting LBA
    size = struct.unpack_from("<I", entry, 12)[0]           # length in sectors
    part_type = entry[4]                                     # partition type ID
    print("Partition: start=", start_sector, "size=", size, "type=", hex(part_type))
Conceptually, a partition table functions like a directory index for a multi-story building: it tells the system which rooms (partitions) exist, their locations, and how to navigate them efficiently. It enables structured access to storage while supporting multiple operating systems and data management schemes on the same physical device.
GUID Partition Table
/ɡaɪd pɑːrˈtɪʃən ˈteɪbəl/
noun — "modern disk partitioning standard with large capacity support."
GUID Partition Table, often abbreviated GPT, is a modern partitioning scheme for storage devices that overcomes the limitations of the legacy MBR system. It supports disks larger than 2 TB, allows far more partitions than MBR (128 entries by default), and includes redundancy and checksums to improve data integrity. GPT is defined as part of the UEFI (Unified Extensible Firmware Interface) standard and is the default partitioning scheme on contemporary UEFI-based systems.
Technically, a GUID Partition Table stores partition information in a globally unique identifier (GUID) format. Each partition has a unique 128-bit GUID, a starting and ending LBA (Logical Block Address), a partition type GUID, and attribute flags. GPT structures also include a protective MBR at the first sector to prevent legacy tools from misidentifying the disk as unpartitioned.
A GPT disk layout typically consists of:
- Protective MBR: The first sector contains a standard MBR with a single partition entry spanning the entire disk, safeguarding GPT data from legacy tools.
- Primary GPT Header: Located at LBA 1, it contains the size and location of the partition table, disk GUID, and CRC32 checksum for header validation.
- Partition Entries: Immediately following the primary header, an array of partition entries (default 128) stores GUIDs, start/end LBAs, and attributes.
- Backup GPT Header and Partition Table: Located at the end of the disk, ensuring recoverability if the primary structures are corrupted.
Workflow example: when a system boots or mounts a GPT disk, the firmware or operating system reads the primary GPT header to locate the partition table. Each partition is identified via its GUID, and the OS uses this information to mount filesystems or prepare volumes for use. In case of corruption, the backup GPT header at the disk’s end can restore partition information, providing resilience absent in traditional MBR disks.
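As a minimal sketch of that first step, the following Python reads the primary GPT header at LBA 1 of a raw disk image and extracts the fields needed to locate the partition entry array (the file name disk.img and a 512-byte sector size are assumptions for illustration):
import struct

SECTOR = 512                             # assumed logical sector size

with open("disk.img", "rb") as disk:
    disk.seek(1 * SECTOR)                # primary GPT header resides at LBA 1
    header = disk.read(92)               # the defined header fields occupy 92 bytes

signature   = header[0:8]                                   # b"EFI PART" on a valid GPT disk
entries_lba = struct.unpack_from("<Q", header, 72)[0]       # LBA where the entry array begins
num_entries = struct.unpack_from("<I", header, 80)[0]       # typically 128
entry_size  = struct.unpack_from("<I", header, 84)[0]       # typically 128 bytes per entry

if signature == b"EFI PART":
    print("GPT header OK:", num_entries, "entries of", entry_size, "bytes at LBA", entries_lba)
else:
    print("Not a GPT disk, or the primary header is damaged")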
Practical usage includes modern operating systems requiring large disks, multi-boot configurations, and environments needing improved partition integrity checks. GPT enables flexible partitioning schemes for servers, workstations, and personal computers, while supporting advanced features like EFI system partitions and hybrid MBR/GPT setups for backward compatibility.
Conceptually, a GUID Partition Table is like a meticulously labeled map of a library: each section (partition) has a unique identifier, boundaries are precisely defined, and backup copies exist to prevent loss, ensuring efficient and reliable access to stored information.
See MBR, UEFI, Disk Partitioning.
Master Boot Record
/ˌɛm biː ˈɑːr/
noun — "first sector of a storage device containing boot information."
MBR, short for Master Boot Record, is the first sector of a storage device, such as a hard disk or solid-state drive, that contains essential information for bootstrapping an operating system and managing disk partitions. It occupies the first 512 bytes of the device and serves as a foundational structure for legacy BIOS-based systems, providing both executable boot code and a partition table.
Technically, the MBR is divided into three primary components:
- Boot Code: The first 446 bytes store executable machine code that the BIOS executes during system startup. This code locates an active partition and transfers control to its volume boot record, initiating the operating system boot process.
- Partition Table: The next 64 bytes contain up to four partition entries, each specifying the start sector, size, type, and bootable status of a partition. This defines the logical layout of the disk for the operating system and bootloader.
- Boot Signature: The final 2 bytes hold the value 0x55AA (byte 0x55 followed by 0xAA), signaling to the BIOS that the sector contains a valid bootable MBR.
In workflow terms, when a BIOS-based computer powers on, the system firmware reads the MBR from the first sector of the storage device. The boot code executes, scans the partition table for the active partition, and jumps to the partition’s volume boot record. This process transfers control to the operating system loader, ultimately starting the OS.
A minimal illustration of an MBR structure:
+------------------------+
| Boot Code (446 bytes) |
+------------------------+
| Partition Table (64 B) |
| - Partition 1 |
| - Partition 2 |
| - Partition 3 |
| - Partition 4 |
+------------------------+
| Boot Signature (2 B) |
+------------------------+
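As a minimal sketch of this layout, the following Python slices the three regions out of the first sector of a raw disk image (the file name disk.img is an assumption for illustration):
with open("disk.img", "rb") as disk:
    sector = disk.read(512)              # the MBR occupies the first 512-byte sector

boot_code       = sector[0:446]          # executable boot code
partition_table = sector[446:510]        # four 16-byte partition entries
signature       = sector[510:512]        # boot signature

if signature == b"\x55\xAA":
    print("Valid MBR boot signature")
else:
    print("Missing or invalid boot signature")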
The MBR has limitations, such as supporting only four primary partitions and disks up to 2 TB in size. Modern systems often use the GUID Partition Table to overcome these constraints, offering more partitions and larger disk support while retaining backward compatibility in some cases.
Conceptually, the MBR acts like a table of contents and starting key for a book: it tells the system where each chapter (partition) begins and provides the initial instructions to start reading (boot code), enabling the system to access and load the operating system efficiently.
Master File Table
/ˈmɑːstər faɪl ˈteɪbəl/
noun — "central table of file metadata."
Master File Table (MFT) is the core metadata repository used by the NTFS file system to manage all files and directories on a storage volume. Each file, directory, or system object has a corresponding MFT entry containing critical information, including file name, security descriptors, timestamps, attributes, data location pointers, and size. The MFT allows NTFS to efficiently locate, read, and modify files while maintaining data integrity, access control, and journaling consistency.
Technically, an MFT entry is a structured record typically 1 kilobyte in size. Small files may store their contents directly within the MFT entry, called resident files, while larger files use non-resident entries that point to clusters on the disk where the actual data resides. Attributes include standard information (timestamps, permissions), filename attributes (long names, multiple names), security descriptors for access control, and optional extended attributes. NTFS also maintains special MFT entries for the volume itself, the bitmap tracking free clusters, and the log file for journaling purposes.
In workflow terms, consider creating a document in Windows. NTFS allocates an MFT entry for the file, updates the directory listing, and records the allocated clusters in the entry. Reading the file involves querying the MFT to locate its clusters and applying access controls before retrieving data. Operations such as moving, renaming, or deleting a file are performed by updating the MFT entries, minimizing disk movement and supporting fast metadata access.
A simplified pseudocode example of MFT interaction:
// Pseudocode illustrating MFT entry lookup
entry = MFT.getEntry("C:\\Documents\\report.txt")
if entry.exists():
    data = readClusters(entry.clusterPointers)
    print(data)
MFT also interacts closely with the NTFS journaling system. Changes to entries are logged before committing to the disk to enable recovery in case of system crashes, ensuring ACID-like durability for metadata. Because the MFT is central to all file operations, it is typically stored in a reserved area at the beginning of the volume and may include a mirror copy for redundancy.
Conceptually, the Master File Table is like a library’s main card catalog: each card (MFT entry) contains not just the book’s title but its physical shelf location, borrowing rules, and historical notes, allowing librarians to quickly locate, track, and manage every book efficiently.
See NTFS, Journaling, FileSystem.
B+tree
/biː-plʌs-triː/
noun — "optimized B-tree variant for database indexing."
B+ tree is an extension of the B-tree data structure designed to optimize range queries, sequential access, and storage utilization in database systems and file systems. Unlike standard B-trees, all actual data records in a B+ tree reside in the leaf nodes, while internal nodes store only keys for routing searches. Leaf nodes are linked sequentially, forming a linked list that enables efficient in-order traversal and range scans, which are critical for queries that retrieve multiple contiguous records.
Technically, a B+ tree of order m has internal nodes containing up to m–1 keys and m child pointers, just like a standard B-tree. Leaf nodes contain the actual data entries along with pointers to the next leaf node, providing a natural sequence for range-based operations. When insertion causes a leaf or internal node to overflow, the node splits, and the median key is promoted to the parent, maintaining balance and logarithmic search performance. Deletion may trigger node merging or redistribution to prevent underflow while preserving the tree’s structure.
In workflow terms, consider a relational database storing transaction records indexed by transaction_date. A query requesting all transactions between two dates benefits from the B+ tree structure: the search quickly navigates to the starting leaf node and then follows the linked leaf nodes sequentially, retrieving all matching records efficiently without additional tree traversal.
A simplified code example of a B+ tree leaf traversal for a range query in pseudocode:
function rangeQueryBPlusTree(startKey, endKey):
    node = searchLeaf(root, startKey)
    results = []
    while node is not null and node.keys[0] <= endKey:
        for i in 0..node.numKeys-1:
            if startKey <= node.keys[i] <= endKey:
                results.append(node.values[i])
        node = node.nextLeaf
    return results
This demonstrates how the sequential leaf linking allows contiguous data retrieval efficiently, a major advantage over standard B-trees for range queries and full scans.
B+ trees are widely used in modern databases (MySQL InnoDB, PostgreSQL), file systems (NTFS, ext4), and key-value stores, providing predictable O(log n) search, insertion, and deletion times while enabling high-performance sequential and range operations. The separation of routing keys from data entries, together with the linked leaf nodes, makes them particularly suitable for disk-based storage, where minimizing disk I/O is critical.
Conceptually, a B+ tree is like a multi-tiered filing system where each internal folder contains only labels directing you to subfolders, and all actual documents are stored in sequentially linked filing cabinets at the bottom level, making both precise lookups and batch reads efficient.
B-tree
/biː-triː/
noun — "balanced tree for efficient data retrieval."
B-tree is a self-balancing tree data structure commonly used in databases and file systems to maintain sorted data and enable efficient insertion, deletion, and search operations. It is designed to minimize disk access and optimize storage of large datasets by keeping nodes partially filled, thereby reducing the height of the tree and the number of I/O operations required to locate an element. B-trees are a cornerstone in indexing, providing logarithmic-time complexity for lookups, inserts, and deletions even in massive datasets.
Technically, a B-tree of order m has nodes that may contain up to m–1 keys and m children. All leaf nodes reside at the same depth, ensuring balanced structure. Internal nodes store keys that act as separation values, guiding searches toward the correct subtree. When a node exceeds its capacity, it splits, propagating keys upward; conversely, underflow during deletion may trigger merging or redistribution to maintain balance. This structure allows B-trees to handle dynamic datasets efficiently, making them ideal for database indexes and file system directories where read/write operations must be optimized.
In workflow terms, consider a relational database using a B-tree to index a customer table on the customer_id column. When a new customer is added, the B-tree ensures the customer_id is inserted at the correct position while maintaining balanced nodes. When querying a customer, the tree guides the search through internal nodes, locating the record with minimal disk accesses, even if the table contains millions of entries.
For a simplified code example demonstrating a B-tree search in pseudocode:
function searchBTree(node, key):
    if node is null:
        return null
    for i in 0..node.numKeys-1:
        if key == node.keys[i]:
            return node.values[i]
        else if key < node.keys[i]:
            return searchBTree(node.children[i], key)
    return searchBTree(node.children[node.numKeys], key)
This illustrates how a B-tree navigates through internal nodes and children to locate the desired key efficiently without scanning the entire dataset.
Advanced variations include B+ trees, which store all actual data in leaf nodes and maintain internal nodes as keys for routing, and B* trees, which optimize node utilization and reduce splits. B-trees and their derivatives underpin database indexing strategies, file systems like NTFS and ext4, and key-value storage engines, enabling high-performance retrieval and updates on large datasets.
Conceptually, a B-tree is like a multi-level library index: each shelf lists references that direct the reader to the next level, ultimately reaching the exact book with minimal walking between shelves.
Index
/ˈɪn.deks/
noun — "data structure for fast lookup."
Index is a specialized data structure used in computing and database systems to improve the speed and efficiency of data retrieval operations. It functions as a roadmap or table of contents, allowing a system to quickly locate the position of a desired item without scanning the entire dataset. Indexes are essential in relational and non-relational databases, search engines, file systems, and large-scale storage systems, where rapid access to specific records is critical.
Technically, an index stores key-value pairs or references that map a search key (such as a column value) to the physical location of the corresponding data. Common implementations include B-trees, B+ trees, hash tables, inverted indexes, and bitmaps, each optimized for different query types, data distributions, and performance characteristics. Indexes may be clustered, where the data rows are physically ordered according to the index, or non-clustered, where the index maintains separate pointers to the data. Multiple indexes can coexist on a dataset to support diverse access patterns.
In workflow terms, consider a relational database storing millions of customer records. Without an index on the email field, a query searching for a specific email would require a full table scan, checking each row sequentially. By creating an index on the email column, the database engine can quickly locate the desired row using the index structure, dramatically reducing query latency. Similarly, search engines build inverted indexes that map keywords to document locations, enabling rapid retrieval of relevant pages in response to user queries.
For a concrete example, a simple index on an array of integers in code could be represented as:
data = [34, 7, 23, 32, 5, 62]
index = { value: position for position, value in enumerate(data) }
# index lookup
position_of_23 = index[23] # returns 2
This demonstrates how an index allows immediate access to the position of a value without scanning the entire array.
Indexes also store auxiliary information, such as minimum and maximum values, counts, or aggregated statistics, to accelerate query operations. Maintaining indexes incurs storage and update overhead, as each insertion, deletion, or modification of data requires updating the corresponding indexes. Database designers balance read performance against write overhead, selecting indexes carefully based on workload patterns.
Advanced indexing strategies include partial indexes, covering indexes, multi-column indexes, and spatial indexes for specialized data types like geolocation coordinates. In distributed databases and data warehouses, indexes support query planners in generating efficient execution strategies, while replication and partitioning ensure durability and availability.
Conceptually, an index is like a library catalog: instead of scanning every book on the shelves, a reader consults the catalog to immediately locate the desired book by author, title, or subject, enabling rapid and precise access to information.
Queue
/kjuː/
noun — "ordered collection for sequential processing."
Queue is an abstract data structure that stores a sequence of elements in a specific order for processing. The most common ordering principle is FIFO (First In, First Out), though variations like priority queues may alter the processing sequence. A queue ensures that elements are handled systematically, supporting predictable workflows and task management in computing systems.
Technically, a queue supports at least two core operations: enqueue, which adds an element to the back of the queue, and dequeue, which removes an element from the front. Additional operations may include peeking at the front element without removing it, checking size, or verifying emptiness. Queues are implemented using arrays, linked lists, or ring buffers, and are widely used in operating system scheduling, network packet management, and asynchronous task handling.
In workflow terms, a print server maintains a queue of print jobs: documents submitted first are printed first, ensuring fairness and order. In network systems, packets entering a router may be queued for processing to prevent congestion and maintain sequence integrity.
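A minimal sketch of these operations, using Python's collections.deque and invented print job names, might look like this:
from collections import deque

print_jobs = deque()                     # FIFO queue of pending jobs
print_jobs.append("report.pdf")          # enqueue at the back
print_jobs.append("invoice.pdf")
print_jobs.append("photo.png")

while print_jobs:
    job = print_jobs.popleft()           # dequeue from the front: oldest job first
    print("printing", job)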
Conceptually, a queue is like a line of people waiting at a service counter: each person is served in the order they arrived, maintaining orderly and predictable progression.
First In, First Out
/ˈfaɪ.foʊ/
noun — "first item in, first item out."
FIFO, short for First In, First Out, is a data handling or storage method in which the earliest added item is the first to be removed. This ordering principle is widely used in queues, memory buffers, and inventory accounting, ensuring that items are processed in the same order they were received.
Technically, a FIFO queue supports two primary operations: enqueue (adding an item to the back) and dequeue (removing the item from the front). This ordering guarantees that elements are processed sequentially and no item is skipped or reordered. In computing, FIFO structures are used for task scheduling, buffering in I/O operations, and inter-process communication.
In workflow terms, consider a line of customers at a checkout counter: the first person to arrive is the first served. In computing, network packets may be queued in a FIFO buffer so that the oldest packet is transmitted first, preventing starvation of early data.
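A minimal sketch of such a packet buffer, using Python's collections.deque with invented packet identifiers, might be:
from collections import deque

packet_buffer = deque()
for packet_id in (101, 102, 103):        # packets arrive in this order
    packet_buffer.append(packet_id)      # enqueue at the back

while packet_buffer:
    packet = packet_buffer.popleft()     # dequeue: the oldest packet is transmitted first
    print("transmitting packet", packet)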
Conceptually, FIFO acts like a conveyor belt: items enter at one end and exit in the exact order they arrived, preserving temporal sequence and fairness.
Last In, First Out
/ˈlaɪ.foʊ/
noun — "last item in, first item out."
LIFO, short for Last In, First Out, is a data handling or storage method in which the most recently added item is the first to be removed. This ordering principle is used in stacks, memory management, and certain inventory accounting practices, ensuring that the latest entries are processed before earlier ones.
Technically, a LIFO stack supports two primary operations: push (adding an item to the top) and pop (removing the item from the top). No element below the top can be removed until the top element is processed, preserving the strict ordering. In programming, stacks implemented with arrays or linked lists commonly use this principle for function call management, expression evaluation, and undo operations.
In workflow terms, consider a stack of plates: the last plate placed on top is the first one you remove. In computing, when a function calls another function, the return address and local variables are stored on the call stack using LIFO, ensuring proper execution flow and return sequencing.
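A minimal sketch of that stack behavior, using a plain Python list as the stack and invented action names for an undo history, might be:
undo_history = []                         # a Python list used as a LIFO stack
undo_history.append("typed 'hello'")      # push
undo_history.append("applied bold")       # push
undo_history.append("inserted image")     # push

while undo_history:
    action = undo_history.pop()           # pop: the most recent action is undone first
    print("undo:", action)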
Conceptually, LIFO acts like a stack of boxes: you can only remove the one on top, leaving the earlier ones untouched until the top layers are cleared.