File Allocation Table 16

/ˌfæt ˈsɪksˌtiːn/

noun — "legacy File Allocation Table filesystem."

FAT16, short for File Allocation Table 16, is a legacy filesystem that organizes data on block-based storage devices using a 16-bit cluster addressing scheme. It was widely used in early personal computers and embedded devices due to its simplicity, low overhead, and compatibility across operating systems and firmware environments.

Technically, FAT16 divides a storage volume into fixed-size clusters, each representing a number of logical blocks. The core structure is the File Allocation Table, which tracks the allocation of clusters. Each 16-bit entry in the table either points to the next cluster in a file chain, marks the end-of-file, or indicates a free cluster. This linear mapping allows software to access files without knowledge of the physical layout of the disk.

The typical layout of a FAT16 volume includes a reserved area at the beginning containing boot and filesystem metadata, followed by one or more copies of the File Allocation Table, and then a data region containing directories and file contents. Directory entries store filenames, timestamps, attributes, and starting cluster references for files. The simplicity of this design ensures broad compatibility but lacks advanced features such as journaling, access control, or large file support.

Operationally, writing a file on FAT16 involves allocating free clusters, updating the File Allocation Table to form a chain, and writing data into the clusters. Reading a file requires following this chain sequentially. Deleting a file marks clusters as free but does not erase the content immediately, which allows recovery but introduces security considerations. Performance can degrade on large volumes due to linear table searches and fragmented cluster chains.

Constraints of FAT16 include a maximum file size of 2 gigabytes minus 1 byte and a volume size limit of roughly 2 gigabytes with standard cluster sizes, extendable to about 4 gigabytes only with 64-kilobyte clusters, which not all operating systems support. These limits stem from the 16-bit cluster addressing in the File Allocation Table, which restricts the number of addressable clusters. Partitioning schemes and operating systems must account for these limitations when using FAT16.

In practice, FAT16 was commonly used for floppy disks, early hard drives, and memory cards. It allowed multiple operating systems to access the same volume, making it ideal for cross-platform file sharing and boot media. File system drivers map logical file offsets to cluster chains, which then translate to physical addresses using LBA. Partitioning defines which clusters belong to which volume, isolating different datasets.

Example of cluster chaining in FAT16:


File start cluster: 3
FAT[3]  = 7
FAT[7]  = 9
FAT[9]  = EOF

This indicates the file occupies clusters 3, 7, and 9, which may not be physically contiguous but are logically linked via the File Allocation Table.
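
A minimal sketch in C++ of following such a chain in an in-memory copy of the table (the table contents below mirror the example above; on a real volume the table is read from disk and cluster numbers are mapped to sectors using the boot-sector metadata):

// Sketch: follow a FAT16 cluster chain held in memory.
#include <cstdint>
#include <cstdio>
#include <vector>

std::vector<uint16_t> cluster_chain(const std::vector<uint16_t>& fat, uint16_t start) {
    std::vector<uint16_t> chain;
    uint16_t cluster = start;
    while (cluster >= 0x0002 && cluster <= 0xFFEF) {   // valid data clusters; 0xFFF8-0xFFFF mark end of chain
        chain.push_back(cluster);
        cluster = fat[cluster];
    }
    return chain;
}

int main() {
    std::vector<uint16_t> fat(16, 0x0000);             // tiny illustrative table, all clusters free
    fat[3] = 7; fat[7] = 9; fat[9] = 0xFFFF;           // the chain 3 -> 7 -> 9 -> EOF
    for (uint16_t c : cluster_chain(fat, 3)) std::printf("%u ", c);   // prints: 3 7 9
    return 0;
}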

Conceptually, FAT16 works like a simple ledger recording which storage boxes belong to which item. Its simplicity allows widespread use, but as storage capacity increases, the ledger becomes inadequate for modern needs.

See FAT32, FileSystem, LBA, Disk Partitioning.

File Allocation Table 32

/ˌfæt θɜːrtiˈtuː/

noun — "widely compatible file allocation table filesystem."

FAT32, short for File Allocation Table 32, is a disk filesystem designed to organize, store, and retrieve files on block-based storage devices using a table-driven allocation model. It represents an evolution of earlier FAT variants and is defined by its use of 32-bit cluster addressing, allowing larger volumes and files than its predecessors while maintaining broad hardware and software compatibility.

Technically, FAT32 structures a storage volume into fixed-size allocation units called clusters. Each cluster consists of one or more logical blocks addressed using LBA (Logical Block Addressing). The core data structure is the File Allocation Table itself, which maps each cluster to either the next cluster in a file chain, an end-of-file marker, or a free-space indicator. This table allows the filesystem to track how files are physically laid out across non-contiguous regions of disk.

The filesystem layout of FAT32 includes several well-defined regions. A reserved area at the beginning of the volume contains boot and filesystem metadata. Following this is one or more copies of the File Allocation Table for redundancy. The data region occupies the remainder of the disk and contains directory entries and file contents stored as chains of clusters. Directory entries hold metadata such as filenames, timestamps, attributes, and the starting cluster of each file.

One defining characteristic of FAT32 is its simplicity. The filesystem does not implement journaling, access control lists, or advanced metadata structures. This design minimizes overhead and makes implementation straightforward, which is why FAT32 is supported by firmware, operating systems, and embedded devices across decades of hardware evolution. However, this simplicity also means reduced resilience against unexpected power loss or corruption.

Operationally, when a file is written, the filesystem allocates free clusters and records their sequence in the File Allocation Table. Reading the file requires following this cluster chain from start to end. Deleting a file marks its clusters as available but does not immediately erase the data, which has implications for data recovery and security. Allocation and lookup operations are linear in nature, which can affect performance on very large volumes.

There are important technical constraints associated with FAT32. Individual files are limited to a maximum size of 4 gigabytes minus 1 byte. Volume size is bounded by cluster size and addressable cluster count, with practical limits typically around 2 terabytes depending on implementation. These limits stem from the filesystem’s on-disk structures and addressing model rather than from storage hardware capabilities.

In real-world workflows, FAT32 is commonly used for removable media such as USB flash drives, memory cards, and external storage intended for cross-platform use. Operating systems map file offsets to cluster chains, convert those to logical block addresses, and issue read or write requests through storage drivers. Firmware environments, including bootloaders and system initialization code, often rely on FAT32 because of its predictable structure and minimal requirements.
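
As a rough sketch of that mapping, assuming an in-memory copy of the FAT, 512-byte sectors, and illustrative field names rather than any real driver's structures:

// Sketch: map a byte offset within a file to a logical block address on a FAT32 volume.
#include <cstdint>
#include <vector>

struct Fat32Volume {
    uint32_t bytes_per_cluster;     // cluster size in bytes
    uint32_t sectors_per_cluster;
    uint64_t data_start_lba;        // first sector of the data region (cluster 2)
    std::vector<uint32_t> fat;      // in-memory copy of the File Allocation Table
};

uint64_t offset_to_lba(const Fat32Volume& v, uint32_t first_cluster, uint64_t offset) {
    uint32_t cluster = first_cluster;
    for (uint64_t skip = offset / v.bytes_per_cluster; skip > 0; --skip)
        cluster = v.fat[cluster] & 0x0FFFFFFF;          // only the low 28 bits of each entry are used
    uint64_t cluster_lba = v.data_start_lba + (uint64_t)(cluster - 2) * v.sectors_per_cluster;
    return cluster_lba + (offset % v.bytes_per_cluster) / 512;   // assumes 512-byte sectors
}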

FAT32 interacts closely with other system layers. Disk partitioning schemes define the logical block ranges that contain the filesystem. Firmware such as BIOS and UEFI can parse FAT32 volumes directly to locate boot files. Operating systems expose the filesystem through standard file APIs while internally managing allocation, caching, and consistency. Despite its age, FAT32 remains relevant due to this deep integration.

The following simplified conceptual example illustrates cluster chaining in FAT32:


File start cluster: 5
FAT[5]  = 8
FAT[8]  = 12
FAT[12] = EOF

This chain indicates that the file occupies clusters 5, 8, and 12 in sequence, even if those clusters are physically scattered across the disk.

Conceptually, FAT32 behaves like a handwritten index at the front of a notebook that lists which pages belong to each topic. The index is easy to read and update, works in many contexts, and requires no specialized tools, but it becomes inefficient and fragile as the notebook grows larger and more complex.

See FileSystem, LBA, Disk Partitioning, NTFS.

FileSystem

/ˈfaɪl ˌsɪstəm/

noun — "organizes storage for data access."

FileSystem is a software and data structure layer that manages how data is stored, retrieved, and organized on storage devices such as hard drives, SSDs, or networked storage. It provides a logical interface for users and applications to interact with files and directories while translating these operations into the physical layout on the storage medium. A file system determines how files are named, how metadata is maintained, how storage space is allocated, and how access permissions are enforced.

Technically, a FileSystem maintains hierarchical structures, commonly directories and subdirectories, with files as leaf nodes. Metadata such as file size, timestamps, permissions, and pointers to physical storage locations are stored in tables, nodes, or inodes depending on the file system design. Common file system types include FAT, FAT32, NTFS, ext4, HFS+, APFS, and XFS, each with optimizations for performance, reliability, concurrency, and scalability. Many file systems implement journaling or transaction logging to protect against corruption from crashes or power failures.

In workflow terms, consider creating a document on a computer. The operating system requests the FileSystem to allocate storage clusters or blocks, update metadata records, and maintain the directory entry. When reading the file, the FileSystem locates the clusters, retrieves the content, and checks permissions. This abstraction ensures that applications do not need to manage the physical layout of bytes on disk, allowing uniform access across different storage devices.

A simplified code example demonstrating file operations through a file system interface:

// Pseudocode for file system usage
fs.createDirectory("/projects")
fileHandle = fs.createFile("/projects/report.txt")
fs.write(fileHandle, "Quarterly project report")
content = fs.read(fileHandle)
print(content)  // outputs: Quarterly project report

Advanced file systems support features such as file compression, encryption, snapshots, quotas, and distributed storage across multiple nodes or devices. They often provide caching layers to improve read/write performance and support concurrency control for multi-user access. Distributed and networked file systems like NFS, SMB, or Ceph implement additional protocols to maintain consistency, availability, and fault tolerance across multiple machines.

Conceptually, a FileSystem is like a library with organized shelves, cataloged books, and an indexing system. Patrons and librarians can store, retrieve, and manage materials without needing to know the physical arrangement of every book, while metadata and logs ensure order and integrity are maintained.

See NTFS, Master File Table, Journaling.

New Technology File System

/ˌɛn tiː ɛf ˈɛs/

noun — "robust Windows file system."

NTFS, short for New Technology File System, is a proprietary file system developed by Microsoft for Windows operating systems to provide high reliability, scalability, and advanced features beyond those of FAT and FAT32. NTFS organizes data on storage devices using a structured format that supports large files, large volumes, permissions, metadata, and transactional integrity, making it suitable for modern computing environments including desktops, servers, and enterprise storage systems.

Technically, NTFS uses a Master File Table (MFT) to store metadata about every file and directory. Each entry in the MFT contains attributes such as file name, security descriptors, timestamps, data location, and access control information. NTFS supports features like file-level encryption (Encrypting File System, EFS), compression, disk quotas, sparse files, and journaling to track changes for recovery. The file system divides storage into clusters, and files can span multiple clusters, with internal structures managing fragmentation efficiently.

In workflow terms, consider a Windows server hosting multiple user accounts. When a user creates or modifies a document, NTFS updates the MFT entry for that file, maintains access permissions, and optionally logs the change in the NTFS journal. This ensures that in case of a system crash or power failure, the file system can quickly recover and maintain data integrity. Search operations, backup utilities, and security audits rely on NTFS metadata and indexing to operate efficiently.

A simplified example showing file creation and reading from NTFS in pseudocode could be:

// Pseudocode illustrating NTFS file operations
fileHandle = NTFS.createFile("C:\\Documents\\report.txt")
NTFS.write(fileHandle, "Quarterly report data")
data = NTFS.read(fileHandle)
print(data)  // outputs: Quarterly report data

NTFS also supports advanced features for enterprise environments, including transactional file operations via the Transactional NTFS (TxF) API (now deprecated by Microsoft), hard links, reparse points, and integration with Active Directory for access control management. It allows reliable storage of volumes and files up to a theoretical 16 exabytes, with practical limits imposed by Windows versions and cluster sizes. NTFS’s journaling mechanism tracks metadata changes to reduce file system corruption risks and enables efficient recovery processes.

Conceptually, NTFS is like a highly organized library catalog with a detailed ledger for every book. Each entry tracks not just the book’s location, but access permissions, history of changes, and cross-references, enabling both rapid access and resilience against damage.

See FileSystem, Master File Table, Journaling.

Video Codec

/ˈvɪdi.oʊ ˈkoʊdɛk/

noun — "algorithm for compressing and decompressing digital video."

Video Codec is a software or hardware component that encodes (compresses) and decodes (decompresses) digital video streams. The primary purpose of a video codec is to reduce the size of video data for storage or transmission while preserving acceptable visual quality. Compression is typically lossy, meaning some information is discarded to achieve higher efficiency, though some codecs support lossless compression for specialized applications.

Encoding involves transforming raw video frames into a compressed format using algorithms that exploit spatial and temporal redundancy. Common techniques include motion compensation, transform coding (e.g., discrete cosine transform), and quantization. Decoding reverses this process, reconstructing the video frames for playback. Video codecs operate in conjunction with container formats, such as MP4 or MKV, which organize encoded streams and metadata.
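
As a toy illustration of the quantization step (the coefficients and step size below are made up, and real codecs apply per-frequency quantization matrices rather than a single step):

// Toy illustration: quantizing transform coefficients discards fine detail.
#include <array>
#include <cmath>
#include <cstdio>

int main() {
    std::array<float, 8> coeffs = {312.4f, -45.7f, 18.2f, -6.9f, 2.8f, -1.1f, 0.6f, -0.2f};
    const float qstep = 10.0f;                         // larger step: more compression, more loss
    for (float c : coeffs) {
        int q = (int)std::lround(c / qstep);           // encoder: quantize (lossy)
        float r = q * qstep;                           // decoder: reconstruct
        std::printf("%7.1f -> %3d -> %7.1f\n", c, q, r);
    }
    return 0;
}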

Modern video codecs include H.264 (AVC), H.265 (HEVC), VP9, and AV1. These codecs differ in compression efficiency, computational complexity, and support for features like high dynamic range (HDR), variable frame rates, or hardware acceleration. Encoding and decoding may be performed by a GPU or a dedicated hardware encoder/decoder for real-time performance.

In practical workflows, a content creator records raw footage, which is then encoded with a video codec into a compressed format suitable for streaming or storage. The client device receives the encoded stream, decodes it, and renders frames in sequence. Efficient codecs allow high-resolution video to be transmitted over limited bandwidth while maintaining playback smoothness.

Buffering is closely related, as decoded frames are often temporarily held in memory to accommodate network jitter or processing delays. Adaptive streaming systems monitor buffer levels and dynamically adjust the encoded bitrate to maintain continuity.

Conceptually, a video codec acts as a translator between raw visual data and efficient, transportable digital streams. It allows high-quality video to flow across networks and devices without overwhelming storage or bandwidth.

See Streaming, GPU, Buffering.

BVH

/ˌbiː viː ˈeɪtʃ/

n. "Tree-structured spatial index organizing primitives within nested bounding volumes accelerating ray-primitive intersection unlike flat triangle lists."

BVH, short for Bounding Volume Hierarchy, recursively partitions scene geometry into tight-fitting axis-aligned bounding box (AABB) containers. Ray-tracing hardware such as RTX GPUs traverses the tree top-down, skipping an entire subtree whenever a ray misses the parent bounds, which can reduce a 10-million-triangle scene to fewer than 100 ray-triangle tests per ray. Construction is usually guided by the Surface Area Heuristic (SAH), which chooses the split minimizing the expected traversal cost C = Ct + Ci * (pL * NL + pR * NR), where pL and pR are the probabilities, estimated from surface-area ratios, that a ray entering the parent also enters the left or right child, and NL and NR are the primitive counts in each child. Unlike k-d trees, a BVH partitions objects rather than space, so it does not waste nodes on empty regions.

Key characteristics of BVH include: AABB or OBB containers (axis-aligned or oriented bounding boxes stored per node); SAH optimization (the Surface Area Heuristic guides split selection); top-down traversal (a ray skips subtrees whose bounds it does not intersect); refit updates (dynamic scenes update node bounds without rebuilding the tree); and LBVH (linear construction via Morton codes for GPU parallelism).
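
A minimal sketch of evaluating the SAH cost of one candidate split, assuming a surface_area() helper and illustrative cost constants:

// Sketch: SAH cost of splitting a node into children L and R.
float sah_split_cost(const AABB& parent, const AABB& left, int n_left,
                     const AABB& right, int n_right) {
    const float c_traversal = 1.0f;   // assumed cost of one traversal step
    const float c_intersect = 2.0f;   // assumed cost of one ray-triangle test
    float p_left  = surface_area(left)  / surface_area(parent);   // hit-probability estimates
    float p_right = surface_area(right) / surface_area(parent);
    return c_traversal + c_intersect * (p_left * n_left + p_right * n_right);
}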

Conceptual example of BVH usage:

// BVH node structure for a ray tracer
struct BVHNode {
    AABB bounds;                 // Node bounding volume
    int left = -1, right = -1;   // Child indices (-1 = none, i.e. leaf)
    int prim_start = 0;          // First primitive in the shared triangle array (leaves only)
    int prim_count = 0;          // Number of leaf primitives (0 = interior node)
    float sah_cost = 0.0f;       // Cached SAH metric
};

// Builds the subtree over tris[first, first + count) and stores it at nodes[node_idx].
// nodes must be preallocated; node_counter starts at 0 so nodes[0] is the root.
void build_bvh(std::vector<Triangle>& tris, int first, int count,
               BVHNode* nodes, int node_idx, int& node_counter) {
    BVHNode& node = nodes[node_idx];
    node.bounds = compute_bounds(tris, first, count);

    if (count <= 4) {            // Leaf threshold
        node.prim_start = first;
        node.prim_count = count;
        return;
    }

    // SAH split: partition tris[first, first + count) in place and return the split index
    int split = sah_partition(tris, first, count, node.bounds);

    node.left  = ++node_counter;
    node.right = ++node_counter;

    build_bvh(tris, first, split - first, nodes, node.left, node_counter);
    build_bvh(tris, split, first + count - split, nodes, node.right, node_counter);
}

bool ray_intersect(const Ray& ray, const BVHNode* nodes,
                   const std::vector<Triangle>& tris, Hit& hit) {
    bool found = false;
    std::vector<int> stack;
    stack.push_back(0);          // Start at the root

    while (!stack.empty()) {
        const BVHNode& node = nodes[stack.back()];
        stack.pop_back();

        if (!ray.intersects_aabb(node.bounds)) continue;   // Skip the entire subtree

        if (node.prim_count > 0) {
            // Leaf: test the triangles it references
            if (test_triangles(ray, tris.data() + node.prim_start, node.prim_count, hit))
                found = true;
        } else {
            stack.push_back(node.right);
            stack.push_back(node.left);
        }
    }
    return found;
}

Conceptually, BVH reduces per-ray intersection cost from linear in the number of primitives to roughly logarithmic through spatial exclusion: ray-AABB tests at each node prune whole subtrees before any triangle tests run, and refitting lets skinned or animated meshes update node bounds each frame without a full rebuild. In GPU ray tracing, top-level acceleration structures (TLAS) reference per-object bottom-level structures (BLAS), enabling instancing of repeated geometry while keeping traversal cache-coherent.

CSV

/ˌsiː ɛs ˈviː/

n. “Plain text pretending to be a spreadsheet.”

CSV, or Comma-Separated Values, is a simple text-based file format used to store tabular data. Each line represents a row, and each value within that row is separated by a delimiter — most commonly a comma. Despite its minimalism, CSV is one of the most widely used data interchange formats in computing.

A typical CSV file might represent a table of users, products, or logs. The first line often contains column headers, followed by data rows. Because the format is plain text, it can be created, viewed, and edited with anything from a text editor to a spreadsheet application to a command-line tool.

One reason CSV persists is its universality. Nearly every programming language, database, analytics tool, and spreadsheet application understands CSV. Systems that cannot easily share native formats can almost always agree on CSV as a lowest common denominator.

That simplicity, however, comes with trade-offs. CSV has no built-in data types, schemas, or encoding guarantees. Everything is text. Numbers, dates, booleans, and null values must be interpreted by the consuming system. This flexibility is powerful, but it can also lead to ambiguity and subtle bugs.

Delimiters are another subtle detail. While commas are traditional, some regions and tools use semicolons or tabs to avoid conflicts with decimal separators. Quoting rules allow values to contain commas, line breaks, or quotation marks, but these rules are often implemented inconsistently across software.
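
For example, under the common RFC 4180 conventions, a value containing a comma or quotation marks is wrapped in double quotes, with embedded quotes doubled:

name,comment
"Doe, Jane","She said ""ship it"" on Friday"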

In modern data pipelines, CSV is commonly used as an interchange format in ETL workflows. Data may be exported from a database, transformed by scripts, and loaded into analytics platforms such as BigQuery or stored in Cloud Storage. Its lightweight nature makes it ideal for quick transfers and human inspection.

CSV is also favored for audits, reporting, and backups where transparency matters. You can open the file and see the data directly, without specialized tools. This visibility makes it valuable for debugging and verification, even in highly automated systems.

It is important to recognize what CSV is not. It is not self-describing, strongly typed, or optimized for very large datasets. Formats like Parquet or Avro outperform it in scale and structure. Yet CSV endures because it is simple, durable, and unpretentious.

In essence, CSV is data stripped to its bones. No metadata, no ceremony — just rows, columns, and agreement. And in a world full of complex formats, that blunt honesty is often exactly what makes it useful.

MIME

/maɪm/

n. “This isn’t just data — it’s what the data means.”

MIME, short for Multipurpose Internet Mail Extensions, is the system that tells computers what kind of data they are looking at and how it should be handled. It answers a deceptively simple question: what is this content supposed to be?

Before MIME, the internet mostly assumed everything was plain text. That worked fine for early email and documents, right up until people wanted to send images, audio, video, spreadsheets, or anything that wasn’t just ASCII characters. The moment binary data entered the chat, assumptions broke. MIME was the fix.

At its core, MIME defines media types, often called content types. These appear as strings like text/html, application/json, image/png, or application/pdf. Each type tells the receiving system how to interpret the bytes it is about to process — whether to render them, download them, execute them, or reject them outright.

The structure of a MIME type is deliberate. The first part describes the broad category — text, image, audio, video, application — while the second part narrows down the specific format. This hierarchy allows software to make safe fallback decisions when it encounters unfamiliar data.

MIME originated in email, but it quickly escaped that boundary. Today it is foundational to the web itself. Virtually every HTTP response carries a Content-Type header whose value is a MIME type. Browsers rely on it to decide whether something should be displayed inline, executed as code, or treated as a downloadable file.
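
For example, a server announcing an HTML page includes a header such as:

HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8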

Security depends heavily on MIME behaving correctly. If a server mislabels executable content as something harmless, browsers may execute code they should not. If a browser ignores the declared MIME type and tries to guess instead, entire classes of attacks become possible. This is why the X-Content-Type-Options: nosniff header exists — to force strict adherence to declared MIME types.

In APIs and web services, MIME types act as a contract. When a client requests application/json, it expects structured data, not markup or binary blobs. When a server responds with the wrong type, integrations break in subtle and frustrating ways.

MIME also governs multipart messages — payloads that contain multiple different data types bundled together. This is how email attachments work, how file uploads are handled, and how complex form submissions are transmitted across the web.
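
A multipart payload declares a boundary string, and each part between boundaries carries its own headers. A simplified file-upload request body might look like this (the boundary and field names are illustrative):

Content-Type: multipart/form-data; boundary=ExampleBoundary

--ExampleBoundary
Content-Disposition: form-data; name="avatar"; filename="photo.png"
Content-Type: image/png

(binary image bytes)
--ExampleBoundary--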

Despite its age, MIME continues to evolve. New types are registered as new formats emerge, and old ones linger long after they should have been retired. Some are elegant. Some are cursed. All of them are part of the shared vocabulary that keeps the internet interoperable.

MIME does not care about aesthetics. It does not judge content quality. It does not enforce safety by itself. It simply labels reality and hopes the systems reading those labels behave responsibly.

Without MIME, the internet would still exist — but it would be fragile, confused, and perpetually surprised by its own data. With it, browsers, servers, clients, and users all agree on one crucial thing: what a pile of bytes is supposed to be.