Transformer
/trænsˈfɔːrmər/
noun … “a neural network architecture that models relationships using attention mechanisms.”
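The attention mechanism named in this definition can be sketched in a few lines; a minimal scaled dot-product attention in NumPy (shapes and names are illustrative, not taken from any particular library):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise similarity between queries and keys
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V  # each output row is a weighted mix of value rows

# 3 tokens with 4-dimensional embeddings; Q, K, V would normally be
# separate learned projections of the input.
rng = np.random.default_rng(0)
Q = K = V = rng.standard_normal((3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 4)
```

Each output row mixes all value rows, which is how attention models relationships across an entire sequence at once.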
Convolutional Neural Network
/ˌsiːˌɛnˈɛn/
noun … “a deep learning model for processing grid-like data such as images.”
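The grid-structured processing in this definition comes down to sliding a small kernel over the input; a minimal "valid" 2D convolution sketch in NumPy (the loop-based form is for clarity, not speed):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid (no-padding) 2D cross-correlation, the core CNN operation."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Each output cell is the dot product of the kernel
            # with the image patch under it.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)  # toy 5x5 grid-like input
edge_kernel = np.array([[1.0, -1.0]])             # horizontal difference filter
feature_map = conv2d(image, edge_kernel)
print(feature_map.shape)  # (5, 4)
```

Sharing one small kernel across every position is what makes CNNs efficient on images compared to fully connected layers.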
American Institute of Electrical Engineers
/ˌeɪ.iːˌiːˈiː/
noun … “the original American institute for electrical engineering standards and research.”
IEEE
/ˌaɪ.iːˌiːˈiː/
noun … “the global standards organization for electrical and computing technologies.”
Float64
/floʊt ˌsɪkstiˈfɔːr/
noun … “a 64-bit double-precision floating-point number.”
Float32
/floʊt ˌθɜːrtiˈtuː/
noun … “a 32-bit single-precision floating-point number.”
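The precision difference between the two formats above is easy to demonstrate; a small sketch assuming NumPy's `float32`/`float64` implement these IEEE 754 formats:

```python
import numpy as np

# float64 keeps ~15-16 significant decimal digits; float32 keeps ~7.
x64 = np.float64(0.1)
x32 = np.float32(0.1)
print(f"{x64:.17f}")  # 0.10000000000000001 — rounding error far to the right
print(f"{x32:.17f}")  # 0.10000000149011612 — error visible around the 8th digit

# Machine epsilon: the gap between 1.0 and the next representable value.
print(np.finfo(np.float64).eps)  # ~2.22e-16
print(np.finfo(np.float32).eps)  # ~1.19e-07
```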
INT8
/ˌɪntˈeɪt/
noun … "an 8-bit signed integer format used for quantized neural-network inference."
INT8 is an 8-bit two's complement integer ranging from -128 to +127, widely used for quantized neural-network inference. Post-training quantization or quantization-aware training converts FP32 weights and activations to INT8, typically with only a small accuracy loss, while cutting memory use 4x relative to FP32 and raising throughput on integer-optimized hardware such as edge TPUs. A per-tensor scale maps real values to integer steps, and a zero-point offset handles asymmetric activation ranges.
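The scale and zero-point scheme described above can be sketched as a minimal asymmetric per-tensor quantizer (variable names are illustrative):

```python
import numpy as np

def quantize_int8(x):
    """Asymmetric per-tensor quantization of a float array to int8."""
    qmin, qmax = -128, 127
    scale = (x.max() - x.min()) / (qmax - qmin)    # real units per integer step
    zero_point = np.round(qmin - x.min() / scale)  # integer representing real 0.0
    zero_point = int(np.clip(zero_point, qmin, qmax))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map int8 codes back to approximate real values."""
    return scale * (q.astype(np.float32) - zero_point)

x = np.array([-0.5, 0.0, 0.3, 1.0], dtype=np.float32)
q, scale, zp = quantize_int8(x)
x_hat = dequantize(q, scale, zp)
print(np.max(np.abs(x - x_hat)))  # reconstruction error bounded by ~scale/2
```

Note how the zero-point guarantees that a real 0.0 quantizes exactly, which matters for zero-padding in convolutions.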
Floating Point 16
/ˌɛfˌpiː sɪksˈtiːn/
noun … "an IEEE 754 half-precision 16-bit floating-point format trading precision for 2x HBM throughput in AI training."
Floating Point 32
/ˌɛfˌpiː ˌθɜːrtiˈtuː/
noun … "an IEEE 754 single-precision 32-bit floating-point format balancing range and accuracy for graphics and ML workloads."
RNN
/ˌɑːrˌɛnˈɛn/
noun … "a neural network with feedback loops that maintains a hidden state across time steps for sequential data processing."