TF.js notes (draft)

under construction

The TF-Lite (and NVIDIA :) teams have made the most progress in quantized-weight optimization, but TF-Lite supports model inference only. Unfortunately, the number of quantized models in the TF-Lite Optimized Models hub greatly exceeds (>>) what is available on TF Hub :(

Weight quantization

Float32 weight data can be compressed into 1 or 2 bytes (e.g. uint8/uint16) as
    float = min + scale*uint,     scale = (max - min)/uint_max
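
A minimal TypeScript sketch of this min/max scheme for uint8 (the function names and the per-tensor min/scale layout are illustrative, not an actual TF.js API):

    // Quantize float32 weights to uint8: scale = (max - min) / 255
    function quantize(w: Float32Array): { q: Uint8Array; min: number; scale: number } {
      let min = Infinity, max = -Infinity;
      for (const v of w) { if (v < min) min = v; if (v > max) max = v; }
      const scale = (max - min) / 255 || 1;      // avoid 0/0 for constant tensors
      const q = new Uint8Array(w.length);
      for (let i = 0; i < w.length; i++) q[i] = Math.round((w[i] - min) / scale);
      return { q, min, scale };
    }

    // Dequantize back to float32: float = min + scale * uint
    function dequantize(q: Uint8Array, min: number, scale: number): Float32Array {
      const out = new Float32Array(q.length);
      for (let i = 0; i < q.length; i++) out[i] = min + scale * q[i];
      return out;
    }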

We can find quantization in:

Pose Detection in the Browser: PoseNet Model
The multi-pose detector demo with a weights-quantization test.

More links:
TensorFlow Model Optimization Toolkit - float16 quantization halves model size
Post-Training Quantization of TensorFlow model to FP16
Optimizing any TensorFlow model using TensorFlow Transform Tools and using TensorRT

TensorFlow Model Optimization Toolkit — Post-Training Integer Quantization
Quantization-Aware Training support in Keras

Dogs vs Cats Image Classification With Image Augmentation

"Raw" float16 weights

In WebGL, float data are usually stored in a Float32Array(), but you can upload/download GPU data as F32 or F16 textures (see e.g. HALF_FLOAT matrix multiplication). FP16 data can also be stored in a Uint16Array(), so F16 weights can be saved directly in 2 bytes each.
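
A minimal sketch of packing float32 weights into raw IEEE-754 binary16 (half) bit patterns inside a Uint16Array. It truncates the mantissa rather than rounding to nearest, and the names are illustrative, not a TF.js API:

    const f32buf = new Float32Array(1);
    const u32buf = new Uint32Array(f32buf.buffer);

    // Convert one float32 value to its binary16 bit pattern (mantissa truncated)
    function float32ToFloat16Bits(val: number): number {
      f32buf[0] = val;
      const x = u32buf[0];
      const sign = (x >>> 16) & 0x8000;           // sign bit in half position
      let exp = (x >>> 23) & 0xff;                // biased float32 exponent
      let frac = x & 0x7fffff;                    // 23-bit mantissa
      if (exp === 0xff) return sign | 0x7c00 | (frac ? 0x200 : 0); // Inf / NaN
      exp = exp - 127 + 15;                       // re-bias exponent for float16
      if (exp >= 0x1f) return sign | 0x7c00;      // overflow -> Inf
      if (exp <= 0) {                             // subnormal half or underflow
        if (exp < -10) return sign;               // too small -> signed zero
        frac = (frac | 0x800000) >> (1 - exp);    // restore implicit bit, shift down
        return sign | (frac >> 13);
      }
      return sign | (exp << 10) | (frac >> 13);   // normal: keep top 10 mantissa bits
    }

    function packFloat16(weights: Float32Array): Uint16Array {
      const out = new Uint16Array(weights.length);
      for (let i = 0; i < weights.length; i++) out[i] = float32ToFloat16Bits(weights[i]);
      return out;
    }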

Note that 2-byte quantized (uint16), raw float16 and bfloat16 have different precision (16, 10 and 7 fraction bits, respectively).
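
bfloat16, by contrast, is simply the top half of a float32 (same 8-bit exponent, mantissa cut from 23 to 7 bits), so a sketch of the conversion is a single shift (truncating, no rounding):

    // Keep sign + 8-bit exponent + top 7 mantissa bits of the float32
    function float32ToBfloat16Bits(val: number): number {
      const buf = new Float32Array([val]);
      return new Uint32Array(buf.buffer)[0] >>> 16;
    }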

TF-Lite and TF.js?

TF-Lite supports float16 quantization of weights but, for some reason, uses its own ".tflite" file format.
Can we use TF-Lite models in TF.js?

TF-Lite PoseNet Android Demo. Overview
A 13 MB ".tflite" model, but it is not clear how it is quantized.

FP16 Math

With mediump float precision we could get a ~2x speedup, but for some reason some models, e.g. SSD, produce wrong results with FP16 math (according to the TF.js team). Can we optimize any useful model for FP16 math (NVIDIA can :)? See the shader sketch after the links below.
Mixed-Precision Training of Deep Neural Networks (NVIDIA).
Training With Mixed Precision (NVIDIA).
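
Where FP16 math enters in WebGL: the precision qualifier in the fragment shader. A minimal sketch (the uniform/varying names are illustrative); on many mobile GPUs mediump maps to FP16, while highp forces FP32:

    const fragSrc = `
      precision mediump float;  // may run at FP16 (10 fraction bits) on mobile GPUs
      uniform sampler2D A;
      varying vec2 uv;
      void main() {
        // all arithmetic here inherits mediump precision
        gl_FragColor = texture2D(A, uv) * 2.0;
      }
    `;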
TF.js notes     updated 5 Nov 2019