qkeras.quantizers

Functions

binary_sigmoid(x)

Computes binary_sigmoid.

binary_tanh(x)

Computes binary_tanh function that outputs -1 and 1.

get_quantized_initializer(w_initializer, w_range)

Gets the initializer and scales it by the range.

get_quantizer(identifier)

Gets the quantizer.

get_weight_scale(quantizer[, x])

Gets the scales of weights for (stochastic_)binary and ternary quantizers.

hard_sigmoid(x)

Computes hard_sigmoid function that saturates between 0 and 1.

hard_tanh(x)

Computes hard_tanh function that saturates between -1 and 1.

set_internal_sigmoid(mode)

Sets _sigmoid to either real, hard or smooth.

smooth_sigmoid(x)

Implements a linear approximation of a sigmoid function.

smooth_tanh(x)

Computes smooth_tanh function that saturates between -1 and 1.

stochastic_round(x[, precision])

Performs stochastic rounding to the first decimal point.

stochastic_round_po2(x)

Performs stochastic rounding for the power of two.

Classes

bernoulli([alpha, temperature, use_real_sigmoid])

Computes a Bernoulli sample with probability sigmoid(x).

binary([use_01, alpha, training, ...])

Computes the sign(x) returning a value between -alpha and alpha.

quantized_bits([bits, integer, symmetric, ...])

Legacy quantizer: Quantizes the number to a number of bits.

quantized_hswish([bits, integer, symmetric, ...])

Computes a quantized hard swish to a number of bits.

quantized_linear([bits, integer, symmetric, ...])

Linear quantization with fixed number of bits.

quantized_po2([bits, max_value, ...])

Quantizes to the closest power of 2.

quantized_relu([bits, integer, use_sigmoid, ...])

Computes a quantized relu to a number of bits.

quantized_relu_po2([bits, max_value, ...])

Quantizes x to the closest power of 2 when x > 0.

quantized_sigmoid([bits, symmetric, ...])

Computes a quantized sigmoid to a number of bits.

quantized_tanh([bits, ...])

Computes a quantized tanh to a number of bits.

quantized_ulaw([bits, integer, symmetric, u])

Computes a u-law quantization.

stochastic_binary([alpha, temperature, ...])

Computes a stochastic activation function returning -alpha or +alpha.

stochastic_ternary([alpha, threshold, ...])

Computes a stochastic activation function returning -alpha, 0 or +alpha.

ternary([alpha, threshold, ...])

Computes an activation function returning -alpha, 0 or +alpha.

class qkeras.quantizers.bernoulli(alpha=None, temperature=6.0, use_real_sigmoid=True)[source]

Bases: BaseQuantizer

Computes a Bernoulli sample with probability sigmoid(x).

This computation uses ST approximation.

To do that, we compute p = sigmoid(x) and draw a random sample z ~ U[0,1]. As p is in [0,1] and z is in [0,1], p - z is in [-1,1]. However, -1 will never appear, because to get -1 we would need sigmoid(-inf) - z == -1. As a result, the range will be in practical terms [0,1].

The noise introduced by z can be seen as a regularizer to the weights W of y = Wx, as y = Wx + Wz for some noise z with mean mu(z) and variance var(z). As a result, the noise adds W**2 var(z) to the variance of y, which has the same effect as an L2 regularizer with lambda = var(z), as presented in Hinton's Coursera Lecture 9c.

Remember that E[dL/dy] = E[dL/dx] once we add stochastic sampling.
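As a rough illustration of the forward pass described above, the following pure-Python sketch draws a {0, 1} sample from the sign of p - z. It is not the library implementation: the helper names are hypothetical, the way temperature enters (multiplying x before the sigmoid) is an assumption, and the straight-through gradient is not modeled.

```python
import math
import random


def sigmoid(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def bernoulli_sample(x, temperature=6.0, rng=random):
    """Draw one {0, 1} sample with P(1) = sigmoid(temperature * x).

    Assumption: temperature simply multiplies x before the sigmoid;
    the library may scale x differently.
    """
    p = sigmoid(temperature * x)
    z = rng.random()  # z ~ U[0, 1)
    # p - z > 0 happens with probability p, so the mean of the output is p.
    return 1.0 if p - z > 0.0 else 0.0


rng = random.Random(0)
samples = [bernoulli_sample(0.1, rng=rng) for _ in range(20000)]
empirical_mean = sum(samples) / len(samples)
expected_mean = sigmoid(6.0 * 0.1)
```

With enough samples, empirical_mean should approach expected_mean, which is the sense in which the stochastic output is an unbiased estimate of the sigmoid.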

alpha

allows one to specify multiplicative factor for number generation of “auto” or “auto_po2”.

temperature

amplifier factor for the sigmoid function, making the sampling less stochastic as x moves away from 0.

use_real_sigmoid

use real sigmoid for probability.

Returns:

Computation of round with stochastic sampling with straight through gradient.

classmethod from_config(config)[source]
get_config()[source]
max()[source]

Get the maximum value bernoulli class can represent.

min()[source]

Get the minimum value bernoulli class can represent.

class qkeras.quantizers.binary(use_01=False, alpha=None, training=False, use_stochastic_rounding=False, scale_axis=None, elements_per_scale=None, min_po2_exponent=None, max_po2_exponent=None)[source]

Bases: BaseQuantizer

Computes the sign(x) returning a value between -alpha and alpha.

Although we cannot guarantee E[dL/dy] = E[dL/dx] if we do not use the stochastic sampling, we still use the ST approximation.

Modified from original binary to match QNN implementation.

The binary quantizer supports multiple scales per tensor, where:

  • alpha: It can be set to “auto” or “auto_po2” to enable auto-scaling. “auto” allows arbitrary scale while “auto_po2” allows power-of-two scales only. It can also be set to a fixed value or None (i.e., no scaling).

  • scale_axis: It determines the axis/axes to calculate the auto-scale at.

  • elements_per_scale: It enables fine-grained scaling where it determines the number of elements across scale axis/axes that should be grouped into one scale.

Examples:

  1. Input shape = [1, 8, 8, 16] alpha=“auto”, scale_axis=None, elements_per_scale=None --> Number of separate scales = 16

  2. Input shape = [1, 8, 8, 16] alpha=“auto”, scale_axis=1, elements_per_scale=None --> Number of separate scales = 8

  3. Input shape = [1, 8, 8, 16] alpha=“auto”, scale_axis=1, elements_per_scale=2 --> Number of separate scales = 4

  4. Input shape = [1, 8, 8, 16] alpha=“auto”, scale_axis=[2, 3], elements_per_scale=2 --> Number of separate scales = 4*8 = 32

  5. Input shape = [1, 8, 8, 16] alpha=“auto”, scale_axis=[2, 3], elements_per_scale=[2, 4] --> Number of separate scales = 4*4 = 16
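The arithmetic behind the five examples above can be reproduced with a small helper. This is only a sketch of the counting rule implied by the examples: count_scales is a hypothetical name, and treating scale_axis=None as "the last (channel) axis" is an assumption that matches example 1 for a channels-last tensor.

```python
from math import prod


def count_scales(shape, scale_axis=None, elements_per_scale=None):
    """Count separate scales for a tensor shape (hypothetical helper)."""
    # Assumption: scale_axis=None means one scale per channel on the last axis.
    if scale_axis is None:
        axes = [len(shape) - 1]
    elif isinstance(scale_axis, int):
        axes = [scale_axis]
    else:
        axes = list(scale_axis)

    # A single int applies the same group size to every scale axis.
    if elements_per_scale is None:
        group_sizes = [1] * len(axes)
    elif isinstance(elements_per_scale, int):
        group_sizes = [elements_per_scale] * len(axes)
    else:
        group_sizes = list(elements_per_scale)
    if len(group_sizes) != len(axes):
        raise ValueError("elements_per_scale must match scale_axis in length")

    counts = []
    for axis, group in zip(axes, group_sizes):
        if shape[axis] % group != 0:
            raise ValueError("axis size must be divisible by the group size")
        counts.append(shape[axis] // group)
    return prod(counts)


shape = [1, 8, 8, 16]
example_counts = [
    count_scales(shape),
    count_scales(shape, scale_axis=1),
    count_scales(shape, scale_axis=1, elements_per_scale=2),
    count_scales(shape, scale_axis=[2, 3], elements_per_scale=2),
    count_scales(shape, scale_axis=[2, 3], elements_per_scale=[2, 4]),
]
```

Each scale axis contributes (axis size / elements per scale) groups, and the total is the product over the scale axes.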

x

tensor to perform sign_through.

bits

number of bits to perform quantization.

use_01

if True, return {0,1} instead of {-1,+1}.

alpha

binary is -alpha or +alpha, or “auto”, “auto_po2” to compute automatically.

use_stochastic_rounding

if true, we perform stochastic rounding.

elements_per_scale

if set to an int or List[int], we create multiple scales per axis across scale_axis, where ‘elements_per_scale’ represents the number of elements/values associated with every separate scale value.

scale_axis

int or List[int] which axis/axes to calculate scale from.

min_po2_exponent

if set while using “auto_po2”, it represents the minimum allowed power of two exponent.

max_po2_exponent

if set while using “auto_po2”, it represents the maximum allowed power of two exponent.

Returns:

Computation of sign operation with straight through gradient.

classmethod from_config(config)[source]
get_config()[source]
max()[source]

Get maximum value that binary class can represent.

min()[source]

Get minimum value that binary class can represent.

qkeras.quantizers.binary_sigmoid(x)[source]

Computes binary_sigmoid.

qkeras.quantizers.binary_tanh(x)[source]

Computes binary_tanh function that outputs -1 and 1.

qkeras.quantizers.get_quantized_initializer(w_initializer, w_range)[source]

Gets the initializer and scales it by the range.

qkeras.quantizers.get_quantizer(identifier)[source]

Gets the quantizer.

Parameters:

identifier – A quantizer, which could be a dict, a string, or a callable function.

Returns:

A quantizer class or quantization function from this file. For example,

Quantizer classes: quantized_bits, quantized_po2, quantized_relu_po2, binary, stochastic_binary, ternary, stochastic_ternary, etc.

Quantization functions: binary_sigmoid, hard_sigmoid, smooth_sigmoid, etc.

Raises:

ValueError – An error occurred when quantizer cannot be interpreted.

qkeras.quantizers.get_weight_scale(quantizer, x=None)[source]

Gets the scales of weights for (stochastic_)binary and ternary quantizers.

Parameters:
  • quantizer – A binary or ternary quantizer class.

  • x – A weight tensor. We keep it here for now for backward compatibility.

Returns:

Weight scale per channel for binary and ternary quantizers with auto or auto_po2 alpha/threshold.

qkeras.quantizers.hard_sigmoid(x)[source]

Computes hard_sigmoid function that saturates between 0 and 1.

qkeras.quantizers.hard_tanh(x)[source]

Computes hard_tanh function that saturates between -1 and 1.

class qkeras.quantizers.quantized_bits(bits=8, integer=0, symmetric=0, keep_negative=True, alpha=None, use_stochastic_rounding=False, scale_axis=None, qnoise_factor=1.0, var_name=None, use_ste=True, use_variables=False, elements_per_scale=None, min_po2_exponent=None, max_po2_exponent=None, post_training_scale=None)[source]

Bases: BaseQuantizer

Legacy quantizer: Quantizes the number to a number of bits.

In general, we want to use a quantization function like:

a = (pow(2,bits) - 1 - 0) / (max(x) - min(x))
b = -min(x) * a

in the equation:

xq = a x + b

This requires multiplication, which is undesirable. So, we enforce weights to be between -1 and 1 (max(x) = 1 and min(x) = -1), and separate the sign from the rest of the number as we make this function symmetric, which results in the following approximation.

  1. max(x) = +1, min(x) = -1

  2. max(x) = -min(x)

a = pow(2,bits-1)
b = 0

Finally, just remember that to represent the number with sign, the representable range is -pow(2,bits-1) to pow(2,bits-1) - 1.

Symmetric and keep_negative allow us to generate numbers that are symmetric (same number of negative and positive representations), and numbers that are positive.
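A minimal pure-Python sketch of this kind of fixed-point quantization follows. It is a scalar illustration under stated assumptions, not the library code: fixed_point_quantize is a hypothetical name, alpha scaling and stochastic rounding are ignored, and the clipping convention (one sign bit when keep_negative is true, and one fewer negative level when symmetric is true) is assumed from the description above.

```python
def fixed_point_quantize(x, bits=8, integer=0, symmetric=False,
                         keep_negative=True):
    """Quantize a scalar to a signed or unsigned fixed-point grid (sketch)."""
    # One bit is reserved for the sign when negative values are kept.
    unsigned_bits = bits - 1 if keep_negative else bits
    m = 2 ** unsigned_bits   # number of steps available for the magnitude
    m_i = 2 ** integer       # range extension from the integer bits

    # Map x onto integer levels; Python's round() rounds half to even.
    level = round(x * m / m_i)

    # Clip to the representable levels.
    if keep_negative:
        lowest = -m + (1 if symmetric else 0)
    else:
        lowest = 0
    level = max(lowest, min(m - 1, level))

    # Map the level back to the real-valued grid.
    return m_i * level / m


q_small = fixed_point_quantize(0.3, bits=4)              # lands on 2/8
q_clipped = fixed_point_quantize(1.5, bits=4)            # saturates at 7/8
q_with_integer = fixed_point_quantize(2.6, bits=4, integer=2)
```

With bits=4 and integer=0 the grid step is 1/8, so 0.3 maps to 0.25 and anything above 0.875 saturates.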

Note

The behavior of quantized_bits is different from Catapult HLS ac_fixed or Vivado HLS ap_fixed. For ac_fixed<word_length, integer_length, signed>, when signed = true, it is equivalent to quantized_bits(word_length, integer_length-1, keep_negative=True).

bits

number of bits to perform quantization.

integer

number of bits to the left of the decimal point.

symmetric

if true, we will have the same number of values for positive and negative numbers.

alpha

a tensor or None, the scaling factor per channel. If None, the scaling factor is 1 for all channels.

keep_negative

if true, we do not clip negative numbers.

use_stochastic_rounding

if true, we perform stochastic rounding.

scale_axis

int or List[int] which axis/axes to calculate scale from.

qnoise_factor

float. a scalar from 0 to 1 that represents the level of quantization noise to add. This controls the amount of the quantization noise to add to the outputs by changing the weighted sum of (1 - qnoise_factor)*unquantized_x + qnoise_factor*quantized_x.

var_name

String or None. A variable name shared between the Variables created in the build function. If None, it is generated automatically.

use_ste

Bool. Whether to use “straight-through estimator” (STE) method or not.

use_variables

Bool. Whether to make the quantizer variables to be dynamic Variables or not.

elements_per_scale

if set to an int or List[int], we create multiple scales per axis across scale_axis, where ‘elements_per_scale’ represents the number of elements/values associated with every separate scale value. It is only supported when using “auto_po2”.

min_po2_exponent

if set while using “auto_po2”, it represents the minimum allowed power of two exponent.

max_po2_exponent

if set while using “auto_po2”, it represents the maximum allowed power of two exponent.

post_training_scale

if set, it represents the scale value to be used for quantization.

Returns:

Function that computes fixed-point quantization with bits.

classmethod from_config(config)[source]
get_config()[source]
max()[source]

Get maximum value that quantized_bits class can represent.

min()[source]

Get minimum value that quantized_bits class can represent.

range()[source]

Returns a list of all values that quantized_bits can represent ordered by their binary representation ascending.

class qkeras.quantizers.quantized_hswish(bits=8, integer=0, symmetric=0, alpha=None, use_stochastic_rounding=False, scale_axis=None, qnoise_factor=1.0, var_name=None, use_variables=False, relu_shift=3, relu_upper_bound=6)[source]

Bases: quantized_bits

Computes a quantized hard swish to a number of bits.

# TODO(mschoenb97): Update to inherit from quantized_linear.

Equation of the h-swish function in MobileNetV3: hswish(x) = x * ReluY(x + relu_shift) / Y, where Y is relu_upper_bound.
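The unquantized form of this equation can be written out directly. The sketch below only evaluates the formula on a scalar, with ReluY taken to mean a relu clipped at Y; it does not apply any quantization, and hswish is a hypothetical helper name.

```python
def hswish(x, relu_shift=3, relu_upper_bound=6):
    """Unquantized hard swish: x * ReluY(x + relu_shift) / Y."""
    # ReluY: relu with its output capped at Y = relu_upper_bound.
    relu_y = min(max(x + relu_shift, 0.0), relu_upper_bound)
    return x * relu_y / relu_upper_bound


values = [hswish(v) for v in (-4.0, -3.0, 0.0, 1.0, 3.0, 10.0)]
```

For x <= -relu_shift the output is zero, and for x >= relu_upper_bound - relu_shift the function is the identity.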

Parameters:
  • relu_shift (int)

  • relu_upper_bound (int)

bits

number of bits to perform quantization, also known as word length.

integer

number of integer bits.

symmetric

if True, the quantization is in symmetric mode, which puts restricted range for the quantizer. Otherwise, it is in asymmetric mode, which uses the full range.

alpha

a tensor or None, the scaling factor per channel. If None, the scaling factor is 1 for all channels.

use_stochastic_rounding

if true, we perform stochastic rounding. This parameter is passed on to the underlying quantizer quantized_bits which is used to quantize h_swish.

scale_axis

which axis to calculate scale from

qnoise_factor

float. a scalar from 0 to 1 that represents the level of quantization noise to add. This controls the amount of the quantization noise to add to the outputs by changing the weighted sum of (1 - qnoise_factor)*unquantized_x + qnoise_factor*quantized_x.

var_name

String or None. A variable name shared between the Variables created in the build function. If None, it is generated automatically.

use_ste

Bool. Whether to use “straight-through estimator” (STE) method or not.

use_variables

Bool. Whether to make the quantizer variables to be dynamic Variables or not.

relu_shift

integer type, representing the shift amount of the unquantized relu.

relu_upper_bound

integer type, representing an upper bound of the unquantized relu. If None, we apply relu without the upper bound when “is_quantized_clip” is set to false (true by default). Note: The quantized relu uses the quantization parameters (bits and integer) to upper bound. So it is important to set relu_upper_bound appropriately to the quantization parameters. “is_quantized_clip” has precedence over “relu_upper_bound” for backward compatibility.

get_config()[source]

Add relu_shift and relu_upper_bound to the config file.

min()[source]

Gets the minimum value that quantized_hswish can represent.

class qkeras.quantizers.quantized_linear(bits=8, integer=0, symmetric=1, keep_negative=True, alpha=None, use_stochastic_rounding=False, scale_axis=None, qnoise_factor=1.0, var_name=None, use_variables=False)[source]

Bases: BaseQuantizer

Linear quantization with fixed number of bits.

This quantizer maps inputs to the nearest value of a fixed number of outputs that are evenly spaced, with possible scaling and stochastic rounding. This is an updated version of the legacy quantized_bits.

The core computation is:
  1. Divide the tensor by a quantization scale

  2. Clip the tensor to a specified range

  3. Round to the nearest integer

  4. Multiply the rounded result by the quantization scale

This clip range is determined by
  • The number of bits we have to represent the number

  • Whether we want to have a symmetric range or not

  • Whether we want to keep negative numbers or not

The quantization scale is defined by either the quantizer parameters or the data passed to the __call__ method. See documentation for the alpha parameter to find out more.
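The four steps of the core computation can be sketched on a scalar as follows. The function names are hypothetical, the clip bounds are an assumption consistent with the three factors listed above (bits, symmetry, and sign) rather than a copy of get_clip_bounds, and the bits=1 special case, stochastic rounding, and quantization noise are left out.

```python
def clip_bounds(bits=8, symmetric=True, keep_negative=True):
    """Assumed clip range, expressed in integer steps."""
    if keep_negative:
        half = 2 ** (bits - 1)
        # A symmetric range drops the most negative level.
        return -half + (1 if symmetric else 0), half - 1
    return 0, 2 ** bits - 1


def linear_quantize(x, quantization_scale, bits=8, symmetric=True,
                    keep_negative=True):
    """Divide, clip, round, multiply (scalar sketch)."""
    clip_min, clip_max = clip_bounds(bits, symmetric, keep_negative)
    scaled = x / quantization_scale                  # 1. divide by the scale
    clipped = max(clip_min, min(clip_max, scaled))   # 2. clip to the range
    rounded = round(clipped)                         # 3. round (half to even)
    return rounded * quantization_scale              # 4. multiply back


inside = linear_quantize(0.3, 0.25, bits=4)
saturated = linear_quantize(5.0, 0.25, bits=4)
```

Inputs inside the clip range snap to the nearest multiple of the quantization scale; inputs outside it saturate at the range boundary.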

For backprop purposes, the quantizer uses the straight-through estimator for the rounding step (https://arxiv.org/pdf/1903.05662.pdf). Thus the gradient of the __call__ method is 1 on the interval [quantization_scale * clip_min, quantization_scale * clip_max] and 0 elsewhere.

The quantizer also supports a number of other optional features:

  • Stochastic rounding (see the use_stochastic_rounding parameter)

  • Quantization noise (see the qnoise_factor parameter)

Notes on the various “scales” in quantized_linear:

  • The quantization scale is the scale used in the core computation (see above). You can access it via the quantization_scale attribute.

  • The data type scale is the scale determined by the type of data stored on hardware on a small device running a true quantized model. It is the quantization scale needed to represent bits bits, integer of which are integer bits, and one bit is reserved for the sign if keep_negative is True. It can be calculated as 2 ** (integer - bits + keep_negative). You can access it via the data_type_scale attribute.

  • The scale attribute stores the quotient of the quantization scale and the data type scale. This is also the scale that can be directly specified by the user, via the alpha parameter.

These three quantities are related by the equation scale = quantization_scale / data_type_scale.
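The relation between the three scales can be checked with a few lines of arithmetic. The helper names below are hypothetical; the formulas are the two stated above. The numbers chosen correspond to the 2-bit “auto” example in this class's documentation, where quantization_scale is 2 and scale is 4.

```python
def data_type_scale(bits, integer, keep_negative=True):
    """2 ** (integer - bits + keep_negative), as stated in the text."""
    return 2.0 ** (integer - bits + (1 if keep_negative else 0))


def scale_from_quantization_scale(quantization_scale, bits, integer,
                                  keep_negative=True):
    """scale = quantization_scale / data_type_scale."""
    return quantization_scale / data_type_scale(bits, integer, keep_negative)


dts_8_3 = data_type_scale(bits=8, integer=3)   # 2 ** -4
dts_2_0 = data_type_scale(bits=2, integer=0)   # 2 ** -1
scale_2bit = scale_from_quantization_scale(2.0, bits=2, integer=0)
```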

See the diagram below of scale usage in a quantized conv layer.

    data_type_scale ---------------> stored_weights
    (determines decimal point)            |
                                          V
                                       conv op
                                          |
                                          V
                                     accumulator
                                          |
    determines quantization               V
    range and precision -----------> quantization_scale
    (per channel)                         |
                                          V
                                      activation

TODO: The only fundamentally necessary scale is the quantization scale. We should consider removing the data type scale and scale attributes, but know that this will require rewriting much of how qtools and HLS4ML use these scale attributes.

Note on binary quantization (bits=1):

The core computation is modified here when keep_negative is True to perform a scaled sign function. This is needed because the core computation as defined above requires that 0 be mapped to 0, which does not allow us to keep both positive and negative outputs for binary quantization. Special shifting operations are used to achieve this.

Example usage:

    >>> import keras
    >>> from qkeras.quantizers import quantized_linear

    # 8-bit quantization with 3 integer bits
    >>> q = quantized_linear(8, 3)
    >>> x = keras.ops.array([0.0, 0.5, 1.0, 1.5, 2.0])
    >>> keras.ops.convert_to_numpy(q(x))
    array([0., 0., 1., 2., 2.], dtype=float32)

    # 2-bit quantization with "auto" and tensor alphas
    >>> q_auto = quantized_linear(2, alpha="auto")
    >>> x = keras.ops.array([0.0, 0.5, 1.0, 1.5, 2.0])
    >>> keras.ops.convert_to_numpy(q_auto(x))
    array([0., 0., 0., 2., 2.], dtype=float32)
    >>> keras.ops.convert_to_numpy(q_auto.scale)
    array([4.], dtype=float32)
    >>> keras.ops.convert_to_numpy(q_auto.quantization_scale)
    array([2.], dtype=float32)
    >>> q_fixed = quantized_linear(2, alpha=q_auto.scale)
    >>> keras.ops.convert_to_numpy(q_fixed(x))
    array([0., 0., 0., 2., 2.], dtype=float32)

Args:

bits (int): Number of bits to represent the number. Defaults to 8.

integer (int): Number of bits to the left of the decimal point, used for data_type_scale. Defaults to 0.

symmetric (bool): If true, we will have the same number of values for positive and negative numbers. Defaults to True.

alpha (str, Tensor, None): Instructions for determining the quantization scale. Defaults to None.

  • If None: the quantization scale is the data type scale, determined by integer, bits, and keep_negative.

  • If “auto”: the quantization scale is calculated as the minimum floating point scale per-channel that does not clip the max of x.

  • If “auto_po2”: the quantization scale is chosen as the power of two per-channel that minimizes squared error between the quantized x and the original x.

  • If Tensor: the quantization scale is the Tensor passed in multiplied by the data type scale.

keep_negative (bool): If false, we clip negative numbers. Defaults to True.

use_stochastic_rounding (bool): If true, we perform stochastic rounding (https://arxiv.org/pdf/1502.02551.pdf).

scale_axis (int, None): Which axis to calculate scale from. If None, we perform per-channel scaling based off of the image data format. Note that each entry of a rank-1 tensor is considered its own channel by default. See _get_scaling_axis for more details. Defaults to None.

qnoise_factor (float): A scalar from 0 to 1 that represents the level of quantization noise to add. This controls the amount of the quantization noise to add to the outputs by changing the weighted sum of (1 - qnoise_factor) * unquantized_x + qnoise_factor * quantized_x. Defaults to 1.0, which means that the result is fully quantized.

use_variables (bool): If true, we use Variables to store certain parameters. See the BaseQuantizer implementation for more details. Defaults to False. If set to True, be sure to use the special attribute update methods detailed in the BaseQuantizer.

var_name (str or None): A variable name shared between the Variables created on initialization, if use_variables is true. If None, the variable names are generated automatically based on the parameter names along with a uid. Defaults to None.

Returns:

Function that computes linear quantization.

Return type:

function

Raises:

ValueError

  • If bits is not positive, or is too small to represent integer.

  • If integer is negative.

  • If alpha is a string but not one of (“auto”, “auto_po2”).

ALPHA_STRING_OPTIONS = ('auto', 'auto_po2')
property auto_alpha

Returns true if using a data-dependent alpha

property bits
property data_type_scale

Quantization scale for the data type

property default_quantization_scale

Calculate and set quantization_scale default

classmethod from_config(config)[source]
get_clip_bounds()[source]

Get bounds of clip range

get_config()[source]
property integer
property keep_negative
max()[source]

Get maximum value that quantized_linear class can represent.

min()[source]

Get minimum value that quantized_linear class can represent.

range()[source]

Returns a list of all values that quantized_linear can represent.

property scale
property scale_axis
property use_sign_function

Return true if using sign function for quantization

property use_stochastic_rounding
property use_variables
class qkeras.quantizers.quantized_po2(bits=8, max_value=None, use_stochastic_rounding=False, quadratic_approximation=False, log2_rounding='rnd', qnoise_factor=1.0, var_name=None, use_ste=True, use_variables=False)[source]

Bases: BaseQuantizer

Quantizes to the closest power of 2.
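As a rough picture of what rounding to a power of two means, the sketch below rounds log2(|x|) and restores the sign. It is not the library implementation: nearest_power_of_two is a hypothetical name, the limits imposed by bits and max_value are ignored, "closest" is measured in the log2 domain, and passing zero through unchanged is an assumption made only so the sketch is defined for every input.

```python
import math


def nearest_power_of_two(x, log2_rounding="rnd"):
    """Round |x| to a power of two in the log2 domain and keep the sign."""
    if x == 0:
        return 0.0  # assumption: zero is passed through in this sketch
    exponent = math.log2(abs(x))
    if log2_rounding == "rnd":
        exponent = round(exponent)
    elif log2_rounding == "floor":
        exponent = math.floor(exponent)
    else:
        raise ValueError("log2_rounding must be 'rnd' or 'floor'")
    return math.copysign(2.0 ** exponent, x)


rounded_up = nearest_power_of_two(3.0)             # log2(3) ~ 1.58 -> 2
floored = nearest_power_of_two(3.0, "floor")       # floor(1.58) -> 1
negative = nearest_power_of_two(-0.3)              # log2(0.3) ~ -1.74 -> -2
```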

bits

An integer, the bits allocated for the exponent, its sign and the sign of x.

max_value

A float or None. If None, no max_value is specified. Otherwise, the maximum value of quantized_po2 <= max_value.

use_stochastic_rounding

A boolean, default is False. If True, it uses stochastic rounding and forces the mean of x to be x statistically.

quadratic_approximation

A boolean, default is False. If True, it forces the exponent to be the even number that is closest to x.

log2_rounding

A string, log2 rounding mode. “rnd” and “floor” currently supported, corresponding to keras.ops.round and keras.ops.floor respectively.

qnoise_factor

float. a scalar from 0 to 1 that represents the level of quantization noise to add. This controls the amount of the quantization noise to add to the outputs by changing the weighted sum of (1 - qnoise_factor)*unquantized_x + qnoise_factor*quantized_x.

var_name

String or None. A variable name shared between the Variables created in the build function. If None, it is generated automatically.

use_ste

Bool. Whether to use “straight-through estimator” (STE) method or not.

use_variables

Bool. Whether to make the quantizer variables to be dynamic Variables or not.

classmethod from_config(config)[source]
get_config()[source]

Gets configuration of the quantizer.

Returns:

A dict mapping quantization configuration, including:

  • bits: bitwidth for exponents.

  • max_value: the maximum value that this quantized_po2 can represent.

  • use_stochastic_rounding: if True, stochastic rounding is used.

  • quadratic_approximation: if True, the exponent is enforced to be an even number, which is the closest one to x.

  • log2_rounding: a string, the log2 rounding mode.

max()[source]

Get the maximum value that quantized_po2 can represent.

min()[source]

Get the minimum value that quantized_po2 can represent.

class qkeras.quantizers.quantized_relu(bits=8, integer=0, use_sigmoid=0, negative_slope=0.0, use_stochastic_rounding=False, relu_upper_bound=None, is_quantized_clip=True, qnoise_factor=1.0, var_name=None, use_ste=True, use_variables=False)[source]

Bases: BaseQuantizer

Computes a quantized relu to a number of bits.

Modified from:

[https://github.com/BertMoons/QuantizedNeuralNetworks-Keras-Tensorflow]

Assume h(x) = +1 with p = sigmoid(x), -1 otherwise. The expected value of h(x) is:

E[h(x)] = +1 P(p <= sigmoid(x)) - 1 P(p > sigmoid(x))
        = +1 P(p <= sigmoid(x)) - 1 (1 - P(p <= sigmoid(x)))
        = 2 P(p <= sigmoid(x)) - 1
        = 2 sigmoid(x) - 1, if p is sampled from a uniform distribution U[0,1]

If use_sigmoid is 0, we just keep the positive numbers up to 2**integer * (1 - 2**(-bits)) instead of normalizing them, which is easier to implement in hardware.
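The upper limit quoted above, 2**integer * (1 - 2**(-bits)), falls out of a simple unsigned fixed-point grid. The sketch below illustrates the use_sigmoid=0 case on a scalar; the function names are hypothetical, and negative_slope, relu_upper_bound, stochastic rounding, and quantization noise are not modeled.

```python
def relu_quantize(x, bits=8, integer=0):
    """Clip x to [0, max] and snap it to an unsigned fixed-point grid."""
    m = 2 ** bits        # number of levels; all bits hold magnitude
    m_i = 2 ** integer   # range extension from the integer bits
    level = round(x * m / m_i)          # Python's round() is half-to-even
    level = max(0, min(m - 1, level))   # relu at 0, saturate at the top level
    return m_i * level / m


def relu_max(bits=8, integer=0):
    """Largest representable value, as given in the text."""
    return 2 ** integer * (1 - 2 ** (-bits))


mid_value = relu_quantize(0.3, bits=4)
top_value = relu_quantize(2.0, bits=4)
```

With bits=4 and integer=0 the grid step is 1/16, so the largest output is 15/16, which agrees with relu_max(4, 0).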

bits

number of bits to perform quantization.

integer

number of bits to the left of the decimal point.

use_sigmoid

if true, we apply sigmoid to input to normalize it.

negative_slope

slope when activation < 0, needs to be power of 2.

use_stochastic_rounding

if true, we perform stochastic rounding.

relu_upper_bound

A float representing an upper bound of the unquantized relu. If None, we apply relu without the upper bound when “is_quantized_clip” is set to false (true by default). Note: The quantized relu uses the quantization parameters (bits and integer) to upper bound. So it is important to set relu_upper_bound appropriately to the quantization parameters. “is_quantized_clip” has precedence over “relu_upper_bound” for backward compatibility.

is_quantized_clip

A boolean representing whether the inputs are clipped to the maximum value represented by the quantization parameters. This parameter is deprecated, and the default is set to True for backwards compatibility. Users are encouraged to use “relu_upper_bound” instead.

qnoise_factor

float. a scalar from 0 to 1 that represents the level of quantization noise to add. This controls the amount of the quantization noise to add to the outputs by changing the weighted sum of (1 - qnoise_factor)*unquantized_x + qnoise_factor*quantized_x.

var_name

String or None. A variable name shared between the Variables created in the build function. If None, it is generated automatically.

use_ste

Bool. Whether to use “straight-through estimator” (STE) method or not.

use_variables

Bool. Whether to make the quantizer variables to be dynamic Variables or not.

Returns:

Function that performs relu + quantization to bits >= 0.

classmethod from_config(config)[source]
get_config()[source]
max()[source]

Get the maximum value that quantized_relu can represent.

min()[source]

Get the minimum value that quantized_relu can represent.

range()[source]

Returns a list of all values that quantized_relu can represent ordered by their binary representation ascending.

class qkeras.quantizers.quantized_relu_po2(bits=8, max_value=None, negative_slope=0, use_stochastic_rounding=False, quadratic_approximation=False, log2_rounding='rnd', qnoise_factor=1.0, var_name=None, use_ste=True, use_variables=False)[source]

Bases: BaseQuantizer

Quantizes x to the closest power of 2 when x > 0.

bits

An integer, the bits allocated for the exponent and its sign.

max_value

default is None, or a non-negative value to put a constraint for the max value.

negative_slope

slope when activation < 0, needs to be power of 2.

use_stochastic_rounding

A boolean, default is False. If True, it uses stochastic rounding and forces the mean of x to be x statistically.

quadratic_approximation

A boolean, default is False. If True, it forces the exponent to be the even number that is closest to x.

log2_rounding

A string, log2 rounding mode. “rnd” and “floor” currently supported, corresponding to keras.ops.round and keras.ops.floor respectively.

qnoise_factor

float. a scalar from 0 to 1 that represents the level of quantization noise to add. This controls the amount of the quantization noise to add to the outputs by changing the weighted sum of (1 - qnoise_factor)*unquantized_x + qnoise_factor*quantized_x.

var_name

String or None. A variable name shared between the Variables created in the build function. If None, it is generated automatically.

use_ste

Bool. Whether to use “straight-through estimator” (STE) method or not.

use_variables

Bool. Whether to make the quantizer variables to be dynamic Variables or not.

classmethod from_config(config)[source]
get_config()[source]

Gets configuration of the quantizer.

Returns:

A dict mapping quantization configuration, including:

  • bits: bitwidth for exponents.

  • max_value: the maximum value that this quantized_relu_po2 can represent.

  • use_stochastic_rounding: if True, stochastic rounding is used.

  • quadratic_approximation: if True, the exponent is enforced to be an even number, which is the closest one to x.

  • log2_rounding: a string, the log2 rounding mode.

max()[source]

Get the maximum value that quantized_relu_po2 can represent.

min()[source]

Get the minimum value that quantized_relu_po2 can represent.

class qkeras.quantizers.quantized_sigmoid(bits=8, symmetric=False, use_real_sigmoid=False, use_stochastic_rounding=False)[source]

Bases: BaseQuantizer

Computes a quantized sigmoid to a number of bits.

bits

number of bits to perform quantization.

symmetric

if true, we will have the same number of values for positive and negative numbers.

use_real_sigmoid

if true, will use the sigmoid from Keras backend

use_stochastic_rounding

if true, we perform stochastic rounding.

Returns:

Function that performs sigmoid + quantization to bits in the range 0.0 to 1.0.

classmethod from_config(config)[source]
get_config()[source]
max()[source]

Get the maximum value that quantized_sigmoid can represent.

min()[source]

Get the minimum value that quantized_sigmoid can represent.

class qkeras.quantizers.quantized_tanh(bits=8, use_stochastic_rounding=False, symmetric=False, use_real_tanh=False)[source]

Bases: BaseQuantizer

Computes a quantized tanh to a number of bits.

Modified from:

[https://github.com/BertMoons/QuantizedNeuralNetworks-Keras-Tensorflow]

bits

number of bits to perform quantization.

use_stochastic_rounding

if true, we perform stochastic rounding.

symmetric

if true, we will have the same number of values for positive and negative numbers.

use_real_tanh

if true, use the tanh function from Keras backend, if false, use tanh that is defined as 2 * sigmoid(x) - 1

Returns:

Function that performs tanh + quantization to bits in the range -1.0 to 1.0.

classmethod from_config(config)[source]
get_config()[source]
max()[source]

Get the maximum value that quantized_tanh can represent.

min()[source]

Get the minimum value that quantized_tanh can represent.

class qkeras.quantizers.quantized_ulaw(bits=8, integer=0, symmetric=0, u=255.0)[source]

Bases: BaseQuantizer

Computes a u-law quantization.
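For orientation, the standard u-law (mu-law) companding curve is sign(x) * ln(1 + u|x|) / ln(1 + u) for x in [-1, 1]. The sketch below evaluates only that textbook formula; how the library combines it with the bits, integer, and symmetric parameters is not stated here, so the quantization step is deliberately omitted and mu_law_compress is a hypothetical name.

```python
import math


def mu_law_compress(x, u=255.0):
    """Textbook u-law companding of a scalar in [-1, 1]."""
    x = max(-1.0, min(1.0, x))  # the curve is defined on [-1, 1]
    magnitude = math.log1p(u * abs(x)) / math.log1p(u)
    return math.copysign(magnitude, x)


small_input = mu_law_compress(0.1)
full_scale = mu_law_compress(1.0)
```

The curve expands small magnitudes (0.1 maps to roughly 0.59 with u=255), which is why a uniform quantizer applied afterwards spends more levels near zero.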

bits

number of bits to perform quantization.

integer

number of bits to the left of the decimal point.

symmetric

if true, we will have the same number of values for positive and negative numbers.

u

parameter of u-law

Returns:

Function that performs ulaw + quantization to bits in the range -1.0 to 1.0.

classmethod from_config(config)[source]
get_config()[source]
max()[source]

Get the maximum value that quantized_ulaw can represent.

min()[source]

Get the minimum value that quantized_ulaw can represent.

qkeras.quantizers.set_internal_sigmoid(mode)[source]

Sets _sigmoid to either real, hard or smooth.

qkeras.quantizers.smooth_sigmoid(x)[source]

Implements a linear approximation of a sigmoid function.

qkeras.quantizers.smooth_tanh(x)[source]

Computes smooth_tanh function that saturates between -1 and 1.

class qkeras.quantizers.stochastic_binary(alpha=None, temperature=6.0, use_real_sigmoid=True)[source]

Bases: binary

Computes a stochastic activation function returning -alpha or +alpha.

Computes straight-through approximation using random sampling to make E[dL/dy] = E[dL/dx], and computing the sign function. See explanation above.

x

tensor to perform sign operation with stochastic sampling.

alpha

binary is -alpha or +alpha, or “auto” or “auto_po2”.

bits

number of bits to perform quantization.

temperature

amplifier factor for sigmoid function, making stochastic behavior less stochastic as it moves away from 0.

use_real_sigmoid

use the real sigmoid from TensorFlow for the probability.

Returns:

Computation of sign with stochastic sampling with straight through gradient.

classmethod from_config(config)[source]
get_config()[source]
max()[source]

Get the maximum value that stochastic_binary can represent.

min()[source]

Get the minimum value that stochastic_binary can represent.

qkeras.quantizers.stochastic_round(x, precision=0.5)[source]

Performs stochastic rounding to the first decimal point.
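The general idea of stochastic rounding can be sketched as follows: round up with probability equal to the fractional part, so the result equals the input on average. This is a generic illustration with a hypothetical helper name; it rounds to integers and does not model the precision argument of the library function.

```python
import math
import random


def stochastic_round_to_int(x, rng=random):
    """Round x down or up at random so that the expected result is x."""
    lower = math.floor(x)
    frac = x - lower  # probability of rounding up
    return lower + (1 if rng.random() < frac else 0)


rng = random.Random(0)
draws = [stochastic_round_to_int(0.3, rng=rng) for _ in range(20000)]
mean_value = sum(draws) / len(draws)
```

Each individual draw is 0 or 1, but mean_value approaches 0.3 as the number of draws grows, which is the unbiasedness property that deterministic rounding lacks.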

qkeras.quantizers.stochastic_round_po2(x)[source]

Performs stochastic rounding for the power of two.

class qkeras.quantizers.stochastic_ternary(alpha=None, threshold=None, temperature=8.0, use_real_sigmoid=True, number_of_unrolls=5)[source]

Bases: ternary

Computes a stochastic activation function returning -alpha, 0 or +alpha.

Computes straight-through approximation using random sampling to make E[dL/dy] = E[dL/dx], and computing the sign function. See explanation above.

x

tensor to perform sign operation with stochastic sampling.

bits

number of bits to perform quantization.

alpha

ternary is -alpha or +alpha, or “auto” or “auto_po2”.

threshold

(1-threshold) specifies the spread of the +1 and -1 values.

temperature

amplifier factor for the sigmoid function, making the sampling less stochastic as x moves away from 0.

use_real_sigmoid

use real sigmoid for probability.

number_of_unrolls

number of times we iterate between scale and threshold.

Returns:

Computation of sign with stochastic sampling with straight through gradient.

classmethod from_config(config)[source]
get_config()[source]
max()[source]

Get the maximum value that stochastic_ternary can represent.

min()[source]

Get the minimum value that stochastic_ternary can represent.

class qkeras.quantizers.ternary(alpha=None, threshold=None, use_stochastic_rounding=False, number_of_unrolls=5)[source]

Bases: BaseQuantizer

Computes an activation function returning -alpha, 0 or +alpha.

Right now we assume two types of behavior. For parameters, we should have alpha, threshold and stochastic rounding on. For activations, alpha and threshold should be floating point numbers, and stochastic rounding should be off.
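For the activation case with fixed floating-point alpha and threshold, the mapping to -alpha, 0 or +alpha can be sketched on a scalar as below. The helper name and the default values are hypothetical, and the “auto” modes, stochastic rounding, and gradient handling are not modeled.

```python
def ternary_quantize(x, alpha=1.0, threshold=0.33):
    """Map a scalar to -alpha, 0 or +alpha using a dead band around zero."""
    if x > threshold:
        return alpha
    if x < -threshold:
        return -alpha
    return 0.0  # inside the dead band [-threshold, threshold]


outputs = [ternary_quantize(v) for v in (-0.5, -0.2, 0.0, 0.2, 0.5)]
```

A wider threshold sends more inputs to zero, which is the "dropout" or dead-band effect mentioned in the threshold parameter description.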

x

tensor to perform sign operation with stochastic sampling.

bits

number of bits to perform quantization.

alpha

ternary is -alpha or +alpha. Alpha can be “auto” or “auto_po2”.

threshold

threshold to apply “dropout” or dead band (0 value). If “auto” is specified, we will compute it per output layer.

use_stochastic_rounding

if true, we perform stochastic rounding.

Returns:

Computation of sign within the threshold.

classmethod from_config(config)[source]
get_config()[source]
max()[source]

Get the maximum value that ternary can represent.

min()[source]

Get the minimum value that ternary can represent.