qkeras.utils

qkeras.utils.add_bn_fusing_weights(prev_layer, bn_layer, saved_weights)[source]

Adds additional fusing weights to saved_weights.

In hardware inference, the previous layer’s output must be fused with the following batchnorm op. The final output of the previous layer and the bn layer is

z[i] = bn(y[i]) = inv[i] * y’[i] * scale[i] + fused_bias[i]

with:
  • inv[i] = gamma[i] * rsqrt(variance[i] + epsilon), computed from the bn layer weights

  • y’[i] is the i-th channel output from the previous layer (before scale)

  • scale[i] is the i-th channel kernel quantizer scale

  • fused_bias[i] = inv[i] * bias[i] + beta[i] - inv[i] * mean[i], where bias is the bias term from the previous layer, and beta and mean are the bn layer weights.
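The fusing relation can be checked numerically for a single channel. The sketch below is not the qkeras implementation: it assumes standard batch normalization, bn(y) = gamma * (y - mean) * rsqrt(variance + epsilon) + beta, applied to y = y’ * scale + bias. The helper names and all numeric values are made up for illustration.

```python
import math


def batchnorm(y, gamma, beta, mean, variance, epsilon):
    """Standard batch normalization for one channel (assumed form)."""
    return gamma * (y - mean) / math.sqrt(variance + epsilon) + beta


def fused_output(y_prime, scale, bias, gamma, beta, mean, variance, epsilon):
    """Fused form: one multiplier (inv * scale) and one additive term."""
    inv = gamma / math.sqrt(variance + epsilon)
    fused_bias = inv * bias + beta - inv * mean
    return inv * y_prime * scale + fused_bias


# Illustrative values for one channel (not taken from any real model).
y_prime, scale, bias = 12.0, 0.25, 0.5
gamma, beta, mean, variance, epsilon = 1.5, -0.2, 0.7, 4.0, 1e-3

# Unfused path: rescale the integer-domain output, add bias, then apply bn.
unfused = batchnorm(y_prime * scale + bias, gamma, beta, mean, variance, epsilon)
fused = fused_output(y_prime, scale, bias, gamma, beta, mean, variance, epsilon)

assert math.isclose(unfused, fused, rel_tol=1e-12)
```

Under these assumptions the two paths agree, so hardware only needs inv[i], scale[i] and the fused bias per channel.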

Parameters:
  • prev_layer – qkeras layer, could be QConv2D/QDepthwiseConv2D/QDense.

  • bn_layer – The following QBatchNormalization layer that needs to be fused with the previous layer.

  • saved_weights – Dict. The centralized weights dictionary that exports relevant weights and parameters for hardware inference.

qkeras.utils.clone_model(model, custom_objects=None)[source]

Clone a qkeras model safely.

qkeras.utils.clone_model_and_freeze_auto_po2_scale(orig_model, orig_model_path=None, quantize_model_weights=False)[source]

Clone model and freeze the scale value of auto_po2 type quantizers.

Parameters:
  • orig_model – original model which will be used to clone the new model. If set to None, the function will load the original model from the orig_model_path argument.

  • orig_model_path – The path to the original model file. If set to None, the function will use the model passed in the orig_model argument.

  • quantize_model_weights – Bool to quantize weights to HW format. If set to False, the model weights will be in float format. If set to True, the model weights will be in HW format and the function will also check whether the hw weights extracted from the new model match the original model.

Returns:

A tuple of the new model and the new model’s hw weights.

Note

  • When using this function to retrain a model with fixed scale values, set quantize_model_weights to False.

  • This function only supports a collection of common layers that will use auto_po2 quantizers. For less common layers, it will raise errors and we will add more support case by case.

Example usage:

    from qkeras.utils import clone_model_and_freeze_auto_po2_scale

    model, _ = clone_model_and_freeze_auto_po2_scale(
        orig_model_path="path/to/model", quantize_model_weights=False)

qkeras.utils.clone_optimizer(opt)[source]

qkeras.utils.convert_to_folded_model(model)[source]

Find Conv/Dense layers followed by BN layers and fold them.

Parameters:

model – Keras model instance.

Returns:

A tuple (new_model, layers_to_fold):
  • new_model – Keras model with folded BN layers.

  • layers_to_fold – list of folded layer names.

qkeras.utils.find_bn_fusing_layer_pair(model, custom_objects={})[source]

Finds layers that can be fused with the following batchnorm layers.

Parameters:
  • model – input model

  • custom_objects – Dict of model specific objects needed for cloning.

Returns:

Dict that marks all the layer pairs that need to be fused.

Note: supports both sequential and non-sequential models.
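To illustrate the idea of pairing a layer with the batchnorm layer that follows it, here is a minimal sketch over a plain description of a model graph. It does not use qkeras and does not reproduce the format of the dict this function returns. The tuple-based graph description, the helper name find_fusing_pairs, and the rule that a layer is fusable only when its single consumer is a batchnorm layer are all assumptions made for the example. The fusable class names are the ones listed for add_bn_fusing_weights.

```python
# Layer classes that can precede a fused batchnorm (from add_bn_fusing_weights).
FUSABLE_CLASSES = {"QConv2D", "QDepthwiseConv2D", "QDense"}
BN_CLASSES = {"QBatchNormalization"}


def find_fusing_pairs(layers):
    """Map each fusable layer name to the batchnorm layer that follows it.

    `layers` is a hypothetical graph description: a list of
    (name, class_name, inbound_layer_names) tuples.
    """
    class_of = {name: cls for name, cls, _ in layers}
    consumers = {name: [] for name, _, _ in layers}
    for name, _, inbound in layers:
        for src in inbound:
            consumers[src].append(name)

    pairs = {}
    for name, cls, _ in layers:
        outs = consumers[name]
        # Assumed rule: fuse only when the sole consumer is a batchnorm layer.
        if cls in FUSABLE_CLASSES and len(outs) == 1 and class_of[outs[0]] in BN_CLASSES:
            pairs[name] = outs[0]
    return pairs


# A small non-sequential graph with a skip connection from bn1 to add.
layers = [
    ("input", "InputLayer", []),
    ("conv1", "QConv2D", ["input"]),
    ("bn1", "QBatchNormalization", ["conv1"]),
    ("conv2", "QConv2D", ["bn1"]),
    ("act", "QActivation", ["conv2"]),
    ("add", "Add", ["bn1", "act"]),
    ("dense", "QDense", ["add"]),
    ("bn2", "QBatchNormalization", ["dense"]),
]
pairs = find_fusing_pairs(layers)
```

Here conv1 and dense are paired with bn1 and bn2, while conv2 is skipped because it feeds an activation rather than a batchnorm layer.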

qkeras.utils.get_config(quantizer_config, layer, layer_class, parameter=None)[source]

Returns the result of searching for a quantizer in quantizer_config.

qkeras.utils.get_model_sparsity(model, per_layer=False, allow_list=None)[source]

Calculates the sparsity of the model’s weights and biases.

Quantizes the model weights using model_save_quantized_weights (but does not save the quantized weights) before calculating the proportion of weights and biases set to zero.

Parameters:
  • model – The model to use to calculate sparsity. Assumes that this is a qkeras model with trained weights.

  • per_layer – Whether to return a per-layer breakdown of sparsity.

  • allow_list – A list of layer class names that sparsity will be calculated for. If set to None, a default list will be used.

Returns:

A float value representing the proportion of weights and biases set to zero in the quantized model. If per_layer is True, it also returns a per-layer breakdown of model sparsity formatted as a list of tuples in the form (<layer name>, <sparsity proportion>)
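The return value described above can be illustrated with a small stand-alone sketch. It does not call qkeras: it assumes the quantized weights and biases of each layer are already available as a flat list of numbers, and the helper name model_sparsity and the sample values are hypothetical.

```python
def model_sparsity(weights_by_layer, per_layer=False):
    """Proportion of zero entries across all layers.

    `weights_by_layer` maps a layer name to a flat list holding that
    layer's (already quantized) weights and biases.
    """
    breakdown = []
    total_zeros = 0
    total_count = 0
    for name, values in weights_by_layer.items():
        zeros = sum(1 for v in values if v == 0)
        total_zeros += zeros
        total_count += len(values)
        breakdown.append((name, zeros / len(values) if values else 0.0))

    total = total_zeros / total_count if total_count else 0.0
    # With per_layer=True, also return (<layer name>, <sparsity proportion>) tuples.
    return (total, breakdown) if per_layer else total


# Hypothetical quantized values for two layers.
quantized = {
    "dense": [0.0, 0.0, 1.0, -1.0],
    "conv": [0.0, 2.0, 0.0, 0.0, 0.0, 0.5],
}
overall = model_sparsity(quantized)
overall_again, per_layer_breakdown = model_sparsity(quantized, per_layer=True)
```

In this example 6 of the 10 values are zero, and the per-layer breakdown reports each layer's own proportion separately.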

qkeras.utils.get_y_from_TFOpLambda(model_cfg, layer)[source]

Get the value of “y” from the TFOpLambda layer’s configuration.

Parameters:
  • model_cfg – dictionary type, model.get_config() output

  • layer – a given layer instance

Returns:

Value of “y” for a TFOpLambda layer. Here “y” corresponds to how TensorFlow stores the TFOpLambda layer parameter in serialization. For example, for TFOpLambda(func), where func is knp.multiply(input_tensor, 3), “y” would be the value 3.

qkeras.utils.is_TFOpLambda_layer(layer)[source]

qkeras.utils.load_qmodel(filepath, custom_objects=None, compile=True)[source]

Loads quantized model from Keras’s model.save() h5 file.

Parameters:
  • filepath – one of the following:
      - string, path to the saved model
      - h5py.File or h5py.Group object from which to load the model
      - any file-like object implementing the method read that returns bytes data (e.g. io.BytesIO) that represents a valid h5py file image.

  • custom_objects – Optional dictionary mapping names (strings) to custom classes or functions to be considered during deserialization.

  • compile – Boolean, whether to compile the model after loading.

Returns:

A Keras model instance. If an optimizer was found as part of the saved model, the model is already compiled. Otherwise, the model is uncompiled and a warning will be displayed. When compile is set to False, the compilation is omitted without any warning.

qkeras.utils.model_quantize(model, quantizer_config, activation_bits, custom_objects=None, transfer_weights=False, prefer_qadaptiveactivation=False, enable_bn_folding=False)[source]

Creates a quantized model from a non-quantized model.

The quantized model translation is based on json interface of Keras, which requires a custom_objects dictionary for “string” types.

Because of the way json works, we pass “string” objects for the quantization mechanisms and we perform an eval(“string”) which technically is not safe, but it will do the job.

The quantizer_config is a dictionary with the following form:

    {
      Dense_layer_name: {
        "kernel_quantizer": "quantizer string",
        "bias_quantizer": "quantizer string"
      },
      Conv2D_layer_name: {
        "kernel_quantizer": "quantizer string",
        "bias_quantizer": "quantizer string"
      },
      Activation_layer_name: "quantizer string",
      "QActivation": { "relu": "quantizer string" },
      "QConv2D": {
        "kernel_quantizer": "quantizer string",
        "bias_quantizer": "quantizer string"
      },
      "QBatchNormalization": {}
    }

In the case of “QBidirectional”, we can follow the same form as above. The specified configuration will be used for both the forward and the backward layer:

    {
      "Bidirectional": {
        "kernel_quantizer": "quantizer string",
        "bias_quantizer": "quantizer string",
        "recurrent_quantizer": "quantizer string"
      }
    }

In the case of “QActivation”, we can modify only certain types of activations, for example “relu”, in which case the activation name is represented by a dictionary; or we can modify all activations, without representing them as a set.

We currently require a default case for when a layer name cannot be found. This simplifies the dictionary because, in the simplest case, we can just say:

    {
      "default": {
        "kernel": "quantized_bits(4)",
        "bias": "quantized_bits(4)"
      }
    }

and this will quantize the weights and biases of all layers to 4 bits.
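As a concrete illustration of the per-layer and per-class forms described above, the following sketch builds such a dictionary in Python. The layer names (dense_1, conv2d_1, act_1) are hypothetical, and the quantizer strings are assumed to name qkeras quantizers; nothing here calls model_quantize itself. Because the quantizers are plain strings, the dictionary survives a JSON round trip unchanged.

```python
import json

quantizer_config = {
    # Entries keyed by layer name (hypothetical names for this example).
    "dense_1": {
        "kernel_quantizer": "quantized_bits(4,0,1)",
        "bias_quantizer": "quantized_bits(4)",
    },
    "conv2d_1": {
        "kernel_quantizer": "quantized_bits(4,0,1)",
        "bias_quantizer": "quantized_bits(4)",
    },
    # An activation layer maps directly to a quantizer string.
    "act_1": "quantized_relu(4)",
    # Entries keyed by layer class apply to layers without a named entry.
    "QActivation": {"relu": "quantized_relu(4)"},
    "QConv2D": {
        "kernel_quantizer": "quantized_bits(4,0,1)",
        "bias_quantizer": "quantized_bits(4)",
    },
    "QBatchNormalization": {},
}

# Quantizers are strings, so the config can be serialized and restored as JSON.
restored = json.loads(json.dumps(quantizer_config))
```

Keeping every quantizer as a string is what allows the configuration to pass through the JSON interface mentioned above.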

Parameters:
  • model – model to be quantized

  • quantizer_config – dictionary (as above) with quantized parameters

  • activation_bits – number of bits for quantized_relu, quantized_tanh, quantized_sigmoid

  • custom_objects – dictionary following keras recommendations for json translation.

  • transfer_weights – if true, weights are to be transferred from model to qmodel.

  • prefer_qadaptiveactivation – Bool. If true, try to use QAdaptiveActivation over QActivation whenever possible

  • enable_bn_folding – Bool. If true, fold conv/dense layers with the following batch normalization layers whenever possible. For example, use QConv2DBatchnorm to replace conv2d layers.

Returns:

qmodel with quantized operations and custom_objects.

qkeras.utils.model_save_quantized_weights(model, filename=None, custom_objects={})[source]

Quantizes model for inference and saves it.

Takes a model with weights, applies the quantization function to the weights, and returns a dictionary with the quantized weights.

Users should be aware that “po2” quantization functions cannot really be quantized in a meaningful way in Keras. So, in order to preserve compatibility with the inference flow in Keras, we do not convert “po2” weights and biases to exponents + signs (in case of quantize_po2), but return instead (-1)**sign*(2**round(log2(x))). In the returned dictionary, we will return the pair (sign, round(log2(x))).
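The relation between a value, its (sign, exponent) pair, and the reconstructed power-of-two value can be sketched as follows. This is an illustration of the formula only, not the qkeras quantizer: it assumes sign is 1 for negative values (so that (-1)**sign restores the sign), takes log2 of the magnitude, ignores zero and any clipping of the exponent range, and uses hypothetical helper names.

```python
import math


def to_po2_pair(x):
    """Return (sign, exponent) for a nonzero x; sign is 1 for negative values."""
    sign = 1 if x < 0 else 0
    exponent = round(math.log2(abs(x)))
    return sign, exponent


def from_po2_pair(sign, exponent):
    """Rebuild the power-of-two value (-1)**sign * 2**exponent."""
    return (-1) ** sign * 2.0 ** exponent


weights = [0.3, -0.3, 3.0, -5.0]
pairs = [to_po2_pair(w) for w in weights]
values = [from_po2_pair(s, e) for s, e in pairs]
```

For instance, 0.3 maps to the pair (0, -2) and is reconstructed as 0.25, the nearest power of two on a log scale.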

Special care needs to be given to quantized_bits(alpha="auto_po2") as well. Since hardware needs both the integer weights and the scale for inference with this quantizer, this function will return the pair (scale, integer_weights) in the returned dictionary.

Parameters:
  • model – model with weights to be quantized.

  • filename – if specified, we will save the hdf5 containing the quantized weights so that we can use them for inference later on.

  • custom_objects – Dict of model specific objects needed to load/store.

Returns:

dictionary containing layer name and quantized weights that can be used by a hardware generator.

qkeras.utils.print_model_sparsity(model)[source]

Prints sparsity for the pruned layers in the model.

qkeras.utils.quantize_activation(layer_config, activation_bits)[source]

Replaces activation by quantized activation functions.

qkeras.utils.quantized_model_debug(model, X_test, plot=False, plt_instance=None)[source]

Debugs and plots model weights and activations.

Parameters:
  • model – The qkeras model to debug

  • X_test – The sample data to pass to model.predict

  • plot – Bool. Whether to plot the results.

  • plt_instance – A matplotlib.pyplot instance used to plot in an IPython environment.

qkeras.utils.quantized_model_dump(model, x_test, output_dir=None, layers_to_dump=[])[source]

Dumps tensors of target layers to binary files.

Parameters:
  • model – qkeras model object.

  • x_test – numpy type, test tensors to generate output tensors.

  • output_dir – a string for the directory to hold binary data.

  • layers_to_dump – a list of strings specifying the layers to dump by their custom layer names.

qkeras.utils.quantized_model_from_json(json_string, custom_objects=None)[source]

qkeras.utils.remove_mask_keys(obj)[source]

Modules