napkinxc.models.BR

class napkinxc.models.BR(output, hash=None, features_threshold=0, norm=True, bias=1.0, optimizer='liblinear', loss='log', weights_threshold=0.1, liblinear_c=10, liblinear_eps=0.1, liblinear_solver=None, liblinear_max_iter=100, eta=1.0, epochs=1, adagrad_eps=0.001, load_as='map', threads=0, mem_limit=0, verbose=0, **kwargs)[source]

Bases: Model

Binary Relevance (multi-label) classifier with linear estimators, using the C++ core

__init__(output, hash=None, features_threshold=0, norm=True, bias=1.0, optimizer='liblinear', loss='log', weights_threshold=0.1, liblinear_c=10, liblinear_eps=0.1, liblinear_solver=None, liblinear_max_iter=100, eta=1.0, epochs=1, adagrad_eps=0.001, load_as='map', threads=0, mem_limit=0, verbose=0, **kwargs)[source]

Construct a Binary Relevance model.

Parameters:
  • output (str) – Directory where the model will be stored

  • hash (int, optional) – Hash features to a space of the given size; the value of this argument is saved with the model weights; if None or 0, hashing is disabled, defaults to None

  • features_threshold (float, optional) – Prune features below the given threshold; the value of this argument is saved with the model weights, defaults to 0

  • norm (bool, optional) – Normalize feature vectors to unit norm; the value of this argument is saved with the model weights, defaults to True

  • bias (float, optional) – Value of the bias feature; the value of this argument is saved with the model weights, defaults to 1.0

  • optimizer (str, optional) – Optimizer used for training the underlying binary classifiers {'liblinear', 'sgd', 'adagrad'}, defaults to 'liblinear'

  • loss (str, optional) – Loss optimized while training the underlying binary classifiers {'log' (alias 'logistic'), 'l2' (alias 'squaredHinge')}, defaults to 'log'

  • weights_threshold (float, optional) – Threshold value for pruning weights, defaults to 0.1

  • liblinear_c (float, optional) – LIBLINEAR cost coefficient (inverse regularization strength); smaller values specify stronger regularization; takes effect only when optimizer='liblinear', defaults to 10.0

  • liblinear_eps (float, optional) – LIBLINEAR tolerance of the termination criterion; takes effect only when optimizer='liblinear', defaults to 0.1

  • liblinear_solver (str, optional) –

    Override the LIBLINEAR solver set by the loss parameter (default for loss='log': 'L2R_LR_DUAL', for loss='l2': 'L2R_L2LOSS_SVC_DUAL'); takes effect only when optimizer='liblinear'. Available solvers:

    • 'L2R_LR_DUAL'

    • 'L2R_LR'

    • 'L1R_LR'

    • 'L2R_L2LOSS_SVC_DUAL'

    • 'L2R_L2LOSS_SVC'

    • 'L2R_L1LOSS_SVC_DUAL'

    • 'L1R_L2LOSS_SVC'

    'L2R_LR_DUAL' and 'L2R_L2LOSS_SVC_DUAL' usually work best in the XC setting, defaults to None

  • liblinear_max_iter (int, optional) – Maximum number of LIBLINEAR iterations; takes effect only when optimizer='liblinear', defaults to 100

  • eta (float, optional) – Step size (learning rate) for online optimizers, defaults to 1.0

  • epochs (int, optional) – Number of training epochs for online optimizers, defaults to 1

  • adagrad_eps (float, optional) – Defines the starting step size for AdaGrad, defaults to 0.001

  • threads (int, optional) – Number of threads used for training and prediction; if 0, use the number of available CPUs; if -1, use the number of available CPUs minus 1, defaults to 0

  • mem_limit (float, optional) – Maximum amount of memory (in GB) available for training; if 0, use the amount of available memory, defaults to 0

  • verbose (bool, optional) – If True, print progress, defaults to False
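A minimal construction sketch; the output directory name and the parameter values below are illustrative choices, not recommendations:

    from napkinxc.models import BR

    # Model files will be stored in the "br-model" directory (illustrative name).
    # Train one linear classifier per label with LIBLINEAR and logistic loss.
    model = BR(
        "br-model",
        optimizer="liblinear",
        loss="log",
        liblinear_c=10,
        weights_threshold=0.1,
        threads=0,       # 0 = use all available CPUs
        verbose=True,
    )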

Methods

__init__(output[, hash, features_threshold, ...])

Construct a Binary Relevance model.

fit(X, Y)

Fit the model to the given training data.

fit_on_file(path)

Fit the model to the training data in the given file in multi-label svmlight/libsvm format.

get_params([deep])

Get parameters of this model.

load()

Load the model to RAM.

ofo(X, Y[, type, a, b, epochs])

Perform Online F-measure Optimization procedure on the given data to find optimal thresholds.

predict(X[, top_k, threshold, labels_weights])

Predict labels for data points in X.

predict_for_file(path[, top_k, threshold, ...])

Predict labels for data points in the given file in multi-label svmlight/libsvm format.

predict_proba(X[, top_k, threshold, ...])

Predict labels with probability estimates for data points in X.

predict_proba_for_file(path[, top_k, ...])

Predict labels with probability estimates for data points in the given file in multi-label svmlight/libsvm format.

set_params(**params)

Set parameters for this model.

unload()

Unload the model from RAM.

fit(X, Y)

Fit the model to the given training data.

Parameters:
  • X (csr_matrix, ndarray, list[list[int]|tuple[int]], list[list[tuple[int, float]]]) – Training data points as a matrix or a list of lists of ints (feature ids) or of (feature id, value) tuples.

  • Y (csr_matrix, ndarray, list[list[int]|tuple[int]], list[list[tuple[int, float]]], list[int]) – Target labels as a matrix, as lists or tuples of ints (multi-label data), or as a list of ints (multi-class data).
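A small sketch of fitting on in-memory data; the toy features and labels are made up for illustration:

    from napkinxc.models import BR

    model = BR("br-model")

    # Data points as lists of (feature id, value) tuples,
    # labels as lists of ints (multi-label format).
    X = [
        [(0, 1.0), (3, 0.5)],
        [(1, 2.0), (2, 1.0)],
        [(0, 0.5), (4, 1.5)],
    ]
    Y = [
        [0, 2],
        [1],
        [0, 1],
    ]

    model.fit(X, Y)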

fit_on_file(path)

Fit the model to the training data in the given file in multi-label svmlight/libsvm format.

Parameters:

path (str) – Path to the file.
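A sketch assuming a training file in multi-label svmlight/libsvm format at a hypothetical path:

    from napkinxc.models import BR

    model = BR("br-model")
    # "train.txt" is a hypothetical path to a file in multi-label
    # svmlight/libsvm format, e.g. lines such as "0,2 1:1.0 3:0.5".
    model.fit_on_file("train.txt")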

get_params(deep=False)

Get parameters of this model.

Parameters:

deep (bool, optional) – Ignored; added for Scikit-learn compatibility, defaults to False

Returns:

Mapping of string to any

Return type:

dict
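A usage sketch; the returned keys are assumed to mirror the constructor argument names:

    from napkinxc.models import BR

    model = BR("br-model", liblinear_c=10)
    params = model.get_params()
    print(params)  # dict of parameter names to their current values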

load()

Load the model to RAM.

ofo(X, Y, type='micro', a=10, b=20, epochs=1)

Perform Online F-measure Optimization procedure on the given data to find optimal thresholds.

Parameters:
  • X (csr_matrix, ndarray, list[list[int]|tuple[int]], list[list[tuple[int, float]]]) – Data points as a matrix or a list of lists of ints (feature ids) or of (feature id, value) tuples.

  • Y (csr_matrix, ndarray, list[list[int]|tuple[int]], list[list[tuple[int, float]]], list[int]) – Target labels as a matrix, as lists or tuples of ints (multi-label data), or as a list of ints (multi-class data).

  • type (str) – Type of the OFO procedure {'micro', 'macro'}, defaults to 'micro'

  • a (int) – Parameter of OFO procedure, defaults to 10

  • b (int) – Parameter of OFO procedure, defaults to 20

  • epochs (int, optional) – Number of OFO epochs, defaults to 1

Returns:

A single threshold if type='micro', or a list of thresholds if type='macro'

Return type:

float, list[float]
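A sketch of finding a micro-F1 threshold and feeding it back into predict; the tiny data set and the reuse of the same data for fitting are only to keep the example self-contained:

    from napkinxc.models import BR

    model = BR("br-model")
    X_val = [[(0, 1.0), (3, 0.5)], [(1, 2.0), (2, 1.0)]]
    Y_val = [[0, 2], [1]]
    model.fit(X_val, Y_val)  # in practice, fit on a separate training set

    # Single threshold optimizing micro-F1 ...
    thr = model.ofo(X_val, Y_val, type="micro")
    # ... used for thresholded prediction.
    pred = model.predict(X_val, threshold=thr)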

predict(X, top_k=0, threshold=0, labels_weights=None)

Predict labels for data points in X.

Parameters:
  • X (csr_matrix, ndarray, list[list[int]|tuple[int]], list[list[tuple[int, float]]]) – Data points as a matrix or a list of lists of ints (feature ids) or of (feature id, value) tuples.

  • top_k (int) – Predict top-k labels, if 0, the option is ignored, defaults to 0

  • threshold (float, list[float], ndarray, optional) – Predict labels with probability above the threshold; a single value applies the same threshold to all labels, while a list or array applies a separate threshold to each label; if 0, the option is ignored, defaults to 0

  • labels_weights (list[float], ndarray, optional) – Predict labels according to their weights multiplied by probability; if None, the option is ignored, defaults to None

Returns:

List of lists with predicted labels.

Return type:

list[list[int]]
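A sketch of the two prediction modes, fixed top-k and probability threshold:

    from napkinxc.models import BR

    model = BR("br-model")
    X = [[(0, 1.0), (3, 0.5)], [(1, 2.0), (2, 1.0)]]
    Y = [[0, 2], [1]]
    model.fit(X, Y)

    top = model.predict(X, top_k=2)          # at most 2 labels per data point
    above = model.predict(X, threshold=0.5)  # all labels with probability above 0.5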

predict_for_file(path, top_k=0, threshold=0, labels_weights=None)

Predict labels for data points in the given file in multi-label svmlight/libsvm format.

Parameters:
  • path (str) – Path to the file

  • top_k (int) – Predict top-k labels, if 0, the option is ignored, defaults to 0

  • threshold (float, list[float], ndarray, optional) – Predict labels with probability above the threshold; a single value applies the same threshold to all labels, while a list or array applies a separate threshold to each label; if 0, the option is ignored, defaults to 0

  • labels_weights (list[float], ndarray, optional) – Predict labels according to their weights multiplied by probability; if None, the option is ignored, defaults to None

Returns:

List of lists with predicted labels.

Return type:

list[list[int]]
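A sketch combining training and prediction on files; both paths are hypothetical:

    from napkinxc.models import BR

    model = BR("br-model")
    model.fit_on_file("train.txt")                      # hypothetical training file
    pred = model.predict_for_file("test.txt", top_k=5)  # hypothetical test file
    # pred is a list of lists of predicted label ids, one list per data point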

predict_proba(X, top_k=0, threshold=0, labels_weights=None)

Predict labels with probability estimates for data points in X.

Parameters:
  • X (csr_matrix, ndarray, list[list[int]|tuple[int]], list[list[tuple[int, float]]]) – Data points as a matrix or a list of lists of ints (feature ids) or of (feature id, value) tuples.

  • top_k (int) – Predict top-k labels, if 0, the option is ignored, defaults to 0

  • threshold (float, list[float], ndarray, optional) – Predict labels with probability above the threshold; a single value applies the same threshold to all labels, while a list or array applies a separate threshold to each label; if 0, the option is ignored, defaults to 0

  • labels_weights (list[float], ndarray, optional) – Predict labels according to their weights multiplied by probability; if None, the option is ignored, defaults to None

Returns:

List of lists of (label id, probability) tuples with predicted labels

Return type:

list[list[tuple[int, float]]]
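A sketch showing the (label id, probability) output format:

    from napkinxc.models import BR

    model = BR("br-model")
    X = [[(0, 1.0), (3, 0.5)], [(1, 2.0), (2, 1.0)]]
    Y = [[0, 2], [1]]
    model.fit(X, Y)

    proba = model.predict_proba(X, top_k=2)
    for row in proba:            # one list of (label id, probability) tuples per data point
        for label, p in row:
            print(label, p)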

predict_proba_for_file(path, top_k=0, threshold=0, labels_weights=None)

Predict labels with probability estimates for data points in the given file in multi-label svmlight/libsvm format.

Parameters:
  • path (str) – Path to the file.

  • top_k (int) – Predict top-k labels, if 0, the option is ignored, defaults to 0

  • threshold (float, list[float], ndarray, optional) – Predict labels with probability above the threshold; a single value applies the same threshold to all labels, while a list or array applies a separate threshold to each label; if 0, the option is ignored, defaults to 0

  • labels_weights (list[float], ndarray, optional) – Predict labels according to their weights multiplied by probability; if None, the option is ignored, defaults to None

Returns:

List of lists of (label id, probability) tuples with predicted labels

Return type:

list[list[tuple[int, float]]]

set_params(**params)

Set parameters for this model. Should be used only if you know what you are doing.

Parameters:

**params – Parameter names with their new values.

Returns:

self

Return type:

Model
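A small sketch; the parameter names are assumed to match the constructor arguments:

    from napkinxc.models import BR

    model = BR("br-model")
    # Update parameters before fitting; returns the model itself.
    model.set_params(liblinear_c=16, threads=4)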

unload()

Unload the model from RAM.
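A sketch of explicit memory management around prediction, assuming a model has already been trained and saved to the "br-model" directory as in the earlier examples:

    from napkinxc.models import BR

    model = BR("br-model")     # points at an already trained model
    model.load()               # bring the stored weights into RAM
    pred = model.predict([[(0, 1.0), (3, 0.5)]], top_k=1)
    model.unload()             # free the memory when done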