napkinxc.models.BR
- class napkinxc.models.BR(output, hash=None, features_threshold=0, norm=True, bias=1.0, optimizer='liblinear', loss='log', weights_threshold=0.1, liblinear_c=10, liblinear_eps=0.1, liblinear_solver=None, liblinear_max_iter=100, eta=1.0, epochs=1, adagrad_eps=0.001, load_as='map', threads=0, mem_limit=0, verbose=0, **kwargs)
Bases: Model
Binary Relevance (multi-label) classifier with linear estimators, using a C++ core
- __init__(output, hash=None, features_threshold=0, norm=True, bias=1.0, optimizer='liblinear', loss='log', weights_threshold=0.1, liblinear_c=10, liblinear_eps=0.1, liblinear_solver=None, liblinear_max_iter=100, eta=1.0, epochs=1, adagrad_eps=0.001, load_as='map', threads=0, mem_limit=0, verbose=0, **kwargs)
Construct a Binary Relevance model.
- Parameters:
output (str) – Directory where the model will be stored
hash (int, optional) – Hash features to a space of given size, value of this argument is saved with model weights, if None or 0 disable hashing, defaults to None
features_threshold (float, optional) – Prune features below given threshold, value of this argument is saved with model weights, defaults to 0
norm (bool, optional) – Unit norm feature vector, value of this argument is saved with model weights, defaults to True
bias (float, optional) – Value of the bias features, value of this argument is saved with model weights, defaults to 1.0
optimizer (str, optional) – Optimizer used for training node classifiers {'liblinear', 'sgd', 'adagrad'}, defaults to 'liblinear'
loss (str, optional) – Loss optimized while training node classifiers {'log' (alias 'logistic'), 'l2' (alias 'squaredHinge')}, defaults to 'log'
weights_threshold (float, optional) – Threshold value for pruning weights, defaults to 0.1
liblinear_c (float, optional) – LIBLINEAR cost coefficient (inverse regularization strength), smaller values specify stronger regularization, takes effect only if optimizer='liblinear', defaults to 10.0
liblinear_eps (float, optional) – LIBLINEAR tolerance of the termination criterion, takes effect only if optimizer='liblinear', defaults to 0.1
liblinear_solver (str, optional) – Override the LIBLINEAR solver set by the loss parameter (default for loss='log': 'L2R_LR_DUAL', for loss='l2': 'L2R_L2LOSS_SVC_DUAL'), takes effect only if optimizer='liblinear'. Available solvers: 'L2R_LR_DUAL', 'L2R_LR', 'L1R_LR', 'L2R_L2LOSS_SVC_DUAL', 'L2R_L2LOSS_SVC', 'L2R_L1LOSS_SVC_DUAL', 'L1R_L2LOSS_SVC'. 'L2R_LR_DUAL' and 'L2R_L2LOSS_SVC_DUAL' usually work best in the extreme classification (XC) setting, defaults to None
liblinear_max_iter (int, optional) – Limits the number of iterations performed by LIBLINEAR, takes effect only if optimizer='liblinear', defaults to 100
eta (float, optional) – Step size (learning rate) for online optimizers, defaults to 1.0
epochs (int, optional) – Number of training epochs for online optimizers, defaults to 1
adagrad_eps (float, optional) – Defines starting step size for AdaGrad, defaults to 0.001
threads (int, optional) – Number of threads used for training and prediction, if 0 use number of available CPUs, if -1 use number of available CPUs - 1, defaults to 0
mem_limit (float) – Maximum amount of memory (in G) available for training, if 0 use amount of available memory, defaults to 0
verbose (bool, optional) – If True print progress, defaults to False
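As a rough illustration of the constructor parameters listed above, the sketch below collects a few of them in a dictionary and passes them to BR. The values are illustrative only, and the names `br_kwargs` and `build_br` are hypothetical helpers introduced here, not part of napkinXC. The import is deferred into the function so the snippet loads even where napkinxc is not installed.

```python
# Keyword arguments taken from the parameter list above; the values are
# illustrative choices, not recommended settings.
br_kwargs = {
    "optimizer": "liblinear",  # one of 'liblinear', 'sgd', 'adagrad'
    "loss": "log",             # 'log' (alias 'logistic') or 'l2' (alias 'squaredHinge')
    "liblinear_c": 10,         # smaller values mean stronger regularization
    "liblinear_eps": 0.1,      # LIBLINEAR termination tolerance
    "weights_threshold": 0.1,  # threshold for pruning weights
    "threads": 0,              # 0 = use the number of available CPUs
}


def build_br(output_dir):
    """Construct a BR model that stores its files under output_dir.

    build_br is a hypothetical helper; only BR and its keyword
    arguments come from the documentation above.
    """
    from napkinxc.models import BR  # requires the napkinxc package

    return BR(output_dir, **br_kwargs)
```

Calling `build_br("br-model")` (the directory name is a placeholder) would then return a model ready for `fit` or `fit_on_file`.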
Methods
__init__(output[, hash, features_threshold, ...]) – Construct a Binary Relevance model.
fit(X, Y) – Fit the model to the given training data.
fit_on_file(path) – Fit the model to the training data in the given file in multi-label svmlight/libsvm format.
get_params([deep]) – Get parameters of this model.
load() – Load the model to RAM.
ofo(X, Y[, type, a, b, epochs]) – Perform Online F-measure Optimization procedure on the given data to find optimal thresholds.
predict(X[, top_k, threshold, labels_weights]) – Predict labels for data points in X.
predict_for_file(path[, top_k, threshold, ...]) – Predict labels for data points in the given file in multi-label svmlight/libsvm format.
predict_proba(X[, top_k, threshold, ...]) – Predict labels with probability estimates for data points in X.
predict_proba_for_file(path[, top_k, ...]) – Predict labels with probability estimates for data points in the given file in multi-label svmlight/libsvm format.
set_params(**params) – Set parameters for this model.
unload() – Unload the model from RAM.
- fit(X, Y)
Fit the model to the given training data.
- Parameters:
X (csr_matrix, ndarray, list[list[int]|tuple[int]], list[list[tuple[int, float]]]) – Training data points as a matrix, or as a list of lists of ints or of (feature id, value) tuples of int and float.
Y (csr_matrix, ndarray, list[list[int]|tuple[int]], list[list[tuple[int, float]]], list[int]) – Target labels as a matrix, as lists or tuples of ints (multi-label data), or as a list of ints (multi-class data).
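The list-based input formats described above can be built without any extra dependencies. The following is a minimal sketch that turns dense rows into lists of (feature id, value) tuples. The helper `dense_to_sparse_rows` is hypothetical, and 0-based feature ids are an assumption, since the indexing convention is not stated here.

```python
def dense_to_sparse_rows(dense_rows):
    """Convert dense rows to lists of (feature id, value) tuples, skipping zeros.

    Feature ids are taken to be 0-based column positions (an assumption).
    """
    return [
        [(j, float(v)) for j, v in enumerate(row) if v != 0]
        for row in dense_rows
    ]


dense_X = [
    [0.0, 1.5, 0.0, 2.0],
    [3.0, 0.0, 0.0, 0.0],
]

# X: one list of (feature id, value) tuples per data point.
X = dense_to_sparse_rows(dense_X)

# Y: one list of label ids per data point (multi-label data).
Y = [[0, 2], [1]]
```

With napkinxc installed, `X` and `Y` in this shape are the kind of arguments that `fit(X, Y)` is documented to accept.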
- fit_on_file(path)
Fit the model to the training data in the given file in multi-label svmlight/libsvm format.
- Parameters:
path (str) – Path to the file.
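For readers unfamiliar with the multi-label svmlight/libsvm format, the sketch below serializes data into the commonly used layout: comma-separated label ids, then space-separated `feature:value` pairs, one data point per line. The helper `to_multilabel_svmlight` is hypothetical. Some extreme classification datasets also begin with a header line, and whether napkinXC expects one is not stated here, so the sketch omits it.

```python
def to_multilabel_svmlight(X, Y):
    """Serialize data points to multi-label svmlight/libsvm text.

    X: list of lists of (feature id, value) tuples.
    Y: list of lists of label ids.
    """
    lines = []
    for features, labels in zip(X, Y):
        label_part = ",".join(str(label) for label in labels)
        feature_part = " ".join(f"{i}:{v:g}" for i, v in features)
        lines.append(f"{label_part} {feature_part}")
    return "\n".join(lines) + "\n"


example_X = [[(1, 1.5), (3, 2.0)], [(0, 3.0)]]
example_Y = [[0, 2], [1]]
svmlight_text = to_multilabel_svmlight(example_X, example_Y)
```

Writing `svmlight_text` to a file and passing that file's path to `fit_on_file` would be the intended use.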
- get_params(deep=False)
Get parameters of this model.
- Parameters:
deep – Ignored, added for scikit-learn compatibility, defaults to False
- Returns:
Mapping of string to any
- Return type:
dict
- load()
Load the model to RAM.
- ofo(X, Y, type='micro', a=10, b=20, epochs=1)
Perform Online F-measure Optimization procedure on the given data to find optimal thresholds.
- Parameters:
X (csr_matrix, ndarray, list[list[int]|tuple[int]], list[list[tuple[int, float]]]) – Data points as a matrix, or as a list of lists of ints or of (feature id, value) tuples of int and float.
Y (csr_matrix, ndarray, list[list[int]|tuple[int]], list[list[tuple[int, float]]], list[int]) – Target labels as a matrix, as lists or tuples of ints (multi-label data), or as a list of ints (multi-class data).
type (str) – Type of OFO procedure {'micro', 'macro'}, defaults to 'micro'
a (int) – Parameter of OFO procedure, defaults to 10
b (int) – Parameter of OFO procedure, defaults to 20
epochs (int, optional) – Number of OFO epochs, defaults to 1
- Returns:
Single threshold in case of type='micro' and list of thresholds in case of type='macro'
- Return type:
float, list[float]
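To give some intuition for the `a` and `b` parameters, here is a minimal pure-Python sketch of the generic micro-averaged Online F-measure Optimization update as usually described in the literature: the threshold is `a / b`, `a` accumulates true positives, and `b` accumulates predicted plus true label counts. This is an assumption about the general technique, not a description of napkinXC's internal implementation, and `ofo_micro` is a hypothetical name.

```python
def ofo_micro(probabilities, targets, a=10, b=20, epochs=1):
    """Return a single threshold found by a generic micro OFO update.

    probabilities: list of dicts mapping label id -> estimated probability.
    targets: list of sets of true label ids.
    """
    for _ in range(epochs):
        for probs, true_labels in zip(probabilities, targets):
            threshold = a / b  # current threshold
            predicted = {label for label, p in probs.items() if p > threshold}
            a += len(predicted & true_labels)       # true positives
            b += len(predicted) + len(true_labels)  # predicted + true labels
    return a / b


probabilities = [{0: 0.9, 1: 0.2}, {0: 0.6, 1: 0.7}]
targets = [{0}, {1}]
threshold = ofo_micro(probabilities, targets)
```

With the defaults `a=10, b=20`, the procedure starts from a threshold of 0.5. Larger initial values make the threshold move more slowly in response to the data.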
- predict(X, top_k=0, threshold=0, labels_weights=None)
Predict labels for data points in X.
- Parameters:
X (csr_matrix, ndarray, list[list[int]|tuple[int]], list[list[tuple[int, float]]]) – Data points as a matrix, or as a list of lists of ints or of (feature id, value) tuples of int and float.
top_k (int) – Predict top-k labels, if 0, the option is ignored, defaults to 0
threshold (float, list[float], ndarray, optional) – Predict labels with probability above the threshold in case of single value or above the specific threshold for each label in case of list or array of values, if 0, the option is ignored, defaults to 0
labels_weights (list[float], ndarray, optional) – Predict labels according to their weights multiplied by probability, if None, the option is ignored, defaults to None
- Returns:
List of lists with predicted labels.
- Return type:
list[list[int]]
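The effect of `top_k` and a single-value `threshold` can be pictured with a small pure-Python stand-in that operates on (label id, probability) pairs for one data point. This is only a sketch of the documented semantics. The function `select_labels` is hypothetical, and the way it combines the two options (threshold first, then top-k) is an assumption rather than confirmed library behavior.

```python
def select_labels(scores, top_k=0, threshold=0):
    """Pick label ids from (label id, probability) pairs.

    A value of 0 disables the corresponding option, mirroring the
    parameter descriptions above. Only a single float threshold is handled.
    """
    ranked = sorted(scores, key=lambda pair: pair[1], reverse=True)
    if threshold:
        ranked = [(label, p) for label, p in ranked if p > threshold]
    if top_k:
        ranked = ranked[:top_k]
    return [label for label, _ in ranked]


scores = [(0, 0.1), (1, 0.8), (2, 0.5)]
top_two = select_labels(scores, top_k=2)
above_04 = select_labels(scores, threshold=0.4)
best_above_04 = select_labels(scores, top_k=1, threshold=0.4)
```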
- predict_for_file(path, top_k=0, threshold=0, labels_weights=None)
Predict labels for data points in the given file in multi-label svmlight/libsvm format.
- Parameters:
path (str) – Path to the file
top_k (int) – Predict top-k labels, if 0, the option is ignored, defaults to 0
threshold (float, list[float], ndarray, optional) – Predict labels with probability above the threshold in case of single value or above the specific threshold for each label in case of list or array of values, if 0, the option is ignored, defaults to 0
labels_weights (list[float], ndarray, optional) – Predict labels according to their weights multiplied by probability, if None, the option is ignored, defaults to None
- Returns:
List of lists with predicted labels.
- Return type:
list[list[int]]
- predict_proba(X, top_k=0, threshold=0, labels_weights=None)
Predict labels with probability estimates for data points in X.
- Parameters:
X (csr_matrix, ndarray, list[list[int]|tuple[int]], list[list[tuple[int, float]]]) – Data points as a matrix, or as a list of lists of ints or of (feature id, value) tuples of int and float.
top_k (int) – Predict top-k labels, if 0, the option is ignored, defaults to 0
threshold (float, list[float], ndarray, optional) – Predict labels with probability above the threshold in case of single value or above the specific threshold for each label in case of list or array of values, if 0, the option is ignored, defaults to 0
labels_weights (list[float], ndarray, optional) – Predict labels according to their weights multiplied by probability, if None, the option is ignored, defaults to None
- Returns:
List of lists of (label id, probability) tuples for the predicted labels
- Return type:
list[list[tuple[int, float]]]
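One way to consume a result of this shape is to score it against known labels. The sketch below computes precision@k from a list of lists of (label id, probability) tuples. The function `precision_at_k` is a hypothetical helper written for illustration, and it sorts each list itself rather than assuming the tuples arrive ordered by probability.

```python
def precision_at_k(predictions, targets, k):
    """Average fraction of the k highest-probability labels that are correct.

    predictions: list of lists of (label id, probability) tuples.
    targets: list of sets of true label ids.
    """
    total = 0.0
    for scored, true_labels in zip(predictions, targets):
        ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
        top = [label for label, _ in ranked[:k]]
        total += sum(1 for label in top if label in true_labels) / k
    return total / len(predictions)


predictions = [[(1, 0.8), (2, 0.5)], [(0, 0.9), (3, 0.3)]]
targets = [{1, 3}, {3}]
p_at_1 = precision_at_k(predictions, targets, k=1)
p_at_2 = precision_at_k(predictions, targets, k=2)
```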
- predict_proba_for_file(path, top_k=0, threshold=0, labels_weights=None)
Predict labels with probability estimates for data points in the given file in multi-label svmlight/libsvm format.
- Parameters:
path (str) – Path to the file.
top_k (int) – Predict top-k labels, if 0, the option is ignored, defaults to 0
threshold (float, list[float], ndarray, optional) – Predict labels with probability above the threshold in case of single value or above the specific threshold for each label in case of list or array of values, if 0, the option is ignored, defaults to 0
labels_weights (list[float], ndarray, optional) – Predict labels according to their weights multiplied by probability, if None, the option is ignored, defaults to None
- Returns:
List of lists of (label id, probability) tuples for the predicted labels
- Return type:
list[list[tuple[int, float]]]
- set_params(**params)
Set parameters for this model. Should be used only if you know what you are doing.
- Parameters:
**params – Parameter names with their new values.
- Returns:
self
- Return type:
Model
- unload()
Unload the model from RAM.