napkinxc.datasets.load_libsvm_file
- napkinxc.datasets.load_libsvm_file(file, labels_format='list', sort_indices=False)
Load data in the libsvm format into a sparse CSR matrix. The format is text-based. Each line contains one instance and ends with a `\n` character:
`<label>,<label>,... <feature>(:<value>) <feature>(:<value>) ...`
where `<label>` and `<feature>` are indexes that should be positive integers. This method supports less-strict versions of the format: labels and features do not have to be sorted in ascending order, and `:<value>` can be omitted after `<feature>`, in which case value = 1 is assumed. It also automatically detects the header used in the dataset files from The Extreme Classification Repository.
- Parameters:
file (str) – Path to the file to load
labels_format (str) – Format in which to load the labels data ('list' or 'csr_matrix'), defaults to 'list'
sort_indices (bool) – If True, sort indices, otherwise keep the original order, defaults to False
- Returns:
A tuple of features and labels data; the labels are returned in the format selected by labels_format
- Return type:
(csr_matrix, list[list[int]]) or (csr_matrix, csr_matrix)
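To make the format rules above concrete, here is a minimal pure-Python sketch of a parser for the relaxed libsvm format. `parse_libsvm_lines` and `to_csr` are hypothetical helpers, not part of napkinxc, and they return plain (indptr, indices, data) lists instead of a scipy csr_matrix so the example has no dependencies. The header-detection rule, the treatment of lines that start with whitespace as having no labels, and sort_indices applying only to feature indices are all assumptions, not documented napkinxc behavior.

```python
from typing import List, Tuple

Row = List[Tuple[int, float]]


def to_csr(rows: List[Row]) -> Tuple[List[int], List[int], List[float]]:
    """Pack per-row (index, value) pairs into CSR arrays (indptr, indices, data)."""
    indptr, indices, data = [0], [], []
    for row in rows:
        for idx, val in row:
            indices.append(idx)
            data.append(val)
        indptr.append(len(indices))  # row i spans indices[indptr[i]:indptr[i + 1]]
    return indptr, indices, data


def parse_libsvm_lines(lines, labels_format="list", sort_indices=False):
    """Hypothetical parser for the relaxed libsvm format (NOT napkinxc's code)."""
    lines = [ln.rstrip("\n") for ln in lines if ln.strip()]
    # Assumed header rule: a first line made of exactly three plain integers is
    # read as "num_points num_features num_labels" and skipped. A line with one
    # label and two value-less features looks identical, so this is a heuristic.
    first = lines[0].split() if lines else []
    if len(first) == 3 and all(t.isdigit() for t in first):
        lines = lines[1:]

    feature_rows: List[Row] = []
    label_rows: List[List[int]] = []
    for line in lines:
        if line[:1].isspace():
            # Assumption: a line that starts with whitespace has no labels.
            label_part, feature_part = "", line
        else:
            parts = line.split(None, 1)
            label_part = parts[0]
            feature_part = parts[1] if len(parts) > 1 else ""
        labels = [int(t) for t in label_part.split(",") if t]
        row: Row = []
        for tok in feature_part.split():
            idx, sep, val = tok.partition(":")
            # ':<value>' may be omitted, in which case the value is 1.
            row.append((int(idx), float(val) if sep else 1.0))
        if sort_indices:
            row.sort()  # assumed: sorts feature indices within each row only
        feature_rows.append(row)
        label_rows.append(labels)

    X = to_csr(feature_rows)
    if labels_format == "list":
        return X, label_rows
    if labels_format == "csr_matrix":
        # Every present label gets the value 1.0 in the label matrix.
        return X, to_csr([[(lab, 1.0) for lab in ls] for ls in label_rows])
    raise ValueError("labels_format must be 'list' or 'csr_matrix'")


sample = [
    "2 4 3",        # Extreme Classification Repository style header
    "2,1 3:0.5 1",  # unsorted labels and features; feature 1 has no value
    "3 2:2.0",
]
X, Y = parse_libsvm_lines(sample)
print(X)  # ([0, 2, 3], [3, 1, 2], [0.5, 1.0, 2.0])
print(Y)  # [[2, 1], [3]]
```

If scipy is available, a triple like X could be turned into an actual matrix with `scipy.sparse.csr_matrix((data, indices, indptr))`; the sketch keeps plain lists only to stay dependency-free.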