napkinxc.datasets.load_libsvm_file

napkinxc.datasets.load_libsvm_file(file, labels_format='list', sort_indices=False)

Load data in the libsvm format into a sparse CSR matrix. The format is text-based: each line contains one instance and ends with a \n character.

<label>,<label>,... <feature>(:<value>) <feature>(:<value>) ...

<label> and <feature> are indices that should be positive integers. This function also supports less strict versions of the format: labels and features do not have to be sorted in ascending order, and the :<value> part can be omitted after <feature>, in which case the value is assumed to be 1. The header used in datasets from The Extreme Classification Repository is detected automatically.
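To make the line format concrete, here is a minimal pure-Python sketch of how a single line could be interpreted under the rules above. The helper name parse_libsvm_line is hypothetical and this is not napkinxc's implementation; it ignores header detection and only illustrates unsorted labels and the optional :<value> part.

```python
def parse_libsvm_line(line):
    """Hypothetical helper: split one line into (labels, features)."""
    # Labels come before the first space; features follow it.
    labels_part, _, features_part = line.rstrip("\n").partition(" ")
    # An empty label field yields an empty label list.
    labels = [int(tok) for tok in labels_part.split(",") if tok]
    features = []
    for tok in features_part.split():
        # A missing ":<value>" means the value is assumed to be 1.
        index, sep, value = tok.partition(":")
        features.append((int(index), float(value) if sep else 1.0))
    return labels, features


labels, features = parse_libsvm_line("3,1 5:0.5 2 7:1.25\n")
print(labels)    # [3, 1]
print(features)  # [(5, 0.5), (2, 1.0), (7, 1.25)]
```

Note that the sketch keeps labels and features in the order they appear in the line, and that feature 2, which has no explicit value, is given the value 1.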

Parameters:
  • file (str) – Path to a file to load

  • labels_format (str) – Format in which to load the labels data ('list' or 'csr_matrix'), defaults to 'list'

  • sort_indices (bool) – If True, sort indices, otherwise keep original order, defaults to False

Returns:

Features and labels data

Return type:

(csr_matrix, list[list[int]]) or (csr_matrix, csr_matrix)
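The two label formats carry the same information. As a rough illustration, the sketch below builds the three arrays of a CSR layout (data, indices, indptr) from a list of label lists, with optional per-row index sorting. The helper labels_to_csr_arrays is hypothetical, storing each label with the value 1 is an assumption, and the code is not taken from napkinxc.

```python
def labels_to_csr_arrays(rows, sort_indices=False):
    """Hypothetical helper: build CSR (data, indices, indptr) from label lists."""
    data, indices, indptr = [], [], [0]
    for row in rows:
        # Optionally sort the column indices within each row.
        cols = sorted(row) if sort_indices else list(row)
        indices.extend(cols)
        # Assumption: every present label is stored with the value 1.
        data.extend([1] * len(cols))
        # indptr[i + 1] marks where row i ends in `indices` and `data`.
        indptr.append(len(indices))
    return data, indices, indptr


rows = [[3, 1], [], [2]]
print(labels_to_csr_arrays(rows))
# ([1, 1, 1], [3, 1, 2], [0, 2, 2, 3])
print(labels_to_csr_arrays(rows, sort_indices=True))
# ([1, 1, 1], [1, 3, 2], [0, 2, 2, 3])
```

These three lists have the shape that scipy.sparse.csr_matrix accepts as its (data, indices, indptr) argument, so the list form and the matrix form can be converted into each other.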