napkinxc.datasets.download_dataset

napkinxc.datasets.download_dataset(dataset, subset='train', format='bow', root='./data', verbose=False)

Downloads the dataset from the internet and saves it in the root directory. If the dataset has already been downloaded, it is not downloaded again.
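The "download only if missing" behaviour can be illustrated with a minimal sketch. It assumes each dataset file lives at a fixed path under `root`. The helper `ensure_file`, the `fetch` callback and the `eurlex-4k/train.txt` layout are hypothetical and do not describe napkinxc's actual internals.

```python
import os
import tempfile
from typing import Callable


def ensure_file(path: str, fetch: Callable[[str], None], verbose: bool = False) -> bool:
    """Hypothetical helper: call fetch(path) only if path does not exist yet.

    Returns True if a download was performed, False if the cached file was reused.
    """
    if os.path.exists(path):
        if verbose:
            print(f"Using cached file {path}")
        return False
    # Create the parent directory (e.g. the root directory) if it is missing.
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    if verbose:
        print(f"Downloading to {path}")
    fetch(path)
    return True


def fake_fetch(path: str) -> None:
    # Stand-in for a real network download: just writes a small file.
    with open(path, "w") as f:
        f.write("dummy data")


with tempfile.TemporaryDirectory() as root:
    # Hypothetical on-disk layout, chosen only for this illustration.
    target = os.path.join(root, "eurlex-4k", "train.txt")
    print(ensure_file(target, fake_fetch))  # True: file was fetched
    print(ensure_file(target, fake_fetch))  # False: cached copy reused
```

Checking for the file before fetching keeps repeated calls cheap, which is why calling the function again with the same arguments does not repeat the download.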

Parameters:
  • dataset (str) –

    Name of the dataset to download (case-insensitive). Available datasets:

    • 'Eurlex-4K' ('bow' format only),

    • 'Eurlex-4.3K' ('bow' format only),

    • 'AmazonCat-13K',

    • 'AmazonCat-14K',

    • 'Wiki10-31K' (alias: 'Wiki10', 'bow' format only),

    • 'DeliciousLarge-200K' (alias: 'DeliciousLarge', 'bow' format only),

    • 'WikiLSHTC-325K' (alias: 'WikiLSHTC', 'bow' format only),

    • 'WikiSeeAlsoTitles-350K',

    • 'WikiTitles-500K',

    • 'WikipediaLarge-500K' (alias: 'WikipediaLarge'),

    • 'AmazonTitles-670K',

    • 'Amazon-670K',

    • 'AmazonTitles-3M',

    • 'Amazon-3M',

    • 'LF-AmazonTitles-131K' (for now 'bow' format only),

    • 'LF-Amazon-131K' (for now 'bow' format only),

    • 'LF-WikiSeeAlsoTitles-320K' (for now 'bow' format only),

    • 'LF-WikiSeeAlso-320K' (for now 'bow' format only),

    • 'LF-WikiTitles-500K' (for now 'bow' format only),

    • 'LF-AmazonTitles-1.3M' (for now 'bow' format only).

  • subset (str, optional) – Subset of the dataset to download: {'train', 'test', 'validation'}, defaults to 'train'

  • format (str, optional) – Format of the dataset to download: {'bow' (bag-of-words/tf-idf weights, alias 'tf-idf'), 'raw' (raw text)}, defaults to 'bow'

  • root (str, optional) – Location of the datasets directory, defaults to './data'

  • verbose (bool, optional) – If True, print download and loading progress, defaults to False
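The parameter rules above (case-insensitive names, dataset aliases, the 'tf-idf' format alias and the 'bow'-only restriction) can be checked before any download starts. The sketch below is hypothetical: `resolve_request`, the lowercase keys and the use of `ValueError` are assumptions made for illustration and are not napkinxc's implementation. The dataset tables simply mirror the list documented above.

```python
# Hypothetical tables mirroring the documented dataset list (lowercase keys).
BOW_ONLY = {
    "eurlex-4k", "eurlex-4.3k", "wiki10-31k", "deliciouslarge-200k", "wikilshtc-325k",
    "lf-amazontitles-131k", "lf-amazon-131k", "lf-wikiseealsotitles-320k",
    "lf-wikiseealso-320k", "lf-wikititles-500k", "lf-amazontitles-1.3m",
}
BOW_AND_RAW = {
    "amazoncat-13k", "amazoncat-14k", "wikiseealsotitles-350k", "wikititles-500k",
    "wikipedialarge-500k", "amazontitles-670k", "amazon-670k", "amazontitles-3m", "amazon-3m",
}
DATASET_ALIASES = {
    "wiki10": "wiki10-31k",
    "deliciouslarge": "deliciouslarge-200k",
    "wikilshtc": "wikilshtc-325k",
    "wikipedialarge": "wikipedialarge-500k",
}
FORMAT_ALIASES = {"tf-idf": "bow"}
SUBSETS = {"train", "test", "validation"}


def resolve_request(dataset, subset="train", format="bow"):
    """Hypothetical validator: returns (canonical_name, subset, format) or raises ValueError."""
    name = dataset.lower()                   # dataset names are case-insensitive
    name = DATASET_ALIASES.get(name, name)   # map aliases such as 'Wiki10' to full names
    if name not in BOW_ONLY and name not in BOW_AND_RAW:
        raise ValueError(f"Unknown dataset: {dataset!r}")
    if subset not in SUBSETS:
        raise ValueError(f"Unknown subset: {subset!r}")
    fmt = format.lower()
    fmt = FORMAT_ALIASES.get(fmt, fmt)       # 'tf-idf' is an alias for 'bow'
    if fmt not in {"bow", "raw"}:
        raise ValueError(f"Unknown format: {format!r}")
    if fmt == "raw" and name in BOW_ONLY:
        raise ValueError(f"{dataset!r} is available in 'bow' format only")
    return name, subset, fmt


print(resolve_request("WIKI10"))                      # ('wiki10-31k', 'train', 'bow')
print(resolve_request("AmazonCat-13K", "test", "raw"))  # ('amazoncat-13k', 'test', 'raw')
```

Validating up front means an unsupported combination, such as `format='raw'` for 'Eurlex-4K', fails immediately instead of after a partial download. With napkinxc installed, the documented call itself is simply `download_dataset('Eurlex-4K', subset='test', root='./data', verbose=True)`.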