napkinxc.datasets.download_dataset
- napkinxc.datasets.download_dataset(dataset, subset='train', format='bow', root='./data', verbose=False)
Downloads the dataset from the internet and stores it in the root directory. If the dataset has already been downloaded, it is not downloaded again.
- Parameters:
dataset (str) – Name of the dataset to load, case insensitive. Available datasets:
'Eurlex-4K' ('bow' format only),
'Eurlex-4.3K' ('bow' format only),
'AmazonCat-13K',
'AmazonCat-14K',
'Wiki10-31K' (alias: 'Wiki10', 'bow' format only),
'DeliciousLarge-200K' (alias: 'DeliciousLarge', 'bow' format only),
'WikiLSHTC-325K' (alias: 'WikiLSHTC', 'bow' format only),
'WikiSeeAlsoTitles-350K',
'WikiTitles-500K',
'WikipediaLarge-500K' (alias: 'WikipediaLarge'),
'AmazonTitles-670K',
'Amazon-670K',
'AmazonTitles-3M',
'Amazon-3M',
'LF-AmazonTitles-131K' (for now 'bow' format only),
'LF-Amazon-131K' (for now 'bow' format only),
'LF-WikiSeeAlsoTitles-320K' (for now 'bow' format only),
'LF-WikiSeeAlso-320K' (for now 'bow' format only),
'LF-WikiTitles-500K' (for now 'bow' format only),
'LF-AmazonTitles-1.3M' (for now 'bow' format only).
subset (str, optional) – Subset of dataset to download {'train', 'test', 'validation'}, defaults to 'train'
format (str, optional) – Format of dataset to load {'bow' (bag-of-words/tf-idf weights, alias 'tf-idf'), 'raw' (raw text)}, defaults to 'bow'
root (str, optional) – Location of datasets directory, defaults to './data'
verbose (bool, optional) – If True print downloading and loading progress, defaults to False
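
As an illustration of the signature above, here is a minimal sketch that downloads several subsets of one dataset in 'bow' format. The helper name `download_splits` and its `downloader` argument are hypothetical and not part of napkinxc. Only the call to `download_dataset` and its keyword arguments come from the documentation above. Which subsets actually exist for a given dataset is an assumption to check.

```python
def download_splits(dataset, subsets=("train", "test"), root="./data", downloader=None):
    """Download several subsets of one dataset in 'bow' format.

    Hypothetical convenience wrapper, not part of napkinxc.
    """
    if downloader is None:
        # Imported lazily so the helper can be defined without napkinxc installed.
        from napkinxc.datasets import download_dataset as downloader

    for subset in subsets:
        # Keyword arguments follow the documented signature; a subset that is
        # already present under `root` is not downloaded again.
        downloader(dataset, subset=subset, format="bow", root=root, verbose=True)

    # Return the requested subset names (assumes every call above succeeded).
    return list(subsets)
```

With napkinxc installed and network access available, `download_splits("Eurlex-4K")` would fetch the training and test subsets into `./data`. Passing the downloader as an argument is only a design convenience: it lets the loop be tried with a stand-in function before any real download is triggered.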