napkinxc.datasets.download_dataset
- napkinxc.datasets.download_dataset(dataset, subset='train', format='bow', root='./data', verbose=False)
Downloads the dataset from the internet and stores it in the root directory. If the dataset has already been downloaded, it is not downloaded again.
- Parameters:
dataset (str) – Name of the dataset to load, case insensitive. Available datasets:
  - 'Eurlex-4K' ('bow' format only)
  - 'Eurlex-4.3K' ('bow' format only)
  - 'AmazonCat-13K'
  - 'AmazonCat-14K'
  - 'Wiki10-31K' (alias: 'Wiki10', 'bow' format only)
  - 'DeliciousLarge-200K' (alias: 'DeliciousLarge', 'bow' format only)
  - 'WikiLSHTC-325K' (alias: 'WikiLSHTC', 'bow' format only)
  - 'WikiSeeAlsoTitles-350K'
  - 'WikiTitles-500K'
  - 'WikipediaLarge-500K' (alias: 'WikipediaLarge')
  - 'AmazonTitles-670K'
  - 'Amazon-670K'
  - 'AmazonTitles-3M'
  - 'Amazon-3M'
  - 'LF-AmazonTitles-131K' (for now 'bow' format only)
  - 'LF-Amazon-131K' (for now 'bow' format only)
  - 'LF-WikiSeeAlsoTitles-320K' (for now 'bow' format only)
  - 'LF-WikiSeeAlso-320K' (for now 'bow' format only)
  - 'LF-WikiTitles-500K' (for now 'bow' format only)
  - 'LF-AmazonTitles-1.3M' (for now 'bow' format only)
subset (str, optional) – Subset of dataset to download {'train', 'test', 'validation'}, defaults to 'train'
format (str, optional) – Format of dataset to load {'bow' (bag-of-words/tf-idf weights, alias 'tf-idf'), 'raw' (raw text)}, defaults to 'bow'
root (str, optional) – Location of datasets directory, defaults to './data'
verbose (bool, optional) – If True, print downloading and loading progress, defaults to False
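
The parameters above can be combined as in the following minimal sketch. The names `SUBSETS`, `FORMATS`, `check_download_args` and `fetch` are hypothetical helpers written for this example and are not part of napkinxc; only `download_dataset` and its keyword arguments come from the signature documented here. The strict, case-sensitive check on `subset` and `format` is an assumption of the sketch, not documented library behaviour.

```python
# Allowed values copied from the parameter descriptions above.
# These constants are illustrative and not part of napkinxc.
SUBSETS = {"train", "test", "validation"}
FORMATS = {"bow", "tf-idf", "raw"}


def check_download_args(subset="train", format="bow"):
    """Hypothetical helper: reject unknown values before any download starts."""
    if subset not in SUBSETS:
        raise ValueError(f"unknown subset {subset!r}, expected one of {sorted(SUBSETS)}")
    if format not in FORMATS:
        raise ValueError(f"unknown format {format!r}, expected one of {sorted(FORMATS)}")
    return {"subset": subset, "format": format}


def fetch(dataset, root="./data", verbose=True, **kwargs):
    """Hypothetical wrapper: download one subset of `dataset` into `root`.

    Requires napkinxc to be installed and network access.
    """
    # Imported lazily so that the validation helper works without napkinxc.
    from napkinxc.datasets import download_dataset

    download_dataset(dataset, root=root, verbose=verbose, **check_download_args(**kwargs))


# Example calls (commented out because they need napkinxc and a network connection):
# fetch("Eurlex-4K", subset="train")
# fetch("Eurlex-4K", subset="test")
```

Validating the arguments first is a design choice of this sketch: a typo in `subset` or `format` then fails immediately instead of after a download has begun. Per the description above, repeating a call for data that is already in `root` should not download it again.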