napkinxc.datasets.load_json_lines_file

napkinxc.datasets.load_json_lines_file(file, features_fields=['title', 'content'], labels_field='target_ind', gzip_file=None)

Load data in the JSON lines format into a list of features and a list of labels.

Parameters:
  • file (str) – Path to a JSON lines file to load

  • features_fields (list[str], optional) – List of fields in each JSON line that contain features. The fields are concatenated in the specified order. Defaults to ['title', 'content']

  • labels_field (str, optional) – Name of the field that contains the labels. Defaults to 'target_ind'

  • gzip_file (bool, optional) – If True, read the file as a gzip file. If None, decide based on the file extension. Defaults to None

Returns:

Raw text of each document and the corresponding list of labels

Return type:

(list[str], list[list[int|str]])
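
Example:

The sketch below illustrates the input format and the shape of the returned values using only the standard library. It is not napkinXC's implementation: load_json_lines_sketch is a hypothetical stand-in written from the description above, and it assumes that feature fields are joined with a single space and that a ".gz" extension marks a gzip file. The sample records are invented for illustration.

```python
import gzip
import json
import os
import tempfile


def load_json_lines_sketch(file, features_fields=("title", "content"),
                           labels_field="target_ind", gzip_file=None):
    """Hypothetical stand-in for the documented behavior, not napkinXC's code."""
    # gzip_file=None: decide from the file extension (assumed here to mean ".gz").
    if gzip_file is None:
        gzip_file = file.endswith(".gz")
    opener = gzip.open if gzip_file else open

    X, Y = [], []
    with opener(file, "rt", encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            # Concatenate the feature fields in the given order.
            # Assumption: fields are joined with a single space.
            X.append(" ".join(str(record[field]) for field in features_fields))
            Y.append(list(record[labels_field]))
    return X, Y


# Write a tiny two-document file (one JSON object per line) and load it back.
records = [
    {"title": "Red fox", "content": "A small omnivorous mammal.", "target_ind": [3, 17]},
    {"title": "Python", "content": "A programming language.", "target_ind": [42]},
]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.jsonl.gz")
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
    X, Y = load_json_lines_sketch(path)

print(X[0])  # raw text of the first document
print(Y)     # one list of labels per document
```

With napkinXC installed, the corresponding call on the same file would be X, Y = napkinxc.datasets.load_json_lines_file(path), with the default features_fields and labels_field matching the field names used in this sample.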