napkinxc.measures.Jain_et_al_propensity

napkinxc.measures.Jain_et_al_propensity(Y, A=0.55, B=1.5)

Calculate propensity as proposed in Jain et al. 2016. Propensity \(p_l\) of label \(l\) is calculated as:

\[C = (\log N - 1)(B + 1)^A \,, \ p_l = \frac{1}{1 + C(N_l + B)^{-A}} \,,\]

where \(N\) is the total number of data points, \(N_l\) is the number of data points for which label \(l\) is relevant, and \(A\) and \(B\) are dataset-specific parameters.
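The formula above can be sketched in plain Python as follows. This is an illustrative re-implementation, not napkinxc's own code: the function name `propensity_sketch` and its arguments `label_counts` and `n_points` are hypothetical, and the logarithm is assumed to be the natural logarithm.

```python
import math


def propensity_sketch(label_counts, n_points, A=0.55, B=1.5):
    """Evaluate p_l = 1 / (1 + C * (N_l + B)^(-A)) for every label.

    label_counts -- hypothetical input: N_l for each label l
    n_points     -- hypothetical input: N, the total number of data points
    Assumes log is the natural logarithm.
    """
    # C = (log N - 1) * (B + 1)^A
    C = (math.log(n_points) - 1.0) * (B + 1.0) ** A
    return [1.0 / (1.0 + C * (n_l + B) ** (-A)) for n_l in label_counts]


# Three labels seen 1, 10 and 1000 times among 10000 data points.
counts = [1, 10, 1000]
props = propensity_sketch(counts, n_points=10000)
```

Under this formula, propensity grows with label frequency, so rare labels receive the smallest values. For a label with \(N_l = 1\), the term \(C(N_l + B)^{-A}\) reduces to \(\log N - 1\), which gives \(p_l = 1 / \log N\).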

Parameters:
  • Y (ndarray, csr_matrix, list[list[int]]) – Labels (typically ground truth for train data) provided as a matrix with non-zero values for relevant labels.

  • A (float, optional) –

    Dataset-specific parameter, typical values:

    • 0.5: WikiLSHTC-325K and WikipediaLarge-500K

    • 0.6: Amazon-670K and Amazon-3M

    • 0.55: otherwise

    Defaults to 0.55

  • B (float, optional) –

    Dataset-specific parameter, typical values:

    • 0.4: WikiLSHTC-325K and WikipediaLarge-500K

    • 2.6: Amazon-670K and Amazon-3M

    • 1.5: otherwise

    Defaults to 1.5
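The typical values listed for A and B can be collected into a small lookup table, sketched below. The dictionary `TYPICAL_AB` and the helper `typical_ab` are hypothetical conveniences for illustration and are not part of napkinxc.

```python
# Typical (A, B) pairs taken from the parameter descriptions above.
# TYPICAL_AB and typical_ab are hypothetical names, not napkinxc API.
TYPICAL_AB = {
    "WikiLSHTC-325K": (0.5, 0.4),
    "WikipediaLarge-500K": (0.5, 0.4),
    "Amazon-670K": (0.6, 2.6),
    "Amazon-3M": (0.6, 2.6),
}
DEFAULT_AB = (0.55, 1.5)  # the documented defaults, used "otherwise"


def typical_ab(dataset_name):
    """Return the typical (A, B) pair, falling back to the defaults."""
    return TYPICAL_AB.get(dataset_name, DEFAULT_AB)


A, B = typical_ab("Amazon-670K")
```

The resulting pair can then be passed through the keyword arguments shown in the signature, for example `Jain_et_al_propensity(Y, A=A, B=B)`.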

Returns:

Array with the propensity of each label.

Return type:

ndarray