napkinxc.measures.Jain_et_al_propensity

napkinxc.measures.Jain_et_al_propensity(Y, A=0.55, B=1.5)

Calculate propensity as proposed in Jain et al. 2016. Propensity \(p_l\) of label \(l\) is calculated as:

\[C = (\log N - 1)(B + 1)^A \,, \ p_l = \frac{1}{1 + C(N_l + B)^{-A}} \,,\]

where \(N\) is the total number of data points, \(N_l\) is the number of data points labeled with \(l\), and \(A\) and \(B\) are dataset-specific parameters.
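The formula can be traced step by step with a small pure-Python sketch. This is an illustration only, not the library's implementation: the name `propensity_sketch` is hypothetical, the logarithm is assumed to be the natural logarithm, labels are assumed to be given as lists of integer indices, and the number of labels is assumed to be the largest index plus one.

```python
import math
from collections import Counter


def propensity_sketch(Y, A=0.55, B=1.5):
    """Illustrative re-derivation of the formula; not napkinxc's own code.

    Y is a list of lists of integer label indices, one inner list per data point.
    Returns a list where position l holds the propensity p_l.
    """
    n = len(Y)  # N: total number of data points
    # N_l: number of data points in which label l appears
    # (set() so a label repeated within one data point is counted once).
    counts = Counter(label for labels in Y for label in set(labels))
    num_labels = max(counts) + 1 if counts else 0  # assumption: labels are 0..max
    # C = (log N - 1)(B + 1)^A, with log assumed to be the natural logarithm.
    C = (math.log(n) - 1) * (B + 1) ** A
    # p_l = 1 / (1 + C (N_l + B)^(-A))
    return [
        1.0 / (1.0 + C * (counts.get(label, 0) + B) ** -A)
        for label in range(num_labels)
    ]


# Four data points; label 0 occurs 4 times, label 1 twice, label 2 once.
Y = [[0, 1], [0], [0, 2], [0, 1]]
p = propensity_sketch(Y)
```

In this toy input the more frequent a label is, the higher its propensity, so `p[0] > p[1] > p[2]`.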

Parameters:

Y (ndarray, csr_matrix, list[list[int]]) – Labels (typically the ground truth for the training data), provided as a matrix with non-zero values for relevant labels.
A (float, optional) –
Dataset-specific parameter. Typical values:
- 0.5: WikiLSHTC-325K and WikipediaLarge-500K
- 0.6: Amazon-670K and Amazon-3M
- 0.55: otherwise
Defaults to 0.55
B (float, optional) –
Dataset-specific parameter. Typical values:
- 0.4: WikiLSHTC-325K and WikipediaLarge-500K
- 2.6: Amazon-670K and Amazon-3M
- 1.5: otherwise
Defaults to 1.5

Returns:

Array with the propensity of each label

Return type:

ndarray
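The typical values of A and B listed under Parameters can be kept in a small lookup so the right pair is picked per dataset. The sketch below is only a convenience pattern: `TYPICAL_PARAMS`, `DEFAULT_PARAMS`, and `typical_params` are hypothetical names, not part of napkinxc.

```python
# Typical (A, B) pairs copied from the parameter descriptions on this page.
TYPICAL_PARAMS = {
    "WikiLSHTC-325K": (0.5, 0.4),
    "WikipediaLarge-500K": (0.5, 0.4),
    "Amazon-670K": (0.6, 2.6),
    "Amazon-3M": (0.6, 2.6),
}
DEFAULT_PARAMS = (0.55, 1.5)  # the "otherwise" values, also the function defaults


def typical_params(dataset_name):
    """Return the typical (A, B) pair for a dataset, or the defaults if unknown."""
    return TYPICAL_PARAMS.get(dataset_name, DEFAULT_PARAMS)


A, B = typical_params("Amazon-670K")
fallback = typical_params("some-other-dataset")
```

The resulting pair can then be passed through the documented signature, for example `Jain_et_al_propensity(Y, A=A, B=B)`.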