Workflow module

XiCorrelation

Class containing Xi Correlation computation components

__init__(x, y=None)

If only x is passed, computes correlation between each column of x. If y is also passed, computes correlation between each column of x vs each column of y.

If only x is passed, x MUST be 2-d. Otherwise, both x and y can be 1-d.

Parameters:

Name | Type | Description | Default
x | npt.ArrayLike | A single list or list of lists or 1D/2D numpy array or pd.Series or pd.DataFrame. | required
y | npt.ArrayLike | A single list or list of lists or 1D/2D numpy array or pd.Series or pd.DataFrame. | None

Raises:

Type | Description
ValueError | If x and y are not of the same shape.
ValueError | If there are fewer than 2 columns to compute correlation.
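The input rules above can be illustrated with a small, self-contained sketch. It is not the library's code: `as_columns` and `validate_inputs` are hypothetical helpers, inputs are plain Python lists rather than numpy or pandas objects, and "same shape" is assumed here to mean "same number of observations".

```python
def as_columns(data):
    """Return a list of columns from a 1-d sequence or a list of rows."""
    data = list(data)
    if data and isinstance(data[0], (list, tuple)):
        # 2-d input: transpose rows into columns.
        return [list(col) for col in zip(*data)]
    # 1-d input: treat it as a single column.
    return [data]


def validate_inputs(x, y=None):
    """Hypothetical check mirroring the documented constructor rules."""
    cols_x = as_columns(x)
    if y is None:
        # With x alone, every column is compared with every other column,
        # so at least two columns are needed.
        if len(cols_x) < 2:
            raise ValueError("need at least 2 columns to compute correlation")
        return cols_x, cols_x
    cols_y = as_columns(y)
    # Assumption: x and y must hold the same number of observations.
    if not cols_x or not cols_y or len(cols_x[0]) != len(cols_y[0]):
        raise ValueError("x and y must have the same number of observations")
    return cols_x, cols_y


# 1-d x with 1-d y is accepted; 1-d x on its own is rejected.
cols_x, cols_y = validate_inputs([1, 2, 3], [3, 1, 2])
```

Returning the same list for both sides when y is omitted keeps the "each column of x against each column of x" case identical to the two-argument case.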

compute_xi(get_modified_xi=None, m_nearest_neighbours=None, get_p_values=False)

Compute the Xi Coefficient (Chatterjee's Rank Correlation) between columns in X and Y.

Xi Coefficient based on

Chatterjee (2020). "A new coefficient of correlation"

Modified Xi Coefficient based on

Lin and Han (2021). "On boosting the power of Chatterjee's rank correlation"

The modified Xi Coefficient uses the M nearest neighbours of each observation to compute the correlation. This allows the coefficient to converge much faster, at the cost of slightly more computation. For very large datasets, the two coefficients are likely to be very similar. We recommend using the modified Xi Coefficient.
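For orientation, the original (unmodified) coefficient can be sketched in a few lines of plain Python. This is a minimal illustration of Chatterjee's formula for data without ties, 1 - 3 * sum(|r[i+1] - r[i]|) / (n^2 - 1), where r holds the ranks of y taken in increasing order of x. The function name `xi_no_ties` is hypothetical, and the library's own implementation (which also handles ties and array inputs) may differ.

```python
def xi_no_ties(x, y):
    """Chatterjee's xi of y on x, assuming no ties in x or y."""
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    if n < 2:
        raise ValueError("need at least 2 observations")
    # Rank of each y value (1 = smallest); well defined because y has no ties.
    rank = [0] * n
    for r, i in enumerate(sorted(range(n), key=lambda i: y[i]), start=1):
        rank[i] = r
    # Walk through the observations in increasing order of x and
    # accumulate the absolute jumps between consecutive y-ranks.
    x_order = sorted(range(n), key=lambda i: x[i])
    jumps = sum(
        abs(rank[x_order[k + 1]] - rank[x_order[k]]) for k in range(n - 1)
    )
    return 1.0 - 3.0 * jumps / (n * n - 1)


x = list(range(1, 100))
y = [v * v for v in x]  # y is a noiseless increasing function of x
xi = xi_no_ties(x, y)
```

For a noiseless monotone relationship the formula reduces to 1 - 3 / (n + 1), so the value above approaches 1 only as n grows. This slow approach to the limit is the behaviour the modified coefficient is meant to improve.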

Parameters:

Name | Type | Description | Default
get_modified_xi | bool | Should the modified xi be computed? Defaults to True when there are no ties, and False when ties are present. | None
m_nearest_neighbours | int | Only used when get_modified_xi is True. Defaults to square-root of array size. | None
get_p_values | bool | Should the p-values be computed? The null hypothesis is that Y is completely independent of X (i.e., xi = 0). | False

Returns:

Type Description
Union[_RetType, Tuple[_RetType, _RetType]]

float/np.ndarray/pd.DataFrame:
  • Xi Coefficient Values.
  • If both X and Y are 1-d, returns a single float.
  • If X is a numpy object, returns a 2-D numpy array.
  • Otherwise returns a pd.DataFrame.
  • P-Values (only when get_p_values is True):
  • Same format as Xi
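To make the p-value option concrete, here is a hedged sketch of a one-sided test of independence. It relies on the asymptotic result from Chatterjee (2020) that, for continuous Y under independence, sqrt(n) * xi is approximately Normal(0, 2/5). The function `xi_p_value` is a hypothetical helper. Whether the library uses exactly this approximation, particularly with ties or with the modified coefficient, is an assumption.

```python
import math


def xi_p_value(xi, n):
    """Approximate upper-tail p-value for H0: Y is independent of X.

    Assumes continuous data without ties, where sqrt(n) * xi is
    asymptotically Normal(0, 2/5) under the null hypothesis.
    """
    z = math.sqrt(n) * xi / math.sqrt(2.0 / 5.0)
    # Upper-tail probability of a standard normal, written with erfc.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


p_null = xi_p_value(0.0, 100)    # xi exactly at its null value
p_strong = xi_p_value(0.97, 99)  # strong functional dependence
```

The test is one-sided because xi tends to 0 under independence and to positive values under dependence, so only large positive values count as evidence against the null hypothesis.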

compute_xi_correlation(x, y=None, get_modified_xi=None, m_nearest_neighbours=None, get_p_values=False)

Helper function to compute the Xi Coefficient - uses the class machinery from XiCorrelation.

Compute the Xi Coefficient (Chatterjee's Rank Correlation) between columns in X and Y.

Xi Coefficient based on

Chatterjee (2020). "A new coefficient of correlation"

Modified Xi Coefficient based on

Lin and Han (2021). "On boosting the power of Chatterjee's rank correlation"

The modified Xi Coefficient uses the M nearest neighbours of each observation to compute the correlation. This allows the coefficient to converge much faster, at the cost of slightly more computation. For very large datasets, the two coefficients are likely to be very similar. We recommend using the modified Xi Coefficient.
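The nearest-neighbour idea can be sketched as follows. This is our reading of the Lin and Han estimator for data without ties: for each observation, take its M right-hand neighbours in x (falling back to the observation itself when fewer remain), sum min(R_i, R_j) over those pairs, and normalise so that the statistic has mean zero under independence. The name `modified_xi` is hypothetical, and flooring the square root for the default M is an assumption. The library's implementation may differ in such details.

```python
import math


def modified_xi(x, y, m=None):
    """Sketch of the M-nearest-neighbour xi, assuming no ties in x or y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    if m is None:
        # Assumption: default M is the floored square root of the sample size.
        m = max(1, int(math.sqrt(n)))
    # Rank of each y value (1 = smallest).
    rank = [0] * n
    for r, i in enumerate(sorted(range(n), key=lambda i: y[i]), start=1):
        rank[i] = r
    # y-ranks listed in increasing order of x.
    r_sorted = [rank[i] for i in sorted(range(n), key=lambda i: x[i])]
    total = 0
    for k in range(n):
        for step in range(1, m + 1):
            # step-th right neighbour in x, or the point itself if none exists.
            j = k + step if k + step < n else k
            total += min(r_sorted[k], r_sorted[j])
    return -2.0 + 6.0 * total / ((n + 1) * (n * m + m * (m + 1) / 4.0))


x = list(range(1, 100))
y = [v * v for v in x]
xi_m1 = modified_xi(x, y, m=1)   # a single right neighbour
xi_default = modified_xi(x, y)   # M defaults to 9 for n = 99
```

The double loop costs O(n * M) after sorting, which is the extra computation mentioned above compared with the single pass of the original coefficient.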

If only X is passed, computes correlation between each column of X. If Y is also passed, computes correlation between each column of X vs each column of Y.

If only X is passed, X MUST be 2-d. Otherwise, both X and Y can be 1-d.
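The column-by-column behaviour can be pictured with a small pure-Python sketch. `pairwise_xi` and the compact `xi_no_ties` below are hypothetical stand-ins, not the library's functions. They assume list-of-rows input without ties, and the layout (one row per column of X, one column per column of Y) is an assumption about how results are arranged.

```python
def xi_no_ties(x, y):
    """Chatterjee's xi of y on x for equal-length data without ties."""
    n = len(x)
    rank = [0] * n
    for r, i in enumerate(sorted(range(n), key=lambda i: y[i]), start=1):
        rank[i] = r
    order = sorted(range(n), key=lambda i: x[i])
    jumps = sum(abs(rank[order[k + 1]] - rank[order[k]]) for k in range(n - 1))
    return 1.0 - 3.0 * jumps / (n * n - 1)


def pairwise_xi(rows_x, rows_y=None):
    """Matrix of xi values: entry [i][j] is xi of Y column j on X column i."""
    cols_x = [list(c) for c in zip(*rows_x)]
    # With X alone, compare every column of X with every column of X.
    cols_y = cols_x if rows_y is None else [list(c) for c in zip(*rows_y)]
    return [[xi_no_ties(cx, cy) for cy in cols_y] for cx in cols_x]


t = list(range(-5, 6))
rows = [(v, (v - 0.1) ** 2) for v in t]  # column b is a function of column a
matrix = pairwise_xi(rows)               # 2 x 2 result
```

Unlike Pearson or Spearman correlation, xi is not symmetric: here `matrix[0][1]` (b as a function of a) is larger than `matrix[1][0]` (a as a function of b), because b is determined by a but not the other way round.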

Parameters:

Name | Type | Description | Default
x | npt.ArrayLike | A single list or list of lists or 1D/2D numpy array or pd.Series or pd.DataFrame. | required
y | npt.ArrayLike | A single list or list of lists or 1D/2D numpy array or pd.Series or pd.DataFrame. | None
get_modified_xi | bool | Should the modified xi be computed? By default this is True when there are no ties and False when ties are present. | None
m_nearest_neighbours | int | Only used if get_modified_xi is True. | None
get_p_values | bool | Should the p-values be computed? The null hypothesis is that Y is completely independent of X (i.e., xi = 0). | False

Returns:

Type Description
Union[_RetType, Tuple[_RetType, _RetType]]

float/np.ndarray/pd.DataFrame:
  • Xi Coefficient Values.
  • If both X and Y are 1-d, returns a single float.
  • If X is a numpy object, returns a 2-D numpy array.
  • Otherwise returns a pd.DataFrame.
  • P-Values (only if get_p_values is True):
  • Same format as Xi