Workflow module

`XiCorrelation`

Class containing the Xi Correlation computation components.

`__init__(x, y=None)`

If only x is passed, computes correlation between each column of x. If y is also passed, computes correlation between each column of x vs each column of y.

If only x is passed, x MUST be 2-d. Otherwise, both x and y can be 1-d.
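The following is a minimal sketch of what "each column of x" versus "each column of x vs each column of y" means for the shape of the result. It assumes a simplified, no-ties version of xi, and the helper names `xi` and `pairwise_xi` are hypothetical, not part of this module:

```python
import numpy as np


def xi(x, y):
    """Simplified Chatterjee xi for 1-d x and y (assumes no ties)."""
    n = len(x)
    # Rank of each y value (1..n).
    ranks = np.empty(n)
    ranks[np.argsort(y)] = np.arange(1, n + 1)
    # Ranks of y listed in increasing order of x.
    r = ranks[np.argsort(x)]
    return 1.0 - 3.0 * np.abs(np.diff(r)).sum() / (n**2 - 1)


def pairwise_xi(x, y=None):
    """Hypothetical helper: xi of every column of x against every column of y."""
    x = np.asarray(x, dtype=float)
    if y is None:
        # With only x, x must be 2-d and its columns are compared with each other.
        if x.ndim != 2:
            raise ValueError("x must be 2-d when y is not given")
        y = x
    y = np.asarray(y, dtype=float)
    # Treat a 1-d input as a single column.
    x2 = x.reshape(len(x), -1)
    y2 = y.reshape(len(y), -1)
    out = np.empty((x2.shape[1], y2.shape[1]))
    for i in range(x2.shape[1]):
        for j in range(y2.shape[1]):
            out[i, j] = xi(x2[:, i], y2[:, j])
    return out
```

Because xi(a, b) measures how well b can be predicted as a function of a, it is not symmetric, so the x-only matrix is generally not symmetric either.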

Parameters:

Name	Type	Description	Default
`x`	`npt.ArrayLike`	A single list or list of lists or 1D/2D numpy array or pd.Series or pd.DataFrame.	required
`y`	`npt.ArrayLike`	A single list or list of lists or 1D/2D numpy array or pd.Series or pd.DataFrame.	`None`

Raises:

Type	Description
`ValueError`	If x and y are not of the same shape.
`ValueError`	If there are fewer than 2 columns to compute correlation between.

`compute_xi(get_modified_xi=None, m_nearest_neighbours=None, get_p_values=False)`

Compute the Xi Coefficient (Chatterjee's Rank Correlation) between columns in X and Y.

Xi Coefficient based on

Chatterjee (2020). "A new coefficient of correlation"
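Chatterjee's coefficient is defined from the ranks of Y after sorting the pairs by X. With r_i = #{j : Y_j ≤ Y_(i)} and l_i = #{j : Y_j ≥ Y_(i)}, xi = 1 - n·Σ|r_{i+1} - r_i| / (2·Σ l_i(n - l_i)), which reduces to 1 - 3·Σ|r_{i+1} - r_i| / (n² - 1) when there are no ties. Below is a minimal NumPy sketch of that formula. The function name `chatterjee_xi` is hypothetical, and, unlike the paper, ties in X are broken by position rather than uniformly at random:

```python
import numpy as np


def chatterjee_xi(x, y):
    """Sketch of Chatterjee's xi (assumes y is not constant)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    # Sort the pairs by x; ties in x are broken by original position (stable sort).
    y_by_x = y[np.argsort(x, kind="stable")]
    sorted_y = np.sort(y)
    # r_i = number of y values <= y_(i); l_i = number of y values >= y_(i).
    r = np.searchsorted(sorted_y, y_by_x, side="right")
    l = n - np.searchsorted(sorted_y, y_by_x, side="left")
    numerator = n * np.abs(np.diff(r)).sum()
    denominator = 2.0 * (l * (n - l)).sum()
    return 1.0 - numerator / denominator
```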

Modified Xi Coefficient based on

Lin and Han (2021). "On boosting the power of Chatterjee's rank correlation"

The modified Xi Coefficient looks at the M nearest neighbours of each point to compute the correlation. This allows the coefficient to converge much faster, at a slightly higher computational cost. For very large datasets, the two coefficients are likely to be very similar. We recommend using the modified Xi Coefficient.
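Following our reading of Lin and Han, let R_i be the rank of Y_i and j_m(i) the index of the m-th right nearest neighbour of X_i (or i itself when no such neighbour exists). Then xi_{n,M} = -2 + 6·Σ_i Σ_{m=1..M} min(R_i, R_{j_m(i)}) / ((n + 1)(nM + M(M + 1)/4)). The sketch below implements that formula under the assumptions of no ties and 1 ≤ M < n. The function name `modified_xi` is hypothetical, and this is not necessarily how `compute_xi` implements it:

```python
import numpy as np


def modified_xi(x, y, m):
    """Sketch of Lin and Han's M-nearest-neighbour xi (assumes no ties, 1 <= m < n)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    # R_i: rank of y_i among all y values (1..n).
    ranks = np.empty(n)
    ranks[np.argsort(y)] = np.arange(1, n + 1)
    # Ranks of y listed in increasing order of x.
    r_sorted = ranks[np.argsort(x)]
    total = 0.0
    for k in range(1, m + 1):
        # Position i is paired with position i + k (its k-th right nearest neighbour in x);
        # the last k positions have no such neighbour and are paired with themselves.
        partner = np.concatenate([r_sorted[k:], r_sorted[n - k:]])
        total += np.minimum(r_sorted, partner).sum()
    return -2.0 + 6.0 * total / ((n + 1) * (n * m + m * (m + 1) / 4.0))
```

The cost grows with M because every point is compared with M neighbours instead of one.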

Parameters:

Name	Type	Description	Default
`get_modified_xi`	`bool`	Should the modified xi be computed? Defaults to True when there are no ties, and False when ties are present.	`None`
`m_nearest_neighbours`	`int`	Only used when get_modified_xi is True. Defaults to the square root of the array size.	`None`
`get_p_values`	`bool`	Should the p-values be computed? The null hypothesis is that Y is completely independent of X (i.e., xi = 0).	`False`
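For the original coefficient, Chatterjee (2020) shows that under independence (with continuous Y) sqrt(n)·xi is asymptotically normal with mean 0 and variance 2/5, and the test rejects for large xi. A minimal one-sided p-value sketch based on that result follows. The name `xi_p_value` is hypothetical, and it is an assumption that `compute_xi` uses this approximation; the modified coefficient and data with ties need different variances:

```python
import math


def xi_p_value(xi, n):
    """One-sided asymptotic p-value for Chatterjee's xi (no ties, unmodified xi)."""
    # Standardise: sqrt(n) * xi ~ N(0, 2/5) under independence.
    z = math.sqrt(n) * xi / math.sqrt(2.0 / 5.0)
    # Upper-tail probability of a standard normal: 1 - Phi(z).
    return 0.5 * math.erfc(z / math.sqrt(2.0))
```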

Returns:

Type	Description
`Union[_RetType, Tuple[_RetType, _RetType]]`	Xi Coefficient values (float/np.ndarray/pd.DataFrame). If both X and Y are 1-d, returns a single float. If X is a NumPy array, returns a 2-D NumPy array. Otherwise, returns a pd.DataFrame. If get_p_values is True, returns a tuple of the Xi values and the p-values, with the p-values in the same format as the Xi values.

`compute_xi_correlation(x, y=None, get_modified_xi=None, m_nearest_neighbours=None, get_p_values=False)`

Helper function to compute the Xi Coefficient - uses the class machinery from XiCorrelation.

Compute the Xi Coefficient (Chatterjee's Rank Correlation) between columns in X and Y.

Xi Coefficient based on

Chatterjee (2020). "A new coefficient of correlation"

Modified Xi Coefficient based on

Lin and Han (2021). "On boosting the power of Chatterjee's rank correlation"

The modified Xi Coefficient looks at the M nearest neighbours of each point to compute the correlation. This allows the coefficient to converge much faster, at a slightly higher computational cost. For very large datasets, the two coefficients are likely to be very similar. We recommend using the modified Xi Coefficient.

If only X is passed, computes correlation between each column of X. If Y is also passed, computes correlation between each column of X vs each column of Y.

If only X is passed, X MUST be 2-d. Otherwise, both X and Y can be 1-d

Parameters:

Name	Type	Description	Default
`x`	`npt.ArrayLike`	A single list or list of lists or 1D/2D numpy array or pd.Series or pd.DataFrame.	required
`y`	`npt.ArrayLike`	A single list or list of lists or 1D/2D numpy array or pd.Series or pd.DataFrame.	`None`
`get_modified_xi`	`bool`	Should the modified xi be computed? Defaults to True when there are no ties, and False when ties are present.	`None`
`m_nearest_neighbours`	`int`	Only used when get_modified_xi is True. Defaults to the square root of the array size.	`None`
`get_p_values`	`bool`	Should the p-values be computed? The null hypothesis is that Y is completely independent of X (i.e., xi = 0).	`False`
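The documented defaults for `get_modified_xi` and `m_nearest_neighbours` can be sketched as a small resolution step. This is hypothetical: `resolve_xi_options` is not part of the module, and checking for ties in both x and y and taking the integer part of the square root of the number of observations are assumptions about the real behaviour:

```python
import math

import numpy as np


def resolve_xi_options(x, y, get_modified_xi=None, m_nearest_neighbours=None):
    """Hypothetical helper mirroring the documented defaults."""
    x = np.asarray(x)
    y = np.asarray(y)
    # Assumption: "ties" means repeated values in either x or y.
    has_ties = np.unique(x).size < x.size or np.unique(y).size < y.size
    if get_modified_xi is None:
        get_modified_xi = not has_ties
    if get_modified_xi and m_nearest_neighbours is None:
        # Assumption: square root of the number of observations, truncated to an int.
        m_nearest_neighbours = int(math.sqrt(x.size))
    return get_modified_xi, m_nearest_neighbours
```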

Returns:

Type	Description
`Union[_RetType, Tuple[_RetType, _RetType]]`	Xi Coefficient values (float/np.ndarray/pd.DataFrame). If both X and Y are 1-d, returns a single float. If X is a NumPy array, returns a 2-D NumPy array. Otherwise, returns a pd.DataFrame. If get_p_values is True, returns a tuple of the Xi values and the p-values, with the p-values in the same format as the Xi values.
