Workflow module
XiCorrelation
Class containing Xi Correlation computation components
__init__(x, y=None)
If only `x` is passed, computes correlation between each column of `x`.
If `y` is also passed, computes correlation between each column of `x` vs each column of `y`.
If only `x` is passed, `x` MUST be 2-d. Otherwise, both `x` and `y` can be 1-d.
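The column-pairing rule above can be sketched in plain Python. This is only an illustration, not the library's code: `pairwise_xi` and `_xi` are hypothetical names, columns are passed as lists of equal-length lists, and the output layout (one row per `x` column, one entry per `y` column) is an assumption. `_xi` is a stand-in coefficient for tie-free data.

```python
def _xi(x, y):
    # Illustrative stand-in: Chatterjee's xi for tie-free paired samples.
    n = len(x)
    by_y = sorted(range(n), key=lambda i: y[i])
    rank = [0] * n
    for position, i in enumerate(by_y, start=1):
        rank[i] = position  # rank 1 = smallest y
    # y-ranks reordered by increasing x
    r = [rank[i] for i in sorted(range(n), key=lambda i: x[i])]
    total = sum(abs(a - b) for a, b in zip(r[1:], r))
    return 1.0 - 3.0 * total / (n * n - 1)


def pairwise_xi(x_cols, y_cols=None):
    """Hypothetical helper: x_cols and y_cols are lists of columns."""
    if y_cols is None:
        # Only x given: it must hold at least 2 columns, which are
        # then compared with each other.
        if len(x_cols) < 2:
            raise ValueError("need at least 2 columns when only x is passed")
        y_cols = x_cols
    # Entry [i][j] relates column i of x to column j of y.
    return [[_xi(xc, yc) for yc in y_cols] for xc in x_cols]


a = [1, 2, 3, 4, 5]
b = [2, 1, 4, 3, 5]
c = [5, 3, 1, 2, 4]

within = pairwise_xi([a, b, c])      # 3 x 3: every column of x vs every column of x
between = pairwise_xi([a], [b, c])   # 1 x 2: the single x column vs each y column
```

Passing a single column without `y` raises `ValueError` in this sketch, mirroring the "fewer than 2 columns" condition listed under Raises.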
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | npt.ArrayLike | A single list or list of lists or 1D/2D numpy array or pd.Series or pd.DataFrame. | required |
y | npt.ArrayLike | A single list or list of lists or 1D/2D numpy array or pd.Series or pd.DataFrame. | None |
Raises:
Type | Description |
---|---|
ValueError | If x and y are not of the same shape. |
ValueError | If there are fewer than 2 columns to compute correlation. |
compute_xi(get_modified_xi=None, m_nearest_neighbours=None, get_p_values=False)
Compute the Xi Coefficient (Chatterjee's Rank Correlation) between columns in X and Y.
Xi Coefficient based on
Modified Xi Coefficient based on Lin and Han (2021), "On boosting the power of Chatterjee's rank correlation".
The modified Xi Coefficient looks at M nearest neighbours to compute the correlation. This allows the coefficient to converge much faster. However, it is computationally slightly more intensive. For very large data, the two are likely to be very similar. We recommend using the modified Xi Coefficient.
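For orientation, the unmodified coefficient can be written in a few lines for the tie-free case: sort the pairs by x, take the ranks of y in that order, and sum the absolute differences of consecutive ranks. The function below is a minimal sketch under that no-ties assumption; `xi_no_ties` is a hypothetical name and not part of this module, and the modified (M nearest neighbours) variant is not shown.

```python
def xi_no_ties(x, y):
    """Chatterjee's xi for paired samples with no ties (illustrative sketch)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")

    # Rank the y values: rank 1 for the smallest.
    order_y = sorted(range(n), key=lambda i: y[i])
    rank_y = [0] * n
    for rank, i in enumerate(order_y, start=1):
        rank_y[i] = rank

    # Rearrange the y-ranks so that the x values are increasing.
    order_x = sorted(range(n), key=lambda i: x[i])
    r = [rank_y[i] for i in order_x]

    # xi = 1 - 3 * sum |r[i+1] - r[i]| / (n^2 - 1)
    total = sum(abs(r[i + 1] - r[i]) for i in range(n - 1))
    return 1.0 - 3.0 * total / (n * n - 1)


xs = list(range(10))
increasing = xi_no_ties(xs, xs)
decreasing = xi_no_ties(xs, [-v for v in xs])
```

For a perfectly monotone sample of size n this evaluates to (n - 2) / (n + 1) whether y increases or decreases with x, so the coefficient measures strength of dependence rather than direction. It is also not symmetric: swapping the arguments generally gives a different value.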
Parameters:
Name | Type | Description | Default |
---|---|---|---|
get_modified_xi | bool | Should the modified xi be computed? Defaults to True when there are no ties, and False when ties are present. | None |
m_nearest_neighbours | int | Only used when get_modified_xi is True. Defaults to square-root of array size. | None |
get_p_values | bool | Should the p-values be computed? The null hypothesis is that Y is completely independent of X (i.e., xi = 0). | False |
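To make the `get_p_values` null hypothesis concrete: for the unmodified coefficient on tie-free data, Chatterjee's asymptotic result says that under independence sqrt(n) * xi is approximately normal with mean 0 and variance 2/5. The sketch below turns that into a one-sided p-value. It is an assumption that this matches what the module computes, since the page does not say; `xi_p_value_no_ties` is a hypothetical name, and the approximation applies neither to tied data nor to the modified coefficient.

```python
import math


def xi_p_value_no_ties(xi, n):
    """One-sided asymptotic p-value for H0: Y is independent of X (no ties)."""
    # Under H0, sqrt(n) * xi is approximately N(0, 2/5), so standardise it.
    z = math.sqrt(n) * xi / math.sqrt(2.0 / 5.0)
    # Upper-tail probability of a standard normal, P(Z >= z).
    return 0.5 * math.erfc(z / math.sqrt(2.0))


p_null = xi_p_value_no_ties(0.0, 50)     # xi exactly at its null value
p_strong = xi_p_value_no_ties(0.5, 100)  # large xi on a moderate sample
```

Large positive values of xi give small p-values, while values at or below zero give p-values of 0.5 or more, consistent with a test whose alternative is dependence (xi > 0).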
Returns:
Type | Description |
---|---|
Union[_RetType, Tuple[_RetType, _RetType]] | float/np.ndarray/pd.DataFrame |
compute_xi_correlation(x, y=None, get_modified_xi=None, m_nearest_neighbours=None, get_p_values=False)
Helper function to compute the Xi Coefficient. Uses the class machinery from `XiCorrelation`.
Compute the Xi Coefficient (Chatterjee's Rank Correlation) between columns in X and Y.
Xi Coefficient based on
Modified Xi Coefficient based on Lin and Han (2021), "On boosting the power of Chatterjee's rank correlation".
The modified Xi Coefficient looks at M nearest neighbours to compute the correlation. This allows the coefficient to converge much faster. However, it is computationally slightly more intensive. For very large data, the two are likely to be very similar. We recommend using the modified Xi Coefficient.
If only X is passed, computes correlation between each column of X. If Y is also passed, computes correlation between each column of X vs each column of Y.
If only X is passed, X MUST be 2-d. Otherwise, both X and Y can be 1-d.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | npt.ArrayLike | A single list or list of lists or 1D/2D numpy array or pd.Series or pd.DataFrame. | required |
y | npt.ArrayLike | A single list or list of lists or 1D/2D numpy array or pd.Series or pd.DataFrame. | None |
get_modified_xi | bool | Should the modified xi be computed? By default this is True when there are no ties and False when ties are present. | None |
m_nearest_neighbours | int | Only used if get_modified_xi is True. | None |
get_p_values | bool | Should the p-values be computed? The null hypothesis is that Y is completely independent of X (i.e., xi = 0). | False |
Returns:
Type | Description |
---|---|
Union[_RetType, Tuple[_RetType, _RetType]] | float/np.ndarray/pd.DataFrame |