Clinical Trial Optimization

Optimize clinical trials based on the Mayo Clinic dataset

Hosted by

ingenii

Mayo Clinic dataset

Primary biliary cholangitis, an autoimmune condition, results in the gradual deterioration of the small bile ducts within the liver. Despite its slow progression, it inevitably culminates in cirrhosis and liver decompensation.

In this instance, the data originates from the Mayo Clinic trial carried out between 1974 and 1984, involving 312 participants in a randomized trial. Upon conducting a basic logistic regression analysis to forecast treatment outcomes, we identified three statistically significant covariates exhibiting the most substantial impact:

$w_1$ : Age of the patient
$w_2$ : Alkaline Phosphatase in U/liter
$w_3$ : Prothrombin time in seconds

The covariates need to be normalized to have zero sample mean and unit sample variance. In this use case, we will work with a smaller version of the dataset, containing only $N=100$ samples.
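The normalization step can be sketched as follows. This is a minimal illustration, not the organizers' preprocessing: the function name `normalize_covariates` and the raw values are hypothetical, and the use of the population standard deviation (`ddof=0`) is an assumption, since the text does not say which variance estimator is used.

```python
import numpy as np


def normalize_covariates(raw: np.ndarray) -> np.ndarray:
    """Shift each column to zero mean and scale it to unit variance."""
    raw = np.asarray(raw, dtype=float)
    mean = raw.mean(axis=0)
    # Assumption: population standard deviation (ddof=0). Use ddof=1 instead
    # if the unbiased sample estimator is intended.
    std = raw.std(axis=0, ddof=0)
    return (raw - mean) / std


# Hypothetical raw values, one row per patient:
# columns are (age, alkaline phosphatase in U/liter, prothrombin time in s).
raw = np.array(
    [
        [50.0, 1000.0, 10.0],
        [60.0, 2000.0, 11.0],
        [70.0, 1500.0, 12.0],
        [40.0, 3000.0, 10.5],
    ]
)

w = normalize_covariates(raw)
print(w.mean(axis=0))  # each entry should be numerically zero
print(w.std(axis=0))   # each entry should be numerically one
```

Normalizing column by column keeps covariates with large raw scales, such as alkaline phosphatase, from dominating the comparison between groups.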

Solution

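The solution is to be given by two binary arrays group1 and group2, each of size $N=100$. For a patient $i$, group1[i]=1 and group2[i]=0 indicates that patient $i$ is assigned to group 1; conversely, group1[i]=0 and group2[i]=1 indicates that patient $i$ is assigned to group 2. If a patient is assigned to both groups (group1[i]=1, group2[i]=1) or to none (group1[i]=0, group2[i]=0), the solution will be marked as infeasible. The scoring script also requires group 1 to contain exactly half of the patients.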
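A minimal sketch of how such arrays might be built and checked before submitting. The helper names `groups_from_labels` and `is_feasible` are hypothetical and not part of the template; the half-size check mirrors the first assertion in the scoring script in the Scoring section.

```python
import numpy as np


def groups_from_labels(labels):
    """Turn per-patient labels (1 or 2) into the two binary arrays."""
    labels = np.asarray(labels)
    group1 = (labels == 1).astype(int)
    group2 = (labels == 2).astype(int)
    return group1, group2


def is_feasible(group1, group2) -> bool:
    """Return True if the pair of arrays satisfies the assignment rules."""
    group1 = np.asarray(group1)
    group2 = np.asarray(group2)
    if group1.ndim != 1 or group1.shape != group2.shape:
        return False
    # Both arrays must contain only 0 and 1.
    if not (np.isin(group1, [0, 1]).all() and np.isin(group2, [0, 1]).all()):
        return False
    # Every patient must be in exactly one group.
    if not (group1 + group2 == 1).all():
        return False
    # Group 1 must hold half of the patients (as in the scoring script).
    return int(group1.sum()) == group1.shape[0] // 2


# Hypothetical assignment of four patients.
g1, g2 = groups_from_labels([1, 2, 1, 2])
print(is_feasible(g1, g2))                      # valid assignment
print(is_feasible([1, 1, 0, 0], [1, 0, 0, 1]))  # patient 0 is in both groups
```

Deriving both arrays from a single label vector makes it impossible to put a patient in both groups or in none, so only the group size remains to be checked.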

Toy example

Let's look at a simple example. Consider 6 patients, with the following covariates:

Patient 1: $w_1 = 58$ , $w_2= 1718$ and $w_3 = 12.2$
Patient 2: $w_1 = 56$ , $w_2= 7394.8$ and $w_3 = 10.6$
Patient 3: $w_1 = 70$ , $w_2= 516$ and $w_3 = 12$
Patient 4: $w_1 = 55$ , $w_2= 6121$ and $w_3 = 10.3$
Patient 5: $w_1 = 38$ , $w_2= 671$ and $w_3 = 10.9$
Patient 6: $w_1 = 66$ , $w_2= 944$ and $w_3 = 11$

The solution arrays for this toy problem are the following:

group1 = $[1, 0, 0, 1, 0, 1]$
group2 = $[0, 1, 1, 0, 1, 0]$

The discrepancy value for this solution to our toy problem is $\approx 1.485$.
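The toy value can be checked with a short standalone sketch that follows the same steps as the scoring script in the Scoring section. Two things are assumptions, because the text does not state them: the weight `RHO = 0.5` and normalization with the population standard deviation (NumPy's default `ddof=0`). The function name `toy_discrepancy` is hypothetical. Under these two assumptions the result comes out close to the value quoted above.

```python
import numpy as np

# Raw covariates of the six toy patients: (age, alkaline phosphatase, prothrombin time).
raw = np.array(
    [
        [58.0, 1718.0, 12.2],
        [56.0, 7394.8, 10.6],
        [70.0, 516.0, 12.0],
        [55.0, 6121.0, 10.3],
        [38.0, 671.0, 10.9],
        [66.0, 944.0, 11.0],
    ]
)
group1 = np.array([1, 0, 0, 1, 0, 1])
group2 = np.array([0, 1, 1, 0, 1, 0])

RHO = 0.5  # assumption: the weight rho is not given in the text


def toy_discrepancy(raw, group1, group2, rho):
    # Assumption: normalize with the population standard deviation (ddof=0).
    w = (raw - raw.mean(axis=0)) / raw.std(axis=0)
    n, r = w.shape
    sign = group1 - group2  # +1 for group 1, -1 for group 2

    # Difference in first moments, one entry per covariate.
    mu = w.T @ sign / n
    # Difference in second moments as an r x r matrix:
    # diagonal = variance terms, off-diagonal = covariance terms.
    second = (w * sign[:, None]).T @ w / n

    var_ii = np.abs(np.diag(second)).sum()
    var_ij = np.abs(second[np.triu_indices(r, k=1)]).sum()
    return float(np.abs(mu).sum() + rho * var_ii + 2 * rho * var_ij)


value = toy_discrepancy(raw, group1, group2, RHO)
print(round(value, 3))
```

Every term enters through an absolute value, so swapping the two groups does not change the result.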

Template

You can download a solution template by running the following command:

aqora template ingenii-clinical-trial
cd ingenii-clinical-trial
aqora install

You can then fill in submission/solution.ipynb with your solution. You can test your solution by running

aqora test

And finally you can upload your solution by running

aqora upload

Scoring

Each solution is validated and scored according to the following script:

import numpy as np


class ClinicalTrial:
    rho: float  # Relative importance between first and second moments
    w: np.ndarray  # Normalized patient covariates

    def assert_valid(self, group1: np.ndarray, group2: np.ndarray) -> None:
        """
        Checks if the patient constraints are met.

        Arguments (where n is the number of patients):
            - group1: np.ndarray(size = n) => Binary array of patients belonging to group 1
            - group2: np.ndarray(size = n) => Binary array of patients belonging to group 2
        Throws an AssertionError if the constraints are not met.
        """
        group_size = int(self.w.shape[0] / 2)
        # constraint 1: number of people in each group
        assert (
            np.sum(group1) == group_size
        ), f"Each group should have {group_size} patients"
        # constraint 2: every patient is in exactly one group
        assert (
            group1 + group2 == 1
        ).all(), "Every patient needs to be assigned to one group"

    def discrepancy(self, group1: np.ndarray, group2: np.ndarray) -> float:
        """
        Calculates discrepancy between patient groups.

        Arguments (where n is the number of patients):
            - group1: np.ndarray(size = n) => Binary array of patients belonging to group 1
            - group2: np.ndarray(size = n) => Binary array of patients belonging to group 2
        Returns:
            - float => Value of discrepancy measure for group1 and group2
        """
        # Check that all the constraints are being met
        self.assert_valid(group1, group2)

        # Order of the groups is arbitrary
        if group1[0] == 0:
            group1, group2 = group2, group1

        n, r = self.w.shape

        # Calculate mean values for each covariate
        Mu = []
        for i in range(r):
            Mu.append(self.w[:, i].dot(group1 - group2) / n)

        # Calculate second moments (variance and covariance)
        Var_ii = []  # variance
        Var_ij = []  # covariance

        for i in range(r):
            for j in range(i, r):
                if i == j:
                    Var_ii.append((self.w[:, i] ** 2).dot(group1 - group2) / n)
                else:
                    Var_ij.append(
                        (self.w[:, i] * self.w[:, j]).dot(group1 - group2) / n
                    )

        # Calculate final discrepancy
        discrepancy = (
            np.sum(np.abs(Mu))
            + self.rho * np.sum(np.abs(Var_ii))
            + 2 * self.rho * np.sum(np.abs(Var_ij))
        )
        return discrepancy
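
Read term by term, the quantity returned by `discrepancy` can be written as a formula. This is a transcription of the script above, not an additional definition from the organizers. The symbols are introduced here for illustration: $s_p$ is group1[p] - group2[p] (so $+1$ or $-1$), $w_{pi}$ is the normalized covariate $i$ of patient $p$, $n$ is the number of patients and $r$ the number of covariates.

```latex
\mu_i = \frac{1}{n} \sum_{p=1}^{n} w_{pi}\, s_p ,
\qquad
\sigma_{ij} = \frac{1}{n} \sum_{p=1}^{n} w_{pi}\, w_{pj}\, s_p ,
\qquad
d = \sum_{i=1}^{r} \lvert \mu_i \rvert
  + \rho \sum_{i=1}^{r} \lvert \sigma_{ii} \rvert
  + 2\rho \sum_{1 \le i < j \le r} \lvert \sigma_{ij} \rvert
```

Here $\mu_i$ corresponds to `Mu`, $\sigma_{ii}$ to `Var_ii` and $\sigma_{ij}$ with $i < j$ to `Var_ij`. Because each term is taken in absolute value, $d$ is the same whichever group is labeled group 1.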