Analyzing the results
Last updated
Last updated
We perform stepwise regression to generate the best model with only the most necessary predictors in R:
Here, alcohol consumption has a lesser effect on the results than the other factors, so that can also be omitted. This could be due to the nature of the data in the UK Biobank for this column.
Similarly, smoking seems to have had predominantly "Never" in its entries, making it look like NOT smoking significantly affects the results. So when tied together with other factors, it looks like not smoking could increase the risk of colorectal cancer, which is a strange statement.
We use "incremental r-squared" to compute the difference in R2 values for the model with and without PRS. R-squared can only be obtained for a linear model, so we perform linear regression. We already have R2 for the clinical model.
R2 for integrated model
0.03116385
0.04671455
Incremental R2
0.01612564
0.03167634
The positive values say that PRS has indeed added information and increased the clinical model's accuracy instead of worsening it.
The net reclassification index (NRI) is a measure used to assess the improvement in risk prediction models when new predictors are added. It quantifies the net correct reclassification of individuals into higher or lower-risk categories based on adding new predictors. To calculate the NRI, you must compare the risk prediction model with and without the PRS variable.
NRI = P(up|event) − P(down|event) + P(down|nonevent)−P(up|nonevent)
“Up” means that the new risk model places a person into a higher risk category than the old model. Similarly, “down” means the new model places a person into a lower-risk category.
In other words, if you create a confusion matrix
NRI = P(true positive) + P(false negative) - [(P(false positive) + P(true negative)]
The NRI ranges from -2 to +2, with a value of 0 indicating no improvement or change in risk prediction.
The NRI value obtained is 0.853516. It suggests a substantial positive improvement in the model's ability to correctly classify individuals into their respective risk categories upon including PRS. This value implies that, on average, there is a considerable increase in the proportion of correctly reclassified individuals into higher or lower-risk categories with the inclusion of PRS as a new predictor. The higher the NRI value, the stronger the improvement in risk prediction.
The area under the curve is computed for the model with and without PRS, and the difference between the two values is obtained. We already have AUC for the clinical model.
AUC for integrated model
0.6028
0.6262
deltaAUC
0.03202171
0.05543415
Just as the previous two metrics, deltaAUC also tells us that the model's accuracy has increased thanks to the addition of PRS as one of the variables.
This answers our research questions and tells us that the addition of polygenic risk scores has indeed increased the accuracy of predicting the risk of colorectal cancer.