Fig. 2 | BMC Genomic Data

From: Overestimated prediction using polygenic prediction derived from summary statistics

PRS performance comparisons for Alzheimer's disease (AD). ΔAUC and ΔR² denote the additive gain from introducing the PRS term into Model II (see Materials and Methods for details). For convenience, we abbreviate the discovery and test sets as D and T, respectively. (A) AD prediction performance with and without subject overlap (D: ADSP, T: AMP-AD). With overlapping subjects, all metrics are overestimated, and the overestimation grows as the number of SNPs increases. (B) sPRS (D: IGAP, T: ADSP) is compared with rPRS (D: ADSP, T: ADSP). (C) AMP-AD data serve as another T for rPRS (D: ADSP) and sPRS (D: IGAP). D and T from the ADSP data are derived from tenfold cross-validation. In both (B) and (C), sPRS performance is significantly higher than rPRS performance, and we suspect that some IGAP participants are identical to a subset of the ADSP or AMP-AD participants. (D) A simulation study is conducted with rPRS (D: ADSP, T: AMP-AD), in which a growing number of subjects in T are replaced by a subset of D (see Results for details). The number of SNPs on the x-axis denotes the number of LD-pruned SNPs, added in order of increasing P-value threshold. That is, a smaller number of SNPs (left side) corresponds to a stricter P-value threshold, and the right-most point corresponds to the most lenient threshold (P < 0.5).
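
The caption defines ΔAUC as the gain from adding a PRS term to Model II and sweeps a P-value threshold over LD-pruned SNPs. The Python/NumPy sketch below illustrates those two ideas under stated assumptions. It uses a plain thresholded score (effect size times allele dosage, summed over SNPs with P below the cutoff) and a logistic model fitted and scored on the same subjects. Model II's actual covariates, the R² definition behind ΔR², and the LD-pruning step are given in the article's Materials and Methods and are not reproduced here. The function names, the single covariate, and all data are hypothetical and simulated.

```python
import numpy as np


def prs_scores(genotypes, betas, pvalues, p_threshold):
    """Per-subject PRS: discovery effect sizes times allele dosages, summed over
    the SNPs with P < p_threshold (SNPs are assumed to be LD-pruned already)."""
    keep = pvalues < p_threshold
    return genotypes[:, keep] @ betas[keep]


def fit_logistic(X, y, n_iter=25):
    """Logistic-regression weights by Newton-Raphson; X must contain an intercept column."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        w += np.linalg.solve(hess, grad)
    return w


def auc(scores, y):
    """ROC AUC: probability that a random case outscores a random control (ties count 1/2)."""
    pos, neg = scores[y == 1], scores[y == 0]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return greater + 0.5 * ties


def delta_auc(covariates, prs, y):
    """AUC gain from adding a PRS column to a covariate-only logistic model
    (a hypothetical stand-in for Model II), fitted and scored on the same subjects."""
    n = len(y)
    base = np.column_stack([np.ones(n), covariates])
    full = np.column_stack([base, prs])
    auc_base = auc(base @ fit_logistic(base, y), y)
    auc_full = auc(full @ fit_logistic(full, y), y)
    return auc_full - auc_base


# Toy example on simulated data (not ADSP, IGAP, or AMP-AD).
rng = np.random.default_rng(0)
n, m = 400, 50
G = rng.binomial(2, 0.3, size=(n, m)).astype(float)  # allele dosages 0/1/2
betas = rng.normal(0.0, 0.5, size=m)                  # hypothetical discovery effect sizes
pvalues = rng.uniform(size=m)                         # hypothetical discovery P values
covariates = rng.normal(size=(n, 1))                  # one hypothetical covariate
prs = prs_scores(G, betas, pvalues, p_threshold=0.5)
liability = 0.5 * covariates[:, 0] + (prs - prs.mean()) / prs.std() + rng.normal(size=n)
y = (liability > 0).astype(float)
gain = delta_auc(covariates, prs, y)
print(f"delta AUC = {gain:.3f}")
```

Re-running `prs_scores` with a sequence of cutoffs traces the x-axis of the figure, from strict thresholds (few SNPs) up to P < 0.5. In the setting of panel (A), deriving `betas` and `pvalues` from a discovery set that shares subjects with the test set lets the score partly encode the test phenotypes themselves. That is the overestimation the figure illustrates.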