Type 2 diabetes (T2D) can be prevented in pre‐diabetic individuals with impaired glucose tolerance (IGT). Here, we have used a metabolomics approach to identify candidate biomarkers of pre‐diabetes. We quantified 140 metabolites for 4297 fasting serum samples in the population‐based Cooperative Health Research in the Region of Augsburg (KORA) cohort. Our study revealed significant metabolic variation in pre‐diabetic individuals that is distinct from known diabetes risk indicators, such as glycosylated hemoglobin levels, fasting glucose and insulin. We identified three metabolites (glycine, lysophosphatidylcholine (LPC) (18:2) and acetylcarnitine) that had significantly altered levels in IGT individuals as compared to those with normal glucose tolerance, with P‐values ranging from 2.4 × 10−4 to 2.1 × 10−13. Lower levels of glycine and LPC were found to be predictors not only for IGT but also for T2D, and were independently confirmed in the European Prospective Investigation into Cancer and Nutrition (EPIC)‐Potsdam cohort. Using metabolite–protein network analysis, we identified seven T2D‐related genes that are associated with these three IGT‐specific metabolites by multiple interactions with four enzymes. The expression levels of these enzymes correlate with changes in the metabolite concentrations linked to diabetes. Our results may help in developing novel strategies to prevent T2D.
A targeted metabolomics approach was used to identify candidate biomarkers of pre‐diabetes. The relevance of the identified metabolites is further corroborated with a protein‐metabolite interaction network and gene expression data.
Three metabolites (glycine, lysophosphatidylcholine (LPC) (18:2) and acetylcarnitine C2) were found with significantly altered levels in pre‐diabetic individuals compared with normal controls.
Lower levels of glycine and LPC (18:2) were found to predict risks for pre‐diabetes and type 2 diabetes (T2D).
Seven T2D‐related genes (PPARG, TCF7L2, HNF1A, GCK, IGF1, IRS1 and IDE) are functionally associated with the three identified metabolites.
The unique combination of methodologies, including prospective population‐based and nested case–control, as well as cross‐sectional studies, was essential for the identification of the reported biomarkers.
Type 2 diabetes (T2D) is defined by increased blood glucose levels due to pancreatic β‐cell dysfunction and insulin resistance without evidence for specific causes, such as autoimmune destruction of pancreatic β‐cells (Krebs et al, 2002; Stumvoll et al, 2005; Muoio and Newgard, 2008). A state of pre‐diabetes (i.e., impaired fasting glucose (IFG) and/or impaired glucose tolerance (IGT)) with only slightly elevated blood glucose levels may precede T2D for years (McGarry, 2002; Tabak et al, 2012). The development of diabetes in pre‐diabetic individuals can be prevented or delayed by dietary changes and increased physical activity (Tuomilehto et al, 2001; Knowler et al, 2002). However, no specific biomarkers that enable prevention have been reported.
Metabolomics studies allow metabolites involved in disease mechanisms to be discovered by monitoring metabolite level changes in predisposed individuals compared with healthy ones (Shaham et al, 2008; Newgard et al, 2009; Zhao et al, 2010; Pietilainen et al, 2011; Rhee et al, 2011; Wang et al, 2011; Cheng et al, 2012; Goek et al, 2012). Altered metabolite levels may serve as diagnostic biomarkers and enable preventive action. Previous cross‐sectional metabolomics studies of T2D were either based on small sample sizes (Shaham et al, 2008; Wopereis et al, 2009; Zhao et al, 2010; Pietilainen et al, 2011) or did not consider the influence of common risk factors of T2D (Newgard et al, 2009). Recently, based on prospective nested case–control studies with relatively large samples (Rhee et al, 2011; Wang et al, 2011), five branched‐chain and aromatic amino acids were identified as predictors of T2D (Wang et al, 2011). Here, using various comprehensive large‐scale approaches, we measured metabolite concentration profiles (Yu et al, 2012) in the population‐based (Holle et al, 2005; Wichmann et al, 2005) Cooperative Health Research in the Region of Augsburg (KORA) baseline (survey 4 (S4)) and follow‐up (F4) studies (Rathmann et al, 2009; Meisinger et al, 2010; Jourdan et al, 2012). The results of these cross‐sectional and prospective studies allowed us to (i) reliably identify candidate biomarkers of pre‐diabetes and (ii) build metabolite–protein networks to understand diabetes‐related metabolic pathways.
Individuals with known T2D were identified by physician‐validated self‐reporting (Rathmann et al, 2010) and excluded from our analysis to avoid potential influence from anti‐diabetic medication; non‐fasting participants and individuals with missing values were also excluded (Figure 1A). Based on both fasting and 2‐h glucose values (i.e., 2 h post oral 75 g glucose load), individuals were defined according to the WHO diagnostic criteria to have normal glucose tolerance (NGT), isolated IFG (i‐IFG), IGT or newly diagnosed T2D (dT2D) (WHO, 1999; Rathmann et al, 2009; Meisinger et al, 2010; Supplementary Table S1). The sample sets include 91 dT2D patients and 1206 individuals with non‐T2D, including 866 participants with NGT, 102 with i‐IFG and 238 with IGT, in the cross‐sectional KORA S4 (Figure 1A; study characteristics are shown in Table I). Of the 1010 individuals in a fasting state who participated in both the baseline and follow‐up surveys (Figure 1B; study characteristics of the KORA F4 survey are shown in Supplementary Table S2), 876 were non‐diabetic at baseline. Of these, about 10% developed T2D (i.e., 91 incident T2D) (Figure 1C). Of the 641 individuals with NGT at baseline, 18% developed IGT (i.e., 118 incident IGT) 7 years later (Figure 1D). The study characteristics of the prospective KORA S4→F4 are shown in Table II.
We first screened for metabolite concentrations that differed significantly among the four groups (dT2D, IGT, i‐IFG and NGT), using 140 metabolites in the cross‐sectional KORA S4 and 131 metabolites in KORA F4. Three IGT‐specific metabolites were identified and further investigated in the prospective KORA S4→F4 cohort, to examine whether the baseline metabolite concentrations can predict incident IGT and T2D, and whether they are associated with glucose tolerance 7 years later. Our results are based on a prospective population‐based cohort, which differs from the previous nested case–control study (Wang et al, 2011). We also performed an analysis with the same study design using our data. The obtained results provided clues to explain the differences between the two sets of biomarkers. The three metabolites were also tested for replication in the independent European Prospective Investigation into Cancer and Nutrition (EPIC)‐Potsdam cohort. Finally, the relevance of the identified metabolites was further investigated with our bioinformatic analysis of protein‐metabolite interaction networks and gene expression data.
Identification of novel pre‐diabetes metabolites distinct from known T2D risk indicators
To identify metabolites with altered concentrations between the individuals with NGT, i‐IFG, IGT and dT2D, we first examined five pairwise comparisons (i‐IFG, IGT and dT2D versus NGT, as well as dT2D versus either i‐IFG or IGT) in the cross‐sectional KORA S4. Based on multivariate logistic regression analysis, 26 metabolite concentrations differed significantly (P‐values<3.6 × 10−4) between two groups in at least one of the five comparisons (Figure 2A; odds ratios (ORs) and P‐values are shown in Table III). These associations were independent of age, sex, body mass index (BMI), physical activity, alcohol intake, smoking, systolic blood pressure (BP) and HDL cholesterol (model 1). As expected, the level of total hexose H1, which is mainly represented by glucose (Pearson's correlation coefficient value r between H1 and fasting glucose reached 0.85; Supplementary Table S3), was significantly different in all five comparisons. The significantly changed metabolite panel differed from NGT to i‐IFG or to IGT. Most of the significantly altered metabolite concentrations were found between individuals with dT2D and IGT as compared with NGT (Supplementary Table S4A).
To investigate whether HbA1c, fasting glucose and fasting insulin levels mediate the shown associations, these were added as covariates to the regression analysis (model 2) in addition to model 1 (Figure 2B). We observed that, under these conditions, no metabolite differed significantly when comparing individuals with dT2D to those with NGT, suggesting that these metabolites are associated with HbA1c, fasting glucose and fasting insulin levels (r values are shown in Supplementary Table S3). Only nine metabolite concentrations significantly differed between IGT and NGT individuals (Table III; Supplementary Table S4B). These metabolites therefore represent novel biomarker candidates, and are independent from the known risk indicators for T2D. The logistic regression analysis was based on each single metabolite, and some of these metabolites are expected to correlate with each other. To further assess the metabolites as a group, we employed two additional statistical methods (the non‐parametric random forest and the parametric stepwise selection) to identify unique and independent biomarker candidates. Out of the nine metabolites, five molecules (i.e., glycine, LPC (18:2), LPC (17:0), LPC (18:1) and C2) were selected by random forest, and LPC (17:0) and LPC (18:1) were then removed by the stepwise selection. Thus, three molecules were found to contain independent information: glycine (adjusted OR=0.67 (0.54–0.81), P=8.6 × 10−5), LPC (18:2) (OR=0.58 (0.46–0.72), P=2.1 × 10−6) and acetylcarnitine C2 (OR=1.38 (1.16–1.64), P=2.4 × 10−4) (Figure 2C). Similar results were observed in the follow‐up KORA F4 study (Supplementary Figure S1). For instance, when 380 IGT individuals were compared with 2134 NGT participants, these three metabolites were also found to be highly significantly different (glycine, OR=0.64 (0.55–0.75), P=9.3 × 10−8; LPC (18:2), OR=0.47 (0.38–0.57), P=2.1 × 10−13; and C2, OR=1.33 (1.17–1.49), P=4.9 × 10−6) (Supplementary Table S5).
Prediction of IGT and T2D risk
To investigate the predictive value for IGT and T2D of the three identified metabolites, we examined the associations between baseline metabolite concentrations and incident IGT and T2D using the prospective KORA S4→F4 cohort (Table II). We compared baseline metabolite concentrations in 118 incident IGT individuals with 471 NGT control individuals. We found that glycine and LPC (18:2), but not C2, were significantly different at the 5% level in both adjusted model 1 and model 2 (Table IV; Supplementary Table S6). Significant differences were additionally observed for glycine and LPC (18:2), but not for C2, at baseline concentrations between the 91 incident T2D individuals and 785 participants who remained diabetes free (non‐T2D). Each standard deviation (s.d.) increment of the combinations of the three metabolites was associated with a 33% decreased risk of future diabetes (OR=0.39 (0.21–0.71), P=0.0002). Individuals in the fourth quartile of the combined metabolite concentrations had a three‐fold lower chance of developing diabetes (OR=0.33 (0.21–0.52), P=1.8 × 10−5), compared with those whose serum levels were in the first quartile (i.e., combination of glycine, LPC (18:2) and C2), indicating a protective effect from higher concentrations of glycine and LPC (18:2) combined with a lower concentration of C2. With the fully adjusted model 2, consistent results were obtained for LPC (18:2) but not for glycine (Supplementary Table S6). When the three metabolites were added to the fully adjusted model 2, the area under the receiver‐operating‐characteristic curves (AUC) increased by 2.6% (P=0.015) and 1% (P=0.058) for IGT and T2D, respectively (Supplementary Figure S2; Supplementary Table S7). Thus, adding the three metabolites improves the prediction of IGT and T2D compared with the T2D risk indicators alone.
Baseline metabolite concentrations correlate with future glucose tolerance
We next investigated the associations between baseline metabolite concentrations and follow‐up 2‐h glucose values after an oral glucose tolerance test. Consistent results were observed for the three metabolites: glycine and LPC (18:2), but not acetylcarnitine C2 levels, were found to be significantly associated, indicating that glycine and LPC (18:2) predict glucose tolerance. Moreover, the three metabolites (glycine, LPC (18:2) and C2) revealed high significance even in the fully adjusted model 2 in the cross‐sectional KORA S4 cohort (Supplementary Table S8). As expected, a very significant association (P=1.5 × 10−22) was observed for hexose H1 in model 1, while no significance (P=0.12) was observed for it in the fully adjusted model 2 (Supplementary Table S8).
Prospective population‐based versus nested case–control designs
To investigate the predictive value of the five branched‐chain and aromatic amino acids (isoleucine, leucine, valine, tyrosine and phenylalanine) (Wang et al, 2011) in our study, we correlated the baseline metabolite concentrations with follow‐up 2‐h glucose values. We found none of them to be associated significantly, indicating that the five amino acids cannot predict risk of IGT (β estimates and P‐values are shown in Supplementary Table S9). Furthermore, none of these five amino acids showed associations with 2‐h glucose values in the cross‐sectional KORA S4 study (Supplementary Table S8).
To replicate the five identified branched‐chain and aromatic amino acids (Wang et al, 2011), we matched our baseline samples to the 91 incident T2D cases using the same method described previously (Wang et al, 2011). We replicated four out of the five branched‐chain and aromatic amino acids (characteristics of the case–control and non‐T2D samples are shown in Supplementary Table S10; ORs and P‐values are given in Supplementary Table S11). As expected, the three identified IGT‐specific metabolites did not significantly differ between the matched case–control samples, because the selected controls were enriched with individuals with high‐risk features such as obesity and elevated fasting glucose, as described by Wang et al (2011). In fact, the 91 matched controls include about 50% pre‐diabetic individuals, which is significantly higher than in the general population (about 15%).
Replication in the cross‐sectional EPIC‐Potsdam cohort
Metabolomics data from serum samples of a randomly drawn EPIC‐Potsdam subcohort (n=2500) were used for replication. Glycine (OR=0.60 (0.47–0.77), P=7.4 × 10−5) and LPC (18:2) (OR=0.79 (0.63–0.98), P=0.037) were replicated when 133 T2D patients were compared with 1253 individuals with NGT at baseline (Supplementary Table S12). However, acetylcarnitine C2 (OR=0.98 (0.81–1.19), P=0.858) could not be replicated when T2D patients were compared with NGT individuals, since the IGT participants were not available in the data set. The absolute levels of these three metabolites were in a similar range, with only slight differences that were probably due to differences between the two cohorts or to potential batch effects of the metabolomics measurements (Supplementary Tables S12 and S15). These data therefore provide an independent validation of the metabolomics study.
Metabolite–protein interaction networks confirmed by transcription levels
To investigate the underlying molecular mechanism for the three identified IGT metabolites, we studied their associations with T2D‐related genes by analyzing protein‐metabolite interaction networks (Wishart et al, 2009; Szklarczyk et al, 2011). In all, 7 out of the 46 known T2D‐related genes (PPARG, TCF7L2, HNF1A, GCK, IGF1, IRS1 and IDE) were linked to these metabolites through related enzymes or proteins (Figure 3A; the list of 46 genes is shown in Supplementary Table S13). To validate the networks, the links between metabolites, enzymes, pathway‐related proteins and T2D‐related genes were manually checked for biochemical relevance and classified into four groups: signaling regulation, transcription, physical interaction and the same pathway (Supplementary Table S14).
Gene expression analysis in whole‐blood samples of participants from the KORA S4 revealed significant variations (P‐values ranging from 9.4 × 10−3 to 1.1 × 10−6) of transcript levels of four enzymes, namely, carnitine/acylcarnitine translocase (CAC), carnitine acetyltransferase (CrAT), 5‐aminolevulinate synthase 1 (ALAS‐H) and cytosolic phospholipase A2 (cPLA2), which are known to be strongly associated with the levels of the three metabolites (Figure 3B). The clear relationship between changes in metabolites and transcription levels of associated enzymes strongly suggests that these metabolites are functionally associated with T2D genes in established pathways.
Using a cross‐sectional approach (KORA S4, F4), we analyzed 140 metabolites and identified three (glycine, LPC (18:2) and C2) which are IGT‐specific metabolites with high statistical significance. Notably, these three metabolites are distinct from the currently known T2D risk indicators (e.g., age, BMI, systolic BP, HDL cholesterol, HbA1c, fasting glucose and fasting insulin). A prospective analysis (KORA S4→F4) shows that low levels of glycine and LPC at baseline predict the risks of developing IGT and/or T2D. Glycine and LPC especially were shown to be strong predictors of glucose tolerance, even 7 years before disease onset. Moreover, those two metabolites were independently replicated in the EPIC‐Potsdam cross‐sectional study. Finally, based on our analysis of interaction networks, and supported by gene expression profiles, we found that seven T2D‐related genes are functionally associated with the three IGT candidate metabolites.
Different study designs reveal progression of IGT and T2D
From a methodological point of view, our study is unique with respect to the large sample sizes and the availability of metabolomics data from two time points. This allowed us to compare results generated with cross‐sectional and prospective approaches directly, as well as results from prospective population‐based cohort and nested case–control designs. We found that individuals with IGT have elevated concentrations of acetylcarnitine C2 as compared with NGT individuals only in the cross‐sectional study, whereas C2 was unable to predict IGT and T2D 7 years before disease onset. We speculate that the change in acetylcarnitine C2 might be an event with a quick effect.
Our analysis could replicate four out of the five branched‐chain and aromatic amino acids recently reported to be predictors of T2D using nested/selected case–control samples (Wang et al, 2011). However, the population‐based prospective design employed in our study revealed that these five amino acids are in fact not associated with future 2‐h glucose values. It should be taken into account, however, that more pre‐diabetic individuals (∼50%) were in the control group of that study design, and that these markers could not be extended to the general population (with only 0.4% improvement over the T2D risk indicators as reported in the Framingham Offspring Study) (Wang et al, 2011). Most likely, changes in these amino acids happen at a later stage in the development of T2D (e.g., from IGT to T2D); indeed, a similar phenomenon was also observed in our study (Supplementary Figure S1D). In contrast, we found that combined glycine, LPC (18:2) and C2 yield AUC increments of 2.6 and 1% in predicting IGT and T2D, respectively, in addition to the common risk indicators of T2D. This suggests that they are better candidates for early biomarkers, specifically for the transition from NGT to IGT, than the five amino acids.
IFG and IGT should be considered as two different phenotypes
By definition (WHO, 1999; ADA, 2010), individuals with IFG or IGT or both are considered as pre‐diabetics. Yet we observed different behaviors regarding the change of the metabolite panel from NGT to i‐IFG or to IGT, indicating that i‐IFG and IGT are two different phenotypes. For future studies, we therefore suggest separating IFG from IGT.
The observed decrease in the serum concentration of glycine in individuals with IGT and dT2D may result from insulin resistance (Pontiroli et al, 2004). It was already reported that insulin represses ALAS‐H expression (Phillips and Kushner, 2005). As insulin sensitivity progressively decreases during diabetes development (McGarry, 2002; Stumvoll et al, 2005; Faerch et al, 2009; Tabak et al, 2009), it is expected that the expression levels of the enzyme increase in individuals with IGT and dT2D, since ALAS‐H catalyzes the condensation of glycine and succinyl‐CoA into 5‐aminolevulinic acid (Bishop, 1990). This may explain our observation that glycine was lower in both individuals with IGT and those with dT2D. However, the level of fasting insulin in IGT and T2D individuals was higher than in NGT participants in the KORA S4 study, suggesting that yet undetected pathways may also play roles here.
Acetylcarnitine is produced by the mitochondrial matrix enzyme, CrAT, from carnitine and acetyl‐CoA, a molecule that is a product of both fatty acid β‐oxidation and glucose oxidation and can be used by the citric acid cycle for energy generation. We observed higher transcriptional level of CrAT in individuals with IGT and T2D, most probably due to an activation of the peroxisome proliferator activated receptor alpha (PPAR‐α) pathway in peroxisomes (Horie et al, 1981). Higher expression of CrAT would explain the elevated levels of acetylcarnitine C2 in IGT individuals. Although it is not clear if mitochondrial CrAT is overexpressed when there is increased fatty acid β‐oxidation (e.g., in diabetes; Noland et al, 2009), it is expected that additional acetylcarnitine will be formed by CrAT due to increased substrate availability (acetyl‐CoA), thereby releasing pyruvate dehydrogenase inhibition by acetyl‐CoA and stimulating glucose uptake and oxidation. An increase of acylcarnitines, and in particular of acetylcarnitine C2, is a hallmark in diabetic people (Adams et al, 2009). Cellular lipid levels are increased in humans with IGT or overt T2D who also may have altered mitochondrial function (Morino et al, 2005; Szendroedi et al, 2007). Together, these findings reflect an important role of increased cellular lipid metabolites and impaired mitochondrial β‐oxidation in the development of insulin resistance (McGarry, 2002; Szendroedi et al, 2007; Koves et al, 2008).
In our study, individuals with IGT and dT2D had lower cPLA2 transcription levels, suggesting reduced cPLA2 activity. As a result, a concomitant decrease in the concentration of arachidonic acid (AA), a product of cPLA2 activity, is expected. AA has been shown to inhibit glucose uptake by adipocytes (Malipa et al, 2008) in a mechanism that is probably insulin independent and that involves the GLUT‐1 transporter. Therefore, our findings may point to regulatory effects in individuals with IGT, since the inhibition of AA production would result in an increased glucose uptake.
While our metabolite profiles provide a snapshot of human metabolism, more detailed metabolic profile follow‐ups, with longer time spans and more time points, are necessary to further evaluate the development of the novel biomarkers. Moreover, the influence from long‐term dietary habits should not be ignored, even though we used only serum from fasting individuals (Altmaier et al, 2011; Primrose et al, 2011). Furthermore, additional tissue samples (e.g., muscle and adipocytes) and experimental approaches are needed to characterize the causal pathways in detail.
Three novel metabolites, glycine, LPC (18:2) and C2, were identified as pre‐diabetes‐specific markers. Their changes might precede those of other branched‐chain and aromatic amino acid markers in the progression of T2D. Combined levels of glycine, LPC (18:2) and C2 can predict risk not only for IGT but also for T2D. Targeting the pathways that involve these newly proposed potential biomarkers would help to take preventive steps against T2D at an earlier stage.
Materials and methods
Written informed consent was obtained from each KORA and EPIC‐Potsdam participant. The KORA and EPIC‐Potsdam studies were approved by the ethics committee of the Bavarian Medical Association and the Medical Society of the State of Brandenburg, respectively.
Sample source and classification
The KORA surveys are population‐based studies conducted in the city of Augsburg and the surrounding towns and villages (Holle et al, 2005; Wichmann et al, 2005). KORA is a research platform in the field of epidemiology, health economics and health‐care research. Four surveys were conducted with 18 079 participants recruited from 1984 to 2001. The S4 consists of 4261 individuals (aged 25–74 years) examined from 1999 to 2001. From 2006 to 2008, 3080 participants (with an age range of 32–81) took part in an F4 survey. Ascertainments of anthropometric measurements and personal interviews, as well as laboratory measurements of persons, from the KORA S4/F4 have been described elsewhere (Rathmann et al, 2009; Meisinger et al, 2010; Jourdan et al, 2012).
In the KORA cohort, blood was drawn into S‐Monovette® serum tubes (SARSTEDT AG & Co., Nümbrecht, Germany) in the morning between 0800 and 1030 h, after at least 8 h of fasting. Tubes were gently inverted twice, followed by 30 min resting at room temperature, to obtain complete coagulation. For serum collection, blood was centrifuged at 2750 g at 15°C for 10 min. Serum was filled into synthetic straws, which were stored in liquid nitrogen until the metabolic analyses were conducted.
Metabolite measurements and exclusion of metabolites
For the KORA S4 survey, the targeted metabolomics approach was based on measurements with the AbsoluteIDQ™ p180 kit (BIOCRATES Life Sciences AG, Innsbruck, Austria). This method allows simultaneous quantification of 188 metabolites using liquid chromatography and flow injection analysis–mass spectrometry. The assay procedures have been described previously in detail (Illig et al, 2010; Römisch‐Margl et al, 2012). For each kit plate, five references (human plasma pooled material, Seralab) and three zero samples (PBS) were measured in addition to the KORA samples. To ensure data quality, each metabolite had to meet two criteria: (1) the coefficient of variation (CV) for the metabolite across all 110 reference samples had to be smaller than 25%. In total, seven outliers were removed because their concentrations were larger than the mean plus 5 × s.d.; (2) 50% of all measured sample concentrations for the metabolite had to be above the limit of detection (LOD), which is defined as 3 × median of the three zero samples. In total, 140 metabolites passed the quality controls (Supplementary Table S15): one hexose (H1), 21 acylcarnitines, 21 amino acids, 8 biogenic amines, 13 sphingomyelins (SMs), 33 diacyl (aa) phosphatidylcholines (PCs), 35 acyl‐alkyl (ae) PCs and 8 lysoPCs. Concentrations of all analyzed metabolites are reported in μM.
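The two quality-control criteria described above can be illustrated with a minimal Python sketch. It is not the code used in the study: the function names are hypothetical, the sample s.d. is assumed for the CV, a single set of zero samples stands in for the per-plate zero samples, and the outlier removal step (mean plus 5 × s.d.) is not covered.

```python
from statistics import mean, median, stdev

def coefficient_of_variation(values):
    """CV in percent: sample s.d. divided by the mean (assumed definition)."""
    return 100.0 * stdev(values) / mean(values)

def limit_of_detection(zero_samples):
    """LOD taken as 3 x the median of the zero (PBS) samples."""
    return 3.0 * median(zero_samples)

def passes_qc(reference_values, sample_values, zero_samples,
              max_cv=25.0, min_fraction_above_lod=0.5):
    """Apply both criteria to one metabolite.

    Criterion 1: CV across the reference samples is below max_cv.
    Criterion 2: at least min_fraction_above_lod of the measured
    sample concentrations lie above the LOD.
    """
    cv_ok = coefficient_of_variation(reference_values) < max_cv
    lod = limit_of_detection(zero_samples)
    n_above = sum(1 for v in sample_values if v > lod)
    lod_ok = n_above / len(sample_values) >= min_fraction_above_lod
    return cv_ok and lod_ok

# Toy values, invented for illustration only.
references = [10.0, 11.0, 9.0, 10.5, 9.5]
zeros = [0.1, 0.2, 0.3]
print(passes_qc(references, [1.0, 0.5, 2.0, 0.7], zeros))
```

Applying such a filter to each of the 188 kit metabolites in turn would yield the subset that passes quality control.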
Gene expression analysis
Peripheral blood was drawn under fasting conditions from 599 KORA S4 individuals at the same time as the serum samples used for metabolic profiling were prepared. Blood samples were collected directly in PAXgene™ Blood RNA tubes (PreAnalytiX). The RNA extraction was performed using the PAXgene Blood miRNA kit (PreAnalytiX). Purity and integrity of RNA were assessed on the Bioanalyzer (Agilent) with the 6000 Nano LabChip reagent set (Agilent). In all, 500 ng of RNA was reverse‐transcribed into cRNA and biotin‐UTP labeled, using the Illumina TotalPrep‐96 RNA Amplification Kit (Ambion). In all, 3000 ng of cRNA was hybridized to the Illumina HumanHT‐12 v3 Expression BeadChip. Chips were washed, detected and scanned according to the manufacturer's instructions. Raw data were exported from the Illumina ‘GenomeStudio’ Software to R. The data were converted into logarithmic scores and normalized using the quantile method (Bolstad et al, 2003). The sample sets comprised 383 individuals with NGT, 104 with IGT and 26 with dT2D. The known T2D individuals were removed as had been done for the metabolomics analysis.
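The log conversion and quantile normalization mentioned above can be sketched as follows. This is an illustrative Python version, not the R code used in the study: the function names are hypothetical, base 2 is assumed for the logarithm (the text does not state the base), and ties are broken by position rather than averaged.

```python
import math

def log2_transform(values):
    """Convert raw intensities to log2 scores (base 2 is an assumption)."""
    return [math.log2(v) for v in values]

def quantile_normalize(columns):
    """Quantile-normalize equally long columns, one column per sample.

    Each value is replaced by the mean, across samples, of the values
    that share its rank, so all samples end up with the same distribution.
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [sum(col[k] for col in sorted_cols) / len(columns)
                 for k in range(n)]
    normalized = []
    for col in columns:
        # Indices of the column ordered from smallest to largest value.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized

# Two toy samples with three probes each (invented numbers).
samples = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
print(quantile_normalize(samples))
```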
Metabolite concentrations of glycine, LPC (18:2) and C2 with T2D status in the KORA S4 and F4 are provided (Supplementary Table S16). Additional data from the KORA S4 and F4 studies, including the metabolite concentrations and the gene expression with clinical phenotypes used in this study, are available upon request from KORA‐gen (http://epi.helmholtz‐muenchen.de/kora‐gen). Requests should be sent to and are subject to approval by the KORA board to ensure that appropriate conditions are met to preserve patient privacy. Formal collaboration and co‐authorship with members of the KORA study is not an automatic condition to obtain access to the data published in the present paper. More general information about KORA, including S4 and F4 study design and clinical variables, can be found at http://epi.helmholtz‐muenchen.de/kora‐gen/seiten/variablen_e.php and http://helmholtz‐muenchen.de/en/kora‐en/information‐for‐scientists/current‐kora‐studies.
Calculations were performed under the R statistical environment (http://www.r‐project.org/).
Multivariate logistic regression and linear regression
In multivariate logistic regression analysis, ORs for single metabolites were calculated between two groups. The concentration of each metabolite was scaled to have a mean of zero and an s.d. of one; thus, all reported OR values correspond to the change per s.d. of metabolite concentration. Various T2D risk factors were added to the logistic regression analysis as covariates. To account for multiple comparisons, the cutoff point for significance was calculated according to the Bonferroni correction, at a level of 3.6 × 10−4 (the 5% level divided by the 140 metabolites tested). Because the metabolites were correlated within well‐defined biological groups (e.g., 8 lysoPCs, 33 diacyl PCs, 35 acyl‐alkyl PCs and 13 SMs), this correction was conservative.
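The scaling and the significance cutoff described above amount to simple arithmetic, illustrated in the Python sketch below. The function names are hypothetical, the sample s.d. is an assumption, and the regression coefficient is an invented placeholder rather than a fitted value.

```python
import math
from statistics import mean, stdev

def bonferroni_cutoff(alpha, n_tests):
    """Per-test significance level under the Bonferroni correction."""
    return alpha / n_tests

def standardize(values):
    """Scale values to mean 0 and s.d. 1 (sample s.d. assumed)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def odds_ratio_per_sd(beta):
    """OR per s.d. from a logistic coefficient fitted on scaled values."""
    return math.exp(beta)

cutoff = bonferroni_cutoff(0.05, 140)
print(f"{cutoff:.1e}")            # about 3.6e-04, as quoted in the text
print(standardize([1.0, 2.0, 3.0]))
print(odds_ratio_per_sd(-0.4))    # placeholder coefficient, not from the study
```

Because the metabolite is scaled before fitting, exponentiating its coefficient gives the OR per s.d. directly, which is why the reported ORs are comparable across metabolites measured at very different concentrations.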
Additionally, the categorized metabolite concentrations and combined scores (see below) were analyzed, and the ORs were calculated across quartiles. To test the trend across quartiles, we assigned all individuals either the median value of the concentrations or the combined scores, and obtained the P‐values using the same regression model.
For linear regression analyses, β estimates were calculated from the concentration of each metabolite and the 2‐h glucose value. The concentration of each metabolite was log‐transformed and normalized to have a mean of zero and an s.d. of one. The risk factors used in the logistic regression were added as covariates, and the same significance level (3.6 × 10−4) was adopted.
Combination of metabolites
To obtain the combined scores of metabolites, the scaled metabolite concentrations (mean=0, s.d.=1) were first modeled with multivariate logistic regression containing all confounding variables. The coefficients of these metabolites from the model were then used to calculate a weighted sum for each individual. In accordance with the decreasing trend of glycine and LPC (18:2), we inverted these values as the combined scores.
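As a minimal sketch of the combined score, the Python snippet below computes the weighted sum for one individual and inverts its sign. The coefficients are invented placeholders, not the fitted values from the study, and reading "inverted" as a sign flip of the weighted sum is an interpretation of the text; the function and key names are hypothetical.

```python
def combined_score(scaled_concentrations, coefficients):
    """Weighted sum of scaled metabolite concentrations, sign-inverted.

    coefficients: logistic regression coefficients per metabolite.
    scaled_concentrations: values scaled to mean 0, s.d. 1.
    """
    weighted_sum = sum(coefficients[name] * scaled_concentrations[name]
                       for name in coefficients)
    # Inverted so that higher glycine and LPC (18:2) give a higher score.
    return -weighted_sum

# Placeholder coefficients for illustration only.
coefficients = {"glycine": -0.4, "LPC_18_2": -0.5, "C2": 0.3}
individual = {"glycine": 1.0, "LPC_18_2": 1.0, "C2": -1.0}
print(combined_score(individual, coefficients))
```

With this sign convention, an individual with high glycine and LPC (18:2) and low C2 receives a high score, which matches the direction of the protective effect reported for the fourth quartile.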
Residuals of metabolite concentrations
To avoid the influence of other confounding factors when plotting the concentration of metabolites, we used the residuals from a linear regression model. Metabolite concentrations were log‐transformed and scaled (mean=0, s.d.=1), and the residuals were then deduced from the linear regression that included the corresponding confounding factors.
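The residual approach can be illustrated with a deliberately simplified Python sketch that adjusts for a single confounder by ordinary least squares. The study adjusted for several confounding factors at once, so this one-covariate version, with hypothetical function and variable names and invented numbers, only shows the principle.

```python
from statistics import mean

def residuals_after_adjustment(metabolite, confounder):
    """Residuals of a simple linear regression of metabolite on confounder.

    Both inputs are equally long lists; the metabolite values are assumed
    to be log-transformed and scaled beforehand.
    """
    mx, my = mean(confounder), mean(metabolite)
    sxx = sum((x - mx) ** 2 for x in confounder)
    sxy = sum((x - mx) * (y - my) for x, y in zip(confounder, metabolite))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [y - (intercept + slope * x)
            for x, y in zip(confounder, metabolite)]

# Invented toy data: one confounder (e.g. age) and one metabolite.
age = [30.0, 40.0, 50.0, 60.0]
metabolite = [0.2, 0.1, 0.5, 0.4]
print(residuals_after_adjustment(metabolite, age))
```

The residuals are the part of the metabolite concentration not explained by the confounder, so plotting them shows group differences with the confounder's linear effect removed.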
Random forest, stepwise selection methods and candidate biomarker selection
To select candidate biomarkers, we applied two additional methods: the random forest selection (Breiman, 2001) and the stepwise selection, which assess the metabolites as a group.
For each comparison between two groups, the supervised classification method of random forest was first used to select metabolites from among the 30 variables with the highest importance scores, which allowed the best separation of individuals from the different groups. T2D risk indicators were also included in this method together with all the metabolites.
We further selected the metabolites using stepwise selection on the logistic regression model. Metabolites with significantly different concentrations between the compared groups in logistic regression, and which were also selected using random forest, were used in this model along with all the risk indicators. Akaike's Information Criterion (AIC) was used to evaluate the performance of these subsets of metabolites used in the models. The model with minimal AIC was chosen. The AUC was used to evaluate the models.
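The two model criteria named above can be illustrated with a short Python sketch. The log-likelihoods, parameter counts and prediction scores are invented, the function names are hypothetical, and the AUC is computed by its pairwise (Mann-Whitney) definition rather than by any particular statistical package.

```python
def aic(log_likelihood, n_parameters):
    """Akaike's Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_parameters - 2 * log_likelihood

def auc(scores, labels):
    """Share of case/control pairs ranked correctly; ties count as 0.5."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))

# Hypothetical candidate models: (log-likelihood, number of parameters).
candidates = {"model_A": (-100.0, 3), "model_B": (-98.5, 5)}
best = min(candidates, key=lambda name: aic(*candidates[name]))

# Hypothetical predicted risks and true group labels (1 = case).
example_auc = auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

In this toy example `model_B` fits slightly better, but its two extra parameters cost more than the gain in likelihood, so `model_A` has the lower AIC.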
Metabolite–protein interactions from the Human Metabolome Database (HMDB; Wishart et al, 2009) and protein–protein interactions in the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING; Szklarczyk et al, 2011) were used to construct a network containing relationships between metabolites, enzymes, other proteins and T2D‐related genes. The candidate metabolites were assigned to HMDB IDs using the metaP‐Server (Kastenmuller et al, 2011), and their associated enzymes were derived according to the annotations provided by HMDB. These enzymes were connected to the 46 T2D‐related genes (considered at that point), allowing for 1 intermediate protein (other proteins) through STRING protein functional interaction and optimized by eliminating edges with a STRING score of <0.7 and undirected paths. The subnetworks were connected by the shortest path from metabolites to T2D‐related genes.
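The network construction can be sketched as a filtered graph search. The sketch below is only an illustration under simplifying assumptions: all node names and scores are placeholders rather than HMDB or STRING entries, edges are treated as undirected, and the removal of undirected paths mentioned above is not modeled.

```python
from collections import deque

def build_ppi_graph(edges, min_score=0.7):
    """Keep protein-protein edges whose score is at least min_score."""
    graph = {}
    for a, b, score in edges:
        if score >= min_score:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph

def shortest_path_to_targets(graph, start, targets, max_edges):
    """Breadth-first search for the nearest target within max_edges steps."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] in targets:
            return path
        if len(path) - 1 == max_edges:
            continue
        for neighbor in sorted(graph.get(path[-1], ())):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None

def metabolite_to_gene_path(metabolite, enzymes_of, graph, t2d_genes,
                            max_intermediates=1):
    """Shortest metabolite -> enzyme -> (intermediate) -> T2D gene path."""
    best = None
    for enzyme in enzymes_of.get(metabolite, []):
        path = shortest_path_to_targets(
            graph, enzyme, t2d_genes, max_intermediates + 1
        )
        if path and (best is None or len(path) < len(best)):
            best = path
    return [metabolite] + best if best else None

# Placeholder data: one metabolite, one enzyme, one intermediate protein.
ppi_edges = [
    ("ENZ1", "PROT_X", 0.9),
    ("PROT_X", "GENE_A", 0.8),
    ("ENZ1", "GENE_B", 0.5),  # below the 0.7 cutoff, so it is dropped
]
enzymes_of = {"metabolite_1": ["ENZ1"]}
graph = build_ppi_graph(ppi_edges)
path = metabolite_to_gene_path(
    "metabolite_1", enzymes_of, graph, {"GENE_A", "GENE_B"}
)
```

Limiting the search to `max_intermediates + 1` protein-protein edges corresponds to allowing one intermediate protein between an enzyme and a T2D-related gene.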
The EPIC‐Potsdam study is part of the multicenter EPIC study (Boeing et al, 1999; Riboli et al, 2002). It was drawn from the general adult population in Potsdam and surrounding areas and consists of 27 548 participants recruited from 1994 to 1998 (Boeing et al, 1999). At baseline, participants underwent anthropometric and BP measurements, completed an interview on prevalent diseases and a questionnaire on socioeconomic and lifestyle factors, and submitted a validated food frequency questionnaire. Follow‐up questionnaires were administered every 2–3 years (Bergmann et al, 1999).
From the EPIC‐Potsdam population, a substudy of 2500 participants was randomly selected from all participants who had provided blood samples at baseline (n=26 444). The substudy had a limited number of fasting samples available. Therefore, non‐fasting samples were also considered. Out of the substudy, 814 participants were excluded because of missing information on relevant covariates or missing fasting samples. Individuals with NGT and T2D were determined according to HbA1c categories defined by the American Diabetes Association in 2010 (ADA, 2010).
In the EPIC‐Potsdam study, 30 ml of blood was drawn by qualified medical staff during the baseline examination, immediately fractionated into serum, plasma, buffy coat and erythrocytes and aliquoted into straws. The blood samples were stored in liquid nitrogen (at −196°C) until the metabolic analyses.
Metabolite measurements for the EPIC‐Potsdam samples were performed using the same kit and the same method as for the KORA F4 samples (Floegel et al, 2011).
Calculations were performed using the Statistical Analysis System (SAS), Version 9.2 (SAS Institute, Inc., Cary, NC, USA).
We express our appreciation to all KORA and EPIC‐Potsdam study participants for donating their blood and time. We thank the field staff in Augsburg who conducted the KORA studies. The KORA group consisted of HE Wichmann (speaker), A Peters, C Meisinger, T Illig, R Holle and J John, as well as their co‐workers, and they were responsible for the design and conduct of the studies. We thank all the staff of the Institute of Epidemiology, Helmholtz Zentrum München, and the Genome Analysis Center, as well as the Metabolomic Platform, who helped in the sample logistics, the metabolite profiling assays and the genetic expression analyses, especially A Sabunchi, H Chavez, B Hochstrat, F Scharl, N Lindemann and J Scarpa. We thank M Sattler, W Mewes, VA Raker and J Mendes for comments and suggestions. This study was supported in part by a grant from the German Federal Ministry of Education and Research (BMBF) to the German Center for Diabetes Research (DZD e.V). In addition, this work was partly supported by the BMBF project ‘Metabolomics of ageing’ (FKZ: 01DO12030) and Project ‘SysMBo: Systems Biology of Metabotypes’ (FKZ: 0315494A). Further support for this study was obtained from the Federal Ministry of Health (Berlin, Germany), the Ministry of Innovation, Science, Research and Technology of the state North‐Rhine Westphalia (Düsseldorf, Germany) and the Federal Ministry of Education, Science, Research and Technology (NGFN‐Plus AtheroGenomics/01GS0423; Berlin, Germany). The KORA research platform and the KORA Augsburg studies are financed by the Helmholtz Zentrum München, German Research Center for Environmental Health, which is funded by the German Federal Ministry of Education, Science, Research and Technology and by the State of Bavaria. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
Author contributions: RWS, ZY, CHe, KS, HP, AP, TM, HEW, TP, JA and TI designed the research. RWS, CHe, CP, WRM, MC, KH and HP performed the experiments. RWS, ZY, CHe, ACM, AF, YH, KH, MC, CHo, BT, HG, TX, EB, AD, KM, HYO, YL, LX, KS, AP, HP, TM, MR, HEW, TP, JA and TI analyzed the data. RWS, ZY, CHe, ACM, AF, YH, CHo, HP, TM, AP, MR, TP and JA wrote the paper.
Conflict of Interest
The authors declare that they have no conflict of interest.
Supplementary Material [msb201243-sup-0001.doc]
Supplementary Table S16 [msb201243-sup-0002.xls]
This is an open‐access article distributed under the terms of the Creative Commons Attribution License, which permits distribution, and reproduction in any medium, provided the original author and source are credited. This license does not permit commercial exploitation without specific permission.
- Copyright © 2012 EMBO and Macmillan Publishers Limited