TauFlowNet: Revealing latent propagation mechanism of tau aggregates using deep neural transport equations.
Mounting evidence shows that Alzheimer’s disease (AD) is characterized by the propagation of tau aggregates throughout the brain in a prion-like manner. Since current pathology imaging technologies only provide a spatial mapping of tau accumulation, computational modeling becomes indispensable for analyzing the spatiotemporal propagation patterns of widespread tau aggregates from longitudinal data. However, current state-of-the-art works focus on the longitudinal change of focal patterns and lack a system-level understanding of the tau propagation mechanism that can explain and forecast the cascade of tau accumulation. To address this limitation, we conceptualize the intercellular spreading of tau pathology as a dynamic system in which each node (brain region) is ubiquitously wired with other nodes while interacting with the build-up of pathological burdens. In this context, we formulate the biological process of tau spreading as a principled potential energy transport model (constrained by brain network topology), which allows us to develop an explainable neural network for uncovering the spatiotemporal dynamics of tau propagation from longitudinal tau-PET scans. Specifically, we first translate the transport equation into a graph neural network (GNN) backbone, where the spreading flows are essentially driven by the potential energy of tau accumulation at each node. Conventional GNNs employ an l2-norm graph smoothness prior, which results in nearly equal potential energies across nodes and thus in vanishing flows. Following this clue, we introduce total variation (TV) into the graph transport model, where the nature of the system’s Euler-Lagrange equations is to maximize the spreading flow while minimizing the overall potential energy. On top of this min-max optimization scenario, we design a GAN-like (generative adversarial network) model to characterize the TV-based spreading flow of tau aggregates, coined TauFlowNet.
We evaluate our TauFlowNet on ADNI and OASIS datasets in terms of the prediction accuracy of future tau accumulation and explore the propagation mechanism of tau aggregates as the disease progresses. Compared to the current counterpart methods, our physics-informed deep model yields more accurate and interpretable results, demonstrating great potential in discovering novel neurobiological mechanisms through the lens of machine learning.
URL:
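As a rough illustration of the transport idea in the TauFlowNet abstract above, the sketch below moves a scalar burden across a small graph, with the flow along each edge proportional to the difference in potential between its endpoints. It implements plain graph diffusion, which is the l2-smoothness case the abstract describes as leading to vanishing flows, and not the authors' TV-based model. The adjacency matrix, step size, and function names are assumptions made for illustration.

```python
import numpy as np

def edge_flows(adj, potential):
    # flows[i, j] > 0 means burden moves from node i to node j.
    diff = potential[:, None] - potential[None, :]
    return adj * diff

def diffusion_step(adj, x, dt=0.1):
    # Each node loses its net outflow; a symmetric adj conserves the total.
    return x - dt * edge_flows(adj, x).sum(axis=1)

# Hypothetical 3-node path graph with all burden starting at node 0.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
x = np.array([1.0, 0.0, 0.0])
for _ in range(200):
    x = diffusion_step(adj, x)

# Largest remaining edge flow once the potentials have equalized.
final_flow = np.abs(edge_flows(adj, x)).max()
```

In this toy setting the potentials equalize and the flows decay toward zero, which is the behavior the abstract attributes to the l2-norm smoothness prior.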
Multi-View Separable Pyramid Network for AD Prediction at MCI Stage by 18F-FDG Brain PET Imaging.
Alzheimer’s Disease (AD), one of the main causes of death in elderly people, is characterized by Mild Cognitive Impairment (MCI) at the prodromal stage. Nevertheless, only some MCI subjects progress to AD. The main objective of this paper is thus to identify those MCI patients who will develop dementia of the AD type. 18F-FluoroDeoxyGlucose Positron Emission Tomography (18F-FDG PET) serves as a neuroimaging modality for early diagnosis, as it can reflect neural activity by measuring glucose uptake in the resting state. In this paper, we design a deep network on the 18F-FDG PET modality to address the problem of AD identification at the early MCI stage. To this end, a Multi-view Separable Pyramid Network (MiSePyNet) is proposed, in which representations are learned from axial, coronal and sagittal views of PET scans so as to offer complementary information and are then combined to make a joint decision. Unlike the 3D convolution operations widely and naturally used for 3D images, the proposed architecture uses separable convolutions applied successively, first slice-wise and then spatial-wise, which retains spatial information and reduces the number of training parameters compared to 2D and 3D networks, respectively. Experiments on the ADNI dataset show that the proposed method yields better performance than both traditional and deep learning-based algorithms for predicting the progression of Mild Cognitive Impairment, with a classification accuracy of 83.05%.
URL:
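The parameter saving that the MiSePyNet abstract above attributes to separable convolution can be seen with simple arithmetic. The sketch below compares a full k×k×k 3D convolution with a generic factorization into a slice-wise (k×1×1) convolution followed by an in-plane (1×k×k) convolution. This factorization, the channel counts, and the omission of biases are assumptions; the actual MiSePyNet layers may be organized differently.

```python
def conv3d_params(c_in, c_out, k):
    # Weights of a full k x k x k convolution (biases ignored).
    return c_in * c_out * k ** 3

def factorized_params(c_in, c_out, k):
    # Hypothetical factorization: (k, 1, 1) across slices, then (1, k, k) in-plane.
    slice_wise = c_in * c_out * k
    spatial = c_out * c_out * k * k
    return slice_wise + spatial

full = conv3d_params(16, 16, 3)            # 16 * 16 * 27
factorized = factorized_params(16, 16, 3)  # 16 * 16 * 3 + 16 * 16 * 9
```

With 16 input and output channels and k = 3, the factorized layer pair uses fewer than half the weights of the full 3D kernel.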
Toward an interpretable Alzheimer’s disease diagnostic model with regional abnormality representation via deep learning.
In this paper, we propose a novel method for magnetic resonance imaging based Alzheimer’s disease (AD) or mild cognitive impairment (MCI) diagnosis that systematically integrates voxel-based, region-based, and patch-based approaches into a unified framework. Specifically, we parcellate the brain into predefined regions based on anatomical knowledge (i.e., templates) and derive complex nonlinear relationships among voxels, whose intensities denote volumetric measurements, within each region. Unlike existing methods that use cubical or rectangular shapes, we consider the anatomical shapes of regions as atypical patches. Using complex nonlinear relationships among voxels in each region learned by deep neural networks, we extract a “regional abnormality representation.” We then make a final clinical decision by integrating the regional abnormality representations over the entire brain. It is noteworthy that the regional abnormality representations allow us to interpret and understand the symptomatic observations of a subject with AD or MCI by mapping and visualizing these observations in the brain space. On the baseline MRI dataset from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) cohort, our method achieves state-of-the-art performance for four binary classification tasks and one three-class classification task. Additionally, we conducted exhaustive experiments and analysis to validate the efficacy and potential of our method.
URL:
Disentangling time series between brain tissues improves fMRI data quality using a time-dependent deep neural network.
Functional MRI (fMRI) is a prominent imaging technique for probing brain function. However, a substantial proportion of noise from multiple sources influences the reliability and reproducibility of fMRI data analysis and limits its clinical applications. Extensive effort has been devoted to improving fMRI data quality, but over the last two decades no consensus has been reached on which technique is most effective. In this study, we developed a novel deep neural network for denoising fMRI data, named denoising neural network (DeNN). This deep neural network is 1) applicable without requiring externally recorded data to model noise; 2) spatially and temporally adaptive to the variability of noise in different brain regions at different time points; 3) automated to output denoised data without manual intervention; 4) trained and applied on each subject separately; and 5) insensitive to the repetition time (TR) of fMRI data. When we compared DeNN with a number of nuisance regression methods for denoising fMRI data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database, only DeNN yielded connectivity close to zero for functionally uncorrelated regions and successfully identified unbiased correlations between the posterior cingulate cortex seed and multiple brain regions within the default mode network or task positive network. The whole-brain functional connectivity maps computed with DeNN-denoised data are approximately three times as homogeneous as the functional connectivity maps computed with raw data. Furthermore, the improved homogeneity strengthens rather than weakens the statistical power of fMRI in detecting intrinsic functional differences between cognitively normal subjects and subjects with Alzheimer’s disease.
URL:
Estimating explainable Alzheimer’s disease likelihood map via clinically-guided prototype learning.
Identifying Alzheimer’s disease (AD) involves a deliberate diagnostic process owing to its innate traits of irreversibility with subtle and gradual progression. These characteristics make AD biomarker identification from structural brain imaging (e.g., structural MRI) scans quite challenging. Using clinically-guided prototype learning, we propose a novel deep-learning approach through eXplainable AD Likelihood Map Estimation (XADLiME) for AD progression modeling over 3D sMRIs. Specifically, we establish a set of topologically-aware prototypes onto the clusters of latent clinical features, uncovering an AD spectrum manifold. Considering this pseudo map as an enriched reference, we employ an estimating network to approximate the AD likelihood map over a 3D sMRI scan. Additionally, we promote the explainability of such a likelihood map by revealing a comprehensible overview from clinical and morphological perspectives. During the inference, this estimated likelihood map served as a substitute for unseen sMRI scans for effectively conducting the downstream task while providing thorough explainable states.
URL:
LCGNet: Local Sequential Feature Coupling Global Representation Learning for Functional Connectivity Network Analysis with fMRI.
Analysis of functional connectivity networks (FCNs) derived from resting-state functional magnetic resonance imaging (rs-fMRI) has greatly advanced our understanding of brain diseases, including Alzheimer’s disease (AD) and attention deficit hyperactivity disorder (ADHD). Advanced machine learning techniques, such as convolutional neural networks (CNNs), have been used to learn high-level feature representations of FCNs for automated brain disease classification. Even though convolution operations in CNNs are good at extracting local properties of FCNs, they generally cannot capture global temporal representations of FCNs well. Recently, the transformer has demonstrated remarkable performance in various tasks, which is attributed to the effectiveness of its self-attention mechanism in capturing global temporal feature representations. However, it cannot effectively model the local network characteristics of FCNs. To this end, in this paper, we propose a novel network structure for Local sequential feature Coupling Global representation learning (LCGNet) that takes advantage of both convolutional operations and self-attention mechanisms for enhanced FCN representation learning. Specifically, we first build a dynamic FCN for each subject using an overlapped sliding window approach. We then construct three sequential components (i.e., edge-to-vertex layer, vertex-to-network layer, and network-to-temporality layer) with a dual backbone branch of CNN and transformer to extract and couple local-to-global topological information of brain networks. Experimental results on two real datasets (i.e., ADNI and ADHD-200) with rs-fMRI data show the superiority of our LCGNet.
URL:
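The overlapped sliding window step mentioned in the LCGNet abstract above can be sketched as follows: a regional time series is cut into overlapping windows and one correlation matrix is computed per window. The use of Pearson correlation, the window length, the stride, and the random test data are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def dynamic_fcn(ts, window, stride):
    """Build a dynamic FCN from a (timepoints, regions) array.

    Returns an array of shape (n_windows, regions, regions) holding one
    Pearson correlation matrix per window; windows overlap if stride < window.
    """
    n_timepoints = ts.shape[0]
    starts = range(0, n_timepoints - window + 1, stride)
    # np.corrcoef treats rows as variables, so each window is transposed.
    return np.stack([np.corrcoef(ts[s:s + window].T) for s in starts])

# Hypothetical data: 100 time points, 5 regions.
rng = np.random.default_rng(0)
ts = rng.standard_normal((100, 5))
fcn = dynamic_fcn(ts, window=30, stride=10)
```

Here `fcn` holds eight 5×5 matrices, one for each window starting at time points 0, 10, ..., 70.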
DAFT: A universal module to interweave tabular data and 3D images in CNNs.
Prior work on Alzheimer’s Disease (AD) has demonstrated that convolutional neural networks (CNNs) can leverage high-dimensional image information for diagnosing patients. Besides such data-driven approaches, many established biomarkers exist and are typically represented as tabular data, such as demographics, genetic alterations, or laboratory measurements from cerebrospinal fluid. However, little research has focused on the effective integration of tabular data into existing CNN architectures to improve patient diagnosis. We introduce the Dynamic Affine Feature Map Transform (DAFT), a general-purpose module for CNNs that incites or represses high-level concepts learned from a 3D image by conditioning feature maps of a convolutional layer on both a patient’s image and tabular clinical information. This is achieved by using an auxiliary neural network that outputs a scaling factor and offset to dynamically apply an affine transformation to the feature maps of a convolutional layer. In our experiments on AD diagnosis and time-to-dementia prediction, we show that DAFT is highly effective in combining 3D image and tabular information, achieving a mean balanced accuracy of 0.622 for diagnosis and a mean c-index of 0.748 for time-to-dementia prediction, thus outperforming all baseline methods. Finally, our extensive ablation study and empirical experiments reveal that the performance improvement due to DAFT is robust with respect to many design choices.
URL:
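The affine conditioning described in the DAFT abstract above can be illustrated with a small NumPy sketch: a per-channel scale and offset are predicted from a summary of the image features together with the tabular vector, then applied to the feature maps. A single linear map stands in for the auxiliary network, and the global average pooling, weight shapes, and names are assumptions rather than the published architecture.

```python
import numpy as np

def affine_feature_transform(feature_maps, tabular, w_scale, w_shift):
    """Apply a per-channel affine transform conditioned on image and tabular data.

    feature_maps: (C, D, H, W) output of a convolutional layer.
    tabular:      (P,) clinical variables.
    w_scale, w_shift: (C, C + P) hypothetical weights of a linear stand-in
                      for the auxiliary network.
    """
    pooled = feature_maps.mean(axis=(1, 2, 3))   # image summary, shape (C,)
    context = np.concatenate([pooled, tabular])  # shape (C + P,)
    scale = w_scale @ context                    # one scale per channel
    shift = w_shift @ context                    # one offset per channel
    return scale[:, None, None, None] * feature_maps + shift[:, None, None, None]

# Hypothetical example: 2 channels, 2x2x2 volumes, 3 tabular variables.
feature_maps = np.ones((2, 2, 2, 2))
tabular = np.array([1.0, 2.0, 3.0])
w_scale = np.zeros((2, 5))
w_scale[0, 0] = 2.0   # channel 0 is scaled by 2 * its pooled value
w_shift = np.zeros((2, 5))
w_shift[1, 4] = 1.0   # channel 1 is offset by the third tabular variable
out = affine_feature_transform(feature_maps, tabular, w_scale, w_shift)
```

In this toy example channel 0 is doubled and channel 1 is replaced by a constant offset of 3, showing how tabular values can amplify or suppress individual feature maps.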
A Bayesian group sparse multi-task regression model for imaging genetics.
MOTIVATION: Recent advances in technology for brain imaging and high-throughput genotyping have motivated studies examining the influence of genetic variation on brain structure. Wang et al. have developed an approach for the analysis of imaging genomic studies using penalized multi-task regression with regularization based on a novel group l2,1-norm penalty which encourages structured sparsity at both the gene level and SNP level. While incorporating a number of useful features, the proposed method only furnishes a point estimate of the regression coefficients; techniques for conducting statistical inference are not provided. A new Bayesian method is proposed here to overcome this limitation. RESULTS: We develop a Bayesian hierarchical modeling formulation where the posterior mode corresponds to the estimator proposed by Wang et al. and an approach that allows for full posterior inference including the construction of interval estimates for the regression parameters. We show that the proposed hierarchical model can be expressed as a three-level Gaussian scale mixture and this representation facilitates the use of a Gibbs sampling algorithm for posterior simulation. Simulation studies demonstrate that the interval estimates obtained using our approach achieve adequate coverage probabilities that outperform those obtained from the nonparametric bootstrap. Our proposed methodology is applied to the analysis of neuroimaging and genetic data collected as part of the Alzheimer’s Disease Neuroimaging Initiative (ADNI), and this analysis of the ADNI cohort demonstrates clearly the value added of incorporating interval estimation beyond only point estimation when relating SNPs to brain imaging endophenotypes. AVAILABILITY AND IMPLEMENTATION: Software and sample data is available as an R package ‘bgsmtr’ that can be downloaded from The Comprehensive R Archive Network (CRAN). CONTACT: nathoo@uvic.ca. 
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
URL:
Deep Learning of Static and Dynamic Brain Functional Networks for Early MCI Detection.
While convolutional neural networks (CNNs) have demonstrated a powerful ability to learn hierarchical spatial features from medical images, it is still difficult to apply them directly to resting-state functional MRI (rs-fMRI) and the derived brain functional networks (BFNs). We propose a novel CNN framework to simultaneously learn embedded features from BFNs for brain disease diagnosis. Since BFNs can be built by considering both static and dynamic functional connectivity (FC), we first decompose rs-fMRI into multiple static BFNs with modified independent component analysis. Then, the voxel-wise variability in dynamic FC is used to quantify BFN dynamics. A set of paired 3D images representing static/dynamic BFNs can be fed into 3D CNNs, from which we can hierarchically and simultaneously learn static/dynamic BFN features. As a result, the dynamic BFN features can complement static BFN features and, at the same time, different BFNs can help each other toward a better joint classification. We validate our method on a publicly accessible, large-cohort rs-fMRI dataset for early-stage mild cognitive impairment (eMCI) diagnosis, which is one of the most challenging problems for clinicians. Compared with a conventional method, our method improves diagnostic performance significantly, by almost 10%. This result demonstrates the effectiveness of deep learning in preclinical Alzheimer’s disease diagnosis, based on the complex and high-dimensional voxel-wise spatiotemporal patterns of the resting-state brain functional connectomics. The framework provides a new but intuitive way to fully exploit deeply embedded diagnostic features from rs-fMRI for a better-individualized diagnosis of various neurological diseases.
URL:
MC-RVAE: Multi-channel recurrent variational autoencoder for multimodal Alzheimer’s disease progression modelling.
The progression of neurodegenerative diseases, such as Alzheimer’s Disease, is the result of complex mechanisms interacting across multiple spatial and temporal scales. Understanding and predicting the longitudinal course of the disease requires harnessing the variability across different data modalities and time, which is extremely challenging. In this paper, we propose a model based on recurrent variational autoencoders that is able to capture cross-channel interactions between different modalities and model temporal information. These are achieved thanks to its multi-channel architecture and its shared latent variational space, parametrized with a recurrent neural network. We evaluate our model on both synthetic and real longitudinal datasets, the latter including imaging and non-imaging data, with N=897 subjects. Results show that our multi-channel recurrent variational autoencoder outperforms a set of baselines (KNN, random forest, and group factor analysis) for the task of reconstructing missing modalities, reducing the mean absolute error by 5% (w.r.t. the best baseline) for both subcortical volumes and cortical thickness. Our model is robust to missing features within each modality and is able to generate realistic synthetic imaging biomarkers trajectories from cognitive scores.
URL:
A distributed multitask multimodal approach for the prediction of Alzheimer’s disease in a longitudinal study.
Predicting the progression of Alzheimer’s Disease (AD) has been held back for decades by the lack of sufficient longitudinal data required for the development of novel machine learning algorithms. This study proposes a novel machine learning algorithm for predicting the progression of Alzheimer’s disease using a distributed multimodal, multitask learning method. More specifically, each individual task is defined as a regression model that predicts cognitive scores at a single time point. Since the prediction tasks for multiple intervals are related to each other in chronological order, multitask regression models have been developed to track the relationship between subsequent tasks. Furthermore, since subjects have various combinations of recording modalities together with other genetic, neuropsychological and demographic risk factors, special attention is given to the fact that each modality may exhibit a specific sparsity pattern. The model is hence generalized by exploiting multiple individual multitask regression coefficient matrices, one for each modality. The outcome of each independent modality-specific learner is then integrated with complementary information, known as risk factor parameters, revealing the most prevalent trends of the multimodal data. This new feature space is then used as input to the gradient boosting kernel in search of a more accurate prediction. The proposed model not only captures the complex relationships between the different feature representations, but also ignores any unrelated information that might skew the regression coefficients. Comparative assessments are made between the performance of the proposed method and that of several other well-established methods using different multimodal platforms. The results indicate that capturing the interrelatedness between the different modalities and extracting only relevant information in the data, even in an incomplete longitudinal dataset, yields minimized prediction errors.
URL:
Identifying quantitative trait loci via group-sparse multitask regression and feature selection: an imaging genetics study of the ADNI cohort.
MOTIVATION: Recent advances in high-throughput genotyping and brain imaging techniques enable new approaches to study the influence of genetic variation on brain structures and functions. Traditional association studies typically employ independent and pairwise univariate analysis, which treats single nucleotide polymorphisms (SNPs) and quantitative traits (QTs) as isolated units and ignores important underlying interacting relationships between the units. New methods are proposed here to overcome this limitation. RESULTS: Taking into account the interlinked structure within and between SNPs and imaging QTs, we propose a novel Group-Sparse Multi-task Regression and Feature Selection (G-SMuRFS) method to identify quantitative trait loci for multiple disease-relevant QTs and apply it to a study in mild cognitive impairment and Alzheimer’s disease. Built upon regression analysis, our model uses a new form of regularization, group l(2,1)-norm (G(2,1)-norm), to incorporate the biological group structures among SNPs induced from their genetic arrangement. The new G(2,1)-norm considers the regression coefficients of all the SNPs in each group with respect to all the QTs together and enforces sparsity at the group level. In addition, an l(2,1)-norm regularization is utilized to couple feature selection across multiple tasks to make use of the shared underlying mechanism among different brain regions. The effectiveness of the proposed method is demonstrated by both clearly improved prediction performance in empirical evaluations and a compact set of selected SNP predictors relevant to the imaging QTs. AVAILABILITY: Software is publicly available at: http://ranger.uta.edu/%7eheng/imaging-genetics/.
URL: http://ranger.uta.edu/%7eheng/imaging-genetics/.
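The two regularizers named in the G-SMuRFS abstract above can be written down directly for a coefficient matrix whose rows are SNPs and whose columns are QTs. The sketch below computes the l(2,1)-norm as the sum of row-wise l2 norms and the group l(2,1)-norm as the sum of Frobenius norms of the row blocks belonging to each SNP group. This follows the abstract's verbal description; the example matrix, the grouping, and the absence of any group weighting are assumptions.

```python
import numpy as np

def l21_norm(W):
    # Sum over SNPs (rows) of the l2 norm across all QTs (columns).
    return float(np.linalg.norm(W, axis=1).sum())

def group_l21_norm(W, groups):
    # Sum over SNP groups of the Frobenius norm of that group's rows.
    return float(sum(np.linalg.norm(W[idx]) for idx in groups))

# Hypothetical coefficients: 4 SNPs x 2 QTs, with SNPs grouped in pairs.
W = np.array([[3.0, 0.0],
              [4.0, 0.0],
              [0.0, 0.0],
              [0.0, 0.0]])
groups = [[0, 1], [2, 3]]

row_penalty = l21_norm(W)                  # 3 + 4 + 0 + 0
group_penalty = group_l21_norm(W, groups)  # sqrt(3^2 + 4^2) + 0
```

The second group contributes nothing to either penalty, which is the kind of group-level sparsity the G(2,1)-norm is meant to encourage.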
Multi-Modal Diagnosis of Alzheimer’s Disease using Interpretable Graph Convolutional Networks.
The interconnection between brain regions in neurological disease encodes vital information for the advancement of biomarkers and diagnostics. Although graph convolutional networks are widely applied for discovering brain connection patterns that point to disease conditions, the potential of connection patterns that arise from multiple imaging modalities has yet to be fully realized. In this paper, we propose a multi-modal sparse interpretable GCN framework (SGCN) for the detection of Alzheimer’s disease (AD) and its prodromal stage, known as mild cognitive impairment (MCI). In our experimentation, SGCN learned the sparse regional importance probability to find signature regions of interest (ROIs), and the connective importance probability to reveal disease-specific brain network connections. We evaluated SGCN on the Alzheimer’s Disease Neuroimaging Initiative database with multi-modal brain images and demonstrated that the ROI features learned by SGCN were effective for enhancing AD status identification. The identified abnormalities were significantly correlated with AD-related clinical symptoms. We further interpreted the identified brain dysfunctions at the level of large-scale neural systems and sex-related connectivity abnormalities in AD/MCI. The salient ROIs and the prominent brain connectivity abnormalities interpreted by SGCN are considerably important for developing novel biomarkers. These findings contribute to a better understanding of the network-based disorder via multi-modal diagnosis and offer the potential for precision diagnostics. The source code is available at https://github.com/Houliang-Zhou/SGCN.
URL: https://github.com/Houliang-Zhou/SGCN.
Classification of Brain Disorders in rs-fMRI via Local-to-Global Graph Neural Networks.
Recently, functional brain network has been used for the classification of brain disorders, such as Autism Spectrum Disorder (ASD) and Alzheimer’s disease (AD). Existing methods either ignore the non-imaging information associated with the subjects and the relationship between the subjects, or cannot identify and analyze disease-related local brain regions and biomarkers, leading to inaccurate classification results. This paper proposes a local-to-global graph neural network (LG-GNN) to address this issue. A local ROI-GNN is designed to learn feature embeddings of local brain regions and identify biomarkers, and a global Subject-GNN is then established to learn the relationship between the subjects with the embeddings generated by the local ROI-GNN and the non-imaging information. The local ROI-GNN contains a self-attention based pooling module to preserve the embeddings most important for the classification. The global Subject-GNN contains an adaptive weight aggregation block to generate the multi-scale feature embedding corresponding to each subject. The proposed LG-GNN is thoroughly validated using two public datasets for ASD and AD classification. The experimental results demonstrated that it achieves the state-of-the-art performance in terms of various evaluation metrics.
URL:
Learning Spatio-Temporal Model of Disease Progression with NeuralODEs from Longitudinal Volumetric Data.
Robust forecasting of the future anatomical changes inflicted by an ongoing disease is an extremely challenging task that is out of grasp even for experienced healthcare professionals. Such a capability, however, is of great importance since it can improve patient management by providing information on the speed of disease progression already at the admission stage, or it can enrich the clinical trials with fast progressors and avoid the need for control arms by the means of digital twins. In this work, we develop a deep learning method that models the evolution of age-related disease by processing a single medical scan and providing a segmentation of the target anatomy at a requested future point in time. Our method represents a time-invariant physical process and solves a large-scale problem of modeling temporal pixel-level changes utilizing NeuralODEs. In addition, we demonstrate the approaches to incorporate the prior domain-specific constraints into our method and define temporal Dice loss for learning temporal objectives. To evaluate the applicability of our approach across different age-related diseases and imaging modalities, we developed and tested the proposed method on the datasets with 967 retinal OCT volumes of 100 patients with Geographic Atrophy and 2823 brain MRI volumes of 633 patients with Alzheimer’s Disease. For Geographic Atrophy, the proposed method outperformed the related baseline models in the atrophy growth prediction. For Alzheimer’s Disease, the proposed method demonstrated remarkable performance in predicting the brain ventricle changes induced by the disease, achieving the state-of-the-art result on TADPOLE cross-sectional prediction challenge dataset.
URL:
Dual Attention Multi-Instance Deep Learning for Alzheimer’s Disease Diagnosis With Structural MRI.
Structural magnetic resonance imaging (sMRI), which can reflect variations in the brain, is widely used for the diagnosis of neurological brain diseases. However, because brain atrophy is local, only a few regions in sMRI scans show obvious structural changes that are highly correlated with pathological features. Hence, the key challenge of sMRI-based brain disease diagnosis is to enhance the identification of discriminative features. To address this issue, we propose a dual attention multi-instance deep learning network (DA-MIDL) for the early diagnosis of Alzheimer’s disease (AD) and its prodromal stage, mild cognitive impairment (MCI). Specifically, DA-MIDL consists of three primary components: 1) the Patch-Nets with spatial attention blocks for extracting discriminative features within each sMRI patch while enhancing the features of abnormally changed micro-structures in the cerebrum, 2) an attention multi-instance learning (MIL) pooling operation for balancing the relative contribution of each patch and yielding a global, differently weighted representation of the whole brain structure, and 3) an attention-aware global classifier for further learning the integral features and making the AD-related classification decisions. Our proposed DA-MIDL model is evaluated on the baseline sMRI scans of 1689 subjects from two independent datasets (i.e., ADNI and AIBL). The experimental results show that our DA-MIDL model can identify discriminative pathological locations and achieve better classification performance in terms of accuracy and generalizability, compared with several state-of-the-art methods.
URL:
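The attention MIL pooling operation mentioned in the DA-MIDL abstract above can be approximated by a softmax-weighted average of patch features. The sketch below scores each patch with a single linear vector, normalizes the scores with a softmax, and returns the weighted sum as the whole-brain representation. The linear scoring function and all names are assumptions; the paper's attention module is likely more elaborate.

```python
import numpy as np

def attention_mil_pool(patch_feats, w):
    """Pool patch features with softmax attention weights.

    patch_feats: (N, F) one feature vector per sMRI patch.
    w:           (F,) hypothetical scoring vector.
    Returns (weights, pooled) with shapes (N,) and (F,).
    """
    scores = patch_feats @ w
    scores = scores - scores.max()  # stabilize the exponentials
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights, weights @ patch_feats

# Hypothetical example: 3 patches with 2 features each.
patch_feats = np.array([[1.0, 0.0],
                        [0.0, 1.0],
                        [1.0, 1.0]])
uniform_w, pooled_mean = attention_mil_pool(patch_feats, np.zeros(2))
peaked_w, pooled_peak = attention_mil_pool(patch_feats, np.array([10.0, 10.0]))
```

With a zero scoring vector the pooling reduces to a plain average, while a large scoring vector concentrates almost all of the weight on the third patch.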
Relationship Induced Multi-Template Learning for Diagnosis of Alzheimer’s Disease and Mild Cognitive Impairment.
As shown in the literature, methods based on multiple templates usually achieve better performance, compared with those using only a single template for processing medical images. However, most existing multi-template based methods simply average or concatenate multiple sets of features extracted from different templates, which potentially ignores important structural information contained in the multi-template data. Accordingly, in this paper, we propose a novel relationship induced multi-template learning method for automatic diagnosis of Alzheimer’s disease (AD) and its prodromal stage, i.e., mild cognitive impairment (MCI), by explicitly modeling structural information in the multi-template data. Specifically, we first nonlinearly register each brain’s magnetic resonance (MR) image separately onto multiple pre-selected templates, and then extract multiple sets of features for this MR image. Next, we develop a novel feature selection algorithm by introducing two regularization terms to model the relationships among templates and among individual subjects. Using these selected features corresponding to multiple templates, we then construct multiple support vector machine (SVM) classifiers. Finally, an ensemble classification is used to combine outputs of all SVM classifiers, for achieving the final result. We evaluate our proposed method on 459 subjects from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database, including 97 AD patients, 128 normal controls (NC), 117 progressive MCI (pMCI) patients, and 117 stable MCI (sMCI) patients. The experimental results demonstrate promising classification performance, compared with several state-of-the-art methods for multi-template based AD/MCI classification.
URL:
Identification of molecular subtypes of dementia by using blood-proteins interaction-aware graph propagational network.
Plasma protein biomarkers have been considered promising tools for diagnosing dementia subtypes due to their low variability, cost-effectiveness, and minimal invasiveness in diagnostic procedures. Machine learning (ML) methods have been applied to enhance the accuracy of biomarker discovery. However, previous ML-based studies often overlook interactions between proteins, which are crucial in complex disorders like dementia. While protein-protein interactions (PPIs) have been used in network models, these models often fail to fully capture the diverse properties of PPIs due to their local awareness. This drawback increases the chance of neglecting critical components and magnifying the impact of noisy interactions. In this study, we propose a novel graph-based ML model for dementia subtype diagnosis, the graph propagational network (GPN). By propagating the independent effect of plasma proteins on the PPI network, the GPN extracts the globally interactive effects between proteins. Experimental results showed that the interactive effects between proteins further clarified the differences between dementia subtype groups and contributed to improved performance, with the GPN outperforming existing methods by 10.4% on average.
URL:
Identifying progressive imaging genetic patterns via multi-task sparse canonical correlation analysis: a longitudinal study of the ADNI cohort.
MOTIVATION: Identifying the genetic basis of brain structure, function and disorder by using imaging quantitative traits (QTs) as endophenotypes is an important task in brain science. Brain QTs often change over time as the disorder progresses, and thus understanding how genetic factors influence progressive brain QT changes is of great importance. Most existing imaging genetics methods only analyze the baseline neuroimaging data, and thus longitudinal imaging data across multiple time points, which contain important disease progression information, are omitted. RESULTS: We propose a novel temporal imaging genetic model which performs multi-task sparse canonical correlation analysis (T-MTSCCA). Our model uses longitudinal neuroimaging data to uncover how single nucleotide polymorphisms (SNPs) affect brain QTs over time. Incorporating the relationships within the longitudinal imaging data and within the SNPs, T-MTSCCA could identify a trajectory of progressive imaging genetic patterns over time. We propose an efficient algorithm to solve the problem and show its convergence. We evaluate T-MTSCCA on 408 subjects from the Alzheimer’s Disease Neuroimaging Initiative database with longitudinal magnetic resonance imaging data and genetic data available. The experimental results show that T-MTSCCA performs better than or on par with the state-of-the-art methods. In particular, T-MTSCCA could identify higher canonical correlation coefficients and capture clearer canonical weight patterns. This suggests that T-MTSCCA identifies time-consistent and time-dependent SNPs and imaging QTs, which further help in understanding the genetic basis of brain QT changes over time during disease progression. AVAILABILITY AND IMPLEMENTATION: The software and simulation data are publicly available at https://github.com/dulei323/TMTSCCA.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
URL: https://github.com/dulei323/TMTSCCA.
Latent Representation Learning for Alzheimer’s Disease Diagnosis With Incomplete Multi-Modality Neuroimaging and Genetic Data.
The fusion of complementary information contained in multi-modality data [e.g., magnetic resonance imaging (MRI), positron emission tomography (PET), and genetic data] has advanced the progress of automated Alzheimer’s disease (AD) diagnosis. However, multi-modality based AD diagnostic models are often hindered by the missing data, i.e., not all the subjects have complete multi-modality data. One simple solution used by many previous studies is to discard samples with missing modalities. However, this significantly reduces the number of training samples, thus leading to a sub-optimal classification model. Furthermore, when building the classification model, most existing methods simply concatenate features from different modalities into a single feature vector without considering their underlying associations. As features from different modalities are often closely related (e.g., MRI and PET features are extracted from the same brain region), utilizing their inter-modality associations may improve the robustness of the diagnostic model. To this end, we propose a novel latent representation learning method for multi-modality based AD diagnosis. Specifically, we use all the available samples (including samples with incomplete modality data) to learn a latent representation space. Within this space, we not only use samples with complete multi-modality data to learn a common latent representation, but also use samples with incomplete multi-modality data to learn independent modality-specific latent representations. We then project the latent representations to the label space for AD diagnosis. We perform experiments using 737 subjects from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database, and the experimental results verify the effectiveness of our proposed method.
URL:
Integrating spatial-anatomical regularization and structure sparsity into SVM: Improving interpretation of Alzheimer’s disease classification.
In recent years, machine learning approaches have been successfully applied to the field of neuroimaging for classification and regression tasks. However, many approaches do not give an intuitive relation between the raw features and the diagnosis. Therefore, they are difficult for clinicians to interpret. Moreover, most approaches treat the features extracted from the brain (for example, voxelwise gray matter concentration maps from brain MRI) as independent variables and ignore their spatial and anatomical relations. In this paper, we present a new Support Vector Machine (SVM)-based learning method for the classification of Alzheimer’s disease (AD). First, the method integrates spatial-anatomical information, so that spatially neighboring features in the same anatomical region are encouraged to have similar weights in the SVM model. Second, we introduce a group lasso penalty to induce structure sparsity, which may help clinicians to assess the key regions involved in the disease. For solving this learning problem, we use an accelerated proximal gradient descent approach. We tested our method on the subset of ADNI data selected by Cuingnet et al. (2011) for Alzheimer’s disease classification, as well as on an independent larger dataset from ADNI. Good classification performance is obtained for distinguishing cognitive normals (CN) vs. AD, as well as for distinguishing between various sub-types (e.g. CN vs. Mild Cognitive Impairment). The model trained on Cuingnet’s dataset for AD vs. CN classification was directly applied, without re-training, to the independent larger dataset. Good performance was achieved, demonstrating the generalizability of the proposed methods. For all experiments, the classification results are comparable to or better than the state-of-the-art, while the weight map more clearly indicates the key regions related to Alzheimer’s disease.
URL:
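The group lasso penalty in the abstract above is typically handled inside a proximal gradient method through block soft-thresholding. The following is a minimal sketch of that single proximal step only, not the authors' solver: the function name `group_soft_threshold`, the toy weight vector, and the two-group partition are all hypothetical, and the acceleration and spatial-anatomical terms are omitted.

```python
import numpy as np

def group_soft_threshold(w, groups, threshold):
    """Proximal operator of threshold * sum_g ||w_g||_2 (block soft-thresholding).

    Each group of weights is shrunk toward zero as a unit; a group whose
    Euclidean norm is at most `threshold` is set to zero entirely.
    """
    w = np.asarray(w, dtype=float)
    out = np.zeros_like(w)
    for idx in groups:
        idx = np.asarray(idx)
        norm = np.linalg.norm(w[idx])
        if norm > threshold:
            out[idx] = (1.0 - threshold / norm) * w[idx]
    return out

# Hypothetical example: two "regions", each with two features.
w = np.array([3.0, 4.0, 0.1, -0.2])
groups = [[0, 1], [2, 3]]
shrunk = group_soft_threshold(w, groups, threshold=1.0)
print(shrunk)
```

The first group (norm 5) is scaled by 0.8, while the second group (norm below the threshold) is zeroed out, which is the mechanism that removes whole regions from the weight map.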
A deep learning framework identifies dimensional representations of Alzheimer’s Disease from brain structure.
Heterogeneity of brain diseases is a challenge for precision diagnosis/prognosis. We describe and validate Smile-GAN (SeMI-supervised cLustEring-Generative Adversarial Network), a semi-supervised deep-clustering method, which examines neuroanatomical heterogeneity contrasted against normal brain structure, to identify disease subtypes through neuroimaging signatures. When applied to regional volumes derived from T1-weighted MRI (two studies; 2,832 participants; 8,146 scans) including cognitively normal individuals and those with cognitive impairment and dementia, Smile-GAN identified four patterns or axes of neurodegeneration. Applying this framework to longitudinal data revealed two distinct progression pathways. Measures of expression of these patterns predicted the pathway and rate of future neurodegeneration. Pattern expression offered complementary performance to amyloid/tau in predicting clinical progression. These deep-learning derived biomarkers offer potential for precision diagnostics and targeted clinical trial recruitment.
URL:
Multi-modal classification of neurodegenerative disease by progressive graph-based transductive learning.
Graph-based transductive learning (GTL) is a powerful machine learning technique that is used when sufficient training data is not available. In particular, conventional GTL approaches first construct a fixed inter-subject relation graph that is based on similarities in voxel intensity values in the feature domain, which can then be used to propagate the known phenotype data (i.e., clinical scores and labels) from the training data to the testing data in the label domain. However, this type of graph is exclusively learned in the feature domain, and primarily due to outliers in the observed features, may not be optimal for label propagation in the label domain. To address this limitation, a progressive GTL (pGTL) method is proposed that gradually finds an intrinsic data representation that more accurately aligns imaging features with the phenotype data. In general, optimal feature-to-phenotype alignment is achieved using an iterative approach that: (1) refines inter-subject relationships observed in the feature domain by using the learned intrinsic data representation in the label domain, (2) updates the intrinsic data representation from the refined inter-subject relationships, and (3) verifies the intrinsic data representation on the training data to guarantee an optimal classification when applied to testing data. Additionally, the iterative approach is extended to multi-modal imaging data to further improve pGTL classification accuracy. Using Alzheimer’s disease and Parkinson’s disease study data, the classification accuracy of the proposed pGTL method is compared to several state-of-the-art classification methods, and the results show pGTL can more accurately identify subjects, even at different progression stages, in these two study data sets.
URL:
Multiclass Classification of Alzheimer’s Disease Prodromal Stages using Sequential Feature Embeddings and Regularized Multikernel Support Vector Machine.
The detection of patients in the cognitive normal (CN), mild cognitive impairment (MCI), and Alzheimer’s disease (AD) stages of neurodegeneration is crucial for early treatment interventions. However, the heterogeneity of MCI data samples poses a challenge for CN vs. MCI vs. AD multiclass classification, as some samples are closer to AD while others are closer to CN in the feature space. Previous attempts to address this challenge produced inaccurate results, leading most frameworks to break the assessment into binary classification tasks such as AD vs. CN, AD vs. MCI, and CN vs. MCI. Other methods proposed sequential binary classifications such as CN vs. others and dividing others into AD vs. MCI. While those approaches may have yielded encouraging results, the sequential binary classification method makes interpretation and comparison with other frameworks challenging and subjective. Those frameworks exhibited varying accuracy scores for different binary tasks, making it unclear how to compare the model performance with other direct multiclass methods. Therefore, we introduce a classification framework comprising unsupervised ensemble manifold regularized sparse low-rank approximation and regularized multikernel support vector machine (SVM). This framework first extracts a joint feature embedding from MRI and PET neuroimaging features, which is then combined with the Apoe4, Adas11, MPACC digits, and Intracranial volume features using a regularized multikernel SVM. Using that framework, we achieved a state-of-the-art (SOTA) result in a CN vs. MCI vs. AD multiclass classification (mean accuracy: 84.87 ± 6.09, F1 score: 84.83 ± 6.12 vs 67.69). The methods generalize well to binary classification tasks, achieving SOTA results in all but the CN vs. MCI category, where the score was lower than the best by just 0.2%.
URL:
Predicting Cognitive Declines Using Longitudinally Enriched Representations for Imaging Biomarkers.
A critical challenge in using longitudinal neuroimaging data to study the progression of Alzheimer’s Disease (AD) is the varied number of missing records of the patients during the course of AD development. To tackle this problem, in this paper we propose a novel formulation to learn an enriched representation with fixed length for imaging biomarkers, which aims to simultaneously capture the information conveyed by both the baseline neuroimaging record and the progressive variations characterized by varied counts of available follow-up records over time. Because the learned biomarker representations are a set of fixed-length vectors, they can be readily used by traditional machine learning models to study AD development. Taking into account that the missing brain scans are not aligned in terms of time in a studied cohort, we develop a new objective that maximizes the ratio of the summations of a number of l1-norm distances for improved robustness, which, however, is difficult to solve efficiently in general. Thus, we derive a new efficient and non-greedy iterative solution algorithm and rigorously prove its convergence. We have performed extensive experiments on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) cohort. A clear performance gain has been achieved in predicting ten different cognitive scores when we compare the original baseline biomarker representations against the learned representations with longitudinal enrichments. We further observe that the top biomarkers selected by our new method are in accordance with existing knowledge in AD studies. These promising results demonstrate the improved performance of our new method and validate its effectiveness.
URL:
Deep Multi-Modal Discriminative and Interpretability Network for Alzheimer’s Disease Diagnosis.
Multi-modal fusion has become an important data analysis technology in Alzheimer’s disease (AD) diagnosis, which aims to effectively extract and utilize complementary information among different modalities. However, most of the existing fusion methods focus on pursuing a common feature representation by transformation, and ignore discriminative structural information among samples. In addition, most fusion methods use high-order feature extraction, such as deep neural networks, which makes it difficult to identify biomarkers. In this paper, we propose a novel method named deep multi-modal discriminative and interpretability network (DMDIN), which aligns samples in a discriminative common space and provides a new approach to identify significant brain regions (ROIs) in AD diagnosis. Specifically, we reconstruct each modality with a hierarchical representation through a multilayer perceptron (MLP), and take advantage of the shared self-expression coefficients constrained by diagonal blocks to embed the inter-class and intra-class structural information. Further, the generalized canonical correlation analysis (GCCA) is adopted as a correlation constraint to generate a discriminative common space, in which samples of the same category gather while samples of different categories stay apart. Finally, in order to enhance the interpretability of the deep learning model, we utilize knowledge distillation to reproduce coordinated representations and capture the influence of brain regions in AD classification. Experiments show that the proposed method performs better than several state-of-the-art methods in AD diagnosis.
URL:
Predicting Alzheimer’s disease progression using deep recurrent neural networks.
Early identification of individuals at risk of developing Alzheimer’s disease (AD) dementia is important for developing disease-modifying therapies. In this study, given multimodal AD markers and clinical diagnosis of an individual from one or more timepoints, we seek to predict the clinical diagnosis, cognition and ventricular volume of the individual for every month (indefinitely) into the future. We proposed and applied a minimal recurrent neural network (minimalRNN) model to data from The Alzheimer’s Disease Prediction Of Longitudinal Evolution (TADPOLE) challenge, comprising longitudinal data of 1677 participants (Marinescu et al., 2018) from the Alzheimer’s Disease Neuroimaging Initiative (ADNI). We compared the performance of the minimalRNN model and four baseline algorithms up to 6 years into the future. Most previous work on predicting AD progression ignores the issue of missing data, which is prevalent in longitudinal data. Here, we explored three different strategies to handle missing data. Two of the strategies treated the missing data as a “preprocessing” issue, by imputing the missing data using the previous timepoint (“forward filling”) or linear interpolation (“linear filling”). The third strategy utilized the minimalRNN model itself to fill in the missing data both during training and testing (“model filling”). Our analyses suggest that the minimalRNN with “model filling” compared favorably with baseline algorithms, including support vector machine/regression, linear state space (LSS) model, and long short-term memory (LSTM) model. Importantly, although the training procedure utilized longitudinal data, we found that the trained minimalRNN model exhibited similar performance when using only 1 input timepoint or 4 input timepoints, suggesting that our approach might work well with just cross-sectional data. An earlier version of our approach was ranked 5th (out of 53 entries) in the TADPOLE challenge in 2019.
The current approach is ranked 2nd out of 63 entries as of June 3rd, 2020.
URL:
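The two preprocessing strategies named in the minimalRNN abstract above, forward filling and linear filling, can be illustrated on a single biomarker series. This is a minimal sketch under stated assumptions: the function names, the visit times in months, and the toy values are hypothetical, and the "model filling" strategy is not shown because it depends on the trained network.

```python
import numpy as np

def forward_fill(values):
    """Replace each NaN with the most recent earlier value (leading NaNs stay NaN)."""
    filled = np.array(values, dtype=float)
    for i in range(1, len(filled)):
        if np.isnan(filled[i]):
            filled[i] = filled[i - 1]
    return filled

def linear_fill(times, values):
    """Linearly interpolate NaNs between observed visits.

    np.interp holds the first/last observed value constant outside the
    observed time range, so no extrapolation is performed.
    """
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    observed = ~np.isnan(values)
    return np.interp(times, times[observed], values[observed])

# Hypothetical series: visits at 0, 6, 12, 18 months, two visits missing.
months = [0, 6, 12, 18]
scores = [1.0, np.nan, np.nan, 4.0]
print(forward_fill(scores))
print(linear_fill(months, scores))
```

Forward filling carries the baseline value until the next observation, whereas linear filling spreads the change evenly over the gap, so the two can disagree substantially when visits are far apart.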
Self-supervised multimodal learning for group inferences from MRI data: Discovering disorder-relevant brain regions and multimodal links.
In recent years, deep learning approaches have gained significant attention in predicting brain disorders using neuroimaging data. However, conventional methods often rely on single-modality data and supervised models, which provide only a limited perspective of the intricacies of the highly complex brain. Moreover, the scarcity of accurate diagnostic labels in clinical settings hinders the applicability of the supervised models. To address these limitations, we propose a novel self-supervised framework for extracting multiple representations from multimodal neuroimaging data to enhance group inferences and enable analysis without resorting to labeled data during pre-training. Our approach leverages Deep InfoMax (DIM), a self-supervised methodology renowned for its efficacy in learning representations by estimating mutual information without the need for explicit labels. While DIM has shown promise in predicting brain disorders from single-modality MRI data, its potential for multimodal data remains untapped. This work extends DIM to multimodal neuroimaging data, allowing us to identify disorder-relevant brain regions and explore multimodal links. We present compelling evidence of the efficacy of our multimodal DIM analysis in uncovering disorder-relevant brain regions, including the hippocampus, caudate, and insula, and multimodal links with the thalamus, precuneus, and subthalamus hypothalamus. Our self-supervised representations demonstrate promising capabilities in predicting the presence of brain disorders across a spectrum of Alzheimer’s phenotypes. Comparative evaluations against state-of-the-art unsupervised methods based on autoencoders, canonical correlation analysis, and supervised models highlight the superiority of our proposed method in classification performance, in capturing joint information, and in interpretability.
The computational efficiency of the decoder-free strategy enhances its practical utility, as it saves compute resources without compromising performance. This work offers a significant step forward in addressing the challenge of understanding multimodal links in complex brain disorders, with potential applications in neuroimaging research and clinical diagnosis.
URL:
Designing weighted correlation kernels in convolutional neural networks for functional connectivity based brain disease diagnosis.
Functional connectivity networks (FCNs) based on functional magnetic resonance imaging (fMRI) have been widely applied to analyzing and diagnosing brain diseases, such as Alzheimer’s disease (AD) and its prodrome stage, i.e., mild cognitive impairment (MCI). Existing studies usually use Pearson correlation coefficient (PCC) method to construct FCNs, and then extract network measures (e.g., clustering coefficients) as features to learn a diagnostic model. However, the valuable observation information in network construction (e.g., specific contributions of different time points), as well as high-level and high-order network features are neglected in these studies. In this paper, we first define a novel weighted correlation kernel (called wc-kernel) to measure the correlation of brain regions, by which weighting factors are learned in a data-driven manner to characterize the contributions of different time points, thus conveying the richer interaction information among brain regions compared with the PCC method. Furthermore, we build a wc-kernel based convolutional neural network (CNN) (called wck-CNN) framework for learning the hierarchical (i.e., from local to global and also from low-level to high-level) features for disease diagnosis, by using fMRI data. Specifically, we first define a layer to build dynamic FCNs using our proposed wc-kernels. Then, we define another three layers to sequentially extract local (brain region specific), global (brain network specific) and temporal features from the constructed dynamic FCNs for classification. Experimental results on 174 subjects (a total of 563 scans) with rest-state fMRI (rs-fMRI) data from ADNI database demonstrate the efficacy of our proposed method.
URL:
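The contrast the wc-kernel abstract above draws with the Pearson correlation coefficient can be made concrete with a time-weighted correlation between two regional signals. This is only a sketch of the general idea: the function name and the fixed weights are hypothetical, and in the paper the weighting factors are learned from data rather than supplied by hand.

```python
import numpy as np

def weighted_correlation(x, y, weights):
    """Correlation of two time series where each time point has its own weight.

    With uniform weights this reduces to the ordinary Pearson correlation.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                       # normalize weights to sum to 1
    mean_x = np.sum(w * x)
    mean_y = np.sum(w * y)
    cov_xy = np.sum(w * (x - mean_x) * (y - mean_y))
    var_x = np.sum(w * (x - mean_x) ** 2)
    var_y = np.sum(w * (y - mean_y) ** 2)
    return cov_xy / np.sqrt(var_x * var_y)

# Hypothetical signals from two brain regions over 50 time points.
rng = np.random.default_rng(0)
region_a = rng.normal(size=50)
region_b = 0.5 * region_a + rng.normal(size=50)

uniform = weighted_correlation(region_a, region_b, np.ones(50))
pearson = np.corrcoef(region_a, region_b)[0, 1]
print(uniform, pearson)
```

Non-uniform weights let some time points contribute more than others to the connectivity estimate, which is the degree of freedom the PCC method lacks.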
Deep ensemble learning of sparse regression models for brain disease diagnosis.
Recent studies on brain imaging analysis witnessed the core roles of machine learning techniques in computer-assisted intervention for brain disease diagnosis. Of various machine-learning techniques, sparse regression models have proved their effectiveness in handling high-dimensional data but with a small number of training samples, especially in medical problems. In the meantime, deep learning methods have been making great successes by outperforming the state-of-the-art performances in various applications. In this paper, we propose a novel framework that combines the two conceptually different methods of sparse regression and deep learning for Alzheimer’s disease/mild cognitive impairment diagnosis and prognosis. Specifically, we first train multiple sparse regression models, each of which is trained with different values of a regularization control parameter. Thus, our multiple sparse regression models potentially select different feature subsets from the original feature set; thereby they have different powers to predict the response values, i.e., clinical label and clinical scores in our work. By regarding the response values from our sparse regression models as target-level representations, we then build a deep convolutional neural network for clinical decision making, which thus we call ‘Deep Ensemble Sparse Regression Network.’ To our best knowledge, this is the first work that combines sparse regression models with deep neural network. In our experiments with the ADNI cohort, we validated the effectiveness of the proposed method by achieving the highest diagnostic accuracies in three classification tasks. We also rigorously analyzed our results and compared with the previous studies on the ADNI cohort in the literature.
URL:
Cost-Sensitive Weighted Contrastive Learning Based on Graph Convolutional Networks for Imbalanced Alzheimer’s Disease Staging.
Identifying the progression stages of Alzheimer’s disease (AD) can be considered as an imbalanced multi-class classification problem in machine learning. It is challenging due to the class imbalance issue and the heterogeneity of the disease. Recently, graph convolutional networks (GCNs) have been successfully applied in AD classification. However, these works did not handle the class imbalance issue in classification. Besides, they ignore the heterogeneity of the disease. To this end, we propose a novel cost-sensitive weighted contrastive learning method based on graph convolutional networks (CSWCL-GCNs) for imbalanced AD staging using resting-state functional magnetic resonance imaging (rs-fMRI). The proposed method is developed on a multi-view graph constructed using the functional connectivity (FC) and high-order functional connectivity (HOFC) features of the subjects. A novel cost-sensitive weighted contrastive learning procedure is proposed to capture discriminative information from the minority classes, encouraging the samples in the minority class to provide adequate supervision. Considering the heterogeneity of the disease, the weights of the negative pairs are introduced into contrastive learning and they are computed based on the distance to class prototypes, which are automatically learned from the training data. Meanwhile, the cost-sensitive mechanism is further introduced into contrastive learning to handle the class imbalance issue. The proposed CSWCL-GCN is evaluated on 720 subjects (including 184 NCs, 40 SMC patients, 208 EMCI patients, 172 LMCI patients and 116 AD patients) from the ADNI (Alzheimer’s Disease Neuroimaging Initiative). Experimental results show that the proposed CSWCL-GCN outperforms state-of-the-art methods on the ADNI database.
URL:
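The class counts reported in the CSWCL-GCN abstract above show how imbalanced the staging problem is. One common way to make a loss cost-sensitive is to weight each class inversely to its frequency; the sketch below uses that scheme purely as an illustrative assumption, since the abstract does not state how the authors' cost-sensitive mechanism computes its weights. The function name is hypothetical.

```python
# Subject counts per class as reported in the abstract (720 subjects in total).
counts = {"NC": 184, "SMC": 40, "EMCI": 208, "LMCI": 172, "AD": 116}

def inverse_frequency_weights(class_counts):
    """Weight each class by total / (n_classes * count).

    A perfectly balanced dataset gives every class a weight of 1.0;
    rarer classes receive proportionally larger weights.
    """
    total = sum(class_counts.values())
    n_classes = len(class_counts)
    return {label: total / (n_classes * count)
            for label, count in class_counts.items()}

weights = inverse_frequency_weights(counts)
for label, weight in weights.items():
    print(f"{label}: {weight:.3f}")
```

Under this assumed scheme the SMC class, with only 40 subjects, receives the largest weight, and EMCI, the largest class, receives the smallest.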
Inferring protein expression changes from mRNA in Alzheimer’s dementia using deep neural networks.
Identifying the molecular systems and proteins that modify the progression of Alzheimer’s disease and related dementias (ADRD) is central to drug target selection. However, discordance between mRNA and protein abundance, and the scarcity of proteomic data, has limited our ability to advance candidate targets that are mainly based on gene expression. Therefore, by using a deep neural network that predicts protein abundance from mRNA expression, here we attempt to track the early protein drivers of ADRD. Specifically, by applying the clei2block deep learning model to 1192 brain RNA-seq samples, we identify protein modules and disease-associated expression changes that were not directly observed at the mRNA level. Moreover, pseudo-temporal trajectory inference based on the predicted proteome became more closely correlated with cognitive decline and hippocampal atrophy compared to RNA-based trajectories. This suggests that the predicted changes in protein expression could provide a better molecular representation of ADRD progression. Furthermore, overlaying clinical traits on protein pseudotime trajectory identifies protein modules altered before cognitive impairment. These results demonstrate how our method can be used to identify potential early protein drivers and possible drug targets for treating and/or preventing ADRD.
URL:
A deep learning model for early prediction of Alzheimer’s disease dementia based on hippocampal magnetic resonance imaging data.
INTRODUCTION: It is challenging at baseline to predict when and which individuals who meet criteria for mild cognitive impairment (MCI) will ultimately progress to Alzheimer’s disease (AD) dementia. METHODS: A deep learning method is developed and validated based on magnetic resonance imaging scans of 2146 subjects (803 for training and 1343 for validation) to predict MCI subjects’ progression to AD dementia in a time-to-event analysis setting. RESULTS: The deep-learning time-to-event model predicted individual subjects’ progression to AD dementia with a concordance index of 0.762 on 439 Alzheimer’s Disease Neuroimaging Initiative testing MCI subjects with follow-up duration from 6 to 78 months (quartiles: [24, 42, 54]) and a concordance index of 0.781 on 40 Australian Imaging Biomarkers and Lifestyle Study of Aging testing MCI subjects with follow-up duration from 18 to 54 months (quartiles: [18, 36, 54]). The predicted progression risk also clustered individual subjects into subgroups with significant differences in their progression time to AD dementia (P < .0002). Improved performance for predicting progression to AD dementia (concordance index = 0.864) was obtained when the deep learning-based progression risk was combined with baseline clinical measures. DISCUSSION: Our method provides a cost effective and accurate means for prognosis and potentially to facilitate enrollment in clinical trials with individuals likely to progress within a specific temporal period.
URL:
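The concordance index reported in the time-to-event abstract above measures how often a model ranks pairs of subjects correctly by risk. The following is a simplified sketch of Harrell's concordance index, not the evaluation code used in the study: the function name and the toy follow-up data are hypothetical, and pairs with tied event times are simply skipped.

```python
def concordance_index(times, events, risks):
    """Simplified Harrell's C-index.

    A pair (i, j) is comparable when subject i had the event and did so
    before subject j's last follow-up time. The pair is concordant when the
    subject who progressed earlier was assigned the higher predicted risk;
    tied risks count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical cohort: months to progression or censoring, event flag, risk.
times = [6, 12, 24, 36]
events = [1, 1, 0, 1]          # 0 marks a censored subject
risks = [0.9, 0.7, 0.2, 0.4]
print(concordance_index(times, events, risks))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for reported values such as 0.762 and 0.864.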
TR-GAN: Multi-Session Future MRI Prediction With Temporal Recurrent Generative Adversarial Network.
Magnetic Resonance Imaging (MRI) has been proven to be an efficient way to diagnose Alzheimer’s disease (AD). Recent dramatic progress on deep learning greatly promotes the MRI analysis based on data-driven CNN methods using a large-scale longitudinal MRI dataset. However, most of the existing MRI datasets are fragmented due to unexpected quits of volunteers. To tackle this problem, we propose a novel Temporal Recurrent Generative Adversarial Network (TR-GAN) to complete missing sessions of MRI datasets. Unlike existing GAN-based methods, which either fail to generate future sessions or only generate fixed-length sessions, TR-GAN takes all past sessions to recurrently and smoothly generate future ones with variant length. Specifically, TR-GAN adopts recurrent connection to deal with variant input sequence length and flexibly generate future variant sessions. Besides, we also design a multiple scale & location (MSL) module and a SWAP module to encourage the model to better focus on detailed information, which helps to generate high-quality MRI data. Compared with other popular GAN architectures, TR-GAN achieved the best performance in all evaluation metrics of two datasets. After expanding the Whole MRI dataset, the balanced accuracy of AD vs. cognitively normal (CN) vs. mild cognitive impairment (MCI) and stable MCI vs. progressive MCI classification can be increased by 3.61% and 4.00%, respectively.
URL:
View-aligned hypergraph learning for Alzheimer’s disease diagnosis with incomplete multi-modality data.
Effectively utilizing incomplete multi-modality data for the diagnosis of Alzheimer’s disease (AD) and its prodrome (i.e., mild cognitive impairment, MCI) remains an active area of research. Several multi-view learning methods have been recently developed for AD/MCI diagnosis by using incomplete multi-modality data, with each view corresponding to a specific modality or a combination of several modalities. However, existing methods usually ignore the underlying coherence among views, which may lead to sub-optimal learning performance. In this paper, we propose a view-aligned hypergraph learning (VAHL) method to explicitly model the coherence among views. Specifically, we first divide the original data into several views based on the availability of different modalities and then construct a hypergraph in each view space based on sparse representation. A view-aligned hypergraph classification (VAHC) model is then proposed, by using a view-aligned regularizer to capture coherence among views. We further assemble the class probability scores generated from VAHC, via a multi-view label fusion method for making a final classification decision. We evaluate our method on the baseline ADNI-1 database with 807 subjects and three modalities (i.e., MRI, PET, and CSF). Experimental results demonstrate that our method outperforms state-of-the-art methods that use incomplete multi-modality data for AD/MCI diagnosis.
URL:
Training recurrent neural networks robust to incomplete data: Application to Alzheimer’s disease progression modeling.
Disease progression modeling (DPM) using longitudinal data is a challenging machine learning task. Existing DPM algorithms neglect temporal dependencies among measurements, make parametric assumptions about biomarker trajectories, do not model multiple biomarkers jointly, and need an alignment of subjects’ trajectories. In this paper, recurrent neural networks (RNNs) are utilized to address these issues. However, in many cases, longitudinal cohorts contain incomplete data, which hinders the application of standard RNNs and requires a pre-processing step such as imputation of the missing values. Instead, we propose a generalized training rule for the most widely used RNN architecture, long short-term memory (LSTM) networks, that can handle both missing predictor and target values. The proposed LSTM algorithm is applied to model the progression of Alzheimer’s disease (AD) using six volumetric magnetic resonance imaging (MRI) biomarkers, i.e., volumes of ventricles, hippocampus, whole brain, fusiform, middle temporal gyrus, and entorhinal cortex, and it is compared to standard LSTM networks with data imputation and a parametric, regression-based DPM method. The results show that the proposed algorithm achieves a significantly lower mean absolute error (MAE) than the alternatives with p < 0.05 using Wilcoxon signed rank test in predicting values of almost all of the MRI biomarkers. Moreover, a linear discriminant analysis (LDA) classifier applied to the predicted biomarker values produces a significantly larger area under the receiver operating characteristic curve (AUC) of 0.90 vs. at most 0.84 with p < 0.001 using McNemar’s test for clinical diagnosis of AD. Inspection of MAE curves as a function of the amount of missing data reveals that the proposed LSTM algorithm achieves the best performance up until more than 74% missing values. Finally, it is illustrated how the method can successfully be applied to data with varying time intervals. 
This paper shows that built-in handling of missing values in training an LSTM network benefits the application of RNNs in neurodegenerative disease progression modeling in longitudinal cohorts.
URL:
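One ingredient of training on incomplete data, as described in the LSTM abstract above, is that missing target values should not contribute to the error. The sketch below shows only that masking idea with a mean absolute error; it is not the paper's generalized training rule, which also handles missing predictor values inside the network. The function name and the toy arrays are hypothetical, and missing values are assumed to be encoded as NaN.

```python
import numpy as np

def masked_mae(predictions, targets):
    """Mean absolute error computed over observed (non-NaN) targets only."""
    predictions = np.asarray(predictions, dtype=float)
    targets = np.asarray(targets, dtype=float)
    observed = ~np.isnan(targets)
    if not observed.any():
        return 0.0                         # nothing observed, nothing to penalize
    return float(np.mean(np.abs(predictions[observed] - targets[observed])))

# Hypothetical example: 2 visits x 2 biomarkers, one target value missing.
predicted = [[1.0, 2.0], [3.0, 4.0]]
measured = [[1.0, np.nan], [5.0, 4.0]]
print(masked_mae(predicted, measured))
```

Because the missing entry is excluded rather than imputed, the error is averaged over three observed values instead of four.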
Disease prediction with edge-variational graph convolutional networks.
The need for computational models that can incorporate imaging data with non-imaging data while investigating inter-subject associations arises in the task of population-based disease analysis. Although off-the-shelf deep convolutional neural networks have empowered representation learning from imaging data, incorporating data of different modalities complementarily in a unified model to improve the disease diagnostic quality is still challenging. In this work, we propose a generalizable graph-convolutional framework for population-based disease prediction on multi-modal medical data. Unlike previous methods constructing a static affinity population graph in a hand-crafting manner, the proposed framework can automatically learn to build a population graph with variational edges, which we show can be optimized jointly with spectral graph convolutional networks. In addition, to estimate the predictive uncertainty related to the constructed graph, we propose Monte-Carlo edge dropout uncertainty estimation. Experimental results on four multi-modal datasets demonstrate that the proposed method can substantially improve the predictive accuracy for Autism Spectrum Disorder, Alzheimer’s disease, and ocular diseases. A sufficient ablation study with in-depth discussion is conducted to evaluate the effectiveness of each component and the choice of algorithmic details of the proposed method. The results indicate the potential and extendability of the proposed framework in leveraging multi-modal data for population-based disease prediction.
URL:
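Monte-Carlo edge dropout, mentioned in the entry above, can be sketched on a toy graph: edges are randomly removed, a prediction is recomputed for each sampled graph, and the spread across samples serves as an uncertainty estimate. Everything here is an assumption for illustration (the names `propagate` and `mc_edge_dropout`, the single mean-aggregation step standing in for a trained graph convolutional network, and the default parameters); it is not the authors' implementation.

```python
import random
import statistics


def propagate(features, edges, num_nodes):
    """One round of mean aggregation over each node and its neighbours."""
    # A self-loop keeps the result defined for nodes that lose all edges.
    neighbours = {i: [i] for i in range(num_nodes)}
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    return [
        statistics.fmean([features[j] for j in neighbours[i]])
        for i in range(num_nodes)
    ]


def mc_edge_dropout(features, edges, drop_prob=0.2, num_samples=100, seed=0):
    """Return per-node mean and standard deviation over sampled graphs."""
    rng = random.Random(seed)
    num_nodes = len(features)
    samples = []
    for _ in range(num_samples):
        # Each edge is kept independently with probability 1 - drop_prob.
        kept = [edge for edge in edges if rng.random() >= drop_prob]
        samples.append(propagate(features, kept, num_nodes))
    means = [statistics.fmean([s[i] for s in samples]) for i in range(num_nodes)]
    stds = [statistics.pstdev([s[i] for s in samples]) for i in range(num_nodes)]
    return means, stds
```

A node whose output barely changes when edges are removed gets a small standard deviation; a node whose output depends heavily on particular edges gets a larger one.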
Latent diffusion model-based MRI superresolution enhances mild cognitive impairment prognostication and Alzheimer’s disease classification.
INTRODUCTION: Timely diagnosis and prognostication of Alzheimer’s disease (AD) and mild cognitive impairment (MCI) are pivotal for effective intervention. Artificial intelligence (AI) in neuroradiology may aid in such appropriate diagnosis and prognostication. This study aimed to evaluate the potential of novel diffusion model-based AI for enhancing AD and MCI diagnosis through superresolution (SR) of brain magnetic resonance (MR) images. METHODS: 1.5T brain MR scans of patients with AD or MCI and healthy controls (NC) from Alzheimer’s Disease Neuroimaging Initiative 1 (ADNI1) were superresolved to 3T using a novel diffusion model-based generative AI (d3T) and a convolutional neural network-based model (c3T). Comparisons of image quality to actual 1.5T and 3T MRI were conducted based on signal-to-noise ratio (SNR), naturalness image quality evaluator (NIQE), and Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE). Voxel-based volumetric analysis was then conducted to study whether 3T* images offered more accurate volumetry than 1.5T images. Binary and multiclass classifications of AD, MCI, and NC were conducted to evaluate whether 3T* images offered superior AD classification performance compared to actual 1.5T MRI. Moreover, CNN-based classifiers were used to predict conversion of MCI to AD, to evaluate the prognostication performance of 3T* images. The classification performances were evaluated using accuracy, sensitivity, specificity, F1 score, Matthews correlation coefficient (MCC), and area under the receiver operating characteristic curve (AUROC). RESULTS: Analysis of variance (ANOVA) detected significant differences in image quality among the 1.5T, c3T, d3T, and 3T groups across all metrics. Both c3T* and d3T* showed superior image quality compared to 1.5T MRI in NIQE and BRISQUE with statistical significance. 
While the hippocampal volumes measured in 3T* and 3T images were not significantly different, the hippocampal volume measured in 1.5T images showed significant difference. 3T-based AD classifications showed superior performance across all performance metrics compared to 1.5T-based AD classification. Classification performance between d3T and actual 3T was not significantly different. 3T* images offered superior accuracy in predicting the conversion of MCI to AD than 1.5T images did. CONCLUSIONS: The diffusion model-based MRI SR enhances the resolution of brain MR images, significantly improving diagnostic and prognostic accuracy for AD and MCI. Superresolved 3T* images closely matched actual 3T MRIs in quality and volumetric accuracy, and notably improved the prediction performance of conversion from MCI to AD.
URL:
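Several of the classification metrics named in the entry above (accuracy, sensitivity, specificity, F1 score, and MCC) follow directly from the four cells of a binary confusion matrix. The sketch below uses the standard textbook definitions; the function names and the convention of returning 0.0 when a denominator is zero are assumptions for illustration, and the counts in the usage note are made up rather than taken from the study.

```python
import math


def _safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a denominator is zero (sketch convention)."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = _safe_div(tp, tp + fn)  # true positive rate (recall)
    specificity = _safe_div(tn, tn + fp)  # true negative rate
    precision = _safe_div(tp, tp + fp)
    accuracy = _safe_div(tp + tn, tp + fp + tn + fn)
    f1 = _safe_div(2 * precision * sensitivity, precision + sensitivity)
    mcc = _safe_div(
        tp * tn - fp * fn,
        math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }
```

For example, `binary_metrics(tp=40, fp=10, tn=45, fn=5)` gives an accuracy of 0.85. MCC uses all four cells, which makes it more informative than accuracy when the classes are imbalanced.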
A machine learning approach to brain epigenetic analysis reveals kinases associated with Alzheimer’s disease.
Alzheimer’s disease (AD) is influenced by both genetic and environmental factors; thus, brain epigenomic alterations may provide insights into AD pathogenesis. Multiple array-based Epigenome-Wide Association Studies (EWASs) have identified robust brain methylation changes in AD; however, array-based assays only test about 2% of all CpG sites in the genome. Here, we develop EWASplus, a computational method that uses a supervised machine learning strategy to extend EWAS coverage to the entire genome. Application to six AD-related traits predicts hundreds of new significant brain CpGs associated with AD, some of which are further validated experimentally. EWASplus also performs well on data collected from independent cohorts and different brain regions. Genes found near top EWASplus loci are enriched for kinases and for genes with evidence for physical interactions with known AD genes. In this work, we show that EWASplus implicates additional epigenetic loci for AD that are not found using array-based AD EWASs.
URL:
Feature aggregation graph convolutional network based on imaging genetic data for diagnosis and pathogeny identification of Alzheimer’s disease.
The roles of brain region activity and gene expression in the development of Alzheimer’s disease (AD) remain unclear. Existing imaging genetic studies usually suffer from inefficient and inadequate fusion of data. This study proposes a novel deep learning method to efficiently capture the development pattern of AD. First, we model the interaction between brain regions and genes as node-to-node feature aggregation in a brain region-gene network. Second, we propose a feature aggregation graph convolutional network (FAGCN) to transmit and update the node features. Compared with the trivial graph convolutional procedure, we replace the input from the adjacency matrix with a weight matrix based on correlation analysis and consider common neighbor similarity to discover broader associations of nodes. Finally, we use a full-gradient saliency graph mechanism to score and extract the pathogenetic brain regions and risk genes. According to the results, FAGCN achieved the best performance among both traditional and cutting-edge methods and extracted AD-related brain regions and genes, providing theoretical and methodological support for the research of related diseases.
URL:
Inter-modality relationship constrained multi-modality multi-task feature selection for Alzheimer’s Disease and mild cognitive impairment identification.
Previous studies have demonstrated that the use of integrated information from multiple modalities could significantly improve diagnosis of Alzheimer’s Disease (AD). However, feature selection, which is one of the most important steps in classification, is typically performed separately for each modality, which ignores the potentially strong inter-modality relationship within each subject. The recent emergence of multi-task learning approaches makes joint feature selection from different modalities possible. However, joint feature selection may unfortunately overlook different yet complementary information conveyed by different modalities. We propose a novel multi-task feature selection method to preserve the complementary inter-modality information. Specifically, we treat feature selection from each modality as a separate task and further impose a constraint for preserving the inter-modality relationship, besides separately enforcing the sparseness of the selected features from each modality. After feature selection, a multi-kernel support vector machine (SVM) is further used to integrate the selected features from each modality for classification. Our method is evaluated using the baseline PET and MRI images of subjects obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database. Our method achieves good performance, with an accuracy of 94.37% and an area under the ROC curve (AUC) of 0.9724 for AD identification, and an accuracy of 78.80% and an AUC of 0.8284 for mild cognitive impairment (MCI) identification. Moreover, the proposed method achieves an accuracy of 67.83% and an AUC of 0.6957 for separating MCI converters from MCI non-converters (to AD). These performances demonstrate the superiority of the proposed method over the state-of-the-art classification methods.
URL:
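The multi-kernel SVM step in the entry above integrates features from several modalities. One common way to do this, assumed here rather than confirmed by the abstract, is to build one kernel matrix per modality and take a convex combination of them. The function names, the choice of linear kernels, and the toy feature values are all hypothetical.

```python
def linear_kernel(samples):
    """Gram matrix of pairwise dot products for one modality."""
    return [
        [sum(a * b for a, b in zip(x, y)) for y in samples]
        for x in samples
    ]


def combine_kernels(kernels, weights):
    """Convex combination of per-modality kernel matrices.

    Assumption for this sketch: weights are non-negative and sum to 1,
    which keeps the combined matrix a valid (positive semi-definite) kernel.
    """
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(kernels[0])
    return [
        [sum(w * k[i][j] for k, w in zip(kernels, weights)) for j in range(n)]
        for i in range(n)
    ]


# Toy data: two subjects, two MRI features and one PET feature each.
mri_features = [[1.0, 0.0], [0.0, 1.0]]
pet_features = [[2.0], [1.0]]
combined = combine_kernels(
    [linear_kernel(mri_features), linear_kernel(pet_features)],
    [0.5, 0.5],
)
```

The combined matrix can then be handed to any SVM implementation that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`; the modality weights are typically chosen by cross-validation.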
Morbigenous brain region and gene detection with a genetically evolved random neural network cluster approach in late mild cognitive impairment.
MOTIVATION: Multimodal data fusion analysis has become another important field for brain disease detection, and a growing body of research concentrates on using neural network algorithms to solve a range of problems. However, most current neural network optimization strategies focus on internal nodes or the number of hidden layers, while ignoring the advantages of external optimization. Additionally, in the multimodal data fusion analysis of brain science, the problems of small sample size and high-dimensional data are often encountered due to the difficulty of data collection and the specialization of brain science data, which may lower the generalization performance of neural networks. RESULTS: We propose a genetically evolved random neural network cluster (GERNNC) model. Specifically, the fusion characteristics are first constructed to be taken as the input, and the best type of neural network is selected as the base classifier to form the initial random neural network cluster. Second, the cluster is adaptively genetically evolved. Based on the GERNNC model, we further construct a multi-tasking framework for the classification of patients with brain disease and the extraction of significant characteristics. In a study of genetic data and functional magnetic resonance imaging data from the Alzheimer’s Disease Neuroimaging Initiative, the framework exhibits great classification performance and strong morbigenous factor detection ability. This work demonstrates how to effectively detect pathogenic components of brain disease from high-dimensional medical data and small samples. AVAILABILITY AND IMPLEMENTATION: The Matlab code is available at https://github.com/lizi1234560/GERNNC.git.
URL: https://github.com/lizi1234560/GERNNC.git.
Interpretable classification of Alzheimer’s disease pathologies with a convolutional neural network pipeline.
Neuropathologists assess vast brain areas to identify diverse and subtly differentiated morphologies. Standard semi-quantitative scoring approaches, however, are coarse-grained and lack precise neuroanatomic localization. We report a proof-of-concept deep learning pipeline that identifies specific neuropathologies (amyloid plaques and cerebral amyloid angiopathy) in immunohistochemically stained archival slides. Using automated segmentation of stained objects and a cloud-based interface, we annotate > 70,000 plaque candidates from 43 whole slide images (WSIs) to train and evaluate convolutional neural networks. Networks achieve strong plaque classification on a 10-WSI hold-out set (0.993 and 0.743 areas under the receiver operating characteristic and precision-recall curves, respectively). Prediction confidence maps visualize morphology distributions at high resolution. Resulting network-derived amyloid beta (Abeta)-burden scores correlate well with established semi-quantitative scores on a 30-WSI blinded hold-out. Finally, saliency mapping demonstrates that networks learn patterns agreeing with accepted pathologic features. This scalable means to augment a neuropathologist’s ability suggests a route to neuropathologic deep phenotyping.
URL:
Imaging-based enrichment criteria using deep learning algorithms for efficient clinical trials in mild cognitive impairment.
The mild cognitive impairment (MCI) stage of Alzheimer’s disease (AD) may be optimal for clinical trials to test potential treatments for preventing or delaying decline to dementia. However, MCI is heterogeneous in that not all cases progress to dementia within the time frame of a trial and some may not have underlying AD pathology. Identifying those MCIs who are most likely to decline during a trial and thus most likely to benefit from treatment will improve trial efficiency and power to detect treatment effects. To this end, using multimodal, imaging-derived, inclusion criteria may be especially beneficial. Here, we present a novel multimodal imaging marker that predicts future cognitive and neural decline from [F-18]fluorodeoxyglucose positron emission tomography (PET), amyloid florbetapir PET, and structural magnetic resonance imaging, based on a new deep learning algorithm (randomized denoising autoencoder marker, rDAm). Using ADNI2 MCI data, we show that using rDAm as a trial enrichment criterion reduces the required sample estimates by at least five times compared with the no-enrichment regime and leads to smaller trials with high statistical power, compared with existing methods.
URL:
Identifying disease sensitive and quantitative trait-relevant biomarkers from multidimensional heterogeneous imaging genetics data via sparse multimodal multitask learning.
MOTIVATION: Recent advances in brain imaging and high-throughput genotyping techniques enable new approaches to study the influence of genetic and anatomical variations on brain functions and disorders. Traditional association studies typically perform independent and pairwise analysis among neuroimaging measures, cognitive scores and disease status, and ignore the important underlying interacting relationships between these units. RESULTS: To overcome this limitation, in this article, we propose a new sparse multimodal multitask learning method to reveal complex relationships from gene to brain to symptom. Our main contributions are three-fold: (i) introducing combined structured sparsity regularizations into multimodal multitask learning to integrate multidimensional heterogeneous imaging genetics data and identify multimodal biomarkers; (ii) utilizing a joint classification and regression learning model to identify disease-sensitive and cognition-relevant biomarkers; (iii) deriving a new efficient optimization algorithm to solve our non-smooth objective function and providing rigorous theoretical analysis of its convergence to the global optimum. Using the imaging genetics data from the Alzheimer’s Disease Neuroimaging Initiative database, the effectiveness of the proposed method is demonstrated by clearly improved performance on predicting both cognitive scores and disease status. The identified multimodal biomarkers could predict not only disease status but also cognitive function to help elucidate the biological pathway from gene to brain structure and function, and to cognition and disease. AVAILABILITY: Software is publicly available at: http://ranger.uta.edu/%7eheng/multimodal/.
URL: http://ranger.uta.edu/%7eheng/multimodal/.
Multi-scale semi-supervised clustering of brain images: Deriving disease subtypes.
Disease heterogeneity is a significant obstacle to understanding pathological processes and delivering precision diagnostics and treatment. Clustering methods have gained popularity for stratifying patients into subpopulations (i.e., subtypes) of brain diseases using imaging data. However, unsupervised clustering approaches are often confounded by anatomical and functional variations not related to a disease or pathology of interest. Semi-supervised clustering techniques have been proposed to overcome this and, therefore, capture disease-specific patterns more effectively. An additional limitation of both unsupervised and semi-supervised conventional machine learning methods is that they typically model, learn and infer from data using a basis of feature sets pre-defined at a fixed anatomical or functional scale (e.g., atlas-based regions of interest). Herein we propose a novel method, “Multi-scAle heteroGeneity analysIs and Clustering” (MAGIC), to depict the multi-scale presentation of disease heterogeneity, which builds on a previously proposed semi-supervised clustering method, HYDRA. It derives multi-scale and clinically interpretable feature representations and exploits a double-cyclic optimization procedure to effectively drive identification of inter-scale-consistent disease subtypes. More importantly, to understand the conditions under which the clustering model can estimate true heterogeneity related to diseases, we conducted extensive and systematic semi-simulated experiments to evaluate the proposed method on a sizeable healthy control sample from the UK Biobank (N = 4403). We then applied MAGIC to imaging data from Alzheimer’s disease (ADNI, N = 1728) and schizophrenia (PHENOM, N = 1166) patients to demonstrate its potential and challenges in dissecting the neuroanatomical heterogeneity of common brain diseases. Taken together, we aim to provide guidance regarding when such analyses can succeed or should be taken with caution. 
The code of the proposed method is publicly available at https://github.com/anbai106/MAGIC.
URL: https://github.com/anbai106/MAGIC.
Machine learning-based quantification for disease uncertainty increases the statistical power of genetic association studies.
MOTIVATION: Increasingly large samples are key to identifying associations of genetic variants with Alzheimer’s disease (AD) in genome-wide association studies (GWAS). Accordingly, we aimed to develop a method that incorporates patients with mild cognitive impairment (MCI) and unknown cognitive status in GWAS using a machine learning-based AD prediction model. RESULTS: Simulation analyses showed that the weighting imputed phenotypes (WIP) method increased statistical power compared to ordinary logistic regression using only AD cases and controls. Applied to real-world data, the penalized logistic method had the highest AUC (0.96) for AD prediction and the WIP method performed well in terms of power. We identified an association (p < 5.0 × 10^-8) of AD with several variants in the APOE region and rs143625563 in LMX1A. Our method, which allows the inclusion of individuals with MCI, improves the statistical power of GWAS for AD. We discovered a novel association with LMX1A. AVAILABILITY AND IMPLEMENTATION: Simulation codes can be accessed at https://github.com/Junkkkk/wGEE_GWAS. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
URL: https://github.com/Junkkkk/wGEE_GWAS.
Multiscale deep neural network based analysis of FDG-PET images for the early diagnosis of Alzheimer’s disease.
Alzheimer’s disease (AD) is one of the most common neurodegenerative diseases, with a commonly seen prodromal mild cognitive impairment (MCI) phase in which memory loss is the main complaint, progressively worsening with behavioral issues and poor self-care. However, not all individuals clinically diagnosed with MCI progress to AD. A fraction of subjects with MCI either progress to non-AD dementia or remain stable at the MCI stage without progressing to dementia. Although a curative treatment of AD is currently unavailable, it is extremely important to correctly identify the individuals in the MCI phase that will go on to develop AD so that they may benefit from a curative treatment when one becomes available in the near future. At the same time, it would be highly desirable to also correctly identify those in the MCI phase that do not have AD pathology so they may be spared from unnecessary pharmacologic interventions that, at best, may provide them no benefit, and at worst, could further harm them with adverse side-effects. Additionally, it may be easier and simpler to identify the cause of the cognitive impairment in these non-AD cases, and hence proper identification of prodromal AD will be of benefit to these individuals as well. Fluorodeoxyglucose positron emission tomography (FDG-PET) captures the metabolic activity of the brain, and this imaging modality has been reported to identify changes related to AD prior to the onset of structural changes. Prior work on designing classifiers using FDG-PET imaging has been promising. Since deep learning has recently emerged as a powerful tool to mine features and use them for accurate labeling of the group membership of given images, we propose a novel deep-learning framework using FDG-PET metabolism imaging to identify subjects at the MCI stage with presymptomatic AD and discriminate them from other subjects with MCI (non-AD / non-progressive). 
Our multiscale deep neural network obtained 82.51% accuracy of classification just using measures from a single modality (FDG-PET metabolism data) outperforming other comparable FDG-PET classifiers published in the recent literature.
URL:
Hybrid representation learning for cognitive diagnosis in late-life depression over 5 years with structural MRI.
Late-life depression (LLD) is a highly prevalent mood disorder occurring in older adults and is frequently accompanied by cognitive impairment (CI). Studies have shown that LLD may increase the risk of Alzheimer’s disease (AD). However, the heterogeneity of presentation of geriatric depression suggests that multiple biological mechanisms may underlie it. Current biological research on LLD progression incorporates machine learning that combines neuroimaging data with clinical observations. There are few studies on incident cognitive diagnostic outcomes in LLD based on structural MRI (sMRI). In this paper, we describe the development of a hybrid representation learning (HRL) framework for predicting cognitive diagnosis over 5 years based on T1-weighted sMRI data. Specifically, we first extract prediction-oriented MRI features via a deep neural network, and then integrate them with handcrafted MRI features via a Transformer encoder for cognitive diagnosis prediction. Two tasks are investigated in this work, including (1) identifying cognitively normal subjects with LLD and never-depressed older healthy subjects, and (2) identifying LLD subjects who developed CI (or even AD) and those who stayed cognitively normal over five years. We validate the proposed HRL on 294 subjects with T1-weighted MRIs from two clinically harmonized studies. Experimental results suggest that the HRL outperforms several classical machine learning and state-of-the-art deep learning methods in LLD identification and prediction tasks.
URL:
Alzheimer’s disease diagnosis from multi-modal data via feature inductive learning and dual multilevel graph neural network.
Multi-modal data can provide complementary information on Alzheimer’s disease (AD) and its development from different perspectives. Such information is closely related to the diagnosis, prevention, and treatment of AD, and hence it is necessary and critical to study AD through multi-modal data. Existing learning methods, however, usually ignore the influence of feature heterogeneity and directly fuse features in the last stages. Furthermore, most of these methods only focus on local fusion features or global fusion features, neglecting the complementarity of features at different levels and thus not sufficiently leveraging information embedded in multi-modal data. To overcome these shortcomings, we propose a novel framework for AD diagnosis that fuses gene, imaging, protein, and clinical data. Our framework learns feature representations under the same feature space for different modalities through a feature induction learning (FIL) module, thereby alleviating the impact of feature heterogeneity. Furthermore, in our framework, local and global salient multi-modal feature interaction information at different levels is extracted through a novel dual multilevel graph neural network (DMGNN). We extensively validate the proposed method on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset, and experimental results demonstrate that our method consistently outperforms other state-of-the-art multi-modal fusion methods. The code is publicly available on GitHub (https://github.com/xiankantingqianxue/MIA-code.git).
URL: https://github.com/xiankantingqianxue/MIA-code.git
Disentangling brain atrophy heterogeneity in Alzheimer’s disease: a deep self-supervised approach with interpretable latent space.
Alzheimer’s disease (AD) is heterogeneous, but existing methods for capturing this heterogeneity through dimensionality reduction and unsupervised clustering have limitations when it comes to extracting intricate atrophy patterns. In this study, we propose a deep learning based self-supervised framework that characterizes complex atrophy features using latent space representation. It integrates feature engineering, classification, and clustering to synergistically disentangle heterogeneity in Alzheimer’s disease. Through this representation learning, we trained a clustered latent space with distinct atrophy patterns and clinical characteristics in AD, and replicated the findings in prodromal Alzheimer’s disease. Moreover, we discovered that these clusters are not solely attributed to subtypes but also reflect disease progression in the latent space, representing the core dimensions of heterogeneity, namely progression and subtypes. Furthermore, longitudinal latent space analysis revealed two distinct disease progression pathways: medial temporal and parietotemporal pathways. The proposed approach enables effective latent representations that can be integrated with individual-level cognitive profiles, thereby facilitating a comprehensive understanding of AD heterogeneity.
URL:
Cascaded Multi-Modal Mixing Transformers for Alzheimer’s Disease Classification with Incomplete Data.
Accurate medical classification requires a large amount of multi-modal data, and in many cases, different feature types. Previous studies have shown promising results when using multi-modal data, outperforming single-modality models when classifying diseases such as Alzheimer’s Disease (AD). However, those models are usually not flexible enough to handle missing modalities. Currently, the most common workaround is discarding samples with missing modalities, which leads to considerable data under-utilisation. Given that labelled medical images are already scarce, the performance of data-driven methods like deep learning can be severely hampered. Therefore, a multi-modal method that can handle missing data in various clinical settings is highly desirable. In this paper, we present Multi-Modal Mixing Transformer (3MT), a disease classification transformer that not only leverages multi-modal data but also handles missing data scenarios. In this work, we test 3MT for AD and cognitively normal (CN) classification and mild cognitive impairment (MCI) conversion prediction to progressive MCI (pMCI) or stable MCI (sMCI) using clinical and neuroimaging data. The model uses a novel Cascaded Modality Transformers architecture with cross-attention to incorporate multi-modal information for more informed predictions. We propose a novel modality dropout mechanism to ensure an unprecedented level of modality independence and robustness to handle missing data scenarios. The result is a versatile network that enables the mixing of arbitrary numbers of modalities with different feature types and also ensures full data utilization in missing data scenarios. The model is trained and evaluated on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset with state-of-the-art performance and further evaluated with The Australian Imaging Biomarker & Lifestyle Flagship Study of Ageing (AIBL) dataset with missing data.
URL:
Identifying the neuroanatomical basis of cognitive impairment in Alzheimer’s disease by correlation- and nonlinearity-aware sparse Bayesian learning.
Predicting cognitive performance of subjects from their magnetic resonance imaging (MRI) measures and identifying relevant imaging biomarkers are important research topics in the study of Alzheimer’s disease. Traditionally, this task is performed by formulating a linear regression problem. Recently, it has been found that using a linear sparse regression model can achieve better prediction accuracy. However, most existing studies only focus on exploiting the sparsity of regression coefficients, ignoring useful structure information in regression coefficients. Also, these linear sparse models may not capture more complicated and possibly nonlinear relationships between cognitive performance and MRI measures. Motivated by these observations, in this work we build a sparse multivariate regression model for this task and propose an empirical sparse Bayesian learning algorithm. Unlike existing sparse algorithms, the proposed algorithm models the response as a nonlinear function of the predictors by extending the predictor matrix with block structures. Further, it exploits not only inter-vector correlation among regression coefficient vectors, but also intra-block correlation in each regression coefficient vector. Experiments on the Alzheimer’s Disease Neuroimaging Initiative database showed that the proposed algorithm not only achieved better prediction performance than state-of-the-art competitive methods, but also effectively identified biologically meaningful patterns.
URL:
ICAM-Reg: Interpretable Classification and Regression With Feature Attribution for Mapping Neurological Phenotypes in Individual Scans.
An important goal of medical imaging is to be able to precisely detect patterns of disease specific to individual scans; however, this is challenged in brain imaging by the degree of heterogeneity of shape and appearance. Traditional methods, based on image registration, historically fail to detect variable features of disease, as they utilise population-based analyses, suited primarily to studying group-average effects. In this paper we therefore take advantage of recent developments in generative deep learning to develop a method for simultaneous classification, or regression, and feature attribution (FA). Specifically, we explore the use of a VAE-GAN (variational autoencoder - generative adversarial network) for translation called ICAM, to explicitly disentangle class-relevant features from background confounds, for improved interpretability and regression of neurological phenotypes. We validate our method on the tasks of Mini-Mental State Examination (MMSE) cognitive test score prediction for the Alzheimer’s Disease Neuroimaging Initiative (ADNI) cohort, as well as brain age prediction, for both neurodevelopment and neurodegeneration, using the developing Human Connectome Project (dHCP) and UK Biobank datasets. We show that the generated FA maps can be used to explain outlier predictions and demonstrate that the inclusion of a regression module improves the disentanglement of the latent space. Our code is freely available on GitHub https://github.com/CherBass/ICAM.
URL: https://github.com/CherBass/ICAM.
Predicting changes in brain metabolism and progression from mild cognitive impairment to dementia using multitask Deep Learning models and explainable AI.
BACKGROUND: The prediction of Alzheimer’s disease (AD) progression from its early stages is a research priority. In this context, the use of Artificial Intelligence (AI) in AD has experienced a notable surge in recent years. However, existing investigations predominantly concentrate on distinguishing clinical phenotypes through cross-sectional approaches. This study aims to investigate the potential of modeling additional dimensions of the disease, such as variations in brain metabolism assessed via [18F]-fluorodeoxyglucose positron emission tomography (FDG-PET), and utilize this information to identify patients with mild cognitive impairment (MCI) who will progress to dementia (pMCI). METHODS: We analyzed data from 1,617 participants from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) who had undergone at least one FDG-PET scan. We identified the brain regions with the most significant hypometabolism in AD and used Deep Learning (DL) models to predict future changes in brain metabolism. The best-performing model was then adapted under a multi-task learning framework to identify pMCI individuals. Finally, this model underwent further analysis using eXplainable AI (XAI) techniques. RESULTS: Our results confirm a strong association between hypometabolism, disease progression, and cognitive decline. Furthermore, we demonstrated that integrating data on changes in brain metabolism during training enhanced the models’ ability to detect pMCI individuals (sensitivity=88.4%, specificity=86.9%). Lastly, the application of XAI techniques enabled us to delve into the brain regions with the most significant impact on model predictions, highlighting the importance of the hippocampus, cingulate cortex, and some subcortical structures. CONCLUSION: This study introduces a novel dimension to predictive modeling in AD, emphasizing the importance of projecting variations in brain metabolism under a multi-task learning paradigm.
URL:
Integrative analysis of multi-omics and imaging data with incorporation of biological information via structural Bayesian factor analysis.
MOTIVATION: With the rapid development of modern technologies, massive data are available for the systematic study of Alzheimer’s disease (AD). Though many existing AD studies mainly focus on single-modality omics data, multi-omics datasets can provide a more comprehensive understanding of AD. To bridge this gap, we proposed a novel structural Bayesian factor analysis framework (SBFA) to extract the information shared by multi-omics data through the aggregation of genotyping data, gene expression data, neuroimaging phenotypes and prior biological network knowledge. Our approach can extract common information shared by different modalities and encourage biologically related features to be selected, guiding future AD research in a biologically meaningful way. METHOD: Our SBFA model decomposes the mean parameters of the data into a sparse factor loading matrix and a factor matrix, where the factor matrix represents the common information extracted from multi-omics and imaging data. Our framework is designed to incorporate prior biological network information. Our simulation study demonstrated that our proposed SBFA framework could achieve the best performance compared with the other state-of-the-art factor-analysis-based integrative analysis methods. RESULTS: We apply our proposed SBFA model together with several state-of-the-art factor analysis models to extract the latent common information from genotyping, gene expression and brain imaging data simultaneously from the ADNI biobank database. The latent information is then used to predict the functional activities questionnaire score, an important measurement for diagnosis of AD quantifying subjects’ abilities in daily life. Our SBFA model shows the best prediction performance compared with the other factor analysis models. AVAILABILITY: Code is publicly available at https://github.com/JingxuanBao/SBFA. CONTACT: qlong@upenn.edu.
URL: https://github.com/JingxuanBao/SBFA.
GenEpi: gene-based epistasis discovery using machine learning.
BACKGROUND: Genome-wide association studies (GWAS) provide a powerful means to identify associations between genetic variants and phenotypes. However, GWAS techniques for detecting epistasis, the interactions between genetic variants associated with phenotypes, are still limited. We believe that developing an efficient and effective GWAS method to detect epistasis will be key to discovering sophisticated pathogenesis, which is especially important for complex diseases such as Alzheimer’s disease (AD). RESULTS: In this regard, this study presents GenEpi, a computational package to uncover epistasis associated with phenotypes by the proposed machine learning approach. GenEpi identifies both within-gene and cross-gene epistasis through a two-stage modeling workflow. In both stages, GenEpi adopts two-element combinatorial encoding when producing features and constructs the prediction models by L1-regularized regression with stability selection. Results on simulated data showed that GenEpi outperforms other widely used methods in detecting the ground-truth epistasis. For real data, this study uses AD as an example to reveal the capability of GenEpi in finding disease-related variants and variant interactions that show both biological meaning and predictive power. CONCLUSIONS: The results on simulation data and AD demonstrated that GenEpi has the ability to detect the epistasis associated with phenotypes effectively and efficiently. The released package can be generalized to largely facilitate the studies of many complex diseases in the near future.
URL:
AD-Syn-Net: systematic identification of Alzheimer’s disease-associated mutation and co-mutation vulnerabilities via deep learning.
Alzheimer’s disease (AD) is one of the most challenging neurodegenerative diseases because of its complicated and progressive mechanisms, and multiple risk factors. Increasing research evidence demonstrates that genetics may be a key factor responsible for the occurrence of the disease. Although previous reports identified quite a few AD-associated genes, they were mostly limited owing to patient sample size and selection bias. There is a lack of comprehensive research aimed at identifying AD-associated risk mutations systematically. To address this challenge, we hereby construct a large-scale AD mutation and co-mutation framework (‘AD-Syn-Net’), and propose deep learning models named Deep-SMCI and Deep-CMCI configured with fully connected layers that are capable of predicting cognitive impairment of subjects effectively based on genetic mutation and co-mutation profiles. Next, we apply the customized frameworks to data sets to evaluate the importance scores of the mutations and identify mutation effectors and co-mutation combination vulnerabilities contributing to cognitive impairment. Furthermore, we evaluate the influence of mutation pairs on the network architecture to dissect the genetic organization of AD and identify novel co-mutations that could be responsible for dementia, laying a solid foundation for proposing future targeted therapy for AD precision medicine. Our deep learning model codes are available open access here: https://github.com/Pan-Bio/AD-mutation-effectors.
URL: https://github.com/Pan-Bio/AD-mutation-effectors.
Dementia key gene identification with multi-layered SNP-gene-disease network.
MOTIVATION: Recently, various approaches for diagnosing and treating dementia have received significant attention, especially in identifying key genes that are crucial for dementia. If the mutations of such key genes could be tracked, it would be possible to predict the time of onset of dementia and significantly aid in developing drugs to treat dementia. However, gene finding involves tremendous cost, time and effort. To alleviate these problems, research on utilizing computational biology to decrease the search space of candidate genes is actively conducted. In this study, we propose a framework in which diseases, genes and single-nucleotide polymorphisms are represented by a layered network, and key genes are predicted by a machine learning algorithm. The algorithm utilizes a network-based semi-supervised learning model that can be applied to layered data structures. RESULTS: The proposed method was applied to a dataset extracted from public databases related to diseases and genes with data collected from 186 patients. A portion of key genes obtained using the proposed method was verified in silico through PubMed literature, and the remaining genes were left as possible candidate genes. AVAILABILITY AND IMPLEMENTATION: The code for the framework will be available at http://www.alphaminers.net/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
URL: http://www.alphaminers.net/.
Gene-SGAN: discovering disease subtypes with imaging and genetic signatures via multi-view weakly-supervised deep clustering.
Disease heterogeneity has been a critical challenge for precision diagnosis and treatment, especially in neurologic and neuropsychiatric diseases. Many diseases can display multiple distinct brain phenotypes across individuals, potentially reflecting disease subtypes that can be captured using MRI and machine learning methods. However, biological interpretability and treatment relevance are limited if the derived subtypes are not associated with genetic drivers or susceptibility factors. Herein, we describe Gene-SGAN - a multi-view, weakly-supervised deep clustering method - which dissects disease heterogeneity by jointly considering phenotypic and genetic data, thereby conferring genetic correlations to the disease subtypes and associated endophenotypic signatures. We first validate the generalizability, interpretability, and robustness of Gene-SGAN in semi-synthetic experiments. We then demonstrate its application to real multi-site datasets from 28,858 individuals, deriving subtypes of Alzheimer’s disease and brain endophenotypes associated with hypertension, from MRI and single nucleotide polymorphism data. Derived brain phenotypes displayed significant differences in neuroanatomical patterns, genetic determinants, biological and clinical biomarkers, indicating potentially distinct underlying neuropathologic processes, genetic drivers, and susceptibility factors. Overall, Gene-SGAN is broadly applicable to disease subtyping and endophenotype discovery, and is herein tested on disease-related, genetically-associated neuroimaging phenotypes.
URL:
From phenotype to genotype: an association study of longitudinal phenotypic markers to Alzheimer’s disease relevant SNPs.
MOTIVATION: Imaging genetic studies typically focus on identifying single-nucleotide polymorphism (SNP) markers associated with imaging phenotypes. Few studies perform regression of SNP values on phenotypic measures for examining how the SNP values change when phenotypic measures are varied. This alternative approach may help us discover important imaging genetic associations from a different perspective. In addition, the imaging markers are often measured over time, and this longitudinal profile may provide increased power for differentiating genotype groups. How to identify the longitudinal phenotypic markers associated with disease-sensitive SNPs is an important and challenging research topic. RESULTS: Taking into account the temporal structure of the longitudinal imaging data and the interrelatedness among the SNPs, we propose a novel ‘task-correlated longitudinal sparse regression’ model to study the association between the phenotypic imaging markers and the genotypes encoded by SNPs. In our new association model, we extend the widely used l(2,1)-norm for matrices to tensors to jointly select imaging markers that have common effects across all the regression tasks and time points, and meanwhile impose the trace-norm regularization onto the unfolded coefficient tensor to achieve low rank such that the interrelationship among SNPs can be addressed. The effectiveness of our method is demonstrated by both clearly improved prediction performance in empirical evaluations and a compact set of selected imaging predictors relevant to disease-sensitive SNPs. AVAILABILITY: Software is publicly available at: http://ranger.uta.edu/%7eheng/Longitudinal/ CONTACT: heng@uta.edu or shenli@inpui.edu.
URL: http://ranger.uta.edu/%7eheng/Longitudinal/
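The l(2,1)-norm mentioned in the abstract above is the sum of the Euclidean norms of the rows of a coefficient matrix, so penalizing it drives whole rows (here, imaging markers) to zero together. A minimal sketch follows. The tensor version is only one plausible reading of "extending to tensors", assuming a layout of marker × task × time point in which each marker's full slice forms one group; it is not confirmed by the abstract, and both function names are hypothetical.

```python
import math


def l21_norm(matrix):
    """Sum over rows of each row's Euclidean (l2) norm."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in matrix)


def tensor_l21_norm(tensor):
    """Assumed extension: tensor[marker][task][time]; one group per marker."""
    return sum(
        math.sqrt(sum(v * v for task in marker for v in task))
        for marker in tensor
    )


# Rows are markers, columns are regression tasks (illustrative values).
W = [[3.0, 4.0], [0.0, 0.0], [5.0, 12.0]]
print(l21_norm(W))  # 5 + 0 + 13 = 18.0
```

A marker whose row (or slice) is entirely zero contributes nothing to the penalty, which is why minimizing it selects markers jointly across tasks and time points.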
Transcriptome-guided amyloid imaging genetic analysis via a novel structured sparse learning algorithm.
MOTIVATION: Imaging genetics is an emerging field that studies the influence of genetic variation on brain structure and function. The major task is to examine the association between genetic markers such as single-nucleotide polymorphisms (SNPs) and quantitative traits (QTs) extracted from neuroimaging data. The complexity of these datasets has presented critical bioinformatics challenges that require new enabling tools. Sparse canonical correlation analysis (SCCA) is a bi-multivariate technique used in imaging genetics to identify complex multi-SNP-multi-QT associations. However, most of the existing SCCA algorithms are designed using the soft thresholding method, which assumes that the input features are independent from one another. This assumption clearly does not hold for the imaging genetic data. In this article, we propose a new knowledge-guided SCCA algorithm (KG-SCCA) to overcome this limitation as well as improve learning results by incorporating valuable prior knowledge. RESULTS: The proposed KG-SCCA method is able to model two types of prior knowledge: one as a group structure (e.g. linkage disequilibrium blocks among SNPs) and the other as a network structure (e.g. gene co-expression network among brain regions). The new model incorporates these prior structures by introducing new regularization terms to encourage weight similarity between grouped or connected features. A new algorithm is designed to solve the KG-SCCA model without imposing the independence constraint on the input features. We demonstrate the effectiveness of our algorithm with both synthetic and real data. For real data, using an Alzheimer’s disease (AD) cohort, we examine the imaging genetic associations between all SNPs in the APOE gene (i.e. top AD gene) and amyloid deposition measures among cortical regions (i.e. a major AD hallmark). In comparison with a widely used SCCA implementation, our KG-SCCA algorithm produces not only improved cross-validation performances but also biologically meaningful results. AVAILABILITY: Software is freely available on request.
URL:
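The soft thresholding step that the abstract above attributes to most existing SCCA algorithms can be sketched as follows. This is the standard elementwise operator for an l1 penalty, not the KG-SCCA algorithm itself; the function name, weights and threshold are illustrative assumptions.

```python
def soft_threshold(x, lam):
    """Shrink x toward zero by lam; values with |x| <= lam become exactly 0."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


# Illustrative canonical weights and threshold (hypothetical values).
weights = [0.9, -0.2, 0.05, -1.5]
sparse_weights = [soft_threshold(w, 0.25) for w in weights]
print(sparse_weights)
```

Each weight is shrunk using only its own value, with no reference to correlated neighbours. This may be the sense in which the abstract describes such updates as treating input features as independent.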
Deep recurrent model for individualized prediction of Alzheimer’s disease progression.
Alzheimer’s disease (AD) is known as one of the major causes of dementia and is characterized by slow progression over several years, with no treatments or available medicines. In this regard, there have been efforts to identify the risk of developing AD at its earliest stage. While many of the previous works considered cross-sectional analysis, more recent studies have focused on the diagnosis and prognosis of AD with longitudinal or time series data by means of disease progression modeling. Under the same problem settings, in this work, we propose a novel computational framework that can predict the phenotypic measurements of MRI biomarkers and trajectories of clinical status along with cognitive scores at multiple future time points. However, time series data generally contain many unexpected missing observations. To handle this situation, we define a secondary problem of estimating those missing values and tackle it in a systematic way by taking account of temporal and multivariate relations inherent in time series data. Concretely, we propose a deep recurrent network that jointly tackles the four problems of (i) missing value imputation, (ii) phenotypic measurements forecasting, (iii) trajectory estimation of a cognitive score, and (iv) clinical status prediction of a subject based on his/her longitudinal imaging biomarkers. Notably, the learnable parameters of all the modules in our predictive models are trained in an end-to-end manner by taking the morphological features and cognitive scores as input, with our circumspectly defined loss function. In our experiments over The Alzheimers Disease Prediction Of Longitudinal Evolution (TADPOLE) challenge cohort, we measured performance for various metrics and compared our method to competing methods in the literature. Exhaustive analyses and ablation studies were also conducted to better confirm the effectiveness of our method.
URL:
Trans-channel fluorescence learning improves high-content screening for Alzheimer’s disease therapeutics.
In microscopy-based drug screens, fluorescent markers carry critical information on how compounds affect different biological processes. However, practical considerations, such as the labor and preparation formats needed to produce different image channels, hinder the use of certain fluorescent markers. Consequently, completed screens may lack biologically informative but experimentally impractical markers. Here, we present a deep learning method for overcoming these limitations. We accurately generated predicted fluorescent signals from other related markers and validated this new machine learning (ML) method on two biologically distinct datasets. We used the ML method to improve the selection of biologically active compounds for Alzheimer’s disease (AD) from a completed high-content high-throughput screen (HCS) that had only contained the original markers. The ML method identified novel compounds that effectively blocked tau aggregation, which had been missed by traditional screening approaches unguided by ML. The method improved triaging efficiency of compound rankings over conventional rankings by raw image channels. We reproduced this ML pipeline on a biologically independent cancer-based dataset, demonstrating its generalizability. The approach is disease-agnostic and applicable across diverse fluorescence microscopy datasets.
URL:
Deep mixed model for marginal epistasis detection and population stratification correction in genome-wide association studies.
BACKGROUND: Genome-wide Association Studies (GWAS) have contributed to unraveling associations between genetic variants in the human genome and complex traits for more than a decade. While many methods have been developed as follow-ups to detect interactions between SNPs, epistasis is yet to be modeled and discovered more thoroughly. RESULTS: In this paper, following the previous study of detecting marginal epistasis signals, and motivated by the universal approximation power of deep learning, we propose a neural network method that can potentially model arbitrary interactions between SNPs in genetic association studies as an extension to the mixed models in correcting confounding factors. Our method, namely Deep Mixed Model, consists of two components: 1) a confounding factor correction component, which is a large-kernel convolutional neural network that focuses on calibrating the residual phenotypes by removing factors such as population stratification, and 2) a fixed-effect estimation component, which mainly consists of a Long Short-Term Memory (LSTM) model that estimates the association effect size of SNPs with the residual phenotype. CONCLUSIONS: After validating the performance of our method using simulation experiments, we further apply it to Alzheimer’s disease data sets. Our results help gain some exploratory understanding of the genetic architecture of Alzheimer’s disease.
URL:
Tissue-specific network-based genome wide study of amygdala imaging phenotypes to identify functional interaction modules.
MOTIVATION: Network-based genome-wide association studies (GWAS) aim to identify functional modules from biological networks that are enriched by top GWAS findings. Although gene functions are relevant to tissue context, most existing methods analyze tissue-free networks without reflecting phenotypic specificity. RESULTS: We propose a novel module identification framework for imaging genetic studies using the tissue-specific functional interaction network. Our method includes three steps: (i) re-prioritize imaging GWAS findings by applying machine learning methods to incorporate network topological information and enhance the connectivity among top genes; (ii) detect densely connected modules based on interactions among top re-prioritized genes; and (iii) identify phenotype-relevant modules enriched by top GWAS findings. We demonstrate our method on the GWAS of [18F]FDG-PET measures in the amygdala region using the imaging genetic data from the Alzheimer’s Disease Neuroimaging Initiative, and map the GWAS results onto the amygdala-specific functional interaction network. The proposed network-based GWAS method can effectively detect densely connected modules enriched by top GWAS findings. Tissue-specific functional network can provide precise context to help explore the collective effects of genes with biologically meaningful interactions specific to the studied phenotype. AVAILABILITY AND IMPLEMENTATION: The R code and sample data are freely available at http://www.iu.edu/shenlab/tools/gwasmodule/. CONTACT: shenli@iu.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
URL: http://www.iu.edu/shenlab/tools/gwasmodule/.
Multimodal data fusion based on IGERNNC algorithm for detecting pathogenic brain regions and genes in Alzheimer’s disease.
At present, the study of the pathogenesis of Alzheimer’s disease (AD) by multimodal data fusion analysis has attracted wide attention. Multimodal medical data often suffer from small sample sizes and high dimensionality. In view of these characteristics, the existing genetic evolution random neural network cluster (GERNNC) model combines a genetic evolution algorithm and neural networks for the classification of AD patients and the extraction of pathogenic factors. However, the model does not take into account the non-linear relationship between brain regions and genes, or the problem that the genetic evolution algorithm can fall into local optimal solutions, so the overall performance of the model is not satisfactory. To solve these two problems, this paper improves the construction of fusion features and the genetic evolution algorithm in the GERNNC model, and proposes an improved genetic evolution random neural network cluster (IGERNNC) model. The IGERNNC model uses a mutual information correlation analysis method to combine resting-state functional magnetic resonance imaging data with single nucleotide polymorphism data for the construction of fusion features. Based on the traditional genetic evolution algorithm, an elite retention strategy and a large variation genetic algorithm are added to prevent the model from falling into a local optimal solution. Through multiple independent experimental comparisons, the IGERNNC model can more effectively identify AD patients and extract relevant pathogenic factors, and is expected to become an effective tool in the field of AD research.
URL:
A simulative deep learning model of SNP interactions on chromosome 19 for predicting Alzheimer’s disease risk and rates of disease progression.
BACKGROUND: Identifying genetic patterns that contribute to Alzheimer’s disease (AD) is important not only for pre-symptomatic risk assessment but also for building personalized therapeutic strategies. METHODS: We applied a novel simulative deep learning model to chromosome 19 genetic data from the Alzheimer’s Disease Neuroimaging Initiative and the Imaging and Genetic Biomarkers of Alzheimer’s Disease datasets. The model quantified the contribution of each single nucleotide polymorphism (SNP) and their epistatic impact on the likelihood of AD using the occlusion method. The top 35 AD-risk SNPs in chromosome 19 were identified, and their ability to predict the rate of AD progression was analyzed. RESULTS: Rs561311966 (APOC1) and rs2229918 (ERCC1/CD3EAP) were recognized as the most powerful factors influencing AD risk. The top 35 chromosome 19 AD-risk SNPs were significant predictors of AD progression. DISCUSSION: The model successfully estimated the contribution of AD-risk SNPs that account for AD progression at the individual level. This can help in building preventive precision medicine.
URL:
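The occlusion method named in the abstract above scores an input feature by how much the model's prediction changes when that feature is masked. A minimal sketch, assuming a logistic-regression stand-in for the paper's deep model and a baseline value of 0 for an occluded SNP; all names, weights and inputs are hypothetical:

```python
import math


def predict(features, weights, bias):
    """Hypothetical stand-in model: logistic regression over SNP dosages."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def occlusion_scores(features, weights, bias, baseline=0.0):
    """Change in predicted risk when each feature is replaced by a baseline."""
    full = predict(features, weights, bias)
    scores = []
    for i in range(len(features)):
        occluded = list(features)
        occluded[i] = baseline  # mask one SNP, leave the others untouched
        scores.append(full - predict(occluded, weights, bias))
    return scores


# Illustrative dosages (0, 1 or 2 copies of an allele) and weights.
scores = occlusion_scores([2, 1, 0], weights=[1.0, -0.5, 3.0], bias=0.0)
print(scores)
```

A positive score means the feature raised the predicted risk for this individual, and a negative score means it lowered it. The same idea would extend to pairs of SNPs by occluding two features at once, though how the paper measures epistatic impact is not specified in the abstract.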
Alzheimer’s disease diagnosis from multi-modal data via feature inductive learning and dual multilevel graph neural network.
Multi-modal data can provide complementary information on Alzheimer’s disease (AD) and its development from different perspectives. Such information is closely related to the diagnosis, prevention, and treatment of AD, and hence it is necessary and critical to study AD through multi-modal data. Existing learning methods, however, usually ignore the influence of feature heterogeneity and directly fuse features in the last stages. Furthermore, most of these methods only focus on local fusion features or global fusion features, neglecting the complementariness of features at different levels and thus not sufficiently leveraging information embedded in multi-modal data. To overcome these shortcomings, we propose a novel framework for AD diagnosis that fuses gene, imaging, protein, and clinical data. Our framework learns feature representations under the same feature space for different modalities through a feature induction learning (FIL) module, thereby alleviating the impact of feature heterogeneity. Furthermore, in our framework, local and global salient multi-modal feature interaction information at different levels is extracted through a novel dual multilevel graph neural network (DMGNN). We extensively validate the proposed method on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset and experimental results demonstrate our method consistently outperforms other state-of-the-art multi-modal fusion methods. The code is publicly available on GitHub (https://github.com/xiankantingqianxue/MIA-code.git).
URL: https://github.com/xiankantingqianxue/MIA-code.git
White matter hyperintensities segmentation using the ensemble U-Net with multi-scale highlighting foregrounds.
White matter hyperintensities (WMHs) are abnormal signals within the white matter region on human brain MRI and have been associated with aging processes, cognitive decline, and dementia. In the current study, we proposed a U-Net with multi-scale highlighting foregrounds (HF) for WMHs segmentation. Our method, U-Net with HF, is designed to improve the detection of the WMH voxels with partial volume effects. We evaluated the segmentation performance of the proposed approach using the Challenge training dataset. Then we assessed the clinical utility of the WMH volumes that were automatically computed using our method and the Alzheimer’s Disease Neuroimaging Initiative database. We demonstrated that the U-Net with HF significantly improved the detection of the WMH voxels at the boundary of the WMHs or in small WMH clusters quantitatively and qualitatively. To date, the proposed method has achieved the best overall evaluation scores, the highest dice similarity index, and the best F1-score among 39 methods submitted to the WMH Segmentation Challenge that was initially hosted by MICCAI 2017 and is continuously accepting new challengers. The evaluation of the clinical utility showed that the WMH volume that was automatically computed using U-Net with HF was significantly associated with cognitive performance and improved the classification between cognitively normal and Alzheimer’s disease subjects and between patients with mild cognitive impairment and those with Alzheimer’s disease. The implementation of our proposed method is publicly available using Dockerhub (https://hub.docker.com/r/wmhchallenge/pgs).
URL: https://hub.docker.com/r/wmhchallenge/pgs
Inferring brain causal and temporal-lag networks for recognizing abnormal patterns of dementia.
Brain functional network analysis has become a popular method to explore the laws of brain organization and identify biomarkers of neurological diseases. However, it is still a challenging task to construct an ideal brain network due to the limited understanding of the human brain. Existing methods often ignore the impact of temporal-lag on the results of brain network modeling, which may lead to some unreliable conclusions. To overcome this issue, we propose a novel brain functional network estimation method, which can simultaneously infer the causal mechanisms and temporal-lag values among brain regions. Specifically, our method converts the lag learning into an instantaneous effect estimation problem, and further embeds the search objectives into a deep neural network model as parameters to be learned. To verify the effectiveness of the proposed estimation method, we perform experiments on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database by comparing the proposed model with several existing methods, including correlation-based and causality-based methods. The experimental results show that our brain networks constructed by the proposed estimation method can not only achieve promising classification performance, but also exhibit some characteristics of physiological mechanisms. Our approach provides a new perspective for understanding the pathogenesis of brain diseases. The source code is released at https://github.com/NJUSTxiazw/CTLN.
URL: https://github.com/NJUSTxiazw/CTLN.
HyperTMO: a trusted multi-omics integration framework based on hypergraph convolutional network for patient classification.
MOTIVATION: The rapid development of high-throughput biomedical technologies can provide researchers with detailed multi-omics data. The multi-omics integrated analysis approach based on machine learning contributes a more comprehensive perspective to human disease research. However, there are still significant challenges in representing single-omics data and integrating multi-omics information. RESULTS: This paper presents HyperTMO, a Trusted Multi-Omics integration framework based on Hypergraph convolutional network for patient classification. HyperTMO constructs hypergraph structures to represent the association between samples in single-omics data, then evidence extraction is performed by hypergraph convolutional network, and multi-omics information is integrated at an evidence level. Lastly, we experimentally demonstrate that HyperTMO outperforms other state-of-the-art methods in breast cancer subtype classification and Alzheimer’s disease classification tasks using multi-omics data from TCGA (BRCA) and ROSMAP datasets. Importantly, HyperTMO is the first attempt to integrate hypergraph structure, evidence theory, and multi-omics integration for patient classification. Its accurate and robust properties bring great potential for applications in clinical diagnosis. AVAILABILITY: HyperTMO and datasets are publicly available at https://github.com/ippousyuga/HyperTMO. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
URL: https://github.com/ippousyuga/HyperTMO.
Hierarchical feature representation and multimodal fusion with deep learning for AD/MCI diagnosis.
For the last decade, it has been shown that neuroimaging can be a potential tool for the diagnosis of Alzheimer’s Disease (AD) and its prodromal stage, Mild Cognitive Impairment (MCI), and that fusion of different modalities can further provide complementary information to enhance diagnostic accuracy. Here, we focus on the problems of both feature representation and fusion of multimodal information from Magnetic Resonance Imaging (MRI) and Positron Emission Tomography (PET). To the best of our knowledge, the previous methods in the literature mostly used hand-crafted features such as cortical thickness, gray matter densities from MRI, or voxel intensities from PET, and then combined these multimodal features by simply concatenating into a long vector or transforming into a higher-dimensional kernel space. In this paper, we propose a novel method for a high-level latent and shared feature representation from neuroimaging modalities via deep learning. Specifically, we use Deep Boltzmann Machine (DBM), a deep network with a restricted Boltzmann machine as a building block, to find a latent hierarchical feature representation from a 3D patch, and then devise a systematic method for a joint feature representation from the paired patches of MRI and PET with a multimodal DBM. To validate the effectiveness of the proposed method, we performed experiments on the ADNI dataset and compared with the state-of-the-art methods. In three binary classification problems of AD vs. healthy Normal Control (NC), MCI vs. NC, and MCI converter vs. MCI non-converter, we obtained the maximal accuracies of 95.35%, 85.67%, and 74.58%, respectively, outperforming the competing methods. By visual inspection of the trained model, we observed that the proposed method could hierarchically discover the complex latent patterns inherent in both MRI and PET.
URL:
Deep learning-based EEG analysis to classify normal, mild cognitive impairment, and dementia: Algorithms and dataset.
For automatic EEG diagnosis, this paper presents a new EEG data set with well-organized clinical annotations called Chung-Ang University Hospital EEG (CAUEEG), which has event history, patient’s age, and corresponding diagnosis labels. We also designed two reliable evaluation tasks for the low-cost, non-invasive diagnosis to detect brain disorders: i) CAUEEG-Dementia with normal, MCI, and dementia diagnostic labels and ii) CAUEEG-Abnormal with normal and abnormal labels. Based on the CAUEEG dataset, this paper proposes a new fully end-to-end deep learning model, called the CAUEEG End-to-end Deep neural Network (CEEDNet). CEEDNet aims to bring together all the functional elements of EEG analysis in a seamless learnable fashion while restraining non-essential human intervention. Extensive experiments showed that our CEEDNet significantly improves the accuracy compared with existing methods, such as machine learning methods and Ieracitano-CNN (Ieracitano et al., 2019), due to taking full advantage of end-to-end learning. The high ROC-AUC scores of 0.9 on CAUEEG-Dementia and 0.86 on CAUEEG-Abnormal recorded by our CEEDNet models demonstrate that our method can lead potential patients to early diagnosis through automatic screening.
URL:
Multiple instance learning for classification of dementia in brain MRI.
Machine learning techniques have been widely used to detect morphological abnormalities from structural brain magnetic resonance imaging data and to support the diagnosis of neurological diseases such as dementia. In this paper, we propose to use a multiple instance learning (MIL) method in an application for the detection of Alzheimer’s disease (AD) and its prodromal stage mild cognitive impairment (MCI). In our work, local intensity patches are extracted as features. However, not all the patches extracted from patients with dementia are equally affected by the disease and some of them may not be characteristic of morphology associated with the disease. Therefore, there is some ambiguity in assigning disease labels to these patches. The problem of the ambiguous training labels can be addressed by weakly supervised learning techniques such as MIL. A graph is built for each image to exploit the relationships among the patches and then to solve the MIL problem. The constructed graphs contain information about the appearances of patches and the relationships among them, which can reflect the inherent structures of images and aid the classification. Using the baseline MR images of 834 subjects from the ADNI study, the proposed method can achieve a classification accuracy of 89% between AD patients and healthy controls, and 70% between patients defined as stable MCI and progressive MCI in a leave-one-out cross validation. Compared with two state-of-the-art methods using the same dataset, the proposed method can achieve similar or improved results, providing an alternative framework for the detection and prediction of neurodegenerative diseases.
URL:
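The label ambiguity described in the multiple instance learning abstract above can be illustrated with the standard MIL assumption: an image (a "bag") is positive if at least one of its patches (its "instances") is positive. The sketch below shows only that bag-level rule; it is not the graph-based method of the paper, and the patch scores, threshold, and function names are hypothetical.

```python
from typing import Sequence


def bag_score(instance_scores: Sequence[float]) -> float:
    """Standard MIL assumption: a bag is as positive as its most positive instance."""
    if not instance_scores:
        raise ValueError("a bag needs at least one instance")
    return max(instance_scores)


def classify_bag(instance_scores: Sequence[float], threshold: float = 0.5) -> int:
    """Return 1 (disease) if any patch score reaches the threshold, else 0."""
    return int(bag_score(instance_scores) >= threshold)


# Hypothetical per-patch disease scores for two images (not data from the paper).
patient_patches = [0.10, 0.20, 0.85, 0.15]  # only one patch looks affected
control_patches = [0.05, 0.10, 0.20, 0.12]  # no patch looks affected

print(classify_bag(patient_patches))  # 1
print(classify_bag(control_patches))  # 0
```

The point of the max rule is that unaffected patches from a patient do not need correct individual labels; only the bag label is used.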
Structured and Sparse Canonical Correlation Analysis as a Brain-Wide Multi-Modal Data Fusion Approach.
Multi-modal data fusion has recently emerged as a comprehensive neuroimaging analysis approach, which usually uses canonical correlation analysis (CCA). However, the current CCA-based fusion approaches face problems like high-dimensionality, multi-collinearity, unimodal feature selection, asymmetry, and loss of spatial information in reshaping the imaging data into vectors. This paper proposes a structured and sparse CCA (ssCCA) technique as a novel CCA method to overcome the above problems. To investigate the performance of the proposed algorithm, we have compared three data fusion techniques: standard CCA, regularized CCA, and ssCCA, and evaluated their ability to detect multi-modal data associations. We have used simulations to compare the performance of these approaches and probe the effects of non-negativity constraint, the dimensionality of features, sample size, and noise power. The results demonstrate that ssCCA outperforms the existing standard and regularized CCA-based fusion approaches. We have also applied the methods to real functional magnetic resonance imaging (fMRI) and structural MRI data of Alzheimer’s disease (AD) patients (n = 34) and healthy control (HC) subjects (n = 42) from the ADNI database. The results illustrate that the proposed unsupervised technique differentiates the transition pattern between the subject-course of AD patients and HC subjects with a p-value of less than 1 x 10^-6. Furthermore, we have depicted the brain mapping of functional areas that are most correlated with the anatomical changes in AD patients relative to HC subjects.
URL:
Prediction of Alzheimer’s disease-specific phospholipase c gamma-1 SNV by deep learning-based approach for high-throughput screening.
Exon splicing triggered by unpredicted genetic mutation can cause translational variations in neurodegenerative disorders. In this study, we discover Alzheimer’s disease (AD)-specific single-nucleotide variants (SNVs) and abnormal exon splicing of phospholipase c gamma-1 (PLCgamma1) gene, using genome-wide association study (GWAS) and a deep learning-based exon splicing prediction tool. GWAS revealed that the identified single-nucleotide variations were mainly distributed in the H3K27ac-enriched region of PLCgamma1 gene body during brain development in an AD mouse model. A deep learning analysis, trained with human genome sequences, predicted 14 splicing sites in human PLCgamma1 gene, and one of these completely matched with an SNV in exon 27 of PLCgamma1 gene in an AD mouse model. In particular, the SNV in exon 27 of PLCgamma1 gene is associated with abnormal splicing during messenger RNA maturation. Taken together, our findings suggest that this approach, which combines in silico and deep learning-based analyses, has potential for identifying the clinical utility of critical SNVs in AD prediction.
URL:
Self-supervised learning of neighborhood embedding for longitudinal MRI.
In recent years, several deep learning approaches recommend first representing Magnetic Resonance Imaging (MRI) as latent features before performing a downstream task of interest (such as classification or regression). The performance of the downstream task generally improves when these latent representations are explicitly associated with factors of interest. For example, we derived such a representation for capturing brain aging by applying self-supervised learning to longitudinal MRIs and then used the resulting encoding to automatically identify diseases accelerating the aging of the brain. We now propose a refinement of this representation by replacing the linear modeling of brain aging with one that is consistent in local neighborhoods in the latent space. In the resulting method, called Longitudinal Neighborhood Embedding (LNE), we derive an encoding so that neighborhoods are age-consistent (i.e., brain MRIs of different subjects with similar brain ages are in close proximity to each other) and progression-consistent (i.e., the latent space is defined by a smooth trajectory field where each trajectory captures changes in brain ages between a pair of MRIs extracted from a longitudinal sequence). To make the problem computationally tractable, we further propose a strategy for mini-batch sampling so that the resulting local neighborhoods accurately approximate the ones that would be defined based on the whole cohort.
We evaluate LNE on three different downstream tasks: (1) to predict chronological age from T1-w MRI of 274 healthy subjects participating in a study at SRI International; (2) to distinguish Normal Control (NC) from Alzheimer’s Disease (AD) and stable Mild Cognitive Impairment (sMCI) from progressive Mild Cognitive Impairment (pMCI) based on T1-w MRI of 632 participants of the Alzheimer’s Disease Neuroimaging Initiative (ADNI); and (3) to distinguish no-to-low from moderate-to-heavy alcohol drinkers based on fractional anisotropy derived from diffusion tensor MRIs of 764 adolescents recruited by the National Consortium on Alcohol and NeuroDevelopment in Adolescence (NCANDA). Across the three data sets, the visualization of the smooth trajectory vector fields and superior accuracy on downstream tasks demonstrate the strength of the proposed method over existing self-supervised methods in extracting information related to brain aging, which could help study the impact of substance use and neurodegenerative disorders. The code is available at https://github.com/ouyangjiahong/longitudinal-neighbourhood-embedding.
URL: https://github.com/ouyangjiahong/longitudinal-neighbourhood-embedding
Predicting brain structural network using functional connectivity.
Uncovering the non-trivial brain structure-function relationship is fundamentally important for revealing organizational principles of the human brain. However, it is challenging to infer a reliable relationship between individual brain structure and function, e.g., the relations between individual brain structural connectivity (SC) and functional connectivity (FC). Brain structure-function displays a distributed and heterogeneous pattern, that is, many functional relationships arise from non-overlapping sets of anatomical connections. This complex relation can be interwoven with widespread individual structural and functional variations. Motivated by the advances of generative adversarial network (GAN) and graph convolutional network (GCN) in the deep learning field, in this work, we proposed a multi-GCN based GAN (MGCN-GAN) to infer individual SC based on corresponding FC by automatically learning the complex associations between individual brain structural and functional networks. The generator of MGCN-GAN is composed of multiple multi-layer GCNs which are designed to model complex indirect connections in the brain network. The discriminator of MGCN-GAN is a single multi-layer GCN which aims to distinguish the predicted SC from real SC. To overcome the inherent unstable behavior of GAN, we designed a new structure-preserving (SP) loss function to guide the generator to learn the intrinsic SC patterns more effectively. Using the Human Connectome Project (HCP) dataset and the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset as test beds, our MGCN-GAN model can generate reliable individual SC from FC. This result implies that there may exist a common regulation between specific brain structural and functional architectures across different individuals.
URL:
AITeQ: a machine learning framework for Alzheimer’s prediction using a distinctive five-gene signature.
Neurodegenerative diseases, such as Alzheimer’s disease, pose a significant global health challenge with their complex etiology and elusive biomarkers. In this study, we developed the Alzheimer’s Identification Tool (AITeQ), a machine learning (ML) model based on an optimized ensemble algorithm for the identification of Alzheimer’s from ribonucleic acid-sequencing (RNA-seq) data. Analysis of RNA-seq data from several studies identified 87 differentially expressed genes. This was followed by an ML protocol involving feature selection, model training, performance evaluation, and hyperparameter tuning. The feature selection process undertaken in this study, employing a combination of four different methodologies, culminated in the identification of a compact yet impactful set of five genes. Twelve diverse ML models were trained and tested using these five genes (CNKSR1, EPHA2, CLSPN, OLFML3, and TARBP1). Performance metrics, including precision, recall, F1 score, accuracy, Matthews correlation coefficient, and receiver operating characteristic area under the curve, were assessed for the finally selected model. Overall, the ensemble model consisting of logistic regression, naive Bayes classifier, and support vector machine with optimized hyperparameters was identified as the best and was used to develop AITeQ. AITeQ is available at: https://github.com/ishtiaque-ahammad/AITeQ.
URL: https://github.com/ishtiaque-ahammad/AITeQ
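Several of the performance metrics listed in the AITeQ abstract above follow directly from the four cells of a binary confusion matrix. The sketch below computes precision, recall, F1 score, and the Matthews correlation coefficient from those counts. The counts are hypothetical and are not results from the study; the function name is illustrative.

```python
import math


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Precision, recall, F1, and Matthews correlation coefficient from counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # MCC uses all four cells, so it stays informative under class imbalance.
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "mcc": mcc}


# Hypothetical confusion-matrix counts (not taken from the paper).
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```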
A parameter-efficient deep learning approach to predict conversion from mild cognitive impairment to Alzheimer’s disease.
Some forms of mild cognitive impairment (MCI) are the clinical precursors of Alzheimer’s disease (AD), while other MCI types tend to remain stable over time and do not progress to AD. To identify and choose effective and personalized strategies to prevent or slow the progression of AD, we need to develop objective measures that are able to discriminate the MCI patients who are at risk of AD from those MCI patients who have a lower risk of developing AD. Here, we present a novel deep learning architecture, based on dual learning and an ad hoc layer for 3D separable convolutions, which aims at identifying MCI patients who have a high likelihood of developing AD within 3 years. Our deep learning procedures combine structural magnetic resonance imaging (MRI), demographic, neuropsychological, and APOe4 genetic data as input measures. The most novel characteristics of our machine learning model compared to previous ones are the following: 1) our deep learning model is multi-tasking, in the sense that it jointly learns to simultaneously predict both MCI to AD conversion as well as AD vs. healthy controls classification, which facilitates relevant feature extraction for AD prognostication; 2) the neural network classifier employs fewer parameters than other deep learning architectures, which significantly limits data-overfitting (we use ~550,000 network parameters, which is orders of magnitude lower than other network designs); 3) both structural MRI images and their warp field characteristics, which quantify local volumetric changes in relation to the MRI template, were used as separate input streams to extract as much information as possible from the MRI data.
All analyses were performed on a subset of the database made publicly available via the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (n = 785 participants: n = 192 AD patients, n = 409 MCI patients (including both MCI patients who convert to AD and MCI patients who do not convert to AD), and n = 184 healthy controls). The most predictive combination of inputs was the structural MRI images and the demographic, neuropsychological, and APOe4 data. In contrast, the warp field metrics were of little added predictive value. The algorithm was able to distinguish the MCI patients developing AD within 3 years from those patients with stable MCI over the same time-period with an area under the curve (AUC) of 0.925 and a 10-fold cross-validated accuracy of 86%, a sensitivity of 87.5%, and specificity of 85%. To our knowledge, this is the highest performance achieved so far using similar datasets. The same network provided an AUC of 1 and 100% accuracy, sensitivity, and specificity when classifying patients with AD from healthy controls. Our classification framework was also robust to the use of different co-registration templates and potentially irrelevant features/image portions. Our approach is flexible and can in principle integrate other imaging modalities, such as PET, and diverse other sets of clinical data. The convolutional framework is potentially applicable to any 3D image dataset and gives the flexibility to design a computer-aided diagnosis system targeting the prediction of several medical conditions and neuropsychiatric disorders via multi-modal imaging and tabular clinical data.
URL:
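As a reminder of how the accuracy, sensitivity, and specificity quoted in the abstract above relate to one another, the sketch below derives all three from confusion-matrix counts. The counts are hypothetical: they were chosen only so that the arithmetic lands on 86%, 87.5%, and 85%, and they are not the per-fold counts of the study.

```python
def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # fraction of converters detected
        "specificity": tn / (tn + fp),  # fraction of stable MCI correctly kept
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Hypothetical counts for 40 converters and 60 stable-MCI patients.
m = binary_metrics(tp=35, fn=5, tn=51, fp=9)
print({name: round(value, 3) for name, value in m.items()})
```

Because the two groups differ in size, accuracy is a weighted mix of sensitivity and specificity rather than their simple average.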
Neuroimaging feature extraction using a neural network classifier for imaging genetics.
BACKGROUND: Dealing with the high dimension of both neuroimaging data and genetic data is a difficult problem in the association of genetic data to neuroimaging. In this article, we tackle the latter problem with an eye toward developing solutions that are relevant for disease prediction. Supported by a vast literature on the predictive power of neural networks, our proposed solution uses neural networks to extract from neuroimaging data features that are relevant for predicting Alzheimer’s Disease (AD) for subsequent relation to genetics. The neuroimaging-genetic pipeline we propose is comprised of image processing, neuroimaging feature extraction and genetic association steps. We present a neural network classifier for extracting neuroimaging features that are related to the disease. The proposed method is data-driven and requires no expert advice or a priori selection of regions of interest. We further propose a multivariate regression with priors specified in the Bayesian framework that allows for group sparsity at multiple levels including SNPs and genes. RESULTS: We find the features extracted with our proposed method are better predictors of AD than features used previously in the literature, suggesting that single nucleotide polymorphisms (SNPs) related to the features extracted by our proposed method are also more relevant for AD. Our neuroimaging-genetic pipeline led to the identification of some overlapping and, more importantly, some different SNPs when compared to those identified with previously used features. CONCLUSIONS: The pipeline we propose combines machine learning and statistical methods to benefit from the strong predictive performance of blackbox models to extract relevant features while preserving the interpretation provided by Bayesian models for genetic association.
Finally, we argue in favour of using automatic feature extraction, such as the method we propose, in addition to ROI or voxelwise analysis to find potentially novel disease-relevant SNPs that may not be detected when using ROIs or voxels alone.
URL:
Unified AI framework to uncover deep interrelationships between gene expression and Alzheimer’s disease neuropathologies.
Deep neural networks (DNNs) capture complex relationships among variables; however, because they require copious samples, their potential has yet to be fully tapped for understanding relationships between gene expression and human phenotypes. Here we introduce an analysis framework, namely MD-AD (Multi-task Deep learning for Alzheimer’s Disease neuropathology), which leverages an unexpected synergy between DNNs and multi-cohort settings. In these settings, true joint analysis can be stymied using conventional statistical methods, which require “harmonized” phenotypes and tend to capture cohort-level variations, obscuring subtler true disease signals. Instead, MD-AD incorporates related phenotypes sparsely measured across cohorts, and learns interactions between genes and phenotypes not discovered using linear models, identifying subtler signals than cohort-level variations which can be uniquely recapitulated in animal models and across tissues. We show that MD-AD exploits sex-specific relationships between microglial immune response and neuropathology, providing a nuanced context for the association between inflammatory genes and Alzheimer’s Disease.
URL:
HYDRA: Revealing heterogeneity of imaging and genetic patterns through a multiple max-margin discriminative analysis framework.
Multivariate pattern analysis techniques have been increasingly used over the past decade to derive highly sensitive and specific biomarkers of diseases on an individual basis. The driving assumption behind the vast majority of the existing methodologies is that a single imaging pattern can distinguish between healthy and diseased populations, or between two subgroups of patients (e.g., progressors vs. non-progressors). This assumption effectively ignores the ample evidence for the heterogeneous nature of brain diseases. Neurodegenerative, neuropsychiatric and neurodevelopmental disorders are largely characterized by high clinical heterogeneity, which likely stems in part from underlying neuroanatomical heterogeneity of various pathologies. Detecting and characterizing heterogeneity may deepen our understanding of disease mechanisms and lead to patient-specific treatments. However, few approaches tackle disease subtype discovery in a principled machine learning framework. To address this challenge, we present a novel non-linear learning algorithm for simultaneous binary classification and subtype identification, termed HYDRA (Heterogeneity through Discriminative Analysis). Neuroanatomical subtypes are effectively captured by multiple linear hyperplanes, which form a convex polytope that separates two groups (e.g., healthy controls from pathologic samples); each face of this polytope effectively defines a disease subtype. We validated HYDRA on simulated and clinical data. In the latter case, we applied the proposed method independently to the imaging and genetic datasets of the Alzheimer’s Disease Neuroimaging Initiative (ADNI 1) study. The imaging dataset consisted of T1-weighted volumetric magnetic resonance images of 123 AD patients and 177 controls. The genetic dataset consisted of single nucleotide polymorphism information of 103 AD patients and 139 controls. 
We identified 3 reproducible subtypes of atrophy in AD relative to controls: (1) diffuse and extensive atrophy, (2) precuneus and extensive temporal lobe atrophy, as well as some prefrontal atrophy, and (3) an atrophy pattern very much confined to the hippocampus and the medial temporal lobe. The genetics dataset yielded two subtypes of AD characterized mainly by the presence/absence of the apolipoprotein E (APOE) epsilon4 genotype, but also involving differential presence of risk alleles of CD2AP, SPON1 and LOC39095 SNPs that were associated with differences in the respective patterns of brain atrophy, especially in the precuneus. The results demonstrate the potential of the proposed approach to map disease heterogeneity in neuroimaging and genetic studies.
URL:
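The geometric idea in the HYDRA abstract above, several linear hyperplanes forming a convex polytope whose faces define subtypes, can be sketched as a decision rule. This is a minimal illustration of inference only, assuming that controls lie on the negative side of every hyperplane and that a patient's subtype is the hyperplane with the largest positive score. The sign convention, the hyperplane parameters, and the function names are assumptions for illustration, and the max-margin training procedure is not shown.

```python
from typing import List, Optional, Sequence, Tuple

# A hyperplane is (weights, bias); the score of x is w . x + b.
Hyperplane = Tuple[Sequence[float], float]


def hyperplane_scores(x: Sequence[float], hyperplanes: List[Hyperplane]) -> List[float]:
    """Signed score of sample x with respect to each hyperplane."""
    return [sum(wi * xi for wi, xi in zip(w, x)) + b for w, b in hyperplanes]


def classify(x: Sequence[float], hyperplanes: List[Hyperplane]) -> Tuple[str, Optional[int]]:
    """Inside the polytope (all scores <= 0) means control; otherwise the
    face with the largest score gives the subtype index."""
    scores = hyperplane_scores(x, hyperplanes)
    best = max(range(len(scores)), key=scores.__getitem__)
    if scores[best] > 0:
        return ("patient", best)
    return ("control", None)


# Two hypothetical faces in a 2-D feature space (not fitted to any data).
faces = [((1.0, 0.0), -1.0), ((0.0, 1.0), -1.0)]

print(classify((0.2, 0.3), faces))  # ('control', None)
print(classify((2.0, 0.1), faces))  # ('patient', 0)
print(classify((0.1, 1.5), faces))  # ('patient', 1)
```

Although each face is linear, the combined rule (a maximum over linear scores) is non-linear, which is consistent with the abstract describing the algorithm as non-linear.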
Deep learning-based identification of genetic variants: application to Alzheimer’s disease classification.
Deep learning is a promising tool that uses nonlinear transformations to extract features from high-dimensional data. Deep learning is challenging in genome-wide association studies (GWAS) with high-dimensional genomic data. Here we propose a novel three-step approach (SWAT-CNN) for identification of genetic variants using deep learning to identify phenotype-related single nucleotide polymorphisms (SNPs) that can be applied to develop accurate disease classification models. In the first step, we divided the whole genome into nonoverlapping fragments of an optimal size and then ran a convolutional neural network (CNN) on each fragment to select phenotype-associated fragments. In the second step, using a Sliding Window Association Test (SWAT), we ran the CNN on the selected fragments to calculate phenotype influence scores (PIS) and identify phenotype-associated SNPs based on PIS. In the third step, we ran the CNN on all identified SNPs to develop a classification model. We tested our approach using GWAS data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (N = 981; cognitively normal older adults (CN) = 650 and AD = 331). Our approach identified the well-known APOE region as the most significant genetic locus for AD. Our classification model achieved an area under the curve (AUC) of 0.82, which was comparable to that of traditional machine learning approaches, namely random forest and XGBoost. SWAT-CNN, a novel deep learning-based genome-wide approach, identified AD-associated SNPs and a classification model for AD and may hold promise for a range of biomedical applications.
URL:
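The first two steps of the SWAT-CNN abstract above rest on simple index bookkeeping: cutting an ordered list of SNPs into nonoverlapping fragments, then sliding a window across each selected fragment. The sketch below shows only that bookkeeping. The fragment size, window size, and step are hypothetical, and the CNN that scores each fragment or window is omitted.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open index range [start, end) over ordered SNPs


def nonoverlapping_fragments(n_snps: int, fragment_size: int) -> List[Span]:
    """Step 1: tile the SNP index range with fragments; the last may be shorter."""
    return [
        (start, min(start + fragment_size, n_snps))
        for start in range(0, n_snps, fragment_size)
    ]


def sliding_windows(fragment: Span, window_size: int, step: int = 1) -> List[Span]:
    """Step 2: overlapping windows inside one selected fragment."""
    start, end = fragment
    last_start = end - window_size
    if last_start < start:  # fragment shorter than one window
        return [(start, end)]
    return [(s, s + window_size) for s in range(start, last_start + 1, step)]


fragments = nonoverlapping_fragments(n_snps=10, fragment_size=4)
print(fragments)                          # [(0, 4), (4, 8), (8, 10)]
print(sliding_windows(fragments[1], 2))   # [(4, 6), (5, 7), (6, 8)]
```

In a full pipeline, each window would be passed to a trained model and the change in its output used to score the SNPs inside the window; that scoring step is not specified here.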
Deep learning detection of informative features in tau PET for Alzheimer’s disease classification.
BACKGROUND: Alzheimer’s disease (AD) is the most common type of dementia, typically characterized by memory loss followed by progressive cognitive decline and functional impairment. Many clinical trials of potential therapies for AD have failed, and there is currently no approved disease-modifying treatment. Biomarkers for early detection and mechanistic understanding of disease course are critical for drug development and clinical trials. Amyloid has been the focus of most biomarker research. Here, we developed a deep learning-based framework to identify informative features for AD classification using tau positron emission tomography (PET) scans. RESULTS: The 3D convolutional neural network (CNN)-based classification model of AD from cognitively normal (CN) yielded an average accuracy of 90.8% based on five-fold cross-validation. The layer-wise relevance propagation (LRP) model identified the brain regions in tau PET images that contributed most to the AD classification from CN. The top identified regions included the hippocampus, parahippocampus, thalamus, and fusiform. The LRP results were consistent with those from the voxel-wise analysis in SPM12, showing significant focal AD associated regional tau deposition in the bilateral temporal lobes including the entorhinal cortex. The AD probability scores calculated by the classifier were correlated with brain tau deposition in the medial temporal lobe in MCI participants (r = 0.43 for early MCI and r = 0.49 for late MCI). CONCLUSION: A deep learning framework combining 3D CNN and LRP algorithms can be used with tau PET images to identify informative features for AD classification and may have application for early detection during prodromal stages of AD.
URL:
Uncovering the heterogeneity and temporal complexity of neurodegenerative diseases with Subtype and Stage Inference.
The heterogeneity of neurodegenerative diseases is a key confound to disease understanding and treatment development, as study cohorts typically include multiple phenotypes on distinct disease trajectories. Here we introduce a machine-learning technique, Subtype and Stage Inference (SuStaIn), that is able to uncover data-driven disease phenotypes with distinct temporal progression patterns from widely available cross-sectional patient studies. Results from imaging studies in two neurodegenerative diseases reveal subgroups and their distinct trajectories of regional neurodegeneration. In genetic frontotemporal dementia, SuStaIn identifies genotypes from imaging alone, validating its ability to identify subtypes; further, the technique reveals within-genotype heterogeneity. In Alzheimer’s disease, SuStaIn uncovers three subtypes, uniquely characterising their temporal complexity. SuStaIn provides fine-grained patient stratification, which substantially enhances the ability to predict conversion between diagnostic categories over standard models that ignore subtype (p = 7.18 x 10^-4) or temporal stage (p = 3.96 x 10^-5). SuStaIn offers new promise for enabling disease subtype discovery and precision medicine.
URL:
PPAD: a deep learning architecture to predict progression of Alzheimer’s disease.
MOTIVATION: Alzheimer’s disease (AD) is a neurodegenerative disease that affects millions of people worldwide. Mild cognitive impairment (MCI) is an intermediary stage between the cognitively normal state and AD. Not all people who have MCI convert to AD. The diagnosis of AD is made after significant symptoms of dementia such as short-term memory loss are already present. Since AD is currently an irreversible disease, diagnosis at the onset of the disease brings a huge burden on patients, their caregivers, and the healthcare sector. Thus, there is a crucial need to develop methods for the early prediction of AD for patients who have MCI. Recurrent neural networks (RNN) have been successfully used to handle electronic health records (EHR) for predicting conversion from MCI to AD. However, RNNs ignore the irregular time intervals between successive events, which occur commonly in electronic health record data. In this study, we propose two deep learning architectures based on RNN, namely Predicting Progression of Alzheimer’s Disease (PPAD) and PPAD-Autoencoder. PPAD and PPAD-Autoencoder are designed for early prediction of conversion from MCI to AD at the next visit and multiple visits ahead for patients, respectively. To minimize the effect of the irregular time intervals between visits, we propose using age in each visit as an indicator of time change between successive visits. RESULTS: Our experimental results conducted on Alzheimer’s Disease Neuroimaging Initiative and National Alzheimer’s Coordinating Center datasets showed that our proposed models outperformed all baseline models for most prediction scenarios in terms of F2 and sensitivity. We also observed that the age feature was one of the top features and was able to address the irregular time interval problem. AVAILABILITY AND IMPLEMENTATION: https://github.com/bozdaglab/PPAD.
URL: https://github.com/bozdaglab/PPAD
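The PPAD abstract above proposes using the patient's age at each visit as an indicator of elapsed time between irregularly spaced visits. A minimal sketch of that input preparation follows, assuming each visit is a pair of age and a feature list. The data layout, feature values, and function names are hypothetical, and the recurrent model that would consume the sequence is not shown.

```python
from typing import List, Tuple

Visit = Tuple[float, List[float]]  # (age in years at the visit, clinical features)


def with_age_feature(visits: List[Visit]) -> List[List[float]]:
    """Order visits by age and append the age to each visit's feature vector."""
    ordered = sorted(visits, key=lambda visit: visit[0])
    return [features + [age] for age, features in ordered]


def visit_gaps(visits: List[Visit]) -> List[float]:
    """Time in years between successive visits, shown only to expose the irregularity."""
    ages = sorted(age for age, _ in visits)
    return [later - earlier for earlier, later in zip(ages, ages[1:])]


# Hypothetical visits with two clinical features each (not real patient data).
visits = [(70.0, [28.0, 1.0]), (70.5, [27.0, 1.0]), (72.5, [24.0, 1.0])]

print(visit_gaps(visits))        # [0.5, 2.0]
print(with_age_feature(visits))  # age appears as the last feature of every step
```

With age in the input, a sequence model can in principle distinguish a six-month gap from a two-year gap even though both are a single step in the sequence.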
Prediction of disease-free survival for precision medicine using cooperative learning on multi-omic data.
In precision medicine, both predicting the disease susceptibility of an individual and forecasting their disease-free survival are areas of key research. Besides the classical epidemiological predictor variables, data from multiple (omic) platforms are increasingly available. To integrate this wealth of information, we propose a new methodology to combine both cooperative learning, a recent approach to leverage the predictive power of several datasets, and polygenic hazard score models. Polygenic hazard score models provide a practitioner with a more differentiated view of the predicted disease-free survival than the one given by merely a point estimate, for instance computed with a polygenic risk score. Our aim is to leverage the advantages of cooperative learning for the computation of polygenic hazard score models via Cox’s proportional hazard model, thereby improving the prediction of the disease-free survival. In our experimental study, we apply our methodology to forecast the disease-free survival for Alzheimer’s disease (AD) using three layers of data. One layer contains epidemiological variables such as sex, APOE (apolipoprotein E, a genetic risk factor for AD) status and 10 leading principal components. Another layer contains selected genomic loci, and the last layer contains methylation data for selected CpG sites. We demonstrate that the survival curves computed via cooperative learning yield an AUC of around 0.7, above the state-of-the-art performance of its competitors. Importantly, the proposed methodology returns (1) a linear score that can be easily interpreted (in contrast to machine learning approaches), and (2) a weighting of the predictive power of the involved data layers, allowing for an assessment of the importance of each omic (or other) platform. Similarly to polygenic hazard score models, our methodology also allows one to compute individual survival curves for each patient.
URL:
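The abstract above mentions a linear score and individual survival curves obtained via Cox's proportional hazard model. Under that model, an individual's survival curve is the baseline survival raised to the power exp(score), where the score is the linear combination of covariates and coefficients. The sketch below shows that relationship only. The baseline curve, coefficients, and covariates are hypothetical, and neither the cooperative learning fit nor the weighting of data layers is shown.

```python
import math
from typing import List, Sequence


def linear_score(beta: Sequence[float], x: Sequence[float]) -> float:
    """Linear predictor beta . x of the Cox model."""
    return sum(b * v for b, v in zip(beta, x))


def survival_curve(baseline: Sequence[float], beta: Sequence[float],
                   x: Sequence[float]) -> List[float]:
    """Individual survival S(t | x) = S0(t) ** exp(beta . x)."""
    hazard_ratio = math.exp(linear_score(beta, x))
    return [s0 ** hazard_ratio for s0 in baseline]


# Hypothetical baseline disease-free survival at four time points.
baseline = [1.0, 0.9, 0.8, 0.6]
beta = [0.7, -0.2]  # hypothetical coefficients for two covariates

reference = survival_curve(baseline, beta, [0.0, 0.0])  # score 0: equals baseline
higher_risk = survival_curve(baseline, beta, [1.0, 0.0])

print(reference)
print([round(s, 3) for s in higher_risk])
```

A positive score lowers the whole curve and a negative score raises it, which is why a single interpretable number per patient is enough to produce a full survival curve.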
Identification of early mild cognitive impairment using multi-modal data and graph convolutional networks.
BACKGROUND: The identification of early mild cognitive impairment (EMCI), which is an early stage of Alzheimer’s disease (AD) and is associated with brain structural and functional changes, is still a challenging task. Recent studies show great promise for improving the performance of EMCI identification by combining multiple structural and functional features, such as grey matter volume and shortest path length. However, deciding which features to extract and how to combine multiple features to improve the performance of EMCI identification has always been a challenging problem. To address this problem, in this study we propose a new EMCI identification framework using multi-modal data and graph convolutional networks (GCNs). Firstly, we extract grey matter volume and shortest path length of each brain region based on the automated anatomical labeling (AAL) atlas as feature representation from T1w MRI and rs-fMRI data of each subject, respectively. Then, in order to obtain features that are more helpful in identifying EMCI, a common multi-task feature selection method is applied. Afterwards, we construct a non-fully labelled subject graph using imaging and non-imaging phenotypic measures of each subject. Finally, a GCN model is adopted to perform the EMCI identification task. RESULTS: Our proposed EMCI identification method is evaluated on 210 subjects, including 105 subjects with EMCI and 105 normal controls (NCs), with both T1w MRI and rs-fMRI data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database. Experimental results show that our proposed framework achieves an accuracy of 84.1% and an area under the receiver operating characteristic (ROC) curve (AUC) of 0.856 for EMCI/NC classification. In addition, by comparison, the accuracy and AUC values of our proposed framework are better than those of some existing methods in EMCI identification.
CONCLUSION: Our proposed EMCI identification framework is effective and promising for automatic diagnosis of EMCI in clinical practice.
URL:
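One of the region-level features named in the EMCI abstract above is shortest path length in a functional brain network. A minimal sketch of a node-wise mean shortest path length follows, assuming an unweighted, undirected graph obtained by thresholding connectivity. The toy graph, the binarization, and the function names are assumptions, and the paper may use a weighted or differently normalized definition.

```python
from collections import deque
from typing import Dict, List


def bfs_distances(adjacency: Dict[int, List[int]], source: int) -> Dict[int, int]:
    """Hop counts from source to every reachable node (breadth-first search)."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


def mean_shortest_path_length(adjacency: Dict[int, List[int]], node: int) -> float:
    """Average hop count from node to all other reachable nodes."""
    distances = bfs_distances(adjacency, node)
    others = [d for target, d in distances.items() if target != node]
    if not others:
        raise ValueError("node has no reachable neighbours")
    return sum(others) / len(others)


# Toy four-region chain 0 - 1 - 2 - 3 (hypothetical, not an AAL network).
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

print(mean_shortest_path_length(chain, 0))  # 2.0
print(mean_shortest_path_length(chain, 1))
```

Regions with a low value are, on average, few hops from the rest of the network, which is why the measure is used as an indicator of functional integration.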
Ensemble sparse classification of Alzheimer’s disease.
The high-dimensional pattern classification methods, e.g., support vector machines (SVM), have been widely investigated for analysis of structural and functional brain images (such as magnetic resonance imaging (MRI)) to assist the diagnosis of Alzheimer’s disease (AD) including its prodromal stage, i.e., mild cognitive impairment (MCI). Most existing classification methods extract features from neuroimaging data and then construct a single classifier to perform classification. However, due to noise and small sample size of neuroimaging data, it is challenging to train only a global classifier that can be robust enough to achieve good classification performance. In this paper, instead of building a single global classifier, we propose a local patch-based subspace ensemble method which builds multiple individual classifiers based on different subsets of local patches and then combines them for more accurate and robust classification. Specifically, to capture the local spatial consistency, each brain image is partitioned into a number of local patches and a subset of patches is randomly selected from the patch pool to build a weak classifier. Here, the sparse representation-based classifier (SRC) method, which has been shown to be effective for classification of image data (e.g., face), is used to construct each weak classifier. Then, multiple weak classifiers are combined to make the final decision. We evaluate our method on 652 subjects (including 198 AD patients, 225 MCI and 229 normal controls) from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database using MR images. The experimental results show that our method achieves an accuracy of 90.8% and an area under the ROC curve (AUC) of 94.86% for AD classification and an accuracy of 87.85% and an AUC of 92.90% for MCI classification, respectively, demonstrating a very promising performance of our method compared with the state-of-the-art methods for AD/MCI classification using MR images.
URL:
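The random-patch ensemble described in the entry above can be illustrated with a minimal sketch. Several simplifications here are assumptions, not details from the paper: each patch is reduced to a single scalar feature, a nearest-centroid rule stands in for the sparse representation-based classifier (SRC), the weak classifiers are combined by simple majority vote, and all function names are hypothetical.

```python
import random
from collections import Counter


def fit_weak(X, y, patch_ids):
    """Stand-in weak learner (NOT SRC): per-class centroid over the chosen patches."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(r[p] for r in rows) / len(rows) for p in patch_ids]
    return centroids


def predict_weak(centroids, x, patch_ids):
    """Assign the class whose centroid is closest on the chosen patches."""
    feats = [x[p] for p in patch_ids]

    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(feats, center))

    return min(centroids, key=lambda lab: sq_dist(centroids[lab]))


def fit_patch_ensemble(X, y, n_learners=11, patches_per_learner=3, seed=0):
    """Train one weak learner per randomly drawn subset of patches."""
    rng = random.Random(seed)
    n_patches = len(X[0])
    ensemble = []
    for _ in range(n_learners):
        patch_ids = rng.sample(range(n_patches), patches_per_learner)
        ensemble.append((patch_ids, fit_weak(X, y, patch_ids)))
    return ensemble


def predict_patch_ensemble(ensemble, x):
    """Combine the weak learners by majority vote (an assumed combination rule)."""
    votes = Counter(predict_weak(c, x, ids) for ids, c in ensemble)
    return votes.most_common(1)[0][0]


# Toy data: 4 subjects, 6 patch features each; label 0 = control, 1 = patient.
X_train = [[0.0] * 6, [0.1] * 6, [1.0] * 6, [0.9] * 6]
y_train = [0, 0, 1, 1]
model = fit_patch_ensemble(X_train, y_train)
```

Because every weak learner sees only a few patches, noise confined to one patch affects only the learners that happened to sample it, which is the robustness argument the abstract makes for the ensemble.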
Simultaneous segmentation and grading of anatomical structures for patient’s classification: application to Alzheimer’s disease.
In this paper, we propose an innovative approach to robustly and accurately detect Alzheimer’s disease (AD) based on the distinction of specific atrophic patterns of anatomical structures such as hippocampus (HC) and entorhinal cortex (EC). The proposed method simultaneously performs segmentation and grading of structures to efficiently capture the anatomical alterations caused by AD. Known as SNIPE (Scoring by Non-local Image Patch Estimator), the novel proposed grading measure is based on a nonlocal patch-based framework and estimates the similarity of the patch surrounding the voxel under study with all the patches present in different training populations. In this study, the training library was composed of two populations: 50 cognitively normal subjects (CN) and 50 patients with AD, randomly selected from the ADNI database. During our experiments, the classification accuracy of patients (CN vs. AD) using several biomarkers was compared: HC and EC volumes, the grade of these structures and finally the combination of their volume and their grade. Tests were completed in a leave-one-out framework using discriminant analysis. First, we showed that biomarkers based on HC provide better classification accuracy than biomarkers based on EC. Second, we demonstrated that structure grading is a more powerful measure than structure volume to distinguish both populations with a classification accuracy of 90%. Finally, by adding the ages of subjects in order to better separate age-related structural changes from disease-related anatomical alterations, SNIPE obtained a classification accuracy of 93%.
URL:
DeepGAMI: deep biologically guided auxiliary learning for multimodal integration and imputation to improve genotype-phenotype prediction.
BACKGROUND: Genotypes are strongly associated with disease phenotypes, particularly in brain disorders. However, the molecular and cellular mechanisms behind this association remain elusive. With emerging multimodal data for these mechanisms, machine learning methods can be applied for phenotype prediction at different scales, but due to the black-box nature of machine learning, integrating these modalities and interpreting biological mechanisms can be challenging. Additionally, the partial availability of these multimodal data presents a challenge in developing these predictive models. METHOD: To address these challenges, we developed DeepGAMI, an interpretable neural network model to improve genotype-phenotype prediction from multimodal data. DeepGAMI leverages functional genomic information, such as eQTLs and gene regulation, to guide neural network connections. Additionally, it includes an auxiliary learning layer for cross-modal imputation allowing the imputation of latent features of missing modalities and thus predicting phenotypes from a single modality. Finally, DeepGAMI uses integrated gradient to prioritize multimodal features for various phenotypes. RESULTS: We applied DeepGAMI to several multimodal datasets including genotype and bulk and cell-type gene expression data in brain diseases, and gene expression and electrophysiology data of mouse neuronal cells. Using cross-validation and independent validation, DeepGAMI outperformed existing methods for classifying disease types, and cellular and clinical phenotypes, even using single modalities (e.g., AUC score of 0.79 for Schizophrenia and 0.73 for cognitive impairment in Alzheimer’s disease). CONCLUSION: We demonstrated that DeepGAMI improves phenotype prediction and prioritizes phenotypic features and networks in multiple multimodal datasets in complex brains and brain diseases. 
Also, it prioritized disease-associated variants, genes, and regulatory networks linked to different phenotypes, providing novel insights into the interpretation of gene regulatory mechanisms. DeepGAMI is open-source and available for general use.
URL:
Prediction of Alzheimer’s disease progression within 6 years using speech: A novel approach leveraging language models.
INTRODUCTION: Identification of individuals with mild cognitive impairment (MCI) who are at risk of developing Alzheimer’s disease (AD) is crucial for early intervention and selection of clinical trials. METHODS: We applied natural language processing techniques along with machine learning methods to develop a method for automated prediction of progression to AD within 6 years using speech. The study design was evaluated on the neuropsychological test interviews of n = 166 participants from the Framingham Heart Study, comprising 90 progressive MCI and 76 stable MCI cases. RESULTS: Our best models, which used features generated from speech data, as well as age, sex, and education level, achieved an accuracy of 78.5% and a sensitivity of 81.1% to predict MCI-to-AD progression within 6 years. DISCUSSION: The proposed method offers a fully automated procedure, providing an opportunity to develop an inexpensive, broadly accessible, and easy-to-administer screening tool for MCI-to-AD progression prediction, facilitating development of remote assessment. HIGHLIGHTS: Voice recordings from neuropsychological exams coupled with basic demographics can lead to strong predictive models of progression to dementia from mild cognitive impairment. The study leveraged AI methods for speech recognition and processed the resulting text using language models. The developed AI-powered pipeline can lead to fully automated assessment that could enable remote and cost-effective screening and prognosis for Alzheimer’s disease.
URL:
Identification of associations between genotypes and longitudinal phenotypes via temporally-constrained group sparse canonical correlation analysis.
MOTIVATION: Neuroimaging genetics identifies the relationships between genetic variants (i.e., the single nucleotide polymorphisms) and brain imaging data to reveal the associations from genotypes to phenotypes. So far, most existing machine-learning approaches detect associations between genetic variants and brain imaging data at a single time-point. However, those associations are based on static phenotypes and ignore the temporal dynamics of the phenotypical changes. The phenotypes across multiple time-points may exhibit temporal patterns that can be used to facilitate the understanding of the degenerative process. In this article, we propose a novel temporally constrained group sparse canonical correlation analysis (TGSCCA) framework to identify genetic associations with longitudinal phenotypic markers. RESULTS: The proposed TGSCCA method is able to capture the temporal changes in the brain from longitudinal phenotypes by incorporating a fused penalty, which requires that the differences between two consecutive canonical weight vectors from adjacent time-points should be small. A new efficient optimization algorithm is designed to solve the objective function. Furthermore, we demonstrate the effectiveness of our algorithm on both synthetic and real data (i.e., the Alzheimer’s Disease Neuroimaging Initiative cohort, including progressive mild cognitive impairment, stable MCI and Normal Control participants). In comparison with conventional SCCA, our proposed method can achieve strong associations and discover phenotypic biomarkers across multiple time-points to guide disease-progressive interpretation. AVAILABILITY AND IMPLEMENTATION: The Matlab code is available at https://sourceforge.net/projects/ibrain-cn/files/ . CONTACT: dqzhang@nuaa.edu.cn or shenli@iu.edu.
URL: https://sourceforge.net/projects/ibrain-cn/files/
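The fused penalty mentioned in the TGSCCA entry above can be written out as a short formula. This is only a generic sketch: the symbols (canonical weight vector \mathbf{v}_t at time-point t, number of time-points T) and the choice of the \ell_1 norm are assumptions for illustration, and the paper's exact penalty may differ.

```latex
\Omega_{\text{fused}}(\mathbf{v}_1, \dots, \mathbf{v}_T)
  \;=\; \sum_{t=1}^{T-1} \left\lVert \mathbf{v}_{t+1} - \mathbf{v}_{t} \right\rVert_1
```

Adding such a term to the objective penalizes large jumps between the weight vectors of adjacent time-points, which is how the temporal smoothness described in the abstract would be enforced.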
TA-RNN: an attention-based time-aware recurrent neural network architecture for electronic health records.
MOTIVATION: Electronic health records (EHRs) represent a comprehensive resource of a patient’s medical history. EHRs are essential for utilizing advanced technologies such as deep learning (DL), enabling healthcare providers to analyze extensive data, extract valuable insights, and make precise and data-driven clinical decisions. DL methods such as recurrent neural networks (RNN) have been utilized to analyze EHR to model disease progression and predict diagnosis. However, these methods do not address some inherent irregularities in EHR data such as irregular time intervals between clinical visits. Furthermore, most DL models are not interpretable. In this study, we propose two interpretable DL architectures based on RNN, namely time-aware RNN (TA-RNN) and TA-RNN-autoencoder (TA-RNN-AE) to predict patient’s clinical outcome in EHR at the next visit and multiple visits ahead, respectively. To mitigate the impact of irregular time intervals, we propose incorporating time embedding of the elapsed times between visits. For interpretability, we propose employing a dual-level attention mechanism that operates between visits and features within each visit. RESULTS: The results of the experiments conducted on Alzheimer’s Disease Neuroimaging Initiative (ADNI) and National Alzheimer’s Coordinating Center (NACC) datasets indicated the superior performance of proposed models for predicting Alzheimer’s Disease (AD) compared to state-of-the-art and baseline approaches based on F2 and sensitivity. Additionally, TA-RNN showed superior performance on the Medical Information Mart for Intensive Care (MIMIC-III) dataset for mortality prediction. In our ablation study, we observed enhanced predictive performance by incorporating time embedding and attention mechanisms. Finally, investigating attention weights helped identify influential visits and features in predictions. AVAILABILITY AND IMPLEMENTATION: https://github.com/bozdaglab/TA-RNN.
URL: https://github.com/bozdaglab/TA-RNN
A Deep Generative-Discriminative Learning for Multimodal Representation in Imaging Genetics.
Imaging genetics, one of the foremost emerging topics in the medical imaging field, analyzes the inherent relations between neuroimaging and genetic data. As deep learning has gained widespread acceptance in many applications, pioneering studies employed deep learning frameworks for imaging genetics. However, existing approaches suffer from some limitations. First, they often adopt a simple strategy for joint learning of phenotypic and genotypic features. Second, their findings have not been extended to biomedical applications, e.g., degenerative brain disease diagnosis and cognitive score prediction. Finally, existing studies perform insufficient and inappropriate analyses from the perspective of data science and neuroscience. In this work, we propose a novel deep learning framework to simultaneously tackle the aforementioned issues. Our proposed framework learns to effectively represent the neuroimaging and the genetic data jointly, and achieves state-of-the-art performance when used for Alzheimer’s disease and mild cognitive impairment identification. Furthermore, unlike the existing methods, the framework enables learning the relation between imaging phenotypes and genotypes in a nonlinear way without any prior neuroscientific knowledge. To demonstrate the validity of our proposed framework, we conducted experiments on a publicly available dataset and analyzed the results from diverse perspectives. Based on our experimental results, we believe that the proposed framework has immense potential to provide new insights and perspectives in deep learning-based imaging genetics studies.
URL:
Identifying and ranking potential driver genes of Alzheimer’s disease using multiview evidence aggregation.
MOTIVATION: Late onset Alzheimer’s disease is currently a disease with no known effective treatment options. To better understand disease, new multi-omic data-sets have recently been generated with the goal of identifying molecular causes of disease. However, most analytic studies using these datasets focus on uni-modal analysis of the data. Here, we propose a data driven approach to integrate multiple data types and analytic outcomes to aggregate evidences to support the hypothesis that a gene is a genetic driver of the disease. The main algorithmic contributions of our article are: (i) a general machine learning framework to learn the key characteristics of a few known driver genes from multiple feature sets and identifying other potential driver genes which have similar feature representations, and (ii) A flexible ranking scheme with the ability to integrate external validation in the form of Genome Wide Association Study summary statistics. While we currently focus on demonstrating the effectiveness of the approach using different analytic outcomes from RNA-Seq studies, this method is easily generalizable to other data modalities and analysis types. RESULTS: We demonstrate the utility of our machine learning algorithm on two benchmark multiview datasets by significantly outperforming the baseline approaches in predicting missing labels. We then use the algorithm to predict and rank potential drivers of Alzheimer’s. We show that our ranked genes show a significant enrichment for single nucleotide polymorphisms associated with Alzheimer’s and are enriched in pathways that have been previously associated with the disease. AVAILABILITY AND IMPLEMENTATION: Source code and link to all feature sets is available at https://github.com/Sage-Bionetworks/EvidenceAggregatedDriverRanking.
URL: https://github.com/Sage-Bionetworks/EvidenceAggregatedDriverRanking
A parameter-efficient deep learning approach to predict conversion from mild cognitive impairment to Alzheimer’s disease.
Some forms of mild cognitive impairment (MCI) are the clinical precursors of Alzheimer’s disease (AD), while other MCI types tend to remain stable over-time and do not progress to AD. To identify and choose effective and personalized strategies to prevent or slow the progression of AD, we need to develop objective measures that are able to discriminate the MCI patients who are at risk of AD from those MCI patients who have less risk to develop AD. Here, we present a novel deep learning architecture, based on dual learning and an ad hoc layer for 3D separable convolutions, which aims at identifying MCI patients who have a high likelihood of developing AD within 3 years. Our deep learning procedures combine structural magnetic resonance imaging (MRI), demographic, neuropsychological, and APOe4 genetic data as input measures. The most novel characteristics of our machine learning model compared to previous ones are the following: 1) our deep learning model is multi-tasking, in the sense that it jointly learns to simultaneously predict both MCI to AD conversion as well as AD vs. healthy controls classification, which facilitates relevant feature extraction for AD prognostication; 2) the neural network classifier employs fewer parameters than other deep learning architectures which significantly limits data-overfitting (we use ~550,000 network parameters, which is orders of magnitude lower than other network designs); 3) both structural MRI images and their warp field characteristics, which quantify local volumetric changes in relation to the MRI template, were used as separate input streams to extract as much information as possible from the MRI data. 
All analyses were performed on a subset of the database made publicly available via the Alzheimer’s Disease Neuroimaging Initiative (ADNI), (n = 785 participants, n = 192 AD patients, n = 409 MCI patients (including both MCI patients who convert to AD and MCI patients who do not convert to AD), and n = 184 healthy controls). The most predictive combination of inputs was the structural MRI images and the demographic, neuropsychological, and APOe4 data. In contrast, the warp field metrics were of little added predictive value. The algorithm was able to distinguish the MCI patients developing AD within 3 years from those patients with stable MCI over the same time-period with an area under the curve (AUC) of 0.925 and a 10-fold cross-validated accuracy of 86%, a sensitivity of 87.5%, and specificity of 85%. To our knowledge, this is the highest performance achieved so far using similar datasets. The same network provided an AUC of 1 and 100% accuracy, sensitivity, and specificity when classifying patients with AD from healthy controls. Our classification framework was also robust to the use of different co-registration templates and potentially irrelevant features/image portions. Our approach is flexible and can in principle integrate other imaging modalities, such as PET, and diverse other sets of clinical data. The convolutional framework is potentially applicable to any 3D image dataset and gives the flexibility to design a computer-aided diagnosis system targeting the prediction of several medical conditions and neuropsychiatric disorders via multi-modal imaging and tabular clinical data.
URL:
A multi-model deep convolutional neural network for automatic hippocampus segmentation and classification in Alzheimer’s disease.
Alzheimer’s disease (AD) is a progressive and irreversible brain degenerative disorder. Mild cognitive impairment (MCI) is a clinical precursor of AD. Although some treatments can delay its progression, no effective cures are available for AD. Accurate early-stage diagnosis of AD is vital for the prevention and intervention of the disease progression. Hippocampus is one of the first affected brain regions in AD. To help AD diagnosis, the shape and volume of the hippocampus are often measured using structural magnetic resonance imaging (MRI). However, these features encode limited information and may suffer from segmentation errors. Additionally, the extraction of these features is independent of the classification model, which could result in sub-optimal performance. In this study, we propose a multi-model deep learning framework based on convolutional neural network (CNN) for joint automatic hippocampal segmentation and AD classification using structural MRI data. Firstly, a multi-task deep CNN model is constructed for jointly learning hippocampal segmentation and disease classification. Then, we construct a 3D Densely Connected Convolutional Networks (3D DenseNet) to learn features of the 3D patches extracted based on the hippocampal segmentation results for the classification task. Finally, the learned features from the multi-task CNN and DenseNet models are combined to classify disease status. Our method is evaluated on the baseline T1-weighted structural MRI data collected from 97 AD, 233 MCI, 119 Normal Control (NC) subjects in the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database. The proposed method achieves a dice similarity coefficient of 87.0% for hippocampal segmentation. In addition, the proposed method achieves an accuracy of 88.9% and an AUC (area under the ROC curve) of 92.5% for classifying AD vs. NC subjects, and an accuracy of 76.2% and an AUC of 77.5% for classifying MCI vs. NC subjects. 
Our empirical study also demonstrates that the proposed multi-model method outperforms the single-model methods and several other competing methods.
URL:
DeepAtrophy: Teaching a neural network to detect progressive changes in longitudinal MRI of the hippocampal region in Alzheimer’s disease.
Measures of change in hippocampal volume derived from longitudinal MRI are a well-studied biomarker of disease progression in Alzheimer’s disease (AD) and are used in clinical trials to track therapeutic efficacy of disease-modifying treatments. However, longitudinal MRI change measures based on deformable registration can be confounded by MRI artifacts, resulting in overestimation or underestimation of hippocampal atrophy. For example, the deformation-based-morphometry method ALOHA (Das et al., 2012) finds an increase in hippocampal volume in a substantial proportion of longitudinal scan pairs from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) study, which is unexpected given that hippocampal gray matter is lost with age and disease progression. We propose an alternative approach to quantify disease progression in the hippocampal region: to train a deep learning network (called DeepAtrophy) to infer temporal information from longitudinal scan pairs. The underlying assumption is that by learning to derive time-related information from scan pairs, the network implicitly learns to detect progressive changes that are related to aging and disease progression. Our network is trained using two categorical loss functions: one that measures the network’s ability to correctly order two scans from the same subject, input in arbitrary order; and another that measures the ability to correctly infer the ratio of inter-scan intervals between two pairs of same-subject input scans. When applied to longitudinal MRI scan pairs from subjects unseen during training, DeepAtrophy achieves greater accuracy in scan temporal ordering and interscan interval inference tasks than ALOHA (88.5% vs. 75.5% and 81.1% vs. 75.0%, respectively). A scalar measure of time-related change at the subject level derived from DeepAtrophy is then examined as a biomarker of disease progression in the context of AD clinical trials. 
We find that this measure performs on par with ALOHA in discriminating groups of individuals at different stages of the AD continuum. Overall, our results suggest that using deep learning to infer temporal information from longitudinal MRI of the hippocampal region has good potential as a biomarker of disease progression, and hint that combining this approach with conventional deformation-based morphometry algorithms may lead to improved biomarkers in the future.
URL:
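The first of the two training objectives in the DeepAtrophy entry above, ordering two same-subject scans that are input in arbitrary order, can be illustrated as a two-class problem. The sketch below is an assumption about how such an objective might look, not the authors' implementation: the function names are hypothetical, scan dates are given as day offsets, and a binary cross-entropy on a single logit stands in for the categorical loss.

```python
import math


def ordering_label(day_first_input, day_second_input):
    """Return 1 if the first input scan was acquired earlier, else 0."""
    return 1 if day_first_input < day_second_input else 0


def binary_cross_entropy_with_logit(logit, label):
    """Numerically stable cross-entropy for one logit and a 0/1 label."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def ordering_accuracy(logits, labels):
    """Fraction of pairs whose predicted order (logit > 0 means 'first is earlier') is right."""
    correct = sum(1 for z, y in zip(logits, labels) if (z > 0) == (y == 1))
    return correct / len(labels)


# A hypothetical network output of +2.0 for a pair acquired at day 0 and day 365.
label = ordering_label(0, 365)
loss = binary_cross_entropy_with_logit(2.0, label)
```

The appeal of this kind of objective is that the supervision comes from scan dates alone, so no manual atrophy labels are needed.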
Dynamic brain fluctuations outperform connectivity measures and mirror pathophysiological profiles across dementia subtypes: A multicenter study.
From molecular mechanisms to global brain networks, atypical fluctuations are the hallmark of neurodegeneration. Yet, traditional fMRI research on resting-state networks (RSNs) has favored static and average connectivity methods, which by overlooking the fluctuation dynamics triggered by neurodegeneration, have yielded inconsistent results. The present multicenter study introduces a data-driven machine learning pipeline based on dynamic connectivity fluctuation analysis (DCFA) on RS-fMRI data from 300 participants belonging to three groups: behavioral variant frontotemporal dementia (bvFTD) patients, Alzheimer’s disease (AD) patients, and healthy controls. We considered non-linear oscillatory patterns across combined and individual resting-state networks (RSNs), namely: the salience network (SN), mostly affected in bvFTD; the default mode network (DMN), mostly affected in AD; the executive network (EN), partially compromised in both conditions; the motor network (MN); and the visual network (VN). These RSNs were entered as features for dementia classification using a recent robust machine learning approach (a Bayesian hyperparameter tuned Gradient Boosting Machines (GBM) algorithm), across four independent datasets with different MR scanners and recording parameters. The machine learning classification accuracy analysis revealed a systematic and unique tailored architecture of RSN disruption. The classification accuracy ranking showed that the most affected networks for bvFTD were the SN + EN network pair (mean accuracy = 86.43%, AUC = 0.91, sensitivity = 86.45%, specificity = 87.54%); for AD, the DMN + EN network pair (mean accuracy = 86.63%, AUC = 0.89, sensitivity = 88.37%, specificity = 84.62%); and for the bvFTD vs. AD classification, the DMN + SN network pair (mean accuracy = 82.67%, AUC = 0.86, sensitivity = 81.27%, specificity = 83.01%). 
Moreover, the DCFA classification systematically outperformed canonical connectivity approaches (including both static and linear dynamic connectivity). Our findings suggest that non-linear dynamical fluctuations surpass two traditional seed-based functional connectivity approaches and provide a pathophysiological characterization of global brain networks in neurodegenerative conditions (AD and bvFTD) across multicenter data.
URL:
Whole genome deconvolution unveils Alzheimer’s resilient epigenetic signature.
Assay for Transposase Accessible Chromatin by sequencing (ATAC-seq) accurately depicts the chromatin regulatory state and altered mechanisms guiding gene expression in disease. However, bulk sequencing entangles information from different cell types and obscures cellular heterogeneity. To address this, we developed Cellformer, a deep learning method that deconvolutes bulk ATAC-seq into cell type-specific expression across the whole genome. Cellformer enables cost-effective cell type-specific open chromatin profiling in large cohorts. Applied to 191 bulk samples from 3 brain regions, Cellformer identifies cell type-specific gene regulatory mechanisms involved in resilience to Alzheimer’s disease, an uncommon group of cognitively healthy individuals that harbor a high pathological load of Alzheimer’s disease. Cell type-resolved chromatin profiling unveils cell type-specific pathways and nominates potential epigenetic mediators underlying resilience that may illuminate therapeutic opportunities to limit the cognitive impact of the disease. Cellformer is freely available to facilitate future investigations using high-throughput bulk ATAC-seq data.
URL:
Machine learning framework for early MRI-based Alzheimer’s conversion prediction in MCI subjects.
Mild cognitive impairment (MCI) is a transitional stage between age-related cognitive decline and Alzheimer’s disease (AD). For the effective treatment of AD, it would be important to identify MCI patients at high risk for conversion to AD. In this study, we present a novel magnetic resonance imaging (MRI)-based method for predicting the MCI-to-AD conversion from one to three years before the clinical diagnosis. First, we developed a novel MRI biomarker of MCI-to-AD conversion using semi-supervised learning and then integrated it with age and cognitive measures about the subjects using a supervised learning algorithm resulting in what we call the aggregate biomarker. The novel characteristics of the methods for learning the biomarkers are as follows: 1) We used a semi-supervised learning method (low density separation) for the construction of MRI biomarker as opposed to more typical supervised methods; 2) We performed a feature selection on MRI data from AD subjects and normal controls without using data from MCI subjects via regularized logistic regression; 3) We removed the aging effects from the MRI data before the classifier training to prevent possible confounding between AD and age related atrophies; and 4) We constructed the aggregate biomarker by first learning a separate MRI biomarker and then combining it with age and cognitive measures about the MCI subjects at the baseline by applying a random forest classifier. We experimentally demonstrated the added value of these novel characteristics in predicting the MCI-to-AD conversion on data obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database. With the ADNI data, the MRI biomarker achieved a 10-fold cross-validated area under the receiver operating characteristic curve (AUC) of 0.7661 in discriminating progressive MCI patients (pMCI) from stable MCI patients (sMCI). 
Our aggregate biomarker based on MRI data together with baseline cognitive measurements and age achieved a 10-fold cross-validated AUC score of 0.9020 in discriminating pMCI from sMCI. The results presented in this study demonstrate the potential of the suggested approach for early AD diagnosis and an important role of MRI in the MCI-to-AD conversion prediction. However, it is evident based on our results that combining MRI data with cognitive test results improved the accuracy of the MCI-to-AD conversion prediction.
URL:
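The age-correction step listed as characteristic 3) in the entry above can be sketched as a simple residualization. This is an assumed, simplified form: one linear age trend per feature is fitted on a reference group and subtracted from every subject's value. The function names and the toy numbers are hypothetical, and the paper's actual procedure may differ.

```python
def fit_age_trend(ages, values):
    """Least-squares line value = slope * age + intercept for one MRI feature."""
    n = len(ages)
    mean_age = sum(ages) / n
    mean_val = sum(values) / n
    sxx = sum((a - mean_age) ** 2 for a in ages)
    sxy = sum((a - mean_age) * (v - mean_val) for a, v in zip(ages, values))
    slope = sxy / sxx
    intercept = mean_val - slope * mean_age
    return slope, intercept


def remove_age_effect(age, value, slope, intercept):
    """Residual of the feature after subtracting the fitted age trend."""
    return value - (slope * age + intercept)


# Toy reference group: the feature shrinks by 0.1 units per year of age.
ref_ages = [60.0, 70.0, 80.0]
ref_values = [10.0, 9.0, 8.0]
slope, intercept = fit_age_trend(ref_ages, ref_values)

# A 70-year-old subject whose value lies 0.5 units below the age-expected level.
residual = remove_age_effect(70.0, 8.5, slope, intercept)
```

Feeding residuals rather than raw values to the classifier keeps normal age-related atrophy from being mistaken for disease-related atrophy, which is the confound the abstract refers to.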
Reconstructing subject-specific effect maps.
Predictive models allow subject-specific inference when analyzing disease related alterations in neuroimaging data. Given a subject’s data, inference can be made at two levels: global, i.e. identifying condition presence for the subject, and local, i.e. detecting condition effect on each individual measurement extracted from the subject’s data. While global inference is widely used, local inference, which can be used to form subject-specific effect maps, is rarely used because existing models often yield noisy detections composed of dispersed isolated islands. In this article, we propose a reconstruction method, named RSM, to improve subject-specific detections of predictive modeling approaches and in particular, binary classifiers. RSM specifically aims to reduce noise due to sampling error associated with using a finite sample of examples to train classifiers. The proposed method is a wrapper-type algorithm that can be used with different binary classifiers in a diagnostic manner, i.e. without information on condition presence. Reconstruction is posed as a Maximum-A-Posteriori problem with a prior model whose parameters are estimated from training data in a classifier-specific fashion. Experimental evaluation is performed on synthetically generated data and data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database. Results on synthetic data demonstrate that using RSM yields higher detection accuracy compared to using models directly or with bootstrap averaging. Analyses on the ADNI dataset show that RSM can also improve correlation between subject-specific detections in cortical thickness data and non-imaging markers of Alzheimer’s Disease (AD), such as the Mini Mental State Examination Score and Cerebrospinal Fluid amyloid-beta levels. Further reliability studies on the longitudinal ADNI dataset show improvement on detection reliability when RSM is used.
URL:
Identification of gene pathways implicated in Alzheimer’s disease using longitudinal imaging phenotypes with sparse regression.
We present a new method for the detection of gene pathways associated with a multivariate quantitative trait, and use it to identify causal pathways associated with an imaging endophenotype characteristic of longitudinal structural change in the brains of patients with Alzheimer’s disease (AD). Our method, known as pathways sparse reduced-rank regression (PsRRR), uses group lasso penalised regression to jointly model the effects of genome-wide single nucleotide polymorphisms (SNPs), grouped into functional pathways using prior knowledge of gene-gene interactions. Pathways are ranked in order of importance using a resampling strategy that exploits finite sample variability. Our application study uses whole genome scans and MR images from 99 probable AD patients and 164 healthy elderly controls in the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database. 66,182 SNPs are mapped to 185 gene pathways from the KEGG pathway database. Voxel-wise imaging signatures characteristic of AD are obtained by analysing 3D patterns of structural change at 6, 12 and 24 months relative to baseline. High-ranking, AD endophenotype-associated pathways in our study include those describing insulin signalling, vascular smooth muscle contraction and focal adhesion. All of these have been previously implicated in AD biology. In a secondary analysis, we investigate SNPs and genes that may be driving pathway selection. High ranking genes include a number previously linked in gene expression studies to beta-amyloid plaque formation in the AD brain (PIK3R3, PIK3CG, PRKCA and PRKCB), and to AD related changes in hippocampal gene expression (ADCY2, ACTN1, ACACA, and GNAI1). Other high ranking previously validated AD endophenotype-related genes include CR1, TOMM40 and APOE.
URL:
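The group lasso penalty that the PsRRR entry above relies on can be illustrated in a few lines. This is a generic sketch of the penalty, not the PsRRR estimator itself: the coefficient vector, the grouping, and the square-root-of-group-size weighting are assumptions for illustration, and the function name is hypothetical.

```python
import math


def group_lasso_penalty(coefs, groups, lam=1.0):
    """lam * sum over groups of sqrt(group size) * L2 norm of that group's coefficients."""
    total = 0.0
    for index_list in groups:
        l2_norm = math.sqrt(sum(coefs[i] ** 2 for i in index_list))
        total += math.sqrt(len(index_list)) * l2_norm
    return lam * total


# Four SNP coefficients split into two hypothetical pathways of two SNPs each.
snp_coefs = [3.0, 4.0, 0.0, 0.0]
pathways = [[0, 1], [2, 3]]
penalty = group_lasso_penalty(snp_coefs, pathways)
```

Because the norm is taken per group rather than per coefficient, the penalty tends to zero out whole pathways at once, which is what makes pathway-level selection and ranking possible.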
Deep neural networks with controlled variable selection for the identification of putative causal genetic variants.
Deep neural networks (DNNs) have been successfully utilized in many scientific problems for their high prediction accuracy, but their application to genetic studies remains challenging due to their poor interpretability. Here we consider the problem of scalable, robust variable selection in DNNs for the identification of putative causal genetic variants in genome sequencing studies. We identified a pronounced randomness in feature selection in DNNs due to its stochastic nature, which may hinder interpretability and give rise to misleading results. We propose an interpretable neural network model, stabilized using ensembling, with controlled variable selection for genetic studies. The merit of the proposed method includes: flexible modelling of the nonlinear effect of genetic variants to improve statistical power; multiple knockoffs in the input layer to rigorously control the false discovery rate; hierarchical layers to substantially reduce the number of weight parameters and activations, and improve computational efficiency; and stabilized feature selection to reduce the randomness in identified signals. We evaluate the proposed method in extensive simulation studies and apply it to the analysis of Alzheimer’s disease genetics. We show that the proposed method, when compared with conventional linear and nonlinear methods, can lead to substantially more discoveries.
URL:
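The false discovery rate control mentioned in the abstract above relies on knockoff features. As a rough illustration only, and not the authors' multiple-knockoff procedure, the standard single-knockoff "knockoff+" selection threshold can be sketched as follows. The function names, the example statistics `w`, and the target level `q` are hypothetical; `w[j]` is assumed to be a feature-importance difference between original feature j and its knockoff copy (large positive values favour the original).

```python
def knockoff_plus_threshold(w, q):
    """Smallest t with (1 + #{w_j <= -t}) / max(1, #{w_j >= t}) <= q.

    Returns float('inf') when no candidate threshold meets the target level.
    """
    # Candidate thresholds are the distinct nonzero magnitudes of w.
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)  # knockoff "wins"
        positives = sum(1 for x in w if x >= t)   # original "wins"
        if (1 + negatives) / max(1, positives) <= q:
            return t
    return float("inf")


def select_features(w, q):
    """Indices of features whose statistic reaches the knockoff+ threshold."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]


# Hypothetical importance differences for ten variants.
w = [5.0, 4.0, 3.5, 3.0, 2.5, 2.0, -1.0, 0.5, -0.3, 1.5]
selected = select_features(w, q=0.2)
```

With a stricter target such as `q=0.1`, this toy example selects nothing, because the `1 +` term in the numerator makes the estimate conservative when few features are available.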
Learning to synthesise the ageing brain without longitudinal data.
How will my face look when I get older? Or, for a more challenging question: How will my brain look when I get older? To answer this question one must devise (and learn from data) a multivariate auto-regressive function which given an image and a desired target age generates an output image. While collecting data for faces may be easier, collecting longitudinal brain data is not trivial. We propose a deep learning-based method that learns to simulate subject-specific brain ageing trajectories without relying on longitudinal data. Our method synthesises images conditioned on two factors: age (a continuous variable), and status of Alzheimer’s Disease (AD, an ordinal variable). With an adversarial formulation we learn the joint distribution of brain appearance, age and AD status, and define reconstruction losses to address the challenging problem of preserving subject identity. We compare with several benchmarks using two widely used datasets. We evaluate the quality and realism of synthesised images using ground-truth longitudinal data and a pre-trained age predictor. We show that, despite the use of cross-sectional data, our model learns patterns of gray matter atrophy in the middle temporal gyrus in patients with AD. To demonstrate generalisation ability, we train on one dataset and evaluate predictions on the other. In conclusion, our model shows an ability to separate age, disease influence and anatomy using only 2D cross-sectional data that should be useful in large studies into neurodegenerative disease, that aim to combine several data sources. To facilitate such future studies by the community at large our code is made available at https://github.com/xiat0616/BrainAgeing.
URL: https://github.com/xiat0616/BrainAgeing
Diagnostic Evidence GAuge of Single cells (DEGAS): a flexible deep transfer learning framework for prioritizing cells in relation to disease.
We propose DEGAS (Diagnostic Evidence GAuge of Single cells), a novel deep transfer learning framework, to transfer disease information from patients to cells. We call such transferrable information “impressions,” which allow individual cells to be associated with disease attributes like diagnosis, prognosis, and response to therapy. Using simulated data and ten diverse single-cell and patient bulk tissue transcriptomic datasets from glioblastoma multiforme (GBM), Alzheimer’s disease (AD), and multiple myeloma (MM), we demonstrate the feasibility, flexibility, and broad applications of the DEGAS framework. DEGAS analysis on myeloma single-cell transcriptomics identified PHF19-high myeloma cells associated with progression. Availability: https://github.com/tsteelejohnson91/DEGAS.
URL: https://github.com/tsteelejohnson91/DEGAS
A novel generation adversarial network framework with characteristics aggregation and diffusion for brain disease classification and feature selection.
Imaging genetics provides unique insights into the pathological studies of complex brain diseases by integrating the characteristics of multi-level medical data. However, most current imaging genetics research performs incomplete data fusion. Also, there is a lack of effective deep learning methods to analyze neuroimaging and genetic data jointly. Therefore, this paper first constructs the brain region-gene networks to intuitively represent the association pattern of pathogenetic factors. Second, a novel feature information aggregation model is constructed to accurately describe the information aggregation process among brain region nodes and gene nodes. Finally, a deep learning method called feature information aggregation and diffusion generative adversarial network (FIAD-GAN) is proposed to efficiently classify samples and select features. We focus on improving the generator with the proposed convolution and deconvolution operations, with which the interpretability of the deep learning framework has been dramatically improved. The experimental results indicate that FIAD-GAN can not only achieve superior results in various disease classification tasks but also extract brain regions and genes closely related to AD. This work provides a novel method for intelligent clinical decisions. The relevant biomedical discoveries provide a reliable reference and technical basis for the clinical diagnosis, treatment and pathological analysis of disease.
URL:
Long range early diagnosis of Alzheimer’s disease using longitudinal MR imaging data.
The enormous social and economic cost of Alzheimer’s disease (AD) has driven a number of neuroimaging investigations for early detection and diagnosis. Towards this end, various computational approaches have been applied to longitudinal imaging data in subjects with Mild Cognitive Impairment (MCI), as serial brain imaging could increase sensitivity for detecting changes from baseline, and potentially serve as a diagnostic biomarker for AD. However, current state-of-the-art brain imaging diagnostic methods have limited utility in clinical practice due to the lack of robust predictive power. To address this limitation, we propose a flexible spatial-temporal solution to predict the risk of MCI conversion to AD prior to the onset of clinical symptoms by sequentially recognizing abnormal structural changes from longitudinal magnetic resonance (MR) image sequences. Firstly, our model is trained to sequentially recognize different length partial MR image sequences from different stages of AD. Secondly, our method is leveraged by the inexorably progressive nature of AD. To that end, a Temporally Structured Support Vector Machine (TS-SVM) model is proposed to constrain the partial MR image sequence’s detection score to increase monotonically with AD progression. Furthermore, in order to select the best morphological features for enabling classifiers, we propose a joint feature selection and classification framework. We demonstrate that our early diagnosis method using only two follow-up MR scans is able to predict conversion to AD 12 months ahead of an AD clinical diagnosis with 81.75% accuracy.
URL:
Disentangling Normal Aging From Severity of Disease via Weak Supervision on Longitudinal MRI.
The continuous progression of neurological diseases is often categorized into conditions according to their severity. To relate the severity to changes in brain morphometry, there is a growing interest in replacing these categories with a continuous severity scale that longitudinal MRIs are mapped onto via deep learning algorithms. However, existing methods based on supervised learning require large numbers of samples and those that do not, such as self-supervised models, fail to clearly separate the disease effect from normal aging. Here, we propose to explicitly disentangle those two factors via weak supervision. In other words, training is based on longitudinal MRIs being labelled either normal or diseased so that the training data can be augmented with samples from disease categories that are not of primary interest to the analysis. We do so by encouraging trajectories of controls to be fully encoded by the direction associated with brain aging. Furthermore, an orthogonal direction linked to disease severity captures the residual component from normal aging in the diseased cohort. Hence, the proposed method quantifies disease severity and its progression speed in individuals without knowing their condition. We apply the proposed method on data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI, N = 632). We then show that the model properly disentangled normal aging from the severity of cognitive impairment by plotting the resulting disentangled factors of each subject and generating simulated MRIs for a given chronological age and condition. Moreover, our representation obtains higher balanced accuracy when used for two downstream classification tasks compared to other pre-training approaches. The code for our weakly supervised approach is available at https://github.com/ouyangjiahong/longitudinal-direction-disentangle.
URL: https://github.com/ouyangjiahong/longitudinal-direction-disentangle
SR-TWAS: leveraging multiple reference panels to improve transcriptome-wide association study power by ensemble machine learning.
Multiple reference panels of a given tissue or multiple tissues often exist, and multiple regression methods could be used for training gene expression imputation models for transcriptome-wide association studies (TWAS). To leverage expression imputation models (i.e., base models) trained with multiple reference panels, regression methods, and tissues, we develop a Stacked Regression based TWAS (SR-TWAS) tool which can obtain optimal linear combinations of base models for a given validation transcriptomic dataset. Both simulation and real studies show that SR-TWAS improves power, due to increased training sample sizes and borrowed strength across multiple regression methods and tissues. Leveraging base models across multiple reference panels, tissues, and regression methods, our real studies identify 6 independent significant risk genes for Alzheimer’s disease (AD) dementia for supplementary motor area tissue and 9 independent significant risk genes for Parkinson’s disease (PD) for substantia nigra tissue. Relevant biological interpretations are found for these significant risk genes.
URL:
Adapting to evolving MRI data: A transfer learning approach for Alzheimer’s disease prediction.
Integrating 3D magnetic resonance imaging (MRI) with machine learning has shown promising results in healthcare, especially in detecting Alzheimer’s Disease (AD). However, changes in MRI technologies and acquisition protocols often yield limited data, leading to potential overfitting. This study explores Transfer Learning (TL) approaches to enhance AD diagnosis using a Baseline model consisting of a 3D-Convolutional Neural Network trained on 80 3T MRI scans. Two scenarios are explored: (A) utilizing historical data to address changes in MRI acquisitions (from 1.5T to 3T MRI), and (B) adapting 2D models pre-trained on ImageNet (ResNet18, ResNet50, ResNet101) for 3D image processing when historical data is unavailable. In both scenarios, two modeling approaches are tested. The General Approach involves distinct feature extraction and classification steps, using Radiomic features and TL-based features evaluated with six classifiers. The Deep Approach integrates these steps by fine-tuning the pre-trained models for AD diagnosis. In scenario (A), TL significantly boosts the Baseline’s accuracy from 63% to 99%. In scenario (B), Radiomic features represent 3D MRI better than TL-features in the General Approach. Nonetheless, fine-tuning models pre-trained on natural images can increase the Baseline’s accuracy by up to 12 percentage points, achieving an overall accuracy of 83%.
URL:
Probabilistic modeling of anatomical variability using a low dimensional parameterization of diffeomorphisms.
We present an efficient probabilistic model of anatomical variability in a linear space of initial velocities of diffeomorphic transformations and demonstrate its benefits in clinical studies of brain anatomy. To overcome the computational challenges of the high dimensional deformation-based descriptors, we develop a latent variable model for principal geodesic analysis (PGA) based on a low dimensional shape descriptor that effectively captures the intrinsic variability in a population. We define a novel shape prior that explicitly represents principal modes as a multivariate complex Gaussian distribution on the initial velocities in a bandlimited space. We demonstrate the performance of our model on a set of 3D brain MRI scans from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database. Our model yields a more compact representation of group variation at substantially lower computational cost than the state-of-the-art method such as tangent space PCA (TPCA) and probabilistic principal geodesic analysis (PPGA) that operate in the high dimensional image space.
URL:
Learning genetic epistasis using Bayesian network scoring criteria.
BACKGROUND: Gene-gene epistatic interactions likely play an important role in the genetic basis of many common diseases. Recently, machine-learning and data mining methods have been developed for learning epistatic relationships from data. A well-known combinatorial method that has been successfully applied for detecting epistasis is Multifactor Dimensionality Reduction (MDR). Jiang et al. created a combinatorial epistasis learning method called BNMBL to learn Bayesian network (BN) epistatic models. They compared BNMBL to MDR using simulated data sets. Each of these data sets was generated from a model that associates two SNPs with a disease and includes 18 unrelated SNPs. For each data set, BNMBL and MDR were used to score all 2-SNP models, and BNMBL learned significantly more correct models. In real data sets, we ordinarily do not know the number of SNPs that influence phenotype. BNMBL may not perform as well if we also scored models containing more than two SNPs. Furthermore, a number of other BN scoring criteria have been developed. They may detect epistatic interactions even better than BNMBL. Although BNs are a promising tool for learning epistatic relationships from data, we cannot confidently use them in this domain until we determine which scoring criteria work best or even well when we try learning the correct model without knowledge of the number of SNPs in that model. RESULTS: We evaluated the performance of 22 BN scoring criteria using 28,000 simulated data sets and a real Alzheimer’s GWAS data set. Our results were surprising in that the Bayesian scoring criterion with large values of a hyperparameter called alpha performed best. This score performed better than other BN scoring criteria and MDR at recall using simulated data sets, at detecting the hardest-to-detect models using simulated data sets, and at substantiating previous results using the real Alzheimer’s data set.
CONCLUSIONS: We conclude that representing epistatic interactions using BN models and scoring them using a BN scoring criterion holds promise for identifying epistatic genetic variants in data. In particular, the Bayesian scoring criterion with large values of a hyperparameter alpha appears more promising than a number of alternatives.
URL:
MGN-Net: A multi-view graph normalizer for integrating heterogeneous biological network populations.
With the recent technological advances, biological datasets, often represented by networks (i.e., graphs) of interacting entities, proliferate with unprecedented complexity and heterogeneity. Although modern network science opens new frontiers of analyzing connectivity patterns in such datasets, we still lack data-driven methods for extracting an integral connectional fingerprint of a multi-view graph population, let alone disentangling the typical from the atypical variations across the population samples. We present the multi-view graph normalizer network (MGN-Net), a graph neural network based method to normalize and integrate a set of multi-view biological networks into a single connectional template that is centered, representative, and topologically sound. We demonstrate the use of MGN-Net by discovering the connectional fingerprints of healthy and neurologically disordered brain network populations including Alzheimer’s disease and Autism spectrum disorder patients. Additionally, by comparing the learned templates of healthy and disordered populations, we show that MGN-Net significantly outperforms conventional network integration methods across extensive experiments in terms of producing the most centered templates, recapitulating unique traits of populations, and preserving the complex topology of biological networks. Our evaluations showed that MGN-Net is powerfully generic and easily adaptable in design to different graph-based problems such as identification of relevant connections, normalization and integration.
URL:
Discovering network phenotype between genetic risk factors and disease status via diagnosis-aligned multi-modality regression method in Alzheimer’s disease.
MOTIVATION: Neuroimaging genetics is an emerging field to identify the associations between genetic variants [e.g. single-nucleotide polymorphisms (SNPs)] and quantitative traits (QTs) such as brain imaging phenotypes. However, most of the current studies focus only on the associations between brain structure imaging and genetic variants, while neglecting the connectivity information between brain regions. In addition, the brain itself is a complex network, and the higher-order interaction may contain useful information for the mechanistic understanding of diseases [i.e. Alzheimer’s disease (AD)]. RESULTS: A general framework is proposed to exploit network voxel information and network connectivity information as intermediate traits that bridge genetic risk factors and disease status. Specifically, we first use the sparse representation (SR) model to build hyper-network to express the connectivity features of the brain. The network voxel node features and network connectivity edge features are extracted from the structural magnetic resonance imaging (sMRI) and resting-state functional magnetic resonance imaging (fMRI), respectively. Second, a diagnosis-aligned multi-modality regression method is adopted to fully explore the relationships among modalities of different subjects, which can help further mine the relation between the risk genetics and brain network features. In experiments, all methods are tested on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database. The experimental results not only verify the effectiveness of our proposed framework but also discover some brain regions and connectivity features that are highly related to diseases. AVAILABILITY AND IMPLEMENTATION: The Matlab code is available at http://ibrain.nuaa.edu.cn/2018/list.htm.
URL: http://ibrain.nuaa.edu.cn/2018/list.htm
Optimizing clinico-genomic disease prediction across ancestries: a machine learning strategy with Pareto improvement.
BACKGROUND: Accurate prediction of an individual’s predisposition to diseases is vital for preventive medicine and early intervention. Various statistical and machine learning models have been developed for disease prediction using clinico-genomic data. However, the accuracy of clinico-genomic prediction of diseases may vary significantly across ancestry groups due to their unequal representation in clinical genomic datasets. METHODS: We introduced a deep transfer learning approach to improve the performance of clinico-genomic prediction models for data-disadvantaged ancestry groups. We conducted machine learning experiments on multi-ancestral genomic datasets of lung cancer, prostate cancer, and Alzheimer’s disease, as well as on synthetic datasets with built-in data inequality and distribution shifts across ancestry groups. RESULTS: Deep transfer learning significantly improved disease prediction accuracy for data-disadvantaged populations in our multi-ancestral machine learning experiments. In contrast, transfer learning based on linear frameworks did not achieve comparable improvements for these data-disadvantaged populations. CONCLUSIONS: This study shows that deep transfer learning can enhance fairness in multi-ancestral machine learning by improving prediction accuracy for data-disadvantaged populations without compromising prediction accuracy for other populations, thus providing a Pareto improvement towards equitable clinico-genomic prediction of diseases.
URL:
High-throughput and efficient multilocus genome-wide association study on longitudinal outcomes.
MOTIVATION: With the emergence of high-dimensional genomic data, genetic analyses such as genome-wide association studies (GWAS) have played an important role in identifying disease-related genetic variants and novel treatments. Complex longitudinal phenotypes are commonly collected in medical studies. However, since limited analytical approaches are available for longitudinal traits, these data are often underutilized. In this article, we develop a high-throughput machine learning approach for multilocus GWAS using longitudinal traits by coupling Empirical Bayesian Estimates from mixed-effects modeling with a novel l0-norm algorithm. RESULTS: Extensive simulations demonstrated that the proposed approach not only provided accurate selection of single nucleotide polymorphisms (SNPs) with comparable or higher power but also robust control of false positives. More importantly, this novel approach is highly scalable and could be more than 1000 times faster than recently published approaches, making genome-wide multilocus analysis of longitudinal traits possible. In addition, our proposed approach can simultaneously analyze millions of SNPs if the computer memory allows, thereby potentially allowing a true multilocus analysis for high-dimensional genomic data. With application to the data from Alzheimer’s Disease Neuroimaging Initiative, we confirmed that our approach can identify well-known SNPs associated with AD and was much faster than recently published approaches (at least 6000 times). AVAILABILITY AND IMPLEMENTATION: The source code and the testing datasets are available at https://github.com/Myuan2019/EBE_APML0. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
URL: https://github.com/Myuan2019/EBE_APML0
Quantifying anatomical shape variations in neurological disorders.
We develop a multivariate analysis of brain anatomy to identify the relevant shape deformation patterns and quantify the shape changes that explain corresponding variations in clinical neuropsychological measures. We use kernel Partial Least Squares (PLS) and formulate a regression model in the tangent space of the manifold of diffeomorphisms characterized by deformation momenta. The scalar deformation momenta completely encode the diffeomorphic changes in anatomical shape. In this model, the clinical measures are the response variables, while the anatomical variability is treated as the independent variable. To better understand the “shape-clinical response” relationship, we also control for demographic confounders, such as age, gender, and years of education in our regression model. We evaluate the proposed methodology on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database using baseline structural MR imaging data and neuropsychological evaluation test scores. We demonstrate the ability of our model to quantify the anatomical deformations in units of clinical response. Our results also demonstrate that the proposed method is generic and generates reliable shape deformations both in terms of the extracted patterns and the amount of shape changes. We found that while the hippocampus and amygdala emerge as mainly responsible for changes in test scores for global measures of dementia and memory function, they are not a determinant factor for executive function. Another critical finding was the appearance of thalamus and putamen as most important regions that relate to executive function. These resulting anatomical regions were consistent with very high confidence irrespective of the size of the population used in the study. This data-driven global analysis of brain anatomy was able to reach similar conclusions as other studies in Alzheimer’s disease based on predefined ROIs, together with the identification of other new patterns of deformation. 
The proposed methodology thus holds promise for discovering new patterns of shape changes in the human brain that could add to our understanding of disease progression in neurological disorders.
URL:
Sparse reduced-rank regression detects genetic associations with voxel-wise longitudinal phenotypes in Alzheimer’s disease.
Scanning the entire genome in search of variants related to imaging phenotypes holds great promise in elucidating the genetic etiology of neurodegenerative disorders. Here we discuss the application of a penalized multivariate model, sparse reduced-rank regression (sRRR), for the genome-wide detection of markers associated with voxel-wise longitudinal changes in the brain caused by Alzheimer’s disease (AD). Using a sample from the Alzheimer’s Disease Neuroimaging Initiative database, we performed three separate studies that each compared two groups of individuals to identify genes associated with disease development and progression. For each comparison we took a two-step approach: initially, using penalized linear discriminant analysis, we identified voxels that provide an imaging signature of the disease with high classification accuracy; then we used this multivariate biomarker as a phenotype in a genome-wide association study, carried out using sRRR. The genetic markers were ranked in order of importance of association to the phenotypes using a data re-sampling approach. Our findings confirmed the key role of the APOE and TOMM40 genes but also highlighted some novel potential associations with AD.
URL:
Fully Automated Hippocampus Segmentation using T2-informed Deep Convolutional Neural Networks.
Hippocampal atrophy (tissue loss) has become a fundamental outcome parameter in clinical trials on Alzheimer’s disease. To accurately estimate hippocampus volume and track its volume loss, a robust and reliable segmentation is essential. Manual hippocampus segmentation is considered the gold standard but is extensive, time-consuming, and prone to rater bias. Therefore, it is often replaced by automated programs like FreeSurfer, one of the most commonly used tools in clinical research. Recently, deep learning-based methods have also been successfully applied to hippocampus segmentation. The basis of all approaches are clinically used T1-weighted whole-brain MR images with approximately 1mm isotropic resolution. However, such T1 images show low contrast-to-noise ratios (CNRs), particularly for many hippocampal substructures, limiting delineation reliability. To overcome these limitations, high-resolution T2-weighted scans are suggested for better visualization and delineation, as they show higher CNRs and usually allow for higher resolutions. Unfortunately, such time-consuming T2-weighted sequences are not feasible in a clinical routine. We propose an automated hippocampus segmentation pipeline leveraging deep learning with T2w MR images for enhanced hippocampus segmentation of clinical T1-weighted images based on a series of 3D convolutional neural networks and a specifically acquired multi-contrast dataset. This dataset consists of corresponding pairs of high-resolution T1- and T2-weighted images, with the T2 images only used to create more accurate manual ground truth annotations and to train the segmentation network. The T2-based ground truth labels were also used to evaluate all experiments by comparing the masks visually and by various quantitative measures. We compared our approach with four established state-of-the-art hippocampus segmentation algorithms (FreeSurfer, ASHS, HippoDeep, HippMapp3r) and demonstrated a superior segmentation performance. 
Moreover, we found that the automated segmentation of T1-weighted images benefits from the T2-based ground truth data. In conclusion, this work showed the beneficial use of high-resolution, T2-based ground truth data for training an automated, deep learning-based hippocampus segmentation and provides the basis for a reliable estimation of hippocampal atrophy in clinical studies.
URL:
Locally linear embedding (LLE) for MRI based Alzheimer’s disease classification.
Modern machine learning algorithms are increasingly being used in neuroimaging studies, such as the prediction of Alzheimer’s disease (AD) from structural MRI. However, finding a good representation for multivariate brain MRI features in which their essential structure is revealed and easily extractable has been difficult. We report a successful application of a machine learning framework that significantly improved the use of brain MRI for predictions. Specifically, we used the unsupervised learning algorithm of local linear embedding (LLE) to transform multivariate MRI data of regional brain volume and cortical thickness to a locally linear space with fewer dimensions, while also utilizing the global nonlinear data structure. The embedded brain features were then used to train a classifier for predicting future conversion to AD based on a baseline MRI. We tested the approach on 413 individuals from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) who had baseline MRI scans and complete clinical follow-ups over 3 years with the following diagnoses: cognitive normal (CN; n=137), stable mild cognitive impairment (s-MCI; n=93), MCI converters to AD (c-MCI, n=97), and AD (n=86). We found that classifications using embedded MRI features generally outperformed (p<0.05) classifications using the original features directly. Moreover, the improvement from LLE was not limited to a particular classifier but worked equally well for regularized logistic regressions, support vector machines, and linear discriminant analysis. Most strikingly, using LLE significantly improved (p=0.007) predictions of MCI subjects who converted to AD and those who remained stable (accuracy/sensitivity/specificity: 0.68/0.80/0.56). In contrast, predictions using the original features performed no better than chance (accuracy/sensitivity/specificity: 0.56/0.65/0.46). In conclusion, LLE is a very effective tool for classification studies of AD using multivariate MRI data.
The improvement in predicting conversion to AD in MCI could have important implications for health management and for powering therapeutic trials by targeting non-demented subjects who later convert to AD.
URL:
Automatic classification of AD pathology in FTD phenotypes using natural speech.
INTRODUCTION: Screening for Alzheimer’s disease neuropathologic change (ADNC) in individuals with atypical presentations is challenging but essential for clinical management. We trained automatic speech-based classifiers to distinguish frontotemporal dementia (FTD) patients with ADNC from those with frontotemporal lobar degeneration (FTLD). METHODS: We trained automatic classifiers with 99 speech features from 1 minute speech samples of 179 participants (ADNC = 36, FTLD = 60, healthy controls [HC] = 89). Patients’ pathology was assigned based on autopsy or cerebrospinal fluid analytes. Structural network-based magnetic resonance imaging analyses identified anatomical correlates of distinct speech features. RESULTS: Our classifier showed 0.88 ± 0.03 area under the curve (AUC) for ADNC versus FTLD and 0.93 ± 0.04 AUC for patients versus HC. Noun frequency and pause rate correlated with gray matter volume loss in the limbic and salience networks, respectively. DISCUSSION: Brief naturalistic speech samples can be used for screening FTD patients for underlying ADNC in vivo. This work supports the future development of digital assessment tools for FTD. HIGHLIGHTS: We trained machine learning classifiers for frontotemporal dementia patients using natural speech. We grouped participants by neuropathological diagnosis (autopsy) or cerebrospinal fluid biomarkers. Classifiers well distinguished underlying pathology (Alzheimer’s disease vs. frontotemporal lobar degeneration) in patients. We identified important features through an explainable artificial intelligence approach. This work lays the groundwork for a speech-based neuropathology screening tool.
URL:
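The area under the ROC curve (AUC) reported in the abstract above can be read as the probability that a randomly chosen positive case receives a higher classifier score than a randomly chosen negative case. A minimal sketch of that pairwise definition follows; the function name and the example scores and labels are hypothetical and are not taken from the study.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores: label 1 = ADNC, label 0 = FTLD.
scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.2]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)  # 7 of 9 pairs are ordered correctly
```

The double loop is quadratic in the number of cases, which is fine for cohorts of a few hundred participants; a rank-based formulation would be preferable for much larger samples.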
NMF-SVM based CAD tool applied to functional brain images for the diagnosis of Alzheimer’s disease.
This paper presents a novel computer-aided diagnosis (CAD) technique for the early diagnosis of Alzheimer’s disease (AD) based on nonnegative matrix factorization (NMF) and support vector machines (SVM) with bounds of confidence. The CAD tool is designed for the study and classification of functional brain images. For this purpose, two different brain image databases are selected: a single photon emission computed tomography (SPECT) database and positron emission tomography (PET) images, both of them containing data for both Alzheimer’s disease (AD) patients and healthy controls as a reference. These databases are analyzed by applying the Fisher discriminant ratio (FDR) and nonnegative matrix factorization (NMF) for feature selection and extraction of the most relevant features. The resulting NMF-transformed sets of data, which contain a reduced number of features, are classified by means of a SVM-based classifier with bounds of confidence for decision. The proposed NMF-SVM method yields up to 91% classification accuracy with high sensitivity and specificity rates (above 90%). This NMF-SVM CAD tool becomes an accurate method for SPECT and PET AD image classification.
URL:
A multi-view learning approach with diffusion model to synthesize FDG PET from MRI T1WI for diagnosis of Alzheimer’s disease.
INTRODUCTION: This study presents a novel multi-view learning approach for machine learning (ML)-based Alzheimer’s disease (AD) diagnosis. METHODS: A diffusion model is proposed to synthesize the fluorodeoxyglucose positron emission tomography (FDG PET) view from the magnetic resonance imaging T1 weighted imaging (MRI T1WI) view and incorporate two synthesis strategies: one-way synthesis and two-way synthesis. To assess the utility of the synthesized views, we use multilayer perceptron (MLP)-based classifiers with various combinations of the views. RESULTS: The two-way synthesis achieves state-of-the-art performance with a structural similarity index measure (SSIM) of 0.9380 and a peak signal-to-noise ratio (PSNR) of 26.47. The one-way synthesis achieves an SSIM of 0.9282 and a PSNR of 23.83. Both synthesized FDG PET views have shown their effectiveness in improving diagnostic accuracy. DISCUSSION: This work supports the notion that ML-based cross-domain data synthesis can be a useful approach to improve AD diagnosis by providing additional synthesized disease-related views for multi-view learning. HIGHLIGHTS: We propose a diffusion model with two strategies to synthesize fluorodeoxyglucose positron emission tomography (FDG PET) from magnetic resonance imaging T1 weighted imaging (MRI T1WI). We propose multi-view learning with MRI T1WI and synthesized FDG PET for Alzheimer’s disease (AD) diagnosis. We provide a comprehensive experimental comparison for the synthesized FDG PET view. The feasibility of synthesized FDG PET view in AD diagnosis is validated with various experiments. We demonstrate the ability of synthesized FDG PET to enhance the performance of machine learning-based AD diagnosis.
URL:
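As a reading aid for the PSNR figures quoted in the entry above, here is a minimal sketch of how a peak signal-to-noise ratio is commonly computed. It assumes images flattened to lists of intensities scaled to [0, 1]; the function name `psnr`, the `max_value` parameter, and the toy data are illustrative assumptions, and the abstract does not state the intensity range or evaluation protocol the authors used.

```python
import math


def psnr(reference, synthesized, max_value=1.0):
    """Peak signal-to-noise ratio (dB) between two equally sized images.

    Images are flat lists of intensities; max_value is the assumed
    maximum possible intensity (1.0 for data scaled to [0, 1]).
    """
    if len(reference) != len(synthesized):
        raise ValueError("images must have the same number of voxels")
    # Mean squared error over all voxels.
    mse = sum((r - s) ** 2 for r, s in zip(reference, synthesized)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical intensities, not data from the study.
real_pet = [0.2, 0.5, 0.9, 0.4]
synth_pet = [0.25, 0.45, 0.85, 0.45]
print(round(psnr(real_pet, synth_pet), 2))
```

A higher PSNR means a lower mean squared error relative to the peak intensity, so the two-way synthesis value reported above would indicate a closer match to the real FDG PET than the one-way value, under whatever scaling the authors applied.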
Towards a Holistic Cortical Thickness Descriptor: Heat Kernel-Based Grey Matter Morphology Signatures.
In this paper, we propose a heat kernel based regional shape descriptor that may be capable of better exploiting volumetric morphological information than other available methods, thereby improving statistical power on brain magnetic resonance imaging (MRI) analysis. The mechanism of our analysis is driven by the graph spectrum and the heat kernel theory, to capture the volumetric geometry information in the constructed tetrahedral meshes. In order to capture profound brain grey matter shape changes, we first use the volumetric Laplace-Beltrami operator to determine the point pair correspondence between white-grey matter and CSF-grey matter boundary surfaces by computing the streamlines in a tetrahedral mesh. Secondly, we propose multi-scale grey matter morphology signatures to describe the transition probability by random walk between the point pairs, which reflects the inherent geometric characteristics. Thirdly, a point distribution model is applied to reduce the dimensionality of the grey matter morphology signatures and generate the internal structure features. With the sparse linear discriminant analysis, we select a concise morphology feature set with improved classification accuracies. In our experiments, the proposed work outperformed the cortical thickness features computed by FreeSurfer software in the classification of Alzheimer’s disease and its prodromal stage, i.e., mild cognitive impairment, on publicly available data from the Alzheimer’s Disease Neuroimaging Initiative. The multi-scale and physics based volumetric structure feature may bring stronger statistical power than some traditional methods for MRI-based grey matter morphology analysis.
URL:
Genome-wide association neural networks identify genes linked to family history of Alzheimer’s disease.
Augmenting traditional genome-wide association studies (GWAS) with advanced machine learning algorithms can allow the detection of novel signals in available cohorts. We introduce “genome-wide association neural networks (GWANN)”, a novel approach that uses neural networks (NNs) to perform a gene-level association study with family history of Alzheimer’s disease (AD). In UK Biobank, we defined cases (n = 42 110) as those with AD or family history of AD and sampled an equal number of controls. The data was split into an 80:20 ratio of training and testing samples, and GWANN was trained on the former followed by identifying associated genes using its performance on the latter. Our method identified 18 genes to be associated with family history of AD. APOE, BIN1, SORL1, ADAM10, APH1B, and SPI1 have been identified by previous AD GWAS. Among the 12 new genes, PCDH9, NRG3, ROR1, LINGO2, SMYD3, and LRRC7 have been associated with neurofibrillary tangles or phosphorylated tau in previous studies. Furthermore, there is evidence for differential transcriptomic or proteomic expression between AD and healthy brains for 10 of the 12 new genes. A series of post hoc analyses resulted in a significantly enriched protein-protein interaction network (P-value < 1 × 10^-16), and enrichment of relevant disease and biological pathways such as focal adhesion (P-value = 1 × 10^-4), extracellular matrix organization (P-value = 1 × 10^-4), Hippo signaling (P-value = 7 × 10^-4), Alzheimer’s disease (P-value = 3 × 10^-4), and impaired cognition (P-value = 4 × 10^-3). Applying NNs for GWAS illustrates their potential to complement existing algorithms and methods and enable the discovery of new associations without the need to expand existing cohorts.
URL:
Functional variants identify sex-specific genes and pathways in Alzheimer’s Disease.
The incidence of Alzheimer’s Disease in females is almost double that of males. To search for sex-specific gene associations, we build a machine learning approach focused on functionally impactful coding variants. This method can detect differences between sequenced cases and controls in small cohorts. In the Alzheimer’s Disease Sequencing Project with mixed sexes, this approach identified genes enriched for immune response pathways. After sex-separation, genes become specifically enriched for stress-response pathways in males and cell-cycle pathways in females. These genes improve disease risk prediction in silico and modulate Drosophila neurodegeneration in vivo. Thus, a general approach for machine learning on functionally impactful variants can uncover sex-specific candidates towards diagnostic biomarkers and therapeutic targets.
URL:
Degenerative adversarial neuroimage nets for brain scan simulations: Application in ageing and dementia.
Accurate and realistic simulation of high-dimensional medical images has become an important research area relevant to many AI-enabled healthcare applications. However, current state-of-the-art approaches lack the ability to produce satisfactory high-resolution and accurate subject-specific images. In this work, we present a deep learning framework, namely 4D-Degenerative Adversarial NeuroImage Net (4D-DANI-Net), to generate high-resolution, longitudinal MRI scans that mimic subject-specific neurodegeneration in ageing and dementia. 4D-DANI-Net is a modular framework based on adversarial training and a set of novel spatiotemporal, biologically-informed constraints. To ensure efficient training and overcome memory limitations affecting such high-dimensional problems, we rely on three key technological advances: i) a new 3D training consistency mechanism called Profile Weight Functions (PWFs), ii) a 3D super-resolution module and iii) a transfer learning strategy to fine-tune the system for a given individual. To evaluate our approach, we trained the framework on 9852 T1-weighted MRI scans from 876 participants in the Alzheimer’s Disease Neuroimaging Initiative dataset and held out a separate test set of 1283 MRI scans from 170 participants for quantitative and qualitative assessment of the personalised time series of synthetic images. We performed three evaluations: i) image quality assessment; ii) quantifying the accuracy of regional brain volumes over and above benchmark models; and iii) quantifying visual perception of the synthetic images by medical experts. Overall, both quantitative and qualitative results show that 4D-DANI-Net produces realistic, low-artefact, personalised time series of synthetic T1 MRI that outperforms benchmark models.
URL:
Disease Progression Modelling of Alzheimer’s Disease using Probabilistic Principal Components Analysis.
The recent biological redefinition of Alzheimer’s Disease (AD) has spurred the development of statistical models that relate changes in biomarkers with neurodegeneration and worsening condition linked to AD. The ability to measure such changes may facilitate earlier diagnoses for affected individuals and help in monitoring the evolution of their condition. Amongst such statistical tools, disease progression models (DPMs) are quantitative, data-driven methods that specifically attempt to describe the temporal dynamics of biomarkers relevant to AD. Due to the heterogeneous nature of this disease, with patients of similar age experiencing different AD-related changes, a challenge facing longitudinal mixed-effects-based DPMs is the estimation of patient-realigning time-shifts. These time-shifts are indispensable for meaningful biomarker modelling, but may impact fitting time or vary with missing data in jointly estimated models. In this work, we estimate an individual’s progression through Alzheimer’s disease by combining multiple biomarkers into a single value using a probabilistic formulation of principal components analysis. Our results show that this variable, which summarises AD through observable biomarkers, is remarkably similar to jointly estimated time-shifts when we compute our scores for the baseline visit, on cross-sectional data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI). Reproducing the expected properties of clinical datasets, we confirm that estimated scores are robust to missing data or unavailable biomarkers. In addition to cross-sectional insights, we can model the latent variable as an individual progression score by repeating estimations at follow-up examinations and refining long-term estimates as more data is gathered, which would be ideal in a clinical setting. 
Finally, we verify that our score can be used as a pseudo-temporal scale instead of age to ignore some patient heterogeneity in cohort data and highlight the general trend in expected biomarker evolution in affected individuals.
URL:
Identification of a small set of plasma signalling proteins using neural network for prediction of Alzheimer’s disease.
MOTIVATION: Alzheimer’s disease (AD) is a dementia that gets worse with time resulting in loss of memory and cognitive functions. The life expectancy of AD patients following diagnosis is ~7 years. In 2006, researchers estimated that 0.40% of the world population (range 0.17-0.89%) was afflicted by AD, and that the prevalence rate would be tripled by 2050. Usually, examination of brain tissues is required for definite diagnosis of AD. So, it is crucial to diagnose AD at an early stage via some alternative methods. As the brain controls many functions via releasing signalling proteins through blood, we analyse blood plasma proteins for diagnosis of AD. RESULTS: Here, we use a radial basis function (RBF) network for feature selection, called the feature selection RBF network, to select plasma proteins that can help diagnosis of AD. We have identified a set of plasma proteins, smaller in size than that of a previous study, with comparable prediction accuracy. We have also analysed mild cognitive impairment (MCI) samples with our selected proteins. We have used neural networks and support vector machines as classifiers. The principal component analysis, Sammon projection and heat-map of the selected proteins have been used to demonstrate the proteins’ discriminating power for diagnosis of AD. We have also found a set of plasma signalling proteins that can distinguish incipient AD from MCI at an early stage. Literature survey strongly supports the AD diagnosis capability of the selected plasma proteins.
URL:
Translating amyloid PET of different radiotracers by a deep generative model for interchangeability.
It is challenging to compare amyloid PET images obtained with different radiotracers. Here, we introduce a new approach to improve the interchangeability of amyloid PET acquired with different radiotracers through image-level translation. Deep generative networks were developed using unpaired PET datasets, consisting of 203 [11C]PIB and 850 [18F]florbetapir brain PET images. Using 15 paired PET datasets, the standardized uptake value ratio (SUVR) values obtained from pseudo-PIB or pseudo-florbetapir PET images translated using the generative networks were compared to those obtained from the original images. The generated amyloid PET images showed similar distribution patterns with original amyloid PET of different radiotracers. The SUVR obtained from the original [18F]florbetapir PET was lower than that obtained from the original [11C]PIB PET. The translated amyloid PET images reduced the difference in SUVR. The SUVR obtained from the pseudo-PIB PET images generated from [18F]florbetapir PET showed a good agreement with that of the original PIB PET (ICC = 0.87 for global SUVR). The SUVR obtained from the pseudo-florbetapir PET also showed a good agreement with that of the original [18F]florbetapir PET (ICC = 0.85 for global SUVR). The ICC values between the original and generated PET images were higher than those between original [11C]PIB and [18F]florbetapir images (ICC = 0.65 for global SUVR). Our approach provides the image-level translation of amyloid PET images obtained using different radiotracers. It may facilitate the clinical studies designed with variable amyloid PET images due to long-term clinical follow-up as well as multicenter trials by enabling the translation of different types of amyloid PET.
URL:
Nonlinear dimensionality reduction combining MR imaging with non-imaging information.
We propose a framework for the extraction of biomarkers from low-dimensional manifolds representing inter-subject brain variation. Manifold coordinates of each image capture information about structural shape and appearance and, when a phenotype exists, about the subject’s clinical state. Our framework incorporates subject meta-information into the manifold learning step. Apart from gender and age, information such as genotype or a derived biomarker is often available in clinical studies and can inform the classification of a query subject. Such information, whether discrete or continuous, is used as an additional input to manifold learning, extending the Laplacian Eigenmap objective function and enriching a similarity measure derived from pairwise image similarities. The biomarkers identified with the proposed method are data-driven in contrast to a priori defined biomarkers derived from, e.g., manual or automated segmentations. They form a unified representation of both the imaging and non-imaging measurements, providing a natural use for data analysis and visualization. We test the method to classify subjects with Alzheimer’s Disease (AD), mild cognitive impairment (MCI) and healthy controls enrolled in the ADNI study. Non-imaging metadata used are ApoE genotype, a risk factor associated with AD, and the CSF-concentration of Abeta(1-42), an established biomarker for AD. In addition, we use hippocampal volume as a derived imaging-biomarker to enrich the learned manifold. Our classification results compare favorably to what has been reported in a recent meta-analysis using established neuroimaging methods on the same database.
URL:
Transfer learning for cognitive reserve quantification.
Cognitive reserve (CR) has been introduced to explain individual differences in susceptibility to cognitive or functional impairment in the presence of age or pathology. We developed a deep learning model to quantify the CR as residual variance in memory performance using the Structural Magnetic Resonance Imaging (sMRI) data from a lifespan healthy cohort. The generalizability of the sMRI-based deep learning model was tested in two independent healthy and Alzheimer’s cohorts using a transfer learning framework. Structural MRIs were collected from three cohorts: 495 healthy adults (age: 20-80) from RANN, 620 healthy adults (age: 36-100) from lifespan Human Connectome Project Aging (HCPA), and 941 adults (age: 55-92) from Alzheimer’s Disease Neuroimaging Initiative (ADNI). Region of interest (ROI)-specific cortical thickness and volume measures were extracted using the Desikan-Killiany Atlas. CR was quantified by residuals, obtained by subtracting the predicted memory from the true memory. Cascade neural network (CNN) models were trained on the RANN dataset for memory prediction. Transfer learning was applied to transfer the T1 imaging-based model from the source domain (RANN) to the target domains (HCPA or ADNI). The CNN model trained on the RANN dataset exhibited strong linear correlation between true and predicted memory based on the T1 cortical thickness and volume predictors. In addition, the model generated from healthy lifespan data (RANN) was able to generalize to independent healthy lifespan data (HCPA) and older demented participants (ADNI) across different scanner types. The estimated CR was correlated with CR proxies such as education and IQ across all three datasets. The current findings suggest that the transfer learning approach is an effective way to generalize the residual-based CR estimation. 
It is applicable to various diseases and may flexibly incorporate different imaging modalities such as fMRI and PET, making it a promising tool for scientific and clinical purposes.
URL:
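The residual-based definition of cognitive reserve in the entry above (true memory minus predicted memory) can be illustrated with a small sketch. Note the substitutions: the study uses cascade neural networks on many ROI-level thickness and volume measures with transfer learning across cohorts, whereas this sketch swaps in a one-predictor ordinary least-squares fit evaluated on the same toy sample. The function names and all numbers are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cognitive_reserve(thickness, memory):
    """Residual CR estimate: true memory minus memory predicted from imaging."""
    slope, intercept = fit_line(thickness, memory)
    predicted = [slope * t + intercept for t in thickness]
    return [m - p for m, p in zip(memory, predicted)]


# Hypothetical values: mean cortical thickness (mm) and memory scores.
thickness = [2.1, 2.4, 2.6, 2.9, 3.1]
memory = [40.0, 47.0, 46.0, 55.0, 54.0]
cr = cognitive_reserve(thickness, memory)
for t, r in zip(thickness, cr):
    print(f"thickness={t:.1f}  CR residual={r:+.2f}")
```

Under this definition a positive residual marks a participant who remembers better than their brain structure alone would predict, which is the sense in which the residual is read as reserve.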
Assessing clinical progression from subjective cognitive decline to mild cognitive impairment with incomplete multi-modal neuroimages.
Accurately assessing clinical progression from subjective cognitive decline (SCD) to mild cognitive impairment (MCI) is crucial for early intervention of pathological cognitive decline. Multi-modal neuroimaging data, such as T1-weighted magnetic resonance imaging (MRI) and positron emission tomography (PET), help provide objective and supplementary disease biomarkers for computer-aided diagnosis of MCI. However, there are few studies dedicated to SCD progression prediction since subjects usually lack one or more imaging modalities. Besides, one usually has a limited number (e.g., tens) of SCD subjects, negatively affecting model robustness. To this end, we propose a Joint neuroimage Synthesis and Representation Learning (JSRL) framework for SCD conversion prediction using incomplete multi-modal neuroimages. The JSRL contains two components: 1) a generative adversarial network to synthesize missing images and generate multi-modal features, and 2) a classification network to fuse multi-modal features for SCD conversion prediction. The two components are incorporated into a joint learning framework by sharing the same features, encouraging effective fusion of multi-modal features for accurate prediction. A transfer learning strategy is employed in the proposed framework by leveraging a model trained on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) with MRI and fluorodeoxyglucose PET from 863 subjects to both the Chinese Longitudinal Aging Study (CLAS) with only MRI from 76 SCD subjects and the Australian Imaging, Biomarkers and Lifestyle (AIBL) with MRI from 235 subjects. Experimental results suggest that the proposed JSRL yields superior performance in SCD and MCI conversion prediction and cross-database neuroimage synthesis, compared with several state-of-the-art methods.
URL:
Gaussian process classification of Alzheimer’s disease and mild cognitive impairment from resting-state fMRI.
Multivariate pattern analysis and statistical machine learning techniques are attracting increasing interest from the neuroimaging community. Researchers and clinicians are also increasingly interested in the study of functional-connectivity patterns of brains at rest and how these relations might change in conditions like Alzheimer’s disease or clinical depression. In this study we investigate the efficacy of a specific multivariate statistical machine learning technique to perform patient stratification from functional-connectivity patterns of brains at rest. Whilst the majority of previous approaches to this problem have employed support vector machines (SVMs), we investigate the performance of Bayesian Gaussian process logistic regression (GP-LR) models with linear and non-linear covariance functions. GP-LR models can be interpreted as a Bayesian probabilistic analogue to kernel SVM classifiers. However, GP-LR methods confer a number of benefits over kernel SVMs. Whilst SVMs only return a binary class label prediction, GP-LR, being a probabilistic model, provides a principled estimate of the probability of class membership. Class probability estimates are a measure of the confidence the model has in its predictions; such a confidence score may be extremely useful in the clinical setting. Additionally, if misclassification costs are not symmetric, thresholds can be set to achieve either strong specificity or sensitivity scores. Since GP-LR models are Bayesian, computationally expensive cross-validation hyper-parameter grid-search methods can be avoided. We apply these methods to a sample of 77 subjects; 27 with a diagnosis of probable AD, 50 with a diagnosis of a-MCI and a control sample of 39. All subjects underwent an MRI examination at 3T to obtain a 7-minute, 20-second resting state scan. 
Our results support the hypothesis that GP-LR models can be effective at performing patient stratification: the implemented model achieves 75% accuracy disambiguating healthy subjects from subjects with amnesic mild cognitive impairment and 97% accuracy disambiguating amnesic mild cognitive impairment subjects from those with Alzheimer’s disease. Accuracies are estimated using a held-out test set. Both results are significant at the 1% level.
URL:
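The point in the entry above about asymmetric misclassification costs can be made concrete with a short sketch. It does not implement GP-LR; it only shows how class probabilities from any probabilistic classifier could be thresholded using the standard expected-cost rule (predict positive when p exceeds c_FP / (c_FP + c_FN)). The function names, cost values, and probabilities are illustrative assumptions, not quantities from the study.

```python
def cost_sensitive_threshold(cost_false_positive, cost_false_negative):
    """Probability threshold that minimises expected cost of a binary decision.

    Predicting positive is cheaper in expectation when
    p * cost_false_negative > (1 - p) * cost_false_positive,
    i.e. when p > c_FP / (c_FP + c_FN).
    """
    return cost_false_positive / (cost_false_positive + cost_false_negative)


def classify(probabilities, threshold):
    """Label 1 (patient) when the predicted probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


# Hypothetical P(patient) outputs from a probabilistic classifier.
probs = [0.15, 0.35, 0.55, 0.80]

# Equal costs give the usual 0.5 cut-off.
symmetric = classify(probs, cost_sensitive_threshold(1.0, 1.0))
# Missing a patient assumed four times as costly: threshold drops to 0.2.
sensitive = classify(probs, cost_sensitive_threshold(1.0, 4.0))

print(symmetric)
print(sensitive)
```

Lowering the threshold in this way trades specificity for sensitivity, which is only possible because the classifier returns probabilities rather than hard labels.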
Multi-modal multi-task learning for joint prediction of multiple regression and classification variables in Alzheimer’s disease.
Many machine learning and pattern classification methods have been applied to the diagnosis of Alzheimer’s disease (AD) and its prodromal stage, i.e., mild cognitive impairment (MCI). Recently, rather than predicting categorical variables as in classification, several pattern regression methods have also been used to estimate continuous clinical variables from brain images. However, most existing regression methods focus on estimating multiple clinical variables separately and thus cannot utilize the intrinsic useful correlation information among different clinical variables. On the other hand, in those regression methods, only a single modality of data (usually only the structural MRI) is often used, without considering the complementary information that can be provided by different modalities. In this paper, we propose a general methodology, namely multi-modal multi-task (M3T) learning, to jointly predict multiple variables from multi-modal data. Here, the variables include not only the clinical variables used for regression but also the categorical variable used for classification, with different tasks corresponding to prediction of different variables. Specifically, our method contains two key components, i.e., (1) a multi-task feature selection which selects the common subset of relevant features for multiple variables from each modality, and (2) a multi-modal support vector machine which fuses the above-selected features from all modalities to predict multiple (regression and classification) variables. To validate our method, we perform two sets of experiments on ADNI baseline MRI, FDG-PET, and cerebrospinal fluid (CSF) data from 45 AD patients, 91 MCI patients, and 50 healthy controls (HC). 
In the first set of experiments, we estimate two clinical variables such as Mini Mental State Examination (MMSE) and Alzheimer’s Disease Assessment Scale-Cognitive Subscale (ADAS-Cog), as well as one categorical variable (with value of ‘AD’, ‘MCI’ or ‘HC’), from the baseline MRI, FDG-PET, and CSF data. In the second set of experiments, we predict the 2-year changes of MMSE and ADAS-Cog scores and also the conversion of MCI to AD from the baseline MRI, FDG-PET, and CSF data. The results on both sets of experiments demonstrate that our proposed M3T learning scheme can achieve better performance on both regression and classification tasks than the conventional learning methods.
URL:
A robust and interpretable machine learning approach using multimodal biological data to predict future pathological tau accumulation.
The early stages of Alzheimer’s disease (AD) involve interactions between multiple pathophysiological processes. Although these processes are well studied, we still lack robust tools to predict individualised trajectories of disease progression. Here, we employ a robust and interpretable machine learning approach to combine multimodal biological data and predict future pathological tau accumulation. In particular, we use machine learning to quantify interactions between key pathological markers (beta-amyloid, medial temporal lobe atrophy, tau and APOE 4) at mildly impaired and asymptomatic stages of AD. Using baseline non-tau markers we derive a prognostic index that: (a) stratifies patients based on future pathological tau accumulation, (b) predicts individualised regional future rate of tau accumulation, and (c) translates predictions from deep phenotyping patient cohorts to cognitively normal individuals. Our results propose a robust approach for fine scale stratification and prognostication with translation impact for clinical trial design targeting the earliest stages of AD.
URL:
The relevance voxel machine (RVoxM): a self-tuning Bayesian model for informative image-based prediction.
This paper presents the relevance voxel machine (RVoxM), a dedicated Bayesian model for making predictions based on medical imaging data. In contrast to the generic machine learning algorithms that have often been used for this purpose, the method is designed to utilize a small number of spatially clustered sets of voxels that are particularly suited for clinical interpretation. RVoxM automatically tunes all its free parameters during the training phase, and offers the additional advantage of producing probabilistic prediction outcomes. We demonstrate RVoxM as a regression model by predicting age from volumetric gray matter segmentations, and as a classification model by distinguishing patients with Alzheimer’s disease from healthy controls using surface-based cortical thickness data. Our results indicate that RVoxM yields biologically meaningful models, while providing state-of-the-art predictive accuracy.
URL:
A multimodal deep learning model to infer cell-type-specific functional gene networks.
BACKGROUND: Functional gene networks (FGNs) capture functional relationships among genes that vary across tissues and cell types. Construction of cell-type-specific FGNs enables the understanding of cell-type-specific functional gene relationships and insights into genetic mechanisms of human diseases in disease-relevant cell types. However, most existing FGNs were developed without consideration of specific cell types within tissues. RESULTS: In this study, we created a multimodal deep learning model (MDLCN) to predict cell-type-specific FGNs in the human brain by integrating single-nuclei gene expression data with global protein interaction networks. We systematically evaluated the prediction performance of the MDLCN and showed its superior performance compared to two baseline models (boosting tree and convolutional neural network). Based on the predicted cell-type-specific FGNs, we observed that cell-type marker genes had a higher level of hubness than non-marker genes in their corresponding cell type. Furthermore, we showed that risk genes underlying autism and Alzheimer’s disease were more strongly connected in disease-relevant cell types, supporting the cellular context of predicted cell-type-specific FGNs. CONCLUSIONS: Our study proposes a powerful deep learning approach (MDLCN) to predict FGNs underlying a diverse set of cell types in human brain. The MDLCN model enhances prediction accuracy of cell-type-specific FGNs compared to single modality convolutional neural network (CNN) and boosting tree models, as shown by higher areas under both receiver operating characteristic (ROC) and precision-recall curves for different levels of independent test datasets. The predicted FGNs also show evidence for the cellular context and distinct topological features (i.e. higher hubness and topological score) of cell-type marker genes. Moreover, we observed stronger modularity among disease-associated risk genes in FGNs of disease-relevant cell types. 
For example, the strength of connectivity among autism risk genes was stronger in neurons, but risk genes underlying Alzheimer’s disease were more connected in microglia.
URL:
SMILE: systems metabolomics using interpretable learning and evolution.
BACKGROUND: The direct link between metabolism and cell and organism phenotype in health and disease makes metabolomics, a high throughput study of small molecular metabolites, an essential methodology for understanding and diagnosing disease development and progression. Machine learning methods have seen increasing adoption in metabolomics thanks to their powerful prediction abilities. However, the “black-box” nature of many machine learning models remains a major challenge for wide acceptance and utility as it makes the interpretation of the decision process difficult. This challenge is particularly prominent in biomedical research, where understanding of the underlying decision-making mechanism is essential for ensuring safety and gaining new knowledge. RESULTS: In this article, we propose a novel computational framework, Systems Metabolomics using Interpretable Learning and Evolution (SMILE), for supervised metabolomics data analysis. Our methodology uses an evolutionary algorithm to learn interpretable predictive models and to identify the most influential metabolites and their interactions in association with disease. Moreover, we have developed a web application with a graphical user interface that can be used for easy analysis, interpretation and visualization of the results. Performance of the method and utilization of the web interface are shown using metabolomics data for Alzheimer’s disease. CONCLUSIONS: SMILE was able to identify several metabolites influential in AD and to provide interpretable predictive models that can be further used for a better understanding of the metabolic background of AD. SMILE addresses the emerging issue of interpretability and explainability in machine learning, and contributes to more transparent and powerful applications of machine learning in bioinformatics.
URL:
Multimodal deep learning for Alzheimer’s disease dementia assessment.
Worldwide, there are nearly 10 million new cases of dementia annually, of which Alzheimer’s disease (AD) is the most common. New measures are needed to improve the diagnosis of individuals with cognitive impairment due to various etiologies. Here, we report a deep learning framework that accomplishes multiple diagnostic steps in successive fashion to identify persons with normal cognition (NC), mild cognitive impairment (MCI), AD, and non-AD dementias (nADD). We demonstrate a range of models capable of accepting flexible combinations of routinely collected clinical information, including demographics, medical history, neuropsychological testing, neuroimaging, and functional assessments. We then show that these frameworks compare favorably with the diagnostic accuracy of practicing neurologists and neuroradiologists. Lastly, we apply interpretability methods in computer vision to show that disease-specific patterns detected by our models track distinct patterns of degenerative changes throughout the brain and correspond closely with the presence of neuropathological lesions on autopsy. Our work demonstrates methodologies for validating computational predictions with established standards of medical diagnosis.
URL:
Identifying sex-specific risk architectures for predicting amyloid deposition using neural networks.
In older adults without dementia, White Matter Hyperintensities (WMH) in MRI have been shown to be highly associated with cerebral amyloid deposition, measured by the Pittsburgh compound B (PiB) PET. However, the relation to age, sex, and education in explaining this association is not well understood. We use the voxel counts of regional WMH, age, one-hot encoded sex, and education to predict the regional PiB using a multilayer perceptron with only rectilinear activations using mean squared error. We then develop a novel, robust metric to understand the relevance of each input variable for prediction. Our observations indicate that sex is the most relevant predictor of PiB and that WMH is not relevant for prediction. These results indicate that there is a sex-specific risk architecture for Abeta deposition.
URL:
Predicting sporadic Alzheimer’s disease progression via inherited Alzheimer’s disease-informed machine-learning.
INTRODUCTION: Developing cross-validated multi-biomarker models for the prediction of the rate of cognitive decline in Alzheimer’s disease (AD) is a critical yet unmet clinical challenge. METHODS: We applied support vector regression to AD biomarkers derived from cerebrospinal fluid, structural magnetic resonance imaging (MRI), amyloid-PET and fluorodeoxyglucose positron-emission tomography (FDG-PET) to predict rates of cognitive decline. Prediction models were trained in autosomal-dominant Alzheimer’s disease (ADAD, n = 121) and subsequently cross-validated in sporadic prodromal AD (n = 216). The sample size needed to detect treatment effects when using model-based risk enrichment was estimated. RESULTS: A model combining all biomarker modalities and established in ADAD predicted the 4-year rate of decline in global cognition (R² = 24%) and memory (R² = 25%) in sporadic AD. Model-based risk-enrichment reduced the sample size required for detecting simulated intervention effects by 50%-75%. DISCUSSION: Our independently validated machine-learning model predicted cognitive decline in sporadic prodromal AD and may substantially reduce sample size needed in clinical trials in AD.
URL:
Explainable Anatomical Shape Analysis Through Deep Hierarchical Generative Models.
Quantification of anatomical shape changes currently relies on scalar global indexes which are largely insensitive to regional or asymmetric modifications. Accurate assessment of pathology-driven anatomical remodeling is a crucial step for the diagnosis and treatment of many conditions. Deep learning approaches have recently achieved wide success in the analysis of medical images, but they lack interpretability in the feature extraction and decision processes. In this work, we propose a new interpretable deep learning model for shape analysis. In particular, we exploit deep generative networks to model a population of anatomical segmentations through a hierarchy of conditional latent variables. At the highest level of this hierarchy, a two-dimensional latent space is simultaneously optimised to discriminate distinct clinical conditions, enabling the direct visualisation of the classification space. Moreover, the anatomical variability encoded by this discriminative latent space can be visualised in the segmentation space thanks to the generative properties of the model, making the classification task transparent. This approach yielded high accuracy in the categorisation of healthy and remodelled left ventricles when tested on unseen segmentations from our own multi-centre dataset as well as in an external validation set, and on hippocampi from healthy controls and patients with Alzheimer’s disease when tested on ADNI data. More importantly, it enabled the visualisation in three-dimensions of both global and regional anatomical features which better discriminate between the conditions under exam. The proposed approach scales effectively to large populations, facilitating high-throughput analysis of normal anatomy and pathology in large-scale studies of volumetric imaging.
URL:
Automated detection of mild cognitive impairment and dementia from voice recordings: A natural language processing approach.
INTRODUCTION: Automated computational assessment of neuropsychological tests would enable widespread, cost-effective screening for dementia. METHODS: A novel natural language processing approach is developed and validated to identify different stages of dementia based on automated transcription of digital voice recordings of subjects’ neuropsychological tests conducted by the Framingham Heart Study (n = 1084). Transcribed sentences from the test were encoded into quantitative data and several models were trained and tested using these data and the participants’ demographic characteristics. RESULTS: Average area under the curve (AUC) on the held-out test data reached 92.6%, 88.0%, and 74.4% for differentiating Normal cognition from Dementia, Normal or Mild Cognitive Impairment (MCI) from Dementia, and Normal from MCI, respectively. DISCUSSION: The proposed approach offers a fully automated identification of MCI and dementia based on a recorded neuropsychological test, providing an opportunity to develop a remote screening tool that could be adapted easily to any language.
URL:
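The AUC figures reported in the voice-recording abstract above can be read as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case. The following is a minimal sketch of that pairwise definition, not the authors' evaluation code; the function name `auc_from_scores` and the toy labels and scores are assumptions made for illustration.

```python
def auc_from_scores(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: sequence of 0/1 class labels (1 = positive, e.g. dementia).
    scores: sequence of model scores, higher meaning "more likely positive".
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the study.
    toy_labels = [1, 1, 0, 0, 1]
    toy_scores = [0.9, 0.4, 0.5, 0.2, 0.7]
    print(auc_from_scores(toy_labels, toy_scores))
```

The quadratic loop keeps the definition explicit; for large cohorts a rank-based computation gives the same value more efficiently.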
Deep multiview learning to identify imaging-driven subtypes in mild cognitive impairment.
BACKGROUND: In Alzheimer’s Disease (AD) research, multimodal imaging analysis can unveil complementary information from multiple imaging modalities and further our understanding of the disease. One application is to discover disease subtypes using unsupervised clustering. However, existing clustering methods are often applied to input features directly, and could suffer from the curse of dimensionality with high-dimensional multimodal data. The purpose of our study is to identify multimodal imaging-driven subtypes in Mild Cognitive Impairment (MCI) participants using a multiview learning framework based on Deep Generalized Canonical Correlation Analysis (DGCCA), to learn a shared low-dimensional latent representation from 3 neuroimaging modalities. RESULTS: DGCCA applies non-linear transformation to input views using neural networks and is able to learn correlated embeddings with low dimensions that capture more variance than its linear counterpart, generalized CCA (GCCA). We designed experiments to compare DGCCA embeddings with single modality features and GCCA embeddings by generating 2 subtypes from each feature set using unsupervised clustering. In our validation studies, we found that amyloid PET imaging has the most discriminative features compared with structural MRI and FDG PET, which DGCCA learns from but GCCA does not. DGCCA subtypes show differential measures in 5 cognitive assessments, 6 brain volume measures, and conversion to AD patterns. In addition, DGCCA MCI subtypes confirmed AD genetic markers with strong signals that the existing late MCI group did not identify. CONCLUSION: Overall, DGCCA is able to learn effective low dimensional embeddings from multimodal data by learning non-linear projections. MCI subtypes generated from DGCCA embeddings are different from existing early and late MCI groups and show most similarity with those identified by amyloid PET features. In our validation studies, DGCCA subtypes show distinct patterns in cognitive measures and brain volumes, and are able to identify AD genetic markers. These findings indicate the promise of the imaging-driven subtypes and their power in revealing disease structures beyond early and late stage MCI.
URL:
Cortical surface reconstruction via unified Reeb analysis of geometric and topological outliers in magnetic resonance images.
In this paper we present a novel system for the automated reconstruction of cortical surfaces from T1-weighted magnetic resonance images. At the core of our system is a unified Reeb analysis framework for the detection and removal of geometric and topological outliers on tissue boundaries. Using intrinsic Reeb analysis, our system can pinpoint the location of spurious branches and topological outliers, and correct them with localized filtering using information from both image intensity distributions and geometric regularity. In this system, we have also developed enhanced tissue classification with Hessian features for improved robustness to image inhomogeneity, and adaptive interpolation to achieve sub-voxel accuracy in reconstructed surfaces. By integrating these novel developments, we have a system that can automatically reconstruct cortical surfaces with improved quality and dramatically reduced computational cost as compared with the popular FreeSurfer software. In our experiments, we demonstrate on 40 simulated MR images and the MR images of 200 subjects from two databases: the Alzheimer’s Disease Neuroimaging Initiative (ADNI) and International Consortium of Brain Mapping (ICBM), the robustness of our method in large scale studies. In comparisons with FreeSurfer, we show that our system is able to generate surfaces that better represent cortical anatomy and produce thickness features with higher statistical power in population studies.
URL:
Deep residual inception encoder-decoder network for amyloid PET harmonization.
INTRODUCTION: Multiple positron emission tomography (PET) tracers are available for amyloid imaging, posing a significant challenge to consensus interpretation and quantitative analysis. We accordingly developed and validated a deep learning model as a harmonization strategy. METHOD: A Residual Inception Encoder-Decoder Neural Network was developed to harmonize images between amyloid PET image pairs made with Pittsburgh Compound-B and florbetapir tracers. The model was trained using a dataset with 92 subjects with 10-fold cross validation and its generalizability was further examined using an independent external dataset of 46 subjects. RESULTS: Significantly stronger between-tracer correlations (P < .001) were observed after harmonization for both global amyloid burden indices and voxel-wise measurements in the training cohort and the external testing cohort. DISCUSSION: We proposed and validated a novel encoder-decoder based deep model to harmonize amyloid PET imaging data from different tracers. Further investigation is ongoing to improve the model and apply to additional tracers.
URL:
A longitudinal model for functional connectivity networks using resting-state fMRI.
Many neuroimaging studies collect functional magnetic resonance imaging (fMRI) data in a longitudinal manner. However, the current fMRI literature lacks a general framework for analyzing functional connectivity (FC) networks in fMRI data obtained from a longitudinal study. In this work, we build a novel longitudinal FC model using a variance components approach. First, for all subjects’ visits, we account for the autocorrelation inherent in the fMRI time series data using a non-parametric technique. Second, we use a generalized least squares approach to estimate 1) the within-subject variance component shared across the population, 2) the baseline FC strength, and 3) the FC’s longitudinal trend. Our novel method for longitudinal FC networks seeks to account for the within-subject dependence across multiple visits, the variability due to the subjects being sampled from a population, and the autocorrelation present in fMRI time series data, while restricting the number of parameters in order to make the method computationally feasible and stable. We develop a permutation testing procedure to draw valid inference on group differences in the baseline FC network and change in FC over longitudinal time between a set of patients and a comparable set of controls. To examine performance, we run a series of simulations and apply the model to longitudinal fMRI data collected from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database. Overall, we found no difference in the global FC network between Alzheimer’s disease patients and healthy controls, but did find differing local aging patterns in the FC between the left hippocampus and the posterior cingulate cortex.
URL:
Random forests on Hadoop for genome-wide association studies of multivariate neuroimaging phenotypes.
MOTIVATION: Multivariate quantitative traits arise naturally in recent neuroimaging genetics studies, in which both structural and functional variability of the human brain is measured non-invasively through techniques such as magnetic resonance imaging (MRI). There is growing interest in detecting genetic variants associated with such multivariate traits, especially in genome-wide studies. Random forest (RF) classifiers, which are ensembles of decision trees, are amongst the best performing machine learning algorithms and have been successfully employed for the prioritisation of genetic variants in case-control studies. RFs can also be applied to produce gene rankings in association studies with multivariate quantitative traits, and to estimate genetic similarity measures that are predictive of the trait. However, in studies involving hundreds of thousands of SNPs and high-dimensional traits, a very large ensemble of trees must be inferred from the data in order to obtain reliable rankings, which makes the application of these algorithms computationally prohibitive. RESULTS: We have developed a parallel version of the RF algorithm for regression and genetic similarity learning tasks in large-scale population genetic association studies involving multivariate traits, called PaRFR (Parallel Random Forest Regression). Our implementation takes advantage of the MapReduce programming model and is deployed on Hadoop, an open-source software framework that supports data-intensive distributed applications. Notable speed-ups are obtained by introducing a distance-based criterion for node splitting in the tree estimation process. PaRFR has been applied to a genome-wide association study on Alzheimer’s disease (AD) in which the quantitative trait consists of a high-dimensional neuroimaging phenotype describing longitudinal changes in the human brain structure. PaRFR provides a ranking of SNPs associated with this trait, and produces pair-wise measures of genetic proximity that can be directly compared to pair-wise measures of phenotypic proximity. Several known AD-related variants have been identified, including APOE4 and TOMM40. We also present experimental evidence supporting the hypothesis of a linear relationship between the number of top-ranked mutated states, or frequent mutation patterns, and an indicator of disease severity. AVAILABILITY: The Java codes are freely available at http://www2.imperial.ac.uk/~gmontana.
URL: http://www2.imperial.ac.uk/~gmontana
A network-driven approach for genome-wide association mapping.
MOTIVATION: It remains a challenge to detect associations between genotypes and phenotypes because of insufficient sample sizes and complex underlying mechanisms involved in associations. Fortunately, it is becoming more feasible to obtain gene expression data in addition to genotypes and phenotypes, giving us new opportunities to detect true genotype-phenotype associations while unveiling their association mechanisms. RESULTS: In this article, we propose a novel method, NETAM, that accurately detects associations between SNPs and phenotypes, as well as gene traits involved in such associations. We take a network-driven approach: NETAM first constructs an association network, where nodes represent SNPs, gene traits or phenotypes, and edges represent the strength of association between two nodes. NETAM assigns a score to each path from an SNP to a phenotype, and then identifies significant paths based on the scores. In our simulation study, we show that NETAM finds significantly more phenotype-associated SNPs than traditional genotype-phenotype association analysis under false positive control, taking advantage of gene expression data. Furthermore, we applied NETAM on late-onset Alzheimer’s disease data and identified 477 significant path associations, among which we analyzed paths related to beta-amyloid, estrogen, and nicotine pathways. We also provide hypothetical biological pathways to explain our findings. AVAILABILITY AND IMPLEMENTATION: Software is available at http://www.sailing.cs.cmu.edu/ CONTACT: epxing@cs.cmu.edu.
URL: http://www.sailing.cs.cmu.edu/
Bayesian GWAS with Structured and Non-Local Priors.
MOTIVATION: The flexibility of a Bayesian framework is promising for GWAS, but current approaches can benefit from more informative prior models. We introduce a novel Bayesian approach to GWAS, called Structured and Non-Local Priors (SNLPs) GWAS, that improves over existing methods in two important ways. First, we describe a model that allows for a marker’s gene-parent membership and other characteristics to influence its probability of association with an outcome. Second, we describe a non-local alternative model for differential minor allele rates at each marker, in which the null and alternative hypotheses have no common support. RESULTS: We employ a non-parametric model that allows for clustering of the genes in tandem with a regression model for marker-level covariates, and demonstrate how incorporating these additional characteristics can improve power. We further demonstrate that our non-local alternative model gives symmetric rates of convergence for the null and alternative hypotheses, whereas commonly used local alternative models have asymptotic rates that favor the alternative hypothesis over the null. We demonstrate the robustness and flexibility of our structured and non-local model for different data generating scenarios and signal-to-noise ratios. We apply our Bayesian GWAS method to single nucleotide polymorphisms data collected from a pool of Alzheimer’s disease and cognitively normal patients from the Alzheimer’s Disease Neuroimaging Initiative. AVAILABILITY AND IMPLEMENTATION: R code to perform the SNLPs method is available at https://github.com/lockEF/BayesianScreening.
URL: https://github.com/lockEF/BayesianScreening
BEATRICE: Bayesian Fine-mapping from Summary Data using Deep Variational Inference.
MOTIVATION: We introduce a novel framework BEATRICE to identify putative causal variants from GWAS statistics. Identifying causal variants is challenging due to their sparsity and high correlation in the nearby regions. To account for these challenges, we rely on a hierarchical Bayesian model that imposes a binary concrete prior on the set of causal variants. We derive a variational algorithm for this fine-mapping problem by minimizing the KL divergence between an approximate density and the posterior probability distribution of the causal configurations. Correspondingly, we use a deep neural network as an inference machine to estimate the parameters of our proposal distribution. Our stochastic optimization procedure allows us to sample from the space of causal configurations, which we use to compute the posterior inclusion probabilities and determine credible sets for each causal variant. We conduct a detailed simulation study to quantify the performance of our framework against two state-of-the-art baseline methods across different numbers of causal variants and noise paradigms, as defined by the relative genetic contributions of causal and non-causal variants. RESULTS: We demonstrate that BEATRICE achieves uniformly better coverage with comparable power and set sizes, and that the performance gain increases with the number of causal variants. We also show the efficacy of BEATRICE in finding causal variants from the GWAS study of Alzheimer’s disease. In comparison to the baselines, only BEATRICE can successfully find the APOE epsilon2 allele, a commonly associated variant of Alzheimer’s. AVAILABILITY: BEATRICE is available for download at https://github.com/sayangsep/Beatrice-Finemapping. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
URL: https://github.com/sayangsep/Beatrice-Finemapping
Explainable deep neural networks for predicting sample phenotypes from single-cell transcriptomics.
Recent advances in single-cell RNA-Sequencing (scRNA-Seq) technologies have revolutionized our ability to gather molecular insights into different phenotypes at the level of individual cells. The analysis of the resulting data poses significant challenges, and proper statistical methods are required to analyze and extract information from scRNA-Seq datasets. Sample classification based on gene expression data has proven effective and valuable for precision medicine applications. However, standard classification schemas are often not suitable for scRNA-Seq due to their unique characteristics, and new algorithms are required to effectively analyze and classify samples at the single-cell level. Furthermore, existing methods for this purpose have limitations in their usability. Those reasons motivated us to develop singleDeep, an end-to-end pipeline that streamlines the analysis of scRNA-Seq data training deep neural networks, enabling robust prediction and characterization of sample phenotypes. We used singleDeep to make predictions on scRNA-Seq datasets from different conditions, including systemic lupus erythematosus, Alzheimer’s disease and coronavirus disease 2019. Our results demonstrate strong diagnostic performance, validated both internally and externally. Moreover, singleDeep outperformed traditional machine learning methods and alternative single-cell approaches. In addition to prediction accuracy, singleDeep provides valuable insights into cell types and gene importance estimation for phenotypic characterization. This functionality provided additional and valuable information in our use cases. For instance, we corroborated that some interferon signature genes are consistently relevant for autoimmunity across all immune cell types in lupus. On the other hand, we discovered that genes linked to dementia have relevant roles in specific brain cell populations, such as APOE in astrocytes.
URL:
FISH Amyloid - a new method for finding amyloidogenic segments in proteins based on site specific co-occurrence of aminoacids.
BACKGROUND: Amyloids are proteins capable of forming fibrils whose intramolecular contact sites assume a densely packed zipper pattern. Their oligomers can underlie serious diseases, e.g. Alzheimer’s and Parkinson’s diseases. Recent studies show that short segments of amino acids can be responsible for amyloidogenic properties of a protein. A few hundred such peptides have been experimentally found, but experimental testing of all candidates is currently not feasible. Here we propose an original machine learning method for classification of amino acid sequences, based on discovering a segment with a discriminative pattern of site-specific co-occurrences between sequence elements. The pattern is based on the positions of residues with correlated occurrence over a sliding window of a specified length. The algorithm first recognizes the most relevant training segment in each positive training instance. Then the classification is based on maximal distances between the co-occurrence matrix of the relevant segments in positive training sequences and the matrix from negative training segments. The method was applied for studying sequences of amino acids with regard to their amyloidogenic properties. RESULTS: Our method was first trained on available datasets of hexapeptides with the amyloidogenic classification, using 5 or 6-residue sliding windows. Depending on the choice of training and testing datasets, the area under the ROC curve reached up to 0.80 for experimental, and 0.95 for computationally generated (with 3D profile method) datasets. Importantly, the results on 5-residue segments were not significantly worse, although the classification required that the algorithm first recognize the most relevant training segments. Datasets of long sequences, such as the sup35 prion and a few other amyloid proteins, were used to test the method and gave encouraging results. Our web tool FISH Amyloid, trained on all available experimental data 4-10 residues long, offers prediction of amyloidogenic segments in protein sequences. CONCLUSIONS: We proposed a new original classification method which recognizes co-occurrence patterns in sequences. The method reveals the characteristic classification pattern of the data and finds the segments where its scoring is the strongest, also in long training sequences. Applied to the problem of amyloidogenic segments recognition, it showed a good potential for classification problems in bioinformatics.
URL:
DeepPerVar: a multi-modal deep learning framework for functional interpretation of genetic variants in personal genome.
MOTIVATION: Understanding the functional consequence of genetic variants, especially the non-coding ones, is important but particularly challenging. Genome-wide association studies (GWAS) or quantitative trait locus analyses may be subject to limited statistical power and linkage disequilibrium, and thus are less optimal to pinpoint the causal variants. Moreover, most existing machine-learning approaches, which exploit the functional annotations to interpret and prioritize putative causal variants, cannot accommodate the heterogeneity of personal genetic variations and traits in a population study, targeting a specific disease. RESULTS: By leveraging paired whole-genome sequencing data and epigenetic functional assays in a population study, we propose a multi-modal deep learning framework to predict genome-wide quantitative epigenetic signals by considering both personal genetic variations and traits. The proposed approach can further evaluate the functional consequence of non-coding variants on an individual level by quantifying the allelic difference of predicted epigenetic signals. By applying the approach to the ROSMAP cohort studying Alzheimer’s disease (AD), we demonstrate that the proposed approach can accurately predict quantitative genome-wide epigenetic signals and in key genomic regions of AD causal genes, learn canonical motifs reported to regulate gene expression of AD causal genes, improve the partitioning heritability analysis and prioritize putative causal variants in a GWAS risk locus. Finally, we release the proposed deep learning model as a stand-alone Python toolkit and a web server. AVAILABILITY AND IMPLEMENTATION: https://github.com/lichen-lab/DeepPerVar. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
URL: https://github.com/lichen-lab/DeepPerVar
Graph Transformer Geometric Learning of Brain Networks Using Multimodal MR Images for Brain Age Estimation.
Brain age is considered an important biomarker for detecting aging-related diseases such as Alzheimer’s Disease (AD). Magnetic resonance imaging (MRI) has been widely investigated with deep neural networks for brain age estimation. However, most existing methods cannot make full use of multimodal MRIs due to the difference in data structure. In this paper, we propose a graph transformer geometric learning framework to model the multimodal brain network constructed by structural MRI (sMRI) and diffusion tensor imaging (DTI) for brain age estimation. First, we build a two-stream convolutional autoencoder to learn the latent representations for each imaging modality. The brain template with prior knowledge is utilized to calculate the features from the regions of interest (ROIs). Then, a multi-level construction of the brain network is proposed to establish the hybrid ROI connections in space, feature and modality. Next, a graph transformer network is proposed to model the cross-modal interaction and fusion by geometric learning for brain age estimation. Finally, the difference between the estimated age and the chronological age is used as an important biomarker for AD diagnosis. Our method is evaluated with the sMRI and DTI data from UK Biobank and the Alzheimer’s Disease Neuroimaging Initiative database. Experimental results demonstrate that our method has achieved promising performance for brain age estimation and AD diagnosis.
URL:
STEPS: Similarity and Truth Estimation for Propagated Segmentations and its application to hippocampal segmentation and brain parcelation.
Anatomical segmentation of structures of interest is critical to quantitative analysis in medical imaging. Several automated multi-atlas based segmentation propagation methods that utilise manual delineations from multiple templates appear promising. However, high levels of accuracy and reliability are needed for use in diagnosis or in clinical trials. We propose a new local ranking strategy for template selection based on the locally normalised cross correlation (LNCC) and an extension to the classical STAPLE algorithm by Warfield et al. (2004), which we refer to as STEPS for Similarity and Truth Estimation for Propagated Segmentations. It addresses the well-known problems of local vs. global image matching and the bias introduced in the performance estimation due to structure size. We assessed the method on hippocampal segmentation using a leave-one-out cross validation with optimised model parameters; STEPS achieved a mean Dice score of 0.925 when compared with manual segmentation. This was significantly better in terms of segmentation accuracy when compared to other state-of-the-art fusion techniques. Furthermore, due to the finer anatomical scale, STEPS also obtains more accurate segmentations even when using only a third of the templates, reducing the dependence on large template databases. Using a subset of Alzheimer’s Disease Neuroimaging Initiative (ADNI) scans from different MRI imaging systems and protocols, STEPS yielded similarly accurate segmentations (Dice = 0.903). A cross-sectional and longitudinal hippocampal volumetric study was performed on the ADNI database. Mean ± SD hippocampal volume (mm³) was 5195 ± 656 for controls, 4786 ± 781 for MCI, and 4427 ± 903 for Alzheimer’s disease patients, with hippocampal atrophy rates (%/year) of 1.09 ± 3.0, 2.74 ± 3.5 and 4.04 ± 3.6, respectively. Statistically significant (p < 10⁻³) differences were found between disease groups for both hippocampal volume and volume change rates. Finally, STEPS was also applied in a multi-label segmentation propagation scenario using a leave-one-out cross validation, in order to parcellate 83 separate structures of the brain. Comparisons of STEPS with state-of-the-art multi-label fusion algorithms showed statistically significant segmentation accuracy improvements (p < 10⁻⁴) in several key structures.
URL:
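The Dice scores quoted in the STEPS abstract above measure the overlap between an automated and a manual segmentation as twice the number of shared voxels divided by the total number of labelled voxels in both masks. The following is a minimal sketch of that standard formula, not the STEPS implementation; the function name `dice_score`, the flat 0/1 mask representation, and the toy masks are assumptions made for illustration.

```python
def dice_score(seg_a, seg_b):
    """Dice overlap between two binary masks given as flat sequences of 0/1.

    Returns 2 * |A intersect B| / (|A| + |B|), a value between 0 and 1.
    """
    if len(seg_a) != len(seg_b):
        raise ValueError("masks must have the same number of voxels")
    size_a = sum(1 for a in seg_a if a)
    size_b = sum(1 for b in seg_b if b)
    overlap = sum(1 for a, b in zip(seg_a, seg_b) if a and b)
    if size_a + size_b == 0:
        # Convention assumed here: two empty masks agree perfectly.
        return 1.0
    return 2.0 * overlap / (size_a + size_b)


if __name__ == "__main__":
    # Hypothetical toy masks, not taken from the study.
    automated = [0, 1, 1, 1, 0, 0]
    manual = [0, 0, 1, 1, 1, 0]
    print(dice_score(automated, manual))
```

Because the denominator sums both mask sizes, a small structure is penalised more heavily per misclassified voxel than a large one, which is the structure-size bias the abstract mentions.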
Anat-SFSeg: Anatomically-guided superficial fiber segmentation with point-cloud deep learning.
Diffusion magnetic resonance imaging (dMRI) tractography is a critical technique to map the brain’s structural connectivity. Accurate segmentation of white matter, particularly the superficial white matter (SWM), is essential for neuroscience and clinical research. However, it is challenging to segment SWM due to the short adjacent gyri connection in a U-shaped pattern. In this work, we propose an Anatomically-guided Superficial Fiber Segmentation (Anat-SFSeg) framework to improve the performance on SWM segmentation. The framework consists of a unique fiber anatomical descriptor (named FiberAnatMap) and a deep learning network based on point-cloud data. The spatial coordinates of fibers represented as point clouds, as well as the anatomical features at both the individual and group levels, are fed into a neural network. The network is trained on Human Connectome Project (HCP) datasets and tested on the subjects with a range of cognitive impairment levels. One new metric named fiber anatomical region proportion (FARP), quantifies the ratio of fibers in the defined brain regions and enables the comparison with other methods. Another metric named anatomical region fiber count (ARFC), represents the average fiber number in each cluster for the assessment of inter-subject differences. The experimental results demonstrate that Anat-SFSeg achieves the highest accuracy on HCP datasets and exhibits great generalization on clinical datasets. Diffusion tensor metrics and ARFC show disorder severity associated alterations in patients with Alzheimer’s disease (AD) and mild cognitive impairments (MCI). Correlations with cognitive grades show that these metrics are potential neuroimaging biomarkers for AD. Furthermore, Anat-SFSeg could be utilized to explore other neurodegenerative, neurodevelopmental or psychiatric disorders.
URL:
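As a rough illustration of the FARP idea in the Anat-SFSeg abstract above, the sketch below computes the share of fibers in a cluster whose assigned region falls inside a set of defined regions. The abstract does not give the exact formula, so the function name `farp`, the one-label-per-fiber input, and the region names are all assumptions made for illustration.

```python
def farp(fiber_regions, defined_regions):
    """Fraction of fibers whose region label is in the defined set.

    Hypothetical reading of "fiber anatomical region proportion":
    fiber_regions holds one anatomical label per fiber in a cluster,
    defined_regions is the set of regions expected for that cluster.
    """
    if not fiber_regions:
        return 0.0
    inside = sum(1 for region in fiber_regions if region in defined_regions)
    return inside / len(fiber_regions)


# Example: three of four fibers lie in the expected regions.
labels = ["precentral", "postcentral", "insula", "precentral"]
print(farp(labels, {"precentral", "postcentral"}))
```

A proportion of this kind is independent of the total fiber count, which is presumably what makes it usable for comparing segmentation methods that return different numbers of fibers.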
Machine learning identifies candidates for drug repurposing in Alzheimer’s disease.
Clinical trials of novel therapeutics for Alzheimer’s Disease (AD) have consumed a large amount of time and resources with largely negative results. Repurposing drugs already approved by the Food and Drug Administration (FDA) for another indication is a more rapid and less expensive option. We present DRIAD (Drug Repurposing In AD), a machine learning framework that quantifies potential associations between the pathology of AD severity (the Braak stage) and molecular mechanisms as encoded in lists of gene names. DRIAD is applied to lists of genes arising from perturbations in differentiated human neural cell cultures by 80 FDA-approved and clinically tested drugs, producing a ranked list of possible repurposing candidates. Top-scoring drugs are inspected for common trends among their targets. We propose that the DRIAD method can be used to nominate drugs that, after additional validation and identification of relevant pharmacodynamic biomarker(s), could be readily evaluated in a clinical trial.
URL:
Using high-dimensional machine learning methods to estimate an anatomical risk factor for Alzheimer’s disease across imaging databases.
INTRODUCTION: The main goal of this work is to investigate the feasibility of estimating an anatomical index that can be used as an Alzheimer’s disease (AD) risk factor in the Women’s Health Initiative Magnetic Resonance Imaging Study (WHIMS-MRI) using MRI data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI), a well-characterized imaging database of AD patients and cognitively normal subjects. We called this index AD Pattern Similarity (AD-PS) scores. To demonstrate the construct validity of the scores, we investigated their associations with several AD risk factors. The ADNI and WHIMS imaging databases were collected with different goals, populations and data acquisition protocols: it is important to demonstrate that the approach to estimating AD-PS scores can bridge these differences. METHODS: MRI data from both studies were processed using high-dimensional warping methods. High-dimensional classifiers were then estimated using the ADNI MRI data. Next, the classifiers were applied to baseline and follow-up WHIMS-MRI GM data to generate the GM AD-PS scores. To study the validity of the scores we investigated associations between GM AD-PS scores at baseline (Scan 1) and their longitudinal changes (Scan 2 -Scan 1) with: 1) age, cognitive scores, white matter small vessel ischemic disease (WM SVID) volume at baseline and 2) age, cognitive scores, WM SVID volume longitudinal changes respectively. In addition, we investigated their associations with time until classification of independently adjudicated status in WHIMS-MRI. RESULTS: Higher GM AD-PS scores from WHIMS-MRI baseline data were associated with older age, lower cognitive scores, and higher WM SVID volume. Longitudinal changes in GM AD-PS scores (Scan 2 - Scan 1) were also associated with age and changes in WM SVID volumes and cognitive test scores. Increases in the GM AD-PS scores predicted decreases in cognitive scores and increases in WM SVID volume. 
GM AD-PS scores and their longitudinal changes also were associated with time until classification of cognitive impairment. Finally, receiver operating characteristic curves showed that baseline GM AD-PS scores of cognitively normal participants carried information about future cognitive status determined during follow-up. DISCUSSION: We applied a high-dimensional machine learning approach to estimate a novel AD risk factor for WHIMS-MRI study participants using ADNI data. The GM AD-PS scores showed strong associations with incident cognitive impairment and cross-sectional and longitudinal associations with age, cognitive function, cognitive status and WM SVID volume lending support to the ongoing validation of the GM AD-PS score.
URL:
High-dimensional generalized median adaptive lasso with application to omics data.
Recently, there has been a growing interest in variable selection for causal inference within the context of high-dimensional data. However, when the outcome exhibits a skewed distribution, ensuring the accuracy of variable selection and causal effect estimation might be challenging. Here, we introduce the generalized median adaptive lasso (GMAL) for covariate selection to achieve an accurate estimation of causal effect even when the outcome follows skewed distributions. A distinctive feature of our proposed method is that we utilize a linear median regression model for constructing penalty weights, thereby maintaining the accuracy of variable selection and causal effect estimation even when the outcome presents extremely skewed distributions. Simulation results showed that our proposed method performs comparably to existing methods in variable selection when the outcome follows a symmetric distribution. Besides, the proposed method exhibited obvious superiority over the existing methods when the outcome follows a skewed distribution. Meanwhile, our proposed method consistently outperformed the existing methods in causal estimation, as indicated by smaller root-mean-square error. We also utilized the GMAL method on a deoxyribonucleic acid methylation dataset from the Alzheimer’s disease (AD) neuroimaging initiative database to investigate the association between cerebrospinal fluid tau protein levels and the severity of AD.
URL:
HIPS: A new hippocampus subfield segmentation method.
The importance of the hippocampus in the study of several neurodegenerative diseases such as Alzheimer’s disease makes it a structure of great interest in neuroimaging. However, few segmentation methods have been proposed to measure its subfields due to its complex structure and the lack of high resolution magnetic resonance (MR) data. In this work, we present a new pipeline for automatic hippocampus subfield segmentation using two available hippocampus subfield delineation protocols that can work with both high and standard resolution data. The proposed method is based on multi-atlas label fusion technology that benefits from a novel multi-contrast patch match search process (using high resolution T1-weighted and T2-weighted images). The proposed method also includes as post-processing a new neural network-based error correction step to minimize systematic segmentation errors. The method has been evaluated on both high and standard resolution images and compared to other state-of-the-art methods showing better results in terms of accuracy and execution time.
URL:
CReg-KD: Model refinement via confidence regularized knowledge distillation for brain imaging.
One of the core challenges of deep learning in medical image analysis is data insufficiency, especially for 3D brain imaging, which may lead to model over-fitting and poor generalization. Regularization strategies such as knowledge distillation are powerful tools to mitigate the issue by penalizing predictive distributions and introducing additional knowledge to reinforce the training process. In this paper, we revisit knowledge distillation as a regularization paradigm by penalizing attentive output distributions and intermediate representations. In particular, we propose a Confidence Regularized Knowledge Distillation (CReg-KD) framework, which adaptively transfers knowledge for distillation in light of knowledge confidence. Two strategies are advocated to regularize the global and local dependencies between teacher and student knowledge. In detail, a gated distillation mechanism is proposed to soften the transferred knowledge globally by utilizing the teacher loss as a confidence score. Moreover, the intermediate representations are attentively and locally refined with key semantic context to mimic meaningful features. To demonstrate the superiority of our proposed framework, we evaluated it on two brain imaging analysis tasks (i.e. Alzheimer’s Disease classification and brain age estimation based on T1-weighted MRI) on the Alzheimer’s Disease Neuroimaging Initiative dataset including 902 subjects and a cohort of 3655 subjects from 4 public datasets. Extensive experimental results show that CReg-KD achieves consistent improvements over the baseline teacher model and outperforms other state-of-the-art knowledge distillation approaches, demonstrating that CReg-KD is a powerful medical image analysis tool in terms of both prediction performance and generalizability.
URL:
Brain age prediction via cross-stratified ensemble learning.
As an important biomarker of neural aging, the brain age reflects the integrity and health of the human brain. Accurate prediction of brain age could help to understand the underlying mechanism of neural aging. In this study, a cross-stratified ensemble learning algorithm with a stacking strategy was proposed to obtain brain age and the derived predicted age difference (PAD) using T1-weighted magnetic resonance imaging (MRI) data. The approach comprises two modules: the first consists of three base learners (3D-DenseNet, 3D-ResNeXt, and 3D-Inception-v4); the second consists of 14 secondary learners based on linear regression. To evaluate performance, our method was compared with single base learners, regular ensemble learning algorithms, and state-of-the-art (SOTA) methods. The results demonstrated that our proposed model outperformed the other models, with a mean absolute error (MAE), root mean-squared error (RMSE), and coefficient of determination (R2) of 2.9405 years, 3.9458 years, and 0.9597, respectively. Furthermore, there were significant differences in PAD among the three groups of normal control (NC), mild cognitive impairment (MCI) and Alzheimer’s disease (AD), with an increasing trend across NC, MCI, and AD. It was concluded that the proposed algorithm could be effectively used in computing brain aging and PAD, offering potential for early diagnosis and assessment of normal brain aging and AD.
URL:
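The three evaluation metrics and the PAD mentioned in the brain age abstract above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names are hypothetical, and PAD is assumed here to be predicted age minus chronological age, which the abstract does not state explicitly.

```python
import math


def brain_age_metrics(true_age, predicted_age):
    """Return (MAE, RMSE, R2) for paired chronological and predicted ages."""
    n = len(true_age)
    errors = [p - t for p, t in zip(predicted_age, true_age)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(true_age) / n
    ss_tot = sum((t - mean_true) ** 2 for t in true_age)
    r2 = 1.0 - ss_res / ss_tot                   # coefficient of determination
    return mae, rmse, r2


def predicted_age_difference(true_age, predicted_age):
    """PAD per subject, assumed to be predicted minus chronological age."""
    return [p - t for p, t in zip(predicted_age, true_age)]


# Toy example with three subjects (ages in years, made up for illustration).
ages = [60.0, 70.0, 80.0]
preds = [62.0, 69.0, 83.0]
print(brain_age_metrics(ages, preds))
print(predicted_age_difference(ages, preds))
```

Under this sign convention, a positive PAD means the brain appears older than the subject's chronological age, which is consistent with the reported increase in PAD from NC to MCI to AD.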
PheSeq, a Bayesian deep learning model to enhance and interpret the gene-disease association studies.
Despite the abundance of genotype-phenotype association studies, the resulting association outcomes often lack robustness and interpretations. To address these challenges, we introduce PheSeq, a Bayesian deep learning model that enhances and interprets association studies through the integration and perception of phenotype descriptions. By implementing the PheSeq model in three case studies on Alzheimer’s disease, breast cancer, and lung cancer, we identify 1024 priority genes for Alzheimer’s disease and 818 and 566 genes for breast cancer and lung cancer, respectively. Benefiting from data fusion, these findings represent moderate positive rates, high recall rates, and interpretation in gene-disease association studies.
URL:
Deep-gated recurrent unit and diet network-based genome-wide association analysis for detecting the biomarkers of Alzheimer’s disease.
Genome-wide association analysis (GWAS) is a commonly used method to detect the potential biomarkers of Alzheimer’s disease (AD). Most existing GWAS methods entail a high computational cost, disregard correlations among imaging data and correlations among genetic data, and ignore various associations between longitudinal imaging and genetic data. A novel GWAS method was proposed to identify potential AD biomarkers and address these problems. A network based on a gated recurrent unit was applied without imputing incomplete longitudinal imaging data to integrate the longitudinal data of variable lengths and extract an image representation. In this study, a modified diet network that can considerably reduce the number of parameters in the genetic network was proposed to perform GWAS between image representation and genetic data. Genetic representation can be extracted in this way. A link between genetic representation and AD was established to detect potential AD biomarkers. The proposed method was tested on a set of simulated data and a real AD dataset. Results of the simulated data showed that the proposed method can accurately detect relevant biomarkers. Moreover, the results of real AD dataset showed that the proposed method can detect some new risk-related genes of AD. Based on previous reports, no research has incorporated a deep-learning model into a GWAS framework to investigate the potential information on super-high-dimensional genetic data and longitudinal imaging data and create a link between imaging genetics and AD for detecting potential AD biomarkers. Therefore, the proposed method may provide new insights into the underlying pathological mechanism of AD.
URL:
Estimating long-term multivariate progression from short-term data.
MOTIVATION: Diseases that progress slowly are often studied by observing cohorts at different stages of disease for short periods of time. The Alzheimer’s Disease Neuroimaging Initiative (ADNI) follows elders with various degrees of cognitive impairment, from normal to impaired. The study includes a rich panel of novel cognitive tests, biomarkers, and brain images collected every 6 months for as long as 6 years. The relative timing of the observations with respect to disease pathology is unknown. We propose a general semiparametric model and iterative estimation procedure to estimate simultaneously the pathological timing and long-term growth curves. The resulting estimates of long-term progression are fine-tuned using cognitive trajectories derived from the long-term “Personnes Agees Quid” study. RESULTS: We demonstrate with simulations that the method can recover long-term disease trends from short-term observations. The method also estimates temporal ordering of individuals with respect to disease pathology, providing subject-specific prognostic estimates of the time until onset of symptoms. When the method is applied to ADNI data, the estimated growth curves are in general agreement with prevailing theories of the Alzheimer’s disease cascade. Other data sets with common outcome measures can be combined using the proposed algorithm. AVAILABILITY: Software to fit the model and reproduce results with the statistical software R is available as the grace package. ADNI data can be downloaded from the Laboratory of NeuroImaging.
URL:
Coupled mixed model for joint genetic analysis of complex disorders with two independently collected data sets.
BACKGROUND: In the last decade, Genome-wide Association studies (GWASs) have contributed to decoding the human genome by uncovering many genetic variations associated with various diseases. Many follow-up investigations involve joint analysis of multiple independently generated GWAS data sets. While most of the computational approaches developed for joint analysis are based on summary statistics, the joint analysis based on individual-level data with consideration of confounding factors remains to be a challenge. RESULTS: In this study, we propose a method, called Coupled Mixed Model (CMM), that enables a joint GWAS analysis on two independently collected sets of GWAS data with different phenotypes. The CMM method does not require the data sets to have the same phenotypes as it aims to infer the unknown phenotypes using a set of multivariate sparse mixed models. Moreover, CMM addresses the confounding variables due to population stratification, family structures, and cryptic relatedness, as well as those arising during data collection such as batch effects that frequently appear in joint genetic studies. We evaluate the performance of CMM using simulation experiments. In real data analysis, we illustrate the utility of CMM by an application to evaluating common genetic associations for Alzheimer’s disease and substance use disorder using datasets independently collected for the two complex human disorders. Comparison of the results with those from previous experiments and analyses supports the utility of our method and provides new insights into the diseases. The software is available at https://github.com/HaohanWang/CMM .
URL: https://github.com/HaohanWang/CMM
Modeling autosomal dominant Alzheimer’s disease with machine learning.
INTRODUCTION: Machine learning models were used to discover novel disease trajectories for autosomal dominant Alzheimer’s disease. METHODS: Longitudinal structural magnetic resonance imaging, amyloid positron emission tomography (PET), and fluorodeoxyglucose PET were acquired in 131 mutation carriers and 74 non-carriers from the Dominantly Inherited Alzheimer Network; the groups were matched for age, education, sex, and apolipoprotein epsilon4 (APOE epsilon4). A deep neural network was trained to predict disease progression for each modality. Relief algorithms identified the strongest predictors of mutation status. RESULTS: The Relief algorithm identified the caudate, cingulate, and precuneus as the strongest predictors among all modalities. The model yielded accurate results for predicting future Pittsburgh compound B (R2 = 0.95), fluorodeoxyglucose (R2 = 0.93), and atrophy (R2 = 0.95) in mutation carriers compared to non-carriers. DISCUSSION: Results suggest a sigmoidal trajectory for amyloid, a biphasic response for metabolism, and a gradual decrease in volume, with disease progression primarily in subcortical, middle frontal, and posterior parietal regions.
URL:
AD risk score for the early phases of disease based on unsupervised machine learning.
INTRODUCTION: Identifying cognitively normal individuals at high risk for progression to symptomatic Alzheimer’s disease (AD) is critical for early intervention. METHODS: An AD risk score was derived using unsupervised machine learning. The score was developed using data from 226 cognitively normal individuals and included cerebrospinal fluid, magnetic resonance imaging, and cognitive measures, and validated in an independent cohort. RESULTS: Higher baseline AD progression risk scores (hazard ratio = 2.70, P < 0.001) were associated with greater risks of progression to clinical symptoms of mild cognitive impairment (MCI). Baseline scores had an area under the curve of 0.83 (95% confidence interval: 0.75 to 0.91) for identifying subjects who progressed to MCI/dementia within 5 years. The validation procedure, using data from the Alzheimer’s Disease Neuroimaging Initiative, demonstrated accuracy of prediction across the AD spectrum. DISCUSSION: The derived risk score provides high predictive accuracy for identifying which individuals with normal cognition are likely to show clinical decline due to AD within 5 years.
URL:
Label-free hyperspectral imaging and deep-learning prediction of retinal amyloid beta-protein and phosphorylated tau.
Alzheimer’s disease (AD) is a major risk for the aging population. The pathological hallmarks of AD, an abnormal deposition of amyloid beta-protein (Abeta) and phosphorylated tau (pTau), have been demonstrated in the retinas of AD patients, including in prodromal patients with mild cognitive impairment (MCI). Abeta pathology, especially the accumulation of the amyloidogenic 42-residue long alloform (Abeta42), is considered an early and specific sign of AD, and together with tauopathy, confirms AD diagnosis. To visualize retinal Abeta and pTau, state-of-the-art methods use fluorescence. However, administering contrast agents complicates the imaging procedure. To address this problem from fundamentals, ex-vivo studies were performed to develop a label-free hyperspectral imaging method that detects the spectral signatures of Abeta42 and pS396-Tau and predicts their abundance in retinal cross-sections. For the first time, we reported the spectral signature of pTau and demonstrated an accurate prediction of Abeta and pTau distribution powered by deep learning. We expect our findings to lay the groundwork for label-free detection of AD.
URL:
msQSM: Morphology-based self-supervised deep learning for quantitative susceptibility mapping.
Quantitative susceptibility mapping (QSM) has been applied to the measurement of iron deposition and the auxiliary diagnosis of neurodegenerative disease. There still exists a dipole inversion problem in QSM reconstruction. Recently, deep learning approaches have been proposed to resolve this problem. However, most of these approaches are supervised methods that need pairs of the input phase and ground-truth. It remains a challenge to train a model for all resolutions without using the ground-truth and only using one resolution data. To address this, we proposed a self-supervised QSM deep learning method based on morphology. It consists of a morphological QSM builder to decouple the dependency of the QSM on acquisition resolution, and a morphological loss to reduce artifacts effectively and save training time efficiently. The proposed method can reconstruct arbitrary resolution QSM on both human data and animal data, regardless of whether the resolution is higher or lower than that of the training set. Our method outperforms the previous best unsupervised method with a 3.6% higher peak signal-to-noise ratio, 16.2% lower normalized root mean square error, and 22.1% lower high-frequency error norm. The morphological loss reduces training time by 22.1% with respect to the cycle gradient loss used in the previous unsupervised methods. Experimental results show that the proposed method accurately measures QSM with arbitrary resolutions, and achieves state-of-the-art results among unsupervised deep learning methods. Research on applications in neurodegenerative diseases found that our method is robust enough to measure significant increase in striatal magnetic susceptibility in patients during Alzheimer’s disease progression, as well as significant increase in substantia nigra susceptibility in Parkinson’s disease patients, and can be used as an auxiliary differential diagnosis tool for Alzheimer’s disease and Parkinson’s disease.
URL:
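Two of the image-quality metrics quoted in the msQSM abstract above, peak signal-to-noise ratio and normalized root mean square error, can be sketched on flattened voxel lists as shown below. Several normalization conventions exist for both metrics and the abstract does not say which one was used; this sketch assumes the peak is the dynamic range of the reference and that NRMSE is normalized by the Euclidean norm of the reference.

```python
import math


def rmse(reference, estimate):
    """Root mean square error between two equally long value lists."""
    n = len(reference)
    return math.sqrt(sum((e - r) ** 2 for r, e in zip(reference, estimate)) / n)


def psnr(reference, estimate):
    """PSNR in dB, assuming peak = max(reference) - min(reference)."""
    peak = max(reference) - min(reference)
    return 20.0 * math.log10(peak / rmse(reference, estimate))


def nrmse(reference, estimate):
    """Error norm divided by the reference norm (one common convention)."""
    num = math.sqrt(sum((e - r) ** 2 for r, e in zip(reference, estimate)))
    den = math.sqrt(sum(r ** 2 for r in reference))
    return num / den


# Toy "susceptibility maps" flattened to 1D (values made up for illustration).
ref = [0.0, 1.0, 2.0, 3.0]
est = [0.0, 1.0, 2.0, 4.0]
print(psnr(ref, est), nrmse(ref, est))
```

Both functions divide by a quantity that is zero for degenerate inputs (identical images for PSNR, an all-zero reference for NRMSE), so real use would need a guard for those cases.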
Deep learning predicts DNA methylation regulatory variants in specific brain cell types and enhances fine mapping for brain disorders.
DNA methylation (DNAm) is essential for brain development and function and potentially mediates the effects of genetic risk variants underlying brain disorders. We present INTERACT, a transformer-based deep learning model to predict regulatory variants affecting DNAm levels in specific brain cell types, leveraging existing single-nucleus DNAm data from the human brain. We show that INTERACT accurately predicts cell type-specific DNAm profiles, achieving an average area under the receiver operating characteristic curve of 0.99 across cell types. Furthermore, INTERACT predicts cell type-specific DNAm regulatory variants, which reflect cellular context and enrich the heritability of brain-related traits in relevant cell types. We demonstrate that incorporating predicted variant effects and DNAm levels of CpG sites enhances the fine mapping for three brain disorders (schizophrenia, depression, and Alzheimer’s disease) and facilitates mapping causal genes to particular cell types. Our study highlights the power of deep learning in identifying cell type-specific regulatory variants, which will enhance our understanding of the genetics of complex traits.
URL:
AI-based differential diagnosis of dementia etiologies on multimodal data.
Differential diagnosis of dementia remains a challenge in neurology due to symptom overlap across etiologies, yet it is crucial for formulating early, personalized management strategies. Here, we present an artificial intelligence (AI) model that harnesses a broad array of data, including demographics, individual and family medical history, medication use, neuropsychological assessments, functional evaluations and multimodal neuroimaging, to identify the etiologies contributing to dementia in individuals. The study, drawing on 51,269 participants across 9 independent, geographically diverse datasets, facilitated the identification of 10 distinct dementia etiologies. It aligns diagnoses with similar management strategies, ensuring robust predictions even with incomplete data. Our model achieved a microaveraged area under the receiver operating characteristic curve (AUROC) of 0.94 in classifying individuals with normal cognition, mild cognitive impairment and dementia. Also, the microaveraged AUROC was 0.96 in differentiating the dementia etiologies. Our model demonstrated proficiency in addressing mixed dementia cases, with a mean AUROC of 0.78 for two co-occurring pathologies. In a randomly selected subset of 100 cases, the AUROC of neurologist assessments augmented by our AI model exceeded neurologist-only evaluations by 26.25%. Furthermore, our model predictions aligned with biomarker evidence and its associations with different proteinopathies were substantiated through postmortem findings. Our framework has the potential to be integrated as a screening tool for dementia in clinical settings and drug trials. Further prospective studies are needed to confirm its ability to improve patient care.
URL:
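The micro-averaged AUROC reported in the dementia etiology abstract above pools every (sample, class) pair into a single binary problem before computing the area under the curve. The sketch below shows that pooling with a plain pairwise-ranking AUROC; it is an illustration under assumed inputs (one-hot label rows and per-class score rows), not the evaluation code of the study.

```python
def auroc(labels, scores):
    """AUROC as the probability that a positive outranks a negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def micro_auroc(y_true, y_score):
    """Micro-average: flatten all (sample, class) entries, then score once."""
    flat_labels = [y for row in y_true for y in row]
    flat_scores = [s for row in y_score for s in row]
    return auroc(flat_labels, flat_scores)


# Three samples, three classes (scores made up for illustration).
y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
y_score = [[0.8, 0.1, 0.1], [0.3, 0.6, 0.1], [0.2, 0.5, 0.3]]
print(micro_auroc(y_true, y_score))
```

The pairwise loop is quadratic in the number of entries, which is acceptable for a small example; a rank-based computation would be preferable at the scale of tens of thousands of participants.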
A 3D convolutional neural network to classify subjects as Alzheimer’s disease, frontotemporal dementia or healthy controls using brain 18F-FDG PET.
With the arrival of disease-modifying drugs, neurodegenerative diseases will require an accurate diagnosis for optimal treatment. Convolutional neural networks are powerful deep learning techniques that can provide great help to physicians in image analysis. The purpose of this study is to introduce and validate a 3D neural network for classification of Alzheimer’s disease (AD), frontotemporal dementia (FTD) or cognitively normal (CN) subjects based on brain glucose metabolism. Retrospective [18F]-FDG-PET scans of 199 AD, 192 FTD and 200 CN subjects were collected from our local database and from the Alzheimer’s disease and frontotemporal lobar degeneration neuroimaging initiatives. Training and test sets were created using randomization on a 90 %-10 % basis, and training of a 3D VGG16-like neural network was performed using data augmentation and cross-validation. Performance was compared to clinical interpretation by three specialists in the independent test set. Regions determining classification were identified in an occlusion experiment and with Gradient-weighted Class Activation Mapping. Test set subjects were age- and sex-matched across categories. The model achieved an overall 89.8 % accuracy in predicting the class of test scans. Areas under the ROC curves were 93.3 % for AD, 95.3 % for FTD, and 99.9 % for CN. The physicians’ consensus showed a 69.5 % accuracy, and there was substantial agreement between them (kappa = 0.61, 95 % CI: 0.49-0.73). To our knowledge, this is the first study to introduce a deep learning model able to discriminate AD and FTD based on [18F]-FDG PET scans, and to isolate CN subjects with excellent accuracy. These initial results are promising and hint at the potential for generalization to data from other centers.
URL:
IGUANe: A 3D generalizable CycleGAN for multicenter harmonization of brain MR images.
In MRI studies, the aggregation of imaging data from multiple acquisition sites enhances sample size but may introduce site-related variabilities that hinder consistency in subsequent analyses. Deep learning methods for image translation have emerged as a solution for harmonizing MR images across sites. In this study, we introduce IGUANe (Image Generation with Unified Adversarial Networks), an original 3D model that leverages the strengths of domain translation and straightforward application of style transfer methods for multicenter brain MR image harmonization. IGUANe extends CycleGAN by integrating an arbitrary number of domains for training through a many-to-one architecture. The framework based on domain pairs enables the implementation of sampling strategies that prevent confusion between site-related and biological variabilities. During inference, the model can be applied to any image, even from an unknown acquisition site, making it a universal generator for harmonization. Trained on a dataset comprising T1-weighted images from 11 different scanners, IGUANe was evaluated on data from unseen sites. The assessments included the transformation of MR images with traveling subjects, the preservation of pairwise distances between MR images within domains, the evolution of volumetric patterns related to age and Alzheimer’s disease (AD), and the performance in age regression and patient classification tasks. Comparisons with other harmonization and normalization methods suggest that IGUANe better preserves individual information in MR images and is more suitable for maintaining and reinforcing variabilities related to age and AD. Future studies may further assess IGUANe in other multicenter contexts, either using the same model or retraining it for applications to different image modalities. Codes and the trained IGUANe model are available at https://github.com/RocaVincent/iguane_harmonization.git.
URL: https://github.com/RocaVincent/iguane_harmonization.git
A novel voxel-based method to estimate cortical sulci width and its application to compare patients with Alzheimer’s disease to controls.
A voxel-based method for measuring sulcal width was developed, validated and applied to a database. This method (EDT-based LM) employs the 3D Euclidean Distance Transform (EDT) of the pial surface and a Local Maxima labeling algorithm. A computational phantom was designed to test method performance; results revealed that the method’s inaccuracy, delta, ranged between 0.1 and 0.5 voxels for widths between 1 and 7 voxels. Two morphological descriptors were computed to characterize each defined sulcus: mean sulcal width (MSW) and mean absolute deviation (MAD). The former is the average width for all available width measurements within the sulcus, and the latter is the deviation of these measurements. The EDT-based LM method was applied to the Minimal Interval Resonance Imaging in the Alzheimer’s Disease (MIRIAD) database, for a set of high-resolution Magnetic Resonance (MR) images of 66 subjects: 43 patients with Alzheimer’s Disease (AD) and 23 control subjects. AD causes significant gray matter loss; hence, some sulci were expected to broaden. Methodological results concurred with this hypothesis. According to a Wilcoxon test, MSW was greater in AD patients for all sulci (p < 0.05, FDR corrected), whereas MAD showed significant differences in 8 sulci (p < 0.05, FDR corrected). This work presents a novel voxel-based method for measuring sulcal width and extracting descriptors to characterize and compare the sulci within and across subjects.
URL:
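The two sulcal descriptors defined in the abstract above, MSW and MAD, reduce to simple statistics over the width measurements of one sulcus. The sketch below assumes the MAD is taken about the mean width (the abstract only says "the deviation of these measurements"), and the function name and sample widths are made up for illustration.

```python
def sulcal_descriptors(widths):
    """Return (MSW, MAD) for the width measurements of a single sulcus.

    MSW is the mean of all widths; MAD is assumed here to be the mean
    absolute deviation of the widths about that mean.
    """
    n = len(widths)
    msw = sum(widths) / n
    mad = sum(abs(w - msw) for w in widths) / n
    return msw, mad


# Example: four width measurements in voxels for one sulcus.
print(sulcal_descriptors([2.0, 3.0, 4.0, 7.0]))
```

Read this way, MSW captures the overall broadening expected with gray matter loss, while MAD captures how unevenly the width varies along the sulcus.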
Rigid motion invariant statistical shape modeling based on discrete fundamental forms: Data from the osteoarthritis initiative and the Alzheimer’s disease neuroimaging initiative.
We present a novel approach for nonlinear statistical shape modeling that is invariant under Euclidean motion and thus alignment-free. By analyzing metric distortion and curvature of shapes as elements of Lie groups in a consistent Riemannian setting, we construct a framework that reliably handles large deformations. Due to the explicit character of Lie group operations, our non-Euclidean method is very efficient, allowing for fast and numerically robust processing. This facilitates Riemannian analysis of large shape populations accessible through longitudinal and multi-site imaging studies, providing increased statistical power. Additionally, as planar configurations form a submanifold in shape space, our representation allows for effective estimation of quasi-isometric surface flattenings. We evaluate the performance of our model w.r.t. shape-based classification of hippocampus and femur malformations due to Alzheimer’s disease and osteoarthritis, respectively. In particular, we outperform state-of-the-art classifiers based on geometric deep learning as well as statistical shape modeling, especially in the presence of sparse training data. To provide insight into the model’s ability to capture biological shape variability, we carry out an analysis of specificity and generalization ability.
URL:
Predictive markers for AD in a multi-modality framework: an analysis of MCI progression in the ADNI population.
Alzheimer’s Disease (AD) and other neurodegenerative diseases affect over 20 million people worldwide, and this number is projected to significantly increase in the coming decades. Proposed imaging-based markers have shown steadily improving levels of sensitivity/specificity in classifying individual subjects as AD or normal. Several of these efforts have utilized statistical machine learning techniques, using brain images as input, as a means of deriving such AD-related markers. A common characteristic of this line of research is a focus on either (1) using a single imaging modality for classification, or (2) incorporating several modalities, but reporting separate results for each. One strategy to improve on the success of these methods is to leverage all available imaging modalities together in a single automated learning framework. The rationale is that some subjects may show signs of pathology in one modality but not in another; by combining all available images, a clearer view of the progression of disease pathology will emerge. Our method is based on the Multi-Kernel Learning (MKL) framework, which allows the inclusion of an arbitrary number of views of the data in a maximum margin, kernel learning framework. The principal innovation behind MKL is that it learns an optimal combination of kernel (similarity) matrices while simultaneously training a classifier. In classification experiments MKL outperformed an SVM trained on all available features by 3%-4%. We are especially interested in whether such markers are capable of identifying early signs of the disease. To address this question, we have examined whether our multi-modal disease marker (MMDM) can predict conversion from Mild Cognitive Impairment (MCI) to AD. Our experiments reveal that this measure shows significant group differences between MCI subjects who progressed to AD, and those who remained stable for 3 years. These differences were most significant in MMDMs based on imaging data. 
We also discuss the relationship between our MMDM and an individual’s conversion from MCI to AD.
URL:
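The kernel-combination idea behind MKL in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the weights are fixed by hand here, whereas MKL learns them jointly with the classifier, and the names `linear_kernel`, `combine_kernels`, `modality_a` and `modality_b` are hypothetical.

```python
def linear_kernel(features):
    """Gram matrix of dot products between subjects (rows of `features`)."""
    return [[sum(a * b for a, b in zip(x, y)) for y in features] for x in features]


def combine_kernels(kernels, weights):
    """Convex combination of same-sized kernel matrices.

    Assumption: weights are supplied by the caller; a real MKL solver
    would optimize them together with the max-margin classifier.
    """
    if not kernels or len(kernels) != len(weights):
        raise ValueError("need one weight per kernel")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative and not all zero")
    total = sum(weights)
    n = len(kernels[0])
    return [
        [sum((w / total) * k[i][j] for w, k in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]


# Two hypothetical modalities (views) measured on the same two subjects.
modality_a = [[1.0, 0.0], [0.0, 1.0]]
modality_b = [[1.0], [1.0]]
combined = combine_kernels(
    [linear_kernel(modality_a), linear_kernel(modality_b)], [0.5, 0.5]
)
```

Because each view contributes its own similarity matrix, a subject pair that looks dissimilar in one modality can still receive a non-zero combined similarity from another.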
Deep learning for Alzheimer’s disease: Mapping large-scale histological tau protein for neuroimaging biomarker validation.
Abnormal tau inclusions are hallmarks of Alzheimer’s disease and predictors of clinical decline. Several tau PET tracers are available for neurodegenerative disease research, opening avenues for molecular diagnosis in vivo. However, few have been approved for clinical use. Understanding the neurobiological basis of PET signal validation remains problematic because it requires a large-scale, voxel-to-voxel correlation between PET and (immuno)histological signals. Large dimensionality of whole human brains, tissue deformation impacting co-registration, and computing requirements to process terabytes of information preclude proper validation. We developed a computational pipeline to identify and segment particles of interest in billion-pixel digital pathology images to generate quantitative, 3D density maps. The proposed convolutional neural network for immunohistochemistry samples, IHCNet, is at the pipeline’s core. We have successfully processed and immunostained over 500 slides from two whole human brains with three phospho-tau antibodies (AT100, AT8, and MC1), spanning several terabytes of images. Our artificial neural network estimated tau inclusions from brain images with ROC AUCs of 0.87, 0.85, and 0.91 for AT100, AT8, and MC1, respectively. Introspection studies further assessed the ability of our trained model to learn tau-related features. We present an end-to-end pipeline to create terabytes-large 3D tau inclusion density maps co-registered to MRI as a means to facilitate validation of PET tracers.
URL:
scGRNom: a computational pipeline of integrative multi-omics analyses for predicting cell-type disease genes and regulatory networks.
Understanding cell-type-specific gene regulatory mechanisms from genetic variants to diseases remains challenging. To address this, we developed a computational pipeline, scGRNom (single-cell Gene Regulatory Network prediction from multi-omics), to predict cell-type disease genes and regulatory networks including transcription factors and regulatory elements. With applications to schizophrenia and Alzheimer’s disease, we predicted disease genes and regulatory networks for excitatory and inhibitory neurons, microglia, and oligodendrocytes. Further enrichment analyses revealed cross-disease and disease-specific functions and pathways at the cell-type level. Our machine learning analysis also found that cell-type disease genes improved clinical phenotype predictions. scGRNom is a general-purpose tool available at https://github.com/daifengwanglab/scGRNom .
URL: https://github.com/daifengwanglab/scGRNom
Bayesian model reveals latent atrophy factors with dissociable cognitive trajectories in Alzheimer’s disease.
We used a data-driven Bayesian model to automatically identify distinct latent factors of overlapping atrophy patterns from voxelwise structural MRIs of late-onset Alzheimer’s disease (AD) dementia patients. Our approach estimated the extent to which multiple distinct atrophy patterns were expressed within each participant rather than assuming that each participant expressed a single atrophy factor. The model revealed a temporal atrophy factor (medial temporal cortex, hippocampus, and amygdala), a subcortical atrophy factor (striatum, thalamus, and cerebellum), and a cortical atrophy factor (frontal, parietal, lateral temporal, and lateral occipital cortices). To explore the influence of each factor in early AD, atrophy factor compositions were inferred in beta-amyloid-positive (Abeta+) mild cognitively impaired (MCI) and cognitively normal (CN) participants. All three factors were associated with memory decline across the entire clinical spectrum, whereas the cortical factor was associated with executive function decline in Abeta+ MCI participants and AD dementia patients. Direct comparison between factors revealed that the temporal factor showed the strongest association with memory, whereas the cortical factor showed the strongest association with executive function. The subcortical factor was associated with the slowest decline for both memory and executive function compared with temporal and cortical factors. These results suggest that distinct patterns of atrophy influence decline across different cognitive domains. Quantification of this heterogeneity may enable the computation of individual-level predictions relevant for disease monitoring and customized therapies. Factor compositions of participants and code used in this article are publicly available for future research.
URL:
Robust double machine learning model with application to omics data.
BACKGROUND: Recently, there has been a growing interest in combining causal inference with machine learning algorithms. The double machine learning (DML) model, as an implementation of this combination, has received widespread attention for its ability to estimate causal effects within high-dimensional complex data. However, the DML model is sensitive to the presence of outliers and heavy-tailed noise in the outcome variable. In this paper, we propose the robust double machine learning (RDML) model to achieve a robust estimation of causal effects when the distribution of the outcome is contaminated by outliers or exhibits symmetrically heavy-tailed characteristics. RESULTS: In building the RDML model, we employed median machine learning algorithms to achieve robust predictions for the treatment and outcome variables. Subsequently, we established a median regression model for the prediction residuals. These two steps ensure robust causal effect estimation. Simulation studies show that the RDML model is comparable to the existing DML model when the data follow a normal distribution, while the RDML model is clearly superior, with a smaller RMSE, when the data follow a mixed normal distribution or a t-distribution. We also apply the RDML model to the deoxyribonucleic acid methylation dataset from the Alzheimer’s Disease Neuroimaging Initiative database with the aim of investigating the impact of Cerebrospinal Fluid Amyloid beta 42 (CSF A beta 42) on AD severity. CONCLUSION: These findings illustrate that the RDML model is capable of robustly estimating causal effects, even when the outcome distribution is affected by outliers or displays symmetrically heavy-tailed properties.
URL:
Exploring matrix factorization techniques for significant genes identification of Alzheimer’s disease microarray gene expression data.
BACKGROUND: The wide use of high-throughput DNA microarray technology provides an increasingly detailed view of the human transcriptome, from hundreds to thousands of genes. Although biomedical researchers typically design microarray experiments to explore specific biological contexts, the relationships between genes are hard to identify because the data are complex, noisy, and high-dimensional, and analyses are often hindered by low statistical power. The main challenge now is to extract valuable biological information from the colossal amount of data to gain insight into biological processes and the mechanisms of human disease. Overcoming this challenge requires mathematical and computational methods that are versatile enough to capture the underlying biological features and simple enough to be applied efficiently to large datasets. METHODS: Unsupervised machine learning approaches provide new and efficient analyses of gene expression profiles. In our study, two unsupervised knowledge-based matrix factorization methods, independent component analysis (ICA) and nonnegative matrix factorization (NMF), are integrated to identify significant genes and related pathways in a microarray gene expression dataset of Alzheimer’s disease. The advantage of these two approaches is that they can be performed as biclustering methods, by which genes and conditions can be clustered simultaneously. Furthermore, they can group genes into different categories for identifying related diagnostic pathways and regulatory networks. The difference between the two methods is that ICA assumes statistical independence of the expression modes, while NMF needs positivity constraints to generate localized gene expression profiles. RESULTS: In our work, we applied FastICA and non-smooth NMF to DNA microarray gene expression data of Alzheimer’s disease. 
The simulation results show that both methods can clearly separate severe AD samples from control samples, and the biological analysis of the identified significant genes and their related pathways demonstrated that these genes play a prominent role in AD and relate the activation patterns to AD phenotypes. The combination of these two methods was validated as efficient. CONCLUSIONS: Unsupervised matrix factorization methods provide efficient tools to analyze high-throughput microarray datasets. Because different unsupervised approaches explore correlations in the high-dimensional data space and identify relevant subspaces based on different hypotheses, integrating these methods to explore the underlying biological information in microarray datasets is an efficient approach. By combining the significant genes identified by both ICA and NMF, the biological analysis proves highly effective for elucidating the molecular taxonomy of Alzheimer’s disease and enables better experimental design to further identify potential pathways and therapeutic targets of AD.
URL:
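To make the positivity constraint of NMF mentioned in the abstract above concrete, here is a minimal sketch of plain NMF with the standard multiplicative updates for the Frobenius objective. It is not the non-smooth NMF variant the authors used, and the toy matrix `V`, the function names and all parameter defaults are hypothetical.

```python
import random


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frobenius_error(V, W, H):
    """Frobenius norm of V - W H."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx)) ** 0.5


def nmf(V, rank, n_iter=200, seed=0, eps=1e-12):
    """Factor a non-negative matrix V (genes x samples) as W @ H."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    # Strictly positive random start so the multiplicative updates can move.
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, rn, rd)]
             for rh, rn, rd in zip(H, num, den)]
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, rn, rd)]
             for rw, rn, rd in zip(W, num, den)]
    return W, H


# Hypothetical non-negative expression matrix (3 genes x 3 samples).
V = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [0.0, 1.0, 1.0]]
W0, H0 = nmf(V, rank=2, n_iter=0)  # the random initialization only
W, H = nmf(V, rank=2)
initial_error = frobenius_error(V, W0, H0)
final_error = frobenius_error(V, W, H)
```

Because every update multiplies by a ratio of non-negative quantities, the entries of `W` and `H` never become negative, which is what yields the parts-based profiles the abstract refers to.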
ReRF-Pred: predicting amyloidogenic regions of proteins based on their pseudo amino acid composition and tripeptide composition.
BACKGROUND: Amyloids are insoluble fibrillar aggregates that are highly associated with complex human diseases, such as Alzheimer’s disease, Parkinson’s disease, and type II diabetes. Recently, many studies reported that some specific regions of amino acid sequences may be responsible for the amyloidosis of proteins. Identifying these amyloidogenic regions has become very important for elucidating the mechanism of amyloids. Accordingly, several computational methods have been put forward to discover amyloidogenic regions. The majority of these methods predicted amyloidogenic regions based on the physicochemical properties of amino acids. In fact, the position, order, and correlation of amino acids may also influence the amyloidosis of proteins and should also be considered in detecting amyloidogenic regions. RESULTS: To address this problem, we proposed a novel machine-learning approach for predicting amyloidogenic regions, called ReRF-Pred. Firstly, the pseudo amino acid composition (PseAAC) was exploited to characterize physicochemical properties and correlation of amino acids. Secondly, tripeptide composition (TPC) was employed to represent the order and position of amino acids. To improve the distinguishability of TPC, all possible tripeptides were analyzed by the binomial distribution method, and only those with significantly different distributions between positive and negative samples were retained. Finally, all samples were characterized by the PseAAC and TPC of their amino acid sequences, and a random forest-based amyloidogenic region predictor was trained on these samples. Validation experiments showed that the feature set consisting of PseAAC and TPC is the most distinguishable one for detecting amyloidosis. Meanwhile, random forest is superior to the other classifiers considered on almost all metrics. To validate the effectiveness of our model, ReRF-Pred is compared with a series of gold-standard methods on two datasets: Pep-251 and Reg33. 
The results suggested our method has the best overall performance and makes significant improvements in discovering amyloidogenic regions. CONCLUSIONS: The advantages of our method are mainly attributable to the ability of PseAAC and TPC to describe the differences between amyloids and other proteins. The ReRF-Pred server can be accessed at http://106.12.83.135:8080/ReRF-Pred/.
URL: http://106.12.83.135:8080/ReRF-Pred/.
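The tripeptide composition (TPC) feature described in the abstract above can be sketched as a frequency vector over all 8000 tripeptides of the 20 standard amino acids. This is an illustrative sketch only: the normalization by the number of overlapping windows is an assumption, the binomial feature-selection step is not shown, and the function name and example peptide are hypothetical.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def tripeptide_composition(sequence):
    """Map each of the 8000 possible tripeptides to its relative frequency.

    Assumption: frequencies are normalized by the number of overlapping
    length-3 windows; windows containing non-standard residues count
    toward that total but have no entry in the returned vector.
    """
    seq = sequence.upper()
    windows = len(seq) - 2
    if windows <= 0:
        raise ValueError("sequence must contain at least three residues")
    counts = Counter(seq[i:i + 3] for i in range(windows))
    return {
        "".join(tri): counts.get("".join(tri), 0) / windows
        for tri in product(AMINO_ACIDS, repeat=3)
    }


# Hypothetical short peptide; its windows are ACA, CAC, ACA.
tpc = tripeptide_composition("ACACA")
```

Unlike a plain amino acid composition, this vector distinguishes sequences that contain the same residues in a different order, which is the property the abstract relies on.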
Integrated cerebellar radiomic-network model for predicting mild cognitive impairment in Alzheimer’s disease.
INTRODUCTION: Pathological and neuroimaging alterations in the cerebellum of Alzheimer’s disease (AD) patients have been documented. However, the role of cerebellum-derived radiomic and structural connectome modeling in the prediction of AD progression remains unclear. METHODS: Radiomic features were extracted from magnetic resonance imaging (MRI) in the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset (n = 1319) and an in-house dataset (n = 308). Integrated machine learning models were developed to predict the conversion risk of normal cognition (NC) to mild cognitive impairment (MCI) over a 6-year follow-up. RESULTS: The cerebellar models outperformed hippocampal models in distinguishing MCI from NC and in predicting transitions from NC to MCI across both cohorts. Key predictors included textural features in the right III and left I and II lobules, and network properties in Vermis I and II, which were associated with cognitive decline in AD. DISCUSSION: Cerebellum-derived radiomic-network modeling shows promise as a tool for early identification and prediction of disease progression during the preclinical stage of AD. HIGHLIGHTS: Altered cerebellar radiomic features and topological networks were identified in the subjects with mild cognitive impairment (MCI). The cerebellar radiomic-network integrated models outperformed hippocampal models in distinguishing MCI from normal cognition. The cerebellar radiomic model effectively predicts MCI risk and can stratify individuals into distinct risk categories. Specific cerebellar radiomic features are associated with cognitive impairment across various stages of amyloid beta and tau pathology.
URL:
Instantiated mixed effects modeling of Alzheimer’s disease markers.
The assessment and prediction of a subject’s current and future risk of developing neurodegenerative diseases like Alzheimer’s disease are of great interest in both the design of clinical trials as well as in clinical decision making. Exploring the longitudinal trajectory of markers related to neurodegeneration is an important task when selecting subjects for treatment in trials and the clinic, in the evaluation of early disease indicators and the monitoring of disease progression. Given that there is substantial intersubject variability, models that attempt to describe marker trajectories for a whole population will likely lack specificity for the representation of individual patients. Therefore, we argue here that individualized models provide a more accurate alternative that can be used for tasks such as population stratification and a subject-specific prognosis. In the work presented here, mixed effects modeling is used to derive global and individual marker trajectories for a training population. Test subject (new patient) specific models are then instantiated using a stratified “marker signature” that defines a subpopulation of similar cases within the training database. From this subpopulation, personalized models of the expected trajectory of several markers are subsequently estimated for unseen patients. These patient specific models of markers are shown to provide better predictions of time-to-conversion to Alzheimer’s disease than population based models.
URL:
Automated hippocampal shape analysis predicts the onset of dementia in mild cognitive impairment.
The hippocampus is involved at the onset of the neuropathological pathways leading to Alzheimer’s disease (AD). Individuals with mild cognitive impairment (MCI) are at increased risk of AD. Hippocampal volume has been shown to predict which MCI subjects will convert to AD. Our aim in the present study was to produce a fully automated prognostic procedure, scalable to high throughput clinical and research applications, for the prediction of MCI conversion to AD using 3D hippocampal morphology. We used an automated analysis for the extraction and mapping of the hippocampus from structural magnetic resonance scans to extract 3D hippocampal shape morphology, and we then applied machine learning classification to predict conversion from MCI to AD. We investigated the accuracy of prediction in 103 MCI subjects (mean age 74.1 years) from the longitudinal AddNeuroMed study. Our model correctly predicted MCI conversion to dementia within a year at an accuracy of 80% (sensitivity 77%, specificity 80%), a performance which is competitive with previous predictive models dependent on manual measurements. Categorization of MCI subjects based on hippocampal morphology revealed more rapid cognitive deterioration in MMSE scores (p<0.01) and CERAD verbal memory (p<0.01) in those subjects who were predicted to develop dementia relative to those predicted to remain stable. The pattern of atrophy associated with increased risk of conversion demonstrated initial degeneration in the anterior part of the cornu ammonis 1 (CA1) hippocampal subregion. We conclude that automated shape analysis generates sensitive measurements of early neurodegeneration which predates the onset of dementia and thus provides a prognostic biomarker for conversion of MCI to AD.
URL:
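The accuracy, sensitivity and specificity figures quoted in the abstract above follow from a confusion matrix. The sketch below shows the arithmetic under the usual definitions; the counts are hypothetical and are not the study's data.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts.

    tp/fn: converters predicted as converters / as stable;
    tn/fp: stable subjects predicted as stable / as converters.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical counts for 20 subjects (10 converters, 10 stable).
metrics = classification_metrics(tp=8, fp=1, tn=9, fn=2)
```

Reporting all three values matters when the classes are imbalanced, since accuracy alone can hide a low sensitivity for converters.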
Assessing polyomic risk to predict Alzheimer’s disease using a machine learning model.
INTRODUCTION: Alzheimer’s disease (AD) is the most common form of dementia in the elderly. Given that AD neuropathology begins decades before symptoms, there is a dire need for effective screening tools for early detection of AD to facilitate early intervention. METHODS: Here, we used tree-based and deep learning methods to train polyomic prediction models for AD affection status and age at onset, employing genomic, proteomic, metabolomic, and drug use data from UK Biobank. We used SHAP to determine feature importance. RESULTS: Our best-performing polyomic model achieved an area under the receiver operating characteristic curve (AUROC) of 0.87. We identified GFAP and CXCL17 proteins to be the strongest predictors of AD, besides apolipoprotein E (APOE) alleles. Increasing the number of cases by including “AD-by-proxy” cases did not improve AD prediction. DISCUSSION: Among the four modalities, genomics and proteomics were the most informative based on AUROC. Our data suggest that two blood-based biomarkers (glial fibrillary acidic protein [GFAP] and CXCL17) may be effective for early presymptomatic prediction of AD. HIGHLIGHTS: We developed a polyomic model to predict AD and age-at-onset using omics and medication use data from EHR. We identified GFAP and CXCL17 proteins to be the strongest predictors of AD, besides APOE alleles. “AD-by-proxy” cases, if used in training, do not improve AD prediction. Proteomics was the most informative modality overall for affection status and AAO prediction.
URL:
Spatiotemporal linear mixed effects modeling for the mass-univariate analysis of longitudinal neuroimage data.
We present an extension of the Linear Mixed Effects (LME) modeling approach to be applied to the mass-univariate analysis of longitudinal neuroimaging (LNI) data. The proposed method, called spatiotemporal LME or ST-LME, builds on the flexible LME framework and exploits the spatial structure in image data. We instantiated ST-LME for the analysis of cortical surface measurements (e.g. thickness) computed by FreeSurfer, a widely-used brain Magnetic Resonance Image (MRI) analysis software package. We validate the proposed ST-LME method and provide a quantitative and objective empirical comparison with two popular alternative methods, using two brain MRI datasets obtained from the Alzheimer’s disease neuroimaging initiative (ADNI) and Open Access Series of Imaging Studies (OASIS). Our experiments revealed that ST-LME offers a dramatic gain in statistical power and repeatability of findings, while providing good control of the false positive rate.
URL:
Surface-based TBM boosts power to detect disease effects on the brain: an N=804 ADNI study.
Computational anatomy methods are now widely used in clinical neuroimaging to map the profile of disease effects on the brain and its clinical correlates. In Alzheimer’s disease (AD), many research groups have modeled localized changes in hippocampal and lateral ventricular surfaces, to provide candidate biomarkers of disease progression for drug trials. We combined the power of parametric surface modeling and tensor-based morphometry to study hippocampal differences associated with AD and mild cognitive impairment (MCI) in 490 subjects (97 AD, 245 MCI, 148 controls) and ventricular differences in 804 subjects scanned as part of the Alzheimer’s Disease Neuroimaging Initiative (ADNI; 184 AD, 391 MCI, 229 controls). We aimed to show that a new multivariate surface statistic based on multivariate tensor-based morphometry (mTBM) and radial distance provides a more powerful way to detect localized anatomical differences than conventional surface-based analysis. In our experiments, we studied correlations between hippocampal atrophy and ventricular enlargement and clinical measures and cerebrospinal fluid biomarkers. The new multivariate statistics gave better effect sizes for detecting morphometric differences, relative to other statistics including radial distance, analysis of the surface tensor and the Jacobian determinant. In empirical tests using false discovery rate curves, smaller sample sizes were needed to detect associations with diagnosis. The analysis pipeline is generic and automated. It may be applied to analyze other brain subcortical structures including the caudate nucleus and putamen. This publicly available software may boost power for morphometric studies of subcortical structures in the brain.
URL:
Development and Validation of a Deep Learning Algorithm for Mortality Prediction in Selecting Patients With Dementia for Earlier Palliative Care Interventions.
Importance: Early palliative care interventions drive high-value care but currently are underused. Health care professionals face challenges in identifying patients who may benefit from palliative care. Objective: To develop a deep learning algorithm using longitudinal electronic health records to predict mortality risk as a proxy indicator for identifying patients with dementia who may benefit from palliative care. Design, Setting, and Participants: In this retrospective cohort study, 6-month, 1-year, and 2-year mortality prediction models with recurrent neural networks used patient demographic information and topics generated from clinical notes within Partners HealthCare System, an integrated health care delivery system in Boston, Massachusetts. This study included 26 921 adult patients with dementia who visited the health care system from January 1, 2011, through December 31, 2017. The models were trained using a data set of 24 229 patients and validated using another data set of 2692 patients. Data were analyzed from September 18, 2018, to May 15, 2019. Main Outcomes and Measures: The area under the receiver operating characteristic curve (AUC) for 6-month and 1- and 2-year mortality prediction models and the factors contributing to the predictions. Results: The study cohort included 26 921 patients (16 263 women [60.4%]; mean [SD] age, 74.6 [13.5] years). For the 24 229 patients in the training data set, mean (SD) age was 74.8 (13.2) years and 14 632 (60.4%) were women. For the 2692 patients in the validation data set, mean (SD) age was 75.0 (12.6) years and 1631 (60.6%) were women. The 6-month model reached an AUC of 0.978 (95% CI, 0.977-0.978); the 1-year model, 0.956 (95% CI, 0.955-0.956); and the 2-year model, 0.943 (95% CI, 0.942-0.944). 
The top-ranked latent topics associated with 6-month and 1- and 2-year mortality in patients with dementia include palliative and end-of-life care, cognitive function, delirium, testing of cholesterol levels, cancer, pain, use of health care services, arthritis, nutritional status, skin care, family meeting, shock, respiratory failure, and swallowing function. Conclusions and Relevance: A deep learning algorithm based on patient demographic information and longitudinal clinical notes appeared to show promising results in predicting mortality among patients with dementia in different time frames. Further research is necessary to determine the feasibility of applying this algorithm in clinical settings for identifying unmet palliative care needs earlier.
URL:
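The AUC reported for the mortality models in the abstract above can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. A minimal sketch of that pairwise definition follows; the labels and scores are hypothetical, and the quadratic loop is chosen for clarity, not for large cohorts.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Counts the fraction of (positive, negative) pairs in which the
    positive case scores higher; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes (1 = died within the horizon) and predicted risks.
example_labels = [1, 1, 0, 0]
example_scores = [0.9, 0.4, 0.5, 0.1]
auc = roc_auc(example_labels, example_scores)
```

In this toy example three of the four positive-negative pairs are ranked correctly, giving an AUC of 0.75.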
Multi-template tensor-based morphometry: application to analysis of Alzheimer’s disease.
In this paper methods for using multiple templates in tensor-based morphometry (TBM) are presented and compared to the conventional single-template approach. TBM analysis requires non-rigid registrations which are often subject to registration errors. When using multiple templates and, therefore, multiple registrations, it can be assumed that the registration errors are averaged and eventually compensated. Four different methods are proposed for multi-template TBM. The methods were evaluated using magnetic resonance (MR) images of healthy controls, patients with stable or progressive mild cognitive impairment (MCI), and patients with Alzheimer’s disease (AD) from the ADNI database (N=772). The performance of TBM features in classifying images was evaluated both quantitatively and qualitatively. Classification results show that the multi-template methods are statistically significantly better than the single-template method. The overall classification accuracy was 86.0% for the classification of control and AD subjects, and 72.1% for the classification of stable and progressive MCI subjects. The statistical group-level difference maps produced using multi-template TBM were smoother, formed larger continuous regions, and had larger t-values than the maps obtained with single-template TBM.
URL:
Quantifying mechanisms in neurodegenerative diseases (NDDs) using candidate mechanism perturbation amplitude (CMPA) algorithm.
BACKGROUND: Literature-derived knowledge assemblies have been used as an effective way of representing biological phenomena and understanding disease etiology in systems biology. These include canonical pathway databases such as KEGG, Reactome and WikiPathways and disease-specific network inventories such as the causal biological networks database, PD map and NeuroMMSig. The knowledge represented in these resources delineates qualitative information focusing mainly on the causal relationships between biological entities. Genes, the major constituents of knowledge representations, tend to be expressed differentially in different conditions such as cell types, brain regions and disease stages. A classical approach to interpreting a knowledge assembly is to explore the expression patterns of the individual genes. However, an approach that enables quantification of the overall impact of differentially expressed genes in the corresponding network is still lacking. RESULTS: Using the concept of heat diffusion, we have devised an algorithm that is able to calculate the magnitude of regulation of a biological network using expression datasets. We have demonstrated that molecular mechanisms specific to Alzheimer’s disease (AD) and Parkinson’s disease (PD) are regulated with different intensities across spatial and temporal resolutions. Our approach shows that mitochondrial dysfunction in PD is severe in the cortex and in advanced stages of PD. Similarly, we have shown that the intensity of aggregation of neurofibrillary tangles (NFTs) in AD increases as the disease progresses. This finding is in concordance with previous studies that explain the burden of NFTs in stages of AD. CONCLUSIONS: This study is one of the first attempts to enable quantification of mechanisms represented as biological networks. We have been able to quantify the magnitude of regulation of a biological network and illustrate that the magnitudes differ across spatial and temporal resolutions.
URL:
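The heat-diffusion concept mentioned in the abstract above can be illustrated with a generic sketch of discrete diffusion on an undirected network. This is not the CMPA algorithm itself; the update rule, the step size `alpha`, the toy network and the initial heat values are all assumptions made for illustration.

```python
def diffuse_heat(adjacency, heat, alpha=0.1, steps=10):
    """Spread node scores along the edges of an undirected network.

    Each step moves heat between neighbours in proportion to their
    difference: h_i <- h_i + alpha * sum_j (h_j - h_i).
    Assumption: alpha times the largest node degree stays below 1,
    which keeps the iteration stable.
    """
    h = dict(heat)
    for _ in range(steps):
        h = {
            node: h[node] + alpha * sum(h[nb] - h[node] for nb in adjacency[node])
            for node in adjacency
        }
    return h


# Hypothetical three-gene path network A - B - C.
network = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
# Hypothetical initial heat, e.g. an absolute expression change placed on gene A.
initial_heat = {"A": 1.0, "B": 0.0, "C": 0.0}
final_heat = diffuse_heat(network, initial_heat)
```

On an undirected network this update conserves the total heat, so the final values describe how a fixed amount of perturbation is redistributed over the topology.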
Predicting clinical progression trajectories of early Alzheimer’s disease patients.
BACKGROUND: Models for forecasting individual clinical progression trajectories in early Alzheimer’s disease (AD) are needed for optimizing clinical studies and patient monitoring. METHODS: Prediction models were constructed using a clinical trial training cohort (TC; n = 934) via a gradient boosting algorithm and then evaluated in two validation cohorts (VC 1, n = 235; VC 2, n = 421). Model inputs included baseline clinical features (cognitive function assessments, APOE epsilon4 status, and demographics) and brain magnetic resonance imaging (MRI) measures. RESULTS: The model using clinical features achieved R2 of 0.21 and 0.31 for predicting 2-year cognitive decline in VC 1 and VC 2, respectively. Adding MRI features improved the R2 to 0.29 in VC 1, which employed the same preprocessing pipeline as the TC. Utilizing these model-based predictions for clinical trial enrichment reduced the required sample size by 20% to 49%. DISCUSSION: Our validated prediction models enable baseline prediction of clinical progression trajectories in early AD, benefiting clinical trial enrichment and various applications.
URL:
Unraveling the multiple chronic conditions patterns among people with Alzheimer’s disease and related dementia: A machine learning approach to incorporate synergistic interactions.
INTRODUCTION: Most people with Alzheimer’s disease and related dementia (ADRD) also suffer from two or more chronic conditions, known as multiple chronic conditions (MCC). While many studies have investigated the MCC patterns, few studies have considered the synergistic interactions with other factors (called the syndemic factors) specifically for people with ADRD. METHODS: We included 40,290 visits and identified 18 MCC from the National Alzheimer’s Coordinating Center. Then, we utilized a multi-label XGBoost model to predict developing MCC based on existing MCC patterns and individualized syndemic factors. RESULTS: Our model achieved an overall arithmetic mean of 0.710 AUROC (SD = 0.100) in predicting 18 developing MCC. While existing MCC patterns have enough predictive power, syndemic factors related to dementia, social behaviors, mental and physical health can improve model performance further. DISCUSSION: Our study demonstrated that the MCC patterns among people with ADRD can be learned using a machine-learning approach with syndemic framework adjustments. HIGHLIGHTS: Machine learning models can learn the MCC patterns for people with ADRD. The learned MCC patterns should be adjusted and individualized by syndemic factors. The model can predict which disease is developing based on existing MCC patterns. As a result, this model enables early specific MCC identification and prevention.
URL:
Anatomically interpretable deep learning of brain age captures domain-specific cognitive impairment.
The gap between chronological age (CA) and biological brain age, as estimated from magnetic resonance images (MRIs), reflects how individual patterns of neuroanatomic aging deviate from their typical trajectories. MRI-derived brain age (BA) estimates are often obtained using deep learning models that may perform relatively poorly on new data or that lack neuroanatomic interpretability. This study introduces a convolutional neural network (CNN) to estimate BA after training on the MRIs of 4,681 cognitively normal (CN) participants and testing on 1,170 CN participants from an independent sample. BA estimation errors are notably lower than those of previous studies. At both individual and cohort levels, the CNN provides detailed anatomic maps of brain aging patterns that reveal sex dimorphisms and neurocognitive trajectories in adults with mild cognitive impairment (MCI, N = 351) and Alzheimer’s disease (AD, N = 359). In individuals with MCI (54% of whom were diagnosed with dementia within 10.9 y from MRI acquisition), BA is significantly better than CA in capturing dementia symptom severity, functional disability, and executive function. Profiles of sex dimorphism and lateralization in brain aging also map onto patterns of neuroanatomic change that reflect cognitive decline. Significant associations between BA and neurocognitive measures suggest that the proposed framework can map, systematically, the relationship between aging-related neuroanatomy changes in CN individuals and in participants with MCI or AD. Early identification of such neuroanatomy changes can help to screen individuals according to their AD risk.
URL:
Local MRI analysis approach in the diagnosis of early and prodromal Alzheimer’s disease.
BACKGROUND: Medial temporal lobe (MTL) atrophy is one of the key biomarkers to detect early neurodegenerative changes in the course of Alzheimer’s disease (AD). There is active research aimed at identifying automated methodologies able to extract accurate classification indexes from T1-weighted magnetic resonance images (MRI). Such indexes should be fit for identifying AD patients as early as possible. SUBJECTS: A reference group composed of 144 AD patients and 189 age-matched controls was used to train and test the procedure. It was then applied to a study group composed of 302 MCI subjects, 136 having progressed to clinically probable AD (MCI-converters) and 166 having remained stable or recovered to normal condition after a 24-month follow-up (MCI-non converters). All subjects came from the ADNI database. METHODS: We sampled the brain with 7 relatively small volumes, mainly centered on the MTL, and 2 control regions. These volumes were filtered to give intensity and textural MRI-based features. Each filtered region was analyzed with a Random Forest (RF) classifier to extract relevant features, which were subsequently processed with a Support Vector Machine (SVM) classifier. Once a prediction model was trained and tested on the reference group, it was used to compute a classification index (CI) on the MCI cohort and to assess its accuracy in predicting AD conversion in MCI patients. The performance of the classification based on the features extracted from all 9 volumes is compared with that derived from each single volume. All experiments were performed using a bootstrap sampling estimation, and classifier performance was cross-validated with a 20-fold paradigm. RESULTS: We identified a restricted set of image features correlated with the conversion to AD. It is shown that most information originates from a small subset of the total available features, and that this subset is enough to give a reliable assessment.
We found multiple, highly localized image-based features which alone are responsible for the overall clinical diagnosis and prognosis. The classification index is able to discriminate Controls from AD with an Area Under Curve (AUC) = 0.97 (sensitivity 89% at specificity 94%) and Controls from MCI-converters with an AUC = 0.92 (sensitivity 89% at specificity 80%). MCI-converters are separated from MCI-non converters with AUC = 0.74 (sensitivity 72% at specificity 65%). FINDINGS: The present automated MRI-based technique revealed a strong relationship between highly localized baseline-MRI features and the baseline clinical assessment. In addition, the classification index was also used to predict the probability of AD conversion within a time frame of two years. The definition of a single index combining local analysis of several regions can be useful to detect AD neurodegeneration in a typical MCI population.
URL:
dsRID: in silico identification of dsRNA regions using long-read RNA-seq data.
MOTIVATION: Double-stranded RNAs (dsRNAs) are potent triggers of innate immune responses upon recognition by cytosolic dsRNA sensor proteins. Identification of endogenous dsRNAs helps to better understand the dsRNAome and its relevance to innate immunity related to human diseases. RESULTS: Here, we report dsRID (double-stranded RNA identifier), a machine learning-based method to predict dsRNA regions in silico, leveraging the power of long-read RNA-sequencing (RNA-seq) and molecular traits of dsRNAs. Using models trained with PacBio long-read RNA-seq data derived from Alzheimer’s disease (AD) brain, we show that our approach is highly accurate in predicting dsRNA regions in multiple data sets. Applied to an AD cohort sequenced by the ENCODE consortium, we characterize the global dsRNA profile with potentially distinct expression patterns between AD and controls. Together, we show that dsRID provides an effective approach to capture global dsRNA profiles using long-read RNA-seq data. AVAILABILITY: Software implementation of dsRID, and genomic coordinates of regions predicted by dsRID in all samples are available at the GitHub repository: https://github.com/gxiaolab/dsRID. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
URL: https://github.com/gxiaolab/dsRID.
Propagating Uncertainty Across Cascaded Medical Imaging Tasks for Improved Deep Learning Inference.
Although deep networks have been shown to perform very well on a variety of medical imaging tasks, inference in the presence of pathology presents several challenges to common models. These challenges impede the integration of deep learning models into real clinical workflows, where the customary process of cascading deterministic outputs from a sequence of image-based inference steps (e.g. registration, segmentation) generally leads to an accumulation of errors that impacts the accuracy of downstream inference tasks. In this paper, we propose that by embedding uncertainty estimates across cascaded inference tasks, performance on the downstream inference tasks should be improved. We demonstrate the effectiveness of the proposed approach in three different clinical contexts: (i) We demonstrate that by propagating T2 weighted lesion segmentation results and their associated uncertainties, subsequent T2 lesion detection performance is improved when evaluated on a proprietary large-scale, multi-site, clinical trial dataset acquired from patients with Multiple Sclerosis. (ii) We show an improvement in brain tumour segmentation performance when the uncertainty map associated with a synthesised missing MR volume is provided as an additional input to a follow-up brain tumour segmentation network, when evaluated on the publicly available BraTS-2018 dataset. (iii) We show that by propagating uncertainties from a voxel-level hippocampus segmentation task, the subsequent regression of the Alzheimer’s disease clinical score is improved.
URL:
A whole-brain functional connectivity model of Alzheimer’s disease pathology.
INTRODUCTION: Alzheimer’s disease (AD) is characterized by the presence of two proteinopathies, amyloid and tau, which have a cascading effect on the functional and structural organization of the brain. METHODS: In this study, we used a supervised machine learning technique to build a model of functional connections that predicts cerebrospinal fluid (CSF) p-tau/Abeta42 (the PATH-fc model). Resting-state functional magnetic resonance imaging (fMRI) data from 289 older adults in the Alzheimer’s Disease Neuroimaging Initiative (ADNI) were utilized for this model. RESULTS: We successfully derived the PATH-fc model to predict the ratio of p-tau/Abeta42 as well as cognitive functioning in older adults across the spectrum of healthy and pathological aging. However, the in-sample fit magnitude was low, indicating a need for further model development. DISCUSSION: Our pathology-based model of functional connectivity included representation from multiple canonical networks of the brain with intra-network connectivity associated with low pathology and inter-network connectivity associated with higher levels of pathology. HIGHLIGHTS: Whole-brain functional connectivity model (PATH-fc) is linked to AD pathophysiology. The PATH-fc model predicts performance in multiple domains of cognitive functioning. The PATH-fc model is a distributed model including representation from all canonical networks.
URL:
MPI-VGAE: protein-metabolite enzymatic reaction link learning by variational graph autoencoders.
Enzymatic reactions are crucial to explore the mechanistic function of metabolites and proteins in cellular processes and to understand the etiology of diseases. The increasing number of interconnected metabolic reactions allows the development of in silico deep learning-based methods to discover new enzymatic reaction links between metabolites and proteins to further expand the landscape of existing metabolite-protein interactome. Computational approaches to predict the enzymatic reaction link by metabolite-protein interaction (MPI) prediction are still very limited. In this study, we developed a Variational Graph Autoencoders (VGAE)-based framework to predict MPI in genome-scale heterogeneous enzymatic reaction networks across ten organisms. By incorporating molecular features of metabolites and proteins as well as neighboring information in the MPI networks, our MPI-VGAE predictor achieved the best predictive performance compared to other machine learning methods. Moreover, when applying the MPI-VGAE framework to reconstruct hundreds of metabolic pathways, functional enzymatic reaction networks and a metabolite-metabolite interaction network, our method showed the most robust performance among all scenarios. To the best of our knowledge, this is the first MPI predictor by VGAE for enzymatic reaction link prediction. Furthermore, we implemented the MPI-VGAE framework to reconstruct the disease-specific MPI network based on the disrupted metabolites and proteins in Alzheimer’s disease and colorectal cancer, respectively. A substantial number of novel enzymatic reaction links were identified. We further validated and explored the interactions of these enzymatic reactions using molecular docking. These results highlight the potential of the MPI-VGAE framework for the discovery of novel disease-related enzymatic reactions and facilitate the study of the disrupted metabolisms in diseases.
URL:
Voxel-based classification of FDG PET in dementia using inter-scanner normalization.
Statistical mapping of FDG PET brain images has become a common tool in differential diagnosis of patients with dementia. We present a voxel-based classification system of neurodegenerative dementias based on partial least squares (PLS). Such a classifier relies on image databases of normal controls and dementia cases as training data. Variations in PET image characteristics can be expected between databases, for example due to differences in instrumentation, patient preparation, and image reconstruction. This study evaluates (i) the impact of databases from different scanners on classification accuracy and (ii) a method to improve inter-scanner classification. Brain FDG PET databases from three scanners (A, B, C) at two clinical sites were evaluated. Diagnostic categories included normal controls (NC, nA=26, nB=20, nC=24 for each scanner respectively), Alzheimer’s disease (AD, nA=44, nB=11, nC=16), and frontotemporal dementia (FTD, nA=13, nB=13, nC=5). Spatially normalized images were classified as NC, AD, or FTD using partial least squares. Supervised learning was employed to determine classifier parameters, whereby available data is sub-divided into training and test sets. Four different database setups were evaluated: (i) “in-scanner”: training and test data from the same scanner, (ii) “x-scanner”: training and test data from different scanners, (iii) “train other”: train on both x-scanners, and (iv) “train all”: train on all scanners. In order to moderate the impact of inter-scanner variations on image evaluation, voxel-by-voxel scaling was applied based on “ratio images”. Good classification accuracy of on average 94% was achieved for the in-scanner setups. Accuracy deteriorated for setups with mismatched scanners (79-91%). Ratio-image normalization improved all results with mismatched scanners (85-92%). In conclusion, automatic classification of individual FDG PET in differential diagnosis of dementia is feasible. 
Accuracy can vary with respect to scanner or acquisition characteristics of the training image data. The adopted approach of ratio-image normalization has been demonstrated to effectively moderate these effects.
URL:
AI-driven attenuation correction for brain PET/MRI: Clinical evaluation of a dementia cohort and importance of the training group size.
INTRODUCTION: Robust and reliable attenuation correction (AC) is a prerequisite for accurate quantification of activity concentration. In combined PET/MRI, AC is challenged by the lack of bone signal in the MRI from which the AC maps have to be derived. Deep learning-based image-to-image translation networks present themselves as an optimal solution for MRI-derived AC (MR-AC). High robustness and generalizability of these networks are expected to be achieved through large training cohorts. In this study, we implemented an MR-AC method based on deep learning, and investigated how training cohort size, transfer learning, and MR input affected robustness, and subsequently evaluated the method in a clinical setup, with the overall aim to explore if this method could be implemented in clinical routine for PET/MRI examinations. METHODS: A total cohort of 1037 adult subjects from the Siemens Biograph mMR with two different software versions (VB20P and VE11P) was used. The software upgrade included updates to all MRI sequences. The impact of training group size was investigated by training a convolutional neural network (CNN) on an increasing training group size from 10 to 403. The ability to adapt to changes in the input images between software versions was evaluated using transfer learning from a large cohort to a smaller cohort, by varying training group size from 5 to 91 subjects. The impact of MRI sequence was evaluated by training three networks based on the Dixon VIBE sequence (DeepDixon), T1-weighted MPRAGE (DeepT1), and ultra-short echo time (UTE) sequence (DeepUTE). Blinded clinical evaluation relative to the reference low-dose CT (CT-AC) was performed for DeepDixon in 104 independent 2-[18F]fluoro-2-deoxy-d-glucose ([18F]FDG) PET patient studies performed for suspected neurodegenerative disorder using statistical surface projections.
RESULTS: Robustness increased with group size in the training data set: 100 subjects were required to reduce the number of outliers compared to a state-of-the-art segmentation-based method, and a cohort >400 subjects further increased robustness in terms of reduced variation and number of outliers. When using transfer learning to adapt to changes in the MRI input, as few as five subjects were sufficient to minimize outliers. Full robustness was achieved at 20 subjects. Comparably robust and accurate results were obtained using all three types of MRI input with a bias below 1% relative to CT-AC in any brain region. The clinical PET evaluation using DeepDixon showed no clinically relevant differences compared to CT-AC. CONCLUSION: Deep learning-based AC requires a large training cohort to achieve accurate and robust performance. Using transfer learning, only five subjects were needed to fine-tune the method to large changes to the input images. No clinically relevant differences were found compared to CT-AC, indicating that clinical implementation of our deep learning-based MR-AC method will be feasible across MRI system types using transfer learning and a limited number of subjects.
URL:
Early prediction of Alzheimer’s disease and related dementias using real-world electronic health records.
INTRODUCTION: This study aims to explore machine learning (ML) methods for early prediction of Alzheimer’s disease (AD) and related dementias (ADRD) using the real-world electronic health records (EHRs). METHODS: A total of 23,835 ADRD and 1,038,643 control patients were identified from the OneFlorida+ Research Consortium. Two ML methods were used to develop the prediction models. Both knowledge-driven and data-driven approaches were explored. Four computable phenotyping algorithms were tested. RESULTS: The gradient boosting tree (GBT) models trained with the data-driven approach achieved the best area under the curve (AUC) scores of 0.939, 0.906, 0.884, and 0.854 for early prediction of ADRD 0, 1, 3, or 5 years before diagnosis, respectively. A number of important clinical and sociodemographic factors were identified. DISCUSSION: We tested various settings and showed the predictive ability of using ML approaches for early prediction of ADRD with EHRs. The models can help identify high-risk individuals for early informed preventive or prognostic clinical decisions.
URL:
Removing inter-subject technical variability in magnetic resonance imaging studies.
Magnetic resonance imaging (MRI) intensities are acquired in arbitrary units, making scans non-comparable across sites and between subjects. Intensity normalization is a first step for the improvement of comparability of the images across subjects. However, we show that unwanted inter-scan variability associated with imaging site, scanner effect, and other technical artifacts is still present after standard intensity normalization in large multi-site neuroimaging studies. We propose RAVEL (Removal of Artificial Voxel Effect by Linear regression), a tool to remove residual technical variability after intensity normalization. As proposed by SVA and RUV [Leek and Storey, 2007, 2008, Gagnon-Bartsch and Speed, 2012], two batch effect correction tools largely used in genomics, we decompose the voxel intensities of images registered to a template into a biological component and an unwanted variation component. The unwanted variation component is estimated from a control region obtained from the cerebrospinal fluid (CSF), where intensities are known to be unassociated with disease status and other clinical covariates. We perform a singular value decomposition (SVD) of the control voxels to estimate factors of unwanted variation. We then estimate the unwanted factors using linear regression for every voxel of the brain and take the residuals as the RAVEL-corrected intensities. We assess the performance of RAVEL using T1-weighted (T1-w) images from more than 900 subjects with Alzheimer’s disease (AD) and mild cognitive impairment (MCI), as well as healthy controls from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database. We compare RAVEL to two intensity-normalization-only methods: histogram matching and White Stripe. 
We show that RAVEL performs best at improving the replicability of the brain regions that are empirically found to be most associated with AD, and that these regions are significantly more present in structures impacted by AD (hippocampus, amygdala, parahippocampal gyrus, entorhinal area, and fornix stria terminalis). In addition, we show that the RAVEL-corrected intensities have the best performance in distinguishing between MCI subjects and healthy subjects using the mean hippocampal intensity (AUC=67%), a marked improvement compared to results from intensity normalization alone (AUC=63% and 59% for histogram matching and White Stripe, respectively). RAVEL is promising for many other imaging modalities.
URL:
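The RAVEL steps described in the abstract above (SVD of control-region voxels, then per-voxel linear regression on the estimated factors, keeping the residuals) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `ravel_correct`, the voxels-by-subjects matrix layout, the absence of an intercept term, and the synthetic data are all hypothetical choices made for the example.

```python
import numpy as np

def ravel_correct(V, control_rows, k=1):
    """Sketch of a RAVEL-style correction (assumed layout: voxels x subjects).

    V            : normalized intensities of images registered to a template
    control_rows : row indices of control voxels (e.g. a CSF mask)
    k            : number of unwanted-variation factors to remove
    """
    V = np.asarray(V, dtype=float)
    control = V[control_rows, :]
    # SVD of the control voxels; the leading right singular vectors
    # describe subject-level variation shared across the control region.
    _, _, vt = np.linalg.svd(control, full_matrices=False)
    Z = vt[:k, :].T                      # (n_subjects, k), orthonormal columns
    # Per-voxel least squares on Z; since Z^T Z = I the coefficients are V @ Z.
    gamma = V @ Z                        # (n_voxels, k)
    corrected = V - gamma @ Z.T          # residuals = corrected intensities
    return corrected, Z

# Synthetic demonstration: one scanner-like factor added to small noise.
rng = np.random.default_rng(0)
n_vox, n_sub = 50, 12
scanner = rng.normal(size=n_sub)             # hypothetical technical effect
loading = rng.normal(size=(n_vox, 1))        # how strongly each voxel is affected
V = rng.normal(scale=0.1, size=(n_vox, n_sub)) + loading * scanner
corrected, Z = ravel_correct(V, control_rows=np.arange(10), k=1)
```

After correction, every voxel's intensity profile across subjects is orthogonal to the estimated factors, which is the sense in which the unwanted component has been regressed out in this sketch.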
A gene-level methylome-wide association analysis identifies novel Alzheimer’s disease genes.
MOTIVATION: Transcriptome-wide association studies (TWAS) have successfully facilitated the discovery of novel genetic risk loci for many complex traits, including late-onset Alzheimer’s disease (AD). However, most existing TWAS methods rely only on gene expression and ignore epigenetic modification (i.e., DNA methylation) and functional regulatory information (i.e., enhancer-promoter interactions), both of which contribute significantly to the genetic basis of AD. RESULTS: We develop a novel gene-level association testing method that integrates genetically regulated DNA methylation and enhancer-target gene pairs with genome-wide association study (GWAS) summary results. Through simulations, we show that our approach, referred to as the CMO (cross methylome omnibus) test, yielded well-controlled type I error rates and achieved much higher statistical power than competing methods under a wide range of scenarios. Furthermore, compared with TWAS, CMO identified an average of 124% more associations when analyzing several brain imaging-related GWAS results. By analyzing the largest AD GWAS to date, with 71,880 cases and 383,378 controls, CMO identified six novel loci for AD, which have been ignored by competing methods. AVAILABILITY: Software: https://github.com/ChongWuLab/CMO. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
URL: https://github.com/ChongWuLab/CMO.
Resting state FDG-PET functional connectivity as an early biomarker of Alzheimer’s disease using conjoint univariate and independent component analyses.
Imaging cerebral glucose metabolism with positron emission tomography (PET) in Alzheimer’s disease (AD) has allowed for improved characterisation of this pathology. Such patterns are typically analysed using either univariate or multivariate statistical techniques. In this work we combined voxel-based group analysis and independent component analysis to extract differential characteristic patterns from PET data of glucose metabolism in a large cohort of normal elderly controls and patients with AD. The patterns were used in conjunction with a support vector machine to discriminate between subjects with mild cognitive impairment (MCI) at risk or not of converting to AD. The method was applied to baseline fluoro-deoxyglucose (FDG)-PET images of subjects from the ADNI database. Our approach achieved improved early detection and differentiation of typical versus pathological metabolic patterns in the MCI population, reaching 80% accuracy (85% sensitivity and 75% specificity) when using selected regions. The method has the potential to assist in the advance diagnosis of Alzheimer’s disease, and to identify early in the development of the disease those individuals at high risk of rapid cognitive decline who could be candidates for new therapeutic approaches.
URL:
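As a worked check on how the sensitivity, specificity, and accuracy figures in the abstract above relate, the sketch below computes all three from confusion-matrix counts. The counts are hypothetical and chosen only to reproduce 85% sensitivity and 75% specificity; the abstract does not report group sizes. Accuracy equals 80% when the two groups are the same size and shifts when they are not.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion counts."""
    sensitivity = tp / (tp + fn)                 # true positive rate
    specificity = tn / (tn + fp)                 # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # overall fraction correct
    return sensitivity, specificity, accuracy

# Hypothetical balanced groups: 100 converters, 100 non-converters.
balanced = confusion_metrics(tp=85, fn=15, tn=75, fp=25)

# Hypothetical unbalanced groups: 40 converters, 160 non-converters,
# with the same sensitivity and specificity.
unbalanced = confusion_metrics(tp=34, fn=6, tn=120, fp=40)

print(balanced)
print(unbalanced)
```

With equal sensitivity and specificity in both cases, the balanced example gives an accuracy of 0.80 and the unbalanced one 0.77, so accuracy alone cannot be compared across cohorts with different class proportions.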
Predicting probable Alzheimer’s disease using linguistic deficits and biomarkers.
BACKGROUND: The manual diagnosis of neurodegenerative disorders such as Alzheimer’s disease (AD) and related dementias has been a challenge. Currently, these disorders are diagnosed using specific clinical diagnostic criteria and neuropsychological examinations. The use of several Machine Learning algorithms to build automated diagnostic models using low-level linguistic features resulting from verbal utterances could aid diagnosis of patients with probable AD from a large population. For this purpose, we developed different Machine Learning models on the DementiaBank language transcript clinical dataset, consisting of 99 patients with probable AD and 99 healthy controls. RESULTS: Our models learned several syntactic, lexical, and n-gram linguistic biomarkers to distinguish the probable AD group from the healthy group. In contrast to the healthy group, we found that the probable AD patients had significantly less usage of syntactic components and significantly higher usage of lexical components in their language. Also, we observed a significant difference in the use of n-grams as the healthy group were able to identify and make sense of more objects in their n-grams than the probable AD group. As such, our best diagnostic model significantly distinguished the probable AD group from the healthy elderly group with a better Area Under the Receiver Operating Characteristic Curve (AUC) using the Support Vector Machines (SVM). CONCLUSIONS: Experimental and statistical evaluations suggest that using ML algorithms for learning linguistic biomarkers from the verbal utterances of elderly individuals could help the clinical diagnosis of probable AD. We emphasise that the best ML model for predicting the disease group combines significant syntactic, lexical and top n-gram features. However, there is a need to train the diagnostic models on larger datasets, which could lead to a better AUC and clinical diagnosis of probable AD.
URL:
Early dementia diagnosis, MCI-to-dementia risk prediction, and the role of machine learning methods for feature extraction from integrated biomarkers, in particular for EEG signal analysis.
INTRODUCTION: Dementia in its various forms represents one of the most frightening emergencies for the aging population. Cognitive decline, including Alzheimer’s disease (AD) dementia, does not develop in a few days; disease mechanisms act progressively for several years before clinical evidence. METHODS: A preclinical stage, characterized by measurable cognitive impairment, but not overt dementia, is represented by mild cognitive impairment (MCI), which progresses to (or, more accurately, is already a prodromal form of) AD in about half of cases; people with MCI are therefore considered the population at risk for AD deserving special attention for validating screening methods. RESULTS: Graph analysis tools, combined with machine learning methods, represent an interesting probe to identify the distinctive features of physiological/pathological brain aging focusing on functional connectivity networks evaluated on electroencephalographic data and neuropsychological/imaging/genetic/metabolic/cerebrospinal fluid/blood biomarkers. DISCUSSION: On clinical data, this innovative approach for early diagnosis might provide more insight into pathophysiological processes underlying degenerative changes, as well as toward a personalized risk evaluation for pharmacological, nonpharmacological, and rehabilitation treatments.
URL:
Predicting dysfunctional age-related task activations from resting-state network alterations.
Alzheimer’s disease (AD) is linked to changes in fMRI task activations and fMRI resting-state functional connectivity (restFC), which can emerge early in the illness timecourse. These fMRI correlates of unhealthy aging have been studied in largely separate subfields. Taking inspiration from neural network simulations, we propose a unifying mechanism wherein restFC alterations associated with AD disrupt the flow of activations between brain regions, leading to aberrant task activations. We apply this activity flow model in a large sample of clinically normal older adults, which was segregated into healthy (low-risk) and at-risk subgroups based on established imaging (positron emission tomography amyloid) and genetic (apolipoprotein) AD risk factors. Modeling the flow of healthy activations over at-risk AD connectivity effectively transformed the healthy aged activations into unhealthy (at-risk) aged activations. This enabled reliable prediction of at-risk AD task activations, and these predicted activations were related to individual differences in task behavior. These results support activity flow over altered intrinsic functional connections as a mechanism underlying Alzheimer’s-related dysfunction, even in very early stages of the illness. Beyond these mechanistic insights, this approach raises clinical potential by enabling prediction of task activations and associated cognitive dysfunction in individuals without requiring them to perform in-scanner cognitive tasks.
URL:
Evaluating the reliability of neurocognitive biomarkers of neurodegenerative diseases across countries: A machine learning approach.
Accurate early diagnosis of neurodegenerative diseases represents a growing challenge for current clinical practice. Promisingly, current tools can be complemented by computational decision-support methods to objectively analyze multidimensional measures and increase diagnostic confidence. Yet, widespread application of these tools cannot be recommended unless they are proven to perform consistently and reproducibly across samples from different countries. We implemented machine-learning algorithms to evaluate the prediction power of neurocognitive biomarkers (behavioral and imaging measures) for classifying two neurodegenerative conditions, Alzheimer’s disease (AD) and behavioral variant frontotemporal dementia (bvFTD), across three different countries (>200 participants). We use machine-learning tools integrating multimodal measures such as cognitive scores (executive functions and cognitive screening) and brain atrophy volume (voxel-based morphometry from fronto-temporo-insular regions in bvFTD, and temporo-parietal regions in AD) to identify the most relevant features in predicting the incidence of the diseases. In the Country-1 cohort, predictions of AD and bvFTD became maximally improved upon inclusion of cognitive screening outcomes combined with atrophy levels. Multimodal training data from this cohort allowed predicting both AD and bvFTD in the other two novel datasets from other countries with high accuracy (>90%), demonstrating the robustness of the approach as well as the differential specificity and reliability of behavioral and neural markers for each condition. In sum, this is the first study, across centers and countries, to validate the predictive power of cognitive signatures combined with atrophy levels for contrastive neurodegenerative conditions, validating a benchmark for future assessments of reliability and reproducibility.
URL:
Late combination shows that MEG adds to MRI in classifying MCI versus controls.
Early detection of Alzheimer’s disease (AD) is essential for developing effective treatments. Neuroimaging techniques like Magnetic Resonance Imaging (MRI) have the potential to detect brain changes before symptoms emerge. Structural MRI can detect atrophy related to AD, but it is possible that functional changes are observed even earlier. We therefore examined the potential of Magnetoencephalography (MEG) to detect differences in functional brain activity in people with Mild Cognitive Impairment (MCI) - a state at risk of early AD. We introduce a framework for multimodal combination to ask whether MEG data from a resting-state provides complementary information beyond structural MRI data in the classification of MCI versus controls. More specifically, we used multi-kernel learning of support vector machines to classify 163 MCI cases versus 144 healthy elderly controls from the BioFIND dataset. When using the covariance of planar gradiometer data in the low Gamma range (30-48 Hz), we found that adding a MEG kernel improved classification accuracy above kernels that captured several potential confounds (e.g., age, education, time-of-day, head motion). However, accuracy using MEG alone (68%) was worse than MRI alone (71%). When simply concatenating (normalized) features from MEG and MRI into one kernel (Early combination), there was no advantage of combining MEG with MRI versus MRI alone. When combining kernels of modality-specific features (Intermediate combination), there was an improvement in multimodal classification to 74%. The biggest multimodal improvement however occurred when we combined kernels from the predictions of modality-specific classifiers (Late combination), which achieved 77% accuracy (a reliable improvement in terms of permutation testing). 
We also explored other MEG features, such as the variance versus covariance of magnetometer versus planar gradiometer data within each of 6 frequency bands (delta, theta, alpha, beta, low gamma, or high gamma), and found that they generally provided complementary information for classification above MRI. We conclude that MEG can improve on the MRI-based classification of MCI.
URL:
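The "Late combination" idea in the abstract above, where modality-specific classifiers are trained separately and only their predictions are merged, can be illustrated with a much simpler stand-in than the multi-kernel SVM used in the study: a weighted average of per-modality scores. This is a minimal sketch, not the authors' method; the function name `late_fusion`, the equal default weights, and the toy scores are all assumptions made for illustration.

```python
def late_fusion(modality_scores, weights=None):
    """Combine per-modality classifier scores by weighted averaging.

    modality_scores: one list of scores per modality, each of length
    n_subjects (for example, predicted probabilities of MCI).
    weights: optional non-negative weight per modality (hypothetical).
    """
    if not modality_scores:
        raise ValueError("need at least one modality")
    n_subjects = len(modality_scores[0])
    if any(len(scores) != n_subjects for scores in modality_scores):
        raise ValueError("all modalities must score the same subjects")
    if weights is None:
        weights = [1.0] * len(modality_scores)
    if len(weights) != len(modality_scores):
        raise ValueError("one weight per modality is required")
    total = float(sum(weights))
    return [
        sum(w * scores[i] for w, scores in zip(weights, modality_scores)) / total
        for i in range(n_subjects)
    ]


# Toy scores, made up for illustration only.
meg_scores = [0.2, 0.8, 0.55]
mri_scores = [0.6, 0.4, 0.75]
fused = late_fusion([meg_scores, mri_scores])
labels = [int(score >= 0.5) for score in fused]  # 1 = predicted MCI
```

Averaging is only one way to merge predictions; a second-stage classifier trained on the per-modality outputs would be closer in spirit to combining kernels of predictions.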
Natural language processing-based classification of early Alzheimer’s disease from connected speech.
INTRODUCTION: The automated analysis of connected speech using natural language processing (NLP) emerges as a possible biomarker for Alzheimer’s disease (AD). However, it remains unclear which types of connected speech are most sensitive and specific for the detection of AD. METHODS: We applied a language model to automatically transcribed connected speech from 114 Flemish-speaking individuals to first distinguish early AD patients from amyloid negative cognitively unimpaired (CU) and then amyloid negative from amyloid positive CU individuals using five different types of connected speech. RESULTS: The language model was able to distinguish between amyloid negative CU subjects and AD patients with up to 81.9% sensitivity and 81.8% specificity. Discrimination between amyloid positive and negative CU individuals was less accurate, with up to 82.7% sensitivity and 74.0% specificity. Moreover, autobiographical interviews consistently outperformed scene descriptions. DISCUSSION: Our findings highlight the value of autobiographical interviews for the automated analysis of connected speech. HIGHLIGHTS: This study compared five types of connected speech for the detection of early Alzheimer’s disease (AD). Autobiographical interviews yielded a higher specificity than scene descriptions. A preceding clinical AD classification task can refine the performance of amyloid status classification in cognitively healthy individuals.
URL:
Embracing the disharmony in medical imaging: A Simple and effective framework for domain adaptation.
Domain shift, the mismatch between training and testing data characteristics, causes significant degradation in the predictive performance in multi-source imaging scenarios. In medical imaging, the heterogeneity of population, scanners and acquisition protocols at different sites presents a significant domain shift challenge and has limited the widespread clinical adoption of machine learning models. Harmonization methods, which aim to learn a representation of data invariant to these differences are the prevalent tools to address domain shift, but they typically result in degradation of predictive accuracy. This paper takes a different perspective of the problem: we embrace this disharmony in data and design a simple but effective framework for tackling domain shift. The key idea, based on our theoretical arguments, is to build a pretrained classifier on the source data and adapt this model to new data. The classifier can be fine-tuned for intra-study domain adaptation. We can also tackle situations where we do not have access to ground-truth labels on target data; we show how one can use auxiliary tasks for adaptation; these tasks employ covariates such as age, gender and race which are easy to obtain but nevertheless correlated to the main task. We demonstrate substantial improvements in both intra-study domain adaptation and inter-study domain generalization on large-scale real-world 3D brain MRI datasets for classifying Alzheimer’s disease and schizophrenia.
URL:
rPOP: Robust PET-only processing of community acquired heterogeneous amyloid-PET data.
The reference standard for amyloid-PET quantification requires structural MRI (sMRI) for preprocessing in both multi-site research studies and clinical trials. Here we describe rPOP (robust PET-Only Processing), a MATLAB-based MRI-free pipeline implementing non-linear warping and differential smoothing of amyloid-PET scans performed with any of the FDA-approved radiotracers (18F-florbetapir/FBP, 18F-florbetaben/FBB or 18F-flutemetamol/FLUTE). Each image undergoes spatial normalization based on weighted PET templates and data-driven differential smoothing, then allowing users to perform their quantification of choice. Prior to normalization, users can choose whether to automatically reset the origin of the image to the center of mass or proceed with the pipeline with the image as it is. We validate rPOP with n = 740 (514 FBP, 182 FBB, 44 FLUTE) amyloid-PET scans from the Imaging Dementia-Evidence for Amyloid Scanning - Brain Health Registry sub-study (IDEAS-BHR) and n = 1,518 scans from the Alzheimer’s Disease Neuroimaging Initiative (n = 1,249 FBP, n = 269 FBB), including heterogeneous acquisition and reconstruction protocols. After running rPOP, a standard quantification to extract Standardized Uptake Value ratios and the respective Centiloids conversion was performed. rPOP-based amyloid status (using an independent pathology-based threshold of >=24.4 Centiloid units) was compared with either local visual reads (IDEAS-BHR, n = 663 with complete valid data and reads available) or with amyloid status derived from an MRI-based PET processing pipeline (ADNI, thresholds of >20/>18 Centiloids for FBP/FBB). Finally, within the ADNI dataset, we tested the linear associations between rPOP- and MRI-based Centiloid values. rPOP achieved accurate warping for N = 2,233/2,258 (98.9%) in the first pass. Of the N = 25 warping failures, 24 were rescued with manual reorientation and origin reset prior to warping. 
We observed high concordance between rPOP-based amyloid status and both visual reads (IDEAS-BHR, Cohen’s kappa = 0.72 [0.70-0.74], ~86% concordance) and MRI-pipeline based amyloid status (ADNI, kappa = 0.88 [0.87-0.89], ~94% concordance). rPOP- and MRI-pipeline based Centiloids were strongly linearly related (R² = 0.95, p<0.001), with this association being significantly modulated by estimated PET resolution (beta = -0.016, p<0.001). rPOP provides reliable MRI-free amyloid-PET warping and quantification, leveraging widely available software and only requiring an attenuation-corrected amyloid-PET image as input. The rPOP pipeline enables the comparison and merging of heterogeneous datasets and is publicly available at https://github.com/leoiacca/rPOP.
URL: https://github.com/leoiacca/rPOP.
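The concordance figures in the rPOP abstract are reported as Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below shows the standard formula on made-up binary amyloid calls; it is not part of rPOP (which is MATLAB-based), and the function name `cohen_kappa` and the toy labels are assumptions for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two raters' marginal frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    categories = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1.0 - expected)


# Toy amyloid status calls (1 = positive), made up for illustration.
pipeline_calls = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
visual_reads = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0]
kappa = cohen_kappa(pipeline_calls, visual_reads)
```

In this toy case the raw agreement is 80% but kappa is lower, because half of that agreement would be expected by chance with balanced classes.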
Impaired time-distance reconfiguration patterns in Alzheimer’s disease: a dynamic functional connectivity study with 809 individuals from 7 sites.
BACKGROUND: Dynamic functional connectivity (dFC) has been used successfully to investigate brain dysfunction in Alzheimer’s disease (AD) patients. The reconfiguration intensity of nodal dFC, defined as the degree of alteration between FCs at different time scales, could provide additional information for understanding the reconfiguration of brain connectivity. RESULTS: In this paper, we introduced a feature named time distance nodal connectivity diversity (tdNCD), and then evaluated the network reconfiguration intensity in every specific brain region in AD using a large multicenter dataset (N = 809 from 7 independent sites). Our results showed that the dysfunction involved three subnetworks in AD: the default mode network (DMN), the subcortical network (SCN), and the cerebellum network (CBN). The nodal tdNCD inside the DMN increased in AD compared to normal controls, and the nodal dynamic FC of the SCN and the CBN decreased in AD. Additionally, the classification analysis showed that performance in classifying AD versus normal controls was better when tdNCD and FC were combined (ACC = 81%, SEN = 83.4%, SPE = 80.6%, and F1-score = 79.4%) than when only FC was used (ACC = 78.2%, SEN = 76.2%, SPE = 76.5%, and F1-score = 77.5%) with a leave-one-site-out cross-validation. Besides, the performance of the three-class classification improved from 50% (only using FC) to 53.3% (combined FC and tdNCD) (macro F1-score from 46.8% to 48.9%). More importantly, the classification model showed significant clinically predictive correlations (two-class classification: R = -0.38, P < 0.001; three-class classification: R = -0.404, P < 0.001). In addition, several commonly used machine learning models confirmed that the tdNCD would provide additional information for classifying AD from normal controls. CONCLUSIONS: The present study demonstrated abnormal dynamic reconfiguration of nodal FC in AD. 
The tdNCD highlights the potential for further understanding core mechanisms of brain dysfunction in AD. Evaluating the tdNCD FC provides a promising way to understand AD processes better and investigate novel diagnostic brain imaging biomarkers for AD.
URL:
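The leave-one-site-out cross-validation mentioned in the abstract above holds out all subjects from one acquisition site per fold, so that every test fold comes from a site the model never saw during training. A minimal sketch of the splitting logic follows; the function name `leave_one_site_out` and the toy site labels are hypothetical, and the study's actual feature extraction and classifier are not reproduced here.

```python
def leave_one_site_out(sites):
    """Yield (held_out_site, train_indices, test_indices) per distinct site.

    sites: one site label per subject, in subject order.
    """
    for held_out in sorted(set(sites)):
        train_idx = [i for i, s in enumerate(sites) if s != held_out]
        test_idx = [i for i, s in enumerate(sites) if s == held_out]
        yield held_out, train_idx, test_idx


# Toy site labels for five subjects, made up for illustration.
subject_sites = ["A", "A", "B", "C", "B"]
folds = list(leave_one_site_out(subject_sites))
```

Each fold would then train on `train_idx` and evaluate on `test_idx`, with the per-site results pooled or averaged to obtain overall accuracy, sensitivity, and specificity.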
Binary classification of 18F-flutemetamol PET using machine learning: comparison with visual reads and structural MRI.
(18)F-flutemetamol is a positron emission tomography (PET) tracer for in vivo amyloid imaging. The ability to classify amyloid scans in a binary manner as ‘normal’ versus ‘Alzheimer-like’, is of high clinical relevance. We evaluated whether a supervised machine learning technique, support vector machines (SVM), can replicate the assignments made by visual readers blind to the clinical diagnosis, which image components have highest diagnostic value according to SVM and how (18)F-flutemetamol-based classification using SVM relates to structural MRI-based classification using SVM within the same subjects. By means of SVM with a linear kernel, we analyzed (18)F-flutemetamol scans and volumetric MRI scans from 72 cases from the (18)F-flutemetamol phase 2 study (27 clinically probable Alzheimer’s disease (AD), 20 amnestic mild cognitive impairment (MCI), 25 controls). In a leave-one-out approach, we trained the (18)F-flutemetamol based classifier by means of the visual reads and tested whether the classifier was able to reproduce the assignment based on visual reads and which voxels had the highest feature weights. The (18)F-flutemetamol based classifier was able to replicate the assignments obtained by visual reads with 100% accuracy. The voxels with highest feature weights were in the striatum, precuneus, cingulate and middle frontal gyrus. Second, to determine concordance between the gray matter volume- and the (18)F-flutemetamol-based classification, we trained the classifier with the clinical diagnosis as gold standard. Overall sensitivity of the (18)F-flutemetamol- and the gray matter volume-based classifiers were identical (85.2%), albeit with discordant classification in three cases. Specificity of the (18)F-flutemetamol based classifier was 92% compared to 68% for MRI. In the MCI group, the (18)F-flutemetamol based classifier distinguished more reliably between converters and non-converters than the gray matter-based classifier. 
The visual read-based binary classification of (18)F-flutemetamol scans can be replicated using SVM. In this sample the specificity of (18)F-flutemetamol based SVM for distinguishing AD from controls is higher than that of gray matter volume-based SVM.
URL:
Deep learning-based brain age prediction in normal aging and dementia.
Brain aging is accompanied by patterns of functional and structural change. Alzheimer’s disease (AD), a representative neurodegenerative disease, has been linked to accelerated brain aging. Here, we developed a deep learning-based brain age prediction model using a large collection of fluorodeoxyglucose positron emission tomography and structural magnetic resonance imaging and tested how the brain age gap relates to degenerative syndromes including mild cognitive impairment, AD, frontotemporal dementia and Lewy body dementia. Occlusion analysis, performed to facilitate the interpretation of the model, revealed that the model learns an age- and modality-specific pattern of brain aging. The elevated brain age gap was highly correlated with cognitive impairment and the AD biomarker. The higher gap also showed a longitudinal predictive nature across clinical categories, including cognitively unimpaired individuals who converted to a clinical stage. However, regions generating brain age gaps were different for each diagnostic group of which the AD continuum showed similar patterns to normal aging.
URL:
Amelioration of Alzheimer’s disease pathology by mitophagy inducers identified via machine learning and a cross-species workflow.
A reduced removal of dysfunctional mitochondria is common to aging and age-related neurodegenerative pathologies such as Alzheimer’s disease (AD). Strategies for treating such impaired mitophagy would benefit from the identification of mitophagy modulators. Here we report the combined use of unsupervised machine learning (involving vector representations of molecular structures, pharmacophore fingerprinting and conformer fingerprinting) and a cross-species approach for the screening and experimental validation of new mitophagy-inducing compounds. From a library of naturally occurring compounds, the workflow allowed us to identify 18 small molecules, and among them two potent mitophagy inducers (Kaempferol and Rhapontigenin). In nematode and rodent models of AD, we show that both mitophagy inducers increased the survival and functionality of glutamatergic and cholinergic neurons, abrogated amyloid-beta and tau pathologies, and improved the animals’ memory. Our findings suggest the existence of a conserved mechanism of memory loss across the AD models, this mechanism being mediated by defective mitophagy. The computational-experimental screening and validation workflow might help uncover potent mitophagy modulators that stimulate neuronal health and brain homeostasis.
URL:
Predicting Amyloid Positivity in Cognitively Unimpaired Older Adults: A Machine Learning Approach Using A4 Data.
BACKGROUND AND OBJECTIVES: To develop and test the performance of the Positive Abeta Risk Score (PARS) for prediction of beta-amyloid (Abeta) positivity in cognitively unimpaired individuals for use in clinical research. Detecting Abeta positivity is essential for identifying at-risk individuals who are candidates for early intervention with amyloid targeted treatments. METHODS: We used data from 4,134 cognitively normal individuals from the Anti-Amyloid Treatment in Asymptomatic Alzheimer’s (A4) Study. The sample was divided into training and test sets. A modified version of AutoScore, a machine learning-based software tool, was used to develop a scoring system using the training set. Three risk scores were developed using candidate predictors in various combinations from the following categories: demographics (age, sex, education, race, family history, body mass index, marital status, and ethnicity), subjective measures (Alzheimer’s Disease Cooperative Study Activities of Daily Living-Prevention Instrument, Geriatric Depression Scale, and Memory Complaint Questionnaire), objective measures (free recall, Mini-Mental State Examination, immediate recall, digit symbol substitution, and delayed logical memory scores), and APOE4 status. Performance of the risk scores was evaluated in the independent test set. RESULTS: PARS model 1 included age, body mass index (BMI), and family history and had an area under the curve (AUC) of 0.60 (95% CI 0.57-0.64). PARS model 2 included free recall in addition to the PARS model 1 variables and had an AUC of 0.61 (0.58-0.64). PARS model 3, which consisted of age, BMI, and APOE4 information, had an AUC of 0.73 (0.70-0.76). PARS model 3 showed the highest, but still moderate, performance metrics in comparison with other models with sensitivity of 72.0% (67.6%-76.4%), specificity of 62.1% (58.8%-65.4%), accuracy of 65.3% (62.7%-68.0%), and positive predictive value of 48.1% (44.1%-52.1%). 
DISCUSSION: PARS models are a set of simple and practical risk scores that may improve our ability to identify individuals more likely to be amyloid positive. The models can potentially be used to enrich trials and serve as a screening step in research settings. This approach can be followed by the use of additional variables for the development of improved risk scores. CLASSIFICATION OF EVIDENCE: This study provides Class II evidence that in cognitively unimpaired individuals PARS models predict Abeta positivity with moderate accuracy.
URL:
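The PARS abstract reports area under the ROC curve (AUC) alongside sensitivity, specificity, accuracy, and positive predictive value. The sketch below computes these from risk scores and binary labels, using the rank-based (Mann-Whitney) definition of AUC; it is a generic illustration rather than the AutoScore procedure, and the function names, threshold, and toy data are assumptions.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


def binary_metrics(labels, predictions):
    """Sensitivity, specificity, PPV, and accuracy from binary predictions."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
    }


# Toy labels (1 = amyloid positive) and risk scores, made up for illustration.
y_true = [1, 1, 0, 0, 1, 0]
risk = [0.9, 0.4, 0.5, 0.2, 0.7, 0.1]
auc = roc_auc(y_true, risk)
metrics = binary_metrics(y_true, [int(r >= 0.45) for r in risk])
```

AUC is threshold-free, whereas the other four metrics depend on the chosen cut-off (0.45 here is arbitrary).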
Identification of expression patterns in the progression of disease stages by integration of transcriptomic data.
BACKGROUND: In the study of complex diseases using genome-wide expression data from clinical samples, a difficult case is the identification and mapping of the gene signatures associated with the stages that occur in the progression of a disease. The stages usually correspond to different subtypes or classes of the disease, and the difficulty of identifying them often comes from patient heterogeneity and sample variability that can hide the biomedically relevant changes that characterize each stage, making standard differential analysis inadequate or inefficient. RESULTS: We propose a methodology to study diseases or disease stages ordered in a sequential manner (e.g. from early stages with good prognosis to more acute or serious stages associated with poor prognosis). The methodology is applied to diseases that have been studied by genome-wide expression profiling of cohorts of patients at different stages. The approach allows searching for consistent expression patterns along the progression of the disease through two major steps: (i) identifying genes with increasing or decreasing trends in the progression of the disease; (ii) clustering the increasing/decreasing gene expression patterns using an unsupervised approach to reveal whether there are consistent patterns and find genes altered at specific disease stages. The first step is carried out using Gamma rank correlation to identify genes whose expression correlates with a categorical variable that represents the stages of the disease. The second step is done using a Self Organizing Map (SOM) to cluster the genes according to their progressive profiles and identify specific patterns. Both steps are done after normalization of the genomic data to allow the integration of multiple independent datasets. 
In order to validate the results and evaluate their consistency and biological relevance, the methodology is applied to datasets of three different diseases: myelodysplastic syndrome, colorectal cancer and Alzheimer’s disease. A software script written in R, named genediseasePatterns, is provided to allow the use and application of the methodology. CONCLUSION: The method presented allows the analysis of the progression of complex and heterogeneous diseases that can be divided in pathological stages. It identifies gene groups whose expression patterns change along the advance of the disease, and it can be applied to different types of genomic data studying cohorts of patients in different states.
URL:
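The first step described in the abstract above uses Gamma rank correlation to find genes whose expression rises or falls with an ordered disease stage. Assuming this refers to the Goodman-Kruskal gamma statistic, a minimal sketch is given below: it counts concordant and discordant pairs and ignores ties, which suits a stage variable with many tied values. The function name and toy data are hypothetical, and this is not the authors' R script.

```python
def goodman_kruskal_gamma(x, y):
    """Goodman-Kruskal gamma: (C - D) / (C + D) over all untied pairs."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = 0
    discordant = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            if product > 0:
                concordant += 1
            elif product < 0:
                discordant += 1
            # product == 0 means a tie in x or y; ties are ignored
    if concordant + discordant == 0:
        return 0.0
    return (concordant - discordant) / (concordant + discordant)


# Toy data: ordered disease stage per sample and one gene's expression.
stages = [1, 1, 2, 2, 3, 3]
expression = [0.1, 0.3, 0.5, 0.4, 0.9, 0.8]
gamma = goodman_kruskal_gamma(stages, expression)
```

A gamma near +1 marks a gene with an increasing trend across stages and a value near -1 a decreasing trend; genes passing a chosen cut-off would then go on to the clustering step.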
A computational neurodegenerative disease progression score: method and results with the Alzheimer’s disease Neuroimaging Initiative cohort.
While neurodegenerative diseases are characterized by steady degeneration over relatively long timelines, it is widely believed that the early stages are the most promising for therapeutic intervention, before irreversible neuronal loss occurs. Developing a therapeutic response requires a precise measure of disease progression. However, since the early stages are for the most part asymptomatic, obtaining accurate measures of disease progression is difficult. Longitudinal databases of hundreds of subjects observed during several years with tens of validated biomarkers are becoming available, allowing the use of computational methods. We propose a widely applicable statistical methodology for creating a disease progression score (DPS), using multiple biomarkers, for subjects with a neurodegenerative disease. The proposed methodology was evaluated for Alzheimer’s disease (AD) using the publicly available AD Neuroimaging Initiative (ADNI) database, yielding an Alzheimer’s DPS or ADPS score for each subject and each time-point in the database. In addition, a common description of biomarker changes was produced allowing for an ordering of the biomarkers. The Rey Auditory Verbal Learning Test delayed recall was found to be the earliest biomarker to become abnormal. The group of biomarkers comprising the volume of the hippocampus and the protein concentration amyloid beta and Tau were next in the timeline, and these were followed by three cognitive biomarkers. The proposed methodology thus has potential to stage individuals according to their state of disease progression relative to a population and to deduce common behaviors of biomarkers in the disease itself.
URL:
Exploring the impact of APOE e4 on functional connectivity in Alzheimer’s disease across cognitive impairment levels.
The apolipoprotein E (APOE) e4 allele is a recognized genetic risk factor for Alzheimer’s Disease (AD). Studies have shown that APOE e4 mediates the modulation of intrinsic functional brain networks in cognitively normal individuals and significantly disrupts the whole-brain topological structure in AD patients. However, how APOE e4 regulates brain functional connectivity (FC) and consequently affects the levels of cognitive impairment in AD patients remains unknown. In this study, we systematically analyzed functional magnetic resonance imaging (fMRI) data from two distinct cohorts: an in-house dataset of 59 AD patients (73.37 ± 6.42 years) and the ADNI dataset of 117 AD patients (74.91 ± 7.91 years). Experimental comparisons were conducted by grouping AD patients based on both APOE e4 status and cognitive impairment levels of AD. The Network-Based Statistic (NBS) method and the Graph Neural Network Explainer (GNN-Explainer) were combined to identify significant FC changes across different comparisons. Importantly, the GNN-Explainer method was introduced as an enhancement over the NBS method to better model complex high-order nonlinear characteristics for discovering FC features that significantly contribute to classification tasks. The results showed that APOE e4 primarily influenced temporal lobe FCs, while it influenced different cognitive impairment levels of AD by adjusting prefrontal-parietal FCs. These findings were validated by p-values < 0.05 from the NBS method, and by 5-fold cross-validation along with ablation studies from the GNN-Explainer method. In conclusion, our findings provide new insights into the role of APOE e4 in altering FC dynamics during the progression of AD, highlighting potential targets for early intervention.
URL:
Tissue-based Alzheimer gene expression markers: comparison of multiple machine learning approaches and investigation of redundancy in small biomarker sets.
BACKGROUND: Alzheimer’s disease has been known for more than 100 years and the underlying molecular mechanisms are not yet completely understood. The identification of genes involved in the processes in Alzheimer affected brain is an important step towards such an understanding. Genes differentially expressed in diseased and healthy brains are promising candidates. RESULTS: Based on microarray data we identify potential biomarkers as well as biomarker combinations using three feature selection methods: information gain, mean decrease accuracy of random forest and a wrapper of genetic algorithm and support vector machine (GA/SVM). Information gain and random forest are two commonly used methods. We compare their output to the results obtained from GA/SVM. GA/SVM is rarely used for the analysis of microarray data, but it is able to identify genes capable of classifying tissues into different classes at least as well as the two reference methods. CONCLUSION: Compared to the other methods, GA/SVM has the advantage of finding small, less redundant sets of genes that, in combination, show superior classification characteristics. The biological significance of the genes and gene pairs is discussed.
URL:
Self-supervised learning for accurately modelling hierarchical evolutionary patterns of cerebrovasculature.
Cerebrovascular abnormalities are critical indicators of stroke and neurodegenerative diseases like Alzheimer’s disease (AD). Understanding the normal evolution of brain vessels is essential for detecting early deviations and enabling timely interventions. Here, for the first time, we proposed a pipeline exploring the joint evolution of cortical volumes (CVs) and arterial volumes (AVs) in a large cohort of 2841 individuals. Using advanced deep learning for vessel segmentation, we built normative models of CVs and AVs across spatially hierarchical brain regions. We found that while AVs generally decline with age, distinct trends appear in regions like the circle of Willis. Comparing healthy individuals with those affected by AD or stroke, we identified significant reductions in both CVs and AVs, with patients with AD showing the most severe impact. Our findings reveal gender-specific effects and provide critical insights into how these conditions alter brain structure, potentially guiding future clinical assessments and interventions.
URL:
Predicting conversion in cognitively normal and mild cognitive impairment individuals with machine learning: Is the CSF status still relevant?
INTRODUCTION: Machine learning (ML) helps diagnose the mild cognitive impairment-Alzheimer’s disease (MCI-AD) spectrum. However, ML is fed with data unavailable in standard clinical practice. Thus, we tested a novel multi-step ML approach to predict cognitive worsening. METHODS: We selected cognitively normal and MCI participants from the Alzheimer’s Disease Neuroimaging Initiative dataset and categorized them on total tau/amyloid beta 1-42 ratios. ML was applied to predict the 3-year conversion with standard clinical data (SCD), assess the model’s accuracy, and identify the role of cerebrospinal fluid (CSF) biomarkers in this approach. Shapley Additive Explanations (SHAP) analysis was carried out to explore the automated decisional process. RESULTS: The model achieved 84% accuracy across the entire cohort, 86% in patients with negative CSF, and 88% in individuals with AD-like CSF. SHAP analysis identified differences between CSF-positive and -negative patients in predictors of conversion and cut-offs. CONCLUSIONS: The approach yielded good prediction accuracy using SCD. However, CSF-based categorizations are needed to improve predictive accuracy. HIGHLIGHTS: Machine learning algorithms can predict cognitive decline with standard and routinely used clinical data. Classification according to cerebrospinal fluid biomarkers enhances prediction accuracy. Different cut-offs could be applied to neuropsychological tests to predict conversion.
URL:
Towards precise PICO extraction from abstracts of randomized controlled trials using a section-specific learning approach.
MOTIVATION: Automated extraction of participants, intervention, comparison/control, and outcome (PICO) from randomized controlled trial (RCT) abstracts is important for evidence synthesis. Previous studies have demonstrated the feasibility of applying natural language processing (NLP) for PICO extraction. However, the performance is not optimal due to the complexity of PICO information in RCT abstracts and the challenges involved in their annotation. RESULTS: We propose a two-step NLP pipeline to extract PICO elements from RCT abstracts: (i) sentence classification using a prompt-based learning model and (ii) PICO extraction using a named entity recognition (NER) model. First, the sentences in abstracts were categorized into four sections, namely background, methods, results, and conclusions. Next, the NER model was applied to extract the PICO elements from the sentences within the title and methods sections, which include >96% of PICO information. We evaluated our proposed NLP pipeline on three datasets: the EBM-NLPmod dataset (a randomly selected and reannotated dataset of 500 RCT abstracts from the EBM-NLP corpus), a dataset of 150 COVID-19 RCT abstracts, and a dataset of 150 Alzheimer’s disease (AD) RCT abstracts. The end-to-end evaluation reveals that our proposed approach achieved an overall micro F1 score of 0.833 on the EBM-NLPmod dataset, 0.928 on the COVID-19 dataset, and 0.899 on the AD dataset when measured at the token level, and an overall micro F1 score of 0.712 on the EBM-NLPmod dataset, 0.850 on the COVID-19 dataset, and 0.805 on the AD dataset when measured at the entity level. AVAILABILITY: Our codes and datasets are publicly available at https://github.com/BIDS-Xu-Lab/section_specific_annotation_of_PICO. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
URL: https://github.com/BIDS-Xu-Lab/section_specific_annotation_of_PICO.
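The token-level micro F1 score reported in the PICO abstract pools true positives, false positives, and false negatives over all entity classes before computing precision and recall. The sketch below shows one common way to compute it from aligned gold and predicted tag sequences; the tag names (`P`, `I`, `O`), the function name, and the toy sequences are assumptions and are not taken from the linked repository.

```python
def token_micro_f1(gold_tags, predicted_tags, outside="O"):
    """Micro-averaged F1 over all non-outside tags at the token level."""
    if len(gold_tags) != len(predicted_tags):
        raise ValueError("tag sequences must be aligned token by token")
    tp = fp = fn = 0
    for gold, pred in zip(gold_tags, predicted_tags):
        if pred != outside and pred == gold:
            tp += 1
        else:
            if pred != outside:
                fp += 1  # predicted an entity tag that is wrong
            if gold != outside:
                fn += 1  # missed (or mislabeled) a gold entity tag
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Toy sequences: "P" = participant token, "I" = intervention token,
# "O" = outside any PICO element. Made up for illustration.
gold = ["P", "P", "O", "I", "O", "O"]
pred = ["P", "O", "O", "I", "I", "O"]
f1 = token_micro_f1(gold, pred)
```

Entity-level scoring is stricter, since a predicted span usually counts only if its boundaries and type both match, which is consistent with the lower entity-level scores in the abstract.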
Data-driven decomposition and staging of flortaucipir uptake in Alzheimer’s disease.
INTRODUCTION: Previous approaches pursuing in vivo staging of tau pathology in Alzheimer’s disease (AD) have typically relied on neuropathologically defined criteria. In using predefined systems, these studies may miss spatial deposition patterns which are informative of disease progression. METHODS: We selected discovery (n = 418) and replication (n = 132) cohorts with flortaucipir imaging. Non-negative matrix factorization (NMF) was applied to learn tau covariance patterns and develop a tau staging system. Flortaucipir components were also validated by comparison with amyloid burden, gray matter loss, and the expression of AD-related genes. RESULTS: We found eight flortaucipir covariance patterns which were reproducible and overlapped with relevant gene expression maps. Tau stages were associated with AD severity as indexed by dementia status and neuropsychological performance. Comparisons of flortaucipir uptake with amyloid and atrophy also supported our model of tau progression. DISCUSSION: Data-driven decomposition of flortaucipir uptake provides a novel framework for tau staging which complements existing systems. HIGHLIGHTS: NMF reveals patterns of tau deposition in AD. Data-driven staging of flortaucipir tracks AD severity. Learned flortaucipir patterns overlap with AD-related gene expression.
URL:
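Non-negative matrix factorization, used in the abstract above to learn tau covariance patterns, approximates a non-negative data matrix V (here imagined as subjects by regions) as a product W H of two non-negative factors. The sketch below uses the classic multiplicative update rules for the squared-error objective in plain Python. It is an illustration only: the study's NMF variant, initialization, and rank selection are not specified in the abstract, and all names and the toy matrix are assumptions.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def squared_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum(
        (v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0]))
    )


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor V (n x m) into W (n x rank) and H (rank x m), all non-negative."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        numer = matmul(wt, v)
        denom = matmul(matmul(wt, w), h)
        h = [[h[k][j] * numer[k][j] / (denom[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        numer = matmul(v, ht)
        denom = matmul(w, matmul(h, ht))
        w = [[w[i][k] * numer[i][k] / (denom[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


# Toy uptake matrix (5 subjects x 4 regions) built from two spatial patterns.
uptake = [
    [1.0, 1.0, 0.0, 0.0],
    [2.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 3.0, 3.0],
    [1.0, 1.0, 1.0, 1.0],
]
w_fit, h_fit = nmf(uptake, rank=2)
```

Rows of H play the role of spatial components and rows of W give each subject's loading on them; the loadings are the kind of quantity a staging scheme could be built on.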
High-dimensional generalized propensity score with application to omics data.
Propensity score (PS) methods are popular when estimating causal effects in non-randomized studies. Drawing causal conclusion relies on the unconfoundedness assumption. This assumption is untestable and is considered more plausible if a large number of pre-treatment covariates are included in the analysis. However, previous studies have shown that including unnecessary covariates into PS models can lead to bias and efficiency loss. With the ever-increasing amounts of available data, such as the omics data, there is often little prior knowledge of the exact set of important covariates. Therefore, variable selection for causal inference in high-dimensional settings has received considerable attention in recent years. However, recent studies have focused mainly on binary treatments. In this study, we considered continuous treatments and proposed the generalized outcome-adaptive LASSO (GOAL) to select covariates that can provide an unbiased and statistically efficient estimation. Simulation studies showed that when the outcome model was linear, the GOAL selected almost all true confounders and predictors of outcome and excluded other covariates. The accuracy and precision of the estimates were close to ideal. Furthermore, the GOAL is robust to model misspecification. We applied the GOAL to seven DNA methylation datasets from the Gene Expression Omnibus database, which covered four brain regions, to estimate the causal effects of epigenetic aging acceleration on the incidence of Alzheimer’s disease.
URL:
A model of brain morphological changes related to aging and Alzheimer’s disease from cross-sectional assessments.
In this study we propose a deformation-based framework to jointly model the influence of aging and Alzheimer’s disease (AD) on the brain morphological evolution. Our approach combines a spatio-temporal description of both processes into a generative model. A reference morphology is deformed along specific trajectories to match subject specific morphologies. It is used to define two imaging progression markers: 1) a morphological age and 2) a disease score. These markers can be computed regionally in any brain region. The approach is evaluated on brain structural magnetic resonance images (MRI) from the ADNI database. The model is first estimated on a control population using longitudinal data, then, for each testing subject, the markers are computed cross-sectionally for each acquisition. The longitudinal evolution of these markers is then studied in relation with the clinical diagnosis of the subjects and used to generate possible morphological evolutions. In the model, the morphological changes associated with normal aging are mainly found around the ventricles, while the Alzheimer’s disease specific changes are located in the temporal lobe and the hippocampal area. The statistical analysis of these markers highlights differences between clinical conditions even though the inter-subject variability is quite high. The model is also generative since it can be used to simulate plausible morphological trajectories associated with the disease. Our method quantifies two interpretable scalar imaging biomarkers assessing respectively the effects of aging and disease on brain morphology, at the individual and population level. These markers confirm the presence of an accelerated apparent aging component in Alzheimer’s patients but they also highlight specific morphological changes that can help discriminate clinical conditions even in prodromal stages. 
More generally, the joint modeling of normal and pathological evolutions shows promising results to describe age-related brain diseases over long time scales.
URL:
Prediction of neuropathologic lesions from clinical data.
INTRODUCTION: Post-mortem analysis provides definitive diagnoses of neurodegenerative diseases; however, only a few can be diagnosed during life. METHODS: This study employed statistical tools and machine learning to predict 17 neuropathologic lesions from a cohort of 6518 individuals using 381 clinical features (Table S1). The multisite data allowed validation of the model’s robustness by splitting train/test sets by clinical sites. A similar study was performed for predicting Alzheimer’s disease (AD) neuropathologic change without specific comorbidities. RESULTS: Prediction results show high performance for certain lesions that match or exceed that of research annotation. Neurodegenerative comorbidities in addition to AD neuropathologic change resulted in compounded, but disproportionate, effects across cognitive domains as the comorbidity number increased. DISCUSSION: Certain clinical features could be strongly associated with multiple neurodegenerative diseases, others were lesion-specific, and some were divergent between lesions. Our approach could benefit clinical research, and genetic and biomarker research by enriching cohorts for desired lesions.
URL:
DEMA: a distance-bounded energy-field minimization algorithm to model and layout biomolecular networks with quantitative features.
SUMMARY: In biology, graph layout algorithms can reveal comprehensive biological contexts by visually positioning graph nodes in their relevant neighborhoods. A layout software algorithm/engine commonly takes a set of nodes and edges and produces layout coordinates of nodes according to edge constraints. However, current layout engines normally do not consider node, edge or node-set properties during layout and only curate these properties after the layout is created. Here, we propose a new layout algorithm, distance-bounded energy-field minimization algorithm (DEMA), to natively consider various biological factors, i.e., the strength of gene-to-gene association, the gene’s relative contribution weight and the functional groups of genes, to enhance the interpretation of complex network graphs. In DEMA, we introduce a parameterized energy model where nodes are repelled by the network topology and attracted by a few biological factors, i.e., interaction coefficient, effect coefficient and fold change of gene expression. We generalize these factors as gene weights, protein-protein interaction weights, gene-to-gene correlations and gene set annotations, the four parameterized functional properties used in DEMA. Moreover, DEMA considers further attraction/repulsion/grouping coefficients to enable different preferences in generating network views. Applying DEMA, we performed two case studies using genetic data in autism spectrum disorder and Alzheimer’s disease, respectively, for gene candidate discovery. Furthermore, we implement our algorithm as a plugin to Cytoscape, an open-source software platform for visualizing networks, which makes it convenient to use. Our software and demo can be freely accessed at http://discovery.informatics.uab.edu/dema. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
URL: http://discovery.informatics.uab.edu/dema.
Antemortem differential diagnosis of dementia pathology using structural MRI: Differential-STAND.
The common neurodegenerative pathologies underlying dementia are Alzheimer’s disease (AD), Lewy body disease (LBD) and frontotemporal lobar degeneration (FTLD). Our aim was to identify patterns of atrophy unique to each of these diseases using antemortem structural MRI scans of pathologically confirmed dementia cases and build an MRI-based differential diagnosis system. Our approach of creating atrophy maps using structural MRI and applying them for classification of new incoming patients is labeled Differential-STAND (Differential Diagnosis Based on Structural Abnormality in Neurodegeneration). Pathologically confirmed subjects with a single dementing pathologic diagnosis who had an MRI at the time of clinical diagnosis of dementia were identified: 48 AD, 20 LBD, 47 FTLD-TDP (pathology-confirmed FTLD with TDP-43). Gray matter density in 91 regions-of-interest was measured in each subject and adjusted for head size and age using a database of 120 cognitively normal elderly. The atrophy patterns in each dementia type when compared to pathologically confirmed controls mirrored known disease-specific anatomic patterns: AD: temporoparietal association cortices and medial temporal lobe; FTLD-TDP: frontal and temporal lobes; and LBD: bilateral amygdalae, dorsal midbrain and inferior temporal lobes. Differential-STAND based classification of each case was done based on a mixture model generated using bisecting k-means clustering of the information from the MRI scans. Leave-one-out classification showed reasonable performance compared to the autopsy gold standard and clinical diagnosis: AD (sensitivity: 90.7%; specificity: 84%), LBD (sensitivity: 78.6%; specificity: 98.8%) and FTLD-TDP (sensitivity: 84.4%; specificity: 93.8%). The proposed approach establishes a direct a priori relationship between specific topographic patterns on MRI and “gold standard” of pathology which can then be used to predict underlying dementia pathology in new incoming patients.
URL:
Complex networks reveal early MRI markers of Parkinson’s disease.
Parkinson’s disease (PD) is the most common neurological disorder, after Alzheimer’s disease, and is characterized by a long prodromal stage lasting up to 20 years. As age is a prominent risk factor for the disease, the coming years will see a continuing increase in PD patients, making the development of efficient strategies for early diagnosis and treatment urgent. We propose here a novel approach based on complex networks for accurate early diagnoses using magnetic resonance imaging (MRI) data; our approach also allows us to investigate which brain regions are most affected by the disease. First, we define a network model of brain regions and associate appropriate connectivity measures with each region. Thus, each brain is represented through a feature vector encoding the local relationships among brain regions. Then, Random Forests are used for feature selection and learning a compact representation. Finally, we use a Support Vector Machine to combine complex network features with clinical scores typical of the PD prodromal phase and provide a diagnostic index. We evaluated the classification performance on the Parkinson’s Progression Markers Initiative (PPMI) database, including a mixed cohort of 169 normal controls (NC) and 374 PD patients. Our model compares favorably with existing state-of-the-art MRI approaches. Moreover, unlike previous approaches, our methodology ranks the brain regions according to disease effects without any a priori assumption.
URL:
A prediction model to calculate probability of Alzheimer’s disease using cerebrospinal fluid biomarkers.
BACKGROUND: We aimed to develop a prediction model based on cerebrospinal fluid (CSF) biomarkers that would yield a single estimate representing the probability that dementia in a memory clinic patient is due to Alzheimer’s disease (AD). METHODS: All patients suspected of dementia in whom the CSF biomarkers had been analyzed were selected from a memory clinic database. Clinical diagnosis was AD (n = 272) or non-AD (n = 289). The prediction model was developed with logistic regression analysis and included CSF amyloid beta42, CSF phosphorylated tau181, and sex. Validation was performed on an independent data set from another memory clinic, containing 334 AD and 157 non-AD patients. RESULTS: The prediction model estimated the probability that AD is present as follows: p(AD) = 1/(1 + e^(-[-0.3315 + score])), where score = -1.9486 × ln(amyloid beta42) + 2.7915 × ln(phosphorylated tau181) + 0.9178 × sex (male = 0, female = 1). When applied to the validation data set, the discriminative ability of the model was very good (area under the receiver operating characteristic curve: 0.85). The agreement between the probability of AD predicted by the model and the observed frequency of AD diagnoses was very good after taking into account the difference in AD prevalence between the two memory clinics. CONCLUSIONS: We developed a prediction model that can accurately predict the probability of AD in a memory clinic population suspected of dementia based on CSF amyloid beta42, CSF phosphorylated tau181, and sex.
URL:
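The logistic formula quoted in the abstract above can be evaluated directly. The following is a minimal sketch, not the authors' software: the function name and the input values in the usage note are hypothetical, and the abstract does not state the biomarker units, so any concentrations passed in are assumed to be on the scale the model was fitted with.

```python
import math

# Coefficients exactly as quoted in the abstract.
INTERCEPT = -0.3315
COEF_LN_ABETA42 = -1.9486
COEF_LN_PTAU181 = 2.7915
COEF_SEX = 0.9178  # sex coded as male = 0, female = 1


def ad_probability(abeta42, ptau181, female):
    """Return p(AD) = 1 / (1 + exp(-(intercept + score))).

    The function name and argument names are hypothetical. Biomarker
    units are an assumption (not given in the abstract); both values
    must be positive so that the natural logarithm is defined.
    """
    if abeta42 <= 0 or ptau181 <= 0:
        raise ValueError("biomarker concentrations must be positive")
    score = (COEF_LN_ABETA42 * math.log(abeta42)
             + COEF_LN_PTAU181 * math.log(ptau181)
             + COEF_SEX * (1 if female else 0))
    return 1.0 / (1.0 + math.exp(-(INTERCEPT + score)))
```

For example, `ad_probability(500.0, 60.0, female=True)` uses made-up inputs purely to exercise the function. Because the amyloid beta42 coefficient is negative and the phosphorylated tau181 coefficient is positive, lower amyloid beta42 or higher phosphorylated tau181 raises the estimated probability.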
Sparse canonical correlation analysis relates network-level atrophy to multivariate cognitive measures in a neurodegenerative population.
This study establishes that sparse canonical correlation analysis (SCCAN) identifies generalizable, structural MRI-derived cortical networks that relate to five distinct categories of cognition. We obtain multivariate psychometrics from the domain-specific sub-scales of the Philadelphia Brief Assessment of Cognition (PBAC). By using a training and separate testing stage, we find that PBAC-defined cognitive domains of language, visuospatial functioning, episodic memory, executive control, and social functioning correlate with unique and distributed areas of gray matter (GM). In contrast, a parallel univariate framework fails to identify, from the training data, regions that are also significant in the left-out test dataset. The cohort includes 164 patients with Alzheimer’s disease, behavioral-variant frontotemporal dementia, semantic variant primary progressive aphasia, non-fluent/agrammatic primary progressive aphasia, or corticobasal syndrome. The analysis is implemented with open-source software for which we provide examples in the text. In conclusion, we show that multivariate techniques identify biologically-plausible brain regions supporting specific cognitive domains. The findings are identified in training data and confirmed in test data.
URL:
Automated deep learning segmentation of high-resolution 7 Tesla postmortem MRI for quantitative analysis of structure-pathology correlations in neurodegenerative diseases.
Postmortem MRI allows brain anatomy to be examined at high resolution and pathology measures to be linked with morphometric measurements. However, automated segmentation methods for brain mapping in postmortem MRI are not well developed, primarily due to limited availability of labeled datasets, and heterogeneity in scanner hardware and acquisition protocols. In this work, we present a high-resolution dataset of 135 postmortem human brain tissue specimens imaged at 0.3 mm³ isotropic using a T2w sequence on a 7T whole-body MRI scanner. We developed a deep learning pipeline to segment the cortical mantle by benchmarking the performance of nine deep neural architectures, followed by post-hoc topological correction. We evaluate the reliability of this pipeline via overlap metrics with manual segmentation in 6 specimens, and intra-class correlation between cortical thickness measures extracted from the automatic segmentation and expert-generated reference measures in 36 specimens. We also segment four subcortical structures (caudate, putamen, globus pallidus, and thalamus), white matter hyperintensities, and the normal appearing white matter, providing a limited evaluation of accuracy. We show generalizing capabilities across whole-brain hemispheres in different specimens, and also on unseen images acquired at 0.28 mm³ and 0.16 mm³ isotropic T2*w fast low angle shot (FLASH) sequence at 7T. We report associations between localized cortical thickness and volumetric measurements across key regions, and semi-quantitative neuropathological ratings in a subset of 82 individuals with Alzheimer’s disease (AD) continuum diagnoses. Our code, Jupyter notebooks, and the containerized executables are publicly available at the project webpage (https://pulkit-khandelwal.github.io/exvivo-brain-upenn/).
URL: https://pulkit-khandelwal.github.io/exvivo-brain-upenn/
Gray Matter Age Prediction as a Biomarker for Risk of Dementia.
The gap between predicted brain age using magnetic resonance imaging (MRI) and chronological age may serve as a biomarker for early-stage neurodegeneration. However, owing to the lack of large longitudinal studies, it has been challenging to validate this link. We aimed to investigate the utility of such a gap as a risk biomarker for incident dementia using a deep learning approach for predicting brain age based on MRI-derived gray matter (GM). We built a convolutional neural network (CNN) model to predict brain age trained on 3,688 dementia-free participants of the Rotterdam Study (mean age 66 ± 11 y, 55% women). Logistic regressions and Cox proportional hazards were used to assess the association of the age gap with incident dementia, adjusted for age, sex, intracranial volume, GM volume, hippocampal volume, white matter hyperintensities, years of education, and APOE epsilon4 allele carriership. Additionally, we computed the attention maps, which show which regions are important for age prediction. Logistic regression and Cox proportional hazard models showed that the age gap was significantly related to incident dementia (odds ratio [OR] = 1.11 and 95% confidence intervals [CI] = 1.05-1.16; hazard ratio [HR] = 1.11, and 95% CI = 1.06-1.15, respectively). Attention maps indicated that GM density around the amygdala and hippocampi primarily drove the age estimation. We showed that the gap between predicted and chronological brain age is a biomarker, complementary to those that are known, associated with risk of dementia, and could possibly be used for early-stage dementia risk screening.
URL:
SynthSR: A public AI tool to turn heterogeneous clinical brain scans into high-resolution T1-weighted images for 3D morphometry.
Every year, millions of brain magnetic resonance imaging (MRI) scans are acquired in hospitals across the world. These have the potential to revolutionize our understanding of many neurological diseases, but their morphometric analysis has not yet been possible due to their anisotropic resolution. We present an artificial intelligence technique, “SynthSR,” that takes clinical brain MRI scans with any MR contrast (T1, T2, etc.), orientation (axial/coronal/sagittal), and resolution and turns them into high-resolution T1 scans that are usable by virtually all existing human neuroimaging tools. We present results on segmentation, registration, and atlasing of >10,000 scans of controls and patients with brain tumors, strokes, and Alzheimer’s disease. SynthSR yields morphometric results that are very highly correlated with what one would have obtained with high-resolution T1 scans. SynthSR allows sample sizes that have the potential to overcome the power limitations of prospective research studies and shed new light on the healthy and diseased human brain.
URL:
Quantifying uncertainty in brain-predicted age using scalar-on-image quantile regression.
Prediction of subject age from brain anatomical MRI has the potential to provide a sensitive summary of brain changes, indicative of different neurodegenerative diseases. However, existing studies typically neglect the uncertainty of these predictions. In this work we take into account this uncertainty by applying methods of functional data analysis. We propose a penalised functional quantile regression model of age on brain structure with cognitively normal (CN) subjects in the Alzheimer’s Disease Neuroimaging Initiative (ADNI), and use it to predict brain age in Mild Cognitive Impairment (MCI) and Alzheimer’s Disease (AD) subjects. Unlike the machine learning approaches available in the literature of brain age prediction, which provide only point predictions, the outcome of our model is a prediction interval for each subject.
URL:
Performance of Machine Learning Algorithms for Predicting Progression to Dementia in Memory Clinic Patients.
Importance: Machine learning algorithms could be used as the basis for clinical decision-making aids to enhance clinical practice. Objective: To assess the ability of machine learning algorithms to predict dementia incidence within 2 years compared with existing models and determine the optimal analytic approach and number of variables required. Design, Setting, and Participants: This prognostic study used data from a prospective cohort of 15 307 participants without dementia at baseline to perform a secondary analysis of factors that could be used to predict dementia incidence. Participants attended National Alzheimer Coordinating Center memory clinics across the United States between 2005 and 2015. Analyses were conducted from March to May 2021. Exposures: 258 variables spanning domains of dementia-related clinical measures and risk factors. Main Outcomes and Measures: The main outcome was incident all-cause dementia diagnosed within 2 years of baseline assessment. Results: In a sample of 15 307 participants (mean [SD] age, 72.3 [9.8] years; 9129 [60%] women and 6178 [40%] men) without dementia at baseline, 1568 (10%) received a diagnosis of dementia within 2 years of their initial assessment. Compared with 2 existing models for dementia risk prediction (ie, Cardiovascular Risk Factors, Aging, and Incidence of Dementia Risk Score, and the Brief Dementia Screening Indicator), machine learning algorithms were superior in predicting incident all-cause dementia within 2 years. The gradient-boosted trees algorithm had a mean (SD) overall accuracy of 92% (1%), sensitivity of 0.45 (0.05), specificity of 0.97 (0.01), and area under the curve of 0.92 (0.01) using all 258 variables. Analysis of variable importance showed that only 6 variables were required for machine learning algorithms to achieve an accuracy of 91% and area under the curve of at least 0.89. 
Machine learning algorithms also identified up to 84% of participants who received an initial dementia diagnosis that was subsequently reversed to mild cognitive impairment or cognitively unimpaired, suggesting possible misdiagnosis. Conclusions and Relevance: These findings suggest that machine learning algorithms could accurately predict incident dementia within 2 years in patients receiving care at memory clinics using only 6 variables. These findings could be used to inform the development and validation of decision-making aids in memory clinics.
URL:
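The metrics reported for the gradient-boosted trees algorithm in the abstract above can look surprising at first: overall accuracy is 92% while sensitivity is only 0.45. A short worked calculation shows how the two fit together when only about 10% of participants develop dementia. This is an illustrative consistency check on the quoted summary numbers, assuming the standard identity accuracy = sensitivity × prevalence + specificity × (1 − prevalence). The reported figures are means over repeated evaluations, so the result is approximate, and the function name is hypothetical.

```python
def overall_accuracy(sensitivity, specificity, prevalence):
    """Accuracy implied by sensitivity, specificity and outcome prevalence."""
    return sensitivity * prevalence + specificity * (1.0 - prevalence)


# Counts and metrics quoted in the abstract.
prevalence = 1568 / 15307          # participants diagnosed within 2 years
implied = overall_accuracy(0.45, 0.97, prevalence)

# Reference point: always predicting "no dementia" scores 1 - prevalence.
always_negative = overall_accuracy(0.0, 1.0, prevalence)

print(f"prevalence           = {prevalence:.3f}")
print(f"implied accuracy     = {implied:.3f}")
print(f"always-negative rule = {always_negative:.3f}")
```

The implied accuracy comes out close to the reported 92%, and the always-negative reference point is only a little lower. With an outcome this imbalanced, accuracy is dominated by specificity, so sensitivity and area under the curve are the more informative figures.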
Learning directed acyclic graphical structures with genetical genomics data.
MOTIVATION: A large amount of research effort has been focused on estimating gene networks based on gene expression data to understand the functional basis of a living organism. Such networks are often obtained by considering pairwise correlations between genes, and thus may not reflect the true connectivity between genes. By treating gene expressions as quantitative traits while considering genetic markers, genetical genomics analysis has shown its power in enhancing the understanding of gene regulations. Previous works have shown improved performance in estimating the undirected network graphical structure by incorporating genetic markers as covariates. Knowing that gene expressions are often due to directed regulations, it is more meaningful to estimate the directed graphical network. RESULTS: In this article, we introduce a covariate-adjusted Gaussian graphical model to estimate the Markov equivalence class of the directed acyclic graphs (DAGs) in a genetical genomics analysis framework. We develop a two-stage estimation procedure to first estimate the regression coefficient matrix by [Formula: see text] penalization. The estimated coefficient matrix is then used to estimate the mean values in our multi-response Gaussian model to estimate the regulatory networks of gene expressions using the PC-algorithm. The estimation consistency for high dimensional sparse DAGs is established. Simulations are conducted to demonstrate our theoretical results. The method is applied to a human Alzheimer’s disease dataset in which differential DAGs are identified between cases and controls. R code for implementing the method can be downloaded at http://www.stt.msu.edu/~cui. AVAILABILITY AND IMPLEMENTATION: R code for implementing the method is freely available at http://www.stt.msu.edu/~cui/software.html.
URL: http://www.stt.msu.edu/~cui.
High-throughput target trial emulation for Alzheimer’s disease drug repurposing with real-world data.
Target trial emulation is the process of mimicking target randomized trials using real-world data, where effective confounding control for unbiased treatment effect estimation remains a main challenge. Although various approaches have been proposed for this challenge, a systematic evaluation is still lacking. Here we emulated trials for thousands of medications from two large-scale real-world data warehouses, covering over 10 years of clinical records for over 170 million patients, aiming to identify new indications of approved drugs for Alzheimer’s disease. We assessed different propensity score models under the inverse probability of treatment weighting framework and suggested a model selection strategy for improved baseline covariate balancing. We also found that the deep learning-based propensity score model did not necessarily outperform logistic regression-based methods in covariate balancing. Finally, we highlighted five top-ranked drugs (pantoprazole, gabapentin, atorvastatin, fluticasone, and omeprazole) originally intended for other indications with potential benefits for Alzheimer’s patients.
URL:
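The abstract above evaluates propensity score models by how well inverse probability of treatment weighting (IPTW) balances baseline covariates. The following toy sketch illustrates that idea only. The data are made up, the function names are hypothetical, and the weighted standardized mean difference (SMD) uses one common definition (weighted group variances, averaged), which is an assumption and may differ from the balance metric used in the study.

```python
import math


def iptw_weights(treated, propensity):
    """Unstabilized IPTW: 1/e for treated units, 1/(1 - e) for controls."""
    weights = []
    for t, e in zip(treated, propensity):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly in (0, 1)")
        weights.append(1.0 / e if t == 1 else 1.0 / (1.0 - e))
    return weights


def weighted_smd(x, treated, weights):
    """Weighted standardized mean difference of a single covariate."""
    def group_stats(flag):
        pairs = [(xi, wi) for xi, ti, wi in zip(x, treated, weights) if ti == flag]
        total = sum(w for _, w in pairs)
        mean = sum(v * w for v, w in pairs) / total
        var = sum(w * (v - mean) ** 2 for v, w in pairs) / total
        return mean, var

    mean_t, var_t = group_stats(1)
    mean_c, var_c = group_stats(0)
    pooled_sd = math.sqrt((var_t + var_c) / 2.0)
    return (mean_t - mean_c) / pooled_sd if pooled_sd > 0 else 0.0


# Toy cohort: a binary covariate x drives treatment (e = 0.8 if x = 1, else 0.2).
x = [1] * 10 + [0] * 10
treated = [1] * 8 + [0] * 2 + [1] * 2 + [0] * 8
propensity = [0.8] * 10 + [0.2] * 10

smd_before = weighted_smd(x, treated, [1.0] * len(x))
smd_after = weighted_smd(x, treated, iptw_weights(treated, propensity))
```

In this constructed example the propensity model is exactly right, so weighting removes the imbalance in `x` entirely. With an estimated model the residual SMD is what a selection strategy based on covariate balance would compare across candidate models.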
Brain clocks capture diversity and disparities in aging and dementia across geographically diverse populations.
Brain clocks, which quantify discrepancies between brain age and chronological age, hold promise for understanding brain health and disease. However, the impact of diversity (including geographical, socioeconomic, sociodemographic, sex and neurodegeneration) on the brain-age gap is unknown. We analyzed datasets from 5,306 participants across 15 countries (7 Latin American and Caribbean countries (LAC) and 8 non-LAC countries). Based on higher-order interactions, we developed a brain-age gap deep learning architecture for functional magnetic resonance imaging (2,953) and electroencephalography (2,353). The datasets comprised healthy controls and individuals with mild cognitive impairment, Alzheimer disease and behavioral variant frontotemporal dementia. LAC models evidenced older brain ages (functional magnetic resonance imaging: mean directional error = 5.60, root mean square error (r.m.s.e.) = 11.91; electroencephalography: mean directional error = 5.34, r.m.s.e. = 9.82) associated with frontoposterior networks compared with non-LAC models. Structural socioeconomic inequality, pollution and health disparities were influential predictors of increased brain-age gaps, especially in LAC (R2 = 0.37, F2 = 0.59, r.m.s.e. = 6.9). An ascending brain-age gap from healthy controls to mild cognitive impairment to Alzheimer disease was found. In LAC, we observed larger brain-age gaps in females in control and Alzheimer disease groups compared with the respective males. The results were not explained by variations in signal quality, demographics or acquisition methods. These findings provide a quantitative framework capturing the diversity of accelerated brain aging.
URL:
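The abstract above summarizes brain-age models with a mean directional error and a root mean square error (r.m.s.e.). As a hedged illustration of how two such summaries relate, the sketch below assumes that the directional error is the signed mean of predicted minus chronological age, so that positive values indicate older-appearing brains. The abstract does not spell out this definition, and the function name and ages are made up.

```python
import math


def brain_age_errors(predicted, chronological):
    """Return (mean directional error, r.m.s.e.) of brain-age predictions.

    Assumed definitions, not taken from the study:
      gap_i = predicted_i - chronological_i
      mean directional error = mean(gap_i)
      r.m.s.e.               = sqrt(mean(gap_i ** 2))
    """
    if not predicted or len(predicted) != len(chronological):
        raise ValueError("inputs must be non-empty and of equal length")
    gaps = [p - c for p, c in zip(predicted, chronological)]
    mde = sum(gaps) / len(gaps)
    rmse = math.sqrt(sum(g * g for g in gaps) / len(gaps))
    return mde, rmse


# Hypothetical ages in years for three participants.
mde, rmse = brain_age_errors([70.0, 65.0, 80.0], [66.0, 66.0, 74.0])
```

Under these definitions the r.m.s.e. is never smaller than the absolute value of the mean directional error. The two coincide only when every participant has the same gap, so a large r.m.s.e. alongside a moderate directional error points to substantial individual variability around the average shift.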
Computational refinement of post-translational modifications predicted from tandem mass spectrometry.
MOTIVATION: A post-translational modification (PTM) is a chemical modification of a protein that occurs naturally. Many of these modifications, such as phosphorylation, are known to play pivotal roles in the regulation of protein function. Hence, PTM perturbations have been linked to diverse diseases like Parkinson’s, Alzheimer’s, diabetes and cancer. To discover PTMs on a genome-wide scale, there has been a recent surge of interest in analyzing tandem mass spectrometry data, and several unrestrictive (so-called ‘blind’) PTM search methods have been reported. However, these approaches are subject to noise in mass measurements and in the predicted modification site (amino acid position) within peptides, which can result in false PTM assignments. RESULTS: To address these issues, we devised a machine learning algorithm, PTMClust, that can be applied to the output of blind PTM search methods to improve prediction quality, by suppressing noise in the data and clustering peptides with the same underlying modification to form PTM groups. We show that our technique outperforms two standard clustering algorithms on a simulated dataset. Additionally, we show that our algorithm significantly improves sensitivity and specificity when applied to the output of three different blind PTM search engines, SIMS, InsPecT and MODmap. PTMClust also markedly outperforms another PTM refinement algorithm, PTMFinder. We demonstrate that our technique is able to reduce false PTM assignments, improve overall detection coverage and facilitate novel PTM discovery, including terminus modifications. We applied our technique to a large-scale yeast MS/MS proteome profiling dataset and found numerous known and novel PTMs. Accurately identifying modifications in protein sequences is a critical first step for PTM profiling, and thus our approach may benefit routine proteomic analysis. AVAILABILITY: Our algorithm is implemented in Matlab and is freely available for academic use. 
The software is available online from http://genes.toronto.edu.
URL: http://genes.toronto.edu.
Introducing TEC-LncMir for prediction of lncRNA-miRNA interactions through deep learning of RNA sequences.
The interactions between long noncoding RNA (lncRNA) and microRNA (miRNA) play critical roles in life processes, highlighting the necessity to enhance the performance of state-of-the-art models. Here, we introduced TEC-LncMir, a novel approach for predicting lncRNA-miRNA interaction using Transformer Encoder and convolutional neural networks (CNNs). TEC-LncMir treats lncRNA and miRNA sequences as natural languages, encodes them using the Transformer Encoder, and combines representations of a pair of microRNA and lncRNA into a contact tensor (a three-dimensional array). Afterward, TEC-LncMir treats the contact tensor as a multi-channel image, utilizes a four-layer CNN to extract the contact tensor’s features, and then uses these features to predict the interaction between the pair of lncRNA and miRNA. We applied a series of comparative experiments to demonstrate that TEC-LncMir significantly improves lncRNA-miRNA interaction prediction, compared with existing state-of-the-art models. We also trained TEC-LncMir utilizing a large training dataset, and as expected, TEC-LncMir achieves unprecedented performance. Moreover, we integrated miRanda into TEC-LncMir to show the secondary structures of high-confidence interactions. Finally, we utilized TEC-LncMir to identify microRNAs interacting with lncRNA NEAT1, where NEAT1 performs as a competitive endogenous RNA of the microRNAs’ targets (mRNAs) in brain cells. We also demonstrated the regulatory mechanism of NEAT1 in Alzheimer’s disease via transcriptome analysis and sequence alignment analysis. Overall, our results demonstrate the effectiveness of TEC-LncMir, suggest a potential regulation of miRNAs by NEAT1 in Alzheimer’s disease, and take a significant step forward in lncRNA-miRNA interaction prediction.
URL:
Identification of pan-kinase-family inhibitors using graph convolutional networks to reveal family-sensitive pre-moieties.
BACKGROUND: Human protein kinases, the key players in phosphoryl signal transduction, have been actively investigated as drug targets for complex diseases such as cancer, immune disorders, and Alzheimer’s disease, with more than 60 successful drugs developed in the past 30 years. However, many of these single-kinase inhibitors show low efficacy and drug resistance has become an issue. Owing to the occurrence of highly conserved catalytic sites and shared signaling pathways within a kinase family, multi-target kinase inhibitors have attracted attention. RESULTS: To design and identify such pan-kinase family inhibitors (PKFIs), we proposed PKFI sets for eight families using 200,000 experimental bioactivity data points and applied a graph convolutional network (GCN) to build classification models. Furthermore, we identified and extracted family-sensitive (only present in a family) pre-moieties (parts of complete moieties) by utilizing a visualized explanation (i.e., where the model focuses on each input) method for deep learning, gradient-weighted class activation mapping (Grad-CAM). CONCLUSIONS: This study is the first to propose the PKFI sets, and our results point out and validate the power of GCN models in understanding the pre-moieties of PKFIs within and across different kinase families. Moreover, we highlight the discoverability of family-sensitive pre-moieties in PKFI identification and drug design.
URL:
Testing for spatial heterogeneity in functional MRI using the multivariate general linear model.
Much current research in functional magnetic resonance imaging (fMRI) employs multivariate machine learning approaches (e.g., support vector machines) to detect distributed spatial patterns from the temporal fluctuations of the neural signal. The aim of many studies is not classification, however, but investigation of multivariate spatial patterns, which pattern classifiers detect only indirectly. Here we propose a direct statistical measure for the existence of distributed spatial patterns (or spatial heterogeneity) applicable to fMRI datasets. We extend the univariate general linear model (GLM), typically used in fMRI analysis, to a multivariate case. We demonstrate that contrasting maximum likelihood estimations of different restrictions on this multivariate model can be used to estimate the extent of spatial heterogeneity in fMRI data. Under asymptotic assumptions inference can be made with reference to the χ² distribution. The test statistic is then assessed using simulated timecourses derived from real fMRI data followed by analyzing data from a real fMRI experiment. These analyses demonstrate the utility of the proposed measure of heterogeneity as well as considerations in its application. Measuring spatial heterogeneity in fMRI has important theoretical implications in its own right and may have potential uses for better characterising neurological conditions such as stroke and Alzheimer’s disease.
URL:
Artificial intelligence-coupled plasmonic infrared sensor for detection of structural protein biomarkers in neurodegenerative diseases.
Diagnosis of neurodegenerative disorders (NDDs) including Parkinson’s disease and Alzheimer’s disease is challenging owing to the lack of tools to detect preclinical biomarkers. The misfolding of proteins into oligomeric and fibrillar aggregates plays an important role in the development and progression of NDDs, thus underscoring the need for structural biomarker-based diagnostics. We developed an immunoassay-coupled nanoplasmonic infrared metasurface sensor that detects proteins linked to NDDs, such as alpha-synuclein, with specificity and differentiates the distinct structural species using their unique absorption signatures. We augmented the sensor with an artificial neural network enabling unprecedented quantitative prediction of oligomeric and fibrillar protein aggregates in their mixture. The microfluidic integrated sensor can retrieve time-resolved absorbance fingerprints in the presence of a complex biomatrix and is capable of multiplexing for the simultaneous monitoring of multiple pathology-associated biomarkers. Thus, our sensor is a promising candidate for the clinical diagnosis of NDDs, disease monitoring, and evaluation of novel therapies.
URL:
scGNN is a novel graph neural network framework for single-cell RNA-Seq analyses.
Single-cell RNA-sequencing (scRNA-Seq) is widely used to reveal the heterogeneity and dynamics of tissues, organisms, and complex diseases, but its analyses still suffer from multiple grand challenges, including the sequencing sparsity and complex differential patterns in gene expression. We introduce the scGNN (single-cell graph neural network) to provide a hypothesis-free deep learning framework for scRNA-Seq analyses. This framework formulates and aggregates cell-cell relationships with graph neural networks and models heterogeneous gene expression patterns using a left-truncated mixture Gaussian model. scGNN integrates three iterative multi-modal autoencoders and outperforms existing tools for gene imputation and cell clustering on four benchmark scRNA-Seq datasets. In an Alzheimer’s disease study with 13,214 single nuclei from postmortem brain tissues, scGNN successfully illustrated disease-related neural development and the differential mechanism. scGNN provides an effective representation of gene expression and cell-cell relationships. It is also a powerful framework that can be applied to general scRNA-Seq analyses.
URL:
PSSM-Sumo: deep learning based intelligent model for prediction of sumoylation sites using discriminative features.
Post-translational modifications (PTMs) are fundamental to essential biological processes, exerting significant influence over gene expression, protein localization, stability, and genome replication. Sumoylation, a PTM involving the covalent addition of a chemical group to a specific protein sequence, profoundly impacts the functional diversity of proteins. Notably, identifying sumoylation sites has garnered significant attention due to their crucial roles in proteomic functions and their implications in various diseases, including Parkinson’s and Alzheimer’s. Although several computational models have been proposed for identifying sumoylation sites, their effectiveness remains constrained by the limitations of conventional learning methodologies. In this study, we introduce pseudo-position-specific scoring matrix (PsePSSM), a robust computational model designed for accurately predicting sumoylation sites using an optimized deep learning algorithm and efficient feature extraction techniques. Moreover, to streamline computational processes and eliminate irrelevant and noisy features, sequential forward selection using a support vector machine (SFS-SVM) is implemented to identify optimal features. The multi-layer Deep Neural Network (DNN) is a robust classifier, facilitating precise sumoylation site prediction. We meticulously assess the performance of PSSM-Sumo through a tenfold cross-validation approach, employing various statistical metrics such as the Matthews Correlation Coefficient (MCC), accuracy, sensitivity, specificity, and the Area under the ROC Curve (AUC). Comparative analyses reveal that PSSM-Sumo achieves an exceptional average prediction accuracy of 98.71%, surpassing existing models. The robustness and accuracy of the proposed model position it as a promising tool for advancing drug discovery and the diagnosis of diverse diseases linked to sumoylation sites.
URL:
A Mixed-Effects Model for Detecting Disrupted Connectivities in Heterogeneous Data.
The human brain is an amazingly complex network. Aberrant activities in this network can lead to various neurological disorders such as multiple sclerosis, Parkinson’s disease, Alzheimer’s disease, and autism. Functional magnetic resonance imaging has emerged as an important tool to delineate the neural networks affected by such diseases, particularly autism. In this paper, we propose a special type of mixed-effects model together with an appropriate procedure for controlling false discoveries to detect disrupted connectivities for developing a neural network in whole brain studies. Results are illustrated with a large data set known as the Autism Brain Imaging Data Exchange, which includes 361 subjects from eight medical centers.
URL:
Investigating the temporal pattern of neuroimaging-based brain age estimation as a biomarker for Alzheimer’s Disease related neurodegeneration.
Neuroimaging-based brain-age estimation via machine learning has emerged as an important new approach for studying brain aging. The difference between one’s estimated brain age and chronological age, the brain age gap (BAG), has been proposed as an Alzheimer’s Disease (AD) biomarker. However, most past studies on the BAG have been cross-sectional. Quantifying longitudinal changes in an individual’s BAG temporal pattern would likely improve prediction of AD progression and clinical outcome based on neurophysiological changes. To fill this gap, our study conducted predictive modeling using a large neuroimaging dataset with up to 8 years of follow-up to examine the temporal patterns of the BAG’s trajectory and how it varies by subject-level characteristics (sex, APOEe4 carriership) and disease status. Specifically, we explored the pattern and rate of change in BAG over time in individuals who remain stable with normal cognition or mild cognitive impairment (MCI), as well as individuals who progress to clinical AD. Combining multimodal imaging data in a support vector regression model to estimate brain age yielded improved performance over single modality. Multilevel modeling results showed the BAG followed a linear increasing trajectory with a significantly faster rate in individuals with MCI who progressed to AD compared to cognitively normal or MCI individuals who did not progress. The dynamic changes in the BAG during AD progression were further moderated by sex and APOEe4 carriership. Our findings demonstrate the BAG as a potential biomarker for understanding individual specific temporal patterns related to AD progression.
URL:
Dementia risk predictions from German claims data using methods of machine learning.
INTRODUCTION: We examined whether German claims data are suitable for dementia risk prediction, how machine learning (ML) compares to classical regression, and what the important predictors for dementia risk are. METHODS: We analyzed data from the largest German health insurance company, including 117,895 dementia-free people age 65+. Follow-up was 10 years. Predictors were: 23 age-related diseases, 212 medical prescriptions, 87 surgery codes, as well as age and sex. Statistical methods included logistic regression (LR), gradient boosting (GBM), and random forests (RFs). RESULTS: Discriminatory power was moderate for LR (C-statistic = 0.714; 95% confidence interval [CI] = 0.708-0.720) and GBM (C-statistic = 0.707; 95% CI = 0.700-0.713) and lower for RF (C-statistic = 0.636; 95% CI = 0.628-0.643). GBM had the best model calibration. We identified antipsychotic medications and cerebrovascular disease but also a less-established specific antibacterial medical prescription as important predictors. DISCUSSION: Our models from German claims data have acceptable accuracy and may provide cost-effective decision support for early dementia screening.
URL:
Improved DTI registration allows voxel-based analysis that outperforms tract-based spatial statistics.
Tract-Based Spatial Statistics (TBSS) is a popular software pipeline to coregister sets of diffusion tensor Fractional Anisotropy (FA) images for performing voxel-wise comparisons. It is primarily defined by its skeleton projection step intended to reduce effects of local misregistration. A white matter “skeleton” is computed by morphological thinning of the inter-subject mean FA, and then all voxels are projected to the nearest location on this skeleton. Here we investigate several enhancements to the TBSS pipeline based on recent advances in registration for other modalities, principally based on groupwise registration with the ANTS-SyN algorithm. We validate these enhancements using simulation experiments with synthetically-modified images. When used with these enhancements, we discover that TBSS’s skeleton projection step actually reduces algorithm accuracy, as the improved registration leaves fewer errors to warrant correction, and the effects of this projection’s compromises become stronger than those of its benefits. In our experiments, our proposed pipeline without skeleton projection is more sensitive for detecting true changes and has greater specificity in resisting false positives from misregistration. We also present comparative results of the proposed and traditional methods, both with and without the skeleton projection step, on three real-life datasets: two comparing differing populations of Alzheimer’s disease patients to matched controls, and one comparing progressive supranuclear palsy patients to matched controls. The proposed pipeline produces more plausible results according to each disease’s pathophysiology.
URL:
A novel method for gathering and prioritizing disease candidate genes based on construction of a set of disease-related MeSH terms.
BACKGROUND: Understanding the molecular mechanisms involved in disease is critical for the development of more effective and individualized strategies for prevention and treatment. The amount of disease-related literature, including new genetic information on the molecular mechanisms of disease, is rapidly increasing. Extracting beneficial information from literature can be facilitated by computational methods such as the knowledge-discovery approach. Several methods for mining gene-disease relationships using computational methods have been developed; however, there has been a lack of research evaluating specific disease candidate genes. RESULTS: We present a novel method for gathering and prioritizing specific disease candidate genes. Our approach involved the construction of a set of Medical Subject Headings (MeSH) terms for the effective retrieval of publications related to a disease candidate gene. Information regarding the relationships between genes and publications was obtained from the gene2pubmed database. The set of genes was prioritized using a “weighted literature score” based on the number of publications and weighted by the number of genes occurring in a publication. Using our method for the disease states of pain and Alzheimer’s disease, a total of 1101 pain candidate genes and 2810 Alzheimer’s disease candidate genes were gathered and prioritized. The precision was 0.30 and the recall was 0.89 in the case study of pain. The precision was 0.04 and the recall was 0.6 in the case study of Alzheimer’s disease. The precision-recall curve indicated that the performance of our method was superior to that of other publicly available tools. CONCLUSIONS: Our method, which involved the use of a set of MeSH terms related to disease candidate genes and a novel weighted literature score, improved the accuracy of gathering and prioritizing candidate genes by focusing on a specific disease.
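The scoring idea in the abstract above can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the weighting used here (each publication contributes 1 divided by the number of genes it mentions) is an assumption, and the function names and toy gene/publication pairs are hypothetical. The precision and recall helpers follow the standard definitions.

```python
from collections import defaultdict


def weighted_literature_scores(gene_pub_pairs):
    """Score genes from (gene, publication_id) pairs, gene2pubmed-style.

    ASSUMPTION: a publication mentioning n genes adds 1/n to each of
    them; the source abstract does not state the exact weighting.
    """
    genes_per_pub = defaultdict(set)
    for gene, pub_id in gene_pub_pairs:
        genes_per_pub[pub_id].add(gene)

    scores = defaultdict(float)
    for genes in genes_per_pub.values():
        weight = 1.0 / len(genes)  # publications listing many genes count less
        for gene in genes:
            scores[gene] += weight
    return dict(scores)


def precision_recall(predicted, known):
    """Standard precision and recall of a predicted gene set."""
    predicted, known = set(predicted), set(known)
    true_positives = len(predicted & known)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(known) if known else 0.0
    return precision, recall


# Hypothetical toy data: gene "A" appears in two publications, "B" in one.
pairs = [("A", 1), ("B", 1), ("A", 2)]
scores = weighted_literature_scores(pairs)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Ranking by this score favours genes that recur in publications focused on few genes, which is one plausible reading of "weighted by the number of genes occurring in a publication".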
URL:
Blood-based multivariate methylation risk score for cognitive impairment and dementia.
INTRODUCTION: The established link between DNA methylation and pathophysiology of dementia, along with its potential role as a molecular mediator of lifestyle and environmental influences, positions blood-derived DNA methylation as a promising tool for early dementia risk detection. METHODS: In conjunction with an extensive array of machine learning techniques, we employed whole blood genome-wide DNA methylation data as a surrogate for 14 modifiable and non-modifiable factors in the assessment of dementia risk in independent dementia cohorts. RESULTS: We established a multivariate methylation risk score (MMRS) for identifying mild cognitive impairment cross-sectionally, independent of age and sex (P = 2.0 × 10⁻³). This score significantly predicted the prospective development of cognitive impairments in independent studies of Alzheimer’s disease (hazard ratio for Rey’s Auditory Verbal Learning Test (RAVLT)-Learning = 2.47) and Parkinson’s disease (hazard ratio for MCI/dementia = 2.59). DISCUSSION: Our work shows the potential of employing blood-derived DNA methylation data in the assessment of dementia risk. HIGHLIGHTS: We used whole blood DNA methylation as a surrogate for 14 dementia risk factors. Created a multivariate methylation risk score for predicting cognitive impairment. Emphasized the role of machine learning and omics data in predicting dementia. The score predicts cognitive impairment development at the population level.
URL:
Significance of plasma p-tau217 in predicting long-term dementia risk in older community residents: Insights from machine learning approaches.
INTRODUCTION: Whether plasma biomarkers play roles in predicting incident dementia among the general population is worth exploring. METHODS: A total of 1857 baseline dementia-free older adults with follow-ups up to 13.5 years were included from a community-based cohort. The Recursive Feature Elimination (RFE) algorithm aided in feature selection from 90 candidate predictors to construct logistic regression, naive Bayes, bagged trees, and random forest models. Area under the curve (AUC) was used to assess the model performance for predicting incident dementia. RESULTS: During the follow-up of 12,716 person-years, 207 participants developed dementia. Four predictive models incorporating plasma p-tau217, age, and scores of MMSE, STICK, and AVLT exhibited AUCs ranging from 0.79 to 0.96 in testing datasets. These models maintained robustness across various subgroups and sensitivity analyses. DISCUSSION: Plasma p-tau217 outperforms most traditional variables and may be used to preliminarily screen older individuals at high risk of dementia. HIGHLIGHTS: Plasma p-tau217 showed comparable importance with age and cognitive tests in predicting incident dementia among community older adults. Machine learning models combining plasma p-tau217, age, and cognitive tests exhibited excellent performance in predicting incident dementia. The training models demonstrated robustness in subgroup and sensitivity analysis.
URL:
GCMM: graph convolution network based on multimodal attention mechanism for drug repurposing.
BACKGROUND: The main focus of in silico drug repurposing, which is a promising area for using artificial intelligence in drug discovery, is the prediction of drug-disease relationships. Although many computational models have been proposed recently, it is still difficult to reliably predict drug-disease associations from a variety of sources of data. RESULTS: In order to identify potential drug-disease associations, this paper introduces a novel end-to-end model called Graph convolution network based on a multimodal attention mechanism (GCMM). In particular, GCMM incorporates known drug-disease relations, drug-drug chemical similarity, drug-drug therapeutic similarity, disease-disease semantic similarity, and disease-disease target-based similarity into a heterogeneous network. A Graph Convolution Network encoder is used to learn how diseases and drugs are embedded in various perspectives. Additionally, GCMM can enhance performance by applying a multimodal attention layer that assigns different weights to the various features and multi-source inputs. CONCLUSION: Five-fold cross-validation evaluations show that GCMM outperforms four recently proposed deep-learning models on the majority of the criteria. It shows that GCMM can predict drug-disease relationships reliably and suggests improvement in the desired metrics. Hyper-parameter analysis and exploratory ablation experiments are also provided to demonstrate the necessity of each module of the model and the highest possible level of prediction performance. Additionally, a case study on Alzheimer’s disease (AD) is presented. Four of the five medications indicated by GCMM to have the highest potential correlation coefficient with AD have been demonstrated through literature or experimental research, demonstrating the viability of GCMM. All of these results imply that GCMM can provide a strong and effective tool for drug development and repositioning.
URL:
Evaluation of novel data-driven metrics of amyloid beta deposition for longitudinal PET studies.
PURPOSE: Positron emission tomography (PET) provides in vivo quantification of amyloid-beta (Abeta) pathology. Established methods for assessing Abeta burden can be affected by physiological and technical factors. Novel, data-driven metrics have been developed to account for these sources of variability. We aimed to evaluate the performance of four data-driven amyloid PET metrics against conventional techniques, using a common set of criteria. METHODS: Three cohorts were used for evaluation: Insight 46 (N=464, [18F]florbetapir), AIBL (N=277, [18F]flutemetamol), and an independent test-retest dataset (N=10, [18F]flutemetamol). Established metrics of amyloid tracer uptake included the Centiloid (CL) and, where dynamic data were available, the non-displaceable binding potential (BPND). The four data-driven metrics computed were the amyloid load (Abeta load), the Abeta PET pathology accumulation index (Abeta index), the Centiloid derived from non-negative matrix factorisation (CLNMF), and the amyloid pattern similarity score (AMPSS). These metrics were evaluated using reliability and repeatability in test-retest data, associations with BPND and CL, and sample size estimates to detect a 25% slowing in Abeta accumulation. RESULTS: All metrics showed good reliability. Abeta load, Abeta index and CLNMF were strongly associated with the BPND. The associations with CL suggest that cross-sectional measures of CLNMF, Abeta index and Abeta load are robust across studies. Sample size estimates for secondary prevention trial scenarios were the lowest for CLNMF and Abeta load compared to the CL. CONCLUSION: Among the novel data-driven metrics evaluated, the Abeta load, the Abeta index and the CLNMF can provide comparable performance to more established quantification methods of Abeta PET tracer uptake. The CLNMF and Abeta load could offer a more precise alternative to CL, although further studies in larger cohorts should be conducted.
URL:
EMBER multidimensional spectral microscopy enables quantitative determination of disease- and cell-specific amyloid strains.
In neurodegenerative diseases, proteins fold into amyloid structures with distinct conformations (strains) that are characteristic of different diseases. However, there is a need to rapidly identify amyloid conformations in situ. Here, we use machine learning on the full information available in fluorescent excitation/emission spectra of amyloid-binding dyes to identify six distinct conformational strains in vitro, as well as amyloid-beta (Abeta) deposits in different transgenic mouse models. Our EMBER (excitation multiplexed bright emission recording) imaging method rapidly identifies conformational differences in Abeta and tau deposits from Down syndrome, sporadic and familial Alzheimer’s disease human brain slices. EMBER has in situ identified distinct conformational strains of tau inclusions in astrocytes, oligodendrocytes, and neurons from Pick’s disease. In future studies, EMBER should enable high-throughput measurements of the fidelity of strain transmission in cellular and animal neurodegenerative diseases models, time course of amyloid strain propagation, and identification of pathogenic versus benign strains.
URL:
Artificial intelligence velocimetry reveals in vivo flow rates, pressure gradients, and shear stresses in murine perivascular flows.
Quantifying the flow of cerebrospinal fluid (CSF) is crucial for understanding brain waste clearance and nutrient delivery, as well as edema in pathological conditions such as stroke. However, existing in vivo techniques are limited to sparse velocity measurements in pial perivascular spaces (PVSs) or low-resolution measurements from brain-wide imaging. Additionally, volume flow rate, pressure, and shear stress variation in PVSs are essentially impossible to measure in vivo. Here, we show that artificial intelligence velocimetry (AIV) can integrate sparse velocity measurements with physics-informed neural networks to quantify CSF flow in PVSs. With AIV, we infer three-dimensional (3D), high-resolution velocity, pressure, and shear stress. Validation comes from training with 70% of PTV measurements and demonstrating close agreement with the remaining 30%. A sensitivity analysis on the AIV inputs shows that the uncertainty in AIV inferred quantities due to uncertainties in the PVS boundary locations inherent to in vivo imaging is less than 30%, and the uncertainty from the neural net initialization is less than 1%. In PVSs of N = 4 wild-type mice we find mean flow speed 16.33 ± 11.09 µm/s, volume flow rate 2.22 ± 1.983 × 10³ µm³/s, axial pressure gradient (−2.75 ± 2.01) × 10⁻⁴ Pa/µm (−2.07 ± 1.51 mmHg/m), and wall shear stress (3.00 ± 1.45) × 10⁻³ Pa (all mean ± SE). Pressure gradients, flow rates, and resistances agree with prior predictions. AIV infers in vivo PVS flows in remarkable detail, which will improve fluid dynamic models and potentially clarify how CSF flow changes with aging, Alzheimer’s disease, and small vessel disease.
URL:
Distinguishing early and late brain aging from the Alzheimer’s disease spectrum: consistent morphological patterns across independent samples.
Alzheimer’s disease (AD) is a debilitating age-related neurodegenerative disorder. Accurate identification of individuals at risk is complicated as AD shares cognitive and brain features with aging. We applied linked independent component analysis (LICA) on three complementary measures of gray matter structure: cortical thickness, area and gray matter density of 137 AD, 78 mild (MCI) and 38 subjective cognitive impairment patients, and 355 healthy adults aged 18-78 years to identify dissociable multivariate morphological patterns sensitive to age and diagnosis. Using the lasso classifier, we performed group classification and prediction of cognition and age at different age ranges to assess the sensitivity and diagnostic accuracy of the LICA patterns in relation to AD, as well as early and late healthy aging. Three components showed high sensitivity to the diagnosis and cognitive status of AD, with different relationships with age: one reflected an anterior-posterior gradient in thickness and gray matter density and was uniquely related to diagnosis, whereas the other two, reflecting widespread cortical thickness and medial temporal lobe volume, respectively, also correlated significantly with age. Repeating the LICA decomposition and between-subject analysis on ADNI data, including 186 AD, 395 MCI and 220 age-matched healthy controls, revealed largely consistent brain patterns and clinical associations across samples. Classification results showed that multivariate LICA-derived brain characteristics could be used to predict AD and age with high accuracy (area under ROC curve up to 0.93 for classification of AD from controls). Comparison between classifiers based on feature ranking and feature selection suggests both common and unique feature sets implicated in AD and aging, and provides evidence of distinct age-related differences in early compared to late aging.
URL:
Multi-atlas segmentation of the whole hippocampus and subfields using multiple automatically generated templates.
INTRODUCTION: Advances in image segmentation of magnetic resonance images (MRI) have demonstrated that multi-atlas approaches improve segmentation over regular atlas-based approaches. These approaches often rely on a large number of manually segmented atlases (e.g. 30-80) that take significant time and expertise to produce. We present an algorithm, MAGeT-Brain (Multiple Automatically Generated Templates), for the automatic segmentation of the hippocampus that minimises the number of atlases needed whilst still achieving similar agreement to multi-atlas approaches. Thus, our method acts as a reliable multi-atlas approach when using special or hard-to-define atlases that are laborious to construct. METHOD: MAGeT-Brain works by propagating atlas segmentations to a template library, formed from a subset of target images, via transformations estimated by nonlinear image registration. The resulting segmentations are then propagated to each target image and fused using a label fusion method. We conduct two separate Monte Carlo cross-validation experiments comparing MAGeT-Brain and basic multi-atlas whole hippocampal segmentation using differing atlas and template library sizes, and registration and label fusion methods. The first experiment is a 10-fold validation (per parameter setting) over 60 subjects taken from the Alzheimer’s Disease Neuroimaging Database (ADNI), and the second is a five-fold validation over 81 subjects having had a first episode of psychosis. In both cases, automated segmentations are compared with manual segmentations following the Pruessner-protocol. Using the best settings found from these experiments, we segment 246 images of the ADNI1:Complete 1Yr 1.5 T dataset and compare these with segmentations from existing automated and semi-automated methods: FSL FIRST, FreeSurfer, MAPER, and SNT. 
Finally, we conduct a leave-one-out cross-validation of hippocampal subfield segmentation in standard 3T T1-weighted images, using five high-resolution manually segmented atlases (Winterburn et al., 2013). RESULTS: In the ADNI cross-validation, using 9 atlases MAGeT-Brain achieves a mean Dice’s Similarity Coefficient (DSC) score of 0.869 with respect to manual whole hippocampus segmentations, and also exhibits significantly lower variability in DSC scores than multi-atlas segmentation. In the younger, psychosis dataset, MAGeT-Brain achieves a mean DSC score of 0.892 and produces volumes which agree with manual segmentation volumes better than those produced by the FreeSurfer and FSL FIRST methods (mean difference in volume: 80 mm³, 1600 mm³, and 800 mm³, respectively). Similarly, in the ADNI1:Complete 1Yr 1.5 T dataset, MAGeT-Brain produces hippocampal segmentations well correlated (r>0.85) with SNT semi-automated reference volumes within disease categories, and shows a conservative bias and a mean difference in volume of 250 mm³ across the entire dataset, compared with FreeSurfer and FSL FIRST which both overestimate volume differences by 2600 mm³ and 2800 mm³ on average, respectively. Finally, MAGeT-Brain segments the CA1, CA4/DG and subiculum subfields on standard 3T T1-weighted resolution images with DSC overlap scores of 0.56, 0.65, and 0.58, respectively, relative to manual segmentations. CONCLUSION: We demonstrate that MAGeT-Brain produces consistent whole hippocampal segmentations using only 9 atlases, or fewer, with various hippocampal definitions, disease populations, and image acquisition types. Additionally, we show that MAGeT-Brain identifies hippocampal subfields in standard 3T T1-weighted images with overlap scores comparable to competing methods.
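For readers unfamiliar with the overlap score reported above, the Dice similarity coefficient between two segmentations A and B is 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch of that standard definition, not the MAGeT-Brain implementation; representing a segmentation as a set of voxel coordinates, and the toy voxel sets themselves, are assumptions made for illustration.

```python
def dice_coefficient(seg_a, seg_b):
    """Dice similarity coefficient of two segmentations.

    Each segmentation is given as an iterable of voxel coordinates
    (a hypothetical representation chosen for this sketch).
    Returns a value between 0 (no overlap) and 1 (identical).
    """
    a, b = set(seg_a), set(seg_b)
    if not a and not b:
        return 1.0  # convention: two empty segmentations agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical toy example: two 3-voxel labels sharing 2 voxels.
automatic = {(0, 0, 0), (0, 0, 1), (0, 1, 1)}
manual = {(0, 0, 1), (0, 1, 1), (1, 1, 1)}
overlap = dice_coefficient(automatic, manual)  # 2 * 2 / (3 + 3)
```

Because the score normalises by the combined size of both labels, small structures such as hippocampal subfields tend to yield lower values than the whole hippocampus for the same absolute boundary error.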
URL:
Integrating longitudinal information in hippocampal volume measurements for the early detection of Alzheimer’s disease.
BACKGROUND: Structural MRI measures for monitoring Alzheimer’s Disease (AD) progression are becoming instrumental in clinical practice, and more so in the context of longitudinal studies. This investigation addresses the impact of four image analysis approaches on the longitudinal performance of the hippocampal volume. METHODS: We present a hippocampal segmentation algorithm and validate it on a gold-standard manual tracing database. We segmented 460 subjects from ADNI, each subject having been scanned twice at baseline and at the 12-month and 24-month follow-up scans (1.5T, T1 MRI). We used the bilateral hippocampal volume v and its variation, measured as the annualized volume change Λ = Δv/year (mm³/y). Four processing approaches with different complexity are compared to maximize the longitudinal information, and they are tested for cohort discrimination ability. Reference cohorts are Controls vs. Alzheimer’s Disease (CTRL/AD) and CTRL vs. Mild Cognitive Impairment who subsequently progressed to AD dementia (CTRL/MCI-co). We discuss the conditions on v and the added value of Λ in discriminating subjects. RESULTS: The age-corrected bilateral annualized atrophy rates (%/year) were: -1.6 (0.6) for CTRL, -2.2 (1.0) for MCI-nc, -3.2 (1.2) for MCI-co and -4.0 (1.5) for AD. Combined (v, Λ) discrimination ability gave an area under the ROC curve (AUC) = 0.93 for CTRL vs AD and AUC = 0.88 for CTRL vs MCI-co. CONCLUSIONS: Longitudinal volume measurements can provide meaningful clinical insight and added value with respect to the baseline provided the analysis procedure embeds the longitudinal information.
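The annualized volume change described above (volume difference divided by elapsed time in years) and the corresponding percentage atrophy rate can be computed as in the sketch below. This is a plain two-timepoint calculation under stated assumptions: it omits the age correction and the four processing approaches the abstract compares, and the function names and example volumes are hypothetical.

```python
def annualized_volume_change(v_baseline_mm3, v_followup_mm3, months_elapsed):
    """Annualized volume change in mm^3 per year from two measurements."""
    if months_elapsed <= 0:
        raise ValueError("months_elapsed must be positive")
    years = months_elapsed / 12.0
    return (v_followup_mm3 - v_baseline_mm3) / years


def annualized_atrophy_rate_percent(v_baseline_mm3, v_followup_mm3, months_elapsed):
    """Annualized change as a percentage of the baseline volume (%/year)."""
    change = annualized_volume_change(v_baseline_mm3, v_followup_mm3, months_elapsed)
    return 100.0 * change / v_baseline_mm3


# Hypothetical volumes: 4000 mm^3 at baseline, 3840 mm^3 after 24 months.
change_per_year = annualized_volume_change(4000.0, 3840.0, 24)         # mm^3/year
rate_percent = annualized_atrophy_rate_percent(4000.0, 3840.0, 24)     # %/year
```

Negative values indicate volume loss, matching the sign convention of the atrophy rates reported in the abstract.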
URL:
Predicting continuous amyloid PET values with CSF tau phosphorylation occupancies.
INTRODUCTION: Cerebrospinal fluid (CSF) tau phosphorylation at multiple sites is associated with cortical amyloid and other pathologic changes in Alzheimer’s disease. These relationships can be non-linear. We used an artificial neural network to assess the ability of 10 different CSF tau phosphorylation sites to predict continuous amyloid positron emission tomography (PET) values. METHODS: CSF tau phosphorylation occupancies at 10 sites (including pT181/T181, pT217/T217, pT231/T231 and pT205/T205) were measured by mass spectrometry in 346 individuals (57 cognitively impaired, 289 cognitively unimpaired). We generated synthetic amyloid PET scans using biomarkers and evaluated their performance. RESULTS: Concentration of CSF pT217/T217 had low predictive error (average error: 13%), but also a low predictive range (ceiling 63 Centiloids). CSF pT231/T231 had slightly higher error (average error: 19%) but predicted through a greater range (87 Centiloids). DISCUSSION: Tradeoffs exist in biomarker selection. Some phosphorylation sites offer greater concordance with amyloid PET at lower levels, while others perform better over a greater range. HIGHLIGHTS: Novel pTau isoforms can predict cortical amyloid burden. pT217/T217 accurately predicts cortical amyloid burden in low-amyloid individuals. Traditional CSF biomarkers correspond with higher levels of amyloid.
URL:
Home monitoring of daily living activities and prediction of agitation risk in a cohort of people living with dementia.
BACKGROUND: People living with dementia (PLWD) have an increased susceptibility to developing adverse physical and psychological events. Internet of Things (IoT) technologies provide new ways to remotely monitor patients within the comfort of their homes, which is particularly important for the timely delivery of appropriate healthcare. Presented here are data collated as part of the on-going UK Dementia Research Institute’s Care Research and Technology Centre cohort and Technology Integrated Health Management (TIHM) study. There are two main aims to this work: first, to investigate the effect of the COVID-19 quarantine on the performance of daily living activities of PLWD, on which there is currently little research; and second, to create a simple classification model capable of effectively predicting agitation risk in PLWD, allowing for the generation of alerts with actionable information by which to prevent such outcomes. METHOD: A within-subject, date-matched study was conducted on daily living activity data using the first COVID-19 quarantine as a natural experiment. Supervised machine learning approaches were then applied to combined physiological and environmental data to create two simple classification models: a single marker model trained using ambient temperature as a feature, and a multi-marker model using ambient temperature, body temperature, movement, and entropy as features. RESULT: There are 102 PLWD total included in the dataset, with all patients having an established diagnosis of dementia, but with ranging types and severity. The COVID-19 study was carried out on a sub-group of 21 patient households. In 2020, PLWD had a significant increase in daily household activity (p = 1.40e-08, one-way repeated measures ANOVA). Moreover, there was a significant interaction between the pandemic quarantine and patient gender on night-time bed-occupancy duration (p = 3.00e-02, two-way mixed-effect ANOVA). 
On evaluating the models using 10-fold cross validation, both the single and multi-marker model were shown to balance precision and recall well, having F1-scores of 0.80 and 0.66, respectively. CONCLUSION: Remote monitoring technologies provide a continuous and reliable way of monitoring patient day-to-day wellbeing. The application of statistical analyses and machine learning algorithms to combined physiological and environmental data has huge potential to positively impact the delivery of healthcare for PLWD.
URL:
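The home-monitoring abstract above reports F1-scores from 10-fold cross validation without giving the underlying counts. As a minimal sketch of how such a figure is typically obtained, the snippet below computes F1 (the harmonic mean of precision and recall) per fold and averages it. The function names and the per-fold counts are hypothetical illustrations, not values or code from the study.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from true positives, false positives and false negatives."""
    predicted_pos = tp + fp  # everything the model flagged as "agitation"
    actual_pos = tp + fn     # everything that really was "agitation"
    if predicted_pos == 0 or actual_pos == 0:
        return 0.0
    precision = tp / predicted_pos
    recall = tp / actual_pos
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mean_f1_over_folds(fold_counts):
    """Average the per-fold F1 over an iterable of (tp, fp, fn) tuples."""
    scores = [f1_score(tp, fp, fn) for tp, fp, fn in fold_counts]
    return sum(scores) / len(scores)


# Hypothetical counts for two folds (assumption, for illustration only).
example_folds = [(8, 2, 2), (6, 2, 2)]
print(mean_f1_over_folds(example_folds))
```

Averaging per-fold scores is only one convention; pooling the counts across folds before computing F1 is another and can give a slightly different number.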
Characterizing the Clinical Features and Atrophy Patterns of MAPT-Related Frontotemporal Dementia With Disease Progression Modeling.
BACKGROUND AND OBJECTIVE: Mutations in the MAPT gene cause frontotemporal dementia (FTD). Most previous studies investigating the neuroanatomical signature of MAPT mutations have grouped all different mutations together and shown an association with focal atrophy of the temporal lobe. The variability in atrophy patterns between each particular MAPT mutation is less well-characterized. We aimed to investigate whether there were distinct groups of MAPT mutation carriers based on their neuroanatomical signature. METHODS: We applied Subtype and Stage Inference (SuStaIn), an unsupervised machine learning technique that identifies groups of individuals with distinct progression patterns, to characterize patterns of regional atrophy in MAPT-associated FTD within the Genetic FTD Initiative (GENFI) cohort study. RESULTS: Eighty-two MAPT mutation carriers were analyzed, the majority of whom had P301L, IVS10+16, or R406W mutations, along with 48 healthy noncarriers. SuStaIn identified 2 groups of MAPT mutation carriers with distinct atrophy patterns: a temporal subtype, in which atrophy was most prominent in the hippocampus, amygdala, temporal cortex, and insula; and a frontotemporal subtype, in which atrophy was more localized to the lateral temporal lobe and anterior insula, as well as the orbitofrontal and ventromedial prefrontal cortex and anterior cingulate. There was one-to-one mapping between IVS10+16 and R406W mutations and the temporal subtype and near one-to-one mapping between P301L mutations and the frontotemporal subtype. There were differences in clinical symptoms and neuropsychological test scores between subtypes: the temporal subtype was associated with amnestic symptoms, whereas the frontotemporal subtype was associated with executive dysfunction. 
CONCLUSION: Our results demonstrate that different MAPT mutations give rise to distinct atrophy patterns and clinical phenotypes, providing insights into the underlying disease biology and potential utility for patient stratification in therapeutic trials.
URL:
TCMFP: a novel herbal formula prediction method based on network target’s score integrated with semi-supervised learning genetic algorithms.
Traditional Chinese medicine (TCM) has accumulated thousands of years of knowledge in herbal therapy, but the use of herbal formulas is still characterized by reliance on personal experience. Due to the complex mechanism of herbal actions, it is challenging to discover effective herbal formulas for diseases by integrating traditional experience with the modern pharmacological mechanisms of multi-target interactions. In this study, we propose a herbal formula prediction approach (TCMFP) that combines the therapeutic experience of TCM with artificial intelligence and network science algorithms to efficiently screen optimal herbal formulas for diseases. It integrates a herb score (Hscore) based on the importance of network targets, a pair score (Pscore) based on empirical learning, and a herbal formula predictive score (FmapScore) based on intelligent optimization and a genetic algorithm. The validity of Hscore, Pscore and FmapScore was verified by functional similarity and network topological evaluation. Moreover, TCMFP was used successfully to generate herbal formulae for three diseases, i.e. Alzheimer’s disease, asthma and atherosclerosis. Functional enrichment and network analysis indicate the efficacy of targets for the predicted optimal herbal formula. The proposed TCMFP may provide a new strategy for the optimization of herbal formulas, TCM herbal therapy and drug development.
URL:
Deep5hmC: Predicting genome-wide 5-Hydroxymethylcytosine landscape via a multimodal deep learning model.
MOTIVATION: 5-hydroxymethylcytosine (5hmC), a crucial epigenetic mark with a significant role in regulating tissue-specific gene expression, is essential for understanding the dynamic functions of the human genome. Despite its importance, predicting 5hmC modification across the genome remains a challenging task, especially when considering the complex interplay between DNA sequences and various epigenetic factors such as histone modifications and chromatin accessibility. RESULTS: Using tissue-specific 5hmC sequencing data, we introduce Deep5hmC, a multimodal deep learning framework that integrates both the DNA sequence and epigenetic features such as histone modification and chromatin accessibility to predict genome-wide 5hmC modification. The multimodal design of Deep5hmC demonstrates remarkable improvement in predicting both qualitative and quantitative 5hmC modification compared to unimodal versions of Deep5hmC and state-of-the-art machine learning methods. This improvement is demonstrated through benchmarking on a comprehensive set of 5hmC sequencing data collected at four developmental stages during forebrain organoid development and across 17 human tissues. Compared to DeepSEA and random forest, Deep5hmC achieves close to 4% and 17% improvement of AUROC across four forebrain developmental stages, and 6% and 27% across 17 human tissues for predicting binary 5hmC modification sites; and 8% and 22% improvement of Spearman correlation coefficient across four forebrain developmental stages, and 17% and 30% across 17 human tissues for predicting continuous 5hmC modification. Notably, Deep5hmC showcases its practical utility by accurately predicting gene expression and identifying differentially hydroxymethylated regions in a case-control study of Alzheimer’s disease. Deep5hmC significantly improves our understanding of tissue-specific gene regulation and facilitates the development of new biomarkers for complex diseases.
AVAILABILITY AND IMPLEMENTATION: Deep5hmC is available via https://github.com/lichen-lab/Deep5hmC. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
URL: https://github.com/lichen-lab/Deep5hmC
Screening for functional transcriptional and splicing regulatory variants with GenIE.
Genome-wide association studies (GWAS) have identified numerous genetic loci underlying human diseases, but a fundamental challenge remains to accurately identify the underlying causal genes and variants. Here, we describe an arrayed CRISPR screening method, Genome engineering-based Interrogation of Enhancers (GenIE), which assesses the effects of defined alleles on transcription or splicing when introduced in their endogenous genomic locations. We use this sensitive assay to validate the activity of transcriptional enhancers and splice regulatory elements in human induced pluripotent stem cells (hiPSCs), and develop a software package (rgenie) to analyse the data. We screen the 99% credible set of Alzheimer’s disease (AD) GWAS variants identified at the clusterin (CLU) locus to identify a subset of likely causal variants, and employ GenIE to understand the impact of specific mutations on splicing efficiency. We thus establish GenIE as an efficient tool to rapidly screen for the role of transcribed variants on gene expression.
URL:
deepDR: a network-based deep learning approach to in silico drug repositioning.
MOTIVATION: Traditional drug discovery and development are often time-consuming and high risk. Repurposing/repositioning of approved drugs offers a relatively low-cost and high-efficiency approach toward rapid development of efficacious treatments. The emergence of large-scale, heterogeneous biological networks has offered unprecedented opportunities for developing in silico drug repositioning approaches. However, capturing highly non-linear, heterogeneous network structures has been challenging for most existing drug repositioning approaches. RESULTS: In this study, we developed a network-based deep-learning approach, termed deepDR, for in silico drug repurposing by integrating 10 networks: one drug-disease, one drug-side-effect, one drug-target and seven drug-drug networks. Specifically, deepDR learns high-level features of drugs from the heterogeneous networks by a multi-modal deep autoencoder. Then the learned low-dimensional representation of drugs together with clinically reported drug-disease pairs are encoded and decoded collectively via a variational autoencoder to infer candidate indications for approved drugs, i.e. diseases for which they were not originally approved. We found that deepDR achieved high performance [the area under receiver operating characteristic curve (AUROC) = 0.908], outperforming conventional network-based or machine learning-based approaches. Importantly, deepDR-predicted drug-disease associations were validated by the ClinicalTrials.gov database (AUROC = 0.826) and we showcased several novel deepDR-predicted approved drugs for Alzheimer’s disease (e.g. risperidone and aripiprazole) and Parkinson’s disease (e.g. methylphenidate and pergolide). AVAILABILITY AND IMPLEMENTATION: Source code and data can be downloaded from https://github.com/ChengF-Lab/deepDR. SUPPLEMENTARY INFORMATION: Supplementary data are available online at Bioinformatics.
URL: https://github.com/ChengF-Lab/deepDR
Predictive value of ATN biomarker profiles in estimating disease progression in Alzheimer’s disease dementia.
We aimed to evaluate the value of ATN biomarker classification system (amyloid beta [A], pathologic tau [T], and neurodegeneration [N]) for predicting conversion from mild cognitive impairment (MCI) to dementia. In a sample of people with MCI (n = 415) we assessed predictive performance of ATN classification using empirical knowledge-based cut-offs for each component of ATN and compared it to two data-driven approaches, logistic regression and RUSBoost machine learning classifiers, which used continuous clinical or biomarker scores. In data-driven approaches, we identified ATN features that distinguish normals from individuals with dementia and used them to classify persons with MCI into dementia-like and normal groups. Both data-driven classification methods performed better than the empirical cut-offs for ATN biomarkers in predicting conversion to dementia. Classifiers that used clinical features performed as well as classifiers that used ATN biomarkers for prediction of progression to dementia. We discuss that data-driven modeling approaches can improve our ability to predict disease progression and might have implications in future clinical trials.
URL:
Three-dimensional histology reveals dissociable human hippocampal long-axis gradients of Alzheimer’s pathology.
INTRODUCTION: Three-dimensional (3D) histology analyses are essential to overcome sampling variability and understand pathological differences beyond the dissection axis. We present Path2MR, the first pipeline allowing 3D reconstruction of sparse human histology without a magnetic resonance imaging (MRI) reference. We implemented Path2MR with post-mortem hippocampal sections to explore pathology gradients in Alzheimer’s disease. METHODS: Blockface photographs of brain hemisphere slices are used for 3D reconstruction, from which an MRI-like image is generated using machine learning. Histology sections are aligned to the reconstructed hemisphere and subsequently to an atlas in standard space. RESULTS: Path2MR successfully registered histological sections to their anatomic position along the hippocampal longitudinal axis. Combined with histopathology quantification, we found an expected peak of tau pathology at the anterior end of the hippocampus, whereas amyloid-beta (Abeta) displayed a quadratic anterior-posterior distribution. CONCLUSION: Path2MR, which enables 3D histology using any brain bank data set, revealed significant differences along the hippocampus between tau and Abeta. HIGHLIGHTS: Path2MR enables three-dimensional (3D) brain reconstruction from blockface dissection photographs. This pipeline does not require dense specimen sampling or a subject-specific magnetic resonance (MR) image. Anatomically consistent mapping of hippocampal sections was obtained with Path2MR. Our analyses revealed an anterior-posterior gradient of hippocampal tau pathology. In contrast, the peak of amyloid-beta (Abeta) deposition was closer to the hippocampal body.
URL:
Comparing machine learning-derived MRI-based and blood-based neurodegeneration biomarkers in predicting syndromal conversion in early AD.
INTRODUCTION: We compared the machine learning-derived, MRI-based Alzheimer’s disease (AD) resemblance atrophy index (AD-RAI) with plasma neurofilament light chain (NfL) level in predicting conversion of early AD among cognitively unimpaired (CU) and mild cognitive impairment (MCI) subjects. METHODS: We recruited participants from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) who had the following data: clinical features (age, gender, education, Montreal Cognitive Assessment [MoCA]), structural MRI, plasma biomarkers (p-tau181, NfL), cerebrospinal fluid (CSF) biomarkers (Abeta42, p-tau181), and apolipoprotein E (APOE) epsilon4 genotype. We defined AD using CSF Abeta42 (A+) and p-tau181 (T+). We defined conversion (C+) if a subject progressed to the next syndromal stage within 4 years. RESULTS: Of 589 participants, 96 (16.3%) were A+T+C+. AD-RAI performed better than plasma NfL when added on top of clinical features, plasma p-tau181, and APOE epsilon4 genotype (area under the curve [AUC] = 0.832 vs. AUC = 0.650 among CU, AUC = 0.853 vs. AUC = 0.805 among MCI) in predicting A+T+C+. DISCUSSION: AD-RAI outperformed plasma NfL in predicting syndromal conversion of early AD. HIGHLIGHTS: AD-RAI outperformed plasma NfL in predicting syndromal conversion among early AD. AD-RAI showed better metrics than volumetric hippocampal measures in predicting syndromal conversion. Combining clinical features, plasma p-tau181 and apolipoprotein E (APOE) with AD-RAI is the best model for predicting syndromal conversion.
URL:
Combinatorial identification of DNA methylation patterns over age in the human brain.
BACKGROUND: DNA methylation plays a key role in developmental processes, which is reflected in changing methylation patterns at specific CpG sites over the lifetime of an individual. The underlying mechanisms are complex and possibly affect multiple genes or entire pathways. RESULTS: We applied a multivariate approach to identify combinations of CpG sites that undergo modifications when transitioning between developmental stages. Monte Carlo feature selection produced a list of ranked and statistically significant CpG sites, while rule-based models allowed for identifying particular methylation changes in these sites. Our rule-based classifier reports combinations of CpG sites, together with changes in their methylation status in the form of easy-to-read IF-THEN rules, which allows for identification of the genes associated with the underlying sites. CONCLUSION: We utilized machine learning and statistical methods to discretize decision class (age) values to get a general pattern of methylation changes over the lifespan. The CpG sites present in the significant rules were annotated to genes involved in brain formation, general development, as well as genes linked to cancer and Alzheimer’s disease.
URL:
Multivariate word properties in fluency tasks reveal markers of Alzheimer’s dementia.
INTRODUCTION: Verbal fluency tasks are common in Alzheimer’s disease (AD) assessments. Yet, standard valid response counts fail to reveal disease-specific semantic memory patterns. Here, we leveraged automated word-property analysis to capture neurocognitive markers of AD vis-a-vis behavioral variant frontotemporal dementia (bvFTD). METHODS: Patients and healthy controls completed two fluency tasks. We counted valid responses and computed each word’s frequency, granularity, neighborhood, length, familiarity, and imageability. These features were used for group-level discrimination, patient-level identification, and correlations with executive and neural (magnetic resonance imaging [MRI], functional MRI [fMRI], electroencephalography [EEG]) patterns. RESULTS: Valid responses revealed deficits in both disorders. Conversely, frequency, granularity, and neighborhood yielded robust group- and subject-level discrimination only in AD, also predicting executive outcomes. Disease-specific cortical thickness patterns were predicted by frequency in both disorders. Default-mode and salience network hypoconnectivity, and EEG beta hypoconnectivity, were predicted by frequency and granularity only in AD. DISCUSSION: Word-property analysis of fluency can boost AD characterization and diagnosis. HIGHLIGHTS: We report novel word-property analyses of verbal fluency in AD and bvFTD. Standard valid response counts captured deficits and brain patterns in both groups. Specific word properties (e.g., frequency, granularity) were altered only in AD. Such properties predicted cognitive and neural (MRI, fMRI, EEG) patterns in AD. Word-property analysis of fluency can boost AD characterization and diagnosis.
URL:
Spatially and temporally probing distinctive glycerophospholipid alterations in Alzheimer’s disease mouse brain via high-resolution ion mobility-enabled sn-position resolved lipidomics.
Dysregulated glycerophospholipid (GP) metabolism in the brain is associated with the progression of neurodegenerative diseases including Alzheimer’s disease (AD). Routine liquid chromatography-mass spectrometry (LC-MS)-based large-scale lipidomic methods often fail to elucidate subtle yet important structural features such as sn-position, hindering the precise interrogation of GP molecules. Leveraging high-resolution demultiplexing (HRdm) ion mobility spectrometry (IMS), we develop a four-dimensional (4D) lipidomic strategy to resolve GP sn-position isomers. We further construct a comprehensive experimental 4D GP database of 498 GPs identified from the mouse brain and an in-depth extended 4D library of 2500 GPs predicted by machine learning, enabling automated profiling of GPs with detailed acyl chain sn-position assignment. Analyzing three mouse brain regions (hippocampus, cerebellum, and cortex), we successfully identify a total of 592 GPs including 130 pairs of sn-position isomers. Further temporal GPs analysis in the three functional brain regions illustrates their metabolic alterations in AD progression.
URL:
Predicting brain age from functional connectivity in symptomatic and preclinical Alzheimer disease.
“Brain-predicted age” quantifies apparent brain age compared to normative neuroimaging trajectories. Advanced brain-predicted age has been well established in symptomatic Alzheimer disease (AD), but is underexplored in preclinical AD. Prior brain-predicted age studies have typically used structural MRI, but resting-state functional connectivity (FC) remains underexplored. Our model predicted age from FC in 391 cognitively normal, amyloid-negative controls (ages 18-89). We applied the trained model to 145 amyloid-negative, 151 preclinical AD, and 156 symptomatic AD participants to test group differences. The model accurately predicted age in the training set. FC-predicted brain age gaps (FC-BAG) were significantly older in symptomatic AD and significantly younger in preclinical AD compared to controls. There was minimal correspondence between networks predictive of age and AD. Elevated FC-BAG may reflect network disruption during symptomatic AD. Reduced FC-BAG in preclinical AD was opposite to the expected direction, and may reflect a biphasic response to preclinical AD pathology or may be driven by inconsistency between age-related vs. AD-related networks. Overall, FC-predicted brain age may be a sensitive AD biomarker.
URL:
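The brain-age abstract above compares FC-predicted brain age gaps (FC-BAG) between groups. As a minimal sketch, assuming the common convention that the gap is predicted age minus chronological age (the abstract does not spell out the formula), the snippet below computes a per-participant gap and a group mean. All function names and ages are hypothetical.

```python
def brain_age_gap(predicted_age: float, chronological_age: float) -> float:
    """Positive gap: the brain appears older than the person's actual age."""
    return predicted_age - chronological_age


def mean_gap(pairs):
    """Mean gap over an iterable of (predicted_age, chronological_age) pairs."""
    gaps = [brain_age_gap(pred, chron) for pred, chron in pairs]
    return sum(gaps) / len(gaps)


# Hypothetical participants (assumption, for illustration only).
symptomatic = [(78.0, 72.0), (81.5, 77.5)]
preclinical = [(66.0, 69.0), (70.0, 71.0)]
print(mean_gap(symptomatic), mean_gap(preclinical))
```

Under this convention, the "older" gaps reported for symptomatic AD correspond to positive group means relative to controls, and the "younger" gaps in preclinical AD to negative ones.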
CosGeneGate selects multi-functional and credible biomarkers for single-cell analysis.
MOTIVATION: Selecting representative genes or marker genes to distinguish cell types is an important task in single-cell sequencing analysis. Although many methods have been proposed to select marker genes, the selected genes may be redundant and/or may not show the cell-type-specific expression patterns needed to distinguish cell types. RESULTS: Here, we present a novel model, named CosGeneGate, for more effective marker gene selection. CosGeneGate combines the advantages of selecting marker genes based on both cell-type classification accuracy and marker-gene-specific expression patterns. Using both public and newly sequenced datasets, we demonstrate that the marker genes selected by CosGeneGate perform better than those of existing methods in various downstream analyses. The non-redundant marker genes identified by CosGeneGate for major cell types and tissues in humans can be found at the website as follows: https://github.com/VivLon/CosGeneGate/blob/main/marker gene list.xlsx.
URL: https://github.com/VivLon/CosGeneGate/blob/main/marker
The Brain Chart of Aging: Machine-learning analytics reveals links between brain aging, white matter disease, amyloid burden, and cognition in the iSTAGING consortium of 10,216 harmonized MR scans.
INTRODUCTION: Relationships between brain atrophy patterns of typical aging and Alzheimer’s disease (AD), white matter disease, cognition, and AD neuropathology were investigated via machine learning in a large harmonized magnetic resonance imaging database (11 studies; 10,216 subjects). METHODS: Three brain signatures were calculated: Brain-age, AD-like neurodegeneration, and white matter hyperintensities (WMHs). Brain Charts measured and displayed the relationships of these signatures to cognition and molecular biomarkers of AD. RESULTS: WMHs were associated with advanced brain aging, AD-like atrophy, poorer cognition, and AD neuropathology in mild cognitive impairment (MCI)/AD and cognitively normal (CN) subjects. High WMH volume was associated with brain aging and cognitive decline occurring in a 10-year period in CN subjects. WMHs were associated with doubling the likelihood of amyloid beta (Abeta) positivity after age 65. Brain aging, AD-like atrophy, and WMHs were better predictors of cognition than chronological age in MCI/AD. DISCUSSION: A Brain Chart quantifying brain-aging trajectories was established, enabling the systematic evaluation of individuals’ brain-aging patterns relative to this large consortium.
URL:
Deep neural network heatmaps capture Alzheimer’s disease patterns reported in a large meta-analysis of neuroimaging studies.
Deep neural networks currently provide the most advanced and accurate machine learning models to distinguish between structural MRI scans of subjects with Alzheimer’s disease and healthy controls. Unfortunately, the subtle brain alterations captured by these models are difficult to interpret because of the complexity of these multi-layer and non-linear models. Several heatmap methods have been proposed to address this issue and analyze the imaging patterns extracted from the deep neural networks, but no quantitative comparison between these methods has been carried out so far. In this work, we explore these questions by deriving heatmaps from Convolutional Neural Networks (CNN) trained using T1 MRI scans of the ADNI data set and by comparing these heatmaps with brain maps corresponding to Support Vector Machine (SVM) activation patterns. Three prominent heatmap methods are studied: Layer-wise Relevance Propagation (LRP), Integrated Gradients (IG), and Guided Grad-CAM (GGC). Contrary to prior studies where the quality of heatmaps was visually or qualitatively assessed, we obtained precise quantitative measures by computing overlap with a ground-truth map from a large meta-analysis that combined 77 voxel-based morphometry (VBM) studies independently from ADNI. Our results indicate that all three heatmap methods were able to capture brain regions covering the meta-analysis map and achieved better results than SVM activation patterns. Among them, IG produced the heatmaps with the best overlap with the independent meta-analysis.
URL:
ROSE: A Retinal OCT-Angiography Vessel Segmentation Dataset and New Model.
Optical Coherence Tomography Angiography (OCTA) is a non-invasive imaging technique that has been increasingly used to image the retinal vasculature at capillary level resolution. However, automated segmentation of retinal vessels in OCTA has been under-studied due to various challenges such as low capillary visibility and high vessel complexity, despite its significance in understanding many vision-related diseases. In addition, there is no publicly available OCTA dataset with manually graded vessels for training and validation of segmentation algorithms. To address these issues, for the first time in the field of retinal image analysis we construct a dedicated Retinal OCTA SEgmentation dataset (ROSE), which consists of 229 OCTA images with vessel annotations at either centerline-level or pixel level. This dataset with the source code has been released for public access to assist researchers in the community in undertaking research in related topics. Secondly, we introduce a novel split-based coarse-to-fine vessel segmentation network for OCTA images (OCTA-Net), with the ability to detect thick and thin vessels separately. In the OCTA-Net, a split-based coarse segmentation module is first utilized to produce a preliminary confidence map of vessels, and a split-based refined segmentation module is then used to optimize the shape/contour of the retinal microvasculature. We perform a thorough evaluation of the state-of-the-art vessel segmentation models and our OCTA-Net on the constructed ROSE dataset. The experimental results demonstrate that our OCTA-Net yields better vessel segmentation performance in OCTA than both traditional and other deep learning methods. In addition, we provide a fractal dimension analysis on the segmented microvasculature, and the statistical analysis demonstrates significant differences between the healthy control and Alzheimer’s Disease group. 
This supports the view that the analysis of retinal microvasculature may offer a new scheme to study various neurodegenerative diseases.
URL:
Three-dimensional virtual histology of the human hippocampus based on phase-contrast computed tomography.
We have studied the three-dimensional (3D) cytoarchitecture of the human hippocampus in neuropathologically healthy and Alzheimer’s disease (AD) individuals, based on phase-contrast X-ray computed tomography of postmortem human tissue punch biopsies. In view of recent findings suggesting a nuclear origin of AD, we target in particular the nuclear structure of the dentate gyrus (DG) granule cells. Tissue samples of 20 individuals were scanned and evaluated using a highly automated approach of measurement and analysis, combining multiscale recordings, optimized phase retrieval, segmentation by machine learning, representation of structural properties in a feature space, and classification based on the theory of optimal transport. Accordingly, we find that the prototypical transformation between a structure representing healthy granule cells and the pathological state involves a decrease in the volume of granule cell nuclei, as well as an increase in the electron density and its spatial heterogeneity. The latter can be explained by a higher ratio of heterochromatin to euchromatin. Similarly, many other structural properties can be derived from the data, reflecting both the natural polydispersity of the hippocampal cytoarchitecture between different individuals in the physiological context and the structural effects associated with AD pathology.
URL:
Maximizing utility of neuropsychological measures in sex-specific predictive models of incident Alzheimer’s disease in the Framingham Heart Study.
INTRODUCTION: Sex differences in neuropsychological (NP) test performance might have important implications for the diagnosis of Alzheimer’s disease (AD). This study investigates sex differences in neuropsychological performance among individuals without dementia at baseline. METHODS: Neuropsychological assessment data, both standard test scores and process-coded responses, from Framingham Heart Study participants were analyzed for sex differences using regression and Cox proportional hazards models. Optimal NP profiles were identified by machine learning methods for men and women. RESULTS: Sex differences were observed in both summary scores and composite process scores of NP tests in terms of adjusted means and their associations with AD incidence. The optimal NP profiles for men and women have 10 and 8 measures, respectively, and achieve 0.76 mean area under the curve for AD prediction. DISCUSSION: These results suggest that NP tests can be leveraged for developing more sensitive, sex-specific indices for the diagnosis of AD.
URL:
Bioinformatics strategy to advance the interpretation of Alzheimer’s disease GWAS discoveries: The roads from association to causation.
INTRODUCTION: Genome-wide association studies (GWAS) discovered multiple late-onset Alzheimer’s disease (LOAD)-associated SNPs and inferred the genes based on proximity; however, the actual causal genes are yet to be identified. METHODS: We defined LOAD-GWAS regions by the most significantly associated SNP ±0.5 Mb and developed a bioinformatics pipeline that uses and integrates chromatin state segmentation track to map active enhancers and virtual 4C software to visualize interactions between active enhancers and gene promoters. We augmented our pipeline with biomedical and functional information. RESULTS: We applied the bioinformatics pipeline using three ~1 Mb LOAD-GWAS loci: BIN1, PICALM, CELF1. These loci contain 10-24 genes, an average of 106 active enhancers and 80 CTCF sites. Our strategy identified all genes corresponding to the promoters that interact with the active enhancer that is closest to the LOAD-GWAS-SNP and generated a shorter list of prioritized candidate LOAD genes (5-14 per locus), feasible for post-GWAS investigations of causality. DISCUSSION: Interpretation of LOAD-GWAS discoveries requires the integration of brain-specific functional genomic data sets and information related to regulatory activity.
URL:
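The GWAS abstract above defines each LOAD-GWAS region as the most significantly associated SNP ±0.5 Mb. A minimal sketch of that windowing step, plus a simple overlap filter for genes, is given below. The function names, the tuple layout, and all coordinates are hypothetical; the authors' actual pipeline additionally uses chromatin-state tracks and virtual 4C, which are not modeled here.

```python
def gwas_region(lead_snp_pos: int, flank_bp: int = 500_000):
    """Return (start, end) of the window centred on the lead SNP.

    Positions are 1-based base-pair coordinates; the start is clamped at 1
    so a SNP near the chromosome start does not yield a negative coordinate.
    """
    start = max(1, lead_snp_pos - flank_bp)
    end = lead_snp_pos + flank_bp
    return start, end


def genes_in_region(genes, region):
    """Names of genes whose [start, end] interval overlaps the region.

    `genes` is an iterable of (name, start, end) tuples on one chromosome.
    """
    region_start, region_end = region
    return [name for name, start, end in genes
            if start <= region_end and end >= region_start]


# Hypothetical lead SNP position and gene intervals (illustration only).
region = gwas_region(127_000_000)
genes = [("GENE_A", 126_900_000, 126_950_000), ("GENE_B", 128_000_000, 128_100_000)]
print(region, genes_in_region(genes, region))
```

With the default flank of 500,000 bp, each window spans roughly 1 Mb, which matches the "~1 Mb" loci described in the abstract.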
A plasma protein classifier for predicting amyloid burden for preclinical Alzheimer’s disease.
A blood-based assessment of preclinical disease would have huge potential in the enrichment of participants for Alzheimer’s disease (AD) therapeutic trials. In this study, cognitively unimpaired individuals from the AIBL and KARVIAH cohorts were defined as Abeta negative or Abeta positive by positron emission tomography. Nontargeted proteomic analysis that incorporated peptide fractionation and high-resolution mass spectrometry quantified relative protein abundances in plasma samples from all participants. A protein classifier model was trained to predict Abeta-positive participants using feature selection and machine learning in AIBL and independently assessed in KARVIAH. A 12-feature model for predicting Abeta-positive participants was established and demonstrated high accuracy (testing area under the receiver operator characteristic curve = 0.891, sensitivity = 0.78, and specificity = 0.77). This extensive plasma proteomic study has unbiasedly highlighted putative and novel candidates for AD pathology that should be further validated with automated methodologies.
URL:
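The plasma-protein abstract above summarizes its classifier with an area under the receiver operating characteristic curve, a sensitivity, and a specificity. As a minimal sketch of what these metrics measure, the snippet below computes sensitivity and specificity from confusion-matrix counts and AUROC via its pairwise-ranking interpretation (the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half). The function names, counts, and scores are hypothetical and are not taken from the study.

```python
def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


def auroc(scores_pos, scores_neg):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a correctly ranked pair
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical classifier scores for Abeta-positive and Abeta-negative
# participants (assumption, for illustration only).
positives = [0.9, 0.8, 0.4]
negatives = [0.7, 0.3]
print(auroc(positives, negatives))
print(sensitivity_specificity(tp=78, fn=22, tn=77, fp=23))
```

The pairwise loop is quadratic in the number of samples; it is written this way for clarity, and a rank-based computation would be preferable for large cohorts.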
Unraveling the heterogeneity in Alzheimer’s disease progression across multiple cohorts and the implications for data-driven disease modeling.
INTRODUCTION: Given study-specific inclusion and exclusion criteria, Alzheimer’s disease (AD) cohort studies effectively sample from different statistical distributions. This heterogeneity can propagate into cohort-specific signals and subsequently bias data-driven investigations of disease progression patterns. METHODS: We built multi-state models for six independent AD cohort datasets to statistically compare disease progression patterns across them. Additionally, we propose a novel method for clustering cohorts with regard to their progression signals. RESULTS: We identified significant differences in progression patterns across cohorts. Models trained on cohort data learned cohort-specific effects that bias their estimations. We demonstrated how six cohorts relate to each other regarding their disease progression. DISCUSSION: Heterogeneity in cohort datasets impedes the reproducibility of data-driven results and validation of progression models generated on single cohorts. To ensure robust scientific insights, it is advisable to externally validate results in independent cohort datasets. The proposed clustering assesses the comparability of cohorts in an unbiased, data-driven manner.
URL:
Mild cognitive impairment understanding: an empirical study by data-driven approach.
BACKGROUND: Cognitive decline has emerged as a significant threat to both public health and personal welfare, and mild cognitive decline/impairment (MCI) can further develop into dementia/Alzheimer’s disease. While treatment of dementia/Alzheimer’s disease can be expensive and is sometimes ineffective, preventing MCI by identifying modifiable risk factors is a complementary and effective strategy. RESULTS: In this study, based on data collected by the Centers for Disease Control and Prevention (CDC) through a nationwide telephone survey, we apply a data-driven approach to re-examine previously identified risk factors and discover new ones. We found that depression, physical health, cigarette usage, education level, and sleep time play an important role in cognitive decline, which is consistent with previous findings. In addition, we point out for the first time that other factors, such as arthritis, pulmonary disease, stroke, asthma, and marital status, also contribute to MCI risk; these have been less explored previously. We also use machine learning and deep learning algorithms to weigh the importance of the various factors contributing to MCI and to predict cognitive decline. CONCLUSION: By incorporating the data-driven approach, we can determine which risk factors are significantly correlated with diseases. Such correlation analyses could also be extended to medical diagnoses other than MCI.
URL:
Transferability of Alzheimer’s disease progression subtypes to an independent population cohort.
In the past, methods to subtype or biotype patients using brain imaging data have been developed. However, it is unclear whether and how these trained machine learning models can be successfully applied to population cohorts to study the genetic and lifestyle factors underpinning these subtypes. This work, using the Subtype and Stage Inference (SuStaIn) algorithm, examines the generalisability of data-driven Alzheimer’s disease (AD) progression models. We first compared SuStaIn models trained separately on Alzheimer’s Disease Neuroimaging Initiative (ADNI) data and an AD-at-risk population constructed from the UK Biobank dataset. We further applied data harmonization techniques to remove cohort effects. Next, we built SuStaIn models on the harmonized datasets, which were then used to subtype and stage subjects in the other harmonized dataset. The first key finding is that three consistent atrophy subtypes were found in both datasets, which match the previously identified subtype progression patterns in AD: ‘typical’, ‘cortical’ and ‘subcortical’. Next, the subtype agreement was further supported by high consistency in individuals’ subtype and stage assignments based on the different models: more than 92% of the subjects with reliable subtype assignment in both the ADNI and UK Biobank datasets were assigned to an identical subtype under the models built on the different datasets. The successful transferability of AD atrophy progression subtypes across cohorts capturing different phases of disease development enabled further investigations of associations between AD atrophy subtypes and risk factors. 
Our study showed that (1) the average age is highest in the typical subtype and lowest in the subcortical subtype; (2) the typical subtype is associated with statistically more AD-like cerebrospinal fluid biomarker values in comparison to the other two subtypes; and (3) in comparison to the subcortical subtype, the cortical subtype subjects are more likely to be associated with prescription of cholesterol and high blood pressure medications. In summary, we presented cross-cohort consistent recovery of AD atrophy subtypes, showing how the same subtypes arise even in cohorts capturing substantially different disease phases. Our study opened opportunities for future detailed investigations of atrophy subtypes with a broad range of early risk factors, which will potentially lead to a better understanding of the disease aetiology and the role of lifestyle and behaviour in AD.
URL:
Predictive value of ATN biomarker profiles in estimating disease progression in Alzheimer’s disease dementia.
We aimed to evaluate the value of ATN biomarker classification system (amyloid beta [A], pathologic tau [T], and neurodegeneration [N]) for predicting conversion from mild cognitive impairment (MCI) to dementia. In a sample of people with MCI (n = 415) we assessed predictive performance of ATN classification using empirical knowledge-based cut-offs for each component of ATN and compared it to two data-driven approaches, logistic regression and RUSBoost machine learning classifiers, which used continuous clinical or biomarker scores. In data-driven approaches, we identified ATN features that distinguish normals from individuals with dementia and used them to classify persons with MCI into dementia-like and normal groups. Both data-driven classification methods performed better than the empirical cut-offs for ATN biomarkers in predicting conversion to dementia. Classifiers that used clinical features performed as well as classifiers that used ATN biomarkers for prediction of progression to dementia. We discuss that data-driven modeling approaches can improve our ability to predict disease progression and might have implications in future clinical trials.
URL:
Automatic temporal lobe atrophy assessment in prodromal AD: Data from the DESCRIPA study.
BACKGROUND: In the framework of the clinical validation of research tools, this investigation presents a validation study of an automatic medial temporal lobe atrophy measure that is applied to a naturalistic population sampled from memory clinic patients across Europe. METHODS: The procedure was developed on 1.5-T magnetic resonance images from the Alzheimer’s Disease Neuroimaging Initiative database, and it was validated on an independent data set coming from the DESCRIPA study. All images underwent an automatic processing procedure to assess tissue atrophy that was targeted at the hippocampal region. For each subject, the procedure returns a classification index. Once provided with the clinical assessment at baseline and follow-up, subjects were grouped into cohorts to assess classification performance. Each cohort was divided into converters (co) and nonconverters (nc) depending on the clinical outcome at follow-up visit. RESULTS: We found the area under the receiver operating characteristic curve (AUC) was 0.81 for all co versus nc subjects, and AUC was 0.90 for subjective memory complaint (SMCnc) versus all co subjects. Furthermore, when training on mild cognitive impairment (MCI-nc/MCI-co), the classification performance generally exceeds that found when training on controls versus Alzheimer’s disease (CTRL/AD). CONCLUSIONS: Automatic magnetic resonance imaging analysis may assist clinical classification of subjects in a memory clinic setting even when images are not specifically acquired for automatic analysis.
URL:
Identification of evolutionarily conserved gene networks mediating neurodegenerative dementia.
Identifying the mechanisms through which genetic risk causes dementia is an imperative for new therapeutic development. Here, we apply a multistage, systems biology approach to elucidate the disease mechanisms in frontotemporal dementia. We identify two gene coexpression modules that are preserved in mice harboring mutations in MAPT, GRN and other dementia mutations on diverse genetic backgrounds. We bridge the species divide via integration with proteomic and transcriptomic data from the human brain to identify evolutionarily conserved, disease-relevant networks. We find that overexpression of miR-203, a hub of a putative regulatory microRNA (miRNA) module, recapitulates mRNA coexpression patterns associated with disease state and induces neuronal cell death, establishing this miRNA as a regulator of neurodegeneration. Using a database of drug-mediated gene expression changes, we identify small molecules that can normalize the disease-associated modules and validate this experimentally. Our results highlight the utility of an integrative, cross-species network approach to drug discovery.
URL:
Three-dimensional virtual histology of the human hippocampus based on phase-contrast computed tomography.
We have studied the three-dimensional (3D) cytoarchitecture of the human hippocampus in neuropathologically healthy and Alzheimer’s disease (AD) individuals, based on phase-contrast X-ray computed tomography of postmortem human tissue punch biopsies. In view of recent findings suggesting a nuclear origin of AD, we target in particular the nuclear structure of the dentate gyrus (DG) granule cells. Tissue samples of 20 individuals were scanned and evaluated using a highly automated approach of measurement and analysis, combining multiscale recordings, optimized phase retrieval, segmentation by machine learning, representation of structural properties in a feature space, and classification based on the theory of optimal transport. Accordingly, we find that the prototypical transformation between a structure representing healthy granule cells and the pathological state involves a decrease in the volume of granule cell nuclei, as well as an increase in the electron density and its spatial heterogeneity. The latter can be explained by a higher ratio of heterochromatin to euchromatin. Similarly, many other structural properties can be derived from the data, reflecting both the natural polydispersity of the hippocampal cytoarchitecture between different individuals in the physiological context and the structural effects associated with AD pathology.
URL:
The Dementia SomaSignal Test (dSST): A plasma proteomic predictor of 20-year dementia risk.
INTRODUCTION: There is an unmet need for tools to quantify dementia risk during its multi-decade preclinical/prodromal phase, given that current biomarkers predict risk over shorter follow-up periods and are specific to Alzheimer’s disease. METHODS: Using high-throughput proteomic assays and machine learning techniques in the Atherosclerosis Risk in Communities study (n = 11,277), we developed the Dementia SomaSignal Test (dSST). RESULTS: In addition to outperforming existing plasma biomarkers, the dSST predicted mid-life dementia risk over a 20-year follow-up across two independent cohorts with different ethnic backgrounds (areas under the curve [AUCs]: dSST 0.68-0.70, dSST+age 0.75-0.81). In a separate cohort, the dSST was associated with longitudinal declines across multiple cognitive domains, accelerated brain atrophy, and elevated measures of neuropathology (as evidenced by positron emission tomography and plasma biomarkers). DISCUSSION: The dSST is a cost-effective, scalable, and minimally invasive protein-based prognostic aid that can quantify risk up to two decades before dementia onset. HIGHLIGHTS: The Dementia SomaSignal Test (dSST) predicts 20-year dementia risk across two independent cohorts. dSST outperforms existing plasma biomarkers in predicting multi-decade dementia risk. dSST predicts cognitive decline and accelerated brain atrophy in a third cohort. dSST is a prognostic aid that can predict dementia risk over two decades.
URL:
Analysis of long noncoding RNAs highlights region-specific altered expression patterns and diagnostic roles in Alzheimer’s disease.
Increasing evidence has revealed the multiple roles of long noncoding RNAs (lncRNAs) in neurodevelopment, brain function and aging, and their dysregulation has been implicated in many types of neurological diseases. However, the expression patterns and diagnostic role of lncRNAs in Alzheimer’s disease (AD) remain largely unknown, although they have gained significant attention. In this study, we performed a comparative analysis of lncRNA expression profiles in four brain regions in brain aging and AD. Our analysis revealed age- and disease-dependent region-specific lncRNA expression patterns in aging and AD. Moreover, we identified a panel of nine lncRNAs (termed LncSigAD9) in a discovery cohort of 114 samples using supervised machine learning and a stepwise selection method. The LncSigAD9 was able to differentiate between AD and healthy controls with high diagnostic sensitivity and specificity both in the discovery cohort (86.3 and 89.5%) and the additional independent AD cohort (90.8 and 83.8%). The areas under the receiver operating characteristic curves for the LncSigAD9 were 0.863 and 0.939 for the discovery and independent cohorts, respectively. Furthermore, the LncSigAD9 demonstrated higher diagnostic performance than nine-minus-one lncRNA signatures and an mRNA-based signature with a similar number of genes. In silico functional analysis indicated the involvement of lncRNA expression variation in brain development- and metabolism-related biological processes. Taken together, our study highlights the importance of lncRNAs in brain aging and AD, and demonstrates the utility of lncRNAs as promising biomarkers for early AD diagnosis and treatment.
URL:
Use of blood pressure measurements extracted from the electronic health record in predicting Alzheimer’s disease: A retrospective cohort study at two medical centers.
INTRODUCTION: Studies investigating the relationship between blood pressure (BP) measurements from electronic health records (EHRs) and Alzheimer’s disease (AD) rely on summary statistics, like BP variability, and have only been validated at a single institution. We hypothesize that leveraging BP trajectories can accurately estimate AD risk across different populations. METHODS: In a retrospective cohort study, EHR data from Veterans Affairs (VA) patients were used to train and internally validate a machine learning model to predict AD onset within 5 years. External validation was conducted on patients from Michigan Medicine (MM). RESULTS: The VA and MM cohorts included 6860 and 1201 patients, respectively. Model performance using BP trajectories was modest but comparable (area under the receiver operating characteristic curve [AUROC] = 0.64 [95% confidence interval (CI) = 0.54-0.73] for VA vs. AUROC = 0.66 [95% CI = 0.55-0.76] for MM). CONCLUSION: Approaches that directly leverage BP trajectories from EHR data could aid in AD risk stratification across institutions.
URL:
Different oscillatory mechanisms of dementia-related diseases with cognitive impairment in closed-eye state.
The escalating global trend of aging has intensified the focus on health concerns prevalent among the elderly. Notably, dementia-related diseases, including Alzheimer’s disease (AD) and frontotemporal dementia (FTD), significantly impair the quality of life for both affected seniors and their caregivers. However, the underlying neural mechanisms of these diseases remain incompletely understood, especially in terms of neural oscillations. In this study, we leveraged an open dataset containing 36 AD patients, 23 FTD patients, and 29 healthy controls (HC) to investigate these mechanisms. We identified three stable oscillation targets (theta, ~5 Hz; alpha, ~10 Hz; and beta, ~18 Hz) that facilitate differentiation between AD, FTD, and HC both statistically and through classification using machine learning algorithms. Overall, the differences between AD and HC were the most pronounced, with FTD exhibiting intermediate characteristics. The differences in the theta and alpha bands showed a global pattern, whereas the differences in the beta band were localized to the central-temporal region. Moreover, our analysis revealed that the relative theta power was significantly and negatively correlated with the Mini Mental State Examination (MMSE) scores, while the relative alpha and beta power showed a significant positive correlation. This study is the first to pinpoint multiple robust and effective neural oscillation targets to distinguish AD, offering a simple and convenient method that holds promise for future applications in large-scale early screening of dementia-related diseases.
URL:
A combination model of AD biomarkers revealed by machine learning precisely predicts Alzheimer’s dementia: China Aging and Neurodegenerative Initiative (CANDI) study.
INTRODUCTION: To test the utility of the “A/T/N” system in the Chinese population, we study core Alzheimer’s disease (AD) biomarkers in a newly established Chinese cohort. METHODS: A total of 411 participants were selected, including 96 cognitively normal individuals, 94 patients with mild cognitive impairment (MCI), 173 patients with AD, and 48 patients with non-AD dementia. Fluid biomarkers were measured with single molecule array. Amyloid beta (Abeta) deposition was determined by 18F-florbetapir positron emission tomography (PET), and brain atrophy was quantified using magnetic resonance imaging (MRI). RESULTS: Abeta42/Abeta40 was decreased, whereas levels of phosphorylated tau (p-tau) were increased in cerebrospinal fluid (CSF) and plasma from patients with AD. CSF Abeta42/Abeta40, CSF p-tau, and plasma p-tau showed a high concordance in discriminating between AD and non-AD dementia or elderly controls. A combination of plasma p-tau, apolipoprotein E (APOE) genotype, and MRI measures accurately predicted amyloid PET status. DISCUSSION: These results revealed a universal applicability of the “A/T/N” framework in a Chinese population and established an optimal diagnostic model consisting of cost-effective and non-invasive approaches for diagnosing AD.
URL:
Algorithmic Fairness of Machine Learning Models for Alzheimer Disease Progression.
Importance: Predictive models using machine learning techniques have potential to improve early detection and management of Alzheimer disease (AD). However, these models potentially have biases and may perpetuate or exacerbate existing disparities. Objective: To characterize the algorithmic fairness of longitudinal prediction models for AD progression. Design, Setting, and Participants: This prognostic study investigated the algorithmic fairness of logistic regression, support vector machines, and recurrent neural networks for predicting progression to mild cognitive impairment (MCI) and AD using data from participants in the Alzheimer Disease Neuroimaging Initiative evaluated at 57 sites in the US and Canada. Participants aged 54 to 91 years who contributed data on at least 2 visits between September 2005 and May 2017 were included. Data were analyzed in October 2022. Exposures: Fairness was quantified across sex, ethnicity, and race groups. Neuropsychological test scores, anatomical features from T1 magnetic resonance imaging, measures extracted from positron emission tomography, and cerebrospinal fluid biomarkers were included as predictors. Main Outcomes and Measures: Outcome measures quantified fairness of prediction models (logistic regression [LR], support vector machine [SVM], and recurrent neural network [RNN] models), including equal opportunity, equalized odds, and demographic parity. Specifically, if the model exhibited equal sensitivity for all groups, it aligned with the principle of equal opportunity, indicating fairness in predictive performance. Results: A total of 1730 participants in the cohort (mean [SD] age, 73.81 [6.92] years; 776 females [44.9%]; 69 Hispanic [4.0%] and 1661 non-Hispanic [96.0%]; 29 Asian [1.7%], 77 Black [4.5%], 1599 White [92.4%], and 25 other race [1.4%]) were included. 
Sensitivity for predicting progression to MCI and AD was lower for Hispanic participants compared with non-Hispanic participants; the difference (SD) in true positive rate ranged from 20.9% (5.5%) for the RNN model to 27.8% (9.8%) for the SVM model in MCI and 24.1% (5.4%) for the RNN model to 48.2% (17.3%) for the LR model in AD. Sensitivity was similarly lower for Black and Asian participants compared with non-Hispanic White participants; for example, the difference (SD) in AD true positive rate was 14.5% (51.6%) in the LR model, 12.3% (35.1%) in the SVM model, and 28.4% (16.8%) in the RNN model for Black vs White participants, and the difference (SD) in MCI true positive rate was 25.6% (13.1%) in the LR model, 24.3% (13.1%) in the SVM model, and 6.8% (18.7%) in the RNN model for Asian vs White participants. Models generally satisfied metrics of fairness with respect to sex, with no significant differences by group, except for cognitively normal (CN)-MCI and MCI-AD transitions (eg, an absolute increase [SD] in the true positive rate of CN-MCI transitions of 10.3% [27.8%] for the LR model). Conclusions and Relevance: In this study, models were accurate in aggregate but failed to satisfy fairness metrics. These findings suggest that fairness should be considered in the development and use of machine learning models for AD progression.
URL:
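The three fairness criteria named in the abstract above (equal opportunity, equalized odds, demographic parity) can be sketched as simple group-wise gaps. This is a minimal illustration under assumptions: the function names, the toy labels, and the choice to summarize equalized odds as the larger of the absolute true-positive-rate and false-positive-rate gaps are hypothetical and are not taken from the study.

```python
import numpy as np

def _rate(numerator_mask, denominator_mask):
    """Fraction of the denominator set that is also in the numerator set."""
    denom = denominator_mask.sum()
    return float((numerator_mask & denominator_mask).sum() / denom) if denom else float("nan")

def fairness_gaps(y_true, y_pred, group, a, b):
    """Group a minus group b gaps for three common fairness criteria."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    group = np.asarray(group)
    in_a, in_b = group == a, group == b

    # Equal opportunity: equal sensitivity (true positive rate) across groups.
    tpr_gap = _rate(y_pred, y_true & in_a) - _rate(y_pred, y_true & in_b)
    # Equalized odds additionally requires equal false positive rates.
    fpr_gap = _rate(y_pred, ~y_true & in_a) - _rate(y_pred, ~y_true & in_b)
    # Demographic parity: equal rate of positive predictions across groups.
    parity_gap = _rate(y_pred, in_a) - _rate(y_pred, in_b)

    return {
        "equal_opportunity": tpr_gap,
        "equalized_odds": max(abs(tpr_gap), abs(fpr_gap)),  # assumed summary
        "demographic_parity": parity_gap,
    }

# Toy data (invented): five participants in each of two groups.
toy_true = [1, 1, 1, 0, 0, 1, 1, 0, 0, 0]
toy_pred = [1, 1, 0, 0, 0, 1, 0, 1, 0, 0]
toy_group = ["A"] * 5 + ["B"] * 5
toy_gaps = fairness_gaps(toy_true, toy_pred, toy_group, "A", "B")
```

A gap of zero on a criterion means the two groups are treated identically under that criterion. In the toy data, demographic parity holds while the other two criteria do not.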
Portable, low-field magnetic resonance imaging for evaluation of Alzheimer’s disease.
Portable, low-field magnetic resonance imaging (LF-MRI) of the brain may facilitate point-of-care assessment of patients with Alzheimer’s disease (AD) in settings where conventional MRI cannot be used. However, image quality is limited by a lower signal-to-noise ratio. Here, we optimize LF-MRI acquisition and develop a freely available machine learning pipeline to quantify brain morphometry and white matter hyperintensities (WMH). We validate the pipeline and apply it to outpatients presenting with mild cognitive impairment or dementia due to AD. We find that hippocampal volumes from <= 3 mm isotropic LF-MRI scans agree with conventional MRI and are more accurate than their anisotropic counterparts. We also show that WMH volumes agree between manual segmentation and the automated pipeline. The increased availability and reduced cost of LF-MRI, in combination with our machine learning pipeline, have the potential to increase access to neuroimaging for dementia.
URL:
Unified epigenomic, transcriptomic, proteomic, and metabolomic taxonomy of Alzheimer’s disease progression and heterogeneity.
Alzheimer’s disease (AD) is a heterogeneous disorder with abnormalities in multiple biological domains. In an advanced machine learning analysis of postmortem brain and in vivo blood multi-omics molecular data (N = 1863), we integrated epigenomic, transcriptomic, proteomic, and metabolomic profiles into a multilevel biological AD taxonomy. We obtained a personalized multilevel molecular index of AD dementia progression that predicts severity of neuropathologies, and identified three robust molecular-based subtypes that explain much of the pathologic and clinical heterogeneity of AD. These subtypes present distinct patterns of alteration in DNA methylation, RNA, proteins, and metabolites, identifiable in the brain and subsequently in blood. In addition, the genetic variations that predispose to the various AD subtypes in brain predict distinct spatial patterns of alteration in cell types, suggesting a unique influence of each putative AD variant on neuropathological mechanisms. These observations support that an individually tailored multi-omics molecular taxonomy of AD may represent distinct targets for preventive or treatment interventions.
URL:
Brain MAPS: an automated, accurate and robust brain extraction technique using a template library.
Whole brain extraction is an important pre-processing step in neuroimage analysis. Manual or semi-automated brain delineations are labour-intensive and thus not desirable in large studies, meaning that automated techniques are preferable. The accuracy and robustness of automated methods are crucial because human expertise may be required to correct any suboptimal results, which can be very time consuming. We compared the accuracy of four automated brain extraction methods: Brain Extraction Tool (BET), Brain Surface Extractor (BSE), Hybrid Watershed Algorithm (HWA) and a Multi-Atlas Propagation and Segmentation (MAPS) technique we have previously developed for hippocampal segmentation. The four methods were applied to extract whole brains from 682 1.5T and 157 3T T1-weighted MR baseline images from the Alzheimer’s Disease Neuroimaging Initiative database. Semi-automated brain segmentations with manual editing and checking were used as the gold standard to compare with the results. The median Jaccard index of MAPS was higher than those of HWA, BET and BSE in 1.5T and 3T scans (p<0.05, all tests), and the 1st to 99th centile range of the Jaccard index of MAPS was smaller than those of HWA, BET and BSE in 1.5T and 3T scans (p<0.05, all tests). HWA and MAPS were found to be best at including all brain tissues (median false negative rate <=0.010% for 1.5T scans and <=0.019% for 3T scans, both methods). The median Jaccard index of MAPS was similar in 1.5T and 3T scans, whereas those of BET, BSE and HWA were higher in 1.5T scans than 3T scans (p<0.05, all tests). We found that the diagnostic group had a small effect on the median Jaccard index of all four methods. In conclusion, MAPS had relatively high accuracy and low variability compared to HWA, BET and BSE in MR scans with and without atrophy.
URL:
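The Jaccard index used in the abstract above to compare automated brain masks against the gold standard is the ratio of intersection to union of two binary masks. A minimal sketch follows; the function name and the tiny 2x3 masks are illustrative assumptions, and treating two empty masks as perfect agreement is a convention chosen here, not something stated in the source.

```python
import numpy as np

def jaccard_index(mask_a, mask_b):
    """Jaccard index |A intersect B| / |A union B| of two binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return float(np.logical_and(a, b).sum() / union)

# Toy masks (invented): 1 marks a voxel labelled as brain.
toy_auto = [[1, 1, 0],
            [0, 1, 0]]
toy_gold = [[1, 1, 1],
            [0, 0, 0]]
toy_jaccard = jaccard_index(toy_auto, toy_gold)  # 2 shared voxels / 4 in union
```

The same function works unchanged on 3D volumes, since it only relies on element-wise logical operations.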
Fast and accurate modelling of longitudinal and repeated measures neuroimaging data.
Despite the growing importance of longitudinal data in neuroimaging, the standard analysis methods make restrictive or unrealistic assumptions (e.g., assumption of Compound Symmetry, i.e., the state of all equal variances and equal correlations, or spatially homogeneous longitudinal correlations). While some new methods have been proposed to more accurately account for such data, these methods are based on iterative algorithms that are slow and failure-prone. In this article, we propose the use of the Sandwich Estimator method, which first estimates the parameters of interest with a simple Ordinary Least Squares model and second estimates variances/covariances with the so-called Sandwich Estimator (SwE), which accounts for the within-subject correlation existing in longitudinal data. Here, we introduce the SwE method in its classic form, and we review and propose several adjustments to improve its behaviour, specifically in small samples. We use intensive Monte Carlo simulations to compare all considered adjustments and isolate the best combination for neuroimaging data. We also compare the SwE method to other popular methods and demonstrate its strengths and weaknesses. Finally, we analyse a highly unbalanced longitudinal dataset from the Alzheimer’s Disease Neuroimaging Initiative and demonstrate the flexibility of the SwE method to fit within- and between-subject effects in a single model. Software implementing this SwE method has been made freely available at http://warwick.ac.uk/tenichols/SwE.
URL: http://warwick.ac.uk/tenichols/SwE.
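The two-step idea described in the abstract above (Ordinary Least Squares for the parameters, then a sandwich estimate of their covariance that respects within-subject correlation) can be sketched as follows. This is the textbook cluster-robust form without any of the small-sample adjustments the abstract mentions; it is not the released SwE software, and the function name and simulated data are assumptions for illustration.

```python
import numpy as np

def ols_sandwich(X, y, subject):
    """OLS estimates plus a classic subject-clustered sandwich covariance."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    subject = np.asarray(subject)

    # Step 1: ordinary least squares, ignoring within-subject correlation.
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta

    # Step 2: the "meat" sums outer products of per-subject score vectors,
    # so residuals from the same subject are allowed to be correlated.
    meat = np.zeros((X.shape[1], X.shape[1]))
    for s in np.unique(subject):
        rows = subject == s
        score = X[rows].T @ resid[rows]
        meat += np.outer(score, score)

    cov = bread @ meat @ bread  # bread * meat * bread
    return beta, cov

# Simulated toy data (assumed): 20 subjects, 3 visits each, random intercepts.
rng = np.random.default_rng(0)
toy_subject = np.repeat(np.arange(20), 3)
toy_time = np.tile(np.array([0.0, 1.0, 2.0]), 20)
toy_X = np.column_stack([np.ones_like(toy_time), toy_time])
toy_y = (1.0 + 0.5 * toy_time
         + rng.normal(0.0, 1.0, 20)[toy_subject]   # subject-level offsets
         + rng.normal(0.0, 0.3, toy_time.size))    # visit-level noise
toy_beta, toy_cov = ols_sandwich(toy_X, toy_y, toy_subject)
toy_se = np.sqrt(np.diag(toy_cov))
```

Because the parameter estimates come from a single non-iterative least-squares solve, the approach avoids the convergence failures of iterative mixed-model fitting noted in the abstract.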
Selecting software pipelines for change in flortaucipir SUVR: Balancing repeatability and group separation.
Since tau PET tracers were introduced, investigators have quantified them using a wide variety of automated methods. As longitudinal cohort studies acquire second and third time points of serial within-person tau PET data, determining the best pipeline to measure change has become crucial. We compared a total of 415 different quantification methods (each a combination of multiple options) according to their effects on a) differences in annual SUVR change between clinical groups, and b) longitudinal measurement repeatability as measured by the error term from a linear mixed-effects model. Our comparisons used MRI and Flortaucipir scans of 97 Mayo Clinic study participants who clinically either: a) were cognitively unimpaired, or b) had cognitive impairments that were consistent with Alzheimer’s disease pathology. Tested methods included cross-sectional and longitudinal variants of two overarching pipelines (FreeSurfer 6.0, and an in-house pipeline based on SPM12), three choices of target region (entorhinal, inferior temporal, and a temporal lobe meta-ROI), five types of partial volume correction (PVC) (none, two-compartment, three-compartment, geometric transfer matrix (GTM), and a tau-specific GTM variant), seven choices of reference region (cerebellar crus, cerebellar gray matter, whole cerebellum, pons, supratentorial white matter, eroded supratentorial WM, and a composite of eroded supratentorial WM, pons, and whole cerebellum), two choices of region masking (GM or GM and WM), and two choices of statistic (voxel-wise mean vs. median). Our strongest findings were: 1) larger temporal-lobe target regions greatly outperformed entorhinal cortex (median sample size estimates based on a hypothetical clinical trial were 520-526 vs. 1740); 2) longitudinal processing pipelines outperformed cross-sectional pipelines (median sample size estimates were 483 vs. 
572); and 3) reference regions including supratentorial WM outperformed traditional cerebellar and pontine options (median sample size estimates were 370 vs. 559). Altogether, our results favored longitudinal SUVR methods and a temporal-lobe meta-ROI that includes adjacent (juxtacortical) WM, a composite reference region (eroded supratentorial WM + pons + whole cerebellum), 2-class voxel-based PVC, and median statistics.
URL:
Real-time detection of 20 amino acids and discrimination of pathologically relevant peptides with functionalized nanopore.
Precise identification and quantification of amino acids is crucial for many biological applications. Here we report a copper(II)-functionalized Mycobacterium smegmatis porin A (MspA) nanopore with the N91H substitution, which enables direct identification of all 20 proteinogenic amino acids when combined with a machine-learning algorithm. The validation accuracy reaches 99.1%, with 30.9% signal recovery. The feasibility of ultrasensitive quantification of amino acids was also demonstrated at the nanomolar range. Furthermore, the capability of this system for real-time analyses of two representative post-translational modifications (PTMs), one unnatural amino acid and ten synthetic peptides using exopeptidases, including clinically relevant peptides associated with Alzheimer’s disease and cancer neoantigens, was demonstrated. Notably, our strategy successfully distinguishes peptides with only one amino acid difference from the hydrolysate and provides the possibility to infer the peptide sequence.
URL:
Single-cell analysis reveals inflammatory interactions driving macular degeneration.
Due to commonalities in pathophysiology, age-related macular degeneration (AMD) represents a uniquely accessible model to investigate therapies for neurodegenerative diseases, leading us to examine whether pathways of disease progression are shared across neurodegenerative conditions. Here we use single-nucleus RNA sequencing to profile lesions from 11 postmortem human retinas with age-related macular degeneration and 6 control retinas with no history of retinal disease. We create a machine-learning pipeline based on recent advances in data geometry and topology and identify activated glial populations enriched in the early phase of disease. Examining single-cell data from Alzheimer’s disease and progressive multiple sclerosis with our pipeline, we find a similar glial activation profile enriched in the early phase of these neurodegenerative diseases. In late-stage age-related macular degeneration, we identify a microglia-to-astrocyte signaling axis mediated by interleukin-1beta which drives angiogenesis characteristic of disease pathogenesis. We validated this mechanism using in vitro and in vivo assays in mouse, identifying a possible new therapeutic target for AMD and possibly other neurodegenerative conditions. Thus, due to shared glial states, the retina provides a potential system for investigating therapeutic approaches in neurodegenerative diseases.
URL:
Early beta-amyloid accumulation in the brain is associated with peripheral T cell alterations.
INTRODUCTION: Fast and minimally invasive approaches for early diagnosis of Alzheimer’s disease (AD) are highly anticipated. Evidence of adaptive immune cells responding to cerebral beta-amyloidosis has raised the question of whether immune markers could be used as proxies for beta-amyloid accumulation in the brain. METHODS: Here, we apply multidimensional mass-cytometry combined with unbiased machine-learning techniques to immunophenotype peripheral blood mononuclear cells from a total of 251 participants in cross-sectional and longitudinal studies. RESULTS: We show that increases in antigen-experienced adaptive immune cells in the blood, particularly CD45RA-reactivated T effector memory (TEMRA) cells, are associated with early accumulation of brain beta-amyloid and with changes in plasma AD biomarkers in still cognitively healthy subjects. DISCUSSION: Our results suggest that preclinical AD pathology is linked to systemic alterations of the adaptive immune system. These immunophenotype changes may help identify and develop novel diagnostic tools for early AD assessment and better understand clinical outcomes.
URL:
Near-lifespan longitudinal tracking of brain microvascular morphology, topology, and flow in male mice.
In age-related neurodegenerative diseases, pathology often develops slowly across the lifespan. As one example, in diseases such as Alzheimer’s, vascular decline is believed to begin decades before symptoms appear. However, challenges inherent in current microscopic methods make longitudinal tracking of such vascular decline difficult. Here, we describe a suite of methods for measuring brain vascular dynamics and anatomy in mice for over seven months in the same field of view. This approach is enabled by advances in optical coherence tomography (OCT) and image processing algorithms including deep learning. These integrated methods enabled us to simultaneously monitor distinct vascular properties spanning morphology, topology, and function of the microvasculature across all scales: large pial vessels, penetrating cortical vessels, and capillaries. We have demonstrated this technical capability in wild-type and 3xTg male mice. The capability will allow comprehensive and longitudinal study of a broad range of progressive vascular diseases, and normal aging, in key model systems.
URL:
Comparing data-driven and hypothesis-driven MRI-based predictors of cognitive impairment in individuals from the Atherosclerosis Risk in Communities (ARIC) study.
INTRODUCTION: A data-driven index of dementia risk based on magnetic resonance imaging (MRI), the Alzheimer’s Disease Pattern Similarity (AD-PS) score, was estimated for participants in the Atherosclerosis Risk in Communities (ARIC) study. METHODS: AD-PS scores were generated for 839 cognitively non-impaired individuals with a mean follow-up of 4.86 years. The scores and a hypothesis-driven volumetric measure based on several brain regions susceptible to AD were compared as predictors of incident cognitive impairment in different settings. RESULTS: Logistic regression analyses suggest the data-driven AD-PS scores to be more predictive of incident cognitive impairment than its counterpart. Both biomarkers were more predictive of incident cognitive impairment in participants who were White, female, and apolipoprotein E gene (APOE) epsilon4 carriers. Random forest analyses including predictors from different domains ranked the AD-PS scores as the most relevant MRI predictor of cognitive impairment. CONCLUSIONS: Overall, the AD-PS scores were the stronger MRI-derived predictors of incident cognitive impairment in cognitively non-impaired individuals.
URL:
Effects of hardware heterogeneity on the performance of SVM Alzheimer’s disease classifier.
Fully automated machine learning methods based on structural magnetic resonance imaging (MRI) data can assist radiologists in the diagnosis of Alzheimer’s disease (AD). These algorithms require large data sets to learn the separation of subjects with and without AD. Training and test data may come from heterogeneous hardware settings, which can potentially affect the performance of disease classification. A total of 518 MRI sessions from 226 healthy controls and 191 individuals with probable AD from the multicenter Alzheimer’s Disease Neuroimaging Initiative (ADNI) were used to investigate whether grouping data by acquisition hardware (i.e., vendor, field strength, coil system) is beneficial for the performance of a support vector machine (SVM) classifier, compared to the case where data from different hardware is mixed. We compared the change of the SVM decision value resulting from (a) changes in hardware against the effect of disease and (b) changes resulting simply from rescanning the same subject on the same machine. A maximum accuracy of 87% was obtained with a training set of all 417 subjects. Classifiers trained with 95 subjects in each diagnostic group and acquired with heterogeneous scanner settings had an empirical detection accuracy of 84.2 ± 2.4% when tested on an independent set of the same size. These results mirror the accuracy reported in recent studies. Encouragingly, classifiers trained on images acquired with homogeneous and heterogeneous hardware settings had equivalent cross-validation performances. Two scans of the same subject acquired on the same machine had very similar decision values and were generally classified into the same group. Higher variation was introduced when two acquisitions of the same subject were performed on two scanners with different field strengths. The variation was unbiased and similar for both diagnostic groups.
The findings of the study encourage the pooling of data from different sites to increase the number of training samples and thereby improving performance of disease classifiers. Although small, a change in hardware could lead to a change of the decision value and thus diagnostic grouping. The findings of this study provide estimators for diagnostic accuracy of an automated disease diagnosis method involving scans acquired with different sets of hardware. Furthermore, we show that the level of confidence in the performance estimation significantly depends on the size of the training sample, and hence should be taken into account in a clinical setting.
URL:
Single-cell peripheral immunoprofiling of Alzheimer’s and Parkinson’s diseases.
Peripheral blood mononuclear cells (PBMCs) may provide insight into the pathogenesis of Alzheimer’s disease (AD) or Parkinson’s disease (PD). We investigated PBMC samples from 132 well-characterized research participants using seven canonical immune stimulants, mass cytometric identification of 35 PBMC subsets, and single-cell quantification of 15 intracellular signaling markers, followed by machine learning model development to increase predictive power. From these, three main intracellular signaling pathways were identified specifically in PBMC subsets from people with AD versus controls: reduced activation of PLCgamma2 across many cell types and stimulations and selectively variable activation of STAT1 and STAT5, depending on stimulant and cell type. Our findings functionally buttress the now multiply-validated observation that a rare coding variant in PLCG2 is associated with a decreased risk of AD. Together, these data suggest enhanced PLCgamma2 activity as a potential new therapeutic target for AD with a readily accessible pharmacodynamic biomarker.
URL:
Automatic classification of patients with Alzheimer’s disease from structural MRI: a comparison of ten methods using the ADNI database.
Recently, several high dimensional classification methods have been proposed to automatically discriminate between patients with Alzheimer’s disease (AD) or mild cognitive impairment (MCI) and elderly controls (CN) based on T1-weighted MRI. However, these methods were assessed on different populations, making it difficult to compare their performance. In this paper, we evaluated the performance of ten approaches (five voxel-based methods, three methods based on cortical thickness and two methods based on the hippocampus) using 509 subjects from the ADNI database. Three classification experiments were performed: CN vs AD, CN vs MCIc (MCI who had converted to AD within 18 months, MCI converters - MCIc) and MCIc vs MCInc (MCI who had not converted to AD within 18 months, MCI non-converters - MCInc). Data from 81 CN, 67 MCInc, 39 MCIc and 69 AD were used for training and hyperparameters optimization. The remaining independent samples of 81 CN, 67 MCInc, 37 MCIc and 68 AD were used to obtain an unbiased estimate of the performance of the methods. For AD vs CN, whole-brain methods (voxel-based or cortical thickness-based) achieved high accuracies (up to 81% sensitivity and 95% specificity). For the detection of prodromal AD (CN vs MCIc), the sensitivity was substantially lower. For the prediction of conversion, no classifier obtained significantly better results than chance. We also compared the results obtained using the DARTEL registration to that using SPM5 unified segmentation. DARTEL significantly improved six out of 20 classification experiments and led to lower results in only two cases. Overall, the use of feature selection did not improve the performance but substantially increased the computation times.
URL:
Performance comparison of 10 different classification techniques in segmenting white matter hyperintensities in aging.
INTRODUCTION: White matter hyperintensities (WMHs) are areas of abnormal signal on magnetic resonance images (MRIs) that characterize various types of histopathological lesions. The load and location of WMHs are important clinical measures that may indicate the presence of small vessel disease in aging and Alzheimer’s disease (AD) patients. Manually segmenting WMHs is time consuming and prone to inter-rater and intra-rater variabilities. Automated tools that can accurately and robustly detect these lesions can be used to measure the vascular burden in individuals with AD or the elderly population in general. Many WMH segmentation techniques use a classifier in combination with a set of intensity and location features to segment WMHs, however, the optimal choice of classifier is unknown. METHODS: We compare 10 different linear and nonlinear classification techniques to identify WMHs from MRI data. Each classifier is trained and optimized based on a set of features obtained from co-registered MR images containing spatial location and intensity information. We further assess the performance of the classifiers using different combinations of MRI contrast information. The performances of the different classifiers were compared on three heterogeneous multi-site datasets, including images acquired with different scanners and different scan-parameters. These included data from the ADC study from University of California Davis, the NACC database and the ADNI study. The classifiers (naive Bayes, logistic regression, decision trees, random forests, support vector machines, k-nearest neighbors, bagging, and boosting) were evaluated using a variety of voxel-wise and volumetric similarity measures such as Dice Kappa similarity index (SI), Intra-Class Correlation (ICC), and sensitivity as well as computational burden and processing times. 
These investigations enable meaningful comparisons between the performances of different classifiers to determine the most suitable classifiers for segmentation of WMHs. In the spirit of open-source science, we also make available a fully automated tool for segmentation of WMHs with pre-trained classifiers for all these techniques. RESULTS: Random Forests yielded the best performance among all classifiers with mean Dice Kappa (SI) of 0.66 ± 0.17 and ICC=0.99 for the ADC dataset (using T1w, T2w, PD, and FLAIR scans), SI=0.72 ± 0.10, ICC=0.93 for the NACC dataset (using T1w and FLAIR scans), SI=0.66 ± 0.23, ICC=0.94 for the ADNI1 dataset (using T1w, T2w, and PD scans) and SI=0.72 ± 0.19, ICC=0.96 for the ADNI2/GO dataset (using T1w and FLAIR scans). Not using the T2w/PD information did not change the performance of the Random Forest classifier (SI=0.66 ± 0.17, ICC=0.99). However, not using FLAIR information in the ADC dataset significantly decreased the Dice Kappa, but the volumetric correlation did not drastically change (SI=0.47 ± 0.21, ICC=0.95). CONCLUSION: Our investigations showed that with appropriate features, most off-the-shelf classifiers are able to accurately detect WMHs in the presence of FLAIR scan information, while Random Forests had the best performance across all datasets. However, we observed that the performances of most linear classifiers and some nonlinear classifiers drastically decline in the absence of FLAIR information, with Random Forest still retaining the best performance.
URL:
Association of Cardiovascular Risk Trajectory With Cognitive Decline and Incident Dementia.
BACKGROUND AND OBJECTIVES: Cardiovascular risk factors have a recently established association with cognitive decline and dementia, yet most studies examine this association through cross-sectional data, precluding an understanding of the longitudinal dynamics of such risk. The current study aims to explore how the ongoing trajectory of cardiovascular risk affects subsequent dementia and memory decline risk. We hypothesize that an accelerated, long-term accumulation of cardiovascular risk, as determined by the Framingham Risk Score (FRS), will be more detrimental to cognitive and dementia state outcomes than a stable cardiovascular risk. METHODS: We assessed an initially healthy, community-dwelling sample recruited from the prospective cohort Betula study. Cardiovascular disease risk, as assessed by the FRS, episodic memory performance, and dementia status were measured at each 5-year time point (T) across 20 to 25 years. Analysis was performed with Bayesian additive regression trees, a semiparametric machine-learning method, applied herein as a multistate survival analysis method. RESULTS: Of the 1,244 participants, cardiovascular risk increased moderately over time in 60% of the sample, with observations of an accelerated increase in 18% of individuals and minimal change in 22% of individuals. An accelerated, as opposed to a stable, cardiovascular risk trajectory predicted an increased risk of developing Alzheimer disease dementia (average risk ratio [RR] 3.3-5.7, 95% CI 2.6-17.5 at T2, 1.9-6.7 at T5) or vascular dementia (average RR 3.3-4.1, 95% CI 1.1-16.6 at T2, 1.5-7.6 at T5) and was associated with an increased risk of memory decline (average RR 1.4-1.2, 95% CI 1-1.9 at T2, 1-1.5 at T5). A stable cardiovascular risk trajectory appeared to partially mitigate Alzheimer disease dementia risk for APOE epsilon4 carriers.
DISCUSSION: The findings of the current study show that the longitudinal, cumulative trajectory of cardiovascular risk is predictive of dementia risk and associated with the emergence of memory decline. As a result, clinical practice may benefit from directing interventions at individuals with accelerating cardiovascular risk.
URL:
GENEVIC: GENetic data exploration and visualization via intelligent interactive console.
SUMMARY: The vast generation of genetic data poses a significant challenge in efficiently uncovering valuable knowledge. We introduce GENEVIC, an AI-driven chat framework that tackles this challenge by bridging the gap between genetic data generation and biomedical knowledge discovery. Leveraging generative AI, notably ChatGPT, it serves as a biologist’s ‘copilot’. It automates the analysis, retrieval, and visualization of customized domain-specific genetic information, and integrates functionalities to generate protein interaction networks, enrich gene sets, and search scientific literature from PubMed, Google Scholar, and arXiv, making it a comprehensive tool for biomedical research. In its pilot phase, GENEVIC is assessed using a curated database that ranks genetic variants associated with Alzheimer’s disease, schizophrenia, and cognition, based on their effect weights from the Polygenic Score (PGS) Catalog, thus enabling researchers to prioritize genetic variants in complex diseases. GENEVIC’s operation is user-friendly, accessible without any specialized training, secured by Azure OpenAI’s HIPAA-compliant infrastructure, and evaluated for its efficacy through real-time query testing. As a prototype, GENEVIC is set to advance genetic research, enabling informed biomedical decisions. AVAILABILITY AND IMPLEMENTATION: GENEVIC is publicly accessible at https://genevic-anath2024.streamlit.app. The underlying code is open-source and available via GitHub at https://github.com/bsml320/GENEVIC.git (also at https://github.com/anath2110/GENEVIC.git). SUPPLEMENTARY INFORMATION: Available at Bioinformatics online and at https://github.com/bsml320/GENEVIC_Supplementary.git (also at https://github.com/anath2110/GENEVIC_Supplementary.git).
URL: https://genevic-
DEEPSEN: a convolutional neural network based method for super-enhancer prediction.
BACKGROUND: Super-enhancers (SEs) are clusters of transcriptionally active enhancers, which dictate the expression of genes defining cell identity and play an important role in the development and progression of tumors and other diseases. Many key cancer oncogenes are driven by super-enhancers, and the mutations associated with common diseases such as Alzheimer’s disease are significantly enriched in super-enhancers. Super-enhancers have shown great potential for the identification of key oncogenes and the discovery of disease-associated mutational sites. RESULTS: In this paper, we propose a new computational method called DEEPSEN for predicting super-enhancers based on a convolutional neural network. The proposed method integrates 36 kinds of features. Compared with existing approaches, our method performs better and can be used for genome-wide prediction of super-enhancers. In addition, we screen important features for predicting super-enhancers. CONCLUSION: A convolutional neural network is effective in boosting the performance of super-enhancer prediction.
URL:
Associations between the multitrajectory neuroplasticity of neuronavigated rTMS-mediated angular gyrus networks and brain gene expression in AD spectrum patients with sleep disorders.
INTRODUCTION: The multifactorial influence of repetitive transcranial magnetic stimulation (rTMS) on neuroplasticity in neural networks is associated with improvements in cognitive dysfunction and sleep disorders. The mechanisms of rTMS and the transcriptional-neuronal correlation in Alzheimer’s disease (AD) patients with sleep disorders have not been fully elucidated. METHODS: Forty-six elderly participants with cognitive impairment (23 patients with low sleep quality and 23 patients with high sleep quality) underwent 4-week periods of neuronavigated rTMS of the angular gyrus and neuroimaging tests, and gene expression data for six postmortem brains were collected from another database. Transcription-neuroimaging association analysis was used to evaluate the effects on cognitive dysfunction and the underlying biological mechanisms involved. RESULTS: Distinct variable neuroplasticity in the anterior and posterior angular gyrus networks was detected in the low sleep quality group. These interactions were associated with multiple gene pathways, and the comprehensive effects were associated with improvements in episodic memory. DISCUSSION: Multitrajectory neuroplasticity is associated with complex biological mechanisms in AD-spectrum patients with sleep disorders. HIGHLIGHTS: This was the first transcription-neuroimaging study to demonstrate that multitrajectory neuroplasticity in neural circuits was induced via neuronavigated rTMS, which was associated with complex gene expression in AD-spectrum patients with sleep disorders. The interactions between sleep quality and neuronavigated rTMS were coupled with multiple gene pathways and improvements in episodic memory. The present strategy for integrating neuroimaging, rTMS intervention, and genetic data provides a new approach to comprehending the biological mechanisms involved in AD.
URL:
A benchmark for hypothalamus segmentation on T1-weighted MR images.
The hypothalamus is a small brain structure that plays essential roles in sleep regulation, body temperature control, and metabolic homeostasis. Hypothalamic structural abnormalities have been reported in neuropsychiatric disorders, such as schizophrenia, amyotrophic lateral sclerosis, and Alzheimer’s disease. Although magnetic resonance (MR) imaging is the standard examination method for evaluating this region, hypothalamic morphological landmarks are unclear, leading to subjectivity and high variability during manual segmentation. Due to these limitations, it is common to find contradicting results in the literature regarding hypothalamic volumetry. To the best of our knowledge, only two automated methods are available in the literature for hypothalamus segmentation, the first of which is our previous method based on U-Net. However, both methods present performance losses when predicting images from different datasets than those used in training. Therefore, this project presents a benchmark consisting of a diverse T1-weighted MR image dataset comprising 1381 subjects from IXI, CC359, OASIS, and MiLI (the latter created specifically for this benchmark). All data were provided using automatically generated hypothalamic masks and a subset containing manually annotated masks. As a baseline, a method for fully automated segmentation of the hypothalamus on T1-weighted MR images with a greater generalization ability is presented. The proposed method is a teacher-student-based model with two blocks: segmentation and correction, where the second corrects the imperfections of the first block. After using three datasets for training (MiLI, IXI, and CC359), the prediction performance of the model was measured on two test sets: the first was composed of data from IXI, CC359, and MiLI, achieving a Dice coefficient of 0.83; the second was from OASIS, a dataset not used for training, achieving a Dice coefficient of 0.74.
The dataset, the baseline model, and all code necessary to reproduce the experiments are available at https://github.com/MICLab-Unicamp/HypAST and https://sites.google.com/view/calgary-campinas-dataset/hypothalamus-benchmarking. In addition, a leaderboard will be maintained with predictions for the test set submitted by anyone working on the same task.
URL: https://github.com/MICLab-Unicamp/HypAST
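The Dice coefficients reported in the benchmark above (0.83 and 0.74) measure the overlap between a predicted and a reference mask. As a minimal sketch, assuming binary masks flattened to sequences of 0/1 (the function name and the toy masks are illustrative and are not taken from the HypAST code), the metric can be computed as follows:

```python
def dice_coefficient(pred, truth):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labelled as foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total foreground voxels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy example (hypothetical data): 2 shared voxels, 3 foreground voxels in each mask.
pred = [0, 1, 1, 1, 0, 0]
truth = [0, 1, 1, 0, 1, 0]
score = dice_coefficient(pred, truth)
```

A value of 1 indicates identical masks and 0 indicates no overlap; for small structures such as the hypothalamus, a few misplaced voxels can shift the score noticeably.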
Biosensor and machine learning-aided engineering of an amaryllidaceae enzyme.
A major challenge to achieving industry-scale biomanufacturing of therapeutic alkaloids is the slow process of biocatalyst engineering. Amaryllidaceae alkaloids, such as the Alzheimer’s medication galantamine, are complex plant secondary metabolites with recognized therapeutic value. Because they are difficult to synthesize, they are regularly sourced by extraction and purification from the low-yielding daffodil Narcissus pseudonarcissus. Here, we propose an efficient biosensor-machine learning technology stack for biocatalyst development, which we apply to engineer an Amaryllidaceae enzyme in Escherichia coli. Directed evolution is used to develop a highly sensitive (EC50 = 20 µM) and specific biosensor for the key Amaryllidaceae alkaloid branchpoint 4’-O-methylnorbelladine. A structure-based residual neural network (MutComputeX) is subsequently developed and used to generate activity-enriched variants of a plant methyltransferase, which are rapidly screened with the biosensor. Functional enzyme variants are identified that yield a 60% improvement in product titer, 2-fold higher catalytic activity, and 3-fold lower off-product regioisomer formation. A solved crystal structure elucidates the mechanism behind key beneficial mutations.
URL:
Validity, feasibility, and effectiveness of a voice-recognition based digital cognitive screener for dementia and mild cognitive impairment in community-dwelling older Chinese adults: A large-scale implementation study.
INTRODUCTION: We investigated the validity, feasibility, and effectiveness of a voice recognition-based digital cognitive screener (DCS) for detecting dementia and mild cognitive impairment (MCI) in a large-scale community of elderly participants. METHODS: Eligible participants completed demographic, cognitive, and functional assessments and the DCS. Neuropsychological tests were used to assess domain-specific and global cognition, while the diagnosis of MCI and dementia relied on the Clinical Dementia Rating Scale. RESULTS: Among the 11,186 participants, the DCS showed high completion rates (97.5%) and a short administration time (5.9 min) across gender, age, and education groups. The DCS demonstrated areas under the receiver operating characteristics curve (AUCs) of 0.95 and 0.83 for dementia and MCI detection, respectively, among 328 participants in the validation phase. Furthermore, the DCS resulted in time savings of 16.2% to 36.0% compared to the Mini-Mental State Examination (MMSE) and Montreal Cognitive Assessment (MoCA). DISCUSSION: This study suggests that the DCS is an effective and efficient tool for dementia and MCI case-finding in large-scale cognitive screening. HIGHLIGHTS: To the best of our knowledge, this is the first cognitive screening tool based on voice recognition and utilizing conversational AI that has been assessed in a large population of Chinese community-dwelling elderly. With the upgrading of a new multimodal understanding model, the DCS can accurately assess participants’ responses, including different Chinese dialects, and provide automatic scores. The DCS not only exhibited good discriminant ability in detecting dementia and MCI cases, it also demonstrated a high completion rate and efficient administration regardless of gender, age, and education differences. The DCS is economically efficient, scalable, and had a better screening efficacy compared to the MMSE or MoCA, for wider implementation.
URL:
Independent replication of advanced brain age in mild cognitive impairment and dementia: detection of future cognitive dysfunction.
We previously developed a novel machine-learning-based brain age model that was sensitive to amyloid. We aimed to independently validate it and to demonstrate its utility using independent clinical data. We recruited 650 participants from South Korean memory clinics to undergo magnetic resonance imaging and clinical assessments. We employed a pretrained brain age model that used data from an independent set of largely Caucasian individuals (n = 757) who had no or relatively low levels of amyloid as confirmed by positron emission tomography (PET). We investigated the association between brain age residual and cognitive decline. We found that our pretrained brain age model was able to reliably estimate brain age (mean absolute error = 5.68 years, r(650) = 0.47, age range = 49-89 years) in the sample with 71 participants with subjective cognitive decline (SCD), 375 with mild cognitive impairment (MCI), and 204 with dementia. Greater brain age was associated with greater amyloid and worse cognitive function [Odds Ratio, (95% Confidence Interval {CI}): 1.28 (1.06-1.55), p = 0.030 for amyloid PET positivity; 2.52 (1.76-3.61), p < 0.001 for dementia]. Baseline brain age residual was predictive of future cognitive worsening even after adjusting for apolipoprotein E e4 and amyloid status [Hazard Ratio, (95% CI): 1.94 (1.33-2.81), p = 0.001 for total 336 follow-up sample; 2.31 (1.44-3.71), p = 0.001 for 284 subsample with baseline Clinical Dementia Rating <= 0.5; 2.40 (1.43-4.03), p = 0.001 for 240 subsample with baseline SCD or MCI]. In an independent data set, these results replicate our previous findings using this model, which was able to delineate significant differences in brain age according to the diagnostic stages of dementia as well as amyloid deposition status. Brain age models may offer benefits in discriminating and tracking cognitive impairment in older adults.
URL:
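The brain age residual and mean absolute error mentioned in the abstract above can be illustrated with a short sketch. It assumes the residual is simply the predicted brain age minus chronological age; the study may define it differently (for example, after adjusting for age), and the function names and toy values below are hypothetical.

```python
def brain_age_residuals(predicted, chronological):
    """Residual per participant: predicted brain age minus chronological age (assumed definition)."""
    if len(predicted) != len(chronological):
        raise ValueError("inputs must have the same length")
    return [p - c for p, c in zip(predicted, chronological)]


def mean_absolute_error(predicted, chronological):
    """Average magnitude of the residuals, in years."""
    residuals = brain_age_residuals(predicted, chronological)
    return sum(abs(r) for r in residuals) / len(residuals)


# Toy values (hypothetical, not study data).
predicted_age = [70.0, 65.5, 80.0]
chronological_age = [68.0, 70.0, 75.0]
residuals = brain_age_residuals(predicted_age, chronological_age)
mae = mean_absolute_error(predicted_age, chronological_age)
```

Under this definition a positive residual means the brain appears older than the participant's chronological age, which is the direction the abstract associates with amyloid positivity and dementia.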
Does feature selection improve classification accuracy? Impact of sample size and feature selection on classification using anatomical magnetic resonance images.
There are growing numbers of studies using machine learning approaches to characterize patterns of anatomical difference discernible from neuroimaging data. The high dimensionality of image data often raises a concern that feature selection is needed to obtain optimal accuracy. Among previous studies, mostly using fixed sample sizes, some show greater predictive accuracies with feature selection, whereas others do not. In this study, we compared four common feature selection methods: (1) pre-selected regions of interest (ROIs) that are based on prior knowledge; (2) univariate t-test filtering; (3) recursive feature elimination (RFE); and (4) t-test filtering constrained by ROIs. The predictive accuracies achieved from different sample sizes, with and without feature selection, were compared statistically. To demonstrate the effect, we used grey matter segmented from the T1-weighted anatomical scans collected by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) as the input features to a linear support vector machine classifier. The objective was to characterize the patterns of difference between Alzheimer’s disease (AD) patients and cognitively normal subjects, and also to characterize the difference between mild cognitive impairment (MCI) patients and normal subjects. In addition, we also compared the classification accuracies between MCI patients who converted to AD and MCI patients who did not convert within the period of 12 months. Predictive accuracies from two data-driven feature selection methods (t-test filtering and RFE) were no better than those achieved using whole brain data. We showed that we could achieve the most accurate characterizations by using prior knowledge of where to expect neurodegeneration (hippocampus and parahippocampal gyrus).
Therefore, feature selection does improve the classification accuracies, but it depends on the method adopted. In general, larger sample sizes yielded higher accuracies with less advantage obtained by using knowledge from the existing literature.
URL:
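Univariate t-test filtering, one of the feature selection methods compared in the abstract above, can be sketched as follows. This is a minimal illustration under stated assumptions: it uses Welch's t-statistic and keeps the top `k` features, whereas the study's exact test variant and threshold are not given in the abstract; all names and the toy data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two samples; returns 0.0 when both samples are constant."""
    denom = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return 0.0 if denom == 0 else (mean(a) - mean(b)) / denom


def select_top_k(X, y, k):
    """Return the sorted indices of the k features with the largest |t| between classes 1 and 0."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        group1 = [row[j] for row, label in zip(X, y) if label == 1]
        group0 = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(welch_t(group1, group0)))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise, feature 2 is constant.
X = [
    [5.0, 1.0, 2.0], [5.2, 0.9, 2.0], [4.9, 1.1, 2.0],  # class 1
    [1.0, 1.0, 2.0], [1.1, 0.9, 2.0], [0.9, 1.2, 2.0],  # class 0
]
y = [1, 1, 1, 0, 0, 0]
best_one = select_top_k(X, y, 1)
best_two = select_top_k(X, y, 2)
```

In a cross-validated experiment, the filter would normally be fitted on the training folds only, so that the selected features do not leak information from the test subjects.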
Transcriptomic analysis to identify genes associated with selective hippocampal vulnerability in Alzheimer’s disease.
Selective vulnerability of different brain regions is seen in many neurodegenerative disorders. The hippocampus and cortex are selectively vulnerable in Alzheimer’s disease (AD); however, the degree of involvement of the different brain regions differs among patients. We classified corticolimbic patterns of neurofibrillary tangles in postmortem tissue to capture extreme and representative phenotypes. We combined bulk RNA sequencing with digital pathology to examine hippocampal vulnerability in AD. We identified hippocampal gene expression changes associated with hippocampal vulnerability and used machine learning to identify genes that were associated with AD neuropathology, including SERPINA5, RYBP, SLC38A2, FEM1B, and PYDC1. Further histologic and biochemical analyses suggested SERPINA5 expression is associated with tau expression in the brain. Our study highlights the importance of embracing heterogeneity of the human brain in disease to identify disease-relevant gene expression.
URL:
Metapaths: similarity search in heterogeneous knowledge graphs via meta-paths.
SUMMARY: Heterogeneous knowledge graphs (KGs) have enabled the modeling of complex systems, from genetic interaction graphs and protein-protein interaction networks to networks representing drugs, diseases, proteins, and side effects. Analytical methods for KGs rely on quantifying similarities between entities, such as nodes, in the graph. However, such methods must consider the diversity of node and edge types contained within the KG via, for example, defined sequences of entity types known as meta-paths. We present metapaths, the first R software package to implement meta-paths and perform meta-path-based similarity search in heterogeneous KGs. The metapaths package offers various built-in similarity metrics for node pair comparison by querying KGs represented as either edge or adjacency lists, as well as auxiliary aggregation methods to measure set-level relationships. Indeed, evaluation of these methods on an open-source biomedical KG recovered meaningful drug and disease-associated relationships, including those in Alzheimer’s disease. The metapaths framework facilitates the scalable and flexible modeling of network similarities in KGs with applications across KG learning. AVAILABILITY AND IMPLEMENTATION: The metapaths R package is available via GitHub at https://github.com/ayushnoori/metapaths and is released under MPL 2.0 (Zenodo DOI: 10.5281/zenodo.7047209). Package documentation and usage examples are available at https://www.ayushnoori.com/metapaths.
URL: https://github.com/ayushnoori/metapaths
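The idea of a meta-path described in the abstract above can be illustrated with path counts, a common meta-path-based similarity score. The sketch below is written in plain Python and does not use or reproduce the API of the metapaths R package; the meta-path (Drug → Gene → Disease), the adjacency matrices, and the function name are assumptions made for illustration.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


# Hypothetical toy knowledge graph with three node types.
# drug_gene[i][j] == 1 if drug i targets gene j (2 drugs x 3 genes).
drug_gene = [
    [1, 1, 0],
    [0, 1, 1],
]
# gene_disease[j][k] == 1 if gene j is associated with disease k (3 genes x 2 diseases).
gene_disease = [
    [1, 0],
    [1, 0],
    [0, 1],
]

# Entry [i][k] counts the distinct Drug -> Gene -> Disease paths from drug i to disease k.
path_counts = matmul(drug_gene, gene_disease)
```

Here drug 0 reaches disease 0 through two different genes, so it scores higher for that disease than drug 1, which reaches it through only one. Normalized variants of this count are often used to reduce the influence of highly connected nodes.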
Benchmarking machine learning models for late-onset Alzheimer’s disease prediction from genomic data.
BACKGROUND: Late-Onset Alzheimer’s Disease (LOAD) is a leading form of dementia. There is no effective cure for LOAD, so treatment efforts depend on preventive cognitive therapies, which stand to benefit from the timely estimation of the risk of developing the disease. Fortunately, a growing number of Machine Learning methods that are well positioned to address this challenge are becoming available. RESULTS: We conducted systematic comparisons of representative Machine Learning models for predicting LOAD from genetic variation data provided by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) cohort. Our experimental results demonstrate that the best models tested achieved an area under the ROC curve of ~72%. CONCLUSIONS: Machine learning models are promising alternatives for estimating the genetic risk of LOAD. Systematic machine learning model selection also provides the opportunity to identify new genetic markers potentially associated with the disease.
URL:
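For readers unfamiliar with the area under the ROC curve reported above, the sketch below computes it directly from its rank interpretation: the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half. The function name `roc_auc` and the example labels and scores are hypothetical; this is a generic illustration, not the evaluation code used in the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of cases and controls.

    labels: iterable of 0 (control) or 1 (case)
    scores: iterable of model scores, higher meaning more likely a case
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # case ranked above control
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: two cases, two controls.
example_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
```

On this scale, 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases from controls.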
ET-GRU: using multi-layer gated recurrent units to identify electron transport proteins.
BACKGROUND: The electron transport chain is a series of protein complexes embedded in the process of cellular respiration, which is an important process for transferring electrons and other macromolecules throughout the cell. It is also the major process for extracting energy via redox reactions during the oxidation of sugars. Many studies have implicated electron transport proteins in a variety of human diseases, e.g., diabetes, Parkinson’s disease, and Alzheimer’s disease. A few bioinformatics studies have attempted to identify electron transport proteins; however, their performance leaves considerable room for improvement. Here, we present a novel deep neural network architecture to address this problem. RESULTS: Most previous studies could not feed the original position-specific scoring matrix (PSSM) profiles into neural networks; the resulting loss of information kept those networks from achieving the best results. In this paper, we present a novel approach that uses deep gated recurrent units (GRU) on full PSSMs to resolve this problem. Our approach predicts electron transporters with cross-validation and independent test accuracies of 93.5% and 92.3%, respectively. Our approach demonstrates superior performance to all of the state-of-the-art predictors on electron transport proteins. CONCLUSIONS: Through this study, we provide ET-GRU, a web server for discriminating electron transport proteins in particular and other protein functions in general. Our results could also promote the use of GRU in computational biology, especially in protein function prediction.
URL:
Intelligent and effective informatic deconvolution of “Big Data” and its future impact on the quantitative nature of neurodegenerative disease therapy.
Biomedical data sets are becoming increasingly large, and a plethora of high-dimensionality data sets (“Big Data”) are now freely accessible for neurodegenerative diseases, such as Alzheimer’s disease. It is thus important that new informatic analysis platforms are developed that allow the organization and interrogation of Big Data resources into a rational and actionable mechanism for advanced therapeutic development. This will entail the generation of systems and tools that allow the cross-platform correlation between data sets of distinct types, for example, transcriptomic, proteomic, and metabolomic. Here, we provide a comprehensive overview of the latest strategies, including latent semantic analytics, topological data investigation, and deep learning techniques that will drive the future development of diagnostic and therapeutic applications for Alzheimer’s disease. We contend that diverse informatic “Big Data” platforms should be synergistically designed with more advanced chemical/drug and cellular/tissue-based phenotypic analytical predictive models to assist in either de novo drug design or effective drug repurposing.
URL:
Optimizing PiB-PET SUVR change-over-time measurement by a large-scale analysis of longitudinal reliability, plausibility, separability, and correlation with MMSE.
Quantitative measurements of change in beta-amyloid load from Positron Emission Tomography (PET) images play a critical role in clinical trials and longitudinal observational studies of Alzheimer’s disease. These measurements are strongly affected by methodological differences between implementations, including choice of reference region and use of partial volume correction, but there is a lack of consensus for an optimal method. Previous works have examined some relevant variables under varying criteria, but interactions between them prevent choosing a method via combined meta-analysis. In this work, we present a thorough comparison of methods to measure change in beta-amyloid over time using Pittsburgh Compound B (PiB) PET imaging. METHODS: We compare 1,024 different automated software pipeline implementations with varying methodological choices according to four quality metrics calculated over three-timepoint longitudinal trajectories of 129 subjects: reliability (straightness/variance); plausibility (lack of negative slopes); ability to predict accumulator/non-accumulator status from baseline value; and correlation between change in beta-amyloid and change in Mini Mental State Exam (MMSE) scores. RESULTS AND CONCLUSION: From this analysis, we show that an optimal longitudinal measure of beta-amyloid from PiB should use a reference region that includes a combination of voxels in the supratentorial white matter and those in the whole cerebellum, measured using two-class partial volume correction in the voxel space of each subject’s corresponding anatomical MR image.
URL:
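The plausibility criterion above (lack of negative slopes) can be pictured with a small Python sketch. It fits a least-squares line to each subject's trajectory and reports the fraction of subjects whose slope is non-negative. The abstract does not give the exact definition used in the study, so this formulation is an assumption, and the function names `ls_slope` and `fraction_non_negative` and the example trajectories are hypothetical.

```python
def ls_slope(times, values):
    """Ordinary least-squares slope of values against times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("need at least two distinct timepoints")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx


def fraction_non_negative(trajectories):
    """Share of (times, values) trajectories with a non-negative fitted slope.

    Assumed stand-in for 'plausibility'; the paper's exact metric may differ.
    """
    slopes = [ls_slope(times, values) for times, values in trajectories]
    return sum(1 for s in slopes if s >= 0) / len(slopes)


# Hypothetical three-timepoint SUVR trajectories (time in years, SUVR value).
TRAJECTORIES = [
    ([0, 1, 2], [1.0, 1.1, 1.2]),   # rising
    ([0, 1, 2], [1.2, 1.1, 1.0]),   # falling, counted as implausible here
    ([0, 1, 2], [1.0, 1.0, 1.0]),   # flat
    ([0, 1, 2], [1.0, 1.3, 1.1]),   # noisy but rising overall
]
plausibility = fraction_non_negative(TRAJECTORIES)
```

Under this assumed definition, a measurement pipeline that yields fewer negative slopes across subjects would score higher.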
A blood-based predictor for neocortical Abeta burden in Alzheimer’s disease: results from the AIBL study.
Dementia is a global epidemic with Alzheimer’s disease (AD) being the leading cause. Early identification of patients at risk of developing AD is now becoming an international priority. Neocortical Abeta (extracellular beta-amyloid) burden (NAB), as assessed by positron emission tomography (PET), represents one such marker for early identification. These scans are expensive and are not widely available; thus, there is a need for cheaper and more widely accessible alternatives. Addressing this need, a blood biomarker-based signature having efficacy for the prediction of NAB and which can be easily adapted for population screening is described. Blood data (176 analytes measured in plasma) and Pittsburgh Compound B (PiB)-PET measurements from 273 participants from the Australian Imaging, Biomarkers and Lifestyle (AIBL) study were utilised. Univariate analysis was conducted to assess the difference of plasma measures between high and low NAB groups, and cross-validated machine-learning models were generated for predicting NAB. These models were applied to 817 non-imaged AIBL subjects and 82 subjects from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) for validation. Five analytes showed a significant difference between subjects with high compared to low NAB. A machine-learning model (based on nine markers) achieved sensitivity and specificity of 80% and 82%, respectively, for predicting NAB. Validation using the ADNI cohort yielded similar results (sensitivity 79% and specificity 76%). These results show that a panel of blood-based biomarkers is able to accurately predict NAB, supporting the hypothesis of a relationship between a blood-based signature and Abeta accumulation and therefore providing a platform for developing a population-based screen.
URL:
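The sensitivity and specificity figures quoted above follow from the standard confusion-matrix definitions, which the short Python sketch below spells out. The function name `sensitivity_specificity` and the example labels are hypothetical and unrelated to the AIBL or ADNI data; label 1 stands for high NAB and 0 for low NAB purely for illustration.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 0/1.

    sensitivity = TP / (TP + FN): share of true positives that are detected
    specificity = TN / (TN + FP): share of true negatives that are cleared
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy labels: five high-NAB (1) and five low-NAB (0) subjects.
Y_TRUE = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
Y_PRED = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(Y_TRUE, Y_PRED)
```

Reporting both values matters for a screening test, because a high sensitivity alone can be obtained trivially by labelling every subject positive.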
Blood RNA transcripts reveal similar and differential alterations in fundamental cellular processes in Alzheimer’s disease and other neurodegenerative diseases.
BACKGROUND: Dysfunctional processes in Alzheimer’s disease and other neurodegenerative diseases lead to neural degeneration in the central and peripheral nervous system. Research demonstrates that neurodegeneration of any kind is a systemic disease that may even begin outside of the region vulnerable to the disease. Neurodegenerative diseases are defined by the vulnerabilities and pathology occurring in the regions affected. METHOD: A random forest machine learning analysis on whole blood transcriptomes from six neurodegenerative diseases generated unbiased disease-classifying RNA transcripts subsequently subjected to pathway analysis. RESULTS: We report that transcripts of the blood transcriptome selected for each of the neurodegenerative diseases represent fundamental biological cell processes including transcription regulation, degranulation, immune response, protein synthesis, apoptosis, cytoskeletal components, ubiquitylation/proteasome, and mitochondrial complexes that are also affected in the brain and reveal common themes across six neurodegenerative diseases. CONCLUSION: Neurodegenerative diseases share common dysfunctions in fundamental cellular processes. Identifying regional vulnerabilities will reveal unique disease mechanisms. HIGHLIGHTS: Transcriptomics offer information about dysfunctional processes. Comparing multiple diseases will expose unique malfunctions within diseases. Blood RNA can be used ante mortem to track expression changes in neurodegenerative diseases. Protocol standardization will make public datasets compatible.
URL:
Early-onset Alzheimer disease clinical variants: multivariate analyses of cortical thickness.
OBJECTIVE: To assess patterns of reduced cortical thickness in different clinically defined variants of early-onset Alzheimer disease (AD) and to explore the hypothesis that these variants span a phenotypic continuum rather than represent distinct subtypes. METHODS: The case-control study included 25 patients with posterior cortical atrophy (PCA), 15 patients with logopenic progressive aphasia (LPA), and 14 patients with early-onset typical amnestic AD (tAD), as well as 30 healthy control subjects. Cortical thickness was measured using FreeSurfer, and differences and commonalities in patterns of reduced cortical thickness were assessed between patient groups and controls. Given the difficulty of using mass-univariate statistics to test ideas of continuous variation, we use multivariate machine learning algorithms to visualize the spectrum of subjects and to assess separation of patient groups from control subjects and from each other. RESULTS: Although each patient group showed disease-specific reductions in cortical thickness compared with control subjects, common areas of cortical thinning were identified, mainly involving temporoparietal regions. Multivariate analyses permitted clear separation between control subjects and patients and moderate separation between patients with PCA and LPA, while patients with tAD were distributed along a continuum between these extremes. Significant classification performance could nevertheless be obtained when every pair of patient groups was compared directly. CONCLUSIONS: Analyses of cortical thickness patterns support the hypothesis that different clinical presentations of AD represent points in a phenotypic spectrum of neuroanatomical variation. Machine learning shows promise for syndrome separation and for identifying common anatomic patterns across syndromes that may signify a common pathology, both aspects of interest for treatment trials.
URL:
Biomarker clustering in autosomal dominant Alzheimer’s disease.
INTRODUCTION: As the number of biomarkers used to study Alzheimer’s disease (AD) continues to increase, it is important to understand the utility of any given biomarker, as well as what additional information a biomarker provides when compared to others. METHODS: We used hierarchical clustering to group 19 cross-sectional biomarkers in autosomal dominant AD. Feature selection identified biomarkers that were the strongest predictors of mutation status and estimated years from symptom onset (EYO). Biomarkers identified included clinical assessments, neuroimaging, cerebrospinal fluid amyloid, and tau, and emerging biomarkers of neuronal integrity and inflammation. RESULTS: Three primary clusters were identified: neurodegeneration, amyloid/tau, and emerging biomarkers. Feature selection identified amyloid and tau measures as the primary predictors of mutation status and EYO. Emerging biomarkers of neuronal integrity and inflammation were relatively weak predictors. DISCUSSION: These results provide novel insight into our understanding of the relationships among biomarkers and the staging of biomarkers based on disease progression.
URL:
Dissociation of tau pathology and neuronal hypometabolism within the ATN framework of Alzheimer’s disease.
Alzheimer’s disease (AD) is defined by amyloid (A) and tau (T) pathologies, with T better correlated to neurodegeneration (N). However, T and N have complex regional relationships in part related to non-AD factors that influence N. With machine learning, we assessed heterogeneity in 18F-flortaucipir vs. 18F-fluorodeoxyglucose positron emission tomography as markers of T and neuronal hypometabolism (NM) in 289 symptomatic patients from the Alzheimer’s Disease Neuroimaging Initiative. We identified six T/NM clusters with differing limbic and cortical patterns. The canonical group was defined as the T/NM pattern with lowest regression residuals. Groups resilient to T had less hypometabolism than expected relative to T and displayed better cognition than the canonical group. Groups susceptible to T had more hypometabolism than expected given T and exhibited worse cognitive decline, with imaging and clinical measures concordant with non-AD copathologies. Together, T/NM mismatch reveals distinct imaging signatures with pathobiological and prognostic implications for AD.
URL:
Evaluation of MRI-based machine learning approaches for computer-aided diagnosis of dementia in a clinical data warehouse.
A variety of algorithms have been proposed for computer-aided diagnosis of dementia from anatomical brain MRI. These approaches achieve high accuracy when applied to research data sets but their performance on real-life clinical routine data has not been evaluated yet. The aim of this work was to study the performance of such approaches on clinical routine data, based on a hospital data warehouse, and to compare the results to those obtained on a research data set. The clinical data set was extracted from the hospital data warehouse of the Greater Paris area, which includes 39 different hospitals. The research set was composed of data from the Alzheimer’s Disease Neuroimaging Initiative data set. In the clinical set, the population of interest was identified by exploiting the diagnostic codes from the 10th revision of the International Classification of Diseases that are assigned to each patient. We studied how the imbalance of the training sets, in terms of contrast agent injection and image quality, may bias the results. We demonstrated that computer-aided diagnosis performance was strongly biased upwards (by over 17 percentage points of balanced accuracy) by the confounders of image quality and contrast agent injection, a phenomenon known as the Clever Hans effect or shortcut learning. When these biases were removed, the performance was very poor. In any case, the performance was considerably lower than on the research data set. Our study highlights that there are still considerable challenges for translating dementia computer-aided diagnosis systems to clinical routine.
URL: