Record Details

SpaTemHTP: A Data Analysis Pipeline for Efficient Processing and Utilization of Temporal High-Throughput Phenotyping Data

OAR@ICRISAT

Field Value
 
Relation http://oar.icrisat.org/11757/
https://doi.org/10.3389/fpls.2020.552509
doi:10.3389/fpls.2020.552509
 
Title SpaTemHTP: A Data Analysis Pipeline for Efficient Processing and Utilization of Temporal High-Throughput Phenotyping Data
 
Creator Kar, S
Garin, V
Kholová, J
Vadez, V
Durbha, S S
Tanaka, R
Iwata, H
Urban, M O
Adinarayana, J
 
Subject Crop Physiology
 
Description The rapid development of phenotyping technologies in recent years has made it
possible to study plant development over time. Processing the massive
amount of data collected by high-throughput phenotyping (HTP) platforms is, however,
an important challenge for the plant science community. A key issue is to
accurately estimate, over time, the genotypic component of the plant phenotype. In outdoor
and field-based HTP platforms, phenotype measurements can be substantially affected
by data-generation inaccuracies or failures, leading to erroneous or missing data. To
address this problem, we developed an analytical pipeline composed of three modules:
detection of outliers, imputation of missing values, and computation of mixed-model
genotype adjusted means with spatial adjustment. The pipeline was tested on three different
traits (3D leaf area, projected leaf area, and plant height), in two crops (chickpea,
sorghum), measured during two seasons. Using real-data analyses and simulations,
we showed that the sequential application of the three pipeline steps was particularly
useful to estimate smooth genotype growth curves from raw data containing a large
amount of noise, a situation that may be frequent in data generated on outdoor
HTP platforms. The proposed procedure can handle up to 50% missing values. It
is also robust to contamination affecting between 20 and 30% of the data. The pipeline
was further extended to model the genotype time series data. A change-point analysis
allowed the determination of growth phases and of the optimal timing at which genotypic
differences were the largest. The estimated genotypic values were used to cluster the
genotypes during the optimal growth phase. Through a two-way analysis of variance
(ANOVA), clusters were found to be consistently defined throughout the growth duration.
Therefore, we could show, on a wide range of scenarios, that the pipeline facilitated
efficient extraction of useful information from outdoor HTP platform data. High-quality
plant growth time series data is also provided to support breeding decisions. The R
code of the pipeline is available at https://github.com/ICRISAT-GEMS/SpaTemHTP.
 
Publisher Frontiers Media
 
Date 2020-11
 
Type Article
PeerReviewed
 
Format application/pdf
 
Language en
 
Identifier http://oar.icrisat.org/11757/1/fpls-11-552509.pdf
Kar, S and Garin, V and Kholová, J and Vadez, V and Durbha, S S and Tanaka, R and Iwata, H and Urban, M O and Adinarayana, J (2020) SpaTemHTP: A Data Analysis Pipeline for Efficient Processing and Utilization of Temporal High-Throughput Phenotyping Data. Frontiers in Plant Science (TSI), 11 (552509). pp. 1-16. ISSN 1664-462X