Record Details

Outliers in Designed Experiments

KRISHI: Publication and Data Inventory Repository

 
Title Outliers in Designed Experiments
 
Creator Lalmohan Bhar
Rajender Parsad
V.K. Gupta
 
Subject Outlier(s)
Block designs
Testing the presence of outliers
M-estimation procedure
 
Description An outlier in a set of data is an observation (or an observation vector) that appears to be inconsistent with the remainder of the observations in that data set. The occurrence of outlier(s) is common in every field in which data collection is involved. In agricultural experiments, outlier(s) are likely to appear in the experimental data due to disease and/or insect-pest attack on some plots in the field, or due to unintentional heavy irrigation of some particular block(s) or plot(s) of the experiment. Outlier(s) may also creep in through transcription errors. The presence of such abnormally high or low observations may cause deviations from the assumptions, particularly those of normality and homogeneity of observations. It is, therefore, important to detect the presence of outlier(s) along with deviations from these assumptions and to suggest remedial measures. The problem of outliers has been studied extensively in linear regression models. Approaches to the study of outliers are generally divided into two broad categories: (i) identifying the outlier(s) for further study and (ii) accommodating the possibility of outlier(s) by suitable modifications of the models and/or methods of analysis. The first approach relates to the detection of outlier(s), while the second relates to robust methods of estimation of parameters that minimize the influence of outlier(s) on inference concerning the parameters. A number of test statistics have been developed to detect outliers in linear regression models. Among them, the Cook-statistic is widely used; other important test statistics for the detection of outlier(s) are the AP- and Qk-statistics. The M-estimation procedure is a very powerful robust method of estimation used in the linear regression model. In M-estimation, a function of the errors, called the objective function, is minimized to obtain the parameter estimates, unlike the least squares method, where the sum of squared errors is minimized. As a result, each observation receives a different weight in estimating the parameters, whereas in the usual least squares procedure all observations receive equal weight. A good number of objective functions, such as Huber's function and Andrews' function, are now available. Another robust procedure for estimating parametric functions is the Least Median of Squares (LMS) method, wherein the median of the squared errors is minimized to obtain the parameter estimates. Though the general set-up of an experimental design is that of a linear model, detection and testing of outlier(s) and application of robust methods in experimental designs need special attention because (i) the design matrix does not have full column rank and (ii) interest lies only in a subset of parameters rather than the whole parameter vector. Not much research appears to have been done on the detection of outliers and robust methods of estimation in designed experiments, and the available test statistics and robust procedures of estimation cannot be applied directly to this situation. One can, however, instead of taking post-experimental remedial measures, take pre-experimental measures by adopting a robust design for experimentation. A robust design is insensitive to the presence of outlying observations in the sense that inference on linear functions of treatment effects is not affected by the presence of outliers in the experimental data. However, this line of work has so far been confined to identifying designs robust against the presence of a single outlier.
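For reference, the M-estimation problem and Huber's objective function mentioned above have the following standard textbook form; the tuning constant c shown here is generic and not necessarily the value adopted later in the report:

    \hat{\beta}_M = \arg\min_{\beta} \sum_{i=1}^{n} \rho\!\left(\frac{y_i - x_i'\beta}{\sigma}\right),
    \qquad
    \rho_{\mathrm{Huber}}(u) =
    \begin{cases}
        u^2/2,          & |u| \le c,\\
        c\,|u| - c^2/2, & |u| > c.
    \end{cases}

Minimizing such a rho is equivalent to a weighted least squares fit with weights w(u) = psi(u)/u, where psi is the derivative of rho, which is what is meant above by each observation receiving its own weight; the LMS criterion replaces the sum by the median of the squared residuals.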
With this in view, the present study has been taken up to investigate thoroughly the problem of outliers in designed experiments. Both detection and accommodation of outliers have been considered in the present investigation. The problems associated with outliers are discussed with some examples in the first chapter, along with a thorough review of the subject, the scope of the present investigation and its practical utility. Detection of outliers in designed experiments is considered in the second chapter. For detecting outliers in designed experiments, Bhar and Gupta (2001) provided three statistics, viz. the Cook-statistic, the AP-statistic and the Qk-statistic. These statistics are applied to real experimental data taken from the Agricultural Field Experiments Information System (AFEIS), IASRI, and it has been found that many of these experiments contain outliers. These experimental data had earlier been investigated for problems such as non-normality or heterogeneity of error variance under a project entitled 'A diagnostic study of field experiments' conducted at IASRI, and on the basis of normality and homogeneity of errors the data had been grouped into several groups, such as non-normal with heterogeneous error variance. The statistics for detecting outliers were applied to these data sets and the results obtained are summarized in a table. Once outlier(s) are identified, the next question is what to do with them. One way to handle outliers is simply to discard the observations. The second way is to perform an analysis of covariance, taking one as the value of the covariate for the outlying observation and zero for the rest of the observations. Both types of analysis were carried out for those experiments in which outliers were found, and the outlier detection method has been illustrated with an example. The detection of influential subsets or multiple outliers is more difficult, owing to masking and swamping problems: masking occurs when one outlier is not detected because of the presence of others, and swamping when a non-outlier is wrongly identified owing to the effect of some hidden outliers. Pena and Yohai (1995) proposed a method to identify influential subsets by looking at the eigenvalues of an 'influence matrix'. This matrix is defined as the uncentred covariance of a set of vectors that represent the effect on the fit of the deletion of each data point, and it is normalized to have the univariate Cook (1979) statistics on the diagonal. This method has been modified for application in designed experiments, the procedure for identifying influential sets has been discussed, and the proposed method has been illustrated with an example. Another way to tackle the problem of outliers is to perform a robust analysis of the data. A robust procedure tries to accommodate the majority of good data points while limiting the influence of bad points lying far away from the pattern formed by the good ones. Among robust procedures, the M-estimation method is the most widely used. In the third chapter the concept of M-estimation is introduced and then applied to designed experiments. Generally, in M-estimation an objective function (a function of the errors) is minimized to obtain the parameter estimates, and many such objective functions for the linear regression model are available in the literature.
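As a point of reference for the detection step, the classical Cook statistic of an ordinary linear model can be computed as in the minimal Python sketch below. This is not the Bhar and Gupta (2001) statistic adapted to block designs described in the report; the layout and data are hypothetical and only illustrate the mechanics.

    import numpy as np

    def cook_statistics(X, y):
        # Classical Cook statistic D_i = e_i^2 * h_ii / (p * s^2 * (1 - h_ii)^2)
        # for the linear model y = X*beta + e.  A pseudo-inverse is used so the
        # sketch also runs when X is not of full column rank, as is the case
        # for the design matrix of a block design.
        n = X.shape[0]
        H = X @ np.linalg.pinv(X.T @ X) @ X.T      # hat (projection) matrix
        p = int(round(np.trace(H)))                # rank of X
        e = y - H @ y                              # residuals
        s2 = e @ e / (n - p)                       # residual mean square
        h = np.diag(H)                             # leverages
        return e**2 * h / (p * s2 * (1.0 - h)**2)

    # Hypothetical randomized block layout: 3 blocks x 4 treatments, one inflated plot value.
    blocks = np.kron(np.eye(3), np.ones((4, 1)))
    treatments = np.tile(np.eye(4), (3, 1))
    X = np.hstack([np.ones((12, 1)), blocks, treatments])
    y = np.array([10.0, 12, 11, 13, 11, 12, 12, 14, 10, 13, 40, 13])   # plot 11 is suspect
    D = cook_statistics(X, y)
    print(np.argmax(D))   # index of the observation with the largest Cook statistic

The analysis-of-covariance device mentioned above is the familiar mean-shift formulation: a covariate equal to one for the suspect observation and zero elsewhere is appended to the model, and the significance of its coefficient provides a test of whether that observation is an outlier.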
Some of these objective functions are discussed in the present chapter and their applicability to designed experiments has been explored. Most of these objective functions involve tuning constants, and the efficiency of the M-estimation procedure depends upon how well these constants are selected; appropriate values of these constants have been proposed for application to designed experiments. For testing hypotheses, appropriate robust testing procedures are available in the literature, and some of these are discussed in this chapter. The existing objective functions have been modified by suitably choosing the constants, and a new objective function has been proposed. The proposed function is based upon the Cook-statistic and therefore addresses the basic requirements of the design of experiments. All these functions, along with the newly developed one, have been illustrated with examples. In Chapter 4 another robust method of analysis, viz. the Least Median of Squares (LMS) method, is introduced, and the concept of the method as developed in the linear regression context is presented. It is well known that the least squares (LS) fit can be distorted even by a single outlying observation: the fitted line or surface may be tipped so that it no longer passes through the bulk of the data. In the least squares method the sum of squared errors, which is not a robust quantity, is minimized to obtain the parameter estimates; in the LMS method, by contrast, the median of the squared errors is minimized. Fitting an LMS regression model poses some difficulties, the first being computational: unlike least squares regression, there is no closed-form formula for the LMS coefficients. Rousseeuw (1984) proposed an algorithm to obtain the LMS estimator; however, this algorithm cannot be applied directly to designed experiments. The method has therefore been appropriately modified for application in designed experiments and illustrated with some examples. Yet another way of minimizing the influence of outlying observations, particularly in designed experiments, is to adopt a design that is insensitive to their presence. Such designs are known in the literature as robust designs, robust in the sense that an outlying observation does not have any impact on the estimation of parameters. Robustness of experimental designs against missing observations or other disturbances has been studied extensively in the literature, but little work is available on robustness against outliers, and that work is confined to the presence of a single outlier. In the present chapter this study has been extended to more than one outlier. A general criterion for identifying designs robust against the presence of any t outliers has been developed; however, identification of robust designs using this criterion in full generality is mathematically intractable. The criterion has therefore been applied to identify designs robust against the presence of any two outliers, and it has been found that all binary proper variance-balanced block designs are robust against the presence of any two outliers. The problem of outliers in linear regression models can be handled using several statistical packages, but these packages are not capable of handling outliers in designed experiments.
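The resampling idea behind Rousseeuw (1984), which the report adapts to designed experiments, can be sketched in Python as below; the subset scheme, number of draws and the simple regression data are illustrative assumptions, and the modification actually used for block designs is not reproduced here.

    import numpy as np

    def lms_fit(X, y, n_subsets=500, seed=0):
        # Least Median of Squares by random elemental subsets: fit the model to
        # many small random subsets and keep the candidate coefficient vector
        # whose median squared residual over ALL observations is smallest.
        rng = np.random.default_rng(seed)
        n, p = X.shape
        best_beta, best_crit = None, np.inf
        for _ in range(n_subsets):
            idx = rng.choice(n, size=p, replace=False)
            beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
            crit = np.median((y - X @ beta) ** 2)
            if crit < best_crit:
                best_beta, best_crit = beta, crit
        return best_beta, best_crit

    # Hypothetical example: a simple regression line with two gross outliers.
    rng = np.random.default_rng(1)
    x = np.linspace(0.0, 10.0, 30)
    y = 2.0 + 0.5 * x + rng.normal(scale=0.2, size=30)
    y[[5, 20]] += 15.0                         # contaminate two observations
    X = np.column_stack([np.ones_like(x), x])
    beta_lms, crit = lms_fit(X, y)
    print(beta_lms)                            # roughly recovers intercept 2 and slope 0.5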
Thus, with the development of new methodologies for tackling outliers in designed experiments, user-friendly software for implementing these techniques is required. Software has been developed for analyzing experimental data in the presence of outliers, and its various aspects are discussed in Chapter 6. The report is concluded with a summary. Under this study, a dissemination workshop was organized on 26 July 2007 at IASRI, in which many renowned workers in statistics and field experimentation from different parts of the country participated. The salient achievements of the study were discussed in the workshop, and a number of recommendations and suggestions emerged from the discussion; these are presented in Chapter 7.
 
Date 2018-07-09T10:09:30Z
2018-07-09T10:09:30Z
2008-08-31
 
Type Project Report
 
Identifier L.M. Bhar, Rajender Parsad and V.K. Gupta (2008). Outliers in Designed Experiments. IASRI, New Delhi. I.A.S.R.I./P.R.-05/2008
http://krishi.icar.gov.in/jspui/handle/123456789/6139
 
Language English
 
Relation I.A.S.R.I./P.R.-05/2008;
 
Publisher ICAR-IASRI, Library Avenue, Pusa, New Delhi