CSI 779 / STAT 759
Comments on Wilcox (1997)
Wilcox occasionally uses terms in a way that is not standard in statistics. His usage is also often not very precise. In my own lectures and notes, I will try to state some things more precisely. I will also point out some things to note in your reading of Wilcox. (Unfortunately, perhaps, I will sometimes do this by referencing stuff I've written elsewhere.)
Chapter 1 (last updated 1/22/03)
Chapter 1 makes some good points about the effects of small departures from normality. (You must accept that he uses ``distribution'' in two ways: one actually to mean ``distribution'', as in the middle of page 2: ``... distributions can be highly skewed...''; and the other to mean the CDF, as in equation (1.1) on page 2.)
The Kolmogorov distance is the most commonly used measure of the difference between two CDFs, but there are other measures. For example, the Anderson-Darling distance, which depends on a given base distribution, is more sensitive to departures in the tails of the distribution.
The distinction he makes between the terms ``influence curve'' and ``influence function'' is not standard. Most people use these terms more or less interchangeably, but distinguish the concepts he refers to by use of qualifiers on whichever of these two phrases they use.
Chapter 2 (last updated 2/13/03)
The first sentence is very off-putting to a statistician, especially a mathematical statistician. In the first sentence of the Preface, and elsewhere, he uses ``robust'' to refer to methods of statistical inference. In Chapter 2, he uses ``robust'' to refer to properties of a distribution.
Throughout this chapter, he seems to use the word ``measure'' to refer to a parameter of a distribution (although sometimes he uses it in the context of a sample), and he seems to use ``estimator'' for the analogous sample quantity; thus, an ``R-measure of location'' (pages 20, 21) is an unusual parameter of a distribution, and an ``R-estimator of location'' (page 21) is the plug-in estimator of an ``R-measure''. This is an unusual way to set up the problem. The way he uses phrases like ``M-measure of location'' is not standard. In studies of robustness, however, we study the behavior of an estimator T(P_n) as n increases without bound. In this case P_n->P, so the properties of T(P) are of interest.
His discussion of ``measure of location'' in terms of equivariance, on page 12, is imprecise. He says ``...let theta(X) be some descriptive measure of F''. Then he talks about theta(X+b). How does X+b relate to F? Does F have to be a location group distribution for this even to make sense? (Otherwise, what is the meaning of theta(X)? Recall that it is a ``descriptive measure of F''.) If not, how does Bickel and Lehmann's stochastic ordering (for different distributions) fit in? (Similar comments apply to his discussion of ``measures of scale'', beginning on page 23.)
You should read and understand the discussion of the three types of robustness. In this, as elsewhere, Wilcox is weak on the mathematics, so just try to get a general understanding of these types of robustness. All of these types belong to the general class of ``distributional robustness'' (as opposed to other types of robustness such as, for example, those that concern nonindependent sampling). As elsewhere in this book, Wilcox discusses the types of robustness in the context of parameters rather than methods. We will encounter these types in their more usual context later, and we will also be more precise in definitions.
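The earlier remark about studying T(P_n) as n increases can be illustrated numerically. What follows is only a minimal sketch and is not taken from Wilcox: it assumes that T is the median functional and, purely for illustration, that P is the standard exponential distribution, for which T(P) = log 2. The function names, the sample sizes, and the seed are hypothetical choices.

```python
import math
import random
import statistics


def sample_median(n, rng):
    """T(P_n): the sample median of n standard exponential draws.

    For even n, statistics.median averages the two middle order statistics,
    which is one common version of the median of the empirical distribution.
    """
    sample = [rng.expovariate(1.0) for _ in range(n)]
    return statistics.median(sample)


def median_errors(sizes, seed=759):
    """Return |T(P_n) - T(P)| for each sample size n in sizes."""
    rng = random.Random(seed)   # seeded only so the run is reproducible
    target = math.log(2.0)      # T(P): median of the standard exponential
    return {n: abs(sample_median(n, rng) - target) for n in sizes}


errors = median_errors([100, 10_000, 100_000])
for n, err in errors.items():
    print(f"n = {n:>7}: |T(P_n) - T(P)| = {err:.5f}")
```

The errors are random, so they need not decrease at every step, but for large n they should be small. This is the sense in which the properties of T(P) matter when we study T(P_n).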
His discussion of ``Winsorized expected values'' in Section 2.5 follows his paradigm of defining and using robustifying terms and methods in the context of distributions. Most people define and use these for statistics so that the statistics will have desirable properties that relate to the usual parameters of a distribution. His motivation for defining ``measures'' of distributions seems to be so as to be able to ``find estimates of [them]'' (third line from bottom on page 28). To ``find estimates'' seems to mean to him to get plug-in estimators that are unbiased. (A plug-in estimator of a statistical function is the same functional of the empirical CDF as the parameter is of the CDF.) As mentioned above, however, we may be interested in the behavior of the estimator T(P_n) as P_n->P, but the objective in using T(P_n) is not necessarily to estimate T(P). This difference may seem subtle, but it gets to the heart of our understanding of statistical estimation. Finding estimators is not just the mechanical process that he seems to present. (See Gentle, 2002, Chapter 1 for a broader perspective on what it means to estimate something, and how we select estimators.)
Although statisticians are sometimes careless in their notation and terminology (isn't everyone?), it is disconcerting to read over and over such phrases as ``unbiased estimate'' (e.g., page 29); unbiasedness is a property of an estimator, not of an estimate.
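The parenthetical definition of a plug-in estimator above can be made concrete with a small sketch. It is an illustration under stated assumptions, not Wilcox's method: the statistical function is taken to be the variance, T(P) = E[(X - E X)^2], and the helper names `expectation` and `variance_functional`, the fair-die distribution, and the data values are all hypothetical. The same functional is applied first to a known discrete P and then to the empirical distribution P_n, which puts mass 1/n on each observation.

```python
import math
import statistics


def expectation(g, points, probs):
    """E[g(X)] for a discrete distribution with the given support and masses."""
    return sum(p * g(x) for x, p in zip(points, probs))


def variance_functional(points, probs):
    """T(P) = E[(X - E[X])^2], written as a functional of the distribution."""
    mu = expectation(lambda x: x, points, probs)
    return expectation(lambda x: (x - mu) ** 2, points, probs)


# T applied to a known P: a fair six-sided die (illustrative choice).
die_variance = variance_functional([1, 2, 3, 4, 5, 6], [1.0 / 6] * 6)

# T applied to the empirical distribution P_n of some made-up observations.
data = [2.0, 3.5, 1.0, 7.25, 4.0, 3.0]
n = len(data)
plug_in = variance_functional(data, [1.0 / n] * n)

print("T(P) for a fair die:      ", die_variance)
print("plug-in estimate T(P_n):  ", plug_in)
print("divisor-n variance:       ", statistics.pvariance(data))
print("divisor-(n-1) variance:   ", statistics.variance(data))
```

The plug-in estimate agrees with the divisor-n variance, which is biased for the variance of P; the unbiased divisor-(n-1) version is not the plug-in estimator. So ``plug-in'' and ``unbiased'' are separate properties, and choosing between such estimators is a decision, not a mechanical step.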