
Washington Statistical Society Seminar Archive: 1997


January
7 Tues. Reviewing the Disclosure Review Process for the Survey of Consumer Finances
29 Wed. Multivariate Allocation of Stratum Sample Sizes and Poisson Sampling Probabilities
February
11 Tues. Changing to a Team-Based Federal Government: Team Leaders and Self-Directed Work Teams
26 Wed. An Application of Mathematical Programming to Sample Allocation
March
5 Wed. The Convex Hull Test for Ordered Trichotomous Data
26 Wed. Given that Congress Must be Involved With Federal Statistical Issues, When Should Congress Leave Statistics to the Statisticians?
April
3 Thur. The Responses of Prices at Different Stages of Production to Monetary Policy Shocks
8 Tues. Improving Customer Service at the National Center for Education Statistics
24 Thur. The History and Development of Computer-Assisted Survey Information Collection
30 Wed. Frequency Valid Multiple Imputation for Surveys with a Complex Design
May
7 Wed. Interactive Data Analysis Systems for Agricultural Surveys
8 Thur. General Methods and Software for Editing Continuous and Discrete Data
4 Wed. Stability of Variance Estimators under Complex Sampling Designs
22 Thur. Development and Implementation of CASIC in Government Agencies
28 Wed. Effect of Unknown Nuisance Parameters on Estimates of Primary Parameters
June
3 Tues. Presidential Invited Address: Maintaining Statistical Objectivity in Times of Stress (or How Not to Lie with Statistics)
September
16 Tue. Mail and Self-Administered Surveys at the Beginning of the 21st Century
25 Thur. Weighting for Unequal Selection Probabilities in Multilevel Models
30 Tues. Presidential Invited Address: Enhancing Communication and Collaboration: Pathways to Improving Federal Statistical Programs
October
16 Wed. Recommendations Concerning Changes to the Standards for the Classification of Federal Data on Race and Ethnicity
15 Wed. Freeware for Statisticians, the General Public License, and R, a GPL Implementation of S(-plus)
22 Wed. NAICS is here: What Is It and What Does It Mean?
22 Wed. Statistics in the Information Age (Annual Morris Hansen Lecture)
27 Mon. Trading Volume, Price Volatility and Bid-Ask Spreads in Futures Markets
29 Wed. Sample Allocation Methods for Oversampling Subpopulations
November
3 Mon. Drawing Inferences for Use-effectiveness from Randomized Experiments with Noncompliance
6 Thur. Statistical Policy and Health Risk Assessment
12 Wed. A Computer Algebra for Survey Sampling
13 Thur. Economic Statistics and the Transmission of Monetary Policy to the Real Economy
18 Tues. The Outlook for Interdisciplinary Research on the Cognitive Aspects of Survey Methods
18 Tues. Using the Reports Review Process for Quality Assurance in Federal Agencies
20 Thur. On the Performance of Replication-Based Variance Estimation Methods with Small Numbers of PSUs
December
1 Mon. Joint Estimation of the Mean and Overdispersion Parameters of an Overdispersed Poisson Distribution Using Quasi-likelihood
3 Wed. Examining the Confidentiality of Analytically Valid, Public-use Microdata
9 Tues. The Short- and Long-Term Economic Outlook: Inflation, Unemployment, and Demographics
10 Wed. An Overview of the Dataplot Graphics & EDA Software System
11 Thurs. Meanings of Data Quality in Assessments of New Data Collection Technologies
17 Wed. Visual Exploratory Data Analysis with MANET



Topic: Multivariate Allocation of Stratum Sample Sizes and Poisson Sampling Probabilities

Abstract:

In stratified-sampling designs, stratum sample sizes influence both cost and precision. Similarly, in Poisson-sampling designs, the probabilities of selecting individual units also influence both cost and precision. This talk discusses algorithms for determining stratum sample sizes (or Poisson-sampling probabilities) that either (1) maximize precision while constraining costs or (2) minimize costs while constraining precision.

For multivariate surveys--i.e., surveys that collect data for multiple items--these algorithms require the calculation of Lagrange multipliers, for which different approaches have been developed by a number of authors. We briefly review these different approaches and describe the CHROMY_GEN program, which was developed at the Bureau of the Census for performing multivariate allocations. CHROMY_GEN calculates the Lagrange multipliers using an algorithm described in a 1987 paper by Chromy. It incorporates a stopping rule proposed by Causey (1983) and calculates "shadow prices" described by Bethel (1989), which indicate the sensitivity of cost to changes in individual constraints on precision.
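As a rough illustration of case (1), the sketch below covers only the single-item problem, where one Lagrange multiplier gives a closed-form answer. It is not the CHROMY_GEN algorithm, which handles several items at once. The function names, the linear cost model, the omission of finite population corrections, and the three-stratum numbers are all assumptions made for the example.

```python
from math import sqrt


def allocate_fixed_cost(weights, std_devs, unit_costs, budget):
    """Stratum sample sizes minimizing sum(W_h^2 S_h^2 / n_h) at a fixed total cost."""
    # A single Lagrange multiplier gives n_h proportional to W_h * S_h / sqrt(c_h).
    shares = [w * s / sqrt(c) for w, s, c in zip(weights, std_devs, unit_costs)]
    # Scale the shares so that sum(c_h * n_h) equals the budget exactly.
    denom = sum(w * s * sqrt(c) for w, s, c in zip(weights, std_devs, unit_costs))
    return [budget * share / denom for share in shares]


def variance_of_mean(weights, std_devs, sizes):
    """Variance of the stratified mean, ignoring finite population corrections."""
    return sum((w * s) ** 2 / n for w, s, n in zip(weights, std_devs, sizes))


# Hypothetical three-stratum design (illustrative numbers only).
W = [0.5, 0.3, 0.2]      # stratum population shares
S = [10.0, 20.0, 40.0]   # stratum standard deviations
c = [1.0, 4.0, 9.0]      # cost per sampled unit in each stratum
n_opt = allocate_fixed_cost(W, S, c, budget=1000.0)
```

Any other allocation with the same total cost, such as equal sample sizes in every stratum, gives a variance at least as large under these assumptions.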

Topic: Changing to a Team-Based Federal Government: Team Leaders and Self-Directed Work Teams

Abstract:

The presentation outlines eight important lessons learned in changing toward a team-based environment within the Bureau of Labor Statistics (BLS). Two types of teams were studied: teams led by Team Leaders and Self-Directed Work Teams. The teams were part of a larger one-year pilot conducted in the BLS to study methods of flattening its organizational structure. The presentation will focus on that portion of the workplace pilot that pertains to the implementation of teams as a means of reducing one layer of supervision, namely supervisory grade 13 positions.

The workplace pilot incorporates quality management objectives relating to people relationships, employee participation, empowerment, influencing groups, and workplace collaboration.

Topic: An Application of Mathematical Programming to Sample Allocation

Abstract:

The problem of sample allocation in multipurpose surveys is complicated by the fact that an efficient allocation for some estimates may be inefficient for others. There may also be precision goals that must be met for certain estimates plus constraints on costs and minimum sample sizes for strata to permit variance estimation.

These requirements lead to formulating the allocation problem as one of mathematical programming with an objective function and constraints that are nonlinear in the sample size target variables. We discuss a flexible approach for a two-stage sample allocation that uses multicriteria optimization programming. Software will be demonstrated that permits survey designers to easily explore alternative problem formulations and to compare the resulting allocations. The method is illustrated using a business establishment survey that estimates the costs to employers of providing wages and benefits to employees and the percentages of employees that receive certain benefits.

Topic: The Convex Hull Test for Ordered Trichotomous Data

Abstract:

Directed extreme points of the permutation sample space are defined and shown to provide the most evidence against the null hypothesis in favor of the alternative hypothesis. This idea is used to derive new conditional permutation tests based on partial convex hull peeling of the permutation sample space. When an optimal test exists, this algorithm will provide this best test. When testing for stochastic order in ordered 2x3 contingency tables, however, no best test exists. In this context, the convex hull test is admissible and unbiased against distant alternatives, and is shown to have better power to detect a broad range of stochastically-ordered alternatives than t-tests, regardless of the choice of column scores. In fact, theoretical and empirical power calculations show that each such t-test has power zero to detect certain alternatives of interest. In addition, the convex hull test is shown empirically to have a better power profile than the proportional odds test, proportional hazards test, Smirnov test, and the test based on the MLE of the global odds ratio.

Topic: Given that Congress Must be Involved With Federal Statistical Issues, When Should Congress Leave Statistics to the Statisticians?

Abstract:

Major statistical series produced by statistical agencies such as the Bureau of the Census, the Bureau of Economic Analysis, and the Bureau of Labor Statistics, among others, have a major impact on legislation, the economy, and the political process. Two current examples are proposed changes in the Consumer Price Index and greater use of sampling in the Decennial Census.

There is no doubt that these statistics belong in the purview of Congressional oversight. However, is there a point where Congress, after hearing from the statistical agencies, should leave it to the "experts" and not politicize statistical efforts? How do Congress's responsibilities mesh with statisticians' duties as scientific professionals?

A panel consisting of current and former Congressional staff members will discuss this issue.

Topic: The Responses of Prices at Different Stages of Production to Monetary Policy Shocks

Abstract:

This paper examines the responses of prices at different stages of production to an explicitly identified demand shock: a monetary policy shock. The frameworks of Christiano, Eichenbaum, and Evans (1994, 1996) and Sims and Zha (1995b) are used to identify the policy shock as the innovation to the federal funds rate in a VAR. The adjustment of prices at different stages of production is examined by adding three different sets of prices to the basic VAR model: (a) the PPIs for crude materials, intermediate goods, and finished goods; (b) the newer industry-based PPIs of input and output prices for crude, primary, semifinished, finished, and final goods processors; and (c) the input and output price indexes for manufacturing industries constructed by Roberts, Stockton, and Struckmeyer (1994). The analysis shows that, at earlier stages of production, a monetary tightening causes input prices to fall rapidly and by a larger amount than output prices. This finding would appear to be consistent with a model in which all price changes are subject to menu costs but some chain structure in production gives rise to prices at earlier stages of production moving more than prices at later stages.

Topic: Improving Customer Service at the National Center for Education Statistics

Abstract:

In 1993 President Clinton issued Executive Order 12862, "Setting Customer Service Standards," which called on all Federal agencies to develop plans to better serve their customers. To respond to these requirements, the National Center for Education Statistics (NCES) has launched many customer-related initiatives, including: 1) conducting customer focus groups, 2) providing training for NCES customers, and 3) training employees about customer service delivery. Also, NCES conducted its first customer satisfaction survey in 1996. This talk will present a summary of the methodology and results from the survey. Limitations of results due to frame undercoverage and the generic nature of the survey items will be discussed in terms of future plans for the Center.

Topic: The History and Development of Computer-Assisted Survey Information Collection

Abstract:

The authors review the history, scope, and significance of computer-assisted survey information collection (CASIC) and identify its current issues and likely future. They begin by describing the historical development of computer-based technologies for survey data collection and capture and the evolution of the CASIC concept. This history is placed within the context of broader trends in survey research and computer technology. The presenters also attempt to summarize the current state of knowledge about the effects of CASIC methods on the capabilities, conduct, and organization of survey research while acknowledging that generalizations across CASIC technologies are limited by their diversity and continuously evolving nature. The presenters believe that CASIC methods are introducing fundamental changes in survey methodology and the survey process that are not fully recognized by the field. They conclude by examining a number of these changes and by identifying currently unresolved issues in the future of CASIC and its consequences for survey research.

Topic: Frequency Valid Multiple Imputation for Surveys with a Complex Design

Abstract:

General conditions required for valid design-based inference when using multiple imputation for dealing with missing values are considered. We focus on the means or totals and the estimation of their variances. We study multiple and proper imputation under a general setting, concentrating on the mathematical and statistical conditions required for valid design-based inference, assuming the nonresponse mechanism is an additional phase of sampling.

Some authors have expressed the concern that it may be difficult to find a multiple imputation scheme satisfying the requirements of proper imputation when the survey design is complex; for example, designs that are stratified and multistage. We study this problem in more depth. We first consider methods of complete data analysis, where the estimators of the total are linear and the estimators of the variance are quadratic in the data. We then consider stochastic imputation schemes using means and ratios as the basis for the imputations. We discuss the properties of these schemes that are needed for the analysis of multiple imputation derived by generating different stochastically imputed data sets. In particular, we consider the properties that must be satisfied for the imputations to be proper.
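The abstract does not spell out how the analyses of the separately imputed data sets are combined. As a hedged sketch, the code below applies the standard combining rules for multiple imputation (often called Rubin's rules): the point estimate is the average over imputations, and the total variance adds the average within-imputation variance to an inflated between-imputation variance. The function name and the example numbers are hypothetical, and whether these rules give valid design-based inference is exactly the question the talk examines.

```python
from statistics import mean, variance


def combine_imputations(estimates, variances):
    """Combine m completed-data analyses using the standard combining rules."""
    m = len(estimates)
    point = mean(estimates)           # average of the m point estimates
    within = mean(variances)          # average completed-data variance
    between = variance(estimates)     # sample variance across imputations (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return point, total


# Hypothetical totals and variance estimates from m = 3 imputed data sets.
point, total_var = combine_imputations([10.0, 12.0, 11.0], [4.0, 4.0, 4.0])
```

The between-imputation term is what carries the extra uncertainty due to nonresponse; if the imputations are not proper, that term can misstate it.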

Topic: Interactive Data Analysis Systems for Agricultural Surveys

Abstract:

Editing in the National Agricultural Statistics Service (NASS) has undergone changes. New emphasis is on interactive editing and analysis capabilities managed by State Statistical Offices on their own timetable within the broad constraints of the total survey period.

The NASS interactive analysis package was developed using the SAS/AF and SAS/EIS software with some other customized applications based in SAS. The Interactive Data Analysis System (IDAS) is a tool that permits commodity analysts to interact with survey data at several levels. They can interactively view data relative to other reports throughout the survey process as soon as records have cleared the internal consistency checks. This affords more time for the analysts to recognize and react to problematic data.

IDAS was developed primarily for analyzing incoming data during the survey period but also provides several post data collection analysis tools. Several graphical representations of the data are presented. The graphs contain drill-down capability, allowing the user to see specific information for individual records.

Topic: General Methods and Software for Editing Continuous and Discrete Data

Abstract:

This talk describes the methods and the software of the new SPEER (Structured Programs for Economic Editing and Referrals) and the DISCRETE edit system. Both systems utilize the general editing model of Fellegi and Holt (JASA 1976) but require entirely different sets of algorithms. The advantages of Fellegi-Holt systems are that (1) in one pass an edit-failing record is changed to one satisfying edits, (2) the logical consistency of the edit system is checked prior to the receipt of data, (3) edit constraints reside in easily modified tables, and (4) source code for the main mathematical algorithms does not need to be updated. The efficacy of SPEER is examined by comparing it with Statistics Canada's GEIS (Generalized Edit and Imputation System) using Canadian Agriculture Census data and edit constraints. SPEER is the first system that allows editing and a limited form of balancing (assuring that items add to totals). Other economic editing systems cannot generally assure that records satisfy balance equations and edits simultaneously. The DISCRETE edit system is being developed for general editing of demographic surveys and decennial censuses. New set-covering and integer-programming algorithms may be as much as 100 times as fast as the corresponding algorithms of the previous fastest system, developed by the Italian National Statistical Institute (ISTAT) and IBM.

Topic: Stability of Variance Estimators under Complex Sampling Designs

Abstract:

Stability of a variance estimator is an important practical consideration in the analysis of sample survey data. In some cases, the stability of a variance estimator is a matter of intrinsic interest. In other cases, stability issues are an intermediate consideration in the construction of appropriate confidence intervals for the principal parameters of interest.

We develop methodology for assessing the stability of a variance estimator allowing for heterogeneity among stratum-level variances. The standard Satterthwaite estimator generally has a substantial negative bias if the number of strata is large and the numbers of primary sampling units are small. Examination of the expectation of the leading terms of a Taylor expansion of the Satterthwaite estimator leads to a modified estimator. In addition, some practical cases provide auxiliary information which is closely related to stratum-level variances. We can use this information in conjunction with an errors-in-variables model to produce an improved estimator. The resulting estimator is the auxiliary-data-based estimator.
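For orientation, the sketch below computes the standard Satterthwaite degrees-of-freedom approximation for a variance estimator that is a sum of independent stratum-level components. It does not reproduce the modified or auxiliary-data-based estimators described in the talk. The function name and the example inputs are hypothetical, and each stratum component is assumed to carry its own degrees of freedom (for example, the number of sampled primary sampling units minus one).

```python
def satterthwaite_df(components, component_dfs):
    """Approximate degrees of freedom for a sum of independent variance components."""
    total = sum(components)
    # (sum of v_h)^2 divided by the sum of v_h^2 / d_h.
    return total ** 2 / sum(v ** 2 / d for v, d in zip(components, component_dfs))


# Hypothetical design: four strata, two sampled PSUs each (one df per stratum).
df_equal = satterthwaite_df([2.0, 2.0, 2.0, 2.0], [1, 1, 1, 1])
df_unequal = satterthwaite_df([8.0, 0.1, 0.1, 0.1], [1, 1, 1, 1])
```

With equal components the approximation returns the full four degrees of freedom, while heterogeneous stratum variances pull it down toward one, which is why stability matters for confidence intervals.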

The proposed methods are applied to data from the U.S. Third National Health and Nutrition Examination Survey. Also, a simulation study assesses the performance of the proposed estimators.

Topic: Development and Implementation of CASIC in Government Agencies

Abstract:

Government agencies have been at the forefront in developing and implementing CASIC technologies for household and establishment surveys. Their experiences vary, however, reflecting the types of surveys conducted, their organizational structure, their mix of staff skills, and their organizational financing.

The presenters detail the transition to CASIC in four major government agencies, two in North America and two in Europe, and draw examples from other agencies in Europe, Australia, New Zealand, and North America. They summarize the history of CASIC development in each agency and consider four factors affecting the nature and speed of technological implementation: (1) agency ability to finance CASIC technology; (2) effects on government survey operations (survey timetables, instrument development, case management, data entry, edit and processing, interfaces with analysis software, and hardware platforms); (3) changing human resource needs (different skills required for interviewers and supervisors, for professional staff supporting interviewer training, for instrument development and other survey operations); and (4) strategic choices for phasing in the technology, and the effects of these choices on agency organizational structure.

This program is physically accessible to persons with disabilities. Requests for sign language interpretation or other auxiliary aids should be directed to Barbara Palumbo (SRD), (301) 457-4892 (v), (301) 457-3675 (TDD).

Topic: Effect of Unknown Nuisance Parameters on Estimates of Primary Parameters

Abstract:

We consider the problem of making inferences about parameters of interest (primary parameters) in the presence of imprecisely known nuisance parameters. The parameters of interest, which are estimated from field test data, depend on nuisance parameters which are not precisely known. We present a procedure for assessing the effect of nuisance parameter uncertainty on the estimate of the primary parameters. The procedure is applied to a problem of interest to engineers--the problem of identifying parameters in state-space models in the presence of nuisance parameters.

PRESIDENTIAL INVITED ADDRESS



Topic: Maintaining Statistical Objectivity in Times of Stress (or How Not to Lie with Statistics)

Abstract:

The credibility of the statistical system requires the production of objective data -- which, in turn, requires high level government statisticians to be skillful in resisting requests to produce data designed to support the political positions of their bosses and other influential individuals. The panelists (all of whom are now outside the government) will talk about their experiences in resisting such pressures. They will also discuss how such pressures affected them and their agencies.

Topic: Mail and Self-Administered Surveys at the Beginning of the 21st Century

Abstract:

Surveying has changed dramatically in the 60-plus years since it emerged as a significant feature of U.S. society, and more change is ahead of us. In this presentation I will look briefly at major survey developments of our century, and in more detail at what lies ahead. In particular, I argue that the importance of self-administered surveys is going to increase in relation to interview methods, and I will describe the reasons why. I will also describe certain developments that undergird their greater use, and the challenges that greater use poses for survey methodologists. An underlying theme of the presentation is that the feasibility of optical scanning, electronic mail questionnaires and Web surveys is redefining the entire survey landscape.


Topic: Weighting for Unequal Selection Probabilities in Multilevel Models

Abstract:

When multilevel models are estimated from survey data derived using multistage sampling, unequal selection probabilities at any stage of sampling may induce bias in standard estimators, unless the sources of the unequal probabilities are fully controlled for in the covariates. This paper proposes alternative ways of weighting the estimation of a two-level model by using the reciprocals of the selection probabilities at each stage of sampling. Consistent estimators are obtained when both the sample number of level 2 units and the sample number of level 1 units within sampled level 2 units increase. Scaling of the weights is proposed to improve the properties of the estimators and to simplify computation. Variance estimators are also proposed. In a limited simulation study the scaled weighted estimators are found to perform well, although non-negligible bias starts to arise for informative designs when the sample number of level 1 units becomes small. The variance estimators perform extremely well. The procedures are illustrated using data from the survey of psychiatric morbidity.
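The abstract does not say which scaling is used. As an illustration only, the sketch below shows two scalings of within-cluster (level 1) weights that are commonly discussed for weighted multilevel estimation: one rescales the weights to sum to the cluster sample size, the other to sum to an "effective" cluster sample size. That either matches the paper's proposal is an assumption, and the function names and example weights are hypothetical.

```python
def scale_to_cluster_size(weights):
    """Rescale level 1 weights within one cluster so they sum to the cluster sample size."""
    n = len(weights)
    total = sum(weights)
    return [w * n / total for w in weights]


def scale_to_effective_size(weights):
    """Rescale level 1 weights so they sum to (sum w)^2 / sum(w^2)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return [w * total / total_sq for w in weights]


# Hypothetical conditional weights (reciprocals of within-cluster selection probabilities).
raw = [1.0, 2.0, 3.0, 4.0]
by_size = scale_to_cluster_size(raw)
by_effective_size = scale_to_effective_size(raw)
```

Both scalings leave the relative weights within a cluster unchanged; they differ only in the total that the weights add up to.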


PRESIDENTIAL INVITED ADDRESS



Topic: Enhancing Communication and Collaboration: Pathways to Improving Federal Statistical Programs

Abstract:

Communication and collaboration among Federal statistical agencies are vital to ensure a number of broad goals, including minimizing duplicate efforts (and costs), maximizing the comparability of data across agencies, and ensuring that users can access the data they need easily and efficiently. Various proposals for improving communication and collaboration have been put forward by Congressional and Administrative representatives, and by members of the statistical community. Panelists will describe how they believe we can best achieve these objectives of increased communication and collaboration.

Please note that NSF security procedures require non-NSF employees to check in at the security desk on the second floor. This procedure will be expedited for individuals whose names are provided to security in advance. To be placed on this list, please contact Carolyn Shettle (cshettle@nsf.gov; 703-306-1780 x 6906), giving your name and the name of your organization by Friday, September 26.

Topic: Recommendations Concerning Changes to the Standards for the Classification of Federal Data on Race and Ethnicity

Abstract:

Recommendations for changes to OMB's Statistical Policy Directive No. 15, Race and Ethnic Standards for Federal Statistics and Administrative Reporting were published in the Federal Register on July 9, 1997. These recommendations, given to OMB by the Interagency Committee for the Review of the Racial and Ethnic Standards, were the product of extensive research and consultation with the public over a four-year period. Among the many recommendations, perhaps the most controversial concerns allowing multiple responses to the race question. This session will discuss the process leading to the recommendations, the recommendations themselves, and the release of guidelines for the implementation of any of the recommendations that OMB may choose to accept.


Topic: Freeware for Statisticians, the General Public License, and R, a GPL Implementation of S(-plus)

Abstract:

With the rise in popularity of Linux, there has been a concurrent rise in the popularity of freeware. This popularity arises not necessarily because the software is free but because of its recognized reliability and utility. The name freeware can be misleading. Some software is distributed free of charge, but its source code is copyrighted and kept secret. In essence, the software becomes a free proprietary product. Other software is "copyrighted" under the General Public License (GPL), which the Free Software Foundation has championed. Under this license, software may be sold or freely distributed, but, in either case, the source code must be made available to any user, so that in effect the source code always remains publicly available. Consequently, anyone can see how the software works. More importantly, one can modify the code to suit one's particular needs and, if the improvement is deemed a good one, communicate it to the authors of the software for possible inclusion. This happens routinely in the Linux community, where many of the tens of thousands of beta testers contribute improvements to the Linux kernel. It is a development team unsurpassed in size and talent.

At one time, S, a powerful language that students of statistics and research institutions have come to embrace, was freely available, but it no longer is. A GPL implementation of S, called R, is currently underway in New Zealand under the leadership of Robert Gentleman and Ross Ihaka of the University of Auckland. Someone who is already familiar with S(-plus) will see little difference between S and R, even though they are structurally different. Although R is still in its infancy, it has developed to the extent that someone's S program will probably run "as is" in the R environment. R can be easily installed on the UNIX and Linux platforms. Versions for other operating systems are being developed.

This presentation will address the interests of those who wish to create a statistical workshop on their home or office PC.


Topic: NAICS is here: What Is It and What Does It Mean?

Abstract:

The Office of Management and Budget's (OMB) decision to adopt the North American Industry Classification System (NAICS) to replace the Standard Industrial Classification (SIC) system will have a profound effect on statistics measuring the U.S. economy. NAICS was developed by the Economic Classification Policy Committee (ECPC), on behalf of OMB, in cooperation with Statistics Canada and Mexico's Instituto Nacional de Estadistica, Geografia e Informatica (INEGI) to provide for comparable industrial statistics across the three countries.

NAICS is the first ever classification system to be constructed on a consistent conceptual framework. Economic units that have similar production processes are grouped together, forming industries that are defined on a production-oriented basis. Such a framework will provide consistent statistics across the three countries that can be used for measuring productivity and unit labor costs, constructing input-output tables, and other uses that imply the analysis of production relationships in the economy. NAICS reflects advances in technology and the huge growth and diversification of service industries that have occurred over the past several years. Eight new service sectors and over 150 new service industries are recognized. These and other changes will require the restructuring of many of the statistical programs of the Census Bureau over the next few years. This presentation will focus on the major changes in NAICS and the implementation plans for the Census Bureau.


THE MORRIS HANSEN LECTURE

Topic: Statistics in the Information Age

Abstract:

Statistics are a major means of knowing about society. In democratic societies official statistics are available not only to politicians and government officials, but also to interest groups and ordinary citizens. They thus have a special role to play in the process of policy formation and implementation. Until recently, the use of official statistics in policy debates has been limited by the relative difficulty in accessing large data bases by ordinary citizens and interest groups. The advent of the Internet and the World Wide Web, however, has dramatically changed the situation and greatly increased availability of statistics to ordinary citizens. Future technological changes are likely to enhance the amount of data that the public can easily access.

Such a vast change in accessibility of statistical data about society presents many challenges to the federal statistical system. Three of the most pressing challenges concern the relevance, validity, and timeliness of official statistics. Relevance refers to the questions to which the statistics gathered are the answers. How do we decide what data to collect and to make publicly available? As more people can access data more easily, the demand for statistical data and controversy over what data to collect will grow. With limited budgets, the statistical agencies will come under increasing public scrutiny, and decisions about what data to collect will take on added importance. Validity refers to the relation of statistical measures to the concepts they are intended to measure. Do the statistical measures published by the statistical agencies have the meaning that is ascribed to them by the broader user community? How does the statistical system refine measures to account for changes in society that make existing statistics no longer reflect the realities they are supposed to measure? Timeliness refers not only to the gap between the time of data collection and their availability but also to the periodicity of data collection operations and to the revisions in the measures to reflect changes in society that affect validity. Technology may provide means for decreasing the amount of time necessary for data collection and processing, but may not be able to decrease it sufficiently to satisfy an audience that has almost instant access to whatever is available. Increased strains on statistical agency budgets because of broader federal budget cutbacks or the allocation of more resources to disseminating data may force a decrease in the periodicity of some statistical series. Technology, however, may have little to contribute toward the decision of agencies to change measures to reflect social changes that underlie the construction of the measures.

While there are great strengths in our present system, a decentralized statistical system such as we have in the United States is not well situated to meet these challenges successfully. There needs to be a forum in which responses to the challenges of wider and easier accessibility to official statistics can be discussed, responses formulated, and then carried out. Perhaps the greatest challenge to the federal statistical system today is how to organize itself to meet the challenges brought about by the technological revolution in data accessibility.


Topic: Trading Volume, Price Volatility and Bid-Ask Spreads in Futures Markets

Abstract:

In this study the relations between trading volume, bid-ask spreads and price volatility in four financial and metal futures are examined in a three-equation structural model using the Generalized Method of Moments (GMM) procedure. Specification tests confirm that trading volume, bid-ask spreads and price volatility are jointly determined. The results show a positive relationship between trading volume and intraday price volatility and an inverse relationship between trading volume and bid-ask spreads, after controlling for other factors. Results also indicate that price volatility has a positive relationship with bid-ask spreads and a negative relationship with lagged trading volume. Furthermore, we demonstrate that the OLS parameter estimates of each equation are often severely understated in comparison with the estimates obtained by GMM estimation. In addition, we provide reliable estimates of elasticities of trading volume with respect to transaction costs in futures markets.


Topic: Sample Allocation Methods for Oversampling Subpopulations

Abstract:

Generally, in sample surveys the estimates for certain small subpopulations whose members cannot be identified in advance of sampling are not precise because the number of sampled units belonging to these subpopulations is small. Often, there is a need to improve the precision of these estimates or sometimes there is a requirement for a predetermined expected number of units belonging to a subpopulation in the overall sample for purposes of data analysis. Several techniques are available for increasing the expected sample size belonging to a subpopulation or subpopulations.

Some of the techniques used to achieve the desired sample size from the subpopulations might make the overall estimates very inefficient due to allocation or selection methods that are very different from the optimum methods needed to obtain precise overall estimates. Therefore, there is a need to balance the need for improving the precision of the subpopulation estimates with the loss in efficiency of the estimates for the general population which does not include the members of the subpopulation.

Some methods of sample allocation are proposed which attempt to keep the strata sample sizes close to the sample sizes which are considered optimum from the point of view of efficiency of the overall estimates. The stratification boundaries are the same as those created for maximizing the efficiency of the overall estimates. A method of revising the probabilities of selection of primary sampling units which maximizes the expected proportion of sampled units belonging to a subpopulation in the overall sample subject to certain constraints is also suggested.


Topic: Drawing Inferences for Use-effectiveness from Randomized Experiments with Noncompliance

Abstract:

Randomized experiments suffering from noncompliance often have as their estimand "use-effectiveness" (i.e., the effect of exposure to the treatment, not the effect of assignment to the treatment, the "intent-to-treat" effect). These types of studies can be viewed as bridges between uncontrolled observational studies and perfect randomized experiments. Developing methods of analysis for this situation, therefore, has importance beyond the setting of pure randomized experiments. The techniques that will be discussed combine extensions of instrumental variables ideas from economics and Bayesian posterior analysis implemented by MCMC methods. Despite their relative sophistication, the methods and resultant analyses are easily understood, and will be illustrated using an experiment in Indonesia on the effects of Vitamin A on mortality and an experiment in Indiana on the effects of flu shots on flu-related hospitalization.
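
As a rough illustration of the instrumental-variables idea mentioned above, the sketch below computes the simplest moment-based estimate: the intent-to-treat effect on the outcome divided by the intent-to-treat effect on treatment receipt. This is not the Bayesian MCMC analysis described in the abstract. The function name, argument names, and toy data are hypothetical, and the ratio is interpretable as an effect of exposure only under the usual assumptions that assignment affects the outcome solely through treatment received and that nobody does the opposite of their assignment.

```python
def wald_estimate(assigned, received, outcome):
    """Return (ITT effect on outcome, ITT effect on receipt, their ratio).

    assigned, received: lists of 0/1 flags (hypothetical names)
    outcome: list of numeric outcomes, same length
    """
    def mean(values):
        return sum(values) / len(values)

    # Split outcomes and treatment receipt by randomized assignment.
    y1 = [y for z, y in zip(assigned, outcome) if z == 1]
    y0 = [y for z, y in zip(assigned, outcome) if z == 0]
    d1 = [d for z, d in zip(assigned, received) if z == 1]
    d0 = [d for z, d in zip(assigned, received) if z == 0]

    itt_outcome = mean(y1) - mean(y0)   # effect of assignment on outcome
    itt_receipt = mean(d1) - mean(d0)   # effect of assignment on receipt
    if itt_receipt == 0:
        raise ValueError("assignment did not change treatment receipt")
    return itt_outcome, itt_receipt, itt_outcome / itt_receipt


# Toy data (invented for illustration): half of those assigned comply.
assigned = [1, 1, 1, 1, 0, 0, 0, 0]
received = [1, 1, 0, 0, 0, 0, 0, 0]
outcome = [1, 1, 0, 0, 0, 0, 0, 0]
print(wald_estimate(assigned, received, outcome))
```

The ratio rescales the diluted intent-to-treat effect by the share of subjects whose exposure was actually changed by assignment.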


Topic: Statistical Policy and Health Risk Assessment

Abstract:

One of the more difficult aspects of human health risk assessment is the characterization of uncertainty both in estimating human exposure to toxic pollutants and estimating the risk resulting from such exposure. It is essential that the public policy decision maker have an understanding of the uncertainty associated with the different stages of risk estimation in order to understand the impact of the actions taken based on the quantitative risk assessment.

The panelists will be raising some of the more important statistical issues relating to the use of risk assessment in public policy. Dr. Kopstein will weave the presentations together to create a forum for open discussion of the issues. Dr. Cothern will present issues that relate to the ethics of using quantitative risk assessments as a basis for regulatory and public policy actions. Dr. Barry will discuss Monte Carlo approaches to estimating uncertainty and the impact of these approaches on public policy. Dr. Putzrath will discuss the uncertainty problems associated with different risk assessment models and their impact on regulatory policy and decision making.


Topic: Economic Statistics and the Transmission of Monetary Policy to the Real Economy

Abstract:

The purpose of this paper is to consider the ramifications for economic data linkages and the consequent characterization of the monetary policy transmission mechanism of recent developments in the national accounting treatment of financial business. We begin by briefly revisiting the conceptual framework for a financial firm considered in Hancock (1985) and Barnett (1987). We then consider producer price measurement, monetary and credit aggregation, and the 1993 System of National Accounts (SNA93) framework and current U.S. national accounting practice in the financial firm context. We discuss the implications for imputed interest flows in estimating the allocation of financial services sales across intermediate and final consumption sectors, and between the intermediate purchases of the business sector and payments to primary factors of production. We itemize the inconsistencies between current U.S. national accounting practice, SNA93 recommendations, and Divisia monetary and credit aggregation, and suggest changes producing a consistent measurement framework for monetary and credit aggregation. Finally, we examine the available annual historical data for the period 1961-1994 from the U.S. Flow of Funds and National Income and Product Accounts to get a rough assessment of the effects of our recommended changes to economic measurement practice for financial services in the monetary and real sectors.


Topic: The Outlook for Interdisciplinary Research on the Cognitive Aspects of Survey Methods

Abstract:

A robust interdisciplinary survey measurement research effort is vital to developing cost-effective solutions to the complex issues faced by organizations in trying to meet expanding needs for relevant, accurate, and timely statistics. The current needs for new thrusts of cross-disciplinary research, and for new directions in which to expand interdisciplinary survey research, were examined recently at the Second Advanced Seminar on the Cognitive Aspects of Survey Methods (CASM II). Panelists will discuss these issues from the perspectives of the CASM II sessions that they organized and chaired.


Topic: Using the Reports Review Process for Quality Assurance in Federal Agencies

Abstract:

Statistical agencies use a variety of means to assure that their reports meet high quality standards in terms of relevancy, accuracy, and objectivity. One key component in most statistical agencies is the reports review process. Procedures for conducting these reviews vary considerably among agencies. Panel participants will describe the procedures used in their agencies and will discuss the implications of these procedures for report quality, staff morale, and timeliness.


Topic: On the Performance of Replication-Based Variance Estimation Methods with Small Numbers of PSUs

Abstract:

Burke and Rust (1995) compared two jackknife variance estimation methods through a simulation study conducted on a subset of the National Assessment of Educational Progress (NAEP) data. This paper studies the performance of six replication-based variance estimation methods (random group, simple and stratified jackknife, bootstrap, balanced repeated replication (BRR), and Fay's method) in cases where only a small number of primary sampling units (PSUs) are available. These methods are compared through both simulation studies and some theoretical arguments. In our simulation studies, 182 private schools in the 1993-94 Schools and Staffing Survey (SASS) were chosen to construct the artificial population, and we also tried to mimic the original SASS sampling design when drawing our samples from the simulation population. Several software packages, including VPLX (Fay, 1994) and Resampling Stat (V 4.0 for Windows), have been used to implement the simulation.
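
To make the replication idea concrete, here is a minimal sketch of the simple (unstratified) delete-one-PSU jackknife for a ratio mean. It is an illustration only and does not reproduce the paper's simulation or the VPLX implementation. The function name and toy inputs are hypothetical, PSUs are assumed to be equally weighted, and squared deviations are taken around the full-sample estimate, which is one of several common conventions.

```python
def jackknife_variance(psu_totals, psu_sizes):
    """Delete-one-PSU jackknife for the ratio mean sum(totals) / sum(sizes).

    psu_totals: per-PSU sums of the study variable (hypothetical input)
    psu_sizes:  per-PSU counts of sampled units
    Returns (full-sample estimate, jackknife variance estimate).
    """
    k = len(psu_totals)
    if k < 2:
        raise ValueError("need at least two PSUs to form replicates")

    total_y = sum(psu_totals)
    total_n = sum(psu_sizes)
    full_estimate = total_y / total_n

    # One replicate per PSU: recompute the estimate with that PSU dropped.
    replicates = [
        (total_y - y) / (total_n - n)
        for y, n in zip(psu_totals, psu_sizes)
    ]

    # Scale the squared deviations by (k - 1) / k.
    variance = (k - 1) / k * sum((r - full_estimate) ** 2 for r in replicates)
    return full_estimate, variance


# Toy example with three PSUs of one unit each (invented numbers).
print(jackknife_variance([2.0, 4.0, 6.0], [1, 1, 1]))
```

With so few PSUs the variance estimate rests on very few replicates, which is the small-sample setting the abstract is concerned with.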


Topic: Joint Estimation of the Mean and Overdispersion Parameters of an Overdispersed Poisson Distribution Using Quasi-likelihood

Abstract:

We consider the analysis of the number of events within subjects over time (count data), such as the number of hypoglycemic episodes per patient with diabetes per year. We describe robust methods for the analysis of overdispersed count data when the rate parameter of the Poisson distribution is itself a random variable whose distribution, commonly called the mixing distribution, is unknown but has finite first two moments denoted by mu and sigma^2. In this case, the overdispersed event count has mean rate mu and variance that is a function of the unknown moments mu and sigma^2 of the mixing distribution.

We consider the situation common in clinical trials where patients have different follow-up times due to staggered entry and losses to follow-up, and hence have different periods of exposure. Without specifying the form of the mixing distribution, the mean rate is estimated using quasi-likelihood.
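
The mean and variance of the count under this setup follow from the laws of total expectation and total variance. The notation below is assumed for illustration and is not taken from the abstract: Y_i is the event count for subject i, t_i is that subject's exposure time, and lambda_i is the subject's rate, drawn from the mixing distribution with mean mu and variance sigma^2.

```latex
% Assumed notation: Y_i | \lambda_i \sim \mathrm{Poisson}(\lambda_i t_i),
% with E[\lambda_i] = \mu and \mathrm{Var}(\lambda_i) = \sigma^2.
\begin{align*}
E[Y_i] &= E\bigl[E[Y_i \mid \lambda_i]\bigr] = E[\lambda_i t_i] = \mu t_i, \\
\mathrm{Var}(Y_i) &= E\bigl[\mathrm{Var}(Y_i \mid \lambda_i)\bigr]
                   + \mathrm{Var}\bigl(E[Y_i \mid \lambda_i]\bigr)
                 = \mu t_i + \sigma^2 t_i^2 .
\end{align*}
```

The extra term sigma^2 t_i^2 is the overdispersion relative to a Poisson count, and it grows with exposure, which is why unequal follow-up times matter for the analysis.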

The overdispersion parameter sigma^2 for a specified mixing distribution is commonly estimated by the method of moments. We propose a method for estimating sigma^2 by setting up a quasi-likelihood type equation and obtaining a solution of this equation. Simulation is used to compare the root mean squared error (RMSE) of this estimate with the RMSEs of the estimates that are currently available in the literature. Simulations using different mixing distributions showed that this quasi-likelihood type estimate has a slightly smaller RMSE than the estimates currently available in the literature.
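
For orientation, the sketch below shows one simple method-of-moments estimate of sigma^2, the kind of baseline the abstract refers to. It is not the proposed quasi-likelihood type estimator, and it may differ from the moment estimators the authors compared against. It assumes a count with mean mu * t and variance mu * t + sigma^2 * t^2 for exposure t, estimates mu by total events over total exposure, and truncates the variance estimate at zero. All names and data are hypothetical.

```python
def moment_estimates(counts, exposures):
    """Simple moment estimates of (mu, sigma^2) for overdispersed counts.

    Assumes E[y] = mu * t and Var(y) = mu * t + sigma^2 * t**2,
    where t is the subject's exposure time (hypothetical setup).
    """
    total_exposure = sum(exposures)
    if total_exposure <= 0:
        raise ValueError("total exposure must be positive")

    # Pooled rate: total events divided by total exposure.
    mu_hat = sum(counts) / total_exposure

    # Match sum of squared residuals to sum of (mu*t + sigma^2 * t^2).
    excess = sum(
        (y - mu_hat * t) ** 2 - mu_hat * t
        for y, t in zip(counts, exposures)
    )
    sigma2_hat = max(excess / sum(t * t for t in exposures), 0.0)
    return mu_hat, sigma2_hat


# Toy data (invented): two subjects with one year of follow-up each.
print(moment_estimates([0, 4], [1.0, 1.0]))
```

Truncating at zero is a design choice made here so that the estimate stays in the parameter space when the data show no excess variation.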

The joint estimates of mu and sigma^2 are also found to be consistent and asymptotically normal. This provides efficient estimates of the mean rate and of the log relative risk for two groups with overdispersed count data, estimates that are robust, or distribution-free, with respect to the nature of the mixing distribution.

U.S. BUREAU OF THE CENSUS
STATISTICAL RESEARCH DIVISION SEMINAR SERIES

Topic: Examining the Confidentiality of Analytically Valid, Public-use Microdata

Abstract:

A public-use microdata file should be analytically valid. For a very small number of uses, the microdata should yield analytic results that are approximately the same as the original, confidential file that is not distributed. In particular, the public-use files should provide more than just means and covariances on a few important subdomains. This talk provides a structure for specifying the analytic validity of a public-use file. It describes record linkage (information retrieval) methods that are much more powerful than the use of combinations of indexes such as is given in the statistical confidentiality and computer database literature. If the microdata file contains a moderate number of variables and is required to meet a single set of analytic needs of, say, university researchers, then many more records are likely to be re-identified via modern record linkage methods than via indexing methods. The talk gives several empirical comparisons in terms of re-identification rates and the analytic validity of the public-use files. In particular, it considers methods of Kim (1986), Fuller (1993), Kim and Winkler (1995) and De Waal and Willenborg (1996).

This program is physically accessible to persons with disabilities. Requests for sign language interpretation or other auxiliary aids should be directed to Barbara Palumbo (SRD), (301) 457-4892 (v), (301) 457-3675 (TDD).

Topic: The Short- and Long-Term Economic Outlook: Inflation, Unemployment, and Demographics

Abstract:

Where is the U.S. economy headed over the next few months, few years, and few decades? At first blush, it appears we've reached macroeconomic nirvana, with low inflation, low unemployment, and moderate growth. Can we stay here? The central concern in the short-term outlook is when (or if) inflation, thought by many to be simmering, will boil over. But looking ahead more than a few years leads to concerns beyond inflation, and suggests that recent economic performance will be hard to maintain. How will the economy cope with the coming change in the age structure of the population? This talk gathers together evidence about these key issues and attempts to put them together in a consistent framework.


Topic: An Overview of the Dataplot Graphics & EDA Software System

Abstract:

Dataplot is a free, public-domain, multi-platform (Unix, Linux, PC-DOS, Windows NT, etc.) software system for scientific visualization, statistical analysis, and non-linear modeling. Its extensive analysis capabilities include raw graphics, exploratory data analysis (EDA) graphics, time series analysis, standard & robust smoothing, distributional "estimation", probability distributions (70), process control, reliability, and experiment design. In addition to Dataplot's English-syntax script language, it has a graphical user interface written in Tcl/Tk for use on both UNIX machines (current) and PC's (imminent).

The target Dataplot user is the researcher or analyst engaged in the characterization, modeling, visualization, analysis, monitoring, and optimization of scientific and engineering processes. The original version was released in 1978 and has been continually enhanced to the present. Dataplot is maintained by the Statistical Engineering Division of the National Institute of Standards & Technology. Dataplot is web-downloadable via:

http://www.itl.nist.gov/div898/software/dataplot.html

Topic: Meanings of Data Quality in Assessments of New Data Collection Technologies

Abstract:

This paper examines the various meanings of "data quality" in studies designed to assess the effects of new data collection technologies on survey data quality. It focuses specifically on the data quality effects of computer-assisted personal and telephone interviewing (CAPI and CATI) in comparison with paper-and-pencil (P&P) surveys of the same mode. The paper reviews the way various conceptions of data quality developed in the history of the field and summarizes empirical evidence supporting or disconfirming those conceptions.

The first meaning of "data quality" was from an operational perspective. It was primarily concerned with the consistency and completeness of interview data as received from the field, as measured at least in part by the post-interview correction burden. The second meaning viewed the intrinsic characteristics of computer-assisted interviewing (CAI) as improving survey data quality through such means as standardizing the interview process, reducing the interviewer's clerical burden, providing greater opportunities for quality control, and correcting discovered errors in the field rather than in the office. The third meaning defined data quality empirically by comparing CAI and comparable P&P treatments in split-sample studies. In general, when CAI methods were used to emulate P&P methods, their results also closely resembled those of P&P surveys. The fourth conception saw CAI as adding to survey data quality by expanding the power of specific survey methods to collect data of greater scope, accuracy, realism, and customer value. When survey professionals speak of CAI methods as enhancing survey data quality, they typically seem to be referring to operational and intrinsic forms of data quality, but they may also be encompassing its empirically demonstrated or expanded forms.

Topic: Visual Exploratory Data Analysis with MANET

Abstract:

MANET is research software developed at the Department for Computer Oriented Statistics and Data Analysis of the University of Augsburg in Germany. MANET offers visual exploration of data sets. It is based on the paradigm of linking low-dimensional views in a highly interactive environment. Three main features of MANET are the consistent treatment of missing data in visualization, the visual treatment of categorical data, and the link between graphic representations of geographic space and graphic representations of attribute space. In this talk, I show various features of MANET, such as mosaic plots, weighted plots, and spine plots, and their linkage to each other and/or to maps, in a series of examples.