Journal of survey statistics and methodology.
- Washington, DC : American Association for Public Opinion Research, September 2021.
- Pages 651-920 ; 23 cm.
- Vol. 9, No. 4.
Assessing Response Quality by Using Multivariate Control Charts for Numerical and Categorical Response Quality Indicators
Jiayun Jin and Geert Loosveldt
Journal of Survey Statistics and Methodology, Volume 9, Issue 4, September 2021, Pages 674-700, https://doi.org/10.1093/jssam/smaa012
Abstract: When assessing interview response quality to identify potentially low-quality interviews, both numerical and categorical response quality indicators (mixed indicators) are usually available. However, research on how to use them simultaneously is very rare. In the current article, we extend the application of conventional multivariate control charts to include response quality indicators that are of a mixed type. We analyze data from the eighth round of the European Social Survey in Belgium, characterized by six numerical and two categorical response quality indicators. First, we employ a principal component analysis mix procedure (PCA Mix) to transform the mixed quality indicators into principal components. The principal component scores are subsequently used to construct a Hotelling T² statistic. To deal with the non-multivariate normal nature of the principal component scores obtained from the PCA Mix, a nonparametric bootstrap method is then applied to calculate the control limit for the T² statistic. Second, we suggest tools to interpret an identified outlier in terms of finding the responsible original indicator(s). Third, we present a cyclic procedure for determining the "in-control" data by iteratively removing the outliers until the process is considered to be in control. Lastly, we identify the most important indicators that discriminate the outliers from the in-control data. Our results imply that multivariate control charts based on relevant projection tools such as PCA Mix, in combination with the bootstrap technique, have great potential for use in evaluating interview response quality and identifying outliers.

Bayes-Raking: Bayesian Finite Population Inference with Known Margins
Yajuan Si and Peigen Zhou
Journal of Survey Statistics and Methodology, Volume 9, Issue 4, September 2021, Pages 833-855, https://doi.org/10.1093/jssam/smaa008
Abstract: Raking is widely used for categorical data modeling and calibration in survey practice but faces methodological and computational challenges. We develop a Bayesian paradigm for raking by incorporating the marginal constraints as a prior distribution via two main strategies: (1) constructing solution subspaces via basis functions or the projection matrix and (2) modeling soft constraints. The proposed Bayes-raking estimation integrates the models for the margins, the sample selection and response mechanism, and the outcome as a systematic framework to propagate all sources of uncertainty. Computation is done via Stan, and codes are ready for public use. Simulation studies show that Bayes-raking can perform as well as raking with large samples and can outperform it in terms of validity and efficiency, especially with a sparse contingency table or dependent raking factors. We apply the new method to the Longitudinal Study of Well-being and demonstrate that model-based approaches significantly improve inferential reliability and substantive findings as a unified survey inference framework.
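To make the control-chart machinery in the Jin and Loosveldt abstract concrete, the following is a minimal Python sketch, assuming the PCA Mix step has already produced a matrix of principal component scores: it computes a Hotelling T² statistic per interview and sets the upper control limit by one common nonparametric bootstrap (pooling resampled T² values); the paper's exact bootstrap scheme may differ, and the data here are simulated.

    import numpy as np

    rng = np.random.default_rng(0)

    def hotelling_t2(scores):
        # T^2 of each observation around the sample mean, using the
        # pseudo-inverse of the sample covariance for numerical safety.
        centered = scores - scores.mean(axis=0)
        cov_inv = np.linalg.pinv(np.cov(scores, rowvar=False))
        return np.einsum("ij,jk,ik->i", centered, cov_inv, centered)

    def bootstrap_control_limit(scores, alpha=0.05, n_boot=500):
        # Pool T^2 values across bootstrap resamples of the interviews
        # and take the (1 - alpha) quantile as the upper control limit.
        n = scores.shape[0]
        pooled = np.concatenate([hotelling_t2(scores[rng.integers(0, n, n)])
                                 for _ in range(n_boot)])
        return np.quantile(pooled, 1 - alpha)

    # Toy data: 200 interviews scored on 4 principal components.
    scores = rng.normal(size=(200, 4))
    t2 = hotelling_t2(scores)
    limit = bootstrap_control_limit(scores)
    print("control limit:", round(limit, 2))
    print("flagged interviews:", np.flatnonzero(t2 > limit))

Interviews whose T² exceeds the limit would then be inspected with the interpretation tools the article proposes to trace the outlier back to the responsible original indicators.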
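The Bayes-raking article generalizes classical raking (iterative proportional fitting). As background only, and not the authors' Stan implementation, a minimal sketch of the classical two-way algorithm with illustrative numbers might look like this:

    import numpy as np

    def rake(table, row_margins, col_margins, tol=1e-8, max_iter=1000):
        # Classical iterative proportional fitting: rescale rows and
        # columns in turn until both sets of known margins are matched.
        w = table.astype(float).copy()
        for _ in range(max_iter):
            w *= (row_margins / w.sum(axis=1))[:, None]  # match row totals
            w *= (col_margins / w.sum(axis=0))[None, :]  # match column totals
            if np.allclose(w.sum(axis=1), row_margins, atol=tol):
                return w
        raise RuntimeError("raking did not converge")

    # Toy sample counts calibrated to known population margins.
    sample = np.array([[30., 20.],
                       [25., 25.]])
    weights = rake(sample,
                   row_margins=np.array([60., 40.]),
                   col_margins=np.array([55., 45.]))
    print(weights)

The Bayesian version described in the abstract replaces this deterministic rescaling with prior distributions on the marginal constraints, so that uncertainty in the margins, selection, response, and outcome models is propagated jointly.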
Comparing Methods for Assessing Reliability
Roger Tourangeau and others
Journal of Survey Statistics and Methodology, Volume 9, Issue 4, September 2021, Pages 651-673, https://doi.org/10.1093/jssam/smaa018
Abstract: The usual method for assessing the reliability of survey data has been to conduct reinterviews a short interval (such as one to two weeks) after an initial interview and to use these data to estimate relatively simple statistics, such as gross difference rates (GDRs). More sophisticated approaches have also been used to estimate reliability. These include estimates from multi-trait, multi-method experiments, models applied to longitudinal data, and latent class analyses. To our knowledge, no prior study has systematically compared these different methods for assessing reliability. The Population Assessment of Tobacco and Health Reliability and Validity (PATH-RV) Study, done on a national probability sample, assessed the reliability of answers to the Wave 4 questionnaire from the PATH Study. Respondents in the PATH-RV were interviewed twice, about two weeks apart. We examined whether the classic survey approach yielded different conclusions from the more sophisticated methods. We also examined two ex ante methods for assessing problems with survey questions, as well as item nonresponse rates and response times, to see how strongly these related to the different reliability estimates. We found that kappa was highly correlated with both GDRs and over-time correlations, but the latter two statistics were less highly correlated, particularly for adult respondents; estimates from longitudinal analyses of the same items in the main PATH Study were also highly correlated with the traditional reliability estimates. The latent class analysis results, based on fewer items, also showed a high level of agreement with the traditional measures. The other methods and indicators had at best weak relationships with the reliability estimates derived from the reinterview data. Although the Question Understanding Aid seems to tap a different factor from the other measures, for adult respondents it did predict item nonresponse and response latencies and thus may be a useful adjunct to the traditional measures.

Disentangling Interviewer and Area Effects in Large-Scale Educational Assessments Using Cross-Classified Multilevel Item Response Models
Theresa Rohm and others
Journal of Survey Statistics and Methodology, Volume 9, Issue 4, September 2021, Pages 722-744, https://doi.org/10.1093/jssam/smaa015
Abstract: In large-scale educational assessments, interviewers should ensure standardized settings for all participants. In practice, however, many interviewers do not strictly adhere to standardized field protocols. Therefore, systematic interviewer effects on the measurement of mathematical competence were examined in a representative sample of N = 5,139 German adults. To account for interviewers working in specific geographical regions, interviewer and area effects were disentangled using cross-classified multilevel item response models. These analyses showed that interviewer behavior distorted competence measurements, whereas regional effects were negligible. On a more general note, it is demonstrated how to identify conspicuous interviewer behavior with Bayesian multilevel models.
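The two "relatively simple statistics" the Tourangeau et al. abstract compares, the gross difference rate and kappa, are easy to state precisely. The sketch below is an illustration on made-up reinterview data, not the PATH-RV estimation code: the GDR is the share of cases whose answer changed between interview and reinterview, and kappa corrects the observed agreement rate for agreement expected by chance.

    import numpy as np

    def gross_difference_rate(t1, t2):
        # Proportion of cases classified differently in the two interviews.
        t1, t2 = np.asarray(t1), np.asarray(t2)
        return np.mean(t1 != t2)

    def cohens_kappa(t1, t2):
        # Chance-corrected agreement: (p_obs - p_exp) / (1 - p_exp),
        # where p_exp comes from the marginal category proportions.
        t1, t2 = np.asarray(t1), np.asarray(t2)
        cats = np.union1d(t1, t2)
        p_obs = np.mean(t1 == t2)
        p_exp = sum(np.mean(t1 == c) * np.mean(t2 == c) for c in cats)
        return (p_obs - p_exp) / (1 - p_exp)

    # Toy reinterview data: one yes/no item asked two weeks apart.
    wave1 = np.array([1, 1, 0, 0, 1, 0, 1, 1, 0, 1])
    wave2 = np.array([1, 0, 0, 0, 1, 0, 1, 1, 1, 1])
    print("GDR:", gross_difference_rate(wave1, wave2))
    print("kappa:", cohens_kappa(wave1, wave2))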
Finding a Flexible Hot-Deck Imputation Method for Multinomial Data
Rebecca Andridge and others
Journal of Survey Statistics and Methodology, Volume 9, Issue 4, September 2021, Pages 789-809, https://doi.org/10.1093/jssam/smaa005
Abstract: Detailed breakdowns on totals are often collected in surveys, such as a breakdown of total product sales by product type. These multinomial data are often sparsely reported, with wide variability in proportions across units. In addition, there are often true zeros that differ across units even within industry; for example, one establishment sells jeans but not shoes, and another sells shoes but not socks. It is quite common to have large fractions of missing data for these detailed items, even when totals are relatively completely observed. Hot-deck imputation, which fills in missing data with observed data values, is an attractive approach. The entire set of proportions can be simultaneously imputed to preserve multinomial distributions, and zero values can be imputed. However, it is not clear what variant of the hot deck is best. We describe a large set of "flavors" of the hot deck and compare them through simulation and by application to data from the 2012 Economic Census. We consider different ways to create the donor pool: choosing one nearest neighbor (NN), choosing from five NNs, or using all units as the donor pool. We also consider different ways to impute from the donor: directly impute the donor's vector of proportions or randomly draw from a multinomial distribution using this vector of proportions. We consider scenarios where a strong predictor of these multinomial distributions exists as well as when covariate information is weak.

Moving from Face-to-Face to a Web Panel: Impacts on Measurement Quality
Alexandru Cernat and Melanie Revilla
Journal of Survey Statistics and Methodology, Volume 9, Issue 4, September 2021, Pages 745-763, https://doi.org/10.1093/jssam/smaa007
Abstract: Time and cost pressures, the availability of alternative sources of data, and societal changes are leading to a move from traditional face-to-face surveys to web or mixed-mode data collection. While we know that there are mode differences between web and face-to-face (presence of an interviewer or not, type of stimuli, etc.), it is not clear to what extent these differences could threaten the comparability of data collected in face-to-face and web surveys. In this article, we investigate the differences in measurement quality between the European Social Survey (ESS) Round 8 and the CROss-National Online Survey (CRONOS) panel. We address three main research questions: (1) Do we observe differences in terms of measurement quality across face-to-face and web for the same people and questions? (2) Can we explain individual-level differences in data quality using respondents' characteristics? and (3) Does measurement equivalence (metric and scalar) hold across the ESS Round 8 and the CRONOS panel? The results suggest that: (1) in terms of data quality, the measurement mode effect between web and face-to-face as implemented in the ESS (i.e., using show cards) is not very large, (2) none of the variables considered consistently explain individual differences in mode effects, and (3) measurement equivalence often holds for the topics studied.
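One of the simplest hot-deck "flavors" the Andridge et al. abstract enumerates, a single nearest-neighbor donor whose whole vector of proportions is imputed directly, can be sketched as follows. This is a hedged toy illustration, not the authors' code: it assumes a single numeric covariate (here, total sales) is available to define the distance to the donor.

    import numpy as np

    def nn_hotdeck(proportions, covariate, missing):
        # Single nearest-neighbor hot deck: each nonrespondent inherits
        # the full vector of detail proportions from the closest
        # respondent on the covariate, which preserves the donor's
        # multinomial structure, including its true zeros.
        imputed = proportions.copy()
        donors = np.flatnonzero(~missing)
        for i in np.flatnonzero(missing):
            nearest = donors[np.argmin(np.abs(covariate[donors] - covariate[i]))]
            imputed[i] = proportions[nearest]
        return imputed

    # Toy data: 5 establishments, 3 product types; unit 2 is missing.
    props = np.array([[0.5, 0.5, 0.0],
                      [0.2, 0.3, 0.5],
                      [np.nan, np.nan, np.nan],
                      [0.0, 0.6, 0.4],
                      [0.9, 0.1, 0.0]])
    total_sales = np.array([100., 250., 240., 400., 80.])
    miss = np.isnan(props).any(axis=1)
    print(nn_hotdeck(props, total_sales, miss))

The stochastic variants in the article would instead draw the imputed detail counts from a multinomial distribution parameterized by the donor's proportions, or widen the donor pool to five nearest neighbors or to all units.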
Multiply Robust Bootstrap Variance Estimation in the Presence of Singly Imputed Survey Data
Sixia Chen and others
Journal of Survey Statistics and Methodology, Volume 9, Issue 4, September 2021, Pages 810-832, https://doi.org/10.1093/jssam/smaa004
Abstract: Item nonresponse in surveys is usually dealt with through single imputation. It is well known that treating the imputed values as if they were observed values may lead to serious underestimation of the variance of point estimators. In this article, we propose three pseudo-population bootstrap schemes for estimating the variance of imputed estimators obtained after applying a multiply robust imputation procedure. The proposed procedures can handle large sampling fractions and enjoy the multiple robustness property. Results from a simulation study suggest that the proposed methods perform well in terms of relative bias and coverage probability, for both population totals and quantiles.

Multivariate Logistic-Assisted Estimators of Totals from Clustered Survey Samples
Timothy L Kennel and Richard Valliant
Journal of Survey Statistics and Methodology, Volume 9, Issue 4, September 2021, Pages 856-890, https://doi.org/10.1093/jssam/smaa017
Abstract: Estimators based on linear models are the standard in finite population estimation. However, many items collected in surveys are better described by nonlinear models; these include variables that have binary, binomial, or multinomial distributions. We extend previous work on generalized difference, model-calibrated, and pseudo-empirical likelihood estimators to two-stage cluster sampling and derive their theoretical properties, with particular emphasis on multinomial data. We present asymptotic theory for both the point estimators of totals and their variance estimators. The alternatives are tested via simulation using artificial and real populations. The two real populations are one of educational institutions and degrees awarded and one of owned and rented housing units.

Sample Bias Related to Household Role
Marcin Hitczenko
Journal of Survey Statistics and Methodology, Volume 9, Issue 4, September 2021, Pages 891-918, https://doi.org/10.1093/jssam/smaa001
Abstract: This article develops a two-stage statistical analysis to identify and assess the effect of a sample bias associated with an individual's household role. Survey responses to questions about the respondent's role in household finances, and a sampling design in which some households have all members take the survey, enable the estimation of distributions for each individual's share of household responsibility. The methodology is applied to the 2017 Survey of Consumer Payment Choice. The distribution of responsibility shares among survey respondents suggests that the sampling procedure favors household members with higher levels of responsibility. A bootstrap analysis reveals that population mean estimates of monthly payment instrument use that do not account for this type of sample misrepresentation are likely biased for instruments often used to make household purchases. For checks and electronic payments, our analysis suggests that unadjusted estimates likely overstate true values by 10-20 percent.
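The pseudo-population idea underlying the Chen et al. variance schemes is simple to illustrate in its generic, complete-data form: replicate each sampled unit in proportion to its design weight, then repeatedly redraw samples of the original size from that pseudo-population and recompute the estimator. The sketch below shows only this generic idea under equal weights and simple random sampling; the article's three schemes extend it to handle multiply robust imputation and large sampling fractions.

    import numpy as np

    rng = np.random.default_rng(7)

    def pseudo_population_bootstrap_var(y, weights, n_boot=500):
        # Build the pseudo-population by replicating unit i round(w_i)
        # times, then bootstrap the weighted total by resampling n units
        # without replacement from the pseudo-population.
        reps = np.repeat(y, np.round(weights).astype(int))
        n = len(y)
        w = len(reps) / n  # design weight implied by the pseudo-population
        totals = np.array([w * rng.choice(reps, size=n, replace=False).sum()
                           for _ in range(n_boot)])
        return totals.var(ddof=1)

    # Toy sample of 50 units, each representing 20 population units.
    y = rng.gamma(2.0, 10.0, size=50)
    print(pseudo_population_bootstrap_var(y, np.full(50, 20.0)))

Because resampling is done without replacement from a finite pseudo-population, this construction keeps the finite population correction, which is why the approach remains valid at large sampling fractions.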
The Interviewer Contribution to Variability in Response Times in Face-to-Face Interview Surveys
Patrick Sturgis and others
Journal of Survey Statistics and Methodology, Volume 9, Issue 4, September 2021, Pages 701-721, https://doi.org/10.1093/jssam/smaa009
Abstract: Survey researchers have consistently found that interviewers make a small but systematic contribution to variability in response times. However, we know little about which interviewer characteristics lead to this effect. In this study, we address this gap in understanding by linking item-level response times from wave 3 of the UK Household Longitudinal Survey (UKHLS) to data from an independently conducted survey of interviewers. The linked data file contains over three million records and has a complex, hierarchical structure, with response latencies nested within respondents and questions, which are themselves nested within interviewers and areas. We propose the use of a cross-classified mixed-effects location scale model to allow for the decomposition of the joint effects on response times of interviewers, areas, questions, and respondents. We evaluate how interviewer demographic characteristics, personality, and attitudes to surveys and to interviewing affect the length of response latencies, and we present a new method for producing interviewer-specific intra-class correlations of response times. Hence, the study makes both methodological and substantive contributions to the investigation of response times.

Viewing Participation in Censuses and Surveys through the Lens of Lifestyle Segments
Mary H Mulry and others
Journal of Survey Statistics and Methodology, Volume 9, Issue 4, September 2021, Pages 764-788, https://doi.org/10.1093/jssam/smaa006
Abstract: As the 2020 US Census approaches, the preparations include tests of new methodologies for enumeration that have the potential to reduce cost and improve quality. The 2015 Census Test in Savannah, GA, included tests of Internet and mail response modes and of online delivery of social marketing communications focused on persuading the public to respond by Internet and mail. Merging data from the 2015 Census Test with external third-party lifestyle segments and the Census Bureau's new Low Response Score (LRS) produces a dataset suitable for studying relationships between census response, LRSs, and lifestyle segments. This paper uses the merged dataset to examine whether lifestyle segments can provide insight into hard-to-survey populations, their response behavior, and their interactions with social marketing communications. The article also includes analyses with nationwide data that support the broader application of using segmentation variables in self-response propensity models, and a discussion of potential applications of lifestyle segment information in tailored and targeted survey designs for hard-to-survey populations.
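The quantity at the heart of the Sturgis et al. abstract, the interviewer intra-class correlation of response times, measures what share of latency variance lies between interviewers. The sketch below is a much simpler stand-in than the cross-classified mixed-effects location scale model the article proposes: a classical one-way ANOVA estimator on log response times, using an average group size as a balanced-design approximation; all names and data are illustrative.

    import numpy as np

    rng = np.random.default_rng(3)

    def interviewer_icc(times, interviewer):
        # One-way ANOVA estimate of the ICC of log response times:
        # sigma2_between / (sigma2_between + sigma2_within).
        y = np.log(np.asarray(times, dtype=float))
        groups = [y[interviewer == j] for j in np.unique(interviewer)]
        k = len(groups)
        n_bar = np.mean([len(g) for g in groups])  # balanced approximation
        grand = y.mean()
        msb = sum(len(g) * (g.mean() - grand) ** 2 for g in groups) / (k - 1)
        msw = sum(((g - g.mean()) ** 2).sum() for g in groups) / (len(y) - k)
        sigma2_b = max((msb - msw) / n_bar, 0.0)
        return sigma2_b / (sigma2_b + msw)

    # Toy data: 300 response latencies spread over 10 interviewers.
    times = rng.lognormal(mean=1.0, sigma=0.5, size=300)
    interviewer = rng.integers(0, 10, size=300)
    print(interviewer_icc(times, interviewer))

The article's model goes well beyond this by letting the residual variance itself vary by interviewer (the "scale" part) while simultaneously crossing interviewer, area, question, and respondent effects.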
ISSN: 2325-0984