Program
Yves Rosseel (Ghent University, Belgium)
http://users.ugent.be/~yrossee...
Title: Structural Equation Modeling: models, software and stories
In the social sciences, structural equation modeling (SEM) is often considered to be the mother of all statistical modeling. It includes univariate and multivariate regression models, generalized linear mixed models, factor analysis, path analysis, item response theory, latent class analysis, and much more. SEM can also handle missing data, non-normal data, categorical data, multilevel data, longitudinal data, (in)equality constraints, and on a good day, SEM makes you a fresh cup of tea.
For several decades, software for structural equation modeling was exclusively commercial and/or closed-source. Today, several free and open-source alternatives are available. In this presentation, I will tell the story of the R package `lavaan`. How was it conceived? What were the original goals, and where do we stand today? And why is it not finished yet? As the story unfolds, I will highlight some aspects of software development that are often underexposed: the importance of software archaeology, the design of model syntax, the importance of numerical techniques, the curse of backwards compatibility, the temptation to use compiled code to speed things up, and the difficult choice between a monolithic and a modular approach.
Finally, I will talk about my experiences with useRs, discussion groups, community support and the lavaan ecosystem.
Norm Matloff (University of California at Davis, USA)
http://heather.cs.ucdavis.edu/...
Title: Parallel Computation in R: What We Want, and How We (Might) Get It
This era of Big Data has been a challenge for R users. First, there were issues with address space restrictions in R itself. Though this problem has been (mostly) solved, major issues remain in terms of performance, generalizability, convenience and possibly more than a bit of "hype." This talk will address these questions and assess the current overall status of parallel and distributed computing in R, especially the latter. We will survey existing packages, including their strengths and weaknesses, and raise questions (if not provide full answers) about what should be done.
Isabella Gollini (Birkbeck, University of London, UK)
http://www.bbk.ac.uk/ems/facul...
Title: R tools for the analysis of complex heterogeneous data
One of the important goals of modern statistics is to provide comprehensive and integrated inferential frameworks for data analysis (from exploratory analysis to prediction and visualisation). R is very flexible software for implementing these frameworks, and for this reason it is an excellent research and learning tool for end-users in both academia and industry.
I will describe my approach to the development of statistical models, efficient computational methods and user-friendly R packages for the analysis of complex heterogeneous data arising in various applications. I will show examples from catastrophe modelling (using the tailloss package), network models (using the lvm4net package), and spatial analysis (using the GWmodel package).
Mine Cetinkaya-Rundel (Duke University and RStudio, USA)
http://www2.stat.duke.edu/~mc301/
Title: Teaching data science to new useRs
Abstract: How can we effectively and efficiently teach statistical thinking and computation to students with little to no background in either? How can we equip them with the skills and tools for reasoning with various types of data and leave them wanting to learn more? In this talk we describe an introductory data science course that is our (working) answer to these questions. The course focuses on data acquisition and wrangling, exploratory data analysis, data visualization, and effective communication, and approaches statistics from a model-based, instead of an inference-based, perspective. A heavy emphasis is placed on a consistent syntax (with tools from the `tidyverse`), reproducibility (with R Markdown) and version control and collaboration (with git/GitHub). We help ease the learning curve by avoiding local installation and supplementing out-of-class learning with interactive tools (like `tutor` and DataCamp). By the end of the semester, teams of students work on fully reproducible data analysis projects on data they acquired, answering questions they care about. This talk will discuss in detail course structure, logistics, and pedagogical considerations, as well as give examples from the case studies used in the course. We will also share student feedback and an assessment of the success of the course in recruiting students to the statistical science major.
Ludwig Hothorn (Leibniz Universität Hannover, Germany)
https://www.biostat.uni-hannov...
Title: Dose-response analysis: considering dose both as qualitative factor and quantitative covariate, using R
Recent publications on dose-response analysis are fairly clearly divided into modelling (i.e., assuming dose as a quantitative covariate) and trend tests (i.e., assuming dose as a qualitative factor). Both approaches have advantages and disadvantages. What is missing is a joint approach. Three components are required:
i) a quasilinear regression approach, namely the maximum of arithmetic, ordinal and logarithmic dose metameter models according to Tukey et al. (1985)
ii) a contrast test for a maximum of Williams-type contrasts according to Bretz and Hothorn (2003)
iii) the multiple marginal models approach according to Pipper et al. (2012), which allows the distribution of the maximum of multiple GLMMs to be obtained.
This new versatile trend test provides three advantages:
1) it is powerful for almost any shape of the dose-response relationship (including sublinear and supralinear)
2) problem-related interpretability based on confidence limits of slopes and/or contrasts
3) broad applicability within the GLMM framework.
Using the R package `tukeytrend` (Schaarschmidt et al., 2017), case studies for multinomial vector comparisons, multiple binary endpoints, bivariate differently scaled endpoints and ANCOVA-adjusted dose-response data will be explained.
F. Bretz and L. Hothorn. Statistical analysis of monotone or non-monotone dose-response data from in vitro toxicological assays. ATLA-Altern Lab Anim, 31(Suppl. 1):81-96, June 2003. ISSN 0261-1929.
J. W. Tukey, J. L. Ciminera, and J. F. Heyse. Testing the statistical certainty of a response to increasing doses of a drug. Biometrics, 41(1):295-301, 1985. doi: 10.2307/2530666.
C. B. Pipper, C. Ritz, and H. Bisgaard. A versatile method for confirmatory evaluation of the effects of a covariate in multiple models. Journal of the Royal Statistical Society: Series C (Applied Statistics), 61:315-326, 2012. doi:
Uwe Ligges (TU Dortmund, Germany)
https://www.statistik.tu-dortm...
Title: 20 years of CRAN
Abstract: We will look at the history of CRAN, its success and its shortcomings. We try to answer some of the questions you have always had about CRAN:
- What and who is CRAN?
- Why are these CRAN maintainers so nitpicky about what appears to be some irrelevant Note in my package, and why do I have to check it prior to submission?
- How does CRAN exist physically?
- What are the technical solutions to keep a repository of more than 10000 packages running?
- How can we check that many packages on several flavours of R and platforms in a single day?
- What are the human resources that have been spent during the last years?
- How can package maintainers help keep the workload as small as possible?
- What are the necessary changes for the future we have to tackle in order to keep such a repository as successful as it is today?
- What will happen during the next 20 years of CRAN?