R functions to organize linear model output

Functions described below:

create_lm_table() / source code
create_lm_summary() / source code

These functions create a publication-ready table summarizing a linear model fit. Use them if you have fit a linear model in R with the lm function and would like to compile the results into a concise set of metrics. To illustrate how they work, I will use the iris dataset that ships with R:
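The preview of the data can be reproduced with `head()`, which returns the first six rows of a data frame by default. The variable name `preview` is only for illustration:

```r
# iris is available in every R session via the built-in datasets package
preview <- head(iris)
preview
```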

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa

These functions have a few dependencies, which I load below:

library(dplyr)
library(weights)
library(stringr)
library(tidyr)

Consider the case of a multiple linear regression analysis with a single group. For the iris dataset, I examine whether petal length and/or petal width are associated with sepal length:

lm.overall <- lm(Sepal.Length ~ Petal.Length + Petal.Width, data = iris)

The call to create_lm_table() below generates a dataframe displaying the coefficients, their standard errors and confidence intervals, and the associated t- and p-values. The first argument, lm.object, is the output of the call to the lm function. The second argument, include.intercept = 1, indicates that I would like a row reporting these values for the intercept term. The third argument, vector.of.term.names, lets me customize the names of the terms displayed in the first column of the dataframe.

create_lm_table(lm.object = lm.overall, 
                include.intercept = 1, 
                vector.of.term.names = c('Intercept', 'Petal Length', 'Petal Width'))

##      Predictor  Beta    SE         95% CI     t     p
## 1    Intercept 4.191 0.097   4.382, 3.999 43.18 <.001
## 2 Petal Length 0.542 0.069   0.679, 0.405  7.82 <.001
## 3  Petal Width -0.32  0.16 -0.002, -0.637 -1.99  .048
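To make clear where these numbers come from, here is a minimal base R sketch that pulls the same quantities out of a fitted model. It is not the source of create_lm_table(); the name `sketch_table` and the column layout are assumptions made for illustration, and p-values are left unformatted.

```r
# Hypothetical sketch, NOT the source of create_lm_table():
# gather coefficient-level statistics with base R accessors.
lm.overall <- lm(Sepal.Length ~ Petal.Length + Petal.Width, data = iris)

coefs <- summary(lm.overall)$coefficients   # Estimate, Std. Error, t value, Pr(>|t|)
ci    <- confint(lm.overall, level = 0.95)  # column 1 = lower bound, column 2 = upper bound

sketch_table <- data.frame(
  Predictor = c('Intercept', 'Petal Length', 'Petal Width'),
  Beta      = round(coefs[, 'Estimate'], 3),
  SE        = round(coefs[, 'Std. Error'], 3),
  CI.lower  = round(ci[, 1], 3),
  CI.upper  = round(ci[, 2], 3),
  t         = round(coefs[, 't value'], 2),
  p         = coefs[, 'Pr(>|t|)'],
  row.names = NULL                          # use 1, 2, 3 instead of term names
)
sketch_table
```

Note that this sketch lists the lower confidence bound before the upper one, whereas the 95% CI column printed above lists the larger value first.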

The call to create_lm_summary() below generates a string summarizing the fit of the regression model: the \(R^2\) and adjusted \(R^2\) values, and the F statistic for the model fit along with its degrees of freedom and p-value. This function takes only one argument, the output of the call to the lm function:

create_lm_summary(lm.object = lm.overall)

## [1] "F(2, 147) = 240.95, p < .001, R^2 = 0.766, adjusted R^2 = 0.763"
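The pieces of this string are all available from `summary()` of the fitted model. The sketch below shows one way such a string could be assembled; it is an assumption about the approach rather than the actual body of create_lm_summary(), and `sketch_summary` and `p.text` are illustrative names.

```r
# Hypothetical sketch, NOT the source of create_lm_summary():
# build a model-fit string from summary.lm components.
lm.overall <- lm(Sepal.Length ~ Petal.Length + Petal.Width, data = iris)
s <- summary(lm.overall)

f <- s$fstatistic  # named vector with elements value, numdf, dendf
p.value <- pf(f[['value']], f[['numdf']], f[['dendf']], lower.tail = FALSE)

# Report very small p-values as "< .001"; otherwise drop the leading zero
p.text <- if (p.value < .001) {
  '< .001'
} else {
  paste0('= ', sub('^0', '', sprintf('%.3f', p.value)))
}

sketch_summary <- sprintf(
  'F(%d, %d) = %.2f, p %s, R^2 = %.3f, adjusted R^2 = %.3f',
  as.integer(f[['numdf']]), as.integer(f[['dendf']]), f[['value']],
  p.text, s$r.squared, s$adj.r.squared
)
sketch_summary
```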

Finally, I use the kableExtra package to combine these outputs into a single, publication-ready table:

library(kableExtra)
table_overall <- create_lm_table(lm.object = lm.overall, 
                                 include.intercept = 1, 
                                 vector.of.term.names = c('Intercept', 'Petal Length', 'Petal Width'))
kable(table_overall) %>%
  kable_styling(bootstrap_options = 'striped',
                full_width = FALSE,
                position = 'left') %>%
  # group all rows of the table under the model summary string
  pack_rows(create_lm_summary(lm.object = lm.overall),
            1, nrow(table_overall))

Predictor	Beta	SE	95% CI	t	p
F(2, 147) = 240.95, p < .001, R^2 = 0.766, adjusted R^2 = 0.763
Intercept	4.191	0.097	4.382, 3.999	43.18	<.001
Petal Length	0.542	0.069	0.679, 0.405	7.82	<.001
Petal Width	-0.32	0.16	-0.002, -0.637	-1.99	.048

Next, I consider the case of a regression analysis with multiple groups, where I might want to create a table in which the coefficients and related statistics for each group are stacked on top of one another. I use the iris dataset again, this time regressing sepal length on petal length and petal width separately for each iris species:

lm.setosa <- lm(Sepal.Length ~ Petal.Length + Petal.Width, data = iris %>% filter(Species == 'setosa'))
lm.versicolor <- lm(Sepal.Length ~ Petal.Length + Petal.Width, data = iris %>% filter(Species == 'versicolor'))
lm.virginica <- lm(Sepal.Length ~ Petal.Length + Petal.Width, data = iris %>% filter(Species == 'virginica'))
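When the number of groups grows, writing one lm call per group becomes repetitive. As an alternative sketch using only base R, the same three models can be fit in one pass by splitting the data on Species; the list name `models` is an assumption for illustration.

```r
# Fit the same formula once per species and keep the fits in a named list
models <- lapply(split(iris, iris$Species), function(d) {
  lm(Sepal.Length ~ Petal.Length + Petal.Width, data = d)
})

names(models)         # one entry per species
coef(models$setosa)   # same coefficients as lm.setosa above
```

Each element of `models` is an ordinary lm object, so it can be passed to create_lm_table() and create_lm_summary() in the same way as lm.setosa, lm.versicolor, and lm.virginica.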

I consolidate the steps above to create a single, composite table summarizing the output of all three model fits:

table_setosa <- create_lm_table(lm.object = lm.setosa, 
                                 include.intercept = 1, 
                                 vector.of.term.names = c('Intercept', 'Petal Length', 'Petal Width'))
table_versicolor <- create_lm_table(lm.object = lm.versicolor, 
                                 include.intercept = 1, 
                                 vector.of.term.names = c('Intercept', 'Petal Length', 'Petal Width'))
table_virginica <- create_lm_table(lm.object = lm.virginica, 
                                 include.intercept = 1, 
                                 vector.of.term.names = c('Intercept', 'Petal Length', 'Petal Width'))

kable(as.data.frame(rbind(table_setosa,
                          table_versicolor,
                          table_virginica))) %>%
  kable_styling(full_width = F, 'striped', position = 'left') %>%
  pack_rows(str_c('Setosa, ', create_lm_summary(lm.object = lm.setosa), sep = ''), 1, nrow(table_setosa)) %>%
  pack_rows(str_c('Versicolor, ', create_lm_summary(lm.object = lm.versicolor), sep = ''), nrow(table_setosa)+1, nrow(table_setosa)+nrow(table_versicolor)) %>%
  pack_rows(str_c('Virginica, ', create_lm_summary(lm.object = lm.virginica), sep = ''), nrow(table_setosa)+nrow(table_versicolor)+1, nrow(table_setosa)+nrow(table_versicolor)+nrow(table_virginica))

Predictor	Beta	SE	95% CI	t	p
Setosa, F(2, 47) = 2.96, p = , R^2 = 0.112, adjusted R^2 = 0.074
Intercept	4.248	0.411	5.075, 3.42	10.32	<.001
Petal Length	0.399	0.296	0.994, -0.196	1.35	.184
Petal Width	0.712	0.487	1.693, -0.268	1.46	.151
Versicolor, F(2, 47) = 31.71, p < .001, R^2 = 0.574, adjusted R^2 = 0.556
Intercept	2.381	0.449	3.284, 1.477	5.3	<.001
Petal Length	0.934	0.169	1.275, 0.594	5.52	<.001
Petal Width	-0.32	0.402	0.489, -1.129	-0.8	.430
Virginica, F(2, 47) = 69.35, p < .001, R^2 = 0.747, adjusted R^2 = 0.736
Intercept	1.052	0.514	2.085, 0.018	2.05	.046
Petal Length	0.995	0.089	1.174, 0.815	11.14	<.001
Petal Width	0.007	0.179	0.368, -0.354	0.04	.969
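The start and end rows passed to pack_rows() above are running totals of the per-group row counts. A small sketch of that arithmetic, assuming three rows per group as in this example (`n.rows`, `start.row`, and `end.row` are illustrative names):

```r
# Rows contributed by each group; in practice these would be
# nrow(table_setosa), nrow(table_versicolor), nrow(table_virginica)
n.rows <- c(Setosa = 3, Versicolor = 3, Virginica = 3)

end.row   <- cumsum(n.rows)          # last row of each group
start.row <- end.row - n.rows + 1    # first row of each group

data.frame(group = names(n.rows), start.row, end.row, row.names = NULL)
```

pack_rows() in kableExtra also accepts an index argument, a named vector of row counts such as `n.rows`, which may be a simpler way to express the same grouping.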

shelby bachman