# Regression Tables w/ Many Subsets of Data in Long Form

Please turn the website to “light mode” to see the tables clearly!

Update (10/3/2022): The steps in this tutorial can be easily replicated with one simple function using my new package called Panelsummary. Please update to version 1.1 which will be available on CRAN shortly.

Update (2/15/2022) I have updated the steps in this tutorial to match the changes made in the new modelsummary 1.3. Please be sure to update to this version if you plan on following this tutorial!

Oftentimes, it is preferable to create a regression table that has many subsets of your data set. For instance, in my own work on fraternities and crime, I find it necessary to show three sets of results: effects on in the full sample, effects on the weekends, and effects on the weekdays. Moreover, I believe a great way to present this data is with a long table rather than a wide table. In particular, this means having a table with grouped rows—one grouping for each subset of data. Usually, these tables can only be created with manual editing since no out-out-box function in R can handle such a task at the moment. However, the modelsummary package can actually achieve this with some small data manipulation with dplyr. For reference, below is a regression table I’ll demonstrate how to make:

Table 1: Effect of Horse Power on MPG
Outcome: MPG
Low MPG Cars
Horse Power-0.036*-0.036*
(0.004)(0.004)
Num.Obs1818
Mid MPG Cars
Horse Power-0.067+-0.057
(0.006)(0.012)
Num.Obs88
High MPG Cars
Horse Power0.1410.141
(0.084)(0.084)
Num.Obs66
Fixed Effects
FE: GearXX
FE: AMX
Note:
Low MPG cars are below 20 miles-per-gallon.
Mid MPG cars are between 20 and 25 miles-per-gallon.
High MPG cars are above 25 miles-per-gallon.

Note that this table will look even better when rendered to a PDF.

Before we begin, load in the necessary packages:

## loading in necessary packages
library(fixest)
library(modelsummary)
library(tidyverse)
library(kableExtra)

## Step 1: The Models

I am going to estimate two separate models using the mtcars dataset that comes pre-loaded in R. The first model will be a linear regression of mpg on hp with gear fixed effects. The second model will be a linear regression of mpg on hp but with gear AND am fixed effects. I’ll split up the data into three subsets that I want to estimate these models on: cars with a low, mid, and high mpg. These models are not supposed to be insightful in any way, just a way to demonstrate the table-making process, so bear with me.

First, let’s create a new column that will split our data into our desired subsets:

## creating a new variable
cars <- mtcars %>%
mutate(mpg_levels = case_when(
between(mpg, 0, 20) ~ "Low MPG",
between(mpg, 20, 25) ~ "Mid MPG",
between(mpg, 26, 50) ~ "High MPG"
)) 

Now let’s estimate our regressions. Use your favorite regression package, although I’ll use fixest here:

## Estimating both models on low mpg cars
low_mpg <- cars %>%
filter(mpg_levels == "Low MPG") %>%
feols(mpg ~ hp |gear , data = .)
low_mpg_2 <- cars %>%
filter(mpg_levels == "Low MPG") %>%
feols(mpg ~ hp |gear + am , data = .)

## Estimating both models on mid mpg cars
mid_mpg <- cars %>%
filter(mpg_levels == "Mid MPG") %>%
feols(mpg ~ hp |gear , data = .)
mid_mpg_2 <- cars %>%
filter(mpg_levels == "Mid MPG") %>%
feols(mpg ~ hp |gear + am, data = .)

## Estimating both models on high mpg cars
high_mpg <- cars %>%
filter(mpg_levels == "High MPG") %>%
feols(mpg ~ hp |gear , data = .)
high_mpg_2 <- cars %>%
filter(mpg_levels == "High MPG") %>%
feols(mpg ~ hp |gear + am , data = .)

Now, I’m going to save each of these groups of models into a list:

## saving the low mpg models into a list
low <- list(low_mpg, low_mpg_2)

## saving the mid mpg models into a list
mid <- list(mid_mpg, mid_mpg_2)

## saving the high mpg models into a list
high <- list(high_mpg, high_mpg_2)

## Step 2: Pass these into modelsummary

The next step is to pass these into the modelsummary function, but with a slight twist. In particular, we are going to use the output = "data.frame" argument to create a dataframe of our models. But first, I want to note that I am going to omit all of the goodness-of-fit statistics that modelsummary spits out except the number of observations. To do this, I am going to use the gof_map argument. Take a look at the modelsummary website for how to use this, or you can check out this stackoverflow conversation for a minimal reproducible example since I will not go into details here. The main takeaway for these next few lines of code is this: I am creating “cleaned” versions of the goodness-of-fit statistics that I want- the first is for the first two lists of models (low and mid), while the second is for the last list of models (high). I do this because I want the table to only have fixed effects at the bottom of the table, yet I want to have the number of observations within the table.

## creating cleaned goodness-of-fit mappings for the low and mid models
gm_first <- tribble(~raw, ~clean, ~fmt,
"nobs", "Num.Obs", 0)

## creating cleaned goodness-of-fit mappings for the high models
gm_last <- tribble(~raw, ~clean, ~fmt,
"nobs", "Num.Obs", 0,
"FE: gear", "FE: Gear", 0,
"FE: am", "FE: AM", 0)

Next, I will pass in the low, mid, and high lists into modelsummary, and save the output as a dataframe.

## saving low models as a data frame - doing some additional presentation cleaning
low_table <- modelsummary(low, stars = T,
coef_map = c("hp" = "Horse Power"), output =
"data.frame",
gof_map = gm_first)  %>%
mutate(term = ifelse(statistic == "std.error", "", term)) %>%
select(matches("term|^\\("))

## saving mid models as a data frame - doing some additional presentation cleaning
mid_table <- modelsummary(mid, stars = T,
coef_map = c("hp" = "Horse Power"), output =
"data.frame",
gof_map = gm_first)  %>%
mutate(term = ifelse(statistic == "std.error", "", term)) %>%
select(matches("term|^\\("))

## saving high models as a data frame - doing some additional presentation cleaning
## BE AWARE I PASSED IN GM_SECOND to the GOF_MAP arugment here!!!!
high_table <- modelsummary(high, stars = T,
coef_map = c("hp" = "Horse Power"), output =
"data.frame",
gof_map = gm_last)  %>%
mutate(term = ifelse(statistic == "std.error", "", term)) %>%
select(matches("term|^\\("))

This created 3 separate dataframes with the information we want. Note that I used the gm_last for the final high_table dataframe. Now, let’s append these dataframes together. WARNING: Please be advised that the order *does** matter here since we want the fixed effects to be displayed last!!

## binding together the rows - order matters!!
table <- bind_rows(low_table, mid_table, high_table)

## Step 3: Use kableExtra to customize

Now that you have your final dataframe table, you can pass this into kableExtra::kbl and begin customizing the table to your liking. I won’t go into details, but the kableExtra vignette is very detailed and excellent.

table %>%
kbl(col.names = c(" ", "(1)", "(2)"), booktabs = T,
caption  = "Effect of Horse Power on MPG") %>%  ## cleaning columnnames and adding caption
kable_classic(full_width = F, html_font = "Cambria") %>%  ## creating a latex-like output style
pack_rows("Low MPG Cars", 1, 3, bold = F, italic = T) %>% ## grouping rows
pack_rows("Mid MPG Cars", 4, 6, bold = F, italic = T) %>% ## grouping rows
pack_rows("High MPG Cars", 7,9, bold = F, italic = T) %>% ## grouping rows
pack_rows("Fixed Effects", 10, 11, bold = F, italic = T) %>%  ## grouping rows
footnote(list("Low MPG cars are below 20 miles-per-gallon.",
"Mid MPG cars are between 20 and 25 miles-per-gallon.",
"High MPG cars are above 25 miles-per-gallon."))  ## adding footnotes
Table 2: Effect of Horse Power on MPG
Outcome: MPG
Low MPG Cars
Horse Power-0.036*-0.036*
(0.004)(0.004)
Num.Obs1818
Mid MPG Cars
Horse Power-0.067+-0.057
(0.006)(0.012)
Num.Obs88
High MPG Cars
Horse Power0.1410.141
(0.084)(0.084)
Num.Obs66
Fixed Effects
FE: GearXX
FE: AMX
Note:
Low MPG cars are below 20 miles-per-gallon.
Mid MPG cars are between 20 and 25 miles-per-gallon.
High MPG cars are above 25 miles-per-gallon.

And there you have it! If you know you are going to be using a similar-style table for a project, I would recommend writing a function that can handle step 2 automatically since the coding can get a little tedious. Feel free to email me with any questions at ! ##### Michael Topper
###### PhD Candidate in Economics

Economics Student UCSB