Interpretation templates
Interpretation structure
This class will ask you to interpret specific analysis tables using specific variables.
Here's the order you need to follow:
Copy (ctrl or command + C) the variable name from the assignment document.
Open "Variables in GSS" document.
Open the search function on the page (ctrl or command + F).
Paste the variable name (ctrl or command + V).
Read the "full wording of the question", "response set", and "what it measures" columns for the variable, and make sure you understand what the variable is about.
Find the analysis type below (maximize your browser window to display the outline on the right side for easier navigation).
You will see a sample table, a sample interpretation, and an interpretation template.
Check the sample table and read the sample interpretation.
Copy the interpretation template, paste it into your assignment document, and replace the bracketed parts.
Use the "what it measures" column when interpreting analyses. Do not use variable names or the text that appears at the top of the table in the interpretation.
After the "what it measures" column, add the word "variable" in your interpretation.
Use the "response set" or "labels" (as they appear on the analysis tables) in your interpretation.
Depending on the variable, you need to tweak some parts of the interpretation. For example, "15.4% of the respondents are/have/feel/think/said/reported" etc.
When the interpretation is completed, read it aloud and make sure it makes sense. Ensure that your interpretations are clear enough for someone to understand your report without referring to the tables.
Example 1:

Correct: The perceived discrimination at work because of age variable shows that 7.93% of the respondents feel discriminated against at work because of age and 92.07% do not.
Correct: The perceived discrimination at work because of age variable shows that 7.93% of the respondents said yes to being discriminated against because of age and 92.07% said no.
Wrong: "The wkageism shows that 7.93% of the respondents reported that they feel discriminated and 92.07% do not."
Do not use variable names in the interpretation. Variable names are meant for coding purposes. There's no word called "wkageism." No one would understand what you mean.
Wrong: "The r feels discriminated because of age shows that 7.93% of the respondents reported that they feel discriminated and 92.07% do not."
Do not use the text that appears at the top of the table. No one would understand what you mean by "r" here.
Example 2:

Correct: The internet use in hours variable shows that the average hours the respondents use the internet is 15.51, with a standard deviation of 19.48.
Wrong: "The wwwhr shows the average hours that the respondents use the internet is 15.51, with standard deviation 19.48."
Do not use variable names in the interpretation. Variable names are meant for coding purposes. There's no word called "wwwhr." No one would understand what you mean.
Wrong: "The dd shows the average hours that the respondents use the internet is 15.51, with standard deviation 19.48."
Do not use the text that appears in the table. No one would understand what you mean by "dd" here.
Descriptive statistics
Frequency tables
Frequency tables are for categorical variables.
frq(gss$marital, out = "v")  # replace "marital" with your categorical variable

The [what it measures column] variable shows that xx.xx% of the respondents are / have / feel / think / said / reported [label 1], xx.xx% of the respondents are / have / feel / said / reported [label 2]...
Slides: descriptive statistics
Descriptive tables 1
Descriptive tables are for continuous variables.
descr(gss$educ, out = "v", show = "short")  # replace "educ" with your continuous variable

The [what it measures column] variable shows that the average [what it measures column] of the respondents is [mean], with a standard deviation of [SD].
Slides: descriptive statistics
Descriptive tables 2 (for computed variables)
descr(gss$hapindex, out = "v", show = "short")
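Note: hapindex is an index variable assumed to have been computed earlier in the course. A minimal sketch of how such an index might be built (the item names below are hypothetical; use the items from your assignment):
# sum three 0/1 items into an index that runs from 0 to 3
gss$hapindex <- gss$hapitem1 + gss$hapitem2 + gss$hapitem3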

Indicate the highest possible score in your interpretation ➜ "Out of 3", "Out of 5", etc.
Indicate the full name of the index variable:
The average [full name of the index variable] score of the respondents is [mean] out of 3/5/7/10, with a standard deviation of [SD].
Chi-square
Chi-square (example 1): insignificant p-value
We use crosstabs for chi-square analysis, which tests whether there is a relationship between two categorical variables (we check the p-value). We consider a result statistically significant when p < 0.05. The chi-square test can only compare categorical variables.
sjt.xtab(gss$sex, gss$health, show.row.prc = TRUE)
Independent variable first (sex), dependent variable second (health)

[What it measures column of the independent variable] has no effect on [what it measures column of the dependent variable] since the p-value is higher than 0.05. We can conclude that [label 1 of the independent variable] and [label 2 of the independent variable]... have/are/feel... similar [what it measures column of the dependent variable].
Slides: chisquare
Chi-square (example 2): significant p-value
We use crosstabs for chi-square analysis, which tests whether there is a relationship between two categorical variables (we check the p-value). We consider a result statistically significant when p < 0.05. The chi-square test can only compare categorical variables.
gss$agegroups <- rec(gss$age, rec =
"18:39=1 [18-39 age group];
40:59=2 [40-59 age group];
60:89=3 [60-89 age group]", append = FALSE)
sjt.xtab(gss$agegroups, gss$health, show.row.prc = TRUE)
Independent variable first (agegroups), dependent variable second (health)

[What it measures column of the independent variable] has an effect on [what it measures column of the dependent variable] since the p-value is less than 0.05. We can conclude that [label 1 of the independent variable] and [label 2 of the independent variable]... have/are/feel... substantially different [what it measures column of the dependent variable].
Slides: chisquare
T-test
T-test (example 1): insignificant p-value
A t-test is used to determine whether there is a significant difference between the means of two groups; the scores being compared must be continuous.
t.test(educ ~ sex, data = gss) %>%
parameters() %>%
display(format="html")
Dependent variable first (educ), independent variable second (sex)

[What it measures column of the dependent variable] of [label 1 of the independent variable] is [mean] year/dollar/point/score, while [What it measures column of the dependent variable] of [label 2 of the independent variable] is [mean] year/dollar/point/score. [What it measures column of the dependent variable] does not differ by [What it measures column of the independent variable] in a statistically significant way since the p-value is higher than 0.05.
Slides: ttest
T-test (example 2): significant p-value
A t-test is used to determine whether there is a significant difference between the means of two groups; the scores being compared must be continuous.
t.test(conrinc ~ sex, data = gss) %>%
parameters() %>%
display(format="html")
Dependent variable first (conrinc), independent variable second (sex)

[What it measures column of the dependent variable] of [label 1 of the independent variable] is [mean] year/dollar/point/score, while [What it measures column of the dependent variable] of [label 2 of the independent variable] is [mean] year/dollar/point/score. [What it measures column of the dependent variable] differs by [What it measures column of the independent variable] in a statistically significant way since the p-value is less than 0.05.
Slides: ttest
Visualization
Bar graph (for categorical variables)
plot_frq(gss$marital, type = "bar", geom.colors = "#336699")

The [what it measures column] variable shows that xx.xx% of the respondents are / have / feel / said / reported [label 1], xx.xx% of the respondents are / have / feel / said / reported [label 2]...
Slides: visualization
Histogram (for continuous variables)
plot_frq(gss$educ, type = "hist", show.mean = TRUE, show.mean.val = TRUE, normal.curve = TRUE, show.sd = TRUE, normal.curve.color = "red")

The [what it measures column] variable shows that the average [what it measures column] of the respondents is [mean], with a standard deviation of [SD].
Slides: visualization
Stacked bar graphs for multiple variables
graph <- gss %>%
select(conbus, coneduc, confed, conmedic, conarmy, conjudge) %>%
plot_stackfrq(sort.frq = "first.asc", coord.flip = TRUE, geom.colors = "Blues", show.total = FALSE,
title = "Confidence in major US institutions")
# the second part of the code is to change font sizes
graph + theme(
axis.text.x = element_text(size=14), # change font size of x-axis labels
axis.text.y = element_text(size=14), # change font size of y-axis labels
plot.title=element_text(size=20), # change font size of plot title
legend.text = element_text(size=14)) # change font size of legend

Of the GSS respondents, xx.xx% are/have/feel/report [label 1]; xx.xx% are/have/feel/report [label 2]; xx.xx% are/have/feel/report [label 3]...
Slides: visualization
Stacked bar graphs by different groups
plot_xtab(gss$agegroups, gss$conmedic, show.total = FALSE, show.n = FALSE)
Independent variable first (agegroups), dependent variable second (conmedic)

xx.xx% of the [label 1 of the independent variable], xx.xx% of the [label 2 of the independent variable], xx.xx% of the [label 3 of the independent variable]... are/have/feel/report [label and what it measures of the dependent variable].
Slides: visualization
Correlation analyses
Correlation analysis structure
Correlation analysis examines the linear relationship between two continuous variables.
If the p-value is statistically significant (< 0.05):
|r| less than 0.3 … weak correlation
|r| between 0.3 and 0.5 … moderate correlation
|r| greater than 0.5 … strong correlation
The order of the variables does not matter.
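To double-check a single pair before building the table, base R's cor.test() reports both the r-value and the p-value (a quick sketch, not part of the interpretation templates):
# Pearson correlation between two continuous variables; incomplete pairs are dropped automatically
cor.test(gss$tvhours, gss$usetech)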
(1a) Correlation analysis table (significant p-value, positive correlation)
tab_corr (gss[, c("sei10", "spsei10")],
wrap.labels = 30, p.numeric = TRUE, triangle="lower", na.deletion = "pairwise")
Read "correlation analysis structure" above first.

There is a significant correlation between [what it measures column of variable 1] and [what it measures column of variable 2] since the p-value is less than .05.
This correlation is positive and weak since the r-value is 0.xxx (less than 0.3).
OR This correlation is positive and moderate since the r-value is 0.xxx (between 0.3 and 0.5).
OR This correlation is positive and strong since the r-value is 0.xxx (greater than 0.5).
This means that [what it measures column of variable 1] and [what it measures column of variable 2] increase and decrease together.
Slides: correlation
(1b) Correlation analysis table (significant p-value, negative correlation)
tab_corr (gss[, c("tvhours", "usetech")],
wrap.labels = 30, p.numeric = TRUE, triangle="lower", na.deletion = "pairwise")
Read "correlation analysis structure" above first.

There is a significant correlation between [what it measures column of variable 1] and [what it measures column of variable 2] since the p-value is less than .05.
This correlation is negative and weak since the r-value is -0.xxx (absolute value less than 0.3).
OR This correlation is negative and moderate since the r-value is -0.xxx (absolute value between 0.3 and 0.5).
OR This correlation is negative and strong since the r-value is -0.xxx (absolute value greater than 0.5).
This means that as [what it measures column of variable 1] increases [what it measures column of variable 2] decreases, and vice versa.
(1c) Correlation analysis table (insignificant p-value)
tab_corr (gss[, c("age", "educ")],
wrap.labels = 30, p.numeric = TRUE, triangle="lower", na.deletion = "pairwise")

There is no significant correlation between [what it measures column of variable 1] and [what it measures column of variable 2] since the p-value is higher than .05.
This means that [what it measures column of variable 1] and [what it measures column of variable 2] do not increase and decrease together.
(2) Correlation scatterplot graph
xlab: "what it measures column" of variable 1 (x)
ylab: "what it measures column" of variable 2 (y)
scatterplot <- ggscatter(gss, x = "tvhours", y = "usetech",
add = "loess", conf.int = TRUE, color = "black", point=F,
xlab = "Television screen time", ylab = "Percentage of time use at work using electronic technologies")
scatterplot + stat_cor(p.accuracy = 0.001, r.accuracy = 0.01)

Slides: correlation
(3) Correlation matrix
A correlation matrix examines the linear relationships among multiple continuous variables.
tab_corr (gss[, c("sei10", "spsei10", "tvhours", "usetech", "age", "educ", "marasiannew", "marhispnew")],
wrap.labels = 30, p.numeric = TRUE, triangle="lower", na.deletion = "pairwise")

Slides: correlation
(4) Scatterplot matrix
A scatterplot matrix examines the linear relationships among multiple continuous variables.
pairs.panels(gss[, c("sei10", "spsei10", "tvhours", "usetech", "age", "educ", "marasiannew", "marhispnew")],
ellipses = FALSE, scale = FALSE, show.points = FALSE, stars = TRUE, ci = TRUE)

Slides: correlation
(5) Correlogram
A correlogram examines the linear relationships among multiple continuous variables.
selectedvariables <- c("sei10", "spsei10", "tvhours", "usetech", "age", "educ", "marasiannew", "marhispnew")
testRes <- cor.mtest(gss[, selectedvariables])
gssrcorr <- rcorr(as.matrix(gss[, selectedvariables]))
gsscoeff <- gssrcorr$r
corrplot(gsscoeff, p.mat = testRes$p, method = 'pie', type = 'lower', insig='blank',
addCoef.col = 'black', order = 'original', diag = FALSE)$corrPos

Slides: correlation
Linear regression analysis
In regression, we explain the effects of independent variables on a dependent variable by estimating how changes in the independent variables are associated with changes in the dependent variable. Unlike correlation analysis, regression distinguishes between dependent and independent variables, so it can be used to assess the direction and strength of a potential causal relationship.
model4 <- lm(conrinc ~ god + age + physhlth + educ, data = gss)
tab_model(model4, show.std = T, show.ci = F, collapse.se = T, p.style = "stars")
Dependent variable (conrinc) first, followed by the independent variables separated by a plus (+).

Linear regression analysis interpretation breakdown
First paragraph: [The significance levels] Mention which variables (“what it measures”) are statistically significant (if any), and which variables are statistically insignificant (if any). Variables with at least one asterisk (*) are statistically significant.
Respondents’ age, days of poor physical health past 30 days, and respondents' education in years are statistically significant predictors of respondents’ personal income since the p values are less than 0.05. Respondent's confidence in the existence of God is not a statistically significant predictor of respondents’ personal income since the p value is greater than 0.05.
[What it measures column of significant independent variable 1], [what it measures column of significant independent variable 2], and [what it measures column of significant independent variable 3]... are statistically significant predictors of [what it measures column of the dependent variable] since the p-values are less than 0.05. [What it measures column of insignificant independent variable 1], [what it measures column of insignificant independent variable 2]... are not statistically significant predictors of [what it measures column of the dependent variable] since the p-values are greater than 0.05.
Second paragraph: [The explanation of coefficients (Estimates column)] Mention how significant independent variables increase or decrease the value of the dependent variable, using the “Estimates” column. When reporting the estimates (coefficients), ensure that the sentence includes the units (one unit, a day, a score, a year, a dollar, etc.) of both the independent and the dependent variable.
A year increase in respondents’ age increases respondents’ personal income by $504. A day increase in poor physical health past 30 days decreases respondents’ personal income by $857. A year increase in the respondents’ education increases respondents’ personal income by $4,845.
A [unit/day/score/year/dollar] increase in [what it measures column of significant independent variable 1] increases/decreases [what it measures column of the dependent variable] by [estimates column with the unit of analysis]. A [unit/day/score/year/dollar] increase in [what it measures column of significant independent variable 2] increases/decreases [what it measures column of the dependent variable] by [estimates column with the unit of analysis]. A [unit/day/score/year/dollar] increase in [what it measures column of significant independent variable 3] increases/decreases [what it measures column of the dependent variable] by [estimates column with the unit of analysis]....
Third paragraph: [The explanation of standardized betas (std.Beta column)] Mention the strongest predictors (variables) of the dependent variable using the "std.Beta" (standardized beta) column, in order. Only mention the statistically significant ones. Compare std.Beta values by absolute value: -0.56 is stronger than 0.45.
The strongest predictor of respondents’ personal income is the respondents' education in years (std.Beta=0.34), followed by respondents’ age (std.Beta=0.17), and the days of poor physical health past 30 days (std.Beta=-0.13).
The strongest predictor of [what it measures column of the dependent variable] is the [what it measures column of significant independent variable 1] (std.Beta=0.xx), followed by [what it measures column of significant independent variable 2] (std.Beta=0.xx), and [what it measures column of significant independent variable 3] (std.Beta=0.xx).
Fourth paragraph: [The explanation of R-squared] Report the adjusted R-squared value as a percentage with the statistically significant variables.
The adjusted R-squared value indicates that 17.2% of the variation in respondents' personal income can be explained by the respondents' age, days of poor physical health past 30 days, and respondents' education in years.
The adjusted R-squared value indicates that [adjusted R-squared as a percentage] of the variation in [what it measures column of the dependent variable] can be explained by [what it measures column of significant independent variable 1], [what it measures column of significant independent variable 2], [what it measures column of significant independent variable 3]...
Reporting of estimates (coefficients)
Reporting of adjusted R-squared
The adjusted R-squared shows whether adding additional independent variables improves a regression model.
The adjusted R-squared should be reported as a percentage.
Here's a shortcut for converting a decimal to a percentage:
If 0.007 is the adjusted R-squared, move the decimal point two places to the right:
0.007 ➜ 0.7%
0.079 ➜ 7.9%
0.172 ➜ 17.2%
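You can also let R do the conversion (a quick check, not required):
# convert an adjusted R-squared of 0.172 to a percentage
sprintf("%.1f%%", 0.172 * 100)  # "17.2%"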
Slides: linear regression
Linear regression analysis (with dummy variables)

model6 <- lm(conrinc ~ god + age + physhlth + educ + male + veryhappy + prettyhappy, data = gss)
tab_model(model6, show.std = T, show.ci = F, collapse.se = T, p.style = "stars")
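model6 assumes the dummy variables male, veryhappy, and prettyhappy were created beforehand. A minimal sketch of how they might be built, assuming the usual GSS coding (sex: 1 = male, 2 = female; happy: 1 = very happy, 2 = pretty happy, 3 = not too happy); confirm the codes with frq() first:
frq(gss$sex, out = "v")
frq(gss$happy, out = "v")
gss$male <- ifelse(gss$sex == 1, 1, 0)  # 1 = male; female is the comparison group
gss$veryhappy <- ifelse(gss$happy == 1, 1, 0)  # 1 = very happy
gss$prettyhappy <- ifelse(gss$happy == 2, 1, 0)  # 1 = pretty happy
# not too happy (happy == 3) is the omitted comparison group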
Reporting of dummy variable estimates (coefficients)
When reporting the coefficients of dummy variables, ensure that the sentence includes "being" or "having," as well as the omitted (comparison) dummy variable.
This is because creating dummy variables essentially means creating new variables, which changes the information they measure (aka "what it measures").
For example, if you create "male" and "female" dummy variables based on the "respondents' sex" variable, the "male" dummy variable now measures "being male," and the "female" dummy variable now measures "being female."
Being male increases income by $15,624 compared to being female.
If you create "veryhappy," "prettyhappy," and "nottoohappy" dummy variables based on the "happiness level" variable, the "veryhappy" dummy variable now measures "being very happy," the "prettyhappy" dummy variable measures "being pretty happy," and the "nottoohappy" dummy variable measures "being not too happy."
Being very happy increases income by $15,779 compared to being not too happy.
Being pretty happy increases income by $8,908 compared to being not too happy.
If you create "ownhouse" and "renthouse" dummy variables based on the "home ownership" variable, the "ownhouse" dummy variable now measures "having a house" (or "owning a house"), and the "renthouse" dummy variable measures "renting a house."
Having a house increases life satisfaction by 3.2 points compared to renting a house.
OR
Owning a house increases life satisfaction by 3.2 points compared to renting a house.
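A minimal sketch of how the home-ownership dummies might be created, assuming a GSS-style home ownership variable dwelown coded 1 = own, 2 = rent (an assumption; check your codebook with frq() first):
frq(gss$dwelown, out = "v")
gss$ownhouse <- ifelse(gss$dwelown == 1, 1, 0)  # 1 = owns a house
gss$renthouse <- ifelse(gss$dwelown == 2, 1, 0)  # 1 = rents a house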
Slides: linear regression
Slides: dummy variables
Logistic regression analysis (with dummy variables)

frq(gss$class, out = "v")
gss$higherclass <- ifelse(gss$class == 3 | gss$class == 4, 1, 0)
gss$lowerclass <- ifelse(gss$class == 1 | gss$class == 2, 1, 0)
frq(gss$wrkslf, out = "v")
gss$selfemployed <- ifelse(gss$wrkslf == 1, 1, 0)
gss$workforsomeoneelse <- ifelse(gss$wrkslf == 2, 1, 0)
frq(gss$race, out = "v")
gss$white <- ifelse(gss$race == 1, 1, 0)
gss$nonwhite <- ifelse(gss$race == 2 | gss$race == 3, 1, 0)
model1 <- glm(higherclass ~ selfemployed + educ + nonwhite, data = gss, family = binomial(link="logit"))
tab_model(model1, show.std = TRUE, show.ci = FALSE, collapse.se = TRUE, p.style = "stars")
Logistic regression analysis interpretation breakdown
Reporting the Odds Ratios of continuous and dummy independent variables
Reporting negative odds ratios and Std.Betas
If the odds ratio is less than 1, the effect is negative. To report its magnitude, divide 1 by the odds ratio (type "calculator" into Google, or use the R console):
Negative odds ratio = 1 ÷ odds ratio
For example, 1 ÷ 0.60 = 1.67.
If the odds ratio is less than 1, the standardized beta is also negative. Report its magnitude the same way:
Negative Std.Beta = 1 ÷ Std.Beta
For example, 1 ÷ 0.80 = 1.25.
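You can also compute the reciprocal directly in the R console (a quick check):
# the reciprocal of an odds ratio below 1 gives the magnitude of the negative effect
1 / 0.60  # 1.67
1 / 0.80  # 1.25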

Reporting of Tjur R-Squared
The Tjur R-Squared shows whether adding additional independent variables improves a logistic regression model.
The Tjur R-Squared should be reported as a percentage.
Here's a shortcut for converting a decimal to a percentage:
If 0.007 is the Tjur R-Squared, move the decimal point two places to the right:
0.007 ➜ 0.7%
0.079 ➜ 7.9%
0.172 ➜ 17.2%
Slides: logistic regression
Slides: dummy variables