Interpretation templates

Reminder:

Ensure that your interpretations are clear enough for someone to understand your report without referring to the tables.

Use the "Variables in GSS" document: read the full wording of the questions and the response sets, and use the "What it measures" column in your interpretations.

Example 1:

Correct: The perceived discrimination at work because of age variable shows that 7.93% of the respondents feel discriminated against at work because of age and 92.07% do not.

Wrong: The wkageism shows that 7.93% of the respondents reported that they feel discriminated and 92.07% do not.

  • Do not use variable names in the interpretation. Variable names are meant for coding purposes. There's no word called "wkageism." No one would understand what you mean.

Wrong: the r feels discriminated because of age shows that 7.93% of the respondents reported that they feel discriminated and 92.07% do not.

  • Do not use the text that appears at the top of the table. No one would understand what you mean by "r" here.

Example 2:

Correct: The internet use in hours variable shows the average hours that the respondents use the internet is 15.51, with standard deviation 19.48.

Wrong: The wwwhr shows the average hours that the respondents use the internet is 15.51, with standard deviation 19.48.

  • Do not use variable names in the interpretation. Variable names are meant for coding purposes. There's no word called "wwwhr." No one would understand what you mean.

Wrong: the dd shows the average hours that the respondents use the internet is 15.51, with standard deviation 19.48.

  • Do not use the text that appears in the table. No one would understand what you mean by "dd" here.


Frequency tables

Frequency tables are for categorical variables.

frq(gss$, out = )

The marital status variable shows that 7.23% of the respondents are widowed; 17.23% are divorced; 2.92% are separated; and 31.20% have never been married.
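The percentages in a frequency interpretation can be reproduced with a minimal base-R sketch. The data below are hypothetical toy values, not the GSS, and the snippet stands in for (rather than replaces) the course's frq() call.

```r
# Hypothetical toy data (NOT the GSS): marital status of ten respondents
marital <- factor(c("married", "married", "married", "married", "widowed",
                    "divorced", "divorced", "separated",
                    "never married", "never married"))

# Share of each category as a percentage, rounded to two decimals
pct <- round(prop.table(table(marital)) * 100, 2)
print(pct)

# Build the interpretation from the full category label, not the variable name
sentence <- paste0(pct[["widowed"]], "% of the respondents are widowed")
print(sentence)
```

Writing the sentence from the category labels keeps variable names such as "marital" out of the report.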

Slides: descriptive statistics


Descriptive tables 1

Descriptive tables are for continuous variables.

descr(gss$, out = "v", show = "short")

shows that the average age of the respondents is , with standard deviation .

Slides: descriptive statistics

Descriptive tables 2 (for computed variables)

descr(gss$hapindex, out = "v", show = "short")

Indicate the highest possible score in your interpretation ➜ "Out of 3", "Out of 5", etc.

Indicate the full name of the index variable:

Correct: The happiness index score of the GSS respondents is 2.10 out of 3, with standard deviation 0.47.

Wrong: The hapindex score of the GSS respondents is 2.10 out of 3, with standard deviation 0.47.

The happiness index score of the GSS respondents is 2.10 out of 3, with standard deviation 0.47.
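As a rough illustration of where the numbers in this sentence come from, the sketch below computes a mean and standard deviation in base R and fills in the template. The scores are hypothetical toy values on an assumed 1-3 scale, not the GSS happiness index.

```r
# Hypothetical index scores on an assumed 1-3 scale (toy data, NOT the GSS)
hapindex <- c(2, 3, 2, 1, 3, 2, 2, 3)

avg <- round(mean(hapindex, na.rm = TRUE), 2)  # average score
std <- round(sd(hapindex, na.rm = TRUE), 2)    # standard deviation
max_score <- 3L                                # highest possible score on the index

# Fill in the interpretation template, including the "out of" part
sentence <- sprintf(
  "The happiness index score of the respondents is %.2f out of %d, with standard deviation %.2f.",
  avg, max_score, std)
print(sentence)
```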


Chi-square (example 1)

We use crosstabs for chi-square analysis, which tests whether there is a relationship between two categorical variables (we check the p value). We treat a result as statistically significant when p < 0.05. The chi-square test can only compare categorical variables.

sjt.xtab(gss$sex, gss$health, show.row.prc = TRUE)

Independent variable first (sex), dependent variable second (health)

Respondents' sex has NO effect on the condition of health since the p value is HIGHER than 0.05. We can conclude that males and females have similar health conditions.
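The decision rule behind this interpretation can be sketched with base R's chisq.test(). The data frame below is hypothetical toy data, not the GSS, and the snippet is only an illustration of the p < 0.05 check, not the course's sjt.xtab() output.

```r
# Hypothetical toy data (NOT the GSS): sex and self-rated health
toy <- data.frame(
  sex    = rep(c("male", "female"), each = 50),
  health = c(rep(c("good", "poor"), times = c(30, 20)),   # males
             rep(c("good", "poor"), times = c(31, 19)))   # females
)

# Crosstab: independent variable in rows, dependent variable in columns
xtab <- table(toy$sex, toy$health)
print(round(prop.table(xtab, margin = 1) * 100, 1))  # row percentages

# Chi-square test of independence, then the p < 0.05 decision rule
result <- chisq.test(xtab)
significant <- result$p.value < 0.05
print(result$p.value)
print(significant)
```

In this toy table the two groups have almost identical row percentages, so the p value is above 0.05 and the "no significant relationship" template applies.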

Slides: chisquare

Chi-square (example 2)

We use crosstabs for chi-square analysis, which tests whether there is a relationship between two categorical variables (we check the p value). We treat a result as statistically significant when p < 0.05. The chi-square test can only compare categorical variables.

gss$agenew <- rec(gss$age, rec = 
"18:39=1 [18-39 age group];
40:59=2 [40-59 age group]; 
60:89=3 [60-89 age group]", append = FALSE)


sjt.xtab(gss$agenew, gss$health, show.row.prc = TRUE)

Independent variable first (agenew), dependent variable second (health)

Age groups have an effect on the condition of health since the p value is LESS than 0.05. We can conclude that age groups have significantly different health conditions.

Slides: chisquare


T-test (example 1)

A t-test is used to determine if there is a significant difference between the means of two groups. A t-test is used when we wish to compare two means (the scores must be continuous).

t.test(conrinc ~ sex, data = gss) %>% 
  parameters() %>% 
  display(format="html")

Dependent variable first (conrinc), independent variable second (sex)

The average personal income in dollars of males is $49,306, while the average personal income in dollars of females is $35,277. Personal income in dollars differs by respondents' sex in a statistically significant way since the p-value is LESS than 0.05.
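The two group means and the p value used in this kind of sentence can be pulled directly from the t.test() result. The sketch below uses hypothetical toy incomes (in thousands of dollars), not the GSS.

```r
# Hypothetical toy data (NOT the GSS): income in thousands of dollars by sex
toy <- data.frame(
  sex    = rep(c("female", "male"), each = 6),
  income = c(30, 32, 35, 31, 34, 36,   # females
             45, 48, 50, 47, 52, 49)   # males
)

# Dependent variable first, independent (grouping) variable second
result <- t.test(income ~ sex, data = toy)

group_means <- result$estimate   # mean of each group
p_value <- result$p.value        # compare against 0.05
print(group_means)
print(p_value < 0.05)
```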

Slides: ttest

T-test (example 2)

A t-test is used to determine if there is a significant difference between the means of two groups. A t-test is used when we wish to compare two means (the scores must be continuous).

t.test(educ ~ sex, data = gss) %>% 
  parameters() %>% 
  display(format="html")

Dependent variable first (educ), independent variable second (sex)

The average education in years of males is 14.08 years, while the average education in years of females is 14.15 years. Education in years does not differ by respondents' sex in a statistically significant way since the p-value is HIGHER than 0.05.

Slides: ttest


Bar graph (for categorical variables)

plot_frq(gss$marital, type = "bar", geom.colors = "#336699")

Same as frequency table interpretation

The marital status variable shows that 7.23% of the respondents are widowed; 17.23% are divorced; 2.92% are separated; and 31.20% have never been married.

Slides: visualization

Histogram (for continuous variables)

plot_frq(gss$educ, type = "hist",show.mean = TRUE, show.mean.val = TRUE, normal.curve = TRUE, show.sd = TRUE, normal.curve.color = "red")

Same as descriptive table interpretation

The education in years variable shows that the average education in years of the respondents is 14.11, with standard deviation 2.89.

Slides: visualization


Stacked bar graphs for multiple variables

graph <- gss %>%
  select (conbus, coneduc, confed, conmedic, conarmy, conjudge) %>%
  plot_stackfrq(sort.frq = "first.asc", coord.flip = FALSE, geom.colors = "Blues", show.total = FALSE,
                title = "Confidence in major US institutions")

# the second part of the code is to change font sizes

graph + theme(
  axis.text.x = element_text(size=14), # change font size of x-axis labels
  axis.text.y = element_text(size=14), # change font size of y-axis labels
  plot.title=element_text(size=20), # change font size of plot title
  legend.text = element_text(size=14)) # change font size of legend

When interpreting stacked bar graphs, we generally interpret one response category (for example, "a great deal").

Of the GSS respondents, 44.2% have a great deal of confidence in the military; 33.4% in medicine; 18.2% in education; 16% in the Supreme Court; 14.1% in major companies; and 10.4% in the executive branch of government.

Slides: visualization


Stacked bar graphs by different groups

plot_xtab(gss$dependentvar, gss$independentvar, show.total=FALSE, show.n = FALSE)

30.2% of the 18-39 age group, 32.1% of the 40-59 age group, and 38.3% of the 60-89 age group have a great deal of confidence in medicine.

Slides: visualization


Correlation analysis table

Correlation analysis examines the linear relationship of two continuous variables.

IF the p-value is statistically significant (<0.05);

  • .0 < | r | < .3 … weak correlation

  • .3 < | r | < .5 … moderate correlation

  • .5 < | r | ………. strong correlation

tab_corr (gss[, c("sei10", "spsei10")],
wrap.labels = 30, p.numeric = TRUE, triangle="lower", na.deletion = "pairwise")

The order of the variables does not matter.

There is a significant correlation between the socioeconomic index score of the respondents and the socioeconomic index score of the respondents’ spouses since the p-value is less than .05.

This correlation is positive and moderate since the r-value is 0.382 (between 0.3 and 0.5).

This means that the socioeconomic index score of the respondents and the socioeconomic index score of the respondents’ spouses increase and decrease together.
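The wording rules above (significance first, then direction, then strength) can be sketched as a small helper. The function name describe_correlation is hypothetical and not part of any package, and assigning the exact cut-off values .3 and .5 to the higher category is an assumption, since the thresholds above use strict inequalities.

```r
# Hypothetical helper (not from any package): turn r and p into template wording
describe_correlation <- function(r, p) {
  # Only interpret direction and strength when the p-value is below 0.05
  if (p >= 0.05) return("not significant")
  direction <- if (r > 0) "positive" else "negative"
  size <- abs(r)
  # Assumption: values exactly at .3 or .5 fall into the higher category
  strength <- if (size < 0.3) "weak" else if (size < 0.5) "moderate" else "strong"
  paste(direction, strength)
}

# The r-value from the example above, with an illustrative p-value below 0.05
print(describe_correlation(0.382, 0.001))
```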

Slides: correlation


Correlation analyses

Correlation scatterplot graph

Correlation analysis examines the linear relationship of two continuous variables.

IF the p-value is statistically significant (<0.05);

  • .0 < | r | < .3 … weak correlation

  • .3 < | r | < .5 … moderate correlation

  • .5 < | r | ………. strong correlation

scatterplot <- ggscatter(gss, x = "sei10", y = "spsei10",
add = "loess", conf.int = TRUE, color = "black", point=F,
xlab = "Socio-economic index score of the respondents", 
ylab = "Socio-economic index score of the respondents’ spouses")
scatterplot + stat_cor(p.accuracy = 0.001, r.accuracy = 0.01)

The order of the variables does not matter.

There is no significant correlation between the age in years and education in years since the p-value is higher than .05.

This means that age in years and education in years do not increase and decrease together.

Slides: correlation


Correlation matrix

Correlation matrix examines the linear relationship of multiple continuous variables.

IF the p-value is statistically significant (<0.05);

  • .0 < | r | < .3 … weak correlation

  • .3 < | r | < .5 … moderate correlation

  • .5 < | r | ………. strong correlation

tab_corr (gss[, c("sei10", "spsei10", "tvhours", "usetech", "age", "educ", "marasiannew", "marhispnew")], 
wrap.labels = 30, p.numeric = TRUE, triangle="lower", na.deletion = "pairwise")

Same interpretation as correlation analysis table and scatterplot graph

Slides: correlation


Scatterplot matrix

Scatterplot matrix examines the linear relationship of multiple continuous variables.

IF the p-value is statistically significant (<0.05);

  • .0 < | r | < .3 … weak correlation

  • .3 < | r | < .5 … moderate correlation

  • .5 < | r | ………. strong correlation

pairs.panels(gss[, c("sei10", "spsei10", "tvhours", "usetech", "age", "educ", "marasiannew", "marhispnew")],
ellipses=F, scale=F, show.points=F, stars=T, ci=T)

Same interpretation as correlation analysis table and scatterplot graph

Slides: correlation


Correlogram

Correlogram examines the linear relationship of multiple continuous variables.

IF the p-value is statistically significant (<0.05);

  • .0 < | r | < .3 … weak correlation

  • .3 < | r | < .5 … moderate correlation

  • .5 < | r | ………. strong correlation

selectedvariables <- c("sei10", "spsei10", "tvhours", "usetech", "age", "educ", "marasiannew", "marhispnew")
testRes = cor.mtest(gss[, selectedvariables])
gssrcorr = rcorr(as.matrix(gss[, selectedvariables]))
gsscoeff = gssrcorr$r
corrplot(gsscoeff, p.mat = testRes$p, method = 'pie', type = 'lower', insig='blank',
addCoef.col = 'black', order = 'original', diag = FALSE)$corrPos

Same interpretation as correlation analysis table and scatterplot graph

Slides: correlation


Regression analyses

Linear regression analysis

In regression, we explain the effects of independent variables on the dependent variable by estimating how changes in the independent variables are associated with changes in the dependent variable.

Unlike correlation analysis, regression analysis can be used to determine the direction and strength of a potential causal relationship.

model4 <- lm(conrinc ~ god + age + physhlth + educ, data = gss)
tab_model(model4, show.std = T, show.ci = F, collapse.se = T, p.style = "stars")

Dependent variable (conrinc) first, followed by independent variables separated by a plus (+).

Age in years, days of poor physical health past 30 days, and education in years are statistically significant predictors of personal income since the p values are less than 0.05. Confidence in the existence of God is not a statistically significant predictor of personal income since the p value is greater than 0.05.

A year increase in age increases personal income by $504. A day increase in poor physical health past 30 days decreases personal income by $857. A year increase in the years of education increases personal income by $4,845.

The strongest predictor of personal income is education in years (std.Beta=0.34), then age in years (std.Beta=0.17), and then the days of poor physical health past 30 days (std.Beta=-0.13).

The adjusted R squared value indicates that 17.2% of the variation in personal income can be explained by education in years, age in years, and days of poor physical health past 30 days.

When reporting the coefficients, ensure that the sentence includes the units of both the independent and the dependent variable.

  • Independent variable (rank - social ranking level - 10: top; 1: bottom)

  • Dependent variable (educ - education in years - 0-20 years)

A one unit increase in social ranking level increases respondents’ education by 3.19 years.

  • Independent variable (physhlth - days of physical issues during the past 30 days - 0-30 days)

  • Dependent variable (conrinc - personal income in dollars - $336 - $170,913)

A day increase in physical issues during the past 30 days decreases personal income by $857.

  • Independent variable (age - age in years - 18-89 age)

  • Dependent variable (polviews - conservatism level - 1: extremely liberal; 7: extremely conservative)

A year increase in age increases conservatism level by 2.45 points.

The adjusted R-squared should be reported as a percentage.

Here's a shortcut for converting a number with decimals to a percentage:

If 0.007 is the adjusted R-squared, move the decimal point two places to the right:

0.007 ➜ 0.7%

0.079 ➜ 7.9%

0.172 ➜ 17.2%
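The same conversion can be sketched in base R. The first part formats the three values from the shortcut above; the second part shows one way to read the adjusted R-squared from a fitted model, using hypothetical toy data rather than the GSS.

```r
# Convert adjusted R-squared values to percentages with one decimal
adj_values <- c(0.007, 0.079, 0.172)
as_percent <- sprintf("%.1f%%", adj_values * 100)
print(as_percent)

# Hypothetical toy data (NOT the GSS): education in years and income in thousands
toy <- data.frame(educ   = c(10, 12, 12, 14, 16, 16, 18, 20),
                  income = c(20, 26, 24, 31, 38, 35, 41, 50))
model <- lm(income ~ educ, data = toy)

# summary() stores the adjusted R-squared as a proportion between 0 and 1
adj_r2 <- summary(model)$adj.r.squared
print(sprintf("%.1f%%", adj_r2 * 100))
```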

Slides: linear regression

Linear regression analysis (with dummy variables)

model6 <- lm(conrinc ~ god + age + physhlth + educ + male + veryhappy + prettyhappy , data = gss)
tab_model(model6, show.std = T, show.ci = F, collapse.se = T, p.style = "stars")

Age of the respondents, days of poor physical health past 30 days, the years of education, being male, being very happy, and being pretty happy are statistically significant predictors of respondents’ income since the p values are less than 0.05. Respondent's confidence in the existence of God is not a statistically significant predictor of respondents’ income since the p value is greater than 0.05.

A year increase in age increases respondents’ income by $489. A day increase in poor physical health past 30 days decreases respondents’ income by $654. A year increase in the years of education increases respondents’ income by $5,185. Being male increases income by $15,624 compared to being female. Being very happy increases income by $15,779 compared to being not too happy. Being pretty happy increases income by $8,908 compared to being not too happy.

The strongest predictor of respondents’ income is the years of education (std.Beta=0.36), followed by being male (std.Beta=0.19), the age of the respondent (std.Beta=0.17), being very happy (std.Beta=0.16), being pretty happy (std.Beta=0.11), and the days of poor physical health past 30 days (std.Beta=-0.10).

The adjusted R squared value indicates that 22.2% of the variation in respondents’ income can be explained by the years of education, age of the respondents, days of poor physical health past 30 days, being male, being very happy, and being pretty happy.

When reporting the coefficients of the dummy variables, ensure that the sentence includes "being" and the omitted (comparison category) dummy variable:

  • Being male increases income by $15,624 compared to being female.

  • Being very happy increases income by $15,779 compared to being not too happy.

  • Being pretty happy increases income by $8,908 compared to being not too happy.
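Why the omitted category belongs in the sentence can be seen in a minimal sketch: the coefficient of a dummy variable is the difference from the category coded 0. The data are hypothetical toy values, and the coding 1 = male, 2 = female is an assumption made for illustration.

```r
# Hypothetical toy data (NOT the GSS); assumed coding: 1 = male, 2 = female
toy <- data.frame(
  sex    = c(1, 2, 1, 2, 1, 2, 1, 2),
  income = c(50, 40, 55, 42, 48, 39, 53, 41)   # in thousands of dollars
)

# Dummy variable: 1 = male, 0 = female (female is the omitted comparison category)
toy$male <- ifelse(toy$sex == 1, 1, 0)

model <- lm(income ~ male, data = toy)

# Intercept = average income of females; "male" = difference compared to females
print(coef(model))
```

With only the dummy in the model, the intercept equals the average of the omitted group, which is why the interpretation reads "compared to being female".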

Slides: linear regression

Slides: dummy variables

Logistic regression analysis (with dummy variables)

frq(gss$class, out = "v")

gss$higherclass <- ifelse(gss$class == 3 | gss$class == 4, 1, 0)
gss$lowerclass <- ifelse(gss$class == 1 | gss$class == 2, 1, 0)

frq(gss$race, out = "v")

gss$white <- ifelse(gss$race == 1, 1, 0)
gss$nonwhite <- ifelse(gss$race == 2 | gss$race == 3, 1, 0)

model4 <- glm(higherclass ~ educ + nonwhite + prestg10, data = gss, family = binomial(link="logit"))
tab_model(model4, show.std = TRUE, show.ci = FALSE, collapse.se = TRUE, p.style = "stars")

Education in years, being non-white and occupational prestige score are significant predictors of being higher class since the p values are less than 0.05.

A year increase in education increases the likelihood of being higher class (OR=1.23; std.Beta=1.81). Being non-white decreases the likelihood of being higher class compared to being white (OR=0.62; std.Beta=0.81). One unit increase in occupational prestige score increases the likelihood of being higher class (OR=1.02; std.Beta=1.38).

The Tjur R-squared value indicates that 14.1% of the variation in being higher class can be explained by education in years, being non-white and occupational prestige score.
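The odds ratios (OR) quoted above are the exponentiated coefficients of the logistic model. The sketch below shows that step in base R on hypothetical toy data, not the GSS; an OR above 1 means the likelihood increases, and an OR below 1 means it decreases.

```r
# Hypothetical toy data (NOT the GSS): education in years and a 0/1 outcome
toy <- data.frame(
  educ        = c(8, 10, 12, 12, 14, 14, 16, 16, 18, 20),
  higherclass = c(0,  0,  0,  1,  0,  1,  1,  0,  1,  1)
)

model <- glm(higherclass ~ educ, data = toy, family = binomial(link = "logit"))

# Exponentiate the log-odds coefficients to obtain odds ratios
odds_ratios <- exp(coef(model))
print(round(odds_ratios, 2))

# OR > 1: the predictor increases the likelihood; OR < 1: it decreases it
increases_likelihood <- odds_ratios[["educ"]] > 1
print(increases_likelihood)
```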

Slides: logistic regression

Slides: dummy variables
