In this assignment we will replicate a study of belief in supernatural evil and attitudes towards guns in the United States. This assignment was done as a project for the Practical Data Science course taught by Mr. Panos Louridas at the Athens University of Economics and Business. The study to be replicated is:
Christopher G. Ellison, Benjamin Dowd-Arrow, Amy M. Burdette, Pablo E. Gonzalez, Margaret S. Kelley, Paul Froese, "Peace through superior firepower: Belief in supernatural evil and attitudes toward gun policy in the United States", Social Science Research, Volume 99, 2021, https://doi.org/10.1016/j.ssresearch.2021.102595.
The data we are going to use come from Wave Four of the Baylor Religion Survey (BRS), as mentioned in section 1.4 of the paper:
To test these hypotheses, we use data from Wave Four of the Baylor Religion Survey (BRS), which was conducted in January of 2014. Briefly, the BRS is a national random sample of 1572 non-institutionalized respondents ages 18 and over who reside in the continental United States.
A Google search for the term "Baylor Religion Survey" leads to the website https://www.baylor.edu/baylorreligionsurvey/, which contains the data.
We click on the Data tab and then on the DATA AT THEARDA.COM button.
In the page that opens up, we choose the Baylor Religion Survey, Wave IV (2014) - Instructional Dataset as it is already prepared for easier use in the classroom.
On the new page we click the Download tab, agree to the Privacy Policy, hit Continue, and download the Excel (.xlsx) file to our computer.
So, now that we have the dataset, let's import some libraries and load it.
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
import seaborn as sns
df = pd.read_excel("Baylor Religion Survey, Wave IV (2014) - Instructional Dataset.XLSX")
df.shape
(1572, 288)
df.sample(10)
| | MOTHERLODE_ID | PROJECT_ID | RESPONDENT_ID | METHOD_ID | METHOD_TYPE | PROJECT_NUMBER | CREATED_ON | PRACTICE | RESPONDENT_DATE | RESPONDENT_LANGUAGE | ... | LIBCONR | PARTYIDR | CHILDSR | HRSWORKD | EDUCR | I_GENDER | I_EDUC | I_MARITAL | I_RELIGION | I_ATTEND |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
267 | 128171109 | 33370 | 01_000285_00000012 | 162048140_01 | SCAN | 162048140 | 3/26/2014 | OTHER | 1/28/2014 | en-US | ... | 3.0 | 2.0 | NaN | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 6.0 |
464 | 128171905 | 33370 | 01_000482_00000020 | 162048140_01 | SCAN | 162048140 | 3/26/2014 | OTHER | 1/28/2014 | en-US | ... | 1.0 | 1.0 | 3.0 | 3.0 | 3.0 | 2.0 | 3.0 | 2.0 | 5.0 | 6.0 |
1269 | 128172732 | 33370 | 01_001468_00000059 | 162048140_01 | SCAN | 162048140 | 3/26/2014 | OTHER | 2/25/2014 | en-US | ... | 2.0 | 1.0 | 1.0 | 1.0 | 4.0 | 2.0 | 5.0 | 2.0 | 2.0 | 6.0 |
484 | 128171911 | 33370 | 01_000502_00000021 | 162048140_01 | SCAN | 162048140 | 3/26/2014 | OTHER | 1/28/2014 | en-US | ... | 3.0 | 3.0 | NaN | 2.0 | 2.0 | 1.0 | 2.0 | 4.0 | 3.0 | 3.0 |
219 | 128171078 | 33370 | 01_000237_00000010 | 162048140_01 | SCAN | 162048140 | 3/26/2014 | OTHER | 1/28/2014 | en-US | ... | 1.0 | 1.0 | 1.0 | 4.0 | 2.0 | 1.0 | 2.0 | 4.0 | 1.0 | 3.0 |
1173 | 128171675 | 33370 | 01_001305_00000053 | 162048140_01 | SCAN | 162048140 | 3/26/2014 | OTHER | 2/19/2014 | en-US | ... | 2.0 | 2.0 | 1.0 | 4.0 | 3.0 | 1.0 | 4.0 | 2.0 | 4.0 | 4.0 |
841 | 128171493 | 33370 | 01_000885_00000036 | 162048140_01 | SCAN | 162048140 | 3/26/2014 | OTHER | 2/4/2014 | en-US | ... | 2.0 | 3.0 | NaN | 3.0 | NaN | 2.0 | NaN | 2.0 | 1.0 | 6.0 |
1524 | 128172319 | 33370 | 01_001830_00000074 | 162048140_01 | SCAN | 162048140 | 3/26/2014 | OTHER | 3/21/2014 | en-US | ... | 1.0 | 1.0 | 1.0 | 5.0 | 3.0 | 1.0 | 3.0 | 3.0 | 1.0 | 3.0 |
1245 | 128172716 | 33370 | 01_001411_00000057 | 162048140_01 | SCAN | 162048140 | 3/26/2014 | OTHER | 2/21/2014 | en-US | ... | 1.0 | 2.0 | NaN | 0.0 | NaN | 2.0 | NaN | 2.0 | 4.0 | 3.0 |
286 | 128171126 | 33370 | 01_000304_00000013 | 162048140_01 | SCAN | 162048140 | 3/26/2014 | OTHER | 1/28/2014 | en-US | ... | 1.0 | 1.0 | 2.0 | NaN | 3.0 | 2.0 | 3.0 | 2.0 | 4.0 | 3.0 |
10 rows × 288 columns
To understand the columns we refer to the Codebook of the dataset that can be found here.
To measure belief in supernatural evil, we will use the answers to three questions that the survey asked of its participants:
Whether the respondent believes in the devil.
Whether the respondent believes in hell.
Whether the respondent believes in demons.
You will investigate how the answers to these three questions can be combined to a single metric. Justify your approach.
According to the Codebook, the answers to the above questions are in columns Q23A, Q23C and Q23G for belief in the devil, hell and demons, respectively. The response categories for each item were:
1 = Absolutely not
2 = Probably not
3 = Probably
4 = Absolutely
We will follow the paper's approach, which is a sensible one: the responses to the three items are summed and then divided by the number of items the respondent actually answered, which gives the index we need.
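A minimal sketch of that computation on made-up responses (the values below are not taken from the BRS) suggests that dividing the sum by the number of answered items is the same as a row-wise mean that skips NaN:

```python
import numpy as np
import pandas as pd

# Made-up responses on the 1-4 scale; NaN marks an unanswered item.
toy = pd.DataFrame({
    "Q23A": [4.0, 3.0, np.nan, np.nan],
    "Q23C": [4.0, 2.0, np.nan, np.nan],
    "Q23G": [4.0, np.nan, 4.0, np.nan],
})
items = ["Q23A", "Q23C", "Q23G"]

# Explicit version: sum of the answered items divided by how many were answered.
answered = toy[items].notna().sum(axis=1)
total = toy[items].sum(axis=1, min_count=1)  # NaN when no item was answered
index_manual = total / answered

# DataFrame.mean skips NaN by default, so one call gives the same index.
index_mean = toy[items].mean(axis=1)

print(pd.DataFrame({"answered": answered, "manual": index_manual, "mean": index_mean}))
```

A respondent who answered nothing gets NaN under both versions, so those rows still need separate handling.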
df[['Q23A', 'Q23C', 'Q23G']].isna().sum()
Q23A    68
Q23C    73
Q23G    80
dtype: int64
df[['Q23A', 'Q23C', 'Q23G']]
| | Q23A | Q23C | Q23G |
---|---|---|---|
0 | 4.0 | 4.0 | 4.0 |
1 | 4.0 | 4.0 | 4.0 |
2 | 3.0 | 2.0 | 2.0 |
3 | 4.0 | 4.0 | 4.0 |
4 | 4.0 | 4.0 | 4.0 |
... | ... | ... | ... |
1567 | 4.0 | 2.0 | 4.0 |
1568 | 3.0 | 3.0 | 3.0 |
1569 | 4.0 | 4.0 | 3.0 |
1570 | 3.0 | 3.0 | 3.0 |
1571 | NaN | NaN | NaN |
1572 rows × 3 columns
df[df['Q23A'].isna()][['Q23A', 'Q23C', 'Q23G']]
| | Q23A | Q23C | Q23G |
---|---|---|---|
53 | NaN | NaN | NaN |
115 | NaN | NaN | NaN |
141 | NaN | NaN | NaN |
166 | NaN | NaN | NaN |
169 | NaN | NaN | NaN |
... | ... | ... | ... |
1538 | NaN | NaN | 4.0 |
1542 | NaN | NaN | NaN |
1556 | NaN | NaN | 4.0 |
1560 | NaN | NaN | NaN |
1571 | NaN | NaN | NaN |
68 rows × 3 columns
There are some NaN values in these fields. We will not fill them item by item, because some respondents left one or two items blank while answering the others. Instead we take the mean across each row: pandas skips NaN when averaging, so every respondent who answered at least one item gets a score. Looking at the values, most respondents also give nearly the same answer to all three questions, so the mean of the answered items is a reasonable stand-in for the ones that are missing.
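The impression that answers to the three items move together can be quantified. One common check is Cronbach's alpha; the sketch below is an illustration on made-up answers, with a hypothetical helper `cronbach_alpha` and hypothetical column names, and it is not a step this notebook or the paper is known to perform in this form:

```python
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a set of item columns, using complete cases only."""
    complete = items.dropna()
    k = complete.shape[1]                               # number of items
    item_variances = complete.var(axis=0, ddof=1)       # variance of each item
    total_variance = complete.sum(axis=1).var(ddof=1)   # variance of the summed score
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Made-up answers on the 1-4 scale (not BRS data).
toy = pd.DataFrame({
    "devil":  [4, 4, 3, 1, 2, 4],
    "hell":   [4, 4, 2, 1, 2, 3],
    "demons": [4, 3, 2, 1, 1, 4],
})
alpha = cronbach_alpha(toy)
print(round(alpha, 3))
```

Values close to 1 indicate that the items behave like a single scale, which would support averaging them. On the real data the same function could be called on `df[['Q23A', 'Q23C', 'Q23G']]`.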
Respondents who answered none of the three items will still have NaN in the new Supernatural Evil column; we will fill those with the mean of that column.
df['Supernatural Evil'] = df[['Q23A', 'Q23C', 'Q23G']].mean(axis=1)
df['Supernatural Evil'].isna().sum()
55
We are left with only 55 NaN values. We will fill them with the mean.
df['Supernatural Evil'].mean()
3.089540760272468
The mean of our new variable is 3.09 when rounded to two decimals, the same value as in the paper.
df['Supernatural Evil'] = df['Supernatural Evil'].fillna(df['Supernatural Evil'].mean())
Apart from the belief in supernatural evil metric, you will use several other variables to control your estimates. The variables are (see Appendix B of the original publication):
Dependent Variables
Independent Variables
Derive descriptive statistics of your variables and encode them with dummy variables where needed.
Be very careful in your dummy variables encoding. In the end, you should use the variables as shown in Table 1 and Table 2 of the original publication.
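One thing to be careful about is the reference category: a variable with k levels needs only k - 1 dummy columns, and the omitted level becomes the baseline that the other levels are compared against. The sketch below uses a made-up column, and the choice of "High School or Equivalent" as the reference is an assumption made for illustration:

```python
import pandas as pd

# Made-up education levels (not BRS data).
toy = pd.DataFrame({"Education Level": [
    "High School or Equivalent", "Some College", "Postgraduate", "Some College",
]})

# One 0/1 column per level.
dummies = pd.get_dummies(toy["Education Level"], dtype=int)

# Drop the level chosen as the reference category (an assumption for this sketch).
reference = "High School or Equivalent"
dummies = dummies.drop(columns=reference)

print(dummies)
```

Dropping the reference level by name is used here instead of `drop_first=True`, because `drop_first` removes whichever level sorts first, which is not necessarily the intended baseline.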
Let's first create a mapping of the variables to the columns of the dataset based on the Codebook, so that we have it for easy reference.
Dependent Variables
Independent Variables
Now, let's show how these will appear as final columns in our dataset.
Religious Variables
Political Ideology: Q31 (1-7). We will leave political ideology as is because it is an ordinal categorical variable.
Age: AGE. We will leave Age as is, except that ages recorded as 0 are replaced with the mean of the Age column computed without the zeros.
Sex
Race
Education
Note: For Non-Hispanic Black there is an overlap between people in the Black group and people in the Hispanic group, but I am not removing anyone as there is no clear indicator of how many Hispanics are Black.
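Returning to the Age rule in the list above, the reason for ignoring the zeros can be sketched on made-up values. The assumption here is that a recorded age of 0 means "not reported"; if the zeros stayed in, they would pull the mean down:

```python
import numpy as np
import pandas as pd

# Made-up ages; 0 is assumed to stand for "not reported".
ages = pd.Series([34, 0, 58, 0, 46], name="Age")

naive_mean = ages.mean()                  # zeros are counted as real ages

ages_clean = ages.replace(0, np.nan)      # turn the zeros into missing values
mean_without_zeros = ages_clean.mean()    # NaN is skipped when averaging
ages_filled = ages_clean.fillna(mean_without_zeros)

print(naive_mean, mean_without_zeros)
print(ages_filled.tolist())
```

Converting the zeros to NaN first lets the same `fillna` call handle both the zeros and any values that were already missing.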
bible_dict = {
    1: 'Biblical Literalism',
    2: 'Biblical Inerrancy',
    3: 'Bible',
    4: 'Bible'
}
rel_aff_dict = {
    1: 'Conservative Protestant',
    2: 'Mainline Protestant',
    3: 'Black Protestant',
    4: 'Catholic',
    5: 'Other',
    6: 'Other',
    7: 'No Affiliation'
}
education_dict = {
    1: 'Less Than HS',
    2: 'Less Than HS',
    3: 'High School or Equivalent',
    4: 'Some College',
    5: 'Some College',
    6: 'College Graduate',
    7: 'Postgraduate'
}
marital_dict = {
    1: 'Not Partnered/Single',
    2: 'Married/Cohabitating',
    3: 'Married/Cohabitating',
    4: 'Not Partnered/Single',
    5: 'Not Partnered/Single',
    6: 'Not Partnered/Single'
}
place_dict = {
    1: 'Urban Area',
    2: 'Urban Area',
    3: 'Small Town/Rural',
    4: 'Small Town/Rural'
}
# https://stackoverflow.com/questions/26886653/pandas-create-new-column-based-on-values-from-other-columns-apply-a-function-o
def race(row):
    # The first matching branch wins; rows that match none return None (NaN in the column)
    if row['Q88A'] == 1:
        return 'White'
    elif row['Q88B'] == 1:
        return 'Non-Hispanic Black'
    elif row['Q88C'] == 1 or row['Q88D'] == 1 or row['Q88E'] == 1 or row['Q88F'] == 1:
        return 'Other'
    elif row['Q89'] in [2, 3, 4, 5]:
        return 'Hispanic'
def children(row):
    if row['Q93_NONE'] == 1:
        return 'No kids under 18 in home'
    elif row['CHILDSR'] in [1, 2, 3]:
        return 'Kids under 18 in home'
def state(row):
    # Southern states by postal abbreviation ('AR' is Arkansas)
    if row['STATE'] in ['MD', 'DE', 'VA', 'WV', 'KY', 'TN', 'NC', 'SC', 'FL', 'GA', 'AL', 'MS', 'LA', 'AR', 'TX', 'OK']:
        return 'South'
    else:
        return 'Other Region'
df['Gender'] = df['Q77'].apply(lambda x: 'Female' if x==2 else ('Male' if x==1 else x))
df['Attendance'] = df['Q4']
df['Bible Belief'] = df['Q17'].apply(lambda x: bible_dict[x] if x in bible_dict.keys() else x)
df['Religious Affiliation'] = df['RELTRAD'].map(rel_aff_dict)
df['Political Ideology'] = df['Q31']
df['Age'] = df['AGE']
df['Race'] = df.apply(lambda x: race(x), axis=1)
df['Education Level'] = df['Q90'].map(education_dict)
df['Household Income'] = df['Q95']
df['Marital Status'] = df['Q51A'].map(marital_dict)
df['Children'] = df.apply(lambda x: children(x), axis=1)
df['Place'] = df['Q80'].map(place_dict)
df['Region'] = df.apply(lambda x: state(x), axis=1)
df[['Gender', 'Attendance', 'Bible Belief', 'Religious Affiliation', 'Political Ideology', 'Race', 'Education Level', 'Marital Status', 'Children', 'Place', 'Region']].isna().sum()
for column in ['Gender', 'Attendance', 'Bible Belief', 'Religious Affiliation', 'Political Ideology', 'Race', 'Education Level', 'Marital Status', 'Children', 'Place', 'Region']:
    # mode() returns a Series, so take its first entry as the fill value
    df[column] = df[column].fillna(df[column].mode().iloc[0])
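# Treat an Age of 0 as missing so that it is excluded from the mean, as described for Age above
df['Age'] = df['Age'].replace(0, np.nan)
for column in ['Age', 'Household Income']:
    df[column] = df[column].fillna(df[column].mean())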
df_dummies = pd.get_dummies(df[['Gender', 'Bible Belief', 'Religious Affiliation', 'Race', 'Education Level', 'Marital Status', 'Children', 'Place', 'Region']])
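# Join the numeric columns with the dummies, rename them to the final labels (get_dummies names columns '<source column>_<value>'), then select
df2 = pd.concat([df[['Supernatural Evil', 'Attendance', 'Political Ideology', 'Age', 'Household Income']], df_dummies], axis=1)
df2 = df2.rename(columns={'Household Income': 'Income', 'Religious Affiliation_Other': 'Other Religion', 'Race_Other': 'Other Race',
                          'Marital Status_Married/Cohabitating': 'Married', 'Children_Kids under 18 in home': 'Children', 'Place_Urban Area': 'Urban'})
df2 = df2.rename(columns=lambda name: name.split('_', 1)[-1])
df2 = df2[['Supernatural Evil', 'Attendance', 'Biblical Inerrancy',
           'Biblical Literalism', 'Mainline Protestant', 'Black Protestant', 'Conservative Protestant',
           'Catholic', 'Other Religion', 'Political Ideology', 'Non-Hispanic Black', 'Hispanic', 'Other Race',
           'Female', 'Less Than HS', 'Some College', 'College Graduate', 'Postgraduate', 'Age', 'Income', 'Married',
           'Children', 'Urban', 'South']].copy()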
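With the final columns chosen, the descriptive statistics asked for above can be derived in a single call. The sketch below uses a small made-up frame rather than the real data, and the choice of mean, standard deviation, minimum and maximum is an assumption about what the summary table should contain:

```python
import pandas as pd

# Made-up values for three of the final columns (not BRS data).
toy = pd.DataFrame({
    "Supernatural Evil": [4.0, 2.5, 3.0, 1.0],
    "Female": [1, 0, 1, 1],
    "Age": [34, 58, 46, 62],
})

# One row per variable, one column per statistic.
summary = toy.agg(["mean", "std", "min", "max"]).T.round(2)
print(summary)
```

For a 0/1 dummy such as `Female`, the mean is simply the share of respondents in that category, which is why dummies can sit in the same table as continuous variables.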