Draw with US College Data

Showcase on what you can do with IPEDS data API. Choropleth with d3. I’ve also tried this on beaker notebook. More details to come!


Now, here are a few cool things you can see with this data

In [4]:
HTML(map_css() + """

""")

Out[4]:

In [2]:
# init
# import requirement
import pandas as pd
import numpy as np
from IPython.display import Javascript, HTML

# visual import mpld3 import matplotlib.pyplot as plt import seaborn as sns %matplotlib inline mpld3.enable_notebook()

# import dependency from dep import * api_key = get_api_key()

In [5]:
# pull data
# save data to local hdf
# pull([2008,2009,2010,2011,2012,2013])
# pull([2013], api_key)

# read data from local hdf yr_2013 = pd.read_hdf(’d.h’, ‘yr_2013’) geo = pd.read_hdf(’d.h’, ‘geo’)

# columns HTML(pd.DataFrame(yr_2013.columns).to_html())

Out[5]:
0
0 2013.academics.program_percentage.agriculture
1 2013.academics.program_percentage.architecture
2 2013.academics.program_percentage.biological
3 2013.academics.program_percentage.business_mar...
4 2013.academics.program_percentage.communication
5 2013.academics.program_percentage.communicatio...
6 2013.academics.program_percentage.computer
7 2013.academics.program_percentage.construction
8 2013.academics.program_percentage.education
9 2013.academics.program_percentage.engineering
10 2013.academics.program_percentage.engineering_...
11 2013.academics.program_percentage.english
12 2013.academics.program_percentage.ethnic_cultu...
13 2013.academics.program_percentage.family_consu...
14 2013.academics.program_percentage.health
15 2013.academics.program_percentage.history
16 2013.academics.program_percentage.humanities
17 2013.academics.program_percentage.language
18 2013.academics.program_percentage.legal
19 2013.academics.program_percentage.library
20 2013.academics.program_percentage.mathematics
21 2013.academics.program_percentage.mechanic_rep...
22 2013.academics.program_percentage.military
23 2013.academics.program_percentage.multidiscipline
24 2013.academics.program_percentage.parks_recrea...
25 2013.academics.program_percentage.personal_cul...
26 2013.academics.program_percentage.philosophy_r...
27 2013.academics.program_percentage.physical_sci...
28 2013.academics.program_percentage.precision_pr...
29 2013.academics.program_percentage.psychology
30 2013.academics.program_percentage.public_admin...
31 2013.academics.program_percentage.resources
32 2013.academics.program_percentage.science_tech...
33 2013.academics.program_percentage.security_law...
34 2013.academics.program_percentage.social_science
35 2013.academics.program_percentage.theology_rel...
36 2013.academics.program_percentage.transportation
37 2013.academics.program_percentage.visual_perfo...
38 2013.admissions.admission_rate.by_ope_id
39 2013.admissions.admission_rate.overall
40 2013.admissions.sat_scores.average.by_ope_id
41 2013.admissions.sat_scores.average.overall
42 2013.completion.completion_rate_4yr_150_asian
43 2013.completion.completion_rate_4yr_150_black
44 2013.completion.completion_rate_4yr_150_hispanic
45 2013.completion.completion_rate_4yr_150_white
46 2013.completion.completion_rate_4yr_150nt
47 2013.completion.completion_rate_4yr_150nt_pooled
48 2013.completion.completion_rate_less_than_4yr_...
49 2013.completion.completion_rate_less_than_4yr_...
50 2013.cost.attendance.academic_year
51 2013.cost.attendance.program_year
52 2013.cost.avg_net_price.private
53 2013.cost.avg_net_price.public
54 2013.cost.tuition.in_state
55 2013.cost.tuition.out_of_state
56 2013.earnings.10_yrs_after_entry.median
57 2013.earnings.6_yrs_after_entry.median
58 2013.earnings.8_yrs_after_entry.median_earnings
59 2013.student.demographics.avg_family_income
60 2013.student.demographics.avg_family_income_in...
61 2013.student.demographics.avg_family_income_in...
62 2013.student.demographics.avg_family_income_log
63 2013.student.demographics.first_generation
64 2013.student.demographics.median_family_income
65 2013.student.demographics.race_ethnicity.asian
66 2013.student.demographics.race_ethnicity.black
67 2013.student.demographics.race_ethnicity.hispanic
68 2013.student.demographics.race_ethnicity.non_r...
69 2013.student.demographics.race_ethnicity.white
70 2013.student.demographics.share_asian.home_ZIP
71 2013.student.demographics.share_bachelors_degr...
72 2013.student.demographics.share_black.home_ZIP
73 2013.student.demographics.share_born_US.home_ZIP
74 2013.student.demographics.share_hispanic.home_ZIP
75 2013.student.demographics.share_professional_d...
76 2013.student.demographics.share_white.home_ZIP
77 2013.student.size
78 id
79 school.carnegie_basic
80 school.carnegie_size_setting
81 school.carnegie_undergrad
82 school.degrees_awarded.predominant
83 school.locale
84 school.name
85 school.online_only
86 school.operating
87 school.ownership
88 school.zip

In [6]:
# select query
d_2013 = yr_2013[[  ‘id’,
                    ‘school.name’, ‘school.operating’, ‘school.locale’, ‘school.ownership’,‘school.zip’,
                    ‘2013.student.size’,
                    ‘2013.cost.avg_net_price.private’, ‘2013.cost.avg_net_price.public’,
                    ‘2013.admissions.admission_rate.by_ope_id’,
                    ‘2013.admissions.admission_rate.overall’,
                    ‘2013.admissions.sat_scores.average.by_ope_id’,
                    ‘2013.admissions.sat_scores.average.overall’,
                    ‘2013.cost.attendance.academic_year’,
                    ‘2013.cost.attendance.program_year’,
                    ‘2013.cost.avg_net_price.public’,
                    ‘2013.cost.tuition.in_state’,
                    ‘2013.cost.tuition.out_of_state’,
                    ‘2013.completion.completion_rate_4yr_150nt’,
                    ‘2013.completion.completion_rate_4yr_150nt_pooled’,
                    ‘2013.completion.completion_rate_less_than_4yr_150nt’,
                    ‘2013.completion.completion_rate_less_than_4yr_150nt_pooled’,
                    ‘2013.completion.completion_rate_4yr_150_asian’,
                    ‘2013.completion.completion_rate_4yr_150_black’,
                    ‘2013.completion.completion_rate_4yr_150_hispanic’,
                    ‘2013.completion.completion_rate_4yr_150_white’,
                    ‘2013.student.demographics.race_ethnicity.asian’,
                    ‘2013.student.demographics.race_ethnicity.black’,
                    ‘2013.student.demographics.race_ethnicity.hispanic’,
                    ‘2013.student.demographics.race_ethnicity.non_resident_alien’,
                    ‘2013.student.demographics.race_ethnicity.white’,
                    ‘2013.earnings.10_yrs_after_entry.median’,
                    ‘2013.earnings.6_yrs_after_entry.median’,
                    ‘2013.earnings.8_yrs_after_entry.median_earnings’
                 ]]

# just looking at ‘operating school’ d_2013 = d_2013[d_2013[‘school.operating’] == 1]

# show # null value of each field HTML(pd.DataFrame(np.sum(d_2013.isnull(), axis=0)/d_2013.shape[0]).to_html())

Out[6]:
0
id 0.000000
school.name 0.000000
school.operating 0.000000
school.locale 0.051587
school.ownership 0.000000
school.zip 0.000000
2013.student.size 0.088944
2013.cost.avg_net_price.private 0.399015
2013.cost.avg_net_price.public 0.739737
2013.admissions.admission_rate.by_ope_id 0.669267
2013.admissions.admission_rate.overall 0.703202
2013.admissions.sat_scores.average.by_ope_id 0.792282
2013.admissions.sat_scores.average.overall 0.806924
2013.cost.attendance.academic_year 0.456076
2013.cost.attendance.program_year 0.682403
2013.cost.avg_net_price.public 0.739737
2013.cost.tuition.in_state 0.419540
2013.cost.tuition.out_of_state 0.446360
2013.completion.completion_rate_4yr_150nt 0.674330
2013.completion.completion_rate_4yr_150nt_pooled 0.671319
2013.completion.completion_rate_less_than_4yr_150nt 0.498495
2013.completion.completion_rate_less_than_4yr_150nt_pooled 0.492200
2013.completion.completion_rate_4yr_150_asian 0.747126
2013.completion.completion_rate_4yr_150_black 0.710044
2013.completion.completion_rate_4yr_150_hispanic 0.709770
2013.completion.completion_rate_4yr_150_white 0.687329
2013.student.demographics.race_ethnicity.asian 0.088944
2013.student.demographics.race_ethnicity.black 0.088944
2013.student.demographics.race_ethnicity.hispanic 0.088944
2013.student.demographics.race_ethnicity.non_resident_alien 0.088944
2013.student.demographics.race_ethnicity.white 0.088944
2013.earnings.10_yrs_after_entry.median 1.000000
2013.earnings.6_yrs_after_entry.median 1.000000
2013.earnings.8_yrs_after_entry.median_earnings 1.000000

In [7]:
# take a look at my school
# print(d_2013[d_2013[‘school.name’].str.contains(‘Syracuse University’, na=False)].transpose())
HTML(d_2013[d_2013[‘school.name’].str.contains(‘Georgia Institute of Technology’, na=False)].transpose().to_html())

Out[7]:
12
id 139755
school.name Georgia Institute of Technology-Main Campus
school.operating 1
school.locale 11
school.ownership 1
school.zip 30332
2013.student.size 13975
2013.cost.avg_net_price.private NaN
2013.cost.avg_net_price.public 11053
2013.admissions.admission_rate.by_ope_id 0.5493
2013.admissions.admission_rate.overall 0.5493
2013.admissions.sat_scores.average.by_ope_id 1352
2013.admissions.sat_scores.average.overall 1352
2013.cost.attendance.academic_year 21980
2013.cost.attendance.program_year NaN
2013.cost.avg_net_price.public 11053
2013.cost.tuition.in_state 10650
2013.cost.tuition.out_of_state 29954
2013.completion.completion_rate_4yr_150nt 0.8219
2013.completion.completion_rate_4yr_150nt_pooled 0.806982
2013.completion.completion_rate_less_than_4yr_150nt NaN
2013.completion.completion_rate_less_than_4yr_150nt_pooled NaN
2013.completion.completion_rate_4yr_150_asian 0.8552
2013.completion.completion_rate_4yr_150_black 0.7521
2013.completion.completion_rate_4yr_150_hispanic 0.7328
2013.completion.completion_rate_4yr_150_white 0.8273
2013.student.demographics.race_ethnicity.asian 0.175
2013.student.demographics.race_ethnicity.black 0.0628
2013.student.demographics.race_ethnicity.hispanic 0.065
2013.student.demographics.race_ethnicity.non_resident_alien 0.1048
2013.student.demographics.race_ethnicity.white 0.553
2013.earnings.10_yrs_after_entry.median None
2013.earnings.6_yrs_after_entry.median None
2013.earnings.8_yrs_after_entry.median_earnings None

Geo Visualization on Student Demographics

In [9]:
Javascript(d3_choropleth_generator(d_2013, geo, ‘2013.student.demographics.race_ethnicity.white’, np.mean, ‘2013 White Student Ratio’, ‘white_student’))
Javascript(d3_choropleth_generator(d_2013, geo, ‘2013.completion.completion_rate_4yr_150nt’, np.mean, ‘2013 Mean Completion Rate’, ‘completion_rate’))

Out[9]:

Now, it would be interesting to see overall completion rate can be correlated with student ratio by different races

In [10]:
f, ax_i = plt.subplots(2,2, figsize=(8,8))

x = d_2013[‘2013.completion.completion_rate_4yr_150nt’]

def draw(ax, x, y): sns.regplot(x=x, y=d_2013[‘2013.student.demographics.race_ethnicity.%s%y], ax=ax) ax.set_xlim(0,1); ax.set_ylim(0,1); ax.set_title(y); ax.set_xlabel(‘Student Completion Rate’); ax.set_ylabel(‘Student Ratio’);

axes = [ax_i[0][0], ax_i[0][1], ax_i[1][0], ax_i[1][1]] races = [‘white’, ‘asian’, ‘black’, ‘hispanic’]

for i in range(4): draw(axes[i], x, races[i])

f.tight_layout();

Python module: https://github.com/l1990790120/l1990790120.github.io/blob/master/_includes/nb/Draw_with_US_College_Data_dep.py

comments powered by Disqus