University of California Scorecard

Matt Gabor


     With over 4,000 degree-granting universities in the United States, it can be difficult for students to determine which college is the best fit. Fortunately, there is a vast amount of data available that can help make the choice a little less daunting. In the following data story, I examine what I believe to be the most important factors in the college decision-making process. As a student at the University of California, Davis, I decided to focus on the UC system for this analysis, although these techniques could be applied to any set of institutions.

Question 1:

How do the UC schools compare with other universities in the US? Is there a correlation between median earnings after graduation and college acceptance rates?

Scatter plot of college acceptance rates vs. median earnings of students working 10 years after graduating throughout US universities.
      Upon initial inspection, the College Scorecard dataset seemed overwhelming and confusing. It contains thousands of attributes with minimal explanations, so I wanted to keep my first graph simple, using data I knew was important in the college selection process. I also knew that studying the University of California data would be intriguing, since I attend one of the UC schools and am familiar with most of the other campuses. I chose to look at the correlation between acceptance rates and median earnings because these factors influenced my own college application process, and because many students attend college to expand their employment opportunities.
     The plot shows a strong relationship between these two metrics: schools with lower acceptance rates tend to report higher median earnings. Once I realized I could compare the UC schools with other US colleges on these metrics, I had to decide which chart would display the data most effectively. I chose a scatter plot because it shows the relationship between two numeric variables clearly, one point per school. After constructing the chart, I saw that the UC schools tend to have low acceptance rates and high post-college earnings. However, the UCs fall into three clusters on this scale, so I wanted to dive deeper into the differences between the campuses.
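     To put a number on a relationship like the one in the scatter plot, one option is the Pearson correlation coefficient. The following is a minimal sketch, not the method used to build the chart. The function name pearson_r is hypothetical, and the sample values are made up for illustration; they are not taken from the College Scorecard.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Made-up illustrative values, NOT real College Scorecard data.
acceptance_rates = [0.15, 0.30, 0.45, 0.60, 0.75, 0.90]
median_earnings = [72000, 61000, 55000, 47000, 44000, 39000]

r = pearson_r(acceptance_rates, median_earnings)
print(f"Pearson r = {r:.2f}")
```

     A value near -1 would mean that earnings fall steadily as acceptance rates rise, a value near 0 would mean no linear relationship, and a value near +1 would mean the two rise together.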

Question 2:

Which majors does each of the UCs specialize in?

Pie charts showing percentage of degrees awarded by major at each UC campus
      In addition to acceptance rate and earnings, major distribution is an important factor in the college selection process. A major that accounts for a large percentage of a campus's degrees generally indicates a strong program in that field. A student deciding which campus to attend could use this visualization to determine where their interests are best represented.
      I chose pie charts to represent this data because they are an effective way to show proportions and compare categories. For example, a student considering a career in Business/Marketing may want to look at UC Riverside. Similarly, a student who wanted to compare the size of the Biological Sciences departments at UC Davis and UC Santa Barbara could quickly examine the amount of purple in each campus's pie chart. After constructing this visualization, I decided to look into another differentiator between the UC campuses: student debt.
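     The proportions behind a pie chart are simple to compute. The sketch below is an illustration only: the function name degree_shares, the campus labels, and the degree counts are all hypothetical and do not come from the College Scorecard. It converts raw degree counts into percentages and compares one major across two campuses.

```python
def degree_shares(counts):
    """Convert a {major: degrees awarded} mapping into {major: percent of total}."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("total degrees awarded must be positive")
    return {major: 100.0 * n / total for major, n in counts.items()}


# Hypothetical campuses and made-up counts, for illustration only.
campus_a = {
    "Biological Sciences": 300,
    "Business/Marketing": 100,
    "Engineering": 200,
    "Other": 400,
}
campus_b = {
    "Biological Sciences": 150,
    "Business/Marketing": 250,
    "Engineering": 100,
    "Other": 500,
}

shares_a = degree_shares(campus_a)
shares_b = degree_shares(campus_b)

# Compare a single major across the two campuses, as a reader would
# by comparing the same colored slice in two pie charts.
major = "Biological Sciences"
print(f"{major}: Campus A {shares_a[major]:.1f}%, Campus B {shares_b[major]:.1f}%")

# The largest slice on a campus is its most common major.
top_major_a = max(shares_a, key=shares_a.get)
print(f"Largest share at Campus A: {top_major_a}")
```

     Working with percentages rather than raw counts is what makes campuses of different sizes comparable.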

Question 3:

How has student debt changed over time at each UC campus?

Line chart comparing Median Debt after Graduation amongst the UC Campuses over the past 20 years
     Since the College Scorecard dataset also contains historical data from the past two decades, I started to look for trends that might offer insight. One of the data points consistently recorded for the UC campuses was median debt after graduation. The most interesting thing I learned from this chart was how drastically the campuses' debt levels changed relative to one another.
     UC Berkeley, for example, had the highest median debt at $11,000 when the data was first collected in 1996. By 2015, its median debt had risen only slightly to $13,597, with a sharp increase between 2008 and 2010 (most likely due to the recession). UC Santa Cruz, on the other hand, had the lowest median debt among the campuses in 1996 at $9,055, but reached an astonishing $16,500 in 2015. The line chart is an effective tool for showing how these values change over time, with each line representing one of the UC campuses.
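     To make the comparison concrete, the change at each campus can be expressed as a percentage of its 1996 value. This is a small worked sketch using only the four figures quoted above; the function name percent_change is hypothetical, and the figures are not adjusted for inflation.

```python
def percent_change(start, end):
    """Return the change from start to end as a percentage of start."""
    if start == 0:
        raise ValueError("percent change is undefined for a starting value of 0")
    return (end - start) / start * 100.0


# (1996 median debt, 2015 median debt) in dollars, as quoted in the text.
median_debt = {
    "UC Berkeley": (11000, 13597),
    "UC Santa Cruz": (9055, 16500),
}

for campus, (debt_1996, debt_2015) in median_debt.items():
    change = percent_change(debt_1996, debt_2015)
    print(f"{campus}: {change:+.1f}%")
```

     By this measure, median debt rose by about 23.6% at UC Berkeley and about 82.2% at UC Santa Cruz over the same period.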


     Through this exploration, I learned some interesting information about the UC campuses. On the whole, the UC schools foster high-quality education and produce some of the highest-earning graduates in the US. While they all share the same prefix, the nine campuses differ drastically in their major offerings, class size, acceptance rates, student debt, and more. Although limited, these visualizations highlight some of these differences and show how unique each campus really is.
     Examining and charting the College Scorecard data was both frustrating and inspiring. Due to my limited experience in data manipulation and web development, I struggled to find meaning in the data and, more importantly, to figure out how to show that meaning effectively. However, I found it helpful to narrow in on a subset of the data and perform my analysis in a concentrated space. As a side note, readers may notice that some colleges are missing from the dataset, including one of the UC campuses. This is due to privacy restrictions on university data at certain campuses, and more information can be found on the College Scorecard's website linked below.