Data analysis case study

County-Level Access to Credit in the United States

An exploratory data science project using Opportunity Insights data to study how credit scores, credit card balances, and delinquency rates vary across county-race subgroups.

Credit score and delinquency showed the clearest negative association.

Tools: Python, pandas, matplotlib, seaborn

3 Opportunity Insights datasets joined for county-level analysis

3,225+ county identifiers represented in the source CSVs

6 race group labels used to compare subgroup-level patterns

92% accuracy from the delinquency classifier trained without the raw delinquency rate as an input

Overview

County-level credit outcomes, modeled across geography and race.

The project asks whether county-level credit outcomes reveal meaningful patterns across geography, income background, and race.

Methods

From raw Opportunity Insights files to interpretable models.

Cleaned and joined data

Averaged and combined credit score, balance, and delinquency CSVs into county-level analysis tables.
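The averaging and joining step could look roughly like the sketch below. It is an illustration under assumptions, not the notebook's actual code: the column names (county, race, credit_score, credit_balance, delinquency_rate), the group labels, and all values are hypothetical, and small in-memory tables stand in for the three CSVs.

```python
import pandas as pd

# Hypothetical stand-ins for the three source CSVs.
# Real column names and values may differ.
scores = pd.DataFrame({
    "county": [1001, 1001, 1001, 1003],
    "race": ["group_a", "group_a", "group_b", "group_b"],
    "credit_score": [600.0, 620.0, 700.0, 690.0],
})
balances = pd.DataFrame({
    "county": [1001, 1001, 1003],
    "race": ["group_a", "group_b", "group_b"],
    "credit_balance": [2500.0, 4100.0, 3900.0],
})
delinquency = pd.DataFrame({
    "county": [1001, 1001, 1003],
    "race": ["group_a", "group_b", "group_b"],
    "delinquency_rate": [0.12, 0.05, 0.06],
})

keys = ["county", "race"]

def average_by_subgroup(df, value_col):
    """Collapse repeated rows to one mean value per county-race subgroup."""
    return df.groupby(keys, as_index=False)[value_col].mean()

# Inner joins keep only subgroups present in all three tables.
combined = (
    average_by_subgroup(scores, "credit_score")
    .merge(average_by_subgroup(balances, "credit_balance"), on=keys, how="inner")
    .merge(average_by_subgroup(delinquency, "delinquency_rate"), on=keys, how="inner")
)
print(combined)
```

An inner join is one possible choice here; an outer join would keep subgroups missing from one file at the cost of introducing missing values.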

Summarized distributions

Compared means, medians, ranges, and IQRs to understand spread and skew across counties.
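As a minimal sketch of this kind of summary, the snippet below computes the mean, median, range, and IQR for each column. The values are made up for illustration and the column names are assumptions, not the project's data.

```python
import pandas as pd

# Made-up values for illustration only.
df = pd.DataFrame({
    "credit_score": [610.0, 650.0, 680.0, 700.0, 720.0],
    "delinquency_rate": [0.02, 0.03, 0.04, 0.05, 0.16],
})

def spread_summary(series):
    """Return center and spread statistics for one numeric column."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    return pd.Series({
        "mean": series.mean(),
        "median": series.median(),
        "range": series.max() - series.min(),
        "iqr": q3 - q1,
    })

# One column of statistics per input column.
summary = df.apply(spread_summary)
print(summary)
```

Comparing the mean with the median is a quick skew check: in the made-up delinquency column, one large value pulls the mean above the median, which indicates right skew.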

Mapped associations

Used scatter plots and correlations to evaluate relationships across score, balance, and delinquency.
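The correlation side of this step can be sketched with a pairwise correlation matrix. The data below is invented, and only the score-related signs are chosen to mirror the directions described in the findings; the column names are assumptions.

```python
import pandas as pd

# Invented values; only the direction of the score relationships is meant
# to resemble the write-up, not the magnitudes.
df = pd.DataFrame({
    "credit_score": [600.0, 640.0, 670.0, 700.0, 730.0],
    "credit_balance": [2000.0, 2600.0, 2500.0, 3400.0, 3900.0],
    "delinquency_rate": [0.14, 0.10, 0.09, 0.05, 0.03],
})

# DataFrame.corr() computes Pearson correlations by default.
corr = df.corr()
print(corr.round(2))
```

Pearson correlation only captures linear association, so a scatter plot (for example seaborn.scatterplot with a hue column for race group) remains useful for spotting curved or clustered patterns.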

Built classifiers

Created decision tree models to classify high versus low delinquency, then tested a harder predictor set that excluded the delinquency rate itself.

Evidence

Visual evidence linking credit scores, delinquency, balances, and subgroup patterns.

Credit score versus delinquency scatter plot

Credit score and delinquency had the strongest relationship.

The main visual shows a clear downward slope: higher average credit score is associated with lower average delinquency rate.

Credit balance versus credit score scatter plot

Balance and score moved together.

The association was positive, suggesting higher credit balances often appeared alongside higher average credit scores.

Credit balance versus delinquency scatter plot

Balance and delinquency were weaker but still patterned.

The relationship was less linear, but the direction helped motivate the later classification model.

Small decision tree using delinquency rate as a direct split

The first tree was intentionally simple.

This baseline model split directly on delinquency rate, making it useful for explaining the mechanics of a decision tree but less useful as a real predictor.
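To see why such a baseline is trivially accurate, consider the sketch below. It assumes scikit-learn's DecisionTreeClassifier and assumes the high/low label is defined by a median threshold on the delinquency rate; neither detail is confirmed by the write-up, and the data is made up.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Made-up values for illustration only.
df = pd.DataFrame({
    "credit_score": [600.0, 640.0, 670.0, 700.0, 730.0, 750.0],
    "delinquency_rate": [0.14, 0.10, 0.09, 0.05, 0.03, 0.02],
})

# Assumed labeling rule: "high" means above the median delinquency rate.
threshold = df["delinquency_rate"].median()
df["high_delinquency"] = (df["delinquency_rate"] > threshold).astype(int)

# The only feature is the same column the label was derived from,
# so a single split can separate the classes.
leaky_tree = DecisionTreeClassifier(max_depth=1, random_state=0)
leaky_tree.fit(df[["delinquency_rate"]], df["high_delinquency"])

accuracy = leaky_tree.score(df[["delinquency_rate"]], df["high_delinquency"])
print(f"training accuracy with the leaked feature: {accuracy:.2f}")
```

Because the label is a deterministic function of the feature, the tree recovers the threshold rather than learning anything about credit behavior.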

More detailed decision tree using credit score, credit balance, and county

The second tree made the task more realistic.

After removing raw delinquency from the inputs, the model had to infer high or low delinquency from credit score, credit balance, and county features.
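A leak-free setup along these lines might look like the following sketch. Everything in it is an assumption for illustration: the data is synthetic, the median-threshold label, one-hot county encoding, tree depth, and 75/25 split are hypothetical choices, and scikit-learn is assumed as the modeling library.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data: delinquency falls as credit score rises, plus noise.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "county": rng.choice(["c1", "c2", "c3", "c4"], size=n),
    "credit_score": rng.normal(680, 40, size=n),
    "credit_balance": rng.normal(3000, 800, size=n),
})
noise = rng.normal(0, 0.01, size=n)
df["delinquency_rate"] = (0.5 - 0.0006 * df["credit_score"] + noise).clip(lower=0)

# Assumed label: above-median delinquency counts as "high".
y = (df["delinquency_rate"] > df["delinquency_rate"].median()).astype(int)

# Predictors exclude delinquency_rate; county is one-hot encoded.
X = pd.get_dummies(
    df[["credit_score", "credit_balance", "county"]],
    columns=["county"],
    dtype=float,
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

tree = DecisionTreeClassifier(max_depth=4, random_state=0)
tree.fit(X_train, y_train)

# Accuracy on held-out rows, not on the rows used for fitting.
accuracy = tree.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Scoring on a held-out split matters here: with the target removed from the inputs, the accuracy reflects how well score, balance, and county predict the label on rows the tree has not seen.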

Delinquency was tightly linked to score.

Across county-race subgroups, higher delinquency was strongly associated with lower credit scores.

Race added context to the scatter plots.

Subgroup coloring revealed visible clustering patterns that could motivate deeper work on systemic credit access disparities.

The model still worked without the target leak.

After removing raw delinquency from the predictors, the classifier reached 92% accuracy using score, balance, and county features.

Artifacts

Source materials and reproducible analysis artifacts.

PDF: Final project writeup. Research framing, methods, findings, and reflection.

Notebook: Analysis code. pandas workflow, charts, and decision tree model.

CSV: Credit score data. County-race subgroup score observations.

CSV: Delinquency data. County-race subgroup delinquency observations.

Reflection

What this project taught me about data, credit access, and model design.

The analysis showed how financial outcomes can look different once data is grouped by place and race. It also reinforced an important modeling lesson: a high accuracy score only matters when the features are chosen carefully and the model is not simply learning the answer directly.