Quantitative Analysis
Measuring Inclusive Teaching Signals Using LLM-Assisted Syllabus Analysis
Summary
I led a quantitative research project examining how instructors communicate inclusive teaching practices through their course syllabi. Using an 88-item evidence-based inventory and a large-language-model (LLM) analysis pipeline, I quantified inclusion signals across 30 syllabi and linked them to students' perceptions of instructor considerateness. Specifically, I:
Developed an automated LLM-based coding pipeline (few-shot + chain-of-thought prompting) to classify 88 inclusive practices with substantial human–LLM agreement (κ = 0.617).
Engineered a composite Inclusive Practice Score (IPS) and ran statistical and text analyses (regression, PCA, sentiment, readability).
Found a significant relationship: more inclusive syllabi → higher perceived instructor considerateness (p < .01).
Created visualizations (Python Matplotlib, Seaborn) to surface actionable insights across 8 pedagogical dimensions.
Translated this into UX value by demonstrating a scalable, objective method to assess complex human-centered content using LLMs and quantitative analysis.
The analysis revealed strong engagement and transparency signals across disciplines, but also consistent blind spots in belonging cues and diversity representation—areas shown to matter for student success. Insights from the project informed practical improvement recommendations and showed how AI can support equitable teaching practices at scale.
The Problem
Although inclusive teaching has been emphasized for years, institutions still lack reliable and scalable ways to measure whether instructors actually implement inclusive practices. Most instructors self-report that they teach inclusively, but:
inclusive teaching practices are often invisible in classroom materials,
existing rubrics require manual review, which does not scale,
and there is no quantitative method to analyze inclusion across large numbers of courses.
As a result, departments cannot easily compare teaching practices, track improvement, or identify which inclusive strategies meaningfully shape student experience.
My project addresses this long-standing gap, a wicked measurement problem, by creating an automated, LLM-driven approach that quantifies inclusive practices in syllabi using the Inclusive Syllabi Inventory (ISI). This allows us to examine real patterns in how instructors communicate inclusion and to test whether inclusive language actually influences how students perceive their instructor.
For this project, the problem translates into two research questions:
What patterns of inclusive teaching practices emerge across inclusive instructors?
How does inclusivity shape students' impressions of the instructor?
Goal
Build and validate a scalable, semi-automated pipeline that uses LLMs to:
Identify inclusive teaching practices embedded in syllabus text.
Quantify those practices across eight research-based pedagogical dimensions (e.g., Engagement Opportunities, Growth Mindset, Cultivating Belonging).
Visualize patterns and gaps in a way that surfaces improvement opportunities.
Assess downstream impact—whether more inclusive syllabi are associated with more positive perceptions of the instructor.
Research Process
Data Collection
30 course syllabi volunteered by instructors across 8 disciplines.
Each document treated as a "product touchpoint" conveying expectations, tone, and support resources.
Coding Framework
Adapted the Inclusive Syllabi Inventory (ISI) into 88 binary yes/no items.
Items mapped to 8 behavioral dimensions aligned with instructional psychology (e.g., transparency, resources, exploration).
Constructed a composite Inclusive Practice Score (IPS) per syllabus from the item-level codes (a minimal sketch follows this list).
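A minimal sketch of how the item codes roll up into scores, assuming the codes are stored as a 0/1 matrix with one row per syllabus; the file and column names here are illustrative, not the project's actual artifacts:

```python
import pandas as pd

# Illustrative inputs: a 30 x 88 matrix of yes/no (1/0) item codes, plus a
# lookup mapping each ISI item to one of the 8 pedagogical dimensions.
item_codes = pd.read_csv("isi_item_codes.csv", index_col="syllabus_id")
item_dims = pd.read_csv("isi_item_dimensions.csv", index_col="item")["dimension"]

# Inclusive Practice Score (IPS): percentage of the 88 items marked "yes" per syllabus.
ips = item_codes.mean(axis=1) * 100

# Dimension-level subscores: share of items present within each dimension, per syllabus.
dim_scores = item_codes.T.groupby(item_dims).mean().T * 100

print(ips.describe())  # in this project, IPS ranged from roughly 32% to 78%
```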
LLM-Based Analysis Pipeline
Used GPT-4o-mini for yes/no classification (a simplified sketch appears below) via:
Few-shot prompting
Chain-of-thought reasoning
Prompt optimization to increase classification accuracy
Human–LLM inter-rater reliability: Cohen's κ = 0.617 (substantial agreement by common benchmarks; acceptable for exploratory quantitative research).
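A simplified sketch of the classification step using the OpenAI Python client; the actual few-shot examples, item wording, and prompt optimizations are omitted, and the prompt text and helper names below are illustrative:

```python
from openai import OpenAI
from sklearn.metrics import cohen_kappa_score

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def classify_item(syllabus_text: str, item: str) -> int:
    """Ask the model whether one ISI item is present in the syllabus (1 = yes, 0 = no)."""
    prompt = (
        "You are coding a course syllabus against an inclusive-teaching inventory.\n"
        "Think step by step about whether the practice below appears in the syllabus,\n"
        "then answer with exactly YES or NO on the final line.\n\n"
        f"Practice: {item}\n\nSyllabus:\n{syllabus_text}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        messages=[{"role": "user", "content": prompt}],
    )
    final_line = response.choices[0].message.content.strip().splitlines()[-1].upper()
    return 1 if "YES" in final_line else 0

# Reliability check against a human-coded subset (parallel lists of 0/1 labels):
# kappa = cohen_kappa_score(human_codes, llm_codes)   # ~0.617 in this project
```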
Text Quality & Sentiment Analysis
Computed additional document-level metrics (a minimal sketch follows this list):
Readability indicators
Flesch Reading Ease
SMOG Grade Level
Sentiment & tone measures
VADER
TextBlob polarity & subjectivity
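A minimal sketch of these document-level metrics; the textstat, vaderSentiment, and TextBlob packages are assumed here, though any equivalent implementation of the same indicators would do:

```python
import textstat
from textblob import TextBlob
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

def document_metrics(text: str) -> dict:
    """Readability and sentiment indicators for a single syllabus."""
    vader = SentimentIntensityAnalyzer()
    blob = TextBlob(text)
    return {
        "flesch_reading_ease": textstat.flesch_reading_ease(text),  # higher = easier to read
        "smog_grade": textstat.smog_index(text),                    # estimated U.S. grade level
        "vader_compound": vader.polarity_scores(text)["compound"],  # -1 (negative) to +1 (positive)
        "textblob_polarity": blob.sentiment.polarity,
        "textblob_subjectivity": blob.sentiment.subjectivity,
    }

# Example: metrics = document_metrics(open("syllabus_01.txt").read())
```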
Statistical Modeling
Ran multiple regressions to test whether inclusive practices predict positive instructor impressions.
Instructor impressions were LLM-rated (Accessible, Generous, Lenient, Tries-to-Engage, Serious).
PCA and Cronbach's alpha were used to derive a latent Considerateness metric (a modeling sketch follows this list).
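A sketch of the modeling step with statsmodels and scikit-learn, assuming the latent Considerateness score is the first principal component of the five LLM-rated impression items; column and file names are illustrative, and Cronbach's alpha is computed from its standard formula:

```python
import pandas as pd
import statsmodels.api as sm
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a syllabus-by-item rating matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Illustrative input: one row per syllabus with IPS, length/readability metrics,
# and the five LLM-rated impression items.
df = pd.read_csv("syllabus_features.csv")
impression_cols = ["accessible", "generous", "lenient", "tries_to_engage", "serious"]

alpha = cronbach_alpha(df[impression_cols])  # internal consistency of the impression ratings

# Latent Considerateness: first principal component of the standardized impression ratings.
scaled = StandardScaler().fit_transform(df[impression_cols])
df["considerateness"] = PCA(n_components=1).fit_transform(scaled).ravel()

# Does IPS predict Considerateness, controlling for length and readability?
X = sm.add_constant(df[["ips", "word_count", "smog_grade", "flesch_reading_ease"]])
model = sm.OLS(df["considerateness"], X).fit()
print(model.summary())
```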
Visualization (exploratory data storytelling)
Built visualizations with Seaborn and Matplotlib (a minimal example follows this list):
Category-level bar charts
Full-hierarchy sunburst charts
Simplified per-category views
Designed to help users (instructors) interpret signals and pinpoint gaps.
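A minimal Seaborn/Matplotlib example for the category-level bar chart (the sunburst views are not reproduced here); the input file and column names are illustrative:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Illustrative input: one row per syllabus, one column per pedagogical dimension,
# holding the percentage of that dimension's items present.
dim_scores = pd.read_csv("dimension_scores.csv", index_col="syllabus_id")

mean_by_dim = dim_scores.mean().sort_values().reset_index()
mean_by_dim.columns = ["Dimension", "Mean % of items present"]

plt.figure(figsize=(8, 4))
sns.barplot(data=mean_by_dim, x="Mean % of items present", y="Dimension", color="#4c72b0")
plt.title("Inclusive practice signals by pedagogical dimension (n = 30 syllabi)")
plt.tight_layout()
plt.savefig("dimension_bar_chart.png", dpi=200)
```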
Findings & Insights
Inclusive Practices Vary Widely Across Courses
Across 30 syllabi, Inclusive Practice Scores (IPS) ranged from ~32% to ~78%.
This means most instructors demonstrate some inclusive behaviors, but the depth and consistency vary by individual, not discipline.
Implication:
Inclusion-supportive signals are not embedded systematically; instructors need clearer frameworks and support.
Some Inclusion Dimensions Are Strongly Represented; Others Are Missing
High-frequency dimensions:
Engagement Opportunities
Exploration
Requirement Transparency
Growth Mindset signals
Low-frequency dimension:
Diversity Scholarship (almost entirely absent)
Implication:
Inclusion “hot spots” and “blind spots” exist. Any toolkit or intervention should prioritize the gaps that matter most for students (belonging, representation).
More Inclusive Syllabi Are Perceived as More “Considerate”
Regression showed IPS significantly predicts Considerateness ratings (LLM-coded):
A 10-point increase in IPS → ~0.19 increase in Considerateness (p < .002)
Other predictors (word count, SMOG, Flesch score) were weaker or inconsistent.
Implication for design:
Inclusive language and structure actually shape early student impressions of the instructor—before class even begins.
Inclusion Isn’t Discipline-Dependent
STEM vs. Humanities differences were small.
High-inclusivity and low-inclusivity syllabi existed in all fields.
Implication:
Interventions can be standardized across departments rather than discipline-specific.
Outcome & Impact
Provided a Scalable Way to Audit Course Inclusivity
Manual review of 30 syllabi would require 40–60+ hours.
The LLM-based pipeline produced quantitative findings in minutes, enabling curriculum leaders to audit large departments at scale.
Demonstrated that Inclusive Practices Are Linked to Perceived Instructor Considerateness
This identifies an actionable behavioral outcome:
→ More inclusive syllabi → instructor perceived as more considerate → a plausible path to stronger student engagement.
This creates a business case for adoption.
Produced Evidence-Based Recommendations for Instructor Professional Development
Insights highlighted three high-impact improvements:
Increase belonging cues
Reduce reading complexity
Add diversity scholarship signals
These recommendations can inform faculty development programs or AI-supported instructor support tools.
Created a Framework That Can Generalize to Other Educational Content
The approach can extend to:
assignment sheets
discussion prompts
LMS pages
lecture slides
This opens a new direction for data-driven teaching analytics.
What I learned
Through this project, I learned how to take an ambiguous, human-centered problem—“how do we measure inclusive teaching?”—and translate it into a rigorous, scalable analytical pipeline. I deepened my ability to design measurable constructs, engineer features from unstructured text, and validate automated coding using inter-rater reliability.
I strengthened my quantitative toolkit by applying exploratory data analysis, readability metrics, sentiment analysis, PCA, and regression modeling to uncover meaningful behavioral patterns.
Most importantly, I learned how to turn complex statistical findings into clear, actionable insights for nontechnical stakeholders, and how to prototype an AI-supported evaluation tool that demonstrates both technical execution and product thinking.
