Psychometric analysis of the 50-item IPIP Big Five Factor Markers questionnaire in R, including item statistics, factor analysis, and reliability estimation.
This project uses the Big Five Personality Test dataset from Kaggle (~1M responses to the 50-item IPIP Big Five questionnaire).
- Install R (>= 4.0) from r-project.org
- Download the data from Kaggle and place `data-final.csv` into the `src_data/` folder
- Install packages by running `setup.R`: `source("setup.R")`

Alternatively, download the data via the Kaggle CLI:

    kaggle datasets download -d tunguz/big-five-personality-test -p src_data/ --unzip

The scripts are meant to be run in order. Each script sources its dependencies automatically.

| Script | Content |
|---|---|
| `01_data_preparation.R` | Load data, define item metadata, recode negatively keyed items, compute scale scores |
| `02_item_analysis.R` | Item descriptives, item histograms, corrected item-total correlations, difficulty-discrimination plots |
| `03_scale_analysis.R` | Dimension descriptives and histograms, scale intercorrelations, exploratory factor analysis (heatmap, network), parallel analysis, reliability |
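
Two of the statistics named in the table, corrected item-total correlations and reliability, can be illustrated with a minimal base-R sketch. It runs on simulated data rather than the Kaggle responses, and the object names (`items`, `r_it`, `alpha`) are made up for this example. Since `psych` is among the project's packages, the scripts may well use its functions instead; this only shows what the quantities mean.

```r
set.seed(1)

# Simulated data (not the Kaggle responses): 200 respondents, 5 items
# driven by one common factor plus independent noise
n <- 200
factor_score <- rnorm(n)
items <- sapply(1:5, function(i) factor_score + rnorm(n))
colnames(items) <- paste0("item", 1:5)

# Corrected item-total correlation: correlate each item with the sum of
# the remaining items, so the item does not inflate its own total
total <- rowSums(items)
r_it <- sapply(colnames(items), function(j) {
  cor(items[, j], total - items[, j])
})

# Cronbach's alpha from the item variances and the total-score variance
k <- ncol(items)
alpha <- (k / (k - 1)) * (1 - sum(apply(items, 2, var)) / var(total))

print(round(r_it, 2))
print(round(alpha, 2))
```

Subtracting the item from the total before correlating is what makes the correlation "corrected"; without it, short scales show inflated item-total values.
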
| Code | Dimension | Items |
|---|---|---|
| EXT | Extraversion | EXT1 -- EXT10 |
| EST | Emotional Stability | EST1 -- EST10 |
| AGR | Agreeableness | AGR1 -- AGR10 |
| CSN | Conscientiousness | CSN1 -- CSN10 |
| OPN | Openness | OPN1 -- OPN10 |
Negatively keyed items are recoded in 01_data_preparation.R.
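
As a rough illustration of what recoding means here, the sketch below reverse-scores one item and computes a scale score on toy data. It assumes a 1-5 response scale and a mean-based scale score; the values and the choice of which item counts as negatively keyed are hypothetical, and the actual key and scoring are defined in `01_data_preparation.R`.

```r
# Toy data: 3 respondents, 2 items on a 1-5 Likert scale (values are made up)
responses <- data.frame(
  EXT1 = c(5, 2, 4),
  EXT2 = c(1, 4, 2)
)

# Hypothetical key for this toy example only
negative_items <- c("EXT2")

# Reverse-score: 1 <-> 5, 2 <-> 4, 3 stays 3
scale_min <- 1
scale_max <- 5
recoded <- responses
recoded[negative_items] <- (scale_min + scale_max) - responses[negative_items]

# Scale score as the per-respondent mean of the recoded items
# (a sum score would be an equally common choice)
recoded$EXT <- rowMeans(recoded[c("EXT1", "EXT2")])

print(recoded)
```

After recoding, a high value on every item points in the same direction as the dimension, which is what item-total correlations and reliability estimates require.
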
tidyr, dplyr, moments, ggplot2, ggrepel, scales, Hmisc, psych, nFactors, qgraph
Install all at once via `setup.R` or manually:

    install.packages(c("tidyr", "dplyr", "moments", "ggplot2", "ggrepel",
                       "scales", "Hmisc", "psych", "nFactors", "qgraph"))

The IPIP items are in the public domain (ipip.ori.org). The Kaggle dataset was collected by Open Psychometrics.




