Inferring Personality From Social Media Activity Using Large Language Models: Cross‐Model Agreement, Temporal Stability, and Convergent Validity With Self‐Reports

Marengo, Davide; Montag, Christian; Settanni, Michele

doi:10.1111/jopy.70019

Introduction: Large language models (LLMs) offer a promising approach to inferring personality traits unobtrusively from digital footprints. However, the reliability and validity of these inferences remain underexplored. Method: Gemini 1.5 Pro and GPT-4o were used to infer Big Five traits from 2 years of Facebook posts by 1214 Italian users. Predictions were compared to self-reports on the Ten-Item Personality Inventory. Results: LLM predictions underestimated Agreeableness and Conscientiousness and overestimated Extraversion, whereas predictions for Neuroticism and Openness closely aligned with self-report means. On repeated prompting, Gemini 1.5 Pro inferences showed less variability than GPT-4o inferences, with both models achieving excellent reliability when inferences were aggregated. Temporal stability was highest when combining predictions across LLMs, with test–retest correlations over 2 years ranging from 0.44 for Conscientiousness to 0.60 for Openness. Cross-LLM agreement was highest when combining inferences from multiple time points, with correlations ranging from 0.58 for Neuroticism to 0.83 for Extraversion. Correlations with self-reports were modest, reaching 0.27 for Extraversion, 0.24 for Agreeableness, 0.23 for Conscientiousness, 0.18 for Neuroticism, and 0.31 for Openness when combining inferences across LLMs and time points. Conclusion: These findings advance understanding of LLMs' potential for personality inference, highlighting the importance of aggregating inferences to enhance the reliability and validity of such assessments.
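
The aggregation step described in the abstract can be illustrated with a minimal sketch. It assumes that "combining" inferences means taking a simple per-user average across sources (repeated prompts, models, or time points) and that agreement is measured with a Pearson correlation; the abstract does not specify either choice. The function names `aggregate` and `pearson` are hypothetical, and the numbers are invented toy values, not data from the study.

```python
from math import sqrt
from statistics import mean


def aggregate(scores_by_source):
    """Average each user's trait score across sources.

    Each inner list holds one source's scores (e.g. one model or one
    time point), ordered by user. Simple averaging is an assumption.
    """
    n_users = len(scores_by_source[0])
    return [mean(src[i] for src in scores_by_source) for i in range(n_users)]


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Toy values invented for illustration only; NOT data from the study.
self_report = [3.0, 5.5, 4.0, 6.5, 2.5]  # hypothetical self-report scores
model_a = [3.5, 4.0, 4.5, 5.0, 3.0]      # hypothetical inferences, model A
model_b = [2.5, 5.0, 3.0, 5.5, 3.5]      # hypothetical inferences, model B

combined = aggregate([model_a, model_b])

print("model A vs self-report: ", round(pearson(model_a, self_report), 2))
print("model B vs self-report: ", round(pearson(model_b, self_report), 2))
print("combined vs self-report:", round(pearson(combined, self_report), 2))
print("cross-model agreement:  ", round(pearson(model_a, model_b), 2))
```

The same two helpers would cover the other comparisons named in the abstract: test-retest stability corresponds to correlating scores from two time points, and cross-LLM agreement to correlating the two models' scores.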