Inferring Personality From Social Media Activity Using Large Language Models: Cross‐Model Agreement, Temporal Stability, and Convergent Validity With Self‐Reports
Marengo, Davide; Settanni, Michele
2025-01-01
Abstract
Introduction: Large language models (LLMs) offer a promising approach to inferring personality traits unobtrusively from digital footprints. However, the reliability and validity of these inferences remain underexplored. Method: Gemini 1.5 Pro and GPT-4o were used to infer Big Five traits from 2 years of Facebook posts by 1214 Italian users. Predictions were compared with self-reports on the Ten-Item Personality Inventory. Results: Compared with self-report means, LLM predictions underestimated Agreeableness and Conscientiousness and overestimated Extraversion, whereas predictions for Neuroticism and Openness closely aligned with self-report means. Across repeated prompting, Gemini 1.5 Pro inferences showed less variability than GPT-4o inferences, and both models achieved excellent reliability when inferences were aggregated. Temporal stability was highest when predictions were combined across LLMs, with test–retest correlations over 2 years ranging from 0.44 for Conscientiousness to 0.60 for Openness. Cross-LLM agreement was highest when inferences from multiple time points were combined, with correlations ranging from 0.58 for Neuroticism to 0.83 for Extraversion. Correlations with self-reports were modest. When LLM inferences were combined across LLMs and time points, they reached 0.27 for Extraversion, 0.24 for Agreeableness, 0.23 for Conscientiousness, 0.18 for Neuroticism, and 0.31 for Openness. Conclusion: These findings advance understanding of LLMs' potential for personality inference and highlight the importance of aggregating inferences to enhance the reliability and validity of such assessments.
Full text: Journal of Personality - 2025 - Marengo - Inferring Personality From Social Media Activity Using Large Language Models .pdf (publisher's PDF, open access, 805.19 kB, Adobe PDF)