A framework and toolkit that lets anyone take a persona and produce its full SCOPE counterpart by adding sociopsychological facets from a small set of basic inputs.
This repository prioritizes persona augmentation: given a persona (e.g., a JSONL record with basic demographic and background details), the pipeline generates a structured SCOPE profile with responses to a 141-item sociopsychological questionnaire. The result is a richer, standardized persona representation that can be used for user simulation, evaluation, and bias analysis.
SCOPE is a human-grounded framework for constructing and evaluating synthetic personas. Unlike demographic-only or summary-based personas, SCOPE models personas as multidimensional sociopsychological profiles spanning eight behavioral facets:
- Demographic Information - Basic background and identity attributes
- Sociodemographic Behavior - Digital habits, civic engagement, lifestyle
- Personal Values & Motivations - Core beliefs and life goals
- Personality Traits (Big Five) - Extraversion, agreeableness, conscientiousness, neuroticism, openness
- Behavioral Patterns & Preferences - Daily habits and social tendencies
- Personal Identity & Life Narratives - Life stories and self-perception
- Professional Identity & Career - Work experiences and career aspirations
- Creativity & Innovation - Creative expression and innovative thinking
Our research demonstrates that:
- Demographics alone are insufficient: Demographic similarity explains only ~1.5% of variance in human response similarity
- Sociopsychological grounding improves alignment: Adding values, identity, and personality facets improves behavioral prediction
- Non-demographic personas reduce bias: Personas based on values and identity alone achieve strong alignment with substantially lower demographic bias accentuation
scope-personas/
├── src/
│ ├── augment_persona.py # Generic persona augmentation pipeline
│ ├── process_nemotron.py # Process NVIDIA Nemotron personas
│ └── utils.py # Shared utilities
├── data/
│ ├── questionnaire.md # 141-item sociopsychological protocol
│ └── facet_definitions.json # Facet structure definitions
├── examples/
│ ├── sample_personas.jsonl # Example input personas
│ └── sample_output.jsonl # Example augmented output
├── requirements.txt
└── README.md
# Clone the repository
git clone https://github.com/your-org/scope-personas.git
cd scope-personas
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
Set your LLM API credentials as environment variables:
# For OpenAI
export OPENAI_API_KEY="your-api-key"
export OPENAI_BASE_URL="https://api.openai.com/v1" # Optional: custom endpoint
# Or for other providers (Azure, local models, etc.)
export LLM_API_KEY="your-api-key"
export LLM_BASE_URL="your-endpoint-url"
Augment any persona with SCOPE questionnaire responses:
python src/augment_persona.py \
--input examples/sample_personas.jsonl \
--output augmented_personas.jsonl \
--model gpt-4o \
--facets all
Input Format (JSONL):
{"id": "persona_001", "name": "Alex", "age": 28, "occupation": "Software Engineer", "background": "..."}
{"id": "persona_002", "name": "Maria", "age": 45, "occupation": "Teacher", "background": "..."}
Output Format (JSONL):
{
  "id": "persona_001",
  "original_persona": {...},
  "facet_responses": {
    "demographic_information": {"Q1": "25 - 29", "Q2": "Male", ...},
    "personal_values": {"Q51": "4", "Q52": "2", ...},
    ...
  }
}
Process personas from the NVIDIA Nemotron-Personas-USA dataset:
python src/process_nemotron.py \
--output nemotron_augmented.jsonl \
--limit 100 \
--model gpt-4o \
--batch-size 50
| Option | Description | Default |
|---|---|---|
| --input | Input JSONL file with personas | Required |
| --output | Output JSONL file path | augmented_personas.jsonl |
| --model | LLM model to use | gpt-4o |
| --facets | Facets to generate (comma-separated or 'all') | all |
| --temperature | LLM temperature | 0.3 |
| --batch-size | Concurrent requests per batch | 50 |
| --limit | Max personas to process | None (all) |
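Once a run finishes, the output file can be inspected with a few lines of Python. The sketch below assumes each line of the output is a single JSON object carrying the `id` and `facet_responses` keys shown in the output format above. The helper name `flatten_responses` and the inline sample record are illustrative only and are not part of the repository.

```python
import json
from typing import Dict, Iterable, Iterator


def flatten_responses(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one flat row per (persona, facet, question) from output JSONL lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        # Assumed shape: {"id": ..., "facet_responses": {facet: {question: answer}}}
        for facet, answers in record.get("facet_responses", {}).items():
            for question, answer in answers.items():
                yield {
                    "id": record["id"],
                    "facet": facet,
                    "question": question,
                    "answer": answer,
                }


# Hypothetical record shaped like the documented output (values are made up).
sample = json.dumps({
    "id": "persona_001",
    "original_persona": {"name": "Alex"},
    "facet_responses": {
        "demographic_information": {"Q1": "25 - 29"},
        "personal_values": {"Q51": "4", "Q52": "2"},
    },
})

rows = list(flatten_responses([sample]))
for row in rows:
    print(row["id"], row["facet"], row["question"], row["answer"])
```

To read a real file, pass an open file handle instead of a list, for example `with open("augmented_personas.jsonl") as f: rows = list(flatten_responses(f))`.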
Use --facets to specify which facets to generate:
- demographics - Demographic Information (Q1-Q13)
- sociodemographic - Sociodemographic Behavior (Q14-Q50)
- values - Personal Values & Motivations (Q51-Q72)
- personality - Personality Traits/Big Five (Q73-Q102)
- behavioral - Behavioral Patterns & Preferences (Q103-Q115)
- identity - Personal Identity & Life Narratives (Q116-Q125)
- professional - Professional Identity & Career (Q126-Q135)
- creativity - Creativity & Innovation (Q136-Q141)
- all - All facets
Example:
python src/augment_persona.py \
--input personas.jsonl \
--output output.jsonl \
--facets values,personality,identity
The SCOPE questionnaire consists of 141 items across 8 facets, collected from a two-hour sociopsychological protocol administered to 124 U.S.-based participants.
| Type | Count | Format |
|---|---|---|
| Opinion Scales (1-5) | 51 | Numeric response |
| Dropdown Scales | 37 | Select from options |
| Multi-format (Text) | 30 | Free-form narrative |
| Multiple Choice | 20 | Select option(s) |
| Short Text | 3 | Brief text response |
See data/questionnaire.md for the complete protocol.
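As a quick sanity check, the question ranges listed for --facets and the per-type counts in the table above should both total 141 items. The sketch below hard-codes those numbers from this README. The names `FACET_RANGES`, `TYPE_COUNTS`, and `questions_for` are illustrative; nothing here is read from data/facet_definitions.json, whose schema is not shown in this document.

```python
from typing import Dict, List, Tuple

# Inclusive question ranges per facet key, copied from the --facets list in this README.
FACET_RANGES: Dict[str, Tuple[int, int]] = {
    "demographics": (1, 13),
    "sociodemographic": (14, 50),
    "values": (51, 72),
    "personality": (73, 102),
    "behavioral": (103, 115),
    "identity": (116, 125),
    "professional": (126, 135),
    "creativity": (136, 141),
}

# Item counts per response type, copied from the questionnaire table.
TYPE_COUNTS: Dict[str, int] = {
    "Opinion Scales (1-5)": 51,
    "Dropdown Scales": 37,
    "Multi-format (Text)": 30,
    "Multiple Choice": 20,
    "Short Text": 3,
}


def questions_for(facets: str) -> List[str]:
    """Expand a --facets style value ('all' or comma-separated keys) into question IDs."""
    if facets == "all":
        keys = list(FACET_RANGES)
    else:
        keys = [key.strip() for key in facets.split(",")]
    ids: List[str] = []
    for key in keys:
        first, last = FACET_RANGES[key]  # raises KeyError for an unknown facet key
        ids.extend(f"Q{n}" for n in range(first, last + 1))
    return ids


total_by_facet = len(questions_for("all"))
total_by_type = sum(TYPE_COUNTS.values())
print(total_by_facet, total_by_type)
```

Calling `questions_for("values,personality,identity")` gives the question IDs a run restricted to those three facets would be expected to cover, under the ranges listed here.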
SCOPE introduces evaluation metrics for structural fidelity:
- Correlation-based Alignment (r): Pearson correlation between persona responses and human responses
- Exact-Match Accuracy: Percentage of responses matching human answers
- Bias Accentuation: Difference between AI demographic correlation and human baseline
- Bias Percentage: Relative change in demographic influence
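The four metrics above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the evaluation code used in the paper: the function names are hypothetical, responses are assumed to be aligned lists (one entry per question), and the bias percentage formula (relative change against the absolute human baseline) is one plausible reading of "relative change in demographic influence".

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long numeric response vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def exact_match_accuracy(persona: Sequence[object], human: Sequence[object]) -> float:
    """Percentage of persona responses identical to the human responses."""
    if len(persona) != len(human) or not human:
        raise ValueError("need two non-empty sequences of equal length")
    matches = sum(p == h for p, h in zip(persona, human))
    return 100.0 * matches / len(human)


def bias_accentuation(ai_demo_r: float, human_demo_r: float) -> float:
    """AI demographic correlation minus the human baseline correlation."""
    return ai_demo_r - human_demo_r


def bias_percentage(ai_demo_r: float, human_demo_r: float) -> float:
    """Assumed definition: change relative to the human baseline, in percent."""
    if human_demo_r == 0:
        raise ValueError("relative change is undefined for a zero baseline")
    return 100.0 * (ai_demo_r - human_demo_r) / abs(human_demo_r)


# Made-up responses on a 1-5 scale, purely for illustration.
persona_answers = [4, 2, 5, 3, 1]
human_answers = [4, 3, 5, 2, 1]
r = pearson_r(persona_answers, human_answers)
accuracy = exact_match_accuracy(persona_answers, human_answers)
print(round(r, 3), accuracy)
```

The correlation and accuracy functions operate on one persona-human pair; aggregating across participants (for example by averaging) is left open because the aggregation scheme is not specified here.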
import asyncio

from src.augment_persona import PersonaAugmenter

# Initialize augmenter
augmenter = PersonaAugmenter(
    model="gpt-4o",
    api_key="your-api-key",
    base_url="https://api.openai.com/v1"  # Optional
)

# Persona to augment
persona = {
    "id": "p001",
    "name": "Alex",
    "age": 28,
    "occupation": "Software Engineer",
    "background": "Tech enthusiast from Seattle..."
}

async def main():
    # augment_persona is awaited, so it must be called from inside a coroutine
    result = await augmenter.augment_persona(
        persona=persona,
        facets=["values", "personality", "identity"]
    )
    print(result["facet_responses"])

# Run the coroutine from a regular (synchronous) script
asyncio.run(main())
The Need for a Socially-Grounded Persona Framework for User Simulation
Pranav Narayanan Venkit, Yu Li, Yada Pruksachatkun, Chien-Sheng Wu
Salesforce Research, Palo Alto, CA, USA
arXiv: https://arxiv.org/abs/2601.07110
We publish the generated dataset artifacts at:
https://huggingface.co/datasets/Salesforce/SCOPE-Persona/
Two dataset configurations are provided:
- persona_summaries: First-person facet summaries plus sociodemographic profile data. Each facet is stored as its own top-level column for clean display in Hugging Face.
- scope_qa: Structured question-answer pairs for each SCOPE question, grouped by facet.
If you use SCOPE in your research, please cite:
@article{venkit2025scope,
title={The Need for a Socially-Grounded Persona Framework for User Simulation},
author={Venkit, Pranav Narayanan and Li, Yu and Pruksachatkun, Yada and Wu, Chien-Sheng},
journal={arXiv preprint arXiv:2601.07110},
year={2025}
}
This project is licensed under the CC BY-NC 4.0 License; see the LICENSE file for details. The dataset is released for research purposes only and should not be used to develop models that compete with OpenAI.
This work was conducted at Salesforce Research. We thank all participants who contributed to the study.
