A straightforward framework for collecting responses from AI agents using predefined questions.
ChatBotChatBot collects data by asking a target agent a set of predefined questions and recording its responses.
Key Benefits:
- No complex setup or AI dependencies required
- Simple question-and-answer data collection
- Fast execution with clear response display
- Easy to create and maintain question lists
- Works with any agent that has a REST API
- YAML Question Suites: Define questions to ask in simple YAML files
- Response Collection: Records what the target agent responds with
- CLI Interface: Easy-to-use command-line interface with real-time display
- Session Tracking: Unique session IDs for each collection run
- Target Agent Agnostic: Works with any REST API endpoint
- No Scoring: Pure data collection without judgment or evaluation
- Python 3.9 or higher
- Target agent with REST API endpoint
- Clone this repository:

  ```bash
  git clone <repository-url>
  cd ChatBotChatBot
  ```
- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Start with a sample test suite:

  ```bash
  python chatbotchatbot.py create-sample
  ```
- List available test suites:

  ```bash
  python chatbotchatbot.py list-suites
  ```
- Run a question suite (requires a running target agent):

  ```bash
  python chatbotchatbot.py run --suite math_basic.yaml --endpoint http://localhost:8000/chat
  ```
This will ask each question and display the responses in real-time.
The repository includes a sample target agent for testing:
```bash
# Terminal 1: Start the sample target agent
cd testTargetAgent
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt
python main.py --port 8001
```

```bash
# Terminal 2: Collect responses from the sample agent
python chatbotchatbot.py run --suite math_basic.yaml --endpoint http://localhost:8001/chat --verbose
```

```bash
# Create a new question suite
cat > test_suites/my_questions.yaml << EOF
name: "My Custom Questions"
description: "Collect responses about my agent's capabilities"
questions:
  - question: "Hello, how are you?"
  - question: "What is your purpose?"
EOF

# Validate the question suite
python chatbotchatbot.py validate --suite-file test_suites/my_questions.yaml

# Collect responses from your agent
python chatbotchatbot.py run --suite my_questions.yaml --endpoint http://your-agent:8000/chat
```
```bash
# List all available question suites
python chatbotchatbot.py list-suites

# Run a question suite against target agent
python chatbotchatbot.py run --suite <SUITE_FILE> --endpoint <URL> [OPTIONS]

# Validate question suite format
python chatbotchatbot.py validate --suite-file <PATH>

# Create sample question suite for reference
python chatbotchatbot.py create-sample [--output <PATH>]
```

```bash
python chatbotchatbot.py run \
  --suite math_basic.yaml \                # Required: question suite file
  --endpoint http://localhost:8000/chat \  # Required: target agent URL
  --api-key sk-xxx \                       # Optional: API authentication
  --auth-type bearer \                     # Optional: none|bearer|api-key|basic
  --timeout 30 \                           # Optional: request timeout (seconds)
  --session-id custom-session \            # Optional: custom session identifier
  --verbose                                # Optional: detailed output
```

Question suites are defined in YAML with this simple structure:
```yaml
name: "Question Suite Name"
description: "What this question suite explores"
questions:
  - question: "Test question to ask the agent"
  - question: "Another test question"
```

Just questions - no expected answers, validation types, or scoring needed!
Your target agent must expose a REST endpoint:
Request Format:

```http
POST /chat
Content-Type: application/json

{
  "message": "user input text"
}
```

Response Format:

```http
200 OK
Content-Type: application/json

{
  "response": "agent response text"
}
```
ChatBotChatBot supports multiple authentication methods:

```bash
# No authentication
--auth-type none

# Bearer token
--auth-type bearer --api-key "your-token"

# API key in header
--auth-type api-key --api-key "your-key"

# Basic authentication
--auth-type basic --api-key "username:password"
```
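The mapping from these flags to HTTP headers lives in `src/api/client.py`; the sketch below shows one plausible mapping. The `X-API-Key` header name and the case handling here are assumptions, not the tool's confirmed behavior:

```python
import base64

def build_auth_headers(auth_type: str, api_key: str = "") -> dict:
    """Hypothetical sketch of mapping --auth-type/--api-key to headers.

    The authoritative logic is in src/api/client.py and may differ.
    """
    if auth_type == "none" or not api_key:
        return {}
    if auth_type == "bearer":
        return {"Authorization": f"Bearer {api_key}"}
    if auth_type == "api-key":
        # Header name is an assumption; many agents expect X-API-Key.
        return {"X-API-Key": api_key}
    if auth_type == "basic":
        # For basic auth the key is passed as "username:password".
        token = base64.b64encode(api_key.encode()).decode()
        return {"Authorization": f"Basic {token}"}
    raise ValueError(f"Unknown auth type: {auth_type}")
```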
The repository includes ready-to-use test suites:

`test_suites/math_basic.yaml`:

```yaml
name: "Basic Math Operations"
description: "Test basic arithmetic capabilities"
questions:
  - question: "What is 2 + 2?"
    acceptable_response: "4"
    validation_type: "exact"
  - question: "Calculate 15 * 3"
    acceptable_response: "45"
    validation_type: "contains"
```

`test_suites/customer_service.yaml`:

```yaml
name: "Customer Service Responses"
description: "Test customer service agent capabilities"
questions:
  - question: "I want to return an item"
    acceptable_response: "help you with that return"
    validation_type: "contains"
  - question: "What is your refund policy?"
    acceptable_response: "30 days"
    validation_type: "contains"
```

`test_suites/general_knowledge.yaml`:

```yaml
name: "General Knowledge Questions"
description: "Test general knowledge and reasoning"
questions:
  - question: "What is the capital of France?"
    acceptable_response: "Paris"
    validation_type: "exact"
  - question: "Who wrote Romeo and Juliet?"
    acceptable_response: "Shakespeare"
    validation_type: "contains"
```

Each test run creates a session with detailed results:
```bash
# View summary
python chatbotchatbot.py results --session-id abc123

# View detailed breakdown
python chatbotchatbot.py results --session-id abc123 --format detailed

# Export as JSON
python chatbotchatbot.py results --session-id abc123 --format json > results.json
```
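The exported JSON can then be processed with any tooling. As an illustration only: the record layout below (a list of objects with `question` and `response` keys) is an assumption, not the tool's documented schema, so check an actual export first:

```python
import json

def summarize_results(path):
    """Print each question/response pair from an exported results file.

    Assumes a list of records with "question" and "response" keys;
    the real export schema may differ.
    """
    with open(path) as f:
        records = json.load(f)
    for r in records:
        print(f"Q: {r['question']}\nA: {r['response']}\n")
    return len(records)
```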
Generate formatted reports for sharing:

```bash
# JSON report for analysis
python chatbotchatbot.py report --session-id abc123 --format json --output report.json

# HTML report for presentation
python chatbotchatbot.py report --session-id abc123 --format html --output report.html
```
```
ChatBotChatBot/
├── README.md                    # This file
├── requirements.txt             # Python dependencies
├── chatbotchatbot.py            # Main CLI entry point
├── src/                         # Source code
│   ├── testing/                 # Core testing functionality
│   │   ├── question_pools.py    # YAML test suite management
│   │   ├── answer_validator.py  # Response validation logic
│   │   └── simple_runner.py     # Sequential test execution
│   ├── api/                     # Target agent communication
│   │   └── client.py            # HTTP client for target agents
│   ├── database/                # Test result storage
│   │   └── schema.py            # SQLite database management
│   ├── cli/                     # Command line interface
│   │   ├── commands.py          # CLI command implementations
│   │   └── interface.py         # Console output formatting
│   └── utils/                   # Shared utilities
│       ├── config.py            # Configuration management
│       └── models.py            # Data models (Pydantic)
├── test_suites/                 # Example test suites
│   ├── math_basic.yaml
│   ├── customer_service.yaml
│   └── general_knowledge.yaml
├── testTargetAgent/             # Sample target agent for testing
└── data/                        # SQLite database storage
```
```bash
# Run all tests
pytest tests/

# Run with coverage
pytest --cov=src tests/

# Run specific test file
pytest tests/unit/test_answer_validator.py -v
```

To add a new validation type:

- Extend `ValidationTypeEnum` in `src/utils/models.py`
- Add validation logic in `AnswerValidator.evaluate_answer()`
- Update documentation
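As a reference point, the two built-in modes used in the sample suites might look like the sketch below. The case-insensitive comparison here is an assumption of this sketch, not necessarily how `AnswerValidator.evaluate_answer()` behaves:

```python
def evaluate_answer(response: str, acceptable: str, validation_type: str) -> bool:
    """Sketch of the "exact" and "contains" validation modes.

    Normalizes whitespace and case before comparing; the authoritative
    implementation is AnswerValidator.evaluate_answer().
    """
    got = response.strip().lower()
    want = acceptable.strip().lower()
    if validation_type == "exact":
        return got == want
    if validation_type == "contains":
        return want in got
    raise ValueError(f"Unknown validation type: {validation_type}")
```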
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Submit a pull request
- Validate chatbot responses against expected answers
- Regression testing for agent updates
- Consistency checking across different scenarios
- Compare performance across different agent versions
- A/B testing between different implementations
- Performance regression detection
- Integration testing during development
- Automated testing in CI/CD pipelines
- Pre-deployment validation checks
- Test student AI projects against rubrics
- Validate learning outcomes
- Automated grading for AI assignments
"No test suites found"
- Ensure YAML files are in the `test_suites/` directory
- Check that file extensions are `.yaml` or `.yml`
- Run `python chatbotchatbot.py create-sample` to create an example
"Connection failed"
- Verify target agent is running and accessible
- Check the endpoint URL format (include `http://` or `https://`)
- Test with curl first:

  ```bash
  curl -X POST -H "Content-Type: application/json" -d '{"message":"test"}' <endpoint>
  ```
"Test suite validation failed"
- Run `python chatbotchatbot.py validate --suite-file <file>` for details
- Check YAML syntax with an online validator
- Ensure all required fields are present
"Session not found"
- Check session ID spelling
- Use `python chatbotchatbot.py results` without a session ID to see recent sessions
- The database may be empty if no tests have been run
MIT license; see the LICENSE file for details.