Welcome, AI Assistant! This guide will help you understand and work with the Parxy codebase effectively.
Parxy is a document processing gateway that provides:
- Unified text extraction interface across multiple services
- Consistent document model for all processing results
- Easy integration of new document processing services
- Support for both local and remote processing options

Driver Architecture
- Each text extraction service is implemented as a driver
- All drivers inherit from the `Driver` base class
- Configuration is managed through environment variables via Pydantic models defined in `src/parxy_core/models/config.py`
- Drivers expose a common interface regardless of the underlying service
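
One way to picture the shared interface is a template method on the base class. A public entry point checks the requested level against `supported_levels` and then delegates to the driver's own `_handle`. The sketch below assumes that design.

- `Driver` here is a self-contained stand-in, not the real class.
- `handle`, `EchoDriver`, and the `ValueError` raised for unsupported levels are hypothetical and are not Parxy's actual API.
- The hook names `supported_levels`, `_initialize_driver`, and `_handle` follow the driver template in this guide.

```python
from abc import ABC, abstractmethod


class Driver(ABC):
    """Hypothetical stand-in for the real Driver base class."""

    # Subclasses list the extraction levels they can produce.
    supported_levels: list[str] = []

    def __init__(self) -> None:
        # Each driver runs its own setup (clients, credentials) once.
        self._initialize_driver()

    def handle(self, file: str, level: str = "block", **kwargs):
        # Shared entry point (hypothetical): validate the level, then delegate.
        if level not in self.supported_levels:
            raise ValueError(
                f"{type(self).__name__} does not support level {level!r}; "
                f"supported: {self.supported_levels}"
            )
        return self._handle(file, level=level, **kwargs)

    @abstractmethod
    def _initialize_driver(self) -> None: ...

    @abstractmethod
    def _handle(self, file: str, level: str = "block", **kwargs): ...


class EchoDriver(Driver):
    """Toy driver that returns the request details, for illustration only."""

    supported_levels = ["page", "block"]

    def _initialize_driver(self) -> None:
        self.ready = True

    def _handle(self, file: str, level: str = "block", **kwargs):
        return {"source": file, "level": level}


driver = EchoDriver()
print(driver.handle("report.pdf", level="page"))  # {'source': 'report.pdf', 'level': 'page'}
```

Callers depend only on `handle`, so swapping one service for another does not change calling code.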

Document Model
- Hierarchical structure: page → block → line → span → character
- Each level provides increasing text granularity
- Drivers declare which extraction levels they support
- Results are normalized to this structure regardless of source
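
The hierarchy can be modeled as nested containers in which each level's text is assembled from the level below. This is a minimal sketch using hypothetical dataclasses and field names. Parxy's real models in `models/` will differ in fields and naming. In this sketch, characters are simply derived from span text.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    # Smallest text run; characters are derived from its text in this sketch.
    text: str

    @property
    def characters(self) -> list[str]:
        return list(self.text)


@dataclass
class Line:
    spans: list[Span] = field(default_factory=list)

    @property
    def text(self) -> str:
        # Spans on a line are concatenated as-is.
        return "".join(span.text for span in self.spans)


@dataclass
class Block:
    lines: list[Line] = field(default_factory=list)

    @property
    def text(self) -> str:
        return "\n".join(line.text for line in self.lines)


@dataclass
class Page:
    number: int
    blocks: list[Block] = field(default_factory=list)

    @property
    def text(self) -> str:
        return "\n\n".join(block.text for block in self.blocks)


@dataclass
class Document:
    # Hypothetical stand-in, not Parxy's Document class.
    pages: list[Page] = field(default_factory=list)

    @property
    def text(self) -> str:
        return "\n\n".join(page.text for page in self.pages)


line = Line(spans=[Span("Hello, "), Span("world")])
doc = Document(pages=[Page(number=1, blocks=[Block(lines=[line])])])
print(doc.text)  # Hello, world
```
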

Core Library (`parxy_core/`)
- `drivers/`: Service implementations
- `facade/`: Public API (`Parxy` class)
- `models/`: Data structures and configuration
- `exceptions/`: Error handling
- `logging/`: Debug support

CLI Tool (`parxy_cli/`)
- Document processing commands
- Configuration management
- Docker environment setup

Key files in `parxy_core/`:
- `drivers/abstract_driver.py`: Base driver interface
- `models/config.py`: Configuration system
- `facade/parxy.py`: Main public API
- `drivers/factory.py`: Driver instantiation logic
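
Driver Implementation

```python
# Import paths assumed from the package layout above; adjust if they differ.
from parxy_core.drivers.abstract_driver import Driver
from parxy_core.models import Document


class NewDriver(Driver):
    # Extraction levels this driver can produce
    supported_levels = ["page", "block"]

    def _initialize_driver(self):
        # Setup code (e.g. create the service client)
        ...

    def _handle(self, file, level="block", **kwargs):
        # Processing logic: call the service and map its output to a Document
        return Document(...)
```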
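
Configuration

```python
from pydantic import SecretStr
from pydantic_settings import SettingsConfigDict

from parxy_core.models.config import BaseConfig  # assumed location of BaseConfig


class NewDriverConfig(BaseConfig):
    # Read from PARXY_NEWDRIVER_API_KEY; required because it has no default
    api_key: SecretStr
    base_url: str = "https://api.example.com"

    model_config = SettingsConfigDict(env_prefix="parxy_newdriver_")
```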
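
With `env_prefix="parxy_newdriver_"`, pydantic-settings reads each field from an environment variable named after the prefix plus the field name. Matching is case-insensitive by default, so `api_key` comes from `PARXY_NEWDRIVER_API_KEY`.

The stdlib-only sketch below imitates that lookup to make the mapping concrete. It is not how pydantic-settings is implemented, and it skips type validation and `SecretStr` masking. `settings_from_env` and `REQUIRED` are hypothetical.

```python
import os
from collections.abc import Mapping
from typing import Optional

# Sentinel marking a field with no default (like `api_key: SecretStr` above).
REQUIRED = object()


def settings_from_env(
    prefix: str,
    fields: dict,
    environ: Optional[Mapping[str, str]] = None,
) -> dict:
    """Resolve each field from the variable `<prefix><field>`, ignoring case."""
    env = os.environ if environ is None else environ
    # Lower-case keys once so PARXY_NEWDRIVER_API_KEY matches parxy_newdriver_api_key.
    lowered = {key.lower(): value for key, value in env.items()}
    resolved = {}
    for name, default in fields.items():
        key = f"{prefix}{name}".lower()
        if key in lowered:
            resolved[name] = lowered[key]
        elif default is REQUIRED:
            raise KeyError(f"missing required setting {key.upper()}")
        else:
            resolved[name] = default
    return resolved


settings = settings_from_env(
    "parxy_newdriver_",
    {"api_key": REQUIRED, "base_url": "https://api.example.com"},
    environ={"PARXY_NEWDRIVER_API_KEY": "sk-test"},
)
print(settings)  # {'api_key': 'sk-test', 'base_url': 'https://api.example.com'}
```
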

- Prefer `pathlib` over `os.path`

Always use the appropriate exception:
- `FileNotFoundException`: Missing or inaccessible files
- `AuthenticationException`: API authentication issues
- `ParsingException`: Processing errors
- `UnsupportedFormatException`: Invalid file types
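
The pathlib guideline and the exception list combine naturally when validating driver input. The sketch below uses `pathlib` and raises two of the exceptions listed above.

- The real constructors of those exceptions are not documented here, so the sketch defines local stand-in classes.
- `validate_input` and `SUPPORTED_SUFFIXES` are hypothetical.

```python
import os
from pathlib import Path
from typing import Union


# Hypothetical stand-ins; the real classes live in parxy_core/exceptions/.
class FileNotFoundException(Exception):
    pass


class UnsupportedFormatException(Exception):
    pass


# Hypothetical allow-list; real drivers accept different formats.
SUPPORTED_SUFFIXES = {".pdf"}


def validate_input(file: Union[str, os.PathLike]) -> Path:
    """Return the input as a Path, or raise the matching exception."""
    path = Path(file)
    # Missing or unreadable files map to FileNotFoundException.
    if not path.is_file() or not os.access(path, os.R_OK):
        raise FileNotFoundException(f"Missing or inaccessible file: {path}")
    # Compare suffixes case-insensitively so "REPORT.PDF" is accepted.
    if path.suffix.lower() not in SUPPORTED_SUFFIXES:
        raise UnsupportedFormatException(f"Unsupported file type {path.suffix!r}: {path}")
    return path
```
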

Driver tests should:
- Verify supported extraction levels
- Test authentication handling
- Check error conditions
- Validate output structure
- Use fixtures in `tests/fixtures/`
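
A driver test covering this checklist might look like the following sketch. It uses the standard library's `unittest` against a toy `FakeRemoteDriver`, so it runs on its own.

- The driver, its `handle` method, and the local `AuthenticationException` stand-in are hypothetical.
- A real test would target an actual Parxy driver and load sample documents from `tests/fixtures/`.

```python
import unittest


class AuthenticationException(Exception):
    """Hypothetical stand-in for the real exception class."""


class FakeRemoteDriver:
    """Toy driver used only to demonstrate test structure."""

    supported_levels = ["page", "block"]

    def __init__(self, api_key: str) -> None:
        # Fail fast on missing credentials, as a remote driver might.
        if not api_key:
            raise AuthenticationException("missing API key")
        self.api_key = api_key

    def handle(self, file: str, level: str = "block") -> dict:
        if level not in self.supported_levels:
            raise ValueError(f"unsupported level: {level}")
        return {"pages": [{"blocks": [{"text": f"content of {file}"}]}]}


class FakeRemoteDriverTest(unittest.TestCase):
    def test_supported_levels(self):
        self.assertEqual(FakeRemoteDriver.supported_levels, ["page", "block"])

    def test_missing_api_key_raises(self):
        with self.assertRaises(AuthenticationException):
            FakeRemoteDriver(api_key="")

    def test_unsupported_level_raises(self):
        driver = FakeRemoteDriver(api_key="test-key")
        with self.assertRaises(ValueError):
            driver.handle("sample.pdf", level="character")

    def test_output_structure(self):
        result = FakeRemoteDriver(api_key="test-key").handle("sample.pdf")
        self.assertIn("pages", result)
        self.assertTrue(result["pages"][0]["blocks"])
```

pytest also collects `unittest.TestCase` subclasses, so a class like this can sit alongside pytest-style tests.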

Integration tests should:
- Test full processing pipeline
- Verify configuration loading
- Check driver factory patterns
- Validate CLI functionality

Local Development

Install all dependencies, including optional extras:

```bash
uv sync --all-extras
```

Configuration
- Copy `.env.example` to `.env`
- Configure the required services
- Set logging as needed

Check the `docs/` directory:
- `howto/`: Implementation guides
- `tutorials/`: Usage examples

Review test files for:
- Usage patterns
- Edge cases
- Configuration examples