Wiring scrapers and dependencies
AIOScraper uses dependency injection to provide shared resources (database clients, API clients, configs, services) to your callbacks, errbacks, pipelines, and middlewares. Dependencies are resolved automatically by parameter name and type hints.
How it works
1. Register dependencies via add_dependencies(**kwargs), typically in a lifespan context manager.
2. Request dependencies in your callbacks/pipelines via parameter names.
3. aioscraper injects them automatically when calling your functions, based on parameter name or type.
This makes testing easy (register mocks in place of real clients) and keeps your code decoupled from resource management.
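The three steps above can be illustrated with a small, self-contained sketch of name-based injection. It is not aioscraper's implementation: DependencyRegistry and its call method are hypothetical names, and the sketch only matches on parameter names (type-hint matching is left out).

```python
import asyncio
import inspect
from typing import Any, Callable


class DependencyRegistry:
    """Hypothetical registry that stores dependencies under keyword names."""

    def __init__(self) -> None:
        self._deps: dict[str, Any] = {}

    def add_dependencies(self, **kwargs: Any) -> None:
        # In this sketch, re-registering a name replaces the earlier object.
        self._deps.update(kwargs)

    async def call(self, func: Callable[..., Any]) -> Any:
        # Pass every registered object whose key matches a parameter name.
        kwargs = {
            name: self._deps[name]
            for name in inspect.signature(func).parameters
            if name in self._deps
        }
        return await func(**kwargs)


class Config:
    def __init__(self, api_base_url: str) -> None:
        self.api_base_url = api_base_url


async def scrape(config: Config) -> str:
    # The callback only declares what it needs; it never constructs Config.
    return f"{config.api_base_url}/repos/python/cpython"


registry = DependencyRegistry()
registry.add_dependencies(config=Config("https://api.github.com"))
url = asyncio.run(registry.call(scrape))
print(url)  # https://api.github.com/repos/python/cpython
```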
Scrapers
__call__: Registers one or more async scraper callables (entry points). It returns the first callable, so the instance can be used as a decorator (@scraper).
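A minimal sketch of that decorator behaviour, using a hypothetical ScraperRegistry class in place of AIOScraper. Because __call__ returns the first callable it received, @registry leaves the decorated function bound to its own name, and the same method also works as a plain call with several functions.

```python
import asyncio
from typing import Any, Awaitable, Callable

ScraperFn = Callable[..., Awaitable[Any]]


class ScraperRegistry:
    """Hypothetical stand-in for the __call__ behaviour described above."""

    def __init__(self) -> None:
        self.scrapers: list[ScraperFn] = []

    def __call__(self, *scrapers: ScraperFn) -> ScraperFn:
        if not scrapers:
            raise TypeError("expected at least one scraper callable")
        self.scrapers.extend(scrapers)
        # Returning the first callable makes decorator use transparent:
        # the decorated name still refers to the original function.
        return scrapers[0]


registry = ScraperRegistry()


@registry
async def scrape_repos() -> str:
    return "repos"


async def scrape_issues() -> str:
    return "issues"


registry(scrape_issues)  # plain call form, no decorator


async def run_all() -> list[str]:
    return [await fn() for fn in registry.scrapers]


print(asyncio.run(run_all()))  # ['repos', 'issues']
```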
Dependencies
add_dependencies: Registers objects (clients, configs, services) under keyword names; they become injectable into callbacks, errbacks, pipelines, and middlewares whose parameters use those names (see the rules below).
Example
```python
from dataclasses import dataclass

from aioscraper import AIOScraper, Request, SendRequest

scraper = AIOScraper()


@dataclass
class Config:
    github_token: str
    api_base_url: str


class MetricsClient:
    """Send metrics to a monitoring system (this stub just prints)."""

    async def counter(self, metric: str, value: float = 1.0):
        print(f"Metric: {metric} = {value}")

    async def close(self): ...


@dataclass(slots=True)  # slots=True requires Python 3.10+
class RepoStats:
    name: str
    stars: int


# Entry point: receives the injected config dependency
@scraper
async def scrape(send_request: SendRequest, config: Config):
    """Scraper entry point with injected config."""
    await send_request(
        Request(
            url=f"{config.api_base_url}/repos/python/cpython",
            headers={"Authorization": f"token {config.github_token}"},
        )
    )


# Middleware: the factory receives the injected metrics dependency
@scraper.middleware
def request_metrics(metrics: MetricsClient):
    async def middleware(call_next, request):
        await metrics.counter("request_started")
        try:
            return await call_next(request)
        finally:
            # Runs on success and on error, so every start has a matching end
            await metrics.counter("request_ended")

    return middleware


# Lifespan: set up dependencies and clean them up afterwards
@scraper.lifespan
async def lifespan(scraper: AIOScraper):
    """
    Setup phase: create and register dependencies.
    Teardown phase: clean up resources.
    """
    # Create resources
    config = Config(github_token="ghp_xxxx", api_base_url="https://api.github.com")
    metrics = MetricsClient()

    # Register dependencies; they are injected by parameter name
    scraper.add_dependencies(config=config, metrics=metrics)

    try:
        yield  # The scraper runs here
    finally:
        # Cleanup runs even if scraping fails
        await metrics.close()
```
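The lifespan above follows the async-generator pattern of contextlib.asynccontextmanager: everything before yield is setup, everything after it is teardown. The sketch below shows that ordering with a hypothetical FakeScraper class; it assumes, without relying on aioscraper's internals, that the library wraps the lifespan in a similar way. Wrapping yield in try/finally is what guarantees cleanup when scraping raises.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import Any, AsyncIterator, Callable


class MetricsClient:
    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


class FakeScraper:
    """Hypothetical stand-in that runs a lifespan around the scraping phase."""

    def __init__(self, lifespan: Callable[["FakeScraper"], Any]) -> None:
        self.dependencies: dict[str, Any] = {}
        self._lifespan = asynccontextmanager(lifespan)

    def add_dependencies(self, **kwargs: Any) -> None:
        self.dependencies.update(kwargs)

    async def run(self) -> list[str]:
        async with self._lifespan(self):
            # Scraping happens while the lifespan is suspended at `yield`.
            return sorted(self.dependencies)


async def lifespan(scraper: FakeScraper) -> AsyncIterator[None]:
    metrics = MetricsClient()
    scraper.add_dependencies(metrics=metrics)  # setup
    try:
        yield
    finally:
        await metrics.close()  # teardown, also on error


scraper = FakeScraper(lifespan)
print(asyncio.run(scraper.run()))  # ['metrics']
print(scraper.dependencies["metrics"].closed)  # True
```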
Dependency injection rules
- Name-based matching: Dependencies are injected by matching parameter names to registered keys (config: Config matches add_dependencies(config=...) by the name config).
- Built-in dependencies: Some dependencies are always available:
  - send_request: SendRequest, for scheduling new requests
  - pipeline: Pipeline, for sending items to pipelines
- No dependency found: If a parameter has no default and no matching dependency, an error is raised.
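Taken together, the rules above amount to a small resolution step per parameter. This sketch uses a hypothetical resolve_kwargs function and MissingDependencyError type (aioscraper's real exception may differ): it looks up built-ins and registered dependencies by name, falls back to the parameter's default, and raises otherwise. Plain strings stand in for the real SendRequest and Pipeline objects, and how a name clash between a built-in and a registered key is resolved is an assumption of the sketch.

```python
import inspect
from typing import Any, Callable


class MissingDependencyError(LookupError):
    """Hypothetical error type; the library's actual exception may differ."""


def resolve_kwargs(
    func: Callable[..., Any],
    registered: dict[str, Any],
    builtins: dict[str, Any],
) -> dict[str, Any]:
    # Assumption: registered keys override built-ins with the same name.
    available = {**builtins, **registered}
    kwargs: dict[str, Any] = {}
    for name, param in inspect.signature(func).parameters.items():
        if name in available:
            kwargs[name] = available[name]  # match by parameter name
        elif param.default is inspect.Parameter.empty:
            raise MissingDependencyError(name)  # no default, no match
        # otherwise omit it so Python applies the parameter's default
    return kwargs


def callback(send_request, config, retries=3):
    return send_request, config, retries


def needs_db(db_pool):
    return db_pool


builtins = {"send_request": "<send_request>", "pipeline": "<pipeline>"}
kwargs = resolve_kwargs(callback, {"config": "cfg"}, builtins)
print(kwargs)  # {'send_request': '<send_request>', 'config': 'cfg'}

try:
    resolve_kwargs(needs_db, {}, builtins)
except MissingDependencyError as exc:
    print("missing:", exc)  # missing: db_pool
```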
Best practices
- Use lifespan for setup/teardown: Register dependencies in @scraper.lifespan to ensure proper cleanup.
- Keep dependencies simple: Inject services/clients, not raw data.
- Test with mocks: Dependency injection makes testing easy; just register mock objects:

```python
# In tests
mock_db = MockDatabase()
scraper.add_dependencies(db_pool=mock_db)
```
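Because injected dependencies arrive as ordinary parameters, a callback can also be unit-tested without starting a scraper: pass the mock directly. In this sketch, save_repo, MockDatabase.insert, and the row layout are hypothetical.

```python
import asyncio
from typing import Any


class MockDatabase:
    """Hypothetical test double that records inserts instead of writing them."""

    def __init__(self) -> None:
        self.rows: list[dict[str, Any]] = []

    async def insert(self, table: str, row: dict[str, Any]) -> None:
        self.rows.append({"table": table, **row})


async def save_repo(name: str, stars: int, db_pool: Any) -> None:
    # In a running scraper db_pool would be injected by name;
    # in the test it is simply passed as an argument.
    await db_pool.insert("repos", {"name": name, "stars": stars})


def test_save_repo() -> None:
    mock_db = MockDatabase()
    asyncio.run(save_repo("cpython", 100, db_pool=mock_db))
    assert mock_db.rows == [{"table": "repos", "name": "cpython", "stars": 100}]


test_save_repo()
```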