Wiring scrapers and dependencies

AIOScraper uses dependency injection to provide shared resources (database clients, API clients, configs, services) to your callbacks, errbacks, pipelines, and middlewares. Dependencies are resolved automatically by parameter name and type hints.

How it works

  1. Register dependencies via add_dependencies(**kwargs) - typically in a lifespan context manager

  2. Request dependencies in your callbacks/pipelines via parameter names

  3. aioscraper injects them automatically when calling your functions based on parameter name or type

This makes testing easy (mock dependencies) and keeps your code decoupled from resource management.

Scrapers

__call__: Add one or more async scraper callables (entry points). Returns the first callable so you can use it as a decorator.

Dependencies

add_dependencies: Register objects (clients, configs, services) that become injectable into callbacks, errbacks, pipelines, and middlewares via type hints.

Example

from dataclasses import dataclass
from aioscraper import AIOScraper, Request, SendRequest


scraper = AIOScraper()


@dataclass
class Config:
    github_token: str
    api_base_url: str


class MetricsClient:
    """Send metrics to monitoring system"""

    async def counter(self, metric: str, value: float = 1.0):
        print(f"Metric: {metric} = {value}")

    async def close(self): ...


@dataclass(slots=True)
class RepoStats:
    name: str
    stars: int


# Entry point: receives injected config dependency
@scraper
async def scrape(send_request: SendRequest, config: Config):
    """Scraper entry point with injected config"""
    await send_request(
        Request(
            url=f"{config.api_base_url}/repos/python/cpython",
            headers={"Authorization": f"token {config.github_token}"},
        )
    )


# Middleware: factory receives injected metrics dependency
@scraper.middleware
def request_metrics(metrics: MetricsClient):
    async def middleware(call_next, request):
        await metrics.counter("request_started")
        try:
            response = await call_next(request)
        except Exception:
            await metrics.counter("request_ended")
            raise
        await metrics.counter("request_ended")
        return response

    return middleware


# Lifespan: setup dependencies and cleanup
@scraper.lifespan
async def lifespan(scraper: AIOScraper):
    """
    Setup phase: create and register dependencies.
    Teardown phase: cleanup resources.
    """
    # Create resources
    config = Config(github_token="ghp_xxxx", api_base_url="https://api.github.com")
    metrics = MetricsClient()

    # Register dependencies - will be injected by parameter names
    scraper.add_dependencies(config=config, metrics=metrics)

    yield  # Scraper runs here

    # Cleanup
    await metrics.close()

Dependency injection rules

  1. Name-based matching: Dependencies are injected by matching parameter names to registered keys (config: Config matches add_dependencies(config=...) by the name config)

  2. Built-in dependencies: Some dependencies are always available:

    • send_request: SendRequest - schedule new requests

    • pipeline: Pipeline - send items to pipelines

  3. No dependency found: If a parameter has no default and no matching dependency, raises an error

Best practices

  1. Use lifespan for setup/teardown: Register dependencies in @scraper.lifespan to ensure proper cleanup

  2. Keep dependencies simple: Inject services/clients, not raw data

  3. Test with mocks: Dependency injection makes testing easy - just register mock objects

    # In tests
    mock_db = MockDatabase()
    scraper.add_dependencies(db_pool=mock_db)