API

Core

class aioscraper.core.scraper.AIOScraper(*scrapers, config=None, lifespan=None, sessionmaker_factory=None)[source]

Bases: object

Core entrypoint that wires scrapers, middlewares, and pipelines.

Parameters:
  • *scrapers (Scraper) – Callable scrapers queued on startup.

  • config (Config | None) – Pre-built configuration; when None the scraper loads one lazily via load_config() on start.

  • lifespan (Lifespan | None) – Optional async context manager factory that wraps the scraper’s lifecycle (setup/teardown).

  • sessionmaker_factory (SessionMakerFactory | None) – Override the function that builds HTTP sessions (defaults to aioscraper.core.session.factory.get_sessionmaker()).

__call__(scraper)[source]

Add a scraper callable and return it for decorator use.

Parameters:

scraper (Callable[[...], Awaitable[Any]])

Return type:

Callable[[...], Awaitable[Any]]

add_dependencies(**kwargs)[source]

Add shared dependencies to inject into scraper callbacks.

Parameters:

kwargs (Any)

async close()[source]

Close the scraper and its associated resources.

lifespan(lifespan)[source]

Attach a lifespan callback to run before/after scraping.

Parameters:

lifespan (Callable[[AIOScraper], AsyncGenerator[None, None]])

Return type:

Callable[[AIOScraper], AsyncGenerator[None, None]]
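
The lifespan signature above is an async generator that takes the scraper and yields once. A minimal sketch of how such a generator gives setup and teardown phases is shown below. FakeScraper is a hypothetical stand-in for the real scraper object, and wrapping the generator with contextlib.asynccontextmanager is an assumption made for illustration, not a statement about how the library drives it.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncGenerator

events: list = []


class FakeScraper:
    """Hypothetical stand-in for the scraper passed to the lifespan."""


async def lifespan(scraper: FakeScraper) -> AsyncGenerator[None, None]:
    events.append("setup")  # runs before scraping starts
    try:
        yield
    finally:
        events.append("teardown")  # runs after scraping ends, even on error


async def main() -> None:
    # asynccontextmanager turns the generator function into a context manager factory.
    async with asynccontextmanager(lifespan)(FakeScraper()):
        events.append("scraping")


asyncio.run(main())
```

Code before the yield is the natural place to open shared resources such as database connections; code after it is where they would be released.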

property middleware: MiddlewareHolder

Access the middleware registry for request/response hooks.

property pipeline: PipelineHolder

Access the pipeline registry and middleware helpers.

async shutdown()[source]

Trigger a graceful shutdown of the scraper.

start()[source]

Start the scraper and run it in the background.

async wait(timeout=None)[source]

Wait for the scraper to finish.

Parameters:

timeout (float | None)

async aioscraper.core.runner.run_scraper(scraper)[source]

Public entrypoint that runs a scraper with signal handling.

Parameters:

scraper (AIOScraper)

aioscraper.compiled(func)[source]

Decorator that optimizes dependency injection by caching function parameters.

Replaces runtime inspection with compile-time parameter extraction.

Parameters:

func (Callable[[...], Any])

Return type:

Callable[[...], Any]
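
The idea of extracting parameters once, at decoration time, instead of inspecting the function on every call can be sketched as follows. This is an illustration of the general technique only; cache_params, call_with_dependencies, and the __cached_params__ attribute are hypothetical names, not the library's implementation.

```python
import inspect
from typing import Any, Callable, Dict


def cache_params(func: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical decorator: inspect the signature once and remember it."""
    func.__cached_params__ = tuple(inspect.signature(func).parameters)
    return func


def call_with_dependencies(func: Callable[..., Any], available: Dict[str, Any]) -> Any:
    """Call func with only the dependencies whose names it declares."""
    names = getattr(func, "__cached_params__", None)
    if names is None:
        # Slow path: no cached names, so inspect on every call.
        names = tuple(inspect.signature(func).parameters)
    return func(**{name: available[name] for name in names if name in available})


@cache_params
def handler(response, db):
    return f"{response}:{db}"
```

Here call_with_dependencies(handler, {"response": "r", "db": "d", "extra": 1}) passes only response and db, and the signature of handler is never inspected again after decoration.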

Configuration

class aioscraper.config.models.Config(session=SessionConfig(timeout=60.0, ssl=True, proxy=None, http_backend=None, retry=RequestRetryConfig(enabled=False, attempts=3, backoff=<BackoffStrategy.EXPONENTIAL_JITTER: 'exponential_jitter'>, base_delay=0.5, max_delay=30.0, statuses=(500, 502, 503, 504, 522, 524, 408, 429), exceptions=(<class 'TimeoutError'>, )), rate_limit=RateLimitConfig(enabled=False, group_by=None, default_interval=0.0, cleanup_timeout=60.0, adaptive=None)), scheduler=SchedulerConfig(concurrent_requests=64, pending_requests=1, close_timeout=0.1, ready_queue_max_size=0), execution=ExecutionConfig(timeout=None, shutdown_timeout=0.1, shutdown_check_interval=0.1, log_level=40), pipeline=PipelineConfig(strict=True))[source]

Bases: object

Main configuration class that combines all configuration components.

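Parameters:
  • session (SessionConfig) – HTTP session settings

  • scheduler (SchedulerConfig) – Request scheduler settings

  • execution (ExecutionConfig) – Execution settings

  • pipeline (PipelineConfig) – Pipeline settings
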
class aioscraper.config.models.SessionConfig(timeout=60.0, ssl=True, proxy=None, http_backend=None, retry=RequestRetryConfig(enabled=False, attempts=3, backoff=<BackoffStrategy.EXPONENTIAL_JITTER: 'exponential_jitter'>, base_delay=0.5, max_delay=30.0, statuses=(500, 502, 503, 504, 522, 524, 408, 429), exceptions=(<class 'TimeoutError'>, )), rate_limit=RateLimitConfig(enabled=False, group_by=None, default_interval=0.0, cleanup_timeout=60.0, adaptive=None))[source]

Bases: object

HTTP session settings shared by every request.

Parameters:
  • timeout (float) – Request timeout in seconds

  • ssl (ssl.SSLContext | bool) – SSL handling; bool toggles verification, SSLContext can carry custom CAs

  • proxy (str | dict[str, str | None] | None) – Default proxy passed to the HTTP client

  • http_backend (HttpBackend | None) – Force aiohttp/httpx; None lets the factory auto-detect

  • retry (RequestRetryConfig) – Controls built-in retry middleware behaviour

  • rate_limit (RateLimitConfig) – Controls built-in rate limiting behaviour

class aioscraper.config.models.RequestRetryConfig(enabled=False, attempts=3, backoff=BackoffStrategy.EXPONENTIAL_JITTER, base_delay=0.5, max_delay=30.0, statuses=(500, 502, 503, 504, 522, 524, 408, 429), exceptions=(<class 'TimeoutError'>, ))[source]

Bases: object

Retry behaviour applied by the built-in retry middleware.

Parameters:
  • enabled (bool) – Toggle retries on or off.

  • attempts (int) – Maximum number of retry attempts per request.

  • backoff (BackoffStrategy) – Backoff strategy for retries.

  • base_delay (float) – Base delay between retries in seconds.

  • max_delay (float) – Maximum delay between retries in seconds.

  • statuses (tuple[int, ...]) – HTTP status codes that should trigger a retry.

  • exceptions (tuple[type[BaseException], ...]) – Exception types that should trigger a retry.
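
The statuses, exceptions, and attempts parameters above together decide whether a failed request is retried. A minimal sketch of that decision, using the documented defaults, follows. The function name should_retry and the choice to count attempts as retries already performed are assumptions; the built-in middleware may count differently.

```python
from typing import Optional, Tuple, Type

DEFAULT_STATUSES = (500, 502, 503, 504, 522, 524, 408, 429)
DEFAULT_EXCEPTIONS = (TimeoutError,)


def should_retry(
    retries_done: int,
    *,
    status: Optional[int] = None,
    exc: Optional[BaseException] = None,
    attempts: int = 3,
    statuses: Tuple[int, ...] = DEFAULT_STATUSES,
    exceptions: Tuple[Type[BaseException], ...] = DEFAULT_EXCEPTIONS,
) -> bool:
    """Return True if a failed request qualifies for another attempt."""
    if retries_done >= attempts:
        return False  # retry budget exhausted
    if exc is not None:
        return isinstance(exc, exceptions)  # retry only listed exception types
    return status in statuses  # retry only listed HTTP statuses
```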

class aioscraper.config.models.SchedulerConfig(concurrent_requests=64, pending_requests=1, close_timeout=0.1, ready_queue_max_size=0)[source]

Bases: object

Configuration for the request scheduler.

Parameters:
  • concurrent_requests (int) – Maximum number of concurrent requests

  • pending_requests (int) – Number of pending requests to maintain

  • close_timeout (float | None) – Timeout for closing scheduler in seconds

  • ready_queue_max_size (int) – Maximum size of the ready queue (0 for unlimited)
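
A cap such as concurrent_requests is commonly enforced with a semaphore. The sketch below shows that general technique with asyncio.Semaphore; whether the scheduler uses a semaphore internally is an assumption, and run_limited and fake_request are hypothetical names.

```python
import asyncio


async def run_limited(n_tasks: int, concurrent_requests: int) -> int:
    """Run n_tasks fake requests and return the peak number in flight."""
    sem = asyncio.Semaphore(concurrent_requests)
    in_flight = 0
    peak = 0

    async def fake_request() -> None:
        nonlocal in_flight, peak
        async with sem:  # at most concurrent_requests holders at a time
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0)  # stand-in for network I/O
            in_flight -= 1

    await asyncio.gather(*(fake_request() for _ in range(n_tasks)))
    return peak


peak = asyncio.run(run_limited(n_tasks=20, concurrent_requests=4))
```

The peak never exceeds the configured limit, no matter how many requests are queued.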

class aioscraper.config.models.ExecutionConfig(timeout=None, shutdown_timeout=0.1, shutdown_check_interval=0.1, log_level=40)[source]

Bases: object

Configuration for execution.

Parameters:
  • timeout (float | None) – Overall execution timeout in seconds

  • shutdown_timeout (float) – Timeout for graceful shutdown in seconds

  • log_level (int) – Log level for timeout events (e.g., logging.ERROR, logging.WARNING). Defaults to logging.ERROR.

  • shutdown_check_interval (float) – Interval between shutdown checks in seconds

class aioscraper.config.models.PipelineConfig(strict=True)[source]

Bases: object

Configuration for pipelines.

Parameters:

strict (bool) – Raise an exception if a pipeline for an item is missing

class aioscraper.config.models.HttpBackend(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: StrEnum

class aioscraper.config.models.BackoffStrategy(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: StrEnum

Backoff strategy for retries.

CONSTANT

Constant backoff

LINEAR

Linear backoff

EXPONENTIAL

Exponential backoff

EXPONENTIAL_JITTER

Exponential backoff with jitter
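
The four strategies can be sketched in terms of the base_delay and max_delay settings of RequestRetryConfig. The formulas below are the textbook forms and are assumptions, not the library's exact arithmetic; in particular, the jitter variant shown is "full jitter" (a uniform draw between zero and the exponential delay). Only the string 'exponential_jitter' appears in this reference; the other strategy strings are assumed by analogy.

```python
import random


def backoff_delay(strategy: str, attempt: int,
                  base_delay: float = 0.5, max_delay: float = 30.0) -> float:
    """Return the delay in seconds before retry number `attempt` (1-based)."""
    if strategy == "constant":
        delay = base_delay
    elif strategy == "linear":
        delay = base_delay * attempt
    elif strategy == "exponential":
        delay = base_delay * 2 ** (attempt - 1)
    elif strategy == "exponential_jitter":
        # Full jitter: draw uniformly between 0 and the capped exponential delay.
        cap = min(max_delay, base_delay * 2 ** (attempt - 1))
        return random.uniform(0.0, cap)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return min(max_delay, delay)  # never wait longer than max_delay
```

Jitter spreads retries from many concurrent requests over time, so they do not all hit a recovering server at the same instant.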

class aioscraper.config.models.RateLimitConfig(enabled=False, group_by=None, default_interval=0.0, cleanup_timeout=60.0, adaptive=None)[source]

Bases: object

Configuration for rate limiting.

Parameters:
  • enabled (bool) – Toggle rate limiting on or off.

  • group_by (Callable[[Request], tuple[Hashable, float]] | None) – Function to group requests by.

  • default_interval (float) – Default interval for group.

  • cleanup_timeout (float) – Timeout in seconds before cleaning up an idle request group.

  • adaptive (AdaptiveRateLimitConfig | None) – Adaptive rate limiting configuration (EWMA + AIMD).

class aioscraper.config.models.AdaptiveRateLimitConfig(min_interval=0.001, max_interval=5.0, increase_factor=2.0, decrease_step=0.01, success_threshold=5, ewma_alpha=0.3, respect_retry_after=True, inherit_retry_triggers=True, custom_trigger_statuses=(), custom_trigger_exceptions=())[source]

Bases: object

Configuration for adaptive rate limiting using EWMA + AIMD.

Adaptively adjusts request intervals based on server response patterns. Uses EWMA (Exponentially Weighted Moving Average) for latency tracking and AIMD (Additive Increase Multiplicative Decrease) for interval adjustment.

Parameters:
  • min_interval (float) – Minimum allowed interval between requests (seconds).

  • max_interval (float) – Maximum allowed interval between requests (seconds).

  • increase_factor (float) – Multiplicative factor for interval increase on failure (must be > 1.0).

  • decrease_step (float) – Additive step for interval decrease on success (seconds).

  • success_threshold (int) – Number of consecutive successes before decreasing interval.

  • ewma_alpha (float) – EWMA smoothing factor for latency (0 < alpha <= 1, higher = more weight to recent).

  • respect_retry_after (bool) – Whether to use Retry-After header as interval override.

  • inherit_retry_triggers (bool) – Whether to use RequestRetryConfig statuses/exceptions as triggers.

  • custom_trigger_statuses (tuple[int, ...]) – Additional HTTP statuses to trigger adaptive slowdown.

  • custom_trigger_exceptions (tuple[type[BaseException], ...]) – Additional exception types to trigger adaptive slowdown.

aioscraper.config.loader.load_config()[source]

Load configuration from environment variables.

Reads configuration from environment variables prefixed with SESSION, SCHEDULER, EXECUTION, and PIPELINE. When parameters are None, values are read from corresponding environment variables. Defaults are used when env vars are not set.

Returns:

Complete configuration object with all settings resolved.

Return type:

Config

Session

class aioscraper.core.session.base.BaseRequestContextManager(request)[source]

Bases: ABC

Asynchronous context manager that encapsulates request execution lifecycle.

Parameters:

request (Request)

abstract async __aenter__()[source]

Send the HTTP request and return a populated Response.

Return type:

Response

async __aexit__(exc_type, exc_val, exc_tb)[source]

Tear down resources registered in the exit stack when the request finishes.

Parameters:
  • exc_type (type[BaseException] | None)

  • exc_val (BaseException | None)

  • exc_tb (TracebackType | None)

class aioscraper.core.session.base.BaseSession[source]

Bases: ABC

Abstract base class for HTTP sessions.

abstract async close()[source]

Close the session and release all resources.

This method should be called after finishing work with the session to properly release resources.

abstract make_request(request)[source]

Build a context manager responsible for executing the request.

Parameters:

request (Request)

Return type:

BaseRequestContextManager

aioscraper.core.session.factory.get_sessionmaker(config)[source]

Return a factory that builds a session using the chosen or available HTTP backend.

Parameters:

config (SessionConfig)

Return type:

Callable[[], BaseSession]

class aioscraper.types.session.Request(*, url, method=<HTTPMethod.GET>, params=None, data=None, json_data=None, files=None, cookies=None, headers=None, auth=None, proxy=None, proxy_auth=None, proxy_headers=None, timeout=None, allow_redirects=True, max_redirects=10, delay=None, priority=0, callback=None, cb_kwargs=<factory>, errback=None, state=<factory>)[source]

Bases: object

Represents an HTTP request with all its parameters.

Parameters:
  • url (str) – Target URL

  • method (str) – HTTP method

  • params (QueryParams | None) – URL query parameters

  • data (Any) – Request body data

  • files (RequestFiles | None) – Multipart files mapping

  • json_data (Any) – JSON data to be sent in the request body

  • cookies (RequestCookies | None) – Request cookies

  • headers (RequestHeaders | None) – Request headers

  • auth (BasicAuth | None) – Basic authentication credentials

  • proxy (str | None) – Proxy URL (per-request proxies are honored only by the aiohttp backend)

  • proxy_auth (BasicAuth | None) – Proxy authentication credentials

  • proxy_headers (RequestHeaders | None) – Proxy headers

  • timeout (float | None) – Request timeout in seconds

  • allow_redirects (bool) – Whether to follow HTTP redirects

  • max_redirects (int) – Maximum number of redirects to follow

  • delay (float | None) – Delay before sending the request

  • priority (int) – Priority of the request

  • callback (Callable[..., Awaitable] | None) – Async callback function to be called after successful request

  • cb_kwargs (dict[str, Any]) – Keyword arguments for the callback function

  • errback (Callable[..., Awaitable] | None) – Async error callback function

  • state (dict[str, Any]) – State for middlewares

class aioscraper.types.session.Response(url, method, status, headers, cookies, read)[source]

Bases: object

Represents an HTTP response with all its components.

Parameters:
  • url (str)

  • method (str)

  • status (int)

  • headers (Mapping[str, str])

  • cookies (SimpleCookie)

  • read (Callable[[], Awaitable[bytes]])

property cookies: SimpleCookie

Parsed response cookies.

get_encoding()[source]

Resolve response encoding from the Content-Type header.

Parses the Content-Type header for a charset parameter. Returns "utf-8" as a safe default if no charset is found or if the charset is invalid.

Returns:

Detected charset or "utf-8" as a safe default.

Return type:

str
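
The charset lookup described above can be sketched with the standard library alone. This is one plausible way to implement the behaviour, not the library's code; resolve_encoding is a hypothetical name, and validating the charset with codecs.lookup is an assumption about what "invalid" means.

```python
import codecs
from email.message import Message
from typing import Optional


def resolve_encoding(content_type: Optional[str], default: str = "utf-8") -> str:
    """Return the charset named in a Content-Type value, or the default."""
    if not content_type:
        return default
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_param("charset")  # None when the parameter is absent
    if not isinstance(charset, str):
        return default
    try:
        codecs.lookup(charset)  # reject names Python cannot decode with
    except LookupError:
        return default
    return charset
```

For example, resolve_encoding("text/html; charset=ISO-8859-1") returns "ISO-8859-1", while a header without a charset parameter falls back to "utf-8".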

property headers: Mapping[str, str]

Response headers.

async json(*, encoding=None, loads=<function loads>)[source]

Read and decode the JSON response.

Parameters:
  • encoding (str | None)

  • loads (Callable[[str], Any])

Return type:

Any

property method: str

HTTP method used.

property ok: bool

True if the status code is less than 400, otherwise False.

async read()[source]

Read response payload.

Return type:

bytes

property status: int

HTTP status code.

async text(encoding='utf-8', errors='strict')[source]

Read response payload and decode.

Parameters:
  • encoding (str | None)

  • errors (str)

Return type:

str

property url: str

Final URL of the response.

class aioscraper.types.session.BasicAuth[source]

Bases: TypedDict

class aioscraper.types.session.File(name, value, content_type)[source]

Bases: NamedTuple

Parameters:
  • name (str)

  • value (Any)

  • content_type (str | None)

content_type: str | None

Alias for field number 2

name: str

Alias for field number 0

value: Any

Alias for field number 1

Pipeline

class aioscraper.core.pipeline.PipelineDispatcher(config, pipelines, global_middleware_factories=None, dependencies=None)[source]

Bases: object

Routes items through the registered pipeline chain.

Parameters:
  • config (PipelineConfig)

  • pipelines (Mapping[Any, PipelineContainer])

  • global_middleware_factories (list[Callable[[...], GlobalPipelineMiddleware[Any]]] | None)

  • dependencies (Mapping[str, Any] | None)

async close()[source]

Closes all pipelines.

Calls the close() method for each pipeline in the system.

async put_item(item)[source]

Dispatches an item through the pipeline.

Parameters:

item (PipelineItemType)

Return type:

PipelineItemType

class aioscraper.types.pipeline.Pipeline(*args, **kwargs)[source]

Bases: Protocol[PipelineItemType]

Protocol for callables that accept an item and return the processed item.

async __call__(item)[source]

Call self as a function.

Parameters:

item (PipelineItemType)

Return type:

PipelineItemType

class aioscraper.types.pipeline.BasePipeline(*args, **kwargs)[source]

Bases: Protocol[PipelineItemType]

Interface for classes that process scraped items of a specific type.

async close()[source]

Close the pipeline.

This method is called when the pipeline is no longer needed. It can be overridden to perform any necessary cleanup operations.

async put_item(item)[source]

Process an item and return it (mutated or replaced).

This method must be implemented by all concrete pipeline classes.

Parameters:

item (PipelineItemType)

Return type:

PipelineItemType

class aioscraper.types.pipeline.PipelineMiddleware(*args, **kwargs)[source]

Bases: Protocol[PipelineItemType]

Async hook used before or after pipeline execution; must return the item.

async __call__(item)[source]

Call self as a function.

Parameters:

item (PipelineItemType)

Return type:

PipelineItemType

class aioscraper.types.pipeline.GlobalPipelineMiddleware(*args, **kwargs)[source]

Bases: Protocol[PipelineItemType]

Wrapper invoked around the entire pipeline chain for every item type.

async __call__(handler, item)[source]

Call self as a function.

Parameters:
  • handler (Pipeline)

  • item (PipelineItemType)

Return type:

PipelineItemType

Middlewares

class aioscraper.middlewares.retry.RetryMiddleware(config, send_request)[source]

Bases: object

Request middleware that retries failed requests based on configuration.

Execution

class aioscraper.core.executor.ScraperExecutor(config, scrapers, dependencies, middleware_holder, pipeline_dispatcher, sessionmaker)[source]

Bases: object

Executes scrapers and manages the scraping process.

This class is responsible for running scraper functions, managing the request scheduler, and handling the graceful shutdown of the scraping process.

async close()[source]

Close all resources and cleanup.

async run()[source]

Start the scraping process.

class aioscraper.core.request_manager.RequestManager(scheduler_config, rate_limit_config, retry_config, shutdown_check_interval, sessionmaker, dependencies, middleware_holder)[source]

Bases: object

Manages HTTP requests with priority queuing, rate limiting, and middleware support.

Parameters:
  • scheduler_config (SchedulerConfig) – Configuration for the request scheduler.

  • rate_limit_config (RateLimitConfig) – Configuration for the request rate limiter.

  • retry_config (RequestRetryConfig) – Configuration for request retries.

  • shutdown_check_interval (float) – Interval between shutdown checks in seconds

  • sessionmaker (SessionMaker) – A factory for creating session objects.

  • dependencies (dict[str, Any]) – Additional dependencies to be injected into middleware and callbacks.

  • middleware_holder (MiddlewareHolder) – A container for middleware collections.

async close()[source]

Close the underlying session.

class aioscraper.core.rate_limiter.RateLimitManager(config, retry_config, schedule)[source]

Bases: object

Manages rate limiting for requests using group-based throttling.

Requests are grouped by a configurable key (default: hostname) and processed with a specified interval between requests in each group. Groups are created dynamically and cleaned up automatically after inactivity.

Parameters:
  • config (RateLimitConfig) – Rate limiting configuration including grouping strategy and intervals.

  • retry_config (RequestRetryConfig) – Retry configuration for inheriting trigger conditions.

  • schedule (Callable[[PRequest], Awaitable[Any]]) – Callback function to schedule request execution.

property active: bool

Check if any request groups have pending requests.

async close()[source]

Close all request groups and clean up resources.

get_group_key(request)[source]

Get group key for a request.

Parameters:

request (Request)

Return type:

Hashable

on_request_outcome(outcome)[source]

Handle request outcome and adjust group interval adaptively.

Parameters:

outcome (RequestOutcome)
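
Group-based throttling with the hostname as the default key can be sketched without any real networking. The example below only computes when each request would be sent so that requests to the same host stay at least one interval apart; hostname_key and schedule_times are hypothetical names, and the real manager works with queues and worker tasks rather than a precomputed plan.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlsplit


def hostname_key(url: str) -> str:
    """Group key: the hostname of the request URL."""
    return urlsplit(url).hostname or ""


def schedule_times(urls: List[str], interval: float,
                   start: float = 0.0) -> List[Tuple[str, float]]:
    """Give each URL the earliest send time that respects its group's interval."""
    next_free: Dict[str, float] = {}  # group key -> next allowed send time
    plan = []
    for url in urls:
        key = hostname_key(url)
        send_at = next_free.get(key, start)  # groups are created on first use
        next_free[key] = send_at + interval
        plan.append((url, send_at))
    return plan


plan = schedule_times(
    ["https://a.example/1", "https://b.example/1", "https://a.example/2"],
    interval=0.5,
)
```

Requests to different hosts are not delayed by each other; only the second request to a.example waits for the interval.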

class aioscraper.core.rate_limiter.RequestGroup(key, interval, cleanup_timeout, schedule, on_finished)[source]

Bases: object

Manages a group of requests that share the same rate limit interval.

Each group processes requests sequentially with a configured delay between them. Groups automatically clean up after a period of inactivity.

Parameters:
  • key (Hashable) – Unique identifier for this request group.

  • interval (float) – Delay in seconds between processing requests in this group.

  • cleanup_timeout (float) – Timeout in seconds before cleaning up an idle group.

  • schedule (Callable[[PRequest], Awaitable[None]]) – Callback function to schedule request execution.

  • on_finished (Callable[[Hashable, RequestGroup], None]) – Callback invoked when the group finishes or becomes idle.

property active: bool

Check if the group has pending requests in its queue.

async close()[source]

Cancel the worker task and wait for graceful shutdown.

property interval: float

Get the current interval for this group.

async put(pr)[source]

Add a request to this group’s processing queue.

Parameters:

pr (PRequest)

set_intervals(interval, cleanup_timeout)[source]

Update group interval and cleanup timeout.

Parameters:
  • interval (float)

  • cleanup_timeout (float)

class aioscraper.core.rate_limiter.AdaptiveStrategy(*, min_interval=0.001, max_interval=5.0, increase_factor=2.0, decrease_step=0.01, success_threshold=5, ewma_alpha=0.3, trigger_statuses=(429, 500, 502, 503, 504, 522, 524, 408), trigger_exceptions=(<class 'TimeoutError'>, ), respect_retry_after=True)[source]

Bases: object

EWMA + AIMD adaptive rate limiting strategy.

Fast multiplicative increase on overload (server pushback). Slow additive decrease on sustained success (probing for capacity).

Parameters:
  • enabled (bool) – Enable adaptive rate limiting.

  • min_interval (float) – Minimum allowed interval (seconds).

  • max_interval (float) – Maximum allowed interval (seconds).

  • increase_factor (float) – Multiplicative factor for interval increase on failure.

  • decrease_step (float) – Additive step for interval decrease on success.

  • success_threshold (int) – Number of consecutive successes before decreasing interval.

  • ewma_alpha (float) – Smoothing factor for latency EWMA (0 < alpha <= 1).

  • trigger_statuses (tuple[int, ...]) – HTTP statuses that trigger adaptive slowdown.

  • trigger_exceptions (tuple[type[BaseException], ...]) – Exception types that trigger adaptive slowdown.

  • respect_retry_after (bool) – Whether to use Retry-After header as override.

calculate_interval(group_key, current_interval, outcome)[source]

Calculate new interval based on request outcome.

Algorithm:
  • On failure: interval = min(max_interval, interval * increase_factor)

  • On success: if success_count >= threshold, interval = max(min_interval, interval - decrease_step)

  • Retry-After override: use the header value if present and enabled

Returns:

New interval in seconds.

Parameters:
  • group_key (Hashable)

  • current_interval (float)

  • outcome (RequestOutcome)

Return type:

float
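
The algorithm described for calculate_interval() can be sketched as a small standalone class. Outcome is a hypothetical, trimmed-down stand-in for RequestOutcome, and three details are assumptions: the Retry-After value is applied unclamped, it takes precedence over the AIMD result, and the success streak restarts after each decrease.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Outcome:
    """Hypothetical stand-in for RequestOutcome."""
    failed: bool
    retry_after: Optional[float] = None


@dataclass
class AimdSketch:
    min_interval: float = 0.001
    max_interval: float = 5.0
    increase_factor: float = 2.0
    decrease_step: float = 0.01
    success_threshold: int = 5
    respect_retry_after: bool = True
    success_count: int = 0  # consecutive successes since the last change

    def calculate_interval(self, current: float, outcome: Outcome) -> float:
        # Assumption: an enabled Retry-After value overrides the AIMD result.
        if self.respect_retry_after and outcome.retry_after is not None:
            self.success_count = 0
            return outcome.retry_after
        if outcome.failed:
            self.success_count = 0
            # Multiplicative increase, capped at max_interval.
            return min(self.max_interval, current * self.increase_factor)
        self.success_count += 1
        if self.success_count >= self.success_threshold:
            self.success_count = 0  # assumption: restart the streak
            # Additive decrease, floored at min_interval.
            return max(self.min_interval, current - self.decrease_step)
        return current
```

With the defaults, one failure doubles a 1.0 s interval to 2.0 s, while it takes five consecutive successes to shave off 0.01 s, so the strategy backs off quickly and recovers slowly.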

get_or_create_metrics(group_key)[source]

Get or create metrics for a group.

Parameters:

group_key (Hashable)

Return type:

AdaptiveMetrics

reset_metrics(group_key)[source]

Reset metrics for a group (e.g., on cleanup).

Parameters:

group_key (Hashable)

class aioscraper.core.rate_limiter.RequestOutcome(group_key, latency, retry_after=None, status_code=None, exception_type=None)[source]

Bases: object

Captures the result of a request execution.

Parameters:
  • group_key (Hashable)

  • latency (float)

  • retry_after (float | None)

  • status_code (int | None)

  • exception_type (type[BaseException] | None)

group_key

The RequestGroup key this outcome belongs to.

Type:

Hashable

latency

Request latency in seconds (start to finish).

Type:

float

retry_after

Value from Retry-After header if present.

Type:

float | None

status_code

HTTP status code if applicable.

Type:

int | None

exception_type

Type of exception if one occurred.

Type:

type[BaseException] | None

class aioscraper.core.rate_limiter.AdaptiveMetrics(ewma_latency=0.0, ewma_alpha=0.3, success_count=0, failure_count=0, last_outcome_time=None, last_outcome_success=True, total_requests=0)[source]

Bases: object

Tracks metrics for adaptive rate limiting using EWMA + AIMD.

Parameters:
  • ewma_latency (float)

  • ewma_alpha (float)

  • success_count (int)

  • failure_count (int)

  • last_outcome_time (float | None)

  • last_outcome_success (bool)

  • total_requests (int)

ewma_latency

Exponentially weighted moving average of request latency.

Type:

float

ewma_alpha

Smoothing factor for EWMA (0 < alpha <= 1).

Type:

float

success_count

Consecutive successful requests since last failure.

Type:

int

failure_count

Consecutive failures since last success.

Type:

int

last_outcome_time

Timestamp of last completed request.

Type:

float | None

last_outcome_success

Whether last request was successful.

Type:

bool | None

total_requests

Total number of completed requests in this group.

Type:

int

record_failure(latency=None)[source]

Record a failed request outcome (timeout, error status, etc).

Parameters:

latency (float | None)

record_success(latency)[source]

Record a successful request outcome.

Parameters:

latency (float)

update_latency(latency)[source]

Update EWMA latency with new measurement.

Parameters:

latency (float)
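
The EWMA update behind update_latency() can be sketched with the standard formula, in which alpha is the weight given to the newest sample. The function name update_ewma is hypothetical, and how the library seeds the very first measurement is not covered here.

```python
def update_ewma(previous: float, sample: float, alpha: float = 0.3) -> float:
    """One EWMA step: blend the new sample into the running average."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must satisfy 0 < alpha <= 1")
    return alpha * sample + (1 - alpha) * previous
```

With alpha = 1 the average simply tracks the latest sample; smaller values smooth out latency spikes at the cost of reacting more slowly.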

Holders

class aioscraper.holders.middleware.MiddlewareHolder[source]

Bases: object

Stores request middleware factories in registration order.

__call__(factory)[source]

Decorator form: register a middleware factory.

Parameters:

factory (Callable[[...], RequestMiddleware])

Return type:

Callable[[...], RequestMiddleware]

add(*factories)[source]

Register request middleware factories in order.

Each factory can accept injected dependencies and must return a middleware with signature async def mw(call_next, request): ... which wraps the request handler chain for every request.

Parameters:

factories (Callable[[...], RequestMiddleware])
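
How middlewares with the signature async def mw(call_next, request) wrap a handler chain can be sketched as below. The request is a plain dict here, and build_chain, make_tagger, and send are hypothetical names. Running the first registered middleware outermost is an assumption about what "in order" means.

```python
import asyncio
from typing import Any, Awaitable, Callable, List

Handler = Callable[[dict], Awaitable[Any]]
Middleware = Callable[[Handler, dict], Awaitable[Any]]


def build_chain(handler: Handler, middlewares: List[Middleware]) -> Handler:
    """Wrap handler so the first middleware in the list runs outermost."""
    for mw in reversed(middlewares):
        # Default arguments bind the current mw and handler for this layer.
        def wrapped(request: dict, mw=mw, call_next=handler):
            return mw(call_next, request)
        handler = wrapped
    return handler


def make_tagger(tag: str) -> Middleware:
    """Factory returning a middleware that records when it runs."""
    async def mw(call_next: Handler, request: dict) -> Any:
        request["trace"].append(f"{tag}:before")
        result = await call_next(request)
        request["trace"].append(f"{tag}:after")
        return result
    return mw


async def send(request: dict) -> str:
    request["trace"].append("send")  # stand-in for the real HTTP call
    return "response"


async def demo():
    request = {"url": "https://example.com", "trace": []}
    chain = build_chain(send, [make_tagger("a"), make_tagger("b")])
    return await chain(request), request["trace"]


result, trace = asyncio.run(demo())
```

Each middleware sees the request on the way in and the result on the way out, which is what makes this shape suitable for retries, logging, and header injection.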

class aioscraper.holders.pipeline.PipelineHolder[source]

Bases: object

Keeps pipeline containers and exposes decorator helpers.

__call__(item_type, *args, **kwargs)[source]

Return a decorator that instantiates and registers a pipeline class for the given item type.

Parameters:

item_type (type[PipelineItemType])

Return type:

Callable[[type[BasePipeline[PipelineItemType]]], type[BasePipeline[PipelineItemType]]]

add(item_type, *pipelines)[source]

Add pipelines to process scraped data.

Parameters:
  • item_type (type[PipelineItemType])

  • pipelines (BasePipeline[PipelineItemType])

add_global_middlewares(*factories)[source]

Register global pipeline middleware factories in order.

Each factory can accept injected dependencies and must return a middleware with signature async def mw(handler, item): ... which wraps the entire pipeline chain for every item type.

Parameters:

factories (Callable[[...], GlobalPipelineMiddleware[PipelineItemType]])
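
A global middleware with the signature async def mw(handler, item) receives the whole pipeline chain as handler. The sketch below wires this up by hand with two toy pipelines; Article, StripPipeline, UpperPipeline, run_pipelines, and log_items are hypothetical names, and the real dispatcher does this wiring itself.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Article:
    title: str


class StripPipeline:
    async def put_item(self, item: Article) -> Article:
        item.title = item.title.strip()
        return item


class UpperPipeline:
    async def put_item(self, item: Article) -> Article:
        item.title = item.title.upper()
        return item


PIPELINES = [StripPipeline(), UpperPipeline()]
seen: list = []


async def run_pipelines(item: Article) -> Article:
    """The 'entire pipeline chain': each pipeline receives the previous result."""
    for pipeline in PIPELINES:
        item = await pipeline.put_item(item)
    return item


async def log_items(handler, item):
    """Global middleware: runs around the chain for every item type."""
    seen.append(type(item).__name__)
    return await handler(item)


processed = asyncio.run(log_items(run_pipelines, Article("  hello ")))
```

Because the middleware decides whether and when to call handler, it can also time the chain, swallow errors, or skip processing for some items.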

add_middlewares(middleware_type, item_type, *middlewares)[source]

Add pipeline processing middlewares.

Parameters:
  • middleware_type (Literal['pre', 'post'])

  • item_type (type[PipelineItemType])

  • middlewares (PipelineMiddleware[PipelineItemType])

global_middleware(factory)[source]

Decorator form of add_global_middlewares().

Parameters:

factory (Callable[[...], GlobalPipelineMiddleware[PipelineItemType]])

Return type:

Callable[[...], GlobalPipelineMiddleware[PipelineItemType]]

middleware(middleware_type, item_type)[source]

Return a decorator that registers a pipeline middleware for the given stage.

Parameters:
  • middleware_type (Literal['pre', 'post'])

  • item_type (type[PipelineItemType])

Return type:

Callable[[PipelineMiddleware[PipelineItemType]], PipelineMiddleware[PipelineItemType]]

Exceptions

class aioscraper.exceptions.AIOScraperException[source]

Bases: Exception

Base scraper exception.

class aioscraper.exceptions.ClientException[source]

Bases: AIOScraperException

Base exception class for all client-related errors.

class aioscraper.exceptions.HTTPException(url, method, status_code, headers, message)[source]

Bases: ClientException

Exception raised when an HTTP request fails with a specific status code.

Parameters:
  • status_code (int) – The HTTP status code of the failed request

  • message (str) – Error message describing the failure

  • url (str) – The URL that was being accessed

  • method (str) – The HTTP method used for the request

  • headers (Mapping[str, str]) – Response headers returned by the server

class aioscraper.exceptions.PipelineException[source]

Bases: AIOScraperException

Base exception class for all pipeline-related errors.

class aioscraper.exceptions.StopItemProcessing[source]

Bases: AIOScraperException

Raised by pipeline middlewares to stop processing the current item.

class aioscraper.exceptions.StopMiddlewareProcessing[source]

Bases: AIOScraperException

Stop further pipeline middlewares in the current phase (pre/post).

class aioscraper.exceptions.InvalidRequestData[source]

Bases: AIOScraperException

Raised when request payload fields conflict.

class aioscraper.exceptions.CLIError[source]

Bases: AIOScraperException

Raised when CLI arguments are invalid or cannot be resolved.