CLI

Run scrapers from the command line without wiring up the event loop yourself.

pip install aioscraper
aioscraper scraper

See the minimal code in Quickstart.

Entrypoint contract

The CLI loads a module, given either as a file path or as a dotted module path (module.path), and optionally a specific attribute from it using module:attr.

Entry rules:

  • Without :attr: the CLI looks for an attribute named scraper that is either an AIOScraper instance or a callable returning one.

  • With :attr pointing to an AIOScraper: the CLI uses that instance.

  • With :attr pointing to a callable (sync or async): the CLI calls it, awaits the result if the callable is async, and expects an AIOScraper instance in return.
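The entry rules above can be sketched as a small resolver. This is an illustration under assumptions, not the CLI's actual code: resolve_entrypoint and load_module are hypothetical names, AIOScraper here is a local stand-in class so the sketch runs without aioscraper installed, and treating a target that ends in .py as a file path is an assumed convention.

```python
import importlib
import importlib.util
import inspect
from pathlib import Path
from types import ModuleType


class AIOScraper:
    """Hypothetical stand-in for aioscraper.AIOScraper, used only in this sketch."""


def load_module(target: str) -> ModuleType:
    # Assumed convention: a target ending in ".py" is a file path, anything else a dotted module path.
    if target.endswith(".py"):
        path = Path(target)
        spec = importlib.util.spec_from_file_location(path.stem, str(path))
        if spec is None or spec.loader is None:
            raise ImportError(f"cannot load {target!r}")
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        return module
    return importlib.import_module(target)


async def resolve_entrypoint(target: str) -> AIOScraper:
    # "module:attr" selects an attribute; a bare module falls back to the "scraper" attribute.
    module_name, _, attr = target.partition(":")
    obj = getattr(load_module(module_name), attr or "scraper")

    # Rule: an AIOScraper instance is used as-is.
    if isinstance(obj, AIOScraper):
        return obj
    # Rule: a callable is called; an async factory returns an awaitable, which is awaited.
    if callable(obj):
        result = obj()
        if inspect.isawaitable(result):
            result = await result
        if isinstance(result, AIOScraper):
            return result
    raise TypeError(f"{target!r} did not resolve to an AIOScraper instance")
```

In this sketch a bare name such as scraper is resolved with importlib.import_module, so the module's directory must be on sys.path.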

Examples

aioscraper scraper                   # uses scraper variable from scraper.py
aioscraper mypkg.scraper:custom_app  # uses custom_app AIOScraper instance
aioscraper mypkg.factory:make        # calls make() (sync factory)
aioscraper mypkg.factory:make_async  # awaits make_async() (async factory)

For resource setup/teardown around the same scraper instance, attach a lifespan(scraper) when constructing the scraper in code (see Lifespan).

Running without the CLI

You can run the same scraper programmatically using run_scraper:

import asyncio
from aioscraper import AIOScraper, Request, SendRequest, run_scraper
from aioscraper.config import load_config


async def scrape(send_request: SendRequest):
    await send_request(Request(url="https://example.com"))


async def main():
    scraper = AIOScraper(scrape, config=load_config())
    await run_scraper(scraper)


if __name__ == "__main__":
    asyncio.run(main())

This gives you the same signal handling and graceful shutdown behavior as the CLI. run_scraper expects scraper.config to be set ahead of time, which is why the example passes config=load_config() to the constructor.

Configuration

Configuration precedence (when the CLI needs to load a config): CLI flags -> environment variables -> Config defaults. If the resolved AIOScraper already has config set, the CLI leaves it untouched and CLI flags/env vars are ignored.
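The precedence rule and the "config already set" rule can be illustrated with a short sketch. resolve_setting, apply_cli_config and FakeScraper are hypothetical names, the config is modeled as a plain dict rather than the real Config class, and the default value is supplied by the caller instead of being read from Config.

```python
import os
from dataclasses import dataclass
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def resolve_setting(cli_value: Optional[T], env_name: str, default: T, cast: Callable[[str], T]) -> T:
    # 1. A CLI flag that was actually passed (not None) wins.
    if cli_value is not None:
        return cli_value
    # 2. Otherwise use the environment variable, converted from its string form.
    raw = os.environ.get(env_name)
    if raw is not None:
        return cast(raw)
    # 3. Otherwise fall back to the default.
    return default


@dataclass
class FakeScraper:
    """Hypothetical stand-in that only records whether a config is attached."""

    config: Optional[dict] = None


def apply_cli_config(scraper: FakeScraper, cli_concurrent: Optional[int], default: int) -> dict:
    # A scraper that already carries a config keeps it; flags and env vars are ignored.
    if scraper.config is None:
        scraper.config = {
            "concurrent_requests": resolve_setting(
                cli_concurrent, "SCHEDULER_CONCURRENT_REQUESTS", default, int
            )
        }
    return scraper.config
```

The `is not None` check distinguishes "flag not passed" from a falsy value that was passed on purpose.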

See Configuration for detailed configuration options and examples.

CLI flags

  • --concurrent-requests: Maximum number of concurrent requests (overrides SCHEDULER_CONCURRENT_REQUESTS).

  • --pending-requests: Number of pending requests to keep queued (overrides SCHEDULER_PENDING_REQUESTS).

Environment variables

All environment variables map directly to fields in Config and its nested configuration classes. The CLI reads these variables automatically. For programmatic use, call load_config to read environment variables and construct a Config instance.

SessionConfig

HTTP session and client behavior.

  • SESSION_REQUEST_TIMEOUT → timeout

  • SESSION_SSL → ssl

  • SESSION_PROXY → proxy (docs)

  • SESSION_HTTP_BACKEND → http_backend

RequestRetryConfig

Retry middleware behavior (docs).

  • SESSION_RETRY_ENABLED → enabled

  • SESSION_RETRY_ATTEMPTS → attempts

  • SESSION_RETRY_BACKOFF → backoff

  • SESSION_RETRY_BASE_DELAY → base_delay

  • SESSION_RETRY_MAX_DELAY → max_delay

  • SESSION_RETRY_STATUSES → statuses

  • SESSION_RETRY_EXCEPTIONS → exceptions
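The retry fields can be read against a common pattern, capped exponential backoff. The sketch below is a generic illustration and rests on several assumptions: that backoff selects an exponential strategy, that attempts counts retries made after the first request, and that only the listed statuses trigger a retry (retrying on exceptions is left out). backoff_delay, fetch_with_retry and the fetch callable are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, FrozenSet


def backoff_delay(retry: int, base_delay: float, max_delay: float) -> float:
    # Capped exponential backoff: retry 1 waits base_delay, retry 2 twice that,
    # retry 3 four times that, and so on, never more than max_delay.
    return min(max_delay, base_delay * 2 ** (retry - 1))


async def fetch_with_retry(
    fetch: Callable[[], Awaitable[int]],  # hypothetical request returning an HTTP status code
    attempts: int,
    statuses: FrozenSet[int],
    base_delay: float,
    max_delay: float,
) -> int:
    status = await fetch()
    # Assumption: attempts counts retries made after the first request.
    for retry in range(1, attempts + 1):
        if status not in statuses:
            break  # success or a non-retryable status: stop retrying
        await asyncio.sleep(backoff_delay(retry, base_delay, max_delay))
        status = await fetch()
    return status
```

With base_delay=0.5 and max_delay=3.0 the waits are 0.5, 1.0, 2.0, then 3.0 for every later retry.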

RateLimitConfig

Rate limiting behavior (docs).

  • SESSION_RATE_LIMIT_ENABLED → enabled

  • SESSION_RATE_LIMIT_INTERVAL → default_interval

  • SESSION_RATE_LIMIT_CLEANUP_TIMEOUT → cleanup_timeout

AdaptiveRateLimitConfig

Adaptive rate limiting (EWMA + AIMD) (docs).

Set SESSION_RATE_LIMIT_ADAPTIVE_ENABLED=true to enable adaptive rate limiting, then tune it with the variables below.

  • SESSION_RATE_LIMIT_ADAPTIVE_MIN_INTERVAL → min_interval

  • SESSION_RATE_LIMIT_ADAPTIVE_MAX_INTERVAL → max_interval

  • SESSION_RATE_LIMIT_ADAPTIVE_INCREASE_FACTOR → increase_factor

  • SESSION_RATE_LIMIT_ADAPTIVE_DECREASE_STEP → decrease_step

  • SESSION_RATE_LIMIT_ADAPTIVE_SUCCESS_THRESHOLD → success_threshold

  • SESSION_RATE_LIMIT_ADAPTIVE_EWMA_ALPHA → ewma_alpha

  • SESSION_RATE_LIMIT_ADAPTIVE_RESPECT_RETRY_AFTER → respect_retry_after

  • SESSION_RATE_LIMIT_ADAPTIVE_INHERIT_RETRY_TRIGGERS → inherit_retry_triggers
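EWMA and AIMD are standard techniques, and the sketch below shows one plausible way the parameters above could drive them. AIMD is applied to the delay between requests rather than to the rate: on a failure the interval is multiplied by increase_factor, which cuts the request rate multiplicatively, and after success_threshold consecutive successes it shrinks by decrease_step. Separately, an EWMA smooths observed latency. AdaptiveInterval and its methods are hypothetical, respect_retry_after and inherit_retry_triggers are not modeled, and aioscraper's actual algorithm may combine these pieces differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AdaptiveInterval:
    """Hypothetical AIMD controller for the delay between requests (not aioscraper's code)."""

    interval: float
    min_interval: float
    max_interval: float
    increase_factor: float
    decrease_step: float
    success_threshold: int
    ewma_alpha: float
    latency_ewma: Optional[float] = None
    successes: int = 0  # consecutive successes since the last adjustment

    def observe_latency(self, latency: float) -> None:
        # EWMA: new = alpha * sample + (1 - alpha) * old; the first sample seeds the average.
        if self.latency_ewma is None:
            self.latency_ewma = latency
        else:
            self.latency_ewma = self.ewma_alpha * latency + (1 - self.ewma_alpha) * self.latency_ewma

    def on_failure(self) -> None:
        # Multiplicative step: widen the interval sharply, capped at max_interval.
        self.interval = min(self.max_interval, self.interval * self.increase_factor)
        self.successes = 0

    def on_success(self) -> None:
        # Additive step: after success_threshold successes in a row, narrow the interval
        # by decrease_step, never below min_interval.
        self.successes += 1
        if self.successes >= self.success_threshold:
            self.interval = max(self.min_interval, self.interval - self.decrease_step)
            self.successes = 0
```

The asymmetry is the point of AIMD: the limiter backs off quickly when the server pushes back and recovers only gradually.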

SchedulerConfig

Request scheduler behavior.

  • SCHEDULER_CONCURRENT_REQUESTS → concurrent_requests

  • SCHEDULER_PENDING_REQUESTS → pending_requests

  • SCHEDULER_CLOSE_TIMEOUT → close_timeout

  • SCHEDULER_READY_QUEUE_MAX_SIZE → ready_queue_max_size
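concurrent_requests caps how many requests are in flight at once. A common way to enforce such a cap in asyncio is a semaphore, sketched below. run_bounded is a hypothetical helper, the sketch models only concurrent_requests (not pending_requests, close_timeout or ready_queue_max_size), and aioscraper's scheduler may enforce the limit differently.

```python
import asyncio
from typing import Awaitable, Callable, Iterable


async def run_bounded(jobs: Iterable[Callable[[], Awaitable[None]]], concurrent_requests: int) -> int:
    """Run every job with at most concurrent_requests in flight; return the peak seen."""
    semaphore = asyncio.Semaphore(concurrent_requests)
    in_flight = 0
    peak = 0

    async def guarded(job: Callable[[], Awaitable[None]]) -> None:
        nonlocal in_flight, peak
        async with semaphore:  # waits here while concurrent_requests jobs are already running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                await job()
            finally:
                in_flight -= 1

    await asyncio.gather(*(guarded(job) for job in jobs))
    return peak
```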

ExecutionConfig

Execution and shutdown behavior.

  • EXECUTION_TIMEOUT → timeout

  • EXECUTION_SHUTDOWN_TIMEOUT → shutdown_timeout

  • EXECUTION_SHUTDOWN_CHECK_INTERVAL → shutdown_check_interval

  • EXECUTION_LOG_LEVEL → log_level
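The names shutdown_timeout and shutdown_check_interval suggest a graceful shutdown that polls for in-flight work. The sketch below assumes that reading: it waits up to shutdown_timeout, checking every shutdown_check_interval, and then cancels whatever is still running. graceful_shutdown is a hypothetical helper, and EXECUTION_TIMEOUT is not modeled.

```python
import asyncio
from typing import List


async def graceful_shutdown(tasks: List[asyncio.Task], shutdown_timeout: float, check_interval: float) -> int:
    """Wait for tasks to finish, then cancel stragglers; return how many were cancelled."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + shutdown_timeout
    # Poll every check_interval until everything is done or the deadline passes.
    while loop.time() < deadline and not all(task.done() for task in tasks):
        await asyncio.sleep(check_interval)
    leftovers = [task for task in tasks if not task.done()]
    for task in leftovers:
        task.cancel()
    # Wait for the cancellations to be processed; CancelledError is collected, not raised.
    await asyncio.gather(*leftovers, return_exceptions=True)
    return len(leftovers)
```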

PipelineConfig

Pipeline dispatching behavior.

  • PIPELINE_STRICT → strict