Archive.org Link Grabber (Firefox WebExtension)

Add-on for Firefox that enhances archive.org download pages (https://archive.org/download/*) so you can filter file links (by type, name, size, or date), copy them to the clipboard (one URL per line), or send them directly to an aria2 RPC server.

Status: spec-first README. The implementation details below describe the intended behavior and structure.


Features

  • Filter by file attributes: type/extension, name, size range, and date range.
  • String or regex matching: name and type filters accept plain text (e.g., mp4) or regular expressions (e.g., ^(movie|clip).*\.mp4$).
  • Copy to clipboard: export filtered links, one per line.
  • Send to aria2: forward links to a configured aria2 JSON-RPC endpoint using a secret token.
  • Per-site parsing: targets archive.org collection download listings under /download/*.
  • Persistent settings: stores filter presets and aria2 config in extension storage.

Demo Workflow

  1. Open an archive.org download page at https://archive.org/download/<identifier>.
  2. Open the extension popup or page action.
  3. Filter:
    • Type/Name: enter plain strings (e.g., mp4, subtitle) or regex.
    • Size: set min/max (e.g., >= 100MB, <= 2GB).
    • Date: set from/to (uses the timestamp shown on the page when available).
  4. Review the results list and count.
  5. Choose an action:
    • Copy: copies selected URLs to the clipboard, one per line.
    • Send to aria2: pushes to your configured aria2 RPC server using aria2.addUri.
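The filter-then-copy flow above can be sketched as two small pure functions. This is a minimal illustration rather than the extension's actual code: the entry shape (`name`, `url`, `size` in bytes, `date` as a `Date`) and the names `filterEntries` and `toClipboardText` are assumptions made for the example.

```javascript
// Hypothetical entry shape: { name, url, size (bytes), date (Date) }.
// Every filter field is optional; an omitted field does not restrict the result.
function filterEntries(entries, { type, minSize, maxSize, from, to } = {}) {
  return entries.filter((e) => {
    // Derive the extension from the filename ("" when there is none).
    const ext = e.name.includes(".") ? e.name.split(".").pop().toLowerCase() : "";
    if (type && !ext.includes(type.toLowerCase())) return false;
    if (minSize != null && e.size < minSize) return false;
    if (maxSize != null && e.size > maxSize) return false;
    if (from && e.date < from) return false;
    if (to && e.date > to) return false;
    return true;
  });
}

// One URL per line, as described for the Copy action.
function toClipboardText(entries) {
  return entries.map((e) => e.url).join("\n");
}
```

In a popup, the resulting string could then be passed to `navigator.clipboard.writeText` from a button click handler.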

Regex and Matching

  • Plain strings: match anywhere in the value (case-insensitive by default, configurable).
  • Regex: either toggle a “Use regex” option or enter values wrapped with /.../ (optional flags like i, m).
  • Type vs name: “type” typically refers to file extension; “name” is the full filename.

Examples:

  • Type contains mp4
  • Name regex /^(movie|clip).*\.mp4$/i
  • Type regex /^(mp4|mkv)$/
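One way the string-or-regex rule could be implemented is sketched below. The function name `makeMatcher` and its `caseSensitive` option are hypothetical; the sketch only covers the `/.../flags` convention, not the separate “Use regex” toggle.

```javascript
// Turn a filter value into a predicate over strings.
// "/pattern/flags" is treated as a regex; anything else is a substring match.
function makeMatcher(input, { caseSensitive = false } = {}) {
  const m = /^\/(.+)\/([gimsuy]*)$/.exec(input);
  if (m) {
    // Drop g and y: they make RegExp.prototype.test stateful across calls.
    const flags = m[2].replace(/[gy]/g, "");
    const re = new RegExp(m[1], flags); // throws SyntaxError on an invalid pattern
    return (value) => re.test(value);
  }
  const needle = caseSensitive ? input : input.toLowerCase();
  return (value) => (caseSensitive ? value : value.toLowerCase()).includes(needle);
}
```

A UI would likely wrap the `new RegExp` call in a try/catch so that an invalid pattern shows a validation message instead of failing silently.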

Aria2 Integration

  • RPC method: aria2.addUri with token:<SECRET>.
  • Batching: sends multiple links either individually or in small batches.
  • Options (optional): directory, headers, and per-item output name can be supported via the UI.
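As a rough sketch of the request shape, the functions below build JSON-RPC 2.0 payloads for `aria2.addUri` and group them into JSON-RPC batch arrays. The helper names, the default batch size of 10, and the use of the URL as the request `id` are assumptions for illustration; `dir`, `out`, and `header` are standard aria2 options that could be passed in `options`.

```javascript
// Build one JSON-RPC 2.0 request for aria2.addUri.
// params: ["token:<secret>", [uris], options], following the aria2 RPC interface.
function addUriRequest(secret, url, options = {}, id = url) {
  return {
    jsonrpc: "2.0",
    id,
    method: "aria2.addUri",
    params: [`token:${secret}`, [url], options],
  };
}

// Split links into small batches; each batch is a JSON-RPC batch array
// that can be serialized and sent in a single POST.
function buildBatches(secret, urls, { batchSize = 10, options = {} } = {}) {
  const batches = [];
  for (let i = 0; i < urls.length; i += batchSize) {
    const chunk = urls.slice(i, i + batchSize);
    batches.push(chunk.map((u) => addUriRequest(secret, u, options)));
  }
  return batches;
}
```

Each batch could then be sent from the background script with `fetch(endpoint, { method: "POST", body: JSON.stringify(batch) })`. Each link is its own `addUri` call because multiple URIs in one call are treated by aria2 as mirrors of the same file.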

Aria2 must be running with RPC enabled, for example:

aria2c \
  --enable-rpc \
  --rpc-listen-port=6800 \
  --rpc-listen-all=false \
  --rpc-secret=YOUR_SECRET_TOKEN

Extension settings include:

  • RPC endpoint: protocol, host, port, path (default /jsonrpc).
  • Secret token: the --rpc-secret value (stored in extension storage).
  • Optional defaults: download directory and additional aria2 options.
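The endpoint parts could be combined into a single URL roughly as follows. The function name and the `http`, `localhost`, and `6800` defaults are assumptions chosen to match the aria2c example above; only the `/jsonrpc` default path comes from this README.

```javascript
// Assemble the RPC endpoint URL from the individual settings fields.
function buildEndpoint({ protocol = "http", host = "localhost", port = 6800, path = "/jsonrpc" } = {}) {
  const url = new URL(`${protocol}://${host}`);
  url.port = String(port); // a scheme's default port is dropped from the result
  url.pathname = path;
  return url.toString();
}
```

Going through the `URL` class means an invalid host throws early, which a settings form can surface as a validation error.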

Security notes:

  • Keep your secret token private; do not commit it.
  • If using a remote aria2, enable TLS/HTTPS and restrict access.

Permissions

The extension will require:

  • https://archive.org/* host permission to read and parse download pages.
  • storage to persist settings and presets.
  • clipboardWrite to copy links.
  • Host permission for your aria2 endpoint (e.g., http://localhost:6800/* or your remote URL). Optional permissions may be requested at runtime.

Parsing Strategy

  • A content script runs on https://archive.org/download/* pages.
  • It scrapes the file listing table/DOM and builds a dataset with name, URL, size, and date.
  • Type is derived from filename extension (and may use content-type hints if available on the page).
  • The popup UI queries this dataset, applies filters, and displays the results.
  • A background script handles aria2 RPC calls to avoid CORS issues and keep secrets out of content scope.
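The dataset-building step might look like the sketch below. It assumes the content script has already pulled a raw `{ href, sizeText, dateText }` object out of each listing row; that shape, the function name `buildEntry`, and the sample values are hypothetical, and the DOM selectors needed to produce the rows depend on archive.org's markup and are not shown.

```javascript
// Normalize one raw listing row into a dataset entry.
// pageUrl should end with "/" so that relative hrefs resolve under the item.
function buildEntry(rawRow, pageUrl) {
  const url = new URL(rawRow.href, pageUrl); // resolves relative links
  const name = decodeURIComponent(url.pathname.split("/").pop());
  const dot = name.lastIndexOf(".");
  return {
    name,
    url: url.href,
    // Type is the lowercased extension; dotfiles and extensionless names get "".
    type: dot > 0 ? name.slice(dot + 1).toLowerCase() : "",
    sizeText: rawRow.sizeText,
    dateText: rawRow.dateText,
  };
}
```

Size and date are kept as raw text here so that parsing them into numbers and `Date` objects stays a separate, testable step.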

Installation (Temporary in Firefox)

  1. Clone this repository.
  2. In Firefox, open about:debugging#/runtime/this-firefox.
  3. Click “Load Temporary Add-on…” and select the manifest.json in this repo.
  4. Navigate to an archive.org download page and open the extension's popup.

For development, make changes and click “Reload” in about:debugging to pick them up.

Usage Tips

  • Clipboard: copying usually requires a user gesture (click). The UI is designed to perform copy on button press.
  • Case sensitivity: string filters default to case-insensitive; enable case-sensitive mode in settings if needed.
  • Sizes: common suffixes such as KB, MB, and GB are supported. Ranges accept comparisons like >= 100MB.
  • Dates: when the page provides timestamps, filters accept yyyy-mm-dd and ranges.
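A size comparison such as `>= 100MB` could be parsed along these lines. This is only a sketch: the function names are hypothetical, and it assumes binary multiples (1 KB = 1024 bytes) and treats a bare value as an equality test, neither of which this README specifies.

```javascript
// Assumption: binary multiples. Swap in powers of 1000 for decimal units.
const UNITS = { B: 1, KB: 1024, MB: 1024 ** 2, GB: 1024 ** 3 };

// Parse ">= 100MB" into { op, bytes }; a bare "100MB" defaults to "=".
function parseSizeExpr(text) {
  const m = /^\s*(>=|<=|>|<|=)?\s*(\d+(?:\.\d+)?)\s*(B|KB|MB|GB)?\s*$/i.exec(text);
  if (!m) throw new Error(`Unrecognized size filter: ${text}`);
  const unit = (m[3] || "B").toUpperCase();
  return { op: m[1] || "=", bytes: Math.round(parseFloat(m[2]) * UNITS[unit]) };
}

// Test a file size in bytes against a size expression.
function sizeMatches(sizeBytes, expr) {
  const { op, bytes } = parseSizeExpr(expr);
  switch (op) {
    case ">=": return sizeBytes >= bytes;
    case "<=": return sizeBytes <= bytes;
    case ">": return sizeBytes > bytes;
    case "<": return sizeBytes < bytes;
    default: return sizeBytes === bytes;
  }
}
```

A min/max range is then two calls, for example `sizeMatches(size, ">= 100MB") && sizeMatches(size, "<= 2GB")`.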

Troubleshooting

  • No links found: ensure you are on a /download/* page (not the item overview). Try reloading after the page finishes loading.
  • RPC errors: verify aria2c is running with --enable-rpc and that the secret/token matches. Check endpoint URL and port.
  • HTTPS-Only Mode: if your aria2 endpoint is http:// on a non-local host, Firefox may upgrade it to https:// and the request will fail. Use HTTPS on the aria2 RPC (preferred), add a site exception for the host, or disable HTTPS-Only Mode while testing.
  • Host permissions: the extension needs host permission to reach non-local RPC endpoints. The manifest includes wildcard host permissions; if you self-package, ensure your manifest allows your RPC host(s).
  • CORS/Network: Extensions can call cross-origin endpoints with host permission. If using HTTPS with a self-signed cert, allow it in Firefox or use a valid cert.
  • Clipboard blocked: confirm that the browser allowed the clipboard write, then click the button again while the popup or page has focus.

Roadmap

  • Optional per-file aria2 options (e.g., out for renaming).
  • Smart batching and retry logic.
  • Save/load named filter presets.
  • Export/import settings.
  • Support additional archive.org views if needed.

Development Notes

  • Tech stack: standard WebExtension (Manifest V3 where Firefox supports it, otherwise V2) with a content script, a background script or service worker, and a popup UI.
  • Storage: browser.storage.local for settings and aria2 configs; no analytics.
  • Code style: keep dependencies minimal; prefer modern, framework-light UI for the popup.

Contributing

Issues and PRs are welcome. If proposing new filters or aria2 options, please include example pages and expected behaviors.

Disclaimer

This project is not affiliated with archive.org or aria2. Use responsibly and respect site terms of service.

Release Workflow

  • Stable ID: the manifest sets applications.gecko.id = "archive-org-link-grabber@jordanwages.com". For self-hosted updates, applications.gecko.update_url points to https://add-ons.jordanwages.com/archive-org-link-grabber/updates.json.
  • Prepare (version bump + sync):
    • Patch: npm run release:prepare:patch
    • Minor: npm run release:prepare:minor
    • Major: npm run release:prepare:major
  • Lint (Firefox): npm run lint:fx
  • Dev ZIP: npm run build:dev → output in dist/
  • Sign (unlisted):
    • Set environment secrets locally (do not commit): AMO_JWT_ISSUER=... AMO_JWT_SECRET=...
    • Run: npm run release:sign
    • Artifacts land in releases/<version>/
  • Self-hosted updates: releases/updates.json is tracked; update with the new version and update_link like https://add-ons.jordanwages.com/archive-org-link-grabber/releases/<version>/archive-org-link-grabber-<version>.xpi.
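Adding a version to the update manifest could be scripted as in the sketch below. It follows Firefox's standard update manifest layout (`addons` → add-on ID → `updates`), but the function name is hypothetical and nothing here is confirmed to match what the project's release scripts actually do.

```javascript
const ADDON_ID = "archive-org-link-grabber@jordanwages.com";
const RELEASES_BASE = "https://add-ons.jordanwages.com/archive-org-link-grabber/releases";

// Return a new manifest object with an entry for `version` added,
// replacing any existing entry for the same version.
function addUpdate(manifest, version) {
  const addons = { ...(manifest.addons || {}) };
  const current = addons[ADDON_ID]?.updates || [];
  const entry = {
    version,
    update_link: `${RELEASES_BASE}/${version}/archive-org-link-grabber-${version}.xpi`,
  };
  addons[ADDON_ID] = { updates: [...current.filter((u) => u.version !== version), entry] };
  return { ...manifest, addons };
}
```

A release script could read releases/updates.json, pass the parsed object through `addUpdate`, and write the result back with `JSON.stringify(result, null, 2)`.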

Notes: Keep AMO secrets local. CI is optional. You can tag releases with git tag vX.Y.Z and push tags if desired.