archive-org-link-grabber/README.md
wagesj45 c34227e3b6 fix: allow non-local RPC hosts and improve HTTPS-Only guidance
- Add wildcard host permissions for RPC calls
- Surface HTTPS-Only hint in Options test flow
- Update Troubleshooting docs for HTTPS-Only and host perms
2025-08-22 00:22:12 -05:00

# Archive.org Link Grabber (Firefox WebExtension)
Add-on for Firefox that enhances archive.org download pages (https://archive.org/download/*) so you can filter file links (by type, name, size, or date), copy them to the clipboard (one URL per line), or send them directly to an aria2 RPC server.
> Status: README/spec first. Implementation details below describe the intended behavior and structure.
---
## Features
- Filter by file attributes: type/extension, name, size range, and date range.
- String or regex matching: name and type filters accept plain text (e.g., `mp4`) or regular expressions (e.g., `^(movie|clip).*\.mp4$`).
- Copy to clipboard: export filtered links, one per line.
- Send to aria2: forward links to a configured aria2 JSON-RPC endpoint using a secret token.
- Per-site parsing: targets archive.org collection download listings under `/download/*`.
- Persistent settings: stores filter presets and aria2 config in extension storage.
## Demo Workflow
1. Open an archive.org collection's download page at `https://archive.org/download/<identifier>`.
2. Open the extension popup or page action.
3. Filter:
   - Type/Name: enter plain strings (e.g., `mp4`, `subtitle`) or regex.
   - Size: set min/max (e.g., `>= 100MB`, `<= 2GB`).
   - Date: set from/to (uses the timestamp shown on the page when available).
4. Review the results list and count.
5. Choose an action:
   - Copy: copies selected URLs to the clipboard, one per line.
   - Send to aria2: pushes to your configured aria2 RPC server using `aria2.addUri`.
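
The filter-then-copy steps above can be sketched in a few lines. This is only an illustration: the entry shape (`name`, `type`, `size`, `date`, `url`) and the helper names `filterEntries` and `toClipboardText` are assumptions, not part of the extension's actual code.

```javascript
// Hypothetical entry shape: { name, type, size (bytes), date ("YYYY-MM-DD"), url }.

// Keep only the entries that satisfy every active filter predicate.
function filterEntries(entries, predicates) {
  return entries.filter((entry) => predicates.every((test) => test(entry)));
}

// Clipboard export format: one URL per line.
function toClipboardText(entries) {
  return entries.map((entry) => entry.url).join("\n");
}

const base = "https://archive.org/download/example-item/";
const entries = [
  { name: "movie_01.mp4", type: "mp4", size: 700 * 1024 * 1024, date: "2024-05-01", url: base + "movie_01.mp4" },
  { name: "clip.mp4", type: "mp4", size: 5 * 1024 * 1024, date: "2024-05-02", url: base + "clip.mp4" },
  { name: "notes.txt", type: "txt", size: 2048, date: "2024-05-03", url: base + "notes.txt" },
];

const selected = filterEntries(entries, [
  (entry) => entry.type === "mp4",
  (entry) => entry.size >= 100 * 1024 * 1024, // ">= 100MB", assuming 1 MB = 1024 * 1024 bytes
]);
const text = toClipboardText(selected);
// In the popup, a button click handler could then call navigator.clipboard.writeText(text).
```

Representing each filter as a predicate keeps type, name, size, and date filters composable without the list code needing to know which ones are active.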
## Regex and Matching
- Plain strings: match anywhere in the value (case-insensitive by default, configurable).
- Regex: either toggle a “Use regex” option or enter values wrapped with `/.../` (optional flags like `i`, `m`).
- Type vs name: “type” typically refers to file extension; “name” is the full filename.

Examples:
- Type contains `mp4`
- Name regex `/^(movie|clip).*\.mp4$/i`
- Type regex `/^(mp4|mkv)$/`
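
A minimal sketch of how these matching rules could be turned into a predicate. The function name `buildMatcher` and its option names (`useRegex`, `caseSensitive`) are hypothetical, chosen here for illustration.

```javascript
// Hypothetical helper: convert a filter input into a predicate over strings.
// Throws a SyntaxError if the input is an invalid regular expression.
function buildMatcher(input, { useRegex = false, caseSensitive = false } = {}) {
  const text = input.trim();
  if (text === "") return () => true; // an empty filter matches everything

  // "/pattern/flags" form: flags come from the input itself.
  const wrapped = /^\/(.+)\/([a-z]*)$/s.exec(text);
  if (wrapped) {
    // Drop "g" and "y": they make RegExp.prototype.test stateful across calls.
    const flags = wrapped[2].replace(/[gy]/g, "");
    const re = new RegExp(wrapped[1], flags);
    return (value) => re.test(value);
  }

  // "Use regex" toggle: treat the raw input as a pattern.
  if (useRegex) {
    const re = new RegExp(text, caseSensitive ? "" : "i");
    return (value) => re.test(value);
  }

  // Plain string: substring match, case-insensitive unless configured otherwise.
  const needle = caseSensitive ? text : text.toLowerCase();
  return (value) => (caseSensitive ? value : value.toLowerCase()).includes(needle);
}

const isMp4 = buildMatcher("mp4");
const isMovieOrClip = buildMatcher("/^(movie|clip).*\\.mp4$/i");
isMp4("Video.MP4");            // true: plain strings ignore case by default
isMovieOrClip("trailer.mp4");  // false: name does not start with "movie" or "clip"
```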
## Aria2 Integration
- RPC method: `aria2.addUri` with `token:<SECRET>`.
- Batching: sends multiple links either individually or in small batches.
- Options (optional): directory, headers, and per-item output name can be supported via the UI.

Aria2 must be running with RPC enabled, for example:
```bash
aria2c \
  --enable-rpc \
  --rpc-listen-port=6800 \
  --rpc-listen-all=false \
  --rpc-secret=YOUR_SECRET_TOKEN
```
Extension settings include:
- RPC endpoint: protocol, host, port, path (default `/jsonrpc`).
- Secret token: the `--rpc-secret` value (stored in extension storage).
- Optional defaults: download directory and additional aria2 options.

Security notes:
- Keep your secret token private; do not commit it.
- If using a remote aria2, enable TLS/HTTPS and restrict access.
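
The RPC call described in this section could look roughly like the sketch below. The helper names (`endpointUrl`, `buildAddUriBatch`, `sendBatch`) are hypothetical. The request shape follows aria2's JSON-RPC interface, where `aria2.addUri` takes the `token:<SECRET>` string, an array of URIs that all point to the same file, and an optional options object. Because of that, each file gets its own request, and several requests are grouped into one JSON-RPC batch.

```javascript
// Assemble the endpoint from the settings fields (defaults are assumptions).
function endpointUrl({ protocol = "http", host = "localhost", port = 6800, path = "/jsonrpc" } = {}) {
  return `${protocol}://${host}:${port}${path}`;
}

// Build one aria2.addUri request per file and return them as a JSON-RPC batch.
// The URI array of a single call lists mirrors of one file, so it holds one URL here.
function buildAddUriBatch(urls, secret, options = {}) {
  return urls.map((url, index) => ({
    jsonrpc: "2.0",
    id: `add-${index}`,
    method: "aria2.addUri",
    params: [`token:${secret}`, [url], options],
  }));
}

// POST the batch; intended to run in the background script.
async function sendBatch(endpoint, batch) {
  const response = await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(batch),
  });
  if (!response.ok) throw new Error(`aria2 RPC returned HTTP ${response.status}`);
  return response.json(); // array of { id, result } or { id, error } objects
}

const batch = buildAddUriBatch(
  ["https://archive.org/download/example-item/movie_01.mp4"],
  "YOUR_SECRET_TOKEN",
  { dir: "/downloads" } // "dir" is a standard aria2 option; the path is a placeholder
);
```

A call such as `sendBatch(endpointUrl(), batch)` would then submit the downloads; the secret is only ever read from extension storage at that point.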
## Permissions
The extension will require:
- `https://archive.org/*` host permission to read and parse download pages.
- `storage` to persist settings and presets.
- `clipboardWrite` to copy links.
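- Host permission for your aria2 endpoint (e.g., `http://localhost/*` or your remote host). Firefox's match-pattern documentation states that a pattern containing a port is ignored, so the port is left out; `http://localhost/*` also covers `http://localhost:6800`. Optional permissions may be requested at runtime.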
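
As a sketch of the runtime-permission idea, the pattern to request can be derived from the configured endpoint. The helper name `hostPermissionPattern` is hypothetical, and dropping the port is an assumption based on Firefox's documented match-pattern behavior; verify it against the Firefox version you target.

```javascript
// Derive an origin match pattern (scheme + host, no port) from an RPC endpoint URL.
function hostPermissionPattern(endpoint) {
  const { protocol, hostname } = new URL(endpoint); // throws on malformed input
  if (protocol !== "http:" && protocol !== "https:") {
    throw new Error(`Unsupported RPC protocol: ${protocol}`);
  }
  // Assumption: the port is omitted because Firefox match patterns do not accept one.
  return `${protocol}//${hostname}/*`;
}

const pattern = hostPermissionPattern("http://localhost:6800/jsonrpc");
// From a click handler on the Options page, the extension could then call:
//   browser.permissions.request({ origins: [pattern] })
// which must be triggered by a user action and resolves to true or false.
```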
## Parsing Strategy
- A content script runs on `https://archive.org/download/*` pages.
- It scrapes the file listing table/DOM and builds a dataset with name, URL, size, and date.
- Type is derived from filename extension (and may use content-type hints if available on the page).
- The popup UI queries this dataset, applies filters, and displays the results.
- A background script handles aria2 RPC calls to avoid CORS issues and keep secrets out of content scope.
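
The dataset-building step can be illustrated without touching the DOM. The sketch below assumes the content script has already pulled raw text out of each listing row; the `row` shape (`name`, `href`, `sizeText`, `dateText`) and the helper names are hypothetical, and the real page markup may differ.

```javascript
// Derive the "type" from the filename extension, lowercased; "" when there is none.
function extensionOf(name) {
  const dot = name.lastIndexOf(".");
  if (dot <= 0 || dot === name.length - 1) return ""; // no dot, dotfile, or trailing dot
  return name.slice(dot + 1).toLowerCase();
}

// Turn one scraped row into a dataset entry with an absolute download URL.
function toEntry(pageUrl, row) {
  // Relative links resolve against the directory, so the base needs a trailing slash.
  const base = pageUrl.endsWith("/") ? pageUrl : pageUrl + "/";
  return {
    name: row.name,
    type: extensionOf(row.name),
    url: new URL(row.href, base).href,
    sizeText: row.sizeText, // left unparsed here; filters interpret it later
    dateText: row.dateText,
  };
}

const entry = toEntry("https://archive.org/download/example-item", {
  name: "clip 2.MKV",
  href: "clip 2.MKV",
  sizeText: "1.2G",
  dateText: "2024-05-01",
});
```

Resolving links with `new URL(href, base)` handles both relative and absolute `href` values and percent-encodes characters such as spaces.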
## Installation (Temporary in Firefox)
1. Clone this repository.
2. In Firefox, open `about:debugging#/runtime/this-firefox`.
3. Click “Load Temporary Add-on…” and select the `manifest.json` in this repo.
4. Navigate to an archive.org download page and open the extension's popup.

For development, make changes and click “Reload” in `about:debugging` to pick them up.
## Usage Tips
- Clipboard: copying usually requires a user gesture (click). The UI is designed to perform copy on button press.
- Case sensitivity: string filters default to case-insensitive; enable case-sensitive mode in settings if needed.
- Sizes: support common suffixes like `KB`, `MB`, `GB`. Ranges accept comparisons like `>= 100MB`.
- Dates: when the page provides timestamps, filters accept yyyy-mm-dd and ranges.
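
A possible way to interpret the size and date inputs described above. The helper names (`parseSize`, `parseSizeFilter`, `inDateRange`) are hypothetical, and treating `KB`/`MB`/`GB` as powers of 1024 is an assumption; the extension may choose decimal units instead.

```javascript
// Assumption: binary units (1 KB = 1024 bytes).
const UNIT_BYTES = { B: 1, KB: 1024, MB: 1024 ** 2, GB: 1024 ** 3, TB: 1024 ** 4 };

// "100MB" -> 104857600; a bare number is taken as bytes.
function parseSize(text) {
  const match = /^\s*(\d+(?:\.\d+)?)\s*([KMGT]?B)?\s*$/i.exec(text);
  if (!match) throw new Error(`Unrecognized size: ${text}`);
  const unit = (match[2] || "B").toUpperCase();
  return Math.round(Number(match[1]) * UNIT_BYTES[unit]);
}

// ">= 100MB" -> predicate over a size in bytes; no operator means equality.
function parseSizeFilter(text) {
  const match = /^\s*(>=|<=|>|<|=)?\s*(.+)$/.exec(text);
  if (!match) throw new Error(`Unrecognized size filter: ${text}`);
  const limit = parseSize(match[2]);
  const comparisons = {
    ">=": (bytes) => bytes >= limit,
    "<=": (bytes) => bytes <= limit,
    ">": (bytes) => bytes > limit,
    "<": (bytes) => bytes < limit,
    "=": (bytes) => bytes === limit,
  };
  return comparisons[match[1] || "="];
}

// yyyy-mm-dd strings sort chronologically, so plain string comparison is enough.
// Either bound may be omitted (null or "") to leave that side open.
function inDateRange(date, from, to) {
  const day = date.slice(0, 10);
  return (!from || day >= from) && (!to || day <= to);
}

const atLeast100MB = parseSizeFilter(">= 100MB");
atLeast100MB(parseSize("2GB")); // true
```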
## Troubleshooting
- No links found: ensure you are on a `/download/*` page (not the item overview). Try reloading after the page finishes loading.
- RPC errors: verify `aria2c` is running with `--enable-rpc` and that the secret/token matches. Check endpoint URL and port.
- HTTPS-Only Mode: if your aria2 endpoint is `http://` on a non-local host, Firefox may upgrade it to `https://` and the request will fail. Use HTTPS on the aria2 RPC (preferred), add a site exception for the host, or disable HTTPS-Only Mode while testing.
- Host permissions: the extension needs host permission to reach non-local RPC endpoints. The manifest includes wildcard host permissions; if you self-package, ensure your manifest allows your RPC host(s).
- CORS/Network: Extensions can call cross-origin endpoints with host permission. If using HTTPS with a self-signed cert, allow it in Firefox or use a valid cert.
- Clipboard blocked: confirm the browser allowed clipboard write; try clicking the button again or check site focus.
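
The HTTPS-Only hint mentioned above could be produced by a small check in the Options test flow. This is a sketch: `isLocalHost` and `httpsOnlyHint` are hypothetical names, and the list of hosts treated as local (`localhost`, `*.localhost`, `127.x.x.x`, `[::1]`) is an assumption rather than Firefox's exact exemption rules.

```javascript
// Assumption: these hosts are treated as local and are not upgraded to https://.
function isLocalHost(hostname) {
  return (
    hostname === "localhost" ||
    hostname.endsWith(".localhost") ||
    hostname === "[::1]" ||
    /^127(\.\d{1,3}){3}$/.test(hostname)
  );
}

// Return a user-facing hint when HTTPS-Only Mode is a likely cause of failure, else null.
function httpsOnlyHint(endpoint) {
  let url;
  try {
    url = new URL(endpoint);
  } catch {
    return "The RPC endpoint is not a valid URL.";
  }
  if (url.protocol === "http:" && !isLocalHost(url.hostname)) {
    return (
      `Firefox HTTPS-Only Mode may upgrade http://${url.host} to https:// and the request will fail. ` +
      "Use HTTPS on the aria2 RPC, add a site exception for the host, or disable HTTPS-Only Mode while testing."
    );
  }
  return null;
}

const hint = httpsOnlyHint("http://nas.example:6800/jsonrpc");
```

Showing the hint only after a failed test request keeps it out of the way for users whose setup already works.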
## Roadmap
- Optional per-file aria2 options (e.g., `out` for renaming).
- Smart batching and retry logic.
- Save/load named filter presets.
- Export/import settings.
- Support additional archive.org views if needed.
## Development Notes
- Tech stack: Standard WebExtension (manifest v3 when supported in Firefox; otherwise v2), with content script + background/service worker + popup UI.
- Storage: `browser.storage.local` for settings and aria2 configs; no analytics.
- Code style: keep dependencies minimal; prefer modern, framework-light UI for the popup.
## Contributing
Issues and PRs are welcome. If proposing new filters or aria2 options, please include example pages and expected behaviors.
## Disclaimer
This project is not affiliated with archive.org or aria2. Use responsibly and respect site terms of service.