diff --git a/README.md b/README.md
index 5e2cd25..199ca3d 100644
--- a/README.md
+++ b/README.md
@@ -1,3 +1,131 @@
-# archive-org-link-grabber
+# Archive.org Link Grabber (Firefox WebExtension)
+
+Add-on for Firefox that enhances archive.org download pages (https://archive.org/download/*) so you can filter file links (by type, name, size, or date), copy them to the clipboard (one URL per line), or send them directly to an aria2 RPC server.
+
+> Status: README/spec first. Implementation details below describe the intended behavior and structure.
+
+---
+
+## Features
+
+- Filter by file attributes: type/extension, name, size range, and date range.
+- String or regex matching: name and type filters accept plain text (e.g., `mp4`) or regular expressions (e.g., `^(movie|clip).*\.mp4$`).
+- Copy to clipboard: export filtered links, one per line.
+- Send to aria2: forward links to a configured aria2 JSON-RPC endpoint using a secret token.
+- Per-site parsing: targets archive.org collection download listings under `/download/*`.
+- Persistent settings: stores filter presets and aria2 config in extension storage.
+
+## Demo Workflow
+
+1. Open an archive.org collection’s download page at `https://archive.org/download/`.
+2. Open the extension popup or page action.
+3. Filter:
+   - Type/Name: enter plain strings (e.g., `mp4`, `subtitle`) or regex.
+   - Size: set min/max (e.g., `>= 100MB`, `<= 2GB`).
+   - Date: set from/to (uses the timestamp shown on the page when available).
+4. Review the results list and count.
+5. Choose an action:
+   - Copy: copies selected URLs to the clipboard, one per line.
+   - Send to aria2: pushes to your configured aria2 RPC server using `aria2.addUri`.
+
+## Regex and Matching
+
+- Plain strings: match anywhere in the value (case-insensitive by default, configurable).
+- Regex: either toggle a “Use regex” option or enter values wrapped with `/.../` (optional flags like `i`, `m`).
+- Type vs name: “type” typically refers to file extension; “name” is the full filename.
+
+Examples:
+
+- Type contains `mp4`
+- Name regex `/^(movie|clip).*\.mp4$/i`
+- Type regex `/^(mp4|mkv)$/`
+
+## Aria2 Integration
+
+- RPC method: `aria2.addUri` with `token:`.
+- Batching: sends multiple links either individually or in small batches.
+- Options (optional): directory, headers, and per-item output name can be supported via the UI.
+
+Aria2 must be running with RPC enabled, for example:
+
+```bash
+aria2c \
+  --enable-rpc \
+  --rpc-listen-port=6800 \
+  --rpc-listen-all=false \
+  --rpc-secret=YOUR_SECRET_TOKEN
+```
+
+Extension settings include:
+
+- RPC endpoint: protocol, host, port, path (default `/jsonrpc`).
+- Secret token: the `--rpc-secret` value (stored in extension storage).
+- Optional defaults: download directory and additional aria2 options.
+
+Security notes:
+
+- Keep your secret token private; do not commit it.
+- If using a remote aria2, enable TLS/HTTPS and restrict access.
+
+## Permissions
+
+The extension will require:
+
+- `https://archive.org/*` host permission to read and parse download pages.
+- `storage` to persist settings and presets.
+- `clipboardWrite` to copy links.
+- Host permission for your aria2 endpoint (e.g., `http://localhost:6800/*` or your remote URL). Optional permissions may be requested at runtime.
+
+## Parsing Strategy
+
+- A content script runs on `https://archive.org/download/*` pages.
+- It scrapes the file listing table/DOM and builds a dataset with name, URL, size, and date.
+- Type is derived from filename extension (and may use content-type hints if available on the page).
+- The popup UI queries this dataset, applies filters, and displays the results.
+- A background script handles aria2 RPC calls to avoid CORS issues and keep secrets out of content scope.
+
+## Installation (Temporary in Firefox)
+
+1. Clone this repository.
+2. In Firefox, open `about:debugging#/runtime/this-firefox`.
+3. Click “Load Temporary Add-on…” and select the `manifest.json` in this repo.
+4. Navigate to an archive.org download page and open the extension’s popup.
+
+For development, make changes and click “Reload” in `about:debugging` to pick them up.
+
+## Usage Tips
+
+- Clipboard: copying usually requires a user gesture (click). The UI is designed to perform copy on button press.
+- Case sensitivity: string filters default to case-insensitive; enable case-sensitive mode in settings if needed.
+- Sizes: support common suffixes like `KB`, `MB`, `GB`. Ranges accept comparisons like `>= 100MB`.
+- Dates: when the page provides timestamps, filters accept yyyy-mm-dd and ranges.
+
+## Troubleshooting
+
+- No links found: ensure you are on a `/download/*` page (not the item overview). Try reloading after the page finishes loading.
+- RPC errors: verify `aria2c` is running with `--enable-rpc` and that the secret/token matches. Check endpoint URL and port.
+- CORS/Network: Extensions can call cross-origin endpoints with host permission. If using HTTPS with a self-signed cert, allow it in Firefox or use a valid cert.
+- Clipboard blocked: confirm the browser allowed clipboard write; try clicking the button again or check site focus.
+
+## Roadmap
+
+- Optional per-file aria2 options (e.g., `out` for renaming).
+- Smart batching and retry logic.
+- Save/load named filter presets.
+- Export/import settings.
+- Support additional archive.org views if needed.
+
+## Development Notes
+
+- Tech stack: Standard WebExtension (manifest v3 when supported in Firefox; otherwise v2), with content script + background/service worker + popup UI.
+- Storage: `browser.storage.local` for settings and aria2 configs; no analytics.
+- Code style: keep dependencies minimal; prefer modern, framework-light UI for the popup.
+
+## Contributing
+
+Issues and PRs are welcome. If proposing new filters or aria2 options, please include example pages and expected behaviors.
+
+## Disclaimer
+
+This project is not affiliated with archive.org or aria2. Use responsibly and respect site terms of service.
 
-Designed to grab multiple links from an archive.org page for passing to an aria2 download manager.
\ No newline at end of file
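
The matching rules in the README's "Regex and Matching" and "Usage Tips" sections (plain strings match anywhere and are case-insensitive by default, `/.../flags` values are treated as regex, sizes accept `KB`/`MB`/`GB` suffixes) could be sketched roughly as follows. This is only an illustration, not the project's implementation: the names `buildMatcher`, `parseSize`, `extensionOf`, and `filterFiles` are hypothetical, the file records are assumed to look like `{ name, url, size }` with `size` in bytes, suffixes are assumed to be binary multiples (1 KB = 1024 bytes), and size bounds are passed as separate min/max values rather than parsed from `>= 100MB` comparisons.

```javascript
// Hypothetical helpers; the README does not define an API for filtering.

// Turn a user-entered filter into a predicate. Values wrapped as /.../flags
// are treated as regular expressions; anything else is a substring match.
function buildMatcher(input, { caseSensitive = false } = {}) {
  const wrapped = /^\/(.+)\/([a-z]*)$/.exec(input);
  if (wrapped) {
    // Drop the stateful g/y flags so repeated test() calls stay independent.
    const flags = wrapped[2].replace(/[gy]/g, "");
    const re = new RegExp(wrapped[1], flags);
    return (value) => re.test(value);
  }
  const needle = caseSensitive ? input : input.toLowerCase();
  return (value) =>
    (caseSensitive ? value : value.toLowerCase()).includes(needle);
}

// Assumption: suffixes are binary multiples (1 KB = 1024 bytes).
const UNITS = { B: 1, KB: 1024, MB: 1024 ** 2, GB: 1024 ** 3 };

// Parse strings such as "100MB" or "1.5 KB" into a byte count.
function parseSize(text) {
  const m = /^\s*(\d+(?:\.\d+)?)\s*(B|KB|MB|GB)?\s*$/i.exec(text);
  if (!m) throw new Error(`Unrecognized size: ${text}`);
  const unit = (m[2] || "B").toUpperCase();
  return Math.round(Number(m[1]) * UNITS[unit]);
}

// "Type" is taken to be the filename extension, as the README describes.
function extensionOf(name) {
  const dot = name.lastIndexOf(".");
  return dot > 0 ? name.slice(dot + 1) : "";
}

// Apply optional type, name, and size filters to an array of
// { name, url, size } records (size in bytes).
function filterFiles(files, { type, name, minSize, maxSize } = {}) {
  const typeMatch = type ? buildMatcher(type) : () => true;
  const nameMatch = name ? buildMatcher(name) : () => true;
  const min = minSize ? parseSize(minSize) : 0;
  const max = maxSize ? parseSize(maxSize) : Infinity;
  return files.filter(
    (f) =>
      typeMatch(extensionOf(f.name)) &&
      nameMatch(f.name) &&
      f.size >= min &&
      f.size <= max
  );
}
```

For the clipboard action, the filtered records could then be joined with `filterFiles(files, { type: "mp4" }).map((f) => f.url).join("\n")` to get one URL per line.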
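
The "Aria2 Integration" section of the README could translate into a request builder along the lines below. It is a sketch under stated assumptions, not the extension's actual code: `buildAddUriBatch` and `sendToAria2` are hypothetical names, and the endpoint shown in the usage note is just the README's default port and path. One detail worth keeping in mind is that aria2 treats all URIs inside a single `aria2.addUri` call as mirrors of the same file, so the sketch issues one call per link and groups the calls into a JSON-RPC 2.0 batch (an array of request objects).

```javascript
// Hypothetical helpers for the background script; names are illustrative.

// Build one aria2.addUri request per URL. All URIs inside a single addUri
// call are treated by aria2 as mirrors of one file, so separate files need
// separate calls.
function buildAddUriBatch(urls, secret, options = {}) {
  // The secret is passed as the first parameter, prefixed with "token:".
  const auth = secret ? [`token:${secret}`] : [];
  return urls.map((url, i) => ({
    jsonrpc: "2.0",
    id: `add-${i}`,
    method: "aria2.addUri",
    params: [...auth, [url], options],
  }));
}

// POST the batch to the configured endpoint. fetchImpl is injectable so the
// function can be exercised without a running aria2 instance.
async function sendToAria2(endpoint, urls, secret, options = {}, fetchImpl = fetch) {
  const response = await fetchImpl(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildAddUriBatch(urls, secret, options)),
  });
  if (!response.ok) {
    throw new Error(`aria2 RPC request failed with HTTP ${response.status}`);
  }
  // A JSON-RPC batch returns an array with one result or error per request.
  return response.json();
}
```

A call might look like `sendToAria2("http://localhost:6800/jsonrpc", urls, secret, { dir: "/downloads" })`, where `dir` is aria2's download-directory option and the path is only an example.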