Update docs and agent instructions #45

Merged
wagesj45 merged 1 commit from codex/update-agents.md-and-readme.md into main 2025-07-19 04:31:29 -05:00
2 changed files with 70 additions and 83 deletions

AGENTS.md

@@ -24,6 +24,9 @@ This document outlines general practices and expectations for AI agents assisting
The `run-import.sh` script can initialize this environment automatically.
Always activate the virtual environment before running scripts or tests.
* Before committing code, run `black` for consistent formatting and execute
  the test suite with `pytest`. All tests should pass.
* Dependency management: Use `requirements.txt` or `pip-tools`
* Use standard libraries where feasible (e.g., `sqlite3`, `argparse`, `datetime`)
* Adopt `typer` for CLI command interface (if CLI ergonomics matter)
@@ -89,6 +92,14 @@ ngxstat/
If uncertain, the agent should prompt the human for clarification before making architectural assumptions.
## Testing
Use `pytest` for automated tests. Run the suite from an activated virtual environment and ensure all tests pass before committing:
```bash
pytest -q
```
---
## Future Capabilities
@@ -106,3 +117,4 @@ As the project matures, agents may also:
* **2025-07-17**: Initial version by Jordan + ChatGPT
* **2025-07-17**: Expanded virtual environment usage guidance

README.md

@@ -1,11 +1,16 @@
# ngxstat
Per-domain Nginx log analytics with hybrid static reports and live insights.
`ngxstat` is a lightweight log analytics toolkit for Nginx. It imports access
logs into an SQLite database and renders static dashboards so you can explore
per-domain metrics without running a heavy backend service.
## Requirements
* Python 3.10+
* Access to the Nginx log files (default: `/var/log/nginx`)
The helper scripts create a virtual environment on first run, but you can also
set one up manually:
```bash
python3 -m venv .venv
@@ -13,118 +18,88 @@ source .venv/bin/activate
pip install -r requirements.txt
```
## Generating Reports
Use the `generate_reports.py` script to build aggregated JSON and HTML snippet
files from `database/ngxstat.db`. Run one or more of the interval commands:
```bash
python scripts/generate_reports.py hourly
python scripts/generate_reports.py daily
python scripts/generate_reports.py weekly
python scripts/generate_reports.py monthly
```
Each command accepts optional flags to generate per-domain reports. Use
`--domain <name>` to limit output to a specific domain or `--all-domains`
to generate a subdirectory for every domain found in the database:
```bash
# Hourly reports for example.com only
python scripts/generate_reports.py hourly --domain example.com
# Weekly reports for all domains individually
python scripts/generate_reports.py weekly --all-domains
```
Reports are written under the `output/` directory. Each command updates the corresponding `<interval>.json` file and writes one HTML snippet per report. These snippets are loaded dynamically by the main dashboard using Chart.js and DataTables.
### Configuring Reports
Report queries are defined in `reports.yml`. Each entry specifies the `name`,
optional `label` and `chart` type, and a SQL `query` that must return `bucket`
and `value` columns. The special token `{bucket}` is replaced with the
appropriate SQLite `strftime` expression for each interval (hourly, daily,
weekly or monthly) so that a single definition works across all durations.
When `generate_reports.py` runs, every definition is executed for the requested
interval and creates `output/<interval>/<name>.json` plus a small HTML snippet
`output/<interval>/<name>.html` used by the dashboard.
Example snippet:
```yaml
- name: hits
  chart: bar
  query: |
    SELECT {bucket} AS bucket,
           COUNT(*) AS value
    FROM logs
    GROUP BY bucket
    ORDER BY bucket
```
Add or modify entries in `reports.yml` to tailor the generated metrics.
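The token substitution can be pictured as plain string replacement before the
query runs. A minimal sketch, assuming an interval-to-`strftime` mapping and a
`time` column (both illustrative, not necessarily what
`scripts/generate_reports.py` does):
```python
import sqlite3

# Illustrative mapping; the exact strftime expressions may differ.
BUCKET_FORMATS = {
    "hourly": "strftime('%Y-%m-%d %H:00', time)",
    "daily": "strftime('%Y-%m-%d', time)",
    "weekly": "strftime('%Y-%W', time)",
    "monthly": "strftime('%Y-%m', time)",
}

def run_report(query_template: str, interval: str, db: str = "database/ngxstat.db"):
    # Swap the {bucket} token for this interval's strftime expression,
    # then execute the finished SQL against the ngxstat database.
    sql = query_template.replace("{bucket}", BUCKET_FORMATS[interval])
    with sqlite3.connect(db) as conn:
        return conn.execute(sql).fetchall()
```
Because each definition is parameterized only by `{bucket}`, one entry in
`reports.yml` yields consistent results across all four intervals.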
## Importing Logs
Run the importer to ingest new log entries into `database/ngxstat.db`:
```bash
./run-import.sh
```
The script is suitable for cron jobs: it creates the virtual environment on
first run, installs dependencies and reuses the environment on subsequent
runs. The importer handles rotated logs in order from oldest to newest so
entries are processed exactly once. If you rerun the script, it only ingests
records with a timestamp newer than the latest one already stored in the
database, preventing duplicates.
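A minimal sketch of that incremental check, assuming a `logs` table with a
sortable `time` column (the table and column names are assumptions; the real
importer may differ):
```python
import sqlite3

def last_imported_timestamp(db: str = "database/ngxstat.db"):
    # Only log lines newer than this value need to be parsed and
    # inserted, so re-running the import never creates duplicates.
    with sqlite3.connect(db) as conn:
        (latest,) = conn.execute("SELECT MAX(time) FROM logs").fetchone()
    return latest  # None while the database is still empty
```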
## Cron Report Generation
To build the HTML dashboard and JSON data files use `run-reports.sh`, which
runs all report intervals in one step. It sets up the Python environment the
same way as `run-import.sh`, making it convenient for automation via cron:
```bash
./run-reports.sh
```
The script calls `scripts/generate_reports.py` internally to create or update
the hourly, daily, weekly and monthly reports under `output/`. It also detects
all unique domains found in the database and writes per-domain reports to
`output/domains/<domain>/<interval>` alongside the aggregate data. After
generation, open `output/index.html` in a browser to view the dashboard.
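For fully unattended updates you can pair the two helper scripts in a
crontab. A hypothetical schedule (paths and timings are placeholders):
```cron
# Import new log lines every 15 minutes, rebuild reports hourly.
*/15 * * * * /path/to/ngxstat/run-import.sh
0 * * * * /path/to/ngxstat/run-reports.sh
```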
If you prefer to run individual commands you can invoke the generator directly:
```bash
python scripts/generate_reports.py hourly
python scripts/generate_reports.py daily --all-domains
```
## Analysis Helpers
The `run-analysis.sh` script runs additional utilities that examine the
database for missing domains, caching opportunities and potential threats. It
creates or reuses the virtual environment before executing the analysis
commands:
```bash
./run-analysis.sh
```
The JSON results are written under `output/analysis` and can be viewed from the
"Analysis" tab in the generated dashboard.
## Serving the Reports
The generated files are static, so you can serve them with a simple Nginx
server block. Point the `root` directive to the repository's `output/`
directory and optionally restrict access to your local network:
```nginx
server {
    listen 80;
    server_name example.com;

    # Path to the generated reports
    root /path/to/ngxstat/output;

    location / {
        try_files $uri $uri/ =404;
    }

    # Allow access only from private networks
    allow 192.168.0.0/16;
    allow 10.0.0.0/8;
    deny all;
}
```
With this configuration Nginx serves the generated static files directly while
connections from outside the `192.168.*` and `10.*` private ranges are denied.
Drop the `allow`/`deny` rules if the reports are meant to be public.
## Running Tests
Install the development dependencies and execute the suite with `pytest`:
```bash
pip install -r requirements.txt
pytest -q
```
All tests must pass before submitting changes.
## Acknowledgements
ngxstat uses the following third-party resources:
* [Chart.js](https://www.chartjs.org/) for charts
* [DataTables](https://datatables.net/) and [jQuery](https://jquery.com/) for table views
* [Bulma CSS](https://bulma.io/) for styling
* Icons from [Free CC0 Icons](https://cc0-icons.jonh.eu/) by Jon Hicks (CC0 / MIT)
* [Typer](https://typer.tiangolo.com/) for the command-line interface
* [Jinja2](https://palletsprojects.com/p/jinja/) for templating
The project is licensed under the GPLv3. Icon assets remain in the public domain
via the CC0 license.