Update docs and agent instructions #45

Merged
wagesj45 merged 1 commit from codex/update-agents.md-and-readme.md into main 2025-07-19 04:31:29 -05:00
2 changed files with 70 additions and 83 deletions

AGENTS.md

@@ -24,6 +24,9 @@ This document outlines general practices and expectations for AI agents assisting
The `run-import.sh` script can initialize this environment automatically.
Always activate the virtual environment before running scripts or tests.
* Before committing code, run `black` for consistent formatting and execute
  the test suite with `pytest`. All tests should pass.
* Dependency management: Use `requirements.txt` or `pip-tools`
* Use standard libraries where feasible (e.g., `sqlite3`, `argparse`, `datetime`)
* Adopt `typer` for CLI command interface (if CLI ergonomics matter)
@@ -89,6 +92,14 @@ ngxstat/
If uncertain, the agent should prompt the human for clarification before making architectural assumptions.
## Testing
Use `pytest` for automated tests. Run the suite from an activated virtual environment and ensure all tests pass before committing:
```bash
pytest -q
```
---
## Future Capabilities
@@ -106,3 +117,4 @@ As the project matures, agents may also:
* **2025-07-17**: Initial version by Jordan + ChatGPT
* **2025-07-17**: Expanded virtual environment usage guidance

README.md

@@ -1,11 +1,16 @@
# ngxstat
Per-domain Nginx log analytics with hybrid static reports and live insights.
`ngxstat` is a lightweight log analytics toolkit for Nginx. It imports access
logs into an SQLite database and renders static dashboards so you can explore
per-domain metrics without running a heavy backend service.
## Requirements
* Python 3.10+
* Access to the Nginx log files (default: `/var/log/nginx`)
The helper scripts create a virtual environment on first run, but you can also
set one up manually:
```bash
python3 -m venv .venv
@@ -13,118 +18,88 @@ source .venv/bin/activate
pip install -r requirements.txt
```
## Generating Reports
Use the `generate_reports.py` script to build aggregated JSON and HTML snippet
files from `database/ngxstat.db`. Run one or more of the interval commands:
```bash
python scripts/generate_reports.py hourly
python scripts/generate_reports.py daily
python scripts/generate_reports.py weekly
python scripts/generate_reports.py monthly
```
Each command accepts optional flags to generate per-domain reports. Use
`--domain <name>` to limit output to a specific domain or `--all-domains`
to generate a subdirectory for every domain found in the database:
```bash
# Hourly reports for example.com only
python scripts/generate_reports.py hourly --domain example.com
# Weekly reports for all domains individually
python scripts/generate_reports.py weekly --all-domains
```
Reports are written under the `output/` directory. Each command updates the corresponding `<interval>.json` file and writes one HTML snippet per report. These snippets are loaded dynamically by the main dashboard using Chart.js and DataTables.
### Configuring Reports
Report queries are defined in `reports.yml`. Each entry specifies the `name`,
optional `label` and `chart` type, and a SQL `query` that must return `bucket`
and `value` columns. The special token `{bucket}` is replaced with the
appropriate SQLite `strftime` expression for each interval (hourly, daily,
weekly or monthly) so that a single definition works across all durations.
When `generate_reports.py` runs, every definition is executed for the requested
interval and creates `output/<interval>/<name>.json` plus a small HTML snippet
`output/<interval>/<name>.html` used by the dashboard.
Example snippet:
```yaml
- name: hits
  chart: bar
  query: |
    SELECT {bucket} AS bucket,
           COUNT(*) AS value
    FROM logs
    GROUP BY bucket
    ORDER BY bucket
```
Add or modify entries in `reports.yml` to tailor the generated metrics.
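The token substitution can be pictured as plain string replacement before the
query runs. A minimal sketch, assuming an interval-to-`strftime` mapping and a
`time` column (both illustrative, not necessarily what
`scripts/generate_reports.py` does):
```python
import sqlite3

# Illustrative mapping; the exact strftime expressions may differ.
BUCKET_FORMATS = {
    "hourly": "strftime('%Y-%m-%d %H:00', time)",
    "daily": "strftime('%Y-%m-%d', time)",
    "weekly": "strftime('%Y-%W', time)",
    "monthly": "strftime('%Y-%m', time)",
}

def run_report(query_template: str, interval: str, db: str = "database/ngxstat.db"):
    # Swap the {bucket} token for this interval's strftime expression,
    # then execute the finished SQL against the ngxstat database.
    sql = query_template.replace("{bucket}", BUCKET_FORMATS[interval])
    with sqlite3.connect(db) as conn:
        return conn.execute(sql).fetchall()
```
Because each definition is parameterized only by `{bucket}`, one entry in
`reports.yml` yields consistent results across all four intervals.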
## Importing Logs
Run the importer to ingest new log entries into `database/ngxstat.db`:
```bash
./run-import.sh
```
The script is suitable for cron jobs: it creates the virtual environment on
first run, installs dependencies and reuses the environment on subsequent
runs. The importer handles rotated logs in order from oldest to newest so
entries are processed exactly once. If you rerun the script, it only ingests
records with a timestamp newer than the latest one already stored in the
database, preventing duplicates.
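A minimal sketch of that incremental check, assuming a `logs` table with a
sortable `time` column (the table and column names are assumptions; the real
importer may differ):
```python
import sqlite3

def last_imported_timestamp(db: str = "database/ngxstat.db"):
    # Only log lines newer than this value need to be parsed and
    # inserted, so re-running the import never creates duplicates.
    with sqlite3.connect(db) as conn:
        (latest,) = conn.execute("SELECT MAX(time) FROM logs").fetchone()
    return latest  # None while the database is still empty
```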
## Cron Report Generation
To build the HTML dashboard and JSON data files use `run-reports.sh`, which
runs all report intervals in one step. It sets up the Python environment the
same way as `run-import.sh`, making it convenient for automation via cron:
```bash
./run-reports.sh
```
The script calls `scripts/generate_reports.py` internally to create or update
the hourly, daily, weekly and monthly reports under `output/`. It also detects
all unique domains found in the database and writes per-domain reports to
`output/domains/<domain>/<interval>` alongside the aggregate data. After
generation, open `output/index.html` in a browser to view the dashboard.
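For fully unattended updates you can pair the two helper scripts in a
crontab. A hypothetical schedule (paths and timings are placeholders):
```cron
# Import new log lines every 15 minutes, rebuild reports hourly.
*/15 * * * * /path/to/ngxstat/run-import.sh
0 * * * * /path/to/ngxstat/run-reports.sh
```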
If you prefer to run individual commands you can invoke the generator directly:
```bash
python scripts/generate_reports.py hourly
python scripts/generate_reports.py daily --all-domains
```
## Analysis Helpers
The `run-analysis.sh` script runs additional utilities that examine the
database for missing domains, caching opportunities and potential threats. It
creates or reuses the virtual environment before executing the analysis
commands:
```bash
./run-analysis.sh
```
The JSON results are written under `output/analysis` and can be viewed from the
"Analysis" tab in the generated dashboard.
## Serving the Reports
The generated files are static, so you can serve them with a simple Nginx
server block. Point the `root` directive to the repository's `output/`
directory and optionally restrict access to your local network:
```nginx
server {
    listen 80;
    server_name example.com;

    # Path to the generated reports
    root /path/to/ngxstat/output;

    location / {
        try_files $uri $uri/ =404;
    }

    # Allow access only from private networks
    allow 192.168.0.0/16;
    allow 10.0.0.0/8;
    deny all;
}
```
With this configuration Nginx serves the generated static files directly while
connections from outside the `192.168.*` and `10.*` private ranges are denied.
Drop the `allow`/`deny` rules if the reports are meant to be public.
## Running Tests
Install the development dependencies and execute the suite with `pytest`:
```bash
pip install -r requirements.txt
pytest -q
```
All tests must pass before submitting changes.
## Acknowledgements
ngxstat uses the following third-party resources:
* [Chart.js](https://www.chartjs.org/) for charts
* [DataTables](https://datatables.net/) and [jQuery](https://jquery.com/) for table views
* [Bulma CSS](https://bulma.io/) for styling
* Icons from [Free CC0 Icons](https://cc0-icons.jonh.eu/) by Jon Hicks (CC0 / MIT)
* [Typer](https://typer.tiangolo.com/) for the command-line interface
* [Jinja2](https://palletsprojects.com/p/jinja/) for templating
The project is licensed under the GPLv3. Icon assets remain in the public domain
via the CC0 license.