
On-Premise Web & System Log Classifier for Attacks, Competitor Scrapers & Real-Customer Traffic

Ingests Nginx, Apache, and Caddy access logs, plus system logs, in real time. A fine-tuned 3.8B-parameter model (Phi-3 Mini), trained specifically on attack signatures, bot fingerprints, and competitor scraping behaviour, classifies every IP session into one of six traffic types. Runs entirely on-premise: no raw log data ever leaves the host. A daily digest is delivered to Slack.


What We Built

Real-Time Log Ingestion

Nginx, Apache, and Caddy access logs streamed via file tail and syslog — no agent installation required on the web server. Supports multiple server log streams in a single deployment.
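The file-tail side of ingestion can be pictured with a small `tail -F` style follower. This is a minimal sketch, not the deployed code: the function name `follow`, the polling interval, and the inode-based rotation check are illustrative assumptions, and the syslog listener is not shown.

```python
import os
import time
from typing import Iterator


def follow(path: str, from_start: bool = False, poll_s: float = 0.5) -> Iterator[str]:
    """Yield complete lines appended to `path`, reopening it after log rotation.

    Hypothetical helper: a sketch of file-tail ingestion, not a production tailer.
    """
    f = open(path, "r", encoding="utf-8", errors="replace")
    inode = os.fstat(f.fileno()).st_ino
    if not from_start:
        f.seek(0, os.SEEK_END)  # like `tail -F`: only lines written from now on
    buf = ""
    try:
        while True:
            chunk = f.readline()
            if chunk:
                buf += chunk
                if buf.endswith("\n"):  # emit whole lines only, never a half-written one
                    yield buf.rstrip("\n")
                    buf = ""
                continue
            # At EOF: if the path now points at a different file, logrotate has run.
            try:
                if os.stat(path).st_ino != inode:
                    new_f = open(path, "r", encoding="utf-8", errors="replace")
                    f.close()
                    f, inode, buf = new_f, os.fstat(new_f.fileno()).st_ino, ""
                    continue
            except FileNotFoundError:
                pass  # old file moved away and the new one is not created yet
            time.sleep(poll_s)
    finally:
        f.close()
```

Usage would be along the lines of `for line in follow("/var/log/nginx/access.log"): ...`, with one follower per server log stream. Reading the file directly is what avoids installing an agent on the web server itself, provided the ingestion service can read the log path.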

Session Reconstruction

Raw log lines grouped into per-IP sessions with behavioural features — request rate, URL pattern sequence, user-agent fingerprint, referrer chain, and timing intervals — before classification.
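A rough illustration of session reconstruction follows. It assumes the Nginx/Apache "combined" log format, lines arriving in time order, and a 30-minute inactivity gap to split sessions; the regex, the gap, and the exact feature names are assumptions for this sketch rather than the system's actual feature set.

```python
import re
import statistics
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# "combined" log format; lines that do not match (e.g. garbage, "-" requests) are skipped.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<ua>[^"]*)"'
)
SESSION_GAP = timedelta(minutes=30)  # assumed inactivity timeout that closes a session


@dataclass
class Session:
    ip: str
    times: list[datetime] = field(default_factory=list)
    paths: list[str] = field(default_factory=list)
    statuses: list[int] = field(default_factory=list)
    referrers: list[str] = field(default_factory=list)
    user_agents: set[str] = field(default_factory=set)

    def add(self, ts: datetime, path: str, status: int, referrer: str, ua: str) -> None:
        self.times.append(ts)
        self.paths.append(path)  # URL pattern sequence, in request order
        self.statuses.append(status)
        if referrer != "-":
            self.referrers.append(referrer)  # referrer chain
        self.user_agents.add(ua)  # user-agent fingerprint(s)

    def features(self) -> dict:
        span_s = (self.times[-1] - self.times[0]).total_seconds()
        gaps = [(b - a).total_seconds() for a, b in zip(self.times, self.times[1:])]
        return {
            "ip": self.ip,
            "requests": len(self.times),
            # request rate, with the session span floored at one minute
            "requests_per_min": len(self.times) / max(span_s / 60, 1.0),
            "mean_interval_s": statistics.mean(gaps) if gaps else 0.0,  # timing intervals
            "paths": self.paths,
            "user_agents": sorted(self.user_agents),
            "referrers": self.referrers,
            "error_ratio": sum(s >= 400 for s in self.statuses) / len(self.statuses),
        }


def sessionize(lines) -> list[Session]:
    """Group time-ordered log lines into per-IP sessions split on inactivity."""
    open_sessions: dict[str, Session] = {}
    done: list[Session] = []
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue
        ts = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
        s = open_sessions.get(m["ip"])
        if s is None or ts - s.times[-1] > SESSION_GAP:
            if s is not None:
                done.append(s)  # previous session for this IP went idle
            s = open_sessions[m["ip"]] = Session(m["ip"])
        s.add(ts, m["path"], int(m["status"]), m["referrer"], m["ua"])
    done.extend(open_sessions.values())
    return done
```

Each `features()` dict is the kind of compact, structured summary that could then be handed to the classifier instead of raw log lines.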

Local Fine-Tuned Classifier LLM

Phi-3 Mini 3.8B fine-tuned on labelled log session datasets covering attack patterns, crawl signatures, competitor scraping behaviour, and real user traffic — running fully offline via Ollama.
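One way a session summary could reach the local model is through Ollama's HTTP API on its default port. This sketch calls `/api/generate` directly with the standard library rather than going through LangChain; the model tag `phi3-logclf`, the label strings, and the prompt wording are hypothetical placeholders.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "phi3-logclf"  # hypothetical tag for the fine-tuned Phi-3 Mini
LABELS = ("real_customer", "internal_user", "known_bot",
          "competitor_scraper", "vulnerability_scanner", "active_intrusion")


def build_prompt(features: dict) -> str:
    return (
        "Classify this web session as exactly one of: " + ", ".join(LABELS) + ".\n"
        'Answer with JSON {"label": str, "confidence": float 0-1, "evidence": str}.\n'
        "Session features:\n" + json.dumps(features, sort_keys=True)
    )


def build_request(features: dict, url: str = OLLAMA_URL,
                  model: str = MODEL) -> urllib.request.Request:
    body = {
        "model": model,
        "prompt": build_prompt(features),
        "stream": False,   # return one JSON object instead of a token stream
        "format": "json",  # ask Ollama to constrain the output to valid JSON
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(features: dict, timeout: float = 30.0) -> dict:
    """Send one session to the local model; requires a running Ollama server."""
    with urllib.request.urlopen(build_request(features), timeout=timeout) as resp:
        payload = json.load(resp)
    return json.loads(payload["response"])  # the model's JSON text is in "response"
```

Because the only network hop is to `localhost`, this pattern keeps inference on the host, which is consistent with the on-premise constraint.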

Six-Class Traffic Classifier

Every session classified into one of: real customer, internal user, known bot, competitor scraper, vulnerability scanner, or active intrusion attempt — with confidence score and evidence summary per classification.
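Since a language model can return labels outside the six classes or out-of-range scores, it seems prudent to validate each answer before it is stored or acted on. The enum values, the `Classification` record, and the rule of rejecting rather than repairing malformed output are assumptions made for this sketch.

```python
import json
from dataclasses import dataclass
from enum import Enum


class TrafficClass(str, Enum):
    REAL_CUSTOMER = "real_customer"
    INTERNAL_USER = "internal_user"
    KNOWN_BOT = "known_bot"
    COMPETITOR_SCRAPER = "competitor_scraper"
    VULNERABILITY_SCANNER = "vulnerability_scanner"
    ACTIVE_INTRUSION = "active_intrusion"


@dataclass(frozen=True)
class Classification:
    ip: str
    label: TrafficClass
    confidence: float  # 0.0 to 1.0
    evidence: str      # short human-readable justification


def parse_classification(ip: str, raw: str) -> Classification:
    """Validate raw model output; raise ValueError on anything malformed."""
    try:
        data = json.loads(raw)
        label = TrafficClass(data["label"])  # unknown label -> ValueError
        confidence = float(data["confidence"])
    except (KeyError, TypeError, ValueError) as exc:
        raise ValueError(f"malformed classification: {raw!r}") from exc
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    return Classification(ip, label, confidence, str(data.get("evidence", "")))
```

Rejected outputs could be retried or queued for manual review; which of those the system does is not specified here.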

Local IP Enrichment

MaxMind GeoIP2 database and a local blocklist provide ASN, country, and known-bad-actor data. Zero external API calls — all enrichment runs from locally maintained databases.
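The enrichment step might look roughly like the following: a CIDR blocklist checked with Python's `ipaddress` module, plus optional lookups against local MaxMind databases through the third-party `geoip2` package. The blocklist file format (one IP or CIDR per line, `#` comments) and the shape of the returned dict are assumptions for illustration.

```python
import ipaddress


def load_blocklist(lines) -> list:
    """Parse one IP or CIDR per line; '#' starts a comment (assumed format)."""
    nets = []
    for line in lines:
        entry = line.split("#", 1)[0].strip()
        if entry:
            nets.append(ipaddress.ip_network(entry, strict=False))
    return nets


def is_blocklisted(ip: str, nets) -> bool:
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in nets)  # IPv4 vs IPv6 mismatch is simply False


def enrich(ip: str, nets, asn_reader=None, country_reader=None) -> dict:
    """Combine blocklist status with ASN/country from local .mmdb readers, if given."""
    info = {"ip": ip, "blocklisted": is_blocklisted(ip, nets),
            "asn": None, "as_org": None, "country": None}
    if asn_reader is None and country_reader is None:
        return info
    from geoip2.errors import AddressNotFoundError  # imported only when readers are used
    if asn_reader is not None:
        try:
            rec = asn_reader.asn(ip)
            info["asn"] = rec.autonomous_system_number
            info["as_org"] = rec.autonomous_system_organization
        except AddressNotFoundError:
            pass  # private or unallocated address: no ASN record
    if country_reader is not None:
        try:
            info["country"] = country_reader.country(ip).country.iso_code
        except AddressNotFoundError:
            pass
    return info
```

In use, `asn_reader` and `country_reader` would be `geoip2.database.Reader` instances opened on locally stored ASN and Country `.mmdb` files (paths are deployment-specific), so every lookup is a local file read with no external API call.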

Daily Slack Digest

End-of-day summary of top threat actors, competitor scraping activity, anomaly count, and auto-generated block recommendations — with links to full session detail for any flagged IP.
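A digest of this kind could be assembled from the day's classifications and posted through a Slack incoming webhook. In this sketch the result dict shape, the 0.9 block-recommendation threshold, the choice of which labels count as threats, and the internal session-detail URL are all hypothetical; the anomaly count is not modelled.

```python
import json
import urllib.request
from collections import Counter

BLOCK_LABELS = {"active_intrusion", "vulnerability_scanner"}  # assumed "threat" labels


def build_digest(results, detail_url="http://logclf.internal/sessions",
                 block_threshold=0.9, top_n=5) -> str:
    """results: dicts with 'ip', 'label', 'confidence' (hypothetical shape)."""
    counts = Counter(r["label"] for r in results)
    threats = sorted((r for r in results if r["label"] in BLOCK_LABELS),
                     key=lambda r: r["confidence"], reverse=True)
    scrapers = [r for r in results if r["label"] == "competitor_scraper"]
    block = sorted({r["ip"] for r in threats if r["confidence"] >= block_threshold})
    lines = ["*Daily traffic digest*"]
    lines += [f"{label}: {n}" for label, n in sorted(counts.items())]
    lines.append("*Top threat actors*")
    # Slack mrkdwn link syntax: <url|text>, pointing at the internal detail page
    lines += [f"<{detail_url}/{r['ip']}|{r['ip']}> {r['label']} ({r['confidence']:.2f})"
              for r in threats[:top_n]]
    lines.append(f"*Competitor scraper sessions*: {len(scrapers)}")
    lines.append("*Block recommendations*: " + (", ".join(block) or "none"))
    return "\n".join(lines)


def post_to_slack(webhook_url: str, text: str, timeout: float = 10.0) -> int:
    """POST the digest to an incoming webhook; returns the HTTP status code."""
    req = urllib.request.Request(
        webhook_url, data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Only this summary text leaves the host; the links resolve to an internal page, so full session detail stays on the on-premise server.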

Technologies Used

Ollama

Phi-3 Mini (fine-tuned)

LangChain

Python

FastAPI

PostgreSQL

Redis

MaxMind GeoIP2

Slack API

HuggingFace

Docker

Nginx

Key Outcomes

<30s

Active intrusion attempt detection latency from first request to classification

100%

On-premise: zero raw log data leaves the host at any stage

Minutes

Time to identify and act on a competitor scraper session after its first request pattern appears

Need Something Similar?

Tell us about your log volumes, web server stack, and biggest monitoring blind spots. We will propose an on-premise AI monitoring architecture that fits your environment.
