PhishGuard AI
The Build Logic
Executive Summary
PhishGuard AI is a machine-learning-powered web application that identifies email phishing attempts with 98.26% accuracy and 0.9969 ROC-AUC. It combines NLP feature extraction with a tuned Logistic Regression model in a hybrid intelligence system: AI model predictions are paired with a heuristic rules engine designed to suppress false positives on legitimate automated mail.
The Problem
Pure machine learning models often struggle with enterprise notifications or weekly newsletters, incorrectly flagging them as phishing due to automated language. Furthermore, traditional scanners lack the concurrency to analyze live inboxes rapidly without causing significant user delay.
The Solution
PhishGuard AI implements a dual-layer cybersecurity approach. A high-dimensional ML pipeline predicts threats, and a Reputation Heuristic Layer overrides false positives from verified domains unless critical 'panic triggers' are detected. A multi-threaded Gmail Active Scanner evaluates live inboxes in real time, showing glassmorphic threat popups and deep links directly into the user's Gmail interface.
System Architecture
- ML Inference Pipeline
  Text cleansing and tokenization (stripping HTML, replacing URLs/emails).
  TF-IDF Vectorization projecting text into a 40,000-dimensional space (Unigrams/Bigrams).
  Balanced Logistic Regression calculating probability boundaries.
- Heuristic Layer
  Custom rule engine that intercepts ML scoring, capping standard notifications from verified senders at 12% (Low Risk) to prevent false alarms.
- Backend & Orchestration Layer
  Python/Flask handling dynamic routing and OAuth2 session state.
  Concurrent scanning using ThreadPoolExecutor with 20 workers for isolated API fetching.
- Frontend & Visualization Layer
  HTML5/CSS3 utilizing glassmorphism and Jinja2 templating.
  Matplotlib generating dynamic threat gauges and signal horizontal bars, served inline as base64 PNGs.
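The inline base64 charts in the visualization layer could be produced along these lines. This is a minimal sketch, not the project's actual code: the function name render_risk_bar, the colors, and the use of 0.70 as the color cutoff are assumptions made for illustration.

```python
import base64
import io

from matplotlib.figure import Figure  # Figure API avoids pyplot's global state


def render_risk_bar(score: float, threshold: float = 0.70) -> str:
    """Draw a horizontal risk bar for a score in [0, 1]; return a base64 PNG."""
    score = min(max(score, 0.0), 1.0)  # clamp out-of-range inputs
    fig = Figure(figsize=(4, 0.9))
    ax = fig.subplots()
    # Assumed styling: red at or above the threshold, green below it.
    color = "crimson" if score >= threshold else "seagreen"
    ax.barh([0], [score * 100], color=color)
    ax.set_xlim(0, 100)
    ax.set_yticks([])
    ax.set_xlabel("Phishing risk (%)")
    buf = io.BytesIO()
    fig.savefig(buf, format="png", bbox_inches="tight")
    return base64.b64encode(buf.getvalue()).decode("ascii")


png_b64 = render_risk_bar(0.12)
data_uri = f"data:image/png;base64,{png_b64}"
```

A Jinja2 template can then place data_uri in the src attribute of an img tag, so no image file has to be written to disk or served from a separate route.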
Engineering Decisions
Why a Hybrid Rules Engine?
To address the tendency of pure ML models to raise false positives on legitimate automated emails. The rules engine mimics professional enterprise firewalls and uses a calibrated 70% confidence threshold.
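A rules layer of this kind might look like the sketch below. The 12% cap and the 70% threshold come from the text; everything else is hypothetical, including the domain allowlist, the panic-trigger phrases, the function name apply_heuristics, and the reading of 70% as the score at or above which a message is flagged.

```python
from dataclasses import dataclass

VERIFIED_DOMAINS = {"github.com", "linkedin.com"}  # hypothetical allowlist
PANIC_TRIGGERS = (  # hypothetical phrases that disable the override
    "verify your password",
    "account suspended",
    "wire transfer",
)
VERIFIED_CAP = 0.12  # cap from the text: 12% (Low Risk)
FLAG_THRESHOLD = 0.70  # threshold from the text; its exact role is assumed


@dataclass
class Verdict:
    score: float
    flagged: bool


def apply_heuristics(ml_score: float, sender: str, body: str) -> Verdict:
    """Cap the ML score for verified senders unless a panic trigger appears."""
    # Plain string match on the address; a real system would also need to
    # confirm the sender is authentic before trusting the domain.
    domain = sender.rsplit("@", 1)[-1].lower()
    text = body.lower()
    panic = any(trigger in text for trigger in PANIC_TRIGGERS)
    score = ml_score
    if domain in VERIFIED_DOMAINS and not panic:
        score = min(score, VERIFIED_CAP)
    return Verdict(score=score, flagged=score >= FLAG_THRESHOLD)


newsletter = apply_heuristics(0.91, "news@github.com", "Your weekly digest")
spoofed = apply_heuristics(0.91, "news@github.com", "Account suspended! Act now")
unknown = apply_heuristics(0.91, "promo@unknown.example", "Your weekly digest")
```

In this sketch the newsletter is capped and not flagged, while the other two messages keep their original ML score.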
Why ThreadPoolExecutor?
To instantiate isolated Google credentials and Gmail service clients concurrently, overcoming the network latency of sequential API calls.
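The fan-out pattern can be sketched with the standard library alone. The worker count of 20 is from the text; fetch_and_score is a hypothetical stand-in that sleeps to simulate network latency, where a real worker would build its own credentials and Gmail client, fetch the message, and score it.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_WORKERS = 20  # worker count from the text


def fetch_and_score(message_id: str) -> dict:
    """Hypothetical worker: simulates one isolated fetch-and-score call."""
    time.sleep(0.05)  # stand-in for API latency
    return {"id": message_id, "score": 0.0}


def scan_inbox(message_ids):
    """Run one worker call per message and collect results as they finish."""
    results = {}
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        futures = {pool.submit(fetch_and_score, mid): mid for mid in message_ids}
        for future in as_completed(futures):
            mid = futures[future]
            try:
                results[mid] = future.result()
            except Exception as exc:  # one failed fetch should not abort the scan
                results[mid] = {"id": mid, "error": str(exc)}
    return results


ids = [f"msg-{i}" for i in range(50)]
start = time.perf_counter()
scanned = scan_inbox(ids)
elapsed = time.perf_counter() - start
```

Run sequentially, the 50 simulated calls would take at least 2.5 seconds; with 20 workers they complete in a few batches, which is the latency gain the design relies on.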
Why TF-IDF over Deep Learning Text Models?
To keep the inference pipeline lightweight and fast enough to run locally without a GPU.
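A lightweight pipeline of this shape can be approximated with scikit-learn as follows. The unigram/bigram range, the 40,000-feature limit, and the balanced class weighting follow the text. The regexes, the placeholder tokens, the use of sublinear_tf for log scaling, and the toy training emails are assumptions for illustration only.

```python
import html
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

TAG_RE = re.compile(r"<[^>]+>")
URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")


def clean_text(raw: str) -> str:
    """Strip HTML tags and replace URLs and addresses with placeholder tokens."""
    text = html.unescape(TAG_RE.sub(" ", raw))
    text = URL_RE.sub(" urltoken ", text)
    text = EMAIL_RE.sub(" emailtoken ", text)
    return re.sub(r"\s+", " ", text).strip().lower()


model = make_pipeline(
    TfidfVectorizer(
        preprocessor=clean_text,
        ngram_range=(1, 2),   # unigrams and bigrams
        max_features=40_000,  # feature-space size from the text
        sublinear_tf=True,    # assumed source of the "log-scaled" features
    ),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)

# Toy data, invented for this example; 1 = phishing, 0 = legitimate.
emails = [
    "Your account is suspended, verify your password at http://bad.example/login",
    "Urgent: confirm your bank details now at http://steal.example",
    "Team lunch is moved to Friday at noon",
    "Here are the meeting notes from yesterday",
]
labels = [1, 1, 0, 0]
model.fit(emails, labels)

proba = model.predict_proba(["Verify your password now at http://x.example"])[0, 1]
```

Both stages are sparse linear operations, so inference on a single email is a vocabulary lookup followed by one dot product, which is why no GPU is needed.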
Performance Metrics
Model Accuracy: 98.26% on 18,096 samples.
ROC-AUC Score: 0.9969, indicating strong class separation.
Scan Latency: Fetches and scans 50 live emails in under 2 seconds.
Feature Space: 40,000 log-scaled features.
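For readers unfamiliar with the two headline metrics, the sketch below computes them from first principles on four invented scores. ROC-AUC is the fraction of (phishing, legitimate) pairs in which the phishing email receives the higher score, which is why a value near 1 is read as strong class separation. The 0.5 accuracy threshold and the sample data are assumptions.

```python
from itertools import product


def accuracy(y_true, y_score, threshold=0.5):
    """Share of samples whose thresholded score matches the true label."""
    preds = [int(score >= threshold) for score in y_score]
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)


def roc_auc(y_true, y_score):
    """Probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    # Ties count as half a win, matching the usual rank-based definition.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]            # invented labels
y_score = [0.1, 0.4, 0.35, 0.8]  # invented model scores

acc = accuracy(y_true, y_score)  # 0.75
auc = roc_auc(y_true, y_score)   # 0.75
```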
Scalability Strategy
Thread-safe, isolated Google API clients let the backend scale the number of scanning workers to match server hardware capability. Models are stored as modular joblib dumps, so a new model can be swapped in without downtime.
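One way to get zero-downtime upgrades from joblib dumps is to load the new file fully and then swap a single reference under a lock. The ModelHolder class and the file names below are hypothetical, and plain dictionaries stand in for trained models.

```python
import os
import tempfile
import threading

import joblib


class ModelHolder:
    """Keeps the active model; reload() swaps it in for subsequent requests."""

    def __init__(self, path: str):
        self._lock = threading.Lock()
        self._model = joblib.load(path)

    def reload(self, path: str) -> None:
        new_model = joblib.load(path)  # load completely before swapping
        with self._lock:
            self._model = new_model

    def get(self):
        with self._lock:
            return self._model


with tempfile.TemporaryDirectory() as tmp:
    v1_path = os.path.join(tmp, "model_v1.joblib")  # hypothetical file names
    v2_path = os.path.join(tmp, "model_v2.joblib")
    joblib.dump({"version": 1}, v1_path)  # dicts stand in for real models
    joblib.dump({"version": 2}, v2_path)

    holder = ModelHolder(v1_path)
    before = holder.get()["version"]
    holder.reload(v2_path)
    after = holder.get()["version"]
```

Requests already holding the old model finish with it, while new requests pick up the replacement, so the server never has to restart.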
Outcome
A highly accurate phishing detection SaaS platform that balances statistical NLP classification with practical, real-world heuristic whitelisting.
