VulnAI - AI-Powered SAST Engine

Overview

VulnAI is an AI-powered Static Application Security Testing (SAST) engine that uses machine learning to detect vulnerabilities in source code. Built on a transformer model (CodeBERT), it classifies findings by CWE category and uses attention visualization to highlight the code behind each finding.

Features

ML-Based Detection: Transformer-based vulnerability classification
Multi-Language Support: Python, Java, JavaScript, TypeScript, C/C++
10+ Vulnerability Categories: SQL Injection, XSS, Code Injection, and more
REST API: FastAPI-based detection service
CLI Tool: Easy command-line scanning
Vector Database: Similarity search for vulnerability intelligence
False Positive Reduction: Rule-based filtering + taint analysis
Explainable AI: Attention visualization for vulnerability highlighting

Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                        AI-Powered SAST Engine Architecture                  │
└─────────────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────────────┐
│                            USER INTERFACE LAYER                             │
│  ┌─────────────────┐                    ┌────────────────────────────────┐│
│  │   REST API      │                    │      CLI Detection Tool        ││
│  │   (FastAPI)     │                    │   (Python-based Scanner)       ││
│  └────────┬────────┘                    └───────────────┬────────────────┘│
└───────────┼─────────────────────────────────────────────┼──────────────────┘
            │                                             │
            ▼                                             ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                          DETECTION ENGINE LAYER                             │
│  ┌─────────────────┐  ┌─────────────────┐  ┌────────────────────────────────┐│
│  │  Code Parser    │  │  AST Generator  │  │   Rule-Based Filter            ││
│  │  (Multi-lang)   │  │  (Tree-sitter)  │  │   (False Positive Reduction)   ││
│  └────────┬────────┘  └────────┬────────┘  └───────────────┬────────────────┘│
│           │                    │                           │                │
│           └──────────┬─────────┘                           │                │
│                      ▼                                     │                │
│  ┌──────────────────────────────────────────────────────────▼─────────────┐│
│  │                      MODEL INFERENCE ENGINE                           ││
│  │  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────────┐││
│  │  │ CodeBERT        │  │  Similarity     │  │   Output Formatter  │││
│  │  │ Embedding       │  │  Search         │  │   (JSON Results)    │││
│  │  └────────┬────────┘  └────────┬────────┘  └──────────────┬────────┘││
│  └───────────┼────────────────────┼─────────────────────────┼──────────┘│
└──────────────┼────────────────────┼─────────────────────────┼─────────────┘
               │                    │                         │
               ▼                    ▼                         ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                        VULNERABILITY INTELLIGENCE LAYER                     │
│  ┌─────────────────────────────────────────────────────────────────────────┐│
│  │                    PostgreSQL + pgvector Database                       ││
│  │  ┌──────────────────────┐  ┌─────────────────────────────────────────┐││
│  │  │  vulnerabilities      │  │  detected_issues                         │││
│  │  │  - id                │  │  - id                                   │││
│  │  │  - cwe_id            │  │  - file_name                            │││
│  │  │  - name              │  │  - line_number                          │││
│  │  │  - description       │  │  - detected_cwe                         │││
│  │  │  - severity          │  │  - confidence                          │││
│  │  │  - remediation       │  │  - timestamp                            │││
│  │  │  - embedding_vector  │  │                                          │││
│  │  └──────────────────────┘  └─────────────────────────────────────────┘││
│  └─────────────────────────────────────────────────────────────────────────┘│
└──────────────────────────────────────────────────────────────────────────────┘
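The similarity search over embedding_vector can be illustrated in memory. The following is a minimal sketch, assuming cosine similarity as the metric; this README does not say which distance operator the pgvector queries use. The names cosine_similarity and top_k_similar and the toy 3-dimensional vectors are hypothetical (CodeBERT base produces 768-dimensional embeddings).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k_similar(query, records, k=3):
    """Return the k (cwe_id, score) pairs most similar to the query.

    `records` maps a cwe_id to its stored embedding vector.
    """
    scored = [(cwe_id, cosine_similarity(query, vec)) for cwe_id, vec in records.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for real embeddings (hypothetical values).
STORED = {
    "CWE-89": [1.0, 0.0, 0.0],
    "CWE-79": [0.0, 1.0, 0.0],
    "CWE-78": [0.7, 0.7, 0.0],
}

matches = top_k_similar([0.9, 0.1, 0.0], STORED, k=2)
print([cwe_id for cwe_id, _ in matches])  # ['CWE-89', 'CWE-78']
```

In a deployment backed by pgvector, the ranking would normally happen inside the database query, not in Python; the sketch only shows the underlying comparison.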

Installation

Prerequisites

Python 3.10+
PostgreSQL 15+ with the pgvector extension (optional, for the vulnerability intelligence database)
CUDA-capable GPU (recommended for training)

Quick Install

# Clone the repository
git clone https://github.com/vulnai/sast-engine.git
cd sast-engine

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Install package
pip install -e .

Usage

CLI

# Scan a single file
vulnai scan -f path/to/code.py

# Scan a directory
vulnai scan -d ./src

# Output JSON
vulnai scan -f app.py -o json

# Specify language
vulnai scan -f main.js -l javascript

# Scan with verbose output
vulnai scan -f app.py -v

API

# Start the API server
uvicorn vulnai.api.main:app --host 0.0.0.0 --port 8000

# API Documentation
# Open http://localhost:8000/docs in your browser

Python API

from vulnai.detection.engine import DetectionEngine

# Initialize engine
engine = DetectionEngine(
    model_path="models/trained/vulnai_classifier.pt",
    confidence_threshold=0.5
)

# Detect vulnerabilities
result = engine.detect(code="your code here")

print(f"Is vulnerable: {result.is_vulnerable}")
for vuln in result.vulnerabilities:
    print(f"  {vuln.cwe_id} at line {vuln.line_number}")

Training

from vulnai.models.trainer import ModelTrainer, TrainingConfig
from vulnai.data.loader import load_training_data

# Load data (assumes load_training_data() returns train/val/test data loaders)
train_loader, val_loader, test_loader = load_training_data()

# Configure training
config = TrainingConfig(
    model_name="microsoft/codebert-base",
    num_epochs=10,
    batch_size=16,
    learning_rate=2e-5
)

# Train model
trainer = ModelTrainer(config)
history = trainer.train(train_loader, val_loader)

Supported Vulnerabilities

CWE ID	Vulnerability Type	Severity
CWE-89	SQL Injection	HIGH
CWE-79	Cross-Site Scripting (XSS)	MEDIUM
CWE-94	Code Injection	HIGH
CWE-78	OS Command Injection	HIGH
CWE-287	Improper Authentication	HIGH
CWE-862	Missing Authorization	MEDIUM
CWE-434	Unrestricted File Upload	HIGH
CWE-502	Insecure Deserialization	HIGH
CWE-119	Buffer Overflow	HIGH
CWE-200	Information Exposure	LOW
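
The severity column can be used to triage findings, for example to fail a CI job only on MEDIUM or higher. The sketch below is one possible way to do that, not part of the VulnAI package: the mapping is copied from the table above, while SEVERITY_RANK and filter_by_severity are hypothetical helper names.

```python
# Severity per CWE, copied from the table above.
SEVERITY_BY_CWE = {
    "CWE-89": "HIGH",
    "CWE-79": "MEDIUM",
    "CWE-94": "HIGH",
    "CWE-78": "HIGH",
    "CWE-287": "HIGH",
    "CWE-862": "MEDIUM",
    "CWE-434": "HIGH",
    "CWE-502": "HIGH",
    "CWE-119": "HIGH",
    "CWE-200": "LOW",
}

# Higher rank means more severe.
SEVERITY_RANK = {"LOW": 1, "MEDIUM": 2, "HIGH": 3}


def filter_by_severity(cwe_ids, minimum="MEDIUM"):
    """Keep CWE IDs whose severity is at least `minimum`, most severe first."""
    floor = SEVERITY_RANK[minimum]
    kept = [c for c in cwe_ids if SEVERITY_RANK[SEVERITY_BY_CWE[c]] >= floor]
    # sorted() is stable, so equally severe findings keep their input order.
    return sorted(kept, key=lambda c: SEVERITY_RANK[SEVERITY_BY_CWE[c]], reverse=True)


found = ["CWE-200", "CWE-79", "CWE-89"]
print(filter_by_severity(found))  # ['CWE-89', 'CWE-79']
```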

Configuration

Environment variables can be set in a .env file (the repository ships .env.example as a starting point):

# Database
DATABASE_URL=postgresql://user:pass@localhost:5432/vulnai_db

# Model
MODEL_NAME=microsoft/codebert-base
MAX_SEQ_LENGTH=512

# API
API_HOST=0.0.0.0
API_PORT=8000

# Detection
CONFIDENCE_THRESHOLD=0.5
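
To show how these variables might be consumed, here is a minimal sketch that reads them into a typed settings object. It is not the code in vulnai/core: the Settings class, the load_settings function, and the use of the example values above as defaults are all assumptions, and the sketch reads variables already present in the environment instead of parsing the .env file itself.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    database_url: str
    model_name: str
    max_seq_length: int
    api_host: str
    api_port: int
    confidence_threshold: float


def load_settings(env=None):
    """Build Settings from a mapping of environment variables.

    Defaults mirror the example values shown above (an assumption).
    """
    env = os.environ if env is None else env
    settings = Settings(
        database_url=env.get("DATABASE_URL", ""),
        model_name=env.get("MODEL_NAME", "microsoft/codebert-base"),
        max_seq_length=int(env.get("MAX_SEQ_LENGTH", "512")),
        api_host=env.get("API_HOST", "0.0.0.0"),
        api_port=int(env.get("API_PORT", "8000")),
        confidence_threshold=float(env.get("CONFIDENCE_THRESHOLD", "0.5")),
    )
    # A confidence is a probability, so reject values outside [0, 1].
    if not 0.0 <= settings.confidence_threshold <= 1.0:
        raise ValueError("CONFIDENCE_THRESHOLD must be between 0 and 1")
    return settings


settings = load_settings({"API_PORT": "9000", "CONFIDENCE_THRESHOLD": "0.7"})
print(settings.api_port, settings.confidence_threshold)  # 9000 0.7
```

Passing a plain dict makes the loader easy to test; calling load_settings() with no argument falls back to os.environ.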

Project Structure

vulnai/
├── api/                    # FastAPI REST API
│   ├── main.py
│   ├── models/
│   └── routes/
├── cli/                    # CLI tool
├── core/                   # Configuration & logging
├── data/                   # Data collection & loading
├── detection/              # Detection engine
├── models/                 # ML models & training
├── preprocessing/          # Code preprocessing
└── storage/                # Database & vector store

Evaluation

Run evaluation on test data:

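from vulnai.data.loader import load_training_data
from vulnai.models.evaluator import evaluate_model

# Use the held-out test split (third value returned by load_training_data())
_, _, test_loader = load_training_data()

results = evaluate_model(
    model_path="models/trained/vulnai_classifier.pt",
    dataloader=test_loader,
    output_dir="evaluation"
)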

print(results.accuracy)
print(results.f1_score)
print(results.false_positive_rate)
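
For reference, the three reported metrics have standard definitions in the binary (vulnerable vs. not vulnerable) case. The sketch below computes them from confusion-matrix counts; it is an illustration of those definitions, not the evaluate_model implementation, and this README does not say how the evaluator averages across CWE classes. The function name binary_metrics and the counts are hypothetical.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, F1 score, and false positive rate from confusion counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    # Share of safe samples that were wrongly flagged as vulnerable.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return {"accuracy": accuracy, "f1_score": f1, "false_positive_rate": fpr}


# Hypothetical counts: 40 true positives, 10 false positives,
# 45 true negatives, 5 false negatives.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts the accuracy is 0.850, the F1 score is about 0.842, and the false positive rate is about 0.182.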

API Endpoints

Method	Endpoint	Description
POST	/api/v1/detect	Detect vulnerabilities in code
GET	/api/v1/vulnerabilities	List stored vulnerabilities
GET	/api/v1/vulnerabilities/{cwe_id}	Get specific vulnerability
POST	/api/v1/feedback	Submit feedback for learning
GET	/api/v1/stats	Get detection statistics
GET	/api/v1/health	Health check
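
A detection request can be sent from any HTTP client. The sketch below builds one with the standard library; the JSON field names "code" and "language" are assumptions, since the request schema is not documented here (the interactive docs at /docs on a running server are the authoritative source). The helper name build_detect_request is hypothetical.

```python
import json
import urllib.request


def build_detect_request(code, language=None, base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for /api/v1/detect.

    The JSON field names "code" and "language" are assumptions.
    """
    payload = {"code": code}
    if language is not None:
        payload["language"] = language
    return urllib.request.Request(
        url=f"{base_url}/api/v1/detect",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_detect_request("eval(user_input)", language="python")
print(request.get_method(), request.full_url)

# To send it once the server is running:
# with urllib.request.urlopen(request, timeout=30) as response:
#     print(json.load(response))
```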

Docker

# Build image
docker build -t vulnai/sast-engine .

# Run container
docker run -p 8000:8000 vulnai/sast-engine

Contributing

Contributions are welcome! Please read our contributing guidelines first.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

CodeBERT - Microsoft Research
NVD - National Vulnerability Database
MITRE CWE - Common Weakness Enumeration
OWASP - Open Web Application Security Project
