A password/string pattern analysis and generation tool for security research.
Author: Ehab Hussein
EDAP analyzes wordlists to learn character patterns, frequencies, and positional relationships, then generates new strings that statistically match the learned patterns. This is useful for:
- Security research and password analysis
- Generating targeted wordlists for penetration testing
- Understanding password composition patterns
- Creating test data that matches specific formats
Key features:
- Variable-length support - Handles mixed-length wordlists correctly by modeling each word length separately
- 6 generation modes - Random, Smart, Pattern, Regex, Markov, and Hybrid
- Pattern inference - Automatically learns and outputs regex patterns
- Multiple output formats - Text, JSON, CSV, JSONL
- 12 hash/encoding algorithms - MD5, SHA family, SHA-3, BLAKE2, Base64 (Base64 is a reversible encoding, not a hash)
- Reproducible output - Seed support for deterministic generation
- Comprehensive statistics - Character frequency, position analysis, type distribution
- Rule-based mutations - Hashcat-style transformations (leetspeak, case changes, appends)
- Password strength scoring - Entropy calculation and strength ratings
- Flexible filtering - Filter by length, character types, score, or regex patterns
- Statistics export - Export analysis as JSON, CSV, or detailed position CSV
- Batch processing - Process multiple wordlists at once
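The strength-scoring feature mentions entropy. The exact formula EDAP uses isn't documented in this README, so the sketch below assumes a common baseline: entropy is estimated as length × log2(pool size), where the pool is the union of the printable-ASCII classes present in the word (26 lowercase, 26 uppercase, 10 digits, 32 punctuation). `pool_entropy`, `rating`, and the rating thresholds are hypothetical, not EDAP's `Scorer` API.

```python
import math
import string

# Assumed character-class pools (printable ASCII).
CLASS_POOLS = {
    "lower": set(string.ascii_lowercase),
    "upper": set(string.ascii_uppercase),
    "digit": set(string.digits),
    "symbol": set(string.punctuation),
}
KNOWN = set().union(*CLASS_POOLS.values())

def pool_entropy(word: str) -> float:
    """Estimate entropy in bits as len(word) * log2(pool size)."""
    # Add the full size of every class that appears at least once.
    pool = sum(len(chars) for chars in CLASS_POOLS.values()
               if any(c in chars for c in word))
    # Characters outside all classes (spaces, non-ASCII) count one each.
    pool += len({c for c in word if c not in KNOWN})
    return len(word) * math.log2(pool) if pool else 0.0

def rating(bits: float) -> str:
    """Map entropy to a coarse label (thresholds are illustrative)."""
    if bits < 28:
        return "very weak"
    if bits < 36:
        return "weak"
    if bits < 60:
        return "moderate"
    if bits < 128:
        return "strong"
    return "very strong"

for pw in ["password", "MyP@ssw0rd!"]:
    bits = pool_entropy(pw)
    print(f"{pw}: {bits:.1f} bits ({rating(bits)})")
```

Pool-size entropy ignores dictionary structure, so it overrates predictable strings such as `MyP@ssw0rd!`; that gap is exactly what wordlist-driven pattern analysis exposes.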
# Clone the repository
git clone https://github.com/ehabhussein/EDAP.git
cd EDAP
# Install (CLI only)
pip install -e .
# Install with Web UI
pip install -e ".[ui]"
# Install everything (dev + ui)
pip install -e ".[all]"
Requirements:
- Python 3.9+
- No external dependencies for CLI (stdlib only)
- Streamlit + Pandas for Web UI (optional)
# Launch the web interface from CLI
edap --ui
# Or use the dedicated command
edap-ui
# Or run directly with streamlit
streamlit run edap/ui.py
The web UI provides:
- File upload or paste input
- Interactive analysis with charts
- All generation modes with live preview
- Export to multiple formats
- Regex pattern inference
# After pip install -e . the 'edap' command is available
edap wordlist.txt
# Or run as a Python module (no install needed)
python -m edap wordlist.txt
# Generate 100 strings using random mode
edap wordlist.txt -n 100 -m random
# Generate with SHA-256 hashing
edap wordlist.txt -n 50 --hash sha256
# Output as JSON
edap wordlist.txt -n 20 -f json -o output.json
# Analyze only (no generation)
edap wordlist.txt --analyze-only --show-stats
Python API:
from edap import PatternAnalyzer, SmartGenerator, Hasher
# Analyze a wordlist
analyzer = PatternAnalyzer()
result = analyzer.analyze_file("wordlist.txt")
print(result.summary())
# Generate new strings
gen = SmartGenerator(result, seed=42)
words = gen.generate(100)
for word in words:
    weight = gen.calculate_weight(word)
    print(f"{word} (weight={weight})")
# Hash the output
hasher = Hasher("sha256")
hashed = hasher.hash_many(words)
Advanced usage:
from edap import (
    PatternAnalyzer,
    MarkovGenerator,
    create_hybrid_generator,
    Mutator,
    Scorer,
    Filter,
    FilterConfig,
    StatsExporter,
    BatchProcessor,
)
# Markov chain generation
analyzer = PatternAnalyzer()
result = analyzer.analyze_file("wordlist.txt")
markov = MarkovGenerator(result, order=2, seed=42)
with open("wordlist.txt") as fh:  # close the file after reading
    markov.train_on_words(fh.read().splitlines())
words = markov.generate(100)
# Hybrid generation
hybrid = create_hybrid_generator(result, mode="balanced", seed=42)
words = hybrid.generate(100)
# Apply mutations
mutator = Mutator()
expanded = list(mutator.expand("password", rules=["uppercase", "leetspeak", "append_123"]))
# Returns: ['password', 'PASSWORD', 'p4$$w0rd', 'password123', ...]
# Score password strength
scorer = Scorer()
score = scorer.score("MyP@ssw0rd!")
print(f"Score: {score.score}/100, Rating: {score.rating}, Entropy: {score.entropy:.1f} bits")
# Filter generated words
config = FilterConfig(min_length=8, require_upper=True, require_digit=True, min_score=50)
f = Filter(config)
strong_words = f.filter(words)
# Export statistics
exporter = StatsExporter(result)
exporter.to_json_file("stats.json")
print(exporter.to_summary())
# Batch process multiple files
batch = BatchProcessor()
results = batch.process_directory("wordlists/", pattern="*.txt")
merged = batch.merge_analyses(results)
Generation modes:
Random mode: Generates strings using characters observed at each position, chosen at random. Fastest but least strict.
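A minimal sketch of the positional idea, assuming random mode works per word length: pick a length weighted by how often it occurs in the wordlist, then choose each character uniformly from those seen at that position among words of that length. `build_position_table` and `generate_random` are hypothetical names, not EDAP's `RandomGenerator` API.

```python
import random
from collections import Counter

def build_position_table(words):
    """Return (length_counts, table); table[n][i] lists the characters
    seen at position i among training words of length n."""
    length_counts = Counter(len(w) for w in words if w)
    table = {}
    for n in length_counts:
        slots = [set() for _ in range(n)]
        for w in words:
            if len(w) == n:
                for i, ch in enumerate(w):
                    slots[i].add(ch)
        # Sort so a fixed seed gives the same output in every process.
        table[n] = [sorted(s) for s in slots]
    return length_counts, table

def generate_random(words, count, seed=None):
    """Choose a length by frequency, then one observed char per position."""
    rng = random.Random(seed)
    length_counts, table = build_position_table(words)
    lengths = sorted(length_counts)
    weights = [length_counts[n] for n in lengths]
    out = []
    for _ in range(count):
        n = rng.choices(lengths, weights=weights)[0]
        out.append("".join(rng.choice(slot) for slot in table[n]))
    return out

training = ["abc123", "Def456", "ghi789", "Password1!"]
print(generate_random(training, 5, seed=42))
```

Sorting each position's characters matters for seeded reproducibility: Python randomizes string hashing per process, so iterating a raw `set` could change order between runs.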
edap wordlist.txt -n 100 -m random
Smart mode: Uses character co-occurrence patterns to generate strings where characters that appeared together in the training data are more likely to appear together in the output.
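The README does not spell out the co-occurrence model. One plausible reading, sketched here as an assumption: for each word length, count how often character `a` at position `i` and character `b` at position `j` appear in the same training word, then fill positions left to right, weighting each candidate by its co-occurrence with the characters already chosen. `train_cooccurrence` and `generate_smart` are hypothetical names, not EDAP's `SmartGenerator` API.

```python
import random
from collections import Counter, defaultdict

def train_cooccurrence(words):
    """Per length n: (word count, chars seen per position, pair counts)."""
    by_length = defaultdict(list)
    for w in words:
        if w:
            by_length[len(w)].append(w)
    models = {}
    for n, group in by_length.items():
        slots = [sorted({w[i] for w in group}) for i in range(n)]
        pairs = Counter()  # ((i, a), (j, b)) -> words containing both
        for w in group:
            for i in range(n):
                for j in range(i + 1, n):
                    pairs[((i, w[i]), (j, w[j]))] += 1
        models[n] = (len(group), slots, pairs)
    return models

def generate_smart(words, count, seed=None):
    rng = random.Random(seed)
    models = train_cooccurrence(words)
    lengths = sorted(models)
    weights = [models[n][0] for n in lengths]
    out = []
    for _ in range(count):
        n = rng.choices(lengths, weights=weights)[0]
        _, slots, pairs = models[n]
        chosen = []
        for j, candidates in enumerate(slots):
            # 1 + co-occurrence with every already-chosen character.
            cand_weights = [
                1 + sum(pairs[((i, a), (j, b))] for i, a in enumerate(chosen))
                for b in candidates
            ]
            chosen.append(rng.choices(candidates, weights=cand_weights)[0])
        out.append("".join(chosen))
    return out

print(generate_smart(["abc123", "abd124", "xyz789"], 5, seed=1))
```

The `+ 1` smoothing keeps the first position (which has no earlier characters yet) selectable and leaves unseen combinations possible but unlikely.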
edap wordlist.txt -n 100 -m smart
Pattern mode: Follows observed character type patterns (uppercase, lowercase, digit, symbol). This is the strictest mode.
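Pattern mode can be sketched in two steps: reduce each training word to a type pattern using the same codes as the `--pattern` option (U=upper, l=lower, n=digit, @=symbol), then fill a pattern with characters of the matching type. Drawing fill characters from those observed in the wordlist (falling back to full ASCII classes for unseen types) is an assumption, as are the function names.

```python
import random
import string
from collections import Counter, defaultdict

# Fallback alphabets for types that never occur in the training data.
DEFAULT_POOLS = {
    "U": string.ascii_uppercase,
    "l": string.ascii_lowercase,
    "n": string.digits,
    "@": string.punctuation,
}

def char_type(ch: str) -> str:
    """Map a character to U, l, n, or @ (anything else counts as a symbol)."""
    if ch in string.ascii_uppercase:
        return "U"
    if ch in string.ascii_lowercase:
        return "l"
    if ch in string.digits:
        return "n"
    return "@"

def type_pattern(word: str) -> str:
    return "".join(char_type(c) for c in word)

def learn(words):
    """Count type patterns and collect the observed characters of each type."""
    patterns = Counter(type_pattern(w) for w in words if w)
    pools = defaultdict(set)
    for w in words:
        for c in w:
            pools[char_type(c)].add(c)
    return patterns, {t: sorted(cs) for t, cs in pools.items()}

def generate_pattern(words, count, pattern=None, seed=None):
    """Fill an explicit pattern, or a learned one chosen by frequency."""
    if pattern and set(pattern) - set(DEFAULT_POOLS):
        raise ValueError(f"pattern may only contain U, l, n, @: {pattern!r}")
    rng = random.Random(seed)
    patterns, pools = learn(words)
    names = sorted(patterns)
    weights = [patterns[p] for p in names]
    out = []
    for _ in range(count):
        p = pattern or rng.choices(names, weights=weights)[0]
        out.append("".join(rng.choice(pools.get(t) or DEFAULT_POOLS[t]) for t in p))
    return out

print(generate_pattern(["Pass12!", "Word34#", "abc"], 3, pattern="Ullnn@", seed=1))
```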
# Auto-select patterns from training data
edap wordlist.txt -n 100 -m pattern
# Use explicit pattern (U=upper, l=lower, n=digit, @=symbol)
edap wordlist.txt -n 100 -m pattern --pattern "Ullnn@"
Regex mode: Generates strings matching a user-provided regular expression.
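Generating from an arbitrary regex requires parsing the full regex grammar. As a hedged illustration only, the sketch below supports a small subset (literal characters, bracket classes with ranges such as `[A-Z]`, and `{n}` / `{m,n}` quantifiers), which is enough for the example pattern in this mode. It is not EDAP's `RegexGenerator`, and it rejects any syntax it does not handle.

```python
import random
import re

UNSUPPORTED = set("\\.*+?()|^${}")

def parse(pattern):
    """Parse a tiny regex subset into (alphabet, min_reps, max_reps) tokens."""
    tokens, i = [], 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "[":
            end = pattern.find("]", i + 1)
            body = pattern[i + 1:end]
            if end == -1 or not body or body[0] == "^" or "\\" in body:
                raise ValueError(f"unsupported character class at {i}")
            alphabet, k = set(), 0
            while k < len(body):
                if k + 2 < len(body) and body[k + 1] == "-":
                    # Range such as A-Z: add every code point in between.
                    alphabet.update(chr(c) for c in range(ord(body[k]), ord(body[k + 2]) + 1))
                    k += 3
                else:
                    alphabet.add(body[k])
                    k += 1
            if not alphabet:
                raise ValueError(f"empty character class at {i}")
            i = end + 1
        elif ch in UNSUPPORTED:
            raise ValueError(f"unsupported regex syntax {ch!r} at {i}")
        else:
            alphabet, i = {ch}, i + 1
        # Optional {n} or {m,n} quantifier after the token.
        lo = hi = 1
        m = re.match(r"\{(\d+)(?:,(\d+))?\}", pattern[i:])
        if m:
            lo = int(m.group(1))
            hi = int(m.group(2)) if m.group(2) is not None else lo
            if hi < lo:
                raise ValueError("quantifier max is below min")
            i += m.end()
        tokens.append((sorted(alphabet), lo, hi))
    return tokens

def generate_regex(pattern, count, seed=None):
    """Pick a repeat count per token, then a random char per repeat."""
    rng = random.Random(seed)
    tokens = parse(pattern)
    return [
        "".join(
            rng.choice(alphabet)
            for alphabet, lo, hi in tokens
            for _rep in range(rng.randint(lo, hi))
        )
        for _ in range(count)
    ]

print(generate_regex("[A-Z][a-z]{3}[0-9]{2}", 5, seed=42))
```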
edap wordlist.txt -n 100 -m regex --regex "[A-Z][a-z]{3}[0-9]{2}"
Markov mode: Uses n-gram character transitions learned from the input to generate strings that "feel" similar to the training data.
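A character-level n-gram Markov chain can be sketched as below. Here `order` is the number of preceding characters used as context; whether EDAP's `--markov-order` counts the same way is an assumption. Words are padded with start/end sentinels so the chain also learns how words begin and end, and `max_len` is a hypothetical safety cap. `train` and `generate_markov` are illustrative names, not EDAP's `MarkovGenerator` API.

```python
import random
from collections import Counter, defaultdict

# Sentinels marking word start/end; assumed never to occur in the wordlist.
START, END = "\x02", "\x03"

def train(words, order=2):
    """Count transitions from each `order`-character context to the next char."""
    if order < 1:
        raise ValueError("order must be >= 1")
    model = defaultdict(Counter)
    for w in words:
        padded = START * order + w + END
        for i in range(order, len(padded)):
            model[padded[i - order:i]][padded[i]] += 1
    return model

def generate_markov(model, order=2, count=10, max_len=32, seed=None):
    """Walk the chain from the start context until END or max_len."""
    rng = random.Random(seed)
    out = []
    for _ in range(count):
        context, chars = START * order, []
        while len(chars) < max_len:
            nexts = model[context]
            options = sorted(nexts)  # sorted for seed reproducibility
            ch = rng.choices(options, weights=[nexts[c] for c in options])[0]
            if ch == END:
                break
            chars.append(ch)
            context = context[1:] + ch  # slide the context window
        out.append("".join(chars))
    return out

model = train(["password", "passw0rd", "pass1234"], order=2)
print(generate_markov(model, order=2, count=5, seed=42))
```

Higher orders copy longer fragments of the input, consistent with the note that `--markov-order 3` yields output closer to the input. This sketch does not filter out exact copies of training words.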
# Default order (2-gram)
edap wordlist.txt -n 100 -m markov
# Higher order for more similarity to input
edap wordlist.txt -n 100 -m markov --markov-order 3
Hybrid mode: Combines multiple generators with weighted probability.
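A sketch of weighted combination using the preset weights listed for this mode. It assumes each output string independently picks a generator by weight; EDAP might instead split the requested count proportionally. `generate_hybrid` is a hypothetical name, and the entries in `demo` are placeholders standing in for real generation strategies.

```python
import random
from typing import Callable

# Preset weights as documented for --hybrid-mode.
PRESETS = {
    "balanced": {"smart": 0.5, "pattern": 0.3, "random": 0.2},
    "strict": {"pattern": 0.7, "smart": 0.3},
    "creative": {"random": 0.5, "smart": 0.3, "pattern": 0.2},
}

def generate_hybrid(generators: dict[str, Callable[[random.Random], str]],
                    mode: str, count: int, seed=None):
    """For each output, pick a generator by preset weight and call it."""
    rng = random.Random(seed)
    weights = PRESETS[mode]
    names = sorted(weights)
    picks = rng.choices(names, weights=[weights[n] for n in names], k=count)
    # All generators share one RNG so the whole run follows a single seed.
    return [generators[name](rng) for name in picks]

# Placeholder generators that just tag their output with the strategy name.
demo = {
    name: (lambda r, name=name: f"{name}-{r.randint(0, 99)}")
    for name in ("smart", "pattern", "random")
}
print(generate_hybrid(demo, "strict", 5, seed=42))
```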
# Balanced: 50% smart + 30% pattern + 20% random
edap wordlist.txt -n 100 -m hybrid --hybrid-mode balanced
# Strict: 70% pattern + 30% smart
edap wordlist.txt -n 100 -m hybrid --hybrid-mode strict
# Creative: 50% random + 30% smart + 20% pattern
edap wordlist.txt -n 100 -m hybrid --hybrid-mode creative
CLI reference:
usage: edap [-h] [--version] [-n COUNT]
            [-m {random,smart,pattern,regex,markov,hybrid}]
            [--regex REGEX] [--pattern PATTERN]
            [--markov-order N] [--hybrid-mode {balanced,strict,creative}]
            [-o OUTPUT] [-f {text,json,csv,jsonl}] [--hash ALGORITHM]
            [--analyze-only] [--show-stats] [--show-patterns]
            [--min-length N] [--max-length N] [--length N]
            [--seed SEED] [--allow-duplicates] [-v] [-q] [--no-banner]
            input
Arguments:
  input                 Input wordlist file
Options:
  -n, --count N         Number of strings to generate (default: 10)
  -m, --mode MODE       Generation mode: random, smart, pattern, regex, markov, hybrid
  --regex PATTERN       Regex pattern for regex mode
  --pattern PATTERN     Type pattern for pattern mode (e.g., "UllnnU")
  --markov-order N      Markov chain n-gram order (default: 2)
  --hybrid-mode MODE    Hybrid preset: balanced, strict, creative
  -o, --output FILE     Output file (default: stdout)
  -f, --format FORMAT   Output format: text, json, csv, jsonl
  --hash ALGORITHM      Apply hash: md5, sha1, sha256, sha512, sha3_256,
                        sha3_512, blake2b, blake2s, base64, base64url
  --analyze-only        Only analyze, don't generate
  --show-stats          Show detailed statistics
  --show-patterns       Show inferred regex patterns
  --min-length N        Minimum word length to analyze
  --max-length N        Maximum word length to analyze
  --seed N              Random seed for reproducibility
  --allow-duplicates    Allow generating duplicates of input words
  -v, --verbose         Verbose output
  -q, --quiet           Quiet mode
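`--show-patterns` prints inferred regex patterns. The inference rules aren't described in this README, so the sketch below assumes a simple approach: classify each character by type, collapse runs of one type into a quantified class (e.g. `[a-z]{3}`), and report the most frequent results. `infer_regex` and `top_patterns` are hypothetical names, not EDAP's regex builder API.

```python
import re
import string
from collections import Counter
from itertools import groupby

# Regex class emitted for each character type (symbols = anything else).
CLASS_FOR = {"U": "[A-Z]", "l": "[a-z]", "n": "[0-9]", "@": "[^A-Za-z0-9]"}

def char_code(ch: str) -> str:
    if ch in string.ascii_uppercase:
        return "U"
    if ch in string.ascii_lowercase:
        return "l"
    if ch in string.digits:
        return "n"
    return "@"

def infer_regex(word: str) -> str:
    """Collapse runs of the same type into quantified classes."""
    parts = []
    for code, run in groupby(word, key=char_code):
        n = len(list(run))
        cls = CLASS_FOR[code]
        parts.append(cls if n == 1 else f"{cls}{{{n}}}")
    return "^" + "".join(parts) + "$"

def top_patterns(words, k=3):
    """Return the k most common inferred patterns with their counts."""
    return Counter(infer_regex(w) for w in words if w).most_common(k)

for regex, count in top_patterns(["Pass12!", "Word34#", "abc1", "xyz2", "Q"]):
    print(f"{count:>3}  {regex}")

# An inferred pattern also matches unseen words with the same shape.
pattern = infer_regex("Pass12!")
print(pattern, bool(re.fullmatch(pattern, "Word34#")))
```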
$ edap passwords.txt --analyze-only --show-stats
============================================================
EDAP Pattern Analysis Results
============================================================
Total words analyzed: 1000
Unique words: 987
Length range: 6 - 16
Charset size: 72
Length distribution:
8: ######################### (312 words, 31.2%)
10: ################## (223 words, 22.3%)
12: ############# (156 words, 15.6%)
Character type frequency:
UPPER : 1245 (12.4%)
LOWER : 5678 (56.8%)
DIGIT : 2345 (23.5%)
SYMBOL : 732 (7.3%)
Hashing the generated output:
$ edap wordlist.txt -n 1000 -m smart --hash sha256 -o hashes.txt
JSON output:
$ edap wordlist.txt -n 10 -f json
[
"Password1!",
"Admin2023",
"User@1234",
...
]
# Same seed = same output
$ edap wordlist.txt -n 3 --seed 42
abc123
Def456
ghi789
$ edap wordlist.txt -n 3 --seed 42
abc123
Def456
ghi789
How it works:
1. Analysis phase: EDAP reads the input wordlist and builds statistical models:
   - Character frequency at each position (per word length)
   - Character type patterns (e.g., "Uppercase-lowercase-digit")
   - Co-occurrence relationships between characters
2. Generation phase: new strings are produced from the learned models:
   - Random: Picks characters seen at each position randomly
   - Smart: Uses co-occurrence to pick compatible characters
   - Pattern: Ensures output matches observed type patterns
   - Regex: Generates strings matching the provided regex
   - Markov: Chains characters using n-gram transitions learned from the input
   - Hybrid: Mixes several generators according to preset weights
3. Output phase: Results can optionally be hashed and exported as text, JSON, CSV, or JSONL.
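The output phase maps onto Python's standard library. This sketch applies one of the algorithm names accepted by `--hash` through `hashlib.new` (or `base64` for the two encodings) and writes JSON Lines. `apply_hash`, `write_jsonl`, and the `{"word": ..., "hash": ...}` record layout are assumptions, not EDAP's documented exporter API or JSONL schema.

```python
import base64
import hashlib
import io
import json

def apply_hash(word: str, algorithm: str) -> str:
    """Hash (or Base64-encode) a UTF-8 string using a --hash algorithm name."""
    data = word.encode("utf-8")
    if algorithm == "base64":
        return base64.b64encode(data).decode("ascii")
    if algorithm == "base64url":
        return base64.urlsafe_b64encode(data).decode("ascii")
    # md5, sha1, sha256, sha512, sha3_256, sha3_512, blake2b, blake2s
    return hashlib.new(algorithm, data).hexdigest()

def write_jsonl(words, algorithm, stream):
    """Write one JSON object per line: the word and its hash."""
    for w in words:
        record = {"word": w, "hash": apply_hash(w, algorithm)}
        stream.write(json.dumps(record) + "\n")

buf = io.StringIO()
write_jsonl(["Password1!", "Admin2023"], "sha256", buf)
print(buf.getvalue(), end="")
```

Writing to any text stream (a `StringIO` here, or an open file) keeps the exporter independent of where output goes, mirroring the stdout-or-file behavior of `-o`.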
# Install dev dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Type checking
mypy edap/
# Linting
ruff check edap/
Project structure:
edap/
├── __init__.py # Package exports
├── __main__.py # Enables: python -m edap
├── models.py # Data classes (CharType, PositionStats, etc.)
├── analyzer.py # PatternAnalyzer
├── generators/
│ ├── __init__.py # Generator exports
│ ├── base.py # BaseGenerator abstract class
│ ├── random_gen.py # RandomGenerator
│ ├── smart.py # SmartGenerator
│ ├── pattern.py # PatternGenerator
│ ├── regex_gen.py # RegexGenerator
│ ├── markov.py # MarkovGenerator (n-gram chains)
│ └── hybrid.py # HybridGenerator (multi-strategy)
├── regex_builder.py # Regex pattern inference
├── exporters.py # Output formatting and hashing
├── mutator.py # Rule-based mutations
├── scorer.py # Password strength scoring
├── filters.py # Output filtering
├── stats_exporter.py # Statistics export (JSON/CSV)
├── batch.py # Batch file processing
├── progress.py # CLI progress bar
├── exceptions.py # Custom exceptions
├── cli.py # Command-line interface
├── ui.py # Streamlit web UI
└── ui_runner.py # UI launcher script
tests/
├── test_analyzer.py
├── test_generators.py
├── test_exporters.py
├── test_models.py
├── test_cli.py
└── test_new_features.py # Tests for v2.1.0 features
MIT License - see LICENSE file.
Contributions welcome! Please open an issue or submit a pull request.