This repository contains implementations of transformer-based models for two NLP tasks:
- Character-level Language Modeling: Predicting the next character in a sequence using transformers
- Letter Counting Task: Predicting whether each character in a sequence has appeared 0, 1, or 2+ times before
This project demonstrates the implementation and application of transformer architectures for character-level tasks in natural language processing. The transformer models utilize self-attention mechanisms to capture dependencies between characters in a sequence, providing effective performance on both tasks.
- Custom transformer architecture implementation with:
  - Multi-head self-attention
  - Positional encoding
  - Layer normalization
  - Feed-forward networks
- Character-level language model training and evaluation
- Letter counting task implementation with visualization
- Support for different model configurations
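The core pieces listed above can be illustrated with a minimal NumPy sketch (the actual repo implements these in PyTorch inside `transformer.py`; the function names here are illustrative, not the repo's API):

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encodings, as in Vaswani et al. (2017)."""
    pos = np.arange(seq_len)[:, None]    # (seq_len, 1)
    i = np.arange(d_model)[None, :]      # (1, d_model)
    angle = pos / np.power(10000, (2 * (i // 2)) / d_model)
    # Even dimensions use sine, odd dimensions use cosine
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

def self_attention(x, mask=None):
    """Single-head scaled dot-product self-attention (projection weights omitted)."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)        # (seq_len, seq_len) similarity scores
    if mask is not None:
        scores = np.where(mask, scores, -1e9)  # block disallowed positions
    # Numerically stable row-wise softmax
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x                   # weighted sum of value vectors
```

In the full model, each layer wraps this attention (and a feed-forward sublayer) with residual connections and layer normalization.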
- Predicts the probability distribution of the next character given a context
- Uses causal masking to ensure predictions only depend on previous characters
- Achieved perplexity of 6.3488 on the development set
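Causal masking and the perplexity metric can be sketched as follows (illustrative helper names, not the repo's API):

```python
import numpy as np

def causal_mask(seq_len):
    """True where attention is allowed: position i may attend only to j <= i,
    so the prediction for the next character never sees future characters."""
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def perplexity(log_probs):
    """Exponential of the average negative log-likelihood per character;
    log_probs holds the natural-log probability of each gold character."""
    return float(np.exp(-np.mean(log_probs)))
```

A uniform model over a 27-character vocabulary assigns log(1/27) to every character, giving a perplexity of exactly 27, which is the baseline the neural model is compared against.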
- Classifies each character based on its previous occurrences (0, 1, or 2+)
- Visualizes attention patterns to show how the model learns to track character occurrences
- Achieved 95.70% accuracy on the development set
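The labeling scheme for the "BEFORE" variant is simple to state in code (a sketch of the task definition, not the repo's data-loading code):

```python
def count_labels(text):
    """Label each character 0, 1, or 2 (meaning 2+) by how many times it
    has already appeared earlier in the sequence."""
    seen = {}
    labels = []
    for ch in text:
        labels.append(min(seen.get(ch, 0), 2))  # cap the label at 2 for "2+"
        seen[ch] = seen.get(ch, 0) + 1
    return labels
```

For example, `count_labels("aabca")` yields `[0, 1, 0, 0, 2]`: the final `a` has appeared twice before, so it gets the 2+ label.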
The project includes visualizations of attention patterns in the transformer models, showing how they learn to focus on relevant previous occurrences of characters:
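A heatmap of the attention-weight matrix makes these patterns visible; a minimal Matplotlib sketch (the repo's actual plotting code in `plot/` may differ):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

def plot_attention(weights, chars, path):
    """Save a heatmap of a (seq_len, seq_len) attention-weight matrix.
    Rows are query positions; bright cells show which earlier characters
    each position attends to."""
    fig, ax = plt.subplots()
    ax.imshow(weights, cmap="viridis")
    ax.set_xticks(range(len(chars)))
    ax.set_xticklabels(chars)
    ax.set_yticks(range(len(chars)))
    ax.set_yticklabels(chars)
    ax.set_xlabel("attended position")
    ax.set_ylabel("query position")
    fig.savefig(path, bbox_inches="tight")
    plt.close(fig)
```

For the letter counting task, repeated characters show up as bright off-diagonal cells linking a character to its earlier occurrences.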
- `transformer.py`: Core transformer architecture implementation
- `transformer_lm.py`: Language model implementation using transformers
- `lm.py`: Main driver code for language modeling tasks
- `letter_counting.py`: Implementation for the letter counting task
- `utils.py`: Utility functions including Indexer and Beam Search
- `data/`: Training and evaluation datasets
- `plot/`: Visualizations of model attention patterns
- Python 3.6+
- PyTorch 1.0+
- NumPy
- Matplotlib (for visualizations)
```bash
# Train and evaluate the neural language model
python lm.py --model NEURAL

# Run the uniform baseline model
python lm.py --model UNIFORM

# Use custom data paths
python lm.py --model NEURAL --train_path data/custom-train.txt --dev_path data/custom-dev.txt
```

```bash
# Train and evaluate the "BEFORE" version (count only previous occurrences)
python letter_counting.py --task BEFORE

# Train and evaluate the "BEFOREAFTER" version (count all occurrences)
python letter_counting.py --task BEFOREAFTER

# Use custom data paths
python letter_counting.py --task BEFORE --train data/custom-train.txt --dev data/custom-dev.txt
```

- Dev Accuracy: 95.70%
- The model successfully learns to track previous occurrences of characters in a sequence
- Attention visualizations show the model focuses on previous occurrences of the same character
- Perplexity: 6.3488
- The model effectively captures character-level patterns in text
- Outperforms the uniform baseline model significantly
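The margin over the baseline is easy to quantify. Assuming a 27-character vocabulary (26 lowercase letters plus space; the exact vocabulary is an assumption, not stated in this README), the uniform model's perplexity is exactly the vocabulary size:

```python
import math

vocab_size = 27  # assumed: 26 lowercase letters + space

# A uniform model assigns probability 1/V to every character, so its
# perplexity is exp(-mean(log(1/V))) = V.
uniform_perplexity = math.exp(-math.log(1.0 / vocab_size))
model_perplexity = 6.3488  # reported dev-set perplexity

print(f"uniform: {uniform_perplexity:.1f}, model: {model_perplexity:.4f}")
```

Under that assumption, the neural model's perplexity of 6.35 is roughly a fourfold reduction over the uniform baseline's 27.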
Potential extensions to this project include:
- Scaling to word-level language models
- Exploring different attention mechanisms
- Applying to other sequence labeling tasks
- Fine-tuning hyperparameters for better performance
This project is built upon the transformer architecture introduced in "Attention Is All You Need" (Vaswani et al., 2017), adapted for character-level tasks.

