
NLP Transformer Implementation

This repository contains implementations of transformer-based models for two NLP tasks:

  1. Character-level Language Modeling: Predicting the next character in a sequence using transformers
  2. Letter Counting Task: Predicting whether each character in a sequence has appeared 0, 1, or 2+ times before

Project Overview

This project demonstrates the implementation and application of transformer architectures for character-level NLP tasks. The models use multi-head self-attention to capture dependencies between characters in a sequence, and they perform well on both tasks (quantitative results below).

Features

  • Custom transformer architecture implementation with:
    • Multi-head self-attention
    • Positional encoding
    • Layer normalization
    • Feed-forward networks
  • Character-level language model training and evaluation
  • Letter counting task implementation with visualization
  • Support for different model configurations
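
The positional encoding listed above can be sketched with the standard sinusoidal formulation from "Attention Is All You Need". This is a minimal NumPy illustration of that formulation, not necessarily the exact code in transformer.py:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encodings: sin on even dims, cos on odd dims."""
    positions = np.arange(seq_len)[:, np.newaxis]       # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[np.newaxis, :]      # (1, d_model/2)
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions
    pe[:, 1::2] = np.cos(angles)   # odd dimensions
    return pe

pe = positional_encoding(20, 64)   # one encoding vector per position
```

These vectors are added to the character embeddings so the attention layers, which are otherwise order-invariant, can distinguish positions.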

Models

Transformer Language Model (LM)

  • Predicts the probability distribution of the next character given a context
  • Uses causal masking to ensure predictions only depend on previous characters
  • Achieved perplexity of 6.3488 on the development set
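
The causal masking mentioned above can be illustrated with a small NumPy sketch of the standard technique (not necessarily the exact implementation in this repository): scores for future positions are set to negative infinity before the softmax, so each position attends only to itself and earlier positions.

```python
import numpy as np

def causal_attention_weights(scores):
    """Mask out future positions in raw attention scores, then softmax each row.

    scores: (seq_len, seq_len) matrix of query-key dot products.
    Position i may only attend to positions j <= i.
    """
    seq_len = scores.shape[0]
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    masked = np.where(future, -np.inf, scores)
    # numerically stable softmax over each row; exp(-inf) = 0
    masked = masked - masked.max(axis=-1, keepdims=True)
    weights = np.exp(masked)
    return weights / weights.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
w = causal_attention_weights(rng.normal(size=(5, 5)))
```

The resulting weight matrix is lower-triangular, which is what lets the model be trained to predict every next character in a sequence in parallel without leaking future information.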

Letter Counting Model

  • Classifies each character based on its previous occurrences (0, 1, or 2+)
  • Visualizes attention patterns to show how the model learns to track character occurrences
  • Achieved 95.70% accuracy on the development set
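
The 0/1/2+ labeling scheme for the "BEFORE" task can be reproduced in a few lines. The helper name below is illustrative, not a function from the repository:

```python
def count_labels(text):
    """For each character, emit 0, 1, or 2 according to how many times it
    has already appeared earlier in the sequence (2 means "2 or more")."""
    seen = {}
    labels = []
    for ch in text:
        labels.append(min(seen.get(ch, 0), 2))
        seen[ch] = seen.get(ch, 0) + 1
    return labels

labels = count_labels("banana")   # [0, 0, 0, 1, 1, 2]
```

Solving this with a transformer requires each position to attend back to earlier occurrences of the same character, which is exactly the pattern the attention visualizations reveal.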

Visualizations

The project includes visualizations of attention patterns in the transformer models, showing how they learn to focus on relevant previous occurrences of characters:

(Attention visualization images for two example sequences are included in the plot/ directory.)

Project Structure

  • transformer.py: Core transformer architecture implementation
  • transformer_lm.py: Language model implementation using transformers
  • lm.py: Main driver code for language modeling tasks
  • letter_counting.py: Implementation for the letter counting task
  • utils.py: Utility functions, including the Indexer class and beam search
  • data/: Training and evaluation datasets
  • plot/: Visualizations of model attention patterns

Running the Code

Requirements

  • Python 3.6+
  • PyTorch 1.0+
  • NumPy
  • Matplotlib (for visualizations)

Running the Language Model

# Train and evaluate the neural language model
python lm.py --model NEURAL

# Run the uniform baseline model
python lm.py --model UNIFORM

# Use custom data paths
python lm.py --model NEURAL --train_path data/custom-train.txt --dev_path data/custom-dev.txt

Running the Letter Counting Task

# Train and evaluate the "BEFORE" version (count only previous occurrences)
python letter_counting.py --task BEFORE

# Train and evaluate the "BEFOREAFTER" version (count all occurrences)
python letter_counting.py --task BEFOREAFTER

# Use custom data paths
python letter_counting.py --task BEFORE --train data/custom-train.txt --dev data/custom-dev.txt

Results

Letter Counting Model

  • Dev Accuracy: 95.70%
  • The model successfully learns to track previous occurrences of characters in a sequence
  • Attention visualizations show the model focuses on previous occurrences of the same character

Language Model

  • Perplexity: 6.3488
  • The model effectively captures character-level patterns in text
  • Outperforms the uniform baseline model significantly
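
For context, perplexity is the exponential of the average negative log-likelihood per character, so a uniform model over a vocabulary of V characters has perplexity exactly V. The sketch below assumes, for illustration, a 27-character vocabulary (a-z plus space); the project's actual vocabulary may differ:

```python
import math

def perplexity(log_probs):
    """Perplexity = exp(average negative log-likelihood per character)."""
    return math.exp(-sum(log_probs) / len(log_probs))

# A uniform model assigns probability 1/V to every character, so its
# perplexity is V regardless of the text it is evaluated on.
V = 27
uniform_log_probs = [math.log(1.0 / V)] * 100
```

Against such a baseline, a perplexity of 6.3488 means the trained model is, on average, about as uncertain as a uniform choice among roughly 6 characters rather than the full vocabulary.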

Future Work

Potential extensions to this project include:

  • Scaling to word-level language models
  • Exploring different attention mechanisms
  • Applying to other sequence labeling tasks
  • Fine-tuning hyperparameters for better performance

Acknowledgements

This project is built upon the transformer architecture introduced in "Attention Is All You Need" (Vaswani et al., 2017), adapted for character-level tasks.
