
Welcome to the OpenLLM-France 🇫🇷 GitHub


The aim of the OpenLLM-France community is to collaborate on the development of truly open-source large language models (LLMs).

This space contains the software tools used to:

  • collect and clean data;
  • pretrain the foundation models;
  • train and align the instruction models.

This space is closely linked to the community's Hugging Face space, which hosts and documents the datasets and models.

According to the OSI, an open-source AI model means that we provide:

  • the training corpus under an open license --> Hugging Face;
  • model weights under an open source non-restrictive license --> Hugging Face;
  • code for data curation and training algorithms under open source licenses --> this GitHub space.


Pinned

  1. Manifesto Public

    Preconfiguration page for the OpenLLM-France community

    49 stars · 1 fork

  2. Lucie-dataset-filtering Public

    Lucie-dataset-filtering: Code to compile and preprocess training data for Lucie's training

    Python · 2 stars · 1 fork

  3. Lucie-Training Public

    Code for continual pretraining of LUCIE

    Jupyter Notebook · 52 stars · 8 forks

  4. wikiplaintext Public

    Get plain text from Wikipedia pages

    HTML · 10 stars

Repositories

Showing 10 of 17 repositories
  • AudioBench Public Forked from AudioLLMs/AudioBench

    AudioBench: A Universal Benchmark for Audio Large Language Models

    Python · 0 stars · 15 forks · 0 issues · 0 pull requests · Updated Mar 5, 2026
  • datatrove Public Forked from huggingface/datatrove

    Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

    Python · 0 stars · Apache-2.0 · 250 forks · 0 issues · 0 pull requests · Updated Mar 4, 2026
  • lighteval Public Forked from huggingface/lighteval

    Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends

    Python · 0 stars · MIT · 434 forks · 0 issues · 0 pull requests · Updated Mar 4, 2026
  • RL Public Forked from NVIDIA-NeMo/RL

    Scalable toolkit for efficient model reinforcement

    Python · 0 stars · Apache-2.0 · 274 forks · 0 issues · 0 pull requests · Updated Mar 4, 2026
  • Lucie-dataset-filtering Public

    Lucie-dataset-filtering: Code to compile and preprocess training data for Lucie's training

    Python · 2 stars · AGPL-3.0 · 1 fork · 0 issues · 0 pull requests · Updated Jan 20, 2026
  • Lucie-Training Public

    Code for continual pretraining of LUCIE

    Jupyter Notebook · 52 stars · GPL-3.0 · 8 forks · 0 issues · 1 pull request · Updated Dec 3, 2025
  • .github Public

    Welcome to OpenLLM-France 🇫🇷

    2 stars · 0 forks · 0 issues · 0 pull requests · Updated Jul 3, 2025
  • wikiplaintext Public

    Get plain text from Wikipedia pages

    HTML · 10 stars · GPL-3.0 · 0 forks · 0 issues · 0 pull requests · Updated Jul 2, 2025
  • Megatron-DeepSpeed Public Forked from deep-spin/Megatron-DeepSpeed

    Ongoing research training transformer language models at scale

    Python · 0 stars · 3,732 forks · 0 issues · 0 pull requests · Updated Jan 16, 2025
  • litellm_entreprise_test Public Forked from BerriAI/litellm

    Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]

    Python · 0 stars · 6,241 forks · 0 issues · 0 pull requests · Updated Dec 18, 2024
