OpenChatKit

What is OpenChatKit?

OpenChatKit is the first open-source ChatGPT alternative, offering a robust open-source foundation to build both specialized and general-purpose chatbots for various applications. It includes an instruction-tuned large language model, customization recipes, an extensible retrieval system, and a moderation model. It's not just a model release but a complete open-source project with tools and processes for continuous improvement and community contributions.

FAQ

What is OpenChatKit and what does it provide?

OpenChatKit is an open-source project that provides a powerful base to create both specialized and general purpose chatbots for various applications. It consists of four key components: an instruction-tuned large language model, customization recipes to fine-tune the model, an extensible retrieval system to augment the model with live-updating information, and a moderation model to filter inappropriate or out-of-domain questions.

Who are the collaborators behind OpenChatKit and the training datasets?

OpenChatKit is a collaboration between Together, LAION and Ontocord. Together is a company that provides open-source foundation models for natural language understanding and generation. LAION is a company that provides high-quality data annotation and curation services. Ontocord is a company that provides data engineering and machine learning solutions.

How can I try out OpenChatKit and give feedback?

You can try out OpenChatKit on https://openchatkit.net and give feedback through the OpenChatKit feedback app. You can also join the OpenChatKit community on GitHub, Discord, Twitter and Medium, and share your ideas, suggestions and questions.

What is the base model of OpenChatKit and how is it fine-tuned?

The base model of OpenChatKit is GPT-NeoXT-Chat-Base-20B, a 20 billion parameter large language model based on EleutherAI’s GPT-NeoX model. It is fine-tuned with the OIG-43M dataset, focusing on several tasks such as multi-turn dialogue, question answering, classification, extraction, and summarization.

How does OpenChatKit perform on different natural language tasks?

OpenChatKit performs well on a broad set of natural language tasks, especially those involving question and answering, extraction and classification. It also does well on few-shot prompts, where it can leverage its instruction-tuning to adapt to different tasks. However, there are also some areas where OpenChatKit needs improvement, such as knowledge-based closed question and answering, coding tasks, repetition, context switching, and creative writing and longer answers.

How can I access the source code, model weights and training datasets of OpenChatKit?

You can access the source code, model weights and training datasets of OpenChatKit on GitHub. You can also download the model weights and the datasets from Hugging Face.

Pros & Cons

Pros

Open-source and customizable
Includes a large language model, retrieval system, and moderation model
Supports continuous improvement and community contributions
Versatile for various natural language tasks

Cons

Requires technical expertise to implement and customize
Performance may vary depending on the specific application
Some areas need improvement, such as knowledge-based question answering and coding tasks

What is OpenChatKit?

How to use

Core Features

Use Cases

FAQ

Pricing

Pros & Cons