Dataset

Large language models are generally trained on publicly available data, and each training run requires a significant amount of computational power. As a result, a model's knowledge usually does not include private domain knowledge, and its public knowledge lags behind current events. A common solution is Retrieval-Augmented Generation (RAG): the user's question is matched against external data to find the most relevant content, and the retrieved content is then reorganized and inserted into the model prompt as context.
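The retrieve-then-insert flow described above can be sketched in a few lines of Python. This is a minimal illustration, not Glik's implementation: it assumes simple word-count vectors and cosine similarity in place of a real embedding model, and the names `tokenize`, `retrieve`, and `build_prompt` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return up to top_k chunks that share vocabulary with the question.

    Assumption: word overlap stands in for the semantic matching that a
    production RAG system would do with embeddings.
    """
    q_vec = Counter(tokenize(question))
    scored = [(cosine_similarity(q_vec, Counter(tokenize(c))), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for score, chunk in scored[:top_k] if score > 0]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Insert the retrieved chunks into the prompt as context."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical knowledge base content, for illustration only.
chunks = [
    "Refunds are processed within 5 business days.",
    "Our office is closed on public holidays.",
    "Shipping is free for orders over 50 dollars.",
]
question = "How long do refunds take?"
prompt = build_prompt(question, retrieve(question, chunks))
```

The resulting `prompt` string is what would be sent to the model, so the model answers from the retrieved text rather than from its training data alone.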

To learn more, see the extended reading on Retrieval-Augmented Generation (RAG).

Glik's knowledge base feature visualizes each step in the RAG pipeline and provides a simple user interface that helps application builders manage personal or team knowledge bases and quickly integrate them into AI applications. You only need to prepare text content, such as:

Long text content (TXT, Markdown, DOCX, HTML, JSONL, or even PDF files)
Structured data (CSV, Excel, etc.)

Additionally, we are gradually adding support for synchronizing data from various data sources into datasets, including:

Web Scraping
Notion
Google Drive (Coming soon)
OneDrive (Coming soon)

Scenario: If your company wants to build an AI customer service assistant on top of its existing knowledge base and product documentation, you can upload the documents to a dataset in Glik and build a chatbot. In the past, this might have taken weeks and been difficult to maintain.

Dataset and Documents

In Glik, a dataset is a collection of documents. A dataset can be integrated into an application as retrieval context. Documents can be uploaded by developers or members of the operations team, or synchronized from other data sources, in which case a document usually corresponds to one file in the data source.
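The relationship between a dataset and its documents can be pictured with a small data model. This is only a sketch under assumptions: the class names, fields, and `source` labels below are hypothetical and do not reflect Glik's actual schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """One unit of content, typically one file from a data source."""
    name: str
    text: str
    source: str = "upload"  # hypothetical labels, e.g. "upload", "notion", "web"


@dataclass
class Dataset:
    """A named collection of documents used as retrieval context."""
    name: str
    documents: list[Document] = field(default_factory=list)

    def add_document(self, document: Document) -> None:
        """Add an uploaded or synchronized document to the dataset."""
        self.documents.append(document)

    def as_retrieval_context(self) -> list[str]:
        """Return the document texts an application could search over."""
        return [doc.text for doc in self.documents]


# Illustrative usage with made-up content.
dataset = Dataset(name="Support knowledge base")
dataset.add_document(Document(name="faq.md", text="Refunds take 5 business days."))
dataset.add_document(
    Document(name="Pricing page", text="Plans are billed monthly.", source="notion")
)
```

Keeping the source on each document makes it possible to tell uploaded files apart from synchronized ones when the dataset is maintained over time.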
