Document Datasource
A Document Datasource refers to a repository or collection of textual information stored in various file formats such as Word documents, PDFs, text files or spreadsheets. It serves as a central location from which data can be extracted and processed, making it convenient for accessing and analysing textual content.
Here's how you can create your Document Datasource:
Start by clicking the button in the Datasource section. Next, choose the type of Datasource; here we cover the Document Datasource. Once you select the Document Datasource option, follow the steps below to complete its creation:
Add Datasource Information: Enter essential details such as the Datasource's name, and briefly describe what it does in the 'About' section (this description is for the user's reference).
Upload Document: Next, upload the document to be indexed for the Datasource. The currently supported file formats are JPEG, PNG, PDF, Word, PPT, MP3, CSV and EPUB.
Configure for Routing: Compose a description for the Datasource. A Datasource can be used individually or within an App. The Router instruction is crucial because the App utilises this description to determine whether this Datasource should be invoked within complex workflows.
In this section, you can save the Datasource configuration as-is when a model is not required for your task. Alternatively, you can choose "Proceed", which opens two additional sections (both advanced settings) where you can specify chunking values and select a model suited to your task.
Advanced Settings:
Chunking is an important step in Datasource embedding.
Your content is broken down into smaller "chunks," each containing no more than 500 tokens to fit within LLM prompts. This will be a configurable feature as well.
Vector embeddings for each chunk are generated using an OpenAI embedding model.
These embeddings enable you to perform semantically accurate searches.
For instance, a search query like "How do I apply for a loan?" would yield chunks that are contextually related to loan application procedures, eligibility criteria, and customer support channels for loan-related inquiries.
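The flow described above can be pictured with a short sketch. This is illustrative only: it assumes tiktoken for token counting and an OpenAI embedding model (text-embedding-3-small here) accessed through the OpenAI Python SDK; the exact model, chunk size and storage LLMate uses internally may differ, and the file name is a placeholder.

```python
# Minimal sketch of the chunk-and-embed flow: split a document into
# <=500-token chunks, embed each chunk, then run a semantic search.
import numpy as np
import tiktoken
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
enc = tiktoken.get_encoding("cl100k_base")

def chunk_text(text: str, max_tokens: int = 500) -> list[str]:
    """Split text into chunks of at most max_tokens tokens."""
    tokens = enc.encode(text)
    return [enc.decode(tokens[i:i + max_tokens])
            for i in range(0, len(tokens), max_tokens)]

def embed(texts: list[str]) -> np.ndarray:
    """Return one embedding vector per input text."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in resp.data])

# Index a document ("loan_faq.txt" is a placeholder), then search it.
chunks = chunk_text(open("loan_faq.txt").read())
chunk_vectors = embed(chunks)

query_vector = embed(["How do I apply for a loan?"])[0]
scores = chunk_vectors @ query_vector / (
    np.linalg.norm(chunk_vectors, axis=1) * np.linalg.norm(query_vector))
for i in scores.argsort()[::-1][:3]:       # top-3 most similar chunks
    print(f"score={scores[i]:.3f}  {chunks[i][:80]}...")
```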
Here's an example of the columns you might incorporate to embed customer testimonials into LLMate's Data Sources:
testimonial_id: the unique identifier
feedback: the actual customer testimonial, making it searchable within the Data Source
customer_name: the name of the customer providing the testimonial (metadata you can later filter or access)
product_used: the specific LLMate product or feature the customer is commenting on (metadata you can later filter or access)
| testimonial_id | feedback | customer_name | product_used |
| --- | --- | --- | --- |
| 1 | LLMate has streamlined our AI application development... | Sarah Williams | Low-Code Development |
| 2 | Ever since we integrated LLMate's data sources, our efficiency soared... | John Smith | Data Sources |
| 3 | The semantic search capabilities in LLMate are game-changing... | Emily Johnson | Semantic Search |
| 4 | LLMate's tools have made code generation a breeze... | Mike Brown | Code Generation |
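As an illustration only (the record structure below is hypothetical, not LLMate's API), rows like these might be prepared for indexing with the feedback column as the embedded, searchable text and the remaining columns kept as filterable metadata:

```python
# Illustrative only: turn the testimonial rows above into embeddable
# records; the "text" field is embedded, "metadata" is only filtered on.
testimonials = [
    {"testimonial_id": 1,
     "feedback": "LLMate has streamlined our AI application development...",
     "customer_name": "Sarah Williams",
     "product_used": "Low-Code Development"},
    {"testimonial_id": 2,
     "feedback": "Ever since we integrated LLMate's data sources, our efficiency soared...",
     "customer_name": "John Smith",
     "product_used": "Data Sources"},
]

records = [
    {
        "id": row["testimonial_id"],
        "text": row["feedback"],           # embedded and semantically searched
        "metadata": {                      # available for filtering and display
            "customer_name": row["customer_name"],
            "product_used": row["product_used"],
        },
    }
    for row in testimonials
]
```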
To enhance search results and provide more context in the LLM steps, you can concatenate and combine columns as needed. For example (a short code sketch follows the table):
| testimonial_id | feedback | product_used |
| --- | --- | --- |
| 1 | CUSTOMER: Sarah Williams FEEDBACK: LLMate has streamlined our AI application development... | Low-Code Development |
| 2 | CUSTOMER: John Smith FEEDBACK: Ever since we integrated LLMate's data sources, our efficiency soared... | Data Sources |
| 3 | CUSTOMER: Emily Johnson FEEDBACK: The semantic search capabilities in LLMate are game-changing... | Semantic Search |
| 4 | CUSTOMER: Mike Brown FEEDBACK: LLMate's tools have made code generation a breeze... | Code Generation |
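A minimal sketch of the concatenation shown in the table above: customer_name is merged into the embedded text so it is visible to semantic search and to the LLM, while product_used stays as metadata. The helper name is illustrative.

```python
# Sketch: build the combined text that gets embedded for each row.
def to_embeddable_text(row: dict) -> str:
    return f"CUSTOMER: {row['customer_name']} FEEDBACK: {row['feedback']}"

row = {"customer_name": "Sarah Williams",
       "feedback": "LLMate has streamlined our AI application development...",
       "product_used": "Low-Code Development"}
print(to_embeddable_text(row))
# CUSTOMER: Sarah Williams FEEDBACK: LLMate has streamlined our AI application development...
```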
Configure Retrieval and Vector Storage: Select the model that the Datasource will use for querying and reply generation, for example GPT-4, Claude, etc. You can also adjust model settings such as temperature and output tokens.
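The snippet below is a hypothetical illustration of the kind of settings this step covers (model choice, temperature, output tokens); the actual field names in LLMate's interface may differ.

```python
# Hypothetical retrieval configuration; field names are illustrative.
retrieval_config = {
    "model": "gpt-4",           # model used for querying and reply generation
    "temperature": 0.2,         # lower values give more deterministic answers
    "max_output_tokens": 1024,  # cap on the length of the generated reply
}
```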