Knowledge Base Optimization

Optimize knowledge base results in a fixed order: material quality first, then chunking strategy, then retrieval parameters, verifying each change with regression on a fixed set of test questions.

Feature Overview

A knowledge base does not stay optimal on its own after it goes online. As the number of documents grows, questions become more complex, and business scope expands, both hit quality and answer stability need continuous tuning.

Use Cases

Suitable when:

The number of documents keeps growing and hits are becoming noisy
An Agent is connected to a knowledge base but its answers are still unstable
The same question returns very different results at different times

Prerequisites

Before optimization, we recommend preparing:

A fixed set of test questions
The documents each question is expected to hit
Records of current chunking parameters and retrieval parameters
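The three items above can be kept together in one small record so that every later round is compared against the same baseline. The following is a minimal sketch, not a product API: the class names, field names, and all values are hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RegressionCase:
    """One fixed test question and the documents it is expected to hit."""
    question: str
    expected_doc_ids: frozenset[str]


@dataclass
class Baseline:
    """Snapshot of the parameters in effect when the test set was recorded."""
    chunk_size: int
    chunk_overlap: int
    top_k: int
    score_threshold: float
    cases: list[RegressionCase] = field(default_factory=list)


# Hypothetical values, for illustration only.
baseline = Baseline(
    chunk_size=800,
    chunk_overlap=80,
    top_k=5,
    score_threshold=0.5,
    cases=[
        RegressionCase(
            question="How do I reset my password?",
            expected_doc_ids=frozenset({"account-faq"}),
        ),
    ],
)
```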

Steps

Step 1: Check material quality first instead of tuning parameters immediately

Before optimization, first confirm whether the documents themselves already have issues, such as:

Outdated content
Multiple conflicting versions of the same topic
Too many topics mixed in one document

If the materials themselves have issues, parameter tuning usually provides limited benefit.
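A first pass over material quality can be partly automated if document metadata is available. The sketch below assumes each document carries an `id`, a `topic` label, and an `updated` date; these field names and the staleness cutoff are assumptions for illustration, and a flagged document still needs human review.

```python
from collections import defaultdict
from datetime import date


def audit_materials(docs, stale_before):
    """Return (stale document ids, topics covered by more than one document)."""
    by_topic = defaultdict(list)
    stale = []
    for doc in docs:
        by_topic[doc["topic"]].append(doc["id"])
        if doc["updated"] < stale_before:
            stale.append(doc["id"])
    # Several documents on one topic may be conflicting versions.
    duplicated = {t: ids for t, ids in by_topic.items() if len(ids) > 1}
    return stale, duplicated


# Hypothetical metadata, for illustration only.
docs = [
    {"id": "a", "topic": "refunds", "updated": date(2021, 1, 1)},
    {"id": "b", "topic": "refunds", "updated": date(2024, 6, 1)},
    {"id": "c", "topic": "login", "updated": date(2024, 5, 1)},
]
stale, duplicated = audit_materials(docs, stale_before=date(2023, 1, 1))
```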

Step 2: Then adjust the chunking strategy

Once material quality is acceptable, check:

Whether chunk_size is too large
Whether chunk_overlap is too small
Whether separators are suitable for the current document type

This step directly affects hit accuracy and context completeness.
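To make the interaction between chunk_size and chunk_overlap concrete, here is a minimal character-based sketch. It is an assumption about how a splitter might behave, not the product's actual implementation; real splitters often count tokens and honor separators, which this example leaves out.

```python
def split_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size chunks that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


chunks = split_text("abcdefghij", chunk_size=4, chunk_overlap=1)
# chunks == ["abcd", "defg", "ghij"]
```

A larger chunk_size keeps more context per hit but mixes more topics into each chunk; a larger chunk_overlap reduces the chance that a sentence is cut off at a boundary, at the cost of more chunks.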

Step 3: Tune retrieval parameters last

Common focus areas include:

top_k
score_threshold
Whether reranking or different retrieval methods are enabled

The goal of parameter tuning is not "the more results, the better", but to make the most relevant content rank near the top more consistently.
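The effect of top_k and score_threshold can be sketched as a filter followed by a cut-off. This example assumes that a higher score means higher relevance and that results arrive as `(doc_id, score)` pairs; both are assumptions for illustration, and reranking is not modeled.

```python
def select_hits(scored, top_k, score_threshold):
    """Keep results at or above the threshold, then return the best top_k."""
    kept = [(doc_id, score) for doc_id, score in scored if score >= score_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


# Hypothetical scores, for illustration only.
scored = [("a", 0.9), ("b", 0.4), ("c", 0.7), ("d", 0.75)]
hits = select_hits(scored, top_k=2, score_threshold=0.5)
# hits == [("a", 0.9), ("d", 0.75)]
```

Raising score_threshold removes weak matches before they reach the Agent, while lowering top_k limits how many of the remaining matches are passed on.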

Step 4: Change only one type of variable each time, then run regression immediately

The biggest risk during tuning is changing too many things at once. We recommend following this order:

Modify materials first
Then modify chunking
Finally modify retrieval parameters

After each change to a single type of variable, rerun the same fixed set of questions as a regression check.
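A regression run can be as simple as computing the hit rate of the fixed question set before and after a change. In the sketch below, `retrieve` stands in for whatever call returns document ids for a question; it and the sample data are hypothetical.

```python
def hit_rate(cases, retrieve):
    """Fraction of questions for which at least one expected document is returned."""
    if not cases:
        return 0.0
    hits = 0
    for question, expected in cases:
        if set(retrieve(question)) & set(expected):
            hits += 1
    return hits / len(cases)


# Hypothetical fixed test set and stubbed retrieval results.
cases = [("q1", {"doc-a"}), ("q2", {"doc-b"})]
before = {"q1": ["doc-a"], "q2": ["doc-x"]}
after = {"q1": ["doc-a"], "q2": ["doc-b", "doc-x"]}

rate_before = hit_rate(cases, lambda q: before[q])  # 0.5
rate_after = hit_rate(cases, lambda q: after[q])    # 1.0
```

Because only one type of variable changed between the two runs, any difference between `rate_before` and `rate_after` can be attributed to that change.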

Step 5: Feed optimization results back into Agent testing

Once knowledge base hits have improved, return to the Agent and validate:

Whether answers are more stable
Whether hallucinations are reduced
Whether the correct context is retrieved more reliably

If the knowledge base side has clearly improved but the Agent remains unstable, continue by checking the prompt or model layer.

Result Validation

An effective round of knowledge base optimization should show at least:

Improved hit rate for fixed questions
Fewer noisy results
Final Agent answers closer to the standard answers
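"Fewer noisy results" can be tracked with a simple ratio alongside the hit rate. The sketch below treats any returned document that is not in the expected set as noise; that definition is an assumption for illustration, and the document ids are hypothetical.

```python
def noise_ratio(returned, expected):
    """Fraction of returned documents that were not expected for the question."""
    if not returned:
        return 0.0
    noisy = [doc_id for doc_id in returned if doc_id not in expected]
    return len(noisy) / len(returned)


ratio = noise_ratio(["a", "x", "y", "b"], expected={"a", "b"})
# ratio == 0.5
```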

Notes

Fix materials first, then tune parameters
Change only one type of variable each time to make it easier to locate where improvements come from
Optimization goals should serve real business questions, not only abstract metrics
