Training set

This section explains how to create an effective training set, which is crucial to the success of your digital agent.

A training set is a collection of data used to train a machine learning model. It consists of examples that the model learns from, allowing it to recognize patterns, understand context, and make predictions or generate responses based on the data it receives from end users.

Key points about a training set:

  1. Composition: A training set typically includes input-output pairs, where the input might be a user query or statement, and the output is the corresponding response or action expected from the model.

  2. Diversity: The training set should be diverse and representative of the various scenarios the model will encounter in real-world applications. This includes different phrasing, languages, contexts, and user intents.

  3. Size: A larger training set generally improves the model's performance and its ability to generalize to unseen data, provided the added examples are accurate and varied rather than near-duplicates.

  4. Preprocessing: Data in the training set often undergoes preprocessing steps, such as normalization, tokenization, and labeling, to make it suitable for model training.

  5. Validation: While training sets are used for training the model, separate validation and test sets are also used to evaluate the model's performance and ensure it generalizes well to new data.
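
The points above can be sketched in a few lines of Python. This is a minimal illustration, not the format or pipeline used by Digital Studio: the utterances, the intent labels (`check_balance`, `block_card`, `handover`), the helper names, and the 60/20/20 split ratio are all assumptions chosen for the example. It shows input-output pairs, simple normalization and tokenization, duplicate removal, and a split into training, validation, and test sets.

```python
import random
import re
from collections import Counter

# Hypothetical input-output pairs: (user utterance, expected intent label).
RAW_EXAMPLES = [
    ("What's my account balance?", "check_balance"),
    ("  Show me my BALANCE please ", "check_balance"),
    ("how much money do I have", "check_balance"),
    ("Balance, please", "check_balance"),
    ("I want to block my card", "block_card"),
    ("Please block my credit card!", "block_card"),
    ("my card was stolen, block it", "block_card"),
    ("Talk to a human", "handover"),
    ("can I speak with an agent?", "handover"),
    ("connect me to support", "handover"),
    ("talk to a HUMAN ", "handover"),  # duplicate after normalization
]


def normalize(text):
    """Lowercase, trim, and collapse repeated whitespace."""
    return re.sub(r"\s+", " ", text.strip().lower())


def tokenize(text):
    """Split normalized text into simple word tokens."""
    return re.findall(r"[a-z0-9']+", normalize(text))


def deduplicate(examples):
    """Keep the first example for each (normalized text, label) pair."""
    seen = set()
    unique = []
    for text, label in examples:
        key = (normalize(text), label)
        if key not in seen:
            seen.add(key)
            unique.append((text, label))
    return unique


def split(examples, train=0.6, validation=0.2, seed=42):
    """Shuffle reproducibly, then cut into train, validation, and test lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


examples = deduplicate(RAW_EXAMPLES)
train_set, validation_set, test_set = split(examples)

# A quick diversity check: how many examples exist per intent label.
label_counts = Counter(label for _, label in examples)
print(len(train_set), len(validation_set), len(test_set))
print(label_counts)
```

Keeping the validation and test examples out of the training list is what makes their scores a fair estimate of how the model handles new data. The per-label counts give a quick signal when one intent has far fewer examples than the others.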

See the overview to learn about the "Training Set" tab in Digital Studio, then see best practices for more detailed guidance.
