The NLU Pipeline
Executing the Pipeline
Requirements
Executing the Natural Language Understanding (NLU) pipeline extracts a natural language dataset from a database and a set of configuration files, which is then used to train an NLU model. To execute the pipeline, the following need to be installed on your machine (please refer to the official documentation of each for installation instructions).
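The commands used later in this guide call pipenv, npm, and dvc. As a quick sanity check, a small helper along these lines can report which of them are missing from your PATH. The function name check_tools is only illustrative and is not part of the project.

```shell
# Report every given command-line tool that cannot be found on PATH.
# Returns 0 if all tools are present, 1 otherwise.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Tools invoked by the commands in this guide.
check_tools pipenv npm dvc || echo "Install the missing tools before continuing."
```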
Before continuing, make sure a database is available: a DBMS must be
running on your system or on a remote server. Remember to set the
appropriate SQL dialect in the file
chatidea/database/broker.py.
Install all Python and Node.js dependencies in a virtual environment using the following commands:
PIPENV_VENV_IN_PROJECT=1 pipenv install --dev
npm i --dev
Then edit the .env file to fit your environment. If the .env
file does not exist, create it first by copying the provided example
template with the following command.
cp .env.example .env
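A plain cp overwrites an existing .env, which would discard any settings you have already edited. A minimal sketch of a safer variant, assuming it is run from the directory that contains .env.example (the function name ensure_env is hypothetical):

```shell
# Create .env from the template only when it does not exist yet,
# so an already customised configuration is never overwritten.
ensure_env() {
  if [ -f .env ]; then
    echo ".env already exists, leaving it untouched"
  else
    cp .env.example .env
    echo "created .env from .env.example"
  fi
}
```

Calling ensure_env from the project root then has the same effect as the command above on a fresh checkout and does nothing otherwise.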
Running the Pipeline
The NLU pipeline is fully contained in the nlu-model directory, so
change into it with the following command before executing the
pipeline.
cd nlu-model
Generate Data and Train the Model
The NLU pipeline is fully tracked with
DVC. Most of the time, you can download the
pre-trained models and intermediate files instead of
retraining. To do this, simply run the command dvc pull.
However, if you edit any file required by the NLU model, change the
database, or simply want to retrain the whole model, you can
re-execute the pipeline using the following command.
dvc repro
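Since retraining can take a while, it may help to see what DVC considers out of date before starting a full run. The sketch below wraps two standard DVC commands in a helper whose name, preview_pipeline, is only illustrative. It assumes you are already inside the nlu-model directory.

```shell
# Preview the pipeline state without executing any stage.
preview_pipeline() {
  dvc status        # list stages whose dependencies or outputs have changed
  dvc repro --dry   # print the stages that would run, without running them
}
```

If the preview shows that no stage has changed, dvc repro has nothing to rebuild and dvc pull is usually enough.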
If you want to share the built model and intermediate files with
collaborators, run dvc push after committing to upload all the
built files that have changed.
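The commit-then-push order can be sketched as follows. This assumes the pipeline state is recorded in a dvc.lock file in the current directory, which is where DVC normally writes it after dvc repro. The helper name share_build and the commit message are placeholders.

```shell
# Commit the updated DVC metadata, then upload the matching artifacts.
# Each step runs only if the previous one succeeded.
share_build() {
  git add dvc.lock &&
  git commit -m "$1" &&
  dvc push
}
```

For example, share_build "Retrain NLU model" commits the lock file and then uploads the changed outputs to the configured DVC remote.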