DVC

DVC stands for Data Version Control. It is a tool that helps data scientists keep track of the versions of AI models and datasets in a project through a git-like interface. The models and datasets themselves are stored on an external system, while the git repository tracks them through *.dvc files.

Installation

On Windows, if you have choco (Chocolatey) installed, the recommended way to install DVC is:

choco install dvc

On macOS, if you have brew (Homebrew) installed, the recommended way to install DVC is:

brew install dvc

In any other case, follow the official DVC installation instructions.

Usage

For an overview of how DVC works, you can watch: Getting started with DVC

Add a new model

Let's say that you have trained a new model and want to use it for a new submission.

Example project structure:

project
└── models
    └── bert
        └── model.pt

You will need to upload your model weights (model.pt here) using DVC.

dvc add models/bert/model.pt
dvc push

Then add, commit and push the new .dvc file you just created.

⚠ Do not forget to update your evaluation.py script before making a new submission.

git add models/bert/model.pt.dvc
git commit -m "add bert model"
git push
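
The steps above could be wrapped in a small shell function. This is only a sketch under a few assumptions: a default DVC remote is already configured for the repository, and `publish_model` is a hypothetical name, not a DVC command. It also stages the .gitignore that `dvc add` usually writes next to the tracked file, so that git does not pick up the weights themselves.

```shell
# Hypothetical helper (not part of DVC): track a model file with DVC,
# upload it, and commit the resulting pointer file to git.
# Usage: publish_model <path-to-model> <commit-message>
publish_model() {
    model_path="$1"
    message="$2"
    if [ -z "$model_path" ] || [ -z "$message" ]; then
        echo "usage: publish_model <path-to-model> <commit-message>" >&2
        return 2
    fi

    # Create <path-to-model>.dvc, then upload the data to the DVC remote.
    dvc add "$model_path" || return 1
    dvc push || return 1

    # Stage the pointer file, plus the .gitignore next to it if one exists.
    git add "$model_path.dvc" || return 1
    gitignore="$(dirname "$model_path")/.gitignore"
    if [ -f "$gitignore" ]; then
        git add "$gitignore" || return 1
    fi

    git commit -m "$message" || return 1
    git push
}
```

With the example project above, it would be called as `publish_model models/bert/model.pt "add bert model"`. The function stops at the first failing step, so nothing is committed if the upload fails.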

Download all models

To download all models from one repository, you can simply pull them:

dvc pull
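
`dvc pull` also accepts specific targets, which helps when only one model is needed. The sketch below assumes the .dvc files come from the git repository, so it updates the git checkout first. `pull_models` is a hypothetical helper name, not a DVC command.

```shell
# Hypothetical helper (not part of DVC): download only the given tracked files.
# Usage: pull_models <file.dvc> [<file.dvc> ...]
pull_models() {
    if [ "$#" -eq 0 ]; then
        echo "usage: pull_models <file.dvc> [<file.dvc> ...]" >&2
        return 2
    fi

    # Fetch the latest .dvc pointer files from git first.
    git pull || return 1

    # Download only the data referenced by the given targets.
    dvc pull "$@"
}
```

For the example project, `pull_models models/bert/model.pt.dvc` would download only the bert weights.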