DVC

DVC (Data Version Control) is a tool that helps data scientists keep track of versions of AI models and datasets in a project through a git-like interface. The models and datasets themselves are stored on an external system, but they are tracked within the git repository through *.dvc files.

Installation

⚠ If you use WSL, follow Linux instructions.

Otherwise, if you have winget, the recommended way to install DVC is:

winget install --id Iterative.DVC
Or follow the Windows installation instructions in the DVC documentation.

If you have brew (Homebrew), the recommended way to install DVC is:

brew install dvc
Or follow the macOS installation instructions in the DVC documentation.

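You can install DVC in your Python environment with pip. The [s3] extra adds support for Amazon S3 storage, and the quotes keep shells such as zsh from interpreting the square brackets:

pip install "dvc[s3]"
Or follow the installation instructions in the DVC documentation.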

Usage

To get an overview of how DVC works you can watch: Getting started with DVC

Add a new model

Let's say that you have trained a new model and want to use it for a new submission.

Example project structure:

project
└── models
    └── bert
        └── model.pt

You will need to upload your model weights (model.pt here) using DVC.

dvc add models/bert/model.pt
dvc push

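Then add, commit and push the newly generated model.pt.dvc file, together with the .gitignore that DVC creates or updates next to it so that git ignores model.pt itself.

git add models/bert/model.pt.dvc models/bert/.gitignore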
git commit -m "add bert model"
git push
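
If you publish models often, the two steps above can be wrapped in a small shell function. This is only a sketch: `publish_model` is a hypothetical helper name, not part of DVC, and it assumes that a DVC remote is already configured and that `dvc add` maintains a .gitignore in the same directory as the model file.

```shell
# Hypothetical helper (not part of DVC): publish one model through DVC and git.
# Usage: publish_model <path-to-model> <commit-message>
publish_model() {
  model_path="$1"
  message="$2"
  model_dir=$(dirname "$model_path")

  # Upload the weights and create <model>.dvc next to them.
  dvc add "$model_path" || return 1
  dvc push || return 1

  # Commit the pointer file and the .gitignore kept by `dvc add`
  # (assumed to live in the same directory as the model).
  git add "${model_path}.dvc" "${model_dir}/.gitignore" || return 1
  git commit -m "$message" || return 1
  git push
}
```

For the example project, the call would be `publish_model models/bert/model.pt "add bert model"`. The function stops at the first failing step, so nothing is committed if the upload fails.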

Download all models

To download all models tracked in a repository, pull them from inside your clone:

dvc pull
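
To see which tracked files are still missing from a fresh clone, before or after pulling, a plain shell check can help. The sketch below does not call DVC at all. `list_missing_models` is a hypothetical helper, and it assumes that every *.dvc file sits next to the file it tracks, as model.pt.dvc does in the example above.

```shell
# Hypothetical helper: list DVC-tracked files that are absent from the working tree.
# Assumes each <file>.dvc pointer sits next to the file it tracks.
# Usage: list_missing_models [directory]   (defaults to the current directory)
list_missing_models() {
  root="${1:-.}"
  find "$root" -type f -name '*.dvc' | sort | while IFS= read -r pointer; do
    # Strip the .dvc suffix to get the path of the tracked file.
    target="${pointer%.dvc}"
    # -e matches files and directories, since DVC can track both.
    [ -e "$target" ] || printf '%s\n' "$target"
  done
}
```

Running `list_missing_models models` prints one path per missing file and nothing once everything is present. To fetch a single model instead of all of them, `dvc pull` also accepts a target such as models/bert/model.pt.dvc.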