DVC
DVC stands for Data Version Control. It is a tool that helps data scientists keep track of model and dataset versions in a project through a Git-like interface. Models and datasets are stored on an external system but tracked within the Git repository through *.dvc files.
Installation
If you have choco (Chocolatey) installed, it is recommended to install DVC with:
choco install dvc
If you have brew (Homebrew) installed, it is recommended to install DVC with:
brew install dvc
Otherwise, follow the installation instructions in the DVC documentation.
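Whichever install route you use, a quick check can confirm that the dvc executable is reachable from your shell. The snippet below is a minimal sketch; the function name check_dvc is a hypothetical helper for illustration and is not part of DVC.

```shell
# Hypothetical helper (not part of DVC): report whether the dvc CLI is on PATH.
check_dvc() {
    if command -v dvc >/dev/null 2>&1; then
        # `dvc --version` prints the installed version number.
        dvc --version
    else
        echo "dvc not found on PATH" >&2
        return 1
    fi
}
```

Calling check_dvc prints the installed version, or an error message and a non-zero exit status if DVC is missing.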
Usage
To get an overview of how DVC works, you can watch: Getting started with DVC
Add a new model
Let's say that you have trained a new model and want to use it for a new submission.
Example project structure:
project
└── models
    └── bert
        └── model.pt
You will need to upload your model weights (model.pt here) using DVC:
dvc add models/bert/model.pt
dvc push
Then add, commit, and push the new .dvc file you just created. Do not forget to update your evaluation.py script before making a new submission.
git add models/bert/model.pt.dvc
git commit -m "add bert model"
git push
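The DVC and Git steps above always run in the same order, so they can be wrapped in one shell function. This is a minimal sketch, not an official workflow: the names publish_model and run, and the DRY_RUN variable, are hypothetical and introduced here only for illustration. It assumes a DVC remote is already configured for the project.

```shell
# Hypothetical wrapper (not part of DVC or Git) around the steps above.
# Set DRY_RUN=1 to print each command instead of executing it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

publish_model() {
    model_path="$1"   # e.g. models/bert/model.pt
    commit_msg="$2"   # e.g. "add bert model"

    # Track the weights with DVC and upload them to the remote storage,
    # then version the small .dvc pointer file with Git.
    # The && chain stops at the first failing step.
    run dvc add "$model_path" &&
    run dvc push &&
    run git add "${model_path}.dvc" &&
    run git commit -m "$commit_msg" &&
    run git push
}
```

For example, publish_model models/bert/model.pt "add bert model" reproduces the commands of this section, and setting DRY_RUN=1 beforehand lets you review them without touching the repository. Note that dvc add also writes a .gitignore entry next to the model so that Git ignores the weights themselves; staging that file as well is an optional extra this sketch leaves out.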
Download all models
To download all models from a repository, you can simply pull them:
dvc pull
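When only one model is needed, dvc pull also accepts specific targets, which limits the download to those files. The following is a minimal sketch under that assumption; pull_model is a hypothetical helper name, not a DVC command, and the path in the usage line matches the example project structure used earlier.

```shell
# Hypothetical helper (not part of DVC): pull a single tracked model.
pull_model() {
    pointer="$1.dvc"   # e.g. models/bert/model.pt -> models/bert/model.pt.dvc
    if [ ! -f "$pointer" ]; then
        echo "no DVC pointer file at $pointer" >&2
        return 1
    fi
    # Passing a target to `dvc pull` restricts the download to that file.
    dvc pull "$pointer"
}
```

For example, pull_model models/bert/model.pt fetches only the BERT weights and fails early if the corresponding .dvc file is not present in the working tree.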