DVC
DVC stands for Data Version Control. It is a tool that helps data scientists keep track of versions of AI models and datasets in a project through a Git-like interface. The models and datasets themselves are stored on an external system, but they are tracked within the Git repository through *.dvc files.
Installation
If you use WSL, follow the Linux instructions.
Otherwise, on Windows, if you have winget, it is recommended to install DVC with:
winget install --id Iterative.DVC
If you have brew (Homebrew), it is recommended to install DVC with:
brew install dvc
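Alternatively, you can install DVC in your Python environment (here with the s3 extra, which adds support for S3 remotes). The quotes keep shells such as zsh from interpreting the square brackets:
pip install "dvc[s3]"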
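To confirm that the installation worked, a quick check along these lines can help. This is a minimal sketch: `have_cmd` and `check_dvc` are hypothetical helper names, not part of DVC, while `dvc version` is the DVC subcommand that prints the installed version and the remote types it supports.

```shell
# have_cmd is a hypothetical helper: succeeds if the named command is available.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# check_dvc (also hypothetical) prints DVC's version details,
# or a hint on stderr if DVC cannot be found.
check_dvc() {
    if have_cmd dvc; then
        dvc version
    else
        echo "dvc not found on PATH; install it first" >&2
        return 1
    fi
}
```

After defining these functions, run `check_dvc`. If you installed DVC with the s3 extra, the list of supported remotes in the output should include s3.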
Usage
For an overview of how DVC works, you can watch: Getting started with DVC
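The commands in the sections below assume that the repository already has a default DVC remote. If `dvc push` complains that no remote is configured, one can be added roughly as follows. This is a sketch under assumptions: `add_default_remote` is a hypothetical wrapper, and the remote name and URL you pass to it are placeholders for your own storage.

```shell
# add_default_remote is a hypothetical wrapper around `dvc remote add`.
# $1: remote name (placeholder), $2: remote URL (placeholder, e.g. an S3 path).
add_default_remote() {
    # -d marks the new remote as the default used by `dvc push` and `dvc pull`.
    dvc remote add -d "$1" "$2" || return 1
    # List the configured remotes to confirm the change.
    dvc remote list
}
```

For example, `add_default_remote storage s3://example-bucket/dvc-store`, where both arguments are made-up values. The setting is written to `.dvc/config`, which you would then commit with Git so that collaborators share the same remote.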
Add a new model
Let's say that you have trained a new model and want to use it for a new submission.
Example project structure:
project
└── models
    └── bert
        └── model.pt
You will need to upload your model weights (model.pt here) using DVC.
dvc add models/bert/model.pt
dvc push
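Then add, commit, and push the newly generated model.pt.dvc file. DVC also creates or updates a .gitignore file next to it so that Git ignores the model weights themselves; commit that file as well.
git add models/bert/model.pt.dvc models/bert/.gitignore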
git commit -m "add bert model"
git push
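If you publish models often, the steps above can be chained in a small shell function. This is only a sketch: `publish_model` is a hypothetical helper, not a DVC command, and it assumes that a default DVC remote and a Git remote are already configured.

```shell
# publish_model is a hypothetical helper that chains the steps above.
# $1: path to the model file, $2: Git commit message.
publish_model() {
    model_path=$1
    message=$2
    model_dir=$(dirname "$model_path")

    # Track the file with DVC; this writes "$model_path.dvc" and a .gitignore entry.
    dvc add "$model_path" || return 1
    # Upload the tracked data to the default DVC remote.
    dvc push || return 1
    # Version the small pointer files with Git.
    git add "$model_path.dvc" "$model_dir/.gitignore" || return 1
    git commit -m "$message" || return 1
    git push
}
```

For the example project, the call would be `publish_model models/bert/model.pt "add bert model"`. Each step stops the function on failure, so nothing is committed if the upload does not succeed.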
Download all models
To download all models tracked in a repository, pull them from the remote storage:
dvc pull
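When you only need one model, `dvc pull` also accepts specific targets, such as a .dvc file. The sketch below wraps that in a hypothetical helper, `pull_model`, which first checks that the pointer file exists; the helper name and its error message are illustrative, not part of DVC.

```shell
# pull_model is a hypothetical helper: fetches one tracked file instead of everything.
# $1: path to the model file (its pointer file is expected at "$1.dvc").
pull_model() {
    if [ ! -f "$1.dvc" ]; then
        echo "no DVC pointer file found at $1.dvc" >&2
        return 1
    fi
    # Download only the data referenced by this pointer file.
    dvc pull "$1.dvc"
}
```

For the example project, `pull_model models/bert/model.pt` would download only the BERT weights.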