Git LFS! Or live long and keep learning
Wrangling data and juggling big CSV
files is an everyday affair if you are doing something real. Sometimes
I would kill to be able to revert what I’ve just done… to this 340 MB CSV
file. Lately I started passing data between my applications and remote
machines (like Colab) in *.sqlite files, which are not much smaller and
have to be treated as (big) binary files. Version control is a must!
Now, as it turns out, there is this git-lfs ‘extension’ for Git (LFS stands
for Large File Storage) that was designed a long time ago for just that. Duh!
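To give a feel for what "tracking" a file type means here, a minimal sketch in Python. As far as I understand it, running `git lfs track "*.sqlite"` records a line for that pattern in the repository's `.gitattributes` file, and the snippet below only mimics that bookkeeping. The helper names `lfs_attribute_line` and `track_patterns` are hypothetical (my own, not part of git-lfs), and the sketch assumes simple patterns without spaces. In real use you would run `git lfs install` once and then `git lfs track` rather than editing the file yourself.

```python
from pathlib import Path

# Attributes that mark a path pattern as handled by Git LFS in .gitattributes.
LFS_ATTRS = "filter=lfs diff=lfs merge=lfs -text"


def lfs_attribute_line(pattern: str) -> str:
    """Build the .gitattributes line for one pattern (hypothetical helper)."""
    return f"{pattern} {LFS_ATTRS}"


def track_patterns(repo_dir, patterns):
    """Append missing LFS lines to <repo_dir>/.gitattributes.

    Returns the lines that were actually added, so calling it twice
    with the same patterns adds nothing the second time.
    """
    attrs_file = Path(repo_dir) / ".gitattributes"
    existing = attrs_file.read_text().splitlines() if attrs_file.exists() else []
    added = []
    for pattern in patterns:
        line = lfs_attribute_line(pattern)
        if line not in existing:
            existing.append(line)
            added.append(line)
    if added:
        attrs_file.write_text("\n".join(existing) + "\n")
    return added
```

Calling `track_patterns(".", ["*.sqlite", "*.csv"])` in a repository root would leave two lines in `.gitattributes`. That file then gets committed like any other, alongside the data files themselves.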
Where to find it
I will write more about it later, but for now
here’s a couple of links:
Afterword
Almost immediately after I posted about this finding on Facebook, one of the ‘maintainers’ of DVC commented on my post and suggested their solution as a better one for data science scenarios. I’m not sure yet. I understand why the GitHub-native solution is good. I’m not quite sure what to make of the ‘extra flexibility’ that makes you do some of the same work yourself… But I will try it some day.