Wrangling data and juggling big CSV files is an everyday affair if you are doing anything real. Sometimes I would die to revert what I’ve just done… to this 340 MB CSV file. Lately I have started passing data between my applications and remote machines (like Colab) in *.sqlite files, which are not much smaller and have to be treated as (big) binary files. Version control is a must! As it turns out, there is this git-lfs ‘extension’ for Git that was designed a long time ago for exactly that. Duh!
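For the curious, here is a rough sketch of what tracking a file type with git-lfs boils down to. Normally you would just run `git lfs install` once and then `git lfs track "*.csv"`; the tracking step records a rule in the repository's `.gitattributes` file. The Python below only mimics that bookkeeping step. The helper names `lfs_attribute_line` and `track_patterns` are made up for illustration, and git-lfs itself still has to be installed for the rules to have any effect.

```python
from pathlib import Path

# Attribute string that `git lfs track` records for a tracked pattern.
LFS_ATTRS = "filter=lfs diff=lfs merge=lfs -text"


def lfs_attribute_line(pattern: str) -> str:
    """Build the .gitattributes rule for one file pattern (hypothetical helper)."""
    return f"{pattern} {LFS_ATTRS}"


def track_patterns(repo_dir, patterns):
    """Append missing LFS rules to <repo_dir>/.gitattributes.

    Returns the list of lines that were actually added (hypothetical helper).
    """
    attrs_file = Path(repo_dir) / ".gitattributes"
    existing = attrs_file.read_text().splitlines() if attrs_file.exists() else []

    added = []
    for pattern in patterns:
        line = lfs_attribute_line(pattern)
        # Skip rules that are already present or were just added.
        if line not in existing and line not in added:
            added.append(line)

    if added:
        attrs_file.write_text("\n".join(existing + added) + "\n")
    return added
```

Calling `track_patterns(".", ["*.csv", "*.sqlite"])` in a repository root would add one rule per pattern; the `.gitattributes` file then needs to be committed like any other file.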

Where to find

    I will write more about it later, but for now here are a couple of links:

Afterword

    Almost immediately after I posted about this finding on Facebook, one of the ‘maintainers’ of DVC commented on my post and suggested their solution as a better one for data science scenarios. I’m not sure yet. I understand why the GitHub-native solution is good. I’m not quite sure what to make of the ‘extra flexibility’ that makes you do some of the same work yourself… But I will try it some day.